GSoC 19: Optimizing the Pre-dump Algorithm

Submitted by Abhishek Dubey on Aug. 20, 2019, 11:06 p.m.

Details

Reviewer None
Submitted Aug. 20, 2019, 11:06 p.m.
Last Updated Aug. 25, 2019, 1:27 p.m.
Revision 2

Cover Letter

This patch series optimizes the pre-dump algorithm as part of a
GSoC 2019 project.

In the current pre-dump, the target process must stay frozen
until all of its memory pages are drained into pipes. The process
is then unfrozen, and the pages collected in the pipes are written
to image files at the end of the pre-dump. This approach has two
problems. First, the target process remains frozen for a long
time. Second, the pipes put memory pressure on the system. If
memory utilization during pre-dump is close to the system's total
memory, this risks out-of-memory failures, because the pipe pages
are not reclaimable.

The new pre-dump implementation solves both issues. The target
process is frozen only until its memory mappings are collected;
it is then unfrozen and continues to run. The pages of the target
process are copied while it is running. We use the
process_vm_readv syscall to copy pages from process memory into a
user-space buffer, guided by the collected mappings. Since page
copying and process execution happen concurrently, the process
may modify some mappings after they have been collected. This
results in a race over the memory mappings.
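
The copy step described above can be sketched roughly as follows.
This is a minimal illustration, not the code from the series:
copy_pages() and its present[] array are hypothetical names, and
it issues one process_vm_readv call per page for clarity, whereas
a real implementation would batch many iovecs per call. A page
that is no longer readable (EFAULT) is treated as lost to the race
and skipped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch series.
 *
 * Copies `npages` pages starting at `addr` in process `pid` into `buf`,
 * one page per process_vm_readv() call. present[i] is set to 1 if page i
 * was copied and to 0 if it was not readable (EFAULT), for example
 * because the running target unmapped it after the mappings were
 * collected.
 *
 * Returns the number of pages copied, or -1 with errno set on any
 * other error (for example EPERM or ESRCH).
 */
static long copy_pages(pid_t pid, void *addr, size_t npages,
		       void *buf, unsigned char *present)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	long copied = 0;

	assert(buf != NULL && present != NULL);

	for (size_t i = 0; i < npages; i++) {
		struct iovec local = {
			.iov_base = (char *)buf + i * page,
			.iov_len = page,
		};
		struct iovec remote = {
			.iov_base = (char *)addr + i * page,
			.iov_len = page,
		};
		ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);

		if (n == (ssize_t)page) {
			present[i] = 1;
			copied++;
		} else if (n < 0 && errno == EFAULT) {
			/* Page vanished or became unreadable: skip it. */
			present[i] = 0;
		} else {
			if (n >= 0)
				errno = EIO;	/* unexpected short read */
			return -1;
		}
	}
	return copied;
}
```

A caller would walk the collected mappings, call copy_pages() for
each one, and write only the pages marked in present[] to the
image. Reading another process this way needs ptrace-level access
to it.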

This patch series handles the race over mappings, uses a
user-space buffer filled through the process_vm_readv syscall to
copy pages, and reduces the frozen time of the target process by
letting it run as soon as its memory mappings are collected. We
call this new approach "read mode pre-dump". A new CLI option,
--pre-dump-mode, is added for it; it takes "splice" or "read" as
input. "splice" mode is the traditional parasite-based way of
pre-dumping and is the default.
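
For illustration only, an option of this shape could be parsed
with getopt_long as sketched below. The enum, the function names
and the short code 'm' are assumptions made for this example and
are not claimed to match the code in criu/config.c; only the
option name, its two accepted values and the "splice" default come
from the description above.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, chosen for this sketch only. */
enum pre_dump_mode {
	PRE_DUMP_SPLICE = 1,
	PRE_DUMP_READ = 2,
};

/* Map the option argument to a mode; returns -1 for unknown values. */
static int parse_pre_dump_mode(const char *arg, enum pre_dump_mode *mode)
{
	if (strcmp(arg, "splice") == 0)
		*mode = PRE_DUMP_SPLICE;
	else if (strcmp(arg, "read") == 0)
		*mode = PRE_DUMP_READ;
	else
		return -1;
	return 0;
}

/* Scan argv for --pre-dump-mode; "splice" is used when it is absent. */
static int parse_args(int argc, char *const argv[], enum pre_dump_mode *mode)
{
	static const struct option long_opts[] = {
		{ "pre-dump-mode", required_argument, NULL, 'm' },
		{ NULL, 0, NULL, 0 },
	};
	int opt;

	*mode = PRE_DUMP_SPLICE;	/* default mode */
	optind = 1;			/* restart scanning at argv[1] */

	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (opt != 'm' || parse_pre_dump_mode(optarg, mode) < 0)
			return -1;
	}
	return 0;
}
```

Rejecting unknown values at parse time keeps a mistyped mode from
silently falling back to the default.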

Evaluation of the new approach:
-------------------------------

Performance:
------------
$ ./test/zdtm.py run --pre 5 -t zdtm/static/maps04 -f h

Splice mode -
pre-dump: 0.59
pre-dump: 0.06
pre-dump: 0.07
pre-dump: 0.06
pre-dump: 0.06
dump: 0.12

$ ./test/zdtm.py run --pre 5 -t zdtm/static/maps04 --pre-dump-mode=read -f h

Read mode -
pre-dump: 0.66
pre-dump: 0.06
pre-dump: 0.06
pre-dump: 0.06
pre-dump: 0.06
dump: 0.12

Average performance drop (read vs. splice) : ~7%
Maximum performance drop (read vs. splice) : ~13%

Freeze time:
------------
$ ./test/zdtm.py run --pre 5 -t zdtm/static/maps04 --show-stats -f h

Splice mode -
pre-dump: 122235
pre-dump: 77514
pre-dump: 77912
pre-dump: 74940
pre-dump: 71977
dump: 131208

$ ./test/zdtm.py run --pre 5 -t zdtm/static/maps04 --pre-dump-mode=read --show-stats -f h

Read mode -
pre-dump: 82365
pre-dump: 67521
pre-dump: 68385
pre-dump: 68706
pre-dump: 67144
dump: 160998

Average freeze-time reduction (read vs. splice) : ~18%
Maximum freeze-time reduction (read vs. splice) : ~35%

Abhishek Dubey (7):
  Adding --pre-dump-mode option
  Skip generating iov for non-PROT_READ memory
  Skip adding PROT_READ to non-PROT_READ mappings
  Adding cnt_sub for stats manipulation
  Handle vmsplice failure for read mode pre-dump
  read mode pre-dump implementation
  Refactor time accounting macros

 Documentation/criu.txt    |   6 +
 criu/config.c             |  10 ++
 criu/cr-dump.c            |   9 +-
 criu/crtools.c            |   2 +
 criu/include/cr_options.h |   7 +
 criu/include/page-xfer.h  |   4 +
 criu/include/stats.h      |   1 +
 criu/mem.c                |  88 +++++++++--
 criu/page-pipe.c          |  10 ++
 criu/page-xfer.c          | 382 ++++++++++++++++++++++++++++++++++++++++++++++
 criu/stats.c              |  12 ++
 test/zdtm.py              |   9 +-
 12 files changed, 524 insertions(+), 16 deletions(-)
  

Revisions