vmsplice user-buffer(grabbed by process_vm_readv) to page-pipe

Submitted by Andrey Vagin on July 24, 2019, 6:18 a.m.

Details

Message ID 20190724061847.GA15765@gmail.com
State New
Series "vmsplice user-buffer(grabbed by process_vm_readv) to page-pipe"
Headers show

Commit Message

Andrey Vagin July 24, 2019, 6:18 a.m.
On Sun, Jul 21, 2019 at 05:54:34AM +0530, abhishek dubey wrote:
> Hi Andrei,
> 
> On 18/07/19 1:57 AM, Andrei Vagin wrote:
> > On Fri, Jul 12, 2019 at 03:01:39AM +0530, abhishek dubey wrote:
> > > Hi Andrei,
> > > 
> > > On 09/07/19 9:38 PM, Andrei Vagin wrote:
> > > > On Tue, Jul 09, 2019 at 01:50:57AM +0530, abhishek dubey wrote:
> > > > > Hi Andrei,
> > > > > 
> > > > > Please find inline replies.
> > > > > 
> > > > > On 08/07/19 9:35 AM, Andrei Vagin wrote:
> > > > > > On Fri, Jul 05, 2019 at 05:34:26PM +0530, Abhishek Dubey wrote:
> > > > > > > This patch implements usage of process_vm_readv
> > > > > > > syscall to collect memory pages from target process during
> > > > > > > pre-dump. process_vm_readv collects pages in user-buffer,
> > > > > > > which is later vmspliced to page-pipes.
> > > > > > You have to be more detailed in the commit message.
> > > > > > 
> > > > > > Frist of all, you have to explain why we need this patch.
> > > > > I will take care of same from next time.
> > > > > > If you improve performance, you need to give some numbers.
> > > > > Still I am handling issue of "missing entry in parent pagemap". Next patch
> > > > > will have comparative figures included.
> > > > > > If you implement a new feature, you need to provide tests for it.
> > > > > > 
> > > > > > If I understand this patch right, criu freezes all processes, collects
> > > > > > page maps, unfreezes process, dump pages? If this is right,
> > > > > Yes, this is right.
> > > > > > you need to
> > > > > > explain how it handles in this case:
> > > > > > 
> > > > > > addr = map(NULL, 4 * PAGE_SIZE)	|
> > > > > > addr[0] = 1			|
> > > > > > addr[PAGE_SIZE]  = 2		|
> > > > > > addr[2 * PAGE_SIZE] = 3		|
> > > > > > addr[3 * PAGE_SIZE] = 4		|
> > > > > > 				| criu freezes the process
> > > > > > 				| criu collects page maps
> > > > > > 				| criu unfreezes the process
> > > > > > unmap(addr, PAGE_SIZE * 2)	|
> > > > > > 				| criu dumps pages
> > > > > Need to explain this example somewhere along with code(as in page-pipe.h) or
> > > > > some other place?
> > > > here :)
> 
> testcase/maps007 does something like this. But instead of unmapping initial
> 2 pages(as in example above), it always unmaps
> 
> some intermediate page, like page 3 in this example. In current
> implementation, I have handled it by pre-dumping just first 2 pages, 3rd is
> unmapped so skip it
> 
> and 4th page is not pre-dumped. I also adjust the length of iovec as per
> pre-dumped page count. This approach is not correct and passes the test case
> 
> very few times. maps007 fails with page-count mis-match error.

I disabled this check in zdtm.py:


and maps007 fails each time:

[root@fc24 criu]# python test/zdtm.py run --pre 2 -t zdtm/transition/maps007
=== Run 1/1 ================ zdtm/transition/maps007
======================= Run zdtm/transition/maps007 in h =======================
Start test
Test is SUID
./maps007 --pidfile=maps007.pid --outfile=maps007.out
Run criu pre-dump
ERROR: bad page counts, stats = 3506 real = 14(0)
Run criu pre-dump
=[log]=> dump/zdtm/transition/maps007/36/2/pre-dump.log
------------------------ grep Error ------------------------
(00.269502) 	buf 477/433
(00.269639) Can't read remote iovs (225280(1953792)/477)
(00.269649) 	h 0x9cd000 [2]
(00.269657) Checking 0x9cd000/8192 hole
(00.269667) Error (criu/page-xfer.c:287): Missing 9cd000 in parent pagemap
(00.269676) Error (criu/page-xfer.c:331): Hole 0x9cd000/8192 not found in parent
(00.269752) Writing image inventory (version 1)
(00.269842) Error (criu/cr-dump.c:1548): Pre-dumping FAILED.
------------------------ ERROR OVER ------------------------
############## Test zdtm/transition/maps007 FAIL at CRIU pre-dump ##############
Send the 9 signal to  36
Wait for zdtm/transition/maps007(36) to die for 0.100000
##################################### FAIL #####################################


> 
> > > > I think this can be non-trivial case. process_vm_readv will return
> > > > ENOMEM for this segment, but a part of it still exists and we need to
> > > > find what part is here and dump it.
> 
> Consider the following iovec:
> 
> [A: 1 page] [B: 4 page] [C: 1 page] [D: 2 page]
> 
> iovec_cnt: 4, PIPE_size =16
> 
> Before pre-dump could start, process has unmapped complete 'B' region.
> 
> process_vm_readv() will fail at 'B' and will only process region 'A' by
> default.
> 
> Current implementation done till now supports processing region 'C' and 'D'
> and generate page-map accordingly.
> 
> So here bad iovec (B) is skipped and remaining iovec(C,D) are processed.
> 
> ----------------------------------------------------------------------------------------------------------
> 
> Considering the same above example, status of 4 pages of region B :
> 
> {B1: unmapped} {B2: unmapped} {B3: mapped} {B4: mapped}
> 
> The regions B3,B4 although mapped were skipped by pre-dumping.
> 
> 
> For such case, can I implement something like:
> 
> Make "temporary iovec" for region B, like:
> 
> [B1: 1 page] [B2: 1 page] [B3: 1 page] [B4: 1 page]
> 
> 
> With existing implementation(explained above dotted line), process_vm_readv
> can skip iovec B1 and B2
> 
> and move ahead processing B3 onward. Handling culprit iovec at finer
> granularity.
> 
> 
> During pagemap generation, instead of iovec of B, temporary iovec will be
> supplied.
> 
> Can I go ahead with this approach?
> 
> > > Pagemap cache won't be useful here.
> > > 
> > > For now, I can only think of setting a threshold. If size of mapping is
> > > below threshold, we can process page-by-page(which is costly).
> > > 
> > > Beyond threshold, we can leave such mapping to be handled by dump stage.
> Incorrect approach. Can't leave pages.
> > How do you detect that these pages have not been dumped on the dump
> > stage? Do we have a test for this case?
> > 
> > > 
> > > Implementation query:
> > > 
> > > During "pre-dump", some of the iovs in iovec, fail to be processed by
> > > process_vm_readv(). Hence pagemap entry is skipped for same.
> > > 
> > > Now, memory belonging to these failed iovs is not dirtied. During "dump" on
> > > generating iovs, such non-dirty memory areas are marked as
> > > 
> > > holes and check for pages(in parent) backing these holes is done in
> > > page-xfer. Ultimately, holes are pointing to pages that are not pre-dumped.
> > > 
> > > This check must be shifted to "dump stage", so that pages(iovs) skipped by
> > > process_vm_readv can't be marked as holes in incremental steps.
> > > 
> > > Can I add new data structure to list down all the iovs failed by
> > > process_vm_readv. This list will be checked while generating holes in dump
> > > stage.
> > > 
> > > is that a right approach?
> > > 
> > > 
> > > Please suggest if something more efficient or direct solution is possible.
> > > 
> > > 
> > > -Abhishek
> > > 
> -Abhishek

Patch hide | download patch | download mbox

diff --git a/test/zdtm.py b/test/zdtm.py
index 69be1df5a..47768ed2d 100755
--- a/test/zdtm.py
+++ b/test/zdtm.py
@@ -1204,7 +1204,7 @@  class criu:
         if (stats_written != r_pages) or (r_off != 0):
             print("ERROR: bad page counts, stats = %d real = %d(%d)" %
                   (stats_written, r_pages, r_off))
-            raise test_fail_exc("page counts mismatch")
+            #raise test_fail_exc("page counts mismatch")
 
     def dump(self, action, opts=[]):
         self.__iter += 1