files: Fail dump if dump_one_file() fails

Submitted by Andy Tucker on April 10, 2018, 7:54 p.m.

Details

Message ID 20180410195411.78805-1-agtucker@google.com
State Accepted
Series "files: Fail dump if dump_one_file() fails"
Commit 3ff6f667854da78686c4272c168d8240d31da3d7
Headers show

Commit Message

Andy Tucker April 10, 2018, 7:54 p.m.
When dumping a process with a large number of open files,
dump_task_files_seized() processes the fds in batches. If
dump_one_file() results in an error, processing of the current batch is
stopped but the next batch (if any) will still be fetched and the error
value is overwritten. The result is a corrupt dump image (the fdinfo
file is missing a bunch of fds) which results in restore failure.

Also close all received fds after an error (previously the skipped ones
were left open).

Signed-off-by: Andy Tucker <agtucker@google.com>
---
 criu/files.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/criu/files.c b/criu/files.c
index 8c83d41f..130a5e8b 100644
--- a/criu/files.c
+++ b/criu/files.c
@@ -618,7 +618,7 @@  int dump_task_files_seized(struct parasite_ctl *ctl, struct pstree_item *item,
 		goto err;
 
 	ret = 0; /* Don't fail if nr_fds == 0 */
-	for (off = 0; off < dfds->nr_fds; off += nr_fds) {
+	for (off = 0; ret == 0 && off < dfds->nr_fds; off += nr_fds) {
 		if (nr_fds + off > dfds->nr_fds)
 			nr_fds = dfds->nr_fds - off;
 
@@ -632,7 +632,6 @@  int dump_task_files_seized(struct parasite_ctl *ctl, struct pstree_item *item,
 
 			ret = dump_one_file(item->pid, dfds->fds[i + off],
 						lfds[i], opts + i, ctl, &e);
-			close(lfds[i]);
 			if (ret)
 				break;
 
@@ -640,6 +639,9 @@  int dump_task_files_seized(struct parasite_ctl *ctl, struct pstree_item *item,
 			if (ret)
 				break;
 		}
+
+		for (i = 0; i < nr_fds; i++)
+			close(lfds[i]);
 	}
 
 	pr_info("----------------------------------------\n");

Comments

Dmitry Safonov April 10, 2018, 10:08 p.m.
+Cc: Mike

:-(

2018-04-10 22:27 GMT+01:00 Patchwork <criupatchwork@gmail.com>:
> == Series Details ==
>
> Series: files: Fail dump if dump_one_file() fails
> URL   : https://patchwork.criu.org/series/2467/
> State : failure
>
> == Logs ==
>
> For more details see: https://travis-ci.org/criupatchwork/criu/builds/364790282?utm_source=github_status&utm_medium=notification
> _______________________________________________
> CRIU mailing list
> CRIU@openvz.org
> https://lists.openvz.org/mailman/listinfo/criu
Dmitry Safonov April 10, 2018, 10:25 p.m.
2018-04-10 20:54 GMT+01:00 Andy Tucker <agtucker@google.com>:
> When dumping a process with a large number of open files,
> dump_task_files_seized() processes the fds in batches. If
> dump_one_file() results in an error, processing of the current batch is
> stopped but the next batch (if any) will still be fetched and the error
> value is overwritten. The result is a corrupt dump image (the fdinfo
> file is missing a bunch of fds) which results in restore failure.
>
> Also close all received fds after an error (previously the skipped ones
> were left open).
>
> Signed-off-by: Andy Tucker <agtucker@google.com>

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>

Really nice catch!
Have you found it with some automatic tool?

+Cc: Cyrill
This looks like a candidate for your stable Openvz branch, imho.

Thanks,
             Dmitry
Andy Tucker April 10, 2018, 11:19 p.m.
On Tue, Apr 10, 2018 at 3:26 PM Dmitry Safonov <0x7f454c46@gmail.com> wrote:

> Really nice catch!
> Have you found it with some automatic tool?
>

No, just playing with CRIU and hit a restore failure where an fd from an
epoll instance was missing. Saw the error in the dump log and worked out
what must have happened.
Cyrill Gorcunov April 11, 2018, 6:53 a.m.
On Tue, Apr 10, 2018 at 11:19:50PM +0000, Andy Tucker wrote:
>    On Tue, Apr 10, 2018 at 3:26 PM Dmitry Safonov <0x7f454c46@gmail.com>
>    wrote:
> 
>      Really nice catch!
>      Have you found it with some automatic tool?
> 
>    No, just playing with CRIU and hit a restore failure where an fd from an
>    epoll instance was missing. Saw the error in the dump log and worked out
>    what must have happened.

Cool! Thanks a lot!

	Cyrill
Mike Rapoport April 11, 2018, 7:28 a.m.
On Tue, Apr 10, 2018 at 11:08:52PM +0100, Dmitry Safonov wrote:
> +Cc: Mike
> 
> :-(

The latest lazy-pages changes uncovered a couple of bugs, you hit one
of them, sorry about that.
 
> 2018-04-10 22:27 GMT+01:00 Patchwork <criupatchwork@gmail.com>:
> > == Series Details ==
> >
> > Series: files: Fail dump if dump_one_file() fails
> > URL   : https://patchwork.criu.org/series/2467/
> > State : failure
> >
> > == Logs ==
> >
> > For more details see: https://travis-ci.org/criupatchwork/criu/builds/364790282?utm_source=github_status&utm_medium=notification
> > _______________________________________________
> > CRIU mailing list
> > CRIU@openvz.org
> > https://lists.openvz.org/mailman/listinfo/criu
> 
> -- 
>              Dmitry
>
Dmitry Safonov April 11, 2018, 4:05 p.m.
2018-04-11 8:28 GMT+01:00 Mike Rapoport <rppt@linux.vnet.ibm.com>:
> On Tue, Apr 10, 2018 at 11:08:52PM +0100, Dmitry Safonov wrote:
>> +Cc: Mike
>>
>> :-(
>
> The latest lazy-pages changes uncovered a couple of bugs, you hit one
> of them, sorry about that.

That's ok - I didn't mean to push you anyhow, Cc'ing just to make sure
those problems are known to you.

Thanks,
             Dmitry
Andrey Vagin April 19, 2018, 6:57 p.m.
Applies, thank a lot!

On Tue, Apr 10, 2018 at 12:54:11PM -0700, Andy Tucker wrote:
> When dumping a process with a large number of open files,
> dump_task_files_seized() processes the fds in batches. If
> dump_one_file() results in an error, processing of the current batch is
> stopped but the next batch (if any) will still be fetched and the error
> value is overwritten. The result is a corrupt dump image (the fdinfo
> file is missing a bunch of fds) which results in restore failure.
> 
> Also close all received fds after an error (previously the skipped ones
> were left open).
> 
> Signed-off-by: Andy Tucker <agtucker@google.com>
> ---
>  criu/files.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/criu/files.c b/criu/files.c
> index 8c83d41f..130a5e8b 100644
> --- a/criu/files.c
> +++ b/criu/files.c
> @@ -618,7 +618,7 @@ int dump_task_files_seized(struct parasite_ctl *ctl, struct pstree_item *item,
>  		goto err;
>  
>  	ret = 0; /* Don't fail if nr_fds == 0 */
> -	for (off = 0; off < dfds->nr_fds; off += nr_fds) {
> +	for (off = 0; ret == 0 && off < dfds->nr_fds; off += nr_fds) {
>  		if (nr_fds + off > dfds->nr_fds)
>  			nr_fds = dfds->nr_fds - off;
>  
> @@ -632,7 +632,6 @@ int dump_task_files_seized(struct parasite_ctl *ctl, struct pstree_item *item,
>  
>  			ret = dump_one_file(item->pid, dfds->fds[i + off],
>  						lfds[i], opts + i, ctl, &e);
> -			close(lfds[i]);
>  			if (ret)
>  				break;
>  
> @@ -640,6 +639,9 @@ int dump_task_files_seized(struct parasite_ctl *ctl, struct pstree_item *item,
>  			if (ret)
>  				break;
>  		}
> +
> +		for (i = 0; i < nr_fds; i++)
> +			close(lfds[i]);
>  	}
>  
>  	pr_info("----------------------------------------\n");
> -- 
> 2.17.0.484.g0c8726318c-goog
> 
> _______________________________________________
> CRIU mailing list
> CRIU@openvz.org
> https://lists.openvz.org/mailman/listinfo/criu