[3/4] restore: Split restore_one_helper() and wait exiting zombie children

Submitted by Kirill Tkhai on Dec. 28, 2017, 9:35 a.m.

Details

Message ID 151445374752.3248.11114452193501629150.stgit@localhost.localdomain
State New
Series "Fix restore of tasks having zombie pgid"

Commit Message

Kirill Tkhai Dec. 28, 2017, 9:35 a.m.
A zombie can also be chosen as the parent of a task helper, like
any other task.

If the task helper exits between restore_finish_stage(CR_STATE_RESTORE)
and zombie_prepare_signals()->SIG_UNBLOCK, the standard criu SIGCHLD
handler is called, and the restore fails:

(00.057762)     41: Error (criu/cr-restore.c:1557): 40 exited, status=0
(00.057815) Error (criu/cr-restore.c:2465): Restoring FAILED.

This patch makes restore_one_zombie() behave like restore_one_helper()
and wait for its children to exit before unblocking SIGCHLD. This makes
us safe against races with exiting children.

See next patch for test details.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 criu/cr-restore.c |   33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)


diff --git a/criu/cr-restore.c b/criu/cr-restore.c
index 5bebc3d61..7545572dc 100644
--- a/criu/cr-restore.c
+++ b/criu/cr-restore.c
@@ -1111,6 +1111,8 @@  static int wait_on_helpers_zombies(void)
 	return 0;
 }
 
+static int wait_exiting_children(void);
+
 static int restore_one_zombie(CoreEntry *core)
 {
 	int exit_code = core->tc->exit_code;
@@ -1129,7 +1131,7 @@  static int restore_one_zombie(CoreEntry *core)
 	prctl(PR_SET_NAME, (long)(void *)core->tc->comm, 0, 0, 0);
 
 	if (task_entries != NULL) {
-		restore_finish_stage(task_entries, CR_STATE_RESTORE);
+		wait_exiting_children();
 		zombie_prepare_signals();
 	}
 
@@ -1233,21 +1235,10 @@  static bool child_death_expected(void)
 	return false;
 }
 
-/*
- * Restore a helper process - artificially created by criu
- * to restore attributes of process tree.
- * - sessions for each leaders are dead
- * - process groups with dead leaders
- * - dead tasks for which /proc/<pid>/... is opened by restoring task
- * - whatnot
- */
-static int restore_one_helper(void)
+static int wait_exiting_children(void)
 {
 	siginfo_t info;
 
-	if (prepare_fds(current))
-		return -1;
-
 	if (!child_death_expected()) {
 		/*
 		 * Restoree has no children that should die, during restore,
@@ -1290,6 +1281,22 @@  static int restore_one_helper(void)
 	return 0;
 }
 
+/*
+ * Restore a helper process - artificially created by criu
+ * to restore attributes of process tree.
+ * - sessions for each leaders are dead
+ * - process groups with dead leaders
+ * - dead tasks for which /proc/<pid>/... is opened by restoring task
+ * - whatnot
+ */
+static int restore_one_helper(void)
+{
+	if (prepare_fds(current))
+		return -1;
+
+	return wait_exiting_children();
+}
+
 static int restore_one_task(int pid, CoreEntry *core)
 {
 	int i, ret;