[PATCHv2,3/3] fault-inj: Silently dying helper's child

Submitted by Dmitry Safonov on July 18, 2017, 8:05 p.m.

Details

Message ID 20170718200511.24066-4-dsafonov@virtuozzo.com
State New
Series "Fix TASK_HELPER deadlock on futex"

Commit Message

Dmitry Safonov July 18, 2017, 8:05 p.m.
The restorer blob may die silently for any number of reasons:
- Segmentation fault
- OOM killer
- SIGKILL sent by the user
- A child CRIU restorer that didn't abort the futex on the error path (and exited)

We should terminate the restore process instead of locking
ourselves up waiting for a dead restoree.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 criu/cr-restore.c              | 21 ++++++++++++++++++++-
 criu/include/fault-injection.h |  1 +
 test/jenkins/criu-fault.sh     |  1 +
 3 files changed, 22 insertions(+), 1 deletion(-)


diff --git a/criu/cr-restore.c b/criu/cr-restore.c
index f40af1ba08c8..cac439ce8536 100644
--- a/criu/cr-restore.c
+++ b/criu/cr-restore.c
@@ -3704,11 +3704,30 @@  static int sigreturn_restore(pid_t pid, struct task_restore_args *task_args, uns
 		task_args->clone_restore_fn,
 		task_args->thread_args);
 
+	if (fault_injected(FI_HELPER_CHILD_DIE)) {
+		struct task_entries *t = task_args->task_entries;
+		bool must_die = current->parent->pid->state == TASK_HELPER;
+
+		if (must_die)
+			pr_info("fault-injected: restorer %d will die\n", pid);
+
+		/*
+		 * Restorer dies only when all helpers did current stage:
+		 * Begin: nr_in_progress = nr_tasks + nr_helpers
+		 * Exit on: nr_in_progress = nr_tasks
+		 */
+		futex_wait_while_gt(&t->nr_in_progress, t->nr_tasks);
+
+		if (must_die) {
+			pr_info("fault-injected: %d exiting\n", pid);
+			exit(1);
+		}
+	}
+
 	/*
 	 * An indirect call to task_restore, note it never returns
 	 * and restoring core is extremely destructive.
 	 */
-
 	JUMP_TO_RESTORER_BLOB(new_sp, restore_task_exec_start, task_args);
 
 err:
diff --git a/criu/include/fault-injection.h b/criu/include/fault-injection.h
index 46a5f71b031c..0da6bf8731c3 100644
--- a/criu/include/fault-injection.h
+++ b/criu/include/fault-injection.h
@@ -10,6 +10,7 @@  enum faults {
 	FI_RESTORE_OPEN_LINK_REMAP,
 	FI_PARASITE_CONNECT,
 	FI_POST_RESTORE,
+	FI_HELPER_CHILD_DIE,
 	/* not fatal */
 	FI_VDSO_TRAMPOLINES = 127,
 	FI_CHECK_OPEN_HANDLE = 128,
diff --git a/test/jenkins/criu-fault.sh b/test/jenkins/criu-fault.sh
index b7879116dc29..fbdf9b34ff03 100755
--- a/test/jenkins/criu-fault.sh
+++ b/test/jenkins/criu-fault.sh
@@ -21,3 +21,4 @@  prep
 ./test/zdtm.py run -t zdtm/static/env00 --fault 5 --keep-going --report report || fail
 ./test/zdtm.py run -t zdtm/static/maps04 --fault 131 --keep-going --report report --pre 2:1 || fail
 ./test/zdtm.py run -t zdtm/transition/maps008 --fault 131 --keep-going --report report --pre 2:1 || fail
+./test/zdtm.py run -t zdtm/static/session01 --fault 7 -f ns || fail