[v8,10/10] usernsd: print additional debuging due to silent fails of userns_call

Submitted by Pavel Tikhomirov on Feb. 22, 2018, 11:49 a.m.

Details

Message ID 20180222114928.24343-11-ptikhomirov@virtuozzo.com
State Accepted
Series "don't use wrong pagemap (from other task) on pid reuse"
Headers show

Commit Message

Pavel Tikhomirov Feb. 22, 2018, 11:49 a.m.
We have these fail from prep_usernsd_transport, here is log (with only
fitst hunk applied):

Patch hide | download patch | download mbox

========================= Run zdtm/static/env00 in ns ==========================
Start test
./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
Run criu dump
Run criu restore
Fail RESTORE failed: Error(0): Unknown
=[log]=> dump/zdtm/static/env00/85/1/restore.log
------------------------ grep Error ------------------------
(00.022991) PID: real 106 virt 1
(00.140293) Running setup-namespaces scripts
(00.140393)      1: cg: Cgroups 1 inherited from parent
(00.140413)      1: uns: calling usernsd_recv_transport (0, 0)
(00.140460)      1: Error (criu/cr-restore.c:1870): Failed to prepare usernsd transport
(00.140500) uns: daemon calls 0x48dd30 (106, 1, 0)
(00.140538) Error (criu/cr-restore.c:2329): Failed to switch restore stage to CR_STATE_PREPARE_NAMESPACES
(00.170506) Error (criu/mount.c:3175): mnt: Can't remove the directory /tmp/.criu.mntns.W2YHjG: No such file or directory
(00.170530) uns: calling exit_usernsd (-1, 1)
(00.170589) uns: daemon calls 0x48dd00 (104, -1, 1)
(00.170600) uns: `- daemon exits w/ 0
(00.175813) uns: daemon stopped
(00.175827) Error (criu/cr-restore.c:2539): Restoring FAILED.
------------------------ ERROR OVER ------------------------

https://travis-ci.org/Snorch/criu/jobs/344237216

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 criu/cr-restore.c | 4 +++-
 criu/namespaces.c | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/criu/cr-restore.c b/criu/cr-restore.c
index 8c5d067f0..db913b2da 100644
--- a/criu/cr-restore.c
+++ b/criu/cr-restore.c
@@ -1860,8 +1860,10 @@  static int restore_task_with_children(void *_arg)
 
 	/* Restore root task */
 	if (current->parent == NULL) {
-		if (prep_usernsd_transport())
+		if (prep_usernsd_transport()) {
+			pr_err("Failed to prepare usernsd transport\n");
 			goto err;
+		}
 
 		if (join_namespaces()) {
 			pr_perror("Join namespaces failed");
diff --git a/criu/namespaces.c b/criu/namespaces.c
index c393ccea4..c6c785d98 100644
--- a/criu/namespaces.c
+++ b/criu/namespaces.c
@@ -1716,6 +1716,7 @@  int __userns_call(const char *func_name, uns_call_t call, int flags,
 	int ret, res, sk;
 	bool async = flags & UNS_ASYNC;
 	struct unsc_msg um;
+	uns_call_t rcall;
 
 	if (unlikely(arg_size > MAX_UNSFD_MSG_SIZE)) {
 		pr_err("uns: message size exceeded\n");
@@ -1726,7 +1727,7 @@  int __userns_call(const char *func_name, uns_call_t call, int flags,
 		return call(arg, fd, getpid());
 
 	sk = get_service_fd(USERNSD_SK);
-	pr_debug("uns: calling %s (%d, %x)\n", func_name, fd, flags);
+	pr_debug("uns: calling %s[%p] (%d, %x)\n", func_name, call, fd, flags);
 
 	if (!async)
 		/*
@@ -1762,7 +1763,7 @@  int __userns_call(const char *func_name, uns_call_t call, int flags,
 
 	/* Get the response back */
 
-	unsc_msg_init(&um, &call, &res, NULL, 0, 0);
+	unsc_msg_init(&um, &rcall, &res, NULL, 0, 0);
 	ret = recvmsg(sk, &um.h, 0);
 	if (ret <= 0) {
 		pr_perror("uns: recv resp error");
@@ -1776,6 +1777,9 @@  int __userns_call(const char *func_name, uns_call_t call, int flags,
 		unsc_msg_pid_fd(&um, NULL, &ret);
 	else
 		ret = res;
+
+	pr_debug("uns: call %s[%p] returned %d\n", func_name, rcall, ret);
+	BUG_ON(rcall != call);
 out:
 	if (!async)
 		mutex_unlock(&task_entries->userns_sync_lock);