parasite: handle errors while a transport socket is being created

Submitted by Andrei Vagin on July 28, 2016, 6:41 p.m.

Details

Message ID 1469731295-16443-1-git-send-email-avagin@openvz.org
State Rejected
Series "parasite: handle errors while a transport socket is being created"
Headers show

Commit Message

Andrei Vagin July 28, 2016, 6:41 p.m.
From: Andrew Vagin <avagin@virtuozzo.com>

Currently if socket() or connect() syscall-s failed, parasite cures itself,
but criu has not got any signals and waits on accept().

This patch adds a futex to synchronize parasite and criu. The server socket
is created with SOCK_NONBLOCK and waits on the futex when a parasite
connects to it, only then criu calls accept() and it returns immediately.

Reported-by: Yohei Kamitsukasa <uhoidx@gmail.com>
Cc: Yohei Kamitsukasa <uhoidx@gmail.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
---
 criu/include/parasite.h | 3 +++
 criu/net.c              | 2 +-
 criu/parasite-syscall.c | 9 +++++++++
 criu/pie/parasite.c     | 5 ++++-
 4 files changed, 17 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/criu/include/parasite.h b/criu/include/parasite.h
index 82f4c3f..d2a5458 100644
--- a/criu/include/parasite.h
+++ b/criu/include/parasite.h
@@ -15,6 +15,7 @@ 
 
 #include "image.h"
 #include "util-pie.h"
+#include "lock.h"
 
 #include "images/vma.pb-c.h"
 #include "images/tty.pb-c.h"
@@ -77,6 +78,8 @@  struct parasite_init_args {
 	u64			sigreturn_addr;
 
 	u64			sigframe; /* pointer to sigframe */
+
+	futex_t			sync;
 };
 
 struct parasite_unmap_args {
diff --git a/criu/net.c b/criu/net.c
index a03c168..8593c8a 100644
--- a/criu/net.c
+++ b/criu/net.c
@@ -1678,7 +1678,7 @@  static int prep_ns_sockets(struct ns_id *ns, bool for_dump)
 	} else
 		ns->net.nlsk = -1;
 
-	ret = ns->net.seqsk = socket(PF_UNIX, SOCK_SEQPACKET, 0);
+	ret = ns->net.seqsk = socket(PF_UNIX, SOCK_SEQPACKET | SOCK_NONBLOCK, 0);
 	if (ret < 0) {
 		pr_perror("Can't create seqsk for parasite");
 		goto err_sq;
diff --git a/criu/parasite-syscall.c b/criu/parasite-syscall.c
index 824037c..2e3e150 100644
--- a/criu/parasite-syscall.c
+++ b/criu/parasite-syscall.c
@@ -515,6 +515,8 @@  static int parasite_init_daemon(struct parasite_ctl *ctl, struct ns_id *net)
 	args->sigframe = (uintptr_t)ctl->rsigframe;
 	args->log_level = log_get_loglevel();
 
+	futex_set(&args->sync, 0);
+
 	if (prepare_tsock(ctl, pid, args, net))
 		goto err;
 
@@ -526,6 +528,13 @@  static int parasite_init_daemon(struct parasite_ctl *ctl, struct ns_id *net)
 	if (parasite_run(pid, PTRACE_CONT, ctl->parasite_ip, ctl->rstack, &regs, &ctl->orig))
 		goto err;
 
+	futex_wait_while_eq(&args->sync, 0);
+	if (futex_get(&args->sync) != 1) {
+		errno = -(int)futex_get(&args->sync);
+		pr_perror("Unable to connect a transport socket");
+		goto err;
+	}
+
 	if (accept_tsock(ctl) < 0)
 		goto err;
 
diff --git a/criu/pie/parasite.c b/criu/pie/parasite.c
index b0dbafa..87a176f 100644
--- a/criu/pie/parasite.c
+++ b/criu/pie/parasite.c
@@ -737,7 +737,7 @@  static noinline __used int parasite_init_daemon(void *data)
 	args->sigreturn_addr = (u64)(uintptr_t)fini_sigreturn;
 	sigframe = (void*)(uintptr_t)args->sigframe;
 
-	tsock = sys_socket(PF_UNIX, SOCK_SEQPACKET, 0);
+	ret = tsock = sys_socket(PF_UNIX, SOCK_SEQPACKET, 0);
 	if (tsock < 0) {
 		pr_err("Can't create socket: %d\n", tsock);
 		goto err;
@@ -749,6 +749,8 @@  static noinline __used int parasite_init_daemon(void *data)
 		goto err;
 	}
 
+	futex_set_and_wake(&args->sync, 1);
+
 	ret = recv_fd(tsock);
 	if (ret >= 0) {
 		log_set_fd(ret);
@@ -760,6 +762,7 @@  static noinline __used int parasite_init_daemon(void *data)
 	parasite_daemon(data);
 
 err:
+	futex_set_and_wake(&args->sync, ret);
 	fini();
 	BUG();
 

Comments

Dmitry Safonov July 28, 2016, 7:08 p.m.
On 07/28/2016 09:41 PM, Andrey Vagin wrote:
> From: Andrew Vagin <avagin@virtuozzo.com>
>
> Currently if socket() or connect() syscall-s failed, parasite cures itself,
> but criu has not got any signals and waits on accept().
>
> This patch adds a futex to synchronize parasite and criu. The server socket
> is created with SOCK_NONBLOCK and waits on the futex when a parasite
> connects to it, only then criu calls accept() and it returns immediately.
>
> Reported-by: Yohei Kamitsukasa <uhoidx@gmail.com>
> Cc: Yohei Kamitsukasa <uhoidx@gmail.com>
> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
> ---

Heh, and I've cooked something alike with alarm() and syscall's retval
with syscall's number through parasite_init_args.
But I think this way with futex is prettier.

Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cyrill Gorcunov July 28, 2016, 7:14 p.m.
On Thu, Jul 28, 2016 at 09:41:35PM +0300, Andrey Vagin wrote:
> From: Andrew Vagin <avagin@virtuozzo.com>
> 
> Currently if socket() or connect() syscall-s failed, parasite cures itself,
> but criu has not got any signals and waits on accept().
> 
> This patch adds a futex to synchronize parasite and criu. The server socket
> is created with SOCK_NONBLOCK and waits on the futex when a parasite
> connects to it, only then criu calls accept() and it returns immediately.
> 
> Reported-by: Yohei Kamitsukasa <uhoidx@gmail.com>
> Cc: Yohei Kamitsukasa <uhoidx@gmail.com>
> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Pavel Emelianov Aug. 4, 2016, 1:43 p.m.
s/sync/daemon_connected/ and applied, thanks.

BTW, what might be the reason for socket()/connect() failure other than ENOMEM in our case?
Pavel Emelianov Aug. 5, 2016, 9:43 a.m.
Oh, by the way -- please, mind writing a fault injection test for this ;)
Andrey Vagin Aug. 5, 2016, 4:07 p.m.
On Fri, Aug 05, 2016 at 12:43:39PM +0300, Pavel Emelyanov wrote:
> Oh, by the way -- please, mind writing a fault injection test for this ;)

Sure