page-server: Allow blocking on socket

Submitted by Pavel Emelianov on Jan. 9, 2017, 10:59 a.m.


State Accepted
Series "page-server: Allow blocking on pipe"

Commit Message

On 01/09/2017 11:41 AM, Pavel Emelyanov wrote:
> On 01/02/2017 10:30 PM, Andrei Vagin wrote:
>> On Mon, Dec 19, 2016 at 01:13:51PM +0300, Pavel Emelyanov wrote:
>>> This splice tries to get pages from socket into local pipe to
>>> splice them into images later. The data on the socket may not
>>> be there by the time we get to this splice, so there's no reason
>>> to force non-blocking IO here.
>> This SPLICE_F_NONBLOCK isn't about data on the socket. We don't set
>> SOCK_NONBLOCK, so this splice waits for data on the socket even with
>> ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
>> ...
>>         timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
> True, but when the socket is an AF_UNIX one, the issue is the other way around:
>         if (sock->file->f_flags & O_NONBLOCK ||
>             flags & SPLICE_F_NONBLOCK)
>                 state.flags = MSG_DONTWAIT;
> so getting data from an empty UNIX socket (it can be empty simply because no
> data other than the header has arrived yet) results in EAGAIN.

OK, with the patch below we can revert the original one :)

When splicing page server data from a UNIX socket we may get
an error (EAGAIN) from splice if no data is available on the
socket yet. This is because the SPLICE_F_NONBLOCK flag is
checked by af_unix.c in the kernel to decide whether or
not to do a blocking read.

This is not symmetrical with TCP sockets, which check only
the socket's O_NONBLOCK flag for the same decision.

Dropping the SPLICE_F_NONBLOCK flag is not possible either, as
otherwise we'll block on the pipe when trying to put data
into it. Even if part of the data fits into it, the kernel would
block anyway until the full buffer is in. And there will be
no read() from the pipe, as it should happen one step later
in the same task.

So to untie this, we need to wait for the data explicitly
with poll().

Signed-off-by: Pavel Emelyanov <>
 criu/include/util.h |  7 +++++++
 criu/page-xfer.c    | 13 ++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)
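
For illustration, the behaviour described in the commit message can be
reproduced outside CRIU with a small standalone program. This is only a
sketch, assuming Linux with a kernel that implements splice for AF_UNIX
stream sockets; wait_data() and splice_demo() are names made up for this
example and are not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same idea as sk_wait_data() in the patch: sleep until sk is readable. */
static void wait_data(int sk)
{
	struct pollfd pfd = { .fd = sk, .events = POLLIN };

	poll(&pfd, 1, -1);
}

/* Hypothetical demo; returns 0 if the behaviour matches the commit message. */
int splice_demo(void)
{
	const unsigned int fl = SPLICE_F_MOVE | SPLICE_F_NONBLOCK;
	char buf[8] = "";
	int sv[2], p[2];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
		return -1;

	/* Nothing sent yet: the non-blocking splice fails instead of waiting. */
	n = splice(sv[0], NULL, p[1], NULL, 4096, fl);
	assert(n == -1 && errno == EAGAIN);

	/* The peer sends some data; poll() then returns at once. */
	n = write(sv[1], "pages", 5);
	assert(n == 5);
	wait_data(sv[0]);

	/* Same flags as before, but now the data is there to be moved. */
	n = splice(sv[0], NULL, p[1], NULL, 4096, fl);
	assert(n == 5);

	/* The bytes are sitting in the pipe, ready for the next step. */
	n = read(p[0], buf, sizeof(buf) - 1);
	assert(n == 5 && strcmp(buf, "pages") == 0);

	close(sv[0]);
	close(sv[1]);
	close(p[0]);
	close(p[1]);
	return 0;
}
```

The pipe side still gets SPLICE_F_NONBLOCK, so a full pipe cannot stall the
task; only the wait for socket data is moved into poll().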


diff --git a/criu/include/util.h b/criu/include/util.h
index 1fa0742..22c9c4d 100644
--- a/criu/include/util.h
+++ b/criu/include/util.h
@@ -12,6 +12,7 @@ 
 #include <sys/statfs.h>
 #include <sys/sysmacros.h>
 #include <dirent.h>
+#include <poll.h>
 #include "int.h"
 #include "common/compiler.h"
@@ -263,6 +264,12 @@  int fd_has_data(int lfd);
 int make_yard(char *path);
+static inline void sk_wait_data(int sk)
+{
+	struct pollfd pfd = {sk, POLLIN, 0};
+	poll(&pfd, 1, -1);
+}
 void tcp_nodelay(int sk, bool on);
 void tcp_cork(int sk, bool on);
diff --git a/criu/page-xfer.c b/criu/page-xfer.c
index 39c6977..73173bd 100644
--- a/criu/page-xfer.c
+++ b/criu/page-xfer.c
@@ -596,7 +596,18 @@  static int page_server_add(int sk, struct page_server_iov *pi, u32 flags)
 		if (chunk > cxfer.pipe_size)
 			chunk = cxfer.pipe_size;
-		chunk = splice(sk, NULL, cxfer.p[1], NULL, chunk, SPLICE_F_MOVE);
+		/*
+		 * Splicing into a pipe may end up blocking if the pipe is "full",
+		 * and we need the SPLICE_F_NONBLOCK flag here. At the same time
+		 * splicing from a UNIX socket with this flag aborts splice with
+		 * EAGAIN if there's no data in it (TCP looks at the socket
+		 * O_NONBLOCK flag _only_ and waits for data), so before doing
+		 * the non-blocking splice we need to explicitly wait.
+		 */
+		sk_wait_data(sk);
+		chunk = splice(sk, NULL, cxfer.p[1], NULL, chunk, SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
 		if (chunk < 0) {
 			pr_perror("Can't read from socket");
 			return -1;
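
Note that sk_wait_data() as posted ignores the return value of poll(), so an
interrupted call (EINTR) or an invalid descriptor falls straight through to
splice(). Below is a hedged sketch of a stricter helper. The name
sk_wait_data_checked() and its return convention are hypothetical and not
part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical variant of sk_wait_data(): returns 0 once sk is readable
 * or the peer has hung up (so a following read sees data or EOF), and -1
 * if poll() fails or reports an error condition on the descriptor.
 */
static int sk_wait_data_checked(int sk)
{
	struct pollfd pfd = { .fd = sk, .events = POLLIN };
	int ret;

	/* Restart the wait if a signal interrupts it. */
	do {
		ret = poll(&pfd, 1, -1);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;

	/* POLLNVAL: fd is not open; POLLERR: error pending on the socket. */
	if (pfd.revents & (POLLNVAL | POLLERR))
		return -1;

	return (pfd.revents & (POLLIN | POLLHUP)) ? 0 : -1;
}
```

A caller would check the result before issuing the non-blocking splice and
bail out with an error message on -1.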