[RHEL8,COMMIT] netlink: allow to set peeking offset for sockets

Submitted by Konstantin Khorenko on March 10, 2020, 3:01 p.m.

Details

Message ID 202003101501.02AF1Fdm024510@finist_co8.work.ct
State New
Series "fixes to VZ8 required for criu"
Commit Message

Konstantin Khorenko March 10, 2020, 3:01 p.m.
The commit is pushed to "branch-rh8-4.18.0-80.1.2.vz8.3.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-80.1.2.vz8.3.2
------>
commit 769b6aa27ca21903424a9448b272210bdadeb3e1
Author: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Date:   Tue Mar 10 18:01:14 2020 +0300

    netlink: allow to set peeking offset for sockets
    
    Patchset description:
    
    netlink: prepare to dump and restore data from a receive queue
    
    CRIU can dump queued data for unix and tcp sockets,
    now it's time for netlink sockets.
    
    Here are three questions.
    * How to dump data from a receive queue
      We can set peeking offset like we do for unix sockets.
    
    * How to restore data back to a receive queue
      I suggest adding a repair mode like we do for tcp sockets.
    
    * When can we dump data from a receive queue?
      I think we can do this only if a socket doesn't have a running callback.
    
      Andrey Vagin (3):
        netlink: allow to set peeking offset for sockets
        netlink: add an ability to restore messages in a receive queue
        netlink/diag: report flags for netlink sockets
    
    https://jira.sw.ru/browse/PSBM-28386
    
    khorenko@: there is no locking right now, but as long as we are the
    only user of this interface this is not essential; we'll add
    locking on top later in the scope of:
    
    https://jira.sw.ru/browse/PSBM-48484
    
    ===========================================================
    This patch description:
    
    This allows us to read a socket's queue without removing skbs from it.
    
    The same logic was implemented for unix and inet sockets and we use this
    to dump and restore sockets in CRIU.
    
    An open question is whether sk_peek_off has to be protected by locks.
    Currently it isn't protected, so a user of sk_peek_off has to make
    sure that nobody else calls recvmsg on the socket.
    
    https://jira.sw.ru/browse/PSBM-28386
    
    Signed-off-by: Andrey Vagin <avagin@virtuozzo.com>
    Reviewed-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
    
    ===========================================================
    
    netlink: Don't manipulate @sk_peek_off if data fetching failed
    
    When skb_copy_datagram_iovec is called to fetch queued data
    it may fail with EFAULT, and if MSG_PEEK was set by the caller
    the position gets advanced even though no data has been read.
    So we might lose data on subsequent recvmsg calls.
    Instead, let's exit early with an error.
    
    For the sake of https://jira.sw.ru/browse/PSBM-57921
    
    Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
    Acked-by: Andrey Vagin <avagin@virtuozzo.com>
    
    https://jira.sw.ru/browse/PSBM-101289
    vz7 commit: 081614621eb6e ("netlink: allow to set peeking offset for sockets")
    
    Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 net/netlink/af_netlink.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

Patch

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ea497a739298..15d1f2b1e339 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1943,17 +1943,18 @@  static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	struct scm_cookie scm;
 	struct sock *sk = sock->sk;
 	struct netlink_sock *nlk = nlk_sk(sk);
-	int noblock = flags&MSG_DONTWAIT;
 	size_t copied;
 	struct sk_buff *skb, *data_skb;
+	int peeked, skip;
 	int err, ret;
 
 	if (flags&MSG_OOB)
 		return -EOPNOTSUPP;
 
 	copied = 0;
+	skip = sk_peek_offset(sk, flags);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = __skb_recv_datagram(sk, flags, NULL, &peeked, &skip, &err);
 	if (skb == NULL)
 		goto out;
 
@@ -1981,14 +1982,20 @@  static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	nlk->max_recvmsg_len = min_t(size_t, nlk->max_recvmsg_len,
 				     SKB_WITH_OVERHEAD(32768));
 
-	copied = data_skb->len;
+	copied = data_skb->len - skip;
 	if (len < copied) {
 		msg->msg_flags |= MSG_TRUNC;
 		copied = len;
 	}
 
 	skb_reset_transport_header(data_skb);
-	err = skb_copy_datagram_msg(data_skb, 0, msg, copied);
+	err = skb_copy_datagram_msg(data_skb, skip, msg, copied);
+	if (!err) {
+		if (flags & MSG_PEEK)
+			sk_peek_offset_fwd(sk, copied);
+		else
+			sk_peek_offset_bwd(sk, skb->len);
+	}
 
 	if (msg->msg_name) {
 		DECLARE_SOCKADDR(struct sockaddr_nl *, addr, msg->msg_name);
@@ -2007,7 +2014,7 @@  static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	memset(&scm, 0, sizeof(scm));
 	scm.creds = *NETLINK_CREDS(skb);
 	if (flags & MSG_TRUNC)
-		copied = data_skb->len;
+		copied = data_skb->len - skip;
 
 	skb_free_datagram(sk, skb);
 
@@ -2681,6 +2688,13 @@  int netlink_unregister_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(netlink_unregister_notifier);
 
+static int netlink_set_peek_off(struct sock *sk, int val)
+{
+	sk->sk_peek_off = val;
+
+	return 0;
+}
+
 static const struct proto_ops netlink_ops = {
 	.family =	PF_NETLINK,
 	.owner =	THIS_MODULE,
@@ -2700,6 +2714,7 @@  static const struct proto_ops netlink_ops = {
 	.recvmsg =	netlink_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
+	.set_peek_off = netlink_set_peek_off,
 };
 
 static const struct net_proto_family netlink_family_ops = {