[RHEL7,COMMIT] ms/netlink: do not enter direct reclaim from netlink_dump()

Submitted by Konstantin Khorenko on May 18, 2020, 7:35 p.m.

Details

Message ID 202005181935.04IJZhoQ022182@finist-ce7.sw.ru
State New
Series "ms/netlink: do not enter direct reclaim from netlink_dump()"

Commit Message

The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.151.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.8.2.vz7.151.1
------>
commit 978945e17507c29ee959262af5ca64cb34ad26d5
Author: Vasily Averin <vvs@virtuozzo.com>
Date:   Mon May 18 22:35:43 2020 +0300

    ms/netlink: do not enter direct reclaim from netlink_dump()
    
        [ Upstream commit d35c99ff77ecb2eb239731b799386f3b3637a31e ]
    
        Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
        allocations.
    
        Due to struct skb_shared_info ~320 bytes overhead, we end up using
        order-3 (on x86) page allocations, that might trigger direct reclaim and
        add stress.
    
        The intent was really to attempt a large allocation but immediately
        fall back to a smaller one (order-1 on x86) in case of memory stress.
    
        On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
        meet the goal. Old kernels would need to remove __GFP_WAIT.
    
        While we are at it, since we do an order-3 allocation, allow using
        all the allocated bytes instead of 16384 to reduce syscalls during
        large dumps.
    
        iproute2 already uses 32KB recvmsg() buffer sizes.
    
        Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384)
    
        Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
        Signed-off-by: Eric Dumazet <edumazet@google.com>
        Reported-by: Alexei Starovoitov <ast@kernel.org>
        Cc: Greg Thelen <gthelen@google.com>
        Reviewed-by: Greg Rose <grose@lightfleet.com>
        Acked-by: Alexei Starovoitov <ast@kernel.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>
        Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    
    [vvs@: taken from stable 3.19]
    https://jira.sw.ru/browse/PSBM-104086
    Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 net/netlink/af_netlink.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
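
Note on the sizes quoted in the commit message: the following is a minimal userspace sketch of the page-order arithmetic, not kernel code. It assumes 4 KiB pages (x86) and takes the ~320-byte struct skb_shared_info overhead from the commit message as a flat constant. The names ex_page_order(), ex_with_overhead(), EX_PAGE_SIZE and EX_SHINFO_BYTES are hypothetical and exist only for this illustration; the real SKB_WITH_OVERHEAD() macro also accounts for alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only: 4 KiB pages (x86) and the
 * ~320-byte struct skb_shared_info overhead quoted in the commit message. */
#define EX_PAGE_SIZE    4096u
#define EX_SHINFO_BYTES 320u

/* Smallest order such that (EX_PAGE_SIZE << order) >= bytes. */
static unsigned int ex_page_order(size_t bytes)
{
	unsigned int order = 0;

	assert(bytes > 0);
	while (((size_t)EX_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}

/* Payload room left in a buffer of 'total' bytes once the shared-info
 * overhead is set aside; a simplified stand-in for SKB_WITH_OVERHEAD(). */
static size_t ex_with_overhead(size_t total)
{
	assert(total > EX_SHINFO_BYTES);
	return total - EX_SHINFO_BYTES;
}
```

Under these assumptions, a 16384-byte payload plus the overhead no longer fits in an order-2 (16 KiB) block and is rounded up to order-3 (32 KiB). Capping max_recvmsg_len at the payload room of a 32768-byte buffer keeps the allocation at order-3 while letting the dump use nearly all of it.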


diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 382141c8a0d71..c36d6c354dfc5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1802,7 +1802,7 @@  static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	/* Record the max length of recvmsg() calls for future allocations */
 	nlk->max_recvmsg_len = max(nlk->max_recvmsg_len, len);
 	nlk->max_recvmsg_len = min_t(size_t, nlk->max_recvmsg_len,
-				     16384);
+				     SKB_WITH_OVERHEAD(32768));
 
 	copied = data_skb->len - skip;
 	if (len < copied) {
@@ -2082,9 +2082,8 @@  static int netlink_dump(struct sock *sk)
 		skb = netlink_alloc_skb(sk,
 					nlk->max_recvmsg_len,
 					nlk->portid,
-					GFP_KERNEL |
-					__GFP_NOWARN |
-					__GFP_NORETRY);
+					(GFP_KERNEL & ~__GFP_WAIT) |
+					__GFP_NOWARN | __GFP_NORETRY);
 		/* available room should be exact amount to avoid MSG_TRUNC */
 		if (skb)
 			skb_reserve(skb, skb_tailroom(skb) -
@@ -2092,7 +2091,7 @@  static int netlink_dump(struct sock *sk)
 	}
 	if (!skb)
 		skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
-					GFP_KERNEL);
+					(GFP_KERNEL & ~__GFP_WAIT));
 	if (!skb)
 		goto errout_skb;
 	netlink_skb_set_owner_r(skb, sk);
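
The flag masking used in both hunks of netlink_dump() can be illustrated with a small userspace sketch. The bit values below are hypothetical and are NOT the kernel's definitions; only the composition of GFP_KERNEL from the WAIT, IO and FS bits follows the 3.10-era headers. The EX_* names and ex_* helpers are invented for this example.

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration, NOT the kernel's definitions. */
#define EX_GFP_WAIT    0x01u /* allocation may sleep and enter direct reclaim */
#define EX_GFP_IO      0x02u
#define EX_GFP_FS      0x04u
#define EX_GFP_NOWARN  0x08u
#define EX_GFP_NORETRY 0x10u
#define EX_GFP_KERNEL  (EX_GFP_WAIT | EX_GFP_IO | EX_GFP_FS)

/* True if an allocation with these flags is allowed to wait for reclaim. */
static bool ex_may_reclaim(unsigned int flags)
{
	return (flags & EX_GFP_WAIT) != 0;
}

/* Flags for the large, opportunistic allocation: clear the WAIT bit,
 * then suppress failure warnings and retries. */
static unsigned int ex_dump_large_flags(void)
{
	return (EX_GFP_KERNEL & ~EX_GFP_WAIT) | EX_GFP_NOWARN | EX_GFP_NORETRY;
}

/* Flags for the smaller fallback allocation as written in this patch. */
static unsigned int ex_dump_fallback_flags(void)
{
	return EX_GFP_KERNEL & ~EX_GFP_WAIT;
}
```

Clearing a single bit with `& ~` leaves the remaining GFP_KERNEL bits in place, so the allocation keeps its other properties and only loses the ability to wait for reclaim.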