[Devel,rh7,2/3] net: core: use atomic high-order allocations

Submitted by Anatoly Stepanov on Oct. 21, 2016, 11:36 a.m.

Details

Message ID 1477049761-177906-3-git-send-email-astepanov@cloudlinux.com
State New
Series "net: core: optimize high-order allocations"

Commit Message

Anatoly Stepanov Oct. 21, 2016, 11:36 a.m.
As we detected intensive direct reclaim activity in sk_page_frag_refill(),
it's reasonable to prevent it from trying so hard to allocate high-order
blocks, and only do so when it's effortless.

This is a port of an upstream (vanilla) change.
Original commit: fb05e7a89f500cfc06ae277bdc911b281928995d

We saw excessive direct memory compaction triggered by skb_page_frag_refill().
This causes performance issues and adds latency. Commit 5640f7685831e0
introduced the order-3 allocation. According to its changelog, the order-3
allocation isn't a must-have but a performance improvement. Direct memory
compaction, however, has high overhead, and the benefit of the order-3
allocation can't compensate for it.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the allocation will still succeed, so
we don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered and skb_page_frag_refill()
will fall back to order-0 immediately, so the direct memory compaction
overhead is avoided. In the allocation-failure case, kswapd is woken up and
does compaction, so chances are the allocation will succeed next time.

alloc_skb_with_frags() is handled the same way.

The Mellanox driver does a similar thing; if this is accepted, we must fix
that driver too.
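
To make the intended behaviour concrete, here is a minimal kernel-style
sketch of the resulting allocation strategy. It is a simplified, hypothetical
helper (frag_alloc_page() does not exist in the tree), not the literal
sk_page_frag_refill() body; it only uses flags and calls that are real in
this kernel (alloc_pages(), SKB_FRAG_PAGE_ORDER, __GFP_WAIT, __GFP_COMP,
__GFP_NOWARN, __GFP_NORETRY):

/*
 * Hypothetical helper: try the high-order allocation without __GFP_WAIT,
 * so neither direct reclaim nor direct compaction can be entered, then
 * fall back to order-0 with the caller's original flags.
 */
static struct page *frag_alloc_page(gfp_t sk_allocation, unsigned int *order)
{
	gfp_t gfp = sk_allocation;
	struct page *page;

	*order = SKB_FRAG_PAGE_ORDER;		/* order-3 (32KB) with 4K pages */
	gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
	gfp &= ~__GFP_WAIT;			/* opportunistic attempt only */

	page = alloc_pages(gfp, *order);
	if (page)
		return page;

	/*
	 * The cheap high-order attempt failed; kswapd has been woken and may
	 * compact in the background for next time.  Retry at order-0 with the
	 * caller's unmodified flags (this may sleep if the caller allows it).
	 */
	*order = 0;
	return alloc_pages(sk_allocation, *order);
}

With a typical sk->sk_allocation of GFP_KERNEL, the high-order attempt thus
runs with (GFP_KERNEL & ~__GFP_WAIT) | __GFP_COMP | __GFP_NOWARN |
__GFP_NORETRY: it may still wake kswapd, but it never performs direct
reclaim or direct compaction itself.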

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: used davem's backport to 3.14 ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Anatoly Stepanov <astepanov@cloudlinux.com>
---
 net/core/sock.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)


diff --git a/net/core/sock.c b/net/core/sock.c
index a94e1d0..763bd5d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1816,7 +1816,7 @@  struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 
 			while (order) {
 				if (npages >= 1 << order) {
-					page = alloc_pages(sk->sk_allocation |
+					page = alloc_pages((sk->sk_allocation & ~__GFP_WAIT)|
 							   __GFP_COMP |
 							   __GFP_NOWARN |
 							   __GFP_NORETRY,
@@ -1874,14 +1874,15 @@  bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 		put_page(pfrag->page);
 	}
 
-	/* We restrict high order allocations to users that can afford to wait */
-	order = (sk->sk_allocation & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
+	order = SKB_FRAG_PAGE_ORDER;
 
 	do {
 		gfp_t gfp = sk->sk_allocation;
 
-		if (order)
+		if (order) {
 			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
+			gfp &= ~__GFP_WAIT;
+		}
 		pfrag->page = alloc_pages(gfp, order);
 		if (likely(pfrag->page)) {
 			pfrag->offset = 0;

Comments

Debabrata Banerjee Oct. 24, 2016, 5:17 p.m.
On Fri, Oct 21, 2016 at 7:36 AM, Anatoly Stepanov
<astepanov@cloudlinux.com> wrote:
> As we detected intensive direct reclaim activity in sk_page_frag_refill(),
> it's reasonable to prevent it from trying so hard to allocate high-order
> blocks, and only do so when it's effortless.
>
> This is a port of an upstream (vanilla) change.
> Original commit: fb05e7a89f500cfc06ae277bdc911b281928995d
>
> We saw excessive direct memory compaction triggered by skb_page_frag_refill().
> This causes performance issues and adds latency. Commit 5640f7685831e0
> introduced the order-3 allocation. According to its changelog, the order-3
> allocation isn't a must-have but a performance improvement. Direct memory
> compaction, however, has high overhead, and the benefit of the order-3
> allocation can't compensate for it.
>
> This patch makes the order-3 page allocation atomic. If there is no memory
> pressure and memory isn't fragmented, the allocation will still succeed, so
> we don't sacrifice the order-3 benefit here. If the atomic allocation fails,
> direct memory compaction will not be triggered and skb_page_frag_refill()
> will fall back to order-0 immediately, so the direct memory compaction
> overhead is avoided. In the allocation-failure case, kswapd is woken up and
> does compaction, so chances are the allocation will succeed next time.
>

So while you do avoid direct reclaim, you can still wake up non-direct
reclaim for callers that can wait, and this can happen very, very often.
We've had the allocation order forced to 0 to work around the problem
since the change was introduced; it doesn't really have much impact for
us, since it's almost impossible to satisfy order-3 allocations en masse
on one of our real-world machines. So if it ever goes through the
__GFP_WAIT path, compact/reclaim cycles happen indirectly and still take
up CPU cycles, without making enough forward progress for it to matter.
This seems to me to be an intractable problem, unless we find a way to
make sure all kernel pages are movable/compactable.

-Deb
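
For reference, the workaround Deb mentions (forcing the order to 0 so this
path never attempts high-order pages) would amount to something like the
following in sk_page_frag_refill(); a hypothetical illustration, not a patch
posted in this thread:

	/*
	 * Hypothetical local change in net/core/sock.c: never try high-order
	 * frags, so compaction (direct or kswapd-driven) is never needed on
	 * behalf of this path.
	 */
	order = 0;	/* instead of: order = SKB_FRAG_PAGE_ORDER; */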