[rh7] ms/net: don't wait for order-3 page allocation

Submitted by Andrey Ryabinin on Feb. 13, 2018, 2:21 p.m.

Details

Message ID 20180213142145.25613-1-aryabinin@virtuozzo.com
State New
Series "ms/net: don't wait for order-3 page allocation"

Commit Message

Andrey Ryabinin Feb. 13, 2018, 2:21 p.m.
From: Shaohua Li <shli@fb.com>

We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and adds latency. Commit 5640f7685831e0
introduced the order-3 allocation. According to its changelog, the order-3
allocation isn't a must-have; it is only there to improve performance. But
direct memory compaction has high overhead, and the benefit of the order-3
allocation can't compensate for that overhead.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the allocation will still succeed, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered and skb_page_frag_refill will
fall back to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is woken up and does
compaction, so chances are the allocation will succeed next time.

alloc_skb_with_frags is the same.

The mellanox driver does a similar thing; if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

https://jira.sw.ru/browse/PSBM-81488
(cherry picked from commit fb05e7a89f500cfc06ae277bdc911b281928995d)
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
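
The fallback behaviour described in the commit message can be sketched as a
small userspace model. This is only an illustration under stated assumptions,
not the kernel code: the MODEL_* flag values, struct model_mm,
model_alloc_pages() and model_frag_refill() are all hypothetical names, and the
model assumes that direct compaction always manages to produce a high-order
block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; NOT the kernel's real values. */
#define MODEL_GFP_WAIT    0x1u  /* caller may sleep, so direct compaction is allowed */
#define MODEL_GFP_NORETRY 0x2u  /* do not loop in the allocator */

#define MODEL_HIGH_ORDER 3u     /* the order-3 attempt discussed in the changelog */

/* Hypothetical memory state plus counters for what each attempt costs. */
struct model_mm {
	bool fragmented;             /* true: no free high-order block right now */
	unsigned direct_compactions; /* compaction done in the caller's context */
	unsigned kswapd_wakeups;     /* background work requested */
};

/* Returns true if an allocation of 2^order pages succeeds in this model. */
static bool model_alloc_pages(struct model_mm *mm, unsigned gfp, unsigned order)
{
	assert(order <= MODEL_HIGH_ORDER);

	if (order == 0 || !mm->fragmented)
		return true;            /* fast path: nothing expensive happens */

	mm->kswapd_wakeups++;           /* background compaction is requested */

	if (gfp & MODEL_GFP_WAIT) {
		/* Expensive path: the caller compacts memory itself. */
		mm->direct_compactions++;
		return true;            /* assumption: compaction succeeds */
	}
	return false;                   /* atomic attempt fails immediately */
}

/*
 * Returns the order that was obtained, or -1 on failure.
 * patched == true models the behaviour after this change: the high-order
 * attempt is made without the "may wait" bit.
 */
static int model_frag_refill(struct model_mm *mm, unsigned gfp, bool patched)
{
	unsigned high_gfp = patched ? (gfp & ~MODEL_GFP_WAIT) : gfp;

	if (model_alloc_pages(mm, high_gfp | MODEL_GFP_NORETRY, MODEL_HIGH_ORDER))
		return (int)MODEL_HIGH_ORDER;

	/* Fall back to a single page with the caller's original flags. */
	return model_alloc_pages(mm, gfp, 0) ? 0 : -1;
}
```

In this model, a fragmented system with patched == true ends up with an order-0
page, zero direct compactions and one kswapd wakeup, while patched == false
pays for one direct compaction to get the order-3 page.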


diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4147d7219589..dca9d1d9676b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4360,7 +4360,7 @@  struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
 
 		while (order) {
 			if (npages >= 1 << order) {
-				page = alloc_pages(gfp_mask |
+				page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
 						   __GFP_COMP |
 						   __GFP_NOWARN |
 						   __GFP_NORETRY,

Comments

Alexey Kuznetsov Feb. 13, 2018, 4:52 p.m.
Hello!

BTW, check this a few lines above:

        gfp_head = gfp_mask;
        if (gfp_head & __GFP_WAIT)
                gfp_head |= __GFP_REPEAT;

What is this??


Andrey Ryabinin Feb. 14, 2018, 10:11 a.m.
On 02/13/2018 07:52 PM, Alexey Kuznetsov wrote:
> Hello!
> 
> BTW, check this a few lines above:
> 
>         gfp_head = gfp_mask;
>         if (gfp_head & __GFP_WAIT)
>                 gfp_head |= __GFP_REPEAT;
> 
> What is this??
> 

Can't tell you why this was added or what purpose it serves, but...
__GFP_REPEAT mostly has an effect only on costly-order allocations (order >= 4).
It means: retry allocation attempts until the reclaimer makes any progress
(for non-costly orders (<= 3) the allocator retries anyway, see should_alloc_retry()).

gfp_head is used only in alloc_skb(), so this basically means: try harder to allocate
large (> 32K) skbs.

Also, the 'if (gfp_head & __GFP_WAIT)' line is redundant, because __GFP_REPEAT doesn't have
any effect on !__GFP_WAIT allocations.
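
To make the reasoning above concrete, here is a simplified userspace model of
the retry decision. It is a sketch built only from the description in this
thread, not a copy of should_alloc_retry(): the MODEL_* flag values and the
functions model_should_retry(), model_gfp_head() and model_order_bytes() are
hypothetical, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; NOT the kernel's real values. */
#define MODEL_GFP_WAIT    0x1u
#define MODEL_GFP_NORETRY 0x2u
#define MODEL_GFP_REPEAT  0x4u

#define MODEL_COSTLY_ORDER 3u    /* orders above this are "costly" */
#define MODEL_PAGE_SIZE    4096u /* assumption: 4 KiB pages */

/* Size in bytes of a 2^order page block under the 4 KiB assumption. */
static unsigned long model_order_bytes(unsigned order)
{
	assert(order < 20);
	return (unsigned long)MODEL_PAGE_SIZE << order;
}

/* Should a failed allocation attempt be retried after reclaim? */
static bool model_should_retry(unsigned gfp, unsigned order,
			       unsigned long pages_reclaimed)
{
	assert(order < 20);

	if (!(gfp & MODEL_GFP_WAIT))
		return false;   /* cannot sleep: never reaches the retry logic */
	if (gfp & MODEL_GFP_NORETRY)
		return false;
	if (order <= MODEL_COSTLY_ORDER)
		return true;    /* non-costly orders are retried anyway */

	/* Costly orders are retried only when REPEAT was requested. */
	return (gfp & MODEL_GFP_REPEAT) && pages_reclaimed < (1UL << order);
}

/* The construct quoted from alloc_skb_with_frags(), in model form. */
static unsigned model_gfp_head(unsigned gfp_mask)
{
	unsigned gfp_head = gfp_mask;

	if (gfp_head & MODEL_GFP_WAIT)
		gfp_head |= MODEL_GFP_REPEAT;
	return gfp_head;
}
```

Under these assumptions, setting the REPEAT bit unconditionally gives the same
retry decision as the guarded version, because the bit is ignored whenever the
WAIT bit is clear. The bit only changes the outcome from order 4 upwards, that
is, for blocks larger than model_order_bytes(3) == 32768 bytes.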