[Devel,rh7] fs: add __GFP_NORETRY in alloc_fdmem

Submitted by Anatoly Stepanov on Oct. 21, 2016, 11:42 a.m.

Details

Message ID 1477050123-178281-1-git-send-email-astepanov@cloudlinux.com
State New
Series "fs: add __GFP_NORETRY in alloc_fdmem"

Commit Message

Anatoly Stepanov Oct. 21, 2016, 11:42 a.m.
This is a backport of upstream (vanilla) commit:
commit 96c7a2ff21501691587e1ae969b83cbec8b78e08

Under certain conditions there might be a lot of
alloc_fdmem() invocations with order <= PAGE_ALLOC_COSTLY_ORDER.

For example, httpd doing a lot of fork() calls triggers this.

Real-life examples from our customers:

[532506.773243] httpd           D ffff8803f5fecc20     0 939874   6606
[532506.773257] Call Trace:
[532506.773261]  [<ffffffff8163ce29>] schedule+0x29/0x70
[532506.773264]  [<ffffffff8163a9d5>] schedule_timeout+0x175/0x2d0
[532506.773272]  [<ffffffff8108cc90>] ? internal_add_timer+0x70/0x70
[532506.773276]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
[532506.773280]  [<ffffffff8119be85>] wait_iff_congested+0x135/0x150
[532506.773284]  [<ffffffff810a86e0>] ? wake_up_atomic_t+0x30/0x30
[532506.773288]  [<ffffffff8119071f>] shrink_inactive_list+0x65f/0x6c0
[532506.773292]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
[532506.773296]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
[532506.773300]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
[532506.773310]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
[532506.773315]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
[532506.773320]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
[532506.773324]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
[532506.773327]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
[532506.773332]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
[532506.773337]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
[532506.773341]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
[532506.773344]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
[532506.773354]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
[532506.773358]  [<ffffffff8107a641>] do_fork+0xe1/0x320
[532506.773370]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
[532506.773376]  [<ffffffff81648299>] stub_clone+0x69/0x90
[532506.773380]  [<ffffffff81647f49>] ? system_call_fastpath+0x16/0x1b

[513890.005271] httpd           D ffff880425db7230     0 811718   6606
[513890.005279] Call Trace:
[513890.005282]  [<ffffffff8163ce29>] schedule+0x29/0x70
[513890.005284]  [<ffffffff8163aa99>] schedule_timeout+0x239/0x2d0
[513890.005292]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
[513890.005296]  [<ffffffff8163c448>] io_schedule+0x18/0x20
[513890.005298]  [<ffffffff812c6268>] get_request+0x218/0x780
[513890.005303]  [<ffffffff812c8526>] blk_queue_bio+0xc6/0x3a0
[513890.005309]  [<ffffffffa0002c59>] ? dm_make_request+0x119/0x170 [dm_mod]
[513890.005311]  [<ffffffff812c3892>] generic_make_request+0xe2/0x130
[513890.005313]  [<ffffffff812c3957>] submit_bio+0x77/0x1c0
[513890.005318]  [<ffffffff811bf87e>] __swap_writepage+0x1be/0x260
[513890.005337]  [<ffffffff811bf959>] swap_writepage+0x39/0x80
[513890.005340]  [<ffffffff8118f68d>] shrink_page_list+0x4ad/0xa80
[513890.005343]  [<ffffffff811902bb>] shrink_inactive_list+0x1fb/0x6c0
[513890.005345]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
[513890.005348]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
[513890.005350]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
[513890.005353]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
[513890.005355]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
[513890.005358]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
[513890.005360]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
[513890.005362]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
[513890.005365]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
[513890.005367]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
[513890.005369]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
[513890.005371]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
[513890.005376]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
[513890.005378]  [<ffffffff8107a641>] do_fork+0xe1/0x320
[513890.005380]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
[513890.005382]  [<ffffffff81648299>] stub_clone+0x69/0x90

We observed that kswapd sometimes cannot keep up with this load,
which causes many direct reclaim attempts. These in turn:

1. Increase iowait time due to congestion_wait
2. Increase the number of block requests per second due to
page swapping and writeback
3. May induce OOMs

So it is better NOT to try that hard to allocate a contiguous
area, and to fall back to vmalloc() as soon as possible.

Signed-off-by: Anatoly Stepanov <astepanov@cloudlinux.com>
---
 fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patch

diff --git a/fs/file.c b/fs/file.c
index 366d9bb..3f65ba0 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -36,7 +36,7 @@  static void *alloc_fdmem(size_t size)
 	 * vmalloc() if the allocation size will be considered "large" by the VM.
 	 */
 	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
+		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
 		if (data != NULL)
 			return data;
 	}
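
For readers who want to see the whole fallback pattern and not just the
one-line hunk, here is a minimal user-space sketch of the idea. It is not
the kernel code: sketch_try_contig() and sketch_fallback() are hypothetical
stand-ins for kmalloc(..., __GFP_NORETRY) and vmalloc(), both backed by
malloc(), and sketch_contig_pressure is an invented switch that simulates a
failed contiguous attempt. The 4 KiB page size and costly order of 3 are
assumed values, which would put the cut-off at 32 KiB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration: 4 KiB pages and a costly order of 3. */
#define SKETCH_PAGE_SIZE     4096UL
#define SKETCH_COSTLY_ORDER  3
#define SKETCH_CONTIG_LIMIT  (SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER)

/* Records which path satisfied the request. */
enum sketch_source { SKETCH_NONE, SKETCH_CONTIG, SKETCH_FALLBACK };

/* Hypothetical switch: when true, the contiguous attempt fails at once. */
static bool sketch_contig_pressure;

/*
 * Hypothetical stand-in for kmalloc(size, ... | __GFP_NORETRY):
 * a single attempt that gives up instead of looping in reclaim.
 */
static void *sketch_try_contig(size_t size)
{
	if (sketch_contig_pressure)
		return NULL;
	return malloc(size);
}

/* Hypothetical stand-in for vmalloc(): no contiguity requirement. */
static void *sketch_fallback(size_t size)
{
	return malloc(size);
}

/*
 * Mirrors the shape of alloc_fdmem(): try the cheap contiguous path
 * once for small sizes, then fall back without retrying.
 */
static void *sketch_alloc_fdmem(size_t size, enum sketch_source *src)
{
	void *data;

	*src = SKETCH_NONE;
	if (size <= SKETCH_CONTIG_LIMIT) {
		data = sketch_try_contig(size);
		if (data != NULL) {
			*src = SKETCH_CONTIG;
			return data;
		}
	}
	data = sketch_fallback(size);
	if (data != NULL)
		*src = SKETCH_FALLBACK;
	return data;
}
```

With sketch_contig_pressure set, a 1 KiB request lands on the fallback path
immediately instead of stalling, which is the behaviour the patch aims for
in the traces above. Requests larger than the cut-off skip the contiguous
attempt entirely, as before.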

Comments

Konstantin Khorenko March 16, 2017, 3:03 p.m.
Andrey, please take a look.

All other patches from Anatoly are applied already, except this one.
Is it worth applying this one as well?

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

Andrey Ryabinin March 16, 2017, 3:08 p.m.
On 03/16/2017 06:03 PM, Konstantin Khorenko wrote:
> Andrey, please take a look.
> 
> All other patches from Anatoly are applied already, except this one.
> Is it worth applying this one as well?
> 

Yep,
	Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

