mm: Change formula of calculation of default min_free_kbytes

Submitted by Kirill Tkhai on Nov. 8, 2017, 9:27 a.m.

Details

Message ID 151013320842.4139.8867271503650704865.stgit@localhost.localdomain
State New
Series "mm: Change formula of calculation of default min_free_kbytes"

Commit Message

Kirill Tkhai Nov. 8, 2017, 9:27 a.m.
The min_free_kbytes parameter controls the per-zone watermarks. It is
used to calculate each zone's free-memory threshold, below which direct
reclaim starts and is throttled (the calling task sleeps).
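
As a rough illustration of how min_free_kbytes feeds the per-zone
watermarks, here is a simplified userspace sketch. It assumes the
proportional split and the min/low/high spacing used by older kernels
(those without watermark_scale_factor) and ignores highmem zones. The
struct and function names are hypothetical and are not kernel
identifiers.

```c
/* Per-zone watermarks, in pages (hypothetical helper type). */
struct zone_wmarks {
	unsigned long long min;
	unsigned long long low;
	unsigned long long high;
};

/*
 * Simplified model (ASSUMPTION, not the kernel source):
 *  - min_free_kbytes is converted to pages,
 *  - each zone receives a share proportional to its managed pages,
 *  - low = min + min/4, high = min + min/2.
 */
struct zone_wmarks zone_wmarks_sketch(unsigned long long min_free_kbytes,
				      unsigned long long zone_pages,
				      unsigned long long total_lowmem_pages,
				      unsigned long long page_size)
{
	struct zone_wmarks w;
	unsigned long long pages_min = min_free_kbytes / (page_size / 1024);
	unsigned long long share = pages_min * zone_pages / total_lowmem_pages;

	w.min = share;
	w.low = share + (share >> 2);
	w.high = share + (share >> 1);
	return w;
}
```

In this model, raising min_free_kbytes raises all three thresholds of
every zone proportionally.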

This patch sets the default min_free_kbytes to 2% of available
physical memory, but not more than 4GB. This is more than the
previous formula gave (it was 4 * sqrt(lowmem_kbytes), capped at
64MB). Why do we need this?

We hit a situation where intense disk writes inside a container (CT)
on a node with very little free memory may lead to a state in which
almost all tasks are spinning in direct reclaim. The tasks cannot
reclaim effectively, because the dirty pages they generate are written
and released by ploop threads, so in practice the tasks are just busy
looping. The ploop threads cannot reclaim effectively either, because
the processors are occupied by the busy-looping tasks and the threads
themselves need free pages to do their work. As a result, the system
keeps looping and becomes very slow and unresponsive.

https://jira.sw.ru/browse/PSBM-69296

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 mm/page_alloc.c |   27 +++------------------------
 1 file changed, 3 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 137d1d86ddf..2108034bd80 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6399,27 +6399,6 @@  void setup_per_zone_wmarks(void)
 
 /*
  * Initialise min_free_kbytes.
- *
- * For small machines we want it small (128k min).  For large machines
- * we want it large (64MB max).  But it is not linear, because network
- * bandwidth does not increase linearly with machine size.  We use
- *
- * 	min_free_kbytes = 4 * sqrt(lowmem_kbytes), for better accuracy:
- *	min_free_kbytes = sqrt(lowmem_kbytes * 16)
- *
- * which yields
- *
- * 16MB:	512k
- * 32MB:	724k
- * 64MB:	1024k
- * 128MB:	1448k
- * 256MB:	2048k
- * 512MB:	2896k
- * 1024MB:	4096k
- * 2048MB:	5792k
- * 4096MB:	8192k
- * 8192MB:	11584k
- * 16384MB:	16384k
  */
 int __meminit init_per_zone_wmark_min(void)
 {
@@ -6427,14 +6406,14 @@  int __meminit init_per_zone_wmark_min(void)
 	int new_min_free_kbytes;
 
 	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
-	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
+	new_min_free_kbytes = lowmem_kbytes * 2 / 100; /* 2% */
 
 	if (new_min_free_kbytes > user_min_free_kbytes) {
 		min_free_kbytes = new_min_free_kbytes;
 		if (min_free_kbytes < 128)
 			min_free_kbytes = 128;
-		if (min_free_kbytes > 65536)
-			min_free_kbytes = 65536;
+		if (min_free_kbytes > 4194304)
+			min_free_kbytes = 4194304;
 	} else {
 		pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
 				new_min_free_kbytes, user_min_free_kbytes);
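
To get a feel for how much the default changes, the two formulas from
the hunk above can be compared in userspace. This is a minimal sketch:
isqrt_ull() is a stand-in for the kernel's int_sqrt(), the function
names are hypothetical, and the user_min_free_kbytes override is
deliberately left out.

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's int_sqrt(): floor(sqrt(x)). */
unsigned long long isqrt_ull(unsigned long long x)
{
	unsigned long long r = 0;
	unsigned long long bit = 1ULL << 62;

	while (bit > x)
		bit >>= 2;
	while (bit) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

/* Clamp v into [lo, hi]. */
unsigned long long clamp_kb(unsigned long long v, unsigned long long lo,
			    unsigned long long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Old default: sqrt(lowmem_kbytes * 16), clamped to [128k, 64MB]. */
unsigned long long old_default_kb(unsigned long long lowmem_kb)
{
	return clamp_kb(isqrt_ull(lowmem_kb * 16), 128, 65536);
}

/* New default: 2% of lowmem_kbytes, clamped to [128k, 4GB]. */
unsigned long long new_default_kb(unsigned long long lowmem_kb)
{
	return clamp_kb(lowmem_kb * 2 / 100, 128, 4194304);
}

/* Print both defaults for a few example memory sizes. */
void print_comparison(void)
{
	static const unsigned long long mem_mb[] = { 1024, 16384, 131072, 1048576 };
	size_t i;

	for (i = 0; i < sizeof(mem_mb) / sizeof(mem_mb[0]); i++) {
		unsigned long long kb = mem_mb[i] * 1024;

		printf("%8lluMB: old %8lluk  new %8lluk\n", mem_mb[i],
		       old_default_kb(kb), new_default_kb(kb));
	}
}
```

With these formulas, the 128k floor applies when lowmem is below
6400k, and the 4GB cap is reached at about 200GB of lowmem.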

Comments

Konstantin Khorenko Dec. 6, 2017, 3:19 p.m.
Please consider this for RK (ReadyKernel) after ~2 weeks of testing.

https://readykernel.com/

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

Vasily Averin Dec. 6, 2017, 3:24 p.m.
It changes an __init function.

On 2017-12-06 18:19, Konstantin Khorenko wrote:
> Please consider to RK this in ~2weeks of testing.
> 
> https://readykernel.com/
> 
> -- 
> Best regards,
> 
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
> 