Memory leak issue in multi-threaded program

Submitted by Leesoo Ahn on Jan. 28, 2020, 5:44 a.m.

Details

Message ID ee1c6d93-9ef9-5dc8-0b0b-d039e5a2e0dd@davolink.co.kr
State New
Series "Memory leak issue in multi-threaded program"

Commit Message

Leesoo Ahn Jan. 28, 2020, 5:44 a.m.
Dear musl developers,

Hello! It seems that musl currently has a memory leak in multi-threaded
programs. It occurs with the latest source (v1.1.24) in the situation
described below, in both 32-bit[1] and 64-bit[2] environments.

When a program creates and runs two or more threads with the pthread
APIs, the VSZ of the program as reported by the ps command keeps
increasing. The odd thing is that everything is fine if only one
pthread is created and run.

To confirm the issue on your host machine, please follow these steps:

0. Clone the musl git repository and change into it.
1. Build with these options for a static build:
   ./configure --prefix=$(pwd)/_build_dir --disable-shared
2. Download the test code[3], then build it with:
   ./_build_dir/bin/musl-gcc ./test.c
3. Run the program in the background and poll ps once per second:
   ./a.out & while [ 1 ]; do { ps aux | grep [a].out | grep -v grep; sleep 1; } done

You should see that VSZ keeps increasing.

However, when I force every allocation to go through kernel mmap, using
this diff[4] as a workaround, the issue never happens, even with more
than two pthreads.

I would be very grateful if you could confirm this and find a way to
fix the bug.

Thank you in advance and take care.

Best Regards,
Leesoo

----
[1] 32-bit env: https://pastebin.com/xR4PySaM
[2] 64-bit env: https://pastebin.com/stdVQXdE
[3] test code: https://pastebin.com/0s8nmdUv
[4] workaround patch:

diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
index 9698259..3d39be7 100644
--- a/src/malloc/malloc.c
+++ b/src/malloc/malloc.c
@@ -288,7 +288,11 @@  void *malloc(size_t n)

  	if (adjust_size(&n) < 0) return 0;

+#if 1
+	if ( 1 ) {
+#else
  	if (n > MMAP_THRESHOLD) {
+#endif
  		size_t len = n + OVERHEAD + PAGE_SIZE - 1 & -PAGE_SIZE;
  		char *base = __mmap(0, len, PROT_READ|PROT_WRITE,
  			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

Comments

Rich Felker Jan. 28, 2020, 1:29 p.m.
This is a known issue described in:

https://www.openwall.com/lists/musl/2018/10/30/2

and likely several times before that, though it was not realized that
people were hitting it in practice (vs it just being theoretical)
until around that time. I posted an experimental mitigation patch last
spring:

https://www.openwall.com/lists/musl/2019/04/12/4

but it's not heavily tested and its impact on performance is
significant. I think it should be ok if you need an immediate fix, but
you should do some testing to make sure. If you go this route, reports
of any problems (or success) would be nice to hear about.

Further work in that direction was not done because it was already
planned that musl's malloc implementation will be replaced, and that
the replacement will solve this and other problems in much better
ways. This is work in progress and is intended for merge in the next
release cycle:

https://www.openwall.com/lists/musl/2019/10/22/3
https://github.com/richfelker/mallocng-draft

Hope this information helps.

Rich

Leesoo Ahn Jan. 29, 2020, 1:55 a.m.
Dear Rich,

Thank you for the quick feedback. I am currently looking at the
hotfix patch and doing stress testing.

That said, I can't wait for the next-generation malloc implementation!

Cheers,
Leesoo
