* [PATCH] Bug 22111: Fix malloc tcache leak.
@ 2017-09-27 5:45 Carlos O'Donell
2017-09-27 8:57 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2017-09-27 5:45 UTC (permalink / raw)
To: GNU C Library
commit message---
The malloc tcache added in 2.26 will leak all of the elements
remaining in the cache and the cache structure itself when a
thread exits. The defect is that we do not set tcache_shutting_down
early enough, and the thread simply recreates the tcache and
places the elements back onto a new tcache which is subsequently
lost as the thread exits (unfreed memory). The fix is relatively
simple, move the setting of tcache_shutting_down earlier in the
tcache_thread_freeres. We add a test case which uses mallinfo
and some heuristics to look for 1000x memory usage between the
start and end of a thread start/join loop. It is very reliable
at detecting that there is a leak given the number of iterations.
Without the fix the test will consume 122MiB of leaked memory.
Tested on x86_64 with no regressions.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
---
OK to checkin? In particular I'd be happy to see if there is any
way to make the test better.
Once approved I'll backport this to 2.26 stable branch.
2017-09-26 Carlos O'Donell <carlos@redhat.com>
[BZ #22111]
* malloc/malloc.c (tcache_shutting_down): Use bool type.
(tcache_thread_freeres): Set tcache_shutting_down before
freeing the tcache.
* malloc/Makefile (tests): Add tst-malloc-tcache-leak.
* malloc/tst-malloc-tcache-leak.c: New file.
diff --git a/malloc/Makefile b/malloc/Makefile
index 50b487e..6cf78e1 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -34,6 +34,7 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
tst-interpose-nothread \
tst-interpose-thread \
tst-alloc_buffer \
+ tst-malloc-tcache-leak \
tests-static := \
tst-interpose-static-nothread \
@@ -242,3 +243,5 @@ tst-dynarray-fail-ENV = MALLOC_TRACE=$(objpfx)tst-dynarray-fail.mtrace
$(objpfx)tst-dynarray-fail-mem.out: $(objpfx)tst-dynarray-fail.out
$(common-objpfx)malloc/mtrace $(objpfx)tst-dynarray-fail.mtrace > $@; \
$(evaluate-test)
+
+$(objpfx)tst-malloc-tcache-leak: $(shared-thread-library)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1c2a0b0..d3fcadd 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2916,7 +2916,7 @@ typedef struct tcache_perthread_struct
tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
-static __thread char tcache_shutting_down = 0;
+static __thread bool tcache_shutting_down = false;
static __thread tcache_perthread_struct *tcache = NULL;
/* Caller must ensure that we know tc_idx is valid and there's room
@@ -2953,8 +2953,12 @@ tcache_thread_freeres (void)
if (!tcache)
return;
+ /* Disable the tcache and prevent it from being reinitialized. */
tcache = NULL;
+ tcache_shutting_down = true;
+ /* Free all of the entries and the tcache itself back to the arena
+ heap for coalescing. */
for (i = 0; i < TCACHE_MAX_BINS; ++i)
{
while (tcache_tmp->entries[i])
@@ -2966,8 +2970,6 @@ tcache_thread_freeres (void)
}
__libc_free (tcache_tmp);
-
- tcache_shutting_down = 1;
}
text_set_element (__libc_thread_subfreeres, tcache_thread_freeres);
diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
new file mode 100644
index 0000000..ad7ea46
--- /dev/null
+++ b/malloc/tst-malloc-tcache-leak.c
@@ -0,0 +1,111 @@
+/* Bug 22111: Test that threads do not leak their per thread cache.
+ Copyright (C) 2015-2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The point of this test is to start and exit a large number of
+ threads, while at the same time looking to see if the used
+ memory grows with each round of threads run. If the memory
+ grows above some linear bound we declare the test failed and
+ that the malloc implementation is leaking memory with each
+ thread. This is a good indicator that the thread local cache
+ is leaking chunks. */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+#include <pthread.h>
+#include <assert.h>
+
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xthread.h>
+
+void *
+worker (void *data)
+{
+ /* Allocate an arbitrary amount of memory that is known to fit into
+ the thread local cache (tcache). If we have at least 64 bins
+ (default e.g. TCACHE_MAX_BINS) we should be able to allocate 32
+ bytes and force malloc to fill the tcache. We have the allocated
+ memory escape back to the parent to be freed to avoid any compiler
+ optimizations. */
+ return (void *) xmalloc (32);
+}
+
+static int
+do_test (void)
+{
+ /* Allocate an arbitrary number of threads that can run concurrently
+ and carry out one allocation from the tcahce and then exit. We
+ could do it one thread at a time to minimize memory usage, but
+ it's faster to do 10 threads at a time, and we won't run out of
+ memory. */
+#define TNUM 10
+ pthread_t *threads;
+ unsigned int i, loop;
+ struct mallinfo info_before, info_after;
+ void *retval;
+
+ /* Avoid there being 0 malloc'd data at this point by allocating the
+ pthread_t's required to run the test. */
+ threads = (pthread_t *) xmalloc (sizeof (pthread_t) * TNUM);
+
+ info_before = mallinfo ();
+
+ assert (info_before.uordblks != 0);
+
+ printf ("INFO: %d (bytes) are in use before starting threads.\n",
+ info_before.uordblks);
+
+ /* Again this is an arbitrary choice. We run the 10 threads 10,000
+ times for a total of 100,000 threads created and joined. This
+ gives us enough iterations to show a leak. */
+ for (loop = 0; loop < 10000; loop++)
+ {
+
+ for (i = 0; i < TNUM; i++)
+ {
+ threads[i] = xpthread_create (NULL, worker, NULL);
+ }
+
+ for (i = 0; i < TNUM; i++)
+ {
+ retval = xpthread_join (threads[i]);
+ free (retval);
+ }
+ }
+
+ info_after = mallinfo ();
+ printf ("INFO: %d (bytes) are in use after all threads joined.\n",
+ info_after.uordblks);
+
+ /* We need to compare the memory in use before and the memory in use
+ after starting and joining 100,000 threads. We almost always grow
+ memory slightly, but not much. If the growth factor is over 1000
+ then we know we have a leak with high confidence. Consider that if
+ even 1-byte leaked per thread we'd have 100,000 bytes of additional
+ memory, and in general the in-use at the start of main is quite
+ low. */
+ if (info_after.uordblks > (info_before.uordblks * 1000))
+ FAIL_EXIT1 ("Memory usage after threads is too high.\n");
+
+ /* Did not detect excessive memory usage. */
+ free (threads);
+ exit (0);
+}
+
+#include <support/test-driver.c>
---
--
Cheers,
Carlos.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Bug 22111: Fix malloc tcache leak.
2017-09-27 5:45 [PATCH] Bug 22111: Fix malloc tcache leak Carlos O'Donell
@ 2017-09-27 8:57 ` Florian Weimer
2017-09-28 17:32 ` [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111] Carlos O'Donell
0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2017-09-27 8:57 UTC (permalink / raw)
To: Carlos O'Donell, GNU C Library
On 09/27/2017 07:44 AM, Carlos O'Donell wrote:
> commit message---
Usually, we use something like this as the subject line:
malloc: Fix tcache leak on thread destruction [BZ #22111]
That is, subsystem first, description of the fix, and bug reference last.
> The malloc tcache added in 2.26 will leak all of the elements
> remaining in the cache and the cache structure itself when a
> thread exits. The defect is that we do not set tcache_shutting_down
> early enough, and the thread simply recreates the tcache and
> places the elements back onto a new tcache which is subsequently
> lost as the thread exits (unfreed memory). The fix is relatively
> simple, move the setting of tcache_shutting_down earlier in the
> tcache_thread_freeres. We add a test case which uses mallinfo
> and some heuristics to look for 1000x memory usage between the
> start and end of a thread start/join loop. It is very reliable
> at detecting that there is a leak given the number of iterations.
> Without the fix the test will consume 122MiB of leaked memory.
See below, commit message may need updating.
The change itself looks good to me.
> diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
> new file mode 100644
> index 0000000..ad7ea46
> --- /dev/null
> +++ b/malloc/tst-malloc-tcache-leak.c
> @@ -0,0 +1,111 @@
> +static int
> +do_test (void)
> +{
> + /* Allocate an arbitrary number of threads that can run concurrently
> + and carry out one allocation from the tcahce and then exit. We
typo: âtcahceâ
> + could do it one thread at a time to minimize memory usage, but
> + it's faster to do 10 threads at a time, and we won't run out of
> + memory. */
The comment is wrong for NUMA systems at least. Running single-threaded
is faster, particularly if you set the CPU affinity first.
> +#define TNUM 10
TNUM should be parallel_threads or something like that, but I think it
is unneeded.
> + pthread_t *threads;
> + unsigned int i, loop;
> + struct mallinfo info_before, info_after;
> + void *retval;
> +
> + /* Avoid there being 0 malloc'd data at this point by allocating the
> + pthread_t's required to run the test. */
> + threads = (pthread_t *) xmalloc (sizeof (pthread_t) * TNUM);
Should use xcalloc.
> + info_before = mallinfo ();
> +
> + assert (info_before.uordblks != 0);
> +
> + printf ("INFO: %d (bytes) are in use before starting threads.\n",
> + info_before.uordblks);
> +
> + /* Again this is an arbitrary choice. We run the 10 threads 10,000
> + times for a total of 100,000 threads created and joined. This
> + gives us enough iterations to show a leak. */
> + for (loop = 0; loop < 10000; loop++)
If you keep the parallel execution (maybe it is faster on small systems,
I have not tried), the outer loop count should involve TNUM.
> + {
> +
> + for (i = 0; i < TNUM; i++)
> + {
> + threads[i] = xpthread_create (NULL, worker, NULL);
> + }
Extraneous braces.
> + for (i = 0; i < TNUM; i++)
> + {
> + retval = xpthread_join (threads[i]);
> + free (retval);
> + }
> + }
> +
> + info_after = mallinfo ();
> + printf ("INFO: %d (bytes) are in use after all threads joined.\n",
> + info_after.uordblks);
> +
> + /* We need to compare the memory in use before and the memory in use
> + after starting and joining 100,000 threads. We almost always grow
> + memory slightly, but not much. If the growth factor is over 1000
> + then we know we have a leak with high confidence. Consider that if
> + even 1-byte leaked per thread we'd have 100,000 bytes of additional
> + memory, and in general the in-use at the start of main is quite
> + low. */
> + if (info_after.uordblks > (info_before.uordblks * 1000))
> + FAIL_EXIT1 ("Memory usage after threads is too high.\n");
I think you should use a fixed offset here, not something that
essentially scales with the initially allocated amount of memory.
Thanks,
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111]
2017-09-27 8:57 ` Florian Weimer
@ 2017-09-28 17:32 ` Carlos O'Donell
2017-10-03 1:36 ` DJ Delorie
0 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2017-09-28 17:32 UTC (permalink / raw)
To: Florian Weimer, GNU C Library
[-- Attachment #1: Type: text/plain, Size: 5822 bytes --]
On 09/27/2017 02:57 AM, Florian Weimer wrote:
> On 09/27/2017 07:44 AM, Carlos O'Donell wrote:
>> commit message---
>
> Usually, we use something like this as the subject line:
>
> malloc: Fix tcache leak on thread destruction [BZ #22111]
>
> That is, subsystem first, description of the fix, and bug reference last.
Agreed.
Subsystem: <description> [BZ #NNNN]
I confirmed that it appears that commit<->bugzilla integration knows
how to parse [BZ #NNNN] on the first line of the commit.
I've edited the contribution checklist slightly.
https://sourceware.org/glibc/wiki/Contribution%20checklist
>> The malloc tcache added in 2.26 will leak all of the elements
>> remaining in the cache and the cache structure itself when a
>> thread exits. The defect is that we do not set tcache_shutting_down
>> early enough, and the thread simply recreates the tcache and
>> places the elements back onto a new tcache which is subsequently
>> lost as the thread exits (unfreed memory). The fix is relatively
>> simple, move the setting of tcache_shutting_down earlier in the
>> tcache_thread_freeres. We add a test case which uses mallinfo
>> and some heuristics to look for 1000x memory usage between the
>> start and end of a thread start/join loop. It is very reliable
>> at detecting that there is a leak given the number of iterations.
>> Without the fix the test will consume 122MiB of leaked memory.
>
> See below, commit message may need updating.
>
> The change itself looks good to me.
>
>> diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
>> new file mode 100644
>> index 0000000..ad7ea46
>> --- /dev/null
>> +++ b/malloc/tst-malloc-tcache-leak.c
>> @@ -0,0 +1,111 @@
>
>
>> +static int
>> +do_test (void)
>> +{
>> +Â /* Allocate an arbitrary number of threads that can run concurrently
>> +    and carry out one allocation from the tcahce and then exit. We
>
> typo: âtcahceâ
Fixed.
>> +Â Â Â Â could do it one thread at a time to minimize memory usage, but
>> +Â Â Â Â it's faster to do 10 threads at a time, and we won't run out of
>> +    memory. */
>
> The comment is wrong for NUMA systems at least. Running
> single-threaded is faster, particularly if you set the CPU affinity
> first.
I just switched to doing one thread at a time to simplify the test.
No need to optimize it, it only takes 4 seconds to create/join 100,000
threads on an x86_64 and a I expect similarly < 20s performance on other
systems.
>> +#define TNUM 10
>
> TNUM should be parallel_threads or something like that, but I think it is unneeded.
Removed.
>> +Â pthread_t *threads;
>> +Â unsigned int i, loop;
>> +Â struct mallinfo info_before, info_after;
>> +Â void *retval;
>> +
>> +Â /* Avoid there being 0 malloc'd data at this point by allocating the
>> +    pthread_t's required to run the test. */
>> +Â threads = (pthread_t *) xmalloc (sizeof (pthread_t) * TNUM);
>
> Should use xcalloc.
Fixed.
>> +Â info_before = mallinfo ();
>> +
>> +Â assert (info_before.uordblks != 0);
>> +
>> +Â printf ("INFO: %d (bytes) are in use before starting threads.\n",
>> +Â Â Â Â Â Â Â Â Â info_before.uordblks);
>> +
>> + /* Again this is an arbitrary choice. We run the 10 threads 10,000
>> +    times for a total of 100,000 threads created and joined. This
>> +    gives us enough iterations to show a leak. */
>> +Â for (loop = 0; loop < 10000; loop++)
>
> If you keep the parallel execution (maybe it is faster on small systems, I have not tried), the outer loop count should involve TNUM.
Removed. It's a bare single loop now.
>> +Â Â Â {
>> +
>> +Â Â Â Â Â for (i = 0; i < TNUM; i++)
>> +Â Â Â {
>> +Â Â Â Â Â threads[i] = xpthread_create (NULL, worker, NULL);
>> +Â Â Â }
>
> Extraneous braces.
Removed.
>> +Â Â Â Â Â for (i = 0; i < TNUM; i++)
>> +Â Â Â {
>> +Â Â Â Â Â retval = xpthread_join (threads[i]);
>> +Â Â Â Â Â free (retval);
>> +Â Â Â }
>> +Â Â Â }
>> +
>> +Â info_after = mallinfo ();
>> +Â printf ("INFO: %d (bytes) are in use after all threads joined.\n",
>> +Â Â Â Â Â Â Â Â Â info_after.uordblks);
>> +
>> +Â /* We need to compare the memory in use before and the memory in use
>> +    after starting and joining 100,000 threads. We almost always grow
>> +    memory slightly, but not much. If the growth factor is over 1000
>> +    then we know we have a leak with high confidence. Consider that if
>> +Â Â Â Â even 1-byte leaked per thread we'd have 100,000 bytes of additional
>> +Â Â Â Â memory, and in general the in-use at the start of main is quite
>> +    low. */
>> +Â if (info_after.uordblks > (info_before.uordblks * 1000))
>> +Â Â Â FAIL_EXIT1 ("Memory usage after threads is too high.\n");
>
> I think you should use a fixed offset here, not something that
> essentially scales with the initially allocated amount of memory.
Sure, that's a good idea.
I just used the threads count as the fixed offset, which is an assumption
of a small 1-byte leak per thread, and anything higher than that is a leak.
e.g.
Before:
INFO: 624 (bytes) are in use before starting threads.
INFO: 118403536 (bytes) are in use after all threads joined.
error: tst-malloc-tcache-leak.c:91: Memory usage after threads is too high.
Which is a ~1.1kb leak per thread.
After:
INFO: 624 (bytes) are in use before starting threads.
INFO: 3472 (bytes) are in use after all threads joined.
v2 attached.
Would you give a Reviewed-by when you think this looks good?
--
Cheers,
Carlos.
[-- Attachment #2: swbz22111-v2.patch --]
[-- Type: text/x-patch, Size: 7727 bytes --]
From 755a0eb4c954d69814e5609d60bcf17741b3d5b2 Mon Sep 17 00:00:00 2001
From: Carlos O'Donell <carlos@systemhalted.org>
Date: Thu, 28 Sep 2017 11:05:18 -0600
Subject: [PATCH] malloc: Fix tcache leak after thread destruction [BZ #22111]
The malloc tcache added in 2.26 will leak all of the elements remaining
in the cache and the cache structure itself when a thread exits. The
defect is that we do not set tcache_shutting_down early enough, and the
thread simply recreates the tcache and places the elements back onto a
new tcache which is subsequently lost as the thread exits (unfreed
memory). The fix is relatively simple, move the setting of
tcache_shutting_down earlier in tcache_thread_freeres. We add a test
case which uses mallinfo and some heuristics to look for unaccounted for
memory usage between the start and end of a thread start/join loop. It
is very reliable at detecting that there is a leak given the number of
iterations. Without the fix the test will consume 122MiB of leaked
memory.
---
ChangeLog | 9 ++++
malloc/Makefile | 3 ++
malloc/malloc.c | 8 ++--
malloc/tst-malloc-tcache-leak.c | 102 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 119 insertions(+), 3 deletions(-)
create mode 100644 malloc/tst-malloc-tcache-leak.c
diff --git a/ChangeLog b/ChangeLog
index 8fd06de..d96de74 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2017-09-26 Carlos O'Donell <carlos@redhat.com>
+
+ [BZ #22111]
+ * malloc/malloc.c (tcache_shutting_down): Use bool type.
+ (tcache_thread_freeres): Set tcache_shutting_down before
+ freeing the tcache.
+ * malloc/Makefile (tests): Add tst-malloc-tcache-leak.
+ * malloc/tst-malloc-tcache-leak.c: New file.
+
2017-09-28 Szabolcs Nagy <szabolcs.nagy@arm.com>
* sysdeps/aarch64/libm-test-ulps: Update.
diff --git a/malloc/Makefile b/malloc/Makefile
index 50b487e..6cf78e1 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -34,6 +34,7 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
tst-interpose-nothread \
tst-interpose-thread \
tst-alloc_buffer \
+ tst-malloc-tcache-leak \
tests-static := \
tst-interpose-static-nothread \
@@ -242,3 +243,5 @@ tst-dynarray-fail-ENV = MALLOC_TRACE=$(objpfx)tst-dynarray-fail.mtrace
$(objpfx)tst-dynarray-fail-mem.out: $(objpfx)tst-dynarray-fail.out
$(common-objpfx)malloc/mtrace $(objpfx)tst-dynarray-fail.mtrace > $@; \
$(evaluate-test)
+
+$(objpfx)tst-malloc-tcache-leak: $(shared-thread-library)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1c2a0b0..d3fcadd 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2916,7 +2916,7 @@ typedef struct tcache_perthread_struct
tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
-static __thread char tcache_shutting_down = 0;
+static __thread bool tcache_shutting_down = false;
static __thread tcache_perthread_struct *tcache = NULL;
/* Caller must ensure that we know tc_idx is valid and there's room
@@ -2953,8 +2953,12 @@ tcache_thread_freeres (void)
if (!tcache)
return;
+ /* Disable the tcache and prevent it from being reinitialized. */
tcache = NULL;
+ tcache_shutting_down = true;
+ /* Free all of the entries and the tcache itself back to the arena
+ heap for coalescing. */
for (i = 0; i < TCACHE_MAX_BINS; ++i)
{
while (tcache_tmp->entries[i])
@@ -2966,8 +2970,6 @@ tcache_thread_freeres (void)
}
__libc_free (tcache_tmp);
-
- tcache_shutting_down = 1;
}
text_set_element (__libc_thread_subfreeres, tcache_thread_freeres);
diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
new file mode 100644
index 0000000..18e374f
--- /dev/null
+++ b/malloc/tst-malloc-tcache-leak.c
@@ -0,0 +1,102 @@
+/* Bug 22111: Test that threads do not leak their per thread cache.
+ Copyright (C) 2015-2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The point of this test is to start and exit a large number of
+ threads, while at the same time looking to see if the used
+ memory grows with each round of threads run. If the memory
+ grows above some linear bound we declare the test failed and
+ that the malloc implementation is leaking memory with each
+ thread. This is a good indicator that the thread local cache
+ is leaking chunks. */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+#include <pthread.h>
+#include <assert.h>
+
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xthread.h>
+
+void *
+worker (void *data)
+{
+ /* Allocate an arbitrary amount of memory that is known to fit into
+ the thread local cache (tcache). If we have at least 64 bins
+ (default e.g. TCACHE_MAX_BINS) we should be able to allocate 32
+ bytes and force malloc to fill the tcache. We have the allocated
+ memory escape back to the parent to be freed to avoid any compiler
+ optimizations. */
+ return (void *) xmalloc (32);
+}
+
+static int
+do_test (void)
+{
+ pthread_t *thread;
+ struct mallinfo info_before, info_after;
+ void *retval;
+
+ /* This is an arbitrary choice. We choose a total of THREADS
+ threads created and joined. This gives us enough iterations to
+ show a leak. */
+ int threads = 100000;
+
+ /* Avoid there being 0 malloc'd data at this point by allocating the
+ pthread_t required to run the test. */
+ thread = (pthread_t *) xcalloc (1, sizeof (pthread_t));
+
+ info_before = mallinfo ();
+
+ assert (info_before.uordblks != 0);
+
+ printf ("INFO: %d (bytes) are in use before starting threads.\n",
+ info_before.uordblks);
+
+ for (int loop = 0; loop < threads; loop++)
+ {
+ *thread = xpthread_create (NULL, worker, NULL);
+ retval = xpthread_join (*thread);
+ free (retval);
+ }
+
+ info_after = mallinfo ();
+ printf ("INFO: %d (bytes) are in use after all threads joined.\n",
+ info_after.uordblks);
+
+ /* We need to compare the memory in use before and the memory in use
+ after starting and joining THREADS threads. We almost always grow
+ memory slightly, but not much. Consider that if even 1-byte leaked
+ per thread we'd have THREADS bytes of additional memory, and in
+ general the in-use at the start of main is quite low. We will
+ always leak a full malloc chunk, and never just 1-byte, therefore
+ anything above "+ threads" from the start (constant offset) is a
+ leak. Obviously this assumes no thread-related malloc'd internal
+ libc data structures persist beyond the thread death, and any that
+ did would limit the number of times you could call pthread_create,
+ which is a QoI we'd want to detect and fix. */
+ if (info_after.uordblks > (info_before.uordblks + threads))
+ FAIL_EXIT1 ("Memory usage after threads is too high.\n");
+
+ /* Did not detect excessive memory usage. */
+ free (thread);
+ exit (0);
+}
+
+#include <support/test-driver.c>
--
2.9.5
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111]
2017-09-28 17:32 ` [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111] Carlos O'Donell
@ 2017-10-03 1:36 ` DJ Delorie
2017-10-03 1:43 ` Andrew Pinski
0 siblings, 1 reply; 8+ messages in thread
From: DJ Delorie @ 2017-10-03 1:36 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: fweimer, libc-alpha
Carlos O'Donell <carlos@systemhalted.org> writes:
> diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
> +void *
> +worker (void *data)
> +{
> + /* Allocate an arbitrary amount of memory that is known to fit into
> + the thread local cache (tcache). If we have at least 64 bins
> + (default e.g. TCACHE_MAX_BINS) we should be able to allocate 32
> + bytes and force malloc to fill the tcache. We have the allocated
> + memory escape back to the parent to be freed to avoid any compiler
> + optimizations. */
> + return (void *) xmalloc (32);
> +}
This would be slightly more future-proof if it did an alloc/free/alloc
cycle, in case in the future malloc doesn't init the tcache "just
because". The free would force the init, since the chunk would be
stored in the tcache.
Actually, a malloc/free might be a better test, since it tests that the
free'd chunks in the tcache are freed as well as the tcache
infrastructure itself. Or maybe as a second test.
Also, some protection against user-environment tunables disabling the
tcache might be appropriate, although that would only stop false
positives.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111]
2017-10-03 1:36 ` DJ Delorie
@ 2017-10-03 1:43 ` Andrew Pinski
2017-10-06 16:35 ` Carlos O'Donell
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Pinski @ 2017-10-03 1:43 UTC (permalink / raw)
To: DJ Delorie; +Cc: Carlos O'Donell, Florian Weimer, GNU C Library
On Mon, Oct 2, 2017 at 6:36 PM, DJ Delorie <dj@redhat.com> wrote:
>
> Carlos O'Donell <carlos@systemhalted.org> writes:
>> diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
>> +void *
>> +worker (void *data)
>> +{
>> + /* Allocate an arbitrary amount of memory that is known to fit into
>> + the thread local cache (tcache). If we have at least 64 bins
>> + (default e.g. TCACHE_MAX_BINS) we should be able to allocate 32
>> + bytes and force malloc to fill the tcache. We have the allocated
>> + memory escape back to the parent to be freed to avoid any compiler
>> + optimizations. */
>> + return (void *) xmalloc (32);
>> +}
>
> This would be slightly more future-proof if it did an alloc/free/alloc
> cycle, in case in the future malloc doesn't init the tcache "just
> because". The free would force the init, since the chunk would be
> stored in the tcache.
>
> Actually, a malloc/free might be a better test, since it tests that the
> free'd chunks in the tcache are freed as well as the tcache
> infrastructure itself. Or maybe as a second test.
And if you are going to do a malloc/free pair, make sure you add a
compiler barrier in the code so the compiler does not delete the
malloc/free pairs (which it does already).
Thanks,
Andrew
>
> Also, some protection against user-environment tunables disabling the
> tcache might be appropriate, although that would only stop false
> positives.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111]
2017-10-03 1:43 ` Andrew Pinski
@ 2017-10-06 16:35 ` Carlos O'Donell
2017-10-06 16:52 ` DJ Delorie
0 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2017-10-06 16:35 UTC (permalink / raw)
To: Andrew Pinski, DJ Delorie; +Cc: Florian Weimer, GNU C Library
[-- Attachment #1: Type: text/plain, Size: 1662 bytes --]
On 10/02/2017 06:43 PM, Andrew Pinski wrote:
> On Mon, Oct 2, 2017 at 6:36 PM, DJ Delorie <dj@redhat.com> wrote:
>>
>> Carlos O'Donell <carlos@systemhalted.org> writes:
>>> diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
>>> +void *
>>> +worker (void *data)
>>> +{
>>> + /* Allocate an arbitrary amount of memory that is known to fit into
>>> + the thread local cache (tcache). If we have at least 64 bins
>>> + (default e.g. TCACHE_MAX_BINS) we should be able to allocate 32
>>> + bytes and force malloc to fill the tcache. We have the allocated
>>> + memory escape back to the parent to be freed to avoid any compiler
>>> + optimizations. */
>>> + return (void *) xmalloc (32);
>>> +}
>>
>> This would be slightly more future-proof if it did an alloc/free/alloc
>> cycle, in case in the future malloc doesn't init the tcache "just
>> because". The free would force the init, since the chunk would be
>> stored in the tcache.
>>
>> Actually, a malloc/free might be a better test, since it tests that the
>> free'd chunks in the tcache are freed as well as the tcache
>> infrastructure itself. Or maybe as a second test.
>
> And if you are going to do a malloc/free pair, make sure you add a
> compiler barrier in the code so the compiler does not delete the
> malloc/free pairs (which it does already).
Thanks DJ, that's a good idea about alloc/free/alloc.
Andrew, Yup, compiler barrier `__asm__ volatile ("" ::: "memory");`
v3 patch attached.
How does this look?
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Would appreciate any Reviewed-by entries from either of you.
--
Cheers,
Carlos.
[-- Attachment #2: 0001-malloc-Fix-tcache-leak-after-thread-destruction-BZ-2.patch --]
[-- Type: text/x-patch, Size: 8275 bytes --]
From 1e26d35193efbb29239c710a4c46a64708643320 Mon Sep 17 00:00:00 2001
From: Carlos O'Donell <carlos@systemhalted.org>
Date: Thu, 28 Sep 2017 11:05:18 -0600
Subject: [PATCH] malloc: Fix tcache leak after thread destruction [BZ #22111]
The malloc tcache added in 2.26 will leak all of the elements remaining
in the cache and the cache structure itself when a thread exits. The
defect is that we do not set tcache_shutting_down early enough, and the
thread simply recreates the tcache and places the elements back onto a
new tcache which is subsequently lost as the thread exits (unfreed
memory). The fix is relatively simple, move the setting of
tcache_shutting_down earlier in tcache_thread_freeres. We add a test
case which uses mallinfo and some heuristics to look for unaccounted for
memory usage between the start and end of a thread start/join loop. It
is very reliable at detecting that there is a leak given the number of
iterations. Without the fix the test will consume 122MiB of leaked
memory.
---
ChangeLog | 9 ++++
malloc/Makefile | 3 ++
malloc/malloc.c | 8 +--
malloc/tst-malloc-tcache-leak.c | 112 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 129 insertions(+), 3 deletions(-)
create mode 100644 malloc/tst-malloc-tcache-leak.c
diff --git a/ChangeLog b/ChangeLog
index ac0f188..bbd80b1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2017-10-06 Carlos O'Donell <carlos@redhat.com>
+
+ [BZ #22111]
+ * malloc/malloc.c (tcache_shutting_down): Use bool type.
+ (tcache_thread_freeres): Set tcache_shutting_down before
+ freeing the tcache.
+ * malloc/Makefile (tests): Add tst-malloc-tcache-leak.
+ * malloc/tst-malloc-tcache-leak.c: New file.
+
2017-10-06 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
* sysdeps/powerpc/powerpc64/multiarch/memrchr-ppc64.c: Revert
diff --git a/malloc/Makefile b/malloc/Makefile
index 50b487e..6cf78e1 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -34,6 +34,7 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
tst-interpose-nothread \
tst-interpose-thread \
tst-alloc_buffer \
+ tst-malloc-tcache-leak \
tests-static := \
tst-interpose-static-nothread \
@@ -242,3 +243,5 @@ tst-dynarray-fail-ENV = MALLOC_TRACE=$(objpfx)tst-dynarray-fail.mtrace
$(objpfx)tst-dynarray-fail-mem.out: $(objpfx)tst-dynarray-fail.out
$(common-objpfx)malloc/mtrace $(objpfx)tst-dynarray-fail.mtrace > $@; \
$(evaluate-test)
+
+$(objpfx)tst-malloc-tcache-leak: $(shared-thread-library)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1c2a0b0..d3fcadd 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2916,7 +2916,7 @@ typedef struct tcache_perthread_struct
tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
-static __thread char tcache_shutting_down = 0;
+static __thread bool tcache_shutting_down = false;
static __thread tcache_perthread_struct *tcache = NULL;
/* Caller must ensure that we know tc_idx is valid and there's room
@@ -2953,8 +2953,12 @@ tcache_thread_freeres (void)
if (!tcache)
return;
+ /* Disable the tcache and prevent it from being reinitialized. */
tcache = NULL;
+ tcache_shutting_down = true;
+ /* Free all of the entries and the tcache itself back to the arena
+ heap for coalescing. */
for (i = 0; i < TCACHE_MAX_BINS; ++i)
{
while (tcache_tmp->entries[i])
@@ -2966,8 +2970,6 @@ tcache_thread_freeres (void)
}
__libc_free (tcache_tmp);
-
- tcache_shutting_down = 1;
}
text_set_element (__libc_thread_subfreeres, tcache_thread_freeres);
diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
new file mode 100644
index 0000000..22c679b
--- /dev/null
+++ b/malloc/tst-malloc-tcache-leak.c
@@ -0,0 +1,112 @@
+/* Bug 22111: Test that threads do not leak their per thread cache.
+ Copyright (C) 2015-2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The point of this test is to start and exit a large number of
+ threads, while at the same time looking to see if the used
+ memory grows with each round of threads run. If the memory
+ grows above some linear bound we declare the test failed and
+ that the malloc implementation is leaking memory with each
+ thread. This is a good indicator that the thread local cache
+ is leaking chunks. */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+#include <pthread.h>
+#include <assert.h>
+
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xthread.h>
+
+void *
+worker (void *data)
+{
+ void *ret;
+ /* Allocate an arbitrary amount of memory that is known to fit into
+ the thread local cache (tcache). If we have at least 64 bins
+ (default e.g. TCACHE_MAX_BINS) we should be able to allocate 32
+ bytes and force malloc to fill the tcache. We are assuming tcahce
+ init happens at the first small alloc, but it might in the future
+ be deferred to some other point. Therefore to future proof this
+ test we include a full alloc/free/alloc cycle for the thread. We
+ need a compiler barrier to avoid the removal of the useless
+ alloc/free. We send some memory back to main to have the memory
+ freed after the thread dies, as just another check that the chunks
+ that were previously in the tcache are still OK to free after
+ thread death. */
+ ret = xmalloc (32);
+ __asm__ volatile ("" ::: "memory");
+ free (ret);
+ return (void *) xmalloc (32);
+}
+
+static int
+do_test (void)
+{
+ pthread_t *thread;
+ struct mallinfo info_before, info_after;
+ void *retval;
+
+ /* This is an arbitrary choice. We choose a total of THREADS
+ threads created and joined. This gives us enough iterations to
+ show a leak. */
+ int threads = 100000;
+
+ /* Avoid there being 0 malloc'd data at this point by allocating the
+ pthread_t required to run the test. */
+ thread = (pthread_t *) xcalloc (1, sizeof (pthread_t));
+
+ info_before = mallinfo ();
+
+ assert (info_before.uordblks != 0);
+
+ printf ("INFO: %d (bytes) are in use before starting threads.\n",
+ info_before.uordblks);
+
+ for (int loop = 0; loop < threads; loop++)
+ {
+ *thread = xpthread_create (NULL, worker, NULL);
+ retval = xpthread_join (*thread);
+ free (retval);
+ }
+
+ info_after = mallinfo ();
+ printf ("INFO: %d (bytes) are in use after all threads joined.\n",
+ info_after.uordblks);
+
+ /* We need to compare the memory in use before and the memory in use
+ after starting and joining THREADS threads. We almost always grow
+ memory slightly, but not much. Consider that if even 1-byte leaked
+ per thread we'd have THREADS bytes of additional memory, and in
+ general the in-use at the start of main is quite low. We will
+ always leak a full malloc chunk, and never just 1-byte, therefore
+ anything above "+ threads" from the start (constant offset) is a
+ leak. Obviously this assumes no thread-related malloc'd internal
+ libc data structures persist beyond the thread death, and any that
+ did would limit the number of times you could call pthread_create,
+ which is a QoI we'd want to detect and fix. */
+ if (info_after.uordblks > (info_before.uordblks + threads))
+ FAIL_EXIT1 ("Memory usage after threads is too high.\n");
+
+ /* Did not detect excessive memory usage. */
+ free (thread);
+ exit (0);
+}
+
+#include <support/test-driver.c>
--
2.9.5
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111]
2017-10-06 16:35 ` Carlos O'Donell
@ 2017-10-06 16:52 ` DJ Delorie
2017-10-06 17:39 ` Carlos O'Donell
0 siblings, 1 reply; 8+ messages in thread
From: DJ Delorie @ 2017-10-06 16:52 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: pinskia, fweimer, libc-alpha
"Carlos O'Donell" <carlos@redhat.com> writes:
> v3 patch attached.
This is OK. Thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111]
2017-10-06 16:52 ` DJ Delorie
@ 2017-10-06 17:39 ` Carlos O'Donell
0 siblings, 0 replies; 8+ messages in thread
From: Carlos O'Donell @ 2017-10-06 17:39 UTC (permalink / raw)
To: DJ Delorie; +Cc: pinskia, fweimer, libc-alpha
On 10/06/2017 09:52 AM, DJ Delorie wrote:
>
> "Carlos O'Donell" <carlos@redhat.com> writes:
>> v3 patch attached.
>
> This is OK. Thanks!
Pushed to master.
I'm going to backport this to 2.26 stable also, because I want
it to land in Fedora 27.
--
Cheers,
Carlos.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-10-06 17:39 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-27 5:45 [PATCH] Bug 22111: Fix malloc tcache leak Carlos O'Donell
2017-09-27 8:57 ` Florian Weimer
2017-09-28 17:32 ` [PATCH v2] malloc: Fix tcache leak on thread destruction [BZ #22111] Carlos O'Donell
2017-10-03 1:36 ` DJ Delorie
2017-10-03 1:43 ` Andrew Pinski
2017-10-06 16:35 ` Carlos O'Donell
2017-10-06 16:52 ` DJ Delorie
2017-10-06 17:39 ` Carlos O'Donell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).