public inbox for libc-alpha@sourceware.org
* [PATCH] malloc: remove dead code
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (4 preceding siblings ...)
  2016-01-26  0:26 ` [PATCH] malloc: unobfuscate an assert Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: remove emacs style guards Joern Engel
                   ` (58 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 105 ------------------------------------------------
 1 file changed, 105 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index f1c7b219a0bd..c366b6085953 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -183,7 +183,6 @@ __declspec(dllexport) void malloc2_13_bogus_symbol()
     HAVE_MEMCPY                defined
     USE_MEMCPY                 1 if HAVE_MEMCPY is defined
     HAVE_MMAP                  defined as 1
-    MMAP_CLEARS                1
     HAVE_MREMAP                0 unless linux defined
     USE_ARENAS                 the same as HAVE_MMAP
     malloc_getpagesize         derived from system #includes, or 4096 if not
@@ -663,20 +662,6 @@ extern Void_t*     sbrk(ptrdiff_t);
 
 #ifndef HAVE_MMAP
 #define HAVE_MMAP 1
-
-/*
-   Standard unix mmap using /dev/zero clears memory so calloc doesn't
-   need to.
-*/
-
-#ifndef MMAP_CLEARS
-#define MMAP_CLEARS 1
-#endif
-
-#else /* no mmap */
-#ifndef MMAP_CLEARS
-#define MMAP_CLEARS 0
-#endif
 #endif
 
 
@@ -3470,38 +3455,6 @@ public_mALLOc(size_t bytes)
     return (*hook)(bytes, RETURN_ADDRESS (0));
 
   arena_lookup(ar_ptr);
-#if 0
-  // XXX We need double-word CAS and fastbins must be extended to also
-  // XXX hold a generation counter for each entry.
-  if (ar_ptr) {
-    INTERNAL_SIZE_T nb;               /* normalized request size */
-    checked_request2size(bytes, nb);
-    if (nb <= get_max_fast ()) {
-      long int idx = fastbin_index(nb);
-      mfastbinptr* fb = &fastbin (ar_ptr, idx);
-      mchunkptr pp = *fb;
-      mchunkptr v;
-      do
-	{
-	  v = pp;
-	  if (v == NULL)
-	    break;
-	}
-      while ((pp = catomic_compare_and_exchange_val_acq (fb, v->fd, v)) != v);
-      if (v != 0) {
-	if (__builtin_expect (fastbin_index (chunksize (v)) != idx, 0))
-	  malloc_printerr (check_action, "malloc(): memory corruption (fast)",
-			   chunk2mem (v));
-	check_remalloced_chunk(ar_ptr, v, nb);
-	void *p = chunk2mem(v);
-	if (__builtin_expect (perturb_byte, 0))
-	  alloc_perturb (p, bytes);
-	return p;
-      }
-    }
-  }
-#endif
-
   arena_lock(ar_ptr, bytes);
   if(!ar_ptr)
     return 0;
@@ -5413,64 +5366,6 @@ _int_memalign(struct malloc_state * av, size_t alignment, size_t bytes)
   return chunk2mem(p);
 }
 
-#if 0
-/*
-  ------------------------------ calloc ------------------------------
-*/
-
-Void_t* cALLOc(size_t n_elements, size_t elem_size)
-{
-  mchunkptr p;
-  unsigned long clearsize;
-  unsigned long nclears;
-  INTERNAL_SIZE_T* d;
-
-  Void_t* mem = mALLOc(n_elements * elem_size);
-
-  if (mem != 0) {
-    p = mem2chunk(mem);
-
-#if MMAP_CLEARS
-    if (!chunk_is_mmapped(p)) /* don't need to clear mmapped space */
-#endif
-    {
-      /*
-	Unroll clear of <= 36 bytes (72 if 8byte sizes)
-	We know that contents have an odd number of
-	INTERNAL_SIZE_T-sized words; minimally 3.
-      */
-
-      d = (INTERNAL_SIZE_T*)mem;
-      clearsize = chunksize(p) - SIZE_SZ;
-      nclears = clearsize / sizeof(INTERNAL_SIZE_T);
-      assert(nclears >= 3);
-
-      if (nclears > 9)
-	MALLOC_ZERO(d, clearsize);
-
-      else {
-	*(d+0) = 0;
-	*(d+1) = 0;
-	*(d+2) = 0;
-	if (nclears > 4) {
-	  *(d+3) = 0;
-	  *(d+4) = 0;
-	  if (nclears > 6) {
-	    *(d+5) = 0;
-	    *(d+6) = 0;
-	    if (nclears > 8) {
-	      *(d+7) = 0;
-	      *(d+8) = 0;
-	    }
-	  }
-	}
-      }
-    }
-  }
-  return mem;
-}
-#endif /* 0 */
-
 #ifndef _LIBC
 /*
   ------------------------- independent_calloc -------------------------
-- 
2.7.0.rc3

* malloc: performance improvements and bugfixes
@ 2016-01-26  0:26 Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: push down the memset for huge pages Joern Engel
                   ` (64 more replies)
  0 siblings, 65 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Short version:
We have forked libc malloc and added a bunch of patches on top.  Some
patches help performance, some fix bugs, many just change the code to
my personal liking.  Here is a braindump that is _not_ intended to be
merged, at least not as-is.  But individual bits could and should get
extracted.

Long version:
When upgrading glibc from 2.13 to a newer version, we started hitting
OOM bugs.  These were caused by enabling PER_THREAD on newer versions.
We split malloc-2.13 from our previous libc and used that instead.
The beginning of a fork.

Later we found various other problems.  Since we now owned the code,
we made use of it.  Overall our version is roughly on par with
jemalloc, while libc malloc gets replaced by most projects that care
about performance and use multithreading.

Some of our changes may be completely unpalatable to libc.  I made no
distinction and give you the entire list - if only to see what some
people might care about.


Rough list:

Use Lindent and unifdef.
I happen to prefer the kernel coding style over GNU coding style.
These only helped me read the code and make changes, but are
absolutely no upstream material.  Sorry about the noise.

Revert PER_THREAD.
Per-thread arenas are an exquisitely bad idea.  If a thread uses a lot
of memory, then frees most of it, malloc will hang on to the freed memory and
neither return it to the system nor use it for other threads.

Remove mprotect
While I admit that some people might care about commit charge, I wager
that most people don't and in particular we don't.  The way malloc
uses mprotect turned the mmap_sem into the single worst lock inside
the Linux kernel.  Removing mprotect mostly fixed that.
Mprotect also triggers bad behaviour in the kernel VM.  Far more VMAs
get created, and after reaching the 64k VMA limit the kernel refuses
further mmap() calls for our process.  We effectively run out of memory
with gigabytes of free
memory available to the system.

Use hugepages
In our project hugepages have become a necessity for low latency.
Transparent hugepages aren't good enough, so we have to deal with them
explicitly.  Probably not upstream-material.

Cleanup of arena_get macros
Removes duplicate (and buggy) code and simplifies the logic.  Existing
code outgrew the size where macros may have made sense.

NUMA support
Once you have a NUMA system, this helps a lot.  Currently does a
syscall for every allocation.  Surprisingly the syscall hardly shows
up in profiles and the benefits clearly dominate.  If libc exposed the
vdso-version of getcpu(), that would be much nicer.
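
For reference, the per-allocation lookup is just a getcpu() syscall
that falls back to node 0 on error.  A standalone sketch, mirroring the
getnode() helper in the arena_get() patch elsewhere in this series
(SYS_getcpu has no glibc wrapper here, so it goes through syscall(2)):

/* Standalone sketch of the per-allocation NUMA node lookup; mirrors
 * the getnode() helper added by the arena patches in this series. */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

static inline int getnode(void)
{
	int node, ret;

	ret = syscall(SYS_getcpu, NULL, &node, NULL);
	return (ret == -1) ? 0 : node;	/* fall back to node 0 on error */
}

int main(void)
{
	printf("allocating thread runs on NUMA node %d\n", getnode());
	return 0;
}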

Remove __builtin_expect
I benchmarked the effect.  Even if I reversed the logic and marked
unlikely branches likely and vice versa, there was absolutely no
measurable effect.  Filed under cargo cult and removed.

Revert 1d05c2fb9c6f (Val and Arjan's change to dynamically grow heaps)
I couldn't figure out how the logic actually worked.  While I might
not be the best programmer in the world, I find that disturbing for
what is conceptually such a simple change.  Hence,...

Removed ATOMIC_FASTBINS
Not sure if this was a good change, but the atomic_free_list
(below) recovered the performance, covers more than just fastbins and
is simpler code.

Added a thread cache
A per-thread cache gives most of the performance benefits of
per-thread arenas without the drawback of memory bloat.  128k is less
than most people's stack consumption, so the cost should be
acceptable.
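
Purely as an illustration of the idea - the struct layout, bin count
and names below are invented here and are not the code from our tree -
a thread cache is a per-thread array of free lists consulted before any
arena lock is taken, with a byte cap so it cannot grow past the 128k
mentioned above:

/* Illustration only; not the implementation from the series. */
#include <stddef.h>

#define TCACHE_BINS      64
#define TCACHE_MAX_BYTES (128 * 1024)

struct tcache_chunk {
	struct tcache_chunk *next;
};

struct thread_cache {
	size_t bytes_cached;			/* evict to the arena past the cap */
	struct tcache_chunk *bin[TCACHE_BINS];	/* one free list per size class */
};

static __thread struct thread_cache tcache;

/* malloc() fast path: pop from the thread-local bin, no locking. */
static void *tcache_alloc(int bin_index, size_t chunk_size)
{
	struct tcache_chunk *c = tcache.bin[bin_index];

	if (!c)
		return NULL;		/* miss: fall back to the arena path */
	tcache.bin[bin_index] = c->next;
	tcache.bytes_cached -= chunk_size;
	return c;
}

/* free() fast path: push unless that would exceed the per-thread cap. */
static int tcache_free(void *mem, int bin_index, size_t chunk_size)
{
	struct tcache_chunk *c = mem;

	if (tcache.bytes_cached + chunk_size > TCACHE_MAX_BYTES)
		return 0;		/* caller frees to the arena instead */
	c->next = tcache.bin[bin_index];
	tcache.bin[bin_index] = c;
	tcache.bytes_cached += chunk_size;
	return 1;
}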

Added atomic_free_list
Makes free() lockless.  If the arena is locked, we just push memory to
the atomic_free_list and let someone else do the actual free later.
Before this change we had an average of three (3) threads blocked on
an arena lock in the stable state.
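
A minimal sketch of the idea, written with C11 atomics rather than the
glibc-internal catomic helpers our tree uses, and with invented
function names: if the arena mutex is busy, free() pushes the chunk
onto a lock-free stack and whoever next holds the lock drains it.

/* Sketch only; the series uses catomic_* helpers, but the shape is
 * the same: free() never blocks on the arena mutex. */
#include <stdatomic.h>
#include <stddef.h>

struct free_node {
	struct free_node *next;
};

static _Atomic(struct free_node *) atomic_free_list;

/* Called from free() when mutex_trylock() on the arena fails. */
static void defer_free(struct free_node *p)
{
	struct free_node *head = atomic_load(&atomic_free_list);

	do {
		p->next = head;
		/* on failure, head is reloaded with the current list head */
	} while (!atomic_compare_exchange_weak(&atomic_free_list, &head, p));
}

/* Called by whoever next takes the arena lock: detach the whole list
   and hand each chunk to the normal free path. */
static struct free_node *drain_deferred(void)
{
	return atomic_exchange(&atomic_free_list, NULL);
}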

Fix startup race
I suppose no one ever hit this because the main_arena initialized so
darn fast that they always won the race.  I changed the timings,
mostly with NUMA code, and started losing.

Simplify calloc
I believe the same also happened upstream and later got reverted.  I
couldn't find the rationale for the revert and find it dodgy.
Technically the existing version of calloc can be faster early on, but
not for long-running processes in the stable state.  And once I found
bugs in calloc I couldn't be arsed to debug them and just removed most
of the code.

Made malloc signal-safe
I think malloc() was always signal-safe, but free() wasn't.  It isn't
hard to trigger this in a testcase.  Our version survives such a test,
mostly because of the atomic_free_list.
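
Not our actual testcase, but a sketch of the kind of test meant here:
a periodic signal handler calling free() while the main loop keeps
allocating and freeing.  This is of course undefined behaviour by POSIX
rules, which is exactly what makes it a useful torture test for malloc.

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void * volatile pending;		/* handed from the loop to the handler */

static void handler(int sig)
{
	(void)sig;
	free(pending);			/* free() from signal context */
	pending = NULL;
}

int main(void)
{
	signal(SIGALRM, handler);
	ualarm(1000, 1000);		/* fire roughly every millisecond */

	for (long i = 0; i < 10 * 1000 * 1000; i++) {
		if (!pending)
			pending = malloc(64);
		free(malloc(128));	/* keep the allocator busy */
	}
	return 0;
}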

Fix calculation of aligned heaps
Looks like this was always buggy.  Is that correct or was I misreading
the code?

Remove hooks
I don't understand what problem they were supposed to solve.  Our
project doesn't seem to need them and I have testcases that break
because of the hooks.


If any of this looks interesting for upstream and you have questions,
feel free to pester me.

And maybe as a closing note, I believe there are some applications that
have deeper knowledge about malloc-internal data structures than they
should (*cough*emacs).  As a result it has become impossible to change
the internals of malloc without breaking said applications and libc
malloc has ossified.

At this point, either a handful of applications need to ship the
ossified version of malloc or Everything Else(tm) has to switch to a
better version of malloc.  The reality we live in has everything else
ship tcmalloc, jemalloc or somesuch and libc malloc is slowly becoming
irrelevant and the butt of hallway jokes.  I don't find this reality
very desirable, and yet here we are.

Jörn

* [PATCH] malloc: use MAP_HUGETLB when possible
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: push down the memset for huge pages Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: turn arena_get() into a function Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: Lindent new_heap Joern Engel
                   ` (61 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Replicates the logic from purity to try huge pages first, then fall back
to small pages.  Care must be taken not to combine MAP_HUGETLB with
MAP_NORESERVE, as the result is a successful mmap() followed by SIGBUS
when accessing the memory.

More care must be taken to memset() the returned memory, as our kernel
does not clear it and malloc assumes cleared memory.  There is optimization
potential, as memset(128GB) takes around 20ms.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.ch | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tpc/malloc2.13/arena.ch b/tpc/malloc2.13/arena.ch
index 3e778f3f96f7..fae6c2f7ee4c 100644
--- a/tpc/malloc2.13/arena.ch
+++ b/tpc/malloc2.13/arena.ch
@@ -687,6 +687,20 @@ dump_heap(heap) heap_info *heap;
    multiple threads, but only one will succeed.  */
 static char *aligned_heap_area;
 
+static void *mmap_for_heap(void *addr, size_t length)
+{
+	int prot = PROT_READ | PROT_WRITE;
+	int flags = MAP_PRIVATE;
+	void *ret;
+
+	ret = MMAP(addr, length, prot, flags | MAP_HUGETLB);
+	if (ret != MAP_FAILED) {
+		memset(ret, 0, length);
+		return ret;
+	}
+	return MMAP(addr, length, prot, flags | MAP_NORESERVE);
+}
+
 /* Create a new heap.  size is automatically rounded up to a multiple
    of the page size. */
 
@@ -719,8 +733,7 @@ new_heap(size, top_pad) size_t size, top_pad;
      anyway). */
   p2 = MAP_FAILED;
   if(aligned_heap_area) {
-    p2 = (char *)MMAP(aligned_heap_area, HEAP_MAX_SIZE, PROT_READ|PROT_WRITE,
-		      MAP_PRIVATE|MAP_NORESERVE);
+    p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE);
     aligned_heap_area = NULL;
     if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE-1))) {
       munmap(p2, HEAP_MAX_SIZE);
@@ -728,8 +741,7 @@ new_heap(size, top_pad) size_t size, top_pad;
     }
   }
   if(p2 == MAP_FAILED) {
-    p1 = (char *)MMAP(0, HEAP_MAX_SIZE<<1, PROT_READ|PROT_WRITE,
-		      MAP_PRIVATE|MAP_NORESERVE);
+    p1 = mmap_for_heap(0, HEAP_MAX_SIZE<<1);
     if(p1 != MAP_FAILED) {
       p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE-1))
 		    & ~(HEAP_MAX_SIZE-1));
@@ -742,7 +754,7 @@ new_heap(size, top_pad) size_t size, top_pad;
     } else {
       /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
 	 is already aligned. */
-      p2 = (char *)MMAP(0, HEAP_MAX_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE);
+      p2 = mmap_for_heap(0, HEAP_MAX_SIZE);
       if(p2 == MAP_FAILED)
 	return 0;
       if((unsigned long)p2 & (HEAP_MAX_SIZE-1)) {
-- 
2.7.0.rc3

* [PATCH] malloc: Lindent new_heap
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (2 preceding siblings ...)
  2016-01-26  0:26 ` [PATCH] malloc: use MAP_HUGETLB when possible Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: unobfuscate an assert Joern Engel
                   ` (60 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Cleanup before touching the function some more

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.ch | 121 +++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 64 deletions(-)

diff --git a/tpc/malloc2.13/arena.ch b/tpc/malloc2.13/arena.ch
index fae6c2f7ee4c..372dc7ced2b9 100644
--- a/tpc/malloc2.13/arena.ch
+++ b/tpc/malloc2.13/arena.ch
@@ -703,71 +703,64 @@ static void *mmap_for_heap(void *addr, size_t length)
 
 /* Create a new heap.  size is automatically rounded up to a multiple
    of the page size. */
-
-static heap_info *
-internal_function
-#if __STD_C
-new_heap(size_t size, size_t top_pad)
-#else
-new_heap(size, top_pad) size_t size, top_pad;
-#endif
+static heap_info *new_heap(size_t size, size_t top_pad)
 {
-  size_t page_mask = malloc_getpagesize - 1;
-  char *p1, *p2;
-  unsigned long ul;
-  heap_info *h;
-
-  if(size+top_pad < HEAP_MIN_SIZE)
-    size = HEAP_MIN_SIZE;
-  else if(size+top_pad <= HEAP_MAX_SIZE)
-    size += top_pad;
-  else if(size > HEAP_MAX_SIZE)
-    return 0;
-  else
-    size = HEAP_MAX_SIZE;
-  size = (size + page_mask) & ~page_mask;
-
-  /* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
-     No swap space needs to be reserved for the following large
-     mapping (on Linux, this is the case for all non-writable mappings
-     anyway). */
-  p2 = MAP_FAILED;
-  if(aligned_heap_area) {
-    p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE);
-    aligned_heap_area = NULL;
-    if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE-1))) {
-      munmap(p2, HEAP_MAX_SIZE);
-      p2 = MAP_FAILED;
-    }
-  }
-  if(p2 == MAP_FAILED) {
-    p1 = mmap_for_heap(0, HEAP_MAX_SIZE<<1);
-    if(p1 != MAP_FAILED) {
-      p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE-1))
-		    & ~(HEAP_MAX_SIZE-1));
-      ul = p2 - p1;
-      if (ul)
-	munmap(p1, ul);
-      else
-	aligned_heap_area = p2 + HEAP_MAX_SIZE;
-      munmap(p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
-    } else {
-      /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-	 is already aligned. */
-      p2 = mmap_for_heap(0, HEAP_MAX_SIZE);
-      if(p2 == MAP_FAILED)
-	return 0;
-      if((unsigned long)p2 & (HEAP_MAX_SIZE-1)) {
-	munmap(p2, HEAP_MAX_SIZE);
-	return 0;
-      }
-    }
-  }
-  h = (heap_info *)p2;
-  h->size = size;
-  h->mprotect_size = size;
-  THREAD_STAT(stat_n_heaps++);
-  return h;
+	size_t page_mask = malloc_getpagesize - 1;
+	char *p1, *p2;
+	unsigned long ul;
+	heap_info *h;
+
+	if (size + top_pad < HEAP_MIN_SIZE)
+		size = HEAP_MIN_SIZE;
+	else if (size + top_pad <= HEAP_MAX_SIZE)
+		size += top_pad;
+	else if (size > HEAP_MAX_SIZE)
+		return 0;
+	else
+		size = HEAP_MAX_SIZE;
+	size = (size + page_mask) & ~page_mask;
+
+	/* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
+	   No swap space needs to be reserved for the following large
+	   mapping (on Linux, this is the case for all non-writable mappings
+	   anyway). */
+	p2 = MAP_FAILED;
+	if (aligned_heap_area) {
+		p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE);
+		aligned_heap_area = NULL;
+		if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE - 1))) {
+			munmap(p2, HEAP_MAX_SIZE);
+			p2 = MAP_FAILED;
+		}
+	}
+	if (p2 == MAP_FAILED) {
+		p1 = mmap_for_heap(0, HEAP_MAX_SIZE << 1);
+		if (p1 != MAP_FAILED) {
+			p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE - 1))
+				      & ~(HEAP_MAX_SIZE - 1));
+			ul = p2 - p1;
+			if (ul)
+				munmap(p1, ul);
+			else
+				aligned_heap_area = p2 + HEAP_MAX_SIZE;
+			munmap(p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
+		} else {
+			/* Try to take the chance that an allocation of only HEAP_MAX_SIZE
+			   is already aligned. */
+			p2 = mmap_for_heap(0, HEAP_MAX_SIZE);
+			if (p2 == MAP_FAILED)
+				return 0;
+			if ((unsigned long)p2 & (HEAP_MAX_SIZE - 1)) {
+				munmap(p2, HEAP_MAX_SIZE);
+				return 0;
+			}
+		}
+	}
+	h = (heap_info *) p2;
+	h->size = size;
+	h->mprotect_size = size;
+	THREAD_STAT(stat_n_heaps++);
+	return h;
 }
 
 /* Grow a heap.  size is automatically rounded up to a
-- 
2.7.0.rc3

* [PATCH] malloc: turn arena_get() into a function
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: push down the memset for huge pages Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: use MAP_HUGETLB when possible Joern Engel
                   ` (62 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Macros may have made sense in the '90s.  Not anymore.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 79 ++++++++++++++++++++++---------------------------
 tpc/malloc2.13/malloc.c | 14 ++++-----
 2 files changed, 43 insertions(+), 50 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 47ecf9421a46..0a269715004c 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -94,46 +94,6 @@ static int __malloc_initialized = -1;
 
 /**************************************************************************/
 
-/*
- * Calling getcpu() for every allocation is too expensive - but we can turn
- * the syscall into a pointer dereference to a kernel shared memory page.
- */
-#include <sys/syscall.h>
-static inline int getnode(void)
-{
-	int node, ret;
-	ret = syscall(SYS_getcpu, NULL, &node, NULL);
-	return (ret == -1) ? 0 : node;
-}
-
-/* arena_get() acquires an arena and locks the corresponding mutex.
-   First, try the one last locked successfully by this thread.  (This
-   is the common case and handled with a macro for speed.)  Then, loop
-   once over the circularly linked list of arenas.  If no arena is
-   readily available, create a new one.  In this latter case, `size'
-   is just a hint as to how much memory will be required immediately
-   in the new arena. */
-
-#define arena_get(ptr, size) do { \
-	arena_lookup(ptr); \
-	arena_lock(ptr, size); \
-} while(0)
-
-#define arena_lookup(ptr) do { \
-	Void_t *vptr = NULL; \
-	int node = getnode(); \
-	ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
-	if (!ptr || ptr->numa_node != node) \
-		ptr = numa_arena[node]; \
-} while(0)
-
-#define arena_lock(ptr, size) do { \
-	if(ptr && !mutex_trylock(&ptr->mutex)) { \
-		THREAD_STAT(++(ptr->stat_lock_direct)); \
-	} else \
-		ptr = arena_get2(ptr, (size)); \
-} while(0)
-
 /* find the heap and corresponding arena for a given ptr */
 
 #define heap_for_ptr(ptr) \
@@ -702,9 +662,7 @@ static struct malloc_state *_int_new_arena(size_t size, int numa_node)
 	return a;
 }
 
-
-
-static struct malloc_state *internal_function arena_get2(struct malloc_state *a_tsd, size_t size)
+static struct malloc_state *arena_get2(struct malloc_state *a_tsd, size_t size)
 {
 	struct malloc_state *a;
 
@@ -746,3 +704,38 @@ static struct malloc_state *internal_function arena_get2(struct malloc_state *a_
 
 	return a;
 }
+
+/*
+ * Calling getcpu() for every allocation is too expensive - but we can turn
+ * the syscall into a pointer dereference to a kernel shared memory page.
+ */
+#include <sys/syscall.h>
+static inline int getnode(void)
+{
+	int node, ret;
+	ret = syscall(SYS_getcpu, NULL, &node, NULL);
+	return (ret == -1) ? 0 : node;
+}
+
+/* arena_get() acquires an arena and locks the corresponding mutex.
+   First, try the one last locked successfully by this thread.  (This
+   is the common case and handled with a macro for speed.)  Then, loop
+   once over the circularly linked list of arenas.  If no arena is
+   readily available, create a new one.  In this latter case, `size'
+   is just a hint as to how much memory will be required immediately
+   in the new arena. */
+
+static struct malloc_state *arena_get(size_t size)
+{
+	struct malloc_state *arena = NULL;
+	int node = getnode();
+
+	arena = pthread_getspecific(arena_key);
+	if (!arena || arena->numa_node != node)
+		arena = numa_arena[node];
+	if (arena && !mutex_trylock(&arena->mutex)) {
+		THREAD_STAT(++(arena->stat_lock_direct));
+	} else
+		arena = arena_get2(arena, size);
+	return arena;
+}
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 11f050acebb1..b86f0c3ff65c 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3354,7 +3354,7 @@ Void_t *public_mALLOc(size_t bytes)
 	if (__builtin_expect(hook != NULL, 0))
 		return (*hook) (bytes, RETURN_ADDRESS(0));
 
-	arena_get(ar_ptr, bytes);
+	ar_ptr = arena_get(bytes);
 	if (!ar_ptr)
 		return 0;
 	victim = _int_malloc(ar_ptr, bytes);
@@ -3543,7 +3543,7 @@ Void_t *public_mEMALIGn(size_t alignment, size_t bytes)
 	if (alignment < MINSIZE)
 		alignment = MINSIZE;
 
-	arena_get(ar_ptr, bytes + alignment + MINSIZE);
+	ar_ptr = arena_get(bytes + alignment + MINSIZE);
 	if (!ar_ptr)
 		return 0;
 	p = _int_memalign(ar_ptr, alignment, bytes);
@@ -3570,7 +3570,7 @@ Void_t *public_vALLOc(size_t bytes)
 	if (__builtin_expect(hook != NULL, 0))
 		return (*hook) (pagesz, bytes, RETURN_ADDRESS(0));
 
-	arena_get(ar_ptr, bytes + pagesz + MINSIZE);
+	ar_ptr = arena_get(bytes + pagesz + MINSIZE);
 	if (!ar_ptr)
 		return 0;
 	p = _int_valloc(ar_ptr, bytes);
@@ -3601,7 +3601,7 @@ Void_t *public_pVALLOc(size_t bytes)
 	if (__builtin_expect(hook != NULL, 0))
 		return (*hook) (pagesz, rounded_bytes, RETURN_ADDRESS(0));
 
-	arena_get(ar_ptr, bytes + 2 * pagesz + MINSIZE);
+	ar_ptr = arena_get(bytes + 2 * pagesz + MINSIZE);
 	p = _int_pvalloc(ar_ptr, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
 	if (!p) {
@@ -3652,7 +3652,7 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 
 	sz = bytes;
 
-	arena_get(av, sz);
+	av = arena_get(sz);
 	if (!av)
 		return 0;
 
@@ -3739,7 +3739,7 @@ public_iCALLOc(size_t n, size_t elem_size, Void_t** chunks)
   struct malloc_state * ar_ptr;
   Void_t** m;
 
-  arena_get(ar_ptr, n*elem_size);
+  ar_ptr = arena_get(n*elem_size);
   if(!ar_ptr)
     return 0;
 
@@ -3754,7 +3754,7 @@ public_iCOMALLOc(size_t n, size_t sizes[], Void_t** chunks)
   struct malloc_state * ar_ptr;
   Void_t** m;
 
-  arena_get(ar_ptr, 0);
+  ar_ptr = arena_get(0);
   if(!ar_ptr)
     return 0;
 
-- 
2.7.0.rc3

* [PATCH] malloc: kill mprotect
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (7 preceding siblings ...)
  2016-01-26  0:26 ` [PATCH] malloc: unifdef -m -Ulibc_hidden_def Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: remove mstate typedef Joern Engel
                   ` (55 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

The only reason to use mprotect is to reduce the "commit charge", i.e.
the amount of memory we are charged for when using overcommit_memory=2
(don't overcommit).  We want to overcommit, so there is zero benefit to
us from using mprotect.  But the cost is significant.

In a microbenchmark I got a 9x speed improvement from this change.  How
much of that remains for foed is unclear, but a typical pattern was to
call mprotect() on 4k chunks, which took mmap_sem for writing each
time, so contention on mmap_sem should be noticeably reduced.

shrink_heap() used to call mmap(PROT_NONE) for our build parameters.
Change that to calling madvise(MADV_DONTNEED) instead, as mmap(PROT_NONE)
would now cause a segfault and mmap(PROT_READ|PROT_WRITE) would be an
expensive noop.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.ch | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/tpc/malloc2.13/arena.ch b/tpc/malloc2.13/arena.ch
index 387476db2a68..3e778f3f96f7 100644
--- a/tpc/malloc2.13/arena.ch
+++ b/tpc/malloc2.13/arena.ch
@@ -719,7 +719,7 @@ new_heap(size, top_pad) size_t size, top_pad;
      anyway). */
   p2 = MAP_FAILED;
   if(aligned_heap_area) {
-    p2 = (char *)MMAP(aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
+    p2 = (char *)MMAP(aligned_heap_area, HEAP_MAX_SIZE, PROT_READ|PROT_WRITE,
 		      MAP_PRIVATE|MAP_NORESERVE);
     aligned_heap_area = NULL;
     if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE-1))) {
@@ -728,7 +728,7 @@ new_heap(size, top_pad) size_t size, top_pad;
     }
   }
   if(p2 == MAP_FAILED) {
-    p1 = (char *)MMAP(0, HEAP_MAX_SIZE<<1, PROT_NONE,
+    p1 = (char *)MMAP(0, HEAP_MAX_SIZE<<1, PROT_READ|PROT_WRITE,
 		      MAP_PRIVATE|MAP_NORESERVE);
     if(p1 != MAP_FAILED) {
       p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE-1))
@@ -742,7 +742,7 @@ new_heap(size, top_pad) size_t size, top_pad;
     } else {
       /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
 	 is already aligned. */
-      p2 = (char *)MMAP(0, HEAP_MAX_SIZE, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE);
+      p2 = (char *)MMAP(0, HEAP_MAX_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE);
       if(p2 == MAP_FAILED)
 	return 0;
       if((unsigned long)p2 & (HEAP_MAX_SIZE-1)) {
@@ -751,10 +751,6 @@ new_heap(size, top_pad) size_t size, top_pad;
       }
     }
   }
-  if(mprotect(p2, size, PROT_READ|PROT_WRITE) != 0) {
-    munmap(p2, HEAP_MAX_SIZE);
-    return 0;
-  }
   h = (heap_info *)p2;
   h->size = size;
   h->mprotect_size = size;
@@ -780,10 +776,6 @@ grow_heap(h, diff) heap_info *h; long diff;
   if((unsigned long) new_size > (unsigned long) HEAP_MAX_SIZE)
     return -1;
   if((unsigned long) new_size > h->mprotect_size) {
-    if (mprotect((char *)h + h->mprotect_size,
-		 (unsigned long) new_size - h->mprotect_size,
-		 PROT_READ|PROT_WRITE) != 0)
-      return -2;
     h->mprotect_size = new_size;
   }
 
@@ -807,21 +799,7 @@ shrink_heap(h, diff) heap_info *h; long diff;
     return -1;
   /* Try to re-map the extra heap space freshly to save memory, and
      make it inaccessible. */
-#ifdef _LIBC
-  if (__builtin_expect (__libc_enable_secure, 0))
-#else
-  if (1)
-#endif
-    {
-      if((char *)MMAP((char *)h + new_size, diff, PROT_NONE,
-		      MAP_PRIVATE|MAP_FIXED) == (char *) MAP_FAILED)
-	return -2;
-      h->mprotect_size = new_size;
-    }
-#ifdef _LIBC
-  else
-    madvise ((char *)h + new_size, diff, MADV_DONTNEED);
-#endif
+  madvise ((char *)h + new_size, diff, MADV_DONTNEED);
   /*fprintf(stderr, "shrink %p %08lx\n", h, new_size);*/
 
   h->size = new_size;
-- 
2.7.0.rc3

* [PATCH] malloc: push down the memset for huge pages
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: turn arena_get() into a function Joern Engel
                   ` (63 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

mmap tends to return unaligned memory, so malloc maps twice the required
memory and trims off the unaligned bits.  That means we memset twice as
much as necessary.  Fix that.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.ch | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tpc/malloc2.13/arena.ch b/tpc/malloc2.13/arena.ch
index 372dc7ced2b9..09bdf0fd26b5 100644
--- a/tpc/malloc2.13/arena.ch
+++ b/tpc/malloc2.13/arena.ch
@@ -687,7 +687,7 @@ dump_heap(heap) heap_info *heap;
    multiple threads, but only one will succeed.  */
 static char *aligned_heap_area;
 
-static void *mmap_for_heap(void *addr, size_t length)
+static void *mmap_for_heap(void *addr, size_t length, int *must_clear)
 {
 	int prot = PROT_READ | PROT_WRITE;
 	int flags = MAP_PRIVATE;
@@ -695,9 +695,10 @@ static void *mmap_for_heap(void *addr, size_t length)
 
 	ret = MMAP(addr, length, prot, flags | MAP_HUGETLB);
 	if (ret != MAP_FAILED) {
-		memset(ret, 0, length);
+		*must_clear = 1;
 		return ret;
 	}
+	*must_clear = 0;
 	return MMAP(addr, length, prot, flags | MAP_NORESERVE);
 }
 
@@ -709,6 +710,7 @@ static heap_info *new_heap(size_t size, size_t top_pad)
 	char *p1, *p2;
 	unsigned long ul;
 	heap_info *h;
+	int must_clear;
 
 	if (size + top_pad < HEAP_MIN_SIZE)
 		size = HEAP_MIN_SIZE;
@@ -726,7 +728,7 @@ static heap_info *new_heap(size_t size, size_t top_pad)
 	   anyway). */
 	p2 = MAP_FAILED;
 	if (aligned_heap_area) {
-		p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE);
+		p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE, &must_clear);
 		aligned_heap_area = NULL;
 		if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE - 1))) {
 			munmap(p2, HEAP_MAX_SIZE);
@@ -734,7 +736,7 @@ static heap_info *new_heap(size_t size, size_t top_pad)
 		}
 	}
 	if (p2 == MAP_FAILED) {
-		p1 = mmap_for_heap(0, HEAP_MAX_SIZE << 1);
+		p1 = mmap_for_heap(0, HEAP_MAX_SIZE << 1, &must_clear);
 		if (p1 != MAP_FAILED) {
 			p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE - 1))
 				      & ~(HEAP_MAX_SIZE - 1));
@@ -747,7 +749,7 @@ static heap_info *new_heap(size_t size, size_t top_pad)
 		} else {
 			/* Try to take the chance that an allocation of only HEAP_MAX_SIZE
 			   is already aligned. */
-			p2 = mmap_for_heap(0, HEAP_MAX_SIZE);
+			p2 = mmap_for_heap(0, HEAP_MAX_SIZE, &must_clear);
 			if (p2 == MAP_FAILED)
 				return 0;
 			if ((unsigned long)p2 & (HEAP_MAX_SIZE - 1)) {
@@ -756,6 +758,8 @@ static heap_info *new_heap(size_t size, size_t top_pad)
 			}
 		}
 	}
+	if (must_clear)
+		memset(p2, 0, HEAP_MAX_SIZE);
 	h = (heap_info *) p2;
 	h->size = size;
 	h->mprotect_size = size;
-- 
2.7.0.rc3

* [PATCH] malloc: remove emacs style guards
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (5 preceding siblings ...)
  2016-01-26  0:26 ` [PATCH] malloc: remove dead code Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: unifdef -m -Ulibc_hidden_def Joern Engel
                   ` (57 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 8 --------
 tpc/malloc2.13/hooks.h  | 6 ------
 tpc/malloc2.13/malloc.c | 5 -----
 3 files changed, 19 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index b8fc5c99a1cd..47ecf9421a46 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -746,11 +746,3 @@ static struct malloc_state *internal_function arena_get2(struct malloc_state *a_
 
 	return a;
 }
-
-
-
-/*
- * Local variables:
- * c-basic-offset: 2
- * End:
- */
diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
index 209544da5377..c236aab89237 100644
--- a/tpc/malloc2.13/hooks.h
+++ b/tpc/malloc2.13/hooks.h
@@ -611,9 +611,3 @@ public_sET_STATe(Void_t* msptr)
   (void)mutex_unlock(&main_arena.mutex);
   return 0;
 }
-
-/*
- * Local variables:
- * c-basic-offset: 2
- * End:
- */
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 4bc6247d910e..11f050acebb1 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -5998,9 +5998,4 @@ History:
 [see ftp://g.oswego.edu/pub/misc/malloc.c for the history of dlmalloc]
 
 */
-/*
- * Local variables:
- * c-basic-offset: 2
- * End:
- */
 #endif /* WIN32 */
-- 
2.7.0.rc3

* [PATCH] malloc: unifdef -m -Ulibc_hidden_def
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (6 preceding siblings ...)
  2016-01-26  0:26 ` [PATCH] malloc: remove emacs style guards Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: kill mprotect Joern Engel
                   ` (56 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 4fff268316ed..94b55241d3bf 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3438,9 +3438,6 @@ public_mALLOc(size_t bytes)
 	 ar_ptr == arena_for_chunk(mem2chunk(victim)));
   return victim;
 }
-#ifdef libc_hidden_def
-libc_hidden_def(public_mALLOc)
-#endif
 
 void
 public_fREe(Void_t* mem)
@@ -3492,9 +3489,6 @@ public_fREe(Void_t* mem)
   (void)mutex_unlock(&ar_ptr->mutex);
 #endif
 }
-#ifdef libc_hidden_def
-libc_hidden_def (public_fREe)
-#endif
 
 Void_t*
 public_rEALLOc(Void_t* oldmem, size_t bytes)
@@ -3603,9 +3597,6 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
 
   return newp;
 }
-#ifdef libc_hidden_def
-libc_hidden_def (public_rEALLOc)
-#endif
 
 Void_t*
 public_mEMALIGn(size_t alignment, size_t bytes)
@@ -3653,9 +3644,6 @@ public_mEMALIGn(size_t alignment, size_t bytes)
 	 ar_ptr == arena_for_chunk(mem2chunk(p)));
   return p;
 }
-#ifdef libc_hidden_def
-libc_hidden_def (public_mEMALIGn)
-#endif
 
 Void_t*
 public_vALLOc(size_t bytes)
-- 
2.7.0.rc3

* [PATCH] malloc: unobfuscate an assert
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (3 preceding siblings ...)
  2016-01-26  0:26 ` [PATCH] malloc: Lindent new_heap Joern Engel
@ 2016-01-26  0:26 ` Joern Engel
  2016-01-26  0:26 ` [PATCH] malloc: remove dead code Joern Engel
                   ` (59 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:26 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

The no-op assert definition discarded its argument, so variables used
only in asserts triggered 'variable set but not used' style warnings.
It is better to prevent them by changing the assert define than by
littering all callsites with workarounds.
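
To make the warning concrete - a sketch with a made-up helper, not code
from the tree: with the old '#define assert(x) ((void)0)' the argument
is thrown away, so gcc's -Wunused-but-set-variable fires on 'err' below.
With the new define the expression still appears in dead code, so the
warning goes away, and the short-circuit '0 &&' keeps it from being
evaluated at runtime.

#undef assert
#define assert(x) if (0 && !(x)) { ; }	/* the new no-op definition */

static int do_something(void) { return 0; }	/* hypothetical helper */

static void example(void)
{
	int err = do_something();
	assert(!err);	/* only "use" of err when asserts are compiled out */
}

int main(void)
{
	example();
	return 0;
}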

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 3 +--
 tpc/malloc2.13/malloc.c | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index d038565f84d3..85373466928f 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -486,8 +486,7 @@ static void mbind_memory(void *mem, size_t size, int node)
 
 	assert(max_node < sizeof(unsigned long));
 	err = mbind(mem, size, MPOL_PREFERRED, &node_mask, max_node, MPOL_F_STATIC_NODES);
-	if (err)
-		assert(!err);
+	assert(!err);
 }
 
 /* Create a new heap.  size is automatically rounded up to a multiple
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 461621c11250..0a71065a7b90 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -316,7 +316,7 @@ extern "C" {
 #include <assert.h>
 #else
 #undef  assert
-#define assert(x) ((void)0)
+#define assert(x) if (0 && !(x)) { ; }
 #endif
 
 /*
-- 
2.7.0.rc3

* [PATCH] malloc: remove __builtin_expect
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (10 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: Lindent before functional changes Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  7:56   ` Yury Gribov
  2016-01-26  0:27 ` [PATCH] malloc: s/max_node/num_nodes/ Joern Engel
                   ` (52 subsequent siblings)
  64 siblings, 1 reply; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

__builtin_expect was #defined away in this tree anyway and only served
as obfuscation.  No change post-compilation.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 112 ++++++++++++++++++++++++------------------------
 1 file changed, 55 insertions(+), 57 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 0a71065a7b90..06e0f258ea1a 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -472,8 +472,6 @@ extern "C" {
 
 #endif /* USE_DL_PREFIX */
 
-#define __builtin_expect(expr, val)	(expr)
-
 #define fwrite(buf, size, count, fp) _IO_fwrite (buf, size, count, fp)
 
 /*
@@ -1903,13 +1901,13 @@ typedef struct malloc_chunk* mbinptr;
 #define unlink(P, BK, FD) {                                            \
   FD = P->fd;                                                          \
   BK = P->bk;                                                          \
-  if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                \
+  if (FD->bk != P || BK->fd != P)                \
     malloc_printerr (check_action, "corrupted double-linked list", P); \
   else {                                                               \
     FD->bk = BK;                                                       \
     BK->fd = FD;                                                       \
     if (!in_smallbin_range (P->size)				       \
-	&& __builtin_expect (P->fd_nextsize != NULL, 0)) {	       \
+	&& P->fd_nextsize != NULL) {	       \
       assert (P->fd_nextsize->bk_nextsize == P);		       \
       assert (P->bk_nextsize->fd_nextsize == P);		       \
       if (FD->fd_nextsize == NULL) {				       \
@@ -2935,7 +2933,7 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
   if (brk != (char*)(MORECORE_FAILURE)) {
     /* Call the `morecore' hook if necessary.  */
     void (*hook) (void) = force_reg (dlafter_morecore_hook);
-    if (__builtin_expect (hook != NULL, 0))
+    if (hook != NULL)
       (*hook) ();
   } else {
   /*
@@ -3073,7 +3071,7 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
 	} else {
 	  /* Call the `morecore' hook if necessary.  */
 	  void (*hook) (void) = force_reg (dlafter_morecore_hook);
-	  if (__builtin_expect (hook != NULL, 0))
+	  if (hook != NULL)
 	    (*hook) ();
 	}
       }
@@ -3220,7 +3218,7 @@ static int sYSTRIm(size_t pad, struct malloc_state * av)
       MORECORE(-extra);
       /* Call the `morecore' hook if necessary.  */
       void (*hook) (void) = force_reg (dlafter_morecore_hook);
-      if (__builtin_expect (hook != NULL, 0))
+      if (hook != NULL)
 	(*hook) ();
       new_brk = (char*)(MORECORE(0));
 
@@ -3260,7 +3258,7 @@ munmap_chunk(mchunkptr p)
      page size.  But gcc does not recognize the optimization possibility
      (in the moment at least) so we combine the two values into one before
      the bit test.  */
-  if (__builtin_expect (((block | total_size) & (mp_.pagesize - 1)) != 0, 0))
+  if (((block | total_size) & (mp_.pagesize - 1)) != 0)
     {
       malloc_printerr (check_action, "munmap_chunk(): invalid pointer",
 		       chunk2mem (p));
@@ -3351,7 +3349,7 @@ Void_t *public_mALLOc(size_t bytes)
 
 	__malloc_ptr_t(*hook) (size_t, __const __malloc_ptr_t)
 	    = force_reg(dlmalloc_hook);
-	if (__builtin_expect(hook != NULL, 0))
+	if (hook != NULL)
 		return (*hook) (bytes, RETURN_ADDRESS(0));
 
 	ar_ptr = arena_get(bytes);
@@ -3375,7 +3373,7 @@ public_fREe(Void_t* mem)
 
   void (*hook) (__malloc_ptr_t, __const __malloc_ptr_t)
     = force_reg (dlfree_hook);
-  if (__builtin_expect (hook != NULL, 0)) {
+  if (hook != NULL) {
     (*hook)(mem, RETURN_ADDRESS (0));
     return;
   }
@@ -3428,7 +3426,7 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
 
   __malloc_ptr_t (*hook) (__malloc_ptr_t, size_t, __const __malloc_ptr_t) =
     force_reg (dlrealloc_hook);
-  if (__builtin_expect (hook != NULL, 0))
+  if (hook != NULL)
     return (*hook)(oldmem, bytes, RETURN_ADDRESS (0));
 
 #if REALLOC_ZERO_BYTES_FREES
@@ -3447,8 +3445,8 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
      allocator never wrapps around at the end of the address space.
      Therefore we can exclude some size values which might appear
      here by accident or by "design" from some intruder.  */
-  if (__builtin_expect ((uintptr_t) oldp > (uintptr_t) -oldsize, 0)
-      || __builtin_expect (misaligned_chunk (oldp), 0))
+  if ((uintptr_t) oldp > (uintptr_t) -oldsize
+      || misaligned_chunk (oldp))
     {
       malloc_printerr (check_action, "realloc(): invalid pointer", oldmem);
       return NULL;
@@ -3532,7 +3530,7 @@ Void_t *public_mEMALIGn(size_t alignment, size_t bytes)
 	Void_t *p;
 
 	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
-	if (__builtin_expect(hook != NULL, 0))
+	if (hook != NULL)
 		return (*hook) (alignment, bytes, RETURN_ADDRESS(0));
 
 	/* If need less alignment than we give anyway, just relay to malloc */
@@ -3567,7 +3565,7 @@ Void_t *public_vALLOc(size_t bytes)
 	size_t pagesz = mp_.pagesize;
 
 	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
-	if (__builtin_expect(hook != NULL, 0))
+	if (hook != NULL)
 		return (*hook) (pagesz, bytes, RETURN_ADDRESS(0));
 
 	ar_ptr = arena_get(bytes + pagesz + MINSIZE);
@@ -3598,7 +3596,7 @@ Void_t *public_pVALLOc(size_t bytes)
 	size_t rounded_bytes = (bytes + page_mask) & ~(page_mask);
 
 	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
-	if (__builtin_expect(hook != NULL, 0))
+	if (hook != NULL)
 		return (*hook) (pagesz, rounded_bytes, RETURN_ADDRESS(0));
 
 	ar_ptr = arena_get(bytes + 2 * pagesz + MINSIZE);
@@ -3628,7 +3626,7 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 	bytes = n * elem_size;
 #define HALF_INTERNAL_SIZE_T \
   (((INTERNAL_SIZE_T) 1) << (8 * sizeof (INTERNAL_SIZE_T) / 2))
-	if (__builtin_expect((n | elem_size) >= HALF_INTERNAL_SIZE_T, 0)) {
+	if ((n | elem_size) >= HALF_INTERNAL_SIZE_T) {
 		if (elem_size != 0 && bytes / elem_size != n) {
 			MALLOC_FAILURE_ACTION;
 			return 0;
@@ -3636,7 +3634,7 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 	}
 
 	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, __const __malloc_ptr_t)) = force_reg(dlmalloc_hook);
-	if (__builtin_expect(hook != NULL, 0)) {
+	if (hook != NULL) {
 		sz = bytes;
 		mem = (*hook) (sz, RETURN_ADDRESS(0));
 		if (mem == 0)
@@ -3686,7 +3684,7 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 
 	/* Two optional cases in which clearing not necessary */
 	if (chunk_is_mmapped(p)) {
-		if (__builtin_expect(perturb_byte, 0))
+		if (perturb_byte)
 			MALLOC_ZERO(mem, sz);
 		return mem;
 	}
@@ -3899,7 +3897,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
     victim = *fb;
 #endif
     if (victim != 0) {
-      if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
+      if (fastbin_index (chunksize (victim)) != idx)
 	{
 	  errstr = "malloc(): memory corruption (fast)";
 	errout:
@@ -3911,7 +3909,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 #endif
       check_remalloced_chunk(av, victim, nb);
       void *p = chunk2mem(victim);
-      if (__builtin_expect (perturb_byte, 0))
+      if (perturb_byte)
 	alloc_perturb (p, bytes);
       return p;
     }
@@ -3934,7 +3932,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	malloc_consolidate(av);
       else {
 	bck = victim->bk;
-	if (__builtin_expect (bck->fd != victim, 0))
+	if (bck->fd != victim)
 	  {
 	    errstr = "malloc(): smallbin double linked list corrupted";
 	    goto errout;
@@ -3947,7 +3945,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	  victim->size |= NON_MAIN_ARENA;
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (__builtin_expect (perturb_byte, 0))
+	if (perturb_byte)
 	  alloc_perturb (p, bytes);
 	return p;
       }
@@ -3989,8 +3987,8 @@ _int_malloc(struct malloc_state * av, size_t bytes)
     int iters = 0;
     while ( (victim = unsorted_chunks(av)->bk) != unsorted_chunks(av)) {
       bck = victim->bk;
-      if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
-	  || __builtin_expect (victim->size > av->system_mem, 0))
+      if (victim->size <= 2 * SIZE_SZ
+	  || victim->size > av->system_mem)
 	malloc_printerr (check_action, "malloc(): memory corruption",
 			 chunk2mem (victim));
       size = chunksize(victim);
@@ -4027,7 +4025,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (__builtin_expect (perturb_byte, 0))
+	if (perturb_byte)
 	  alloc_perturb (p, bytes);
 	return p;
       }
@@ -4044,7 +4042,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	  victim->size |= NON_MAIN_ARENA;
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (__builtin_expect (perturb_byte, 0))
+	if (perturb_byte)
 	  alloc_perturb (p, bytes);
 	return p;
       }
@@ -4148,7 +4146,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	     have to perform a complete insert here.  */
 	  bck = unsorted_chunks(av);
 	  fwd = bck->fd;
-	  if (__builtin_expect (fwd->bk != bck, 0))
+	  if (fwd->bk != bck)
 	    {
 	      errstr = "malloc(): corrupted unsorted chunks";
 	      goto errout;
@@ -4169,7 +4167,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	}
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (__builtin_expect (perturb_byte, 0))
+	if (perturb_byte)
 	  alloc_perturb (p, bytes);
 	return p;
       }
@@ -4248,7 +4246,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	     have to perform a complete insert here.  */
 	  bck = unsorted_chunks(av);
 	  fwd = bck->fd;
-	  if (__builtin_expect (fwd->bk != bck, 0))
+	  if (fwd->bk != bck)
 	    {
 	      errstr = "malloc(): corrupted unsorted chunks 2";
 	      goto errout;
@@ -4273,7 +4271,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	}
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (__builtin_expect (perturb_byte, 0))
+	if (perturb_byte)
 	  alloc_perturb (p, bytes);
 	return p;
       }
@@ -4308,7 +4306,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 
       check_malloced_chunk(av, victim, nb);
       void *p = chunk2mem(victim);
-      if (__builtin_expect (perturb_byte, 0))
+      if (perturb_byte)
 	alloc_perturb (p, bytes);
       return p;
     }
@@ -4343,7 +4341,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
     */
     else {
       void *p = sYSMALLOc(nb, av);
-      if (p != NULL && __builtin_expect (perturb_byte, 0))
+      if (p != NULL && perturb_byte)
 	alloc_perturb (p, bytes);
       return p;
     }
@@ -4381,8 +4379,8 @@ _int_free(struct malloc_state * av, mchunkptr p)
      allocator never wrapps around at the end of the address space.
      Therefore we can exclude some size values which might appear
      here by accident or by "design" from some intruder.  */
-  if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
-      || __builtin_expect (misaligned_chunk (p), 0))
+  if ((uintptr_t) p > (uintptr_t) -size
+      || misaligned_chunk (p))
     {
       errstr = "free(): invalid pointer";
     errout:
@@ -4394,7 +4392,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
       return;
     }
   /* We know that each chunk is at least MINSIZE bytes in size.  */
-  if (__builtin_expect (size < MINSIZE, 0))
+  if (size < MINSIZE)
     {
       errstr = "free(): invalid size";
       goto errout;
@@ -4418,9 +4416,9 @@ _int_free(struct malloc_state * av, mchunkptr p)
 #endif
       ) {
 
-    if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)
-	|| __builtin_expect (chunksize (chunk_at_offset (p, size))
-			     >= av->system_mem, 0))
+    if (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
+	|| chunksize (chunk_at_offset (p, size))
+			     >= av->system_mem)
       {
 #ifdef ATOMIC_FASTBINS
 	/* We might not have a lock at this point and concurrent modifications
@@ -4447,7 +4445,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
 #endif
       }
 
-    if (__builtin_expect (perturb_byte, 0))
+    if (perturb_byte)
       free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);
 
     set_fastchunks(av);
@@ -4462,7 +4460,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
       {
 	/* Another simple check: make sure the top of the bin is not the
 	   record we are going to add (i.e., double free).  */
-	if (__builtin_expect (old == p, 0))
+	if (old == p)
 	  {
 	    errstr = "double free or corruption (fasttop)";
 	    goto errout;
@@ -4473,7 +4471,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
       }
     while ((old = catomic_compare_and_exchange_val_rel (fb, p, fd)) != fd);
 
-    if (fd != NULL && __builtin_expect (old_idx != idx, 0))
+    if (fd != NULL && old_idx != idx)
       {
 	errstr = "invalid fastbin entry (free)";
 	goto errout;
@@ -4481,13 +4479,13 @@ _int_free(struct malloc_state * av, mchunkptr p)
 #else
     /* Another simple check: make sure the top of the bin is not the
        record we are going to add (i.e., double free).  */
-    if (__builtin_expect (*fb == p, 0))
+    if (*fb == p)
       {
 	errstr = "double free or corruption (fasttop)";
 	goto errout;
       }
     if (*fb != NULL
-	&& __builtin_expect (fastbin_index(chunksize(*fb)) != idx, 0))
+	&& fastbin_index(chunksize(*fb)) != idx)
       {
 	errstr = "invalid fastbin entry (free)";
 	goto errout;
@@ -4523,35 +4521,35 @@ _int_free(struct malloc_state * av, mchunkptr p)
 
     /* Lightweight tests: check whether the block is already the
        top block.  */
-    if (__builtin_expect (p == av->top, 0))
+    if (p == av->top)
       {
 	errstr = "double free or corruption (top)";
 	goto errout;
       }
     /* Or whether the next chunk is beyond the boundaries of the arena.  */
-    if (__builtin_expect (contiguous (av)
+    if (contiguous (av)
 			  && (char *) nextchunk
-			  >= ((char *) av->top + chunksize(av->top)), 0))
+			  >= ((char *) av->top + chunksize(av->top)))
       {
 	errstr = "double free or corruption (out)";
 	goto errout;
       }
     /* Or whether the block is actually not marked used.  */
-    if (__builtin_expect (!prev_inuse(nextchunk), 0))
+    if (!prev_inuse(nextchunk))
       {
 	errstr = "double free or corruption (!prev)";
 	goto errout;
       }
 
     nextsize = chunksize(nextchunk);
-    if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0)
-	|| __builtin_expect (nextsize >= av->system_mem, 0))
+    if (nextchunk->size <= 2 * SIZE_SZ
+	|| nextsize >= av->system_mem)
       {
 	errstr = "free(): invalid next size (normal)";
 	goto errout;
       }
 
-    if (__builtin_expect (perturb_byte, 0))
+    if (perturb_byte)
       free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);
 
     /* consolidate backward */
@@ -4581,7 +4579,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
 
       bck = unsorted_chunks(av);
       fwd = bck->fd;
-      if (__builtin_expect (fwd->bk != bck, 0))
+      if (fwd->bk != bck)
 	{
 	  errstr = "free(): corrupted unsorted chunks";
 	  goto errout;
@@ -4822,8 +4820,8 @@ _int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
   const char *errstr = NULL;
 
   /* oldmem size */
-  if (__builtin_expect (oldp->size <= 2 * SIZE_SZ, 0)
-      || __builtin_expect (oldsize >= av->system_mem, 0))
+  if (oldp->size <= 2 * SIZE_SZ
+      || oldsize >= av->system_mem)
     {
       errstr = "realloc(): invalid old size";
     errout:
@@ -4843,8 +4841,8 @@ _int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
 
     next = chunk_at_offset(oldp, oldsize);
     INTERNAL_SIZE_T nextsize = chunksize(next);
-    if (__builtin_expect (next->size <= 2 * SIZE_SZ, 0)
-	|| __builtin_expect (nextsize >= av->system_mem, 0))
+    if (next->size <= 2 * SIZE_SZ
+	|| nextsize >= av->system_mem)
       {
 	errstr = "realloc(): invalid next size";
 	goto errout;
@@ -5769,7 +5767,7 @@ dlposix_memalign (void **memptr, size_t alignment, size_t size)
   __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
 					__const __malloc_ptr_t)) =
     force_reg (dlmemalign_hook);
-  if (__builtin_expect (hook != NULL, 0))
+  if (hook != NULL)
     mem = (*hook)(alignment, size, RETURN_ADDRESS (0));
   else
     mem = public_mEMALIGn (alignment, size);
-- 
2.7.0.rc3

* [PATCH] malloc: unifdef -m -DUSE_ARENAS -DHAVE_MMAP
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (43 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: introduce get_backup_arena() Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:28 ` [PATCH] malloc: plug thread-cache memory leak Joern Engel
                   ` (19 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Plus a bit of manual comment cleanup.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 28 -----------------
 tpc/malloc2.13/hooks.h  |  8 -----
 tpc/malloc2.13/malloc.c | 84 +------------------------------------------------
 3 files changed, 1 insertion(+), 119 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 0aaccb914d92..803d7b3bf020 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -98,7 +98,6 @@ static int __malloc_initialized = -1;
 
 /**************************************************************************/
 
-#if USE_ARENAS
 
 /* arena_get() acquires an arena and locks the corresponding mutex.
    First, try the one last locked successfully by this thread.  (This
@@ -141,29 +140,6 @@ static int __malloc_initialized = -1;
 #define arena_for_chunk(ptr) \
  (chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
 
-#else /* !USE_ARENAS */
-
-/* There is only one arena, main_arena. */
-
-#if THREAD_STATS
-#define arena_get(ar_ptr, sz) do { \
-  ar_ptr = &main_arena; \
-  if(!mutex_trylock(&ar_ptr->mutex)) \
-    ++(ar_ptr->stat_lock_direct); \
-  else { \
-    (void)mutex_lock(&ar_ptr->mutex); \
-    ++(ar_ptr->stat_lock_wait); \
-  } \
-} while(0)
-#else
-#define arena_get(ar_ptr, sz) do { \
-  ar_ptr = &main_arena; \
-  (void)mutex_lock(&ar_ptr->mutex); \
-} while(0)
-#endif
-#define arena_for_chunk(ptr) (&main_arena)
-
-#endif /* USE_ARENAS */
 
 /**************************************************************************/
 
@@ -232,13 +208,11 @@ free_atfork(Void_t* mem, const Void_t *caller)
 
   p = mem2chunk(mem);         /* do not bother to replicate free_check here */
 
-#if HAVE_MMAP
   if (chunk_is_mmapped(p))                       /* release mmapped memory. */
   {
     munmap_chunk(p);
     return;
   }
-#endif
 
 #ifdef ATOMIC_FASTBINS
   ar_ptr = arena_for_chunk(p);
@@ -636,7 +610,6 @@ thread_atfork_static(ptmalloc_lock_all, ptmalloc_unlock_all, \
 
 /* Managing heaps and arenas (for concurrent threads) */
 
-#if USE_ARENAS
 
 #if MALLOC_DEBUG > 1
 
@@ -1083,7 +1056,6 @@ arena_thread_freeres (void)
 text_set_element (__libc_thread_subfreeres, arena_thread_freeres);
 #endif
 
-#endif /* USE_ARENAS */
 
 /*
  * Local variables:
diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
index 05cfafbb78ba..48f54f915275 100644
--- a/tpc/malloc2.13/hooks.h
+++ b/tpc/malloc2.13/hooks.h
@@ -249,13 +249,11 @@ free_check(Void_t* mem, const Void_t *caller)
     malloc_printerr(check_action, "free(): invalid pointer", mem);
     return;
   }
-#if HAVE_MMAP
   if (chunk_is_mmapped(p)) {
     (void)mutex_unlock(&main_arena.mutex);
     munmap_chunk(p);
     return;
   }
-#endif
 #if 0 /* Erase freed memory. */
   memset(mem, 0, chunksize(p) - (SIZE_SZ+1));
 #endif
@@ -295,7 +293,6 @@ realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
   checked_request2size(bytes+1, nb);
   (void)mutex_lock(&main_arena.mutex);
 
-#if HAVE_MMAP
   if (chunk_is_mmapped(oldp)) {
 #if HAVE_MREMAP
     mchunkptr newp = mremap_chunk(oldp, nb);
@@ -318,7 +315,6 @@ realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
       }
     }
   } else {
-#endif /* HAVE_MMAP */
     if (top_check() >= 0) {
       INTERNAL_SIZE_T nb;
       checked_request2size(bytes + 1, nb);
@@ -336,9 +332,7 @@ realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
 	     0, nb - (oldsize+SIZE_SZ));
     }
 #endif
-#if HAVE_MMAP
   }
-#endif
 
   /* mem2chunk_check changed the magic byte in the old chunk.
      If newmem is NULL, then the old chunk will still be used though,
@@ -414,12 +408,10 @@ free_starter(Void_t* mem, const Void_t *caller)
 
   if(!mem) return;
   p = mem2chunk(mem);
-#if HAVE_MMAP
   if (chunk_is_mmapped(p)) {
     munmap_chunk(p);
     return;
   }
-#endif
 #ifdef ATOMIC_FASTBINS
   _int_free(&main_arena, p, 1);
 #else
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index c366b6085953..4fff268316ed 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -649,23 +649,6 @@ extern Void_t*     sbrk(ptrdiff_t);
 
 
 /*
-  Define HAVE_MMAP as true to optionally make malloc() use mmap() to
-  allocate very large blocks.  These will be returned to the
-  operating system immediately after a free(). Also, if mmap
-  is available, it is used as a backup strategy in cases where
-  MORECORE fails to provide space from system.
-
-  This malloc is best tuned to work with mmap for large requests.
-  If you do not have mmap, operations involving very large chunks (1MB
-  or so) may be slower than you'd like.
-*/
-
-#ifndef HAVE_MMAP
-#define HAVE_MMAP 1
-#endif
-
-
-/*
    MMAP_AS_MORECORE_SIZE is the minimum mmap size argument to use if
    sbrk fails, and mmap is used as a backup (which is done only if
    HAVE_MMAP).  The value must be a multiple of page size.  This
@@ -696,17 +679,7 @@ extern Void_t*     sbrk(ptrdiff_t);
 #define HAVE_MREMAP 0
 #endif
 
-#endif /* HAVE_MMAP */
-
-/* Define USE_ARENAS to enable support for multiple `arenas'.  These
-   are allocated using mmap(), are necessary for threads and
-   occasionally useful to overcome address space limitations affecting
-   sbrk(). */
-
-#ifndef USE_ARENAS
-#define USE_ARENAS HAVE_MMAP
-#endif
-
+#endif /* HAVE_MREMAP */
 
 /*
   The system page size. To the extent possible, this malloc manages
@@ -1455,11 +1428,7 @@ int      dlposix_memalign(void **, size_t, size_t);
 #define M_MMAP_MAX             -4
 
 #ifndef DEFAULT_MMAP_MAX
-#if HAVE_MMAP
 #define DEFAULT_MMAP_MAX       (65536)
-#else
-#define DEFAULT_MMAP_MAX       (0)
-#endif
 #endif
 
 #ifdef __cplusplus
@@ -1629,7 +1598,6 @@ do {                                                                          \
 /* ------------------ MMAP support ------------------  */
 
 
-#if HAVE_MMAP
 
 #include <fcntl.h>
 #ifndef LACKS_SYS_MMAN_H
@@ -1674,7 +1642,6 @@ static int dev_zero_fd = -1; /* Cached file descriptor for /dev/zero. */
 #endif
 
 
-#endif /* HAVE_MMAP */
 
 
 /*
@@ -2494,7 +2461,6 @@ static void do_check_chunk(struct malloc_state * av, mchunkptr p)
 
   }
   else {
-#if HAVE_MMAP
     /* address is outside main heap  */
     if (contiguous(av) && av->top != initial_top(av)) {
       assert(((char*)p) < min_address || ((char*)p) >= max_address);
@@ -2503,10 +2469,6 @@ static void do_check_chunk(struct malloc_state * av, mchunkptr p)
     assert(((p->prev_size + sz) & (mp_.pagesize-1)) == 0);
     /* mem is aligned */
     assert(aligned_OK(chunk2mem(p)));
-#else
-    /* force an appropriate assert violation if debug set */
-    assert(!chunk_is_mmapped(p));
-#endif
   }
 }
 
@@ -2836,7 +2798,6 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
   bool            tried_mmap = false;
 
 
-#if HAVE_MMAP
 
   /*
     If have mmap, and the request size meets the mmap threshold, and
@@ -2920,7 +2881,6 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
       }
     }
   }
-#endif
 
   /* Record incoming configuration of top */
 
@@ -3056,7 +3016,6 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
     segregated mmap region.
   */
 
-#if HAVE_MMAP
     /* Cannot merge with old top, so add its size back in */
     if (contiguous(av))
       size = (size + old_size + pagemask) & ~pagemask;
@@ -3085,7 +3044,6 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
 	set_noncontiguous(av);
       }
     }
-#endif
   }
 
   if (brk != (char*)(MORECORE_FAILURE)) {
@@ -3351,7 +3309,6 @@ static int sYSTRIm(size_t pad, struct malloc_state * av)
   return 0;
 }
 
-#ifdef HAVE_MMAP
 
 static void
 internal_function
@@ -3439,7 +3396,6 @@ mremap_chunk(mchunkptr p, size_t new_size)
 
 #endif /* HAVE_MREMAP */
 
-#endif /* HAVE_MMAP */
 
 /*------------------------ Public wrappers. --------------------------------*/
 
@@ -3468,7 +3424,6 @@ public_mALLOc(size_t bytes)
       victim = _int_malloc(ar_ptr, bytes);
       (void)mutex_unlock(&ar_ptr->mutex);
     } else {
-#if USE_ARENAS
       /* ... or sbrk() has failed and there is still a chance to mmap() */
       ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
       (void)mutex_unlock(&main_arena.mutex);
@@ -3476,7 +3431,6 @@ public_mALLOc(size_t bytes)
 	victim = _int_malloc(ar_ptr, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
       }
-#endif
     }
   } else
     (void)mutex_unlock(&ar_ptr->mutex);
@@ -3506,7 +3460,6 @@ public_fREe(Void_t* mem)
 
   p = mem2chunk(mem);
 
-#if HAVE_MMAP
   if (chunk_is_mmapped(p))                       /* release mmapped memory. */
   {
     /* see if the dynamic brk/mmap threshold needs adjusting */
@@ -3520,7 +3473,6 @@ public_fREe(Void_t* mem)
     munmap_chunk(p);
     return;
   }
-#endif
 
   ar_ptr = arena_for_chunk(p);
 #ifdef ATOMIC_FASTBINS
@@ -3582,7 +3534,6 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
 
   checked_request2size(bytes, nb);
 
-#if HAVE_MMAP
   if (chunk_is_mmapped(oldp))
   {
     Void_t* newmem;
@@ -3600,7 +3551,6 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
     munmap_chunk(oldp);
     return newmem;
   }
-#endif
 
   ar_ptr = arena_for_chunk(oldp);
 #if THREAD_STATS
@@ -3688,7 +3638,6 @@ public_mEMALIGn(size_t alignment, size_t bytes)
       p = _int_memalign(ar_ptr, alignment, bytes);
       (void)mutex_unlock(&ar_ptr->mutex);
     } else {
-#if USE_ARENAS
       /* ... or sbrk() has failed and there is still a chance to mmap() */
       struct malloc_state * prev = ar_ptr->next ? ar_ptr : 0;
       (void)mutex_unlock(&ar_ptr->mutex);
@@ -3697,7 +3646,6 @@ public_mEMALIGn(size_t alignment, size_t bytes)
 	p = _int_memalign(ar_ptr, alignment, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
       }
-#endif
     }
   } else
     (void)mutex_unlock(&ar_ptr->mutex);
@@ -3739,14 +3687,12 @@ public_vALLOc(size_t bytes)
       p = _int_memalign(ar_ptr, pagesz, bytes);
       (void)mutex_unlock(&ar_ptr->mutex);
     } else {
-#if USE_ARENAS
       /* ... or sbrk() has failed and there is still a chance to mmap() */
       ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
       if(ar_ptr) {
 	p = _int_memalign(ar_ptr, pagesz, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
       }
-#endif
     }
   }
   assert(!p || chunk_is_mmapped(mem2chunk(p)) ||
@@ -3785,7 +3731,6 @@ public_pVALLOc(size_t bytes)
       p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
       (void)mutex_unlock(&ar_ptr->mutex);
     } else {
-#if USE_ARENAS
       /* ... or sbrk() has failed and there is still a chance to mmap() */
       ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0,
 			  bytes + 2*pagesz + MINSIZE);
@@ -3793,7 +3738,6 @@ public_pVALLOc(size_t bytes)
 	p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
       }
-#endif
     }
   }
   assert(!p || chunk_is_mmapped(mem2chunk(p)) ||
@@ -3878,7 +3822,6 @@ public_cALLOc(size_t n, size_t elem_size)
       mem = _int_malloc(&main_arena, sz);
       (void)mutex_unlock(&main_arena.mutex);
     } else {
-#if USE_ARENAS
       /* ... or sbrk() has failed and there is still a chance to mmap() */
       (void)mutex_lock(&main_arena.mutex);
       av = arena_get2(av->next ? av : 0, sz);
@@ -3887,21 +3830,18 @@ public_cALLOc(size_t n, size_t elem_size)
 	mem = _int_malloc(av, sz);
 	(void)mutex_unlock(&av->mutex);
       }
-#endif
     }
     if (mem == 0) return 0;
   }
   p = mem2chunk(mem);
 
   /* Two optional cases in which clearing not necessary */
-#if HAVE_MMAP
   if (chunk_is_mmapped (p))
     {
       if (__builtin_expect (perturb_byte, 0))
 	MALLOC_ZERO (mem, sz);
       return mem;
     }
-#endif
 
   csz = chunksize(p);
 
@@ -4877,9 +4817,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
   */
 
   else {
-#if HAVE_MMAP
     munmap_chunk (p);
-#endif
   }
 }
 
@@ -5188,7 +5126,6 @@ _int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
   */
 
   else {
-#if HAVE_MMAP
 
 #if HAVE_MREMAP
     INTERNAL_SIZE_T offset = oldp->prev_size;
@@ -5244,12 +5181,6 @@ _int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
     }
     return newmem;
 
-#else
-    /* If !HAVE_MMAP, but chunk_is_mmapped, user must have overwritten mem */
-    check_malloc_state(av);
-    MALLOC_FAILURE_ACTION;
-    return 0;
-#endif
   }
 #endif
 }
@@ -5724,21 +5655,15 @@ void mSTATs()
     ar_ptr = ar_ptr->next;
     if(ar_ptr == &main_arena) break;
   }
-#if HAVE_MMAP
   fprintf(stderr, "Total (incl. mmap):\n");
-#else
-  fprintf(stderr, "Total:\n");
-#endif
   fprintf(stderr, "system bytes     = %14lu\n", system_b);
   fprintf(stderr, "in use bytes     = %14lu\n", in_use_b);
 #ifdef NO_THREADS
   fprintf(stderr, "max system bytes = %14lu\n", mp_.max_total_mem);
 #endif
-#if HAVE_MMAP
   fprintf(stderr, "max mmap regions = %14u\n", mp_.max_n_mmaps);
   fprintf(stderr, "max mmap bytes   = %14lu\n",
 	  (unsigned long)mp_.max_mmapped_mem);
-#endif
 #if THREAD_STATS
   fprintf(stderr, "heaps created    = %10d\n",  stat_n_heaps);
   fprintf(stderr, "locked directly  = %10ld\n", stat_lock_direct);
@@ -5789,22 +5714,15 @@ int mALLOPt(int param_number, int value)
     break;
 
   case M_MMAP_THRESHOLD:
-#if USE_ARENAS
     /* Forbid setting the threshold too high. */
     if((unsigned long)value > HEAP_MAX_SIZE/2)
       res = 0;
     else
-#endif
       mp_.mmap_threshold = value;
       mp_.no_dyn_threshold = 1;
     break;
 
   case M_MMAP_MAX:
-#if !HAVE_MMAP
-    if (value != 0)
-      res = 0;
-    else
-#endif
       mp_.n_mmaps_max = value;
       mp_.no_dyn_threshold = 1;
     break;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: hide THREAD_STATS
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (33 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: use atomic free list Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: document and fix linked list handling Joern Engel
                   ` (29 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Move the #ifdef THREAD_STATS into a helper function instead of
open-coding it in some places and forgetting it in others.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  |  19 +++++++++
 tpc/malloc2.13/malloc.c | 111 ++++++++++++++++--------------------------------
 2 files changed, 56 insertions(+), 74 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 2e74cdb05d86..6ac0635364af 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -782,3 +782,22 @@ static struct malloc_state *arena_get(size_t size)
 		arena = arena_get2(arena, size);
 	return arena;
 }
+
+static inline void arena_lock(struct malloc_state *arena)
+{
+#if THREAD_STATS
+	if(!mutex_trylock(&arena->mutex))
+		++(arena->stat_lock_direct);
+	else {
+		(void)mutex_lock(&arena->mutex);
+		++(arena->stat_lock_wait);
+	}
+#else
+	(void)mutex_lock(&arena->mutex);
+#endif
+}
+
+static inline void arena_unlock(struct malloc_state *arena)
+{
+	(void)mutex_unlock(&arena->mutex);
+}
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 078b3eead789..9df824584745 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3249,19 +3249,19 @@ mremap_chunk(mchunkptr p, size_t new_size)
 
 #endif /* HAVE_MREMAP */
 
-static struct malloc_state *get_backup_arena(struct malloc_state *ar_ptr, size_t bytes)
+static struct malloc_state *get_backup_arena(struct malloc_state *arena, size_t bytes)
 {
-	if (ar_ptr != &main_arena) {
+	if (arena != &main_arena) {
 		/* Maybe the failure is due to running out of mmapped areas. */
-		(void)mutex_unlock(&ar_ptr->mutex);
-		ar_ptr = &main_arena;
-		(void)mutex_lock(&ar_ptr->mutex);
+		arena_unlock(arena);
+		arena = &main_arena;
+		arena_lock(arena);
 	} else {
 		/* ... or sbrk() has failed and there is still a chance to mmap() */
-		ar_ptr = arena_get2(ar_ptr, bytes);
-		(void)mutex_unlock(&main_arena.mutex);
+		arena = arena_get2(arena, bytes);
+		arena_unlock(&main_arena);
 	}
-	return ar_ptr;
+	return arena;
 }
 
 /*------------------------ Public wrappers. --------------------------------*/
@@ -3284,7 +3284,7 @@ Void_t *public_mALLOc(size_t bytes)
 		ar_ptr = get_backup_arena(ar_ptr, bytes);
 		victim = _int_malloc(ar_ptr, bytes);
 	}
-	(void)mutex_unlock(&ar_ptr->mutex);
+	arena_unlock(ar_ptr);
 	assert(!victim || chunk_is_mmapped(mem2chunk(victim)) || ar_ptr == arena_for_chunk(mem2chunk(victim)));
 	return victim;
 }
@@ -3317,18 +3317,9 @@ public_fREe(Void_t* mem)
 #ifdef ATOMIC_FASTBINS
   _int_free(ar_ptr, p, 0);
 #else
-# if THREAD_STATS
-  if(!mutex_trylock(&ar_ptr->mutex))
-    ++(ar_ptr->stat_lock_direct);
-  else {
-    (void)mutex_lock(&ar_ptr->mutex);
-    ++(ar_ptr->stat_lock_wait);
-  }
-# else
-  (void)mutex_lock(&ar_ptr->mutex);
-# endif
+  arena_lock(ar_ptr);
   _int_free(ar_ptr, p);
-  (void)mutex_unlock(&ar_ptr->mutex);
+  arena_unlock(ar_ptr);
 #endif
 }
 
@@ -3389,16 +3380,7 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
   }
 
   ar_ptr = arena_for_chunk(oldp);
-#if THREAD_STATS
-  if(!mutex_trylock(&ar_ptr->mutex))
-    ++(ar_ptr->stat_lock_direct);
-  else {
-    (void)mutex_lock(&ar_ptr->mutex);
-    ++(ar_ptr->stat_lock_wait);
-  }
-#else
-  (void)mutex_lock(&ar_ptr->mutex);
-#endif
+  arena_lock(ar_ptr);
 
 #if !defined NO_THREADS
   /* As in malloc(), remember this arena for the next allocation. */
@@ -3407,7 +3389,7 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
 
   newp = _int_realloc(ar_ptr, oldp, oldsize, nb);
 
-  (void)mutex_unlock(&ar_ptr->mutex);
+  arena_unlock(ar_ptr);
   assert(!newp || chunk_is_mmapped(mem2chunk(newp)) ||
 	 ar_ptr == arena_for_chunk(mem2chunk(newp)));
 
@@ -3421,18 +3403,9 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
 #ifdef ATOMIC_FASTBINS
 	  _int_free(ar_ptr, oldp, 0);
 #else
-# if THREAD_STATS
-	  if(!mutex_trylock(&ar_ptr->mutex))
-	    ++(ar_ptr->stat_lock_direct);
-	  else {
-	    (void)mutex_lock(&ar_ptr->mutex);
-	    ++(ar_ptr->stat_lock_wait);
-	  }
-# else
-	  (void)mutex_lock(&ar_ptr->mutex);
-# endif
+	  arena_lock(ar_ptr);
 	  _int_free(ar_ptr, oldp);
-	  (void)mutex_unlock(&ar_ptr->mutex);
+	  arena_unlock(ar_ptr);
 #endif
 	}
     }
@@ -3465,7 +3438,7 @@ Void_t *public_mEMALIGn(size_t alignment, size_t bytes)
 		ar_ptr = get_backup_arena(ar_ptr, bytes);
 		p = _int_memalign(ar_ptr, alignment, bytes);
 	}
-	(void)mutex_unlock(&ar_ptr->mutex);
+	arena_unlock(ar_ptr);
 	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 	return p;
 }
@@ -3488,12 +3461,12 @@ Void_t *public_vALLOc(size_t bytes)
 	if (!ar_ptr)
 		return 0;
 	p = _int_valloc(ar_ptr, bytes);
-	(void)mutex_unlock(&ar_ptr->mutex);
+	arena_unlock(ar_ptr);
 	if (!p) {
 		ar_ptr = get_backup_arena(ar_ptr, bytes);
 		p = _int_memalign(ar_ptr, pagesz, bytes);
 	}
-	(void)mutex_unlock(&ar_ptr->mutex);
+	arena_unlock(ar_ptr);
 	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 
 	return p;
@@ -3517,12 +3490,11 @@ Void_t *public_pVALLOc(size_t bytes)
 
 	ar_ptr = arena_get(bytes + 2 * pagesz + MINSIZE);
 	p = _int_pvalloc(ar_ptr, bytes);
-	(void)mutex_unlock(&ar_ptr->mutex);
 	if (!p) {
 		ar_ptr = get_backup_arena(ar_ptr, bytes);
 		p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
 	}
-	(void)mutex_unlock(&ar_ptr->mutex);
+	arena_unlock(ar_ptr);
 	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 
 	return p;
@@ -3591,7 +3563,7 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 		av = get_backup_arena(av, bytes);
 		mem = _int_malloc(&main_arena, sz);
 	}
-	(void)mutex_unlock(&av->mutex);
+	arena_unlock(av);
 
 	assert(!mem || chunk_is_mmapped(mem2chunk(mem)) || av == arena_for_chunk(mem2chunk(mem)));
 	if (mem == 0)
@@ -3658,7 +3630,7 @@ public_iCALLOc(size_t n, size_t elem_size, Void_t** chunks)
     return 0;
 
   m = _int_icalloc(ar_ptr, n, elem_size, chunks);
-  (void)mutex_unlock(&ar_ptr->mutex);
+  arena_unlock(ar_ptr);
   return m;
 }
 
@@ -3673,7 +3645,7 @@ public_iCOMALLOc(size_t n, size_t sizes[], Void_t** chunks)
     return 0;
 
   m = _int_icomalloc(ar_ptr, n, sizes, chunks);
-  (void)mutex_unlock(&ar_ptr->mutex);
+  arena_unlock(ar_ptr);
   return m;
 }
 
@@ -3695,9 +3667,9 @@ public_mTRIm(size_t s)
   struct malloc_state * ar_ptr = &main_arena;
   do
     {
-      (void) mutex_lock (&ar_ptr->mutex);
+      arena_lock(ar_ptr);
       result |= mTRIm (ar_ptr, s);
-      (void) mutex_unlock (&ar_ptr->mutex);
+      arena_unlock(ar_ptr);
 
       ar_ptr = ar_ptr->next;
     }
@@ -3728,9 +3700,9 @@ struct mallinfo public_mALLINFo()
 
   if(__malloc_initialized < 0)
     ptmalloc_init ();
-  (void)mutex_lock(&main_arena.mutex);
+  arena_lock(&main_arena);
   m = mALLINFo(&main_arena);
-  (void)mutex_unlock(&main_arena.mutex);
+  arena_unlock(&main_arena);
   ret.arena = (int)m.arena;
   ret.ordblks = (int)m.ordblks;
   ret.smblks = (int)m.smblks;
@@ -4302,7 +4274,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
     errout:
 #ifdef ATOMIC_FASTBINS
       if (! have_lock && locked)
-	(void)mutex_unlock(&av->mutex);
+	arena_unlock(av);
 #endif
       malloc_printerr (check_action, errstr, chunk2mem(p));
       return;
@@ -4342,7 +4314,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
 	   after getting the lock.  */
 	if (have_lock
 	    || ({ assert (locked == 0);
-		  mutex_lock(&av->mutex);
+		  arena_lock(av);
 		  locked = 1;
 		  chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
 		    || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
@@ -4355,7 +4327,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
 #ifdef ATOMIC_FASTBINS
 	if (! have_lock)
 	  {
-	    (void)mutex_unlock(&av->mutex);
+	    arena_unlock(av);
 	    locked = 0;
 	  }
 #endif
@@ -4419,16 +4391,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
   else if (!chunk_is_mmapped(p)) {
 #ifdef ATOMIC_FASTBINS
     if (! have_lock) {
-# if THREAD_STATS
-      if(!mutex_trylock(&av->mutex))
-	++(av->stat_lock_direct);
-      else {
-	(void)mutex_lock(&av->mutex);
-	++(av->stat_lock_wait);
-      }
-# else
-      (void)mutex_lock(&av->mutex);
-# endif
+      arena_lock(av);
       locked = 1;
     }
 #endif
@@ -4564,7 +4527,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
 #ifdef ATOMIC_FASTBINS
     if (! have_lock) {
       assert (locked);
-      (void)mutex_unlock(&av->mutex);
+      arena_unlock(av);
     }
 #endif
   }
@@ -5388,7 +5351,7 @@ void mSTATs()
   if(__malloc_initialized < 0)
     ptmalloc_init ();
   for (i=0, ar_ptr = &main_arena;; i++) {
-    (void)mutex_lock(&ar_ptr->mutex);
+    arena_lock(ar_ptr);
     mi = mALLINFo(ar_ptr);
     fprintf(stderr, "Arena %d:\n", i);
     fprintf(stderr, "system bytes     = %14lu\n", mi.arena);
@@ -5404,7 +5367,7 @@ void mSTATs()
     stat_lock_loop += ar_ptr->stat_lock_loop;
     stat_lock_wait += ar_ptr->stat_lock_wait;
 #endif
-    (void)mutex_unlock(&ar_ptr->mutex);
+    arena_unlock(ar_ptr);
     ar_ptr = ar_ptr->next;
     if(ar_ptr == &main_arena) break;
   }
@@ -5439,7 +5402,7 @@ int mALLOPt(int param_number, int value)
 
   if(__malloc_initialized < 0)
     ptmalloc_init ();
-  (void)mutex_lock(&av->mutex);
+  arena_lock(av);
   /* Ensure initialization/consolidation */
   malloc_consolidate(av);
 
@@ -5481,7 +5444,7 @@ int mALLOPt(int param_number, int value)
     break;
 
   }
-  (void)mutex_unlock(&av->mutex);
+  arena_unlock(av);
   return res;
 }
 
@@ -5736,7 +5699,7 @@ dlmalloc_info (int options, FILE *fp)
     } sizes[NFASTBINS + NBINS - 1];
 #define nsizes (sizeof (sizes) / sizeof (sizes[0]))
 
-    mutex_lock (&ar_ptr->mutex);
+    arena_lock(ar_ptr);
 
     for (size_t i = 0; i < NFASTBINS; ++i)
       {
@@ -5807,7 +5770,7 @@ dlmalloc_info (int options, FILE *fp)
 	avail += sizes[NFASTBINS - 1 + i].total;
       }
 
-    mutex_unlock (&ar_ptr->mutex);
+    arena_unlock(ar_ptr);
 
     total_nfastblocks += nfastblocks;
     total_fastavail += fastavail;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: add documentation
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (12 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: s/max_node/num_nodes/ Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: Lindent public_fREe() Joern Engel
                   ` (50 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

JIRA: PURE-27597
---
 tpc/malloc2.13/design | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 tpc/malloc2.13/design

diff --git a/tpc/malloc2.13/design b/tpc/malloc2.13/design
new file mode 100644
index 000000000000..1c3018093cdd
--- /dev/null
+++ b/tpc/malloc2.13/design
@@ -0,0 +1,90 @@
+This malloc is based on glibc 2.13 malloc.  Doug Lea started it all,
+Wolfram Gloger extended it for multithreading, Ulrich Drepper
+maintained it as part of glibc, then "improved" it so much we had to
+fork it.
+
+For an introduction, please read http://gee.cs.oswego.edu/dl/html/malloc.html
+
+Nomenclature:
+dlmalloc: Doug Lea's version
+ptmalloc: Wolfram Gloger's version
+libcmalloc: Libc version as of 2.13
+puremalloc: Pure Storage's version
+
+
+Arenas:
+Large allocations are done by mmap.  All small allocations come from
+an arena, which is split into suitable chunks.  Dlmalloc had a single
+arena, which provided a locking hotspot.  Arena was enlarged by
+sbrk(2).  Ptmalloc introduced multiple arenas, creating them on the
+fly when observing lock contention.
+
+Libcmalloc made arenas per-thread, which further reduced lock
+contention, but significantly increased memory consumption.  Glibc
+2.13 was the latest version without per-thread arenas enabled.  Costa
+gave Pure a private copy of 2.13 malloc to avoid the regression.
+
+Arenas are on a single-linked list with a pointer kept in thread-local
+storage.  If the last arena used for a thread is locked, it will try
+the next arena, etc.  If all arenas are locked, a new arena is
+created.
+
+Jörn changed this into a single-linked list per numa node.  Threads
+always allocate from a numa-local arena.
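+
+Roughly, arena selection works like this (pseudo-code, not the literal
+arena.h implementation; thread_arena and node stand in for the
+thread-local arena pointer and the current numa node):
+
+	/* start from the arena this thread used last */
+	arena = start = thread_arena;
+	while (mutex_trylock(&arena->mutex)) {	/* nonzero means "busy" */
+		arena = arena->local_next;	/* walk the numa-local list */
+		if (arena == start) {		/* all locked: create one */
+			arena = _int_new_arena(size, node);
+			break;
+		}
+	}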
+
+
+NUMA locality:
+All arenas use mbind() to preferentially get memory from just one numa
+node.  In case of memory shortage the kernel is allowed to go
+cross-numa.  As always, memory shortage should be avoided.
+
+The getcpu() syscall is used to detect the current numa node when
+allocating memory.  If the scheduler moves threads to different numa
+nodes, performance will suffer.  No surprise there.  Syscall overhead
+could be a performance problem.  We have plans to create a shared
+memory page containing information like the thread's current numa node
+to solve that.  Surprisingly the syscall overhead doesn't seem that
+high, so it may take a while.
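+
+A minimal sketch of the node lookup (numa_arena[] is the per-node
+array from the arena code; numa_local_arena() is just a name used for
+this sketch):
+
+	#include <unistd.h>
+	#include <sys/syscall.h>
+
+	static struct malloc_state *numa_local_arena(void)
+	{
+		unsigned cpu, node = 0;
+
+		/* getcpu() may lack a libc wrapper, so use the raw
+		   syscall; fall back to node 0 on failure */
+		if (syscall(SYS_getcpu, &cpu, &node, NULL) != 0)
+			node = 0;
+		return numa_arena[node];
+	}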
+
+
+Hugepages:
+Using hugepages instead of small pages makes a significant difference
+in process exit time.  We have repeatedly observed >20s spent in
+exit_mm(), freeing all process memory.  Going from small pages to huge
+pages solves that problem.  Puremalloc uses huge pages for all mmap
+allocations.
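+
+A minimal sketch of such a mapping (flags only; the fallback to small
+pages and all error handling are omitted):
+
+	#include <sys/mman.h>
+
+	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
+		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+	/* size must be a multiple of the huge page size, 2MB on x86_64 */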
+
+The main_arena still uses sbrk() to allocate system memory, i.e. it
+uses small pages.  To solve this the main_arena was taken off the
+per-node lists.  It is still being used in special cases, e.g. when
+creating a new thread.  Changing that is a lot of work and not worth
+it yet.
+
+
+Thread caching:
+tcmalloc and jemalloc demonstrated that a per-thread cache for malloc
+can be beneficial, so we introduced the same to puremalloc.  Freed
+objects initially stay in the thread cache.  Roughly half the time
+they get reused by an allocation shortly after.  If objects exist in
+cache, malloc doesn't have to take the arena lock.
+
+When going to the arena, puremalloc pre-allocates a second object.
+Pre-allocation further reduces arena contention.  Pre-allocating more
+than one object yields diminishing returns.  The performance
+difference between thread cache and arena just isn't high enough.
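+
+Roughly, the miss path looks like this (a sketch with assumed names;
+cache bookkeeping is omitted, the real code lives in tcache.h):
+
+	arena_lock(arena);
+	victim = _int_malloc(arena, size);	/* the object we return */
+	spare = _int_malloc(arena, size);	/* pre-allocate one more */
+	arena_unlock(arena);
+	if (spare)
+		add_to_bin(&cache->tc_bin[bin_no], mem2chunk(spare));
+	return victim;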
+
+
+Binning:
+The binning strategies of dlmalloc and jemalloc are pretty ad-hoc and
+discontinuous.  Jemalloc is extremely fine-grained up to 4k, then
+jumps from 4k to 8k to 16k, etc.  As a result, allocations slightly
+above 4k, 8k, etc. result in nearly 100% overhead.  Dlmalloc is less
+bad, but similar.
+
+For the thread cache, puremalloc uses 16 bins per power of two.  This
+requires an implementation of fls(), which is not standardized in C.
+Fls() done in inline assembly is slower than a predicted jump, but
+faster than a mispredicted jump.  Overall performance is about a wash.
+Not having corner cases with 100% memory overhead is a real benefit,
+though.  Worst-case overhead is 6%, and hard to achieve even when
+trying.
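+
+As an illustration (not the literal tcache.h code, and assuming fls()
+is 1-based like the kernel's), the bin number can be computed as:
+
+	static int cache_bin(size_t size)
+	{
+		int power = fls(size) - 1;	/* e.g. 12 for 4096..8191 */
+
+		if (power < 5)
+			return 0;		/* smallest sizes share bin 0 */
+		/* 16 sub-bins per power of two: the four bits below the
+		   leading bit pick the sub-bin, so rounding a request up
+		   to its bin wastes at most 1/16th, i.e. about 6% */
+		return ((power - 5) << 4) + ((size >> (power - 4)) & 0xf);
+	}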
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: avoid main_arena
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (22 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: better inline documentation Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: limit free_atomic_list() latency Joern Engel
                   ` (40 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

In spite of using MAP_HUGETLB we still had memory allocated in small
pages.  One cause is the main_arena, which uses sbrk() to allocate
memory.  By removing it from the per-numa lists we should minimize use
of the main_arena, while keeping it around for special-purpose
allocations around fork time, etc.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index d66b4e7029a2..0ee6286bf286 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -354,9 +354,9 @@ static void ptmalloc_init(void)
 #endif				/* !defined NO_THREADS */
 	mutex_init(&main_arena.mutex);
 	main_arena.next = &main_arena;
-	numa_arena[0] = &main_arena;
+	main_arena.numa_node = -1;
 	parse_node_count();
-	for (i = 1; i <= max_node; i++) {
+	for (i = 0; i <= max_node; i++) {
 		numa_arena[i] = _int_new_arena(0, i);
 		numa_arena[i]->local_next = numa_arena[i];
 		(void)mutex_unlock(&numa_arena[i]->mutex);
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: simplify and fix calloc
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (20 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: Revert glibc 1d05c2fb9c6f Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26 10:32   ` Will Newton
  2016-01-26  0:27 ` [PATCH] malloc: better inline documentation Joern Engel
                   ` (42 subsequent siblings)
  64 siblings, 1 reply; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Calloc was essentially a copy of malloc that tried to skip the memset
if the memory had just been returned from mmap() and was therefore
already cleared.  It was also buggy and a suitable test could trigger a
segfault.

The new code only does the overflow check, malloc and memset.  The
runtime cost is lower than the maintenance cost of the already-buggy
code.

JIRA: PURE-27597
JIRA: PURE-36718
---
 tpc/malloc2.13/malloc.c | 106 +++---------------------------------------------
 1 file changed, 5 insertions(+), 101 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index d9fecfe3f921..190c1d24b082 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3506,13 +3506,8 @@ Void_t *public_pVALLOc(size_t bytes)
 
 Void_t *public_cALLOc(size_t n, size_t elem_size)
 {
-	struct malloc_state *av;
-	mchunkptr oldtop, p;
-	INTERNAL_SIZE_T bytes, sz, csz, oldtopsize;
+	INTERNAL_SIZE_T bytes;
 	Void_t *mem;
-	unsigned long clearsize;
-	unsigned long nclears;
-	INTERNAL_SIZE_T *d;
 
 	/* size_t is unsigned so the behavior on overflow is defined.  */
 	bytes = n * elem_size;
@@ -3525,101 +3520,10 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 		}
 	}
 
-	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, __const __malloc_ptr_t)) = force_reg(dlmalloc_hook);
-	if (hook != NULL) {
-		sz = bytes;
-		mem = (*hook) (sz, RETURN_ADDRESS(0));
-		if (mem == 0)
-			return 0;
-#ifdef HAVE_MEMCPY
-		return memset(mem, 0, sz);
-#else
-		while (sz > 0)
-			((char *)mem)[--sz] = 0;	/* rather inefficient */
-		return mem;
-#endif
-	}
-
-	sz = bytes;
-
-	av = arena_get(sz);
-	if (!av)
-		return 0;
-
-	/* Check if we hand out the top chunk, in which case there may be no
-	   need to clear. */
-#if MORECORE_CLEARS
-	oldtop = top(av);
-	oldtopsize = chunksize(top(av));
-#if MORECORE_CLEARS < 2
-	/* Only newly allocated memory is guaranteed to be cleared.  */
-	if (av == &main_arena && oldtopsize < mp_.sbrk_base + av->max_system_mem - (char *)oldtop)
-		oldtopsize = (mp_.sbrk_base + av->max_system_mem - (char *)oldtop);
-#endif
-	if (av != &main_arena) {
-		heap_info *heap = heap_for_ptr(oldtop);
-		if (oldtopsize < (char *)heap + heap->mprotect_size - (char *)oldtop)
-			oldtopsize = (char *)heap + heap->mprotect_size - (char *)oldtop;
-	}
-#endif
-	mem = _int_malloc(av, sz);
-	if (mem == 0) {
-		av = get_backup_arena(av, bytes);
-		mem = _int_malloc(&main_arena, sz);
-	}
-	arena_unlock(av);
-
-	assert(!mem || chunk_is_mmapped(mem2chunk(mem)) || av == arena_for_chunk(mem2chunk(mem)));
-	if (mem == 0)
-		return 0;
-	p = mem2chunk(mem);
-
-	/* Two optional cases in which clearing not necessary */
-	if (chunk_is_mmapped(p)) {
-		if (perturb_byte)
-			MALLOC_ZERO(mem, sz);
-		return mem;
-	}
-
-	csz = chunksize(p);
-
-#if MORECORE_CLEARS
-	if (perturb_byte == 0 && (p == oldtop && csz > oldtopsize)) {
-		/* clear only the bytes from non-freshly-sbrked memory */
-		csz = oldtopsize;
-	}
-#endif
-
-	/* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
-	   contents have an odd number of INTERNAL_SIZE_T-sized words;
-	   minimally 3.  */
-	d = (INTERNAL_SIZE_T *) mem;
-	clearsize = csz - SIZE_SZ;
-	nclears = clearsize / sizeof(INTERNAL_SIZE_T);
-	assert(nclears >= 3);
-
-	if (nclears > 9)
-		MALLOC_ZERO(d, clearsize);
-
-	else {
-		*(d + 0) = 0;
-		*(d + 1) = 0;
-		*(d + 2) = 0;
-		if (nclears > 4) {
-			*(d + 3) = 0;
-			*(d + 4) = 0;
-			if (nclears > 6) {
-				*(d + 5) = 0;
-				*(d + 6) = 0;
-				if (nclears > 8) {
-					*(d + 7) = 0;
-					*(d + 8) = 0;
-				}
-			}
-		}
-	}
-
-	return mem;
+	mem = public_mALLOc(bytes);
+	if (!mem)
+		return NULL;
+	return memset(mem, 0, bytes);
 }
 
 
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: always free objects locklessly
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (36 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: Lindent users of arena_get2 Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: initial numa support Joern Engel
                   ` (26 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Tcache_free has become lockless, but so far only covers objects small enough
for caching.  Have it handle larger objects as well, spreading the
lockless goodness a bit further.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 11 +----------
 tpc/malloc2.13/tcache.h | 32 ++++++++++++++++++++++++++------
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 90d0e7e552b9..022b9a4ce712 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3293,7 +3293,6 @@ out:
 
 void public_fREe(Void_t * mem)
 {
-	struct malloc_state *ar_ptr;
 	mchunkptr p;		/* chunk corresponding to mem */
 
 	void (*hook) (__malloc_ptr_t, __const __malloc_ptr_t) = force_reg(dlfree_hook);
@@ -3312,15 +3311,7 @@ void public_fREe(Void_t * mem)
 		return;
 	}
 
-	if (tcache_free(p)) {
-		/* Object could be freed on fast path */
-		return;
-	}
-
-	ar_ptr = arena_for_chunk(p);
-	arena_lock(ar_ptr);
-	_int_free(ar_ptr, p);
-	arena_unlock(ar_ptr);
+	tcache_free(p);
 }
 
 Void_t*
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 55bf3862af91..7ebbc139a6ca 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -108,6 +108,13 @@ static void free_atomic_list(struct malloc_state *arena)
 {
 	struct malloc_chunk *victim, *next;
 
+	/*
+	 * Check without using atomic first - if we lose the race we will
+	 * free things next time around.
+	 */
+	if (!arena->atomic_free_list)
+		return;
+
 	victim = __sync_lock_test_and_set(&arena->atomic_free_list, NULL);
 	while (victim) {
 		next = victim->fd;
@@ -290,19 +297,20 @@ static void *tcache_malloc(size_t size)
 /*
- * returns 1 if object was freed
+ * Free an object, preferably via the thread cache.
  */
-static int tcache_free(mchunkptr p)
+static void tcache_free(mchunkptr p)
 {
 	struct thread_cache *cache;
+	struct malloc_state *arena;
 	struct malloc_chunk **bin;
 	size_t size;
 	int bin_no;
 
 	tsd_getspecific(cache_key, cache);
 	if (!cache)
-		return 0;
+		goto uncached_free;
 	size = chunksize(p);
 	if (size > MAX_CACHED_SIZE)
-		return 0;
+		goto uncached_free;
 	bin_no = cache_bin(size);
 	assert(bin_no < CACHE_NO_BINS);
 
@@ -311,15 +319,27 @@ static int tcache_free(mchunkptr p)
 	bin = &cache->tc_bin[bin_no];
 	if (*bin == p) {
 		malloc_printerr(check_action, "double free or corruption (tcache)", chunk2mem(p));
-		return 0;
+		return;
 	}
 	if (*bin && cache_bin(chunksize(*bin)) != bin_no) {
 		malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p));
-		return 0;
+		return;
 	}
 	p->fd = *bin;
 	*bin = p;
 	if (cache->tc_size > CACHE_SIZE)
 		tcache_gc(cache);
-	return 1;
+	return;
+
+ uncached_free:
+	arena = arena_for_chunk(p);
+	if(!mutex_trylock(&arena->mutex)) {
+		_int_free(arena, p);
+		free_atomic_list(arena);
+		arena_unlock(arena);
+	} else {
+		do {
+			p->fd = arena->atomic_free_list;
+		} while (!__sync_bool_compare_and_swap(&arena->atomic_free_list, p->fd, p));
+	}
 }
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: add locking to thread cache
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (29 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: fix perturb_byte handling for tcache Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26 12:45   ` Szabolcs Nagy
  2016-01-26  0:27 ` [PATCH] malloc: fix local_next handling Joern Engel
                   ` (33 subsequent siblings)
  64 siblings, 1 reply; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

With signals we can reenter the thread cache.  Protect against that
with a lock.  Reentry will almost never happen in practice; it took the
company five years to reproduce a similar race in the existing malloc.
But it is easy to trigger with a targeted test.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc-machine.h |  3 +++
 tpc/malloc2.13/tcache.h         | 32 +++++++++++++++++++++++++++-----
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/tpc/malloc2.13/malloc-machine.h b/tpc/malloc2.13/malloc-machine.h
index 07072f5d5e11..a03b78bf3c1a 100644
--- a/tpc/malloc2.13/malloc-machine.h
+++ b/tpc/malloc2.13/malloc-machine.h
@@ -65,6 +65,9 @@ static inline int mutex_lock(mutex_t *m) {
     }
   }
 }
+/* Returns 0 on success, 1 on error
+   XXX: This is the opposite of the Linux kernel's mutex_trylock
+   primitive, making it confusing and error-prone. */
 static inline int mutex_trylock(mutex_t *m) {
   int r;
 
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index e267ce905ed0..ece03fc464cd 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -90,6 +90,13 @@ static inline int get_bit(unsigned long *bitmap, int i)
 }
 
 struct thread_cache {
+	/*
+	 * Since the cache is per-thread, it should not need a lock.
+	 * The reason we have one anyway is signal handlers, which can
+	 * cause the same thread to enter this code twice concurrently.
+	 */
+	mutex_t mutex;
+
 	/* Bytes in cache */
 	uint32_t tc_size;
 
@@ -264,9 +271,13 @@ static void *tcache_malloc(size_t size)
 		if (!cache)
 			return NULL;
 		memset(cache, 0, sizeof(*cache));
+		mutex_init(&cache->mutex);
 		tsd_setspecific(cache_key, cache);
 	}
 
+	if (mutex_trylock(&cache->mutex))
+		return NULL;
+
 	bin_no = cache_bin(nb);
 	assert(bin_no < CACHE_NO_BINS);
 	set_accessed(cache, bin_no);
@@ -281,6 +292,7 @@ static void *tcache_malloc(size_t size)
 		}
 		if (cache_bin(chunksize(*bin)) != bin_no) {
 			malloc_printerr(check_action, "invalid tcache entry", victim);
+			mutex_unlock(&cache->mutex);
 			return NULL;
 		}
 		*bin = victim->fd;
@@ -288,6 +300,7 @@ static void *tcache_malloc(size_t size)
 		cache->tc_size -= chunksize(victim);
 		cache->tc_count--;
 		alloc_perturb(p, size);
+		mutex_unlock(&cache->mutex);
 		return p;
 	}
 
@@ -301,8 +314,10 @@ static void *tcache_malloc(size_t size)
 		tcache_gc(cache);
 
 	arena = arena_get(size);
-	if (!arena)
+	if (!arena) {
+		mutex_unlock(&cache->mutex);
 		return NULL;
+	}
 	free_atomic_list(arena);
 	/* TODO: _int_malloc does checked_request2size() again, which is silly */
 	victim = _int_malloc(arena, size);
@@ -335,6 +350,7 @@ static void *tcache_malloc(size_t size)
 	arena_unlock(arena);
 	assert(!victim || arena == arena_for_chunk(mem2chunk(victim)));
 	alloc_perturb(victim, size);
+	mutex_unlock(&cache->mutex);
 	return victim;
 }
 
@@ -346,12 +362,16 @@ static void tcache_free(mchunkptr p)
 	size_t size;
 	int bin_no;
 
+	size = chunksize(p);
+	if (size > MAX_CACHED_SIZE)
+		goto uncached_free;
+
 	tsd_getspecific(cache_key, cache);
 	if (!cache)
 		goto uncached_free;
-	size = chunksize(p);
-	if (size > MAX_CACHED_SIZE)
+	if (mutex_trylock(&cache->mutex))
 		goto uncached_free;
+
 	bin_no = cache_bin(size);
 	assert(bin_no < CACHE_NO_BINS);
 
@@ -360,16 +380,18 @@ static void tcache_free(mchunkptr p)
 	bin = &cache->tc_bin[bin_no];
 	if (*bin == p) {
 		malloc_printerr(check_action, "double free or corruption (tcache)", chunk2mem(p));
-		return;
+		goto out;
 	}
 	if (*bin && cache_bin(chunksize(*bin)) != bin_no) {
 		malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p));
-		return;
+		goto out;
 	}
 	free_perturb(p, size - 2 * SIZE_SZ);
 	add_to_bin(bin, p);
 	if (cache->tc_size > CACHE_SIZE)
 		tcache_gc(cache);
+ out:
+	mutex_unlock(&cache->mutex);
 	return;
 
  uncached_free:
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: tune thread cache
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (16 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: quenche last compiler warnings Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: create a useful assert Joern Engel
                   ` (46 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Experiments have shown that prefetching more than one object yields
diminishing returns and can even be harmful.  Growing the cache to 128k
yields much better results and should still be small enough.

JIRA: PURE-27597
---
 tpc/malloc2.13/tcache.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 0ddee48a30dc..55bf3862af91 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -24,13 +24,13 @@ static inline int fls(int x)
  * arena.  On free we keep the freed object in hope of reusing it in
  * future allocations.
  */
-#define CACHE_SIZE_BITS		(16)
+#define CACHE_SIZE_BITS		(17)
 #define CACHE_SIZE		(1 << CACHE_SIZE_BITS)
 #define MAX_CACHED_SIZE_BITS	(CACHE_SIZE_BITS - 3)
 #define MAX_CACHED_SIZE		(1 << MAX_CACHED_SIZE_BITS)
 #define MAX_PREFETCH_SIZE_BITS	(CACHE_SIZE_BITS - 6)
 #define MAX_PREFETCH_SIZE	(1 << MAX_PREFETCH_SIZE_BITS)
-#define NO_PREFETCH		(1 << 3)
+#define NO_PREFETCH		(1)
 
 /*
  * Binning is done as a subdivided buddy allocator.  A buddy allocator
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: Lindent before functional changes
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (9 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: remove mstate typedef Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: remove __builtin_expect Joern Engel
                   ` (53 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 477 +++++++++++++++++++++++-------------------------
 tpc/malloc2.13/malloc.c |  46 ++---
 2 files changed, 253 insertions(+), 270 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 803d7b3bf020..c854de12910c 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -108,37 +108,37 @@ static int __malloc_initialized = -1;
    in the new arena. */
 
 #define arena_get(ptr, size) do { \
-  arena_lookup(ptr); \
-  arena_lock(ptr, size); \
+	arena_lookup(ptr); \
+	arena_lock(ptr, size); \
 } while(0)
 
 #define arena_lookup(ptr) do { \
-  Void_t *vptr = NULL; \
-  ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
+	Void_t *vptr = NULL; \
+	ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
 } while(0)
 
 #ifdef PER_THREAD
 #define arena_lock(ptr, size) do { \
-  if(ptr) \
-    (void)mutex_lock(&ptr->mutex); \
-  else \
-    ptr = arena_get2(ptr, (size)); \
+	if(ptr) \
+		(void)mutex_lock(&ptr->mutex); \
+	else \
+		ptr = arena_get2(ptr, (size)); \
 } while(0)
 #else
 #define arena_lock(ptr, size) do { \
-  if(ptr && !mutex_trylock(&ptr->mutex)) { \
-    THREAD_STAT(++(ptr->stat_lock_direct)); \
-  } else \
-    ptr = arena_get2(ptr, (size)); \
+	if(ptr && !mutex_trylock(&ptr->mutex)) { \
+		THREAD_STAT(++(ptr->stat_lock_direct)); \
+	} else \
+		ptr = arena_get2(ptr, (size)); \
 } while(0)
 #endif
 
 /* find the heap and corresponding arena for a given ptr */
 
 #define heap_for_ptr(ptr) \
- ((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))
+	((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))
 #define arena_for_chunk(ptr) \
- (chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
+	(chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
 
 
 /**************************************************************************/
@@ -436,168 +436,157 @@ __libc_malloc_pthread_startup (bool first_time)
 # endif
 #endif
 
-static void
-ptmalloc_init (void)
+static void ptmalloc_init(void)
 {
-  const char* s;
-  int secure = 0;
+	const char *s;
+	int secure = 0;
 
-  if(__malloc_initialized >= 0) return;
-  __malloc_initialized = 0;
+	if (__malloc_initialized >= 0)
+		return;
+	__malloc_initialized = 0;
 
 #ifdef _LIBC
-# if defined SHARED && !USE___THREAD
-  /* ptmalloc_init_minimal may already have been called via
-     __libc_malloc_pthread_startup, above.  */
-  if (mp_.pagesize == 0)
-# endif
+#if defined SHARED && !USE___THREAD
+	/* ptmalloc_init_minimal may already have been called via
+	   __libc_malloc_pthread_startup, above.  */
+	if (mp_.pagesize == 0)
+#endif
 #endif
-    ptmalloc_init_minimal();
+		ptmalloc_init_minimal();
 
 #ifndef NO_THREADS
-# if defined _LIBC
-  /* We know __pthread_initialize_minimal has already been called,
-     and that is enough.  */
-#   define NO_STARTER
-# endif
-# ifndef NO_STARTER
-  /* With some threads implementations, creating thread-specific data
-     or initializing a mutex may call malloc() itself.  Provide a
-     simple starter version (realloc() won't work). */
-  save_malloc_hook = dlmalloc_hook;
-  save_memalign_hook = dlmemalign_hook;
-  save_free_hook = dlfree_hook;
-  dlmalloc_hook = malloc_starter;
-  dlmemalign_hook = memalign_starter;
-  dlfree_hook = free_starter;
-#  ifdef _LIBC
-  /* Initialize the pthreads interface. */
-  if (__pthread_initialize != NULL)
-    __pthread_initialize();
-#  endif /* !defined _LIBC */
-# endif	/* !defined NO_STARTER */
-#endif /* !defined NO_THREADS */
-  mutex_init(&main_arena.mutex);
-  main_arena.next = &main_arena;
+#if defined _LIBC
+	/* We know __pthread_initialize_minimal has already been called,
+	   and that is enough.  */
+#define NO_STARTER
+#endif
+#ifndef NO_STARTER
+	/* With some threads implementations, creating thread-specific data
+	   or initializing a mutex may call malloc() itself.  Provide a
+	   simple starter version (realloc() won't work). */
+	save_malloc_hook = dlmalloc_hook;
+	save_memalign_hook = dlmemalign_hook;
+	save_free_hook = dlfree_hook;
+	dlmalloc_hook = malloc_starter;
+	dlmemalign_hook = memalign_starter;
+	dlfree_hook = free_starter;
+#ifdef _LIBC
+	/* Initialize the pthreads interface. */
+	if (__pthread_initialize != NULL)
+		__pthread_initialize();
+#endif				/* !defined _LIBC */
+#endif				/* !defined NO_STARTER */
+#endif				/* !defined NO_THREADS */
+	mutex_init(&main_arena.mutex);
+	main_arena.next = &main_arena;
 
 #if defined _LIBC && defined SHARED
-  /* In case this libc copy is in a non-default namespace, never use brk.
-     Likewise if dlopened from statically linked program.  */
-  Dl_info di;
-  struct link_map *l;
-
-  if (_dl_open_hook != NULL
-      || (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
-	  && l->l_ns != LM_ID_BASE))
-    __morecore = __failing_morecore;
+	/* In case this libc copy is in a non-default namespace, never use brk.
+	   Likewise if dlopened from statically linked program.  */
+	Dl_info di;
+	struct link_map *l;
+
+	if (_dl_open_hook != NULL || (_dl_addr(ptmalloc_init, &di, &l, NULL) != 0 && l->l_ns != LM_ID_BASE))
+		__morecore = __failing_morecore;
 #endif
 
-  mutex_init(&list_lock);
-  tsd_key_create(&arena_key, NULL);
-  tsd_setspecific(arena_key, (Void_t *)&main_arena);
-  thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
+	mutex_init(&list_lock);
+	tsd_key_create(&arena_key, NULL);
+	tsd_setspecific(arena_key, (Void_t *) & main_arena);
+	thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
 #ifndef NO_THREADS
-# ifndef NO_STARTER
-  dlmalloc_hook = save_malloc_hook;
-  dlmemalign_hook = save_memalign_hook;
-  dlfree_hook = save_free_hook;
-# else
-#  undef NO_STARTER
-# endif
+#ifndef NO_STARTER
+	dlmalloc_hook = save_malloc_hook;
+	dlmemalign_hook = save_memalign_hook;
+	dlfree_hook = save_free_hook;
+#else
+#undef NO_STARTER
+#endif
 #endif
 #ifdef _LIBC
-  secure = __libc_enable_secure;
-  s = NULL;
-  if (__builtin_expect (_environ != NULL, 1))
-    {
-      char **runp = _environ;
-      char *envline;
-
-      while (__builtin_expect ((envline = next_env_entry (&runp)) != NULL,
-			       0))
-	{
-	  size_t len = strcspn (envline, "=");
-
-	  if (envline[len] != '=')
-	    /* This is a "MALLOC_" variable at the end of the string
-	       without a '=' character.  Ignore it since otherwise we
-	       will access invalid memory below.  */
-	    continue;
-
-	  switch (len)
-	    {
-	    case 6:
-	      if (memcmp (envline, "CHECK_", 6) == 0)
-		s = &envline[7];
-	      break;
-	    case 8:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "TOP_PAD_", 8) == 0)
-		    mALLOPt(M_TOP_PAD, atoi(&envline[9]));
-		  else if (memcmp (envline, "PERTURB_", 8) == 0)
-		    mALLOPt(M_PERTURB, atoi(&envline[9]));
-		}
-	      break;
-	    case 9:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "MMAP_MAX_", 9) == 0)
-		    mALLOPt(M_MMAP_MAX, atoi(&envline[10]));
+	secure = __libc_enable_secure;
+	s = NULL;
+	if (__builtin_expect(_environ != NULL, 1)) {
+		char **runp = _environ;
+		char *envline;
+
+		while (__builtin_expect((envline = next_env_entry(&runp)) != NULL, 0)) {
+			size_t len = strcspn(envline, "=");
+
+			if (envline[len] != '=')
+				/* This is a "MALLOC_" variable at the end of the string
+				   without a '=' character.  Ignore it since otherwise we
+				   will access invalid memory below.  */
+				continue;
+
+			switch (len) {
+			case 6:
+				if (memcmp(envline, "CHECK_", 6) == 0)
+					s = &envline[7];
+				break;
+			case 8:
+				if (!secure) {
+					if (memcmp(envline, "TOP_PAD_", 8) == 0)
+						mALLOPt(M_TOP_PAD, atoi(&envline[9]));
+					else if (memcmp(envline, "PERTURB_", 8) == 0)
+						mALLOPt(M_PERTURB, atoi(&envline[9]));
+				}
+				break;
+			case 9:
+				if (!secure) {
+					if (memcmp(envline, "MMAP_MAX_", 9) == 0)
+						mALLOPt(M_MMAP_MAX, atoi(&envline[10]));
 #ifdef PER_THREAD
-		  else if (memcmp (envline, "ARENA_MAX", 9) == 0)
-		    mALLOPt(M_ARENA_MAX, atoi(&envline[10]));
+					else if (memcmp(envline, "ARENA_MAX", 9) == 0)
+						mALLOPt(M_ARENA_MAX, atoi(&envline[10]));
 #endif
-		}
-	      break;
+				}
+				break;
 #ifdef PER_THREAD
-	    case 10:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "ARENA_TEST", 10) == 0)
-		    mALLOPt(M_ARENA_TEST, atoi(&envline[11]));
-		}
-	      break;
+			case 10:
+				if (!secure) {
+					if (memcmp(envline, "ARENA_TEST", 10) == 0)
+						mALLOPt(M_ARENA_TEST, atoi(&envline[11]));
+				}
+				break;
 #endif
-	    case 15:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "TRIM_THRESHOLD_", 15) == 0)
-		    mALLOPt(M_TRIM_THRESHOLD, atoi(&envline[16]));
-		  else if (memcmp (envline, "MMAP_THRESHOLD_", 15) == 0)
-		    mALLOPt(M_MMAP_THRESHOLD, atoi(&envline[16]));
+			case 15:
+				if (!secure) {
+					if (memcmp(envline, "TRIM_THRESHOLD_", 15) == 0)
+						mALLOPt(M_TRIM_THRESHOLD, atoi(&envline[16]));
+					else if (memcmp(envline, "MMAP_THRESHOLD_", 15) == 0)
+						mALLOPt(M_MMAP_THRESHOLD, atoi(&envline[16]));
+				}
+				break;
+			default:
+				break;
+			}
 		}
-	      break;
-	    default:
-	      break;
-	    }
 	}
-    }
 #else
-  if (! secure)
-    {
-      if((s = getenv("MALLOC_TRIM_THRESHOLD_")))
-	mALLOPt(M_TRIM_THRESHOLD, atoi(s));
-      if((s = getenv("MALLOC_TOP_PAD_")))
-	mALLOPt(M_TOP_PAD, atoi(s));
-      if((s = getenv("MALLOC_PERTURB_")))
-	mALLOPt(M_PERTURB, atoi(s));
-      if((s = getenv("MALLOC_MMAP_THRESHOLD_")))
-	mALLOPt(M_MMAP_THRESHOLD, atoi(s));
-      if((s = getenv("MALLOC_MMAP_MAX_")))
-	mALLOPt(M_MMAP_MAX, atoi(s));
-    }
-  s = getenv("MALLOC_CHECK_");
+	if (!secure) {
+		if ((s = getenv("MALLOC_TRIM_THRESHOLD_")))
+			mALLOPt(M_TRIM_THRESHOLD, atoi(s));
+		if ((s = getenv("MALLOC_TOP_PAD_")))
+			mALLOPt(M_TOP_PAD, atoi(s));
+		if ((s = getenv("MALLOC_PERTURB_")))
+			mALLOPt(M_PERTURB, atoi(s));
+		if ((s = getenv("MALLOC_MMAP_THRESHOLD_")))
+			mALLOPt(M_MMAP_THRESHOLD, atoi(s));
+		if ((s = getenv("MALLOC_MMAP_MAX_")))
+			mALLOPt(M_MMAP_MAX, atoi(s));
+	}
+	s = getenv("MALLOC_CHECK_");
 #endif
-  if(s && s[0]) {
-    mALLOPt(M_CHECK_ACTION, (int)(s[0] - '0'));
-    if (check_action != 0)
-      dlmalloc_check_init();
-  }
-  void (*hook) (void) = force_reg (dlmalloc_initialize_hook);
-  if (hook != NULL)
-    (*hook)();
-  __malloc_initialized = 1;
+	if (s && s[0]) {
+		mALLOPt(M_CHECK_ACTION, (int)(s[0] - '0'));
+		if (check_action != 0)
+			dlmalloc_check_init();
+	}
+	void (*hook) (void) = force_reg(dlmalloc_initialize_hook);
+	if (hook != NULL)
+		(*hook) ();
+	__malloc_initialized = 1;
 }
 
 /* There are platforms (e.g. Hurd) with a link-time hook mechanism. */
@@ -836,65 +825,62 @@ heap_trim(heap_info *heap, size_t pad)
 
 /* Create a new arena with initial size "size".  */
 
-static struct malloc_state *
-_int_new_arena(size_t size)
+static struct malloc_state *_int_new_arena(size_t size)
 {
-  struct malloc_state * a;
-  heap_info *h;
-  char *ptr;
-  unsigned long misalign;
-
-  h = new_heap(size + (sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT),
-	       mp_.top_pad);
-  if(!h) {
-    /* Maybe size is too large to fit in a single heap.  So, just try
-       to create a minimally-sized arena and let _int_malloc() attempt
-       to deal with the large request via mmap_chunk().  */
-    h = new_heap(sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT, mp_.top_pad);
-    if(!h)
-      return 0;
-  }
-  a = h->ar_ptr = (struct malloc_state *)(h+1);
-  malloc_init_state(a);
-  /*a->next = NULL;*/
-  a->system_mem = a->max_system_mem = h->size;
-  arena_mem += h->size;
+	struct malloc_state *a;
+	heap_info *h;
+	char *ptr;
+	unsigned long misalign;
+
+	h = new_heap(size + (sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT), mp_.top_pad);
+	if (!h) {
+		/* Maybe size is too large to fit in a single heap.  So, just try
+		   to create a minimally-sized arena and let _int_malloc() attempt
+		   to deal with the large request via mmap_chunk().  */
+		h = new_heap(sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT, mp_.top_pad);
+		if (!h)
+			return 0;
+	}
+	a = h->ar_ptr = (struct malloc_state *)(h + 1);
+	malloc_init_state(a);
+	/*a->next = NULL; */
+	a->system_mem = a->max_system_mem = h->size;
+	arena_mem += h->size;
 #ifdef NO_THREADS
-  if((unsigned long)(mp_.mmapped_mem + arena_mem + main_arena.system_mem) >
-     mp_.max_total_mem)
-    mp_.max_total_mem = mp_.mmapped_mem + arena_mem + main_arena.system_mem;
+	if ((unsigned long)(mp_.mmapped_mem + arena_mem + main_arena.system_mem) > mp_.max_total_mem)
+		mp_.max_total_mem = mp_.mmapped_mem + arena_mem + main_arena.system_mem;
 #endif
 
-  /* Set up the top chunk, with proper alignment. */
-  ptr = (char *)(a + 1);
-  misalign = (unsigned long)chunk2mem(ptr) & MALLOC_ALIGN_MASK;
-  if (misalign > 0)
-    ptr += MALLOC_ALIGNMENT - misalign;
-  top(a) = (mchunkptr)ptr;
-  set_head(top(a), (((char*)h + h->size) - ptr) | PREV_INUSE);
+	/* Set up the top chunk, with proper alignment. */
+	ptr = (char *)(a + 1);
+	misalign = (unsigned long)chunk2mem(ptr) & MALLOC_ALIGN_MASK;
+	if (misalign > 0)
+		ptr += MALLOC_ALIGNMENT - misalign;
+	top(a) = (mchunkptr) ptr;
+	set_head(top(a), (((char *)h + h->size) - ptr) | PREV_INUSE);
 
-  tsd_setspecific(arena_key, (Void_t *)a);
-  mutex_init(&a->mutex);
-  (void)mutex_lock(&a->mutex);
+	tsd_setspecific(arena_key, (Void_t *) a);
+	mutex_init(&a->mutex);
+	(void)mutex_lock(&a->mutex);
 
 #ifdef PER_THREAD
-  (void)mutex_lock(&list_lock);
+	(void)mutex_lock(&list_lock);
 #endif
 
-  /* Add the new arena to the global list.  */
-  a->next = main_arena.next;
-  atomic_write_barrier ();
-  main_arena.next = a;
+	/* Add the new arena to the global list.  */
+	a->next = main_arena.next;
+	atomic_write_barrier();
+	main_arena.next = a;
 
 #ifdef PER_THREAD
-  ++narenas;
+	++narenas;
 
-  (void)mutex_unlock(&list_lock);
+	(void)mutex_unlock(&list_lock);
 #endif
 
-  THREAD_STAT(++(a->stat_lock_loop));
+	THREAD_STAT(++(a->stat_lock_loop));
 
-  return a;
+	return a;
 }
 
 
@@ -977,64 +963,61 @@ reused_arena (void)
 }
 #endif
 
-static struct malloc_state *
-internal_function
-arena_get2(struct malloc_state * a_tsd, size_t size)
+static struct malloc_state *internal_function arena_get2(struct malloc_state *a_tsd, size_t size)
 {
-  struct malloc_state * a;
+	struct malloc_state *a;
 
 #ifdef PER_THREAD
-  if ((a = get_free_list ()) == NULL
-      && (a = reused_arena ()) == NULL)
-    /* Nothing immediately available, so generate a new arena.  */
-    a = _int_new_arena(size);
+	if ((a = get_free_list()) == NULL && (a = reused_arena()) == NULL)
+		/* Nothing immediately available, so generate a new arena.  */
+		a = _int_new_arena(size);
 #else
-  if(!a_tsd)
-    a = a_tsd = &main_arena;
-  else {
-    a = a_tsd->next;
-    if(!a) {
-      /* This can only happen while initializing the new arena. */
-      (void)mutex_lock(&main_arena.mutex);
-      THREAD_STAT(++(main_arena.stat_lock_wait));
-      return &main_arena;
-    }
-  }
+	if (!a_tsd)
+		a = a_tsd = &main_arena;
+	else {
+		a = a_tsd->next;
+		if (!a) {
+			/* This can only happen while initializing the new arena. */
+			(void)mutex_lock(&main_arena.mutex);
+			THREAD_STAT(++(main_arena.stat_lock_wait));
+			return &main_arena;
+		}
+	}
 
-  /* Check the global, circularly linked list for available arenas. */
-  bool retried = false;
+	/* Check the global, circularly linked list for available arenas. */
+	bool retried = false;
  repeat:
-  do {
-    if(!mutex_trylock(&a->mutex)) {
-      if (retried)
-	(void)mutex_unlock(&list_lock);
-      THREAD_STAT(++(a->stat_lock_loop));
-      tsd_setspecific(arena_key, (Void_t *)a);
-      return a;
-    }
-    a = a->next;
-  } while(a != a_tsd);
-
-  /* If not even the list_lock can be obtained, try again.  This can
-     happen during `atfork', or for example on systems where thread
-     creation makes it temporarily impossible to obtain _any_
-     locks. */
-  if(!retried && mutex_trylock(&list_lock)) {
-    /* We will block to not run in a busy loop.  */
-    (void)mutex_lock(&list_lock);
-
-    /* Since we blocked there might be an arena available now.  */
-    retried = true;
-    a = a_tsd;
-    goto repeat;
-  }
+	do {
+		if (!mutex_trylock(&a->mutex)) {
+			if (retried)
+				(void)mutex_unlock(&list_lock);
+			THREAD_STAT(++(a->stat_lock_loop));
+			tsd_setspecific(arena_key, (Void_t *) a);
+			return a;
+		}
+		a = a->next;
+	} while (a != a_tsd);
+
+	/* If not even the list_lock can be obtained, try again.  This can
+	   happen during `atfork', or for example on systems where thread
+	   creation makes it temporarily impossible to obtain _any_
+	   locks. */
+	if (!retried && mutex_trylock(&list_lock)) {
+		/* We will block to not run in a busy loop.  */
+		(void)mutex_lock(&list_lock);
+
+		/* Since we blocked there might be an arena available now.  */
+		retried = true;
+		a = a_tsd;
+		goto repeat;
+	}
 
-  /* Nothing immediately available, so generate a new arena.  */
-  a = _int_new_arena(size);
-  (void)mutex_unlock(&list_lock);
+	/* Nothing immediately available, so generate a new arena.  */
+	a = _int_new_arena(size);
+	(void)mutex_unlock(&list_lock);
 #endif
 
-  return a;
+	return a;
 }
 
 #ifdef PER_THREAD
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 7c94a8cefcac..c9644c382e05 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2209,43 +2209,43 @@ typedef struct malloc_chunk* mfastbinptr;
 */
 
 struct malloc_state {
-  /* Serialize access.  */
-  mutex_t mutex;
+	/* Serialize access.  */
+	mutex_t mutex;
 
-  /* Flags (formerly in max_fast).  */
-  int flags;
+	/* Flags (formerly in max_fast).  */
+	int flags;
 
 #if THREAD_STATS
-  /* Statistics for locking.  Only used if THREAD_STATS is defined.  */
-  long stat_lock_direct, stat_lock_loop, stat_lock_wait;
+	/* Statistics for locking.  Only used if THREAD_STATS is defined.  */
+	long stat_lock_direct, stat_lock_loop, stat_lock_wait;
 #endif
 
-  /* Fastbins */
-  mfastbinptr      fastbinsY[NFASTBINS];
+	/* Fastbins */
+	mfastbinptr fastbinsY[NFASTBINS];
 
-  /* Base of the topmost chunk -- not otherwise kept in a bin */
-  mchunkptr        top;
+	/* Base of the topmost chunk -- not otherwise kept in a bin */
+	mchunkptr top;
 
-  /* The remainder from the most recent split of a small request */
-  mchunkptr        last_remainder;
+	/* The remainder from the most recent split of a small request */
+	mchunkptr last_remainder;
 
-  /* Normal bins packed as described above */
-  mchunkptr        bins[NBINS * 2 - 2];
+	/* Normal bins packed as described above */
+	mchunkptr bins[NBINS * 2 - 2];
 
-  /* Bitmap of bins */
-  unsigned int     binmap[BINMAPSIZE];
+	/* Bitmap of bins */
+	unsigned int binmap[BINMAPSIZE];
 
-  /* Linked list */
-  struct malloc_state *next;
+	/* Linked list */
+	struct malloc_state *next;
 
 #ifdef PER_THREAD
-  /* Linked list for free arenas.  */
-  struct malloc_state *next_free;
+	/* Linked list for free arenas.  */
+	struct malloc_state *next_free;
 #endif
 
-  /* Memory allocated from the system in this arena.  */
-  INTERNAL_SIZE_T system_mem;
-  INTERNAL_SIZE_T max_system_mem;
+	/* Memory allocated from the system in this arena.  */
+	INTERNAL_SIZE_T system_mem;
+	INTERNAL_SIZE_T max_system_mem;
 };
 
 struct malloc_par {
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: fix perturb_byte handling for tcache
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (28 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -UATOMIC_FASTBINS Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: add locking to thread cache Joern Engel
                   ` (34 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Also removes stale comment for tcache_free.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 5 +----
 tpc/malloc2.13/tcache.h | 9 ++++++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 190c1d24b082..74b35f6aa366 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3290,7 +3290,7 @@ Void_t *public_mALLOc(size_t bytes)
 
 	victim = tcache_malloc(bytes);
 	if (victim)
-		goto out;
+		return victim;
 
 	ar_ptr = arena_get(bytes);
 	if (!ar_ptr)
@@ -3302,9 +3302,6 @@ Void_t *public_mALLOc(size_t bytes)
 	}
 	arena_unlock(ar_ptr);
 	assert(!victim || chunk_is_mmapped(mem2chunk(victim)) || ar_ptr == arena_for_chunk(mem2chunk(victim)));
-out:
-	if (perturb_byte)
-		alloc_perturb(victim, bytes);
 	return victim;
 }
 
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index edfe7acbc75e..00fe24249d49 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -287,6 +287,8 @@ static void *tcache_malloc(size_t size)
 		void *p = chunk2mem(victim);
 		cache->tc_size -= chunksize(victim);
 		cache->tc_count--;
+		if (perturb_byte)
+			alloc_perturb(p, size);
 		return p;
 	}
 
@@ -333,12 +335,11 @@ static void *tcache_malloc(size_t size)
 	}
 	arena_unlock(arena);
 	assert(!victim || arena == arena_for_chunk(mem2chunk(victim)));
+	if (perturb_byte)
+		alloc_perturb(victim, size);
 	return victim;
 }
 
-/*
- * returns 1 if object was freed
- */
 static void tcache_free(mchunkptr p)
 {
 	struct thread_cache *cache;
@@ -367,6 +368,8 @@ static void tcache_free(mchunkptr p)
 		malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p));
 		return;
 	}
+	if (perturb_byte)
+		free_perturb(p, size - 2 * SIZE_SZ);
 	add_to_bin(bin, p);
 	if (cache->tc_size > CACHE_SIZE)
 		tcache_gc(cache);
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: initial numa support
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (37 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: always free objects locklessly Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -UPER_THREAD -U_LIBC Joern Engel
                   ` (25 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Deliberately unoptimized.  We don't explicitly set the numa policy for
our heaps, relying on memory being allocated locally.  And we do a
syscall for every malloc, which is exceedingly expensive.

See how this one fares, then refine the code.
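
For reference, here is a self-contained sketch of the per-allocation node
lookup.  It mirrors the getnode() helper in the diff below; the demo
main() is only for illustration and is not part of the patch.

	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/* One getcpu() syscall per call, best effort: fall back to node 0
	   if the syscall fails.  This is the cost currently paid on every
	   malloc(). */
	static inline int getnode(void)
	{
		unsigned int node = 0;
		long ret = syscall(SYS_getcpu, NULL, &node, NULL);
		return (ret == -1) ? 0 : (int)node;
	}

	int main(void)
	{
		printf("running on numa node %d\n", getnode());
		return 0;
	}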

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 47 +++++++++++++++++++++++++++++------------------
 tpc/malloc2.13/malloc.c | 18 ++++++++++++++----
 2 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 118563003c8d..b8fc5c99a1cd 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -94,6 +94,17 @@ static int __malloc_initialized = -1;
 
 /**************************************************************************/
 
+/*
+ * Calling getcpu() for every allocation is too expensive - but we can turn
+ * the syscall into a pointer dereference to a kernel shared memory page.
+ */
+#include <sys/syscall.h>
+static inline int getnode(void)
+{
+	int node, ret;
+	ret = syscall(SYS_getcpu, NULL, &node, NULL);
+	return (ret == -1) ? 0 : node;
+}
 
 /* arena_get() acquires an arena and locks the corresponding mutex.
    First, try the one last locked successfully by this thread.  (This
@@ -110,7 +121,10 @@ static int __malloc_initialized = -1;
 
 #define arena_lookup(ptr) do { \
 	Void_t *vptr = NULL; \
+	int node = getnode(); \
 	ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
+	if (!ptr || ptr->numa_node != node) \
+		ptr = numa_arena[node]; \
 } while(0)
 
 #define arena_lock(ptr, size) do { \
@@ -330,12 +344,12 @@ ptmalloc_init_minimal (void)
   mp_.pagesize       = malloc_getpagesize;
 }
 
-
+static struct malloc_state *_int_new_arena(size_t size, int numa_node);
 
 static void ptmalloc_init(void)
 {
 	const char *s;
-	int secure = 0;
+	int i, secure = 0;
 
 	if (__malloc_initialized >= 0)
 		return;
@@ -358,7 +372,12 @@ static void ptmalloc_init(void)
 #endif				/* !defined NO_THREADS */
 	mutex_init(&main_arena.mutex);
 	main_arena.next = &main_arena;
-
+	numa_arena[0] = &main_arena;
+	for (i = 1; i < MAX_NUMA_NODES; i++) {
+		numa_arena[i] = _int_new_arena(0, i);
+		numa_arena[i]->local_next = numa_arena[i];
+		(void)mutex_unlock(&numa_arena[i]->mutex);
+	}
 
 	mutex_init(&list_lock);
 	tsd_key_create(&arena_key, NULL);
@@ -633,7 +652,7 @@ heap_trim(heap_info *heap, size_t pad)
 
 /* Create a new arena with initial size "size".  */
 
-static struct malloc_state *_int_new_arena(size_t size)
+static struct malloc_state *_int_new_arena(size_t size, int numa_node)
 {
 	struct malloc_state *a;
 	heap_info *h;
@@ -647,7 +666,7 @@ static struct malloc_state *_int_new_arena(size_t size)
 		   to deal with the large request via mmap_chunk().  */
 		h = new_heap(sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT, mp_.top_pad);
 		if (!h)
-			return 0;
+			abort();
 	}
 	a = h->ar_ptr = (struct malloc_state *)(h + 1);
 	malloc_init_state(a);
@@ -676,7 +695,7 @@ static struct malloc_state *_int_new_arena(size_t size)
 	a->next = main_arena.next;
 	atomic_write_barrier();
 	main_arena.next = a;
-
+	a->numa_node = numa_node;
 
 	THREAD_STAT(++(a->stat_lock_loop));
 
@@ -689,17 +708,9 @@ static struct malloc_state *internal_function arena_get2(struct malloc_state *a_
 {
 	struct malloc_state *a;
 
-	if (!a_tsd)
-		a = a_tsd = &main_arena;
-	else {
-		a = a_tsd->next;
-		if (!a) {
-			/* This can only happen while initializing the new arena. */
-			(void)mutex_lock(&main_arena.mutex);
-			THREAD_STAT(++(main_arena.stat_lock_wait));
-			return &main_arena;
-		}
-	}
+	a = a_tsd->next;
+	if (!a)
+		abort();
 
 	/* Check the global, circularly linked list for available arenas. */
 	bool retried = false;
@@ -730,7 +741,7 @@ static struct malloc_state *internal_function arena_get2(struct malloc_state *a_
 	}
 
 	/* Nothing immediately available, so generate a new arena.  */
-	a = _int_new_arena(size);
+	a = _int_new_arena(size, a_tsd->numa_node);
 	(void)mutex_unlock(&list_lock);
 
 	return a;
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index a420ef278e68..4bc6247d910e 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2154,6 +2154,14 @@ struct malloc_state {
 	/* Serialize access.  */
 	mutex_t mutex;
 
+	/* NUMA-local linked list */
+	struct malloc_state *local_next;
+
+	/* Linked list */
+	struct malloc_state *next;
+
+	int numa_node;
+
 	/* Flags (formerly in max_fast).  */
 	int flags;
 
@@ -2177,10 +2185,6 @@ struct malloc_state {
 	/* Bitmap of bins */
 	unsigned int binmap[BINMAPSIZE];
 
-	/* Linked list */
-	struct malloc_state *next;
-
-
 	/* Memory allocated from the system in this arena.  */
 	INTERNAL_SIZE_T system_mem;
 	INTERNAL_SIZE_T max_system_mem;
@@ -2223,6 +2227,12 @@ struct malloc_par {
 
 static struct malloc_state main_arena;
 
+/*
+ * For numa locality, have a per-node list of arenas.
+ */
+#define MAX_NUMA_NODES 2
+static struct malloc_state *numa_arena[MAX_NUMA_NODES];
+
 /* There is only one instance of the malloc parameters.  */
 
 static struct malloc_par mp_;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: fix startup races
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (18 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: create a useful assert Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: Revert glibc 1d05c2fb9c6f Joern Engel
                   ` (44 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Malloc() can get called before ptmalloc_init().  A check in arena_get()
covers that.  ptmalloc_init() should use cmpxchg or we could have
several concurrent initializers.

There still is a race where early threads can end up in malloc_starter()
and exhaust the available memory.  Most programs don't expect malloc() to
return NULL, so I consider that yet another failure.  An easy workaround
is to do an allocation before creating threads.
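
The initialization guard boils down to the following pattern.  This is a
standalone sketch of the same idea, not the patch itself; the -1/0/1
states mirror __malloc_initialized, and the names init_once() and
do_one_time_setup() are made up for the example.  The global is volatile
here so the spin loop re-reads it.

	#include <sched.h>

	static volatile int initialized = -1;	/* -1: untouched, 0: in progress, 1: done */

	static void do_one_time_setup(void)
	{
		/* ... whatever ptmalloc_init() would do ... */
	}

	static void init_once(void)
	{
		/* Exactly one thread wins the -1 -> 0 transition; the losers
		   spin until the winner publishes 1. */
		if (!__sync_bool_compare_and_swap(&initialized, -1, 0)) {
			while (initialized <= 0)
				sched_yield();
			return;
		}
		do_one_time_setup();
		initialized = 1;
	}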

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 8890e83ad18f..86e77ffe57f6 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -333,11 +333,13 @@ static void ptmalloc_init(void)
 	const char *s;
 	int i, secure = 0;
 
-	if (__malloc_initialized >= 0)
+	if (!__sync_bool_compare_and_swap(&__malloc_initialized, -1, 0)) {
+		do {
+			sched_yield();
+		} while (__malloc_initialized <= 0);
 		return;
-	__malloc_initialized = 0;
-
-		ptmalloc_init_minimal();
+	}
+	ptmalloc_init_minimal();
 
 #ifndef NO_THREADS
 #ifndef NO_STARTER
@@ -792,13 +794,19 @@ static inline int getnode(void)
    readily available, create a new one.  In this latter case, `size'
    is just a hint as to how much memory will be required immediately
    in the new arena. */
-
 static struct malloc_state *arena_get(size_t size)
 {
 	struct malloc_state *arena = NULL;
 	int node = getnode();
 
 	/*
+	 * It is possible to race with malloc_init and "win".  The
+	 * bug has existed for decades, but the race window grew
+	 * with numa_arenas.
+	 */
+	if (__malloc_initialized <= 0)
+		ptmalloc_init();
+	/*
 	 * getnode() is inherently racy.  It returns the correct node
 	 * number at the time of the syscall, but the thread may be
 	 * migrated to a different node at any moment, even before
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: better inline documentation
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (21 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: simplify and fix calloc Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: avoid main_arena Joern Engel
                   ` (41 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 20 ++++++++++++++++++++
 tpc/malloc2.13/malloc.c |  7 +++++++
 tpc/malloc2.13/tcache.h | 15 +++++++++++----
 3 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 685822897d97..7f50dacb8297 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -454,6 +454,19 @@ dump_heap(heap_info *heap)
    multiple threads, but only one will succeed.  */
 static char *aligned_heap_area;
 
+/*
+ * Preferentially mmaps huge pages, falling back on small pages if
+ * necessary.  For Pure kernels huge pages are not cleared, so we
+ * have to do so here.  Pushing it down to the caller is a small
+ * optimization in the cases where new_heap() allocated twice the
+ * necessary memory for alignment, then frees the unaligned bits.
+ * Only clearing the remaining half means we spend half the time.
+ *
+ * Only part of the heap has to be cleared, so yet another
+ * optimization would be possible.  Most likely we only need to
+ * clear the arena header and heap header.  But someone needs to do
+ * the homework before enabling this optimization.
+ */
 static void *mmap_for_heap(void *addr, size_t length, int *must_clear)
 {
 	int prot = PROT_READ | PROT_WRITE;
@@ -776,6 +789,13 @@ static struct malloc_state *arena_get(size_t size)
 	struct malloc_state *arena = NULL;
 	int node = getnode();
 
+	/*
+	 * getnode() is inherently racy.  It returns the correct node
+	 * number at the time of the syscall, but the thread may be
+	 * migrated to a different node at any moment, even before
+	 * getnode() returns.  Nothing we can do about this, we try
+	 * to use a numa-local arena, but are limited to best-effort.
+	 */
 	tsd_getspecific(arena_key, arena);
 	if (!arena || arena->numa_node != node) {
 		arena = numa_arena[node];
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 46b3545aaa8f..18c7b407bbea 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3247,6 +3247,13 @@ mremap_chunk(mchunkptr p, size_t new_size)
 
 #endif /* HAVE_MREMAP */
 
+/*
+ * The rationale behind this function is that if you cannot find
+ * enough memory through sbrk, which the main_arena uses, you might
+ * be successful with mmap or vice versa.  It is unclear whether the
+ * rationale still makes sense.  I invite anyone to do the mental
+ * exercise and prove we can remove this function.
+ */
 static struct malloc_state *get_backup_arena(struct malloc_state *arena, size_t bytes)
 {
 	if (arena != &main_arena) {
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 628dbc00256a..b269498657f3 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -19,10 +19,17 @@ static inline int fls(int x)
 
 /*
  * Per-thread cache is supposed to reduce lock contention on arenas.
- * When doing allocations we prefetch several identical objects and
- * can return the surplus on future allocations without going to an
- * arena.  On free we keep the freed object in hope of reusing it in
- * future allocations.
+ * Freed objects go to the cache first, allowing allocations to be
+ * serviced from it without going to the arenas.  On cache misses we
+ * have to take the arena lock, but can amortize the cost by
+ * prefetching additional objects for future use.
+ *
+ * Prefetching is a heuristic.  If an object of size X is requested,
+ * we assume more objects of the same size will be requested in the
+ * near future.  If correct, this reduces locking overhead.  If
+ * incorrect, we spend cpu cycles and pollute the tcache with unused
+ * objects.  The sweet spot depends on the workload, but seems to be
+ * around one.
  */
 #define CACHE_SIZE_BITS		(17)
 #define CACHE_SIZE		(1 << CACHE_SIZE_BITS)
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: introduce get_backup_arena()
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (42 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: fix hard-coded constant Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -DUSE_ARENAS -DHAVE_MMAP Joern Engel
                   ` (20 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Removes a lot of duplicate code.  Not all copies were identical and I
believe some were somewhat buggy.  Then again, this code is very
unlikely to run at all, so those bugs were equally unlikely to matter in
practice.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 124 ++++++++++++++----------------------------------
 1 file changed, 35 insertions(+), 89 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 28d9d902b7ec..7c94a8cefcac 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3396,6 +3396,20 @@ mremap_chunk(mchunkptr p, size_t new_size)
 
 #endif /* HAVE_MREMAP */
 
+static struct malloc_state *get_backup_arena(struct malloc_state *ar_ptr, size_t bytes)
+{
+	if (ar_ptr != &main_arena) {
+		/* Maybe the failure is due to running out of mmapped areas. */
+		(void)mutex_unlock(&ar_ptr->mutex);
+		ar_ptr = &main_arena;
+		(void)mutex_lock(&ar_ptr->mutex);
+	} else {
+		/* ... or sbrk() has failed and there is still a chance to mmap() */
+		ar_ptr = arena_get2(ar_ptr, bytes);
+		(void)mutex_unlock(&main_arena.mutex);
+	}
+	return ar_ptr;
+}
 
 /*------------------------ Public wrappers. --------------------------------*/
 
@@ -3409,30 +3423,15 @@ Void_t *public_mALLOc(size_t bytes)
 	if (__builtin_expect(hook != NULL, 0))
 		return (*hook) (bytes, RETURN_ADDRESS(0));
 
-	arena_lookup(ar_ptr);
-	arena_lock(ar_ptr, bytes);
+	arena_get(ar_ptr, bytes);
 	if (!ar_ptr)
 		return 0;
 	victim = _int_malloc(ar_ptr, bytes);
 	if (!victim) {
-		/* Maybe the failure is due to running out of mmapped areas. */
-		if (ar_ptr != &main_arena) {
-			(void)mutex_unlock(&ar_ptr->mutex);
-			ar_ptr = &main_arena;
-			(void)mutex_lock(&ar_ptr->mutex);
-			victim = _int_malloc(ar_ptr, bytes);
-			(void)mutex_unlock(&ar_ptr->mutex);
-		} else {
-			/* ... or sbrk() has failed and there is still a chance to mmap() */
-			ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
-			(void)mutex_unlock(&main_arena.mutex);
-			if (ar_ptr) {
-				victim = _int_malloc(ar_ptr, bytes);
-				(void)mutex_unlock(&ar_ptr->mutex);
-			}
-		}
-	} else
-		(void)mutex_unlock(&ar_ptr->mutex);
+		ar_ptr = get_backup_arena(ar_ptr, bytes);
+		victim = _int_malloc(ar_ptr, bytes);
+	}
+	(void)mutex_unlock(&ar_ptr->mutex);
 	assert(!victim || chunk_is_mmapped(mem2chunk(victim)) || ar_ptr == arena_for_chunk(mem2chunk(victim)));
 	return victim;
 }
@@ -3618,25 +3617,10 @@ Void_t *public_mEMALIGn(size_t alignment, size_t bytes)
 		return 0;
 	p = _int_memalign(ar_ptr, alignment, bytes);
 	if (!p) {
-		/* Maybe the failure is due to running out of mmapped areas. */
-		if (ar_ptr != &main_arena) {
-			(void)mutex_unlock(&ar_ptr->mutex);
-			ar_ptr = &main_arena;
-			(void)mutex_lock(&ar_ptr->mutex);
-			p = _int_memalign(ar_ptr, alignment, bytes);
-			(void)mutex_unlock(&ar_ptr->mutex);
-		} else {
-			/* ... or sbrk() has failed and there is still a chance to mmap() */
-			struct malloc_state *prev = ar_ptr->next ? ar_ptr : 0;
-			(void)mutex_unlock(&ar_ptr->mutex);
-			ar_ptr = arena_get2(prev, bytes);
-			if (ar_ptr) {
-				p = _int_memalign(ar_ptr, alignment, bytes);
-				(void)mutex_unlock(&ar_ptr->mutex);
-			}
-		}
-	} else
-		(void)mutex_unlock(&ar_ptr->mutex);
+		ar_ptr = get_backup_arena(ar_ptr, bytes);
+		p = _int_memalign(ar_ptr, alignment, bytes);
+	}
+	(void)mutex_unlock(&ar_ptr->mutex);
 	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 	return p;
 }
@@ -3661,21 +3645,10 @@ Void_t *public_vALLOc(size_t bytes)
 	p = _int_valloc(ar_ptr, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
 	if (!p) {
-		/* Maybe the failure is due to running out of mmapped areas. */
-		if (ar_ptr != &main_arena) {
-			ar_ptr = &main_arena;
-			(void)mutex_lock(&ar_ptr->mutex);
-			p = _int_memalign(ar_ptr, pagesz, bytes);
-			(void)mutex_unlock(&ar_ptr->mutex);
-		} else {
-			/* ... or sbrk() has failed and there is still a chance to mmap() */
-			ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
-			if (ar_ptr) {
-				p = _int_memalign(ar_ptr, pagesz, bytes);
-				(void)mutex_unlock(&ar_ptr->mutex);
-			}
-		}
+		ar_ptr = get_backup_arena(ar_ptr, bytes);
+		p = _int_memalign(ar_ptr, pagesz, bytes);
 	}
+	(void)mutex_unlock(&ar_ptr->mutex);
 	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 
 	return p;
@@ -3701,21 +3674,10 @@ Void_t *public_pVALLOc(size_t bytes)
 	p = _int_pvalloc(ar_ptr, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
 	if (!p) {
-		/* Maybe the failure is due to running out of mmapped areas. */
-		if (ar_ptr != &main_arena) {
-			ar_ptr = &main_arena;
-			(void)mutex_lock(&ar_ptr->mutex);
-			p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
-			(void)mutex_unlock(&ar_ptr->mutex);
-		} else {
-			/* ... or sbrk() has failed and there is still a chance to mmap() */
-			ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes + 2 * pagesz + MINSIZE);
-			if (ar_ptr) {
-				p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
-				(void)mutex_unlock(&ar_ptr->mutex);
-			}
-		}
+		ar_ptr = get_backup_arena(ar_ptr, bytes);
+		p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
 	}
+	(void)mutex_unlock(&ar_ptr->mutex);
 	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 
 	return p;
@@ -3780,31 +3742,15 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 	}
 #endif
 	mem = _int_malloc(av, sz);
-
-	/* Only clearing follows, so we can unlock early. */
+	if (mem == 0) {
+		av = get_backup_arena(av, bytes);
+		mem = _int_malloc(&main_arena, sz);
+	}
 	(void)mutex_unlock(&av->mutex);
 
 	assert(!mem || chunk_is_mmapped(mem2chunk(mem)) || av == arena_for_chunk(mem2chunk(mem)));
-
-	if (mem == 0) {
-		/* Maybe the failure is due to running out of mmapped areas. */
-		if (av != &main_arena) {
-			(void)mutex_lock(&main_arena.mutex);
-			mem = _int_malloc(&main_arena, sz);
-			(void)mutex_unlock(&main_arena.mutex);
-		} else {
-			/* ... or sbrk() has failed and there is still a chance to mmap() */
-			(void)mutex_lock(&main_arena.mutex);
-			av = arena_get2(av->next ? av : 0, sz);
-			(void)mutex_unlock(&main_arena.mutex);
-			if (av) {
-				mem = _int_malloc(av, sz);
-				(void)mutex_unlock(&av->mutex);
-			}
-		}
-		if (mem == 0)
-			return 0;
-	}
+	if (mem == 0)
+		return 0;
 	p = mem2chunk(mem);
 
 	/* Two optional cases in which clearing not necessary */
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: quenche last compiler warnings
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (15 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: make numa_node_count more robust Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: tune thread cache Joern Engel
                   ` (47 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

It looks like __THROW was changed at some point when the __leaf__
attribute was introduced, and our version of malloc is out of date with
respect to modern compilers and headers.

See /usr/include/x86_64-linux-gnu/sys/cdefs.h
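
For context, recent glibc headers define the two macros approximately
like this (paraphrased from sys/cdefs.h for gcc >= 4.6; check the file
on your system for the exact conditionals):

	# define __LEAF		, __leaf__
	# define __THROW	__attribute__ ((__nothrow__ __LEAF))	/* nothrow + leaf */
	# define __THROWNL	__attribute__ ((__nothrow__))		/* nothrow only */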

JIRA: PURE-27597
---
 tpc/malloc2.13/hooks.h  | 2 +-
 tpc/malloc2.13/malloc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
index f855b0f1b165..afc8eeb93a8b 100644
--- a/tpc/malloc2.13/hooks.h
+++ b/tpc/malloc2.13/hooks.h
@@ -343,7 +343,7 @@ realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
 static Void_t*
 memalign_check(size_t alignment, size_t bytes, const Void_t *caller)
 {
-  INTERNAL_SIZE_T nb;
+  INTERNAL_SIZE_T nb __attribute__((unused));
   Void_t* mem;
 
   if (alignment <= MALLOC_ALIGNMENT) return malloc_check(bytes, NULL);
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 9f2d2df47ea1..d9fecfe3f921 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -1324,7 +1324,7 @@ struct malloc_state;
 
 #define __malloc_ptr_t void *
 
-# define __MALLOC_P(args)	args __THROW
+# define __MALLOC_P(args)	args __THROWNL
 /* This macro will be used for functions which might take C++ callback
    functions.  */
 # define __MALLOC_PMT(args)	args
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: s/max_node/num_nodes/
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (11 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: remove __builtin_expect Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: add documentation Joern Engel
                   ` (51 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Costa complained, and I once had a bug because I confused max_node with
num_nodes.  Also renames parse_node_count to numa_node_count and makes it
return the count instead of setting a global.  You expect something
called foo_init to have side-effects, but not necessarily something
called parse_foo.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 22 ++++++++++++----------
 tpc/malloc2.13/malloc.c |  7 +++++--
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 9236a3231f07..685822897d97 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -297,7 +297,7 @@ ptmalloc_init_minimal (void)
 
 static struct malloc_state *_int_new_arena(size_t size, int numa_node);
 
-static int max_node = -1;
+static int num_nodes = 0;
 
 #include <sys/types.h>
 #include <dirent.h>
@@ -306,26 +306,28 @@ static int max_node = -1;
 /*
  * Wouldn't it be nice to get this with a single syscall instead?
  */
-static void parse_node_count(void)
+static int numa_node_count(void)
 {
 	DIR *d;
 	struct dirent *de;
+	int ret;
 
 	d = opendir("/sys/devices/system/node");
 	if (!d) {
-		max_node = 0;
+		ret = 1;
 	} else {
 		while ((de = readdir(d)) != NULL) {
 			int nd;
 			if (strncmp(de->d_name, "node", 4))
 				continue;
 			nd = strtoul(de->d_name + 4, NULL, 0);
-			if (max_node < nd)
-				max_node = nd;
+			if (ret < nd + 1)
+				ret = nd + 1;
 		}
 		closedir(d);
 	}
-	assert(max_node < MAX_NUMA_NODES);
+	assert(ret <= MAX_NUMA_NODES);
+	return ret;
 }
 
 static void ptmalloc_init(void)
@@ -355,8 +357,8 @@ static void ptmalloc_init(void)
 	mutex_init(&main_arena.mutex);
 	main_arena.next = &main_arena;
 	main_arena.numa_node = -1;
-	parse_node_count();
-	for (i = 0; i <= max_node; i++) {
+	num_nodes = numa_node_count();
+	for (i = 0; i < num_nodes; i++) {
 		numa_arena[i] = _int_new_arena(0, i);
 		numa_arena[i]->local_next = numa_arena[i];
 		(void)mutex_unlock(&numa_arena[i]->mutex);
@@ -474,8 +476,8 @@ static void mbind_memory(void *mem, size_t size, int node)
 	unsigned long node_mask = 1 << node;
 	int err;
 
-	assert(max_node < sizeof(unsigned long));
-	err = mbind(mem, size, MPOL_PREFERRED, &node_mask, max_node + 1, 0);
+	assert(num_nodes <= sizeof(unsigned long));
+	err = mbind(mem, size, MPOL_PREFERRED, &node_mask, num_nodes, 0);
 	assert(!err);
 }
 
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 94b7e223ec6f..46b3545aaa8f 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2154,9 +2154,12 @@ struct malloc_par {
 static struct malloc_state main_arena;
 
 /*
- * For numa locality, have a per-node list of arenas.
+ * For numa locality, have a per-node list of arenas.  64 ought to be
+ * enough for anybody - for a while, at least.  Having it match
+ * wordsize is convenient, as mbind_memory() will need to be changed
+ * at the same time as MAX_NUMA_NODES.
  */
-#define MAX_NUMA_NODES 2
+#define MAX_NUMA_NODES sizeof(unsigned long)
 static struct malloc_state *numa_arena[MAX_NUMA_NODES];
 
 /* There is only one instance of the malloc parameters.  */
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: unifdef -m -UATOMIC_FASTBINS
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (27 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: use mbind() Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: fix perturb_byte handling for tcache Joern Engel
                   ` (35 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Atomic fastbins are a horrible eyesore, known to be buggy in our
version and to cause unnecessary cacheline ping-pong.  We can do better
than that.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  |   6 --
 tpc/malloc2.13/hooks.h  |   8 ---
 tpc/malloc2.13/malloc.c | 148 ------------------------------------------------
 3 files changed, 162 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 6ac0635364af..ef5e22a0811d 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -169,11 +169,6 @@ free_atfork(Void_t* mem, const Void_t *caller)
     return;
   }
 
-#ifdef ATOMIC_FASTBINS
-  ar_ptr = arena_for_chunk(p);
-  tsd_getspecific(arena_key, vptr);
-  _int_free(ar_ptr, p, vptr == ATFORK_ARENA_PTR);
-#else
   ar_ptr = arena_for_chunk(p);
   tsd_getspecific(arena_key, vptr);
   if(vptr != ATFORK_ARENA_PTR)
@@ -181,7 +176,6 @@ free_atfork(Void_t* mem, const Void_t *caller)
   _int_free(ar_ptr, p);
   if(vptr != ATFORK_ARENA_PTR)
     (void)mutex_unlock(&ar_ptr->mutex);
-#endif
 }
 
 
diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
index c236aab89237..f855b0f1b165 100644
--- a/tpc/malloc2.13/hooks.h
+++ b/tpc/malloc2.13/hooks.h
@@ -257,11 +257,7 @@ free_check(Void_t* mem, const Void_t *caller)
 #if 0 /* Erase freed memory. */
   memset(mem, 0, chunksize(p) - (SIZE_SZ+1));
 #endif
-#ifdef ATOMIC_FASTBINS
-  _int_free(&main_arena, p, 1);
-#else
   _int_free(&main_arena, p);
-#endif
   (void)mutex_unlock(&main_arena.mutex);
 }
 
@@ -406,11 +402,7 @@ free_starter(Void_t* mem, const Void_t *caller)
     munmap_chunk(p);
     return;
   }
-#ifdef ATOMIC_FASTBINS
-  _int_free(&main_arena, p, 1);
-#else
   _int_free(&main_arena, p);
-#endif
 }
 
 # endif	/* !defiend NO_STARTER */
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 9df824584745..ddc9d51c445b 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -1358,11 +1358,7 @@ struct mallinfo2 {
 
 
 static Void_t*  _int_malloc(struct malloc_state *, size_t);
-#ifdef ATOMIC_FASTBINS
-static void     _int_free(struct malloc_state *, mchunkptr, int);
-#else
 static void     _int_free(struct malloc_state *, mchunkptr);
-#endif
 static Void_t*  _int_realloc(struct malloc_state *, mchunkptr, INTERNAL_SIZE_T,
 			     INTERNAL_SIZE_T);
 static Void_t*  _int_memalign(struct malloc_state *, size_t, size_t);
@@ -2035,13 +2031,8 @@ typedef struct malloc_chunk* mfastbinptr;
 #define FASTCHUNKS_BIT        (1U)
 
 #define have_fastchunks(M)     (((M)->flags &  FASTCHUNKS_BIT) == 0)
-#ifdef ATOMIC_FASTBINS
-#define clear_fastchunks(M)    catomic_or (&(M)->flags, FASTCHUNKS_BIT)
-#define set_fastchunks(M)      catomic_and (&(M)->flags, ~FASTCHUNKS_BIT)
-#else
 #define clear_fastchunks(M)    ((M)->flags |=  FASTCHUNKS_BIT)
 #define set_fastchunks(M)      ((M)->flags &= ~FASTCHUNKS_BIT)
-#endif
 
 /*
   NONCONTIGUOUS_BIT indicates that MORECORE does not return contiguous
@@ -2756,10 +2747,8 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
   /* Precondition: not enough current space to satisfy nb request */
   assert((unsigned long)(old_size) < (unsigned long)(nb + MINSIZE));
 
-#ifndef ATOMIC_FASTBINS
   /* Precondition: all fastbins are consolidated */
   assert(!have_fastchunks(av));
-#endif
 
 
   if (av != &main_arena) {
@@ -2805,11 +2794,7 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
 	set_head(chunk_at_offset(old_top, old_size), (2*SIZE_SZ)|PREV_INUSE);
 	set_foot(chunk_at_offset(old_top, old_size), (2*SIZE_SZ));
 	set_head(old_top, old_size|PREV_INUSE|NON_MAIN_ARENA);
-#ifdef ATOMIC_FASTBINS
-	_int_free(av, old_top, 1);
-#else
 	_int_free(av, old_top);
-#endif
       } else {
 	set_head(old_top, (old_size + 2*SIZE_SZ)|PREV_INUSE);
 	set_foot(old_top, (old_size + 2*SIZE_SZ));
@@ -3049,11 +3034,7 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
 
 	  /* If possible, release the rest. */
 	  if (old_size >= MINSIZE) {
-#ifdef ATOMIC_FASTBINS
-	    _int_free(av, old_top, 1);
-#else
 	    _int_free(av, old_top);
-#endif
 	  }
 
 	}
@@ -3314,13 +3295,9 @@ public_fREe(Void_t* mem)
   }
 
   ar_ptr = arena_for_chunk(p);
-#ifdef ATOMIC_FASTBINS
-  _int_free(ar_ptr, p, 0);
-#else
   arena_lock(ar_ptr);
   _int_free(ar_ptr, p);
   arena_unlock(ar_ptr);
-#endif
 }
 
 Void_t*
@@ -3400,13 +3377,9 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
       if (newp != NULL)
 	{
 	  MALLOC_COPY (newp, oldmem, oldsize - SIZE_SZ);
-#ifdef ATOMIC_FASTBINS
-	  _int_free(ar_ptr, oldp, 0);
-#else
 	  arena_lock(ar_ptr);
 	  _int_free(ar_ptr, oldp);
 	  arena_unlock(ar_ptr);
-#endif
 	}
     }
 
@@ -3771,19 +3744,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
   if ((unsigned long)(nb) <= (unsigned long)(get_max_fast ())) {
     idx = fastbin_index(nb);
     mfastbinptr* fb = &fastbin (av, idx);
-#ifdef ATOMIC_FASTBINS
-    mchunkptr pp = *fb;
-    do
-      {
-	victim = pp;
-	if (victim == NULL)
-	  break;
-      }
-    while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
-	   != victim);
-#else
     victim = *fb;
-#endif
     if (victim != 0) {
       if (fastbin_index (chunksize (victim)) != idx)
 	{
@@ -3792,9 +3753,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	  malloc_printerr (check_action, errstr, chunk2mem (victim));
 	  return NULL;
 	}
-#ifndef ATOMIC_FASTBINS
       *fb = victim->fd;
-#endif
       check_remalloced_chunk(av, victim, nb);
       void *p = chunk2mem(victim);
       if (perturb_byte)
@@ -4199,18 +4158,6 @@ _int_malloc(struct malloc_state * av, size_t bytes)
       return p;
     }
 
-#ifdef ATOMIC_FASTBINS
-    /* When we are using atomic ops to free fast chunks we can get
-       here for all block sizes.  */
-    else if (have_fastchunks(av)) {
-      malloc_consolidate(av);
-      /* restore original bin index */
-      if (in_smallbin_range(nb))
-	idx = smallbin_index(nb);
-      else
-	idx = largebin_index(nb);
-    }
-#else
     /*
       If there is space available in fastbins, consolidate and retry,
       to possibly avoid expanding memory. This can occur only if nb is
@@ -4222,7 +4169,6 @@ _int_malloc(struct malloc_state * av, size_t bytes)
       malloc_consolidate(av);
       idx = smallbin_index(nb); /* restore original bin index */
     }
-#endif
 
     /*
        Otherwise, relay to handle system-dependent cases
@@ -4241,11 +4187,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 */
 
 static void
-#ifdef ATOMIC_FASTBINS
-_int_free(struct malloc_state * av, mchunkptr p, int have_lock)
-#else
 _int_free(struct malloc_state * av, mchunkptr p)
-#endif
 {
   INTERNAL_SIZE_T size;        /* its size */
   mfastbinptr*    fb;          /* associated fastbin */
@@ -4257,9 +4199,6 @@ _int_free(struct malloc_state * av, mchunkptr p)
   mchunkptr       fwd;         /* misc temp for linking */
 
   const char *errstr = NULL;
-#ifdef ATOMIC_FASTBINS
-  int locked = 0;
-#endif
 
   size = chunksize(p);
 
@@ -4272,10 +4211,6 @@ _int_free(struct malloc_state * av, mchunkptr p)
     {
       errstr = "free(): invalid pointer";
     errout:
-#ifdef ATOMIC_FASTBINS
-      if (! have_lock && locked)
-	arena_unlock(av);
-#endif
       malloc_printerr (check_action, errstr, chunk2mem(p));
       return;
     }
@@ -4308,29 +4243,10 @@ _int_free(struct malloc_state * av, mchunkptr p)
 	|| chunksize (chunk_at_offset (p, size))
 			     >= av->system_mem)
       {
-#ifdef ATOMIC_FASTBINS
-	/* We might not have a lock at this point and concurrent modifications
-	   of system_mem might have let to a false positive.  Redo the test
-	   after getting the lock.  */
-	if (have_lock
-	    || ({ assert (locked == 0);
-		  arena_lock*av);
-		  locked = 1;
-		  chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
-		    || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
-	      }))
-#endif
 	  {
 	    errstr = "free(): invalid next size (fast)";
 	    goto errout;
 	  }
-#ifdef ATOMIC_FASTBINS
-	if (! have_lock)
-	  {
-	    arena_unlock(av);
-	    locked = 0;
-	  }
-#endif
       }
 
     if (perturb_byte)
@@ -4340,31 +4256,6 @@ _int_free(struct malloc_state * av, mchunkptr p)
     unsigned int idx = fastbin_index(size);
     fb = &fastbin (av, idx);
 
-#ifdef ATOMIC_FASTBINS
-    mchunkptr fd;
-    mchunkptr old = *fb;
-    unsigned int old_idx = ~0u;
-    do
-      {
-	/* Another simple check: make sure the top of the bin is not the
-	   record we are going to add (i.e., double free).  */
-	if (old == p)
-	  {
-	    errstr = "double free or corruption (fasttop)";
-	    goto errout;
-	  }
-	if (old != NULL)
-	  old_idx = fastbin_index(chunksize(old));
-	p->fd = fd = old;
-      }
-    while ((old = catomic_compare_and_exchange_val_rel (fb, p, fd)) != fd);
-
-    if (fd != NULL && old_idx != idx)
-      {
-	errstr = "invalid fastbin entry (free)";
-	goto errout;
-      }
-#else
     /* Another simple check: make sure the top of the bin is not the
        record we are going to add (i.e., double free).  */
     if (*fb == p)
@@ -4381,7 +4272,6 @@ _int_free(struct malloc_state * av, mchunkptr p)
 
     p->fd = *fb;
     *fb = p;
-#endif
   }
 
   /*
@@ -4389,12 +4279,6 @@ _int_free(struct malloc_state * av, mchunkptr p)
   */
 
   else if (!chunk_is_mmapped(p)) {
-#ifdef ATOMIC_FASTBINS
-    if (! have_lock) {
-      arena_lock(av);
-      locked = 1;
-    }
-#endif
 
     nextchunk = chunk_at_offset(p, size);
 
@@ -4524,12 +4408,6 @@ _int_free(struct malloc_state * av, mchunkptr p)
       }
     }
 
-#ifdef ATOMIC_FASTBINS
-    if (! have_lock) {
-      assert (locked);
-      arena_unlock(av);
-    }
-#endif
   }
   /*
     If the chunk was allocated via mmap, release via munmap(). Note
@@ -4605,15 +4483,9 @@ static void malloc_consolidate(struct malloc_state * av)
 #endif
     fb = &fastbin (av, 0);
     do {
-#ifdef ATOMIC_FASTBINS
-      p = atomic_exchange_acq (fb, 0);
-#else
       p = *fb;
-#endif
       if (p != 0) {
-#ifndef ATOMIC_FASTBINS
 	*fb = 0;
-#endif
 	do {
 	  check_inuse_chunk(av, p);
 	  nextp = p->fd;
@@ -4804,11 +4676,7 @@ _int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
 	    }
 	  }
 
-#ifdef ATOMIC_FASTBINS
-	  _int_free(av, oldp, 1);
-#else
 	  _int_free(av, oldp);
-#endif
 	  check_inuse_chunk(av, newp);
 	  return chunk2mem(newp);
 	}
@@ -4832,11 +4700,7 @@ _int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
 	       (av != &main_arena ? NON_MAIN_ARENA : 0));
       /* Mark remainder as inuse so free() won't complain */
       set_inuse_bit_at_offset(remainder, remainder_size);
-#ifdef ATOMIC_FASTBINS
-      _int_free(av, remainder, 1);
-#else
       _int_free(av, remainder);
-#endif
     }
 
     check_inuse_chunk(av, newp);
@@ -4895,11 +4759,7 @@ _int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
       newmem = _int_malloc(av, nb - MALLOC_ALIGN_MASK);
       if (newmem != 0) {
 	MALLOC_COPY(newmem, chunk2mem(oldp), oldsize - 2*SIZE_SZ);
-#ifdef ATOMIC_FASTBINS
-	_int_free(av, oldp, 1);
-#else
 	_int_free(av, oldp);
-#endif
       }
     }
     return newmem;
@@ -4988,11 +4848,7 @@ _int_memalign(struct malloc_state * av, size_t alignment, size_t bytes)
 	     (av != &main_arena ? NON_MAIN_ARENA : 0));
     set_inuse_bit_at_offset(newp, newsize);
     set_head_size(p, leadsize | (av != &main_arena ? NON_MAIN_ARENA : 0));
-#ifdef ATOMIC_FASTBINS
-    _int_free(av, p, 1);
-#else
     _int_free(av, p);
-#endif
     p = newp;
 
     assert (newsize >= nb &&
@@ -5008,11 +4864,7 @@ _int_memalign(struct malloc_state * av, size_t alignment, size_t bytes)
       set_head(remainder, remainder_size | PREV_INUSE |
 	       (av != &main_arena ? NON_MAIN_ARENA : 0));
       set_head_size(p, nb);
-#ifdef ATOMIC_FASTBINS
-      _int_free(av, remainder, 1);
-#else
       _int_free(av, remainder);
-#endif
     }
   }
 
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: prefetch for tcache_malloc
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (40 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: destroy thread cache on thread exit Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: fix hard-coded constant Joern Engel
                   ` (22 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

We also cycle through the entire bin if the first object doesn't fit.
This can be somewhat expensive, but we either find a match or will
prefetch and find a match on one of the next few allocations.
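
The bin walk itself is a plain first-fit traversal of a singly linked
list.  Here is a simplified standalone sketch; struct chunk and
bin_take_first_fit() are stand-ins for the real malloc_chunk and
tcache_malloc() code below, which additionally validates the bin index.

	#include <stddef.h>

	struct chunk {
		size_t size;
		struct chunk *fd;
	};

	/* Walk the bin until a chunk of at least nb bytes is found, unlink
	   it from wherever it sits, and return it. */
	static struct chunk *bin_take_first_fit(struct chunk **bin, size_t nb)
	{
		while (*bin) {
			struct chunk *victim = *bin;
			if (victim->size >= nb) {
				*bin = victim->fd;	/* unlink */
				return victim;
			}
			bin = &victim->fd;		/* keep walking */
		}
		return NULL;
	}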

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c |  5 +++-
 tpc/malloc2.13/tcache.h | 68 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 40f6aa578c6f..1ee563bb299e 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3267,7 +3267,7 @@ Void_t *public_mALLOc(size_t bytes)
 
 	victim = tcache_malloc(bytes);
 	if (victim)
-		return victim;
+		goto out;
 
 	ar_ptr = arena_get(bytes);
 	if (!ar_ptr)
@@ -3279,6 +3279,9 @@ Void_t *public_mALLOc(size_t bytes)
 	}
 	arena_unlock(ar_ptr);
 	assert(!victim || chunk_is_mmapped(mem2chunk(victim)) || ar_ptr == arena_for_chunk(mem2chunk(victim)));
+out:
+	if (perturb_byte)
+		alloc_perturb(victim, bytes);
 	return victim;
 }
 
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 62d58cc77475..7cf6b316456f 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -27,7 +27,7 @@ static inline int fls(int x)
 #define CACHE_SIZE		(1 << 16)
 #define MAX_CACHED_SIZE		(CACHE_SIZE >> 3)
 #define MAX_PREFETCH_SIZE	(CACHE_SIZE >> 6)
-#define NO_PREFECT		(1 << 3)
+#define NO_PREFETCH		(1 << 3)
 
 /*
  * Binning is done as a subdivided buddy allocator.  A buddy allocator
@@ -102,9 +102,9 @@ static void *tcache_malloc(size_t size)
 {
 	struct thread_cache *cache;
 	struct malloc_state *arena;
-	struct malloc_chunk **bin, *victim;
+	struct malloc_chunk **bin, *victim, *prefetch;
 	size_t nb;
-	int bin_no;
+	int bin_no, i;
 
 	checked_request2size(size, nb);
 	if (nb > MAX_CACHED_SIZE)
@@ -114,6 +114,10 @@ static void *tcache_malloc(size_t size)
 	if (!cache) {
 		arena = arena_get(sizeof(*cache));
 		cache = _int_malloc(arena, sizeof(*cache));
+		if (!cache) {
+			arena = get_backup_arena(arena, sizeof(*cache));
+			cache = _int_malloc(arena, sizeof(*cache));
+		}
 		arena_unlock(arena);
 		if (!cache)
 			return NULL;
@@ -126,23 +130,67 @@ static void *tcache_malloc(size_t size)
 
 	bin = &cache->tc_bin[bin_no];
 	victim = *bin;
-	if (victim) {
-		if (chunksize(victim) < nb)
-			return NULL;
+	while (victim) {
+		if (chunksize(victim) < nb) {
+			bin = &victim->fd;
+			victim = *bin;
+			continue;
+		}
 		if (cache_bin(chunksize(*bin)) != bin_no) {
 			malloc_printerr(check_action, "invalid tcache entry", victim);
 			return NULL;
 		}
 		*bin = victim->fd;
 		void *p = chunk2mem(victim);
-		if (perturb_byte)
-			alloc_perturb(p, size);
 		cache->tc_size -= chunksize(victim);
 		cache->tc_count--;
 		return p;
 	}
-	/* TODO: prefetch objects */
-	return NULL;
+
+	/*
+	 * GC the cache before prefetching, not after.  The last thing
+	 * we want is to spend effort prefetching, then release all
+	 * those objects via cache_gc.  Also do it before taking the
+	 * lock, to minimize hold times.
+	 */
+	if (nb <= MAX_PREFETCH_SIZE && (cache->tc_size + nb * 8) > CACHE_SIZE )
+		cache_gc(cache);
+
+	arena = arena_get(size);
+	if (!arena)
+		return NULL;
+	/* TODO: _int_malloc does checked_request2size() again, which is silly */
+	victim = _int_malloc(arena, size);
+	if (!victim) {
+		arena = get_backup_arena(arena, size);
+		victim = _int_malloc(arena, size);
+	}
+	if (victim && nb <= MAX_PREFETCH_SIZE) {
+		/* Prefetch some more while we hold the lock */
+		for (i = 0; i < NO_PREFETCH; i++) {
+			prefetch = _int_malloc(arena, size);
+			if (!prefetch)
+				break;
+			prefetch = mem2chunk(prefetch);
+			if (cache_bin(chunksize(prefetch)) > bin_no) {
+				/*
+				 * If _int_malloc() returns bigger chunks,
+				 * we assume that prefetching won't buy us
+				 * any benefits.
+				 */
+				_int_free(arena, prefetch);
+				break;
+			}
+			assert(cache_bin(chunksize(prefetch)) == bin_no);
+			cache->tc_size += chunksize(prefetch);
+			cache->tc_count++;
+			prefetch->fd = *bin;
+			*bin = prefetch;
+		}
+	}
+	arena_unlock(arena);
+	assert(!victim || arena == arena_for_chunk(mem2chunk(victim)));
+	return victim;
 }
 
 /*
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: document and fix linked list handling
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (34 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: hide THREAD_STATS Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: Lindent users of arena_get2 Joern Engel
                   ` (28 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Costa wondered why we have a write barrier without a matching read
barrier.  On closer examination that pattern is actually correct,
although by no means obvious.  There were, however, two bugs.

commit 52c75b22e320:
    Also, change the numa_arena[node] pointers whenever we use them.
    Otherwise those arenas become hot spots.  If we move them around, the
    load gets spread roughly evenly.

While this sounds good in principle, it races with _int_new_arena.  We
have to take the list_lock to do so safely.  At this point I'd rather
remove the code than go fancy with trylocks.

The second bug seems to be ancient.  atomic_write_barrier() protects
against the compiler reordering writes, but not against the CPU doing so.
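
The pattern being documented is the usual publish-after-initialization
scheme.  A minimal standalone sketch (names are made up; a single writer
is assumed here, which in the malloc code is guaranteed by list_lock):

	struct node {
		int data;
		struct node *next;
	};

	static struct node *list_head;

	/* Fill in the node completely, then issue a write barrier, then
	   make it reachable.  A lockless reader that follows next pointers
	   sees either the old list or a fully initialized node, never a
	   half-built one. */
	static void publish(struct node *n, int data)
	{
		n->data = data;
		n->next = list_head;
		__sync_synchronize();
		list_head = n;
	}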

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h          | 16 +++++++++++++---
 tpc/malloc2.13/malloc-machine.h |  6 +++---
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 7f50dacb8297..20c3614e65bf 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -706,6 +706,17 @@ static struct malloc_state *_int_new_arena(size_t size, int numa_node)
 	/* Add the new arena to the global lists.  */
 	a->numa_node = numa_node;
 
+	/*
+	 * a->next must be a valid pointer before changing
+	 * main_arena.next, otherwise a reader following the list
+	 * pointers could end up dereferencing NULL or worse.  The
+	 * same reasoning applies to a->local_next.
+	 *
+	 * No atomic_read_barrier() is needed, because these are
+	 * dependent reads that are naturally ordered.  You really
+	 * cannot load a->next->next before loading a->next, after
+	 * all.
+	 */
 	a->next = main_arena.next;
 	atomic_write_barrier();
 	main_arena.next = a;
@@ -797,10 +808,9 @@ static struct malloc_state *arena_get(size_t size)
 	 * to use a numa-local arena, but are limited to best-effort.
 	 */
 	tsd_getspecific(arena_key, arena);
-	if (!arena || arena->numa_node != node) {
+	if (!arena || arena->numa_node != node)
 		arena = numa_arena[node];
-		numa_arena[node] = arena->local_next;
-	}
+
 	if (arena && !mutex_trylock(&arena->mutex)) {
 		THREAD_STAT(++(arena->stat_lock_direct));
 	} else
diff --git a/tpc/malloc2.13/malloc-machine.h b/tpc/malloc2.13/malloc-machine.h
index 5e01a27e4adc..07072f5d5e11 100644
--- a/tpc/malloc2.13/malloc-machine.h
+++ b/tpc/malloc2.13/malloc-machine.h
@@ -110,15 +110,15 @@ typedef pthread_key_t tsd_key_t;
 
 
 #ifndef atomic_full_barrier
-# define atomic_full_barrier() __asm ("" ::: "memory")
+# define atomic_full_barrier() __sync_synchronize()
 #endif
 
 #ifndef atomic_read_barrier
-# define atomic_read_barrier() atomic_full_barrier ()
+# define atomic_read_barrier() atomic_full_barrier()
 #endif
 
 #ifndef atomic_write_barrier
-# define atomic_write_barrier() atomic_full_barrier ()
+# define atomic_write_barrier() atomic_full_barrier()
 #endif
 
 #ifndef DEFAULT_TOP_PAD
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: Lindent users of arena_get2
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (35 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: document and fix linked list handling Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: always free objects locklessly Joern Engel
                   ` (27 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Preparation for functional change

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 495 +++++++++++++++++++++++-------------------------
 1 file changed, 239 insertions(+), 256 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 94b55241d3bf..28d9d902b7ec 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3399,44 +3399,42 @@ mremap_chunk(mchunkptr p, size_t new_size)
 
 /*------------------------ Public wrappers. --------------------------------*/
 
-Void_t*
-public_mALLOc(size_t bytes)
+Void_t *public_mALLOc(size_t bytes)
 {
-  struct malloc_state * ar_ptr;
-  Void_t *victim;
-
-  __malloc_ptr_t (*hook) (size_t, __const __malloc_ptr_t)
-    = force_reg (dlmalloc_hook);
-  if (__builtin_expect (hook != NULL, 0))
-    return (*hook)(bytes, RETURN_ADDRESS (0));
-
-  arena_lookup(ar_ptr);
-  arena_lock(ar_ptr, bytes);
-  if(!ar_ptr)
-    return 0;
-  victim = _int_malloc(ar_ptr, bytes);
-  if(!victim) {
-    /* Maybe the failure is due to running out of mmapped areas. */
-    if(ar_ptr != &main_arena) {
-      (void)mutex_unlock(&ar_ptr->mutex);
-      ar_ptr = &main_arena;
-      (void)mutex_lock(&ar_ptr->mutex);
-      victim = _int_malloc(ar_ptr, bytes);
-      (void)mutex_unlock(&ar_ptr->mutex);
-    } else {
-      /* ... or sbrk() has failed and there is still a chance to mmap() */
-      ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
-      (void)mutex_unlock(&main_arena.mutex);
-      if(ar_ptr) {
+	struct malloc_state *ar_ptr;
+	Void_t *victim;
+
+	__malloc_ptr_t(*hook) (size_t, __const __malloc_ptr_t)
+	    = force_reg(dlmalloc_hook);
+	if (__builtin_expect(hook != NULL, 0))
+		return (*hook) (bytes, RETURN_ADDRESS(0));
+
+	arena_lookup(ar_ptr);
+	arena_lock(ar_ptr, bytes);
+	if (!ar_ptr)
+		return 0;
 	victim = _int_malloc(ar_ptr, bytes);
-	(void)mutex_unlock(&ar_ptr->mutex);
-      }
-    }
-  } else
-    (void)mutex_unlock(&ar_ptr->mutex);
-  assert(!victim || chunk_is_mmapped(mem2chunk(victim)) ||
-	 ar_ptr == arena_for_chunk(mem2chunk(victim)));
-  return victim;
+	if (!victim) {
+		/* Maybe the failure is due to running out of mmapped areas. */
+		if (ar_ptr != &main_arena) {
+			(void)mutex_unlock(&ar_ptr->mutex);
+			ar_ptr = &main_arena;
+			(void)mutex_lock(&ar_ptr->mutex);
+			victim = _int_malloc(ar_ptr, bytes);
+			(void)mutex_unlock(&ar_ptr->mutex);
+		} else {
+			/* ... or sbrk() has failed and there is still a chance to mmap() */
+			ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
+			(void)mutex_unlock(&main_arena.mutex);
+			if (ar_ptr) {
+				victim = _int_malloc(ar_ptr, bytes);
+				(void)mutex_unlock(&ar_ptr->mutex);
+			}
+		}
+	} else
+		(void)mutex_unlock(&ar_ptr->mutex);
+	assert(!victim || chunk_is_mmapped(mem2chunk(victim)) || ar_ptr == arena_for_chunk(mem2chunk(victim)));
+	return victim;
 }
 
 void
@@ -3598,278 +3596,263 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
   return newp;
 }
 
-Void_t*
-public_mEMALIGn(size_t alignment, size_t bytes)
+Void_t *public_mEMALIGn(size_t alignment, size_t bytes)
 {
-  struct malloc_state * ar_ptr;
-  Void_t *p;
+	struct malloc_state *ar_ptr;
+	Void_t *p;
 
-  __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
-					__const __malloc_ptr_t)) =
-    force_reg (dlmemalign_hook);
-  if (__builtin_expect (hook != NULL, 0))
-    return (*hook)(alignment, bytes, RETURN_ADDRESS (0));
+	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
+	if (__builtin_expect(hook != NULL, 0))
+		return (*hook) (alignment, bytes, RETURN_ADDRESS(0));
 
-  /* If need less alignment than we give anyway, just relay to malloc */
-  if (alignment <= MALLOC_ALIGNMENT) return public_mALLOc(bytes);
+	/* If need less alignment than we give anyway, just relay to malloc */
+	if (alignment <= MALLOC_ALIGNMENT)
+		return public_mALLOc(bytes);
 
-  /* Otherwise, ensure that it is at least a minimum chunk size */
-  if (alignment <  MINSIZE) alignment = MINSIZE;
+	/* Otherwise, ensure that it is at least a minimum chunk size */
+	if (alignment < MINSIZE)
+		alignment = MINSIZE;
 
-  arena_get(ar_ptr, bytes + alignment + MINSIZE);
-  if(!ar_ptr)
-    return 0;
-  p = _int_memalign(ar_ptr, alignment, bytes);
-  if(!p) {
-    /* Maybe the failure is due to running out of mmapped areas. */
-    if(ar_ptr != &main_arena) {
-      (void)mutex_unlock(&ar_ptr->mutex);
-      ar_ptr = &main_arena;
-      (void)mutex_lock(&ar_ptr->mutex);
-      p = _int_memalign(ar_ptr, alignment, bytes);
-      (void)mutex_unlock(&ar_ptr->mutex);
-    } else {
-      /* ... or sbrk() has failed and there is still a chance to mmap() */
-      struct malloc_state * prev = ar_ptr->next ? ar_ptr : 0;
-      (void)mutex_unlock(&ar_ptr->mutex);
-      ar_ptr = arena_get2(prev, bytes);
-      if(ar_ptr) {
+	arena_get(ar_ptr, bytes + alignment + MINSIZE);
+	if (!ar_ptr)
+		return 0;
 	p = _int_memalign(ar_ptr, alignment, bytes);
-	(void)mutex_unlock(&ar_ptr->mutex);
-      }
-    }
-  } else
-    (void)mutex_unlock(&ar_ptr->mutex);
-  assert(!p || chunk_is_mmapped(mem2chunk(p)) ||
-	 ar_ptr == arena_for_chunk(mem2chunk(p)));
-  return p;
+	if (!p) {
+		/* Maybe the failure is due to running out of mmapped areas. */
+		if (ar_ptr != &main_arena) {
+			(void)mutex_unlock(&ar_ptr->mutex);
+			ar_ptr = &main_arena;
+			(void)mutex_lock(&ar_ptr->mutex);
+			p = _int_memalign(ar_ptr, alignment, bytes);
+			(void)mutex_unlock(&ar_ptr->mutex);
+		} else {
+			/* ... or sbrk() has failed and there is still a chance to mmap() */
+			struct malloc_state *prev = ar_ptr->next ? ar_ptr : 0;
+			(void)mutex_unlock(&ar_ptr->mutex);
+			ar_ptr = arena_get2(prev, bytes);
+			if (ar_ptr) {
+				p = _int_memalign(ar_ptr, alignment, bytes);
+				(void)mutex_unlock(&ar_ptr->mutex);
+			}
+		}
+	} else
+		(void)mutex_unlock(&ar_ptr->mutex);
+	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
+	return p;
 }
 
-Void_t*
-public_vALLOc(size_t bytes)
+Void_t *public_vALLOc(size_t bytes)
 {
-  struct malloc_state * ar_ptr;
-  Void_t *p;
+	struct malloc_state *ar_ptr;
+	Void_t *p;
 
-  if(__malloc_initialized < 0)
-    ptmalloc_init ();
+	if (__malloc_initialized < 0)
+		ptmalloc_init();
 
-  size_t pagesz = mp_.pagesize;
+	size_t pagesz = mp_.pagesize;
 
-  __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
-					__const __malloc_ptr_t)) =
-    force_reg (dlmemalign_hook);
-  if (__builtin_expect (hook != NULL, 0))
-    return (*hook)(pagesz, bytes, RETURN_ADDRESS (0));
+	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
+	if (__builtin_expect(hook != NULL, 0))
+		return (*hook) (pagesz, bytes, RETURN_ADDRESS(0));
 
-  arena_get(ar_ptr, bytes + pagesz + MINSIZE);
-  if(!ar_ptr)
-    return 0;
-  p = _int_valloc(ar_ptr, bytes);
-  (void)mutex_unlock(&ar_ptr->mutex);
-  if(!p) {
-    /* Maybe the failure is due to running out of mmapped areas. */
-    if(ar_ptr != &main_arena) {
-      ar_ptr = &main_arena;
-      (void)mutex_lock(&ar_ptr->mutex);
-      p = _int_memalign(ar_ptr, pagesz, bytes);
-      (void)mutex_unlock(&ar_ptr->mutex);
-    } else {
-      /* ... or sbrk() has failed and there is still a chance to mmap() */
-      ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
-      if(ar_ptr) {
-	p = _int_memalign(ar_ptr, pagesz, bytes);
+	arena_get(ar_ptr, bytes + pagesz + MINSIZE);
+	if (!ar_ptr)
+		return 0;
+	p = _int_valloc(ar_ptr, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
-      }
-    }
-  }
-  assert(!p || chunk_is_mmapped(mem2chunk(p)) ||
-	 ar_ptr == arena_for_chunk(mem2chunk(p)));
+	if (!p) {
+		/* Maybe the failure is due to running out of mmapped areas. */
+		if (ar_ptr != &main_arena) {
+			ar_ptr = &main_arena;
+			(void)mutex_lock(&ar_ptr->mutex);
+			p = _int_memalign(ar_ptr, pagesz, bytes);
+			(void)mutex_unlock(&ar_ptr->mutex);
+		} else {
+			/* ... or sbrk() has failed and there is still a chance to mmap() */
+			ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
+			if (ar_ptr) {
+				p = _int_memalign(ar_ptr, pagesz, bytes);
+				(void)mutex_unlock(&ar_ptr->mutex);
+			}
+		}
+	}
+	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 
-  return p;
+	return p;
 }
 
-Void_t*
-public_pVALLOc(size_t bytes)
+Void_t *public_pVALLOc(size_t bytes)
 {
-  struct malloc_state * ar_ptr;
-  Void_t *p;
+	struct malloc_state *ar_ptr;
+	Void_t *p;
 
-  if(__malloc_initialized < 0)
-    ptmalloc_init ();
+	if (__malloc_initialized < 0)
+		ptmalloc_init();
 
-  size_t pagesz = mp_.pagesize;
-  size_t page_mask = mp_.pagesize - 1;
-  size_t rounded_bytes = (bytes + page_mask) & ~(page_mask);
+	size_t pagesz = mp_.pagesize;
+	size_t page_mask = mp_.pagesize - 1;
+	size_t rounded_bytes = (bytes + page_mask) & ~(page_mask);
 
-  __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
-					__const __malloc_ptr_t)) =
-    force_reg (dlmemalign_hook);
-  if (__builtin_expect (hook != NULL, 0))
-    return (*hook)(pagesz, rounded_bytes, RETURN_ADDRESS (0));
+	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
+	if (__builtin_expect(hook != NULL, 0))
+		return (*hook) (pagesz, rounded_bytes, RETURN_ADDRESS(0));
 
-  arena_get(ar_ptr, bytes + 2*pagesz + MINSIZE);
-  p = _int_pvalloc(ar_ptr, bytes);
-  (void)mutex_unlock(&ar_ptr->mutex);
-  if(!p) {
-    /* Maybe the failure is due to running out of mmapped areas. */
-    if(ar_ptr != &main_arena) {
-      ar_ptr = &main_arena;
-      (void)mutex_lock(&ar_ptr->mutex);
-      p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
-      (void)mutex_unlock(&ar_ptr->mutex);
-    } else {
-      /* ... or sbrk() has failed and there is still a chance to mmap() */
-      ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0,
-			  bytes + 2*pagesz + MINSIZE);
-      if(ar_ptr) {
-	p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
+	arena_get(ar_ptr, bytes + 2 * pagesz + MINSIZE);
+	p = _int_pvalloc(ar_ptr, bytes);
 	(void)mutex_unlock(&ar_ptr->mutex);
-      }
-    }
-  }
-  assert(!p || chunk_is_mmapped(mem2chunk(p)) ||
-	 ar_ptr == arena_for_chunk(mem2chunk(p)));
+	if (!p) {
+		/* Maybe the failure is due to running out of mmapped areas. */
+		if (ar_ptr != &main_arena) {
+			ar_ptr = &main_arena;
+			(void)mutex_lock(&ar_ptr->mutex);
+			p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
+			(void)mutex_unlock(&ar_ptr->mutex);
+		} else {
+			/* ... or sbrk() has failed and there is still a chance to mmap() */
+			ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes + 2 * pagesz + MINSIZE);
+			if (ar_ptr) {
+				p = _int_memalign(ar_ptr, pagesz, rounded_bytes);
+				(void)mutex_unlock(&ar_ptr->mutex);
+			}
+		}
+	}
+	assert(!p || chunk_is_mmapped(mem2chunk(p)) || ar_ptr == arena_for_chunk(mem2chunk(p)));
 
-  return p;
+	return p;
 }
 
-Void_t*
-public_cALLOc(size_t n, size_t elem_size)
+Void_t *public_cALLOc(size_t n, size_t elem_size)
 {
-  struct malloc_state * av;
-  mchunkptr oldtop, p;
-  INTERNAL_SIZE_T bytes, sz, csz, oldtopsize;
-  Void_t* mem;
-  unsigned long clearsize;
-  unsigned long nclears;
-  INTERNAL_SIZE_T* d;
-
-  /* size_t is unsigned so the behavior on overflow is defined.  */
-  bytes = n * elem_size;
+	struct malloc_state *av;
+	mchunkptr oldtop, p;
+	INTERNAL_SIZE_T bytes, sz, csz, oldtopsize;
+	Void_t *mem;
+	unsigned long clearsize;
+	unsigned long nclears;
+	INTERNAL_SIZE_T *d;
+
+	/* size_t is unsigned so the behavior on overflow is defined.  */
+	bytes = n * elem_size;
 #define HALF_INTERNAL_SIZE_T \
   (((INTERNAL_SIZE_T) 1) << (8 * sizeof (INTERNAL_SIZE_T) / 2))
-  if (__builtin_expect ((n | elem_size) >= HALF_INTERNAL_SIZE_T, 0)) {
-    if (elem_size != 0 && bytes / elem_size != n) {
-      MALLOC_FAILURE_ACTION;
-      return 0;
-    }
-  }
+	if (__builtin_expect((n | elem_size) >= HALF_INTERNAL_SIZE_T, 0)) {
+		if (elem_size != 0 && bytes / elem_size != n) {
+			MALLOC_FAILURE_ACTION;
+			return 0;
+		}
+	}
 
-  __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, __const __malloc_ptr_t)) =
-    force_reg (dlmalloc_hook);
-  if (__builtin_expect (hook != NULL, 0)) {
-    sz = bytes;
-    mem = (*hook)(sz, RETURN_ADDRESS (0));
-    if(mem == 0)
-      return 0;
+	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, __const __malloc_ptr_t)) = force_reg(dlmalloc_hook);
+	if (__builtin_expect(hook != NULL, 0)) {
+		sz = bytes;
+		mem = (*hook) (sz, RETURN_ADDRESS(0));
+		if (mem == 0)
+			return 0;
 #ifdef HAVE_MEMCPY
-    return memset(mem, 0, sz);
+		return memset(mem, 0, sz);
 #else
-    while(sz > 0) ((char*)mem)[--sz] = 0; /* rather inefficient */
-    return mem;
+		while (sz > 0)
+			((char *)mem)[--sz] = 0;	/* rather inefficient */
+		return mem;
 #endif
-  }
+	}
 
-  sz = bytes;
+	sz = bytes;
 
-  arena_get(av, sz);
-  if(!av)
-    return 0;
+	arena_get(av, sz);
+	if (!av)
+		return 0;
 
-  /* Check if we hand out the top chunk, in which case there may be no
-     need to clear. */
+	/* Check if we hand out the top chunk, in which case there may be no
+	   need to clear. */
 #if MORECORE_CLEARS
-  oldtop = top(av);
-  oldtopsize = chunksize(top(av));
+	oldtop = top(av);
+	oldtopsize = chunksize(top(av));
 #if MORECORE_CLEARS < 2
-  /* Only newly allocated memory is guaranteed to be cleared.  */
-  if (av == &main_arena &&
-      oldtopsize < mp_.sbrk_base + av->max_system_mem - (char *)oldtop)
-    oldtopsize = (mp_.sbrk_base + av->max_system_mem - (char *)oldtop);
+	/* Only newly allocated memory is guaranteed to be cleared.  */
+	if (av == &main_arena && oldtopsize < mp_.sbrk_base + av->max_system_mem - (char *)oldtop)
+		oldtopsize = (mp_.sbrk_base + av->max_system_mem - (char *)oldtop);
 #endif
-  if (av != &main_arena)
-    {
-      heap_info *heap = heap_for_ptr (oldtop);
-      if (oldtopsize < (char *) heap + heap->mprotect_size - (char *) oldtop)
-	oldtopsize = (char *) heap + heap->mprotect_size - (char *) oldtop;
-    }
+	if (av != &main_arena) {
+		heap_info *heap = heap_for_ptr(oldtop);
+		if (oldtopsize < (char *)heap + heap->mprotect_size - (char *)oldtop)
+			oldtopsize = (char *)heap + heap->mprotect_size - (char *)oldtop;
+	}
 #endif
-  mem = _int_malloc(av, sz);
-
-  /* Only clearing follows, so we can unlock early. */
-  (void)mutex_unlock(&av->mutex);
-
-  assert(!mem || chunk_is_mmapped(mem2chunk(mem)) ||
-	 av == arena_for_chunk(mem2chunk(mem)));
-
-  if (mem == 0) {
-    /* Maybe the failure is due to running out of mmapped areas. */
-    if(av != &main_arena) {
-      (void)mutex_lock(&main_arena.mutex);
-      mem = _int_malloc(&main_arena, sz);
-      (void)mutex_unlock(&main_arena.mutex);
-    } else {
-      /* ... or sbrk() has failed and there is still a chance to mmap() */
-      (void)mutex_lock(&main_arena.mutex);
-      av = arena_get2(av->next ? av : 0, sz);
-      (void)mutex_unlock(&main_arena.mutex);
-      if(av) {
 	mem = _int_malloc(av, sz);
+
+	/* Only clearing follows, so we can unlock early. */
 	(void)mutex_unlock(&av->mutex);
-      }
-    }
-    if (mem == 0) return 0;
-  }
-  p = mem2chunk(mem);
 
-  /* Two optional cases in which clearing not necessary */
-  if (chunk_is_mmapped (p))
-    {
-      if (__builtin_expect (perturb_byte, 0))
-	MALLOC_ZERO (mem, sz);
-      return mem;
-    }
+	assert(!mem || chunk_is_mmapped(mem2chunk(mem)) || av == arena_for_chunk(mem2chunk(mem)));
+
+	if (mem == 0) {
+		/* Maybe the failure is due to running out of mmapped areas. */
+		if (av != &main_arena) {
+			(void)mutex_lock(&main_arena.mutex);
+			mem = _int_malloc(&main_arena, sz);
+			(void)mutex_unlock(&main_arena.mutex);
+		} else {
+			/* ... or sbrk() has failed and there is still a chance to mmap() */
+			(void)mutex_lock(&main_arena.mutex);
+			av = arena_get2(av->next ? av : 0, sz);
+			(void)mutex_unlock(&main_arena.mutex);
+			if (av) {
+				mem = _int_malloc(av, sz);
+				(void)mutex_unlock(&av->mutex);
+			}
+		}
+		if (mem == 0)
+			return 0;
+	}
+	p = mem2chunk(mem);
+
+	/* Two optional cases in which clearing not necessary */
+	if (chunk_is_mmapped(p)) {
+		if (__builtin_expect(perturb_byte, 0))
+			MALLOC_ZERO(mem, sz);
+		return mem;
+	}
 
-  csz = chunksize(p);
+	csz = chunksize(p);
 
 #if MORECORE_CLEARS
-  if (perturb_byte == 0 && (p == oldtop && csz > oldtopsize)) {
-    /* clear only the bytes from non-freshly-sbrked memory */
-    csz = oldtopsize;
-  }
+	if (perturb_byte == 0 && (p == oldtop && csz > oldtopsize)) {
+		/* clear only the bytes from non-freshly-sbrked memory */
+		csz = oldtopsize;
+	}
 #endif
 
-  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
-     contents have an odd number of INTERNAL_SIZE_T-sized words;
-     minimally 3.  */
-  d = (INTERNAL_SIZE_T*)mem;
-  clearsize = csz - SIZE_SZ;
-  nclears = clearsize / sizeof(INTERNAL_SIZE_T);
-  assert(nclears >= 3);
+	/* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
+	   contents have an odd number of INTERNAL_SIZE_T-sized words;
+	   minimally 3.  */
+	d = (INTERNAL_SIZE_T *) mem;
+	clearsize = csz - SIZE_SZ;
+	nclears = clearsize / sizeof(INTERNAL_SIZE_T);
+	assert(nclears >= 3);
 
-  if (nclears > 9)
-    MALLOC_ZERO(d, clearsize);
+	if (nclears > 9)
+		MALLOC_ZERO(d, clearsize);
 
-  else {
-    *(d+0) = 0;
-    *(d+1) = 0;
-    *(d+2) = 0;
-    if (nclears > 4) {
-      *(d+3) = 0;
-      *(d+4) = 0;
-      if (nclears > 6) {
-	*(d+5) = 0;
-	*(d+6) = 0;
-	if (nclears > 8) {
-	  *(d+7) = 0;
-	  *(d+8) = 0;
+	else {
+		*(d + 0) = 0;
+		*(d + 1) = 0;
+		*(d + 2) = 0;
+		if (nclears > 4) {
+			*(d + 3) = 0;
+			*(d + 4) = 0;
+			if (nclears > 6) {
+				*(d + 5) = 0;
+				*(d + 6) = 0;
+				if (nclears > 8) {
+					*(d + 7) = 0;
+					*(d + 8) = 0;
+				}
+			}
+		}
 	}
-      }
-    }
-  }
 
-  return mem;
+	return mem;
 }
 
 #ifndef _LIBC
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: create a useful assert
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (17 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: tune thread cache Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: fix startup races Joern Engel
                   ` (45 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

No idea why including <assert.h> doesn't do the job.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 83d2c4c02e48..dca97ef553c0 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -313,7 +313,14 @@ extern "C" {
 */
 
 #if MALLOC_DEBUG
+# ifdef PURE_HACK
+#undef  assert
+#define assert(x) if (!(x)) { \
+	fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, __STRING(x)); \
+	abort(); }
+# else
 #include <assert.h>
+# endif
 #else
 #undef  assert
 #define assert(x) if (0 && !(x)) { ; }
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: make numa_node_count more robust
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (14 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: Lindent public_fREe() Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: quenche last compiler warnings Joern Engel
                   ` (48 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

GCC was warning that ret might be undefined.  Given a Linux system, this
is pretty hard to imagine.  But any fix that removes more code than it
adds is a good fix in my book.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 20c3614e65bf..8890e83ad18f 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -310,12 +310,10 @@ static int numa_node_count(void)
 {
 	DIR *d;
 	struct dirent *de;
-	int ret;
+	int ret = 1;
 
 	d = opendir("/sys/devices/system/node");
-	if (!d) {
-		ret = 1;
-	} else {
+	if (d) {
 		while ((de = readdir(d)) != NULL) {
 			int nd;
 			if (strncmp(de->d_name, "node", 4))
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: use atomic free list
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (32 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: use tsd_getspecific for arena_get Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: hide THREAD_STATS Joern Engel
                   ` (30 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Malloc will almost never block on a mutex.  If it would block, it jumps
to the next arena until it finds an uncontested mutex, creating a new
arena if that strategy fails.

Free is forced to block on the mutex, as it cannot free objects to a
different arena.  To avoid this, we put freed chunks on an atomic list
and let the next taker of the lock handle the chunks on that list.
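
For readers who want to see the whole protocol in one place, here is a
minimal standalone sketch of the push/drain scheme.  It assumes a
GCC-style compiler for the __sync builtins; the struct and function
names are made up for illustration, and free() stands in for
_int_free() on a real arena.

#include <stdio.h>
#include <stdlib.h>

struct chunk {
	struct chunk *fd;		/* singly-linked free list pointer */
};

static struct chunk *atomic_free_list;	/* shared head, never locked */

/* free() side: publish the chunk without taking the arena mutex. */
static void push_free(struct chunk *c)
{
	do {
		c->fd = atomic_free_list;
	} while (!__sync_bool_compare_and_swap(&atomic_free_list, c->fd, c));
}

/* malloc() side, called while holding the arena mutex: take the whole
   list with one atomic exchange and dispose of every chunk on it. */
static void drain_free_list(void)
{
	struct chunk *victim = __sync_lock_test_and_set(&atomic_free_list, NULL);

	while (victim) {
		struct chunk *next = victim->fd;
		free(victim);		/* stands in for _int_free() */
		victim = next;
	}
}

int main(void)
{
	for (int i = 0; i < 4; i++)
		push_free(malloc(sizeof(struct chunk)));
	drain_free_list();
	puts("drained");
	return 0;
}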

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c |  6 ++++++
 tpc/malloc2.13/tcache.h | 50 ++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 1ee563bb299e..90d0e7e552b9 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2112,6 +2112,12 @@ struct malloc_state {
 	/* Memory allocated from the system in this arena.  */
 	INTERNAL_SIZE_T system_mem;
 	INTERNAL_SIZE_T max_system_mem;
+
+	/*
+	 * Chunks freed by tcache_gc, not sorted into any bins yet and
+	 * not protected by mutex - use atomic operations on this.
+	 */
+	mchunkptr atomic_free_list;
 };
 
 struct malloc_par {
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 1829cd4ebb9a..0ddee48a30dc 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -104,10 +104,22 @@ static inline int is_accessed(struct thread_cache *cache, int bin)
 	return get_bit(cache->accessed_map, bin);
 }
 
+static void free_atomic_list(struct malloc_state *arena)
+{
+	struct malloc_chunk *victim, *next;
+
+	victim = __sync_lock_test_and_set(&arena->atomic_free_list, NULL);
+	while (victim) {
+		next = victim->fd;
+		_int_free(arena, victim);
+		victim = next;
+	}
+}
+
 static void tcache_gc(struct thread_cache *cache)
 {
 	struct malloc_chunk *victim, *next;
-	struct malloc_state *arena;
+	struct malloc_state *arena = NULL;
 	int i, did_repeat = 0;
 
 repeat:
@@ -122,10 +134,19 @@ repeat:
 			cache->tc_size -= chunksize(victim);
 			cache->tc_count--;
 			arena = arena_for_chunk(victim);
-			/* TODO: use atomic bins instead */
-			mutex_lock(&arena->mutex);
-			_int_free(arena, victim);
-			mutex_unlock(&arena->mutex);
+			/*
+			 * Free, unlike malloc, has no choice which arena to
+			 * use and therefore cannot avoid blocking on the
+			 * mutex by using trylock.
+			 *
+			 * Instead we move chunks to an atomic list that we
+			 * can access locklessly and defer actually freeing
+			 * can access locklessly and defer the actual freeing
+			 * the mutex.
+			 */
+			do {
+				victim->fd = arena->atomic_free_list;
+			} while (!__sync_bool_compare_and_swap(&arena->atomic_free_list, victim->fd, victim));
 			victim = next;
 		}
 	}
@@ -149,6 +170,24 @@ repeat:
 		assert(cache->tc_size == 0);
 		assert(cache->tc_count == 0);
 	}
+	if (arena && !mutex_trylock(&arena->mutex)) {
+		/*
+		 * Chunks on the atomic free list can linger for a long time
+		 * if no allocations from that arena happen.  This occupies
+		 * system memory that otherwise might be returned to the kernel.
+		 *
+		 * If we can get the mutex, we might as well handle the atomic
+		 * list.  If not, the arena must be in active use and someone
+		 * else will handle this for us.
+		 *
+		 * Theoretically we should try every arena we processed chunks
+		 * for.  By doing it for just one arena essentially at random,
+		 * we will eventually do it for all and don't have to keep track
+		 * of all the arenas we used.
+		 */
+		free_atomic_list(arena);
+		arena_unlock(arena);
+	}
 }
 
 static void *tcache_malloc(size_t size)
@@ -213,6 +252,7 @@ static void *tcache_malloc(size_t size)
 	arena = arena_get(size);
 	if (!arena)
 		return NULL;
+	free_atomic_list(arena);
 	/* TODO: _int_malloc does checked_request2size() again, which is silly */
 	victim = _int_malloc(arena, size);
 	if (!victim) {
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: remove mstate typedef
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (8 preceding siblings ...)
  2016-01-26  0:26 ` [PATCH] malloc: kill mprotect Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: Lindent before functional changes Joern Engel
                   ` (54 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Jörn Engel

From: Jörn Engel <joern@purestorage.com>

Improve readability

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.ch |  42 ++++++++++----------
 tpc/malloc2.13/malloc.c | 103 ++++++++++++++++++++++++------------------------
 2 files changed, 72 insertions(+), 73 deletions(-)

diff --git a/tpc/malloc2.13/arena.ch b/tpc/malloc2.13/arena.ch
index b8e7c611c42c..0aaccb914d92 100644
--- a/tpc/malloc2.13/arena.ch
+++ b/tpc/malloc2.13/arena.ch
@@ -57,7 +57,7 @@
    USE_ARENAS. */
 
 typedef struct _heap_info {
-  mstate ar_ptr; /* Arena for this heap. */
+  struct malloc_state * ar_ptr; /* Arena for this heap. */
   struct _heap_info *prev; /* Previous heap. */
   size_t size;   /* Current size in bytes. */
   size_t mprotect_size;	/* Size in bytes that has been mprotected
@@ -80,7 +80,7 @@ static tsd_key_t arena_key;
 static mutex_t list_lock;
 #ifdef PER_THREAD
 static size_t narenas;
-static mstate free_list;
+static struct malloc_state * free_list;
 #endif
 
 #if THREAD_STATS
@@ -115,7 +115,7 @@ static int __malloc_initialized = -1;
 
 #define arena_lookup(ptr) do { \
   Void_t *vptr = NULL; \
-  ptr = (mstate)tsd_getspecific(arena_key, vptr); \
+  ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
 } while(0)
 
 #ifdef PER_THREAD
@@ -224,7 +224,7 @@ static void
 free_atfork(Void_t* mem, const Void_t *caller)
 {
   Void_t *vptr = NULL;
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   mchunkptr p;                          /* chunk corresponding to mem */
 
   if (mem == 0)                              /* free(0) has no effect */
@@ -268,7 +268,7 @@ static unsigned int atfork_recursive_cntr;
 static void
 ptmalloc_lock_all (void)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
 
   if(__malloc_initialized < 1)
     return;
@@ -303,7 +303,7 @@ ptmalloc_lock_all (void)
 static void
 ptmalloc_unlock_all (void)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
 
   if(__malloc_initialized < 1)
     return;
@@ -330,7 +330,7 @@ ptmalloc_unlock_all (void)
 static void
 ptmalloc_unlock_all2 (void)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
 
   if(__malloc_initialized < 1)
     return;
@@ -649,7 +649,7 @@ dump_heap(heap_info *heap)
   mchunkptr p;
 
   fprintf(stderr, "Heap %p, size %10lx:\n", heap, (long)heap->size);
-  ptr = (heap->ar_ptr != (mstate)(heap+1)) ?
+  ptr = (heap->ar_ptr != (struct malloc_state *)(heap+1)) ?
     (char*)(heap + 1) : (char*)(heap + 1) + sizeof(struct malloc_state);
   p = (mchunkptr)(((unsigned long)ptr + MALLOC_ALIGN_MASK) &
 		  ~MALLOC_ALIGN_MASK);
@@ -812,7 +812,7 @@ static int
 internal_function
 heap_trim(heap_info *heap, size_t pad)
 {
-  mstate ar_ptr = heap->ar_ptr;
+  struct malloc_state * ar_ptr = heap->ar_ptr;
   unsigned long pagesz = mp_.pagesize;
   mchunkptr top_chunk = top(ar_ptr), p, bck, fwd;
   heap_info *prev_heap;
@@ -863,10 +863,10 @@ heap_trim(heap_info *heap, size_t pad)
 
 /* Create a new arena with initial size "size".  */
 
-static mstate
+static struct malloc_state *
 _int_new_arena(size_t size)
 {
-  mstate a;
+  struct malloc_state * a;
   heap_info *h;
   char *ptr;
   unsigned long misalign;
@@ -881,7 +881,7 @@ _int_new_arena(size_t size)
     if(!h)
       return 0;
   }
-  a = h->ar_ptr = (mstate)(h+1);
+  a = h->ar_ptr = (struct malloc_state *)(h+1);
   malloc_init_state(a);
   /*a->next = NULL;*/
   a->system_mem = a->max_system_mem = h->size;
@@ -926,10 +926,10 @@ _int_new_arena(size_t size)
 
 
 #ifdef PER_THREAD
-static mstate
+static struct malloc_state *
 get_free_list (void)
 {
-  mstate result = free_list;
+  struct malloc_state * result = free_list;
   if (result != NULL)
     {
       (void)mutex_lock(&list_lock);
@@ -950,7 +950,7 @@ get_free_list (void)
 }
 
 
-static mstate
+static struct malloc_state *
 reused_arena (void)
 {
   if (narenas <= mp_.arena_test)
@@ -977,8 +977,8 @@ reused_arena (void)
   if (narenas < narenas_limit)
     return NULL;
 
-  mstate result;
-  static mstate next_to_use;
+  struct malloc_state * result;
+  static struct malloc_state * next_to_use;
   if (next_to_use == NULL)
     next_to_use = &main_arena;
 
@@ -1004,11 +1004,11 @@ reused_arena (void)
 }
 #endif
 
-static mstate
+static struct malloc_state *
 internal_function
-arena_get2(mstate a_tsd, size_t size)
+arena_get2(struct malloc_state * a_tsd, size_t size)
 {
-  mstate a;
+  struct malloc_state * a;
 
 #ifdef PER_THREAD
   if ((a = get_free_list ()) == NULL
@@ -1069,7 +1069,7 @@ static void __attribute__ ((section ("__libc_thread_freeres_fn")))
 arena_thread_freeres (void)
 {
   Void_t *vptr = NULL;
-  mstate a = tsd_getspecific(arena_key, vptr);
+  struct malloc_state * a = tsd_getspecific(arena_key, vptr);
   tsd_setspecific(arena_key, NULL);
 
   if (a != NULL)
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index b17b17bba4d4..6b75c9a6beb0 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -1484,7 +1484,6 @@ int      dlposix_memalign(void **, size_t, size_t);
 #include <dlmalloc.h>
 
 struct malloc_state;
-typedef struct malloc_state *mstate;
 
 #define __malloc_ptr_t void *
 
@@ -1528,27 +1527,27 @@ struct mallinfo2 {
 /* Internal routines.  */
 
 
-static Void_t*  _int_malloc(mstate, size_t);
+static Void_t*  _int_malloc(struct malloc_state *, size_t);
 #ifdef ATOMIC_FASTBINS
-static void     _int_free(mstate, mchunkptr, int);
+static void     _int_free(struct malloc_state *, mchunkptr, int);
 #else
-static void     _int_free(mstate, mchunkptr);
+static void     _int_free(struct malloc_state *, mchunkptr);
 #endif
-static Void_t*  _int_realloc(mstate, mchunkptr, INTERNAL_SIZE_T,
+static Void_t*  _int_realloc(struct malloc_state *, mchunkptr, INTERNAL_SIZE_T,
 			     INTERNAL_SIZE_T);
-static Void_t*  _int_memalign(mstate, size_t, size_t);
-static Void_t*  _int_valloc(mstate, size_t);
-static Void_t*  _int_pvalloc(mstate, size_t);
+static Void_t*  _int_memalign(struct malloc_state *, size_t, size_t);
+static Void_t*  _int_valloc(struct malloc_state *, size_t);
+static Void_t*  _int_pvalloc(struct malloc_state *, size_t);
 /*static Void_t*  cALLOc(size_t, size_t);*/
 #ifndef _LIBC
-static Void_t** _int_icalloc(mstate, size_t, size_t, Void_t**);
-static Void_t** _int_icomalloc(mstate, size_t, size_t*, Void_t**);
+static Void_t** _int_icalloc(struct malloc_state *, size_t, size_t, Void_t**);
+static Void_t** _int_icomalloc(struct malloc_state *, size_t, size_t*, Void_t**);
 #endif
-static int      mTRIm(mstate, size_t);
+static int      mTRIm(struct malloc_state *, size_t);
 static size_t   mUSABLe(Void_t*);
 static void     mSTATs(void);
 static int      mALLOPt(int, int);
-static struct mallinfo2 mALLINFo(mstate);
+static struct mallinfo2 mALLINFo(struct malloc_state *);
 static void malloc_printerr(int action, const char *str, void *ptr);
 
 static Void_t* internal_function mem2mem_check(Void_t *p, size_t sz);
@@ -2363,7 +2362,7 @@ static INTERNAL_SIZE_T global_max_fast;
   optimization at all. (Inlining it in malloc_consolidate is fine though.)
 */
 
-static void malloc_init_state(mstate av)
+static void malloc_init_state(struct malloc_state * av)
 {
   int     i;
   mbinptr bin;
@@ -2389,11 +2388,11 @@ static void malloc_init_state(mstate av)
    Other internal utilities operating on mstates
 */
 
-static Void_t*  sYSMALLOc(INTERNAL_SIZE_T, mstate);
-static int      sYSTRIm(size_t, mstate);
-static void     malloc_consolidate(mstate);
+static Void_t*  sYSMALLOc(INTERNAL_SIZE_T, struct malloc_state *);
+static int      sYSTRIm(size_t, struct malloc_state *);
+static void     malloc_consolidate(struct malloc_state *);
 #ifndef _LIBC
-static Void_t** iALLOc(mstate, size_t, size_t*, int, Void_t**);
+static Void_t** iALLOc(struct malloc_state *, size_t, size_t*, int, Void_t**);
 #endif
 
 
@@ -2485,7 +2484,7 @@ static int perturb_byte;
   Properties of all chunks
 */
 
-static void do_check_chunk(mstate av, mchunkptr p)
+static void do_check_chunk(struct malloc_state * av, mchunkptr p)
 {
   unsigned long sz = chunksize(p);
   /* min and max possible addresses assuming contiguous allocation */
@@ -2530,7 +2529,7 @@ static void do_check_chunk(mstate av, mchunkptr p)
   Properties of free chunks
 */
 
-static void do_check_free_chunk(mstate av, mchunkptr p)
+static void do_check_free_chunk(struct malloc_state * av, mchunkptr p)
 {
   INTERNAL_SIZE_T sz = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
   mchunkptr next = chunk_at_offset(p, sz);
@@ -2564,7 +2563,7 @@ static void do_check_free_chunk(mstate av, mchunkptr p)
   Properties of inuse chunks
 */
 
-static void do_check_inuse_chunk(mstate av, mchunkptr p)
+static void do_check_inuse_chunk(struct malloc_state * av, mchunkptr p)
 {
   mchunkptr next;
 
@@ -2601,7 +2600,7 @@ static void do_check_inuse_chunk(mstate av, mchunkptr p)
   Properties of chunks recycled from fastbins
 */
 
-static void do_check_remalloced_chunk(mstate av, mchunkptr p, INTERNAL_SIZE_T s)
+static void do_check_remalloced_chunk(struct malloc_state * av, mchunkptr p, INTERNAL_SIZE_T s)
 {
   INTERNAL_SIZE_T sz = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
 
@@ -2629,7 +2628,7 @@ static void do_check_remalloced_chunk(mstate av, mchunkptr p, INTERNAL_SIZE_T s)
   Properties of nonrecycled chunks at the point they are malloced
 */
 
-static void do_check_malloced_chunk(mstate av, mchunkptr p, INTERNAL_SIZE_T s)
+static void do_check_malloced_chunk(struct malloc_state * av, mchunkptr p, INTERNAL_SIZE_T s)
 {
   /* same as recycled case ... */
   do_check_remalloced_chunk(av, p, s);
@@ -2659,7 +2658,7 @@ static void do_check_malloced_chunk(mstate av, mchunkptr p, INTERNAL_SIZE_T s)
   display chunk addresses, sizes, bins, and other instrumentation.
 */
 
-static void do_check_malloc_state(mstate av)
+static void do_check_malloc_state(struct malloc_state * av)
 {
   int i;
   mchunkptr p;
@@ -2826,7 +2825,7 @@ static void do_check_malloc_state(mstate av)
   be extended or replaced.
 */
 
-static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, mstate av)
+static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
 {
   mchunkptr       old_top;        /* incoming value of av->top */
   INTERNAL_SIZE_T old_size;       /* its size */
@@ -3310,7 +3309,7 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, mstate av)
   returns 1 if it actually released any memory, else 0.
 */
 
-static int sYSTRIm(size_t pad, mstate av)
+static int sYSTRIm(size_t pad, struct malloc_state * av)
 {
   long  top_size;        /* Amount of top-most memory */
   long  extra;           /* Amount to release */
@@ -3462,7 +3461,7 @@ mremap_chunk(mchunkptr p, size_t new_size)
 Void_t*
 public_mALLOc(size_t bytes)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   Void_t *victim;
 
   __malloc_ptr_t (*hook) (size_t, __const __malloc_ptr_t)
@@ -3539,7 +3538,7 @@ libc_hidden_def(public_mALLOc)
 void
 public_fREe(Void_t* mem)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   mchunkptr p;                          /* chunk corresponding to mem */
 
   void (*hook) (__malloc_ptr_t, __const __malloc_ptr_t)
@@ -3595,7 +3594,7 @@ libc_hidden_def (public_fREe)
 Void_t*
 public_rEALLOc(Void_t* oldmem, size_t bytes)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   INTERNAL_SIZE_T    nb;      /* padded request size */
 
   Void_t* newp;             /* chunk to return */
@@ -3708,7 +3707,7 @@ libc_hidden_def (public_rEALLOc)
 Void_t*
 public_mEMALIGn(size_t alignment, size_t bytes)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   Void_t *p;
 
   __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
@@ -3738,7 +3737,7 @@ public_mEMALIGn(size_t alignment, size_t bytes)
     } else {
 #if USE_ARENAS
       /* ... or sbrk() has failed and there is still a chance to mmap() */
-      mstate prev = ar_ptr->next ? ar_ptr : 0;
+      struct malloc_state * prev = ar_ptr->next ? ar_ptr : 0;
       (void)mutex_unlock(&ar_ptr->mutex);
       ar_ptr = arena_get2(prev, bytes);
       if(ar_ptr) {
@@ -3760,7 +3759,7 @@ libc_hidden_def (public_mEMALIGn)
 Void_t*
 public_vALLOc(size_t bytes)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   Void_t *p;
 
   if(__malloc_initialized < 0)
@@ -3806,7 +3805,7 @@ public_vALLOc(size_t bytes)
 Void_t*
 public_pVALLOc(size_t bytes)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   Void_t *p;
 
   if(__malloc_initialized < 0)
@@ -3853,7 +3852,7 @@ public_pVALLOc(size_t bytes)
 Void_t*
 public_cALLOc(size_t n, size_t elem_size)
 {
-  mstate av;
+  struct malloc_state * av;
   mchunkptr oldtop, p;
   INTERNAL_SIZE_T bytes, sz, csz, oldtopsize;
   Void_t* mem;
@@ -3997,7 +3996,7 @@ public_cALLOc(size_t n, size_t elem_size)
 Void_t**
 public_iCALLOc(size_t n, size_t elem_size, Void_t** chunks)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   Void_t** m;
 
   arena_get(ar_ptr, n*elem_size);
@@ -4012,7 +4011,7 @@ public_iCALLOc(size_t n, size_t elem_size, Void_t** chunks)
 Void_t**
 public_iCOMALLOc(size_t n, size_t sizes[], Void_t** chunks)
 {
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   Void_t** m;
 
   arena_get(ar_ptr, 0);
@@ -4040,7 +4039,7 @@ public_mTRIm(size_t s)
   if(__malloc_initialized < 0)
     ptmalloc_init ();
 
-  mstate ar_ptr = &main_arena;
+  struct malloc_state * ar_ptr = &main_arena;
   do
     {
       (void) mutex_lock (&ar_ptr->mutex);
@@ -4105,7 +4104,7 @@ public_mALLOPt(int p, int v)
 */
 
 static Void_t*
-_int_malloc(mstate av, size_t bytes)
+_int_malloc(struct malloc_state * av, size_t bytes)
 {
   INTERNAL_SIZE_T nb;               /* normalized request size */
   unsigned int    idx;              /* associated bin index */
@@ -4618,9 +4617,9 @@ _int_malloc(mstate av, size_t bytes)
 
 static void
 #ifdef ATOMIC_FASTBINS
-_int_free(mstate av, mchunkptr p, int have_lock)
+_int_free(struct malloc_state * av, mchunkptr p, int have_lock)
 #else
-_int_free(mstate av, mchunkptr p)
+_int_free(struct malloc_state * av, mchunkptr p)
 #endif
 {
   INTERNAL_SIZE_T size;        /* its size */
@@ -4945,7 +4944,7 @@ _int_free(mstate av, mchunkptr p)
   initialization code.
 */
 
-static void malloc_consolidate(mstate av)
+static void malloc_consolidate(struct malloc_state * av)
 {
   mfastbinptr*    fb;                 /* current fastbin being consolidated */
   mfastbinptr*    maxfb;              /* last fastbin (for loop control) */
@@ -5063,7 +5062,7 @@ static void malloc_consolidate(mstate av)
 */
 
 Void_t*
-_int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
+_int_realloc(struct malloc_state * av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
 	     INTERNAL_SIZE_T nb)
 {
   mchunkptr        newp;            /* chunk to return */
@@ -5307,7 +5306,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
 */
 
 static Void_t*
-_int_memalign(mstate av, size_t alignment, size_t bytes)
+_int_memalign(struct malloc_state * av, size_t alignment, size_t bytes)
 {
   INTERNAL_SIZE_T nb;             /* padded  request size */
   char*           m;              /* memory returned by malloc call */
@@ -5478,7 +5477,7 @@ Void_t* cALLOc(size_t n_elements, size_t elem_size)
 */
 
 Void_t**
-_int_icalloc(mstate av, size_t n_elements, size_t elem_size, Void_t* chunks[])
+_int_icalloc(struct malloc_state * av, size_t n_elements, size_t elem_size, Void_t* chunks[])
 {
   size_t sz = elem_size; /* serves as 1-element array */
   /* opts arg of 3 means all elements are same size, and should be cleared */
@@ -5490,7 +5489,7 @@ _int_icalloc(mstate av, size_t n_elements, size_t elem_size, Void_t* chunks[])
 */
 
 Void_t**
-_int_icomalloc(mstate av, size_t n_elements, size_t sizes[], Void_t* chunks[])
+_int_icomalloc(struct malloc_state * av, size_t n_elements, size_t sizes[], Void_t* chunks[])
 {
   return iALLOc(av, n_elements, sizes, 0, chunks);
 }
@@ -5508,7 +5507,7 @@ _int_icomalloc(mstate av, size_t n_elements, size_t sizes[], Void_t* chunks[])
 
 
 static Void_t**
-iALLOc(mstate av, size_t n_elements, size_t* sizes, int opts, Void_t* chunks[])
+iALLOc(struct malloc_state * av, size_t n_elements, size_t* sizes, int opts, Void_t* chunks[])
 {
   INTERNAL_SIZE_T element_size;   /* chunksize of each element, if all same */
   INTERNAL_SIZE_T contents_size;  /* total size of elements */
@@ -5629,7 +5628,7 @@ iALLOc(mstate av, size_t n_elements, size_t* sizes, int opts, Void_t* chunks[])
 */
 
 static Void_t*
-_int_valloc(mstate av, size_t bytes)
+_int_valloc(struct malloc_state * av, size_t bytes)
 {
   /* Ensure initialization/consolidation */
   if (have_fastchunks(av)) malloc_consolidate(av);
@@ -5642,7 +5641,7 @@ _int_valloc(mstate av, size_t bytes)
 
 
 static Void_t*
-_int_pvalloc(mstate av, size_t bytes)
+_int_pvalloc(struct malloc_state * av, size_t bytes)
 {
   size_t pagesz;
 
@@ -5657,7 +5656,7 @@ _int_pvalloc(mstate av, size_t bytes)
   ------------------------------ malloc_trim ------------------------------
 */
 
-static int mTRIm(mstate av, size_t pad)
+static int mTRIm(struct malloc_state * av, size_t pad)
 {
   /* Ensure initialization/consolidation */
   malloc_consolidate (av);
@@ -5733,7 +5732,7 @@ size_t mUSABLe(Void_t* mem)
   ------------------------------ mallinfo ------------------------------
 */
 
-struct mallinfo2 mALLINFo(mstate av)
+struct mallinfo2 mALLINFo(struct malloc_state * av)
 {
   struct mallinfo2 mi;
   size_t i;
@@ -5795,7 +5794,7 @@ struct mallinfo2 mALLINFo(mstate av)
 void mSTATs()
 {
   int i;
-  mstate ar_ptr;
+  struct malloc_state * ar_ptr;
   struct mallinfo2 mi;
   unsigned long in_use_b = mp_.mmapped_mem, system_b = in_use_b;
 #if THREAD_STATS
@@ -5866,7 +5865,7 @@ void mSTATs()
 
 int mALLOPt(int param_number, int value)
 {
-  mstate av = &main_arena;
+  struct malloc_state * av = &main_arena;
   int res = 1;
 
   if(__malloc_initialized < 0)
@@ -6173,7 +6172,7 @@ dlmalloc_info (int options, FILE *fp)
   fputs ("<malloc version=\"1\">\n", fp);
 
   /* Iterate over all arenas currently in use.  */
-  mstate ar_ptr = &main_arena;
+  struct malloc_state * ar_ptr = &main_arena;
   do {
     fprintf (fp, "<heap nr=\"%d\">\n<sizes>\n", n++);
 
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: limit free_atomic_list() latency
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (23 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: avoid main_arena Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -D__STD_C Joern Engel
                   ` (39 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Costa and Scott expressed concerns that a long free list could cause
free_atomic_list() to spend a long time clearing it up.  This puts a
hard cap on the number of objects freed per call.
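
As a rough illustration of the amortization, here is a standalone
sketch; all names and the single-threaded setup are mine, only the
batch size of 64 mirrors the patch, and free() stands in for
_int_free().

#include <stdio.h>
#include <stdlib.h>

#define BATCH 64

struct chunk {
	struct chunk *fd;
};

static struct chunk *atomic_list;	/* filled locklessly by free() */
static struct chunk *amortized_list;	/* leftovers, touched only under the mutex */

static void drain_some(void)
{
	struct chunk *victim;
	int i;

	if (!amortized_list) {
		if (!atomic_list)
			return;
		/* grab everything that accumulated so far in one exchange */
		amortized_list = __sync_lock_test_and_set(&atomic_list, NULL);
	}
	victim = amortized_list;
	for (i = BATCH; i && victim; i--) {
		struct chunk *next = victim->fd;
		free(victim);			/* stands in for _int_free() */
		victim = next;
	}
	amortized_list = victim;		/* finish the rest on a later call */
}

int main(void)
{
	int calls = 0;

	for (int i = 0; i < 1000; i++) {	/* simulate 1000 lockless frees */
		struct chunk *c = malloc(sizeof(*c));
		c->fd = atomic_list;
		atomic_list = c;		/* single-threaded stand-in for the CAS push */
	}
	while (atomic_list || amortized_list) {
		drain_some();
		calls++;
	}
	printf("drained in %d bounded calls\n", calls);	/* ceil(1000/64) == 16 */
	return 0;
}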

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c |  6 ++++++
 tpc/malloc2.13/tcache.h | 22 ++++++++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 18c7b407bbea..9f2d2df47ea1 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2118,6 +2118,12 @@ struct malloc_state {
 	 * not protected by mutex - use atomic operations on this.
 	 */
 	mchunkptr atomic_free_list;
+
+	/*
+	 * Secondary free list in case there are too many objects on
+	 * the primary list to free all at once.
+	 */
+	mchunkptr amortized_free_list;
 };
 
 struct malloc_par {
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index b02203398f2f..edfe7acbc75e 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -118,23 +118,29 @@ static inline int is_accessed(struct thread_cache *cache, int bin)
 	return get_bit(cache->accessed_map, bin);
 }
 
+/*
+ * Free objects from the atomic_free_list while holding the
+ * arena_lock.  In case the atomic_free_list has become obscenely big
+ * we limit ourselves to freeing 64 objects at once.
+ */
 static void free_atomic_list(struct malloc_state *arena)
 {
 	struct malloc_chunk *victim, *next;
+	int i;
 
-	/*
-	 * Check without using atomic first - if we lose the race we will
-	 * free things next time around.
-	 */
-	if (!arena->atomic_free_list)
-		return;
+	if (!arena->amortized_free_list) {
+		if (!arena->atomic_free_list)
+			return;
+		arena->amortized_free_list = __sync_lock_test_and_set(&arena->atomic_free_list, NULL);
+	}
 
-	victim = __sync_lock_test_and_set(&arena->atomic_free_list, NULL);
-	while (victim) {
+	victim = arena->amortized_free_list;
+	for (i = 64; i && victim; i--) {
 		next = victim->fd;
 		_int_free(arena, victim);
 		victim = next;
 	}
+	arena->amortized_free_list = victim;
 }
 
 static void tcache_gc(struct thread_cache *cache)
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: Revert glibc 1d05c2fb9c6f
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (19 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: fix startup races Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: simplify and fix calloc Joern Engel
                   ` (43 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

    * malloc/malloc.c: Dynamically size mmap treshold if the program
        frees mmaped blocks.
        Patch by Valerie Henson and Arjan van de Ven.

The proper patch would have been to increase the mmap threshold, not to
make it dynamic.  Dynamic behaviour only causes headaches because
short-running benchmarks behave completely differently from
long-running processes.

Added complexity doesn't help when working on the code either.

In order not to revert the improvements gained by increasing the
constants, HEAP_MAX_SIZE is set to 64MB and DEFAULT_MMAP_THRESHOLD to
half that.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  |  6 +---
 tpc/malloc2.13/malloc.c | 90 +------------------------------------------------
 2 files changed, 2 insertions(+), 94 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 85373466928f..6fc760f0d5ff 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -25,11 +25,7 @@
 
 #define HEAP_MIN_SIZE (32*1024)
 #ifndef HEAP_MAX_SIZE
-# ifdef DEFAULT_MMAP_THRESHOLD_MAX
-#  define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)
-# else
-#  define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */
-# endif
+#define HEAP_MAX_SIZE (64*1024*1024) /* must be a power of two */
 #endif
 
 /* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 06e0f258ea1a..078b3eead789 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -1242,27 +1242,6 @@ int      dlposix_memalign(void **, size_t, size_t);
 #endif
 
 /*
-  MMAP_THRESHOLD_MAX and _MIN are the bounds on the dynamically
-  adjusted MMAP_THRESHOLD.
-*/
-
-#ifndef DEFAULT_MMAP_THRESHOLD_MIN
-#define DEFAULT_MMAP_THRESHOLD_MIN (128 * 1024)
-#endif
-
-#ifndef DEFAULT_MMAP_THRESHOLD_MAX
-  /* For 32-bit platforms we cannot increase the maximum mmap
-     threshold much because it is also the minimum value for the
-     maximum heap size and its alignment.  Going above 512k (i.e., 1M
-     for new heaps) wastes too much address space.  */
-# if __WORDSIZE == 32
-#  define DEFAULT_MMAP_THRESHOLD_MAX (512 * 1024)
-# else
-#  define DEFAULT_MMAP_THRESHOLD_MAX (4 * 1024 * 1024 * sizeof(long))
-# endif
-#endif
-
-/*
   M_MMAP_THRESHOLD is the request size threshold for using mmap()
   to service a request. Requests of at least this size that cannot
   be allocated using already-existing space will be serviced via mmap.
@@ -1301,63 +1280,12 @@ int      dlposix_memalign(void **, size_t, size_t);
   "large" chunks, but the value of "large" varies across systems.  The
   default is an empirically derived value that works well in most
   systems.
-
-
-  Update in 2006:
-  The above was written in 2001. Since then the world has changed a lot.
-  Memory got bigger. Applications got bigger. The virtual address space
-  layout in 32 bit linux changed.
-
-  In the new situation, brk() and mmap space is shared and there are no
-  artificial limits on brk size imposed by the kernel. What is more,
-  applications have started using transient allocations larger than the
-  128Kb as was imagined in 2001.
-
-  The price for mmap is also high now; each time glibc mmaps from the
-  kernel, the kernel is forced to zero out the memory it gives to the
-  application. Zeroing memory is expensive and eats a lot of cache and
-  memory bandwidth. This has nothing to do with the efficiency of the
-  virtual memory system, by doing mmap the kernel just has no choice but
-  to zero.
-
-  In 2001, the kernel had a maximum size for brk() which was about 800
-  megabytes on 32 bit x86, at that point brk() would hit the first
-  mmaped shared libaries and couldn't expand anymore. With current 2.6
-  kernels, the VA space layout is different and brk() and mmap
-  both can span the entire heap at will.
-
-  Rather than using a static threshold for the brk/mmap tradeoff,
-  we are now using a simple dynamic one. The goal is still to avoid
-  fragmentation. The old goals we kept are
-  1) try to get the long lived large allocations to use mmap()
-  2) really large allocations should always use mmap()
-  and we're adding now:
-  3) transient allocations should use brk() to avoid forcing the kernel
-     having to zero memory over and over again
-
-  The implementation works with a sliding threshold, which is by default
-  limited to go between 128Kb and 32Mb (64Mb for 64 bitmachines) and starts
-  out at 128Kb as per the 2001 default.
-
-  This allows us to satisfy requirement 1) under the assumption that long
-  lived allocations are made early in the process' lifespan, before it has
-  started doing dynamic allocations of the same size (which will
-  increase the threshold).
-
-  The upperbound on the threshold satisfies requirement 2)
-
-  The threshold goes up in value when the application frees memory that was
-  allocated with the mmap allocator. The idea is that once the application
-  starts freeing memory of a certain size, it's highly probable that this is
-  a size the application uses for transient allocations. This estimator
-  is there to satisfy the new third requirement.
-
 */
 
 #define M_MMAP_THRESHOLD      -3
 
 #ifndef DEFAULT_MMAP_THRESHOLD
-#define DEFAULT_MMAP_THRESHOLD DEFAULT_MMAP_THRESHOLD_MIN
+#define DEFAULT_MMAP_THRESHOLD (HEAP_MAX_SIZE / 2)
 #endif
 
 /*
@@ -2198,10 +2126,6 @@ struct malloc_par {
   int              n_mmaps;
   int              n_mmaps_max;
   int              max_n_mmaps;
-  /* the mmap_threshold is dynamic, until the user sets
-     it manually, at which point we need to disable any
-     dynamic behavior. */
-  int              no_dyn_threshold;
 
   /* Cache malloc_getpagesize */
   unsigned int     pagesize;
@@ -3385,14 +3309,6 @@ public_fREe(Void_t* mem)
 
   if (chunk_is_mmapped(p))                       /* release mmapped memory. */
   {
-    /* see if the dynamic brk/mmap threshold needs adjusting */
-    if (!mp_.no_dyn_threshold
-	&& p->size > mp_.mmap_threshold
-	&& p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
-      {
-	mp_.mmap_threshold = chunksize (p);
-	mp_.trim_threshold = 2 * mp_.mmap_threshold;
-      }
     munmap_chunk(p);
     return;
   }
@@ -5538,12 +5454,10 @@ int mALLOPt(int param_number, int value)
 
   case M_TRIM_THRESHOLD:
     mp_.trim_threshold = value;
-    mp_.no_dyn_threshold = 1;
     break;
 
   case M_TOP_PAD:
     mp_.top_pad = value;
-    mp_.no_dyn_threshold = 1;
     break;
 
   case M_MMAP_THRESHOLD:
@@ -5552,12 +5466,10 @@ int mALLOPt(int param_number, int value)
       res = 0;
     else
       mp_.mmap_threshold = value;
-      mp_.no_dyn_threshold = 1;
     break;
 
   case M_MMAP_MAX:
       mp_.n_mmaps_max = value;
-      mp_.no_dyn_threshold = 1;
     break;
 
   case M_CHECK_ACTION:
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: only free half the objects on tcache_gc
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (25 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -D__STD_C Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: use mbind() Joern Engel
                   ` (37 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Facebook did the same for jemalloc.  This improves cache hit rates by
about 9%.
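
A small standalone sketch of the counting trick: each bin is a LIFO
list linked through fd, and the element count is stored in the node at
push time, so halving only needs the head's count.  Names and types are
illustrative; in the patch the count is stashed in the otherwise unused
bk field of struct malloc_chunk and the freed half goes back to the
arena rather than to free().

#include <stdio.h>
#include <stdlib.h>

struct node {
	struct node *fd;	/* next entry, like malloc_chunk->fd */
	unsigned long count;	/* list length when pushed, like the abused bk */
};

static void push(struct node **bin, struct node *p)
{
	struct node *old = *bin;

	p->fd = old;
	p->count = old ? old->count + 1 : 1;	/* written once, stays valid */
	*bin = p;
}

/* Drop entries from the head until only half of the bin remains. */
static struct node *halve(struct node *bin)
{
	unsigned long limit = bin ? bin->count >> 1 : 0;

	while (bin && bin->count > limit) {
		struct node *next = bin->fd;
		free(bin);	/* stands in for handing the chunk back */
		bin = next;
	}
	return bin;
}

int main(void)
{
	struct node *bin = NULL;

	for (int i = 0; i < 6; i++)
		push(&bin, malloc(sizeof(struct node)));
	bin = halve(bin);
	printf("%lu entries left\n", bin ? bin->count : 0);	/* prints 3 */
	return 0;
}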

JIRA: PURE-27597
---
 tpc/malloc2.13/tcache.h | 49 +++++++++++++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 7ebbc139a6ca..1d324526194f 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -91,6 +91,13 @@ struct thread_cache {
 
 	unsigned long accessed_map[CACHE_BITMAP_SIZE];
 
+	/*
+	 * Bins are single-linked lists, using the fd pointer of
+	 * struct malloc_chunk.  The bk pointer stores the number of
+	 * objects in a bin.  Since bins are LIFO, the bk pointer is
+	 * written once when objects are put into a bin and remains
+	 * valid forever.
+	 */
 	struct malloc_chunk *tc_bin[CACHE_NO_BINS];
 };
 
@@ -127,16 +134,21 @@ static void tcache_gc(struct thread_cache *cache)
 {
 	struct malloc_chunk *victim, *next;
 	struct malloc_state *arena = NULL;
-	int i, did_repeat = 0;
+	unsigned long limit;
+	int i;
 
-repeat:
+ repeat:
 	for (i = 0; i < CACHE_NO_BINS; i++) {
 		victim = cache->tc_bin[i];
 		/* accessed bins get skipped - they are useful */
 		if (is_accessed(cache, i) || !victim)
 			continue;
-		cache->tc_bin[i] = NULL;
-		while (victim) {
+		/*
+		 * Only free half of each bin.  Cache remains fuller on
+		 * average and hit rates are higher.
+		 */
+		limit = (unsigned long)victim->bk >> 1;
+		while (victim && (unsigned long)victim->bk > limit) {
 			next = victim->fd;
 			cache->tc_size -= chunksize(victim);
 			cache->tc_count--;
@@ -156,6 +168,7 @@ repeat:
 			} while (!__sync_bool_compare_and_swap(&arena->atomic_free_list, victim->fd, victim));
 			victim = next;
 		}
+		cache->tc_bin[i] = victim;
 	}
 	memset(cache->accessed_map, 0, sizeof(cache->accessed_map));
 
@@ -164,18 +177,10 @@ repeat:
 		 * Since we skip accessed bins we can run into
 		 * pathological cases where all bins are empty or
 		 * accessed and we made no progress.  In those cases
-		 * we retry after clearing the accessed bits, freeing
-		 * the entire cache.  Should be rare.
+		 * we retry after clearing the accessed bits.
+		 * Should be rare.
 		 */
-		did_repeat = 1;
 		goto repeat;
-	} else if (did_repeat) {
-		/*
-		 * Since we freed the entire cache, we can verify the
-		 * counters are consistent.
-		 */
-		assert(cache->tc_size == 0);
-		assert(cache->tc_count == 0);
 	}
 	if (arena && !mutex_trylock(&arena->mutex)) {
 		/*
@@ -197,6 +202,16 @@ repeat:
 	}
 }
 
+static void add_to_bin(struct malloc_chunk **bin, struct malloc_chunk *p)
+{
+	struct malloc_chunk *old;
+
+	old = *bin;
+	p->fd = old;
+	p->bk = (void *) (old ? (unsigned long)old->bk + 1 : 1);
+	*bin = p;
+}
+
 static void *tcache_malloc(size_t size)
 {
 	struct thread_cache *cache;
@@ -285,8 +300,7 @@ static void *tcache_malloc(size_t size)
 			assert(cache_bin(chunksize(prefetch)) == bin_no);
 			cache->tc_size += chunksize(prefetch);
 			cache->tc_count++;
-			prefetch->fd = *bin;
-			*bin = prefetch;
+			add_to_bin(bin, prefetch);
 		}
 	}
 	arena_unlock(arena);
@@ -325,8 +339,7 @@ static void tcache_free(mchunkptr p)
 		malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p));
 		return;
 	}
-	p->fd = *bin;
-	*bin = p;
+	add_to_bin(bin, p);
 	if (cache->tc_size > CACHE_SIZE)
 		tcache_gc(cache);
 	return;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: Lindent public_fREe()
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (13 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: add documentation Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: make numa_node_count more robust Joern Engel
                   ` (49 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index ddc9d51c445b..83d2c4c02e48 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3252,8 +3252,7 @@ Void_t *public_mALLOc(size_t bytes)
 	struct malloc_state *ar_ptr;
 	Void_t *victim;
 
-	__malloc_ptr_t(*hook) (size_t, __const __malloc_ptr_t)
-	    = force_reg(dlmalloc_hook);
+	__malloc_ptr_t(*hook) (size_t, __const __malloc_ptr_t) = force_reg(dlmalloc_hook);
 	if (hook != NULL)
 		return (*hook) (bytes, RETURN_ADDRESS(0));
 
@@ -3270,34 +3269,31 @@ Void_t *public_mALLOc(size_t bytes)
 	return victim;
 }
 
-void
-public_fREe(Void_t* mem)
+void public_fREe(Void_t * mem)
 {
-  struct malloc_state * ar_ptr;
-  mchunkptr p;                          /* chunk corresponding to mem */
+	struct malloc_state *ar_ptr;
+	mchunkptr p;		/* chunk corresponding to mem */
 
-  void (*hook) (__malloc_ptr_t, __const __malloc_ptr_t)
-    = force_reg (dlfree_hook);
-  if (hook != NULL) {
-    (*hook)(mem, RETURN_ADDRESS (0));
-    return;
-  }
+	void (*hook) (__malloc_ptr_t, __const __malloc_ptr_t) = force_reg(dlfree_hook);
+	if (hook != NULL) {
+		(*hook) (mem, RETURN_ADDRESS(0));
+		return;
+	}
 
-  if (mem == 0)                              /* free(0) has no effect */
-    return;
+	if (mem == 0)		/* free(0) has no effect */
+		return;
 
-  p = mem2chunk(mem);
+	p = mem2chunk(mem);
 
-  if (chunk_is_mmapped(p))                       /* release mmapped memory. */
-  {
-    munmap_chunk(p);
-    return;
-  }
+	if (chunk_is_mmapped(p)) {	/* release mmapped memory. */
+		munmap_chunk(p);
+		return;
+	}
 
-  ar_ptr = arena_for_chunk(p);
-  arena_lock(ar_ptr);
-  _int_free(ar_ptr, p);
-  arena_unlock(ar_ptr);
+	ar_ptr = arena_for_chunk(p);
+	arena_lock(ar_ptr);
+	_int_free(ar_ptr, p);
+	arena_unlock(ar_ptr);
 }
 
 Void_t*
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: fix hard-coded constant
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (41 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: prefetch for tcache_malloc Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: introduce get_backup_arena() Joern Engel
                   ` (21 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

The hard-coded 8 used to be identical to NO_PREFETCH.

JIRA: PURE-27597
---
 tpc/malloc2.13/tcache.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index b269498657f3..b02203398f2f 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -290,7 +290,7 @@ static void *tcache_malloc(size_t size)
 	 * those objects via tcache_gc.  Also do it before taking the
 	 * lock, to minimize hold times.
 	 */
-	if (nb <= MAX_PREFETCH_SIZE && (cache->tc_size + nb * 8) > CACHE_SIZE )
+	if (nb <= MAX_PREFETCH_SIZE && (cache->tc_size + nb * NO_PREFETCH) > CACHE_SIZE)
 		tcache_gc(cache);
 
 	arena = arena_get(size);
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: fix local_next handling
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (30 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: add locking to thread cache Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: use tsd_getspecific for arena_get Joern Engel
                   ` (32 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

arena_get2() still used the old ->next pointer to find the next arena -
ignoring numa locality.  Initial per-node arenas were not on the global
list.

Also, change the numa_arena[node] pointers whenever we use them.
Otherwise those arenas become hot spots.  If we move them around, the
load gets spread roughly evenly.
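
As a standalone illustration of the rotation this introduces (a toy
model, not the patch: structs and names are reduced to the minimum, a
single fake node, no locking or fallback paths), each lookup returns
the current per-node entry point and then advances numa_arena[node]
along the circular local_next list, so successive lookups land on
different arenas:

#include <stdio.h>

struct arena {
	int id;
	struct arena *local_next;	/* circular per-node list */
};

static struct arena *numa_arena[1];	/* one fake node */

static struct arena *pick_local_arena(int node)
{
	struct arena *a = numa_arena[node];

	numa_arena[node] = a->local_next;	/* rotate the entry point */
	return a;
}

int main(void)
{
	struct arena a0 = { 0 }, a1 = { 1 }, a2 = { 2 };
	int i;

	a0.local_next = &a1;
	a1.local_next = &a2;
	a2.local_next = &a0;
	numa_arena[0] = &a0;

	for (i = 0; i < 6; i++)
		printf("lookup %d -> arena %d\n", i, pick_local_arena(0)->id);
	return 0;
}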

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 0ee6286bf286..599774ea1300 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -688,11 +688,18 @@ static struct malloc_state *_int_new_arena(size_t size, int numa_node)
 	(void)mutex_lock(&a->mutex);
 
 
-	/* Add the new arena to the global list.  */
+	/* Add the new arena to the global lists.  */
+	a->numa_node = numa_node;
+
 	a->next = main_arena.next;
 	atomic_write_barrier();
 	main_arena.next = a;
-	a->numa_node = numa_node;
+
+	if (numa_arena[numa_node]) {
+		a->local_next = numa_arena[numa_node]->local_next;
+		atomic_write_barrier();
+		numa_arena[numa_node]->local_next = a;
+	}
 
 	THREAD_STAT(++(a->stat_lock_loop));
 
@@ -703,7 +710,7 @@ static struct malloc_state *arena_get2(struct malloc_state *a_tsd, size_t size)
 {
 	struct malloc_state *a;
 
-	a = a_tsd->next;
+	a = a_tsd->local_next;
 	if (!a)
 		abort();
 
@@ -718,7 +725,7 @@ static struct malloc_state *arena_get2(struct malloc_state *a_tsd, size_t size)
 			tsd_setspecific(arena_key, (Void_t *) a);
 			return a;
 		}
-		a = a->next;
+		a = a->local_next;
 	} while (a != a_tsd);
 
 	/* If not even the list_lock can be obtained, try again.  This can
@@ -768,8 +775,10 @@ static struct malloc_state *arena_get(size_t size)
 	int node = getnode();
 
 	tsd_getspecific(arena_key, arena);
-	if (!arena || arena->numa_node != node)
+	if (!arena || arena->numa_node != node) {
 		arena = numa_arena[node];
+		numa_arena[node] = arena->local_next;
+	}
 	if (arena && !mutex_trylock(&arena->mutex)) {
 		THREAD_STAT(++(arena->stat_lock_direct));
 	} else
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: use mbind()
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (26 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: only free half the objects on tcache_gc Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -UATOMIC_FASTBINS Joern Engel
                   ` (36 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

This explicitly does not use libnuma.  Libnuma itself depends on malloc
and adds very little to a bare mbind().
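
A minimal sketch of the underlying call, assuming the numactl headers
for <numaif.h> and the mbind() wrapper as in the patch (the function
and names below are illustrative, not part of the patch):

#include <numaif.h>		/* mbind(), MPOL_PREFERRED */
#include <sys/mman.h>
#include <stdio.h>

/*
 * Prefer allocating the freshly mapped region on `node`.  nodemask is
 * a bit mask of allowed nodes; maxnode is the number of bits in that
 * mask the kernel should look at.  MPOL_PREFERRED is only a hint, so
 * the allocation still succeeds if the node runs out of memory.
 */
static void *alloc_on_node(size_t size, int node)
{
	unsigned long mask = 1UL << node;
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (mem == MAP_FAILED)
		return NULL;
	if (mbind(mem, size, MPOL_PREFERRED, &mask, 8 * sizeof(mask), 0))
		perror("mbind");
	return mem;
}

int main(void)
{
	void *p = alloc_on_node(1 << 20, 0);	/* 1 MiB preferring node 0 */

	printf("mapped at %p\n", p);
	return 0;
}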

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 56 +++++++++++++++++++++++++++++++++++++++++++++----
 tpc/malloc2.13/malloc.c |  2 +-
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 0a269715004c..d038565f84d3 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -306,6 +306,37 @@ ptmalloc_init_minimal (void)
 
 static struct malloc_state *_int_new_arena(size_t size, int numa_node);
 
+static int max_node = -1;
+
+#include <sys/types.h>
+#include <dirent.h>
+#include <string.h>
+
+/*
+ * Wouldn't it be nice to get this with a single syscall instead?
+ */
+static void parse_node_count(void)
+{
+	DIR *d;
+	struct dirent *de;
+
+	d = opendir("/sys/devices/system/node");
+	if (!d) {
+		max_node = 0;
+	} else {
+		while ((de = readdir(d)) != NULL) {
+			int nd;
+			if (strncmp(de->d_name, "node", 4))
+				continue;
+			nd = strtoul(de->d_name + 4, NULL, 0);
+			if (max_node < nd)
+				max_node = nd;
+		}
+		closedir(d);
+	}
+	assert(max_node < MAX_NUMA_NODES);
+}
+
 static void ptmalloc_init(void)
 {
 	const char *s;
@@ -333,7 +364,8 @@ static void ptmalloc_init(void)
 	mutex_init(&main_arena.mutex);
 	main_arena.next = &main_arena;
 	numa_arena[0] = &main_arena;
-	for (i = 1; i < MAX_NUMA_NODES; i++) {
+	parse_node_count();
+	for (i = 1; i <= max_node; i++) {
 		numa_arena[i] = _int_new_arena(0, i);
 		numa_arena[i]->local_next = numa_arena[i];
 		(void)mutex_unlock(&numa_arena[i]->mutex);
@@ -443,9 +475,24 @@ static void *mmap_for_heap(void *addr, size_t length, int *must_clear)
 	return MMAP(addr, length, prot, flags | MAP_NORESERVE);
 }
 
+#include <numaif.h>
+#ifndef MPOL_F_STATIC_NODES
+#define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+static void mbind_memory(void *mem, size_t size, int node)
+{
+	unsigned long node_mask = 1 << node;
+	int err;
+
+	assert(max_node < sizeof(unsigned long));
+	err = mbind(mem, size, MPOL_PREFERRED, &node_mask, max_node, MPOL_F_STATIC_NODES);
+	if (err)
+		assert(!err);
+}
+
 /* Create a new heap.  size is automatically rounded up to a multiple
    of the page size. */
-static heap_info *new_heap(size_t size, size_t top_pad)
+static heap_info *new_heap(size_t size, size_t top_pad, int numa_node)
 {
 	size_t page_mask = malloc_getpagesize - 1;
 	char *p1, *p2;
@@ -499,6 +546,7 @@ static heap_info *new_heap(size_t size, size_t top_pad)
 			}
 		}
 	}
+	mbind_memory(p2, HEAP_MAX_SIZE, numa_node);
 	if (must_clear)
 		memset(p2, 0, HEAP_MAX_SIZE);
 	h = (heap_info *) p2;
@@ -619,12 +667,12 @@ static struct malloc_state *_int_new_arena(size_t size, int numa_node)
 	char *ptr;
 	unsigned long misalign;
 
-	h = new_heap(size + (sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT), mp_.top_pad);
+	h = new_heap(size + (sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT), mp_.top_pad, numa_node);
 	if (!h) {
 		/* Maybe size is too large to fit in a single heap.  So, just try
 		   to create a minimally-sized arena and let _int_malloc() attempt
 		   to deal with the large request via mmap_chunk().  */
-		h = new_heap(sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT, mp_.top_pad);
+		h = new_heap(sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT, mp_.top_pad, numa_node);
 		if (!h)
 			abort();
 	}
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index b86f0c3ff65c..461621c11250 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2859,7 +2859,7 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
       set_head(old_top, (((char *)old_heap + old_heap->size) - (char *)old_top)
 	       | PREV_INUSE);
     }
-    else if ((heap = new_heap(nb + (MINSIZE + sizeof(*heap)), mp_.top_pad))) {
+    else if ((heap = new_heap(nb + (MINSIZE + sizeof(*heap)), mp_.top_pad, av->numa_node))) {
       /* Use a newly allocated heap.  */
       heap->ar_ptr = av;
       heap->prev = old_heap;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: unifdef -m -UPER_THREAD -U_LIBC
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (38 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: initial numa support Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: destroy thread cache on thread exit Joern Engel
                   ` (24 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Both got in the way of functional changes.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  | 304 +-----------------------------------------------
 tpc/malloc2.13/hooks.h  |  16 ---
 tpc/malloc2.13/malloc.c | 109 +----------------
 3 files changed, 4 insertions(+), 425 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index c854de12910c..118563003c8d 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -78,10 +78,6 @@ extern int sanity_check_heap_info_alignment[(sizeof (heap_info)
 
 static tsd_key_t arena_key;
 static mutex_t list_lock;
-#ifdef PER_THREAD
-static size_t narenas;
-static struct malloc_state * free_list;
-#endif
 
 #if THREAD_STATS
 static int stat_n_heaps;
@@ -117,21 +113,12 @@ static int __malloc_initialized = -1;
 	ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
 } while(0)
 
-#ifdef PER_THREAD
-#define arena_lock(ptr, size) do { \
-	if(ptr) \
-		(void)mutex_lock(&ptr->mutex); \
-	else \
-		ptr = arena_get2(ptr, (size)); \
-} while(0)
-#else
 #define arena_lock(ptr, size) do { \
 	if(ptr && !mutex_trylock(&ptr->mutex)) { \
 		THREAD_STAT(++(ptr->stat_lock_direct)); \
 	} else \
 		ptr = arena_get2(ptr, (size)); \
 } while(0)
-#endif
 
 /* find the heap and corresponding arena for a given ptr */
 
@@ -149,10 +136,8 @@ static int __malloc_initialized = -1;
 
 static __malloc_ptr_t (*save_malloc_hook) (size_t __size,
 					   __const __malloc_ptr_t);
-# if !defined _LIBC || (defined SHARED && !USE___THREAD)
 static __malloc_ptr_t (*save_memalign_hook) (size_t __align, size_t __size,
 					     __const __malloc_ptr_t);
-# endif
 static void           (*save_free_hook) (__malloc_ptr_t __ptr,
 					 __const __malloc_ptr_t);
 static Void_t*        save_arena;
@@ -308,22 +293,13 @@ ptmalloc_unlock_all2 (void)
 
   if(__malloc_initialized < 1)
     return;
-#if defined _LIBC || defined MALLOC_HOOKS
+#if defined MALLOC_HOOKS
   tsd_setspecific(arena_key, save_arena);
   dlmalloc_hook = save_malloc_hook;
   dlfree_hook = save_free_hook;
 #endif
-#ifdef PER_THREAD
-  free_list = NULL;
-#endif
   for(ar_ptr = &main_arena;;) {
     mutex_init(&ar_ptr->mutex);
-#ifdef PER_THREAD
-    if (ar_ptr != save_arena) {
-      ar_ptr->next_free = free_list;
-      free_list = ar_ptr;
-    }
-#endif
     ar_ptr = ar_ptr->next;
     if(ar_ptr == &main_arena) break;
   }
@@ -340,41 +316,6 @@ ptmalloc_unlock_all2 (void)
 #endif /* !defined NO_THREADS */
 
 /* Initialization routine. */
-#ifdef _LIBC
-#include <string.h>
-extern char **_environ;
-
-static char *
-internal_function
-next_env_entry (char ***position)
-{
-  char **current = *position;
-  char *result = NULL;
-
-  while (*current != NULL)
-    {
-      if (__builtin_expect ((*current)[0] == 'M', 0)
-	  && (*current)[1] == 'A'
-	  && (*current)[2] == 'L'
-	  && (*current)[3] == 'L'
-	  && (*current)[4] == 'O'
-	  && (*current)[5] == 'C'
-	  && (*current)[6] == '_')
-	{
-	  result = &(*current)[7];
-
-	  /* Save current position for next visit.  */
-	  *position = ++current;
-
-	  break;
-	}
-
-      ++current;
-    }
-
-  return result;
-}
-#endif /* _LIBC */
 
 /* Set up basic state so that _int_malloc et al can work.  */
 static void
@@ -387,54 +328,9 @@ ptmalloc_init_minimal (void)
   mp_.mmap_threshold = DEFAULT_MMAP_THRESHOLD;
   mp_.trim_threshold = DEFAULT_TRIM_THRESHOLD;
   mp_.pagesize       = malloc_getpagesize;
-#ifdef PER_THREAD
-# define NARENAS_FROM_NCORES(n) ((n) * (sizeof(long) == 4 ? 2 : 8))
-  mp_.arena_test     = NARENAS_FROM_NCORES (1);
-  narenas = 1;
-#endif
-}
-
-
-#ifdef _LIBC
-# ifdef SHARED
-static void *
-__failing_morecore (ptrdiff_t d)
-{
-  return (void *) MORECORE_FAILURE;
 }
 
-extern struct dl_open_hook *_dl_open_hook;
-libc_hidden_proto (_dl_open_hook);
-# endif
 
-# if defined SHARED && !USE___THREAD
-/* This is called by __pthread_initialize_minimal when it needs to use
-   malloc to set up the TLS state.  We cannot do the full work of
-   ptmalloc_init (below) until __pthread_initialize_minimal has finished,
-   so it has to switch to using the special startup-time hooks while doing
-   those allocations.  */
-void
-__libc_malloc_pthread_startup (bool first_time)
-{
-  if (first_time)
-    {
-      ptmalloc_init_minimal ();
-      save_malloc_hook = dlmalloc_hook;
-      save_memalign_hook = dlmemalign_hook;
-      save_free_hook = dlfree_hook;
-      dlmalloc_hook = malloc_starter;
-      dlmemalign_hook = memalign_starter;
-      dlfree_hook = free_starter;
-    }
-  else
-    {
-      dlmalloc_hook = save_malloc_hook;
-      dlmemalign_hook = save_memalign_hook;
-      dlfree_hook = save_free_hook;
-    }
-}
-# endif
-#endif
 
 static void ptmalloc_init(void)
 {
@@ -445,21 +341,9 @@ static void ptmalloc_init(void)
 		return;
 	__malloc_initialized = 0;
 
-#ifdef _LIBC
-#if defined SHARED && !USE___THREAD
-	/* ptmalloc_init_minimal may already have been called via
-	   __libc_malloc_pthread_startup, above.  */
-	if (mp_.pagesize == 0)
-#endif
-#endif
 		ptmalloc_init_minimal();
 
 #ifndef NO_THREADS
-#if defined _LIBC
-	/* We know __pthread_initialize_minimal has already been called,
-	   and that is enough.  */
-#define NO_STARTER
-#endif
 #ifndef NO_STARTER
 	/* With some threads implementations, creating thread-specific data
 	   or initializing a mutex may call malloc() itself.  Provide a
@@ -470,25 +354,11 @@ static void ptmalloc_init(void)
 	dlmalloc_hook = malloc_starter;
 	dlmemalign_hook = memalign_starter;
 	dlfree_hook = free_starter;
-#ifdef _LIBC
-	/* Initialize the pthreads interface. */
-	if (__pthread_initialize != NULL)
-		__pthread_initialize();
-#endif				/* !defined _LIBC */
 #endif				/* !defined NO_STARTER */
 #endif				/* !defined NO_THREADS */
 	mutex_init(&main_arena.mutex);
 	main_arena.next = &main_arena;
 
-#if defined _LIBC && defined SHARED
-	/* In case this libc copy is in a non-default namespace, never use brk.
-	   Likewise if dlopened from statically linked program.  */
-	Dl_info di;
-	struct link_map *l;
-
-	if (_dl_open_hook != NULL || (_dl_addr(ptmalloc_init, &di, &l, NULL) != 0 && l->l_ns != LM_ID_BASE))
-		__morecore = __failing_morecore;
-#endif
 
 	mutex_init(&list_lock);
 	tsd_key_create(&arena_key, NULL);
@@ -503,67 +373,6 @@ static void ptmalloc_init(void)
 #undef NO_STARTER
 #endif
 #endif
-#ifdef _LIBC
-	secure = __libc_enable_secure;
-	s = NULL;
-	if (__builtin_expect(_environ != NULL, 1)) {
-		char **runp = _environ;
-		char *envline;
-
-		while (__builtin_expect((envline = next_env_entry(&runp)) != NULL, 0)) {
-			size_t len = strcspn(envline, "=");
-
-			if (envline[len] != '=')
-				/* This is a "MALLOC_" variable at the end of the string
-				   without a '=' character.  Ignore it since otherwise we
-				   will access invalid memory below.  */
-				continue;
-
-			switch (len) {
-			case 6:
-				if (memcmp(envline, "CHECK_", 6) == 0)
-					s = &envline[7];
-				break;
-			case 8:
-				if (!secure) {
-					if (memcmp(envline, "TOP_PAD_", 8) == 0)
-						mALLOPt(M_TOP_PAD, atoi(&envline[9]));
-					else if (memcmp(envline, "PERTURB_", 8) == 0)
-						mALLOPt(M_PERTURB, atoi(&envline[9]));
-				}
-				break;
-			case 9:
-				if (!secure) {
-					if (memcmp(envline, "MMAP_MAX_", 9) == 0)
-						mALLOPt(M_MMAP_MAX, atoi(&envline[10]));
-#ifdef PER_THREAD
-					else if (memcmp(envline, "ARENA_MAX", 9) == 0)
-						mALLOPt(M_ARENA_MAX, atoi(&envline[10]));
-#endif
-				}
-				break;
-#ifdef PER_THREAD
-			case 10:
-				if (!secure) {
-					if (memcmp(envline, "ARENA_TEST", 10) == 0)
-						mALLOPt(M_ARENA_TEST, atoi(&envline[11]));
-				}
-				break;
-#endif
-			case 15:
-				if (!secure) {
-					if (memcmp(envline, "TRIM_THRESHOLD_", 15) == 0)
-						mALLOPt(M_TRIM_THRESHOLD, atoi(&envline[16]));
-					else if (memcmp(envline, "MMAP_THRESHOLD_", 15) == 0)
-						mALLOPt(M_MMAP_THRESHOLD, atoi(&envline[16]));
-				}
-				break;
-			default:
-				break;
-			}
-		}
-	}
-#else
 	if (!secure) {
 		if ((s = getenv("MALLOC_TRIM_THRESHOLD_")))
 			mALLOPt(M_TRIM_THRESHOLD, atoi(s));
@@ -577,7 +386,6 @@ static void ptmalloc_init(void)
 			mALLOPt(M_MMAP_MAX, atoi(s));
 	}
 	s = getenv("MALLOC_CHECK_");
-#endif
 	if (s && s[0]) {
 		mALLOPt(M_CHECK_ACTION, (int)(s[0] - '0'));
 		if (check_action != 0)
@@ -863,20 +671,12 @@ static struct malloc_state *_int_new_arena(size_t size)
 	mutex_init(&a->mutex);
 	(void)mutex_lock(&a->mutex);
 
-#ifdef PER_THREAD
-	(void)mutex_lock(&list_lock);
-#endif
 
 	/* Add the new arena to the global list.  */
 	a->next = main_arena.next;
 	atomic_write_barrier();
 	main_arena.next = a;
 
-#ifdef PER_THREAD
-	++narenas;
-
-	(void)mutex_unlock(&list_lock);
-#endif
 
 	THREAD_STAT(++(a->stat_lock_loop));
 
@@ -884,94 +684,11 @@ static struct malloc_state *_int_new_arena(size_t size)
 }
 
 
-#ifdef PER_THREAD
-static struct malloc_state *
-get_free_list (void)
-{
-  struct malloc_state * result = free_list;
-  if (result != NULL)
-    {
-      (void)mutex_lock(&list_lock);
-      result = free_list;
-      if (result != NULL)
-	free_list = result->next_free;
-      (void)mutex_unlock(&list_lock);
-
-      if (result != NULL)
-	{
-	  (void)mutex_lock(&result->mutex);
-	  tsd_setspecific(arena_key, (Void_t *)result);
-	  THREAD_STAT(++(result->stat_lock_loop));
-	}
-    }
-
-  return result;
-}
-
-
-static struct malloc_state *
-reused_arena (void)
-{
-  if (narenas <= mp_.arena_test)
-    return NULL;
-
-  static int narenas_limit;
-  if (narenas_limit == 0)
-    {
-      if (mp_.arena_max != 0)
-	narenas_limit = mp_.arena_max;
-      else
-	{
-	  int n  = __get_nprocs ();
-
-	  if (n >= 1)
-	    narenas_limit = NARENAS_FROM_NCORES (n);
-	  else
-	    /* We have no information about the system.  Assume two
-	       cores.  */
-	    narenas_limit = NARENAS_FROM_NCORES (2);
-	}
-    }
-
-  if (narenas < narenas_limit)
-    return NULL;
-
-  struct malloc_state * result;
-  static struct malloc_state * next_to_use;
-  if (next_to_use == NULL)
-    next_to_use = &main_arena;
-
-  result = next_to_use;
-  do
-    {
-      if (!mutex_trylock(&result->mutex))
-	goto out;
-
-      result = result->next;
-    }
-  while (result != next_to_use);
-
-  /* No arena available.  Wait for the next in line.  */
-  (void)mutex_lock(&result->mutex);
-
- out:
-  tsd_setspecific(arena_key, (Void_t *)result);
-  THREAD_STAT(++(result->stat_lock_loop));
-  next_to_use = result->next;
-
-  return result;
-}
-#endif
 
 static struct malloc_state *internal_function arena_get2(struct malloc_state *a_tsd, size_t size)
 {
 	struct malloc_state *a;
 
-#ifdef PER_THREAD
-	if ((a = get_free_list()) == NULL && (a = reused_arena()) == NULL)
-		/* Nothing immediately available, so generate a new arena.  */
-		a = _int_new_arena(size);
-#else
 	if (!a_tsd)
 		a = a_tsd = &main_arena;
 	else {
@@ -1015,29 +732,10 @@ static struct malloc_state *internal_function arena_get2(struct malloc_state *a_
 	/* Nothing immediately available, so generate a new arena.  */
 	a = _int_new_arena(size);
 	(void)mutex_unlock(&list_lock);
-#endif
 
 	return a;
 }
 
-#ifdef PER_THREAD
-static void __attribute__ ((section ("__libc_thread_freeres_fn")))
-arena_thread_freeres (void)
-{
-  Void_t *vptr = NULL;
-  struct malloc_state * a = tsd_getspecific(arena_key, vptr);
-  tsd_setspecific(arena_key, NULL);
-
-  if (a != NULL)
-    {
-      (void)mutex_lock(&list_lock);
-      a->next_free = free_list;
-      free_list = a;
-      (void)mutex_unlock(&list_lock);
-    }
-}
-text_set_element (__libc_thread_subfreeres, arena_thread_freeres);
-#endif
 
 
 /*
diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
index 48f54f915275..209544da5377 100644
--- a/tpc/malloc2.13/hooks.h
+++ b/tpc/malloc2.13/hooks.h
@@ -367,12 +367,6 @@ memalign_check(size_t alignment, size_t bytes, const Void_t *caller)
 
 #ifndef NO_THREADS
 
-# ifdef _LIBC
-#  if USE___THREAD || !defined SHARED
-    /* These routines are never needed in this configuration.  */
-#   define NO_STARTER
-#  endif
-# endif
 
 # ifdef NO_STARTER
 #  undef NO_STARTER
@@ -512,11 +506,6 @@ public_gET_STATe(void)
   ms->max_mmapped_mem = mp_.max_mmapped_mem;
   ms->using_malloc_checking = using_malloc_checking;
   ms->max_fast = get_max_fast();
-#ifdef PER_THREAD
-  ms->arena_test = mp_.arena_test;
-  ms->arena_max = mp_.arena_max;
-  ms->narenas = narenas;
-#endif
   (void)mutex_unlock(&main_arena.mutex);
   return (Void_t*)ms;
 }
@@ -616,11 +605,6 @@ public_sET_STATe(Void_t* msptr)
     }
   }
   if (ms->version >= 4) {
-#ifdef PER_THREAD
-    mp_.arena_test = ms->arena_test;
-    mp_.arena_max = ms->arena_max;
-    narenas = ms->narenas;
-#endif
   }
   check_malloc_state(&main_arena);
 
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index c9644c382e05..a420ef278e68 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -248,14 +248,6 @@ __declspec(dllexport) void malloc2_13_bogus_symbol()
 
 #include <malloc-machine.h>
 
-#ifdef _LIBC
-#ifdef ATOMIC_FASTBINS
-#include <atomic.h>
-#endif
-#include <_itoa.h>
-#include <bits/wordsize.h>
-#include <sys/sysinfo.h>
-#endif
 
 #ifdef __cplusplus
 extern "C" {
@@ -460,39 +452,6 @@ extern "C" {
 #define public_gET_STATe dlget_state
 #define public_sET_STATe dlset_state
 #else /* USE_DL_PREFIX */
-#if defined(_LIBC)
-
-/* Special defines for the GNU C library.  */
-#define public_cALLOc    __libc_calloc
-#define public_fREe      __libc_free
-#define public_cFREe     __libc_cfree
-#define public_mALLOc    __libc_malloc
-#define public_mEMALIGn  __libc_memalign
-#define public_rEALLOc   __libc_realloc
-#define public_vALLOc    __libc_valloc
-#define public_pVALLOc   __libc_pvalloc
-#define public_mALLINFo  __libc_mallinfo
-#define public_mALLOPt   __libc_mallopt
-#define public_mTRIm     __malloc_trim
-#define public_mSTATs    __malloc_stats
-#define public_mUSABLe   __malloc_usable_size
-#define public_iCALLOc   __libc_independent_calloc
-#define public_iCOMALLOc __libc_independent_comalloc
-#define public_gET_STATe __malloc_get_state
-#define public_sET_STATe __malloc_set_state
-#define malloc_getpagesize __getpagesize()
-#define open             __open
-#define mmap             __mmap
-#define munmap           __munmap
-#define mremap           __mremap
-#define mprotect         __mprotect
-#define MORECORE         (*__morecore)
-#define MORECORE_FAILURE 0
-
-Void_t * __default_morecore (ptrdiff_t);
-Void_t *(*__morecore)(ptrdiff_t) = __default_morecore;
-
-#else /* !_LIBC */
 #define public_cALLOc    calloc
 #define public_fREe      free
 #define public_cFREe     cfree
@@ -511,14 +470,11 @@ Void_t *(*__morecore)(ptrdiff_t) = __default_morecore;
 #define public_gET_STATe malloc_get_state
 #define public_sET_STATe malloc_set_state
 
-#endif /* _LIBC */
 #endif /* USE_DL_PREFIX */
 
-#ifndef _LIBC
 #define __builtin_expect(expr, val)	(expr)
 
 #define fwrite(buf, size, count, fp) _IO_fwrite (buf, size, count, fp)
-#endif
 
 /*
   HAVE_MEMCPY should be defined if you are not otherwise using
@@ -546,16 +502,12 @@ Void_t *(*__morecore)(ptrdiff_t) = __default_morecore;
 
 
 
-#ifdef _LIBC
-# include <string.h>
-#else
 #ifdef WIN32
 /* On Win32 memset and memcpy are already declared in windows.h */
 #else
 void* memset(void*, int, size_t);
 void* memcpy(void*, const void*, size_t);
 #endif
-#endif
 
 
 /* Force a value to be in a register and stop the compiler referring
@@ -930,7 +882,6 @@ int      public_mALLOPt(int, int);
 */
 struct mallinfo public_mALLINFo(void);
 
-#ifndef _LIBC
 /*
   independent_calloc(size_t n_elements, size_t element_size, Void_t* chunks[]);
 
@@ -1046,7 +997,6 @@ Void_t** public_iCALLOc(size_t, size_t, Void_t**);
 */
 Void_t** public_iCOMALLOc(size_t, size_t*, Void_t**);
 
-#endif /* _LIBC */
 
 
 /*
@@ -1147,7 +1097,7 @@ Void_t*  public_gET_STATe(void);
 */
 int      public_sET_STATe(Void_t*);
 
-#if defined(_LIBC) || defined(PURE_HACK)
+#if defined(PURE_HACK)
 /*
   posix_memalign(void **memptr, size_t alignment, size_t size);
 
@@ -1493,10 +1443,8 @@ static Void_t*  _int_memalign(struct malloc_state *, size_t, size_t);
 static Void_t*  _int_valloc(struct malloc_state *, size_t);
 static Void_t*  _int_pvalloc(struct malloc_state *, size_t);
 /*static Void_t*  cALLOc(size_t, size_t);*/
-#ifndef _LIBC
 static Void_t** _int_icalloc(struct malloc_state *, size_t, size_t, Void_t**);
 static Void_t** _int_icomalloc(struct malloc_state *, size_t, size_t*, Void_t**);
-#endif
 static int      mTRIm(struct malloc_state *, size_t);
 static size_t   mUSABLe(Void_t*);
 static void     mSTATs(void);
@@ -1518,12 +1466,6 @@ static Void_t*   realloc_check(Void_t* oldmem, size_t bytes,
 static Void_t*   memalign_check(size_t alignment, size_t bytes,
 				const Void_t *caller);
 #ifndef NO_THREADS
-# ifdef _LIBC
-#  if USE___THREAD || !defined SHARED
-    /* These routines are never needed in this configuration.  */
-#   define NO_STARTER
-#  endif
-# endif
 # ifdef NO_STARTER
 #  undef NO_STARTER
 # else
@@ -2238,10 +2180,6 @@ struct malloc_state {
 	/* Linked list */
 	struct malloc_state *next;
 
-#ifdef PER_THREAD
-	/* Linked list for free arenas.  */
-	struct malloc_state *next_free;
-#endif
 
 	/* Memory allocated from the system in this arena.  */
 	INTERNAL_SIZE_T system_mem;
@@ -2253,10 +2191,6 @@ struct malloc_par {
   unsigned long    trim_threshold;
   INTERNAL_SIZE_T  top_pad;
   INTERNAL_SIZE_T  mmap_threshold;
-#ifdef PER_THREAD
-  INTERNAL_SIZE_T  arena_test;
-  INTERNAL_SIZE_T  arena_max;
-#endif
 
   /* Memory map support */
   int              n_mmaps;
@@ -2294,11 +2228,6 @@ static struct malloc_state main_arena;
 static struct malloc_par mp_;
 
 
-#ifdef PER_THREAD
-/*  Non public mallopt parameters.  */
-#define M_ARENA_TEST -7
-#define M_ARENA_MAX  -8
-#endif
 
 
 /* Maximum size of memory handled in fastbins.  */
@@ -2343,9 +2272,7 @@ static void malloc_init_state(struct malloc_state * av)
 static Void_t*  sYSMALLOc(INTERNAL_SIZE_T, struct malloc_state *);
 static int      sYSTRIm(size_t, struct malloc_state *);
 static void     malloc_consolidate(struct malloc_state *);
-#ifndef _LIBC
 static Void_t** iALLOc(struct malloc_state *, size_t, size_t*, int, Void_t**);
-#endif
 
 
 /* -------------- Early definitions for debugging hooks ---------------- */
@@ -2353,13 +2280,7 @@ static Void_t** iALLOc(struct malloc_state *, size_t, size_t*, int, Void_t**);
 /* Define and initialize the hook variables.  These weak definitions must
    appear before any use of the variables in a function (arena.c uses one).  */
 #ifndef weak_variable
-#ifndef _LIBC
 #define weak_variable /**/
-#else
-/* In GNU libc we want the hook variables to be weak definitions to
-   avoid a problem with Emacs.  */
-#define weak_variable weak_function
-#endif
 #endif
 
 /* Forward declarations.  */
@@ -3555,7 +3476,7 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
   (void)mutex_lock(&ar_ptr->mutex);
 #endif
 
-#if !defined NO_THREADS && !defined PER_THREAD
+#if !defined NO_THREADS
   /* As in malloc(), remember this arena for the next allocation. */
   tsd_setspecific(arena_key, (Void_t *)ar_ptr);
 #endif
@@ -3801,7 +3722,6 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
 	return mem;
 }
 
-#ifndef _LIBC
 
 Void_t**
 public_iCALLOc(size_t n, size_t elem_size, Void_t** chunks)
@@ -3839,7 +3759,6 @@ public_cFREe(Void_t* m)
   public_fREe(m);
 }
 
-#endif /* _LIBC */
 
 int
 public_mTRIm(size_t s)
@@ -5214,7 +5133,6 @@ _int_memalign(struct malloc_state * av, size_t alignment, size_t bytes)
   return chunk2mem(p);
 }
 
-#ifndef _LIBC
 /*
   ------------------------- independent_calloc -------------------------
 */
@@ -5363,7 +5281,6 @@ iALLOc(struct malloc_state * av, size_t n_elements, size_t* sizes, int opts, Voi
 
   return marray;
 }
-#endif /* _LIBC */
 
 
 /*
@@ -5546,11 +5463,6 @@ void mSTATs()
 
   if(__malloc_initialized < 0)
     ptmalloc_init ();
-#ifdef _LIBC
-  _IO_flockfile (stderr);
-  int old_flags2 = ((_IO_FILE *) stderr)->_flags2;
-  ((_IO_FILE *) stderr)->_flags2 |= _IO_FLAGS2_NOTCANCEL;
-#endif
   for (i=0, ar_ptr = &main_arena;; i++) {
     (void)mutex_lock(&ar_ptr->mutex);
     mi = mALLINFo(ar_ptr);
@@ -5589,10 +5501,6 @@ void mSTATs()
   fprintf(stderr, "locked total     = %10ld\n",
 	  stat_lock_direct + stat_lock_loop + stat_lock_wait);
 #endif
-#ifdef _LIBC
-  ((_IO_FILE *) stderr)->_flags2 |= old_flags2;
-  _IO_funlockfile (stderr);
-#endif
 }
 
 
@@ -5652,17 +5560,6 @@ int mALLOPt(int param_number, int value)
     perturb_byte = value;
     break;
 
-#ifdef PER_THREAD
-  case M_ARENA_TEST:
-    if (value > 0)
-      mp_.arena_test = value;
-    break;
-
-  case M_ARENA_MAX:
-    if (value > 0)
-      mp_.arena_max = value;
-    break;
-#endif
   }
   (void)mutex_unlock(&av->mutex);
   return res;
@@ -5841,7 +5738,7 @@ malloc_printerr(int action, const char *str, void *ptr)
 }
 #endif
 
-#if defined(_LIBC) || defined(PURE_HACK)
+#if defined(PURE_HACK)
 # include <sys/param.h>
 
 /* We need a wrapper function for one of the additions of POSIX.  */
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: use tsd_getspecific for arena_get
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (31 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: fix local_next handling Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: use atomic free list Joern Engel
                   ` (31 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Whether the interface of tsd_getspecific is good or bad, code should
remain self-consistent.  Changing the interface can be done at some
later time, if deemed useful.
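
For context, the wrappers in question have roughly this shape (assumed
here from glibc's malloc-machine.h, shown only to illustrate why
tsd_getspecific() assigns through its second argument instead of
returning a value):

#include <pthread.h>

#define tsd_key_t			pthread_key_t
#define tsd_key_create(key, destr)	pthread_key_create(key, destr)
#define tsd_setspecific(key, data)	pthread_setspecific(key, data)
#define tsd_getspecific(key, vptr)	(vptr = pthread_getspecific(key))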

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 6fc760f0d5ff..2e74cdb05d86 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -773,7 +773,7 @@ static struct malloc_state *arena_get(size_t size)
 	struct malloc_state *arena = NULL;
 	int node = getnode();
 
-	arena = pthread_getspecific(arena_key);
+	tsd_getspecific(arena_key, arena);
 	if (!arena || arena->numa_node != node)
 		arena = numa_arena[node];
 	if (arena && !mutex_trylock(&arena->mutex)) {
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: unifdef -D__STD_C
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (24 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: limit free_atomic_list() latency Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: only free half the objects on tcache_gc Joern Engel
                   ` (38 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Code cleanup to improve readability.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.ch |  24 ------
 tpc/malloc2.13/hooks.ch |  57 -------------
 tpc/malloc2.13/malloc.c | 209 ------------------------------------------------
 3 files changed, 290 deletions(-)

diff --git a/tpc/malloc2.13/arena.ch b/tpc/malloc2.13/arena.ch
index 09bdf0fd26b5..b8e7c611c42c 100644
--- a/tpc/malloc2.13/arena.ch
+++ b/tpc/malloc2.13/arena.ch
@@ -465,11 +465,7 @@ __libc_malloc_pthread_startup (bool first_time)
 static void
 ptmalloc_init (void)
 {
-#if __STD_C
   const char* s;
-#else
-  char* s;
-#endif
   int secure = 0;
 
   if(__malloc_initialized >= 0) return;
@@ -647,11 +643,7 @@ thread_atfork_static(ptmalloc_lock_all, ptmalloc_unlock_all, \
 /* Print the complete contents of a single heap to stderr. */
 
 static void
-#if __STD_C
 dump_heap(heap_info *heap)
-#else
-dump_heap(heap) heap_info *heap;
-#endif
 {
   char *ptr;
   mchunkptr p;
@@ -771,11 +763,7 @@ static heap_info *new_heap(size_t size, size_t top_pad)
    multiple of the page size. */
 
 static int
-#if __STD_C
 grow_heap(heap_info *h, long diff)
-#else
-grow_heap(h, diff) heap_info *h; long diff;
-#endif
 {
   size_t page_mask = malloc_getpagesize - 1;
   long new_size;
@@ -795,11 +783,7 @@ grow_heap(h, diff) heap_info *h; long diff;
 /* Shrink a heap.  */
 
 static int
-#if __STD_C
 shrink_heap(heap_info *h, long diff)
-#else
-shrink_heap(h, diff) heap_info *h; long diff;
-#endif
 {
   long new_size;
 
@@ -826,11 +810,7 @@ shrink_heap(h, diff) heap_info *h; long diff;
 
 static int
 internal_function
-#if __STD_C
 heap_trim(heap_info *heap, size_t pad)
-#else
-heap_trim(heap, pad) heap_info *heap; size_t pad;
-#endif
 {
   mstate ar_ptr = heap->ar_ptr;
   unsigned long pagesz = mp_.pagesize;
@@ -1026,11 +1006,7 @@ reused_arena (void)
 
 static mstate
 internal_function
-#if __STD_C
 arena_get2(mstate a_tsd, size_t size)
-#else
-arena_get2(a_tsd, size) mstate a_tsd; size_t size;
-#endif
 {
   mstate a;
 
diff --git a/tpc/malloc2.13/hooks.ch b/tpc/malloc2.13/hooks.ch
index 939cb3fefd95..05cfafbb78ba 100644
--- a/tpc/malloc2.13/hooks.ch
+++ b/tpc/malloc2.13/hooks.ch
@@ -26,12 +26,7 @@
    initialization routine, then do the normal work. */
 
 static Void_t*
-#if __STD_C
 malloc_hook_ini(size_t sz, const __malloc_ptr_t caller)
-#else
-malloc_hook_ini(sz, caller)
-     size_t sz; const __malloc_ptr_t caller;
-#endif
 {
   dlmalloc_hook = NULL;
   ptmalloc_init();
@@ -39,12 +34,7 @@ malloc_hook_ini(sz, caller)
 }
 
 static Void_t*
-#if __STD_C
 realloc_hook_ini(Void_t* ptr, size_t sz, const __malloc_ptr_t caller)
-#else
-realloc_hook_ini(ptr, sz, caller)
-     Void_t* ptr; size_t sz; const __malloc_ptr_t caller;
-#endif
 {
   dlmalloc_hook = NULL;
   dlrealloc_hook = NULL;
@@ -53,12 +43,7 @@ realloc_hook_ini(ptr, sz, caller)
 }
 
 static Void_t*
-#if __STD_C
 memalign_hook_ini(size_t alignment, size_t sz, const __malloc_ptr_t caller)
-#else
-memalign_hook_ini(alignment, sz, caller)
-     size_t alignment; size_t sz; const __malloc_ptr_t caller;
-#endif
 {
   dlmemalign_hook = NULL;
   ptmalloc_init();
@@ -110,11 +95,7 @@ dlmalloc_check_init()
 
 static Void_t*
 internal_function
-#if __STD_C
 mem2mem_check(Void_t *ptr, size_t sz)
-#else
-mem2mem_check(ptr, sz) Void_t *ptr; size_t sz;
-#endif
 {
   mchunkptr p;
   unsigned char* m_ptr = (unsigned char*)BOUNDED_N(ptr, sz);
@@ -141,11 +122,7 @@ mem2mem_check(ptr, sz) Void_t *ptr; size_t sz;
 
 static mchunkptr
 internal_function
-#if __STD_C
 mem2chunk_check(Void_t* mem, unsigned char **magic_p)
-#else
-mem2chunk_check(mem, magic_p) Void_t* mem; unsigned char **magic_p;
-#endif
 {
   mchunkptr p;
   INTERNAL_SIZE_T sz, c;
@@ -200,11 +177,7 @@ mem2chunk_check(mem, magic_p) Void_t* mem; unsigned char **magic_p;
 
 static int
 internal_function
-#if __STD_C
 top_check(void)
-#else
-top_check()
-#endif
 {
   mchunkptr t = top(&main_arena);
   char* brk, * new_brk;
@@ -247,11 +220,7 @@ top_check()
 }
 
 static Void_t*
-#if __STD_C
 malloc_check(size_t sz, const Void_t *caller)
-#else
-malloc_check(sz, caller) size_t sz; const Void_t *caller;
-#endif
 {
   Void_t *victim;
 
@@ -267,11 +236,7 @@ malloc_check(sz, caller) size_t sz; const Void_t *caller;
 }
 
 static void
-#if __STD_C
 free_check(Void_t* mem, const Void_t *caller)
-#else
-free_check(mem, caller) Void_t* mem; const Void_t *caller;
-#endif
 {
   mchunkptr p;
 
@@ -303,12 +268,7 @@ free_check(mem, caller) Void_t* mem; const Void_t *caller;
 }
 
 static Void_t*
-#if __STD_C
 realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
-#else
-realloc_check(oldmem, bytes, caller)
-     Void_t* oldmem; size_t bytes; const Void_t *caller;
-#endif
 {
   INTERNAL_SIZE_T nb;
   Void_t* newmem = 0;
@@ -391,12 +351,7 @@ realloc_check(oldmem, bytes, caller)
 }
 
 static Void_t*
-#if __STD_C
 memalign_check(size_t alignment, size_t bytes, const Void_t *caller)
-#else
-memalign_check(alignment, bytes, caller)
-     size_t alignment; size_t bytes; const Void_t *caller;
-#endif
 {
   INTERNAL_SIZE_T nb;
   Void_t* mem;
@@ -433,11 +388,7 @@ memalign_check(alignment, bytes, caller)
    ptmalloc_init() hasn't completed yet. */
 
 static Void_t*
-#if __STD_C
 malloc_starter(size_t sz, const Void_t *caller)
-#else
-malloc_starter(sz, caller) size_t sz; const Void_t *caller;
-#endif
 {
   Void_t* victim;
 
@@ -447,11 +398,7 @@ malloc_starter(sz, caller) size_t sz; const Void_t *caller;
 }
 
 static Void_t*
-#if __STD_C
 memalign_starter(size_t align, size_t sz, const Void_t *caller)
-#else
-memalign_starter(align, sz, caller) size_t align, sz; const Void_t *caller;
-#endif
 {
   Void_t* victim;
 
@@ -461,11 +408,7 @@ memalign_starter(align, sz, caller) size_t align, sz; const Void_t *caller;
 }
 
 static void
-#if __STD_C
 free_starter(Void_t* mem, const Void_t *caller)
-#else
-free_starter(mem, caller) Void_t* mem; const Void_t *caller;
-#endif
 {
   mchunkptr p;
 
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 887edcc94e56..b17b17bba4d4 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -234,13 +234,6 @@ __declspec(dllexport) void malloc2_13_bogus_symbol()
   with it.
 */
 
-#ifndef __STD_C
-#if defined(__STDC__) || defined(__cplusplus)
-#define __STD_C     1
-#else
-#define __STD_C     0
-#endif
-#endif /*__STD_C*/
 
 
 /*
@@ -248,19 +241,11 @@ __declspec(dllexport) void malloc2_13_bogus_symbol()
 */
 
 #ifndef Void_t
-#if (__STD_C || defined(WIN32))
 #define Void_t      void
-#else
-#define Void_t      char
-#endif
 #endif /*Void_t*/
 
-#if __STD_C
 #include <stddef.h>   /* for size_t */
 #include <stdlib.h>   /* for getenv(), abort() */
-#else
-#include <sys/types.h>
-#endif
 
 #include <malloc-machine.h>
 
@@ -561,7 +546,6 @@ Void_t *(*__morecore)(ptrdiff_t) = __default_morecore;
 #endif
 
 
-#if (__STD_C || defined(HAVE_MEMCPY))
 
 #ifdef _LIBC
 # include <string.h>
@@ -569,14 +553,8 @@ Void_t *(*__morecore)(ptrdiff_t) = __default_morecore;
 #ifdef WIN32
 /* On Win32 memset and memcpy are already declared in windows.h */
 #else
-#if __STD_C
 void* memset(void*, int, size_t);
 void* memcpy(void*, const void*, size_t);
-#else
-Void_t* memset();
-Void_t* memcpy();
-#endif
-#endif
 #endif
 #endif
 
@@ -596,13 +574,9 @@ Void_t* memcpy();
 */
 
 #ifndef MALLOC_FAILURE_ACTION
-#if __STD_C
 #define MALLOC_FAILURE_ACTION \
    errno = ENOMEM;
 
-#else
-#define MALLOC_FAILURE_ACTION
-#endif
 #endif
 
 /*
@@ -612,11 +586,7 @@ Void_t* memcpy();
 
 #ifdef LACKS_UNISTD_H
 #if !defined(__FreeBSD__) && !defined(__OpenBSD__) && !defined(__NetBSD__)
-#if __STD_C
 extern Void_t*     sbrk(ptrdiff_t);
-#else
-extern Void_t*     sbrk();
-#endif
 #endif
 #endif
 
@@ -867,11 +837,7 @@ extern Void_t*     sbrk();
   differs across systems, but is in all cases less than the maximum
   representable value of a size_t.
 */
-#if __STD_C
 Void_t*  public_mALLOc(size_t);
-#else
-Void_t*  public_mALLOc();
-#endif
 #ifdef libc_hidden_proto
 libc_hidden_proto (public_mALLOc)
 #endif
@@ -887,11 +853,7 @@ libc_hidden_proto (public_mALLOc)
   when possible, automatically trigger operations that give
   back unused memory to the system, thus reducing program footprint.
 */
-#if __STD_C
 void     public_fREe(Void_t*);
-#else
-void     public_fREe();
-#endif
 #ifdef libc_hidden_proto
 libc_hidden_proto (public_fREe)
 #endif
@@ -901,11 +863,7 @@ libc_hidden_proto (public_fREe)
   Returns a pointer to n_elements * element_size bytes, with all locations
   set to zero.
 */
-#if __STD_C
 Void_t*  public_cALLOc(size_t, size_t);
-#else
-Void_t*  public_cALLOc();
-#endif
 
 /*
   realloc(Void_t* p, size_t n)
@@ -934,11 +892,7 @@ Void_t*  public_cALLOc();
   The old unix realloc convention of allowing the last-free'd chunk
   to be used as an argument to realloc is not supported.
 */
-#if __STD_C
 Void_t*  public_rEALLOc(Void_t*, size_t);
-#else
-Void_t*  public_rEALLOc();
-#endif
 #ifdef libc_hidden_proto
 libc_hidden_proto (public_rEALLOc)
 #endif
@@ -955,11 +909,7 @@ libc_hidden_proto (public_rEALLOc)
 
   Overreliance on memalign is a sure way to fragment space.
 */
-#if __STD_C
 Void_t*  public_mEMALIGn(size_t, size_t);
-#else
-Void_t*  public_mEMALIGn();
-#endif
 #ifdef libc_hidden_proto
 libc_hidden_proto (public_mEMALIGn)
 #endif
@@ -969,11 +919,7 @@ libc_hidden_proto (public_mEMALIGn)
   Equivalent to memalign(pagesize, n), where pagesize is the page
   size of the system. If the pagesize is unknown, 4096 is used.
 */
-#if __STD_C
 Void_t*  public_vALLOc(size_t);
-#else
-Void_t*  public_vALLOc();
-#endif
 
 
 
@@ -998,11 +944,7 @@ Void_t*  public_vALLOc();
   M_MMAP_THRESHOLD -3         128*1024   any   (or 0 if no MMAP support)
   M_MMAP_MAX       -4         65536      any   (0 disables use of mmap)
 */
-#if __STD_C
 int      public_mALLOPt(int, int);
-#else
-int      public_mALLOPt();
-#endif
 
 
 /*
@@ -1028,11 +970,7 @@ int      public_mALLOPt();
   be kept as longs, the reported values may wrap around zero and
   thus be inaccurate.
 */
-#if __STD_C
 struct mallinfo public_mALLINFo(void);
-#else
-struct mallinfo public_mALLINFo();
-#endif
 
 #ifndef _LIBC
 /*
@@ -1087,11 +1025,7 @@ struct mallinfo public_mALLINFo();
     return first;
   }
 */
-#if __STD_C
 Void_t** public_iCALLOc(size_t, size_t, Void_t**);
-#else
-Void_t** public_iCALLOc();
-#endif
 
 /*
   independent_comalloc(size_t n_elements, size_t sizes[], Void_t* chunks[]);
@@ -1152,11 +1086,7 @@ Void_t** public_iCALLOc();
   since it cannot reuse existing noncontiguous small chunks that
   might be available for some of the elements.
 */
-#if __STD_C
 Void_t** public_iCOMALLOc(size_t, size_t*, Void_t**);
-#else
-Void_t** public_iCOMALLOc();
-#endif
 
 #endif /* _LIBC */
 
@@ -1166,11 +1096,7 @@ Void_t** public_iCOMALLOc();
   Equivalent to valloc(minimum-page-that-holds(n)), that is,
   round up n to nearest pagesize.
  */
-#if __STD_C
 Void_t*  public_pVALLOc(size_t);
-#else
-Void_t*  public_pVALLOc();
-#endif
 
 /*
   cfree(Void_t* p);
@@ -1180,11 +1106,7 @@ Void_t*  public_pVALLOc();
   for odd historical reasons (such as: cfree is used in example
   code in the first edition of K&R).
 */
-#if __STD_C
 void     public_cFREe(Void_t*);
-#else
-void     public_cFREe();
-#endif
 
 /*
   malloc_trim(size_t pad);
@@ -1210,11 +1132,7 @@ void     public_cFREe();
   On systems that do not support "negative sbrks", it will always
   return 0.
 */
-#if __STD_C
 int      public_mTRIm(size_t);
-#else
-int      public_mTRIm();
-#endif
 
 /*
   malloc_usable_size(Void_t* p);
@@ -1231,11 +1149,7 @@ int      public_mTRIm();
   assert(malloc_usable_size(p) >= 256);
 
 */
-#if __STD_C
 size_t   public_mUSABLe(Void_t*);
-#else
-size_t   public_mUSABLe();
-#endif
 
 /*
   malloc_stats();
@@ -1257,11 +1171,7 @@ size_t   public_mUSABLe();
   More information can be obtained by calling mallinfo.
 
 */
-#if __STD_C
 void     public_mSTATs(void);
-#else
-void     public_mSTATs();
-#endif
 
 /*
   malloc_get_state(void);
@@ -1269,11 +1179,7 @@ void     public_mSTATs();
   Returns the state of all malloc variables in an opaque data
   structure.
 */
-#if __STD_C
 Void_t*  public_gET_STATe(void);
-#else
-Void_t*  public_gET_STATe();
-#endif
 
 /*
   malloc_set_state(Void_t* state);
@@ -1281,11 +1187,7 @@ Void_t*  public_gET_STATe();
   Restore the state of all malloc variables from data obtained with
   malloc_get_state().
 */
-#if __STD_C
 int      public_sET_STATe(Void_t*);
-#else
-int      public_sET_STATe();
-#endif
 
 #if defined(_LIBC) || defined(PURE_HACK)
 /*
@@ -1625,7 +1527,6 @@ struct mallinfo2 {
 
 /* Internal routines.  */
 
-#if __STD_C
 
 static Void_t*  _int_malloc(mstate, size_t);
 #ifdef ATOMIC_FASTBINS
@@ -1681,24 +1582,6 @@ static Void_t*   malloc_atfork(size_t sz, const Void_t *caller);
 static void      free_atfork(Void_t* mem, const Void_t *caller);
 #endif
 
-#else
-
-static Void_t*  _int_malloc();
-static void     _int_free();
-static Void_t*  _int_realloc();
-static Void_t*  _int_memalign();
-static Void_t*  _int_valloc();
-static Void_t*  _int_pvalloc();
-/*static Void_t*  cALLOc();*/
-static Void_t** _int_icalloc();
-static Void_t** _int_icomalloc();
-static int      mTRIm();
-static size_t   mUSABLe();
-static void     mSTATs();
-static int      mALLOPt();
-static struct mallinfo2 mALLINFo();
-
-#endif
 
 
 
@@ -2480,11 +2363,7 @@ static INTERNAL_SIZE_T global_max_fast;
   optimization at all. (Inlining it in malloc_consolidate is fine though.)
 */
 
-#if __STD_C
 static void malloc_init_state(mstate av)
-#else
-static void malloc_init_state(av) mstate av;
-#endif
 {
   int     i;
   mbinptr bin;
@@ -2510,19 +2389,12 @@ static void malloc_init_state(av) mstate av;
    Other internal utilities operating on mstates
 */
 
-#if __STD_C
 static Void_t*  sYSMALLOc(INTERNAL_SIZE_T, mstate);
 static int      sYSTRIm(size_t, mstate);
 static void     malloc_consolidate(mstate);
 #ifndef _LIBC
 static Void_t** iALLOc(mstate, size_t, size_t*, int, Void_t**);
 #endif
-#else
-static Void_t*  sYSMALLOc();
-static int      sYSTRIm();
-static void     malloc_consolidate();
-static Void_t** iALLOc();
-#endif
 
 
 /* -------------- Early definitions for debugging hooks ---------------- */
@@ -2613,11 +2485,7 @@ static int perturb_byte;
   Properties of all chunks
 */
 
-#if __STD_C
 static void do_check_chunk(mstate av, mchunkptr p)
-#else
-static void do_check_chunk(av, p) mstate av; mchunkptr p;
-#endif
 {
   unsigned long sz = chunksize(p);
   /* min and max possible addresses assuming contiguous allocation */
@@ -2662,11 +2530,7 @@ static void do_check_chunk(av, p) mstate av; mchunkptr p;
   Properties of free chunks
 */
 
-#if __STD_C
 static void do_check_free_chunk(mstate av, mchunkptr p)
-#else
-static void do_check_free_chunk(av, p) mstate av; mchunkptr p;
-#endif
 {
   INTERNAL_SIZE_T sz = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
   mchunkptr next = chunk_at_offset(p, sz);
@@ -2700,11 +2564,7 @@ static void do_check_free_chunk(av, p) mstate av; mchunkptr p;
   Properties of inuse chunks
 */
 
-#if __STD_C
 static void do_check_inuse_chunk(mstate av, mchunkptr p)
-#else
-static void do_check_inuse_chunk(av, p) mstate av; mchunkptr p;
-#endif
 {
   mchunkptr next;
 
@@ -2741,12 +2601,7 @@ static void do_check_inuse_chunk(av, p) mstate av; mchunkptr p;
   Properties of chunks recycled from fastbins
 */
 
-#if __STD_C
 static void do_check_remalloced_chunk(mstate av, mchunkptr p, INTERNAL_SIZE_T s)
-#else
-static void do_check_remalloced_chunk(av, p, s)
-mstate av; mchunkptr p; INTERNAL_SIZE_T s;
-#endif
 {
   INTERNAL_SIZE_T sz = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
 
@@ -2774,12 +2629,7 @@ mstate av; mchunkptr p; INTERNAL_SIZE_T s;
   Properties of nonrecycled chunks at the point they are malloced
 */
 
-#if __STD_C
 static void do_check_malloced_chunk(mstate av, mchunkptr p, INTERNAL_SIZE_T s)
-#else
-static void do_check_malloced_chunk(av, p, s)
-mstate av; mchunkptr p; INTERNAL_SIZE_T s;
-#endif
 {
   /* same as recycled case ... */
   do_check_remalloced_chunk(av, p, s);
@@ -2976,11 +2826,7 @@ static void do_check_malloc_state(mstate av)
   be extended or replaced.
 */
 
-#if __STD_C
 static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, mstate av)
-#else
-static Void_t* sYSMALLOc(nb, av) INTERNAL_SIZE_T nb; mstate av;
-#endif
 {
   mchunkptr       old_top;        /* incoming value of av->top */
   INTERNAL_SIZE_T old_size;       /* its size */
@@ -3464,11 +3310,7 @@ static Void_t* sYSMALLOc(nb, av) INTERNAL_SIZE_T nb; mstate av;
   returns 1 if it actually released any memory, else 0.
 */
 
-#if __STD_C
 static int sYSTRIm(size_t pad, mstate av)
-#else
-static int sYSTRIm(pad, av) size_t pad; mstate av;
-#endif
 {
   long  top_size;        /* Amount of top-most memory */
   long  extra;           /* Amount to release */
@@ -3529,11 +3371,7 @@ static int sYSTRIm(pad, av) size_t pad; mstate av;
 
 static void
 internal_function
-#if __STD_C
 munmap_chunk(mchunkptr p)
-#else
-munmap_chunk(p) mchunkptr p;
-#endif
 {
   INTERNAL_SIZE_T size = chunksize(p);
 
@@ -3570,11 +3408,7 @@ munmap_chunk(p) mchunkptr p;
 
 static mchunkptr
 internal_function
-#if __STD_C
 mremap_chunk(mchunkptr p, size_t new_size)
-#else
-mremap_chunk(p, new_size) mchunkptr p; size_t new_size;
-#endif
 {
   size_t page_mask = mp_.pagesize - 1;
   INTERNAL_SIZE_T offset = p->prev_size;
@@ -5111,11 +4945,7 @@ _int_free(mstate av, mchunkptr p)
   initialization code.
 */
 
-#if __STD_C
 static void malloc_consolidate(mstate av)
-#else
-static void malloc_consolidate(av) mstate av;
-#endif
 {
   mfastbinptr*    fb;                 /* current fastbin being consolidated */
   mfastbinptr*    maxfb;              /* last fastbin (for loop control) */
@@ -5589,11 +5419,7 @@ _int_memalign(mstate av, size_t alignment, size_t bytes)
   ------------------------------ calloc ------------------------------
 */
 
-#if __STD_C
 Void_t* cALLOc(size_t n_elements, size_t elem_size)
-#else
-Void_t* cALLOc(n_elements, elem_size) size_t n_elements; size_t elem_size;
-#endif
 {
   mchunkptr p;
   unsigned long clearsize;
@@ -5652,12 +5478,7 @@ Void_t* cALLOc(n_elements, elem_size) size_t n_elements; size_t elem_size;
 */
 
 Void_t**
-#if __STD_C
 _int_icalloc(mstate av, size_t n_elements, size_t elem_size, Void_t* chunks[])
-#else
-_int_icalloc(av, n_elements, elem_size, chunks)
-mstate av; size_t n_elements; size_t elem_size; Void_t* chunks[];
-#endif
 {
   size_t sz = elem_size; /* serves as 1-element array */
   /* opts arg of 3 means all elements are same size, and should be cleared */
@@ -5669,12 +5490,7 @@ mstate av; size_t n_elements; size_t elem_size; Void_t* chunks[];
 */
 
 Void_t**
-#if __STD_C
 _int_icomalloc(mstate av, size_t n_elements, size_t sizes[], Void_t* chunks[])
-#else
-_int_icomalloc(av, n_elements, sizes, chunks)
-mstate av; size_t n_elements; size_t sizes[]; Void_t* chunks[];
-#endif
 {
   return iALLOc(av, n_elements, sizes, 0, chunks);
 }
@@ -5692,12 +5508,7 @@ mstate av; size_t n_elements; size_t sizes[]; Void_t* chunks[];
 
 
 static Void_t**
-#if __STD_C
 iALLOc(mstate av, size_t n_elements, size_t* sizes, int opts, Void_t* chunks[])
-#else
-iALLOc(av, n_elements, sizes, opts, chunks)
-mstate av; size_t n_elements; size_t* sizes; int opts; Void_t* chunks[];
-#endif
 {
   INTERNAL_SIZE_T element_size;   /* chunksize of each element, if all same */
   INTERNAL_SIZE_T contents_size;  /* total size of elements */
@@ -5818,11 +5629,7 @@ mstate av; size_t n_elements; size_t* sizes; int opts; Void_t* chunks[];
 */
 
 static Void_t*
-#if __STD_C
 _int_valloc(mstate av, size_t bytes)
-#else
-_int_valloc(av, bytes) mstate av; size_t bytes;
-#endif
 {
   /* Ensure initialization/consolidation */
   if (have_fastchunks(av)) malloc_consolidate(av);
@@ -5835,11 +5642,7 @@ _int_valloc(av, bytes) mstate av; size_t bytes;
 
 
 static Void_t*
-#if __STD_C
 _int_pvalloc(mstate av, size_t bytes)
-#else
-_int_pvalloc(av, bytes) mstate av, size_t bytes;
-#endif
 {
   size_t pagesz;
 
@@ -5854,11 +5657,7 @@ _int_pvalloc(av, bytes) mstate av, size_t bytes;
   ------------------------------ malloc_trim ------------------------------
 */
 
-#if __STD_C
 static int mTRIm(mstate av, size_t pad)
-#else
-static int mTRIm(av, pad) mstate av; size_t pad;
-#endif
 {
   /* Ensure initialization/consolidation */
   malloc_consolidate (av);
@@ -5917,11 +5716,7 @@ static int mTRIm(av, pad) mstate av; size_t pad;
   ------------------------- malloc_usable_size -------------------------
 */
 
-#if __STD_C
 size_t mUSABLe(Void_t* mem)
-#else
-size_t mUSABLe(mem) Void_t* mem;
-#endif
 {
   mchunkptr p;
   if (mem != 0) {
@@ -6069,11 +5864,7 @@ void mSTATs()
   ------------------------------ mallopt ------------------------------
 */
 
-#if __STD_C
 int mALLOPt(int param_number, int value)
-#else
-int mALLOPt(param_number, value) int param_number; int value;
-#endif
 {
   mstate av = &main_arena;
   int res = 1;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: destroy thread cache on thread exit
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (39 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -UPER_THREAD -U_LIBC Joern Engel
@ 2016-01-26  0:27 ` Joern Engel
  2016-01-26  0:27 ` [PATCH] malloc: prefetch for tcache_malloc Joern Engel
                   ` (23 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:27 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

I don't think this matters much - current glibc malloc actually leaks
memory every time a thread exits.  But for correctness' sake.
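
The mechanism relied on here is the destructor argument of
tsd_key_create() (pthread_key_create() underneath): at thread exit the
destructor runs once for every key whose value is non-NULL in that
thread.  A standalone illustration with toy names, not the patch
itself:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t cache_key;

static void cache_destroy(void *cache)
{
	/* Called automatically when the owning thread exits; this is
	 * the slot tcache_destroy is plugged into. */
	printf("freeing cache %p\n", cache);
	free(cache);
}

static void *worker(void *arg)
{
	pthread_setspecific(cache_key, malloc(128));
	return NULL;	/* thread exit triggers cache_destroy() */
}

int main(void)
{
	pthread_t t;

	pthread_key_create(&cache_key, cache_destroy);
	pthread_create(&t, NULL, worker, NULL);
	pthread_join(t, NULL);
	return 0;
}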

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  |  2 +-
 tpc/malloc2.13/malloc.c |  1 +
 tpc/malloc2.13/tcache.h | 15 +++++++++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 599774ea1300..9236a3231f07 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -363,7 +363,7 @@ static void ptmalloc_init(void)
 	}
 
 	mutex_init(&list_lock);
-	tsd_key_create(&cache_key, NULL);
+	tsd_key_create(&cache_key, tcache_destroy);
 	tsd_key_create(&arena_key, NULL);
 	tsd_setspecific(arena_key, (Void_t *) & main_arena);
 	thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 022b9a4ce712..94b7e223ec6f 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2209,6 +2209,7 @@ static Void_t*  sYSMALLOc(INTERNAL_SIZE_T, struct malloc_state *);
 static int      sYSTRIm(size_t, struct malloc_state *);
 static void     malloc_consolidate(struct malloc_state *);
 static Void_t** iALLOc(struct malloc_state *, size_t, size_t*, int, Void_t**);
+static void tcache_destroy(void *_cache);
 
 
 /* -------------- Early definitions for debugging hooks ---------------- */
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 1d324526194f..628dbc00256a 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -202,6 +202,21 @@ static void tcache_gc(struct thread_cache *cache)
 	}
 }
 
+static void tcache_destroy(void *_cache)
+{
+	struct thread_cache *cache = _cache;
+
+	/*
+	 * tcache_gc almost does what we want.  It tries hard to only
+	 * free part of the cache, but call it often enough and it
+	 * will free everything.  Probably slightly slower than having
+	 * a specialized copy of the same code, but also much simpler.
+	 */
+	while (cache->tc_count)
+		tcache_gc(cache);
+	public_fREe(cache);
+}
+
 static void add_to_bin(struct malloc_chunk **bin, struct malloc_chunk *p)
 {
 	struct malloc_chunk *old;
-- 
2.7.0.rc3


* [PATCH] malloc: plug thread-cache memory leak
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (44 preceding siblings ...)
  2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -DUSE_ARENAS -DHAVE_MMAP Joern Engel
@ 2016-01-26  0:28 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: create aliases for malloc, free, Joern Engel
                   ` (18 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:28 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Freeing the cache into itself is a feat worthy of Münchhausen.  We
only leaked a little memory, and only when destroying threads.  But low
impact is no excuse for silly bugs.

JIRA: PURE-27597
---
 tpc/malloc2.13/tcache.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index ece03fc464cd..b7bdb9ad41e4 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -226,6 +226,7 @@ static void tcache_destroy(void *_cache)
 {
 	struct thread_cache *cache = _cache;
 
+	mutex_lock(&cache->mutex);
 	/*
 	 * tcache_gc almost does what we want.  It tries hard to only
 	 * free part of the cache, but call it often enough and it
@@ -234,6 +235,15 @@ static void tcache_destroy(void *_cache)
 	 */
 	while (cache->tc_count)
 		tcache_gc(cache);
+	/*
+	 * public_free will call tcache_free, which could free the
+	 * cache into... the cache.  Result would be a memory leak.
+	 * But since we hold the cache-mutex, it cannot and has to
+	 * free the memory back to the arena.
+	 * A bit of a hack, but creates smaller code than having a
+	 * second copy of uncached_free.  And this is by no means a
+	 * performance-critical path.
+	 */
 	public_fREe(cache);
 }
 
-- 
2.7.0.rc3


* [PATCH] malloc: remove get_backup_arena() from tcache_malloc()
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (50 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: use bitmap to conserve hot bins Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: define __libc_memalign Joern Engel
                   ` (12 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

get_backup_arena() appears useless on our systems anyway and doubly
useless in tcache_malloc().  If we cannot allocate a cache, we continue
without one and try again next time.  Saves 48 bytes of text.

JIRA: PURE-35526
---
 tpc/malloc2.13/tcache.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index b7bdb9ad41e4..9e210a973d10 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -273,10 +273,6 @@ static void *tcache_malloc(size_t size)
 	if (!cache) {
 		arena = arena_get(sizeof(*cache));
 		cache = _int_malloc(arena, sizeof(*cache));
-		if (!cache) {
-			arena = get_backup_arena(arena, sizeof(*cache));
-			cache = _int_malloc(arena, sizeof(*cache));
-		}
 		arena_unlock(arena);
 		if (!cache)
 			return NULL;
-- 
2.7.0.rc3


* [PATCH] malloc: remove stale condition
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (48 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: remove tcache prefetching Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: use bitmap to conserve hot bins Joern Engel
                   ` (14 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

It is no longer possible to get there with arena==NULL.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 86e77ffe57f6..3e6107fbc4a4 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -817,7 +817,7 @@ static struct malloc_state *arena_get(size_t size)
 	if (!arena || arena->numa_node != node)
 		arena = numa_arena[node];
 
-	if (arena && !mutex_trylock(&arena->mutex)) {
+	if (!mutex_trylock(&arena->mutex)) {
 		THREAD_STAT(++(arena->stat_lock_direct));
 	} else
 		arena = arena_get2(arena, size);
-- 
2.7.0.rc3


* [PATCH] malloc: remove hooks from malloc() and free()
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (55 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: brain-dead thread cache Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: speed up mmap Joern Engel
                   ` (7 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Apparently these hooks are not needed on our systems.  They cost a
measurable amount of performance when present.  And most importantly, they
cause failures, including memory corruptions, in stress tests.

A much larger change that we should do eventually is removing all hooks
throughout malloc.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 2427fae19b39..c4e3fbada60a 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3290,10 +3290,6 @@ Void_t *public_mALLOc(size_t bytes)
 	struct malloc_state *ar_ptr;
 	Void_t *victim;
 
-	__malloc_ptr_t(*hook) (size_t, __const __malloc_ptr_t) = force_reg(dlmalloc_hook);
-	if (hook != NULL)
-		return (*hook) (bytes, RETURN_ADDRESS(0));
-
 	victim = tcache_malloc(bytes);
 	if (victim)
 		return victim;
@@ -3315,12 +3311,6 @@ void public_fREe(Void_t * mem)
 {
 	mchunkptr p;		/* chunk corresponding to mem */
 
-	void (*hook) (__malloc_ptr_t, __const __malloc_ptr_t) = force_reg(dlfree_hook);
-	if (hook != NULL) {
-		(*hook) (mem, RETURN_ADDRESS(0));
-		return;
-	}
-
 	if (mem == 0)		/* free(0) has no effect */
 		return;
 
-- 
2.7.0.rc3


* [PATCH] malloc: speed up mmap
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (56 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: remove hooks from malloc() and free() Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: fix calculation of aligned heaps Joern Engel
                   ` (6 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

MAP_POPULATE might actually slow down the mmap proper, but it will speed
up all subsequent operations, including the memset().  And shrinking the
memset to the size of the malloc headers - one for the heap and one
for the arena - should be somewhat faster than clearing all 64MB.  Even
at 10GB/s we save about 6ms.
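
A rough sketch of the MAP_POPULATE part (illustrative only; the real
flags and sizes are in mmap_for_heap() below):

	#include <sys/mman.h>

	static void *map_prefaulted(size_t len)
	{
		/* Pre-fault the whole mapping up front so later accesses,
		   including the small header memset, take no page faults. */
		return mmap(NULL, len, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	}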

https://codereviews.purestorage.com/r/26569/
JIRA: PURE-46853
---
 tpc/malloc2.13/arena.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index e53076d213b2..c959b77b9e9d 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -379,7 +379,7 @@ static char *aligned_heap_area;
 static void *mmap_for_heap(void *addr, size_t length, int *must_clear)
 {
 	int prot = PROT_READ | PROT_WRITE;
-	int flags = MAP_PRIVATE;
+	int flags = MAP_PRIVATE | MAP_POPULATE;
 	void *ret;
 
 	ret = MMAP(addr, length, prot, flags | MAP_HUGETLB);
@@ -462,7 +462,7 @@ static heap_info *new_heap(size_t size, size_t top_pad, int numa_node)
 	}
 	mbind_memory(p2, HEAP_MAX_SIZE, numa_node);
 	if (must_clear)
-		memset(p2, 0, HEAP_MAX_SIZE);
+		memset(p2, 0, sizeof(heap_info) + sizeof(struct malloc_state));
 	h = (heap_info *) p2;
 	h->size = size;
 	h->mprotect_size = size;
-- 
2.7.0.rc3


* [PATCH] malloc: create aliases for malloc, free,...
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (45 preceding siblings ...)
  2016-01-26  0:28 ` [PATCH] malloc: plug thread-cache memory leak Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: fix mbind on old kernels Joern Engel
                   ` (17 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Our version explicitly uses the dl prefix, so dlmalloc, dlfree, etc.
But if you want to LD_PRELOAD our version to replace libc malloc, you
need actual malloc, free, etc. to be defined.

And while at it, remove the old broken aliasing under !PURE_HACK.
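
The aliasing itself is just the standard GCC alias attribute; a toy,
self-contained sketch (names here are examples, the real aliases are in
the diff below):

	#include <stddef.h>

	#define strong_alias(name, aliasname) \
		extern __typeof(name) aliasname __attribute__((alias(#name)))

	/* Stand-in for a dl-prefixed entry point (real body elsewhere). */
	void *dlmalloc_example(size_t size) { (void)size; return NULL; }

	/* "malloc_example" becomes a second name for the same function. */
	strong_alias(dlmalloc_example, malloc_example);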

JIRA: PURE-42344
https://codereviews.purestorage.com/r/23673/
---
 tpc/malloc2.13/malloc.c | 52 +++++++++++++++++++++----------------------------
 1 file changed, 22 insertions(+), 30 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 439c1247fe99..2e3067344ad2 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -1102,14 +1102,12 @@ Void_t*  public_gET_STATe(void);
 */
 int      public_sET_STATe(Void_t*);
 
-#if defined(PURE_HACK)
 /*
   posix_memalign(void **memptr, size_t alignment, size_t size);
 
   POSIX wrapper like memalign(), checking for validity of size.
 */
 int      dlposix_memalign(void **, size_t, size_t);
-#endif
 
 /* mallopt tuning options */
 
@@ -5393,8 +5391,7 @@ malloc_printerr(int action, const char *str, void *ptr)
 }
 #endif
 
-#if defined(PURE_HACK)
-# include <sys/param.h>
+#include <sys/param.h>
 
 /* We need a wrapper function for one of the additions of POSIX.  */
 int
@@ -5427,10 +5424,6 @@ dlposix_memalign (void **memptr, size_t alignment, size_t size)
   return ENOMEM;
 }
 
-#ifndef PURE_HACK
-weak_alias (__posix_memalign, posix_memalign)
-#endif
-
 int
 dlmalloc_info (int options, FILE *fp)
 {
@@ -5614,28 +5607,27 @@ dlmalloc_info (int options, FILE *fp)
   return 0;
 }
 
-#ifndef PURE_HACK
-strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc)
-strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree)
-strong_alias (__libc_free, __free) strong_alias (__libc_free, free)
-strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc)
-strong_alias (__libc_memalign, __memalign)
-weak_alias (__libc_memalign, memalign)
-strong_alias (__libc_realloc, __realloc) strong_alias (__libc_realloc, realloc)
-strong_alias (__libc_valloc, __valloc) weak_alias (__libc_valloc, valloc)
-strong_alias (__libc_pvalloc, __pvalloc) weak_alias (__libc_pvalloc, pvalloc)
-strong_alias (__libc_mallinfo, __mallinfo)
-weak_alias (__libc_mallinfo, mallinfo)
-strong_alias (__libc_mallopt, __mallopt) weak_alias (__libc_mallopt, mallopt)
-
-weak_alias (__malloc_stats, malloc_stats)
-weak_alias (__malloc_usable_size, malloc_usable_size)
-weak_alias (__malloc_trim, malloc_trim)
-weak_alias (__malloc_get_state, malloc_get_state)
-weak_alias (__malloc_set_state, malloc_set_state)
-#endif
-
-#endif /* _LIBC */
+# define strong_alias(name, aliasname) \
+	extern __typeof(name) aliasname __attribute__((alias(#name)))
+
+strong_alias(dlcalloc, calloc);
+strong_alias(dlfree, free);
+strong_alias(dlcfree, cfree);
+strong_alias(dlmalloc, malloc);
+strong_alias(dlmemalign, memalign);
+strong_alias(dlrealloc, realloc);
+strong_alias(dlvalloc, valloc);
+strong_alias(dlpvalloc, pvalloc);
+strong_alias(dlmallinfo, mallinfo);
+strong_alias(dlmallopt, mallopt);
+strong_alias(dlmalloc_trim, malloc_trim);
+strong_alias(dlmalloc_stats, malloc_stats);
+strong_alias(dlmalloc_usable_size, malloc_usable_size);
+strong_alias(dlindependent_calloc, independent_calloc);
+strong_alias(dlindependent_comalloc, independent_comalloc);
+strong_alias(dlget_state, get_state);
+strong_alias(dlset_state, set_state);
+strong_alias(dlposix_memalign, posix_memalign);
 
 /* ------------------------------------------------------------
 History:
-- 
2.7.0.rc3


* [PATCH] malloc: remove atfork hooks
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (58 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: fix calculation of aligned heaps Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: remove all remaining hooks Joern Engel
                   ` (4 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

They are no longer necessary or even checked.  More importantly, this
also no longer sets the thread-local arena-pointer to -1, which
triggered a segfault.

JIRA: PURE-42344
https://codereviews.purestorage.com/r/23706/
---
 tpc/malloc2.13/arena.h  | 129 +++++-------------------------------------------
 tpc/malloc2.13/hooks.h  |  48 ------------------
 tpc/malloc2.13/malloc.c |  13 -----
 3 files changed, 12 insertions(+), 178 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 587bad51dd63..2205c52da8f1 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -105,83 +105,9 @@ static int __malloc_initialized = -1;
 
 /* atfork support.  */
 
-static __malloc_ptr_t (*save_malloc_hook) (size_t __size,
-					   __const __malloc_ptr_t);
-static __malloc_ptr_t (*save_memalign_hook) (size_t __align, size_t __size,
-					     __const __malloc_ptr_t);
-static void           (*save_free_hook) (__malloc_ptr_t __ptr,
-					 __const __malloc_ptr_t);
-static Void_t*        save_arena;
-
-#ifdef ATFORK_MEM
-ATFORK_MEM;
-#endif
-
-/* Magic value for the thread-specific arena pointer when
-   malloc_atfork() is in use.  */
-
-#define ATFORK_ARENA_PTR ((Void_t*)-1)
-
-/* The following hooks are used while the `atfork' handling mechanism
-   is active. */
-
-static Void_t*
-malloc_atfork(size_t sz, const Void_t *caller)
-{
-  Void_t *vptr = NULL;
-  Void_t *victim;
-
-  tsd_getspecific(arena_key, vptr);
-  if(vptr == ATFORK_ARENA_PTR) {
-    /* We are the only thread that may allocate at all.  */
-    if(save_malloc_hook != malloc_check) {
-      return _int_malloc(&main_arena, sz);
-    } else {
-      if(top_check()<0)
-	return 0;
-      victim = _int_malloc(&main_arena, sz+1);
-      return mem2mem_check(victim, sz);
-    }
-  } else {
-    /* Suspend the thread until the `atfork' handlers have completed.
-       By that time, the hooks will have been reset as well, so that
-       mALLOc() can be used again. */
-    (void)mutex_lock(&list_lock);
-    (void)mutex_unlock(&list_lock);
-    return public_mALLOc(sz);
-  }
-}
-
-static void
-free_atfork(Void_t* mem, const Void_t *caller)
-{
-  Void_t *vptr = NULL;
-  struct malloc_state * ar_ptr;
-  mchunkptr p;                          /* chunk corresponding to mem */
-
-  if (mem == 0)                              /* free(0) has no effect */
-    return;
-
-  p = mem2chunk(mem);         /* do not bother to replicate free_check here */
-
-  if (chunk_is_mmapped(p))                       /* release mmapped memory. */
-  {
-    munmap_chunk(p);
-    return;
-  }
-
-  ar_ptr = arena_for_chunk(p);
-  tsd_getspecific(arena_key, vptr);
-  if(vptr != ATFORK_ARENA_PTR)
-    (void)mutex_lock(&ar_ptr->mutex);
-  _int_free(ar_ptr, p);
-  if(vptr != ATFORK_ARENA_PTR)
-    (void)mutex_unlock(&ar_ptr->mutex);
-}
-
-
 /* Counter for number of times the list is locked by the same thread.  */
 static unsigned int atfork_recursive_cntr;
+static unsigned int atfork_recursive_thread;
 
 /* The following two functions are registered via thread_atfork() to
    make sure that the mutexes remain in a consistent state in the
@@ -189,6 +115,12 @@ static unsigned int atfork_recursive_cntr;
    temporarily, because the `atfork' handler mechanism may use
    malloc/free internally (e.g. in LinuxThreads). */
 
+#include <sys/syscall.h>
+static inline pid_t gettid(void)
+{
+	return syscall(SYS_gettid);
+}
+
 static void
 ptmalloc_lock_all (void)
 {
@@ -198,9 +130,7 @@ ptmalloc_lock_all (void)
     return;
   if (mutex_trylock(&list_lock))
     {
-      Void_t *my_arena;
-      tsd_getspecific(arena_key, my_arena);
-      if (my_arena == ATFORK_ARENA_PTR)
+      if (atfork_recursive_thread == gettid())
 	/* This is the same thread which already locks the global list.
 	   Just bump the counter.  */
 	goto out;
@@ -213,13 +143,7 @@ ptmalloc_lock_all (void)
     ar_ptr = ar_ptr->next;
     if(ar_ptr == &main_arena) break;
   }
-  save_malloc_hook = dlmalloc_hook;
-  save_free_hook = dlfree_hook;
-  dlmalloc_hook = malloc_atfork;
-  dlfree_hook = free_atfork;
-  /* Only the current thread may perform malloc/free calls now. */
-  tsd_getspecific(arena_key, save_arena);
-  tsd_setspecific(arena_key, ATFORK_ARENA_PTR);
+  atfork_recursive_thread = gettid();
  out:
   ++atfork_recursive_cntr;
 }
@@ -233,9 +157,7 @@ ptmalloc_unlock_all (void)
     return;
   if (--atfork_recursive_cntr != 0)
     return;
-  tsd_setspecific(arena_key, save_arena);
-  dlmalloc_hook = save_malloc_hook;
-  dlfree_hook = save_free_hook;
+  atfork_recursive_thread = 0;
   for(ar_ptr = &main_arena;;) {
     (void)mutex_unlock(&ar_ptr->mutex);
     ar_ptr = ar_ptr->next;
@@ -259,7 +181,7 @@ ptmalloc_unlock_all2 (void)
   if(__malloc_initialized < 1)
     return;
 #if defined MALLOC_HOOKS
-  tsd_setspecific(arena_key, save_arena);
+  atfork_recursive_thread = 0;
   dlmalloc_hook = save_malloc_hook;
   dlfree_hook = save_free_hook;
 #endif
@@ -302,7 +224,6 @@ static int num_nodes = 0;
 #include <sys/types.h>
 #include <dirent.h>
 #include <string.h>
-#include <sys/syscall.h>
 
 /*
  * Wouldn't it be nice to get this with a single syscall instead?
@@ -329,11 +250,6 @@ static int numa_node_count(void)
 	return ret;
 }
 
-static inline pid_t gettid(void)
-{
-	return syscall(SYS_gettid);
-}
-
 static void ptmalloc_init(void)
 {
 	const char *s;
@@ -355,19 +271,6 @@ static void ptmalloc_init(void)
 	}
 	ptmalloc_init_minimal();
 
-#ifndef NO_THREADS
-#ifndef NO_STARTER
-	/* With some threads implementations, creating thread-specific data
-	   or initializing a mutex may call malloc() itself.  Provide a
-	   simple starter version (realloc() won't work). */
-	save_malloc_hook = dlmalloc_hook;
-	save_memalign_hook = dlmemalign_hook;
-	save_free_hook = dlfree_hook;
-	dlmalloc_hook = malloc_starter;
-	dlmemalign_hook = memalign_starter;
-	dlfree_hook = free_starter;
-#endif				/* !defined NO_STARTER */
-#endif				/* !defined NO_THREADS */
 	mutex_init(&main_arena.mutex);
 	main_arena.next = &main_arena;
 	main_arena.local_next = &main_arena;
@@ -391,15 +294,7 @@ static void ptmalloc_init(void)
 	tsd_key_create(&arena_key, NULL);
 	tsd_setspecific(arena_key, (Void_t *) & main_arena);
 	thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
-#ifndef NO_THREADS
-#ifndef NO_STARTER
-	dlmalloc_hook = save_malloc_hook;
-	dlmemalign_hook = save_memalign_hook;
-	dlfree_hook = save_free_hook;
-#else
-#undef NO_STARTER
-#endif
-#endif
+
 	if (!secure) {
 		if ((s = getenv("MALLOC_TRIM_THRESHOLD_")))
 			mALLOPt(M_TRIM_THRESHOLD, atoi(s));
diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
index afc8eeb93a8b..f192080a71bf 100644
--- a/tpc/malloc2.13/hooks.h
+++ b/tpc/malloc2.13/hooks.h
@@ -361,54 +361,6 @@ memalign_check(size_t alignment, size_t bytes, const Void_t *caller)
   return mem2mem_check(mem, bytes);
 }
 
-#ifndef NO_THREADS
-
-
-# ifdef NO_STARTER
-#  undef NO_STARTER
-# else
-
-/* The following hooks are used when the global initialization in
-   ptmalloc_init() hasn't completed yet. */
-
-static Void_t*
-malloc_starter(size_t sz, const Void_t *caller)
-{
-  Void_t* victim;
-
-  victim = _int_malloc(&main_arena, sz);
-
-  return victim ? BOUNDED_N(victim, sz) : 0;
-}
-
-static Void_t*
-memalign_starter(size_t align, size_t sz, const Void_t *caller)
-{
-  Void_t* victim;
-
-  victim = _int_memalign(&main_arena, align, sz);
-
-  return victim ? BOUNDED_N(victim, sz) : 0;
-}
-
-static void
-free_starter(Void_t* mem, const Void_t *caller)
-{
-  mchunkptr p;
-
-  if(!mem) return;
-  p = mem2chunk(mem);
-  if (chunk_is_mmapped(p)) {
-    munmap_chunk(p);
-    return;
-  }
-  _int_free(&main_arena, p);
-}
-
-# endif	/* !defiend NO_STARTER */
-#endif /* NO_THREADS */
-
-
 /* Get/set state: malloc_get_state() records the current state of all
    malloc variables (_except_ for the actual heap contents and `hook'
    function pointers) in a system dependent, opaque data structure.
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 2e3067344ad2..f56321444b76 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -1392,19 +1392,6 @@ static Void_t*   realloc_check(Void_t* oldmem, size_t bytes,
 			       const Void_t *caller);
 static Void_t*   memalign_check(size_t alignment, size_t bytes,
 				const Void_t *caller);
-#ifndef NO_THREADS
-# ifdef NO_STARTER
-#  undef NO_STARTER
-# else
-static Void_t*   malloc_starter(size_t sz, const Void_t *caller);
-static Void_t*   memalign_starter(size_t aln, size_t sz, const Void_t *caller);
-static void      free_starter(Void_t* mem, const Void_t *caller);
-# endif
-static Void_t*   malloc_atfork(size_t sz, const Void_t *caller);
-static void      free_atfork(Void_t* mem, const Void_t *caller);
-#endif
-
-
 
 
 
-- 
2.7.0.rc3


* [PATCH] malloc: move out perturb_byte conditionals
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (52 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: define __libc_memalign Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: rename *.ch to *.h Joern Engel
                   ` (10 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Lots of repetitive code.  Probably not worth an inline function.

JIRA: PURE-27597
---
 tpc/malloc2.13/malloc.c | 41 +++++++++++++++++++----------------------
 tpc/malloc2.13/tcache.h |  9 +++------
 2 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 74b35f6aa366..2427fae19b39 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2264,8 +2264,14 @@ static int check_action = DEFAULT_CHECK_ACTION;
 
 static int perturb_byte;
 
-#define alloc_perturb(p, n) memset (p, (perturb_byte ^ 0xff) & 0xff, n)
-#define free_perturb(p, n) memset (p, perturb_byte & 0xff, n)
+#define alloc_perturb(p, n) do {				\
+	if (perturb_byte)					\
+		memset(p, (perturb_byte ^ 0xff) & 0xff, n);	\
+} while (0)
+#define free_perturb(p, n) do {					\
+	if (perturb_byte)					\
+		memset(p, perturb_byte & 0xff, n);		\
+} while (0)
 
 
 /* ------------------- Support for multiple arenas -------------------- */
@@ -3688,8 +3694,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
       *fb = victim->fd;
       check_remalloced_chunk(av, victim, nb);
       void *p = chunk2mem(victim);
-      if (perturb_byte)
-	alloc_perturb (p, bytes);
+      alloc_perturb(p, bytes);
       return p;
     }
   }
@@ -3724,8 +3729,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	  victim->size |= NON_MAIN_ARENA;
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (perturb_byte)
-	  alloc_perturb (p, bytes);
+	alloc_perturb(p, bytes);
 	return p;
       }
     }
@@ -3804,8 +3808,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (perturb_byte)
-	  alloc_perturb (p, bytes);
+	alloc_perturb(p, bytes);
 	return p;
       }
 
@@ -3821,8 +3824,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	  victim->size |= NON_MAIN_ARENA;
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (perturb_byte)
-	  alloc_perturb (p, bytes);
+	alloc_perturb(p, bytes);
 	return p;
       }
 
@@ -3946,8 +3948,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	}
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (perturb_byte)
-	  alloc_perturb (p, bytes);
+	alloc_perturb(p, bytes);
 	return p;
       }
     }
@@ -4050,8 +4051,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 	}
 	check_malloced_chunk(av, victim, nb);
 	void *p = chunk2mem(victim);
-	if (perturb_byte)
-	  alloc_perturb (p, bytes);
+	alloc_perturb(p, bytes);
 	return p;
       }
     }
@@ -4085,8 +4085,7 @@ _int_malloc(struct malloc_state * av, size_t bytes)
 
       check_malloced_chunk(av, victim, nb);
       void *p = chunk2mem(victim);
-      if (perturb_byte)
-	alloc_perturb (p, bytes);
+      alloc_perturb(p, bytes);
       return p;
     }
 
@@ -4107,8 +4106,8 @@ _int_malloc(struct malloc_state * av, size_t bytes)
     */
     else {
       void *p = sYSMALLOc(nb, av);
-      if (p != NULL && perturb_byte)
-	alloc_perturb (p, bytes);
+      if (p)
+	alloc_perturb(p, bytes);
       return p;
     }
   }
@@ -4181,8 +4180,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
 	  }
       }
 
-    if (perturb_byte)
-      free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);
+    free_perturb(chunk2mem(p), size - 2 * SIZE_SZ);
 
     set_fastchunks(av);
     unsigned int idx = fastbin_index(size);
@@ -4244,8 +4242,7 @@ _int_free(struct malloc_state * av, mchunkptr p)
 	goto errout;
       }
 
-    if (perturb_byte)
-      free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);
+    free_perturb(chunk2mem(p), size - 2 * SIZE_SZ);
 
     /* consolidate backward */
     if (!prev_inuse(p)) {
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 00fe24249d49..e267ce905ed0 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -287,8 +287,7 @@ static void *tcache_malloc(size_t size)
 		void *p = chunk2mem(victim);
 		cache->tc_size -= chunksize(victim);
 		cache->tc_count--;
-		if (perturb_byte)
-			alloc_perturb(p, size);
+		alloc_perturb(p, size);
 		return p;
 	}
 
@@ -335,8 +334,7 @@ static void *tcache_malloc(size_t size)
 	}
 	arena_unlock(arena);
 	assert(!victim || arena == arena_for_chunk(mem2chunk(victim)));
-	if (perturb_byte)
-		alloc_perturb(victim, size);
+	alloc_perturb(victim, size);
 	return victim;
 }
 
@@ -368,8 +366,7 @@ static void tcache_free(mchunkptr p)
 		malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p));
 		return;
 	}
-	if (perturb_byte)
-		free_perturb(p, size - 2 * SIZE_SZ);
+	free_perturb(p, size - 2 * SIZE_SZ);
 	add_to_bin(bin, p);
 	if (cache->tc_size > CACHE_SIZE)
 		tcache_gc(cache);
-- 
2.7.0.rc3


* [PATCH] malloc: Don't call tsd_setspecific before tsd_key_create
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (60 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: remove all remaining hooks Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: allow recursion from ptmalloc_init to malloc Joern Engel
                   ` (2 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Roland Dreier

From: Roland Dreier <roland@purestorage.com>

Commit 2c9effe0e1e6 ("malloc: initial numa support") added a call from
ptmalloc_init() to "tsd_setspecific(arena_key, ..." (via
_int_new_arena()) before "tsd_key_create(&arena_key, ...". This leads to
corruption of someone else's thread-local storage.

Fix this by moving the tsd_key_create() calls earlier.

Found while trying to run gcc AddressSanitizer.
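
The underlying rule, sketched with plain pthreads (illustrative names,
not the tsd_* wrappers used here):

	#include <pthread.h>

	static pthread_key_t key;

	static void init_tsd(void *value)
	{
		/* Create the key before the first setspecific.  Calling
		   pthread_setspecific() on a key that was never created
		   reads an indeterminate key value and can clobber
		   thread-local storage belonging to another subsystem. */
		pthread_key_create(&key, NULL);
		pthread_setspecific(key, value);
	}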

JIRA: PURE-43504
https://codereviews.purestorage.com/r/23907/
Reviewed-by: Joern
---
 tpc/malloc2.13/arena.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index fdb029e89669..e53076d213b2 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -272,6 +272,10 @@ static void ptmalloc_init(void)
 	main_arena.local_next = &main_arena;
 	main_arena.numa_node = -1;
 
+	mutex_init(&list_lock);
+	tsd_key_create(&cache_key, tcache_destroy);
+	tsd_key_create(&arena_key, NULL);
+
 	/* numa_node_count() can recurse into malloc().  Use main_arena
 	   for all numa nodes and set init_tid to allow recursion. */
 	for (i = 0; i < MAX_NUMA_NODES; i++) {
@@ -285,9 +289,6 @@ static void ptmalloc_init(void)
 		(void)mutex_unlock(&numa_arena[i]->mutex);
 	}
 
-	mutex_init(&list_lock);
-	tsd_key_create(&cache_key, tcache_destroy);
-	tsd_key_create(&arena_key, NULL);
 	tsd_setspecific(arena_key, (Void_t *) & main_arena);
 	thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
 
-- 
2.7.0.rc3


* [PATCH] malloc: brain-dead thread cache
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (54 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: rename *.ch to *.h Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: remove hooks from malloc() and free() Joern Engel
                   ` (8 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

No prefetching yet, cache_gc takes a lock for every single object,
entire cache gets flushed on gc,...

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  |   2 +
 tpc/malloc2.13/malloc.c |  11 +++
 tpc/malloc2.13/tcache.h | 183 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)
 create mode 100644 tpc/malloc2.13/tcache.h

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 0804ecfe3a26..d66b4e7029a2 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -72,6 +72,7 @@ extern int sanity_check_heap_info_alignment[(sizeof (heap_info)
 
 /* Thread specific data */
 
+static tsd_key_t cache_key;
 static tsd_key_t arena_key;
 static mutex_t list_lock;
 
@@ -362,6 +363,7 @@ static void ptmalloc_init(void)
 	}
 
 	mutex_init(&list_lock);
+	tsd_key_create(&cache_key, NULL);
 	tsd_key_create(&arena_key, NULL);
 	tsd_setspecific(arena_key, (Void_t *) & main_arena);
 	thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index dca97ef553c0..40f6aa578c6f 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -3254,6 +3254,8 @@ static struct malloc_state *get_backup_arena(struct malloc_state *arena, size_t
 
 /*------------------------ Public wrappers. --------------------------------*/
 
+#include "tcache.h"
+
 Void_t *public_mALLOc(size_t bytes)
 {
 	struct malloc_state *ar_ptr;
@@ -3263,6 +3265,10 @@ Void_t *public_mALLOc(size_t bytes)
 	if (hook != NULL)
 		return (*hook) (bytes, RETURN_ADDRESS(0));
 
+	victim = tcache_malloc(bytes);
+	if (victim)
+		return victim;
+
 	ar_ptr = arena_get(bytes);
 	if (!ar_ptr)
 		return 0;
@@ -3297,6 +3303,11 @@ void public_fREe(Void_t * mem)
 		return;
 	}
 
+	if (tcache_free(p)) {
+		/* Object could be freed on fast path */
+		return;
+	}
+
 	ar_ptr = arena_for_chunk(p);
 	arena_lock(ar_ptr);
 	_int_free(ar_ptr, p);
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
new file mode 100644
index 000000000000..62d58cc77475
--- /dev/null
+++ b/tpc/malloc2.13/tcache.h
@@ -0,0 +1,183 @@
+/*
+ * Thread-cache for malloc.
+ */
+
+#if (defined(__i386__) || defined(__amd64__) || defined(__x86_64__))
+static inline int fls(int x)
+{
+	int r;
+
+	asm("bsrl %1,%0\n\t"
+	"jnz 1f\n\t"
+	"movl $-1,%0\n"
+	"1:" : "=r" (r) : "rm" (x));
+	return r + 1;
+}
+#else
+#error Must define fls()
+#endif
+
+/*
+ * Per-thread cache is supposed to reduce lock contention on arenas.
+ * When doing allocations we prefetch several identical objects and
+ * can return the surplus on future allocations without going to an
+ * arena.  On free we keep the freed object in hope of reusing it in
+ * future allocations.
+ */
+#define CACHE_SIZE		(1 << 16)
+#define MAX_CACHED_SIZE		(CACHE_SIZE >> 3)
+#define MAX_PREFETCH_SIZE	(CACHE_SIZE >> 6)
+#define NO_PREFECT		(1 << 3)
+
+/*
+ * Binning is done as a subdivided buddy allocator.  A buddy allocator
+ * has one bin per power of two.  We use 16 bins per power of two,
+ * yielding a worst-case fragmentation of 6%.
+ *
+ * Worst-case fragmentation is nearly impossible to reach.  In bad
+ * real-world workloads there is a single dominant allocation size,
+ * causing most objects to be created with a perfect size.
+ */
+#define ALIGN_BITS	(4)
+#define ALIGNMENT	(1 << ALIGN_BITS)
+#define ALIGN_DOWN(x)   ((x) & ~(ALIGNMENT - 1))
+#define ALIGN(x)	(((x) + (ALIGNMENT - 1)) & ~(ALIGNMENT - 1))
+#define SUBBIN_BITS	(4)
+#define SUBBINS		(1 << SUBBIN_BITS)
+
+static int cache_bin(int size)
+{
+	int power, subbin;
+
+	size >>= ALIGN_BITS;
+	if (size < SUBBINS)
+		return size;
+	power = fls(size);
+	subbin = size >> (power - SUBBIN_BITS - 1);
+	return SUBBINS * (power - SUBBIN_BITS - 1) + subbin;
+}
+
+#define CACHE_NO_BINS 97
+
+struct thread_cache {
+	/* Bytes in cache */
+	uint32_t tc_size;
+
+	/* Objects in cache */
+	uint32_t tc_count;
+
+	struct malloc_chunk *tc_bin[CACHE_NO_BINS];
+};
+
+/*
+ * XXX Completely unoptimized - lots of low-hanging fruit
+ */
+static void cache_gc(struct thread_cache *cache)
+{
+	struct malloc_chunk *victim, *next;
+	struct malloc_state *arena;
+	int i;
+
+	for (i = 0; i < CACHE_NO_BINS; i++) {
+		victim = cache->tc_bin[i];
+		if (!victim)
+			continue;
+		cache->tc_bin[i] = NULL;
+		while (victim) {
+			next = victim->fd;
+			cache->tc_size -= chunksize(victim);
+			cache->tc_count--;
+			arena = arena_for_chunk(victim);
+			mutex_lock(&arena->mutex);
+			_int_free(arena, victim);
+			mutex_unlock(&arena->mutex);
+			victim = next;
+		}
+	}
+	assert(cache->tc_size == 0);
+	assert(cache->tc_count == 0);
+}
+
+static void *tcache_malloc(size_t size)
+{
+	struct thread_cache *cache;
+	struct malloc_state *arena;
+	struct malloc_chunk **bin, *victim;
+	size_t nb;
+	int bin_no;
+
+	checked_request2size(size, nb);
+	if (nb > MAX_CACHED_SIZE)
+		return NULL;
+
+	tsd_getspecific(cache_key, cache);
+	if (!cache) {
+		arena = arena_get(sizeof(*cache));
+		cache = _int_malloc(arena, sizeof(*cache));
+		arena_unlock(arena);
+		if (!cache)
+			return NULL;
+		memset(cache, 0, sizeof(*cache));
+		tsd_setspecific(cache_key, cache);
+	}
+
+	bin_no = cache_bin(nb);
+	assert(bin_no < CACHE_NO_BINS);
+
+	bin = &cache->tc_bin[bin_no];
+	victim = *bin;
+	if (victim) {
+		if (chunksize(victim) < nb)
+			return NULL;
+		if (cache_bin(chunksize(*bin)) != bin_no) {
+			malloc_printerr(check_action, "invalid tcache entry", victim);
+			return NULL;
+		}
+		*bin = victim->fd;
+		void *p = chunk2mem(victim);
+		if (perturb_byte)
+			alloc_perturb(p, size);
+		cache->tc_size -= chunksize(victim);
+		cache->tc_count--;
+		return p;
+	}
+	/* TODO: prefetch objects */
+	return NULL;
+}
+
+/*
+ * returns 1 if object was freed
+ */
+static int tcache_free(mchunkptr p)
+{
+	struct thread_cache *cache;
+	struct malloc_chunk **bin;
+	size_t size;
+	int bin_no;
+
+	tsd_getspecific(cache_key, cache);
+	if (!cache)
+		return 0;
+	size = chunksize(p);
+	if (size > MAX_CACHED_SIZE)
+		return 0;
+	bin_no = cache_bin(size);
+	assert(bin_no < CACHE_NO_BINS);
+
+	cache->tc_size += size;
+	cache->tc_count++;
+	bin = &cache->tc_bin[bin_no];
+	if (*bin == p) {
+		malloc_printerr(check_action, "double free or corruption (tcache)", chunk2mem(p));
+		return 0;
+	}
+	if (*bin && cache_bin(chunksize(*bin)) != bin_no) {
+		malloc_printerr(check_action, "invalid tcache entry", chunk2mem(p));
+		return 0;
+	}
+	p->fd = *bin;
+	*bin = p;
+	if (cache->tc_size > CACHE_SIZE)
+		cache_gc(cache);
+	return 1;
+}
-- 
2.7.0.rc3


* [PATCH] malloc: remove all remaining hooks
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (59 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: remove atfork hooks Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: Don't call tsd_setspecific before tsd_key_create Joern Engel
                   ` (3 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

They provide debugging and customization support we don't use.  Apart
from making the code unnecessarily complicated, hooks also caused
several bugs in the past.

JIRA: PURE-42344
https://codereviews.purestorage.com/r/23707/
---
 tpc/malloc2.13/arena.h    |   9 --
 tpc/malloc2.13/dlmalloc.h |  16 ---
 tpc/malloc2.13/hooks.h    | 324 +---------------------------------------------
 tpc/malloc2.13/malloc.c   |  81 +-----------
 4 files changed, 3 insertions(+), 427 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 2205c52da8f1..fdb029e89669 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -180,11 +180,7 @@ ptmalloc_unlock_all2 (void)
 
   if(__malloc_initialized < 1)
     return;
-#if defined MALLOC_HOOKS
   atfork_recursive_thread = 0;
-  dlmalloc_hook = save_malloc_hook;
-  dlfree_hook = save_free_hook;
-#endif
   for(ar_ptr = &main_arena;;) {
     mutex_init(&ar_ptr->mutex);
     ar_ptr = ar_ptr->next;
@@ -310,12 +306,7 @@ static void ptmalloc_init(void)
 	s = getenv("MALLOC_CHECK_");
 	if (s && s[0]) {
 		mALLOPt(M_CHECK_ACTION, (int)(s[0] - '0'));
-		if (check_action != 0)
-			dlmalloc_check_init();
 	}
-	void (*hook) (void) = force_reg(dlmalloc_initialize_hook);
-	if (hook != NULL)
-		(*hook) ();
 	__malloc_initialized = 1;
 }
 
diff --git a/tpc/malloc2.13/dlmalloc.h b/tpc/malloc2.13/dlmalloc.h
index 825e9490ba19..9a812e83aad0 100644
--- a/tpc/malloc2.13/dlmalloc.h
+++ b/tpc/malloc2.13/dlmalloc.h
@@ -102,22 +102,6 @@ extern void *dlget_state __DLMALLOC_P ((void));
    malloc_get_state(). */
 extern int dlset_state __DLMALLOC_P ((void *__ptr));
 
-/* Called once when malloc is initialized; redefining this variable in
-   the application provides the preferred way to set up the hook
-   pointers. */
-extern void (*dlmalloc_initialize_hook) __DLMALLOC_PMT ((void));
-/* Hooks for debugging and user-defined versions. */
-extern void (*dlfree_hook) __DLMALLOC_PMT ((void *__ptr,
-					const void *));
-extern void *(*dlmalloc_hook) __DLMALLOC_PMT ((size_t __size,
-					     const void *));
-extern void *(*dlrealloc_hook) __DLMALLOC_PMT ((void *__ptr, size_t __size,
-					      const void *));
-extern void *(*dlmemalign_hook) __DLMALLOC_PMT ((size_t __alignment,
-					       size_t __size,
-					       const void *));
-extern void (*dlafter_morecore_hook) __DLMALLOC_PMT ((void));
-
 /* Activate a standard set of debugging hooks. */
 extern void dlmalloc_check_init __DLMALLOC_P ((void));
 
diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
index f192080a71bf..ebf3d7d5d209 100644
--- a/tpc/malloc2.13/hooks.h
+++ b/tpc/malloc2.13/hooks.h
@@ -25,31 +25,6 @@
 /* Hooks for debugging versions.  The initial hooks just call the
    initialization routine, then do the normal work. */
 
-static Void_t*
-malloc_hook_ini(size_t sz, const __malloc_ptr_t caller)
-{
-  dlmalloc_hook = NULL;
-  ptmalloc_init();
-  return public_mALLOc(sz);
-}
-
-static Void_t*
-realloc_hook_ini(Void_t* ptr, size_t sz, const __malloc_ptr_t caller)
-{
-  dlmalloc_hook = NULL;
-  dlrealloc_hook = NULL;
-  ptmalloc_init();
-  return public_rEALLOc(ptr, sz);
-}
-
-static Void_t*
-memalign_hook_ini(size_t alignment, size_t sz, const __malloc_ptr_t caller)
-{
-  dlmemalign_hook = NULL;
-  ptmalloc_init();
-  return public_mEMALIGn(alignment, sz);
-}
-
 /* Whether we are using malloc checking.  */
 static int using_malloc_checking;
 
@@ -68,21 +43,6 @@ static int using_malloc_checking;
    further malloc checking is safe.  */
 static int disallow_malloc_check;
 
-/* Activate a standard set of debugging hooks. */
-void
-dlmalloc_check_init()
-{
-  if (disallow_malloc_check) {
-    disallow_malloc_check = 0;
-    return;
-  }
-  using_malloc_checking = 1;
-  dlmalloc_hook = malloc_check;
-  dlfree_hook = free_check;
-  dlrealloc_hook = realloc_check;
-  dlmemalign_hook = memalign_check;
-}
-
 /* A simple, standard set of debugging hooks.  Overhead is `only' one
    byte per chunk; still this will catch most cases of double frees or
    overruns.  The goal here is to avoid obscure crashes due to invalid
@@ -90,277 +50,9 @@ dlmalloc_check_init()
 
 #define MAGICBYTE(p) ( ( ((size_t)p >> 3) ^ ((size_t)p >> 11)) & 0xFF )
 
-/* Instrument a chunk with overrun detector byte(s) and convert it
-   into a user pointer with requested size sz. */
-
-static Void_t*
-internal_function
-mem2mem_check(Void_t *ptr, size_t sz)
-{
-  mchunkptr p;
-  unsigned char* m_ptr = (unsigned char*)BOUNDED_N(ptr, sz);
-  size_t i;
-
-  if (!ptr)
-    return ptr;
-  p = mem2chunk(ptr);
-  for(i = chunksize(p) - (chunk_is_mmapped(p) ? 2*SIZE_SZ+1 : SIZE_SZ+1);
-      i > sz;
-      i -= 0xFF) {
-    if(i-sz < 0x100) {
-      m_ptr[i] = (unsigned char)(i-sz);
-      break;
-    }
-    m_ptr[i] = 0xFF;
-  }
-  m_ptr[sz] = MAGICBYTE(p);
-  return (Void_t*)m_ptr;
-}
-
-/* Convert a pointer to be free()d or realloc()ed to a valid chunk
-   pointer.  If the provided pointer is not valid, return NULL. */
-
-static mchunkptr
-internal_function
-mem2chunk_check(Void_t* mem, unsigned char **magic_p)
-{
-  mchunkptr p;
-  INTERNAL_SIZE_T sz, c;
-  unsigned char magic;
-
-  if(!aligned_OK(mem)) return NULL;
-  p = mem2chunk(mem);
-  if (!chunk_is_mmapped(p)) {
-    /* Must be a chunk in conventional heap memory. */
-    int contig = contiguous(&main_arena);
-    sz = chunksize(p);
-    if((contig &&
-	((char*)p<mp_.sbrk_base ||
-	 ((char*)p + sz)>=(mp_.sbrk_base+main_arena.system_mem) )) ||
-       sz<MINSIZE || sz&MALLOC_ALIGN_MASK || !inuse(p) ||
-       ( !prev_inuse(p) && (p->prev_size&MALLOC_ALIGN_MASK ||
-			    (contig && (char*)prev_chunk(p)<mp_.sbrk_base) ||
-			    next_chunk(prev_chunk(p))!=p) ))
-      return NULL;
-    magic = MAGICBYTE(p);
-    for(sz += SIZE_SZ-1; (c = ((unsigned char*)p)[sz]) != magic; sz -= c) {
-      if(c<=0 || sz<(c+2*SIZE_SZ)) return NULL;
-    }
-  } else {
-    unsigned long offset, page_mask = malloc_getpagesize-1;
-
-    /* mmap()ed chunks have MALLOC_ALIGNMENT or higher power-of-two
-       alignment relative to the beginning of a page.  Check this
-       first. */
-    offset = (unsigned long)mem & page_mask;
-    if((offset!=MALLOC_ALIGNMENT && offset!=0 && offset!=0x10 &&
-	offset!=0x20 && offset!=0x40 && offset!=0x80 && offset!=0x100 &&
-	offset!=0x200 && offset!=0x400 && offset!=0x800 && offset!=0x1000 &&
-	offset<0x2000) ||
-       !chunk_is_mmapped(p) || (p->size & PREV_INUSE) ||
-       ( (((unsigned long)p - p->prev_size) & page_mask) != 0 ) ||
-       ( (sz = chunksize(p)), ((p->prev_size + sz) & page_mask) != 0 ) )
-      return NULL;
-    magic = MAGICBYTE(p);
-    for(sz -= 1; (c = ((unsigned char*)p)[sz]) != magic; sz -= c) {
-      if(c<=0 || sz<(c+2*SIZE_SZ)) return NULL;
-    }
-  }
-  ((unsigned char*)p)[sz] ^= 0xFF;
-  if (magic_p)
-    *magic_p = (unsigned char *)p + sz;
-  return p;
-}
-
 /* Check for corruption of the top chunk, and try to recover if
    necessary. */
 
-static int
-internal_function
-top_check(void)
-{
-  mchunkptr t = top(&main_arena);
-  char* brk, * new_brk;
-  INTERNAL_SIZE_T front_misalign, sbrk_size;
-  unsigned long pagesz = malloc_getpagesize;
-
-  if (t == initial_top(&main_arena) ||
-      (!chunk_is_mmapped(t) &&
-       chunksize(t)>=MINSIZE &&
-       prev_inuse(t) &&
-       (!contiguous(&main_arena) ||
-	(char*)t + chunksize(t) == mp_.sbrk_base + main_arena.system_mem)))
-    return 0;
-
-  malloc_printerr (check_action, "malloc: top chunk is corrupt", t);
-
-  /* Try to set up a new top chunk. */
-  brk = MORECORE(0);
-  front_misalign = (unsigned long)chunk2mem(brk) & MALLOC_ALIGN_MASK;
-  if (front_misalign > 0)
-    front_misalign = MALLOC_ALIGNMENT - front_misalign;
-  sbrk_size = front_misalign + mp_.top_pad + MINSIZE;
-  sbrk_size += pagesz - ((unsigned long)(brk + sbrk_size) & (pagesz - 1));
-  new_brk = (char*)(MORECORE (sbrk_size));
-  if (new_brk == (char*)(MORECORE_FAILURE))
-    {
-      MALLOC_FAILURE_ACTION;
-      return -1;
-    }
-  /* Call the `morecore' hook if necessary.  */
-  void (*hook) (void) = force_reg (dlafter_morecore_hook);
-  if (hook)
-    (*hook) ();
-  main_arena.system_mem = (new_brk - mp_.sbrk_base) + sbrk_size;
-
-  top(&main_arena) = (mchunkptr)(brk + front_misalign);
-  set_head(top(&main_arena), (sbrk_size - front_misalign) | PREV_INUSE);
-
-  return 0;
-}
-
-static Void_t*
-malloc_check(size_t sz, const Void_t *caller)
-{
-  Void_t *victim;
-
-  if (sz+1 == 0) {
-    MALLOC_FAILURE_ACTION;
-    return NULL;
-  }
-
-  (void)mutex_lock(&main_arena.mutex);
-  victim = (top_check() >= 0) ? _int_malloc(&main_arena, sz+1) : NULL;
-  (void)mutex_unlock(&main_arena.mutex);
-  return mem2mem_check(victim, sz);
-}
-
-static void
-free_check(Void_t* mem, const Void_t *caller)
-{
-  mchunkptr p;
-
-  if(!mem) return;
-  (void)mutex_lock(&main_arena.mutex);
-  p = mem2chunk_check(mem, NULL);
-  if(!p) {
-    (void)mutex_unlock(&main_arena.mutex);
-
-    malloc_printerr(check_action, "free(): invalid pointer", mem);
-    return;
-  }
-  if (chunk_is_mmapped(p)) {
-    (void)mutex_unlock(&main_arena.mutex);
-    munmap_chunk(p);
-    return;
-  }
-#if 0 /* Erase freed memory. */
-  memset(mem, 0, chunksize(p) - (SIZE_SZ+1));
-#endif
-  _int_free(&main_arena, p);
-  (void)mutex_unlock(&main_arena.mutex);
-}
-
-static Void_t*
-realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
-{
-  INTERNAL_SIZE_T nb;
-  Void_t* newmem = 0;
-  unsigned char *magic_p;
-
-  if (bytes+1 == 0) {
-    MALLOC_FAILURE_ACTION;
-    return NULL;
-  }
-  if (oldmem == 0) return malloc_check(bytes, NULL);
-  if (bytes == 0) {
-    free_check (oldmem, NULL);
-    return NULL;
-  }
-  (void)mutex_lock(&main_arena.mutex);
-  const mchunkptr oldp = mem2chunk_check(oldmem, &magic_p);
-  (void)mutex_unlock(&main_arena.mutex);
-  if(!oldp) {
-    malloc_printerr(check_action, "realloc(): invalid pointer", oldmem);
-    return malloc_check(bytes, NULL);
-  }
-  const INTERNAL_SIZE_T oldsize = chunksize(oldp);
-
-  checked_request2size(bytes+1, nb);
-  (void)mutex_lock(&main_arena.mutex);
-
-  if (chunk_is_mmapped(oldp)) {
-#if HAVE_MREMAP
-    mchunkptr newp = mremap_chunk(oldp, nb);
-    if(newp)
-      newmem = chunk2mem(newp);
-    else
-#endif
-    {
-      /* Note the extra SIZE_SZ overhead. */
-      if(oldsize - SIZE_SZ >= nb)
-	newmem = oldmem; /* do nothing */
-      else {
-	/* Must alloc, copy, free. */
-	if (top_check() >= 0)
-	  newmem = _int_malloc(&main_arena, bytes+1);
-	if (newmem) {
-	  MALLOC_COPY(BOUNDED_N(newmem, bytes+1), oldmem, oldsize - 2*SIZE_SZ);
-	  munmap_chunk(oldp);
-	}
-      }
-    }
-  } else {
-    if (top_check() >= 0) {
-      INTERNAL_SIZE_T nb;
-      checked_request2size(bytes + 1, nb);
-      newmem = _int_realloc(&main_arena, oldp, oldsize, nb);
-    }
-#if 0 /* Erase freed memory. */
-    if(newmem)
-      newp = mem2chunk(newmem);
-    nb = chunksize(newp);
-    if(oldp<newp || oldp>=chunk_at_offset(newp, nb)) {
-      memset((char*)oldmem + 2*sizeof(mbinptr), 0,
-	     oldsize - (2*sizeof(mbinptr)+2*SIZE_SZ+1));
-    } else if(nb > oldsize+SIZE_SZ) {
-      memset((char*)BOUNDED_N(chunk2mem(newp), bytes) + oldsize,
-	     0, nb - (oldsize+SIZE_SZ));
-    }
-#endif
-  }
-
-  /* mem2chunk_check changed the magic byte in the old chunk.
-     If newmem is NULL, then the old chunk will still be used though,
-     so we need to invert that change here.  */
-  if (newmem == NULL) *magic_p ^= 0xFF;
-
-  (void)mutex_unlock(&main_arena.mutex);
-
-  return mem2mem_check(newmem, bytes);
-}
-
-static Void_t*
-memalign_check(size_t alignment, size_t bytes, const Void_t *caller)
-{
-  INTERNAL_SIZE_T nb __attribute__((unused));
-  Void_t* mem;
-
-  if (alignment <= MALLOC_ALIGNMENT) return malloc_check(bytes, NULL);
-  if (alignment <  MINSIZE) alignment = MINSIZE;
-
-  if (bytes+1 == 0) {
-    MALLOC_FAILURE_ACTION;
-    return NULL;
-  }
-  checked_request2size(bytes+1, nb);
-  (void)mutex_lock(&main_arena.mutex);
-  mem = (top_check() >= 0) ? _int_memalign(&main_arena, alignment, bytes+1) :
-    NULL;
-  (void)mutex_unlock(&main_arena.mutex);
-  return mem2mem_check(mem, bytes);
-}
-
 /* Get/set state: malloc_get_state() records the current state of all
    malloc variables (_except_ for the actual heap contents and `hook'
    function pointers) in a system dependent, opaque data structure.
@@ -411,6 +103,7 @@ public_gET_STATe(void)
   int i;
   mbinptr b;
 
+  abort();
   ms = (struct malloc_save_state*)public_mALLOc(sizeof(*ms));
   if (!ms)
     return 0;
@@ -461,6 +154,7 @@ public_sET_STATe(Void_t* msptr)
   size_t i;
   mbinptr b;
 
+  abort();
   disallow_malloc_check = 1;
   ptmalloc_init();
   if(ms->magic != MALLOC_STATE_MAGIC) return -1;
@@ -534,20 +228,6 @@ public_sET_STATe(Void_t* msptr)
   mp_.mmapped_mem = ms->mmapped_mem;
   mp_.max_mmapped_mem = ms->max_mmapped_mem;
   /* add version-dependent code here */
-  if (ms->version >= 1) {
-    /* Check whether it is safe to enable malloc checking, or whether
-       it is necessary to disable it.  */
-    if (ms->using_malloc_checking && !using_malloc_checking &&
-	!disallow_malloc_check)
-      dlmalloc_check_init ();
-    else if (!ms->using_malloc_checking && using_malloc_checking) {
-      dlmalloc_hook = NULL;
-      dlfree_hook = NULL;
-      dlrealloc_hook = NULL;
-      dlmemalign_hook = NULL;
-      using_malloc_checking = 0;
-    }
-  }
   if (ms->version >= 4) {
   }
   check_malloc_state(&main_arena);
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index f56321444b76..50a50949bd56 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -1379,21 +1379,11 @@ static int      mALLOPt(int, int);
 static struct mallinfo2 mALLINFo(struct malloc_state *);
 static void malloc_printerr(int action, const char *str, void *ptr);
 
-static Void_t* internal_function mem2mem_check(Void_t *p, size_t sz);
-static int internal_function top_check(void);
 static void internal_function munmap_chunk(mchunkptr p);
 #if HAVE_MREMAP
 static mchunkptr internal_function mremap_chunk(mchunkptr p, size_t new_size);
 #endif
 
-static Void_t*   malloc_check(size_t sz, const Void_t *caller);
-static void      free_check(Void_t* mem, const Void_t *caller);
-static Void_t*   realloc_check(Void_t* oldmem, size_t bytes,
-			       const Void_t *caller);
-static Void_t*   memalign_check(size_t alignment, size_t bytes,
-				const Void_t *caller);
-
-
 
 /* ------------- Optional versions of memcopy ---------------- */
 
@@ -2206,36 +2196,6 @@ static Void_t** iALLOc(struct malloc_state *, size_t, size_t*, int, Void_t**);
 static void tcache_destroy(void *_cache);
 
 
-/* -------------- Early definitions for debugging hooks ---------------- */
-
-/* Define and initialize the hook variables.  These weak definitions must
-   appear before any use of the variables in a function (arena.c uses one).  */
-#ifndef weak_variable
-#define weak_variable /**/
-#endif
-
-/* Forward declarations.  */
-static Void_t* malloc_hook_ini __MALLOC_P ((size_t sz,
-					    const __malloc_ptr_t caller));
-static Void_t* realloc_hook_ini __MALLOC_P ((Void_t* ptr, size_t sz,
-					     const __malloc_ptr_t caller));
-static Void_t* memalign_hook_ini __MALLOC_P ((size_t alignment, size_t sz,
-					      const __malloc_ptr_t caller));
-
-void weak_variable (*dlmalloc_initialize_hook) (void) = NULL;
-void weak_variable (*dlfree_hook) (__malloc_ptr_t __ptr,
-				   const __malloc_ptr_t) = NULL;
-__malloc_ptr_t weak_variable (*dlmalloc_hook)
-     (size_t __size, const __malloc_ptr_t) = malloc_hook_ini;
-__malloc_ptr_t weak_variable (*dlrealloc_hook)
-     (__malloc_ptr_t __ptr, size_t __size, const __malloc_ptr_t)
-     = realloc_hook_ini;
-__malloc_ptr_t weak_variable (*dlmemalign_hook)
-     (size_t __alignment, size_t __size, const __malloc_ptr_t)
-     = memalign_hook_ini;
-void weak_variable (*dlafter_morecore_hook) (void) = NULL;
-
-
 /* ---------------- Error behavior ------------------------------------ */
 
 #ifndef DEFAULT_CHECK_ACTION
@@ -2855,10 +2815,6 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
     brk = (char*)(MORECORE(size));
 
   if (brk != (char*)(MORECORE_FAILURE)) {
-    /* Call the `morecore' hook if necessary.  */
-    void (*hook) (void) = force_reg (dlafter_morecore_hook);
-    if (hook != NULL)
-      (*hook) ();
   } else {
   /*
     If have mmap, try using it as a backup when MORECORE fails or
@@ -2992,11 +2948,6 @@ static Void_t* sYSMALLOc(INTERNAL_SIZE_T nb, struct malloc_state * av)
 	if (snd_brk == (char*)(MORECORE_FAILURE)) {
 	  correction = 0;
 	  snd_brk = (char*)(MORECORE(0));
-	} else {
-	  /* Call the `morecore' hook if necessary.  */
-	  void (*hook) (void) = force_reg (dlafter_morecore_hook);
-	  if (hook != NULL)
-	    (*hook) ();
 	}
       }
 
@@ -3136,10 +3087,6 @@ static int sYSTRIm(size_t pad, struct malloc_state * av)
       */
 
       MORECORE(-extra);
-      /* Call the `morecore' hook if necessary.  */
-      void (*hook) (void) = force_reg (dlafter_morecore_hook);
-      if (hook != NULL)
-	(*hook) ();
       new_brk = (char*)(MORECORE(0));
 
       if (new_brk != (char*)MORECORE_FAILURE) {
@@ -3318,11 +3265,6 @@ public_rEALLOc(Void_t* oldmem, size_t bytes)
 
   Void_t* newp;             /* chunk to return */
 
-  __malloc_ptr_t (*hook) (__malloc_ptr_t, size_t, __const __malloc_ptr_t) =
-    force_reg (dlrealloc_hook);
-  if (hook != NULL)
-    return (*hook)(oldmem, bytes, RETURN_ADDRESS (0));
-
 #if REALLOC_ZERO_BYTES_FREES
   if (bytes == 0 && oldmem != NULL) { public_fREe(oldmem); return 0; }
 #endif
@@ -3401,10 +3343,6 @@ Void_t *public_mEMALIGn(size_t alignment, size_t bytes)
 	struct malloc_state *ar_ptr;
 	Void_t *p;
 
-	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
-	if (hook != NULL)
-		return (*hook) (alignment, bytes, RETURN_ADDRESS(0));
-
 	/* If need less alignment than we give anyway, just relay to malloc */
 	if (alignment <= MALLOC_ALIGNMENT)
 		return public_mALLOc(bytes);
@@ -3436,10 +3374,6 @@ Void_t *public_vALLOc(size_t bytes)
 
 	size_t pagesz = mp_.pagesize;
 
-	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
-	if (hook != NULL)
-		return (*hook) (pagesz, bytes, RETURN_ADDRESS(0));
-
 	ar_ptr = arena_get(bytes + pagesz + MINSIZE);
 	if (!ar_ptr)
 		return 0;
@@ -3467,10 +3401,6 @@ Void_t *public_pVALLOc(size_t bytes)
 	size_t page_mask = mp_.pagesize - 1;
 	size_t rounded_bytes = (bytes + page_mask) & ~(page_mask);
 
-	__malloc_ptr_t(*hook) __MALLOC_PMT((size_t, size_t, __const __malloc_ptr_t)) = force_reg(dlmemalign_hook);
-	if (hook != NULL)
-		return (*hook) (pagesz, rounded_bytes, RETURN_ADDRESS(0));
-
 	ar_ptr = arena_get(bytes + 2 * pagesz + MINSIZE);
 	p = _int_pvalloc(ar_ptr, bytes);
 	if (!p) {
@@ -5193,7 +5123,6 @@ int mALLOPt(int param_number, int value)
     break;
 
   case M_CHECK_ACTION:
-    check_action = value;
     break;
 
   case M_PERTURB:
@@ -5393,15 +5322,7 @@ dlposix_memalign (void **memptr, size_t alignment, size_t size)
       || alignment == 0)
     return EINVAL;
 
-  /* Call the hook here, so that caller is posix_memalign's caller
-     and not posix_memalign itself.  */
-  __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
-					__const __malloc_ptr_t)) =
-    force_reg (dlmemalign_hook);
-  if (hook != NULL)
-    mem = (*hook)(alignment, size, RETURN_ADDRESS (0));
-  else
-    mem = public_mEMALIGn (alignment, size);
+  mem = public_mEMALIGn (alignment, size);
 
   if (mem != NULL) {
     *memptr = mem;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: fix mbind on old kernels
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (46 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: create aliases for malloc, free, Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: remove tcache prefetching Joern Engel
                   ` (16 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

The kernel returned -EINVAL when passing in MPOL_F_STATIC_NODES.  The
flag being documented in the manpage, yet undefined in the header, was
dodgy anyway.  Sounds like an intriguing story waiting to be uncovered.
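
For reference, a minimal self-contained sketch of the call as it looks
after this patch; the prefer_node() wrapper and its framing are purely
illustrative and not part of the change (builds against libnuma):

	#include <stddef.h>
	#include <assert.h>
	#include <numaif.h>

	/* Prefer (but do not require) allocations on `node`.  maxnode
	 * bounds how many bits of the mask the kernel will look at, so
	 * node + 1 keeps bit `node` inside that bound; no mode flags
	 * are passed, matching the patch. */
	void prefer_node(void *mem, size_t size, int node)
	{
		unsigned long node_mask = 1UL << node;
		int err = mbind(mem, size, MPOL_PREFERRED, &node_mask, node + 1, 0);
		assert(!err);
	}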

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index ef5e22a0811d..0804ecfe3a26 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -466,16 +466,14 @@ static void *mmap_for_heap(void *addr, size_t length, int *must_clear)
 }
 
 #include <numaif.h>
-#ifndef MPOL_F_STATIC_NODES
-#define MPOL_F_STATIC_NODES   (1 << 15)
-#endif
+
 static void mbind_memory(void *mem, size_t size, int node)
 {
 	unsigned long node_mask = 1 << node;
 	int err;
 
 	assert(max_node < sizeof(unsigned long));
-	err = mbind(mem, size, MPOL_PREFERRED, &node_mask, max_node, MPOL_F_STATIC_NODES);
+	err = mbind(mem, size, MPOL_PREFERRED, &node_mask, max_node + 1, 0);
 	assert(!err);
 }
 
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: define __libc_memalign
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (51 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: remove get_backup_arena() from tcache_malloc() Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: move out perturb_byte conditionals Joern Engel
                   ` (11 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

This one is special, in a horrible way.  The libc implementation
of tls does memory allocations.  Malloc depends on tls, so we have
a circular dependency.  Libc solved this by having a weak definition
for __libc_memalign in tls which does little more than mmap.
If malloc is already initialized, the malloc version will override
the tls version.  Otherwise tls will use its own.

Life gets more interesting if people ship their own version of
malloc (this version) and provide hooks for libc malloc to catch
mistakes where allocations go to libc malloc that shouldn't.  Those
hooks also instrument __libc_memalign and get triggered by the tls
code - but only on unusual workloads where the stars align.

Defining an alias for any __libc_* function is wrong on many levels,
but that seems to be the only solution short of patching libc code.
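
As a rough, self-contained illustration (not part of the patch) of why
the alias below wins: glibc's strong_alias() boils down to GCC's alias
attribute, and a strong definition beats libc's weak fallback at link
time.  The hello/hello_alias names are made up for the example:

	#include <stdio.h>

	void hello(void) { puts("real implementation"); }

	/* Roughly what glibc's strong_alias(hello, hello_alias) expands to: */
	extern __typeof (hello) hello_alias __attribute__ ((alias ("hello")));

	int main(void)
	{
		hello_alias();	/* resolves to hello() */
		return 0;
	}

In the patch, dlmemalign plays the role of hello and __libc_memalign
the role of hello_alias.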

https://codereviews.purestorage.com/r/23945/
JIRA: PURE-43462
---
 tpc/malloc2.13/malloc.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 50a50949bd56..19eea9f77deb 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -5537,6 +5537,25 @@ strong_alias(dlget_state, get_state);
 strong_alias(dlset_state, set_state);
 strong_alias(dlposix_memalign, posix_memalign);
 
+/*
+ * This one is special, in a horrible way.  The libc implementation
+ * of tls does memory allocations.  Malloc depends on tls, so we have
+ * a circular dependency.  Libc solved this by having a weak definition
+ * for __libc_memalign in tls which does little more than mmap.
+ * If malloc is already initialized, the malloc version will override
+ * the tls version.  Otherwise tls will use its own.
+ *
+ * Life gets more interesting if people ship their own version of
+ * malloc (this version) and provide hooks for libc malloc to catch
+ * mistakes where allocations go to libc malloc that shouldn't.  Those
+ * hooks also instrument __libc_memalign and get triggered by the tls
+ * code - but only on unusual workloads where the stars align.
+ *
+ * Defining an alias for any __libc_* function is wrong on many levels,
+ * but that seems to be the only solution short of patching libc code.
+ */
+strong_alias(dlmemalign, __libc_memalign);
+
 /* ------------------------------------------------------------
 History:
 
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: remove tcache prefetching
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (47 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: fix mbind on old kernels Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: remove stale condition Joern Engel
                   ` (15 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

The code doesn't help.  The best I could demonstrate in an artificial
test that spends all its time in malloc/free is a 1% performance
improvement, assuming all allocations are the same size.  Once you mix
sizes and there is a chance of prefetched objects being unused and
later freed, the performance degrades.  It was a nice idea, but it
didn't survive a reality test, so let's remove it.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h  |  2 ++
 tpc/malloc2.13/malloc.c |  1 +
 tpc/malloc2.13/tcache.h | 70 +++++--------------------------------------------
 3 files changed, 9 insertions(+), 64 deletions(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index f6b108f661dc..587bad51dd63 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -843,6 +843,7 @@ static struct malloc_state *arena_get(size_t size)
 		THREAD_STAT(++(arena->stat_lock_direct));
 	} else
 		arena = arena_get2(arena, size);
+	free_atomic_list(arena);
 	return arena;
 }
 
@@ -858,6 +859,7 @@ static inline void arena_lock(struct malloc_state *arena)
 #else
 	(void)mutex_lock(&arena->mutex);
 #endif
+	free_atomic_list(arena);
 }
 
 static inline void arena_unlock(struct malloc_state *arena)
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index c4e3fbada60a..439c1247fe99 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2274,6 +2274,7 @@ static int perturb_byte;
 } while (0)
 
 
+static void free_atomic_list(struct malloc_state *arena);
 /* ------------------- Support for multiple arenas -------------------- */
 #include "arena.h"
 
diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 9e210a973d10..78d5d2f17462 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -20,24 +20,12 @@ static inline int fls(int x)
 /*
  * Per-thread cache is supposed to reduce lock contention on arenas.
  * Freed objects go to the cache first, allowing allocations to be
- * serviced from it without going to the arenas.  On cache misses we
- * have to take the arena lock, but can amortize the cost by
- * prefetching additional objects for future use.
- *
- * Prefetching is a heuristic.  If an object of size X is requested,
- * we assume more objects of the same size will be requested in the
- * near future.  If correct, this reduces locking overhead.  If
- * incorrect, we spend cpu cycles and pollute the tcache with unused
- * objects.  Sweet spot depends on the workload, but seems to be
- * around one.
+ * serviced from it without going to the arenas.
  */
 #define CACHE_SIZE_BITS		(17)
 #define CACHE_SIZE		(1 << CACHE_SIZE_BITS)
 #define MAX_CACHED_SIZE_BITS	(CACHE_SIZE_BITS - 3)
 #define MAX_CACHED_SIZE		(1 << MAX_CACHED_SIZE_BITS)
-#define MAX_PREFETCH_SIZE_BITS	(CACHE_SIZE_BITS - 6)
-#define MAX_PREFETCH_SIZE	(1 << MAX_PREFETCH_SIZE_BITS)
-#define NO_PREFETCH		(1)
 
 /*
  * Binning is done as a subdivided buddy allocator.  A buddy allocator
@@ -128,7 +116,8 @@ static inline int is_accessed(struct thread_cache *cache, int bin)
 /*
  * Free objects from the atomic_free_list while holding the
  * arena_lock.  In case the atomic_free_list has become obscenely big
- * we limit ourselves to freeing 64 objects at once.
+ * we limit ourselves to freeing 64 objects at once.  List sizes
+ * above 7000 elements have been observed in tests, so this matters.
  */
 static void free_atomic_list(struct malloc_state *arena)
 {
@@ -261,9 +250,9 @@ static void *tcache_malloc(size_t size)
 {
 	struct thread_cache *cache;
 	struct malloc_state *arena;
-	struct malloc_chunk **bin, *victim, *prefetch;
+	struct malloc_chunk **bin, *victim;
 	size_t nb;
-	int bin_no, i;
+	int bin_no;
 
 	checked_request2size(size, nb);
 	if (nb > MAX_CACHED_SIZE)
@@ -309,55 +298,8 @@ static void *tcache_malloc(size_t size)
 		mutex_unlock(&cache->mutex);
 		return p;
 	}
-
-	/*
-	 * GC the cache before prefetching, not after.  The last thing
-	 * we want is to spend effort prefetching, then release all
-	 * those objects via tcache_gc.  Also do it before taking the
-	 * lock, to minimize hold times.
-	 */
-	if (nb <= MAX_PREFETCH_SIZE && (cache->tc_size + nb * NO_PREFETCH) > CACHE_SIZE)
-		tcache_gc(cache);
-
-	arena = arena_get(size);
-	if (!arena) {
-		mutex_unlock(&cache->mutex);
-		return NULL;
-	}
-	free_atomic_list(arena);
-	/* TODO: _int_malloc does checked_request2size() again, which is silly */
-	victim = _int_malloc(arena, size);
-	if (!victim) {
-		arena = get_backup_arena(arena, size);
-		victim = _int_malloc(arena, size);
-	}
-	if (victim && nb <= MAX_PREFETCH_SIZE) {
-		/* Prefetch some more while we hold the lock */
-		for (i = 0; i < NO_PREFETCH; i++) {
-			prefetch = _int_malloc(arena, size);
-			if (!prefetch)
-				break;
-			prefetch = mem2chunk(prefetch);
-			if (cache_bin(chunksize(prefetch)) > bin_no) {
-				/*
-				 * If _int_malloc() returns bigger chunks,
-				 * we assume that prefetching won't buy us
-				 * any benefits.
-				 */
-				_int_free(arena, prefetch);
-				break;
-			}
-			assert(cache_bin(chunksize(prefetch)) == bin_no);
-			cache->tc_size += chunksize(prefetch);
-			cache->tc_count++;
-			add_to_bin(bin, prefetch);
-		}
-	}
-	arena_unlock(arena);
-	assert(!victim || arena == arena_for_chunk(mem2chunk(victim)));
-	alloc_perturb(victim, size);
 	mutex_unlock(&cache->mutex);
-	return victim;
+	return NULL;
 }
 
 static void tcache_free(mchunkptr p)
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: fix calculation of aligned heaps
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (57 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: speed up mmap Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: remove atfork hooks Joern Engel
                   ` (5 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

By my reading we would use the aligned_heap_area only for every other
allocation, because the old code cleared it after each use instead of
advancing it to the next aligned slot, while we should be able to use
it for almost every allocation.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 7de7436a30ba..f6b108f661dc 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -543,8 +543,9 @@ static heap_info *new_heap(size_t size, size_t top_pad, int numa_node)
 	p2 = MAP_FAILED;
 	if (aligned_heap_area) {
 		p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE, &must_clear);
-		aligned_heap_area = NULL;
+		aligned_heap_area = p2 + HEAP_MAX_SIZE;
 		if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE - 1))) {
+			aligned_heap_area = NULL;
 			munmap(p2, HEAP_MAX_SIZE);
 			p2 = MAP_FAILED;
 		}
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: use bitmap to conserve hot bins
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (49 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: remove stale condition Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: remove get_backup_arena() from tcache_malloc() Joern Engel
                   ` (13 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Tcache_gc needs some heuristic to decide what memory will remain useful
in the future and what memory can be returned to the allocator.  This
uses the same heuristic as LRU: bins that have been used in the recent
past are preserved entirely, bins that haven't been used are emptied.
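
As a quick sanity check of the derived constant in the hunk below,
using the values defined there: SUBBINS = 1 << 4 = 16 and
MAX_CACHED_SIZE_BITS = 16 - 3 = 13, so CACHE_NO_BINS =
16 * (13 - 4 - 4 + 1) + 1 = 97, matching the previously hard-coded 97.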

JIRA: PURE-27597
---
 tpc/malloc2.13/tcache.h | 88 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/tpc/malloc2.13/tcache.h b/tpc/malloc2.13/tcache.h
index 7cf6b316456f..1829cd4ebb9a 100644
--- a/tpc/malloc2.13/tcache.h
+++ b/tpc/malloc2.13/tcache.h
@@ -24,9 +24,12 @@ static inline int fls(int x)
  * arena.  On free we keep the freed object in hope of reusing it in
  * future allocations.
  */
-#define CACHE_SIZE		(1 << 16)
-#define MAX_CACHED_SIZE		(CACHE_SIZE >> 3)
-#define MAX_PREFETCH_SIZE	(CACHE_SIZE >> 6)
+#define CACHE_SIZE_BITS		(16)
+#define CACHE_SIZE		(1 << CACHE_SIZE_BITS)
+#define MAX_CACHED_SIZE_BITS	(CACHE_SIZE_BITS - 3)
+#define MAX_CACHED_SIZE		(1 << MAX_CACHED_SIZE_BITS)
+#define MAX_PREFETCH_SIZE_BITS	(CACHE_SIZE_BITS - 6)
+#define MAX_PREFETCH_SIZE	(1 << MAX_PREFETCH_SIZE_BITS)
 #define NO_PREFETCH		(1 << 3)
 
 /*
@@ -40,8 +43,8 @@ static inline int fls(int x)
  */
 #define ALIGN_BITS	(4)
 #define ALIGNMENT	(1 << ALIGN_BITS)
-#define ALIGN_DOWN(x)   ((x) & ~(ALIGNMENT - 1))
-#define ALIGN(x)	(((x) + (ALIGNMENT - 1)) & ~(ALIGNMENT - 1))
+#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
+#define ALIGN(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))
 #define SUBBIN_BITS	(4)
 #define SUBBINS		(1 << SUBBIN_BITS)
 
@@ -57,7 +60,27 @@ static int cache_bin(int size)
 	return SUBBINS * (power - SUBBIN_BITS - 1) + subbin;
 }
 
-#define CACHE_NO_BINS 97
+#define CACHE_NO_BINS	(SUBBINS * (MAX_CACHED_SIZE_BITS - ALIGN_BITS - SUBBIN_BITS + 1) + 1)
+
+#define CACHE_BITMAP_SIZE	(ALIGN(CACHE_NO_BINS, __WORDSIZE) / __WORDSIZE)
+
+#define BITMAP_WORD(i)	((i) / __WORDSIZE)
+#define BITMAP_BIT(i)	(1UL << ((i) & (__WORDSIZE - 1)))
+
+static inline void clear_bit(unsigned long *bitmap, int i)
+{
+	bitmap[BITMAP_WORD(i)] &= ~BITMAP_BIT(i);
+}
+
+static inline void set_bit(unsigned long *bitmap, int i)
+{
+	bitmap[BITMAP_WORD(i)] |= BITMAP_BIT(i);
+}
+
+static inline int get_bit(unsigned long *bitmap, int i)
+{
+	return !!(bitmap[BITMAP_WORD(i)] & BITMAP_BIT(i));
+}
 
 struct thread_cache {
 	/* Bytes in cache */
@@ -66,21 +89,32 @@ struct thread_cache {
 	/* Objects in cache */
 	uint32_t tc_count;
 
+	unsigned long accessed_map[CACHE_BITMAP_SIZE];
+
 	struct malloc_chunk *tc_bin[CACHE_NO_BINS];
 };
 
-/*
- * XXX Completely unoptimized - lots of low-hanging fruit
- */
-static void cache_gc(struct thread_cache *cache)
+static inline void set_accessed(struct thread_cache *cache, int bin)
+{
+	set_bit(cache->accessed_map, bin);
+}
+
+static inline int is_accessed(struct thread_cache *cache, int bin)
+{
+	return get_bit(cache->accessed_map, bin);
+}
+
+static void tcache_gc(struct thread_cache *cache)
 {
 	struct malloc_chunk *victim, *next;
 	struct malloc_state *arena;
-	int i;
+	int i, did_repeat = 0;
 
+repeat:
 	for (i = 0; i < CACHE_NO_BINS; i++) {
 		victim = cache->tc_bin[i];
-		if (!victim)
+		/* accessed bins get skipped - they are useful */
+		if (is_accessed(cache, i) || !victim)
 			continue;
 		cache->tc_bin[i] = NULL;
 		while (victim) {
@@ -88,14 +122,33 @@ static void cache_gc(struct thread_cache *cache)
 			cache->tc_size -= chunksize(victim);
 			cache->tc_count--;
 			arena = arena_for_chunk(victim);
+			/* TODO: use atomic bins instead */
 			mutex_lock(&arena->mutex);
 			_int_free(arena, victim);
 			mutex_unlock(&arena->mutex);
 			victim = next;
 		}
 	}
-	assert(cache->tc_size == 0);
-	assert(cache->tc_count == 0);
+	memset(cache->accessed_map, 0, sizeof(cache->accessed_map));
+
+	if (cache->tc_size > CACHE_SIZE) {
+		/*
+		 * Since we skip accessed bins we can run into
+		 * pathological cases where all bins are empty or
+		 * accessed and we made no progress.  In those cases
+		 * we retry after clearing the accessed bits, freeing
+		 * the entire cache.  Should be rare.
+		 */
+		did_repeat = 1;
+		goto repeat;
+	} else if (did_repeat) {
+		/*
+		 * Since we freed the entire cache, we can verify the
+		 * counters are consistent.
+		 */
+		assert(cache->tc_size == 0);
+		assert(cache->tc_count == 0);
+	}
 }
 
 static void *tcache_malloc(size_t size)
@@ -127,6 +180,7 @@ static void *tcache_malloc(size_t size)
 
 	bin_no = cache_bin(nb);
 	assert(bin_no < CACHE_NO_BINS);
+	set_accessed(cache, bin_no);
 
 	bin = &cache->tc_bin[bin_no];
 	victim = *bin;
@@ -150,11 +204,11 @@ static void *tcache_malloc(size_t size)
 	/*
 	 * GC the cache before prefetching, not after.  The last thing
 	 * we want is to spend effort prefetching, then release all
-	 * those objects via cache_gc.  Also do it before taking the
+	 * those objects via tcache_gc.  Also do it before taking the
 	 * lock, to minimize hold times.
 	 */
 	if (nb <= MAX_PREFETCH_SIZE && (cache->tc_size + nb * 8) > CACHE_SIZE )
-		cache_gc(cache);
+		tcache_gc(cache);
 
 	arena = arena_get(size);
 	if (!arena)
@@ -226,6 +280,6 @@ static int tcache_free(mchunkptr p)
 	p->fd = *bin;
 	*bin = p;
 	if (cache->tc_size > CACHE_SIZE)
-		cache_gc(cache);
+		tcache_gc(cache);
 	return 1;
 }
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: rename *.ch to *.h
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (53 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: move out perturb_byte conditionals Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:32 ` [PATCH] malloc: brain-dead thread cache Joern Engel
                   ` (9 subsequent siblings)
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

From: Joern Engel <joern@purestorage.org>

Editors will switch to C mode when editing *.c or *.h files, but not
when editing *.ch files.  Avoids constant annoyances.

JIRA: PURE-27597
---
 tpc/malloc2.13/arena.ch | 1092 -----------------------------------------------
 tpc/malloc2.13/arena.h  | 1092 +++++++++++++++++++++++++++++++++++++++++++++++
 tpc/malloc2.13/hooks.ch |  643 ----------------------------
 tpc/malloc2.13/hooks.h  |  643 ++++++++++++++++++++++++++++
 tpc/malloc2.13/malloc.c |    4 +-
 5 files changed, 1737 insertions(+), 1737 deletions(-)
 delete mode 100644 tpc/malloc2.13/arena.ch
 create mode 100644 tpc/malloc2.13/arena.h
 delete mode 100644 tpc/malloc2.13/hooks.ch
 create mode 100644 tpc/malloc2.13/hooks.h

diff --git a/tpc/malloc2.13/arena.ch b/tpc/malloc2.13/arena.ch
deleted file mode 100644
index 0aaccb914d92..000000000000
--- a/tpc/malloc2.13/arena.ch
+++ /dev/null
@@ -1,1092 +0,0 @@
-/* Malloc implementation for multiple threads without lock contention.
-   Copyright (C) 2001,2002,2003,2004,2005,2006,2007,2009,2010
-   Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Wolfram Gloger <wg@malloc.de>, 2001.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If not,
-   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
-   Boston, MA 02111-1307, USA.  */
-
-#include <stdbool.h>
-
-/* Compile-time constants.  */
-
-#define HEAP_MIN_SIZE (32*1024)
-#ifndef HEAP_MAX_SIZE
-# ifdef DEFAULT_MMAP_THRESHOLD_MAX
-#  define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)
-# else
-#  define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */
-# endif
-#endif
-
-/* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps
-   that are dynamically created for multi-threaded programs.  The
-   maximum size must be a power of two, for fast determination of
-   which heap belongs to a chunk.  It should be much larger than the
-   mmap threshold, so that requests with a size just below that
-   threshold can be fulfilled without creating too many heaps.  */
-
-
-#ifndef THREAD_STATS
-#define THREAD_STATS 0
-#endif
-
-/* If THREAD_STATS is non-zero, some statistics on mutex locking are
-   computed.  */
-
-/***************************************************************************/
-
-#define top(ar_ptr) ((ar_ptr)->top)
-
-/* A heap is a single contiguous memory region holding (coalesceable)
-   malloc_chunks.  It is allocated with mmap() and always starts at an
-   address aligned to HEAP_MAX_SIZE.  Not used unless compiling with
-   USE_ARENAS. */
-
-typedef struct _heap_info {
-  struct malloc_state * ar_ptr; /* Arena for this heap. */
-  struct _heap_info *prev; /* Previous heap. */
-  size_t size;   /* Current size in bytes. */
-  size_t mprotect_size;	/* Size in bytes that has been mprotected
-			   PROT_READ|PROT_WRITE.  */
-  /* Make sure the following data is properly aligned, particularly
-     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
-     MALLOC_ALIGNMENT. */
-  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
-} heap_info;
-
-/* Get a compile-time error if the heap_info padding is not correct
-   to make alignment work as expected in sYSMALLOc.  */
-extern int sanity_check_heap_info_alignment[(sizeof (heap_info)
-					     + 2 * SIZE_SZ) % MALLOC_ALIGNMENT
-					    ? -1 : 1];
-
-/* Thread specific data */
-
-static tsd_key_t arena_key;
-static mutex_t list_lock;
-#ifdef PER_THREAD
-static size_t narenas;
-static struct malloc_state * free_list;
-#endif
-
-#if THREAD_STATS
-static int stat_n_heaps;
-#define THREAD_STAT(x) x
-#else
-#define THREAD_STAT(x) do ; while(0)
-#endif
-
-/* Mapped memory in non-main arenas (reliable only for NO_THREADS). */
-static unsigned long arena_mem;
-
-/* Already initialized? */
-static int __malloc_initialized = -1;
-
-/**************************************************************************/
-
-#if USE_ARENAS
-
-/* arena_get() acquires an arena and locks the corresponding mutex.
-   First, try the one last locked successfully by this thread.  (This
-   is the common case and handled with a macro for speed.)  Then, loop
-   once over the circularly linked list of arenas.  If no arena is
-   readily available, create a new one.  In this latter case, `size'
-   is just a hint as to how much memory will be required immediately
-   in the new arena. */
-
-#define arena_get(ptr, size) do { \
-  arena_lookup(ptr); \
-  arena_lock(ptr, size); \
-} while(0)
-
-#define arena_lookup(ptr) do { \
-  Void_t *vptr = NULL; \
-  ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
-} while(0)
-
-#ifdef PER_THREAD
-#define arena_lock(ptr, size) do { \
-  if(ptr) \
-    (void)mutex_lock(&ptr->mutex); \
-  else \
-    ptr = arena_get2(ptr, (size)); \
-} while(0)
-#else
-#define arena_lock(ptr, size) do { \
-  if(ptr && !mutex_trylock(&ptr->mutex)) { \
-    THREAD_STAT(++(ptr->stat_lock_direct)); \
-  } else \
-    ptr = arena_get2(ptr, (size)); \
-} while(0)
-#endif
-
-/* find the heap and corresponding arena for a given ptr */
-
-#define heap_for_ptr(ptr) \
- ((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))
-#define arena_for_chunk(ptr) \
- (chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
-
-#else /* !USE_ARENAS */
-
-/* There is only one arena, main_arena. */
-
-#if THREAD_STATS
-#define arena_get(ar_ptr, sz) do { \
-  ar_ptr = &main_arena; \
-  if(!mutex_trylock(&ar_ptr->mutex)) \
-    ++(ar_ptr->stat_lock_direct); \
-  else { \
-    (void)mutex_lock(&ar_ptr->mutex); \
-    ++(ar_ptr->stat_lock_wait); \
-  } \
-} while(0)
-#else
-#define arena_get(ar_ptr, sz) do { \
-  ar_ptr = &main_arena; \
-  (void)mutex_lock(&ar_ptr->mutex); \
-} while(0)
-#endif
-#define arena_for_chunk(ptr) (&main_arena)
-
-#endif /* USE_ARENAS */
-
-/**************************************************************************/
-
-#ifndef NO_THREADS
-
-/* atfork support.  */
-
-static __malloc_ptr_t (*save_malloc_hook) (size_t __size,
-					   __const __malloc_ptr_t);
-# if !defined _LIBC || (defined SHARED && !USE___THREAD)
-static __malloc_ptr_t (*save_memalign_hook) (size_t __align, size_t __size,
-					     __const __malloc_ptr_t);
-# endif
-static void           (*save_free_hook) (__malloc_ptr_t __ptr,
-					 __const __malloc_ptr_t);
-static Void_t*        save_arena;
-
-#ifdef ATFORK_MEM
-ATFORK_MEM;
-#endif
-
-/* Magic value for the thread-specific arena pointer when
-   malloc_atfork() is in use.  */
-
-#define ATFORK_ARENA_PTR ((Void_t*)-1)
-
-/* The following hooks are used while the `atfork' handling mechanism
-   is active. */
-
-static Void_t*
-malloc_atfork(size_t sz, const Void_t *caller)
-{
-  Void_t *vptr = NULL;
-  Void_t *victim;
-
-  tsd_getspecific(arena_key, vptr);
-  if(vptr == ATFORK_ARENA_PTR) {
-    /* We are the only thread that may allocate at all.  */
-    if(save_malloc_hook != malloc_check) {
-      return _int_malloc(&main_arena, sz);
-    } else {
-      if(top_check()<0)
-	return 0;
-      victim = _int_malloc(&main_arena, sz+1);
-      return mem2mem_check(victim, sz);
-    }
-  } else {
-    /* Suspend the thread until the `atfork' handlers have completed.
-       By that time, the hooks will have been reset as well, so that
-       mALLOc() can be used again. */
-    (void)mutex_lock(&list_lock);
-    (void)mutex_unlock(&list_lock);
-    return public_mALLOc(sz);
-  }
-}
-
-static void
-free_atfork(Void_t* mem, const Void_t *caller)
-{
-  Void_t *vptr = NULL;
-  struct malloc_state * ar_ptr;
-  mchunkptr p;                          /* chunk corresponding to mem */
-
-  if (mem == 0)                              /* free(0) has no effect */
-    return;
-
-  p = mem2chunk(mem);         /* do not bother to replicate free_check here */
-
-#if HAVE_MMAP
-  if (chunk_is_mmapped(p))                       /* release mmapped memory. */
-  {
-    munmap_chunk(p);
-    return;
-  }
-#endif
-
-#ifdef ATOMIC_FASTBINS
-  ar_ptr = arena_for_chunk(p);
-  tsd_getspecific(arena_key, vptr);
-  _int_free(ar_ptr, p, vptr == ATFORK_ARENA_PTR);
-#else
-  ar_ptr = arena_for_chunk(p);
-  tsd_getspecific(arena_key, vptr);
-  if(vptr != ATFORK_ARENA_PTR)
-    (void)mutex_lock(&ar_ptr->mutex);
-  _int_free(ar_ptr, p);
-  if(vptr != ATFORK_ARENA_PTR)
-    (void)mutex_unlock(&ar_ptr->mutex);
-#endif
-}
-
-
-/* Counter for number of times the list is locked by the same thread.  */
-static unsigned int atfork_recursive_cntr;
-
-/* The following two functions are registered via thread_atfork() to
-   make sure that the mutexes remain in a consistent state in the
-   fork()ed version of a thread.  Also adapt the malloc and free hooks
-   temporarily, because the `atfork' handler mechanism may use
-   malloc/free internally (e.g. in LinuxThreads). */
-
-static void
-ptmalloc_lock_all (void)
-{
-  struct malloc_state * ar_ptr;
-
-  if(__malloc_initialized < 1)
-    return;
-  if (mutex_trylock(&list_lock))
-    {
-      Void_t *my_arena;
-      tsd_getspecific(arena_key, my_arena);
-      if (my_arena == ATFORK_ARENA_PTR)
-	/* This is the same thread which already locks the global list.
-	   Just bump the counter.  */
-	goto out;
-
-      /* This thread has to wait its turn.  */
-      (void)mutex_lock(&list_lock);
-    }
-  for(ar_ptr = &main_arena;;) {
-    (void)mutex_lock(&ar_ptr->mutex);
-    ar_ptr = ar_ptr->next;
-    if(ar_ptr == &main_arena) break;
-  }
-  save_malloc_hook = dlmalloc_hook;
-  save_free_hook = dlfree_hook;
-  dlmalloc_hook = malloc_atfork;
-  dlfree_hook = free_atfork;
-  /* Only the current thread may perform malloc/free calls now. */
-  tsd_getspecific(arena_key, save_arena);
-  tsd_setspecific(arena_key, ATFORK_ARENA_PTR);
- out:
-  ++atfork_recursive_cntr;
-}
-
-static void
-ptmalloc_unlock_all (void)
-{
-  struct malloc_state * ar_ptr;
-
-  if(__malloc_initialized < 1)
-    return;
-  if (--atfork_recursive_cntr != 0)
-    return;
-  tsd_setspecific(arena_key, save_arena);
-  dlmalloc_hook = save_malloc_hook;
-  dlfree_hook = save_free_hook;
-  for(ar_ptr = &main_arena;;) {
-    (void)mutex_unlock(&ar_ptr->mutex);
-    ar_ptr = ar_ptr->next;
-    if(ar_ptr == &main_arena) break;
-  }
-  (void)mutex_unlock(&list_lock);
-}
-
-#ifdef __linux__
-
-/* In NPTL, unlocking a mutex in the child process after a
-   fork() is currently unsafe, whereas re-initializing it is safe and
-   does not leak resources.  Therefore, a special atfork handler is
-   installed for the child. */
-
-static void
-ptmalloc_unlock_all2 (void)
-{
-  struct malloc_state * ar_ptr;
-
-  if(__malloc_initialized < 1)
-    return;
-#if defined _LIBC || defined MALLOC_HOOKS
-  tsd_setspecific(arena_key, save_arena);
-  dlmalloc_hook = save_malloc_hook;
-  dlfree_hook = save_free_hook;
-#endif
-#ifdef PER_THREAD
-  free_list = NULL;
-#endif
-  for(ar_ptr = &main_arena;;) {
-    mutex_init(&ar_ptr->mutex);
-#ifdef PER_THREAD
-    if (ar_ptr != save_arena) {
-      ar_ptr->next_free = free_list;
-      free_list = ar_ptr;
-    }
-#endif
-    ar_ptr = ar_ptr->next;
-    if(ar_ptr == &main_arena) break;
-  }
-  mutex_init(&list_lock);
-  atfork_recursive_cntr = 0;
-}
-
-#else
-
-#define ptmalloc_unlock_all2 ptmalloc_unlock_all
-
-#endif
-
-#endif /* !defined NO_THREADS */
-
-/* Initialization routine. */
-#ifdef _LIBC
-#include <string.h>
-extern char **_environ;
-
-static char *
-internal_function
-next_env_entry (char ***position)
-{
-  char **current = *position;
-  char *result = NULL;
-
-  while (*current != NULL)
-    {
-      if (__builtin_expect ((*current)[0] == 'M', 0)
-	  && (*current)[1] == 'A'
-	  && (*current)[2] == 'L'
-	  && (*current)[3] == 'L'
-	  && (*current)[4] == 'O'
-	  && (*current)[5] == 'C'
-	  && (*current)[6] == '_')
-	{
-	  result = &(*current)[7];
-
-	  /* Save current position for next visit.  */
-	  *position = ++current;
-
-	  break;
-	}
-
-      ++current;
-    }
-
-  return result;
-}
-#endif /* _LIBC */
-
-/* Set up basic state so that _int_malloc et al can work.  */
-static void
-ptmalloc_init_minimal (void)
-{
-#if DEFAULT_TOP_PAD != 0
-  mp_.top_pad        = DEFAULT_TOP_PAD;
-#endif
-  mp_.n_mmaps_max    = DEFAULT_MMAP_MAX;
-  mp_.mmap_threshold = DEFAULT_MMAP_THRESHOLD;
-  mp_.trim_threshold = DEFAULT_TRIM_THRESHOLD;
-  mp_.pagesize       = malloc_getpagesize;
-#ifdef PER_THREAD
-# define NARENAS_FROM_NCORES(n) ((n) * (sizeof(long) == 4 ? 2 : 8))
-  mp_.arena_test     = NARENAS_FROM_NCORES (1);
-  narenas = 1;
-#endif
-}
-
-
-#ifdef _LIBC
-# ifdef SHARED
-static void *
-__failing_morecore (ptrdiff_t d)
-{
-  return (void *) MORECORE_FAILURE;
-}
-
-extern struct dl_open_hook *_dl_open_hook;
-libc_hidden_proto (_dl_open_hook);
-# endif
-
-# if defined SHARED && !USE___THREAD
-/* This is called by __pthread_initialize_minimal when it needs to use
-   malloc to set up the TLS state.  We cannot do the full work of
-   ptmalloc_init (below) until __pthread_initialize_minimal has finished,
-   so it has to switch to using the special startup-time hooks while doing
-   those allocations.  */
-void
-__libc_malloc_pthread_startup (bool first_time)
-{
-  if (first_time)
-    {
-      ptmalloc_init_minimal ();
-      save_malloc_hook = dlmalloc_hook;
-      save_memalign_hook = dlmemalign_hook;
-      save_free_hook = dlfree_hook;
-      dlmalloc_hook = malloc_starter;
-      dlmemalign_hook = memalign_starter;
-      dlfree_hook = free_starter;
-    }
-  else
-    {
-      dlmalloc_hook = save_malloc_hook;
-      dlmemalign_hook = save_memalign_hook;
-      dlfree_hook = save_free_hook;
-    }
-}
-# endif
-#endif
-
-static void
-ptmalloc_init (void)
-{
-  const char* s;
-  int secure = 0;
-
-  if(__malloc_initialized >= 0) return;
-  __malloc_initialized = 0;
-
-#ifdef _LIBC
-# if defined SHARED && !USE___THREAD
-  /* ptmalloc_init_minimal may already have been called via
-     __libc_malloc_pthread_startup, above.  */
-  if (mp_.pagesize == 0)
-# endif
-#endif
-    ptmalloc_init_minimal();
-
-#ifndef NO_THREADS
-# if defined _LIBC
-  /* We know __pthread_initialize_minimal has already been called,
-     and that is enough.  */
-#   define NO_STARTER
-# endif
-# ifndef NO_STARTER
-  /* With some threads implementations, creating thread-specific data
-     or initializing a mutex may call malloc() itself.  Provide a
-     simple starter version (realloc() won't work). */
-  save_malloc_hook = dlmalloc_hook;
-  save_memalign_hook = dlmemalign_hook;
-  save_free_hook = dlfree_hook;
-  dlmalloc_hook = malloc_starter;
-  dlmemalign_hook = memalign_starter;
-  dlfree_hook = free_starter;
-#  ifdef _LIBC
-  /* Initialize the pthreads interface. */
-  if (__pthread_initialize != NULL)
-    __pthread_initialize();
-#  endif /* !defined _LIBC */
-# endif	/* !defined NO_STARTER */
-#endif /* !defined NO_THREADS */
-  mutex_init(&main_arena.mutex);
-  main_arena.next = &main_arena;
-
-#if defined _LIBC && defined SHARED
-  /* In case this libc copy is in a non-default namespace, never use brk.
-     Likewise if dlopened from statically linked program.  */
-  Dl_info di;
-  struct link_map *l;
-
-  if (_dl_open_hook != NULL
-      || (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
-	  && l->l_ns != LM_ID_BASE))
-    __morecore = __failing_morecore;
-#endif
-
-  mutex_init(&list_lock);
-  tsd_key_create(&arena_key, NULL);
-  tsd_setspecific(arena_key, (Void_t *)&main_arena);
-  thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
-#ifndef NO_THREADS
-# ifndef NO_STARTER
-  dlmalloc_hook = save_malloc_hook;
-  dlmemalign_hook = save_memalign_hook;
-  dlfree_hook = save_free_hook;
-# else
-#  undef NO_STARTER
-# endif
-#endif
-#ifdef _LIBC
-  secure = __libc_enable_secure;
-  s = NULL;
-  if (__builtin_expect (_environ != NULL, 1))
-    {
-      char **runp = _environ;
-      char *envline;
-
-      while (__builtin_expect ((envline = next_env_entry (&runp)) != NULL,
-			       0))
-	{
-	  size_t len = strcspn (envline, "=");
-
-	  if (envline[len] != '=')
-	    /* This is a "MALLOC_" variable at the end of the string
-	       without a '=' character.  Ignore it since otherwise we
-	       will access invalid memory below.  */
-	    continue;
-
-	  switch (len)
-	    {
-	    case 6:
-	      if (memcmp (envline, "CHECK_", 6) == 0)
-		s = &envline[7];
-	      break;
-	    case 8:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "TOP_PAD_", 8) == 0)
-		    mALLOPt(M_TOP_PAD, atoi(&envline[9]));
-		  else if (memcmp (envline, "PERTURB_", 8) == 0)
-		    mALLOPt(M_PERTURB, atoi(&envline[9]));
-		}
-	      break;
-	    case 9:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "MMAP_MAX_", 9) == 0)
-		    mALLOPt(M_MMAP_MAX, atoi(&envline[10]));
-#ifdef PER_THREAD
-		  else if (memcmp (envline, "ARENA_MAX", 9) == 0)
-		    mALLOPt(M_ARENA_MAX, atoi(&envline[10]));
-#endif
-		}
-	      break;
-#ifdef PER_THREAD
-	    case 10:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "ARENA_TEST", 10) == 0)
-		    mALLOPt(M_ARENA_TEST, atoi(&envline[11]));
-		}
-	      break;
-#endif
-	    case 15:
-	      if (! secure)
-		{
-		  if (memcmp (envline, "TRIM_THRESHOLD_", 15) == 0)
-		    mALLOPt(M_TRIM_THRESHOLD, atoi(&envline[16]));
-		  else if (memcmp (envline, "MMAP_THRESHOLD_", 15) == 0)
-		    mALLOPt(M_MMAP_THRESHOLD, atoi(&envline[16]));
-		}
-	      break;
-	    default:
-	      break;
-	    }
-	}
-    }
-#else
-  if (! secure)
-    {
-      if((s = getenv("MALLOC_TRIM_THRESHOLD_")))
-	mALLOPt(M_TRIM_THRESHOLD, atoi(s));
-      if((s = getenv("MALLOC_TOP_PAD_")))
-	mALLOPt(M_TOP_PAD, atoi(s));
-      if((s = getenv("MALLOC_PERTURB_")))
-	mALLOPt(M_PERTURB, atoi(s));
-      if((s = getenv("MALLOC_MMAP_THRESHOLD_")))
-	mALLOPt(M_MMAP_THRESHOLD, atoi(s));
-      if((s = getenv("MALLOC_MMAP_MAX_")))
-	mALLOPt(M_MMAP_MAX, atoi(s));
-    }
-  s = getenv("MALLOC_CHECK_");
-#endif
-  if(s && s[0]) {
-    mALLOPt(M_CHECK_ACTION, (int)(s[0] - '0'));
-    if (check_action != 0)
-      dlmalloc_check_init();
-  }
-  void (*hook) (void) = force_reg (dlmalloc_initialize_hook);
-  if (hook != NULL)
-    (*hook)();
-  __malloc_initialized = 1;
-}
-
-/* There are platforms (e.g. Hurd) with a link-time hook mechanism. */
-#ifdef thread_atfork_static
-thread_atfork_static(ptmalloc_lock_all, ptmalloc_unlock_all, \
-		     ptmalloc_unlock_all2)
-#endif
-
-\f
-
-/* Managing heaps and arenas (for concurrent threads) */
-
-#if USE_ARENAS
-
-#if MALLOC_DEBUG > 1
-
-/* Print the complete contents of a single heap to stderr. */
-
-static void
-dump_heap(heap_info *heap)
-{
-  char *ptr;
-  mchunkptr p;
-
-  fprintf(stderr, "Heap %p, size %10lx:\n", heap, (long)heap->size);
-  ptr = (heap->ar_ptr != (struct malloc_state *)(heap+1)) ?
-    (char*)(heap + 1) : (char*)(heap + 1) + sizeof(struct malloc_state);
-  p = (mchunkptr)(((unsigned long)ptr + MALLOC_ALIGN_MASK) &
-		  ~MALLOC_ALIGN_MASK);
-  for(;;) {
-    fprintf(stderr, "chunk %p size %10lx", p, (long)p->size);
-    if(p == top(heap->ar_ptr)) {
-      fprintf(stderr, " (top)\n");
-      break;
-    } else if(p->size == (0|PREV_INUSE)) {
-      fprintf(stderr, " (fence)\n");
-      break;
-    }
-    fprintf(stderr, "\n");
-    p = next_chunk(p);
-  }
-}
-
-#endif /* MALLOC_DEBUG > 1 */
-
-/* If consecutive mmap (0, HEAP_MAX_SIZE << 1, ...) calls return decreasing
-   addresses as opposed to increasing, new_heap would badly fragment the
-   address space.  In that case remember the second HEAP_MAX_SIZE part
-   aligned to HEAP_MAX_SIZE from last mmap (0, HEAP_MAX_SIZE << 1, ...)
-   call (if it is already aligned) and try to reuse it next time.  We need
-   no locking for it, as kernel ensures the atomicity for us - worst case
-   we'll call mmap (addr, HEAP_MAX_SIZE, ...) for some value of addr in
-   multiple threads, but only one will succeed.  */
-static char *aligned_heap_area;
-
-static void *mmap_for_heap(void *addr, size_t length, int *must_clear)
-{
-	int prot = PROT_READ | PROT_WRITE;
-	int flags = MAP_PRIVATE;
-	void *ret;
-
-	ret = MMAP(addr, length, prot, flags | MAP_HUGETLB);
-	if (ret != MAP_FAILED) {
-		*must_clear = 1;
-		return ret;
-	}
-	*must_clear = 0;
-	return MMAP(addr, length, prot, flags | MAP_NORESERVE);
-}
-
-/* Create a new heap.  size is automatically rounded up to a multiple
-   of the page size. */
-static heap_info *new_heap(size_t size, size_t top_pad)
-{
-	size_t page_mask = malloc_getpagesize - 1;
-	char *p1, *p2;
-	unsigned long ul;
-	heap_info *h;
-	int must_clear;
-
-	if (size + top_pad < HEAP_MIN_SIZE)
-		size = HEAP_MIN_SIZE;
-	else if (size + top_pad <= HEAP_MAX_SIZE)
-		size += top_pad;
-	else if (size > HEAP_MAX_SIZE)
-		return 0;
-	else
-		size = HEAP_MAX_SIZE;
-	size = (size + page_mask) & ~page_mask;
-
-	/* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
-	   No swap space needs to be reserved for the following large
-	   mapping (on Linux, this is the case for all non-writable mappings
-	   anyway). */
-	p2 = MAP_FAILED;
-	if (aligned_heap_area) {
-		p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE, &must_clear);
-		aligned_heap_area = NULL;
-		if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE - 1))) {
-			munmap(p2, HEAP_MAX_SIZE);
-			p2 = MAP_FAILED;
-		}
-	}
-	if (p2 == MAP_FAILED) {
-		p1 = mmap_for_heap(0, HEAP_MAX_SIZE << 1, &must_clear);
-		if (p1 != MAP_FAILED) {
-			p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE - 1))
-				      & ~(HEAP_MAX_SIZE - 1));
-			ul = p2 - p1;
-			if (ul)
-				munmap(p1, ul);
-			else
-				aligned_heap_area = p2 + HEAP_MAX_SIZE;
-			munmap(p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
-		} else {
-			/* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-			   is already aligned. */
-			p2 = mmap_for_heap(0, HEAP_MAX_SIZE, &must_clear);
-			if (p2 == MAP_FAILED)
-				return 0;
-			if ((unsigned long)p2 & (HEAP_MAX_SIZE - 1)) {
-				munmap(p2, HEAP_MAX_SIZE);
-				return 0;
-			}
-		}
-	}
-	if (must_clear)
-		memset(p2, 0, HEAP_MAX_SIZE);
-	h = (heap_info *) p2;
-	h->size = size;
-	h->mprotect_size = size;
-	THREAD_STAT(stat_n_heaps++);
-	return h;
-}
-
-/* Grow a heap.  size is automatically rounded up to a
-   multiple of the page size. */
-
-static int
-grow_heap(heap_info *h, long diff)
-{
-  size_t page_mask = malloc_getpagesize - 1;
-  long new_size;
-
-  diff = (diff + page_mask) & ~page_mask;
-  new_size = (long)h->size + diff;
-  if((unsigned long) new_size > (unsigned long) HEAP_MAX_SIZE)
-    return -1;
-  if((unsigned long) new_size > h->mprotect_size) {
-    h->mprotect_size = new_size;
-  }
-
-  h->size = new_size;
-  return 0;
-}
-
-/* Shrink a heap.  */
-
-static int
-shrink_heap(heap_info *h, long diff)
-{
-  long new_size;
-
-  new_size = (long)h->size - diff;
-  if(new_size < (long)sizeof(*h))
-    return -1;
-  /* Try to re-map the extra heap space freshly to save memory, and
-     make it inaccessible. */
-  madvise ((char *)h + new_size, diff, MADV_DONTNEED);
-  /*fprintf(stderr, "shrink %p %08lx\n", h, new_size);*/
-
-  h->size = new_size;
-  return 0;
-}
-
-/* Delete a heap. */
-
-#define delete_heap(heap) \
-  do {								\
-    if ((char *)(heap) + HEAP_MAX_SIZE == aligned_heap_area)	\
-      aligned_heap_area = NULL;					\
-    munmap((char*)(heap), HEAP_MAX_SIZE);			\
-  } while (0)
-
-static int
-internal_function
-heap_trim(heap_info *heap, size_t pad)
-{
-  struct malloc_state * ar_ptr = heap->ar_ptr;
-  unsigned long pagesz = mp_.pagesize;
-  mchunkptr top_chunk = top(ar_ptr), p, bck, fwd;
-  heap_info *prev_heap;
-  long new_size, top_size, extra;
-
-  /* Can this heap go away completely? */
-  while(top_chunk == chunk_at_offset(heap, sizeof(*heap))) {
-    prev_heap = heap->prev;
-    p = chunk_at_offset(prev_heap, prev_heap->size - (MINSIZE-2*SIZE_SZ));
-    assert(p->size == (0|PREV_INUSE)); /* must be fencepost */
-    p = prev_chunk(p);
-    new_size = chunksize(p) + (MINSIZE-2*SIZE_SZ);
-    assert(new_size>0 && new_size<(long)(2*MINSIZE));
-    if(!prev_inuse(p))
-      new_size += p->prev_size;
-    assert(new_size>0 && new_size<HEAP_MAX_SIZE);
-    if(new_size + (HEAP_MAX_SIZE - prev_heap->size) < pad + MINSIZE + pagesz)
-      break;
-    ar_ptr->system_mem -= heap->size;
-    arena_mem -= heap->size;
-    delete_heap(heap);
-    heap = prev_heap;
-    if(!prev_inuse(p)) { /* consolidate backward */
-      p = prev_chunk(p);
-      unlink(p, bck, fwd);
-    }
-    assert(((unsigned long)((char*)p + new_size) & (pagesz-1)) == 0);
-    assert( ((char*)p + new_size) == ((char*)heap + heap->size) );
-    top(ar_ptr) = top_chunk = p;
-    set_head(top_chunk, new_size | PREV_INUSE);
-    /*check_chunk(ar_ptr, top_chunk);*/
-  }
-  top_size = chunksize(top_chunk);
-  extra = (top_size - pad - MINSIZE - 1) & ~(pagesz - 1);
-  if(extra < (long)pagesz)
-    return 0;
-  /* Try to shrink. */
-  if(shrink_heap(heap, extra) != 0)
-    return 0;
-  ar_ptr->system_mem -= extra;
-  arena_mem -= extra;
-
-  /* Success. Adjust top accordingly. */
-  set_head(top_chunk, (top_size - extra) | PREV_INUSE);
-  /*check_chunk(ar_ptr, top_chunk);*/
-  return 1;
-}
-
-/* Create a new arena with initial size "size".  */
-
-static struct malloc_state *
-_int_new_arena(size_t size)
-{
-  struct malloc_state * a;
-  heap_info *h;
-  char *ptr;
-  unsigned long misalign;
-
-  h = new_heap(size + (sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT),
-	       mp_.top_pad);
-  if(!h) {
-    /* Maybe size is too large to fit in a single heap.  So, just try
-       to create a minimally-sized arena and let _int_malloc() attempt
-       to deal with the large request via mmap_chunk().  */
-    h = new_heap(sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT, mp_.top_pad);
-    if(!h)
-      return 0;
-  }
-  a = h->ar_ptr = (struct malloc_state *)(h+1);
-  malloc_init_state(a);
-  /*a->next = NULL;*/
-  a->system_mem = a->max_system_mem = h->size;
-  arena_mem += h->size;
-#ifdef NO_THREADS
-  if((unsigned long)(mp_.mmapped_mem + arena_mem + main_arena.system_mem) >
-     mp_.max_total_mem)
-    mp_.max_total_mem = mp_.mmapped_mem + arena_mem + main_arena.system_mem;
-#endif
-
-  /* Set up the top chunk, with proper alignment. */
-  ptr = (char *)(a + 1);
-  misalign = (unsigned long)chunk2mem(ptr) & MALLOC_ALIGN_MASK;
-  if (misalign > 0)
-    ptr += MALLOC_ALIGNMENT - misalign;
-  top(a) = (mchunkptr)ptr;
-  set_head(top(a), (((char*)h + h->size) - ptr) | PREV_INUSE);
-
-  tsd_setspecific(arena_key, (Void_t *)a);
-  mutex_init(&a->mutex);
-  (void)mutex_lock(&a->mutex);
-
-#ifdef PER_THREAD
-  (void)mutex_lock(&list_lock);
-#endif
-
-  /* Add the new arena to the global list.  */
-  a->next = main_arena.next;
-  atomic_write_barrier ();
-  main_arena.next = a;
-
-#ifdef PER_THREAD
-  ++narenas;
-
-  (void)mutex_unlock(&list_lock);
-#endif
-
-  THREAD_STAT(++(a->stat_lock_loop));
-
-  return a;
-}
-
-
-#ifdef PER_THREAD
-static struct malloc_state *
-get_free_list (void)
-{
-  struct malloc_state * result = free_list;
-  if (result != NULL)
-    {
-      (void)mutex_lock(&list_lock);
-      result = free_list;
-      if (result != NULL)
-	free_list = result->next_free;
-      (void)mutex_unlock(&list_lock);
-
-      if (result != NULL)
-	{
-	  (void)mutex_lock(&result->mutex);
-	  tsd_setspecific(arena_key, (Void_t *)result);
-	  THREAD_STAT(++(result->stat_lock_loop));
-	}
-    }
-
-  return result;
-}
-
-
-static struct malloc_state *
-reused_arena (void)
-{
-  if (narenas <= mp_.arena_test)
-    return NULL;
-
-  static int narenas_limit;
-  if (narenas_limit == 0)
-    {
-      if (mp_.arena_max != 0)
-	narenas_limit = mp_.arena_max;
-      else
-	{
-	  int n  = __get_nprocs ();
-
-	  if (n >= 1)
-	    narenas_limit = NARENAS_FROM_NCORES (n);
-	  else
-	    /* We have no information about the system.  Assume two
-	       cores.  */
-	    narenas_limit = NARENAS_FROM_NCORES (2);
-	}
-    }
-
-  if (narenas < narenas_limit)
-    return NULL;
-
-  struct malloc_state * result;
-  static struct malloc_state * next_to_use;
-  if (next_to_use == NULL)
-    next_to_use = &main_arena;
-
-  result = next_to_use;
-  do
-    {
-      if (!mutex_trylock(&result->mutex))
-	goto out;
-
-      result = result->next;
-    }
-  while (result != next_to_use);
-
-  /* No arena available.  Wait for the next in line.  */
-  (void)mutex_lock(&result->mutex);
-
- out:
-  tsd_setspecific(arena_key, (Void_t *)result);
-  THREAD_STAT(++(result->stat_lock_loop));
-  next_to_use = result->next;
-
-  return result;
-}
-#endif
-
-static struct malloc_state *
-internal_function
-arena_get2(struct malloc_state * a_tsd, size_t size)
-{
-  struct malloc_state * a;
-
-#ifdef PER_THREAD
-  if ((a = get_free_list ()) == NULL
-      && (a = reused_arena ()) == NULL)
-    /* Nothing immediately available, so generate a new arena.  */
-    a = _int_new_arena(size);
-#else
-  if(!a_tsd)
-    a = a_tsd = &main_arena;
-  else {
-    a = a_tsd->next;
-    if(!a) {
-      /* This can only happen while initializing the new arena. */
-      (void)mutex_lock(&main_arena.mutex);
-      THREAD_STAT(++(main_arena.stat_lock_wait));
-      return &main_arena;
-    }
-  }
-
-  /* Check the global, circularly linked list for available arenas. */
-  bool retried = false;
- repeat:
-  do {
-    if(!mutex_trylock(&a->mutex)) {
-      if (retried)
-	(void)mutex_unlock(&list_lock);
-      THREAD_STAT(++(a->stat_lock_loop));
-      tsd_setspecific(arena_key, (Void_t *)a);
-      return a;
-    }
-    a = a->next;
-  } while(a != a_tsd);
-
-  /* If not even the list_lock can be obtained, try again.  This can
-     happen during `atfork', or for example on systems where thread
-     creation makes it temporarily impossible to obtain _any_
-     locks. */
-  if(!retried && mutex_trylock(&list_lock)) {
-    /* We will block to not run in a busy loop.  */
-    (void)mutex_lock(&list_lock);
-
-    /* Since we blocked there might be an arena available now.  */
-    retried = true;
-    a = a_tsd;
-    goto repeat;
-  }
-
-  /* Nothing immediately available, so generate a new arena.  */
-  a = _int_new_arena(size);
-  (void)mutex_unlock(&list_lock);
-#endif
-
-  return a;
-}
-
-#ifdef PER_THREAD
-static void __attribute__ ((section ("__libc_thread_freeres_fn")))
-arena_thread_freeres (void)
-{
-  Void_t *vptr = NULL;
-  struct malloc_state * a = tsd_getspecific(arena_key, vptr);
-  tsd_setspecific(arena_key, NULL);
-
-  if (a != NULL)
-    {
-      (void)mutex_lock(&list_lock);
-      a->next_free = free_list;
-      free_list = a;
-      (void)mutex_unlock(&list_lock);
-    }
-}
-text_set_element (__libc_thread_subfreeres, arena_thread_freeres);
-#endif
-
-#endif /* USE_ARENAS */
-
-/*
- * Local variables:
- * c-basic-offset: 2
- * End:
- */
diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
new file mode 100644
index 000000000000..0aaccb914d92
--- /dev/null
+++ b/tpc/malloc2.13/arena.h
@@ -0,0 +1,1092 @@
+/* Malloc implementation for multiple threads without lock contention.
+   Copyright (C) 2001,2002,2003,2004,2005,2006,2007,2009,2010
+   Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Wolfram Gloger <wg@malloc.de>, 2001.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If not,
+   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+#include <stdbool.h>
+
+/* Compile-time constants.  */
+
+#define HEAP_MIN_SIZE (32*1024)
+#ifndef HEAP_MAX_SIZE
+# ifdef DEFAULT_MMAP_THRESHOLD_MAX
+#  define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)
+# else
+#  define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */
+# endif
+#endif
+
+/* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps
+   that are dynamically created for multi-threaded programs.  The
+   maximum size must be a power of two, for fast determination of
+   which heap belongs to a chunk.  It should be much larger than the
+   mmap threshold, so that requests with a size just below that
+   threshold can be fulfilled without creating too many heaps.  */
+
+
+#ifndef THREAD_STATS
+#define THREAD_STATS 0
+#endif
+
+/* If THREAD_STATS is non-zero, some statistics on mutex locking are
+   computed.  */
+
+/***************************************************************************/
+
+#define top(ar_ptr) ((ar_ptr)->top)
+
+/* A heap is a single contiguous memory region holding (coalesceable)
+   malloc_chunks.  It is allocated with mmap() and always starts at an
+   address aligned to HEAP_MAX_SIZE.  Not used unless compiling with
+   USE_ARENAS. */
+
+typedef struct _heap_info {
+  struct malloc_state * ar_ptr; /* Arena for this heap. */
+  struct _heap_info *prev; /* Previous heap. */
+  size_t size;   /* Current size in bytes. */
+  size_t mprotect_size;	/* Size in bytes that has been mprotected
+			   PROT_READ|PROT_WRITE.  */
+  /* Make sure the following data is properly aligned, particularly
+     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
+     MALLOC_ALIGNMENT. */
+  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
+} heap_info;
+
+/* Get a compile-time error if the heap_info padding is not correct
+   to make alignment work as expected in sYSMALLOc.  */
+extern int sanity_check_heap_info_alignment[(sizeof (heap_info)
+					     + 2 * SIZE_SZ) % MALLOC_ALIGNMENT
+					    ? -1 : 1];
+
+/* Thread specific data */
+
+static tsd_key_t arena_key;
+static mutex_t list_lock;
+#ifdef PER_THREAD
+static size_t narenas;
+static struct malloc_state * free_list;
+#endif
+
+#if THREAD_STATS
+static int stat_n_heaps;
+#define THREAD_STAT(x) x
+#else
+#define THREAD_STAT(x) do ; while(0)
+#endif
+
+/* Mapped memory in non-main arenas (reliable only for NO_THREADS). */
+static unsigned long arena_mem;
+
+/* Already initialized? */
+static int __malloc_initialized = -1;
+
+/**************************************************************************/
+
+#if USE_ARENAS
+
+/* arena_get() acquires an arena and locks the corresponding mutex.
+   First, try the one last locked successfully by this thread.  (This
+   is the common case and handled with a macro for speed.)  Then, loop
+   once over the circularly linked list of arenas.  If no arena is
+   readily available, create a new one.  In this latter case, `size'
+   is just a hint as to how much memory will be required immediately
+   in the new arena. */
+
+#define arena_get(ptr, size) do { \
+  arena_lookup(ptr); \
+  arena_lock(ptr, size); \
+} while(0)
+
+#define arena_lookup(ptr) do { \
+  Void_t *vptr = NULL; \
+  ptr = (struct malloc_state *)tsd_getspecific(arena_key, vptr); \
+} while(0)
+
+#ifdef PER_THREAD
+#define arena_lock(ptr, size) do { \
+  if(ptr) \
+    (void)mutex_lock(&ptr->mutex); \
+  else \
+    ptr = arena_get2(ptr, (size)); \
+} while(0)
+#else
+#define arena_lock(ptr, size) do { \
+  if(ptr && !mutex_trylock(&ptr->mutex)) { \
+    THREAD_STAT(++(ptr->stat_lock_direct)); \
+  } else \
+    ptr = arena_get2(ptr, (size)); \
+} while(0)
+#endif
+
+/* find the heap and corresponding arena for a given ptr */
+
+#define heap_for_ptr(ptr) \
+ ((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))
+#define arena_for_chunk(ptr) \
+ (chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
+
+#else /* !USE_ARENAS */
+
+/* There is only one arena, main_arena. */
+
+#if THREAD_STATS
+#define arena_get(ar_ptr, sz) do { \
+  ar_ptr = &main_arena; \
+  if(!mutex_trylock(&ar_ptr->mutex)) \
+    ++(ar_ptr->stat_lock_direct); \
+  else { \
+    (void)mutex_lock(&ar_ptr->mutex); \
+    ++(ar_ptr->stat_lock_wait); \
+  } \
+} while(0)
+#else
+#define arena_get(ar_ptr, sz) do { \
+  ar_ptr = &main_arena; \
+  (void)mutex_lock(&ar_ptr->mutex); \
+} while(0)
+#endif
+#define arena_for_chunk(ptr) (&main_arena)
+
+#endif /* USE_ARENAS */
+
+/**************************************************************************/
+
+#ifndef NO_THREADS
+
+/* atfork support.  */
+
+static __malloc_ptr_t (*save_malloc_hook) (size_t __size,
+					   __const __malloc_ptr_t);
+# if !defined _LIBC || (defined SHARED && !USE___THREAD)
+static __malloc_ptr_t (*save_memalign_hook) (size_t __align, size_t __size,
+					     __const __malloc_ptr_t);
+# endif
+static void           (*save_free_hook) (__malloc_ptr_t __ptr,
+					 __const __malloc_ptr_t);
+static Void_t*        save_arena;
+
+#ifdef ATFORK_MEM
+ATFORK_MEM;
+#endif
+
+/* Magic value for the thread-specific arena pointer when
+   malloc_atfork() is in use.  */
+
+#define ATFORK_ARENA_PTR ((Void_t*)-1)
+
+/* The following hooks are used while the `atfork' handling mechanism
+   is active. */
+
+static Void_t*
+malloc_atfork(size_t sz, const Void_t *caller)
+{
+  Void_t *vptr = NULL;
+  Void_t *victim;
+
+  tsd_getspecific(arena_key, vptr);
+  if(vptr == ATFORK_ARENA_PTR) {
+    /* We are the only thread that may allocate at all.  */
+    if(save_malloc_hook != malloc_check) {
+      return _int_malloc(&main_arena, sz);
+    } else {
+      if(top_check()<0)
+	return 0;
+      victim = _int_malloc(&main_arena, sz+1);
+      return mem2mem_check(victim, sz);
+    }
+  } else {
+    /* Suspend the thread until the `atfork' handlers have completed.
+       By that time, the hooks will have been reset as well, so that
+       mALLOc() can be used again. */
+    (void)mutex_lock(&list_lock);
+    (void)mutex_unlock(&list_lock);
+    return public_mALLOc(sz);
+  }
+}
+
+static void
+free_atfork(Void_t* mem, const Void_t *caller)
+{
+  Void_t *vptr = NULL;
+  struct malloc_state * ar_ptr;
+  mchunkptr p;                          /* chunk corresponding to mem */
+
+  if (mem == 0)                              /* free(0) has no effect */
+    return;
+
+  p = mem2chunk(mem);         /* do not bother to replicate free_check here */
+
+#if HAVE_MMAP
+  if (chunk_is_mmapped(p))                       /* release mmapped memory. */
+  {
+    munmap_chunk(p);
+    return;
+  }
+#endif
+
+#ifdef ATOMIC_FASTBINS
+  ar_ptr = arena_for_chunk(p);
+  tsd_getspecific(arena_key, vptr);
+  _int_free(ar_ptr, p, vptr == ATFORK_ARENA_PTR);
+#else
+  ar_ptr = arena_for_chunk(p);
+  tsd_getspecific(arena_key, vptr);
+  if(vptr != ATFORK_ARENA_PTR)
+    (void)mutex_lock(&ar_ptr->mutex);
+  _int_free(ar_ptr, p);
+  if(vptr != ATFORK_ARENA_PTR)
+    (void)mutex_unlock(&ar_ptr->mutex);
+#endif
+}
+
+
+/* Counter for number of times the list is locked by the same thread.  */
+static unsigned int atfork_recursive_cntr;
+
+/* The following two functions are registered via thread_atfork() to
+   make sure that the mutexes remain in a consistent state in the
+   fork()ed version of a thread.  Also adapt the malloc and free hooks
+   temporarily, because the `atfork' handler mechanism may use
+   malloc/free internally (e.g. in LinuxThreads). */
+
+static void
+ptmalloc_lock_all (void)
+{
+  struct malloc_state * ar_ptr;
+
+  if(__malloc_initialized < 1)
+    return;
+  if (mutex_trylock(&list_lock))
+    {
+      Void_t *my_arena;
+      tsd_getspecific(arena_key, my_arena);
+      if (my_arena == ATFORK_ARENA_PTR)
+	/* This is the same thread which already locks the global list.
+	   Just bump the counter.  */
+	goto out;
+
+      /* This thread has to wait its turn.  */
+      (void)mutex_lock(&list_lock);
+    }
+  for(ar_ptr = &main_arena;;) {
+    (void)mutex_lock(&ar_ptr->mutex);
+    ar_ptr = ar_ptr->next;
+    if(ar_ptr == &main_arena) break;
+  }
+  save_malloc_hook = dlmalloc_hook;
+  save_free_hook = dlfree_hook;
+  dlmalloc_hook = malloc_atfork;
+  dlfree_hook = free_atfork;
+  /* Only the current thread may perform malloc/free calls now. */
+  tsd_getspecific(arena_key, save_arena);
+  tsd_setspecific(arena_key, ATFORK_ARENA_PTR);
+ out:
+  ++atfork_recursive_cntr;
+}
+
+static void
+ptmalloc_unlock_all (void)
+{
+  struct malloc_state * ar_ptr;
+
+  if(__malloc_initialized < 1)
+    return;
+  if (--atfork_recursive_cntr != 0)
+    return;
+  tsd_setspecific(arena_key, save_arena);
+  dlmalloc_hook = save_malloc_hook;
+  dlfree_hook = save_free_hook;
+  for(ar_ptr = &main_arena;;) {
+    (void)mutex_unlock(&ar_ptr->mutex);
+    ar_ptr = ar_ptr->next;
+    if(ar_ptr == &main_arena) break;
+  }
+  (void)mutex_unlock(&list_lock);
+}
+
+#ifdef __linux__
+
+/* In NPTL, unlocking a mutex in the child process after a
+   fork() is currently unsafe, whereas re-initializing it is safe and
+   does not leak resources.  Therefore, a special atfork handler is
+   installed for the child. */
+
+static void
+ptmalloc_unlock_all2 (void)
+{
+  struct malloc_state * ar_ptr;
+
+  if(__malloc_initialized < 1)
+    return;
+#if defined _LIBC || defined MALLOC_HOOKS
+  tsd_setspecific(arena_key, save_arena);
+  dlmalloc_hook = save_malloc_hook;
+  dlfree_hook = save_free_hook;
+#endif
+#ifdef PER_THREAD
+  free_list = NULL;
+#endif
+  for(ar_ptr = &main_arena;;) {
+    mutex_init(&ar_ptr->mutex);
+#ifdef PER_THREAD
+    if (ar_ptr != save_arena) {
+      ar_ptr->next_free = free_list;
+      free_list = ar_ptr;
+    }
+#endif
+    ar_ptr = ar_ptr->next;
+    if(ar_ptr == &main_arena) break;
+  }
+  mutex_init(&list_lock);
+  atfork_recursive_cntr = 0;
+}
+
+#else
+
+#define ptmalloc_unlock_all2 ptmalloc_unlock_all
+
+#endif
+
+#endif /* !defined NO_THREADS */
+
+/* Initialization routine. */
+#ifdef _LIBC
+#include <string.h>
+extern char **_environ;
+
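+/* Return a pointer to the text following the "MALLOC_" prefix of the
+   next environment variable that starts with it, advancing *position
+   past that entry; NULL when no further such variable exists.  */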
+static char *
+internal_function
+next_env_entry (char ***position)
+{
+  char **current = *position;
+  char *result = NULL;
+
+  while (*current != NULL)
+    {
+      if (__builtin_expect ((*current)[0] == 'M', 0)
+	  && (*current)[1] == 'A'
+	  && (*current)[2] == 'L'
+	  && (*current)[3] == 'L'
+	  && (*current)[4] == 'O'
+	  && (*current)[5] == 'C'
+	  && (*current)[6] == '_')
+	{
+	  result = &(*current)[7];
+
+	  /* Save current position for next visit.  */
+	  *position = ++current;
+
+	  break;
+	}
+
+      ++current;
+    }
+
+  return result;
+}
+#endif /* _LIBC */
+
+/* Set up basic state so that _int_malloc et al can work.  */
+static void
+ptmalloc_init_minimal (void)
+{
+#if DEFAULT_TOP_PAD != 0
+  mp_.top_pad        = DEFAULT_TOP_PAD;
+#endif
+  mp_.n_mmaps_max    = DEFAULT_MMAP_MAX;
+  mp_.mmap_threshold = DEFAULT_MMAP_THRESHOLD;
+  mp_.trim_threshold = DEFAULT_TRIM_THRESHOLD;
+  mp_.pagesize       = malloc_getpagesize;
+#ifdef PER_THREAD
+# define NARENAS_FROM_NCORES(n) ((n) * (sizeof(long) == 4 ? 2 : 8))
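+  /* i.e. two arenas per core on targets with 4-byte long, eight per
+     core elsewhere.  */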
+  mp_.arena_test     = NARENAS_FROM_NCORES (1);
+  narenas = 1;
+#endif
+}
+
+
+#ifdef _LIBC
+# ifdef SHARED
+static void *
+__failing_morecore (ptrdiff_t d)
+{
+  return (void *) MORECORE_FAILURE;
+}
+
+extern struct dl_open_hook *_dl_open_hook;
+libc_hidden_proto (_dl_open_hook);
+# endif
+
+# if defined SHARED && !USE___THREAD
+/* This is called by __pthread_initialize_minimal when it needs to use
+   malloc to set up the TLS state.  We cannot do the full work of
+   ptmalloc_init (below) until __pthread_initialize_minimal has finished,
+   so it has to switch to using the special startup-time hooks while doing
+   those allocations.  */
+void
+__libc_malloc_pthread_startup (bool first_time)
+{
+  if (first_time)
+    {
+      ptmalloc_init_minimal ();
+      save_malloc_hook = dlmalloc_hook;
+      save_memalign_hook = dlmemalign_hook;
+      save_free_hook = dlfree_hook;
+      dlmalloc_hook = malloc_starter;
+      dlmemalign_hook = memalign_starter;
+      dlfree_hook = free_starter;
+    }
+  else
+    {
+      dlmalloc_hook = save_malloc_hook;
+      dlmemalign_hook = save_memalign_hook;
+      dlfree_hook = save_free_hook;
+    }
+}
+# endif
+#endif
+
+static void
+ptmalloc_init (void)
+{
+  const char* s;
+  int secure = 0;
+
+  if(__malloc_initialized >= 0) return;
+  __malloc_initialized = 0;
+
+#ifdef _LIBC
+# if defined SHARED && !USE___THREAD
+  /* ptmalloc_init_minimal may already have been called via
+     __libc_malloc_pthread_startup, above.  */
+  if (mp_.pagesize == 0)
+# endif
+#endif
+    ptmalloc_init_minimal();
+
+#ifndef NO_THREADS
+# if defined _LIBC
+  /* We know __pthread_initialize_minimal has already been called,
+     and that is enough.  */
+#   define NO_STARTER
+# endif
+# ifndef NO_STARTER
+  /* With some threads implementations, creating thread-specific data
+     or initializing a mutex may call malloc() itself.  Provide a
+     simple starter version (realloc() won't work). */
+  save_malloc_hook = dlmalloc_hook;
+  save_memalign_hook = dlmemalign_hook;
+  save_free_hook = dlfree_hook;
+  dlmalloc_hook = malloc_starter;
+  dlmemalign_hook = memalign_starter;
+  dlfree_hook = free_starter;
+#  ifdef _LIBC
+  /* Initialize the pthreads interface. */
+  if (__pthread_initialize != NULL)
+    __pthread_initialize();
+#  endif /* defined _LIBC */
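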
+# endif	/* !defined NO_STARTER */
+#endif /* !defined NO_THREADS */
+  mutex_init(&main_arena.mutex);
+  main_arena.next = &main_arena;
+
+#if defined _LIBC && defined SHARED
+  /* In case this libc copy is in a non-default namespace, never use brk.
+     Likewise if dlopened from statically linked program.  */
+  Dl_info di;
+  struct link_map *l;
+
+  if (_dl_open_hook != NULL
+      || (_dl_addr (ptmalloc_init, &di, &l, NULL) != 0
+	  && l->l_ns != LM_ID_BASE))
+    __morecore = __failing_morecore;
+#endif
+
+  mutex_init(&list_lock);
+  tsd_key_create(&arena_key, NULL);
+  tsd_setspecific(arena_key, (Void_t *)&main_arena);
+  thread_atfork(ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
+#ifndef NO_THREADS
+# ifndef NO_STARTER
+  dlmalloc_hook = save_malloc_hook;
+  dlmemalign_hook = save_memalign_hook;
+  dlfree_hook = save_free_hook;
+# else
+#  undef NO_STARTER
+# endif
+#endif
+#ifdef _LIBC
+  secure = __libc_enable_secure;
+  s = NULL;
+  if (__builtin_expect (_environ != NULL, 1))
+    {
+      char **runp = _environ;
+      char *envline;
+
+      while (__builtin_expect ((envline = next_env_entry (&runp)) != NULL,
+			       0))
+	{
+	  size_t len = strcspn (envline, "=");
+
+	  if (envline[len] != '=')
+	    /* This is a "MALLOC_" variable at the end of the string
+	       without a '=' character.  Ignore it since otherwise we
+	       will access invalid memory below.  */
+	    continue;
+
+	  switch (len)
+	    {
+	    case 6:
+	      if (memcmp (envline, "CHECK_", 6) == 0)
+		s = &envline[7];
+	      break;
+	    case 8:
+	      if (! secure)
+		{
+		  if (memcmp (envline, "TOP_PAD_", 8) == 0)
+		    mALLOPt(M_TOP_PAD, atoi(&envline[9]));
+		  else if (memcmp (envline, "PERTURB_", 8) == 0)
+		    mALLOPt(M_PERTURB, atoi(&envline[9]));
+		}
+	      break;
+	    case 9:
+	      if (! secure)
+		{
+		  if (memcmp (envline, "MMAP_MAX_", 9) == 0)
+		    mALLOPt(M_MMAP_MAX, atoi(&envline[10]));
+#ifdef PER_THREAD
+		  else if (memcmp (envline, "ARENA_MAX", 9) == 0)
+		    mALLOPt(M_ARENA_MAX, atoi(&envline[10]));
+#endif
+		}
+	      break;
+#ifdef PER_THREAD
+	    case 10:
+	      if (! secure)
+		{
+		  if (memcmp (envline, "ARENA_TEST", 10) == 0)
+		    mALLOPt(M_ARENA_TEST, atoi(&envline[11]));
+		}
+	      break;
+#endif
+	    case 15:
+	      if (! secure)
+		{
+		  if (memcmp (envline, "TRIM_THRESHOLD_", 15) == 0)
+		    mALLOPt(M_TRIM_THRESHOLD, atoi(&envline[16]));
+		  else if (memcmp (envline, "MMAP_THRESHOLD_", 15) == 0)
+		    mALLOPt(M_MMAP_THRESHOLD, atoi(&envline[16]));
+		}
+	      break;
+	    default:
+	      break;
+	    }
+	}
+    }
+#else
+  if (! secure)
+    {
+      if((s = getenv("MALLOC_TRIM_THRESHOLD_")))
+	mALLOPt(M_TRIM_THRESHOLD, atoi(s));
+      if((s = getenv("MALLOC_TOP_PAD_")))
+	mALLOPt(M_TOP_PAD, atoi(s));
+      if((s = getenv("MALLOC_PERTURB_")))
+	mALLOPt(M_PERTURB, atoi(s));
+      if((s = getenv("MALLOC_MMAP_THRESHOLD_")))
+	mALLOPt(M_MMAP_THRESHOLD, atoi(s));
+      if((s = getenv("MALLOC_MMAP_MAX_")))
+	mALLOPt(M_MMAP_MAX, atoi(s));
+    }
+  s = getenv("MALLOC_CHECK_");
+#endif
+  if(s && s[0]) {
+    mALLOPt(M_CHECK_ACTION, (int)(s[0] - '0'));
+    if (check_action != 0)
+      dlmalloc_check_init();
+  }
+  void (*hook) (void) = force_reg (dlmalloc_initialize_hook);
+  if (hook != NULL)
+    (*hook)();
+  __malloc_initialized = 1;
+}
+
+/* There are platforms (e.g. Hurd) with a link-time hook mechanism. */
+#ifdef thread_atfork_static
+thread_atfork_static(ptmalloc_lock_all, ptmalloc_unlock_all, \
+		     ptmalloc_unlock_all2)
+#endif
+
+\f
+
+/* Managing heaps and arenas (for concurrent threads) */
+
+#if USE_ARENAS
+
+#if MALLOC_DEBUG > 1
+
+/* Print the complete contents of a single heap to stderr. */
+
+static void
+dump_heap(heap_info *heap)
+{
+  char *ptr;
+  mchunkptr p;
+
+  fprintf(stderr, "Heap %p, size %10lx:\n", heap, (long)heap->size);
+  ptr = (heap->ar_ptr != (struct malloc_state *)(heap+1)) ?
+    (char*)(heap + 1) : (char*)(heap + 1) + sizeof(struct malloc_state);
+  p = (mchunkptr)(((unsigned long)ptr + MALLOC_ALIGN_MASK) &
+		  ~MALLOC_ALIGN_MASK);
+  for(;;) {
+    fprintf(stderr, "chunk %p size %10lx", p, (long)p->size);
+    if(p == top(heap->ar_ptr)) {
+      fprintf(stderr, " (top)\n");
+      break;
+    } else if(p->size == (0|PREV_INUSE)) {
+      fprintf(stderr, " (fence)\n");
+      break;
+    }
+    fprintf(stderr, "\n");
+    p = next_chunk(p);
+  }
+}
+
+#endif /* MALLOC_DEBUG > 1 */
+
+/* If consecutive mmap (0, HEAP_MAX_SIZE << 1, ...) calls return decreasing
+   addresses as opposed to increasing, new_heap would badly fragment the
+   address space.  In that case remember the second HEAP_MAX_SIZE part
+   aligned to HEAP_MAX_SIZE from last mmap (0, HEAP_MAX_SIZE << 1, ...)
+   call (if it is already aligned) and try to reuse it next time.  We need
+   no locking for it, as the kernel ensures atomicity for us - in the
+   worst case we'll call mmap (addr, HEAP_MAX_SIZE, ...) with some value
+   of addr in multiple threads, but only one will succeed.  */
+static char *aligned_heap_area;
+
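+/* Map memory for a heap.  Huge pages (MAP_HUGETLB) are tried first; if
+   that succeeds the caller is told via *must_clear to clear the region
+   itself.  Otherwise fall back to an ordinary MAP_NORESERVE mapping,
+   which needs no explicit clearing.  */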
+static void *mmap_for_heap(void *addr, size_t length, int *must_clear)
+{
+	int prot = PROT_READ | PROT_WRITE;
+	int flags = MAP_PRIVATE;
+	void *ret;
+
+	ret = MMAP(addr, length, prot, flags | MAP_HUGETLB);
+	if (ret != MAP_FAILED) {
+		*must_clear = 1;
+		return ret;
+	}
+	*must_clear = 0;
+	return MMAP(addr, length, prot, flags | MAP_NORESERVE);
+}
+
+/* Create a new heap.  size is automatically rounded up to a multiple
+   of the page size. */
+static heap_info *new_heap(size_t size, size_t top_pad)
+{
+	size_t page_mask = malloc_getpagesize - 1;
+	char *p1, *p2;
+	unsigned long ul;
+	heap_info *h;
+	int must_clear;
+
+	if (size + top_pad < HEAP_MIN_SIZE)
+		size = HEAP_MIN_SIZE;
+	else if (size + top_pad <= HEAP_MAX_SIZE)
+		size += top_pad;
+	else if (size > HEAP_MAX_SIZE)
+		return 0;
+	else
+		size = HEAP_MAX_SIZE;
+	size = (size + page_mask) & ~page_mask;
+
+	/* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
+	   No swap space needs to be reserved for the following large
+	   mapping (on Linux, this is the case for all non-writable mappings
+	   anyway). */
+	p2 = MAP_FAILED;
+	if (aligned_heap_area) {
+		p2 = mmap_for_heap(aligned_heap_area, HEAP_MAX_SIZE, &must_clear);
+		aligned_heap_area = NULL;
+		if (p2 != MAP_FAILED && ((unsigned long)p2 & (HEAP_MAX_SIZE - 1))) {
+			munmap(p2, HEAP_MAX_SIZE);
+			p2 = MAP_FAILED;
+		}
+	}
+	if (p2 == MAP_FAILED) {
+		p1 = mmap_for_heap(0, HEAP_MAX_SIZE << 1, &must_clear);
+		if (p1 != MAP_FAILED) {
+			p2 = (char *)(((unsigned long)p1 + (HEAP_MAX_SIZE - 1))
+				      & ~(HEAP_MAX_SIZE - 1));
+			ul = p2 - p1;
+			if (ul)
+				munmap(p1, ul);
+			else
+				aligned_heap_area = p2 + HEAP_MAX_SIZE;
+			munmap(p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
+		} else {
+			/* Try to take the chance that an allocation of only HEAP_MAX_SIZE
+			   is already aligned. */
+			p2 = mmap_for_heap(0, HEAP_MAX_SIZE, &must_clear);
+			if (p2 == MAP_FAILED)
+				return 0;
+			if ((unsigned long)p2 & (HEAP_MAX_SIZE - 1)) {
+				munmap(p2, HEAP_MAX_SIZE);
+				return 0;
+			}
+		}
+	}
+	if (must_clear)
+		memset(p2, 0, HEAP_MAX_SIZE);
+	h = (heap_info *) p2;
+	h->size = size;
+	h->mprotect_size = size;
+	THREAD_STAT(stat_n_heaps++);
+	return h;
+}
+
+/* Grow a heap.  size is automatically rounded up to a
+   multiple of the page size. */
+
+static int
+grow_heap(heap_info *h, long diff)
+{
+  size_t page_mask = malloc_getpagesize - 1;
+  long new_size;
+
+  diff = (diff + page_mask) & ~page_mask;
+  new_size = (long)h->size + diff;
+  if((unsigned long) new_size > (unsigned long) HEAP_MAX_SIZE)
+    return -1;
+  if((unsigned long) new_size > h->mprotect_size) {
+    h->mprotect_size = new_size;
+  }
+
+  h->size = new_size;
+  return 0;
+}
+
+/* Shrink a heap.  */
+
+static int
+shrink_heap(heap_info *h, long diff)
+{
+  long new_size;
+
+  new_size = (long)h->size - diff;
+  if(new_size < (long)sizeof(*h))
+    return -1;
+  /* Give the extra heap space back to the kernel.  MADV_DONTNEED keeps
+     the range mapped but discards its contents, so the pages are
+     demand-zeroed again on the next access. */
+  madvise ((char *)h + new_size, diff, MADV_DONTNEED);
+  /*fprintf(stderr, "shrink %p %08lx\n", h, new_size);*/
+
+  h->size = new_size;
+  return 0;
+}
+
+/* Delete a heap. */
+
+#define delete_heap(heap) \
+  do {								\
+    if ((char *)(heap) + HEAP_MAX_SIZE == aligned_heap_area)	\
+      aligned_heap_area = NULL;					\
+    munmap((char*)(heap), HEAP_MAX_SIZE);			\
+  } while (0)
+
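+/* Return memory from the arena back to the system: unmap trailing heaps
+   that contain nothing but the top chunk, then trim page-sized slack
+   from the heap holding the top chunk, keeping at least pad + MINSIZE
+   bytes available.  Returns 1 if that heap could be shrunk, else 0.  */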
+static int
+internal_function
+heap_trim(heap_info *heap, size_t pad)
+{
+  struct malloc_state * ar_ptr = heap->ar_ptr;
+  unsigned long pagesz = mp_.pagesize;
+  mchunkptr top_chunk = top(ar_ptr), p, bck, fwd;
+  heap_info *prev_heap;
+  long new_size, top_size, extra;
+
+  /* Can this heap go away completely? */
+  while(top_chunk == chunk_at_offset(heap, sizeof(*heap))) {
+    prev_heap = heap->prev;
+    p = chunk_at_offset(prev_heap, prev_heap->size - (MINSIZE-2*SIZE_SZ));
+    assert(p->size == (0|PREV_INUSE)); /* must be fencepost */
+    p = prev_chunk(p);
+    new_size = chunksize(p) + (MINSIZE-2*SIZE_SZ);
+    assert(new_size>0 && new_size<(long)(2*MINSIZE));
+    if(!prev_inuse(p))
+      new_size += p->prev_size;
+    assert(new_size>0 && new_size<HEAP_MAX_SIZE);
+    if(new_size + (HEAP_MAX_SIZE - prev_heap->size) < pad + MINSIZE + pagesz)
+      break;
+    ar_ptr->system_mem -= heap->size;
+    arena_mem -= heap->size;
+    delete_heap(heap);
+    heap = prev_heap;
+    if(!prev_inuse(p)) { /* consolidate backward */
+      p = prev_chunk(p);
+      unlink(p, bck, fwd);
+    }
+    assert(((unsigned long)((char*)p + new_size) & (pagesz-1)) == 0);
+    assert( ((char*)p + new_size) == ((char*)heap + heap->size) );
+    top(ar_ptr) = top_chunk = p;
+    set_head(top_chunk, new_size | PREV_INUSE);
+    /*check_chunk(ar_ptr, top_chunk);*/
+  }
+  top_size = chunksize(top_chunk);
+  extra = (top_size - pad - MINSIZE - 1) & ~(pagesz - 1);
+  if(extra < (long)pagesz)
+    return 0;
+  /* Try to shrink. */
+  if(shrink_heap(heap, extra) != 0)
+    return 0;
+  ar_ptr->system_mem -= extra;
+  arena_mem -= extra;
+
+  /* Success. Adjust top accordingly. */
+  set_head(top_chunk, (top_size - extra) | PREV_INUSE);
+  /*check_chunk(ar_ptr, top_chunk);*/
+  return 1;
+}
+
+/* Create a new arena with initial size "size".  */
+
+static struct malloc_state *
+_int_new_arena(size_t size)
+{
+  struct malloc_state * a;
+  heap_info *h;
+  char *ptr;
+  unsigned long misalign;
+
+  h = new_heap(size + (sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT),
+	       mp_.top_pad);
+  if(!h) {
+    /* Maybe size is too large to fit in a single heap.  So, just try
+       to create a minimally-sized arena and let _int_malloc() attempt
+       to deal with the large request via mmap_chunk().  */
+    h = new_heap(sizeof(*h) + sizeof(*a) + MALLOC_ALIGNMENT, mp_.top_pad);
+    if(!h)
+      return 0;
+  }
+  a = h->ar_ptr = (struct malloc_state *)(h+1);
+  malloc_init_state(a);
+  /*a->next = NULL;*/
+  a->system_mem = a->max_system_mem = h->size;
+  arena_mem += h->size;
+#ifdef NO_THREADS
+  if((unsigned long)(mp_.mmapped_mem + arena_mem + main_arena.system_mem) >
+     mp_.max_total_mem)
+    mp_.max_total_mem = mp_.mmapped_mem + arena_mem + main_arena.system_mem;
+#endif
+
+  /* Set up the top chunk, with proper alignment. */
+  ptr = (char *)(a + 1);
+  misalign = (unsigned long)chunk2mem(ptr) & MALLOC_ALIGN_MASK;
+  if (misalign > 0)
+    ptr += MALLOC_ALIGNMENT - misalign;
+  top(a) = (mchunkptr)ptr;
+  set_head(top(a), (((char*)h + h->size) - ptr) | PREV_INUSE);
+
+  tsd_setspecific(arena_key, (Void_t *)a);
+  mutex_init(&a->mutex);
+  (void)mutex_lock(&a->mutex);
+
+#ifdef PER_THREAD
+  (void)mutex_lock(&list_lock);
+#endif
+
+  /* Add the new arena to the global list.  */
+  a->next = main_arena.next;
+  atomic_write_barrier ();
+  main_arena.next = a;
+
+#ifdef PER_THREAD
+  ++narenas;
+
+  (void)mutex_unlock(&list_lock);
+#endif
+
+  THREAD_STAT(++(a->stat_lock_loop));
+
+  return a;
+}
+
+
+#ifdef PER_THREAD
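+/* Arenas detached from exiting threads are parked on free_list (see
+   arena_thread_freeres at the end of this file).  Hand one of them,
+   already locked and installed as the thread's arena, to the caller if
+   any is available.  */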
+static struct malloc_state *
+get_free_list (void)
+{
+  struct malloc_state * result = free_list;
+  if (result != NULL)
+    {
+      (void)mutex_lock(&list_lock);
+      result = free_list;
+      if (result != NULL)
+	free_list = result->next_free;
+      (void)mutex_unlock(&list_lock);
+
+      if (result != NULL)
+	{
+	  (void)mutex_lock(&result->mutex);
+	  tsd_setspecific(arena_key, (Void_t *)result);
+	  THREAD_STAT(++(result->stat_lock_loop));
+	}
+    }
+
+  return result;
+}
+
+
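+/* Once the arena limit is reached, reuse an existing arena rather than
+   creating another one: starting from a round-robin cursor, take the
+   first arena whose mutex can be acquired without blocking, or block on
+   the starting arena if every one of them is busy.  */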
+static struct malloc_state *
+reused_arena (void)
+{
+  if (narenas <= mp_.arena_test)
+    return NULL;
+
+  static int narenas_limit;
+  if (narenas_limit == 0)
+    {
+      if (mp_.arena_max != 0)
+	narenas_limit = mp_.arena_max;
+      else
+	{
+	  int n  = __get_nprocs ();
+
+	  if (n >= 1)
+	    narenas_limit = NARENAS_FROM_NCORES (n);
+	  else
+	    /* We have no information about the system.  Assume two
+	       cores.  */
+	    narenas_limit = NARENAS_FROM_NCORES (2);
+	}
+    }
+
+  if (narenas < narenas_limit)
+    return NULL;
+
+  struct malloc_state * result;
+  static struct malloc_state * next_to_use;
+  if (next_to_use == NULL)
+    next_to_use = &main_arena;
+
+  result = next_to_use;
+  do
+    {
+      if (!mutex_trylock(&result->mutex))
+	goto out;
+
+      result = result->next;
+    }
+  while (result != next_to_use);
+
+  /* No arena available.  Wait for the next in line.  */
+  (void)mutex_lock(&result->mutex);
+
+ out:
+  tsd_setspecific(arena_key, (Void_t *)result);
+  THREAD_STAT(++(result->stat_lock_loop));
+  next_to_use = result->next;
+
+  return result;
+}
+#endif
+
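+/* Slow path of arena_get(): the calling thread either has no arena yet
+   or could not lock its cached one without blocking.  Find or create an
+   arena, lock it, and remember it in the thread-specific data.  */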
+static struct malloc_state *
+internal_function
+arena_get2(struct malloc_state * a_tsd, size_t size)
+{
+  struct malloc_state * a;
+
+#ifdef PER_THREAD
+  if ((a = get_free_list ()) == NULL
+      && (a = reused_arena ()) == NULL)
+    /* Nothing immediately available, so generate a new arena.  */
+    a = _int_new_arena(size);
+#else
+  if(!a_tsd)
+    a = a_tsd = &main_arena;
+  else {
+    a = a_tsd->next;
+    if(!a) {
+      /* This can only happen while initializing the new arena. */
+      (void)mutex_lock(&main_arena.mutex);
+      THREAD_STAT(++(main_arena.stat_lock_wait));
+      return &main_arena;
+    }
+  }
+
+  /* Check the global, circularly linked list for available arenas. */
+  bool retried = false;
+ repeat:
+  do {
+    if(!mutex_trylock(&a->mutex)) {
+      if (retried)
+	(void)mutex_unlock(&list_lock);
+      THREAD_STAT(++(a->stat_lock_loop));
+      tsd_setspecific(arena_key, (Void_t *)a);
+      return a;
+    }
+    a = a->next;
+  } while(a != a_tsd);
+
+  /* If not even the list_lock can be obtained, try again.  This can
+     happen during `atfork', or for example on systems where thread
+     creation makes it temporarily impossible to obtain _any_
+     locks. */
+  if(!retried && mutex_trylock(&list_lock)) {
+    /* Block here rather than spin in a busy loop.  */
+    (void)mutex_lock(&list_lock);
+
+    /* Since we blocked there might be an arena available now.  */
+    retried = true;
+    a = a_tsd;
+    goto repeat;
+  }
+
+  /* Nothing immediately available, so generate a new arena.  */
+  a = _int_new_arena(size);
+  (void)mutex_unlock(&list_lock);
+#endif
+
+  return a;
+}
+
+#ifdef PER_THREAD
+static void __attribute__ ((section ("__libc_thread_freeres_fn")))
+arena_thread_freeres (void)
+{
+  Void_t *vptr = NULL;
+  struct malloc_state * a = tsd_getspecific(arena_key, vptr);
+  tsd_setspecific(arena_key, NULL);
+
+  if (a != NULL)
+    {
+      (void)mutex_lock(&list_lock);
+      a->next_free = free_list;
+      free_list = a;
+      (void)mutex_unlock(&list_lock);
+    }
+}
+text_set_element (__libc_thread_subfreeres, arena_thread_freeres);
+#endif
+
+#endif /* USE_ARENAS */
+
+/*
+ * Local variables:
+ * c-basic-offset: 2
+ * End:
+ */
diff --git a/tpc/malloc2.13/hooks.ch b/tpc/malloc2.13/hooks.ch
deleted file mode 100644
index 05cfafbb78ba..000000000000
--- a/tpc/malloc2.13/hooks.ch
+++ /dev/null
@@ -1,643 +0,0 @@
-/* Malloc implementation for multiple threads without lock contention.
-   Copyright (C) 2001-2006, 2007, 2008, 2009 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Wolfram Gloger <wg@malloc.de>, 2001.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If not,
-   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
-   Boston, MA 02111-1307, USA.  */
-
-/* What to do if the standard debugging hooks are in place and a
-   corrupt pointer is detected: do nothing (0), print an error message
-   (1), or call abort() (2). */
-
-/* Hooks for debugging versions.  The initial hooks just call the
-   initialization routine, then do the normal work. */
-
-static Void_t*
-malloc_hook_ini(size_t sz, const __malloc_ptr_t caller)
-{
-  dlmalloc_hook = NULL;
-  ptmalloc_init();
-  return public_mALLOc(sz);
-}
-
-static Void_t*
-realloc_hook_ini(Void_t* ptr, size_t sz, const __malloc_ptr_t caller)
-{
-  dlmalloc_hook = NULL;
-  dlrealloc_hook = NULL;
-  ptmalloc_init();
-  return public_rEALLOc(ptr, sz);
-}
-
-static Void_t*
-memalign_hook_ini(size_t alignment, size_t sz, const __malloc_ptr_t caller)
-{
-  dlmemalign_hook = NULL;
-  ptmalloc_init();
-  return public_mEMALIGn(alignment, sz);
-}
-
-/* Whether we are using malloc checking.  */
-static int using_malloc_checking;
-
-/* A flag that is set by malloc_set_state, to signal that malloc checking
-   must not be enabled on the request from the user (via the MALLOC_CHECK_
-   environment variable).  It is reset by __malloc_check_init to tell
-   malloc_set_state that the user has requested malloc checking.
-
-   The purpose of this flag is to make sure that malloc checking is not
-   enabled when the heap to be restored was constructed without malloc
-   checking, and thus does not contain the required magic bytes.
-   Otherwise the heap would be corrupted by calls to free and realloc.  If
-   it turns out that the heap was created with malloc checking and the
-   user has requested it malloc_set_state just calls __malloc_check_init
-   again to enable it.  On the other hand, reusing such a heap without
-   further malloc checking is safe.  */
-static int disallow_malloc_check;
-
-/* Activate a standard set of debugging hooks. */
-void
-dlmalloc_check_init()
-{
-  if (disallow_malloc_check) {
-    disallow_malloc_check = 0;
-    return;
-  }
-  using_malloc_checking = 1;
-  dlmalloc_hook = malloc_check;
-  dlfree_hook = free_check;
-  dlrealloc_hook = realloc_check;
-  dlmemalign_hook = memalign_check;
-}
-
-/* A simple, standard set of debugging hooks.  Overhead is `only' one
-   byte per chunk; still this will catch most cases of double frees or
-   overruns.  The goal here is to avoid obscure crashes due to invalid
-   usage, unlike in the MALLOC_DEBUG code. */
-
-#define MAGICBYTE(p) ( ( ((size_t)p >> 3) ^ ((size_t)p >> 11)) & 0xFF )
-
-/* Instrument a chunk with overrun detector byte(s) and convert it
-   into a user pointer with requested size sz. */
-
-static Void_t*
-internal_function
-mem2mem_check(Void_t *ptr, size_t sz)
-{
-  mchunkptr p;
-  unsigned char* m_ptr = (unsigned char*)BOUNDED_N(ptr, sz);
-  size_t i;
-
-  if (!ptr)
-    return ptr;
-  p = mem2chunk(ptr);
-  for(i = chunksize(p) - (chunk_is_mmapped(p) ? 2*SIZE_SZ+1 : SIZE_SZ+1);
-      i > sz;
-      i -= 0xFF) {
-    if(i-sz < 0x100) {
-      m_ptr[i] = (unsigned char)(i-sz);
-      break;
-    }
-    m_ptr[i] = 0xFF;
-  }
-  m_ptr[sz] = MAGICBYTE(p);
-  return (Void_t*)m_ptr;
-}
-
-/* Convert a pointer to be free()d or realloc()ed to a valid chunk
-   pointer.  If the provided pointer is not valid, return NULL. */
-
-static mchunkptr
-internal_function
-mem2chunk_check(Void_t* mem, unsigned char **magic_p)
-{
-  mchunkptr p;
-  INTERNAL_SIZE_T sz, c;
-  unsigned char magic;
-
-  if(!aligned_OK(mem)) return NULL;
-  p = mem2chunk(mem);
-  if (!chunk_is_mmapped(p)) {
-    /* Must be a chunk in conventional heap memory. */
-    int contig = contiguous(&main_arena);
-    sz = chunksize(p);
-    if((contig &&
-	((char*)p<mp_.sbrk_base ||
-	 ((char*)p + sz)>=(mp_.sbrk_base+main_arena.system_mem) )) ||
-       sz<MINSIZE || sz&MALLOC_ALIGN_MASK || !inuse(p) ||
-       ( !prev_inuse(p) && (p->prev_size&MALLOC_ALIGN_MASK ||
-			    (contig && (char*)prev_chunk(p)<mp_.sbrk_base) ||
-			    next_chunk(prev_chunk(p))!=p) ))
-      return NULL;
-    magic = MAGICBYTE(p);
-    for(sz += SIZE_SZ-1; (c = ((unsigned char*)p)[sz]) != magic; sz -= c) {
-      if(c<=0 || sz<(c+2*SIZE_SZ)) return NULL;
-    }
-  } else {
-    unsigned long offset, page_mask = malloc_getpagesize-1;
-
-    /* mmap()ed chunks have MALLOC_ALIGNMENT or higher power-of-two
-       alignment relative to the beginning of a page.  Check this
-       first. */
-    offset = (unsigned long)mem & page_mask;
-    if((offset!=MALLOC_ALIGNMENT && offset!=0 && offset!=0x10 &&
-	offset!=0x20 && offset!=0x40 && offset!=0x80 && offset!=0x100 &&
-	offset!=0x200 && offset!=0x400 && offset!=0x800 && offset!=0x1000 &&
-	offset<0x2000) ||
-       !chunk_is_mmapped(p) || (p->size & PREV_INUSE) ||
-       ( (((unsigned long)p - p->prev_size) & page_mask) != 0 ) ||
-       ( (sz = chunksize(p)), ((p->prev_size + sz) & page_mask) != 0 ) )
-      return NULL;
-    magic = MAGICBYTE(p);
-    for(sz -= 1; (c = ((unsigned char*)p)[sz]) != magic; sz -= c) {
-      if(c<=0 || sz<(c+2*SIZE_SZ)) return NULL;
-    }
-  }
-  ((unsigned char*)p)[sz] ^= 0xFF;
-  if (magic_p)
-    *magic_p = (unsigned char *)p + sz;
-  return p;
-}
-
-/* Check for corruption of the top chunk, and try to recover if
-   necessary. */
-
-static int
-internal_function
-top_check(void)
-{
-  mchunkptr t = top(&main_arena);
-  char* brk, * new_brk;
-  INTERNAL_SIZE_T front_misalign, sbrk_size;
-  unsigned long pagesz = malloc_getpagesize;
-
-  if (t == initial_top(&main_arena) ||
-      (!chunk_is_mmapped(t) &&
-       chunksize(t)>=MINSIZE &&
-       prev_inuse(t) &&
-       (!contiguous(&main_arena) ||
-	(char*)t + chunksize(t) == mp_.sbrk_base + main_arena.system_mem)))
-    return 0;
-
-  malloc_printerr (check_action, "malloc: top chunk is corrupt", t);
-
-  /* Try to set up a new top chunk. */
-  brk = MORECORE(0);
-  front_misalign = (unsigned long)chunk2mem(brk) & MALLOC_ALIGN_MASK;
-  if (front_misalign > 0)
-    front_misalign = MALLOC_ALIGNMENT - front_misalign;
-  sbrk_size = front_misalign + mp_.top_pad + MINSIZE;
-  sbrk_size += pagesz - ((unsigned long)(brk + sbrk_size) & (pagesz - 1));
-  new_brk = (char*)(MORECORE (sbrk_size));
-  if (new_brk == (char*)(MORECORE_FAILURE))
-    {
-      MALLOC_FAILURE_ACTION;
-      return -1;
-    }
-  /* Call the `morecore' hook if necessary.  */
-  void (*hook) (void) = force_reg (dlafter_morecore_hook);
-  if (hook)
-    (*hook) ();
-  main_arena.system_mem = (new_brk - mp_.sbrk_base) + sbrk_size;
-
-  top(&main_arena) = (mchunkptr)(brk + front_misalign);
-  set_head(top(&main_arena), (sbrk_size - front_misalign) | PREV_INUSE);
-
-  return 0;
-}
-
-static Void_t*
-malloc_check(size_t sz, const Void_t *caller)
-{
-  Void_t *victim;
-
-  if (sz+1 == 0) {
-    MALLOC_FAILURE_ACTION;
-    return NULL;
-  }
-
-  (void)mutex_lock(&main_arena.mutex);
-  victim = (top_check() >= 0) ? _int_malloc(&main_arena, sz+1) : NULL;
-  (void)mutex_unlock(&main_arena.mutex);
-  return mem2mem_check(victim, sz);
-}
-
-static void
-free_check(Void_t* mem, const Void_t *caller)
-{
-  mchunkptr p;
-
-  if(!mem) return;
-  (void)mutex_lock(&main_arena.mutex);
-  p = mem2chunk_check(mem, NULL);
-  if(!p) {
-    (void)mutex_unlock(&main_arena.mutex);
-
-    malloc_printerr(check_action, "free(): invalid pointer", mem);
-    return;
-  }
-#if HAVE_MMAP
-  if (chunk_is_mmapped(p)) {
-    (void)mutex_unlock(&main_arena.mutex);
-    munmap_chunk(p);
-    return;
-  }
-#endif
-#if 0 /* Erase freed memory. */
-  memset(mem, 0, chunksize(p) - (SIZE_SZ+1));
-#endif
-#ifdef ATOMIC_FASTBINS
-  _int_free(&main_arena, p, 1);
-#else
-  _int_free(&main_arena, p);
-#endif
-  (void)mutex_unlock(&main_arena.mutex);
-}
-
-static Void_t*
-realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
-{
-  INTERNAL_SIZE_T nb;
-  Void_t* newmem = 0;
-  unsigned char *magic_p;
-
-  if (bytes+1 == 0) {
-    MALLOC_FAILURE_ACTION;
-    return NULL;
-  }
-  if (oldmem == 0) return malloc_check(bytes, NULL);
-  if (bytes == 0) {
-    free_check (oldmem, NULL);
-    return NULL;
-  }
-  (void)mutex_lock(&main_arena.mutex);
-  const mchunkptr oldp = mem2chunk_check(oldmem, &magic_p);
-  (void)mutex_unlock(&main_arena.mutex);
-  if(!oldp) {
-    malloc_printerr(check_action, "realloc(): invalid pointer", oldmem);
-    return malloc_check(bytes, NULL);
-  }
-  const INTERNAL_SIZE_T oldsize = chunksize(oldp);
-
-  checked_request2size(bytes+1, nb);
-  (void)mutex_lock(&main_arena.mutex);
-
-#if HAVE_MMAP
-  if (chunk_is_mmapped(oldp)) {
-#if HAVE_MREMAP
-    mchunkptr newp = mremap_chunk(oldp, nb);
-    if(newp)
-      newmem = chunk2mem(newp);
-    else
-#endif
-    {
-      /* Note the extra SIZE_SZ overhead. */
-      if(oldsize - SIZE_SZ >= nb)
-	newmem = oldmem; /* do nothing */
-      else {
-	/* Must alloc, copy, free. */
-	if (top_check() >= 0)
-	  newmem = _int_malloc(&main_arena, bytes+1);
-	if (newmem) {
-	  MALLOC_COPY(BOUNDED_N(newmem, bytes+1), oldmem, oldsize - 2*SIZE_SZ);
-	  munmap_chunk(oldp);
-	}
-      }
-    }
-  } else {
-#endif /* HAVE_MMAP */
-    if (top_check() >= 0) {
-      INTERNAL_SIZE_T nb;
-      checked_request2size(bytes + 1, nb);
-      newmem = _int_realloc(&main_arena, oldp, oldsize, nb);
-    }
-#if 0 /* Erase freed memory. */
-    if(newmem)
-      newp = mem2chunk(newmem);
-    nb = chunksize(newp);
-    if(oldp<newp || oldp>=chunk_at_offset(newp, nb)) {
-      memset((char*)oldmem + 2*sizeof(mbinptr), 0,
-	     oldsize - (2*sizeof(mbinptr)+2*SIZE_SZ+1));
-    } else if(nb > oldsize+SIZE_SZ) {
-      memset((char*)BOUNDED_N(chunk2mem(newp), bytes) + oldsize,
-	     0, nb - (oldsize+SIZE_SZ));
-    }
-#endif
-#if HAVE_MMAP
-  }
-#endif
-
-  /* mem2chunk_check changed the magic byte in the old chunk.
-     If newmem is NULL, then the old chunk will still be used though,
-     so we need to invert that change here.  */
-  if (newmem == NULL) *magic_p ^= 0xFF;
-
-  (void)mutex_unlock(&main_arena.mutex);
-
-  return mem2mem_check(newmem, bytes);
-}
-
-static Void_t*
-memalign_check(size_t alignment, size_t bytes, const Void_t *caller)
-{
-  INTERNAL_SIZE_T nb;
-  Void_t* mem;
-
-  if (alignment <= MALLOC_ALIGNMENT) return malloc_check(bytes, NULL);
-  if (alignment <  MINSIZE) alignment = MINSIZE;
-
-  if (bytes+1 == 0) {
-    MALLOC_FAILURE_ACTION;
-    return NULL;
-  }
-  checked_request2size(bytes+1, nb);
-  (void)mutex_lock(&main_arena.mutex);
-  mem = (top_check() >= 0) ? _int_memalign(&main_arena, alignment, bytes+1) :
-    NULL;
-  (void)mutex_unlock(&main_arena.mutex);
-  return mem2mem_check(mem, bytes);
-}
-
-#ifndef NO_THREADS
-
-# ifdef _LIBC
-#  if USE___THREAD || !defined SHARED
-    /* These routines are never needed in this configuration.  */
-#   define NO_STARTER
-#  endif
-# endif
-
-# ifdef NO_STARTER
-#  undef NO_STARTER
-# else
-
-/* The following hooks are used when the global initialization in
-   ptmalloc_init() hasn't completed yet. */
-
-static Void_t*
-malloc_starter(size_t sz, const Void_t *caller)
-{
-  Void_t* victim;
-
-  victim = _int_malloc(&main_arena, sz);
-
-  return victim ? BOUNDED_N(victim, sz) : 0;
-}
-
-static Void_t*
-memalign_starter(size_t align, size_t sz, const Void_t *caller)
-{
-  Void_t* victim;
-
-  victim = _int_memalign(&main_arena, align, sz);
-
-  return victim ? BOUNDED_N(victim, sz) : 0;
-}
-
-static void
-free_starter(Void_t* mem, const Void_t *caller)
-{
-  mchunkptr p;
-
-  if(!mem) return;
-  p = mem2chunk(mem);
-#if HAVE_MMAP
-  if (chunk_is_mmapped(p)) {
-    munmap_chunk(p);
-    return;
-  }
-#endif
-#ifdef ATOMIC_FASTBINS
-  _int_free(&main_arena, p, 1);
-#else
-  _int_free(&main_arena, p);
-#endif
-}
-
-# endif	/* !defiend NO_STARTER */
-#endif /* NO_THREADS */
-
-
-/* Get/set state: malloc_get_state() records the current state of all
-   malloc variables (_except_ for the actual heap contents and `hook'
-   function pointers) in a system dependent, opaque data structure.
-   This data structure is dynamically allocated and can be free()d
-   after use.  malloc_set_state() restores the state of all malloc
-   variables to the previously obtained state.  This is especially
-   useful when using this malloc as part of a shared library, and when
-   the heap contents are saved/restored via some other method.  The
-   primary example for this is GNU Emacs with its `dumping' procedure.
-   `Hook' function pointers are never saved or restored by these
-   functions, with two exceptions: If malloc checking was in use when
-   malloc_get_state() was called, then malloc_set_state() calls
-   __malloc_check_init() if possible; if malloc checking was not in
-   use in the recorded state but the user requested malloc checking,
-   then the hooks are reset to 0.  */
-
-#define MALLOC_STATE_MAGIC   0x444c4541l
-#define MALLOC_STATE_VERSION (0*0x100l + 4l) /* major*0x100 + minor */
-
-struct malloc_save_state {
-  long          magic;
-  long          version;
-  mbinptr       av[NBINS * 2 + 2];
-  char*         sbrk_base;
-  int           sbrked_mem_bytes;
-  unsigned long trim_threshold;
-  unsigned long top_pad;
-  unsigned int  n_mmaps_max;
-  unsigned long mmap_threshold;
-  int           check_action;
-  unsigned long max_sbrked_mem;
-  unsigned long max_total_mem;
-  unsigned int  n_mmaps;
-  unsigned int  max_n_mmaps;
-  unsigned long mmapped_mem;
-  unsigned long max_mmapped_mem;
-  int           using_malloc_checking;
-  unsigned long max_fast;
-  unsigned long arena_test;
-  unsigned long arena_max;
-  unsigned long narenas;
-};
-
-Void_t*
-public_gET_STATe(void)
-{
-  struct malloc_save_state* ms;
-  int i;
-  mbinptr b;
-
-  ms = (struct malloc_save_state*)public_mALLOc(sizeof(*ms));
-  if (!ms)
-    return 0;
-  (void)mutex_lock(&main_arena.mutex);
-  malloc_consolidate(&main_arena);
-  ms->magic = MALLOC_STATE_MAGIC;
-  ms->version = MALLOC_STATE_VERSION;
-  ms->av[0] = 0;
-  ms->av[1] = 0; /* used to be binblocks, now no longer used */
-  ms->av[2] = top(&main_arena);
-  ms->av[3] = 0; /* used to be undefined */
-  for(i=1; i<NBINS; i++) {
-    b = bin_at(&main_arena, i);
-    if(first(b) == b)
-      ms->av[2*i+2] = ms->av[2*i+3] = 0; /* empty bin */
-    else {
-      ms->av[2*i+2] = first(b);
-      ms->av[2*i+3] = last(b);
-    }
-  }
-  ms->sbrk_base = mp_.sbrk_base;
-  ms->sbrked_mem_bytes = main_arena.system_mem;
-  ms->trim_threshold = mp_.trim_threshold;
-  ms->top_pad = mp_.top_pad;
-  ms->n_mmaps_max = mp_.n_mmaps_max;
-  ms->mmap_threshold = mp_.mmap_threshold;
-  ms->check_action = check_action;
-  ms->max_sbrked_mem = main_arena.max_system_mem;
-#ifdef NO_THREADS
-  ms->max_total_mem = mp_.max_total_mem;
-#else
-  ms->max_total_mem = 0;
-#endif
-  ms->n_mmaps = mp_.n_mmaps;
-  ms->max_n_mmaps = mp_.max_n_mmaps;
-  ms->mmapped_mem = mp_.mmapped_mem;
-  ms->max_mmapped_mem = mp_.max_mmapped_mem;
-  ms->using_malloc_checking = using_malloc_checking;
-  ms->max_fast = get_max_fast();
-#ifdef PER_THREAD
-  ms->arena_test = mp_.arena_test;
-  ms->arena_max = mp_.arena_max;
-  ms->narenas = narenas;
-#endif
-  (void)mutex_unlock(&main_arena.mutex);
-  return (Void_t*)ms;
-}
-
-int
-public_sET_STATe(Void_t* msptr)
-{
-  struct malloc_save_state* ms = (struct malloc_save_state*)msptr;
-  size_t i;
-  mbinptr b;
-
-  disallow_malloc_check = 1;
-  ptmalloc_init();
-  if(ms->magic != MALLOC_STATE_MAGIC) return -1;
-  /* Must fail if the major version is too high. */
-  if((ms->version & ~0xffl) > (MALLOC_STATE_VERSION & ~0xffl)) return -2;
-  (void)mutex_lock(&main_arena.mutex);
-  /* There are no fastchunks.  */
-  clear_fastchunks(&main_arena);
-  if (ms->version >= 4)
-    set_max_fast(ms->max_fast);
-  else
-    set_max_fast(64);	/* 64 used to be the value we always used.  */
-  for (i=0; i<NFASTBINS; ++i)
-    fastbin (&main_arena, i) = 0;
-  for (i=0; i<BINMAPSIZE; ++i)
-    main_arena.binmap[i] = 0;
-  top(&main_arena) = ms->av[2];
-  main_arena.last_remainder = 0;
-  for(i=1; i<NBINS; i++) {
-    b = bin_at(&main_arena, i);
-    if(ms->av[2*i+2] == 0) {
-      assert(ms->av[2*i+3] == 0);
-      first(b) = last(b) = b;
-    } else {
-      if(ms->version >= 3 &&
-	 (i<NSMALLBINS || (largebin_index(chunksize(ms->av[2*i+2]))==i &&
-			   largebin_index(chunksize(ms->av[2*i+3]))==i))) {
-	first(b) = ms->av[2*i+2];
-	last(b) = ms->av[2*i+3];
-	/* Make sure the links to the bins within the heap are correct.  */
-	first(b)->bk = b;
-	last(b)->fd = b;
-	/* Set bit in binblocks.  */
-	mark_bin(&main_arena, i);
-      } else {
-	/* Oops, index computation from chunksize must have changed.
-	   Link the whole list into unsorted_chunks.  */
-	first(b) = last(b) = b;
-	b = unsorted_chunks(&main_arena);
-	ms->av[2*i+2]->bk = b;
-	ms->av[2*i+3]->fd = b->fd;
-	b->fd->bk = ms->av[2*i+3];
-	b->fd = ms->av[2*i+2];
-      }
-    }
-  }
-  if (ms->version < 3) {
-    /* Clear fd_nextsize and bk_nextsize fields.  */
-    b = unsorted_chunks(&main_arena)->fd;
-    while (b != unsorted_chunks(&main_arena)) {
-      if (!in_smallbin_range(chunksize(b))) {
-	b->fd_nextsize = NULL;
-	b->bk_nextsize = NULL;
-      }
-      b = b->fd;
-    }
-  }
-  mp_.sbrk_base = ms->sbrk_base;
-  main_arena.system_mem = ms->sbrked_mem_bytes;
-  mp_.trim_threshold = ms->trim_threshold;
-  mp_.top_pad = ms->top_pad;
-  mp_.n_mmaps_max = ms->n_mmaps_max;
-  mp_.mmap_threshold = ms->mmap_threshold;
-  check_action = ms->check_action;
-  main_arena.max_system_mem = ms->max_sbrked_mem;
-#ifdef NO_THREADS
-  mp_.max_total_mem = ms->max_total_mem;
-#endif
-  mp_.n_mmaps = ms->n_mmaps;
-  mp_.max_n_mmaps = ms->max_n_mmaps;
-  mp_.mmapped_mem = ms->mmapped_mem;
-  mp_.max_mmapped_mem = ms->max_mmapped_mem;
-  /* add version-dependent code here */
-  if (ms->version >= 1) {
-    /* Check whether it is safe to enable malloc checking, or whether
-       it is necessary to disable it.  */
-    if (ms->using_malloc_checking && !using_malloc_checking &&
-	!disallow_malloc_check)
-      dlmalloc_check_init ();
-    else if (!ms->using_malloc_checking && using_malloc_checking) {
-      dlmalloc_hook = NULL;
-      dlfree_hook = NULL;
-      dlrealloc_hook = NULL;
-      dlmemalign_hook = NULL;
-      using_malloc_checking = 0;
-    }
-  }
-  if (ms->version >= 4) {
-#ifdef PER_THREAD
-    mp_.arena_test = ms->arena_test;
-    mp_.arena_max = ms->arena_max;
-    narenas = ms->narenas;
-#endif
-  }
-  check_malloc_state(&main_arena);
-
-  (void)mutex_unlock(&main_arena.mutex);
-  return 0;
-}
-
-/*
- * Local variables:
- * c-basic-offset: 2
- * End:
- */
diff --git a/tpc/malloc2.13/hooks.h b/tpc/malloc2.13/hooks.h
new file mode 100644
index 000000000000..05cfafbb78ba
--- /dev/null
+++ b/tpc/malloc2.13/hooks.h
@@ -0,0 +1,643 @@
+/* Malloc implementation for multiple threads without lock contention.
+   Copyright (C) 2001-2006, 2007, 2008, 2009 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Wolfram Gloger <wg@malloc.de>, 2001.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If not,
+   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* What to do if the standard debugging hooks are in place and a
+   corrupt pointer is detected: do nothing (0), print an error message
+   (1), or call abort() (2). */
+
+/* Hooks for debugging versions.  The initial hooks just call the
+   initialization routine, then do the normal work. */
+
+static Void_t*
+malloc_hook_ini(size_t sz, const __malloc_ptr_t caller)
+{
+  dlmalloc_hook = NULL;
+  ptmalloc_init();
+  return public_mALLOc(sz);
+}
+
+static Void_t*
+realloc_hook_ini(Void_t* ptr, size_t sz, const __malloc_ptr_t caller)
+{
+  dlmalloc_hook = NULL;
+  dlrealloc_hook = NULL;
+  ptmalloc_init();
+  return public_rEALLOc(ptr, sz);
+}
+
+static Void_t*
+memalign_hook_ini(size_t alignment, size_t sz, const __malloc_ptr_t caller)
+{
+  dlmemalign_hook = NULL;
+  ptmalloc_init();
+  return public_mEMALIGn(alignment, sz);
+}
+
+/* Whether we are using malloc checking.  */
+static int using_malloc_checking;
+
+/* A flag that is set by malloc_set_state, to signal that malloc checking
+   must not be enabled on the request from the user (via the MALLOC_CHECK_
+   environment variable).  It is reset by __malloc_check_init to tell
+   malloc_set_state that the user has requested malloc checking.
+
+   The purpose of this flag is to make sure that malloc checking is not
+   enabled when the heap to be restored was constructed without malloc
+   checking, and thus does not contain the required magic bytes.
+   Otherwise the heap would be corrupted by calls to free and realloc.  If
+   it turns out that the heap was created with malloc checking and the
+   user has requested it malloc_set_state just calls __malloc_check_init
+   again to enable it.  On the other hand, reusing such a heap without
+   further malloc checking is safe.  */
+static int disallow_malloc_check;
+
+/* Activate a standard set of debugging hooks. */
+void
+dlmalloc_check_init()
+{
+  if (disallow_malloc_check) {
+    disallow_malloc_check = 0;
+    return;
+  }
+  using_malloc_checking = 1;
+  dlmalloc_hook = malloc_check;
+  dlfree_hook = free_check;
+  dlrealloc_hook = realloc_check;
+  dlmemalign_hook = memalign_check;
+}
+
+/* A simple, standard set of debugging hooks.  Overhead is `only' one
+   byte per chunk; still this will catch most cases of double frees or
+   overruns.  The goal here is to avoid obscure crashes due to invalid
+   usage, unlike in the MALLOC_DEBUG code. */
+
+#define MAGICBYTE(p) ( ( ((size_t)p >> 3) ^ ((size_t)p >> 11)) & 0xFF )
+
+/* Instrument a chunk with overrun detector byte(s) and convert it
+   into a user pointer with requested size sz. */
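+/* The trailing bytes of the chunk encode, in steps of at most 0xFF, the
+   distance back to offset sz, where a magic byte derived from the chunk
+   address is stored; mem2chunk_check() below follows this chain from
+   the end of the chunk to locate and verify that byte.  */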
+
+static Void_t*
+internal_function
+mem2mem_check(Void_t *ptr, size_t sz)
+{
+  mchunkptr p;
+  unsigned char* m_ptr = (unsigned char*)BOUNDED_N(ptr, sz);
+  size_t i;
+
+  if (!ptr)
+    return ptr;
+  p = mem2chunk(ptr);
+  for(i = chunksize(p) - (chunk_is_mmapped(p) ? 2*SIZE_SZ+1 : SIZE_SZ+1);
+      i > sz;
+      i -= 0xFF) {
+    if(i-sz < 0x100) {
+      m_ptr[i] = (unsigned char)(i-sz);
+      break;
+    }
+    m_ptr[i] = 0xFF;
+  }
+  m_ptr[sz] = MAGICBYTE(p);
+  return (Void_t*)m_ptr;
+}
+
+/* Convert a pointer to be free()d or realloc()ed to a valid chunk
+   pointer.  If the provided pointer is not valid, return NULL. */
+
+static mchunkptr
+internal_function
+mem2chunk_check(Void_t* mem, unsigned char **magic_p)
+{
+  mchunkptr p;
+  INTERNAL_SIZE_T sz, c;
+  unsigned char magic;
+
+  if(!aligned_OK(mem)) return NULL;
+  p = mem2chunk(mem);
+  if (!chunk_is_mmapped(p)) {
+    /* Must be a chunk in conventional heap memory. */
+    int contig = contiguous(&main_arena);
+    sz = chunksize(p);
+    if((contig &&
+	((char*)p<mp_.sbrk_base ||
+	 ((char*)p + sz)>=(mp_.sbrk_base+main_arena.system_mem) )) ||
+       sz<MINSIZE || sz&MALLOC_ALIGN_MASK || !inuse(p) ||
+       ( !prev_inuse(p) && (p->prev_size&MALLOC_ALIGN_MASK ||
+			    (contig && (char*)prev_chunk(p)<mp_.sbrk_base) ||
+			    next_chunk(prev_chunk(p))!=p) ))
+      return NULL;
+    magic = MAGICBYTE(p);
+    for(sz += SIZE_SZ-1; (c = ((unsigned char*)p)[sz]) != magic; sz -= c) {
+      if(c<=0 || sz<(c+2*SIZE_SZ)) return NULL;
+    }
+  } else {
+    unsigned long offset, page_mask = malloc_getpagesize-1;
+
+    /* mmap()ed chunks have MALLOC_ALIGNMENT or higher power-of-two
+       alignment relative to the beginning of a page.  Check this
+       first. */
+    offset = (unsigned long)mem & page_mask;
+    if((offset!=MALLOC_ALIGNMENT && offset!=0 && offset!=0x10 &&
+	offset!=0x20 && offset!=0x40 && offset!=0x80 && offset!=0x100 &&
+	offset!=0x200 && offset!=0x400 && offset!=0x800 && offset!=0x1000 &&
+	offset<0x2000) ||
+       !chunk_is_mmapped(p) || (p->size & PREV_INUSE) ||
+       ( (((unsigned long)p - p->prev_size) & page_mask) != 0 ) ||
+       ( (sz = chunksize(p)), ((p->prev_size + sz) & page_mask) != 0 ) )
+      return NULL;
+    magic = MAGICBYTE(p);
+    for(sz -= 1; (c = ((unsigned char*)p)[sz]) != magic; sz -= c) {
+      if(c<=0 || sz<(c+2*SIZE_SZ)) return NULL;
+    }
+  }
+  ((unsigned char*)p)[sz] ^= 0xFF;
+  if (magic_p)
+    *magic_p = (unsigned char *)p + sz;
+  return p;
+}
+
+/* Check for corruption of the top chunk, and try to recover if
+   necessary. */
+
+static int
+internal_function
+top_check(void)
+{
+  mchunkptr t = top(&main_arena);
+  char* brk, * new_brk;
+  INTERNAL_SIZE_T front_misalign, sbrk_size;
+  unsigned long pagesz = malloc_getpagesize;
+
+  if (t == initial_top(&main_arena) ||
+      (!chunk_is_mmapped(t) &&
+       chunksize(t)>=MINSIZE &&
+       prev_inuse(t) &&
+       (!contiguous(&main_arena) ||
+	(char*)t + chunksize(t) == mp_.sbrk_base + main_arena.system_mem)))
+    return 0;
+
+  malloc_printerr (check_action, "malloc: top chunk is corrupt", t);
+
+  /* Try to set up a new top chunk. */
+  brk = MORECORE(0);
+  front_misalign = (unsigned long)chunk2mem(brk) & MALLOC_ALIGN_MASK;
+  if (front_misalign > 0)
+    front_misalign = MALLOC_ALIGNMENT - front_misalign;
+  sbrk_size = front_misalign + mp_.top_pad + MINSIZE;
+  sbrk_size += pagesz - ((unsigned long)(brk + sbrk_size) & (pagesz - 1));
+  new_brk = (char*)(MORECORE (sbrk_size));
+  if (new_brk == (char*)(MORECORE_FAILURE))
+    {
+      MALLOC_FAILURE_ACTION;
+      return -1;
+    }
+  /* Call the `morecore' hook if necessary.  */
+  void (*hook) (void) = force_reg (dlafter_morecore_hook);
+  if (hook)
+    (*hook) ();
+  main_arena.system_mem = (new_brk - mp_.sbrk_base) + sbrk_size;
+
+  top(&main_arena) = (mchunkptr)(brk + front_misalign);
+  set_head(top(&main_arena), (sbrk_size - front_misalign) | PREV_INUSE);
+
+  return 0;
+}
+
+static Void_t*
+malloc_check(size_t sz, const Void_t *caller)
+{
+  Void_t *victim;
+
+  if (sz+1 == 0) {
+    MALLOC_FAILURE_ACTION;
+    return NULL;
+  }
+
+  (void)mutex_lock(&main_arena.mutex);
+  victim = (top_check() >= 0) ? _int_malloc(&main_arena, sz+1) : NULL;
+  (void)mutex_unlock(&main_arena.mutex);
+  return mem2mem_check(victim, sz);
+}
+
+static void
+free_check(Void_t* mem, const Void_t *caller)
+{
+  mchunkptr p;
+
+  if(!mem) return;
+  (void)mutex_lock(&main_arena.mutex);
+  p = mem2chunk_check(mem, NULL);
+  if(!p) {
+    (void)mutex_unlock(&main_arena.mutex);
+
+    malloc_printerr(check_action, "free(): invalid pointer", mem);
+    return;
+  }
+#if HAVE_MMAP
+  if (chunk_is_mmapped(p)) {
+    (void)mutex_unlock(&main_arena.mutex);
+    munmap_chunk(p);
+    return;
+  }
+#endif
+#if 0 /* Erase freed memory. */
+  memset(mem, 0, chunksize(p) - (SIZE_SZ+1));
+#endif
+#ifdef ATOMIC_FASTBINS
+  _int_free(&main_arena, p, 1);
+#else
+  _int_free(&main_arena, p);
+#endif
+  (void)mutex_unlock(&main_arena.mutex);
+}
+
+static Void_t*
+realloc_check(Void_t* oldmem, size_t bytes, const Void_t *caller)
+{
+  INTERNAL_SIZE_T nb;
+  Void_t* newmem = 0;
+  unsigned char *magic_p;
+
+  if (bytes+1 == 0) {
+    MALLOC_FAILURE_ACTION;
+    return NULL;
+  }
+  if (oldmem == 0) return malloc_check(bytes, NULL);
+  if (bytes == 0) {
+    free_check (oldmem, NULL);
+    return NULL;
+  }
+  (void)mutex_lock(&main_arena.mutex);
+  const mchunkptr oldp = mem2chunk_check(oldmem, &magic_p);
+  (void)mutex_unlock(&main_arena.mutex);
+  if(!oldp) {
+    malloc_printerr(check_action, "realloc(): invalid pointer", oldmem);
+    return malloc_check(bytes, NULL);
+  }
+  const INTERNAL_SIZE_T oldsize = chunksize(oldp);
+
+  checked_request2size(bytes+1, nb);
+  (void)mutex_lock(&main_arena.mutex);
+
+#if HAVE_MMAP
+  if (chunk_is_mmapped(oldp)) {
+#if HAVE_MREMAP
+    mchunkptr newp = mremap_chunk(oldp, nb);
+    if(newp)
+      newmem = chunk2mem(newp);
+    else
+#endif
+    {
+      /* Note the extra SIZE_SZ overhead. */
+      if(oldsize - SIZE_SZ >= nb)
+	newmem = oldmem; /* do nothing */
+      else {
+	/* Must alloc, copy, free. */
+	if (top_check() >= 0)
+	  newmem = _int_malloc(&main_arena, bytes+1);
+	if (newmem) {
+	  MALLOC_COPY(BOUNDED_N(newmem, bytes+1), oldmem, oldsize - 2*SIZE_SZ);
+	  munmap_chunk(oldp);
+	}
+      }
+    }
+  } else {
+#endif /* HAVE_MMAP */
+    if (top_check() >= 0) {
+      INTERNAL_SIZE_T nb;
+      checked_request2size(bytes + 1, nb);
+      newmem = _int_realloc(&main_arena, oldp, oldsize, nb);
+    }
+#if 0 /* Erase freed memory. */
+    if(newmem)
+      newp = mem2chunk(newmem);
+    nb = chunksize(newp);
+    if(oldp<newp || oldp>=chunk_at_offset(newp, nb)) {
+      memset((char*)oldmem + 2*sizeof(mbinptr), 0,
+	     oldsize - (2*sizeof(mbinptr)+2*SIZE_SZ+1));
+    } else if(nb > oldsize+SIZE_SZ) {
+      memset((char*)BOUNDED_N(chunk2mem(newp), bytes) + oldsize,
+	     0, nb - (oldsize+SIZE_SZ));
+    }
+#endif
+#if HAVE_MMAP
+  }
+#endif
+
+  /* mem2chunk_check changed the magic byte in the old chunk.
+     If newmem is NULL, then the old chunk will still be used though,
+     so we need to invert that change here.  */
+  if (newmem == NULL) *magic_p ^= 0xFF;
+
+  (void)mutex_unlock(&main_arena.mutex);
+
+  return mem2mem_check(newmem, bytes);
+}
+
+static Void_t*
+memalign_check(size_t alignment, size_t bytes, const Void_t *caller)
+{
+  INTERNAL_SIZE_T nb;
+  Void_t* mem;
+
+  if (alignment <= MALLOC_ALIGNMENT) return malloc_check(bytes, NULL);
+  if (alignment <  MINSIZE) alignment = MINSIZE;
+
+  if (bytes+1 == 0) {
+    MALLOC_FAILURE_ACTION;
+    return NULL;
+  }
+  checked_request2size(bytes+1, nb);
+  (void)mutex_lock(&main_arena.mutex);
+  mem = (top_check() >= 0) ? _int_memalign(&main_arena, alignment, bytes+1) :
+    NULL;
+  (void)mutex_unlock(&main_arena.mutex);
+  return mem2mem_check(mem, bytes);
+}
+
+#ifndef NO_THREADS
+
+# ifdef _LIBC
+#  if USE___THREAD || !defined SHARED
+    /* These routines are never needed in this configuration.  */
+#   define NO_STARTER
+#  endif
+# endif
+
+# ifdef NO_STARTER
+#  undef NO_STARTER
+# else
+
+/* The following hooks are used when the global initialization in
+   ptmalloc_init() hasn't completed yet. */
+
+static Void_t*
+malloc_starter(size_t sz, const Void_t *caller)
+{
+  Void_t* victim;
+
+  victim = _int_malloc(&main_arena, sz);
+
+  return victim ? BOUNDED_N(victim, sz) : 0;
+}
+
+static Void_t*
+memalign_starter(size_t align, size_t sz, const Void_t *caller)
+{
+  Void_t* victim;
+
+  victim = _int_memalign(&main_arena, align, sz);
+
+  return victim ? BOUNDED_N(victim, sz) : 0;
+}
+
+static void
+free_starter(Void_t* mem, const Void_t *caller)
+{
+  mchunkptr p;
+
+  if(!mem) return;
+  p = mem2chunk(mem);
+#if HAVE_MMAP
+  if (chunk_is_mmapped(p)) {
+    munmap_chunk(p);
+    return;
+  }
+#endif
+#ifdef ATOMIC_FASTBINS
+  _int_free(&main_arena, p, 1);
+#else
+  _int_free(&main_arena, p);
+#endif
+}
+
+# endif	/* !defined NO_STARTER */
+#endif /* NO_THREADS */
+
+
+/* Get/set state: malloc_get_state() records the current state of all
+   malloc variables (_except_ for the actual heap contents and `hook'
+   function pointers) in a system dependent, opaque data structure.
+   This data structure is dynamically allocated and can be free()d
+   after use.  malloc_set_state() restores the state of all malloc
+   variables to the previously obtained state.  This is especially
+   useful when using this malloc as part of a shared library, and when
+   the heap contents are saved/restored via some other method.  The
+   primary example for this is GNU Emacs with its `dumping' procedure.
+   `Hook' function pointers are never saved or restored by these
+   functions, with two exceptions: If malloc checking was in use when
+   malloc_get_state() was called, then malloc_set_state() calls
+   __malloc_check_init() if possible; if malloc checking was not in
+   use in the recorded state but the user requested malloc checking,
+   then the hooks are reset to 0.  */
+
+#define MALLOC_STATE_MAGIC   0x444c4541l
+#define MALLOC_STATE_VERSION (0*0x100l + 4l) /* major*0x100 + minor */
+
+struct malloc_save_state {
+  long          magic;
+  long          version;
+  mbinptr       av[NBINS * 2 + 2];
+  char*         sbrk_base;
+  int           sbrked_mem_bytes;
+  unsigned long trim_threshold;
+  unsigned long top_pad;
+  unsigned int  n_mmaps_max;
+  unsigned long mmap_threshold;
+  int           check_action;
+  unsigned long max_sbrked_mem;
+  unsigned long max_total_mem;
+  unsigned int  n_mmaps;
+  unsigned int  max_n_mmaps;
+  unsigned long mmapped_mem;
+  unsigned long max_mmapped_mem;
+  int           using_malloc_checking;
+  unsigned long max_fast;
+  unsigned long arena_test;
+  unsigned long arena_max;
+  unsigned long narenas;
+};
+
+Void_t*
+public_gET_STATe(void)
+{
+  struct malloc_save_state* ms;
+  int i;
+  mbinptr b;
+
+  ms = (struct malloc_save_state*)public_mALLOc(sizeof(*ms));
+  if (!ms)
+    return 0;
+  (void)mutex_lock(&main_arena.mutex);
+  malloc_consolidate(&main_arena);
+  ms->magic = MALLOC_STATE_MAGIC;
+  ms->version = MALLOC_STATE_VERSION;
+  ms->av[0] = 0;
+  ms->av[1] = 0; /* used to be binblocks, now no longer used */
+  ms->av[2] = top(&main_arena);
+  ms->av[3] = 0; /* used to be undefined */
+  for(i=1; i<NBINS; i++) {
+    b = bin_at(&main_arena, i);
+    if(first(b) == b)
+      ms->av[2*i+2] = ms->av[2*i+3] = 0; /* empty bin */
+    else {
+      ms->av[2*i+2] = first(b);
+      ms->av[2*i+3] = last(b);
+    }
+  }
+  ms->sbrk_base = mp_.sbrk_base;
+  ms->sbrked_mem_bytes = main_arena.system_mem;
+  ms->trim_threshold = mp_.trim_threshold;
+  ms->top_pad = mp_.top_pad;
+  ms->n_mmaps_max = mp_.n_mmaps_max;
+  ms->mmap_threshold = mp_.mmap_threshold;
+  ms->check_action = check_action;
+  ms->max_sbrked_mem = main_arena.max_system_mem;
+#ifdef NO_THREADS
+  ms->max_total_mem = mp_.max_total_mem;
+#else
+  ms->max_total_mem = 0;
+#endif
+  ms->n_mmaps = mp_.n_mmaps;
+  ms->max_n_mmaps = mp_.max_n_mmaps;
+  ms->mmapped_mem = mp_.mmapped_mem;
+  ms->max_mmapped_mem = mp_.max_mmapped_mem;
+  ms->using_malloc_checking = using_malloc_checking;
+  ms->max_fast = get_max_fast();
+#ifdef PER_THREAD
+  ms->arena_test = mp_.arena_test;
+  ms->arena_max = mp_.arena_max;
+  ms->narenas = narenas;
+#endif
+  (void)mutex_unlock(&main_arena.mutex);
+  return (Void_t*)ms;
+}
+
+int
+public_sET_STATe(Void_t* msptr)
+{
+  struct malloc_save_state* ms = (struct malloc_save_state*)msptr;
+  size_t i;
+  mbinptr b;
+
+  disallow_malloc_check = 1;
+  ptmalloc_init();
+  if(ms->magic != MALLOC_STATE_MAGIC) return -1;
+  /* Must fail if the major version is too high. */
+  if((ms->version & ~0xffl) > (MALLOC_STATE_VERSION & ~0xffl)) return -2;
+  (void)mutex_lock(&main_arena.mutex);
+  /* There are no fastchunks.  */
+  clear_fastchunks(&main_arena);
+  if (ms->version >= 4)
+    set_max_fast(ms->max_fast);
+  else
+    set_max_fast(64);	/* 64 used to be the value we always used.  */
+  for (i=0; i<NFASTBINS; ++i)
+    fastbin (&main_arena, i) = 0;
+  for (i=0; i<BINMAPSIZE; ++i)
+    main_arena.binmap[i] = 0;
+  top(&main_arena) = ms->av[2];
+  main_arena.last_remainder = 0;
+  for(i=1; i<NBINS; i++) {
+    b = bin_at(&main_arena, i);
+    if(ms->av[2*i+2] == 0) {
+      assert(ms->av[2*i+3] == 0);
+      first(b) = last(b) = b;
+    } else {
+      if(ms->version >= 3 &&
+	 (i<NSMALLBINS || (largebin_index(chunksize(ms->av[2*i+2]))==i &&
+			   largebin_index(chunksize(ms->av[2*i+3]))==i))) {
+	first(b) = ms->av[2*i+2];
+	last(b) = ms->av[2*i+3];
+	/* Make sure the links to the bins within the heap are correct.  */
+	first(b)->bk = b;
+	last(b)->fd = b;
+	/* Set bit in binblocks.  */
+	mark_bin(&main_arena, i);
+      } else {
+	/* Oops, index computation from chunksize must have changed.
+	   Link the whole list into unsorted_chunks.  */
+	first(b) = last(b) = b;
+	b = unsorted_chunks(&main_arena);
+	ms->av[2*i+2]->bk = b;
+	ms->av[2*i+3]->fd = b->fd;
+	b->fd->bk = ms->av[2*i+3];
+	b->fd = ms->av[2*i+2];
+      }
+    }
+  }
+  if (ms->version < 3) {
+    /* Clear fd_nextsize and bk_nextsize fields.  */
+    b = unsorted_chunks(&main_arena)->fd;
+    while (b != unsorted_chunks(&main_arena)) {
+      if (!in_smallbin_range(chunksize(b))) {
+	b->fd_nextsize = NULL;
+	b->bk_nextsize = NULL;
+      }
+      b = b->fd;
+    }
+  }
+  mp_.sbrk_base = ms->sbrk_base;
+  main_arena.system_mem = ms->sbrked_mem_bytes;
+  mp_.trim_threshold = ms->trim_threshold;
+  mp_.top_pad = ms->top_pad;
+  mp_.n_mmaps_max = ms->n_mmaps_max;
+  mp_.mmap_threshold = ms->mmap_threshold;
+  check_action = ms->check_action;
+  main_arena.max_system_mem = ms->max_sbrked_mem;
+#ifdef NO_THREADS
+  mp_.max_total_mem = ms->max_total_mem;
+#endif
+  mp_.n_mmaps = ms->n_mmaps;
+  mp_.max_n_mmaps = ms->max_n_mmaps;
+  mp_.mmapped_mem = ms->mmapped_mem;
+  mp_.max_mmapped_mem = ms->max_mmapped_mem;
+  /* add version-dependent code here */
+  if (ms->version >= 1) {
+    /* Check whether it is safe to enable malloc checking, or whether
+       it is necessary to disable it.  */
+    if (ms->using_malloc_checking && !using_malloc_checking &&
+	!disallow_malloc_check)
+      dlmalloc_check_init ();
+    else if (!ms->using_malloc_checking && using_malloc_checking) {
+      dlmalloc_hook = NULL;
+      dlfree_hook = NULL;
+      dlrealloc_hook = NULL;
+      dlmemalign_hook = NULL;
+      using_malloc_checking = 0;
+    }
+  }
+  if (ms->version >= 4) {
+#ifdef PER_THREAD
+    mp_.arena_test = ms->arena_test;
+    mp_.arena_max = ms->arena_max;
+    narenas = ms->narenas;
+#endif
+  }
+  check_malloc_state(&main_arena);
+
+  (void)mutex_unlock(&main_arena.mutex);
+  return 0;
+}
+
+/*
+ * Local variables:
+ * c-basic-offset: 2
+ * End:
+ */
diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
index 6b75c9a6beb0..f1c7b219a0bd 100644
--- a/tpc/malloc2.13/malloc.c
+++ b/tpc/malloc2.13/malloc.c
@@ -2450,7 +2450,7 @@ static int perturb_byte;
 
 
 /* ------------------- Support for multiple arenas -------------------- */
-#include "arena.ch"
+#include "arena.h"
 
 /*
   Debugging support
@@ -2813,7 +2813,7 @@ static void do_check_malloc_state(struct malloc_state * av)
 
 
 /* ----------------- Support for debugging hooks -------------------- */
-#include "hooks.ch"
+#include "hooks.h"
 
 
 /* ----------- Routines dealing with system allocation -------------- */
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [PATCH] malloc: allow recursion from ptmalloc_init to malloc
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (61 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: Don't call tsd_setspecific before tsd_key_create Joern Engel
@ 2016-01-26  0:32 ` Joern Engel
  2016-01-26  0:50 ` malloc: performance improvements and bugfixes Paul Eggert
  2016-01-28 13:51 ` Carlos O'Donell
  64 siblings, 0 replies; 119+ messages in thread
From: Joern Engel @ 2016-01-26  0:32 UTC (permalink / raw)
  To: GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

numa_node_count() works by essentially counting subdirectories in
/sys/devices/system/node.  Completely insane, compared to
_SC_NPROCESSORS_CONF, but apparently that is our interface for now.

On some machines, opendir() allocates memory and recurses back into
malloc.  Solve that by initially using main_arena for every node and
allowing the initializing thread to bypass initialization if it
recurses.  We do one allocation from main_arena, everything thereafter
should come from numa-local arenas.
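
For illustration, the counting described above boils down to something
like the sketch below (an approximation, not the exact code in arena.h;
the helper name is made up).  Note that it is the opendir() call itself
that can allocate and recurse into malloc:

#include <dirent.h>
#include <string.h>

/* Count the "nodeN" entries under sysfs; fall back to one node on error. */
static int count_numa_nodes_sketch(void)
{
	DIR *dir = opendir("/sys/devices/system/node");
	struct dirent *de;
	int nodes = 0;

	if (!dir)
		return 1;
	while ((de = readdir(dir)) != NULL) {
		if (strncmp(de->d_name, "node", 4) == 0 &&
		    de->d_name[4] >= '0' && de->d_name[4] <= '9')
			nodes++;
	}
	closedir(dir);
	return nodes ? nodes : 1;
}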

JIRA: PURE-35526
---
 tpc/malloc2.13/arena.h | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tpc/malloc2.13/arena.h b/tpc/malloc2.13/arena.h
index 3e6107fbc4a4..7de7436a30ba 100644
--- a/tpc/malloc2.13/arena.h
+++ b/tpc/malloc2.13/arena.h
@@ -302,6 +302,7 @@ static int num_nodes = 0;
 #include <sys/types.h>
 #include <dirent.h>
 #include <string.h>
+#include <sys/syscall.h>
 
 /*
  * Wouldn't it be nice to get this with a single syscall instead?
@@ -328,13 +329,26 @@ static int numa_node_count(void)
 	return ret;
 }
 
+static inline pid_t gettid(void)
+{
+	return syscall(SYS_gettid);
+}
+
 static void ptmalloc_init(void)
 {
 	const char *s;
 	int i, secure = 0;
+	static pid_t init_tid;
 
 	if (!__sync_bool_compare_and_swap(&__malloc_initialized, -1, 0)) {
 		do {
+			if (init_tid == gettid()) {
+				/* We have recursed back into malloc()
+				   from ptmalloc_init.  At this point
+				   we can survive by using the main_arena,
+				   so just return. */
+				return;
+			}
 			sched_yield();
 		} while (__malloc_initialized <= 0);
 		return;
@@ -356,7 +370,15 @@ static void ptmalloc_init(void)
 #endif				/* !defined NO_THREADS */
 	mutex_init(&main_arena.mutex);
 	main_arena.next = &main_arena;
+	main_arena.local_next = &main_arena;
 	main_arena.numa_node = -1;
+
+	/* numa_node_count() can recurse into malloc().  Use main_arena
+	   for all numa nodes and set init_tid to allow recursion. */
+	for (i = 0; i < MAX_NUMA_NODES; i++) {
+		numa_arena[i] = &main_arena;
+	}
+	init_tid = gettid();
 	num_nodes = numa_node_count();
 	for (i = 0; i < num_nodes; i++) {
 		numa_arena[i] = _int_new_arena(0, i);
@@ -779,7 +801,6 @@ static struct malloc_state *arena_get2(struct malloc_state *a_tsd, size_t size)
  * Calling getcpu() for every allocation is too expensive - but we can turn
  * the syscall into a pointer dereference to a kernel shared memory page.
  */
-#include <sys/syscall.h>
 static inline int getnode(void)
 {
 	int node, ret;
-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (62 preceding siblings ...)
  2016-01-26  0:32 ` [PATCH] malloc: allow recursion from ptmalloc_init to malloc Joern Engel
@ 2016-01-26  0:50 ` Paul Eggert
  2016-01-26  1:01   ` Jörn Engel
                     ` (2 more replies)
  2016-01-28 13:51 ` Carlos O'Donell
  64 siblings, 3 replies; 119+ messages in thread
From: Paul Eggert @ 2016-01-26  0:50 UTC (permalink / raw)
  To: Joern Engel, GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

Thanks for doing all the work and bringing it to our attention. A couple 
of comments on the process:

On 01/25/2016 04:24 PM, Joern Engel wrote:
> I happen to prefer the kernel coding style over GNU coding style.

Nevertheless, let's keep the GNU coding style for glibc. In some places 
the existing code doesn't conform to that style but that can be fixed as 
we go.

> I believe there are some applications that
> have deeper knowledge about malloc-internal data structures than they
> should (*cough*emacs).  As a result it has become impossible to change
> the internals of malloc without breaking said applications

This underestimates Emacs. :-)

Emacs "knows" so much about glibc malloc's internal data structures that 
Emacs should do the right thing if glibc removes the hooks in question. 
Of course we should test the resulting combination. However, the point 
is that Emacs's usage of glibc malloc internals shouldn't ossify glibc 
malloc.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  0:50 ` malloc: performance improvements and bugfixes Paul Eggert
@ 2016-01-26  1:01   ` Jörn Engel
  2016-01-26  1:52     ` Siddhesh Poyarekar
  2016-01-26  1:52     ` Joseph Myers
  2016-01-26 20:50   ` Steven Munroe
  2016-01-27 21:45   ` Carlos O'Donell
  2 siblings, 2 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-26  1:01 UTC (permalink / raw)
  To: Paul Eggert; +Cc: GNU C. Library, Siddhesh Poyarekar, Joern Engel

On Mon, Jan 25, 2016 at 04:50:44PM -0800, Paul Eggert wrote:
> Thanks for doing all the work and bringing it to our attention. A couple of
> comments on the process:
> 
> On 01/25/2016 04:24 PM, Joern Engel wrote:
> >I happen to prefer the kernel coding style over GNU coding style.
> 
> Nevertheless, let's keep the GNU coding style for glibc. In some places the
> existing code doesn't conform to that style but that can be fixed as we go.

Agreed.  I thought I mentioned that this is a braindump and not a
patchset for submission. ;)

And since I seem to have messed up patch numbering, here is the order
they should have:
0001-malloc-kill-mprotect.patch
0002-malloc-use-MAP_HUGETLB-when-possible.patch
0003-malloc-Lindent-new_heap.patch
0004-malloc-push-down-the-memset-for-huge-pages.patch
0005-malloc-unifdef-D__STD_C.patch
0006-malloc-remove-mstate-typedef.patch
0007-malloc-rename-.ch-to-.h.patch
0008-malloc-remove-dead-code.patch
0009-malloc-unifdef-m-DUSE_ARENAS-DHAVE_MMAP.patch
0010-malloc-unifdef-m-Ulibc_hidden_def.patch
0011-malloc-Lindent-users-of-arena_get2.patch
0012-malloc-introduce-get_backup_arena.patch
0013-malloc-Lindent-before-functional-changes.patch
0014-malloc-unifdef-m-UPER_THREAD-U_LIBC.patch
0015-malloc-initial-numa-support.patch
0016-malloc-remove-emacs-style-guards.patch
0017-malloc-turn-arena_get-into-a-function.patch
0018-malloc-use-mbind.patch
0019-malloc-unobfuscate-an-assert.patch
0020-malloc-remove-__builtin_expect.patch
0021-malloc-Revert-glibc-1d05c2fb9c6f.patch
0022-malloc-use-tsd_getspecific-for-arena_get.patch
0023-malloc-hide-THREAD_STATS.patch
0024-malloc-unifdef-m-UATOMIC_FASTBINS.patch
0025-malloc-Lindent-public_fREe.patch
0026-malloc-create-a-useful-assert.patch
0027-malloc-fix-mbind-on-old-kernels.patch
0028-malloc-brain-dead-thread-cache.patch
0029-malloc-prefetch-for-tcache_malloc.patch
0030-malloc-use-bitmap-to-conserve-hot-bins.patch
0031-malloc-use-atomic-free-list.patch
0032-malloc-tune-thread-cache.patch
0033-malloc-avoid-main_arena.patch
0034-malloc-add-documentation.patch
0035-malloc-fix-local_next-handling.patch
0036-malloc-always-free-objects-locklessly.patch
0037-malloc-only-free-half-the-objects-on-tcache_gc.patch
0038-malloc-destroy-thread-cache-on-thread-exit.patch
0039-malloc-s-max_node-num_nodes.patch
0040-malloc-better-inline-documentation.patch
0041-malloc-fix-hard-coded-constant.patch
0042-malloc-document-and-fix-linked-list-handling.patch
0043-malloc-limit-free_atomic_list-latency.patch
0044-malloc-make-numa_node_count-more-robust.patch
0045-malloc-fix-startup-races.patch
0046-malloc-quenche-last-compiler-warnings.patch
0047-malloc-simplify-and-fix-calloc.patch
0048-malloc-remove-stale-condition.patch
0049-malloc-fix-perturb_byte-handling-for-tcache.patch
0050-malloc-move-out-perturb_byte-conditionals.patch
0051-malloc-add-locking-to-thread-cache.patch
0052-malloc-plug-thread-cache-memory-leak.patch
0053-malloc-allow-recursion-from-ptmalloc_init-to-malloc.patch
0054-malloc-remove-get_backup_arena-from-tcache_malloc.patch
0055-malloc-fix-calculation-of-aligned-heaps.patch
0056-malloc-remove-hooks-from-malloc-and-free.patch
0057-malloc-remove-tcache-prefetching.patch
0058-malloc-create-aliases-for-malloc-free.patch
0059-malloc-remove-atfork-hooks.patch
0060-malloc-remove-all-remaining-hooks.patch
0061-malloc-define-__libc_memalign.patch
0062-malloc-Don-t-call-tsd_setspecific-before-tsd_key_cre.patch
0063-malloc-speed-up-mmap.patch

Jörn

--
Happiness isn't having what you want, it's wanting what you have.
-- unknown

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  1:01   ` Jörn Engel
  2016-01-26  1:52     ` Siddhesh Poyarekar
@ 2016-01-26  1:52     ` Joseph Myers
  1 sibling, 0 replies; 119+ messages in thread
From: Joseph Myers @ 2016-01-26  1:52 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Paul Eggert, GNU C. Library, Siddhesh Poyarekar, Joern Engel

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On Mon, 25 Jan 2016, Jörn Engel wrote:

> On Mon, Jan 25, 2016 at 04:50:44PM -0800, Paul Eggert wrote:
> > Thanks for doing all the work and bringing it to our attention. A couple of
> > comments on the process:
> > 
> > On 01/25/2016 04:24 PM, Joern Engel wrote:
> > >I happen to prefer the kernel coding style over GNU coding style.
> > 
> > Nevertheless, let's keep the GNU coding style for glibc. In some places the
> > existing code doesn't conform to that style but that can be fixed as we go.
> 
> Agreed.  I thought I mentioned that this is a braindump and not a
> patchset for submission. ;)

And without a copyright assignment on file, people should only look at the 
descriptions of issues found with glibc malloc in case those issues are 
still applicable to current glibc malloc and so give ideas for improving 
it, and avoid looking at the patches themselves.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  1:01   ` Jörn Engel
@ 2016-01-26  1:52     ` Siddhesh Poyarekar
  2016-01-26  2:45       ` Jörn Engel
  2016-01-26  1:52     ` Joseph Myers
  1 sibling, 1 reply; 119+ messages in thread
From: Siddhesh Poyarekar @ 2016-01-26  1:52 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Paul Eggert, GNU C. Library, Joern Engel

On 26 January 2016 at 06:30, Jörn Engel <joern@purestorage.com> wrote:
> Agreed.  I thought I mentioned that this is a braindump and not a
> patchset for submission. ;)

Thank you for doing this. There is however the issue of copyright
assignment.  I don't have access to the copyright list, so could you
(or Carlos, Joseph, etc.) please tell me if you have signed it?
Without that, not only is your work unusable for us (since it is a
significant change), it also makes it difficult for someone with a
copyright assignment to write an independent implementation since
they're 'tainted' by your patch.

FTR, I have only seen your patch to revert the mmap_threshold changes
which I don't agree with - if you don't want to overcommit, you just
set overcommit to 0 and mprotect is never called.  I don't know how
you're hitting that case with overcommit set to 0.

Siddhesh
-- 
http://siddhesh.in

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  1:52     ` Siddhesh Poyarekar
@ 2016-01-26  2:45       ` Jörn Engel
  2016-01-26  3:22         ` Jörn Engel
                           ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-26  2:45 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Paul Eggert, GNU C. Library, Joern Engel

On Tue, Jan 26, 2016 at 07:22:32AM +0530, Siddhesh Poyarekar wrote:
> On 26 January 2016 at 06:30, Jörn Engel <joern@purestorage.com> wrote:
> > Agreed.  I thought I mentioned that this is a braindump and not a
> > patchset for submission. ;)
> 
> Thank you for doing this. There is however the issue of copyright
> assignment.  I don't have access to the copyright list, so could you
> (or Carlos, Joseph, etc.) please tell me if you have signed it?

I have not and will not.  Or at least someone with a silver tongue would
have to spend significant time explaining the advantages of copyright
assignment to me.

> Without that, not only is your work unusable for us (since it is a
> significant change), it also makes it difficult for someone with a
> copyright assignment to write an independent implementation since
> they're 'tainted' by your patch.

I guess that will be the end of it, then.  Bummer.

Jörn

--
The rabbit runs faster than the fox, because the rabbit is running for
his life while the fox is only running for his dinner.
-- Aesop

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  2:45       ` Jörn Engel
@ 2016-01-26  3:22         ` Jörn Engel
  2016-01-26  6:22           ` Mike Frysinger
                             ` (2 more replies)
  2016-01-26  7:40         ` Paul Eggert
  2016-01-26  9:54         ` Florian Weimer
  2 siblings, 3 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-26  3:22 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Paul Eggert, GNU C. Library, Joern Engel

On Mon, Jan 25, 2016 at 06:44:37PM -0800, Jörn Engel wrote:
> On Tue, Jan 26, 2016 at 07:22:32AM +0530, Siddhesh Poyarekar wrote:
> > On 26 January 2016 at 06:30, Jörn Engel <joern@purestorage.com> wrote:
> > > Agreed.  I thought I mentioned that this is a braindump and not a
> > > patchset for submission. ;)
> > 
> > Thank you for doing this. There is however the issue of copyright
> > assignment.  I don't have access to the copyright list, so could you
> > (or Carlos, Joseph, etc.) please tell me if you have signed it?
> 
> I have not and will not.  Or at least someone with a silver tongue would
> have to spend significant time explaining the advantages of copyright
> assignment to me.

Maybe I should elaborate a little.

I am quite thankful to the FSF for the GPL.  Creating that license was a
wonderful move for those people that aren't happy with BSD-style
licenses.

That said, I find language like "version 2 or later" trollbait at best.
The paranoid in me and many other developers starts wondering under what
circumstances the FSF might turn evil, by any definition of evil, and
create a license to further their own schemes.

Some might argue that GPLv3 already is evil.  I personally don't mind
either version 2 or 3, but I hate the rift this has created where some
code is "2 only please" and other code is "3 or later".

Copyright assignment is far far worse.  I am signing away ownership of
the code.  But of which code?  Everything I ever write in the future?
To answer this question I have to read a lot of legalese, any developers
favorite.  Then I have to pay a lawyer to explain the finer points to
me, because I may have missed them.  Next I have to pay a second lawyer
to judge whether the first lawyer even knew what he was talking about,
which sadly isn't always the case.

At this point I am pretty much exhausted and write some kernel code
instead.  Linus will fuss about the technical merits that I enjoy
deliberating, not about copyright assignment.  Or I get a job where the
same legal problems await me, but I at least get compensated for them.

And just in case it wasn't clear, all the above is my personal opinion
and not that of my employer.  Maybe my employer sees enough merit in
getting code upstream to handle the legal work.  I do not and will not
pursue this any further.

Sad.

Jörn

--
It's just what we asked for, but not what we want!
-- anonymous

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  3:22         ` Jörn Engel
@ 2016-01-26  6:22           ` Mike Frysinger
  2016-01-26  7:54             ` Jörn Engel
  2016-01-26 12:37           ` Torvald Riegel
  2016-01-26 13:23           ` Florian Weimer
  2 siblings, 1 reply; 119+ messages in thread
From: Mike Frysinger @ 2016-01-26  6:22 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Siddhesh Poyarekar, Paul Eggert, GNU C. Library, Joern Engel

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

On 25 Jan 2016 19:21, Jörn Engel wrote:
> That said, I find language like "version 2 or later" trollbait at best.
> The paranoid in me and many other developers starts wondering under what
> circumstances the FSF might turn evil, by any definition of evil, and
> create a license to further their own schemes.

this doesn't retroactively change existing releases.  so any code you
contribute today, at worst, will be under that license (LGPL-2.1+).

> Copyright assignment is far far worse.  I am signing away ownership of
> the code.  But of which code?  Everything I ever write in the future?
> To answer this question I have to read a lot of legalese, any developers
> favorite.  Then I have to pay a lawyer to explain the finer points to
> me, because I may have missed them.  Next I have to pay a second lawyer
> to judge whether the first lawyer even knew what he was talking about,
> which sadly isn't always the case.

FSF CLA is per-project, and iirc, only like one or two pages.  i don't
recall it being that dense.

i'm not trying to push you to sign a CLA ... it's certainly your choice
and CLA's do suck.  unfortunately, this is currently what the FSF forces
onto GNU projects.  i'm not really sure why anymore -- the busybox/linux
cases show that GPL enforcement is possible w/only a single copyright
holder and no project CLA.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  2:45       ` Jörn Engel
  2016-01-26  3:22         ` Jörn Engel
@ 2016-01-26  7:40         ` Paul Eggert
  2016-01-26  9:54         ` Florian Weimer
  2 siblings, 0 replies; 119+ messages in thread
From: Paul Eggert @ 2016-01-26  7:40 UTC (permalink / raw)
  To: Jörn Engel, Siddhesh Poyarekar; +Cc: GNU C. Library, Joern Engel

Jörn Engel wrote:
> I have not and will not.  Or at least someone with a silver tongue would
> have to spend significant time explaining the advantages of copyright
> assignment to me.

Given your later remarks, it sounds like it'd be less work for us to rewrite the 
malloc implementation ourselves, than to try to convince you to assign 
copyright. Oh well.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  6:22           ` Mike Frysinger
@ 2016-01-26  7:54             ` Jörn Engel
  2016-01-26  9:53               ` Florian Weimer
  0 siblings, 1 reply; 119+ messages in thread
From: Jörn Engel @ 2016-01-26  7:54 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Paul Eggert, GNU C. Library, Joern Engel

On Tue, Jan 26, 2016 at 01:22:08AM -0500, Mike Frysinger wrote:
> On 25 Jan 2016 19:21, Jörn Engel wrote:
> > That said, I find language like "version 2 or later" trollbait at best.
> > The paranoid in me and many other developers starts wondering under what
> > circumstances the FSF might turn evil, by any definition of evil, and
> > create a license to further their own schemes.
> 
> this doesn't retroactively change existing releases.  so any code you
> contribute today, at worst, will be under that license (LGPL-2.1+).

That is not the point.  The point is that "or later" is signing a blank
check to the FSF to create a license as they see fit, one that I might
dislike.

> > Copyright assignment is far far worse.  I am signing away ownership of
> > the code.  But of which code?  Everything I ever write in the future?
> > To answer this question I have to read a lot of legalese, any developers
> > favorite.  Then I have to pay a lawyer to explain the finer points to
> > me, because I may have missed them.  Next I have to pay a second lawyer
> > to judge whether the first lawyer even knew what he was talking about,
> > which sadly isn't always the case.
> 
> FSF CLA is per-project, and iirc, only like one or two pages.  i don't
> recall it being that dense.

It doesn't show up particularly prominently in Google results either.
The closest I could find in the top ten was
http://comments.gmane.org/gmane.emacs.devel/156850

If I search for "glibc copyright assignment" I don't find anything
useful in the top ten either.  Everything I do find talks about the
copyright assignment, but I don't find the form itself.  Feels a lot
like a Douglas Adams novel so far.

Imo we should try to collect the best code in the world into a couple of
high-quality repositories.  Glibc seems like the right repository for
things like a memory allocator and I really don't mind contributing.
You can have my code under the exact terms and conditions that I
received your code under, that is perfectly fair.

But if people are asked not to even look at my code, I cannot help but
wonder how this is supposed to help the free software movement or
anything else.  If I cannot find the form I supposedly have to sign, how
can I judge whether signing them is merely annoying or will harm me in
the future.

For example, could I relicense my code under a different license after
assigning the copyright to some other entity?  That matters a lot to me
and I would expect it to matter to others as well.  Why don't I find an
FAQ page where that question is answered?

The conclusion I am coming to is that people are actively discouraged
from contributing.  Whether that is by design, accident or incompetence,
I cannot tell.  But it makes me unhappy.

> i'm not trying to push you to sign a CLA ... it's certainly your choice
> and CLA's do suck.  unfortunately, this is currently what the FSF forces
> onto GNU projects.  i'm not really sure why anymore -- the busybox/linux
> cases show that GPL enforcement is possible w/only a single copyright
> holder and no project CLA.

Sadly they also show that you mostly get bad code that wasn't worth the
legal effort you invested.  Might be nice to find the GPL in the
paperwork that accompanies your new TV, but otherwise the litigation
seemed mostly useless to me.

Jörn

--
"Error protection by error detection and correction."
-- from a university class

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26  0:27 ` [PATCH] malloc: remove __builtin_expect Joern Engel
@ 2016-01-26  7:56   ` Yury Gribov
  2016-01-26  9:00     ` Jörn Engel
  2016-01-26 20:43     ` Steven Munroe
  0 siblings, 2 replies; 119+ messages in thread
From: Yury Gribov @ 2016-01-26  7:56 UTC (permalink / raw)
  To: Joern Engel, GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

On 01/26/2016 03:24 AM, Joern Engel wrote:
> From: Joern Engel <joern@purestorage.org>
>
> It was disabled anyway and only served as obfuscation.  No change
> post-compilation.

FYI I've witnessed significant improvements from (real) __builtin_expect 
in other projects.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26  7:56   ` Yury Gribov
@ 2016-01-26  9:00     ` Jörn Engel
  2016-01-26  9:37       ` Yury Gribov
  2016-01-26 15:46       ` Jeff Law
  2016-01-26 20:43     ` Steven Munroe
  1 sibling, 2 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-26  9:00 UTC (permalink / raw)
  To: Yury Gribov; +Cc: GNU C. Library, Siddhesh Poyarekar, Joern Engel

On Tue, Jan 26, 2016 at 10:56:49AM +0300, Yury Gribov wrote:
> On 01/26/2016 03:24 AM, Joern Engel wrote:
> >From: Joern Engel <joern@purestorage.org>
> >
> >It was disabled anyway and only served as obfuscation.  No change
> >post-compilation.
> 
> FYI I've witnessed significant improvements from (real) __builtin_expect in
> other projects.

Interesting.  I tried to find any effect and couldn't.  Michael Kerrisk
managed to write an example program that demonstrated the advantage:
http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html

I tried his program and again couldn't demonstrate any effect.  Not that
I question his numbers, there likely was an effect using his compiler
and machine.  But with a newer compiler and/or cpu, the effect was gone.

So if you have a reference to the project and can replicate the
improvement, I would be interested.

Jörn

--
It's worth remembering that one reason the Internet succeeded was that
it did not need the permission of the local telcos  in order to get up
and going.
-- iang

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26  9:00     ` Jörn Engel
@ 2016-01-26  9:37       ` Yury Gribov
  2016-01-26 15:46       ` Jeff Law
  1 sibling, 0 replies; 119+ messages in thread
From: Yury Gribov @ 2016-01-26  9:37 UTC (permalink / raw)
  To: Jörn Engel; +Cc: GNU C. Library, Siddhesh Poyarekar, Joern Engel

On 01/26/2016 11:59 AM, Jörn Engel wrote:
> On Tue, Jan 26, 2016 at 10:56:49AM +0300, Yury Gribov wrote:
>> On 01/26/2016 03:24 AM, Joern Engel wrote:
>>> From: Joern Engel <joern@purestorage.org>
>>>
>>> It was disabled anyway and only served as obfuscation.  No change
>>> post-compilation.
>>
>> FYI I've witnessed significant improvements from (real) __builtin_expect in
>> other projects.
>
> Interesting.  I tried to find any effect and couldn't.  Michael Kerrisk
> managed to write an example program that demonstrated the advantage:
> http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html
>
> I tried his program and again couldn't demonstrate any effect.  Not that
> I question his numbers, there likely was an effect using his compiler
> and machine.  But with a newer compiler and/or cpu, the effect was gone.
>
> So if you have a reference to the project and can replicate the
> improvement, I would be interested.

Ok, it took time to dig the archives: 
http://comments.gmane.org/gmane.comp.debugging.address-sanitizer/1288 . 
Basically adding __builtin_expect in an important hot-path saved 20% of 
performance (that's on Clang though).

> Jörn
>
> --
> It's worth remembering that one reason the Internet succeeded was that
> it did not need the permission of the local telcos  in order to get up
> and going.
> -- iang
>
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  7:54             ` Jörn Engel
@ 2016-01-26  9:53               ` Florian Weimer
  2016-01-26 17:05                 ` Jörn Engel
  0 siblings, 1 reply; 119+ messages in thread
From: Florian Weimer @ 2016-01-26  9:53 UTC (permalink / raw)
  To: Jörn Engel, Siddhesh Poyarekar, Paul Eggert, GNU C. Library,
	Joern Engel

On 01/26/2016 08:53 AM, Jörn Engel wrote:
> But if people are asked not to even look at my code, I cannot help but
> wonder how this is supposed to help the free software movement or
> anything else.

I think the world benefits from having a free software C run-time
library with a clear overall copyright situation.  But this means that
once you work on the library, you need to be aware how your
contributions come into being and to what degree they are influenced by
the work of others.

Sometimes, the results can be quite annoying.  For example, we have a
patch intertwined with a bug report, and we cannot clarify its copyright
situation, so someone wrote down the analysis in their own words, so
that someone else can re-create the fix from scratch, without looking at
the original patch submission.  But there really isn't a way around that.

This is independent of the copyright assignment situation.  Copyright
assignment (or formal contribution agreements) just make you more aware
of this obligation to the community.

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  2:45       ` Jörn Engel
  2016-01-26  3:22         ` Jörn Engel
  2016-01-26  7:40         ` Paul Eggert
@ 2016-01-26  9:54         ` Florian Weimer
  2 siblings, 0 replies; 119+ messages in thread
From: Florian Weimer @ 2016-01-26  9:54 UTC (permalink / raw)
  To: Jörn Engel, Siddhesh Poyarekar
  Cc: Paul Eggert, GNU C. Library, Joern Engel

On 01/26/2016 03:44 AM, Jörn Engel wrote:

> I have not and will not.  Or at least someone with a silver tongue would
> have to spend significant time explaining the advantages of copyright
> assignment to me.

The only practical advantage is that with the assignment in place, there
is a chance that your changes are merged, and you do not have to
maintain your fork anymore.

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: simplify and fix calloc
  2016-01-26  0:27 ` [PATCH] malloc: simplify and fix calloc Joern Engel
@ 2016-01-26 10:32   ` Will Newton
  0 siblings, 0 replies; 119+ messages in thread
From: Will Newton @ 2016-01-26 10:32 UTC (permalink / raw)
  To: Joern Engel; +Cc: GNU C. Library, Siddhesh Poyarekar

On Tue, Jan 26, 2016 at 12:25 AM, Joern Engel <joern@purestorage.com> wrote:
> Calloc was essentially a copy of malloc that tried to skip the memset if
> the memory was just returned from mmap() and is already cleared.  It was
> also buggy and a suitable test could trigger a segfault.

Would it be possible to add such a test to the testsuite?

> New code only does the overflow check, malloc and memset.  Runtime cost
> is lower than maintenance cost of already-buggy code.
>
> JIRA: PURE-27597
> JIRA: PURE-36718
> ---
>  tpc/malloc2.13/malloc.c | 106 +++---------------------------------------------
>  1 file changed, 5 insertions(+), 101 deletions(-)
>
> diff --git a/tpc/malloc2.13/malloc.c b/tpc/malloc2.13/malloc.c
> index d9fecfe3f921..190c1d24b082 100644
> --- a/tpc/malloc2.13/malloc.c
> +++ b/tpc/malloc2.13/malloc.c
> @@ -3506,13 +3506,8 @@ Void_t *public_pVALLOc(size_t bytes)
>
>  Void_t *public_cALLOc(size_t n, size_t elem_size)
>  {
> -       struct malloc_state *av;
> -       mchunkptr oldtop, p;
> -       INTERNAL_SIZE_T bytes, sz, csz, oldtopsize;
> +       INTERNAL_SIZE_T bytes;
>         Void_t *mem;
> -       unsigned long clearsize;
> -       unsigned long nclears;
> -       INTERNAL_SIZE_T *d;
>
>         /* size_t is unsigned so the behavior on overflow is defined.  */
>         bytes = n * elem_size;
> @@ -3525,101 +3520,10 @@ Void_t *public_cALLOc(size_t n, size_t elem_size)
>                 }
>         }
>
> -       __malloc_ptr_t(*hook) __MALLOC_PMT((size_t, __const __malloc_ptr_t)) = force_reg(dlmalloc_hook);
> -       if (hook != NULL) {
> -               sz = bytes;
> -               mem = (*hook) (sz, RETURN_ADDRESS(0));
> -               if (mem == 0)
> -                       return 0;
> -#ifdef HAVE_MEMCPY
> -               return memset(mem, 0, sz);
> -#else
> -               while (sz > 0)
> -                       ((char *)mem)[--sz] = 0;        /* rather inefficient */
> -               return mem;
> -#endif
> -       }
> -
> -       sz = bytes;
> -
> -       av = arena_get(sz);
> -       if (!av)
> -               return 0;
> -
> -       /* Check if we hand out the top chunk, in which case there may be no
> -          need to clear. */
> -#if MORECORE_CLEARS
> -       oldtop = top(av);
> -       oldtopsize = chunksize(top(av));
> -#if MORECORE_CLEARS < 2
> -       /* Only newly allocated memory is guaranteed to be cleared.  */
> -       if (av == &main_arena && oldtopsize < mp_.sbrk_base + av->max_system_mem - (char *)oldtop)
> -               oldtopsize = (mp_.sbrk_base + av->max_system_mem - (char *)oldtop);
> -#endif
> -       if (av != &main_arena) {
> -               heap_info *heap = heap_for_ptr(oldtop);
> -               if (oldtopsize < (char *)heap + heap->mprotect_size - (char *)oldtop)
> -                       oldtopsize = (char *)heap + heap->mprotect_size - (char *)oldtop;
> -       }
> -#endif
> -       mem = _int_malloc(av, sz);
> -       if (mem == 0) {
> -               av = get_backup_arena(av, bytes);
> -               mem = _int_malloc(&main_arena, sz);
> -       }
> -       arena_unlock(av);
> -
> -       assert(!mem || chunk_is_mmapped(mem2chunk(mem)) || av == arena_for_chunk(mem2chunk(mem)));
> -       if (mem == 0)
> -               return 0;
> -       p = mem2chunk(mem);
> -
> -       /* Two optional cases in which clearing not necessary */
> -       if (chunk_is_mmapped(p)) {
> -               if (perturb_byte)
> -                       MALLOC_ZERO(mem, sz);
> -               return mem;
> -       }
> -
> -       csz = chunksize(p);
> -
> -#if MORECORE_CLEARS
> -       if (perturb_byte == 0 && (p == oldtop && csz > oldtopsize)) {
> -               /* clear only the bytes from non-freshly-sbrked memory */
> -               csz = oldtopsize;
> -       }
> -#endif
> -
> -       /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
> -          contents have an odd number of INTERNAL_SIZE_T-sized words;
> -          minimally 3.  */
> -       d = (INTERNAL_SIZE_T *) mem;
> -       clearsize = csz - SIZE_SZ;
> -       nclears = clearsize / sizeof(INTERNAL_SIZE_T);
> -       assert(nclears >= 3);
> -
> -       if (nclears > 9)
> -               MALLOC_ZERO(d, clearsize);
> -
> -       else {
> -               *(d + 0) = 0;
> -               *(d + 1) = 0;
> -               *(d + 2) = 0;
> -               if (nclears > 4) {
> -                       *(d + 3) = 0;
> -                       *(d + 4) = 0;
> -                       if (nclears > 6) {
> -                               *(d + 5) = 0;
> -                               *(d + 6) = 0;
> -                               if (nclears > 8) {
> -                                       *(d + 7) = 0;
> -                                       *(d + 8) = 0;
> -                               }
> -                       }
> -               }
> -       }
> -
> -       return mem;
> +       mem = public_mALLOc(bytes);
> +       if (!mem)
> +               return NULL;
> +       return memset(mem, 0, bytes);
>  }
>
>
> --
> 2.7.0.rc3
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  3:22         ` Jörn Engel
  2016-01-26  6:22           ` Mike Frysinger
@ 2016-01-26 12:37           ` Torvald Riegel
  2016-01-26 13:23           ` Florian Weimer
  2 siblings, 0 replies; 119+ messages in thread
From: Torvald Riegel @ 2016-01-26 12:37 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Siddhesh Poyarekar, Paul Eggert, GNU C. Library, Joern Engel

On Mon, 2016-01-25 at 19:21 -0800, Jörn Engel wrote:
> On Mon, Jan 25, 2016 at 06:44:37PM -0800, Jörn Engel wrote:
> > On Tue, Jan 26, 2016 at 07:22:32AM +0530, Siddhesh Poyarekar wrote:
> > > On 26 January 2016 at 06:30, Jörn Engel <joern@purestorage.com> wrote:
> > > > Agreed.  I thought I mentioned that this is a braindump and not a
> > > > patchset for submission. ;)
> > > 
> > > Thank you for doing this. There is however the issue of copyright
> > > assignment.  I don't have access to the copyright list, so could you
> > > (or Carlos, Joseph, etc.) please tell me if you have signed it?
> > 
> > I have not and will not.  Or at least someone with a silver tongue would
> > have to spend significant time explaining the advantages of copyright
> > assignment to me.
> 
> Maybe I should elaborate a little.
> 
> I am quite thankful to the FSF for the GPL.  Creating that license was a
> wonderful move for those people that aren't happy with BSD-style
> licenses.
> 
> That said, I find language like "version 2 or later" trollbait at best.
> The paranoid in me and many other developers starts wondering under what
> circumstances the FSF might turn evil, by any definition of evil, and
> create a license to further their own schemes.
> 
> Some might argue that GPLv3 already is evil.  I personally don't mind
> either version 2 or 3, but I hate the rift this has created where some
> code is "2 only please" and other code is "3 or later".
> 
> Copyright assignment is far far worse.  I am signing away ownership of
> the code.  But of which code?  Everything I ever write in the future?

I certainly won't give you any legal advice, but I think this all is
less complex than you seem to think it is.  For example, if you'd sign a
copyright assignment for contributions, you still need to actively
contribute code in the first place, which would clarify your "which
code?" question.

> To answer this question I have to read a lot of legalese, any developers
> favorite.  Then I have to pay a lawyer to explain the finer points to
> me, because I may have missed them.  Next I have to pay a second lawyer
> to judge whether the first lawyer even knew what he was talking about,
> which sadly isn't always the case.
> 
> At this point I am pretty much exhausted and write some kernel code
> instead.  Linus will fuzz about the technical merits that I enjoy
> deliberating, not about copyright assignment.  Or I get a job where the
> same legal problems await me, but I at least get compensated for them.
> 
> And just in case it wasn't clear, all the above is my personal opinion
> and not that of my employer.  Maybe my employer sees enough merit in
> getting code upstream to handle the legal work.  I do not and will not
> pursue this any further.

So, what does your employer think about it?

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-26  0:27 ` [PATCH] malloc: add locking to thread cache Joern Engel
@ 2016-01-26 12:45   ` Szabolcs Nagy
  2016-01-26 13:14     ` Florian Weimer
                       ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: Szabolcs Nagy @ 2016-01-26 12:45 UTC (permalink / raw)
  To: Joern Engel, GNU C. Library; +Cc: Siddhesh Poyarekar, nd

On 26/01/16 00:25, Joern Engel wrote:
> With signals we can reenter the thread-cache.  Protect against that with
> a lock.  Will almost never happen in practice, it took the company five
> years to reproduce a similar race in the existing malloc.  But easy to
> trigger with a targeted test.

why do you try to make malloc as-safe?

isn't it better to fix malloc usage in signal handlers?

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-26 12:45   ` Szabolcs Nagy
  2016-01-26 13:14     ` Florian Weimer
@ 2016-01-26 13:14     ` Yury Gribov
  2016-01-26 17:41     ` Jörn Engel
  2 siblings, 0 replies; 119+ messages in thread
From: Yury Gribov @ 2016-01-26 13:14 UTC (permalink / raw)
  To: Szabolcs Nagy, Joern Engel, GNU C. Library; +Cc: Siddhesh Poyarekar, nd

On 01/26/2016 03:44 PM, Szabolcs Nagy wrote:
> On 26/01/16 00:25, Joern Engel wrote:
>> With signals we can reenter the thread-cache.  Protect against that with
>> a lock.  Will almost never happen in practice, it took the company five
>> years to reproduce a similar race in the existing malloc.  But easy to
>> trigger with a targeted test.
>
> why do you try to make malloc as-safe?
>
> isn't it better to fix malloc usage in signal handlers?

FYI I once tried to check OSS for violations of signal handler 
requirements (see https://github.com/yugr/sigcheck).  It turned out that 
many packages (ab)use malloc in sighandlers (directly or via printf et 
al.).  Fixing (or even finding) all badly behaving software out there 
would be a huge amount of work so "fixing" it on the library side 
instead may make sense.

-Y

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-26 12:45   ` Szabolcs Nagy
@ 2016-01-26 13:14     ` Florian Weimer
  2016-01-26 13:23       ` Yury Gribov
  2016-01-26 13:14     ` Yury Gribov
  2016-01-26 17:41     ` Jörn Engel
  2 siblings, 1 reply; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 13:14 UTC (permalink / raw)
  To: Szabolcs Nagy, Joern Engel, GNU C. Library; +Cc: Siddhesh Poyarekar, nd

On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
> On 26/01/16 00:25, Joern Engel wrote:
>> With signals we can reenter the thread-cache.  Protect against that with
>> a lock.  Will almost never happen in practice, it took the company five
>> years to reproduce a similar race in the existing malloc.  But easy to
>> trigger with a targeted test.
> 
> why do you try to make malloc as-safe?
> 
> isn't it better to fix malloc usage in signal handlers?

We have functionality like dprintf, syslog, backtrace, C++ thread-local
object access which might be used from signal handlers, but which can
call malloc.  Fixing this in malloc may seem attractive, but I doubt it
can be made completely reliable by concentrating changes there.
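
To make the pattern concrete, here is a deliberately simplified example
of the kind of code in question (an illustration only, not taken from
any of the packages or patches mentioned):

#include <signal.h>
#include <stdlib.h>
#include <syslog.h>

/* A handler like this is not async-signal-safe: syslog() may call
   malloc() internally.  If the signal interrupts a thread that is
   already inside malloc(), the allocator is re-entered. */
static void log_signal(int sig)
{
	syslog(LOG_INFO, "caught signal %d", sig);
}

int main(void)
{
	signal(SIGUSR1, log_signal);
	for (int i = 0; i < 100000; i++) {
		if (i % 1000 == 0)
			raise(SIGUSR1);	/* synchronous here; in real code the
					   signal arrives asynchronously */
		free(malloc(128));
	}
	return 0;
}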

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  3:22         ` Jörn Engel
  2016-01-26  6:22           ` Mike Frysinger
  2016-01-26 12:37           ` Torvald Riegel
@ 2016-01-26 13:23           ` Florian Weimer
  2 siblings, 0 replies; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 13:23 UTC (permalink / raw)
  To: Jörn Engel, Siddhesh Poyarekar
  Cc: Paul Eggert, GNU C. Library, Joern Engel

On 01/26/2016 04:21 AM, Jörn Engel wrote:

> And just in case it wasn't clear, all the above is my personal opinion
> and not that of my employer.  Maybe my employer sees enough merit in
> getting code upstream to handle the legal work.  I do not and will not
> pursue this any further.

Even if you cannot contribute code directly, performance testing of
malloc changes by others is extremely helpful.  In the past, lack of
credible, real-world benchmarking prevented us from making any
fundamental changes to the malloc code.

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-26 13:14     ` Florian Weimer
@ 2016-01-26 13:23       ` Yury Gribov
  2016-01-26 13:40         ` Szabolcs Nagy
  0 siblings, 1 reply; 119+ messages in thread
From: Yury Gribov @ 2016-01-26 13:23 UTC (permalink / raw)
  To: Florian Weimer, Szabolcs Nagy, Joern Engel, GNU C. Library
  Cc: Siddhesh Poyarekar, nd

On 01/26/2016 04:14 PM, Florian Weimer wrote:
> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
>> On 26/01/16 00:25, Joern Engel wrote:
>>> With signals we can reenter the thread-cache.  Protect against that with
>>> a lock.  Will almost never happen in practice, it took the company five
>>> years to reproduce a similar race in the existing malloc.  But easy to
>>> trigger with a targeted test.
>>
>> why do you try to make malloc as-safe?
>>
>> isn't it better to fix malloc usage in signal handlers?
>
> We have functionality like dprintf, syslog, backtrace, C++ thread-local
> object access which might be used from signal handlers, but which can
> call malloc.

Right.  One can argue that this is an error and someone should go fix 
the code but in my experience most real-world signal handlers have these 
problems and no one is going to rewrite them any time soon.

> Fixing this in malloc may seem attractive, but I doubt it
> can be made completely reliable by concentrating changes there.
>
> Florian
>
>
>

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-26 13:23       ` Yury Gribov
@ 2016-01-26 13:40         ` Szabolcs Nagy
  2016-01-26 18:00           ` Mike Frysinger
  0 siblings, 1 reply; 119+ messages in thread
From: Szabolcs Nagy @ 2016-01-26 13:40 UTC (permalink / raw)
  To: Yury Gribov, Florian Weimer, Joern Engel, GNU C. Library
  Cc: Siddhesh Poyarekar, nd

On 26/01/16 13:23, Yury Gribov wrote:
> On 01/26/2016 04:14 PM, Florian Weimer wrote:
>> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
>>> On 26/01/16 00:25, Joern Engel wrote:
>>>> With signals we can reenter the thread-cache.  Protect against that with
>>>> a lock.  Will almost never happen in practice, it took the company five
>>>> years to reproduce a similar race in the existing malloc.  But easy to
>>>> trigger with a targeted test.
>>>
>>> why do you try to make malloc as-safe?
>>>
>>> isn't it better to fix malloc usage in signal handlers?
>>
>> We have functionality like dprintf, syslog, backtrace, C++ thread-local
>> object access which might be used from signal handlers, but which can
>> call malloc.
> 
> Right.  One can argue that this is an error and someone should go fix the code but in my experience most
> real-world signal handlers have these problems and no one is going to rewrite them any time soon.
> 

it is easy to write non-conforming code, but that
does not mean the libc should try to make it work.

in this case signals should be blocked whenever the
library is entered through non-as-safe api calls.
that has a significant cost for portable code.
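
Concretely, blocking signals would mean wrapping every non-AS-safe entry
point along these lines (a sketch of the idea, not a proposal for
glibc); the two extra signal-mask syscalls per call are the cost in
question:

#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical wrapper: make a non-AS-safe call immune to reentry from
   signal handlers by blocking all signals around it. */
static void *malloc_signals_blocked(size_t n)
{
	sigset_t all, old;
	void *p;

	sigfillset(&all);
	pthread_sigmask(SIG_BLOCK, &all, &old);
	p = malloc(n);
	pthread_sigmask(SIG_SETMASK, &old, NULL);
	return p;
}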

>> Fixing this in malloc may seem attractive, but I doubt it
>> can be made completely reliable by concentrating changes there.
>>
>> Florian
>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26  9:00     ` Jörn Engel
  2016-01-26  9:37       ` Yury Gribov
@ 2016-01-26 15:46       ` Jeff Law
  1 sibling, 0 replies; 119+ messages in thread
From: Jeff Law @ 2016-01-26 15:46 UTC (permalink / raw)
  To: Jörn Engel, Yury Gribov
  Cc: GNU C. Library, Siddhesh Poyarekar, Joern Engel

On 01/26/2016 01:59 AM, Jörn Engel wrote:
> On Tue, Jan 26, 2016 at 10:56:49AM +0300, Yury Gribov wrote:
>> On 01/26/2016 03:24 AM, Joern Engel wrote:
>>> From: Joern Engel <joern@purestorage.org>
>>>
>>> It was disabled anyway and only served as obfuscation.  No change
>>> post-compilation.
>>
>> FYI I've witnessed significant improvements from (real) __builtin_expect in
>> other projects.
>
> Interesting.  I tried to find any effect and couldn't.  Michael Kerrisk
> managed to write an example program that demonstrated the advantage:
> http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html
>
> I tried his program and again couldn't demonstrate any effect.  Not that
> I question his numbers, there likely was an effect using his compiler
> and machine.  But with a newer compiler and/or cpu, the effect was gone.
>
> So if you have a reference to the project and can replicate the
> improvement, I would be interested.
It's likely project and processor specific.  We've certainly seen cases 
through GCC where this matters.   These constructs feed into the branch 
probability estimations that are then used throughout GCC to drive the 
optimization passes -- from high level stuff all the way down to setting 
static branch prediction bits in the assembly code.
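
For readers who have not used the construct, the usual pattern is the
kernel-style macro pair below (a generic illustration, not code from
glibc or from this patch set):

#include <stdlib.h>

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

void *xmalloc(size_t n)
{
	void *p = malloc(n);

	/* Mark the failure path as cold so the compiler lays out the
	   common path as the fall-through. */
	if (unlikely(p == NULL))
		abort();
	return p;
}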

jeff

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  9:53               ` Florian Weimer
@ 2016-01-26 17:05                 ` Jörn Engel
  2016-01-26 17:31                   ` Paul Eggert
                                     ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-26 17:05 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Siddhesh Poyarekar, Paul Eggert, GNU C. Library, Joern Engel

So far I counted eight responses to the copyright assignment question.
A useful response, imo, would have included two things:
1. the actual text of the copyright assignment and
2. the FAQ answering most of my questions.

Lack of 1. means we waste time in a discussion without properly defining
what we are discussing.  It would be a miracle if that yielded useful
results.

But the lack of 2. is what I find concerning.  Am I seriously the first
person who has reservations about signing a copyright assignment?  If I
am not and you have answered similar questions before, where did you
collect those answers?  Or if I am, does that mean I am the only nutjob
on the planet or am I an example of other people that don't contribute
because of copyright assignment?

Since there doesn't seem to be an FAQ, let me start one.

(Q) Why do you need copyright assignment?

(A) See http://www.gnu.org/licenses/why-assign.en.html

(Q) But there was successful litigation about the Linux kernel, which
    doesn't require copyright assignment.  Can you elaborate what those
    "very substantial procedural advantages" are so I don't have to take
    your word for it?

(Q) If I hold all rights for a particular piece of code and assign
    copyright to the FSF, under what terms can I use that code myself?

(Q) As the sole author of a piece of code, can I license that code under
    a different license like BSD, MIT, etc. after assigning copyright to
    the FSF?

(Q) The FSF as the sole copyright holder could relicense code under a
    proprietary license.  Are there mechanisms in place to prevent such
    a scenario beyond "trust us"?

I believe those to be valid questions, and reasonable people may well
decide not to do copyright assignment without answers to them.  If you
would rather reverse-engineer my code than answer legal questions, I can
completely understand that.  But it would be sad if the primary effect
of the FSF these days was to prevent people from contributing.

Jörn

--
But this is not to say that the main benefit of Linux and other GPL
software is lower-cost. Control is the main benefit--cost is secondary.
-- Bruce Perens

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 17:05                 ` Jörn Engel
@ 2016-01-26 17:31                   ` Paul Eggert
  2016-01-26 17:48                     ` Adhemerval Zanella
  2016-01-26 17:49                   ` Joseph Myers
  2016-01-26 17:57                   ` Mike Frysinger
  2 siblings, 1 reply; 119+ messages in thread
From: Paul Eggert @ 2016-01-26 17:31 UTC (permalink / raw)
  To: Jörn Engel, Florian Weimer
  Cc: Siddhesh Poyarekar, GNU C. Library, Joern Engel

On 01/26/2016 09:04 AM, Jörn Engel wrote:
> Since there doesn't seem to be an FAQ, let me start one.

Feel free, but it'll be just your opinion.

As I understand it you have stated that you won't sign anything. If you 
change your mind and will seriously consider assigning copyright, please 
write to me privately and I can start the ball rolling on getting you a 
copy of an assignment so that you can address your concerns and decide 
for yourself whether to sign it. It's entirely up to you.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-26 12:45   ` Szabolcs Nagy
  2016-01-26 13:14     ` Florian Weimer
  2016-01-26 13:14     ` Yury Gribov
@ 2016-01-26 17:41     ` Jörn Engel
  2 siblings, 0 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-26 17:41 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: GNU C. Library, Siddhesh Poyarekar, nd

On Tue, Jan 26, 2016 at 12:44:57PM +0000, Szabolcs Nagy wrote:
> On 26/01/16 00:25, Joern Engel wrote:
> > With signals we can reenter the thread-cache.  Protect against that with
> > a lock.  Will almost never happen in practice, it took the company five
> > years to reproduce a similar race in the existing malloc.  But easy to
> > trigger with a targeted test.
> 
> why do you try to make malloc as-safe?

Why would I not?  I find it easier to make malloc signal-safe than to
find an alternative fix for the signal handler of a single application.
If nothing else, the cost of entering the signal handler a million times
is much lower for a malloc stress test than for an application.

> isn't it better to fix malloc usage in signal handlers?

If your goal is portability across many different malloc libraries, I
would agree.  If your goal is to fix deadlocks in applications, changing
malloc is a more economical use of your time.
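
For reference, the kind of targeted test I mean above is roughly of this
shape (a hypothetical sketch, not the actual testsuite from the tarball):

  #include <signal.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/time.h>

  static void handler (int sig)
  {
    (void) sig;
    free (malloc (64));   /* deliberately not async-signal-safe */
  }

  int main (void)
  {
    struct sigaction sa;
    struct itimerval it = { { 0, 100 }, { 0, 100 } };  /* fire every 100us */

    memset (&sa, 0, sizeof (sa));
    sa.sa_handler = handler;
    sigaction (SIGALRM, &sa, NULL);
    setitimer (ITIMER_REAL, &it, NULL);

    for (long i = 0; i < 10L * 1000 * 1000; i++)
      free (malloc (128));   /* hot loop that the timer keeps interrupting */
    return 0;
  }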

Jörn

--
Functionality is an asset, but code is a liability.
--Ted Dziuba

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 17:31                   ` Paul Eggert
@ 2016-01-26 17:48                     ` Adhemerval Zanella
  0 siblings, 0 replies; 119+ messages in thread
From: Adhemerval Zanella @ 2016-01-26 17:48 UTC (permalink / raw)
  To: libc-alpha



On 26-01-2016 15:31, Paul Eggert wrote:
> On 01/26/2016 09:04 AM, Jörn Engel wrote:
>> Since there doesn't seem to be an FAQ, let me start one.
> 
> Feel free, but it'll be just your opinion.
> 
> As I understand it you have stated that you won't sign anything. If you change your mind and will seriously consider assigning copyright, please write to me privately and I can start the ball rolling on getting you a copy of an assignment so that you can address your concerns and decide for yourself whether to sign it. It's entirely up to you.

And just a side note: the copyright assignment requirement is explicitly
stated in the glibc development wiki [1], which is linked from the project
development tab [2].

I am saying this because you could have started this copyright discussion
*before* the patch dump itself.  It would have saved both sides a lot of
time.


[1] https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
[2] https://www.gnu.org/software/libc/development.html

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 17:05                 ` Jörn Engel
  2016-01-26 17:31                   ` Paul Eggert
@ 2016-01-26 17:49                   ` Joseph Myers
  2016-01-26 17:57                   ` Mike Frysinger
  2 siblings, 0 replies; 119+ messages in thread
From: Joseph Myers @ 2016-01-26 17:49 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Florian Weimer, Siddhesh Poyarekar, Paul Eggert, GNU C. Library,
	Joern Engel

[-- Attachment #1: Type: text/plain, Size: 2249 bytes --]

On Tue, 26 Jan 2016, Jörn Engel wrote:

> (Q) If I hold all rights for a particular piece of code and assign
>     copyright to the FSF, under what terms can I use that code myself?
> 
> (Q) As the sole author of a piece of code, can I license that code under
>     a different license like BSD, MIT, etc. after assigning copyright to
>     the FSF?

The assignment of past changes says:

   Upon thirty days' prior written notice, the Foundation agrees to
   grant me non-exclusive rights to use the Work (i.e. my changes and
   enhancements, not the program which I enhanced) as I see fit; (and
   the Foundation's rights shall otherwise continue unchanged).

The assignment of past and future changes says:

  (d) FSF agrees to grant back to Developer, and does hereby grant,
  non-exclusive, royalty-free and non-cancellable rights to use the
  Works (i.e., Developer's changes and/or enhancements, not the Program
  that they enhance), as Developer sees fit; this grant back does not
  limit FSF's rights and public rights acquired through this agreement.

> (Q) The FSF as the sole copyright holder could relicense code under a
>     proprietary license.  Are there mechanisms in place to prevent such
>     a scenario beyond "trust us"?

Both assignments include terms along the lines of:

  4. FSF agrees that all distribution of the Works, or of any work
  "based on the Works'', or the Program as enhanced by the Works, that
  takes place under the control of FSF or its agents or successors,
  shall be on terms that explicitly and perpetually permit anyone
  possessing a copy of the work to which the terms apply, and possessing
  accurate notice of these terms, to redistribute copies of the work to
  anyone on the same terms.  These terms shall not restrict which
  members of the public copies may be distributed to. These terms shall
  not require a member of the public to pay any royalty to FSF or to
  anyone else for any permitted use of the work they apply to, or to
  communicate with FSF or its agents or assignees in any way either when
  redistribution is performed or on any other occasion.

(So the FSF can relicense on more permissive terms, but not on proprietary 
terms.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 17:05                 ` Jörn Engel
  2016-01-26 17:31                   ` Paul Eggert
  2016-01-26 17:49                   ` Joseph Myers
@ 2016-01-26 17:57                   ` Mike Frysinger
  2016-01-27 20:46                     ` Manuel López-Ibáñez
  2 siblings, 1 reply; 119+ messages in thread
From: Mike Frysinger @ 2016-01-26 17:57 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Florian Weimer, Siddhesh Poyarekar, Paul Eggert, GNU C. Library,
	Joern Engel

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

On 26 Jan 2016 09:04, Jörn Engel wrote:
> So far I counted eight responses to the copyright assignment question.
> A useful response, imo, would have included two things:
> 1. the actual text of the copyright assignment and

i don't think they post it online.  there's a short Q/A you send over:
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=doc/Copyright/request-assign.future;hb=HEAD

and then they generate the full text for you to sign.  if you actually
want, i can dig up a copy of one i've signed and post it.

> 2. the FAQ answering most of my questions.

http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=doc/Copyright/conditions.text;hb=HEAD

> Since there doesn't seem to be an FAQ, let me start one.

this isn't the list for that.  please post such queries to the FSF
directly.  glibc is an FSF project, and the FSF says they want CLAs
here.  FIN.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-26 13:40         ` Szabolcs Nagy
@ 2016-01-26 18:00           ` Mike Frysinger
       [not found]             ` <56A8966D.9080000@arm.com>
  0 siblings, 1 reply; 119+ messages in thread
From: Mike Frysinger @ 2016-01-26 18:00 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Yury Gribov, Florian Weimer, Joern Engel, GNU C. Library,
	Siddhesh Poyarekar, nd

[-- Attachment #1: Type: text/plain, Size: 1693 bytes --]

On 26 Jan 2016 13:40, Szabolcs Nagy wrote:
> On 26/01/16 13:23, Yury Gribov wrote:
> > On 01/26/2016 04:14 PM, Florian Weimer wrote:
> >> On 01/26/2016 01:44 PM, Szabolcs Nagy wrote:
> >>> On 26/01/16 00:25, Joern Engel wrote:
> >>>> With signals we can reenter the thread-cache.  Protect against that with
> >>>> a lock.  Will almost never happen in practice, it took the company five
> >>>> years to reproduce a similar race in the existing malloc.  But easy to
> >>>> trigger with a targeted test.
> >>>
> >>> why do you try to make malloc as-safe?
> >>>
> >>> isn't it better to fix malloc usage in signal handlers?
> >>
> >> We have functionality like dprintf, syslog, backtrace, C++ thread-local
> >> object access which might be used from signal handlers, but which can
> >> call malloc.
> > 
> > Right.  One can argue that this is an error and someone should go fix the code but in my experience most
> > real-world signal handlers have these problems and noone is going to rewrite them any time soon.
> > 
> 
> it is easy to write non-conforming code, but that
> does not mean the libc should try to make it work.

and that has a real cost for real-time code, as well as adding real
overhead to hot paths.  we spend so much time on trying to get lockless
code, just to re-add overhead by calling into the kernel?

> in this case signals should be blocked whenever the
> library is entered through non-as-safe api calls.
> that has a significant cost for portable code.

err, this statement makes no sense.  if you were writing portable
code, you wouldn't be using malloc or other non-as-safe code in
the signal handler in the first place.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26  7:56   ` Yury Gribov
  2016-01-26  9:00     ` Jörn Engel
@ 2016-01-26 20:43     ` Steven Munroe
  2016-01-26 21:08       ` Florian Weimer
  1 sibling, 1 reply; 119+ messages in thread
From: Steven Munroe @ 2016-01-26 20:43 UTC (permalink / raw)
  To: Yury Gribov; +Cc: Joern Engel, GNU C. Library, Siddhesh Poyarekar, Joern Engel

On Tue, 2016-01-26 at 10:56 +0300, Yury Gribov wrote:
> On 01/26/2016 03:24 AM, Joern Engel wrote:
> > From: Joern Engel <joern@purestorage.org>
> >
> > It was disabled anyway and only served as obfuscation.  No change
> > post-compilation.
> 
> FYI I've witnessed significant improvements from (real) __builtin_expect 
> in other projects.
> 
It depends on the platform and on whether the programmer correctly
understands the behavior of the program as written.

Net, except for error cases that "should not happen, ever", a bad idea.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  0:50 ` malloc: performance improvements and bugfixes Paul Eggert
  2016-01-26  1:01   ` Jörn Engel
@ 2016-01-26 20:50   ` Steven Munroe
  2016-01-26 21:40     ` Florian Weimer
  2016-01-27 21:45   ` Carlos O'Donell
  2 siblings, 1 reply; 119+ messages in thread
From: Steven Munroe @ 2016-01-26 20:50 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Joern Engel, GNU C. Library, Siddhesh Poyarekar, Joern Engel

On Mon, 2016-01-25 at 16:50 -0800, Paul Eggert wrote:
> Thanks for doing all the work and bringing it to our attention. A couple 
> of comments on the process:
> 
> On 01/25/2016 04:24 PM, Joern Engel wrote:
> > I happen to prefer the kernel coding style over GNU coding style.
> 
> Nevertheless, let's keep the GNU coding style for glibc. In some places 
> the existing code doesn't conform to that style but that can be fixed as 
> we go.
> 
> > I believe there are some applications that
> > have deeper knowledge about malloc-internal data structures than they
> > should (*cough*emacs).  As a result it has become impossible to change
> > the internals of malloc without breaking said applications
> 
> This underestimates Emacs. :-)
> 
> Emacs "knows" so much about glibc malloc's internal data structures that 
> Emacs should do the right thing if glibc removes the hooks in question. 
> Of course we should test the resulting combination. However, the point 
> is that Emacs's usage of glibc malloc internals shouldn't ossify glibc 
> malloc.
> 
So why not fix emacs to stop doing this (purely evil behavior).

If they want to persist their internal state from session to session
there are better ways. For example: https://sphde.github.io/

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26 20:43     ` Steven Munroe
@ 2016-01-26 21:08       ` Florian Weimer
  2016-01-26 21:35         ` Steven Munroe
  0 siblings, 1 reply; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 21:08 UTC (permalink / raw)
  To: munroesj, Yury Gribov
  Cc: Joern Engel, GNU C. Library, Siddhesh Poyarekar, Joern Engel

On 01/26/2016 09:43 PM, Steven Munroe wrote:
> On Tue, 2016-01-26 at 10:56 +0300, Yury Gribov wrote:
>> On 01/26/2016 03:24 AM, Joern Engel wrote:
>>> From: Joern Engel <joern@purestorage.org>
>>>
>>> It was disabled anyway and only served as obfuscation.  No change
>>> post-compilation.
>>
>> FYI I've witnessed significant improvements from (real) __builtin_expect 
>> in other projects.
>>
> It depends on the platform and if the programmer correctly understands
> the behavior of the program as written.
> 
> Net, except for error cases that "should not happen, ever", a bad idea.

Based on what I saw, glibc uses __builtin_expect and the macros derived
from it in two conflicting ways: to express that one alternative is more
likely than the other, and to state that some alternative is impossible
in practice (for a well-behaved program in particular).

GCC's current interpretation leans towards the latter, at least on
x86_64.  I think GCC even puts unlikely code into separate text sections
in some cases.  Most of our __builtin_expect uses seem to be of the
former nature: things that can and do happen during normal operation,
like an unusual character in a character set conversion, or a
locale-related environment variable which is set.

I also found it problematic that GCC does the hot/cold partitioning (not
sure if that is the actual name) on the error paths themselves, where we
already know that the impossible has happened and all paths eventually
head towards an abort.  It doesn't make sense for GCC to split the paths
that hit a known no-return function from those that hit a function which
also does not return, but whose declaration does not allow GCC to infer
that.
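
To make the declaration point concrete, a small self-contained sketch
(the function names are made up; only the noreturn attribute matters):

  #include <stdio.h>
  #include <stdlib.h>

  static void log_fatal (const char *msg)   /* not declared noreturn */
  {
    fprintf (stderr, "%s\n", msg);
  }

  static void die (const char *msg) __attribute__ ((noreturn));
  static void die (const char *msg)
  {
    fprintf (stderr, "%s\n", msg);
    abort ();
  }

  void check (int corrupted, int which)
  {
    if (corrupted)
      {
        if (which)
          die ("memory corruption");      /* GCC knows this path never returns */
        log_fatal ("memory corruption");  /* GCC must assume this returns ...  */
        abort ();                         /* ... even though we abort right after */
      }
  }

  int main (void)
  {
    check (0, 0);
    return 0;
  }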

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26 21:08       ` Florian Weimer
@ 2016-01-26 21:35         ` Steven Munroe
  2016-01-26 21:42           ` Jeff Law
  2016-01-26 21:45           ` Florian Weimer
  0 siblings, 2 replies; 119+ messages in thread
From: Steven Munroe @ 2016-01-26 21:35 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Yury Gribov, Joern Engel, GNU C. Library, Siddhesh Poyarekar,
	Joern Engel

On Tue, 2016-01-26 at 22:07 +0100, Florian Weimer wrote:
> On 01/26/2016 09:43 PM, Steven Munroe wrote:
> > On Tue, 2016-01-26 at 10:56 +0300, Yury Gribov wrote:
> >> On 01/26/2016 03:24 AM, Joern Engel wrote:
> >>> From: Joern Engel <joern@purestorage.org>
> >>>
> >>> It was disabled anyway and only served as obfuscation.  No change
> >>> post-compilation.
> >>
> >> FYI I've witnessed significant improvements from (real) __builtin_expect 
> >> in other projects.
> >>
> > It depends on the platform and if the programmer correctly understands
> > the behavior of the program as written.
> > 
> > Net, except for error cases that "should not happen, ever", a bad idea.
> 
> Based on what I saw, glibc uses __builtin_expect and the macros derived
> from it in two conflicting ways: to express that one alternative is more
> likely that the other, and to state that some alternative is impossible
> in practice (for a well-behaved program in particular).
> 
> GCC's current interpretation leans towards the latter, at least on
> x86_64.  I think GCC even puts unlikely code into separate text sections
> in some cases.  Most of our __builtin_expect uses seem to be of the
> former nature: things that can and do happen during normal operation,
> like an unusual character in a character set conversion, or a
> locale-related environment variable which is set.
> 
I am also concerned that GCC has some serious bugs in its block
frequency handling.  For example:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67755


> I found it also problematic that GCC does the hot/cold partitioning (not
> sure if it's actually called this way) on the error paths themselves,
> where we already know that the impossible has happened, and all paths
> eventually head towards an abort.  It doesn't make sense for GCC to
> split those that hit a known no-return function from those that hit a
> function which does not return, but GCC cannot infer that from available
> declarations.
> 
> Florian
> 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 20:50   ` Steven Munroe
@ 2016-01-26 21:40     ` Florian Weimer
  2016-01-26 21:48       ` Steven Munroe
                         ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 21:40 UTC (permalink / raw)
  To: munroesj
  Cc: Paul Eggert, Joern Engel, GNU C. Library, Siddhesh Poyarekar,
	Joern Engel

On 01/26/2016 09:50 PM, Steven Munroe wrote:

> So why not fix emacs to stop doing this (purely evil behavior).
> 
> If they want to persist their internal state from session to session
> there are better ways. For example: https://sphde.github.io/

It's complicated, but not just because of the technical challenges
involved.  My warning to the Emacs developers that we are going to clean
this up on the glibc side was not universally well-received.

Some of the Emacs developers think they have a stop-gap solution for
future Emacs binaries: They are going to interpose their own malloc.  My
testing agrees; it should happen automatically once we remove
long-deprecated symbols from <malloc.h> because that will cause their
autoconf check to fail, triggering a switch to their built-in malloc.
(This will happen even before that, once we deprecate and eventually
remove __malloc_get_state and __malloc_set_state from the API.)

For existing binaries with newer glibc, I think I found a way to quickly
rewrite the dumped heap in such a way that we do not have to add any
special cases to the hot paths in free/realloc.  (We don't even have to
use Siddhesh's corrupt arena code for that, although that could be an
option as well.)  The main challenge is whether we can obtain the
address of the first *allocated* chunk on the main arena.  Once we have
that, we can do the heap traversal.  I expect maybe a few dozen lines of
code for this in total, and the constraints on future malloc evolution
are minimal.  It's much better than maintaining two mallocs inside glibc.
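
Only to sketch the idea of such a traversal -- a toy example that walks
chunks by their size words over a fake in-array heap (SIZE_BITS and the
chunk layout are placeholders here, not a claim about the real code):

  #include <stdio.h>
  #include <string.h>

  #define SIZE_BITS 0x7UL   /* placeholder for the low flag bits of a size word */

  struct chunk { size_t prev_size; size_t size; };

  int main (void)
  {
    static size_t heap[96 / sizeof (size_t)];
    unsigned char *base = (unsigned char *) heap;

    /* Three toy chunks whose size words chain them together.  */
    ((struct chunk *) (base +  0))->size = 32 | 1;
    ((struct chunk *) (base + 32))->size = 48 | 1;
    ((struct chunk *) (base + 80))->size = 16 | 1;

    void *end = base + sizeof (heap);
    for (struct chunk *c = (struct chunk *) base; (void *) c < end;
         c = (struct chunk *) ((char *) c + (c->size & ~SIZE_BITS)))
      printf ("chunk at offset %td, size %zu\n",
              (char *) c - (char *) base, c->size & ~SIZE_BITS);
    return 0;
  }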

I also know one reason why valgrind does not work with dumped Emacs
binaries, and it's quite straightforward to fix (it could easily be part
of the traversal to rewrite the heap, but we have to implement it twice,
once in glibc and once for the Emacs malloc implementation).  Let's hope
this is the only fix needed to get valgrind going.

In short, things aren't as bad as we thought they are, and we can
finally fix bug 6527 after we have implemented the traversal of the
dumped heap I mentioned above.

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26 21:35         ` Steven Munroe
@ 2016-01-26 21:42           ` Jeff Law
  2016-01-27  0:37             ` Steven Munroe
  2016-01-26 21:45           ` Florian Weimer
  1 sibling, 1 reply; 119+ messages in thread
From: Jeff Law @ 2016-01-26 21:42 UTC (permalink / raw)
  To: munroesj, Florian Weimer
  Cc: Yury Gribov, Joern Engel, GNU C. Library, Siddhesh Poyarekar,
	Joern Engel

On 01/26/2016 02:34 PM, Steven Munroe wrote:
> On Tue, 2016-01-26 at 22:07 +0100, Florian Weimer wrote:
>> On 01/26/2016 09:43 PM, Steven Munroe wrote:
>>> On Tue, 2016-01-26 at 10:56 +0300, Yury Gribov wrote:
>>>> On 01/26/2016 03:24 AM, Joern Engel wrote:
>>>>> From: Joern Engel <joern@purestorage.org>
>>>>>
>>>>> It was disabled anyway and only served as obfuscation.  No change
>>>>> post-compilation.
>>>>
>>>> FYI I've witnessed significant improvements from (real) __builtin_expect
>>>> in other projects.
>>>>
>>> It depends on the platform and if the programmer correctly understands
>>> the behavior of the program as written.
>>>
>>> Net, except for error cases that "should not happen, ever", a bad idea.
>>
>> Based on what I saw, glibc uses __builtin_expect and the macros derived
>> from it in two conflicting ways: to express that one alternative is more
>> likely that the other, and to state that some alternative is impossible
>> in practice (for a well-behaved program in particular).
>>
>> GCC's current interpretation leans towards the latter, at least on
>> x86_64.  I think GCC even puts unlikely code into separate text sections
>> in some cases.  Most of our __builtin_expect uses seem to be of the
>> former nature: things that can and do happen during normal operation,
>> like an unusual character in a character set conversion, or a
>> locale-related environment variable which is set.
>>
> I am also concerned the GCC has some serious bugs in its block frequency
> handling. For example:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67755
Ha, already fixed that one :-)
Jeff

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26 21:35         ` Steven Munroe
  2016-01-26 21:42           ` Jeff Law
@ 2016-01-26 21:45           ` Florian Weimer
  1 sibling, 0 replies; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 21:45 UTC (permalink / raw)
  To: munroesj
  Cc: Yury Gribov, Joern Engel, GNU C. Library, Siddhesh Poyarekar,
	Joern Engel

On 01/26/2016 10:34 PM, Steven Munroe wrote:

>> Based on what I saw, glibc uses __builtin_expect and the macros derived
>> from it in two conflicting ways: to express that one alternative is more
>> likely that the other, and to state that some alternative is impossible
>> in practice (for a well-behaved program in particular).
>>
>> GCC's current interpretation leans towards the latter, at least on
>> x86_64.  I think GCC even puts unlikely code into separate text sections
>> in some cases.  Most of our __builtin_expect uses seem to be of the
>> former nature: things that can and do happen during normal operation,
>> like an unusual character in a character set conversion, or a
>> locale-related environment variable which is set.
>>
> I am also concerned the GCC has some serious bugs in its block frequency
> handling. For example:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67755

Hmm.  So maybe what I saw looking at disassembly was a consequence of
this bug, and not intended GCC behavior.

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 21:40     ` Florian Weimer
@ 2016-01-26 21:48       ` Steven Munroe
  2016-01-26 21:51         ` Florian Weimer
  2016-01-26 21:51       ` Paul Eggert
  2016-01-26 22:00       ` Jörn Engel
  2 siblings, 1 reply; 119+ messages in thread
From: Steven Munroe @ 2016-01-26 21:48 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Paul Eggert, Joern Engel, GNU C. Library, Siddhesh Poyarekar,
	Joern Engel

On Tue, 2016-01-26 at 22:40 +0100, Florian Weimer wrote:
> On 01/26/2016 09:50 PM, Steven Munroe wrote:
> 
> > So why not fix emacs to stop doing this (purely evil behavior).
> > 
> > If they want to persist their internal state from session to session
> > there are better ways. For example: https://sphde.github.io/
> 
> It's complicated, but not just for the technical challenges involved.
> My warning to the Emacs developers that we are going to clean this up on
> the glibc side was not universally well-received.
> 
> Some of the Emacs developers think they have a stop-gap solution for
> future Emacs binaries: They are going to interpose their own malloc.  My
> testing agrees; it should happen automatically once we remove
> long-deprecated symbols from <malloc.h> because that will cause their
> autoconf check to fail, triggering a switch to their built-in malloc.
> (This will happen even before with deprecate and eventually remove
> __malloc_get_state and __malloc_set_state from the API.)
> 
> For existing binaries with newer glibc, I think I found a way to quickly
> rewrite the dumped heap in such a way that we do not have to add any
> special cases to the hot paths in free/realloc.  (We don't even have to
> use Siddhesh's corrupt arena code for that, although that could be an
> option as well.)  The main challenge is whether we can obtain the
> address of the first *allocated* chunk on the main arena.  Once we have
> that, we can do the heap traversal.  I expect maybe a few dozen lines of
> code for this in total, and the constraints on future malloc evolution
> are minimal.  It's much better than maintaining two mallocs inside glibc.
> 
Will this allow the PPC32 ABI to finally use the correct 16-byte
alignment for malloc? Otherwise it is hard to use vector or
_Decimal128...

> I also know one reason why valgrind does not work with dumped Emacs
> binaries, and it's quite straightforward to fix (it could easily be part
> of the traversal to rewrite the heap, but we have to implement it twice,
> once in glibc and once for the Emacs malloc implementation).  Let's hope
> this is the only fix needed to get valgrind going.
> 
> In short, things aren't as bad as we thought they are, and we can
> finally fix bug 6527 after we have implemented the traversal of the
> dumped heap I mentioned above.
> 
> Florian
> 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 21:40     ` Florian Weimer
  2016-01-26 21:48       ` Steven Munroe
@ 2016-01-26 21:51       ` Paul Eggert
  2016-01-26 21:57         ` Florian Weimer
  2016-01-26 22:00       ` Jörn Engel
  2 siblings, 1 reply; 119+ messages in thread
From: Paul Eggert @ 2016-01-26 21:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C. Library

On 01/26/2016 01:40 PM, Florian Weimer wrote:
> In short, things aren't as bad as we thought they are, and we can
> finally fix bug 6527 after we have implemented the traversal of the
> dumped heap I mentioned above.

Thanks for the heads-up. We have also circulated some changes on the 
Emacs side, to help Emacs work better with the Cygwin port of glibc, 
which also has problems in this area. These Emacs changes are not yet 
installed and are not currently planned for the next Emacs release, but 
are likely to go in after that. When it's convenient I would like to 
test these Emacs changes with the glibc malloc changes that you're 
envisioning, and iron out any glitches that come up.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 21:48       ` Steven Munroe
@ 2016-01-26 21:51         ` Florian Weimer
  0 siblings, 0 replies; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 21:51 UTC (permalink / raw)
  To: munroesj
  Cc: Paul Eggert, Joern Engel, GNU C. Library, Siddhesh Poyarekar,
	Joern Engel

On 01/26/2016 10:48 PM, Steven Munroe wrote:

> Will this allow the PPC32 ABI to finally use the correct 16-byte
> alignment for malloc? Otherwise it is hard to use vector or
> _Decimal128...

I think this is …

>> In short, things aren't as bad as we thought they are, and we can
>> finally fix bug 6527 after we have implemented the traversal of the
>> dumped heap I mentioned above.

bug 6527, no?

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 21:51       ` Paul Eggert
@ 2016-01-26 21:57         ` Florian Weimer
  2016-01-26 22:18           ` Paul Eggert
  0 siblings, 1 reply; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 21:57 UTC (permalink / raw)
  To: Paul Eggert; +Cc: GNU C. Library

On 01/26/2016 10:51 PM, Paul Eggert wrote:
> On 01/26/2016 01:40 PM, Florian Weimer wrote:
>> In short, things aren't as bad as we thought they are, and we can
>> finally fix bug 6527 after we have implemented the traversal of the
>> dumped heap I mentioned above.
> 
> Thanks for the heads-up. We have also circulated some changes on the
> Emacs side, to help Emacs work better with the Cygwin port of glibc,
> which also has problems in this area. These Emacs changes are not yet
> installed and are not currently planned for the next Emacs release, but
> are likely to go in after that. When it's convenient I would like to
> test these Emacs changes with the glibc malloc changes that you're
> envisioning, and iron out any glitches that come up.

One change you can make *today* is to include <malloc.h> in src/emacs.c
(conditionally for dlmalloc), and not just the autoconf test case.  This
way, you will actually see new deprecation warnings as they appear in
the header file.
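
Roughly (the guard name below is only my guess at how the Emacs configury
spells its dlmalloc check):

  /* in src/emacs.c -- sketch only */
  #ifdef DOUG_LEA_MALLOC     /* assumed name of the "glibc/dlmalloc in use" guard */
  # include <malloc.h>       /* so the deprecation annotations are actually seen */
  #endif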

You won't get one for __malloc_initialize_hook, even though it's been
deprecated for close to five years now.  This is because GCC does not
warn if you supply a definition for a deprecated declaration, and
__malloc_initialize_hook is used through symbol interposition (because
any assignment would happen too late).

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 21:40     ` Florian Weimer
  2016-01-26 21:48       ` Steven Munroe
  2016-01-26 21:51       ` Paul Eggert
@ 2016-01-26 22:00       ` Jörn Engel
  2016-01-26 22:02         ` Florian Weimer
  2 siblings, 1 reply; 119+ messages in thread
From: Jörn Engel @ 2016-01-26 22:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: munroesj, Paul Eggert, GNU C. Library, Siddhesh Poyarekar, Joern Engel

On Tue, Jan 26, 2016 at 10:40:50PM +0100, Florian Weimer wrote:
> On 01/26/2016 09:50 PM, Steven Munroe wrote:
> 
> > So why not fix emacs to stop doing this (purely evil behavior).
> > 
> > If they want to persist their internal state from session to session
> > there are better ways. For example: https://sphde.github.io/
> 
> It's complicated, but not just for the technical challenges involved.
> My warning to the Emacs developers that we are going to clean this up on
> the glibc side was not universally well-received.

One option is to do something like

#define USE_NEW_MALLOC
#include <stdlib.h>

It is admittedly tasteless and will only look worse over time.  But it
is no worse than linking against an alternative malloc.  If there is too
much inertia from legacy users, this is the low-tech variant to leave
them behind and improve malloc.

Better alternatives are certainly welcome.  But if they take another
year to get implemented, I wonder if we should even bother.  It was a
question people asked me several times and answering that question only
gets harder over time.

Jörn

--
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 22:00       ` Jörn Engel
@ 2016-01-26 22:02         ` Florian Weimer
  0 siblings, 0 replies; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 22:02 UTC (permalink / raw)
  To: Jörn Engel
  Cc: munroesj, Paul Eggert, GNU C. Library, Siddhesh Poyarekar, Joern Engel

On 01/26/2016 10:59 PM, Jörn Engel wrote:
> On Tue, Jan 26, 2016 at 10:40:50PM +0100, Florian Weimer wrote:
>> On 01/26/2016 09:50 PM, Steven Munroe wrote:
>>
>>> So why not fix emacs to stop doing this (purely evil behavior).
>>>
>>> If they want to persist their internal state from session to session
>>> there are better ways. For example: https://sphde.github.io/
>>
>> It's complicated, but not just for the technical challenges involved.
>> My warning to the Emacs developers that we are going to clean this up on
>> the glibc side was not universally well-received.
> 
> One option is to do something like
> 
> #define USE_NEW_MALLOC
> #include <stdlib.h>

This is not an option because we would have to prevent all libraries
which Emacs links against from doing this (recursively).  This does not
work.

And we don't want to maintain two mallocs inside glibc.

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 21:57         ` Florian Weimer
@ 2016-01-26 22:18           ` Paul Eggert
  2016-01-26 22:24             ` Florian Weimer
  0 siblings, 1 reply; 119+ messages in thread
From: Paul Eggert @ 2016-01-26 22:18 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C. Library

On 01/26/2016 01:57 PM, Florian Weimer wrote:
> One change you can make *today* is to include <malloc.h> in src/emacs.c
> (conditionally for dlmalloc), and not just the autoconf test case.

Emacs has included <malloc.h> for decades, in src/alloca.c, so this part 
is already done.

> you will actually see new deprecation warnings as they appear in
> the header file.

In practice this kind of deprecation warning tends to be more trouble 
than it's worth. We already know about the compatibility issues, and 
don't need the deprecation warnings to remind us. The warnings will 
probably cause needless work for users who will not understand the 
details and who will send us low-value bug reports.

Come to think of it, how can we shut the deprecation warnings off? There 
should be an easy way for Emacs to do that.
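
(Presumably a GCC diagnostic pragma around the relevant uses would do it --
a sketch, though something less invasive would be nicer:)

  #include <malloc.h>

  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  /* ... the code that references the deprecated hooks goes here ... */
  #pragma GCC diagnostic pop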

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 22:18           ` Paul Eggert
@ 2016-01-26 22:24             ` Florian Weimer
  2016-01-27  1:31               ` Paul Eggert
  0 siblings, 1 reply; 119+ messages in thread
From: Florian Weimer @ 2016-01-26 22:24 UTC (permalink / raw)
  To: Paul Eggert; +Cc: GNU C. Library

On 01/26/2016 11:18 PM, Paul Eggert wrote:
> On 01/26/2016 01:57 PM, Florian Weimer wrote:
>> One change you can make *today* is to include <malloc.h> in src/emacs.c
>> (conditionally for dlmalloc), and not just the autoconf test case.
> 
> Emacs has included <malloc.h> for decades, in src/alloca.c, so this part
> is already done.

But this file does not use any of the deprecated (or
soon-to-be-deprecated) identifiers.

The deprecation warnings I'm talking about are not #warning directives
in the header file.  You only get them if you use the very identifiers
annotated with deprecation warnings (or if you use a preprocessor macro
that is subject to a warning hack).
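
A generic illustration of the difference (made-up identifier, not an
actual glibc declaration):

  #include <stdio.h>

  int old_tunable __attribute__ ((deprecated)) = 42;

  int unaffected (void) { return 0; }            /* declaration alone: no warning */
  int affected (void)   { return old_tunable; }  /* this use triggers the warning */

  int main (void)
  {
    printf ("%d\n", affected ());
    return unaffected ();
  }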

Florian

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-26 21:42           ` Jeff Law
@ 2016-01-27  0:37             ` Steven Munroe
  2016-01-27  3:16               ` Jeff Law
  0 siblings, 1 reply; 119+ messages in thread
From: Steven Munroe @ 2016-01-27  0:37 UTC (permalink / raw)
  To: Jeff Law
  Cc: Florian Weimer, Yury Gribov, Joern Engel, GNU C. Library,
	Siddhesh Poyarekar, Joern Engel

On Tue, 2016-01-26 at 14:42 -0700, Jeff Law wrote:
> On 01/26/2016 02:34 PM, Steven Munroe wrote:
> > On Tue, 2016-01-26 at 22:07 +0100, Florian Weimer wrote:
> >> On 01/26/2016 09:43 PM, Steven Munroe wrote:
> >>> On Tue, 2016-01-26 at 10:56 +0300, Yury Gribov wrote:
> >>>> On 01/26/2016 03:24 AM, Joern Engel wrote:
> >>>>> From: Joern Engel <joern@purestorage.org>
> >>>>>
> >>>>> It was disabled anyway and only served as obfuscation.  No change
> >>>>> post-compilation.
> >>>>
> >>>> FYI I've witnessed significant improvements from (real) __builtin_expect
> >>>> in other projects.
> >>>>
> >>> It depends on the platform and if the programmer correctly understands
> >>> the behavior of the program as written.
> >>>
> >>> Net, except for error cases that "should not happen, ever", a bad idea.
> >>
> >> Based on what I saw, glibc uses __builtin_expect and the macros derived
> >> from it in two conflicting ways: to express that one alternative is more
> >> likely that the other, and to state that some alternative is impossible
> >> in practice (for a well-behaved program in particular).
> >>
> >> GCC's current interpretation leans towards the latter, at least on
> >> x86_64.  I think GCC even puts unlikely code into separate text sections
> >> in some cases.  Most of our __builtin_expect uses seem to be of the
> >> former nature: things that can and do happen during normal operation,
> >> like an unusual character in a character set conversion, or a
> >> locale-related environment variable which is set.
> >>
> > I am also concerned the GCC has some serious bugs in its block frequency
> > handling. For example:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67755
> Ha, already fixed that one :-)

But there are more, we are still digging them out.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 22:24             ` Florian Weimer
@ 2016-01-27  1:31               ` Paul Eggert
  0 siblings, 0 replies; 119+ messages in thread
From: Paul Eggert @ 2016-01-27  1:31 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C. Library

[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]

On 01/26/2016 02:24 PM, Florian Weimer wrote:
> But this file does not use any of the deprecated (or
> soon-to-be-deprecated) identifiers.
>
> The deprecation warnings I'm talking about are not #warning directives
> in the header file.  You only get them if you use the very identifiers
> annotated with deprecation warnings (or if you use a preprocessor macro
> that is subject to a warning hack).

Thanks, I see now. However, GCC does not issue the deprecation warning 
even if the hook is used, so long as the hook is initialized the way 
that Emacs does it. (Emacs must initialize the hook statically because 
the initialization must take effect before 'main' is called.) So, for 
example, if I compile the attached program on Fedora 23 with GCC, the 
program compiles and runs without complaint.

The failure to warn is arguably a GCC bug. It's a blessing in disguise 
for Emacs, as I don't want the deprecation warning there anyway.

I plan to install something into Emacs that includes <malloc.h> before 
initializing the hook, though, as that's just standard hygiene. Perhaps 
some day GCC will issue the warning you want....


[-- Attachment #2: t.c --]
[-- Type: text/plain, Size: 376 bytes --]

#include <unistd.h>
#include <malloc.h>

#ifndef __MALLOC_HOOK_VOLATILE
# define __MALLOC_HOOK_VOLATILE
#endif

static int foo = 1;

static void
malloc_initialize_hook (void)
{
  foo = 0;
}

void (*__MALLOC_HOOK_VOLATILE __malloc_initialize_hook) (void)
  = malloc_initialize_hook;

int
main (void)
{
  char *p = malloc (100);
  read (-1, p, 100);
  free (p);
  return foo;
}

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: remove __builtin_expect
  2016-01-27  0:37             ` Steven Munroe
@ 2016-01-27  3:16               ` Jeff Law
  0 siblings, 0 replies; 119+ messages in thread
From: Jeff Law @ 2016-01-27  3:16 UTC (permalink / raw)
  To: munroesj
  Cc: Florian Weimer, Yury Gribov, Joern Engel, GNU C. Library,
	Siddhesh Poyarekar, Joern Engel

On 01/26/2016 05:37 PM, Steven Munroe wrote:
> On Tue, 2016-01-26 at 14:42 -0700, Jeff Law wrote:
>> On 01/26/2016 02:34 PM, Steven Munroe wrote:
>>> On Tue, 2016-01-26 at 22:07 +0100, Florian Weimer wrote:
>>>> On 01/26/2016 09:43 PM, Steven Munroe wrote:
>>>>> On Tue, 2016-01-26 at 10:56 +0300, Yury Gribov wrote:
>>>>>> On 01/26/2016 03:24 AM, Joern Engel wrote:
>>>>>>> From: Joern Engel <joern@purestorage.org>
>>>>>>>
>>>>>>> It was disabled anyway and only served as obfuscation.  No change
>>>>>>> post-compilation.
>>>>>>
>>>>>> FYI I've witnessed significant improvements from (real) __builtin_expect
>>>>>> in other projects.
>>>>>>
>>>>> It depends on the platform and if the programmer correctly understands
>>>>> the behavior of the program as written.
>>>>>
>>>>> Net, except for error cases that "should not happen, ever", a bad idea.
>>>>
>>>> Based on what I saw, glibc uses __builtin_expect and the macros derived
>>>> from it in two conflicting ways: to express that one alternative is more
>>>> likely that the other, and to state that some alternative is impossible
>>>> in practice (for a well-behaved program in particular).
>>>>
>>>> GCC's current interpretation leans towards the latter, at least on
>>>> x86_64.  I think GCC even puts unlikely code into separate text sections
>>>> in some cases.  Most of our __builtin_expect uses seem to be of the
>>>> former nature: things that can and do happen during normal operation,
>>>> like an unusual character in a character set conversion, or a
>>>> locale-related environment variable which is set.
>>>>
>>> I am also concerned the GCC has some serious bugs in its block frequency
>>> handling. For example:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67755
>> Ha, already fixed that one :-)
>
> But there are more, we are still digging them out.
And we'll keep squashing 'em!

I suspect the FSM threader is going to be the source of a lot of these 
issues in the future.

jeff

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
       [not found]             ` <56A8966D.9080000@arm.com>
@ 2016-01-27 17:45               ` Jörn Engel
  2016-01-27 19:19                 ` Torvald Riegel
  2016-01-27 21:36                 ` Carlos O'Donell
  0 siblings, 2 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-27 17:45 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Yury Gribov, Florian Weimer, GNU C. Library, Siddhesh Poyarekar, nd

On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote:
> 
> portable code does not need the fix, but all users
> of the api are affected by the overhead of the fix.

Sorry, but you have no idea what you are talking about.  The overhead of
the fix is _negative_.  Users _want_ to be affected.

The reason why free() is not signal-safe is that it spins on an
arena lock.  If the same thread already holds that lock, you have a
classic deadlock.  By not spinning on the lock you make code run faster.
You also fix the signal-induced deadlock.
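
For anyone who has not run into it, the failure mode is the textbook
self-deadlock; a minimal sketch with a plain mutex standing in for the
arena lock:

  #include <pthread.h>
  #include <signal.h>

  static pthread_mutex_t arena_lock = PTHREAD_MUTEX_INITIALIZER;

  static void toy_free (void)      /* stand-in for free() taking the arena lock */
  {
    pthread_mutex_lock (&arena_lock);
    /* ... unlink the chunk ... */
    pthread_mutex_unlock (&arena_lock);
  }

  static void handler (int sig)
  {
    (void) sig;
    toy_free ();                   /* re-enters the "allocator" from the handler */
  }

  int main (void)
  {
    signal (SIGUSR1, handler);
    pthread_mutex_lock (&arena_lock);   /* thread is inside the allocator ...  */
    raise (SIGUSR1);                    /* ... signal arrives; the handler blocks
                                           on the same lock: self-deadlock      */
    pthread_mutex_unlock (&arena_lock); /* never reached */
    return 0;
  }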

Ok, for thread-cache there is some overhead.  Without signal-safety you
wouldn't need a lock for the thread-cache at all.  But here I call
bullshit again, because I had the same concerns.  Then I measured and
could not demonstrate any performance impact.

"If you cannot measure it, it does not exist."

You might disagree philosophically, but if an engineer goes out of his
way to measure a certain effect and finds nothing, that effect hardly
matters in any practical way.

Jörn

--
Only children believe that ideas are simply born.  Usually two other
ideas have sex.
-- Jon Gnarr

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-27 17:45               ` Jörn Engel
@ 2016-01-27 19:19                 ` Torvald Riegel
  2016-01-27 19:43                   ` Jörn Engel
  2016-01-27 21:36                 ` Carlos O'Donell
  1 sibling, 1 reply; 119+ messages in thread
From: Torvald Riegel @ 2016-01-27 19:19 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Szabolcs Nagy, Yury Gribov, Florian Weimer, GNU C. Library,
	Siddhesh Poyarekar, nd

On Wed, 2016-01-27 at 09:44 -0800, Jörn Engel wrote:
> On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote:
> > 
> > portable code does not need the fix, but all users
> > of the api are affected by the overhead of the fix.
> 
> Sorry, but you have no idea what you are talking about.  The overhead of
> the fix is _negative_.  Users _want_ to be affected.
> 
> The reason why free() is not signalsafe is that it spins on an
> arena-lock.  If the same thread already holds that lock, you have a
> classic deadlock.  By not spinning on the lock you make code run faster.
> You also fix the signal-induced deadlock.
> 
> Ok, for thread-cache there is some overhead.  Without signal-safety you
> wouldn't need a lock for the thread-cache at all.  But here I call
> bullshit again, because I had the same concerns.  Then I measured and
> could not demonstrate any performance impact.
> 
> "If you cannot measure it, it does not exist."
> 
> You might disagree philosophically, but if an engineers goes out of his
> way to measure a certain effect and finds nothing, that effect hardly
> matters in any practical way.

Then it should be sufficient if you can describe what you actually
measured to convince Szabolcs; if your measurement is as good as you
seem to say it is, it should be obvious why his (and your prior)
concerns are unfounded.  Up to now, you just stated that you did measure
something.

I'd also like to kindly point out that we're not on LKML here.  We all
will understand you better if you provide sound arguments and the
necessary background information rather than "calling bullshit".

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-27 19:19                 ` Torvald Riegel
@ 2016-01-27 19:43                   ` Jörn Engel
  0 siblings, 0 replies; 119+ messages in thread
From: Jörn Engel @ 2016-01-27 19:43 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Szabolcs Nagy, Yury Gribov, Florian Weimer, GNU C. Library,
	Siddhesh Poyarekar, nd

On Wed, Jan 27, 2016 at 08:19:02PM +0100, Torvald Riegel wrote:
> 
> Then it should be sufficient if you can describe what you actually
> measured to convince Szabolcs; if your measurement is as good as you
> seem to say it is, it should be obvious why his (and your prior)
> concerns are unfounded.  Up to now, you just stated that you did measure
> something.

I believe I ran my testsuite (see tarball from the other subthread) with
and without locking for the thread cache.  Signal torture obviously
didn't finish without locking.  The performance impact for the others
was well in the noise.  Maybe with a few hundred repetitions I could
have extracted some non-noise signal, but I was happy with that result.

Since my testsuite spends >50% of the time in malloc, any real-world
results would get diluted even further.

> I'd also like to kindly point out that we're not on LKML here.  We all
> will understand you better if you provide sounds arguments and provide
> the necessary background information instead of if you "call bullshit".

Fair.

I have little patience for theoretical concerns that block real
improvements based on zero evidence.  But I shouldn't drop my manners
because of that.  Apologies.

Jörn

--
The cheapest, fastest and most reliable components of a computer
system are those that aren't there.
-- Gordon Bell, DEC laboratories

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26 17:57                   ` Mike Frysinger
@ 2016-01-27 20:46                     ` Manuel López-Ibáñez
  0 siblings, 0 replies; 119+ messages in thread
From: Manuel López-Ibáñez @ 2016-01-27 20:46 UTC (permalink / raw)
  To: Jörn Engel, Florian Weimer, Siddhesh Poyarekar, Paul Eggert,
	GNU C. Library, Joern Engel

Jörn (and others),

Judging from similar discussions in the GCC list:

https://gcc.gnu.org/ml/gcc/2010-04/msg00847.html
https://gcc.gnu.org/ml/gcc/2010-04/msg00725.html

I would strongly advise you to stop discussing this here and instead contact the 
copyright clerk of the FSF and explain your case: "I want to contribute to GNU 
libc, but I don't want to assign copyright because of X, Y, and Z. What can I do?"

Why?

1) As the thread in GCC shows, almost every contributor to any GNU project 
either doesn't know the details or is completely wrong about them. At most, they 
can tell you about their personal case; however, the FSF changes its processes 
in response to legal changes, so personal experience is of limited value.

2) The process is very individualized. It depends on where you live, where you 
work, how much you want to contribute, etc. Only the FSF clerk (who I guess is 
a lawyer or has access to a lawyer) knows all the fine details.

3) The FSF is very flexible and will try to personalize the process to your 
circumstances and desires. So explain to them in detail what your worries and 
questions are. In my first CA, I changed some parts of the document to please my 
university (whose lawyers had to sign it), and the FSF was ok with it.

As I said in that thread, "the process is overly complex, obscure, confusing 
and slow. It does not seem that it needs to be so. It is scaring away potential
contributors and slowing down GCC development." and "It would be
extremely useful if anyone that feels confident on the subject wrote a
FAQ-like document in the wiki that we could use as a reference for the
future."

However, only the FSF can fix this; any discussion here about the what, the how 
or the why is pointless. (Yet, more than 5 years have passed since the above 
discussion and the situation has not improved. I firmly believe that the CA is 
the number one hurdle for contributing to GCC, more than the GPLv3 or the code 
itself.)

Cheers,

	Manuel.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [PATCH] malloc: add locking to thread cache
  2016-01-27 17:45               ` Jörn Engel
  2016-01-27 19:19                 ` Torvald Riegel
@ 2016-01-27 21:36                 ` Carlos O'Donell
  1 sibling, 0 replies; 119+ messages in thread
From: Carlos O'Donell @ 2016-01-27 21:36 UTC (permalink / raw)
  To: Jörn Engel, Szabolcs Nagy
  Cc: Yury Gribov, Florian Weimer, GNU C. Library, Siddhesh Poyarekar, nd

On 01/27/2016 12:44 PM, Jörn Engel wrote:
> On Wed, Jan 27, 2016 at 10:05:33AM +0000, Szabolcs Nagy wrote:
>>
>> portable code does not need the fix, but all users
>> of the api are affected by the overhead of the fix.
> 
> Sorry, but you have no idea what you are talking about.  The overhead of
> the fix is _negative_.  Users _want_ to be affected.
> 
> The reason why free() is not signalsafe is that it spins on an
> arena-lock.  If the same thread already holds that lock, you have a
> classic deadlock.  By not spinning on the lock you make code run faster.
> You also fix the signal-induced deadlock.
> 
> Ok, for thread-cache there is some overhead.  Without signal-safety you
> wouldn't need a lock for the thread-cache at all.  But here I call
> bullshit again, because I had the same concerns.  Then I measured and
> could not demonstrate any performance impact.
> 
> "If you cannot measure it, it does not exist."
> 
> You might disagree philosophically, but if an engineers goes out of his
> way to measure a certain effect and finds nothing, that effect hardly
> matters in any practical way.

My mantra is slightly different:

"If others cannot measure it, your performance gains don't exist."

We are always looking for microbenchmark contributions that others
can run objectively to evaluate any such changes.
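
To give a rough idea of the shape such a contribution can take, a trivial
timing-loop sketch (glibc's benchtests directory has its own harness and
conventions, so treat this purely as an illustration):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int main (void)
  {
    enum { ITERS = 1000000, SZ = 128 };
    struct timespec t0, t1;

    clock_gettime (CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++)
      free (malloc (SZ));
    clock_gettime (CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf ("%.1f ns per malloc+free of %d bytes\n", ns / ITERS, SZ);
    return 0;
  }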

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  0:50 ` malloc: performance improvements and bugfixes Paul Eggert
  2016-01-26  1:01   ` Jörn Engel
  2016-01-26 20:50   ` Steven Munroe
@ 2016-01-27 21:45   ` Carlos O'Donell
  2 siblings, 0 replies; 119+ messages in thread
From: Carlos O'Donell @ 2016-01-27 21:45 UTC (permalink / raw)
  To: Paul Eggert, Joern Engel, GNU C. Library; +Cc: Siddhesh Poyarekar, Joern Engel

On 01/25/2016 07:50 PM, Paul Eggert wrote:
> Thanks for doing all the work and bringing it to our attention. A
> couple of comments on the process:
> 
> On 01/25/2016 04:24 PM, Joern Engel wrote:
>> I happen to prefer the kernel coding style over GNU coding style.
> 
> Nevertheless, let's keep the GNU coding style for glibc. In some
> places the existing code doesn't conform to that style but that can
> be fixed as we go.
> 
>> I believe there are some applications that have deeper knowledge
>> about malloc-internal data structures than they should
>> (*cough*emacs).  As a result it has become impossible to change the
>> internals of malloc without breaking said applications
> 
> This underestimates Emacs. :-)
> 
> Emacs "knows" so much about glibc malloc's internal data structures
> that Emacs should do the right thing if glibc removes the hooks in
> question. Of course we should test the resulting combination.
> However, the point is that Emacs's usage of glibc malloc internals
> shouldn't ossify glibc malloc.

Agreed. Paul, thank you for your contribution to make sure that
glibc and emacs empower the GNU ecosystem as best they can together.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: malloc: performance improvements and bugfixes
  2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
                   ` (63 preceding siblings ...)
  2016-01-26  0:50 ` malloc: performance improvements and bugfixes Paul Eggert
@ 2016-01-28 13:51 ` Carlos O'Donell
  64 siblings, 0 replies; 119+ messages in thread
From: Carlos O'Donell @ 2016-01-28 13:51 UTC (permalink / raw)
  To: Joern Engel, GNU C. Library, Will Newton, Joseph S. Myers,
	y.gribov, Adhemerval Zanella, Szabolcs Nagy, Mike Frysinger,
	Paul Eggert, Florian Weimer, Steven Munroe, lopezibanez
  Cc: Siddhesh Poyarekar, Joern Engel

On 01/25/2016 07:24 PM, Joern Engel wrote:
> From: Joern Engel <joern@purestorage.org>
> 
> Short version:
> We have forked libc malloc and added a bunch of patches on top.  Some
> patches help performance, some fix bugs, many just change the code to
> my personal liking.  Here is a braindump that is _not_ intended to be
> merged, at least not as-is.  But individual bits could and should get
> extracted.

You certainly have done quite a bit of work to tune glibc's general
purpose allocator into something that specifically suits your application
needs. Thank you for sharing the high-level details of your changes.

I appreciate your posting of these patches to the mailing list; they are
certainly very interesting (even if I can't read them beyond their
general description, and I wish I could).

I would like to respond formally, as a GNU project steward for the GNU
C Library project, to the top of this thread and make two points very
clear (not buried in another thread):

(1) The GNU C Library project, as a GNU project, requires copyright 
    assignment for legally significant changes by a particular author.

    This requirement is described on the project's development page:
    http://www.gnu.org/software/libc/development.html

    See "FSF copyright assignment" in the contribution checklist:
    https://sourceware.org/glibc/wiki/Contribution%20checklist

    If you have any questions, please reach out directly to the FSF.

    Even though you never intended to submit these patches for inclusion,
    as you say in your email, you stand to taint anyone who reads these
    changes and then tries to implement similar work. This is nothing new;
    we face this challenge every day, but it bears repeating, and you
    present the perfect opportunity to reiterate this message.

(2) Objective performance evaluations.

    We take microbenchmarking and whole-system benchmarking very seriously,
    and any changes presented by anyone, senior or junior, should provide
    methods of objective evaluation of performance.
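
As a purely illustrative sketch (not part of this message or of the patch
series), an objective evaluation could start from something as small as the
microbenchmark below; the allocation size and iteration count are arbitrary
assumptions, and glibc's benchtests would normally be the place for numbers
accompanying a submission:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Minimal sketch of a malloc/free microbenchmark.  A real evaluation
   would sweep sizes, thread counts, and allocation patterns. */

enum { ITERATIONS = 1000000, ALLOC_SIZE = 64 };

int main(void)
{
	struct timespec start, end;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < ITERATIONS; i++) {
		void *p = malloc(ALLOC_SIZE);
		if (p == NULL)
			abort();
		/* Touch the block so the allocation cannot be elided. */
		*(volatile char *)p = 1;
		free(p);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("%d malloc/free pairs of %d bytes: %.3f seconds\n",
	       ITERATIONS, ALLOC_SIZE,
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}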

While I know you did not intend your posts to be considered for
formal inclusion, it is still a good time to point out the above
two important criteria. This way others watching don't wonder why
we didn't simply include your patches directly into glibc master
with some minor cleanups.

From patch inception to commit there is always a lot of work to
do for a core project like glibc.

I hope to see, and look forward to, your future submissions to the project.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 119+ messages in thread

end of thread, other threads:[~2016-01-28 13:51 UTC | newest]

Thread overview: 119+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-26  0:26 malloc: performance improvements and bugfixes Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: push down the memset for huge pages Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: turn arena_get() into a function Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: use MAP_HUGETLB when possible Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: Lindent new_heap Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: unobfuscate an assert Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: remove dead code Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: remove emacs style guards Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: unifdef -m -Ulibc_hidden_def Joern Engel
2016-01-26  0:26 ` [PATCH] malloc: kill mprotect Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: remove mstate typedef Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: Lindent before functional changes Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: remove __builtin_expect Joern Engel
2016-01-26  7:56   ` Yury Gribov
2016-01-26  9:00     ` Jörn Engel
2016-01-26  9:37       ` Yury Gribov
2016-01-26 15:46       ` Jeff Law
2016-01-26 20:43     ` Steven Munroe
2016-01-26 21:08       ` Florian Weimer
2016-01-26 21:35         ` Steven Munroe
2016-01-26 21:42           ` Jeff Law
2016-01-27  0:37             ` Steven Munroe
2016-01-27  3:16               ` Jeff Law
2016-01-26 21:45           ` Florian Weimer
2016-01-26  0:27 ` [PATCH] malloc: s/max_node/num_nodes/ Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: add documentation Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: Lindent public_fREe() Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: make numa_node_count more robust Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: quenche last compiler warnings Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: tune thread cache Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: create a useful assert Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: fix startup races Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: Revert glibc 1d05c2fb9c6f Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: simplify and fix calloc Joern Engel
2016-01-26 10:32   ` Will Newton
2016-01-26  0:27 ` [PATCH] malloc: better inline documentation Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: avoid main_arena Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: limit free_atomic_list() latency Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: unifdef -D__STD_C Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: only free half the objects on tcache_gc Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: use mbind() Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -UATOMIC_FASTBINS Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: fix perturb_byte handling for tcache Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: add locking to thread cache Joern Engel
2016-01-26 12:45   ` Szabolcs Nagy
2016-01-26 13:14     ` Florian Weimer
2016-01-26 13:23       ` Yury Gribov
2016-01-26 13:40         ` Szabolcs Nagy
2016-01-26 18:00           ` Mike Frysinger
     [not found]             ` <56A8966D.9080000@arm.com>
2016-01-27 17:45               ` Jörn Engel
2016-01-27 19:19                 ` Torvald Riegel
2016-01-27 19:43                   ` Jörn Engel
2016-01-27 21:36                 ` Carlos O'Donell
2016-01-26 13:14     ` Yury Gribov
2016-01-26 17:41     ` Jörn Engel
2016-01-26  0:27 ` [PATCH] malloc: fix local_next handling Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: use tsd_getspecific for arena_get Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: use atomic free list Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: hide THREAD_STATS Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: document and fix linked list handling Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: Lindent users of arena_get2 Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: always free objects locklessly Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: initial numa support Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -UPER_THREAD -U_LIBC Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: destroy thread cache on thread exit Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: prefetch for tcache_malloc Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: fix hard-coded constant Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: introduce get_backup_arena() Joern Engel
2016-01-26  0:27 ` [PATCH] malloc: unifdef -m -DUSE_ARENAS -DHAVE_MMAP Joern Engel
2016-01-26  0:28 ` [PATCH] malloc: plug thread-cache memory leak Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: create aliases for malloc, free, Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: fix mbind on old kernels Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: remove tcache prefetching Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: remove stale condition Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: use bitmap to conserve hot bins Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: remove get_backup_arena() from tcache_malloc() Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: define __libc_memalign Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: move out perturb_byte conditionals Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: rename *.ch to *.h Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: brain-dead thread cache Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: remove hooks from malloc() and free() Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: speed up mmap Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: fix calculation of aligned heaps Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: remove atfork hooks Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: remove all remaining hooks Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: Don't call tsd_setspecific before tsd_key_create Joern Engel
2016-01-26  0:32 ` [PATCH] malloc: allow recursion from ptmalloc_init to malloc Joern Engel
2016-01-26  0:50 ` malloc: performance improvements and bugfixes Paul Eggert
2016-01-26  1:01   ` Jörn Engel
2016-01-26  1:52     ` Siddhesh Poyarekar
2016-01-26  2:45       ` Jörn Engel
2016-01-26  3:22         ` Jörn Engel
2016-01-26  6:22           ` Mike Frysinger
2016-01-26  7:54             ` Jörn Engel
2016-01-26  9:53               ` Florian Weimer
2016-01-26 17:05                 ` Jörn Engel
2016-01-26 17:31                   ` Paul Eggert
2016-01-26 17:48                     ` Adhemerval Zanella
2016-01-26 17:49                   ` Joseph Myers
2016-01-26 17:57                   ` Mike Frysinger
2016-01-27 20:46                     ` Manuel López-Ibáñez
2016-01-26 12:37           ` Torvald Riegel
2016-01-26 13:23           ` Florian Weimer
2016-01-26  7:40         ` Paul Eggert
2016-01-26  9:54         ` Florian Weimer
2016-01-26  1:52     ` Joseph Myers
2016-01-26 20:50   ` Steven Munroe
2016-01-26 21:40     ` Florian Weimer
2016-01-26 21:48       ` Steven Munroe
2016-01-26 21:51         ` Florian Weimer
2016-01-26 21:51       ` Paul Eggert
2016-01-26 21:57         ` Florian Weimer
2016-01-26 22:18           ` Paul Eggert
2016-01-26 22:24             ` Florian Weimer
2016-01-27  1:31               ` Paul Eggert
2016-01-26 22:00       ` Jörn Engel
2016-01-26 22:02         ` Florian Weimer
2016-01-27 21:45   ` Carlos O'Donell
2016-01-28 13:51 ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).