public inbox for libc-alpha@sourceware.org
* [PATCH 0/6] malloc: more memory tagging optimizations
@ 2021-03-19 13:25 Szabolcs Nagy
  2021-03-19 13:26 ` [PATCH 1/6] malloc: Use memsize instead of CHUNK_AVAILABLE_SIZE Szabolcs Nagy
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Szabolcs Nagy @ 2021-03-19 13:25 UTC (permalink / raw)
  To: libc-alpha, DJ Delorie

Second set of refactoring patches on top of the previously posted
patches. The two sets together are in the nsz/mtag-2 branch.

The main changes are renaming CHUNK_AVAILABLE_SIZE and changing
chunk2mem so that it no longer derives the tag at mem (the returned
pointer is then not suitable for dereferencing if mem is already
tagged, but that case turns out to be rare).
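
In macro terms the chunk2mem change looks like this (a sketch only;
the exact definitions appear in patches 3 and 4 below):

  /* old: always extract the memory tag at the target address */
  #define chunk2mem(p) ((void *) tag_at ((char *) (p) + CHUNK_HDR_SZ))

  /* new: plain offset; tag extraction only where it is really needed */
  #define chunk2mem(p)     ((void *) ((char *) (p) + CHUNK_HDR_SZ))
  #define chunk2mem_tag(p) ((void *) tag_at ((char *) (p) + CHUNK_HDR_SZ))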

Further changes considered but rejected:

- remove tag_at from mem2chunk too (mem is often known to be
  untagged, or we can assume the internal tag is 0 and turn
  tag_at into an unmask operation; a sketch follows after this
  list). This does not seem to have much performance effect on
  small allocations.

- don't try to use a new tag different from the old one in
  tag_new_usable, since at most call sites the old tag is just
  the internal one, which is already excluded from tag generation;
  only realloc benefits from the current behaviour. This change
  would be somewhat aarch64 specific (tag generation can exclude
  the internally used tag) and does not have much performance
  effect either.
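
A sketch of the rejected mem2chunk variant: assuming the internal tag
is 0 and that the tag lives in the upper address bits as on aarch64
MTE (untag_pointer is a hypothetical helper, not an existing glibc
macro), tag_at could become a plain unmask:

  #define untag_pointer(p) \
    ((void *) ((uintptr_t) (p) & ~((uintptr_t) 0xf << 56)))
  #define mem2chunk_untagged(mem) \
    ((mchunkptr) untag_pointer ((char *) (mem) - CHUNK_HDR_SZ))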

Szabolcs Nagy (6):
  malloc: Use memsize instead of CHUNK_AVAILABLE_SIZE
  malloc: Use different tag after mremap
  malloc: Use chunk2rawmem throughout
  malloc: Rename chunk2rawmem
  malloc: Remove unnecessary tagging around _mid_memalign
  malloc: Ensure mtag code path in checked_request2size is cold

 malloc/hooks.c  |  13 +++--
 malloc/malloc.c | 126 +++++++++++++++++++++++++-----------------------
 2 files changed, 72 insertions(+), 67 deletions(-)

-- 
2.17.1



* [PATCH 1/6] malloc: Use memsize instead of CHUNK_AVAILABLE_SIZE
  2021-03-19 13:25 [PATCH 0/6] malloc: more memory tagging optimizations Szabolcs Nagy
@ 2021-03-19 13:26 ` Szabolcs Nagy
  2021-03-23 20:01   ` DJ Delorie
  2021-03-19 13:26 ` [PATCH 2/6] malloc: Use different tag after mremap Szabolcs Nagy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Szabolcs Nagy @ 2021-03-19 13:26 UTC (permalink / raw)
  To: libc-alpha, DJ Delorie

This is a pure refactoring change that does not affect behaviour.

The CHUNK_AVAILABLE_SIZE name was unclear; the memsize name follows
the existing convention of mem denoting the allocation that is handed
out to the user, while chunk is its internally used container.

The user-owned memory for a given chunk starts at chunk2mem (p) and
its size is memsize (p).  memsize is not valid for dumped heap chunks.

The definition is moved next to the other chunk and mem related macros.
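
For reference, a sketch of what memsize computes, written out as a
function (it mirrors the macro added below; without tagging an in-use,
non-mmapped chunk can also use the next chunk's prev_size field as
data, hence the extra SIZE_SZ bytes):

  static size_t
  memsize_sketch (mchunkptr p)
  {
    if (__MTAG_GRANULE_SIZE > SIZE_SZ && mtag_enabled)
      /* Tagged layout: only the data after the header is usable.  */
      return chunksize (p) - CHUNK_HDR_SZ;
    /* Untagged layout: the next chunk's prev_size field is also
       usable, except for mmapped chunks.  */
    return chunksize (p) - CHUNK_HDR_SZ
           + (chunk_is_mmapped (p) ? 0 : SIZE_SZ);
  }
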
---
 malloc/hooks.c  | 11 +++++------
 malloc/malloc.c | 39 +++++++++++++++++++--------------------
 2 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/malloc/hooks.c b/malloc/hooks.c
index 9474e199c3..b82ff5781b 100644
--- a/malloc/hooks.c
+++ b/malloc/hooks.c
@@ -102,7 +102,7 @@ malloc_check_get_size (mchunkptr p)
 
   assert (using_malloc_checking == 1);
 
-  for (size = CHUNK_AVAILABLE_SIZE (p) - 1;
+  for (size = CHUNK_HDR_SZ + memsize (p) - 1;
        (c = *SAFE_CHAR_OFFSET (p, size)) != magic;
        size -= c)
     {
@@ -130,7 +130,7 @@ mem2mem_check (void *ptr, size_t req_sz)
 
   p = mem2chunk (ptr);
   magic = magicbyte (p);
-  max_sz = CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ;
+  max_sz = memsize (p);
 
   for (i = max_sz - 1; i > req_sz; i -= block_sz)
     {
@@ -175,7 +175,7 @@ mem2chunk_check (void *mem, unsigned char **magic_p)
                                next_chunk (prev_chunk (p)) != p)))
         return NULL;
 
-      for (sz = CHUNK_AVAILABLE_SIZE (p) - 1;
+      for (sz = CHUNK_HDR_SZ + memsize (p) - 1;
 	   (c = *SAFE_CHAR_OFFSET (p, sz)) != magic;
 	   sz -= c)
         {
@@ -200,7 +200,7 @@ mem2chunk_check (void *mem, unsigned char **magic_p)
           ((prev_size (p) + sz) & page_mask) != 0)
         return NULL;
 
-      for (sz = CHUNK_AVAILABLE_SIZE (p) - 1;
+      for (sz = CHUNK_HDR_SZ + memsize (p) - 1;
 	   (c = *SAFE_CHAR_OFFSET (p, sz)) != magic;
 	   sz -= c)
         {
@@ -279,8 +279,7 @@ free_check (void *mem, const void *caller)
   else
     {
       /* Mark the chunk as belonging to the library again.  */
-      (void)tag_region (chunk2rawmem (p), CHUNK_AVAILABLE_SIZE (p)
-                                         - CHUNK_HDR_SZ);
+      (void)tag_region (chunk2rawmem (p), memsize (p));
       _int_free (&main_arena, p, 1);
       __libc_lock_unlock (main_arena.mutex);
     }
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 36583120ce..03eb0f40fa 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1331,18 +1331,6 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    MINSIZE :                                                      \
    ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)
 
-/* Available size of chunk.  This is the size of the real usable data
-   in the chunk, plus the chunk header.  Note: If memory tagging is
-   enabled the layout changes to accomodate the granule size, this is
-   wasteful for small allocations so not done by default.  The logic
-   does not work if chunk headers are not granule aligned.  */
-_Static_assert (__MTAG_GRANULE_SIZE <= CHUNK_HDR_SZ,
-		"memory tagging is not supported with large granule.");
-#define CHUNK_AVAILABLE_SIZE(p)                                       \
-  (__MTAG_GRANULE_SIZE > SIZE_SZ && __glibc_unlikely (mtag_enabled) ? \
-    chunksize (p) :                                                   \
-    chunksize (p) + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))
-
 /* Check if REQ overflows when padded and aligned and if the resulting value
    is less than PTRDIFF_T.  Returns TRUE and the requested size or MINSIZE in
    case the value is less than MINSIZE on SZ or false if any of the previous
@@ -1465,14 +1453,26 @@ checked_request2size (size_t req, size_t *sz) __nonnull (1)
 #pragma GCC poison mchunk_size
 #pragma GCC poison mchunk_prev_size
 
+/* This is the size of the real usable data in the chunk.  Not valid for
+   dumped heap chunks.  */
+#define memsize(p)                                                    \
+  (__MTAG_GRANULE_SIZE > SIZE_SZ && __glibc_unlikely (mtag_enabled) ? \
+    chunksize (p) - CHUNK_HDR_SZ :                                    \
+    chunksize (p) - CHUNK_HDR_SZ + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))
+
+/* If memory tagging is enabled the layout changes to accommodate the granule
+   size; this is wasteful for small allocations so not done by default.
+   Both the chunk header and user data have to be granule aligned.  */
+_Static_assert (__MTAG_GRANULE_SIZE <= CHUNK_HDR_SZ,
+		"memory tagging is not supported with large granule.");
+
 static __always_inline void *
 tag_new_usable (void *ptr)
 {
   if (__glibc_unlikely (mtag_enabled) && ptr)
     {
       mchunkptr cp = mem2chunk(ptr);
-      ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr),
-				    CHUNK_AVAILABLE_SIZE (cp) - CHUNK_HDR_SZ);
+      ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr), memsize (cp));
     }
   return ptr;
 }
@@ -3316,8 +3316,7 @@ __libc_free (void *mem)
       MAYBE_INIT_TCACHE ();
 
       /* Mark the chunk as belonging to the library again.  */
-      (void)tag_region (chunk2rawmem (p),
-			CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
+      (void)tag_region (chunk2rawmem (p), memsize (p));
 
       ar_ptr = arena_for_chunk (p);
       _int_free (ar_ptr, p, 0);
@@ -3459,7 +3458,7 @@ __libc_realloc (void *oldmem, size_t bytes)
       newp = __libc_malloc (bytes);
       if (newp != NULL)
         {
-	  size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
+	  size_t sz = memsize (oldp);
 	  memcpy (newp, oldmem, sz);
 	  (void) tag_region (chunk2rawmem (oldp), sz);
           _int_free (ar_ptr, oldp, 0);
@@ -3675,7 +3674,7 @@ __libc_calloc (size_t n, size_t elem_size)
      regardless of MORECORE_CLEARS, so we zero the whole block while
      doing so.  */
   if (__glibc_unlikely (mtag_enabled))
-    return tag_new_zero_region (mem, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
+    return tag_new_zero_region (mem, memsize (p));
 
   INTERNAL_SIZE_T csz = chunksize (p);
 
@@ -4863,7 +4862,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
           else
             {
 	      void *oldmem = chunk2rawmem (oldp);
-	      size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
+	      size_t sz = memsize (oldp);
 	      (void) tag_region (oldmem, sz);
 	      newmem = tag_new_usable (newmem);
 	      memcpy (newmem, oldmem, sz);
@@ -5110,7 +5109,7 @@ musable (void *mem)
 	    result = chunksize (p) - CHUNK_HDR_SZ;
 	}
       else if (inuse (p))
-	result = CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ;
+	result = memsize (p);
 
       return result;
     }
-- 
2.17.1



* [PATCH 2/6] malloc: Use different tag after mremap
  2021-03-19 13:25 [PATCH 0/6] malloc: more memory tagging optimizations Szabolcs Nagy
  2021-03-19 13:26 ` [PATCH 1/6] malloc: Use memsize instead of CHUNK_AVAILABLE_SIZE Szabolcs Nagy
@ 2021-03-19 13:26 ` Szabolcs Nagy
  2021-03-23 20:03   ` DJ Delorie
  2021-03-19 13:26 ` [PATCH 3/6] malloc: Use chunk2rawmem throughout Szabolcs Nagy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Szabolcs Nagy @ 2021-03-19 13:26 UTC (permalink / raw)
  To: libc-alpha, DJ Delorie

The comment explained why a different tag is used after mremap, but
for that a correctly tagged pointer has to be passed to
tag_new_usable.  Use chunk2mem to get the tag.
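
A sketch of why the input tag matters here (tag_new_usable is from
patch 1; per the cover letter, tag generation on aarch64 can exclude
the tag carried by the input pointer, so the input must hold the
mapping's current tag for the "different tag" guarantee to hold):

  void *newmem = chunk2mem (newp);   /* extracts the tag at newmem */
  newmem = tag_new_usable (newmem);  /* retags with a different tag */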
---
 malloc/malloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 03eb0f40fa..34884808e2 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3411,7 +3411,7 @@ __libc_realloc (void *oldmem, size_t bytes)
       newp = mremap_chunk (oldp, nb);
       if (newp)
 	{
-	  void *newmem = chunk2rawmem (newp);
+	  void *newmem = chunk2mem (newp);
 	  /* Give the new block a different tag.  This helps to ensure
 	     that stale handles to the previous mapping are not
 	     reused.  There's a performance hit for both us and the
-- 
2.17.1



* [PATCH 3/6] malloc: Use chunk2rawmem throughout
  2021-03-19 13:25 [PATCH 0/6] malloc: more memory tagging optimizations Szabolcs Nagy
  2021-03-19 13:26 ` [PATCH 1/6] malloc: Use memsize instead of CHUNK_AVAILABLE_SIZE Szabolcs Nagy
  2021-03-19 13:26 ` [PATCH 2/6] malloc: Use different tag after mremap Szabolcs Nagy
@ 2021-03-19 13:26 ` Szabolcs Nagy
  2021-03-23 20:25   ` DJ Delorie
  2021-03-19 13:27 ` [PATCH 4/6] malloc: Rename chunk2rawmem Szabolcs Nagy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Szabolcs Nagy @ 2021-03-19 13:26 UTC (permalink / raw)
  To: libc-alpha, DJ Delorie

The difference between chunk2mem and chunk2rawmem is that the latter
does not get the memory tag for the returned pointer.  It turns out
chunk2rawmem almost always works:

The input of chunk2mem is a chunk pointer that is untagged, so it can
access the chunk header.  All memory that is not user-allocated heap
memory is untagged, which in the current implementation means that it
has the 0 tag, but this patch does not rely on the tag value.  The
patch relies on the fact that chunk operations are either done on
untagged chunks or done without memory access to the user-owned part.

Internal interface contracts:

sysmalloc: Returns untagged memory.
_int_malloc: Returns untagged memory.
_int_free: Takes untagged memory.
_int_memalign: Returns untagged memory.
_int_realloc: Takes and returns tagged memory.

So only _int_realloc and functions outside this list need care.
Alignment checks do not need the right tag and tcache works with
untagged memory.

tag_at was kept in realloc after an mremap, which is not strictly
necessary, since the pointer is only used to retag the memory, but this
way the tag is guaranteed to be different from the old tag.
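
A sketch of the alignment-check case: the tag occupies high pointer
bits (an aarch64 MTE property; on other targets chunk2mem and
chunk2rawmem are identical anyway), while alignment checks look only
at the low bits, so the raw pointer gives the same answer and skips
the tag extraction:

  assert (aligned_OK (chunk2rawmem (p)));   /* no tag needed here */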
---
 malloc/hooks.c  |  2 +-
 malloc/malloc.c | 58 ++++++++++++++++++++++++++++---------------------
 2 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/malloc/hooks.c b/malloc/hooks.c
index b82ff5781b..e888adcdc3 100644
--- a/malloc/hooks.c
+++ b/malloc/hooks.c
@@ -330,7 +330,7 @@ realloc_check (void *oldmem, size_t bytes, const void *caller)
 #if HAVE_MREMAP
       mchunkptr newp = mremap_chunk (oldp, chnb);
       if (newp)
-        newmem = chunk2mem (newp);
+        newmem = tag_at (chunk2rawmem (newp));
       else
 #endif
       {
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 34884808e2..9ddb65f029 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1286,18 +1286,26 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    headers have distinct tags.  Converting fully from one to the other
    involves extracting the tag at the other address and creating a
    suitable pointer using it.  That can be quite expensive.  There are
-   many occasions, though when the pointer will not be dereferenced
-   (for example, because we only want to assert that the pointer is
-   correctly aligned).  In these cases it is more efficient not
-   to extract the tag, since the answer will be the same either way.
-   chunk2rawmem() can be used in these cases.
- */
+   cases when the pointers are not dereferenced (for example only used
+   for alignment check) so the tags are not relevant, and there are
+   cases when user data is not tagged distinctly from malloc headers
+   (user data is untagged because tagging is done late in malloc and
+   early in free).  User memory tagging across internal interfaces:
+
+      sysmalloc: Returns untagged memory.
+      _int_malloc: Returns untagged memory.
+      _int_free: Takes untagged memory.
+      _int_memalign: Returns untagged memory.
+      _mid_memalign: Returns tagged memory.
+      _int_realloc: Takes and returns tagged memory.
+*/
 
 /* The chunk header is two SIZE_SZ elements, but this is used widely, so
    we define it here for clarity later.  */
 #define CHUNK_HDR_SZ (2 * SIZE_SZ)
 
-/* Convert a user mem pointer to a chunk address without correcting
+/* Convert a chunk address to a user mem pointer without correcting
    the tag.  */
 #define chunk2rawmem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))
 
@@ -1320,7 +1328,7 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 #define aligned_OK(m)  (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)
 
 #define misaligned_chunk(p) \
-  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2mem (p)) \
+  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2rawmem (p)) \
    & MALLOC_ALIGN_MASK)
 
 /* pad request bytes into a usable size -- internal version */
@@ -2528,7 +2536,7 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
               check_chunk (av, p);
 
-              return chunk2mem (p);
+              return chunk2rawmem (p);
             }
         }
     }
@@ -2898,7 +2906,7 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       set_head (p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
       set_head (remainder, remainder_size | PREV_INUSE);
       check_malloced_chunk (av, p, nb);
-      return chunk2mem (p);
+      return chunk2rawmem (p);
     }
 
   /* catch all failure paths */
@@ -3030,7 +3038,7 @@ mremap_chunk (mchunkptr p, size_t new_size)
   assert (chunk_is_mmapped (p));
 
   uintptr_t block = (uintptr_t) p - offset;
-  uintptr_t mem = (uintptr_t) chunk2mem(p);
+  uintptr_t mem = (uintptr_t) chunk2rawmem(p);
   size_t total_size = offset + size;
   if (__glibc_unlikely ((block | total_size) & (pagesize - 1)) != 0
       || __glibc_unlikely (!powerof2 (mem & (pagesize - 1))))
@@ -3096,7 +3104,7 @@ static __thread tcache_perthread_struct *tcache = NULL;
 static __always_inline void
 tcache_put (mchunkptr chunk, size_t tc_idx)
 {
-  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
+  tcache_entry *e = (tcache_entry *) chunk2rawmem (chunk);
 
   /* Mark this chunk as "in the tcache" so the test in _int_free will
      detect a double free.  */
@@ -3411,7 +3419,7 @@ __libc_realloc (void *oldmem, size_t bytes)
       newp = mremap_chunk (oldp, nb);
       if (newp)
 	{
-	  void *newmem = chunk2mem (newp);
+	  void *newmem = tag_at (chunk2rawmem (newp));
 	  /* Give the new block a different tag.  This helps to ensure
 	     that stale handles to the previous mapping are not
 	     reused.  There's a performance hit for both us and the
@@ -3852,7 +3860,7 @@ _int_malloc (mstate av, size_t bytes)
 		    }
 		}
 #endif
-	      void *p = chunk2mem (victim);
+	      void *p = chunk2rawmem (victim);
 	      alloc_perturb (p, bytes);
 	      return p;
 	    }
@@ -3910,7 +3918,7 @@ _int_malloc (mstate av, size_t bytes)
 		}
 	    }
 #endif
-          void *p = chunk2mem (victim);
+          void *p = chunk2rawmem (victim);
           alloc_perturb (p, bytes);
           return p;
         }
@@ -4011,7 +4019,7 @@ _int_malloc (mstate av, size_t bytes)
               set_foot (remainder, remainder_size);
 
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2mem (victim);
+              void *p = chunk2rawmem (victim);
               alloc_perturb (p, bytes);
               return p;
             }
@@ -4043,7 +4051,7 @@ _int_malloc (mstate av, size_t bytes)
 		{
 #endif
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2mem (victim);
+              void *p = chunk2rawmem (victim);
               alloc_perturb (p, bytes);
               return p;
 #if USE_TCACHE
@@ -4205,7 +4213,7 @@ _int_malloc (mstate av, size_t bytes)
                   set_foot (remainder, remainder_size);
                 }
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2mem (victim);
+              void *p = chunk2rawmem (victim);
               alloc_perturb (p, bytes);
               return p;
             }
@@ -4313,7 +4321,7 @@ _int_malloc (mstate av, size_t bytes)
                   set_foot (remainder, remainder_size);
                 }
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2mem (victim);
+              void *p = chunk2rawmem (victim);
               alloc_perturb (p, bytes);
               return p;
             }
@@ -4351,7 +4359,7 @@ _int_malloc (mstate av, size_t bytes)
           set_head (remainder, remainder_size | PREV_INUSE);
 
           check_malloced_chunk (av, victim, nb);
-          void *p = chunk2mem (victim);
+          void *p = chunk2rawmem (victim);
           alloc_perturb (p, bytes);
           return p;
         }
@@ -4419,7 +4427,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
     if (tcache != NULL && tc_idx < mp_.tcache_bins)
       {
 	/* Check to see if it's already in the tcache.  */
-	tcache_entry *e = (tcache_entry *) chunk2mem (p);
+	tcache_entry *e = (tcache_entry *) chunk2rawmem (p);
 
 	/* This test succeeds on double free.  However, we don't 100%
 	   trust it (it also matches random payload data at a 1 in
@@ -4491,7 +4499,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
 	  malloc_printerr ("free(): invalid next size (fast)");
       }
 
-    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
+    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);
 
     atomic_store_relaxed (&av->have_fastchunks, true);
     unsigned int idx = fastbin_index(size);
@@ -4564,7 +4572,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
 	|| __builtin_expect (nextsize >= av->system_mem, 0))
       malloc_printerr ("free(): invalid next size (normal)");
 
-    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
+    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);
 
     /* consolidate backward */
     if (!prev_inuse(p)) {
@@ -4964,7 +4972,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
         {
           set_prev_size (newp, prev_size (p) + leadsize);
           set_head (newp, newsize | IS_MMAPPED);
-          return chunk2mem (newp);
+          return chunk2rawmem (newp);
         }
 
       /* Otherwise, give back leader, use the rest */
@@ -4995,7 +5003,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
     }
 
   check_inuse_chunk (av, p);
-  return chunk2mem (p);
+  return chunk2rawmem (p);
 }
 
 
-- 
2.17.1



* [PATCH 4/6] malloc: Rename chunk2rawmem
  2021-03-19 13:25 [PATCH 0/6] malloc: more memory tagging optimizations Szabolcs Nagy
                   ` (2 preceding siblings ...)
  2021-03-19 13:26 ` [PATCH 3/6] malloc: Use chunk2rawmem throughout Szabolcs Nagy
@ 2021-03-19 13:27 ` Szabolcs Nagy
  2021-03-23 20:43   ` DJ Delorie
  2021-03-19 13:27 ` [PATCH 5/6] malloc: Remove unnecessary tagging around _mid_memalign Szabolcs Nagy
  2021-03-19 13:27 ` [PATCH 6/6] malloc: Ensure mtag code path in checked_request2size is cold Szabolcs Nagy
  5 siblings, 1 reply; 13+ messages in thread
From: Szabolcs Nagy @ 2021-03-19 13:27 UTC (permalink / raw)
  To: libc-alpha, DJ Delorie

The previous patch ensured that all chunk-to-mem computations use
chunk2rawmem, so now we can rename it to chunk2mem, and in the few
cases where the tag of mem is relevant, chunk2mem_tag can be used.

Replaced tag_at (chunk2rawmem (x)) with chunk2mem_tag (x).
Renamed chunk2rawmem to chunk2mem.
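
The mechanical rewrite this amounts to, in sketch form (both
spellings appear in the hunks below):

  newmem = tag_at (chunk2rawmem (newp));   /* before */
  newmem = chunk2mem_tag (newp);           /* after  */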
---
 malloc/hooks.c  |  4 +--
 malloc/malloc.c | 82 ++++++++++++++++++++++++-------------------------
 2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/malloc/hooks.c b/malloc/hooks.c
index e888adcdc3..c91f9502ba 100644
--- a/malloc/hooks.c
+++ b/malloc/hooks.c
@@ -279,7 +279,7 @@ free_check (void *mem, const void *caller)
   else
     {
       /* Mark the chunk as belonging to the library again.  */
-      (void)tag_region (chunk2rawmem (p), memsize (p));
+      (void)tag_region (chunk2mem (p), memsize (p));
       _int_free (&main_arena, p, 1);
       __libc_lock_unlock (main_arena.mutex);
     }
@@ -330,7 +330,7 @@ realloc_check (void *oldmem, size_t bytes, const void *caller)
 #if HAVE_MREMAP
       mchunkptr newp = mremap_chunk (oldp, chnb);
       if (newp)
-        newmem = tag_at (chunk2rawmem (newp));
+        newmem = chunk2mem_tag (newp);
       else
 #endif
       {
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 9ddb65f029..6f87b7bdb1 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1307,12 +1307,12 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 /* Convert a chunk address to a user mem pointer without correcting
    the tag.  */
-#define chunk2rawmem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))
+#define chunk2mem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))
 
-/* Convert between user mem pointers and chunk pointers, updating any
-   memory tags on the pointer to respect the tag value at that
-   location.  */
-#define chunk2mem(p) ((void *)tag_at (((char*)(p) + CHUNK_HDR_SZ)))
+/* Convert a chunk address to a user mem pointer and extract the right tag.  */
+#define chunk2mem_tag(p) ((void*)tag_at ((char*)(p) + CHUNK_HDR_SZ))
+
+/* Convert a user mem pointer to a chunk address and extract the right tag.  */
 #define mem2chunk(mem) ((mchunkptr)tag_at (((char*)(mem) - CHUNK_HDR_SZ)))
 
 /* The smallest possible chunk */
@@ -1328,7 +1328,7 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 #define aligned_OK(m)  (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)
 
 #define misaligned_chunk(p) \
-  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2rawmem (p)) \
+  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2mem (p)) \
    & MALLOC_ALIGN_MASK)
 
 /* pad request bytes into a usable size -- internal version */
@@ -2128,7 +2128,7 @@ do_check_chunk (mstate av, mchunkptr p)
       /* chunk is page-aligned */
       assert (((prev_size (p) + sz) & (GLRO (dl_pagesize) - 1)) == 0);
       /* mem is aligned */
-      assert (aligned_OK (chunk2rawmem (p)));
+      assert (aligned_OK (chunk2mem (p)));
     }
 }
 
@@ -2152,7 +2152,7 @@ do_check_free_chunk (mstate av, mchunkptr p)
   if ((unsigned long) (sz) >= MINSIZE)
     {
       assert ((sz & MALLOC_ALIGN_MASK) == 0);
-      assert (aligned_OK (chunk2rawmem (p)));
+      assert (aligned_OK (chunk2mem (p)));
       /* ... matching footer field */
       assert (prev_size (next_chunk (p)) == sz);
       /* ... and is fully consolidated */
@@ -2231,7 +2231,7 @@ do_check_remalloced_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T s)
   assert ((sz & MALLOC_ALIGN_MASK) == 0);
   assert ((unsigned long) (sz) >= MINSIZE);
   /* ... and alignment */
-  assert (aligned_OK (chunk2rawmem (p)));
+  assert (aligned_OK (chunk2mem (p)));
   /* chunk is less than MINSIZE more than request */
   assert ((long) (sz) - (long) (s) >= 0);
   assert ((long) (sz) - (long) (s + MINSIZE) < 0);
@@ -2501,16 +2501,16 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
               if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
                 {
-                  /* For glibc, chunk2rawmem increases the address by
+                  /* For glibc, chunk2mem increases the address by
                      CHUNK_HDR_SZ and MALLOC_ALIGN_MASK is
                      CHUNK_HDR_SZ-1.  Each mmap'ed area is page
                      aligned and therefore definitely
                      MALLOC_ALIGN_MASK-aligned.  */
-                  assert (((INTERNAL_SIZE_T) chunk2rawmem (mm) & MALLOC_ALIGN_MASK) == 0);
+                  assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
                   front_misalign = 0;
                 }
               else
-                front_misalign = (INTERNAL_SIZE_T) chunk2rawmem (mm) & MALLOC_ALIGN_MASK;
+                front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
               if (front_misalign > 0)
                 {
                   correction = MALLOC_ALIGNMENT - front_misalign;
@@ -2536,7 +2536,7 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
               check_chunk (av, p);
 
-              return chunk2rawmem (p);
+              return chunk2mem (p);
             }
         }
     }
@@ -2757,7 +2757,7 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
                   /* Guarantee alignment of first new chunk made from this space */
 
-                  front_misalign = (INTERNAL_SIZE_T) chunk2rawmem (brk) & MALLOC_ALIGN_MASK;
+                  front_misalign = (INTERNAL_SIZE_T) chunk2mem (brk) & MALLOC_ALIGN_MASK;
                   if (front_misalign > 0)
                     {
                       /*
@@ -2815,10 +2815,10 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
                 {
                   if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
                     /* MORECORE/mmap must correctly align */
-                    assert (((unsigned long) chunk2rawmem (brk) & MALLOC_ALIGN_MASK) == 0);
+                    assert (((unsigned long) chunk2mem (brk) & MALLOC_ALIGN_MASK) == 0);
                   else
                     {
-                      front_misalign = (INTERNAL_SIZE_T) chunk2rawmem (brk) & MALLOC_ALIGN_MASK;
+                      front_misalign = (INTERNAL_SIZE_T) chunk2mem (brk) & MALLOC_ALIGN_MASK;
                       if (front_misalign > 0)
                         {
                           /*
@@ -2906,7 +2906,7 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       set_head (p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
       set_head (remainder, remainder_size | PREV_INUSE);
       check_malloced_chunk (av, p, nb);
-      return chunk2rawmem (p);
+      return chunk2mem (p);
     }
 
   /* catch all failure paths */
@@ -3004,7 +3004,7 @@ munmap_chunk (mchunkptr p)
   if (DUMPED_MAIN_ARENA_CHUNK (p))
     return;
 
-  uintptr_t mem = (uintptr_t) chunk2rawmem (p);
+  uintptr_t mem = (uintptr_t) chunk2mem (p);
   uintptr_t block = (uintptr_t) p - prev_size (p);
   size_t total_size = prev_size (p) + size;
   /* Unfortunately we have to do the compilers job by hand here.  Normally
@@ -3038,7 +3038,7 @@ mremap_chunk (mchunkptr p, size_t new_size)
   assert (chunk_is_mmapped (p));
 
   uintptr_t block = (uintptr_t) p - offset;
-  uintptr_t mem = (uintptr_t) chunk2rawmem(p);
+  uintptr_t mem = (uintptr_t) chunk2mem(p);
   size_t total_size = offset + size;
   if (__glibc_unlikely ((block | total_size) & (pagesize - 1)) != 0
       || __glibc_unlikely (!powerof2 (mem & (pagesize - 1))))
@@ -3059,7 +3059,7 @@ mremap_chunk (mchunkptr p, size_t new_size)
 
   p = (mchunkptr) (cp + offset);
 
-  assert (aligned_OK (chunk2rawmem (p)));
+  assert (aligned_OK (chunk2mem (p)));
 
   assert (prev_size (p) == offset);
   set_head (p, (new_size - offset) | IS_MMAPPED);
@@ -3104,7 +3104,7 @@ static __thread tcache_perthread_struct *tcache = NULL;
 static __always_inline void
 tcache_put (mchunkptr chunk, size_t tc_idx)
 {
-  tcache_entry *e = (tcache_entry *) chunk2rawmem (chunk);
+  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
 
   /* Mark this chunk as "in the tcache" so the test in _int_free will
      detect a double free.  */
@@ -3324,7 +3324,7 @@ __libc_free (void *mem)
       MAYBE_INIT_TCACHE ();
 
       /* Mark the chunk as belonging to the library again.  */
-      (void)tag_region (chunk2rawmem (p), memsize (p));
+      (void)tag_region (chunk2mem (p), memsize (p));
 
       ar_ptr = arena_for_chunk (p);
       _int_free (ar_ptr, p, 0);
@@ -3419,7 +3419,7 @@ __libc_realloc (void *oldmem, size_t bytes)
       newp = mremap_chunk (oldp, nb);
       if (newp)
 	{
-	  void *newmem = tag_at (chunk2rawmem (newp));
+	  void *newmem = chunk2mem_tag (newp);
 	  /* Give the new block a different tag.  This helps to ensure
 	     that stale handles to the previous mapping are not
 	     reused.  There's a performance hit for both us and the
@@ -3468,7 +3468,7 @@ __libc_realloc (void *oldmem, size_t bytes)
         {
 	  size_t sz = memsize (oldp);
 	  memcpy (newp, oldmem, sz);
-	  (void) tag_region (chunk2rawmem (oldp), sz);
+	  (void) tag_region (chunk2mem (oldp), sz);
           _int_free (ar_ptr, oldp, 0);
         }
     }
@@ -3860,7 +3860,7 @@ _int_malloc (mstate av, size_t bytes)
 		    }
 		}
 #endif
-	      void *p = chunk2rawmem (victim);
+	      void *p = chunk2mem (victim);
 	      alloc_perturb (p, bytes);
 	      return p;
 	    }
@@ -3918,7 +3918,7 @@ _int_malloc (mstate av, size_t bytes)
 		}
 	    }
 #endif
-          void *p = chunk2rawmem (victim);
+          void *p = chunk2mem (victim);
           alloc_perturb (p, bytes);
           return p;
         }
@@ -4019,7 +4019,7 @@ _int_malloc (mstate av, size_t bytes)
               set_foot (remainder, remainder_size);
 
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2rawmem (victim);
+              void *p = chunk2mem (victim);
               alloc_perturb (p, bytes);
               return p;
             }
@@ -4051,7 +4051,7 @@ _int_malloc (mstate av, size_t bytes)
 		{
 #endif
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2rawmem (victim);
+              void *p = chunk2mem (victim);
               alloc_perturb (p, bytes);
               return p;
 #if USE_TCACHE
@@ -4213,7 +4213,7 @@ _int_malloc (mstate av, size_t bytes)
                   set_foot (remainder, remainder_size);
                 }
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2rawmem (victim);
+              void *p = chunk2mem (victim);
               alloc_perturb (p, bytes);
               return p;
             }
@@ -4321,7 +4321,7 @@ _int_malloc (mstate av, size_t bytes)
                   set_foot (remainder, remainder_size);
                 }
               check_malloced_chunk (av, victim, nb);
-              void *p = chunk2rawmem (victim);
+              void *p = chunk2mem (victim);
               alloc_perturb (p, bytes);
               return p;
             }
@@ -4359,7 +4359,7 @@ _int_malloc (mstate av, size_t bytes)
           set_head (remainder, remainder_size | PREV_INUSE);
 
           check_malloced_chunk (av, victim, nb);
-          void *p = chunk2rawmem (victim);
+          void *p = chunk2mem (victim);
           alloc_perturb (p, bytes);
           return p;
         }
@@ -4427,7 +4427,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
     if (tcache != NULL && tc_idx < mp_.tcache_bins)
       {
 	/* Check to see if it's already in the tcache.  */
-	tcache_entry *e = (tcache_entry *) chunk2rawmem (p);
+	tcache_entry *e = (tcache_entry *) chunk2mem (p);
 
 	/* This test succeeds on double free.  However, we don't 100%
 	   trust it (it also matches random payload data at a 1 in
@@ -4499,7 +4499,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
 	  malloc_printerr ("free(): invalid next size (fast)");
       }
 
-    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);
+    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
 
     atomic_store_relaxed (&av->have_fastchunks, true);
     unsigned int idx = fastbin_index(size);
@@ -4572,7 +4572,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
 	|| __builtin_expect (nextsize >= av->system_mem, 0))
       malloc_printerr ("free(): invalid next size (normal)");
 
-    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);
+    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
 
     /* consolidate backward */
     if (!prev_inuse(p)) {
@@ -4836,7 +4836,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
           av->top = chunk_at_offset (oldp, nb);
           set_head (av->top, (newsize - nb) | PREV_INUSE);
           check_inuse_chunk (av, oldp);
-          return tag_new_usable (chunk2rawmem (oldp));
+          return tag_new_usable (chunk2mem (oldp));
         }
 
       /* Try to expand forward into next chunk;  split off remainder below */
@@ -4869,7 +4869,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
             }
           else
             {
-	      void *oldmem = chunk2rawmem (oldp);
+	      void *oldmem = chunk2mem (oldp);
 	      size_t sz = memsize (oldp);
 	      (void) tag_region (oldmem, sz);
 	      newmem = tag_new_usable (newmem);
@@ -4906,7 +4906,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
     }
 
   check_inuse_chunk (av, newp);
-  return tag_new_usable (chunk2rawmem (newp));
+  return tag_new_usable (chunk2mem (newp));
 }
 
 /*
@@ -4972,7 +4972,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
         {
           set_prev_size (newp, prev_size (p) + leadsize);
           set_head (newp, newsize | IS_MMAPPED);
-          return chunk2rawmem (newp);
+          return chunk2mem (newp);
         }
 
       /* Otherwise, give back leader, use the rest */
@@ -4984,7 +4984,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
       p = newp;
 
       assert (newsize >= nb &&
-              (((unsigned long) (chunk2rawmem (p))) % alignment) == 0);
+              (((unsigned long) (chunk2mem (p))) % alignment) == 0);
     }
 
   /* Also give back spare room at the end */
@@ -5003,7 +5003,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
     }
 
   check_inuse_chunk (av, p);
-  return chunk2rawmem (p);
+  return chunk2mem (p);
 }
 
 
@@ -5038,7 +5038,7 @@ mtrim (mstate av, size_t pad)
                                                 + sizeof (struct malloc_chunk)
                                                 + psm1) & ~psm1);
 
-                assert ((char *) chunk2rawmem (p) + 2 * CHUNK_HDR_SZ
+                assert ((char *) chunk2mem (p) + 2 * CHUNK_HDR_SZ
 			<= paligned_mem);
                 assert ((char *) p + size > paligned_mem);
 
-- 
2.17.1



* [PATCH 5/6] malloc: Remove unnecessary tagging around _mid_memalign
  2021-03-19 13:25 [PATCH 0/6] malloc: more memory tagging optimizations Szabolcs Nagy
                   ` (3 preceding siblings ...)
  2021-03-19 13:27 ` [PATCH 4/6] malloc: Rename chunk2rawmem Szabolcs Nagy
@ 2021-03-19 13:27 ` Szabolcs Nagy
  2021-03-23 20:44   ` DJ Delorie
  2021-03-19 13:27 ` [PATCH 6/6] malloc: Ensure mtag code path in checked_request2size is cold Szabolcs Nagy
  5 siblings, 1 reply; 13+ messages in thread
From: Szabolcs Nagy @ 2021-03-19 13:27 UTC (permalink / raw)
  To: libc-alpha, DJ Delorie

The internal _mid_memalign already returns newly tagged memory.
(__libc_memalign and posix_memalign already relied on this; this
patch fixes the other call sites.)
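
A sketch of the redundancy this removes from __libc_valloc (taken
from the hunk below):

  p = _mid_memalign (pagesize, bytes, address);
  return tag_new_usable (p);   /* redundant: p is already tagged */

which becomes:

  return _mid_memalign (pagesize, bytes, address);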
---
 malloc/malloc.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 6f87b7bdb1..cb1837d0d7 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3553,22 +3553,17 @@ libc_hidden_def (__libc_memalign)
 void *
 __libc_valloc (size_t bytes)
 {
-  void *p;
-
   if (__malloc_initialized < 0)
     ptmalloc_init ();
 
   void *address = RETURN_ADDRESS (0);
   size_t pagesize = GLRO (dl_pagesize);
-  p = _mid_memalign (pagesize, bytes, address);
-  return tag_new_usable (p);
+  return _mid_memalign (pagesize, bytes, address);
 }
 
 void *
 __libc_pvalloc (size_t bytes)
 {
-  void *p;
-
   if (__malloc_initialized < 0)
     ptmalloc_init ();
 
@@ -3585,8 +3580,7 @@ __libc_pvalloc (size_t bytes)
     }
   rounded_bytes = rounded_bytes & -(pagesize - 1);
 
-  p = _mid_memalign (pagesize, rounded_bytes, address);
-  return tag_new_usable (p);
+  return _mid_memalign (pagesize, rounded_bytes, address);
 }
 
 void *
-- 
2.17.1



* [PATCH 6/6] malloc: Ensure mtag code path in checked_request2size is cold
  2021-03-19 13:25 [PATCH 0/6] malloc: more memory tagging optimizations Szabolcs Nagy
                   ` (4 preceding siblings ...)
  2021-03-19 13:27 ` [PATCH 5/6] malloc: Remove unnecessary tagging around _mid_memalign Szabolcs Nagy
@ 2021-03-19 13:27 ` Szabolcs Nagy
  2021-03-23 20:46   ` DJ Delorie
  5 siblings, 1 reply; 13+ messages in thread
From: Szabolcs Nagy @ 2021-03-19 13:27 UTC (permalink / raw)
  To: libc-alpha, DJ Delorie

This is a workaround (hack) for a gcc optimization issue (PR 99551).
Without this the generated code may evaluate the cold-path expression
even when memory tagging is disabled, which causes a performance
regression for small allocations in the common case.
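
A sketch of the idiom: an asm statement without outputs is implicitly
volatile, so gcc may not speculate or duplicate code across it; that
keeps the granule rounding inside the unlikely branch instead of
letting it be evaluated unconditionally on the hot path:

  if (__glibc_unlikely (mtag_enabled))
    {
      asm ("");   /* optimization barrier, see gcc PR 99551 */
      req = (req + (__MTAG_GRANULE_SIZE - 1))
            & ~(size_t) (__MTAG_GRANULE_SIZE - 1);
    }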
---
 malloc/malloc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index cb1837d0d7..ef1f5b1621 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1357,8 +1357,13 @@ checked_request2size (size_t req, size_t *sz) __nonnull (1)
      must be a macro that produces a compile time constant if passed
      a constant literal.  */
   if (__glibc_unlikely (mtag_enabled))
-    req = (req + (__MTAG_GRANULE_SIZE - 1)) &
-	  ~(size_t)(__MTAG_GRANULE_SIZE - 1);
+    {
+      /* Ensure this is not evaluated if !mtag_enabled, see gcc PR 99551.  */
+      asm ("");
+
+      req = (req + (__MTAG_GRANULE_SIZE - 1)) &
+	    ~(size_t)(__MTAG_GRANULE_SIZE - 1);
+    }
 
   *sz = request2size (req);
   return true;
-- 
2.17.1



* Re: [PATCH 1/6] malloc: Use memsize instead of CHUNK_AVAILABLE_SIZE
  2021-03-19 13:26 ` [PATCH 1/6] malloc: Use memsize instead of CHUNK_AVAILABLE_SIZE Szabolcs Nagy
@ 2021-03-23 20:01   ` DJ Delorie
  0 siblings, 0 replies; 13+ messages in thread
From: DJ Delorie @ 2021-03-23 20:01 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> This is a pure refactoring change that does not affect behaviour.

You hope :-)

LGTM.
Reviewed-by: DJ Delorie <dj@redhat.com>

> -  for (size = CHUNK_AVAILABLE_SIZE (p) - 1;
> +  for (size = CHUNK_HDR_SZ + memsize (p) - 1;

Ok.

> -  max_sz = CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ;
> +  max_sz = memsize (p);

Ok.

> -      for (sz = CHUNK_AVAILABLE_SIZE (p) - 1;
> +      for (sz = CHUNK_HDR_SZ + memsize (p) - 1;

Ok.  Odd that it's not chunksize() but that would affect behavior ;-)

> -      for (sz = CHUNK_AVAILABLE_SIZE (p) - 1;
> +      for (sz = CHUNK_HDR_SZ + memsize (p) - 1;

Ok.

> -      (void)tag_region (chunk2rawmem (p), CHUNK_AVAILABLE_SIZE (p)
> -                                         - CHUNK_HDR_SZ);
> +      (void)tag_region (chunk2rawmem (p), memsize (p));

Ok.

> -/* Available size of chunk.  This is the size of the real usable data
> -   in the chunk, plus the chunk header.  Note: If memory tagging is
> -   enabled the layout changes to accomodate the granule size, this is
> -   wasteful for small allocations so not done by default.  The logic
> -   does not work if chunk headers are not granule aligned.  */
> -_Static_assert (__MTAG_GRANULE_SIZE <= CHUNK_HDR_SZ,
> -		"memory tagging is not supported with large granule.");
> -#define CHUNK_AVAILABLE_SIZE(p)                                       \
> -  (__MTAG_GRANULE_SIZE > SIZE_SZ && __glibc_unlikely (mtag_enabled) ? \
> -    chunksize (p) :                                                   \
> -    chunksize (p) + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))
> -

Dropped, ok.

> +/* This is the size of the real usable data in the chunk.  Not valid for
> +   dumped heap chunks.  */
> +#define memsize(p)                                                    \
> +  (__MTAG_GRANULE_SIZE > SIZE_SZ && __glibc_unlikely (mtag_enabled) ? \
> +    chunksize (p) - CHUNK_HDR_SZ :                                    \
> +    chunksize (p) - CHUNK_HDR_SZ + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))
> +
> +/* If memory tagging is enabled the layout changes to accommodate the granule
> +   size; this is wasteful for small allocations so not done by default.
> +   Both the chunk header and user data have to be granule aligned.  */
> +_Static_assert (__MTAG_GRANULE_SIZE <= CHUNK_HDR_SZ,
> +		"memory tagging is not supported with large granule.");
> +

New, ok.

> -      ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr),
> -				    CHUNK_AVAILABLE_SIZE (cp) - CHUNK_HDR_SZ);
> +      ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr), memsize (cp));

Ok.

> -      (void)tag_region (chunk2rawmem (p),
> -			CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
> +      (void)tag_region (chunk2rawmem (p), memsize (p));

Ok.

> -	  size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
> +	  size_t sz = memsize (oldp);

Ok.

> -    return tag_new_zero_region (mem, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
> +    return tag_new_zero_region (mem, memsize (p));

Ok.

> -	      size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
> +	      size_t sz = memsize (oldp);

Ok.

> -	result = CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ;
> +	result = memsize (p);

Ok.



* Re: [PATCH 2/6] malloc: Use different tag after mremap
  2021-03-19 13:26 ` [PATCH 2/6] malloc: Use different tag after mremap Szabolcs Nagy
@ 2021-03-23 20:03   ` DJ Delorie
  0 siblings, 0 replies; 13+ messages in thread
From: DJ Delorie @ 2021-03-23 20:03 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> The comment explained why a different tag is used after mremap, but
> for that a correctly tagged pointer has to be passed to
> tag_new_usable.  Use chunk2mem to get the tag.

> -	  void *newmem = chunk2rawmem (newp);
> +	  void *newmem = chunk2mem (newp);
>  	  /* Give the new block a different tag.  This helps to ensure
>  	     that stale handles to the previous mapping are not
>  	     reused.  There's a performance hit for both us and the

Ok.

LGTM
Reviewed-by: DJ Delorie <dj@redhat.com>



* Re: [PATCH 3/6] malloc: Use chunk2rawmem throughout
  2021-03-19 13:26 ` [PATCH 3/6] malloc: Use chunk2rawmem throughout Szabolcs Nagy
@ 2021-03-23 20:25   ` DJ Delorie
  0 siblings, 0 replies; 13+ messages in thread
From: DJ Delorie @ 2021-03-23 20:25 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> The difference between chunk2mem and chunk2rawmem is that the latter
> does not get the memory tag for the returned pointer.  It turns out
> chunk2rawmem almost always works:

Given that these two macros are identical on non-aarch64 systems, I'm
going to gloss over the "is it tagged correctly" question since you
folks can just test it, and other targets won't care ;-)

LGTM
Reviewed-by: DJ Delorie <dj@redhat.com>

> sysmalloc: Returns untagged memory.
> _int_malloc: Returns untagged memory.
> _int_free: Takes untagged memory.
> _int_memalign: Returns untagged memory.
> _int_realloc: Takes and returns tagged memory.

We should probably put this information in comments at each function
implementation too, but at least it's in the source files :-)

> -        newmem = chunk2mem (newp);
> +        newmem = tag_at (chunk2rawmem (newp));

Ok.

>     headers have distinct tags.  Converting fully from one to the other
>     involves extracting the tag at the other address and creating a
>     suitable pointer using it.  That can be quite expensive.  There are
> -   many occasions, though when the pointer will not be dereferenced
> -   (for example, because we only want to assert that the pointer is
> -   correctly aligned).  In these cases it is more efficient not
> -   to extract the tag, since the answer will be the same either way.
> -   chunk2rawmem() can be used in these cases.
> - */
> +   cases when the pointers are not dereferenced (for example only used
> +   for alignment check) so the tags are not relevant, and there are
> +   cases when user data is not tagged distinctly from malloc headers
> +   (user data is untagged because tagging is done late in malloc and
> +   early in free).  User memory tagging across internal interfaces:
> +
> +      sysmalloc: Returns untagged memory.
> +      _int_malloc: Returns untagged memory.
> +      _int_free: Takes untagged memory.
> +      _int_memalign: Returns untagged memory.
> +      _mid_memalign: Returns tagged memory.
> +      _int_realloc: Takes and returns tagged memory.
> +*/

Ok.

> -/* Convert a user mem pointer to a chunk address without correcting
> +/* Convert a chunk address to a user mem pointer without correcting
>     the tag.  */
>  #define chunk2rawmem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))

Heh.  Ok.

>  #define misaligned_chunk(p) \
> -  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2mem (p)) \
> +  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2rawmem (p)) \
>     & MALLOC_ALIGN_MASK)

Ok.

> -              return chunk2mem (p);
> +              return chunk2rawmem (p);

Ok.

> -      return chunk2mem (p);
> +      return chunk2rawmem (p);

Ok.

> -  uintptr_t mem = (uintptr_t) chunk2mem(p);
> +  uintptr_t mem = (uintptr_t) chunk2rawmem(p);

Ok.

> -  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
> +  tcache_entry *e = (tcache_entry *) chunk2rawmem (chunk);

Ok.

> -	  void *newmem = chunk2mem (newp);
> +	  void *newmem = tag_at (chunk2rawmem (newp));

Ok.

> -	      void *p = chunk2mem (victim);
> +	      void *p = chunk2rawmem (victim);

Ok.

> -          void *p = chunk2mem (victim);
> +          void *p = chunk2rawmem (victim);

Ok.

> -              void *p = chunk2mem (victim);
> +              void *p = chunk2rawmem (victim);

Ok.

> -              void *p = chunk2mem (victim);
> +              void *p = chunk2rawmem (victim);

Ok.

> -              void *p = chunk2mem (victim);
> +              void *p = chunk2rawmem (victim);

Ok.

> -              void *p = chunk2mem (victim);
> +              void *p = chunk2rawmem (victim);

Ok.

> -          void *p = chunk2mem (victim);
> +          void *p = chunk2rawmem (victim);

Ok.

> -	tcache_entry *e = (tcache_entry *) chunk2mem (p);
> +	tcache_entry *e = (tcache_entry *) chunk2rawmem (p);

Ok.

> -    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
> +    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);

Ok.

> -    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
> +    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);

Ok.

> -          return chunk2mem (newp);
> +          return chunk2rawmem (newp);

Ok.

> -  return chunk2mem (p);
> +  return chunk2rawmem (p);

Ok.



* Re: [PATCH 4/6] malloc: Rename chunk2rawmem
  2021-03-19 13:27 ` [PATCH 4/6] malloc: Rename chunk2rawmem Szabolcs Nagy
@ 2021-03-23 20:43   ` DJ Delorie
  0 siblings, 0 replies; 13+ messages in thread
From: DJ Delorie @ 2021-03-23 20:43 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> Replaced tag_at (chunk2rawmem (x)) with chunk2mem_tag (x).
> Renamed chunk2rawmem to chunk2mem.

LGTM.
Reviewed-by: DJ Delorie <dj@redhat.com>

> -      (void)tag_region (chunk2rawmem (p), memsize (p));
> +      (void)tag_region (chunk2mem (p), memsize (p));

Ok.

> -        newmem = tag_at (chunk2rawmem (newp));
> +        newmem = chunk2mem_tag (newp);

Ok.

> -#define chunk2rawmem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))
> +#define chunk2mem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))

Ok.

> -/* Convert between user mem pointers and chunk pointers, updating any
> -   memory tags on the pointer to respect the tag value at that
> -   location.  */
> -#define chunk2mem(p) ((void *)tag_at (((char*)(p) + CHUNK_HDR_SZ)))
> +/* Convert a chunk address to a user mem pointer and extract the right tag.  */
> +#define chunk2mem_tag(p) ((void*)tag_at ((char*)(p) + CHUNK_HDR_SZ))
> +
> +/* Convert a user mem pointer to a chunk address and extract the right tag.  */
>  #define mem2chunk(mem) ((mchunkptr)tag_at (((char*)(mem) - CHUNK_HDR_SZ)))

Ok.

>  #define misaligned_chunk(p) \
> -  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2rawmem (p)) \
> +  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2mem (p)) \
>     & MALLOC_ALIGN_MASK)

Ok.

> -      assert (aligned_OK (chunk2rawmem (p)));
> +      assert (aligned_OK (chunk2mem (p)));

Ok.

> -      assert (aligned_OK (chunk2rawmem (p)));
> +      assert (aligned_OK (chunk2mem (p)));

Ok.

> -  assert (aligned_OK (chunk2rawmem (p)));
> +  assert (aligned_OK (chunk2mem (p)));

Ok.

> -                  /* For glibc, chunk2rawmem increases the address by
> +                  /* For glibc, chunk2mem increases the address by

Ok.

> -                  assert (((INTERNAL_SIZE_T) chunk2rawmem (mm) & MALLOC_ALIGN_MASK) == 0);
> +                  assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);

Ok.

> -                front_misalign = (INTERNAL_SIZE_T) chunk2rawmem (mm) & MALLOC_ALIGN_MASK;
> +                front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;

Ok.

> -              return chunk2rawmem (p);
> +              return chunk2mem (p);

Ok.

> -                  front_misalign = (INTERNAL_SIZE_T) chunk2rawmem (brk) & MALLOC_ALIGN_MASK;
> +                  front_misalign = (INTERNAL_SIZE_T) chunk2mem (brk) & MALLOC_ALIGN_MASK;

Ok.

> -                    assert (((unsigned long) chunk2rawmem (brk) & MALLOC_ALIGN_MASK) == 0);
> +                    assert (((unsigned long) chunk2mem (brk) & MALLOC_ALIGN_MASK) == 0);

Ok.

> -                      front_misalign = (INTERNAL_SIZE_T) chunk2rawmem (brk) & MALLOC_ALIGN_MASK;
> +                      front_misalign = (INTERNAL_SIZE_T) chunk2mem (brk) & MALLOC_ALIGN_MASK;

Ok.

> -      return chunk2rawmem (p);
> +      return chunk2mem (p);

Ok.

> -  uintptr_t mem = (uintptr_t) chunk2rawmem (p);
> +  uintptr_t mem = (uintptr_t) chunk2mem (p);

Ok.

> -  uintptr_t mem = (uintptr_t) chunk2rawmem(p);
> +  uintptr_t mem = (uintptr_t) chunk2mem(p);

Ok.

> -  assert (aligned_OK (chunk2rawmem (p)));
> +  assert (aligned_OK (chunk2mem (p)));

Ok.

> -  tcache_entry *e = (tcache_entry *) chunk2rawmem (chunk);
> +  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

Ok.

> -      (void)tag_region (chunk2rawmem (p), memsize (p));
> +      (void)tag_region (chunk2mem (p), memsize (p));

Ok.

> -	  void *newmem = tag_at (chunk2rawmem (newp));
> +	  void *newmem = chunk2mem_tag (newp);

Ok.

> -	  (void) tag_region (chunk2rawmem (oldp), sz);
> +	  (void) tag_region (chunk2mem (oldp), sz);

Ok.

> -	      void *p = chunk2rawmem (victim);
> +	      void *p = chunk2mem (victim);

Ok.

> -          void *p = chunk2rawmem (victim);
> +          void *p = chunk2mem (victim);

Ok.

> -              void *p = chunk2rawmem (victim);
> +              void *p = chunk2mem (victim);

Ok.

> -              void *p = chunk2rawmem (victim);
> +              void *p = chunk2mem (victim);

Ok.

> -              void *p = chunk2rawmem (victim);
> +              void *p = chunk2mem (victim);

Ok.

> -              void *p = chunk2rawmem (victim);
> +              void *p = chunk2mem (victim);

Ok.

> -          void *p = chunk2rawmem (victim);
> +          void *p = chunk2mem (victim);

Ok.

> -	tcache_entry *e = (tcache_entry *) chunk2rawmem (p);
> +	tcache_entry *e = (tcache_entry *) chunk2mem (p);

Ok.

> -    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);
> +    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);

Ok.

> -    free_perturb (chunk2rawmem(p), size - CHUNK_HDR_SZ);
> +    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);

Ok.

> -          return tag_new_usable (chunk2rawmem (oldp));
> +          return tag_new_usable (chunk2mem (oldp));

Ok.

> -	      void *oldmem = chunk2rawmem (oldp);
> +	      void *oldmem = chunk2mem (oldp);

Ok.

> -  return tag_new_usable (chunk2rawmem (newp));
> +  return tag_new_usable (chunk2mem (newp));

Ok.

> -          return chunk2rawmem (newp);
> +          return chunk2mem (newp);

Ok.

> -              (((unsigned long) (chunk2rawmem (p))) % alignment) == 0);
> +              (((unsigned long) (chunk2mem (p))) % alignment) == 0);

Ok.

> -  return chunk2rawmem (p);
> +  return chunk2mem (p);

Ok.

> -                assert ((char *) chunk2rawmem (p) + 2 * CHUNK_HDR_SZ
> +                assert ((char *) chunk2mem (p) + 2 * CHUNK_HDR_SZ

Ok.



* Re: [PATCH 5/6] malloc: Remove unnecessary tagging around _mid_memalign
  2021-03-19 13:27 ` [PATCH 5/6] malloc: Remove unnecessary tagging around _mid_memalign Szabolcs Nagy
@ 2021-03-23 20:44   ` DJ Delorie
  0 siblings, 0 replies; 13+ messages in thread
From: DJ Delorie @ 2021-03-23 20:44 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> The internal _mid_memalign already returns newly tagged memory.
> (__libc_memalign and posix_memalign already relied on this; this
> patch fixes the other call sites.)

LGTM.
Reviewed-by: DJ Delorie <dj@redhat.com>

>  void *
>  __libc_valloc (size_t bytes)
>  {
> -  void *p;
> -
>    if (__malloc_initialized < 0)
>      ptmalloc_init ();
>  
>    void *address = RETURN_ADDRESS (0);
>    size_t pagesize = GLRO (dl_pagesize);
> -  p = _mid_memalign (pagesize, bytes, address);
> -  return tag_new_usable (p);
> +  return _mid_memalign (pagesize, bytes, address);
>  }
>  
>  void *
>  __libc_pvalloc (size_t bytes)
>  {
> -  void *p;
> -
>    if (__malloc_initialized < 0)
>      ptmalloc_init ();
>  
> @@ -3585,8 +3580,7 @@ __libc_pvalloc (size_t bytes)
>      }
>    rounded_bytes = rounded_bytes & -(pagesize - 1);
>  
> -  p = _mid_memalign (pagesize, rounded_bytes, address);
> -  return tag_new_usable (p);
> +  return _mid_memalign (pagesize, rounded_bytes, address);
>  }



* Re: [PATCH 6/6] malloc: Ensure mtag code path in checked_request2size is cold
  2021-03-19 13:27 ` [PATCH 6/6] malloc: Ensure mtag code path in checked_request2size is cold Szabolcs Nagy
@ 2021-03-23 20:46   ` DJ Delorie
  0 siblings, 0 replies; 13+ messages in thread
From: DJ Delorie @ 2021-03-23 20:46 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> This is a workaround (hack) for a gcc optimization issue (PR 99551).
> Without this the generated code may evaluate the cold-path expression
> even when memory tagging is disabled, which causes a performance
> regression for small allocations in the common case.

>    if (__glibc_unlikely (mtag_enabled))
> -    req = (req + (__MTAG_GRANULE_SIZE - 1)) &
> -	  ~(size_t)(__MTAG_GRANULE_SIZE - 1);
> +    {
> +      /* Ensure this is not evaluated if !mtag_enabled, see gcc PR 99551.  */
> +      asm ("");
> +
> +      req = (req + (__MTAG_GRANULE_SIZE - 1)) &
> +	    ~(size_t)(__MTAG_GRANULE_SIZE - 1);
> +    }

Hack indeed, but cross-target-safe and shouldn't affect the unused code
removal on non-aarch64 targets.

LGTM.
Reviewed-by: DJ Delorie <dj@redhat.com>


