public inbox for libc-alpha@sourceware.org
* [PATCH 00/16] memory tagging improvements
@ 2021-03-04 16:30 Szabolcs Nagy
  2021-03-04 16:30 ` [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468] Szabolcs Nagy
                   ` (15 more replies)
  0 siblings, 16 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:30 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

This set tries to reduce the overhead of memory tagging support: when
glibc is built with --enable-memory-tagging there are runtime checks
that affect performance even if heap tagging is not enabled at runtime.

I refactored the code to reduce internal abstractions and ifdefs, and
a number of patches optimize heap tagging on aarch64 with MTE.

Also available in the nsz/mtag branch.

Issues found and fixed:
- realloc failed to untag the old memory in some cases.
- incorrect but useless macro definition in public sys/prctl.h.
- incorrect but harmless internal MTAG_MMAP_FLAGS declaration.

Issues found but not fixed:
- large granules are not supported: both the chunk header and the user
  allocation must be tag granule aligned, but the code assumes the
  chunk header is 2*SIZE_SZ, so a larger granule would break.
- request2size and checked_request2size are not consistent, which looks
  wrong (it likely works in practice, except that some sizes don't end
  up in the fastbin or tcache one might expect them to, but I think
  this can break easily).  Fixing it is not obvious because of
  assumptions about the chunk layout that are no longer true with mtag
  enabled.

Other possible optimizations:
- ifunc malloc apis
  good: near zero overhead for runtime dispatch.
  bad: malloc has to be built twice or larger refactoring is needed.
- unconditional top byte zero
  internal data is tagged 0 and only user allocations are tagged
  non-zero (we rely on this e.g. where MADV_DONTNEED is used: the
  kernel then zeros the tags, so with non-zero tags internal pointers
  would not be able to access recycled memory), so tag_at(p) for
  internal data could unconditionally clear the top byte.
  good: faster than the branch in tag_at, may improve security (?).
  bad: design change.
- unconditional layout
  padding up to tag granule can be done unconditionally
  good: no runtime check, same layout with and without heap tagging.
  bad: wasting space with small allocations, design change.

This patch set tries to avoid intrusive changes that affect behaviour.

Benchmarked using bench-malloc-thread glibc ubenchmark on Neoverse N1.
For size=16 the runtime check overhead is
  ~ 32% before the patch set
  ~ 16% after the patch set.
On a practical workload or for large sizes the overhead should be tiny.

Szabolcs Nagy (16):
  malloc: Fix a realloc crash with heap tagging [BZ 27468]
  Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h
  malloc: Move MTAG_MMAP_FLAGS definition
  malloc: Simplify __mtag_tag_new_usable
  malloc: Avoid tagging mmapped memory on free
  malloc: Ensure the generic mtag hooks are not used
  malloc: Refactor TAG_ macros to avoid indirection
  malloc: Use global flag instead of function pointer dispatch for mtag
  malloc: Only support zeroing and not arbitrary memset with mtag
  malloc: Change calloc when tagging is disabled
  malloc: Use branches instead of mtag_granule_mask
  malloc: Use mtag_enabled instead of USE_MTAG
  aarch64: inline __libc_mtag_address_get_tag
  aarch64: inline __libc_mtag_new_tag
  aarch64: Optimize __libc_mtag_tag_region
  aarch64: Optimize __libc_mtag_tag_zero_region

 include/malloc.h                         |   7 -
 malloc/arena.c                           |  45 +-----
 malloc/hooks.c                           |  20 ++-
 malloc/malloc.c                          | 177 ++++++++++++-----------
 sysdeps/aarch64/Makefile                 |   4 +-
 sysdeps/aarch64/__mtag_address_get_tag.S |  32 ----
 sysdeps/aarch64/__mtag_memset_tag.S      |  53 -------
 sysdeps/aarch64/__mtag_new_tag.S         |  37 -----
 sysdeps/aarch64/__mtag_tag_region.S      |  98 ++++++++++---
 sysdeps/aarch64/__mtag_tag_zero_region.S | 113 +++++++++++++++
 sysdeps/aarch64/libc-mtag.h              |  32 ++--
 sysdeps/generic/libc-mtag.h              |  43 ++++--
 sysdeps/unix/sysv/linux/sys/prctl.h      |   4 -
 13 files changed, 353 insertions(+), 312 deletions(-)
 delete mode 100644 sysdeps/aarch64/__mtag_address_get_tag.S
 delete mode 100644 sysdeps/aarch64/__mtag_memset_tag.S
 delete mode 100644 sysdeps/aarch64/__mtag_new_tag.S
 create mode 100644 sysdeps/aarch64/__mtag_tag_zero_region.S

-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468]
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
@ 2021-03-04 16:30 ` Szabolcs Nagy
  2021-03-05  0:15   ` DJ Delorie
  2021-03-05 20:51   ` DJ Delorie
  2021-03-04 16:30 ` [PATCH 02/16] Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h Szabolcs Nagy
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:30 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

_int_free must be called with a chunk that has its tag reset.  This was
missing in a rare case that could crash when heap tagging is enabled:
when, in a multi-threaded process, the current arena runs out of memory
during realloc but another arena still has space to finish the realloc,
_int_free was called without clearing the user allocation tags.

Another _int_free call site in realloc used the wrong size for the tag
clearing: the header of the next chunk was cleared as well.  In
practice this is probably not a problem, but logically that header
belongs to a different chunk, so it may cause trouble.

Fixes bug 27468.
---
 malloc/malloc.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1f4bbd8edf..10ea6aa441 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3446,7 +3446,9 @@ __libc_realloc (void *oldmem, size_t bytes)
       newp = __libc_malloc (bytes);
       if (newp != NULL)
         {
-          memcpy (newp, oldmem, oldsize - SIZE_SZ);
+	  size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
+	  memcpy (newp, oldmem, sz);
+	  (void) TAG_REGION (chunk2rawmem (oldp), sz);
           _int_free (ar_ptr, oldp, 0);
         }
     }
@@ -4850,10 +4852,10 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
           else
             {
 	      void *oldmem = chunk2mem (oldp);
+	      size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
 	      newmem = TAG_NEW_USABLE (newmem);
-	      memcpy (newmem, oldmem,
-		      CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ);
-	      (void) TAG_REGION (chunk2rawmem (oldp), oldsize);
+	      memcpy (newmem, oldmem, sz);
+	      (void) TAG_REGION (chunk2rawmem (oldp), sz);
               _int_free (av, oldp, 1);
               check_inuse_chunk (av, newp);
               return chunk2mem (newp);
-- 
2.17.1



* [PATCH 02/16] Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
  2021-03-04 16:30 ` [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468] Szabolcs Nagy
@ 2021-03-04 16:30 ` Szabolcs Nagy
  2021-03-26 11:29   ` Szabolcs Nagy
  2021-03-04 16:31 ` [PATCH 03/16] malloc: Move MTAG_MMAP_FLAGS definition Szabolcs Nagy
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:30 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

The value of PR_TAGGED_ADDR_ENABLE was incorrect in the installed
headers, and the prctl command macros that are needed for it to be
useful (PR_SET_TAGGED_ADDR_CTRL) were missing.  Linux headers have had
the definitions since 5.4, so they are widely available and we don't
need to repeat them.  The remaining definitions here are from Linux
5.10.

To build glibc with --enable-memory-tagging, Linux 5.4 headers and
binutils 2.33.1 or newer are needed.
---
 sysdeps/unix/sysv/linux/sys/prctl.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/sys/prctl.h b/sysdeps/unix/sysv/linux/sys/prctl.h
index 00817ff0f1..c9048c7cdb 100644
--- a/sysdeps/unix/sysv/linux/sys/prctl.h
+++ b/sysdeps/unix/sysv/linux/sys/prctl.h
@@ -25,10 +25,6 @@
    we're picking up...  */
 
 /* Memory tagging control operations (for AArch64).  */
-#ifndef PR_TAGGED_ADDR_ENABLE
-# define PR_TAGGED_ADDR_ENABLE	(1UL << 8)
-#endif
-
 #ifndef PR_MTE_TCF_SHIFT
 # define PR_MTE_TCF_SHIFT	1
 # define PR_MTE_TCF_NONE	(0UL << PR_MTE_TCF_SHIFT)
-- 
2.17.1



* [PATCH 03/16] malloc: Move MTAG_MMAP_FLAGS definition
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
  2021-03-04 16:30 ` [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468] Szabolcs Nagy
  2021-03-04 16:30 ` [PATCH 02/16] Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h Szabolcs Nagy
@ 2021-03-04 16:31 ` Szabolcs Nagy
  2021-03-05  1:07   ` DJ Delorie
  2021-03-04 16:31 ` [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable Szabolcs Nagy
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:31 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

This is only used internally in malloc.c; the extern declaration was
wrong because __mtag_mmap_flags has internal linkage.
---
 include/malloc.h | 7 -------
 malloc/malloc.c  | 2 ++
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/malloc.h b/include/malloc.h
index 7ae08d53d3..b77761f74d 100644
--- a/include/malloc.h
+++ b/include/malloc.h
@@ -16,11 +16,4 @@ typedef struct malloc_state *mstate;
 
 # endif /* !_ISOMAC */
 
-#ifdef USE_MTAG
-extern int __mtag_mmap_flags;
-#define MTAG_MMAP_FLAGS __mtag_mmap_flags
-#else
-#define MTAG_MMAP_FLAGS 0
-#endif
-
 #endif
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 10ea6aa441..4538a01614 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -463,11 +463,13 @@ static void *(*__tag_region)(void *, size_t) = __default_tag_region;
 static void *(*__tag_new_usable)(void *) = __default_tag_nop;
 static void *(*__tag_at)(void *) = __default_tag_nop;
 
+# define MTAG_MMAP_FLAGS __mtag_mmap_flags
 # define TAG_NEW_MEMSET(ptr, val, size) __tag_new_memset (ptr, val, size)
 # define TAG_REGION(ptr, size) __tag_region (ptr, size)
 # define TAG_NEW_USABLE(ptr) __tag_new_usable (ptr)
 # define TAG_AT(ptr) __tag_at (ptr)
 #else
+# define MTAG_MMAP_FLAGS 0
 # define TAG_NEW_MEMSET(ptr, val, size) memset (ptr, val, size)
 # define TAG_REGION(ptr, size) (ptr)
 # define TAG_NEW_USABLE(ptr) (ptr)
-- 
2.17.1



* [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (2 preceding siblings ...)
  2021-03-04 16:31 ` [PATCH 03/16] malloc: Move MTAG_MMAP_FLAGS definition Szabolcs Nagy
@ 2021-03-04 16:31 ` Szabolcs Nagy
  2021-03-05  0:20   ` DJ Delorie
  2021-03-05 18:52   ` DJ Delorie
  2021-03-04 16:31 ` [PATCH 05/16] malloc: Avoid tagging mmapped memory on free Szabolcs Nagy
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:31 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

The chunk cannot be a dumped one here.  The only non-obvious cases
are free and realloc, which may be called on a dumped area chunk, but
in both cases it can be verified that tagging is already avoided for
dumped area chunks.
---
 malloc/arena.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index bf17be27d4..0777dc70c6 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -298,11 +298,6 @@ __mtag_tag_new_usable (void *ptr)
   if (ptr)
     {
       mchunkptr cp = mem2chunk(ptr);
-      /* This likely will never happen, but we can't handle retagging
-	 chunks from the dumped main arena.  So just return the
-	 existing pointer.  */
-      if (DUMPED_MAIN_ARENA_CHUNK (cp))
-	return ptr;
       ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr),
 				    CHUNK_AVAILABLE_SIZE (cp) - CHUNK_HDR_SZ);
     }
-- 
2.17.1



* [PATCH 05/16] malloc: Avoid tagging mmapped memory on free
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (3 preceding siblings ...)
  2021-03-04 16:31 ` [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable Szabolcs Nagy
@ 2021-03-04 16:31 ` Szabolcs Nagy
  2021-03-05  1:01   ` DJ Delorie
  2021-03-04 16:31 ` [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used Szabolcs Nagy
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:31 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

Either the memory belongs to the dumped area, in which case we don't
want to tag (the dumped area has the same tag as malloc internal data
so tagging is unnecessary, but chunks there may not have the right
alignment for the tag granule), or the memory will be unmapped
immediately (and thus tagging is not useful).
---
 malloc/malloc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 4538a01614..b4c800bd7f 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3284,9 +3284,6 @@ __libc_free (void *mem)
 
   p = mem2chunk (mem);
 
-  /* Mark the chunk as belonging to the library again.  */
-  (void)TAG_REGION (chunk2rawmem (p), CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
-
   if (chunk_is_mmapped (p))                       /* release mmapped memory. */
     {
       /* See if the dynamic brk/mmap threshold needs adjusting.
@@ -3307,6 +3304,10 @@ __libc_free (void *mem)
     {
       MAYBE_INIT_TCACHE ();
 
+      /* Mark the chunk as belonging to the library again.  */
+      (void)TAG_REGION (chunk2rawmem (p),
+			CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
+
       ar_ptr = arena_for_chunk (p);
       _int_free (ar_ptr, p, 0);
     }
-- 
2.17.1



* [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (4 preceding siblings ...)
  2021-03-04 16:31 ` [PATCH 05/16] malloc: Avoid tagging mmapped memory on free Szabolcs Nagy
@ 2021-03-04 16:31 ` Szabolcs Nagy
  2021-03-05  1:05   ` DJ Delorie
  2021-03-05 20:30   ` DJ Delorie
  2021-03-04 16:32 ` [PATCH 07/16] malloc: Refactor TAG_ macros to avoid indirection Szabolcs Nagy
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:31 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

Use inline functions instead of macros, because macros can cause unused
variable warnings and type conversion issues.  We assume these functions
may appear in the code but only in dead code paths (hidden by a runtime
check), so it's important that they can compile with correct types, but
if they are actually used that should be an error.

Currently the hooks are only used when USE_MTAG is true, which only
happens on aarch64, and then the aarch64-specific code is used, not
this generic header.  However, followup refactoring will allow the
hooks to be used with !USE_MTAG.

Note: the const qualifier in the comment was wrong: changing tags is a
write operation.
---
 sysdeps/generic/libc-mtag.h | 41 ++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/sysdeps/generic/libc-mtag.h b/sysdeps/generic/libc-mtag.h
index 1a866cdc0c..e8fc236b6c 100644
--- a/sysdeps/generic/libc-mtag.h
+++ b/sysdeps/generic/libc-mtag.h
@@ -31,22 +31,43 @@
 /* Extra flags to pass to mmap() to request a tagged region of memory.  */
 #define __MTAG_MMAP_FLAGS 0
 
+/* Memory tagging target hooks are only called when memory tagging is
+   enabled at runtime.  The generic definitions here must not be used.  */
+void __libc_mtag_link_error (void);
+
 /* Set the tags for a region of memory, which must have size and alignment
-   that are multiples of __MTAG_GRANULE_SIZE.  Size cannot be zero.
-   void *__libc_mtag_tag_region (const void *, size_t)  */
-#define __libc_mtag_tag_region(p, s) (p)
+   that are multiples of __MTAG_GRANULE_SIZE.  Size cannot be zero.  */
+static inline void *
+__libc_mtag_tag_region (void *p, size_t n)
+{
+  __libc_mtag_link_error ();
+  return p;
+}
 
 /* Optimized equivalent to __libc_mtag_tag_region followed by memset.  */
-#define __libc_mtag_memset_with_tag memset
+static inline void *
+__libc_mtag_memset_with_tag (void *p, int c, size_t n)
+{
+  __libc_mtag_link_error ();
+  return memset (p, c, n);
+}
 
 /* Convert address P to a pointer that is tagged correctly for that
-   location.
-   void *__libc_mtag_address_get_tag (void*)  */
-#define __libc_mtag_address_get_tag(p) (p)
+   location.  */
+static inline void *
+__libc_mtag_address_get_tag (void *p)
+{
+  __libc_mtag_link_error ();
+  return p;
+}
 
 /* Assign a new (random) tag to a pointer P (does not adjust the tag on
-   the memory addressed).
-   void *__libc_mtag_new_tag (void*)  */
-#define __libc_mtag_new_tag(p) (p)
+   the memory addressed).  */
+static inline void *
+__libc_mtag_new_tag (void *p)
+{
+  __libc_mtag_link_error ();
+  return p;
+}
 
 #endif /* _GENERIC_LIBC_MTAG_H */
-- 
2.17.1



* [PATCH 07/16] malloc: Refactor TAG_ macros to avoid indirection
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (5 preceding siblings ...)
  2021-03-04 16:31 ` [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used Szabolcs Nagy
@ 2021-03-04 16:32 ` Szabolcs Nagy
  2021-03-05  0:28   ` DJ Delorie
  2021-03-04 16:32 ` [PATCH 08/16] malloc: Use global flag instead of function pointer dispatch for mtag Szabolcs Nagy
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:32 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

This does not change behaviour; it just removes one layer of
indirection in the internal memory tagging logic.

Use tag_ and mtag_ prefixes instead of __tag_ and __mtag_ since these
are all symbols with internal linkage, private to malloc.c, so there
is no user namespace pollution issue.
---
 malloc/arena.c  | 16 +++++-----
 malloc/hooks.c  | 10 +++---
 malloc/malloc.c | 81 +++++++++++++++++++++++--------------------------
 3 files changed, 51 insertions(+), 56 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index 0777dc70c6..d0778fea92 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -332,12 +332,12 @@ ptmalloc_init (void)
       if (__MTAG_SBRK_UNTAGGED)
 	__morecore = __failing_morecore;
 
-      __mtag_mmap_flags = __MTAG_MMAP_FLAGS;
-      __tag_new_memset = __mtag_tag_new_memset;
-      __tag_region = __libc_mtag_tag_region;
-      __tag_new_usable = __mtag_tag_new_usable;
-      __tag_at = __libc_mtag_address_get_tag;
-      __mtag_granule_mask = ~(size_t)(__MTAG_GRANULE_SIZE - 1);
+      mtag_mmap_flags = __MTAG_MMAP_FLAGS;
+      tag_new_memset = __mtag_tag_new_memset;
+      tag_region = __libc_mtag_tag_region;
+      tag_new_usable = __mtag_tag_new_usable;
+      tag_at = __libc_mtag_address_get_tag;
+      mtag_granule_mask = ~(size_t)(__MTAG_GRANULE_SIZE - 1);
     }
 #endif
 
@@ -557,7 +557,7 @@ new_heap (size_t size, size_t top_pad)
             }
         }
     }
-  if (__mprotect (p2, size, MTAG_MMAP_FLAGS | PROT_READ | PROT_WRITE) != 0)
+  if (__mprotect (p2, size, mtag_mmap_flags | PROT_READ | PROT_WRITE) != 0)
     {
       __munmap (p2, HEAP_MAX_SIZE);
       return 0;
@@ -587,7 +587,7 @@ grow_heap (heap_info *h, long diff)
     {
       if (__mprotect ((char *) h + h->mprotect_size,
                       (unsigned long) new_size - h->mprotect_size,
-                      MTAG_MMAP_FLAGS | PROT_READ | PROT_WRITE) != 0)
+                      mtag_mmap_flags | PROT_READ | PROT_WRITE) != 0)
         return -2;
 
       h->mprotect_size = new_size;
diff --git a/malloc/hooks.c b/malloc/hooks.c
index efec05f0a8..d8e304c31c 100644
--- a/malloc/hooks.c
+++ b/malloc/hooks.c
@@ -68,7 +68,7 @@ __malloc_check_init (void)
    tags, so fetch the tag at each location before dereferencing
    it.  */
 #define SAFE_CHAR_OFFSET(p,offset) \
-  ((unsigned char *) TAG_AT (((unsigned char *) p) + offset))
+  ((unsigned char *) tag_at (((unsigned char *) p) + offset))
 
 /* A simple, standard set of debugging hooks.  Overhead is `only' one
    byte per chunk; still this will catch most cases of double frees or
@@ -249,7 +249,7 @@ malloc_check (size_t sz, const void *caller)
   top_check ();
   victim = _int_malloc (&main_arena, nb);
   __libc_lock_unlock (main_arena.mutex);
-  return mem2mem_check (TAG_NEW_USABLE (victim), sz);
+  return mem2mem_check (tag_new_usable (victim), sz);
 }
 
 static void
@@ -280,7 +280,7 @@ free_check (void *mem, const void *caller)
   else
     {
       /* Mark the chunk as belonging to the library again.  */
-      (void)TAG_REGION (chunk2rawmem (p), CHUNK_AVAILABLE_SIZE (p)
+      (void)tag_region (chunk2rawmem (p), CHUNK_AVAILABLE_SIZE (p)
                                          - CHUNK_HDR_SZ);
       _int_free (&main_arena, p, 1);
       __libc_lock_unlock (main_arena.mutex);
@@ -375,7 +375,7 @@ invert:
 
   __libc_lock_unlock (main_arena.mutex);
 
-  return mem2mem_check (TAG_NEW_USABLE (newmem), bytes);
+  return mem2mem_check (tag_new_usable (newmem), bytes);
 }
 
 static void *
@@ -417,7 +417,7 @@ memalign_check (size_t alignment, size_t bytes, const void *caller)
   top_check ();
   mem = _int_memalign (&main_arena, alignment, bytes + 1);
   __libc_lock_unlock (main_arena.mutex);
-  return mem2mem_check (TAG_NEW_USABLE (mem), bytes);
+  return mem2mem_check (tag_new_usable (mem), bytes);
 }
 
 #if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_25)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index b4c800bd7f..e5f520267b 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -413,26 +413,26 @@ void *(*__morecore)(ptrdiff_t) = __default_morecore;
    operations can continue to be used.  Support macros are used to do
    this:
 
-   void *TAG_NEW_MEMSET (void *ptr, int, val, size_t size)
+   void *tag_new_memset (void *ptr, int, val, size_t size)
 
    Has the same interface as memset(), but additionally allocates a
    new tag, colors the memory with that tag and returns a pointer that
    is correctly colored for that location.  The non-tagging version
    will simply call memset.
 
-   void *TAG_REGION (void *ptr, size_t size)
+   void *tag_region (void *ptr, size_t size)
 
    Color the region of memory pointed to by PTR and size SIZE with
    the color of PTR.  Returns the original pointer.
 
-   void *TAG_NEW_USABLE (void *ptr)
+   void *tag_new_usable (void *ptr)
 
    Allocate a new random color and use it to color the user region of
    a chunk; this may include data from the subsequent chunk's header
    if tagging is sufficiently fine grained.  Returns PTR suitably
    recolored for accessing the memory there.
 
-   void *TAG_AT (void *ptr)
+   void *tag_at (void *ptr)
 
    Read the current color of the memory at the address pointed to by
    PTR (ignoring it's current color) and return PTR recolored to that
@@ -455,25 +455,20 @@ __default_tag_nop (void *ptr)
   return ptr;
 }
 
-static int __mtag_mmap_flags = 0;
-static size_t __mtag_granule_mask = ~(size_t)0;
+static int mtag_mmap_flags = 0;
+static size_t mtag_granule_mask = ~(size_t)0;
 
-static void *(*__tag_new_memset)(void *, int, size_t) = memset;
-static void *(*__tag_region)(void *, size_t) = __default_tag_region;
-static void *(*__tag_new_usable)(void *) = __default_tag_nop;
-static void *(*__tag_at)(void *) = __default_tag_nop;
+static void *(*tag_new_memset)(void *, int, size_t) = memset;
+static void *(*tag_region)(void *, size_t) = __default_tag_region;
+static void *(*tag_new_usable)(void *) = __default_tag_nop;
+static void *(*tag_at)(void *) = __default_tag_nop;
 
-# define MTAG_MMAP_FLAGS __mtag_mmap_flags
-# define TAG_NEW_MEMSET(ptr, val, size) __tag_new_memset (ptr, val, size)
-# define TAG_REGION(ptr, size) __tag_region (ptr, size)
-# define TAG_NEW_USABLE(ptr) __tag_new_usable (ptr)
-# define TAG_AT(ptr) __tag_at (ptr)
 #else
-# define MTAG_MMAP_FLAGS 0
-# define TAG_NEW_MEMSET(ptr, val, size) memset (ptr, val, size)
-# define TAG_REGION(ptr, size) (ptr)
-# define TAG_NEW_USABLE(ptr) (ptr)
-# define TAG_AT(ptr) (ptr)
+# define mtag_mmap_flags 0
+# define tag_new_memset(ptr, val, size) memset (ptr, val, size)
+# define tag_region(ptr, size) (ptr)
+# define tag_new_usable(ptr) (ptr)
+# define tag_at(ptr) (ptr)
 #endif
 
 #include <string.h>
@@ -1305,8 +1300,8 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 /* Convert between user mem pointers and chunk pointers, updating any
    memory tags on the pointer to respect the tag value at that
    location.  */
-#define chunk2mem(p) ((void*)TAG_AT (((char*)(p) + CHUNK_HDR_SZ)))
-#define mem2chunk(mem) ((mchunkptr)TAG_AT (((char*)(mem) - CHUNK_HDR_SZ)))
+#define chunk2mem(p) ((void *)tag_at (((char*)(p) + CHUNK_HDR_SZ)))
+#define mem2chunk(mem) ((mchunkptr)tag_at (((char*)(mem) - CHUNK_HDR_SZ)))
 
 /* The smallest possible chunk */
 #define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))
@@ -1337,7 +1332,7 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 #ifdef USE_MTAG
 #define CHUNK_AVAILABLE_SIZE(p) \
   ((chunksize (p) + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))	\
-   & __mtag_granule_mask)
+   & mtag_granule_mask)
 #else
 #define CHUNK_AVAILABLE_SIZE(p) \
   (chunksize (p) + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))
@@ -1361,7 +1356,7 @@ checked_request2size (size_t req, size_t *sz) __nonnull (1)
      number.  Ideally, this would be part of request2size(), but that
      must be a macro that produces a compile time constant if passed
      a constant literal.  */
-  req = (req + ~__mtag_granule_mask) & __mtag_granule_mask;
+  req = (req + ~mtag_granule_mask) & mtag_granule_mask;
 #endif
 
   *sz = request2size (req);
@@ -2467,7 +2462,7 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       if ((unsigned long) (size) > (unsigned long) (nb))
         {
           mm = (char *) (MMAP (0, size,
-			       MTAG_MMAP_FLAGS | PROT_READ | PROT_WRITE, 0));
+			       mtag_mmap_flags | PROT_READ | PROT_WRITE, 0));
 
           if (mm != MAP_FAILED)
             {
@@ -2665,7 +2660,7 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
           if ((unsigned long) (size) > (unsigned long) (nb))
             {
               char *mbrk = (char *) (MMAP (0, size,
-					   MTAG_MMAP_FLAGS | PROT_READ | PROT_WRITE,
+					   mtag_mmap_flags | PROT_READ | PROT_WRITE,
 					   0));
 
               if (mbrk != MAP_FAILED)
@@ -3221,14 +3216,14 @@ __libc_malloc (size_t bytes)
       && tcache->counts[tc_idx] > 0)
     {
       victim = tcache_get (tc_idx);
-      return TAG_NEW_USABLE (victim);
+      return tag_new_usable (victim);
     }
   DIAG_POP_NEEDS_COMMENT;
 #endif
 
   if (SINGLE_THREAD_P)
     {
-      victim = TAG_NEW_USABLE (_int_malloc (&main_arena, bytes));
+      victim = tag_new_usable (_int_malloc (&main_arena, bytes));
       assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
 	      &main_arena == arena_for_chunk (mem2chunk (victim)));
       return victim;
@@ -3249,7 +3244,7 @@ __libc_malloc (size_t bytes)
   if (ar_ptr != NULL)
     __libc_lock_unlock (ar_ptr->mutex);
 
-  victim = TAG_NEW_USABLE (victim);
+  victim = tag_new_usable (victim);
 
   assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
           ar_ptr == arena_for_chunk (mem2chunk (victim)));
@@ -3305,7 +3300,7 @@ __libc_free (void *mem)
       MAYBE_INIT_TCACHE ();
 
       /* Mark the chunk as belonging to the library again.  */
-      (void)TAG_REGION (chunk2rawmem (p),
+      (void)tag_region (chunk2rawmem (p),
 			CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
 
       ar_ptr = arena_for_chunk (p);
@@ -3408,7 +3403,7 @@ __libc_realloc (void *oldmem, size_t bytes)
 	     reused.  There's a performance hit for both us and the
 	     caller for doing this, so we might want to
 	     reconsider.  */
-	  return TAG_NEW_USABLE (newmem);
+	  return tag_new_usable (newmem);
 	}
 #endif
       /* Note the extra SIZE_SZ overhead. */
@@ -3451,7 +3446,7 @@ __libc_realloc (void *oldmem, size_t bytes)
         {
 	  size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
 	  memcpy (newp, oldmem, sz);
-	  (void) TAG_REGION (chunk2rawmem (oldp), sz);
+	  (void) tag_region (chunk2rawmem (oldp), sz);
           _int_free (ar_ptr, oldp, 0);
         }
     }
@@ -3509,7 +3504,7 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
       p = _int_memalign (&main_arena, alignment, bytes);
       assert (!p || chunk_is_mmapped (mem2chunk (p)) ||
 	      &main_arena == arena_for_chunk (mem2chunk (p)));
-      return TAG_NEW_USABLE (p);
+      return tag_new_usable (p);
     }
 
   arena_get (ar_ptr, bytes + alignment + MINSIZE);
@@ -3527,7 +3522,7 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
 
   assert (!p || chunk_is_mmapped (mem2chunk (p)) ||
           ar_ptr == arena_for_chunk (mem2chunk (p)));
-  return TAG_NEW_USABLE (p);
+  return tag_new_usable (p);
 }
 /* For ISO C11.  */
 weak_alias (__libc_memalign, aligned_alloc)
@@ -3544,7 +3539,7 @@ __libc_valloc (size_t bytes)
   void *address = RETURN_ADDRESS (0);
   size_t pagesize = GLRO (dl_pagesize);
   p = _mid_memalign (pagesize, bytes, address);
-  return TAG_NEW_USABLE (p);
+  return tag_new_usable (p);
 }
 
 void *
@@ -3569,7 +3564,7 @@ __libc_pvalloc (size_t bytes)
   rounded_bytes = rounded_bytes & -(pagesize - 1);
 
   p = _mid_memalign (pagesize, rounded_bytes, address);
-  return TAG_NEW_USABLE (p);
+  return tag_new_usable (p);
 }
 
 void *
@@ -3666,7 +3661,7 @@ __libc_calloc (size_t n, size_t elem_size)
      regardless of MORECORE_CLEARS, so we zero the whole block while
      doing so.  */
 #ifdef USE_MTAG
-  return TAG_NEW_MEMSET (mem, 0, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
+  return tag_new_memset (mem, 0, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
 #else
   INTERNAL_SIZE_T csz = chunksize (p);
 
@@ -4821,7 +4816,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
           av->top = chunk_at_offset (oldp, nb);
           set_head (av->top, (newsize - nb) | PREV_INUSE);
           check_inuse_chunk (av, oldp);
-          return TAG_NEW_USABLE (chunk2rawmem (oldp));
+          return tag_new_usable (chunk2rawmem (oldp));
         }
 
       /* Try to expand forward into next chunk;  split off remainder below */
@@ -4856,9 +4851,9 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
             {
 	      void *oldmem = chunk2mem (oldp);
 	      size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
-	      newmem = TAG_NEW_USABLE (newmem);
+	      newmem = tag_new_usable (newmem);
 	      memcpy (newmem, oldmem, sz);
-	      (void) TAG_REGION (chunk2rawmem (oldp), sz);
+	      (void) tag_region (chunk2rawmem (oldp), sz);
               _int_free (av, oldp, 1);
               check_inuse_chunk (av, newp);
               return chunk2mem (newp);
@@ -4881,7 +4876,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
     {
       remainder = chunk_at_offset (newp, nb);
       /* Clear any user-space tags before writing the header.  */
-      remainder = TAG_REGION (remainder, remainder_size);
+      remainder = tag_region (remainder, remainder_size);
       set_head_size (newp, nb | (av != &main_arena ? NON_MAIN_ARENA : 0));
       set_head (remainder, remainder_size | PREV_INUSE |
                 (av != &main_arena ? NON_MAIN_ARENA : 0));
@@ -4891,7 +4886,7 @@ _int_realloc(mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
     }
 
   check_inuse_chunk (av, newp);
-  return TAG_NEW_USABLE (chunk2rawmem (newp));
+  return tag_new_usable (chunk2rawmem (newp));
 }
 
 /*
@@ -5108,7 +5103,7 @@ musable (void *mem)
       /* The usable space may be reduced if memory tagging is needed,
 	 since we cannot share the user-space data with malloc's internal
 	 data structure.  */
-      result &= __mtag_granule_mask;
+      result &= mtag_granule_mask;
 #endif
       return result;
     }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 08/16] malloc: Use global flag instead of function pointer dispatch for mtag
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (6 preceding siblings ...)
  2021-03-04 16:32 ` [PATCH 07/16] malloc: Refactor TAG_ macros to avoid indirection Szabolcs Nagy
@ 2021-03-04 16:32 ` Szabolcs Nagy
  2021-03-05  0:46   ` DJ Delorie
  2021-03-04 16:32 ` [PATCH 09/16] malloc: Only support zeroing and not arbitrary memset with mtag Szabolcs Nagy
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:32 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

A flag check can be faster than function pointers because of how
branch prediction and speculation work, and it also removes a layer
of indirection when there is a mismatch between the malloc-internal
tag_* api and the __libc_mtag_* target hooks.

Memory tagging wrapper functions are moved to malloc.c from arena.c
and the logic now checks mtag_enabled.  The definition of
tag_new_usable is moved after the chunk-related definitions.

This refactoring also allows using mtag_enabled checks instead of
USE_MTAG ifdefs when memory tagging support only changes code logic
when memory tagging is enabled at runtime.
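As an illustration, the two dispatch styles compare roughly as in the
portable sketch below.  hook_tag_region and the wrapper shown are
hypothetical stand-ins, not the real glibc symbols:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a __libc_mtag_* target hook.  */
static void *
hook_tag_region (void *p, size_t n)
{
  (void) n;
  return p;
}

/* Before: a function pointer, rebound in ptmalloc_init when MTE is
   enabled at runtime.  Every call is an indirect call.  */
static void *(*tag_region_fp) (void *, size_t) = hook_tag_region;

/* After: one global flag plus an always-inlined wrapper.  The branch
   is highly predictable (the flag never changes after startup) and
   the common !mtag_enabled path needs no call at all.  */
static bool mtag_enabled = false;

static inline void *
tag_region (void *p, size_t n)
{
  if (mtag_enabled)		/* __glibc_unlikely in the real code.  */
    return hook_tag_region (p, n);
  return p;
}
```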
---
 malloc/arena.c  | 33 +---------------------------
 malloc/malloc.c | 58 ++++++++++++++++++++++++++++++++-----------------
 2 files changed, 39 insertions(+), 52 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index d0778fea92..1e83bb66bd 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -287,34 +287,6 @@ extern struct dl_open_hook *_dl_open_hook;
 libc_hidden_proto (_dl_open_hook);
 #endif
 
-#ifdef USE_MTAG
-
-/* Generate a new (random) tag value for PTR and tag the memory it
-   points to upto the end of the usable size for the chunk containing
-   it.  Return the newly tagged pointer.  */
-static void *
-__mtag_tag_new_usable (void *ptr)
-{
-  if (ptr)
-    {
-      mchunkptr cp = mem2chunk(ptr);
-      ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr),
-				    CHUNK_AVAILABLE_SIZE (cp) - CHUNK_HDR_SZ);
-    }
-  return ptr;
-}
-
-/* Generate a new (random) tag value for PTR, set the tags for the
-   memory to the new tag and initialize the memory contents to VAL.
-   In practice this function will only be called with VAL=0, but we
-   keep this parameter to maintain the same prototype as memset.  */
-static void *
-__mtag_tag_new_memset (void *ptr, int val, size_t size)
-{
-  return __libc_mtag_memset_with_tag (__libc_mtag_new_tag (ptr), val, size);
-}
-#endif
-
 static void
 ptmalloc_init (void)
 {
@@ -332,11 +304,8 @@ ptmalloc_init (void)
       if (__MTAG_SBRK_UNTAGGED)
 	__morecore = __failing_morecore;
 
+      mtag_enabled = true;
       mtag_mmap_flags = __MTAG_MMAP_FLAGS;
-      tag_new_memset = __mtag_tag_new_memset;
-      tag_region = __libc_mtag_tag_region;
-      tag_new_usable = __mtag_tag_new_usable;
-      tag_at = __libc_mtag_address_get_tag;
       mtag_granule_mask = ~(size_t)(__MTAG_GRANULE_SIZE - 1);
     }
 #endif
diff --git a/malloc/malloc.c b/malloc/malloc.c
index e5f520267b..caf34843f7 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -441,35 +441,41 @@ void *(*__morecore)(ptrdiff_t) = __default_morecore;
 */
 
 #ifdef USE_MTAG
+static bool mtag_enabled = false;
+static int mtag_mmap_flags = 0;
+static size_t mtag_granule_mask = ~(size_t)0;
+#else
+# define mtag_enabled false
+# define mtag_mmap_flags 0
+#endif
 
-/* Default implementaions when memory tagging is supported, but disabled.  */
-static void *
-__default_tag_region (void *ptr, size_t size)
+static __always_inline void *
+tag_region (void *ptr, size_t size)
 {
+  if (__glibc_unlikely (mtag_enabled))
+    return __libc_mtag_tag_region (ptr, size);
   return ptr;
 }
 
-static void *
-__default_tag_nop (void *ptr)
+static __always_inline void *
+tag_new_memset (void *ptr, int val, size_t size)
 {
-  return ptr;
+  if (__glibc_unlikely (mtag_enabled))
+    return __libc_mtag_memset_with_tag (__libc_mtag_new_tag (ptr), val, size);
+  return memset (ptr, val, size);
 }
 
-static int mtag_mmap_flags = 0;
-static size_t mtag_granule_mask = ~(size_t)0;
-
-static void *(*tag_new_memset)(void *, int, size_t) = memset;
-static void *(*tag_region)(void *, size_t) = __default_tag_region;
-static void *(*tag_new_usable)(void *) = __default_tag_nop;
-static void *(*tag_at)(void *) = __default_tag_nop;
+/* Defined later.  */
+static void *
+tag_new_usable (void *ptr);
 
-#else
-# define mtag_mmap_flags 0
-# define tag_new_memset(ptr, val, size) memset (ptr, val, size)
-# define tag_region(ptr, size) (ptr)
-# define tag_new_usable(ptr) (ptr)
-# define tag_at(ptr) (ptr)
-#endif
+static __always_inline void *
+tag_at (void *ptr)
+{
+  if (__glibc_unlikely (mtag_enabled))
+    return __libc_mtag_address_get_tag (ptr);
+  return ptr;
+}
 
 #include <string.h>
 
@@ -3185,6 +3191,18 @@ tcache_thread_shutdown (void)
 
 #endif /* !USE_TCACHE  */
 
+static __always_inline void *
+tag_new_usable (void *ptr)
+{
+  if (__glibc_unlikely (mtag_enabled) && ptr)
+    {
+      mchunkptr cp = mem2chunk(ptr);
+      ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr),
+				    CHUNK_AVAILABLE_SIZE (cp) - CHUNK_HDR_SZ);
+    }
+  return ptr;
+}
+
 void *
 __libc_malloc (size_t bytes)
 {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 09/16] malloc: Only support zeroing and not arbitrary memset with mtag
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (7 preceding siblings ...)
  2021-03-04 16:32 ` [PATCH 08/16] malloc: Use global flag instead of function pointer dispatch for mtag Szabolcs Nagy
@ 2021-03-04 16:32 ` Szabolcs Nagy
  2021-03-05  0:49   ` DJ Delorie
  2021-03-04 16:33 ` [PATCH 10/16] malloc: Change calloc when tagging is disabled Szabolcs Nagy
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:32 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

The memset api is suboptimal and does not provide much benefit. Memory
tagging only needs a zeroing memset (and only for memory that's sized
and aligned to multiples of the tag granule), so change the internal
api and the target hooks accordingly.  This is to simplify the
implementation of the target hook.
---
 malloc/malloc.c                                | 17 ++++++++---------
 sysdeps/aarch64/Makefile                       |  2 +-
 ...g_memset_tag.S => __mtag_tag_zero_region.S} | 18 +++++++-----------
 sysdeps/aarch64/libc-mtag.h                    |  4 ++--
 sysdeps/generic/libc-mtag.h                    |  6 +++---
 5 files changed, 21 insertions(+), 26 deletions(-)
 rename sysdeps/aarch64/{__mtag_memset_tag.S => __mtag_tag_zero_region.S} (82%)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index caf34843f7..9002d51d7b 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -413,12 +413,11 @@ void *(*__morecore)(ptrdiff_t) = __default_morecore;
    operations can continue to be used.  Support macros are used to do
    this:
 
-   void *tag_new_memset (void *ptr, int, val, size_t size)
+   void *tag_new_zero_region (void *ptr, size_t size)
 
-   Has the same interface as memset(), but additionally allocates a
-   new tag, colors the memory with that tag and returns a pointer that
-   is correctly colored for that location.  The non-tagging version
-   will simply call memset.
+   Allocates a new tag, colors the memory with that tag, zeros the
+   memory and returns a pointer that is correctly colored for that
+   location.  The non-tagging version will simply call memset with 0.
 
    void *tag_region (void *ptr, size_t size)
 
@@ -458,11 +457,11 @@ tag_region (void *ptr, size_t size)
 }
 
 static __always_inline void *
-tag_new_memset (void *ptr, int val, size_t size)
+tag_new_zero_region (void *ptr, size_t size)
 {
   if (__glibc_unlikely (mtag_enabled))
-    return __libc_mtag_memset_with_tag (__libc_mtag_new_tag (ptr), val, size);
-  return memset (ptr, val, size);
+    return __libc_mtag_tag_zero_region (__libc_mtag_new_tag (ptr), size);
+  return memset (ptr, 0, size);
 }
 
 /* Defined later.  */
@@ -3679,7 +3678,7 @@ __libc_calloc (size_t n, size_t elem_size)
      regardless of MORECORE_CLEARS, so we zero the whole block while
      doing so.  */
 #ifdef USE_MTAG
-  return tag_new_memset (mem, 0, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
+  return tag_new_zero_region (mem, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
 #else
   INTERNAL_SIZE_T csz = chunksize (p);
 
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index d3ab37a40a..259070cfad 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -41,7 +41,7 @@ endif
 ifeq ($(subdir),misc)
 sysdep_headers += sys/ifunc.h
 sysdep_routines += __mtag_address_get_tag \
-		   __mtag_memset_tag \
+		   __mtag_tag_zero_region \
 		   __mtag_new_tag \
 		   __mtag_tag_region
 
diff --git a/sysdeps/aarch64/__mtag_memset_tag.S b/sysdeps/aarch64/__mtag_tag_zero_region.S
similarity index 82%
rename from sysdeps/aarch64/__mtag_memset_tag.S
rename to sysdeps/aarch64/__mtag_tag_zero_region.S
index 3c202888a4..74d398bba5 100644
--- a/sysdeps/aarch64/__mtag_memset_tag.S
+++ b/sysdeps/aarch64/__mtag_tag_zero_region.S
@@ -20,9 +20,6 @@
 
 #ifdef USE_MTAG
 
-/* Use the same register names and assignments as memset.  */
-#include "memset-reg.h"
-
 	.arch armv8.5-a
 	.arch_extension memtag
 
@@ -31,16 +28,15 @@
 /* FIXME: This is a minimal implementation.  We could do much better than
    this for large values of COUNT.  */
 
-ENTRY(__libc_mtag_memset_with_tag)
+#define dstin x0
+#define count x1
+#define dst   x2
 
-	and	valw, valw, 255
-	orr	valw, valw, valw, lsl 8
-	orr	valw, valw, valw, lsl 16
-	orr	val, val, val, lsl 32
-	mov	dst, dstin
+ENTRY(__libc_mtag_tag_zero_region)
 
+	mov	dst, dstin
 L(loop):
-	stgp	val, val, [dst], #16
+	stzg	dst, [dst], #16
 	subs	count, count, 16
 	bne	L(loop)
 #if 0
@@ -49,5 +45,5 @@ L(loop):
 	ldg	dstin, [dstin] // Recover the tag created (might be untagged).
 #endif
 	ret
-END (__libc_mtag_memset_with_tag)
+END (__libc_mtag_tag_zero_region)
 #endif /* USE_MTAG */
diff --git a/sysdeps/aarch64/libc-mtag.h b/sysdeps/aarch64/libc-mtag.h
index 979cbb743e..f58402ccf9 100644
--- a/sysdeps/aarch64/libc-mtag.h
+++ b/sysdeps/aarch64/libc-mtag.h
@@ -39,8 +39,8 @@
    void *__libc_mtag_tag_region (const void *, size_t)  */
 void *__libc_mtag_tag_region (void *, size_t);
 
-/* Optimized equivalent to __libc_mtag_tag_region followed by memset.  */
-void *__libc_mtag_memset_with_tag (void *, int, size_t);
+/* Optimized equivalent to __libc_mtag_tag_region followed by memset to 0.  */
+void *__libc_mtag_tag_zero_region (void *, size_t);
 
 /* Convert address P to a pointer that is tagged correctly for that
    location.
diff --git a/sysdeps/generic/libc-mtag.h b/sysdeps/generic/libc-mtag.h
index e8fc236b6c..4743e873f1 100644
--- a/sysdeps/generic/libc-mtag.h
+++ b/sysdeps/generic/libc-mtag.h
@@ -44,12 +44,12 @@ __libc_mtag_tag_region (void *p, size_t n)
   return p;
 }
 
-/* Optimized equivalent to __libc_mtag_tag_region followed by memset.  */
+/* Optimized equivalent to __libc_mtag_tag_region followed by memset to 0.  */
 static inline void *
-__libc_mtag_memset_with_tag (void *p, int c, size_t n)
+__libc_mtag_tag_zero_region (void *p, size_t n)
 {
   __libc_mtag_link_error ();
-  return memset (p, c, n);
+  return memset (p, 0, n);
 }
 
 /* Convert address P to a pointer that is tagged correctly for that
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 10/16] malloc: Change calloc when tagging is disabled
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (8 preceding siblings ...)
  2021-03-04 16:32 ` [PATCH 09/16] malloc: Only support zeroing and not arbitrary memset with mtag Szabolcs Nagy
@ 2021-03-04 16:33 ` Szabolcs Nagy
  2021-03-05  1:06   ` DJ Delorie
  2021-03-04 16:33 ` [PATCH 11/16] malloc: Use branches instead of mtag_granule_mask Szabolcs Nagy
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:33 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

When glibc is built with memory tagging support (USE_MTAG) but it is
not enabled at runtime (mtag_enabled) then an unconditional memset was
used even though that can often be avoided.

This is for performance when tagging is supported but not enabled.
The extra check should have no overhead: tag_new_zero_region already
had a runtime check which the compiler can now optimize away.
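A minimal model of what the compiler sees in the two configurations
(the names below are hypothetical; the real logic is the tail of
__libc_calloc):

```c
#include <string.h>
#include <stddef.h>

/* Mirrors the malloc.c arrangement: with --enable-memory-tagging this
   is a real variable set in ptmalloc_init; without it, it is the
   constant 0 and the tagging branch below is deleted at compile time.  */
#ifdef USE_MTAG
static _Bool mtag_enabled = 0;
#else
# define mtag_enabled 0
#endif

/* Hypothetical stand-in for tag_new_zero_region.  */
static void *
tag_new_zero_region_model (void *mem, size_t n)
{
  return memset (mem, 0, n);
}

/* Model of the calloc tail: tag-and-zero when tagging is on,
   otherwise zero only when the memory is not known to be zero
   already (e.g. fresh mmap pages).  */
static void *
calloc_clear_model (void *mem, size_t n, int known_zero)
{
  if (mtag_enabled)
    return tag_new_zero_region_model (mem, n);
  if (!known_zero)
    memset (mem, 0, n);
  return mem;
}
```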
---
 malloc/malloc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 9002d51d7b..b1ee0f450b 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3591,11 +3591,9 @@ __libc_calloc (size_t n, size_t elem_size)
   mchunkptr oldtop;
   INTERNAL_SIZE_T sz, oldtopsize;
   void *mem;
-#ifndef USE_MTAG
   unsigned long clearsize;
   unsigned long nclears;
   INTERNAL_SIZE_T *d;
-#endif
   ptrdiff_t bytes;
 
   if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
@@ -3674,12 +3672,13 @@ __libc_calloc (size_t n, size_t elem_size)
     return 0;
 
   mchunkptr p = mem2chunk (mem);
+
   /* If we are using memory tagging, then we need to set the tags
      regardless of MORECORE_CLEARS, so we zero the whole block while
      doing so.  */
-#ifdef USE_MTAG
-  return tag_new_zero_region (mem, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
-#else
+  if (__glibc_unlikely (mtag_enabled))
+    return tag_new_zero_region (mem, CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ);
+
   INTERNAL_SIZE_T csz = chunksize (p);
 
   /* Two optional cases in which clearing not necessary */
@@ -3733,7 +3732,6 @@ __libc_calloc (size_t n, size_t elem_size)
     }
 
   return mem;
-#endif
 }
 
 /*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 11/16] malloc: Use branches instead of mtag_granule_mask
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (9 preceding siblings ...)
  2021-03-04 16:33 ` [PATCH 10/16] malloc: Change calloc when tagging is disabled Szabolcs Nagy
@ 2021-03-04 16:33 ` Szabolcs Nagy
  2021-03-05 21:00   ` DJ Delorie
  2021-03-04 16:33 ` [PATCH 12/16] malloc: Use mtag_enabled instead of USE_MTAG Szabolcs Nagy
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:33 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

The branches may be better optimized since mtag_enabled is widely used.

A granule size larger than a chunk header is not supported, since then
the chunk header and the user area cannot both be granule aligned.  To
fix that for targets with a larger granule, the chunk layout would
have to change.

So the code that attempted to handle arbitrary granule masks was
changed; this simplifies CHUNK_AVAILABLE_SIZE and the logic in
malloc_usable_size.
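The rounding now spelled out in checked_request2size is the usual
power-of-two round-up idiom; a sketch with the 16-byte aarch64
granule (helper name is hypothetical):

```c
#include <stddef.h>

#define MTAG_GRANULE_SIZE 16	/* aarch64 MTE tag granule */

/* Round REQ up to a multiple of the granule, as checked_request2size
   does when mtag_enabled, replacing the old mtag_granule_mask.  */
static size_t
round_up_to_granule (size_t req)
{
  return (req + (MTAG_GRANULE_SIZE - 1))
	 & ~(size_t) (MTAG_GRANULE_SIZE - 1);
}
```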
---
 malloc/arena.c  |  1 -
 malloc/malloc.c | 34 ++++++++++++++--------------------
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index 1e83bb66bd..9fbbb38a15 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -306,7 +306,6 @@ ptmalloc_init (void)
 
       mtag_enabled = true;
       mtag_mmap_flags = __MTAG_MMAP_FLAGS;
-      mtag_granule_mask = ~(size_t)(__MTAG_GRANULE_SIZE - 1);
     }
 #endif
 
diff --git a/malloc/malloc.c b/malloc/malloc.c
index b1ee0f450b..8854afec88 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -442,7 +442,6 @@ void *(*__morecore)(ptrdiff_t) = __default_morecore;
 #ifdef USE_MTAG
 static bool mtag_enabled = false;
 static int mtag_mmap_flags = 0;
-static size_t mtag_granule_mask = ~(size_t)0;
 #else
 # define mtag_enabled false
 # define mtag_mmap_flags 0
@@ -1333,15 +1332,16 @@ nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)
 
 /* Available size of chunk.  This is the size of the real usable data
-   in the chunk, plus the chunk header.  */
-#ifdef USE_MTAG
-#define CHUNK_AVAILABLE_SIZE(p) \
-  ((chunksize (p) + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))	\
-   & mtag_granule_mask)
-#else
-#define CHUNK_AVAILABLE_SIZE(p) \
-  (chunksize (p) + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))
-#endif
+   in the chunk, plus the chunk header.  Note: If memory tagging is
+   enabled, the layout changes to accommodate the granule size; this is
+   wasteful for small allocations so not done by default.  The logic
+   does not work if chunk headers are not granule aligned.  */
+_Static_assert (__MTAG_GRANULE_SIZE <= CHUNK_HDR_SZ,
+		"memory tagging is not supported with large granule.");
+#define CHUNK_AVAILABLE_SIZE(p)                                       \
+  (__MTAG_GRANULE_SIZE > SIZE_SZ && __glibc_unlikely (mtag_enabled) ? \
+    chunksize (p) :                                                   \
+    chunksize (p) + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))
 
 /* Check if REQ overflows when padded and aligned and if the resulting value
    is less than PTRDIFF_T.  Returns TRUE and the requested size or MINSIZE in
@@ -1353,7 +1353,6 @@ checked_request2size (size_t req, size_t *sz) __nonnull (1)
   if (__glibc_unlikely (req > PTRDIFF_MAX))
     return false;
 
-#ifdef USE_MTAG
   /* When using tagged memory, we cannot share the end of the user
      block with the header for the next chunk, so ensure that we
      allocate blocks that are rounded up to the granule size.  Take
@@ -1361,8 +1360,9 @@ checked_request2size (size_t req, size_t *sz) __nonnull (1)
      number.  Ideally, this would be part of request2size(), but that
      must be a macro that produces a compile time constant if passed
      a constant literal.  */
-  req = (req + ~mtag_granule_mask) & mtag_granule_mask;
-#endif
+  if (__glibc_unlikely (mtag_enabled))
+    req = (req + (__MTAG_GRANULE_SIZE - 1)) &
+	  ~(size_t)(__MTAG_GRANULE_SIZE - 1);
 
   *sz = request2size (req);
   return true;
@@ -5112,14 +5112,8 @@ musable (void *mem)
 	    result = chunksize (p) - CHUNK_HDR_SZ;
 	}
       else if (inuse (p))
-	result = chunksize (p) - SIZE_SZ;
+	result = CHUNK_AVAILABLE_SIZE (p) - CHUNK_HDR_SZ;
 
-#ifdef USE_MTAG
-      /* The usable space may be reduced if memory tagging is needed,
-	 since we cannot share the user-space data with malloc's internal
-	 data structure.  */
-      result &= mtag_granule_mask;
-#endif
       return result;
     }
   return 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 12/16] malloc: Use mtag_enabled instead of USE_MTAG
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (10 preceding siblings ...)
  2021-03-04 16:33 ` [PATCH 11/16] malloc: Use branches instead of mtag_granule_mask Szabolcs Nagy
@ 2021-03-04 16:33 ` Szabolcs Nagy
  2021-03-05  0:56   ` DJ Delorie
  2021-03-04 16:34 ` [PATCH 13/16] aarch64: inline __libc_mtag_address_get_tag Szabolcs Nagy
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:33 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

Use the runtime check where possible: it should not cause a slowdown
in the !USE_MTAG case, since there mtag_enabled is constant false, but
it allows the tagging logic to be compiled, so it is less likely to
break or diverge when developers only test the !USE_MTAG case.
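The guarded `*(volatile char *)mem;` statements work by forcing a
one-byte load through the user pointer; in MTE synchronous mode a
pointer whose tag no longer matches the memory (e.g. after the
granule was retagged on free) faults on that load.  A sketch of the
idiom, which is harmless on hardware without MTE:

```c
/* Force a read through PTR; volatile keeps the compiler from
   removing the otherwise-dead load.  With synchronous MTE a
   pointer/memory tag mismatch traps here, catching double frees
   and many use-after-free bugs early.  */
static inline void
mtag_check_ptr (const void *ptr)
{
  *(volatile const char *) ptr;
}
```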
---
 malloc/hooks.c  | 10 ++++------
 malloc/malloc.c | 10 ++++------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/malloc/hooks.c b/malloc/hooks.c
index d8e304c31c..9474e199c3 100644
--- a/malloc/hooks.c
+++ b/malloc/hooks.c
@@ -262,11 +262,10 @@ free_check (void *mem, const void *caller)
 
   int err = errno;
 
-#ifdef USE_MTAG
   /* Quickly check that the freed pointer matches the tag for the memory.
      This gives a useful double-free detection.  */
-  *(volatile char *)mem;
-#endif
+  if (__glibc_unlikely (mtag_enabled))
+    *(volatile char *)mem;
 
   __libc_lock_lock (main_arena.mutex);
   p = mem2chunk_check (mem, NULL);
@@ -310,11 +309,10 @@ realloc_check (void *oldmem, size_t bytes, const void *caller)
       return NULL;
     }
 
-#ifdef USE_MTAG
   /* Quickly check that the freed pointer matches the tag for the memory.
      This gives a useful double-free detection.  */
-  *(volatile char *)oldmem;
-#endif
+  if (__glibc_unlikely (mtag_enabled))
+    *(volatile char *)oldmem;
 
   __libc_lock_lock (main_arena.mutex);
   const mchunkptr oldp = mem2chunk_check (oldmem, &magic_p);
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 8854afec88..2d96bb085c 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3286,11 +3286,10 @@ __libc_free (void *mem)
   if (mem == 0)                              /* free(0) has no effect */
     return;
 
-#ifdef USE_MTAG
   /* Quickly check that the freed pointer matches the tag for the memory.
      This gives a useful double-free detection.  */
-  *(volatile char *)mem;
-#endif
+  if (__glibc_unlikely (mtag_enabled))
+    *(volatile char *)mem;
 
   int err = errno;
 
@@ -3352,11 +3351,10 @@ __libc_realloc (void *oldmem, size_t bytes)
   if (oldmem == 0)
     return __libc_malloc (bytes);
 
-#ifdef USE_MTAG
   /* Perform a quick check to ensure that the pointer's tag matches the
      memory's tag.  */
-  *(volatile char*) oldmem;
-#endif
+  if (__glibc_unlikely (mtag_enabled))
+    *(volatile char*) oldmem;
 
   /* chunk corresponding to oldmem */
   const mchunkptr oldp = mem2chunk (oldmem);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 13/16] aarch64: inline __libc_mtag_address_get_tag
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (11 preceding siblings ...)
  2021-03-04 16:33 ` [PATCH 12/16] malloc: Use mtag_enabled instead of USE_MTAG Szabolcs Nagy
@ 2021-03-04 16:34 ` Szabolcs Nagy
  2021-03-04 16:34 ` [PATCH 14/16] aarch64: inline __libc_mtag_new_tag Szabolcs Nagy
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:34 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

This is a common operation when heap tagging is enabled, so inline the
instruction instead of using an extern call.

The .inst directive is used instead of the instruction name (or the
ACLE intrinsics) because malloc.c is not compiled for the
armv8.5-a+memtag architecture; runtime CPU support detection is used
instead.

Prototypes are removed from the comments as they were not always
correct.
---
 sysdeps/aarch64/Makefile                 |  3 +--
 sysdeps/aarch64/__mtag_address_get_tag.S | 32 ------------------------
 sysdeps/aarch64/libc-mtag.h              | 14 +++++++----
 3 files changed, 10 insertions(+), 39 deletions(-)
 delete mode 100644 sysdeps/aarch64/__mtag_address_get_tag.S

diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 259070cfad..5d594debea 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -40,8 +40,7 @@ endif
 
 ifeq ($(subdir),misc)
 sysdep_headers += sys/ifunc.h
-sysdep_routines += __mtag_address_get_tag \
-		   __mtag_tag_zero_region \
+sysdep_routines += __mtag_tag_zero_region \
 		   __mtag_new_tag \
 		   __mtag_tag_region
 
diff --git a/sysdeps/aarch64/__mtag_address_get_tag.S b/sysdeps/aarch64/__mtag_address_get_tag.S
deleted file mode 100644
index eab6c49285..0000000000
--- a/sysdeps/aarch64/__mtag_address_get_tag.S
+++ /dev/null
@@ -1,32 +0,0 @@
-/* Copyright (C) 2020-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-#ifdef USE_MTAG
-#define ptr x0
-
-	.arch armv8.5-a
-	.arch_extension memtag
-
-ENTRY (__libc_mtag_address_get_tag)
-
-	ldg	ptr, [ptr]
-	ret
-END (__libc_mtag_address_get_tag)
-#endif /* USE_MTAG */
diff --git a/sysdeps/aarch64/libc-mtag.h b/sysdeps/aarch64/libc-mtag.h
index f58402ccf9..da1b6be776 100644
--- a/sysdeps/aarch64/libc-mtag.h
+++ b/sysdeps/aarch64/libc-mtag.h
@@ -35,17 +35,21 @@
 #define __MTAG_MMAP_FLAGS PROT_MTE
 
 /* Set the tags for a region of memory, which must have size and alignment
-   that are multiples of __MTAG_GRANULE_SIZE.  Size cannot be zero.
-   void *__libc_mtag_tag_region (const void *, size_t)  */
+   that are multiples of __MTAG_GRANULE_SIZE.  Size cannot be zero.  */
 void *__libc_mtag_tag_region (void *, size_t);
 
 /* Optimized equivalent to __libc_mtag_tag_region followed by memset to 0.  */
 void *__libc_mtag_tag_zero_region (void *, size_t);
 
 /* Convert address P to a pointer that is tagged correctly for that
-   location.
-   void *__libc_mtag_address_get_tag (void*)  */
-void *__libc_mtag_address_get_tag (void *);
+   location.  */
+static __always_inline void *
+__libc_mtag_address_get_tag (void *p)
+{
+  register void *x0 asm ("x0") = p;
+  asm (".inst 0xd9600000 /* ldg x0, [x0] */" : "+r" (x0));
+  return x0;
+}
 
 /* Assign a new (random) tag to a pointer P (does not adjust the tag on
    the memory addressed).
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 14/16] aarch64: inline __libc_mtag_new_tag
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (12 preceding siblings ...)
  2021-03-04 16:34 ` [PATCH 13/16] aarch64: inline __libc_mtag_address_get_tag Szabolcs Nagy
@ 2021-03-04 16:34 ` Szabolcs Nagy
  2021-03-04 16:34 ` [PATCH 15/16] aarch64: Optimize __libc_mtag_tag_region Szabolcs Nagy
  2021-03-04 16:34 ` [PATCH 16/16] aarch64: Optimize __libc_mtag_tag_zero_region Szabolcs Nagy
  15 siblings, 0 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:34 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

This is a common operation when heap tagging is enabled, so inline the
instructions instead of using an extern call.
---
 sysdeps/aarch64/Makefile         |  1 -
 sysdeps/aarch64/__mtag_new_tag.S | 37 --------------------------------
 sysdeps/aarch64/libc-mtag.h      | 14 +++++++++---
 3 files changed, 11 insertions(+), 41 deletions(-)
 delete mode 100644 sysdeps/aarch64/__mtag_new_tag.S

diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 5d594debea..1099f1d657 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -41,7 +41,6 @@ endif
 ifeq ($(subdir),misc)
 sysdep_headers += sys/ifunc.h
 sysdep_routines += __mtag_tag_zero_region \
-		   __mtag_new_tag \
 		   __mtag_tag_region
 
 endif
diff --git a/sysdeps/aarch64/__mtag_new_tag.S b/sysdeps/aarch64/__mtag_new_tag.S
deleted file mode 100644
index 72c1aded4c..0000000000
--- a/sysdeps/aarch64/__mtag_new_tag.S
+++ /dev/null
@@ -1,37 +0,0 @@
-/* Copyright (C) 2020-2021 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-#ifdef USE_MTAG
-
-	.arch armv8.5-a
-	.arch_extension memtag
-
-/* NB, only supported on variants with 64-bit pointers.  */
-
-#define ptr x0
-#define xset x1
-
-ENTRY(__libc_mtag_new_tag)
-	// Guarantee that the new tag is not the same as now.
-	gmi	xset, ptr, xzr
-	irg	ptr, ptr, xset
-	ret
-END (__libc_mtag_new_tag)
-#endif /* USE_MTAG */
diff --git a/sysdeps/aarch64/libc-mtag.h b/sysdeps/aarch64/libc-mtag.h
index da1b6be776..7588c6de9e 100644
--- a/sysdeps/aarch64/libc-mtag.h
+++ b/sysdeps/aarch64/libc-mtag.h
@@ -52,9 +52,17 @@ __libc_mtag_address_get_tag (void *p)
 }
 
 /* Assign a new (random) tag to a pointer P (does not adjust the tag on
-   the memory addressed).
-   void *__libc_mtag_new_tag (void*)  */
-void *__libc_mtag_new_tag (void *);
+   the memory addressed).  */
+static __always_inline void *
+__libc_mtag_new_tag (void *p)
+{
+  register void *x0 asm ("x0") = p;
+  register unsigned long x1 asm ("x1");
+  /* Guarantee that the new tag is not the same as now.  */
+  asm (".inst 0x9adf1401 /* gmi x1, x0, xzr */\n"
+       ".inst 0x9ac11000 /* irg x0, x0, x1 */" : "+r" (x0), "=r" (x1));
+  return x0;
+}
 
 #endif /* USE_MTAG */
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 15/16] aarch64: Optimize __libc_mtag_tag_region
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (13 preceding siblings ...)
  2021-03-04 16:34 ` [PATCH 14/16] aarch64: inline __libc_mtag_new_tag Szabolcs Nagy
@ 2021-03-04 16:34 ` Szabolcs Nagy
  2021-03-04 16:34 ` [PATCH 16/16] aarch64: Optimize __libc_mtag_tag_zero_region Szabolcs Nagy
  15 siblings, 0 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:34 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

This is a target hook for memory tagging; the original was a naive
implementation.  The optimized version relies on "dc gva" to tag 64
bytes at a time for large allocations, and it optimizes the small
cases without adding too many branches.  This was not benchmarked on
a real CPU, but it is expected to be faster than the naive
implementation.
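The shape of the routine can be modelled in portable C.  In the
sketch below set16/set32/set64 are hypothetical stand-ins for
stg/st2g/"dc gva", and the 96-to-160-byte no_zva path is simplified
to a plain granule loop:

```c
#include <string.h>
#include <stddef.h>

/* Stand-ins for stg / st2g / "dc gva": each sets a whole block.  */
static void set16 (char *p) { memset (p, 1, 16); }
static void set32 (char *p) { memset (p, 1, 32); }
static void set64 (char *p) { memset (p, 1, 64); }

/* COUNT must be a multiple of 16, as in the interface contract.  */
static void
tag_region_model (char *dst, size_t count)
{
  char *end = dst + count;
  if (count <= 96)
    {
      if (count & 64)
	{
	  /* 64..96 bytes: 64 from the start, 32 from the end,
	     possibly overlapping.  */
	  set64 (dst);
	  set32 (end - 32);
	  return;
	}
      if (count == 0)
	return;
      /* 16, 32 or 48 bytes: three possibly-overlapping stores,
	 branch-free in the assembly.  */
      set16 (dst);
      set16 (dst + ((count >> 5) << 4));
      set16 (end - 16);
      return;
    }
  /* Large sizes: the assembly writes 64 bytes from each end and
     covers the aligned middle with "dc gva" 64 bytes at a time;
     a simple granule loop stands in for that here.  */
  for (char *p = dst; p < end; p += 16)
    set16 (p);
}
```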
---
 sysdeps/aarch64/__mtag_tag_region.S | 98 +++++++++++++++++++++++------
 1 file changed, 80 insertions(+), 18 deletions(-)

diff --git a/sysdeps/aarch64/__mtag_tag_region.S b/sysdeps/aarch64/__mtag_tag_region.S
index 9a8a3ffb60..cae0c8f121 100644
--- a/sysdeps/aarch64/__mtag_tag_region.S
+++ b/sysdeps/aarch64/__mtag_tag_region.S
@@ -20,32 +20,94 @@
 
 #ifdef USE_MTAG
 
-/* Use the same register names and assignments as memset.  */
-
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, MTE, LP64 ABI.
+ *
+ * Interface contract:
+ * Address is 16 byte aligned and size is multiple of 16.
+ * Returns the passed pointer.
+ * The memory region may remain untagged if tagging is not enabled.
+ */
 	.arch armv8.5-a
 	.arch_extension memtag
 
-/* NB, only supported on variants with 64-bit pointers.  */
+#define dstin	x0
+#define count	x1
+#define dst	x2
+#define dstend	x3
+#define tmp	x4
+#define zva_val	x4
+
+ENTRY (__libc_mtag_tag_region)
+	PTR_ARG (0)
+	SIZE_ARG (1)
+
+	add	dstend, dstin, count
 
-/* FIXME: This is a minimal implementation.  We could do better than
-   this for larger values of COUNT.  */
+	cmp	count, 96
+	b.hi	L(set_long)
 
-#define dstin x0
-#define count x1
-#define dst   x2
+	tbnz	count, 6, L(set96)
 
-ENTRY_ALIGN(__libc_mtag_tag_region, 6)
+	/* Set 0, 16, 32, or 48 bytes.  */
+	lsr	tmp, count, 5
+	add	tmp, dstin, tmp, lsl 4
+	cbz     count, L(end)
+	stg	dstin, [dstin]
+	stg	dstin, [tmp]
+	stg	dstin, [dstend, -16]
+L(end):
+	ret
+
+	.p2align 4
+	/* Set 64..96 bytes.  Write 64 bytes from the start and
+	   32 bytes from the end.  */
+L(set96):
+	st2g	dstin, [dstin]
+	st2g	dstin, [dstin, 32]
+	st2g	dstin, [dstend, -32]
+	ret
 
-	mov	dst, dstin
-L(loop):
-	stg	dst, [dst], #16
-	subs	count, count, 16
-	bne	L(loop)
-#if 0
-	/* This is not currently needed, since for now we are only called
-	   to tag memory that is taggable.  */
-	ldg	dstin, [dstin] // Recover the tag created (might be untagged).
+	.p2align 4
+	/* Size is > 96 bytes.  */
+L(set_long):
+	cmp	count, 160
+	b.lo	L(no_zva)
+
+#ifndef SKIP_ZVA_CHECK
+	mrs	zva_val, dczid_el0
+	and	zva_val, zva_val, 31
+	cmp	zva_val, 4		/* ZVA size is 64 bytes.  */
+	b.ne	L(no_zva)
 #endif
+	st2g	dstin, [dstin]
+	st2g	dstin, [dstin, 32]
+	bic	dst, dstin, 63
+	sub	count, dstend, dst	/* Count is now 64 too large.  */
+	sub	count, count, 128	/* Adjust count and bias for loop.  */
+
+	.p2align 4
+L(zva_loop):
+	add	dst, dst, 64
+	dc	gva, dst
+	subs	count, count, 64
+	b.hi	L(zva_loop)
+	st2g	dstin, [dstend, -64]
+	st2g	dstin, [dstend, -32]
 	ret
+
+L(no_zva):
+	sub	dst, dstin, 32		/* Dst is biased by -32.  */
+	sub	count, count, 64	/* Adjust count for loop.  */
+L(no_zva_loop):
+	st2g	dstin, [dst, 32]
+	st2g	dstin, [dst, 64]!
+	subs	count, count, 64
+	b.hi	L(no_zva_loop)
+	st2g	dstin, [dstend, -64]
+	st2g	dstin, [dstend, -32]
+	ret
+
 END (__libc_mtag_tag_region)
 #endif /* USE_MTAG */
-- 
2.17.1



* [PATCH 16/16] aarch64: Optimize __libc_mtag_tag_zero_region
  2021-03-04 16:30 [PATCH 00/16] memory tagging improvements Szabolcs Nagy
                   ` (14 preceding siblings ...)
  2021-03-04 16:34 ` [PATCH 15/16] aarch64: Optimize __libc_mtag_tag_region Szabolcs Nagy
@ 2021-03-04 16:34 ` Szabolcs Nagy
  15 siblings, 0 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-04 16:34 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

This is a target hook for memory tagging; the original was a naive
implementation. This version uses the same algorithm as
__libc_mtag_tag_region, but with instructions that also zero the
memory.  It was not benchmarked on a real CPU, but it is expected to
be faster than the naive implementation.
---
 sysdeps/aarch64/__mtag_tag_zero_region.S | 96 ++++++++++++++++++++----
 1 file changed, 80 insertions(+), 16 deletions(-)

diff --git a/sysdeps/aarch64/__mtag_tag_zero_region.S b/sysdeps/aarch64/__mtag_tag_zero_region.S
index 74d398bba5..7d955fbd0c 100644
--- a/sysdeps/aarch64/__mtag_tag_zero_region.S
+++ b/sysdeps/aarch64/__mtag_tag_zero_region.S
@@ -20,30 +20,94 @@
 
 #ifdef USE_MTAG
 
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, MTE, LP64 ABI.
+ *
+ * Interface contract:
+ * Address is 16 byte aligned and size is multiple of 16.
+ * Returns the passed pointer.
+ * The memory region may remain untagged if tagging is not enabled.
+ */
 	.arch armv8.5-a
 	.arch_extension memtag
 
-/* NB, only supported on variants with 64-bit pointers.  */
+#define dstin	x0
+#define count	x1
+#define dst	x2
+#define dstend	x3
+#define tmp	x4
+#define zva_val	x4
 
-/* FIXME: This is a minimal implementation.  We could do much better than
-   this for large values of COUNT.  */
+ENTRY (__libc_mtag_tag_zero_region)
+	PTR_ARG (0)
+	SIZE_ARG (1)
 
-#define dstin x0
-#define count x1
-#define dst   x2
+	add	dstend, dstin, count
 
-ENTRY(__libc_mtag_tag_zero_region)
+	cmp	count, 96
+	b.hi	L(set_long)
 
-	mov	dst, dstin
-L(loop):
-	stzg	dst, [dst], #16
-	subs	count, count, 16
-	bne	L(loop)
-#if 0
-	/* This is not currently needed, since for now we are only called
-	   to tag memory that is taggable.  */
-	ldg	dstin, [dstin] // Recover the tag created (might be untagged).
+	tbnz	count, 6, L(set96)
+
+	/* Set 0, 16, 32, or 48 bytes.  */
+	lsr	tmp, count, 5
+	add	tmp, dstin, tmp, lsl 4
+	cbz     count, L(end)
+	stzg	dstin, [dstin]
+	stzg	dstin, [tmp]
+	stzg	dstin, [dstend, -16]
+L(end):
+	ret
+
+	.p2align 4
+	/* Set 64..96 bytes.  Write 64 bytes from the start and
+	   32 bytes from the end.  */
+L(set96):
+	stz2g	dstin, [dstin]
+	stz2g	dstin, [dstin, 32]
+	stz2g	dstin, [dstend, -32]
+	ret
+
+	.p2align 4
+	/* Size is > 96 bytes.  */
+L(set_long):
+	cmp	count, 160
+	b.lo	L(no_zva)
+
+#ifndef SKIP_ZVA_CHECK
+	mrs	zva_val, dczid_el0
+	and	zva_val, zva_val, 31
+	cmp	zva_val, 4		/* ZVA size is 64 bytes.  */
+	b.ne	L(no_zva)
 #endif
+	stz2g	dstin, [dstin]
+	stz2g	dstin, [dstin, 32]
+	bic	dst, dstin, 63
+	sub	count, dstend, dst	/* Count is now 64 too large.  */
+	sub	count, count, 128	/* Adjust count and bias for loop.  */
+
+	.p2align 4
+L(zva_loop):
+	add	dst, dst, 64
+	dc	gzva, dst
+	subs	count, count, 64
+	b.hi	L(zva_loop)
+	stz2g	dstin, [dstend, -64]
+	stz2g	dstin, [dstend, -32]
 	ret
+
+L(no_zva):
+	sub	dst, dstin, 32		/* Dst is biased by -32.  */
+	sub	count, count, 64	/* Adjust count for loop.  */
+L(no_zva_loop):
+	stz2g	dstin, [dst, 32]
+	stz2g	dstin, [dst, 64]!
+	subs	count, count, 64
+	b.hi	L(no_zva_loop)
+	stz2g	dstin, [dstend, -64]
+	stz2g	dstin, [dstend, -32]
+	ret
+
 END (__libc_mtag_tag_zero_region)
 #endif /* USE_MTAG */
-- 
2.17.1



* Re: [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468]
  2021-03-04 16:30 ` [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468] Szabolcs Nagy
@ 2021-03-05  0:15   ` DJ Delorie
  2021-03-05 12:01     ` Szabolcs Nagy
  2021-03-05 20:51   ` DJ Delorie
  1 sibling, 1 reply; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  0:15 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 1f4bbd8edf..10ea6aa441 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3446,7 +3446,9 @@ __libc_realloc (void *oldmem, size_t bytes)
>        newp = __libc_malloc (bytes);
>        if (newp != NULL)
>          {
> -          memcpy (newp, oldmem, oldsize - SIZE_SZ);

> +	  size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;

I think this is semantically wrong, because the chunk size
(mptr->mchunk_size) does not include the mchunk_prev_size that's
accounted for in CHUNK_HDR_SZ.  I suspect the problem is that
CHUNK_AVAILABLE_SIZE is wrong, in that it adds SIZE_SZ in the non-tagged
case, and shouldn't, or that it's defined (or named) wrong.

chunksize(p) is the difference between this chunk and the corresponding
address in the next chunk.  i.e. it's prev_ptr to prev_ptr, or
user-bytes to user-bytes.

A "chunk pointer" does NOT point to the beginning of the chunk, but to
the prev_ptr in the PREVIOUS chunk.  So CHUNK_HDR_SZ is the offset from
a chunk pointer to the user data, but it is NOT the difference between
the chunk size and the user data size.  Using CHUNK_HDR_SZ in any
user-data-size computations is suspect logic.

That the resulting value happens to be correct is irrelevant here,
although I suspect it will be off by a word when tagging is enabled, and
not memcpy enough data, if the prev_ptr word is still part of the "user
data" when tagging is enabled.


* Re: [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable
  2021-03-04 16:31 ` [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable Szabolcs Nagy
@ 2021-03-05  0:20   ` DJ Delorie
  2021-03-05 12:24     ` Szabolcs Nagy
  2021-03-05 18:52   ` DJ Delorie
  1 sibling, 1 reply; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  0:20 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> The chunk cannot be a dumped one here.

What about the realloc-expand case in malloc.c:4819 ?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/16] malloc: Refactor TAG_ macros to avoid indirection
  2021-03-04 16:32 ` [PATCH 07/16] malloc: Refactor TAG_ macros to avoid indirection Szabolcs Nagy
@ 2021-03-05  0:28   ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  0:28 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> This does not change behaviour, just removes one layer of indirection
> in the internal memory tagging logic.
>
> Use tag_ and mtag_ prefixes instead of __tag_ and __mtag_ since these
> are all symbols with internal linkage, private to malloc.c, so there
> is no user namespace pollution issue.
> ---
>  malloc/arena.c  | 16 +++++-----
>  malloc/hooks.c  | 10 +++---
>  malloc/malloc.c | 81 +++++++++++++++++++++++--------------------------
>  3 files changed, 51 insertions(+), 56 deletions(-)

LGTM
Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 08/16] malloc: Use global flag instead of function pointer dispatch for mtag
  2021-03-04 16:32 ` [PATCH 08/16] malloc: Use global flag instead of function pointer dispatch for mtag Szabolcs Nagy
@ 2021-03-05  0:46   ` DJ Delorie
  2021-03-05 12:53     ` Szabolcs Nagy
  0 siblings, 1 reply; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  0:46 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> The definition of tag_new_usable is moved after chunk related
> definitions.

A couple thousand lines after, that is.  Perhaps it could go at the end
of the chunk related definitions, closer to the other tag functions?

> This refactoring also allows using mtag_enabled checks instead of
> USE_MTAG ifdefs when memory tagging support only changes code logic
> when memory tagging is enabled at runtime.

Are you relying on the compiler to do the work of the old #ifdefs for
platforms without USE_MTAG set?  I.e. are we sure that the compiled
binaries will have the mtag-related code optimized out?

Otherwise LGTM with the above caveats.

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 09/16] malloc: Only support zeroing and not arbitrary memset with mtag
  2021-03-04 16:32 ` [PATCH 09/16] malloc: Only support zeroing and not arbitrary memset with mtag Szabolcs Nagy
@ 2021-03-05  0:49   ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  0:49 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw


Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> The memset api is suboptimal and does not provide much benefit. Memory
> tagging only needs a zeroing memset (and only for memory that's sized
> and aligned to multiples of the tag granule), so change the internal
> api and the target hooks accordingly.  This is to simplify the
> implementation of the target hook.

LGTM

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 12/16] malloc: Use mtag_enabled instead of USE_MTAG
  2021-03-04 16:33 ` [PATCH 12/16] malloc: Use mtag_enabled instead of USE_MTAG Szabolcs Nagy
@ 2021-03-05  0:56   ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  0:56 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> Use the runtime check where possible: it should not cause a slowdown in
> the !USE_MTAG case since then mtag_enabled is constant false, but it
> allows compiling the tagging logic so it's less likely to break or
> diverge when developers only test the !USE_MTAG case.

LGTM

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 05/16] malloc: Avoid taggig mmaped memory on free
  2021-03-04 16:31 ` [PATCH 05/16] malloc: Avoid taggig mmaped memory on free Szabolcs Nagy
@ 2021-03-05  1:01   ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  1:01 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> Either the memory belongs to the dumped area, in which case we don't
> want to tag (the dumped area has the same tag as malloc internal data
> so tagging is unnecessary, but chunks there may not have the right
> alignment for the tag granule), or the memory will be unmapped
> immediately (and thus tagging is not useful).

LGTM

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used
  2021-03-04 16:31 ` [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used Szabolcs Nagy
@ 2021-03-05  1:05   ` DJ Delorie
  2021-03-05 12:44     ` Szabolcs Nagy
  2021-03-05 20:30   ` DJ Delorie
  1 sibling, 1 reply; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  1:05 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> +/* Memory tagging target hooks are only called when memory tagging is
> +   enabled at runtime.  The generic definitions here must not be used.  */
> +void __libc_mtag_link_error (void);

This may make it impossible to compile individual glibc modules with
-O0, say, for debugging purposes.  I do this quite often with malloc.c
and do not want to lose that ability.

Do we trust the compiler's default optimizations (at -O0) to do code
elimination based on constant conditions?


* Re: [PATCH 10/16] malloc: Change calloc when tagging is disabled
  2021-03-04 16:33 ` [PATCH 10/16] malloc: Change calloc when tagging is disabled Szabolcs Nagy
@ 2021-03-05  1:06   ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  1:06 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> When glibc is built with memory tagging support (USE_MTAG) but it is not
> enabled at runtime (mtag_enabled) then unconditional memset was used
> even though that can often be avoided.
>
> This is for performance when tagging is supported but not enabled.
> The extra check should have no overhead: tag_new_zero_region already
> had a runtime check which the compiler can now optimize away.

LGTM

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 03/16] malloc: Move MTAG_MMAP_FLAGS definition
  2021-03-04 16:31 ` [PATCH 03/16] malloc: Move MTAG_MMAP_FLAGS definition Szabolcs Nagy
@ 2021-03-05  1:07   ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05  1:07 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> This is only used internally in malloc.c; the extern declaration
> was wrong, as __mtag_mmap_flags has internal linkage.

LGTM

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468]
  2021-03-05  0:15   ` DJ Delorie
@ 2021-03-05 12:01     ` Szabolcs Nagy
  2021-03-05 18:42       ` DJ Delorie
  0 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-05 12:01 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, Richard.Earnshaw

The 03/04/2021 19:15, DJ Delorie wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> > diff --git a/malloc/malloc.c b/malloc/malloc.c
> > index 1f4bbd8edf..10ea6aa441 100644
> > --- a/malloc/malloc.c
> > +++ b/malloc/malloc.c
> > @@ -3446,7 +3446,9 @@ __libc_realloc (void *oldmem, size_t bytes)
> >        newp = __libc_malloc (bytes);
> >        if (newp != NULL)
> >          {
> > -          memcpy (newp, oldmem, oldsize - SIZE_SZ);
> 
> > +	  size_t sz = CHUNK_AVAILABLE_SIZE (oldp) - CHUNK_HDR_SZ;
> 
> I think this is semantically wrong, because the chunk size
> (mptr->mchunk_size) does not include the mchunk_prev_size that's
> accounted for in CHUNK_HDR_SZ.  I suspect the problem is that
> CHUNK_AVAILABLE_SIZE is wrong, in that it adds SIZE_SZ in the non-tagged
> case, and shouldn't, or that it's defined (or named) wrong.
> 
> chunksize(p) is the difference between this chunk and the corresponding
> address in the next chunk.  i.e. it's prev_ptr to prev_ptr, or
> user-bytes to user-bytes.
> 
> A "chunk pointer" does NOT point to the beginning of the chunk, but to
> the prev_ptr in the PREVIOUS chunk.  So CHUNK_HDR_SZ is the offset from
> a chunk pointer to the user data, but it is NOT the difference between
> the chunk size and the user data size.  Using CHUNK_HDR_SZ in any
> user-data-size computations is suspect logic.
> 
> That the resulting value happens to be correct is irrelevant here,
> although I suspect it will be off by a word when tagging is enabled, and
> not memcpy enough data, if the prev_ptr word is still part of the "user
> data" when tagging is enabled.

it seems CHUNK_AVAILABLE_SIZE is defined as

  (memory owned by the user) + CHUNK_HDR_SZ

and it should work on mmaped and normal chunks with or without
tagging. so by this definition i think the change is right, but
the CHUNK_AVAILABLE_SIZE may not have the most useful definition.
i can change this macro to be more meaningful, e.g.:

  CHUNK_USER_SIZE(chunk):  memory owned by the user in chunk.

i.e. the interval that user code may access in chunk p is

  [ chunk2mem(p), chunk2mem(p) + CHUNK_USER_SIZE(p) )

with tagging on aarch64 (granule = 2*size_t) this does not include
the prev_ptr word at the end.

I can refactor the code using this macro, or let me know if you
have a different preference (and if it should be backported with
this bug fix or have it as a follow up change on master).


* Re: [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable
  2021-03-05  0:20   ` DJ Delorie
@ 2021-03-05 12:24     ` Szabolcs Nagy
  0 siblings, 0 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-05 12:24 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, Richard.Earnshaw

The 03/04/2021 19:20, DJ Delorie wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> > The chunk cannot be a dumped one here.
> 
> What about the realloc-expand case in malloc.c:4819 ?

that's in _int_realloc, and it seems _int_realloc
is never called on dumped chunks (nor on mmapped
chunks more generally)

(there is some inconsistency here about what
kind of chunks may end up in _int_realloc vs
_int_free, e.g. the former may be tagged and
never mmapped, the latter is always untagged
but may be mmapped, but neither can be dumped.
these constraints are currently not documented)


* Re: [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used
  2021-03-05  1:05   ` DJ Delorie
@ 2021-03-05 12:44     ` Szabolcs Nagy
  0 siblings, 0 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-05 12:44 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, Richard.Earnshaw

The 03/04/2021 20:05, DJ Delorie wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> > +/* Memory tagging target hooks are only called when memory tagging is
> > +   enabled at runtime.  The generic definitions here must not be used.  */
> > +void __libc_mtag_link_error (void);
> 
> This may make it impossible to compile individual glibc modules with
> -O0, say, for debugging purposes.  I do this quite often with malloc.c
> and do not want to lose that ability.
> 
> Do we trust the compiler's default optimizations (at -O0) to do code
> elimination based on constant conditions?

yes, this breaks -O0, i had not thought of that.

i thought it is better to guard everything in the
code with one flag than with ifdefs (the flag can
be a runtime flag when tagging is enabled, or a
constant 0 when disabled), since ifdefed code is
not checked by the compiler.

the link error is to ensure that, without tagging,
the tagging hooks are not called accidentally.
i can replace it with abort() and then -O0 works,
but then you only get a runtime failure, not a
link time one (which is less useful since the bad
call may be in a very rarely exercised code path).

alternatively i can remove this safety net or
just go back to ifdefs.


* Re: [PATCH 08/16] malloc: Use global flag instead of function pointer dispatch for mtag
  2021-03-05  0:46   ` DJ Delorie
@ 2021-03-05 12:53     ` Szabolcs Nagy
  0 siblings, 0 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-05 12:53 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, Richard.Earnshaw

The 03/04/2021 19:46, DJ Delorie wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> > The definition of tag_new_usable is moved after chunk related
> > definitions.
> 
> A couple thousand lines after, that is.  Perhaps it could go at the end
> of the chunk related definitions, closer to the other tag functions?

ok. i'm happy to move it around. (i moved it down
close to its first use).

> > This refactoring also allows using mtag_enabled checks instead of
> > USE_MTAG ifdefs when memory tagging support only changes code logic
> > when memory tagging is enabled at runtime.
> 
> Are you relying on the compiler to do the work of the old #ifdefs for
> platforms without USE_MTAG set?  I.e. are we sure that the compiled
> binaries will have the mtag-related code optimized out?

yes, as described in my patch 6 response, i attempted
to remove the ifdefs.

i think at -O1 we can rely on 'if (0) {...}' being
optimized out completely (while that code is still
type checked and has to reference defined variables
etc.)

with -O0 i'd expect the code to be present in the
compiled libc, just unreachable.

i can add back the ifdefs if this seems too intrusive
(it is now less clear that mtag related code does not
affect the tagging-disabled configuration).

> 
> Otherwise LGTM with the above caveats.
> 
> Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468]
  2021-03-05 12:01     ` Szabolcs Nagy
@ 2021-03-05 18:42       ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05 18:42 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> with tagging on aarch64 (granule = 2*size_t) this does not include
> the prev_ptr word at the end.

So chunksize() includes the overhead to align tagging, but
CHUNK_USER_SIZE() would not?

It's unfortunate that we have such a good name for the chunk, but no
good name for "that portion of the chunk that the application can use".
USER_SIZE vs rawmem vs mem vs whatever.

> I can refactor the code using this macro, or let me know if you have a
> different preference (and if it should be backported with this bug fix
> or have it as a follow up change on master).

An agreement to fix it later is sufficient; let's not mess up the
existing patch set.


* Re: [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable
  2021-03-04 16:31 ` [PATCH 04/16] malloc: Simplify __mtag_tag_new_usable Szabolcs Nagy
  2021-03-05  0:20   ` DJ Delorie
@ 2021-03-05 18:52   ` DJ Delorie
  1 sibling, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05 18:52 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> The chunk cannot be a dumped one here.  The only non-obvious cases
> are free and realloc which may be called on a dumped area chunk,
> but in both cases it can be verified that tagging is already
> avoided for dumped area chunks.

LGTM after feedback :-)

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used
  2021-03-04 16:31 ` [PATCH 06/16] malloc: Ensure the generic mtag hooks are not used Szabolcs Nagy
  2021-03-05  1:05   ` DJ Delorie
@ 2021-03-05 20:30   ` DJ Delorie
  1 sibling, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05 20:30 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw


Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> Use inline functions instead of macros, because macros can cause unused
> variable warnings and type conversion issues.  We assume these functions
> may appear in the code but only in dead code paths (hidden by a runtime
> check), so it's important that they can compile with correct types, but
> if they are actually used that should be an error.
>
> Currently the hooks are only used when USE_MTAG is true which only
> happens on aarch64 and then the aarch64 specific code is used not this
> generic header.  However followup refactoring will allow the hooks to
> be used with !USE_MTAG.
>
> Note: the const qualifier in the comment was wrong: changing tags is a
> write operation.

This is outside my usual area, but I'll give it my +1 given the
follow-up.

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468]
  2021-03-04 16:30 ` [PATCH 01/16] malloc: Fix a realloc crash with heap tagging [BZ 27468] Szabolcs Nagy
  2021-03-05  0:15   ` DJ Delorie
@ 2021-03-05 20:51   ` DJ Delorie
  1 sibling, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05 20:51 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw


LGTM given our conversation about the unfortunate name of
"CHUNK_AVAILABLE_SIZE" (to be dealt with later).

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 11/16] malloc: Use branches instead of mtag_granule_mask
  2021-03-04 16:33 ` [PATCH 11/16] malloc: Use branches instead of mtag_granule_mask Szabolcs Nagy
@ 2021-03-05 21:00   ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-03-05 21:00 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw


LGTM given our conversation about the unfortunate name for
"CHUNK_AVAILABLE_SIZE" to be dealt with later.

Reviewed-by: DJ Delorie <dj@redhat.com>


* Re: [PATCH 02/16] Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h
  2021-03-04 16:30 ` [PATCH 02/16] Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h Szabolcs Nagy
@ 2021-03-26 11:29   ` Szabolcs Nagy
  2021-04-13  8:37     ` Szabolcs Nagy
  0 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-03-26 11:29 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

The 03/04/2021 16:30, Szabolcs Nagy via Libc-alpha wrote:
> The value of PR_TAGGED_ADDR_ENABLE was incorrect in the installed
> headers and the prctl command macros were missing that are needed
> for it to be useful (PR_SET_TAGGED_ADDR_CTRL).  Linux headers have
> the definitions since 5.4 so it's widely available, we don't need
> to repeat these definitions.  The remaining definitions are from
> Linux 5.10.
> 
> To build glibc with --enable-memory-tagging, Linux 5.4 headers and
> binutils 2.33.1 or newer are needed.

ping.

i have now committed all the other memtag patches
(thanks DJ for the reviews); only this one still
needs a review (i would like to backport it too).


> ---
>  sysdeps/unix/sysv/linux/sys/prctl.h | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/sys/prctl.h b/sysdeps/unix/sysv/linux/sys/prctl.h
> index 00817ff0f1..c9048c7cdb 100644
> --- a/sysdeps/unix/sysv/linux/sys/prctl.h
> +++ b/sysdeps/unix/sysv/linux/sys/prctl.h
> @@ -25,10 +25,6 @@
>     we're picking up...  */
>  
>  /* Memory tagging control operations (for AArch64).  */
> -#ifndef PR_TAGGED_ADDR_ENABLE
> -# define PR_TAGGED_ADDR_ENABLE	(1UL << 8)
> -#endif
> -
>  #ifndef PR_MTE_TCF_SHIFT
>  # define PR_MTE_TCF_SHIFT	1
>  # define PR_MTE_TCF_NONE	(0UL << PR_MTE_TCF_SHIFT)
> -- 
> 2.17.1
> 

-- 


* Re: [PATCH 02/16] Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h
  2021-03-26 11:29   ` Szabolcs Nagy
@ 2021-04-13  8:37     ` Szabolcs Nagy
  2021-04-13 21:32       ` DJ Delorie
  0 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:37 UTC (permalink / raw)
  To: libc-alpha, Richard.Earnshaw, DJ Delorie

The 03/26/2021 11:29, Szabolcs Nagy via Libc-alpha wrote:
> The 03/04/2021 16:30, Szabolcs Nagy via Libc-alpha wrote:
> > The value of PR_TAGGED_ADDR_ENABLE was incorrect in the installed
> > headers and the prctl command macros were missing that are needed
> > for it to be useful (PR_SET_TAGGED_ADDR_CTRL).  Linux headers have
> > the definitions since 5.4 so it's widely available, we don't need
> > to repeat these definitions.  The remaining definitions are from
> > Linux 5.10.
> > 
> > To build glibc with --enable-memory-tagging, Linux 5.4 headers and
> > binutils 2.33.1 or newer are needed.
> 
> ping.
> 
> i have now committed all the other memtag patches
> (thanks DJ for the reviews); only this one still
> needs a review (i would like to backport it too).


ping.

DJ: can you please do the review of this header change?
(others seem less interested in looking at it)

> > ---
> >  sysdeps/unix/sysv/linux/sys/prctl.h | 4 ----
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/sysdeps/unix/sysv/linux/sys/prctl.h b/sysdeps/unix/sysv/linux/sys/prctl.h
> > index 00817ff0f1..c9048c7cdb 100644
> > --- a/sysdeps/unix/sysv/linux/sys/prctl.h
> > +++ b/sysdeps/unix/sysv/linux/sys/prctl.h
> > @@ -25,10 +25,6 @@
> >     we're picking up...  */
> >  
> >  /* Memory tagging control operations (for AArch64).  */
> > -#ifndef PR_TAGGED_ADDR_ENABLE
> > -# define PR_TAGGED_ADDR_ENABLE	(1UL << 8)
> > -#endif
> > -
> >  #ifndef PR_MTE_TCF_SHIFT
> >  # define PR_MTE_TCF_SHIFT	1
> >  # define PR_MTE_TCF_NONE	(0UL << PR_MTE_TCF_SHIFT)
> > -- 
> > 2.17.1
> > 


* Re: [PATCH 02/16] Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h
  2021-04-13  8:37     ` Szabolcs Nagy
@ 2021-04-13 21:32       ` DJ Delorie
  0 siblings, 0 replies; 39+ messages in thread
From: DJ Delorie @ 2021-04-13 21:32 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Richard.Earnshaw

Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
> DJ: can you please do the review of this header change?
> (others seem less interested in looking at it)

>> > -#ifndef PR_TAGGED_ADDR_ENABLE
>> > -# define PR_TAGGED_ADDR_ENABLE	(1UL << 8)
>> > -#endif
>> > -

I confirmed that 5.4 and master define this as 1UL << 0 so yeah, this
needs to go.

LGTM.

Reviewed-by: DJ Delorie <dj@redhat.com>



2021-03-04 16:34 ` [PATCH 13/16] aarch64: inline __libc_mtag_address_get_tag Szabolcs Nagy
2021-03-04 16:34 ` [PATCH 14/16] aarch64: inline __libc_mtag_new_tag Szabolcs Nagy
2021-03-04 16:34 ` [PATCH 15/16] aarch64: Optimize __libc_mtag_tag_region Szabolcs Nagy
2021-03-04 16:34 ` [PATCH 16/16] aarch64: Optimize __libc_mtag_tag_zero_region Szabolcs Nagy
