public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH] Remove catomics
Date: Wed, 22 Jun 2022 10:00:26 -0300	[thread overview]
Message-ID: <A8970DF4-4066-40E0-96D7-D3AF40227B12@linaro.org> (raw)
In-Reply-To: <AM5PR0801MB16684B64A3E0360741C250A883AF9@AM5PR0801MB1668.eurprd08.prod.outlook.com>



> On 17 Jun 2022, at 08:56, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> 
> Hi Adhemerval,
> 
>> LGTM, thanks. Some comments below, but I can send a cleanup patch if you prefer.
>> 
>> #ifndef COMPARE_AND_SWAP
>> 
>> I think we can also remove this macro; only ia64 redefines it to the old compiler builtin
>> (__sync_bool_compare_and_swap).
> 
> I removed it from ia64 and the #ifdef/#endif elsewhere. However, the macro seems to
> have its arguments in the correct order, so it is best not to substitute it for now.

Fair enough.
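
For anyone following along: the argument-order point is that COMPARE_AND_SWAP
keeps the conventional (ptr, old, new) order, while the underlying bool CAS
takes (mem, newval, oldval) and returns 0 on success.  A minimal standalone
sketch using the GCC __atomic builtins (illustrative only, not the actual
glibc macro expansions):

  #include <stdio.h>

  /* Nonzero when *MEM did not contain OLDVAL, mirroring the glibc
     convention that the "bool" CAS returns 0 on success.  */
  #define cas_bool_acq(mem, newval, oldval)                                   \
    ({ __typeof (*(mem)) __expected = (oldval);                               \
       !__atomic_compare_exchange_n ((mem), &__expected, (newval), 0,         \
                                     __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); })

  /* The dl-fptr.c-style wrapper restores the (ptr, old, new) order and
     yields 1 on a successful swap.  */
  #define COMPARE_AND_SWAP(ptr, old, new) \
    (cas_bool_acq (ptr, new, old) == 0)

  int
  main (void)
  {
    unsigned long slot = 0;
    if (COMPARE_AND_SWAP (&slot, 0UL, 42UL))
      printf ("swapped: slot = %lu\n", slot);
    return 0;
  }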

> 
> Also, I noticed some dead uses of multiple_threads, so I have removed those
> (in principle your single-thread patch series no longer needs to make changes
> to sysdeps/x86/atomic-machine.h).  Re-running build-many-glibcs for a few arches passed.

Since this patch removes an x86 optimization (sorry, I only realized it after my review),
I think it would be better to circle back and first get my single-thread refactor
patches in (which fix SINGLE_THREAD_P for syscalls on aarch64 and other architectures),
since they do not change x86.

Afterwards we can remove the now unused catomic operations and make the single-thread
locking optimization generic (so we can finally remove the x86 arch-specific bits).

> 
> Cheers,
> Wilco
> 
> v2: Remove COMPARE_AND_SWAP ifdefs and dead uses of multiple_threads.
> 
> The catomics are not supported on most targets and are only used in a few places which are not
> performance critical, so replace all uses with more standard atomics.
> Replace uses of catomic_add, catomic_increment, catomic_decrement and catomic_exchange_and_add
> with atomic_fetch_add_relaxed, which maps to a standard compiler builtin.  Relaxed memory
> ordering is correct for simple counters since they only need atomicity.
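
As a side note on the mapping: with the C11-like atomics, atomic_fetch_add_relaxed
boils down to the compiler's __atomic_fetch_add with __ATOMIC_RELAXED.  A
self-contained sketch of what the converted counter updates amount to (the
names below are illustrative, not the glibc internals):

  /* Bump a statistics counter with relaxed ordering: only atomicity of the
     read-modify-write matters, no ordering against other memory accesses.  */
  static unsigned long narcs;

  static inline unsigned long
  fetch_add_relaxed (unsigned long *mem, unsigned long value)
  {
    /* Returns the old value, as catomic_exchange_and_add did.  */
    return __atomic_fetch_add (mem, value, __ATOMIC_RELAXED);
  }

  static void
  record_arc (void)
  {
    fetch_add_relaxed (&narcs, 1);
  }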
> 
> ---
> diff --git a/elf/dl-fptr.c b/elf/dl-fptr.c
> index 6645a260b809ecd521796e0d1adee56b3e0bd993..ffc83ef78bc16f36773a51da7442646f47e85b71 100644
> --- a/elf/dl-fptr.c
> +++ b/elf/dl-fptr.c
> @@ -38,10 +38,8 @@
> # error "ELF_MACHINE_LOAD_ADDRESS is not defined."
> #endif
> 
> -#ifndef COMPARE_AND_SWAP
> -# define COMPARE_AND_SWAP(ptr, old, new) \
> -  (catomic_compare_and_exchange_bool_acq (ptr, new, old) == 0)
> -#endif
> +#define COMPARE_AND_SWAP(ptr, old, new) \
> +  (atomic_compare_and_exchange_bool_acq (ptr, new, old) == 0)
> 
> ElfW(Addr) _dl_boot_fptr_table [ELF_MACHINE_BOOT_FPTR_TABLE_LEN];
> 
> diff --git a/elf/dl-profile.c b/elf/dl-profile.c
> index ec57e3a96552ae6460c22a0fcc819b85d486c0da..0af1f577d2d695d08edce9e13d9b39f77911b1d5 100644
> --- a/elf/dl-profile.c
> +++ b/elf/dl-profile.c
> @@ -548,24 +548,24 @@ _dl_mcount (ElfW(Addr) frompc, ElfW(Addr) selfpc)
> 	      size_t newfromidx;
> 	      to_index = (data[narcs].self_pc
> 			  / (HASHFRACTION * sizeof (*tos)));
> -	      newfromidx = catomic_exchange_and_add (&fromidx, 1) + 1;
> +	      newfromidx = atomic_fetch_add_relaxed (&fromidx, 1) + 1;
> 	      froms[newfromidx].here = &data[narcs];
> 	      froms[newfromidx].link = tos[to_index];
> 	      tos[to_index] = newfromidx;
> -	      catomic_increment (&narcs);
> +	      atomic_fetch_add_relaxed (&narcs, 1);
> 	    }
> 
> 	  /* If we still have no entry stop searching and insert.  */
> 	  if (*topcindex == 0)
> 	    {
> -	      unsigned int newarc = catomic_exchange_and_add (narcsp, 1);
> +	      unsigned int newarc = atomic_fetch_add_relaxed (narcsp, 1);
> 
> 	      /* In rare cases it could happen that all entries in FROMS are
> 		 occupied.  So we cannot count this anymore.  */
> 	      if (newarc >= fromlimit)
> 		goto done;
> 
> -	      *topcindex = catomic_exchange_and_add (&fromidx, 1) + 1;
> +	      *topcindex = atomic_fetch_add_relaxed (&fromidx, 1) + 1;
> 	      fromp = &froms[*topcindex];
> 
> 	      fromp->here = &data[newarc];
> @@ -573,7 +573,7 @@ _dl_mcount (ElfW(Addr) frompc, ElfW(Addr) selfpc)
> 	      data[newarc].self_pc = selfpc;
> 	      data[newarc].count = 0;
> 	      fromp->link = 0;
> -	      catomic_increment (&narcs);
> +	      atomic_fetch_add_relaxed (&narcs, 1);
> 
> 	      break;
> 	    }
> @@ -586,7 +586,7 @@ _dl_mcount (ElfW(Addr) frompc, ElfW(Addr) selfpc)
>     }
> 
>   /* Increment the counter.  */
> -  catomic_increment (&fromp->here->count);
> +  atomic_fetch_add_relaxed (&fromp->here->count, 1);
> 
>  done:
>   ;
> diff --git a/include/atomic.h b/include/atomic.h
> index 2cb52c9cfd894308b97b97a04dd574b2287bf1b2..264db9a0b7619ff6520f84a19c53c1eb9a3b42a3 100644
> --- a/include/atomic.h
> +++ b/include/atomic.h
> @@ -24,13 +24,6 @@
>    - atomic arithmetic and logic operation on memory.  They all
>      have the prefix "atomic_".
> 
> -   - conditionally atomic operations of the same kinds.  These
> -     always behave identical but can be faster when atomicity
> -     is not really needed since only one thread has access to
> -     the memory location.  In that case the code is slower in
> -     the multi-thread case.  The interfaces have the prefix
> -     "catomic_".
> -
>    - support functions like barriers.  They also have the prefix
>      "atomic_".
> 
> @@ -93,29 +86,6 @@
> #endif
> 
> 
> -#ifndef catomic_compare_and_exchange_val_acq
> -# ifdef __arch_c_compare_and_exchange_val_32_acq
> -#  define catomic_compare_and_exchange_val_acq(mem, newval, oldval) \
> -  __atomic_val_bysize (__arch_c_compare_and_exchange_val,acq,		      \
> -		       mem, newval, oldval)
> -# else
> -#  define catomic_compare_and_exchange_val_acq(mem, newval, oldval) \
> -  atomic_compare_and_exchange_val_acq (mem, newval, oldval)
> -# endif
> -#endif
> -
> -
> -#ifndef catomic_compare_and_exchange_val_rel
> -# ifndef atomic_compare_and_exchange_val_rel
> -#  define catomic_compare_and_exchange_val_rel(mem, newval, oldval)	      \
> -  catomic_compare_and_exchange_val_acq (mem, newval, oldval)
> -# else
> -#  define catomic_compare_and_exchange_val_rel(mem, newval, oldval)	      \
> -  atomic_compare_and_exchange_val_rel (mem, newval, oldval)
> -# endif
> -#endif
> -
> -
> #ifndef atomic_compare_and_exchange_val_rel
> # define atomic_compare_and_exchange_val_rel(mem, newval, oldval)	      \
>   atomic_compare_and_exchange_val_acq (mem, newval, oldval)
> @@ -141,23 +111,6 @@
> #endif
> 
> 
> -#ifndef catomic_compare_and_exchange_bool_acq
> -# ifdef __arch_c_compare_and_exchange_bool_32_acq
> -#  define catomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
> -  __atomic_bool_bysize (__arch_c_compare_and_exchange_bool,acq,		      \
> -		        mem, newval, oldval)
> -# else
> -#  define catomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
> -  ({ /* Cannot use __oldval here, because macros later in this file might     \
> -	call this macro with __oldval argument.	 */			      \
> -     __typeof (oldval) __atg4_old = (oldval);				      \
> -     catomic_compare_and_exchange_val_acq (mem, newval, __atg4_old)	      \
> -       != __atg4_old;							      \
> -  })
> -# endif
> -#endif
> -
> -
> /* Store NEWVALUE in *MEM and return the old value.  */
> #ifndef atomic_exchange_acq
> # define atomic_exchange_acq(mem, newvalue) \
> @@ -212,24 +165,6 @@
>   atomic_exchange_and_add_acq(mem, value)
> #endif
> 
> -#ifndef catomic_exchange_and_add
> -# define catomic_exchange_and_add(mem, value) \
> -  ({ __typeof (*(mem)) __atg7_oldv;					      \
> -     __typeof (mem) __atg7_memp = (mem);				      \
> -     __typeof (*(mem)) __atg7_value = (value);				      \
> -									      \
> -     do									      \
> -       __atg7_oldv = *__atg7_memp;					      \
> -     while (__builtin_expect						      \
> -	    (catomic_compare_and_exchange_bool_acq (__atg7_memp,	      \
> -						    __atg7_oldv		      \
> -						    + __atg7_value,	      \
> -						    __atg7_oldv), 0));	      \
> -									      \
> -     __atg7_oldv; })
> -#endif
> -
> -
> #ifndef atomic_max
> # define atomic_max(mem, value) \
>   do {									      \
> @@ -246,25 +181,6 @@
>   } while (0)
> #endif
> 
> -
> -#ifndef catomic_max
> -# define catomic_max(mem, value) \
> -  do {									      \
> -    __typeof (*(mem)) __atg9_oldv;					      \
> -    __typeof (mem) __atg9_memp = (mem);					      \
> -    __typeof (*(mem)) __atg9_value = (value);				      \
> -    do {								      \
> -      __atg9_oldv = *__atg9_memp;					      \
> -      if (__atg9_oldv >= __atg9_value)					      \
> -	break;								      \
> -    } while (__builtin_expect						      \
> -	     (catomic_compare_and_exchange_bool_acq (__atg9_memp,	      \
> -						     __atg9_value,	      \
> -						     __atg9_oldv), 0));	      \
> -  } while (0)
> -#endif
> -
> -
> #ifndef atomic_min
> # define atomic_min(mem, value) \
>   do {									      \
> @@ -288,32 +204,16 @@
> #endif
> 
> 
> -#ifndef catomic_add
> -# define catomic_add(mem, value) \
> -  (void) catomic_exchange_and_add ((mem), (value))
> -#endif
> -
> -
> #ifndef atomic_increment
> # define atomic_increment(mem) atomic_add ((mem), 1)
> #endif
> 
> 
> -#ifndef catomic_increment
> -# define catomic_increment(mem) catomic_add ((mem), 1)
> -#endif
> -
> -
> #ifndef atomic_increment_val
> # define atomic_increment_val(mem) (atomic_exchange_and_add ((mem), 1) + 1)
> #endif
> 
> 
> -#ifndef catomic_increment_val
> -# define catomic_increment_val(mem) (catomic_exchange_and_add ((mem), 1) + 1)
> -#endif
> -
> -
> /* Add one to *MEM and return true iff it's now zero.  */
> #ifndef atomic_increment_and_test
> # define atomic_increment_and_test(mem) \
> @@ -326,21 +226,11 @@
> #endif
> 
> 
> -#ifndef catomic_decrement
> -# define catomic_decrement(mem) catomic_add ((mem), -1)
> -#endif
> -
> -
> #ifndef atomic_decrement_val
> # define atomic_decrement_val(mem) (atomic_exchange_and_add ((mem), -1) - 1)
> #endif
> 
> 
> -#ifndef catomic_decrement_val
> -# define catomic_decrement_val(mem) (catomic_exchange_and_add ((mem), -1) - 1)
> -#endif
> -
> -
> /* Subtract 1 from *MEM and return true iff it's now zero.  */
> #ifndef atomic_decrement_and_test
> # define atomic_decrement_and_test(mem) \
> @@ -421,22 +311,6 @@
>   } while (0)
> #endif
> 
> -#ifndef catomic_and
> -# define catomic_and(mem, mask) \
> -  do {									      \
> -    __typeof (*(mem)) __atg20_old;					      \
> -    __typeof (mem) __atg20_memp = (mem);				      \
> -    __typeof (*(mem)) __atg20_mask = (mask);				      \
> -									      \
> -    do									      \
> -      __atg20_old = (*__atg20_memp);					      \
> -    while (__builtin_expect						      \
> -	   (catomic_compare_and_exchange_bool_acq (__atg20_memp,	      \
> -						   __atg20_old & __atg20_mask,\
> -						   __atg20_old), 0));	      \
> -  } while (0)
> -#endif
> -
> /* Atomically *mem &= mask and return the old value of *mem.  */
> #ifndef atomic_and_val
> # define atomic_and_val(mem, mask) \
> @@ -471,22 +345,6 @@
>   } while (0)
> #endif
> 
> -#ifndef catomic_or
> -# define catomic_or(mem, mask) \
> -  do {									      \
> -    __typeof (*(mem)) __atg18_old;					      \
> -    __typeof (mem) __atg18_memp = (mem);				      \
> -    __typeof (*(mem)) __atg18_mask = (mask);				      \
> -									      \
> -    do									      \
> -      __atg18_old = (*__atg18_memp);					      \
> -    while (__builtin_expect						      \
> -	   (catomic_compare_and_exchange_bool_acq (__atg18_memp,	      \
> -						   __atg18_old | __atg18_mask,\
> -						   __atg18_old), 0));	      \
> -  } while (0)
> -#endif
> -
> /* Atomically *mem |= mask and return the old value of *mem.  */
> #ifndef atomic_or_val
> # define atomic_or_val(mem, mask) \
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 0a684a720d9648953073bd7d35faca100762c031..7c74a18381a4be5fe6bcb94b38a62dbfa6b674f4 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -958,11 +958,11 @@ arena_get2 (size_t size, mstate avoid_arena)
>          enough address space to create that many arenas.  */
>       if (__glibc_unlikely (n <= narenas_limit - 1))
>         {
> -          if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
> +          if (atomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
>             goto repeat;
>           a = _int_new_arena (size);
> 	  if (__glibc_unlikely (a == NULL))
> -            catomic_decrement (&narenas);
> +            atomic_fetch_add_relaxed (&narenas, -1);
>         }
>       else
>         a = reused_arena (avoid_arena);
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 09e5ff2bce5d79b3754687db3aef314640d336eb..a335cd23e18ee09027a8e7a71d808bfa1343e06d 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -2486,11 +2486,11 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
>     }
> 
>   /* update statistics */
> -  int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
> +  int new = atomic_fetch_add_relaxed (&mp_.n_mmaps, 1) + 1;
>   atomic_max (&mp_.max_n_mmaps, new);
> 
>   unsigned long sum;
> -  sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
> +  sum = atomic_fetch_add_relaxed (&mp_.mmapped_mem, size) + size;
>   atomic_max (&mp_.max_mmapped_mem, sum);
> 
>   check_chunk (av, p);
> @@ -3059,8 +3059,8 @@ munmap_chunk (mchunkptr p)
>       || __glibc_unlikely (!powerof2 (mem & (pagesize - 1))))
>     malloc_printerr ("munmap_chunk(): invalid pointer");
> 
> -  atomic_decrement (&mp_.n_mmaps);
> -  atomic_add (&mp_.mmapped_mem, -total_size);
> +  atomic_fetch_add_relaxed (&mp_.n_mmaps, -1);
> +  atomic_fetch_add_relaxed (&mp_.mmapped_mem, -total_size);
> 
>   /* If munmap failed the process virtual memory address space is in a
>      bad shape.  Just leave the block hanging around, the process will
> @@ -3110,7 +3110,7 @@ mremap_chunk (mchunkptr p, size_t new_size)
>   set_head (p, (new_size - offset) | IS_MMAPPED);
> 
>   INTERNAL_SIZE_T new;
> -  new = atomic_exchange_and_add (&mp_.mmapped_mem, new_size - size - offset)
> +  new = atomic_fetch_add_relaxed (&mp_.mmapped_mem, new_size - size - offset)
>         + new_size - size - offset;
>   atomic_max (&mp_.max_mmapped_mem, new);
>   return p;
> @@ -3832,7 +3832,7 @@ _int_malloc (mstate av, size_t bytes)
>       if (__glibc_unlikely (pp != NULL && misaligned_chunk (pp)))       \
> 	malloc_printerr ("malloc(): unaligned fastbin chunk detected"); \
>     }							\
> -  while ((pp = catomic_compare_and_exchange_val_acq (fb, pp, victim)) \
> +  while ((pp = atomic_compare_and_exchange_val_acq (fb, pp, victim)) \
> 	 != victim);					\
> 
>   if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
> @@ -4550,7 +4550,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
> 	  old2 = old;
> 	  p->fd = PROTECT_PTR (&p->fd, old);
> 	}
> -      while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
> +      while ((old = atomic_compare_and_exchange_val_rel (fb, p, old2))
> 	     != old2);
> 
>     /* Check that size of fastbin chunk at the top is the same as
> diff --git a/malloc/memusage.c b/malloc/memusage.c
> index f30906dffb2731c104ea375af48f59c65bcc7c9c..74712834fa8b96fb2d9589d34b34ab07d05a84ca 100644
> --- a/malloc/memusage.c
> +++ b/malloc/memusage.c
> @@ -148,8 +148,8 @@ update_data (struct header *result, size_t len, size_t old_len)
> 
>   /* Compute current heap usage and compare it with the maximum value.  */
>   size_t heap
> -    = catomic_exchange_and_add (&current_heap, len - old_len) + len - old_len;
> -  catomic_max (&peak_heap, heap);
> +    = atomic_fetch_add_relaxed (&current_heap, len - old_len) + len - old_len;
> +  atomic_max (&peak_heap, heap);
> 
>   /* Compute current stack usage and compare it with the maximum
>      value.  The base stack pointer might not be set if this is not
> @@ -172,15 +172,15 @@ update_data (struct header *result, size_t len, size_t old_len)
>     start_sp = sp;
>   size_t current_stack = start_sp - sp;
> #endif
> -  catomic_max (&peak_stack, current_stack);
> +  atomic_max (&peak_stack, current_stack);
> 
>   /* Add up heap and stack usage and compare it with the maximum value.  */
> -  catomic_max (&peak_total, heap + current_stack);
> +  atomic_max (&peak_total, heap + current_stack);
> 
>   /* Store the value only if we are writing to a file.  */
>   if (fd != -1)
>     {
> -      uint32_t idx = catomic_exchange_and_add (&buffer_cnt, 1);
> +      uint32_t idx = atomic_fetch_add_relaxed (&buffer_cnt, 1);
>       if (idx + 1 >= 2 * buffer_size)
>         {
>           /* We try to reset the counter to the correct range.  If
> @@ -188,7 +188,7 @@ update_data (struct header *result, size_t len, size_t old_len)
>              counter it does not matter since that thread will take
>              care of the correction.  */
>           uint32_t reset = (idx + 1) % (2 * buffer_size);
> -          catomic_compare_and_exchange_val_acq (&buffer_cnt, reset, idx + 1);
> +          atomic_compare_and_exchange_val_acq (&buffer_cnt, reset, idx + 1);
>           if (idx >= 2 * buffer_size)
>             idx = reset - 1;
>         }
> @@ -362,24 +362,24 @@ malloc (size_t len)
>     return (*mallocp)(len);
> 
>   /* Keep track of number of calls.  */
> -  catomic_increment (&calls[idx_malloc]);
> +  atomic_fetch_add_relaxed (&calls[idx_malloc], 1);
>   /* Keep track of total memory consumption for `malloc'.  */
> -  catomic_add (&total[idx_malloc], len);
> +  atomic_fetch_add_relaxed (&total[idx_malloc], len);
>   /* Keep track of total memory requirement.  */
> -  catomic_add (&grand_total, len);
> +  atomic_fetch_add_relaxed (&grand_total, len);
>   /* Remember the size of the request.  */
>   if (len < 65536)
> -    catomic_increment (&histogram[len / 16]);
> +    atomic_fetch_add_relaxed (&histogram[len / 16], 1);
>   else
> -    catomic_increment (&large);
> +    atomic_fetch_add_relaxed (&large, 1);
>   /* Total number of calls of any of the functions.  */
> -  catomic_increment (&calls_total);
> +  atomic_fetch_add_relaxed (&calls_total, 1);
> 
>   /* Do the real work.  */
>   result = (struct header *) (*mallocp)(len + sizeof (struct header));
>   if (result == NULL)
>     {
> -      catomic_increment (&failed[idx_malloc]);
> +      atomic_fetch_add_relaxed (&failed[idx_malloc], 1);
>       return NULL;
>     }
> 
> @@ -430,21 +430,21 @@ realloc (void *old, size_t len)
>     }
> 
>   /* Keep track of number of calls.  */
> -  catomic_increment (&calls[idx_realloc]);
> +  atomic_fetch_add_relaxed (&calls[idx_realloc], 1);
>   if (len > old_len)
>     {
>       /* Keep track of total memory consumption for `realloc'.  */
> -      catomic_add (&total[idx_realloc], len - old_len);
> +      atomic_fetch_add_relaxed (&total[idx_realloc], len - old_len);
>       /* Keep track of total memory requirement.  */
> -      catomic_add (&grand_total, len - old_len);
> +      atomic_fetch_add_relaxed (&grand_total, len - old_len);
>     }
> 
>   if (len == 0 && old != NULL)
>     {
>       /* Special case.  */
> -      catomic_increment (&realloc_free);
> +      atomic_fetch_add_relaxed (&realloc_free, 1);
>       /* Keep track of total memory freed using `free'.  */
> -      catomic_add (&total[idx_free], real->length);
> +      atomic_fetch_add_relaxed (&total[idx_free], real->length);
> 
>       /* Update the allocation data and write out the records if necessary.  */
>       update_data (NULL, 0, old_len);
> @@ -457,26 +457,26 @@ realloc (void *old, size_t len)
> 
>   /* Remember the size of the request.  */
>   if (len < 65536)
> -    catomic_increment (&histogram[len / 16]);
> +    atomic_fetch_add_relaxed (&histogram[len / 16], 1);
>   else
> -    catomic_increment (&large);
> +    atomic_fetch_add_relaxed (&large, 1);
>   /* Total number of calls of any of the functions.  */
> -  catomic_increment (&calls_total);
> +  atomic_fetch_add_relaxed (&calls_total, 1);
> 
>   /* Do the real work.  */
>   result = (struct header *) (*reallocp)(real, len + sizeof (struct header));
>   if (result == NULL)
>     {
> -      catomic_increment (&failed[idx_realloc]);
> +      atomic_fetch_add_relaxed (&failed[idx_realloc], 1);
>       return NULL;
>     }
> 
>   /* Record whether the reduction/increase happened in place.  */
>   if (real == result)
> -    catomic_increment (&inplace);
> +    atomic_fetch_add_relaxed (&inplace, 1);
>   /* Was the buffer increased?  */
>   if (old_len > len)
> -    catomic_increment (&decreasing);
> +    atomic_fetch_add_relaxed (&decreasing, 1);
> 
>   /* Update the allocation data and write out the records if necessary.  */
>   update_data (result, len, old_len);
> @@ -508,16 +508,16 @@ calloc (size_t n, size_t len)
>     return (*callocp)(n, len);
> 
>   /* Keep track of number of calls.  */
> -  catomic_increment (&calls[idx_calloc]);
> +  atomic_fetch_add_relaxed (&calls[idx_calloc], 1);
>   /* Keep track of total memory consumption for `calloc'.  */
> -  catomic_add (&total[idx_calloc], size);
> +  atomic_fetch_add_relaxed (&total[idx_calloc], size);
>   /* Keep track of total memory requirement.  */
> -  catomic_add (&grand_total, size);
> +  atomic_fetch_add_relaxed (&grand_total, size);
>   /* Remember the size of the request.  */
>   if (size < 65536)
> -    catomic_increment (&histogram[size / 16]);
> +    atomic_fetch_add_relaxed (&histogram[size / 16], 1);
>   else
> -    catomic_increment (&large);
> +    atomic_fetch_add_relaxed (&large, 1);
>   /* Total number of calls of any of the functions.  */
>   ++calls_total;
> 
> @@ -525,7 +525,7 @@ calloc (size_t n, size_t len)
>   result = (struct header *) (*mallocp)(size + sizeof (struct header));
>   if (result == NULL)
>     {
> -      catomic_increment (&failed[idx_calloc]);
> +      atomic_fetch_add_relaxed (&failed[idx_calloc], 1);
>       return NULL;
>     }
> 
> @@ -563,7 +563,7 @@ free (void *ptr)
>   /* `free (NULL)' has no effect.  */
>   if (ptr == NULL)
>     {
> -      catomic_increment (&calls[idx_free]);
> +      atomic_fetch_add_relaxed (&calls[idx_free], 1);
>       return;
>     }
> 
> @@ -577,9 +577,9 @@ free (void *ptr)
>     }
> 
>   /* Keep track of number of calls.  */
> -  catomic_increment (&calls[idx_free]);
> +  atomic_fetch_add_relaxed (&calls[idx_free], 1);
>   /* Keep track of total memory freed using `free'.  */
> -  catomic_add (&total[idx_free], real->length);
> +  atomic_fetch_add_relaxed (&total[idx_free], real->length);
> 
>   /* Update the allocation data and write out the records if necessary.  */
>   update_data (NULL, 0, real->length);
> @@ -614,22 +614,22 @@ mmap (void *start, size_t len, int prot, int flags, int fd, off_t offset)
>                  ? idx_mmap_a : prot & PROT_WRITE ? idx_mmap_w : idx_mmap_r);
> 
>       /* Keep track of number of calls.  */
> -      catomic_increment (&calls[idx]);
> +      atomic_fetch_add_relaxed (&calls[idx], 1);
>       /* Keep track of total memory consumption for `malloc'.  */
> -      catomic_add (&total[idx], len);
> +      atomic_fetch_add_relaxed (&total[idx], len);
>       /* Keep track of total memory requirement.  */
> -      catomic_add (&grand_total, len);
> +      atomic_fetch_add_relaxed (&grand_total, len);
>       /* Remember the size of the request.  */
>       if (len < 65536)
> -        catomic_increment (&histogram[len / 16]);
> +        atomic_fetch_add_relaxed (&histogram[len / 16], 1);
>       else
> -        catomic_increment (&large);
> +        atomic_fetch_add_relaxed (&large, 1);
>       /* Total number of calls of any of the functions.  */
> -      catomic_increment (&calls_total);
> +      atomic_fetch_add_relaxed (&calls_total, 1);
> 
>       /* Check for failures.  */
>       if (result == NULL)
> -        catomic_increment (&failed[idx]);
> +        atomic_fetch_add_relaxed (&failed[idx], 1);
>       else if (idx == idx_mmap_w)
>         /* Update the allocation data and write out the records if
>            necessary.  Note the first parameter is NULL which means
> @@ -667,22 +667,22 @@ mmap64 (void *start, size_t len, int prot, int flags, int fd, off64_t offset)
>                  ? idx_mmap_a : prot & PROT_WRITE ? idx_mmap_w : idx_mmap_r);
> 
>       /* Keep track of number of calls.  */
> -      catomic_increment (&calls[idx]);
> +      atomic_fetch_add_relaxed (&calls[idx], 1);
>       /* Keep track of total memory consumption for `malloc'.  */
> -      catomic_add (&total[idx], len);
> +      atomic_fetch_add_relaxed (&total[idx], len);
>       /* Keep track of total memory requirement.  */
> -      catomic_add (&grand_total, len);
> +      atomic_fetch_add_relaxed (&grand_total, len);
>       /* Remember the size of the request.  */
>       if (len < 65536)
> -        catomic_increment (&histogram[len / 16]);
> +        atomic_fetch_add_relaxed (&histogram[len / 16], 1);
>       else
> -        catomic_increment (&large);
> +        atomic_fetch_add_relaxed (&large, 1);
>       /* Total number of calls of any of the functions.  */
> -      catomic_increment (&calls_total);
> +      atomic_fetch_add_relaxed (&calls_total, 1);
> 
>       /* Check for failures.  */
>       if (result == NULL)
> -        catomic_increment (&failed[idx]);
> +        atomic_fetch_add_relaxed (&failed[idx], 1);
>       else if (idx == idx_mmap_w)
>         /* Update the allocation data and write out the records if
>            necessary.  Note the first parameter is NULL which means
> @@ -722,33 +722,33 @@ mremap (void *start, size_t old_len, size_t len, int flags, ...)
>   if (!not_me && trace_mmap)
>     {
>       /* Keep track of number of calls.  */
> -      catomic_increment (&calls[idx_mremap]);
> +      atomic_fetch_add_relaxed (&calls[idx_mremap], 1);
>       if (len > old_len)
>         {
>           /* Keep track of total memory consumption for `malloc'.  */
> -          catomic_add (&total[idx_mremap], len - old_len);
> +          atomic_fetch_add_relaxed (&total[idx_mremap], len - old_len);
>           /* Keep track of total memory requirement.  */
> -          catomic_add (&grand_total, len - old_len);
> +          atomic_fetch_add_relaxed (&grand_total, len - old_len);
>         }
>       /* Remember the size of the request.  */
>       if (len < 65536)
> -        catomic_increment (&histogram[len / 16]);
> +        atomic_fetch_add_relaxed (&histogram[len / 16], 1);
>       else
> -        catomic_increment (&large);
> +        atomic_fetch_add_relaxed (&large, 1);
>       /* Total number of calls of any of the functions.  */
> -      catomic_increment (&calls_total);
> +      atomic_fetch_add_relaxed (&calls_total, 1);
> 
>       /* Check for failures.  */
>       if (result == NULL)
> -        catomic_increment (&failed[idx_mremap]);
> +        atomic_fetch_add_relaxed (&failed[idx_mremap], 1);
>       else
>         {
>           /* Record whether the reduction/increase happened in place.  */
>           if (start == result)
> -            catomic_increment (&inplace_mremap);
> +            atomic_fetch_add_relaxed (&inplace_mremap, 1);
>           /* Was the buffer increased?  */
>           if (old_len > len)
> -            catomic_increment (&decreasing_mremap);
> +            atomic_fetch_add_relaxed (&decreasing_mremap, 1);
> 
>           /* Update the allocation data and write out the records if
>              necessary.  Note the first parameter is NULL which means
> @@ -783,19 +783,19 @@ munmap (void *start, size_t len)
>   if (!not_me && trace_mmap)
>     {
>       /* Keep track of number of calls.  */
> -      catomic_increment (&calls[idx_munmap]);
> +      atomic_fetch_add_relaxed (&calls[idx_munmap], 1);
> 
>       if (__glibc_likely (result == 0))
>         {
>           /* Keep track of total memory freed using `free'.  */
> -          catomic_add (&total[idx_munmap], len);
> +          atomic_fetch_add_relaxed (&total[idx_munmap], len);
> 
>           /* Update the allocation data and write out the records if
>              necessary.  */
>           update_data (NULL, 0, len);
>         }
>       else
> -        catomic_increment (&failed[idx_munmap]);
> +        atomic_fetch_add_relaxed (&failed[idx_munmap], 1);
>     }
> 
>   return result;
> diff --git a/manual/memory.texi b/manual/memory.texi
> index 23a039c57e60c81787252d935e3b309fd8290902..5cb1dbd281006148f23cfa38c5703fb79089ba78 100644
> --- a/manual/memory.texi
> +++ b/manual/memory.texi
> @@ -354,7 +354,7 @@ this function is in @file{stdlib.h}.
> @c that's protected by list_lock; next_free is only modified while
> @c list_lock is held too.  All other data members of an arena, as well
> @c as the metadata of the memory areas assigned to it, are only modified
> -@c while holding the arena's mutex (fastbin pointers use catomic ops
> +@c while holding the arena's mutex (fastbin pointers use atomic ops
> @c because they may be modified by free without taking the arena's
> @c lock).  Some reassurance was needed for fastbins, for it wasn't clear
> @c how they were initialized.  It turns out they are always
> @@ -383,7 +383,7 @@ this function is in @file{stdlib.h}.
> @c     mutex_lock (arena lock) dup @asulock @aculock [returns locked]
> @c    __get_nprocs ext ok @acsfd
> @c    NARENAS_FROM_NCORES ok
> -@c    catomic_compare_and_exchange_bool_acq ok
> +@c    atomic_compare_and_exchange_bool_acq ok
> @c    _int_new_arena ok @asulock @aculock @acsmem
> @c     new_heap ok @acsmem
> @c      mmap ok @acsmem
> @@ -397,7 +397,7 @@ this function is in @file{stdlib.h}.
> @c     mutex_lock (list_lock) dup @asulock @aculock
> @c     atomic_write_barrier ok
> @c     mutex_unlock (list_lock) @aculock
> -@c    catomic_decrement ok
> +@c    atomic_decrement ok
> @c    reused_arena @asulock @aculock
> @c      reads&writes next_to_use and iterates over arena next without guards
> @c      those are harmless as long as we don't drop arenas from the
> @@ -414,7 +414,7 @@ this function is in @file{stdlib.h}.
> @c   get_max_fast ok
> @c   fastbin_index ok
> @c   fastbin ok
> -@c   catomic_compare_and_exhange_val_acq ok
> +@c   atomic_compare_and_exhange_val_acq ok
> @c   malloc_printerr dup @mtsenv
> @c     if we get to it, we're toast already, undefined behavior must have
> @c     been invoked before
> @@ -521,10 +521,10 @@ this function is in @file{stdlib.h}.
> @c     chunk2mem dup ok
> @c     free_perturb ok
> @c     set_fastchunks ok
> -@c      catomic_and ok
> +@c      atomic_and ok
> @c     fastbin_index dup ok
> @c     fastbin dup ok
> -@c     catomic_compare_and_exchange_val_rel ok
> +@c     atomic_compare_and_exchange_val_rel ok
> @c     chunk_is_mmapped ok
> @c     contiguous dup ok
> @c     prev_inuse ok
> @@ -706,7 +706,7 @@ The prototype for this function is in @file{stdlib.h}.
> @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{} @acsfd{} @acsmem{}}}
> @c __libc_free @asulock @aculock @acsfd @acsmem
> @c   releasing memory into fastbins modifies the arena without taking
> -@c   its mutex, but catomic operations ensure safety.  If two (or more)
> +@c   its mutex, but atomic operations ensure safety.  If two (or more)
> @c   threads are running malloc and have their own arenas locked when
> @c   each gets a signal whose handler free()s large (non-fastbin-able)
> @c   blocks from each other's arena, we deadlock; this is a more general
> diff --git a/misc/tst-atomic.c b/misc/tst-atomic.c
> index 6d681a7bfdf4f48b4c04a073ebd480326dbd3cc8..4f9d2c1a46b363d346dbc2fa0962ae196844a43a 100644
> --- a/misc/tst-atomic.c
> +++ b/misc/tst-atomic.c
> @@ -393,117 +393,6 @@ do_test (void)
>     }
> #endif
> 
> -#ifdef catomic_compare_and_exchange_val_acq
> -  mem = 24;
> -  if (catomic_compare_and_exchange_val_acq (&mem, 35, 24) != 24
> -      || mem != 35)
> -    {
> -      puts ("catomic_compare_and_exchange_val_acq test 1 failed");
> -      ret = 1;
> -    }
> -
> -  mem = 12;
> -  if (catomic_compare_and_exchange_val_acq (&mem, 10, 15) != 12
> -      || mem != 12)
> -    {
> -      puts ("catomic_compare_and_exchange_val_acq test 2 failed");
> -      ret = 1;
> -    }
> -
> -  mem = -15;
> -  if (catomic_compare_and_exchange_val_acq (&mem, -56, -15) != -15
> -      || mem != -56)
> -    {
> -      puts ("catomic_compare_and_exchange_val_acq test 3 failed");
> -      ret = 1;
> -    }
> -
> -  mem = -1;
> -  if (catomic_compare_and_exchange_val_acq (&mem, 17, 0) != -1
> -      || mem != -1)
> -    {
> -      puts ("catomic_compare_and_exchange_val_acq test 4 failed");
> -      ret = 1;
> -    }
> -#endif
> -
> -  mem = 24;
> -  if (catomic_compare_and_exchange_bool_acq (&mem, 35, 24)
> -      || mem != 35)
> -    {
> -      puts ("catomic_compare_and_exchange_bool_acq test 1 failed");
> -      ret = 1;
> -    }
> -
> -  mem = 12;
> -  if (! catomic_compare_and_exchange_bool_acq (&mem, 10, 15)
> -      || mem != 12)
> -    {
> -      puts ("catomic_compare_and_exchange_bool_acq test 2 failed");
> -      ret = 1;
> -    }
> -
> -  mem = -15;
> -  if (catomic_compare_and_exchange_bool_acq (&mem, -56, -15)
> -      || mem != -56)
> -    {
> -      puts ("catomic_compare_and_exchange_bool_acq test 3 failed");
> -      ret = 1;
> -    }
> -
> -  mem = -1;
> -  if (! catomic_compare_and_exchange_bool_acq (&mem, 17, 0)
> -      || mem != -1)
> -    {
> -      puts ("catomic_compare_and_exchange_bool_acq test 4 failed");
> -      ret = 1;
> -    }
> -
> -  mem = 2;
> -  if (catomic_exchange_and_add (&mem, 11) != 2
> -      || mem != 13)
> -    {
> -      puts ("catomic_exchange_and_add test failed");
> -      ret = 1;
> -    }
> -
> -  mem = -21;
> -  catomic_add (&mem, 22);
> -  if (mem != 1)
> -    {
> -      puts ("catomic_add test failed");
> -      ret = 1;
> -    }
> -
> -  mem = -1;
> -  catomic_increment (&mem);
> -  if (mem != 0)
> -    {
> -      puts ("catomic_increment test failed");
> -      ret = 1;
> -    }
> -
> -  mem = 2;
> -  if (catomic_increment_val (&mem) != 3)
> -    {
> -      puts ("catomic_increment_val test failed");
> -      ret = 1;
> -    }
> -
> -  mem = 17;
> -  catomic_decrement (&mem);
> -  if (mem != 16)
> -    {
> -      puts ("catomic_decrement test failed");
> -      ret = 1;
> -    }
> -
> -  if (catomic_decrement_val (&mem) != 15)
> -    {
> -      puts ("catomic_decrement_val test failed");
> -      ret = 1;
> -    }
> -
>   /* Tests for C11-like atomics.  */
>   mem = 11;
>   if (atomic_load_relaxed (&mem) != 11 || atomic_load_acquire (&mem) != 11)
> diff --git a/sysdeps/hppa/dl-fptr.c b/sysdeps/hppa/dl-fptr.c
> index 9ed21602d6155d4b960278f8d1fac4ffa885b9d5..40bf5cd3b306315d8eeb6bdba2b2b46b1ea5059e 100644
> --- a/sysdeps/hppa/dl-fptr.c
> +++ b/sysdeps/hppa/dl-fptr.c
> @@ -41,10 +41,8 @@
> # error "ELF_MACHINE_LOAD_ADDRESS is not defined."
> #endif
> 
> -#ifndef COMPARE_AND_SWAP
> -# define COMPARE_AND_SWAP(ptr, old, new) \
> -  (catomic_compare_and_exchange_bool_acq (ptr, new, old) == 0)
> -#endif
> +#define COMPARE_AND_SWAP(ptr, old, new) \
> +  (atomic_compare_and_exchange_bool_acq (ptr, new, old) == 0)
> 
> ElfW(Addr) _dl_boot_fptr_table [ELF_MACHINE_BOOT_FPTR_TABLE_LEN];
> 
> diff --git a/sysdeps/ia64/dl-fptr.h b/sysdeps/ia64/dl-fptr.h
> index e6c34528b67e7986ac44d77eb2505889a1767e29..a3a2994db20f44dfd3280f6ad618001cbadc0539 100644
> --- a/sysdeps/ia64/dl-fptr.h
> +++ b/sysdeps/ia64/dl-fptr.h
> @@ -22,9 +22,6 @@
> #include <ia64intrin.h>
> #include <sysdeps/generic/dl-fptr.h>
> 
> -#define COMPARE_AND_SWAP(ptr, old, new)	\
> -  __sync_bool_compare_and_swap (ptr, old, new)
> -
> /* There are currently 123 dynamic symbols in ld.so.
>    ELF_MACHINE_BOOT_FPTR_TABLE_LEN needs to be at least that big.  */
> #define ELF_MACHINE_BOOT_FPTR_TABLE_LEN	200
> diff --git a/sysdeps/s390/atomic-machine.h b/sysdeps/s390/atomic-machine.h
> index e85b2ef50541c7aab6d2981180f6205d2bd681b6..6b1de51c2a30baf5554a729a80a7ce04b56fc22c 100644
> --- a/sysdeps/s390/atomic-machine.h
> +++ b/sysdeps/s390/atomic-machine.h
> @@ -70,8 +70,6 @@
>     !__atomic_compare_exchange_n (mem, (void *) &__atg2_oldval, newval,	\
> 				  1, __ATOMIC_ACQUIRE,			\
> 				  __ATOMIC_RELAXED); })
> -#define catomic_compare_and_exchange_bool_acq(mem, newval, oldval)	\
> -  atomic_compare_and_exchange_bool_acq (mem, newval, oldval)
> 
> /* Store NEWVALUE in *MEM and return the old value.  */
> #define atomic_exchange_acq(mem, newvalue)				\
> @@ -90,8 +88,6 @@
> # define atomic_exchange_and_add_rel(mem, operand)			\
>   ({ __atomic_check_size((mem));					\
>   __atomic_fetch_add ((mem), (operand), __ATOMIC_RELEASE); })
> -#define catomic_exchange_and_add(mem, value)	\
> -  atomic_exchange_and_add (mem, value)
> 
> /* Atomically *mem |= mask and return the old value of *mem.  */
> /* The gcc builtin uses load-and-or instruction on z196 zarch and higher cpus
> @@ -104,8 +100,6 @@
>   do {						\
>     atomic_or_val (mem, mask);			\
>   } while (0)
> -#define catomic_or(mem, mask)			\
> -  atomic_or (mem, mask)
> 
> /* Atomically *mem |= 1 << bit and return true if the bit was set in old value
>    of *mem.  */
> @@ -129,5 +123,3 @@
>   do {						\
>     atomic_and_val (mem, mask);			\
>   } while (0)
> -#define catomic_and(mem, mask)			\
> -  atomic_and(mem, mask)
> diff --git a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h
> index 9ae89e0ef12ad28319755ac51260908779b9579f..f4b2cbced828a80335887bf172fd60767cf978ac 100644
> --- a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h
> +++ b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h
> @@ -170,10 +170,6 @@
>   ({ typeof (*mem) __mask = (typeof (*mem))1 << (bit);    \
>      asm_amo ("amoor", ".aq", mem, __mask) & __mask; })
> 
> -# define catomic_exchange_and_add(mem, value)		\
> -  atomic_exchange_and_add (mem, value)
> -# define catomic_max(mem, value) atomic_max (mem, value)
> -
> #else /* __riscv_atomic */
> # error "ISAs that do not subsume the A extension are not supported"
> #endif /* !__riscv_atomic */
> diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h
> index f24f1c71ed718c601c71decc1ee0c4b49fdf32f8..5a48521443367b1ca94c9f153c2a163661de789c 100644
> --- a/sysdeps/x86/atomic-machine.h
> +++ b/sysdeps/x86/atomic-machine.h
> @@ -20,7 +20,6 @@
> #define _X86_ATOMIC_MACHINE_H 1
> 
> #include <stdint.h>
> -#include <tls.h>			/* For tcbhead_t.  */
> #include <libc-pointer-arith.h>		/* For cast_to_integer.  */
> 
> #define LOCK_PREFIX "lock;"
> @@ -52,52 +51,7 @@
>   (! __sync_bool_compare_and_swap (mem, oldval, newval))
> 
> 
> -#define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
> -  ({ __typeof (*mem) ret;						      \
> -     __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t"			      \
> -		       "je 0f\n\t"					      \
> -		       "lock\n"						      \
> -		       "0:\tcmpxchgb %b2, %1"				      \
> -		       : "=a" (ret), "=m" (*mem)			      \
> -		       : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval),    \
> -			 "i" (offsetof (tcbhead_t, multiple_threads)));	      \
> -     ret; })
> -
> -#define __arch_c_compare_and_exchange_val_16_acq(mem, newval, oldval) \
> -  ({ __typeof (*mem) ret;						      \
> -     __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t"			      \
> -		       "je 0f\n\t"					      \
> -		       "lock\n"						      \
> -		       "0:\tcmpxchgw %w2, %1"				      \
> -		       : "=a" (ret), "=m" (*mem)			      \
> -		       : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval),    \
> -			 "i" (offsetof (tcbhead_t, multiple_threads)));	      \
> -     ret; })
> -
> -#define __arch_c_compare_and_exchange_val_32_acq(mem, newval, oldval) \
> -  ({ __typeof (*mem) ret;						      \
> -     __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t"			      \
> -		       "je 0f\n\t"					      \
> -		       "lock\n"						      \
> -		       "0:\tcmpxchgl %2, %1"				      \
> -		       : "=a" (ret), "=m" (*mem)			      \
> -		       : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval),    \
> -			 "i" (offsetof (tcbhead_t, multiple_threads)));       \
> -     ret; })
> -
> #ifdef __x86_64__
> -# define __arch_c_compare_and_exchange_val_64_acq(mem, newval, oldval) \
> -  ({ __typeof (*mem) ret;						      \
> -     __asm __volatile ("cmpl $0, %%fs:%P5\n\t"				      \
> -		       "je 0f\n\t"					      \
> -		       "lock\n"						      \
> -		       "0:\tcmpxchgq %q2, %1"				      \
> -		       : "=a" (ret), "=m" (*mem)			      \
> -		       : "q" ((int64_t) cast_to_integer (newval)),	      \
> -			 "m" (*mem),					      \
> -			 "0" ((int64_t) cast_to_integer (oldval)),	      \
> -			 "i" (offsetof (tcbhead_t, multiple_threads)));	      \
> -     ret; })
> # define do_exchange_and_add_val_64_acq(pfx, mem, value) 0
> # define do_add_val_64_acq(pfx, mem, value) do { } while (0)
> #else
> @@ -107,13 +61,6 @@
>    such an operation.  So don't define any code for now.  If it is
>    really going to be used the code below can be used on Intel Pentium
>    and later, but NOT on i486.  */
> -# define __arch_c_compare_and_exchange_val_64_acq(mem, newval, oldval) \
> -  ({ __typeof (*mem) ret = *(mem);					      \
> -     __atomic_link_error ();						      \
> -     ret = (newval);							      \
> -     ret = (oldval);							      \
> -     ret; })
> -
> # define __arch_compare_and_exchange_val_64_acq(mem, newval, oldval)	      \
>   ({ __typeof (*mem) ret = *(mem);					      \
>      __atomic_link_error ();						      \
> @@ -181,24 +128,20 @@
>      if (sizeof (*mem) == 1)						      \
>        __asm __volatile (lock "xaddb %b0, %1"				      \
> 			 : "=q" (__result), "=m" (*mem)			      \
> -			 : "0" (__addval), "m" (*mem),			      \
> -			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
> +			 : "0" (__addval), "m" (*mem));			      \
>      else if (sizeof (*mem) == 2)					      \
>        __asm __volatile (lock "xaddw %w0, %1"				      \
> 			 : "=r" (__result), "=m" (*mem)			      \
> -			 : "0" (__addval), "m" (*mem),			      \
> -			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
> +			 : "0" (__addval), "m" (*mem));			      \
>      else if (sizeof (*mem) == 4)					      \
>        __asm __volatile (lock "xaddl %0, %1"				      \
> 			 : "=r" (__result), "=m" (*mem)			      \
> -			 : "0" (__addval), "m" (*mem),			      \
> -			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
> +			 : "0" (__addval), "m" (*mem));			      \
>      else if (__HAVE_64B_ATOMICS)					      \
>        __asm __volatile (lock "xaddq %q0, %1"				      \
> 			 : "=r" (__result), "=m" (*mem)			      \
> 			 : "0" ((int64_t) cast_to_integer (__addval)),     \
> -			   "m" (*mem),					      \
> -			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
> +			   "m" (*mem));					      \
>      else								      \
>        __result = do_exchange_and_add_val_64_acq (pfx, (mem), __addval);      \
>      __result; })
> @@ -206,14 +149,6 @@
> #define atomic_exchange_and_add(mem, value) \
>   __sync_fetch_and_add (mem, value)
> 
> -#define __arch_exchange_and_add_cprefix \
> -  "cmpl $0, %%" SEG_REG ":%P4\n\tje 0f\n\tlock\n0:\t"
> -
> -#define catomic_exchange_and_add(mem, value) \
> -  __arch_exchange_and_add_body (__arch_exchange_and_add_cprefix, __arch_c,    \
> -				mem, value)
> -
> -
> #define __arch_add_body(lock, pfx, apfx, mem, value) \
>   do {									      \
>     if (__builtin_constant_p (value) && (value) == 1)			      \
> @@ -223,24 +158,20 @@
>     else if (sizeof (*mem) == 1)					      \
>       __asm __volatile (lock "addb %b1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: IBR_CONSTRAINT (value), "m" (*mem),		      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: IBR_CONSTRAINT (value), "m" (*mem));		      \
>     else if (sizeof (*mem) == 2)					      \
>       __asm __volatile (lock "addw %w1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (value), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (value), "m" (*mem));			      \
>     else if (sizeof (*mem) == 4)					      \
>       __asm __volatile (lock "addl %1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (value), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (value), "m" (*mem));			      \
>     else if (__HAVE_64B_ATOMICS)					      \
>       __asm __volatile (lock "addq %q1, %0"				      \
> 			: "=m" (*mem)					      \
> 			: "ir" ((int64_t) cast_to_integer (value)),	      \
> -			  "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			  "m" (*mem));					      \
>     else								      \
>       do_add_val_64_acq (apfx, (mem), (value));				      \
>   } while (0)
> @@ -248,13 +179,6 @@
> # define atomic_add(mem, value) \
>   __arch_add_body (LOCK_PREFIX, atomic, __arch, mem, value)
> 
> -#define __arch_add_cprefix \
> -  "cmpl $0, %%" SEG_REG ":%P3\n\tje 0f\n\tlock\n0:\t"
> -
> -#define catomic_add(mem, value) \
> -  __arch_add_body (__arch_add_cprefix, atomic, __arch_c, mem, value)
> -
> -
> #define atomic_add_negative(mem, value) \
>   ({ unsigned char __result;						      \
>      if (sizeof (*mem) == 1)						      \
> @@ -308,36 +232,25 @@
>     if (sizeof (*mem) == 1)						      \
>       __asm __volatile (lock "incb %b0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else if (sizeof (*mem) == 2)					      \
>       __asm __volatile (lock "incw %w0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else if (sizeof (*mem) == 4)					      \
>       __asm __volatile (lock "incl %0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else if (__HAVE_64B_ATOMICS)					      \
>       __asm __volatile (lock "incq %q0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else								      \
>       do_add_val_64_acq (pfx, mem, 1);					      \
>   } while (0)
> 
> #define atomic_increment(mem) __arch_increment_body (LOCK_PREFIX, __arch, mem)
> 
> -#define __arch_increment_cprefix \
> -  "cmpl $0, %%" SEG_REG ":%P2\n\tje 0f\n\tlock\n0:\t"
> -
> -#define catomic_increment(mem) \
> -  __arch_increment_body (__arch_increment_cprefix, __arch_c, mem)
> -
> -
> #define atomic_increment_and_test(mem) \
>   ({ unsigned char __result;						      \
>      if (sizeof (*mem) == 1)						      \
> @@ -366,36 +279,25 @@
>     if (sizeof (*mem) == 1)						      \
>       __asm __volatile (lock "decb %b0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else if (sizeof (*mem) == 2)					      \
>       __asm __volatile (lock "decw %w0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else if (sizeof (*mem) == 4)					      \
>       __asm __volatile (lock "decl %0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else if (__HAVE_64B_ATOMICS)					      \
>       __asm __volatile (lock "decq %q0"					      \
> 			: "=m" (*mem)					      \
> -			: "m" (*mem),					      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "m" (*mem));					      \
>     else								      \
>       do_add_val_64_acq (pfx, mem, -1);					      \
>   } while (0)
> 
> #define atomic_decrement(mem) __arch_decrement_body (LOCK_PREFIX, __arch, mem)
> 
> -#define __arch_decrement_cprefix \
> -  "cmpl $0, %%" SEG_REG ":%P2\n\tje 0f\n\tlock\n0:\t"
> -
> -#define catomic_decrement(mem) \
> -  __arch_decrement_body (__arch_decrement_cprefix, __arch_c, mem)
> -
> -
> #define atomic_decrement_and_test(mem) \
>   ({ unsigned char __result;						      \
>      if (sizeof (*mem) == 1)						      \
> @@ -472,65 +374,49 @@
>     if (sizeof (*mem) == 1)						      \
>       __asm __volatile (lock "andb %b1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: IBR_CONSTRAINT (mask), "m" (*mem),		      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: IBR_CONSTRAINT (mask), "m" (*mem));		      \
>     else if (sizeof (*mem) == 2)					      \
>       __asm __volatile (lock "andw %w1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (mask), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (mask), "m" (*mem));			      \
>     else if (sizeof (*mem) == 4)					      \
>       __asm __volatile (lock "andl %1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (mask), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (mask), "m" (*mem));			      \
>     else if (__HAVE_64B_ATOMICS)					      \
>       __asm __volatile (lock "andq %q1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (mask), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (mask), "m" (*mem));			      \
>     else								      \
>       __atomic_link_error ();						      \
>   } while (0)
> 
> -#define __arch_cprefix \
> -  "cmpl $0, %%" SEG_REG ":%P3\n\tje 0f\n\tlock\n0:\t"
> -
> #define atomic_and(mem, mask) __arch_and_body (LOCK_PREFIX, mem, mask)
> 
> -#define catomic_and(mem, mask) __arch_and_body (__arch_cprefix, mem, mask)
> -
> -
> #define __arch_or_body(lock, mem, mask) \
>   do {									      \
>     if (sizeof (*mem) == 1)						      \
>       __asm __volatile (lock "orb %b1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: IBR_CONSTRAINT (mask), "m" (*mem),		      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: IBR_CONSTRAINT (mask), "m" (*mem));		      \
>     else if (sizeof (*mem) == 2)					      \
>       __asm __volatile (lock "orw %w1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (mask), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (mask), "m" (*mem));			      \
>     else if (sizeof (*mem) == 4)					      \
>       __asm __volatile (lock "orl %1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (mask), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (mask), "m" (*mem));			      \
>     else if (__HAVE_64B_ATOMICS)					      \
>       __asm __volatile (lock "orq %q1, %0"				      \
> 			: "=m" (*mem)					      \
> -			: "ir" (mask), "m" (*mem),			      \
> -			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
> +			: "ir" (mask), "m" (*mem));			      \
>     else								      \
>       __atomic_link_error ();						      \
>   } while (0)
> 
> #define atomic_or(mem, mask) __arch_or_body (LOCK_PREFIX, mem, mask)
> 
> -#define catomic_or(mem, mask) __arch_or_body (__arch_cprefix, mem, mask)
> -
> /* We don't use mfence because it is supposedly slower due to having to
>    provide stronger guarantees (e.g., regarding self-modifying code).  */
> #define atomic_full_barrier() \
> 


Thread overview: 6+ messages
2022-06-16 10:01 Wilco Dijkstra
2022-06-16 20:06 ` Adhemerval Zanella
2022-06-17 11:56   ` Wilco Dijkstra
2022-06-22 13:00     ` Adhemerval Zanella [this message]
2022-07-05 11:16       ` Wilco Dijkstra
2022-07-06 12:15         ` Adhemerval Zanella
