public inbox for gcc-patches@gcc.gnu.org
* [PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64
@ 2024-01-24 17:17 Victor Do Nascimento
  2024-01-24 17:17 ` [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Victor Do Nascimento @ 2024-01-24 17:17 UTC (permalink / raw)
  To: gcc-patches
  Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw,
	Victor Do Nascimento

v4 updates

  1. Make use of HWCAP2_LSE128, as defined in the Linux kernel v6.7,
  for the feature check.  This has required adding a new patch to the
  series, enabling ifunc resolvers to read a second argument of type
  `__ifunc_arg_t *', from which the `_hwcap2' member can be queried
  for LSE128 support.  HWCAP2_LSE128, HWCAP_ATOMICS and __ifunc_arg_t
  are conditionally defined in the `host-config.h' file to allow
  backwards compatibility with older versions of glibc which lack
  definitions for these.

  2. Run configure test LIBAT_TEST_FEAT_LSE128 unconditionally,
  renaming it to LIBAT_TEST_FEAT_AARCH64_LSE128.  While it may seem
  counter-intuitive to run an aarch64 test on non-aarch64 targets, the
  Automake manual makes it clear:

    "Note that you must arrange for every AM_CONDITIONAL to be
     invoked every time configure is run. If AM_CONDITIONAL is
     run conditionally (e.g., in a shell if statement), then
     the result will confuse automake."

  Failure to do so has been found to result in Libatomic build
  failures on arm and x86_64 targets.

  3. Minor changes in the implementations of {ENTRY|END}_FEAT and
  ALIAS macros used in `config/linux/aarch64/atomic_16.S'.

  4. Improve the commit message in PATCH 3/4, documenting the design
  choice around merging REL and ACQ_REL memory orderings in LSE128
  atomic functions.

Regression-tested on aarch64-none-linux-gnu on Cortex-A72 and
LSE128-enabled Armv-A Base RevC AEM FVP.

---

Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for
Libatomic, namely the patches from [1] and [2], this patch series
extends the library's capabilities to dynamically select and emit
Armv9.4-a LSE128 implementations of atomic operations via ifuncs at
run-time whenever architectural support is present.

Regression tested on the aarch64-linux-gnu target with LSE128 support.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html

Victor Do Nascimento (4):
  libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  libatomic: Add support for __ifunc_arg_t arg in ifunc resolver
  libatomic: Enable LSE128 128-bit atomics for armv9.4-a
  aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

 libatomic/Makefile.am                        |   3 +
 libatomic/Makefile.in                        |   1 +
 libatomic/acinclude.m4                       |  19 ++
 libatomic/auto-config.h.in                   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 247 ++++++++++++++++---
 libatomic/config/linux/aarch64/host-config.h |  60 ++++-
 libatomic/configure                          |  61 ++++-
 libatomic/configure.ac                       |   3 +
 libatomic/configure.tgt                      |   2 +-
 9 files changed, 358 insertions(+), 41 deletions(-)

-- 
2.42.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  2024-01-24 17:17 [PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64 Victor Do Nascimento
@ 2024-01-24 17:17 ` Victor Do Nascimento
  2024-01-25 17:17   ` Richard Sandiford
  2024-01-24 17:17 ` [PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver Victor Do Nascimento
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Victor Do Nascimento @ 2024-01-24 17:17 UTC (permalink / raw)
  To: gcc-patches
  Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw,
	Victor Do Nascimento

The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i<n>' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE(NAME) NAME##_i1

Where we wish to generate ifunc names with the pre-processor's token
concatenation feature, we add a level of indirection to previous macro
calls.  If before we would have had `MACRO(<name>_i<n>)', we now have
`MACRO_FEAT(name, feature)'.  Where we wish to refer to base
functionality (i.e., functions where ifunc suffixes are absent), the
original `MACRO(<name>)' may be used to bypass suffixing.

Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

  ENTRY (libat_store_16)

and

  END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

  ENTRY_FEAT (libat_store_16, LSE2)

and

  END_FEAT (libat_store_16, LSE2)

For the aliasing of function names, we define the following new
implementation of the ALIAS macro:

  ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

Defining the `CORE(NAME)' macro to be the identity operator, it
returns the base function name unaltered and allows us to alias
target-specific ifuncs to the corresponding base implementation.
For example, we'd alias the LSE2 `libat_exchange_16' to its base
implementation with:

  ALIAS (libat_exchange_16, LSE2, CORE)
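
To illustrate, here is a sketch of how the pre-processor resolves the
new interface (the `#define' bodies are those introduced by this
patch; the expansion steps are annotated for exposition only):

  #define LSE2(NAME)  NAME##_i1
  #define CORE(NAME)  NAME
  #define ENTRY_FEAT(NAME, FEAT)  ENTRY_FEAT1 (FEAT (NAME))

  /* ENTRY_FEAT (libat_store_16, LSE2)
       => ENTRY_FEAT1 (LSE2 (libat_store_16))
       => ENTRY_FEAT1 (libat_store_16_i1)
     emitting the same directives the old ENTRY (libat_store_16_i1)
     did.  Likewise:

     ALIAS (libat_exchange_16, LSE2, CORE)
       => ALIAS1 (libat_exchange_16_i1, libat_exchange_16)
     i.e. the LSE2 ifunc name becomes an alias of the base
     implementation.  */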

libatomic/ChangeLog:
	* config/linux/aarch64/atomic_16.S (CORE): New macro.
	(LSE2): Likewise.
	(ENTRY_FEAT): Likewise.
	(ENTRY_FEAT1): Likewise.
	(END_FEAT): Likewise.
	(END_FEAT1): Likewise.
	(ALIAS): Modify macro to take in `arch' arguments.
	(ALIAS1): New.
---
 libatomic/config/linux/aarch64/atomic_16.S | 79 +++++++++++++---------
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index ad14f8f2e6e..16a42925903 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,22 +40,38 @@
 
 	.arch	armv8-a+lse
 
-#define ENTRY(name)		\
-	.global name;		\
-	.hidden name;		\
-	.type name,%function;	\
+#define LSE2(NAME) 	NAME##_i1
+#define CORE(NAME) 	NAME
+
+#define ENTRY(NAME) ENTRY_FEAT1 (NAME)
+
+#define ENTRY_FEAT(NAME, FEAT)  \
+	ENTRY_FEAT1 (FEAT (NAME))
+
+#define ENTRY_FEAT1(NAME)	\
+	.global NAME;		\
+	.hidden NAME;		\
+	.type NAME,%function;	\
 	.p2align 4;		\
-name:				\
-	.cfi_startproc;		\
+NAME:				\
+	.cfi_startproc;	\
 	hint	34	// bti c
 
-#define END(name)		\
+#define END(NAME) END_FEAT1 (NAME)
+
+#define END_FEAT(NAME, FEAT)	\
+	END_FEAT1 (FEAT (NAME))
+
+#define END_FEAT1(NAME)	\
 	.cfi_endproc;		\
-	.size name, .-name;
+	.size NAME, .-NAME;
+
+#define ALIAS(NAME, FROM, TO)	\
+	ALIAS1 (FROM (NAME),TO (NAME))
 
-#define ALIAS(alias,name)	\
-	.global alias;		\
-	.set alias, name;
+#define ALIAS1(ALIAS, NAME)	\
+	.global ALIAS;		\
+	.set ALIAS, NAME;
 
 #define res0 x0
 #define res1 x1
@@ -108,7 +124,7 @@ ENTRY (libat_load_16)
 END (libat_load_16)
 
 
-ENTRY (libat_load_16_i1)
+ENTRY_FEAT (libat_load_16, LSE2)
 	cbnz	w1, 1f
 
 	/* RELAXED.  */
@@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
 	ldp	res0, res1, [x0]
 	dmb	ishld
 	ret
-END (libat_load_16_i1)
+END_FEAT (libat_load_16, LSE2)
 
 
 ENTRY (libat_store_16)
@@ -148,7 +164,7 @@ ENTRY (libat_store_16)
 END (libat_store_16)
 
 
-ENTRY (libat_store_16_i1)
+ENTRY_FEAT (libat_store_16, LSE2)
 	cbnz	w4, 1f
 
 	/* RELAXED.  */
@@ -160,7 +176,7 @@ ENTRY (libat_store_16_i1)
 	stlxp	w4, in0, in1, [x0]
 	cbnz	w4, 1b
 	ret
-END (libat_store_16_i1)
+END_FEAT (libat_store_16, LSE2)
 
 
 ENTRY (libat_exchange_16)
@@ -237,7 +253,7 @@ ENTRY (libat_compare_exchange_16)
 END (libat_compare_exchange_16)
 
 
-ENTRY (libat_compare_exchange_16_i1)
+ENTRY_FEAT (libat_compare_exchange_16, LSE2)
 	ldp	exp0, exp1, [x1]
 	mov	tmp0, exp0
 	mov	tmp1, exp1
@@ -270,7 +286,7 @@ ENTRY (libat_compare_exchange_16_i1)
 	/* ACQ_REL/SEQ_CST.  */
 4:	caspal	exp0, exp1, in0, in1, [x0]
 	b	0b
-END (libat_compare_exchange_16_i1)
+END_FEAT (libat_compare_exchange_16, LSE2)
 
 
 ENTRY (libat_fetch_add_16)
@@ -556,21 +572,20 @@ END (libat_test_and_set_16)
 
 /* Alias entry points which are the same in baseline and LSE2.  */
 
-ALIAS (libat_exchange_16_i1, libat_exchange_16)
-ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16)
-ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16)
-ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16)
-ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16)
-ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16)
-ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16)
-ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16)
-ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16)
-ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16)
-ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16)
-ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16)
-ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16)
-ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16)
-
+ALIAS (libat_exchange_16, LSE2, CORE)
+ALIAS (libat_fetch_add_16, LSE2, CORE)
+ALIAS (libat_add_fetch_16, LSE2, CORE)
+ALIAS (libat_fetch_sub_16, LSE2, CORE)
+ALIAS (libat_sub_fetch_16, LSE2, CORE)
+ALIAS (libat_fetch_or_16, LSE2, CORE)
+ALIAS (libat_or_fetch_16, LSE2, CORE)
+ALIAS (libat_fetch_and_16, LSE2, CORE)
+ALIAS (libat_and_fetch_16, LSE2, CORE)
+ALIAS (libat_fetch_xor_16, LSE2, CORE)
+ALIAS (libat_xor_fetch_16, LSE2, CORE)
+ALIAS (libat_fetch_nand_16, LSE2, CORE)
+ALIAS (libat_nand_fetch_16, LSE2, CORE)
+ALIAS (libat_test_and_set_16, LSE2, CORE)
 
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
 #define FEATURE_1_AND 0xc0000000
-- 
2.42.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver
  2024-01-24 17:17 [PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64 Victor Do Nascimento
  2024-01-24 17:17 ` [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento
@ 2024-01-24 17:17 ` Victor Do Nascimento
  2024-01-25 17:28   ` Richard Sandiford
  2024-01-24 17:17 ` [PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a Victor Do Nascimento
  2024-01-24 17:17 ` [PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements Victor Do Nascimento
  3 siblings, 1 reply; 9+ messages in thread
From: Victor Do Nascimento @ 2024-01-24 17:17 UTC (permalink / raw)
  To: gcc-patches
  Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw,
	Victor Do Nascimento

With support for new atomic features in Armv9.4-a being indicated by
HWCAP2 bits, Libatomic's ifunc resolver must now query its second
argument, of type __ifunc_arg_t*.

We therefore make this argument known to libatomic, allowing us to
query hwcap2 bits in the following manner:

  bool
  resolver (unsigned long hwcap, const __ifunc_arg_t *features)
  {
    return (features->_hwcap2 & HWCAP2_<FEAT_NAME>);
  }
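
As a slightly fuller, self-contained sketch (the fallback definitions
mirror those added to `host-config.h' below for toolchains lacking
`sys/ifunc.h'; the helper name `have_feat' and the use of #ifndef
rather than __has_include are illustrative only):

  #include <stdbool.h>

  #ifndef _IFUNC_ARG_HWCAP
  # define _IFUNC_ARG_HWCAP (1ULL << 62)
  typedef struct __ifunc_arg_t {
    unsigned long _size;
    unsigned long _hwcap;
    unsigned long _hwcap2;
  } __ifunc_arg_t;
  #endif

  #ifndef HWCAP2_LSE128
  # define HWCAP2_LSE128 (1UL << 47)
  #endif

  static inline bool
  have_feat (unsigned long hwcap, const __ifunc_arg_t *features)
  {
    /* The second argument is only valid when the kernel sets
       _IFUNC_ARG_HWCAP in the first.  */
    return (hwcap & _IFUNC_ARG_HWCAP)
           && (features->_hwcap2 & HWCAP2_LSE128);
  }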

libatomic/ChangeLog:

	* config/linux/aarch64/host-config.h (__ifunc_arg_t):
	Conditionally-defined if `sys/ifunc.h' not found.
	(_IFUNC_ARG_HWCAP): Likewise.
	(IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc.
	(ifunc1): Modify function signature to accept __ifunc_arg_t
	argument.
	* configure.tgt: Add second `const __ifunc_arg_t *features'
	argument to IFUNC_RESOLVER_ARGS.
---
 libatomic/config/linux/aarch64/host-config.h | 15 +++++++++++++--
 libatomic/configure.tgt                      |  2 +-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index 4200293c4e3..8fd4fe3321a 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -24,9 +24,20 @@
 #if HAVE_IFUNC
 #include <sys/auxv.h>
 
+#if __has_include(<sys/ifunc.h>)
+# include <sys/ifunc.h>
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+# define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+
 #ifdef HWCAP_USCAT
 # if N == 16
-#  define IFUNC_COND_1	ifunc1 (hwcap)
+#  define IFUNC_COND_1	ifunc1 (hwcap, features)
 # else
 #  define IFUNC_COND_1	(hwcap & HWCAP_ATOMICS)
 # endif
@@ -48,7 +59,7 @@
 #define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)
 
 static inline bool
-ifunc1 (unsigned long hwcap)
+ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
   if (hwcap & HWCAP_USCAT)
     return true;
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index b7609132c58..67a5f2dff80 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -194,7 +194,7 @@ esac
 # The type may be different on different architectures.
 case "${target}" in
   aarch64*-*-*)
-	IFUNC_RESOLVER_ARGS="uint64_t hwcap"
+	IFUNC_RESOLVER_ARGS="uint64_t hwcap, const __ifunc_arg_t *features"
 	;;
   *)
 	IFUNC_RESOLVER_ARGS="void"
-- 
2.42.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a
  2024-01-24 17:17 [PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64 Victor Do Nascimento
  2024-01-24 17:17 ` [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento
  2024-01-24 17:17 ` [PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver Victor Do Nascimento
@ 2024-01-24 17:17 ` Victor Do Nascimento
  2024-01-25 17:38   ` Richard Sandiford
  2024-01-24 17:17 ` [PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements Victor Do Nascimento
  3 siblings, 1 reply; 9+ messages in thread
From: Victor Do Nascimento @ 2024-01-24 17:17 UTC (permalink / raw)
  To: gcc-patches
  Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw,
	Victor Do Nascimento

The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
  value held in a pair of registers, with original data loaded into
  the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
  in a pair of registers, with original data loaded into the same 2
  registers.
  * SWPP - Atomic swap of one 128-bit value with 128-bit value held
  in a pair of registers.
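
Note that because LDCLRP performs an AND NOT, the LSE128 variants of
fetch_and/and_fetch below first invert their operand.  In C-like
pseudocode (`ldclrp128' is a hypothetical helper standing in for the
instruction; it atomically performs *p &= ~clr and returns the old
value):

  /* Hypothetical wrapper for LDCLRP, for illustration only.  */
  extern unsigned __int128 ldclrp128 (unsigned __int128 *p,
				      unsigned __int128 clr);

  unsigned __int128
  fetch_and_128 (unsigned __int128 *p, unsigned __int128 v)
  {
    return ldclrp128 (p, ~v);		/* returns the old value */
  }

  unsigned __int128
  and_fetch_128 (unsigned __int128 *p, unsigned __int128 v)
  {
    return ldclrp128 (p, ~v) & v;	/* old & v == new value */
  }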

It is worth noting that in keeping with existing 128-bit atomic
operations in `atomic_16.S', we have chosen to merge certain
less-restrictive orderings into more restrictive ones.  This is done
to minimize the number of branches in the atomic functions, reducing
both the likelihood of branch mispredictions and, by keeping the code
small, the need for extra fetch cycles.

Past benchmarking has revealed that acquire is typically slightly
faster than release (5-10%), such that for the most frequently used
atomics (CAS and SWP) it makes sense to add support for acquire, as
well as release.

Likewise, it was identified that combining acquire and release typically
results in little to no penalty, such that it is of negligible benefit
to distinguish between release and acquire-release, making the
combined release/acq_rel/seq_cst path a worthwhile design choice.
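
Concretely, each LSE128 entry point therefore selects between only
three instruction forms.  A sketch of the dispatch, in terms of the
standard __ATOMIC_* numbering carried by the memory-model argument
(w4 in the LSE128 entry points below):

  enum lse128_form { FORM_PLAIN, FORM_A, FORM_AL };

  static enum lse128_form
  lse128_form_for (int model)
  {
    if (model == __ATOMIC_RELAXED)
      return FORM_PLAIN;		/* swpp / ldsetp / ldclrp */
    if (model <= __ATOMIC_ACQUIRE)	/* CONSUME or ACQUIRE */
      return FORM_A;			/* swppa / ldsetpa / ldclrpa */
    return FORM_AL;			/* RELEASE / ACQ_REL / SEQ_CST */
  }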

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

  1. Add a configure-time check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where LSE128 is available, implement the second ifunc, making
  use of the novel instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.
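
Taken together, items 2-4 above amount to the following selection
scheme for 16-byte operands (a condensed restatement of the
host-config.h hunk below; conditions are tested in order, so LSE128
is preferred over LSE2 when both are present):

  /* host-config.h, N == 16: two ifunc alternatives.  */
  #define IFUNC_NCOND(N)	2
  #define IFUNC_COND_1	(has_lse128 (hwcap, features))	/* -> _i1 */
  #define IFUNC_COND_2	(has_lse2 (hwcap, features))	/* -> _i2 */

  /* atomic_16.S: when built without LSE128 assembler support, the
     _i1 entry points are aliased to their _i2 (LSE2) counterparts,
     so it is safe for the resolver to pick _i1 even when the library
     was assembled without LSE128 support.  */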

libatomic/ChangeLog:

	* Makefile.am (AM_CPPFLAGS): add conditional setting of
	-DHAVE_FEAT_LSE128.
	* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New.
	* config/linux/aarch64/atomic_16.S (LSE128): New macro
	definition.
	(libat_exchange_16): New LSE128 variant.
	(libat_fetch_or_16): Likewise.
	(libat_or_fetch_16): Likewise.
	(libat_fetch_and_16): Likewise.
	(libat_and_fetch_16): Likewise.
	* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
	(IFUNC_NCOND): Add operand size checking.
	(has_lse2): Renamed from `ifunc1`.
	(has_lse128): New.
	(HWCAP2_LSE128): Likewise.
	* libatomic/configure.ac: Add call to
	LIBAT_TEST_FEAT_AARCH64_LSE128.
	* configure (ac_subst_vars): Regenerated via autoreconf.
	* libatomic/Makefile.in: Likewise.
	* libatomic/auto-config.h.in: Likewise.
---
 libatomic/Makefile.am                        |   3 +
 libatomic/Makefile.in                        |   1 +
 libatomic/acinclude.m4                       |  19 +++
 libatomic/auto-config.h.in                   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 170 ++++++++++++++++++-
 libatomic/config/linux/aarch64/host-config.h |  42 ++++-
 libatomic/configure                          |  61 ++++++-
 libatomic/configure.ac                       |   3 +
 8 files changed, 293 insertions(+), 9 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index cfad90124f9..0623a0bf2d1 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix _$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+if ARCH_AARCH64_HAVE_LSE128
+AM_CPPFLAGS	     = -DHAVE_FEAT_LSE128
+endif
 IFUNC_OPTIONS	     = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_SOURCES += atomic_16.S
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index dc2330b91fd..cd48fa21334 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
 	_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
 	$(am__append_4) $(am__append_5)
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS = -DHAVE_FEAT_LSE128
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp -DHAVE_KERNEL64
 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..d4f13174e2c 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
   ])
 ])
 
+dnl
+dnl Test if the host assembler supports armv9.4-a LSE128 isns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LSE128],[
+  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
+    [libat_cv_have_feat_lse128],[
+    AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
+    if AC_TRY_EVAL(ac_compile); then
+      eval libat_cv_have_feat_lse128=yes
+    else
+      eval libat_cv_have_feat_lse128=no
+    fi
+    rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
+	[Have LSE128 support for 16 byte integers.])
+  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 = xyes])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index ab3424a759e..7c78933b07d 100644
--- a/libatomic/auto-config.h.in
+++ b/libatomic/auto-config.h.in
@@ -105,6 +105,9 @@
 /* Define to 1 if you have the <dlfcn.h> header file. */
 #undef HAVE_DLFCN_H
 
+/* Have LSE128 support for 16 byte integers. */
+#undef HAVE_FEAT_LSE128
+
 /* Define to 1 if you have the <fenv.h> header file. */
 #undef HAVE_FENV_H
 
diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 16a42925903..979ed8498cd 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -35,12 +35,17 @@
    writes, this will be true when using atomics in actual code.
 
    The libat_<op>_16 entry points are ARMv8.0.
-   The libat_<op>_16_i1 entry points are used when LSE2 is available.  */
-
+   The libat_<op>_16_i1 entry points are used when LSE128 is available.
+   The libat_<op>_16_i2 entry points are used when LSE2 is available.  */
 
+#if HAVE_FEAT_LSE128
+	.arch	armv9-a+lse128
+#else
 	.arch	armv8-a+lse
+#endif
 
-#define LSE2(NAME) 	NAME##_i1
+#define LSE128(NAME) 	NAME##_i1
+#define LSE2(NAME) 	NAME##_i2
 #define CORE(NAME) 	NAME
 
 #define ENTRY(NAME) ENTRY_FEAT1 (NAME)
@@ -206,6 +211,31 @@ ENTRY (libat_exchange_16)
 END (libat_exchange_16)
 
 
+#if HAVE_FEAT_LSE128
+ENTRY_FEAT (libat_exchange_16, LSE128)
+	mov	tmp0, x0
+	mov	res0, in0
+	mov	res1, in1
+	cbnz	w4, 1f
+
+	/* RELAXED.  */
+	swpp	res0, res1, [tmp0]
+	ret
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
+
+	/* ACQUIRE/CONSUME.  */
+	swppa	res0, res1, [tmp0]
+	ret
+
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	swppal	res0, res1, [tmp0]
+	ret
+END_FEAT (libat_exchange_16, LSE128)
+#endif
+
+
 ENTRY (libat_compare_exchange_16)
 	ldp	exp0, exp1, [x1]
 	cbz	w4, 3f
@@ -399,6 +429,31 @@ ENTRY (libat_fetch_or_16)
 END (libat_fetch_or_16)
 
 
+#if HAVE_FEAT_LSE128
+ENTRY_FEAT (libat_fetch_or_16, LSE128)
+	mov	tmp0, x0
+	mov	res0, in0
+	mov	res1, in1
+	cbnz	w4, 1f
+
+	/* RELAXED.  */
+	ldsetp	res0, res1, [tmp0]
+	ret
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
+
+	/* ACQUIRE/CONSUME.  */
+	ldsetpa	res0, res1, [tmp0]
+	ret
+
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldsetpal	res0, res1, [tmp0]
+	ret
+END_FEAT (libat_fetch_or_16, LSE128)
+#endif
+
+
 ENTRY (libat_or_fetch_16)
 	mov	x5, x0
 	cbnz	w4, 2f
@@ -421,6 +476,36 @@ ENTRY (libat_or_fetch_16)
 END (libat_or_fetch_16)
 
 
+#if HAVE_FEAT_LSE128
+ENTRY_FEAT (libat_or_fetch_16, LSE128)
+	cbnz	w4, 1f
+	mov	tmp0, in0
+	mov	tmp1, in1
+
+	/* RELAXED.  */
+	ldsetp	in0, in1, [x0]
+	orr	res0, in0, tmp0
+	orr	res1, in1, tmp1
+	ret
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
+
+	/* ACQUIRE/CONSUME.  */
+	ldsetpa	in0, in1, [x0]
+	orr	res0, in0, tmp0
+	orr	res1, in1, tmp1
+	ret
+
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldsetpal	in0, in1, [x0]
+	orr	res0, in0, tmp0
+	orr	res1, in1, tmp1
+	ret
+END_FEAT (libat_or_fetch_16, LSE128)
+#endif
+
+
 ENTRY (libat_fetch_and_16)
 	mov	x5, x0
 	cbnz	w4, 2f
@@ -443,6 +528,32 @@ ENTRY (libat_fetch_and_16)
 END (libat_fetch_and_16)
 
 
+#if HAVE_FEAT_LSE128
+ENTRY_FEAT (libat_fetch_and_16, LSE128)
+	mov	tmp0, x0
+	mvn	res0, in0
+	mvn	res1, in1
+	cbnz	w4, 1f
+
+	/* RELAXED.  */
+	ldclrp	res0, res1, [tmp0]
+	ret
+
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
+
+	/* ACQUIRE/CONSUME.  */
+	ldclrpa res0, res1, [tmp0]
+	ret
+
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldclrpal	res0, res1, [tmp0]
+	ret
+END_FEAT (libat_fetch_and_16, LSE128)
+#endif
+
+
 ENTRY (libat_and_fetch_16)
 	mov	x5, x0
 	cbnz	w4, 2f
@@ -465,6 +576,37 @@ ENTRY (libat_and_fetch_16)
 END (libat_and_fetch_16)
 
 
+#if HAVE_FEAT_LSE128
+ENTRY_FEAT (libat_and_fetch_16, LSE128)
+	mvn	tmp0, in0
+	mvn	tmp0, in1
+	cbnz	w4, 1f
+
+	/* RELAXED.  */
+	ldclrp	tmp0, tmp1, [x0]
+	and	res0, tmp0, in0
+	and	res1, tmp1, in1
+	ret
+
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
+
+	/* ACQUIRE/CONSUME.  */
+	ldclrpa tmp0, tmp1, [x0]
+	and	res0, tmp0, in0
+	and	res1, tmp1, in1
+	ret
+
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldclrpal	tmp0, tmp1, [x5]
+	and	res0, tmp0, in0
+	and	res1, tmp1, in1
+	ret
+END_FEAT (libat_and_fetch_16, LSE128)
+#endif
+
+
 ENTRY (libat_fetch_xor_16)
 	mov	x5, x0
 	cbnz	w4, 2f
@@ -570,6 +712,28 @@ ENTRY (libat_test_and_set_16)
 END (libat_test_and_set_16)
 
 
+/* Alias entry points which are the same in LSE2 and LSE128.  */
+
+#if !HAVE_FEAT_LSE128
+ALIAS (libat_exchange_16, LSE128, LSE2)
+ALIAS (libat_fetch_or_16, LSE128, LSE2)
+ALIAS (libat_fetch_and_16, LSE128, LSE2)
+ALIAS (libat_or_fetch_16, LSE128, LSE2)
+ALIAS (libat_and_fetch_16, LSE128, LSE2)
+#endif
+ALIAS (libat_load_16, LSE128, LSE2)
+ALIAS (libat_store_16, LSE128, LSE2)
+ALIAS (libat_compare_exchange_16, LSE128, LSE2)
+ALIAS (libat_fetch_add_16, LSE128, LSE2)
+ALIAS (libat_add_fetch_16, LSE128, LSE2)
+ALIAS (libat_fetch_sub_16, LSE128, LSE2)
+ALIAS (libat_sub_fetch_16, LSE128, LSE2)
+ALIAS (libat_fetch_xor_16, LSE128, LSE2)
+ALIAS (libat_xor_fetch_16, LSE128, LSE2)
+ALIAS (libat_fetch_nand_16, LSE128, LSE2)
+ALIAS (libat_nand_fetch_16, LSE128, LSE2)
+ALIAS (libat_test_and_set_16, LSE128, LSE2)
+
 /* Alias entry points which are the same in baseline and LSE2.  */
 
 ALIAS (libat_exchange_16, LSE2, CORE)
diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index 8fd4fe3321a..1bc7d839232 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -37,14 +37,17 @@ typedef struct __ifunc_arg_t {
 
 #ifdef HWCAP_USCAT
 # if N == 16
-#  define IFUNC_COND_1	ifunc1 (hwcap, features)
+#  define IFUNC_COND_1		(has_lse128 (hwcap, features))
+#  define IFUNC_COND_2		(has_lse2 (hwcap, features))
+#  define IFUNC_NCOND(N)	2
 # else
-#  define IFUNC_COND_1	(hwcap & HWCAP_ATOMICS)
+#  define IFUNC_COND_1		(hwcap & HWCAP_ATOMICS)
+#  define IFUNC_NCOND(N)	1
 # endif
 #else
 #  define IFUNC_COND_1	(false)
+#  define IFUNC_NCOND(N)	1
 #endif
-#define IFUNC_NCOND(N)	(1)
 
 #endif /* HAVE_IFUNC */
 
@@ -59,7 +62,7 @@ typedef struct __ifunc_arg_t {
 #define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)
 
 static inline bool
-ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
+has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
   if (hwcap & HWCAP_USCAT)
     return true;
@@ -75,6 +78,37 @@ ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
 
   return false;
 }
+
+/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic,
+   bits[23:20].  The expected value is 0b0011.  Check that.  */
+
+#define AT_FEAT_FIELD(isar0)	(((isar0) >> 20) & 15)
+
+/* Ensure backwards compatibility with glibc <= 2.38.  */
+#ifndef HWCAP2_LSE128
+#define HWCAP2_LSE128		(1UL << 47)
+#endif
+
+static inline bool
+has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
+{
+  if (hwcap & _IFUNC_ARG_HWCAP
+      && features->_hwcap2 & HWCAP2_LSE128)
+    return true;
+  /* A 0 HWCAP2_LSE128 bit may be just as much a sign of missing HWCAP2 bit
+     support in older kernels as it is of CPU feature absence.  Try fallback
+     method to guarantee LSE128 is not implemented.
+
+     In the absence of HWCAP_CPUID, we are unable to check for LSE128.  */
+  if (!(hwcap & HWCAP_CPUID))
+    return false;
+  unsigned long isar0;
+  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
+  if (AT_FEAT_FIELD (isar0) >= 3)
+    return true;
+  return false;
+}
+
 #endif
 
 #include_next <host-config.h>
diff --git a/libatomic/configure b/libatomic/configure
index d579bab96f8..8ab730d8082 100755
--- a/libatomic/configure
+++ b/libatomic/configure
@@ -656,6 +656,8 @@ LIBAT_BUILD_VERSIONED_SHLIB_FALSE
 LIBAT_BUILD_VERSIONED_SHLIB_TRUE
 OPT_LDFLAGS
 SECTION_LDFLAGS
+ARCH_AARCH64_HAVE_LSE128_FALSE
+ARCH_AARCH64_HAVE_LSE128_TRUE
 SYSROOT_CFLAGS_FOR_TARGET
 enable_aarch64_lse
 libtool_VERSION
@@ -11456,7 +11458,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11459 "configure"
+#line 11461 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11562,7 +11564,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11565 "configure"
+#line 11567 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -14697,6 +14699,57 @@ _ACEOF
 
 
 
+# Check for target-specific assembly-level support for atomic operations.
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for armv9.4-a LSE128 insn support" >&5
+$as_echo_n "checking for armv9.4-a LSE128 insn support... " >&6; }
+if ${libat_cv_have_feat_lse128+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+
+    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+asm(".arch armv9-a+lse128")
+  ;
+  return 0;
+}
+_ACEOF
+    if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
+  (eval $ac_compile) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+      eval libat_cv_have_feat_lse128=yes
+    else
+      eval libat_cv_have_feat_lse128=no
+    fi
+    rm -f conftest*
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libat_cv_have_feat_lse128" >&5
+$as_echo "$libat_cv_have_feat_lse128" >&6; }
+
+  yesno=`echo $libat_cv_have_feat_lse128 | tr 'yesno' '1  0 '`
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_FEAT_LSE128 $yesno
+_ACEOF
+
+
+   if test x$libat_cv_have_feat_lse128 = xyes; then
+  ARCH_AARCH64_HAVE_LSE128_TRUE=
+  ARCH_AARCH64_HAVE_LSE128_FALSE='#'
+else
+  ARCH_AARCH64_HAVE_LSE128_TRUE='#'
+  ARCH_AARCH64_HAVE_LSE128_FALSE=
+fi
+
+
+
  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether byte ordering is bigendian" >&5
 $as_echo_n "checking whether byte ordering is bigendian... " >&6; }
 if ${ac_cv_c_bigendian+:} false; then :
@@ -15989,6 +16042,10 @@ if test -z "${ENABLE_DARWIN_AT_RPATH_TRUE}" && test -z "${ENABLE_DARWIN_AT_RPATH
   as_fn_error $? "conditional \"ENABLE_DARWIN_AT_RPATH\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${ARCH_AARCH64_HAVE_LSE128_TRUE}" && test -z "${ARCH_AARCH64_HAVE_LSE128_FALSE}"; then
+  as_fn_error $? "conditional \"ARCH_AARCH64_HAVE_LSE128\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 
 if test -z "${LIBAT_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBAT_BUILD_VERSIONED_SHLIB_FALSE}"; then
   as_fn_error $? "conditional \"LIBAT_BUILD_VERSIONED_SHLIB\" was never defined.
diff --git a/libatomic/configure.ac b/libatomic/configure.ac
index 32a2cdb13ae..85824fa7614 100644
--- a/libatomic/configure.ac
+++ b/libatomic/configure.ac
@@ -206,6 +206,9 @@ LIBAT_FORALL_MODES([LIBAT_HAVE_ATOMIC_CAS])
 LIBAT_FORALL_MODES([LIBAT_HAVE_ATOMIC_FETCH_ADD])
 LIBAT_FORALL_MODES([LIBAT_HAVE_ATOMIC_FETCH_OP])
 
+# Check for target-specific assembly-level support for atomic operations.
+LIBAT_TEST_FEAT_AARCH64_LSE128()
+
 AC_C_BIGENDIAN
 # I don't like the default behaviour of WORDS_BIGENDIAN undefined for LE.
 AH_BOTTOM(
-- 
2.42.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.
  2024-01-24 17:17 [PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64 Victor Do Nascimento
                   ` (2 preceding siblings ...)
  2024-01-24 17:17 ` [PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a Victor Do Nascimento
@ 2024-01-24 17:17 ` Victor Do Nascimento
  2024-01-25 17:41   ` Richard Sandiford
  3 siblings, 1 reply; 9+ messages in thread
From: Victor Do Nascimento @ 2024-01-24 17:17 UTC (permalink / raw)
  To: gcc-patches
  Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw,
	Victor Do Nascimento

At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register.  This instruction, when issued from user-space,
results in a trap by the kernel, which then returns the value read
from the system register.  Given the undesirable computational
expense associated with this kernel trap, it is important to
implement mechanisms to forgo the operation wherever possible.

In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers.  Where one of these early tests fails, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary, expensive kernel-mediated access to system
registers.
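
A condensed sketch of the resulting decision order (illustrative
only; the HWCAP_* constants come from <sys/auxv.h> on a sufficiently
recent glibc, and the actual functions live in host-config.h):

  #include <sys/auxv.h>

  enum probe { PROBE_YES, PROBE_NO, PROBE_NEED_MRS };

  static enum probe
  lse2_probe (unsigned long hwcap)
  {
    if (hwcap & HWCAP_USCAT)		/* LSE2 reported directly.  */
      return PROBE_YES;
    if (!(hwcap & HWCAP_ATOMICS))	/* LSE absent => no LSE2.  */
      return PROBE_NO;
    if (!(hwcap & HWCAP_CPUID))		/* Cannot read ID registers.  */
      return PROBE_NO;
    return PROBE_NEED_MRS;		/* Fall back to the trapped MRS.  */
  }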

libatomic/ChangeLog:

	* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
	(has_lse128): Add test for LSE2.
---
 libatomic/config/linux/aarch64/host-config.h | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index 1bc7d839232..4e354124063 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -64,8 +64,13 @@ typedef struct __ifunc_arg_t {
 static inline bool
 has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
+  /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
     return true;
+  /* No point checking further for atomic 128-bit load/store if LSE
+     prerequisite not met.  */
+  if (!(hwcap & HWCAP_ATOMICS))
+    return false;
   if (!(hwcap & HWCAP_CPUID))
     return false;
 
@@ -99,9 +104,11 @@ has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
      support in older kernels as it is of CPU feature absence.  Try fallback
      method to guarantee LSE128 is not implemented.
 
-     In the absence of HWCAP_CPUID, we are unable to check for LSE128.  */
-  if (!(hwcap & HWCAP_CPUID))
-    return false;
+     In the absence of HWCAP_CPUID, we are unable to check for LSE128.
+     If feature check available, check LSE2 prerequisite before proceeding.  */
+  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
+     return false;
+
   unsigned long isar0;
   asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
   if (AT_FEAT_FIELD (isar0) >= 3)
-- 
2.42.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  2024-01-24 17:17 ` [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento
@ 2024-01-25 17:17   ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2024-01-25 17:17 UTC (permalink / raw)
  To: Victor Do Nascimento; +Cc: gcc-patches, kyrylo.tkachov, Richard.Earnshaw

Victor Do Nascimento <victor.donascimento@arm.com> writes:
> The introduction of further architectural-feature dependent ifuncs
> for AArch64 makes hard-coding ifunc `_i<n>' suffixes to functions
> cumbersome to work with.  It is awkward to remember which ifunc maps
> onto which arch feature and makes the code harder to maintain when new
> ifuncs are added and their suffixes possibly altered.
>
> This patch uses pre-processor `#define' statements to map each suffix to
> a descriptive feature name macro, for example:
>
>   #define LSE(NAME) NAME##_i1
>
> Where we wish to generate ifunc names with the pre-processor's token
> concatenation feature, we add a level of indirection to previous macro
> calls.  If before we would have had `MACRO(<name>_i<n>)', we now have
> `MACRO_FEAT(name, feature)'.  Where we wish to refer to base
> functionality (i.e., functions where ifunc suffixes are absent), the
> original `MACRO(<name>)' may be used to bypass suffixing.
>
> Consequently, for base functionality, where the ifunc suffix is
> absent, the macro interface remains the same.  For example, the entry
> and endpoints of `libat_store_16' remain defined by:
>
>   ENTRY (libat_store_16)
>
> and
>
>   END (libat_store_16)
>
> For the LSE2 implementation of the same 16-byte atomic store, we now
> have:
>
>   ENTRY_FEAT (libat_store_16, LSE2)
>
> and
>
>   END_FEAT (libat_store_16, LSE2)
>
> For the aliasing of function names, we define the following new
> implementation of the ALIAS macro:
>
>   ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)
>
> Defining the `CORE(NAME)' macro to be the identity operator, it
> returns the base function name unaltered and allows us to alias
> target-specific ifuncs to the corresponding base implementation.
> For example, we'd alias the LSE2 `libat_exchange_16' to its base
> implementation with:
>
>   ALIAS (libat_exchange_16, LSE2, CORE)
>
> libatomic/ChangeLog:
> 	* config/linux/aarch64/atomic_16.S (CORE): New macro.
> 	(LSE2): Likewise.
> 	(ENTRY_FEAT): Likewise.
> 	(ENTRY_FEAT1): Likewise.
> 	(END_FEAT): Likewise.
> 	(END_FEAT1): Likewise.
> 	(ALIAS): Modify macro to take in `arch' arguments.
> 	(ALIAS1): New.
> ---
>  libatomic/config/linux/aarch64/atomic_16.S | 79 +++++++++++++---------
>  1 file changed, 47 insertions(+), 32 deletions(-)
>
> diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
> index ad14f8f2e6e..16a42925903 100644
> --- a/libatomic/config/linux/aarch64/atomic_16.S
> +++ b/libatomic/config/linux/aarch64/atomic_16.S
> @@ -40,22 +40,38 @@
>  
>  	.arch	armv8-a+lse
>  
> -#define ENTRY(name)		\
> -	.global name;		\
> -	.hidden name;		\
> -	.type name,%function;	\
> +#define LSE2(NAME) 	NAME##_i1
> +#define CORE(NAME) 	NAME
> +
> +#define ENTRY(NAME) ENTRY_FEAT1 (NAME)
> +
> +#define ENTRY_FEAT(NAME, FEAT)  \
> +	ENTRY_FEAT1 (FEAT (NAME))
> +
> +#define ENTRY_FEAT1(NAME)	\
> +	.global NAME;		\
> +	.hidden NAME;		\
> +	.type NAME,%function;	\

I don't think ENTRY_FEAT1 is necessary now.  It should be possible
to keep ENTRY as it was and use:

#define ENTRY_FEAT(NAME, FEAT)  \
	ENTRY (FEAT (NAME))

Similarly for END/END_FEAT.

OK with those changes, thanks.

Richard

>  	.p2align 4;		\
> -name:				\
> -	.cfi_startproc;		\
> +NAME:				\
> +	.cfi_startproc;	\
>  	hint	34	// bti c
>  
> -#define END(name)		\
> +#define END(NAME) END_FEAT1 (NAME)
> +
> +#define END_FEAT(NAME, FEAT)	\
> +	END_FEAT1 (FEAT (NAME))
> +
> +#define END_FEAT1(NAME)	\
>  	.cfi_endproc;		\
> -	.size name, .-name;
> +	.size NAME, .-NAME;
> +
> +#define ALIAS(NAME, FROM, TO)	\
> +	ALIAS1 (FROM (NAME),TO (NAME))
>  
> -#define ALIAS(alias,name)	\
> -	.global alias;		\
> -	.set alias, name;
> +#define ALIAS1(ALIAS, NAME)	\
> +	.global ALIAS;		\
> +	.set ALIAS, NAME;
>  
>  #define res0 x0
>  #define res1 x1
> @@ -108,7 +124,7 @@ ENTRY (libat_load_16)
>  END (libat_load_16)
>  
>  
> -ENTRY (libat_load_16_i1)
> +ENTRY_FEAT (libat_load_16, LSE2)
>  	cbnz	w1, 1f
>  
>  	/* RELAXED.  */
> @@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
>  	ldp	res0, res1, [x0]
>  	dmb	ishld
>  	ret
> -END (libat_load_16_i1)
> +END_FEAT (libat_load_16, LSE2)
>  
>  
>  ENTRY (libat_store_16)
> @@ -148,7 +164,7 @@ ENTRY (libat_store_16)
>  END (libat_store_16)
>  
>  
> -ENTRY (libat_store_16_i1)
> +ENTRY_FEAT (libat_store_16, LSE2)
>  	cbnz	w4, 1f
>  
>  	/* RELAXED.  */
> @@ -160,7 +176,7 @@ ENTRY (libat_store_16_i1)
>  	stlxp	w4, in0, in1, [x0]
>  	cbnz	w4, 1b
>  	ret
> -END (libat_store_16_i1)
> +END_FEAT (libat_store_16, LSE2)
>  
>  
>  ENTRY (libat_exchange_16)
> @@ -237,7 +253,7 @@ ENTRY (libat_compare_exchange_16)
>  END (libat_compare_exchange_16)
>  
>  
> -ENTRY (libat_compare_exchange_16_i1)
> +ENTRY_FEAT (libat_compare_exchange_16, LSE2)
>  	ldp	exp0, exp1, [x1]
>  	mov	tmp0, exp0
>  	mov	tmp1, exp1
> @@ -270,7 +286,7 @@ ENTRY (libat_compare_exchange_16_i1)
>  	/* ACQ_REL/SEQ_CST.  */
>  4:	caspal	exp0, exp1, in0, in1, [x0]
>  	b	0b
> -END (libat_compare_exchange_16_i1)
> +END_FEAT (libat_compare_exchange_16, LSE2)
>  
>  
>  ENTRY (libat_fetch_add_16)
> @@ -556,21 +572,20 @@ END (libat_test_and_set_16)
>  
>  /* Alias entry points which are the same in baseline and LSE2.  */
>  
> -ALIAS (libat_exchange_16_i1, libat_exchange_16)
> -ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16)
> -ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16)
> -ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16)
> -ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16)
> -ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16)
> -ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16)
> -ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16)
> -ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16)
> -ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16)
> -ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16)
> -ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16)
> -ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16)
> -ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16)
> -
> +ALIAS (libat_exchange_16, LSE2, CORE)
> +ALIAS (libat_fetch_add_16, LSE2, CORE)
> +ALIAS (libat_add_fetch_16, LSE2, CORE)
> +ALIAS (libat_fetch_sub_16, LSE2, CORE)
> +ALIAS (libat_sub_fetch_16, LSE2, CORE)
> +ALIAS (libat_fetch_or_16, LSE2, CORE)
> +ALIAS (libat_or_fetch_16, LSE2, CORE)
> +ALIAS (libat_fetch_and_16, LSE2, CORE)
> +ALIAS (libat_and_fetch_16, LSE2, CORE)
> +ALIAS (libat_fetch_xor_16, LSE2, CORE)
> +ALIAS (libat_xor_fetch_16, LSE2, CORE)
> +ALIAS (libat_fetch_nand_16, LSE2, CORE)
> +ALIAS (libat_nand_fetch_16, LSE2, CORE)
> +ALIAS (libat_test_and_set_16, LSE2, CORE)
>  
>  /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
>  #define FEATURE_1_AND 0xc0000000

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver
  2024-01-24 17:17 ` [PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver Victor Do Nascimento
@ 2024-01-25 17:28   ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2024-01-25 17:28 UTC (permalink / raw)
  To: Victor Do Nascimento; +Cc: gcc-patches, kyrylo.tkachov, Richard.Earnshaw

Victor Do Nascimento <victor.donascimento@arm.com> writes:
> With support for new atomic features in Armv9.4-a being indicated by
> HWCAP2 bits, Libatomic's ifunc resolver must now query its second
> argument, of type __ifunc_arg_t*.
>
> We therefore make this argument known to libatomic, allowing us to
> query hwcap2 bits in the following manner:
>
>   bool
>   resolver (unsigned long hwcap, const __ifunc_arg_t *features)
>   {
>     return (features->_hwcap2 & HWCAP2_<FEAT_NAME>);
>   }
>
> libatomic/ChangeLog:
>
> 	* config/linux/aarch64/host-config.h (__ifunc_arg_t):
> 	Conditionally-defined if `sys/ifunc.h' not found.
> 	(_IFUNC_ARG_HWCAP): Likewise.
> 	(IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc.
> 	(ifunc1): Modify function signature to accept __ifunc_arg_t
> 	argument.
> 	* configure.tgt: Add second `const __ifunc_arg_t *features'
> 	argument to IFUNC_RESOLVER_ARGS.

OK, thanks.

Richard

> ---
>  libatomic/config/linux/aarch64/host-config.h | 15 +++++++++++++--
>  libatomic/configure.tgt                      |  2 +-
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
> index 4200293c4e3..8fd4fe3321a 100644
> --- a/libatomic/config/linux/aarch64/host-config.h
> +++ b/libatomic/config/linux/aarch64/host-config.h
> @@ -24,9 +24,20 @@
>  #if HAVE_IFUNC
>  #include <sys/auxv.h>
>  
> +#if __has_include(<sys/ifunc.h>)
> +# include <sys/ifunc.h>
> +#else
> +typedef struct __ifunc_arg_t {
> +  unsigned long _size;
> +  unsigned long _hwcap;
> +  unsigned long _hwcap2;
> +} __ifunc_arg_t;
> +# define _IFUNC_ARG_HWCAP (1ULL << 62)
> +#endif
> +
>  #ifdef HWCAP_USCAT
>  # if N == 16
> -#  define IFUNC_COND_1	ifunc1 (hwcap)
> +#  define IFUNC_COND_1	ifunc1 (hwcap, features)
>  # else
>  #  define IFUNC_COND_1	(hwcap & HWCAP_ATOMICS)
>  # endif
> @@ -48,7 +59,7 @@
>  #define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)
>  
>  static inline bool
> -ifunc1 (unsigned long hwcap)
> +ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
>  {
>    if (hwcap & HWCAP_USCAT)
>      return true;
> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
> index b7609132c58..67a5f2dff80 100644
> --- a/libatomic/configure.tgt
> +++ b/libatomic/configure.tgt
> @@ -194,7 +194,7 @@ esac
>  # The type may be different on different architectures.
>  case "${target}" in
>    aarch64*-*-*)
> -	IFUNC_RESOLVER_ARGS="uint64_t hwcap"
> +	IFUNC_RESOLVER_ARGS="uint64_t hwcap, const __ifunc_arg_t *features"
>  	;;
>    *)
>  	IFUNC_RESOLVER_ARGS="void"

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a
  2024-01-24 17:17 ` [PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a Victor Do Nascimento
@ 2024-01-25 17:38   ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2024-01-25 17:38 UTC (permalink / raw)
  To: Victor Do Nascimento; +Cc: gcc-patches, kyrylo.tkachov, Richard.Earnshaw

Victor Do Nascimento <victor.donascimento@arm.com> writes:
> The armv9.4-a architectural revision adds three new atomic operations
> associated with the LSE128 feature:
>
>   * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
>   value held in a pair of registers, with original data loaded into
>   the same 2 registers.
>   * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
>   in a pair of registers, with original data loaded into the same 2
>   registers.
>   * SWPP - Atomic swap of one 128-bit value with 128-bit value held
>   in a pair of registers.
>
> It is worth noting that in keeping with existing 128-bit atomic
> operations in `atomic_16.S', we have chosen to merge certain
> less-restrictive orderings into more restrictive ones.  This is done
> to minimize the number of branches in the atomic functions, reducing
> both the likelihood of branch mispredictions and, by keeping the code
> small, the need for extra fetch cycles.
>
> Past benchmarking has revealed that acquire is typically slightly
> faster than release (5-10%), such that for the most frequently used
> atomics (CAS and SWP) it makes sense to add support for acquire, as
> well as release.
>
> Likewise, it was identified that combining acquire and release typically
> results in little to no penalty, such that it is of negligible benefit
> to distinguish between release and acquire-release, making the
> combined release/acq_rel/seq_cst path a worthwhile design choice.

I was thinking more that it would be good to have this as a block
comment within the file itself.  I won't insist though.  At least
having it in the commit message will ensure that it's discoverable
from the git repo.

> This patch adds the logic required to make use of these when the
> architectural feature is present and a suitable assembler available.
>
> In order to do this, the following changes are made:
>
>   1. Add a configure-time check for LSE128 support in the
>   assembler.
>   2. Edit host-config.h so that when N == 16, nifunc = 2.
>   3. Where LSE128 is available, implement the second ifunc, making
>   use of the novel instructions.
>   4. For atomic functions unable to make use of these new
>   instructions, define a new alias which causes the _i1 function
>   variant to point ahead to the corresponding _i2 implementation.
>
> libatomic/ChangeLog:
>
> 	* Makefile.am (AM_CPPFLAGS): add conditional setting of
> 	-DHAVE_FEAT_LSE128.
> 	* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New.
> 	* config/linux/aarch64/atomic_16.S (LSE128): New macro
> 	definition.
> 	(libat_exchange_16): New LSE128 variant.
> 	(libat_fetch_or_16): Likewise.
> 	(libat_or_fetch_16): Likewise.
> 	(libat_fetch_and_16): Likewise.
> 	(libat_and_fetch_16): Likewise.
> 	* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
> 	(IFUNC_NCOND): Add operand size checking.
> 	(has_lse2): Renamed from `ifunc1`.
> 	(has_lse128): New.
> 	(HWCAP2_LSE128): Likewise.
> 	* libatomic/configure.ac: Add call to
> 	LIBAT_TEST_FEAT_AARCH64_LSE128.
> 	* configure (ac_subst_vars): Regenerated via autoreconf.
> 	* libatomic/Makefile.in: Likewise.
> 	* libatomic/auto-config.h.in: Likewise.

OK, thanks.

Richard

> ---
>  libatomic/Makefile.am                        |   3 +
>  libatomic/Makefile.in                        |   1 +
>  libatomic/acinclude.m4                       |  19 +++
>  libatomic/auto-config.h.in                   |   3 +
>  libatomic/config/linux/aarch64/atomic_16.S   | 170 ++++++++++++++++++-
>  libatomic/config/linux/aarch64/host-config.h |  42 ++++-
>  libatomic/configure                          |  61 ++++++-
>  libatomic/configure.ac                       |   3 +
>  8 files changed, 293 insertions(+), 9 deletions(-)
>
> diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
> index cfad90124f9..0623a0bf2d1 100644
> --- a/libatomic/Makefile.am
> +++ b/libatomic/Makefile.am
> @@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix _$(s)_.lo,$(SIZEOBJS)))
>  ## On a target-specific basis, include alternates to be selected by IFUNC.
>  if HAVE_IFUNC
>  if ARCH_AARCH64_LINUX
> +if ARCH_AARCH64_HAVE_LSE128
> +AM_CPPFLAGS	     = -DHAVE_FEAT_LSE128
> +endif
>  IFUNC_OPTIONS	     = -march=armv8-a+lse
>  libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
>  libatomic_la_SOURCES += atomic_16.S
> diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
> index dc2330b91fd..cd48fa21334 100644
> --- a/libatomic/Makefile.in
> +++ b/libatomic/Makefile.in
> @@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
>  libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
>  	_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
>  	$(am__append_4) $(am__append_5)
> +@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS = -DHAVE_FEAT_LSE128
>  @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp -DHAVE_KERNEL64
>  @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
> diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
> index f35ab5b60a5..d4f13174e2c 100644
> --- a/libatomic/acinclude.m4
> +++ b/libatomic/acinclude.m4
> @@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
>    ])
>  ])
>  
> +dnl
> +dnl Test if the host assembler supports armv9.4-a LSE128 isns.
> +dnl
> +AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LSE128],[
> +  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
> +    [libat_cv_have_feat_lse128],[
> +    AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
> +    if AC_TRY_EVAL(ac_compile); then
> +      eval libat_cv_have_feat_lse128=yes
> +    else
> +      eval libat_cv_have_feat_lse128=no
> +    fi
> +    rm -f conftest*
> +  ])
> +  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
> +	[Have LSE128 support for 16 byte integers.])
> +  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 = xyes])
> +])
> +
>  dnl
>  dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
>  dnl
> diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
> index ab3424a759e..7c78933b07d 100644
> --- a/libatomic/auto-config.h.in
> +++ b/libatomic/auto-config.h.in
> @@ -105,6 +105,9 @@
>  /* Define to 1 if you have the <dlfcn.h> header file. */
>  #undef HAVE_DLFCN_H
>  
> +/* Have LSE128 support for 16 byte integers. */
> +#undef HAVE_FEAT_LSE128
> +
>  /* Define to 1 if you have the <fenv.h> header file. */
>  #undef HAVE_FENV_H
>  
> diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
> index 16a42925903..979ed8498cd 100644
> --- a/libatomic/config/linux/aarch64/atomic_16.S
> +++ b/libatomic/config/linux/aarch64/atomic_16.S
> @@ -35,12 +35,17 @@
>     writes, this will be true when using atomics in actual code.
>  
>     The libat_<op>_16 entry points are ARMv8.0.
> -   The libat_<op>_16_i1 entry points are used when LSE2 is available.  */
> -
> +   The libat_<op>_16_i1 entry points are used when LSE128 is available.
> +   The libat_<op>_16_i2 entry points are used when LSE2 is available.  */
>  
> +#if HAVE_FEAT_LSE128
> +	.arch	armv9-a+lse128
> +#else
>  	.arch	armv8-a+lse
> +#endif
>  
> -#define LSE2(NAME) 	NAME##_i1
> +#define LSE128(NAME) 	NAME##_i1
> +#define LSE2(NAME) 	NAME##_i2
>  #define CORE(NAME) 	NAME
>  
>  #define ENTRY(NAME) ENTRY_FEAT1 (NAME)
> @@ -206,6 +211,31 @@ ENTRY (libat_exchange_16)
>  END (libat_exchange_16)
>  
>  
> +#if HAVE_FEAT_LSE128
> +ENTRY_FEAT (libat_exchange_16, LSE128)
> +	mov	tmp0, x0
> +	mov	res0, in0
> +	mov	res1, in1
> +	cbnz	w4, 1f
> +
> +	/* RELAXED.  */
> +	swpp	res0, res1, [tmp0]
> +	ret
> +1:
> +	cmp	w4, ACQUIRE
> +	b.hi	2f
> +
> +	/* ACQUIRE/CONSUME.  */
> +	swppa	res0, res1, [tmp0]
> +	ret
> +
> +	/* RELEASE/ACQ_REL/SEQ_CST.  */
> +2:	swppal	res0, res1, [tmp0]
> +	ret
> +END_FEAT (libat_exchange_16, LSE128)
> +#endif
> +
> +
>  ENTRY (libat_compare_exchange_16)
>  	ldp	exp0, exp1, [x1]
>  	cbz	w4, 3f
> @@ -399,6 +429,31 @@ ENTRY (libat_fetch_or_16)
>  END (libat_fetch_or_16)
>  
>  
> +#if HAVE_FEAT_LSE128
> +ENTRY_FEAT (libat_fetch_or_16, LSE128)
> +	mov	tmp0, x0
> +	mov	res0, in0
> +	mov	res1, in1
> +	cbnz	w4, 1f
> +
> +	/* RELAXED.  */
> +	ldsetp	res0, res1, [tmp0]
> +	ret
> +1:
> +	cmp	w4, ACQUIRE
> +	b.hi	2f
> +
> +	/* ACQUIRE/CONSUME.  */
> +	ldsetpa	res0, res1, [tmp0]
> +	ret
> +
> +	/* RELEASE/ACQ_REL/SEQ_CST.  */
> +2:	ldsetpal	res0, res1, [tmp0]
> +	ret
> +END_FEAT (libat_fetch_or_16, LSE128)
> +#endif
> +
> +
>  ENTRY (libat_or_fetch_16)
>  	mov	x5, x0
>  	cbnz	w4, 2f
> @@ -421,6 +476,36 @@ ENTRY (libat_or_fetch_16)
>  END (libat_or_fetch_16)
>  
>  
> +#if HAVE_FEAT_LSE128
> +ENTRY_FEAT (libat_or_fetch_16, LSE128)
> +	cbnz	w4, 1f
> +	mov	tmp0, in0
> +	mov	tmp1, in1
> +
> +	/* RELAXED.  */
> +	ldsetp	in0, in1, [x0]
> +	orr	res0, in0, tmp0
> +	orr	res1, in1, tmp1
> +	ret
> +1:
> +	cmp	w4, ACQUIRE
> +	b.hi	2f
> +
> +	/* ACQUIRE/CONSUME.  */
> +	ldsetpa	in0, in1, [x0]
> +	orr	res0, in0, tmp0
> +	orr	res1, in1, tmp1
> +	ret
> +
> +	/* RELEASE/ACQ_REL/SEQ_CST.  */
> +2:	ldsetpal	in0, in1, [x0]
> +	orr	res0, in0, tmp0
> +	orr	res1, in1, tmp1
> +	ret
> +END_FEAT (libat_or_fetch_16, LSE128)
> +#endif
> +
> +
>  ENTRY (libat_fetch_and_16)
>  	mov	x5, x0
>  	cbnz	w4, 2f
> @@ -443,6 +528,32 @@ ENTRY (libat_fetch_and_16)
>  END (libat_fetch_and_16)
>  
>  
> +#if HAVE_FEAT_LSE128
> +ENTRY_FEAT (libat_fetch_and_16, LSE128)
> +	mov	tmp0, x0
> +	mvn	res0, in0
> +	mvn	res1, in1
> +	cbnz	w4, 1f
> +
> +	/* RELAXED.  */
> +	ldclrp	res0, res1, [tmp0]
> +	ret
> +
> +1:
> +	cmp	w4, ACQUIRE
> +	b.hi	2f
> +
> +	/* ACQUIRE/CONSUME.  */
> +	ldclrpa res0, res1, [tmp0]
> +	ret
> +
> +	/* RELEASE/ACQ_REL/SEQ_CST.  */
> +2:	ldclrpal	res0, res1, [tmp0]
> +	ret
> +END_FEAT (libat_fetch_and_16, LSE128)
> +#endif
> +
> +
>  ENTRY (libat_and_fetch_16)
>  	mov	x5, x0
>  	cbnz	w4, 2f
> @@ -465,6 +576,37 @@ ENTRY (libat_and_fetch_16)
>  END (libat_and_fetch_16)
>  
>  
> +#if HAVE_FEAT_LSE128
> +ENTRY_FEAT (libat_and_fetch_16, LSE128)
> +	mvn	tmp0, in0
> +	mvn	tmp0, in1
> +	cbnz	w4, 1f
> +
> +	/* RELAXED.  */
> +	ldclrp	tmp0, tmp1, [x0]
> +	and	res0, tmp0, in0
> +	and	res1, tmp1, in1
> +	ret
> +
> +1:
> +	cmp	w4, ACQUIRE
> +	b.hi	2f
> +
> +	/* ACQUIRE/CONSUME.  */
> +	ldclrpa tmp0, tmp1, [x0]
> +	and	res0, tmp0, in0
> +	and	res1, tmp1, in1
> +	ret
> +
> +	/* RELEASE/ACQ_REL/SEQ_CST.  */
> +2:	ldclrpal	tmp0, tmp1, [x5]
> +	and	res0, tmp0, in0
> +	and	res1, tmp1, in1
> +	ret
> +END_FEAT (libat_and_fetch_16, LSE128)
> +#endif
> +
> +
>  ENTRY (libat_fetch_xor_16)
>  	mov	x5, x0
>  	cbnz	w4, 2f
> @@ -570,6 +712,28 @@ ENTRY (libat_test_and_set_16)
>  END (libat_test_and_set_16)
>  
>  
> +/* Alias entry points which are the same in LSE2 and LSE128.  */
> +
> +#if !HAVE_FEAT_LSE128
> +ALIAS (libat_exchange_16, LSE128, LSE2)
> +ALIAS (libat_fetch_or_16, LSE128, LSE2)
> +ALIAS (libat_fetch_and_16, LSE128, LSE2)
> +ALIAS (libat_or_fetch_16, LSE128, LSE2)
> +ALIAS (libat_and_fetch_16, LSE128, LSE2)
> +#endif
> +ALIAS (libat_load_16, LSE128, LSE2)
> +ALIAS (libat_store_16, LSE128, LSE2)
> +ALIAS (libat_compare_exchange_16, LSE128, LSE2)
> +ALIAS (libat_fetch_add_16, LSE128, LSE2)
> +ALIAS (libat_add_fetch_16, LSE128, LSE2)
> +ALIAS (libat_fetch_sub_16, LSE128, LSE2)
> +ALIAS (libat_sub_fetch_16, LSE128, LSE2)
> +ALIAS (libat_fetch_xor_16, LSE128, LSE2)
> +ALIAS (libat_xor_fetch_16, LSE128, LSE2)
> +ALIAS (libat_fetch_nand_16, LSE128, LSE2)
> +ALIAS (libat_nand_fetch_16, LSE128, LSE2)
> +ALIAS (libat_test_and_set_16, LSE128, LSE2)
> +
>  /* Alias entry points which are the same in baseline and LSE2.  */
>  
>  ALIAS (libat_exchange_16, LSE2, CORE)
> diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
> index 8fd4fe3321a..1bc7d839232 100644
> --- a/libatomic/config/linux/aarch64/host-config.h
> +++ b/libatomic/config/linux/aarch64/host-config.h
> @@ -37,14 +37,17 @@ typedef struct __ifunc_arg_t {
>  
>  #ifdef HWCAP_USCAT
>  # if N == 16
> -#  define IFUNC_COND_1	ifunc1 (hwcap, features)
> +#  define IFUNC_COND_1		(has_lse128 (hwcap, features))
> +#  define IFUNC_COND_2		(has_lse2 (hwcap, features))
> +#  define IFUNC_NCOND(N)	2
>  # else
> -#  define IFUNC_COND_1	(hwcap & HWCAP_ATOMICS)
> +#  define IFUNC_COND_1		(hwcap & HWCAP_ATOMICS)
> +#  define IFUNC_NCOND(N)	1
>  # endif
>  #else
>  #  define IFUNC_COND_1	(false)
> +#  define IFUNC_NCOND(N)	1
>  #endif
> -#define IFUNC_NCOND(N)	(1)
>  
>  #endif /* HAVE_IFUNC */
>  
> @@ -59,7 +62,7 @@ typedef struct __ifunc_arg_t {
>  #define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)
>  
>  static inline bool
> -ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
> +has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
>  {
>    if (hwcap & HWCAP_USCAT)
>      return true;
> @@ -75,6 +78,37 @@ ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
>  
>    return false;
>  }
> +
> +/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic,
> +   bits[23:20].  The expected value is 0b0011.  Check that.  */
> +
> +#define AT_FEAT_FIELD(isar0)	(((isar0) >> 20) & 15)
> +
> +/* Ensure backwards compatibility with glibc <= 2.38.  */
> +#ifndef HWCAP2_LSE128
> +#define HWCAP2_LSE128		(1UL << 47)
> +#endif
> +
> +static inline bool
> +has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
> +{
> +  if (hwcap & _IFUNC_ARG_HWCAP
> +      && features->_hwcap2 & HWCAP2_LSE128)
> +    return true;
> +  /* A clear HWCAP2_LSE128 bit may be just as much a sign of missing HWCAP2
> +     bit support in older kernels as it is of CPU feature absence.  Fall
> +     back to the ID register before concluding that LSE128 is absent.
> +
> +     In the absence of HWCAP_CPUID, we are unable to check for LSE128.  */
> +  if (!(hwcap & HWCAP_CPUID))
> +    return false;
> +  unsigned long isar0;
> +  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
> +  if (AT_FEAT_FIELD (isar0) >= 3)
> +    return true;
> +  return false;
> +}
> +
>  #endif
>  
>  #include_next <host-config.h>
> diff --git a/libatomic/configure b/libatomic/configure
> index d579bab96f8..8ab730d8082 100755
> --- a/libatomic/configure
> +++ b/libatomic/configure
> @@ -656,6 +656,8 @@ LIBAT_BUILD_VERSIONED_SHLIB_FALSE
>  LIBAT_BUILD_VERSIONED_SHLIB_TRUE
>  OPT_LDFLAGS
>  SECTION_LDFLAGS
> +ARCH_AARCH64_HAVE_LSE128_FALSE
> +ARCH_AARCH64_HAVE_LSE128_TRUE
>  SYSROOT_CFLAGS_FOR_TARGET
>  enable_aarch64_lse
>  libtool_VERSION
> @@ -11456,7 +11458,7 @@ else
>    lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>    lt_status=$lt_dlunknown
>    cat > conftest.$ac_ext <<_LT_EOF
> -#line 11459 "configure"
> +#line 11461 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -11562,7 +11564,7 @@ else
>    lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>    lt_status=$lt_dlunknown
>    cat > conftest.$ac_ext <<_LT_EOF
> -#line 11565 "configure"
> +#line 11567 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -14697,6 +14699,57 @@ _ACEOF
>  
>  
>  
> +# Check for target-specific assembly-level support for atomic operations.
> +
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for armv9.4-a LSE128 insn support" >&5
> +$as_echo_n "checking for armv9.4-a LSE128 insn support... " >&6; }
> +if ${libat_cv_have_feat_lse128+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +
> +    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +int
> +main ()
> +{
> +asm(".arch armv9-a+lse128")
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +    if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
> +  (eval $ac_compile) 2>&5
> +  ac_status=$?
> +  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0; }; then
> +      eval libat_cv_have_feat_lse128=yes
> +    else
> +      eval libat_cv_have_feat_lse128=no
> +    fi
> +    rm -f conftest*
> +
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libat_cv_have_feat_lse128" >&5
> +$as_echo "$libat_cv_have_feat_lse128" >&6; }
> +
> +  yesno=`echo $libat_cv_have_feat_lse128 | tr 'yesno' '1  0 '`
> +
> +cat >>confdefs.h <<_ACEOF
> +#define HAVE_FEAT_LSE128 $yesno
> +_ACEOF
> +
> +
> +   if test x$libat_cv_have_feat_lse128 = xyes; then
> +  ARCH_AARCH64_HAVE_LSE128_TRUE=
> +  ARCH_AARCH64_HAVE_LSE128_FALSE='#'
> +else
> +  ARCH_AARCH64_HAVE_LSE128_TRUE='#'
> +  ARCH_AARCH64_HAVE_LSE128_FALSE=
> +fi
> +
> +
> +
>   { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether byte ordering is bigendian" >&5
>  $as_echo_n "checking whether byte ordering is bigendian... " >&6; }
>  if ${ac_cv_c_bigendian+:} false; then :
> @@ -15989,6 +16042,10 @@ if test -z "${ENABLE_DARWIN_AT_RPATH_TRUE}" && test -z "${ENABLE_DARWIN_AT_RPATH
>    as_fn_error $? "conditional \"ENABLE_DARWIN_AT_RPATH\" was never defined.
>  Usually this means the macro was only invoked conditionally." "$LINENO" 5
>  fi
> +if test -z "${ARCH_AARCH64_HAVE_LSE128_TRUE}" && test -z "${ARCH_AARCH64_HAVE_LSE128_FALSE}"; then
> +  as_fn_error $? "conditional \"ARCH_AARCH64_HAVE_LSE128\" was never defined.
> +Usually this means the macro was only invoked conditionally." "$LINENO" 5
> +fi
>  
>  if test -z "${LIBAT_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBAT_BUILD_VERSIONED_SHLIB_FALSE}"; then
>    as_fn_error $? "conditional \"LIBAT_BUILD_VERSIONED_SHLIB\" was never defined.
> diff --git a/libatomic/configure.ac b/libatomic/configure.ac
> index 32a2cdb13ae..85824fa7614 100644
> --- a/libatomic/configure.ac
> +++ b/libatomic/configure.ac
> @@ -206,6 +206,9 @@ LIBAT_FORALL_MODES([LIBAT_HAVE_ATOMIC_CAS])
>  LIBAT_FORALL_MODES([LIBAT_HAVE_ATOMIC_FETCH_ADD])
>  LIBAT_FORALL_MODES([LIBAT_HAVE_ATOMIC_FETCH_OP])
>  
> +# Check for target-specific assembly-level support for atomic operations.
> +LIBAT_TEST_FEAT_AARCH64_LSE128()
> +
>  AC_C_BIGENDIAN
>  # I don't like the default behaviour of WORDS_BIGENDIAN undefined for LE.
>  AH_BOTTOM(

^ permalink raw reply	[flat|nested] 9+ messages in thread
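
Note on the configure machinery: the acinclude.m4 hunk that defines
LIBAT_TEST_FEAT_AARCH64_LSE128 is not quoted in this excerpt.  Judging only
from the generated configure fragment above, a macro along the following
lines would produce that output; the exact definition shipped in the patch
may differ, so read this as a sketch rather than the posted code:

  AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LSE128], [
    AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
                   [libat_cv_have_feat_lse128], [
      dnl Compile-only probe: does the assembler accept +lse128?
      AC_LANG_CONFTEST([AC_LANG_PROGRAM([], [asm(".arch armv9-a+lse128")])])
      if AC_TRY_EVAL(ac_compile); then
        libat_cv_have_feat_lse128=yes
      else
        libat_cv_have_feat_lse128=no
      fi
      rm -f conftest*])
    dnl Map yes/no onto 1/0 for the HAVE_FEAT_LSE128 preprocessor macro.
    yesno=`echo $libat_cv_have_feat_lse128 | tr 'yesno' '1  0 '`
    AC_DEFINE_UNQUOTED([HAVE_FEAT_LSE128], [$yesno],
                       [Have LSE128 support for 16-byte integers.])
    dnl Export an Automake conditional so the Makefile can key off it.
    AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128],
                   [test x$libat_cv_have_feat_lse128 = xyes])])

Because the generated configure aborts if ARCH_AARCH64_HAVE_LSE128 is never
defined, the macro has to be invoked unconditionally from configure.ac, even
on non-AArch64 targets.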

* Re: [PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.
  2024-01-24 17:17 ` [PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements Victor Do Nascimento
@ 2024-01-25 17:41   ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2024-01-25 17:41 UTC (permalink / raw)
  To: Victor Do Nascimento; +Cc: gcc-patches, kyrylo.tkachov, Richard.Earnshaw

Victor Do Nascimento <victor.donascimento@arm.com> writes:
> At present, evaluation of both `has_lse2(hwcap)' and
> `has_lse128(hwcap)' may require issuing an `mrs' instruction to query
> a system register.  When issued from user-space, this instruction
> traps into the kernel, which then returns the value read from the
> system register.  Because the resulting context switch is
> computationally expensive, it is important to forgo the operation
> wherever possible.
>
> In light of this, and given that the architectural features serving
> as prerequisites have long been assigned HWCAP bits by the kernel, we
> can inexpensively query for their availability before attempting to
> read any system registers.  Where one of these early tests fails, we
> know that the main feature of interest (be it LSE2 or LSE128) cannot
> be present, allowing us to return from the function early and skip
> the needlessly expensive kernel-mediated access to system registers.
>
> libatomic/ChangeLog:
>
> 	* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
> 	(has_lse128): Add test for LSE2.

FAOD, the previous OK for this patch still stands.

Thanks,
Richard

> ---
>  libatomic/config/linux/aarch64/host-config.h | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
> index 1bc7d839232..4e354124063 100644
> --- a/libatomic/config/linux/aarch64/host-config.h
> +++ b/libatomic/config/linux/aarch64/host-config.h
> @@ -64,8 +64,13 @@ typedef struct __ifunc_arg_t {
>  static inline bool
>  has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
>  {
> +  /* Check for LSE2.  */
>    if (hwcap & HWCAP_USCAT)
>      return true;
> +  /* No point checking further for atomic 128-bit load/store if LSE
> +     prerequisite not met.  */
> +  if (!(hwcap & HWCAP_ATOMICS))
> +    return false;
>    if (!(hwcap & HWCAP_CPUID))
>      return false;
>  
> @@ -99,9 +104,11 @@ has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
>       bit support in older kernels as it is of CPU feature absence.  Fall
>       back to the ID register before concluding that LSE128 is absent.
>  
> -     In the absence of HWCAP_CPUID, we are unable to check for LSE128.  */
> -  if (!(hwcap & HWCAP_CPUID))
> -    return false;
> +     In the absence of HWCAP_CPUID, we are unable to check for LSE128.
> +     When it is available, verify the LSE2 prerequisite before proceeding.  */
> +  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
> +    return false;
> +
>    unsigned long isar0;
>    asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
>    if (AT_FEAT_FIELD (isar0) >= 3)

^ permalink raw reply	[flat|nested] 9+ messages in thread
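
Taken together with the LSE128 enablement patch quoted earlier in the
thread, the resulting feature probe reads roughly as follows.  This is a
consolidation of the quoted hunks for readability, not a verbatim copy of
the final host-config.h:

  static inline bool
  has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
  {
    /* Prefer the kernel-reported HWCAP2 bit whenever the resolver received
       the extended __ifunc_arg_t argument (signalled by _IFUNC_ARG_HWCAP).  */
    if (hwcap & _IFUNC_ARG_HWCAP
        && features->_hwcap2 & HWCAP2_LSE128)
      return true;
    /* A clear HWCAP2_LSE128 bit may simply mean an older kernel.  Fall back
       to the ID register, but only when CPUID emulation is available and the
       LSE2 prerequisite (HWCAP_USCAT) holds.  */
    if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
      return false;
    unsigned long isar0;
    asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
    if (AT_FEAT_FIELD (isar0) >= 3)
      return true;
    return false;
  }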

end of thread, other threads:[~2024-01-25 17:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-24 17:17 [PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64 Victor Do Nascimento
2024-01-24 17:17 ` [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento
2024-01-25 17:17   ` Richard Sandiford
2024-01-24 17:17 ` [PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver Victor Do Nascimento
2024-01-25 17:28   ` Richard Sandiford
2024-01-24 17:17 ` [PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a Victor Do Nascimento
2024-01-25 17:38   ` Richard Sandiford
2024-01-24 17:17 ` [PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements Victor Do Nascimento
2024-01-25 17:41   ` Richard Sandiford
