public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing
@ 2024-05-16 13:36 Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
  To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

The recent introduction of the optional LSE128 and RCPC3 architectural
extensions to AArch64 has further increased the flexibility of atomic
support in the architecture: many extensions now provide support for
distinct atomic operations, each with different potential applications
in mind.

This has led to maintenance difficulties in Libatomic, in particular
regarding the way the ifunc selector is generated via a series of
macro expansions at compile-time.

Until now, irrespective of the atomic operation in question, all atomic
functions for a particular operand size were expected to have the same
number of ifunc alternatives, meaning that a one-size-fits-all
approach could reasonably be taken for the selector.

This meant that if, hypothetically, for a particular architecture and
operand size one atomic operation were to have three different
implementations associated with different extensions, libatomic would
likewise be required to present three ifunc alternatives for all other
atomic functions.

The consequence of this design choice was the unnecessary use of
function aliasing and the unwieldy code that resulted from it.

This patch series attempts to remedy this issue by making the
preprocessor macros that define the number of ifunc alternatives and
their respective selection functions dependent on the file importing
the ifunc selector-generating framework.

All files are given `LAT_<FILENAME>' macros, defined at the beginning
and undef'd at the end of the file.  It is these macros that are
subsequently used to fine-tune the behaviors of `libatomic_i.h' and
`host-config.h'.

In particular, the definition of the `IFUNC_NCOND(N)' and
`IFUNC_COND_<n>' macros in host-config.h can now be guarded behind
these new file-specific macros, which ultimately control what the
`GEN_SELECTOR(X)' macro in `libatomic_i.h' expands to.  As both of
these headers are imported once per file implementing some atomic
operation, fine-tuned control is now possible.
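
As a rough illustration of the mechanism described above, the sketch
below mimics, within a single translation unit, how an identifier macro
set by the including file could steer what a shared header defines.
Only the `LAT_*' names come from this series; everything prefixed
`SKETCH_' or `sketch_' is hypothetical and is not libatomic code.

```c
#include <assert.h>

/* Stand-in for the top of an implementation file such as load_n.c.  */
#define LAT_LOAD_N

/* Stand-in for the guarded section of a shared header.  The SKETCH_*
   names are hypothetical; the real headers use IFUNC_NCOND and
   IFUNC_COND_<n>.  */
enum sketch_feature { SKETCH_NONE, SKETCH_LSE, SKETCH_LSE2 };

#if defined (LAT_CAS_N)
# define SKETCH_NCOND	1
# define SKETCH_COND_1	SKETCH_LSE
#elif defined (LAT_LOAD_N) || defined (LAT_STORE_N)
# define SKETCH_NCOND	1
# define SKETCH_COND_1	SKETCH_LSE2
#else
# define SKETCH_NCOND	0
# define SKETCH_COND_1	SKETCH_NONE
#endif

/* Number of ifunc alternatives this "file" asked for.  */
int
sketch_ncond (void)
{
  return SKETCH_NCOND;
}

/* Feature gating the first (and here only) alternative.  */
enum sketch_feature
sketch_cond_1 (void)
{
  assert (SKETCH_NCOND <= 1);
  return SKETCH_COND_1;
}

/* Stand-in for the bottom of the implementation file.  */
#undef LAT_LOAD_N
```

Swapping the `#define' at the top for `LAT_CAS_N', or removing it
altogether, changes what the two functions report without touching the
"header" section, which is the kind of per-file control the series is
after.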

Regtested with both `--enable-gnu-indirect-function' and
`--disable-gnu-indirect-function' configurations on an armv9.4-a
target, both with and without LRCPC3 and LSE128 support.

Victor Do Nascimento (4):
  Libatomic: Define per-file identifier macros
  Libatomic: Make ifunc selector behavior contingent on importing file
  Libatomic: Clean up AArch64 ifunc aliasing
  Libatomic: Clean up AArch64 `atomic_16.S' implementation file

 libatomic/cas_n.c                            |   2 +
 libatomic/config/linux/aarch64/atomic_16.S   | 623 +++++++++----------
 libatomic/config/linux/aarch64/host-config.h |  35 +-
 libatomic/exch_n.c                           |   2 +
 libatomic/fadd_n.c                           |   2 +
 libatomic/fand_n.c                           |   2 +
 libatomic/fence.c                            |   2 +
 libatomic/fenv.c                             |   2 +
 libatomic/fior_n.c                           |   2 +
 libatomic/flag.c                             |   2 +
 libatomic/fnand_n.c                          |   2 +
 libatomic/fop_n.c                            |   2 +
 libatomic/fsub_n.c                           |   2 +
 libatomic/fxor_n.c                           |   2 +
 libatomic/gcas.c                             |   2 +
 libatomic/gexch.c                            |   2 +
 libatomic/glfree.c                           |   2 +
 libatomic/gload.c                            |   2 +
 libatomic/gstore.c                           |   2 +
 libatomic/load_n.c                           |   2 +
 libatomic/store_n.c                          |   2 +
 libatomic/tas_n.c                            |   2 +
 22 files changed, 357 insertions(+), 341 deletions(-)

-- 
2.34.1


* [PATCH 1/4] Libatomic: Define per-file identifier macros
  2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file Victor Do Nascimento
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
  To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

In order to facilitate the fine-tuning of how `libatomic_i.h' and
`host-config.h' headers are used by different atomic functions, we
define distinct identifier macros for each file which, in implementing
atomic operations, imports these headers.

The idea is that different parts of these headers could then be
conditionally defined depending on the macros set by the file that
`#include'd them.

Because some file names are generic enough that using them as-is for
macro names (e.g. flag.c -> FLAG) could lead to name clashes with other
macros, all file names first have LAT_ prepended to them such that, for
example, flag.c is assigned the LAT_FLAG macro.

libatomic/ChangeLog:

	* cas_n.c (LAT_CAS_N): New.
	* exch_n.c (LAT_EXCH_N): Likewise.
	* fadd_n.c (LAT_FADD_N): Likewise.
	* fand_n.c (LAT_FAND_N): Likewise.
	* fence.c (LAT_FENCE): Likewise.
	* fenv.c (LAT_FENV): Likewise.
	* fior_n.c (LAT_FIOR_N): Likewise.
	* flag.c (LAT_FLAG): Likewise.
	* fnand_n.c (LAT_FNAND_N): Likewise.
	* fop_n.c (LAT_FOP_N): Likewise.
	* fsub_n.c (LAT_FSUB_N): Likewise.
	* fxor_n.c (LAT_FXOR_N): Likewise.
	* gcas.c (LAT_GCAS): Likewise.
	* gexch.c (LAT_GEXCH): Likewise.
	* glfree.c (LAT_GLFREE): Likewise.
	* gload.c (LAT_GLOAD): Likewise.
	* gstore.c (LAT_GSTORE): Likewise.
	* load_n.c (LAT_LOAD_N): Likewise.
	* store_n.c (LAT_STORE_N): Likewise.
	* tas_n.c (LAT_TAS_N): Likewise.
---
 libatomic/cas_n.c   | 2 ++
 libatomic/exch_n.c  | 2 ++
 libatomic/fadd_n.c  | 2 ++
 libatomic/fand_n.c  | 2 ++
 libatomic/fence.c   | 2 ++
 libatomic/fenv.c    | 2 ++
 libatomic/fior_n.c  | 2 ++
 libatomic/flag.c    | 2 ++
 libatomic/fnand_n.c | 2 ++
 libatomic/fop_n.c   | 2 ++
 libatomic/fsub_n.c  | 2 ++
 libatomic/fxor_n.c  | 2 ++
 libatomic/gcas.c    | 2 ++
 libatomic/gexch.c   | 2 ++
 libatomic/glfree.c  | 2 ++
 libatomic/gload.c   | 2 ++
 libatomic/gstore.c  | 2 ++
 libatomic/load_n.c  | 2 ++
 libatomic/store_n.c | 2 ++
 libatomic/tas_n.c   | 2 ++
 20 files changed, 40 insertions(+)

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index a080b990371..2a6357e48db 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_CAS_N
 #include "libatomic_i.h"
 
 
@@ -122,3 +123,4 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, UTYPE newval,
 #endif
 
 EXPORT_ALIAS (SIZE(compare_exchange));
+#undef LAT_CAS_N
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index e5ff80769b9..184d3de1009 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_EXCH_N
 #include "libatomic_i.h"
 
 
@@ -126,3 +127,4 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel UNUSED)
 #endif
 
 EXPORT_ALIAS (SIZE(exchange));
+#undef LAT_EXCH_N
diff --git a/libatomic/fadd_n.c b/libatomic/fadd_n.c
index bc15b8bc0e6..32b75cec654 100644
--- a/libatomic/fadd_n.c
+++ b/libatomic/fadd_n.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_FADD_N
 #include <libatomic_i.h>
 
 #define NAME	add
@@ -43,3 +44,4 @@
 #endif
 
 #include "fop_n.c"
+#undef LAT_FADD_N
diff --git a/libatomic/fand_n.c b/libatomic/fand_n.c
index ffe9ed8700f..9eab55bcd72 100644
--- a/libatomic/fand_n.c
+++ b/libatomic/fand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FAND_N
 #define NAME	and
 #define OP(X,Y)	((X) & (Y))
 #include "fop_n.c"
+#undef LAT_FAND_N
diff --git a/libatomic/fence.c b/libatomic/fence.c
index a9b1e280c5a..4022194a57a 100644
--- a/libatomic/fence.c
+++ b/libatomic/fence.c
@@ -21,6 +21,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_FENCE
 #include "libatomic_i.h"
 
 #include <stdatomic.h>
@@ -43,3 +44,4 @@ void
 {
   atomic_signal_fence (order);
 }
+#undef LAT_FENCE
diff --git a/libatomic/fenv.c b/libatomic/fenv.c
index 41f187c1f85..dccad356a31 100644
--- a/libatomic/fenv.c
+++ b/libatomic/fenv.c
@@ -21,6 +21,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_FENV
 #include "libatomic_i.h"
 
 #ifdef HAVE_FENV_H
@@ -70,3 +71,4 @@ __atomic_feraiseexcept (int excepts __attribute__ ((unused)))
     }
 #endif
 }
+#undef LAT_FENV
diff --git a/libatomic/fior_n.c b/libatomic/fior_n.c
index 55d0d66b469..2b58d4805d6 100644
--- a/libatomic/fior_n.c
+++ b/libatomic/fior_n.c
@@ -1,3 +1,5 @@
+#define LAT_FIOR_N
 #define NAME	or
 #define OP(X,Y)	((X) | (Y))
 #include "fop_n.c"
+#undef LAT_FIOR_N
diff --git a/libatomic/flag.c b/libatomic/flag.c
index e4a5a27819a..8afd80c9130 100644
--- a/libatomic/flag.c
+++ b/libatomic/flag.c
@@ -21,6 +21,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_FLAG
 #include "libatomic_i.h"
 
 #include <stdatomic.h>
@@ -62,3 +63,4 @@ void
 {
   return atomic_flag_clear_explicit (object, order);
 }
+#undef LAT_FLAG
diff --git a/libatomic/fnand_n.c b/libatomic/fnand_n.c
index a3c98c70494..84a02709cbb 100644
--- a/libatomic/fnand_n.c
+++ b/libatomic/fnand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FNAND_N
 #define NAME	nand
 #define OP(X,Y)	~((X) & (Y))
 #include "fop_n.c"
+#undef LAT_FNAND_N
diff --git a/libatomic/fop_n.c b/libatomic/fop_n.c
index f5eb07e859f..fefff3a57a4 100644
--- a/libatomic/fop_n.c
+++ b/libatomic/fop_n.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_FOP_N
 #include <libatomic_i.h>
 
 
@@ -198,3 +199,4 @@ SIZE(C3(libat_,NAME,_fetch)) (UTYPE *mptr, UTYPE opval, int smodel UNUSED)
 
 EXPORT_ALIAS (SIZE(C2(fetch_,NAME)));
 EXPORT_ALIAS (SIZE(C2(NAME,_fetch)));
+#undef LAT_FOP_N
diff --git a/libatomic/fsub_n.c b/libatomic/fsub_n.c
index e9f8d7d25e1..49b375a543f 100644
--- a/libatomic/fsub_n.c
+++ b/libatomic/fsub_n.c
@@ -1,3 +1,5 @@
+#define LAT_FSUB_N
 #define NAME	sub
 #define OP(X,Y)	((X) - (Y))
 #include "fop_n.c"
+#undef LAT_FSUB_N
diff --git a/libatomic/fxor_n.c b/libatomic/fxor_n.c
index 0f2d9624127..d9a91bc3b23 100644
--- a/libatomic/fxor_n.c
+++ b/libatomic/fxor_n.c
@@ -1,3 +1,5 @@
+#define LAT_FXOR_N
 #define NAME	xor
 #define OP(X,Y)	((X) ^ (Y))
 #include "fop_n.c"
+#undef LAT_FXOR_N
diff --git a/libatomic/gcas.c b/libatomic/gcas.c
index 21d11305f1e..af4a5f5c5ee 100644
--- a/libatomic/gcas.c
+++ b/libatomic/gcas.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_GCAS
 #include "libatomic_i.h"
 
 
@@ -118,3 +119,4 @@ libat_compare_exchange (size_t n, void *mptr, void *eptr, void *dptr,
 }
 
 EXPORT_ALIAS (compare_exchange);
+#undef LAT_GCAS
diff --git a/libatomic/gexch.c b/libatomic/gexch.c
index 6233759a2e8..afb054c0ef2 100644
--- a/libatomic/gexch.c
+++ b/libatomic/gexch.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_GEXCH
 #include "libatomic_i.h"
 
 
@@ -142,3 +143,4 @@ libat_exchange (size_t n, void *mptr, void *vptr, void *rptr, int smodel)
 }
 
 EXPORT_ALIAS (exchange);
+#undef LAT_GEXCH
diff --git a/libatomic/glfree.c b/libatomic/glfree.c
index 58a45126194..1051ceb81cd 100644
--- a/libatomic/glfree.c
+++ b/libatomic/glfree.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_GLFREE
 #include "libatomic_i.h"
 
 /* Accesses with a power-of-two size are not lock-free if we don't have an
@@ -80,3 +81,4 @@ libat_is_lock_free (size_t n, void *ptr)
 }
 
 EXPORT_ALIAS (is_lock_free);
+#undef LAT_GLFREE
diff --git a/libatomic/gload.c b/libatomic/gload.c
index 4b3198cc5ae..9b499672161 100644
--- a/libatomic/gload.c
+++ b/libatomic/gload.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_GLOAD
 #include "libatomic_i.h"
 
 
@@ -98,3 +99,4 @@ libat_load (size_t n, void *mptr, void *rptr, int smodel)
 }
 
 EXPORT_ALIAS (load);
+#undef LAT_GLOAD
diff --git a/libatomic/gstore.c b/libatomic/gstore.c
index 505a7b9b2df..b2636059bd8 100644
--- a/libatomic/gstore.c
+++ b/libatomic/gstore.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_GSTORE
 #include "libatomic_i.h"
 
 
@@ -106,3 +107,4 @@ libat_store (size_t n, void *mptr, void *vptr, int smodel)
 }
 
 EXPORT_ALIAS (store);
+#undef LAT_GSTORE
diff --git a/libatomic/load_n.c b/libatomic/load_n.c
index 7513f191833..657c8e23ed2 100644
--- a/libatomic/load_n.c
+++ b/libatomic/load_n.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_LOAD_N
 #include "libatomic_i.h"
 
 
@@ -113,3 +114,4 @@ SIZE(libat_load) (UTYPE *mptr, int smodel)
 #endif
 
 EXPORT_ALIAS (SIZE(load));
+#undef LAT_LOAD_N
diff --git a/libatomic/store_n.c b/libatomic/store_n.c
index d8ab5e69a50..079e22d75ba 100644
--- a/libatomic/store_n.c
+++ b/libatomic/store_n.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_STORE_N
 #include "libatomic_i.h"
 
 
@@ -110,3 +111,4 @@ SIZE(libat_store) (UTYPE *mptr, UTYPE newval, int smodel)
 #endif
 
 EXPORT_ALIAS (SIZE(store));
+#undef LAT_STORE_N
diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c
index 4a01cd2a5c8..9321b3a4e02 100644
--- a/libatomic/tas_n.c
+++ b/libatomic/tas_n.c
@@ -22,6 +22,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define LAT_TAS_N
 #include "libatomic_i.h"
 
 
@@ -113,3 +114,4 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel UNUSED)
 #endif
 
 EXPORT_ALIAS (SIZE(test_and_set));
+#undef LAT_TAS_N
-- 
2.34.1


* [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file
  2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file Victor Do Nascimento
  3 siblings, 0 replies; 5+ messages in thread
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
  To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

By querying previously-defined file-identifier macros, `host-config.h'
is able to get information about its environment and, based on this
information, select more appropriate function-specific ifunc
selectors.  This reduces the number of unnecessary feature tests that
need to be carried out in order to find the best atomic implementation
for a function at run-time.

An immediate benefit of this is that we can further fine-tune the
architectural requirements for each atomic function without risk of
incurring the maintenance and runtime-performance penalties of having
to maintain an ifunc selector with a huge number of alternatives, most
of which are irrelevant for any particular function.  Consequently,
for AArch64 targets, we relax the architectural requirements of
`compare_exchange_16', which now requires only LSE as opposed to the
newer LSE2.
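
To make the run-time effect concrete, here is a minimal sketch of a
per-function selector that needs only one feature test.  It is an
illustration under stated assumptions, not the code the macros in
`libatomic_i.h' generate: the capability bits, the `sketch_' names and
the dummy implementations are all hypothetical.

```c
#include <assert.h>

/* Hypothetical capability bits; the real code tests the kernel-provided
   hwcap word via HWCAP_ATOMICS, has_lse2 () and has_lse128 ().  */
#define SKETCH_CAP_LSE		(1UL << 0)
#define SKETCH_CAP_LSE2		(1UL << 1)
#define SKETCH_CAP_LSE128	(1UL << 2)

typedef int (*sketch_impl) (void);

/* Dummy implementations returning a tag identifying themselves.  */
int sketch_cas16_core (void) { return 0; }
int sketch_cas16_lse (void)  { return 1; }

/* Per-function selector: compare-exchange has a single alternative, so
   exactly one feature test is needed, and it only asks for LSE.  */
sketch_impl
sketch_select_cas16 (unsigned long caps)
{
  if (caps & SKETCH_CAP_LSE)
    return sketch_cas16_lse;
  return sketch_cas16_core;
}

/* Resolve, then call through the chosen pointer.  */
int
sketch_call_cas16 (unsigned long caps)
{
  sketch_impl impl = sketch_select_cas16 (caps);
  assert (impl != 0);
  return impl ();
}
```

A selector shared by every 16-byte operation would instead have to walk
through the LSE128 and LSE2 tests even for functions that have no such
variants.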

The new flexibility provided by this approach also means that certain
functions can now be called directly, doing away with ifunc selectors
altogether when only a single implementation is available for them on
a given target.  As per the macro expansion framework laid out in
`libatomic_i.h', such functions should have their names prefixed with
`__atomic_' as opposed to `libat_'.  This is the same prefix applied
to function names when Libatomic is configured with
`--disable-gnu-indirect-function'.

To achieve this, these functions unconditionally apply the aliasing
rule that at present is applied only when libatomic is built without
ifunc support, making the default `libat_##NAME' accessible via the
equivalent `__atomic_##NAME' too.  This is done with the new
`ENTRY_ALIASED' macro.
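
The aliasing idea can be sketched in C with the GNU `alias' function
attribute.  This is only an analogy: the patch itself does the aliasing
in assembly via `ENTRY2' and `ALIAS', and the function names below are
hypothetical stand-ins for a `libat_' symbol and its `__atomic_'
counterpart.

```c
#include <assert.h>

/* Hypothetical core implementation standing in for a `libat_' symbol.
   It is deliberately NOT atomic: only the naming is being illustrated.  */
unsigned long
sketch_libat_fetch_add (unsigned long *ptr, unsigned long val)
{
  assert (ptr != 0);
  unsigned long old = *ptr;
  *ptr = old + val;
  return old;
}

/* Second name bound to the same code, standing in for the `__atomic_'
   entry point.  The `alias' attribute is a GNU extension and needs a
   target with symbol-alias support (e.g. ELF).  */
unsigned long sketch_atomic_fetch_add (unsigned long *, unsigned long)
  __attribute__ ((alias ("sketch_libat_fetch_add")));
```

Callers of either name reach the same instructions, so no selector is
involved for such a function.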

libatomic/ChangeLog:

	* config/linux/aarch64/atomic_16.S (LSE): New.
	(ENTRY_ALIASED): Likewise.
	* config/linux/aarch64/host-config.h (LSE_ATOP): New.
	(LSE2_ATOP): Likewise.
	(LSE128_ATOP): Likewise.
	(IFUNC_COND_1): Make its definition conditional on above 3
	macros.
	(IFUNC_NCOND): Likewise.
---
 libatomic/config/linux/aarch64/atomic_16.S   | 31 +++++++++--------
 libatomic/config/linux/aarch64/host-config.h | 35 ++++++++++++++++----
 2 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index b63e97ac5a2..1517e9e78df 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -54,17 +54,20 @@
 #endif
 
 #define LSE128(NAME)	libat_##NAME##_i1
-#define LSE2(NAME)	libat_##NAME##_i2
+#define LSE(NAME)	libat_##NAME##_i1
+#define LSE2(NAME)	libat_##NAME##_i1
 #define CORE(NAME)	libat_##NAME
 #define ATOMIC(NAME)	__atomic_##NAME
 
+/* Emit __atomic_* entrypoints if no ifuncs.  */
+#define ENTRY_ALIASED(NAME)	ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+
 #if HAVE_IFUNC
 # define ENTRY(NAME)		ENTRY2 (CORE (NAME), )
 # define ENTRY_FEAT(NAME, FEAT) ENTRY2 (FEAT (NAME), )
 # define END_FEAT(NAME, FEAT)	END2 (FEAT (NAME))
 #else
-/* Emit __atomic_* entrypoints if no ifuncs.  */
-# define ENTRY(NAME)	ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+# define ENTRY(NAME)	ENTRY_ALIASED (NAME)
 #endif
 
 #define END(NAME)		END2 (CORE (NAME))
@@ -299,7 +302,7 @@ END (compare_exchange_16)
 
 
 #if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE2)
+ENTRY_FEAT (compare_exchange_16, LSE)
 	ldp	exp0, exp1, [x1]
 	mov	tmp0, exp0
 	mov	tmp1, exp1
@@ -332,11 +335,11 @@ ENTRY_FEAT (compare_exchange_16, LSE2)
 	/* ACQ_REL/SEQ_CST.  */
 4:	caspal	exp0, exp1, in0, in1, [x0]
 	b	0b
-END_FEAT (compare_exchange_16, LSE2)
+END_FEAT (compare_exchange_16, LSE)
 #endif
 
 
-ENTRY (fetch_add_16)
+ENTRY_ALIASED (fetch_add_16)
 	mov	x5, x0
 	cbnz	w4, 2f
 
@@ -358,7 +361,7 @@ ENTRY (fetch_add_16)
 END (fetch_add_16)
 
 
-ENTRY (add_fetch_16)
+ENTRY_ALIASED (add_fetch_16)
 	mov	x5, x0
 	cbnz	w4, 2f
 
@@ -380,7 +383,7 @@ ENTRY (add_fetch_16)
 END (add_fetch_16)
 
 
-ENTRY (fetch_sub_16)
+ENTRY_ALIASED (fetch_sub_16)
 	mov	x5, x0
 	cbnz	w4, 2f
 
@@ -402,7 +405,7 @@ ENTRY (fetch_sub_16)
 END (fetch_sub_16)
 
 
-ENTRY (sub_fetch_16)
+ENTRY_ALIASED (sub_fetch_16)
 	mov	x5, x0
 	cbnz	w4, 2f
 
@@ -624,7 +627,7 @@ END_FEAT (and_fetch_16, LSE128)
 #endif
 
 
-ENTRY (fetch_xor_16)
+ENTRY_ALIASED (fetch_xor_16)
 	mov	x5, x0
 	cbnz	w4, 2f
 
@@ -646,7 +649,7 @@ ENTRY (fetch_xor_16)
 END (fetch_xor_16)
 
 
-ENTRY (xor_fetch_16)
+ENTRY_ALIASED (xor_fetch_16)
 	mov	x5, x0
 	cbnz	w4, 2f
 
@@ -668,7 +671,7 @@ ENTRY (xor_fetch_16)
 END (xor_fetch_16)
 
 
-ENTRY (fetch_nand_16)
+ENTRY_ALIASED (fetch_nand_16)
 	mov	x5, x0
 	mvn	in0, in0
 	mvn	in1, in1
@@ -692,7 +695,7 @@ ENTRY (fetch_nand_16)
 END (fetch_nand_16)
 
 
-ENTRY (nand_fetch_16)
+ENTRY_ALIASED (nand_fetch_16)
 	mov	x5, x0
 	mvn	in0, in0
 	mvn	in1, in1
@@ -718,7 +721,7 @@ END (nand_fetch_16)
 
 /* __atomic_test_and_set is always inlined, so this entry is unused and
    only required for completeness.  */
-ENTRY (test_and_set_16)
+ENTRY_ALIASED (test_and_set_16)
 
 	/* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
 	mov	x5, x0
diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index e1a699948f4..6e010594a6c 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -48,15 +48,36 @@ typedef struct __ifunc_arg_t {
 # define _IFUNC_ARG_HWCAP (1ULL << 62)
 #endif
 
-#if N == 16
-# define IFUNC_COND_1		(has_lse128 (hwcap, features))
-# define IFUNC_COND_2		(has_lse2 (hwcap, features))
-# define IFUNC_NCOND(N)	2
-#else
-# define IFUNC_COND_1		(hwcap & HWCAP_ATOMICS)
-# define IFUNC_NCOND(N)	1
+/* From the file which imported `host-config.h' we can ascertain which
+   architectural extension provides relevant atomic support.  From this,
+   we can proceed to tweak the ifunc selector behavior.  */
+#if defined (LAT_CAS_N)
+# define LSE_ATOP
+#elif defined (LAT_LOAD_N) || defined (LAT_STORE_N)
+# define LSE2_ATOP
+#elif defined (LAT_EXCH_N) || defined (LAT_FIOR_N) || defined (LAT_FAND_N)
+# define LSE128_ATOP
 #endif
 
+# if N == 16
+#  if defined (LSE_ATOP)
+#   define IFUNC_NCOND(N)	1
+#   define IFUNC_COND_1	(hwcap & HWCAP_ATOMICS)
+#  elif defined (LSE2_ATOP)
+#   define IFUNC_NCOND(N)	1
+#   define IFUNC_COND_1	(has_lse2 (hwcap, features))
+#  elif  HAVE_FEAT_LSE128 && defined (LSE128_ATOP)
+#   define IFUNC_NCOND(N)	1
+#   define IFUNC_COND_1	(has_lse128 (hwcap, features))
+#  else
+#   define IFUNC_NCOND(N)	0
+#   define IFUNC_ALT		1
+#  endif
+# else
+#  define IFUNC_COND_1		(hwcap & HWCAP_ATOMICS)
+#  define IFUNC_NCOND(N)	1
+# endif
+
 #define MIDR_IMPLEMENTOR(midr)	(((midr) >> 24) & 255)
 #define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)
 
-- 
2.34.1


* [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing
  2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
  2024-05-16 13:36 ` [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file Victor Do Nascimento
  3 siblings, 0 replies; 5+ messages in thread
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
  To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

Following improvements to the way ifuncs are selected based on
detected architectural features, we are able to do away with many of
the aliases that were previously needed for subsets of atomic
functions that were not implemented in a given extension.

An example makes this clearer.  Previously, LSE128 functions carried
the suffix _i1 and LSE2 functions the suffix _i2.

Using a single ifunc selector for all atomic functions meant that if
LSE128 was detected, the _i1 function variant would be used
indiscriminately, irrespective of whether or not a function had an
LSE128-specific implementation.  Aliasing was thus needed to redirect
calls to these missing functions to their _i2 LSE2 alternatives.

The more architectural extensions for which support was added, the
more complex the aliasing chain became.

With the per-file configuration of ifuncs, we do away with the need
for such aliasing.
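
The difference between the two schemes can be modelled with tables of
function pointers.  The following is a simplified model under loud
assumptions: the struct layouts, the `sketch_' names and the selector
functions are invented for illustration and do not reflect how the
symbols are actually resolved.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_impl) (void);

/* Dummy implementations, tagged by the variant they represent.  */
int sketch_core (void)   { return 0; }
int sketch_lse2 (void)   { return 2; }
int sketch_lse128 (void) { return 128; }

/* Old scheme: every operation carries the same three slots
   (_i1 = LSE128, _i2 = LSE2, core), so missing variants must be
   filled with aliases to the next-best implementation.  */
struct sketch_old_op { sketch_impl i1, i2, core; };

const struct sketch_old_op sketch_old_exchange
  = { sketch_lse128, sketch_core /* _i2 -> core */, sketch_core };
const struct sketch_old_op sketch_old_load
  = { sketch_lse2 /* _i1 -> _i2 */, sketch_lse2, sketch_core };
const struct sketch_old_op sketch_old_fetch_add
  = { sketch_core /* _i1 -> _i2 -> core */, sketch_core, sketch_core };

sketch_impl
sketch_old_select (const struct sketch_old_op *op, int lse128, int lse2)
{
  assert (op != NULL);
  return lse128 ? op->i1 : lse2 ? op->i2 : op->core;
}

/* New scheme: each operation names at most one real alternative, so no
   filler entries are required.  ALT is NULL when only core exists.  */
struct sketch_new_op { sketch_impl alt, core; };

const struct sketch_new_op sketch_new_load = { sketch_lse2, sketch_core };
const struct sketch_new_op sketch_new_fetch_add = { NULL, sketch_core };

sketch_impl
sketch_new_select (const struct sketch_new_op *op, int feature_present)
{
  assert (op != NULL);
  return (op->alt != NULL && feature_present) ? op->alt : op->core;
}
```

In the old tables, five of the nine slots are fillers; in the new ones,
every non-NULL entry is a distinct implementation.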

libatomic/ChangeLog:

	* config/linux/aarch64/atomic_16.S: Remove unnecessary
	aliasing.
---
 libatomic/config/linux/aarch64/atomic_16.S | 41 ----------------------
 1 file changed, 41 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 1517e9e78df..16ff03057ab 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -732,47 +732,6 @@ ENTRY_ALIASED (test_and_set_16)
 END (test_and_set_16)
 
 
-/* Alias entry points which are the same in LSE2 and LSE128.  */
-
-#if HAVE_IFUNC
-# if !HAVE_FEAT_LSE128
-ALIAS (exchange_16, LSE128, LSE2)
-ALIAS (fetch_or_16, LSE128, LSE2)
-ALIAS (fetch_and_16, LSE128, LSE2)
-ALIAS (or_fetch_16, LSE128, LSE2)
-ALIAS (and_fetch_16, LSE128, LSE2)
-# endif
-ALIAS (load_16, LSE128, LSE2)
-ALIAS (store_16, LSE128, LSE2)
-ALIAS (compare_exchange_16, LSE128, LSE2)
-ALIAS (fetch_add_16, LSE128, LSE2)
-ALIAS (add_fetch_16, LSE128, LSE2)
-ALIAS (fetch_sub_16, LSE128, LSE2)
-ALIAS (sub_fetch_16, LSE128, LSE2)
-ALIAS (fetch_xor_16, LSE128, LSE2)
-ALIAS (xor_fetch_16, LSE128, LSE2)
-ALIAS (fetch_nand_16, LSE128, LSE2)
-ALIAS (nand_fetch_16, LSE128, LSE2)
-ALIAS (test_and_set_16, LSE128, LSE2)
-
-/* Alias entry points which are the same in baseline and LSE2.  */
-
-ALIAS (exchange_16, LSE2, CORE)
-ALIAS (fetch_add_16, LSE2, CORE)
-ALIAS (add_fetch_16, LSE2, CORE)
-ALIAS (fetch_sub_16, LSE2, CORE)
-ALIAS (sub_fetch_16, LSE2, CORE)
-ALIAS (fetch_or_16, LSE2, CORE)
-ALIAS (or_fetch_16, LSE2, CORE)
-ALIAS (fetch_and_16, LSE2, CORE)
-ALIAS (and_fetch_16, LSE2, CORE)
-ALIAS (fetch_xor_16, LSE2, CORE)
-ALIAS (xor_fetch_16, LSE2, CORE)
-ALIAS (fetch_nand_16, LSE2, CORE)
-ALIAS (nand_fetch_16, LSE2, CORE)
-ALIAS (test_and_set_16, LSE2, CORE)
-#endif
-
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
 #define FEATURE_1_AND 0xc0000000
 #define FEATURE_1_BTI 1
-- 
2.34.1


* [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file
  2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
                   ` (2 preceding siblings ...)
  2024-05-16 13:36 ` [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
  3 siblings, 0 replies; 5+ messages in thread
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
  To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento

At present, `atomic_16.S' groups different implementations of the
same function together in the file.  For example, the LSE128
implementation of `exchange_16' follows on immediately from its core
implementation, as does the `fetch_or_16' LSE128 implementation.

Such architectural extension-dependent implementations depend on both
ifunc and assembler support.  They may therefore conceivably be guarded
by two preprocessor conditionals, e.g. `#if HAVE_IFUNC' and `#if
HAVE_FEAT_LSE128'.

Having to apply these guards on a per-function basis adds unnecessary
clutter to the file and makes its maintenance more error-prone.

We therefore reorganize the layout of the file in such a way that all
core implementations needing no `#ifdef's are placed first, followed
by all ifunc-dependent implementations, which can all be guarded by a
single `#if HAVE_IFUNC'.  Within the guard, these are then subdivided
and organized according to architectural extension requirements such
that in the case of LSE128-specific functions, for example, they can
all be guarded by a single `#if HAVE_FEAT_LSE128', greatly reducing
the overall number of required `#ifdef' macros.
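
The resulting layout can be pictured with the following C sketch, in
which the guards nest rather than repeat per function.  The
`SKETCH_HAVE_*' values are hypothetical (in libatomic the corresponding
macros come from `auto-config.h'), and the dummy functions merely mark
where each group of implementations would sit.

```c
#include <assert.h>

/* Hypothetical configuration for this sketch.  */
#define SKETCH_HAVE_IFUNC	1
#define SKETCH_HAVE_FEAT_LSE128	0

/* 1. Core implementations: no guards needed.  */
int sketch_exchange_core (void) { return 0; }

#if SKETCH_HAVE_IFUNC
/* 2. Everything that only makes sense with ifunc support.  */
int sketch_load_lse2 (void) { return 2; }

# if SKETCH_HAVE_FEAT_LSE128
/* 3. Everything that additionally needs assembler support for LSE128.  */
int sketch_exchange_lse128 (void) { return 128; }
# endif
#endif

/* Count how many groups were compiled in under this configuration.  */
int
sketch_variant_count (void)
{
  int n = 1;			/* Core is always present.  */
#if SKETCH_HAVE_IFUNC
  n += 1;
# if SKETCH_HAVE_FEAT_LSE128
  n += 1;
# endif
#endif
  assert (n >= 1);
  return n;
}
```

Adding a further LSE128-only function then means placing it inside the
existing inner guard rather than introducing a new pair of conditionals.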

libatomic/ChangeLog:

	* config/linux/aarch64/atomic_16.S: Reshuffle functions.
---
 libatomic/config/linux/aarch64/atomic_16.S | 583 ++++++++++-----------
 1 file changed, 288 insertions(+), 295 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 16ff03057ab..27363f82b75 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,15 +40,12 @@
 
 #include "auto-config.h"
 
-#if !HAVE_IFUNC
-# undef HAVE_FEAT_LSE128
-# define HAVE_FEAT_LSE128 0
-#endif
-
-#define HAVE_FEAT_LSE2	HAVE_IFUNC
-
-#if HAVE_FEAT_LSE128
+#if HAVE_IFUNC
+# if HAVE_FEAT_LSE128
 	.arch	armv9-a+lse128
+# else
+	.arch	armv8-a+lse
+# endif
 #else
 	.arch	armv8-a+lse
 #endif
@@ -124,6 +121,8 @@ NAME:				\
 #define ACQ_REL 4
 #define SEQ_CST 5
 
+/* Core atomic operation implementations.  These are available irrespective of
+   ifunc support or the presence of additional architectural extensions.  */
 
 ENTRY (load_16)
 	mov	x5, x0
@@ -143,31 +142,6 @@ ENTRY (load_16)
 END (load_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (load_16, LSE2)
-	cbnz	w1, 1f
-
-	/* RELAXED.  */
-	ldp	res0, res1, [x0]
-	ret
-1:
-	cmp	w1, SEQ_CST
-	b.eq	2f
-
-	/* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
-	ldp	res0, res1, [x0]
-	dmb	ishld
-	ret
-
-	/* SEQ_CST.  */
-2:	ldar	tmp0, [x0]	/* Block reordering with Store-Release instr.  */
-	ldp	res0, res1, [x0]
-	dmb	ishld
-	ret
-END_FEAT (load_16, LSE2)
-#endif
-
-
 ENTRY (store_16)
 	cbnz	w4, 2f
 
@@ -185,23 +159,6 @@ ENTRY (store_16)
 END (store_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (store_16, LSE2)
-	cbnz	w4, 1f
-
-	/* RELAXED.  */
-	stp	in0, in1, [x0]
-	ret
-
-	/* RELEASE/SEQ_CST.  */
-1:	ldxp	xzr, tmp0, [x0]
-	stlxp	w4, in0, in1, [x0]
-	cbnz	w4, 1b
-	ret
-END_FEAT (store_16, LSE2)
-#endif
-
-
 ENTRY (exchange_16)
 	mov	x5, x0
 	cbnz	w4, 2f
@@ -229,31 +186,6 @@ ENTRY (exchange_16)
 END (exchange_16)
 
 
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (exchange_16, LSE128)
-	mov	tmp0, x0
-	mov	res0, in0
-	mov	res1, in1
-	cbnz	w4, 1f
-
-	/* RELAXED.  */
-	swpp	res0, res1, [tmp0]
-	ret
-1:
-	cmp	w4, ACQUIRE
-	b.hi	2f
-
-	/* ACQUIRE/CONSUME.  */
-	swppa	res0, res1, [tmp0]
-	ret
-
-	/* RELEASE/ACQ_REL/SEQ_CST.  */
-2:	swppal	res0, res1, [tmp0]
-	ret
-END_FEAT (exchange_16, LSE128)
-#endif
-
-
 ENTRY (compare_exchange_16)
 	ldp	exp0, exp1, [x1]
 	cbz	w4, 3f
@@ -301,43 +233,97 @@ ENTRY (compare_exchange_16)
 END (compare_exchange_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE)
-	ldp	exp0, exp1, [x1]
-	mov	tmp0, exp0
-	mov	tmp1, exp1
-	cbz	w4, 2f
-	cmp	w4, RELEASE
-	b.hs	3f
+ENTRY (fetch_or_16)
+	mov	x5, x0
+	cbnz	w4, 2f
 
-	/* ACQUIRE/CONSUME.  */
-	caspa	exp0, exp1, in0, in1, [x0]
-0:
-	cmp	exp0, tmp0
-	ccmp	exp1, tmp1, 0, eq
-	bne	1f
-	mov	x0, 1
+	/* RELAXED.  */
+1:	ldxp	res0, res1, [x5]
+	orr	tmp0, res0, in0
+	orr	tmp1, res1, in1
+	stxp	w4, tmp0, tmp1, [x5]
+	cbnz	w4, 1b
 	ret
-1:
-	stp	exp0, exp1, [x1]
-	mov	x0, 0
+
+	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldaxp	res0, res1, [x5]
+	orr	tmp0, res0, in0
+	orr	tmp1, res1, in1
+	stlxp	w4, tmp0, tmp1, [x5]
+	cbnz	w4, 2b
 	ret
+END (fetch_or_16)
+
+
+ENTRY (or_fetch_16)
+	mov	x5, x0
+	cbnz	w4, 2f
 
 	/* RELAXED.  */
-2:	casp	exp0, exp1, in0, in1, [x0]
-	b	0b
+1:	ldxp	res0, res1, [x5]
+	orr	res0, res0, in0
+	orr	res1, res1, in1
+	stxp	w4, res0, res1, [x5]
+	cbnz	w4, 1b
+	ret
 
-	/* RELEASE.  */
-3:	b.hi	4f
-	caspl	exp0, exp1, in0, in1, [x0]
-	b	0b
+	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldaxp	res0, res1, [x5]
+	orr	res0, res0, in0
+	orr	res1, res1, in1
+	stlxp	w4, res0, res1, [x5]
+	cbnz	w4, 2b
+	ret
+END (or_fetch_16)
+
+
+ENTRY (fetch_and_16)
+	mov	x5, x0
+	cbnz	w4, 2f
+
+	/* RELAXED.  */
+1:	ldxp	res0, res1, [x5]
+	and	tmp0, res0, in0
+	and	tmp1, res1, in1
+	stxp	w4, tmp0, tmp1, [x5]
+	cbnz	w4, 1b
+	ret
+
+	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldaxp	res0, res1, [x5]
+	and	tmp0, res0, in0
+	and	tmp1, res1, in1
+	stlxp	w4, tmp0, tmp1, [x5]
+	cbnz	w4, 2b
+	ret
+END (fetch_and_16)
+
+
+ENTRY (and_fetch_16)
+	mov	x5, x0
+	cbnz	w4, 2f
+
+	/* RELAXED.  */
+1:	ldxp	res0, res1, [x5]
+	and	res0, res0, in0
+	and	res1, res1, in1
+	stxp	w4, res0, res1, [x5]
+	cbnz	w4, 1b
+	ret
+
+	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldaxp	res0, res1, [x5]
+	and	res0, res0, in0
+	and	res1, res1, in1
+	stlxp	w4, res0, res1, [x5]
+	cbnz	w4, 2b
+	ret
+END (and_fetch_16)
 
-	/* ACQ_REL/SEQ_CST.  */
-4:	caspal	exp0, exp1, in0, in1, [x0]
-	b	0b
-END_FEAT (compare_exchange_16, LSE)
-#endif
 
+/* The following functions are currently single-implementation operations,
+   so they are never assigned an ifunc selector.  As such, they must be
+   reachable from __atomic_* entrypoints.  */
 
 ENTRY_ALIASED (fetch_add_16)
 	mov	x5, x0
@@ -427,309 +413,316 @@ ENTRY_ALIASED (sub_fetch_16)
 END (sub_fetch_16)
 
 
-ENTRY (fetch_or_16)
+ENTRY_ALIASED (fetch_xor_16)
 	mov	x5, x0
 	cbnz	w4, 2f
 
 	/* RELAXED.  */
 1:	ldxp	res0, res1, [x5]
-	orr	tmp0, res0, in0
-	orr	tmp1, res1, in1
+	eor	tmp0, res0, in0
+	eor	tmp1, res1, in1
 	stxp	w4, tmp0, tmp1, [x5]
 	cbnz	w4, 1b
 	ret
 
 	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
 2:	ldaxp	res0, res1, [x5]
-	orr	tmp0, res0, in0
-	orr	tmp1, res1, in1
+	eor	tmp0, res0, in0
+	eor	tmp1, res1, in1
 	stlxp	w4, tmp0, tmp1, [x5]
 	cbnz	w4, 2b
 	ret
-END (fetch_or_16)
+END (fetch_xor_16)
 
 
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (fetch_or_16, LSE128)
-	mov	tmp0, x0
-	mov	res0, in0
-	mov	res1, in1
-	cbnz	w4, 1f
+ENTRY_ALIASED (xor_fetch_16)
+	mov	x5, x0
+	cbnz	w4, 2f
 
 	/* RELAXED.  */
-	ldsetp	res0, res1, [tmp0]
-	ret
-1:
-	cmp	w4, ACQUIRE
-	b.hi	2f
-
-	/* ACQUIRE/CONSUME.  */
-	ldsetpa	res0, res1, [tmp0]
+1:	ldxp	res0, res1, [x5]
+	eor	res0, res0, in0
+	eor	res1, res1, in1
+	stxp	w4, res0, res1, [x5]
+	cbnz	w4, 1b
 	ret
 
-	/* RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldsetpal	res0, res1, [tmp0]
+	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldaxp	res0, res1, [x5]
+	eor	res0, res0, in0
+	eor	res1, res1, in1
+	stlxp	w4, res0, res1, [x5]
+	cbnz	w4, 2b
 	ret
-END_FEAT (fetch_or_16, LSE128)
-#endif
+END (xor_fetch_16)
 
 
-ENTRY (or_fetch_16)
+ENTRY_ALIASED (fetch_nand_16)
 	mov	x5, x0
+	mvn	in0, in0
+	mvn	in1, in1
 	cbnz	w4, 2f
 
 	/* RELAXED.  */
 1:	ldxp	res0, res1, [x5]
-	orr	res0, res0, in0
-	orr	res1, res1, in1
-	stxp	w4, res0, res1, [x5]
+	orn	tmp0, in0, res0
+	orn	tmp1, in1, res1
+	stxp	w4, tmp0, tmp1, [x5]
 	cbnz	w4, 1b
 	ret
 
 	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
 2:	ldaxp	res0, res1, [x5]
-	orr	res0, res0, in0
-	orr	res1, res1, in1
-	stlxp	w4, res0, res1, [x5]
+	orn	tmp0, in0, res0
+	orn	tmp1, in1, res1
+	stlxp	w4, tmp0, tmp1, [x5]
 	cbnz	w4, 2b
 	ret
-END (or_fetch_16)
+END (fetch_nand_16)
 
 
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (or_fetch_16, LSE128)
-	cbnz	w4, 1f
-	mov	tmp0, in0
-	mov	tmp1, in1
+ENTRY_ALIASED (nand_fetch_16)
+	mov	x5, x0
+	mvn	in0, in0
+	mvn	in1, in1
+	cbnz	w4, 2f
 
 	/* RELAXED.  */
-	ldsetp	in0, in1, [x0]
-	orr	res0, in0, tmp0
-	orr	res1, in1, tmp1
+1:	ldxp	res0, res1, [x5]
+	orn	res0, in0, res0
+	orn	res1, in1, res1
+	stxp	w4, res0, res1, [x5]
+	cbnz	w4, 1b
 	ret
-1:
-	cmp	w4, ACQUIRE
-	b.hi	2f
 
-	/* ACQUIRE/CONSUME.  */
-	ldsetpa	in0, in1, [x0]
-	orr	res0, in0, tmp0
-	orr	res1, in1, tmp1
+	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldaxp	res0, res1, [x5]
+	orn	res0, in0, res0
+	orn	res1, in1, res1
+	stlxp	w4, res0, res1, [x5]
+	cbnz	w4, 2b
 	ret
+END (nand_fetch_16)
 
-	/* RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldsetpal	in0, in1, [x0]
-	orr	res0, in0, tmp0
-	orr	res1, in1, tmp1
-	ret
-END_FEAT (or_fetch_16, LSE128)
-#endif
 
+/* __atomic_test_and_set is always inlined, so this entry is unused and
+   only required for completeness.  */
+ENTRY_ALIASED (test_and_set_16)
 
-ENTRY (fetch_and_16)
+	/* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
 	mov	x5, x0
-	cbnz	w4, 2f
-
-	/* RELAXED.  */
-1:	ldxp	res0, res1, [x5]
-	and	tmp0, res0, in0
-	and	tmp1, res1, in1
-	stxp	w4, tmp0, tmp1, [x5]
+1:	ldaxrb	w0, [x5]
+	stlxrb	w4, w2, [x5]
 	cbnz	w4, 1b
 	ret
+END (test_and_set_16)
 
-	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldaxp	res0, res1, [x5]
-	and	tmp0, res0, in0
-	and	tmp1, res1, in1
-	stlxp	w4, tmp0, tmp1, [x5]
-	cbnz	w4, 2b
-	ret
-END (fetch_and_16)
-
+/* Ensure extension-specific implementations are not included unless ifunc
+   support is present, along with necessary assembler support.  */
 
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (fetch_and_16, LSE128)
-	mov	tmp0, x0
-	mvn	res0, in0
-	mvn	res1, in1
-	cbnz	w4, 1f
+#if HAVE_IFUNC
+ENTRY_FEAT (load_16, LSE2)
+	cbnz	w1, 1f
 
 	/* RELAXED.  */
-	ldclrp	res0, res1, [tmp0]
+	ldp	res0, res1, [x0]
 	ret
-
 1:
-	cmp	w4, ACQUIRE
-	b.hi	2f
+	cmp	w1, SEQ_CST
+	b.eq	2f
 
-	/* ACQUIRE/CONSUME.  */
-	ldclrpa res0, res1, [tmp0]
+	/* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
+	ldp	res0, res1, [x0]
+	dmb	ishld
 	ret
 
-	/* RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldclrpal	res0, res1, [tmp0]
+	/* SEQ_CST.  */
+2:	ldar	tmp0, [x0]	/* Block reordering with Store-Release instr.  */
+	ldp	res0, res1, [x0]
+	dmb	ishld
 	ret
-END_FEAT (fetch_and_16, LSE128)
-#endif
+END_FEAT (load_16, LSE2)
 
 
-ENTRY (and_fetch_16)
-	mov	x5, x0
-	cbnz	w4, 2f
+ENTRY_FEAT (store_16, LSE2)
+	cbnz	w4, 1f
 
 	/* RELAXED.  */
-1:	ldxp	res0, res1, [x5]
-	and	res0, res0, in0
-	and	res1, res1, in1
-	stxp	w4, res0, res1, [x5]
+	stp	in0, in1, [x0]
+	ret
+
+	/* RELEASE/SEQ_CST.  */
+1:	ldxp	xzr, tmp0, [x0]
+	stlxp	w4, in0, in1, [x0]
 	cbnz	w4, 1b
 	ret
+END_FEAT (store_16, LSE2)
 
-	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldaxp	res0, res1, [x5]
-	and	res0, res0, in0
-	and	res1, res1, in1
-	stlxp	w4, res0, res1, [x5]
-	cbnz	w4, 2b
+
+ENTRY_FEAT (compare_exchange_16, LSE)
+	ldp	exp0, exp1, [x1]
+	mov	tmp0, exp0
+	mov	tmp1, exp1
+	cbz	w4, 2f
+	cmp	w4, RELEASE
+	b.hs	3f
+
+	/* ACQUIRE/CONSUME.  */
+	caspa	exp0, exp1, in0, in1, [x0]
+0:
+	cmp	exp0, tmp0
+	ccmp	exp1, tmp1, 0, eq
+	bne	1f
+	mov	x0, 1
 	ret
-END (and_fetch_16)
+1:
+	stp	exp0, exp1, [x1]
+	mov	x0, 0
+	ret
+
+	/* RELAXED.  */
+2:	casp	exp0, exp1, in0, in1, [x0]
+	b	0b
+
+	/* RELEASE.  */
+3:	b.hi	4f
+	caspl	exp0, exp1, in0, in1, [x0]
+	b	0b
+
+	/* ACQ_REL/SEQ_CST.  */
+4:	caspal	exp0, exp1, in0, in1, [x0]
+	b	0b
+END_FEAT (compare_exchange_16, LSE)
 
 
 #if HAVE_FEAT_LSE128
-ENTRY_FEAT (and_fetch_16, LSE128)
-	mvn	tmp0, in0
-	mvn	tmp0, in1
+ENTRY_FEAT (exchange_16, LSE128)
+	mov	tmp0, x0
+	mov	res0, in0
+	mov	res1, in1
 	cbnz	w4, 1f
 
 	/* RELAXED.  */
-	ldclrp	tmp0, tmp1, [x0]
-	and	res0, tmp0, in0
-	and	res1, tmp1, in1
+	swpp	res0, res1, [tmp0]
 	ret
-
 1:
 	cmp	w4, ACQUIRE
 	b.hi	2f
 
 	/* ACQUIRE/CONSUME.  */
-	ldclrpa tmp0, tmp1, [x0]
-	and	res0, tmp0, in0
-	and	res1, tmp1, in1
+	swppa	res0, res1, [tmp0]
 	ret
 
 	/* RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldclrpal	tmp0, tmp1, [x5]
-	and	res0, tmp0, in0
-	and	res1, tmp1, in1
+2:	swppal	res0, res1, [tmp0]
 	ret
-END_FEAT (and_fetch_16, LSE128)
-#endif
+END_FEAT (exchange_16, LSE128)
 
 
-ENTRY_ALIASED (fetch_xor_16)
-	mov	x5, x0
-	cbnz	w4, 2f
+ENTRY_FEAT (fetch_or_16, LSE128)
+	mov	tmp0, x0
+	mov	res0, in0
+	mov	res1, in1
+	cbnz	w4, 1f
 
 	/* RELAXED.  */
-1:	ldxp	res0, res1, [x5]
-	eor	tmp0, res0, in0
-	eor	tmp1, res1, in1
-	stxp	w4, tmp0, tmp1, [x5]
-	cbnz	w4, 1b
+	ldsetp	res0, res1, [tmp0]
 	ret
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
 
-	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldaxp	res0, res1, [x5]
-	eor	tmp0, res0, in0
-	eor	tmp1, res1, in1
-	stlxp	w4, tmp0, tmp1, [x5]
-	cbnz	w4, 2b
+	/* ACQUIRE/CONSUME.  */
+	ldsetpa	res0, res1, [tmp0]
 	ret
-END (fetch_xor_16)
 
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldsetpal	res0, res1, [tmp0]
+	ret
+END_FEAT (fetch_or_16, LSE128)
 
-ENTRY_ALIASED (xor_fetch_16)
-	mov	x5, x0
-	cbnz	w4, 2f
+
+ENTRY_FEAT (or_fetch_16, LSE128)
+	mov	tmp0, in0
+	mov	tmp1, in1
+	cbnz	w4, 1f
 
 	/* RELAXED.  */
-1:	ldxp	res0, res1, [x5]
-	eor	res0, res0, in0
-	eor	res1, res1, in1
-	stxp	w4, res0, res1, [x5]
-	cbnz	w4, 1b
+	ldsetp	in0, in1, [x0]
+	orr	res0, in0, tmp0
+	orr	res1, in1, tmp1
 	ret
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
 
-	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldaxp	res0, res1, [x5]
-	eor	res0, res0, in0
-	eor	res1, res1, in1
-	stlxp	w4, res0, res1, [x5]
-	cbnz	w4, 2b
+	/* ACQUIRE/CONSUME.  */
+	ldsetpa	in0, in1, [x0]
+	orr	res0, in0, tmp0
+	orr	res1, in1, tmp1
 	ret
-END (xor_fetch_16)
 
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldsetpal	in0, in1, [x0]
+	orr	res0, in0, tmp0
+	orr	res1, in1, tmp1
+	ret
+END_FEAT (or_fetch_16, LSE128)
 
-ENTRY_ALIASED (fetch_nand_16)
-	mov	x5, x0
-	mvn	in0, in0
-	mvn	in1, in1
-	cbnz	w4, 2f
+
+ENTRY_FEAT (fetch_and_16, LSE128)
+	mov	tmp0, x0
+	mvn	res0, in0
+	mvn	res1, in1
+	cbnz	w4, 1f
 
 	/* RELAXED.  */
-1:	ldxp	res0, res1, [x5]
-	orn	tmp0, in0, res0
-	orn	tmp1, in1, res1
-	stxp	w4, tmp0, tmp1, [x5]
-	cbnz	w4, 1b
+	ldclrp	res0, res1, [tmp0]
 	ret
 
-	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldaxp	res0, res1, [x5]
-	orn	tmp0, in0, res0
-	orn	tmp1, in1, res1
-	stlxp	w4, tmp0, tmp1, [x5]
-	cbnz	w4, 2b
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
+
+	/* ACQUIRE/CONSUME.  */
+	ldclrpa res0, res1, [tmp0]
 	ret
-END (fetch_nand_16)
 
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldclrpal	res0, res1, [tmp0]
+	ret
+END_FEAT (fetch_and_16, LSE128)
 
-ENTRY_ALIASED (nand_fetch_16)
-	mov	x5, x0
-	mvn	in0, in0
-	mvn	in1, in1
-	cbnz	w4, 2f
 
-	/* RELAXED.  */
-1:	ldxp	res0, res1, [x5]
-	orn	res0, in0, res0
-	orn	res1, in1, res1
-	stxp	w4, res0, res1, [x5]
-	cbnz	w4, 1b
-	ret
+ENTRY_FEAT (and_fetch_16, LSE128)
+	mvn	tmp0, in0
+	mvn	tmp1, in1
+	cbnz	w4, 1f
 
-	/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:	ldaxp	res0, res1, [x5]
-	orn	res0, in0, res0
-	orn	res1, in1, res1
-	stlxp	w4, res0, res1, [x5]
-	cbnz	w4, 2b
+	/* RELAXED.  */
+	ldclrp	tmp0, tmp1, [x0]
+	and	res0, tmp0, in0
+	and	res1, tmp1, in1
 	ret
-END (nand_fetch_16)
 
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
 
-/* __atomic_test_and_set is always inlined, so this entry is unused and
-   only required for completeness.  */
-ENTRY_ALIASED (test_and_set_16)
+	/* ACQUIRE/CONSUME.  */
+	ldclrpa tmp0, tmp1, [x0]
+	and	res0, tmp0, in0
+	and	res1, tmp1, in1
+	ret
 
-	/* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-	mov	x5, x0
-1:	ldaxrb	w0, [x5]
-	stlxrb	w4, w2, [x5]
-	cbnz	w4, 1b
+	/* RELEASE/ACQ_REL/SEQ_CST.  */
+2:	ldclrpal	tmp0, tmp1, [x0]
+	and	res0, tmp0, in0
+	and	res1, tmp1, in1
 	ret
-END (test_and_set_16)
+END_FEAT (and_fetch_16, LSE128)
+#endif /* HAVE_FEAT_LSE128 */
+#endif /* HAVE_IFUNC */
 
 
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
-- 
2.34.1
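
Note on the LSE128 `fetch_and_16' / `and_fetch_16' entries above: both invert the operand with `mvn' before issuing LDCLRP, because that instruction clears the bits set in its operand and returns the previous memory contents. The following is a minimal, non-atomic C sketch of that identity only. The type `u128_pair' and the `model_*' helpers are hypothetical names introduced for illustration; they are not part of libatomic, and the real instruction performs the read-modify-write atomically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves, mirroring the
   res0/res1 and in0/in1 register pairs used in the assembly.  */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Non-atomic model of LDCLRP: clear the bits set in MASK and return the
   previous contents of *MEM.  */
static u128_pair model_ldclrp(u128_pair *mem, u128_pair mask)
{
    u128_pair old = *mem;
    mem->lo = old.lo & ~mask.lo;
    mem->hi = old.hi & ~mask.hi;
    return old;
}

/* fetch_and: memory becomes (old & in); the old value is returned.
   Passing ~in as the clear mask gives old & ~(~in) == old & in.  */
static u128_pair model_fetch_and(u128_pair *mem, u128_pair in)
{
    u128_pair mask = { ~in.lo, ~in.hi };
    return model_ldclrp(mem, mask);
}

/* and_fetch: same memory update, but the new value is returned, so the
   old value is ANDed with the original (non-inverted) operand.  */
static u128_pair model_and_fetch(u128_pair *mem, u128_pair in)
{
    u128_pair old = model_fetch_and(mem, in);
    u128_pair result = { old.lo & in.lo, old.hi & in.hi };
    return result;
}
```

In this model, calling `model_and_fetch' leaves memory equal to the returned value, while `model_fetch_and' returns what memory held before the update.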


Thread overview: 5+ messages
2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file Victor Do Nascimento