* [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing
@ 2024-05-16 13:36 Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento
The recent introduction of the optional LSE128 and RCPC3 architectural
extensions to AArch64 has further increased the flexibility of atomic
support in the architecture, with different extensions providing
distinct atomic operations, each with different potential
applications in mind.
This has led to maintenance difficulties in Libatomic, in particular
regarding the way the ifunc selector is generated via a series of
macro expansions at compile-time.
Until now, irrespective of the atomic operation in question, all atomic
functions for a particular operand size were expected to have the same
number of ifunc alternatives, meaning that a one-size-fits-all
approach could reasonably be taken for the selector.
This meant that if, hypothetically, for a particular architecture and
operand size one atomic operation had three different
implementations associated with different extensions, libatomic would
likewise be required to present three ifunc alternatives for all other
atomic functions of that size.
The consequence of this design choice was unnecessary function
aliasing and the unwieldy code that resulted from it.
This patch series attempts to remediate this issue by making the
preprocessor macros defining the number of ifunc alternatives and
their respective selection functions dependent on the file importing
the ifunc selector-generating framework.
All files are given `LAT_<FILENAME>' macros, defined at the beginning
and undef'd at the end of the file. These macros are subsequently
used to fine-tune the behavior of `libatomic_i.h' and
`host-config.h'.
In particular, the definition of the `IFUNC_NCOND(N)' and
`IFUNC_COND_<n>' macros in host-config.h can now be guarded behind
these new file-specific macros, which ultimately control what the
`GEN_SELECTOR(X)' macro in `libatomic_i.h' expands to. As both of
these headers are included by every file implementing an atomic
operation, the selector can now be tailored on a per-file basis.
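A minimal sketch of the mechanism, flattened into a single translation unit that plays the role of `cas_n.c'. `LAT_CAS_N', `LAT_LOAD_N' and `LAT_STORE_N' come from this series; `DEMO_NCOND' and `demo_ncond' are hypothetical stand-ins for the `IFUNC_NCOND' logic in `host-config.h', not its actual contents.

```c
/* The importing file (cas_n.c in this sketch) defines its identifier
   before the shared headers are included.  */
#define LAT_CAS_N

/* Stand-in for a guarded block in host-config.h (hypothetical names):
   the number of ifunc alternatives is decided per importing file rather
   than once per operand size.  */
#if defined (LAT_CAS_N)
# define DEMO_NCOND 1   /* compare-and-swap: one extension-specific variant */
#elif defined (LAT_LOAD_N) || defined (LAT_STORE_N)
# define DEMO_NCOND 1   /* load/store: a different single variant */
#else
# define DEMO_NCOND 0   /* no alternatives: the core version is used directly */
#endif

/* Expose the compile-time decision so it can be inspected.  */
static int demo_ncond (void) { return DEMO_NCOND; }

/* The file withdraws its identifier at the end, as in the series.  */
#undef LAT_CAS_N
```

Defining a different identifier (or none) before the `#if' chain changes the count without touching any other file, which is the property the series relies on.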
Regtested with both `--enable-gnu-indirect-function' and
`--disable-gnu-indirect-function' configurations on an armv9.4-a
target, both with and without LRCPC3 and LSE128 support.
Victor Do Nascimento (4):
Libatomic: Define per-file identifier macros
Libatomic: Make ifunc selector behavior contingent on importing file
Libatomic: Clean up AArch64 ifunc aliasing
Libatomic: Clean up AArch64 `atomic_16.S' implementation file
libatomic/cas_n.c | 2 +
libatomic/config/linux/aarch64/atomic_16.S | 623 +++++++++----------
libatomic/config/linux/aarch64/host-config.h | 35 +-
libatomic/exch_n.c | 2 +
libatomic/fadd_n.c | 2 +
libatomic/fand_n.c | 2 +
libatomic/fence.c | 2 +
libatomic/fenv.c | 2 +
libatomic/fior_n.c | 2 +
libatomic/flag.c | 2 +
libatomic/fnand_n.c | 2 +
libatomic/fop_n.c | 2 +
libatomic/fsub_n.c | 2 +
libatomic/fxor_n.c | 2 +
libatomic/gcas.c | 2 +
libatomic/gexch.c | 2 +
libatomic/glfree.c | 2 +
libatomic/gload.c | 2 +
libatomic/gstore.c | 2 +
libatomic/load_n.c | 2 +
libatomic/store_n.c | 2 +
libatomic/tas_n.c | 2 +
22 files changed, 357 insertions(+), 341 deletions(-)
--
2.34.1
* [PATCH 1/4] Libatomic: Define per-file identifier macros
2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file Victor Do Nascimento
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento
In order to facilitate fine-tuning how the `libatomic_i.h' and
`host-config.h' headers are used by different atomic functions, we
define a distinct identifier macro for each file that implements
atomic operations and therefore imports these headers.
The idea is that different parts of these headers could then be
conditionally defined depending on the macros set by the file that
`#include'd them.
Because some file names are generic enough that using them as-is for
macro names (e.g. flag.c -> FLAG) could lead to clashes with other
macros, each name is first prefixed with LAT_ so that, for example,
flag.c is assigned the LAT_FLAG macro.
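One consequence worth noting: `fior_n.c' (like `fand_n.c' and `fsub_n.c') defines its own identifier and then includes `fop_n.c', which in turn defines `LAT_FOP_N' before including `libatomic_i.h'. Both identifiers are therefore visible to the headers at the same time. The sketch below flattens that include chain into one file, assuming the ordering shown in the diffs; `DEMO_SEEN_BOTH' and `demo_seen_both' are hypothetical names.

```c
/* What fior_n.c contributes before it includes fop_n.c.  */
#define LAT_FIOR_N

/* What fop_n.c contributes before it includes libatomic_i.h.  */
#define LAT_FOP_N

/* Stand-in for a check inside host-config.h: at this point both the
   outer (fior_n.c) and inner (fop_n.c) identifiers are defined, so the
   header can key on the more specific outer one.  */
#if defined (LAT_FIOR_N) && defined (LAT_FOP_N)
# define DEMO_SEEN_BOTH 1
#else
# define DEMO_SEEN_BOTH 0
#endif

static const int demo_seen_both = DEMO_SEEN_BOTH;

/* End of fop_n.c, then end of fior_n.c.  */
#undef LAT_FOP_N
#undef LAT_FIOR_N

/* Neither identifier leaks past the end of the file.  */
#if defined (LAT_FIOR_N) || defined (LAT_FOP_N)
# error "identifier macros leaked past the end of the file"
#endif
```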
libatomic/ChangeLog:
* cas_n.c (LAT_CAS_N): New.
* exch_n.c (LAT_EXCH_N): Likewise.
* fadd_n.c (LAT_FADD_N): Likewise.
* fand_n.c (LAT_FAND_N): Likewise.
* fence.c (LAT_FENCE): Likewise.
* fenv.c (LAT_FENV): Likewise.
* fior_n.c (LAT_FIOR_N): Likewise.
* flag.c (LAT_FLAG): Likewise.
* fnand_n.c (LAT_FNAND_N): Likewise.
* fop_n.c (LAT_FOP_N): Likewise.
* fsub_n.c (LAT_FSUB_N): Likewise.
* fxor_n.c (LAT_FXOR_N): Likewise.
* gcas.c (LAT_GCAS): Likewise.
* gexch.c (LAT_GEXCH): Likewise.
* glfree.c (LAT_GLFREE): Likewise.
* gload.c (LAT_GLOAD): Likewise.
* gstore.c (LAT_GSTORE): Likewise.
* load_n.c (LAT_LOAD_N): Likewise.
* store_n.c (LAT_STORE_N): Likewise.
* tas_n.c (LAT_TAS_N): Likewise.
---
libatomic/cas_n.c | 2 ++
libatomic/exch_n.c | 2 ++
libatomic/fadd_n.c | 2 ++
libatomic/fand_n.c | 2 ++
libatomic/fence.c | 2 ++
libatomic/fenv.c | 2 ++
libatomic/fior_n.c | 2 ++
libatomic/flag.c | 2 ++
libatomic/fnand_n.c | 2 ++
libatomic/fop_n.c | 2 ++
libatomic/fsub_n.c | 2 ++
libatomic/fxor_n.c | 2 ++
libatomic/gcas.c | 2 ++
libatomic/gexch.c | 2 ++
libatomic/glfree.c | 2 ++
libatomic/gload.c | 2 ++
libatomic/gstore.c | 2 ++
libatomic/load_n.c | 2 ++
libatomic/store_n.c | 2 ++
libatomic/tas_n.c | 2 ++
20 files changed, 40 insertions(+)
diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index a080b990371..2a6357e48db 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_CAS_N
#include "libatomic_i.h"
@@ -122,3 +123,4 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, UTYPE newval,
#endif
EXPORT_ALIAS (SIZE(compare_exchange));
+#undef LAT_CAS_N
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index e5ff80769b9..184d3de1009 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_EXCH_N
#include "libatomic_i.h"
@@ -126,3 +127,4 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel UNUSED)
#endif
EXPORT_ALIAS (SIZE(exchange));
+#undef LAT_EXCH_N
diff --git a/libatomic/fadd_n.c b/libatomic/fadd_n.c
index bc15b8bc0e6..32b75cec654 100644
--- a/libatomic/fadd_n.c
+++ b/libatomic/fadd_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_FADD_N
#include <libatomic_i.h>
#define NAME add
@@ -43,3 +44,4 @@
#endif
#include "fop_n.c"
+#undef LAT_FADD_N
diff --git a/libatomic/fand_n.c b/libatomic/fand_n.c
index ffe9ed8700f..9eab55bcd72 100644
--- a/libatomic/fand_n.c
+++ b/libatomic/fand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FAND_N
#define NAME and
#define OP(X,Y) ((X) & (Y))
#include "fop_n.c"
+#undef LAT_FAND_N
diff --git a/libatomic/fence.c b/libatomic/fence.c
index a9b1e280c5a..4022194a57a 100644
--- a/libatomic/fence.c
+++ b/libatomic/fence.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_FENCE
#include "libatomic_i.h"
#include <stdatomic.h>
@@ -43,3 +44,4 @@ void
{
atomic_signal_fence (order);
}
+#undef LAT_FENCE
diff --git a/libatomic/fenv.c b/libatomic/fenv.c
index 41f187c1f85..dccad356a31 100644
--- a/libatomic/fenv.c
+++ b/libatomic/fenv.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_FENV
#include "libatomic_i.h"
#ifdef HAVE_FENV_H
@@ -70,3 +71,4 @@ __atomic_feraiseexcept (int excepts __attribute__ ((unused)))
}
#endif
}
+#undef LAT_FENV
diff --git a/libatomic/fior_n.c b/libatomic/fior_n.c
index 55d0d66b469..2b58d4805d6 100644
--- a/libatomic/fior_n.c
+++ b/libatomic/fior_n.c
@@ -1,3 +1,5 @@
+#define LAT_FIOR_N
#define NAME or
#define OP(X,Y) ((X) | (Y))
#include "fop_n.c"
+#undef LAT_FIOR_N
diff --git a/libatomic/flag.c b/libatomic/flag.c
index e4a5a27819a..8afd80c9130 100644
--- a/libatomic/flag.c
+++ b/libatomic/flag.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_FLAG
#include "libatomic_i.h"
#include <stdatomic.h>
@@ -62,3 +63,4 @@ void
{
return atomic_flag_clear_explicit (object, order);
}
+#undef LAT_FLAG
diff --git a/libatomic/fnand_n.c b/libatomic/fnand_n.c
index a3c98c70494..84a02709cbb 100644
--- a/libatomic/fnand_n.c
+++ b/libatomic/fnand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FNAND_N
#define NAME nand
#define OP(X,Y) ~((X) & (Y))
#include "fop_n.c"
+#undef LAT_FNAND_N
diff --git a/libatomic/fop_n.c b/libatomic/fop_n.c
index f5eb07e859f..fefff3a57a4 100644
--- a/libatomic/fop_n.c
+++ b/libatomic/fop_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_FOP_N
#include <libatomic_i.h>
@@ -198,3 +199,4 @@ SIZE(C3(libat_,NAME,_fetch)) (UTYPE *mptr, UTYPE opval, int smodel UNUSED)
EXPORT_ALIAS (SIZE(C2(fetch_,NAME)));
EXPORT_ALIAS (SIZE(C2(NAME,_fetch)));
+#undef LAT_FOP_N
diff --git a/libatomic/fsub_n.c b/libatomic/fsub_n.c
index e9f8d7d25e1..49b375a543f 100644
--- a/libatomic/fsub_n.c
+++ b/libatomic/fsub_n.c
@@ -1,3 +1,5 @@
+#define LAT_FSUB_N
#define NAME sub
#define OP(X,Y) ((X) - (Y))
#include "fop_n.c"
+#undef LAT_FSUB_N
diff --git a/libatomic/fxor_n.c b/libatomic/fxor_n.c
index 0f2d9624127..d9a91bc3b23 100644
--- a/libatomic/fxor_n.c
+++ b/libatomic/fxor_n.c
@@ -1,3 +1,5 @@
+#define LAT_FXOR_N
#define NAME xor
#define OP(X,Y) ((X) ^ (Y))
#include "fop_n.c"
+#undef LAT_FXOR_N
diff --git a/libatomic/gcas.c b/libatomic/gcas.c
index 21d11305f1e..af4a5f5c5ee 100644
--- a/libatomic/gcas.c
+++ b/libatomic/gcas.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_GCAS
#include "libatomic_i.h"
@@ -118,3 +119,4 @@ libat_compare_exchange (size_t n, void *mptr, void *eptr, void *dptr,
}
EXPORT_ALIAS (compare_exchange);
+#undef LAT_GCAS
diff --git a/libatomic/gexch.c b/libatomic/gexch.c
index 6233759a2e8..afb054c0ef2 100644
--- a/libatomic/gexch.c
+++ b/libatomic/gexch.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_GEXCH
#include "libatomic_i.h"
@@ -142,3 +143,4 @@ libat_exchange (size_t n, void *mptr, void *vptr, void *rptr, int smodel)
}
EXPORT_ALIAS (exchange);
+#undef LAT_GEXCH
diff --git a/libatomic/glfree.c b/libatomic/glfree.c
index 58a45126194..1051ceb81cd 100644
--- a/libatomic/glfree.c
+++ b/libatomic/glfree.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_GLFREE
#include "libatomic_i.h"
/* Accesses with a power-of-two size are not lock-free if we don't have an
@@ -80,3 +81,4 @@ libat_is_lock_free (size_t n, void *ptr)
}
EXPORT_ALIAS (is_lock_free);
+#undef LAT_GLFREE
diff --git a/libatomic/gload.c b/libatomic/gload.c
index 4b3198cc5ae..9b499672161 100644
--- a/libatomic/gload.c
+++ b/libatomic/gload.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_GLOAD
#include "libatomic_i.h"
@@ -98,3 +99,4 @@ libat_load (size_t n, void *mptr, void *rptr, int smodel)
}
EXPORT_ALIAS (load);
+#undef LAT_GLOAD
diff --git a/libatomic/gstore.c b/libatomic/gstore.c
index 505a7b9b2df..b2636059bd8 100644
--- a/libatomic/gstore.c
+++ b/libatomic/gstore.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_GSTORE
#include "libatomic_i.h"
@@ -106,3 +107,4 @@ libat_store (size_t n, void *mptr, void *vptr, int smodel)
}
EXPORT_ALIAS (store);
+#undef LAT_GSTORE
diff --git a/libatomic/load_n.c b/libatomic/load_n.c
index 7513f191833..657c8e23ed2 100644
--- a/libatomic/load_n.c
+++ b/libatomic/load_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_LOAD_N
#include "libatomic_i.h"
@@ -113,3 +114,4 @@ SIZE(libat_load) (UTYPE *mptr, int smodel)
#endif
EXPORT_ALIAS (SIZE(load));
+#undef LAT_LOAD_N
diff --git a/libatomic/store_n.c b/libatomic/store_n.c
index d8ab5e69a50..079e22d75ba 100644
--- a/libatomic/store_n.c
+++ b/libatomic/store_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_STORE_N
#include "libatomic_i.h"
@@ -110,3 +111,4 @@ SIZE(libat_store) (UTYPE *mptr, UTYPE newval, int smodel)
#endif
EXPORT_ALIAS (SIZE(store));
+#undef LAT_STORE_N
diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c
index 4a01cd2a5c8..9321b3a4e02 100644
--- a/libatomic/tas_n.c
+++ b/libatomic/tas_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#define LAT_TAS_N
#include "libatomic_i.h"
@@ -113,3 +114,4 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel UNUSED)
#endif
EXPORT_ALIAS (SIZE(test_and_set));
+#undef LAT_TAS_N
--
2.34.1
* [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file
2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file Victor Do Nascimento
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h'
is able to get information about its environment and, based on this
information, select more appropriate function-specific ifunc
selectors. This reduces the number of unnecessary feature tests that
need to be carried out in order to find the best atomic implementation
for a function at run-time.
An immediate benefit of this is that we can further fine-tune the
architectural requirements for each atomic function without incurring
the maintenance burden and run-time cost of an ifunc selector with a
large number of alternatives, most of which are irrelevant to any
particular function. Consequently, for AArch64 targets, we relax the
architectural requirements of `compare_exchange_16', which now
requires only LSE rather than the newer LSE2.
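As a rough run-time analogue of the relaxed requirement, the sketch below models a per-function selector for `compare_exchange_16' that performs a single feature test. The `DEMO_HWCAP_*' bits and the `cas16_*'/`select_cas16' names are hypothetical; in the patch the condition comes from `IFUNC_COND_1', which for this file tests `hwcap & HWCAP_ATOMICS'.

```c
/* Hypothetical hwcap bits; the real code uses HWCAP_ATOMICS and the
   has_lse2 ()/has_lse128 () helpers from host-config.h.  */
#define DEMO_HWCAP_LSE   (1UL << 0)
#define DEMO_HWCAP_LSE2  (1UL << 1)

typedef int (*cas16_fn) (void);

/* Stand-ins for the core and LSE versions of compare_exchange_16; each
   only reports which implementation was chosen.  */
static int cas16_core (void) { return 0; }
static int cas16_lse (void)  { return 1; }

/* One alternative, one feature test: LSE alone is sufficient, so a CPU
   with LSE but without LSE2 now gets the CASP-based version instead of
   falling back to the core one.  */
static cas16_fn
select_cas16 (unsigned long hwcap)
{
  return (hwcap & DEMO_HWCAP_LSE) ? cas16_lse : cas16_core;
}
```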
The new flexibility provided by this approach also means that certain
functions can now be called directly, doing away with ifunc selectors
altogether when only a single implementation of a function is
available on a given target. As per the macro expansion framework laid
out in `libatomic_i.h', such functions should have their names
prefixed with `__atomic_' as opposed to `libat_'. This is the same
prefix applied to function names when Libatomic is configured with
`--disable-gnu-indirect-function'.
To achieve this, these functions unconditionally apply the aliasing
rule that is at present applied only when libatomic is built without
ifunc support, so that the default `libat_##NAME' is also reachable
via the equivalent `__atomic_##NAME'. This is done using the new
`ENTRY_ALIASED' macro.
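The name pasting involved can be illustrated in plain C. `CORE' and `ATOMIC' below copy the definitions from `atomic_16.S'; the `XSTR'/`STR' helpers and `demo_fetch_add_16_names' are hypothetical and only show which two symbol names `ENTRY_ALIASED (fetch_add_16)' is meant to provide. The actual aliasing is done in assembly by the file's `ALIAS' macro.

```c
#include <string.h>

/* Token-pasting helpers, as defined in atomic_16.S.  */
#define CORE(NAME)   libat_##NAME
#define ATOMIC(NAME) __atomic_##NAME

/* Two-level stringification so the argument is macro-expanded first.  */
#define XSTR(X) STR (X)
#define STR(X)  #X

/* The libat_ definition and the __atomic_ alias that ENTRY_ALIASED
   makes visible for a single NAME.  */
static const char *const demo_fetch_add_16_names[2] = {
  XSTR (CORE (fetch_add_16)),
  XSTR (ATOMIC (fetch_add_16)),
};
```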
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (LSE): New.
(ENTRY_ALIASED): Likewise.
* config/linux/aarch64/host-config.h (LSE_ATOP): New.
(LSE2_ATOP): Likewise.
(LSE128_ATOP): Likewise.
(IFUNC_COND_1): Make its definition conditional on above 3
macros.
(IFUNC_NCOND): Likewise.
---
libatomic/config/linux/aarch64/atomic_16.S | 31 +++++++++--------
libatomic/config/linux/aarch64/host-config.h | 35 ++++++++++++++++----
2 files changed, 45 insertions(+), 21 deletions(-)
diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index b63e97ac5a2..1517e9e78df 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -54,17 +54,20 @@
#endif
#define LSE128(NAME) libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i2
+#define LSE(NAME) libat_##NAME##_i1
+#define LSE2(NAME) libat_##NAME##_i1
#define CORE(NAME) libat_##NAME
#define ATOMIC(NAME) __atomic_##NAME
+/* Emit the libat_* entrypoint together with an __atomic_* alias. */
+#define ENTRY_ALIASED(NAME) ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+
#if HAVE_IFUNC
# define ENTRY(NAME) ENTRY2 (CORE (NAME), )
# define ENTRY_FEAT(NAME, FEAT) ENTRY2 (FEAT (NAME), )
# define END_FEAT(NAME, FEAT) END2 (FEAT (NAME))
#else
-/* Emit __atomic_* entrypoints if no ifuncs. */
-# define ENTRY(NAME) ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+# define ENTRY(NAME) ENTRY_ALIASED (NAME)
#endif
#define END(NAME) END2 (CORE (NAME))
@@ -299,7 +302,7 @@ END (compare_exchange_16)
#if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE2)
+ENTRY_FEAT (compare_exchange_16, LSE)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -332,11 +335,11 @@ ENTRY_FEAT (compare_exchange_16, LSE2)
/* ACQ_REL/SEQ_CST. */
4: caspal exp0, exp1, in0, in1, [x0]
b 0b
-END_FEAT (compare_exchange_16, LSE2)
+END_FEAT (compare_exchange_16, LSE)
#endif
-ENTRY (fetch_add_16)
+ENTRY_ALIASED (fetch_add_16)
mov x5, x0
cbnz w4, 2f
@@ -358,7 +361,7 @@ ENTRY (fetch_add_16)
END (fetch_add_16)
-ENTRY (add_fetch_16)
+ENTRY_ALIASED (add_fetch_16)
mov x5, x0
cbnz w4, 2f
@@ -380,7 +383,7 @@ ENTRY (add_fetch_16)
END (add_fetch_16)
-ENTRY (fetch_sub_16)
+ENTRY_ALIASED (fetch_sub_16)
mov x5, x0
cbnz w4, 2f
@@ -402,7 +405,7 @@ ENTRY (fetch_sub_16)
END (fetch_sub_16)
-ENTRY (sub_fetch_16)
+ENTRY_ALIASED (sub_fetch_16)
mov x5, x0
cbnz w4, 2f
@@ -624,7 +627,7 @@ END_FEAT (and_fetch_16, LSE128)
#endif
-ENTRY (fetch_xor_16)
+ENTRY_ALIASED (fetch_xor_16)
mov x5, x0
cbnz w4, 2f
@@ -646,7 +649,7 @@ ENTRY (fetch_xor_16)
END (fetch_xor_16)
-ENTRY (xor_fetch_16)
+ENTRY_ALIASED (xor_fetch_16)
mov x5, x0
cbnz w4, 2f
@@ -668,7 +671,7 @@ ENTRY (xor_fetch_16)
END (xor_fetch_16)
-ENTRY (fetch_nand_16)
+ENTRY_ALIASED (fetch_nand_16)
mov x5, x0
mvn in0, in0
mvn in1, in1
@@ -692,7 +695,7 @@ ENTRY (fetch_nand_16)
END (fetch_nand_16)
-ENTRY (nand_fetch_16)
+ENTRY_ALIASED (nand_fetch_16)
mov x5, x0
mvn in0, in0
mvn in1, in1
@@ -718,7 +721,7 @@ END (nand_fetch_16)
/* __atomic_test_and_set is always inlined, so this entry is unused and
only required for completeness. */
-ENTRY (test_and_set_16)
+ENTRY_ALIASED (test_and_set_16)
/* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
mov x5, x0
diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index e1a699948f4..6e010594a6c 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -48,15 +48,36 @@ typedef struct __ifunc_arg_t {
# define _IFUNC_ARG_HWCAP (1ULL << 62)
#endif
-#if N == 16
-# define IFUNC_COND_1 (has_lse128 (hwcap, features))
-# define IFUNC_COND_2 (has_lse2 (hwcap, features))
-# define IFUNC_NCOND(N) 2
-#else
-# define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS)
-# define IFUNC_NCOND(N) 1
+/* From the file which imported `host-config.h' we can ascertain which
+ architectural extension provides relevant atomic support. From this,
+ we can proceed to tweak the ifunc selector behavior. */
+#if defined (LAT_CAS_N)
+# define LSE_ATOP
+#elif defined (LAT_LOAD_N) || defined (LAT_STORE_N)
+# define LSE2_ATOP
+#elif defined (LAT_EXCH_N) || defined (LAT_FIOR_N) || defined (LAT_FAND_N)
+# define LSE128_ATOP
#endif
+# if N == 16
+# if defined (LSE_ATOP)
+# define IFUNC_NCOND(N) 1
+# define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS)
+# elif defined (LSE2_ATOP)
+# define IFUNC_NCOND(N) 1
+# define IFUNC_COND_1 (has_lse2 (hwcap, features))
+# elif HAVE_FEAT_LSE128 && defined (LSE128_ATOP)
+# define IFUNC_NCOND(N) 1
+# define IFUNC_COND_1 (has_lse128 (hwcap, features))
+# else
+# define IFUNC_NCOND(N) 0
+# define IFUNC_ALT 1
+# endif
+# else
+# define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS)
+# define IFUNC_NCOND(N) 1
+# endif
+
#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
--
2.34.1
* [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing
2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file Victor Do Nascimento
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento
Following improvements to the way ifuncs are selected based on
detected architectural features, we are able to do away with many of
the aliases that were previously needed for subsets of atomic
functions that were not implemented in a given extension.
An example may clarify this. Previously, LSE128 functions carried the
suffix _i1 and LSE2 functions the suffix _i2.
Using a single ifunc selector for all atomic functions meant that if
LSE128 was detected, the _i1 function variant would be used
indiscriminately, irrespective of whether or not a function had an
LSE128-specific implementation. Aliasing was thus needed to redirect
calls for these missing functions to their _i2 LSE2 alternatives and,
where no LSE2 version existed either, on to the core implementation.
The more architectural extensions were supported, the longer and more
complex these aliasing chains became.
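The old arrangement can be modelled with function-pointer tables standing in for symbol aliases. This is a hypothetical sketch, not libatomic code: one shared selector returns a slot index (1 for _i1/LSE128, 2 for _i2/LSE2, 0 for core), and every function has to fill all three slots, using an alias wherever it lacks an implementation. The slot contents follow the ALIAS lines removed below, for a build where the assembler supports LSE128.

```c
/* Hypothetical feature bits for the sketch.  */
#define DEMO_LSE2   (1u << 0)
#define DEMO_LSE128 (1u << 1)

/* Which implementation a call actually reached.  */
enum impl { IMPL_CORE, IMPL_LSE2, IMPL_LSE128 };
typedef enum impl (*op_fn) (void);

static enum impl core (void)   { return IMPL_CORE; }
static enum impl lse2 (void)   { return IMPL_LSE2; }
static enum impl lse128 (void) { return IMPL_LSE128; }

/* Old scheme: one selector shared by every 16-byte function.  */
static int
shared_slot (unsigned hwcap)
{
  if (hwcap & DEMO_LSE128)
    return 1;                   /* _i1 */
  if (hwcap & DEMO_LSE2)
    return 2;                   /* _i2 */
  return 0;                     /* core */
}

/* Every function needs all three slots; gaps are filled by aliases.  */
static const op_fn fetch_add_16_slots[3] = { core, core, core };   /* _i1 -> _i2 -> core */
static const op_fn load_16_slots[3]      = { core, lse2, lse2 };   /* _i1 -> _i2 */
static const op_fn exchange_16_slots[3]  = { core, lse128, core }; /* _i2 -> core */
```

With per-file selectors, `fetch_add_16' simply has no alternatives and `load_16' has exactly one, so none of these filler entries are needed.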
With the per-file configuration of ifuncs, we do away with the need
for such aliasing.
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S: Remove unnecessary
aliasing.
---
libatomic/config/linux/aarch64/atomic_16.S | 41 ----------------------
1 file changed, 41 deletions(-)
diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 1517e9e78df..16ff03057ab 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -732,47 +732,6 @@ ENTRY_ALIASED (test_and_set_16)
END (test_and_set_16)
-/* Alias entry points which are the same in LSE2 and LSE128. */
-
-#if HAVE_IFUNC
-# if !HAVE_FEAT_LSE128
-ALIAS (exchange_16, LSE128, LSE2)
-ALIAS (fetch_or_16, LSE128, LSE2)
-ALIAS (fetch_and_16, LSE128, LSE2)
-ALIAS (or_fetch_16, LSE128, LSE2)
-ALIAS (and_fetch_16, LSE128, LSE2)
-# endif
-ALIAS (load_16, LSE128, LSE2)
-ALIAS (store_16, LSE128, LSE2)
-ALIAS (compare_exchange_16, LSE128, LSE2)
-ALIAS (fetch_add_16, LSE128, LSE2)
-ALIAS (add_fetch_16, LSE128, LSE2)
-ALIAS (fetch_sub_16, LSE128, LSE2)
-ALIAS (sub_fetch_16, LSE128, LSE2)
-ALIAS (fetch_xor_16, LSE128, LSE2)
-ALIAS (xor_fetch_16, LSE128, LSE2)
-ALIAS (fetch_nand_16, LSE128, LSE2)
-ALIAS (nand_fetch_16, LSE128, LSE2)
-ALIAS (test_and_set_16, LSE128, LSE2)
-
-/* Alias entry points which are the same in baseline and LSE2. */
-
-ALIAS (exchange_16, LSE2, CORE)
-ALIAS (fetch_add_16, LSE2, CORE)
-ALIAS (add_fetch_16, LSE2, CORE)
-ALIAS (fetch_sub_16, LSE2, CORE)
-ALIAS (sub_fetch_16, LSE2, CORE)
-ALIAS (fetch_or_16, LSE2, CORE)
-ALIAS (or_fetch_16, LSE2, CORE)
-ALIAS (fetch_and_16, LSE2, CORE)
-ALIAS (and_fetch_16, LSE2, CORE)
-ALIAS (fetch_xor_16, LSE2, CORE)
-ALIAS (xor_fetch_16, LSE2, CORE)
-ALIAS (fetch_nand_16, LSE2, CORE)
-ALIAS (nand_fetch_16, LSE2, CORE)
-ALIAS (test_and_set_16, LSE2, CORE)
-#endif
-
/* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code. */
#define FEATURE_1_AND 0xc0000000
#define FEATURE_1_BTI 1
--
2.34.1
* [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file
2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing Victor Do Nascimento
@ 2024-05-16 13:36 ` Victor Do Nascimento
From: Victor Do Nascimento @ 2024-05-16 13:36 UTC
To: gcc-patches; +Cc: richard.sandiford, Richard.Earnshaw, Victor Do Nascimento
At present, `atomic_16.S' groups the different implementations of
each function together in the file. For example, the LSE128
implementation of `exchange_16' immediately follows its core
implementation, as does the LSE128 implementation of `fetch_or_16'.
Such extension-dependent implementations depend on both ifunc and
assembler support. They may therefore need to be guarded by two
preprocessor conditionals, e.g. `#if HAVE_IFUNC' and `#if
HAVE_FEAT_LSE128'.
Having to apply these guards on a per-function basis adds unnecessary
clutter to the file and makes its maintenance more error-prone.
We therefore reorganize the file so that all core implementations,
which need no `#if' guards, are placed first, followed by all
ifunc-dependent implementations, which can all be guarded by a
single `#if HAVE_IFUNC'. Within that guard, these are further
subdivided by architectural extension requirement so that, for
example, all LSE128-specific functions can be guarded by a single
`#if HAVE_FEAT_LSE128', greatly reducing the overall number of
preprocessor conditionals required.
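A C analogue of the new layout, assuming `HAVE_IFUNC' and `HAVE_FEAT_LSE128' are plain 0/1 values (here an ifunc-enabled build whose assembler lacks LSE128). The function names, `DEMO_LSE128_IMPLS' and `demo_impls' are hypothetical; the point is that each requirement is tested once for a whole group of functions rather than once per function.

```c
#define HAVE_IFUNC 1        /* assumed configuration */
#define HAVE_FEAT_LSE128 0  /* assumed: assembler lacks LSE128 */

typedef int (*impl_fn) (void);

/* Core implementations: always built, no guards needed.  */
static int exchange_16_core (void) { return 0; }
static int fetch_or_16_core (void) { return 0; }

#if HAVE_IFUNC
/* Extension-specific implementations, grouped under one guard per
   requirement rather than one pair of guards per function.  */
# if HAVE_FEAT_LSE128
static int exchange_16_lse128 (void) { return 1; }
static int fetch_or_16_lse128 (void) { return 1; }
#  define DEMO_LSE128_IMPLS exchange_16_lse128, fetch_or_16_lse128,
# endif
#endif

#ifndef DEMO_LSE128_IMPLS
# define DEMO_LSE128_IMPLS  /* nothing extra in this configuration */
#endif

/* Every implementation that made it into this build.  */
static const impl_fn demo_impls[] = {
  exchange_16_core, fetch_or_16_core, DEMO_LSE128_IMPLS
};

static const int demo_n_impls = sizeof demo_impls / sizeof demo_impls[0];
```

Flipping `HAVE_FEAT_LSE128' to 1 adds both LSE128 variants at once, with no per-function conditional to keep in sync.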
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S: Reshuffle functions.
---
libatomic/config/linux/aarch64/atomic_16.S | 583 ++++++++++-----------
1 file changed, 288 insertions(+), 295 deletions(-)
diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 16ff03057ab..27363f82b75 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,15 +40,12 @@
#include "auto-config.h"
-#if !HAVE_IFUNC
-# undef HAVE_FEAT_LSE128
-# define HAVE_FEAT_LSE128 0
-#endif
-
-#define HAVE_FEAT_LSE2 HAVE_IFUNC
-
-#if HAVE_FEAT_LSE128
+#if HAVE_IFUNC
+# if HAVE_FEAT_LSE128
.arch armv9-a+lse128
+# else
+ .arch armv8-a+lse
+# endif
#else
.arch armv8-a+lse
#endif
@@ -124,6 +121,8 @@ NAME: \
#define ACQ_REL 4
#define SEQ_CST 5
+/* Core atomic operation implementations. These are available irrespective of
+ ifunc support or the presence of additional architectural extensions. */
ENTRY (load_16)
mov x5, x0
@@ -143,31 +142,6 @@ ENTRY (load_16)
END (load_16)
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (load_16, LSE2)
- cbnz w1, 1f
-
- /* RELAXED. */
- ldp res0, res1, [x0]
- ret
-1:
- cmp w1, SEQ_CST
- b.eq 2f
-
- /* ACQUIRE/CONSUME (Load-AcquirePC semantics). */
- ldp res0, res1, [x0]
- dmb ishld
- ret
-
- /* SEQ_CST. */
-2: ldar tmp0, [x0] /* Block reordering with Store-Release instr. */
- ldp res0, res1, [x0]
- dmb ishld
- ret
-END_FEAT (load_16, LSE2)
-#endif
-
-
ENTRY (store_16)
cbnz w4, 2f
@@ -185,23 +159,6 @@ ENTRY (store_16)
END (store_16)
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (store_16, LSE2)
- cbnz w4, 1f
-
- /* RELAXED. */
- stp in0, in1, [x0]
- ret
-
- /* RELEASE/SEQ_CST. */
-1: ldxp xzr, tmp0, [x0]
- stlxp w4, in0, in1, [x0]
- cbnz w4, 1b
- ret
-END_FEAT (store_16, LSE2)
-#endif
-
-
ENTRY (exchange_16)
mov x5, x0
cbnz w4, 2f
@@ -229,31 +186,6 @@ ENTRY (exchange_16)
END (exchange_16)
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (exchange_16, LSE128)
- mov tmp0, x0
- mov res0, in0
- mov res1, in1
- cbnz w4, 1f
-
- /* RELAXED. */
- swpp res0, res1, [tmp0]
- ret
-1:
- cmp w4, ACQUIRE
- b.hi 2f
-
- /* ACQUIRE/CONSUME. */
- swppa res0, res1, [tmp0]
- ret
-
- /* RELEASE/ACQ_REL/SEQ_CST. */
-2: swppal res0, res1, [tmp0]
- ret
-END_FEAT (exchange_16, LSE128)
-#endif
-
-
ENTRY (compare_exchange_16)
ldp exp0, exp1, [x1]
cbz w4, 3f
@@ -301,43 +233,97 @@ ENTRY (compare_exchange_16)
END (compare_exchange_16)
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE)
- ldp exp0, exp1, [x1]
- mov tmp0, exp0
- mov tmp1, exp1
- cbz w4, 2f
- cmp w4, RELEASE
- b.hs 3f
+ENTRY (fetch_or_16)
+ mov x5, x0
+ cbnz w4, 2f
- /* ACQUIRE/CONSUME. */
- caspa exp0, exp1, in0, in1, [x0]
-0:
- cmp exp0, tmp0
- ccmp exp1, tmp1, 0, eq
- bne 1f
- mov x0, 1
+ /* RELAXED. */
+1: ldxp res0, res1, [x5]
+ orr tmp0, res0, in0
+ orr tmp1, res1, in1
+ stxp w4, tmp0, tmp1, [x5]
+ cbnz w4, 1b
ret
-1:
- stp exp0, exp1, [x1]
- mov x0, 0
+
+ /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
+2: ldaxp res0, res1, [x5]
+ orr tmp0, res0, in0
+ orr tmp1, res1, in1
+ stlxp w4, tmp0, tmp1, [x5]
+ cbnz w4, 2b
ret
+END (fetch_or_16)
+
+
+ENTRY (or_fetch_16)
+ mov x5, x0
+ cbnz w4, 2f
/* RELAXED. */
-2: casp exp0, exp1, in0, in1, [x0]
- b 0b
+1: ldxp res0, res1, [x5]
+ orr res0, res0, in0
+ orr res1, res1, in1
+ stxp w4, res0, res1, [x5]
+ cbnz w4, 1b
+ ret
- /* RELEASE. */
-3: b.hi 4f
- caspl exp0, exp1, in0, in1, [x0]
- b 0b
+ /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
+2: ldaxp res0, res1, [x5]
+ orr res0, res0, in0
+ orr res1, res1, in1
+ stlxp w4, res0, res1, [x5]
+ cbnz w4, 2b
+ ret
+END (or_fetch_16)
+
+
+ENTRY (fetch_and_16)
+ mov x5, x0
+ cbnz w4, 2f
+
+ /* RELAXED. */
+1: ldxp res0, res1, [x5]
+ and tmp0, res0, in0
+ and tmp1, res1, in1
+ stxp w4, tmp0, tmp1, [x5]
+ cbnz w4, 1b
+ ret
+
+ /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
+2: ldaxp res0, res1, [x5]
+ and tmp0, res0, in0
+ and tmp1, res1, in1
+ stlxp w4, tmp0, tmp1, [x5]
+ cbnz w4, 2b
+ ret
+END (fetch_and_16)
+
+
+ENTRY (and_fetch_16)
+ mov x5, x0
+ cbnz w4, 2f
+
+ /* RELAXED. */
+1: ldxp res0, res1, [x5]
+ and res0, res0, in0
+ and res1, res1, in1
+ stxp w4, res0, res1, [x5]
+ cbnz w4, 1b
+ ret
+
+ /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
+2: ldaxp res0, res1, [x5]
+ and res0, res0, in0
+ and res1, res1, in1
+ stlxp w4, res0, res1, [x5]
+ cbnz w4, 2b
+ ret
+END (and_fetch_16)
- /* ACQ_REL/SEQ_CST. */
-4: caspal exp0, exp1, in0, in1, [x0]
- b 0b
-END_FEAT (compare_exchange_16, LSE)
-#endif
+/* The following functions are currently single-implementation operations,
+ so they are never assigned an ifunc selector. As such, they must be
+ reachable from __atomic_* entrypoints. */
ENTRY_ALIASED (fetch_add_16)
mov x5, x0
@@ -427,309 +413,316 @@ ENTRY_ALIASED (sub_fetch_16)
END (sub_fetch_16)
-ENTRY (fetch_or_16)
+ENTRY_ALIASED (fetch_xor_16)
mov x5, x0
cbnz w4, 2f
/* RELAXED. */
1: ldxp res0, res1, [x5]
- orr tmp0, res0, in0
- orr tmp1, res1, in1
+ eor tmp0, res0, in0
+ eor tmp1, res1, in1
stxp w4, tmp0, tmp1, [x5]
cbnz w4, 1b
ret
/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
2: ldaxp res0, res1, [x5]
- orr tmp0, res0, in0
- orr tmp1, res1, in1
+ eor tmp0, res0, in0
+ eor tmp1, res1, in1
stlxp w4, tmp0, tmp1, [x5]
cbnz w4, 2b
ret
-END (fetch_or_16)
+END (fetch_xor_16)
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (fetch_or_16, LSE128)
- mov tmp0, x0
- mov res0, in0
- mov res1, in1
- cbnz w4, 1f
+ENTRY_ALIASED (xor_fetch_16)
+ mov x5, x0
+ cbnz w4, 2f
/* RELAXED. */
- ldsetp res0, res1, [tmp0]
- ret
-1:
- cmp w4, ACQUIRE
- b.hi 2f
-
- /* ACQUIRE/CONSUME. */
- ldsetpa res0, res1, [tmp0]
+1: ldxp res0, res1, [x5]
+ eor res0, res0, in0
+ eor res1, res1, in1
+ stxp w4, res0, res1, [x5]
+ cbnz w4, 1b
ret
- /* RELEASE/ACQ_REL/SEQ_CST. */
-2: ldsetpal res0, res1, [tmp0]
+ /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
+2: ldaxp res0, res1, [x5]
+ eor res0, res0, in0
+ eor res1, res1, in1
+ stlxp w4, res0, res1, [x5]
+ cbnz w4, 2b
ret
-END_FEAT (fetch_or_16, LSE128)
-#endif
+END (xor_fetch_16)
-ENTRY (or_fetch_16)
+ENTRY_ALIASED (fetch_nand_16)
mov x5, x0
+ mvn in0, in0
+ mvn in1, in1
cbnz w4, 2f
/* RELAXED. */
1: ldxp res0, res1, [x5]
- orr res0, res0, in0
- orr res1, res1, in1
- stxp w4, res0, res1, [x5]
+ orn tmp0, in0, res0
+ orn tmp1, in1, res1
+ stxp w4, tmp0, tmp1, [x5]
cbnz w4, 1b
ret
/* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
2: ldaxp res0, res1, [x5]
- orr res0, res0, in0
- orr res1, res1, in1
- stlxp w4, res0, res1, [x5]
+ orn tmp0, in0, res0
+ orn tmp1, in1, res1
+ stlxp w4, tmp0, tmp1, [x5]
cbnz w4, 2b
ret
-END (or_fetch_16)
+END (fetch_nand_16)
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (or_fetch_16, LSE128)
- cbnz w4, 1f
- mov tmp0, in0
- mov tmp1, in1
+ENTRY_ALIASED (nand_fetch_16)
+ mov x5, x0
+ mvn in0, in0
+ mvn in1, in1
+ cbnz w4, 2f
/* RELAXED. */
- ldsetp in0, in1, [x0]
- orr res0, in0, tmp0
- orr res1, in1, tmp1
+1: ldxp res0, res1, [x5]
+ orn res0, in0, res0
+ orn res1, in1, res1
+ stxp w4, res0, res1, [x5]
+ cbnz w4, 1b
ret
-1:
- cmp w4, ACQUIRE
- b.hi 2f
- /* ACQUIRE/CONSUME. */
- ldsetpa in0, in1, [x0]
- orr res0, in0, tmp0
- orr res1, in1, tmp1
+ /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
+2: ldaxp res0, res1, [x5]
+ orn res0, in0, res0
+ orn res1, in1, res1
+ stlxp w4, res0, res1, [x5]
+ cbnz w4, 2b
ret
+END (nand_fetch_16)
- /* RELEASE/ACQ_REL/SEQ_CST. */
-2: ldsetpal in0, in1, [x0]
- orr res0, in0, tmp0
- orr res1, in1, tmp1
- ret
-END_FEAT (or_fetch_16, LSE128)
-#endif
+/* __atomic_test_and_set is always inlined, so this entry is unused and
+ only required for completeness. */
+ENTRY_ALIASED (test_and_set_16)
-ENTRY (fetch_and_16)
+ /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
mov x5, x0
- cbnz w4, 2f
-
- /* RELAXED. */
-1: ldxp res0, res1, [x5]
- and tmp0, res0, in0
- and tmp1, res1, in1
- stxp w4, tmp0, tmp1, [x5]
+1: ldaxrb w0, [x5]
+ stlxrb w4, w2, [x5]
cbnz w4, 1b
ret
+END (test_and_set_16)
- /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
-2: ldaxp res0, res1, [x5]
- and tmp0, res0, in0
- and tmp1, res1, in1
- stlxp w4, tmp0, tmp1, [x5]
- cbnz w4, 2b
- ret
-END (fetch_and_16)
-
+/* Ensure extension-specific implementations are not included unless ifunc
+ support is present, along with necessary assembler support. */
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (fetch_and_16, LSE128)
- mov tmp0, x0
- mvn res0, in0
- mvn res1, in1
- cbnz w4, 1f
+#if HAVE_IFUNC
+ENTRY_FEAT (load_16, LSE2)
+ cbnz w1, 1f
/* RELAXED. */
- ldclrp res0, res1, [tmp0]
+ ldp res0, res1, [x0]
ret
-
1:
- cmp w4, ACQUIRE
- b.hi 2f
+ cmp w1, SEQ_CST
+ b.eq 2f
- /* ACQUIRE/CONSUME. */
- ldclrpa res0, res1, [tmp0]
+ /* ACQUIRE/CONSUME (Load-AcquirePC semantics). */
+ ldp res0, res1, [x0]
+ dmb ishld
ret
- /* RELEASE/ACQ_REL/SEQ_CST. */
-2: ldclrpal res0, res1, [tmp0]
+ /* SEQ_CST. */
+2: ldar tmp0, [x0] /* Block reordering with Store-Release instr. */
+ ldp res0, res1, [x0]
+ dmb ishld
ret
-END_FEAT (fetch_and_16, LSE128)
-#endif
+END_FEAT (load_16, LSE2)
-ENTRY (and_fetch_16)
- mov x5, x0
- cbnz w4, 2f
+ENTRY_FEAT (store_16, LSE2)
+ cbnz w4, 1f
/* RELAXED. */
-1: ldxp res0, res1, [x5]
- and res0, res0, in0
- and res1, res1, in1
- stxp w4, res0, res1, [x5]
+ stp in0, in1, [x0]
+ ret
+
+ /* RELEASE/SEQ_CST. */
+1: ldxp xzr, tmp0, [x0]
+ stlxp w4, in0, in1, [x0]
cbnz w4, 1b
ret
+END_FEAT (store_16, LSE2)
- /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
-2: ldaxp res0, res1, [x5]
- and res0, res0, in0
- and res1, res1, in1
- stlxp w4, res0, res1, [x5]
- cbnz w4, 2b
+
+ENTRY_FEAT (compare_exchange_16, LSE)
+ ldp exp0, exp1, [x1]
+ mov tmp0, exp0
+ mov tmp1, exp1
+ cbz w4, 2f
+ cmp w4, RELEASE
+ b.hs 3f
+
+ /* ACQUIRE/CONSUME. */
+ caspa exp0, exp1, in0, in1, [x0]
+0:
+ cmp exp0, tmp0
+ ccmp exp1, tmp1, 0, eq
+ bne 1f
+ mov x0, 1
ret
-END (and_fetch_16)
+1:
+ stp exp0, exp1, [x1]
+ mov x0, 0
+ ret
+
+ /* RELAXED. */
+2: casp exp0, exp1, in0, in1, [x0]
+ b 0b
+
+ /* RELEASE. */
+3: b.hi 4f
+ caspl exp0, exp1, in0, in1, [x0]
+ b 0b
+
+ /* ACQ_REL/SEQ_CST. */
+4: caspal exp0, exp1, in0, in1, [x0]
+ b 0b
+END_FEAT (compare_exchange_16, LSE)
#if HAVE_FEAT_LSE128
-ENTRY_FEAT (and_fetch_16, LSE128)
- mvn tmp0, in0
- mvn tmp0, in1
+ENTRY_FEAT (exchange_16, LSE128)
+ mov tmp0, x0
+ mov res0, in0
+ mov res1, in1
cbnz w4, 1f
/* RELAXED. */
- ldclrp tmp0, tmp1, [x0]
- and res0, tmp0, in0
- and res1, tmp1, in1
+ swpp res0, res1, [tmp0]
ret
-
1:
cmp w4, ACQUIRE
b.hi 2f
/* ACQUIRE/CONSUME. */
- ldclrpa tmp0, tmp1, [x0]
- and res0, tmp0, in0
- and res1, tmp1, in1
+ swppa res0, res1, [tmp0]
ret
/* RELEASE/ACQ_REL/SEQ_CST. */
-2: ldclrpal tmp0, tmp1, [x5]
- and res0, tmp0, in0
- and res1, tmp1, in1
+2: swppal res0, res1, [tmp0]
ret
-END_FEAT (and_fetch_16, LSE128)
-#endif
+END_FEAT (exchange_16, LSE128)
-ENTRY_ALIASED (fetch_xor_16)
- mov x5, x0
- cbnz w4, 2f
+ENTRY_FEAT (fetch_or_16, LSE128)
+ mov tmp0, x0
+ mov res0, in0
+ mov res1, in1
+ cbnz w4, 1f
/* RELAXED. */
-1: ldxp res0, res1, [x5]
- eor tmp0, res0, in0
- eor tmp1, res1, in1
- stxp w4, tmp0, tmp1, [x5]
- cbnz w4, 1b
+ ldsetp res0, res1, [tmp0]
ret
+1:
+ cmp w4, ACQUIRE
+ b.hi 2f
- /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
-2: ldaxp res0, res1, [x5]
- eor tmp0, res0, in0
- eor tmp1, res1, in1
- stlxp w4, tmp0, tmp1, [x5]
- cbnz w4, 2b
+ /* ACQUIRE/CONSUME. */
+ ldsetpa res0, res1, [tmp0]
ret
-END (fetch_xor_16)
+ /* RELEASE/ACQ_REL/SEQ_CST. */
+2: ldsetpal res0, res1, [tmp0]
+ ret
+END_FEAT (fetch_or_16, LSE128)
-ENTRY_ALIASED (xor_fetch_16)
- mov x5, x0
- cbnz w4, 2f
+
+ENTRY_FEAT (or_fetch_16, LSE128)
+ cbnz w4, 1f
+ mov tmp0, in0
+ mov tmp1, in1
/* RELAXED. */
-1: ldxp res0, res1, [x5]
- eor res0, res0, in0
- eor res1, res1, in1
- stxp w4, res0, res1, [x5]
- cbnz w4, 1b
+ ldsetp in0, in1, [x0]
+ orr res0, in0, tmp0
+ orr res1, in1, tmp1
ret
+1:
+ cmp w4, ACQUIRE
+ b.hi 2f
- /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
-2: ldaxp res0, res1, [x5]
- eor res0, res0, in0
- eor res1, res1, in1
- stlxp w4, res0, res1, [x5]
- cbnz w4, 2b
+ /* ACQUIRE/CONSUME. */
+ ldsetpa in0, in1, [x0]
+ orr res0, in0, tmp0
+ orr res1, in1, tmp1
ret
-END (xor_fetch_16)
+ /* RELEASE/ACQ_REL/SEQ_CST. */
+2: ldsetpal in0, in1, [x0]
+ orr res0, in0, tmp0
+ orr res1, in1, tmp1
+ ret
+END_FEAT (or_fetch_16, LSE128)
-ENTRY_ALIASED (fetch_nand_16)
- mov x5, x0
- mvn in0, in0
- mvn in1, in1
- cbnz w4, 2f
+
+ENTRY_FEAT (fetch_and_16, LSE128)
+ mov tmp0, x0
+ mvn res0, in0
+ mvn res1, in1
+ cbnz w4, 1f
/* RELAXED. */
-1: ldxp res0, res1, [x5]
- orn tmp0, in0, res0
- orn tmp1, in1, res1
- stxp w4, tmp0, tmp1, [x5]
- cbnz w4, 1b
+ ldclrp res0, res1, [tmp0]
ret
- /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
-2: ldaxp res0, res1, [x5]
- orn tmp0, in0, res0
- orn tmp1, in1, res1
- stlxp w4, tmp0, tmp1, [x5]
- cbnz w4, 2b
+1:
+ cmp w4, ACQUIRE
+ b.hi 2f
+
+ /* ACQUIRE/CONSUME. */
+ ldclrpa res0, res1, [tmp0]
ret
-END (fetch_nand_16)
+ /* RELEASE/ACQ_REL/SEQ_CST. */
+2: ldclrpal res0, res1, [tmp0]
+ ret
+END_FEAT (fetch_and_16, LSE128)
-ENTRY_ALIASED (nand_fetch_16)
- mov x5, x0
- mvn in0, in0
- mvn in1, in1
- cbnz w4, 2f
- /* RELAXED. */
-1: ldxp res0, res1, [x5]
- orn res0, in0, res0
- orn res1, in1, res1
- stxp w4, res0, res1, [x5]
- cbnz w4, 1b
- ret
+ENTRY_FEAT (and_fetch_16, LSE128)
+ mvn tmp0, in0
+ mvn tmp1, in1
+ cbnz w4, 1f
- /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
-2: ldaxp res0, res1, [x5]
- orn res0, in0, res0
- orn res1, in1, res1
- stlxp w4, res0, res1, [x5]
- cbnz w4, 2b
+ /* RELAXED. */
+ ldclrp tmp0, tmp1, [x0]
+ and res0, tmp0, in0
+ and res1, tmp1, in1
ret
-END (nand_fetch_16)
+1:
+ cmp w4, ACQUIRE
+ b.hi 2f
-/* __atomic_test_and_set is always inlined, so this entry is unused and
- only required for completeness. */
-ENTRY_ALIASED (test_and_set_16)
+ /* ACQUIRE/CONSUME. */
+ ldclrpa tmp0, tmp1, [x0]
+ and res0, tmp0, in0
+ and res1, tmp1, in1
+ ret
- /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */
- mov x5, x0
-1: ldaxrb w0, [x5]
- stlxrb w4, w2, [x5]
- cbnz w4, 1b
+ /* RELEASE/ACQ_REL/SEQ_CST. */
+2: ldclrpal tmp0, tmp1, [x0]
+ and res0, tmp0, in0
+ and res1, tmp1, in1
ret
-END (test_and_set_16)
+END_FEAT (and_fetch_16, LSE128)
+#endif /* HAVE_FEAT_LSE128 */
+#endif /* HAVE_IFUNC */
/* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code. */
--
2.34.1
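To check the register-level data paths in the moved routines, the C sketch below models several of them sequentially. Each 128-bit value is split into two 64-bit halves, matching the `res0`/`res1` and `in0`/`in1` register pairs. It models only the arithmetic. It says nothing about atomicity, exclusive monitors or memory ordering. The `u128_pair` type and all `model_*` functions are hypothetical names introduced here. The LDCLRP/LDSETP behaviour is an assumption taken from the Arm description of LSE128: the instruction returns the old contents and stores `old & ~operand` or `old | operand`, respectively.

```c
#include <assert.h>  /* used by the usage checks */
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves,
   mirroring the res0/res1 and in0/in1 register pairs. */
typedef struct { uint64_t lo, hi; } u128_pair;

/* fetch_nand_16, LL/SC path: in0/in1 are inverted once with MVN, then
   ORN computes ~in | ~old, which equals ~(old & in). Returns old. */
static u128_pair model_fetch_nand(u128_pair *mem, u128_pair in)
{
    u128_pair old = *mem;                  /* ldxp res0, res1, [x5]      */
    uint64_t nin0 = ~in.lo, nin1 = ~in.hi; /* mvn in0, in0 / mvn in1, in1 */
    mem->lo = nin0 | ~old.lo;              /* orn tmp0, in0, res0        */
    mem->hi = nin1 | ~old.hi;              /* orn tmp1, in1, res1        */
    return old;                            /* stxp stores tmp0/tmp1      */
}

/* Assumed LDCLRP behaviour: return old contents, store old & ~mask. */
static u128_pair model_ldclrp(u128_pair *mem, u128_pair mask)
{
    u128_pair old = *mem;
    mem->lo = old.lo & ~mask.lo;
    mem->hi = old.hi & ~mask.hi;
    return old;
}

/* Assumed LDSETP behaviour: return old contents, store old | mask. */
static u128_pair model_ldsetp(u128_pair *mem, u128_pair mask)
{
    u128_pair old = *mem;
    mem->lo = old.lo | mask.lo;
    mem->hi = old.hi | mask.hi;
    return old;
}

/* fetch_and_16 (LSE128): clearing the bits of ~in leaves old & in. */
static u128_pair model_fetch_and_lse128(u128_pair *mem, u128_pair in)
{
    u128_pair nin = { ~in.lo, ~in.hi };    /* mvn res0, in0 / mvn res1, in1 */
    return model_ldclrp(mem, nin);         /* ldclrp res0, res1, [tmp0]   */
}

/* and_fetch_16 (LSE128): the instruction only returns the old value,
   so the new value old & in is recomputed with AND afterwards. */
static u128_pair model_and_fetch_lse128(u128_pair *mem, u128_pair in)
{
    u128_pair nin = { ~in.lo, ~in.hi };    /* mvn tmp0, in0 / mvn tmp1, in1 */
    u128_pair old = model_ldclrp(mem, nin);
    u128_pair res = { old.lo & in.lo, old.hi & in.hi };
    return res;
}

/* or_fetch_16 (LSE128): likewise, old | in is recomputed with ORR. */
static u128_pair model_or_fetch_lse128(u128_pair *mem, u128_pair in)
{
    u128_pair old = model_ldsetp(mem, in); /* ldsetp in0, in1, [x0] */
    u128_pair res = { old.lo | in.lo, old.hi | in.hi };
    return res;
}
```

This shows why the LSE128 `*_fetch` variants are slightly longer than the `fetch_*` ones. LDCLRP and LDSETP only hand back the previous contents, so returning the updated value costs one extra AND or ORR per half.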
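The LSE128 routines and `compare_exchange_16` pick an instruction variant from the memory-order argument in `w4` using compare-and-branch sequences. The C sketch below restates that mapping. It assumes that the `ACQUIRE` and `RELEASE` symbols used in the assembly have the same values as GCC's predefined `__ATOMIC_RELAXED` (0) through `__ATOMIC_SEQ_CST` (5), because the ordered `b.hi`/`b.hs` tests depend on that ordering. `lse128_suffix` and `casp_suffix` are hypothetical helpers. Each returns the suffix of the mnemonic the assembly would reach.

```c
#include <assert.h>  /* used by the usage checks */
#include <string.h>

/* LSE128 routines (exchange_16, fetch_or_16, ...): "cbnz w4" sends anything
   but RELAXED onward; "cmp w4, ACQUIRE; b.hi 2f" then sends CONSUME/ACQUIRE
   to the -A form and RELEASE/ACQ_REL/SEQ_CST to the -AL form. */
static const char *lse128_suffix(int model)
{
    if (model == __ATOMIC_RELAXED)
        return "";                              /* swpp, ldsetp, ldclrp */
    if ((unsigned) model <= __ATOMIC_ACQUIRE)   /* b.hi is an unsigned test */
        return "a";
    return "al";
}

/* compare_exchange_16 (LSE): "cmp w4, RELEASE" is evaluated once and its
   flags are reused by both "b.hs 3f" and, at label 3, "b.hi 4f". */
static const char *casp_suffix(int model)
{
    if (model == __ATOMIC_RELAXED)
        return "";                              /* casp   */
    if ((unsigned) model < __ATOMIC_RELEASE)
        return "a";                             /* caspa  */
    if ((unsigned) model == __ATOMIC_RELEASE)
        return "l";                             /* caspl  */
    return "al";                                /* caspal */
}
```

A pure RELEASE request is therefore served by the acquire-release form in the LSE128 routines, which is stronger than required but still correct. Only `compare_exchange_16` distinguishes a release-only variant.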
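The LSE `compare_exchange_16` path first saves the expected value in `tmp0`/`tmp1`. It then lets CASP overwrite `exp0`/`exp1` with the value CASP observed, and compares the two to produce the boolean result. The sequential C model below is a sketch of that control flow, assuming the usual CASP behaviour: it always returns the old memory contents and stores the new pair only on a match. It omits atomicity and ordering. `u128_pair` and `model_compare_exchange` are hypothetical names introduced here.

```c
#include <assert.h>  /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_pair;

static bool model_compare_exchange(u128_pair *mem, u128_pair *expected,
                                   u128_pair desired)
{
    u128_pair exp = *expected;           /* ldp exp0, exp1, [x1]          */
    u128_pair saved = exp;               /* mov tmp0, exp0 / mov tmp1, exp1 */

    /* casp: compare, conditionally store, always return old contents. */
    u128_pair observed = *mem;
    if (observed.lo == exp.lo && observed.hi == exp.hi)
        *mem = desired;
    exp = observed;

    /* cmp exp0, tmp0 / ccmp exp1, tmp1, 0, eq / bne 1f */
    if (exp.lo == saved.lo && exp.hi == saved.hi)
        return true;                     /* mov x0, 1 */

    *expected = exp;                     /* stp exp0, exp1, [x1] */
    return false;                        /* mov x0, 0 */
}
```

On failure the observed value is written back through the `expected` pointer. This matches the contract of GCC's `__atomic_compare_exchange` builtins, so a caller can retry without a separate reload.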
Thread overview: 5+ messages
2024-05-16 13:36 [PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 1/4] Libatomic: Define per-file identifier macros Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing Victor Do Nascimento
2024-05-16 13:36 ` [PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file Victor Do Nascimento