* [PATCH v2 0/2] Libatomic: Add LSE128 atomics support for AArch64 @ 2023-11-13 11:37 Victor Do Nascimento 2023-11-13 11:37 ` [PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento 2023-11-13 11:37 ` [PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a Victor Do Nascimento 0 siblings, 2 replies; 6+ messages in thread From: Victor Do Nascimento @ 2023-11-13 11:37 UTC (permalink / raw) To: gcc-patches Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw, Victor Do Nascimento v2 updates: Move the previously unguarded definition of IFUNC_NCOND(N) in `host-config.h' to within the scope of `#ifdef HWCAP_USCAT'. Its definition is now contingent not only on the value of N but also on HWCAP_USCAT being defined, since building on systems where HWCAP_USCAT is undefined and N == 16 was found to trigger a previously-undetected build error. --- Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for Libatomic, namely the patches from [1] and [2], this patch series extends the library's capabilities to dynamically select, via ifuncs, Armv9.4-a LSE128 implementations of atomic operations at run time whenever architectural support is present. Regression tested on the aarch64-linux-gnu target with LSE128 support. 
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (2): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface libatomic: Enable LSE128 128-bit atomics for armv9.4-a libatomic/Makefile.am | 3 + libatomic/Makefile.in | 1 + libatomic/acinclude.m4 | 19 ++ libatomic/auto-config.h.in | 3 + libatomic/config/linux/aarch64/atomic_16.S | 315 ++++++++++++++----- libatomic/config/linux/aarch64/host-config.h | 27 +- libatomic/configure | 59 +++- libatomic/configure.ac | 1 + 8 files changed, 352 insertions(+), 76 deletions(-) -- 2.42.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface 2023-11-13 11:37 [PATCH v2 0/2] Libatomic: Add LSE128 atomics support for AArch64 Victor Do Nascimento @ 2023-11-13 11:37 ` Victor Do Nascimento 2023-11-29 13:44 ` Richard Earnshaw 2023-11-13 11:37 ` [PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a Victor Do Nascimento 1 sibling, 1 reply; 6+ messages in thread From: Victor Do Nascimento @ 2023-11-13 11:37 UTC (permalink / raw) To: gcc-patches Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw, Victor Do Nascimento The introduction of further architectural-feature-dependent ifuncs for AArch64 makes hard-coded ifunc `_i<n>' suffixes on function names cumbersome to work with: it is awkward to remember which suffix maps onto which arch feature, and the code becomes harder to maintain when new ifuncs are added and their suffixes possibly altered. This patch uses pre-processor `#define' statements to map each suffix to a descriptive feature-name macro, for example: #define LSE2 _i1 and reconstructs function names with the pre-processor's token concatenation feature, such that for `MACRO(name)' we now have `MACRO(name, feature)' and in the macro definition body we replace `name' with `name##feature'. libatomic/ChangeLog: * config/linux/aarch64/atomic_16.S (CORE): New macro. (LSE2): Likewise. (ENTRY): Modify macro to take in `feat' argument. (END): Likewise. (ALIAS): Likewise. (ENTRY1): New macro. (END1): Likewise. (ALIAS1): Likewise. 
--- libatomic/config/linux/aarch64/atomic_16.S | 147 +++++++++++---------- 1 file changed, 79 insertions(+), 68 deletions(-) diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S index 0485c284117..3f6225830e6 100644 --- a/libatomic/config/linux/aarch64/atomic_16.S +++ b/libatomic/config/linux/aarch64/atomic_16.S @@ -39,22 +39,34 @@ .arch armv8-a+lse -#define ENTRY(name) \ - .global name; \ - .hidden name; \ - .type name,%function; \ - .p2align 4; \ -name: \ - .cfi_startproc; \ +#define ENTRY(name, feat) \ + ENTRY1(name, feat) + +#define ENTRY1(name, feat) \ + .global name##feat; \ + .hidden name##feat; \ + .type name##feat,%function; \ + .p2align 4; \ +name##feat: \ + .cfi_startproc; \ hint 34 // bti c -#define END(name) \ - .cfi_endproc; \ - .size name, .-name; +#define END(name, feat) \ + END1(name, feat) -#define ALIAS(alias,name) \ - .global alias; \ - .set alias, name; +#define END1(name, feat) \ + .cfi_endproc; \ + .size name##feat, .-name##feat; + +#define ALIAS(alias, from, to) \ + ALIAS1(alias,from,to) + +#define ALIAS1(alias, from, to) \ + .global alias##from; \ + .set alias##from, alias##to; + +#define CORE +#define LSE2 _i1 #define res0 x0 #define res1 x1 @@ -89,7 +101,7 @@ name: \ #define SEQ_CST 5 -ENTRY (libat_load_16) +ENTRY (libat_load_16, CORE) mov x5, x0 cbnz w1, 2f @@ -104,10 +116,10 @@ ENTRY (libat_load_16) stxp w4, res0, res1, [x5] cbnz w4, 2b ret -END (libat_load_16) +END (libat_load_16, CORE) -ENTRY (libat_load_16_i1) +ENTRY (libat_load_16, LSE2) cbnz w1, 1f /* RELAXED. */ @@ -127,10 +139,10 @@ ENTRY (libat_load_16_i1) ldp res0, res1, [x0] dmb ishld ret -END (libat_load_16_i1) +END (libat_load_16, LSE2) -ENTRY (libat_store_16) +ENTRY (libat_store_16, CORE) cbnz w4, 2f /* RELAXED. 
*/ @@ -144,10 +156,10 @@ ENTRY (libat_store_16) stlxp w4, in0, in1, [x0] cbnz w4, 2b ret -END (libat_store_16) +END (libat_store_16, CORE) -ENTRY (libat_store_16_i1) +ENTRY (libat_store_16, LSE2) cbnz w4, 1f /* RELAXED. */ @@ -159,10 +171,10 @@ ENTRY (libat_store_16_i1) stlxp w4, in0, in1, [x0] cbnz w4, 1b ret -END (libat_store_16_i1) +END (libat_store_16, LSE2) -ENTRY (libat_exchange_16) +ENTRY (libat_exchange_16, CORE) mov x5, x0 cbnz w4, 2f @@ -186,10 +198,10 @@ ENTRY (libat_exchange_16) stlxp w4, in0, in1, [x5] cbnz w4, 4b ret -END (libat_exchange_16) +END (libat_exchange_16, CORE) -ENTRY (libat_compare_exchange_16) +ENTRY (libat_compare_exchange_16, CORE) ldp exp0, exp1, [x1] cbz w4, 3f cmp w4, RELEASE @@ -228,10 +240,10 @@ ENTRY (libat_compare_exchange_16) cbnz w4, 4b mov x0, 1 ret -END (libat_compare_exchange_16) +END (libat_compare_exchange_16, CORE) -ENTRY (libat_compare_exchange_16_i1) +ENTRY (libat_compare_exchange_16, LSE2) ldp exp0, exp1, [x1] mov tmp0, exp0 mov tmp1, exp1 @@ -264,10 +276,10 @@ ENTRY (libat_compare_exchange_16_i1) /* ACQ_REL/SEQ_CST. 
*/ 4: caspal exp0, exp1, in0, in1, [x0] b 0b -END (libat_compare_exchange_16_i1) +END (libat_compare_exchange_16, LSE2) -ENTRY (libat_fetch_add_16) +ENTRY (libat_fetch_add_16, CORE) mov x5, x0 cbnz w4, 2f @@ -286,10 +298,10 @@ ENTRY (libat_fetch_add_16) stlxp w4, tmp0, tmp1, [x5] cbnz w4, 2b ret -END (libat_fetch_add_16) +END (libat_fetch_add_16, CORE) -ENTRY (libat_add_fetch_16) +ENTRY (libat_add_fetch_16, CORE) mov x5, x0 cbnz w4, 2f @@ -308,10 +320,10 @@ ENTRY (libat_add_fetch_16) stlxp w4, res0, res1, [x5] cbnz w4, 2b ret -END (libat_add_fetch_16) +END (libat_add_fetch_16, CORE) -ENTRY (libat_fetch_sub_16) +ENTRY (libat_fetch_sub_16, CORE) mov x5, x0 cbnz w4, 2f @@ -330,10 +342,10 @@ ENTRY (libat_fetch_sub_16) stlxp w4, tmp0, tmp1, [x5] cbnz w4, 2b ret -END (libat_fetch_sub_16) +END (libat_fetch_sub_16, CORE) -ENTRY (libat_sub_fetch_16) +ENTRY (libat_sub_fetch_16, CORE) mov x5, x0 cbnz w4, 2f @@ -352,10 +364,10 @@ ENTRY (libat_sub_fetch_16) stlxp w4, res0, res1, [x5] cbnz w4, 2b ret -END (libat_sub_fetch_16) +END (libat_sub_fetch_16, CORE) -ENTRY (libat_fetch_or_16) +ENTRY (libat_fetch_or_16, CORE) mov x5, x0 cbnz w4, 2f @@ -374,10 +386,10 @@ ENTRY (libat_fetch_or_16) stlxp w4, tmp0, tmp1, [x5] cbnz w4, 2b ret -END (libat_fetch_or_16) +END (libat_fetch_or_16, CORE) -ENTRY (libat_or_fetch_16) +ENTRY (libat_or_fetch_16, CORE) mov x5, x0 cbnz w4, 2f @@ -396,10 +408,10 @@ ENTRY (libat_or_fetch_16) stlxp w4, res0, res1, [x5] cbnz w4, 2b ret -END (libat_or_fetch_16) +END (libat_or_fetch_16, CORE) -ENTRY (libat_fetch_and_16) +ENTRY (libat_fetch_and_16, CORE) mov x5, x0 cbnz w4, 2f @@ -418,10 +430,10 @@ ENTRY (libat_fetch_and_16) stlxp w4, tmp0, tmp1, [x5] cbnz w4, 2b ret -END (libat_fetch_and_16) +END (libat_fetch_and_16, CORE) -ENTRY (libat_and_fetch_16) +ENTRY (libat_and_fetch_16, CORE) mov x5, x0 cbnz w4, 2f @@ -440,10 +452,10 @@ ENTRY (libat_and_fetch_16) stlxp w4, res0, res1, [x5] cbnz w4, 2b ret -END (libat_and_fetch_16) +END (libat_and_fetch_16, CORE) -ENTRY 
(libat_fetch_xor_16) +ENTRY (libat_fetch_xor_16, CORE) mov x5, x0 cbnz w4, 2f @@ -462,10 +474,10 @@ ENTRY (libat_fetch_xor_16) stlxp w4, tmp0, tmp1, [x5] cbnz w4, 2b ret -END (libat_fetch_xor_16) +END (libat_fetch_xor_16, CORE) -ENTRY (libat_xor_fetch_16) +ENTRY (libat_xor_fetch_16, CORE) mov x5, x0 cbnz w4, 2f @@ -484,10 +496,10 @@ ENTRY (libat_xor_fetch_16) stlxp w4, res0, res1, [x5] cbnz w4, 2b ret -END (libat_xor_fetch_16) +END (libat_xor_fetch_16, CORE) -ENTRY (libat_fetch_nand_16) +ENTRY (libat_fetch_nand_16, CORE) mov x5, x0 mvn in0, in0 mvn in1, in1 @@ -508,10 +520,10 @@ ENTRY (libat_fetch_nand_16) stlxp w4, tmp0, tmp1, [x5] cbnz w4, 2b ret -END (libat_fetch_nand_16) +END (libat_fetch_nand_16, CORE) -ENTRY (libat_nand_fetch_16) +ENTRY (libat_nand_fetch_16, CORE) mov x5, x0 mvn in0, in0 mvn in1, in1 @@ -532,12 +544,12 @@ ENTRY (libat_nand_fetch_16) stlxp w4, res0, res1, [x5] cbnz w4, 2b ret -END (libat_nand_fetch_16) +END (libat_nand_fetch_16, CORE) /* __atomic_test_and_set is always inlined, so this entry is unused and only required for completeness. */ -ENTRY (libat_test_and_set_16) +ENTRY (libat_test_and_set_16, CORE) /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */ mov x5, x0 @@ -545,26 +557,25 @@ ENTRY (libat_test_and_set_16) stlxrb w4, w2, [x5] cbnz w4, 1b ret -END (libat_test_and_set_16) +END (libat_test_and_set_16, CORE) /* Alias entry points which are the same in baseline and LSE2. 
*/ -ALIAS (libat_exchange_16_i1, libat_exchange_16) -ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16) -ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16) -ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16) -ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16) -ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16) -ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16) -ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16) -ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16) -ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16) -ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16) -ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16) -ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16) -ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16) - +ALIAS (libat_exchange_16, LSE2, CORE) +ALIAS (libat_fetch_add_16, LSE2, CORE) +ALIAS (libat_add_fetch_16, LSE2, CORE) +ALIAS (libat_fetch_sub_16, LSE2, CORE) +ALIAS (libat_sub_fetch_16, LSE2, CORE) +ALIAS (libat_fetch_or_16, LSE2, CORE) +ALIAS (libat_or_fetch_16, LSE2, CORE) +ALIAS (libat_fetch_and_16, LSE2, CORE) +ALIAS (libat_and_fetch_16, LSE2, CORE) +ALIAS (libat_fetch_xor_16, LSE2, CORE) +ALIAS (libat_xor_fetch_16, LSE2, CORE) +ALIAS (libat_fetch_nand_16, LSE2, CORE) +ALIAS (libat_nand_fetch_16, LSE2, CORE) +ALIAS (libat_test_and_set_16, LSE2, CORE) /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code. */ #define FEATURE_1_AND 0xc0000000 -- 2.42.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface 2023-11-13 11:37 ` [PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento @ 2023-11-29 13:44 ` Richard Earnshaw 0 siblings, 0 replies; 6+ messages in thread From: Richard Earnshaw @ 2023-11-29 13:44 UTC (permalink / raw) To: Victor Do Nascimento, gcc-patches Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw On 13/11/2023 11:37, Victor Do Nascimento wrote: > The introduction of further architectural-feature dependent ifuncs > for AArch64 makes hard-coding ifunc `_i<n>' suffixes to functions > cumbersome to work with. It is awkward to remember which ifunc maps > onto which arch feature and makes the code harder to maintain when new > ifuncs are added and their suffixes possibly altered. > > This patch uses pre-processor `#define' statements to map each suffix to > a descriptive feature name macro, for example: > > #define LSE2 _i1 > > and reconstructs function names with the pre-processor's token > concatenation feature, such that for `MACRO(name)', we would now have > `MACRO(name, feature)' and in the macro definition body we replace > `name` with `name##feature`. > > libatomic/ChangeLog: > * config/linux/aarch64/atomic_16.S (CORE): New macro. > (LSE2): Likewise. > (ENTRY): Modify macro to take in `arch' argument. > (END): Likewise. > (ALIAS): Likewise. > (ENTRY1): New macro. > (END1): Likewise. > (ALIAS): Likewise. 
> --- > libatomic/config/linux/aarch64/atomic_16.S | 147 +++++++++++---------- > 1 file changed, 79 insertions(+), 68 deletions(-) > > diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S > index 0485c284117..3f6225830e6 100644 > --- a/libatomic/config/linux/aarch64/atomic_16.S > +++ b/libatomic/config/linux/aarch64/atomic_16.S > @@ -39,22 +39,34 @@ > > .arch armv8-a+lse > > -#define ENTRY(name) \ > - .global name; \ > - .hidden name; \ > - .type name,%function; \ > - .p2align 4; \ > -name: \ > - .cfi_startproc; \ > +#define ENTRY(name, feat) \ > + ENTRY1(name, feat) I'd be much more inclined to keep the 'API' of ENTRY and the related functions the same and then define a new macro ENTRY_FEAT that took the second parameter; then you could define ENTRY as #define ENTRY(name) ENTRY_FEAT (name, CORE) and save the need to modify all the base functionality. > + > +#define ENTRY1(name, feat) \ > + .global name##feat; \ > + .hidden name##feat; \ > + .type name##feat,%function; \ > + .p2align 4; \ > +name##feat: \ > + .cfi_startproc; \ > hint 34 // bti c > > -#define END(name) \ > - .cfi_endproc; \ > - .size name, .-name; > +#define END(name, feat) \ > + END1(name, feat) > > -#define ALIAS(alias,name) \ > - .global alias; \ > - .set alias, name; > +#define END1(name, feat) \ > + .cfi_endproc; \ > + .size name##feat, .-name##feat; > + > +#define ALIAS(alias, from, to) \ > + ALIAS1(alias,from,to) > + > +#define ALIAS1(alias, from, to) \ > + .global alias##from; \ > + .set alias##from, alias##to > + > +#define CORE > +#define LSE2 _i1 > > #define res0 x0 > #define res1 x1 > @@ -89,7 +101,7 @@ name: \ > #define SEQ_CST 5 > > > -ENTRY (libat_load_16) > +ENTRY (libat_load_16, CORE) > mov x5, x0 > cbnz w1, 2f > > @@ -104,10 +116,10 @@ ENTRY (libat_load_16) > stxp w4, res0, res1, [x5] > cbnz w4, 2b > ret > -END (libat_load_16) > +END (libat_load_16, CORE) > > > -ENTRY (libat_load_16_i1) > +ENTRY (libat_load_16, LSE2) > cbnz w1, 1f > > 
/* RELAXED. */ > @@ -127,10 +139,10 @@ ENTRY (libat_load_16_i1) > ldp res0, res1, [x0] > dmb ishld > ret > -END (libat_load_16_i1) > +END (libat_load_16, LSE2) > > > -ENTRY (libat_store_16) > +ENTRY (libat_store_16, CORE) > cbnz w4, 2f > > /* RELAXED. */ > @@ -144,10 +156,10 @@ ENTRY (libat_store_16) > stlxp w4, in0, in1, [x0] > cbnz w4, 2b > ret > -END (libat_store_16) > +END (libat_store_16, CORE) > > > -ENTRY (libat_store_16_i1) > +ENTRY (libat_store_16, LSE2) > cbnz w4, 1f > > /* RELAXED. */ > @@ -159,10 +171,10 @@ ENTRY (libat_store_16_i1) > stlxp w4, in0, in1, [x0] > cbnz w4, 1b > ret > -END (libat_store_16_i1) > +END (libat_store_16, LSE2) > > > -ENTRY (libat_exchange_16) > +ENTRY (libat_exchange_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -186,10 +198,10 @@ ENTRY (libat_exchange_16) > stlxp w4, in0, in1, [x5] > cbnz w4, 4b > ret > -END (libat_exchange_16) > +END (libat_exchange_16, CORE) > > > -ENTRY (libat_compare_exchange_16) > +ENTRY (libat_compare_exchange_16, CORE) > ldp exp0, exp1, [x1] > cbz w4, 3f > cmp w4, RELEASE > @@ -228,10 +240,10 @@ ENTRY (libat_compare_exchange_16) > cbnz w4, 4b > mov x0, 1 > ret > -END (libat_compare_exchange_16) > +END (libat_compare_exchange_16, CORE) > > > -ENTRY (libat_compare_exchange_16_i1) > +ENTRY (libat_compare_exchange_16, LSE2) > ldp exp0, exp1, [x1] > mov tmp0, exp0 > mov tmp1, exp1 > @@ -264,10 +276,10 @@ ENTRY (libat_compare_exchange_16_i1) > /* ACQ_REL/SEQ_CST. 
*/ > 4: caspal exp0, exp1, in0, in1, [x0] > b 0b > -END (libat_compare_exchange_16_i1) > +END (libat_compare_exchange_16, LSE2) > > > -ENTRY (libat_fetch_add_16) > +ENTRY (libat_fetch_add_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -286,10 +298,10 @@ ENTRY (libat_fetch_add_16) > stlxp w4, tmp0, tmp1, [x5] > cbnz w4, 2b > ret > -END (libat_fetch_add_16) > +END (libat_fetch_add_16, CORE) > > > -ENTRY (libat_add_fetch_16) > +ENTRY (libat_add_fetch_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -308,10 +320,10 @@ ENTRY (libat_add_fetch_16) > stlxp w4, res0, res1, [x5] > cbnz w4, 2b > ret > -END (libat_add_fetch_16) > +END (libat_add_fetch_16, CORE) > > > -ENTRY (libat_fetch_sub_16) > +ENTRY (libat_fetch_sub_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -330,10 +342,10 @@ ENTRY (libat_fetch_sub_16) > stlxp w4, tmp0, tmp1, [x5] > cbnz w4, 2b > ret > -END (libat_fetch_sub_16) > +END (libat_fetch_sub_16, CORE) > > > -ENTRY (libat_sub_fetch_16) > +ENTRY (libat_sub_fetch_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -352,10 +364,10 @@ ENTRY (libat_sub_fetch_16) > stlxp w4, res0, res1, [x5] > cbnz w4, 2b > ret > -END (libat_sub_fetch_16) > +END (libat_sub_fetch_16, CORE) > > > -ENTRY (libat_fetch_or_16) > +ENTRY (libat_fetch_or_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -374,10 +386,10 @@ ENTRY (libat_fetch_or_16) > stlxp w4, tmp0, tmp1, [x5] > cbnz w4, 2b > ret > -END (libat_fetch_or_16) > +END (libat_fetch_or_16, CORE) > > > -ENTRY (libat_or_fetch_16) > +ENTRY (libat_or_fetch_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -396,10 +408,10 @@ ENTRY (libat_or_fetch_16) > stlxp w4, res0, res1, [x5] > cbnz w4, 2b > ret > -END (libat_or_fetch_16) > +END (libat_or_fetch_16, CORE) > > > -ENTRY (libat_fetch_and_16) > +ENTRY (libat_fetch_and_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -418,10 +430,10 @@ ENTRY (libat_fetch_and_16) > stlxp w4, tmp0, tmp1, [x5] > cbnz w4, 2b > ret > -END (libat_fetch_and_16) > +END (libat_fetch_and_16, CORE) > > > -ENTRY (libat_and_fetch_16) > +ENTRY 
(libat_and_fetch_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -440,10 +452,10 @@ ENTRY (libat_and_fetch_16) > stlxp w4, res0, res1, [x5] > cbnz w4, 2b > ret > -END (libat_and_fetch_16) > +END (libat_and_fetch_16, CORE) > > > -ENTRY (libat_fetch_xor_16) > +ENTRY (libat_fetch_xor_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -462,10 +474,10 @@ ENTRY (libat_fetch_xor_16) > stlxp w4, tmp0, tmp1, [x5] > cbnz w4, 2b > ret > -END (libat_fetch_xor_16) > +END (libat_fetch_xor_16, CORE) > > > -ENTRY (libat_xor_fetch_16) > +ENTRY (libat_xor_fetch_16, CORE) > mov x5, x0 > cbnz w4, 2f > > @@ -484,10 +496,10 @@ ENTRY (libat_xor_fetch_16) > stlxp w4, res0, res1, [x5] > cbnz w4, 2b > ret > -END (libat_xor_fetch_16) > +END (libat_xor_fetch_16, CORE) > > > -ENTRY (libat_fetch_nand_16) > +ENTRY (libat_fetch_nand_16, CORE) > mov x5, x0 > mvn in0, in0 > mvn in1, in1 > @@ -508,10 +520,10 @@ ENTRY (libat_fetch_nand_16) > stlxp w4, tmp0, tmp1, [x5] > cbnz w4, 2b > ret > -END (libat_fetch_nand_16) > +END (libat_fetch_nand_16, CORE) > > > -ENTRY (libat_nand_fetch_16) > +ENTRY (libat_nand_fetch_16, CORE) > mov x5, x0 > mvn in0, in0 > mvn in1, in1 > @@ -532,12 +544,12 @@ ENTRY (libat_nand_fetch_16) > stlxp w4, res0, res1, [x5] > cbnz w4, 2b > ret > -END (libat_nand_fetch_16) > +END (libat_nand_fetch_16, CORE) > > > /* __atomic_test_and_set is always inlined, so this entry is unused and > only required for completeness. */ > -ENTRY (libat_test_and_set_16) > +ENTRY (libat_test_and_set_16, CORE) > > /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. */ > mov x5, x0 > @@ -545,26 +557,25 @@ ENTRY (libat_test_and_set_16) > stlxrb w4, w2, [x5] > cbnz w4, 1b > ret > -END (libat_test_and_set_16) > +END (libat_test_and_set_16, CORE) > > > /* Alias entry points which are the same in baseline and LSE2. 
*/ > > -ALIAS (libat_exchange_16_i1, libat_exchange_16) > -ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16) > -ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16) > -ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16) > -ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16) > -ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16) > -ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16) > -ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16) > -ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16) > -ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16) > -ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16) > -ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16) > -ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16) > -ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16) > - > +ALIAS (libat_exchange_16, LSE2, CORE) > +ALIAS (libat_fetch_add_16, LSE2, CORE) > +ALIAS (libat_add_fetch_16, LSE2, CORE) > +ALIAS (libat_fetch_sub_16, LSE2, CORE) > +ALIAS (libat_sub_fetch_16, LSE2, CORE) > +ALIAS (libat_fetch_or_16, LSE2, CORE) > +ALIAS (libat_or_fetch_16, LSE2, CORE) > +ALIAS (libat_fetch_and_16, LSE2, CORE) > +ALIAS (libat_and_fetch_16, LSE2, CORE) > +ALIAS (libat_fetch_xor_16, LSE2, CORE) > +ALIAS (libat_xor_fetch_16, LSE2, CORE) > +ALIAS (libat_fetch_nand_16, LSE2, CORE) > +ALIAS (libat_nand_fetch_16, LSE2, CORE) > +ALIAS (libat_test_and_set_16, LSE2, CORE) > > /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code. */ > #define FEATURE_1_AND 0xc0000000 Other than that, this is OK. R. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a 2023-11-13 11:37 [PATCH v2 0/2] Libatomic: Add LSE128 atomics support for AArch64 Victor Do Nascimento 2023-11-13 11:37 ` [PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface Victor Do Nascimento @ 2023-11-13 11:37 ` Victor Do Nascimento 2023-11-29 15:15 ` Richard Earnshaw 1 sibling, 1 reply; 6+ messages in thread From: Victor Do Nascimento @ 2023-11-13 11:37 UTC (permalink / raw) To: gcc-patches Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw, Victor Do Nascimento The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with a 128-bit value held in a pair of registers, with the original data loaded into the same two registers. * LDSETP - Atomic OR (bitset) of a location with a 128-bit value held in a pair of registers, with the original data loaded into the same two registers. * SWPP - Atomic swap of one 128-bit value with a 128-bit value held in a pair of registers. This patch adds the logic required to make use of these when the architectural feature is present and a suitable assembler is available. In order to do this, the following changes are made: 1. Add a configure-time check for LSE128 support in the assembler. 2. Edit host-config.h so that when N == 16, nifunc = 2. 3. Where available due to LSE128, implement the second ifunc, making use of the novel instructions. 4. For atomic functions unable to make use of these new instructions, define a new alias which causes the _i1 function variant to point ahead to the corresponding _i2 implementation. libatomic/ChangeLog: * Makefile.am (AM_CPPFLAGS): Add conditional setting of -DHAVE_FEAT_LSE128. * acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New. * config/linux/aarch64/atomic_16.S (LSE128): New macro definition. (libat_exchange_16): New LSE128 variant. (libat_fetch_or_16): Likewise. (libat_or_fetch_16): Likewise. 
(libat_fetch_and_16): Likewise. (libat_and_fetch_16): Likewise. * config/linux/aarch64/host-config.h (IFUNC_COND_2): New. (IFUNC_NCOND): Add operand size checking. (has_lse2): Renamed from `ifunc1'. (has_lse128): New. (HAS_LSE128): Likewise. * configure.ac: Add call to LIBAT_TEST_FEAT_LSE128. * configure (ac_subst_vars): Regenerated via autoreconf. * Makefile.in: Likewise. * auto-config.h.in: Likewise. --- libatomic/Makefile.am | 3 + libatomic/Makefile.in | 1 + libatomic/acinclude.m4 | 19 +++ libatomic/auto-config.h.in | 3 + libatomic/config/linux/aarch64/atomic_16.S | 170 ++++++++++++++++++- libatomic/config/linux/aarch64/host-config.h | 27 ++- libatomic/configure | 59 ++++++- libatomic/configure.ac | 1 + 8 files changed, 274 insertions(+), 9 deletions(-) diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am index c0b8dea5037..24e843db67d 100644 --- a/libatomic/Makefile.am +++ b/libatomic/Makefile.am @@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix _$(s)_.lo,$(SIZEOBJS))) ## On a target-specific basis, include alternates to be selected by IFUNC. 
if HAVE_IFUNC if ARCH_AARCH64_LINUX +if ARCH_AARCH64_HAVE_LSE128 +AM_CPPFLAGS = -DHAVE_FEAT_LSE128 +endif IFUNC_OPTIONS = -march=armv8-a+lse libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS))) libatomic_la_SOURCES += atomic_16.S diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in index dc2330b91fd..cd48fa21334 100644 --- a/libatomic/Makefile.in +++ b/libatomic/Makefile.in @@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files))) libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \ _$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \ $(am__append_4) $(am__append_5) +@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS = -DHAVE_FEAT_LSE128 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp -DHAVE_KERNEL64 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586 diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4 index f35ab5b60a5..4197db8f404 100644 --- a/libatomic/acinclude.m4 +++ b/libatomic/acinclude.m4 @@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[ ]) ]) +dnl +dnl Test if the host assembler supports armv9.4-a LSE128 insns. 
+dnl +AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[ + AC_CACHE_CHECK([for armv9.4-a LSE128 insn support], + [libat_cv_have_feat_lse128],[ + AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])]) + if AC_TRY_EVAL(ac_link); then + eval libat_cv_have_feat_lse128=yes + else + eval libat_cv_have_feat_lse128=no + fi + rm -f conftest* + ]) + LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128], + [Have LSE128 support for 16 byte integers.]) + AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 = xyes]) +]) + dnl dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2 dnl diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in index ab3424a759e..7c78933b07d 100644 --- a/libatomic/auto-config.h.in +++ b/libatomic/auto-config.h.in @@ -105,6 +105,9 @@ /* Define to 1 if you have the <dlfcn.h> header file. */ #undef HAVE_DLFCN_H +/* Have LSE128 support for 16 byte integers. */ +#undef HAVE_FEAT_LSE128 + /* Define to 1 if you have the <fenv.h> header file. */ #undef HAVE_FENV_H diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S index 3f6225830e6..44a773031f8 100644 --- a/libatomic/config/linux/aarch64/atomic_16.S +++ b/libatomic/config/linux/aarch64/atomic_16.S @@ -34,10 +34,14 @@ writes, this will be true when using atomics in actual code. The libat_<op>_16 entry points are ARMv8.0. - The libat_<op>_16_i1 entry points are used when LSE2 is available. */ - + The libat_<op>_16_i1 entry points are used when LSE128 is available. + The libat_<op>_16_i2 entry points are used when LSE2 is available. 
*/ +#if HAVE_FEAT_LSE128 + .arch armv8-a+lse128 +#else .arch armv8-a+lse +#endif #define ENTRY(name, feat) \ ENTRY1(name, feat) @@ -66,7 +70,8 @@ name##feat: \ .set alias##from, alias##to; #define CORE -#define LSE2 _i1 +#define LSE128 _i1 +#define LSE2 _i2 #define res0 x0 #define res1 x1 @@ -201,6 +206,31 @@ ENTRY (libat_exchange_16, CORE) END (libat_exchange_16, CORE) +#if HAVE_FEAT_LSE128 +ENTRY (libat_exchange_16, LSE128) + mov tmp0, x0 + mov res0, in0 + mov res1, in1 + cbnz w4, 1f + + /* RELAXED. */ + swpp res0, res1, [tmp0] + ret +1: + cmp w4, ACQUIRE + b.hi 2f + + /* ACQUIRE/CONSUME. */ + swppa res0, res1, [tmp0] + ret + + /* RELEASE/ACQ_REL/SEQ_CST. */ +2: swppal res0, res1, [tmp0] + ret +END (libat_exchange_16, LSE128) +#endif + + ENTRY (libat_compare_exchange_16, CORE) ldp exp0, exp1, [x1] cbz w4, 3f @@ -389,6 +419,31 @@ ENTRY (libat_fetch_or_16, CORE) END (libat_fetch_or_16, CORE) +#if HAVE_FEAT_LSE128 +ENTRY (libat_fetch_or_16, LSE128) + mov tmp0, x0 + mov res0, in0 + mov res1, in1 + cbnz w4, 1f + + /* RELAXED. */ + ldsetp res0, res1, [tmp0] + ret +1: + cmp w4, ACQUIRE + b.hi 2f + + /* ACQUIRE/CONSUME. */ + ldsetpa res0, res1, [tmp0] + ret + + /* RELEASE/ACQ_REL/SEQ_CST. */ +2: ldsetpal res0, res1, [tmp0] + ret +END (libat_fetch_or_16, LSE128) +#endif + + ENTRY (libat_or_fetch_16, CORE) mov x5, x0 cbnz w4, 2f @@ -411,6 +466,36 @@ ENTRY (libat_or_fetch_16, CORE) END (libat_or_fetch_16, CORE) +#if HAVE_FEAT_LSE128 +ENTRY (libat_or_fetch_16, LSE128) + mov tmp0, in0 + mov tmp1, in1 + cbnz w4, 1f + + /* RELAXED. */ + ldsetp in0, in1, [x0] + orr res0, in0, tmp0 + orr res1, in1, tmp1 + ret +1: + cmp w4, ACQUIRE + b.hi 2f + + /* ACQUIRE/CONSUME. */ + ldsetpa in0, in1, [x0] + orr res0, in0, tmp0 + orr res1, in1, tmp1 + ret + + /* RELEASE/ACQ_REL/SEQ_CST. 
*/ +2: ldsetpal in0, in1, [x0] + orr res0, in0, tmp0 + orr res1, in1, tmp1 + ret +END (libat_or_fetch_16, LSE128) +#endif + + ENTRY (libat_fetch_and_16, CORE) mov x5, x0 cbnz w4, 2f @@ -433,6 +518,32 @@ ENTRY (libat_fetch_and_16, CORE) END (libat_fetch_and_16, CORE) +#if HAVE_FEAT_LSE128 +ENTRY (libat_fetch_and_16, LSE128) + mov tmp0, x0 + mvn res0, in0 + mvn res1, in1 + cbnz w4, 1f + + /* RELAXED. */ + ldclrp res0, res1, [tmp0] + ret + +1: + cmp w4, ACQUIRE + b.hi 2f + + /* ACQUIRE/CONSUME. */ + ldclrpa res0, res1, [tmp0] + ret + + /* RELEASE/ACQ_REL/SEQ_CST. */ +2: ldclrpal res0, res1, [tmp0] + ret +END (libat_fetch_and_16, LSE128) +#endif + + ENTRY (libat_and_fetch_16, CORE) mov x5, x0 cbnz w4, 2f @@ -455,6 +566,37 @@ ENTRY (libat_and_fetch_16, CORE) END (libat_and_fetch_16, CORE) +#if HAVE_FEAT_LSE128 +ENTRY (libat_and_fetch_16, LSE128) + mvn tmp0, in0 + mvn tmp1, in1 + cbnz w4, 1f + + /* RELAXED. */ + ldclrp tmp0, tmp1, [x0] + and res0, tmp0, in0 + and res1, tmp1, in1 + ret + +1: + cmp w4, ACQUIRE + b.hi 2f + + /* ACQUIRE/CONSUME. */ + ldclrpa tmp0, tmp1, [x0] + and res0, tmp0, in0 + and res1, tmp1, in1 + ret + + /* RELEASE/ACQ_REL/SEQ_CST. */ +2: ldclrpal tmp0, tmp1, [x0] + and res0, tmp0, in0 + and res1, tmp1, in1 + ret +END (libat_and_fetch_16, LSE128) +#endif + + ENTRY (libat_fetch_xor_16, CORE) mov x5, x0 cbnz w4, 2f @@ -560,6 +702,28 @@ ENTRY (libat_test_and_set_16, CORE) END (libat_test_and_set_16, CORE) +/* Alias entry points which are the same in LSE2 and LSE128. 
*/ + +#if !HAVE_FEAT_LSE128 +ALIAS (libat_exchange_16, LSE128, LSE2) +ALIAS (libat_fetch_or_16, LSE128, LSE2) +ALIAS (libat_fetch_and_16, LSE128, LSE2) +ALIAS (libat_or_fetch_16, LSE128, LSE2) +ALIAS (libat_and_fetch_16, LSE128, LSE2) +#endif +ALIAS (libat_load_16, LSE128, LSE2) +ALIAS (libat_store_16, LSE128, LSE2) +ALIAS (libat_compare_exchange_16, LSE128, LSE2) +ALIAS (libat_fetch_add_16, LSE128, LSE2) +ALIAS (libat_add_fetch_16, LSE128, LSE2) +ALIAS (libat_fetch_sub_16, LSE128, LSE2) +ALIAS (libat_sub_fetch_16, LSE128, LSE2) +ALIAS (libat_fetch_xor_16, LSE128, LSE2) +ALIAS (libat_xor_fetch_16, LSE128, LSE2) +ALIAS (libat_fetch_nand_16, LSE128, LSE2) +ALIAS (libat_nand_fetch_16, LSE128, LSE2) +ALIAS (libat_test_and_set_16, LSE128, LSE2) + /* Alias entry points which are the same in baseline and LSE2. */ ALIAS (libat_exchange_16, LSE2, CORE) diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h index 30ef21c7715..d873e91b1c9 100644 --- a/libatomic/config/linux/aarch64/host-config.h +++ b/libatomic/config/linux/aarch64/host-config.h @@ -26,14 +26,17 @@ #ifdef HWCAP_USCAT # if N == 16 -# define IFUNC_COND_1 (ifunc1 (hwcap)) +# define IFUNC_COND_1 (has_lse128 (hwcap)) +# define IFUNC_COND_2 (has_lse2 (hwcap)) +# define IFUNC_NCOND(N) 2 # else -# define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS) +# define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS) +# define IFUNC_NCOND(N) 1 # endif #else # define IFUNC_COND_1 (false) +# define IFUNC_NCOND(N) 1 #endif -#define IFUNC_NCOND(N) (1) #endif /* HAVE_IFUNC */ @@ -56,7 +59,7 @@ #define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff) static inline bool -ifunc1 (unsigned long hwcap) +has_lse2 (unsigned long hwcap) { if (hwcap & HWCAP_USCAT) return true; @@ -69,6 +72,22 @@ ifunc1 (unsigned long hwcap) return true; return false; } + +/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic, + bits[23:20]. The expected value is 0b0011. Check that. 
*/ +#define HAS_LSE128() ({ \ + unsigned long val; \ + asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (val)); \ + (val & 0xf00000) >= 0x300000; \ + }) + +static inline bool +has_lse128 (unsigned long hwcap) +{ + if (has_lse2 (hwcap) && HAS_LSE128 ()) + return true; + return false; +} #endif #include_next <host-config.h> diff --git a/libatomic/configure b/libatomic/configure index d579bab96f8..ee3bbb97d69 100755 --- a/libatomic/configure +++ b/libatomic/configure @@ -657,6 +657,8 @@ LIBAT_BUILD_VERSIONED_SHLIB_TRUE OPT_LDFLAGS SECTION_LDFLAGS SYSROOT_CFLAGS_FOR_TARGET +ARCH_AARCH64_HAVE_LSE128_FALSE +ARCH_AARCH64_HAVE_LSE128_TRUE enable_aarch64_lse libtool_VERSION ENABLE_DARWIN_AT_RPATH_FALSE @@ -11456,7 +11458,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 11459 "configure" +#line 11461 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -11562,7 +11564,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 11565 "configure" +#line 11567 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -11926,6 +11928,55 @@ ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $ ac_compiler_gnu=$ac_cv_c_compiler_gnu + + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for armv9.4-a LSE128 insn support" >&5 +$as_echo_n "checking for armv9.4-a LSE128 insn support... " >&6; } +if ${libat_cv_have_feat_lse128+:} false; then : + $as_echo_n "(cached) " >&6 +else + + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +int +main () +{ +asm(".arch armv9-a+lse128") + ; + return 0; +} +_ACEOF + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 + (eval $ac_link) 2>&5 + ac_status=$? + $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + test $ac_status = 0; }; then + eval libat_cv_have_feat_lse128=yes + else + eval libat_cv_have_feat_lse128=no + fi + rm -f conftest* + +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libat_cv_have_feat_lse128" >&5 +$as_echo "$libat_cv_have_feat_lse128" >&6; } + + yesno=`echo $libat_cv_have_feat_lse128 | tr 'yesno' '1 0 '` + +cat >>confdefs.h <<_ACEOF +#define HAVE_FEAT_LSE128 $yesno +_ACEOF + + + if test x$libat_cv_have_feat_lse128 = xyes; then + ARCH_AARCH64_HAVE_LSE128_TRUE= + ARCH_AARCH64_HAVE_LSE128_FALSE='#' +else + ARCH_AARCH64_HAVE_LSE128_TRUE='#' + ARCH_AARCH64_HAVE_LSE128_FALSE= +fi + + ;; esac @@ -15989,6 +16040,10 @@ if test -z "${ENABLE_DARWIN_AT_RPATH_TRUE}" && test -z "${ENABLE_DARWIN_AT_RPATH as_fn_error $? "conditional \"ENABLE_DARWIN_AT_RPATH\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi +if test -z "${ARCH_AARCH64_HAVE_LSE128_TRUE}" && test -z "${ARCH_AARCH64_HAVE_LSE128_FALSE}"; then + as_fn_error $? "conditional \"ARCH_AARCH64_HAVE_LSE128\" was never defined. +Usually this means the macro was only invoked conditionally." "$LINENO" 5 +fi if test -z "${LIBAT_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBAT_BUILD_VERSIONED_SHLIB_FALSE}"; then as_fn_error $? "conditional \"LIBAT_BUILD_VERSIONED_SHLIB\" was never defined. diff --git a/libatomic/configure.ac b/libatomic/configure.ac index 5f2821ac3f4..b2fe68d7d0f 100644 --- a/libatomic/configure.ac +++ b/libatomic/configure.ac @@ -169,6 +169,7 @@ AC_MSG_RESULT([$target_thread_file]) case "$target" in *aarch64*) ACX_PROG_CC_WARNING_OPTS([-march=armv8-a+lse],[enable_aarch64_lse]) + LIBAT_TEST_FEAT_LSE128() ;; esac -- 2.42.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a 2023-11-13 11:37 ` [PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a Victor Do Nascimento @ 2023-11-29 15:15 ` Richard Earnshaw 2023-12-08 14:51 ` Szabolcs Nagy 0 siblings, 1 reply; 6+ messages in thread From: Richard Earnshaw @ 2023-11-29 15:15 UTC (permalink / raw) To: Victor Do Nascimento, gcc-patches Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw On 13/11/2023 11:37, Victor Do Nascimento wrote: > The armv9.4-a architectural revision adds three new atomic operations > associated with the LSE128 feature: > > * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit > value held in a pair of registers, with original data loaded into > the same 2 registers. > * LDSETP - Atomic OR (bitset) of a location with 128-bit value held > in a pair of registers, with original data loaded into the same 2 > registers. > * SWPP - Atomic swap of one 128-bit value with 128-bit value held > in a pair of registers. > > This patch adds the logic required to make use of these when the > architectural feature is present and a suitable assembler available. > > In order to do this, the following changes are made: > > 1. Add a configure-time check to check for LSE128 support in the > assembler. > 2. Edit host-config.h so that when N == 16, nifunc = 2. > 3. Where available due to LSE128, implement the second ifunc, making > use of the novel instructions. > 4. For atomic functions unable to make use of these new > instructions, define a new alias which causes the _i1 function > variant to point ahead to the corresponding _i2 implementation. > > libatomic/ChangeLog: > > * Makefile.am (AM_CPPFLAGS): add conditional setting of > -DHAVE_FEAT_LSE128. > * acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New. > * config/linux/aarch64/atomic_16.S (LSE128): New macro > definition. > (libat_exchange_16): New LSE128 variant. > (libat_fetch_or_16): Likewise. > (libat_or_fetch_16): Likewise. 
> (libat_fetch_and_16): Likewise. > (libat_and_fetch_16): Likewise. > * config/linux/aarch64/host-config.h (IFUNC_COND_2): New. > (IFUNC_NCOND): Add operand size checking. > (has_lse2): Renamed from `ifunc1`. > (has_lse128): New. > (HAS_LSE128): Likewise. > * libatomic/configure.ac: Add call to LIBAT_TEST_FEAT_LSE128. > * configure (ac_subst_vars): Regenerated via autoreconf. > * libatomic/Makefile.in: Likewise. > * libatomic/auto-config.h.in: Likewise. > --- > libatomic/Makefile.am | 3 + > libatomic/Makefile.in | 1 + > libatomic/acinclude.m4 | 19 +++ > libatomic/auto-config.h.in | 3 + > libatomic/config/linux/aarch64/atomic_16.S | 170 ++++++++++++++++++- > libatomic/config/linux/aarch64/host-config.h | 27 ++- > libatomic/configure | 59 ++++++- > libatomic/configure.ac | 1 + > 8 files changed, 274 insertions(+), 9 deletions(-) > > diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am > index c0b8dea5037..24e843db67d 100644 > --- a/libatomic/Makefile.am > +++ b/libatomic/Makefile.am > @@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix _$(s)_.lo,$(SIZEOBJS))) > ## On a target-specific basis, include alternates to be selected by IFUNC. 
> if HAVE_IFUNC > if ARCH_AARCH64_LINUX > +if ARCH_AARCH64_HAVE_LSE128 > +AM_CPPFLAGS = -DHAVE_FEAT_LSE128 > +endif > IFUNC_OPTIONS = -march=armv8-a+lse > libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS))) > libatomic_la_SOURCES += atomic_16.S > diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in > index dc2330b91fd..cd48fa21334 100644 > --- a/libatomic/Makefile.in > +++ b/libatomic/Makefile.in > @@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files))) > libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \ > _$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \ > $(am__append_4) $(am__append_5) > +@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS = -DHAVE_FEAT_LSE128 > @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse > @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp -DHAVE_KERNEL64 > @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586 > diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4 > index f35ab5b60a5..4197db8f404 100644 > --- a/libatomic/acinclude.m4 > +++ b/libatomic/acinclude.m4 > @@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[ > ]) > ]) > > +dnl > +dnl Test if the host assembler supports armv9.4-a LSE128 isns. 
> +dnl > +AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[ > + AC_CACHE_CHECK([for armv9.4-a LSE128 insn support], > + [libat_cv_have_feat_lse128],[ > + AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])]) > + if AC_TRY_EVAL(ac_link); then > + eval libat_cv_have_feat_lse128=yes > + else > + eval libat_cv_have_feat_lse128=no > + fi > + rm -f conftest* > + ]) > + LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128], > + [Have LSE128 support for 16 byte integers.]) > + AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 = xyes]) > +]) > + > dnl > dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2 > dnl > diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in > index ab3424a759e..7c78933b07d 100644 > --- a/libatomic/auto-config.h.in > +++ b/libatomic/auto-config.h.in > @@ -105,6 +105,9 @@ > /* Define to 1 if you have the <dlfcn.h> header file. */ > #undef HAVE_DLFCN_H > > +/* Have LSE128 support for 16 byte integers. */ > +#undef HAVE_FEAT_LSE128 > + > /* Define to 1 if you have the <fenv.h> header file. */ > #undef HAVE_FENV_H > > diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S > index 3f6225830e6..44a773031f8 100644 > --- a/libatomic/config/linux/aarch64/atomic_16.S > +++ b/libatomic/config/linux/aarch64/atomic_16.S > @@ -34,10 +34,14 @@ > writes, this will be true when using atomics in actual code. > > The libat_<op>_16 entry points are ARMv8.0. > - The libat_<op>_16_i1 entry points are used when LSE2 is available. */ > - > + The libat_<op>_16_i1 entry points are used when LSE128 is available. > + The libat_<op>_16_i2 entry points are used when LSE2 is available. */ > > +#if HAVE_FEAT_LSE128 > + .arch armv8-a+lse128 Shouldn't this match the test we run during configure? ie armv9-a+lse128? I'm not sure which is preferable, but it seems odd for them not to be the same. 
> +#else > .arch armv8-a+lse > +#endif > > #define ENTRY(name, feat) \ > ENTRY1(name, feat) > @@ -66,7 +70,8 @@ name##feat: \ > .set alias##from, alias##to; > > #define CORE > -#define LSE2 _i1 > +#define LSE128 _i1 > +#define LSE2 _i2 > > #define res0 x0 > #define res1 x1 > @@ -201,6 +206,31 @@ ENTRY (libat_exchange_16, CORE) > END (libat_exchange_16, CORE) > > > +#if HAVE_FEAT_LSE128 > +ENTRY (libat_exchange_16, LSE128) > + mov tmp0, x0 > + mov res0, in0 > + mov res1, in1 > + cbnz w4, 1f > + > + /* RELAXED. */ > + swpp res0, res1, [tmp0] > + ret > +1: > + cmp w4, ACQUIRE > + b.hi 2f > + > + /* ACQUIRE/CONSUME. */ > + swppa res0, res1, [tmp0] > + ret > + > + /* RELEASE/ACQ_REL/SEQ_CST. */ > +2: swppal res0, res1, [tmp0] > + ret > +END (libat_exchange_16, LSE128) > +#endif > + > + > ENTRY (libat_compare_exchange_16, CORE) > ldp exp0, exp1, [x1] > cbz w4, 3f > @@ -389,6 +419,31 @@ ENTRY (libat_fetch_or_16, CORE) > END (libat_fetch_or_16, CORE) > > > +#if HAVE_FEAT_LSE128 > +ENTRY (libat_fetch_or_16, LSE128) > + mov tmp0, x0 > + mov res0, in0 > + mov res1, in1 > + cbnz w4, 1f > + > + /* RELAXED. */ > + ldsetp res0, res1, [tmp0] > + ret > +1: > + cmp w4, ACQUIRE > + b.hi 2f > + > + /* ACQUIRE/CONSUME. */ > + ldsetpa res0, res1, [tmp0] > + ret > + > + /* RELEASE/ACQ_REL/SEQ_CST. */ > +2: ldsetpal res0, res1, [tmp0] > + ret > +END (libat_fetch_or_16, LSE128) > +#endif > + > + > ENTRY (libat_or_fetch_16, CORE) > mov x5, x0 > cbnz w4, 2f > @@ -411,6 +466,36 @@ ENTRY (libat_or_fetch_16, CORE) > END (libat_or_fetch_16, CORE) > > > +#if HAVE_FEAT_LSE128 > +ENTRY (libat_or_fetch_16, LSE128) > + cbnz w4, 1f > + mov tmp0, in0 > + mov tmp1, in1 > + > + /* RELAXED. */ > + ldsetp in0, in1, [x0] > + orr res0, in0, tmp0 > + orr res1, in1, tmp1 > + ret > +1: > + cmp w4, ACQUIRE > + b.hi 2f > + > + /* ACQUIRE/CONSUME. */ > + ldsetpa in0, in1, [x0] > + orr res0, in0, tmp0 > + orr res1, in1, tmp1 > + ret > + > + /* RELEASE/ACQ_REL/SEQ_CST. 
*/ > +2: ldsetpal in0, in1, [x0] > + orr res0, in0, tmp0 > + orr res1, in1, tmp1 > + ret > +END (libat_or_fetch_16, LSE128) > +#endif > + > + > ENTRY (libat_fetch_and_16, CORE) > mov x5, x0 > cbnz w4, 2f > @@ -433,6 +518,32 @@ ENTRY (libat_fetch_and_16, CORE) > END (libat_fetch_and_16, CORE) > > > +#if HAVE_FEAT_LSE128 > +ENTRY (libat_fetch_and_16, LSE128) > + mov tmp0, x0 > + mvn res0, in0 > + mvn res1, in1 > + cbnz w4, 1f > + > + /* RELAXED. */ > + ldclrp res0, res1, [tmp0] > + ret > + > +1: > + cmp w4, ACQUIRE > + b.hi 2f > + > + /* ACQUIRE/CONSUME. */ > + ldclrpa res0, res1, [tmp0] > + ret > + > + /* RELEASE/ACQ_REL/SEQ_CST. */ > +2: ldclrpal res0, res1, [tmp0] > + ret > +END (libat_fetch_and_16, LSE128) > +#endif > + > + > ENTRY (libat_and_fetch_16, CORE) > mov x5, x0 > cbnz w4, 2f > @@ -455,6 +566,37 @@ ENTRY (libat_and_fetch_16, CORE) > END (libat_and_fetch_16, CORE) > > > +#if HAVE_FEAT_LSE128 > +ENTRY (libat_and_fetch_16, LSE128) > + mvn tmp0, in0 > + mvn tmp0, in1 > + cbnz w4, 1f > + > + /* RELAXED. */ > + ldclrp tmp0, tmp1, [x0] > + and res0, tmp0, in0 > + and res1, tmp1, in1 > + ret > + > +1: > + cmp w4, ACQUIRE > + b.hi 2f > + > + /* ACQUIRE/CONSUME. */ > + ldclrpa tmp0, tmp1, [x0] > + and res0, tmp0, in0 > + and res1, tmp1, in1 > + ret > + > + /* RELEASE/ACQ_REL/SEQ_CST. */ > +2: ldclrpal tmp0, tmp1, [x5] > + and res0, tmp0, in0 > + and res1, tmp1, in1 > + ret > +END (libat_and_fetch_16, LSE128) > +#endif > + > + > ENTRY (libat_fetch_xor_16, CORE) > mov x5, x0 > cbnz w4, 2f > @@ -560,6 +702,28 @@ ENTRY (libat_test_and_set_16, CORE) > END (libat_test_and_set_16, CORE) > > > +/* Alias entry points which are the same in LSE2 and LSE128. 
*/ > + > +#if !HAVE_FEAT_LSE128 > +ALIAS (libat_exchange_16, LSE128, LSE2) > +ALIAS (libat_fetch_or_16, LSE128, LSE2) > +ALIAS (libat_fetch_and_16, LSE128, LSE2) > +ALIAS (libat_or_fetch_16, LSE128, LSE2) > +ALIAS (libat_and_fetch_16, LSE128, LSE2) > +#endif > +ALIAS (libat_load_16, LSE128, LSE2) > +ALIAS (libat_store_16, LSE128, LSE2) > +ALIAS (libat_compare_exchange_16, LSE128, LSE2) > +ALIAS (libat_fetch_add_16, LSE128, LSE2) > +ALIAS (libat_add_fetch_16, LSE128, LSE2) > +ALIAS (libat_fetch_sub_16, LSE128, LSE2) > +ALIAS (libat_sub_fetch_16, LSE128, LSE2) > +ALIAS (libat_fetch_xor_16, LSE128, LSE2) > +ALIAS (libat_xor_fetch_16, LSE128, LSE2) > +ALIAS (libat_fetch_nand_16, LSE128, LSE2) > +ALIAS (libat_nand_fetch_16, LSE128, LSE2) > +ALIAS (libat_test_and_set_16, LSE128, LSE2) > + > /* Alias entry points which are the same in baseline and LSE2. */ > > ALIAS (libat_exchange_16, LSE2, CORE) > diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h > index 30ef21c7715..d873e91b1c9 100644 > --- a/libatomic/config/linux/aarch64/host-config.h > +++ b/libatomic/config/linux/aarch64/host-config.h > @@ -26,14 +26,17 @@ > > #ifdef HWCAP_USCAT > # if N == 16 > -# define IFUNC_COND_1 (ifunc1 (hwcap)) > +# define IFUNC_COND_1 (has_lse128 (hwcap)) > +# define IFUNC_COND_2 (has_lse2 (hwcap)) > +# define IFUNC_NCOND(N) 2 > # else > -# define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS) > +# define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS) > +# define IFUNC_NCOND(N) 1 > # endif > #else > # define IFUNC_COND_1 (false) > +# define IFUNC_NCOND(N) 1 > #endif > -#define IFUNC_NCOND(N) (1) > > #endif /* HAVE_IFUNC */ > > @@ -56,7 +59,7 @@ > #define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff) > > static inline bool > -ifunc1 (unsigned long hwcap) > +has_lse2 (unsigned long hwcap) > { > if (hwcap & HWCAP_USCAT) > return true; > @@ -69,6 +72,22 @@ ifunc1 (unsigned long hwcap) > return true; > return false; > } > + > +/* LSE128 atomic support encoded in 
ID_AA64ISAR0_EL1.Atomic, > + bits[23:20]. The expected value is 0b0011. Check that. */ > +#define HAS_LSE128() ({ \ > + unsigned long val; \ > + asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (val)); \ > + (val & 0xf00000) >= 0x300000; \ > + }) > + The pseudo-code for this register reads: if PSTATE.EL == EL0 then if IsFeatureImplemented(FEAT_IDST) then if EL2Enabled() && HCR_EL2.TGE == '1' then AArch64.SystemAccessTrap(EL2, 0x18); else AArch64.SystemAccessTrap(EL1, 0x18); else UNDEFINED; ... So this instruction may result in SIGILL if run on cores without FEAT_IDST. SystemAccessTrap just punts the problem up to the kernel or hypervisor as well. I think we need a hwcap bit to work this out, which is the preferred way on Linux anyway. Something like this? :) https://lore.kernel.org/linux-arm-kernel/20231003124544.858804-2-joey.gouly@arm.com/T/ > +static inline bool > +has_lse128 (unsigned long hwcap) > +{ > + if (has_lse2 (hwcap) && HAS_LSE128 ()) Why does this need to test for LSE2, surely that's mandatory if LSE128 is implemented. 
> + return true; > + return false; > +} > #endif > > #include_next <host-config.h> > diff --git a/libatomic/configure b/libatomic/configure > index d579bab96f8..ee3bbb97d69 100755 > --- a/libatomic/configure > +++ b/libatomic/configure > @@ -657,6 +657,8 @@ LIBAT_BUILD_VERSIONED_SHLIB_TRUE > OPT_LDFLAGS > SECTION_LDFLAGS > SYSROOT_CFLAGS_FOR_TARGET > +ARCH_AARCH64_HAVE_LSE128_FALSE > +ARCH_AARCH64_HAVE_LSE128_TRUE > enable_aarch64_lse > libtool_VERSION > ENABLE_DARWIN_AT_RPATH_FALSE > @@ -11456,7 +11458,7 @@ else > lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 > lt_status=$lt_dlunknown > cat > conftest.$ac_ext <<_LT_EOF > -#line 11459 "configure" > +#line 11461 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > @@ -11562,7 +11564,7 @@ else > lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 > lt_status=$lt_dlunknown > cat > conftest.$ac_ext <<_LT_EOF > -#line 11565 "configure" > +#line 11567 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > @@ -11926,6 +11928,55 @@ ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $ > ac_compiler_gnu=$ac_cv_c_compiler_gnu > > > + > + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for armv9.4-a LSE128 insn support" >&5 > +$as_echo_n "checking for armv9.4-a LSE128 insn support... " >&6; } > +if ${libat_cv_have_feat_lse128+:} false; then : > + $as_echo_n "(cached) " >&6 > +else > + > + cat confdefs.h - <<_ACEOF >conftest.$ac_ext > +/* end confdefs.h. */ > + > +int > +main () > +{ > +asm(".arch armv9-a+lse128") > + ; > + return 0; > +} > +_ACEOF > + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 > + (eval $ac_link) 2>&5 > + ac_status=$? > + $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 > + test $ac_status = 0; }; then > + eval libat_cv_have_feat_lse128=yes > + else > + eval libat_cv_have_feat_lse128=no > + fi > + rm -f conftest* > + > +fi > +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libat_cv_have_feat_lse128" >&5 > +$as_echo "$libat_cv_have_feat_lse128" >&6; } > + > + yesno=`echo $libat_cv_have_feat_lse128 | tr 'yesno' '1 0 '` > + > +cat >>confdefs.h <<_ACEOF > +#define HAVE_FEAT_LSE128 $yesno > +_ACEOF > + > + > + if test x$libat_cv_have_feat_lse128 = xyes; then > + ARCH_AARCH64_HAVE_LSE128_TRUE= > + ARCH_AARCH64_HAVE_LSE128_FALSE='#' > +else > + ARCH_AARCH64_HAVE_LSE128_TRUE='#' > + ARCH_AARCH64_HAVE_LSE128_FALSE= > +fi > + > + > ;; > esac > > @@ -15989,6 +16040,10 @@ if test -z "${ENABLE_DARWIN_AT_RPATH_TRUE}" && test -z "${ENABLE_DARWIN_AT_RPATH > as_fn_error $? "conditional \"ENABLE_DARWIN_AT_RPATH\" was never defined. > Usually this means the macro was only invoked conditionally." "$LINENO" 5 > fi > +if test -z "${ARCH_AARCH64_HAVE_LSE128_TRUE}" && test -z "${ARCH_AARCH64_HAVE_LSE128_FALSE}"; then > + as_fn_error $? "conditional \"ARCH_AARCH64_HAVE_LSE128\" was never defined. > +Usually this means the macro was only invoked conditionally." "$LINENO" 5 > +fi > > if test -z "${LIBAT_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBAT_BUILD_VERSIONED_SHLIB_FALSE}"; then > as_fn_error $? "conditional \"LIBAT_BUILD_VERSIONED_SHLIB\" was never defined. > diff --git a/libatomic/configure.ac b/libatomic/configure.ac > index 5f2821ac3f4..b2fe68d7d0f 100644 > --- a/libatomic/configure.ac > +++ b/libatomic/configure.ac > @@ -169,6 +169,7 @@ AC_MSG_RESULT([$target_thread_file]) > case "$target" in > *aarch64*) > ACX_PROG_CC_WARNING_OPTS([-march=armv8-a+lse],[enable_aarch64_lse]) > + LIBAT_TEST_FEAT_LSE128() > ;; > esac > R. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a 2023-11-29 15:15 ` Richard Earnshaw @ 2023-12-08 14:51 ` Szabolcs Nagy 0 siblings, 0 replies; 6+ messages in thread From: Szabolcs Nagy @ 2023-12-08 14:51 UTC (permalink / raw) To: Richard Earnshaw, Victor Do Nascimento, gcc-patches Cc: kyrylo.tkachov, richard.sandiford, Richard.Earnshaw The 11/29/2023 15:15, Richard Earnshaw wrote: > On 13/11/2023 11:37, Victor Do Nascimento wrote: > > +/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic, > > + bits[23:20]. The expected value is 0b0011. Check that. */ > > +#define HAS_LSE128() ({ \ > > + unsigned long val; \ > > + asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (val)); \ > > + (val & 0xf00000) >= 0x300000; \ > > + }) > > + > > The pseudo-code for this register reads: > > if PSTATE.EL == EL0 then > if IsFeatureImplemented(FEAT_IDST) then > if EL2Enabled() && HCR_EL2.TGE == '1' then > AArch64.SystemAccessTrap(EL2, 0x18); > else > AArch64.SystemAccessTrap(EL1, 0x18); > else > UNDEFINED; > ... > > So this instruction may result in SIGILL if run on cores without FEAT_IDST. > SystemAccessTrap just punts the problem up to the kernel or hypervisor as > well. yes, HWCAP_CPUID has to be checked to see if linux traps and emulates the mrs for userspace. > I think we need a hwcap bit to work this out, which is the preferred way on yes, use hwcap instead of id reg (hwcap2 is passed to aarch64 ifuncs or __getauxval works) > Linux anyway. Something like this? :) https://lore.kernel.org/linux-arm-kernel/20231003124544.858804-2-joey.gouly@arm.com/T/ note that there was no linux release since this got added. we can add the hwcap values tentatively, but there is a risk of revert on the kernel side (which means libatomic vs linux abi break) so i would only commit the patch into gcc after a linux release is tagged. ^ permalink raw reply [flat|nested] 6+ messages in thread