public inbox for gcc-patches@gcc.gnu.org
* [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
@ 2022-04-07 20:15 Pop, Sebastian
  2022-04-18 18:22 ` Pop, Sebastian
  0 siblings, 1 reply; 10+ messages in thread
From: Pop, Sebastian @ 2022-04-07 20:15 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

Hi,


With -moutline-atomics, GCC stops generating a barrier for __sync builtins: https://gcc.gnu.org/PR105162

This is a problem on CPUs without LSE instructions, where the load/store exclusives do not guarantee a full barrier.

The attached patch adds the barrier to the outline-atomics functions on the path without LSE instructions.

As a result, under -moutline-atomics the __atomic and __sync builtins behave the same with and without LSE instructions.

To complete the change, the second patch makes GCC emit the barrier for __atomic builtins as well, i.e., independently of is_mm_sync().
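
For reference, a minimal sketch of the two builtin families discussed above. The function names `sync_add` and `atomic_add` and the variable `counter` are hypothetical. The barrier difference only shows up in the generated AArch64 code (for example when compiling with `-march=armv8-a+nolse -O2 -moutline-atomics`), not in the values returned.

```c
/* Shared counter used by both hypothetical helpers.  */
static int counter;

/* Legacy __sync builtin: GCC documents these as a full barrier
   in most cases, so no memory-order argument is passed.  */
int sync_add(int v)
{
    return __sync_fetch_and_add(&counter, v);
}

/* __atomic builtin: the memory order is an explicit argument.  */
int atomic_add(int v)
{
    return __atomic_fetch_add(&counter, v, __ATOMIC_SEQ_CST);
}
```

Both helpers return the value held before the addition. Comparing the assembly generated for each is one way to see whether a trailing `dmb ish` is emitted.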


Sebastian

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-AArch64-add-barrier-to-no-LSE-path-in-outline-atomic.patch --]
[-- Type: text/x-patch; name="0001-AArch64-add-barrier-to-no-LSE-path-in-outline-atomic.patch", Size: 832 bytes --]

From b1ffa7d737427dc9414cb0c315f08b7c84ef647b Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Wed, 6 Apr 2022 21:42:11 +0000
Subject: [PATCH] [AArch64] add barrier to no LSE path in outline-atomics
 functions

---
 libgcc/config/aarch64/lse.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index c353ec2215b..ac77c68e300 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -229,6 +229,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+        dmb             ish
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +274,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+        dmb             ish
 	ret
 
 ENDFN	NAME(LDNM)
-- 
2.25.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-AArch64-emit-a-barrier-for-__atomic-builtins.patch --]
[-- Type: text/x-patch; name="0002-AArch64-emit-a-barrier-for-__atomic-builtins.patch", Size: 2147 bytes --]

From 68c07f95157057f0167723b182f0ccffdac8a17e Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Thu, 7 Apr 2022 19:18:57 +0000
Subject: [PATCH 2/2] [AArch64] emit a barrier for __atomic builtins

---
 gcc/config/aarch64/aarch64.cc | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18f80499079..be1b8d22c6a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22931,9 +22931,7 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
     aarch64_gen_compare_reg (NE, rval, const0_rtx);
 
-  /* Emit any final barrier needed for a __sync operation.  */
-  if (is_mm_sync (model))
-    aarch64_emit_post_barrier (model);
+  aarch64_emit_post_barrier (model);
 }
 
 /* Split an atomic operation.  */
@@ -22948,7 +22946,6 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
   const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
-  const bool is_sync = is_mm_sync (model);
   rtx_code_label *label;
   rtx x;
 
@@ -22966,11 +22963,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 
   /* The initial load can be relaxed for a __sync operation since a final
      barrier will be emitted to stop code hoisting.  */
-  if (is_sync)
-    aarch64_emit_load_exclusive (mode, old_out, mem,
-				 GEN_INT (MEMMODEL_RELAXED));
-  else
-    aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, GEN_INT (MEMMODEL_RELAXED));
 
   switch (code)
     {
@@ -23016,9 +23009,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 			    gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 
-  /* Emit any final barrier needed for a __sync operation.  */
-  if (is_sync)
-    aarch64_emit_post_barrier (model);
+  aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
2.25.1



* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-04-07 20:15 [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE Pop, Sebastian
@ 2022-04-18 18:22 ` Pop, Sebastian
  2022-04-19 12:51   ` Wilco Dijkstra
  0 siblings, 1 reply; 10+ messages in thread
From: Pop, Sebastian @ 2022-04-18 18:22 UTC
  To: gcc-patches; +Cc: Kyrylo Tkachov, wilco

[-- Attachment #1: Type: text/plain, Size: 1333 bytes --]

Hi,


Wilco pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162#c7 that

"Only __sync needs the extra full barrier, but __atomic does not."

The attached patch does that by adding out-of-line functions for MEMMODEL_SYNC_*.

Those new functions contain a barrier on the path without LSE instructions.


I tested the patch on aarch64-linux with bootstrap and make check.


Sebastian


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-AArch64-add-barriers-to-ool-__sync-builtins.patch --]
[-- Type: text/x-patch; name="0001-AArch64-add-barriers-to-ool-__sync-builtins.patch", Size: 8484 bytes --]

From b45842c209ddcb560d6de477ced444f4d948ea76 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Mon, 18 Apr 2022 15:13:20 +0000
Subject: [PATCH] [AArch64] add barriers to ool __sync builtins

---
 gcc/config/aarch64/aarch64-protos.h           |  2 +-
 gcc/config/aarch64/aarch64.cc                 | 18 ++++++++--
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 ++++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 ++++
 .../gcc.target/aarch64/sync-op-full-ool.c     |  9 +++++
 .../gcc.target/aarch64/target_attr_20.c       |  2 +-
 .../gcc.target/aarch64/target_attr_21.c       |  2 +-
 libgcc/config/aarch64/lse.S                   | 33 +++++++++++++++++--
 libgcc/config/aarch64/t-lse                   |  8 ++---
 9 files changed, 74 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 46bade28ed6..0b325e1e32b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,7 +1051,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-    const char *str[5][4];
+    const char *str[5][7];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18f80499079..3b02ee379f5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22670,14 +22670,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x7 array.
    Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -22717,6 +22717,15 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
     case MEMMODEL_SEQ_CST:
       model_idx = 3;
       break;
+    case MEMMODEL_SYNC_ACQUIRE:
+      model_idx = 4;
+      break;
+    case MEMMODEL_SYNC_RELEASE:
+      model_idx = 5;
+      break;
+    case MEMMODEL_SYNC_SEQ_CST:
+      model_idx = 6;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -22729,7 +22738,10 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
     "__aarch64_" #B #N "_acq", \
     "__aarch64_" #B #N "_rel", \
-    "__aarch64_" #B #N "_acq_rel" }
+    "__aarch64_" #B #N "_acq_rel", \
+    "__aarch64_" #B #N "_sync_acq", \
+    "__aarch64_" #B #N "_sync_rel", \
+    "__aarch64_" #B #N "_sync_seq" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 00000000000..cb28db21d17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync_seq" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
new file mode 100644
index 00000000000..bf8e7210e75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync_acq" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
new file mode 100644
index 00000000000..a50fd96fffb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync_seq" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync_seq" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync_seq" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync_seq" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
index 509fb039e84..abf9f5cf230 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */
+/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
index acace4c8f2a..02d9577c3d4 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2" 1 } } */
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index c353ec2215b..81cbfe92825 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -87,24 +87,49 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define L
 # define M     0x000000
 # define N     0x000000
+# define BARRIER
 #elif MODEL == 2
 # define SUFF  _acq
 # define A     a
 # define L
 # define M     0x400000
 # define N     0x800000
+# define BARRIER
 #elif MODEL == 3
 # define SUFF  _rel
 # define A
 # define L     l
 # define M     0x008000
 # define N     0x400000
+# define BARRIER
 #elif MODEL == 4
 # define SUFF  _acq_rel
 # define A     a
 # define L     l
 # define M     0x408000
 # define N     0xc00000
+# define BARRIER
+#elif MODEL == 5
+# define SUFF  _sync_acq
+# define A     a
+# define L
+# define M     0x400000
+# define N     0x800000
+# define BARRIER dmb		ish
+#elif MODEL == 6
+# define SUFF  _sync_rel
+# define A
+# define L     l
+# define M     0x008000
+# define N     0x400000
+# define BARRIER dmb		ish
+#elif MODEL == 7
+# define SUFF  _sync_seq
+# define A     a
+# define L     l
+# define M     0x408000
+# define N     0xc00000
+# define BARRIER dmb		ish
 #else
 # error
 #endif
@@ -183,7 +208,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXR		w(tmp1), s(1), [x2]
 	cbnz		w(tmp1), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #else
 #define LDXP	glue3(ld, A, xp)
@@ -205,7 +231,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXP		w(tmp2), x2, x3, [x4]
 	cbnz		w(tmp2), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #endif
 
@@ -229,6 +256,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +301,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(LDNM)
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
index 790cada3315..aca874f0e31 100644
--- a/libgcc/config/aarch64/t-lse
+++ b/libgcc/config/aarch64/t-lse
@@ -18,13 +18,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Compare-and-swap has 5 sizes and 4 memory models.
+# Compare-and-swap has 5 sizes and 7 memory models.
 S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
-O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+O0 := $(foreach m, 1 2 3 4 5 6 7, $(addsuffix _$(m)$(objext), $(S0)))
 
-# Swap, Load-and-operate have 4 sizes and 4 memory models
+# Swap, Load-and-operate have 4 sizes and 7 memory models
 S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
-O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+O1 := $(foreach m, 1 2 3 4 5 6 7, $(addsuffix _$(m)$(objext), $(S1)))
 
 LSE_OBJS := $(O0) $(O1)
 
-- 
2.25.1



* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-04-18 18:22 ` Pop, Sebastian
@ 2022-04-19 12:51   ` Wilco Dijkstra
  2022-04-25 22:06     ` Pop, Sebastian
  0 siblings, 1 reply; 10+ messages in thread
From: Wilco Dijkstra @ 2022-04-19 12:51 UTC
  To: Pop, Sebastian, gcc-patches

Hi Sebastian,

> Wilco pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162#c7 that
> "Only __sync needs the extra full barrier, but __atomic does not."
> The attached patch does that by adding out-of-line functions for MEMMODEL_SYNC_*.
> Those new functions contain a barrier on the path without LSE instructions.

Yes, adding _sync versions of the outline functions is the correct approach. However
there is no need to have separate _acq/_rel/_seq variants for every function since all
but one are _seq. Also we should ensure we generate the same sequence as the inlined
versions so that they are consistent. This means ensuring the LDXR macro ignores the
'A' for the _sync variants and the swp function switches to acquire semantics.
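
As a rough portable-C analogy of the _sync sequence described above (plain load, store-release, then a trailing full barrier), consider the sketch below. The name `fetch_add_sync_sketch` is hypothetical, and this is not the libgcc implementation: the real helpers are written in assembly in libgcc/config/aarch64/lse.S. The compare-exchange loop stands in for the LDXR/STXR retry loop, and the final fence stands in for `dmb ish`.

```c
#include <stdint.h>

/* Hypothetical helper: fetch-and-add with a trailing full fence.  */
uint32_t fetch_add_sync_sketch(uint32_t *p, uint32_t v)
{
    /* Relaxed initial load, mirroring a load-exclusive without 'A'.  */
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);

    /* Retry until the store succeeds; on failure, 'old' is refreshed
       with the current value of *p.  */
    while (!__atomic_compare_exchange_n(p, &old, old + v, 1,
                                        __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        ;

    /* Full barrier after the loop, as the _sync variants require.  */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return old;
}
```

The point of the sketch is the placement of the fence: it sits after the retry loop, so the initial load does not need acquire semantics of its own.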

Cheers,
Wilco


* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-04-19 12:51   ` Wilco Dijkstra
@ 2022-04-25 22:06     ` Pop, Sebastian
  2022-05-03 15:33       ` Wilco Dijkstra
  0 siblings, 1 reply; 10+ messages in thread
From: Pop, Sebastian @ 2022-04-25 22:06 UTC
  To: Wilco Dijkstra, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1585 bytes --]

Hi Wilco,

Thanks for your review.
Please find attached the patch amended following your recommendations.
The number of new functions for _sync is reduced by a factor of three.
I tested the patch on Graviton2 aarch64-linux.
I also checked by hand that the outline functions in libgcc look similar to what GCC produces for the inline version.
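
To illustrate how the reduced set of helper names can be selected, here is a small sketch of the lookup. The enum `model_sketch` and both functions are hypothetical stand-ins, not GCC's `memmodel` or `aarch64_atomic_ool_func`. The index assignment for the non-sync models is an assumption inferred from the order of the suffixes in the patch; only the mapping of the three MEMMODEL_SYNC_* values to index 4 and the `_sync` suffix is taken directly from it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's memory model values.  */
enum model_sketch {
    M_RELAXED, M_CONSUME, M_ACQUIRE, M_RELEASE, M_ACQ_REL, M_SEQ_CST,
    M_SYNC_ACQUIRE, M_SYNC_RELEASE, M_SYNC_SEQ_CST
};

/* Map a memory model to a column of the helper-name table.  */
int model_index_sketch(enum model_sketch m)
{
    switch (m) {
    case M_RELAXED:      return 0;
    case M_CONSUME:
    case M_ACQUIRE:      return 1;
    case M_RELEASE:      return 2;
    case M_ACQ_REL:
    case M_SEQ_CST:      return 3;
    /* All three __sync models share the single _sync helper.  */
    case M_SYNC_ACQUIRE:
    case M_SYNC_RELEASE:
    case M_SYNC_SEQ_CST: return 4;
    }
    return -1;
}

/* Build the helper name, e.g. "__aarch64_cas4_sync", into buf.
   Returns the column index, or -1 for an unknown model.  */
int ool_name_sketch(char *buf, size_t len, const char *base, int size,
                    enum model_sketch m)
{
    static const char *const suffix[5] = {
        "_relax", "_acq", "_rel", "_acq_rel", "_sync"
    };
    int idx = model_index_sketch(m);
    if (idx < 0)
        return -1;
    snprintf(buf, len, "__aarch64_%s%d%s", base, size, suffix[idx]);
    return idx;
}
```

With this layout, a sync compare-and-swap of 4 bytes and a sync swap of 4 bytes resolve to `__aarch64_cas4_sync` and `__aarch64_swp4_sync`, which are the names the new tests scan for.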

Thanks,
Sebastian

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-AArch64-add-barriers-to-ool-__sync-builtins.patch --]
[-- Type: text/x-patch; name="0001-AArch64-add-barriers-to-ool-__sync-builtins.patch", Size: 8856 bytes --]

From 4e90111e507a611d3daa0f71fc17c6d5cc3e203e Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Mon, 18 Apr 2022 15:13:20 +0000
Subject: [PATCH] [AArch64] add barriers to ool __sync builtins

---
 gcc/config/aarch64/aarch64-protos.h           |  2 +-
 gcc/config/aarch64/aarch64.cc                 | 12 ++++--
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 +++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 +++
 .../gcc.target/aarch64/sync-op-full-ool.c     |  9 ++++
 .../gcc.target/aarch64/target_attr_20.c       |  2 +-
 .../gcc.target/aarch64/target_attr_21.c       |  2 +-
 libgcc/config/aarch64/lse.S                   | 42 +++++++++++++++++--
 libgcc/config/aarch64/t-lse                   |  8 ++--
 9 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 46bade28ed6..3ad5e77a1af 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,7 +1051,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-    const char *str[5][4];
+    const char *str[5][5];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18f80499079..3ad11e84aae 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22670,14 +22670,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x5 array.
    Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -22717,6 +22717,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
     case MEMMODEL_SEQ_CST:
       model_idx = 3;
       break;
+    case MEMMODEL_SYNC_ACQUIRE:
+    case MEMMODEL_SYNC_RELEASE:
+    case MEMMODEL_SYNC_SEQ_CST:
+      model_idx = 4;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -22729,7 +22734,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
     "__aarch64_" #B #N "_acq", \
     "__aarch64_" #B #N "_rel", \
-    "__aarch64_" #B #N "_acq_rel" }
+    "__aarch64_" #B #N "_acq_rel", \
+    "__aarch64_" #B #N "_sync" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 00000000000..372f4aa8746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
new file mode 100644
index 00000000000..95d9c56b5e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
new file mode 100644
index 00000000000..2f3881d9755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
index 509fb039e84..abf9f5cf230 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */
+/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
index acace4c8f2a..02d9577c3d4 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2" 1 } } */
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index c353ec2215b..9c29cf08b59 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -87,24 +87,44 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define L
 # define M     0x000000
 # define N     0x000000
+# define BARRIER
 #elif MODEL == 2
 # define SUFF  _acq
 # define A     a
 # define L
 # define M     0x400000
 # define N     0x800000
+# define BARRIER
 #elif MODEL == 3
 # define SUFF  _rel
 # define A
 # define L     l
 # define M     0x008000
 # define N     0x400000
+# define BARRIER
 #elif MODEL == 4
 # define SUFF  _acq_rel
 # define A     a
 # define L     l
 # define M     0x408000
 # define N     0xc00000
+# define BARRIER
+#elif MODEL == 5
+# define SUFF  _sync
+#ifdef L_swp
+/* swp has _acq semantics.  */
+#  define A    a
+#  define L
+#  define M    0x400000
+#  define N    0x800000
+#else
+/* All other _sync functions have _seq semantics.  */
+#  define A    a
+#  define L    l
+#  define M    0x408000
+#  define N    0xc00000
+#endif
+# define BARRIER dmb		ish
 #else
 # error
 #endif
@@ -127,7 +147,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 #define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
-#define LDXR			glue4(ld, A, xr, S)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXR			glue3(ld, xr, S)
+#else
+# define LDXR			glue4(ld, A, xr, S)
+#endif
 #define STXR			glue4(st, L, xr, S)
 
 /* Temporary registers used.  Other than these, only the return value
@@ -183,10 +208,16 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXR		w(tmp1), s(1), [x2]
 	cbnz		w(tmp1), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #else
-#define LDXP	glue3(ld, A, xp)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXP	glue2(ld, xp)
+#else
+# define LDXP	glue3(ld, A, xp)
+#endif
 #define STXP	glue3(st, L, xp)
 #ifdef HAVE_AS_LSE
 # define CASP	glue3(casp, A, L)	x0, x1, x2, x3, [x4]
@@ -205,7 +236,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXP		w(tmp2), x2, x3, [x4]
 	cbnz		w(tmp2), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #endif
 
@@ -229,6 +261,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +306,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(LDNM)
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
index 790cada3315..624daf7eddf 100644
--- a/libgcc/config/aarch64/t-lse
+++ b/libgcc/config/aarch64/t-lse
@@ -18,13 +18,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Compare-and-swap has 5 sizes and 4 memory models.
+# Compare-and-swap has 5 sizes and 5 memory models.
 S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
-O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+O0 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S0)))
 
-# Swap, Load-and-operate have 4 sizes and 4 memory models
+# Swap, Load-and-operate have 4 sizes and 5 memory models
 S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
-O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+O1 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S1)))
 
 LSE_OBJS := $(O0) $(O1)
 
-- 
2.25.1



* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-04-25 22:06     ` Pop, Sebastian
@ 2022-05-03 15:33       ` Wilco Dijkstra
  2022-05-04 14:23         ` Pop, Sebastian
  0 siblings, 1 reply; 10+ messages in thread
From: Wilco Dijkstra @ 2022-05-03 15:33 UTC
  To: Pop, Sebastian, gcc-patches

Hi Sebastian,

> Please find attached the patch amended following your recommendations.
> The number of new functions for _sync is reduced by 3x.
> I tested the patch on Graviton2 aarch64-linux.
> I also checked by hand that the outline functions in libgcc look similar to what GCC produces for the inline version.

Yes this looks good to me (still needs maintainer approval). One minor nitpick,
a few of the tests check for __aarch64_cas2 - this should be __aarch64_cas2_sync.
Note the patch still needs an appropriate commit message.

Cheers,
Wilco


* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-05-03 15:33       ` Wilco Dijkstra
@ 2022-05-04 14:23         ` Pop, Sebastian
  2022-05-13 14:58           ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Pop, Sebastian @ 2022-05-04 14:23 UTC
  To: Wilco Dijkstra, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

> Yes this looks good to me (still needs maintainer approval). 

Thanks again Wilco for your review.

> One minor nitpick,
> a few of the tests check for __aarch64_cas2 - this should be __aarch64_cas2_sync.

Fixed in the attached patch.

> Note the patch still needs an appropriate commit message.

Added the following ChangeLog entry to the commit message.

        * config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
        of str array.
        * config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
        memmodel_from_int and handle MEMMODEL_SYNC_*.
        (DEF0): Add __aarch64_*_sync functions.

testsuite/
        * gcc.target/aarch64/sync-comp-swap-ool.c: New.
        * gcc.target/aarch64/sync-op-acquire-ool.c: New.
        * gcc.target/aarch64/sync-op-full-ool.c: New.
        * gcc.target/aarch64/target_attr_20.c: Update check.
        * gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
        * config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
        * config/aarch64/t-lse: Add a 5th memory model for _sync functions.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-AArch64-add-barriers-to-ool-__sync-builtins.patch --]
[-- Type: text/x-patch; name="0001-AArch64-add-barriers-to-ool-__sync-builtins.patch", Size: 9507 bytes --]

From 3b624598035e4e0c1aee89efaae28596a64b3d0d Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Mon, 18 Apr 2022 15:13:20 +0000
Subject: [PATCH] [AArch64] add barriers to ool __sync builtins

	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
	of str array.
	* config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
	memmodel_from_int and handle MEMMODEL_SYNC_*.
	(DEF0): Add __aarch64_*_sync functions.

testsuite/
	* gcc.target/aarch64/sync-comp-swap-ool.c: New.
	* gcc.target/aarch64/sync-op-acquire-ool.c: New.
	* gcc.target/aarch64/sync-op-full-ool.c: New.
	* gcc.target/aarch64/target_attr_20.c: Update check.
	* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
	* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
	* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
---
 gcc/config/aarch64/aarch64-protos.h           |  2 +-
 gcc/config/aarch64/aarch64.cc                 | 12 ++++--
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 +++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 +++
 .../gcc.target/aarch64/sync-op-full-ool.c     |  9 ++++
 .../gcc.target/aarch64/target_attr_20.c       |  2 +-
 .../gcc.target/aarch64/target_attr_21.c       |  2 +-
 libgcc/config/aarch64/lse.S                   | 42 +++++++++++++++++--
 libgcc/config/aarch64/t-lse                   |  8 ++--
 9 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 46bade28ed6..3ad5e77a1af 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1051,7 +1051,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-    const char *str[5][4];
+    const char *str[5][5];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18f80499079..3ad11e84aae 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22670,14 +22670,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x5 array.
    Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -22717,6 +22717,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
     case MEMMODEL_SEQ_CST:
       model_idx = 3;
       break;
+    case MEMMODEL_SYNC_ACQUIRE:
+    case MEMMODEL_SYNC_RELEASE:
+    case MEMMODEL_SYNC_SEQ_CST:
+      model_idx = 4;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -22729,7 +22734,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
     "__aarch64_" #B #N "_acq", \
     "__aarch64_" #B #N "_rel", \
-    "__aarch64_" #B #N "_acq_rel" }
+    "__aarch64_" #B #N "_acq_rel", \
+    "__aarch64_" #B #N "_sync" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 00000000000..372f4aa8746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
new file mode 100644
index 00000000000..95d9c56b5e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
new file mode 100644
index 00000000000..2f3881d9755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
index 509fb039e84..c9454fc420b 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */
+/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_sync" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
index acace4c8f2a..b8e56223b02 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_sync" 1 } } */
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index c353ec2215b..9c29cf08b59 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -87,24 +87,44 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define L
 # define M     0x000000
 # define N     0x000000
+# define BARRIER
 #elif MODEL == 2
 # define SUFF  _acq
 # define A     a
 # define L
 # define M     0x400000
 # define N     0x800000
+# define BARRIER
 #elif MODEL == 3
 # define SUFF  _rel
 # define A
 # define L     l
 # define M     0x008000
 # define N     0x400000
+# define BARRIER
 #elif MODEL == 4
 # define SUFF  _acq_rel
 # define A     a
 # define L     l
 # define M     0x408000
 # define N     0xc00000
+# define BARRIER
+#elif MODEL == 5
+# define SUFF  _sync
+#ifdef L_swp
+/* swp has _acq semantics.  */
+#  define A    a
+#  define L
+#  define M    0x400000
+#  define N    0x800000
+#else
+/* All other _sync functions have _seq semantics.  */
+#  define A    a
+#  define L    l
+#  define M    0x408000
+#  define N    0xc00000
+#endif
+# define BARRIER dmb		ish
 #else
 # error
 #endif
@@ -127,7 +147,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 #define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
-#define LDXR			glue4(ld, A, xr, S)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXR			glue3(ld, xr, S)
+#else
+# define LDXR			glue4(ld, A, xr, S)
+#endif
 #define STXR			glue4(st, L, xr, S)
 
 /* Temporary registers used.  Other than these, only the return value
@@ -183,10 +208,16 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXR		w(tmp1), s(1), [x2]
 	cbnz		w(tmp1), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #else
-#define LDXP	glue3(ld, A, xp)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXP	glue2(ld, xp)
+#else
+# define LDXP	glue3(ld, A, xp)
+#endif
 #define STXP	glue3(st, L, xp)
 #ifdef HAVE_AS_LSE
 # define CASP	glue3(casp, A, L)	x0, x1, x2, x3, [x4]
@@ -205,7 +236,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXP		w(tmp2), x2, x3, [x4]
 	cbnz		w(tmp2), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #endif
 
@@ -229,6 +261,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +306,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(LDNM)
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
index 790cada3315..624daf7eddf 100644
--- a/libgcc/config/aarch64/t-lse
+++ b/libgcc/config/aarch64/t-lse
@@ -18,13 +18,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Compare-and-swap has 5 sizes and 4 memory models.
+# Compare-and-swap has 5 sizes and 5 memory models.
 S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
-O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+O0 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S0)))
 
-# Swap, Load-and-operate have 4 sizes and 4 memory models
+# Swap, Load-and-operate have 4 sizes and 5 memory models
 S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
-O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+O1 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S1)))
 
 LSE_OBJS := $(O0) $(O1)
 
-- 
2.25.1
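
For reference, the name lookup that the aarch64.cc hunk above extends can be sketched in plain C. This is a minimal stand-alone illustration, not GCC's code: the `enum model` values, `struct ool_names`, `cas_names`, `size_index`, `model_index` and `ool_helper_name` are hypothetical names, and the mapping for indices 0 to 2 is assumed from the suffix order in DEF0 (the patch context only shows that SEQ_CST maps to 3 and the SYNC models to 4).

```c
#include <stddef.h>

/* Stand-in for GCC's memmodel enum.  The values are illustrative only and
   are NOT the real MEMMODEL_* encodings.  */
enum model {
  MODEL_RELAXED, MODEL_CONSUME, MODEL_ACQUIRE, MODEL_RELEASE,
  MODEL_ACQ_REL, MODEL_SEQ_CST,
  MODEL_SYNC_ACQUIRE, MODEL_SYNC_RELEASE, MODEL_SYNC_SEQ_CST
};

/* Same shape as atomic_ool_names after the patch: 5 sizes x 5 suffixes.  */
struct ool_names { const char *str[5][5]; };

/* One row per access size; the fifth entry is the new _sync helper.  */
#define DEF0(B, N) \
  { "__aarch64_" #B #N "_relax", \
    "__aarch64_" #B #N "_acq", \
    "__aarch64_" #B #N "_rel", \
    "__aarch64_" #B #N "_acq_rel", \
    "__aarch64_" #B #N "_sync" }

static const struct ool_names cas_names = {
  { DEF0 (cas, 1), DEF0 (cas, 2), DEF0 (cas, 4), DEF0 (cas, 8), DEF0 (cas, 16) }
};

/* Map an access size in bytes to a row index, or -1 if unsupported.  */
static int size_index (int bytes)
{
  switch (bytes) {
    case 1:  return 0;
    case 2:  return 1;
    case 4:  return 2;
    case 8:  return 3;
    case 16: return 4;
    default: return -1;
  }
}

/* Map a memory model to a column index; all SYNC models share column 4.  */
static int model_index (enum model m)
{
  switch (m) {
    case MODEL_RELAXED:      return 0;
    case MODEL_CONSUME:
    case MODEL_ACQUIRE:      return 1;
    case MODEL_RELEASE:      return 2;
    case MODEL_ACQ_REL:
    case MODEL_SEQ_CST:      return 3;
    case MODEL_SYNC_ACQUIRE:
    case MODEL_SYNC_RELEASE:
    case MODEL_SYNC_SEQ_CST: return 4;
  }
  return -1;
}

/* Return the helper name for the given size and model, or NULL.  */
const char *ool_helper_name (const struct ool_names *names, int bytes,
			     enum model m)
{
  int si = size_index (bytes);
  int mi = model_index (m);
  if (si < 0 || mi < 0)
    return NULL;
  return names->str[si][mi];
}
```

With this sketch, `ool_helper_name (&cas_names, 4, MODEL_SYNC_SEQ_CST)` returns `__aarch64_cas4_sync`, the symbol the new sync-comp-swap-ool.c test scans for.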



* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-05-04 14:23         ` Pop, Sebastian
@ 2022-05-13 14:58           ` Richard Sandiford
  2022-05-13 15:41             ` Wilco Dijkstra
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2022-05-13 14:58 UTC (permalink / raw)
  To: Pop, Sebastian via Gcc-patches; +Cc: Wilco Dijkstra, Pop, Sebastian

"Pop, Sebastian via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> Yes this looks good to me (still needs maintainer approval). 
>
> Thanks again Wilco for your review.
>
>> One minor nitpick,
>> a few of the tests check for __aarch64_cas2 - this should be __aarch64_cas2_sync.
>
> Fixed in the attached patch.
>
>> Note the patch still needs an appropriate commit message.
>
> Added the following ChangeLog entry to the commit message.
>
>         * config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
>         of str array.
>         * config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
>         memmodel_from_int and handle MEMMODEL_SYNC_*.
>         (DEF0): Add __aarch64_*_sync functions.
>
> testsuite/
>         * gcc.target/aarch64/sync-comp-swap-ool.c: New.
>         * gcc.target/aarch64/sync-op-acquire-ool.c: New.
>         * gcc.target/aarch64/sync-op-full-ool.c: New.
>         * gcc.target/aarch64/target_attr_20.c: Update check.
>         * gcc.target/aarch64/target_attr_21.c: Same.
>
> libgcc/
>         * config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
>         * config/aarch64/t-lse: Add a 5th memory model for _sync functions.

OK, thanks.

Richard



* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-05-13 14:58           ` Richard Sandiford
@ 2022-05-13 15:41             ` Wilco Dijkstra
  2022-05-13 20:32               ` Pop, Sebastian
  0 siblings, 1 reply; 10+ messages in thread
From: Wilco Dijkstra @ 2022-05-13 15:41 UTC (permalink / raw)
  To: Pop, Sebastian; +Cc: Richard Sandiford, Pop, Sebastian via Gcc-patches

Hi Sebastian,

>> Note the patch still needs an appropriate commit message.
>
> Added the following ChangeLog entry to the commit message.
>
>         * config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
>         of str array.
>         * config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
>         memmodel_from_int and handle MEMMODEL_SYNC_*.
>         (DEF0): Add __aarch64_*_sync functions.

When you commit this, please also add a "PR target/105162" line before the ChangeLog
entries so that the commit is automatically linked to the PR. We will need backports as well.

Cheers,
Wilco


* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-05-13 15:41             ` Wilco Dijkstra
@ 2022-05-13 20:32               ` Pop, Sebastian
  2022-05-16 11:42                 ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Pop, Sebastian @ 2022-05-13 20:32 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Richard Sandiford, Pop, Sebastian via Gcc-patches

[-- Attachment #1: Type: text/plain, Size: 197 bytes --]

Please find attached the patch back-ported to the GCC 12, 11, 10, and 9 branches.
Bootstrapped and regression-tested on aarch64-linux.
OK to commit to the active GCC branches?

Thanks,
Sebastian

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: gcc-12.patch --]
[-- Type: text/x-patch; name="gcc-12.patch", Size: 9616 bytes --]

From bba8d09284f3478f7d542ca4e7812d4c55e25bd4 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Mon, 18 Apr 2022 15:13:20 +0000
Subject: [PATCH] [AArch64] add barriers to ool __sync builtins

2022-05-13  Sebastian Pop  <spop@amazon.com>

gcc/
	PR target/105162
	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
	of str array.
	* config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
	memmodel_from_int and handle MEMMODEL_SYNC_*.
	(DEF0): Add __aarch64_*_sync functions.

gcc/testsuite/
	PR target/105162
	* gcc.target/aarch64/sync-comp-swap-ool.c: New.
	* gcc.target/aarch64/sync-op-acquire-ool.c: New.
	* gcc.target/aarch64/sync-op-full-ool.c: New.
	* gcc.target/aarch64/target_attr_20.c: Update check.
	* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
	PR target/105162
	* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
	* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
---
 gcc/config/aarch64/aarch64-protos.h           |  2 +-
 gcc/config/aarch64/aarch64.cc                 | 12 ++++--
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 +++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 +++
 .../gcc.target/aarch64/sync-op-full-ool.c     |  9 ++++
 .../gcc.target/aarch64/target_attr_20.c       |  2 +-
 .../gcc.target/aarch64/target_attr_21.c       |  2 +-
 libgcc/config/aarch64/lse.S                   | 42 +++++++++++++++++--
 libgcc/config/aarch64/t-lse                   |  8 ++--
 9 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 2ac781dff4a..df311812e8d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1065,7 +1065,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-    const char *str[5][4];
+    const char *str[5][5];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f650abbc4ce..f4d2a800f39 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22678,14 +22678,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x5 array.
    Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -22725,6 +22725,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
     case MEMMODEL_SEQ_CST:
       model_idx = 3;
       break;
+    case MEMMODEL_SYNC_ACQUIRE:
+    case MEMMODEL_SYNC_RELEASE:
+    case MEMMODEL_SYNC_SEQ_CST:
+      model_idx = 4;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -22737,7 +22742,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
     "__aarch64_" #B #N "_acq", \
     "__aarch64_" #B #N "_rel", \
-    "__aarch64_" #B #N "_acq_rel" }
+    "__aarch64_" #B #N "_acq_rel", \
+    "__aarch64_" #B #N "_sync" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 00000000000..372f4aa8746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
new file mode 100644
index 00000000000..95d9c56b5e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
new file mode 100644
index 00000000000..2f3881d9755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
index 509fb039e84..c9454fc420b 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */
+/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_sync" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
index acace4c8f2a..b8e56223b02 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_sync" 1 } } */
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index c353ec2215b..9c29cf08b59 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -87,24 +87,44 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define L
 # define M     0x000000
 # define N     0x000000
+# define BARRIER
 #elif MODEL == 2
 # define SUFF  _acq
 # define A     a
 # define L
 # define M     0x400000
 # define N     0x800000
+# define BARRIER
 #elif MODEL == 3
 # define SUFF  _rel
 # define A
 # define L     l
 # define M     0x008000
 # define N     0x400000
+# define BARRIER
 #elif MODEL == 4
 # define SUFF  _acq_rel
 # define A     a
 # define L     l
 # define M     0x408000
 # define N     0xc00000
+# define BARRIER
+#elif MODEL == 5
+# define SUFF  _sync
+#ifdef L_swp
+/* swp has _acq semantics.  */
+#  define A    a
+#  define L
+#  define M    0x400000
+#  define N    0x800000
+#else
+/* All other _sync functions have _seq semantics.  */
+#  define A    a
+#  define L    l
+#  define M    0x408000
+#  define N    0xc00000
+#endif
+# define BARRIER dmb		ish
 #else
 # error
 #endif
@@ -127,7 +147,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 #define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
-#define LDXR			glue4(ld, A, xr, S)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXR			glue3(ld, xr, S)
+#else
+# define LDXR			glue4(ld, A, xr, S)
+#endif
 #define STXR			glue4(st, L, xr, S)
 
 /* Temporary registers used.  Other than these, only the return value
@@ -183,10 +208,16 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXR		w(tmp1), s(1), [x2]
 	cbnz		w(tmp1), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #else
-#define LDXP	glue3(ld, A, xp)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXP	glue2(ld, xp)
+#else
+# define LDXP	glue3(ld, A, xp)
+#endif
 #define STXP	glue3(st, L, xp)
 #ifdef HAVE_AS_LSE
 # define CASP	glue3(casp, A, L)	x0, x1, x2, x3, [x4]
@@ -205,7 +236,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXP		w(tmp2), x2, x3, [x4]
 	cbnz		w(tmp2), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #endif
 
@@ -229,6 +261,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +306,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(LDNM)
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
index 790cada3315..624daf7eddf 100644
--- a/libgcc/config/aarch64/t-lse
+++ b/libgcc/config/aarch64/t-lse
@@ -18,13 +18,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Compare-and-swap has 5 sizes and 4 memory models.
+# Compare-and-swap has 5 sizes and 5 memory models.
 S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
-O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+O0 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S0)))
 
-# Swap, Load-and-operate have 4 sizes and 4 memory models
+# Swap, Load-and-operate have 4 sizes and 5 memory models
 S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
-O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+O1 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S1)))
 
 LSE_OBJS := $(O0) $(O1)
 
-- 
2.25.1
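
As a reading aid for the lse.S hunks above, the following C sketch models which exclusive load, exclusive store and trailing barrier the non-LSE fallback loop ends up with for each MODEL value. It is only an illustration under stated assumptions: `struct fallback` and `describe_fallback` are hypothetical names, the size suffix S (b, h or none) is left out, and the LSE fast path is not modelled at all.

```c
#include <stdio.h>

/* Mnemonics used by the load/store-exclusive fallback loop.  */
struct fallback {
  char load[8];         /* e.g. "ldxr" or "ldaxr" */
  char store[8];        /* e.g. "stxr" or "stlxr" */
  const char *barrier;  /* "dmb ish" or "" */
  const char *suffix;   /* helper name suffix, e.g. "_sync" */
};

/* Fill OUT for MODEL (1..5).  IS_SWP mirrors the patch's `#ifdef L_swp`
   special case.  Returns 0 on success, -1 for an unknown model.  */
int describe_fallback (int model, int is_swp, struct fallback *out)
{
  const char *a, *l;

  switch (model) {
    case 1: out->suffix = "_relax";   a = "";  l = "";  break;
    case 2: out->suffix = "_acq";     a = "a"; l = "";  break;
    case 3: out->suffix = "_rel";     a = "";  l = "l"; break;
    case 4: out->suffix = "_acq_rel"; a = "a"; l = "l"; break;
    case 5:
      /* swp keeps acquire-only semantics; the other helpers also release.  */
      out->suffix = "_sync"; a = "a"; l = is_swp ? "" : "l";
      break;
    default:
      return -1;
  }

  /* MODEL 5 drops A from the exclusive load; the trailing barrier
     provides the ordering instead.  */
  snprintf (out->load, sizeof out->load, "ld%sxr", model == 5 ? "" : a);
  snprintf (out->store, sizeof out->store, "st%sxr", l);
  out->barrier = model == 5 ? "dmb ish" : "";
  return 0;
}
```

Under these assumptions only MODEL 5 gets a `dmb ish`, and it pairs that barrier with a plain `ldxr`, as the "Drop A for _sync functions" comments in the patch describe.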


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: gcc-11.patch --]
[-- Type: text/x-patch; name="gcc-11.patch", Size: 9611 bytes --]

From b79bd219910af32d2557adaed81ce448c4032e47 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Mon, 18 Apr 2022 15:13:20 +0000
Subject: [PATCH] [AArch64] add barriers to ool __sync builtins

2022-05-13  Sebastian Pop  <spop@amazon.com>

gcc/
	PR target/105162
	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
	of str array.
	* config/aarch64/aarch64.c (aarch64_atomic_ool_func): Call
	memmodel_from_int and handle MEMMODEL_SYNC_*.
	(DEF0): Add __aarch64_*_sync functions.

gcc/testsuite/
	PR target/105162
	* gcc.target/aarch64/sync-comp-swap-ool.c: New.
	* gcc.target/aarch64/sync-op-acquire-ool.c: New.
	* gcc.target/aarch64/sync-op-full-ool.c: New.
	* gcc.target/aarch64/target_attr_20.c: Update check.
	* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
	PR target/105162
	* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
	* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
---
 gcc/config/aarch64/aarch64-protos.h           |  2 +-
 gcc/config/aarch64/aarch64.c                  | 12 ++++--
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 +++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 +++
 .../gcc.target/aarch64/sync-op-full-ool.c     |  9 ++++
 .../gcc.target/aarch64/target_attr_20.c       |  2 +-
 .../gcc.target/aarch64/target_attr_21.c       |  2 +-
 libgcc/config/aarch64/lse.S                   | 42 +++++++++++++++++--
 libgcc/config/aarch64/t-lse                   |  8 ++--
 9 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b91eeeba101..ad62da81d31 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1034,7 +1034,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-    const char *str[5][4];
+    const char *str[5][5];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c155f4883cf..7e6f9d08ea3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21627,14 +21627,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x5 array.
    Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -21674,6 +21674,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
     case MEMMODEL_SEQ_CST:
       model_idx = 3;
       break;
+    case MEMMODEL_SYNC_ACQUIRE:
+    case MEMMODEL_SYNC_RELEASE:
+    case MEMMODEL_SYNC_SEQ_CST:
+      model_idx = 4;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -21686,7 +21691,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
     "__aarch64_" #B #N "_acq", \
     "__aarch64_" #B #N "_rel", \
-    "__aarch64_" #B #N "_acq_rel" }
+    "__aarch64_" #B #N "_acq_rel", \
+    "__aarch64_" #B #N "_sync" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 00000000000..372f4aa8746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
new file mode 100644
index 00000000000..95d9c56b5e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
new file mode 100644
index 00000000000..2f3881d9755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
index 509fb039e84..c9454fc420b 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */
+/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_sync" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
index acace4c8f2a..b8e56223b02 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_sync" 1 } } */
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index df965b5a524..9215873842b 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -87,24 +87,44 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define L
 # define M     0x000000
 # define N     0x000000
+# define BARRIER
 #elif MODEL == 2
 # define SUFF  _acq
 # define A     a
 # define L
 # define M     0x400000
 # define N     0x800000
+# define BARRIER
 #elif MODEL == 3
 # define SUFF  _rel
 # define A
 # define L     l
 # define M     0x008000
 # define N     0x400000
+# define BARRIER
 #elif MODEL == 4
 # define SUFF  _acq_rel
 # define A     a
 # define L     l
 # define M     0x408000
 # define N     0xc00000
+# define BARRIER
+#elif MODEL == 5
+# define SUFF  _sync
+#ifdef L_swp
+/* swp has _acq semantics.  */
+#  define A    a
+#  define L
+#  define M    0x400000
+#  define N    0x800000
+#else
+/* All other _sync functions have _seq semantics.  */
+#  define A    a
+#  define L    l
+#  define M    0x408000
+#  define N    0xc00000
+#endif
+# define BARRIER dmb		ish
 #else
 # error
 #endif
@@ -127,7 +147,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 #define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
-#define LDXR			glue4(ld, A, xr, S)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXR			glue3(ld, xr, S)
+#else
+# define LDXR			glue4(ld, A, xr, S)
+#endif
 #define STXR			glue4(st, L, xr, S)
 
 /* Temporary registers used.  Other than these, only the return value
@@ -183,10 +208,16 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXR		w(tmp1), s(1), [x2]
 	cbnz		w(tmp1), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #else
-#define LDXP	glue3(ld, A, xp)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXP	glue2(ld, xp)
+#else
+# define LDXP	glue3(ld, A, xp)
+#endif
 #define STXP	glue3(st, L, xp)
 #ifdef HAVE_AS_LSE
 # define CASP	glue3(casp, A, L)	x0, x1, x2, x3, [x4]
@@ -205,7 +236,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXP		w(tmp2), x2, x3, [x4]
 	cbnz		w(tmp2), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #endif
 
@@ -229,6 +261,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +306,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(LDNM)
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
index 88d2d84d100..6ec6df79392 100644
--- a/libgcc/config/aarch64/t-lse
+++ b/libgcc/config/aarch64/t-lse
@@ -18,13 +18,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Compare-and-swap has 5 sizes and 4 memory models.
+# Compare-and-swap has 5 sizes and 5 memory models.
 S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
-O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+O0 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S0)))
 
-# Swap, Load-and-operate have 4 sizes and 4 memory models
+# Swap, Load-and-operate have 4 sizes and 5 memory models
 S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
-O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+O1 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S1)))
 
 LSE_OBJS := $(O0) $(O1)
 
-- 
2.25.1
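
The t-lse hunk above grows the list of helper objects by one memory model. As a rough cross-check of the two comments in that hunk, the sketch below counts the objects for a given number of models. The function name lse_object_count is hypothetical and is not part of libgcc; the sizes and operation names are the ones listed in t-lse.

```c
#include <assert.h>

/* Count the helper objects t-lse lists for N_MODELS memory models.
   cas is built for 5 sizes (1 2 4 8 16); swp, ldadd, ldclr, ldeor and
   ldset are each built for 4 sizes (1 2 4 8).
   Hypothetical helper, for illustration only.  */
static int
lse_object_count (int n_models)
{
  const int cas_sizes = 5;    /* S0: cas_1 cas_2 cas_4 cas_8 cas_16 */
  const int other_ops = 5;    /* swp ldadd ldclr ldeor ldset */
  const int other_sizes = 4;  /* S1: sizes 1 2 4 8 per operation */

  assert (n_models > 0);
  return n_models * (cas_sizes + other_ops * other_sizes);
}
```

Under these assumptions, moving from 4 to 5 memory models takes LSE_OBJS from 100 to 125 objects, i.e. 25 new _sync helpers.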


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: gcc-10.patch --]
[-- Type: text/x-patch; name="gcc-10.patch", Size: 9599 bytes --]

From b9d83a26d81e6739aa38214bea9ef69c275ffa25 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Mon, 18 Apr 2022 15:13:20 +0000
Subject: [PATCH] add barriers to ool __sync builtins

2022-05-13  Sebastian Pop  <spop@amazon.com>

gcc/
	PR target/105162
	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
	of str array.
	* config/aarch64/aarch64.c (aarch64_atomic_ool_func): Call
	memmodel_from_int and handle MEMMODEL_SYNC_*.
	(DEF0): Add __aarch64_*_sync functions.

gcc/testsuite/
	PR target/105162
	* gcc.target/aarch64/sync-comp-swap-ool.c: New.
	* gcc.target/aarch64/sync-op-acquire-ool.c: New.
	* gcc.target/aarch64/sync-op-full-ool.c: New.
	* gcc.target/aarch64/target_attr_20.c: Update check.
	* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
	PR target/105162
	* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
	* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
---
 gcc/config/aarch64/aarch64-protos.h           |  2 +-
 gcc/config/aarch64/aarch64.c                  | 12 ++++--
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 +++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 +++
 .../gcc.target/aarch64/sync-op-full-ool.c     |  9 ++++
 .../gcc.target/aarch64/target_attr_20.c       |  2 +-
 .../gcc.target/aarch64/target_attr_21.c       |  2 +-
 libgcc/config/aarch64/lse.S                   | 42 +++++++++++++++++--
 libgcc/config/aarch64/t-lse                   |  8 ++--
 9 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index bebd1b36228..61e2ad54f88 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -787,7 +787,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-    const char *str[5][4];
+    const char *str[5][5];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d934dd67fc5..67c2f1123b4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -19617,14 +19617,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x5 array.
    Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -19664,6 +19664,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
     case MEMMODEL_SEQ_CST:
       model_idx = 3;
       break;
+    case MEMMODEL_SYNC_ACQUIRE:
+    case MEMMODEL_SYNC_RELEASE:
+    case MEMMODEL_SYNC_SEQ_CST:
+      model_idx = 4;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -19676,7 +19681,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
     "__aarch64_" #B #N "_acq", \
     "__aarch64_" #B #N "_rel", \
-    "__aarch64_" #B #N "_acq_rel" }
+    "__aarch64_" #B #N "_acq_rel", \
+    "__aarch64_" #B #N "_sync" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 00000000000..372f4aa8746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
new file mode 100644
index 00000000000..95d9c56b5e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
new file mode 100644
index 00000000000..2f3881d9755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
index 509fb039e84..c9454fc420b 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */
+/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_sync" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
index acace4c8f2a..b8e56223b02 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_sync" 1 } } */
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index c8fbfbce4fd..67a1108786f 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -87,24 +87,44 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define L
 # define M     0x000000
 # define N     0x000000
+# define BARRIER
 #elif MODEL == 2
 # define SUFF  _acq
 # define A     a
 # define L
 # define M     0x400000
 # define N     0x800000
+# define BARRIER
 #elif MODEL == 3
 # define SUFF  _rel
 # define A
 # define L     l
 # define M     0x008000
 # define N     0x400000
+# define BARRIER
 #elif MODEL == 4
 # define SUFF  _acq_rel
 # define A     a
 # define L     l
 # define M     0x408000
 # define N     0xc00000
+# define BARRIER
+#elif MODEL == 5
+# define SUFF  _sync
+#ifdef L_swp
+/* swp has _acq semantics.  */
+#  define A    a
+#  define L
+#  define M    0x400000
+#  define N    0x800000
+#else
+/* All other _sync functions have _seq semantics.  */
+#  define A    a
+#  define L    l
+#  define M    0x408000
+#  define N    0xc00000
+#endif
+# define BARRIER dmb		ish
 #else
 # error
 #endif
@@ -127,7 +147,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 #define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
-#define LDXR			glue4(ld, A, xr, S)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXR			glue3(ld, xr, S)
+#else
+# define LDXR			glue4(ld, A, xr, S)
+#endif
 #define STXR			glue4(st, L, xr, S)
 
 /* Temporary registers used.  Other than these, only the return value
@@ -183,10 +208,16 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXR		w(tmp1), s(1), [x2]
 	cbnz		w(tmp1), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #else
-#define LDXP	glue3(ld, A, xp)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXP	glue2(ld, xp)
+#else
+# define LDXP	glue3(ld, A, xp)
+#endif
 #define STXP	glue3(st, L, xp)
 #ifdef HAVE_AS_LSE
 # define CASP	glue3(casp, A, L)	x0, x1, x2, x3, [x4]
@@ -205,7 +236,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXP		w(tmp2), x2, x3, [x4]
 	cbnz		w(tmp2), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #endif
 
@@ -229,6 +261,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +306,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(LDNM)
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
index c41f0508372..e3093724dc0 100644
--- a/libgcc/config/aarch64/t-lse
+++ b/libgcc/config/aarch64/t-lse
@@ -18,13 +18,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Compare-and-swap has 5 sizes and 4 memory models.
+# Compare-and-swap has 5 sizes and 5 memory models.
 S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
-O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+O0 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S0)))
 
-# Swap, Load-and-operate have 4 sizes and 4 memory models
+# Swap, Load-and-operate have 4 sizes and 5 memory models
 S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
-O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+O1 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S1)))
 
 LSE_OBJS := $(O0) $(O1)
 
-- 
2.32.0
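
To make the lse.S macro changes easier to follow, here is a small sketch of which instructions the path without LSE ends up using for each MODEL value. It models the A, L and BARRIER macros as plain strings. The struct and function names are hypothetical, and the size suffix S (b or h for 1- and 2-byte accesses) is left out, so the mnemonics shown are the 4- and 8-byte forms.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of the load/store-exclusive loop for one MODEL.  */
struct fallback_insns
{
  const char *load;     /* exclusive load mnemonic (LDXR macro) */
  const char *store;    /* exclusive store mnemonic (STXR macro) */
  const char *barrier;  /* trailing BARRIER, or "" when it expands to nothing */
};

/* MODEL follows lse.S: 1 _relax, 2 _acq, 3 _rel, 4 _acq_rel, 5 _sync.
   IS_SWP is nonzero when building the swp helper (L_swp defined).  */
static struct fallback_insns
fallback_for_model (int model, int is_swp)
{
  struct fallback_insns r = { "ldxr", "stxr", "" };

  assert (model >= 1 && model <= 5);
  switch (model)
    {
    case 1:                       /* A and L both empty */
      break;
    case 2:                       /* A = a */
      r.load = "ldaxr";
      break;
    case 3:                       /* L = l */
      r.store = "stlxr";
      break;
    case 4:                       /* A = a, L = l */
      r.load = "ldaxr";
      r.store = "stlxr";
      break;
    case 5:
      /* A is dropped from the load for _sync; L is kept, and is empty
	 only for swp.  The loop is followed by a full barrier.  */
      r.store = is_swp ? "stxr" : "stlxr";
      r.barrier = "dmb ish";
      break;
    }
  return r;
}
```

For example, fallback_for_model (5, 0) gives ldxr / stlxr followed by dmb ish, while fallback_for_model (4, 0) gives ldaxr / stlxr with no barrier.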


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #5: gcc-9.patch --]
[-- Type: text/x-patch; name="gcc-9.patch", Size: 9609 bytes --]

From 263c214639686d43cc95efea6120ee4601cd2cf3 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Mon, 18 Apr 2022 15:13:20 +0000
Subject: [PATCH] [AArch64] add barriers to ool __sync builtins

2022-05-13  Sebastian Pop  <spop@amazon.com>

gcc/
	PR target/105162
	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
	of str array.
	* config/aarch64/aarch64.c (aarch64_atomic_ool_func): Call
	memmodel_from_int and handle MEMMODEL_SYNC_*.
	(DEF0): Add __aarch64_*_sync functions.

gcc/testsuite/
	PR target/105162
	* gcc.target/aarch64/sync-comp-swap-ool.c: New.
	* gcc.target/aarch64/sync-op-acquire-ool.c: New.
	* gcc.target/aarch64/sync-op-full-ool.c: New.
	* gcc.target/aarch64/target_attr_20.c: Update check.
	* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
	PR target/105162
	* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
	* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
---
 gcc/config/aarch64/aarch64-protos.h           |  2 +-
 gcc/config/aarch64/aarch64.c                  | 12 ++++--
 .../gcc.target/aarch64/sync-comp-swap-ool.c   |  6 +++
 .../gcc.target/aarch64/sync-op-acquire-ool.c  |  6 +++
 .../gcc.target/aarch64/sync-op-full-ool.c     |  9 ++++
 .../gcc.target/aarch64/target_attr_20.c       |  2 +-
 .../gcc.target/aarch64/target_attr_21.c       |  2 +-
 libgcc/config/aarch64/lse.S                   | 42 +++++++++++++++++--
 libgcc/config/aarch64/t-lse                   |  8 ++--
 9 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index c9e5d586a51..12a47ebff55 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -647,7 +647,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
 struct atomic_ool_names
 {
-    const char *str[5][4];
+    const char *str[5][5];
 };
 
 rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e96a350ed5b..a35dceab9fc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15764,14 +15764,14 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
-/* We store the names of the various atomic helpers in a 5x4 array.
+/* We store the names of the various atomic helpers in a 5x5 array.
    Return the libcall function given MODE, MODEL and NAMES.  */
 
 rtx
 aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
 			const atomic_ool_names *names)
 {
-  memmodel model = memmodel_base (INTVAL (model_rtx));
+  memmodel model = memmodel_from_int (INTVAL (model_rtx));
   int mode_idx, model_idx;
 
   switch (mode)
@@ -15811,6 +15811,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
     case MEMMODEL_SEQ_CST:
       model_idx = 3;
       break;
+    case MEMMODEL_SYNC_ACQUIRE:
+    case MEMMODEL_SYNC_RELEASE:
+    case MEMMODEL_SYNC_SEQ_CST:
+      model_idx = 4;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -15823,7 +15828,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
   { "__aarch64_" #B #N "_relax", \
     "__aarch64_" #B #N "_acq", \
     "__aarch64_" #B #N "_rel", \
-    "__aarch64_" #B #N "_acq_rel" }
+    "__aarch64_" #B #N "_acq_rel", \
+    "__aarch64_" #B #N "_sync" }
 
 #define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
 		 { NULL, NULL, NULL, NULL }
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
new file mode 100644
index 00000000000..372f4aa8746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
new file mode 100644
index 00000000000..95d9c56b5e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
new file mode 100644
index 00000000000..2f3881d9755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
index 509fb039e84..c9454fc420b 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */
+/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_sync" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
index acace4c8f2a..b8e56223b02 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c
@@ -24,4 +24,4 @@ bar (void)
     }
 }
 
-/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */
+/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_sync" 1 } } */
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index e3ae5191103..720281f149e 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -87,24 +87,44 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define L
 # define M     0x000000
 # define N     0x000000
+# define BARRIER
 #elif MODEL == 2
 # define SUFF  _acq
 # define A     a
 # define L
 # define M     0x400000
 # define N     0x800000
+# define BARRIER
 #elif MODEL == 3
 # define SUFF  _rel
 # define A
 # define L     l
 # define M     0x008000
 # define N     0x400000
+# define BARRIER
 #elif MODEL == 4
 # define SUFF  _acq_rel
 # define A     a
 # define L     l
 # define M     0x408000
 # define N     0xc00000
+# define BARRIER
+#elif MODEL == 5
+# define SUFF  _sync
+#ifdef L_swp
+/* swp has _acq semantics.  */
+#  define A    a
+#  define L
+#  define M    0x400000
+#  define N    0x800000
+#else
+/* All other _sync functions have _seq semantics.  */
+#  define A    a
+#  define L    l
+#  define M    0x408000
+#  define N    0xc00000
+#endif
+# define BARRIER dmb		ish
 #else
 # error
 #endif
@@ -127,7 +147,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 #define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
-#define LDXR			glue4(ld, A, xr, S)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXR			glue3(ld, xr, S)
+#else
+# define LDXR			glue4(ld, A, xr, S)
+#endif
 #define STXR			glue4(st, L, xr, S)
 
 /* Temporary registers used.  Other than these, only the return value
@@ -183,10 +208,16 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXR		w(tmp1), s(1), [x2]
 	cbnz		w(tmp1), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #else
-#define LDXP	glue3(ld, A, xp)
+#if MODEL == 5
+/* Drop A for _sync functions.  */
+# define LDXP	glue2(ld, xp)
+#else
+# define LDXP	glue3(ld, A, xp)
+#endif
 #define STXP	glue3(st, L, xp)
 #ifdef HAVE_AS_LSE
 # define CASP	glue3(casp, A, L)	x0, x1, x2, x3, [x4]
@@ -205,7 +236,8 @@ STARTFN	NAME(cas)
 	bne		1f
 	STXP		w(tmp2), x2, x3, [x4]
 	cbnz		w(tmp2), 0b
-1:	ret
+1:	BARRIER
+	ret
 
 #endif
 
@@ -229,6 +261,7 @@ STARTFN	NAME(swp)
 0:	LDXR		s(0), [x1]
 	STXR		w(tmp1), s(tmp0), [x1]
 	cbnz		w(tmp1), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(swp)
@@ -273,6 +306,7 @@ STARTFN	NAME(LDNM)
 	OP		s(tmp1), s(0), s(tmp0)
 	STXR		w(tmp2), s(tmp1), [x1]
 	cbnz		w(tmp2), 0b
+	BARRIER
 	ret
 
 ENDFN	NAME(LDNM)
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
index fe3868dacbf..af9b68257fc 100644
--- a/libgcc/config/aarch64/t-lse
+++ b/libgcc/config/aarch64/t-lse
@@ -18,13 +18,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Compare-and-swap has 5 sizes and 4 memory models.
+# Compare-and-swap has 5 sizes and 5 memory models.
 S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
-O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+O0 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S0)))
 
-# Swap, Load-and-operate have 4 sizes and 4 memory models
+# Swap, Load-and-operate have 4 sizes and 5 memory models
 S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
-O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+O1 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S1)))
 
 LSE_OBJS := $(O0) $(O1)
 
-- 
2.25.1
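
The compiler side of the patch amounts to a fifth column in the helper-name table. The sketch below mimics that lookup outside of GCC. The SK_* enumerators and sk_* functions are hypothetical stand-ins, not GCC's memmodel type or aarch64_atomic_ool_func, and the mapping of the first six models to columns 0..3 is assumed from the order of the suffix names in DEF0 (the diff context only shows the SEQ_CST case). Since aarch64_atomic_ool_func now has to see the MEMMODEL_SYNC_* values, the switch from memmodel_base to memmodel_from_int presumably exists to keep the sync variants distinguishable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for GCC's memory model enumerators.  */
enum sketch_memmodel
{
  SK_RELAXED, SK_CONSUME, SK_ACQUIRE, SK_RELEASE, SK_ACQ_REL, SK_SEQ_CST,
  SK_SYNC_ACQUIRE, SK_SYNC_RELEASE, SK_SYNC_SEQ_CST
};

/* Suffixes in the order used by DEF0 after the patch.  */
static const char *const sk_suffix[5] =
  { "_relax", "_acq", "_rel", "_acq_rel", "_sync" };

/* Map a memory model to a column of the name table; all three
   __sync models share the new column 4.  */
static int
sk_model_idx (enum sketch_memmodel model)
{
  switch (model)
    {
    case SK_RELAXED:
      return 0;
    case SK_CONSUME:
    case SK_ACQUIRE:
      return 1;
    case SK_RELEASE:
      return 2;
    case SK_ACQ_REL:
    case SK_SEQ_CST:
      return 3;
    case SK_SYNC_ACQUIRE:
    case SK_SYNC_RELEASE:
    case SK_SYNC_SEQ_CST:
      return 4;
    }
  return -1;
}

/* Write the helper name for BASE (e.g. "cas"), access SIZE in bytes
   and MODEL into BUF, which holds LEN bytes.  */
static void
sk_helper_name (char *buf, size_t len, const char *base, int size,
		enum sketch_memmodel model)
{
  int idx = sk_model_idx (model);

  assert (idx >= 0);
  snprintf (buf, len, "__aarch64_%s%d%s", base, size, sk_suffix[idx]);
}
```

With this table a 4-byte __sync compare-and-swap resolves to __aarch64_cas4_sync, which is the symbol the new sync-comp-swap-ool.c test scans for.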



* Re: [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
  2022-05-13 20:32               ` Pop, Sebastian
@ 2022-05-16 11:42                 ` Richard Sandiford
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2022-05-16 11:42 UTC (permalink / raw)
  To: Pop, Sebastian; +Cc: Wilco Dijkstra, Pop, Sebastian via Gcc-patches

"Pop, Sebastian" <spop@amazon.com> writes:
> Please see attached the patch back-ported to branches 12, 11, 10, and 9.
> Tested on aarch64-linux with bootstrap and regression test.
> Ok to commit to the GCC active branches?

OK, thanks.  Only very safe patches are supposed to be going into GCC 9
at this stage, and I guess this one is a bit on the edge.  But I think
it's worth applying anyway because it's fixing a non-deterministic
wrong-code regression.

Richard


end of thread, other threads:[~2022-05-16 11:42 UTC | newest]

Thread overview: 10+ messages
2022-04-07 20:15 [AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE Pop, Sebastian
2022-04-18 18:22 ` Pop, Sebastian
2022-04-19 12:51   ` Wilco Dijkstra
2022-04-25 22:06     ` Pop, Sebastian
2022-05-03 15:33       ` Wilco Dijkstra
2022-05-04 14:23         ` Pop, Sebastian
2022-05-13 14:58           ` Richard Sandiford
2022-05-13 15:41             ` Wilco Dijkstra
2022-05-13 20:32               ` Pop, Sebastian
2022-05-16 11:42                 ` Richard Sandiford
