* [PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers
From: Richard Henderson @ 2019-09-18  1:58 UTC
  To: gcc-patches
  Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh

	* config/aarch64/aarch64.c (aarch64_print_operand): Allow integer
	registers with %R.
---
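For illustration (not part of the patch): %R on a general register now
prints the next integer register, while the FP/SIMD behavior is unchanged.
A sketch of an output template that relies on this, modeled on the CASP
pattern added later in the series:

  /* With operand 0 in x4 and operand 2 in x6, "%0, %R0" prints
     "x4, x5" and "%2, %R2" prints "x6, x7"; for a vector register
     such as v7, "%R" still prints "q8".  */
  static const char *
  casp_template_sketch (void)
  {
    return "casp\t%0, %R0, %2, %R2, %1";
  }

This names the even/odd register pairs required by CASP without needing a
separate operand for the high half.
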
 gcc/config/aarch64/aarch64.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 232317d4a5a..99d51e2aef9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8420,7 +8420,7 @@ sizetochar (int size)
      'S/T/U/V':		Print a FP/SIMD register name for a register list.
 			The register printed is the FP/SIMD register name
 			of X + 0/1/2/3 for S/T/U/V.
-     'R':		Print a scalar FP/SIMD register name + 1.
+     'R':		Print a scalar Integer/FP/SIMD register name + 1.
      'X':		Print bottom 16 bits of integer constant in hex.
      'w/x':		Print a general register name or the zero register
 			(32-bit or 64-bit).
@@ -8623,12 +8623,13 @@ aarch64_print_operand (FILE *f, rtx x, int code)
       break;
 
     case 'R':
-      if (!REG_P (x) || !FP_REGNUM_P (REGNO (x)))
-	{
-	  output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code);
-	  return;
-	}
-      asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+      if (REG_P (x) && FP_REGNUM_P (REGNO (x)))
+	asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+      else if (REG_P (x) && GP_REGNUM_P (REGNO (x)))
+	asm_fprintf (f, "x%d", REGNO (x) - R0_REGNUM + 1);
+      else
+	output_operand_lossage ("incompatible register operand for '%%%c'",
+				code);
       break;
 
     case 'X':
-- 
2.17.1

* [PATCH, AArch64 v4 0/6] LSE atomics out-of-line
From: Richard Henderson @ 2019-09-18  1:58 UTC
  To: gcc-patches
  Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh

Version 3 was back in November:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html

Changes since v3:
  * Do not use swap_commutative_operands_p in aarch64_gen_compare_reg.
    This is the probable cause of the bootstrap problem that Kyrill reported.
  * Add unwind markers to the out-of-line functions.
  * Use uxt{8,16} instead of mov in the CAS functions,
    in preference to folding the uxt into the cmp.
  * Prefer the LSE case in the out-of-line fallthrough (Wilco).
  * Name the option -moutline-atomics (Wilco).
  * Name the variable __aarch64_have_lse_atomics (Wilco);
    fix the definition in lse-init.c.
  * Rename the functions s/__aa64/__aarch64/ (seemed sensible to match
    the existing prefix).
  * Always use Pmode for the libcall addresses, fixing ILP32 (Kyrill).

Still not done is a custom calling convention during code generation,
but that can come later as an optimization.

Tested aarch64-linux on a ThunderX1.
I have not run tests on any platform supporting LSE, even under QEMU.


r~


Richard Henderson (6):
  aarch64: Extend %R for integer registers
  aarch64: Implement TImode compare-and-swap
  aarch64: Tidy aarch64_split_compare_and_swap
  aarch64: Add out-of-line functions for LSE atomics
  aarch64: Implement -moutline-atomics
  TESTING: Enable -moutline-atomics by default

 gcc/config/aarch64/aarch64-protos.h           |  13 +
 gcc/common/config/aarch64/aarch64-common.c    |   6 +-
 gcc/config/aarch64/aarch64.c                  | 204 +++++++++++----
 .../atomic-comp-swap-release-acquire.c        |   2 +-
 .../gcc.target/aarch64/atomic-op-acq_rel.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-acquire.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-char.c       |   2 +-
 .../gcc.target/aarch64/atomic-op-consume.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-imm.c        |   2 +-
 .../gcc.target/aarch64/atomic-op-int.c        |   2 +-
 .../gcc.target/aarch64/atomic-op-long.c       |   2 +-
 .../gcc.target/aarch64/atomic-op-relaxed.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-release.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-seq_cst.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-short.c      |   2 +-
 .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
 .../atomic_cmp_exchange_zero_strong_1.c       |   2 +-
 .../gcc.target/aarch64/sync-comp-swap.c       |   2 +-
 .../gcc.target/aarch64/sync-op-acquire.c      |   2 +-
 .../gcc.target/aarch64/sync-op-full.c         |   2 +-
 libgcc/config/aarch64/lse-init.c              |  45 ++++
 gcc/config/aarch64/aarch64.opt                |   3 +
 gcc/config/aarch64/atomics.md                 | 187 +++++++++++++-
 gcc/config/aarch64/iterators.md               |   3 +
 gcc/doc/invoke.texi                           |  16 +-
 libgcc/config.host                            |   4 +
 libgcc/config/aarch64/lse.S                   | 235 ++++++++++++++++++
 libgcc/config/aarch64/t-lse                   |  44 ++++
 28 files changed, 709 insertions(+), 85 deletions(-)
 create mode 100644 libgcc/config/aarch64/lse-init.c
 create mode 100644 libgcc/config/aarch64/lse.S
 create mode 100644 libgcc/config/aarch64/t-lse

-- 
2.17.1

* [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics
From: Richard Henderson @ 2019-09-18  1:58 UTC
  To: gcc-patches
  Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh

This is the libgcc part of the interface -- providing the functions.
Rationale is provided at the top of libgcc/config/aarch64/lse.S.

	* config/aarch64/lse-init.c: New file.
	* config/aarch64/lse.S: New file.
	* config/aarch64/t-lse: New file.
	* config.host: Add t-lse to all aarch64 tuples.
---
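To make the dispatch described in lse.S concrete, here is a rough C
rendering of one helper (__aarch64_swp4_relax); the shipped code is the
assembly in lse.S, so treat this as a sketch only.  It assumes the
compiler accepts LSE mnemonics (e.g. -march=armv8-a+lse):

  extern _Bool __aarch64_have_lse_atomics;

  unsigned int
  swp4_relax_sketch (unsigned int newval, unsigned int *ptr)
  {
    unsigned int old, tmp;
    if (__aarch64_have_lse_atomics)
      /* Single LSE atomic swap.  */
      asm volatile ("swp %w2, %w0, %1"
                    : "=&r" (old), "+Q" (*ptr) : "r" (newval));
    else
      /* Load/store-exclusive loop from the base ARMv8.0 ISA.  */
      asm volatile ("0: ldxr %w0, %2\n\t"
                    "stxr %w1, %w3, %2\n\t"
                    "cbnz %w1, 0b"
                    : "=&r" (old), "=&r" (tmp), "+Q" (*ptr)
                    : "r" (newval));
    return old;
  }

The real helpers perform the same test with a one-byte load and a direct
branch, keeping the fast path short and well predicted.
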
 libgcc/config/aarch64/lse-init.c |  45 ++++++
 libgcc/config.host               |   4 +
 libgcc/config/aarch64/lse.S      | 235 +++++++++++++++++++++++++++++++
 libgcc/config/aarch64/t-lse      |  44 ++++++
 4 files changed, 328 insertions(+)
 create mode 100644 libgcc/config/aarch64/lse-init.c
 create mode 100644 libgcc/config/aarch64/lse.S
 create mode 100644 libgcc/config/aarch64/t-lse

diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
new file mode 100644
index 00000000000..51fb21d45c9
--- /dev/null
+++ b/libgcc/config/aarch64/lse-init.c
@@ -0,0 +1,45 @@
+/* Out-of-line LSE atomics for AArch64 architecture, Init.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* Define the symbol gating the LSE implementations.  */
+_Bool __aarch64_have_lse_atomics
+  __attribute__((visibility("hidden"), nocommon));
+
+/* Disable initialization of __aarch64_have_lse_atomics during bootstrap.  */
+#ifndef inhibit_libc
+# include <sys/auxv.h>
+
+/* Disable initialization if the system headers are too old.  */
+# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
+
+static void __attribute__((constructor))
+init_have_lse_atomics (void)
+{
+  unsigned long hwcap = getauxval (AT_HWCAP);
+  __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
+}
+
+# endif /* HWCAP */
+#endif /* inhibit_libc */
diff --git a/libgcc/config.host b/libgcc/config.host
index 728e543ea39..122113fc519 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
 	extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
 	extra_parts="$extra_parts crtfastmath.o"
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
 	md_unwind_header=aarch64/aarch64-unwind.h
 	;;
 aarch64*-*-freebsd*)
 	extra_parts="$extra_parts crtfastmath.o"
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
 	md_unwind_header=aarch64/freebsd-unwind.h
 	;;
@@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
 	;;
 aarch64*-*-fuchsia*)
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
 	;;
 aarch64*-*-linux*)
 	extra_parts="$extra_parts crtfastmath.o"
 	md_unwind_header=aarch64/linux-unwind.h
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
 	;;
 alpha*-*-linux*)
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
new file mode 100644
index 00000000000..c24a39242ca
--- /dev/null
+++ b/libgcc/config/aarch64/lse.S
@@ -0,0 +1,235 @@
+/* Out-of-line LSE atomics for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/*
+ * The problem that we are trying to solve is operating system deployment
+ * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
+ *
+ * There are a number of potential solutions for this problem which have
+ * been proposed and rejected for various reasons.  To recap:
+ *
+ * (1) Multiple builds.  The dynamic linker will examine /lib64/atomics/
+ * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
+ * However, not all Linux distributions are happy with multiple builds,
+ * and anyway it has no effect on main applications.
+ *
+ * (2) IFUNC.  We could put these functions into libgcc_s.so, and have
+ * a single copy of each function for all DSOs.  However, ARM is concerned
+ * that the branch-to-indirect-branch that is implied by using a PLT,
+ * as required by IFUNC, is too much overhead for smaller CPUs.
+ *
+ * (3) Statically predicted direct branches.  This is the approach that
+ * is taken here.  These functions are linked into every DSO that uses them.
+ * All of the symbols are hidden, so that the functions are called via a
+ * direct branch.  The choice of LSE vs non-LSE is done via a one-byte load
+ * followed by a well-predicted direct branch.  The functions are compiled
+ * separately to minimize code size.
+ */
+
+/* Tell the assembler to accept LSE instructions.  */
+	.arch armv8-a+lse
+
+/* Declare the symbol gating the LSE implementations.  */
+	.hidden	__aarch64_have_lse_atomics
+
+/* Turn size and memory model defines into mnemonic fragments.  */
+#if SIZE == 1
+# define S     b
+# define UXT   uxtb
+#elif SIZE == 2
+# define S     h
+# define UXT   uxth
+#elif SIZE == 4 || SIZE == 8 || SIZE == 16
+# define S
+# define UXT   mov
+#else
+# error
+#endif
+
+#if MODEL == 1
+# define SUFF  _relax
+# define A
+# define L
+#elif MODEL == 2
+# define SUFF  _acq
+# define A     a
+# define L
+#elif MODEL == 3
+# define SUFF  _rel
+# define A
+# define L     l
+#elif MODEL == 4
+# define SUFF  _acq_rel
+# define A     a
+# define L     l
+#else
+# error
+#endif
+
+/* Concatenate symbols.  */
+#define glue2_(A, B)		A ## B
+#define glue2(A, B)		glue2_(A, B)
+#define glue3_(A, B, C)		A ## B ## C
+#define glue3(A, B, C)		glue3_(A, B, C)
+#define glue4_(A, B, C, D)	A ## B ## C ## D
+#define glue4(A, B, C, D)	glue4_(A, B, C, D)
+
+/* Select the size of a register, given a regno.  */
+#define x(N)			glue2(x, N)
+#define w(N)			glue2(w, N)
+#if SIZE < 8
+# define s(N)			w(N)
+#else
+# define s(N)			x(N)
+#endif
+
+#define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
+#define LDXR			glue4(ld, A, xr, S)
+#define STXR			glue4(st, L, xr, S)
+
+/* Temporary registers used.  Other than these, only the return value
+   register (x0) and the flags are modified.  */
+#define tmp0	16
+#define tmp1	17
+#define tmp2	15
+
+/* Start and end a function.  */
+.macro	STARTFN name
+	.text
+	.balign	16
+	.globl	\name
+	.hidden	\name
+	.type	\name, %function
+	.cfi_startproc
+\name:
+.endm
+
+.macro	ENDFN name
+	.cfi_endproc
+	.size	\name, . - \name
+.endm
+
+/* Branch to LABEL if LSE is disabled.  */
+.macro	JUMP_IF_NOT_LSE label
+	adrp	x(tmp0), __aarch64_have_lse_atomics
+	ldrb	w(tmp0), [x(tmp0), :lo12:__aarch64_have_lse_atomics]
+	cbz	w(tmp0), \label
+.endm
+
+#ifdef L_cas
+
+STARTFN	NAME(cas)
+	JUMP_IF_NOT_LSE	8f
+
+#if SIZE < 16
+#define CAS	glue4(cas, A, L, S)
+
+	CAS		s(0), s(1), [x2]
+	ret
+
+8:	UXT		s(tmp0), s(0)
+0:	LDXR		s(0), [x2]
+	cmp		s(0), s(tmp0)
+	bne		1f
+	STXR		w(tmp1), s(1), [x2]
+	cbnz		w(tmp1), 0b
+1:	ret
+
+#else
+#define LDXP	glue3(ld, A, xp)
+#define STXP	glue3(st, L, xp)
+#define CASP	glue3(casp, A, L)
+
+	CASP		x0, x1, x2, x3, [x4]
+	ret
+
+8:	mov		x(tmp0), x0
+	mov		x(tmp1), x1
+0:	LDXP		x0, x1, [x4]
+	cmp		x0, x(tmp0)
+	ccmp		x1, x(tmp1), #0, eq
+	bne		1f
+	STXP		w(tmp2), x(tmp0), x(tmp1), [x4]
+	cbnz		w(tmp2), 0b
+1:	ret
+
+#endif
+
+ENDFN	NAME(cas)
+#endif
+
+#ifdef L_swp
+#define SWP	glue4(swp, A, L, S)
+
+STARTFN	NAME(swp)
+	JUMP_IF_NOT_LSE	8f
+
+	SWP		s(0), s(0), [x1]
+	ret
+
+8:	mov		s(tmp0), s(0)
+0:	LDXR		s(0), [x1]
+	STXR		w(tmp1), s(tmp0), [x1]
+	cbnz		w(tmp1), 0b
+	ret
+
+ENDFN	NAME(swp)
+#endif
+
+#if defined(L_ldadd) || defined(L_ldclr) \
+    || defined(L_ldeor) || defined(L_ldset)
+
+#ifdef L_ldadd
+#define LDNM	ldadd
+#define OP	add
+#elif defined(L_ldclr)
+#define LDNM	ldclr
+#define OP	bic
+#elif defined(L_ldeor)
+#define LDNM	ldeor
+#define OP	eor
+#elif defined(L_ldset)
+#define LDNM	ldset
+#define OP	orr
+#else
+#error
+#endif
+#define LDOP	glue4(LDNM, A, L, S)
+
+STARTFN	NAME(LDNM)
+	JUMP_IF_NOT_LSE	8f
+
+	LDOP		s(0), s(0), [x1]
+	ret
+
+8:	mov		s(tmp0), s(0)
+0:	LDXR		s(0), [x1]
+	OP		s(tmp1), s(0), s(tmp0)
+	STXR		w(tmp1), s(tmp1), [x1]
+	cbnz		w(tmp1), 0b
+	ret
+
+ENDFN	NAME(LDNM)
+#endif
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
new file mode 100644
index 00000000000..c7f4223cd45
--- /dev/null
+++ b/libgcc/config/aarch64/t-lse
@@ -0,0 +1,44 @@
+# Out-of-line LSE atomics for AArch64 architecture.
+# Copyright (C) 2018 Free Software Foundation, Inc.
+# Contributed by Linaro Ltd.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# Compare-and-swap has 5 sizes and 4 memory models.
+S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
+O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+
+# Swap and load-and-operate have 4 sizes and 4 memory models.
+S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
+O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+
+LSE_OBJS := $(O0) $(O1)
+
+libgcc-objects += $(LSE_OBJS) lse-init$(objext)
+
+empty      =
+space      = $(empty) $(empty)
+PAT_SPLIT  = $(subst _,$(space),$(*F))
+PAT_BASE   = $(word 1,$(PAT_SPLIT))
+PAT_N      = $(word 2,$(PAT_SPLIT))
+PAT_M      = $(word 3,$(PAT_SPLIT))
+
+lse-init$(objext): $(srcdir)/config/aarch64/lse-init.c
+	$(gcc_compile) -c $<
+
+$(LSE_OBJS): $(srcdir)/config/aarch64/lse.S
+	$(gcc_compile) -DL_$(PAT_BASE) -DSIZE=$(PAT_N) -DMODEL=$(PAT_M) -c $<
-- 
2.17.1

* [PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default
From: Richard Henderson @ 2019-09-18  1:58 UTC
  To: gcc-patches
  Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh

---
 gcc/common/config/aarch64/aarch64-common.c | 6 ++++--
 gcc/config/aarch64/aarch64.c               | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 07c03253951..2bbf454eea9 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -32,9 +32,11 @@
 #include "diagnostic.h"
 #include "params.h"
 
-#ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef  TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_OUTLINE_ATOMICS)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_OUTLINE_ATOMICS)
 #endif
 
 #undef  TARGET_HANDLE_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 56a4a47db73..ca4363e7831 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20535,9 +20535,11 @@ aarch64_run_selftests (void)
 #undef TARGET_C_MODE_FOR_SUFFIX
 #define TARGET_C_MODE_FOR_SUFFIX aarch64_c_mode_for_suffix
 
-#ifdef TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef  TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_OUTLINE_ATOMICS)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_OUTLINE_ATOMICS)
 #endif
 
 #undef TARGET_CLASS_MAX_NREGS
-- 
2.17.1

* [PATCH, AArch64 v4 3/6] aarch64: Tidy aarch64_split_compare_and_swap
From: Richard Henderson @ 2019-09-18  1:58 UTC
  To: gcc-patches
  Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh

With aarch64_track_speculation, we had extra code to do exactly what the
!strong_zero_p path already did.  The rest of the patch reduces code
duplication.

	* config/aarch64/aarch64 (aarch64_split_compare_and_swap): Disable
	strong_zero_p for aarch64_track_speculation; unify some code paths;
	use aarch64_gen_compare_reg instead of open-coding.
---
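For reference, the simplification in the final hunks (a sketch drawn
directly from the diff below): the open-coded CC update

  cond = gen_rtx_REG (CCmode, CC_REGNUM);
  x = gen_rtx_COMPARE (CCmode, scratch, const0_rtx);
  emit_insn (gen_rtx_SET (cond, x));

collapses into the single call

  aarch64_gen_compare_reg (NE, scratch, const0_rtx);

which emits the same CC-setting comparison.
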
 gcc/config/aarch64/aarch64.c | 50 ++++++++++--------------------------
 1 file changed, 14 insertions(+), 36 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a5c4f55627d..b937514e6f8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16955,13 +16955,11 @@ aarch64_emit_post_barrier (enum memmodel model)
 void
 aarch64_split_compare_and_swap (rtx operands[])
 {
-  rtx rval, mem, oldval, newval, scratch;
+  rtx rval, mem, oldval, newval, scratch, x, model_rtx;
   machine_mode mode;
   bool is_weak;
   rtx_code_label *label1, *label2;
-  rtx x, cond;
   enum memmodel model;
-  rtx model_rtx;
 
   rval = operands[0];
   mem = operands[1];
@@ -16982,7 +16980,8 @@ aarch64_split_compare_and_swap (rtx operands[])
 	CBNZ	scratch, .label1
     .label2:
 	CMP	rval, 0.  */
-  bool strong_zero_p = !is_weak && oldval == const0_rtx && mode != TImode;
+  bool strong_zero_p = (!is_weak && !aarch64_track_speculation &&
+			oldval == const0_rtx && mode != TImode);
 
   label1 = NULL;
   if (!is_weak)
@@ -16995,35 +16994,20 @@ aarch64_split_compare_and_swap (rtx operands[])
   /* The initial load can be relaxed for a __sync operation since a final
      barrier will be emitted to stop code hoisting.  */
   if (is_mm_sync (model))
-    aarch64_emit_load_exclusive (mode, rval, mem,
-				 GEN_INT (MEMMODEL_RELAXED));
+    aarch64_emit_load_exclusive (mode, rval, mem, GEN_INT (MEMMODEL_RELAXED));
   else
     aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
 
   if (strong_zero_p)
-    {
-      if (aarch64_track_speculation)
-	{
-	  /* Emit an explicit compare instruction, so that we can correctly
-	     track the condition codes.  */
-	  rtx cc_reg = aarch64_gen_compare_reg (NE, rval, const0_rtx);
-	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
-	}
-      else
-	x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
-
-      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
-				gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
-      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
-    }
+    x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
     {
-      cond = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-      x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
-      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
-				gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
-      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+      rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+      x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
     }
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+			    gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
+  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 
   aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
 
@@ -17044,22 +17028,16 @@ aarch64_split_compare_and_swap (rtx operands[])
       aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
     }
   else
-    {
-      cond = gen_rtx_REG (CCmode, CC_REGNUM);
-      x = gen_rtx_COMPARE (CCmode, scratch, const0_rtx);
-      emit_insn (gen_rtx_SET (cond, x));
-    }
+    aarch64_gen_compare_reg (NE, scratch, const0_rtx);
 
   emit_label (label2);
+
   /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
      to set the condition flags.  If this is not used it will be removed by
      later passes.  */
   if (strong_zero_p)
-    {
-      cond = gen_rtx_REG (CCmode, CC_REGNUM);
-      x = gen_rtx_COMPARE (CCmode, rval, const0_rtx);
-      emit_insn (gen_rtx_SET (cond, x));
-    }
+    aarch64_gen_compare_reg (NE, rval, const0_rtx);
+
   /* Emit any final barrier needed for a __sync operation.  */
   if (is_mm_sync (model))
     aarch64_emit_post_barrier (model);
-- 
2.17.1

* [PATCH, AArch64 v4 2/6] aarch64: Implement TImode compare-and-swap
From: Richard Henderson @ 2019-09-18  1:58 UTC
  To: gcc-patches
  Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh

This pattern will only be used with the __sync functions, because
we do not yet have a bare TImode atomic load.

	* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Add support
	for NE comparison of TImode values.
	(aarch64_emit_load_exclusive): Add support for TImode.
	(aarch64_emit_store_exclusive): Likewise.
	(aarch64_split_compare_and_swap): Disable strong_zero_p for TImode.
	* config/aarch64/atomics.md (@atomic_compare_and_swap<ALLI_TI>):
	Change iterator from ALLI to ALLI_TI.
	(@atomic_compare_and_swap<JUST_TI>): New.
	(@atomic_compare_and_swap<JUST_TI>_lse): New.
	(aarch64_load_exclusive_pair): New.
	(aarch64_store_exclusive_pair): New.
	* config/aarch64/iterators.md (JUST_TI): New.
---
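A usage sketch (not part of the patch): the TImode path is reached from
the __sync builtins on a 16-byte object, for example

  /* Expected to expand to the LDXP/STXP loop added here, or to CASP
     under -march=armv8-a+lse; the exact codegen is an assumption.  */
  __int128
  cas16 (__int128 *p, __int128 expected, __int128 desired)
  {
    return __sync_val_compare_and_swap (p, expected, desired);
  }
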
 gcc/config/aarch64/aarch64.c    | 48 ++++++++++++++---
 gcc/config/aarch64/atomics.md   | 93 +++++++++++++++++++++++++++++++--
 gcc/config/aarch64/iterators.md |  3 ++
 3 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 99d51e2aef9..a5c4f55627d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2039,10 +2039,33 @@ emit_set_insn (rtx x, rtx y)
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
-  machine_mode mode = SELECT_CC_MODE (code, x, y);
-  rtx cc_reg = gen_rtx_REG (mode, CC_REGNUM);
+  machine_mode cmp_mode = GET_MODE (x);
+  machine_mode cc_mode;
+  rtx cc_reg;
 
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (mode, x, y));
+  if (cmp_mode == TImode)
+    {
+      gcc_assert (code == NE);
+
+      cc_mode = CCmode;
+      cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+
+      rtx x_lo = operand_subword (x, 0, 0, TImode);
+      rtx y_lo = operand_subword (y, 0, 0, TImode);
+      emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
+
+      rtx x_hi = operand_subword (x, 1, 0, TImode);
+      rtx y_hi = operand_subword (y, 1, 0, TImode);
+      emit_insn (gen_ccmpdi (cc_reg, cc_reg, x_hi, y_hi,
+			     gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
+			     GEN_INT (AARCH64_EQ)));
+    }
+  else
+    {
+      cc_mode = SELECT_CC_MODE (code, x, y);
+      cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+      emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
+    }
   return cc_reg;
 }
 
@@ -2593,7 +2616,6 @@ aarch64_zero_extend_const_eq (machine_mode xmode, rtx x,
   gcc_assert (r != NULL);
   return rtx_equal_p (x, r);
 }
-			      
 
 /* Return TARGET if it is nonnull and a register of mode MODE.
    Otherwise, return a fresh register of mode MODE if we can,
@@ -16814,16 +16836,26 @@ static void
 aarch64_emit_load_exclusive (machine_mode mode, rtx rval,
 			     rtx mem, rtx model_rtx)
 {
-  emit_insn (gen_aarch64_load_exclusive (mode, rval, mem, model_rtx));
+  if (mode == TImode)
+    emit_insn (gen_aarch64_load_exclusive_pair (gen_lowpart (DImode, rval),
+						gen_highpart (DImode, rval),
+						mem, model_rtx));
+  else
+    emit_insn (gen_aarch64_load_exclusive (mode, rval, mem, model_rtx));
 }
 
 /* Emit store exclusive.  */
 
 static void
 aarch64_emit_store_exclusive (machine_mode mode, rtx bval,
-			      rtx rval, rtx mem, rtx model_rtx)
+			      rtx mem, rtx rval, rtx model_rtx)
 {
-  emit_insn (gen_aarch64_store_exclusive (mode, bval, rval, mem, model_rtx));
+  if (mode == TImode)
+    emit_insn (gen_aarch64_store_exclusive_pair
+	       (bval, mem, operand_subword (rval, 0, 0, TImode),
+		operand_subword (rval, 1, 0, TImode), model_rtx));
+  else
+    emit_insn (gen_aarch64_store_exclusive (mode, bval, mem, rval, model_rtx));
 }
 
 /* Mark the previous jump instruction as unlikely.  */
@@ -16950,7 +16982,7 @@ aarch64_split_compare_and_swap (rtx operands[])
 	CBNZ	scratch, .label1
     .label2:
 	CMP	rval, 0.  */
-  bool strong_zero_p = !is_weak && oldval == const0_rtx;
+  bool strong_zero_p = !is_weak && oldval == const0_rtx && mode != TImode;
 
   label1 = NULL;
   if (!is_weak)
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index a679270cd38..f8bdd048b37 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -21,11 +21,11 @@
 ;; Instruction patterns.
 
 (define_expand "@atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "register_operand")			;; bool out
-   (match_operand:ALLI 1 "register_operand")			;; val out
-   (match_operand:ALLI 2 "aarch64_sync_memory_operand")		;; memory
-   (match_operand:ALLI 3 "nonmemory_operand")			;; expected
-   (match_operand:ALLI 4 "aarch64_reg_or_zero")			;; desired
+  [(match_operand:SI 0 "register_operand" "")			;; bool out
+   (match_operand:ALLI_TI 1 "register_operand" "")		;; val out
+   (match_operand:ALLI_TI 2 "aarch64_sync_memory_operand" "")	;; memory
+   (match_operand:ALLI_TI 3 "nonmemory_operand" "")		;; expected
+   (match_operand:ALLI_TI 4 "aarch64_reg_or_zero" "")		;; desired
    (match_operand:SI 5 "const_int_operand")			;; is_weak
    (match_operand:SI 6 "const_int_operand")			;; mod_s
    (match_operand:SI 7 "const_int_operand")]			;; mod_f
@@ -88,6 +88,30 @@
   }
 )
 
+(define_insn_and_split "@aarch64_compare_and_swap<mode>"
+  [(set (reg:CC CC_REGNUM)					;; bool out
+    (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
+   (set (match_operand:JUST_TI 0 "register_operand" "=&r")	;; val out
+    (match_operand:JUST_TI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
+   (set (match_dup 1)
+    (unspec_volatile:JUST_TI
+      [(match_operand:JUST_TI 2 "aarch64_reg_or_zero" "rZ")	;; expect
+       (match_operand:JUST_TI 3 "aarch64_reg_or_zero" "rZ")	;; desired
+       (match_operand:SI 4 "const_int_operand")			;; is_weak
+       (match_operand:SI 5 "const_int_operand")			;; mod_s
+       (match_operand:SI 6 "const_int_operand")]		;; mod_f
+      UNSPECV_ATOMIC_CMPSW))
+   (clobber (match_scratch:SI 7 "=&r"))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_split_compare_and_swap (operands);
+    DONE;
+  }
+)
+
 (define_insn "@aarch64_compare_and_swap<mode>_lse"
   [(set (match_operand:SI 0 "register_operand" "+r")		;; val out
     (zero_extend:SI
@@ -133,6 +157,28 @@
     return "casal<atomic_sfx>\t%<w>0, %<w>2, %1";
 })
 
+(define_insn "@aarch64_compare_and_swap<mode>_lse"
+  [(set (match_operand:JUST_TI 0 "register_operand" "+r")	;; val out
+    (match_operand:JUST_TI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
+   (set (match_dup 1)
+    (unspec_volatile:JUST_TI
+      [(match_dup 0)						;; expect
+       (match_operand:JUST_TI 2 "register_operand" "r")		;; desired
+       (match_operand:SI 3 "const_int_operand")]		;; mod_s
+      UNSPECV_ATOMIC_CMPSW))]
+  "TARGET_LSE"
+{
+  enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+  if (is_mm_relaxed (model))
+    return "casp\t%0, %R0, %2, %R2, %1";
+  else if (is_mm_acquire (model) || is_mm_consume (model))
+    return "caspa\t%0, %R0, %2, %R2, %1";
+  else if (is_mm_release (model))
+    return "caspl\t%0, %R0, %2, %R2, %1";
+  else
+    return "caspal\t%0, %R0, %2, %R2, %1";
+})
+
 (define_expand "atomic_exchange<mode>"
  [(match_operand:ALLI 0 "register_operand")
   (match_operand:ALLI 1 "aarch64_sync_memory_operand")
@@ -581,6 +627,24 @@
   }
 )
 
+(define_insn "aarch64_load_exclusive_pair"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(unspec_volatile:DI
+	  [(match_operand:TI 2 "aarch64_sync_memory_operand" "Q")
+	   (match_operand:SI 3 "const_int_operand")]
+	  UNSPECV_LX))
+   (set (match_operand:DI 1 "register_operand" "=r")
+	(unspec_volatile:DI [(match_dup 2) (match_dup 3)] UNSPECV_LX))]
+  ""
+  {
+    enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+    if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model))
+      return "ldxp\t%0, %1, %2";
+    else
+      return "ldaxp\t%0, %1, %2";
+  }
+)
+
 (define_insn "@aarch64_store_exclusive<mode>"
   [(set (match_operand:SI 0 "register_operand" "=&r")
     (unspec_volatile:SI [(const_int 0)] UNSPECV_SX))
@@ -599,6 +663,25 @@
   }
 )
 
+(define_insn "aarch64_store_exclusive_pair"
+  [(set (match_operand:SI 0 "register_operand" "=&r")
+	(unspec_volatile:SI [(const_int 0)] UNSPECV_SX))
+   (set (match_operand:TI 1 "aarch64_sync_memory_operand" "=Q")
+	(unspec_volatile:TI
+	  [(match_operand:DI 2 "aarch64_reg_or_zero" "rZ")
+	   (match_operand:DI 3 "aarch64_reg_or_zero" "rZ")
+	   (match_operand:SI 4 "const_int_operand")]
+	  UNSPECV_SX))]
+  ""
+  {
+    enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+    if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model))
+      return "stxp\t%w0, %x2, %x3, %1";
+    else
+      return "stlxp\t%w0, %x2, %x3, %1";
+  }
+)
+
 (define_expand "mem_thread_fence"
   [(match_operand:SI 0 "const_int_operand")]
   ""
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d23f0fcbc2f..03b3ce36302 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -29,6 +29,9 @@
 ;; Iterator for HI, SI, DI, some instructions can only work on these modes.
 (define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
 
+;; "Iterator" for just TI -- features like @pattern only work with iterators.
+(define_mode_iterator JUST_TI [TI])
+
 ;; Iterator for QI and HI modes
 (define_mode_iterator SHORT [QI HI])
 
-- 
2.17.1

* [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics
From: Richard Henderson @ 2019-09-18  1:58 UTC
  To: gcc-patches
  Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh

	* config/aarch64/aarch64.opt (-moutline-atomics): New.
	* config/aarch64/aarch64.c (aarch64_atomic_ool_func): New.
	(aarch64_ool_cas_names, aarch64_ool_swp_names): New.
	(aarch64_ool_ldadd_names, aarch64_ool_ldset_names): New.
	(aarch64_ool_ldclr_names, aarch64_ool_ldeor_names): New.
	(aarch64_expand_compare_and_swap): Honor TARGET_OUTLINE_ATOMICS.
	* config/aarch64/atomics.md (atomic_exchange<ALLI>): Likewise.
	(atomic_<atomic_op><ALLI>): Likewise.
	(atomic_fetch_<atomic_op><ALLI>): Likewise.
	(atomic_<atomic_op>_fetch<ALLI>): Likewise.
testsuite/
	* gcc.target/aarch64/atomic-op-acq_rel.c: Use -mno-outline-atomics.
	* gcc.target/aarch64/atomic-comp-swap-release-acquire.c: Likewise.
	* gcc.target/aarch64/atomic-op-acquire.c: Likewise.
	* gcc.target/aarch64/atomic-op-char.c: Likewise.
	* gcc.target/aarch64/atomic-op-consume.c: Likewise.
	* gcc.target/aarch64/atomic-op-imm.c: Likewise.
	* gcc.target/aarch64/atomic-op-int.c: Likewise.
	* gcc.target/aarch64/atomic-op-long.c: Likewise.
	* gcc.target/aarch64/atomic-op-relaxed.c: Likewise.
	* gcc.target/aarch64/atomic-op-release.c: Likewise.
	* gcc.target/aarch64/atomic-op-seq_cst.c: Likewise.
	* gcc.target/aarch64/atomic-op-short.c: Likewise.
	* gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c: Likewise.
	* gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: Likewise.
	* gcc.target/aarch64/sync-comp-swap.c: Likewise.
	* gcc.target/aarch64/sync-op-acquire.c: Likewise.
	* gcc.target/aarch64/sync-op-full.c: Likewise.
---
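A usage sketch (the exact codegen here is an assumption, though the
helper name follows the naming scheme defined in this series): when
compiling for base ARMv8.0 with -moutline-atomics,

  int
  fetch_add_seq_cst (int *p, int v)
  {
    /* Expected to become a direct call to __aarch64_ldadd4_acq_rel
       instead of an inline LL/SC loop; the helper selects LDADDAL or
       the exclusives at run time.  */
    return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
  }

Because the helpers are hidden within each DSO, the call is a plain BL
with no PLT indirection.
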
 gcc/config/aarch64/aarch64-protos.h           | 13 +++
 gcc/config/aarch64/aarch64.c                  | 87 +++++++++++++++++
 .../atomic-comp-swap-release-acquire.c        |  2 +-
 .../gcc.target/aarch64/atomic-op-acq_rel.c    |  2 +-
 .../gcc.target/aarch64/atomic-op-acquire.c    |  2 +-
 .../gcc.target/aarch64/atomic-op-char.c       |  2 +-
 .../gcc.target/aarch64/atomic-op-consume.c    |  2 +-
 .../gcc.target/aarch64/atomic-op-imm.c        |  2 +-
 .../gcc.target/aarch64/atomic-op-int.c        |  2 +-
 .../gcc.target/aarch64/atomic-op-long.c       |  2 +-
 .../gcc.target/aarch64/atomic-op-relaxed.c    |  2 +-
 .../gcc.target/aarch64/atomic-op-release.c    |  2 +-
 .../gcc.target/aarch64/atomic-op-seq_cst.c    |  2 +-
 .../gcc.target/aarch64/atomic-op-short.c      |  2 +-
 .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |  2 +-
 .../atomic_cmp_exchange_zero_strong_1.c       |  2 +-
 .../gcc.target/aarch64/sync-comp-swap.c       |  2 +-
 .../gcc.target/aarch64/sync-op-acquire.c      |  2 +-
 .../gcc.target/aarch64/sync-op-full.c         |  2 +-
 gcc/config/aarch64/aarch64.opt                |  3 +
 gcc/config/aarch64/atomics.md                 | 94 +++++++++++++++++--
 gcc/doc/invoke.texi                           | 16 +++-
 22 files changed, 221 insertions(+), 26 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index c4b73d26df6..1c1aac7201a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -696,4 +696,17 @@ poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
 bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
+struct atomic_ool_names
+{
+    const char *str[5][4];
+};
+
+rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+			    const atomic_ool_names *names);
+extern const atomic_ool_names aarch64_ool_swp_names;
+extern const atomic_ool_names aarch64_ool_ldadd_names;
+extern const atomic_ool_names aarch64_ool_ldset_names;
+extern const atomic_ool_names aarch64_ool_ldclr_names;
+extern const atomic_ool_names aarch64_ool_ldeor_names;
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b937514e6f8..56a4a47db73 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16867,6 +16867,82 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
+/* We store the names of the various atomic helpers in a 5x4 array.
+   Return the libcall function given MODE, MODEL and NAMES.  */
+
+rtx
+aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+			const atomic_ool_names *names)
+{
+  memmodel model = memmodel_base (INTVAL (model_rtx));
+  int mode_idx, model_idx;
+
+  switch (mode)
+    {
+    case E_QImode:
+      mode_idx = 0;
+      break;
+    case E_HImode:
+      mode_idx = 1;
+      break;
+    case E_SImode:
+      mode_idx = 2;
+      break;
+    case E_DImode:
+      mode_idx = 3;
+      break;
+    case E_TImode:
+      mode_idx = 4;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  switch (model)
+    {
+    case MEMMODEL_RELAXED:
+      model_idx = 0;
+      break;
+    case MEMMODEL_CONSUME:
+    case MEMMODEL_ACQUIRE:
+      model_idx = 1;
+      break;
+    case MEMMODEL_RELEASE:
+      model_idx = 2;
+      break;
+    case MEMMODEL_ACQ_REL:
+    case MEMMODEL_SEQ_CST:
+      model_idx = 3;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  return init_one_libfunc_visibility (names->str[mode_idx][model_idx],
+				      VISIBILITY_HIDDEN);
+}
+
+#define DEF0(B, N) \
+  { "__aarch64_" #B #N "_relax", \
+    "__aarch64_" #B #N "_acq", \
+    "__aarch64_" #B #N "_rel", \
+    "__aarch64_" #B #N "_acq_rel" }
+
+#define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
+		 { NULL, NULL, NULL, NULL }
+#define DEF5(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), DEF0(B, 16)
+
+static const atomic_ool_names aarch64_ool_cas_names = { { DEF5(cas) } };
+const atomic_ool_names aarch64_ool_swp_names = { { DEF4(swp) } };
+const atomic_ool_names aarch64_ool_ldadd_names = { { DEF4(ldadd) } };
+const atomic_ool_names aarch64_ool_ldset_names = { { DEF4(ldset) } };
+const atomic_ool_names aarch64_ool_ldclr_names = { { DEF4(ldclr) } };
+const atomic_ool_names aarch64_ool_ldeor_names = { { DEF4(ldeor) } };
+
+#undef DEF0
+#undef DEF4
+#undef DEF5
+
 /* Expand a compare and swap pattern.  */
 
 void
@@ -16913,6 +16989,17 @@ aarch64_expand_compare_and_swap (rtx operands[])
 						   newval, mod_s));
       cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
     }
+  else if (TARGET_OUTLINE_ATOMICS)
+    {
+      /* Oldval must satisfy compare afterward.  */
+      if (!aarch64_plus_operand (oldval, mode))
+	oldval = force_reg (mode, oldval);
+      rtx func = aarch64_atomic_ool_func (mode, mod_s, &aarch64_ool_cas_names);
+      rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
+				      oldval, mode, newval, mode,
+				      XEXP (mem, 0), Pmode);
+      cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+    }
   else
     {
       /* The oldval predicate varies by mode.  Test it and force to reg.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
index 49ca5d0d09c..a828a72aa75 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
 
 #include "atomic-comp-swap-release-acquire.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
index 74f26348e42..6823ce381b2 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-acq_rel.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
index 66c1b1efe20..87937de378a 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-acquire.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
index c09d0434ecf..60955e57da3 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-char.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
index 5783ab84f5c..16cb11aeeaf 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-consume.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
index 18b8f0b04e9..bcab4e481e3 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 int v = 0;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
index 8520f0839ba..040e4a8d168 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-int.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
index d011f8c5ce2..fc88b92cd3e 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 long v = 0;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
index ed96bfdb978..503d62b0280 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-relaxed.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
index fc4be17de89..efe14aea7e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-release.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
index 613000fe490..09973bf82ba 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-seq_cst.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
index e82c8118ece..e1dcebb0f89 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "atomic-op-short.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
index f2a21ddf2e1..29246979bfb 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=armv8-a+nolse" } */
+/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
 /* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
 
 int
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
index 8d2ae67dfbe..6daf9b08f5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=armv8-a+nolse" } */
+/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
 /* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
 
 int
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
index e571b2f13b3..f56415f3354 100644
--- a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
 
 #include "sync-comp-swap.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
index 357bf1be3b2..39b3144aa36 100644
--- a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "sync-op-acquire.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
index c6ba1629965..6b8b2043f40 100644
--- a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
 
 #include "sync-op-full.x"
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 55d466068b8..865b6a6d8ca 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -255,3 +255,6 @@ user-land code.
 TargetVariable
 long aarch64_stack_protector_guard_offset = 0
 
+moutline-atomics
+Target Report Mask(OUTLINE_ATOMICS) Save
+Generate local calls to out-of-line atomic operations.
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index f8bdd048b37..2e59b868420 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -186,16 +186,27 @@
   (match_operand:SI 3 "const_int_operand")]
   ""
   {
-    rtx (*gen) (rtx, rtx, rtx, rtx);
-
     /* Use an atomic SWP when available.  */
     if (TARGET_LSE)
-      gen = gen_aarch64_atomic_exchange<mode>_lse;
+      {
+	emit_insn (gen_aarch64_atomic_exchange<mode>_lse
+		   (operands[0], operands[1], operands[2], operands[3]));
+      }
+    else if (TARGET_OUTLINE_ATOMICS)
+      {
+	machine_mode mode = <MODE>mode;
+	rtx func = aarch64_atomic_ool_func (mode, operands[3],
+					    &aarch64_ool_swp_names);
+	rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL,
+					    mode, operands[2], mode,
+					    XEXP (operands[1], 0), Pmode);
+        emit_move_insn (operands[0], rval);
+      }
     else
-      gen = gen_aarch64_atomic_exchange<mode>;
-
-    emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
-
+      {
+	emit_insn (gen_aarch64_atomic_exchange<mode>
+		   (operands[0], operands[1], operands[2], operands[3]));
+      }
     DONE;
   }
 )
@@ -280,6 +291,39 @@
 	  }
 	operands[1] = force_reg (<MODE>mode, operands[1]);
       }
+    else if (TARGET_OUTLINE_ATOMICS)
+      {
+        const atomic_ool_names *names;
+	switch (<CODE>)
+	  {
+	  case MINUS:
+	    operands[1] = expand_simple_unop (<MODE>mode, NEG, operands[1],
+					      NULL, 1);
+	    /* fallthru */
+	  case PLUS:
+	    names = &aarch64_ool_ldadd_names;
+	    break;
+	  case IOR:
+	    names = &aarch64_ool_ldset_names;
+	    break;
+	  case XOR:
+	    names = &aarch64_ool_ldeor_names;
+	    break;
+	  case AND:
+	    operands[1] = expand_simple_unop (<MODE>mode, NOT, operands[1],
+					      NULL, 1);
+	    names = &aarch64_ool_ldclr_names;
+	    break;
+	  default:
+	    gcc_unreachable ();
+	  }
+        machine_mode mode = <MODE>mode;
+	rtx func = aarch64_atomic_ool_func (mode, operands[2], names);
+	emit_library_call_value (func, NULL_RTX, LCT_NORMAL, mode,
+				 operands[1], mode,
+				 XEXP (operands[0], 0), Pmode);
+        DONE;
+      }
     else
       gen = gen_aarch64_atomic_<atomic_optab><mode>;
 
@@ -405,6 +449,40 @@
 	}
       operands[2] = force_reg (<MODE>mode, operands[2]);
     }
+  else if (TARGET_OUTLINE_ATOMICS)
+    {
+      const atomic_ool_names *names;
+      switch (<CODE>)
+	{
+	case MINUS:
+	  operands[2] = expand_simple_unop (<MODE>mode, NEG, operands[2],
+					    NULL, 1);
+	  /* fallthru */
+	case PLUS:
+	  names = &aarch64_ool_ldadd_names;
+	  break;
+	case IOR:
+	  names = &aarch64_ool_ldset_names;
+	  break;
+	case XOR:
+	  names = &aarch64_ool_ldeor_names;
+	  break;
+	case AND:
+	  operands[2] = expand_simple_unop (<MODE>mode, NOT, operands[2],
+					    NULL, 1);
+	  names = &aarch64_ool_ldclr_names;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      machine_mode mode = <MODE>mode;
+      rtx func = aarch64_atomic_ool_func (mode, operands[3], names);
+      rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL, mode,
+					  operands[2], mode,
+					  XEXP (operands[1], 0), Pmode);
+      emit_move_insn (operands[0], rval);
+      DONE;
+    }
   else
     gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
 
@@ -494,7 +572,7 @@
 {
   /* Use an atomic load-operate instruction when possible.  In this case
      we will re-compute the result from the original mem value. */
-  if (TARGET_LSE)
+  if (TARGET_LSE || TARGET_OUTLINE_ATOMICS)
     {
       rtx tmp = gen_reg_rtx (<MODE>mode);
       operands[2] = force_reg (<MODE>mode, operands[2]);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0e3693598e7..900fda1efb2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -643,7 +643,8 @@ Objective-C and Objective-C++ Dialects}.
 -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
 -moverride=@var{string}  -mverbose-cost-dump @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
--mstack-protector-guard-offset=@var{offset} -mtrack-speculation }
+-mstack-protector-guard-offset=@var{offset} -mtrack-speculation @gol
+-moutline-atomics }
 
 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
@@ -15874,6 +15875,19 @@ be used by the compiler when expanding calls to
 @code{__builtin_speculation_safe_copy} to permit a more efficient code
 sequence to be generated.
 
+@item -moutline-atomics
+@itemx -mno-outline-atomics
+Enable or disable calls to out-of-line helpers to implement atomic operations.
+These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
+should be used; if not, they will use the load/store-exclusive instructions
+that are present in the base ARMv8.0 ISA.
+
+This option is only applicable when compiling for the base ARMv8.0
+instruction set.  If using a later revision, e.g. @option{-march=armv8.1-a}
+or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
+used directly.  The same applies when using @option{-mcpu=} with a cpu
+that supports the @samp{lse} feature.
+
 @item -march=@var{name}
 @opindex march
 Specify the name of the target architecture and, optionally, one or
-- 
2.17.1

^ permalink raw reply	[flat|nested] 12+ messages in thread
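
A note on the expander hunks above: there is no out-of-line helper for SUB or
AND, so before a name table is chosen the operand is negated for ldadd and
complemented for ldclr.  A minimal C sketch of the equivalence the expander
relies on; the extern declarations assume the patch's
__aarch64_<op><size>_<model> naming and its value-then-address argument
order, and are illustrative rather than a public API:

/* Illustrative sketch, not GCC output.  LDADD returns the old value;
   LDCLR computes *p &= ~val, so AND is an ldclr of the complement and
   SUB is an ldadd of the negation.  */
extern int __aarch64_ldadd4_acq_rel (int val, int *ptr);
extern int __aarch64_ldclr4_acq_rel (int val, int *ptr);

int
atomic_fetch_sub_int (int *p, int x)
{
  return __aarch64_ldadd4_acq_rel (-x, p);  /* old value of *p */
}

int
atomic_fetch_and_int (int *p, int x)
{
  return __aarch64_ldclr4_acq_rel (~x, p);  /* *p &= ~(~x), i.e. *p &= x */
}

In practice nothing calls these by hand: compiling with, e.g.,
gcc -march=armv8-a -moutline-atomics emits the calls, and the resulting
binary still runs on ARMv8.0 while using LSE wherever the runtime check
succeeds.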

* Re: [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics
  2019-09-18  1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
@ 2019-09-18 12:58   ` Kyrill Tkachov
  2019-12-23 16:05   ` Roman Zhuykov
  1 sibling, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2019-09-18 12:58 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh

On 9/18/19 2:58 AM, Richard Henderson wrote:
> This is the libgcc part of the interface -- providing the functions.
> Rationale is provided at the top of libgcc/config/aarch64/lse.S.
>
> 	* config/aarch64/lse-init.c: New file.
> 	* config/aarch64/lse.S: New file.
> 	* config/aarch64/t-lse: New file.
> 	* config.host: Add t-lse to all aarch64 tuples.
> ---
>   libgcc/config/aarch64/lse-init.c |  45 ++++++
>   libgcc/config.host               |   4 +
>   libgcc/config/aarch64/lse.S      | 235 +++++++++++++++++++++++++++++++
>   libgcc/config/aarch64/t-lse      |  44 ++++++
>   4 files changed, 328 insertions(+)
>   create mode 100644 libgcc/config/aarch64/lse-init.c
>   create mode 100644 libgcc/config/aarch64/lse.S
>   create mode 100644 libgcc/config/aarch64/t-lse
>
> diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
> new file mode 100644
> index 00000000000..51fb21d45c9
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse-init.c
> @@ -0,0 +1,45 @@
> +/* Out-of-line LSE atomics for AArch64 architecture, Init.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   Contributed by Linaro Ltd.
> +


This, and the other new files, will need an updated copyright date now.

Thanks,

Kyrill


> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* Define the symbol gating the LSE implementations.  */
> +_Bool __aarch64_have_lse_atomics
> +  __attribute__((visibility("hidden"), nocommon));
> +
> +/* Disable initialization of __aarch64_have_lse_atomics during bootstrap.  */
> +#ifndef inhibit_libc
> +# include <sys/auxv.h>
> +
> +/* Disable initialization if the system headers are too old.  */
> +# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
> +
> +static void __attribute__((constructor))
> +init_have_lse_atomics (void)
> +{
> +  unsigned long hwcap = getauxval (AT_HWCAP);
> +  __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
> +}
> +
> +# endif /* HWCAP */
> +#endif /* inhibit_libc */
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 728e543ea39..122113fc519 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
>   	extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
>   	extra_parts="$extra_parts crtfastmath.o"
>   	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>   	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
>   	md_unwind_header=aarch64/aarch64-unwind.h
>   	;;
>   aarch64*-*-freebsd*)
>   	extra_parts="$extra_parts crtfastmath.o"
>   	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>   	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
>   	md_unwind_header=aarch64/freebsd-unwind.h
>   	;;
> @@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
>   	;;
>   aarch64*-*-fuchsia*)
>   	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>   	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
>   	;;
>   aarch64*-*-linux*)
>   	extra_parts="$extra_parts crtfastmath.o"
>   	md_unwind_header=aarch64/linux-unwind.h
>   	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>   	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
>   	;;
>   alpha*-*-linux*)
> diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
> new file mode 100644
> index 00000000000..c24a39242ca
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse.S
> @@ -0,0 +1,235 @@
> +/* Out-of-line LSE atomics for AArch64 architecture.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   Contributed by Linaro Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/*
> + * The problem that we are trying to solve is operating system deployment
> + * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
> + *
> + * There are a number of potential solutions for this problem which have
> + * been proposed and rejected for various reasons.  To recap:
> + *
> + * (1) Multiple builds.  The dynamic linker will examine /lib64/atomics/
> + * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
> + * However, not all Linux distributions are happy with multiple builds,
> + * and anyway it has no effect on main applications.
> + *
> + * (2) IFUNC.  We could put these functions into libgcc_s.so, and have
> + * a single copy of each function for all DSOs.  However, ARM is concerned
> + * that the branch-to-indirect-branch that is implied by using a PLT,
> + * as required by IFUNC, is too much overhead for smaller cpus.
> + *
> + * (3) Statically predicted direct branches.  This is the approach that
> + * is taken here.  These functions are linked into every DSO that uses them.
> + * All of the symbols are hidden, so that the functions are called via a
> + * direct branch.  The choice of LSE vs non-LSE is done via one byte load
> + * followed by a well-predicted direct branch.  The functions are compiled
> + * separately to minimize code size.
> + */
> +
> +/* Tell the assembler to accept LSE instructions.  */
> +	.arch armv8-a+lse
> +
> +/* Declare the symbol gating the LSE implementations.  */
> +	.hidden	__aarch64_have_lse_atomics
> +
> +/* Turn size and memory model defines into mnemonic fragments.  */
> +#if SIZE == 1
> +# define S     b
> +# define UXT   uxtb
> +#elif SIZE == 2
> +# define S     h
> +# define UXT   uxth
> +#elif SIZE == 4 || SIZE == 8 || SIZE == 16
> +# define S
> +# define UXT   mov
> +#else
> +# error
> +#endif
> +
> +#if MODEL == 1
> +# define SUFF  _relax
> +# define A
> +# define L
> +#elif MODEL == 2
> +# define SUFF  _acq
> +# define A     a
> +# define L
> +#elif MODEL == 3
> +# define SUFF  _rel
> +# define A
> +# define L     l
> +#elif MODEL == 4
> +# define SUFF  _acq_rel
> +# define A     a
> +# define L     l
> +#else
> +# error
> +#endif
> +
> +/* Concatenate symbols.  */
> +#define glue2_(A, B)		A ## B
> +#define glue2(A, B)		glue2_(A, B)
> +#define glue3_(A, B, C)		A ## B ## C
> +#define glue3(A, B, C)		glue3_(A, B, C)
> +#define glue4_(A, B, C, D)	A ## B ## C ## D
> +#define glue4(A, B, C, D)	glue4_(A, B, C, D)
> +
> +/* Select the size of a register, given a regno.  */
> +#define x(N)			glue2(x, N)
> +#define w(N)			glue2(w, N)
> +#if SIZE < 8
> +# define s(N)			w(N)
> +#else
> +# define s(N)			x(N)
> +#endif
> +
> +#define NAME(BASE)		glue4(__aarch64_, BASE, SIZE, SUFF)
> +#define LDXR			glue4(ld, A, xr, S)
> +#define STXR			glue4(st, L, xr, S)
> +
> +/* Temporary registers used.  Other than these, only the return value
> +   register (x0) and the flags are modified.  */
> +#define tmp0	16
> +#define tmp1	17
> +#define tmp2	15
> +
> +/* Start and end a function.  */
> +.macro	STARTFN name
> +	.text
> +	.balign	16
> +	.globl	\name
> +	.hidden	\name
> +	.type	\name, %function
> +	.cfi_startproc
> +\name:
> +.endm
> +
> +.macro	ENDFN name
> +	.cfi_endproc
> +	.size	\name, . - \name
> +.endm
> +
> +/* Branch to LABEL if LSE is disabled.  */
> +.macro	JUMP_IF_NOT_LSE label
> +	adrp	x(tmp0), __aarch64_have_lse_atomics
> +	ldrb	w(tmp0), [x(tmp0), :lo12:__aarch64_have_lse_atomics]
> +	cbz	w(tmp0), \label
> +.endm
> +
> +#ifdef L_cas
> +
> +STARTFN	NAME(cas)
> +	JUMP_IF_NOT_LSE	8f
> +
> +#if SIZE < 16
> +#define CAS	glue4(cas, A, L, S)
> +
> +	CAS		s(0), s(1), [x2]
> +	ret
> +
> +8:	UXT		s(tmp0), s(0)
> +0:	LDXR		s(0), [x2]
> +	cmp		s(0), s(tmp0)
> +	bne		1f
> +	STXR		w(tmp1), s(1), [x2]
> +	cbnz		w(tmp1), 0b
> +1:	ret
> +
> +#else
> +#define LDXP	glue3(ld, A, xp)
> +#define STXP	glue3(st, L, xp)
> +#define CASP	glue3(casp, A, L)
> +
> +	CASP		x0, x1, x2, x3, [x4]
> +	ret
> +
> +8:	mov		x(tmp0), x0
> +	mov		x(tmp1), x1
> +0:	LDXP		x0, x1, [x4]
> +	cmp		x0, x(tmp0)
> +	ccmp		x1, x(tmp1), #0, eq
> +	bne		1f
> +	STXP		w(tmp2), x(tmp0), x(tmp1), [x4]
> +	cbnz		w(tmp2), 0b
> +1:	ret
> +
> +#endif
> +
> +ENDFN	NAME(cas)
> +#endif
> +
> +#ifdef L_swp
> +#define SWP	glue4(swp, A, L, S)
> +
> +STARTFN	NAME(swp)
> +	JUMP_IF_NOT_LSE	8f
> +
> +	SWP		s(0), s(0), [x1]
> +	ret
> +
> +8:	mov		s(tmp0), s(0)
> +0:	LDXR		s(0), [x1]
> +	STXR		w(tmp1), s(tmp0), [x1]
> +	cbnz		w(tmp1), 0b
> +	ret
> +
> +ENDFN	NAME(swp)
> +#endif
> +
> +#if defined(L_ldadd) || defined(L_ldclr) \
> +    || defined(L_ldeor) || defined(L_ldset)
> +
> +#ifdef L_ldadd
> +#define LDNM	ldadd
> +#define OP	add
> +#elif defined(L_ldclr)
> +#define LDNM	ldclr
> +#define OP	bic
> +#elif defined(L_ldeor)
> +#define LDNM	ldeor
> +#define OP	eor
> +#elif defined(L_ldset)
> +#define LDNM	ldset
> +#define OP	orr
> +#else
> +#error
> +#endif
> +#define LDOP	glue4(LDNM, A, L, S)
> +
> +STARTFN	NAME(LDNM)
> +	JUMP_IF_NOT_LSE	8f
> +
> +	LDOP		s(0), s(0), [x1]
> +	ret
> +
> +8:	mov		s(tmp0), s(0)
> +0:	LDXR		s(0), [x1]
> +	OP		s(tmp1), s(0), s(tmp0)
> +	STXR		w(tmp1), s(tmp1), [x1]
> +	cbnz		w(tmp1), 0b
> +	ret
> +
> +ENDFN	NAME(LDNM)
> +#endif
> diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
> new file mode 100644
> index 00000000000..c7f4223cd45
> --- /dev/null
> +++ b/libgcc/config/aarch64/t-lse
> @@ -0,0 +1,44 @@
> +# Out-of-line LSE atomics for AArch64 architecture.
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +# Contributed by Linaro Ltd.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify it
> +# under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Compare-and-swap has 5 sizes and 4 memory models.
> +S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
> +O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
> +
> +# Swap, Load-and-operate have 4 sizes and 4 memory models
> +S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
> +O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
> +
> +LSE_OBJS := $(O0) $(O1)
> +
> +libgcc-objects += $(LSE_OBJS) lse-init$(objext)
> +
> +empty      =
> +space      = $(empty) $(empty)
> +PAT_SPLIT  = $(subst _,$(space),$(*F))
> +PAT_BASE   = $(word 1,$(PAT_SPLIT))
> +PAT_N      = $(word 2,$(PAT_SPLIT))
> +PAT_M      = $(word 3,$(PAT_SPLIT))
> +
> +lse-init$(objext): $(srcdir)/config/aarch64/lse-init.c
> +	$(gcc_compile) -c $<
> +
> +$(LSE_OBJS): $(srcdir)/config/aarch64/lse.S
> +	$(gcc_compile) -DL_$(PAT_BASE) -DSIZE=$(PAT_N) -DMODEL=$(PAT_M) -c $<

^ permalink raw reply	[flat|nested] 12+ messages in thread
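
Every helper quoted above shares the same two-instruction gate: a load of the
hidden __aarch64_have_lse_atomics byte followed by a well-predicted branch to
the LL/SC fallback.  A rough C rendering of one helper's control flow, as a
sketch only (the flag is declared extern here to keep the example compilable;
the real symbol is hidden and set by the lse-init.c constructor, and the
swpal mnemonic needs an LSE-capable assembler, which lse.S arranges via
".arch armv8-a+lse"):

/* Sketch of the shape of __aarch64_swp4_acq_rel; the authoritative
   version is the hand-written lse.S above.  */
extern _Bool __aarch64_have_lse_atomics;

unsigned int
swp4_acq_rel_sketch (unsigned int v, unsigned int *p)
{
  if (__builtin_expect (__aarch64_have_lse_atomics, 1))
    /* Single LSE instruction: acquire-release swap, old value in v.  */
    __asm__ volatile ("swpal %w0, %w0, %1"
		      : "+r" (v), "+Q" (*p) : : "memory");
  else
    /* ARMv8.0 path: the ldaxr/stlxr retry loop.  */
    v = __atomic_exchange_n (p, v, __ATOMIC_ACQ_REL);
  return v;
}

The t-lse arithmetic follows directly from this scheme: cas covers
5 sizes x 4 models = 20 objects, and swp plus the four ld<op> functions
cover 5 ops x 4 sizes x 4 models = 80 more, i.e. 100 single-function
objects plus lse-init, each built with its own -DL_<op>, -DSIZE and
-DMODEL combination.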

* Re: [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics
  2019-09-18  1:58 ` [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics Richard Henderson
@ 2019-09-18 12:58   ` Kyrill Tkachov
  0 siblings, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2019-09-18 12:58 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh


On 9/18/19 2:58 AM, Richard Henderson wrote:
> 	* config/aarch64/aarch64.opt (-moutline-atomics): New.
> 	* config/aarch64/aarch64.c (aarch64_atomic_ool_func): New.
> 	(aarch64_ool_cas_names, aarch64_ool_swp_names): New.
> 	(aarch64_ool_ldadd_names, aarch64_ool_ldset_names): New.
> 	(aarch64_ool_ldclr_names, aarch64_ool_ldeor_names): New.
> 	(aarch64_expand_compare_and_swap): Honor TARGET_OUTLINE_ATOMICS.
> 	* config/aarch64/atomics.md (atomic_exchange<ALLI>): Likewise.
> 	(atomic_<atomic_op><ALLI>): Likewise.
> 	(atomic_fetch_<atomic_op><ALLI>): Likewise.
> 	(atomic_<atomic_op>_fetch<ALLI>): Likewise.
> testsuite/
> 	* gcc.target/aarch64/atomic-op-acq_rel.c: Use -mno-outline-atomics.
> 	* gcc.target/aarch64/atomic-comp-swap-release-acquire.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-acquire.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-char.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-consume.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-imm.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-int.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-long.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-relaxed.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-release.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-seq_cst.c: Likewise.
> 	* gcc.target/aarch64/atomic-op-short.c: Likewise.
> 	* gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c: Likewise.
> 	* gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: Likewise.
> 	* gcc.target/aarch64/sync-comp-swap.c: Likewise.
> 	* gcc.target/aarch64/sync-op-acquire.c: Likewise.
> 	* gcc.target/aarch64/sync-op-full.c: Likewise.
> ---
>   gcc/config/aarch64/aarch64-protos.h           | 13 +++
>   gcc/config/aarch64/aarch64.c                  | 87 +++++++++++++++++
>   .../atomic-comp-swap-release-acquire.c        |  2 +-
>   .../gcc.target/aarch64/atomic-op-acq_rel.c    |  2 +-
>   .../gcc.target/aarch64/atomic-op-acquire.c    |  2 +-
>   .../gcc.target/aarch64/atomic-op-char.c       |  2 +-
>   .../gcc.target/aarch64/atomic-op-consume.c    |  2 +-
>   .../gcc.target/aarch64/atomic-op-imm.c        |  2 +-
>   .../gcc.target/aarch64/atomic-op-int.c        |  2 +-
>   .../gcc.target/aarch64/atomic-op-long.c       |  2 +-
>   .../gcc.target/aarch64/atomic-op-relaxed.c    |  2 +-
>   .../gcc.target/aarch64/atomic-op-release.c    |  2 +-
>   .../gcc.target/aarch64/atomic-op-seq_cst.c    |  2 +-
>   .../gcc.target/aarch64/atomic-op-short.c      |  2 +-
>   .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |  2 +-
>   .../atomic_cmp_exchange_zero_strong_1.c       |  2 +-
>   .../gcc.target/aarch64/sync-comp-swap.c       |  2 +-
>   .../gcc.target/aarch64/sync-op-acquire.c      |  2 +-
>   .../gcc.target/aarch64/sync-op-full.c         |  2 +-
>   gcc/config/aarch64/aarch64.opt                |  3 +
>   gcc/config/aarch64/atomics.md                 | 94 +++++++++++++++++--
>   gcc/doc/invoke.texi                           | 16 +++-
>   22 files changed, 221 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index c4b73d26df6..1c1aac7201a 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -696,4 +696,17 @@ poly_uint64 aarch64_regmode_natural_size (machine_mode);
>   
>   bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
>   
> +struct atomic_ool_names
> +{
> +    const char *str[5][4];
> +};
> +
> +rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
> +			    const atomic_ool_names *names);
> +extern const atomic_ool_names aarch64_ool_swp_names;
> +extern const atomic_ool_names aarch64_ool_ldadd_names;
> +extern const atomic_ool_names aarch64_ool_ldset_names;
> +extern const atomic_ool_names aarch64_ool_ldclr_names;
> +extern const atomic_ool_names aarch64_ool_ldeor_names;
> +
>   #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b937514e6f8..56a4a47db73 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -16867,6 +16867,82 @@ aarch64_emit_unlikely_jump (rtx insn)
>     add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
>   }
>   
> +/* We store the names of the various atomic helpers in a 5x4 array.
> +   Return the libcall function given MODE, MODEL and NAMES.  */
> +
> +rtx
> +aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
> +			const atomic_ool_names *names)
> +{
> +  memmodel model = memmodel_base (INTVAL (model_rtx));
> +  int mode_idx, model_idx;
> +
> +  switch (mode)
> +    {
> +    case E_QImode:
> +      mode_idx = 0;
> +      break;
> +    case E_HImode:
> +      mode_idx = 1;
> +      break;
> +    case E_SImode:
> +      mode_idx = 2;
> +      break;
> +    case E_DImode:
> +      mode_idx = 3;
> +      break;
> +    case E_TImode:
> +      mode_idx = 4;
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  switch (model)
> +    {
> +    case MEMMODEL_RELAXED:
> +      model_idx = 0;
> +      break;
> +    case MEMMODEL_CONSUME:
> +    case MEMMODEL_ACQUIRE:
> +      model_idx = 1;
> +      break;
> +    case MEMMODEL_RELEASE:
> +      model_idx = 2;
> +      break;
> +    case MEMMODEL_ACQ_REL:
> +    case MEMMODEL_SEQ_CST:
> +      model_idx = 3;
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  return init_one_libfunc_visibility (names->str[mode_idx][model_idx],
> +				      VISIBILITY_HIDDEN);
> +}
> +
> +#define DEF0(B, N) \
> +  { "__aarch64_" #B #N "_relax", \
> +    "__aarch64_" #B #N "_acq", \
> +    "__aarch64_" #B #N "_rel", \
> +    "__aarch64_" #B #N "_acq_rel" }
> +
> +#define DEF4(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
> +		 { NULL, NULL, NULL, NULL }
> +#define DEF5(B)  DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), DEF0(B, 16)
> +
> +static const atomic_ool_names aarch64_ool_cas_names = { { DEF5(cas) } };
> +const atomic_ool_names aarch64_ool_swp_names = { { DEF4(swp) } };
> +const atomic_ool_names aarch64_ool_ldadd_names = { { DEF4(ldadd) } };
> +const atomic_ool_names aarch64_ool_ldset_names = { { DEF4(ldset) } };
> +const atomic_ool_names aarch64_ool_ldclr_names = { { DEF4(ldclr) } };
> +const atomic_ool_names aarch64_ool_ldeor_names = { { DEF4(ldeor) } };
> +
> +#undef DEF0
> +#undef DEF4
> +#undef DEF5
> +
>   /* Expand a compare and swap pattern.  */
>   
>   void
> @@ -16913,6 +16989,17 @@ aarch64_expand_compare_and_swap (rtx operands[])
>   						   newval, mod_s));
>         cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
>       }
> +  else if (TARGET_OUTLINE_ATOMICS)
> +    {
> +      /* Oldval must satisfy compare afterward.  */
> +      if (!aarch64_plus_operand (oldval, mode))
> +	oldval = force_reg (mode, oldval);
> +      rtx func = aarch64_atomic_ool_func (mode, mod_s, &aarch64_ool_cas_names);
> +      rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
> +				      oldval, mode, newval, mode,
> +				      XEXP (mem, 0), Pmode);
> +      cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
> +    }
>     else
>       {
>         /* The oldval predicate varies by mode.  Test it and force to reg.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
> index 49ca5d0d09c..a828a72aa75 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
>   
>   #include "atomic-comp-swap-release-acquire.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
> index 74f26348e42..6823ce381b2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-acq_rel.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
> index 66c1b1efe20..87937de378a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-acquire.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
> index c09d0434ecf..60955e57da3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-char.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
> index 5783ab84f5c..16cb11aeeaf 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-consume.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
> index 18b8f0b04e9..bcab4e481e3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   int v = 0;
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
> index 8520f0839ba..040e4a8d168 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-int.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
> index d011f8c5ce2..fc88b92cd3e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   long v = 0;
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
> index ed96bfdb978..503d62b0280 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-relaxed.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
> index fc4be17de89..efe14aea7e4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-release.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
> index 613000fe490..09973bf82ba 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-seq_cst.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
> index e82c8118ece..e1dcebb0f89 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "atomic-op-short.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
> index f2a21ddf2e1..29246979bfb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -march=armv8-a+nolse" } */
> +/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
>   /* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
>   
>   int
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> index 8d2ae67dfbe..6daf9b08f5a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -march=armv8-a+nolse" } */
> +/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
>   /* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
>   
>   int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
> index e571b2f13b3..f56415f3354 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
>   
>   #include "sync-comp-swap.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
> index 357bf1be3b2..39b3144aa36 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "sync-op-acquire.x"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
> index c6ba1629965..6b8b2043f40 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>   
>   #include "sync-op-full.x"
>   
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 55d466068b8..865b6a6d8ca 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -255,3 +255,6 @@ user-land code.
>   TargetVariable
>   long aarch64_stack_protector_guard_offset = 0
>   
> +moutline-atomics
> +Target Report Mask(OUTLINE_ATOMICS) Save
> +Generate local calls to out-of-line atomic operations.
> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index f8bdd048b37..2e59b868420 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -186,16 +186,27 @@
>     (match_operand:SI 3 "const_int_operand")]
>     ""
>     {
> -    rtx (*gen) (rtx, rtx, rtx, rtx);
> -
>       /* Use an atomic SWP when available.  */
>       if (TARGET_LSE)
> -      gen = gen_aarch64_atomic_exchange<mode>_lse;
> +      {
> +	emit_insn (gen_aarch64_atomic_exchange<mode>_lse
> +		   (operands[0], operands[1], operands[2], operands[3]));
> +      }
> +    else if (TARGET_OUTLINE_ATOMICS)
> +      {
> +	machine_mode mode = <MODE>mode;
> +	rtx func = aarch64_atomic_ool_func (mode, operands[3],
> +					    &aarch64_ool_swp_names);
> +	rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL,
> +					    mode, operands[2], mode,
> +					    XEXP (operands[1], 0), Pmode);
> +        emit_move_insn (operands[0], rval);
> +      }
>       else
> -      gen = gen_aarch64_atomic_exchange<mode>;
> -
> -    emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
> -
> +      {
> +	emit_insn (gen_aarch64_atomic_exchange<mode>
> +		   (operands[0], operands[1], operands[2], operands[3]));
> +      }
>       DONE;
>     }
>   )
> @@ -280,6 +291,39 @@
>   	  }
>   	operands[1] = force_reg (<MODE>mode, operands[1]);
>         }
> +    else if (TARGET_OUTLINE_ATOMICS)
> +      {
> +        const atomic_ool_names *names;
> +	switch (<CODE>)
> +	  {
> +	  case MINUS:
> +	    operands[1] = expand_simple_unop (<MODE>mode, NEG, operands[1],
> +					      NULL, 1);
> +	    /* fallthru */
> +	  case PLUS:
> +	    names = &aarch64_ool_ldadd_names;
> +	    break;
> +	  case IOR:
> +	    names = &aarch64_ool_ldset_names;
> +	    break;
> +	  case XOR:
> +	    names = &aarch64_ool_ldeor_names;
> +	    break;
> +	  case AND:
> +	    operands[1] = expand_simple_unop (<MODE>mode, NOT, operands[1],
> +					      NULL, 1);
> +	    names = &aarch64_ool_ldclr_names;
> +	    break;
> +	  default:
> +	    gcc_unreachable ();
> +	  }
> +        machine_mode mode = <MODE>mode;
> +	rtx func = aarch64_atomic_ool_func (mode, operands[2], names);
> +	emit_library_call_value (func, NULL_RTX, LCT_NORMAL, mode,
> +				 operands[1], mode,
> +				 XEXP (operands[0], 0), Pmode);
> +        DONE;
> +      }
>       else
>         gen = gen_aarch64_atomic_<atomic_optab><mode>;
>   
> @@ -405,6 +449,40 @@
>   	}
>         operands[2] = force_reg (<MODE>mode, operands[2]);
>       }
> +  else if (TARGET_OUTLINE_ATOMICS)
> +    {
> +      const atomic_ool_names *names;
> +      switch (<CODE>)
> +	{
> +	case MINUS:
> +	  operands[2] = expand_simple_unop (<MODE>mode, NEG, operands[2],
> +					    NULL, 1);
> +	  /* fallthru */
> +	case PLUS:
> +	  names = &aarch64_ool_ldadd_names;
> +	  break;
> +	case IOR:
> +	  names = &aarch64_ool_ldset_names;
> +	  break;
> +	case XOR:
> +	  names = &aarch64_ool_ldeor_names;
> +	  break;
> +	case AND:
> +	  operands[2] = expand_simple_unop (<MODE>mode, NOT, operands[2],
> +					    NULL, 1);
> +	  names = &aarch64_ool_ldclr_names;
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      machine_mode mode = <MODE>mode;
> +      rtx func = aarch64_atomic_ool_func (mode, operands[3], names);
> +      rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL, mode,
> +					  operands[2], mode,
> +					  XEXP (operands[1], 0), Pmode);
> +      emit_move_insn (operands[0], rval);
> +      DONE;
> +    }
>     else
>       gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
>   
> @@ -494,7 +572,7 @@
>   {
>     /* Use an atomic load-operate instruction when possible.  In this case
>        we will re-compute the result from the original mem value. */
> -  if (TARGET_LSE)
> +  if (TARGET_LSE || TARGET_OUTLINE_ATOMICS)
>       {
>         rtx tmp = gen_reg_rtx (<MODE>mode);
>         operands[2] = force_reg (<MODE>mode, operands[2]);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0e3693598e7..900fda1efb2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -643,7 +643,8 @@ Objective-C and Objective-C++ Dialects}.
>   -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
>   -moverride=@var{string}  -mverbose-cost-dump @gol
>   -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
> --mstack-protector-guard-offset=@var{offset} -mtrack-speculation }
> +-mstack-protector-guard-offset=@var{offset} -mtrack-speculation @gol
> +-moutline-atomics }
>   
>   @emph{Adapteva Epiphany Options}
>   @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
> @@ -15874,6 +15875,19 @@ be used by the compiler when expanding calls to
>   @code{__builtin_speculation_safe_copy} to permit a more efficient code
>   sequence to be generated.
>   
> +@item -moutline-atomics
> +@itemx -mno-outline-atomics
> +Enable or disable calls to out-of-line helpers to implement atomic operations.
> +These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
> +should be used; if not, they will use the load/store-exclusive instructions
> +that are present in the base ARMv8.0 ISA.

Let's call them "LSE instructions from Armv8.1-A", rather than 
ARMv8.1-Atomics.


> +
> +This option is only applicable when compiling for the base ARMv8.0
> +instruction set.  If using a later revision, e.g. @option{-march=armv8.1-a}
> +or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
> +used directly.  The same applies when using @option{-mcpu=} if the
> +selected cpu supports the @samp{lse} feature.
> +

This needs a corresponding ChangeLog entry.

Thanks,

Kyrill

>   @item -march=@var{name}
>   @opindex march
>   Specify the name of the target architecture and, optionally, one or

^ permalink raw reply	[flat|nested] 12+ messages in thread
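
For readers tracing the name selection in aarch64_atomic_ool_func quoted
above: the lookup is a plain 5x4 table walk, with the machine mode picking
the row (QI/HI/SI/DI/TImode map to 0..4) and the base memory model the
column (relaxed is 0, consume/acquire 1, release 2, acq_rel/seq_cst 3).
A standalone sketch that reproduces the resulting names, with plain C
substituted for GCC's rtx and memmodel machinery:

#include <stdio.h>

/* The suffixes and sizes mirror the patch's DEF0 macro; only the
   lookup style is simplified.  */
static const char *const model_suffix[4] =
  { "_relax", "_acq", "_rel", "_acq_rel" };
static const int mode_size[5] = { 1, 2, 4, 8, 16 };

static void
ool_name (const char *base, int mode_idx, int model_idx)
{
  printf ("__aarch64_%s%d%s\n",
	  base, mode_size[mode_idx], model_suffix[model_idx]);
}

int
main (void)
{
  ool_name ("cas", 2, 3);    /* SImode, seq_cst -> __aarch64_cas4_acq_rel */
  ool_name ("ldadd", 3, 0);  /* DImode, relaxed -> __aarch64_ldadd8_relax */
  ool_name ("swp", 1, 1);    /* HImode, acquire -> __aarch64_swp2_acq */
  return 0;
}

Only cas has the 16-byte row (DEF5); swp and the ld<op> tables stop at
8 bytes (DEF4), which is why TImode only ever reaches the out-of-line
path through compare-and-swap.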

* Re: [PATCH, AArch64 v4 0/6] LSE atomics out-of-line
  2019-09-18  1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
                   ` (5 preceding siblings ...)
  2019-09-18  1:58 ` [PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default Richard Henderson
@ 2019-09-18 12:58 ` Kyrill Tkachov
  2019-09-19 14:39   ` Richard Henderson
  6 siblings, 1 reply; 12+ messages in thread
From: Kyrill Tkachov @ 2019-09-18 12:58 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh

Hi Richard,

On 9/18/19 2:58 AM, Richard Henderson wrote:
> Version 3 was back in November:
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html
>
> Changes since v3:
>    * Do not swap_commutative_operands_p in aarch64_gen_compare_reg.
>      This is the probable cause of the bootstrap problem that Kyrill reported.
>    * Add unwind markers to the out-of-line functions.
>    * Use uxt{8,16} instead of mov in CAS functions,
>      in preference to including the uxt with the cmp.
>    * Prefer the lse case in the out-of-line fallthru (Wilco).
>    * Name the option -moutline-atomics (Wilco)
>    * Name the variable __aarch64_have_lse_atomics (Wilco);
>      fix the definition in lse-init.c.
>    * Rename the functions s/__aa64/__aarch64/ (Seemed sensible to match prev)
>    * Always use Pmode for the address for libcalls, fixing ilp32 (Kyrill).
>
> Still not done is a custom calling convention during code generation,
> but that can come later as an optimization.
>
> Tested aarch64-linux on a thunder x1.
> I have not run tests on any platform supporting LSE, even qemu.
>
Thanks for this.

I've bootstrapped and tested this patch series on systems with and 
without LSE support, both with and without patch [6/6], so 4 setups in 
total.

It all looks clean for me.

I'm in favour of this series going in (modulo patch 6/6, leaving the
option to turn it on to the user).

I've got a couple of small comments on some of the patches that IMO can 
be fixed when committing.

I'll respond to them individually.

Thanks,

Kyrill

> r~
>
>
> Richard Henderson (6):
>    aarch64: Extend %R for integer registers
>    aarch64: Implement TImode compare-and-swap
>    aarch64: Tidy aarch64_split_compare_and_swap
>    aarch64: Add out-of-line functions for LSE atomics
>    aarch64: Implement -moutline-atomics
>    TESTING: Enable -moutline-atomics by default
>
>   gcc/config/aarch64/aarch64-protos.h           |  13 +
>   gcc/common/config/aarch64/aarch64-common.c    |   6 +-
>   gcc/config/aarch64/aarch64.c                  | 204 +++++++++++----
>   .../atomic-comp-swap-release-acquire.c        |   2 +-
>   .../gcc.target/aarch64/atomic-op-acq_rel.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-acquire.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-char.c       |   2 +-
>   .../gcc.target/aarch64/atomic-op-consume.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-imm.c        |   2 +-
>   .../gcc.target/aarch64/atomic-op-int.c        |   2 +-
>   .../gcc.target/aarch64/atomic-op-long.c       |   2 +-
>   .../gcc.target/aarch64/atomic-op-relaxed.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-release.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-seq_cst.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-short.c      |   2 +-
>   .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
>   .../atomic_cmp_exchange_zero_strong_1.c       |   2 +-
>   .../gcc.target/aarch64/sync-comp-swap.c       |   2 +-
>   .../gcc.target/aarch64/sync-op-acquire.c      |   2 +-
>   .../gcc.target/aarch64/sync-op-full.c         |   2 +-
>   libgcc/config/aarch64/lse-init.c              |  45 ++++
>   gcc/config/aarch64/aarch64.opt                |   3 +
>   gcc/config/aarch64/atomics.md                 | 187 +++++++++++++-
>   gcc/config/aarch64/iterators.md               |   3 +
>   gcc/doc/invoke.texi                           |  16 +-
>   libgcc/config.host                            |   4 +
>   libgcc/config/aarch64/lse.S                   | 235 ++++++++++++++++++
>   libgcc/config/aarch64/t-lse                   |  44 ++++
>   28 files changed, 709 insertions(+), 85 deletions(-)
>   create mode 100644 libgcc/config/aarch64/lse-init.c
>   create mode 100644 libgcc/config/aarch64/lse.S
>   create mode 100644 libgcc/config/aarch64/t-lse
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, AArch64 v4 0/6] LSE atomics out-of-line
  2019-09-18 12:58 ` [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Kyrill Tkachov
@ 2019-09-19 14:39   ` Richard Henderson
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-19 14:39 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Henderson, gcc-patches
  Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh

On 9/18/19 5:58 AM, Kyrill Tkachov wrote:
> Thanks for this.
> 
> I've bootstrapped and tested this patch series on systems with and without LSE
> support, both with and without patch [6/6], so 4 setups in total.
> 
> It all looks clean for me.
> 
> I'm in favour of this series going in (modulo patch 6/6, leaving the option to
> turn it on to the user).
> 
> I've got a couple of small comments on some of the patches that IMO can be
> fixed when committing.

Thanks.  Committed with the requested modifications.


r~

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics
  2019-09-18  1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
  2019-09-18 12:58   ` Kyrill Tkachov
@ 2019-12-23 16:05   ` Roman Zhuykov
  1 sibling, 0 replies; 12+ messages in thread
From: Roman Zhuykov @ 2019-12-23 16:05 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft,
	James.Greenhalgh

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93053

--
Roman

Richard Henderson wrote 18.09.2019 04:58:
> This is the libgcc part of the interface -- providing the functions.
> Rationale is provided at the top of libgcc/config/aarch64/lse.S.
> 
> 	* config/aarch64/lse-init.c: New file.
> 	* config/aarch64/lse.S: New file.
> 	* config/aarch64/t-lse: New file.
> 	* config.host: Add t-lse to all aarch64 tuples.
> ---
>  libgcc/config/aarch64/lse-init.c |  45 ++++++
>  libgcc/config.host               |   4 +
>  libgcc/config/aarch64/lse.S      | 235 +++++++++++++++++++++++++++++++
>  libgcc/config/aarch64/t-lse      |  44 ++++++
>  4 files changed, 328 insertions(+)
>  create mode 100644 libgcc/config/aarch64/lse-init.c
>  create mode 100644 libgcc/config/aarch64/lse.S
>  create mode 100644 libgcc/config/aarch64/t-lse
> 
> [...]
^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-12-23 15:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-18  1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
2019-09-18  1:58 ` [PATCH, AArch64 v4 3/6] aarch64: Tidy aarch64_split_compare_and_swap Richard Henderson
2019-09-18  1:58 ` [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics Richard Henderson
2019-09-18 12:58   ` Kyrill Tkachov
2019-09-18  1:58 ` [PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers Richard Henderson
2019-09-18  1:58 ` [PATCH, AArch64 v4 2/6] aarch64: Implement TImode compare-and-swap Richard Henderson
2019-09-18  1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
2019-09-18 12:58   ` Kyrill Tkachov
2019-12-23 16:05   ` Roman Zhuykov
2019-09-18  1:58 ` [PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default Richard Henderson
2019-09-18 12:58 ` [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Kyrill Tkachov
2019-09-19 14:39   ` Richard Henderson
