* [PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
@ 2019-09-18 1:58 ` Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics Richard Henderson
` (5 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-18 1:58 UTC (permalink / raw)
To: gcc-patches
Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh
* config/aarch64/aarch64.c (aarch64_print_operand): Allow integer
registers with %R.
---
gcc/config/aarch64/aarch64.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 232317d4a5a..99d51e2aef9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8420,7 +8420,7 @@ sizetochar (int size)
'S/T/U/V': Print a FP/SIMD register name for a register list.
The register printed is the FP/SIMD register name
of X + 0/1/2/3 for S/T/U/V.
- 'R': Print a scalar FP/SIMD register name + 1.
+ 'R': Print a scalar Integer/FP/SIMD register name + 1.
'X': Print bottom 16 bits of integer constant in hex.
'w/x': Print a general register name or the zero register
(32-bit or 64-bit).
@@ -8623,12 +8623,13 @@ aarch64_print_operand (FILE *f, rtx x, int code)
break;
case 'R':
- if (!REG_P (x) || !FP_REGNUM_P (REGNO (x)))
- {
- output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code);
- return;
- }
- asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+ if (REG_P (x) && FP_REGNUM_P (REGNO (x)))
+ asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+ else if (REG_P (x) && GP_REGNUM_P (REGNO (x)))
+ asm_fprintf (f, "x%d", REGNO (x) - R0_REGNUM + 1);
+ else
+ output_operand_lossage ("incompatible register operand for '%%%c'",
+ code);
break;
case 'X':
--
2.17.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH, AArch64 v4 0/6] LSE atomics out-of-line
@ 2019-09-18 1:58 Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers Richard Henderson
` (6 more replies)
0 siblings, 7 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-18 1:58 UTC (permalink / raw)
To: gcc-patches
Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh
Version 3 was back in November:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html
Changes since v3:
* Do not swap_commutative_operands_p in aarch64_gen_compare_reg.
This is the probable cause of the bootstrap problem that Kyrill reported.
* Add unwind markers to the out-of-line functions.
* Use uxt{8,16} instead of mov in CAS functions,
in preference to including the uxt with the cmp.
* Prefer the lse case in the out-of-line fallthru (Wilco).
* Name the option -moutline-atomics (Wilco)
* Name the variable __aarch64_have_lse_atomics (Wilco);
fix the definition in lse-init.c.
* Rename the functions s/__aa64/__aarch64/ (seemed sensible to match the previous naming).
* Always use Pmode for the address for libcalls, fixing ilp32 (Kyrill).
Still not done is a custom calling convention during code generation,
but that can come later as an optimization.
Tested aarch64-linux on a ThunderX1.
I have not run tests on any platform supporting LSE, even qemu.
r~
Richard Henderson (6):
aarch64: Extend %R for integer registers
aarch64: Implement TImode compare-and-swap
aarch64: Tidy aarch64_split_compare_and_swap
aarch64: Add out-of-line functions for LSE atomics
aarch64: Implement -moutline-atomics
TESTING: Enable -moutline-atomics by default
gcc/config/aarch64/aarch64-protos.h | 13 +
gcc/common/config/aarch64/aarch64-common.c | 6 +-
gcc/config/aarch64/aarch64.c | 204 +++++++++++----
.../atomic-comp-swap-release-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
.../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-op-char.c | 2 +-
.../gcc.target/aarch64/atomic-op-consume.c | 2 +-
.../gcc.target/aarch64/atomic-op-imm.c | 2 +-
.../gcc.target/aarch64/atomic-op-int.c | 2 +-
.../gcc.target/aarch64/atomic-op-long.c | 2 +-
.../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
.../gcc.target/aarch64/atomic-op-release.c | 2 +-
.../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
.../gcc.target/aarch64/atomic-op-short.c | 2 +-
.../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
.../atomic_cmp_exchange_zero_strong_1.c | 2 +-
.../gcc.target/aarch64/sync-comp-swap.c | 2 +-
.../gcc.target/aarch64/sync-op-acquire.c | 2 +-
.../gcc.target/aarch64/sync-op-full.c | 2 +-
libgcc/config/aarch64/lse-init.c | 45 ++++
gcc/config/aarch64/aarch64.opt | 3 +
gcc/config/aarch64/atomics.md | 187 +++++++++++++-
gcc/config/aarch64/iterators.md | 3 +
gcc/doc/invoke.texi | 16 +-
libgcc/config.host | 4 +
libgcc/config/aarch64/lse.S | 235 ++++++++++++++++++
libgcc/config/aarch64/t-lse | 44 ++++
28 files changed, 709 insertions(+), 85 deletions(-)
create mode 100644 libgcc/config/aarch64/lse-init.c
create mode 100644 libgcc/config/aarch64/lse.S
create mode 100644 libgcc/config/aarch64/t-lse
--
2.17.1
* [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
` (3 preceding siblings ...)
2019-09-18 1:58 ` [PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default Richard Henderson
@ 2019-09-18 1:58 ` Richard Henderson
2019-09-18 12:58 ` Kyrill Tkachov
2019-12-23 16:05 ` Roman Zhuykov
2019-09-18 1:58 ` [PATCH, AArch64 v4 2/6] aarch64: Implement TImode compare-and-swap Richard Henderson
2019-09-18 12:58 ` [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Kyrill Tkachov
6 siblings, 2 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-18 1:58 UTC (permalink / raw)
To: gcc-patches
Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh
This is the libgcc part of the interface -- providing the functions.
Rationale is provided at the top of libgcc/config/aarch64/lse.S.
* config/aarch64/lse-init.c: New file.
* config/aarch64/lse.S: New file.
* config/aarch64/t-lse: New file.
* config.host: Add t-lse to all aarch64 tuples.
---
libgcc/config/aarch64/lse-init.c | 45 ++++++
libgcc/config.host | 4 +
libgcc/config/aarch64/lse.S | 235 +++++++++++++++++++++++++++++++
libgcc/config/aarch64/t-lse | 44 ++++++
4 files changed, 328 insertions(+)
create mode 100644 libgcc/config/aarch64/lse-init.c
create mode 100644 libgcc/config/aarch64/lse.S
create mode 100644 libgcc/config/aarch64/t-lse
diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
new file mode 100644
index 00000000000..51fb21d45c9
--- /dev/null
+++ b/libgcc/config/aarch64/lse-init.c
@@ -0,0 +1,45 @@
+/* Out-of-line LSE atomics for AArch64 architecture, Init.
+ Copyright (C) 2018 Free Software Foundation, Inc.
+ Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+/* Define the symbol gating the LSE implementations. */
+_Bool __aarch64_have_lse_atomics
+ __attribute__((visibility("hidden"), nocommon));
+
+/* Disable initialization of __aarch64_have_lse_atomics during bootstrap. */
+#ifndef inhibit_libc
+# include <sys/auxv.h>
+
+/* Disable initialization if the system headers are too old. */
+# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
+
+static void __attribute__((constructor))
+init_have_lse_atomics (void)
+{
+ unsigned long hwcap = getauxval (AT_HWCAP);
+ __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
+}
+
+# endif /* HWCAP */
+#endif /* inhibit_libc */
diff --git a/libgcc/config.host b/libgcc/config.host
index 728e543ea39..122113fc519 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+ tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/aarch64-unwind.h
;;
aarch64*-*-freebsd*)
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+ tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/freebsd-unwind.h
;;
@@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
;;
aarch64*-*-fuchsia*)
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+ tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
;;
aarch64*-*-linux*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/linux-unwind.h
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+ tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
;;
alpha*-*-linux*)
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
new file mode 100644
index 00000000000..c24a39242ca
--- /dev/null
+++ b/libgcc/config/aarch64/lse.S
@@ -0,0 +1,235 @@
+/* Out-of-line LSE atomics for AArch64 architecture.
+ Copyright (C) 2018 Free Software Foundation, Inc.
+ Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+/*
+ * The problem that we are trying to solve is operating system deployment
+ * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
+ *
+ * There are a number of potential solutions for this problem which have
+ * been proposed and rejected for various reasons. To recap:
+ *
+ * (1) Multiple builds. The dynamic linker will examine /lib64/atomics/
+ * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
+ * However, not all Linux distributions are happy with multiple builds,
+ * and anyway it has no effect on main applications.
+ *
+ * (2) IFUNC. We could put these functions into libgcc_s.so, and have
+ * a single copy of each function for all DSOs. However, ARM is concerned
+ * that the branch-to-indirect-branch that is implied by using a PLT,
+ * as required by IFUNC, is too much overhead for smaller cpus.
+ *
+ * (3) Statically predicted direct branches. This is the approach that
+ * is taken here. These functions are linked into every DSO that uses them.
+ * All of the symbols are hidden, so that the functions are called via a
+ * direct branch. The choice of LSE vs non-LSE is done via one byte load
+ * followed by a well-predicted direct branch. The functions are compiled
+ * separately to minimize code size.
+ */
+
+/* Tell the assembler to accept LSE instructions. */
+ .arch armv8-a+lse
+
+/* Declare the symbol gating the LSE implementations. */
+ .hidden __aarch64_have_lse_atomics
+
+/* Turn size and memory model defines into mnemonic fragments. */
+#if SIZE == 1
+# define S b
+# define UXT uxtb
+#elif SIZE == 2
+# define S h
+# define UXT uxth
+#elif SIZE == 4 || SIZE == 8 || SIZE == 16
+# define S
+# define UXT mov
+#else
+# error
+#endif
+
+#if MODEL == 1
+# define SUFF _relax
+# define A
+# define L
+#elif MODEL == 2
+# define SUFF _acq
+# define A a
+# define L
+#elif MODEL == 3
+# define SUFF _rel
+# define A
+# define L l
+#elif MODEL == 4
+# define SUFF _acq_rel
+# define A a
+# define L l
+#else
+# error
+#endif
+
+/* Concatenate symbols. */
+#define glue2_(A, B) A ## B
+#define glue2(A, B) glue2_(A, B)
+#define glue3_(A, B, C) A ## B ## C
+#define glue3(A, B, C) glue3_(A, B, C)
+#define glue4_(A, B, C, D) A ## B ## C ## D
+#define glue4(A, B, C, D) glue4_(A, B, C, D)
+
+/* Select the size of a register, given a regno. */
+#define x(N) glue2(x, N)
+#define w(N) glue2(w, N)
+#if SIZE < 8
+# define s(N) w(N)
+#else
+# define s(N) x(N)
+#endif
+
+#define NAME(BASE) glue4(__aarch64_, BASE, SIZE, SUFF)
+#define LDXR glue4(ld, A, xr, S)
+#define STXR glue4(st, L, xr, S)
+
+/* Temporary registers used. Other than these, only the return value
+ register (x0) and the flags are modified. */
+#define tmp0 16
+#define tmp1 17
+#define tmp2 15
+
+/* Start and end a function. */
+.macro STARTFN name
+ .text
+ .balign 16
+ .globl \name
+ .hidden \name
+ .type \name, %function
+ .cfi_startproc
+\name:
+.endm
+
+.macro ENDFN name
+ .cfi_endproc
+ .size \name, . - \name
+.endm
+
+/* Branch to LABEL if LSE is disabled. */
+.macro JUMP_IF_NOT_LSE label
+ adrp x(tmp0), __aarch64_have_lse_atomics
+ ldrb w(tmp0), [x(tmp0), :lo12:__aarch64_have_lse_atomics]
+ cbz w(tmp0), \label
+.endm
+
+#ifdef L_cas
+
+STARTFN NAME(cas)
+ JUMP_IF_NOT_LSE 8f
+
+#if SIZE < 16
+#define CAS glue4(cas, A, L, S)
+
+ CAS s(0), s(1), [x2]
+ ret
+
+8: UXT s(tmp0), s(0)
+0: LDXR s(0), [x2]
+ cmp s(0), s(tmp0)
+ bne 1f
+ STXR w(tmp1), s(1), [x2]
+ cbnz w(tmp1), 0b
+1: ret
+
+#else
+#define LDXP glue3(ld, A, xp)
+#define STXP glue3(st, L, xp)
+#define CASP glue3(casp, A, L)
+
+ CASP x0, x1, x2, x3, [x4]
+ ret
+
+8: mov x(tmp0), x0
+ mov x(tmp1), x1
+0: LDXP x0, x1, [x4]
+ cmp x0, x(tmp0)
+ ccmp x1, x(tmp1), #0, eq
+ bne 1f
+ STXP w(tmp2), x(tmp0), x(tmp1), [x4]
+ cbnz w(tmp2), 0b
+1: ret
+
+#endif
+
+ENDFN NAME(cas)
+#endif
+
+#ifdef L_swp
+#define SWP glue4(swp, A, L, S)
+
+STARTFN NAME(swp)
+ JUMP_IF_NOT_LSE 8f
+
+ SWP s(0), s(0), [x1]
+ ret
+
+8: mov s(tmp0), s(0)
+0: LDXR s(0), [x1]
+ STXR w(tmp1), s(tmp0), [x1]
+ cbnz w(tmp1), 0b
+ ret
+
+ENDFN NAME(swp)
+#endif
+
+#if defined(L_ldadd) || defined(L_ldclr) \
+ || defined(L_ldeor) || defined(L_ldset)
+
+#ifdef L_ldadd
+#define LDNM ldadd
+#define OP add
+#elif defined(L_ldclr)
+#define LDNM ldclr
+#define OP bic
+#elif defined(L_ldeor)
+#define LDNM ldeor
+#define OP eor
+#elif defined(L_ldset)
+#define LDNM ldset
+#define OP orr
+#else
+#error
+#endif
+#define LDOP glue4(LDNM, A, L, S)
+
+STARTFN NAME(LDNM)
+ JUMP_IF_NOT_LSE 8f
+
+ LDOP s(0), s(0), [x1]
+ ret
+
+8: mov s(tmp0), s(0)
+0: LDXR s(0), [x1]
+ OP s(tmp1), s(0), s(tmp0)
+ STXR w(tmp1), s(tmp1), [x1]
+ cbnz w(tmp1), 0b
+ ret
+
+ENDFN NAME(LDNM)
+#endif
diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
new file mode 100644
index 00000000000..c7f4223cd45
--- /dev/null
+++ b/libgcc/config/aarch64/t-lse
@@ -0,0 +1,44 @@
+# Out-of-line LSE atomics for AArch64 architecture.
+# Copyright (C) 2018 Free Software Foundation, Inc.
+# Contributed by Linaro Ltd.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+# Compare-and-swap has 5 sizes and 4 memory models.
+S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
+O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
+
+# Swap, Load-and-operate have 4 sizes and 4 memory models
+S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
+O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
+
+LSE_OBJS := $(O0) $(O1)
+
+libgcc-objects += $(LSE_OBJS) lse-init$(objext)
+
+empty =
+space = $(empty) $(empty)
+PAT_SPLIT = $(subst _,$(space),$(*F))
+PAT_BASE = $(word 1,$(PAT_SPLIT))
+PAT_N = $(word 2,$(PAT_SPLIT))
+PAT_M = $(word 3,$(PAT_SPLIT))
+
+lse-init$(objext): $(srcdir)/config/aarch64/lse-init.c
+ $(gcc_compile) -c $<
+
+$(LSE_OBJS): $(srcdir)/config/aarch64/lse.S
+ $(gcc_compile) -DL_$(PAT_BASE) -DSIZE=$(PAT_N) -DMODEL=$(PAT_M) -c $<
--
2.17.1
* [PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
` (2 preceding siblings ...)
2019-09-18 1:58 ` [PATCH, AArch64 v4 3/6] aarch64: Tidy aarch64_split_compare_and_swap Richard Henderson
@ 2019-09-18 1:58 ` Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-18 1:58 UTC (permalink / raw)
To: gcc-patches
Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh
---
gcc/common/config/aarch64/aarch64-common.c | 6 ++++--
gcc/config/aarch64/aarch64.c | 6 ++++--
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 07c03253951..2bbf454eea9 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -32,9 +32,11 @@
#include "diagnostic.h"
#include "params.h"
-#ifdef TARGET_BIG_ENDIAN_DEFAULT
#undef TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_OUTLINE_ATOMICS)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_OUTLINE_ATOMICS)
#endif
#undef TARGET_HANDLE_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 56a4a47db73..ca4363e7831 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20535,9 +20535,11 @@ aarch64_run_selftests (void)
#undef TARGET_C_MODE_FOR_SUFFIX
#define TARGET_C_MODE_FOR_SUFFIX aarch64_c_mode_for_suffix
-#ifdef TARGET_BIG_ENDIAN_DEFAULT
#undef TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_OUTLINE_ATOMICS)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_OUTLINE_ATOMICS)
#endif
#undef TARGET_CLASS_MAX_NREGS
--
2.17.1
* [PATCH, AArch64 v4 3/6] aarch64: Tidy aarch64_split_compare_and_swap
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics Richard Henderson
@ 2019-09-18 1:58 ` Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default Richard Henderson
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-18 1:58 UTC (permalink / raw)
To: gcc-patches
Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh
With aarch64_track_speculation, we had extra code to do exactly what the
!strong_zero_p path already did. The rest is reducing code duplication.
* config/aarch64/aarch64 (aarch64_split_compare_and_swap): Disable
strong_zero_p for aarch64_track_speculation; unify some code paths;
use aarch64_gen_compare_reg instead of open-coding.
---
gcc/config/aarch64/aarch64.c | 50 ++++++++++--------------------------
1 file changed, 14 insertions(+), 36 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a5c4f55627d..b937514e6f8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16955,13 +16955,11 @@ aarch64_emit_post_barrier (enum memmodel model)
void
aarch64_split_compare_and_swap (rtx operands[])
{
- rtx rval, mem, oldval, newval, scratch;
+ rtx rval, mem, oldval, newval, scratch, x, model_rtx;
machine_mode mode;
bool is_weak;
rtx_code_label *label1, *label2;
- rtx x, cond;
enum memmodel model;
- rtx model_rtx;
rval = operands[0];
mem = operands[1];
@@ -16982,7 +16980,8 @@ aarch64_split_compare_and_swap (rtx operands[])
CBNZ scratch, .label1
.label2:
CMP rval, 0. */
- bool strong_zero_p = !is_weak && oldval == const0_rtx && mode != TImode;
+ bool strong_zero_p = (!is_weak && !aarch64_track_speculation &&
+ oldval == const0_rtx && mode != TImode);
label1 = NULL;
if (!is_weak)
@@ -16995,35 +16994,20 @@ aarch64_split_compare_and_swap (rtx operands[])
/* The initial load can be relaxed for a __sync operation since a final
barrier will be emitted to stop code hoisting. */
if (is_mm_sync (model))
- aarch64_emit_load_exclusive (mode, rval, mem,
- GEN_INT (MEMMODEL_RELAXED));
+ aarch64_emit_load_exclusive (mode, rval, mem, GEN_INT (MEMMODEL_RELAXED));
else
aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
if (strong_zero_p)
- {
- if (aarch64_track_speculation)
- {
- /* Emit an explicit compare instruction, so that we can correctly
- track the condition codes. */
- rtx cc_reg = aarch64_gen_compare_reg (NE, rval, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
- }
- else
- x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
-
- x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
- gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
- aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
- }
+ x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
else
{
- cond = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
- x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
- x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
- gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
- aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+ rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+ x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
}
+ x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+ gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
+ aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
@@ -17044,22 +17028,16 @@ aarch64_split_compare_and_swap (rtx operands[])
aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
}
else
- {
- cond = gen_rtx_REG (CCmode, CC_REGNUM);
- x = gen_rtx_COMPARE (CCmode, scratch, const0_rtx);
- emit_insn (gen_rtx_SET (cond, x));
- }
+ aarch64_gen_compare_reg (NE, scratch, const0_rtx);
emit_label (label2);
+
/* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
to set the condition flags. If this is not used it will be removed by
later passes. */
if (strong_zero_p)
- {
- cond = gen_rtx_REG (CCmode, CC_REGNUM);
- x = gen_rtx_COMPARE (CCmode, rval, const0_rtx);
- emit_insn (gen_rtx_SET (cond, x));
- }
+ aarch64_gen_compare_reg (NE, rval, const0_rtx);
+
/* Emit any final barrier needed for a __sync operation. */
if (is_mm_sync (model))
aarch64_emit_post_barrier (model);
--
2.17.1
* [PATCH, AArch64 v4 2/6] aarch64: Implement TImode compare-and-swap
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
` (4 preceding siblings ...)
2019-09-18 1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
@ 2019-09-18 1:58 ` Richard Henderson
2019-09-18 12:58 ` [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Kyrill Tkachov
6 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-18 1:58 UTC (permalink / raw)
To: gcc-patches
Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh
This pattern will only be used with the __sync functions, because
we do not yet have a bare TImode atomic load.
* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Add support
for NE comparison of TImode values.
(aarch64_emit_load_exclusive): Add support for TImode.
(aarch64_emit_store_exclusive): Likewise.
(aarch64_split_compare_and_swap): Disable strong_zero_p for TImode.
* config/aarch64/atomics.md (@atomic_compare_and_swap<ALLI_TI>):
Change iterator from ALLI to ALLI_TI.
(@atomic_compare_and_swap<JUST_TI>): New.
(@atomic_compare_and_swap<JUST_TI>_lse): New.
(aarch64_load_exclusive_pair): New.
(aarch64_store_exclusive_pair): New.
* config/aarch64/iterators.md (JUST_TI): New.
---
gcc/config/aarch64/aarch64.c | 48 ++++++++++++++---
gcc/config/aarch64/atomics.md | 93 +++++++++++++++++++++++++++++++--
gcc/config/aarch64/iterators.md | 3 ++
3 files changed, 131 insertions(+), 13 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 99d51e2aef9..a5c4f55627d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2039,10 +2039,33 @@ emit_set_insn (rtx x, rtx y)
rtx
aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
{
- machine_mode mode = SELECT_CC_MODE (code, x, y);
- rtx cc_reg = gen_rtx_REG (mode, CC_REGNUM);
+ machine_mode cmp_mode = GET_MODE (x);
+ machine_mode cc_mode;
+ rtx cc_reg;
- emit_set_insn (cc_reg, gen_rtx_COMPARE (mode, x, y));
+ if (cmp_mode == TImode)
+ {
+ gcc_assert (code == NE);
+
+ cc_mode = CCmode;
+ cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+
+ rtx x_lo = operand_subword (x, 0, 0, TImode);
+ rtx y_lo = operand_subword (y, 0, 0, TImode);
+ emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
+
+ rtx x_hi = operand_subword (x, 1, 0, TImode);
+ rtx y_hi = operand_subword (y, 1, 0, TImode);
+ emit_insn (gen_ccmpdi (cc_reg, cc_reg, x_hi, y_hi,
+ gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
+ GEN_INT (AARCH64_EQ)));
+ }
+ else
+ {
+ cc_mode = SELECT_CC_MODE (code, x, y);
+ cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+ emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
+ }
return cc_reg;
}
@@ -2593,7 +2616,6 @@ aarch64_zero_extend_const_eq (machine_mode xmode, rtx x,
gcc_assert (r != NULL);
return rtx_equal_p (x, r);
}
-
/* Return TARGET if it is nonnull and a register of mode MODE.
Otherwise, return a fresh register of mode MODE if we can,
@@ -16814,16 +16836,26 @@ static void
aarch64_emit_load_exclusive (machine_mode mode, rtx rval,
rtx mem, rtx model_rtx)
{
- emit_insn (gen_aarch64_load_exclusive (mode, rval, mem, model_rtx));
+ if (mode == TImode)
+ emit_insn (gen_aarch64_load_exclusive_pair (gen_lowpart (DImode, rval),
+ gen_highpart (DImode, rval),
+ mem, model_rtx));
+ else
+ emit_insn (gen_aarch64_load_exclusive (mode, rval, mem, model_rtx));
}
/* Emit store exclusive. */
static void
aarch64_emit_store_exclusive (machine_mode mode, rtx bval,
- rtx rval, rtx mem, rtx model_rtx)
+ rtx mem, rtx rval, rtx model_rtx)
{
- emit_insn (gen_aarch64_store_exclusive (mode, bval, rval, mem, model_rtx));
+ if (mode == TImode)
+ emit_insn (gen_aarch64_store_exclusive_pair
+ (bval, mem, operand_subword (rval, 0, 0, TImode),
+ operand_subword (rval, 1, 0, TImode), model_rtx));
+ else
+ emit_insn (gen_aarch64_store_exclusive (mode, bval, mem, rval, model_rtx));
}
/* Mark the previous jump instruction as unlikely. */
@@ -16950,7 +16982,7 @@ aarch64_split_compare_and_swap (rtx operands[])
CBNZ scratch, .label1
.label2:
CMP rval, 0. */
- bool strong_zero_p = !is_weak && oldval == const0_rtx;
+ bool strong_zero_p = !is_weak && oldval == const0_rtx && mode != TImode;
label1 = NULL;
if (!is_weak)
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index a679270cd38..f8bdd048b37 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -21,11 +21,11 @@
;; Instruction patterns.
(define_expand "@atomic_compare_and_swap<mode>"
- [(match_operand:SI 0 "register_operand") ;; bool out
- (match_operand:ALLI 1 "register_operand") ;; val out
- (match_operand:ALLI 2 "aarch64_sync_memory_operand") ;; memory
- (match_operand:ALLI 3 "nonmemory_operand") ;; expected
- (match_operand:ALLI 4 "aarch64_reg_or_zero") ;; desired
+ [(match_operand:SI 0 "register_operand" "") ;; bool out
+ (match_operand:ALLI_TI 1 "register_operand" "") ;; val out
+ (match_operand:ALLI_TI 2 "aarch64_sync_memory_operand" "") ;; memory
+ (match_operand:ALLI_TI 3 "nonmemory_operand" "") ;; expected
+ (match_operand:ALLI_TI 4 "aarch64_reg_or_zero" "") ;; desired
(match_operand:SI 5 "const_int_operand") ;; is_weak
(match_operand:SI 6 "const_int_operand") ;; mod_s
(match_operand:SI 7 "const_int_operand")] ;; mod_f
@@ -88,6 +88,30 @@
}
)
+(define_insn_and_split "@aarch64_compare_and_swap<mode>"
+ [(set (reg:CC CC_REGNUM) ;; bool out
+ (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
+ (set (match_operand:JUST_TI 0 "register_operand" "=&r") ;; val out
+ (match_operand:JUST_TI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
+ (set (match_dup 1)
+ (unspec_volatile:JUST_TI
+ [(match_operand:JUST_TI 2 "aarch64_reg_or_zero" "rZ") ;; expect
+ (match_operand:JUST_TI 3 "aarch64_reg_or_zero" "rZ") ;; desired
+ (match_operand:SI 4 "const_int_operand") ;; is_weak
+ (match_operand:SI 5 "const_int_operand") ;; mod_s
+ (match_operand:SI 6 "const_int_operand")] ;; mod_f
+ UNSPECV_ATOMIC_CMPSW))
+ (clobber (match_scratch:SI 7 "=&r"))]
+ ""
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+ {
+ aarch64_split_compare_and_swap (operands);
+ DONE;
+ }
+)
+
(define_insn "@aarch64_compare_and_swap<mode>_lse"
[(set (match_operand:SI 0 "register_operand" "+r") ;; val out
(zero_extend:SI
@@ -133,6 +157,28 @@
return "casal<atomic_sfx>\t%<w>0, %<w>2, %1";
})
+(define_insn "@aarch64_compare_and_swap<mode>_lse"
+ [(set (match_operand:JUST_TI 0 "register_operand" "+r") ;; val out
+ (match_operand:JUST_TI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
+ (set (match_dup 1)
+ (unspec_volatile:JUST_TI
+ [(match_dup 0) ;; expect
+ (match_operand:JUST_TI 2 "register_operand" "r") ;; desired
+ (match_operand:SI 3 "const_int_operand")] ;; mod_s
+ UNSPECV_ATOMIC_CMPSW))]
+ "TARGET_LSE"
+{
+ enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+ if (is_mm_relaxed (model))
+ return "casp\t%0, %R0, %2, %R2, %1";
+ else if (is_mm_acquire (model) || is_mm_consume (model))
+ return "caspa\t%0, %R0, %2, %R2, %1";
+ else if (is_mm_release (model))
+ return "caspl\t%0, %R0, %2, %R2, %1";
+ else
+ return "caspal\t%0, %R0, %2, %R2, %1";
+})
+
(define_expand "atomic_exchange<mode>"
[(match_operand:ALLI 0 "register_operand")
(match_operand:ALLI 1 "aarch64_sync_memory_operand")
@@ -581,6 +627,24 @@
}
)
+(define_insn "aarch64_load_exclusive_pair"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (unspec_volatile:DI
+ [(match_operand:TI 2 "aarch64_sync_memory_operand" "Q")
+ (match_operand:SI 3 "const_int_operand")]
+ UNSPECV_LX))
+ (set (match_operand:DI 1 "register_operand" "=r")
+ (unspec_volatile:DI [(match_dup 2) (match_dup 3)] UNSPECV_LX))]
+ ""
+ {
+ enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+ if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model))
+ return "ldxp\t%0, %1, %2";
+ else
+ return "ldaxp\t%0, %1, %2";
+ }
+)
+
(define_insn "@aarch64_store_exclusive<mode>"
[(set (match_operand:SI 0 "register_operand" "=&r")
(unspec_volatile:SI [(const_int 0)] UNSPECV_SX))
@@ -599,6 +663,25 @@
}
)
+(define_insn "aarch64_store_exclusive_pair"
+ [(set (match_operand:SI 0 "register_operand" "=&r")
+ (unspec_volatile:SI [(const_int 0)] UNSPECV_SX))
+ (set (match_operand:TI 1 "aarch64_sync_memory_operand" "=Q")
+ (unspec_volatile:TI
+ [(match_operand:DI 2 "aarch64_reg_or_zero" "rZ")
+ (match_operand:DI 3 "aarch64_reg_or_zero" "rZ")
+ (match_operand:SI 4 "const_int_operand")]
+ UNSPECV_SX))]
+ ""
+ {
+ enum memmodel model = memmodel_from_int (INTVAL (operands[4]));
+ if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model))
+ return "stxp\t%w0, %x2, %x3, %1";
+ else
+ return "stlxp\t%w0, %x2, %x3, %1";
+ }
+)
+
(define_expand "mem_thread_fence"
[(match_operand:SI 0 "const_int_operand")]
""
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d23f0fcbc2f..03b3ce36302 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -29,6 +29,9 @@
;; Iterator for HI, SI, DI, some instructions can only work on these modes.
(define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
+;; "Iterator" for just TI -- features like @pattern only work with iterators.
+(define_mode_iterator JUST_TI [TI])
+
;; Iterator for QI and HI modes
(define_mode_iterator SHORT [QI HI])
--
2.17.1
* [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers Richard Henderson
@ 2019-09-18 1:58 ` Richard Henderson
2019-09-18 12:58 ` Kyrill Tkachov
2019-09-18 1:58 ` [PATCH, AArch64 v4 3/6] aarch64: Tidy aarch64_split_compare_and_swap Richard Henderson
` (4 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2019-09-18 1:58 UTC (permalink / raw)
To: gcc-patches
Cc: Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft, James.Greenhalgh
* config/aarch64/aarch64.opt (-moutline-atomics): New.
* config/aarch64/aarch64.c (aarch64_atomic_ool_func): New.
(aarch64_ool_cas_names, aarch64_ool_swp_names): New.
(aarch64_ool_ldadd_names, aarch64_ool_ldset_names): New.
(aarch64_ool_ldclr_names, aarch64_ool_ldeor_names): New.
(aarch64_expand_compare_and_swap): Honor TARGET_OUTLINE_ATOMICS.
* config/aarch64/atomics.md (atomic_exchange<ALLI>): Likewise.
(atomic_<atomic_op><ALLI>): Likewise.
(atomic_fetch_<atomic_op><ALLI>): Likewise.
(atomic_<atomic_op>_fetch<ALLI>): Likewise.
testsuite/
* gcc.target/aarch64/atomic-op-acq_rel.c: Use -mno-outline-atomics.
* gcc.target/aarch64/atomic-comp-swap-release-acquire.c: Likewise.
* gcc.target/aarch64/atomic-op-acquire.c: Likewise.
* gcc.target/aarch64/atomic-op-char.c: Likewise.
* gcc.target/aarch64/atomic-op-consume.c: Likewise.
* gcc.target/aarch64/atomic-op-imm.c: Likewise.
* gcc.target/aarch64/atomic-op-int.c: Likewise.
* gcc.target/aarch64/atomic-op-long.c: Likewise.
* gcc.target/aarch64/atomic-op-relaxed.c: Likewise.
* gcc.target/aarch64/atomic-op-release.c: Likewise.
* gcc.target/aarch64/atomic-op-seq_cst.c: Likewise.
* gcc.target/aarch64/atomic-op-short.c: Likewise.
* gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c: Likewise.
* gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: Likewise.
* gcc.target/aarch64/sync-comp-swap.c: Likewise.
* gcc.target/aarch64/sync-op-acquire.c: Likewise.
* gcc.target/aarch64/sync-op-full.c: Likewise.
---
gcc/config/aarch64/aarch64-protos.h | 13 +++
gcc/config/aarch64/aarch64.c | 87 +++++++++++++++++
.../atomic-comp-swap-release-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
.../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-op-char.c | 2 +-
.../gcc.target/aarch64/atomic-op-consume.c | 2 +-
.../gcc.target/aarch64/atomic-op-imm.c | 2 +-
.../gcc.target/aarch64/atomic-op-int.c | 2 +-
.../gcc.target/aarch64/atomic-op-long.c | 2 +-
.../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
.../gcc.target/aarch64/atomic-op-release.c | 2 +-
.../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
.../gcc.target/aarch64/atomic-op-short.c | 2 +-
.../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
.../atomic_cmp_exchange_zero_strong_1.c | 2 +-
.../gcc.target/aarch64/sync-comp-swap.c | 2 +-
.../gcc.target/aarch64/sync-op-acquire.c | 2 +-
.../gcc.target/aarch64/sync-op-full.c | 2 +-
gcc/config/aarch64/aarch64.opt | 3 +
gcc/config/aarch64/atomics.md | 94 +++++++++++++++++--
gcc/doc/invoke.texi | 16 +++-
22 files changed, 221 insertions(+), 26 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index c4b73d26df6..1c1aac7201a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -696,4 +696,17 @@ poly_uint64 aarch64_regmode_natural_size (machine_mode);
bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
+struct atomic_ool_names
+{
+ const char *str[5][4];
+};
+
+rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+ const atomic_ool_names *names);
+extern const atomic_ool_names aarch64_ool_swp_names;
+extern const atomic_ool_names aarch64_ool_ldadd_names;
+extern const atomic_ool_names aarch64_ool_ldset_names;
+extern const atomic_ool_names aarch64_ool_ldclr_names;
+extern const atomic_ool_names aarch64_ool_ldeor_names;
+
#endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b937514e6f8..56a4a47db73 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16867,6 +16867,82 @@ aarch64_emit_unlikely_jump (rtx insn)
add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
}
+/* We store the names of the various atomic helpers in a 5x4 array.
+ Return the libcall function given MODE, MODEL and NAMES. */
+
+rtx
+aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+ const atomic_ool_names *names)
+{
+ memmodel model = memmodel_base (INTVAL (model_rtx));
+ int mode_idx, model_idx;
+
+ switch (mode)
+ {
+ case E_QImode:
+ mode_idx = 0;
+ break;
+ case E_HImode:
+ mode_idx = 1;
+ break;
+ case E_SImode:
+ mode_idx = 2;
+ break;
+ case E_DImode:
+ mode_idx = 3;
+ break;
+ case E_TImode:
+ mode_idx = 4;
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ switch (model)
+ {
+ case MEMMODEL_RELAXED:
+ model_idx = 0;
+ break;
+ case MEMMODEL_CONSUME:
+ case MEMMODEL_ACQUIRE:
+ model_idx = 1;
+ break;
+ case MEMMODEL_RELEASE:
+ model_idx = 2;
+ break;
+ case MEMMODEL_ACQ_REL:
+ case MEMMODEL_SEQ_CST:
+ model_idx = 3;
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ return init_one_libfunc_visibility (names->str[mode_idx][model_idx],
+ VISIBILITY_HIDDEN);
+}
+
+#define DEF0(B, N) \
+ { "__aarch64_" #B #N "_relax", \
+ "__aarch64_" #B #N "_acq", \
+ "__aarch64_" #B #N "_rel", \
+ "__aarch64_" #B #N "_acq_rel" }
+
+#define DEF4(B) DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
+ { NULL, NULL, NULL, NULL }
+#define DEF5(B) DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), DEF0(B, 16)
+
+static const atomic_ool_names aarch64_ool_cas_names = { { DEF5(cas) } };
+const atomic_ool_names aarch64_ool_swp_names = { { DEF4(swp) } };
+const atomic_ool_names aarch64_ool_ldadd_names = { { DEF4(ldadd) } };
+const atomic_ool_names aarch64_ool_ldset_names = { { DEF4(ldset) } };
+const atomic_ool_names aarch64_ool_ldclr_names = { { DEF4(ldclr) } };
+const atomic_ool_names aarch64_ool_ldeor_names = { { DEF4(ldeor) } };
+
+#undef DEF0
+#undef DEF4
+#undef DEF5
+
/* Expand a compare and swap pattern. */
void
@@ -16913,6 +16989,17 @@ aarch64_expand_compare_and_swap (rtx operands[])
newval, mod_s));
cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
}
+ else if (TARGET_OUTLINE_ATOMICS)
+ {
+ /* Oldval must satisfy compare afterward. */
+ if (!aarch64_plus_operand (oldval, mode))
+ oldval = force_reg (mode, oldval);
+ rtx func = aarch64_atomic_ool_func (mode, mod_s, &aarch64_ool_cas_names);
+ rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
+ oldval, mode, newval, mode,
+ XEXP (mem, 0), Pmode);
+ cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+ }
else
{
/* The oldval predicate varies by mode. Test it and force to reg. */
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
index 49ca5d0d09c..a828a72aa75 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
#include "atomic-comp-swap-release-acquire.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
index 74f26348e42..6823ce381b2 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-acq_rel.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
index 66c1b1efe20..87937de378a 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-acquire.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
index c09d0434ecf..60955e57da3 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-char.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
index 5783ab84f5c..16cb11aeeaf 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-consume.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
index 18b8f0b04e9..bcab4e481e3 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
int v = 0;
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
index 8520f0839ba..040e4a8d168 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-int.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
index d011f8c5ce2..fc88b92cd3e 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
long v = 0;
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
index ed96bfdb978..503d62b0280 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-relaxed.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
index fc4be17de89..efe14aea7e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-release.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
index 613000fe490..09973bf82ba 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-seq_cst.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
index e82c8118ece..e1dcebb0f89 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "atomic-op-short.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
index f2a21ddf2e1..29246979bfb 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -march=armv8-a+nolse" } */
+/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
/* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
int
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
index 8d2ae67dfbe..6daf9b08f5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -march=armv8-a+nolse" } */
+/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
/* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
int
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
index e571b2f13b3..f56415f3354 100644
--- a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
#include "sync-comp-swap.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
index 357bf1be3b2..39b3144aa36 100644
--- a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "sync-op-acquire.x"
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
index c6ba1629965..6b8b2043f40 100644
--- a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=armv8-a+nolse -O2" } */
+/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
#include "sync-op-full.x"
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 55d466068b8..865b6a6d8ca 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -255,3 +255,6 @@ user-land code.
TargetVariable
long aarch64_stack_protector_guard_offset = 0
+moutline-atomics
+Target Report Mask(OUTLINE_ATOMICS) Save
+Generate local calls to out-of-line atomic operations.
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index f8bdd048b37..2e59b868420 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -186,16 +186,27 @@
(match_operand:SI 3 "const_int_operand")]
""
{
- rtx (*gen) (rtx, rtx, rtx, rtx);
-
/* Use an atomic SWP when available. */
if (TARGET_LSE)
- gen = gen_aarch64_atomic_exchange<mode>_lse;
+ {
+ emit_insn (gen_aarch64_atomic_exchange<mode>_lse
+ (operands[0], operands[1], operands[2], operands[3]));
+ }
+ else if (TARGET_OUTLINE_ATOMICS)
+ {
+ machine_mode mode = <MODE>mode;
+ rtx func = aarch64_atomic_ool_func (mode, operands[3],
+ &aarch64_ool_swp_names);
+ rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL,
+ mode, operands[2], mode,
+ XEXP (operands[1], 0), Pmode);
+ emit_move_insn (operands[0], rval);
+ }
else
- gen = gen_aarch64_atomic_exchange<mode>;
-
- emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
-
+ {
+ emit_insn (gen_aarch64_atomic_exchange<mode>
+ (operands[0], operands[1], operands[2], operands[3]));
+ }
DONE;
}
)
@@ -280,6 +291,39 @@
}
operands[1] = force_reg (<MODE>mode, operands[1]);
}
+ else if (TARGET_OUTLINE_ATOMICS)
+ {
+ const atomic_ool_names *names;
+ switch (<CODE>)
+ {
+ case MINUS:
+ operands[1] = expand_simple_unop (<MODE>mode, NEG, operands[1],
+ NULL, 1);
+ /* fallthru */
+ case PLUS:
+ names = &aarch64_ool_ldadd_names;
+ break;
+ case IOR:
+ names = &aarch64_ool_ldset_names;
+ break;
+ case XOR:
+ names = &aarch64_ool_ldeor_names;
+ break;
+ case AND:
+ operands[1] = expand_simple_unop (<MODE>mode, NOT, operands[1],
+ NULL, 1);
+ names = &aarch64_ool_ldclr_names;
+ break;
+ default:
+ gcc_unreachable ();
+ }
+ machine_mode mode = <MODE>mode;
+ rtx func = aarch64_atomic_ool_func (mode, operands[2], names);
+ emit_library_call_value (func, NULL_RTX, LCT_NORMAL, mode,
+ operands[1], mode,
+ XEXP (operands[0], 0), Pmode);
+ DONE;
+ }
else
gen = gen_aarch64_atomic_<atomic_optab><mode>;
@@ -405,6 +449,40 @@
}
operands[2] = force_reg (<MODE>mode, operands[2]);
}
+ else if (TARGET_OUTLINE_ATOMICS)
+ {
+ const atomic_ool_names *names;
+ switch (<CODE>)
+ {
+ case MINUS:
+ operands[2] = expand_simple_unop (<MODE>mode, NEG, operands[2],
+ NULL, 1);
+ /* fallthru */
+ case PLUS:
+ names = &aarch64_ool_ldadd_names;
+ break;
+ case IOR:
+ names = &aarch64_ool_ldset_names;
+ break;
+ case XOR:
+ names = &aarch64_ool_ldeor_names;
+ break;
+ case AND:
+ operands[2] = expand_simple_unop (<MODE>mode, NOT, operands[2],
+ NULL, 1);
+ names = &aarch64_ool_ldclr_names;
+ break;
+ default:
+ gcc_unreachable ();
+ }
+ machine_mode mode = <MODE>mode;
+ rtx func = aarch64_atomic_ool_func (mode, operands[3], names);
+ rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL, mode,
+ operands[2], mode,
+ XEXP (operands[1], 0), Pmode);
+ emit_move_insn (operands[0], rval);
+ DONE;
+ }
else
gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
@@ -494,7 +572,7 @@
{
/* Use an atomic load-operate instruction when possible. In this case
we will re-compute the result from the original mem value. */
- if (TARGET_LSE)
+ if (TARGET_LSE || TARGET_OUTLINE_ATOMICS)
{
rtx tmp = gen_reg_rtx (<MODE>mode);
operands[2] = force_reg (<MODE>mode, operands[2]);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0e3693598e7..900fda1efb2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -643,7 +643,8 @@ Objective-C and Objective-C++ Dialects}.
-march=@var{name} -mcpu=@var{name} -mtune=@var{name} @gol
-moverride=@var{string} -mverbose-cost-dump @gol
-mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
--mstack-protector-guard-offset=@var{offset} -mtrack-speculation }
+-mstack-protector-guard-offset=@var{offset} -mtrack-speculation @gol
+-moutline-atomics }
@emph{Adapteva Epiphany Options}
@gccoptlist{-mhalf-reg-file -mprefer-short-insn-regs @gol
@@ -15874,6 +15875,19 @@ be used by the compiler when expanding calls to
@code{__builtin_speculation_safe_copy} to permit a more efficient code
sequence to be generated.
+@item -moutline-atomics
+@itemx -mno-outline-atomics
+Enable or disable calls to out-of-line helpers to implement atomic operations.
+These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
+should be used; if not, they will use the load/store-exclusive instructions
+that are present in the base ARMv8.0 ISA.
+
+This option is only applicable when compiling for the base ARMv8.0
+instruction set. If using a later revision, e.g. @option{-march=armv8.1-a}
+or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
+used directly.  The same applies when @option{-mcpu=} selects a cpu
+that supports the @samp{lse} feature.
+
@item -march=@var{name}
@opindex march
Specify the name of the target architecture and, optionally, one or
--
2.17.1
* Re: [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics
2019-09-18 1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
@ 2019-09-18 12:58 ` Kyrill Tkachov
2019-12-23 16:05 ` Roman Zhuykov
1 sibling, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2019-09-18 12:58 UTC (permalink / raw)
To: Richard Henderson, gcc-patches
Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh
On 9/18/19 2:58 AM, Richard Henderson wrote:
> This is the libgcc part of the interface -- providing the functions.
> Rationale is provided at the top of libgcc/config/aarch64/lse.S.
>
> * config/aarch64/lse-init.c: New file.
> * config/aarch64/lse.S: New file.
> * config/aarch64/t-lse: New file.
> * config.host: Add t-lse to all aarch64 tuples.
> ---
> libgcc/config/aarch64/lse-init.c | 45 ++++++
> libgcc/config.host | 4 +
> libgcc/config/aarch64/lse.S | 235 +++++++++++++++++++++++++++++++
> libgcc/config/aarch64/t-lse | 44 ++++++
> 4 files changed, 328 insertions(+)
> create mode 100644 libgcc/config/aarch64/lse-init.c
> create mode 100644 libgcc/config/aarch64/lse.S
> create mode 100644 libgcc/config/aarch64/t-lse
>
> diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
> new file mode 100644
> index 00000000000..51fb21d45c9
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse-init.c
> @@ -0,0 +1,45 @@
> +/* Out-of-line LSE atomics for AArch64 architecture, Init.
> + Copyright (C) 2018 Free Software Foundation, Inc.
> + Contributed by Linaro Ltd.
> +
This, and the other new files, will need an updated copyright date now.
Thanks,
Kyrill
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +<http://www.gnu.org/licenses/>. */
> +
> +/* Define the symbol gating the LSE implementations. */
> +_Bool __aarch64_have_lse_atomics
> + __attribute__((visibility("hidden"), nocommon));
> +
> +/* Disable initialization of __aarch64_have_lse_atomics during bootstrap. */
> +#ifndef inhibit_libc
> +# include <sys/auxv.h>
> +
> +/* Disable initialization if the system headers are too old. */
> +# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
> +
> +static void __attribute__((constructor))
> +init_have_lse_atomics (void)
> +{
> + unsigned long hwcap = getauxval (AT_HWCAP);
> + __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
> +}
> +
> +# endif /* HWCAP */
> +#endif /* inhibit_libc */
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 728e543ea39..122113fc519 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
> extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
> extra_parts="$extra_parts crtfastmath.o"
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> md_unwind_header=aarch64/aarch64-unwind.h
> ;;
> aarch64*-*-freebsd*)
> extra_parts="$extra_parts crtfastmath.o"
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> md_unwind_header=aarch64/freebsd-unwind.h
> ;;
> @@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
> ;;
> aarch64*-*-fuchsia*)
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
> ;;
> aarch64*-*-linux*)
> extra_parts="$extra_parts crtfastmath.o"
> md_unwind_header=aarch64/linux-unwind.h
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> ;;
> alpha*-*-linux*)
> diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
> new file mode 100644
> index 00000000000..c24a39242ca
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse.S
> @@ -0,0 +1,235 @@
> +/* Out-of-line LSE atomics for AArch64 architecture.
> + Copyright (C) 2018 Free Software Foundation, Inc.
> + Contributed by Linaro Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +<http://www.gnu.org/licenses/>. */
> +
> +/*
> + * The problem that we are trying to solve is operating system deployment
> + * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
> + *
> + * There are a number of potential solutions for this problem which have
> + * been proposed and rejected for various reasons. To recap:
> + *
> + * (1) Multiple builds. The dynamic linker will examine /lib64/atomics/
> + * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
> + * However, not all Linux distributions are happy with multiple builds,
> + * and anyway it has no effect on main applications.
> + *
> + * (2) IFUNC. We could put these functions into libgcc_s.so, and have
> + * a single copy of each function for all DSOs. However, ARM is concerned
> + * that the branch-to-indirect-branch that is implied by using a PLT,
> + * as required by IFUNC, is too much overhead for smaller cpus.
> + *
> + * (3) Statically predicted direct branches. This is the approach that
> + * is taken here. These functions are linked into every DSO that uses them.
> + * All of the symbols are hidden, so that the functions are called via a
> + * direct branch. The choice of LSE vs non-LSE is done via one byte load
> + * followed by a well-predicted direct branch. The functions are compiled
> + * separately to minimize code size.
> + */
> +
> +/* Tell the assembler to accept LSE instructions. */
> + .arch armv8-a+lse
> +
> +/* Declare the symbol gating the LSE implementations. */
> + .hidden __aarch64_have_lse_atomics
> +
> +/* Turn size and memory model defines into mnemonic fragments. */
> +#if SIZE == 1
> +# define S b
> +# define UXT uxtb
> +#elif SIZE == 2
> +# define S h
> +# define UXT uxth
> +#elif SIZE == 4 || SIZE == 8 || SIZE == 16
> +# define S
> +# define UXT mov
> +#else
> +# error
> +#endif
> +
> +#if MODEL == 1
> +# define SUFF _relax
> +# define A
> +# define L
> +#elif MODEL == 2
> +# define SUFF _acq
> +# define A a
> +# define L
> +#elif MODEL == 3
> +# define SUFF _rel
> +# define A
> +# define L l
> +#elif MODEL == 4
> +# define SUFF _acq_rel
> +# define A a
> +# define L l
> +#else
> +# error
> +#endif
> +
> +/* Concatenate symbols. */
> +#define glue2_(A, B) A ## B
> +#define glue2(A, B) glue2_(A, B)
> +#define glue3_(A, B, C) A ## B ## C
> +#define glue3(A, B, C) glue3_(A, B, C)
> +#define glue4_(A, B, C, D) A ## B ## C ## D
> +#define glue4(A, B, C, D) glue4_(A, B, C, D)
> +
> +/* Select the size of a register, given a regno. */
> +#define x(N) glue2(x, N)
> +#define w(N) glue2(w, N)
> +#if SIZE < 8
> +# define s(N) w(N)
> +#else
> +# define s(N) x(N)
> +#endif
> +
> +#define NAME(BASE) glue4(__aarch64_, BASE, SIZE, SUFF)
> +#define LDXR glue4(ld, A, xr, S)
> +#define STXR glue4(st, L, xr, S)
> +
> +/* Temporary registers used. Other than these, only the return value
> + register (x0) and the flags are modified. */
> +#define tmp0 16
> +#define tmp1 17
> +#define tmp2 15
> +
> +/* Start and end a function. */
> +.macro STARTFN name
> + .text
> + .balign 16
> + .globl \name
> + .hidden \name
> + .type \name, %function
> + .cfi_startproc
> +\name:
> +.endm
> +
> +.macro ENDFN name
> + .cfi_endproc
> + .size \name, . - \name
> +.endm
> +
> +/* Branch to LABEL if LSE is disabled. */
> +.macro JUMP_IF_NOT_LSE label
> + adrp x(tmp0), __aarch64_have_lse_atomics
> + ldrb w(tmp0), [x(tmp0), :lo12:__aarch64_have_lse_atomics]
> + cbz w(tmp0), \label
> +.endm
> +
> +#ifdef L_cas
> +
> +STARTFN NAME(cas)
> + JUMP_IF_NOT_LSE 8f
> +
> +#if SIZE < 16
> +#define CAS glue4(cas, A, L, S)
> +
> + CAS s(0), s(1), [x2]
> + ret
> +
> +8: UXT s(tmp0), s(0)
> +0: LDXR s(0), [x2]
> + cmp s(0), s(tmp0)
> + bne 1f
> + STXR w(tmp1), s(1), [x2]
> + cbnz w(tmp1), 0b
> +1: ret
> +
> +#else
> +#define LDXP glue3(ld, A, xp)
> +#define STXP glue3(st, L, xp)
> +#define CASP glue3(casp, A, L)
> +
> + CASP x0, x1, x2, x3, [x4]
> + ret
> +
> +8: mov x(tmp0), x0
> + mov x(tmp1), x1
> +0: LDXP x0, x1, [x4]
> + cmp x0, x(tmp0)
> + ccmp x1, x(tmp1), #0, eq
> + bne 1f
> + STXP w(tmp2), x2, x3, [x4]
> + cbnz w(tmp2), 0b
> +1: ret
> +
> +#endif
> +
> +ENDFN NAME(cas)
> +#endif
> +
> +#ifdef L_swp
> +#define SWP glue4(swp, A, L, S)
> +
> +STARTFN NAME(swp)
> + JUMP_IF_NOT_LSE 8f
> +
> + SWP s(0), s(0), [x1]
> + ret
> +
> +8: mov s(tmp0), s(0)
> +0: LDXR s(0), [x1]
> + STXR w(tmp1), s(tmp0), [x1]
> + cbnz w(tmp1), 0b
> + ret
> +
> +ENDFN NAME(swp)
> +#endif
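In C terms, both paths of the swp helper behave like an atomic exchange that returns the old memory contents. A minimal sketch using the GCC builtin (the function name is invented):

```c
/* C-level equivalent of __aarch64_swp4_acq_rel: store NEWVAL and
   return the previous contents.  On LSE hardware the helper is a
   single SWPAL; otherwise it is the LDXR/STXR retry loop above.  */
static int swap4_sketch (int *ptr, int newval)
{
  return __atomic_exchange_n (ptr, newval, __ATOMIC_ACQ_REL);
}
```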
> +
> +#if defined(L_ldadd) || defined(L_ldclr) \
> + || defined(L_ldeor) || defined(L_ldset)
> +
> +#ifdef L_ldadd
> +#define LDNM ldadd
> +#define OP add
> +#elif defined(L_ldclr)
> +#define LDNM ldclr
> +#define OP bic
> +#elif defined(L_ldeor)
> +#define LDNM ldeor
> +#define OP eor
> +#elif defined(L_ldset)
> +#define LDNM ldset
> +#define OP orr
> +#else
> +#error
> +#endif
> +#define LDOP glue4(LDNM, A, L, S)
> +
> +STARTFN NAME(LDNM)
> + JUMP_IF_NOT_LSE 8f
> +
> + LDOP s(0), s(0), [x1]
> + ret
> +
> +8: mov s(tmp0), s(0)
> +0: LDXR s(0), [x1]
> + OP s(tmp1), s(0), s(tmp0)
> + STXR w(tmp2), s(tmp1), [x1]
> + cbnz w(tmp2), 0b
> + ret
> +
> +ENDFN NAME(LDNM)
> +#endif
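The two-level glue macros exist so that SIZE and SUFF expand before token pasting is applied. A hedged stand-alone demonstration of the same trick (the SIZE/SUFF values are one example instantiation; the stringizing helpers are added here purely so the result can be inspected):

```c
#include <string.h>

/* Same two-level pasting trick as in lse.S: the extra layer forces
   SIZE and SUFF to expand before ## is applied.  */
#define glue4_(A, B, C, D) A##B##C##D
#define glue4(A, B, C, D) glue4_(A, B, C, D)

#define SIZE 4
#define SUFF _acq
#define NAME(BASE) glue4 (__aarch64_, BASE, SIZE, SUFF)

/* Stringize the pasted token so it can be checked.  */
#define str_(X) #X
#define str(X) str_(X)

/* With SIZE=4 and SUFF=_acq, NAME(cas) pastes to __aarch64_cas4_acq,
   the symbol built when compiling with -DL_cas -DSIZE=4.  */
static const char *cas_symbol = str (NAME (cas));
```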
> diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
> new file mode 100644
> index 00000000000..c7f4223cd45
> --- /dev/null
> +++ b/libgcc/config/aarch64/t-lse
> @@ -0,0 +1,44 @@
> +# Out-of-line LSE atomics for AArch64 architecture.
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +# Contributed by Linaro Ltd.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify it
> +# under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3. If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Compare-and-swap has 5 sizes and 4 memory models.
> +S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
> +O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
> +
> +# Swap and load-and-operate have 4 sizes and 4 memory models.
> +S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
> +O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
> +
> +LSE_OBJS := $(O0) $(O1)
> +
> +libgcc-objects += $(LSE_OBJS) lse-init$(objext)
> +
> +empty =
> +space = $(empty) $(empty)
> +PAT_SPLIT = $(subst _,$(space),$(*F))
> +PAT_BASE = $(word 1,$(PAT_SPLIT))
> +PAT_N = $(word 2,$(PAT_SPLIT))
> +PAT_M = $(word 3,$(PAT_SPLIT))
> +
> +lse-init$(objext): $(srcdir)/config/aarch64/lse-init.c
> + $(gcc_compile) -c $<
> +
> +$(LSE_OBJS): $(srcdir)/config/aarch64/lse.S
> + $(gcc_compile) -DL_$(PAT_BASE) -DSIZE=$(PAT_N) -DMODEL=$(PAT_M) -c $<
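The pattern rule relies on make splitting each object stem on underscores. A hedged C mirror of that split (the stem and flag spellings follow the makefile above; the helper function itself is illustrative):

```c
#include <stdio.h>
#include <string.h>

/* Mirror of PAT_BASE/PAT_N/PAT_M: the stem "cas_16_4" of cas_16_4.o
   splits into base "cas", size "16" and memory model "4", which the
   rule turns into -DL_cas -DSIZE=16 -DMODEL=4.  */
static void split_stem (const char *stem, char *flags, size_t len)
{
  char buf[32];
  snprintf (buf, sizeof buf, "%s", stem);
  char *base = strtok (buf, "_");   /* PAT_BASE */
  char *n = strtok (NULL, "_");     /* PAT_N    */
  char *m = strtok (NULL, "_");     /* PAT_M    */
  snprintf (flags, len, "-DL_%s -DSIZE=%s -DMODEL=%s", base, n, m);
}
```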
* Re: [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics
2019-09-18 1:58 ` [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics Richard Henderson
@ 2019-09-18 12:58 ` Kyrill Tkachov
0 siblings, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2019-09-18 12:58 UTC (permalink / raw)
To: Richard Henderson, gcc-patches
Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh
On 9/18/19 2:58 AM, Richard Henderson wrote:
> * config/aarch64/aarch64.opt (-moutline-atomics): New.
> * config/aarch64/aarch64.c (aarch64_atomic_ool_func): New.
> (aarch64_ool_cas_names, aarch64_ool_swp_names): New.
> (aarch64_ool_ldadd_names, aarch64_ool_ldset_names): New.
> (aarch64_ool_ldclr_names, aarch64_ool_ldeor_names): New.
> (aarch64_expand_compare_and_swap): Honor TARGET_OUTLINE_ATOMICS.
> * config/aarch64/atomics.md (atomic_exchange<ALLI>): Likewise.
> (atomic_<atomic_op><ALLI>): Likewise.
> (atomic_fetch_<atomic_op><ALLI>): Likewise.
> (atomic_<atomic_op>_fetch<ALLI>): Likewise.
> testsuite/
> * gcc.target/aarch64/atomic-op-acq_rel.c: Use -mno-outline-atomics.
> * gcc.target/aarch64/atomic-comp-swap-release-acquire.c: Likewise.
> * gcc.target/aarch64/atomic-op-acquire.c: Likewise.
> * gcc.target/aarch64/atomic-op-char.c: Likewise.
> * gcc.target/aarch64/atomic-op-consume.c: Likewise.
> * gcc.target/aarch64/atomic-op-imm.c: Likewise.
> * gcc.target/aarch64/atomic-op-int.c: Likewise.
> * gcc.target/aarch64/atomic-op-long.c: Likewise.
> * gcc.target/aarch64/atomic-op-relaxed.c: Likewise.
> * gcc.target/aarch64/atomic-op-release.c: Likewise.
> * gcc.target/aarch64/atomic-op-seq_cst.c: Likewise.
> * gcc.target/aarch64/atomic-op-short.c: Likewise.
> * gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c: Likewise.
> * gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: Likewise.
> * gcc.target/aarch64/sync-comp-swap.c: Likewise.
> * gcc.target/aarch64/sync-op-acquire.c: Likewise.
> * gcc.target/aarch64/sync-op-full.c: Likewise.
> ---
> gcc/config/aarch64/aarch64-protos.h | 13 +++
> gcc/config/aarch64/aarch64.c | 87 +++++++++++++++++
> .../atomic-comp-swap-release-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-char.c | 2 +-
> .../gcc.target/aarch64/atomic-op-consume.c | 2 +-
> .../gcc.target/aarch64/atomic-op-imm.c | 2 +-
> .../gcc.target/aarch64/atomic-op-int.c | 2 +-
> .../gcc.target/aarch64/atomic-op-long.c | 2 +-
> .../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
> .../gcc.target/aarch64/atomic-op-release.c | 2 +-
> .../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
> .../gcc.target/aarch64/atomic-op-short.c | 2 +-
> .../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
> .../atomic_cmp_exchange_zero_strong_1.c | 2 +-
> .../gcc.target/aarch64/sync-comp-swap.c | 2 +-
> .../gcc.target/aarch64/sync-op-acquire.c | 2 +-
> .../gcc.target/aarch64/sync-op-full.c | 2 +-
> gcc/config/aarch64/aarch64.opt | 3 +
> gcc/config/aarch64/atomics.md | 94 +++++++++++++++++--
> gcc/doc/invoke.texi | 16 +++-
> 22 files changed, 221 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index c4b73d26df6..1c1aac7201a 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -696,4 +696,17 @@ poly_uint64 aarch64_regmode_natural_size (machine_mode);
>
> bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
>
> +struct atomic_ool_names
> +{
> + const char *str[5][4];
> +};
> +
> +rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
> + const atomic_ool_names *names);
> +extern const atomic_ool_names aarch64_ool_swp_names;
> +extern const atomic_ool_names aarch64_ool_ldadd_names;
> +extern const atomic_ool_names aarch64_ool_ldset_names;
> +extern const atomic_ool_names aarch64_ool_ldclr_names;
> +extern const atomic_ool_names aarch64_ool_ldeor_names;
> +
> #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b937514e6f8..56a4a47db73 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -16867,6 +16867,82 @@ aarch64_emit_unlikely_jump (rtx insn)
> add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
> }
>
> +/* We store the names of the various atomic helpers in a 5x4 array.
> + Return the libcall function given MODE, MODEL and NAMES. */
> +
> +rtx
> +aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
> + const atomic_ool_names *names)
> +{
> + memmodel model = memmodel_base (INTVAL (model_rtx));
> + int mode_idx, model_idx;
> +
> + switch (mode)
> + {
> + case E_QImode:
> + mode_idx = 0;
> + break;
> + case E_HImode:
> + mode_idx = 1;
> + break;
> + case E_SImode:
> + mode_idx = 2;
> + break;
> + case E_DImode:
> + mode_idx = 3;
> + break;
> + case E_TImode:
> + mode_idx = 4;
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + switch (model)
> + {
> + case MEMMODEL_RELAXED:
> + model_idx = 0;
> + break;
> + case MEMMODEL_CONSUME:
> + case MEMMODEL_ACQUIRE:
> + model_idx = 1;
> + break;
> + case MEMMODEL_RELEASE:
> + model_idx = 2;
> + break;
> + case MEMMODEL_ACQ_REL:
> + case MEMMODEL_SEQ_CST:
> + model_idx = 3;
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + return init_one_libfunc_visibility (names->str[mode_idx][model_idx],
> + VISIBILITY_HIDDEN);
> +}
> +
> +#define DEF0(B, N) \
> + { "__aarch64_" #B #N "_relax", \
> + "__aarch64_" #B #N "_acq", \
> + "__aarch64_" #B #N "_rel", \
> + "__aarch64_" #B #N "_acq_rel" }
> +
> +#define DEF4(B) DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \
> + { NULL, NULL, NULL, NULL }
> +#define DEF5(B) DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), DEF0(B, 16)
> +
> +static const atomic_ool_names aarch64_ool_cas_names = { { DEF5(cas) } };
> +const atomic_ool_names aarch64_ool_swp_names = { { DEF4(swp) } };
> +const atomic_ool_names aarch64_ool_ldadd_names = { { DEF4(ldadd) } };
> +const atomic_ool_names aarch64_ool_ldset_names = { { DEF4(ldset) } };
> +const atomic_ool_names aarch64_ool_ldclr_names = { { DEF4(ldclr) } };
> +const atomic_ool_names aarch64_ool_ldeor_names = { { DEF4(ldeor) } };
> +
> +#undef DEF0
> +#undef DEF4
> +#undef DEF5
> +
> /* Expand a compare and swap pattern. */
>
> void
> @@ -16913,6 +16989,17 @@ aarch64_expand_compare_and_swap (rtx operands[])
> newval, mod_s));
> cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
> }
> + else if (TARGET_OUTLINE_ATOMICS)
> + {
> + /* Oldval must satisfy compare afterward. */
> + if (!aarch64_plus_operand (oldval, mode))
> + oldval = force_reg (mode, oldval);
> + rtx func = aarch64_atomic_ool_func (mode, mod_s, &aarch64_ool_cas_names);
> + rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
> + oldval, mode, newval, mode,
> + XEXP (mem, 0), Pmode);
> + cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
> + }
> else
> {
> /* The oldval predicate varies by mode. Test it and force to reg. */
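The two switch statements in aarch64_atomic_ool_func reduce five sizes and six memory models to a 5x4 table index. A hedged stand-alone rendering of that collapse (the enum and function names are invented stand-ins for gcc's internals; the suffix strings match the DEF0 macro in the patch):

```c
#include <stdio.h>
#include <string.h>

/* Invented stand-in for gcc's memmodel enum.  */
enum model { RELAXED, CONSUME, ACQUIRE, RELEASE, ACQ_REL, SEQ_CST };

/* Consume degrades to acquire, and seq_cst to acq_rel, exactly as in
   aarch64_atomic_ool_func's model switch.  */
static const char *model_suffix (enum model m)
{
  switch (m)
    {
    case RELAXED:                return "relax";
    case CONSUME: case ACQUIRE:  return "acq";
    case RELEASE:                return "rel";
    case ACQ_REL: case SEQ_CST:  return "acq_rel";
    }
  return "";
}

/* Assemble a helper name the same way DEF0 pastes it together.  */
static void helper_name (char *buf, size_t len, const char *base,
                         int size, enum model m)
{
  snprintf (buf, len, "__aarch64_%s%d_%s", base, size, model_suffix (m));
}
```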
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
> index 49ca5d0d09c..a828a72aa75 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-comp-swap-release-acquire.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
>
> #include "atomic-comp-swap-release-acquire.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
> index 74f26348e42..6823ce381b2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acq_rel.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-acq_rel.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
> index 66c1b1efe20..87937de378a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-acquire.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-acquire.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
> index c09d0434ecf..60955e57da3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-char.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-char.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
> index 5783ab84f5c..16cb11aeeaf 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-consume.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-consume.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
> index 18b8f0b04e9..bcab4e481e3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> int v = 0;
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
> index 8520f0839ba..040e4a8d168 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-int.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-int.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
> index d011f8c5ce2..fc88b92cd3e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-long.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> long v = 0;
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
> index ed96bfdb978..503d62b0280 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-relaxed.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-relaxed.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
> index fc4be17de89..efe14aea7e4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-release.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-release.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
> index 613000fe490..09973bf82ba 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-seq_cst.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-seq_cst.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
> index e82c8118ece..e1dcebb0f89 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-op-short.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "atomic-op-short.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
> index f2a21ddf2e1..29246979bfb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O2 -march=armv8-a+nolse" } */
> +/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
> /* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
>
> int
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> index 8d2ae67dfbe..6daf9b08f5a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O2 -march=armv8-a+nolse" } */
> +/* { dg-options "-O2 -march=armv8-a+nolse -mno-outline-atomics" } */
> /* { dg-skip-if "" { *-*-* } { "-mcpu=*" } { "" } } */
>
> int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
> index e571b2f13b3..f56415f3354 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -mno-outline-atomics" } */
>
> #include "sync-comp-swap.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
> index 357bf1be3b2..39b3144aa36 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "sync-op-acquire.x"
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
> index c6ba1629965..6b8b2043f40 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=armv8-a+nolse -O2" } */
> +/* { dg-options "-march=armv8-a+nolse -O2 -mno-outline-atomics" } */
>
> #include "sync-op-full.x"
>
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 55d466068b8..865b6a6d8ca 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -255,3 +255,6 @@ user-land code.
> TargetVariable
> long aarch64_stack_protector_guard_offset = 0
>
> +moutline-atomics
> +Target Report Mask(OUTLINE_ATOMICS) Save
> +Generate local calls to out-of-line atomic operations.
> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index f8bdd048b37..2e59b868420 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -186,16 +186,27 @@
> (match_operand:SI 3 "const_int_operand")]
> ""
> {
> - rtx (*gen) (rtx, rtx, rtx, rtx);
> -
> /* Use an atomic SWP when available. */
> if (TARGET_LSE)
> - gen = gen_aarch64_atomic_exchange<mode>_lse;
> + {
> + emit_insn (gen_aarch64_atomic_exchange<mode>_lse
> + (operands[0], operands[1], operands[2], operands[3]));
> + }
> + else if (TARGET_OUTLINE_ATOMICS)
> + {
> + machine_mode mode = <MODE>mode;
> + rtx func = aarch64_atomic_ool_func (mode, operands[3],
> + &aarch64_ool_swp_names);
> + rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL,
> + mode, operands[2], mode,
> + XEXP (operands[1], 0), Pmode);
> + emit_move_insn (operands[0], rval);
> + }
> else
> - gen = gen_aarch64_atomic_exchange<mode>;
> -
> - emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
> -
> + {
> + emit_insn (gen_aarch64_atomic_exchange<mode>
> + (operands[0], operands[1], operands[2], operands[3]));
> + }
> DONE;
> }
> )
> @@ -280,6 +291,39 @@
> }
> operands[1] = force_reg (<MODE>mode, operands[1]);
> }
> + else if (TARGET_OUTLINE_ATOMICS)
> + {
> + const atomic_ool_names *names;
> + switch (<CODE>)
> + {
> + case MINUS:
> + operands[1] = expand_simple_unop (<MODE>mode, NEG, operands[1],
> + NULL, 1);
> + /* fallthru */
> + case PLUS:
> + names = &aarch64_ool_ldadd_names;
> + break;
> + case IOR:
> + names = &aarch64_ool_ldset_names;
> + break;
> + case XOR:
> + names = &aarch64_ool_ldeor_names;
> + break;
> + case AND:
> + operands[1] = expand_simple_unop (<MODE>mode, NOT, operands[1],
> + NULL, 1);
> + names = &aarch64_ool_ldclr_names;
> + break;
> + default:
> + gcc_unreachable ();
> + }
> + machine_mode mode = <MODE>mode;
> + rtx func = aarch64_atomic_ool_func (mode, operands[2], names);
> + emit_library_call_value (func, NULL_RTX, LCT_NORMAL, mode,
> + operands[1], mode,
> + XEXP (operands[0], 0), Pmode);
> + DONE;
> + }
> else
> gen = gen_aarch64_atomic_<atomic_optab><mode>;
>
> @@ -405,6 +449,40 @@
> }
> operands[2] = force_reg (<MODE>mode, operands[2]);
> }
> + else if (TARGET_OUTLINE_ATOMICS)
> + {
> + const atomic_ool_names *names;
> + switch (<CODE>)
> + {
> + case MINUS:
> + operands[2] = expand_simple_unop (<MODE>mode, NEG, operands[2],
> + NULL, 1);
> + /* fallthru */
> + case PLUS:
> + names = &aarch64_ool_ldadd_names;
> + break;
> + case IOR:
> + names = &aarch64_ool_ldset_names;
> + break;
> + case XOR:
> + names = &aarch64_ool_ldeor_names;
> + break;
> + case AND:
> + operands[2] = expand_simple_unop (<MODE>mode, NOT, operands[2],
> + NULL, 1);
> + names = &aarch64_ool_ldclr_names;
> + break;
> + default:
> + gcc_unreachable ();
> + }
> + machine_mode mode = <MODE>mode;
> + rtx func = aarch64_atomic_ool_func (mode, operands[3], names);
> + rtx rval = emit_library_call_value (func, operands[0], LCT_NORMAL, mode,
> + operands[2], mode,
> + XEXP (operands[1], 0), Pmode);
> + emit_move_insn (operands[0], rval);
> + DONE;
> + }
> else
> gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
>
> @@ -494,7 +572,7 @@
> {
> /* Use an atomic load-operate instruction when possible. In this case
> we will re-compute the result from the original mem value. */
> - if (TARGET_LSE)
> + if (TARGET_LSE || TARGET_OUTLINE_ATOMICS)
> {
> rtx tmp = gen_reg_rtx (<MODE>mode);
> operands[2] = force_reg (<MODE>mode, operands[2]);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0e3693598e7..900fda1efb2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -643,7 +643,8 @@ Objective-C and Objective-C++ Dialects}.
> -march=@var{name} -mcpu=@var{name} -mtune=@var{name} @gol
> -moverride=@var{string} -mverbose-cost-dump @gol
> -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
> --mstack-protector-guard-offset=@var{offset} -mtrack-speculation }
> +-mstack-protector-guard-offset=@var{offset} -mtrack-speculation @gol
> +-moutline-atomics }
>
> @emph{Adapteva Epiphany Options}
> @gccoptlist{-mhalf-reg-file -mprefer-short-insn-regs @gol
> @@ -15874,6 +15875,19 @@ be used by the compiler when expanding calls to
> @code{__builtin_speculation_safe_copy} to permit a more efficient code
> sequence to be generated.
>
> +@item -moutline-atomics
> +@itemx -mno-outline-atomics
> +Enable or disable calls to out-of-line helpers to implement atomic operations.
> +These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
> +should be used; if not, they will use the load/store-exclusive instructions
> +that are present in the base ARMv8.0 ISA.
Let's call them "LSE instructions from Armv8.1-A", rather than
ARMv8.1-Atomics.
> +
> +This option is only applicable when compiling for the base ARMv8.0
> +instruction set. If using a later revision, e.g. @option{-march=armv8.1-a}
> +or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
> +used directly. The same applies with @option{-mcpu=} when the
> +selected CPU supports the @samp{lse} feature.
> +
This needs a corresponding ChangeLog entry.
Thanks,
Kyril
> @item -march=@var{name}
> @opindex march
> Specify the name of the target architecture and, optionally, one or
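The documented option can be exercised with plain __atomic builtins. A hedged usage sketch (the compile command and helper names follow the patch's description; the demo function itself is invented, and the observable C behaviour is identical on every target):

```c
/* Compile with: gcc -O2 -march=armv8-a -moutline-atomics demo.c
   On AArch64 the builtins below then become calls to hidden libgcc
   helpers such as __aarch64_ldadd4_acq_rel and __aarch64_cas4_acq_rel
   instead of inline LDXR/STXR loops.  */
static int outline_demo (void)
{
  int counter = 0;
  int expected = 1;

  __atomic_fetch_add (&counter, 1, __ATOMIC_ACQ_REL);
  __atomic_compare_exchange_n (&counter, &expected, 42, 0,
                               __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
  return counter;
}
```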
* Re: [PATCH, AArch64 v4 0/6] LSE atomics out-of-line
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
` (5 preceding siblings ...)
2019-09-18 1:58 ` [PATCH, AArch64 v4 2/6] aarch64: Implement TImode compare-and-swap Richard Henderson
@ 2019-09-18 12:58 ` Kyrill Tkachov
2019-09-19 14:39 ` Richard Henderson
6 siblings, 1 reply; 12+ messages in thread
From: Kyrill Tkachov @ 2019-09-18 12:58 UTC (permalink / raw)
To: Richard Henderson, gcc-patches
Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh
Hi Richard,
On 9/18/19 2:58 AM, Richard Henderson wrote:
> Version 3 was back in November:
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html
>
> Changes since v3:
> * Do not swap_commutative_operands_p in aarch64_gen_compare_reg.
> This is the probable cause of the bootstrap problem that Kyrill reported.
> * Add unwind markers to the out-of-line functions.
> * Use uxt{8,16} instead of mov in CAS functions,
> in preference to including the uxt with the cmp.
> * Prefer the lse case in the out-of-line fallthru (Wilco).
> * Name the option -moutline-atomics (Wilco)
> * Name the variable __aarch64_have_lse_atomics (Wilco);
> fix the definition in lse-init.c.
> * Rename the functions s/__aa64/__aarch64/ (Seemed sensible to match prev)
> * Always use Pmode for the address for libcalls, fixing ilp32 (Kyrill).
>
> Still not done is a custom calling convention during code generation,
> but that can come later as an optimization.
>
> Tested aarch64-linux on a thunder x1.
> I have not run tests on any platform supporting LSE, even qemu.
>
Thanks for this.
I've bootstrapped and tested this patch series on systems with and
without LSE support, both with and without patch [6/6], so 4 setups in
total.
It all looks clean for me.
I'm in favour of this series going in (modulo patch 6/6, leaving the option
to turn it on to the user).
I've got a couple of small comments on some of the patches that IMO can
be fixed when committing.
I'll respond to them individually.
Thanks,
Kyrill
> r~
>
>
> Richard Henderson (6):
> aarch64: Extend %R for integer registers
> aarch64: Implement TImode compare-and-swap
> aarch64: Tidy aarch64_split_compare_and_swap
> aarch64: Add out-of-line functions for LSE atomics
> aarch64: Implement -moutline-atomics
> TESTING: Enable -moutline-atomics by default
>
> gcc/config/aarch64/aarch64-protos.h | 13 +
> gcc/common/config/aarch64/aarch64-common.c | 6 +-
> gcc/config/aarch64/aarch64.c | 204 +++++++++++----
> .../atomic-comp-swap-release-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-char.c | 2 +-
> .../gcc.target/aarch64/atomic-op-consume.c | 2 +-
> .../gcc.target/aarch64/atomic-op-imm.c | 2 +-
> .../gcc.target/aarch64/atomic-op-int.c | 2 +-
> .../gcc.target/aarch64/atomic-op-long.c | 2 +-
> .../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
> .../gcc.target/aarch64/atomic-op-release.c | 2 +-
> .../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
> .../gcc.target/aarch64/atomic-op-short.c | 2 +-
> .../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
> .../atomic_cmp_exchange_zero_strong_1.c | 2 +-
> .../gcc.target/aarch64/sync-comp-swap.c | 2 +-
> .../gcc.target/aarch64/sync-op-acquire.c | 2 +-
> .../gcc.target/aarch64/sync-op-full.c | 2 +-
> libgcc/config/aarch64/lse-init.c | 45 ++++
> gcc/config/aarch64/aarch64.opt | 3 +
> gcc/config/aarch64/atomics.md | 187 +++++++++++++-
> gcc/config/aarch64/iterators.md | 3 +
> gcc/doc/invoke.texi | 16 +-
> libgcc/config.host | 4 +
> libgcc/config/aarch64/lse.S | 235 ++++++++++++++++++
> libgcc/config/aarch64/t-lse | 44 ++++
> 28 files changed, 709 insertions(+), 85 deletions(-)
> create mode 100644 libgcc/config/aarch64/lse-init.c
> create mode 100644 libgcc/config/aarch64/lse.S
> create mode 100644 libgcc/config/aarch64/t-lse
>
* Re: [PATCH, AArch64 v4 0/6] LSE atomics out-of-line
2019-09-18 12:58 ` [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Kyrill Tkachov
@ 2019-09-19 14:39 ` Richard Henderson
0 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2019-09-19 14:39 UTC (permalink / raw)
To: Kyrill Tkachov, Richard Henderson, gcc-patches
Cc: Wilco.Dijkstra, Marcus.Shawcroft, James.Greenhalgh
On 9/18/19 5:58 AM, Kyrill Tkachov wrote:
> Thanks for this.
>
> I've bootstrapped and tested this patch series on systems with and without LSE
> support, both with and without patch [6/6], so 4 setups in total.
>
> It all looks clean for me.
>
> I'm favour of this series going in (modulo patch 6/6, leaving the option to
> turn it on to the user).
>
> I've got a couple of small comments on some of the patches that IMO can be
> fixed when committing.
Thanks. Committed with the requested modifications.
r~
* Re: [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics
2019-09-18 1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
2019-09-18 12:58 ` Kyrill Tkachov
@ 2019-12-23 16:05 ` Roman Zhuykov
1 sibling, 0 replies; 12+ messages in thread
From: Roman Zhuykov @ 2019-12-23 16:05 UTC (permalink / raw)
To: Richard Henderson
Cc: gcc-patches, Wilco.Dijkstra, kyrylo.tkachov, Marcus.Shawcroft,
James.Greenhalgh
This caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93053
--
Roman
Richard Henderson wrote 18.09.2019 04:58:
> This is the libgcc part of the interface -- providing the functions.
> Rationale is provided at the top of libgcc/config/aarch64/lse.S.
>
> * config/aarch64/lse-init.c: New file.
> * config/aarch64/lse.S: New file.
> * config/aarch64/t-lse: New file.
> * config.host: Add t-lse to all aarch64 tuples.
> ---
> libgcc/config/aarch64/lse-init.c | 45 ++++++
> libgcc/config.host | 4 +
> libgcc/config/aarch64/lse.S | 235 +++++++++++++++++++++++++++++++
> libgcc/config/aarch64/t-lse | 44 ++++++
> 4 files changed, 328 insertions(+)
> create mode 100644 libgcc/config/aarch64/lse-init.c
> create mode 100644 libgcc/config/aarch64/lse.S
> create mode 100644 libgcc/config/aarch64/t-lse
>
> diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
> new file mode 100644
> index 00000000000..51fb21d45c9
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse-init.c
> @@ -0,0 +1,45 @@
> +/* Out-of-line LSE atomics for AArch64 architecture, Init.
> + Copyright (C) 2018 Free Software Foundation, Inc.
> + Contributed by Linaro Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +<http://www.gnu.org/licenses/>. */
> +
> +/* Define the symbol gating the LSE implementations. */
> +_Bool __aarch64_have_lse_atomics
> + __attribute__((visibility("hidden"), nocommon));
> +
> +/* Disable initialization of __aarch64_have_lse_atomics during bootstrap. */
> +#ifndef inhibit_libc
> +# include <sys/auxv.h>
> +
> +/* Disable initialization if the system headers are too old. */
> +# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
> +
> +static void __attribute__((constructor))
> +init_have_lse_atomics (void)
> +{
> + unsigned long hwcap = getauxval (AT_HWCAP);
> + __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
> +}
> +
> +# endif /* HWCAP */
> +#endif /* inhibit_libc */
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 728e543ea39..122113fc519 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
> extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
> extra_parts="$extra_parts crtfastmath.o"
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> md_unwind_header=aarch64/aarch64-unwind.h
> ;;
> aarch64*-*-freebsd*)
> extra_parts="$extra_parts crtfastmath.o"
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> md_unwind_header=aarch64/freebsd-unwind.h
> ;;
> @@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
> ;;
> aarch64*-*-fuchsia*)
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
> ;;
> aarch64*-*-linux*)
> extra_parts="$extra_parts crtfastmath.o"
> md_unwind_header=aarch64/linux-unwind.h
> tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> + tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
> tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> ;;
> alpha*-*-linux*)
> diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
> new file mode 100644
> index 00000000000..c24a39242ca
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse.S
> @@ -0,0 +1,235 @@
> +/* Out-of-line LSE atomics for AArch64 architecture.
> + Copyright (C) 2018 Free Software Foundation, Inc.
> + Contributed by Linaro Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +<http://www.gnu.org/licenses/>. */
> +
> +/*
> + * The problem that we are trying to solve is operating system deployment
> + * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
> + *
> + * There are a number of potential solutions for this problem which have
> + * been proposed and rejected for various reasons. To recap:
> + *
> + * (1) Multiple builds. The dynamic linker will examine /lib64/atomics/
> + * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
> + * However, not all Linux distributions are happy with multiple builds,
> + * and anyway it has no effect on main applications.
> + *
> + * (2) IFUNC. We could put these functions into libgcc_s.so, and have
> + * a single copy of each function for all DSOs. However, ARM is concerned
> + * that the branch-to-indirect-branch that is implied by using a PLT,
> + * as required by IFUNC, is too much overhead for smaller cpus.
> + *
> + * (3) Statically predicted direct branches. This is the approach that
> + * is taken here. These functions are linked into every DSO that uses them.
> + * All of the symbols are hidden, so that the functions are called via a
> + * direct branch. The choice of LSE vs non-LSE is done via one byte load
> + * followed by a well-predicted direct branch. The functions are compiled
> + * separately to minimize code size.
> + */
> +
> +/* Tell the assembler to accept LSE instructions. */
> + .arch armv8-a+lse
> +
> +/* Declare the symbol gating the LSE implementations. */
> + .hidden __aarch64_have_lse_atomics
> +
> +/* Turn size and memory model defines into mnemonic fragments. */
> +#if SIZE == 1
> +# define S b
> +# define UXT uxtb
> +#elif SIZE == 2
> +# define S h
> +# define UXT uxth
> +#elif SIZE == 4 || SIZE == 8 || SIZE == 16
> +# define S
> +# define UXT mov
> +#else
> +# error
> +#endif
> +
> +#if MODEL == 1
> +# define SUFF _relax
> +# define A
> +# define L
> +#elif MODEL == 2
> +# define SUFF _acq
> +# define A a
> +# define L
> +#elif MODEL == 3
> +# define SUFF _rel
> +# define A
> +# define L l
> +#elif MODEL == 4
> +# define SUFF _acq_rel
> +# define A a
> +# define L l
> +#else
> +# error
> +#endif
> +
> +/* Concatenate symbols. */
> +#define glue2_(A, B) A ## B
> +#define glue2(A, B) glue2_(A, B)
> +#define glue3_(A, B, C) A ## B ## C
> +#define glue3(A, B, C) glue3_(A, B, C)
> +#define glue4_(A, B, C, D) A ## B ## C ## D
> +#define glue4(A, B, C, D) glue4_(A, B, C, D)
> +
> +/* Select the size of a register, given a regno. */
> +#define x(N) glue2(x, N)
> +#define w(N) glue2(w, N)
> +#if SIZE < 8
> +# define s(N) w(N)
> +#else
> +# define s(N) x(N)
> +#endif
> +
> +#define NAME(BASE) glue4(__aarch64_, BASE, SIZE, SUFF)
> +#define LDXR glue4(ld, A, xr, S)
> +#define STXR glue4(st, L, xr, S)
> +
> +/* Temporary registers used. Other than these, only the return value
> + register (x0) and the flags are modified. */
> +#define tmp0 16
> +#define tmp1 17
> +#define tmp2 15
> +
> +/* Start and end a function. */
> +.macro STARTFN name
> + .text
> + .balign 16
> + .globl \name
> + .hidden \name
> + .type \name, %function
> + .cfi_startproc
> +\name:
> +.endm
> +
> +.macro ENDFN name
> + .cfi_endproc
> + .size \name, . - \name
> +.endm
> +
> +/* Branch to LABEL if LSE is disabled. */
> +.macro JUMP_IF_NOT_LSE label
> + adrp x(tmp0), __aarch64_have_lse_atomics
> + ldrb w(tmp0), [x(tmp0), :lo12:__aarch64_have_lse_atomics]
> + cbz w(tmp0), \label
> +.endm
> +
> +#ifdef L_cas
> +
> +STARTFN NAME(cas)
> + JUMP_IF_NOT_LSE 8f
> +
> +#if SIZE < 16
> +#define CAS glue4(cas, A, L, S)
> +
> + CAS s(0), s(1), [x2]
> + ret
> +
> +8: UXT s(tmp0), s(0)
> +0: LDXR s(0), [x2]
> + cmp s(0), s(tmp0)
> + bne 1f
> + STXR w(tmp1), s(1), [x2]
> + cbnz w(tmp1), 0b
> +1: ret
> +
> +#else
> +#define LDXP glue3(ld, A, xp)
> +#define STXP glue3(st, L, xp)
> +#define CASP glue3(casp, A, L)
> +
> + CASP x0, x1, x2, x3, [x4]
> + ret
> +
> +8: mov x(tmp0), x0
> + mov x(tmp1), x1
> +0: LDXP x0, x1, [x4]
> + cmp x0, x(tmp0)
> + ccmp x1, x(tmp1), #0, eq
> + bne 1f
> + STXP w(tmp2), x(tmp0), x(tmp1), [x4]
> + cbnz w(tmp2), 0b
> +1: ret
> +
> +#endif
> +
> +ENDFN NAME(cas)
> +#endif
> +
> +#ifdef L_swp
> +#define SWP glue4(swp, A, L, S)
> +
> +STARTFN NAME(swp)
> + JUMP_IF_NOT_LSE 8f
> +
> + SWP s(0), s(0), [x1]
> + ret
> +
> +8: mov s(tmp0), s(0)
> +0: LDXR s(0), [x1]
> + STXR w(tmp1), s(tmp0), [x1]
> + cbnz w(tmp1), 0b
> + ret
> +
> +ENDFN NAME(swp)
> +#endif
> +
> +#if defined(L_ldadd) || defined(L_ldclr) \
> + || defined(L_ldeor) || defined(L_ldset)
> +
> +#ifdef L_ldadd
> +#define LDNM ldadd
> +#define OP add
> +#elif defined(L_ldclr)
> +#define LDNM ldclr
> +#define OP bic
> +#elif defined(L_ldeor)
> +#define LDNM ldeor
> +#define OP eor
> +#elif defined(L_ldset)
> +#define LDNM ldset
> +#define OP orr
> +#else
> +#error
> +#endif
> +#define LDOP glue4(LDNM, A, L, S)
> +
> +STARTFN NAME(LDNM)
> + JUMP_IF_NOT_LSE 8f
> +
> + LDOP s(0), s(0), [x1]
> + ret
> +
> +8: mov s(tmp0), s(0)
> +0: LDXR s(0), [x1]
> + OP s(tmp1), s(0), s(tmp0)
> + STXR w(tmp1), s(tmp1), [x1]
> + cbnz w(tmp1), 0b
> + ret
> +
> +ENDFN NAME(LDNM)
> +#endif
> diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
> new file mode 100644
> index 00000000000..c7f4223cd45
> --- /dev/null
> +++ b/libgcc/config/aarch64/t-lse
> @@ -0,0 +1,44 @@
> +# Out-of-line LSE atomics for AArch64 architecture.
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +# Contributed by Linaro Ltd.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify it
> +# under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3. If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Compare-and-swap has 5 sizes and 4 memory models.
> +S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
> +O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
> +
> +# Swap, Load-and-operate have 4 sizes and 4 memory models
> +S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
> +O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
> +
> +LSE_OBJS := $(O0) $(O1)
> +
> +libgcc-objects += $(LSE_OBJS) lse-init$(objext)
> +
> +empty =
> +space = $(empty) $(empty)
> +PAT_SPLIT = $(subst _,$(space),$(*F))
> +PAT_BASE = $(word 1,$(PAT_SPLIT))
> +PAT_N = $(word 2,$(PAT_SPLIT))
> +PAT_M = $(word 3,$(PAT_SPLIT))
> +
> +lse-init$(objext): $(srcdir)/config/aarch64/lse-init.c
> + $(gcc_compile) -c $<
> +
> +$(LSE_OBJS): $(srcdir)/config/aarch64/lse.S
> + $(gcc_compile) -DL_$(PAT_BASE) -DSIZE=$(PAT_N) -DMODEL=$(PAT_M) -c $<
end of thread, other threads:[~2019-12-23 15:38 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-18 1:58 [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics Richard Henderson
2019-09-18 12:58 ` Kyrill Tkachov
2019-09-18 1:58 ` [PATCH, AArch64 v4 3/6] aarch64: Tidy aarch64_split_compare_and_swap Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default Richard Henderson
2019-09-18 1:58 ` [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
2019-09-18 12:58 ` Kyrill Tkachov
2019-12-23 16:05 ` Roman Zhuykov
2019-09-18 1:58 ` [PATCH, AArch64 v4 2/6] aarch64: Implement TImode compare-and-swap Richard Henderson
2019-09-18 12:58 ` [PATCH, AArch64 v4 0/6] LSE atomics out-of-line Kyrill Tkachov
2019-09-19 14:39 ` Richard Henderson