public inbox for gcc-patches@gcc.gnu.org
* [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.
@ 2015-09-17 16:39 Matthew Wahab
  2015-09-17 16:42 ` [AArch64][PATCH 2/5] Add BIC instruction Matthew Wahab
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Matthew Wahab @ 2015-09-17 16:39 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3442 bytes --]

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch series adds the
instructions to GCC, making them available with -march=armv8.1-a or
-march=armv8-a+lse, and uses them to implement the __sync and __atomic
builtins.

The ARMv8.1 swap instruction swaps the value in a register with a value
in memory. The load-operate instructions load a value from memory,
update it with the result of an operation and store the result in
memory.
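
Informally, and ignoring the memory ordering, the two kinds of
instruction behave as if the following ran as a single atomic step. This
is a C sketch of the semantics only, not of any generated code:

  #include <stdint.h>

  /* SWP S, D, [A]: swap a register with memory, returning the old data.  */
  uint64_t
  swp (uint64_t *a, uint64_t s)
  {
    uint64_t d = *a;   /* D <- [A]  */
    *a = s;            /* [A] <- S  */
    return d;
  }

  /* LDADD S, D, [A]: one of the load-operate forms.  */
  uint64_t
  ldadd (uint64_t *a, uint64_t s)
  {
    uint64_t d = *a;   /* D <- [A]  */
    *a = d + s;        /* [A] <- [A] + S  */
    return d;
  }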

This series uses the swap instruction to implement the atomic_exchange
patterns and the load-operate instructions to implement the
atomic_fetch_<op> and atomic_<op>_fetch patterns. For the
atomic_<op>_fetch patterns, the value returned as the result of the
operation has to be recalculated from the loaded data. An expander for
the AArch64 BIC instruction is added so that it can be used for this
recalculation.

The patches in this series:
- add and use the atomic swap instruction,
- add the AArch64 BIC instruction,
- add the ARMv8.1 load-operate instructions,
- use the load-operate instructions to implement the atomic_fetch_<op>
   patterns,
- use the load-operate instructions to implement the atomic_<op>_fetch
   patterns.

The code-generation changes in this series are based around a new
function, aarch64_gen_atomic_ldop, which takes the operation to be
implemented and emits the appropriate code, making use of the atomic
instruction. This follows the existing use of aarch64_split_atomic_op
for the same purpose when the atomic instructions aren't available.

This patch adds the ARMv8.1 SWP instruction and the function
aarch64_gen_atomic_ldop, and changes the implementation of the
atomic_exchange pattern to use the atomic instruction when it is
available.

The general form of the code generated for an atomic_exchange, with
destination D, source S, memory address A and memory order MO is:

    swp<mo><sz> S, D, [A]

where
    <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
    <sz> is one of {'', 'b', 'h'} depending on the data size.
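
For example, the following function (a sketch; the registers in the
comment are only indicative) should compile to a single swap instruction
when the LSE instructions are available:

  #include <stdint.h>

  uint32_t
  exchange (uint32_t *ptr, uint32_t newval)
  {
    /* With -march=armv8-a+lse, expect something like
         swpal  w1, w0, [x0]
       rather than an LDAXR/STLXR loop.  */
    return __atomic_exchange_n (ptr, newval, __ATOMIC_SEQ_CST);
  }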

This patch also adds tests for the changes. These reuse the support code
introduced for the atomic CAS tests, adding macros to test functions
taking one memory ordering argument. These are used to iteratively
define functions using the __atomic_exchange builtins, which should be
implemented using the atomic swap.
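
For example, TEST_ONE (swap_atomic, SWAP_ATOMIC) expands, for each
memory model and each supported type, to functions along the lines of
the sketch below (the exact name produced by the FNNAME macro is an
assumption here):

  int
  swap_atomic_relaxed_int (int *val, int foo)
  {
    return __atomic_exchange_n (val, foo, __ATOMIC_RELAXED);
  }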

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

gcc/
2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
	Declare.
	* config/aarch64/aarch64.c (aarch64_emit_atomic_swp): New.
	(aarch64_gen_atomic_ldop): New.
	(aarch64_split_atomic_op): Fix whitespace and add a comment.
	* config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New.
	(atomic_compare_and_swap<mode>_lse): Remove comments and fix
	whitespace.
	(atomic_exchange<mode>): Replace with an expander.
	(aarch64_atomic_exchange<mode>): New.
	(aarch64_atomic_exchange<mode>_lse): New.
	(aarch64_atomic_<atomic_optab><mode>): Fix some whitespace.
	(aarch64_atomic_swp<mode>): New.


gcc/testsuite/
2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/atomic-inst-ops.inc (TEST_MODEL): New.
	(TEST_ONE): New.
	* gcc.target/aarch64/atomic-inst-swap.c: New.


[-- Attachment #2: 0001-Add-atomic-SWP-instruction.patch --]
[-- Type: text/x-patch, Size: 10975 bytes --]

From 425fa05a5e3656c8d6d0d085628424b4c846cd49 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Fri, 7 Aug 2015 17:18:37 +0100
Subject: [PATCH 1/5] Add atomic SWP instruction

Change-Id: I87bf48526cb11e65edd15691f5eab20446e418c9
---
 gcc/config/aarch64/aarch64-protos.h                |  1 +
 gcc/config/aarch64/aarch64.c                       | 46 ++++++++++-
 gcc/config/aarch64/atomics.md                      | 92 +++++++++++++++++++---
 .../gcc.target/aarch64/atomic-inst-ops.inc         | 13 +++
 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c | 44 +++++++++++
 5 files changed, 183 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ff19851..eba4c76 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,7 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4d2126b..dc05c6e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11185,11 +11185,54 @@ aarch64_split_compare_and_swap (rtx operands[])
     aarch64_emit_post_barrier (model);
 }
 
+/* Emit an atomic swap.  */
+
+static void
+aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
+			  rtx mem, rtx model)
+{
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case QImode: gen = gen_aarch64_atomic_swpqi; break;
+    case HImode: gen = gen_aarch64_atomic_swphi; break;
+    case SImode: gen = gen_aarch64_atomic_swpsi; break;
+    case DImode: gen = gen_aarch64_atomic_swpdi; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, mem, value, model));
+}
+
+/* Emit an atomic operation where the architecture supports it.  */
+
+void
+aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
+			 rtx mem, rtx value, rtx model_rtx)
+{
+  machine_mode mode = GET_MODE (mem);
+
+  out_data = gen_lowpart (mode, out_data);
+
+  switch (code)
+    {
+    case SET:
+      aarch64_emit_atomic_swap (mode, out_data, value, mem, model_rtx);
+      return;
+
+    default:
+      /* The operation can't be done with atomic instructions.  */
+      gcc_unreachable ();
+    }
+}
+
 /* Split an atomic operation.  */
 
 void
 aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
-		     rtx value, rtx model_rtx, rtx cond)
+			 rtx value, rtx model_rtx, rtx cond)
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
@@ -11198,6 +11241,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   rtx_code_label *label;
   rtx x;
 
+  /* Split the atomic operation into a sequence.  */
   label = gen_label_rtx ();
   emit_label (label);
 
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 65d2cc9..0e71002 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -27,6 +27,7 @@
     UNSPECV_ATOMIC_CMPSW		; Represent an atomic compare swap.
     UNSPECV_ATOMIC_EXCHG		; Represent an atomic exchange.
     UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
+    UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
     UNSPECV_ATOMIC_OP			; Represent an atomic operation.
 ])
 
@@ -122,19 +123,19 @@
 )
 
 (define_insn_and_split "aarch64_compare_and_swap<mode>_lse"
-  [(set (reg:CC CC_REGNUM)					;; bool out
+  [(set (reg:CC CC_REGNUM)
     (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
-   (set (match_operand:GPI 0 "register_operand" "=&r")		;; val out
-    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
+   (set (match_operand:GPI 0 "register_operand" "=&r")
+    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))
    (set (match_dup 1)
     (unspec_volatile:GPI
-      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")	;; expect
-       (match_operand:GPI 3 "register_operand" "r")		;; desired
-       (match_operand:SI 4 "const_int_operand")			;; is_weak
-       (match_operand:SI 5 "const_int_operand")			;; mod_s
-       (match_operand:SI 6 "const_int_operand")]		;; mod_f
+      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")
+       (match_operand:GPI 3 "register_operand" "r")
+       (match_operand:SI 4 "const_int_operand")
+       (match_operand:SI 5 "const_int_operand")
+       (match_operand:SI 6 "const_int_operand")]
       UNSPECV_ATOMIC_CMPSW))]
-  "TARGET_LSE "
+  "TARGET_LSE"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -146,7 +147,32 @@
   }
 )
 
-(define_insn_and_split "atomic_exchange<mode>"
+(define_expand "atomic_exchange<mode>"
+ [(parallel
+   [(set (match_operand:ALLI 0 "register_operand" "")
+     (match_operand:ALLI 1 "aarch64_sync_memory_operand" ""))
+    (set (match_dup 1)
+     (unspec_volatile:ALLI
+      [(match_operand:ALLI 2 "register_operand" "")
+       (match_operand:SI 3 "const_int_operand" "")]
+      UNSPECV_ATOMIC_EXCHG))])]
+  ""
+  {
+    rtx (*gen) (rtx, rtx, rtx, rtx);
+
+    /* Use an atomic SWP when available.  */
+    if (TARGET_LSE)
+      gen = gen_aarch64_atomic_exchange<mode>_lse;
+    else
+      gen = gen_aarch64_atomic_exchange<mode>;
+
+    emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
+
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_exchange<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")		;; output
     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
    (set (match_dup 1)
@@ -162,7 +188,27 @@
   [(const_int 0)]
   {
     aarch64_split_atomic_op (SET, operands[0], NULL, operands[1],
-			    operands[2], operands[3], operands[4]);
+			     operands[2], operands[3], operands[4]);
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_exchange<mode>_lse"
+  [(set (match_operand:ALLI 0 "register_operand" "=&r")
+    (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+   (set (match_dup 1)
+    (unspec_volatile:ALLI
+      [(match_operand:ALLI 2 "register_operand" "r")
+       (match_operand:SI 3 "const_int_operand" "")]
+      UNSPECV_ATOMIC_EXCHG))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_LSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_gen_atomic_ldop (SET, operands[0], operands[1],
+			     operands[2], operands[3]);
     DONE;
   }
 )
@@ -183,7 +229,7 @@
   [(const_int 0)]
   {
     aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
-			    operands[1], operands[2], operands[4]);
+			     operands[1], operands[2], operands[4]);
     DONE;
   }
 )
@@ -425,6 +471,28 @@
 
 ;; ARMv8.1 LSE instructions.
 
+;; Atomic swap with memory.
+(define_insn "aarch64_atomic_swp<mode>"
+ [(set (match_operand:ALLI 0 "register_operand" "+&r")
+   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+  (set (match_dup 1)
+   (unspec_volatile:ALLI
+    [(match_operand:ALLI 2 "register_operand" "r")
+     (match_operand:SI 3 "const_int_operand" "")]
+    UNSPECV_ATOMIC_SWP))]
+  "TARGET_LSE && reload_completed"
+  {
+    enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+    if (is_mm_relaxed (model))
+      return "swp<atomic_sfx>\t%<w>2, %<w>0, %1";
+    else if (is_mm_acquire (model) || is_mm_consume (model))
+      return "swpa<atomic_sfx>\t%<w>2, %<w>0, %1";
+    else if (is_mm_release (model))
+      return "swpl<atomic_sfx>\t%<w>2, %<w>0, %1";
+    else
+      return "swpal<atomic_sfx>\t%<w>2, %<w>0, %1";
+  })
+
 ;; Atomic compare-and-swap: HI and smaller modes.
 
 (define_insn "aarch64_atomic_cas<mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc
index 72c7e5c..c2fdcba 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc
@@ -32,6 +32,15 @@ typedef __uint128_t uint128;
   TEST_M##N (NAME, FN, int128, MODEL1, MODEL2)		\
   TEST_M##N (NAME, FN, uint128, MODEL1, MODEL2)
 
+/* Models to test.  */
+#define TEST_MODEL(NAME, FN, N)					\
+  TEST_TY (NAME##_relaxed, FN, N, __ATOMIC_RELAXED, DUMMY)	\
+  TEST_TY (NAME##_consume, FN, N, __ATOMIC_CONSUME, DUMMY)	\
+  TEST_TY (NAME##_acquire, FN, N, __ATOMIC_ACQUIRE, DUMMY)	\
+  TEST_TY (NAME##_release, FN, N, __ATOMIC_RELEASE, DUMMY)	\
+  TEST_TY (NAME##_acq_rel, FN, N, __ATOMIC_ACQ_REL, DUMMY)	\
+  TEST_TY (NAME##_seq_cst, FN, N, __ATOMIC_SEQ_CST, DUMMY)
+
 /* Cross-product of models to test.  */
 #define TEST_MODEL_M1(NAME, FN, N, M)			\
   TEST_TY (NAME##_relaxed, FN, N, M, __ATOMIC_RELAXED)	\
@@ -51,3 +60,7 @@ typedef __uint128_t uint128;
 
 /* Expand functions for a cross-product of memory models and types.  */
 #define TEST_TWO(NAME, FN) TEST_MODEL_M2 (NAME, FN)
+
+/* Expand functions for a set of memory models and types.  */
+#define TEST_ONE(NAME, FN) TEST_MODEL (NAME, FN, 1)
+
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c
new file mode 100644
index 0000000..dabc9b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8-a+lse" } */
+
+/* Test ARMv8.1-A SWP instruction.  */
+
+#include "atomic-inst-ops.inc"
+
+#define TEST TEST_ONE
+
+#define SWAP_ATOMIC(FN, TY, MODEL)					\
+  TY FNNAME (FN, TY) (TY* val, TY foo)					\
+  {									\
+    return __atomic_exchange_n (val, foo, MODEL);			\
+  }
+
+#define SWAP_ATOMIC_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo, TY* bar)			\
+  {									\
+    __atomic_exchange (val, foo, bar, MODEL);				\
+  }
+
+
+TEST (swap_atomic, SWAP_ATOMIC)
+TEST (swap_atomic_noreturn, SWAP_ATOMIC_NORETURN)
+
+
+/* { dg-final { scan-assembler-times "swpb\t" 4} } */
+/* { dg-final { scan-assembler-times "swpab\t" 8} } */
+/* { dg-final { scan-assembler-times "swplb\t" 4} } */
+/* { dg-final { scan-assembler-times "swpalb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "swph\t" 4} } */
+/* { dg-final { scan-assembler-times "swpah\t" 8} } */
+/* { dg-final { scan-assembler-times "swplh\t" 4} } */
+/* { dg-final { scan-assembler-times "swpalh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "swp\t" 8} } */
+/* { dg-final { scan-assembler-times "swpa\t" 16} } */
+/* { dg-final { scan-assembler-times "swpl\t" 8} } */
+/* { dg-final { scan-assembler-times "swpal\t" 16} } */
+
+/* { dg-final { scan-assembler-not "ldaxr\t" } } */
+/* { dg-final { scan-assembler-not "stlxr\t" } } */
+/* { dg-final { scan-assembler-not "dmb" } } */
-- 
2.1.4



* [AArch64][PATCH 2/5] Add BIC instruction.
  2015-09-17 16:39 [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations Matthew Wahab
@ 2015-09-17 16:42 ` Matthew Wahab
  2015-09-18  8:11   ` James Greenhalgh
  2015-09-17 16:47 ` [AArch64][PATCH 3/5] Add atomic load-operate instructions Matthew Wahab
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Matthew Wahab @ 2015-09-17 16:42 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 644 bytes --]

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch adds an expander to
generate a BIC instruction that can be called explicitly when
implementing the atomic_<op>_fetch patterns, to calculate the value
returned by the operation.
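
For reference, the expander matches D = S3 & ~(S1 <shift> S2); the
atomic_<op>_fetch patterns use it with a zero shift, giving a plain BIC.
A C sketch of the value computed by the LSR form (operand names are
illustrative):

  #include <stdint.h>

  /* Value matched by bic_lshr<mode>3: dst = s3 & ~(s1 >> s2).
     With s2 == 0 this is the plain BIC used to recalculate the
     result of an atomic AND.  */
  static inline uint64_t
  bic_lshr (uint64_t s1, unsigned s2, uint64_t s3)
  {
    return s3 & ~(s1 >> s2);
  }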

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64.md (bic_<SHIFT:optab><mode>3): New.



[-- Attachment #2: 0002-Add-BIC-instruction.patch --]
[-- Type: text/x-patch, Size: 1037 bytes --]

From 14e122ee98aa20826ee070d20c58c94206cdd91b Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 17 Aug 2015 17:48:27 +0100
Subject: [PATCH 2/5] Add BIC instruction

Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410
---
 gcc/config/aarch64/aarch64.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 88ba72e..bae4af4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3351,6 +3351,19 @@
    (set_attr "simd" "*,yes")]
 )
 
+(define_expand "bic_<SHIFT:optab><mode>3"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+   (and:GPI
+    (not:GPI
+     (SHIFT:GPI
+      (match_operand:GPI 1 "register_operand" "r")
+      (match_operand:QI 2 "aarch64_shift_imm_si" "n")))
+    (match_operand:GPI 3 "register_operand" "r")))]
+ ""
+ ""
+ [(set_attr "type" "logics_shift_imm")]
+)
+
 (define_insn "*and_one_cmpl<mode>3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ
-- 
2.1.4



* [AArch64][PATCH 3/5] Add atomic load-operate instructions.
  2015-09-17 16:39 [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations Matthew Wahab
  2015-09-17 16:42 ` [AArch64][PATCH 2/5] Add BIC instruction Matthew Wahab
@ 2015-09-17 16:47 ` Matthew Wahab
  2015-09-18  8:39   ` James Greenhalgh
  2015-09-17 16:49 ` [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns Matthew Wahab
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Matthew Wahab @ 2015-09-17 16:47 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch adds the ARMv8.1 atomic
load-operate instructions.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/atomics.md (UNSPECV_ATOMIC_LDOP): New.
	(UNSPECV_ATOMIC_LDOP_OR): New.
	(UNSPECV_ATOMIC_LDOP_BIC): New.
	(UNSPECV_ATOMIC_LDOP_XOR): New.
	(UNSPECV_ATOMIC_LDOP_PLUS): New.
	(ATOMIC_LDOP): New.
	(atomic_ldop): New.
	(aarch64_atomic_load<atomic_ldop><mode>): New.


[-- Attachment #2: 0003-Add-atomic-load-operate-instructions.patch --]
[-- Type: text/x-patch, Size: 2636 bytes --]

From 6a8a83c4efbd607924f0630779d4915c9dad079c Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 10 Aug 2015 17:02:08 +0100
Subject: [PATCH 3/5] Add atomic load-operate instructions.

Change-Id: I3746875bad7469403bee7df952f0ba565e4abc71
---
 gcc/config/aarch64/atomics.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 0e71002..b7b6fb5 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -29,8 +29,25 @@
     UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
     UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
     UNSPECV_ATOMIC_OP			; Represent an atomic operation.
+    UNSPECV_ATOMIC_LDOP			; Represent an atomic load-operation
+    UNSPECV_ATOMIC_LDOP_OR		; Represent an atomic load-or
+    UNSPECV_ATOMIC_LDOP_BIC		; Represent an atomic load-bic
+    UNSPECV_ATOMIC_LDOP_XOR		; Represent an atomic load-xor
+    UNSPECV_ATOMIC_LDOP_PLUS		; Represent an atomic load-add
 ])
 
+;; Iterators for load-operate instructions.
+
+(define_int_iterator ATOMIC_LDOP
+ [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
+  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
+
+(define_int_attr atomic_ldop
+ [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
+  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
+
+;; Instruction patterns.
+
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand" "")			;; bool out
    (match_operand:ALLI 1 "register_operand" "")			;; val out
@@ -541,3 +558,27 @@
     else
       return "casal<atomic_sfx>\t%<w>0, %<w>2, %1";
 })
+
+;; Atomic load-op: Load data, operate, store result, keep data.
+
+(define_insn "aarch64_atomic_load<atomic_ldop><mode>"
+ [(set (match_operand:ALLI 0 "register_operand" "=r")
+   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+  (set (match_dup 1)
+   (unspec_volatile:ALLI
+    [(match_dup 1)
+     (match_operand:ALLI 2 "register_operand")
+     (match_operand:SI 3 "const_int_operand")]
+    ATOMIC_LDOP))]
+ "TARGET_LSE && reload_completed"
+ {
+   enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+   if (is_mm_relaxed (model))
+     return "ld<atomic_ldop><atomic_sfx>\t%<w>2, %<w>0, %1";
+   else if (is_mm_acquire (model) || is_mm_consume (model))
+     return "ld<atomic_ldop>a<atomic_sfx>\t%<w>2, %<w>0, %1";
+   else if (is_mm_release (model))
+     return "ld<atomic_ldop>l<atomic_sfx>\t%<w>2, %<w>0, %1";
+   else
+     return "ld<atomic_ldop>al<atomic_sfx>\t%<w>2, %<w>0, %1";
+ })
-- 
2.1.4



* [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.
  2015-09-17 16:39 [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations Matthew Wahab
  2015-09-17 16:42 ` [AArch64][PATCH 2/5] Add BIC instruction Matthew Wahab
  2015-09-17 16:47 ` [AArch64][PATCH 3/5] Add atomic load-operate instructions Matthew Wahab
@ 2015-09-17 16:49 ` Matthew Wahab
  2015-09-18  9:05   ` James Greenhalgh
  2015-09-17 16:56 ` [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns Matthew Wahab
  2015-09-18  7:59 ` [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations James Greenhalgh
  4 siblings, 1 reply; 20+ messages in thread
From: Matthew Wahab @ 2015-09-17 16:49 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2344 bytes --]

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch uses the ARMv8.1 atomic
load-operate instructions to implement the atomic_fetch_<op>
patterns. This patch also updates the implementation of the atomic_<op>
patterns, which are treated as versions of the atomic_fetch_<op> which
discard the loaded data.

The general form of the code generated for an atomic_fetch_<op>, with
destination D, source S, memory address A and memory order MO, depends
on whether the operation is directly supported by the instruction. If
<op> is one of PLUS, IOR or XOR, the code generated is:

     ld<opc><mo><sz> S, D, [A]

where
   <opc> is one of {add, set, eor}
   <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
   <sz> is one of {'', 'b', 'h'} depending on the data size.

If <op> is SUB, the code generated, with scratch register r, is:

     neg r, S
     ldadd<mo><sz> r, D, [A]

If <op> is AND, the code generated is:
     not r, S
     ldclr<mo><sz> r, D, [A]

Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
existing aarch64_split_atomic_op function, to implement the operation
using sequences built with the ARMv8 load-exclusive/store-exclusive
instructions.
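
As a concrete example (a sketch; the registers in the comment are only
indicative), an __atomic_fetch_sub maps onto the SUB form above:

  #include <stdint.h>

  uint64_t
  fetch_sub (uint64_t *ptr, uint64_t val)
  {
    /* Expected with -march=armv8-a+lse:
         neg    x1, x1
         ldadd  x1, x0, [x0]  */
    return __atomic_fetch_sub (ptr, val, __ATOMIC_RELAXED);
  }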

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

gcc/
2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-protos.h
	(aarch64_atomic_ldop_supported_p): Declare.
	* config/aarch64/aarch64.c (aarch64_atomic_ldop_supported_p): New.
	(enum aarch64_atomic_load_op_code): New.
	(aarch64_emit_atomic_load_op): New.
	(aarch64_gen_atomic_ldop): Update to support load-operate
	patterns.
	* config/aarch64/atomics.md (atomic_<atomic_optab><mode>): Change
	to an expander.
	(aarch64_atomic_<atomic_optab><mode>): New.
	(aarch64_atomic_<atomic_optab><mode>_lse): New.
	(atomic_fetch_<atomic_optab><mode>): Change to an expander.
	(aarch64_atomic_fetch_<atomic_optab><mode>): New.
	(aarch64_atomic_fetch_<atomic_optab><mode>_lse): New.

gcc/testsuite/
2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/atomic-inst-ldadd.c: New.
	* gcc.target/aarch64/atomic-inst-ldlogic.c: New.


[-- Attachment #2: 0004-Use-atomic-instructions-for-fetch-update-patterns.patch --]
[-- Type: text/x-patch, Size: 17756 bytes --]

From c4b8eb6d2ca62c57f4a011e06335b918f603ad2a Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Fri, 7 Aug 2015 17:10:42 +0100
Subject: [PATCH 4/5] Use atomic instructions for fetch-update patterns.

Change-Id: I39759f02e61039067ccaabfd52039e4804eddf2f
---
 gcc/config/aarch64/aarch64-protos.h                |   2 +
 gcc/config/aarch64/aarch64.c                       | 176 ++++++++++++++++++++-
 gcc/config/aarch64/atomics.md                      | 109 ++++++++++++-
 .../gcc.target/aarch64/atomic-inst-ldadd.c         |  58 +++++++
 .../gcc.target/aarch64/atomic-inst-ldlogic.c       | 109 +++++++++++++
 5 files changed, 444 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index eba4c76..76ebd6f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+
+bool aarch64_atomic_ldop_supported_p (enum rtx_code);
 void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index dc05c6e..33f9ef3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11064,6 +11064,33 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Test whether the target supports using an atomic load-operate
+   instruction.  CODE is the operation.  Returns FALSE if the operation
+   isn't supported by the architecture.  */
+
+bool
+aarch64_atomic_ldop_supported_p (enum rtx_code code)
+{
+  if (!TARGET_LSE)
+    return false;
+
+  switch (code)
+    {
+    case SET:
+    case AND:
+    case IOR:
+    case XOR:
+    case MINUS:
+    case PLUS:
+      return true;
+    default:
+      return false;
+    }
+}
+
 /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
    sequence implementing an atomic operation.  */
 
@@ -11206,26 +11233,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
   emit_insn (gen (dst, mem, value, model));
 }
 
-/* Emit an atomic operation where the architecture supports it.  */
+/* Operations supported by aarch64_emit_atomic_load_op.  */
+
+enum aarch64_atomic_load_op_code
+{
+  AARCH64_LDOP_PLUS,	/* A + B  */
+  AARCH64_LDOP_XOR,	/* A ^ B  */
+  AARCH64_LDOP_OR,	/* A | B  */
+  AARCH64_LDOP_BIC	/* A & ~B  */
+};
+
+/* Emit an atomic load-operate.  */
+
+static void
+aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code,
+			     machine_mode mode, rtx dst, rtx src,
+			     rtx mem, rtx model)
+{
+  typedef rtx (*aarch64_atomic_load_op_fn) (rtx, rtx, rtx, rtx);
+  const aarch64_atomic_load_op_fn plus[] =
+  {
+    gen_aarch64_atomic_loadaddqi,
+    gen_aarch64_atomic_loadaddhi,
+    gen_aarch64_atomic_loadaddsi,
+    gen_aarch64_atomic_loadadddi
+  };
+  const aarch64_atomic_load_op_fn eor[] =
+  {
+    gen_aarch64_atomic_loadeorqi,
+    gen_aarch64_atomic_loadeorhi,
+    gen_aarch64_atomic_loadeorsi,
+    gen_aarch64_atomic_loadeordi
+  };
+  const aarch64_atomic_load_op_fn ior[] =
+  {
+    gen_aarch64_atomic_loadsetqi,
+    gen_aarch64_atomic_loadsethi,
+    gen_aarch64_atomic_loadsetsi,
+    gen_aarch64_atomic_loadsetdi
+  };
+  const aarch64_atomic_load_op_fn bic[] =
+  {
+    gen_aarch64_atomic_loadclrqi,
+    gen_aarch64_atomic_loadclrhi,
+    gen_aarch64_atomic_loadclrsi,
+    gen_aarch64_atomic_loadclrdi
+  };
+  aarch64_atomic_load_op_fn gen;
+  int idx = 0;
+
+  switch (mode)
+    {
+    case QImode: idx = 0; break;
+    case HImode: idx = 1; break;
+    case SImode: idx = 2; break;
+    case DImode: idx = 3; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  switch (code)
+    {
+    case AARCH64_LDOP_PLUS: gen = plus[idx]; break;
+    case AARCH64_LDOP_XOR: gen = eor[idx]; break;
+    case AARCH64_LDOP_OR: gen = ior[idx]; break;
+    case AARCH64_LDOP_BIC: gen = bic[idx]; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, mem, src, model));
+}
+
+/* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
+   location to store the data read from memory.  MEM is the memory location to
+   read and modify.  MODEL_RTX is the memory ordering to use.  VALUE is the
+   second operand for the operation.  Either OUT_DATA or OUT_RESULT, but not
+   both, can be NULL.  */
 
 void
 aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
 			 rtx mem, rtx value, rtx model_rtx)
 {
   machine_mode mode = GET_MODE (mem);
+  machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const bool short_mode = (mode < SImode);
+  aarch64_atomic_load_op_code ldop_code;
+  rtx src;
+  rtx x;
 
-  out_data = gen_lowpart (mode, out_data);
+  if (out_data)
+    out_data = gen_lowpart (mode, out_data);
+
+  /* Make sure the value is in a register, putting it into a destination
+     register if it needs to be manipulated.  */
+  if (!register_operand (value, mode)
+      || code == AND || code == MINUS)
+    {
+      src = out_data;
+      emit_move_insn (src, gen_lowpart (mode, value));
+    }
+  else
+    src = value;
+  gcc_assert (register_operand (src, mode));
 
+  /* Preprocess the data for the operation as necessary.  If the operation is
+     a SET then emit a swap instruction and finish.  */
   switch (code)
     {
     case SET:
-      aarch64_emit_atomic_swap (mode, out_data, value, mem, model_rtx);
+      aarch64_emit_atomic_swap (mode, out_data, src, mem, model_rtx);
       return;
 
+    case MINUS:
+      /* Negate the value and treat it as a PLUS.  */
+      {
+	rtx neg_src;
+
+	/* Resize the value if necessary.  */
+	if (short_mode)
+	  src = gen_lowpart (wmode, src);
+
+	neg_src = gen_rtx_NEG (wmode, src);
+	emit_insn (gen_rtx_SET (src, neg_src));
+
+	if (short_mode)
+	  src = gen_lowpart (mode, src);
+      }
+      /* Fall-through.  */
+    case PLUS:
+      ldop_code = AARCH64_LDOP_PLUS;
+      break;
+
+    case IOR:
+      ldop_code = AARCH64_LDOP_OR;
+      break;
+
+    case XOR:
+      ldop_code = AARCH64_LDOP_XOR;
+      break;
+
+    case AND:
+      {
+	rtx not_src;
+
+	/* Resize the value if necessary.  */
+	if (short_mode)
+	  src = gen_lowpart (wmode, src);
+
+	not_src = gen_rtx_NOT (wmode, src);
+	emit_insn (gen_rtx_SET (src, not_src));
+
+	if (short_mode)
+	  src = gen_lowpart (mode, src);
+      }
+      ldop_code = AARCH64_LDOP_BIC;
+      break;
+
     default:
       /* The operation can't be done with atomic instructions.  */
       gcc_unreachable ();
     }
+
+  aarch64_emit_atomic_load_op (ldop_code, mode, out_data, src, mem, model_rtx);
 }
 
 /* Split an atomic operation.  */
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index b7b6fb5..5a10cf9 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -230,23 +230,67 @@
   }
 )
 
-(define_insn_and_split "atomic_<atomic_optab><mode>"
+(define_expand "atomic_<atomic_optab><mode>"
+ [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
+   (unspec_volatile:ALLI
+    [(atomic_op:ALLI (match_dup 0)
+      (match_operand:ALLI 1 "<atomic_op_operand>" ""))
+     (match_operand:SI 2 "const_int_operand")]
+    UNSPECV_ATOMIC_OP))
+  (clobber (reg:CC CC_REGNUM))]
+  ""
+  {
+    rtx (*gen) (rtx, rtx, rtx);
+
+    /* Use an atomic load-operate instruction when possible.  */
+    if (aarch64_atomic_ldop_supported_p (<CODE>))
+      gen = gen_aarch64_atomic_<atomic_optab><mode>_lse;
+    else
+      gen = gen_aarch64_atomic_<atomic_optab><mode>;
+
+    emit_insn (gen (operands[0], operands[1], operands[2]));
+
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>"
+ [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
+   (unspec_volatile:ALLI
+    [(atomic_op:ALLI (match_dup 0)
+      (match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
+     (match_operand:SI 2 "const_int_operand")]
+    UNSPECV_ATOMIC_OP))
+  (clobber (reg:CC CC_REGNUM))
+  (clobber (match_scratch:ALLI 3 "=&r"))
+  (clobber (match_scratch:SI 4 "=&r"))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
+			     operands[1], operands[2], operands[4]);
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>_lse"
   [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
     (unspec_volatile:ALLI
       [(atomic_op:ALLI (match_dup 0)
 	(match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
-       (match_operand:SI 2 "const_int_operand")]		;; model
+       (match_operand:SI 2 "const_int_operand")]
       UNSPECV_ATOMIC_OP))
-       (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:ALLI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "=&r"))]
+   (clobber (reg:CC CC_REGNUM))]
   ""
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
-			     operands[1], operands[2], operands[4]);
+    aarch64_gen_atomic_ldop (<CODE>, operands[3], operands[0],
+			     operands[1], operands[2]);
     DONE;
   }
 )
@@ -273,7 +317,37 @@
   }
 )
 
-(define_insn_and_split "atomic_fetch_<atomic_optab><mode>"
+;; Load-operate-store, returning the updated memory data.
+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+ [(parallel
+   [(set
+     (match_operand:ALLI 0 "register_operand" "")
+     (match_operand:ALLI 1 "aarch64_sync_memory_operand" ""))
+   (set
+    (match_dup 1)
+    (unspec_volatile:ALLI
+     [(atomic_op:ALLI
+       (match_dup 1)
+       (match_operand:ALLI 2 "<atomic_op_operand>" ""))
+      (match_operand:SI 3 "const_int_operand")]
+     UNSPECV_ATOMIC_OP))])]
+ ""
+{
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  /* Use an atomic load-operate instruction when possible.  */
+  if (aarch64_atomic_ldop_supported_p (<CODE>))
+    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>_lse;
+  else
+    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
+
+  emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
+
+  DONE;
+})
+
+(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
    (set (match_dup 1)
@@ -296,6 +370,27 @@
   }
 )
 
+(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>_lse"
+  [(set (match_operand:ALLI 0 "register_operand" "=&r")
+    (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+   (set (match_dup 1)
+    (unspec_volatile:ALLI
+      [(atomic_op:ALLI (match_dup 1)
+	(match_operand:ALLI 2 "<atomic_op_operand>" "r<const_atomic>"))
+       (match_operand:SI 3 "const_int_operand")]
+      UNSPECV_ATOMIC_LDOP))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_LSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_gen_atomic_ldop (<CODE>, operands[0], operands[1],
+			     operands[2], operands[3]);
+    DONE;
+  }
+)
+
 (define_insn_and_split "atomic_fetch_nand<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
new file mode 100644
index 0000000..c21d2ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8-a+lse" } */
+
+/* Test ARMv8.1-A Load-ADD instruction.  */
+
+#include "atomic-inst-ops.inc"
+
+#define TEST TEST_ONE
+
+#define LOAD_ADD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_add (val, foo, MODEL);			\
+  }
+
+#define LOAD_ADD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_add (val, foo, MODEL);				\
+  }
+
+#define LOAD_SUB(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_sub (val, foo, MODEL);			\
+  }
+
+#define LOAD_SUB_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_sub (val, foo, MODEL);				\
+  }
+
+
+TEST (load_add, LOAD_ADD)
+TEST (load_add_notreturn, LOAD_ADD_NORETURN)
+
+TEST (load_sub, LOAD_SUB)
+TEST (load_sub_notreturn, LOAD_SUB_NORETURN)
+
+/* { dg-final { scan-assembler-times "ldaddb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddalb\t" 16} } */
+
+/* { dg-final { scan-assembler-times "ldaddh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddalh\t" 16} } */
+
+/* { dg-final { scan-assembler-times "ldadd\t" 16} } */
+/* { dg-final { scan-assembler-times "ldadda\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddal\t" 32} } */
+
+/* { dg-final { scan-assembler-not "ldaxr\t" } } */
+/* { dg-final { scan-assembler-not "stlxr\t" } } */
+/* { dg-final { scan-assembler-not "dmb" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
new file mode 100644
index 0000000..fd0f484
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
@@ -0,0 +1,109 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8-a+lse" } */
+
+/* Test ARMv8.1-A LD<logic-op> instruction.  */
+
+#include "atomic-inst-ops.inc"
+
+#define TEST TEST_ONE
+
+#define LOAD_OR(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_or (val, foo, MODEL);				\
+  }
+
+#define LOAD_OR_NORETURN(FN, TY, MODEL)					\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_or (val, foo, MODEL);				\
+  }
+
+#define LOAD_AND(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_and (val, foo, MODEL);			\
+  }
+
+#define LOAD_AND_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_and (val, foo, MODEL);				\
+  }
+
+#define LOAD_XOR(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_xor (val, foo, MODEL);			\
+  }
+
+#define LOAD_XOR_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_xor (val, foo, MODEL);				\
+  }
+
+
+TEST (load_or, LOAD_OR)
+TEST (load_or_notreturn, LOAD_OR_NORETURN)
+
+TEST (load_and, LOAD_AND)
+TEST (load_and_notreturn, LOAD_AND_NORETURN)
+
+TEST (load_xor, LOAD_XOR)
+TEST (load_xor_notreturn, LOAD_XOR_NORETURN)
+
+/* Load-OR.  */
+
+/* { dg-final { scan-assembler-times "ldsetb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetab\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetlb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetalb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldseth\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetah\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetlh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetalh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldset\t" 8} } */
+/* { dg-final { scan-assembler-times "ldseta\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetl\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetal\t" 16} } */
+
+/* Load-AND.  */
+
+/* { dg-final { scan-assembler-times "ldclrb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclrab\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrlb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclralb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldclrh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclrah\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrlh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclralh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldclr\t" 8} */
+/* { dg-final { scan-assembler-times "ldclra\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclrl\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclral\t" 16} } */
+
+/* Load-XOR.  */
+
+/* { dg-final { scan-assembler-times "ldeorb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeorab\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorlb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeoralb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldeorh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeorah\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorlh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeoralh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldeor\t" 8} */
+/* { dg-final { scan-assembler-times "ldeora\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeorl\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeoral\t" 16} } */
+
+/* { dg-final { scan-assembler-not "ldaxr\t" } } */
+/* { dg-final { scan-assembler-not "stlxr\t" } } */
+/* { dg-final { scan-assembler-not "dmb" } } */
-- 
2.1.4



* [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.
  2015-09-17 16:39 [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations Matthew Wahab
                   ` (2 preceding siblings ...)
  2015-09-17 16:49 ` [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns Matthew Wahab
@ 2015-09-17 16:56 ` Matthew Wahab
  2015-09-17 18:56   ` Andrew Pinski
  2015-09-21 11:44   ` Matthew Wahab
  2015-09-18  7:59 ` [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations James Greenhalgh
  4 siblings, 2 replies; 20+ messages in thread
From: Matthew Wahab @ 2015-09-17 16:56 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2710 bytes --]

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch uses the ARMv8.1
load-operate instructions to implement the atomic_<op>_fetch patterns.

The approach is to use the atomic load-operate instruction to atomically
load the data and update memory and then to use the loaded data to
calculate the value that the instruction would have stored. The
calculation attempts to mirror the operation of the atomic instruction.
For example, atomic_and_fetch<mode> is implemented with an atomic
load-bic so the result is also calculated using a BIC instruction.

The general form of the code generated for an atomic_<op>_fetch, with
destination D, source S, memory address A and memory order MO, depends
on whether or not the operation is directly supported by the
instruction. If <op> is one of PLUS, IOR or XOR, the code generated is:

     ld<opc><mo><sz> S, D, [A]
     <inst> D, D, S
where
     <opc> is one of {add, set, eor}
     <inst> is one of {add, orr, eor}
     <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
     <sz> is one of {'', 'b', 'h'} depending on the data size.

If <op> is SUB, the code generated is:

     neg S, S
     ldadd<mo><sz> S, D, [A]
     add D, D, S

If <op> is AND, the code generated is:

     not S, S
     ldclr<mo><sz> S, D, [A]
     bic D, D, S

Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
existing aarch64_split_atomic_op function, to implement the operation
using sequences built with the ARMv8 load-exclusive/store-exclusive
instructions.
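
For example (a sketch; the registers in the comment are only
indicative), an __atomic_and_fetch maps onto the AND form above:

  #include <stdint.h>

  uint64_t
  and_fetch (uint64_t *ptr, uint64_t val)
  {
    /* Expected with -march=armv8-a+lse:
         mvn    x1, x1          // not S, S
         ldclr  x1, x2, [x0]    // D = old [A]; [A] = [A] AND NOT S
         bic    x0, x2, x1      // result = D AND NOT S = old AND val  */
    return __atomic_and_fetch (ptr, val, __ATOMIC_RELAXED);
  }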

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
	Adjust declaration.
	* config/aarch64/aarch64.c (aarch64_emit_bic): New.
	(aarch64_gen_atomic_ldop): Adjust comment.  Add parameter
	out_result.  Update to support update-fetch operations.
	* config/aarch64/atomics.md (aarch64_atomic_exchange<mode>_lse):
	Adjust for change to aarch64_gen_atomic_ldop.
	(aarch64_atomic_<atomic_optab><mode>_lse): Likewise.
	(aarch64_atomic_fetch_<atomic_optab><mode>_lse): Likewise.
	(atomic_<atomic_optab>_fetch<mode>): Change to an expander.
	(aarch64_atomic_<atomic_optab>_fetch<mode>): New.
	(aarch64_atomic_<atomic_optab>_fetch<mode>_lse): New.

gcc/testsuite
2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
	update-fetch operations.
	* gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.


[-- Attachment #2: 0005-Use-atomic-instructions-for-update-fetch-patterns.patch --]
[-- Type: text/x-patch, Size: 16761 bytes --]

From 577bdb656e451df5ce1c8c651a642c3ac4d7c73b Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 17 Aug 2015 11:27:18 +0100
Subject: [PATCH 5/5] Use atomic instructions for update-fetch patterns.

Change-Id: I5eef48586fe904f0d2df8c581fb3c12a4a2d9c78
---
 gcc/config/aarch64/aarch64-protos.h                |   2 +-
 gcc/config/aarch64/aarch64.c                       |  72 +++++++++++--
 gcc/config/aarch64/atomics.md                      |  61 ++++++++++-
 .../gcc.target/aarch64/atomic-inst-ldadd.c         |  53 ++++++---
 .../gcc.target/aarch64/atomic-inst-ldlogic.c       | 118 ++++++++++++++-------
 5 files changed, 247 insertions(+), 59 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 76ebd6f..dd8ebcc 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -380,7 +380,7 @@ void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_atomic_ldop_supported_p (enum rtx_code);
-void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 33f9ef3..d95b81f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11212,6 +11212,25 @@ aarch64_split_compare_and_swap (rtx operands[])
     aarch64_emit_post_barrier (model);
 }
 
+/* Emit a BIC instruction.  */
+
+static void
+aarch64_emit_bic (machine_mode mode, rtx dst, rtx s1, rtx s2, int shift)
+{
+  rtx shift_rtx = GEN_INT (shift);
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case SImode: gen = gen_bic_lshrsi3; break;
+    case DImode: gen = gen_bic_lshrdi3; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, s2, shift_rtx, s1));
+}
+
 /* Emit an atomic swap.  */
 
 static void
@@ -11306,13 +11325,14 @@ aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code,
 }
 
 /* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
-   location to store the data read from memory.  MEM is the memory location to
-   read and modify.  MODEL_RTX is the memory ordering to use.  VALUE is the
-   second operand for the operation.  Either OUT_DATA or OUT_RESULT, but not
-   both, can be NULL.  */
+   location to store the data read from memory.  OUT_RESULT is the location to
+   store the result of the operation.  MEM is the memory location to read and
+   modify.  MODEL_RTX is the memory ordering to use.  VALUE is the second
+   operand for the operation.  Either OUT_DATA or OUT_RESULT, but not both, can
+   be NULL.  */
 
 void
-aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
+aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data, rtx out_result,
 			 rtx mem, rtx value, rtx model_rtx)
 {
   machine_mode mode = GET_MODE (mem);
@@ -11325,12 +11345,15 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
   if (out_data)
     out_data = gen_lowpart (mode, out_data);
 
+  if (out_result)
+    out_result = gen_lowpart (mode, out_result);
+
   /* Make sure the value is in a register, putting it into a destination
      register if it needs to be manipulated.  */
   if (!register_operand (value, mode)
       || code == AND || code == MINUS)
     {
-      src = out_data;
+      src = out_result ? out_result : out_data;
       emit_move_insn (src, gen_lowpart (mode, value));
     }
   else
@@ -11396,6 +11419,43 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
     }
 
   aarch64_emit_atomic_load_op (ldop_code, mode, out_data, src, mem, model_rtx);
+
+  /* If necessary, calculate the data in memory after the update by redoing the
+     operation from values in registers.  */
+  if (!out_result)
+    return;
+
+  if (short_mode)
+    {
+      src = gen_lowpart (wmode, src);
+      out_data = gen_lowpart (wmode, out_data);
+      out_result = gen_lowpart (wmode, out_result);
+    }
+
+  x = NULL_RTX;
+
+  switch (code)
+    {
+    case MINUS:
+    case PLUS:
+      x = gen_rtx_PLUS (wmode, out_data, src);
+      break;
+    case IOR:
+      x = gen_rtx_IOR (wmode, out_data, src);
+      break;
+    case XOR:
+      x = gen_rtx_XOR (wmode, out_data, src);
+      break;
+    case AND:
+      aarch64_emit_bic (wmode, out_result, out_data, src, 0);
+      return;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_set_insn (out_result, x);
+
+  return;
 }
 
 /* Split an atomic operation.  */
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 5a10cf9..2563cd5 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -224,7 +224,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_gen_atomic_ldop (SET, operands[0], operands[1],
+    aarch64_gen_atomic_ldop (SET, operands[0], NULL, operands[1],
 			     operands[2], operands[3]);
     DONE;
   }
@@ -289,7 +289,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_gen_atomic_ldop (<CODE>, operands[3], operands[0],
+    aarch64_gen_atomic_ldop (<CODE>, operands[3], NULL, operands[0],
 			     operands[1], operands[2]);
     DONE;
   }
@@ -385,7 +385,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_gen_atomic_ldop (<CODE>, operands[0], operands[1],
+    aarch64_gen_atomic_ldop (<CODE>, operands[0], NULL, operands[1],
 			     operands[2], operands[3]);
     DONE;
   }
@@ -415,7 +415,36 @@
   }
 )
 
-(define_insn_and_split "atomic_<atomic_optab>_fetch<mode>"
+;; Load-operate-store, returning the original memory data.
+
+(define_expand "atomic_<atomic_optab>_fetch<mode>"
+ [(parallel
+   [(set (match_operand:ALLI 0 "register_operand" "")
+     (atomic_op:ALLI
+      (match_operand:ALLI 1 "aarch64_sync_memory_operand" "")
+      (match_operand:ALLI 2 "<atomic_op_operand>" "")))
+    (set (match_dup 1)
+     (unspec_volatile:ALLI
+      [(match_dup 1) (match_dup 2)
+       (match_operand:SI 3 "const_int_operand")]
+      UNSPECV_ATOMIC_OP))])]
+ ""
+{
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+  rtx value = operands[2];
+
+  /* Use an atomic load-operate instruction when possible.  */
+  if (aarch64_atomic_ldop_supported_p (<CODE>))
+    gen = gen_aarch64_atomic_<atomic_optab>_fetch<mode>_lse;
+  else
+    gen = gen_aarch64_atomic_<atomic_optab>_fetch<mode>;
+
+  emit_insn (gen (operands[0], operands[1], value, operands[3]));
+
+  DONE;
+})
+
+(define_insn_and_split "aarch64_atomic_<atomic_optab>_fetch<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (atomic_op:ALLI
       (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")
@@ -438,6 +467,30 @@
   }
 )
 
+(define_insn_and_split "aarch64_atomic_<atomic_optab>_fetch<mode>_lse"
+  [(set (match_operand:ALLI 0 "register_operand" "=&r")
+    (atomic_op:ALLI
+     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")
+     (match_operand:ALLI 2 "<atomic_op_operand>" "r<const_atomic>")))
+   (set (match_dup 1)
+    (unspec_volatile:ALLI
+      [(match_dup 1)
+       (match_dup 2)
+       (match_operand:SI 3 "const_int_operand")]
+      UNSPECV_ATOMIC_LDOP))
+     (clobber (match_scratch:ALLI 4 "=r"))
+     (clobber (reg:CC CC_REGNUM))]
+  "TARGET_LSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_gen_atomic_ldop (<CODE>, operands[4], operands[0], operands[1],
+			     operands[2], operands[3]);
+    DONE;
+  }
+)
+
 (define_insn_and_split "atomic_nand_fetch<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (not:ALLI
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
index c21d2ed..4b2282c 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
@@ -31,6 +31,29 @@
     __atomic_fetch_sub (val, foo, MODEL);				\
   }
 
+#define ADD_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_add_fetch (val, foo, MODEL);			\
+  }
+
+#define ADD_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_add_fetch (val, foo, MODEL);				\
+  }
+
+#define SUB_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_sub_fetch (val, foo, MODEL);			\
+  }
+
+#define SUB_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_sub_fetch (val, foo, MODEL);				\
+  }
 
 TEST (load_add, LOAD_ADD)
 TEST (load_add_notreturn, LOAD_ADD_NORETURN)
@@ -38,20 +61,26 @@ TEST (load_add_notreturn, LOAD_ADD_NORETURN)
 TEST (load_sub, LOAD_SUB)
 TEST (load_sub_notreturn, LOAD_SUB_NORETURN)
 
-/* { dg-final { scan-assembler-times "ldaddb\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddab\t" 16} } */
-/* { dg-final { scan-assembler-times "ldaddlb\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddalb\t" 16} } */
+TEST (add_load, ADD_LOAD)
+TEST (add_load_notreturn, ADD_LOAD_NORETURN)
+
+TEST (sub_load, SUB_LOAD)
+TEST (sub_load_notreturn, SUB_LOAD_NORETURN)
+
+/* { dg-final { scan-assembler-times "ldaddb\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddab\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddlb\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddalb\t" 32} } */
 
-/* { dg-final { scan-assembler-times "ldaddh\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddah\t" 16} } */
-/* { dg-final { scan-assembler-times "ldaddlh\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddalh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddah\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddlh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddalh\t" 32} } */
 
-/* { dg-final { scan-assembler-times "ldadd\t" 16} } */
-/* { dg-final { scan-assembler-times "ldadda\t" 32} } */
-/* { dg-final { scan-assembler-times "ldaddl\t" 16} } */
-/* { dg-final { scan-assembler-times "ldaddal\t" 32} } */
+/* { dg-final { scan-assembler-times "ldadd\t" 32} } */
+/* { dg-final { scan-assembler-times "ldadda\t" 64} } */
+/* { dg-final { scan-assembler-times "ldaddl\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddal\t" 64} } */
 
 /* { dg-final { scan-assembler-not "ldaxr\t" } } */
 /* { dg-final { scan-assembler-not "stlxr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
index fd0f484..4879d52 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
@@ -43,6 +43,42 @@
     __atomic_fetch_xor (val, foo, MODEL);				\
   }
 
+#define OR_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_or_fetch (val, foo, MODEL);				\
+  }
+
+#define OR_LOAD_NORETURN(FN, TY, MODEL)					\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_or_fetch (val, foo, MODEL);				\
+  }
+
+#define AND_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_and_fetch (val, foo, MODEL);			\
+  }
+
+#define AND_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_and_fetch (val, foo, MODEL);				\
+  }
+
+#define XOR_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_xor_fetch (val, foo, MODEL);			\
+  }
+
+#define XOR_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_xor_fetch (val, foo, MODEL);				\
+  }
+
 
 TEST (load_or, LOAD_OR)
 TEST (load_or_notreturn, LOAD_OR_NORETURN)
@@ -53,56 +89,66 @@ TEST (load_and_notreturn, LOAD_AND_NORETURN)
 TEST (load_xor, LOAD_XOR)
 TEST (load_xor_notreturn, LOAD_XOR_NORETURN)
 
+TEST (or_load, OR_LOAD)
+TEST (or_load_notreturn, OR_LOAD_NORETURN)
+
+TEST (and_load, AND_LOAD)
+TEST (and_load_notreturn, AND_LOAD_NORETURN)
+
+TEST (xor_load, XOR_LOAD)
+TEST (xor_load_notreturn, XOR_LOAD_NORETURN)
+
+
 /* Load-OR.  */
 
-/* { dg-final { scan-assembler-times "ldsetb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetab\t" 8} } */
-/* { dg-final { scan-assembler-times "ldsetlb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetalb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetalb\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldseth\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetah\t" 8} } */
-/* { dg-final { scan-assembler-times "ldsetlh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetalh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldseth\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetalh\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldset\t" 8} } */
-/* { dg-final { scan-assembler-times "ldseta\t" 16} } */
-/* { dg-final { scan-assembler-times "ldsetl\t" 8} } */
-/* { dg-final { scan-assembler-times "ldsetal\t" 16} } */
+/* { dg-final { scan-assembler-times "ldset\t" 16} } */
+/* { dg-final { scan-assembler-times "ldseta\t" 32} } */
+/* { dg-final { scan-assembler-times "ldsetl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetal\t" 32} } */
 
 /* Load-AND.  */
 
-/* { dg-final { scan-assembler-times "ldclrb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclrab\t" 8} } */
-/* { dg-final { scan-assembler-times "ldclrlb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclralb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclrlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclralb\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldclrh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclrah\t" 8} } */
-/* { dg-final { scan-assembler-times "ldclrlh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclralh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclrlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclralh\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldclr\t" 8} */
-/* { dg-final { scan-assembler-times "ldclra\t" 16} } */
-/* { dg-final { scan-assembler-times "ldclrl\t" 8} } */
-/* { dg-final { scan-assembler-times "ldclral\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclr\t" 16} */
+/* { dg-final { scan-assembler-times "ldclra\t" 32} } */
+/* { dg-final { scan-assembler-times "ldclrl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclral\t" 32} } */
 
 /* Load-XOR.  */
 
-/* { dg-final { scan-assembler-times "ldeorb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeorab\t" 8} } */
-/* { dg-final { scan-assembler-times "ldeorlb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeoralb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeorlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeoralb\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldeorh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeorah\t" 8} } */
-/* { dg-final { scan-assembler-times "ldeorlh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeoralh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeorlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeoralh\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldeor\t" 8} */
-/* { dg-final { scan-assembler-times "ldeora\t" 16} } */
-/* { dg-final { scan-assembler-times "ldeorl\t" 8} } */
-/* { dg-final { scan-assembler-times "ldeoral\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeor\t" 16} */
+/* { dg-final { scan-assembler-times "ldeora\t" 32} } */
+/* { dg-final { scan-assembler-times "ldeorl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeoral\t" 32} } */
 
 /* { dg-final { scan-assembler-not "ldaxr\t" } } */
 /* { dg-final { scan-assembler-not "stlxr\t" } } */
-- 
2.1.4



* Re: [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.
  2015-09-17 16:56 ` [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns Matthew Wahab
@ 2015-09-17 18:56   ` Andrew Pinski
  2015-09-18 16:27     ` Ramana Radhakrishnan
  2015-09-21 11:44   ` Matthew Wahab
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Pinski @ 2015-09-17 18:56 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: GCC Patches

On Thu, Sep 17, 2015 at 9:54 AM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> Hello,
>
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch uses the ARMv8.1
> load-operate instructions to implement the atomic_<op>_fetch patterns.
>
> The approach is to use the atomic load-operate instruction to atomically
> load the data and update memory and then to use the loaded data to
> calculate the value that the instruction would have stored. The
> calculation attempts to mirror the operation of the atomic instruction.
> For example, atomic_and_fetch<mode> is implemented with an atomic
> load-bic so the result is also calculated using a BIC instruction.
>
> The general form of the code generated for an atomic_<op>_fetch, with
> destination D, source S, memory address A and memory order MO, depends
> on whether or not the operation is directly supported by the
> instruction. If <op> is one of PLUS, IOR or XOR, the code generated is:
>
>     ld<opc><mo><sz> S, D, [A]
>     <inst> D, D, S
> where
>     <opc> is one of {add, set, eor}
>     <inst> is one of {add, orr, eor}
>     <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
>     <sz> is one of {'', 'b', 'h'} depending on the data size.
>
> If <op> is SUB, the code generated is:
>
>     neg S, S
>     ldadd<mo><sz> S, D, [A]
>     add D, D, S
>
> If <op> is AND, the code generated is:
>
>     not S, S
>     ldclr<mo><sz> S, D, [A]
>     bic D, D, S
>
> Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
> existing aarch64_split_atomic_op function, to implement the operation
> using sequences built with the ARMv8 load-exclusive/store-exclusive
> instructions.
>
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.


Are you going to add some builtins for MIN/MAX support too?

Thanks,
Andrew Pinski

>
> Ok for trunk?
> Matthew
>
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
>         Adjust declaration.
>         * config/aarch64/aarch64.c (aarch64_emit_bic): New.
>         (aarch64_gen_atomic_load_op): Adjust comment.  Add parameter
>         out_result.  Update to support update-fetch operations.
>         * config/aarch64/atomics.md (aarch64_atomic_exchange<mode>_lse):
>         Adjust for change to aarch64_gen_atomic_ldop.
>         (aarch64_atomic_<atomic_optab><mode>_lse): Likewise.
>         (aarch64_atomic_fetch_<atomic_optab><mode>_lse): Likewise.
>         (atomic_<atomic_optab>_fetch<mode>): Change to an expander.
>         (aarch64_atomic_<atomic_optab>_fetch<mode>): New.
>         (aarch64_atomic_<atomic_optab>_fetch<mode>_lse): New.
>
> gcc/testsuite
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
>         update-fetch operations.
>         * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.
>


* Re: [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.
  2015-09-17 16:39 [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations Matthew Wahab
                   ` (3 preceding siblings ...)
  2015-09-17 16:56 ` [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns Matthew Wahab
@ 2015-09-18  7:59 ` James Greenhalgh
  2015-09-21 11:06   ` Matthew Wahab
  4 siblings, 1 reply; 20+ messages in thread
From: James Greenhalgh @ 2015-09-18  7:59 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Thu, Sep 17, 2015 at 05:37:55PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch series adds the
> instructions to GCC, making them available with -march=armv8.1-a or
> -march=armv8+lse, and uses them to implement the __sync and __atomic
> builtins.
> 
> The ARMv8.1 swap instruction swaps the value in a register with a value
> in memory. The load-operate instructions load a value from memory,
> update it with the result of an operation and store the result in
> memory.
> 
> This series uses the swap instruction to implement the atomic_exchange
> patterns and the load-operate instructions to implement the
> atomic_fetch_<op> and atomic_<op>_fetch patterns. For the
> atomic_<op>_fetch patterns, the value returned as the result of the
> operation has to be recalculated from the loaded data. The ARMv8 BIC
> instruction is added so that it can be used for this recalculation.
> 
> The patches in this series
> - add and use the atomic swap instruction.
> - add the Aarch64 BIC instruction,
> - add the ARMv8.1 load-operate instructions,
> - use the load-operate instructions to implement the atomic_fetch_<op>
>    patterns,
> - use the load-operate instructions to implement the patterns
>    atomic_<op>_fetch patterns,
> 
> The code-generation changes in this series are based around a new
> function, aarch64_gen_atomic_ldop, which takes the operation to be
> implemented and emits the appropriate code, making use of the atomic
> instruction. This follows the existing uses aarch64_split_atomic_op for
> the same purpose when atomic instructions aren't available.
> 
> This patch adds the ARMv8.1 SWAP instruction and function
> aarch64_gen_atomic_ldop and changes the implementation of the
> atomic_exchange pattern to the atomic instruction when it is available.
> 
> The general form of the code generated for an atomic_exchange, with
> destination D, source S, memory address A and memory order MO is:
> 
>     swp<mo><sz> S, D, [A]
> 
> where
>     <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
>     <sz> is one of {'', 'b', 'h'} depending on the data size.
> 
> This patch also adds tests for the changes. These reuse the support code
> introduced for the atomic CAS tests, adding macros to test functions
> taking one memory ordering argument. These are used to iteratively
> define functions using the __atomic_exchange builtins, which should be
> implemented using the atomic swap.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?

OK, though I have a question on one patch hunk.

> gcc/
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
> 	Declare.
> 	* config/aarch64/aarch64.c (aarch64_emit_atomic_swp): New.
> 	(aarch64_gen_atomic_ldop): New.
> 	(aarch64_split_atomic_op): Fix whitespace and add a comment.
> 	* config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New.
> 	(atomic_compare_and_swap<mode>_lse): Remove comments and fix
> 	whitespace.
> 	(atomic_exchange<mode>): Replace with an expander.
> 	(aarch64_atomic_exchange<mode>): New.
> 	(aarch64_atomic_exchange<mode>_lse): New.
> 	(aarch64_atomic_<atomic_optab><mode>): Fix some whitespace.
> 	(aarch64_atomic_swp<mode>): New.
> 
> 
> gcc/testsuite/
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/atomic-inst-ops.inc: (TEST_MODEL): New.
> 	(TEST_ONE): New.
> 	* gcc.target/aarch64/atomic-inst-swap.c: New.
> 

> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 65d2cc9..0e71002 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -27,6 +27,7 @@
>      UNSPECV_ATOMIC_CMPSW		; Represent an atomic compare swap.
>      UNSPECV_ATOMIC_EXCHG		; Represent an atomic exchange.
>      UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
> +    UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
>      UNSPECV_ATOMIC_OP			; Represent an atomic operation.
>  ])
>  
> @@ -122,19 +123,19 @@
>  )
>  
>  (define_insn_and_split "aarch64_compare_and_swap<mode>_lse"
> -  [(set (reg:CC CC_REGNUM)					;; bool out
> +  [(set (reg:CC CC_REGNUM)
>      (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
> -   (set (match_operand:GPI 0 "register_operand" "=&r")		;; val out
> -    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
> +   (set (match_operand:GPI 0 "register_operand" "=&r")
> +    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))
>     (set (match_dup 1)
>      (unspec_volatile:GPI
> -      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")	;; expect
> -       (match_operand:GPI 3 "register_operand" "r")		;; desired
> -       (match_operand:SI 4 "const_int_operand")			;; is_weak
> -       (match_operand:SI 5 "const_int_operand")			;; mod_s
> -       (match_operand:SI 6 "const_int_operand")]		;; mod_f
> +      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")
> +       (match_operand:GPI 3 "register_operand" "r")
> +       (match_operand:SI 4 "const_int_operand")
> +       (match_operand:SI 5 "const_int_operand")
> +       (match_operand:SI 6 "const_int_operand")]

I'm not sure I understand the change here; those comments still look helpful
enough for understanding the pattern. What have I missed?

Thanks,
James


* Re: [AArch64][PATCH 2/5] Add BIC instruction.
  2015-09-17 16:42 ` [AArch64][PATCH 2/5] Add BIC instruction Matthew Wahab
@ 2015-09-18  8:11   ` James Greenhalgh
  2015-09-21 11:16     ` [AArch64][PATCH 2/5] Make BIC, other logical instructions, available. (was: Add BIC instruction.) Matthew Wahab
  0 siblings, 1 reply; 20+ messages in thread
From: James Greenhalgh @ 2015-09-18  8:11 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Thu, Sep 17, 2015 at 05:40:48PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch adds an expander to
> generate a BIC instruction that can be explicitly called when
> implementing the atomic_<op>_fetch pattern to calculate the value to
> be returned by the operation.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?

Why not make the "*<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3" pattern
named (remove the leading *) and call that in your atomic_<op>_fetch
patterns as:

  and_one_cmpl_<shift><mode>3

I'd rather that than add a pattern that simply expands to the same
thing.
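
For example, a plain BIC could then be emitted along these lines (a sketch
only: dst, value and loaded are illustrative names, the iterator values are
LOGICAL=and, SHIFT=lshr on DImode, and the generator name is derived
mechanically from the renamed pattern, with the operand order taken from the
pattern quoted below):

    /* dst = loaded & ~(value >> 0), i.e. a plain BIC.  */
    emit_insn (gen_and_one_cmpl_lshrdi3 (dst, value, GEN_INT (0), loaded));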

Thanks,
James

> 
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64.md (bic_<SHIFT:optab><mode>3): New.
> 
> 

> From 14e122ee98aa20826ee070d20c58c94206cdd91b Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Mon, 17 Aug 2015 17:48:27 +0100
> Subject: [PATCH 2/5] Add BIC instruction
> 
> Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410
> ---
>  gcc/config/aarch64/aarch64.md | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 88ba72e..bae4af4 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3351,6 +3351,19 @@
>     (set_attr "simd" "*,yes")]
>  )
>  
> +(define_expand "bic_<SHIFT:optab><mode>3"
> + [(set (match_operand:GPI 0 "register_operand" "=r")
> +   (and:GPI
> +    (not:GPI
> +     (SHIFT:GPI
> +      (match_operand:GPI 1 "register_operand" "r")
> +      (match_operand:QI 2 "aarch64_shift_imm_si" "n")))
> +    (match_operand:GPI 3 "register_operand" "r")))]
> + ""
> + ""
> + [(set_attr "type" "logics_shift_imm")]
> +)
> +
>  (define_insn "*and_one_cmpl<mode>3_compare0"
>    [(set (reg:CC_NZ CC_REGNUM)
>  	(compare:CC_NZ
> -- 
> 2.1.4
> 


* Re: [AArch64][PATCH 3/5] Add atomic load-operate instructions.
  2015-09-17 16:47 ` [AArch64][PATCH 3/5] Add atomic load-operate instructions Matthew Wahab
@ 2015-09-18  8:39   ` James Greenhalgh
  2015-09-21 11:23     ` Matthew Wahab
  0 siblings, 1 reply; 20+ messages in thread
From: James Greenhalgh @ 2015-09-18  8:39 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Thu, Sep 17, 2015 at 05:42:35PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch adds the ARMv8.1 atomic
> load-operate instructions.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?
> Matthew
> 
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64/atomics.md (UNSPECV_ATOMIC_LDOP): New.
> 	(UNSPECV_ATOMIC_LDOP_OR): New.
> 	(UNSPECV_ATOMIC_LDOP_BIC): New.
> 	(UNSPECV_ATOMIC_LDOP_XOR): New.
> 	(UNSPECV_ATOMIC_LDOP_PLUS): New.
> 	(ATOMIC_LDOP): New.
> 	(atomic_ldop): New.
> 	(aarch64_atomic_load<atomic_ldop><mode>): New.
> 

> From 6a8a83c4efbd607924f0630779d4915c9dad079c Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Mon, 10 Aug 2015 17:02:08 +0100
> Subject: [PATCH 3/5] Add atomic load-operate instructions.
> 
> Change-Id: I3746875bad7469403bee7df952f0ba565e4abc71
> ---
>  gcc/config/aarch64/atomics.md | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 0e71002..b7b6fb5 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -29,8 +29,25 @@
>      UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
>      UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
>      UNSPECV_ATOMIC_OP			; Represent an atomic operation.
> +    UNSPECV_ATOMIC_LDOP			; Represent an atomic load-operation
> +    UNSPECV_ATOMIC_LDOP_OR		; Represent an atomic load-or
> +    UNSPECV_ATOMIC_LDOP_BIC		; Represent an atomic load-bic
> +    UNSPECV_ATOMIC_LDOP_XOR		; Represent an atomic load-xor
> +    UNSPECV_ATOMIC_LDOP_PLUS		; Represent an atomic load-add
>  ])
>  
> +;; Iterators for load-operate instructions.
> +
> +(define_int_iterator ATOMIC_LDOP
> + [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
> +  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
> +
> +(define_int_attr atomic_ldop
> + [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
> +  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])

There is precedent (atomic_optab, atomic_op_operand, const_atomic, etc.) for
these living in config/aarch64/iterators.md so they should be moved there.
Presumably the difficulty with that is to do with the position of the
"unspecv" define_c_enum? I'd argue that is in the wrong place too...

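The move itself looks mechanical - a sketch, carrying the quoted definitions
over verbatim, assuming the UNSPECV_* enum values can be made visible to
iterators.md:

    ;; config/aarch64/iterators.md
    (define_int_iterator ATOMIC_LDOP
     [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
      UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])

    (define_int_attr atomic_ldop
     [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
      (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
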
If you want to leave this to a cleanup patch in stage 3 that is fine.

This patch is OK for trunk.

Thanks,
James

> +
> +;; Instruction patterns.
> +
>  (define_expand "atomic_compare_and_swap<mode>"
>    [(match_operand:SI 0 "register_operand" "")			;; bool out
>     (match_operand:ALLI 1 "register_operand" "")			;; val out
> @@ -541,3 +558,27 @@
>      else
>        return "casal<atomic_sfx>\t%<w>0, %<w>2, %1";
>  })
> +
> +;; Atomic load-op: Load data, operate, store result, keep data.
> +
> +(define_insn "aarch64_atomic_load<atomic_ldop><mode>"
> + [(set (match_operand:ALLI 0 "register_operand" "=r")
> +   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
> +  (set (match_dup 1)
> +   (unspec_volatile:ALLI
> +    [(match_dup 1)
> +     (match_operand:ALLI 2 "register_operand")
> +     (match_operand:SI 3 "const_int_operand")]
> +    ATOMIC_LDOP))]
> + "TARGET_LSE && reload_completed"
> + {
> +   enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
> +   if (is_mm_relaxed (model))
> +     return "ld<atomic_ldop><atomic_sfx>\t%<w>2, %<w>0, %1";
> +   else if (is_mm_acquire (model) || is_mm_consume (model))
> +     return "ld<atomic_ldop>a<atomic_sfx>\t%<w>2, %<w>0, %1";
> +   else if (is_mm_release (model))
> +     return "ld<atomic_ldop>l<atomic_sfx>\t%<w>2, %<w>0, %1";
> +   else
> +     return "ld<atomic_ldop>al<atomic_sfx>\t%<w>2, %<w>0, %1";
> + })
> -- 
> 2.1.4
> 


* Re: [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.
  2015-09-17 16:49 ` [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns Matthew Wahab
@ 2015-09-18  9:05   ` James Greenhalgh
  2015-09-21 11:32     ` Matthew Wahab
  0 siblings, 1 reply; 20+ messages in thread
From: James Greenhalgh @ 2015-09-18  9:05 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Thu, Sep 17, 2015 at 05:47:43PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch uses the ARMv8.1 atomic
> load-operate instructions to implement the atomic_fetch_<op>
> patterns. This patch also updates the implementation of the atomic_<op>
> patterns, which are treated as versions of the atomic_fetch_<op> which
> discard the loaded data.
> 
> The general form of the code generated for an atomic_fetch_<op>, with
> destination D, source S, memory address A and memory order MO, depends
> on whether the operation is directly supported by the instruction. If
> <op> is one of PLUS, IOR or XOR, the code generated is:
> 
>      ld<opc><mo><sz> S, D, [A]
> 
> where
>    <opc> is one of {add, set, eor}
>    <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
>    <sz> is one of {'', 'b', 'h'} depending on the data size.
> 
> If <op> is SUB, the code generated, with scratch register r, is:
> 
>      neg r, S
>      ldadd<mo><sz> r, D, [A]
> 
> If <op> is AND, the code generated is:
>      not r, S
>      ldclr<mo><sz> r, D, [A]
> 
> Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
> existing aarch64_split_atomic_op function, to implement the operation
> using sequences built with the ARMv8 load-exclusive/store-exclusive
> instructions.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?

Some comments in line below.

> From c4b8eb6d2ca62c57f4a011e06335b918f603ad2a Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Fri, 7 Aug 2015 17:10:42 +0100
> Subject: [PATCH 4/5] Use atomic instructions for fetch-update patterns.
> 
> Change-Id: I39759f02e61039067ccaabfd52039e4804eddf2f
> ---
>  gcc/config/aarch64/aarch64-protos.h                |   2 +
>  gcc/config/aarch64/aarch64.c                       | 176 ++++++++++++++++++++-
>  gcc/config/aarch64/atomics.md                      | 109 ++++++++++++-
>  .../gcc.target/aarch64/atomic-inst-ldadd.c         |  58 +++++++
>  .../gcc.target/aarch64/atomic-inst-ldlogic.c       | 109 +++++++++++++
>  5 files changed, 444 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index eba4c76..76ebd6f 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
>  void aarch64_expand_compare_and_swap (rtx op[]);
>  void aarch64_split_compare_and_swap (rtx op[]);
>  void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
> +
> +bool aarch64_atomic_ldop_supported_p (enum rtx_code);
>  void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
>  void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
>  
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index dc05c6e..33f9ef3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11064,6 +11064,33 @@ aarch64_expand_compare_and_swap (rtx operands[])
>    emit_insn (gen_rtx_SET (bval, x));
>  }
>  
> +/* Test whether the target supports using an atomic load-operate instruction.
> +   CODE is the operation and AFTER is TRUE if the data in memory after the
> +   operation should be returned and FALSE if the data before the operation
> +   should be returned.  Returns FALSE if the operation isn't supported by the
> +   architecture.
> +  */

Stray newline, leave the */ on the line before.

> +
> +bool
> +aarch64_atomic_ldop_supported_p (enum rtx_code code)
> +{
> +  if (!TARGET_LSE)
> +    return false;
> +
> +  switch (code)
> +    {
> +    case SET:
> +    case AND:
> +    case IOR:
> +    case XOR:
> +    case MINUS:
> +    case PLUS:
> +      return true;
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
>     sequence implementing an atomic operation.  */
>  
> @@ -11206,26 +11233,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
>    emit_insn (gen (dst, mem, value, model));
>  }
>  
> -/* Emit an atomic operation where the architecture supports it.  */
> +/* Operations supported by aarch64_emit_atomic_load_op.  */
> +
> +enum aarch64_atomic_load_op_code
> +{
> +  AARCH64_LDOP_PLUS,	/* A + B  */
> +  AARCH64_LDOP_XOR,	/* A ^ B  */
> +  AARCH64_LDOP_OR,	/* A | B  */
> +  AARCH64_LDOP_BIC	/* A & ~B  */
> +};

I have a small preference for calling these the same names as the
instructions they will generate, so AARCH64_LDOP_ADD, AARCH64_LDOP_EOR,
AARCH64_LDOP_SET and AARCH64_LDOP_CLR, but I'm happy for you to leave it
this way if you prefer.

> +
> +/* Emit an atomic load-operate.  */
> +
> +static void
> +aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code,
> +			     machine_mode mode, rtx dst, rtx src,
> +			     rtx mem, rtx model)
> +{
> +  typedef rtx (*aarch64_atomic_load_op_fn) (rtx, rtx, rtx, rtx);
> +  const aarch64_atomic_load_op_fn plus[] =
> +  {
> +    gen_aarch64_atomic_loadaddqi,
> +    gen_aarch64_atomic_loadaddhi,
> +    gen_aarch64_atomic_loadaddsi,
> +    gen_aarch64_atomic_loadadddi
> +  };
> +  const aarch64_atomic_load_op_fn eor[] =
> +  {
> +    gen_aarch64_atomic_loadeorqi,
> +    gen_aarch64_atomic_loadeorhi,
> +    gen_aarch64_atomic_loadeorsi,
> +    gen_aarch64_atomic_loadeordi
> +  };
> +  const aarch64_atomic_load_op_fn ior[] =
> +  {
> +    gen_aarch64_atomic_loadsetqi,
> +    gen_aarch64_atomic_loadsethi,
> +    gen_aarch64_atomic_loadsetsi,
> +    gen_aarch64_atomic_loadsetdi
> +  };
> +  const aarch64_atomic_load_op_fn bic[] =
> +  {
> +    gen_aarch64_atomic_loadclrqi,
> +    gen_aarch64_atomic_loadclrhi,
> +    gen_aarch64_atomic_loadclrsi,
> +    gen_aarch64_atomic_loadclrdi
> +  };
> +  aarch64_atomic_load_op_fn gen;
> +  int idx = 0;
> +
> +  switch (mode)
> +    {
> +    case QImode: idx = 0; break;
> +    case HImode: idx = 1; break;
> +    case SImode: idx = 2; break;
> +    case DImode: idx = 3; break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  switch (code)
> +    {
> +    case AARCH64_LDOP_PLUS: gen = plus[idx]; break;
> +    case AARCH64_LDOP_XOR: gen = eor[idx]; break;
> +    case AARCH64_LDOP_OR: gen = ior[idx]; break;
> +    case AARCH64_LDOP_BIC: gen = bic[idx]; break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  emit_insn (gen (dst, mem, src, model));
> +}
> +
> +/* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
> +   location to store the data read from memory.  MEM is the memory location to
> +   read and modify.  MODEL_RTX is the memory ordering to use.  VALUE is the
> +   second operand for the operation.  Either OUT_DATA or OUT_RESULT, but not
> +   both, can be NULL.  */
>  
>  void
>  aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
>  			 rtx mem, rtx value, rtx model_rtx)
>  {
>    machine_mode mode = GET_MODE (mem);
> +  machine_mode wmode = (mode == DImode ? DImode : SImode);
> +  const bool short_mode = (mode < SImode);
> +  aarch64_atomic_load_op_code ldop_code;
> +  rtx src;
> +  rtx x;
>  
> -  out_data = gen_lowpart (mode, out_data);
> +  if (out_data)
> +    out_data = gen_lowpart (mode, out_data);
> +
> +  /* Make sure the value is in a register, putting it into a destination
> +     register if it needs to be manipulated.  */
> +  if (!register_operand (value, mode)
> +      || code == AND || code == MINUS)
> +    {
> +      src = out_data;
> +      emit_move_insn (src, gen_lowpart (mode, value));
> +    }
> +  else
> +    src = value;
> +  gcc_assert (register_operand (src, mode));
>  
> +  /* Preprocess the data for the operation as necessary.  If the operation is
> +     a SET then emit a swap instruction and finish.  */
>    switch (code)
>      {
>      case SET:
> -      aarch64_emit_atomic_swap (mode, out_data, value, mem, model_rtx);
> +      aarch64_emit_atomic_swap (mode, out_data, src, mem, model_rtx);
>        return;
>  
> +    case MINUS:
> +      /* Negate the value and treat it as a PLUS.  */
> +      {
> +	rtx neg_src;
> +
> +	/* Resize the value if necessary.  */
> +	if (short_mode)
> +	  src = gen_lowpart (wmode, src);
> +
> +	neg_src = gen_rtx_NEG (wmode, src);
> +	emit_insn (gen_rtx_SET (src, neg_src));
> +
> +	if (short_mode)
> +	  src = gen_lowpart (mode, src);
> +      }
> +      /* Fall-through.  */
> +    case PLUS:
> +      ldop_code = AARCH64_LDOP_PLUS;
> +      break;
> +
> +    case IOR:
> +      ldop_code = AARCH64_LDOP_OR;
> +      break;
> +
> +    case XOR:
> +      ldop_code = AARCH64_LDOP_XOR;
> +      break;
> +
> +    case AND:
> +      {
> +	rtx not_src;
> +
> +	/* Resize the value if necessary.  */
> +	if (short_mode)
> +	  src = gen_lowpart (wmode, src);
> +
> +	not_src = gen_rtx_NOT (wmode, src);
> +	emit_insn (gen_rtx_SET (src, not_src));
> +
> +	if (short_mode)
> +	  src = gen_lowpart (mode, src);
> +      }
> +      ldop_code = AARCH64_LDOP_BIC;
> +      break;
> +
>      default:
>        /* The operation can't be done with atomic instructions.  */
>        gcc_unreachable ();
>      }
> +
> +  aarch64_emit_atomic_load_op (ldop_code, mode, out_data, src, mem, model_rtx);
>  }
>  
>  /* Split an atomic operation.  */
> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index b7b6fb5..5a10cf9 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -230,23 +230,67 @@
>    }
>  )
>  
> -(define_insn_and_split "atomic_<atomic_optab><mode>"
> +(define_expand "atomic_<atomic_optab><mode>"
> + [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
> +   (unspec_volatile:ALLI
> +    [(atomic_op:ALLI (match_dup 0)
> +      (match_operand:ALLI 1 "<atomic_op_operand>" ""))
> +     (match_operand:SI 2 "const_int_operand")]
> +    UNSPECV_ATOMIC_OP))
> +  (clobber (reg:CC CC_REGNUM))]

This is not needed for the LSE forms of these instructions and may result
in less optimal code generation. On the other hand, that will only be in
a corner case and this is only a define_expand. Because of that, it would
be clearer to a reader if you dropped the detailed description of this
in RTL (which is never used) and rewrote it using just the uses of the
operands, like so:

> +(define_expand "atomic_<atomic_optab><mode>"
> + [(match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
> +  (match_operand:ALLI 1 "<atomic_op_operand>" "")
> +  (match_operand:SI 2 "const_int_operand")]




> +(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>"
> + [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
> +   (unspec_volatile:ALLI
> +    [(atomic_op:ALLI (match_dup 0)
> +      (match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
> +     (match_operand:SI 2 "const_int_operand")]
> +    UNSPECV_ATOMIC_OP))
> +  (clobber (reg:CC CC_REGNUM))
> +  (clobber (match_scratch:ALLI 3 "=&r"))
> +  (clobber (match_scratch:SI 4 "=&r"))]
> +  ""

TARGET_LSE?

> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
> +			     operands[1], operands[2], operands[4]);
> +    DONE;
> +  }
> +)
> +
> +(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>_lse"
>    [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
>      (unspec_volatile:ALLI
>        [(atomic_op:ALLI (match_dup 0)
>  	(match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
> -       (match_operand:SI 2 "const_int_operand")]		;; model
> +       (match_operand:SI 2 "const_int_operand")]
>        UNSPECV_ATOMIC_OP))
> -       (clobber (reg:CC CC_REGNUM))
>     (clobber (match_scratch:ALLI 3 "=&r"))
> -   (clobber (match_scratch:SI 4 "=&r"))]
> +   (clobber (reg:CC CC_REGNUM))]
>    ""

TARGET_LSE?

>    "#"
>    "&& reload_completed"
>    [(const_int 0)]
>    {
> -    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
> -			     operands[1], operands[2], operands[4]);
> +    aarch64_gen_atomic_ldop (<CODE>, operands[3], operands[0],
> +			     operands[1], operands[2]);
>      DONE;
>    }
>  )
> @@ -273,7 +317,37 @@
>    }
>  )
>  
> -(define_insn_and_split "atomic_fetch_<atomic_optab><mode>"
> +;; Load-operate-store, returning the updated memory data.
> +
> +(define_expand "atomic_fetch_<atomic_optab><mode>"
> + [(parallel
> +   [(set
> +     (match_operand:ALLI 0 "register_operand" "")
> +     (match_operand:ALLI 1 "aarch64_sync_memory_operand" ""))
> +   (set
> +    (match_dup 1)
> +    (unspec_volatile:ALLI
> +     [(atomic_op:ALLI
> +       (match_dup 1)
> +       (match_operand:ALLI 2 "<atomic_op_operand>" ""))
> +      (match_operand:SI 3 "const_int_operand")]
> +     UNSPECV_ATOMIC_OP))])]

This again seems very detailed for an expand pattern which won't use
it.
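
Following the simplification suggested for the atomic_<atomic_optab><mode>
expander above, a reduced form would be enough - a sketch:

    (define_expand "atomic_fetch_<atomic_optab><mode>"
     [(match_operand:ALLI 0 "register_operand" "")
      (match_operand:ALLI 1 "aarch64_sync_memory_operand" "")
      (match_operand:ALLI 2 "<atomic_op_operand>" "")
      (match_operand:SI 3 "const_int_operand")]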

> + ""
> +{
> +  rtx (*gen) (rtx, rtx, rtx, rtx);
> +
> +  /* Use an atomic load-operate instruction when possible.  */
> +  if (aarch64_atomic_ldop_supported_p (<CODE>))
> +    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>_lse;
> +  else
> +    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
> +
> +  emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
> +
> +  DONE;
> +})
> +
> +(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>"
>    [(set (match_operand:ALLI 0 "register_operand" "=&r")
>      (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
>     (set (match_dup 1)
> @@ -296,6 +370,27 @@
>    }
>  )
>  
> +(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>_lse"
> +  [(set (match_operand:ALLI 0 "register_operand" "=&r")
> +    (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
> +   (set (match_dup 1)
> +    (unspec_volatile:ALLI
> +      [(atomic_op:ALLI (match_dup 1)
> +	(match_operand:ALLI 2 "<atomic_op_operand>" "r<const_atomic>"))
> +       (match_operand:SI 3 "const_int_operand")]
> +      UNSPECV_ATOMIC_LDOP))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "TARGET_LSE"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    aarch64_gen_atomic_ldop (<CODE>, operands[0], operands[1],
> +			     operands[2], operands[3]);
> +    DONE;
> +  }
> +)
> +
>  (define_insn_and_split "atomic_fetch_nand<mode>"
>    [(set (match_operand:ALLI 0 "register_operand" "=&r")
>      (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
>


Thanks,
James


* Re: [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.
  2015-09-17 18:56   ` Andrew Pinski
@ 2015-09-18 16:27     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 20+ messages in thread
From: Ramana Radhakrishnan @ 2015-09-18 16:27 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Matthew Wahab, Marcus Shawcroft, gcc-patches

Hi Andrew,


>>
>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>> make check. Also tested for aarch64-none-elf with cross-compiled
>> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> 
> Are you going to add some builtins for MIN/MAX support too?

The ACLE does not specify special intrinsics for any of the atomic instructions;
it relies on the GCC intrinsics for atomics. The AArch64 backend is consistent
with the ACLE in terms of the intrinsics that have been added to it.

Having had some internal discussions on this topic, I've been made aware of
proposals such as
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3696.htm which,
though not considered for C++14, might be a useful avenue to explore. It would
be better to add intrinsics that match this, or something suitable like it, to
the general GCC language extensions rather than to the AArch64 backend.
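
For reference, a minimal sketch of a fetch-min built from a compare-exchange
loop - the fallback that any __atomic_fetch_min-style builtin (the name
follows N3696 and is hypothetical; GCC does not currently provide it) would
need on targets without the ARMv8.1 LDSMIN/LDUMIN family:

    #include <stdatomic.h>

    /* Hypothetical fetch-min: atomically store min(*obj, arg) and return
       the old value, as an ARMv8.1 LDSMIN would do in one instruction.
       A failed compare-exchange reloads OLD with the current value.  */
    static inline int
    atomic_fetch_min_explicit (_Atomic int *obj, int arg, memory_order mo)
    {
      int old = atomic_load_explicit (obj, memory_order_relaxed);
      int min;
      do
        min = old < arg ? old : arg;
      while (!atomic_compare_exchange_weak_explicit (obj, &old, min, mo,
                                                     memory_order_relaxed));
      return old;
    }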

We have no immediate plans to do so - is this something you can help with?

regards
Ramana

> 
> Thanks,
> Andrew Pinski
> 
>>
>> Ok for trunk?
>> Matthew
>>
>> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
>>         Adjust declaration.
>>         * config/aarch64/aarch64.c (aarch64_emit_bic): New.
>>         (aarch64_gen_atomic_load_op): Adjust comment.  Add parameter
>>         out_result.  Update to support update-fetch operations.
>>         * config/aarch64/atomics.md (aarch64_atomic_exchange<mode>_lse):
>>         Adjust for change to aarch64_gen_atomic_ldop.
>>         (aarch64_atomic_<atomic_optab><mode>_lse): Likewise.
>>         (aarch64_atomic_fetch_<atomic_optab><mode>_lse): Likewise.
>>         (atomic_<atomic_optab>_fetch<mode>): Change to an expander.
>>         (aarch64_atomic_<atomic_optab>_fetch<mode>): New.
>>         (aarch64_atomic_<atomic_optab>_fetch<mode>_lse): New.
>>
>> gcc/testsuite
>> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
>>         update-fetch operations.
>>         * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.
>>


* Re: [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.
  2015-09-18  7:59 ` [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations James Greenhalgh
@ 2015-09-21 11:06   ` Matthew Wahab
  2015-09-21 11:25     ` James Greenhalgh
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wahab @ 2015-09-21 11:06 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3238 bytes --]

On 18/09/15 08:58, James Greenhalgh wrote:
> On Thu, Sep 17, 2015 at 05:37:55PM +0100, Matthew Wahab wrote:

>> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
>> index 65d2cc9..0e71002 100644
>> --- a/gcc/config/aarch64/atomics.md
>> +++ b/gcc/config/aarch64/atomics.md
>> @@ -27,6 +27,7 @@
>>       UNSPECV_ATOMIC_CMPSW		; Represent an atomic compare swap.
>>       UNSPECV_ATOMIC_EXCHG		; Represent an atomic exchange.
>>       UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
>> +    UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
>>       UNSPECV_ATOMIC_OP			; Represent an atomic operation.
>>   ])
>>
>> @@ -122,19 +123,19 @@
>>   )
>>
>>   (define_insn_and_split "aarch64_compare_and_swap<mode>_lse"
>> -  [(set (reg:CC CC_REGNUM)					;; bool out
>> +  [(set (reg:CC CC_REGNUM)
>>       (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
>> -   (set (match_operand:GPI 0 "register_operand" "=&r")		;; val out
>> -    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
>> +   (set (match_operand:GPI 0 "register_operand" "=&r")
>> +    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))
>>      (set (match_dup 1)
>>       (unspec_volatile:GPI
>> -      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")	;; expect
>> -       (match_operand:GPI 3 "register_operand" "r")		;; desired
>> -       (match_operand:SI 4 "const_int_operand")			;; is_weak
>> -       (match_operand:SI 5 "const_int_operand")			;; mod_s
>> -       (match_operand:SI 6 "const_int_operand")]		;; mod_f
>> +      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")
>> +       (match_operand:GPI 3 "register_operand" "r")
>> +       (match_operand:SI 4 "const_int_operand")
>> +       (match_operand:SI 5 "const_int_operand")
>> +       (match_operand:SI 6 "const_int_operand")]
>
> I'm not sure I understand the change here; those comments still look helpful
> enough for understanding the pattern. What have I missed?

That was part of an attempt to clean up some code. It's unnecessary and I've dropped 
the change.

Attached is the updated patch with some other changes:
- Simplified the atomic_exchange<mode> expander in line with reviews for
   other patches in the series.
- Removed the CC clobber from aarch64_atomic_exchange<mode>_lse; it was
   over-cautious.
- Added a missing entry to the change log (noting a whitespace fix).

Ok for trunk?
Matthew

gcc/
2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
	Declare.
	* config/aarch64/aarch64.c (aarch64_emit_atomic_swap): New.
	(aarch64_gen_atomic_ldop): New.
	(aarch64_split_atomic_op): Fix whitespace and add a comment.
	* config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New.
	(aarch64_compare_and_swap<mode>_lse): Fix some whitespace.
	(atomic_exchange<mode>): Replace with an expander.
	(aarch64_atomic_exchange<mode>): New.
	(aarch64_atomic_exchange<mode>_lse): New.
	(aarch64_atomic_<atomic_optab><mode>): Fix some whitespace.
	(aarch64_atomic_swp<mode>): New.


gcc/testsuite/
2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/atomic-inst-ops.inc: (TEST_MODEL): New.
	(TEST_ONE): New.
	* gcc.target/aarch64/atomic-inst-swap.c: New.



[-- Attachment #2: 0001-Add-atomic-SWP-instruction.patch --]
[-- Type: text/x-patch, Size: 9860 bytes --]

From 31226dce8d36be98ca95d9165d4147a3bf84d180 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Fri, 7 Aug 2015 17:18:37 +0100
Subject: [PATCH 1/5] Add atomic SWP instruction

Change-Id: I87bf48526cb11e65edd15691f5eab20446e418c9
---
 gcc/config/aarch64/aarch64-protos.h                |  1 +
 gcc/config/aarch64/aarch64.c                       | 46 +++++++++++++-
 gcc/config/aarch64/atomics.md                      | 71 ++++++++++++++++++++--
 .../gcc.target/aarch64/atomic-inst-ops.inc         | 13 ++++
 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c | 44 ++++++++++++++
 5 files changed, 170 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ff19851..eba4c76 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,7 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4d2126b..dc05c6e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11185,11 +11185,54 @@ aarch64_split_compare_and_swap (rtx operands[])
     aarch64_emit_post_barrier (model);
 }
 
+/* Emit an atomic swap.  */
+
+static void
+aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
+			  rtx mem, rtx model)
+{
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case QImode: gen = gen_aarch64_atomic_swpqi; break;
+    case HImode: gen = gen_aarch64_atomic_swphi; break;
+    case SImode: gen = gen_aarch64_atomic_swpsi; break;
+    case DImode: gen = gen_aarch64_atomic_swpdi; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, mem, value, model));
+}
+
+/* Emit an atomic operation where the architecture supports it.  */
+
+void
+aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
+			 rtx mem, rtx value, rtx model_rtx)
+{
+  machine_mode mode = GET_MODE (mem);
+
+  out_data = gen_lowpart (mode, out_data);
+
+  switch (code)
+    {
+    case SET:
+      aarch64_emit_atomic_swap (mode, out_data, value, mem, model_rtx);
+      return;
+
+    default:
+      /* The operation can't be done with atomic instructions.  */
+      gcc_unreachable ();
+    }
+}
+
 /* Split an atomic operation.  */
 
 void
 aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
-		     rtx value, rtx model_rtx, rtx cond)
+			 rtx value, rtx model_rtx, rtx cond)
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
@@ -11198,6 +11241,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   rtx_code_label *label;
   rtx x;
 
+  /* Split the atomic operation into a sequence.  */
   label = gen_label_rtx ();
   emit_label (label);
 
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 65d2cc9..cb80539 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -27,6 +27,7 @@
     UNSPECV_ATOMIC_CMPSW		; Represent an atomic compare swap.
     UNSPECV_ATOMIC_EXCHG		; Represent an atomic exchange.
     UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
+    UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
     UNSPECV_ATOMIC_OP			; Represent an atomic operation.
 ])
 
@@ -134,7 +135,7 @@
        (match_operand:SI 5 "const_int_operand")			;; mod_s
        (match_operand:SI 6 "const_int_operand")]		;; mod_f
       UNSPECV_ATOMIC_CMPSW))]
-  "TARGET_LSE "
+  "TARGET_LSE"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -146,7 +147,28 @@
   }
 )
 
-(define_insn_and_split "atomic_exchange<mode>"
+(define_expand "atomic_exchange<mode>"
+ [(match_operand:ALLI 0 "register_operand" "")
+  (match_operand:ALLI 1 "aarch64_sync_memory_operand" "")
+  (match_operand:ALLI 2 "register_operand" "")
+  (match_operand:SI 3 "const_int_operand" "")]
+  ""
+  {
+    rtx (*gen) (rtx, rtx, rtx, rtx);
+
+    /* Use an atomic SWP when available.  */
+    if (TARGET_LSE)
+      gen = gen_aarch64_atomic_exchange<mode>_lse;
+    else
+      gen = gen_aarch64_atomic_exchange<mode>;
+
+    emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
+
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_exchange<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")		;; output
     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")) ;; memory
    (set (match_dup 1)
@@ -162,7 +184,26 @@
   [(const_int 0)]
   {
     aarch64_split_atomic_op (SET, operands[0], NULL, operands[1],
-			    operands[2], operands[3], operands[4]);
+			     operands[2], operands[3], operands[4]);
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_exchange<mode>_lse"
+  [(set (match_operand:ALLI 0 "register_operand" "=&r")
+    (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+   (set (match_dup 1)
+    (unspec_volatile:ALLI
+      [(match_operand:ALLI 2 "register_operand" "r")
+       (match_operand:SI 3 "const_int_operand" "")]
+      UNSPECV_ATOMIC_EXCHG))]
+  "TARGET_LSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_gen_atomic_ldop (SET, operands[0], operands[1],
+			     operands[2], operands[3]);
     DONE;
   }
 )
@@ -183,7 +224,7 @@
   [(const_int 0)]
   {
     aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
-			    operands[1], operands[2], operands[4]);
+			     operands[1], operands[2], operands[4]);
     DONE;
   }
 )
@@ -425,6 +466,28 @@
 
 ;; ARMv8.1 LSE instructions.
 
+;; Atomic swap with memory.
+(define_insn "aarch64_atomic_swp<mode>"
+ [(set (match_operand:ALLI 0 "register_operand" "+&r")
+   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+  (set (match_dup 1)
+   (unspec_volatile:ALLI
+    [(match_operand:ALLI 2 "register_operand" "r")
+     (match_operand:SI 3 "const_int_operand" "")]
+    UNSPECV_ATOMIC_SWP))]
+  "TARGET_LSE && reload_completed"
+  {
+    enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+    if (is_mm_relaxed (model))
+      return "swp<atomic_sfx>\t%<w>2, %<w>0, %1";
+    else if (is_mm_acquire (model) || is_mm_consume (model))
+      return "swpa<atomic_sfx>\t%<w>2, %<w>0, %1";
+    else if (is_mm_release (model))
+      return "swpl<atomic_sfx>\t%<w>2, %<w>0, %1";
+    else
+      return "swpal<atomic_sfx>\t%<w>2, %<w>0, %1";
+  })
+
 ;; Atomic compare-and-swap: HI and smaller modes.
 
 (define_insn "aarch64_atomic_cas<mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc
index 72c7e5c..c2fdcba 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc
@@ -32,6 +32,15 @@ typedef __uint128_t uint128;
   TEST_M##N (NAME, FN, int128, MODEL1, MODEL2)		\
   TEST_M##N (NAME, FN, uint128, MODEL1, MODEL2)
 
+/* Models to test.  */
+#define TEST_MODEL(NAME, FN, N)					\
+  TEST_TY (NAME##_relaxed, FN, N, __ATOMIC_RELAXED, DUMMY)	\
+  TEST_TY (NAME##_consume, FN, N, __ATOMIC_CONSUME, DUMMY)	\
+  TEST_TY (NAME##_acquire, FN, N, __ATOMIC_ACQUIRE, DUMMY)	\
+  TEST_TY (NAME##_release, FN, N, __ATOMIC_RELEASE, DUMMY)	\
+  TEST_TY (NAME##_acq_rel, FN, N, __ATOMIC_ACQ_REL, DUMMY)	\
+  TEST_TY (NAME##_seq_cst, FN, N, __ATOMIC_SEQ_CST, DUMMY)	\
+
 /* Cross-product of models to test.  */
 #define TEST_MODEL_M1(NAME, FN, N, M)			\
   TEST_TY (NAME##_relaxed, FN, N, M, __ATOMIC_RELAXED)	\
@@ -51,3 +60,7 @@ typedef __uint128_t uint128;
 
 /* Expand functions for a cross-product of memory models and types.  */
 #define TEST_TWO(NAME, FN) TEST_MODEL_M2 (NAME, FN)
+
+/* Expand functions for a set of memory models and types.  */
+#define TEST_ONE(NAME, FN) TEST_MODEL (NAME, FN, 1)
+
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c
new file mode 100644
index 0000000..dabc9b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8-a+lse" } */
+
+/* Test ARMv8.1-A SWP instruction.  */
+
+#include "atomic-inst-ops.inc"
+
+#define TEST TEST_ONE
+
+#define SWAP_ATOMIC(FN, TY, MODEL)					\
+  TY FNNAME (FN, TY) (TY* val, TY foo)					\
+  {									\
+    return __atomic_exchange_n (val, foo, MODEL);			\
+  }
+
+#define SWAP_ATOMIC_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo, TY* bar)			\
+  {									\
+    __atomic_exchange (val, foo, bar, MODEL);				\
+  }
+
+
+TEST (swap_atomic, SWAP_ATOMIC)
+TEST (swap_atomic_noreturn, SWAP_ATOMIC_NORETURN)
+
+
+/* { dg-final { scan-assembler-times "swpb\t" 4} } */
+/* { dg-final { scan-assembler-times "swpab\t" 8} } */
+/* { dg-final { scan-assembler-times "swplb\t" 4} } */
+/* { dg-final { scan-assembler-times "swpalb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "swph\t" 4} } */
+/* { dg-final { scan-assembler-times "swpah\t" 8} } */
+/* { dg-final { scan-assembler-times "swplh\t" 4} } */
+/* { dg-final { scan-assembler-times "swpalh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "swp\t" 8} } */
+/* { dg-final { scan-assembler-times "swpa\t" 16} } */
+/* { dg-final { scan-assembler-times "swpl\t" 8} } */
+/* { dg-final { scan-assembler-times "swpal\t" 16} } */
+
+/* { dg-final { scan-assembler-not "ldaxr\t" } } */
+/* { dg-final { scan-assembler-not "stlxr\t" } } */
+/* { dg-final { scan-assembler-not "dmb" } } */
-- 
2.1.4



* Re: [AArch64][PATCH 2/5] Make BIC, other logical instructions, available. (was: Add BIC instruction.)
  2015-09-18  8:11   ` James Greenhalgh
@ 2015-09-21 11:16     ` Matthew Wahab
  2015-09-21 11:31       ` James Greenhalgh
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wahab @ 2015-09-21 11:16 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

On 18/09/15 09:05, James Greenhalgh wrote:
> On Thu, Sep 17, 2015 at 05:40:48PM +0100, Matthew Wahab wrote:
>> Hello,
>>
>> ARMv8.1 adds atomic swap and atomic load-operate instructions with
>> optional memory ordering specifiers. This patch adds an expander to
>> generate a BIC instruction that can be explicitly called when
>> implementing the atomic_<op>_fetch pattern to calculate the value to
>> be returned by the operation.
>>
>
> Why not make the "*<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3" pattern
> named (remove the leading *) and call that in your atomic_<op>_fetch
> patterns as:
>
>    and_one_cmpl_<shift><mode>3
>
> I'd rather that than add a pattern that simply expands to the same
> thing.

I overlooked that pattern when I was trying to find the bic emitter. I've attached an 
updated patch.

Tested as part of the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64.md
	(<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3): Make a named
	pattern.


[-- Attachment #2: 0002-Make-BIC-other-logical-instructions-available-for-us.patch --]
[-- Type: text/x-patch, Size: 850 bytes --]

From 0e2ae8739d70e4d1c14fa848f67847b1ecf94f71 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 17 Aug 2015 17:48:27 +0100
Subject: [PATCH 2/5] Make BIC, other logical instructions, available for use.

Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410
---
 gcc/config/aarch64/aarch64.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 88ba72e..72384ce 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3392,7 +3392,7 @@
   [(set_attr "type" "logics_reg")]
 )
 
-(define_insn "*<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3"
+(define_insn "<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(LOGICAL:GPI (not:GPI
 		      (SHIFT:GPI
-- 
2.1.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [AArch64][PATCH 3/5] Add atomic load-operate instructions.
  2015-09-18  8:39   ` James Greenhalgh
@ 2015-09-21 11:23     ` Matthew Wahab
  0 siblings, 0 replies; 20+ messages in thread
From: Matthew Wahab @ 2015-09-21 11:23 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

On 18/09/15 09:39, James Greenhalgh wrote:
> On Thu, Sep 17, 2015 at 05:42:35PM +0100, Matthew Wahab wrote:
>> ---
>>   gcc/config/aarch64/atomics.md | 41 +++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 41 insertions(+)
>>
>> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
>> index 0e71002..b7b6fb5 100644
>> --- a/gcc/config/aarch64/atomics.md
>> +++ b/gcc/config/aarch64/atomics.md
>> @@ -29,8 +29,25 @@
>>       UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
>>       UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
>>       UNSPECV_ATOMIC_OP			; Represent an atomic operation.
>> +    UNSPECV_ATOMIC_LDOP			; Represent an atomic load-operation
>> +    UNSPECV_ATOMIC_LDOP_OR		; Represent an atomic load-or
>> +    UNSPECV_ATOMIC_LDOP_BIC		; Represent an atomic load-bic
>> +    UNSPECV_ATOMIC_LDOP_XOR		; Represent an atomic load-xor
>> +    UNSPECV_ATOMIC_LDOP_PLUS		; Represent an atomic load-add
>>   ])
>>
>> +;; Iterators for load-operate instructions.
>> +
>> +(define_int_iterator ATOMIC_LDOP
>> + [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
>> +  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
>> +
>> +(define_int_attr atomic_ldop
>> + [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
>> +  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
>
> There is precedent (atomic_optab, atomic_op_operand, const_atomic, etc.) for
> these living in config/aarch64/iterators.md so they should be moved there.
> Presumably the difficulty with that is to do with the position of the
> "unspecv" define_c_enum? I'd argue that is in the wrong place too...
>
> If you want to leave this to a cleanup patch in stage 3 that is fine.
>
> This patch is OK for trunk.
>

I'd prefer to keep the clean-up separate from this series. I'll commit the patch as
it is and deal with the iterator move later.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.
  2015-09-21 11:06   ` Matthew Wahab
@ 2015-09-21 11:25     ` James Greenhalgh
  0 siblings, 0 replies; 20+ messages in thread
From: James Greenhalgh @ 2015-09-21 11:25 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Sep 21, 2015 at 12:06:40PM +0100, Matthew Wahab wrote:
> On 18/09/15 08:58, James Greenhalgh wrote:
> > On Thu, Sep 17, 2015 at 05:37:55PM +0100, Matthew Wahab wrote:
> 
> >> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> >> index 65d2cc9..0e71002 100644
> >> --- a/gcc/config/aarch64/atomics.md
> >> +++ b/gcc/config/aarch64/atomics.md
> >> @@ -27,6 +27,7 @@
> >>       UNSPECV_ATOMIC_CMPSW		; Represent an atomic compare swap.
> >>       UNSPECV_ATOMIC_EXCHG		; Represent an atomic exchange.
> >>       UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
> >> +    UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
> >>       UNSPECV_ATOMIC_OP			; Represent an atomic operation.
> >>   ])
> >>
> >> @@ -122,19 +123,19 @@
> >>   )
> >>
> >>   (define_insn_and_split "aarch64_compare_and_swap<mode>_lse"
> >> -  [(set (reg:CC CC_REGNUM)					;; bool out
> >> +  [(set (reg:CC CC_REGNUM)
> >>       (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
> >> -   (set (match_operand:GPI 0 "register_operand" "=&r")		;; val out
> >> -    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
> >> +   (set (match_operand:GPI 0 "register_operand" "=&r")
> >> +    (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))
> >>      (set (match_dup 1)
> >>       (unspec_volatile:GPI
> >> -      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")	;; expect
> >> -       (match_operand:GPI 3 "register_operand" "r")		;; desired
> >> -       (match_operand:SI 4 "const_int_operand")			;; is_weak
> >> -       (match_operand:SI 5 "const_int_operand")			;; mod_s
> >> -       (match_operand:SI 6 "const_int_operand")]		;; mod_f
> >> +      [(match_operand:GPI 2 "aarch64_plus_operand" "rI")
> >> +       (match_operand:GPI 3 "register_operand" "r")
> >> +       (match_operand:SI 4 "const_int_operand")
> >> +       (match_operand:SI 5 "const_int_operand")
> >> +       (match_operand:SI 6 "const_int_operand")]
> >
> > I'm not sure I understand the change here, those comments still look helpful
> > enough for understanding the pattern, what have a I missed?
> 
> That was part of an attempt to clean up some code. It's unnecessary and I've dropped 
> the change.
> 
> Attached is the updated patch with some other changes:
> - Simplified the atomic_exchange<mode> expander in line with reviews for
>    other patches in the series.
> - Removed the CC clobber from aarch64_atomic_exchange<mode>_lse; it was
>    over-cautious.
> - Added a missing entry to the change log (noting a whitespace fix).
> 
> Ok for trunk?
> Matthew

OK.

Thanks,
James

> gcc/
> 2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
> 	Declare.
> 	* config/aarch64/aarch64.c (aarch64_emit_atomic_swap): New.
> 	(aarch64_gen_atomic_ldop): New.
> 	(aarch64_split_atomic_op): Fix whitespace and add a comment.
> 	* config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New.
> 	(aarch64_compare_and_swap<mode>_lse): Fix some whitespace.
> 	(atomic_exchange<mode>): Replace with an expander.
> 	(aarch64_atomic_exchange<mode>): New.
> 	(aarch64_atomic_exchange<mode>_lse): New.
> 	(aarch64_atomic_<atomic_optab><mode>): Fix some whitespace.
> 	(aarch64_atomic_swp<mode>): New.
> 
> 
> gcc/testsuite/
> 2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/atomic-inst-ops.inc (TEST_MODEL): New.
> 	(TEST_ONE): New.
> 	* gcc.target/aarch64/atomic-inst-swp.c: New.
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [AArch64][PATCH 2/5] Make BIC, other logical instructions, available. (was: Add BIC instruction.)
  2015-09-21 11:16     ` [AArch64][PATCH 2/5] Make BIC, other logical instructions, available. (was: Add BIC instruction.) Matthew Wahab
@ 2015-09-21 11:31       ` James Greenhalgh
  0 siblings, 0 replies; 20+ messages in thread
From: James Greenhalgh @ 2015-09-21 11:31 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Sep 21, 2015 at 12:12:21PM +0100, Matthew Wahab wrote:
> On 18/09/15 09:05, James Greenhalgh wrote:
> > On Thu, Sep 17, 2015 at 05:40:48PM +0100, Matthew Wahab wrote:
> >> Hello,
> >>
> >> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> >> optional memory ordering specifiers. This patch adds an expander to
> >> generate a BIC instruction that can be explicitly called when
> >> implementing the atomic_<op>_fetch pattern to calculate the value to
> >> be returned by the operation.
> >>
> >
> > Why not make the "*<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3" pattern
> > named (remove the leading *) and call that in your atomic_<op>_fetch
> > patterns as:
> >
> >    and_one_cmpl_<shift><mode>3
> >
> > I'd rather that than add a pattern that simply expands to the same
> > thing.
> 
> I overlooked that pattern when I was trying to find the bic emitter. I've attached an 
> updated patch.
> 
> Tested as part of the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?
> Matthew

OK.

Thanks,
James

> 
> 2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64.md
> 	(<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3): Make a named
> 	pattern.
> 

> From 0e2ae8739d70e4d1c14fa848f67847b1ecf94f71 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Mon, 17 Aug 2015 17:48:27 +0100
> Subject: [PATCH 2/5] Make BIC, other logical instructions, available for use.
> 
> Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410
> ---
>  gcc/config/aarch64/aarch64.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 88ba72e..72384ce 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3392,7 +3392,7 @@
>    [(set_attr "type" "logics_reg")]
>  )
>  
> -(define_insn "*<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3"
> +(define_insn "<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>  	(LOGICAL:GPI (not:GPI
>  		      (SHIFT:GPI
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.
  2015-09-18  9:05   ` James Greenhalgh
@ 2015-09-21 11:32     ` Matthew Wahab
  2015-09-21 14:15       ` James Greenhalgh
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wahab @ 2015-09-21 11:32 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7785 bytes --]

On 18/09/15 09:55, James Greenhalgh wrote:
> On Thu, Sep 17, 2015 at 05:47:43PM +0100, Matthew Wahab wrote:
>> Hello,
>>
>> ARMv8.1 adds atomic swap and atomic load-operate instructions with
>> optional memory ordering specifiers. This patch uses the ARMv8.1 atomic
>> load-operate instructions to implement the atomic_fetch_<op>
>> patterns. This patch also updates the implementation of the atomic_<op>
>> patterns, which are treated as versions of the atomic_fetch_<op> which
>> discard the loaded data.
>>
[..]
>>
>> Ok for trunk?
>
> Some comments in line below.
>
>>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> index eba4c76..76ebd6f 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
>>   void aarch64_expand_compare_and_swap (rtx op[]);
>>   void aarch64_split_compare_and_swap (rtx op[]);
>>   void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
>> +
>> +bool aarch64_atomic_ldop_supported_p (enum rtx_code);
>>   void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
>>   void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index dc05c6e..33f9ef3 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -11064,6 +11064,33 @@ aarch64_expand_compare_and_swap (rtx operands[])
>>     emit_insn (gen_rtx_SET (bval, x));
>>   }
>>
>> +/* Test whether the target supports using an atomic load-operate instruction.
>> +   CODE is the operation and AFTER is TRUE if the data in memory after the
>> +   operation should be returned and FALSE if the data before the operation
>> +   should be returned.  Returns FALSE if the operation isn't supported by the
>> +   architecture.
>> +  */
>
> Stray newline, leave the */ on the line before.

Fixed this.

>> +
>> +bool
>> +aarch64_atomic_ldop_supported_p (enum rtx_code code)
>> +{
>> +  if (!TARGET_LSE)
>> +    return false;
>> +
>> +  switch (code)
>> +    {
>> +    case SET:
>> +    case AND:
>> +    case IOR:
>> +    case XOR:
>> +    case MINUS:
>> +    case PLUS:
>> +      return true;
>> +    default:
>> +      return false;
>> +    }
>> +}
>> +
>>   /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
>>      sequence implementing an atomic operation.  */
>>
>> @@ -11206,26 +11233,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
>>     emit_insn (gen (dst, mem, value, model));
>>   }
>>
>> -/* Emit an atomic operation where the architecture supports it.  */
>> +/* Operations supported by aarch64_emit_atomic_load_op.  */
>> +
>> +enum aarch64_atomic_load_op_code
>> +{
>> +  AARCH64_LDOP_PLUS,	/* A + B  */
>> +  AARCH64_LDOP_XOR,	/* A ^ B  */
>> +  AARCH64_LDOP_OR,	/* A | B  */
>> +  AARCH64_LDOP_BIC	/* A & ~B  */
>> +};
>
> I have a small preference to calling these the same name as the
> instructions they will generate, so AARCH64_LDOP_ADD, AARCH64_LDOP_EOR,
> AARCH64_LDOP_SET and AARCH64_LDOP_CLR, but I'm happy fo you to leave it
> this way if you prefer.
>

I prefer to keep them related to the operation being implemented rather than how it
is implemented, so I've left them as they are.


>> -(define_insn_and_split "atomic_<atomic_optab><mode>"
>> +(define_expand "atomic_<atomic_optab><mode>"
>> + [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
>> +   (unspec_volatile:ALLI
>> +    [(atomic_op:ALLI (match_dup 0)
>> +      (match_operand:ALLI 1 "<atomic_op_operand>" ""))
>> +     (match_operand:SI 2 "const_int_operand")]
>> +    UNSPECV_ATOMIC_OP))
>> +  (clobber (reg:CC CC_REGNUM))]
>
> This is not needed for the LSE forms of these instructions and may result
> in less optimal code genmeration. On the other hand, that will only be in
> a corner case and this is only a define_expand. Because of that, it would
> be clearer to a reader if you dropped the detailed description of this
> in RTL (which is never used) and rewrote it using just the uses of the
> operands, like so:
>
>> +(define_expand "atomic_<atomic_optab><mode>"
>> + [(match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
>> +  (match_operand:ALLI 1 "<atomic_op_operand>" "")
>> +  (match_operand:SI 2 "const_int_operand")]
>

Switched the new expanders in this and the other patches to the simpler form.

>
>> +(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>"
>> + [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
>> +   (unspec_volatile:ALLI
>> +    [(atomic_op:ALLI (match_dup 0)
>> +      (match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
>> +     (match_operand:SI 2 "const_int_operand")]
>> +    UNSPECV_ATOMIC_OP))
>> +  (clobber (reg:CC CC_REGNUM))
>> +  (clobber (match_scratch:ALLI 3 "=&r"))
>> +  (clobber (match_scratch:SI 4 "=&r"))]
>> +  ""
>
> TARGET_LSE ?

It's not needed here because this pattern is always available.

>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>> +  {
>> +    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
>> +			     operands[1], operands[2], operands[4]);
>> +    DONE;
>> +  }
>> +)
>> +
>> +(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>_lse"
>>     [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
>>       (unspec_volatile:ALLI
>>         [(atomic_op:ALLI (match_dup 0)
>>   	(match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
>> -       (match_operand:SI 2 "const_int_operand")]		;; model
>> +       (match_operand:SI 2 "const_int_operand")]
>>         UNSPECV_ATOMIC_OP))
>> -       (clobber (reg:CC CC_REGNUM))
>>      (clobber (match_scratch:ALLI 3 "=&r"))
>> -   (clobber (match_scratch:SI 4 "=&r"))]
>> +   (clobber (reg:CC CC_REGNUM))]
>>     ""
>
> TARGET_LSE?

Fixed. It's not strictly needed since this pattern is only called by
atomic_<atomic_optab><mode>, which does the test for TARGET_LSE. The other _lse
patterns have the guard, though, and the pattern only makes sense when +lse is
available.
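
(To make the dispatch concrete: the expander picks between the new LSE
pattern and the existing load-exclusive/store-exclusive splitter.  A rough
C model, with hypothetical helpers standing in for the two generated
sequences; neither is a real GCC or library API:)

  #include <stdint.h>

  /* Stand-in for the LSE path: a single LDADD instruction.  */
  extern uint64_t ldadd64 (uint64_t *p, uint64_t v);
  /* Stand-in for one LDXR/STXR attempt; returns nonzero on success and
     writes the loaded value through OLD.  */
  extern int try_add_llsc (uint64_t *p, uint64_t v, uint64_t *old);

  uint64_t
  fetch_add64 (uint64_t *p, uint64_t v, int have_lse)
  {
    if (have_lse)                   /* aarch64_atomic_ldop_supported_p  */
      return ldadd64 (p, v);        /* the _lse pattern  */
    uint64_t old;
    while (!try_add_llsc (p, v, &old))
      ;                             /* retry on store-exclusive failure  */
    return old;                     /* the aarch64_split_atomic_op loop  */
  }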


>>
>> -(define_insn_and_split "atomic_fetch_<atomic_optab><mode>"
>> +;; Load-operate-store, returning the updated memory data.
>> +
>> +(define_expand "atomic_fetch_<atomic_optab><mode>"
>> + [(parallel
>> +   [(set
>> +     (match_operand:ALLI 0 "register_operand" "")
>> +     (match_operand:ALLI 1 "aarch64_sync_memory_operand" ""))
>> +   (set
>> +    (match_dup 1)
>> +    (unspec_volatile:ALLI
>> +     [(atomic_op:ALLI
>> +       (match_dup 1)
>> +       (match_operand:ALLI 2 "<atomic_op_operand>" ""))
>> +      (match_operand:SI 3 "const_int_operand")]
>> +     UNSPECV_ATOMIC_OP))])]
>
> This again seems very detailed for an expand pattern which won't use
> it.

Simplified the expander.

I've also removed the CC clobber from the _lse patterns; it was overly cautious.

Update patch attached, Ok for trunk?
Matthew

gcc/
2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-protos.h
	(aarch64_atomic_ldop_supported_p): Declare.
	* config/aarch64/aarch64.c (aarch64_atomic_ldop_supported_p): New.
	(enum aarch64_atomic_load_op_code): New.
	(aarch64_emit_atomic_load_op): New.
	(aarch64_gen_atomic_ldop): Update to support load-operate
	patterns.
	* config/aarch64/atomics.md (atomic_<atomic_optab><mode>): Change
	to an expander.
	(aarch64_atomic_<atomic_optab><mode>): New.
	(aarch64_atomic_<atomic_optab><mode>_lse): New.
	(atomic_fetch_<atomic_optab><mode>): Change to an expander.
	(aarch64_atomic_fetch_<atomic_optab><mode>): New.
	(aarch64_atomic_fetch_<atomic_optab><mode>_lse): New.

gcc/testsuite/
2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/atomic-inst-ldadd.c: New.
	* gcc.target/aarch64/atomic-inst-ldlogic.c: New.


[-- Attachment #2: 0004-Use-atomic-instructions-for-fetch-update-patterns.patch --]
[-- Type: text/x-patch, Size: 17488 bytes --]

From 5d1bc64d7e9509374076e4c4ff5a285d4b073f24 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Fri, 7 Aug 2015 17:10:42 +0100
Subject: [PATCH 4/5] Use atomic instructions for fetch-update patterns.

Change-Id: I39759f02e61039067ccaabfd52039e4804eddf2f
---
 gcc/config/aarch64/aarch64-protos.h                |   2 +
 gcc/config/aarch64/aarch64.c                       | 175 ++++++++++++++++++++-
 gcc/config/aarch64/atomics.md                      | 101 ++++++++++--
 .../gcc.target/aarch64/atomic-inst-ldadd.c         |  58 +++++++
 .../gcc.target/aarch64/atomic-inst-ldlogic.c       | 109 +++++++++++++
 5 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index eba4c76..76ebd6f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+
+bool aarch64_atomic_ldop_supported_p (enum rtx_code);
 void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index dc05c6e..3a1b434 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11064,6 +11064,32 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Test whether the target supports using an atomic load-operate instruction.
+   CODE is the operation and AFTER is TRUE if the data in memory after the
+   operation should be returned and FALSE if the data before the operation
+   should be returned.  Returns FALSE if the operation isn't supported by the
+   architecture.  */
+
+bool
+aarch64_atomic_ldop_supported_p (enum rtx_code code)
+{
+  if (!TARGET_LSE)
+    return false;
+
+  switch (code)
+    {
+    case SET:
+    case AND:
+    case IOR:
+    case XOR:
+    case MINUS:
+    case PLUS:
+      return true;
+    default:
+      return false;
+    }
+}
+
 /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
    sequence implementing an atomic operation.  */
 
@@ -11206,26 +11232,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
   emit_insn (gen (dst, mem, value, model));
 }
 
-/* Emit an atomic operation where the architecture supports it.  */
+/* Operations supported by aarch64_emit_atomic_load_op.  */
+
+enum aarch64_atomic_load_op_code
+{
+  AARCH64_LDOP_PLUS,	/* A + B  */
+  AARCH64_LDOP_XOR,	/* A ^ B  */
+  AARCH64_LDOP_OR,	/* A | B  */
+  AARCH64_LDOP_BIC	/* A & ~B  */
+};
+
+/* Emit an atomic load-operate.  */
+
+static void
+aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code,
+			     machine_mode mode, rtx dst, rtx src,
+			     rtx mem, rtx model)
+{
+  typedef rtx (*aarch64_atomic_load_op_fn) (rtx, rtx, rtx, rtx);
+  const aarch64_atomic_load_op_fn plus[] =
+  {
+    gen_aarch64_atomic_loadaddqi,
+    gen_aarch64_atomic_loadaddhi,
+    gen_aarch64_atomic_loadaddsi,
+    gen_aarch64_atomic_loadadddi
+  };
+  const aarch64_atomic_load_op_fn eor[] =
+  {
+    gen_aarch64_atomic_loadeorqi,
+    gen_aarch64_atomic_loadeorhi,
+    gen_aarch64_atomic_loadeorsi,
+    gen_aarch64_atomic_loadeordi
+  };
+  const aarch64_atomic_load_op_fn ior[] =
+  {
+    gen_aarch64_atomic_loadsetqi,
+    gen_aarch64_atomic_loadsethi,
+    gen_aarch64_atomic_loadsetsi,
+    gen_aarch64_atomic_loadsetdi
+  };
+  const aarch64_atomic_load_op_fn bic[] =
+  {
+    gen_aarch64_atomic_loadclrqi,
+    gen_aarch64_atomic_loadclrhi,
+    gen_aarch64_atomic_loadclrsi,
+    gen_aarch64_atomic_loadclrdi
+  };
+  aarch64_atomic_load_op_fn gen;
+  int idx = 0;
+
+  switch (mode)
+    {
+    case QImode: idx = 0; break;
+    case HImode: idx = 1; break;
+    case SImode: idx = 2; break;
+    case DImode: idx = 3; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  switch (code)
+    {
+    case AARCH64_LDOP_PLUS: gen = plus[idx]; break;
+    case AARCH64_LDOP_XOR: gen = eor[idx]; break;
+    case AARCH64_LDOP_OR: gen = ior[idx]; break;
+    case AARCH64_LDOP_BIC: gen = bic[idx]; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, mem, src, model));
+}
+
+/* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
+   location to store the data read from memory.  MEM is the memory location to
+   read and modify.  MODEL_RTX is the memory ordering to use.  VALUE is the
+   second operand for the operation.  Either OUT_DATA or OUT_RESULT, but not
+   both, can be NULL.  */
 
 void
 aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
 			 rtx mem, rtx value, rtx model_rtx)
 {
   machine_mode mode = GET_MODE (mem);
+  machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const bool short_mode = (mode < SImode);
+  aarch64_atomic_load_op_code ldop_code;
+  rtx src;
+  rtx x;
+
+  if (out_data)
+    out_data = gen_lowpart (mode, out_data);
 
-  out_data = gen_lowpart (mode, out_data);
+  /* Make sure the value is in a register, putting it into a destination
+     register if it needs to be manipulated.  */
+  if (!register_operand (value, mode)
+      || code == AND || code == MINUS)
+    {
+      src = out_data;
+      emit_move_insn (src, gen_lowpart (mode, value));
+    }
+  else
+    src = value;
+  gcc_assert (register_operand (src, mode));
 
+  /* Preprocess the data for the operation as necessary.  If the operation is
+     a SET then emit a swap instruction and finish.  */
   switch (code)
     {
     case SET:
-      aarch64_emit_atomic_swap (mode, out_data, value, mem, model_rtx);
+      aarch64_emit_atomic_swap (mode, out_data, src, mem, model_rtx);
       return;
 
+    case MINUS:
+      /* Negate the value and treat it as a PLUS.  */
+      {
+	rtx neg_src;
+
+	/* Resize the value if necessary.  */
+	if (short_mode)
+	  src = gen_lowpart (wmode, src);
+
+	neg_src = gen_rtx_NEG (wmode, src);
+	emit_insn (gen_rtx_SET (src, neg_src));
+
+	if (short_mode)
+	  src = gen_lowpart (mode, src);
+      }
+      /* Fall-through.  */
+    case PLUS:
+      ldop_code = AARCH64_LDOP_PLUS;
+      break;
+
+    case IOR:
+      ldop_code = AARCH64_LDOP_OR;
+      break;
+
+    case XOR:
+      ldop_code = AARCH64_LDOP_XOR;
+      break;
+
+    case AND:
+      {
+	rtx not_src;
+
+	/* Resize the value if necessary.  */
+	if (short_mode)
+	  src = gen_lowpart (wmode, src);
+
+	not_src = gen_rtx_NOT (wmode, src);
+	emit_insn (gen_rtx_SET (src, not_src));
+
+	if (short_mode)
+	  src = gen_lowpart (mode, src);
+      }
+      ldop_code = AARCH64_LDOP_BIC;
+      break;
+
     default:
       /* The operation can't be done with atomic instructions.  */
       gcc_unreachable ();
     }
+
+  aarch64_emit_atomic_load_op (ldop_code, mode, out_data, src, mem, model_rtx);
 }
 
 /* Split an atomic operation.  */
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 11a9d13..e0d8856 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -225,23 +225,63 @@
   }
 )
 
-(define_insn_and_split "atomic_<atomic_optab><mode>"
+(define_expand "atomic_<atomic_optab><mode>"
+ [(match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
+  (atomic_op:ALLI
+   (match_operand:ALLI 1 "<atomic_op_operand>" "")
+   (match_operand:SI 2 "const_int_operand"))]
+  ""
+  {
+    rtx (*gen) (rtx, rtx, rtx);
+
+    /* Use an atomic load-operate instruction when possible.  */
+    if (aarch64_atomic_ldop_supported_p (<CODE>))
+      gen = gen_aarch64_atomic_<atomic_optab><mode>_lse;
+    else
+      gen = gen_aarch64_atomic_<atomic_optab><mode>;
+
+    emit_insn (gen (operands[0], operands[1], operands[2]));
+
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>"
+ [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
+   (unspec_volatile:ALLI
+    [(atomic_op:ALLI (match_dup 0)
+      (match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
+     (match_operand:SI 2 "const_int_operand")]
+    UNSPECV_ATOMIC_OP))
+  (clobber (reg:CC CC_REGNUM))
+  (clobber (match_scratch:ALLI 3 "=&r"))
+  (clobber (match_scratch:SI 4 "=&r"))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
+			     operands[1], operands[2], operands[4]);
+    DONE;
+  }
+)
+
+(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>_lse"
   [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
     (unspec_volatile:ALLI
       [(atomic_op:ALLI (match_dup 0)
 	(match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
-       (match_operand:SI 2 "const_int_operand")]		;; model
+       (match_operand:SI 2 "const_int_operand")]
       UNSPECV_ATOMIC_OP))
-       (clobber (reg:CC CC_REGNUM))
-   (clobber (match_scratch:ALLI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "=&r"))]
-  ""
+   (clobber (match_scratch:ALLI 3 "=&r"))]
+  "TARGET_LSE"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
-			     operands[1], operands[2], operands[4]);
+    aarch64_gen_atomic_ldop (<CODE>, operands[3], operands[0],
+			     operands[1], operands[2]);
     DONE;
   }
 )
@@ -268,7 +308,30 @@
   }
 )
 
-(define_insn_and_split "atomic_fetch_<atomic_optab><mode>"
+;; Load-operate-store, returning the original memory data.
+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+ [(match_operand:ALLI 0 "register_operand" "")
+  (match_operand:ALLI 1 "aarch64_sync_memory_operand" "")
+  (atomic_op:ALLI
+   (match_operand:ALLI 2 "<atomic_op_operand>" "")
+   (match_operand:SI 3 "const_int_operand"))]
+ ""
+{
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  /* Use an atomic load-operate instruction when possible.  */
+  if (aarch64_atomic_ldop_supported_p (<CODE>))
+    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>_lse;
+  else
+    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
+
+  emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
+
+  DONE;
+})
+
+(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
    (set (match_dup 1)
@@ -291,6 +354,26 @@
   }
 )
 
+(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>_lse"
+  [(set (match_operand:ALLI 0 "register_operand" "=&r")
+    (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+   (set (match_dup 1)
+    (unspec_volatile:ALLI
+      [(atomic_op:ALLI (match_dup 1)
+	(match_operand:ALLI 2 "<atomic_op_operand>" "r<const_atomic>"))
+       (match_operand:SI 3 "const_int_operand")]
+      UNSPECV_ATOMIC_LDOP))]
+  "TARGET_LSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_gen_atomic_ldop (<CODE>, operands[0], operands[1],
+			     operands[2], operands[3]);
+    DONE;
+  }
+)
+
 (define_insn_and_split "atomic_fetch_nand<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
new file mode 100644
index 0000000..c21d2ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8-a+lse" } */
+
+/* Test ARMv8.1-A Load-ADD instruction.  */
+
+#include "atomic-inst-ops.inc"
+
+#define TEST TEST_ONE
+
+#define LOAD_ADD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_add (val, foo, MODEL);			\
+  }
+
+#define LOAD_ADD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_add (val, foo, MODEL);				\
+  }
+
+#define LOAD_SUB(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_sub (val, foo, MODEL);			\
+  }
+
+#define LOAD_SUB_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_sub (val, foo, MODEL);				\
+  }
+
+
+TEST (load_add, LOAD_ADD)
+TEST (load_add_notreturn, LOAD_ADD_NORETURN)
+
+TEST (load_sub, LOAD_SUB)
+TEST (load_sub_notreturn, LOAD_SUB_NORETURN)
+
+/* { dg-final { scan-assembler-times "ldaddb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddalb\t" 16} } */
+
+/* { dg-final { scan-assembler-times "ldaddh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldaddalh\t" 16} } */
+
+/* { dg-final { scan-assembler-times "ldadd\t" 16} } */
+/* { dg-final { scan-assembler-times "ldadda\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddal\t" 32} } */
+
+/* { dg-final { scan-assembler-not "ldaxr\t" } } */
+/* { dg-final { scan-assembler-not "stlxr\t" } } */
+/* { dg-final { scan-assembler-not "dmb" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
new file mode 100644
index 0000000..fd0f484
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
@@ -0,0 +1,109 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8-a+lse" } */
+
+/* Test ARMv8.1-A LD<logic-op> instruction.  */
+
+#include "atomic-inst-ops.inc"
+
+#define TEST TEST_ONE
+
+#define LOAD_OR(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_or (val, foo, MODEL);				\
+  }
+
+#define LOAD_OR_NORETURN(FN, TY, MODEL)					\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_or (val, foo, MODEL);				\
+  }
+
+#define LOAD_AND(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_and (val, foo, MODEL);			\
+  }
+
+#define LOAD_AND_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_and (val, foo, MODEL);				\
+  }
+
+#define LOAD_XOR(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_fetch_xor (val, foo, MODEL);			\
+  }
+
+#define LOAD_XOR_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_fetch_xor (val, foo, MODEL);				\
+  }
+
+
+TEST (load_or, LOAD_OR)
+TEST (load_or_notreturn, LOAD_OR_NORETURN)
+
+TEST (load_and, LOAD_AND)
+TEST (load_and_notreturn, LOAD_AND_NORETURN)
+
+TEST (load_xor, LOAD_XOR)
+TEST (load_xor_notreturn, LOAD_XOR_NORETURN)
+
+/* Load-OR.  */
+
+/* { dg-final { scan-assembler-times "ldsetb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetab\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetlb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetalb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldseth\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetah\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetlh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldsetalh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldset\t" 8} } */
+/* { dg-final { scan-assembler-times "ldseta\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetl\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetal\t" 16} } */
+
+/* Load-AND.  */
+
+/* { dg-final { scan-assembler-times "ldclrb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclrab\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrlb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclralb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldclrh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclrah\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrlh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldclralh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldclr\t" 8} */
+/* { dg-final { scan-assembler-times "ldclra\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclrl\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclral\t" 16} } */
+
+/* Load-XOR.  */
+
+/* { dg-final { scan-assembler-times "ldeorb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeorab\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorlb\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeoralb\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldeorh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeorah\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorlh\t" 4} } */
+/* { dg-final { scan-assembler-times "ldeoralh\t" 8} } */
+
+/* { dg-final { scan-assembler-times "ldeor\t" 8} */
+/* { dg-final { scan-assembler-times "ldeora\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeorl\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeoral\t" 16} } */
+
+/* { dg-final { scan-assembler-not "ldaxr\t" } } */
+/* { dg-final { scan-assembler-not "stlxr\t" } } */
+/* { dg-final { scan-assembler-not "dmb" } } */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.
  2015-09-17 16:56 ` [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns Matthew Wahab
  2015-09-17 18:56   ` Andrew Pinski
@ 2015-09-21 11:44   ` Matthew Wahab
  2015-09-21 14:27     ` James Greenhalgh
  1 sibling, 1 reply; 20+ messages in thread
From: Matthew Wahab @ 2015-09-21 11:44 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3109 bytes --]

On 17/09/15 17:54, Matthew Wahab wrote:
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch uses the ARMv8.1
> load-operate instructions to implement the atomic_<op>_fetch patterns.
>
> The approach is to use the atomic load-operate instruction to atomically
> load the data and update memory and then to use the loaded data to
> calculate the value that the instruction would have stored. The
> calculation attempts to mirror the operation of the atomic instruction.
> For example, atomic_and_fetch<mode> is implemented with an atomic
> load-bic so the result is also calculated using a BIC instruction.
>
[...]
>
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>      * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
>      Adjust declaration.
>      * config/aarch64/aarch64.c (aarch64_emit_bic): New.
>      (aarch64_gen_atomic_load_op): Adjust comment.  Add parameter
>      out_result.  Update to support update-fetch operations.
>      * config/aarch64/atomics.md (aarch64_atomic_exchange<mode>_lse):
>      Adjust for change to aarch64_gen_atomic_ldop.
>      (aarch64_atomic_<atomic_optab><mode>_lse): Likewise.
>      (aarch64_atomic_fetch_<atomic_optab><mode>_lse): Likewise.
>      (atomic_<atomic_optab>_fetch<mode>): Change to an expander.
>      (aarch64_atomic_<atomic_optab>_fetch<mode>): New.
>      (aarch64_atomic_<atomic_optab>_fetch<mode>_lse): New.
>
> gcc/testsuite
> 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>      * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
>      update-fetch operations.
>      * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.
>

I've attached an updated patch that takes into account the review comments and the
changes made to the rest of the series.

The changes in this patch:
- Updated aarch64_emit_bic for the changes in the earlier patch.
- Simplified the patterns used in the new expanders.
- Dropped CC clobber from the _lse patterns.
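
(For the arithmetic case, the recalculation mirrors the MINUS handling in
aarch64_gen_atomic_ldop: the operand is negated, LDADD does the memory
update, and the stored value is recomputed from the loaded data.  A short
C sketch, with a hypothetical ldadd64 helper standing in for the
instruction:)

  #include <stdint.h>

  /* Hypothetical wrapper for LDADD: atomically adds V and returns the
     previous memory value.  */
  extern uint64_t ldadd64 (uint64_t *p, uint64_t v);

  uint64_t
  sub_fetch64 (uint64_t *p, uint64_t val)
  {
    uint64_t neg = -val;       /* MINUS is negated and treated as PLUS.  */
    uint64_t old = ldadd64 (p, neg);
    return old + neg;          /* Recompute the value LDADD stored.  */
  }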

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
	Adjust declaration.
	* config/aarch64/aarch64.c (aarch64_emit_bic): New.
	(aarch64_gen_atomic_ldop): Adjust comment.  Add parameter
	out_result.  Update to support update-fetch operations.
	* config/aarch64/atomics.md (aarch64_atomic_exchange<mode>_lse):
	Adjust for change to aarch64_gen_atomic_ldop.
	(aarch64_atomic_<atomic_optab><mode>_lse): Likewise.
	(aarch64_atomic_fetch_<atomic_optab><mode>_lse): Likewise.
	(atomic_<atomic_optab>_fetch<mode>): Change to an expander.
	(aarch64_atomic_<atomic_optab>_fetch<mode>): New.
	(aarch64_atomic_<atomic_optab>_fetch<mode>_lse): New.

gcc/testsuite
2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
	update-fetch operations.
	* gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.



[-- Attachment #2: 0005-Use-atomic-instructions-for-update-fetch-patterns.patch --]
[-- Type: text/x-patch, Size: 16590 bytes --]

From abd313723964e90b6e7d7785b646c657f6b072f9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 17 Aug 2015 11:27:18 +0100
Subject: [PATCH 5/5] Use atomic instructions for update-fetch patterns.

Change-Id: I5eef48586fe904f0d2df8c581fb3c12a4a2d9c78
---
 gcc/config/aarch64/aarch64-protos.h                |   2 +-
 gcc/config/aarch64/aarch64.c                       |  72 +++++++++++--
 gcc/config/aarch64/atomics.md                      |  55 +++++++++-
 .../gcc.target/aarch64/atomic-inst-ldadd.c         |  53 ++++++---
 .../gcc.target/aarch64/atomic-inst-ldlogic.c       | 118 ++++++++++++++-------
 5 files changed, 241 insertions(+), 59 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 76ebd6f..dd8ebcc 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -380,7 +380,7 @@ void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_atomic_ldop_supported_p (enum rtx_code);
-void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3a1b434..b6cdf7c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11211,6 +11211,25 @@ aarch64_split_compare_and_swap (rtx operands[])
     aarch64_emit_post_barrier (model);
 }
 
+/* Emit a BIC instruction.  */
+
+static void
+aarch64_emit_bic (machine_mode mode, rtx dst, rtx s1, rtx s2, int shift)
+{
+  rtx shift_rtx = GEN_INT (shift);
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case SImode: gen = gen_and_one_cmpl_lshrsi3; break;
+    case DImode: gen = gen_and_one_cmpl_lshrdi3; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, s2, shift_rtx, s1));
+}
+
 /* Emit an atomic swap.  */
 
 static void
@@ -11305,13 +11324,14 @@ aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code,
 }
 
 /* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
-   location to store the data read from memory.  MEM is the memory location to
-   read and modify.  MODEL_RTX is the memory ordering to use.  VALUE is the
-   second operand for the operation.  Either OUT_DATA or OUT_RESULT, but not
-   both, can be NULL.  */
+   location to store the data read from memory.  OUT_RESULT is the location to
+   store the result of the operation.  MEM is the memory location to read and
+   modify.  MODEL_RTX is the memory ordering to use.  VALUE is the second
+   operand for the operation.  Either OUT_DATA or OUT_RESULT, but not both, can
+   be NULL.  */
 
 void
-aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
+aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data, rtx out_result,
 			 rtx mem, rtx value, rtx model_rtx)
 {
   machine_mode mode = GET_MODE (mem);
@@ -11324,12 +11344,15 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
   if (out_data)
     out_data = gen_lowpart (mode, out_data);
 
+  if (out_result)
+    out_result = gen_lowpart (mode, out_result);
+
   /* Make sure the value is in a register, putting it into a destination
      register if it needs to be manipulated.  */
   if (!register_operand (value, mode)
       || code == AND || code == MINUS)
     {
-      src = out_data;
+      src = out_result ? out_result : out_data;
       emit_move_insn (src, gen_lowpart (mode, value));
     }
   else
@@ -11395,6 +11418,43 @@ aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
     }
 
   aarch64_emit_atomic_load_op (ldop_code, mode, out_data, src, mem, model_rtx);
+
+  /* If necessary, calculate the data in memory after the update by redoing the
+     operation from values in registers.  */
+  if (!out_result)
+    return;
+
+  if (short_mode)
+    {
+      src = gen_lowpart (wmode, src);
+      out_data = gen_lowpart (wmode, out_data);
+      out_result = gen_lowpart (wmode, out_result);
+    }
+
+  x = NULL_RTX;
+
+  switch (code)
+    {
+    case MINUS:
+    case PLUS:
+      x = gen_rtx_PLUS (wmode, out_data, src);
+      break;
+    case IOR:
+      x = gen_rtx_IOR (wmode, out_data, src);
+      break;
+    case XOR:
+      x = gen_rtx_XOR (wmode, out_data, src);
+      break;
+    case AND:
+      aarch64_emit_bic (wmode, out_result, out_data, src, 0);
+      return;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_set_insn (out_result, x);
+
+  return;
 }
 
 /* Split an atomic operation.  */
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index e0d8856..e7ac5f6 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -219,7 +219,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_gen_atomic_ldop (SET, operands[0], operands[1],
+    aarch64_gen_atomic_ldop (SET, operands[0], NULL, operands[1],
 			     operands[2], operands[3]);
     DONE;
   }
@@ -280,7 +280,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_gen_atomic_ldop (<CODE>, operands[3], operands[0],
+    aarch64_gen_atomic_ldop (<CODE>, operands[3], NULL, operands[0],
 			     operands[1], operands[2]);
     DONE;
   }
@@ -368,7 +368,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    aarch64_gen_atomic_ldop (<CODE>, operands[0], operands[1],
+    aarch64_gen_atomic_ldop (<CODE>, operands[0], NULL, operands[1],
 			     operands[2], operands[3]);
     DONE;
   }
@@ -398,7 +398,31 @@
   }
 )
 
-(define_insn_and_split "atomic_<atomic_optab>_fetch<mode>"
+;; Load-operate-store, returning the updated memory data.
+
+(define_expand "atomic_<atomic_optab>_fetch<mode>"
+ [(match_operand:ALLI 0 "register_operand" "")
+  (atomic_op:ALLI
+   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "")
+   (match_operand:ALLI 2 "<atomic_op_operand>" ""))
+  (match_operand:SI 3 "const_int_operand")]
+ ""
+{
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+  rtx value = operands[2];
+
+  /* Use an atomic load-operate instruction when possible.  */
+  if (aarch64_atomic_ldop_supported_p (<CODE>))
+    gen = gen_aarch64_atomic_<atomic_optab>_fetch<mode>_lse;
+  else
+    gen = gen_aarch64_atomic_<atomic_optab>_fetch<mode>;
+
+  emit_insn (gen (operands[0], operands[1], value, operands[3]));
+
+  DONE;
+})
+
+(define_insn_and_split "aarch64_atomic_<atomic_optab>_fetch<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (atomic_op:ALLI
       (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")
@@ -421,6 +445,29 @@
   }
 )
 
+(define_insn_and_split "aarch64_atomic_<atomic_optab>_fetch<mode>_lse"
+  [(set (match_operand:ALLI 0 "register_operand" "=&r")
+    (atomic_op:ALLI
+     (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")
+     (match_operand:ALLI 2 "<atomic_op_operand>" "r<const_atomic>")))
+   (set (match_dup 1)
+    (unspec_volatile:ALLI
+      [(match_dup 1)
+       (match_dup 2)
+       (match_operand:SI 3 "const_int_operand")]
+      UNSPECV_ATOMIC_LDOP))
+     (clobber (match_scratch:ALLI 4 "=r"))]
+  "TARGET_LSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_gen_atomic_ldop (<CODE>, operands[4], operands[0], operands[1],
+			     operands[2], operands[3]);
+    DONE;
+  }
+)
+
 (define_insn_and_split "atomic_nand_fetch<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=&r")
     (not:ALLI
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
index c21d2ed..4b2282c 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
@@ -31,6 +31,29 @@
     __atomic_fetch_sub (val, foo, MODEL);				\
   }
 
+#define ADD_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_add_fetch (val, foo, MODEL);			\
+  }
+
+#define ADD_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_add_fetch (val, foo, MODEL);				\
+  }
+
+#define SUB_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_sub_fetch (val, foo, MODEL);			\
+  }
+
+#define SUB_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_sub_fetch (val, foo, MODEL);				\
+  }
 
 TEST (load_add, LOAD_ADD)
 TEST (load_add_notreturn, LOAD_ADD_NORETURN)
@@ -38,20 +61,26 @@ TEST (load_add_notreturn, LOAD_ADD_NORETURN)
 TEST (load_sub, LOAD_SUB)
 TEST (load_sub_notreturn, LOAD_SUB_NORETURN)
 
-/* { dg-final { scan-assembler-times "ldaddb\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddab\t" 16} } */
-/* { dg-final { scan-assembler-times "ldaddlb\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddalb\t" 16} } */
+TEST (add_load, ADD_LOAD)
+TEST (add_load_notreturn, ADD_LOAD_NORETURN)
+
+TEST (sub_load, SUB_LOAD)
+TEST (sub_load_notreturn, SUB_LOAD_NORETURN)
+
+/* { dg-final { scan-assembler-times "ldaddb\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddab\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddlb\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddalb\t" 32} } */
 
-/* { dg-final { scan-assembler-times "ldaddh\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddah\t" 16} } */
-/* { dg-final { scan-assembler-times "ldaddlh\t" 8} } */
-/* { dg-final { scan-assembler-times "ldaddalh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddah\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddlh\t" 16} } */
+/* { dg-final { scan-assembler-times "ldaddalh\t" 32} } */
 
-/* { dg-final { scan-assembler-times "ldadd\t" 16} } */
-/* { dg-final { scan-assembler-times "ldadda\t" 32} } */
-/* { dg-final { scan-assembler-times "ldaddl\t" 16} } */
-/* { dg-final { scan-assembler-times "ldaddal\t" 32} } */
+/* { dg-final { scan-assembler-times "ldadd\t" 32} } */
+/* { dg-final { scan-assembler-times "ldadda\t" 64} } */
+/* { dg-final { scan-assembler-times "ldaddl\t" 32} } */
+/* { dg-final { scan-assembler-times "ldaddal\t" 64} } */
 
 /* { dg-final { scan-assembler-not "ldaxr\t" } } */
 /* { dg-final { scan-assembler-not "stlxr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
index fd0f484..4879d52 100644
--- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
@@ -43,6 +43,42 @@
     __atomic_fetch_xor (val, foo, MODEL);				\
   }
 
+#define OR_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_or_fetch (val, foo, MODEL);				\
+  }
+
+#define OR_LOAD_NORETURN(FN, TY, MODEL)					\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_or_fetch (val, foo, MODEL);				\
+  }
+
+#define AND_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_and_fetch (val, foo, MODEL);			\
+  }
+
+#define AND_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_and_fetch (val, foo, MODEL);				\
+  }
+
+#define XOR_LOAD(FN, TY, MODEL)						\
+  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
+  {									\
+    return __atomic_xor_fetch (val, foo, MODEL);			\
+  }
+
+#define XOR_LOAD_NORETURN(FN, TY, MODEL)				\
+  void FNNAME (FN, TY) (TY* val, TY* foo)				\
+  {									\
+    __atomic_xor_fetch (val, foo, MODEL);				\
+  }
+
 
 TEST (load_or, LOAD_OR)
 TEST (load_or_notreturn, LOAD_OR_NORETURN)
@@ -53,56 +89,66 @@ TEST (load_and_notreturn, LOAD_AND_NORETURN)
 TEST (load_xor, LOAD_XOR)
 TEST (load_xor_notreturn, LOAD_XOR_NORETURN)
 
+TEST (or_load, OR_LOAD)
+TEST (or_load_notreturn, OR_LOAD_NORETURN)
+
+TEST (and_load, AND_LOAD)
+TEST (and_load_notreturn, AND_LOAD_NORETURN)
+
+TEST (xor_load, XOR_LOAD)
+TEST (xor_load_notreturn, XOR_LOAD_NORETURN)
+
+
 /* Load-OR.  */
 
-/* { dg-final { scan-assembler-times "ldsetb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetab\t" 8} } */
-/* { dg-final { scan-assembler-times "ldsetlb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetalb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetalb\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldseth\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetah\t" 8} } */
-/* { dg-final { scan-assembler-times "ldsetlh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldsetalh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldseth\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldsetalh\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldset\t" 8} } */
-/* { dg-final { scan-assembler-times "ldseta\t" 16} } */
-/* { dg-final { scan-assembler-times "ldsetl\t" 8} } */
-/* { dg-final { scan-assembler-times "ldsetal\t" 16} } */
+/* { dg-final { scan-assembler-times "ldset\t" 16} } */
+/* { dg-final { scan-assembler-times "ldseta\t" 32} } */
+/* { dg-final { scan-assembler-times "ldsetl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldsetal\t" 32} } */
 
 /* Load-AND.  */
 
-/* { dg-final { scan-assembler-times "ldclrb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclrab\t" 8} } */
-/* { dg-final { scan-assembler-times "ldclrlb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclralb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclrlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclralb\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldclrh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclrah\t" 8} } */
-/* { dg-final { scan-assembler-times "ldclrlh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldclralh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclrah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclrlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldclralh\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldclr\t" 8} */
-/* { dg-final { scan-assembler-times "ldclra\t" 16} } */
-/* { dg-final { scan-assembler-times "ldclrl\t" 8} } */
-/* { dg-final { scan-assembler-times "ldclral\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclr\t" 16} */
+/* { dg-final { scan-assembler-times "ldclra\t" 32} } */
+/* { dg-final { scan-assembler-times "ldclrl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldclral\t" 32} } */
 
 /* Load-XOR.  */
 
-/* { dg-final { scan-assembler-times "ldeorb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeorab\t" 8} } */
-/* { dg-final { scan-assembler-times "ldeorlb\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeoralb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorab\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeorlb\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeoralb\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldeorh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeorah\t" 8} } */
-/* { dg-final { scan-assembler-times "ldeorlh\t" 4} } */
-/* { dg-final { scan-assembler-times "ldeoralh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeorah\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeorlh\t" 8} } */
+/* { dg-final { scan-assembler-times "ldeoralh\t" 16} } */
 
-/* { dg-final { scan-assembler-times "ldeor\t" 8} */
-/* { dg-final { scan-assembler-times "ldeora\t" 16} } */
-/* { dg-final { scan-assembler-times "ldeorl\t" 8} } */
-/* { dg-final { scan-assembler-times "ldeoral\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeor\t" 16} */
+/* { dg-final { scan-assembler-times "ldeora\t" 32} } */
+/* { dg-final { scan-assembler-times "ldeorl\t" 16} } */
+/* { dg-final { scan-assembler-times "ldeoral\t" 32} } */
 
 /* { dg-final { scan-assembler-not "ldaxr\t" } } */
 /* { dg-final { scan-assembler-not "stlxr\t" } } */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.
  2015-09-21 11:32     ` Matthew Wahab
@ 2015-09-21 14:15       ` James Greenhalgh
  0 siblings, 0 replies; 20+ messages in thread
From: James Greenhalgh @ 2015-09-21 14:15 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Sep 21, 2015 at 12:31:21PM +0100, Matthew Wahab wrote:
> I've also removed the CC clobber from the _lse patterns; it was overly cautious.
> 
> Update patch attached, Ok for trunk?
> Matthew

OK.

Thanks,
James

> 
> gcc/
> 2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64-protos.h
> 	(aarch64_atomic_ldop_supported_p): Declare.
> 	* config/aarch64/aarch64.c (aarch64_atomic_ldop_supported_p): New.
> 	(enum aarch64_atomic_load_op_code): New.
> 	(aarch64_emit_atomic_load_op): New.
> 	(aarch64_gen_atomic_ldop): Update to support load-operate
> 	patterns.
> 	* config/aarch64/atomics.md (atomic_<atomic_optab><mode>): Change
> 	to an expander.
> 	(aarch64_atomic_<atomic_optab><mode>): New.
> 	(aarch64_atomic_<atomic_optab><mode>_lse): New.
> 	(atomic_fetch_<atomic_optab><mode>): Change to an expander.
> 	(aarch64_atomic_fetch_<atomic_optab><mode>): New.
> 	(aarch64_atomic_fetch_<atomic_optab><mode>_lse): New.
> 
> gcc/testsuite/
> 2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/atomic-inst-ldadd.c: New.
> 	* gcc.target/aarch64/atomic-inst-ldlogic.c: New.
> 

> From 5d1bc64d7e9509374076e4c4ff5a285d4b073f24 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Fri, 7 Aug 2015 17:10:42 +0100
> Subject: [PATCH 4/5] Use atomic instructions for fetch-update patterns.
> 
> ---
>  gcc/config/aarch64/aarch64-protos.h                |   2 +
>  gcc/config/aarch64/aarch64.c                       | 175 ++++++++++++++++++++-
>  gcc/config/aarch64/atomics.md                      | 101 ++++++++++--
>  .../gcc.target/aarch64/atomic-inst-ldadd.c         |  58 +++++++
>  .../gcc.target/aarch64/atomic-inst-ldlogic.c       | 109 +++++++++++++
>  5 files changed, 433 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index eba4c76..76ebd6f 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
>  void aarch64_expand_compare_and_swap (rtx op[]);
>  void aarch64_split_compare_and_swap (rtx op[]);
>  void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
> +
> +bool aarch64_atomic_ldop_supported_p (enum rtx_code);
>  void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
>  void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
>  
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index dc05c6e..3a1b434 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11064,6 +11064,32 @@ aarch64_expand_compare_and_swap (rtx operands[])
>    emit_insn (gen_rtx_SET (bval, x));
>  }
>  
> +/* Test whether the target supports using an atomic load-operate
> +   instruction for operation CODE: that is, whether the ARMv8.1-A atomic
> +   instructions are available (TARGET_LSE) and CODE is one of the
> +   operations they implement.  Returns FALSE if the operation isn't
> +   supported by the architecture.  */
> +
> +bool
> +aarch64_atomic_ldop_supported_p (enum rtx_code code)
> +{
> +  if (!TARGET_LSE)
> +    return false;
> +
> +  switch (code)
> +    {
> +    case SET:
> +    case AND:
> +    case IOR:
> +    case XOR:
> +    case MINUS:
> +    case PLUS:
> +      return true;
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
>     sequence implementing an atomic operation.  */
>  
> @@ -11206,26 +11232,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
>    emit_insn (gen (dst, mem, value, model));
>  }
>  
> -/* Emit an atomic operation where the architecture supports it.  */
> +/* Operations supported by aarch64_emit_atomic_load_op.  */
> +
> +enum aarch64_atomic_load_op_code
> +{
> +  AARCH64_LDOP_PLUS,	/* A + B  */
> +  AARCH64_LDOP_XOR,	/* A ^ B  */
> +  AARCH64_LDOP_OR,	/* A | B  */
> +  AARCH64_LDOP_BIC	/* A & ~B  */
> +};
> +
> +/* Emit an atomic load-operate.  */
> +
> +static void
> +aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code,
> +			     machine_mode mode, rtx dst, rtx src,
> +			     rtx mem, rtx model)
> +{
> +  typedef rtx (*aarch64_atomic_load_op_fn) (rtx, rtx, rtx, rtx);
> +  const aarch64_atomic_load_op_fn plus[] =
> +  {
> +    gen_aarch64_atomic_loadaddqi,
> +    gen_aarch64_atomic_loadaddhi,
> +    gen_aarch64_atomic_loadaddsi,
> +    gen_aarch64_atomic_loadadddi
> +  };
> +  const aarch64_atomic_load_op_fn eor[] =
> +  {
> +    gen_aarch64_atomic_loadeorqi,
> +    gen_aarch64_atomic_loadeorhi,
> +    gen_aarch64_atomic_loadeorsi,
> +    gen_aarch64_atomic_loadeordi
> +  };
> +  const aarch64_atomic_load_op_fn ior[] =
> +  {
> +    gen_aarch64_atomic_loadsetqi,
> +    gen_aarch64_atomic_loadsethi,
> +    gen_aarch64_atomic_loadsetsi,
> +    gen_aarch64_atomic_loadsetdi
> +  };
> +  const aarch64_atomic_load_op_fn bic[] =
> +  {
> +    gen_aarch64_atomic_loadclrqi,
> +    gen_aarch64_atomic_loadclrhi,
> +    gen_aarch64_atomic_loadclrsi,
> +    gen_aarch64_atomic_loadclrdi
> +  };
> +  aarch64_atomic_load_op_fn gen;
> +  int idx = 0;
> +
> +  switch (mode)
> +    {
> +    case QImode: idx = 0; break;
> +    case HImode: idx = 1; break;
> +    case SImode: idx = 2; break;
> +    case DImode: idx = 3; break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  switch (code)
> +    {
> +    case AARCH64_LDOP_PLUS: gen = plus[idx]; break;
> +    case AARCH64_LDOP_XOR: gen = eor[idx]; break;
> +    case AARCH64_LDOP_OR: gen = ior[idx]; break;
> +    case AARCH64_LDOP_BIC: gen = bic[idx]; break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  emit_insn (gen (dst, mem, src, model));
> +}
> +
> +/* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
> +   location to store the data read from memory.  MEM is the memory location to
> +   read and modify.  MODEL_RTX is the memory ordering to use.  VALUE is the
> +   second operand for the operation.  OUT_DATA may be NULL, in which case
> +   the data read from memory is discarded.  */
>  
>  void
>  aarch64_gen_atomic_ldop (enum rtx_code code, rtx out_data,
>  			 rtx mem, rtx value, rtx model_rtx)
>  {
>    machine_mode mode = GET_MODE (mem);
> +  machine_mode wmode = (mode == DImode ? DImode : SImode);
> +  const bool short_mode = (mode < SImode);
> +  aarch64_atomic_load_op_code ldop_code;
> +  rtx src;
> +  rtx x;
> +
> +  if (out_data)
> +    out_data = gen_lowpart (mode, out_data);
>  
> -  out_data = gen_lowpart (mode, out_data);
> +  /* Make sure the value is in a register, putting it into a destination
> +     register if it needs to be manipulated.  */
> +  if (!register_operand (value, mode)
> +      || code == AND || code == MINUS)
> +    {
> +      src = out_data;
> +      emit_move_insn (src, gen_lowpart (mode, value));
> +    }
> +  else
> +    src = value;
> +  gcc_assert (register_operand (src, mode));
>  
> +  /* Preprocess the data for the operation as necessary.  If the operation is
> +     a SET then emit a swap instruction and finish.  */
>    switch (code)
>      {
>      case SET:
> -      aarch64_emit_atomic_swap (mode, out_data, value, mem, model_rtx);
> +      aarch64_emit_atomic_swap (mode, out_data, src, mem, model_rtx);
>        return;
>  
> +    case MINUS:
> +      /* Negate the value and treat it as a PLUS.  */
> +      {
> +	rtx neg_src;
> +
> +	/* Resize the value if necessary.  */
> +	if (short_mode)
> +	  src = gen_lowpart (wmode, src);
> +
> +	neg_src = gen_rtx_NEG (wmode, src);
> +	emit_insn (gen_rtx_SET (src, neg_src));
> +
> +	if (short_mode)
> +	  src = gen_lowpart (mode, src);
> +      }
> +      /* Fall-through.  */
> +    case PLUS:
> +      ldop_code = AARCH64_LDOP_PLUS;
> +      break;
> +
> +    case IOR:
> +      ldop_code = AARCH64_LDOP_OR;
> +      break;
> +
> +    case XOR:
> +      ldop_code = AARCH64_LDOP_XOR;
> +      break;
> +
> +    case AND:
> +      {
> +	rtx not_src;
> +
> +	/* Resize the value if necessary.  */
> +	if (short_mode)
> +	  src = gen_lowpart (wmode, src);
> +
> +	not_src = gen_rtx_NOT (wmode, src);
> +	emit_insn (gen_rtx_SET (src, not_src));
> +
> +	if (short_mode)
> +	  src = gen_lowpart (mode, src);
> +      }
> +      ldop_code = AARCH64_LDOP_BIC;
> +      break;
> +
>      default:
>        /* The operation can't be done with atomic instructions.  */
>        gcc_unreachable ();
>      }
> +
> +  aarch64_emit_atomic_load_op (ldop_code, mode, out_data, src, mem, model_rtx);
>  }
>  
>  /* Split an atomic operation.  */
> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 11a9d13..e0d8856 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -225,23 +225,63 @@
>    }
>  )
>  
> -(define_insn_and_split "atomic_<atomic_optab><mode>"
> +(define_expand "atomic_<atomic_optab><mode>"
> + [(match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
> +  (atomic_op:ALLI
> +   (match_operand:ALLI 1 "<atomic_op_operand>" "")
> +   (match_operand:SI 2 "const_int_operand"))]
> +  ""
> +  {
> +    rtx (*gen) (rtx, rtx, rtx);
> +
> +    /* Use an atomic load-operate instruction when possible.  */
> +    if (aarch64_atomic_ldop_supported_p (<CODE>))
> +      gen = gen_aarch64_atomic_<atomic_optab><mode>_lse;
> +    else
> +      gen = gen_aarch64_atomic_<atomic_optab><mode>;
> +
> +    emit_insn (gen (operands[0], operands[1], operands[2]));
> +
> +    DONE;
> +  }
> +)
> +
> +(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>"
> + [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
> +   (unspec_volatile:ALLI
> +    [(atomic_op:ALLI (match_dup 0)
> +      (match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
> +     (match_operand:SI 2 "const_int_operand")]
> +    UNSPECV_ATOMIC_OP))
> +  (clobber (reg:CC CC_REGNUM))
> +  (clobber (match_scratch:ALLI 3 "=&r"))
> +  (clobber (match_scratch:SI 4 "=&r"))]
> +  ""
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
> +			     operands[1], operands[2], operands[4]);
> +    DONE;
> +  }
> +)
> +
> +(define_insn_and_split "aarch64_atomic_<atomic_optab><mode>_lse"
>    [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
>      (unspec_volatile:ALLI
>        [(atomic_op:ALLI (match_dup 0)
>  	(match_operand:ALLI 1 "<atomic_op_operand>" "r<const_atomic>"))
> -       (match_operand:SI 2 "const_int_operand")]		;; model
> +       (match_operand:SI 2 "const_int_operand")]
>        UNSPECV_ATOMIC_OP))
> -       (clobber (reg:CC CC_REGNUM))
> -   (clobber (match_scratch:ALLI 3 "=&r"))
> -   (clobber (match_scratch:SI 4 "=&r"))]
> -  ""
> +   (clobber (match_scratch:ALLI 3 "=&r"))]
> +  "TARGET_LSE"
>    "#"
>    "&& reload_completed"
>    [(const_int 0)]
>    {
> -    aarch64_split_atomic_op (<CODE>, NULL, operands[3], operands[0],
> -			     operands[1], operands[2], operands[4]);
> +    aarch64_gen_atomic_ldop (<CODE>, operands[3], operands[0],
> +			     operands[1], operands[2]);
>      DONE;
>    }
>  )
> @@ -268,7 +308,30 @@
>    }
>  )
>  
> -(define_insn_and_split "atomic_fetch_<atomic_optab><mode>"
> +;; Load-operate-store, returning the updated memory data.
> +
> +(define_expand "atomic_fetch_<atomic_optab><mode>"
> + [(match_operand:ALLI 0 "register_operand" "")
> +  (match_operand:ALLI 1 "aarch64_sync_memory_operand" "")
> +  (atomic_op:ALLI
> +   (match_operand:ALLI 2 "<atomic_op_operand>" "")
> +   (match_operand:SI 3 "const_int_operand"))]
> + ""
> +{
> +  rtx (*gen) (rtx, rtx, rtx, rtx);
> +
> +  /* Use an atomic load-operate instruction when possible.  */
> +  if (aarch64_atomic_ldop_supported_p (<CODE>))
> +    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>_lse;
> +  else
> +    gen = gen_aarch64_atomic_fetch_<atomic_optab><mode>;
> +
> +  emit_insn (gen (operands[0], operands[1], operands[2], operands[3]));
> +
> +  DONE;
> +})
> +
> +(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>"
>    [(set (match_operand:ALLI 0 "register_operand" "=&r")
>      (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
>     (set (match_dup 1)
> @@ -291,6 +354,26 @@
>    }
>  )
>  
> +(define_insn_and_split "aarch64_atomic_fetch_<atomic_optab><mode>_lse"
> +  [(set (match_operand:ALLI 0 "register_operand" "=&r")
> +    (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
> +   (set (match_dup 1)
> +    (unspec_volatile:ALLI
> +      [(atomic_op:ALLI (match_dup 1)
> +	(match_operand:ALLI 2 "<atomic_op_operand>" "r<const_atomic>"))
> +       (match_operand:SI 3 "const_int_operand")]
> +      UNSPECV_ATOMIC_LDOP))]
> +  "TARGET_LSE"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    aarch64_gen_atomic_ldop (<CODE>, operands[0], operands[1],
> +			     operands[2], operands[3]);
> +    DONE;
> +  }
> +)
> +
>  (define_insn_and_split "atomic_fetch_nand<mode>"
>    [(set (match_operand:ALLI 0 "register_operand" "=&r")
>      (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
> new file mode 100644
> index 0000000..c21d2ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8-a+lse" } */
> +
> +/* Test ARMv8.1-A Load-ADD instruction.  */
> +
> +#include "atomic-inst-ops.inc"
> +
> +#define TEST TEST_ONE
> +
> +#define LOAD_ADD(FN, TY, MODEL)						\
> +  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
> +  {									\
> +    return __atomic_fetch_add (val, foo, MODEL);			\
> +  }
> +
> +#define LOAD_ADD_NORETURN(FN, TY, MODEL)				\
> +  void FNNAME (FN, TY) (TY* val, TY* foo)				\
> +  {									\
> +    __atomic_fetch_add (val, foo, MODEL);				\
> +  }
> +
> +#define LOAD_SUB(FN, TY, MODEL)						\
> +  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
> +  {									\
> +    return __atomic_fetch_sub (val, foo, MODEL);			\
> +  }
> +
> +#define LOAD_SUB_NORETURN(FN, TY, MODEL)				\
> +  void FNNAME (FN, TY) (TY* val, TY* foo)				\
> +  {									\
> +    __atomic_fetch_sub (val, foo, MODEL);				\
> +  }
> +
> +
> +TEST (load_add, LOAD_ADD)
> +TEST (load_add_notreturn, LOAD_ADD_NORETURN)
> +
> +TEST (load_sub, LOAD_SUB)
> +TEST (load_sub_notreturn, LOAD_SUB_NORETURN)
> +
> +/* { dg-final { scan-assembler-times "ldaddb\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldaddab\t" 16} } */
> +/* { dg-final { scan-assembler-times "ldaddlb\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldaddalb\t" 16} } */
> +
> +/* { dg-final { scan-assembler-times "ldaddh\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldaddah\t" 16} } */
> +/* { dg-final { scan-assembler-times "ldaddlh\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldaddalh\t" 16} } */
> +
> +/* { dg-final { scan-assembler-times "ldadd\t" 16} } */
> +/* { dg-final { scan-assembler-times "ldadda\t" 32} } */
> +/* { dg-final { scan-assembler-times "ldaddl\t" 16} } */
> +/* { dg-final { scan-assembler-times "ldaddal\t" 32} } */
> +
> +/* { dg-final { scan-assembler-not "ldaxr\t" } } */
> +/* { dg-final { scan-assembler-not "stlxr\t" } } */
> +/* { dg-final { scan-assembler-not "dmb" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
> new file mode 100644
> index 0000000..fd0f484
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
> @@ -0,0 +1,109 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8-a+lse" } */
> +
> +/* Test ARMv8.1-A LD<logic-op> instruction.  */
> +
> +#include "atomic-inst-ops.inc"
> +
> +#define TEST TEST_ONE
> +
> +#define LOAD_OR(FN, TY, MODEL)						\
> +  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
> +  {									\
> +    return __atomic_fetch_or (val, foo, MODEL);				\
> +  }
> +
> +#define LOAD_OR_NORETURN(FN, TY, MODEL)					\
> +  void FNNAME (FN, TY) (TY* val, TY* foo)				\
> +  {									\
> +    __atomic_fetch_or (val, foo, MODEL);				\
> +  }
> +
> +#define LOAD_AND(FN, TY, MODEL)						\
> +  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
> +  {									\
> +    return __atomic_fetch_and (val, foo, MODEL);			\
> +  }
> +
> +#define LOAD_AND_NORETURN(FN, TY, MODEL)				\
> +  void FNNAME (FN, TY) (TY* val, TY* foo)				\
> +  {									\
> +    __atomic_fetch_and (val, foo, MODEL);				\
> +  }
> +
> +#define LOAD_XOR(FN, TY, MODEL)						\
> +  TY FNNAME (FN, TY) (TY* val, TY* foo)					\
> +  {									\
> +    return __atomic_fetch_xor (val, foo, MODEL);			\
> +  }
> +
> +#define LOAD_XOR_NORETURN(FN, TY, MODEL)				\
> +  void FNNAME (FN, TY) (TY* val, TY* foo)				\
> +  {									\
> +    __atomic_fetch_xor (val, foo, MODEL);				\
> +  }
> +
> +
> +TEST (load_or, LOAD_OR)
> +TEST (load_or_notreturn, LOAD_OR_NORETURN)
> +
> +TEST (load_and, LOAD_AND)
> +TEST (load_and_notreturn, LOAD_AND_NORETURN)
> +
> +TEST (load_xor, LOAD_XOR)
> +TEST (load_xor_notreturn, LOAD_XOR_NORETURN)
> +
> +/* Load-OR.  */
> +
> +/* { dg-final { scan-assembler-times "ldsetb\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldsetab\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldsetlb\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldsetalb\t" 8} } */
> +
> +/* { dg-final { scan-assembler-times "ldseth\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldsetah\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldsetlh\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldsetalh\t" 8} } */
> +
> +/* { dg-final { scan-assembler-times "ldset\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldseta\t" 16} } */
> +/* { dg-final { scan-assembler-times "ldsetl\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldsetal\t" 16} } */
> +
> +/* Load-AND.  */
> +
> +/* { dg-final { scan-assembler-times "ldclrb\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldclrab\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldclrlb\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldclralb\t" 8} } */
> +
> +/* { dg-final { scan-assembler-times "ldclrh\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldclrah\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldclrlh\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldclralh\t" 8} } */
> +
> +/* { dg-final { scan-assembler-times "ldclr\t" 8} */
> +/* { dg-final { scan-assembler-times "ldclra\t" 16} } */
> +/* { dg-final { scan-assembler-times "ldclrl\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldclral\t" 16} } */
> +
> +/* Load-XOR.  */
> +
> +/* { dg-final { scan-assembler-times "ldeorb\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldeorab\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldeorlb\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldeoralb\t" 8} } */
> +
> +/* { dg-final { scan-assembler-times "ldeorh\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldeorah\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldeorlh\t" 4} } */
> +/* { dg-final { scan-assembler-times "ldeoralh\t" 8} } */
> +
> +/* { dg-final { scan-assembler-times "ldeor\t" 8} */
> +/* { dg-final { scan-assembler-times "ldeora\t" 16} } */
> +/* { dg-final { scan-assembler-times "ldeorl\t" 8} } */
> +/* { dg-final { scan-assembler-times "ldeoral\t" 16} } */
> +
> +/* { dg-final { scan-assembler-not "ldaxr\t" } } */
> +/* { dg-final { scan-assembler-not "stlxr\t" } } */
> +/* { dg-final { scan-assembler-not "dmb" } } */
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.
  2015-09-21 11:44   ` Matthew Wahab
@ 2015-09-21 14:27     ` James Greenhalgh
  0 siblings, 0 replies; 20+ messages in thread
From: James Greenhalgh @ 2015-09-21 14:27 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Sep 21, 2015 at 12:40:32PM +0100, Matthew Wahab wrote:
> On 17/09/15 17:54, Matthew Wahab wrote:
> > ARMv8.1 adds atomic swap and atomic load-operate instructions with
> > optional memory ordering specifiers. This patch uses the ARMv8.1
> > load-operate instructions to implement the atomic_<op>_fetch patterns.
> >
> > The approach is to use the atomic load-operate instruction to atomically
> > load the data and update memory and then to use the loaded data to
> > calculate the value that the instruction would have stored. The
> > calculation attempts to mirror the operation of the atomic instruction.
> > For example, atomic_and_fetch<mode> is implemented with an atomic
> > load-bic so the result is also calculated using a BIC instruction.
> >
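> > To make the recalculation concrete (a sketch, not lifted from the
> > patch; register choices are illustrative): for an __atomic_and_fetch,
> > the expected expansion is along the lines of
> >
> >     mvn	w2, w1		// w2 = ~value
> >     ldclral	w2, w3, [x0]	// w3 = old; memory = old & ~w2 = old & value
> >     bic	w0, w3, w2	// w0 = w3 & ~w2 = old & value
> >
> > so the loaded data is combined with the complemented operand using BIC
> > to reproduce the value that the atomic instruction stored.
> >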
> [...]
> >
> > 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
> >
> >      * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
> >      Adjust declaration.
> >      * config/aarch64/aarch64.c (aarch64_emit_bic): New.
> >      (aarch64_gen_atomic_load_op): Adjust comment.  Add parameter
> >      out_result.  Update to support update-fetch operations.
> >      * config/aarch64/atomics.md (aarch64_atomic_exchange<mode>_lse):
> >      Adjust for change to aarch64_gen_atomic_ldop.
> >      (aarch64_atomic_<atomic_optab><mode>_lse): Likewise.
> >      (aarch64_atomic_fetch_<atomic_optab><mode>_lse): Likewise.
> >      (atomic_<atomic_optab>_fetch<mode>): Change to an expander.
> >      (aarch64_atomic_<atomic_optab>_fetch<mode>): New.
> >      (aarch64_atomic_<atomic_optab>_fetch<mode>_lse): New.
> >
> > gcc/testsuite
> > 2015-09-17  Matthew Wahab  <matthew.wahab@arm.com>
> >
> >      * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
> >      update-fetch operations.
> >      * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.
> >
> 
> I've attached an updated patch that takes into account the review comments
> and the changes made to the rest of the series.
> 
> The changes in this patch:
> - Updated emit_bic for changes in the earlier patch.
> - Simplified the patterns used in the new expanders.
> - Dropped CC clobber from the _lse patterns.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?

OK.

Thanks,
James

> Matthew
> 
> 2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
> 	Adjust declaration.
> 	* config/aarch64/aarch64.c (aarch64_emit_bic): New.
> 	(aarch64_gen_atomic_ldop): Adjust comment.  Add parameter
> 	out_result.  Update to support update-fetch operations.
> 	* config/aarch64/atomics.md (aarch64_atomic_exchange<mode>_lse):
> 	Adjust for change to aarch64_gen_atomic_ldop.
> 	(aarch64_atomic_<atomic_optab><mode>_lse): Likewise.
> 	(aarch64_atomic_fetch_<atomic_optab><mode>_lse): Likewise.
> 	(atomic_<atomic_optab>_fetch<mode>): Change to an expander.
> 	(aarch64_atomic_<atomic_optab>_fetch<mode>): New.
> 	(aarch64_atomic_<atomic_optab>_fetch<mode>_lse): New.
> 
> gcc/testsuite
> 2015-09-21  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
> 	update-fetch operations.
> 	* gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-09-21 14:16 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-17 16:39 [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations Matthew Wahab
2015-09-17 16:42 ` [AArch64][PATCH 2/5] Add BIC instruction Matthew Wahab
2015-09-18  8:11   ` James Greenhalgh
2015-09-21 11:16     ` [AArch64][PATCH 2/5] Make BIC, other logical instructions, available. (was: Add BIC instruction.) Matthew Wahab
2015-09-21 11:31       ` James Greenhalgh
2015-09-17 16:47 ` [AArch64][PATCH 3/5] Add atomic load-operate instructions Matthew Wahab
2015-09-18  8:39   ` James Greenhalgh
2015-09-21 11:23     ` Matthew Wahab
2015-09-17 16:49 ` [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns Matthew Wahab
2015-09-18  9:05   ` James Greenhalgh
2015-09-21 11:32     ` Matthew Wahab
2015-09-21 14:15       ` James Greenhalgh
2015-09-17 16:56 ` [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns Matthew Wahab
2015-09-17 18:56   ` Andrew Pinski
2015-09-18 16:27     ` Ramana Radhakrishnan
2015-09-21 11:44   ` Matthew Wahab
2015-09-21 14:27     ` James Greenhalgh
2015-09-18  7:59 ` [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations James Greenhalgh
2015-09-21 11:06   ` Matthew Wahab
2015-09-21 11:25     ` James Greenhalgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).