public inbox for gcc-patches@gcc.gnu.org
* [PATCH] S/390: Optimize atomic_compare_exchange and atomic_exchange builtins.
@ 2017-03-27 20:50 Dominik Vogt
  2017-03-29 15:22 ` [PATCH v2] " Dominik Vogt
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Dominik Vogt @ 2017-03-27 20:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel, Ulrich Weigand

[-- Attachment #1: Type: text/plain, Size: 404 bytes --]

The attached patch optimizes the atomic_exchange and
atomic_compare_exchange patterns on s390 and s390x (mostly limited
to SImode and DImode).  Besides general optimization, the changes
fix most of the problems reported in PR 80080:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
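For illustration only (not part of the patch, and the function name is
invented): the weak compare-and-swap idiom that benefits from the new
compare-before-CS sequence looks roughly like this.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical example: a try-lock built on a weak compare-and-swap
   against the constant 0.  On z196 and newer, the patched backend first
   compares the memory contents with the constant and branches around the
   expensive CS instruction when the lock is already taken; C11 permits
   this for the weak variant because spurious failures are allowed.  */
static bool
try_lock (atomic_int *lock)
{
  int expected = 0;
  return atomic_compare_exchange_weak (lock, &expected, 1);
}
```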

Bootstrapped and regression tested on a zEC12 with s390 and s390x
biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-ChangeLog --]
[-- Type: text/plain, Size: 1537 bytes --]

gcc/ChangeLog-dv-atomic-gcc7

	* config/s390/s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	(s390_cc_modes_compatible): Export.
	* config/s390/predicates.md ("memory_nosymref_operand"): New predicate
	for compare-and-swap.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes.
	(s390_cc_modes_compatible): Remove static.
	(s390_expand_cs_hqi): Make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): New wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	* config/s390/s390.md (define_peephole2): New peephole to help combine
	a load and a test of volatile memory into the load-and-test pattern.
	("cstorecc4"): Deal with CCZmode too.
	("sne"): Renamed to "sneccz1_ne"; add new variant "sneccz1_eq".
	("sneccz_ne", "sneccz_eq"): New.
	("atomic_compare_and_swap<mode>"): Merge the patterns for small and
	large integers.  Forbid symref memory operands.  Move expander to
	s390.c.
	("atomic_compare_and_swap<mode>_internal")
	("*atomic_compare_and_swap<mode>_1")
	("*atomic_compare_and_swap<mode>_2")
	("*atomic_compare_and_swap<mode>_3"): Forbid symref memory operands.
	("atomic_exchange<mode>"): Allow and implement all integer modes.
gcc/testsuite/ChangeLog-dv-atomic-gcc7

	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.c: New test.

[-- Attachment #3: 0001-S-390-Optimize-atomic_compare_exchange-and-atomic_co.patch --]
[-- Type: text/plain, Size: 36584 bytes --]

From 17822384e33b4b98c299ab25969907eb2b9184ee Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Thu, 23 Feb 2017 17:23:11 +0100
Subject: [PATCH] S/390: Optimize atomic_compare_exchange and
 atomic_exchange builtins.

1) Use the load-and-test instructions for atomic_exchange if the value is 0.
2) If IS_WEAK is true, compare the memory contents before a compare-and-swap
   and skip the CS instruction if the value is not the expected one.
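At the C level, case 1 corresponds roughly to the following sketch (not
taken from the patch; the function name is invented): exchanging in a
constant zero, which the new expander handles with load-and-test instead
of a compare-and-swap loop.

```c
#include <stdatomic.h>

/* Hypothetical example for case 1: atomically fetch the old value and
   clear the word.  atomic_exchange with the constant 0 maps to the new
   load-and-test based expansion on z196 and newer targets.  */
int
take_and_clear (atomic_int *p)
{
  return atomic_exchange (p, 0);
}
```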
---
 gcc/config/s390/predicates.md                      |  13 +
 gcc/config/s390/s390-protos.h                      |   5 +-
 gcc/config/s390/s390.c                             | 178 ++++++++++-
 gcc/config/s390/s390.md                            | 217 +++++++++----
 .../gcc.target/s390/md/atomic_compare_exchange-1.c |  84 ++++++
 .../s390/md/atomic_compare_exchange-1.inc          | 336 +++++++++++++++++++++
 .../gcc.target/s390/md/atomic_exchange-1.c         | 309 +++++++++++++++++++
 7 files changed, 1075 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 0c82efc..aadb454 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -67,6 +67,19 @@
   return true;
 })
 
+;; Like memory_operand, but rejects symbol references.
+(define_predicate "memory_nosymref_operand"
+  (match_operand 0 "memory_operand")
+{
+  if (SYMBOL_REF_P (XEXP (op, 0)))
+    return false;
+  if (GET_CODE (op) == SUBREG
+      && GET_CODE (SUBREG_REG (op)) == MEM
+      && SYMBOL_REF_P (XEXP (XEXP (op, 0), 0)))
+    return false;
+  return true;
+})
+
 ;; Return true if OP is a valid operand for the BRAS instruction.
 ;; Allow SYMBOL_REFs and @PLT stubs.
 
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7f06a20..81644b9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -81,6 +81,7 @@ extern bool s390_overlap_p (rtx, rtx, HOST_WIDE_INT);
 extern bool s390_offset_p (rtx, rtx, rtx);
 extern int tls_symbolic_operand (rtx);
 
+extern machine_mode s390_cc_modes_compatible (machine_mode, machine_mode);
 extern bool s390_match_ccmode (rtx_insn *, machine_mode);
 extern machine_mode s390_tm_ccmode (rtx, rtx, bool);
 extern machine_mode s390_select_ccmode (enum rtx_code, rtx, rtx);
@@ -112,8 +113,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx,
-				rtx, rtx, bool);
+extern bool s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx);
 extern void s390_expand_atomic (machine_mode, enum rtx_code,
 				rtx, rtx, rtx, bool);
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index e800323..14770a2 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1240,7 +1240,7 @@ s390_set_has_landing_pad_p (bool value)
    mode which is compatible with both.  Otherwise, return
    VOIDmode.  */
 
-static machine_mode
+machine_mode
 s390_cc_modes_compatible (machine_mode m1, machine_mode m2)
 {
   if (m1 == m2)
@@ -1751,7 +1751,25 @@ static rtx
 s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
 			    rtx cmp, rtx new_rtx)
 {
-  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, new_rtx));
+  switch (GET_MODE (mem))
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp,
+							 new_rtx));
+      break;
+    case DImode:
+      emit_insn (gen_atomic_compare_and_swapdi_internal (old, mem, cmp,
+							 new_rtx));
+      break;
+    case TImode:
+      emit_insn (gen_atomic_compare_and_swapti_internal (old, mem, cmp,
+							 new_rtx));
+      break;
+    default:
+      gcc_unreachable ();
+    }
   return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
 			    const0_rtx);
 }
@@ -6709,7 +6727,7 @@ s390_two_part_insv (struct alignment_context *ac, rtx *seq1, rtx *seq2,
    the memory location, CMP the old value to compare MEM with and NEW_RTX the
    value to set if CMP == MEM.  */
 
-void
+static void
 s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 		    rtx cmp, rtx new_rtx, bool is_weak)
 {
@@ -6785,6 +6803,160 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 					      NULL_RTX, 1, OPTAB_DIRECT), 1);
 }
 
+/* Variant of s390_expand_cs for SI, DI and TI modes.  */
+static void
+s390_expand_cs_tdsi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		     rtx cmp, rtx new_rtx, bool is_weak)
+{
+  rtx output = vtarget;
+  rtx_code_label *skip_cs_label = NULL;
+  bool do_const_opt = false;
+
+  if (!register_operand (output, mode))
+    output = gen_reg_rtx (mode);
+
+  /* If IS_WEAK is true and CMP is a constant, compare the memory contents
+     with the constant first and skip the compare-and-swap when they differ,
+     because the CS instruction is very expensive and likely to fail anyway.
+     Note 1: This is done only for IS_WEAK because C11 allows spurious
+     failures in that case.
+     Note 2: It may be useful to do this also for a non-constant CMP.
+     Note 3: Currently only targets with "load on condition" are supported
+     (z196 and newer).  */
+
+  if (TARGET_CPU_Z196
+      && (mode == SImode || (mode == DImode && TARGET_ZARCH)))
+    do_const_opt = (is_weak && CONST_INT_P (cmp));
+
+  if (do_const_opt)
+    {
+      const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
+      rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+
+      skip_cs_label = gen_label_rtx ();
+      emit_move_insn (output, mem);
+      emit_move_insn (btarget, const0_rtx);
+      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
+      s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
+      add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
+      /* If the jump is not taken, OUTPUT is the expected value.  */
+      cmp = output;
+      /* Reload newval to a register manually, *after* the compare and jump
+	 above.  Otherwise Reload might place it before the jump.  */
+    }
+  else
+    cmp = force_reg (mode, cmp);
+  new_rtx = force_reg (mode, new_rtx);
+  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx);
+
+  /* We deliberately accept non-register operands in the predicate
+     to ensure the write back to the output operand happens *before*
+     the store-flags code below.  This makes it easier for combine
+     to merge the store-flags code with a potential test-and-branch
+     pattern following (immediately!) afterwards.  */
+  if (output != vtarget)
+    emit_move_insn (vtarget, output);
+
+  if (skip_cs_label != NULL)
+    emit_label (skip_cs_label);
+  if (TARGET_Z196)
+    {
+      rtx cc, cond, ite;
+
+      cc = gen_rtx_REG ((do_const_opt) ? CCZmode : CCZ1mode, CC_REGNUM);
+      cond = gen_rtx_EQ (VOIDmode, cc, const0_rtx);
+      ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx,
+				  (do_const_opt) ? btarget : const0_rtx);
+      emit_insn (gen_rtx_SET (btarget, ite));
+    }
+  else
+    {
+      rtx cc, cond;
+
+      cc = gen_rtx_REG ((do_const_opt) ? CCZmode : CCZ1mode, CC_REGNUM);
+      cond = gen_rtx_EQ (SImode, cc, const0_rtx);
+      emit_insn (gen_cstorecc4 (btarget, cond, cc, const0_rtx));
+    }
+}
+
+/* Expand an atomic compare and swap operation.  MEM is the memory location,
+   CMP the old value to compare MEM with and NEW_RTX the value to set if
+   CMP == MEM.  */
+
+bool
+s390_expand_cs (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		rtx cmp, rtx new_rtx, bool is_weak)
+{
+  if (GET_MODE_BITSIZE (mode) >= 4
+      && GET_MODE_BITSIZE (mode) > MEM_ALIGN (mem))
+    return false;
+
+  /* If the memory address isn't in a register already, reload it now to allow
+     for better optimization in the RTL passes.  Otherwise Reload does it much
+     later and it might end up inside a loop.  */
+  if (!REG_P (XEXP (mem, 0)))
+    {
+      rtx ref;
+
+      ref = force_reg (Pmode, XEXP (mem, 0));
+      mem = gen_rtx_MEM (mode, ref);
+    }
+
+  switch (mode)
+    {
+    case TImode:
+    case DImode:
+    case SImode:
+      s390_expand_cs_tdsi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    case HImode:
+    case QImode:
+      s390_expand_cs_hqi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  return true;
+}
+
+/* Expand an atomic_exchange operation simulated with a compare-and-swap loop.
+   The memory location MEM is set to INPUT.  OUTPUT is set to the previous value
+   of MEM.  */
+
+void
+s390_expand_atomic_exchange_tdsi (rtx output, rtx mem, rtx input)
+{
+  machine_mode mode = GET_MODE (mem);
+  rtx_code_label *csloop;
+
+  if (TARGET_Z196
+      && (mode == DImode || mode == SImode)
+      && CONST_INT_P (input) && INTVAL (input) == 0)
+    {
+      emit_move_insn (output, const0_rtx);
+      if (mode == DImode)
+	emit_insn (gen_atomic_fetch_anddi (output, mem, const0_rtx, input));
+      else
+	emit_insn (gen_atomic_fetch_andsi (output, mem, const0_rtx, input));
+      return;
+    }
+
+  if (!REG_P (input))
+    {
+      rtx tmp;
+
+      tmp = gen_reg_rtx (mode);
+      emit_move_insn (tmp, input);
+      input = tmp;
+    }
+  emit_move_insn (output, mem);
+  csloop = gen_label_rtx ();
+  emit_label (csloop);
+  s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, output, mem, output,
+						      input));
+}
+
 /* Expand an atomic operation CODE of mode MODE.  MEM is the memory location
    and VAL the value to play with.  If AFTER is true then store the value
    MEM holds after the operation, if AFTER is false then store the value MEM
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 93a0bc6..34d76b2 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -893,6 +893,21 @@
   [(set_attr "op_type" "RR<E>,RXY")
    (set_attr "z10prop" "z10_fr_E1,z10_fwd_A3") ])
 
+; Peephole to form a load-and-test from a load of volatile memory followed by
+; a comparison with zero, which combine does not do for volatile memory.
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand")
+	(match_operand:GPR 2 "memory_operand"))
+   (set (reg CC_REGNUM)
+	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
+  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
+   && GENERAL_REG_P (operands[0])
+   && satisfies_constraint_T (operands[2])"
+  [(parallel
+    [(set (reg:CCS CC_REGNUM)
+	  (compare:CCS (match_dup 2) (match_dup 1)))
+     (set (match_dup 0) (match_dup 2))])])
+
 ; ltr, lt, ltgr, ltg
 (define_insn "*tst<mode>_cconly_extimm"
   [(set (reg CC_REGNUM)
@@ -2060,6 +2075,16 @@
    (set_attr "cpu_facility" "*,*,*,*,vx,*,vx,*,*,*,*,*,*")
 ])
 
+(define_split
+  [(parallel
+    [(set (match_operand:GPR 0 "register_operand")
+	  (mem:GPR (match_operand:P 1 "larl_operand")))
+     (set (match_operand:P 2 "register_operand")
+	  (match_dup 1))])]
+  ""
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 0) (mem:GPR (match_dup 2)))])
+
 (define_peephole2
   [(set (match_operand:SI 0 "register_operand" "")
         (mem:SI (match_operand 1 "address_operand" "")))]
@@ -6499,26 +6524,112 @@
   [(parallel
     [(set (match_operand:SI 0 "register_operand" "")
 	  (match_operator:SI 1 "s390_eqne_operator"
-           [(match_operand:CCZ1 2 "register_operand")
+           [(match_operand 2 "register_operand")
 	    (match_operand 3 "const0_operand")]))
      (clobber (reg:CC CC_REGNUM))])]
   ""
-  "emit_insn (gen_sne (operands[0], operands[2]));
-   if (GET_CODE (operands[1]) == EQ)
-     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+  "machine_mode mode = GET_MODE (operands[2]);
+   if (GET_CODE (operands[1]) != EQ && GET_CODE (operands[1]) != NE)
+     FAIL;
+   if (!s390_cc_modes_compatible (mode, CCZmode))
+     FAIL;
+   if (TARGET_Z196)
+     {
+       rtx cc, cond, ite;
+
+       cc = gen_rtx_REG (mode, CC_REGNUM);
+       cond = gen_rtx_EQ (VOIDmode, cc, const0_rtx);
+       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
+       emit_insn (gen_rtx_SET (operands[0], ite));
+     }
+   else
+     {
+       if (mode == CCZ1mode)
+	 {
+	   if (GET_CODE (operands[1]) == EQ && TARGET_EXTIMM)
+	     emit_insn (gen_sneccz1_eq (operands[0], operands[2]));
+	   else
+	     emit_insn (gen_sneccz1_ne (operands[0], operands[2]));
+	 }
+       else
+	 {
+	   if (GET_CODE (operands[1]) == EQ && TARGET_EXTIMM)
+	     emit_insn (gen_sneccz_eq (operands[0], operands[2]));
+	   else
+	     emit_insn (gen_sneccz_ne (operands[0], operands[2]));
+	 }
+       if (GET_CODE (operands[1]) == EQ && !TARGET_EXTIMM)
+	 emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+     }
    DONE;")
 
-(define_insn_and_split "sne"
+(define_insn_and_split "sneccz1_ne"
   [(set (match_operand:SI 0 "register_operand" "=d")
 	(ne:SI (match_operand:CCZ1 1 "register_operand" "0")
-	       (const_int 0)))
-   (clobber (reg:CC CC_REGNUM))]
+	       (const_int 0)))]
   ""
   "#"
   "reload_completed"
-  [(parallel
-    [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28)))
-     (clobber (reg:CC CC_REGNUM))])])
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))])
+
+(define_insn_and_split "sneccz1_eq"
+  [(set (reg:CCZ1 CC_REGNUM)
+	(ne:CCZ1 (match_operand:CCZ1 1 "register_operand" "0")
+		 (const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=d")
+	(eq:SI (match_dup 1) (const_int 0)))]
+  "TARGET_EXTIMM"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))
+   (parallel
+    [(set (reg:CCZ CC_REGNUM)
+	  (compare:CCZ (xor:SI (match_dup 0) (const_int 1))
+		       (const_int 0)))
+     (set (match_dup 0)
+	  (xor:SI (match_dup 0) (const_int 1)))])])
+
+(define_insn_and_split "sneccz_ne"
+  [(set (reg:CCZ1 CC_REGNUM)
+	(eq:CCZ1 (match_operand:CCZ 1 "register_operand" "0")
+		 (const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=d")
+	(ne:SI (match_dup 1) (const_int 0)))]
+  "!TARGET_Z196"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))
+   (parallel
+    [(set (reg:CCAN CC_REGNUM)
+	  (compare:CCAN (neg:SI (abs:SI (match_dup 0)))
+			(const_int 0)))
+     (set (match_dup 0)
+	  (neg:SI (abs:SI (match_dup 0))))])
+   (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 31)))])
+
+(define_insn_and_split "sneccz_eq"
+  [(set (reg:CCZ1 CC_REGNUM)
+	(ne:CCZ1 (match_operand:CCZ 1 "register_operand" "0")
+		 (const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=d")
+	(eq:SI (match_dup 1) (const_int 0)))]
+  "!TARGET_Z196 && TARGET_EXTIMM"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))
+   (parallel
+    [(set (reg:CCAN CC_REGNUM)
+	  (compare:CCAN (neg:SI (abs:SI (match_dup 0)))
+			(const_int 0)))
+     (set (match_dup 0)
+	  (neg:SI (abs:SI (match_dup 0))))])
+   (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 31)))
+   (parallel
+    [(set (reg:CCZ CC_REGNUM)
+	  (compare:CCZ (xor:SI (match_dup 0) (const_int 1))
+		       (const_int 0)))
+     (set (match_dup 0)
+	  (xor:SI (match_dup 0) (const_int 1)))])])
 
 
 ;;
@@ -10174,60 +10285,28 @@
 
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:DGPR 1 "nonimmediate_operand");; oldval output
-   (match_operand:DGPR 2 "memory_operand")	;; memory
-   (match_operand:DGPR 3 "register_operand")	;; expected intput
-   (match_operand:DGPR 4 "register_operand")	;; newval intput
+   (match_operand:DINT 1 "nonimmediate_operand");; oldval output
+   (match_operand:DINT 2 "memory_nosymref_operand")	;; memory
+   (match_operand:DINT 3 "general_operand")	;; expected input
+   (match_operand:DINT 4 "general_operand")	;; newval input
    (match_operand:SI 5 "const_int_operand")	;; is_weak
    (match_operand:SI 6 "const_int_operand")	;; success model
    (match_operand:SI 7 "const_int_operand")]	;; failure model
   ""
 {
-  rtx cc, cmp, output = operands[1];
-
-  if (!register_operand (output, <MODE>mode))
-    output = gen_reg_rtx (<MODE>mode);
-
-  if (MEM_ALIGN (operands[2]) < GET_MODE_BITSIZE (GET_MODE (operands[2])))
+  bool rc;
+  rc = s390_expand_cs (<MODE>mode, operands[0], operands[1], operands[2],
+		       operands[3], operands[4], INTVAL (operands[5]));
+  if (rc)
+    DONE;
+  else
     FAIL;
-
-  emit_insn (gen_atomic_compare_and_swap<mode>_internal
-	     (output, operands[2], operands[3], operands[4]));
-
-  /* We deliberately accept non-register operands in the predicate
-     to ensure the write back to the output operand happens *before*
-     the store-flags code below.  This makes it easier for combine
-     to merge the store-flags code with a potential test-and-branch
-     pattern following (immediately!) afterwards.  */
-  if (output != operands[1])
-    emit_move_insn (operands[1], output);
-
-  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
-  cmp = gen_rtx_EQ (SImode, cc, const0_rtx);
-  emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx));
-  DONE;
-})
-
-(define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:HQI 1 "nonimmediate_operand")	;; oldval output
-   (match_operand:HQI 2 "memory_operand")	;; memory
-   (match_operand:HQI 3 "general_operand")	;; expected intput
-   (match_operand:HQI 4 "general_operand")	;; newval intput
-   (match_operand:SI 5 "const_int_operand")	;; is_weak
-   (match_operand:SI 6 "const_int_operand")	;; success model
-   (match_operand:SI 7 "const_int_operand")]	;; failure model
-  ""
-{
-  s390_expand_cs_hqi (<MODE>mode, operands[0], operands[1], operands[2],
-		      operands[3], operands[4], INTVAL (operands[5]));
-  DONE;
 })
 
 (define_expand "atomic_compare_and_swap<mode>_internal"
   [(parallel
      [(set (match_operand:DGPR 0 "register_operand")
-	   (match_operand:DGPR 1 "memory_operand"))
+	   (match_operand:DGPR 1 "memory_nosymref_operand"))
       (set (match_dup 1)
 	   (unspec_volatile:DGPR
 	     [(match_dup 1)
@@ -10241,7 +10320,7 @@
 ; cdsg, csg
 (define_insn "*atomic_compare_and_swap<mode>_1"
   [(set (match_operand:TDI 0 "register_operand" "=r")
-	(match_operand:TDI 1 "memory_operand" "+S"))
+	(match_operand:TDI 1 "memory_nosymref_operand" "+S"))
    (set (match_dup 1)
 	(unspec_volatile:TDI
 	  [(match_dup 1)
@@ -10258,7 +10337,7 @@
 ; cds, cdsy
 (define_insn "*atomic_compare_and_swapdi_2"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(match_operand:DI 1 "memory_operand" "+Q,S"))
+	(match_operand:DI 1 "memory_nosymref_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:DI
 	  [(match_dup 1)
@@ -10278,7 +10357,7 @@
 ; cs, csy
 (define_insn "*atomic_compare_and_swapsi_3"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-	(match_operand:SI 1 "memory_operand" "+Q,S"))
+	(match_operand:SI 1 "memory_nosymref_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:SI
 	  [(match_dup 1)
@@ -10374,16 +10453,30 @@
   DONE;
 })
 
+;; Pattern to implement atomic_exchange with a compare-and-swap loop.  The
+;; code generated by the middle-end fallback is poor.
 (define_expand "atomic_exchange<mode>"
-  [(match_operand:HQI 0 "register_operand")		;; val out
-   (match_operand:HQI 1 "memory_operand")		;; memory
-   (match_operand:HQI 2 "general_operand")		;; val in
+  [(match_operand:DINT 0 "register_operand")		;; val out
+   (match_operand:DINT 1 "memory_nosymref_operand")	;; memory
+   (match_operand:DINT 2 "general_operand")		;; val in
    (match_operand:SI 3 "const_int_operand")]		;; model
   ""
 {
-  s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1],
-		      operands[2], false);
-  DONE;
+  if (<MODE>mode == HImode || <MODE>mode == QImode)
+    {
+      s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1],
+			  operands[2], false);
+      DONE;
+    }
+  else if (<MODE>mode == SImode || TARGET_ZARCH)
+    {
+      if (MEM_ALIGN (operands[1]) < GET_MODE_BITSIZE (<MODE>mode))
+	FAIL;
+      s390_expand_atomic_exchange_tdsi (operands[0], operands[1], operands[2]);
+      DONE;
+    }
+  else
+    FAIL;
 })
 
 ;;
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
new file mode 100644
index 0000000..5cc026d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
@@ -0,0 +1,84 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+#include <stdio.h>
+
+struct
+{
+#ifdef __s390xx__
+  __int128 dummy128;
+  __int128 mem128;
+#endif
+  long long dummy64;
+  long long mem64;
+  int dummy32;
+  int mem32;
+  short mem16l;
+  short mem16h;
+  char mem8ll;
+  char mem8lh;
+  char mem8hl;
+  char mem8hh;
+} mem_s;
+
+#define TYPE char
+#define FN(SUFFIX) f8 ## SUFFIX
+#define FNS(SUFFIX) "f8" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE short
+#define FN(SUFFIX) f16 ## SUFFIX
+#define FNS(SUFFIX) "f16" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE int
+#define FN(SUFFIX) f32 ## SUFFIX
+#define FNS(SUFFIX) "f32" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE long long
+#define FN(SUFFIX) f64 ## SUFFIX
+#define FNS(SUFFIX) "f64" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#ifdef __s390xx__
+#define TYPE __int128
+#define FN(SUFFIX) f128 ## SUFFIX
+#define FNS(SUFFIX) "f128" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+#endif
+
+int main(void)
+{
+  int err_count = 0;
+  int i;
+
+  for (i = -1; i <= 2; i++)
+    {
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8ll, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8lh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hl, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16l, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16h, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f32_validate(&mem_s.mem32, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f64_validate(&mem_s.mem64, i, 1);
+#ifdef __s390xx__
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f128_validate(&mem_s.mem128, i, 1);
+#endif
+    }
+
+  return err_count;
+}
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
new file mode 100644
index 0000000..199aaa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
@@ -0,0 +1,336 @@
+/* -*-c-*- */
+
+#undef NEW
+#define NEW 3
+
+__attribute__ ((noinline))
+int FN(_bo)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_o)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_b)(TYPE *mem, TYPE old)
+{
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN()(TYPE *mem, TYPE old)
+{
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const != 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c1_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c1_b)(TYPE *mem)
+{
+  TYPE old = 1;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1)(TYPE *mem)
+{
+  TYPE old = 1;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const == 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c0_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c0_b)(TYPE *mem)
+{
+  TYPE old = 0;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0)(TYPE *mem)
+{
+  TYPE old = 0;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+int FN(_validate_mem)(TYPE *mem, TYPE expected_mem)
+{
+  if (*mem != expected_mem)
+    {
+      fprintf(stderr, "  BAD: mem %d != expected mem %d\n",
+	      (int)*mem, (int)expected_mem);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_rc)(int rc, int expected_rc)
+{
+  if (rc != expected_rc)
+    {
+      fprintf(stderr, "  BAD: rc %d != expected rc %d\n",
+	      rc, expected_rc);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_old_ret)(int old_ret, int expected_old_ret)
+{
+  if (old_ret != expected_old_ret)
+    {
+      fprintf(stderr, "  BAD: old_ret %d != expected old_ret %d\n",
+	      old_ret, expected_old_ret);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate)(TYPE *mem, TYPE init_mem, TYPE old)
+{
+  int err_count = 0;
+  int rc;
+  TYPE expected_mem;
+  int expected_rc;
+  TYPE old_ret;
+  int failed;
+  const char *fname;
+
+  fprintf(stderr, "%s: init_mem %d @ %p\n", __FUNCTION__, (int)init_mem, mem);
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_bo);
+    rc = FN(_bo)(mem, &old_ret, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_o);
+    FN(_o)(mem, &old_ret, old);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_b);
+    rc = FN(_b)(mem, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS();
+    FN()(mem, old);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_bo);
+    rc = FN(_c1_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1_o);
+    FN(_c1_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_b);
+    rc = FN(_c1_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1);
+    FN(_c1)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_bo);
+    rc = FN(_c0_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0_o);
+    FN(_c0_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_b);
+    rc = FN(_c0_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0);
+    FN(_c0)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+
+  return err_count;
+}
+
+#undef TYPE
+#undef MEM
+#undef FN
+#undef FNS
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
new file mode 100644
index 0000000..f82b213
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
@@ -0,0 +1,309 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "-lpthread -latomic" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+/**/
+
+char
+ae_8_0 (char *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+char
+ae_8_1 (char *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+char g8;
+
+char
+ae_8_g_0 (void)
+{
+  return __atomic_exchange_n (&g8, 0, 2);
+}
+
+char
+ae_8_g_1 (void)
+{
+  return __atomic_exchange_n (&g8, 1, 2);
+}
+
+/**/
+
+short
+ae_16_0 (short *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+short
+ae_16_1 (short *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+short g16;
+
+short
+ae_16_g_0 (void)
+{
+  return __atomic_exchange_n (&g16, 0, 2);
+}
+
+short
+ae_16_g_1 (void)
+{
+  return __atomic_exchange_n (&g16, 1, 2);
+}
+
+/**/
+
+int
+ae_32_0 (int *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+int
+ae_32_1 (int *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+int g32;
+
+int
+ae_32_g_0 (void)
+{
+  return __atomic_exchange_n (&g32, 0, 2);
+}
+
+int
+ae_32_g_1 (void)
+{
+  return __atomic_exchange_n (&g32, 1, 2);
+}
+
+/**/
+
+long long
+ae_64_0 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+long long
+ae_64_1 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+long long g64;
+
+long long
+ae_64_g_0 (void)
+{
+  return __atomic_exchange_n (&g64, 0, 2);
+}
+
+long long
+ae_64_g_1 (void)
+{
+  return __atomic_exchange_n (&g64, 1, 2);
+}
+
+/**/
+
+#ifdef __s390x__
+__int128
+ae_128_0 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+__int128
+ae_128_1 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+__int128 g128;
+
+__int128
+ae_128_g_0 (void)
+{
+  return __atomic_exchange_n (&g128, 0, 2);
+}
+
+__int128
+ae_128_g_1 (void)
+{
+  return __atomic_exchange_n (&g128, 1, 2);
+}
+
+#endif
+
+int main(void)
+{
+  int i;
+
+  for (i = 0; i <= 2; i++)
+    {
+      int oval = i;
+
+      {
+	char lock;
+	char rval;
+
+	lock = oval;
+	rval = ae_8_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_8_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_0 ();
+	if (g8 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_1 ();
+	if (g8 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	short lock;
+	short rval;
+
+	lock = oval;
+	rval = ae_16_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_16_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_0 ();
+	if (g16 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_1 ();
+	if (g16 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	int lock;
+	int rval;
+
+	lock = oval;
+	rval = ae_32_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_32_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_0 ();
+	if (g32 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_1 ();
+	if (g32 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	long long lock;
+	long long rval;
+
+	lock = oval;
+	rval = ae_64_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_64_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_0 ();
+	if (g64 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_1 ();
+	if (g64 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+
+#ifdef __s390x__
+      {
+	__int128 lock;
+	__int128 rval;
+
+	lock = oval;
+	rval = ae_128_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_128_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_0 ();
+	if (g128 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_1 ();
+	if (g128 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+#endif
+    }
+
+  return 0;
+}
-- 
2.3.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-03-27 20:50 [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins Dominik Vogt
@ 2017-03-29 15:22 ` Dominik Vogt
  2017-04-05 13:52 ` [PATCH] " Dominik Vogt
  2017-04-11 14:21 ` [PATCH v5] " Dominik Vogt
  2 siblings, 0 replies; 18+ messages in thread
From: Dominik Vogt @ 2017-03-29 15:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel, Ulrich Weigand

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> The attached patch optimizes the atomic_exchange and
> atomic_compare_exchange patterns on s390 and s390x (mostly limited to
> SImode and DImode).  Along with general optimizations, the changes fix
> most of the problems reported in PR 80080:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

New version of the patch after internal discussion.  Bootstrapped
and regression tested on s390 and s390x biarch on a zEC12.

v2:
  * Clean up and correct comments.
  * Reformat.
  * Use force_reg.
  * Remove an experimental, not working splitter.
  * Use cstorecc4 more often.
  * Clean up code of the predicate.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-ChangeLog --]
[-- Type: text/plain, Size: 1507 bytes --]

gcc/ChangeLog

	* config/s390/s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	(s390_cc_modes_compatible): Export.
	* config/s390/predicates.md ("memory_nosymref_operand"): New predicate
	for compare-and-swap.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes.
	(s390_cc_modes_compatible): Remove static.
	(s390_expand_cs_hqi): Make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend-specific expander for
	atomic_exchange.
	* config/s390/s390.md (define_peephole2): New peephole to help
	combining the load-and-test pattern with volatile memory.
	("cstorecc4"): Deal with CCZmode too.
	("sne", "sneccz1_ne", "sneccz1_eq"): Renamed and duplicated pattern.
	("sneccz_ne", "sneccz_eq"): New.
	("atomic_compare_and_swap<mode>"): Merge the patterns for small and
	large integers.  Forbid symref memory operands.  Move expander to
	s390.c.
	("atomic_compare_and_swap<mode>_internal")
	("*atomic_compare_and_swap<mode>_1")
	("*atomic_compare_and_swap<mode>_2")
	("*atomic_compare_and_swap<mode>_3"): Forbid symref memory operands.
	("atomic_exchange<mode>"): Allow and implement all integer modes.
gcc/testsuite/ChangeLog

	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.inc: New test.

[-- Attachment #3: 0001-S-390-Optimize-atomic_compare_exchange-and-atomic_co.patch --]
[-- Type: text/plain, Size: 35831 bytes --]

From 1ce8c987f227e3c6095980e52edec07835f6fad7 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Thu, 23 Feb 2017 17:23:11 +0100
Subject: [PATCH] S/390: Optimize atomic_compare_exchange and
 atomic_compare builtins.

1) Use the load-and-test instructions for atomic_exchange if the value is 0.
2) If IS_WEAK is true, compare the memory contents before a compare-and-swap
   and skip the CS instructions if the value is not the expected one.
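
The two builtin forms these optimizations target can be illustrated with a
small, hypothetical C sketch (the function names take_lock_value and
try_update are invented for illustration and are not part of the patch):
exchanging in a constant 0 is the load-and-test candidate from point 1, and
a weak compare-and-swap with a constant expected value is the case from
point 2 where the expander may test the memory first and branch around the
expensive CS instruction.

```c
/* Illustrative sketch only -- the function names are invented; the builtins
   are the standard GCC __atomic ones handled by the new expanders.  */

static int lock;

/* Exchanging in the constant 0: with this patch, a candidate for a
   load-and-test based sequence on z196 and newer.  */
int
take_lock_value (void)
{
  return __atomic_exchange_n (&lock, 0, __ATOMIC_SEQ_CST);
}

/* A *weak* compare-exchange with a constant expected value (here 1) may
   legally fail spuriously, so the expander is allowed to compare the
   memory contents first and skip the CS instruction when they differ.  */
int
try_update (int *mem)
{
  int expected = 1;
  return __atomic_compare_exchange_n (mem, &expected, 2,
				      1 /* weak */,
				      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
}
```

Because the second form is weak, portable callers retry it in a loop; the
constant-compare shortcut is only valid in exactly that situation.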
---
 gcc/config/s390/predicates.md                      |  10 +
 gcc/config/s390/s390-protos.h                      |   5 +-
 gcc/config/s390/s390.c                             | 173 ++++++++++-
 gcc/config/s390/s390.md                            | 199 ++++++++----
 .../gcc.target/s390/md/atomic_compare_exchange-1.c |  84 ++++++
 .../s390/md/atomic_compare_exchange-1.inc          | 336 +++++++++++++++++++++
 .../gcc.target/s390/md/atomic_exchange-1.c         | 309 +++++++++++++++++++
 7 files changed, 1050 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 0c82efc..c902f96 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -67,6 +67,16 @@
   return true;
 })
 
+;; Like memory_operand, but rejects symbol references.
+(define_predicate "memory_nosymref_operand"
+  (match_operand 0 "memory_operand")
+{
+  if (SUBREG_P (op))
+    op = XEXP (op, 0);
+
+  return (GET_CODE (op) == MEM && !SYMBOL_REF_P (XEXP (op, 0)));
+})
+
 ;; Return true if OP is a valid operand for the BRAS instruction.
 ;; Allow SYMBOL_REFs and @PLT stubs.
 
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7f06a20..81644b9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -81,6 +81,7 @@ extern bool s390_overlap_p (rtx, rtx, HOST_WIDE_INT);
 extern bool s390_offset_p (rtx, rtx, rtx);
 extern int tls_symbolic_operand (rtx);
 
+extern machine_mode s390_cc_modes_compatible (machine_mode, machine_mode);
 extern bool s390_match_ccmode (rtx_insn *, machine_mode);
 extern machine_mode s390_tm_ccmode (rtx, rtx, bool);
 extern machine_mode s390_select_ccmode (enum rtx_code, rtx, rtx);
@@ -112,8 +113,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx,
-				rtx, rtx, bool);
+extern bool s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx);
 extern void s390_expand_atomic (machine_mode, enum rtx_code,
 				rtx, rtx, rtx, bool);
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index e800323..91d9daa 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1240,7 +1240,7 @@ s390_set_has_landing_pad_p (bool value)
    mode which is compatible with both.  Otherwise, return
    VOIDmode.  */
 
-static machine_mode
+machine_mode
 s390_cc_modes_compatible (machine_mode m1, machine_mode m2)
 {
   if (m1 == m2)
@@ -1751,7 +1751,25 @@ static rtx
 s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
 			    rtx cmp, rtx new_rtx)
 {
-  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, new_rtx));
+  switch (GET_MODE (mem))
+    {
+    case SImode:
+      emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp,
+							 new_rtx));
+      break;
+    case DImode:
+      emit_insn (gen_atomic_compare_and_swapdi_internal (old, mem, cmp,
+							 new_rtx));
+      break;
+    case TImode:
+      emit_insn (gen_atomic_compare_and_swapti_internal (old, mem, cmp,
+							 new_rtx));
+      break;
+    case QImode:
+    case HImode:
+    default:
+      gcc_unreachable ();
+    }
   return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
 			    const0_rtx);
 }
@@ -6709,7 +6727,7 @@ s390_two_part_insv (struct alignment_context *ac, rtx *seq1, rtx *seq2,
    the memory location, CMP the old value to compare MEM with and NEW_RTX the
    value to set if CMP == MEM.  */
 
-void
+static void
 s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 		    rtx cmp, rtx new_rtx, bool is_weak)
 {
@@ -6785,6 +6803,155 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 					      NULL_RTX, 1, OPTAB_DIRECT), 1);
 }
 
+/* Variant of s390_expand_cs for SI, DI and TI modes.  */
+static void
+s390_expand_cs_tdsi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		     rtx cmp, rtx new_rtx, bool is_weak)
+{
+  rtx output = vtarget;
+  rtx_code_label *skip_cs_label = NULL;
+  bool do_const_opt = false;
+
+  if (!register_operand (output, mode))
+    output = gen_reg_rtx (mode);
+
+  /* If IS_WEAK is true and the INPUT value is a constant, compare the memory
+     with the constant first and skip the compare-and-swap because it is very
+     expensive and likely to fail anyway.
+     Note 1: This is done only for IS_WEAK.  C11 allows optimizations that may
+     cause spurious failures in that case.
+     Note 2: It may be useful to do this also for non-constant INPUT.
+     Note 3: Currently only targets with "load on condition" are supported
+     (z196 and newer).  */
+
+  if (TARGET_CPU_Z196
+      && (mode == SImode
+	  || (mode == DImode && TARGET_ZARCH)))
+    do_const_opt = (is_weak && CONST_INT_P (cmp));
+
+  if (do_const_opt)
+    {
+      const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
+      rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+
+      skip_cs_label = gen_label_rtx ();
+      emit_move_insn (output, mem);
+      emit_move_insn (btarget, const0_rtx);
+      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
+      s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
+      add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
+      /* If the jump is not taken, OUTPUT is the expected value.  */
+      cmp = output;
+      /* Reload newval to a register manually, *after* the compare and jump
+	 above.  Otherwise Reload might place it before the jump.  */
+    }
+  else
+    cmp = force_reg (mode, cmp);
+  new_rtx = force_reg (mode, new_rtx);
+  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx);
+
+  /* We deliberately accept non-register operands in the predicate
+     to ensure the write back to the output operand happens *before*
+     the store-flags code below.  This makes it easier for combine
+     to merge the store-flags code with a potential test-and-branch
+     pattern following (immediately!) afterwards.  */
+  if (output != vtarget)
+    emit_move_insn (vtarget, output);
+
+  if (skip_cs_label != NULL)
+    emit_label (skip_cs_label);
+  if (TARGET_Z196 && do_const_opt)
+    {
+      rtx cc, cond, ite;
+
+      /* Do not use gen_cstorecc4 here because it writes either 1 or 0, but
+	 btarget has already been initialized with 0 above.  */
+      cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+      cond = gen_rtx_EQ (VOIDmode, cc, const0_rtx);
+      ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, btarget);
+      emit_insn (gen_rtx_SET (btarget, ite));
+    }
+  else
+    {
+      rtx cc, cond;
+
+      cc = gen_rtx_REG ((do_const_opt) ? CCZmode : CCZ1mode, CC_REGNUM);
+      cond = gen_rtx_EQ (SImode, cc, const0_rtx);
+      emit_insn (gen_cstorecc4 (btarget, cond, cc, const0_rtx));
+    }
+}
+
+/* Expand an atomic compare and swap operation.  MEM is the memory location,
+   CMP the old value to compare MEM with and NEW_RTX the value to set if
+   CMP == MEM.  */
+
+bool
+s390_expand_cs (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		rtx cmp, rtx new_rtx, bool is_weak)
+{
+  if (GET_MODE_BITSIZE (mode) >= 16
+      && GET_MODE_BITSIZE (mode) > MEM_ALIGN (mem))
+    return false;
+
+  /* If the memory address isn't in a register already, reload it now to allow
+     for better optimization in the RTL passes.  Otherwise Reload does it much
+     later and it might end up inside a loop.  */
+  if (!REG_P (XEXP (mem, 0)))
+    {
+      rtx ref;
+
+      ref = force_reg (Pmode, XEXP (mem, 0));
+      mem = gen_rtx_MEM (mode, ref);
+    }
+
+  switch (mode)
+    {
+    case TImode:
+    case DImode:
+    case SImode:
+      s390_expand_cs_tdsi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    case HImode:
+    case QImode:
+      s390_expand_cs_hqi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  return true;
+}
+
+/* Expand an atomic_exchange operation simulated with a compare-and-swap loop.
+   The memory location MEM is set to INPUT.  OUTPUT is set to the previous value
+   of MEM.  */
+
+void
+s390_expand_atomic_exchange_tdsi (rtx output, rtx mem, rtx input)
+{
+  machine_mode mode = GET_MODE (mem);
+  rtx_code_label *csloop;
+
+  if (TARGET_Z196
+      && (mode == DImode || mode == SImode)
+      && CONST_INT_P (input) && INTVAL (input) == 0)
+    {
+      emit_move_insn (output, const0_rtx);
+      if (mode == DImode)
+	emit_insn (gen_atomic_fetch_anddi (output, mem, const0_rtx, input));
+      else
+	emit_insn (gen_atomic_fetch_andsi (output, mem, const0_rtx, input));
+      return;
+    }
+
+  input = force_reg (mode, input);
+  emit_move_insn (output, mem);
+  csloop = gen_label_rtx ();
+  emit_label (csloop);
+  s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, output, mem, output,
+						      input));
+}
+
 /* Expand an atomic operation CODE of mode MODE.  MEM is the memory location
    and VAL the value to play with.  If AFTER is true then store the value
    MEM holds after the operation, if AFTER is false then store the value MEM
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 93a0bc6..7112ddd 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -893,6 +893,21 @@
   [(set_attr "op_type" "RR<E>,RXY")
    (set_attr "z10prop" "z10_fr_E1,z10_fwd_A3") ])
 
+; Peephole to fuse a load from volatile memory with a subsequent test into a
+; load-and-test instruction; combine does not do this.
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand")
+	(match_operand:GPR 2 "memory_operand"))
+   (set (reg CC_REGNUM)
+	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
+  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
+   && GENERAL_REG_P (operands[0])
+   && satisfies_constraint_T (operands[2])"
+  [(parallel
+    [(set (reg:CCS CC_REGNUM)
+	  (compare:CCS (match_dup 2) (match_dup 1)))
+     (set (match_dup 0) (match_dup 2))])])
+
 ; ltr, lt, ltgr, ltg
 (define_insn "*tst<mode>_cconly_extimm"
   [(set (reg CC_REGNUM)
@@ -6499,26 +6514,110 @@
   [(parallel
     [(set (match_operand:SI 0 "register_operand" "")
 	  (match_operator:SI 1 "s390_eqne_operator"
-           [(match_operand:CCZ1 2 "register_operand")
+           [(match_operand 2 "register_operand")
 	    (match_operand 3 "const0_operand")]))
      (clobber (reg:CC CC_REGNUM))])]
   ""
-  "emit_insn (gen_sne (operands[0], operands[2]));
-   if (GET_CODE (operands[1]) == EQ)
-     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+  "machine_mode mode = GET_MODE (operands[2]);
+   if (!s390_cc_modes_compatible (mode, CCZmode))
+     FAIL;
+   if (TARGET_Z196)
+     {
+       rtx cc, cond, ite;
+
+       cc = gen_rtx_REG (mode, CC_REGNUM);
+       cond = gen_rtx_EQ (VOIDmode, cc, const0_rtx);
+       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
+       emit_insn (gen_rtx_SET (operands[0], ite));
+     }
+   else
+     {
+       if (mode == CCZ1mode)
+	 {
+	   if (GET_CODE (operands[1]) == EQ && TARGET_EXTIMM)
+	     emit_insn (gen_sneccz1_eq (operands[0], operands[2]));
+	   else
+	     emit_insn (gen_sneccz1_ne (operands[0], operands[2]));
+	 }
+       else
+	 {
+	   if (GET_CODE (operands[1]) == EQ && TARGET_EXTIMM)
+	     emit_insn (gen_sneccz_eq (operands[0], operands[2]));
+	   else
+	     emit_insn (gen_sneccz_ne (operands[0], operands[2]));
+	 }
+       if (GET_CODE (operands[1]) == EQ && !TARGET_EXTIMM)
+	 emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+     }
    DONE;")
 
-(define_insn_and_split "sne"
+(define_insn_and_split "sneccz1_ne"
   [(set (match_operand:SI 0 "register_operand" "=d")
 	(ne:SI (match_operand:CCZ1 1 "register_operand" "0")
-	       (const_int 0)))
-   (clobber (reg:CC CC_REGNUM))]
+	       (const_int 0)))]
   ""
   "#"
   "reload_completed"
-  [(parallel
-    [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28)))
-     (clobber (reg:CC CC_REGNUM))])])
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))])
+
+(define_insn_and_split "sneccz1_eq"
+  [(set (reg:CCZ1 CC_REGNUM)
+	(ne:CCZ1 (match_operand:CCZ1 1 "register_operand" "0")
+		 (const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=d")
+	(eq:SI (match_dup 1) (const_int 0)))]
+  "TARGET_EXTIMM"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))
+   (parallel
+    [(set (reg:CCZ CC_REGNUM)
+	  (compare:CCZ (xor:SI (match_dup 0) (const_int 1))
+		       (const_int 0)))
+     (set (match_dup 0)
+	  (xor:SI (match_dup 0) (const_int 1)))])])
+
+(define_insn_and_split "sneccz_ne"
+  [(set (reg:CCZ1 CC_REGNUM)
+	(eq:CCZ1 (match_operand:CCZ 1 "register_operand" "0")
+		 (const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=d")
+	(ne:SI (match_dup 1) (const_int 0)))]
+  "!TARGET_Z196"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))
+   (parallel
+    [(set (reg:CCAN CC_REGNUM)
+	  (compare:CCAN (neg:SI (abs:SI (match_dup 0)))
+			(const_int 0)))
+     (set (match_dup 0)
+	  (neg:SI (abs:SI (match_dup 0))))])
+   (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 31)))])
+
+(define_insn_and_split "sneccz_eq"
+  [(set (reg:CCZ1 CC_REGNUM)
+	(ne:CCZ1 (match_operand:CCZ 1 "register_operand" "0")
+		 (const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=d")
+	(eq:SI (match_dup 1) (const_int 0)))]
+  "!TARGET_Z196 && TARGET_EXTIMM"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))
+   (parallel
+    [(set (reg:CCAN CC_REGNUM)
+	  (compare:CCAN (neg:SI (abs:SI (match_dup 0)))
+			(const_int 0)))
+     (set (match_dup 0)
+	  (neg:SI (abs:SI (match_dup 0))))])
+   (set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 31)))
+   (parallel
+    [(set (reg:CCZ CC_REGNUM)
+	  (compare:CCZ (xor:SI (match_dup 0) (const_int 1))
+		       (const_int 0)))
+     (set (match_dup 0)
+	  (xor:SI (match_dup 0) (const_int 1)))])])
 
 
 ;;
@@ -10174,60 +10273,28 @@
 
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:DGPR 1 "nonimmediate_operand");; oldval output
-   (match_operand:DGPR 2 "memory_operand")	;; memory
-   (match_operand:DGPR 3 "register_operand")	;; expected intput
-   (match_operand:DGPR 4 "register_operand")	;; newval intput
+   (match_operand:DINT 1 "nonimmediate_operand");; oldval output
+   (match_operand:DINT 2 "memory_nosymref_operand")	;; memory
+   (match_operand:DINT 3 "general_operand")	;; expected input
+   (match_operand:DINT 4 "general_operand")	;; newval input
    (match_operand:SI 5 "const_int_operand")	;; is_weak
    (match_operand:SI 6 "const_int_operand")	;; success model
    (match_operand:SI 7 "const_int_operand")]	;; failure model
   ""
 {
-  rtx cc, cmp, output = operands[1];
-
-  if (!register_operand (output, <MODE>mode))
-    output = gen_reg_rtx (<MODE>mode);
-
-  if (MEM_ALIGN (operands[2]) < GET_MODE_BITSIZE (GET_MODE (operands[2])))
+  bool rc;
+  rc = s390_expand_cs (<MODE>mode, operands[0], operands[1], operands[2],
+		       operands[3], operands[4], INTVAL (operands[5]));
+  if (rc)
+    DONE;
+  else
     FAIL;
-
-  emit_insn (gen_atomic_compare_and_swap<mode>_internal
-	     (output, operands[2], operands[3], operands[4]));
-
-  /* We deliberately accept non-register operands in the predicate
-     to ensure the write back to the output operand happens *before*
-     the store-flags code below.  This makes it easier for combine
-     to merge the store-flags code with a potential test-and-branch
-     pattern following (immediately!) afterwards.  */
-  if (output != operands[1])
-    emit_move_insn (operands[1], output);
-
-  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
-  cmp = gen_rtx_EQ (SImode, cc, const0_rtx);
-  emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx));
-  DONE;
-})
-
-(define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:HQI 1 "nonimmediate_operand")	;; oldval output
-   (match_operand:HQI 2 "memory_operand")	;; memory
-   (match_operand:HQI 3 "general_operand")	;; expected intput
-   (match_operand:HQI 4 "general_operand")	;; newval intput
-   (match_operand:SI 5 "const_int_operand")	;; is_weak
-   (match_operand:SI 6 "const_int_operand")	;; success model
-   (match_operand:SI 7 "const_int_operand")]	;; failure model
-  ""
-{
-  s390_expand_cs_hqi (<MODE>mode, operands[0], operands[1], operands[2],
-		      operands[3], operands[4], INTVAL (operands[5]));
-  DONE;
 })
 
 (define_expand "atomic_compare_and_swap<mode>_internal"
   [(parallel
      [(set (match_operand:DGPR 0 "register_operand")
-	   (match_operand:DGPR 1 "memory_operand"))
+	   (match_operand:DGPR 1 "memory_nosymref_operand"))
       (set (match_dup 1)
 	   (unspec_volatile:DGPR
 	     [(match_dup 1)
@@ -10241,7 +10308,7 @@
 ; cdsg, csg
 (define_insn "*atomic_compare_and_swap<mode>_1"
   [(set (match_operand:TDI 0 "register_operand" "=r")
-	(match_operand:TDI 1 "memory_operand" "+S"))
+	(match_operand:TDI 1 "memory_nosymref_operand" "+S"))
    (set (match_dup 1)
 	(unspec_volatile:TDI
 	  [(match_dup 1)
@@ -10258,7 +10325,7 @@
 ; cds, cdsy
 (define_insn "*atomic_compare_and_swapdi_2"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(match_operand:DI 1 "memory_operand" "+Q,S"))
+	(match_operand:DI 1 "memory_nosymref_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:DI
 	  [(match_dup 1)
@@ -10278,7 +10345,7 @@
 ; cs, csy
 (define_insn "*atomic_compare_and_swapsi_3"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-	(match_operand:SI 1 "memory_operand" "+Q,S"))
+	(match_operand:SI 1 "memory_nosymref_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:SI
 	  [(match_dup 1)
@@ -10374,15 +10441,25 @@
   DONE;
 })
 
+;; Pattern to implement atomic_exchange with a compare-and-swap loop.  The
+;; fallback code generated by the middle-end is inefficient.
 (define_expand "atomic_exchange<mode>"
-  [(match_operand:HQI 0 "register_operand")		;; val out
-   (match_operand:HQI 1 "memory_operand")		;; memory
-   (match_operand:HQI 2 "general_operand")		;; val in
+  [(match_operand:DINT 0 "register_operand")		;; val out
+   (match_operand:DINT 1 "memory_nosymref_operand")	;; memory
+   (match_operand:DINT 2 "general_operand")		;; val in
    (match_operand:SI 3 "const_int_operand")]		;; model
   ""
 {
-  s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1],
-		      operands[2], false);
+  if (<MODE>mode != QImode
+      && MEM_ALIGN (operands[1]) < GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+  if (<MODE>mode == HImode || <MODE>mode == QImode)
+    s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1], operands[2],
+			false);
+  else if (<MODE>mode == SImode || TARGET_ZARCH)
+    s390_expand_atomic_exchange_tdsi (operands[0], operands[1], operands[2]);
+  else
+    FAIL;
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
new file mode 100644
index 0000000..5cc026d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
@@ -0,0 +1,84 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+#include <stdio.h>
+
+struct
+{
+#ifdef __s390x__
+  __int128 dummy128;
+  __int128 mem128;
+#endif
+  long long dummy64;
+  long long mem64;
+  int dummy32;
+  int mem32;
+  short mem16l;
+  short mem16h;
+  char mem8ll;
+  char mem8lh;
+  char mem8hl;
+  char mem8hh;
+} mem_s;
+
+#define TYPE char
+#define FN(SUFFIX) f8 ## SUFFIX
+#define FNS(SUFFIX) "f8" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE short
+#define FN(SUFFIX) f16 ##SUFFIX
+#define FNS(SUFFIX) "f16" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE int
+#define FN(SUFFIX) f32 ## SUFFIX
+#define FNS(SUFFIX) "f32" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE long long
+#define FN(SUFFIX) f64 ## SUFFIX
+#define FNS(SUFFIX) "f64" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#ifdef __s390x__
+#define TYPE __int128
+#define FN(SUFFIX) f128 ## SUFFIX
+#define FNS(SUFFIX) "f128" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+#endif
+
+int main(void)
+{
+  int err_count = 0;
+  int i;
+
+  for (i = -1; i <= 2; i++)
+    {
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8ll, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8lh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hl, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16l, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16h, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f32_validate(&mem_s.mem32, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f64_validate(&mem_s.mem64, i, 1);
+#ifdef __s390x__
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f128_validate(&mem_s.mem128, i, 1);
+#endif
+    }
+
+  return err_count;
+}
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
new file mode 100644
index 0000000..199aaa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
@@ -0,0 +1,336 @@
+/* -*-c-*- */
+
+#undef NEW
+#define NEW 3
+
+__attribute__ ((noinline))
+int FN(_bo)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_o)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_b)(TYPE *mem, TYPE old)
+{
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN()(TYPE *mem, TYPE old)
+{
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const != 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c1_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c1_b)(TYPE *mem)
+{
+  TYPE old = 1;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1)(TYPE *mem)
+{
+  TYPE old = 1;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const == 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c0_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c0_b)(TYPE *mem)
+{
+  TYPE old = 0;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0)(TYPE *mem)
+{
+  TYPE old = 0;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+int FN(_validate_mem)(TYPE *mem, TYPE expected_mem)
+{
+  if (*mem != expected_mem)
+    {
+      fprintf(stderr, "  BAD: mem %d != expected mem %d\n",
+	      (int) *mem, (int) expected_mem);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_rc)(int rc, int expected_rc)
+{
+  if (rc != expected_rc)
+    {
+      fprintf(stderr, "  BAD: rc %d != expected rc %d\n",
+	      rc, expected_rc);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_old_ret)(int old_ret, int expected_old_ret)
+{
+  if (old_ret != expected_old_ret)
+    {
+      fprintf(stderr, "  BAD: old_ret %d != expected old_ret %d\n",
+	      old_ret, expected_old_ret);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate)(TYPE *mem, TYPE init_mem, TYPE old)
+{
+  int err_count = 0;
+  int rc;
+  TYPE expected_mem;
+  int expected_rc;
+  TYPE old_ret;
+  int failed;
+  const char *fname;
+
+  fprintf(stderr, "%s: init_mem %d @ %p\n", __FUNCTION__, (int) init_mem, mem);
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_bo);
+    rc = FN(_bo)(mem, &old_ret, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_o);
+    FN(_o)(mem, &old_ret, old);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_b);
+    rc = FN(_b)(mem, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS();
+    FN()(mem, old);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_bo);
+    rc = FN(_c1_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1_o);
+    FN(_c1_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_b);
+    rc = FN(_c1_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1);
+    FN(_c1)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_bo);
+    rc = FN(_c0_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0_o);
+    FN(_c0_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_b);
+    rc = FN(_c0_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0);
+    FN(_c0)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+
+  return err_count;
+}
+
+#undef TYPE
+#undef MEM
+#undef FN
+#undef FNS
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
new file mode 100644
index 0000000..f82b213
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
@@ -0,0 +1,309 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "-lpthread -latomic" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+/**/
+
+char
+ae_8_0 (char *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+char
+ae_8_1 (char *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+char g8;
+
+char
+ae_8_g_0 (void)
+{
+  return __atomic_exchange_n (&g8, 0, 2);
+}
+
+char
+ae_8_g_1 (void)
+{
+  return __atomic_exchange_n (&g8, 1, 2);
+}
+
+/**/
+
+short
+ae_16_0 (short *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+short
+ae_16_1 (short *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+short g16;
+
+short
+ae_16_g_0 (void)
+{
+  return __atomic_exchange_n (&g16, 0, 2);
+}
+
+short
+ae_16_g_1 (void)
+{
+  return __atomic_exchange_n (&g16, 1, 2);
+}
+
+/**/
+
+int
+ae_32_0 (int *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+int
+ae_32_1 (int *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+int g32;
+
+int
+ae_32_g_0 (void)
+{
+  return __atomic_exchange_n (&g32, 0, 2);
+}
+
+int
+ae_32_g_1 (void)
+{
+  return __atomic_exchange_n (&g32, 1, 2);
+}
+
+/**/
+
+long long
+ae_64_0 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+long long
+ae_64_1 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+long long g64;
+
+long long
+ae_64_g_0 (void)
+{
+  return __atomic_exchange_n (&g64, 0, 2);
+}
+
+long long
+ae_64_g_1 (void)
+{
+  return __atomic_exchange_n (&g64, 1, 2);
+}
+
+/**/
+
+#ifdef __s390x__
+__int128
+ae_128_0 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+__int128
+ae_128_1 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+__int128 g128;
+
+__int128
+ae_128_g_0 (void)
+{
+  return __atomic_exchange_n (&g128, 0, 2);
+}
+
+__int128
+ae_128_g_1 (void)
+{
+  return __atomic_exchange_n (&g128, 1, 2);
+}
+
+#endif
+
+int main(void)
+{
+  int i;
+
+  for (i = 0; i <= 2; i++)
+    {
+      int oval = i;
+
+      {
+	char lock;
+	char rval;
+
+	lock = oval;
+	rval = ae_8_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_8_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_0 ();
+	if (g8 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_1 ();
+	if (g8 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	short lock;
+	short rval;
+
+	lock = oval;
+	rval = ae_16_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_16_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_0 ();
+	if (g16 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_1 ();
+	if (g16 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	int lock;
+	int rval;
+
+	lock = oval;
+	rval = ae_32_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_32_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_0 ();
+	if (g32 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_1 ();
+	if (g32 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	long long lock;
+	long long rval;
+
+	lock = oval;
+	rval = ae_64_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_64_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_0 ();
+	if (g64 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_1 ();
+	if (g64 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+
+#ifdef __s390x__
+      {
+	__int128 lock;
+	__int128 rval;
+
+	lock = oval;
+	rval = ae_128_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_128_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_0 ();
+	if (g128 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_1 ();
+	if (g128 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+#endif
+    }
+
+  return 0;
+}
-- 
2.3.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-03-27 20:50 [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins Dominik Vogt
  2017-03-29 15:22 ` [PATCH v2] " Dominik Vogt
@ 2017-04-05 13:52 ` Dominik Vogt
  2017-04-05 15:25   ` Ulrich Weigand
                     ` (2 more replies)
  2017-04-11 14:21 ` [PATCH v5] " Dominik Vogt
  2 siblings, 3 replies; 18+ messages in thread
From: Dominik Vogt @ 2017-04-05 13:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel, Ulrich Weigand

On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> The attached patch optimizes the atomic_exchange and
> atomic_compare patterns on s390 and s390x (mostly limited to
> SImode and DImode).  Among general optimizations, the changes fix
> most of the problems reported in PR 80080:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

New version attached.

v3:

  * Remove sne* patterns.
  * Move alignment check from s390_expand_cs to s390.md.
  * Use s_operand instead of memory_nosymref_operand.
  * Remove memory_nosymref_operand.
  * Allow any CC-mode in cstorecc4 for TARGET_Z196.
  * Fix EQ with TARGET_Z196 in cstorecc4.
  * Duplicate CS patterns for CCZmode.

Bootstrapped and regression tested on a zEC12 with s390 and s390x
biarch.
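
For reference, the constant-compare shortcut added in s390_expand_cs_tdsi
corresponds to source of roughly this shape (a hand-written sketch, not
taken from the patch or its testsuite; the function name and values are
made up for the example):

```c
#include <stdbool.h>

/* Weak compare-and-swap with a constant expected value.  With the patch,
   on z196 and newer the memory word is first compared against the
   constant, and the expensive CS instruction is skipped when it cannot
   succeed; C11 permits this because the weak variant may fail spuriously
   anyway.  */
static bool
try_lock (int *lock)
{
  int expected = 0;
  return __atomic_compare_exchange_n (lock, &expected, 1, 1 /* weak */,
				      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}
```

When *lock is already held, the generated code only loads and compares,
avoiding the store half of the CS instruction entirely.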

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-05 13:52 ` [PATCH] " Dominik Vogt
@ 2017-04-05 15:25   ` Ulrich Weigand
  2017-04-06  9:35   ` Dominik Vogt
  2017-04-07 14:14   ` Dominik Vogt
  2 siblings, 0 replies; 18+ messages in thread
From: Ulrich Weigand @ 2017-04-05 15:25 UTC (permalink / raw)
  To: vogt; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimizations, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.
> 
> New version attached.

No, it isn't :-)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-05 13:52 ` [PATCH] " Dominik Vogt
  2017-04-05 15:25   ` Ulrich Weigand
@ 2017-04-06  9:35   ` Dominik Vogt
  2017-04-06 15:29     ` Ulrich Weigand
  2017-04-07 14:14   ` Dominik Vogt
  2 siblings, 1 reply; 18+ messages in thread
From: Dominik Vogt @ 2017-04-06  9:35 UTC (permalink / raw)
  To: gcc-patches, Andreas Krebbel, Ulrich Weigand

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

On Wed, Apr 05, 2017 at 02:52:00PM +0100, Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimizations, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.
> 
> New version attached.

This time it really is.  :-)

> v3:
> 
>   * Remove sne* patterns.
>   * Move alignment check from s390_expand_cs to s390.md.
>   * Use s_operand instead of memory_nosymref_operand.
>   * Remove memory_nosymref_operand.
>   * Allow any CC-mode in cstorecc4 for TARGET_Z196.
>   * Fix EQ with TARGET_Z196 in cstorecc4.
>   * Duplicate CS patterns for CCZmode.
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v3-ChangeLog --]
[-- Type: text/plain, Size: 1641 bytes --]

gcc/ChangeLog-dv-atomic-gcc7

	* s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes as well as CCZ1mode and CCZmode.
	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
	signature of s390_emit_compare_and_swap.
	(s390_expand_cs_hqi): Likewise, make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	* config/s390/s390.md (CCZZ1): New mode iterator for compare-and-swap.
	(define_peephole2): New peephole to help combining the load-and-test
	pattern with volatile memory.
	("cstorecc4"): Use load-on-condition and deal with CCZmode for
	TARGET_Z196.
	("atomic_compare_and_swap<mode>"): Merge the patterns for small and
	large integers.  Forbid symref memory operands.  Move expander to
	s390.c.  Require cc register.
	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Duplicate for CCZ1mode and
	CCZmode.  Use s_operand to forbid symref memory operands.
	("atomic_exchange<mode>"): Allow and implement all integer modes.

gcc/testsuite/ChangeLog-dv-atomic-gcc7

	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.c: New test.

[-- Attachment #3: 0001-v3-S-390-Optimize-atomic_compare_exchange-and-atomic_co.patch --]
[-- Type: text/plain, Size: 34741 bytes --]

From d5e4c5785eaee076112d8493b5104db6689fe209 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Thu, 23 Feb 2017 17:23:11 +0100
Subject: [PATCH] S/390: Optimize atomic_compare_exchange and
 atomic_compare builtins.

1) Use the load-and-test instructions for atomic_exchange if the value is 0.
2) If IS_WEAK is true, compare the memory contents before a compare-and-swap
   and skip the CS instructions if the value is not the expected one.
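
As an illustration only (this sketch is neither part of the patch nor its
testsuite), case 1 covers atomic exchanges of the constant 0, e.g. handing
back a flag:

```c
/* Atomically replace *p with 0 and return the previous contents.  Only
   the constant 0 qualifies for the load-and-test based sequence; any
   other value still goes through the compare-and-swap loop.  */
long long
clear_flag (long long *p)
{
  return __atomic_exchange_n (p, 0, __ATOMIC_SEQ_CST);
}
```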
---
 gcc/config/s390/s390-protos.h                      |   4 +-
 gcc/config/s390/s390.c                             | 176 ++++++++++-
 gcc/config/s390/s390.md                            | 150 ++++-----
 .../gcc.target/s390/md/atomic_compare_exchange-1.c |  84 ++++++
 .../s390/md/atomic_compare_exchange-1.inc          | 336 +++++++++++++++++++++
 .../gcc.target/s390/md/atomic_exchange-1.c         | 309 +++++++++++++++++++
 6 files changed, 980 insertions(+), 79 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7f06a20..3fdb320 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -112,8 +112,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx,
-				rtx, rtx, bool);
+extern void s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx);
 extern void s390_expand_atomic (machine_mode, enum rtx_code,
 				rtx, rtx, rtx, bool);
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 2cb8947..b1d6088 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1762,11 +1762,40 @@ s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
 
 static rtx
 s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
-			    rtx cmp, rtx new_rtx)
+			    rtx cmp, rtx new_rtx, machine_mode ccmode)
 {
-  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, new_rtx));
-  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
-			    const0_rtx);
+  switch (GET_MODE (mem))
+    {
+    case SImode:
+      if (ccmode == CCZ1mode)
+	emit_insn (gen_atomic_compare_and_swapsiccz1_internal (old, mem, cmp,
+							       new_rtx));
+      else
+	emit_insn (gen_atomic_compare_and_swapsiccz_internal (old, mem, cmp,
+							       new_rtx));
+      break;
+    case DImode:
+      if (ccmode == CCZ1mode)
+	emit_insn (gen_atomic_compare_and_swapdiccz1_internal (old, mem, cmp,
+							       new_rtx));
+      else
+	emit_insn (gen_atomic_compare_and_swapdiccz_internal (old, mem, cmp,
+							      new_rtx));
+      break;
+    case TImode:
+      if (ccmode == CCZ1mode)
+	emit_insn (gen_atomic_compare_and_swapticcz1_internal (old, mem, cmp,
+							       new_rtx));
+      else
+	emit_insn (gen_atomic_compare_and_swapticcz_internal (old, mem, cmp,
+							      new_rtx));
+      break;
+    case QImode:
+    case HImode:
+    default:
+      gcc_unreachable ();
+    }
+  return s390_emit_compare (code, gen_rtx_REG (ccmode, CC_REGNUM), const0_rtx);
 }
 
 /* Emit a jump instruction to TARGET and return it.  If COND is
@@ -6723,7 +6752,7 @@ s390_two_part_insv (struct alignment_context *ac, rtx *seq1, rtx *seq2,
    the memory location, CMP the old value to compare MEM with and NEW_RTX the
    value to set if CMP == MEM.  */
 
-void
+static void
 s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 		    rtx cmp, rtx new_rtx, bool is_weak)
 {
@@ -6770,7 +6799,7 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
   emit_insn (seq2);
   emit_insn (seq3);
 
-  cc = s390_emit_compare_and_swap (EQ, res, ac.memsi, cmpv, newv);
+  cc = s390_emit_compare_and_swap (EQ, res, ac.memsi, cmpv, newv, CCZ1mode);
   if (is_weak)
     emit_insn (gen_cstorecc4 (btarget, cc, XEXP (cc, 0), XEXP (cc, 1)));
   else
@@ -6799,6 +6828,138 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 					      NULL_RTX, 1, OPTAB_DIRECT), 1);
 }
 
+/* Variant of s390_expand_cs for SI, DI and TI modes.  */
+static void
+s390_expand_cs_tdsi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		     rtx cmp, rtx new_rtx, bool is_weak)
+{
+  rtx output = vtarget;
+  rtx_code_label *skip_cs_label = NULL;
+  bool do_const_opt = false;
+
+  if (!register_operand (output, mode))
+    output = gen_reg_rtx (mode);
+
+  /* If IS_WEAK is true and the INPUT value is a constant, compare the memory
+     with the constant first and skip the compare_and_swap because its very
+     expensive and likely to fail anyway.
+     Note 1: This is done only for IS_WEAK.  C11 allows optimizations that may
+     cause spurious in that case.
+     Note 2: It may be useful to do this also for non-constant INPUT.
+     Note 3: Currently only targets with "load on condition" are supported
+     (z196 and newer).  */
+
+  if (TARGET_Z196
+      && (mode == SImode || mode == DImode))
+    do_const_opt = (is_weak && CONST_INT_P (cmp));
+
+  if (do_const_opt)
+    {
+      const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
+      rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+
+      skip_cs_label = gen_label_rtx ();
+      emit_move_insn (output, mem);
+      emit_move_insn (btarget, const0_rtx);
+      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
+      s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
+      add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
+      /* If the jump is not taken, OUTPUT is the expected value.  */
+      cmp = output;
+      /* Reload newval to a register manually, *after* the compare and jump
+	 above.  Otherwise Reload might place it before the jump.  */
+    }
+  else
+    cmp = force_reg (mode, cmp);
+  new_rtx = force_reg (mode, new_rtx);
+  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx,
+			      (do_const_opt) ? CCZmode : CCZ1mode);
+
+  /* We deliberately accept non-register operands in the predicate
+     to ensure the write back to the output operand happens *before*
+     the store-flags code below.  This makes it easier for combine
+     to merge the store-flags code with a potential test-and-branch
+     pattern following (immediately!) afterwards.  */
+  if (output != vtarget)
+    emit_move_insn (vtarget, output);
+
+  if (skip_cs_label != NULL)
+    emit_label (skip_cs_label);
+  if (TARGET_Z196 && do_const_opt)
+    {
+      rtx cc, cond, ite;
+
+      /* Do not use gen_cstorecc4 here because it writes either 1 or 0, but
+	 btarget has already been initialized with 0 above.  */
+      cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+      cond = gen_rtx_EQ (VOIDmode, cc, const0_rtx);
+      ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, btarget);
+      emit_insn (gen_rtx_SET (btarget, ite));
+    }
+  else
+    {
+      rtx cc, cond;
+
+      cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
+      cond = gen_rtx_EQ (SImode, cc, const0_rtx);
+      emit_insn (gen_cstorecc4 (btarget, cond, cc, const0_rtx));
+    }
+}
+
+/* Expand an atomic compare and swap operation.  MEM is the memory location,
+   CMP the old value to compare MEM with and NEW_RTX the value to set if
+   CMP == MEM.  */
+
+void
+s390_expand_cs (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		rtx cmp, rtx new_rtx, bool is_weak)
+{
+  switch (mode)
+    {
+    case TImode:
+    case DImode:
+    case SImode:
+      s390_expand_cs_tdsi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    case HImode:
+    case QImode:
+      s390_expand_cs_hqi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Expand an atomic_exchange operation simulated with a compare-and-swap loop.
+   The memory location MEM is set to INPUT.  OUTPUT is set to the previous value
+   of MEM.  */
+
+void
+s390_expand_atomic_exchange_tdsi (rtx output, rtx mem, rtx input)
+{
+  machine_mode mode = GET_MODE (mem);
+  rtx_code_label *csloop;
+
+  if (TARGET_Z196
+      && (mode == DImode || mode == SImode)
+      && CONST_INT_P (input) && INTVAL (input) == 0)
+    {
+      emit_move_insn (output, const0_rtx);
+      if (mode == DImode)
+	emit_insn (gen_atomic_fetch_anddi (output, mem, const0_rtx, input));
+      else
+	emit_insn (gen_atomic_fetch_andsi (output, mem, const0_rtx, input));
+      return;
+    }
+
+  input = force_reg (mode, input);
+  emit_move_insn (output, mem);
+  csloop = gen_label_rtx ();
+  emit_label (csloop);
+  s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, output, mem, output,
+						      input, CCZ1mode));
+}
+
 /* Expand an atomic operation CODE of mode MODE.  MEM is the memory location
    and VAL the value to play with.  If AFTER is true then store the value
    MEM holds after the operation, if AFTER is false then store the value MEM
@@ -6878,7 +7039,8 @@ s390_expand_atomic (machine_mode mode, enum rtx_code code,
     }
 
   s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, cmp,
-						      ac.memsi, cmp, new_rtx));
+						      ac.memsi, cmp, new_rtx,
+						      CCZ1mode));
 
   /* Return the correct part of the bitfield.  */
   if (target)
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 59f189c..685d847 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -793,6 +793,9 @@
 ;; In place of GET_MODE_SIZE (<MODE>mode)
 (define_mode_attr modesize [(DI "8") (SI "4")])
 
+;; Used for compare-and-swap.
+(define_mode_iterator CCZZ1 [CCZ CCZ1])
+
 ;; Allow return and simple_return to be defined from a single template.
 (define_code_iterator ANY_RETURN [return simple_return])
 
@@ -907,6 +910,21 @@
   [(set_attr "op_type" "RR<E>,RXY")
    (set_attr "z10prop" "z10_fr_E1,z10_fwd_A3") ])
 
+; Peephole to combine a load-and-test from volatile memory which combine does
+; not do.
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand")
+	(match_operand:GPR 2 "memory_operand"))
+   (set (reg CC_REGNUM)
+	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
+  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
+   && GENERAL_REG_P (operands[0])
+   && satisfies_constraint_T (operands[2])"
+  [(parallel
+    [(set (reg:CCS CC_REGNUM)
+	  (compare:CCS (match_dup 2) (match_dup 1)))
+     (set (match_dup 0) (match_dup 2))])])
+
 ; ltr, lt, ltgr, ltg
 (define_insn "*tst<mode>_cconly_extimm"
   [(set (reg CC_REGNUM)
@@ -6518,13 +6536,30 @@
   [(parallel
     [(set (match_operand:SI 0 "register_operand" "")
 	  (match_operator:SI 1 "s390_eqne_operator"
-           [(match_operand:CCZ1 2 "register_operand")
+           [(match_operand 2 "cc_reg_operand")
 	    (match_operand 3 "const0_operand")]))
      (clobber (reg:CC CC_REGNUM))])]
   ""
-  "emit_insn (gen_sne (operands[0], operands[2]));
-   if (GET_CODE (operands[1]) == EQ)
-     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+  "machine_mode mode = GET_MODE (operands[2]);
+   if (TARGET_Z196)
+     {
+       rtx cond, ite;
+
+       if (GET_CODE (operands[1]) == NE)
+	 cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
+       else
+	 cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
+       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
+       emit_insn (gen_rtx_SET (operands[0], ite));
+     }
+   else
+     {
+       if (mode != CCZ1mode)
+	 FAIL;
+       emit_insn (gen_sne (operands[0], operands[2]));
+       if (GET_CODE (operands[1]) == EQ)
+	 emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+     }
    DONE;")
 
 (define_insn_and_split "sne"
@@ -6535,9 +6570,7 @@
   ""
   "#"
   "reload_completed"
-  [(parallel
-    [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28)))
-     (clobber (reg:CC CC_REGNUM))])])
+  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))])
 
 
 ;;
@@ -10198,99 +10231,66 @@
 
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:DGPR 1 "nonimmediate_operand");; oldval output
-   (match_operand:DGPR 2 "memory_operand")	;; memory
-   (match_operand:DGPR 3 "register_operand")	;; expected intput
-   (match_operand:DGPR 4 "register_operand")	;; newval intput
+   (match_operand:DINT 1 "nonimmediate_operand");; oldval output
+   (match_operand:DINT 2 "s_operand")		;; memory
+   (match_operand:DINT 3 "general_operand")	;; expected input
+   (match_operand:DINT 4 "general_operand")	;; newval input
    (match_operand:SI 5 "const_int_operand")	;; is_weak
    (match_operand:SI 6 "const_int_operand")	;; success model
    (match_operand:SI 7 "const_int_operand")]	;; failure model
   ""
 {
-  rtx cc, cmp, output = operands[1];
-
-  if (!register_operand (output, <MODE>mode))
-    output = gen_reg_rtx (<MODE>mode);
-
-  if (MEM_ALIGN (operands[2]) < GET_MODE_BITSIZE (GET_MODE (operands[2])))
+  if (GET_MODE_BITSIZE (<MODE>mode) >= 16
+      && GET_MODE_BITSIZE (<MODE>mode) > MEM_ALIGN (operands[2]))
     FAIL;
 
-  emit_insn (gen_atomic_compare_and_swap<mode>_internal
-	     (output, operands[2], operands[3], operands[4]));
-
-  /* We deliberately accept non-register operands in the predicate
-     to ensure the write back to the output operand happens *before*
-     the store-flags code below.  This makes it easier for combine
-     to merge the store-flags code with a potential test-and-branch
-     pattern following (immediately!) afterwards.  */
-  if (output != operands[1])
-    emit_move_insn (operands[1], output);
-
-  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
-  cmp = gen_rtx_EQ (SImode, cc, const0_rtx);
-  emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx));
-  DONE;
-})
-
-(define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:HQI 1 "nonimmediate_operand")	;; oldval output
-   (match_operand:HQI 2 "memory_operand")	;; memory
-   (match_operand:HQI 3 "general_operand")	;; expected intput
-   (match_operand:HQI 4 "general_operand")	;; newval intput
-   (match_operand:SI 5 "const_int_operand")	;; is_weak
-   (match_operand:SI 6 "const_int_operand")	;; success model
-   (match_operand:SI 7 "const_int_operand")]	;; failure model
-  ""
-{
-  s390_expand_cs_hqi (<MODE>mode, operands[0], operands[1], operands[2],
-		      operands[3], operands[4], INTVAL (operands[5]));
-  DONE;
-})
+  s390_expand_cs (<MODE>mode, operands[0], operands[1], operands[2],
+		  operands[3], operands[4], INTVAL (operands[5]));
+  DONE;})
 
-(define_expand "atomic_compare_and_swap<mode>_internal"
+(define_expand "atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal"
   [(parallel
      [(set (match_operand:DGPR 0 "register_operand")
-	   (match_operand:DGPR 1 "memory_operand"))
+	   (match_operand:DGPR 1 "s_operand"))
       (set (match_dup 1)
 	   (unspec_volatile:DGPR
 	     [(match_dup 1)
 	      (match_operand:DGPR 2 "register_operand")
 	      (match_operand:DGPR 3 "register_operand")]
 	     UNSPECV_CAS))
-      (set (reg:CCZ1 CC_REGNUM)
-	   (compare:CCZ1 (match_dup 1) (match_dup 2)))])]
+      (set (reg:CCZZ1 CC_REGNUM)
+	   (compare:CCZZ1 (match_dup 1) (match_dup 2)))])]
   "")
 
 ; cdsg, csg
-(define_insn "*atomic_compare_and_swap<mode>_1"
+(define_insn "*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1"
   [(set (match_operand:TDI 0 "register_operand" "=r")
-	(match_operand:TDI 1 "memory_operand" "+S"))
+	(match_operand:TDI 1 "s_operand" "+S"))
    (set (match_dup 1)
 	(unspec_volatile:TDI
 	  [(match_dup 1)
 	   (match_operand:TDI 2 "register_operand" "0")
 	   (match_operand:TDI 3 "register_operand" "r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
+   (set (reg:CCZZ1 CC_REGNUM)
+	(compare:CCZZ1 (match_dup 1) (match_dup 2)))]
   "TARGET_ZARCH"
   "c<td>sg\t%0,%3,%S1"
   [(set_attr "op_type" "RSY")
    (set_attr "type"   "sem")])
 
 ; cds, cdsy
-(define_insn "*atomic_compare_and_swapdi_2"
+(define_insn "*atomic_compare_and_swapdi<CCZZ1:mode>_2"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(match_operand:DI 1 "memory_operand" "+Q,S"))
+	(match_operand:DI 1 "s_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:DI
 	  [(match_dup 1)
 	   (match_operand:DI 2 "register_operand" "0,0")
 	   (match_operand:DI 3 "register_operand" "r,r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
+   (set (reg:CCZZ1 CC_REGNUM)
+	(compare:CCZZ1 (match_dup 1) (match_dup 2)))]
   "!TARGET_ZARCH"
   "@
    cds\t%0,%3,%S1
@@ -10300,17 +10300,17 @@
    (set_attr "type" "sem")])
 
 ; cs, csy
-(define_insn "*atomic_compare_and_swapsi_3"
+(define_insn "*atomic_compare_and_swapsi<CCZZ1:mode>_3"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-	(match_operand:SI 1 "memory_operand" "+Q,S"))
+	(match_operand:SI 1 "s_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:SI
 	  [(match_dup 1)
 	   (match_operand:SI 2 "register_operand" "0,0")
 	   (match_operand:SI 3 "register_operand" "r,r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
+   (set (reg:CCZZ1 CC_REGNUM)
+	(compare:CCZZ1 (match_dup 1) (match_dup 2)))]
   ""
   "@
    cs\t%0,%3,%S1
@@ -10398,15 +10398,25 @@
   DONE;
 })
 
+;; Pattern to implement atomic_exchange with a compare-and-swap loop.  The
+;; code the middle-end generates for this is suboptimal.
 (define_expand "atomic_exchange<mode>"
-  [(match_operand:HQI 0 "register_operand")		;; val out
-   (match_operand:HQI 1 "memory_operand")		;; memory
-   (match_operand:HQI 2 "general_operand")		;; val in
+  [(match_operand:DINT 0 "register_operand")		;; val out
+   (match_operand:DINT 1 "s_operand")			;; memory
+   (match_operand:DINT 2 "general_operand")		;; val in
    (match_operand:SI 3 "const_int_operand")]		;; model
   ""
 {
-  s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1],
-		      operands[2], false);
+  if (<MODE>mode != QImode
+      && MEM_ALIGN (operands[1]) < GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+  if (<MODE>mode == HImode || <MODE>mode == QImode)
+    s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1], operands[2],
+			false);
+  else if (<MODE>mode == SImode || TARGET_ZARCH)
+    s390_expand_atomic_exchange_tdsi (operands[0], operands[1], operands[2]);
+  else
+    FAIL;
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
new file mode 100644
index 0000000..5cc026d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
@@ -0,0 +1,84 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+#include <stdio.h>
+
+struct
+{
+#ifdef __s390xx__
+  __int128 dummy128;
+  __int128 mem128;
+#endif
+  long long dummy64;
+  long long mem64;
+  int dummy32;
+  int mem32;
+  short mem16l;
+  short mem16h;
+  char mem8ll;
+  char mem8lh;
+  char mem8hl;
+  char mem8hh;
+} mem_s;
+
+#define TYPE char
+#define FN(SUFFIX) f8 ## SUFFIX
+#define FNS(SUFFIX) "f8" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE short
+#define FN(SUFFIX) f16 ##SUFFIX
+#define FNS(SUFFIX) "f16" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE int
+#define FN(SUFFIX) f32 ## SUFFIX
+#define FNS(SUFFIX) "f32" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE long long
+#define FN(SUFFIX) f64 ## SUFFIX
+#define FNS(SUFFIX) "f64" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#ifdef __s390xx__
+#define TYPE __int128
+#define FN(SUFFIX) f128 ## SUFFIX
+#define FNS(SUFFIX) "f128" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+#endif
+
+int main(void)
+{
+  int err_count = 0;
+  int i;
+
+  for (i = -1; i <= 2; i++)
+    {
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8ll, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8lh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hl, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16l, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16h, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f32_validate(&mem_s.mem32, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f64_validate(&mem_s.mem64, i, 1);
+#ifdef __s390xx__
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f128_validate(&mem_s.mem128, i, 1);
+#endif
+    }
+
+  return err_count;
+}
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
new file mode 100644
index 0000000..199aaa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
@@ -0,0 +1,336 @@
+/* -*-c-*- */
+
+#undef NEW
+#define NEW 3
+
+__attribute__ ((noinline))
+int FN(_bo)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_o)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_b)(TYPE *mem, TYPE old)
+{
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN()(TYPE *mem, TYPE old)
+{
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const != 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c1_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c1_b)(TYPE *mem)
+{
+  TYPE old = 1;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1)(TYPE *mem)
+{
+  TYPE old = 1;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const == 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c0_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c0_b)(TYPE *mem)
+{
+  TYPE old = 0;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0)(TYPE *mem)
+{
+  TYPE old = 0;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+int FN(_validate_mem)(TYPE *mem, TYPE expected_mem)
+{
+  if (*mem != expected_mem)
+    {
+      fprintf(stderr, "  BAD: mem %d != expected mem %d\n",
+	      *mem, expected_mem);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_rc)(int rc, int expected_rc)
+{
+  if (rc != expected_rc)
+    {
+      fprintf(stderr, "  BAD: rc %d != expected rc %d\n",
+	      rc, expected_rc);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_old_ret)(int old_ret, int expected_old_ret)
+{
+  if (old_ret != expected_old_ret)
+    {
+      fprintf(stderr, "  BAD: old_ret %d != expected old_ret %d\n",
+	      old_ret, expected_old_ret);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate)(TYPE *mem, TYPE init_mem, TYPE old)
+{
+  int err_count = 0;
+  int rc;
+  TYPE expected_mem;
+  int expected_rc;
+  TYPE old_ret;
+  int failed;
+  const char *fname;
+
+  fprintf(stderr, "%s: init_mem %d @ %p\n", __FUNCTION__, init_mem, mem);
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_bo);
+    rc = FN(_bo)(mem, &old_ret, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_o);
+    FN(_o)(mem, &old_ret, old);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_b);
+    rc = FN(_b)(mem, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS();
+    FN()(mem, old);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_bo);
+    rc = FN(_c1_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1_o);
+    FN(_c1_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_b);
+    rc = FN(_c1_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1);
+    FN(_c1)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_bo);
+    rc = FN(_c0_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0_o);
+    FN(_c0_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_b);
+    rc = FN(_c0_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0);
+    FN(_c0)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+
+  return err_count;
+}
+
+#undef TYPE
+#undef MEM
+#undef FN
+#undef FNS
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
new file mode 100644
index 0000000..f82b213
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
@@ -0,0 +1,309 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "-lpthread -latomic" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+/**/
+
+char
+ae_8_0 (char *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+char
+ae_8_1 (char *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+char g8;
+
+char
+ae_8_g_0 (void)
+{
+  return __atomic_exchange_n (&g8, 0, 2);
+}
+
+char
+ae_8_g_1 (void)
+{
+  return __atomic_exchange_n (&g8, 1, 2);
+}
+
+/**/
+
+short
+ae_16_0 (short *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+short
+ae_16_1 (short *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+short g16;
+
+short
+ae_16_g_0 (void)
+{
+  return __atomic_exchange_n (&g16, 0, 2);
+}
+
+short
+ae_16_g_1 (void)
+{
+  return __atomic_exchange_n (&g16, 1, 2);
+}
+
+/**/
+
+int
+ae_32_0 (int *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+int
+ae_32_1 (int *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+int g32;
+
+int
+ae_32_g_0 (void)
+{
+  return __atomic_exchange_n (&g32, 0, 2);
+}
+
+int
+ae_32_g_1 (void)
+{
+  return __atomic_exchange_n (&g32, 1, 2);
+}
+
+/**/
+
+long long
+ae_64_0 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+long long
+ae_64_1 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+long long g64;
+
+long long
+ae_64_g_0 (void)
+{
+  return __atomic_exchange_n (&g64, 0, 2);
+}
+
+long long
+ae_64_g_1 (void)
+{
+  return __atomic_exchange_n (&g64, 1, 2);
+}
+
+/**/
+
+#ifdef __s390x__
+__int128
+ae_128_0 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+__int128
+ae_128_1 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+__int128 g128;
+
+__int128
+ae_128_g_0 (void)
+{
+  return __atomic_exchange_n (&g128, 0, 2);
+}
+
+__int128
+ae_128_g_1 (void)
+{
+  return __atomic_exchange_n (&g128, 1, 2);
+}
+
+#endif
+
+int main(void)
+{
+  int i;
+
+  for (i = 0; i <= 2; i++)
+    {
+      int oval = i;
+
+      {
+	char lock;
+	char rval;
+
+	lock = oval;
+	rval = ae_8_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_8_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_0 ();
+	if (g8 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_1 ();
+	if (g8 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	short lock;
+	short rval;
+
+	lock = oval;
+	rval = ae_16_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_16_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_0 ();
+	if (g16 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_1 ();
+	if (g16 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	int lock;
+	int rval;
+
+	lock = oval;
+	rval = ae_32_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_32_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_0 ();
+	if (g32 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_1 ();
+	if (g32 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	long long lock;
+	long long rval;
+
+	lock = oval;
+	rval = ae_64_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_64_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_0 ();
+	if (g64 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_1 ();
+	if (g64 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+
+#ifdef __s390x__
+      {
+	__int128 lock;
+	__int128 rval;
+
+	lock = oval;
+	rval = ae_128_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_128_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_0 ();
+	if (g128 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_1 ();
+	if (g128 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+#endif
+    }
+
+  return 0;
+}
-- 
2.3.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-06  9:35   ` Dominik Vogt
@ 2017-04-06 15:29     ` Ulrich Weigand
  2017-04-06 15:34       ` Ulrich Weigand
  0 siblings, 1 reply; 18+ messages in thread
From: Ulrich Weigand @ 2017-04-06 15:29 UTC (permalink / raw)
  To: vogt; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

Dominik Vogt wrote:

> > v3:
> > 
> >   * Remove sne* patterns.
> >   * Move alignment check from s390_expand_cs to s390.md.
> >   * Use s_operand instead of memory_nosymref_operand.
> >   * Remove memory_nosymref_operand.
> >   * Allow any CC-mode in cstorecc4 for TARGET_Z196.
> >   * Fix EQ with TARGET_Z196 in cstorecc4.
> >   * Duplicate CS patterns for CCZmode.
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.

>  s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
> -			    rtx cmp, rtx new_rtx)
> +			    rtx cmp, rtx new_rtx, machine_mode ccmode)
>  {
> -  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, new_rtx));
> -  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
> -			    const0_rtx);
> +  switch (GET_MODE (mem))
> +    {
> +    case SImode:
> +      if (ccmode == CCZ1mode)
> +	emit_insn (gen_atomic_compare_and_swapsiccz1_internal (old, mem, cmp,
> +							       new_rtx));
> +      else
> +	emit_insn (gen_atomic_compare_and_swapsiccz_internal (old, mem, cmp,
> +							       new_rtx));
> +      break;
> +    case DImode:
> +      if (ccmode == CCZ1mode)
> +	emit_insn (gen_atomic_compare_and_swapdiccz1_internal (old, mem, cmp,
> +							       new_rtx));
> +      else
> +	emit_insn (gen_atomic_compare_and_swapdiccz_internal (old, mem, cmp,
> +							      new_rtx));
> +      break;
> +    case TImode:
> +      if (ccmode == CCZ1mode)
> +	emit_insn (gen_atomic_compare_and_swapticcz1_internal (old, mem, cmp,
> +							       new_rtx));
> +      else
> +	emit_insn (gen_atomic_compare_and_swapticcz_internal (old, mem, cmp,
> +							      new_rtx));
> +      break;

These expanders don't really do anything different depending on the
mode of the accessed word (SI/DI/TImode), so this seems like a bit of
unnecessary duplication.  The original code was correct in always
calling the SImode variant, even if this looks a bit odd.  Maybe a
better fix is to just remove the mode from this expander.

> +  if (TARGET_Z196
> +      && (mode == SImode || mode == DImode))
> +    do_const_opt = (is_weak && CONST_INT_P (cmp));
> +
> +  if (do_const_opt)
> +    {
> +      const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
> +      rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
> +
> +      skip_cs_label = gen_label_rtx ();
> +      emit_move_insn (output, mem);
> +      emit_move_insn (btarget, const0_rtx);
> +      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
> +      s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
> +      add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
> +      /* If the jump is not taken, OUTPUT is the expected value.  */
> +      cmp = output;
> +      /* Reload newval to a register manually, *after* the compare and jump
> +	 above.  Otherwise Reload might place it before the jump.  */
> +    }
> +  else
> +    cmp = force_reg (mode, cmp);
> +  new_rtx = force_reg (mode, new_rtx);
> +  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx,
> +			      (do_const_opt) ? CCZmode : CCZ1mode);
> +
> +  /* We deliberately accept non-register operands in the predicate
> +     to ensure the write back to the output operand happens *before*
> +     the store-flags code below.  This makes it easier for combine
> +     to merge the store-flags code with a potential test-and-branch
> +     pattern following (immediately!) afterwards.  */
> +  if (output != vtarget)
> +    emit_move_insn (vtarget, output);
> +
> +  if (skip_cs_label != NULL)
> +      emit_label (skip_cs_label);

So if do_const_opt is true, but output != vtarget, the code above will
write to output, but this is then never moved to vtarget.  This looks
incorrect.

> +  if (TARGET_Z196 && do_const_opt)

do_const_opt seems to always imply TARGET_Z196.

> +; Peephole to combine a load-and-test from volatile memory which combine does
> +; not do.
> +(define_peephole2
> +  [(set (match_operand:GPR 0 "register_operand")
> +	(match_operand:GPR 2 "memory_operand"))
> +   (set (reg CC_REGNUM)
> +	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> +   && GENERAL_REG_P (operands[0])
> +   && satisfies_constraint_T (operands[2])"
> +  [(parallel
> +    [(set (reg:CCS CC_REGNUM)
> +	  (compare:CCS (match_dup 2) (match_dup 1)))
> +     (set (match_dup 0) (match_dup 2))])])

We should really try to understand why this isn't done earlier and
fix the problem there ...

>    [(parallel
>      [(set (match_operand:SI 0 "register_operand" "")
>  	  (match_operator:SI 1 "s390_eqne_operator"
> -           [(match_operand:CCZ1 2 "register_operand")
> +           [(match_operand 2 "cc_reg_operand")
>  	    (match_operand 3 "const0_operand")]))
>       (clobber (reg:CC CC_REGNUM))])]
>    ""
> -  "emit_insn (gen_sne (operands[0], operands[2]));
> -   if (GET_CODE (operands[1]) == EQ)
> -     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> +  "machine_mode mode = GET_MODE (operands[2]);
> +   if (TARGET_Z196)
> +     {
> +       rtx cond, ite;
> +
> +       if (GET_CODE (operands[1]) == NE)
> +	 cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> +       else
> +	 cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);

I guess as a result cond is now always the same as operands[1] and
could be just taken from there?

> +       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> +       emit_insn (gen_rtx_SET (operands[0], ite));

Also, since you're just emitting RTL directly, maybe you could simply use
the expander pattern above to do so (and not use emit_insn followed
by DONE in this case?)

> @@ -6535,9 +6570,7 @@
>    ""
>    "#"
>    "reload_completed"
> -  [(parallel
> -    [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28)))
> -     (clobber (reg:CC CC_REGNUM))])])
> +  [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (const_int 28)))])

Why is this necessary?

> -(define_expand "atomic_compare_and_swap<mode>_internal"
> +(define_expand "atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal"
>    [(parallel
>       [(set (match_operand:DGPR 0 "register_operand")
> -	   (match_operand:DGPR 1 "memory_operand"))
> +	   (match_operand:DGPR 1 "s_operand"))
>        (set (match_dup 1)
>  	   (unspec_volatile:DGPR
>  	     [(match_dup 1)
>  	      (match_operand:DGPR 2 "register_operand")
>  	      (match_operand:DGPR 3 "register_operand")]
>  	     UNSPECV_CAS))
> -      (set (reg:CCZ1 CC_REGNUM)
> -	   (compare:CCZ1 (match_dup 1) (match_dup 2)))])]
> +      (set (reg:CCZZ1 CC_REGNUM)
> +	   (compare:CCZZ1 (match_dup 1) (match_dup 2)))])]
>    "")

See above ...

>  ; cdsg, csg
> -(define_insn "*atomic_compare_and_swap<mode>_1"
> +(define_insn "*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1"
>    [(set (match_operand:TDI 0 "register_operand" "=r")
> -	(match_operand:TDI 1 "memory_operand" "+S"))
> +	(match_operand:TDI 1 "s_operand" "+S"))
>     (set (match_dup 1)
>  	(unspec_volatile:TDI
>  	  [(match_dup 1)
>  	   (match_operand:TDI 2 "register_operand" "0")
>  	   (match_operand:TDI 3 "register_operand" "r")]
>  	  UNSPECV_CAS))
> -   (set (reg:CCZ1 CC_REGNUM)
> -	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
> +   (set (reg:CCZZ1 CC_REGNUM)
> +	(compare:CCZZ1 (match_dup 1) (match_dup 2)))]
>    "TARGET_ZARCH"
>    "c<td>sg\t%0,%3,%S1"
>    [(set_attr "op_type" "RSY")
>     (set_attr "type"   "sem")])

So for all these *insn* patterns, I think they don't need to be
duplicated at all, instead they should simply use the 
s390_match_ccmode mechanism to allow multiple CCmodes.

It seems that CCZ1mode will have to be added to
s390_match_ccmode_set to make this work: if set_mode
is CCZ1mode or CCZmode, a req_mode of CCZ1mode should
be accepted.  Then this pattern can simply use
  && s390_match_ccmode(insn, CCZ1mode)


Bye,
Ulrich


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-06 15:29     ` Ulrich Weigand
@ 2017-04-06 15:34       ` Ulrich Weigand
  0 siblings, 0 replies; 18+ messages in thread
From: Ulrich Weigand @ 2017-04-06 15:34 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: vogt, gcc-patches, Andreas Krebbel, Ulrich Weigand

I wrote (incorrectly):

> >    [(parallel
> >      [(set (match_operand:SI 0 "register_operand" "")
> >  	  (match_operator:SI 1 "s390_eqne_operator"
> > -           [(match_operand:CCZ1 2 "register_operand")
> > +           [(match_operand 2 "cc_reg_operand")
> >  	    (match_operand 3 "const0_operand")]))
> >       (clobber (reg:CC CC_REGNUM))])]
> >    ""
> > -  "emit_insn (gen_sne (operands[0], operands[2]));
> > -   if (GET_CODE (operands[1]) == EQ)
> > -     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> > +  "machine_mode mode = GET_MODE (operands[2]);
> > +   if (TARGET_Z196)
> > +     {
> > +       rtx cond, ite;
> > +
> > +       if (GET_CODE (operands[1]) == NE)
> > +	 cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> > +       else
> > +	 cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> 
> I guess as a result cond is now always the same as operands[1] and
> could be just taken from there?

This is wrong -- I didn't notice the mode changes (in the cstore
pattern, the mode on the operator is SImode, but on the if_then_else
we want VOIDmode).

> > +       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> > +       emit_insn (gen_rtx_SET (operands[0], ite));
> 
> Also, since you're just emitting RTL directly, maybe you could simply use
> the expander pattern above to do so (and not use emit_insn followed
> by DONE in this case?)

Therefore this doesn't work either.

Sorry for the confusion.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-05 13:52 ` [PATCH] " Dominik Vogt
  2017-04-05 15:25   ` Ulrich Weigand
  2017-04-06  9:35   ` Dominik Vogt
@ 2017-04-07 14:14   ` Dominik Vogt
  2017-04-07 14:34     ` Ulrich Weigand
  2 siblings, 1 reply; 18+ messages in thread
From: Dominik Vogt @ 2017-04-07 14:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel, Ulrich Weigand

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On Wed, Apr 05, 2017 at 02:52:00PM +0100, Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > > SImode and DImode).  Among general optimization, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.

New version attached.

v4:

  * Remove CCZZ1 iterator.
  * Remove duplicates of CS patterns.
  * Move the skip_cs_label so that output is moved to vtarget even
    if the CS instruction was not used.
  * Remove leftover from "sne" (from an earlier version of the
    patch).

Bootstrapped and regression tested on a zEC12 with s390 and s390x
biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v4-ChangeLog --]
[-- Type: text/plain, Size: 1662 bytes --]

gcc/ChangeLog-dv-atomic-gcc7

	* config/s390/s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes as well as CCZ1mode and CCZmode.
	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
	signature of s390_emit_compare_and_swap.
	(s390_expand_cs_hqi): Likewise, make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	(s390_match_ccmode_set): Allow CCZmode <-> CCZ1mode.
	* config/s390/s390.md (define_peephole2): New peephole to help
	combining the load-and-test pattern with volatile memory.
	("cstorecc4"): Use load-on-condition and deal with CCZmode for
	TARGET_Z196.
	("atomic_compare_and_swap<mode>"): Merge the patterns for small and
	large integers.  Forbid symref memory operands.  Move expander to
	s390.c.  Require cc register.
	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Use s_operand to forbid
	symref memory operands.  Remove CC mode and call s390_match_ccmode
	instead.
	("atomic_exchange<mode>"): Allow and implement all integer modes.
gcc/testsuite/ChangeLog-dv-atomic-gcc7

	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.inc: New test.

[-- Attachment #3: 0001-v4-S-390-Optimize-atomic_compare_exchange-and-atomic_co.patch --]
[-- Type: text/plain, Size: 34382 bytes --]

From 0dbcab9152b3d1b7c3a6e72f6d45b8eb56ab40ae Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Thu, 23 Feb 2017 17:23:11 +0100
Subject: [PATCH] S/390: Optimize atomic_compare_exchange and
 atomic_compare builtins.

1) Use the load-and-test instructions for atomic_exchange if the value is 0.
2) If IS_WEAK is true, compare the memory contents before a compare-and-swap
   and skip the CS instructions if the value is not the expected one.
---
 gcc/config/s390/s390-protos.h                      |   4 +-
 gcc/config/s390/s390.c                             | 171 ++++++++++-
 gcc/config/s390/s390.md                            | 150 +++++----
 .../gcc.target/s390/md/atomic_compare_exchange-1.c |  84 ++++++
 .../s390/md/atomic_compare_exchange-1.inc          | 336 +++++++++++++++++++++
 .../gcc.target/s390/md/atomic_exchange-1.c         | 309 +++++++++++++++++++
 6 files changed, 977 insertions(+), 77 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7f06a20..3fdb320 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -112,8 +112,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx,
-				rtx, rtx, bool);
+extern void s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx);
 extern void s390_expand_atomic (machine_mode, enum rtx_code,
 				rtx, rtx, rtx, bool);
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 2cb8947..fe16647 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1306,6 +1306,7 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
   set_mode = GET_MODE (SET_DEST (set));
   switch (set_mode)
     {
+    case CCZ1mode:
     case CCSmode:
     case CCSRmode:
     case CCUmode:
@@ -1328,7 +1329,8 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
 
     case CCZmode:
       if (req_mode != CCSmode && req_mode != CCUmode && req_mode != CCTmode
-	  && req_mode != CCSRmode && req_mode != CCURmode)
+	  && req_mode != CCSRmode && req_mode != CCURmode
+	  && req_mode != CCZ1mode)
         return 0;
       break;
 
@@ -1762,11 +1764,31 @@ s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
 
 static rtx
 s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
-			    rtx cmp, rtx new_rtx)
+			    rtx cmp, rtx new_rtx, machine_mode ccmode)
 {
-  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, new_rtx));
-  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
-			    const0_rtx);
+  rtx cc;
+
+  cc = gen_rtx_REG (ccmode, CC_REGNUM);
+  switch (GET_MODE (mem))
+    {
+    case SImode:
+      emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp,
+							 new_rtx, cc));
+      break;
+    case DImode:
+      emit_insn (gen_atomic_compare_and_swapdi_internal (old, mem, cmp,
+							 new_rtx, cc));
+      break;
+    case TImode:
+	emit_insn (gen_atomic_compare_and_swapti_internal (old, mem, cmp,
+							   new_rtx, cc));
+      break;
+    case QImode:
+    case HImode:
+    default:
+      gcc_unreachable ();
+    }
+  return s390_emit_compare (code, cc, const0_rtx);
 }
 
 /* Emit a jump instruction to TARGET and return it.  If COND is
@@ -6723,7 +6745,7 @@ s390_two_part_insv (struct alignment_context *ac, rtx *seq1, rtx *seq2,
    the memory location, CMP the old value to compare MEM with and NEW_RTX the
    value to set if CMP == MEM.  */
 
-void
+static void
 s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 		    rtx cmp, rtx new_rtx, bool is_weak)
 {
@@ -6770,7 +6792,7 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
   emit_insn (seq2);
   emit_insn (seq3);
 
-  cc = s390_emit_compare_and_swap (EQ, res, ac.memsi, cmpv, newv);
+  cc = s390_emit_compare_and_swap (EQ, res, ac.memsi, cmpv, newv, CCZ1mode);
   if (is_weak)
     emit_insn (gen_cstorecc4 (btarget, cc, XEXP (cc, 0), XEXP (cc, 1)));
   else
@@ -6799,6 +6821,138 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 					      NULL_RTX, 1, OPTAB_DIRECT), 1);
 }
 
+/* Variant of s390_expand_cs for SI, DI and TI modes.  */
+static void
+s390_expand_cs_tdsi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		     rtx cmp, rtx new_rtx, bool is_weak)
+{
+  rtx output = vtarget;
+  rtx_code_label *skip_cs_label = NULL;
+  bool do_const_opt = false;
+
+  if (!register_operand (output, mode))
+    output = gen_reg_rtx (mode);
+
+  /* If IS_WEAK is true and the INPUT value is a constant, compare the memory
+     with the constant first and skip the compare-and-swap because it is very
+     expensive and likely to fail anyway.
+     Note 1: This is done only for IS_WEAK.  C11 allows optimizations that may
+     cause spurious failures in that case.
+     Note 2: It may be useful to do this also for non-constant INPUT.
+     Note 3: Currently only targets with "load on condition" are supported
+     (z196 and newer).  */
+
+  if (TARGET_Z196
+      && (mode == SImode || mode == DImode))
+    do_const_opt = (is_weak && CONST_INT_P (cmp));
+
+  if (do_const_opt)
+    {
+      const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
+      rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+
+      skip_cs_label = gen_label_rtx ();
+      emit_move_insn (output, mem);
+      emit_move_insn (btarget, const0_rtx);
+      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
+      s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
+      add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
+      /* If the jump is not taken, OUTPUT is the expected value.  */
+      cmp = output;
+      /* Reload newval to a register manually, *after* the compare and jump
+	 above.  Otherwise Reload might place it before the jump.  */
+    }
+  else
+    cmp = force_reg (mode, cmp);
+  new_rtx = force_reg (mode, new_rtx);
+  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx,
+			      (do_const_opt) ? CCZmode : CCZ1mode);
+  if (skip_cs_label != NULL)
+    emit_label (skip_cs_label);
+
+  /* We deliberately accept non-register operands in the predicate
+     to ensure the write back to the output operand happens *before*
+     the store-flags code below.  This makes it easier for combine
+     to merge the store-flags code with a potential test-and-branch
+     pattern following (immediately!) afterwards.  */
+  if (output != vtarget)
+    emit_move_insn (vtarget, output);
+
+  if (do_const_opt)
+    {
+      rtx cc, cond, ite;
+
+      /* Do not use gen_cstorecc4 here because it writes either 1 or 0, but
+	 btarget has already been initialized with 0 above.  */
+      cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+      cond = gen_rtx_EQ (VOIDmode, cc, const0_rtx);
+      ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, btarget);
+      emit_insn (gen_rtx_SET (btarget, ite));
+    }
+  else
+    {
+      rtx cc, cond;
+
+      cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
+      cond = gen_rtx_EQ (SImode, cc, const0_rtx);
+      emit_insn (gen_cstorecc4 (btarget, cond, cc, const0_rtx));
+    }
+}
+
+/* Expand an atomic compare and swap operation.  MEM is the memory location,
+   CMP the old value to compare MEM with and NEW_RTX the value to set if
+   CMP == MEM.  */
+
+void
+s390_expand_cs (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		rtx cmp, rtx new_rtx, bool is_weak)
+{
+  switch (mode)
+    {
+    case TImode:
+    case DImode:
+    case SImode:
+      s390_expand_cs_tdsi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    case HImode:
+    case QImode:
+      s390_expand_cs_hqi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Expand an atomic_exchange operation simulated with a compare-and-swap loop.
+   The memory location MEM is set to INPUT.  OUTPUT is set to the previous value
+   of MEM.  */
+
+void
+s390_expand_atomic_exchange_tdsi (rtx output, rtx mem, rtx input)
+{
+  machine_mode mode = GET_MODE (mem);
+  rtx_code_label *csloop;
+
+  if (TARGET_Z196
+      && (mode == DImode || mode == SImode)
+      && CONST_INT_P (input) && INTVAL (input) == 0)
+    {
+      emit_move_insn (output, const0_rtx);
+      if (mode == DImode)
+	emit_insn (gen_atomic_fetch_anddi (output, mem, const0_rtx, input));
+      else
+	emit_insn (gen_atomic_fetch_andsi (output, mem, const0_rtx, input));
+      return;
+    }
+
+  input = force_reg (mode, input);
+  emit_move_insn (output, mem);
+  csloop = gen_label_rtx ();
+  emit_label (csloop);
+  s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, output, mem, output,
+						      input, CCZ1mode));
+}
+
 /* Expand an atomic operation CODE of mode MODE.  MEM is the memory location
    and VAL the value to play with.  If AFTER is true then store the value
    MEM holds after the operation, if AFTER is false then store the value MEM
@@ -6878,7 +7032,8 @@ s390_expand_atomic (machine_mode mode, enum rtx_code code,
     }
 
   s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, cmp,
-						      ac.memsi, cmp, new_rtx));
+						      ac.memsi, cmp, new_rtx,
+						      CCZ1mode));
 
   /* Return the correct part of the bitfield.  */
   if (target)
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 59f189c..8a700ed 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -907,6 +907,21 @@
   [(set_attr "op_type" "RR<E>,RXY")
    (set_attr "z10prop" "z10_fr_E1,z10_fwd_A3") ])
 
+; Peephole to combine a load-and-test from volatile memory which combine does
+; not do.
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand")
+	(match_operand:GPR 2 "memory_operand"))
+   (set (reg CC_REGNUM)
+	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
+  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
+   && GENERAL_REG_P (operands[0])
+   && satisfies_constraint_T (operands[2])"
+  [(parallel
+    [(set (reg:CCS CC_REGNUM)
+	  (compare:CCS (match_dup 2) (match_dup 1)))
+     (set (match_dup 0) (match_dup 2))])])
+
 ; ltr, lt, ltgr, ltg
 (define_insn "*tst<mode>_cconly_extimm"
   [(set (reg CC_REGNUM)
@@ -6518,13 +6533,30 @@
   [(parallel
     [(set (match_operand:SI 0 "register_operand" "")
 	  (match_operator:SI 1 "s390_eqne_operator"
-           [(match_operand:CCZ1 2 "register_operand")
+           [(match_operand 2 "cc_reg_operand")
 	    (match_operand 3 "const0_operand")]))
      (clobber (reg:CC CC_REGNUM))])]
   ""
-  "emit_insn (gen_sne (operands[0], operands[2]));
-   if (GET_CODE (operands[1]) == EQ)
-     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+  "machine_mode mode = GET_MODE (operands[2]);
+   if (TARGET_Z196)
+     {
+       rtx cond, ite;
+
+       if (GET_CODE (operands[1]) == NE)
+	 cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
+       else
+	 cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
+       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
+       emit_insn (gen_rtx_SET (operands[0], ite));
+     }
+   else
+     {
+       if (mode != CCZ1mode)
+	 FAIL;
+       emit_insn (gen_sne (operands[0], operands[2]));
+       if (GET_CODE (operands[1]) == EQ)
+	 emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+     }
    DONE;")
 
 (define_insn_and_split "sne"
@@ -10198,83 +10230,56 @@
 
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:DGPR 1 "nonimmediate_operand");; oldval output
-   (match_operand:DGPR 2 "memory_operand")	;; memory
-   (match_operand:DGPR 3 "register_operand")	;; expected intput
-   (match_operand:DGPR 4 "register_operand")	;; newval intput
+   (match_operand:DINT 1 "nonimmediate_operand");; oldval output
+   (match_operand:DINT 2 "s_operand")		;; memory
+   (match_operand:DINT 3 "general_operand")	;; expected input
+   (match_operand:DINT 4 "general_operand")	;; newval input
    (match_operand:SI 5 "const_int_operand")	;; is_weak
    (match_operand:SI 6 "const_int_operand")	;; success model
    (match_operand:SI 7 "const_int_operand")]	;; failure model
   ""
 {
-  rtx cc, cmp, output = operands[1];
-
-  if (!register_operand (output, <MODE>mode))
-    output = gen_reg_rtx (<MODE>mode);
-
-  if (MEM_ALIGN (operands[2]) < GET_MODE_BITSIZE (GET_MODE (operands[2])))
+  if (GET_MODE_BITSIZE (<MODE>mode) >= 16
+      && GET_MODE_BITSIZE (<MODE>mode) > MEM_ALIGN (operands[2]))
     FAIL;
 
-  emit_insn (gen_atomic_compare_and_swap<mode>_internal
-	     (output, operands[2], operands[3], operands[4]));
-
-  /* We deliberately accept non-register operands in the predicate
-     to ensure the write back to the output operand happens *before*
-     the store-flags code below.  This makes it easier for combine
-     to merge the store-flags code with a potential test-and-branch
-     pattern following (immediately!) afterwards.  */
-  if (output != operands[1])
-    emit_move_insn (operands[1], output);
-
-  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
-  cmp = gen_rtx_EQ (SImode, cc, const0_rtx);
-  emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx));
-  DONE;
-})
-
-(define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:HQI 1 "nonimmediate_operand")	;; oldval output
-   (match_operand:HQI 2 "memory_operand")	;; memory
-   (match_operand:HQI 3 "general_operand")	;; expected intput
-   (match_operand:HQI 4 "general_operand")	;; newval intput
-   (match_operand:SI 5 "const_int_operand")	;; is_weak
-   (match_operand:SI 6 "const_int_operand")	;; success model
-   (match_operand:SI 7 "const_int_operand")]	;; failure model
-  ""
-{
-  s390_expand_cs_hqi (<MODE>mode, operands[0], operands[1], operands[2],
-		      operands[3], operands[4], INTVAL (operands[5]));
-  DONE;
-})
+  s390_expand_cs (<MODE>mode, operands[0], operands[1], operands[2],
+		  operands[3], operands[4], INTVAL (operands[5]));
+  DONE;})
 
 (define_expand "atomic_compare_and_swap<mode>_internal"
   [(parallel
      [(set (match_operand:DGPR 0 "register_operand")
-	   (match_operand:DGPR 1 "memory_operand"))
+	   (match_operand:DGPR 1 "s_operand"))
       (set (match_dup 1)
 	   (unspec_volatile:DGPR
 	     [(match_dup 1)
 	      (match_operand:DGPR 2 "register_operand")
 	      (match_operand:DGPR 3 "register_operand")]
 	     UNSPECV_CAS))
-      (set (reg:CCZ1 CC_REGNUM)
-	   (compare:CCZ1 (match_dup 1) (match_dup 2)))])]
-  "")
+      (set (match_operand 4 "cc_reg_operand")
+	   (match_dup 5))])]
+  "GET_MODE (operands[4]) == CCZmode
+   || GET_MODE (operands[4]) == CCZ1mode"
+{
+  operands[5]
+    = gen_rtx_COMPARE (GET_MODE (operands[4]), operands[1], operands[2]);
+})
 
 ; cdsg, csg
 (define_insn "*atomic_compare_and_swap<mode>_1"
   [(set (match_operand:TDI 0 "register_operand" "=r")
-	(match_operand:TDI 1 "memory_operand" "+S"))
+	(match_operand:TDI 1 "s_operand" "+S"))
    (set (match_dup 1)
 	(unspec_volatile:TDI
 	  [(match_dup 1)
 	   (match_operand:TDI 2 "register_operand" "0")
 	   (match_operand:TDI 3 "register_operand" "r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
-  "TARGET_ZARCH"
+   (set (reg CC_REGNUM)
+	(compare (match_dup 1) (match_dup 2)))]
+  "TARGET_ZARCH
+   && s390_match_ccmode (insn, CCZ1mode)"
   "c<td>sg\t%0,%3,%S1"
   [(set_attr "op_type" "RSY")
    (set_attr "type"   "sem")])
@@ -10282,16 +10287,17 @@
 ; cds, cdsy
 (define_insn "*atomic_compare_and_swapdi_2"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(match_operand:DI 1 "memory_operand" "+Q,S"))
+	(match_operand:DI 1 "s_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:DI
 	  [(match_dup 1)
 	   (match_operand:DI 2 "register_operand" "0,0")
 	   (match_operand:DI 3 "register_operand" "r,r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
-  "!TARGET_ZARCH"
+   (set (reg CC_REGNUM)
+	(compare (match_dup 1) (match_dup 2)))]
+  "!TARGET_ZARCH
+   && s390_match_ccmode (insn, CCZ1mode)"
   "@
    cds\t%0,%3,%S1
    cdsy\t%0,%3,%S1"
@@ -10302,16 +10308,16 @@
 ; cs, csy
 (define_insn "*atomic_compare_and_swapsi_3"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-	(match_operand:SI 1 "memory_operand" "+Q,S"))
+	(match_operand:SI 1 "s_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:SI
 	  [(match_dup 1)
 	   (match_operand:SI 2 "register_operand" "0,0")
 	   (match_operand:SI 3 "register_operand" "r,r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
-  ""
+   (set (reg CC_REGNUM)
+	(compare (match_dup 1) (match_dup 2)))]
+  "s390_match_ccmode (insn, CCZ1mode)"
   "@
    cs\t%0,%3,%S1
    csy\t%0,%3,%S1"
@@ -10398,15 +10404,25 @@
   DONE;
 })
 
+;; Pattern to implement atomic_exchange with a compare-and-swap loop.  The code
+;; generated by the middle end is suboptimal.
 (define_expand "atomic_exchange<mode>"
-  [(match_operand:HQI 0 "register_operand")		;; val out
-   (match_operand:HQI 1 "memory_operand")		;; memory
-   (match_operand:HQI 2 "general_operand")		;; val in
+  [(match_operand:DINT 0 "register_operand")		;; val out
+   (match_operand:DINT 1 "s_operand")			;; memory
+   (match_operand:DINT 2 "general_operand")		;; val in
    (match_operand:SI 3 "const_int_operand")]		;; model
   ""
 {
-  s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1],
-		      operands[2], false);
+  if (<MODE>mode != QImode
+      && MEM_ALIGN (operands[1]) < GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+  if (<MODE>mode == HImode || <MODE>mode == QImode)
+    s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1], operands[2],
+			false);
+  else if (<MODE>mode == SImode || TARGET_ZARCH)
+    s390_expand_atomic_exchange_tdsi (operands[0], operands[1], operands[2]);
+  else
+    FAIL;
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
new file mode 100644
index 0000000..5cc026d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
@@ -0,0 +1,84 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+#include <stdio.h>
+
+struct
+{
+#ifdef __s390xx__
+  __int128 dummy128;
+  __int128 mem128;
+#endif
+  long long dummy64;
+  long long mem64;
+  int dummy32;
+  int mem32;
+  short mem16l;
+  short mem16h;
+  char mem8ll;
+  char mem8lh;
+  char mem8hl;
+  char mem8hh;
+} mem_s;
+
+#define TYPE char
+#define FN(SUFFIX) f8 ## SUFFIX
+#define FNS(SUFFIX) "f8" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE short
+#define FN(SUFFIX) f16 ##SUFFIX
+#define FNS(SUFFIX) "f16" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE int
+#define FN(SUFFIX) f32 ## SUFFIX
+#define FNS(SUFFIX) "f32" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE long long
+#define FN(SUFFIX) f64 ## SUFFIX
+#define FNS(SUFFIX) "f64" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#ifdef __s390xx__
+#define TYPE __int128
+#define FN(SUFFIX) f128 ## SUFFIX
+#define FNS(SUFFIX) "f128" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+#endif
+
+int main(void)
+{
+  int err_count = 0;
+  int i;
+
+  for (i = -1; i <= 2; i++)
+    {
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8ll, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8lh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hl, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16l, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16h, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f32_validate(&mem_s.mem32, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f64_validate(&mem_s.mem64, i, 1);
+#ifdef __s390xx__
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f128_validate(&mem_s.mem128, i, 1);
+#endif
+    }
+
+  return err_count;
+}
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
new file mode 100644
index 0000000..199aaa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
@@ -0,0 +1,336 @@
+/* -*-c-*- */
+
+#undef NEW
+#define NEW 3
+
+__attribute__ ((noinline))
+int FN(_bo)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_o)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_b)(TYPE *mem, TYPE old)
+{
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN()(TYPE *mem, TYPE old)
+{
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const != 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c1_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c1_b)(TYPE *mem)
+{
+  TYPE old = 1;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1)(TYPE *mem)
+{
+  TYPE old = 1;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const == 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c0_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c0_b)(TYPE *mem)
+{
+  TYPE old = 0;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0)(TYPE *mem)
+{
+  TYPE old = 0;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+int FN(_validate_mem)(TYPE *mem, TYPE expected_mem)
+{
+  if (*mem != expected_mem)
+    {
+      fprintf(stderr, "  BAD: mem %d != expected mem %d\n",
+	      *mem, expected_mem);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_rc)(int rc, int expected_rc)
+{
+  if (rc != expected_rc)
+    {
+      fprintf(stderr, "  BAD: rc %d != expected rc %d\n",
+	      rc, expected_rc);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_old_ret)(int old_ret, int expected_old_ret)
+{
+  if (old_ret != expected_old_ret)
+    {
+      fprintf(stderr, "  BAD: old_ret %d != expected old_ret %d\n",
+	      old_ret, expected_old_ret);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate)(TYPE *mem, TYPE init_mem, TYPE old)
+{
+  int err_count = 0;
+  int rc;
+  TYPE expected_mem;
+  int expected_rc;
+  TYPE old_ret;
+  int failed;
+  const char *fname;
+
+  fprintf(stderr, "%s: init_mem %d @ %p\n", __FUNCTION__, init_mem, mem);
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_bo);
+    rc = FN(_bo)(mem, &old_ret, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_o);
+    FN(_o)(mem, &old_ret, old);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_b);
+    rc = FN(_b)(mem, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS();
+    FN()(mem, old);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_bo);
+    rc = FN(_c1_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1_o);
+    FN(_c1_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_b);
+    rc = FN(_c1_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1);
+    FN(_c1)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_bo);
+    rc = FN(_c0_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0_o);
+    FN(_c0_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_b);
+    rc = FN(_c0_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0);
+    FN(_c0)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+
+  return err_count;
+}
+
+#undef TYPE
+#undef MEM
+#undef FN
+#undef FNS
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
new file mode 100644
index 0000000..f82b213
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
@@ -0,0 +1,309 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "-lpthread -latomic" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+/**/
+
+char
+ae_8_0 (char *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+char
+ae_8_1 (char *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+char g8;
+
+char
+ae_8_g_0 (void)
+{
+  return __atomic_exchange_n (&g8, 0, 2);
+}
+
+char
+ae_8_g_1 (void)
+{
+  return __atomic_exchange_n (&g8, 1, 2);
+}
+
+/**/
+
+short
+ae_16_0 (short *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+short
+ae_16_1 (short *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+short g16;
+
+short
+ae_16_g_0 (void)
+{
+  return __atomic_exchange_n (&g16, 0, 2);
+}
+
+short
+ae_16_g_1 (void)
+{
+  return __atomic_exchange_n (&g16, 1, 2);
+}
+
+/**/
+
+int
+ae_32_0 (int *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+int
+ae_32_1 (int *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+int g32;
+
+int
+ae_32_g_0 (void)
+{
+  return __atomic_exchange_n (&g32, 0, 2);
+}
+
+int
+ae_32_g_1 (void)
+{
+  return __atomic_exchange_n (&g32, 1, 2);
+}
+
+/**/
+
+long long
+ae_64_0 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+long long
+ae_64_1 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+long long g64;
+
+long long
+ae_64_g_0 (void)
+{
+  return __atomic_exchange_n (&g64, 0, 2);
+}
+
+long long
+ae_64_g_1 (void)
+{
+  return __atomic_exchange_n (&g64, 1, 2);
+}
+
+/**/
+
+#ifdef __s390x__
+__int128
+ae_128_0 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+__int128
+ae_128_1 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+__int128 g128;
+
+__int128
+ae_128_g_0 (void)
+{
+  return __atomic_exchange_n (&g128, 0, 2);
+}
+
+__int128
+ae_128_g_1 (void)
+{
+  return __atomic_exchange_n (&g128, 1, 2);
+}
+
+#endif
+
+int main(void)
+{
+  int i;
+
+  for (i = 0; i <= 2; i++)
+    {
+      int oval = i;
+
+      {
+	char lock;
+	char rval;
+
+	lock = oval;
+	rval = ae_8_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_8_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_0 ();
+	if (g8 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_1 ();
+	if (g8 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	short lock;
+	short rval;
+
+	lock = oval;
+	rval = ae_16_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_16_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_0 ();
+	if (g16 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_1 ();
+	if (g16 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	int lock;
+	int rval;
+
+	lock = oval;
+	rval = ae_32_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_32_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_0 ();
+	if (g32 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_1 ();
+	if (g32 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	long long lock;
+	long long rval;
+
+	lock = oval;
+	rval = ae_64_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_64_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_0 ();
+	if (g64 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_1 ();
+	if (g64 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+
+#ifdef __s390x__
+      {
+	__int128 lock;
+	__int128 rval;
+
+	lock = oval;
+	rval = ae_128_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_128_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_0 ();
+	if (g128 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_1 ();
+	if (g128 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+#endif
+    }
+
+  return 0;
+}
-- 
2.3.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-07 14:14   ` Dominik Vogt
@ 2017-04-07 14:34     ` Ulrich Weigand
  2017-04-07 15:37       ` Dominik Vogt
  0 siblings, 1 reply; 18+ messages in thread
From: Ulrich Weigand @ 2017-04-07 14:34 UTC (permalink / raw)
  To: vogt; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

Dominik Vogt wrote:

> v4:
> 
>   * Remove CCZZ1 iterator. 
>   * Remove duplicates of CS patterns. 
>   * Move the skip_cs_label so that output is moved to vtarget even 
>     if the CS instruction was not used. 
>   * Removed leftover from "sne" (from an earlier version of the
>     patch). 

Thanks, this looks quite good to me now.  I do still have two questions:

> +; Peephole to combine a load-and-test from volatile memory which combine does
> +; not do.
> +(define_peephole2
> +  [(set (match_operand:GPR 0 "register_operand")
> +	(match_operand:GPR 2 "memory_operand"))
> +   (set (reg CC_REGNUM)
> +	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> +   && GENERAL_REG_P (operands[0])
> +   && satisfies_constraint_T (operands[2])"
> +  [(parallel
> +    [(set (reg:CCS CC_REGNUM)
> +	  (compare:CCS (match_dup 2) (match_dup 1)))
> +     (set (match_dup 0) (match_dup 2))])])

Still wondering why this is necessary.  On the other hand, I guess it
cannot hurt to have the peephole either ...

> @@ -6518,13 +6533,30 @@
>    [(parallel
>      [(set (match_operand:SI 0 "register_operand" "")
>  	  (match_operator:SI 1 "s390_eqne_operator"
> -           [(match_operand:CCZ1 2 "register_operand")
> +           [(match_operand 2 "cc_reg_operand")
>  	    (match_operand 3 "const0_operand")]))
>       (clobber (reg:CC CC_REGNUM))])]
>    ""
> -  "emit_insn (gen_sne (operands[0], operands[2]));
> -   if (GET_CODE (operands[1]) == EQ)
> -     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> +  "machine_mode mode = GET_MODE (operands[2]);
> +   if (TARGET_Z196)
> +     {
> +       rtx cond, ite;
> +
> +       if (GET_CODE (operands[1]) == NE)
> +	 cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> +       else
> +	 cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> +       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> +       emit_insn (gen_rtx_SET (operands[0], ite));
> +     }
> +   else
> +     {
> +       if (mode != CCZ1mode)
> +	 FAIL;
> +       emit_insn (gen_sne (operands[0], operands[2]));
> +       if (GET_CODE (operands[1]) == EQ)
> +	 emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> +     }
>     DONE;")

From what I can see in the rest of the patch, none of the CS changes now
actually *rely* on this change to cstorecc4 ... s390_expand_cs_tdsi only
calls cstorecc4 on !TARGET_Z196, where the above change is a no-op, and
in the TARGET_Z196 case it deliberates does *not* use cstorecc4.

Now, in general this improvement to cstorecc4 is of course valuable
in itself.  But I think at this point it might be better to separate
this out into an independent patch (and measure its effect separately).

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-07 14:34     ` Ulrich Weigand
@ 2017-04-07 15:37       ` Dominik Vogt
  2017-04-07 17:22         ` Ulrich Weigand
  0 siblings, 1 reply; 18+ messages in thread
From: Dominik Vogt @ 2017-04-07 15:37 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > +; Peephole to combine a load-and-test from volatile memory which combine does
> > +; not do.
> > +(define_peephole2
> > +  [(set (match_operand:GPR 0 "register_operand")
> > +	(match_operand:GPR 2 "memory_operand"))
> > +   (set (reg CC_REGNUM)
> > +	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > +   && GENERAL_REG_P (operands[0])
> > +   && satisfies_constraint_T (operands[2])"
> > +  [(parallel
> > +    [(set (reg:CCS CC_REGNUM)
> > +	  (compare:CCS (match_dup 2) (match_dup 1)))
> > +     (set (match_dup 0) (match_dup 2))])])
> 
> Still wondering why this is necessary.

It's necessary because Combine refuses to match anything that
contains a volatile memory reference, using a global flag for
Recog.
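
For illustration, a reduced case (my own sketch, not taken from the
patch or from glibc) where the L + LTR pair shows up and combine
leaves it alone:

```c
/* Sketch only: a load from volatile memory followed by a compare with
   zero.  Combine will not merge the two insns because of the volatile
   access, so without the peephole the result on s390 is L + LTR
   instead of a single LT.  */
volatile int flag = 2;

int
is_set (void)
{
  /* load from volatile memory, then compare the loaded value with 0 */
  return flag != 0;
}
```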

> > @@ -6518,13 +6533,30 @@
> >    [(parallel
> >      [(set (match_operand:SI 0 "register_operand" "")
> >  	  (match_operator:SI 1 "s390_eqne_operator"
> > -           [(match_operand:CCZ1 2 "register_operand")
> > +           [(match_operand 2 "cc_reg_operand")
> >  	    (match_operand 3 "const0_operand")]))
> >       (clobber (reg:CC CC_REGNUM))])]
> >    ""
> > -  "emit_insn (gen_sne (operands[0], operands[2]));
> > -   if (GET_CODE (operands[1]) == EQ)
> > -     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> > +  "machine_mode mode = GET_MODE (operands[2]);
> > +   if (TARGET_Z196)
> > +     {
> > +       rtx cond, ite;
> > +
> > +       if (GET_CODE (operands[1]) == NE)
> > +	 cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> > +       else
> > +	 cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> > +       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> > +       emit_insn (gen_rtx_SET (operands[0], ite));
> > +     }
> > +   else
> > +     {
> > +       if (mode != CCZ1mode)
> > +	 FAIL;
> > +       emit_insn (gen_sne (operands[0], operands[2]));
> > +       if (GET_CODE (operands[1]) == EQ)
> > +	 emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> > +     }
> >     DONE;")
> 
> >From what I can see in the rest of the patch, none of the CS changes now
> actually *rely* on this change to cstorecc4 ... s390_expand_cs_tdsi only
> calls cstorecc4 on !TARGET_Z196, where the above change is a no-op, and
> in the TARGET_Z196 case it deliberates does *not* use cstorecc4.

You're right.  After all the refactoring, this part of the patch
has become unused.

> Now, in general this improvement to cstorecc4 is of course valuable
> in itself.  But I think at this point it might be better to separate
> this out into an independent patch (and measure its effect separately).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-07 15:37       ` Dominik Vogt
@ 2017-04-07 17:22         ` Ulrich Weigand
  2017-04-10  9:13           ` Dominik Vogt
  0 siblings, 1 reply; 18+ messages in thread
From: Ulrich Weigand @ 2017-04-07 17:22 UTC (permalink / raw)
  To: vogt; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

Dominik Vogt wrote:
> On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > > +; Peephole to combine a load-and-test from volatile memory which combine does
> > > +; not do.
> > > +(define_peephole2
> > > +  [(set (match_operand:GPR 0 "register_operand")
> > > +	(match_operand:GPR 2 "memory_operand"))
> > > +   (set (reg CC_REGNUM)
> > > +	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > > +   && GENERAL_REG_P (operands[0])
> > > +   && satisfies_constraint_T (operands[2])"
> > > +  [(parallel
> > > +    [(set (reg:CCS CC_REGNUM)
> > > +	  (compare:CCS (match_dup 2) (match_dup 1)))
> > > +     (set (match_dup 0) (match_dup 2))])])
> > 
> > Still wondering why this is necessary.
> 
> It's necessary because Combine refuses to match anything that
> contains a volatile memory reference, using a global flag for
> Recog.

So is this specifically to match the pre-test load emitted here?

+      emit_move_insn (output, mem);
+      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));

If so, since you already know that this should always map to a
LOAD AND TEST, could simply just emit the LT pattern here,
instead of relying on combine to do it ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-07 17:22         ` Ulrich Weigand
@ 2017-04-10  9:13           ` Dominik Vogt
  2017-04-10  9:21             ` Dominik Vogt
  0 siblings, 1 reply; 18+ messages in thread
From: Dominik Vogt @ 2017-04-10  9:13 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

On Fri, Apr 07, 2017 at 07:22:23PM +0200, Ulrich Weigand wrote:
> Dominik Vogt wrote:
> > On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > > > +; Peephole to combine a load-and-test from volatile memory which combine does
> > > > +; not do.
> > > > +(define_peephole2
> > > > +  [(set (match_operand:GPR 0 "register_operand")
> > > > +	(match_operand:GPR 2 "memory_operand"))
> > > > +   (set (reg CC_REGNUM)
> > > > +	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > > > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > > > +   && GENERAL_REG_P (operands[0])
> > > > +   && satisfies_constraint_T (operands[2])"
> > > > +  [(parallel
> > > > +    [(set (reg:CCS CC_REGNUM)
> > > > +	  (compare:CCS (match_dup 2) (match_dup 1)))
> > > > +     (set (match_dup 0) (match_dup 2))])])
> > > 
> > > Still wondering why this is necessary.
> > 
> > It's necessary because Combine refuses to match anything that
> > contains a volatile memory reference, using a global flag for
> > Recog.
> 
> So is this specifically to match the pre-test load emitted here?
> 
> +      emit_move_insn (output, mem);
> +      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
> 
> If so, since you already know that this should always map to a
> LOAD AND TEST, could simply just emit the LT pattern here,
> instead of relying on combine to do it ...

Well, only if the value to compare to is constant zero (which is
what Glibc does).  In all other cases this won't result in
load-and-test.
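
For reference, the glibc-style idiom meant here looks roughly like
this (a sketch of the pattern, not actual glibc code):

```c
/* Sketch: a lock acquire where the expected value is constant zero.
   Because the comparison is against 0, the explicit compare emitted
   before the CS instruction can be a LOAD AND TEST.  Not actual
   glibc code; the function name is made up.  */
static int
try_lock (int *lock)
{
  int expected = 0;
  /* Returns nonzero and stores 1 into *lock only if *lock was 0.  */
  return __atomic_compare_exchange_n (lock, &expected, 1, /*weak=*/0,
				      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}
```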

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-10  9:13           ` Dominik Vogt
@ 2017-04-10  9:21             ` Dominik Vogt
  2017-04-10 21:37               ` Ulrich Weigand
  0 siblings, 1 reply; 18+ messages in thread
From: Dominik Vogt @ 2017-04-10  9:21 UTC (permalink / raw)
  To: Ulrich Weigand, gcc-patches, Andreas Krebbel, Ulrich Weigand

On Mon, Apr 10, 2017 at 10:13:01AM +0100, Dominik Vogt wrote:
> On Fri, Apr 07, 2017 at 07:22:23PM +0200, Ulrich Weigand wrote:
> > Dominik Vogt wrote:
> > > On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > > > > +; Peephole to combine a load-and-test from volatile memory which combine does
> > > > > +; not do.
> > > > > +(define_peephole2
> > > > > +  [(set (match_operand:GPR 0 "register_operand")
> > > > > +	(match_operand:GPR 2 "memory_operand"))
> > > > > +   (set (reg CC_REGNUM)
> > > > > +	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > > > > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > > > > +   && GENERAL_REG_P (operands[0])
> > > > > +   && satisfies_constraint_T (operands[2])"
> > > > > +  [(parallel
> > > > > +    [(set (reg:CCS CC_REGNUM)
> > > > > +	  (compare:CCS (match_dup 2) (match_dup 1)))
> > > > > +     (set (match_dup 0) (match_dup 2))])])
> > > > 
> > > > Still wondering why this is necessary.
> > > 
> > > It's necessary because Combine refuses to match anything that
> > > contains a volatile memory reference, using a global flag for
> > > Recog.
> > 
> > So is this specifically to match the pre-test load emitted here?
> > 
> > +      emit_move_insn (output, mem);
> > +      emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
> > 
> > If so, since you already know that this should always map to a
> > LOAD AND TEST, could simply just emit the LT pattern here,
> > instead of relying on combine to do it ...
> 
> Well, only if the value to compare to is constant zero (which is
> what Glibc does).  In all other cases this won't result in
> load-and-test.

So, we could add a special case for const0_rtx that generates the
LT pattern and does not rely on Combine, and get rid of the
peephole.  I'm not sure this is worthwhile though, because the
peephole has other beneficial effects (as discussed), and until
we've solved the problems preventing Combine from merging L+LTR in
some cases, this is the best we have.  What do you think?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-10  9:21             ` Dominik Vogt
@ 2017-04-10 21:37               ` Ulrich Weigand
  2017-04-11  8:11                 ` Andreas Krebbel
  0 siblings, 1 reply; 18+ messages in thread
From: Ulrich Weigand @ 2017-04-10 21:37 UTC (permalink / raw)
  To: vogt; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

Dominik Vogt wrote:

> So, we could add a special case for const0_rtx that generates the
> LT pattern and does not rely on Combine, and get rid of the
> peephole.  I'm not sure this is worthwhile though, because the
> peephole has other beneficial effects (as discussed), and until
> we've solved the problems preventing Combine from merging L+LTR in
> some cases, this is the best we have.  What do you think?

If we removed the peephole (for now), the patch would only touch
parts of the backend used to emit atomic instructions, so code
generation for any code that doesn't use those would be guaranteed
to be unchanged.  Given that we're quite late in the cycle, this
might be a good idea at this point ...

But I don't see anything actually incorrect in the peephole, and
it might indeed be a good thing in general -- just maybe more
appropriate for the next stage1. 

Andreas, do you have an opinion on this?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-10 21:37               ` Ulrich Weigand
@ 2017-04-11  8:11                 ` Andreas Krebbel
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Krebbel @ 2017-04-11  8:11 UTC (permalink / raw)
  To: Ulrich Weigand, vogt; +Cc: gcc-patches, Ulrich Weigand

On 04/10/2017 11:37 PM, Ulrich Weigand wrote:
> Dominik Vogt wrote:
> 
>> So, we could add a special case for const0_rtx that generates the
>> LT pattern and does not rely on Combine, and get rid of the
>> peephole.  I'm not sure this is worthwhile thoug, because the
>> peephole has other beneficial effects (as discussed), and until
>> we've solved the problems preventing Combine from merging L+LTR in
>> some cases, this is the best we have.  What do you think?
> 
> If we removed the peephole (for now), the patch now only touches
> parts of the backend used to emit atomic instructions, so code
> generation for any code that doesn't use those is guaranteed to
> be unchanged.  Given that we're quite late in the cycle, this
> might be a good idea at this point ...
> 
> But I don't see anything actually incorrect in the peephole, and
> it might indeed be a good thing in general -- just maybe more
> appropriate for the next stage1. 
> 
> Andreas, do you have an opinion on this?

Yes, the change is unrelated to the rest of the patch and therefore should go into a separate patch.
But I'm fine with applying both right now.
The peephole surprisingly often helped code generation.  There appear to be plenty of cases where
the L+LTR combination could not be handled otherwise.  Dominik reviewed the SPEC CPU diffs, and there
were only improvements.

-Andreas-


* Re: [PATCH v5] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-03-27 20:50 [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins Dominik Vogt
  2017-03-29 15:22 ` [PATCH v2] " Dominik Vogt
  2017-04-05 13:52 ` [PATCH] " Dominik Vogt
@ 2017-04-11 14:21 ` Dominik Vogt
  2017-04-24 15:20   ` Ulrich Weigand
  2017-04-25  8:04   ` Andreas Krebbel
  2 siblings, 2 replies; 18+ messages in thread
From: Dominik Vogt @ 2017-04-11 14:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel, Ulrich Weigand

[-- Attachment #1: Type: text/plain, Size: 644 bytes --]

On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> The attached patch optimizes the atomic_exchange and
> atomic_compare patterns on s390 and s390x (mostly limited to
> SImode and DImode).  Among general optimization, the changes fix
> most of the problems reported in PR 80080:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

v5:
  * Generate LT pattern directly for const 0 value.
  * Split into three patches.

Bootstrapped and regression tested on a zEC12 with s390 and s390x
biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v5-ChangeLog --]
[-- Type: text/plain, Size: 132 bytes --]

gcc/ChangeLog-dv-atomic-gcc7-1

	* config/s390/s390.md ("cstorecc4"): Use load-on-condition and deal
	with CCZmode for TARGET_Z196.

[-- Attachment #3: 0002-v5-ChangeLog --]
[-- Type: text/plain, Size: 155 bytes --]

gcc/ChangeLog-dv-atomic-gcc7-2

	* config/s390/s390.md (define_peephole2): New peephole to help
	combining the load-and-test pattern with volatile memory.

[-- Attachment #4: 0003-v5-ChangeLog --]
[-- Type: text/plain, Size: 1484 bytes --]

gcc/ChangeLog-dv-atomic-gcc7-3

	* s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes as well as CCZ1mode and CCZmode.
	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
	signature of s390_emit_compare_and_swap.
	(s390_expand_cs_hqi): Likewise, make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
	* config/s390/s390.md ("atomic_compare_and_swap<mode>"): Merge the
	patterns for small and large integers.  Forbid symref memory operands.
	Move expander to s390.c.  Require cc register.
	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Use s_operand to forbid
	symref memory operands.  Remove CC mode and call s390_match_ccmode
	instead.
	("atomic_exchange<mode>"): Allow and implement all integer modes.
gcc/testsuite/ChangeLog-dv-atomic-gcc7

	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.inc: New test.

[-- Attachment #5: 0001-v5-S-390-Use-load-on-condition-in-cstorecc4.patch --]
[-- Type: text/plain, Size: 1662 bytes --]

From fda471edcdea8b86d678514d4fa6cf11745cd2a5 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Mon, 10 Apr 2017 08:29:40 +0100
Subject: [PATCH 1/3] S/390: Use load-on-condition in cstorecc4.

---
 gcc/config/s390/s390.md | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 59f189c..6a1cab6 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -6518,13 +6518,30 @@
   [(parallel
     [(set (match_operand:SI 0 "register_operand" "")
 	  (match_operator:SI 1 "s390_eqne_operator"
-           [(match_operand:CCZ1 2 "register_operand")
+           [(match_operand 2 "cc_reg_operand")
 	    (match_operand 3 "const0_operand")]))
      (clobber (reg:CC CC_REGNUM))])]
   ""
-  "emit_insn (gen_sne (operands[0], operands[2]));
-   if (GET_CODE (operands[1]) == EQ)
-     emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+  "machine_mode mode = GET_MODE (operands[2]);
+   if (TARGET_Z196)
+     {
+       rtx cond, ite;
+
+       if (GET_CODE (operands[1]) == NE)
+	 cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
+       else
+	 cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
+       ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
+       emit_insn (gen_rtx_SET (operands[0], ite));
+     }
+   else
+     {
+       if (mode != CCZ1mode)
+	 FAIL;
+       emit_insn (gen_sne (operands[0], operands[2]));
+       if (GET_CODE (operands[1]) == EQ)
+	 emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
+     }
    DONE;")
 
 (define_insn_and_split "sne"
-- 
2.3.0


[-- Attachment #6: 0002-v5-S-390-Peephole-to-merge-L-followerd-by-LT.patch --]
[-- Type: text/plain, Size: 1299 bytes --]

From cf55c86d6c8b7aef1a669c9368410f32f333c59a Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Tue, 11 Apr 2017 12:18:01 +0100
Subject: [PATCH 2/3] S/390: Peephole to merge L followed by LT.

In some situations this is not done by other optimization passes.
---
 gcc/config/s390/s390.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 6a1cab6..9baafcc 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -907,6 +907,21 @@
   [(set_attr "op_type" "RR<E>,RXY")
    (set_attr "z10prop" "z10_fr_E1,z10_fwd_A3") ])
 
+; Peephole to combine a load-and-test from volatile memory which combine does
+; not do.
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand")
+	(match_operand:GPR 2 "memory_operand"))
+   (set (reg CC_REGNUM)
+	(compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
+  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
+   && GENERAL_REG_P (operands[0])
+   && satisfies_constraint_T (operands[2])"
+  [(parallel
+    [(set (reg:CCS CC_REGNUM)
+	  (compare:CCS (match_dup 2) (match_dup 1)))
+     (set (match_dup 0) (match_dup 2))])])
+
 ; ltr, lt, ltgr, ltg
 (define_insn "*tst<mode>_cconly_extimm"
   [(set (reg CC_REGNUM)
-- 
2.3.0


[-- Attachment #7: 0003-v5-S-390-Optimize-atomic_compare_exchange-and-atomic_co.patch --]
[-- Type: text/plain, Size: 32789 bytes --]

From 03822e2653ac7c302da58b4bdabc36ebdafd9902 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Thu, 23 Feb 2017 17:23:11 +0100
Subject: [PATCH 3/3] S/390: Optimize atomic_compare_exchange and
 atomic_compare builtins.

1) Use the load-and-test instructions for atomic_exchange if the value is 0.
2) If IS_WEAK is true, compare the memory contents before a compare-and-swap
   and skip the CS instructions if the value is not the expected one.
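
In terms of the C builtins, case 2 corresponds to the weak variant of
the compare-exchange (sketch for illustration; the pre-CS compare is
emitted by the backend and is not visible at the source level):

```c
/* Sketch: a weak compare-and-swap through the GCC builtin.  With
   is_weak set, the expander may now load and compare MEM up front and
   branch around the CS instruction when the values already differ.
   The function name is made up.  */
static _Bool
cas_weak (long *mem, long expected, long desired)
{
  return __atomic_compare_exchange_n (mem, &expected, desired, /*weak=*/1,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```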
---
 gcc/config/s390/s390-protos.h                      |   4 +-
 gcc/config/s390/s390.c                             | 184 ++++++++++-
 gcc/config/s390/s390.md                            | 110 +++----
 .../gcc.target/s390/md/atomic_compare_exchange-1.c |  84 ++++++
 .../s390/md/atomic_compare_exchange-1.inc          | 336 +++++++++++++++++++++
 .../gcc.target/s390/md/atomic_exchange-1.c         | 309 +++++++++++++++++++
 6 files changed, 954 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7f06a20..3fdb320 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -112,8 +112,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx,
-				rtx, rtx, bool);
+extern void s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx);
 extern void s390_expand_atomic (machine_mode, enum rtx_code,
 				rtx, rtx, rtx, bool);
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 2cb8947..c16391a 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1306,6 +1306,7 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
   set_mode = GET_MODE (SET_DEST (set));
   switch (set_mode)
     {
+    case CCZ1mode:
     case CCSmode:
     case CCSRmode:
     case CCUmode:
@@ -1328,7 +1329,8 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
 
     case CCZmode:
       if (req_mode != CCSmode && req_mode != CCUmode && req_mode != CCTmode
-	  && req_mode != CCSRmode && req_mode != CCURmode)
+	  && req_mode != CCSRmode && req_mode != CCURmode
+	  && req_mode != CCZ1mode)
         return 0;
       break;
 
@@ -1762,11 +1764,31 @@ s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
 
 static rtx
 s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
-			    rtx cmp, rtx new_rtx)
+			    rtx cmp, rtx new_rtx, machine_mode ccmode)
 {
-  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, new_rtx));
-  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
-			    const0_rtx);
+  rtx cc;
+
+  cc = gen_rtx_REG (ccmode, CC_REGNUM);
+  switch (GET_MODE (mem))
+    {
+    case SImode:
+      emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp,
+							 new_rtx, cc));
+      break;
+    case DImode:
+      emit_insn (gen_atomic_compare_and_swapdi_internal (old, mem, cmp,
+							 new_rtx, cc));
+      break;
+    case TImode:
+      emit_insn (gen_atomic_compare_and_swapti_internal (old, mem, cmp,
+							 new_rtx, cc));
+      break;
+    case QImode:
+    case HImode:
+    default:
+      gcc_unreachable ();
+    }
+  return s390_emit_compare (code, cc, const0_rtx);
 }
 
 /* Emit a jump instruction to TARGET and return it.  If COND is
@@ -6723,7 +6745,7 @@ s390_two_part_insv (struct alignment_context *ac, rtx *seq1, rtx *seq2,
    the memory location, CMP the old value to compare MEM with and NEW_RTX the
    value to set if CMP == MEM.  */
 
-void
+static void
 s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 		    rtx cmp, rtx new_rtx, bool is_weak)
 {
@@ -6770,7 +6792,7 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
   emit_insn (seq2);
   emit_insn (seq3);
 
-  cc = s390_emit_compare_and_swap (EQ, res, ac.memsi, cmpv, newv);
+  cc = s390_emit_compare_and_swap (EQ, res, ac.memsi, cmpv, newv, CCZ1mode);
   if (is_weak)
     emit_insn (gen_cstorecc4 (btarget, cc, XEXP (cc, 0), XEXP (cc, 1)));
   else
@@ -6799,6 +6821,151 @@ s390_expand_cs_hqi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
 					      NULL_RTX, 1, OPTAB_DIRECT), 1);
 }
 
+/* Variant of s390_expand_cs for SI, DI and TI modes.  */
+static void
+s390_expand_cs_tdsi (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		     rtx cmp, rtx new_rtx, bool is_weak)
+{
+  rtx output = vtarget;
+  rtx_code_label *skip_cs_label = NULL;
+  bool do_const_opt = false;
+
+  if (!register_operand (output, mode))
+    output = gen_reg_rtx (mode);
+
+  /* If IS_WEAK is true and the expected value CMP is a constant, compare the
+     memory with the constant first and skip the compare-and-swap because it is
+     very expensive and likely to fail anyway.
+     Note 1: This is done only for IS_WEAK.  C11 allows optimizations that may
+     cause spurious failures in that case.
+     Note 2: It may be useful to do this also for a non-constant CMP.
+     Note 3: Currently only targets with "load on condition" are supported
+     (z196 and newer).  */
+
+  if (TARGET_Z196
+      && (mode == SImode || mode == DImode))
+    do_const_opt = (is_weak && CONST_INT_P (cmp));
+
+  if (do_const_opt)
+    {
+      const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
+      rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+
+      skip_cs_label = gen_label_rtx ();
+      emit_move_insn (btarget, const0_rtx);
+      if (CONST_INT_P (cmp) && INTVAL (cmp) == 0)
+	{
+	  rtvec lt = rtvec_alloc (2);
+
+	  /* Load-and-test + conditional jump.  */
+	  RTVEC_ELT (lt, 0)
+	    = gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, mem, cmp));
+	  RTVEC_ELT (lt, 1) = gen_rtx_SET (output, mem);
+	  emit_insn (gen_rtx_PARALLEL (VOIDmode, lt));
+	}
+      else
+	{
+	  emit_move_insn (output, mem);
+	  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
+	}
+      s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
+      add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
+      /* If the jump is not taken, OUTPUT is the expected value.  */
+      cmp = output;
+      /* Reload newval to a register manually, *after* the compare and jump
+	 above.  Otherwise Reload might place it before the jump.  */
+    }
+  else
+    cmp = force_reg (mode, cmp);
+  new_rtx = force_reg (mode, new_rtx);
+  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx,
+			      (do_const_opt) ? CCZmode : CCZ1mode);
+  if (skip_cs_label != NULL)
+    emit_label (skip_cs_label);
+
+  /* We deliberately accept non-register operands in the predicate
+     to ensure the write back to the output operand happens *before*
+     the store-flags code below.  This makes it easier for combine
+     to merge the store-flags code with a potential test-and-branch
+     pattern following (immediately!) afterwards.  */
+  if (output != vtarget)
+    emit_move_insn (vtarget, output);
+
+  if (do_const_opt)
+    {
+      rtx cc, cond, ite;
+
+      /* Do not use gen_cstorecc4 here because it writes either 1 or 0, but
+	 btarget has already been initialized with 0 above.  */
+      cc = gen_rtx_REG (CCZmode, CC_REGNUM);
+      cond = gen_rtx_EQ (VOIDmode, cc, const0_rtx);
+      ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, btarget);
+      emit_insn (gen_rtx_SET (btarget, ite));
+    }
+  else
+    {
+      rtx cc, cond;
+
+      cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
+      cond = gen_rtx_EQ (SImode, cc, const0_rtx);
+      emit_insn (gen_cstorecc4 (btarget, cond, cc, const0_rtx));
+    }
+}
+
+/* Expand an atomic compare and swap operation.  MEM is the memory location,
+   CMP the old value to compare MEM with and NEW_RTX the value to set if
+   CMP == MEM.  */
+
+void
+s390_expand_cs (machine_mode mode, rtx btarget, rtx vtarget, rtx mem,
+		rtx cmp, rtx new_rtx, bool is_weak)
+{
+  switch (mode)
+    {
+    case TImode:
+    case DImode:
+    case SImode:
+      s390_expand_cs_tdsi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    case HImode:
+    case QImode:
+      s390_expand_cs_hqi (mode, btarget, vtarget, mem, cmp, new_rtx, is_weak);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Expand an atomic_exchange operation simulated with a compare-and-swap loop.
+   The memory location MEM is set to INPUT.  OUTPUT is set to the previous value
+   of MEM.  */
+
+void
+s390_expand_atomic_exchange_tdsi (rtx output, rtx mem, rtx input)
+{
+  machine_mode mode = GET_MODE (mem);
+  rtx_code_label *csloop;
+
+  if (TARGET_Z196
+      && (mode == DImode || mode == SImode)
+      && CONST_INT_P (input) && INTVAL (input) == 0)
+    {
+      emit_move_insn (output, const0_rtx);
+      if (mode == DImode)
+	emit_insn (gen_atomic_fetch_anddi (output, mem, const0_rtx, input));
+      else
+	emit_insn (gen_atomic_fetch_andsi (output, mem, const0_rtx, input));
+      return;
+    }
+
+  input = force_reg (mode, input);
+  emit_move_insn (output, mem);
+  csloop = gen_label_rtx ();
+  emit_label (csloop);
+  s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, output, mem, output,
+						      input, CCZ1mode));
+}
+
 /* Expand an atomic operation CODE of mode MODE.  MEM is the memory location
    and VAL the value to play with.  If AFTER is true then store the value
    MEM holds after the operation, if AFTER is false then store the value MEM
@@ -6878,7 +7045,8 @@ s390_expand_atomic (machine_mode mode, enum rtx_code code,
     }
 
   s390_emit_jump (csloop, s390_emit_compare_and_swap (NE, cmp,
-						      ac.memsi, cmp, new_rtx));
+						      ac.memsi, cmp, new_rtx,
+						      CCZ1mode));
 
   /* Return the correct part of the bitfield.  */
   if (target)
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 9baafcc..8a700ed 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -10230,83 +10230,56 @@
 
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:DGPR 1 "nonimmediate_operand");; oldval output
-   (match_operand:DGPR 2 "memory_operand")	;; memory
-   (match_operand:DGPR 3 "register_operand")	;; expected intput
-   (match_operand:DGPR 4 "register_operand")	;; newval intput
+   (match_operand:DINT 1 "nonimmediate_operand");; oldval output
+   (match_operand:DINT 2 "s_operand")		;; memory
+   (match_operand:DINT 3 "general_operand")	;; expected input
+   (match_operand:DINT 4 "general_operand")	;; newval input
    (match_operand:SI 5 "const_int_operand")	;; is_weak
    (match_operand:SI 6 "const_int_operand")	;; success model
    (match_operand:SI 7 "const_int_operand")]	;; failure model
   ""
 {
-  rtx cc, cmp, output = operands[1];
-
-  if (!register_operand (output, <MODE>mode))
-    output = gen_reg_rtx (<MODE>mode);
-
-  if (MEM_ALIGN (operands[2]) < GET_MODE_BITSIZE (GET_MODE (operands[2])))
+  if (GET_MODE_BITSIZE (<MODE>mode) >= 16
+      && GET_MODE_BITSIZE (<MODE>mode) > MEM_ALIGN (operands[2]))
     FAIL;
 
-  emit_insn (gen_atomic_compare_and_swap<mode>_internal
-	     (output, operands[2], operands[3], operands[4]));
-
-  /* We deliberately accept non-register operands in the predicate
-     to ensure the write back to the output operand happens *before*
-     the store-flags code below.  This makes it easier for combine
-     to merge the store-flags code with a potential test-and-branch
-     pattern following (immediately!) afterwards.  */
-  if (output != operands[1])
-    emit_move_insn (operands[1], output);
-
-  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
-  cmp = gen_rtx_EQ (SImode, cc, const0_rtx);
-  emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx));
-  DONE;
-})
-
-(define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "register_operand")	;; bool success output
-   (match_operand:HQI 1 "nonimmediate_operand")	;; oldval output
-   (match_operand:HQI 2 "memory_operand")	;; memory
-   (match_operand:HQI 3 "general_operand")	;; expected intput
-   (match_operand:HQI 4 "general_operand")	;; newval intput
-   (match_operand:SI 5 "const_int_operand")	;; is_weak
-   (match_operand:SI 6 "const_int_operand")	;; success model
-   (match_operand:SI 7 "const_int_operand")]	;; failure model
-  ""
-{
-  s390_expand_cs_hqi (<MODE>mode, operands[0], operands[1], operands[2],
-		      operands[3], operands[4], INTVAL (operands[5]));
-  DONE;
-})
+  s390_expand_cs (<MODE>mode, operands[0], operands[1], operands[2],
+		  operands[3], operands[4], INTVAL (operands[5]));
+  DONE;})
 
 (define_expand "atomic_compare_and_swap<mode>_internal"
   [(parallel
      [(set (match_operand:DGPR 0 "register_operand")
-	   (match_operand:DGPR 1 "memory_operand"))
+	   (match_operand:DGPR 1 "s_operand"))
       (set (match_dup 1)
 	   (unspec_volatile:DGPR
 	     [(match_dup 1)
 	      (match_operand:DGPR 2 "register_operand")
 	      (match_operand:DGPR 3 "register_operand")]
 	     UNSPECV_CAS))
-      (set (reg:CCZ1 CC_REGNUM)
-	   (compare:CCZ1 (match_dup 1) (match_dup 2)))])]
-  "")
+      (set (match_operand 4 "cc_reg_operand")
+	   (match_dup 5))])]
+  "GET_MODE (operands[4]) == CCZmode
+   || GET_MODE (operands[4]) == CCZ1mode"
+{
+  operands[5]
+    = gen_rtx_COMPARE (GET_MODE (operands[4]), operands[1], operands[2]);
+})
 
 ; cdsg, csg
 (define_insn "*atomic_compare_and_swap<mode>_1"
   [(set (match_operand:TDI 0 "register_operand" "=r")
-	(match_operand:TDI 1 "memory_operand" "+S"))
+	(match_operand:TDI 1 "s_operand" "+S"))
    (set (match_dup 1)
 	(unspec_volatile:TDI
 	  [(match_dup 1)
 	   (match_operand:TDI 2 "register_operand" "0")
 	   (match_operand:TDI 3 "register_operand" "r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
-  "TARGET_ZARCH"
+   (set (reg CC_REGNUM)
+	(compare (match_dup 1) (match_dup 2)))]
+  "TARGET_ZARCH
+   && s390_match_ccmode (insn, CCZ1mode)"
   "c<td>sg\t%0,%3,%S1"
   [(set_attr "op_type" "RSY")
    (set_attr "type"   "sem")])
@@ -10314,16 +10287,17 @@
 ; cds, cdsy
 (define_insn "*atomic_compare_and_swapdi_2"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(match_operand:DI 1 "memory_operand" "+Q,S"))
+	(match_operand:DI 1 "s_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:DI
 	  [(match_dup 1)
 	   (match_operand:DI 2 "register_operand" "0,0")
 	   (match_operand:DI 3 "register_operand" "r,r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
-  "!TARGET_ZARCH"
+   (set (reg CC_REGNUM)
+	(compare (match_dup 1) (match_dup 2)))]
+  "!TARGET_ZARCH
+   && s390_match_ccmode (insn, CCZ1mode)"
   "@
    cds\t%0,%3,%S1
    cdsy\t%0,%3,%S1"
@@ -10334,16 +10308,16 @@
 ; cs, csy
 (define_insn "*atomic_compare_and_swapsi_3"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-	(match_operand:SI 1 "memory_operand" "+Q,S"))
+	(match_operand:SI 1 "s_operand" "+Q,S"))
    (set (match_dup 1)
 	(unspec_volatile:SI
 	  [(match_dup 1)
 	   (match_operand:SI 2 "register_operand" "0,0")
 	   (match_operand:SI 3 "register_operand" "r,r")]
 	  UNSPECV_CAS))
-   (set (reg:CCZ1 CC_REGNUM)
-	(compare:CCZ1 (match_dup 1) (match_dup 2)))]
-  ""
+   (set (reg CC_REGNUM)
+	(compare (match_dup 1) (match_dup 2)))]
+  "s390_match_ccmode (insn, CCZ1mode)"
   "@
    cs\t%0,%3,%S1
    csy\t%0,%3,%S1"
@@ -10430,15 +10404,25 @@
   DONE;
 })
 
+;; Pattern to implement atomic_exchange with a compare-and-swap loop.  The
+;; code generated by the middle end for this is suboptimal.
 (define_expand "atomic_exchange<mode>"
-  [(match_operand:HQI 0 "register_operand")		;; val out
-   (match_operand:HQI 1 "memory_operand")		;; memory
-   (match_operand:HQI 2 "general_operand")		;; val in
+  [(match_operand:DINT 0 "register_operand")		;; val out
+   (match_operand:DINT 1 "s_operand")			;; memory
+   (match_operand:DINT 2 "general_operand")		;; val in
    (match_operand:SI 3 "const_int_operand")]		;; model
   ""
 {
-  s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1],
-		      operands[2], false);
+  if (<MODE>mode != QImode
+      && MEM_ALIGN (operands[1]) < GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+  if (<MODE>mode == HImode || <MODE>mode == QImode)
+    s390_expand_atomic (<MODE>mode, SET, operands[0], operands[1], operands[2],
+			false);
+  else if (<MODE>mode == SImode || TARGET_ZARCH)
+    s390_expand_atomic_exchange_tdsi (operands[0], operands[1], operands[2]);
+  else
+    FAIL;
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
new file mode 100644
index 0000000..5cc026d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
@@ -0,0 +1,84 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+#include <stdio.h>
+
+struct
+{
+#ifdef __s390xx__
+  __int128 dummy128;
+  __int128 mem128;
+#endif
+  long long dummy64;
+  long long mem64;
+  int dummy32;
+  int mem32;
+  short mem16l;
+  short mem16h;
+  char mem8ll;
+  char mem8lh;
+  char mem8hl;
+  char mem8hh;
+} mem_s;
+
+#define TYPE char
+#define FN(SUFFIX) f8 ## SUFFIX
+#define FNS(SUFFIX) "f8" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE short
+#define FN(SUFFIX) f16 ##SUFFIX
+#define FNS(SUFFIX) "f16" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE int
+#define FN(SUFFIX) f32 ## SUFFIX
+#define FNS(SUFFIX) "f32" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#define TYPE long long
+#define FN(SUFFIX) f64 ## SUFFIX
+#define FNS(SUFFIX) "f64" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+
+#ifdef __s390xx__
+#define TYPE __int128
+#define FN(SUFFIX) f128 ## SUFFIX
+#define FNS(SUFFIX) "f128" #SUFFIX
+#include "atomic_compare_exchange-1.inc"
+#endif
+
+int main(void)
+{
+  int err_count = 0;
+  int i;
+
+  for (i = -1; i <= 2; i++)
+    {
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8ll, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8lh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hl, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f8_validate(&mem_s.mem8hh, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16l, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f16_validate(&mem_s.mem16h, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f32_validate(&mem_s.mem32, i, 1);
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f64_validate(&mem_s.mem64, i, 1);
+#ifdef __s390xx__
+      __builtin_memset(&mem_s, 0x99, sizeof(mem_s));
+      err_count += f128_validate(&mem_s.mem128, i, 1);
+#endif
+    }
+
+  return err_count;
+}
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
new file mode 100644
index 0000000..199aaa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
@@ -0,0 +1,336 @@
+/* -*-c-*- */
+
+#undef NEW
+#define NEW 3
+
+__attribute__ ((noinline))
+int FN(_bo)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_o)(TYPE *mem, TYPE *old_ret, TYPE old)
+{
+  *old_ret = old;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_b)(TYPE *mem, TYPE old)
+{
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN()(TYPE *mem, TYPE old)
+{
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const != 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c1_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 1;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c1_b)(TYPE *mem)
+{
+  TYPE old = 1;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c1)(TYPE *mem)
+{
+  TYPE old = 1;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+/* Const == 0 old value.  */
+__attribute__ ((noinline))
+int FN(_c0_bo)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  return __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0_o)(TYPE *mem, TYPE *old_ret)
+{
+  *old_ret = 0;
+  __atomic_compare_exchange_n (mem, (void *)old_ret, NEW, 1, 2, 0);
+  return;
+}
+
+__attribute__ ((noinline))
+int FN(_c0_b)(TYPE *mem)
+{
+  TYPE old = 0;
+  return __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+}
+
+__attribute__ ((noinline))
+void FN(_c0)(TYPE *mem)
+{
+  TYPE old = 0;
+  __atomic_compare_exchange_n (mem, (void *)&old, NEW, 1, 2, 0);
+  return;
+}
+
+int FN(_validate_mem)(TYPE *mem, TYPE expected_mem)
+{
+  if (*mem != expected_mem)
+    {
+      fprintf(stderr, "  BAD: mem %d != expected mem %d\n",
+	      *mem, expected_mem);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_rc)(int rc, int expected_rc)
+{
+  if (rc != expected_rc)
+    {
+      fprintf(stderr, "  BAD: rc %d != expected rc %d\n",
+	      rc, expected_rc);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate_old_ret)(int old_ret, int expected_old_ret)
+{
+  if (old_ret != expected_old_ret)
+    {
+      fprintf(stderr, "  BAD: old_ret %d != expected old_ret %d\n",
+	      old_ret, expected_old_ret);
+      return 1;
+    }
+
+  return 0;
+}
+
+int FN(_validate)(TYPE *mem, TYPE init_mem, TYPE old)
+{
+  int err_count = 0;
+  int rc;
+  TYPE expected_mem;
+  int expected_rc;
+  TYPE old_ret;
+  int failed;
+  const char *fname;
+
+  fprintf(stderr, "%s: init_mem %d @ %p\n", __FUNCTION__, init_mem, mem);
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_bo);
+    rc = FN(_bo)(mem, &old_ret, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_o);
+    FN(_o)(mem, &old_ret, old);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_b);
+    rc = FN(_b)(mem, old);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS();
+    FN()(mem, old);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_bo);
+    rc = FN(_c1_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1_o);
+    FN(_c1_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c1_b);
+    rc = FN(_c1_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 1;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c1);
+    FN(_c1)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_bo);
+    rc = FN(_c0_bo)(mem, &old_ret);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0_o);
+    FN(_c0_o)(mem, &old_ret);
+    failed |= FN(_validate_old_ret)(old_ret, init_mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    expected_rc = (init_mem == old);
+    fname = FNS(_c0_b);
+    rc = FN(_c0_b)(mem);
+    failed |= FN(_validate_rc)(rc, expected_rc);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+  {
+    failed = 0;
+    old = 0;
+    *mem = init_mem;
+    expected_mem = (init_mem == old) ? NEW : *mem;
+    fname = FNS(_c0);
+    FN(_c0)(mem);
+    failed |= FN(_validate_mem)(mem, expected_mem);
+    if (failed)
+      {
+	fprintf(stderr, "  FAIL: %s: near line %d\n", fname, __LINE__ - 3);
+	err_count++;
+      }
+  }
+
+  return err_count;
+}
+
+#undef TYPE
+#undef MEM
+#undef FN
+#undef FNS
diff --git a/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
new file mode 100644
index 0000000..f82b213
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
@@ -0,0 +1,309 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do compile } */
+/* { dg-options "-lpthread -latomic" } */
+/* { dg-do run { target { s390_useable_hw } } } */
+
+/**/
+
+char
+ae_8_0 (char *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+char
+ae_8_1 (char *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+char g8;
+
+char
+ae_8_g_0 (void)
+{
+  return __atomic_exchange_n (&g8, 0, 2);
+}
+
+char
+ae_8_g_1 (void)
+{
+  return __atomic_exchange_n (&g8, 1, 2);
+}
+
+/**/
+
+short
+ae_16_0 (short *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+short
+ae_16_1 (short *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+short g16;
+
+short
+ae_16_g_0 (void)
+{
+  return __atomic_exchange_n (&g16, 0, 2);
+}
+
+short
+ae_16_g_1 (void)
+{
+  return __atomic_exchange_n (&g16, 1, 2);
+}
+
+/**/
+
+int
+ae_32_0 (int *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+int
+ae_32_1 (int *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+int g32;
+
+int
+ae_32_g_0 (void)
+{
+  return __atomic_exchange_n (&g32, 0, 2);
+}
+
+int
+ae_32_g_1 (void)
+{
+  return __atomic_exchange_n (&g32, 1, 2);
+}
+
+/**/
+
+long long
+ae_64_0 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+long long
+ae_64_1 (long long *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+long long g64;
+
+long long
+ae_64_g_0 (void)
+{
+  return __atomic_exchange_n (&g64, 0, 2);
+}
+
+long long
+ae_64_g_1 (void)
+{
+  return __atomic_exchange_n (&g64, 1, 2);
+}
+
+/**/
+
+#ifdef __s390x__
+__int128
+ae_128_0 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 0, 2);
+}
+
+__int128
+ae_128_1 (__int128 *lock)
+{
+  return __atomic_exchange_n (lock, 1, 2);
+}
+
+__int128 g128;
+
+__int128
+ae_128_g_0 (void)
+{
+  return __atomic_exchange_n (&g128, 0, 2);
+}
+
+__int128
+ae_128_g_1 (void)
+{
+  return __atomic_exchange_n (&g128, 1, 2);
+}
+
+#endif
+
+int main(void)
+{
+  int i;
+
+  for (i = 0; i <= 2; i++)
+    {
+      int oval = i;
+
+      {
+	char lock;
+	char rval;
+
+	lock = oval;
+	rval = ae_8_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_8_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_0 ();
+	if (g8 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g8 = oval;
+	rval = ae_8_g_1 ();
+	if (g8 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	short lock;
+	short rval;
+
+	lock = oval;
+	rval = ae_16_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_16_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_0 ();
+	if (g16 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g16 = oval;
+	rval = ae_16_g_1 ();
+	if (g16 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	int lock;
+	int rval;
+
+	lock = oval;
+	rval = ae_32_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_32_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_0 ();
+	if (g32 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g32 = oval;
+	rval = ae_32_g_1 ();
+	if (g32 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+      {
+	long long lock;
+	long long rval;
+
+	lock = oval;
+	rval = ae_64_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_64_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_0 ();
+	if (g64 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g64 = oval;
+	rval = ae_64_g_1 ();
+	if (g64 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+
+#ifdef __s390x__
+      {
+	__int128 lock;
+	__int128 rval;
+
+	lock = oval;
+	rval = ae_128_0 (&lock);
+	if (lock != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	lock = oval;
+	rval = ae_128_1 (&lock);
+	if (lock != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_0 ();
+	if (g128 != 0)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+	g128 = oval;
+	rval = ae_128_g_1 ();
+	if (g128 != 1)
+	  __builtin_abort ();
+	if (rval != oval)
+	  __builtin_abort ();
+      }
+#endif
+    }
+
+  return 0;
+}
-- 
2.3.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-11 14:21 ` [PATCH v5] " Dominik Vogt
@ 2017-04-24 15:20   ` Ulrich Weigand
  2017-04-25  8:04   ` Andreas Krebbel
  1 sibling, 0 replies; 18+ messages in thread
From: Ulrich Weigand @ 2017-04-24 15:20 UTC (permalink / raw)
  To: vogt; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand

Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimizations, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.
> 
> v5:
>   * Generate LT pattern directly for const 0 value.
>   * Split into three patches.
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

> gcc/ChangeLog-dv-atomic-gcc7-1
> 
> 	* config/s390/s390.md ("cstorecc4"): Use load-on-condition and deal
> 	with CCZmode for TARGET_Z196.

> gcc/ChangeLog-dv-atomic-gcc7-2
> 
> 	* config/s390/s390.md (define_peephole2): New peephole to help
> 	combining the load-and-test pattern with volatile memory.

> gcc/ChangeLog-dv-atomic-gcc7-3
> 
> 	* s390-protos.h (s390_expand_cs_hqi): Removed.
> 	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
> 	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
> 	modes as well as CCZ1mode and CCZmode.
> 	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
> 	signature of s390_emit_compare_and_swap.
> 	(s390_expand_cs_hqi): Likewise, make static.
> 	(s390_expand_cs_tdsi): Generate an explicit compare before trying
> 	compare-and-swap, in some cases.
> 	(s390_expand_cs): Wrapper function.
> 	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
> 	atomic_exchange.
> 	(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
> 	* config/s390/s390.md ("atomic_compare_and_swap<mode>"): Merge the
> 	patterns for small and large integers.  Forbid symref memory operands.
> 	Move expander to s390.c.  Require cc register.
> 	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
> 	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
> 	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
> 	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Use s_operand to forbid
> 	symref memory operands.  Remove CC mode and call s390_match_ccmode
> 	instead.
> 	("atomic_exchange<mode>"): Allow and implement all integer modes.
>
> gcc/testsuite/ChangeLog-dv-atomic-gcc7
> 
> 	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
> 	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
> 	* gcc.target/s390/md/atomic_exchange-1.inc: New test.
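As context for the patterns and tests listed above: the optimized code paths
are reached through the generic __atomic_* builtins.  The sketch below
(hypothetical function names, not taken from the new testsuite files) shows
the kind of SImode compare-and-swap and exchange whose expansion the patch
improves on s390/s390x:

```c
#include <assert.h>

/* Sketch of the builtin usage the patch optimizes (hypothetical names,
   not from the testsuite).  On s390/s390x these expand to the
   compare-and-swap resp. exchange patterns touched by the patch.  */

static int
cas_int (int *mem, int expected, int desired)
{
  /* Nonzero iff *mem equalled expected and was replaced by desired.  */
  return __atomic_compare_exchange_n (mem, &expected, desired,
				      0 /* strong */,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

static int
xchg_int (int *mem, int newval)
{
  /* Atomically store newval, return the previous contents.  */
  return __atomic_exchange_n (mem, newval, __ATOMIC_SEQ_CST);
}
```

For example, with int v = 1: cas_int (&v, 1, 2) succeeds and sets v to 2;
a second cas_int (&v, 1, 3) then fails and leaves v unchanged.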


These all look good to me now.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH v5] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
  2017-04-11 14:21 ` [PATCH v5] " Dominik Vogt
  2017-04-24 15:20   ` Ulrich Weigand
@ 2017-04-25  8:04   ` Andreas Krebbel
  1 sibling, 0 replies; 18+ messages in thread
From: Andreas Krebbel @ 2017-04-25  8:04 UTC (permalink / raw)
  To: vogt, gcc-patches, Ulrich Weigand

On 04/11/2017 04:20 PM, Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
>> The attached patch optimizes the atomic_exchange and
>> atomic_compare patterns on s390 and s390x (mostly limited to
>> SImode and DImode).  Among general optimizations, the changes fix
>> most of the problems reported in PR 80080:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
>>
>> Bootstrapped and regression tested on a zEC12 with s390 and s390x
>> biarch.
> 
> v5:
>   * Generate LT pattern directly for const 0 value.
>   * Split into three patches.
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

Applied to mainline. Thanks!

-Andreas-


end of thread, other threads:[~2017-04-25  7:46 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-27 20:50 [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins Dominik Vogt
2017-03-29 15:22 ` [PATCH v2] " Dominik Vogt
2017-04-05 13:52 ` [PATCH] " Dominik Vogt
2017-04-05 15:25   ` Ulrich Weigand
2017-04-06  9:35   ` Dominik Vogt
2017-04-06 15:29     ` Ulrich Weigand
2017-04-06 15:34       ` Ulrich Weigand
2017-04-07 14:14   ` Dominik Vogt
2017-04-07 14:34     ` Ulrich Weigand
2017-04-07 15:37       ` Dominik Vogt
2017-04-07 17:22         ` Ulrich Weigand
2017-04-10  9:13           ` Dominik Vogt
2017-04-10  9:21             ` Dominik Vogt
2017-04-10 21:37               ` Ulrich Weigand
2017-04-11  8:11                 ` Andreas Krebbel
2017-04-11 14:21 ` [PATCH v5] " Dominik Vogt
2017-04-24 15:20   ` Ulrich Weigand
2017-04-25  8:04   ` Andreas Krebbel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).