public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 00/11] aarch64: Implement TImode comparisons
@ 2020-04-02 18:53 Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 01/11] aarch64: Accept 0 as first argument to compares Richard Henderson
                   ` (12 more replies)
  0 siblings, 13 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

This is attacking case 3 of PR 94174.

In v2, I unify the various subtract-with-borrow and add-with-carry
patterns that also output flags by using unspecs, as suggested by
Richard Sandiford during review of v1.  It does seem cleaner.


r~


Richard Henderson (11):
  aarch64: Accept 0 as first argument to compares
  aarch64: Accept zeros in add<GPI>3_carryin
  aarch64: Provide expander for sub<GPI>3_compare1
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags
  aarch64: Use UNSPEC_ADCS for add-with-carry + output flags
  aarch64: Remove CC_ADCmode
  aarch64: Accept -1 as second argument to add<mode>3_carryin
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Implement TImode comparisons
  aarch64: Implement absti2

 gcc/config/aarch64/aarch64-protos.h       |  10 +-
 gcc/config/aarch64/aarch64.c              | 303 +++++----
 gcc/config/aarch64/aarch64-modes.def      |   1 -
 gcc/config/aarch64/aarch64-simd.md        |  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md             | 762 ++++++++++------------
 gcc/config/aarch64/predicates.md          |  15 +-
 7 files changed, 527 insertions(+), 587 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 01/11] aarch64: Accept 0 as first argument to compares
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 02/11] aarch64: Accept zeros in add<GPI>3_carryin Richard Henderson
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

While cmp (extended register) and cmp (immediate) use <Wn|WSP>,
cmp (shifted register) uses <Wn>, so we can perform cmp xzr, x0.

For ccmp, we only have <Wn> as an input.

	* config/aarch64/aarch64.md (cmp<GPI>): For operand 0, use
	aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
	and accept Z.
	(@ccmpcc<GPI>): For operand 0, use aarch64_reg_or_zero and Z.
	(@ccmpcc<GPI>_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..6fdab5f3402 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
 	   [(match_operand 0 "cc_register" "")
 	    (const_int 0)])
 	  (compare:CC_ONLY
-	    (match_operand:GPI 2 "register_operand" "r,r,r")
+	    (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
 	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
 	  (unspec:CC_ONLY
 	    [(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
 	    [(match_operand 5 "immediate_operand")]
 	    UNSPEC_NZCV)
 	  (compare:CC_ONLY
-	    (match_operand:GPI 2 "register_operand" "r,r,r")
+	    (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
 	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))))]
   ""
   "@
@@ -3961,14 +3961,14 @@
 
 (define_insn "cmp<mode>"
   [(set (reg:CC CC_REGNUM)
-	(compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-		    (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+	(compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+		    (match_operand:GPI 1 "aarch64_plus_operand" "I,J,r")))]
   ""
   "@
-   cmp\\t%<w>0, %<w>1
    cmp\\t%<w>0, %1
-   cmn\\t%<w>0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%<w>0, #%n1
+   cmp\\t%<w>0, %<w>1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp<mode>"
-- 
2.20.1



* [PATCH v2 02/11] aarch64: Accept zeros in add<GPI>3_carryin
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 01/11] aarch64: Accept 0 as first argument to compares Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 03/11] aarch64: Provide expander for sub<GPI>3_compare1 Richard Henderson
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

The expander and the insn pattern did not match, leading to
recognition failures in expand.

	* config/aarch64/aarch64.md (*add<GPI>3_carryin): Accept zeros.
---
 gcc/config/aarch64/aarch64.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6fdab5f3402..b242f2b1c73 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2606,16 +2606,17 @@
    ""
 )
 
-;; Note that add with carry with two zero inputs is matched by cset,
-;; and that add with carry with one zero input is matched by cinc.
+;; While add with carry with two zero inputs will be folded to cset,
+;; and add with carry with one zero input will be folded to cinc,
+;; accept the zeros during initial expansion.
 
 (define_insn "*add<mode>3_carryin"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(plus:GPI
 	  (plus:GPI
 	    (match_operand:GPI 3 "aarch64_carry_operation" "")
-	    (match_operand:GPI 1 "register_operand" "r"))
-	  (match_operand:GPI 2 "register_operand" "r")))]
+	    (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
+	  (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
    ""
    "adc\\t%<w>0, %<w>1, %<w>2"
   [(set_attr "type" "adc_reg")]
-- 
2.20.1



* [PATCH v2 03/11] aarch64: Provide expander for sub<GPI>3_compare1
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 01/11] aarch64: Accept 0 as first argument to compares Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 02/11] aarch64: Accept zeros in add<GPI>3_carryin Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 04/11] aarch64: Introduce aarch64_expand_addsubti Richard Henderson
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

In one place we open-code a special case of this pattern via the
more specific sub<GPI>3_compare1_imm, but miss that special case
in other places.  Centralize it into an expander.

	* config/aarch64/aarch64.md (*sub<GPI>3_compare1): Rename
	from sub<GPI>3_compare1.
	(sub<GPI>3_compare1): New expander.
	* config/aarch64/aarch64.c (aarch64_expand_subvti): Remove
	call to gen_subdi3_compare1_imm.
---
 gcc/config/aarch64/aarch64.c  | 11 ++---------
 gcc/config/aarch64/aarch64.md | 22 +++++++++++++++++++++-
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..7a13a8e8ec4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20333,16 +20333,9 @@ aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1,
     }
   else
     {
-      if (aarch64_plus_immediate (low_in2, DImode))
-	emit_insn (gen_subdi3_compare1_imm (low_dest, low_in1, low_in2,
-					    GEN_INT (-INTVAL (low_in2))));
-      else
-	{
-	  low_in2 = force_reg (DImode, low_in2);
-	  emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-	}
-      high_in2 = force_reg (DImode, high_in2);
+      emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
 
+      high_in2 = force_reg (DImode, high_in2);
       if (unsigned_p)
 	emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
       else
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b242f2b1c73..d6389cc8148 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3120,7 +3120,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub<mode>3_compare1"
+(define_insn "*sub<mode>3_compare1"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
 	  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3132,6 +3132,26 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub<mode>3_compare1"
+  [(parallel
+    [(set (reg:CC CC_REGNUM)
+	  (compare:CC
+	    (match_operand:GPI 1 "aarch64_reg_or_zero")
+	    (match_operand:GPI 2 "aarch64_reg_or_imm")))
+     (set (match_operand:GPI 0 "register_operand")
+	  (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (aarch64_plus_immediate (operands[2], <MODE>mode))
+    {
+      emit_insn (gen_sub<mode>3_compare1_imm
+		 (operands[0], operands[1], operands[2],
+		  GEN_INT (-INTVAL (operands[2]))));
+      DONE;
+    }
+  operands[2] = force_reg (<MODE>mode, operands[2]);
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
 	(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



* [PATCH v2 04/11] aarch64: Introduce aarch64_expand_addsubti
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (2 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 03/11] aarch64: Provide expander for sub<GPI>3_compare1 Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags Richard Henderson
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

Modify aarch64_expand_subvti into a form that handles all addition
and subtraction, whether modulo or with signed or unsigned overflow
checking.

Use expand_insn to put the operands into the proper form, and do not
force values into registers when not required.

	* config/aarch64/aarch64.c (aarch64_ti_split): New.
	(aarch64_addti_scratch_regs): Remove.
	(aarch64_subvti_scratch_regs): Remove.
	(aarch64_expand_subvti): Remove.
	(aarch64_expand_addsubti): New.
	* config/aarch64/aarch64-protos.h: Update to match.
	* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
	(addvti4, uaddvti4): Likewise.
	(subvti4, usubvti4): Likewise.
	(subti3): Likewise; accept immediates for operand 2.
---
 gcc/config/aarch64/aarch64-protos.h |  10 +--
 gcc/config/aarch64/aarch64.c        | 129 +++++++++-------------------
 gcc/config/aarch64/aarch64.md       | 125 ++++++---------------------
 3 files changed, 67 insertions(+), 197 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d6d668ea920..787085b24d2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-				 rtx *, rtx *,
-				 rtx *, rtx *,
-				 rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
-				  rtx *, rtx *,
-				  rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-			    rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7a13a8e8ec4..6263897c9a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20241,110 +20241,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-			    rtx *low_in1, rtx *low_in2,
-			    rtx *high_dest, rtx *high_in1,
-			    rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				   subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+			     subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+			     subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-
-void
-aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-			     rtx *low_in1, rtx *low_in2,
-			     rtx *high_dest, rtx *high_in1,
-			     rtx *high_in2)
-{
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = simplify_gen_subreg (DImode, op1, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
-
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-
-  *high_in1 = simplify_gen_subreg (DImode, op1, TImode,
-				   subreg_highpart_offset (DImode, TImode));
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				   subreg_highpart_offset (DImode, TImode));
-}
-
-/* Generate RTL for 128-bit (TImode) subtraction with overflow.
-
+/* Generate RTL for 128-bit (TImode) addition or subtraction.
    OP0 represents the TImode destination operand 0
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2
-   UNSIGNED_P is true if the operation is being performed on unsigned
-   values.  */
+   OP1 and OP2 represent the TImode input operands.
+
+   Normal or Overflow behaviour is obtained via the INSN_CODE operands:
+   CODE_HI_LO0 is used when the low half of OP2 == 0, otherwise
+   CODE_LO is used on the low halves,
+   CODE_HI is used on the high halves.  */
+
 void
-aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1,
-		       rtx low_in2, rtx high_dest, rtx high_in1,
-		       rtx high_in2, bool unsigned_p)
+aarch64_expand_addsubti (rtx op0, rtx op1, rtx op2,
+			 int code_hi_lo0, int code_lo, int code_hi)
 {
-  if (low_in2 == const0_rtx)
+  rtx low_dest, low_op1, low_op2, high_dest, high_op1, high_op2;
+  struct expand_operand ops[3];
+
+  aarch64_ti_split (op1, &low_op1, &high_op1);
+  aarch64_ti_split (op2, &low_op2, &high_op2);
+
+  if (low_op2 == const0_rtx)
     {
-      low_dest = low_in1;
-      high_in2 = force_reg (DImode, high_in2);
-      if (unsigned_p)
-	emit_insn (gen_subdi3_compare1 (high_dest, high_in1, high_in2));
-      else
-	emit_insn (gen_subvdi_insn (high_dest, high_in1, high_in2));
+      low_dest = low_op1;
+      code_hi = code_hi_lo0;
     }
   else
     {
-      emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-
-      high_in2 = force_reg (DImode, high_in2);
-      if (unsigned_p)
-	emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
-      else
-	emit_insn (gen_subdi3_carryinV (high_dest, high_in1, high_in2));
+      low_dest = gen_reg_rtx (DImode);
+      create_output_operand(&ops[0], low_dest, DImode);
+      create_input_operand(&ops[1], low_op1, DImode);
+      create_input_operand(&ops[2], low_op2, DImode);
+      expand_insn ((insn_code)code_lo, 3, ops);
     }
 
+  high_dest = gen_reg_rtx (DImode);
+  create_output_operand(&ops[0], high_dest, DImode);
+  create_input_operand(&ops[1], high_op1, DImode);
+  create_input_operand(&ops[2], high_op2, DImode);
+  expand_insn ((insn_code)code_hi, 3, ops);
+
   emit_move_insn (gen_lowpart (DImode, op0), low_dest);
   emit_move_insn (gen_highpart (DImode, op0), high_dest);
-
 }
 
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d6389cc8148..532c114a42e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2044,30 +2044,10 @@
 		 (match_operand:TI 2 "aarch64_reg_or_imm")))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_addti_scratch_regs (operands[1], operands[2],
-			      &low_dest, &op1_low, &op2_low,
-			      &high_dest, &op1_high, &op2_high);
-
-  if (op2_low == const0_rtx)
-    {
-      low_dest = op1_low;
-      if (!aarch64_pluslong_operand (op2_high, DImode))
-	op2_high = force_reg (DImode, op2_high);
-      emit_insn (gen_adddi3 (high_dest, op1_high, op2_high));
-    }
-  else
-    {
-      emit_insn (gen_adddi3_compareC (low_dest, op1_low,
-				      force_reg (DImode, op2_low)));
-      emit_insn (gen_adddi3_carryin (high_dest, op1_high,
-				     force_reg (DImode, op2_high)));
-    }
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_adddi3,
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_carryin);
   DONE;
 })
 
@@ -2078,29 +2058,10 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_addti_scratch_regs (operands[1], operands[2],
-			      &low_dest, &op1_low, &op2_low,
-			      &high_dest, &op1_high, &op2_high);
-
-  if (op2_low == const0_rtx)
-    {
-      low_dest = op1_low;
-      emit_insn (gen_adddi3_compareV (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-  else
-    {
-      emit_insn (gen_adddi3_compareC (low_dest, op1_low,
-				      force_reg (DImode, op2_low)));
-      emit_insn (gen_adddi3_carryinV (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_adddi3_compareV,
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_carryinV);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2112,32 +2073,13 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_addti_scratch_regs (operands[1], operands[2],
-			      &low_dest, &op1_low, &op2_low,
-			      &high_dest, &op1_high, &op2_high);
-
-  if (op2_low == const0_rtx)
-    {
-      low_dest = op1_low;
-      emit_insn (gen_adddi3_compareC (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-  else
-    {
-      emit_insn (gen_adddi3_compareC (low_dest, op1_low,
-				      force_reg (DImode, op2_low)));
-      emit_insn (gen_adddi3_carryinC (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_carryinC);
   aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
   DONE;
- })
+})
 
 (define_insn "add<mode>3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
@@ -2980,20 +2922,13 @@
 (define_expand "subti3"
   [(set (match_operand:TI 0 "register_operand")
 	(minus:TI (match_operand:TI 1 "aarch64_reg_or_zero")
-		  (match_operand:TI 2 "register_operand")))]
+		  (match_operand:TI 2 "aarch64_reg_or_imm")))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_subvti_scratch_regs (operands[1], operands[2],
-			       &low_dest, &op1_low, &op2_low,
-			       &high_dest, &op1_high, &op2_high);
-
-  emit_insn (gen_subdi3_compare1 (low_dest, op1_low, op2_low));
-  emit_insn (gen_subdi3_carryin (high_dest, op1_high, op2_high));
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_subdi3,
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_subdi3_carryin);
   DONE;
 })
 
@@ -3004,14 +2939,10 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_subvti_scratch_regs (operands[1], operands[2],
-			       &low_dest, &op1_low, &op2_low,
-			       &high_dest, &op1_high, &op2_high);
-  aarch64_expand_subvti (operands[0], low_dest, op1_low, op2_low,
-			 high_dest, op1_high, op2_high, false);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_subvdi_insn,
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_subdi3_carryinV);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -3023,14 +2954,10 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_subvti_scratch_regs (operands[1], operands[2],
-				    &low_dest, &op1_low, &op2_low,
-			       &high_dest, &op1_high, &op2_high);
-  aarch64_expand_subvti (operands[0], low_dest, op1_low, op2_low,
-			 high_dest, op1_high, op2_high, true);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_usubdi3_carryinC);
   aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
   DONE;
 })
-- 
2.20.1



* [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (3 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 04/11] aarch64: Introduce aarch64_expand_addsubti Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-09 21:52   ` Segher Boessenkool
  2020-04-02 18:53 ` [PATCH v2 06/11] aarch64: Use UNSPEC_ADCS for add-with-carry " Richard Henderson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

The rtl description of signed/unsigned overflow from subtract
was fine, as far as it goes -- we have CC_Cmode and CC_Vmode
that indicate that only those particular bits are valid.

However, it's not clear how to extend that description to
handle signed comparison, where N == V (GE) and N != V (LT) are
the only valid conditions.

Using an UNSPEC means that we can unify all 3 usages without
fear that combine will try to infer anything from the rtl.
It also means we need far fewer variants when various inputs
have constants propagated in, and the rtl folds.

Accept -1 for the second input by using ADCS.

	* config/aarch64/aarch64.md (UNSPEC_SBCS): New.
	(cmp<GPI>3_carryin): New expander.
	(sub<GPI>3_carryin_cmp): New expander.
	(*cmp<GPI>3_carryin): New pattern.
	(*cmp<GPI>3_carryin_0): New pattern.
	(*sub<GPI>3_carryin_cmp): New pattern.
	(*sub<GPI>3_carryin_cmp_0): New pattern.
	(subvti4, usubvti4, negvti3): Use subdi3_carryin_cmp.
	(negvdi_carryinV): Remove.
	(usub<GPI>3_carryinC): Remove.
	(*usub<GPI>3_carryinC): Remove.
	(*usub<GPI>3_carryinC_z1): Remove.
	(*usub<GPI>3_carryinC_z2): Remove.
	(sub<GPI>3_carryinV): Remove.
	(*sub<GPI>3_carryinV): Remove.
	(*sub<GPI>3_carryinV_z2): Remove.
	* config/aarch64/predicates.md (aarch64_reg_zero_minus1): New.
---
 gcc/config/aarch64/aarch64.md    | 217 +++++++++++++------------------
 gcc/config/aarch64/predicates.md |   7 +
 2 files changed, 94 insertions(+), 130 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 532c114a42e..564dea390be 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -281,6 +281,7 @@
     UNSPEC_GEN_TAG_RND		; Generate a random 4-bit MTE tag.
     UNSPEC_TAG_SPACE		; Translate address to MTE tag address space.
     UNSPEC_LD1RO
+    UNSPEC_SBCS
 ])
 
 (define_c_enum "unspecv" [
@@ -2942,7 +2943,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
 			   CODE_FOR_subvdi_insn,
 			   CODE_FOR_subdi3_compare1,
-			   CODE_FOR_subdi3_carryinV);
+			   CODE_FOR_subdi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2957,7 +2958,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
 			   CODE_FOR_subdi3_compare1,
 			   CODE_FOR_subdi3_compare1,
-			   CODE_FOR_usubdi3_carryinC);
+			   CODE_FOR_subdi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
   DONE;
 })
@@ -2968,12 +2969,14 @@
    (label_ref (match_operand 2 "" ""))]
   ""
   {
-    emit_insn (gen_negdi_carryout (gen_lowpart (DImode, operands[0]),
-				   gen_lowpart (DImode, operands[1])));
-    emit_insn (gen_negvdi_carryinV (gen_highpart (DImode, operands[0]),
-				    gen_highpart (DImode, operands[1])));
-    aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
+    rtx op0l = gen_lowpart (DImode, operands[0]);
+    rtx op1l = gen_lowpart (DImode, operands[1]);
+    rtx op0h = gen_highpart (DImode, operands[0]);
+    rtx op1h = gen_highpart (DImode, operands[1]);
 
+    emit_insn (gen_negdi_carryout (op0l, op1l));
+    emit_insn (gen_subdi3_carryin_cmp (op0h, const0_rtx, op1h));
+    aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
     DONE;
   }
 )
@@ -2989,23 +2992,6 @@
   [(set_attr "type" "alus_sreg")]
 )
 
-(define_insn "negvdi_carryinV"
-  [(set (reg:CC_V CC_REGNUM)
-	(compare:CC_V
-	 (neg:TI (plus:TI
-		  (ltu:TI (reg:CC CC_REGNUM) (const_int 0))
-		  (sign_extend:TI (match_operand:DI 1 "register_operand" "r"))))
-	 (sign_extend:TI
-	  (neg:DI (plus:DI (ltu:DI (reg:CC CC_REGNUM) (const_int 0))
-			   (match_dup 1))))))
-   (set (match_operand:DI 0 "register_operand" "=r")
-	(neg:DI (plus:DI (ltu:DI (reg:CC CC_REGNUM) (const_int 0))
-			 (match_dup 1))))]
-  ""
-  "ngcs\\t%0, %1"
-  [(set_attr "type" "alus_sreg")]
-)
-
 (define_insn "*sub<mode>3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ (minus:GPI (match_operand:GPI 1 "register_operand" "rk")
@@ -3370,134 +3356,105 @@
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "usub<GPI:mode>3_carryinC"
+(define_expand "sub<mode>3_carryin_cmp"
   [(parallel
-     [(set (reg:CC CC_REGNUM)
-	   (compare:CC
-	     (zero_extend:<DWI>
-	       (match_operand:GPI 1 "aarch64_reg_or_zero"))
-	     (plus:<DWI>
-	       (zero_extend:<DWI>
-		 (match_operand:GPI 2 "register_operand"))
-	       (ltu:<DWI> (reg:CC CC_REGNUM) (const_int 0)))))
-      (set (match_operand:GPI 0 "register_operand")
-	   (minus:GPI
-	     (minus:GPI (match_dup 1) (match_dup 2))
-	     (ltu:GPI (reg:CC CC_REGNUM) (const_int 0))))])]
+    [(set (match_dup 3)
+	  (unspec:CC
+	    [(match_operand:GPI 1 "aarch64_reg_or_zero")
+	     (match_operand:GPI 2 "aarch64_reg_zero_minus1")
+	     (match_dup 4)]
+	    UNSPEC_SBCS))
+     (set (match_operand:GPI 0 "register_operand" "=r")
+	  (unspec:GPI
+	    [(match_dup 1) (match_dup 2) (match_dup 4)]
+	    UNSPEC_SBCS))])]
    ""
+  {
+    operands[3] = gen_rtx_REG (CCmode, CC_REGNUM);
+    operands[4] = gen_rtx_LTU (<MODE>mode, operands[3], const0_rtx);
+  }
 )
 
-(define_insn "*usub<GPI:mode>3_carryinC_z1"
+(define_insn "*sub<mode>3_carryin_cmp"
   [(set (reg:CC CC_REGNUM)
-	(compare:CC
-	  (const_int 0)
-	  (plus:<DWI>
-	    (zero_extend:<DWI>
-	      (match_operand:GPI 1 "register_operand" "r"))
-	    (match_operand:<DWI> 2 "aarch64_borrow_operation" ""))))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(minus:GPI
-	  (neg:GPI (match_dup 1))
-	  (match_operand:GPI 3 "aarch64_borrow_operation" "")))]
+	(unspec:CC
+	  [(match_operand:GPI 1 "aarch64_reg_or_zero" "rZ,rZ")
+	   (match_operand:GPI 2 "aarch64_reg_zero_minus1" "rZ,UsM")
+	   (match_operand:GPI 3 "aarch64_borrow_operation" "")]
+	  UNSPEC_SBCS))
+   (set (match_operand:GPI 0 "register_operand" "=r,r")
+	(unspec:GPI
+	  [(match_dup 1) (match_dup 2) (match_dup 3)]
+	  UNSPEC_SBCS))]
    ""
-   "sbcs\\t%<w>0, <w>zr, %<w>1"
+   "@
+    sbcs\\t%<w>0, %<w>1, %<w>2
+    adcs\\t%<w>0, %<w>1, <w>zr"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*usub<GPI:mode>3_carryinC_z2"
+(define_expand "cmp<mode>3_carryin"
   [(set (reg:CC CC_REGNUM)
-	(compare:CC
-	  (zero_extend:<DWI>
-	    (match_operand:GPI 1 "register_operand" "r"))
-	  (match_operand:<DWI> 2 "aarch64_borrow_operation" "")))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(minus:GPI
-	  (match_dup 1)
-	  (match_operand:GPI 3 "aarch64_borrow_operation" "")))]
+	(unspec:CC
+	  [(match_operand:GPI 0 "aarch64_reg_or_zero")
+	   (match_operand:GPI 1 "aarch64_reg_zero_minus1")
+	   (ltu:GPI (reg:CC CC_REGNUM) (const_int 0))]
+	  UNSPEC_SBCS))]
    ""
-   "sbcs\\t%<w>0, %<w>1, <w>zr"
-  [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*usub<GPI:mode>3_carryinC"
+(define_insn "*cmp<mode>3_carryin"
   [(set (reg:CC CC_REGNUM)
-	(compare:CC
-	  (zero_extend:<DWI>
-	    (match_operand:GPI 1 "register_operand" "r"))
-	  (plus:<DWI>
-	    (zero_extend:<DWI>
-	      (match_operand:GPI 2 "register_operand" "r"))
-	    (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(minus:GPI
-	  (minus:GPI (match_dup 1) (match_dup 2))
-	  (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
+	(unspec:CC
+	  [(match_operand:GPI 0 "aarch64_reg_or_zero" "rZ,rZ")
+	   (match_operand:GPI 1 "aarch64_reg_zero_minus1" "rZ,UsM")
+	   (match_operand:GPI 2 "aarch64_borrow_operation" "")]
+	  UNSPEC_SBCS))]
    ""
-   "sbcs\\t%<w>0, %<w>1, %<w>2"
+   "@
+    sbcs\\t<w>zr, %<w>0, %<w>1
+    adcs\\t<w>zr, %<w>0, <w>zr"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "sub<GPI:mode>3_carryinV"
-  [(parallel
-     [(set (reg:CC_V CC_REGNUM)
-	   (compare:CC_V
-	    (minus:<DWI>
-	     (sign_extend:<DWI>
-	       (match_operand:GPI 1 "aarch64_reg_or_zero"))
-	     (plus:<DWI>
-	       (sign_extend:<DWI>
-		 (match_operand:GPI 2 "register_operand"))
-	       (ltu:<DWI> (reg:CC CC_REGNUM) (const_int 0))))
-	    (sign_extend:<DWI>
-	     (minus:GPI (match_dup 1)
-			(plus:GPI (ltu:GPI (reg:CC CC_REGNUM) (const_int 0))
-				  (match_dup 2))))))
-      (set (match_operand:GPI 0 "register_operand")
-	   (minus:GPI
-	     (minus:GPI (match_dup 1) (match_dup 2))
-	     (ltu:GPI (reg:CC CC_REGNUM) (const_int 0))))])]
-   ""
+;; If combine can show that the borrow is 0, fold SBCS to SUBS.
+(define_insn_and_split "*sub<mode>3_carryin_cmp_0"
+  [(set (reg:CC CC_REGNUM)
+	(unspec:CC
+	  [(match_operand:GPI 1 "aarch64_reg_or_zero" "rk,rkZ")
+	   (match_operand:GPI 2 "aarch64_plus_immediate" "rIJ,r")
+	   (const_int 0)]
+	  UNSPEC_SBCS))
+   (set (match_operand:GPI 0 "register_operand")
+	(unspec:GPI
+	  [(match_dup 1) (match_dup 2) (const_int 0)]
+	  UNSPEC_SBCS))]
+  ""
+  "#"
+  ""
+  [(scratch)]
+  {
+    emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1],
+					operands[2]));
+    DONE;
+  }
 )
 
-(define_insn "*sub<mode>3_carryinV_z2"
-  [(set (reg:CC_V CC_REGNUM)
-	(compare:CC_V
-	 (minus:<DWI>
-	  (sign_extend:<DWI> (match_operand:GPI 1 "register_operand" "r"))
-	  (match_operand:<DWI> 2 "aarch64_borrow_operation" ""))
-	 (sign_extend:<DWI>
-	  (minus:GPI (match_dup 1)
-		     (match_operand:GPI 3 "aarch64_borrow_operation" "")))))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(minus:GPI
-	 (match_dup 1) (match_dup 3)))]
+(define_insn_and_split "*cmp<mode>3_carryin_0"
+  [(set (reg:CC CC_REGNUM)
+	(unspec:CC
+	  [(match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rZ")
+	   (match_operand:GPI 1 "aarch64_plus_operand" "rIJ,r")
+	   (const_int 0)]
+	  UNSPEC_SBCS))]
    ""
-   "sbcs\\t%<w>0, %<w>1, <w>zr"
-  [(set_attr "type" "adc_reg")]
-)
-
-(define_insn "*sub<mode>3_carryinV"
-  [(set (reg:CC_V CC_REGNUM)
-	(compare:CC_V
-	 (minus:<DWI>
-	  (sign_extend:<DWI>
-	    (match_operand:GPI 1 "register_operand" "r"))
-	  (plus:<DWI>
-	    (sign_extend:<DWI>
-	      (match_operand:GPI 2 "register_operand" "r"))
-	    (match_operand:<DWI> 3 "aarch64_borrow_operation" "")))
-	 (sign_extend:<DWI>
-	  (minus:GPI
-	   (match_dup 1)
-	   (plus:GPI (match_operand:GPI 4 "aarch64_borrow_operation" "")
-		     (match_dup 2))))))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(minus:GPI
-	  (minus:GPI (match_dup 1) (match_dup 2))
-	  (match_dup 4)))]
+   "#"
    ""
-   "sbcs\\t%<w>0, %<w>1, %<w>2"
-  [(set_attr "type" "adc_reg")]
+  [(scratch)]
+  {
+    emit_insn (gen_cmp<mode> (operands[0], operands[1]));
+    DONE;
+  }
 )
 
 (define_insn "*sub_uxt<mode>_shift2"
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..5f44ef7d672 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -68,6 +68,13 @@
        (ior (match_operand 0 "register_operand")
 	    (match_test "op == CONST0_RTX (GET_MODE (op))"))))
 
+(define_predicate "aarch64_reg_zero_minus1"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "register_operand")
+	    (ior (match_test "op == CONST0_RTX (GET_MODE (op))")
+	         (match_test "op == CONSTM1_RTX (GET_MODE (op))")))))
+
+
 (define_predicate "aarch64_reg_or_fp_zero"
   (ior (match_operand 0 "register_operand")
 	(and (match_code "const_double")
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 06/11] aarch64: Use UNSPEC_ADCS for add-with-carry + output flags
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (4 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 07/11] aarch64: Remove CC_ADCmode Richard Henderson
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

Similar to UNSPEC_SBCS, we can unify the signed/unsigned overflow
paths by using an unspec.

Accept -1 for the second input by using SBCS.

	* config/aarch64/aarch64.md (UNSPEC_ADCS): New.
	(addvti4, uaddvti4): Use adddi3_carryin_cmp.
	(add<GPI>3_carryinC): Remove.
	(*add<GPI>3_carryinC_zero): Remove.
	(*add<GPI>3_carryinC): Remove.
	(add<GPI>3_carryinV): Remove.
	(*add<GPI>3_carryinV_zero): Remove.
	(*add<GPI>3_carryinV): Remove.
	(add<GPI>3_carryin_cmp): New expander.
	(*add<GPI>3_carryin_cmp): New pattern.
	(*add<GPI>3_carryin_cmp_0): New pattern.
	(*cmn<GPI>3_carryin): New pattern.
	(*cmn<GPI>3_carryin_0): New pattern.
---
 gcc/config/aarch64/aarch64.md | 206 +++++++++++++++-------------------
 1 file changed, 89 insertions(+), 117 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 564dea390be..99023494fa1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -281,6 +281,7 @@
     UNSPEC_GEN_TAG_RND		; Generate a random 4-bit MTE tag.
     UNSPEC_TAG_SPACE		; Translate address to MTE tag address space.
     UNSPEC_LD1RO
+    UNSPEC_ADCS
     UNSPEC_SBCS
 ])
 
@@ -2062,7 +2063,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
 			   CODE_FOR_adddi3_compareV,
 			   CODE_FOR_adddi3_compareC,
-			   CODE_FOR_adddi3_carryinV);
+			   CODE_FOR_adddi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2077,7 +2078,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
 			   CODE_FOR_adddi3_compareC,
 			   CODE_FOR_adddi3_compareC,
-			   CODE_FOR_adddi3_carryinC);
+			   CODE_FOR_adddi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
   DONE;
 })
@@ -2579,133 +2580,104 @@
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "add<mode>3_carryinC"
+(define_expand "add<mode>3_carryin_cmp"
   [(parallel
-     [(set (match_dup 3)
-	   (compare:CC_ADC
-	     (plus:<DWI>
-	       (plus:<DWI>
-		 (match_dup 4)
-		 (zero_extend:<DWI>
-		   (match_operand:GPI 1 "register_operand")))
-	       (zero_extend:<DWI>
-		 (match_operand:GPI 2 "register_operand")))
-	     (match_dup 6)))
-      (set (match_operand:GPI 0 "register_operand")
-	   (plus:GPI
-	     (plus:GPI (match_dup 5) (match_dup 1))
-	     (match_dup 2)))])]
+    [(set (match_dup 3)
+	  (unspec:CC
+	    [(match_operand:GPI 1 "aarch64_reg_or_zero")
+	     (match_operand:GPI 2 "aarch64_reg_zero_minus1")
+	     (match_dup 4)]
+	    UNSPEC_ADCS))
+     (set (match_operand:GPI 0 "register_operand")
+	  (unspec:GPI
+	    [(match_dup 1) (match_dup 2) (match_dup 4)]
+	    UNSPEC_ADCS))])]
    ""
-{
-  operands[3] = gen_rtx_REG (CC_ADCmode, CC_REGNUM);
-  rtx ccin = gen_rtx_REG (CC_Cmode, CC_REGNUM);
-  operands[4] = gen_rtx_LTU (<DWI>mode, ccin, const0_rtx);
-  operands[5] = gen_rtx_LTU (<MODE>mode, ccin, const0_rtx);
-  operands[6] = immed_wide_int_const (wi::shwi (1, <DWI>mode)
-				      << GET_MODE_BITSIZE (<MODE>mode),
-				      TImode);
-})
+  {
+    operands[3] = gen_rtx_REG (CCmode, CC_REGNUM);
+    operands[4] = gen_rtx_GEU (<MODE>mode, operands[3], const0_rtx);
+  }
+)
 
-(define_insn "*add<mode>3_carryinC_zero"
-  [(set (reg:CC_ADC CC_REGNUM)
-	(compare:CC_ADC
-	  (plus:<DWI>
-	    (match_operand:<DWI> 2 "aarch64_carry_operation" "")
-	    (zero_extend:<DWI> (match_operand:GPI 1 "register_operand" "r")))
-	  (match_operand 4 "const_scalar_int_operand" "")))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(plus:GPI (match_operand:GPI 3 "aarch64_carry_operation" "")
-		  (match_dup 1)))]
-  "rtx_mode_t (operands[4], <DWI>mode)
-   == (wi::shwi (1, <DWI>mode) << (unsigned) GET_MODE_BITSIZE (<MODE>mode))"
-   "adcs\\t%<w>0, %<w>1, <w>zr"
+(define_insn "*add<mode>3_carryin_cmp"
+  [(set (reg:CC CC_REGNUM)
+	(unspec:CC
+	  [(match_operand:GPI 1 "aarch64_reg_or_zero" "%rZ,rZ")
+	   (match_operand:GPI 2 "aarch64_reg_zero_minus1" "rZ,UsM")
+	   (match_operand:GPI 3 "aarch64_carry_operation" "")]
+	  UNSPEC_ADCS))
+   (set (match_operand:GPI 0 "register_operand" "=r,r")
+	(unspec:GPI
+	  [(match_dup 1) (match_dup 2) (match_dup 3)]
+	  UNSPEC_ADCS))]
+   ""
+   "@
+    adcs\\t%<w>0, %<w>1, %<w>2
+    sbcs\\t%<w>0, %<w>1, <w>zr"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*add<mode>3_carryinC"
-  [(set (reg:CC_ADC CC_REGNUM)
-	(compare:CC_ADC
-	  (plus:<DWI>
-	    (plus:<DWI>
-	      (match_operand:<DWI> 3 "aarch64_carry_operation" "")
-	      (zero_extend:<DWI> (match_operand:GPI 1 "register_operand" "r")))
-	    (zero_extend:<DWI> (match_operand:GPI 2 "register_operand" "r")))
-	  (match_operand 5 "const_scalar_int_operand" "")))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(plus:GPI
-	  (plus:GPI (match_operand:GPI 4 "aarch64_carry_operation" "")
-		    (match_dup 1))
-	  (match_dup 2)))]
-  "rtx_mode_t (operands[5], <DWI>mode)
-   == (wi::shwi (1, <DWI>mode) << (unsigned) GET_MODE_BITSIZE (<MODE>mode))"
-   "adcs\\t%<w>0, %<w>1, %<w>2"
+(define_insn "*cmn<mode>3_carryin"
+  [(set (reg:CC CC_REGNUM)
+	(unspec:CC
+	  [(match_operand:GPI 0 "aarch64_reg_or_zero" "%rZ,rZ")
+	   (match_operand:GPI 1 "aarch64_reg_zero_minus1" "rZ,UsM")
+	   (match_operand:GPI 2 "aarch64_carry_operation" "")]
+	  UNSPEC_ADCS))]
+   ""
+   "@
+    adcs\\t<w>zr, %<w>0, %<w>1
+    sbcs\\t<w>zr, %<w>0, <w>zr"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "add<mode>3_carryinV"
-  [(parallel
-     [(set (reg:CC_V CC_REGNUM)
-	   (compare:CC_V
-	     (plus:<DWI>
-	       (plus:<DWI>
-		 (match_dup 3)
-		 (sign_extend:<DWI>
-		   (match_operand:GPI 1 "register_operand")))
-	       (sign_extend:<DWI>
-		 (match_operand:GPI 2 "register_operand")))
-	   (sign_extend:<DWI>
-	     (plus:GPI
-	       (plus:GPI (match_dup 4) (match_dup 1))
-	       (match_dup 2)))))
-      (set (match_operand:GPI 0 "register_operand")
-	   (plus:GPI
-	     (plus:GPI (match_dup 4) (match_dup 1))
-	     (match_dup 2)))])]
-   ""
-{
-  rtx cc = gen_rtx_REG (CC_Cmode, CC_REGNUM);
-  operands[3] = gen_rtx_LTU (<DWI>mode, cc, const0_rtx);
-  operands[4] = gen_rtx_LTU (<MODE>mode, cc, const0_rtx);
-})
-
-(define_insn "*add<mode>3_carryinV_zero"
-  [(set (reg:CC_V CC_REGNUM)
-	(compare:CC_V
-	  (plus:<DWI>
-	    (match_operand:<DWI> 2 "aarch64_carry_operation" "")
-	    (sign_extend:<DWI> (match_operand:GPI 1 "register_operand" "r")))
-	  (sign_extend:<DWI>
-	    (plus:GPI
-	      (match_operand:GPI 3 "aarch64_carry_operation" "")
-	      (match_dup 1)))))
+;; If combine can show that the carry is 0, fold ADCS to ADDS.
+(define_insn_and_split "*add<mode>3_carryin_cmp_0"
+  [(set (reg:CC CC_REGNUM)
+	(unspec:CC
+	  [(match_operand:GPI 1 "aarch64_reg_or_zero" "%rk")
+	   (match_operand:GPI 2 "aarch64_plus_immediate" "rIJ")
+	   (const_int 0)]
+	  UNSPEC_ADCS))
    (set (match_operand:GPI 0 "register_operand" "=r")
-	(plus:GPI (match_dup 3) (match_dup 1)))]
-   ""
-   "adcs\\t%<w>0, %<w>1, <w>zr"
-  [(set_attr "type" "adc_reg")]
+	(unspec:GPI
+	  [(match_dup 1) (match_dup 2) (const_int 0)]
+	  UNSPEC_ADCS))]
+  ""
+  "#"
+  ""
+  [(scratch)]
+  {
+    if (CONST_INT_P (operands[1]))
+      {
+	/* If operand2 is also constant, this must be before reload.
+	   Expanding this to an explicit plus of two constants would
+	   result in invalid rtl.  */
+	if (CONST_INT_P (operands[2]))
+	  FAIL;
+	std::swap (operands[1], operands[2]);
+      }
+    emit_insn (gen_add<mode>3_compare0 (operands[0], operands[1],
+					operands[2]));
+    DONE;
+  }
 )
 
-(define_insn "*add<mode>3_carryinV"
-  [(set (reg:CC_V CC_REGNUM)
-	(compare:CC_V
-	  (plus:<DWI>
-	    (plus:<DWI>
-	      (match_operand:<DWI> 3 "aarch64_carry_operation" "")
-	      (sign_extend:<DWI> (match_operand:GPI 1 "register_operand" "r")))
-	    (sign_extend:<DWI> (match_operand:GPI 2 "register_operand" "r")))
-	  (sign_extend:<DWI>
-	    (plus:GPI
-	      (plus:GPI
-		(match_operand:GPI 4 "aarch64_carry_operation" "")
-		(match_dup 1))
-	      (match_dup 2)))))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-	(plus:GPI
-	  (plus:GPI (match_dup 4) (match_dup 1))
-	  (match_dup 2)))]
-   ""
-   "adcs\\t%<w>0, %<w>1, %<w>2"
-  [(set_attr "type" "adc_reg")]
+;; ??? There's no one add*compare*cconly pattern that covers both C and V
+;; into which this can be split.  Leave it whole for now.
+(define_insn "*cmn<mode>3_carryin_0"
+  [(set (reg:CC CC_REGNUM)
+	(unspec:CC
+	  [(match_operand:GPI 0 "aarch64_reg_or_zero" "%rk,rk,rZ")
+	   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")
+	   (const_int 0)]
+	  UNSPEC_ADCS))]
+  ""
+  "@
+   cmn\\t%<w>0, %<w>1
+   cmp\\t%<w>0, #%n1
+   cmn\\t%<w>0, %<w>1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "*add_uxt<mode>_shift2"
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 07/11] aarch64: Remove CC_ADCmode
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (5 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 06/11] aarch64: Use UNSPEC_ADCS for add-with-carry " Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 08/11] aarch64: Accept -1 as second argument to add<mode>3_carryin Richard Henderson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

Now that we're using UNSPEC_ADCS instead of rtl, there's
no reason to distinguish CC_ADCmode from CC_Cmode.  Both
examine only the C bit.  Within uaddvti4, using CC_Cmode
is clearer, since it's the carry-out that's relevant.

	* config/aarch64/aarch64-modes.def (CC_ADC): Remove.
	* config/aarch64/aarch64.c (aarch64_select_cc_mode): Do not
	look for unsigned overflow from add with carry.
	(aarch64_get_condition_code_1): Remove E_CC_ADCmode case.
	* config/aarch64/aarch64.md (uaddvti4): Use CC_Cmode.
	* config/aarch64/predicates.md (aarch64_carry_operation):
	Remove check for CC_ADCmode.
	(aarch64_borrow_operation): Likewise.
---
 gcc/config/aarch64/aarch64.c         | 19 -------------------
 gcc/config/aarch64/aarch64-modes.def |  1 -
 gcc/config/aarch64/aarch64.md        |  2 +-
 gcc/config/aarch64/predicates.md     |  4 ++--
 4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6263897c9a0..8e54506bc3e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9094,16 +9094,6 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
       && rtx_equal_p (XEXP (x, 0), y))
     return CC_Cmode;
 
-  /* A test for unsigned overflow from an add with carry.  */
-  if ((mode_x == DImode || mode_x == TImode)
-      && (code == LTU || code == GEU)
-      && code_x == PLUS
-      && CONST_SCALAR_INT_P (y)
-      && (rtx_mode_t (y, mode_x)
-	  == (wi::shwi (1, mode_x)
-	      << (GET_MODE_BITSIZE (mode_x).to_constant () / 2))))
-    return CC_ADCmode;
-
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
       && code == NE
@@ -9232,15 +9222,6 @@ aarch64_get_condition_code_1 (machine_mode mode, enum rtx_code comp_code)
 	}
       break;
 
-    case E_CC_ADCmode:
-      switch (comp_code)
-	{
-	case GEU: return AARCH64_CS;
-	case LTU: return AARCH64_CC;
-	default: return -1;
-	}
-      break;
-
     case E_CC_Vmode:
       switch (comp_code)
 	{
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index af972e8f72b..32e4b6a35a9 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -38,7 +38,6 @@ CC_MODE (CC_NZC);   /* Only N, Z and C bits of condition flags are valid.
 CC_MODE (CC_NZ);    /* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z);     /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C);     /* C represents unsigned overflow of a simple addition.  */
-CC_MODE (CC_ADC);   /* Unsigned overflow from an ADC (add with carry).  */
 CC_MODE (CC_V);     /* Only V bit of condition flags is valid.  */
 
 /* Half-precision floating point for __fp16.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99023494fa1..8d405b40173 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2079,7 +2079,7 @@
 			   CODE_FOR_adddi3_compareC,
 			   CODE_FOR_adddi3_compareC,
 			   CODE_FOR_adddi3_carryin_cmp);
-  aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
+  aarch64_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
   DONE;
 })
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 5f44ef7d672..42864cbf4dd 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -388,7 +388,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
     return GET_CODE (op) == LTU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CCmode)
     return GET_CODE (op) == GEU;
   return false;
 })
@@ -406,7 +406,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
     return GET_CODE (op) == GEU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CCmode)
     return GET_CODE (op) == LTU;
   return false;
 })
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 08/11] aarch64: Accept -1 as second argument to add<mode>3_carryin
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (6 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 07/11] aarch64: Remove CC_ADCmode Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 09/11] aarch64: Adjust result of aarch64_gen_compare_reg Richard Henderson
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

	* config/aarch64/predicates.md (aarch64_reg_or_minus1): New.
	* config/aarch64/aarch64.md (add<GPI>3_carryin): Use it.
	(*add<GPI>3_carryin): Likewise.
	(*addsi3_carryin_uxtw): Likewise.
---
 gcc/config/aarch64/aarch64.md    | 26 +++++++++++++++-----------
 gcc/config/aarch64/predicates.md |  6 +++++-
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8d405b40173..c11c4366bf9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2545,7 +2545,7 @@
 	  (plus:GPI
 	    (ltu:GPI (reg:CC_C CC_REGNUM) (const_int 0))
 	    (match_operand:GPI 1 "aarch64_reg_or_zero"))
-	  (match_operand:GPI 2 "aarch64_reg_or_zero")))]
+	  (match_operand:GPI 2 "aarch64_reg_zero_minus1")))]
    ""
    ""
 )
@@ -2555,28 +2555,32 @@
 ;; accept the zeros during initial expansion.
 
 (define_insn "*add<mode>3_carryin"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
 	(plus:GPI
 	  (plus:GPI
 	    (match_operand:GPI 3 "aarch64_carry_operation" "")
-	    (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
-	  (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
-   ""
-   "adc\\t%<w>0, %<w>1, %<w>2"
+	    (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ,rZ"))
+	  (match_operand:GPI 2 "aarch64_reg_zero_minus1" "rZ,UsM")))]
+  ""
+  "@
+   adc\\t%<w>0, %<w>1, %<w>2
+   sbc\\t%<w>0, %<w>1, <w>zr"
   [(set_attr "type" "adc_reg")]
 )
 
 ;; zero_extend version of above
 (define_insn "*addsi3_carryin_uxtw"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI
 	  (plus:SI
 	    (plus:SI
 	      (match_operand:SI 3 "aarch64_carry_operation" "")
-	      (match_operand:SI 1 "register_operand" "r"))
-	    (match_operand:SI 2 "register_operand" "r"))))]
-   ""
-   "adc\\t%w0, %w1, %w2"
+	      (match_operand:SI 1 "register_operand" "r,r"))
+	    (match_operand:SI 2 "aarch64_reg_or_minus1" "r,UsM"))))]
+  ""
+  "@
+   adc\\t%w0, %w1, %w2
+   sbc\\t%w0, %w1, wzr"
   [(set_attr "type" "adc_reg")]
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 42864cbf4dd..2e7aa6389eb 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -68,13 +68,17 @@
        (ior (match_operand 0 "register_operand")
 	    (match_test "op == CONST0_RTX (GET_MODE (op))"))))
 
+(define_predicate "aarch64_reg_or_minus1"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "register_operand")
+	    (match_test "op == CONSTM1_RTX (GET_MODE (op))"))))
+
 (define_predicate "aarch64_reg_zero_minus1"
   (and (match_code "reg,subreg,const_int")
        (ior (match_operand 0 "register_operand")
 	    (ior (match_test "op == CONST0_RTX (GET_MODE (op))")
 	         (match_test "op == CONSTM1_RTX (GET_MODE (op))")))))
 
-
 (define_predicate "aarch64_reg_or_fp_zero"
   (ior (match_operand 0 "register_operand")
 	(and (match_code "const_double")
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 09/11] aarch64: Adjust result of aarch64_gen_compare_reg
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (7 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 08/11] aarch64: Accept -1 as second argument to add<mode>3_carryin Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 10/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.

	* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
	the final comparison for code & cc_reg.
	(aarch64_gen_compare_reg_maybe_ze): Likewise.
	(aarch64_expand_compare_and_swap): Update to match -- do not
	build the final comparison here, but PUT_MODE as necessary.
	(aarch64_split_compare_and_swap): Use prebuilt comparison.
	* config/aarch64/aarch64-simd.md (aarch64_cm<COMPARISONS>di): Likewise.
	(aarch64_cm<UCOMPARISONS>di): Likewise.
	(aarch64_cmtstdi): Likewise.
	* config/aarch64/aarch64-speculation.cc
	(aarch64_speculation_establish_tracker): Likewise.
	* config/aarch64/aarch64.md (cbranch<GPI>4, cbranch<GPF>4): Likewise.
	(mod<GPI>3, abs<GPI>2): Likewise.
	(cstore<GPI>4, cstore<GPF>4): Likewise.
	(cmov<GPI>6, cmov<GPF>6): Likewise.
	(mov<ALLI>cc, mov<GPF><GPI>cc, mov<GPF>cc): Likewise.
	(<NEG_NOT><GPI>cc): Likewise.
	(ffs<GPI>2): Likewise.
	(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c              | 26 +++---
 gcc/config/aarch64/aarch64-simd.md        | 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md             | 96 ++++++++++-------------
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8e54506bc3e..93658338041 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
       cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
       emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
     }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, rtx y,
 	  cc_mode = CC_SWPmode;
 	  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
 	  emit_set_insn (cc_reg, t);
-	  return cc_reg;
+	  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 	}
     }
 
@@ -18487,7 +18487,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
       emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
 						   newval, mod_s));
-      cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+      x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+      PUT_MODE (x, SImode);
     }
   else if (TARGET_OUTLINE_ATOMICS)
     {
@@ -18498,7 +18499,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
       rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
 				      oldval, mode, newval, mode,
 				      XEXP (mem, 0), Pmode);
-      cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+      x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+      PUT_MODE (x, SImode);
     }
   else
     {
@@ -18510,13 +18512,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
       emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 				 is_weak, mod_s, mod_f));
       cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+      x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
     }
 
   if (r_mode != mode)
     rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18591,10 +18593,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
     x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-    {
-      rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-      x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-    }
+    x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			    gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18607,8 +18607,7 @@ aarch64_split_compare_and_swap (rtx operands[])
 	{
 	  /* Emit an explicit compare instruction, so that we can correctly
 	     track the condition codes.  */
-	  rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
-	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	  x = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
 	}
       else
 	x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
@@ -18703,8 +18702,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     {
       /* Emit an explicit compare instruction, so that we can correctly
 	 track the condition codes.  */
-      rtx cc_reg = aarch64_gen_compare_reg (NE, cond, const0_rtx);
-      x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+      x = aarch64_gen_compare_reg (NE, cond, const0_rtx);
     }
   else
     x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 24a11fb5040..69e099a2c23 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4800,10 +4800,8 @@
     if (GP_REGNUM_P (REGNO (operands[0]))
 	&& GP_REGNUM_P (REGNO (operands[1])))
       {
-	machine_mode mode = SELECT_CC_MODE (<CMP>, operands[1], operands[2]);
-	rtx cc_reg = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
-	rtx comparison = gen_rtx_<CMP> (mode, operands[1], operands[2]);
-	emit_insn (gen_cstoredi_neg (operands[0], comparison, cc_reg));
+	rtx cmp = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
+	emit_insn (gen_cstoredi_neg (operands[0], cmp, XEXP (cmp, 0)));
 	DONE;
       }
     /* Otherwise, we expand to a similar pattern which does not
@@ -4863,10 +4861,8 @@
     if (GP_REGNUM_P (REGNO (operands[0]))
 	&& GP_REGNUM_P (REGNO (operands[1])))
       {
-	machine_mode mode = CCmode;
-	rtx cc_reg = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
-	rtx comparison = gen_rtx_<CMP> (mode, operands[1], operands[2]);
-	emit_insn (gen_cstoredi_neg (operands[0], comparison, cc_reg));
+	rtx cmp = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
+	emit_insn (gen_cstoredi_neg (operands[0], cmp, XEXP (cmp, 0)));
 	DONE;
       }
     /* Otherwise, we expand to a similar pattern which does not
@@ -4936,10 +4932,8 @@
 	&& GP_REGNUM_P (REGNO (operands[1])))
       {
 	rtx and_tree = gen_rtx_AND (DImode, operands[1], operands[2]);
-	machine_mode mode = SELECT_CC_MODE (NE, and_tree, const0_rtx);
-	rtx cc_reg = aarch64_gen_compare_reg (NE, and_tree, const0_rtx);
-	rtx comparison = gen_rtx_NE (mode, and_tree, const0_rtx);
-	emit_insn (gen_cstoredi_neg (operands[0], comparison, cc_reg));
+	rtx cmp = aarch64_gen_compare_reg (NE, and_tree, const0_rtx);
+	emit_insn (gen_cstoredi_neg (operands[0], cmp, XEXP (cmp, 0)));
 	DONE;
       }
     /* Otherwise, we expand to a similar pattern which does not
diff --git a/gcc/config/aarch64/aarch64-speculation.cc b/gcc/config/aarch64/aarch64-speculation.cc
index f490b64ae61..87d5964871b 100644
--- a/gcc/config/aarch64/aarch64-speculation.cc
+++ b/gcc/config/aarch64/aarch64-speculation.cc
@@ -162,9 +162,8 @@ aarch64_speculation_establish_tracker ()
   rtx sp = gen_rtx_REG (DImode, SP_REGNUM);
   rtx tracker = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
   start_sequence ();
-  rtx cc = aarch64_gen_compare_reg (EQ, sp, const0_rtx);
-  emit_insn (gen_cstoredi_neg (tracker,
-			       gen_rtx_NE (CCmode, cc, const0_rtx), cc));
+  rtx x = aarch64_gen_compare_reg (NE, sp, const0_rtx);
+  emit_insn (gen_cstoredi_neg (tracker, x, XEXP (x, 0)));
   rtx_insn *seq = get_insns ();
   end_sequence ();
   return seq;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c11c4366bf9..dbaeb7c251c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -466,12 +466,12 @@
 			   (label_ref (match_operand 3 "" ""))
 			   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+{
+  operands[0] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 					 operands[2]);
+  operands[1] = XEXP (operands[0], 0);
   operands[2] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cbranch<mode>4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
@@ -480,12 +480,12 @@
 			   (label_ref (match_operand 3 "" ""))
 			   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+{
+  operands[0] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 					 operands[2]);
+  operands[1] = XEXP (operands[0], 0);
   operands[2] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cbranchcc4"
   [(set (pc) (if_then_else
@@ -600,9 +600,8 @@
     if (val == 2)
       {
 	rtx masked = gen_reg_rtx (<MODE>mode);
-	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+	rtx x = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
 	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
-	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
 	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
 	DONE;
       }
@@ -3502,8 +3501,7 @@
    (match_operand:GPI 1 "register_operand")]
   ""
   {
-    rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
-    rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+    rtx x = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
     emit_insn (gen_csneg3<mode>_insn (operands[0], x, operands[1], operands[1]));
     DONE;
   }
@@ -3917,12 +3915,13 @@
 	 [(match_operand:GPI 2 "register_operand")
 	  (match_operand:GPI 3 "aarch64_plus_operand")]))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  PUT_MODE (operands[1], SImode);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cstorecc4"
   [(set (match_operand:SI 0 "register_operand")
@@ -3930,11 +3929,10 @@
 	[(match_operand 2 "cc_register")
          (match_operand 3 "const0_operand")]))]
   ""
-"{
+{
   emit_insn (gen_rtx_SET (operands[0], operands[1]));
   DONE;
-}")
-
+})
 
 (define_expand "cstore<mode>4"
   [(set (match_operand:SI 0 "register_operand")
@@ -3942,12 +3940,13 @@
 	 [(match_operand:GPF 2 "register_operand")
 	  (match_operand:GPF 3 "aarch64_fp_compare_operand")]))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  PUT_MODE (operands[1], SImode);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_insn "aarch64_cstore<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
@@ -4033,12 +4032,12 @@
 	 (match_operand:GPI 4 "register_operand")
 	 (match_operand:GPI 5 "register_operand")))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cmov<mode>6"
   [(set (match_operand:GPF 0 "register_operand")
@@ -4049,12 +4048,12 @@
 	 (match_operand:GPF 4 "register_operand")
 	 (match_operand:GPF 5 "register_operand")))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_insn "*cmov<mode>_insn"
   [(set (match_operand:ALLI 0 "register_operand" "=r,r,r,r,r,r,r")
@@ -4131,15 +4130,13 @@
 			   (match_operand:ALLI 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				     XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4150,15 +4147,13 @@
 			  (match_operand:GPF 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				  XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4169,15 +4164,13 @@
 			  (match_operand:GPF 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				  XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4188,15 +4181,13 @@
 			  (match_operand:GPI 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				      XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4705,8 +4696,7 @@
    (match_operand:GPI 1 "register_operand")]
   ""
   {
-    rtx ccreg = aarch64_gen_compare_reg (EQ, operands[1], const0_rtx);
-    rtx x = gen_rtx_NE (VOIDmode, ccreg, const0_rtx);
+    rtx x = aarch64_gen_compare_reg (NE, operands[1], const0_rtx);
 
     emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
     emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 10/11] aarch64: Implement TImode comparisons
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (8 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 09/11] aarch64: Adjust result of aarch64_gen_compare_reg Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:53 ` [PATCH v2 11/11] aarch64: Implement absti2 Richard Henderson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

Use ccmp to perform all TImode comparisons branchless.
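As an illustration (not part of the patch), the kind of C source this lowering targets: a signed 128-bit comparison that, with this series, GCC for aarch64 can expand to a cmp of the low halves followed by a conditional compare (ccmp) or carry-in compare of the high halves, with no branch. The function name below is made up for the example.

```c
#include <assert.h>

/* Illustrative only: with this series, the 128-bit compare below is
   expected to expand branchlessly (cmp + ccmp/sbcs + cset) on aarch64,
   rather than via separate compares and a conditional branch.  */
static int
lt_ti (__int128 a, __int128 b)
{
  return a < b;
}
```

(Requires a compiler with `__int128` support, i.e. GCC or Clang on a 64-bit target.)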

	* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
	the comparisons for TImode, not just NE.
	* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
---
 gcc/config/aarch64/aarch64.c  | 122 ++++++++++++++++++++++++++++++----
 gcc/config/aarch64/aarch64.md |  28 ++++++++
 2 files changed, 136 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93658338041..89c9192266c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2333,32 +2333,126 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
     {
-      gcc_assert (code == NE);
-
-      cc_mode = CCmode;
-      cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
       rtx x_lo = operand_subword (x, 0, 0, TImode);
-      rtx y_lo = operand_subword (y, 0, 0, TImode);
-      emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
       rtx x_hi = operand_subword (x, 1, 0, TImode);
-      rtx y_hi = operand_subword (y, 1, 0, TImode);
-      emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-			       gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-			       GEN_INT (AARCH64_EQ)));
+      struct expand_operand ops[2];
+      rtx y_lo, y_hi, tmp;
+
+      if (CONST_INT_P (y))
+	{
+	  HOST_WIDE_INT y_int = INTVAL (y);
+
+	  y_lo = y;
+	  switch (code)
+	    {
+	    case EQ:
+	    case NE:
+	      /* For equality, IOR the two halves together.  If this gets
+		 used for a branch, we expect this to fold to cbz/cbnz;
+		 otherwise it's no larger than cmp+ccmp below.  Beware of
+		 the compare-and-swap post-reload split and use cmp+ccmp.  */
+	      if (y_int == 0 && can_create_pseudo_p ())
+		{
+		  tmp = gen_reg_rtx (DImode);
+		  emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+		  emit_insn (gen_cmpdi (tmp, const0_rtx));
+		  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+		  goto done;
+		}
+		break;
+
+	    case LE:
+	    case GT:
+	      /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+		 keeps the constant operand.  The cstoreti and cbranchti
+		 operand predicates require aarch64_plus_operand, which
+		 means this increment cannot overflow.  */
+	      y_lo = gen_int_mode (++y_int, DImode);
+	      code = (code == LE ? LT : GE);
+	      /* fall through */
+
+	    case LT:
+	    case GE:
+	      /* Check only the sign bit using tst, or fold to tbz/tbnz.  */
+	      if (y_int == 0)
+		{
+		  cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+		  tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+		  tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+		  emit_set_insn (cc_reg, tmp);
+		  code = (code == LT ? NE : EQ);
+		  goto done;
+		}
+	      break;
+
+	    default:
+	      break;
+	    }
+	  y_hi = (y_int < 0 ? constm1_rtx : const0_rtx);
+	}
+      else
+	{
+	  y_lo = operand_subword (y, 0, 0, TImode);
+	  y_hi = operand_subword (y, 1, 0, TImode);
+	}
+
+      switch (code)
+	{
+	case LEU:
+	case GTU:
+	case LE:
+	case GT:
+	  std::swap (x_lo, y_lo);
+	  std::swap (x_hi, y_hi);
+	  code = swap_condition (code);
+	  break;
+
+	default:
+	  break;
+	}
+
+      /* Emit cmpdi, forcing operands into registers as required. */
+      create_input_operand (&ops[0], x_lo, DImode);
+      create_input_operand (&ops[1], y_lo, DImode);
+      expand_insn (CODE_FOR_cmpdi, 2, ops);
+
+      cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+      switch (code)
+	{
+	case EQ:
+	case NE:
+	  /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+	  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+				   gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+				   GEN_INT (AARCH64_EQ)));
+	  break;
+
+	case LTU:
+	case GEU:
+	case LT:
+	case GE:
+	  /* Compute (x - y), as double-word arithmetic.  */
+	  create_input_operand (&ops[0], x_hi, DImode);
+	  create_input_operand (&ops[1], y_hi, DImode);
+	  expand_insn (CODE_FOR_cmpdi3_carryin, 2, ops);
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
     }
   else
     {
-      cc_mode = SELECT_CC_MODE (code, x, y);
+      machine_mode cc_mode = SELECT_CC_MODE (code, x, y);
       cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
       emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
     }
+
+ done:
   return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index dbaeb7c251c..cf716f815a1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -473,6 +473,20 @@
   operands[2] = const0_rtx;
 })
 
+(define_expand "cbranchti4"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+			    [(match_operand:TI 1 "register_operand")
+			     (match_operand:TI 2 "aarch64_plus_operand")])
+			   (label_ref (match_operand 3 "" ""))
+			   (pc)))]
+  ""
+{
+  operands[0] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+					 operands[2]);
+  operands[1] = XEXP (operands[0], 0);
+  operands[2] = const0_rtx;
+})
+
 (define_expand "cbranch<mode>4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand:GPF 1 "register_operand")
@@ -3923,6 +3937,20 @@
   operands[3] = const0_rtx;
 })
 
+(define_expand "cstoreti4"
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operator:SI 1 "aarch64_comparison_operator"
+	 [(match_operand:TI 2 "register_operand")
+	  (match_operand:TI 3 "aarch64_plus_operand")]))]
+  ""
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  PUT_MODE (operands[1], SImode);
+  operands[2] = XEXP (operands[1], 0);
+  operands[3] = const0_rtx;
+})
+
 (define_expand "cstorecc4"
   [(set (match_operand:SI 0 "register_operand")
        (match_operator 1 "aarch64_comparison_operator_mode"
-- 
2.20.1



* [PATCH v2 11/11] aarch64: Implement absti2
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (9 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 10/11] aarch64: Implement TImode comparisons Richard Henderson
@ 2020-04-02 18:53 ` Richard Henderson
  2020-04-02 18:55 ` [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
  2020-04-03 11:34 ` Richard Earnshaw (lists)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.sandiford, segher, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

	* config/aarch64/aarch64.md (absti2): New.
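For illustration (not part of the patch), the C-level operation the new absti2 expander serves: a 128-bit absolute value, which the expander emits as a branchless negate-with-borrow followed by two conditional selects. The function name below is made up for the example; the cast to unsigned avoids undefined behavior when negating the most negative value.

```c
#include <assert.h>

/* Illustrative only: a 128-bit absolute value.  With the absti2
   expander, GCC can emit this as negs/sbcs to compute the negation
   and set flags, then csel the halves, with no branch.  */
static unsigned __int128
abs_ti (__int128 x)
{
  return x < 0 ? -(unsigned __int128) x : (unsigned __int128) x;
}
```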
---
 gcc/config/aarch64/aarch64.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index cf716f815a1..4a30d4cca93 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3521,6 +3521,35 @@
   }
 )
 
+(define_expand "absti2"
+  [(match_operand:TI 0 "register_operand")
+   (match_operand:TI 1 "register_operand")]
+  ""
+  {
+    rtx lo_op1 = gen_lowpart (DImode, operands[1]);
+    rtx hi_op1 = gen_highpart (DImode, operands[1]);
+    rtx lo_tmp = gen_reg_rtx (DImode);
+    rtx hi_tmp = gen_reg_rtx (DImode);
+    rtx x, cc;
+
+    emit_insn (gen_negdi_carryout (lo_tmp, lo_op1));
+    emit_insn (gen_subdi3_carryin_cmp (hi_tmp, const0_rtx, hi_op1));
+
+    cc = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+    x = gen_rtx_IF_THEN_ELSE (DImode, x, lo_tmp, lo_op1);
+    emit_insn (gen_rtx_SET (lo_tmp, x));
+
+    x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+    x = gen_rtx_IF_THEN_ELSE (DImode, x, hi_tmp, hi_op1);
+    emit_insn (gen_rtx_SET (hi_tmp, x));
+
+    emit_move_insn (gen_lowpart (DImode, operands[0]), lo_tmp);
+    emit_move_insn (gen_highpart (DImode, operands[0]), hi_tmp);
+    DONE;
+  }
+)
+
 (define_insn "neg<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r,w")
 	(neg:GPI (match_operand:GPI 1 "register_operand" "r,w")))]
-- 
2.20.1



* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (10 preceding siblings ...)
  2020-04-02 18:53 ` [PATCH v2 11/11] aarch64: Implement absti2 Richard Henderson
@ 2020-04-02 18:55 ` Richard Henderson
  2020-04-03 11:34 ` Richard Earnshaw (lists)
  12 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-02 18:55 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: richard.earnshaw, segher, Wilco.Dijkstra, marcus.shawcroft

On 4/2/20 11:53 AM, Richard Henderson via Gcc-patches wrote:
> This is attacking case 3 of PR 94174.
> 
> In v2, I unify the various subtract-with-borrow and add-with-carry
> patterns that also output flags with unspecs.  As suggested by
> Richard Sandiford during review of v1.  It does seem cleaner.

Hmph.  I miscounted -- this is actually v3.  :-P


r~


* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
                   ` (11 preceding siblings ...)
  2020-04-02 18:55 ` [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
@ 2020-04-03 11:34 ` Richard Earnshaw (lists)
  2020-04-03 12:27   ` Richard Sandiford
  12 siblings, 1 reply; 33+ messages in thread
From: Richard Earnshaw (lists) @ 2020-04-03 11:34 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches; +Cc: segher, Wilco.Dijkstra, marcus.shawcroft

On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
> This is attacking case 3 of PR 94174.
> 
> In v2, I unify the various subtract-with-borrow and add-with-carry
> patterns that also output flags with unspecs.  As suggested by
> Richard Sandiford during review of v1.  It does seem cleaner.
> 

Really?  I didn't need to use any unspecs for the Arm version of this.

R.

> 
> r~
> 
> 
> Richard Henderson (11):
>   aarch64: Accept 0 as first argument to compares
>   aarch64: Accept zeros in add<GPI>3_carryin
>   aarch64: Provide expander for sub<GPI>3_compare1
>   aarch64: Introduce aarch64_expand_addsubti
>   aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags
>   aarch64: Use UNSPEC_ADCS for add-with-carry + output flags
>   aarch64: Remove CC_ADCmode
>   aarch64: Accept -1 as second argument to add<mode>3_carryin
>   aarch64: Adjust result of aarch64_gen_compare_reg
>   aarch64: Implement TImode comparisons
>   aarch64: Implement absti2
> 
>  gcc/config/aarch64/aarch64-protos.h       |  10 +-
>  gcc/config/aarch64/aarch64.c              | 303 +++++----
>  gcc/config/aarch64/aarch64-modes.def      |   1 -
>  gcc/config/aarch64/aarch64-simd.md        |  18 +-
>  gcc/config/aarch64/aarch64-speculation.cc |   5 +-
>  gcc/config/aarch64/aarch64.md             | 762 ++++++++++------------
>  gcc/config/aarch64/predicates.md          |  15 +-
>  7 files changed, 527 insertions(+), 587 deletions(-)
> 



* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-03 11:34 ` Richard Earnshaw (lists)
@ 2020-04-03 12:27   ` Richard Sandiford
  2020-04-03 13:17     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Sandiford @ 2020-04-03 12:27 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Richard Henderson, gcc-patches, marcus.shawcroft, segher, Wilco.Dijkstra

"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>> This is attacking case 3 of PR 94174.
>> 
>> In v2, I unify the various subtract-with-borrow and add-with-carry
>> patterns that also output flags with unspecs.  As suggested by
>> Richard Sandiford during review of v1.  It does seem cleaner.
>> 
>
> Really?  I didn't need to use any unspecs for the Arm version of this.
>
> R.

See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
(including quoted context) for how we got here.

The same problem affects the existing aarch64 patterns like
*usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
the compare:CC doesn't seem to be correct.

Richard


* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-03 12:27   ` Richard Sandiford
@ 2020-04-03 13:17     ` Richard Earnshaw (lists)
  2020-04-03 15:03       ` Richard Sandiford
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Earnshaw (lists) @ 2020-04-03 13:17 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches, marcus.shawcroft, segher,
	Wilco.Dijkstra, richard.sandiford

On 03/04/2020 13:27, Richard Sandiford wrote:
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>>> This is attacking case 3 of PR 94174.
>>>
>>> In v2, I unify the various subtract-with-borrow and add-with-carry
>>> patterns that also output flags with unspecs.  As suggested by
>>> Richard Sandiford during review of v1.  It does seem cleaner.
>>>
>>
>> Really?  I didn't need to use any unspecs for the Arm version of this.
>>
>> R.
> 
> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
> (including quoted context) for how we got here.
> 
> The same problem affects the existing aarch64 patterns like
> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
> the compare:CC doesn't seem to be correct.
> 
> Richard
> 

But I don't think you can use ANY_EXTEND in these comparisons.  It
doesn't describe what the instruction does, since the instruction does
not really extend the values first.

I would really expect this patch series to be pretty much a dual of this
series that I posted last year for Arm.

https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html

R.


* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-03 13:17     ` Richard Earnshaw (lists)
@ 2020-04-03 15:03       ` Richard Sandiford
  2020-04-06  9:27         ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Sandiford @ 2020-04-03 15:03 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Richard Henderson, gcc-patches, marcus.shawcroft, segher, Wilco.Dijkstra

"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> On 03/04/2020 13:27, Richard Sandiford wrote:
>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>>>> This is attacking case 3 of PR 94174.
>>>>
>>>> In v2, I unify the various subtract-with-borrow and add-with-carry
>>>> patterns that also output flags with unspecs.  As suggested by
>>>> Richard Sandiford during review of v1.  It does seem cleaner.
>>>>
>>>
>>> Really?  I didn't need to use any unspecs for the Arm version of this.
>>>
>>> R.
>> 
>> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
>> (including quoted context) for how we got here.
>> 
>> The same problem affects the existing aarch64 patterns like
>> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
>> the compare:CC doesn't seem to be correct.
>> 
>> Richard
>> 
>
> But I don't think you can use ANY_EXTEND in these comparisons.  It
> doesn't describe what the instruction does, since the instruction does
> not really extend the values first.

Yeah, that was the starting point in the thread above too.  And using
zero_extend in the existing *usub<GPI:mode>3_carryinC pattern:

(define_insn "*usub<GPI:mode>3_carryinC"
  [(set (reg:CC CC_REGNUM)
  	(compare:CC
	  (zero_extend:<DWI>
	    (match_operand:GPI 1 "register_operand" "r"))
	  (plus:<DWI>
	    (zero_extend:<DWI>
	      (match_operand:GPI 2 "register_operand" "r"))
	    (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
   (set (match_operand:GPI 0 "register_operand" "=r")
	(minus:GPI
	  (minus:GPI (match_dup 1) (match_dup 2))
	  (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
   ""
   "sbcs\\t%<w>0, %<w>1, %<w>2"
  [(set_attr "type" "adc_reg")]
)

looks wrong for the same reason.  But the main problem IMO isn't how the
inputs to the compare:CC are represented, but that we're using compare:CC
at all.  Using compare doesn't accurately model the effect of SBCS on NZCV
for all inputs, so if we're going to use a compare here, it can't be :CC.

> I would really expect this patch series to be pretty much a dual of this
> series that I posted last year for Arm.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html

That series uses compares with modes like CC_V and CC_B, so I think
you're saying that given the choice in the earlier thread between adding
a new CC mode or using unspecs, you would have preferred a new CC mode,
is that right?

Richard


* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-03 15:03       ` Richard Sandiford
@ 2020-04-06  9:27         ` Richard Earnshaw (lists)
  2020-04-06 11:19           ` Richard Sandiford
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Earnshaw (lists) @ 2020-04-06  9:27 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches, marcus.shawcroft, segher,
	Wilco.Dijkstra, richard.sandiford

On 03/04/2020 16:03, Richard Sandiford wrote:
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>> On 03/04/2020 13:27, Richard Sandiford wrote:
>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>>>>> This is attacking case 3 of PR 94174.
>>>>>
>>>>> In v2, I unify the various subtract-with-borrow and add-with-carry
>>>>> patterns that also output flags with unspecs.  As suggested by
>>>>> Richard Sandiford during review of v1.  It does seem cleaner.
>>>>>
>>>>
>>>> Really?  I didn't need to use any unspecs for the Arm version of this.
>>>>
>>>> R.
>>>
>>> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
>>> (including quoted context) for how we got here.
>>>
>>> The same problem affects the existing aarch64 patterns like
>>> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
>>> the compare:CC doesn't seem to be correct.
>>>
>>> Richard
>>>
>>
>> But I don't think you can use ANY_EXTEND in these comparisons.  It
>> doesn't describe what the instruction does, since the instruction does
>> not really extend the values first.
> 
> Yeah, that was the starting point in the thread above too.  And using
> zero_extend in the existing *usub<GPI:mode>3_carryinC pattern:
> 
> (define_insn "*usub<GPI:mode>3_carryinC"
>   [(set (reg:CC CC_REGNUM)
>   	(compare:CC
> 	  (zero_extend:<DWI>
> 	    (match_operand:GPI 1 "register_operand" "r"))
> 	  (plus:<DWI>
> 	    (zero_extend:<DWI>
> 	      (match_operand:GPI 2 "register_operand" "r"))
> 	    (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
>    (set (match_operand:GPI 0 "register_operand" "=r")
> 	(minus:GPI
> 	  (minus:GPI (match_dup 1) (match_dup 2))
> 	  (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
>    ""
>    "sbcs\\t%<w>0, %<w>1, %<w>2"
>   [(set_attr "type" "adc_reg")]
> )
> 
> looks wrong for the same reason.  But the main problem IMO isn't how the
> inputs to the compare:CC are represented, but that we're using compare:CC
> at all.  Using compare doesn't accurately model the effect of SBCS on NZCV
> for all inputs, so if we're going to use a compare here, it can't be :CC.
> 
>> I would really expect this patch series to be pretty much a dual of this
>> series that I posted last year for Arm.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html
> 
> That series uses compares with modes like CC_V and CC_B, so I think
> you're saying that given the choice in the earlier thread between adding
> a new CC mode or using unspecs, you would have preferred a new CC mode,
> is that right?
> 

Yes.  It surprised me, when doing the aarch32 version, just how often
the mid-end parts of the compiler were able to reason about parts of the
parallel insn and optimize things accordingly (e.g. propagating the truth
of the comparison).  If you use an unspec, that can never happen.


R.

> Richard
> 



* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-06  9:27         ` Richard Earnshaw (lists)
@ 2020-04-06 11:19           ` Richard Sandiford
  2020-04-06 12:22             ` Richard Biener
                               ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Richard Sandiford @ 2020-04-06 11:19 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Richard Henderson, gcc-patches, marcus.shawcroft, segher, Wilco.Dijkstra

"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> On 03/04/2020 16:03, Richard Sandiford wrote:
>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>> On 03/04/2020 13:27, Richard Sandiford wrote:
>>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>>>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>>>>>> This is attacking case 3 of PR 94174.
>>>>>>
>>>>>> In v2, I unify the various subtract-with-borrow and add-with-carry
>>>>>> patterns that also output flags with unspecs.  As suggested by
>>>>>> Richard Sandiford during review of v1.  It does seem cleaner.
>>>>>>
>>>>>
>>>>> Really?  I didn't need to use any unspecs for the Arm version of this.
>>>>>
>>>>> R.
>>>>
>>>> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
>>>> (including quoted context) for how we got here.
>>>>
>>>> The same problem affects the existing aarch64 patterns like
>>>> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
>>>> the compare:CC doesn't seem to be correct.
>>>>
>>>> Richard
>>>>
>>>
>>> But I don't think you can use ANY_EXTEND in these comparisons.  It
>>> doesn't describe what the instruction does, since the instruction does
>>> not really extend the values first.
>> 
>> Yeah, that was the starting point in the thread above too.  And using
>> zero_extend in the existing *usub<GPI:mode>3_carryinC pattern:
>> 
>> (define_insn "*usub<GPI:mode>3_carryinC"
>>   [(set (reg:CC CC_REGNUM)
>>   	(compare:CC
>> 	  (zero_extend:<DWI>
>> 	    (match_operand:GPI 1 "register_operand" "r"))
>> 	  (plus:<DWI>
>> 	    (zero_extend:<DWI>
>> 	      (match_operand:GPI 2 "register_operand" "r"))
>> 	    (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
>>    (set (match_operand:GPI 0 "register_operand" "=r")
>> 	(minus:GPI
>> 	  (minus:GPI (match_dup 1) (match_dup 2))
>> 	  (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
>>    ""
>>    "sbcs\\t%<w>0, %<w>1, %<w>2"
>>   [(set_attr "type" "adc_reg")]
>> )
>> 
>> looks wrong for the same reason.  But the main problem IMO isn't how the
>> inputs to the compare:CC are represented, but that we're using compare:CC
>> at all.  Using compare doesn't accurately model the effect of SBCS on NZCV
>> for all inputs, so if we're going to use a compare here, it can't be :CC.
>> 
>>> I would really expect this patch series to be pretty much a dual of this
>>> series that I posted last year for Arm.
>>>
>>> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html
>> 
>> That series uses compares with modes like CC_V and CC_B, so I think
>> you're saying that given the choice in the earlier thread between adding
>> a new CC mode or using unspecs, you would have preferred a new CC mode,
>> is that right?
>> 
>
> Yes.  It surprised me, when doing the aarch32 version, just how often
> the mid-end parts of the compiler were able to reason about parts of the
> parallel insn and optimize things accordingly (e.g. propagating the truth
> of the comparison).  If you use an unspec, that can never happen.

That could be changed though.  E.g. we could add something like a
simplify_unspec target hook if this becomes a problem (either here
or for other unspecs).  A fancy implementation could even use
match.pd-style rules in the .md file.

The reason I'm not keen on using special modes for this case is that
they'd describe one way in which the result can be used rather than
describing what the instruction actually does.  The instruction really
does set all four flags to useful values.  The "problem" is that they're
not the values associated with a compare between two values, so representing
them that way will always lose information.

Thanks,
Richard


* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-06 11:19           ` Richard Sandiford
@ 2020-04-06 12:22             ` Richard Biener
  2020-04-08  9:10               ` Richard Sandiford
  2020-04-07  9:52             ` Richard Earnshaw (lists)
  2020-04-07 20:27             ` Segher Boessenkool
  2 siblings, 1 reply; 33+ messages in thread
From: Richard Biener @ 2020-04-06 12:22 UTC (permalink / raw)
  To: Richard Earnshaw (lists),
	Richard Henderson, GCC Patches, Marcus Shawcroft,
	Segher Boessenkool, Wilco Dijkstra, Richard Sandiford

On Mon, Apr 6, 2020 at 1:20 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> > On 03/04/2020 16:03, Richard Sandiford wrote:
> >> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> >>> On 03/04/2020 13:27, Richard Sandiford wrote:
> >>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> >>>>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
> >>>>>> This is attacking case 3 of PR 94174.
> >>>>>>
> >>>>>> In v2, I unify the various subtract-with-borrow and add-with-carry
> >>>>>> patterns that also output flags with unspecs.  As suggested by
> >>>>>> Richard Sandiford during review of v1.  It does seem cleaner.
> >>>>>>
> >>>>>
> >>>>> Really?  I didn't need to use any unspecs for the Arm version of this.
> >>>>>
> >>>>> R.
> >>>>
> >>>> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
> >>>> (including quoted context) for how we got here.
> >>>>
> >>>> The same problem affects the existing aarch64 patterns like
> >>>> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
> >>>> the compare:CC doesn't seem to be correct.
> >>>>
> >>>> Richard
> >>>>
> >>>
> >>> But I don't think you can use ANY_EXTEND in these comparisons.  It
> >>> doesn't describe what the instruction does, since the instruction does
> >>> not really extend the values first.
> >>
> >> Yeah, that was the starting point in the thread above too.  And using
> >> zero_extend in the existing *usub<GPI:mode>3_carryinC pattern:
> >>
> >> (define_insn "*usub<GPI:mode>3_carryinC"
> >>   [(set (reg:CC CC_REGNUM)
> >>      (compare:CC
> >>        (zero_extend:<DWI>
> >>          (match_operand:GPI 1 "register_operand" "r"))
> >>        (plus:<DWI>
> >>          (zero_extend:<DWI>
> >>            (match_operand:GPI 2 "register_operand" "r"))
> >>          (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
> >>    (set (match_operand:GPI 0 "register_operand" "=r")
> >>      (minus:GPI
> >>        (minus:GPI (match_dup 1) (match_dup 2))
> >>        (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
> >>    ""
> >>    "sbcs\\t%<w>0, %<w>1, %<w>2"
> >>   [(set_attr "type" "adc_reg")]
> >> )
> >>
> >> looks wrong for the same reason.  But the main problem IMO isn't how the
> >> inputs to the compare:CC are represented, but that we're using compare:CC
> >> at all.  Using compare doesn't accurately model the effect of SBCS on NZCV
> >> for all inputs, so if we're going to use a compare here, it can't be :CC.
> >>
> >>> I would really expect this patch series to be pretty much a dual of this
> >>> series that I posted last year for Arm.
> >>>
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html
> >>
> >> That series uses compares with modes like CC_V and CC_B, so I think
> >> you're saying that given the choice in the earlier thread between adding
> >> a new CC mode or using unspecs, you would have preferred a new CC mode,
> >> is that right?
> >>
> >
> > Yes.  It surprised me, when doing the aarch32 version, just how often
> > the mid-end parts of the compiler were able to reason about parts of the
> > parallel insn and optimize things accordingly (eg propagating the truth
> > of the comparison).  If you use an unspec that can never happen.
>
> That could be changed though.  E.g. we could add something like a
> simplify_unspec target hook if this becomes a problem (either here
> or for other unspecs).  A fancy implementation could even use
> match.pd-style rules in the .md file.
>
> The reason I'm not keen on using special modes for this case is that
> they'd describe one way in which the result can be used rather than
> describing what the instruction actually does.  The instruction really
> does set all four flags to useful values.  The "problem" is that they're
> not the values associated with a compare between two values, so representing
> them that way will always lose information.

Can't you recover the pieces by using a parallel with multiple
set:CC_X that tie together the pieces in the "correct" way?

Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-06 11:19           ` Richard Sandiford
  2020-04-06 12:22             ` Richard Biener
@ 2020-04-07  9:52             ` Richard Earnshaw (lists)
  2020-04-07 16:32               ` Richard Sandiford
  2020-04-07 19:43               ` Segher Boessenkool
  2020-04-07 20:27             ` Segher Boessenkool
  2 siblings, 2 replies; 33+ messages in thread
From: Richard Earnshaw (lists) @ 2020-04-07  9:52 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches, marcus.shawcroft, segher,
	Wilco.Dijkstra, richard.sandiford

On 06/04/2020 12:19, Richard Sandiford wrote:
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>> On 03/04/2020 16:03, Richard Sandiford wrote:
>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>>> On 03/04/2020 13:27, Richard Sandiford wrote:
>>>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>>>>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>>>>>>> This is attacking case 3 of PR 94174.
>>>>>>>
>>>>>>> In v2, I unify the various subtract-with-borrow and add-with-carry
>>>>>>> patterns that also output flags with unspecs.  As suggested by
>>>>>>> Richard Sandiford during review of v1.  It does seem cleaner.
>>>>>>>
>>>>>>
>>>>>> Really?  I didn't need to use any unspecs for the Arm version of this.
>>>>>>
>>>>>> R.
>>>>>
>>>>> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
>>>>> (including quoted context) for how we got here.
>>>>>
>>>>> The same problem affects the existing aarch64 patterns like
>>>>> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
>>>>> the compare:CC doesn't seem to be correct.
>>>>>
>>>>> Richard
>>>>>
>>>>
>>>> But I don't think you can use ANY_EXTEND in these comparisons.  It
>>>> doesn't describe what the instruction does, since the instruction does
>>>> not really extend the values first.
>>>
>>> Yeah, that was the starting point in the thread above too.  And using
>>> zero_extend in the existing *usub<GPI:mode>3_carryinC pattern:
>>>
>>> (define_insn "*usub<GPI:mode>3_carryinC"
>>>   [(set (reg:CC CC_REGNUM)
>>>   	(compare:CC
>>> 	  (zero_extend:<DWI>
>>> 	    (match_operand:GPI 1 "register_operand" "r"))
>>> 	  (plus:<DWI>
>>> 	    (zero_extend:<DWI>
>>> 	      (match_operand:GPI 2 "register_operand" "r"))
>>> 	    (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
>>>    (set (match_operand:GPI 0 "register_operand" "=r")
>>> 	(minus:GPI
>>> 	  (minus:GPI (match_dup 1) (match_dup 2))
>>> 	  (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
>>>    ""
>>>    "sbcs\\t%<w>0, %<w>1, %<w>2"
>>>   [(set_attr "type" "adc_reg")]
>>> )
>>>
>>> looks wrong for the same reason.  But the main problem IMO isn't how the
>>> inputs to the compare:CC are represented, but that we're using compare:CC
>>> at all.  Using compare doesn't accurately model the effect of SBCS on NZCV
>>> for all inputs, so if we're going to use a compare here, it can't be :CC.
>>>
>>>> I would really expect this patch series to be pretty much a dual of this
>>>> series that I posted last year for Arm.
>>>>
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html
>>>
>>> That series uses compares with modes like CC_V and CC_B, so I think
>>> you're saying that given the choice in the earlier thread between adding
>>> a new CC mode or using unspecs, you would have preferred a new CC mode,
>>> is that right?
>>>
>>
>> Yes.  It surprised me, when doing the aarch32 version, just how often
>> the mid-end parts of the compiler were able to reason about parts of the
>> parallel insn and optimize things accordingly (eg propagating the truth
>> of the comparison).  If you use an unspec that can never happen.
> 
> That could be changed though.  E.g. we could add something like a
> simplify_unspec target hook if this becomes a problem (either here
> or for other unspecs).  A fancy implementation could even use
> match.pd-style rules in the .md file.

I really don't like that.  It sounds like the top of a long slippery
slope.  What about all the other cases where the RTL is comprehended by
the mid-end?

> 
> The reason I'm not keen on using special modes for this case is that
> they'd describe one way in which the result can be used rather than
> describing what the instruction actually does.  The instruction really
> does set all four flags to useful values.  The "problem" is that they're
> not the values associated with a compare between two values, so representing
> them that way will always lose information.
> 

Yes, it's true that the rtl -> machine instruction transform is not 100%
reversible.  That's always been the case, but it's the price we pay for
a generic IL that describes instructions on multiple architectures.

R.

> Thanks,
> Richard
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-07  9:52             ` Richard Earnshaw (lists)
@ 2020-04-07 16:32               ` Richard Sandiford
  2020-04-07 17:05                 ` Richard Henderson
  2020-04-07 19:43               ` Segher Boessenkool
  1 sibling, 1 reply; 33+ messages in thread
From: Richard Sandiford @ 2020-04-07 16:32 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Richard Henderson, gcc-patches, marcus.shawcroft, segher, Wilco.Dijkstra

"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> On 06/04/2020 12:19, Richard Sandiford wrote:
>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>> On 03/04/2020 16:03, Richard Sandiford wrote:
>>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>>>> On 03/04/2020 13:27, Richard Sandiford wrote:
>>>>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>>>>>>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>>>>>>>> This is attacking case 3 of PR 94174.
>>>>>>>>
>>>>>>>> In v2, I unify the various subtract-with-borrow and add-with-carry
>>>>>>>> patterns that also output flags with unspecs.  As suggested by
>>>>>>>> Richard Sandiford during review of v1.  It does seem cleaner.
>>>>>>>>
>>>>>>>
>>>>>>> Really?  I didn't need to use any unspecs for the Arm version of this.
>>>>>>>
>>>>>>> R.
>>>>>>
>>>>>> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
>>>>>> (including quoted context) for how we got here.
>>>>>>
>>>>>> The same problem affects the existing aarch64 patterns like
>>>>>> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
>>>>>> the compare:CC doesn't seem to be correct.
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>
>>>>> But I don't think you can use ANY_EXTEND in these comparisons.  It
>>>>> doesn't describe what the instruction does, since the instruction does
>>>>> not really extend the values first.
>>>>
>>>> Yeah, that was the starting point in the thread above too.  And using
>>>> zero_extend in the existing *usub<GPI:mode>3_carryinC pattern:
>>>>
>>>> (define_insn "*usub<GPI:mode>3_carryinC"
>>>>   [(set (reg:CC CC_REGNUM)
>>>>   	(compare:CC
>>>> 	  (zero_extend:<DWI>
>>>> 	    (match_operand:GPI 1 "register_operand" "r"))
>>>> 	  (plus:<DWI>
>>>> 	    (zero_extend:<DWI>
>>>> 	      (match_operand:GPI 2 "register_operand" "r"))
>>>> 	    (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
>>>>    (set (match_operand:GPI 0 "register_operand" "=r")
>>>> 	(minus:GPI
>>>> 	  (minus:GPI (match_dup 1) (match_dup 2))
>>>> 	  (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
>>>>    ""
>>>>    "sbcs\\t%<w>0, %<w>1, %<w>2"
>>>>   [(set_attr "type" "adc_reg")]
>>>> )
>>>>
>>>> looks wrong for the same reason.  But the main problem IMO isn't how the
>>>> inputs to the compare:CC are represented, but that we're using compare:CC
>>>> at all.  Using compare doesn't accurately model the effect of SBCS on NZCV
>>>> for all inputs, so if we're going to use a compare here, it can't be :CC.
>>>>
>>>>> I would really expect this patch series to be pretty much a dual of this
>>>>> series that I posted last year for Arm.
>>>>>
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html
>>>>
>>>> That series uses compares with modes like CC_V and CC_B, so I think
>>>> you're saying that given the choice in the earlier thread between adding
>>>> a new CC mode or using unspecs, you would have preferred a new CC mode,
>>>> is that right?
>>>>
>>>
>>> Yes.  It surprised me, when doing the aarch32 version, just how often
>>> the mid-end parts of the compiler were able to reason about parts of the
>>> parallel insn and optimize things accordingly (eg propagating the truth
>>> of the comparison).  If you use an unspec that can never happen.
>> 
>> That could be changed though.  E.g. we could add something like a
>> simplify_unspec target hook if this becomes a problem (either here
>> or for other unspecs).  A fancy implementation could even use
>> match.pd-style rules in the .md file.
>
> I really don't like that.  It sounds like the top of a long slippery
> slope.  What about all the other cases where the RTL is comprehended by
> the mid-end?

Is it really so bad though?  It's similar to how frontends can define
and fold their own trees, and how targets can define and fold their own
built-in functions.

The CC modes are also quite heavily macro/hook-based.

>> The reason I'm not keen on using special modes for this case is that
>> they'd describe one way in which the result can be used rather than
>> describing what the instruction actually does.  The instruction really
>> does set all four flags to useful values.  The "problem" is that they're
>> not the values associated with a compare between two values, so representing
>> them that way will always lose information.
>> 
>
> Yes, it's true that the rtl -> machine instruction transform is not 100%
> reversible.  That's always been the case, but it's the price we pay for
> a generic IL that describes instructions on multiple architectures.

It's not really reversibility that I'm after (at least not for its
own sake).

If we had a three-input compare_cc rtx_code that described a comparison
involving a carry input, we'd certainly be using it here, because that's
what the instruction does.  Given that we don't have the rtx_code, three
obvious choices are:

(1) Add it.

(2) Continue to represent what the instruction does using an unspec.

(3) Don't try to represent the "three-input compare_cc" operation and
    instead describe a two-input comparison that only yields a valid
    result for a subset of tests.

(1) seems like the best technical solution but would probably be
a lot of work.  I guess the reason I like (2) is that it stays
closest to (1).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-07 16:32               ` Richard Sandiford
@ 2020-04-07 17:05                 ` Richard Henderson
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-07 17:05 UTC (permalink / raw)
  To: Richard Earnshaw (lists),
	gcc-patches, marcus.shawcroft, segher, Wilco.Dijkstra,
	richard.sandiford

On 4/7/20 9:32 AM, Richard Sandiford wrote:
> It's not really reversibility that I'm after (at least not for its
> own sake).
> 
> If we had a three-input compare_cc rtx_code that described a comparison
> involving a carry input, we'd certainly be using it here, because that's
> what the instruction does.  Given that we don't have the rtx_code, three
> obvious choices are:
> 
> (1) Add it.
> 
> (2) Continue to represent what the instruction does using an unspec.
> 
> (3) Don't try to represent the "three-input compare_cc" operation and
>     instead describe a two-input comparison that only yields a valid
>     result for a subset of tests.
> 
> (1) seems like the best technical solution but would probably be
> a lot of work.  I guess the reason I like (2) is that it stays
> closest to (1).

Indeed, the biggest problem that I'm having with copying the arm solution to
aarch64 is the special cases of the constants.

The first problem is that (any_extend:M1 (match_operand:M2)) is invalid rtl for
a constant, so you can't share the same define_insn to handle both register and
immediate input.

The second problem is how unpredictable the canonical rtl of an expression can
be after constant folding, which again requires more and more define_insns.
Even the Arm target gets this wrong.  In particular,

> (define_insn "cmpsi3_carryin_<CC_EXTEND>out"
>   [(set (reg:<CC_EXTEND> CC_REGNUM)
>         (compare:<CC_EXTEND>
>          (SE:DI (match_operand:SI 1 "s_register_operand" "0,r"))
>          (plus:DI (match_operand:DI 3 "arm_borrow_operation" "")
>                   (SE:DI (match_operand:SI 2 "s_register_operand" "l,r")))))
>    (clobber (match_scratch:SI 0 "=l,r"))]

is non-canonical according to combine.  It will only attempt the ordering

  (compare
    (plus ...)
    (sign_extend ...))

I have no idea why combine is attempting to reverse the sense of the comparison
here.  I can only presume it would also reverse the sense of the branch on
which the comparison is made, had the pattern matched.

This second problem is partially worked around by fwprop, in that it will try
to simply replace the operand without folding if that is recognizable.  Thus
cases like

  (compare (const_int 0) (plus ...))

can be produced from fwprop but not combine.  Which works well enough to not
bother with the CC_RSBmode that the arm target uses.

The third problem is the really quite complicated code that goes into
SELECT_CC_MODE.  This really should not be as difficult as it is, and is the
sort of thing for which we built recog.

Related to that is the insn costing, which also ought to use something akin to
recog.  We have all of the information there: if the insn is recognizable, the
type/length attributes can be used to provide a good value.


r~

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-07  9:52             ` Richard Earnshaw (lists)
  2020-04-07 16:32               ` Richard Sandiford
@ 2020-04-07 19:43               ` Segher Boessenkool
  1 sibling, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2020-04-07 19:43 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Richard Henderson, gcc-patches, marcus.shawcroft, Wilco.Dijkstra,
	richard.sandiford

Hi!

On Tue, Apr 07, 2020 at 10:52:10AM +0100, Richard Earnshaw (lists) wrote:
> On 06/04/2020 12:19, Richard Sandiford wrote:
> > "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
> >> Yes.  It surprised me, when doing the aarch32 version, just how often
> >> the mid-end parts of the compiler were able to reason about parts of the
> >> parallel insn and optimize things accordingly (eg propagating the truth
> >> of the comparison).  If you use an unspec that can never happen.
> > 
> > That could be changed though.  E.g. we could add something like a
> > simplify_unspec target hook if this becomes a problem (either here
> > or for other unspecs).  A fancy implementation could even use
> > match.pd-style rules in the .md file.
> 
> I really don't like that.  It sounds like the top of a long slippery
> slope.  What about all the other cases where the RTL is comprehended by
> the mid-end?

Same here.  And the interesting transforms are not done in simplify-rtx
anyway, but in combine: simplify-rtx should only ever make a *simplified*
representation, while combine makes a (single!) choice that works out the
best in practice.

You do need a few separate patterns (for reg, 0, pos, -1, other neg, for
example), but then everything automatically works.  The canonical way to
write these insns is different for different constants, that is all.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-06 11:19           ` Richard Sandiford
  2020-04-06 12:22             ` Richard Biener
  2020-04-07  9:52             ` Richard Earnshaw (lists)
@ 2020-04-07 20:27             ` Segher Boessenkool
  2020-04-07 21:43               ` Richard Henderson
  2020-04-08  9:06               ` Richard Sandiford
  2 siblings, 2 replies; 33+ messages in thread
From: Segher Boessenkool @ 2020-04-07 20:27 UTC (permalink / raw)
  To: Richard Earnshaw (lists),
	Richard Henderson, gcc-patches, marcus.shawcroft, Wilco.Dijkstra,
	richard.sandiford

On Mon, Apr 06, 2020 at 12:19:42PM +0100, Richard Sandiford wrote:
> The reason I'm not keen on using special modes for this case is that
> they'd describe one way in which the result can be used rather than
> describing what the instruction actually does.  The instruction really
> does set all four flags to useful values.  The "problem" is that they're
> not the values associated with a compare between two values, so representing
> them that way will always lose information.

CC modes describe how the flags are set, not how the flags are used.
You cannot easily describe the V bit setting with a compare (it needs
a mode bigger than the register), is that what you mean?


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-07 20:27             ` Segher Boessenkool
@ 2020-04-07 21:43               ` Richard Henderson
  2020-04-07 23:58                 ` Segher Boessenkool
  2020-04-08  9:06               ` Richard Sandiford
  1 sibling, 1 reply; 33+ messages in thread
From: Richard Henderson @ 2020-04-07 21:43 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Earnshaw (lists),
	Richard Henderson, gcc-patches, marcus.shawcroft, Wilco.Dijkstra,
	richard.sandiford

On 4/7/20 1:27 PM, Segher Boessenkool wrote:
> On Mon, Apr 06, 2020 at 12:19:42PM +0100, Richard Sandiford wrote:
>> The reason I'm not keen on using special modes for this case is that
>> they'd describe one way in which the result can be used rather than
>> describing what the instruction actually does.  The instruction really
>> does set all four flags to useful values.  The "problem" is that they're
>> not the values associated with a compare between two values, so representing
>> them that way will always lose information.
> 
> CC modes describe how the flags are set, not how the flags are used.
> You cannot easily describe the V bit setting with a compare (it needs
> a mode bigger than the register), is that what you mean?

I think that is a good deal of the effort.

I wonder if it would be helpful to have

  (uoverflow_plus x y carry)
  (soverflow_plus x y carry)

etc.

(define_insn "uaddsi3_cout"
  [(set (reg:CC_C CC_REGNUM)
        (uoverflow_plus:CC_C
          (match_operand:SI 1 "register_operand")
          (match_operand:SI 2 "plus_operand")
          (const_int 0)))
    (set (match_operand:SI 0 "register_operand")
         (plus:SI (match_dup 1) (match_dup 2)))]
  ...
)

(define_insn "uaddsi4_cin_cout"
  [(set (reg:CC_C CC_REGNUM)
        (uoverflow_plus:CC_C
          (match_operand:SI 1 "register_operand")
          (match_operand:SI 2 "reg_or_zero_operand")
          (match_operand:SI 3 "carry_operand")))
    (set (match_operand:SI 0 "register_operand")
         (plus:SI
           (plus:SI (match_dup 3) (match_dup 1))
           (match_dup 2)))]
  ...
)

(define_insn "usubsi4_cin_cout"
  [(set (reg:CC_C CC_REGNUM)
        (uoverflow_plus:CC_C
          (match_operand:SI 1 "register_operand")
          (not:SI (match_operand:SI 2 "reg_or_zero_operand"))
          (match_operand:SI 3 "carry_operand")))
    (set (match_operand:SI 0 "register_operand")
         (minus:SI
           (minus:SI (match_dup 1) (match_dup 2))
           (match_operand:SI 4 "borrow_operand")))]
  ...
)

This does have the advantage of avoiding the extensions, so that constants can
be retained in the original mode.

Though of course if we go this way, there will be incentive to add
<s,u>overflow codes for all __builtin_*_overflow_p.


r~

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-07 21:43               ` Richard Henderson
@ 2020-04-07 23:58                 ` Segher Boessenkool
  2020-04-08  0:50                   ` Richard Henderson
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2020-04-07 23:58 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Earnshaw (lists),
	Richard Henderson, gcc-patches, marcus.shawcroft, Wilco.Dijkstra,
	richard.sandiford

On Tue, Apr 07, 2020 at 02:43:36PM -0700, Richard Henderson wrote:
> On 4/7/20 1:27 PM, Segher Boessenkool wrote:
> > On Mon, Apr 06, 2020 at 12:19:42PM +0100, Richard Sandiford wrote:
> >> The reason I'm not keen on using special modes for this case is that
> >> they'd describe one way in which the result can be used rather than
> >> describing what the instruction actually does.  The instruction really
> >> does set all four flags to useful values.  The "problem" is that they're
> >> not the values associated with a compare between two values, so representing
> >> them that way will always lose information.
> > 
> > CC modes describe how the flags are set, not how the flags are used.
> > You cannot easily describe the V bit setting with a compare (it needs
> > a mode bigger than the register), is that what you mean?
> 
> I think that is a good deal of the effort.

So you need to have different CC modes for different ways of setting
(combinations) of the four NZVC bits, there isn't really any way around
that, that is how CC modes work.

> I wonder if it would be helpful to have
> 
>   (uoverflow_plus x y carry)
>   (soverflow_plus x y carry)
> 
> etc.

Those have three operands, which is nasty to express.

On rs6000 we have the carry bit as a separate register (it really is
only one bit, XER[CA], but in GCC we model it as a separate register).
We handle it as a fixed register (there is only one, and saving and
restoring it is relatively expensive, so this worked out the best).
Still, in the patterns (for insns like "adde") that take both a carry
input and have it as output, the expression for the carry output, and
even the one for the GPR output, become so unwieldy that nothing
can properly work with it.  So, in the end, I have all such insns that
take a carry input just clobber their carry output.  This works great!

Expressing the carry setting for insns that do not take a carry in is
much easier.  You get somewhat different patterns for various
immediate inputs, but that is all.

[ snip ]

> This does have the advantage of avoiding the extensions, so that constants can
> be retained in the original mode.

But it won't ever survive simplification; or, it will be in the way of
simplification.

A simple test is looking what you get if you do

long long f(long long x) { return x + 0x100000000; }

(on a 32-bit config).

> Though of course if we go this way, there will be incentive to add
> <s,u>overflow codes for all __builtin_*_overflow_p.

Yeah, eww.  And where will it stop?  What muladd insns should we have
special RTL codes for, for the high part?


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-07 23:58                 ` Segher Boessenkool
@ 2020-04-08  0:50                   ` Richard Henderson
  2020-04-08 20:56                     ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Henderson @ 2020-04-08  0:50 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Henderson
  Cc: Richard Earnshaw (lists),
	gcc-patches, marcus.shawcroft, Wilco.Dijkstra, richard.sandiford

On 4/7/20 4:58 PM, Segher Boessenkool wrote:
>> I wonder if it would be helpful to have
>>
>>   (uoverflow_plus x y carry)
>>   (soverflow_plus x y carry)
>>
>> etc.
> 
> Those have three operands, which is nasty to express.

How so?  It's a perfectly natural operation.

> On rs6000 we have the carry bit as a separate register (it really is
> only one bit, XER[CA], but in GCC we model it as a separate register).
> We handle it as a fixed register (there is only one, and saving and
> restoring it is relatively expensive, so this worked out the best).

As for most platforms, more or less.

> Still, in the patterns (for insns like "adde") that take both a carry
> input and have it as output, the expression for the carry output, and
> even the one for the GPR output, become so unwieldy that nothing
> can properly work with it.  So, in the end, I have all such insns that
> take a carry input just clobber their carry output.  This works great!

Sure, right up until the point when you want to actually *use* that carry
output.  Which is exactly what we're talking about here.

> Expressing the carry setting for insns that do not take a carry in is
> much easier.  You get somewhat different patterns for various
> immediate inputs, but that is all.

It's not horrible, but it's certainly verbose.  If we add a shorthand for that
common operation, so much the better.

I would not expect optimizers to take a collection of inputs and introduce this
rtx code, but only operate with it when the backend emits it.

>> This does have the advantage of avoiding the extensions, so that constants can
>> be retained in the original mode.
> 
> But it won't ever survive simplification; or, it will be in the way of
> simplification.

How so?

It's clear that

  (set (reg:CC_C flags)
       (uoverflow_plus:CC_C
         (reg:SI x)
         (const_int 0)
         (const_int 0)))

cannot overflow.  Thus this expression as a whole would, in combination with
the user of the CC_MODE, e.g.

  (set (reg:SI y) (ne:SI (reg:CC_C flags) (const_int 0))

fold to

  (set (reg:SI y) (ne:SI (const_int 0) (const_int 0))
to
  (set (reg:SI y) (const_int 0))

just like any other (compare) + (condition) pair.

I don't see why this new rtx code is any more difficult than ones that we have
already.

>> Though of course if we go this way, there will be incentive to add
>> <s,u>overflow codes for all __builtin_*_overflow_p.
> 
> Yeah, eww.  And where will it stop?  What muladd insns should we have
> special RTL codes for, for the high part?

Well, we don't have overflow builtins for muladd yet.  Only plus, minus, and
mul.  Only x86 and s390x have insns to support overflow from mul without also
computing the highpart.

But add/sub-with-carry are *very* common operations.  As are add/sub-with-carry
with signed overflow into flags.  It would be nice to make that as simple as
possible across all targets.


r~

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-07 20:27             ` Segher Boessenkool
  2020-04-07 21:43               ` Richard Henderson
@ 2020-04-08  9:06               ` Richard Sandiford
  1 sibling, 0 replies; 33+ messages in thread
From: Richard Sandiford @ 2020-04-08  9:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Earnshaw (lists),
	Richard Henderson, gcc-patches, marcus.shawcroft, Wilco.Dijkstra

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Mon, Apr 06, 2020 at 12:19:42PM +0100, Richard Sandiford wrote:
>> The reason I'm not keen on using special modes for this case is that
>> they'd describe one way in which the result can be used rather than
>> describing what the instruction actually does.  The instruction really
>> does set all four flags to useful values.  The "problem" is that they're
>> not the values associated with a compare between two values, so representing
>> them that way will always lose information.
>
> CC modes describe how the flags are set, not how the flags are used.
> You cannot easily describe the V bit setting with a compare (it needs
> a mode bigger than the register), is that what you mean?

I meant more that, if you want to represent the result of SBCS using
a compare, you have to decide ahead of time which result you're
interested in, and pick the CC mode and compare representation
accordingly.  E.g. SBCS res, x, y sets the Z flag if res is zero.
This is potentially useful, and could be represented as a
compare:CC_Z (say) between values based on x, y and the C flag.
SBCS res, x, y also sets the C flag if the first multiword value
(x and less significant words) is GEU the second.  This too is
potentially useful and could be represented as a compare:CC_C (say)
between values based on x, y and the C flag.  But these two compares
can't be the *same* compares, because SBCS can create a Z-set, C-clear
result that is impossible for single compares.

This is a case that the existing aarch64 pattern I mentioned gets wrong:

(define_insn "*usub<GPI:mode>3_carryinC"
  [(set (reg:CC CC_REGNUM)
	(compare:CC
	  (zero_extend:<DWI>
	    (match_operand:GPI 1 "register_operand" "r"))
	  (plus:<DWI>
	    (zero_extend:<DWI>
	      (match_operand:GPI 2 "register_operand" "r"))
	    (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
   (set (match_operand:GPI 0 "register_operand" "=r")
	(minus:GPI
	  (minus:GPI (match_dup 1) (match_dup 2))
	  (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
   ""
   "sbcs\\t%<w>0, %<w>1, %<w>2"
  [(set_attr "type" "adc_reg")]
)

(op3 == inverse of the carry flag).  When op1 == 0, op2 == -1 and op3 == 1,
the compare folds to:

  (compare:CC 0 (1 << 32))

for SI.  This compare ought to give an NE result, but the SBCS instead
sets the Z flag, giving an EQ result.

That's what I meant by the CC mode describing how the result is used.
You have to pick from a choice of several mutually-exclusive compare
representations depending on which result you want.  Sure, each
representation has to describe how the flags are set in an accurate way
(like the compare:CC_Z and compare:CC_C above are individually accurate).
But the choice of CC mode does reflect how the result is used, and can't
be made independently of how the result is used.

For SUBS you can pick the mode of the compare and the compared values
independently of how the result is used.  This seems cleaner and leads
to better CSE opportunities, etc.  I think it'd be good to be able
to do the same for SBCS, whether that's through an unspec or (better)
a new rtx_code.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-06 12:22             ` Richard Biener
@ 2020-04-08  9:10               ` Richard Sandiford
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Sandiford @ 2020-04-08  9:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Earnshaw (lists),
	Richard Henderson, GCC Patches, Marcus Shawcroft,
	Segher Boessenkool, Wilco Dijkstra

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Apr 6, 2020 at 1:20 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>> > On 03/04/2020 16:03, Richard Sandiford wrote:
>> >> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>> >>> On 03/04/2020 13:27, Richard Sandiford wrote:
>> >>>> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>> >>>>> On 02/04/2020 19:53, Richard Henderson via Gcc-patches wrote:
>> >>>>>> This is attacking case 3 of PR 94174.
>> >>>>>>
>> >>>>>> In v2, I unify the various subtract-with-borrow and add-with-carry
>> >>>>>> patterns that also output flags with unspecs.  As suggested by
>> >>>>>> Richard Sandiford during review of v1.  It does seem cleaner.
>> >>>>>>
>> >>>>>
>> >>>>> Really?  I didn't need to use any unspecs for the Arm version of this.
>> >>>>>
>> >>>>> R.
>> >>>>
>> >>>> See https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543063.html
>> >>>> (including quoted context) for how we got here.
>> >>>>
>> >>>> The same problem affects the existing aarch64 patterns like
>> >>>> *usub<GPI:mode>3_carryinC.  Although that pattern avoids unspecs,
>> >>>> the compare:CC doesn't seem to be correct.
>> >>>>
>> >>>> Richard
>> >>>>
>> >>>
>> >>> But I don't think you can use ANY_EXTEND in these comparisons.  It
>> >>> doesn't describe what the instruction does, since the instruction does
>> >>> not really extend the values first.
>> >>
>> >> Yeah, that was the starting point in the thread above too.  And using
>> >> zero_extend in the existing *usub<GPI:mode>3_carryinC pattern:
>> >>
>> >> (define_insn "*usub<GPI:mode>3_carryinC"
>> >>   [(set (reg:CC CC_REGNUM)
>> >>      (compare:CC
>> >>        (zero_extend:<DWI>
>> >>          (match_operand:GPI 1 "register_operand" "r"))
>> >>        (plus:<DWI>
>> >>          (zero_extend:<DWI>
>> >>            (match_operand:GPI 2 "register_operand" "r"))
>> >>          (match_operand:<DWI> 3 "aarch64_borrow_operation" ""))))
>> >>    (set (match_operand:GPI 0 "register_operand" "=r")
>> >>      (minus:GPI
>> >>        (minus:GPI (match_dup 1) (match_dup 2))
>> >>        (match_operand:GPI 4 "aarch64_borrow_operation" "")))]
>> >>    ""
>> >>    "sbcs\\t%<w>0, %<w>1, %<w>2"
>> >>   [(set_attr "type" "adc_reg")]
>> >> )
>> >>
>> >> looks wrong for the same reason.  But the main problem IMO isn't how the
>> >> inputs to the compare:CC are represented, but that we're using compare:CC
>> >> at all.  Using compare doesn't accurately model the effect of SBCS on NZCV
>> >> for all inputs, so if we're going to use a compare here, it can't be :CC.
>> >>
>> >>> I would really expect this patch series to be pretty much a dual of this
>> >>> series that I posted last year for Arm.
>> >>>
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532180.html
>> >>
>> >> That series uses compares with modes like CC_V and CC_B, so I think
>> >> you're saying that given the choice in the earlier thread between adding
>> >> a new CC mode or using unspecs, you would have preferred a new CC mode,
>> >> is that right?
>> >>
>> >
>> > Yes.  It surprised me, when doing the aarch32 version, just how often
>> > the mid-end parts of the compiler were able to reason about parts of the
>> > parallel insn and optimize things accordingly (eg propagating the truth
>> > of the comparison).  If you use an unspec that can never happen.
>>
>> That could be changed though.  E.g. we could add something like a
>> simplify_unspec target hook if this becomes a problem (either here
>> or for other unspecs).  A fancy implementation could even use
>> match.pd-style rules in the .md file.
>>
>> The reason I'm not keen on using special modes for this case is that
>> they'd describe one way in which the result can be used rather than
>> describing what the instruction actually does.  The instruction really
>> does set all four flags to useful values.  The "problem" is that they're
>> not the values associated with a compare between two values, so representing
>> them that way will always lose information.
>
> Can't you recover the pieces by using a parallel with multiple
> set:CC_X that tie together the pieces in the "correct" way?

That would mean splitting apart the flags register for the set but
(I guess) continuing to treat them as a single unit for uses.  That's
likely to make life harder for the optimisers.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons
  2020-04-08  0:50                   ` Richard Henderson
@ 2020-04-08 20:56                     ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2020-04-08 20:56 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Henderson, Richard Earnshaw (lists),
	gcc-patches, marcus.shawcroft, Wilco.Dijkstra, richard.sandiford

On Tue, Apr 07, 2020 at 05:50:38PM -0700, Richard Henderson wrote:
> On 4/7/20 4:58 PM, Segher Boessenkool wrote:
> >> I wonder if it would be helpful to have
> >>
> >>   (uoverflow_plus x y carry)
> >>   (soverflow_plus x y carry)
> >>
> >> etc.
> > 
> > Those have three operands, which is nasty to express.
> 
> How so?  It's a perfectly natural operation.

If you make a new code for it, sure.

You can also do

  C = ((unsigned tworeg)x + y + carry != (unsigned onereg)x + y + carry);

etc., but then in RTL of course; and then you find out that GCC can
simplify some but not all of those expressions, and it uses different
forms everywhere.

The trick is to find an expression that GCC can handle better.

> > On rs6000 we have the carry bit as a separate register (it really is
> > only one bit, XER[CA], but in GCC we model it as a separate register).
> > We handle it as a fixed register (there is only one, and saving and
> > restoring it is relatively expensive, so this worked out the best).
> 
> As for most platforms, more or less.

It's roughly equal how many have it in a separate reg vs. have it in the
condition reg, sure.  Somewhat fewer archs use GPRs for the carries.

> > Still, in the patterns (for insns like "adde") that take both a carry
> > input and have it as output, not just the expression for the carry
> > output but even the one for the GPR output becomes so unwieldy that
> > nothing can properly work with it.  So, in the end, I have all such insns that
> > take a carry input just clobber their carry output.  This works great!
> 
> Sure, right up until the point when you want to actually *use* that carry
> output.  Which is exactly what we're talking about here.

We only had two such cases:

a) Code like
  unsigned long f(unsigned long a, long b) { return a + (b > 0); }
we can do with only three insns (which GCC of course cannot derive by
itself, it needs extra patterns).  It now needs four insns (but GCC knows
to do that without using the carry at all).

b) It is useful for carry chains.  Which are very hard to express in C
and have GCC optimise it to what you want at all *anyway*.

> > Expressing the carry setting for insns that do not take a carry in is
> > much easier.  You get somewhat different patterns for various
> > immediate inputs, but that is all.
> 
> It's not horrible, but it's certainly verbose.  If we add a shorthand for that
> common operation, so much the better.

If GCC knows how to properly optimise with these new operators, you will
find you need (at least some of) those special cases again.

> I don't see why this new rtx code is any more difficult than ones that we have
> already.

It needs to be handled separately everywhere again.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags
  2020-04-02 18:53 ` [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags Richard Henderson
@ 2020-04-09 21:52   ` Segher Boessenkool
  2020-04-10  3:50     ` Richard Henderson
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2020-04-09 21:52 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, richard.sandiford, richard.earnshaw, Wilco.Dijkstra,
	marcus.shawcroft, kyrylo.tkachov

Hi!

On Thu, Apr 02, 2020 at 11:53:47AM -0700, Richard Henderson wrote:
> The rtl description of signed/unsigned overflow from subtract
> was fine, as far as it goes -- we have CC_Cmode and CC_Vmode
> that indicate that only those particular bits are valid.
> 
> However, it's not clear how to extend that description to
> handle signed comparison, where N == V (GE) and N != V (LT) are
> the only valid conditions.
> 
> Using an UNSPEC means that we can unify all 3 usages without
> fear that combine will try to infer anything from the rtl.
> It also means we need far fewer variants when various inputs
> have constants propagated in, and the rtl folds.
> 
> Accept -1 for the second input by using ADCS.

If you use an unspec anyway, why do you need a separate UNSPEC_SBCS?
It is just the same as UNSPEC_ADCS, with one of the inputs inverted?

Is there any reason to pretend borrows are different from carries?


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags
  2020-04-09 21:52   ` Segher Boessenkool
@ 2020-04-10  3:50     ` Richard Henderson
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2020-04-10  3:50 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Henderson
  Cc: richard.earnshaw, Wilco.Dijkstra, gcc-patches, marcus.shawcroft

On 4/9/20 2:52 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Apr 02, 2020 at 11:53:47AM -0700, Richard Henderson wrote:
>> The rtl description of signed/unsigned overflow from subtract
>> was fine, as far as it goes -- we have CC_Cmode and CC_Vmode
>> that indicate that only those particular bits are valid.
>>
>> However, it's not clear how to extend that description to
>> handle signed comparison, where N == V (GE) and N != V (LT) are
>> the only valid conditions.
>>
>> Using an UNSPEC means that we can unify all 3 usages without
>> fear that combine will try to infer anything from the rtl.
>> It also means we need far fewer variants when various inputs
>> have constants propagated in, and the rtl folds.
>>
>> Accept -1 for the second input by using ADCS.
> 
> If you use an unspec anyway, why do you need a separate UNSPEC_SBCS?
> It is just the same as UNSPEC_ADCS, with one of the inputs inverted?
> 
> Is there any reason to pretend borrows are different from carries?

Good point.  If we go this way, I'll make sure and merge them.
But I've also just sent v4 that does away with the unspecs and
uses the forms that Earnshaw used for config/arm.


r~

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-04-10  3:50 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-02 18:53 [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
2020-04-02 18:53 ` [PATCH v2 01/11] aarch64: Accept 0 as first argument to compares Richard Henderson
2020-04-02 18:53 ` [PATCH v2 02/11] aarch64: Accept zeros in add<GPI>3_carryin Richard Henderson
2020-04-02 18:53 ` [PATCH v2 03/11] aarch64: Provide expander for sub<GPI>3_compare1 Richard Henderson
2020-04-02 18:53 ` [PATCH v2 04/11] aarch64: Introduce aarch64_expand_addsubti Richard Henderson
2020-04-02 18:53 ` [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags Richard Henderson
2020-04-09 21:52   ` Segher Boessenkool
2020-04-10  3:50     ` Richard Henderson
2020-04-02 18:53 ` [PATCH v2 06/11] aarch64: Use UNSPEC_ADCS for add-with-carry " Richard Henderson
2020-04-02 18:53 ` [PATCH v2 07/11] aarch64: Remove CC_ADCmode Richard Henderson
2020-04-02 18:53 ` [PATCH v2 08/11] aarch64: Accept -1 as second argument to add<mode>3_carryin Richard Henderson
2020-04-02 18:53 ` [PATCH v2 09/11] aarch64: Adjust result of aarch64_gen_compare_reg Richard Henderson
2020-04-02 18:53 ` [PATCH v2 10/11] aarch64: Implement TImode comparisons Richard Henderson
2020-04-02 18:53 ` [PATCH v2 11/11] aarch64: Implement absti2 Richard Henderson
2020-04-02 18:55 ` [PATCH v2 00/11] aarch64: Implement TImode comparisons Richard Henderson
2020-04-03 11:34 ` Richard Earnshaw (lists)
2020-04-03 12:27   ` Richard Sandiford
2020-04-03 13:17     ` Richard Earnshaw (lists)
2020-04-03 15:03       ` Richard Sandiford
2020-04-06  9:27         ` Richard Earnshaw (lists)
2020-04-06 11:19           ` Richard Sandiford
2020-04-06 12:22             ` Richard Biener
2020-04-08  9:10               ` Richard Sandiford
2020-04-07  9:52             ` Richard Earnshaw (lists)
2020-04-07 16:32               ` Richard Sandiford
2020-04-07 17:05                 ` Richard Henderson
2020-04-07 19:43               ` Segher Boessenkool
2020-04-07 20:27             ` Segher Boessenkool
2020-04-07 21:43               ` Richard Henderson
2020-04-07 23:58                 ` Segher Boessenkool
2020-04-08  0:50                   ` Richard Henderson
2020-04-08 20:56                     ` Segher Boessenkool
2020-04-08  9:06               ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).