public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/9] aarch64: Implement TImode comparisons
@ 2020-03-21  2:42 Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares Richard Henderson
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

This is attacking case 3 of PR 94174.

I'm no longer using ccmp for most of the TImode comparisons:
thanks to Wilco Dijkstra for pulling off my blinders and
reminding me that we can use subs+sbcs for (almost) all compares.
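As a hedged sketch (the function name and decomposition are mine, not the
patch's), the subs+sbcs idiom for a signed 128-bit a < b can be modelled in
C: subtract the low halves to produce a borrow, fold the borrow into the
high-half subtraction, and test the signed-less-than condition N != V, which
is what b.lt does:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "subs lo; sbcs hi; b.lt" for signed 128-bit a < b,
   where a = ah:al and b = bh:bl.  Illustrative only.  */
static int ti_lt(int64_t ah, uint64_t al, int64_t bh, uint64_t bl)
{
  unsigned borrow = al < bl;              /* subs: C clear means borrow */
  int64_t diff_hi =                       /* sbcs: high halves minus borrow */
    (int64_t)((uint64_t)ah - (uint64_t)bh - borrow);
  /* b.lt tests N != V; recompute the exact high result to detect
     signed overflow (V), and take N from the truncated result.  */
  __int128 exact = (__int128)ah - bh - borrow;
  int n = diff_hi < 0;
  int v = (__int128)diff_hi != exact;
  return n != v;
}
```

The sequences in the diffs below (cmp, then sbcs xzr, then b.lt/b.ge) are
this computation done in flags alone, with no result register needed.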

The first five patches clean up existing patterns or add new ones
to support the expansion without generating extraneous constant loads.

The aarch64_expand_addsubti patch tidies up the existing TImode
arithmetic expansions.

EXAMPLE __subvti3 (context diff is easier to read):

*** 12,27 ****
    10:	b7f800a3 	tbnz	x3, #63, 24 <__subvti3+0x24>
!   14:	eb02003f 	cmp	x1, x2
!   18:	5400010c 	b.gt	38 <__subvti3+0x38>
!   1c:	54000140 	b.eq	44 <__subvti3+0x44>  // b.none
    20:	d65f03c0 	ret
!   24:	eb01005f 	cmp	x2, x1
!   28:	5400008c 	b.gt	38 <__subvti3+0x38>
!   2c:	54ffffa1 	b.ne	20 <__subvti3+0x20>  // b.any
!   30:	eb00009f 	cmp	x4, x0
!   34:	54ffff69 	b.ls	20 <__subvti3+0x20>  // b.plast
!   38:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
!   3c:	910003fd 	mov	x29, sp
!   40:	94000000 	bl	0 <abort>
!   44:	eb04001f 	cmp	x0, x4
!   48:	54ffff88 	b.hi	38 <__subvti3+0x38>  // b.pmore
!   4c:	d65f03c0 	ret
--- 12,22 ----
    10:	b7f800a3 	tbnz	x3, #63, 24 <__subvti3+0x24>
!   14:	eb00009f 	cmp	x4, x0
!   18:	fa01005f 	sbcs	xzr, x2, x1
!   1c:	540000ab 	b.lt	30 <__subvti3+0x30>  // b.tstop
    20:	d65f03c0 	ret
!   24:	eb04001f 	cmp	x0, x4
!   28:	fa02003f 	sbcs	xzr, x1, x2
!   2c:	54ffffaa 	b.ge	20 <__subvti3+0x20>  // b.tcont
!   30:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
!   34:	910003fd 	mov	x29, sp
!   38:	94000000 	bl	0 <abort>

EXAMPLE from the pr:

void test3(__int128 a, uint64_t l)
{
        if ((__int128_t)a - l <= 1)
                doit();
}

*** 11,23 ****
        subs    x0, x0, x2
        sbc     x1, x1, xzr
!       cmp     x1, 0
!       ble     .L6
! .L1:
        ret
        .p2align 2,,3
- .L6:
-       bne     .L4
-       cmp     x0, 1
-       bhi     .L1
  .L4:
        b       doit
--- 11,19 ----
        subs    x0, x0, x2
        sbc     x1, x1, xzr
!       cmp     x0, 2
!       sbcs    xzr, x1, xzr
!       blt     .L4
        ret
        .p2align 2,,3
  .L4:
        b       doit


r~


Richard Henderson (9):
  aarch64: Accept 0 as first argument to compares
  aarch64: Accept zeros in add<GPI>3_carryin
  aarch64: Add <su>cmp_*_carryinC patterns
  aarch64: Add <su>cmp<GPI>_carryinC_m2
  aarch64: Provide expander for sub<GPI>3_compare1
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Implement TImode comparisons
  aarch64: Implement absti2

 gcc/config/aarch64/aarch64-protos.h       |  10 +-
 gcc/config/aarch64/aarch64.c              | 292 +++++++++-------
 gcc/config/aarch64/aarch64-simd.md        |  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md             | 389 +++++++++++++---------
 5 files changed, 402 insertions(+), 312 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-31 16:55   ` Richard Sandiford
  2020-03-21  2:42 ` [PATCH v2 2/9] aarch64: Accept zeros in add<GPI>3_carryin Richard Henderson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

While cmp (extended register) and cmp (immediate) use <Wn|WSP> for
the first operand, cmp (shifted register) uses <Wn>, so we can
perform cmp xzr, x0.

For ccmp, we only have <Wn> as an input.

	* config/aarch64/aarch64.md (cmp<GPI>): For operand 0, use
	aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
	and accept Z.
	(@ccmpcc<GPI>): For operand 0, use aarch64_reg_or_zero and Z.
	(@ccmpcc<GPI>_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..b9ae51e48dd 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
 	   [(match_operand 0 "cc_register" "")
 	    (const_int 0)])
 	  (compare:CC_ONLY
-	    (match_operand:GPI 2 "register_operand" "r,r,r")
+	    (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
 	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
 	  (unspec:CC_ONLY
 	    [(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
 	    [(match_operand 5 "immediate_operand")]
 	    UNSPEC_NZCV)
 	  (compare:CC_ONLY
-	    (match_operand:GPI 2 "register_operand" "r,r,r")
+	    (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
 	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))))]
   ""
   "@
@@ -3961,14 +3961,14 @@
 
 (define_insn "cmp<mode>"
   [(set (reg:CC CC_REGNUM)
-	(compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-		    (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+	(compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+		    (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
   ""
   "@
-   cmp\\t%<w>0, %<w>1
    cmp\\t%<w>0, %1
-   cmn\\t%<w>0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%<w>0, #%n1
+   cmp\\t%<w>0, %<w>1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp<mode>"
-- 
2.20.1



* [PATCH v2 2/9] aarch64: Accept zeros in add<GPI>3_carryin
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns Richard Henderson
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

The expander and the insn pattern did not match, leading to
recognition failures in expand.

	* config/aarch64/aarch64.md (*add<GPI>3_carryin): Accept zeros.
---
 gcc/config/aarch64/aarch64.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9ae51e48dd..a996a5f1c39 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2606,16 +2606,17 @@
    ""
 )
 
-;; Note that add with carry with two zero inputs is matched by cset,
-;; and that add with carry with one zero input is matched by cinc.
+;; While add with carry with two zero inputs will be folded to cset,
+;; and add with carry with one zero input will be folded to cinc,
+;; accept the zeros during initial expansion.
 
 (define_insn "*add<mode>3_carryin"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(plus:GPI
 	  (plus:GPI
 	    (match_operand:GPI 3 "aarch64_carry_operation" "")
-	    (match_operand:GPI 1 "register_operand" "r"))
-	  (match_operand:GPI 2 "register_operand" "r")))]
+	    (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
+	  (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
    ""
    "adc\\t%<w>0, %<w>1, %<w>2"
   [(set_attr "type" "adc_reg")]
-- 
2.20.1



* [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 2/9] aarch64: Accept zeros in add<GPI>3_carryin Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-22 19:30   ` Segher Boessenkool
  2020-03-31 18:34   ` Richard Sandiford
  2020-03-21  2:42 ` [PATCH v2 4/9] aarch64: Add <su>cmp<GPI>_carryinC_m2 Richard Henderson
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

Duplicate all usub_*_carryinC, but use xzr for the output when we
only require the flags output.  The signed versions use sign_extend
instead of zero_extend for combine's benefit.

These will be used shortly for TImode comparisons.

	* config/aarch64/aarch64.md (<su>cmp<GPI>3_carryinC): New.
	(*<su>cmp<GPI>3_carryinC_z1): New.
	(*<su>cmp<GPI>3_carryinC_z2): New.
	(*<su>cmp<GPI>3_carryinC): New.
---
 gcc/config/aarch64/aarch64.md | 50 +++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a996a5f1c39..9b1c3f797f9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3440,6 +3440,18 @@
    ""
 )
 
+(define_expand "<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC"
+   [(set (reg:CC CC_REGNUM)
+	 (compare:CC
+	   (ANY_EXTEND:<DWI>
+	     (match_operand:GPI 0 "register_operand"))
+	   (plus:<DWI>
+	     (ANY_EXTEND:<DWI>
+	       (match_operand:GPI 1 "register_operand"))
+	     (ltu:<DWI> (reg:CC CC_REGNUM) (const_int 0)))))]
+   ""
+)
+
 (define_insn "*usub<GPI:mode>3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
@@ -3457,6 +3469,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC_z1"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	  (const_int 0)
+	  (plus:<DWI>
+	    (ANY_EXTEND:<DWI>
+	      (match_operand:GPI 0 "register_operand" "r"))
+	    (match_operand:<DWI> 1 "aarch64_borrow_operation" ""))))]
+   ""
+   "sbcs\\t<w>zr, <w>zr, %<w>0"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub<GPI:mode>3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
@@ -3472,6 +3497,17 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC_z2"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	  (ANY_EXTEND:<DWI>
+	    (match_operand:GPI 0 "register_operand" "r"))
+	  (match_operand:<DWI> 1 "aarch64_borrow_operation" "")))]
+   ""
+   "sbcs\\t<w>zr, %<w>0, <w>zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub<GPI:mode>3_carryinC"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
@@ -3490,6 +3526,20 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	  (ANY_EXTEND:<DWI>
+	    (match_operand:GPI 0 "register_operand" "r"))
+	  (plus:<DWI>
+	    (ANY_EXTEND:<DWI>
+	      (match_operand:GPI 1 "register_operand" "r"))
+	    (match_operand:<DWI> 2 "aarch64_borrow_operation" ""))))]
+   ""
+   "sbcs\\t<w>zr, %<w>0, %<w>1"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_expand "sub<GPI:mode>3_carryinV"
   [(parallel
      [(set (reg:CC_V CC_REGNUM)
-- 
2.20.1



* [PATCH v2 4/9] aarch64: Add <su>cmp<GPI>_carryinC_m2
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
                   ` (2 preceding siblings ...)
  2020-03-21  2:42 ` [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 5/9] aarch64: Provide expander for sub<GPI>3_compare1 Richard Henderson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

Combine folds an immediate -1 differently from the other
*cmp*_carryinC* patterns.  In this case we can use adcs with
an xzr input, and this occurs frequently when comparing
128-bit values to small negative constants.
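The identity that lets an sbcs against -1 become an adcs against xzr can be
checked in C (a hedged aside; the helper names are mine):
x - (-1) - borrow equals x + (1 - borrow), and on AArch64 the carry flag is
exactly 1 - borrow, so the two forms compute the same value and flags input:

```c
#include <assert.h>
#include <stdint.h>

/* "sbcs zr, x, #-1" form: subtract -1 and the incoming borrow.  */
static uint64_t sbcs_m1(uint64_t x, unsigned carry_in)
{
  unsigned borrow = 1 - carry_in;   /* AArch64: C clear means borrow */
  return x - (uint64_t)-1 - borrow;
}

/* "adcs zr, x, zr" form: add zero plus the incoming carry.  */
static uint64_t adcs_xzr(uint64_t x, unsigned carry_in)
{
  return x + 0 + carry_in;
}
```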

	* config/aarch64/aarch64.md (<su>cmp<GPI>_carryinC_m2): New.
---
 gcc/config/aarch64/aarch64.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9b1c3f797f9..076158b0071 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3452,6 +3452,7 @@
    ""
 )
 
+;; Substituting zero into the first input operand.
 (define_insn "*usub<GPI:mode>3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
@@ -3482,6 +3483,7 @@
   [(set_attr "type" "adc_reg")]
 )
 
+;; Substituting zero into the second input operand.
 (define_insn "*usub<GPI:mode>3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
@@ -3508,6 +3510,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+;; Substituting -1 into the second input operand.
+(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC_m2"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	  (neg:<DWI>
+	    (match_operand:<DWI> 1 "aarch64_carry_operation" ""))
+	  (ANY_EXTEND:<DWI>
+	    (match_operand:GPI 0 "register_operand" "r"))))]
+   ""
+   "adcs\\t<w>zr, %<w>0, <w>zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub<GPI:mode>3_carryinC"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
-- 
2.20.1



* [PATCH v2 5/9] aarch64: Provide expander for sub<GPI>3_compare1
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
                   ` (3 preceding siblings ...)
  2020-03-21  2:42 ` [PATCH v2 4/9] aarch64: Add <su>cmp<GPI>_carryinC_m2 Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 6/9] aarch64: Introduce aarch64_expand_addsubti Richard Henderson
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

In a couple of places we open-code a special case of this
pattern into the more specific sub<GPI>3_compare1_imm.
Centralize that special case into an expander.

	* config/aarch64/aarch64.md (*sub<GPI>3_compare1): Rename
	from sub<GPI>3_compare1.
	(sub<GPI>3_compare1): New expander.
---
 gcc/config/aarch64/aarch64.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 076158b0071..47eeba7311c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3120,7 +3120,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub<mode>3_compare1"
+(define_insn "*sub<mode>3_compare1"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
 	  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3132,6 +3132,26 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub<mode>3_compare1"
+  [(parallel
+    [(set (reg:CC CC_REGNUM)
+	  (compare:CC
+	    (match_operand:GPI 1 "aarch64_reg_or_zero")
+	    (match_operand:GPI 2 "aarch64_reg_or_imm")))
+     (set (match_operand:GPI 0 "register_operand")
+	  (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (aarch64_plus_immediate (operands[2], <MODE>mode))
+    {
+      emit_insn (gen_sub<mode>3_compare1_imm
+		 (operands[0], operands[1], operands[2],
+		  GEN_INT (-INTVAL (operands[2]))));
+      DONE;
+    }
+  operands[2] = force_reg (<MODE>mode, operands[2]);
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
 	(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



* [PATCH v2 6/9] aarch64: Introduce aarch64_expand_addsubti
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
                   ` (4 preceding siblings ...)
  2020-03-21  2:42 ` [PATCH v2 5/9] aarch64: Provide expander for sub<GPI>3_compare1 Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg Richard Henderson
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

Modify aarch64_expand_subvti into a form that handles all
addition and subtraction, whether modulo (wrapping) or with
signed or unsigned overflow checking.

Use expand_insn to put the operands into the proper form, and
do not force values into registers when not required.
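The double-word lowering that the expander drives can be sketched outside
the compiler (illustrative names, not the patch's code) as two 64-bit
operations chained through a carry, which is what the adds/adc and
subs/sbc insn pairs implement:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the TImode add lowering: add the low halves (adds sets C
   on unsigned overflow), then fold the carry into the high halves
   (adc).  The result is hi:lo of the 128-bit sum.  */
static void ti_add(uint64_t lo1, int64_t hi1, uint64_t lo2, int64_t hi2,
                   uint64_t *lo, int64_t *hi)
{
  *lo = lo1 + lo2;
  unsigned carry = *lo < lo1;                    /* adds: carry out */
  *hi = (int64_t)((uint64_t)hi1 + (uint64_t)hi2 + carry);  /* adc */
}
```

Swapping in the overflow-setting variants for the high half (carryinV,
carryinC) is how the same skeleton serves addvti4/uaddvti4 and friends.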

	* config/aarch64/aarch64.c (aarch64_ti_split): New.
	(aarch64_addti_scratch_regs): Remove.
	(aarch64_subvti_scratch_regs): Remove.
	(aarch64_expand_subvti): Remove.
	(aarch64_expand_addsubti): New.
	* config/aarch64/aarch64-protos.h: Update to match.
	* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
	(addvti4, uaddvti4): Likewise.
	(subvti4, usubvti4): Likewise.
	(subti3): Likewise; accept immediates for operand 2.
---
 gcc/config/aarch64/aarch64-protos.h |  10 +-
 gcc/config/aarch64/aarch64.c        | 136 ++++++++--------------------
 gcc/config/aarch64/aarch64.md       | 125 ++++++-------------------
 3 files changed, 67 insertions(+), 204 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d6d668ea920..787085b24d2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-				 rtx *, rtx *,
-				 rtx *, rtx *,
-				 rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
-				  rtx *, rtx *,
-				  rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-			    rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..6263897c9a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20241,117 +20241,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-			    rtx *low_in1, rtx *low_in2,
-			    rtx *high_dest, rtx *high_in1,
-			    rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				   subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+			     subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+			     subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-
-void
-aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-			     rtx *low_in1, rtx *low_in2,
-			     rtx *high_dest, rtx *high_in1,
-			     rtx *high_in2)
-{
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = simplify_gen_subreg (DImode, op1, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
-
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-
-  *high_in1 = simplify_gen_subreg (DImode, op1, TImode,
-				   subreg_highpart_offset (DImode, TImode));
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				   subreg_highpart_offset (DImode, TImode));
-}
-
-/* Generate RTL for 128-bit (TImode) subtraction with overflow.
-
+/* Generate RTL for 128-bit (TImode) addition or subtraction.
    OP0 represents the TImode destination operand 0
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2
-   UNSIGNED_P is true if the operation is being performed on unsigned
-   values.  */
+   OP1 and OP2 represent the TImode input operands.
+
+   Normal or Overflow behaviour is obtained via the INSN_CODE operands:
+   CODE_HI_LO0 is used when the low half of OP2 == 0, otherwise
+   CODE_LO is used on the low halves,
+   CODE_HI is used on the high halves.  */
+
 void
-aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1,
-		       rtx low_in2, rtx high_dest, rtx high_in1,
-		       rtx high_in2, bool unsigned_p)
+aarch64_expand_addsubti (rtx op0, rtx op1, rtx op2,
+			 int code_hi_lo0, int code_lo, int code_hi)
 {
-  if (low_in2 == const0_rtx)
+  rtx low_dest, low_op1, low_op2, high_dest, high_op1, high_op2;
+  struct expand_operand ops[3];
+
+  aarch64_ti_split (op1, &low_op1, &high_op1);
+  aarch64_ti_split (op2, &low_op2, &high_op2);
+
+  if (low_op2 == const0_rtx)
     {
-      low_dest = low_in1;
-      high_in2 = force_reg (DImode, high_in2);
-      if (unsigned_p)
-	emit_insn (gen_subdi3_compare1 (high_dest, high_in1, high_in2));
-      else
-	emit_insn (gen_subvdi_insn (high_dest, high_in1, high_in2));
+      low_dest = low_op1;
+      code_hi = code_hi_lo0;
     }
   else
     {
-      if (aarch64_plus_immediate (low_in2, DImode))
-	emit_insn (gen_subdi3_compare1_imm (low_dest, low_in1, low_in2,
-					    GEN_INT (-INTVAL (low_in2))));
-      else
-	{
-	  low_in2 = force_reg (DImode, low_in2);
-	  emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-	}
-      high_in2 = force_reg (DImode, high_in2);
-
-      if (unsigned_p)
-	emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
-      else
-	emit_insn (gen_subdi3_carryinV (high_dest, high_in1, high_in2));
+      low_dest = gen_reg_rtx (DImode);
+      create_output_operand(&ops[0], low_dest, DImode);
+      create_input_operand(&ops[1], low_op1, DImode);
+      create_input_operand(&ops[2], low_op2, DImode);
+      expand_insn ((insn_code)code_lo, 3, ops);
     }
 
+  high_dest = gen_reg_rtx (DImode);
+  create_output_operand(&ops[0], high_dest, DImode);
+  create_input_operand(&ops[1], high_op1, DImode);
+  create_input_operand(&ops[2], high_op2, DImode);
+  expand_insn ((insn_code)code_hi, 3, ops);
+
   emit_move_insn (gen_lowpart (DImode, op0), low_dest);
   emit_move_insn (gen_highpart (DImode, op0), high_dest);
-
 }
 
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 47eeba7311c..c3fb2292d19 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2044,30 +2044,10 @@
 		 (match_operand:TI 2 "aarch64_reg_or_imm")))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_addti_scratch_regs (operands[1], operands[2],
-			      &low_dest, &op1_low, &op2_low,
-			      &high_dest, &op1_high, &op2_high);
-
-  if (op2_low == const0_rtx)
-    {
-      low_dest = op1_low;
-      if (!aarch64_pluslong_operand (op2_high, DImode))
-	op2_high = force_reg (DImode, op2_high);
-      emit_insn (gen_adddi3 (high_dest, op1_high, op2_high));
-    }
-  else
-    {
-      emit_insn (gen_adddi3_compareC (low_dest, op1_low,
-				      force_reg (DImode, op2_low)));
-      emit_insn (gen_adddi3_carryin (high_dest, op1_high,
-				     force_reg (DImode, op2_high)));
-    }
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_adddi3,
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_carryin);
   DONE;
 })
 
@@ -2078,29 +2058,10 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_addti_scratch_regs (operands[1], operands[2],
-			      &low_dest, &op1_low, &op2_low,
-			      &high_dest, &op1_high, &op2_high);
-
-  if (op2_low == const0_rtx)
-    {
-      low_dest = op1_low;
-      emit_insn (gen_adddi3_compareV (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-  else
-    {
-      emit_insn (gen_adddi3_compareC (low_dest, op1_low,
-				      force_reg (DImode, op2_low)));
-      emit_insn (gen_adddi3_carryinV (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_adddi3_compareV,
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_carryinV);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2112,32 +2073,13 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_addti_scratch_regs (operands[1], operands[2],
-			      &low_dest, &op1_low, &op2_low,
-			      &high_dest, &op1_high, &op2_high);
-
-  if (op2_low == const0_rtx)
-    {
-      low_dest = op1_low;
-      emit_insn (gen_adddi3_compareC (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-  else
-    {
-      emit_insn (gen_adddi3_compareC (low_dest, op1_low,
-				      force_reg (DImode, op2_low)));
-      emit_insn (gen_adddi3_carryinC (high_dest, op1_high,
-				      force_reg (DImode, op2_high)));
-    }
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_compareC,
+			   CODE_FOR_adddi3_carryinC);
   aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
   DONE;
- })
+})
 
 (define_insn "add<mode>3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
@@ -2980,20 +2922,13 @@
 (define_expand "subti3"
   [(set (match_operand:TI 0 "register_operand")
 	(minus:TI (match_operand:TI 1 "aarch64_reg_or_zero")
-		  (match_operand:TI 2 "register_operand")))]
+		  (match_operand:TI 2 "aarch64_reg_or_imm")))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_subvti_scratch_regs (operands[1], operands[2],
-			       &low_dest, &op1_low, &op2_low,
-			       &high_dest, &op1_high, &op2_high);
-
-  emit_insn (gen_subdi3_compare1 (low_dest, op1_low, op2_low));
-  emit_insn (gen_subdi3_carryin (high_dest, op1_high, op2_high));
-
-  emit_move_insn (gen_lowpart (DImode, operands[0]), low_dest);
-  emit_move_insn (gen_highpart (DImode, operands[0]), high_dest);
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_subdi3,
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_subdi3_carryin);
   DONE;
 })
 
@@ -3004,14 +2939,10 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_subvti_scratch_regs (operands[1], operands[2],
-			       &low_dest, &op1_low, &op2_low,
-			       &high_dest, &op1_high, &op2_high);
-  aarch64_expand_subvti (operands[0], low_dest, op1_low, op2_low,
-			 high_dest, op1_high, op2_high, false);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_subvdi_insn,
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_subdi3_carryinV);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -3023,14 +2954,10 @@
    (label_ref (match_operand 3 "" ""))]
   ""
 {
-  rtx low_dest, op1_low, op2_low, high_dest, op1_high, op2_high;
-
-  aarch64_subvti_scratch_regs (operands[1], operands[2],
-				    &low_dest, &op1_low, &op2_low,
-			       &high_dest, &op1_high, &op2_high);
-  aarch64_expand_subvti (operands[0], low_dest, op1_low, op2_low,
-			 high_dest, op1_high, op2_high, true);
-
+  aarch64_expand_addsubti (operands[0], operands[1], operands[2],
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_subdi3_compare1,
+			   CODE_FOR_usubdi3_carryinC);
   aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
   DONE;
 })
-- 
2.20.1



* [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
                   ` (5 preceding siblings ...)
  2020-03-21  2:42 ` [PATCH v2 6/9] aarch64: Introduce aarch64_expand_addsubti Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-22 21:55   ` Segher Boessenkool
  2020-03-21  2:42 ` [PATCH v2 8/9] aarch64: Implement TImode comparisons Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 9/9] aarch64: Implement absti2 Richard Henderson
  8 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.

	* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
	the final comparison for code & cc_reg.
	(aarch64_gen_compare_reg_maybe_ze): Likewise.
	(aarch64_expand_compare_and_swap): Update to match -- do not
	build the final comparison here, but PUT_MODE as necessary.
	(aarch64_split_compare_and_swap): Use prebuilt comparison.
	* config/aarch64/aarch64-simd.md (aarch64_cm<COMPARISONS>di): Likewise.
	(aarch64_cm<UCOMPARISONS>di): Likewise.
	(aarch64_cmtstdi): Likewise.
	* config/aarch64/aarch64-speculation.cc
	(aarch64_speculation_establish_tracker): Likewise.
	* config/aarch64/aarch64.md (cbranch<GPI>4, cbranch<GPF>4): Likewise.
	(mod<GPI>3, abs<GPI>2): Likewise.
	(cstore<GPI>4, cstore<GPF>4): Likewise.
	(cmov<GPI>6, cmov<GPF>6): Likewise.
	(mov<ALLI>cc, mov<GPF><GPI>cc, mov<GPF>cc): Likewise.
	(<NEG_NOT><GPI>cc): Likewise.
	(ffs<GPI>2): Likewise.
	(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c              | 26 +++---
 gcc/config/aarch64/aarch64-simd.md        | 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md             | 96 ++++++++++-------------
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6263897c9a0..9e7c26a8df2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
       cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
       emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
     }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, rtx y,
 	  cc_mode = CC_SWPmode;
 	  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
 	  emit_set_insn (cc_reg, t);
-	  return cc_reg;
+	  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 	}
     }
 
@@ -18506,7 +18506,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
       emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
 						   newval, mod_s));
-      cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+      x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+      PUT_MODE (x, SImode);
     }
   else if (TARGET_OUTLINE_ATOMICS)
     {
@@ -18517,7 +18518,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
       rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
 				      oldval, mode, newval, mode,
 				      XEXP (mem, 0), Pmode);
-      cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+      x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+      PUT_MODE (x, SImode);
     }
   else
     {
@@ -18529,13 +18531,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
       emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 				 is_weak, mod_s, mod_f));
       cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+      x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
     }
 
   if (r_mode != mode)
     rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18610,10 +18612,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
     x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-    {
-      rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-      x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-    }
+    x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			    gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18626,8 +18626,7 @@ aarch64_split_compare_and_swap (rtx operands[])
 	{
 	  /* Emit an explicit compare instruction, so that we can correctly
 	     track the condition codes.  */
-	  rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
-	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	  x = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
 	}
       else
 	x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
@@ -18722,8 +18721,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     {
       /* Emit an explicit compare instruction, so that we can correctly
 	 track the condition codes.  */
-      rtx cc_reg = aarch64_gen_compare_reg (NE, cond, const0_rtx);
-      x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+      x = aarch64_gen_compare_reg (NE, cond, const0_rtx);
     }
   else
     x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 24a11fb5040..69e099a2c23 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4800,10 +4800,8 @@
     if (GP_REGNUM_P (REGNO (operands[0]))
 	&& GP_REGNUM_P (REGNO (operands[1])))
       {
-	machine_mode mode = SELECT_CC_MODE (<CMP>, operands[1], operands[2]);
-	rtx cc_reg = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
-	rtx comparison = gen_rtx_<CMP> (mode, operands[1], operands[2]);
-	emit_insn (gen_cstoredi_neg (operands[0], comparison, cc_reg));
+	rtx cmp = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
+	emit_insn (gen_cstoredi_neg (operands[0], cmp, XEXP (cmp, 0)));
 	DONE;
       }
     /* Otherwise, we expand to a similar pattern which does not
@@ -4863,10 +4861,8 @@
     if (GP_REGNUM_P (REGNO (operands[0]))
 	&& GP_REGNUM_P (REGNO (operands[1])))
       {
-	machine_mode mode = CCmode;
-	rtx cc_reg = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
-	rtx comparison = gen_rtx_<CMP> (mode, operands[1], operands[2]);
-	emit_insn (gen_cstoredi_neg (operands[0], comparison, cc_reg));
+	rtx cmp = aarch64_gen_compare_reg (<CMP>, operands[1], operands[2]);
+	emit_insn (gen_cstoredi_neg (operands[0], cmp, XEXP (cmp, 0)));
 	DONE;
       }
     /* Otherwise, we expand to a similar pattern which does not
@@ -4936,10 +4932,8 @@
 	&& GP_REGNUM_P (REGNO (operands[1])))
       {
 	rtx and_tree = gen_rtx_AND (DImode, operands[1], operands[2]);
-	machine_mode mode = SELECT_CC_MODE (NE, and_tree, const0_rtx);
-	rtx cc_reg = aarch64_gen_compare_reg (NE, and_tree, const0_rtx);
-	rtx comparison = gen_rtx_NE (mode, and_tree, const0_rtx);
-	emit_insn (gen_cstoredi_neg (operands[0], comparison, cc_reg));
+	rtx cmp = aarch64_gen_compare_reg (NE, and_tree, const0_rtx);
+	emit_insn (gen_cstoredi_neg (operands[0], cmp, XEXP (cmp, 0)));
 	DONE;
       }
     /* Otherwise, we expand to a similar pattern which does not
diff --git a/gcc/config/aarch64/aarch64-speculation.cc b/gcc/config/aarch64/aarch64-speculation.cc
index f490b64ae61..87d5964871b 100644
--- a/gcc/config/aarch64/aarch64-speculation.cc
+++ b/gcc/config/aarch64/aarch64-speculation.cc
@@ -162,9 +162,8 @@ aarch64_speculation_establish_tracker ()
   rtx sp = gen_rtx_REG (DImode, SP_REGNUM);
   rtx tracker = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
   start_sequence ();
-  rtx cc = aarch64_gen_compare_reg (EQ, sp, const0_rtx);
-  emit_insn (gen_cstoredi_neg (tracker,
-			       gen_rtx_NE (CCmode, cc, const0_rtx), cc));
+  rtx x = aarch64_gen_compare_reg (NE, sp, const0_rtx);
+  emit_insn (gen_cstoredi_neg (tracker, x, XEXP (x, 0)));
   rtx_insn *seq = get_insns ();
   end_sequence ();
   return seq;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c3fb2292d19..0b44c814bae 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -464,12 +464,12 @@
 			   (label_ref (match_operand 3 "" ""))
 			   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+{
+  operands[0] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 					 operands[2]);
+  operands[1] = XEXP (operands[0], 0);
   operands[2] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cbranch<mode>4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
@@ -478,12 +478,12 @@
 			   (label_ref (match_operand 3 "" ""))
 			   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+{
+  operands[0] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 					 operands[2]);
+  operands[1] = XEXP (operands[0], 0);
   operands[2] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cbranchcc4"
   [(set (pc) (if_then_else
@@ -598,9 +598,8 @@
     if (val == 2)
       {
 	rtx masked = gen_reg_rtx (<MODE>mode);
-	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+	rtx x = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
 	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
-	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
 	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
 	DONE;
       }
@@ -3634,8 +3633,7 @@
    (match_operand:GPI 1 "register_operand")]
   ""
   {
-    rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
-    rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+    rtx x = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
     emit_insn (gen_csneg3<mode>_insn (operands[0], x, operands[1], operands[1]));
     DONE;
   }
@@ -4049,12 +4047,13 @@
 	 [(match_operand:GPI 2 "register_operand")
 	  (match_operand:GPI 3 "aarch64_plus_operand")]))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  PUT_MODE (operands[1], SImode);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cstorecc4"
   [(set (match_operand:SI 0 "register_operand")
@@ -4062,11 +4061,10 @@
 	[(match_operand 2 "cc_register")
          (match_operand 3 "const0_operand")]))]
   ""
-"{
+{
   emit_insn (gen_rtx_SET (operands[0], operands[1]));
   DONE;
-}")
-
+})
 
 (define_expand "cstore<mode>4"
   [(set (match_operand:SI 0 "register_operand")
@@ -4074,12 +4072,13 @@
 	 [(match_operand:GPF 2 "register_operand")
 	  (match_operand:GPF 3 "aarch64_fp_compare_operand")]))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  PUT_MODE (operands[1], SImode);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_insn "aarch64_cstore<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
@@ -4165,12 +4164,12 @@
 	 (match_operand:GPI 4 "register_operand")
 	 (match_operand:GPI 5 "register_operand")))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_expand "cmov<mode>6"
   [(set (match_operand:GPF 0 "register_operand")
@@ -4181,12 +4180,12 @@
 	 (match_operand:GPF 4 "register_operand")
 	 (match_operand:GPF 5 "register_operand")))]
   ""
-  "
-  operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
-				      operands[3]);
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  operands[2] = XEXP (operands[1], 0);
   operands[3] = const0_rtx;
-  "
-)
+})
 
 (define_insn "*cmov<mode>_insn"
   [(set (match_operand:ALLI 0 "register_operand" "=r,r,r,r,r,r,r")
@@ -4263,15 +4262,13 @@
 			   (match_operand:ALLI 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				     XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4282,15 +4279,13 @@
 			  (match_operand:GPF 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				  XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4301,15 +4296,13 @@
 			  (match_operand:GPF 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				  XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4320,15 +4313,13 @@
 			  (match_operand:GPI 3 "register_operand")))]
   ""
   {
-    rtx ccreg;
     enum rtx_code code = GET_CODE (operands[1]);
 
     if (code == UNEQ || code == LTGT)
       FAIL;
 
-    ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
-				      XEXP (operands[1], 1));
-    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    operands[1] = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+					   XEXP (operands[1], 1));
   }
 )
 
@@ -4837,8 +4828,7 @@
    (match_operand:GPI 1 "register_operand")]
   ""
   {
-    rtx ccreg = aarch64_gen_compare_reg (EQ, operands[1], const0_rtx);
-    rtx x = gen_rtx_NE (VOIDmode, ccreg, const0_rtx);
+    rtx x = aarch64_gen_compare_reg (NE, operands[1], const0_rtx);
 
     emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
     emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
-- 
2.20.1




* [PATCH v2 8/9] aarch64: Implement TImode comparisons
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
                   ` (6 preceding siblings ...)
  2020-03-21  2:42 ` [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  2020-03-21  2:42 ` [PATCH v2 9/9] aarch64: Implement absti2 Richard Henderson
  8 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

Use ccmp to perform all TImode comparisons branchless.

	* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
	the comparisons for TImode, not just NE.
	* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
---
 gcc/config/aarch64/aarch64.c  | 130 ++++++++++++++++++++++++++++++----
 gcc/config/aarch64/aarch64.md |  28 ++++++++
 2 files changed, 144 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9e7c26a8df2..6ae0ea388ce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2333,32 +2333,134 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
     {
-      gcc_assert (code == NE);
-
-      cc_mode = CCmode;
-      cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
       rtx x_lo = operand_subword (x, 0, 0, TImode);
-      rtx y_lo = operand_subword (y, 0, 0, TImode);
-      emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
       rtx x_hi = operand_subword (x, 1, 0, TImode);
-      rtx y_hi = operand_subword (y, 1, 0, TImode);
-      emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-			       gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-			       GEN_INT (AARCH64_EQ)));
+      struct expand_operand ops[2];
+      rtx y_lo, y_hi, tmp;
+
+      if (CONST_INT_P (y))
+	{
+	  HOST_WIDE_INT y_int = INTVAL (y);
+
+	  y_lo = y;
+	  switch (code)
+	    {
+	    case EQ:
+	    case NE:
+	      /* For equality, IOR the two halves together.  If this gets
+		 used for a branch, we expect this to fold to cbz/cbnz;
+		 otherwise it's no larger than cmp+ccmp below.  Beware of
+		 the compare-and-swap post-reload split and use cmp+ccmp.  */
+	      if (y_int == 0 && can_create_pseudo_p ())
+		{
+		  tmp = gen_reg_rtx (DImode);
+		  emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+		  emit_insn (gen_cmpdi (tmp, const0_rtx));
+		  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+		  goto done;
+		}
+		break;
+
+	    case LE:
+	    case GT:
+	      /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+		 keeps the constant operand.  The cstoreti and cbranchti
+		 operand predicates require aarch64_plus_operand, which
+		 means this increment cannot overflow.  */
+	      y_lo = gen_int_mode (++y_int, DImode);
+	      code = (code == LE ? LT : GE);
+	      /* fall through */
+
+	    case LT:
+	    case GE:
+	      /* Check only the sign bit using tst, or fold to tbz/tbnz.  */
+	      if (y_int == 0)
+		{
+		  cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+		  tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+		  tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+		  emit_set_insn (cc_reg, tmp);
+		  code = (code == LT ? NE : EQ);
+		  goto done;
+		}
+	      break;
+
+	    default:
+	      break;
+	    }
+	  y_hi = (y_int < 0 ? constm1_rtx : const0_rtx);
+	}
+      else
+	{
+	  y_lo = operand_subword (y, 0, 0, TImode);
+	  y_hi = operand_subword (y, 1, 0, TImode);
+	}
+
+      switch (code)
+	{
+	case LEU:
+	case GTU:
+	case LE:
+	case GT:
+	  std::swap (x_lo, y_lo);
+	  std::swap (x_hi, y_hi);
+	  code = swap_condition (code);
+	  break;
+
+	default:
+	  break;
+	}
+
+      /* Emit cmpdi, forcing operands into registers as required. */
+      create_input_operand (&ops[0], x_lo, DImode);
+      create_input_operand (&ops[1], y_lo, DImode);
+      expand_insn (CODE_FOR_cmpdi, 2, ops);
+
+      cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+      switch (code)
+	{
+	case EQ:
+	case NE:
+	  /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+	  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+				   gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+				   GEN_INT (AARCH64_EQ)));
+	  break;
+
+	case LTU:
+	case GEU:
+	  /* For LTU, (x - y), as double-word arithmetic.  */
+	  create_input_operand (&ops[0], x_hi, DImode);
+	  create_input_operand (&ops[1], y_hi, DImode);
+	  expand_insn (CODE_FOR_ucmpdi3_carryinC, 2, ops);
+	  /* The result is entirely within the C bit. */
+	  break;
+
+	case LT:
+	case GE:
+	  /* For LT, (x - y), as double-word arithmetic.  */
+	  create_input_operand (&ops[0], x_hi, DImode);
+	  create_input_operand (&ops[1], y_hi, DImode);
+	  expand_insn (CODE_FOR_scmpdi3_carryinC, 2, ops);
+	  /* The result is within the N and V bits -- normal LT/GE. */
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
     }
   else
     {
-      cc_mode = SELECT_CC_MODE (code, x, y);
+      machine_mode cc_mode = SELECT_CC_MODE (code, x, y);
       cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
       emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
     }
+
+ done:
   return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0b44c814bae..284a8038e28 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -471,6 +471,20 @@
   operands[2] = const0_rtx;
 })
 
+(define_expand "cbranchti4"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+			    [(match_operand:TI 1 "register_operand")
+			     (match_operand:TI 2 "aarch64_plus_operand")])
+			   (label_ref (match_operand 3 "" ""))
+			   (pc)))]
+  ""
+{
+  operands[0] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
+					 operands[2]);
+  operands[1] = XEXP (operands[0], 0);
+  operands[2] = const0_rtx;
+})
+
 (define_expand "cbranch<mode>4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand:GPF 1 "register_operand")
@@ -4055,6 +4069,20 @@
   operands[3] = const0_rtx;
 })
 
+(define_expand "cstoreti4"
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operator:SI 1 "aarch64_comparison_operator"
+	 [(match_operand:TI 2 "register_operand")
+	  (match_operand:TI 3 "aarch64_plus_operand")]))]
+  ""
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2],
+				         operands[3]);
+  PUT_MODE (operands[1], SImode);
+  operands[2] = XEXP (operands[1], 0);
+  operands[3] = const0_rtx;
+})
+
 (define_expand "cstorecc4"
   [(set (match_operand:SI 0 "register_operand")
        (match_operator 1 "aarch64_comparison_operator_mode"
-- 
2.20.1



* [PATCH v2 9/9] aarch64: Implement absti2
  2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
                   ` (7 preceding siblings ...)
  2020-03-21  2:42 ` [PATCH v2 8/9] aarch64: Implement TImode comparisons Richard Henderson
@ 2020-03-21  2:42 ` Richard Henderson
  8 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-21  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.earnshaw, richard.sandiford, marcus.shawcroft,
	kyrylo.tkachov, Wilco.Dijkstra

	* config/aarch64/aarch64.md (absti2): New.
---
 gcc/config/aarch64/aarch64.md | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 284a8038e28..7a112f89487 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3653,6 +3653,36 @@
   }
 )
 
+(define_expand "absti2"
+  [(match_operand:TI 0 "register_operand")
+   (match_operand:TI 1 "register_operand")]
+  ""
+  {
+    rtx lo_op1 = gen_lowpart (DImode, operands[1]);
+    rtx hi_op1 = gen_highpart (DImode, operands[1]);
+    rtx lo_tmp = gen_reg_rtx (DImode);
+    rtx hi_tmp = gen_reg_rtx (DImode);
+    rtx x;
+
+    emit_insn (gen_negdi_carryout (lo_tmp, lo_op1));
+    emit_insn (gen_negvdi_carryinV (hi_tmp, hi_op1));
+
+    rtx cc = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+
+    x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+    x = gen_rtx_IF_THEN_ELSE (DImode, x, lo_tmp, lo_op1);
+    emit_insn (gen_rtx_SET (lo_tmp, x));
+
+    x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+    x = gen_rtx_IF_THEN_ELSE (DImode, x, hi_tmp, hi_op1);
+    emit_insn (gen_rtx_SET (hi_tmp, x));
+
+    emit_move_insn (gen_lowpart (DImode, operands[0]), lo_tmp);
+    emit_move_insn (gen_highpart (DImode, operands[0]), hi_tmp);
+    DONE;
+  }
+)
+
 (define_insn "neg<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r,w")
 	(neg:GPI (match_operand:GPI 1 "register_operand" "r,w")))]
-- 
2.20.1



* Re: [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-03-21  2:42 ` [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns Richard Henderson
@ 2020-03-22 19:30   ` Segher Boessenkool
  2020-03-22 20:40     ` Richard Henderson
  2020-03-31 18:34   ` Richard Sandiford
  1 sibling, 1 reply; 21+ messages in thread
From: Segher Boessenkool @ 2020-03-22 19:30 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, richard.earnshaw, Wilco.Dijkstra, marcus.shawcroft

Hi!

On Fri, Mar 20, 2020 at 07:42:25PM -0700, Richard Henderson via Gcc-patches wrote:
> Duplicate all usub_*_carryinC, but use xzr for the output when we
> only require the flags output.  The signed versions use sign_extend
> instead of zero_extend for combine's benefit.

You actually use ANY_EXTEND, which makes a lot more sense :-)

Did you see combine create a sign_extend, ever?  Or do those just come
from combining other insns that already contain a sign_extend?


Segher


* Re: [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-03-22 19:30   ` Segher Boessenkool
@ 2020-03-22 20:40     ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-22 20:40 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: gcc-patches, richard.earnshaw, Wilco.Dijkstra, marcus.shawcroft

On 3/22/20 12:30 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Mar 20, 2020 at 07:42:25PM -0700, Richard Henderson via Gcc-patches wrote:
>> Duplicate all usub_*_carryinC, but use xzr for the output when we
>> only require the flags output.  The signed versions use sign_extend
>> instead of zero_extend for combine's benefit.
> 
> You actually use ANY_EXTEND, which makes a lot more sense :-)
> 
> Did you see combine create a sign_extend, ever?  Or do those just come
> from combining other insns that already contain a sign_extend?

In the penultimate patch, for cmpti, I emit this sign_extend'ed pattern
manually, so that the rtl actually gets the proper description of the
comparison of the high half of the TImode variable.


r~


* Re: [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg
  2020-03-21  2:42 ` [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg Richard Henderson
@ 2020-03-22 21:55   ` Segher Boessenkool
  2020-03-22 22:21     ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Segher Boessenkool @ 2020-03-22 21:55 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, richard.earnshaw, Wilco.Dijkstra, marcus.shawcroft

Hi!

On Fri, Mar 20, 2020 at 07:42:29PM -0700, Richard Henderson via Gcc-patches wrote:
> @@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, rtx y,
>  	  cc_mode = CC_SWPmode;
>  	  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
>  	  emit_set_insn (cc_reg, t);
> -	  return cc_reg;
> +	  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
>  	}
>      }
>  
> @@ -18506,7 +18506,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
>  
>        emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
>  						   newval, mod_s));
> -      cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
> +      x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
> +      PUT_MODE (x, SImode);

Maybe this stuff would be simpler (and more obviously correct) if it
was more explicit that CC_REGNUM is a fixed register, and the code
it directly everywhere?

(Something for stage1 I suppose, if you / the aarch people want to do
this at all :-) )

This patch does look correct to me, fwiw.


Segher


* Re: [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg
  2020-03-22 21:55   ` Segher Boessenkool
@ 2020-03-22 22:21     ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-22 22:21 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Henderson
  Cc: richard.earnshaw, gcc-patches, marcus.shawcroft, Wilco.Dijkstra

On 3/22/20 2:55 PM, Segher Boessenkool wrote:
> Maybe this stuff would be simpler (and more obviously correct) if it
> was more explicit that CC_REGNUM is a fixed register, and the code would use
> it directly everywhere?

Indeed the biggest issue I have in this patch is what CC_MODE to expose from
the high-half compare.

For unsigned inequality, only the C bit is valid.  For signed inequality, only
the N + V bits.  For equality, only the Z bit.

I am trying to expose that with the multiple creations of CC_REGNUM
used within the comparison, which are indeed sanity-checked against the
comparison code via %m/%M.

But the mode of the CC_REGNUM does not necessarily match up with the mode of
the comparison that generates it.  And we do not have a CC_NVmode, so I'm using
full CCmode for that.

This is the part of the patch that could use the most feedback.


r~


* Re: [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares
  2020-03-21  2:42 ` [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares Richard Henderson
@ 2020-03-31 16:55   ` Richard Sandiford
  2020-03-31 17:15     ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2020-03-31 16:55 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, richard.earnshaw, marcus.shawcroft, kyrylo.tkachov,
	Wilco.Dijkstra

Richard Henderson <richard.henderson@linaro.org> writes:
> While cmp (extended register) and cmp (immediate) uses <Wn|WSP>,
> cmp (shifted register) uses <Wn>.  So we can perform cmp xzr, x0.
>
> For ccmp, we only have <Wn> as an input.
>
> 	* config/aarch64/aarch64.md (cmp<GPI>): For operand 0, use
> 	aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
> 	and accept Z.
> 	(@ccmpcc<GPI>): For operand 0, use aarch64_reg_or_zero and Z.
> 	(@ccmpcc<GPI>_rev): Likewise.

Looks good, but...

> ---
>  gcc/config/aarch64/aarch64.md | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c7c4d1dd519..b9ae51e48dd 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -502,7 +502,7 @@
>  	   [(match_operand 0 "cc_register" "")
>  	    (const_int 0)])
>  	  (compare:CC_ONLY
> -	    (match_operand:GPI 2 "register_operand" "r,r,r")
> +	    (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
>  	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
>  	  (unspec:CC_ONLY
>  	    [(match_operand 5 "immediate_operand")]
> @@ -542,7 +542,7 @@
>  	    [(match_operand 5 "immediate_operand")]
>  	    UNSPEC_NZCV)
>  	  (compare:CC_ONLY
> -	    (match_operand:GPI 2 "register_operand" "r,r,r")
> +	    (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
>  	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))))]
>    ""
>    "@
> @@ -3961,14 +3961,14 @@
>  
>  (define_insn "cmp<mode>"
>    [(set (reg:CC CC_REGNUM)
> -	(compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
> -		    (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
> +	(compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
> +		    (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
>    ""
>    "@
> -   cmp\\t%<w>0, %<w>1
>     cmp\\t%<w>0, %1
> -   cmn\\t%<w>0, #%n1"
> -  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
> +   cmn\\t%<w>0, #%n1
> +   cmp\\t%<w>0, %<w>1"
> +  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
>  )
>  
>  (define_insn "fcmp<mode>"

...does adding 'Z' to operand 1 enable any new combinations?
I guess it allows (compare:CC (const_int 0) (const_int 0)),
but it's borderline whether that should be valid rtl.

Richard


* Re: [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares
  2020-03-31 16:55   ` Richard Sandiford
@ 2020-03-31 17:15     ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-31 17:15 UTC (permalink / raw)
  To: gcc-patches, richard.earnshaw, marcus.shawcroft, kyrylo.tkachov,
	Wilco.Dijkstra, richard.sandiford

On 3/31/20 9:55 AM, Richard Sandiford wrote:
>>  (define_insn "cmp<mode>"
>>    [(set (reg:CC CC_REGNUM)
>> -	(compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
>> -		    (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
>> +	(compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
>> +		    (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
>>    ""
>>    "@
>> -   cmp\\t%<w>0, %<w>1
>>     cmp\\t%<w>0, %1
>> -   cmn\\t%<w>0, #%n1"
>> -  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>> +   cmn\\t%<w>0, #%n1
>> +   cmp\\t%<w>0, %<w>1"
>> +  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
>>  )
>>  
>>  (define_insn "fcmp<mode>"
> 
> ...does adding 'Z' to operand 1 enable any new combinations?

Not useful ones, on reflection, but given it's a valid combination, it's easier
to include it than not.

I can certainly remove that.

r~



* Re: [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-03-21  2:42 ` [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns Richard Henderson
  2020-03-22 19:30   ` Segher Boessenkool
@ 2020-03-31 18:34   ` Richard Sandiford
  2020-03-31 22:44     ` Richard Henderson
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2020-03-31 18:34 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, richard.earnshaw, marcus.shawcroft, kyrylo.tkachov,
	Wilco.Dijkstra

Richard Henderson <richard.henderson@linaro.org> writes:
> Duplicate all usub_*_carryinC, but use xzr for the output when we
> only require the flags output.  The signed versions use sign_extend
> instead of zero_extend for combine's benefit.
>
> These will be used shortly for TImode comparisons.
>
> 	* config/aarch64/aarch64.md (<su>cmp<GPI>3_carryinC): New.
> 	(*<su>cmp<GPI>3_carryinC_z1): New.
> 	(*<su>cmp<GPI>3_carryinC_z2): New.
> 	(*<su>cmp<GPI>3_carryinC): New.
> ---
>  gcc/config/aarch64/aarch64.md | 50 +++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a996a5f1c39..9b1c3f797f9 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3440,6 +3440,18 @@
>     ""
>  )
>  
> +(define_expand "<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC"
> +   [(set (reg:CC CC_REGNUM)
> +	 (compare:CC
> +	   (ANY_EXTEND:<DWI>
> +	     (match_operand:GPI 0 "register_operand"))
> +	   (plus:<DWI>
> +	     (ANY_EXTEND:<DWI>
> +	       (match_operand:GPI 1 "register_operand"))
> +	     (ltu:<DWI> (reg:CC CC_REGNUM) (const_int 0)))))]
> +   ""
> +)
> +
>  (define_insn "*usub<GPI:mode>3_carryinC_z1"
>    [(set (reg:CC CC_REGNUM)
>  	(compare:CC
> @@ -3457,6 +3469,19 @@
>    [(set_attr "type" "adc_reg")]
>  )
>  
> +(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC_z1"
> +  [(set (reg:CC CC_REGNUM)
> +	(compare:CC
> +	  (const_int 0)
> +	  (plus:<DWI>
> +	    (ANY_EXTEND:<DWI>
> +	      (match_operand:GPI 0 "register_operand" "r"))
> +	    (match_operand:<DWI> 1 "aarch64_borrow_operation" ""))))]
> +   ""
> +   "sbcs\\t<w>zr, <w>zr, %<w>0"
> +  [(set_attr "type" "adc_reg")]
> +)
> +
>  (define_insn "*usub<GPI:mode>3_carryinC_z2"
>    [(set (reg:CC CC_REGNUM)
>  	(compare:CC
> @@ -3472,6 +3497,17 @@
>    [(set_attr "type" "adc_reg")]
>  )
>  
> +(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC_z2"
> +  [(set (reg:CC CC_REGNUM)
> +	(compare:CC
> +	  (ANY_EXTEND:<DWI>
> +	    (match_operand:GPI 0 "register_operand" "r"))
> +	  (match_operand:<DWI> 1 "aarch64_borrow_operation" "")))]
> +   ""
> +   "sbcs\\t<w>zr, %<w>0, <w>zr"
> +  [(set_attr "type" "adc_reg")]
> +)
> +
>  (define_insn "*usub<GPI:mode>3_carryinC"
>    [(set (reg:CC CC_REGNUM)
>  	(compare:CC
> @@ -3490,6 +3526,20 @@
>    [(set_attr "type" "adc_reg")]
>  )
>  
> +(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC"
> +  [(set (reg:CC CC_REGNUM)
> +	(compare:CC
> +	  (ANY_EXTEND:<DWI>
> +	    (match_operand:GPI 0 "register_operand" "r"))
> +	  (plus:<DWI>
> +	    (ANY_EXTEND:<DWI>
> +	      (match_operand:GPI 1 "register_operand" "r"))
> +	    (match_operand:<DWI> 2 "aarch64_borrow_operation" ""))))]
> +   ""
> +   "sbcs\\t<w>zr, %<w>0, %<w>1"
> +  [(set_attr "type" "adc_reg")]
> +)

I guess this feeds into your reply to Segher's comment for 7/9,
but I think:

   (compare:CC X Y)

is always supposed to be the NZCV flags result of X - Y, as computed in
the mode of X and Y.  If so, it seems like the type of extension should
matter.  E.g. the N flag ought to be set for:

  (compare:CC
    (sign_extend 0xf...)
    (plus (sign_extend 0x7...)
          (ltu ...)))

but ought to be clear for:

  (compare:CC
    (zero_extend 0xf...)
    (plus (zero_extend 0x7...)
          (ltu ...)))

If so, I guess this is a bug in the existing code...

Thanks,
Richard


* Re: [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-03-31 18:34   ` Richard Sandiford
@ 2020-03-31 22:44     ` Richard Henderson
  2020-04-01 12:37       ` Segher Boessenkool
  2020-04-01 16:28       ` Richard Sandiford
  0 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2020-03-31 22:44 UTC (permalink / raw)
  To: gcc-patches, richard.earnshaw, marcus.shawcroft, kyrylo.tkachov,
	Wilco.Dijkstra, richard.sandiford, Segher Boessenkool

On 3/31/20 11:34 AM, Richard Sandiford wrote:
>> +(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC"
>> +  [(set (reg:CC CC_REGNUM)
>> +	(compare:CC
>> +	  (ANY_EXTEND:<DWI>
>> +	    (match_operand:GPI 0 "register_operand" "r"))
>> +	  (plus:<DWI>
>> +	    (ANY_EXTEND:<DWI>
>> +	      (match_operand:GPI 1 "register_operand" "r"))
>> +	    (match_operand:<DWI> 2 "aarch64_borrow_operation" ""))))]
>> +   ""
>> +   "sbcs\\t<w>zr, %<w>0, %<w>1"
>> +  [(set_attr "type" "adc_reg")]
>> +)
> 
> I guess this feeds into your reply to Segher's comment for 7/9,
> but I think:
> 
>    (compare:CC X Y)
> 
> is always supposed to be the NZCV flags result of X - Y, as computed in
> the mode of X and Y.  If so, it seems like the type of extension should
> matter.  E.g. the N flag ought to be set for:
> 
>   (compare:CC
>     (sign_extend 0xf...)
>     (plus (sign_extend 0x7...)
>           (ltu ...)))
> 
> but ought to be clear for:
> 
>   (compare:CC
>     (zero_extend 0xf...)
>     (plus (zero_extend 0x7...)
>           (ltu ...)))
> 
> If so, I guess this is a bug in the existing code...

The subject of CC modes is a sticky one.  It mostly depends on what combine
is able to do with the patterns.

For instance, in your example above, even for the signed case, the N bit
cannot be examined by itself, because examining N alone would only be valid
for a comparison against zero, like

    (compare (plus (reg) (reg))
             (const_int 0))

For this particular bit of rtl, the only valid comparison is N == V, i.e. GE/LT.

If we add a new CC mode for this, what would you call it?  Probably not
CC_NVmode, because to me that implies you can use either N or V, but it doesn't
imply you must examine both.

If we add more CC modes, does that mean that we have to improve SELECT_CC_MODE
to match those patterns?  Or do we add new CC modes just so that combine's use
of SELECT_CC_MODE *cannot* match them?


r~


* Re: [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-03-31 22:44     ` Richard Henderson
@ 2020-04-01 12:37       ` Segher Boessenkool
  2020-04-01 16:28       ` Richard Sandiford
  1 sibling, 0 replies; 21+ messages in thread
From: Segher Boessenkool @ 2020-04-01 12:37 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, richard.earnshaw, marcus.shawcroft, kyrylo.tkachov,
	Wilco.Dijkstra, richard.sandiford

On Tue, Mar 31, 2020 at 03:44:51PM -0700, Richard Henderson wrote:
> If we add more CC modes, does that mean that we have to improve SELECT_CC_MODE
> to match those patterns?  Or do we add new CC modes just so that combine's use
> of SELECT_CC_MODE *cannot* match them?

Adding new CC modes isn't helpful if nothing ever generates them.  To
have combine decide to use a different CC mode for a combined insn, you
need SELECT_CC_MODE.

The result of a combination is always valid (as RTL) with the original
CC mode, but such an insn might not exist, and perhaps it does exist
with some weaker CC mode (or more general CC mode, in some cases).


Segher


* Re: [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-03-31 22:44     ` Richard Henderson
  2020-04-01 12:37       ` Segher Boessenkool
@ 2020-04-01 16:28       ` Richard Sandiford
  2020-04-01 17:14         ` Richard Henderson
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2020-04-01 16:28 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, richard.earnshaw, marcus.shawcroft, kyrylo.tkachov,
	Wilco.Dijkstra, Segher Boessenkool

Richard Henderson <richard.henderson@linaro.org> writes:
> On 3/31/20 11:34 AM, Richard Sandiford wrote:
>>> +(define_insn "*<ANY_EXTEND:su>cmp<GPI:mode>3_carryinC"
>>> +  [(set (reg:CC CC_REGNUM)
>>> +	(compare:CC
>>> +	  (ANY_EXTEND:<DWI>
>>> +	    (match_operand:GPI 0 "register_operand" "r"))
>>> +	  (plus:<DWI>
>>> +	    (ANY_EXTEND:<DWI>
>>> +	      (match_operand:GPI 1 "register_operand" "r"))
>>> +	    (match_operand:<DWI> 2 "aarch64_borrow_operation" ""))))]
>>> +   ""
>>> +   "sbcs\\t<w>zr, %<w>0, %<w>1"
>>> +  [(set_attr "type" "adc_reg")]
>>> +)
>> 
>> I guess this feeds into your reply to Segher's comment for 7/9,
>> but I think:
>> 
>>    (compare:CC X Y)
>> 
>> is always supposed to be the NZCV flags result of X - Y, as computed in
>> the mode of X and Y.  If so, it seems like the type of extension should
>> matter.  E.g. the N flag ought to be set for:
>> 
>>   (compare:CC
>>     (sign_extend 0xf...)
>>     (plus (sign_extend 0x7...)
>>           (ltu ...)))
>> 
>> but ought to be clear for:
>> 
>>   (compare:CC
>>     (zero_extend 0xf...)
>>     (plus (zero_extend 0x7...)
>>           (ltu ...)))
>> 
>> If so, I guess this is a bug in the existing code...
>
> The subject of CCmodes is a sticky one.  It mostly all depends on what combine
> is able to do with the patterns.
>
> For instance, your choice of example above, even for signed, the N bit cannot
> be examined by itself, because that would only be valid for a comparison
> against zero, like
>
>     (compare (plus (reg) (reg))
>              (const_int 0))
>
> For this particular bit of rtl, the only valid comparison is N == V,
> i.e. GE/LT.
>
> If we add a new CC mode for this, what would you call it?  Probably not
> CC_NVmode, because to me that implies you can use either N or V, but it doesn't
> imply you must examine both.
>
> If we add more CC modes, does that mean that we have to improve SELECT_CC_MODE
> to match those patterns?  Or do we add new CC modes just so that combine's use
> of SELECT_CC_MODE *cannot* match them?

Yeah, looks awkward.  There isn't AFAICT any way to describe the full
NZCV result of SBCS as a compare, in the case where C is possibly zero.
So I guess if we're going to continue using compare then we need to cook
up a new mode like you say.

How important is it to describe the flags operation as a compare though?
Could we instead use an unspec with three inputs, and keep it as :CC?
That would still allow special-case matching for zero operands.

Thanks,
Richard


* Re: [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns
  2020-04-01 16:28       ` Richard Sandiford
@ 2020-04-01 17:14         ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2020-04-01 17:14 UTC (permalink / raw)
  To: gcc-patches, richard.earnshaw, marcus.shawcroft, kyrylo.tkachov,
	Wilco.Dijkstra, Segher Boessenkool, richard.sandiford

On 4/1/20 9:28 AM, Richard Sandiford wrote:
> How important is it to describe the flags operation as a compare though?
> Could we instead use an unspec with three inputs, and keep it as :CC?
> That would still allow special-case matching for zero operands.

I'm not sure.

My guess is that the only interesting optimization for ADC/SBC is when the
optimizers determine that the low part of op2 is zero, so that we can fold
  [(set (reg cc) (compare ...))
   (set (reg t0) (sub (reg a0) (reg b0)))]

  [(set (reg cc) (compare ...))
   (set (reg t1) (sub (reg a1)
		   (sub (reg b1)
		     (geu (reg cc) (const 0)))))]

to

  [(set (reg t0) (reg a0)]

  [(set (reg cc) (compare ...))
   (set (reg t1) (sub (reg a1) (reg b1))]

which combine should be able to do by propagating zeros across the compare+geu.

Though I suppose it's still possible to handle this with unspecs and
define_split, so that

  [(set (reg cc)
        (unspec [(reg a1) (reg b1) (geu ...)]
                UNSPEC_SBCS))
   (set (reg t1) ...)]

when the geu folds to (const_int 0), we can split this to a plain sub.

I'll see if I can make this work with a minimum of effort.


r~


end of thread, other threads:[~2020-04-01 17:14 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-21  2:42 [PATCH v2 0/9] aarch64: Implement TImode comparisons Richard Henderson
2020-03-21  2:42 ` [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares Richard Henderson
2020-03-31 16:55   ` Richard Sandiford
2020-03-31 17:15     ` Richard Henderson
2020-03-21  2:42 ` [PATCH v2 2/9] aarch64: Accept zeros in add<GPI>3_carryin Richard Henderson
2020-03-21  2:42 ` [PATCH v2 3/9] aarch64: Add <su>cmp_*_carryinC patterns Richard Henderson
2020-03-22 19:30   ` Segher Boessenkool
2020-03-22 20:40     ` Richard Henderson
2020-03-31 18:34   ` Richard Sandiford
2020-03-31 22:44     ` Richard Henderson
2020-04-01 12:37       ` Segher Boessenkool
2020-04-01 16:28       ` Richard Sandiford
2020-04-01 17:14         ` Richard Henderson
2020-03-21  2:42 ` [PATCH v2 4/9] aarch64: Add <su>cmp<GPI>_carryinC_m2 Richard Henderson
2020-03-21  2:42 ` [PATCH v2 5/9] aarch64: Provide expander for sub<GPI>3_compare1 Richard Henderson
2020-03-21  2:42 ` [PATCH v2 6/9] aarch64: Introduce aarch64_expand_addsubti Richard Henderson
2020-03-21  2:42 ` [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg Richard Henderson
2020-03-22 21:55   ` Segher Boessenkool
2020-03-22 22:21     ` Richard Henderson
2020-03-21  2:42 ` [PATCH v2 8/9] aarch64: Implement TImode comparisons Richard Henderson
2020-03-21  2:42 ` [PATCH v2 9/9] aarch64: Implement absti2 Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).