public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
@ 2024-05-15 10:28 Tamar Christina
  2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 10:28 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]

Hi All,

Some Neoverse Software Optimization Guides (SWoG) have a clause stating that,
for predicated operations that also produce a predicate, it is preferred that
the codegen use a different register for the destination than for the input
predicate, in order to avoid a performance overhead.

This of course has the downside that it increases register pressure, and so
it should be done with care.  Additionally, not all micro-architectures have
this consideration, so it shouldn't be enabled by default.

The patch series adds support for conditional early clobbers through a
combination of new alternatives and attributes that control their
availability.

Under high register pressure we also rely on LRA's costing to avoid the
early-clobber alternative and instead use the tie, as this is preferable to
a reload.

Concretely, this patch series changes the generated code as shown below.
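
The testcase itself isn't quoted in the cover letter; a minimal sketch of
what pred-clobber.c might look like, reconstructed from the assembly below
(hypothetical, not the actual testcase from the last patch):

  #include <arm_sve.h>

  extern void use (svbool_t p);

  void
  foo (svuint16_t a, uint16_t b)
  {
    /* An unsigned "lower than" compare (CMPLO) whose predicate result is
       passed straight on, leaving codegen free to pick the destination
       predicate register.  */
    use (svcmplt_n_u16 (svptrue_b16 (), a, b));
  }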

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2

foo:
        mov     z31.h, w0
        ptrue   p3.b, all
        cmplo   p0.h, p3/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15]

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

With the Neoverse N2 tuning the compare writes its result to a different
register (p0) from the governing predicate (p3); with the Neoverse N1 tuning,
or when predicate register pressure is forced up with -ffixed-p[1-15], the
destination is instead tied to the governing predicate (p0).

Testcases for the changes are in the last patch of the series.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Thanks,
Tamar

---

-- 


* [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax
  2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
@ 2024-05-15 10:28 ` Tamar Christina
  2024-05-15 10:35   ` Kyrill Tkachov
  2024-05-15 11:06   ` Richard Sandiford
  2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 10:28 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 27218 bytes --]

Hi All,

This converts the single-alternative patterns to the new compact syntax so
that, when the new alternatives are added later in the series, it's clearer
what is being changed.

Note that this will spew out a bunch of warnings from geninsn, as it warns
that {@ ...} is useless for a single-alternative pattern.  These warnings are
not fatal, so they won't break the build, and they are only temporary.

No change in functionality is expected with this patch.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>, *aarch64_brkn_cc,
	*aarch64_brkn_ptest, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
	*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Convert
	to compact syntax.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_pred_<sve_int_op><mode>): Likewise.

---
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 0434358122d2fde71bd0e0f850338e739e9be02c..839ab0627747d7a49bef7b0192ee9e7a42587ca0 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -1156,76 +1156,86 @@ (define_insn "aarch64_rdffr"
 
 ;; Likewise with zero predication.
 (define_insn "aarch64_rdffr_z"
-  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+  [(set (match_operand:VNx16BI 0 "register_operand")
 	(and:VNx16BI
 	  (reg:VNx16BI FFRT_REGNUM)
-	  (match_operand:VNx16BI 1 "register_operand" "Upa")))]
+	  (match_operand:VNx16BI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffr\t%0.b, %1/z"
+  {@ [ cons: =0, 1   ]
+     [ Upa     , Upa ] rdffr\t%0.b, %1/z
+  }
 )
 
 ;; Read the FFR to test for a fault, without using the predicate result.
 (define_insn "*aarch64_rdffr_z_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (and:VNx16BI
 	     (reg:VNx16BI FFRT_REGNUM)
 	     (match_dup 1))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1  , 2 ]
+     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; Same for unpredicated RDFFR when tested with a known PTRUE.
 (define_insn "*aarch64_rdffr_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1   ]
+     [ Upa     , Upa ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; Read the FFR with zero predication and test the result.
 (define_insn "*aarch64_rdffr_z_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (and:VNx16BI
 	     (reg:VNx16BI FFRT_REGNUM)
 	     (match_dup 1))]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(and:VNx16BI
 	  (reg:VNx16BI FFRT_REGNUM)
 	  (match_dup 1)))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1  , 2 ]
+     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; Same for unpredicated RDFFR when tested with a known PTRUE.
 (define_insn "*aarch64_rdffr_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(reg:VNx16BI FFRT_REGNUM))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1  , 2 ]
+     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; [R3 in the block comment above about FFR handling]
@@ -6637,11 +6647,13 @@ (define_insn "@aarch64_pred_<optab><mode>"
 ;; Doubling the second operand is the preferred implementation
 ;; of the MOV alias, so we use that instead of %1/z, %1, %2.
 (define_insn "and<mode>3"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
-	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
-		      (match_operand:PRED_ALL 2 "register_operand" "Upa")))]
+  [(set (match_operand:PRED_ALL 0 "register_operand")
+	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
+		      (match_operand:PRED_ALL 2 "register_operand")))]
   "TARGET_SVE"
-  "and\t%0.b, %1/z, %2.b, %2.b"
+  {@ [ cons: =0, 1  , 2   ]
+     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
+  }
 )
 
 ;; Unpredicated predicate EOR and ORR.
@@ -6660,14 +6672,16 @@ (define_expand "<optab><mode>3"
 
 ;; Predicated predicate AND, EOR and ORR.
 (define_insn "@aarch64_pred_<optab><mode>_z"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+  [(set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL
 	  (LOGICAL:PRED_ALL
-	    (match_operand:PRED_ALL 2 "register_operand" "Upa")
-	    (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	  (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+	    (match_operand:PRED_ALL 2 "register_operand")
+	    (match_operand:PRED_ALL 3 "register_operand"))
+	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  "<logical>\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3   ]
+     [ Upa     , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Perform a logical operation on operands 2 and 3, using operand 1 as
@@ -6676,38 +6690,42 @@ (define_insn "@aarch64_pred_<optab><mode>_z"
 (define_insn "*<optab><mode>3_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (LOGICAL:PRED_ALL
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa")
-	       (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+	       (match_operand:PRED_ALL 2 "register_operand")
+	       (match_operand:PRED_ALL 3 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+   (set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same with just the flags result.
 (define_insn "*<optab><mode>3_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (LOGICAL:PRED_ALL
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa")
-	       (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+	       (match_operand:PRED_ALL 2 "register_operand")
+	       (match_operand:PRED_ALL 3 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -6720,56 +6738,62 @@ (define_insn "*<optab><mode>3_ptest"
 
 ;; Predicated predicate BIC and ORN.
 (define_insn "aarch64_pred_<nlogical><mode>_z"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+  [(set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL
 	  (NLOGICAL:PRED_ALL
-	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	    (match_operand:PRED_ALL 2 "register_operand" "Upa"))
-	  (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))
+	    (match_operand:PRED_ALL 2 "register_operand"))
+	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  "<nlogical>\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3   ]
+     [ Upa     , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same, but set the flags as a side-effect.
 (define_insn "*<nlogical><mode>3_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 3 "register_operand"))
+	       (match_operand:PRED_ALL 2 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+   (set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL (NLOGICAL:PRED_ALL
 			(not:PRED_ALL (match_dup 3))
 			(match_dup 2))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  "<nlogical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same with just the flags result.
 (define_insn "*<nlogical><mode>3_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 3 "register_operand"))
+	       (match_operand:PRED_ALL 2 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "<nlogical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -6782,58 +6806,64 @@ (define_insn "*<nlogical><mode>3_ptest"
 
 ;; Predicated predicate NAND and NOR.
 (define_insn "aarch64_pred_<logical_nn><mode>_z"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+  [(set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL
 	  (NLOGICAL:PRED_ALL
-	    (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
-	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa")))
-	  (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+	    (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand"))
+	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand")))
+	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  "<logical_nn>\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3   ]
+     [ Upa     , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same, but set the flags as a side-effect.
 (define_insn "*<logical_nn><mode>3_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 2 "register_operand"))
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa")))
+		 (match_operand:PRED_ALL 3 "register_operand")))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+   (set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL (NLOGICAL:PRED_ALL
 			(not:PRED_ALL (match_dup 2))
 			(not:PRED_ALL (match_dup 3)))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same with just the flags result.
 (define_insn "*<logical_nn><mode>3_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 2 "register_operand"))
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa")))
+		 (match_operand:PRED_ALL 3 "register_operand")))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; =========================================================================
@@ -8133,12 +8163,12 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest"
 		(match_operand:SVE_I 3 "aarch64_sve_cmp_<sve_imm_con>_operand"))]
 	     UNSPEC_PRED_Z)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:<VPRED> 0 "=Upa, Upa"))]
+   (clobber (match_scratch:<VPRED> 0))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  {@ [ cons: 1 , 2 , 3              ]
-     [ Upl     , w , <sve_imm_con>  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
-     [ Upl     , w , w              ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
+  {@ [ cons: =0, 1  , 2 , 3              ]
+     [ Upa     , Upl, w , <sve_imm_con>  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+     [ Upa     , Upl, w , w              ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
   }
   "&& !rtx_equal_p (operands[4], operands[6])"
   {
@@ -8180,18 +8210,20 @@ (define_insn_and_split "*cmp<cmp_op><mode>_and"
 
 ;; Predicated integer wide comparisons.
 (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide"
-  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+  [(set (match_operand:<VPRED> 0 "register_operand")
 	(unspec:<VPRED>
-	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:SVE_FULL_BHSI 3 "register_operand" "w")
-	      (match_operand:VNx2DI 4 "register_operand" "w")]
+	     [(match_operand:SVE_FULL_BHSI 3 "register_operand")
+	      (match_operand:VNx2DI 4 "register_operand")]
 	     SVE_COND_INT_CMP_WIDE)]
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE"
-  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d"
+  {@ [ cons: =0, 1  , 2, 3, 4 ]
+     [ Upa     , Upl,  , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d
+  }
 )
 
 ;; Predicated integer wide comparisons in which both the flag and
@@ -8199,19 +8231,19 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide"
 (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:VNx16BI 6 "register_operand" "Upl")
+	     [(match_operand:VNx16BI 6 "register_operand")
 	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
 	      (unspec:<VPRED>
-		[(match_operand:SVE_FULL_BHSI 2 "register_operand" "w")
-		 (match_operand:VNx2DI 3 "register_operand" "w")]
+		[(match_operand:SVE_FULL_BHSI 2 "register_operand")
+		 (match_operand:VNx2DI 3 "register_operand")]
 		SVE_COND_INT_CMP_WIDE)]
 	     UNSPEC_PRED_Z)]
 	  UNSPEC_PTEST))
-   (set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+   (set (match_operand:<VPRED> 0 "register_operand")
 	(unspec:<VPRED>
 	  [(match_dup 6)
 	   (match_dup 7)
@@ -8222,7 +8254,9 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
 	  UNSPEC_PRED_Z))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
+  {@ [ cons: =0, 1  , 2, 3, 4, 5, 6  , 7 ]
+     [ Upa     , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+  }
 )
 
 ;; Predicated integer wide comparisons in which only the flags result
@@ -8230,22 +8264,24 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
 (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:VNx16BI 6 "register_operand" "Upl")
+	     [(match_operand:VNx16BI 6 "register_operand")
 	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
 	      (unspec:<VPRED>
-		[(match_operand:SVE_FULL_BHSI 2 "register_operand" "w")
-		 (match_operand:VNx2DI 3 "register_operand" "w")]
+		[(match_operand:SVE_FULL_BHSI 2 "register_operand")
+		 (match_operand:VNx2DI 3 "register_operand")]
 		SVE_COND_INT_CMP_WIDE)]
 	     UNSPEC_PRED_Z)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:<VPRED> 0 "=Upa"))]
+   (clobber (match_scratch:<VPRED> 0))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
+  {@ [ cons: =0, 1  , 2, 3, 4, 5, 6  , 7 ]
+     [ Upa     , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -9922,41 +9958,45 @@ (define_insn "@aarch64_brk<brk_op>"
 (define_insn "*aarch64_brk<brk_op>_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
+	      (match_operand:VNx16BI 2 "register_operand")
 	      (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")]
 	     SVE_BRK_UNARY)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
 	  [(match_dup 1)
 	   (match_dup 2)
 	   (match_dup 3)]
 	  SVE_BRK_UNARY))]
   "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4 ]
+     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
+  }
 )
 
 ;; Same, but with only the flags result being interesting.
 (define_insn "*aarch64_brk<brk_op>_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
+	      (match_operand:VNx16BI 2 "register_operand")
 	      (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")]
 	     SVE_BRK_UNARY)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4 ]
+     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -9973,14 +10013,16 @@ (define_insn "*aarch64_brk<brk_op>_ptest"
 
 ;; Binary BRKs (BRKN, BRKPA, BRKPB).
 (define_insn "@aarch64_brk<brk_op>"
-  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+  [(set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
-	   (match_operand:VNx16BI 2 "register_operand" "Upa")
-	   (match_operand:VNx16BI 3 "register_operand" "<brk_reg_con>")]
+	  [(match_operand:VNx16BI 1 "register_operand")
+	   (match_operand:VNx16BI 2 "register_operand")
+	   (match_operand:VNx16BI 3 "register_operand")]
 	  SVE_BRK_BINARY))]
   "TARGET_SVE"
-  "brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b"
+  {@ [ cons: =0, 1  , 2  , 3             ]
+     [ Upa     , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
+  }
 )
 
 ;; BRKN, producing both a predicate and a flags result.  Unlike other
@@ -9992,19 +10034,21 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
 	   (match_operand:VNx16BI 5)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (unspec:VNx16BI
-	     [(match_operand:VNx16BI 1 "register_operand" "Upa")
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "0")]
+	     [(match_operand:VNx16BI 1 "register_operand")
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     UNSPEC_BRKN)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
 	  [(match_dup 1)
 	   (match_dup 2)
 	   (match_dup 3)]
 	  UNSPEC_BRKN))]
   "TARGET_SVE"
-  "brkns\t%0.b, %1/z, %2.b, %0.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
+     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
+  }
   "&& (operands[4] != CONST0_RTX (VNx16BImode)
        || operands[5] != CONST0_RTX (VNx16BImode))"
   {
@@ -10021,14 +10065,16 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest"
 	   (match_operand:VNx16BI 5)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (unspec:VNx16BI
-	     [(match_operand:VNx16BI 1 "register_operand" "Upa")
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "0")]
+	     [(match_operand:VNx16BI 1 "register_operand")
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     UNSPEC_BRKN)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "brkns\t%0.b, %1/z, %2.b, %0.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
+     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
+  }
   "&& (operands[4] != CONST0_RTX (VNx16BImode)
        || operands[5] != CONST0_RTX (VNx16BImode))"
   {
@@ -10041,41 +10087,45 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest"
 (define_insn "*aarch64_brk<brk_op>_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "Upa")]
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     SVE_BRKP)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
 	  [(match_dup 1)
 	   (match_dup 2)
 	   (match_dup 3)]
 	  SVE_BRKP))]
   "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
+     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same, but with only the flags result being interesting.
 (define_insn "*aarch64_brk<brk_op>_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "Upa")]
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     SVE_BRKP)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
+     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 934e57055d3419e5dcc89b473fd110a0d4978b4f..aa12baf48355358ca4fefe88157df3aac6eb09bd 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -3338,18 +3338,20 @@ (define_insn "@aarch64_sve2_histseg<mode>"
 
 ;; Predicated string matching.
 (define_insn "@aarch64_pred_<sve_int_op><mode>"
-  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+  [(set (match_operand:<VPRED> 0 "register_operand")
 	(unspec:<VPRED>
-	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
+	  [(match_operand:<VPRED> 1 "register_operand")
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:SVE_FULL_BHI 3 "register_operand" "w")
-	      (match_operand:SVE_FULL_BHI 4 "register_operand" "w")]
+	     [(match_operand:SVE_FULL_BHI 3 "register_operand")
+	      (match_operand:SVE_FULL_BHI 4 "register_operand")]
 	     SVE2_MATCH)]
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE2 && TARGET_NON_STREAMING"
-  "<sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>"
+  {@ [ cons: =0, 1  , 2, 3, 4 ]
+     [ Upa     , Upl,  , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
+  }
 )
 
 ;; Predicated string matching in which both the flag and predicate results




-- 


* [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber
  2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
  2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
@ 2024-05-15 10:28 ` Tamar Christina
  2024-05-15 10:56   ` Richard Sandiford
  2024-05-15 10:29 ` [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 10:28 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 3308 bytes --]

Hi All,

This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 to
allow us to conditionally enable the early clobber alternatives based on the
tuning models.
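
To make the scheme concrete, this is what a converted pattern looks like
once patch 3 tags its alternatives (sketched here on and<mode>3; the full
list of changed patterns is in that patch).  The first row is only enabled
when TARGET_SVE_PRED_CLOBBER holds, while the second row leaves pred_clobber
at its default and so remains available everywhere:

(define_insn "and<mode>3"
  [(set (match_operand:PRED_ALL 0 "register_operand")
	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
		      (match_operand:PRED_ALL 2 "register_operand")))]
  "TARGET_SVE"
  {@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
     [ &Upa    , Upa, Upa; yes                 ] and\t%0.b, %1/z, %2.b, %2.b
     [ Upa     , Upa, Upa; *                   ] ^
  }
)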

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def
	(EARLY_CLOBBER_SVE_PRED_DEST): New.
	* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
	* config/aarch64/aarch64.md (pred_clobber): New.
	(arch_enabled): Use it.

---
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index d5bcaebce770f0b217aac783063d39135f754c77..49fbad3ff28bc82b25c61ac501ccf533ec4b4c3f 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
 
 AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
 
+/* Enabled if the target prefers to use a fresh register for predicate
+   outputs rather than re-use an input predicate register.  */
+AARCH64_EXTRA_TUNING_OPTION ("early_clobber_sve_pred_dest", EARLY_CLOBBER_SVE_PRED_DEST)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d56b46c74084ba7c3c 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     enabled through +gcs.  */
 #define TARGET_GCS (AARCH64_ISA_GCS)
 
+/* Prefer different predicate registers for the output of a predicated
+   operation over re-using an existing input predicate.  */
+#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
+				 && (aarch64_tune_params.extra_tuning_flags \
+				     & AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
 
 /* Standard register usage.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a53473b478c5ddba82 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
 ;; target-independent code.
 (define_attr "is_call" "no,yes" (const_string "no"))
 
+;; Indicates whether we want to enable the pattern with an optional early
+;; clobber for SVE predicates.
+(define_attr "pred_clobber" "no,yes" (const_string "no"))
+
 ;; [For compatibility with Arm in pipeline models]
 ;; Attribute that specifies whether or not the instruction touches fp
 ;; registers.
@@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
 (define_attr "arch_enabled" "no,yes"
   (if_then_else
     (ior
-	(eq_attr "arch" "any")
+	(and (eq_attr "arch" "any")
+	     (eq_attr "pred_clobber" "no"))
 
 	(and (eq_attr "arch" "rcpc8_4")
 	     (match_test "AARCH64_ISA_RCPC8_4"))
@@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
 	     (match_test "TARGET_SVE"))
 
 	(and (eq_attr "arch" "sme")
-	     (match_test "TARGET_SME")))
+	     (match_test "TARGET_SME"))
+
+	(and (eq_attr "pred_clobber" "yes")
+	     (match_test "TARGET_SVE_PRED_CLOBBER")))
     (const_string "yes")
     (const_string "no")))
 




-- 

[-- Attachment #2: rb18355.patch --]
[-- Type: text/x-diff, Size: 2793 bytes --]



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/4]AArch64: add new alternative with early clobber to patterns
  2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
  2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
  2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina
@ 2024-05-15 10:29 ` Tamar Christina
  2024-05-15 10:29 ` [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores Tamar Christina
  2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener
  4 siblings, 0 replies; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 10:29 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 18016 bytes --]

Hi All,

This patch adds the new alternatives to the affected patterns.  The
alternatives with the conditional early clobbers are placed before the normal
ones so that LRA prefers them whenever there are enough free registers to
accommodate them.

When register pressure is too high the normal alternatives will be preferred
before a reload is considered, as we would rather have the tie than a spill.
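
To make the ordering concrete, each affected pattern now carries a pair of
rows like the following (taken from the aarch64_rdffr_z change below); the
"^" in the second row means "same output template as the row above", so the
two rows emit the same instruction and differ only in the constraint on
operand 0:

  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
     [ &Upa    , Upa; yes                 ] rdffr\t%0.b, %1/z
     [ Upa     , Upa; *                   ] ^
  }

When the tuning flag is set both rows are enabled and LRA tries the early
clobber row first; when it is not set only the second row is available.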

Tests are in the next patch.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
	@aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
	aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
	*aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber
	alternative.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_pred_<sve_int_op><mode>): Likewise.

---
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 839ab0627747d7a49bef7b0192ee9e7a42587ca0..93ec59e58afee260b85082c472db2abfea7386b6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -1161,8 +1161,9 @@ (define_insn "aarch64_rdffr_z"
 	  (reg:VNx16BI FFRT_REGNUM)
 	  (match_operand:VNx16BI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1   ]
-     [ Upa     , Upa ] rdffr\t%0.b, %1/z
+  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
+     [ &Upa    , Upa; yes                 ] rdffr\t%0.b, %1/z
+     [ Upa     , Upa; *                   ] ^
   }
 )
 
@@ -1179,8 +1180,9 @@ (define_insn "*aarch64_rdffr_z_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1  , 2 ]
-     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  , 2; attrs: pred_clobber ]
+     [ &Upa    , Upa,  ; yes                 ] rdffrs\t%0.b, %1/z
+     [ Upa     , Upa,  ; *                   ] ^
   }
 )
 
@@ -1195,8 +1197,9 @@ (define_insn "*aarch64_rdffr_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1   ]
-     [ Upa     , Upa ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
+     [ &Upa    , Upa; yes                 ] rdffrs\t%0.b, %1/z
+     [ Upa     , Upa; *                   ] ^
   }
 )
 
@@ -1216,8 +1219,9 @@ (define_insn "*aarch64_rdffr_z_cc"
 	  (reg:VNx16BI FFRT_REGNUM)
 	  (match_dup 1)))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1  , 2 ]
-     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  , 2; attrs: pred_clobber ]
+     [ &Upa    , Upa,  ; yes                 ] rdffrs\t%0.b, %1/z
+     [ Upa     , Upa,  ; *                   ] ^
   }
 )
 
@@ -1233,8 +1237,9 @@ (define_insn "*aarch64_rdffr_cc"
    (set (match_operand:VNx16BI 0 "register_operand")
 	(reg:VNx16BI FFRT_REGNUM))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1  , 2 ]
-     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  {@ [ cons: =0, 1  , 2; attrs: pred_clobber ]
+     [ &Upa    , Upa,  ; yes                 ] rdffrs\t%0.b, %1/z
+     [ Upa     , Upa,  ; *                   ] ^
   }
 )
 
@@ -6651,8 +6656,9 @@ (define_insn "and<mode>3"
 	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
 		      (match_operand:PRED_ALL 2 "register_operand")))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2   ]
-     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
+  {@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa; yes                 ] and\t%0.b, %1/z, %2.b, %2.b
+     [ Upa     , Upa, Upa; *                   ] ^
   }
 )
 
@@ -6679,8 +6685,9 @@ (define_insn "@aarch64_pred_<optab><mode>_z"
 	    (match_operand:PRED_ALL 3 "register_operand"))
 	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3   ]
-     [ Upa     , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  ; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa; yes                 ] <logical>\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa; *                   ] ^
   }
 )
 
@@ -6703,8 +6710,9 @@ (define_insn "*<optab><mode>3_cc"
 	(and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
-     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa,  ,  ; yes                 ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -6723,8 +6731,9 @@ (define_insn "*<optab><mode>3_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
-     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa,  ,  ; yes                 ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -6745,8 +6754,9 @@ (define_insn "aarch64_pred_<nlogical><mode>_z"
 	    (match_operand:PRED_ALL 2 "register_operand"))
 	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3   ]
-     [ Upa     , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  ; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa; yes                 ] <nlogical>\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa; *                   ] ^
   }
 )
 
@@ -6770,8 +6780,9 @@ (define_insn "*<nlogical><mode>3_cc"
 			(match_dup 2))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
-     [ Upa     , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa,  ,  ; yes                 ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -6791,8 +6802,9 @@ (define_insn "*<nlogical><mode>3_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  {@ [ cons:  =0, 1  , 2  , 3  , 4, 5 ]
-     [ Upa      , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons:  =0, 1  , 2  , 3  , 4, 5; attrs: pred_clobber ]
+     [ &Upa     , Upa, Upa, Upa,  ,  ; yes                 ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa      , Upa, Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -6813,8 +6825,9 @@ (define_insn "aarch64_pred_<logical_nn><mode>_z"
 	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand")))
 	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3   ]
-     [ Upa     , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0,  1 , 2  , 3  ; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa; yes                 ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa; *                   ] ^
   }
 )
 
@@ -6839,8 +6852,9 @@ (define_insn "*<logical_nn><mode>3_cc"
 			(not:PRED_ALL (match_dup 3)))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
-     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa,  ,  ; yes                 ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -6861,8 +6875,9 @@ (define_insn "*<logical_nn><mode>3_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
-     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa,  ,  ; yes                 ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -8104,9 +8119,11 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>"
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE"
-  {@ [ cons: =0 , 1   , 3 , 4              ]
-     [ Upa      , Upl , w , <sve_imm_con>  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
-     [ Upa      , Upl , w , w              ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
+  {@ [ cons: =0 , 1   , 3 , 4            ; attrs: pred_clobber ]
+     [ &Upa     , Upl , w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
+     [ Upa      , Upl , w , <sve_imm_con>; *                   ] ^
+     [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
+     [ Upa      , Upl , w , w            ; *                   ] ^
   }
 )
 
@@ -8136,9 +8153,11 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_cc"
 	  UNSPEC_PRED_Z))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  {@ [ cons: =0 , 1   , 2 , 3              ]
-     [ Upa      , Upl , w , <sve_imm_con>  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
-     [ Upa      , Upl , w , w              ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
+  {@ [ cons: =0 , 1    , 2 , 3            ; attrs: pred_clobber ]
+     [ &Upa     ,  Upl , w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+     [ Upa      ,  Upl , w , <sve_imm_con>; *                   ] ^
+     [ &Upa     ,  Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
+     [ Upa      ,  Upl , w , w            ; *                   ] ^
   }
   "&& !rtx_equal_p (operands[4], operands[6])"
   {
@@ -8166,9 +8185,11 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest"
    (clobber (match_scratch:<VPRED> 0))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  {@ [ cons: =0, 1  , 2 , 3              ]
-     [ Upa     , Upl, w , <sve_imm_con>  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
-     [ Upa     , Upl, w , w              ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
+  {@ [ cons: =0, 1   , 2 , 3            ; attrs: pred_clobber ]
+     [ &Upa    ,  Upl, w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+     [ Upa     ,  Upl, w , <sve_imm_con>; *                   ] ^
+     [ &Upa    ,  Upl, w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
+     [ Upa     ,  Upl, w , w            ; *                   ] ^
   }
   "&& !rtx_equal_p (operands[4], operands[6])"
   {
@@ -8221,8 +8242,9 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide"
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2, 3, 4 ]
-     [ Upa     , Upl,  , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d
+  {@ [ cons: =0, 1   , 2, 3, 4; attrs: pred_clobber ]
+     [ &Upa    ,  Upl,  , w, w; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d
+     [ Upa     ,  Upl,  , w, w; *                   ] ^
   }
 )
 
@@ -8254,8 +8276,9 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
 	  UNSPEC_PRED_Z))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  {@ [ cons: =0, 1  , 2, 3, 4, 5, 6  , 7 ]
-     [ Upa     , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+  {@ [ cons: =0, 1   , 2, 3, 4, 5, 6  , 7; attrs: pred_clobber ]
+     [ &Upa    ,  Upl, w, w,  ,  , Upl,  ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+     [ Upa     ,  Upl, w, w,  ,  , Upl,  ; *                   ] ^
   }
 )
 
@@ -8279,8 +8302,9 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest"
    (clobber (match_scratch:<VPRED> 0))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  {@ [ cons:  =0, 1  , 2, 3, 4, 5, 6  , 7 ]
-     [ Upa      , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+  {@ [ cons:  =0, 1   , 2, 3, 4, 5, 6  , 7; attrs: pred_clobber ]
+     [ &Upa     ,  Upl, w, w,  ,  , Upl,  ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+     [ Upa      ,  Upl, w, w,  ,  , Upl,  ; *                   ] ^
   }
 )
 
@@ -9948,9 +9972,11 @@ (define_insn "@aarch64_brk<brk_op>"
 	   (match_operand:VNx16BI 3 "aarch64_simd_reg_or_zero")]
 	  SVE_BRK_UNARY))]
   "TARGET_SVE"
-  {@ [ cons: =0 , 1   , 2   , 3   ]
-     [ Upa      , Upa , Upa , Dz  ] brk<brk_op>\t%0.b, %1/z, %2.b
-     [ Upa      , Upa , Upa , 0   ] brk<brk_op>\t%0.b, %1/m, %2.b
+  {@ [ cons: =0 , 1   , 2   , 3  ; attrs: pred_clobber ]
+     [ &Upa     ,  Upa , Upa , Dz; yes                 ] brk<brk_op>\t%0.b, %1/z, %2.b
+     [ Upa      ,  Upa , Upa , Dz; *                   ] ^
+     [ &Upa     ,  Upa , Upa , 0 ; yes                 ] brk<brk_op>\t%0.b, %1/m, %2.b
+     [ Upa      ,  Upa , Upa , 0 ; *                   ] ^
   }
 )
 
@@ -9974,8 +10000,9 @@ (define_insn "*aarch64_brk<brk_op>_cc"
 	   (match_dup 3)]
 	  SVE_BRK_UNARY))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3, 4 ]
-     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
+  {@ [ cons: =0, 1  , 2  , 3, 4; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa,  ,  ; yes                 ] brk<brk_op>s\t%0.b, %1/z, %2.b
+     [ Upa     , Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -9994,8 +10021,9 @@ (define_insn "*aarch64_brk<brk_op>_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3, 4 ]
-     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
+  {@ [ cons: =0, 1  , 2  , 3, 4; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa,  ,  ; yes                 ] brk<brk_op>s\t%0.b, %1/z, %2.b
+     [ Upa     , Upa, Upa,  ,  ; *                   ] ^
   }
 )
 
@@ -10020,8 +10048,9 @@ (define_insn "@aarch64_brk<brk_op>"
 	   (match_operand:VNx16BI 3 "register_operand")]
 	  SVE_BRK_BINARY))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3             ]
-     [ Upa     , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
+  {@ [ cons: =0,  1 , 2  , 3            ; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, <brk_reg_con>; yes                 ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
+     [ Upa     , Upa, Upa, <brk_reg_con>; *                   ] ^
   }
 )
 
@@ -10046,8 +10075,9 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
 	   (match_dup 3)]
 	  UNSPEC_BRKN))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
-     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
+  {@ [ cons: =0, 1  , 2  , 3, 4, 5; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, 0,  ,  ; yes                 ] brkns\t%0.b, %1/z, %2.b, %0.b
+     [ Upa     , Upa, Upa, 0,  ,  ; *                   ] ^
   }
   "&& (operands[4] != CONST0_RTX (VNx16BImode)
        || operands[5] != CONST0_RTX (VNx16BImode))"
@@ -10072,8 +10102,9 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
-     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
+  {@ [ cons: =0, 1  , 2  , 3, 4, 5; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, 0,  ,  ; yes                 ] brkns\t%0.b, %1/z, %2.b, %0.b
+     [ Upa     , Upa, Upa, 0,  ,  ; *                   ] ^
   }
   "&& (operands[4] != CONST0_RTX (VNx16BImode)
        || operands[5] != CONST0_RTX (VNx16BImode))"
@@ -10103,8 +10134,9 @@ (define_insn "*aarch64_brk<brk_op>_cc"
 	   (match_dup 3)]
 	  SVE_BRKP))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
-     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  , 4; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa,  ; yes                 ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa,  ; *                   ] ^
   }
 )
 
@@ -10123,8 +10155,9 @@ (define_insn "*aarch64_brk<brk_op>_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
-     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+  {@ [ cons: =0, 1  , 2  , 3  , 4; attrs: pred_clobber ]
+     [ &Upa    , Upa, Upa, Upa,  ; yes                 ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+     [ Upa     , Upa, Upa, Upa,  ; *                   ] ^
   }
 )
 
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index aa12baf48355358ca4fefe88157df3aac6eb09bd..771c346b8a3188dd7e3f3a98ee28f0ca5f928215 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -3349,8 +3349,9 @@ (define_insn "@aarch64_pred_<sve_int_op><mode>"
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE2 && TARGET_NON_STREAMING"
-  {@ [ cons: =0, 1 , 2, 3, 4 ]
-     [ Upa     , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
+  {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ]
+     [ &Upa    , Upl, , w, w; yes                 ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
+     [ Upa     , Upl, , w, w; *                   ] ^
   }
 )
 




-- 

[-- Attachment #2: rb18357.patch --]
[-- Type: text/x-diff, Size: 16523 bytes --]



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores.
  2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
                   ` (2 preceding siblings ...)
  2024-05-15 10:29 ` [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina
@ 2024-05-15 10:29 ` Tamar Christina
  2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener
  4 siblings, 0 replies; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 10:29 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 7715 bytes --]

Hi All,

This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
It is kept off for generic codegen.

Note the reason for the +sve pragma, even though the tests are in
aarch64-sve.exp, is that if the testsuite is run with an option that forces
SVE off, e.g. -march=armv8-a+nosve, then the intrinsics end up being disabled
because the -march is preferred over the -mcpu even though the -mcpu comes
later.

This prevents the tests from failing in such runs.
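
As an illustration (this invocation is just an example, not part of the
patch), a run such as:

> make check-gcc RUNTESTFLAGS="--target_board=unix/-march=armv8-a+nosve aarch64-sve.exp=pred_clobber_1.c"

would disable the SVE intrinsics despite the -mcpu=neoverse-n2 in the
dg-options, were it not for the #pragma GCC target "+sve" in the tests.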

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add
	AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST.
	* config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add
	AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST.
	* config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add
	AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/pred_clobber_1.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_2.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_3.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_4.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_5.c: New test.

---
diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h
index 7e799bbe762fe862e31befed50e54040a7fd1f2f..0d8f3f6be67f3583b00473bef97ea3ae4fcea4ec 100644
--- a/gcc/config/aarch64/tuning_models/neoversen2.h
+++ b/gcc/config/aarch64/tuning_models/neoversen2.h
@@ -236,7 +236,8 @@ static const struct tune_params neoversen2_tunings =
   (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST),	/* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h
index 9363f2ad98a5279cc99f2f9b1509ba921d582e84..d28d0b1c0498ed250b0a93ca69720fe10c65c93d 100644
--- a/gcc/config/aarch64/tuning_models/neoversev1.h
+++ b/gcc/config/aarch64/tuning_models/neoversev1.h
@@ -227,7 +227,8 @@ static const struct tune_params neoversev1_tunings =
   (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
-   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST),	/* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
index bc01ed767c9b690504eb98456402df5d9d64eee3..3b2f9797bd777e73ca9c21501fa97448d96cb65e 100644
--- a/gcc/config/aarch64/tuning_models/neoversev2.h
+++ b/gcc/config/aarch64/tuning_models/neoversev2.h
@@ -236,7 +236,8 @@ static const struct tune_params neoversev2_tunings =
   (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST),	/* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..934a00a38531c5fd4139d99ff33414904b2c104f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=neoverse-n2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC target "+sve"
+
+#include <arm_sve.h>
+
+extern void use(svbool_t);
+
+/*
+** foo:
+**	...
+**	ptrue	p([1-9][0-9]?).b, all
+**	cmplo	p0.h, p\1/z, z0.h, z[0-9]+.h
+**	...
+*/
+void foo (svuint16_t a, uint16_t b)
+{
+    svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b);
+    use (p0);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..58badb66a43b1ac50eeec153b9cac44fc831b145
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=neoverse-v2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC target "+sve"
+
+#include <arm_sve.h>
+
+extern void use(svbool_t);
+
+/*
+** foo:
+**	...
+**	ptrue	p([1-9][0-9]?).b, all
+**	cmplo	p0.h, p\1/z, z0.h, z[0-9]+.h
+**	...
+*/
+void foo (svuint16_t a, uint16_t b)
+{
+    svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b);
+    use (p0);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_3.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..c67c2bd3422e0bb0c694b5fe0adf0d83e4d967c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_3.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=neoverse-v1" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC target "+sve"
+
+#include <arm_sve.h>
+
+extern void use(svbool_t);
+
+/*
+** foo:
+**	...
+**	ptrue	p([1-9][0-9]?).b, all
+**	cmplo	p0.h, p\1/z, z0.h, z[0-9]+.h
+**	...
+*/
+void foo (svuint16_t a, uint16_t b)
+{
+    svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b);
+    use (p0);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_4.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0120afe5d523eff8297fadd4fc4c678676413d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC target "+sve"
+
+#include <arm_sve.h>
+
+extern void use(svbool_t);
+
+/*
+** foo:
+**	...
+**	ptrue	p0.b, all
+**	cmplo	p0.h, p0/z, z0.h, z[0-9]+.h
+**	...
+*/
+void foo (svuint16_t a, uint16_t b)
+{
+    svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b);
+    use (p0);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_5.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..63f0669abd23d45c0ffd77c53859a098a21e0192
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p15" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC target "+sve"
+
+#include <arm_sve.h>
+
+extern void use(svbool_t);
+
+/*
+** foo:
+**	...
+**	ptrue	p0.b, all
+**	cmplo	p0.h, p0/z, z0.h, z[0-9]+.h
+**	...
+*/
+void foo (svuint16_t a, uint16_t b)
+{
+    svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b);
+    use (p0);
+}




-- 

[-- Attachment #2: rb18356.patch --]
[-- Type: text/x-diff, Size: 6482 bytes --]




* Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax
  2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
@ 2024-05-15 10:35   ` Kyrill Tkachov
  2024-05-15 11:06   ` Richard Sandiford
  1 sibling, 0 replies; 19+ messages in thread
From: Kyrill Tkachov @ 2024-05-15 10:35 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Marcus.Shawcroft, Richard.Earnshaw, gcc-patches, ktkachov, nd,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 31058 bytes --]

Hi Tamar,

On Wed, 15 May 2024 at 11:28, Tamar Christina <tamar.christina@arm.com> wrote:

> Hi All,
>
> This converts the single alternative patterns to the new compact syntax such
> that when I add the new alternatives it's clearer what's being changed.
>
> Note that this will spew out a bunch of warnings from geninsn as it'll warn
> that @ is useless for a single alternative pattern.  These are not fatal so
> won't break the build and are only temporary.
>
> No change in functionality is expected with this patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?


Ok.
Thanks,
Kyrill


>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-sve.md (and<mode>3,
>         @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
>         *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
>         *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
>         aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
>         *<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
>         @aarch64_pred_cmp<cmp_op><mode>_wide,
>         *aarch64_pred_cmp<cmp_op><mode>_wide_cc,
>         *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
>         *aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>, *aarch64_brkn_cc,
>         *aarch64_brkn_ptest, *aarch64_brk<brk_op>_cc,
>         *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
>         *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Convert
>         to compact syntax.
>         * config/aarch64/aarch64-sve2.md
>         (@aarch64_pred_<sve_int_op><mode>): Likewise.
>
> ---
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 0434358122d2fde71bd0e0f850338e739e9be02c..839ab0627747d7a49bef7b0192ee9e7a42587ca0 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -1156,76 +1156,86 @@ (define_insn "aarch64_rdffr"
>
>  ;; Likewise with zero predication.
>  (define_insn "aarch64_rdffr_z"
> -  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +  [(set (match_operand:VNx16BI 0 "register_operand")
>         (and:VNx16BI
>           (reg:VNx16BI FFRT_REGNUM)
> -         (match_operand:VNx16BI 1 "register_operand" "Upa")))]
> +         (match_operand:VNx16BI 1 "register_operand")))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffr\t%0.b, %1/z"
> +  {@ [ cons: =0, 1   ]
> +     [ Upa     , Upa ] rdffr\t%0.b, %1/z
> +  }
>  )
>
>  ;; Read the FFR to test for a fault, without using the predicate result.
>  (define_insn "*aarch64_rdffr_z_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (match_operand:SI 2 "aarch64_sve_ptrue_flag")
>            (and:VNx16BI
>              (reg:VNx16BI FFRT_REGNUM)
>              (match_dup 1))]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1  , 2 ]
> +     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Same for unpredicated RDFFR when tested with a known PTRUE.
>  (define_insn "*aarch64_rdffr_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (const_int SVE_KNOWN_PTRUE)
>            (reg:VNx16BI FFRT_REGNUM)]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1   ]
> +     [ Upa     , Upa ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Read the FFR with zero predication and test the result.
>  (define_insn "*aarch64_rdffr_z_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (match_operand:SI 2 "aarch64_sve_ptrue_flag")
>            (and:VNx16BI
>              (reg:VNx16BI FFRT_REGNUM)
>              (match_dup 1))]
>           UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +   (set (match_operand:VNx16BI 0 "register_operand")
>         (and:VNx16BI
>           (reg:VNx16BI FFRT_REGNUM)
>           (match_dup 1)))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1  , 2 ]
> +     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Same for unpredicated RDFFR when tested with a known PTRUE.
>  (define_insn "*aarch64_rdffr_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (const_int SVE_KNOWN_PTRUE)
>            (reg:VNx16BI FFRT_REGNUM)]
>           UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +   (set (match_operand:VNx16BI 0 "register_operand")
>         (reg:VNx16BI FFRT_REGNUM))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1  , 2 ]
> +     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; [R3 in the block comment above about FFR handling]
> @@ -6637,11 +6647,13 @@ (define_insn "@aarch64_pred_<optab><mode>"
>  ;; Doubling the second operand is the preferred implementation
>  ;; of the MOV alias, so we use that instead of %1/z, %1, %2.
>  (define_insn "and<mode>3"
> -  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> -       (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
> -                     (match_operand:PRED_ALL 2 "register_operand"
> "Upa")))]
> +  [(set (match_operand:PRED_ALL 0 "register_operand")
> +       (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
> +                     (match_operand:PRED_ALL 2 "register_operand")))]
>    "TARGET_SVE"
> -  "and\t%0.b, %1/z, %2.b, %2.b"
> +  {@ [ cons: =0, 1  , 2   ]
> +     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
> +  }
>  )
>
>  ;; Unpredicated predicate EOR and ORR.
> @@ -6660,14 +6672,16 @@ (define_expand "<optab><mode>3"
>
>  ;; Predicated predicate AND, EOR and ORR.
>  (define_insn "@aarch64_pred_<optab><mode>_z"
> -  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> +  [(set (match_operand:PRED_ALL 0 "register_operand")
>         (and:PRED_ALL
>           (LOGICAL:PRED_ALL
> -           (match_operand:PRED_ALL 2 "register_operand" "Upa")
> -           (match_operand:PRED_ALL 3 "register_operand" "Upa"))
> -         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
> +           (match_operand:PRED_ALL 2 "register_operand")
> +           (match_operand:PRED_ALL 3 "register_operand"))
> +         (match_operand:PRED_ALL 1 "register_operand")))]
>    "TARGET_SVE"
> -  "<logical>\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3   ]
> +     [ Upa     , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;; Perform a logical operation on operands 2 and 3, using operand 1 as
> @@ -6676,38 +6690,42 @@ (define_insn "@aarch64_pred_<optab><mode>_z"
>  (define_insn "*<optab><mode>3_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (and:PRED_ALL
>              (LOGICAL:PRED_ALL
> -              (match_operand:PRED_ALL 2 "register_operand" "Upa")
> -              (match_operand:PRED_ALL 3 "register_operand" "Upa"))
> +              (match_operand:PRED_ALL 2 "register_operand")
> +              (match_operand:PRED_ALL 3 "register_operand"))
>              (match_dup 4))]
>           UNSPEC_PTEST))
> -   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> +   (set (match_operand:PRED_ALL 0 "register_operand")
>         (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
>                       (match_dup 4)))]
>    "TARGET_SVE"
> -  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
> +     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;; Same with just the flags result.
>  (define_insn "*<optab><mode>3_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (and:PRED_ALL
>              (LOGICAL:PRED_ALL
> -              (match_operand:PRED_ALL 2 "register_operand" "Upa")
> -              (match_operand:PRED_ALL 3 "register_operand" "Upa"))
> +              (match_operand:PRED_ALL 2 "register_operand")
> +              (match_operand:PRED_ALL 3 "register_operand"))
>              (match_dup 4))]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE"
> -  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
> +     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;;
> -------------------------------------------------------------------------
> @@ -6720,56 +6738,62 @@ (define_insn "*<optab><mode>3_ptest"
>
>  ;; Predicated predicate BIC and ORN.
>  (define_insn "aarch64_pred_<nlogical><mode>_z"
> -  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> +  [(set (match_operand:PRED_ALL 0 "register_operand")
>         (and:PRED_ALL
>           (NLOGICAL:PRED_ALL
> -           (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"
> "Upa"))
> -           (match_operand:PRED_ALL 2 "register_operand" "Upa"))
> -         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
> +           (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))
> +           (match_operand:PRED_ALL 2 "register_operand"))
> +         (match_operand:PRED_ALL 1 "register_operand")))]
>    "TARGET_SVE"
> -  "<nlogical>\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3   ]
> +     [ Upa     , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;; Same, but set the flags as a side-effect.
>  (define_insn "*<nlogical><mode>3_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (and:PRED_ALL
>              (NLOGICAL:PRED_ALL
>                (not:PRED_ALL
> -                (match_operand:PRED_ALL 3 "register_operand" "Upa"))
> -              (match_operand:PRED_ALL 2 "register_operand" "Upa"))
> +                (match_operand:PRED_ALL 3 "register_operand"))
> +              (match_operand:PRED_ALL 2 "register_operand"))
>              (match_dup 4))]
>           UNSPEC_PTEST))
> -   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> +   (set (match_operand:PRED_ALL 0 "register_operand")
>         (and:PRED_ALL (NLOGICAL:PRED_ALL
>                         (not:PRED_ALL (match_dup 3))
>                         (match_dup 2))
>                       (match_dup 4)))]
>    "TARGET_SVE"
> -  "<nlogical>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
> +     [ Upa     , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;; Same with just the flags result.
>  (define_insn "*<nlogical><mode>3_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (and:PRED_ALL
>              (NLOGICAL:PRED_ALL
>                (not:PRED_ALL
> -                (match_operand:PRED_ALL 3 "register_operand" "Upa"))
> -              (match_operand:PRED_ALL 2 "register_operand" "Upa"))
> +                (match_operand:PRED_ALL 3 "register_operand"))
> +              (match_operand:PRED_ALL 2 "register_operand"))
>              (match_dup 4))]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE"
> -  "<nlogical>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons:  =0, 1  , 2  , 3  , 4, 5 ]
> +     [ Upa      , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;;
> -------------------------------------------------------------------------
> @@ -6782,58 +6806,64 @@ (define_insn "*<nlogical><mode>3_ptest"
>
>  ;; Predicated predicate NAND and NOR.
>  (define_insn "aarch64_pred_<logical_nn><mode>_z"
> -  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> +  [(set (match_operand:PRED_ALL 0 "register_operand")
>         (and:PRED_ALL
>           (NLOGICAL:PRED_ALL
> -           (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand"
> "Upa"))
> -           (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"
> "Upa")))
> -         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
> +           (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand"))
> +           (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand")))
> +         (match_operand:PRED_ALL 1 "register_operand")))]
>    "TARGET_SVE"
> -  "<logical_nn>\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3   ]
> +     [ Upa     , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;; Same, but set the flags as a side-effect.
>  (define_insn "*<logical_nn><mode>3_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (and:PRED_ALL
>              (NLOGICAL:PRED_ALL
>                (not:PRED_ALL
> -                (match_operand:PRED_ALL 2 "register_operand" "Upa"))
> +                (match_operand:PRED_ALL 2 "register_operand"))
>                (not:PRED_ALL
> -                (match_operand:PRED_ALL 3 "register_operand" "Upa")))
> +                (match_operand:PRED_ALL 3 "register_operand")))
>              (match_dup 4))]
>           UNSPEC_PTEST))
> -   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> +   (set (match_operand:PRED_ALL 0 "register_operand")
>         (and:PRED_ALL (NLOGICAL:PRED_ALL
>                         (not:PRED_ALL (match_dup 2))
>                         (not:PRED_ALL (match_dup 3)))
>                       (match_dup 4)))]
>    "TARGET_SVE"
> -  "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
> +     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;; Same with just the flags result.
>  (define_insn "*<logical_nn><mode>3_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (and:PRED_ALL
>              (NLOGICAL:PRED_ALL
>                (not:PRED_ALL
> -                (match_operand:PRED_ALL 2 "register_operand" "Upa"))
> +                (match_operand:PRED_ALL 2 "register_operand"))
>                (not:PRED_ALL
> -                (match_operand:PRED_ALL 3 "register_operand" "Upa")))
> +                (match_operand:PRED_ALL 3 "register_operand")))
>              (match_dup 4))]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE"
> -  "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
> +     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;;
> =========================================================================
> @@ -8133,12 +8163,12 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest"
>                 (match_operand:SVE_I 3
> "aarch64_sve_cmp_<sve_imm_con>_operand"))]
>              UNSPEC_PRED_Z)]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:<VPRED> 0 "=Upa, Upa"))]
> +   (clobber (match_scratch:<VPRED> 0))]
>    "TARGET_SVE
>     && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
> -  {@ [ cons: 1 , 2 , 3              ]
> -     [ Upl     , w , <sve_imm_con>  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
> -     [ Upl     , w , w              ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
> +  {@ [ cons: =0, 1  , 2 , 3              ]
> +     [ Upa     , Upl, w , <sve_imm_con>  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
> +     [ Upa     , Upl, w , w              ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
>    }
>    "&& !rtx_equal_p (operands[4], operands[6])"
>    {
> @@ -8180,18 +8210,20 @@ (define_insn_and_split "*cmp<cmp_op><mode>_and"
>
>  ;; Predicated integer wide comparisons.
>  (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide"
> -  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
> +  [(set (match_operand:<VPRED> 0 "register_operand")
>         (unspec:<VPRED>
> -         [(match_operand:VNx16BI 1 "register_operand" "Upl")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand:SI 2 "aarch64_sve_ptrue_flag")
>            (unspec:<VPRED>
> -            [(match_operand:SVE_FULL_BHSI 3 "register_operand" "w")
> -             (match_operand:VNx2DI 4 "register_operand" "w")]
> +            [(match_operand:SVE_FULL_BHSI 3 "register_operand")
> +             (match_operand:VNx2DI 4 "register_operand")]
>              SVE_COND_INT_CMP_WIDE)]
>           UNSPEC_PRED_Z))
>     (clobber (reg:CC_NZC CC_REGNUM))]
>    "TARGET_SVE"
> -  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d"
> +  {@ [ cons: =0, 1  , 2, 3, 4 ]
> +     [ Upa     , Upl,  , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d
> +  }
>  )
>
>  ;; Predicated integer wide comparisons in which both the flag and
> @@ -8199,19 +8231,19 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide"
>  (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upl")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (unspec:<VPRED>
> -            [(match_operand:VNx16BI 6 "register_operand" "Upl")
> +            [(match_operand:VNx16BI 6 "register_operand")
>               (match_operand:SI 7 "aarch64_sve_ptrue_flag")
>               (unspec:<VPRED>
> -               [(match_operand:SVE_FULL_BHSI 2 "register_operand" "w")
> -                (match_operand:VNx2DI 3 "register_operand" "w")]
> +               [(match_operand:SVE_FULL_BHSI 2 "register_operand")
> +                (match_operand:VNx2DI 3 "register_operand")]
>                 SVE_COND_INT_CMP_WIDE)]
>              UNSPEC_PRED_Z)]
>           UNSPEC_PTEST))
> -   (set (match_operand:<VPRED> 0 "register_operand" "=Upa")
> +   (set (match_operand:<VPRED> 0 "register_operand")
>         (unspec:<VPRED>
>           [(match_dup 6)
>            (match_dup 7)
> @@ -8222,7 +8254,9 @@ (define_insn
> "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
>           UNSPEC_PRED_Z))]
>    "TARGET_SVE
>     && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
> -  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
> +  {@ [ cons: =0, 1  , 2, 3, 4, 5, 6  , 7 ]
> +     [ Upa     , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
> +  }
>  )
>
>  ;; Predicated integer wide comparisons in which only the flags result
> @@ -8230,22 +8264,24 @@ (define_insn
> "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
>  (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upl")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>            (unspec:<VPRED>
> -            [(match_operand:VNx16BI 6 "register_operand" "Upl")
> +            [(match_operand:VNx16BI 6 "register_operand")
>               (match_operand:SI 7 "aarch64_sve_ptrue_flag")
>               (unspec:<VPRED>
> -               [(match_operand:SVE_FULL_BHSI 2 "register_operand" "w")
> -                (match_operand:VNx2DI 3 "register_operand" "w")]
> +               [(match_operand:SVE_FULL_BHSI 2 "register_operand")
> +                (match_operand:VNx2DI 3 "register_operand")]
>                 SVE_COND_INT_CMP_WIDE)]
>              UNSPEC_PRED_Z)]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:<VPRED> 0 "=Upa"))]
> +   (clobber (match_scratch:<VPRED> 0))]
>    "TARGET_SVE
>     && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
> -  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
> +  {@ [ cons:  =0, 1  , 2, 3, 4, 5, 6  , 7 ]
> +     [ Upa      , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
> +  }
>  )
>
>  ;;
> -------------------------------------------------------------------------
> @@ -9922,41 +9958,45 @@ (define_insn "@aarch64_brk<brk_op>"
>  (define_insn "*aarch64_brk<brk_op>_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (match_operand:SI 4 "aarch64_sve_ptrue_flag")
>            (unspec:VNx16BI
>              [(match_dup 1)
> -             (match_operand:VNx16BI 2 "register_operand" "Upa")
> +             (match_operand:VNx16BI 2 "register_operand")
>               (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")]
>              SVE_BRK_UNARY)]
>           UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +   (set (match_operand:VNx16BI 0 "register_operand")
>         (unspec:VNx16BI
>           [(match_dup 1)
>            (match_dup 2)
>            (match_dup 3)]
>           SVE_BRK_UNARY))]
>    "TARGET_SVE"
> -  "brk<brk_op>s\t%0.b, %1/z, %2.b"
> +  {@ [ cons: =0, 1  , 2  , 3, 4 ]
> +     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
> +  }
>  )
>
>  ;; Same, but with only the flags result being interesting.
>  (define_insn "*aarch64_brk<brk_op>_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (match_operand:SI 4 "aarch64_sve_ptrue_flag")
>            (unspec:VNx16BI
>              [(match_dup 1)
> -             (match_operand:VNx16BI 2 "register_operand" "Upa")
> +             (match_operand:VNx16BI 2 "register_operand")
>               (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")]
>              SVE_BRK_UNARY)]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE"
> -  "brk<brk_op>s\t%0.b, %1/z, %2.b"
> +  {@ [ cons: =0, 1  , 2  , 3, 4 ]
> +     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
> +  }
>  )
>
>  ;;
> -------------------------------------------------------------------------
> @@ -9973,14 +10013,16 @@ (define_insn "*aarch64_brk<brk_op>_ptest"
>
>  ;; Binary BRKs (BRKN, BRKPA, BRKPB).
>  (define_insn "@aarch64_brk<brk_op>"
> -  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +  [(set (match_operand:VNx16BI 0 "register_operand")
>         (unspec:VNx16BI
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> -          (match_operand:VNx16BI 2 "register_operand" "Upa")
> -          (match_operand:VNx16BI 3 "register_operand" "<brk_reg_con>")]
> +         [(match_operand:VNx16BI 1 "register_operand")
> +          (match_operand:VNx16BI 2 "register_operand")
> +          (match_operand:VNx16BI 3 "register_operand")]
>           SVE_BRK_BINARY))]
>    "TARGET_SVE"
> -  "brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b"
> +  {@ [ cons: =0, 1  , 2  , 3             ]
> +     [ Upa     , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
> +  }
>  )
>
>  ;; BRKN, producing both a predicate and a flags result.  Unlike other
> @@ -9992,19 +10034,21 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
>            (match_operand:VNx16BI 5)
>            (const_int SVE_KNOWN_PTRUE)
>            (unspec:VNx16BI
> -            [(match_operand:VNx16BI 1 "register_operand" "Upa")
> -             (match_operand:VNx16BI 2 "register_operand" "Upa")
> -             (match_operand:VNx16BI 3 "register_operand" "0")]
> +            [(match_operand:VNx16BI 1 "register_operand")
> +             (match_operand:VNx16BI 2 "register_operand")
> +             (match_operand:VNx16BI 3 "register_operand")]
>              UNSPEC_BRKN)]
>           UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +   (set (match_operand:VNx16BI 0 "register_operand")
>         (unspec:VNx16BI
>           [(match_dup 1)
>            (match_dup 2)
>            (match_dup 3)]
>           UNSPEC_BRKN))]
>    "TARGET_SVE"
> -  "brkns\t%0.b, %1/z, %2.b, %0.b"
> +  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
> +     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
> +  }
>    "&& (operands[4] != CONST0_RTX (VNx16BImode)
>         || operands[5] != CONST0_RTX (VNx16BImode))"
>    {
> @@ -10021,14 +10065,16 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest"
>            (match_operand:VNx16BI 5)
>            (const_int SVE_KNOWN_PTRUE)
>            (unspec:VNx16BI
> -            [(match_operand:VNx16BI 1 "register_operand" "Upa")
> -             (match_operand:VNx16BI 2 "register_operand" "Upa")
> -             (match_operand:VNx16BI 3 "register_operand" "0")]
> +            [(match_operand:VNx16BI 1 "register_operand")
> +             (match_operand:VNx16BI 2 "register_operand")
> +             (match_operand:VNx16BI 3 "register_operand")]
>              UNSPEC_BRKN)]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE"
> -  "brkns\t%0.b, %1/z, %2.b, %0.b"
> +  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
> +     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
> +  }
>    "&& (operands[4] != CONST0_RTX (VNx16BImode)
>         || operands[5] != CONST0_RTX (VNx16BImode))"
>    {
> @@ -10041,41 +10087,45 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest"
>  (define_insn "*aarch64_brk<brk_op>_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (match_operand:SI 4 "aarch64_sve_ptrue_flag")
>            (unspec:VNx16BI
>              [(match_dup 1)
> -             (match_operand:VNx16BI 2 "register_operand" "Upa")
> -             (match_operand:VNx16BI 3 "register_operand" "Upa")]
> +             (match_operand:VNx16BI 2 "register_operand")
> +             (match_operand:VNx16BI 3 "register_operand")]
>              SVE_BRKP)]
>           UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +   (set (match_operand:VNx16BI 0 "register_operand")
>         (unspec:VNx16BI
>           [(match_dup 1)
>            (match_dup 2)
>            (match_dup 3)]
>           SVE_BRKP))]
>    "TARGET_SVE"
> -  "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
> +     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;; Same, but with only the flags result being interesting.
>  (define_insn "*aarch64_brk<brk_op>_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
>         (unspec:CC_NZC
> -         [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +         [(match_operand:VNx16BI 1 "register_operand")
>            (match_dup 1)
>            (match_operand:SI 4 "aarch64_sve_ptrue_flag")
>            (unspec:VNx16BI
>              [(match_dup 1)
> -             (match_operand:VNx16BI 2 "register_operand" "Upa")
> -             (match_operand:VNx16BI 3 "register_operand" "Upa")]
> +             (match_operand:VNx16BI 2 "register_operand")
> +             (match_operand:VNx16BI 3 "register_operand")]
>              SVE_BRKP)]
>           UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE"
> -  "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
> +     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )
>
>  ;;
> -------------------------------------------------------------------------
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3419e5dcc89b473fd110a0d4978b4f..aa12baf48355358ca4fefe88157df3aac6eb09bd 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -3338,18 +3338,20 @@ (define_insn "@aarch64_sve2_histseg<mode>"
>
>  ;; Predicated string matching.
>  (define_insn "@aarch64_pred_<sve_int_op><mode>"
> -  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
> +  [(set (match_operand:<VPRED> 0 "register_operand")
>         (unspec:<VPRED>
> -         [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +         [(match_operand:<VPRED> 1 "register_operand")
>            (match_operand:SI 2 "aarch64_sve_ptrue_flag")
>            (unspec:<VPRED>
> -            [(match_operand:SVE_FULL_BHI 3 "register_operand" "w")
> -             (match_operand:SVE_FULL_BHI 4 "register_operand" "w")]
> +            [(match_operand:SVE_FULL_BHI 3 "register_operand")
> +             (match_operand:SVE_FULL_BHI 4 "register_operand")]
>              SVE2_MATCH)]
>           UNSPEC_PRED_Z))
>     (clobber (reg:CC_NZC CC_REGNUM))]
>    "TARGET_SVE2 && TARGET_NON_STREAMING"
> -  "<sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>"
> +  {@ [ cons: =0, 1 , 2, 3, 4 ]
> +     [ Upa     , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
> +  }
>  )
>
>  ;; Predicated string matching in which both the flag and predicate results
>
>
>
>
> --
>


* Re: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber
  2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina
@ 2024-05-15 10:56   ` Richard Sandiford
  2024-05-15 11:03     ` Tamar Christina
  2024-05-22  9:29     ` Tamar Christina
  0 siblings, 2 replies; 19+ messages in thread
From: Richard Sandiford @ 2024-05-15 10:56 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 to
> allow us to conditionally enable the early clobber alternatives based on the
> tuning models.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-tuning-flags.def
> 	(EARLY_CLOBBER_SVE_PRED_DEST): New.
> 	* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
> 	* config/aarch64/aarch64.md (pred_clobber): New.
> 	(arch_enabled): Use it.
>
> ---
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> index d5bcaebce770f0b217aac783063d39135f754c77..49fbad3ff28bc82b25c61ac501ccf533ec4b4c3f 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
>  
>  AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
>  
> +/* Enable if the target prefers to use a fresh register for predicate outputs
> +   rather than re-use an input predicate register.  */
> +AARCH64_EXTRA_TUNING_OPTION ("early_clobber_sve_pred_dest", EARLY_CLOBBER_SVE_PRED_DEST)

Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
(I'm open to other suggestions.)  Just looking for something that describes
either the architecture or the end result that we want to achieve.
And preferable something fairly short :)

avoid_* would be consistent with the existing "avoid_cross_loop_fma".

> +
>  #undef AARCH64_EXTRA_TUNING_OPTION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d56b46c74084ba7c3c 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
>      enabled through +gcs.  */
>  #define TARGET_GCS (AARCH64_ISA_GCS)
>  
> +/*  Prefer different predicate registers for the output of a predicated operation over
> +    re-using an existing input predicate.  */
> +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
> +				 && (aarch64_tune_params.extra_tuning_flags \
> +				     & AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
>  
>  /* Standard register usage.  */
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a53473b478c5ddba82 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
>  ;; target-independent code.
>  (define_attr "is_call" "no,yes" (const_string "no"))
>  
> +;; Indicates whether we want to enable the pattern with an optional early
> +;; clobber for SVE predicates.
> +(define_attr "pred_clobber" "no,yes" (const_string "no"))
> +
>  ;; [For compatibility with Arm in pipeline models]
>  ;; Attribute that specifies whether or not the instruction touches fp
>  ;; registers.
> @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
>  (define_attr "arch_enabled" "no,yes"
>    (if_then_else
>      (ior
> -	(eq_attr "arch" "any")
> +	(and (eq_attr "arch" "any")
> +	     (eq_attr "pred_clobber" "no"))
>  
>  	(and (eq_attr "arch" "rcpc8_4")
>  	     (match_test "AARCH64_ISA_RCPC8_4"))
> @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
>  	     (match_test "TARGET_SVE"))
>  
>  	(and (eq_attr "arch" "sme")
> -	     (match_test "TARGET_SME")))
> +	     (match_test "TARGET_SME"))
> +
> +	(and (eq_attr "pred_clobber" "yes")
> +	     (match_test "TARGET_SVE_PRED_CLOBBER")))

IMO it'd be better to handle pred_clobber separately from arch, as a new
top-level AND:

  (and
    (ior
      (eq_attr "pred_clobber" "no")
      (match_test "!TARGET_..."))
    (ior
      ...existing arch tests...))
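
i.e. the attribute as a whole would presumably end up shaped something like
this (untested sketch only, with the same details deliberately elided as
above, and the existing arch cases kept exactly as they are):

  (define_attr "arch_enabled" "no,yes"
    (if_then_else
      (and
        (ior
          (eq_attr "pred_clobber" "no")
          (match_test "!TARGET_..."))
        (ior
          (eq_attr "arch" "any")
          ...existing arch tests...))
      (const_string "yes")
      (const_string "no")))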

Thanks,
Richard


* RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber
  2024-05-15 10:56   ` Richard Sandiford
@ 2024-05-15 11:03     ` Tamar Christina
  2024-05-22  9:29     ` Tamar Christina
  1 sibling, 0 replies; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 11:03 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Wednesday, May 15, 2024 11:56 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org
> Subject: Re: [PATCH 2/4]AArch64: add new tuning param and attribute for
> enabling conditional early clobber
> 
> Tamar Christina <tamar.christina@arm.com> writes:
> > Hi All,
> >
> > This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64
> to
> > allow us to conditionally enable the early clobber alternatives based on the
> > tuning models.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64-tuning-flags.def
> > 	(EARLY_CLOBBER_SVE_PRED_DEST): New.
> > 	* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
> > 	* config/aarch64/aarch64.md (pred_clobber): New.
> > 	(arch_enabled): Use it.
> >
> > ---
> > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> > index d5bcaebce770f0b217aac783063d39135f754c77..49fbad3ff28bc82b25c61ac501ccf533ec4b4c3f 100644
> > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
> >
> >  AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
> >
> > +/* Enable if the target prefers to use a fresh register for predicate outputs
> > +   rather than re-use an input predicate register.  */
> > +AARCH64_EXTRA_TUNING_OPTION ("early_clobber_sve_pred_dest", EARLY_CLOBBER_SVE_PRED_DEST)
> 
> Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
> (I'm open to other suggestions.)  Just looking for something that describes
> either the architecture or the end result that we want to achieve.
> And preferable something fairly short :)
> 
> avoid_* would be consistent with the existing "avoid_cross_loop_fma".

Sure, happy to, it's something we initially struggled with naming internally as well.
It sounds like there's precedent for the avoid_ naming, so happy to use that.

Will respin with it.
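
i.e. presumably just something like this (hypothetical until the respin is
posted):

  /* Enable if the target prefers to use a fresh register for predicate outputs
     rather than re-use an input predicate register.  */
  AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)

with TARGET_SVE_PRED_CLOBBER testing AARCH64_EXTRA_TUNE_AVOID_PRED_RMW
instead.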

Thanks,
Tamar

> 
> > +
> >  #undef AARCH64_EXTRA_TUNING_OPTION
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d56b46c74084ba7c3c 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
> >      enabled through +gcs.  */
> >  #define TARGET_GCS (AARCH64_ISA_GCS)
> >
> > +/*  Prefer different predicate registers for the output of a predicated operation over
> > +    re-using an existing input predicate.  */
> > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
> > +				 && (aarch64_tune_params.extra_tuning_flags \
> > +				     & AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
> >
> >  /* Standard register usage.  */
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a53473b478c5ddba82 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
> >  ;; target-independent code.
> >  (define_attr "is_call" "no,yes" (const_string "no"))
> >
> > +;; Indicates whether we want to enable the pattern with an optional early
> > +;; clobber for SVE predicates.
> > +(define_attr "pred_clobber" "no,yes" (const_string "no"))
> > +
> >  ;; [For compatibility with Arm in pipeline models]
> >  ;; Attribute that specifies whether or not the instruction touches fp
> >  ;; registers.
> > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
> >  (define_attr "arch_enabled" "no,yes"
> >    (if_then_else
> >      (ior
> > -	(eq_attr "arch" "any")
> > +	(and (eq_attr "arch" "any")
> > +	     (eq_attr "pred_clobber" "no"))
> >
> >  	(and (eq_attr "arch" "rcpc8_4")
> >  	     (match_test "AARCH64_ISA_RCPC8_4"))
> > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
> >  	     (match_test "TARGET_SVE"))
> >
> >  	(and (eq_attr "arch" "sme")
> > -	     (match_test "TARGET_SME")))
> > +	     (match_test "TARGET_SME"))
> > +
> > +	(and (eq_attr "pred_clobber" "yes")
> > +	     (match_test "TARGET_SVE_PRED_CLOBBER")))
> 
> IMO it'd be bettero handle pred_clobber separately from arch, as a new
> top-level AND:
> 
>   (and
>     (ior
>       (eq_attr "pred_clobber" "no")
>       (match_test "!TARGET_..."))
>     (ior
>       ...existing arch tests...))
> 
> Thanks,
> Richard


* Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax
  2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
  2024-05-15 10:35   ` Kyrill Tkachov
@ 2024-05-15 11:06   ` Richard Sandiford
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Sandiford @ 2024-05-15 11:06 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov

Thanks for doing this as a pre-patch.  Minor request below:

Tamar Christina <tamar.christina@arm.com> writes:
>  ;; Perform a logical operation on operands 2 and 3, using operand 1 as
> @@ -6676,38 +6690,42 @@ (define_insn "@aarch64_pred_<optab><mode>_z"
>  (define_insn "*<optab><mode>3_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
>  	(unspec:CC_NZC
> -	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +	  [(match_operand:VNx16BI 1 "register_operand")
>  	   (match_operand 4)
>  	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
>  	   (and:PRED_ALL
>  	     (LOGICAL:PRED_ALL
> -	       (match_operand:PRED_ALL 2 "register_operand" "Upa")
> -	       (match_operand:PRED_ALL 3 "register_operand" "Upa"))
> +	       (match_operand:PRED_ALL 2 "register_operand")
> +	       (match_operand:PRED_ALL 3 "register_operand"))
>  	     (match_dup 4))]
>  	  UNSPEC_PTEST))
> -   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> +   (set (match_operand:PRED_ALL 0 "register_operand")
>  	(and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
>  		      (match_dup 4)))]
>    "TARGET_SVE"
> -  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
> +  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
> +     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
> +  }
>  )

Could we leave out these empty trailing constraints?  They're quite
common in SVE & SME patterns and are specifically not meant to influence
instruction selection.  E.g. we've done the same thing for *cnot<mode>
(to pick a random example).

Agree with Kyrill's ok otherwise.

Richard


* Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
  2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
                   ` (3 preceding siblings ...)
  2024-05-15 10:29 ` [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores Tamar Christina
@ 2024-05-15 11:20 ` Richard Biener
  2024-05-15 11:23   ` Tamar Christina
  4 siblings, 1 reply; 19+ messages in thread
From: Richard Biener @ 2024-05-15 11:20 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov,
	richard.sandiford

On Wed, May 15, 2024 at 12:29 PM Tamar Christina
<tamar.christina@arm.com> wrote:
>
> Hi All,
>
> Some Neoverse Software Optimization Guides (SWoG) have a clause that state
> that for predicated operations that also produce a predicate it is preferred
> that the codegen should use a different register for the destination than that
> of the input predicate in order to avoid a performance overhead.
>
> This of course has the problem that it increases register pressure and so should
> be done with care.  Additionally not all micro-architectures have this
> consideration and so it shouldn't be done as a default thing.
>
> The patch series adds support for doing conditional early clobbers through a
> combination of new alternatives and attributes to control their availability.

You could have two alternatives, one with early clobber and one with
a matching constraint where you'd disparage the matching constraint one?

> On high register pressure we also use LRA's costing to prefer not to use the
> alternative and instead just use the tie as this is preferable to a reload.
>
> Concretely this patch series does:
>
> > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2
>
> foo:
>         mov     z31.h, w0
>         ptrue   p3.b, all
>         cmplo   p0.h, p3/z, z0.h, z31.h
>         b       use
>
> > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve
>
> foo:
>         mov     z31.h, w0
>         ptrue   p0.b, all
>         cmplo   p0.h, p0/z, z0.h, z31.h
>         b       use
>
> > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15]
>
> foo:
>         mov     z31.h, w0
>         ptrue   p0.b, all
>         cmplo   p0.h, p0/z, z0.h, z31.h
>         b       use
>
> Testcases for the changes are in the last patch of the series.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Thanks,
> Tamar
>
> ---
>
> --


* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
  2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener
@ 2024-05-15 11:23   ` Tamar Christina
  2024-05-15 14:51     ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 11:23 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov,
	Richard Sandiford

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, May 15, 2024 12:20 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain
> operations.
> 
> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
> <tamar.christina@arm.com> wrote:
> >
> > Hi All,
> >
> > Some Neoverse Software Optimization Guides (SWoG) have a clause that state
> > that for predicated operations that also produce a predicate it is preferred
> > that the codegen should use a different register for the destination than that
> > of the input predicate in order to avoid a performance overhead.
> >
> > This of course has the problem that it increases register pressure and so should
> > be done with care.  Additionally not all micro-architectures have this
> > consideration and so it shouldn't be done as a default thing.
> >
> > The patch series adds support for doing conditional early clobbers through a
> > combination of new alternatives and attributes to control their availability.
> 
> You could have two alternatives, one with early clobber and one with
> a matching constraint where you'd disparage the matching constraint one?
> 

Yeah, that's what I do, though there's no need to disparage the non-early clobber
alternative as the early clobber alternative will naturally get a penalty if it needs a
reload.

Cheers,
Tamar

> > On high register pressure we also use LRA's costing to prefer not to use the
> > alternative and instead just use the tie as this is preferable to a reload.
> >
> > Concretely this patch series does:
> >
> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2
> >
> > foo:
> >         mov     z31.h, w0
> >         ptrue   p3.b, all
> >         cmplo   p0.h, p3/z, z0.h, z31.h
> >         b       use
> >
> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve
> >
> > foo:
> >         mov     z31.h, w0
> >         ptrue   p0.b, all
> >         cmplo   p0.h, p0/z, z0.h, z31.h
> >         b       use
> >
> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -
> ffixed-p[1-15]
> >
> > foo:
> >         mov     z31.h, w0
> >         ptrue   p0.b, all
> >         cmplo   p0.h, p0/z, z0.h, z31.h
> >         b       use
> >
> > Testcases for the changes are in the last patch of the series.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Thanks,
> > Tamar
> >
> > ---
> >
> > --


* Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
  2024-05-15 11:23   ` Tamar Christina
@ 2024-05-15 14:51     ` Richard Sandiford
  2024-05-15 15:56       ` Tamar Christina
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2024-05-15 14:51 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw,
	Marcus Shawcroft, ktkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Biener <richard.guenther@gmail.com>
>> Sent: Wednesday, May 15, 2024 12:20 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain
>> operations.
>> 
>> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
>> <tamar.christina@arm.com> wrote:
>> >
>> > Hi All,
>> >
>> > Some Neoverse Software Optimization Guides (SWoG) have a clause that state
>> > that for predicated operations that also produce a predicate it is preferred
>> > that the codegen should use a different register for the destination than that
>> > of the input predicate in order to avoid a performance overhead.
>> >
>> > This of course has the problem that it increases register pressure and so should
>> > be done with care.  Additionally not all micro-architectures have this
>> > consideration and so it shouldn't be done as a default thing.
>> >
>> > The patch series adds support for doing conditional early clobbers through a
>> > combination of new alternatives and attributes to control their availability.
>> 
>> You could have two alternatives, one with early clobber and one with
>> a matching constraint where you'd disparage the matching constraint one?
>> 
>
> Yeah, that's what I do, though there's no need to disparage the non-early clobber
> alternative as the early clobber alternative will naturally get a penalty if it needs a
> reload.

But I think Richard's suggestion was to disparage the one with a matching
constraint (not the earlyclobber), to reflect the increased cost of
reusing the register.

We did take that approach for gathers, e.g.:

     [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
     [?w, Z,   0, Ui1, Ui1, Upl] ^

The (supposed) advantage is that, if register pressure is so tight
that using matching registers is the only alternative, we still
have the opportunity to do that, as a last resort.
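
For the compares in this series, the analogous pair would be something like
this in the new compact syntax (a sketch with made-up operand columns, not
the exact patterns from the series):

     [ &Upa, Upl, w, w ] cmplo\t%0.h, %1/z, %2.h, %3.h
     [ ?Upa, 0  , w, w ] ^

i.e. an earlyclobber alternative first, then a disparaged alternative that
ties the input predicate to the output.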

Providing only an earlyclobber version means that using the same
register is prohibited outright.  If no other register is free, the RA
would need to spill something else to free up a temporary register.
And it might then do the equivalent of (pseudo-code):

      not p1.b, ..., p0.b
      mov p0.d, p1.d

after spilling what would otherwise have occupied p1.  In that
situation it would be better use:

      not p0.b, ..., p0.b

and not introduce the spill of p1.

Another case where using matching registers is natural is for
loop-carried dependencies.  Do we want to keep them in:

   loop:
      ...no other sets of p0....
      not p0.b, ..., p0.b
      ...no other sets of p0....
      bne loop

or should we split it to:

   loop:
      ...no other sets of p0....
      not p1.b, ..., p0.b
      mov p0.d, p1.d
      ...no other sets of p0....
      bne loop

?

Thanks,
Richard

>
> Cheers,
> Tamar
>
>> > On high register pressure we also use LRA's costing to prefer not to use the
>> > alternative and instead just use the tie as this is preferable to a reload.
>> >
>> > Concretely this patch series does:
>> >
>> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2
>> >
>> > foo:
>> >         mov     z31.h, w0
>> >         ptrue   p3.b, all
>> >         cmplo   p0.h, p3/z, z0.h, z31.h
>> >         b       use
>> >
>> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve
>> >
>> > foo:
>> >         mov     z31.h, w0
>> >         ptrue   p0.b, all
>> >         cmplo   p0.h, p0/z, z0.h, z31.h
>> >         b       use
>> >
>> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -
>> ffixed-p[1-15]
>> >
>> > foo:
>> >         mov     z31.h, w0
>> >         ptrue   p0.b, all
>> >         cmplo   p0.h, p0/z, z0.h, z31.h
>> >         b       use
>> >
>> > Testcases for the changes are in the last patch of the series.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Thanks,
>> > Tamar
>> >
>> > ---
>> >
>> > --


* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
  2024-05-15 14:51     ` Richard Sandiford
@ 2024-05-15 15:56       ` Tamar Christina
  2024-05-15 21:31         ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2024-05-15 15:56 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw,
	Marcus Shawcroft, ktkachov

> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
> >> <tamar.christina@arm.com> wrote:
> >> >
> >> > Hi All,
> >> >
> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that state
> >> > that for predicated operations that also produce a predicate it is preferred
> >> > that the codegen should use a different register for the destination than that
> >> > of the input predicate in order to avoid a performance overhead.
> >> >
> >> > This of course has the problem that it increases register pressure and so
> should
> >> > be done with care.  Additionally not all micro-architectures have this
> >> > consideration and so it shouldn't be done as a default thing.
> >> >
> >> > The patch series adds support for doing conditional early clobbers through a
> >> > combination of new alternatives and attributes to control their availability.
> >>
> >> You could have two alternatives, one with early clobber and one with
> >> a matching constraint where you'd disparage the matching constraint one?
> >>
> >
> > Yeah, that's what I do, though there's no need to disparage the non-early clobber
> > alternative as the early clobber alternative will naturally get a penalty if it needs a
> > reload.
> 
> But I think Richard's suggestion was to disparage the one with a matching
> constraint (not the earlyclobber), to reflect the increased cost of
> reusing the register.
> 
> We did take that approach for gathers, e.g.:
> 
>      [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
>      [?w, Z,   0, Ui1, Ui1, Upl] ^
> 
> The (supposed) advantage is that, if register pressure is so tight
> that using matching registers is the only alternative, we still
> have the opportunity to do that, as a last resort.
> 
> Providing only an earlyclobber version means that using the same
> register is prohibited outright.  If no other register is free, the RA
> would need to spill something else to free up a temporary register.
> And it might then do the equivalent of (pseudo-code):
> 
>       not p1.b, ..., p0.b
>       mov p0.d, p1.d
> 
> after spilling what would otherwise have occupied p1.  In that
> situation it would be better use:
> 
>       not p0.b, ..., p0.b
> 
> and not introduce the spill of p1.

I think I understood what Richi meant, but I thought it was already working that way.
i.e. as one of the testcases I had:

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15]

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

and reload did not force a spill.
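
(For reference, pred-clobber.c itself isn't quoted anywhere in the thread; a
minimal reconstruction that is consistent with the assembly above would be
something like the following, though the exact testcase in the series may
differ:

#include <arm_sve.h>

void use (svbool_t);

void
foo (svuint16_t a, uint16_t b)
{
  /* Unsigned lower-than compare producing a predicate; this is the
     cmplo in the assembly above.  */
  use (svcmplt_n_u16 (svptrue_b16 (), a, b));
}

)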

My understanding of how this works, and how it seems to be working, is that since reload costs
the alternatives from front to back, the cheapest one wins and it stops evaluating the rest.

The early clobber case is first and preferred; however, when it's not possible, i.e. it requires a non-pseudo
reload, the reload cost is added to the alternative.

However, you're right that in the following testcase:

-mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload

i.e. giving it an extra free register inexplicably causes a spill:

foo:
        addvl   sp, sp, #-1
        mov     z31.h, w0
        ptrue   p0.b, all
        str     p15, [sp]
        cmplo   p15.h, p0/z, z0.h, z31.h
        mov     p0.b, p15.b
        ldr     p15, [sp]
        addvl   sp, sp, #1
        b       use

so that's unexpected, and is very weird as p15 has no defined value...

Now adding the ? as suggested to the non-early clobber alternative does not fix it, and my mental model for how this is supposed to work does not quite line up...
Why would making the non-clobber alternative even more expensive help it during high register pressure?  With that suggestion the above case does not get fixed,
and the following case

-mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload

ICEs:

pred-clobber.c: In function 'foo':
pred-clobber.c:9:1: error: unable to find a register to spill
    9 | }
      | ^
pred-clobber.c:9:1: error: this is the insn:
(insn 10 22 19 2 (parallel [
            (set (reg:VNx8BI 110 [104])
                (unspec:VNx8BI [
                        (reg:VNx8BI 112)
                        (const_int 1 [0x1])
                        (ltu:VNx8BI (reg:VNx8HI 32 v0)
                            (reg:VNx8HI 63 v31))
                    ] UNSPEC_PRED_Z))
            (clobber (reg:CC_NZC 66 cc))
        ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi}
     (expr_list:REG_DEAD (reg:VNx8BI 112)
        (expr_list:REG_DEAD (reg:VNx8HI 63 v31)
            (expr_list:REG_DEAD (reg:VNx8HI 32 v0)
                (expr_list:REG_UNUSED (reg:CC_NZC 66 cc)
                    (nil))))))
during RTL pass: reload
dump file: pred-clobber.c.315r.reload

and this is because the use of ? has the unintended side effect of blocking a register class entirely during sched1, as we've recently discovered;
see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766

In this case it marked the alternative as NO_REGS during sched1, and so it's completely dead.
The use of ? alternatives has caused quite a bit of code bloat, as we've recently discovered, because of this unexpected and undocumented behavior.

To me,

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 93ec59e58af..2ee3d8ea35e 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8120,10 +8120,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>"
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1   , 3 , 4            ; attrs: pred_clobber ]
-     [ &Upa     , Upl , w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
-     [ Upa      , Upl , w , <sve_imm_con>; *                   ] ^
-     [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
-     [ Upa      , Upl , w , w            ; *                   ] ^
+     [ ^&Upa    , Upl , w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
+     [  Upa     , Upl , w , <sve_imm_con>; *                   ] ^
+     [ ^&Upa    , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
+     [  Upa     , Upl , w , w            ; *                   ] ^
   }
 )

Would have been the right approach, i.e. we prefer the alternative unless a reload is needed, which should work, no?  Well, it would if ^ weren't broken the same way
as ?.  Perhaps I need to use Wilco's new alternative that doesn't block a register class?

But I'm probably missing something...

> 
> Another case where using matching registers is natural is for
> loop-carried dependencies.  Do we want to keep them in:
> 
>    loop:
>       ...no other sets of p0....
>       not p0.b, ..., p0.b
>       ...no other sets of p0....
>       bne loop
> 
> or should we split it to:
> 
>    loop:
>       ...no other sets of p0....
>       not p1.b, ..., p0.b
>       mov p0.d, p1.d
>       ...no other sets of p0....
>       bne loop
> 
> ?

On the uarches that this affects they are equivalent (I'm happy to expand on this internally if you'd like),
so in those cases the first one is preferred, as it won't matter.

Thanks for the review and explanation!

Tamar

> 
> Thanks,
> Richard
> 
> >
> > Cheers,
> > Tamar
> >
> >> > On high register pressure we also use LRA's costing to prefer not to use the
> >> > alternative and instead just use the tie as this is preferable to a reload.
> >> >
> >> > Concretely this patch series does:
> >> >
> >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2
> >> >
> >> > foo:
> >> >         mov     z31.h, w0
> >> >         ptrue   p3.b, all
> >> >         cmplo   p0.h, p3/z, z0.h, z31.h
> >> >         b       use
> >> >
> >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve
> >> >
> >> > foo:
> >> >         mov     z31.h, w0
> >> >         ptrue   p0.b, all
> >> >         cmplo   p0.h, p0/z, z0.h, z31.h
> >> >         b       use
> >> >
> >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -
> >> ffixed-p[1-15]
> >> >
> >> > foo:
> >> >         mov     z31.h, w0
> >> >         ptrue   p0.b, all
> >> >         cmplo   p0.h, p0/z, z0.h, z31.h
> >> >         b       use
> >> >
> >> > Testcases for the changes are in the last patch of the series.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > ---
> >> >
> >> > --

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
  2024-05-15 15:56       ` Tamar Christina
@ 2024-05-15 21:31         ` Richard Sandiford
  2024-05-16  2:45           ` Tamar Christina
  2024-05-21  3:24           ` Tamar Christina
  0 siblings, 2 replies; 19+ messages in thread
From: Richard Sandiford @ 2024-05-15 21:31 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw,
	Marcus Shawcroft, ktkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
>> >> <tamar.christina@arm.com> wrote:
>> >> >
>> >> > Hi All,
>> >> >
>> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that state
>> >> > that for predicated operations that also produce a predicate it is preferred
>> >> > that the codegen should use a different register for the destination than that
>> >> > of the input predicate in order to avoid a performance overhead.
>> >> >
>> >> > This of course has the problem that it increases register pressure and so
>> should
>> >> > be done with care.  Additionally not all micro-architectures have this
>> >> > consideration and so it shouldn't be done as a default thing.
>> >> >
>> >> > The patch series adds support for doing conditional early clobbers through a
>> >> > combination of new alternatives and attributes to control their availability.
>> >>
>> >> You could have two alternatives, one with early clobber and one with
>> >> a matching constraint where you'd disparage the matching constraint one?
>> >>
>> >
>> > Yeah, that's what I do, though there's no need to disparage the non-early clobber
>> > alternative as the early clobber alternative will naturally get a penalty if it needs a
>> > reload.
>> 
>> But I think Richard's suggestion was to disparage the one with a matching
>> constraint (not the earlyclobber), to reflect the increased cost of
>> reusing the register.
>> 
>> We did take that approach for gathers, e.g.:
>> 
>>      [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
>>      [?w, Z,   0, Ui1, Ui1, Upl] ^
>> 
>> The (supposed) advantage is that, if register pressure is so tight
>> that using matching registers is the only alternative, we still
>> have the opportunity to do that, as a last resort.
>> 
>> Providing only an earlyclobber version means that using the same
>> register is prohibited outright.  If no other register is free, the RA
>> would need to spill something else to free up a temporary register.
>> And it might then do the equivalent of (pseudo-code):
>> 
>>       not p1.b, ..., p0.b
>>       mov p0.d, p1.d
>> 
>> after spilling what would otherwise have occupied p1.  In that
>> situation it would be better use:
>> 
>>       not p0.b, ..., p0.b
>> 
>> and not introduce the spill of p1.
>
> I think I understood what Richi meant, but I thought it was already working that way.

The suggestion was to use matching constraints (like "0") though,
whereas the patch doesn't.  I think your argument is that you don't
need to use matching constraints.  But that's different from the
suggestion (and from how we handle gathers).

I was going to say in response to patch 3 (but got distracted, sorry):
I don't think we should have:

   &Upa, Upa, ...
   Upa, Upa, ...

(taken from the pure logic ops) enabled at the same time.  Even though
it works for the testcases, I don't think it has well-defined semantics.

The problem is that, taken on its own, the second alternative says that
matching operands are free.  And fundamentally, I don't think the costs
*must* take the earlyclobber alternative over the non-earlyclobber one
(when costing during IRA, for instance).  In principle, the cheapest
is best.

The aim of the gather approach is to make each alternative correct in
isolation.  In:

      [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
      [?w, Z,   0, Ui1, Ui1, Upl] ^

the second alternative says that it is possible to have operands 0
and 2 be the same vector register, but using that version has the
cost of an extra reload.  In that sense the alternatives are
(essentially) consistent about the restriction.

> i.e. as one of the testcases I had:
>
>> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15]
>
> foo:
>         mov     z31.h, w0
>         ptrue   p0.b, all
>         cmplo   p0.h, p0/z, z0.h, z31.h
>         b       use
>
> and reload did not force a spill.
>
> My understanding of how this works, and how it seems to be working is that since reload costs
> Alternative from front to back the cheapest one wins and it stops evaluating the rest.
>
> The early clobber case is first and preferred, however when it's not possible, i.e. requires a non-pseudo
> reload, the reload cost is added to the alternative.
>
> However you're right that in the following testcase:
>
> -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload
>
> i.e. giving it an extra free register inexplicably causes a spill:
>
> foo:
>         addvl   sp, sp, #-1
>         mov     z31.h, w0
>         ptrue   p0.b, all
>         str     p15, [sp]
>         cmplo   p15.h, p0/z, z0.h, z31.h
>         mov     p0.b, p15.b
>         ldr     p15, [sp]
>         addvl   sp, sp, #1
>         b       use
>
> so that's unexpected and is very weird as p15 has no defined value..

This is because the function implicitly uses the SVE PCS, and so needs
to preserve p15 for the caller.  It looks like the correct behaviour.

> Now adding the ? as suggested to the non-early clobber alternative does not fix it, and my mental model for how this is supposed to work does not quite line up..
> Why would making the non-clobber alternative even more expensive help it during high register pressure??

Hopefully the above answers this.  The non-clobber alternative has
zero extra cost as things stand.  The costs from one alternative
(the earlyclobber one) don't carry forward to other alternatives.

> But with that suggestion the above case does not get fixed
> and the following case
>
> -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload
>
> ICEs:
>
> pred-clobber.c: In function 'foo':
> pred-clobber.c:9:1: error: unable to find a register to spill
>     9 | }
>       | ^
> pred-clobber.c:9:1: error: this is the insn:
> (insn 10 22 19 2 (parallel [
>             (set (reg:VNx8BI 110 [104])
>                 (unspec:VNx8BI [
>                         (reg:VNx8BI 112)
>                         (const_int 1 [0x1])
>                         (ltu:VNx8BI (reg:VNx8HI 32 v0)
>                             (reg:VNx8HI 63 v31))
>                     ] UNSPEC_PRED_Z))
>             (clobber (reg:CC_NZC 66 cc))
>         ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi}
>      (expr_list:REG_DEAD (reg:VNx8BI 112)
>         (expr_list:REG_DEAD (reg:VNx8HI 63 v31)
>             (expr_list:REG_DEAD (reg:VNx8HI 32 v0)
>                 (expr_list:REG_UNUSED (reg:CC_NZC 66 cc)
>                     (nil))))))
> during RTL pass: reload
> dump file: pred-clobber.c.315r.reload

Which pattern did you use?

> and this is because the use of ? has the unintended side-effect of blocking a register class entirely during Sched1 as we've recently discovered.
> i.e. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766

(Is sched1 the problem here, or is it purely an RA thing?  What happens
when scheduling is disabled?)

> in this case it marked the alternative as NO_REGS during sched1 and so it's completely dead.
> the use of the ? alternatives has caused quite the code bloat as we've recently discovered because of this unexpected and undocumented behavior.
>
> To me,
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 93ec59e58af..2ee3d8ea35e 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -8120,10 +8120,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>"
>     (clobber (reg:CC_NZC CC_REGNUM))]
>    "TARGET_SVE"
>    {@ [ cons: =0 , 1   , 3 , 4            ; attrs: pred_clobber ]
> -     [ &Upa     , Upl , w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
> -     [ Upa      , Upl , w , <sve_imm_con>; *                   ] ^
> -     [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
> -     [ Upa      , Upl , w , w            ; *                   ] ^
> +     [ ^&Upa    , Upl , w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
> +     [  Upa     , Upl , w , <sve_imm_con>; *                   ] ^
> +     [ ^&Upa    , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
> +     [  Upa     , Upl , w , w            ; *                   ] ^
>    }
>  )
>
> Would have been the right approach, i.e. we prefer the alternative unless a reload is needed, which should work no? well. if ^ wasn't broken the same way
> as ?.  Perhaps I need to use Wilco's new alternative that doesn't block a register class?

Hmm, I'm not sure.  It seems odd to mark only the output with ^, since
reloading the output isn't fundamentally different (costwise) from
reloading the input.

But to me, it's the alternative without the earlyclobber that should be
disparaged, since it's the inherently expensive one.

The gather-like approach would be something like:

     [ &Upa     , Upl , w , <sve_imm_con>; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
     [ ?Upl     , 0   , w , <sve_imm_con>; yes                 ] ^
     [ Upa      , Upl , w , <sve_imm_con>; no                  ] ^
     [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
     [ ?Upl     , 0   , w , w            ; yes                 ] ^
     [ Upa      , Upl , w , w            ; no                  ] ^

with:

  (define_attr "pred_clobber" "any,no,yes" (const_string "any"))

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
  2024-05-15 21:31         ` Richard Sandiford
@ 2024-05-16  2:45           ` Tamar Christina
  2024-05-21  3:24           ` Tamar Christina
  1 sibling, 0 replies; 19+ messages in thread
From: Tamar Christina @ 2024-05-16  2:45 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw,
	Marcus Shawcroft, ktkachov

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Wednesday, May 15, 2024 10:31 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus
> Shawcroft <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org
> Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain
> operations.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
> >> >> <tamar.christina@arm.com> wrote:
> >> >> >
> >> >> > Hi All,
> >> >> >
> >> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that
> state
> >> >> > that for predicated operations that also produce a predicate it is preferred
> >> >> > that the codegen should use a different register for the destination than
> that
> >> >> > of the input predicate in order to avoid a performance overhead.
> >> >> >
> >> >> > This of course has the problem that it increases register pressure and so
> >> should
> >> >> > be done with care.  Additionally not all micro-architectures have this
> >> >> > consideration and so it shouldn't be done as a default thing.
> >> >> >
> >> >> > The patch series adds support for doing conditional early clobbers through
> a
> >> >> > combination of new alternatives and attributes to control their availability.
> >> >>
> >> >> You could have two alternatives, one with early clobber and one with
> >> >> a matching constraint where you'd disparage the matching constraint one?
> >> >>
> >> >
> >> > Yeah, that's what I do, though there's no need to disparage the non-early
> clobber
> >> > alternative as the early clobber alternative will naturally get a penalty if it
> needs a
> >> > reload.
> >>
> >> But I think Richard's suggestion was to disparage the one with a matching
> >> constraint (not the earlyclobber), to reflect the increased cost of
> >> reusing the register.
> >>
> >> We did take that approach for gathers, e.g.:
> >>
> >>      [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
> >>      [?w, Z,   0, Ui1, Ui1, Upl] ^
> >>
> >> The (supposed) advantage is that, if register pressure is so tight
> >> that using matching registers is the only alternative, we still
> >> have the opportunity to do that, as a last resort.
> >>
> >> Providing only an earlyclobber version means that using the same
> >> register is prohibited outright.  If no other register is free, the RA
> >> would need to spill something else to free up a temporary register.
> >> And it might then do the equivalent of (pseudo-code):
> >>
> >>       not p1.b, ..., p0.b
> >>       mov p0.d, p1.d
> >>
> >> after spilling what would otherwise have occupied p1.  In that
> >> situation it would be better use:
> >>
> >>       not p0.b, ..., p0.b
> >>
> >> and not introduce the spill of p1.
> >
> > I think I understood what Richi meant, but I thought it was already working that
> way.
> 
> The suggestion was to use matching constraints (like "0") though,
> whereas the patch doesn't.  I think your argument is that you don't
> need to use matching constraints.  But that's different from the
> suggestion (and from how we handle gathers).
> 
> I was going to say in response to patch 3 (but got distracted, sorry):
> I don't think we should have:
> 
>    &Upa, Upa, ...
>    Upa, Upa, ...
> 
> (taken from the pure logic ops) enabled at the same time.  Even though
> it works for the testcases, I don't think it has well-defined semantics.
> 
> The problem is that, taken on its own, the second alternative says that
> matching operands are free.  And fundamentally, I don't think the costs
> *must* take the earlyclobber alternative over the non-earlyclobber one
> (when costing during IRA, for instance).  In principle, the cheapest
> is best.
> 
> The aim of the gather approach is to make each alternative correct in
> isolation.  In:
> 
>       [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
>       [?w, Z,   0, Ui1, Ui1, Upl] ^
> 
> the second alternative says that it is possible to have operands 0
> and 2 be the same vector register, but using that version has the
> cost of an extra reload.  In that sense the alternatives are
> (essentially) consistent about the restriction.
> 

Oh I see!  Sorry, I read over the explicit tie in the first mail!  I understand now:
the idea is to explicitly model the tie and non-tie versions.  Got it.

> > i.e. as one of the testcases I had:
> >
> >> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-
> p[1-15]
> >
> > foo:
> >         mov     z31.h, w0
> >         ptrue   p0.b, all
> >         cmplo   p0.h, p0/z, z0.h, z31.h
> >         b       use
> >
> > and reload did not force a spill.
> >
> > My understanding of how this works, and how it seems to be working is that
> since reload costs
> > Alternative from front to back the cheapest one wins and it stops evaluating the
> rest.
> >
> > The early clobber case is first and preferred, however when it's not possible, i.e.
> requires a non-pseudo
> > reload, the reload cost is added to the alternative.
> >
> > However you're right that in the following testcase:
> >
> > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-
> p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -
> ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload
> >
> > i.e. giving it an extra free register inexplicably causes a spill:
> >
> > foo:
> >         addvl   sp, sp, #-1
> >         mov     z31.h, w0
> >         ptrue   p0.b, all
> >         str     p15, [sp]
> >         cmplo   p15.h, p0/z, z0.h, z31.h
> >         mov     p0.b, p15.b
> >         ldr     p15, [sp]
> >         addvl   sp, sp, #1
> >         b       use
> >
> > so that's unexpected and is very weird as p15 has no defined value..
> 
> This is because the function implicitly uses the SVE PCS, and so needs
> to preserve p15 for the caller.  It looks like the correct behaviour.
> 
> > Now adding the ? as suggested to the non-early clobber alternative does not fix
> it, and my mental model for how this is supposed to work does not quite line up..
> > Why would making the non-clobber alternative even more expensive help it
> during high register pressure??
> 
> Hopefully the above answers this.  The non-clobber alternative has
> zero extra cost as things stand.  The costs from one alternative
> (the earlyclobber one) don't carry forward to other alternatives.
> 
> > But with that suggestion the above case does not get fixed
> > and the following case
> >
> > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-
> p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -
> ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload
> >
> > ICEs:
> >
> > pred-clobber.c: In function 'foo':
> > pred-clobber.c:9:1: error: unable to find a register to spill
> >     9 | }
> >       | ^
> > pred-clobber.c:9:1: error: this is the insn:
> > (insn 10 22 19 2 (parallel [
> >             (set (reg:VNx8BI 110 [104])
> >                 (unspec:VNx8BI [
> >                         (reg:VNx8BI 112)
> >                         (const_int 1 [0x1])
> >                         (ltu:VNx8BI (reg:VNx8HI 32 v0)
> >                             (reg:VNx8HI 63 v31))
> >                     ] UNSPEC_PRED_Z))
> >             (clobber (reg:CC_NZC 66 cc))
> >         ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi}
> >      (expr_list:REG_DEAD (reg:VNx8BI 112)
> >         (expr_list:REG_DEAD (reg:VNx8HI 63 v31)
> >             (expr_list:REG_DEAD (reg:VNx8HI 32 v0)
> >                 (expr_list:REG_UNUSED (reg:CC_NZC 66 cc)
> >                     (nil))))))
> > during RTL pass: reload
> > dump file: pred-clobber.c.315r.reload
> 
> Which pattern did you use?
> 
> > and this is because the use of ? has the unintended side-effect of blocking a
> register class entirely during Sched1 as we've recently discovered.
> > i.e. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766
> 
> (Is sched1 the problem here, or is it purely an RA thing?  What happens
> when scheduling is disabled?)
> 
> > in this case it marked the alternative as NO_REGS during sched1 and so it's
> completely dead.
> > the use of the ? alternatives has caused quite the code bloat as we've recently
> discovered because of this unexpected and undocumented behavior.
> >
> > To me,
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-
> sve.md
> > index 93ec59e58af..2ee3d8ea35e 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -8120,10 +8120,10 @@ (define_insn
> "@aarch64_pred_cmp<cmp_op><mode>"
> >     (clobber (reg:CC_NZC CC_REGNUM))]
> >    "TARGET_SVE"
> >    {@ [ cons: =0 , 1   , 3 , 4            ; attrs: pred_clobber ]
> > -     [ &Upa     , Upl , w , <sve_imm_con>; yes                 ]
> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
> > -     [ Upa      , Upl , w , <sve_imm_con>; *                   ] ^
> > -     [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>,
> %1/z, %3.<Vetype>, %4.<Vetype>
> > -     [ Upa      , Upl , w , w            ; *                   ] ^
> > +     [ ^&Upa    , Upl , w , <sve_imm_con>; yes                 ]
> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
> > +     [  Upa     , Upl , w , <sve_imm_con>; *                   ] ^
> > +     [ ^&Upa    , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>,
> %1/z, %3.<Vetype>, %4.<Vetype>
> > +     [  Upa     , Upl , w , w            ; *                   ] ^
> >    }
> >  )
> >
> > Would have been the right approach, i.e. we prefer the alternative unless a reload
> is needed, which should work no? well. if ^ wasn't broken the same way
> > as ?.  Perhaps I need to use Wilco's new alternative that doesn't block a register
> class?
> 
> Hmm, I'm not sure.  It seems odd to mark only the output with ^, since
> reloading the output isn't fundamentally different (costwise) from
> reloading the input.
> 
> But to me, it's the alternative without the earlyclobber that should be
> disparaged, since it's the inherently expensive one.
> 
> The gather-like approach would be something like:
> 
>      [ &Upa     , Upl , w , <sve_imm_con>; yes                 ]
> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
>      [ ?Upl     , 0   , w , <sve_imm_con>; yes                 ] ^
>      [ Upa      , Upl , w , <sve_imm_con>; no                  ] ^
>      [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z,
> %3.<Vetype>, %4.<Vetype>
>      [ ?Upl     , 0   , w , w            ; yes                 ] ^
>      [ Upa      , Upl , w , w            ; no                  ] ^
> 
> with:
> 
>   (define_attr "pred_clobber" "any,no,yes" (const_string "any"))

Yeah, this makes sense to me.  Sorry, I completely misunderstood: the alternative
with the tie was suggested in addition to, not instead of, the earlyclobber one.

I'll respin the patches this way.

Thanks both!
Tamar

> 
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
  2024-05-15 21:31         ` Richard Sandiford
  2024-05-16  2:45           ` Tamar Christina
@ 2024-05-21  3:24           ` Tamar Christina
  1 sibling, 0 replies; 19+ messages in thread
From: Tamar Christina @ 2024-05-21  3:24 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw,
	Marcus Shawcroft, ktkachov

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Wednesday, May 15, 2024 10:31 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus
> Shawcroft <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org
> Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain
> operations.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
> >> >> <tamar.christina@arm.com> wrote:
> >> >> >
> >> >> > Hi All,
> >> >> >
> >> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that
> state
> >> >> > that for predicated operations that also produce a predicate it is preferred
> >> >> > that the codegen should use a different register for the destination than
> that
> >> >> > of the input predicate in order to avoid a performance overhead.
> >> >> >
> >> >> > This of course has the problem that it increases register pressure and so
> >> should
> >> >> > be done with care.  Additionally not all micro-architectures have this
> >> >> > consideration and so it shouldn't be done as a default thing.
> >> >> >
> >> >> > The patch series adds support for doing conditional early clobbers through
> a
> >> >> > combination of new alternatives and attributes to control their availability.
> >> >>
> >> >> You could have two alternatives, one with early clobber and one with
> >> >> a matching constraint where you'd disparage the matching constraint one?
> >> >>
> >> >
> >> > Yeah, that's what I do, though there's no need to disparage the non-early
> clobber
> >> > alternative as the early clobber alternative will naturally get a penalty if it
> needs a
> >> > reload.
> >>
> >> But I think Richard's suggestion was to disparage the one with a matching
> >> constraint (not the earlyclobber), to reflect the increased cost of
> >> reusing the register.
> >>
> >> We did take that approach for gathers, e.g.:
> >>
> >>      [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
> >>      [?w, Z,   0, Ui1, Ui1, Upl] ^
> >>
> >> The (supposed) advantage is that, if register pressure is so tight
> >> that using matching registers is the only alternative, we still
> >> have the opportunity to do that, as a last resort.
> >>
> >> Providing only an earlyclobber version means that using the same
> >> register is prohibited outright.  If no other register is free, the RA
> >> would need to spill something else to free up a temporary register.
> >> And it might then do the equivalent of (pseudo-code):
> >>
> >>       not p1.b, ..., p0.b
> >>       mov p0.d, p1.d
> >>
> >> after spilling what would otherwise have occupied p1.  In that
> >> situation it would be better use:
> >>
> >>       not p0.b, ..., p0.b
> >>
> >> and not introduce the spill of p1.
> >
> > I think I understood what Richi meant, but I thought it was already working that
> way.
> 
> The suggestion was to use matching constraints (like "0") though,
> whereas the patch doesn't.  I think your argument is that you don't
> need to use matching constraints.  But that's different from the
> suggestion (and from how we handle gathers).
> 
> I was going to say in response to patch 3 (but got distracted, sorry):
> I don't think we should have:
> 
>    &Upa, Upa, ...
>    Upa, Upa, ...
> 
> (taken from the pure logic ops) enabled at the same time.  Even though
> it works for the testcases, I don't think it has well-defined semantics.
> 
> The problem is that, taken on its own, the second alternative says that
> matching operands are free.  And fundamentally, I don't think the costs
> *must* take the earlyclobber alternative over the non-earlyclobber one
> (when costing during IRA, for instance).  In principle, the cheapest
> is best.
> 
> The aim of the gather approach is to make each alternative correct in
> isolation.  In:
> 
>       [&w, Z,   w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s]
>       [?w, Z,   0, Ui1, Ui1, Upl] ^
> 
> the second alternative says that it is possible to have operands 0
> and 2 be the same vector register, but using that version has the
> cost of an extra reload.  In that sense the alternatives are
> (essentially) consistent about the restriction.
> 
> > i.e. as one of the testcases I had:
> >
> >> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-
> p[1-15]
> >
> > foo:
> >         mov     z31.h, w0
> >         ptrue   p0.b, all
> >         cmplo   p0.h, p0/z, z0.h, z31.h
> >         b       use
> >
> > and reload did not force a spill.
> >
> > My understanding of how this works, and how it seems to be working is that
> since reload costs
> > Alternative from front to back the cheapest one wins and it stops evaluating the
> rest.
> >
> > The early clobber case is first and preferred, however when it's not possible, i.e.
> requires a non-pseudo
> > reload, the reload cost is added to the alternative.
> >
> > However you're right that in the following testcase:
> >
> > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-
> p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -
> ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload
> >
> > i.e. giving it an extra free register inexplicably causes a spill:
> >
> > foo:
> >         addvl   sp, sp, #-1
> >         mov     z31.h, w0
> >         ptrue   p0.b, all
> >         str     p15, [sp]
> >         cmplo   p15.h, p0/z, z0.h, z31.h
> >         mov     p0.b, p15.b
> >         ldr     p15, [sp]
> >         addvl   sp, sp, #1
> >         b       use
> >
> > so that's unexpected and is very weird as p15 has no defined value..
> 
> This is because the function implicitly uses the SVE PCS, and so needs
> to preserve p15 for the caller.  It looks like the correct behaviour.

Sure, but p15 isn't live after the call.
It is somewhat of a regression in that if it had chosen the tie version
then p15 wouldn't need preserving.

It's a bit of an artificial case, I guess, but are we OK with this regression?
Or is there a way to query df to see whether a value is live after the call?

I can only see ways to tell whether the register is live before the call...
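
The best I can think of for "after" is to simulate backwards from the block's
live-out set, along the lines of the sketch below.  This is purely hypothetical
(not from this series), and assumes df's LR problem is up to date and that the
helper sits somewhere in-tree with rtl.h and df.h already in scope:

/* Hypothetical helper: return true if hard register REGNO is live
   immediately after INSN.  */
static bool
reg_live_after_insn_p (rtx_insn *insn, unsigned int regno)
{
  basic_block bb = BLOCK_FOR_INSN (insn);
  auto_bitmap live;
  /* Start from the registers live on exit from the block...  */
  bitmap_copy (live, df_get_live_out (bb));
  /* ...and simulate backwards until the point just after INSN.  */
  df_simulate_initialize_backwards (bb, live);
  for (rtx_insn *i = BB_END (bb); i != insn; i = PREV_INSN (i))
    if (NONDEBUG_INSN_P (i))
      df_simulate_one_insn_backwards (bb, i, live);
  return bitmap_bit_p (live, regno);
}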

Thanks,
Tamar

> 
> > Now adding the ? as suggested to the non-early clobber alternative does not fix
> it, and my mental model for how this is supposed to work does not quite line up..
> > Why would making the non-clobber alternative even more expensive help it
> during high register pressure??
> 
> Hopefully the above answers this.  The non-clobber alternative has
> zero extra cost as things stand.  The costs from one alternative
> (the earlyclobber one) don't carry forward to other alternatives.
> 
> > But with that suggestion the above case does not get fixed
> > and the following case
> >
> > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-
> p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -
> ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload
> >
> > ICEs:
> >
> > pred-clobber.c: In function 'foo':
> > pred-clobber.c:9:1: error: unable to find a register to spill
> >     9 | }
> >       | ^
> > pred-clobber.c:9:1: error: this is the insn:
> > (insn 10 22 19 2 (parallel [
> >             (set (reg:VNx8BI 110 [104])
> >                 (unspec:VNx8BI [
> >                         (reg:VNx8BI 112)
> >                         (const_int 1 [0x1])
> >                         (ltu:VNx8BI (reg:VNx8HI 32 v0)
> >                             (reg:VNx8HI 63 v31))
> >                     ] UNSPEC_PRED_Z))
> >             (clobber (reg:CC_NZC 66 cc))
> >         ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi}
> >      (expr_list:REG_DEAD (reg:VNx8BI 112)
> >         (expr_list:REG_DEAD (reg:VNx8HI 63 v31)
> >             (expr_list:REG_DEAD (reg:VNx8HI 32 v0)
> >                 (expr_list:REG_UNUSED (reg:CC_NZC 66 cc)
> >                     (nil))))))
> > during RTL pass: reload
> > dump file: pred-clobber.c.315r.reload
> 
> Which pattern did you use?
> 
> > and this is because the use of ? has the unintended side-effect of blocking a
> register class entirely during Sched1 as we've recently discovered.
> > i.e. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766
> 
> (Is sched1 the problem here, or is it purely an RA thing?  What happens
> when scheduling is disabled?)
> 
> > in this case it marked the alternative as NO_REGS during sched1 and so it's
> completely dead.
> > the use of the ? alternatives has caused quite the code bloat as we've recently
> discovered because of this unexpected and undocumented behavior.
> >
> > To me,
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-
> sve.md
> > index 93ec59e58af..2ee3d8ea35e 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -8120,10 +8120,10 @@ (define_insn
> "@aarch64_pred_cmp<cmp_op><mode>"
> >     (clobber (reg:CC_NZC CC_REGNUM))]
> >    "TARGET_SVE"
> >    {@ [ cons: =0 , 1   , 3 , 4            ; attrs: pred_clobber ]
> > -     [ &Upa     , Upl , w , <sve_imm_con>; yes                 ]
> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
> > -     [ Upa      , Upl , w , <sve_imm_con>; *                   ] ^
> > -     [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>,
> %1/z, %3.<Vetype>, %4.<Vetype>
> > -     [ Upa      , Upl , w , w            ; *                   ] ^
> > +     [ ^&Upa    , Upl , w , <sve_imm_con>; yes                 ]
> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
> > +     [  Upa     , Upl , w , <sve_imm_con>; *                   ] ^
> > +     [ ^&Upa    , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>,
> %1/z, %3.<Vetype>, %4.<Vetype>
> > +     [  Upa     , Upl , w , w            ; *                   ] ^
> >    }
> >  )
> >
> > Would have been the right approach, i.e. we prefer the alternative unless a reload
> is needed, which should work no? well. if ^ wasn't broken the same way
> > as ?.  Perhaps I need to use Wilco's new alternative that doesn't block a register
> class?
> 
> Hmm, I'm not sure.  It seems odd to mark only the output with ^, since
> reloading the output isn't fundamentally different (costwise) from
> reloading the input.
> 
> But to me, it's the alternative without the earlyclobber that should be
> disparaged, since it's the inherently expensive one.
> 
> The gather-like approach would be something like:
> 
>      [ &Upa     , Upl , w , <sve_imm_con>; yes                 ]
> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
>      [ ?Upl     , 0   , w , <sve_imm_con>; yes                 ] ^
>      [ Upa      , Upl , w , <sve_imm_con>; no                  ] ^
>      [ &Upa     , Upl , w , w            ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z,
> %3.<Vetype>, %4.<Vetype>
>      [ ?Upl     , 0   , w , w            ; yes                 ] ^
>      [ Upa      , Upl , w , w            ; no                  ] ^
> 
> with:
> 
>   (define_attr "pred_clobber" "any,no,yes" (const_string "any"))
> 
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber
  2024-05-15 10:56   ` Richard Sandiford
  2024-05-15 11:03     ` Tamar Christina
@ 2024-05-22  9:29     ` Tamar Christina
  2024-05-28  9:37       ` Tamar Christina
  1 sibling, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2024-05-22  9:29 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov

[-- Attachment #1: Type: text/plain, Size: 6016 bytes --]

> 
> Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
> (I'm open to other suggestions.)  Just looking for something that describes
> either the architecture or the end result that we want to achieve.
> And preferable something fairly short :)
> 
> avoid_* would be consistent with the existing "avoid_cross_loop_fma".
> 
> > +
> >  #undef AARCH64_EXTRA_TUNING_OPTION
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index
> bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5
> 6b46c74084ba7c3c 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
> AARCH64_FL_SM_OFF;
> >      enabled through +gcs.  */
> >  #define TARGET_GCS (AARCH64_ISA_GCS)
> >
> > +/*  Prefer different predicate registers for the output of a predicated operation
> over
> > +    re-using an existing input predicate.  */
> > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
> > +				 && (aarch64_tune_params.extra_tuning_flags \
> > +				     &
> AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
> >
> >  /* Standard register usage.  */
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index
> dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a
> 53473b478c5ddba82 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string
> "any"))
> >  ;; target-independent code.
> >  (define_attr "is_call" "no,yes" (const_string "no"))
> >
> > +;; Indicates whether we want to enable the pattern with an optional early
> > +;; clobber for SVE predicates.
> > +(define_attr "pred_clobber" "no,yes" (const_string "no"))
> > +
> >  ;; [For compatibility with Arm in pipeline models]
> >  ;; Attribute that specifies whether or not the instruction touches fp
> >  ;; registers.
> > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
> >  (define_attr "arch_enabled" "no,yes"
> >    (if_then_else
> >      (ior
> > -	(eq_attr "arch" "any")
> > +	(and (eq_attr "arch" "any")
> > +	     (eq_attr "pred_clobber" "no"))
> >
> >  	(and (eq_attr "arch" "rcpc8_4")
> >  	     (match_test "AARCH64_ISA_RCPC8_4"))
> > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
> >  	     (match_test "TARGET_SVE"))
> >
> >  	(and (eq_attr "arch" "sme")
> > -	     (match_test "TARGET_SME")))
> > +	     (match_test "TARGET_SME"))
> > +
> > +	(and (eq_attr "pred_clobber" "yes")
> > +	     (match_test "TARGET_SVE_PRED_CLOBBER")))
> 
> IMO it'd be bettero handle pred_clobber separately from arch, as a new
> top-level AND:
> 
>   (and
>     (ior
>       (eq_attr "pred_clobber" "no")
>       (match_test "!TARGET_..."))
>     (ior
>       ...existing arch tests...))
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def
	(AVOID_PRED_RMW): New.
	* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
	* config/aarch64/aarch64.md (pred_clobber): New.
	(arch_enabled): Use it.

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
 
 AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
 
+/* Enable if the target prefers to use a fresh register for predicate outputs
+   rather than re-use an input predicate register.  */
+AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     enabled through +gcs.  */
 #define TARGET_GCS (AARCH64_ISA_GCS)
 
+/*  Prefer different predicate registers for the output of a predicated operation over
+    re-using an existing input predicate.  */
+#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
+				 && (aarch64_tune_params.extra_tuning_flags \
+				     & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
 
 /* Standard register usage.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index dbde066f7478bec51a8703b017ea553aa98be309..52e5adba4172e14b794b5df9394e58ce49ef8b7f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
 ;; target-independent code.
 (define_attr "is_call" "no,yes" (const_string "no"))
 
+;; Indicates whether we want to enable the pattern with an optional early
+;; clobber for SVE predicates.
+(define_attr "pred_clobber" "no,yes" (const_string "no"))
+
 ;; [For compatibility with Arm in pipeline models]
 ;; Attribute that specifies whether or not the instruction touches fp
 ;; registers.
@@ -460,7 +464,12 @@ (define_attr "fp" "no,yes"
 
 (define_attr "arch_enabled" "no,yes"
   (if_then_else
-    (ior
+    (and
+      (ior
+	(eq_attr "pred_clobber" "no")
+	(match_test "TARGET_SVE_PRED_CLOBBER"))
+
+      (ior
 	(eq_attr "arch" "any")
 
 	(and (eq_attr "arch" "rcpc8_4")
@@ -488,7 +497,7 @@ (define_attr "arch_enabled" "no,yes"
 	     (match_test "TARGET_SVE"))
 
 	(and (eq_attr "arch" "sme")
-	     (match_test "TARGET_SME")))
+	     (match_test "TARGET_SME"))))
     (const_string "yes")
     (const_string "no")))
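
As a usage note: assuming the existing -moverride=tune= plumbing picks up new
flags from the .def automatically (I believe it does), then once the later
patches in the series add the gated alternatives, the new behaviour can be
forced on a core that doesn't set the flag by default, e.g.:

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve -moverride=tune=avoid_pred_rmw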


[-- Attachment #2: rb18355.patch --]
[-- Type: application/octet-stream, Size: 2666 bytes --]

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
 
 AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
 
+/* Enable if the target prefers to use a fresh register for predicate outputs
+   rather than re-use an input predicate register.  */
+AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     enabled through +gcs.  */
 #define TARGET_GCS (AARCH64_ISA_GCS)
 
+/*  Prefer different predicate registers for the output of a predicated operation over
+    re-using an existing input predicate.  */
+#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
+				 && (aarch64_tune_params.extra_tuning_flags \
+				     & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
 
 /* Standard register usage.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index dbde066f7478bec51a8703b017ea553aa98be309..52e5adba4172e14b794b5df9394e58ce49ef8b7f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
 ;; target-independent code.
 (define_attr "is_call" "no,yes" (const_string "no"))
 
+;; Indicates whether we want to enable the pattern with an optional early
+;; clobber for SVE predicates.
+(define_attr "pred_clobber" "no,yes" (const_string "no"))
+
 ;; [For compatibility with Arm in pipeline models]
 ;; Attribute that specifies whether or not the instruction touches fp
 ;; registers.
@@ -460,7 +464,12 @@ (define_attr "fp" "no,yes"
 
 (define_attr "arch_enabled" "no,yes"
   (if_then_else
-    (ior
+    (and
+      (ior
+	(eq_attr "pred_clobber" "no")
+	(match_test "TARGET_SVE_PRED_CLOBBER"))
+
+      (ior
 	(eq_attr "arch" "any")
 
 	(and (eq_attr "arch" "rcpc8_4")
@@ -488,7 +497,7 @@ (define_attr "arch_enabled" "no,yes"
 	     (match_test "TARGET_SVE"))
 
 	(and (eq_attr "arch" "sme")
-	     (match_test "TARGET_SME")))
+	     (match_test "TARGET_SME"))))
     (const_string "yes")
     (const_string "no")))
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber
  2024-05-22  9:29     ` Tamar Christina
@ 2024-05-28  9:37       ` Tamar Christina
  2024-05-30 14:59         ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2024-05-28  9:37 UTC (permalink / raw)
  To: Tamar Christina, Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov

[-- Attachment #1: Type: text/plain, Size: 6779 bytes --]

> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, May 22, 2024 10:29 AM
> To: Richard Sandiford <Richard.Sandiford@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org
> Subject: RE: [PATCH 2/4]AArch64: add new tuning param and attribute for
> enabling conditional early clobber
> 
> >
> > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
> > (I'm open to other suggestions.)  Just looking for something that describes
> > either the architecture or the end result that we want to achieve.
> > And preferable something fairly short :)
> >
> > avoid_* would be consistent with the existing "avoid_cross_loop_fma".
> >
> > > +
> > >  #undef AARCH64_EXTRA_TUNING_OPTION
> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > > index
> >
> bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5
> > 6b46c74084ba7c3c 100644
> > > --- a/gcc/config/aarch64/aarch64.h
> > > +++ b/gcc/config/aarch64/aarch64.h
> > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
> > AARCH64_FL_SM_OFF;
> > >      enabled through +gcs.  */
> > >  #define TARGET_GCS (AARCH64_ISA_GCS)
> > >
> > > +/*  Prefer different predicate registers for the output of a predicated operation
> > over
> > > +    re-using an existing input predicate.  */
> > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
> > > +				 && (aarch64_tune_params.extra_tuning_flags \
> > > +				     &
> > AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
> > >
> > >  /* Standard register usage.  */
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > > index
> >
> dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a
> > 53473b478c5ddba82 100644
> > > --- a/gcc/config/aarch64/aarch64.md
> > > +++ b/gcc/config/aarch64/aarch64.md
> > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string
> > "any"))
> > >  ;; target-independent code.
> > >  (define_attr "is_call" "no,yes" (const_string "no"))
> > >
> > > +;; Indicates whether we want to enable the pattern with an optional early
> > > +;; clobber for SVE predicates.
> > > +(define_attr "pred_clobber" "no,yes" (const_string "no"))
> > > +
> > >  ;; [For compatibility with Arm in pipeline models]
> > >  ;; Attribute that specifies whether or not the instruction touches fp
> > >  ;; registers.
> > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
> > >  (define_attr "arch_enabled" "no,yes"
> > >    (if_then_else
> > >      (ior
> > > -	(eq_attr "arch" "any")
> > > +	(and (eq_attr "arch" "any")
> > > +	     (eq_attr "pred_clobber" "no"))
> > >
> > >  	(and (eq_attr "arch" "rcpc8_4")
> > >  	     (match_test "AARCH64_ISA_RCPC8_4"))
> > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
> > >  	     (match_test "TARGET_SVE"))
> > >
> > >  	(and (eq_attr "arch" "sme")
> > > -	     (match_test "TARGET_SME")))
> > > +	     (match_test "TARGET_SME"))
> > > +
> > > +	(and (eq_attr "pred_clobber" "yes")
> > > +	     (match_test "TARGET_SVE_PRED_CLOBBER")))
> >
> > IMO it'd be bettero handle pred_clobber separately from arch, as a new
> > top-level AND:
> >
> >   (and
> >     (ior
> >       (eq_attr "pred_clobber" "no")
> >       (match_test "!TARGET_..."))
> >     (ior
> >       ...existing arch tests...))
> >
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def
	(AVOID_PRED_RMW): New.
	* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
	* config/aarch64/aarch64.md (pred_clobber): New.
	(arch_enabled): Use it.

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
 
 AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
 
+/* Enable if the target prefers to use a fresh register for predicate outputs
+   rather than re-use an input predicate register.  */
+AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     enabled through +gcs.  */
 #define TARGET_GCS (AARCH64_ISA_GCS)
 
+/*  Prefer different predicate registers for the output of a predicated operation over
+    re-using an existing input predicate.  */
+#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
+				 && (aarch64_tune_params.extra_tuning_flags \
+				     & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
 
 /* Standard register usage.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index dbde066f7478bec51a8703b017ea553aa98be309..a7da3c01617eb8411029c7d2e32f13fa2cc1c833 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
 ;; target-independent code.
 (define_attr "is_call" "no,yes" (const_string "no"))
 
+;; Indicates whether we want to enable the pattern with an optional early
+;; clobber for SVE predicates.
+(define_attr "pred_clobber" "any,no,yes" (const_string "any"))
+
 ;; [For compatibility with Arm in pipeline models]
 ;; Attribute that specifies whether or not the instruction touches fp
 ;; registers.
@@ -460,7 +464,17 @@ (define_attr "fp" "no,yes"
 
 (define_attr "arch_enabled" "no,yes"
   (if_then_else
-    (ior
+    (and
+      (ior
+	(and
+	  (eq_attr "pred_clobber" "no")
+	  (match_test "!TARGET_SVE_PRED_CLOBBER"))
+	(and
+	  (eq_attr "pred_clobber" "yes")
+	  (match_test "TARGET_SVE_PRED_CLOBBER"))
+	(eq_attr "pred_clobber" "any"))
+
+      (ior
 	(eq_attr "arch" "any")
 
 	(and (eq_attr "arch" "rcpc8_4")
@@ -488,7 +502,7 @@ (define_attr "arch_enabled" "no,yes"
 	     (match_test "TARGET_SVE"))
 
 	(and (eq_attr "arch" "sme")
-	     (match_test "TARGET_SME")))
+	     (match_test "TARGET_SME"))))
     (const_string "yes")
     (const_string "no")))
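
The net effect of the new top-level AND, assuming the per-alternative
"arch" test already passes, is:

  pred_clobber | TARGET_SVE_PRED_CLOBBER | arch_enabled
  -------------+-------------------------+-------------
  any          | false or true           | yes
  no           | false                   | yes
  no           | true                    | no
  yes          | true                    | yes
  yes          | false                   | no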

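To make the intended use concrete, here is a sketch of how a predicate
pattern could consume the attribute (illustrative only; the real pattern
changes are in patch 3/4 of the series, and the exact constraints there
may differ):

;; Sketch: alternative 0 takes the early-clobber form, alternative 1
;; keeps the tie available but disparaged, and alternative 2 preserves
;; the current behaviour on targets without the tuning flag.
(define_insn "and<mode>3"
  [(set (match_operand:PRED_ALL 0 "register_operand")
	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
		      (match_operand:PRED_ALL 2 "register_operand")))]
  "TARGET_SVE"
  {@ [ cons: =0 , 1   , 2   ; attrs: pred_clobber ]
     [ &Upa     , Upa , Upa ; yes                 ] and\t%0.b, %1/z, %1.b, %2.b
     [ ?Upa     , 0   , Upa ; yes                 ] ^
     [ Upa      , Upa , Upa ; no                  ] ^
  }
)

When TARGET_SVE_PRED_CLOBBER holds, only the first two alternatives are
enabled; the '?' disparagement keeps the tied form available but more
expensive, so it is only chosen when a fresh predicate register is not
cheaply available.  Otherwise only the last alternative is enabled.
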

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber
  2024-05-28  9:37       ` Tamar Christina
@ 2024-05-30 14:59         ` Richard Sandiford
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Sandiford @ 2024-05-30 14:59 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Tamar Christina <Tamar.Christina@arm.com>
>> Sent: Wednesday, May 22, 2024 10:29 AM
>> To: Richard Sandiford <Richard.Sandiford@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org
>> Subject: RE: [PATCH 2/4]AArch64: add new tuning param and attribute for
>> enabling conditional early clobber
>>
>> >
>> > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
>> > (I'm open to other suggestions.)  Just looking for something that describes
>> > either the architecture or the end result that we want to achieve.
>> > And preferably something fairly short :)
>> >
>> > avoid_* would be consistent with the existing "avoid_cross_loop_fma".
>> >
>> > > +
>> > >  #undef AARCH64_EXTRA_TUNING_OPTION
>> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> > > index bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d56b46c74084ba7c3c 100644
>> > > --- a/gcc/config/aarch64/aarch64.h
>> > > +++ b/gcc/config/aarch64/aarch64.h
>> > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
>> > AARCH64_FL_SM_OFF;
>> > >      enabled through +gcs.  */
>> > >  #define TARGET_GCS (AARCH64_ISA_GCS)
>> > >
>> > > +/*  Prefer different predicate registers for the output of a predicated operation over
>> > > +    re-using an existing input predicate.  */
>> > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
>> > > +                          && (aarch64_tune_params.extra_tuning_flags \
>> > > +                              & AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
>> > >
>> > >  /* Standard register usage.  */
>> > >
>> > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> > > index dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a53473b478c5ddba82 100644
>> > > --- a/gcc/config/aarch64/aarch64.md
>> > > +++ b/gcc/config/aarch64/aarch64.md
>> > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
>> > >  ;; target-independent code.
>> > >  (define_attr "is_call" "no,yes" (const_string "no"))
>> > >
>> > > +;; Indicates whether we want to enable the pattern with an optional early
>> > > +;; clobber for SVE predicates.
>> > > +(define_attr "pred_clobber" "no,yes" (const_string "no"))
>> > > +
>> > >  ;; [For compatibility with Arm in pipeline models]
>> > >  ;; Attribute that specifies whether or not the instruction touches fp
>> > >  ;; registers.
>> > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
>> > >  (define_attr "arch_enabled" "no,yes"
>> > >    (if_then_else
>> > >      (ior
>> > > - (eq_attr "arch" "any")
>> > > + (and (eq_attr "arch" "any")
>> > > +      (eq_attr "pred_clobber" "no"))
>> > >
>> > >   (and (eq_attr "arch" "rcpc8_4")
>> > >        (match_test "AARCH64_ISA_RCPC8_4"))
>> > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
>> > >        (match_test "TARGET_SVE"))
>> > >
>> > >   (and (eq_attr "arch" "sme")
>> > > -      (match_test "TARGET_SME")))
>> > > +      (match_test "TARGET_SME"))
>> > > +
>> > > + (and (eq_attr "pred_clobber" "yes")
>> > > +      (match_test "TARGET_SVE_PRED_CLOBBER")))
>> >
>> > IMO it'd be better to handle pred_clobber separately from arch, as a new
>> > top-level AND:
>> >
>> >   (and
>> >     (ior
>> >       (eq_attr "pred_clobber" "no")
>> >       (match_test "!TARGET_..."))
>> >     (ior
>> >       ...existing arch tests...))
>> >
>>
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-tuning-flags.def
>         (AVOID_PRED_RMW): New.
>         * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
>         * config/aarch64/aarch64.md (pred_clobber): New.
>         (arch_enabled): Use it.
>
> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> index d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
>
>  AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
>
> +/* Enable if the target prefers to use a fresh register for predicate outputs
> +   rather than re-use an input predicate register.  */
> +AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
> +
>  #undef AARCH64_EXTRA_TUNING_OPTION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
>      enabled through +gcs.  */
>  #define TARGET_GCS (AARCH64_ISA_GCS)
>
> +/*  Prefer different predicate registers for the output of a predicated operation over
> +    re-using an existing input predicate.  */

Formatting nit (sorry for not noticing last time):

/* Prefer different predicate registers for the output of a predicated
   operation over re-using an existing input predicate.  */

(avoiding an extra space after "/*" and wrapping at 80 columns).

OK with that change, thanks.

Richard


> +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
> +                                && (aarch64_tune_params.extra_tuning_flags \
> +                                    & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
>
>  /* Standard register usage.  */
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index dbde066f7478bec51a8703b017ea553aa98be309..a7da3c01617eb8411029c7d2e32f13fa2cc1c833 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any"))
>  ;; target-independent code.
>  (define_attr "is_call" "no,yes" (const_string "no"))
>
> +;; Indicates whether we want to enable the pattern with an optional early
> +;; clobber for SVE predicates.
> +(define_attr "pred_clobber" "any,no,yes" (const_string "any"))
> +
>  ;; [For compatibility with Arm in pipeline models]
>  ;; Attribute that specifies whether or not the instruction touches fp
>  ;; registers.
> @@ -460,7 +464,17 @@ (define_attr "fp" "no,yes"
>
>  (define_attr "arch_enabled" "no,yes"
>    (if_then_else
> -    (ior
> +    (and
> +      (ior
> +       (and
> +         (eq_attr "pred_clobber" "no")
> +         (match_test "!TARGET_SVE_PRED_CLOBBER"))
> +       (and
> +         (eq_attr "pred_clobber" "yes")
> +         (match_test "TARGET_SVE_PRED_CLOBBER"))
> +       (eq_attr "pred_clobber" "any"))
> +
> +      (ior
>         (eq_attr "arch" "any")
>
>         (and (eq_attr "arch" "rcpc8_4")
> @@ -488,7 +502,7 @@ (define_attr "arch_enabled" "no,yes"
>              (match_test "TARGET_SVE"))
>
>         (and (eq_attr "arch" "sme")
> -            (match_test "TARGET_SME")))
> +            (match_test "TARGET_SME"))))
>      (const_string "yes")
>      (const_string "no")))

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
2024-05-15 10:35   ` Kyrill Tkachov
2024-05-15 11:06   ` Richard Sandiford
2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina
2024-05-15 10:56   ` Richard Sandiford
2024-05-15 11:03     ` Tamar Christina
2024-05-22  9:29     ` Tamar Christina
2024-05-28  9:37       ` Tamar Christina
2024-05-30 14:59         ` Richard Sandiford
2024-05-15 10:29 ` [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina
2024-05-15 10:29 ` [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores Tamar Christina
2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener
2024-05-15 11:23   ` Tamar Christina
2024-05-15 14:51     ` Richard Sandiford
2024-05-15 15:56       ` Tamar Christina
2024-05-15 21:31         ` Richard Sandiford
2024-05-16  2:45           ` Tamar Christina
2024-05-21  3:24           ` Tamar Christina
