* [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.
@ 2024-05-15 10:28 Tamar Christina
  2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread

From: Tamar Christina @ 2024-05-15 10:28 UTC (permalink / raw)
To: gcc-patches
Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford

Hi All,

Some Neoverse Software Optimization Guides (SWOG) have a clause stating that,
for predicated operations that also produce a predicate, it is preferred that
the codegen use a different register for the destination than for the input
predicate, in order to avoid a performance overhead.

This of course has the problem that it increases register pressure, and so it
should be done with care.  Additionally, not all micro-architectures have this
consideration, so it shouldn't be done by default.

The patch series adds support for conditional early clobbers through a
combination of new alternatives and attributes that control their
availability.  Under high register pressure we also rely on LRA's costing to
prefer the tied alternative over the early-clobber one, since a tie is
preferable to a reload.

Concretely, this patch series does:

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2

foo:
        mov     z31.h, w0
        ptrue   p3.b, all
        cmplo   p0.h, p3/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15]

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

Testcases for the changes are in the last patch of the series.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Thanks,
Tamar

---
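For reference, the assembly snippets above are consistent with a testcase
along the following lines.  This is a hypothetical sketch reconstructed from
the generated code (the names `foo` and `use` come from the output above);
the actual pred-clobber.c testcase is the one added in the last patch of the
series, and may differ in detail.

```c
/* Hypothetical sketch of pred-clobber.c, reconstructed from the assembly
   shown above; the real testcase lands in patch 4/4.  Requires an
   SVE-enabled aarch64 compiler, e.g.:
     aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2  */
#include <arm_sve.h>

extern void use (svbool_t);

void
foo (svuint16_t a, uint16_t b)
{
  /* Unsigned "lower than" compare generates CMPLO: a predicated operation
     producing a predicate result, i.e. the case the early clobber targets.  */
  use (svcmplt_n_u16 (svptrue_b8 (), a, b));
}
```

With the conditional early clobber in effect (first snippet), the compare
writes p0 while the governing predicate lives in p3; without it, or when the
predicate registers are exhausted, the destination is tied to the input
predicate (p0 in both roles).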
* [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax
  2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
@ 2024-05-15 10:28 ` Tamar Christina
  2024-05-15 10:35   ` Kyrill Tkachov
  2024-05-15 11:06   ` Richard Sandiford
  2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread

From: Tamar Christina @ 2024-05-15 10:28 UTC (permalink / raw)
To: gcc-patches
Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford

Hi All,

This converts the single-alternative patterns to the new compact syntax, so
that when I add the new alternatives it's clearer what's being changed.

Note that this will emit a number of warnings from geninsn, since it warns
that @ is useless for a single-alternative pattern.  These warnings are not
fatal, so they won't break the build, and they are only temporary.

No change in functionality is expected with this patch.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>, *aarch64_brkn_cc,
	*aarch64_brkn_ptest, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
	*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc):
	Convert to compact syntax.
	* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>):
	Likewise.

---
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 0434358122d2fde71bd0e0f850338e739e9be02c..839ab0627747d7a49bef7b0192ee9e7a42587ca0 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -1156,76 +1156,86 @@ (define_insn "aarch64_rdffr"
 
 ;; Likewise with zero predication.
 (define_insn "aarch64_rdffr_z"
-  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+  [(set (match_operand:VNx16BI 0 "register_operand")
 	(and:VNx16BI (reg:VNx16BI FFRT_REGNUM)
-		     (match_operand:VNx16BI 1 "register_operand" "Upa")))]
+		     (match_operand:VNx16BI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffr\t%0.b, %1/z"
+  {@ [ cons: =0, 1   ]
+     [ Upa     , Upa ] rdffr\t%0.b, %1/z
+  }
 )
 
 ;; Read the FFR to test for a fault, without using the predicate result.
 (define_insn "*aarch64_rdffr_z_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (and:VNx16BI (reg:VNx16BI FFRT_REGNUM)
 			(match_dup 1))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1  , 2 ]
+     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; Same for unpredicated RDFFR when tested with a known PTRUE.
 (define_insn "*aarch64_rdffr_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1   ]
+     [ Upa     , Upa ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; Read the FFR with zero predication and test the result.
 (define_insn "*aarch64_rdffr_z_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (and:VNx16BI (reg:VNx16BI FFRT_REGNUM)
 			(match_dup 1))]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(and:VNx16BI (reg:VNx16BI FFRT_REGNUM)
 		     (match_dup 1)))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1  , 2 ]
+     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; Same for unpredicated RDFFR when tested with a known PTRUE.
 (define_insn "*aarch64_rdffr_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(reg:VNx16BI FFRT_REGNUM))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  "rdffrs\t%0.b, %1/z"
+  {@ [ cons: =0, 1  , 2 ]
+     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
+  }
 )
 
 ;; [R3 in the block comment above about FFR handling]
@@ -6637,11 +6647,13 @@ (define_insn "@aarch64_pred_<optab><mode>"
 ;; Doubling the second operand is the preferred implementation
 ;; of the MOV alias, so we use that instead of %1/z, %1, %2.
 (define_insn "and<mode>3"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
-	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
-		      (match_operand:PRED_ALL 2 "register_operand" "Upa")))]
+  [(set (match_operand:PRED_ALL 0 "register_operand")
+	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
+		      (match_operand:PRED_ALL 2 "register_operand")))]
   "TARGET_SVE"
-  "and\t%0.b, %1/z, %2.b, %2.b"
+  {@ [ cons: =0, 1  , 2   ]
+     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
+  }
 )
 
 ;; Unpredicated predicate EOR and ORR.
@@ -6660,14 +6672,16 @@ (define_expand "<optab><mode>3"
 
 ;; Predicated predicate AND, EOR and ORR.
 (define_insn "@aarch64_pred_<optab><mode>_z"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+  [(set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL
 	  (LOGICAL:PRED_ALL
-	    (match_operand:PRED_ALL 2 "register_operand" "Upa")
-	    (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	  (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+	    (match_operand:PRED_ALL 2 "register_operand")
+	    (match_operand:PRED_ALL 3 "register_operand"))
+	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  "<logical>\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3   ]
+     [ Upa     , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Perform a logical operation on operands 2 and 3, using operand 1 as
@@ -6676,38 +6690,42 @@ (define_insn "@aarch64_pred_<optab><mode>_z"
 (define_insn "*<optab><mode>3_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (LOGICAL:PRED_ALL
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa")
-	       (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+	       (match_operand:PRED_ALL 2 "register_operand")
+	       (match_operand:PRED_ALL 3 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+   (set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same with just the flags result.
 (define_insn "*<optab><mode>3_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (LOGICAL:PRED_ALL
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa")
-	       (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+	       (match_operand:PRED_ALL 2 "register_operand")
+	       (match_operand:PRED_ALL 3 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -6720,56 +6738,62 @@ (define_insn "*<optab><mode>3_ptest"
 
 ;; Predicated predicate BIC and ORN.
 (define_insn "aarch64_pred_<nlogical><mode>_z"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+  [(set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL
 	  (NLOGICAL:PRED_ALL
-	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	    (match_operand:PRED_ALL 2 "register_operand" "Upa"))
-	  (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))
+	    (match_operand:PRED_ALL 2 "register_operand"))
+	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  "<nlogical>\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3   ]
+     [ Upa     , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same, but set the flags as a side-effect.
 (define_insn "*<nlogical><mode>3_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 3 "register_operand"))
+	       (match_operand:PRED_ALL 2 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+   (set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL (NLOGICAL:PRED_ALL (not:PRED_ALL (match_dup 3))
 					 (match_dup 2))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  "<nlogical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same with just the flags result.
 (define_insn "*<nlogical><mode>3_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-	       (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 3 "register_operand"))
+	       (match_operand:PRED_ALL 2 "register_operand"))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "<nlogical>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -6782,58 +6806,64 @@ (define_insn "*<nlogical><mode>3_ptest"
 
 ;; Predicated predicate NAND and NOR.
 (define_insn "aarch64_pred_<logical_nn><mode>_z"
-  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+  [(set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL
 	  (NLOGICAL:PRED_ALL
-	    (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
-	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa")))
-	  (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+	    (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand"))
+	    (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand")))
+	  (match_operand:PRED_ALL 1 "register_operand")))]
   "TARGET_SVE"
-  "<logical_nn>\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3   ]
+     [ Upa     , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same, but set the flags as a side-effect.
 (define_insn "*<logical_nn><mode>3_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 2 "register_operand"))
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa")))
+		 (match_operand:PRED_ALL 3 "register_operand")))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+   (set (match_operand:PRED_ALL 0 "register_operand")
 	(and:PRED_ALL (NLOGICAL:PRED_ALL (not:PRED_ALL (match_dup 2))
 					 (not:PRED_ALL (match_dup 3)))
 		      (match_dup 4)))]
   "TARGET_SVE"
-  "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same with just the flags result.
 (define_insn "*<logical_nn><mode>3_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (and:PRED_ALL
 	     (NLOGICAL:PRED_ALL
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+		 (match_operand:PRED_ALL 2 "register_operand"))
 	       (not:PRED_ALL
-		 (match_operand:PRED_ALL 3 "register_operand" "Upa")))
+		 (match_operand:PRED_ALL 3 "register_operand")))
 	     (match_dup 4))]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4, 5 ]
+     [ Upa     , Upa, Upa, Upa,  ,   ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; =========================================================================
@@ -8133,12 +8163,12 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest"
 	      (match_operand:SVE_I 3 "aarch64_sve_cmp_<sve_imm_con>_operand"))]
 	    UNSPEC_PRED_Z)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:<VPRED> 0 "=Upa, Upa"))]
+   (clobber (match_scratch:<VPRED> 0))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  {@ [ cons: 1 , 2 , 3             ]
-     [ Upl     , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
-     [ Upl     , w , w             ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
+  {@ [ cons: =0, 1  , 2 , 3             ]
+     [ Upa     , Upl, w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+     [ Upa     , Upl, w , w             ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>
   }
   "&& !rtx_equal_p (operands[4], operands[6])"
   {
@@ -8180,18 +8210,20 @@ (define_insn_and_split "*cmp<cmp_op><mode>_and"
 
 ;; Predicated integer wide comparisons.
 (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide"
-  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+  [(set (match_operand:<VPRED> 0 "register_operand")
 	(unspec:<VPRED>
-	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:SVE_FULL_BHSI 3 "register_operand" "w")
-	      (match_operand:VNx2DI 4 "register_operand" "w")]
+	     [(match_operand:SVE_FULL_BHSI 3 "register_operand")
+	      (match_operand:VNx2DI 4 "register_operand")]
 	     SVE_COND_INT_CMP_WIDE)]
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE"
-  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d"
+  {@ [ cons: =0, 1  , 2, 3, 4 ]
+     [ Upa     , Upl,  , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d
+  }
 )
 
 ;; Predicated integer wide comparisons in which both the flag and
@@ -8199,19 +8231,19 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide"
 (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:VNx16BI 6 "register_operand" "Upl")
+	     [(match_operand:VNx16BI 6 "register_operand")
 	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
 	      (unspec:<VPRED>
-		[(match_operand:SVE_FULL_BHSI 2 "register_operand" "w")
-		 (match_operand:VNx2DI 3 "register_operand" "w")]
+		[(match_operand:SVE_FULL_BHSI 2 "register_operand")
+		 (match_operand:VNx2DI 3 "register_operand")]
 		SVE_COND_INT_CMP_WIDE)]
 	     UNSPEC_PRED_Z)]
 	  UNSPEC_PTEST))
-   (set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+   (set (match_operand:<VPRED> 0 "register_operand")
 	(unspec:<VPRED>
 	  [(match_dup 6)
 	   (match_dup 7)
@@ -8222,7 +8254,9 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
 	  UNSPEC_PRED_Z))]
   "TARGET_SVE
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
+  {@ [ cons: =0, 1  , 2, 3, 4, 5, 6  , 7 ]
+     [ Upa     , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+  }
 )
 
 ;; Predicated integer wide comparisons in which only the flags result
@@ -8230,22 +8264,24 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc"
 (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_operand 4)
 	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:VNx16BI 6 "register_operand" "Upl")
+	     [(match_operand:VNx16BI 6 "register_operand")
 	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
 	      (unspec:<VPRED>
-		[(match_operand:SVE_FULL_BHSI 2 "register_operand" "w")
-		 (match_operand:VNx2DI 3 "register_operand" "w")]
+		[(match_operand:SVE_FULL_BHSI 2 "register_operand")
+		 (match_operand:VNx2DI 3 "register_operand")]
 		SVE_COND_INT_CMP_WIDE)]
 	     UNSPEC_PRED_Z)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:<VPRED> 0 "=Upa"))]
+   (clobber (match_scratch:<VPRED> 0))]
  "TARGET_SVE
   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
+  {@ [ cons: =0, 1  , 2, 3, 4, 5, 6  , 7 ]
+     [ Upa     , Upl, w, w,  ,  , Upl,   ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -9922,41 +9958,45 @@ (define_insn "@aarch64_brk<brk_op>"
 (define_insn "*aarch64_brk<brk_op>_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
+	      (match_operand:VNx16BI 2 "register_operand")
 	      (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")]
 	     SVE_BRK_UNARY)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
 	  [(match_dup 1)
 	   (match_dup 2)
 	   (match_dup 3)]
 	  SVE_BRK_UNARY))]
   "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4 ]
+     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
+  }
 )
 
 ;; Same, but with only the flags result being interesting.
 (define_insn "*aarch64_brk<brk_op>_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
+	      (match_operand:VNx16BI 2 "register_operand")
 	      (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")]
 	     SVE_BRK_UNARY)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4 ]
+     [ Upa     , Upa, Upa,  ,   ] brk<brk_op>s\t%0.b, %1/z, %2.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
@@ -9973,14 +10013,16 @@ (define_insn "*aarch64_brk<brk_op>_ptest"
 
 ;; Binary BRKs (BRKN, BRKPA, BRKPB).
 (define_insn "@aarch64_brk<brk_op>"
-  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+  [(set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
-	   (match_operand:VNx16BI 2 "register_operand" "Upa")
-	   (match_operand:VNx16BI 3 "register_operand" "<brk_reg_con>")]
+	  [(match_operand:VNx16BI 1 "register_operand")
+	   (match_operand:VNx16BI 2 "register_operand")
+	   (match_operand:VNx16BI 3 "register_operand")]
 	  SVE_BRK_BINARY))]
   "TARGET_SVE"
-  "brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b"
+  {@ [ cons: =0, 1  , 2  , 3             ]
+     [ Upa     , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
+  }
 )
 
 ;; BRKN, producing both a predicate and a flags result.  Unlike other
@@ -9992,19 +10034,21 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
 	   (match_operand:VNx16BI 5)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (unspec:VNx16BI
-	     [(match_operand:VNx16BI 1 "register_operand" "Upa")
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "0")]
+	     [(match_operand:VNx16BI 1 "register_operand")
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     UNSPEC_BRKN)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
 	  [(match_dup 1)
 	   (match_dup 2)
 	   (match_dup 3)]
 	  UNSPEC_BRKN))]
   "TARGET_SVE"
-  "brkns\t%0.b, %1/z, %2.b, %0.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
+     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
+  }
   "&& (operands[4] != CONST0_RTX (VNx16BImode)
        || operands[5] != CONST0_RTX (VNx16BImode))"
   {
@@ -10021,14 +10065,16 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest"
 	   (match_operand:VNx16BI 5)
 	   (const_int SVE_KNOWN_PTRUE)
 	   (unspec:VNx16BI
-	     [(match_operand:VNx16BI 1 "register_operand" "Upa")
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "0")]
+	     [(match_operand:VNx16BI 1 "register_operand")
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     UNSPEC_BRKN)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
   "TARGET_SVE"
-  "brkns\t%0.b, %1/z, %2.b, %0.b"
+  {@ [ cons: =0, 1  , 2  , 3, 4, 5 ]
+     [ Upa     , Upa, Upa, 0,  ,   ] brkns\t%0.b, %1/z, %2.b, %0.b
+  }
   "&& (operands[4] != CONST0_RTX (VNx16BImode)
       || operands[5] != CONST0_RTX (VNx16BImode))"
   {
@@ -10041,41 +10087,45 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest"
 (define_insn "*aarch64_brk<brk_op>_cc"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "Upa")]
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     SVE_BRKP)]
 	  UNSPEC_PTEST))
-   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
+   (set (match_operand:VNx16BI 0 "register_operand")
 	(unspec:VNx16BI
 	  [(match_dup 1)
 	   (match_dup 2)
 	   (match_dup 3)]
 	  SVE_BRKP))]
   "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
+     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; Same, but with only the flags result being interesting.
 (define_insn "*aarch64_brk<brk_op>_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC
-	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
+	  [(match_operand:VNx16BI 1 "register_operand")
 	   (match_dup 1)
 	   (match_operand:SI 4 "aarch64_sve_ptrue_flag")
 	   (unspec:VNx16BI
 	     [(match_dup 1)
-	      (match_operand:VNx16BI 2 "register_operand" "Upa")
-	      (match_operand:VNx16BI 3 "register_operand" "Upa")]
+	      (match_operand:VNx16BI 2 "register_operand")
+	      (match_operand:VNx16BI 3 "register_operand")]
 	     SVE_BRKP)]
 	  UNSPEC_PTEST))
-   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
+   (clobber (match_scratch:VNx16BI 0))]
  "TARGET_SVE"
-  "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b"
+  {@ [ cons: =0, 1  , 2  , 3  , 4 ]
+     [ Upa     , Upa, Upa, Upa,   ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b
+  }
 )
 
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 934e57055d3419e5dcc89b473fd110a0d4978b4f..aa12baf48355358ca4fefe88157df3aac6eb09bd 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -3338,18 +3338,20 @@ (define_insn "@aarch64_sve2_histseg<mode>"
 
 ;; Predicated string matching.
 (define_insn "@aarch64_pred_<sve_int_op><mode>"
-  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+  [(set (match_operand:<VPRED> 0 "register_operand")
 	(unspec:<VPRED>
-	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
+	  [(match_operand:<VPRED> 1 "register_operand")
 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
 	   (unspec:<VPRED>
-	     [(match_operand:SVE_FULL_BHI 3 "register_operand" "w")
-	      (match_operand:SVE_FULL_BHI 4 "register_operand" "w")]
+	     [(match_operand:SVE_FULL_BHI 3 "register_operand")
+	      (match_operand:SVE_FULL_BHI 4 "register_operand")]
 	     SVE2_MATCH)]
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE2 && TARGET_NON_STREAMING"
-  "<sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>"
+  {@ [ cons: =0, 1  , 2, 3, 4 ]
+     [ Upa     , Upl,  , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
+  }
 )
 
 ;; Predicated string matching in which both the flag and predicate results
(define_insn "aarch64_pred_<logical_nn><mode>_z" - [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + [(set (match_operand:PRED_ALL 0 "register_operand") (and:PRED_ALL (NLOGICAL:PRED_ALL - (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa")) - (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa"))) - (match_operand:PRED_ALL 1 "register_operand" "Upa")))] + (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand")) + (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))) + (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - "<logical_nn>\t%0.b, %1/z, %2.b, %3.b" + {@ [ cons: =0, 1 , 2 , 3 ] + [ Upa , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + } ) ;; Same, but set the flags as a side-effect. (define_insn "*<logical_nn><mode>3_cc" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upa") + [(match_operand:VNx16BI 1 "register_operand") (match_operand 4) (match_operand:SI 5 "aarch64_sve_ptrue_flag") (and:PRED_ALL (NLOGICAL:PRED_ALL (not:PRED_ALL - (match_operand:PRED_ALL 2 "register_operand" "Upa")) + (match_operand:PRED_ALL 2 "register_operand")) (not:PRED_ALL - (match_operand:PRED_ALL 3 "register_operand" "Upa"))) + (match_operand:PRED_ALL 3 "register_operand"))) (match_dup 4))] UNSPEC_PTEST)) - (set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (set (match_operand:PRED_ALL 0 "register_operand") (and:PRED_ALL (NLOGICAL:PRED_ALL (not:PRED_ALL (match_dup 2)) (not:PRED_ALL (match_dup 3))) (match_dup 4)))] "TARGET_SVE" - "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b" + {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] + [ Upa , Upa, Upa, Upa, , ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + } ) ;; Same with just the flags result. 
(define_insn "*<logical_nn><mode>3_ptest" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upa") + [(match_operand:VNx16BI 1 "register_operand") (match_operand 4) (match_operand:SI 5 "aarch64_sve_ptrue_flag") (and:PRED_ALL (NLOGICAL:PRED_ALL (not:PRED_ALL - (match_operand:PRED_ALL 2 "register_operand" "Upa")) + (match_operand:PRED_ALL 2 "register_operand")) (not:PRED_ALL - (match_operand:PRED_ALL 3 "register_operand" "Upa"))) + (match_operand:PRED_ALL 3 "register_operand"))) (match_dup 4))] UNSPEC_PTEST)) - (clobber (match_scratch:VNx16BI 0 "=Upa"))] + (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - "<logical_nn>s\t%0.b, %1/z, %2.b, %3.b" + {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] + [ Upa , Upa, Upa, Upa, , ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + } ) ;; ========================================================================= @@ -8133,12 +8163,12 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest" (match_operand:SVE_I 3 "aarch64_sve_cmp_<sve_imm_con>_operand"))] UNSPEC_PRED_Z)] UNSPEC_PTEST)) - (clobber (match_scratch:<VPRED> 0 "=Upa, Upa"))] + (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: 1 , 2 , 3 ] - [ Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0, 1 , 2 , 3 ] + [ Upa , Upl, w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ Upa , Upl, w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8180,18 +8210,20 @@ (define_insn_and_split "*cmp<cmp_op><mode>_and" ;; Predicated integer wide comparisons. 
(define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide" - [(set (match_operand:<VPRED> 0 "register_operand" "=Upa") + [(set (match_operand:<VPRED> 0 "register_operand") (unspec:<VPRED> - [(match_operand:VNx16BI 1 "register_operand" "Upl") + [(match_operand:VNx16BI 1 "register_operand") (match_operand:SI 2 "aarch64_sve_ptrue_flag") (unspec:<VPRED> - [(match_operand:SVE_FULL_BHSI 3 "register_operand" "w") - (match_operand:VNx2DI 4 "register_operand" "w")] + [(match_operand:SVE_FULL_BHSI 3 "register_operand") + (match_operand:VNx2DI 4 "register_operand")] SVE_COND_INT_CMP_WIDE)] UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - "cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d" + {@ [ cons: =0, 1 , 2, 3, 4 ] + [ Upa , Upl, , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + } ) ;; Predicated integer wide comparisons in which both the flag and @@ -8199,19 +8231,19 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide" (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upl") + [(match_operand:VNx16BI 1 "register_operand") (match_operand 4) (match_operand:SI 5 "aarch64_sve_ptrue_flag") (unspec:<VPRED> - [(match_operand:VNx16BI 6 "register_operand" "Upl") + [(match_operand:VNx16BI 6 "register_operand") (match_operand:SI 7 "aarch64_sve_ptrue_flag") (unspec:<VPRED> - [(match_operand:SVE_FULL_BHSI 2 "register_operand" "w") - (match_operand:VNx2DI 3 "register_operand" "w")] + [(match_operand:SVE_FULL_BHSI 2 "register_operand") + (match_operand:VNx2DI 3 "register_operand")] SVE_COND_INT_CMP_WIDE)] UNSPEC_PRED_Z)] UNSPEC_PTEST)) - (set (match_operand:<VPRED> 0 "register_operand" "=Upa") + (set (match_operand:<VPRED> 0 "register_operand") (unspec:<VPRED> [(match_dup 6) (match_dup 7) @@ -8222,7 +8254,9 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - 
"cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d" + {@ [ cons: =0, 1 , 2, 3, 4, 5, 6 , 7 ] + [ Upa , Upl, w, w, , , Upl, ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + } ) ;; Predicated integer wide comparisons in which only the flags result @@ -8230,22 +8264,24 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upl") + [(match_operand:VNx16BI 1 "register_operand") (match_operand 4) (match_operand:SI 5 "aarch64_sve_ptrue_flag") (unspec:<VPRED> - [(match_operand:VNx16BI 6 "register_operand" "Upl") + [(match_operand:VNx16BI 6 "register_operand") (match_operand:SI 7 "aarch64_sve_ptrue_flag") (unspec:<VPRED> - [(match_operand:SVE_FULL_BHSI 2 "register_operand" "w") - (match_operand:VNx2DI 3 "register_operand" "w")] + [(match_operand:SVE_FULL_BHSI 2 "register_operand") + (match_operand:VNx2DI 3 "register_operand")] SVE_COND_INT_CMP_WIDE)] UNSPEC_PRED_Z)] UNSPEC_PTEST)) - (clobber (match_scratch:<VPRED> 0 "=Upa"))] + (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d" + {@ [ cons: =0, 1 , 2, 3, 4, 5, 6 , 7 ] + [ Upa , Upl, w, w, , , Upl, ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + } ) ;; ------------------------------------------------------------------------- @@ -9922,41 +9958,45 @@ (define_insn "@aarch64_brk<brk_op>" (define_insn "*aarch64_brk<brk_op>_cc" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upa") + [(match_operand:VNx16BI 1 "register_operand") (match_dup 1) (match_operand:SI 4 "aarch64_sve_ptrue_flag") (unspec:VNx16BI [(match_dup 1) - (match_operand:VNx16BI 2 "register_operand" "Upa") + (match_operand:VNx16BI 2 "register_operand") (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")] SVE_BRK_UNARY)] UNSPEC_PTEST)) - (set 
(match_operand:VNx16BI 0 "register_operand" "=Upa") + (set (match_operand:VNx16BI 0 "register_operand") (unspec:VNx16BI [(match_dup 1) (match_dup 2) (match_dup 3)] SVE_BRK_UNARY))] "TARGET_SVE" - "brk<brk_op>s\t%0.b, %1/z, %2.b" + {@ [ cons: =0, 1 , 2 , 3, 4 ] + [ Upa , Upa, Upa, , ] brk<brk_op>s\t%0.b, %1/z, %2.b + } ) ;; Same, but with only the flags result being interesting. (define_insn "*aarch64_brk<brk_op>_ptest" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upa") + [(match_operand:VNx16BI 1 "register_operand") (match_dup 1) (match_operand:SI 4 "aarch64_sve_ptrue_flag") (unspec:VNx16BI [(match_dup 1) - (match_operand:VNx16BI 2 "register_operand" "Upa") + (match_operand:VNx16BI 2 "register_operand") (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")] SVE_BRK_UNARY)] UNSPEC_PTEST)) - (clobber (match_scratch:VNx16BI 0 "=Upa"))] + (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - "brk<brk_op>s\t%0.b, %1/z, %2.b" + {@ [ cons: =0, 1 , 2 , 3, 4 ] + [ Upa , Upa, Upa, , ] brk<brk_op>s\t%0.b, %1/z, %2.b + } ) ;; ------------------------------------------------------------------------- @@ -9973,14 +10013,16 @@ (define_insn "*aarch64_brk<brk_op>_ptest" ;; Binary BRKs (BRKN, BRKPA, BRKPB). (define_insn "@aarch64_brk<brk_op>" - [(set (match_operand:VNx16BI 0 "register_operand" "=Upa") + [(set (match_operand:VNx16BI 0 "register_operand") (unspec:VNx16BI - [(match_operand:VNx16BI 1 "register_operand" "Upa") - (match_operand:VNx16BI 2 "register_operand" "Upa") - (match_operand:VNx16BI 3 "register_operand" "<brk_reg_con>")] + [(match_operand:VNx16BI 1 "register_operand") + (match_operand:VNx16BI 2 "register_operand") + (match_operand:VNx16BI 3 "register_operand")] SVE_BRK_BINARY))] "TARGET_SVE" - "brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b" + {@ [ cons: =0, 1 , 2 , 3 ] + [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + } ) ;; BRKN, producing both a predicate and a flags result. 
Unlike other @@ -9992,19 +10034,21 @@ (define_insn_and_rewrite "*aarch64_brkn_cc" (match_operand:VNx16BI 5) (const_int SVE_KNOWN_PTRUE) (unspec:VNx16BI - [(match_operand:VNx16BI 1 "register_operand" "Upa") - (match_operand:VNx16BI 2 "register_operand" "Upa") - (match_operand:VNx16BI 3 "register_operand" "0")] + [(match_operand:VNx16BI 1 "register_operand") + (match_operand:VNx16BI 2 "register_operand") + (match_operand:VNx16BI 3 "register_operand")] UNSPEC_BRKN)] UNSPEC_PTEST)) - (set (match_operand:VNx16BI 0 "register_operand" "=Upa") + (set (match_operand:VNx16BI 0 "register_operand") (unspec:VNx16BI [(match_dup 1) (match_dup 2) (match_dup 3)] UNSPEC_BRKN))] "TARGET_SVE" - "brkns\t%0.b, %1/z, %2.b, %0.b" + {@ [ cons: =0, 1 , 2 , 3, 4, 5 ] + [ Upa , Upa, Upa, 0, , ] brkns\t%0.b, %1/z, %2.b, %0.b + } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" { @@ -10021,14 +10065,16 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" (match_operand:VNx16BI 5) (const_int SVE_KNOWN_PTRUE) (unspec:VNx16BI - [(match_operand:VNx16BI 1 "register_operand" "Upa") - (match_operand:VNx16BI 2 "register_operand" "Upa") - (match_operand:VNx16BI 3 "register_operand" "0")] + [(match_operand:VNx16BI 1 "register_operand") + (match_operand:VNx16BI 2 "register_operand") + (match_operand:VNx16BI 3 "register_operand")] UNSPEC_BRKN)] UNSPEC_PTEST)) - (clobber (match_scratch:VNx16BI 0 "=Upa"))] + (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - "brkns\t%0.b, %1/z, %2.b, %0.b" + {@ [ cons: =0, 1 , 2 , 3, 4, 5 ] + [ Upa , Upa, Upa, 0, , ] brkns\t%0.b, %1/z, %2.b, %0.b + } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" { @@ -10041,41 +10087,45 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" (define_insn "*aarch64_brk<brk_op>_cc" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upa") + [(match_operand:VNx16BI 1 "register_operand") (match_dup 1) (match_operand:SI 4 
"aarch64_sve_ptrue_flag") (unspec:VNx16BI [(match_dup 1) - (match_operand:VNx16BI 2 "register_operand" "Upa") - (match_operand:VNx16BI 3 "register_operand" "Upa")] + (match_operand:VNx16BI 2 "register_operand") + (match_operand:VNx16BI 3 "register_operand")] SVE_BRKP)] UNSPEC_PTEST)) - (set (match_operand:VNx16BI 0 "register_operand" "=Upa") + (set (match_operand:VNx16BI 0 "register_operand") (unspec:VNx16BI [(match_dup 1) (match_dup 2) (match_dup 3)] SVE_BRKP))] "TARGET_SVE" - "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b" + {@ [ cons: =0, 1 , 2 , 3 , 4 ] + [ Upa , Upa, Upa, Upa, ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + } ) ;; Same, but with only the flags result being interesting. (define_insn "*aarch64_brk<brk_op>_ptest" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC - [(match_operand:VNx16BI 1 "register_operand" "Upa") + [(match_operand:VNx16BI 1 "register_operand") (match_dup 1) (match_operand:SI 4 "aarch64_sve_ptrue_flag") (unspec:VNx16BI [(match_dup 1) - (match_operand:VNx16BI 2 "register_operand" "Upa") - (match_operand:VNx16BI 3 "register_operand" "Upa")] + (match_operand:VNx16BI 2 "register_operand") + (match_operand:VNx16BI 3 "register_operand")] SVE_BRKP)] UNSPEC_PTEST)) - (clobber (match_scratch:VNx16BI 0 "=Upa"))] + (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b" + {@ [ cons: =0, 1 , 2 , 3 , 4 ] + [ Upa , Upa, Upa, Upa, ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + } ) ;; ------------------------------------------------------------------------- diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 934e57055d3419e5dcc89b473fd110a0d4978b4f..aa12baf48355358ca4fefe88157df3aac6eb09bd 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -3338,18 +3338,20 @@ (define_insn "@aarch64_sve2_histseg<mode>" ;; Predicated string matching. 
(define_insn "@aarch64_pred_<sve_int_op><mode>" - [(set (match_operand:<VPRED> 0 "register_operand" "=Upa") + [(set (match_operand:<VPRED> 0 "register_operand") (unspec:<VPRED> - [(match_operand:<VPRED> 1 "register_operand" "Upl") + [(match_operand:<VPRED> 1 "register_operand") (match_operand:SI 2 "aarch64_sve_ptrue_flag") (unspec:<VPRED> - [(match_operand:SVE_FULL_BHI 3 "register_operand" "w") - (match_operand:SVE_FULL_BHI 4 "register_operand" "w")] + [(match_operand:SVE_FULL_BHI 3 "register_operand") + (match_operand:SVE_FULL_BHI 4 "register_operand")] SVE2_MATCH)] UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE2 && TARGET_NON_STREAMING" - "<sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>" + {@ [ cons: =0, 1 , 2, 3, 4 ] + [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + } ) ;; Predicated string matching in which both the flag and predicate results ^ permalink raw reply [flat|nested] 25+ messages in thread
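[Note, not part of the patch itself: the shape of the conversion above can be summarised with the and<mode>3 hunk. The old style attaches a constraint string to each operand; the new compact syntax moves all constraints into one table, with one row per alternative and empty cells for operands that take no constraint.]

```lisp
;; Old style: constraints embedded in each match_operand.
(define_insn "and<mode>3"
  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
		      (match_operand:PRED_ALL 2 "register_operand" "Upa")))]
  "TARGET_SVE"
  "and\t%0.b, %1/z, %2.b, %2.b"
)

;; New compact style: operands carry only predicates; the constraint
;; table pairs each alternative's constraints with its template.
(define_insn "and<mode>3"
  [(set (match_operand:PRED_ALL 0 "register_operand")
	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
		      (match_operand:PRED_ALL 2 "register_operand")))]
  "TARGET_SVE"
  {@ [ cons: =0, 1  , 2   ]
     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
  }
)
```

[With the table in place, a later patch in the series can presumably add a second row (for example an early-clobber alternative) without touching the operand list again, which is the stated motivation for this preparatory change.]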
* Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax
  2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
@ 2024-05-15 10:35   ` Kyrill Tkachov
  2024-05-15 11:06   ` Richard Sandiford
  1 sibling, 0 replies; 25+ messages in thread
From: Kyrill Tkachov @ 2024-05-15 10:35 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Marcus.Shawcroft, Richard.Earnshaw, gcc-patches, ktkachov, nd,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 31058 bytes --]

Hi Tamar,

On Wed, 15 May 2024 at 11:28, Tamar Christina <tamar.christina@arm.com> wrote:

> Hi All,
>
> This converts the single alternative patterns to the new compact syntax
> such that when I add the new alternatives it's clearer what's being
> changed.
>
> Note that this will spew out a bunch of warnings from geninsn as it'll
> warn that @ is useless for a single alternative pattern.  These are not
> fatal so won't break the build and are only temporary.
>
> No change in functionality is expected with this patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Ok.
Thanks,
Kyrill

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>	* config/aarch64/aarch64-sve.md (and<mode>3,
>	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
>	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
>	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
>	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
>	*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
>	@aarch64_pred_cmp<cmp_op><mode>_wide,
>	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
>	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest,
>	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
>	@aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest,
>	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
>	aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
>	*aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Convert to compact syntax.
>	* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>):
>	Likewise.
>
> ---
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 0434358122d2fde71bd0e0f850338e739e9be02c..839ab0627747d7a49bef7b0192ee9e7a42587ca0 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -1156,76 +1156,86 @@ (define_insn "aarch64_rdffr"
>
>  ;; Likewise with zero predication.
>  (define_insn "aarch64_rdffr_z"
> -  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +  [(set (match_operand:VNx16BI 0 "register_operand")
> 	(and:VNx16BI
> 	  (reg:VNx16BI FFRT_REGNUM)
> -	  (match_operand:VNx16BI 1 "register_operand" "Upa")))]
> +	  (match_operand:VNx16BI 1 "register_operand")))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffr\t%0.b, %1/z"
> +  {@ [ cons: =0, 1   ]
> +     [ Upa     , Upa ] rdffr\t%0.b, %1/z
> +  }
>  )
>
>  ;; Read the FFR to test for a fault, without using the predicate result.
>  (define_insn "*aarch64_rdffr_z_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
> 	(unspec:CC_NZC
> -	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +	  [(match_operand:VNx16BI 1 "register_operand")
> 	   (match_dup 1)
> 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
> 	   (and:VNx16BI
> 	     (reg:VNx16BI FFRT_REGNUM)
> 	     (match_dup 1))]
> 	  UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1  , 2 ]
> +     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Same for unpredicated RDFFR when tested with a known PTRUE.
>  (define_insn "*aarch64_rdffr_ptest"
>    [(set (reg:CC_NZC CC_REGNUM)
> 	(unspec:CC_NZC
> -	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +	  [(match_operand:VNx16BI 1 "register_operand")
> 	   (match_dup 1)
> 	   (const_int SVE_KNOWN_PTRUE)
> 	   (reg:VNx16BI FFRT_REGNUM)]
> 	  UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1   ]
> +     [ Upa     , Upa ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Read the FFR with zero predication and test the result.
>  (define_insn "*aarch64_rdffr_z_cc"
>    [(set (reg:CC_NZC CC_REGNUM)
> 	(unspec:CC_NZC
> -	  [(match_operand:VNx16BI 1 "register_operand" "Upa")
> +	  [(match_operand:VNx16BI 1 "register_operand")
> 	   (match_dup 1)
> 	   (match_operand:SI 2 "aarch64_sve_ptrue_flag")
> 	   (and:VNx16BI
> 	     (reg:VNx16BI FFRT_REGNUM)
> 	     (match_dup 1))]
> 	  UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +   (set (match_operand:VNx16BI 0 "register_operand")
> 	(and:VNx16BI
> 	  (reg:VNx16BI FFRT_REGNUM)
> 	  (match_dup 1)))]
>    "TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1  , 2 ]
> +     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Same for unpredicated RDFFR when tested with a known PTRUE.
> [...]
[(match_dup 1) > - (match_operand:VNx16BI 2 "register_operand" "Upa") > + (match_operand:VNx16BI 2 "register_operand") > (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")] > SVE_BRK_UNARY)] > UNSPEC_PTEST)) > - (set (match_operand:VNx16BI 0 "register_operand" "=Upa") > + (set (match_operand:VNx16BI 0 "register_operand") > (unspec:VNx16BI > [(match_dup 1) > (match_dup 2) > (match_dup 3)] > SVE_BRK_UNARY))] > "TARGET_SVE" > - "brk<brk_op>s\t%0.b, %1/z, %2.b" > + {@ [ cons: =0, 1 , 2 , 3, 4 ] > + [ Upa , Upa, Upa, , ] brk<brk_op>s\t%0.b, %1/z, %2.b > + } > ) > > ;; Same, but with only the flags result being interesting. > (define_insn "*aarch64_brk<brk_op>_ptest" > [(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") > (match_dup 1) > (match_operand:SI 4 "aarch64_sve_ptrue_flag") > (unspec:VNx16BI > [(match_dup 1) > - (match_operand:VNx16BI 2 "register_operand" "Upa") > + (match_operand:VNx16BI 2 "register_operand") > (match_operand:VNx16BI 3 "aarch64_simd_imm_zero")] > SVE_BRK_UNARY)] > UNSPEC_PTEST)) > - (clobber (match_scratch:VNx16BI 0 "=Upa"))] > + (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - "brk<brk_op>s\t%0.b, %1/z, %2.b" > + {@ [ cons: =0, 1 , 2 , 3, 4 ] > + [ Upa , Upa, Upa, , ] brk<brk_op>s\t%0.b, %1/z, %2.b > + } > ) > > ;; > ------------------------------------------------------------------------- > @@ -9973,14 +10013,16 @@ (define_insn "*aarch64_brk<brk_op>_ptest" > > ;; Binary BRKs (BRKN, BRKPA, BRKPB). 
> (define_insn "@aarch64_brk<brk_op>" > - [(set (match_operand:VNx16BI 0 "register_operand" "=Upa") > + [(set (match_operand:VNx16BI 0 "register_operand") > (unspec:VNx16BI > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > - (match_operand:VNx16BI 2 "register_operand" "Upa") > - (match_operand:VNx16BI 3 "register_operand" "<brk_reg_con>")] > + [(match_operand:VNx16BI 1 "register_operand") > + (match_operand:VNx16BI 2 "register_operand") > + (match_operand:VNx16BI 3 "register_operand")] > SVE_BRK_BINARY))] > "TARGET_SVE" > - "brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b" > + {@ [ cons: =0, 1 , 2 , 3 ] > + [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, > %<brk_reg_opno>.b > + } > ) > > ;; BRKN, producing both a predicate and a flags result. Unlike other > @@ -9992,19 +10034,21 @@ (define_insn_and_rewrite "*aarch64_brkn_cc" > (match_operand:VNx16BI 5) > (const_int SVE_KNOWN_PTRUE) > (unspec:VNx16BI > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > - (match_operand:VNx16BI 2 "register_operand" "Upa") > - (match_operand:VNx16BI 3 "register_operand" "0")] > + [(match_operand:VNx16BI 1 "register_operand") > + (match_operand:VNx16BI 2 "register_operand") > + (match_operand:VNx16BI 3 "register_operand")] > UNSPEC_BRKN)] > UNSPEC_PTEST)) > - (set (match_operand:VNx16BI 0 "register_operand" "=Upa") > + (set (match_operand:VNx16BI 0 "register_operand") > (unspec:VNx16BI > [(match_dup 1) > (match_dup 2) > (match_dup 3)] > UNSPEC_BRKN))] > "TARGET_SVE" > - "brkns\t%0.b, %1/z, %2.b, %0.b" > + {@ [ cons: =0, 1 , 2 , 3, 4, 5 ] > + [ Upa , Upa, Upa, 0, , ] brkns\t%0.b, %1/z, %2.b, %0.b > + } > "&& (operands[4] != CONST0_RTX (VNx16BImode) > || operands[5] != CONST0_RTX (VNx16BImode))" > { > @@ -10021,14 +10065,16 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" > (match_operand:VNx16BI 5) > (const_int SVE_KNOWN_PTRUE) > (unspec:VNx16BI > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > - (match_operand:VNx16BI 2 "register_operand" "Upa") 
> - (match_operand:VNx16BI 3 "register_operand" "0")] > + [(match_operand:VNx16BI 1 "register_operand") > + (match_operand:VNx16BI 2 "register_operand") > + (match_operand:VNx16BI 3 "register_operand")] > UNSPEC_BRKN)] > UNSPEC_PTEST)) > - (clobber (match_scratch:VNx16BI 0 "=Upa"))] > + (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - "brkns\t%0.b, %1/z, %2.b, %0.b" > + {@ [ cons: =0, 1 , 2 , 3, 4, 5 ] > + [ Upa , Upa, Upa, 0, , ] brkns\t%0.b, %1/z, %2.b, %0.b > + } > "&& (operands[4] != CONST0_RTX (VNx16BImode) > || operands[5] != CONST0_RTX (VNx16BImode))" > { > @@ -10041,41 +10087,45 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" > (define_insn "*aarch64_brk<brk_op>_cc" > [(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") > (match_dup 1) > (match_operand:SI 4 "aarch64_sve_ptrue_flag") > (unspec:VNx16BI > [(match_dup 1) > - (match_operand:VNx16BI 2 "register_operand" "Upa") > - (match_operand:VNx16BI 3 "register_operand" "Upa")] > + (match_operand:VNx16BI 2 "register_operand") > + (match_operand:VNx16BI 3 "register_operand")] > SVE_BRKP)] > UNSPEC_PTEST)) > - (set (match_operand:VNx16BI 0 "register_operand" "=Upa") > + (set (match_operand:VNx16BI 0 "register_operand") > (unspec:VNx16BI > [(match_dup 1) > (match_dup 2) > (match_dup 3)] > SVE_BRKP))] > "TARGET_SVE" > - "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b" > + {@ [ cons: =0, 1 , 2 , 3 , 4 ] > + [ Upa , Upa, Upa, Upa, ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > + } > ) > > ;; Same, but with only the flags result being interesting. 
> (define_insn "*aarch64_brk<brk_op>_ptest" > [(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") > (match_dup 1) > (match_operand:SI 4 "aarch64_sve_ptrue_flag") > (unspec:VNx16BI > [(match_dup 1) > - (match_operand:VNx16BI 2 "register_operand" "Upa") > - (match_operand:VNx16BI 3 "register_operand" "Upa")] > + (match_operand:VNx16BI 2 "register_operand") > + (match_operand:VNx16BI 3 "register_operand")] > SVE_BRKP)] > UNSPEC_PTEST)) > - (clobber (match_scratch:VNx16BI 0 "=Upa"))] > + (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - "brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b" > + {@ [ cons: =0, 1 , 2 , 3 , 4 ] > + [ Upa , Upa, Upa, Upa, ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > + } > ) > > ;; > ------------------------------------------------------------------------- > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64-sve2.md > index > 934e57055d3419e5dcc89b473fd110a0d4978b4f..aa12baf48355358ca4fefe88157df3aac6eb09bd > 100644 > --- a/gcc/config/aarch64/aarch64-sve2.md > +++ b/gcc/config/aarch64/aarch64-sve2.md > @@ -3338,18 +3338,20 @@ (define_insn "@aarch64_sve2_histseg<mode>" > > ;; Predicated string matching. 
> (define_insn "@aarch64_pred_<sve_int_op><mode>" > - [(set (match_operand:<VPRED> 0 "register_operand" "=Upa") > + [(set (match_operand:<VPRED> 0 "register_operand") > (unspec:<VPRED> > - [(match_operand:<VPRED> 1 "register_operand" "Upl") > + [(match_operand:<VPRED> 1 "register_operand") > (match_operand:SI 2 "aarch64_sve_ptrue_flag") > (unspec:<VPRED> > - [(match_operand:SVE_FULL_BHI 3 "register_operand" "w") > - (match_operand:SVE_FULL_BHI 4 "register_operand" "w")] > + [(match_operand:SVE_FULL_BHI 3 "register_operand") > + (match_operand:SVE_FULL_BHI 4 "register_operand")] > SVE2_MATCH)] > UNSPEC_PRED_Z)) > (clobber (reg:CC_NZC CC_REGNUM))] > "TARGET_SVE2 && TARGET_NON_STREAMING" > - "<sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>" > + {@ [ cons: =0, 1 , 2, 3, 4 ] > + [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, %4.<Vetype> > + } > ) > > ;; Predicated string matching in which both the flag and predicate results > > > > > -- > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax 2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina 2024-05-15 10:35 ` Kyrill Tkachov @ 2024-05-15 11:06 ` Richard Sandiford 1 sibling, 0 replies; 25+ messages in thread From: Richard Sandiford @ 2024-05-15 11:06 UTC (permalink / raw) To: Tamar Christina Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov Thanks for doing this as a pre-patch. Minor request below: Tamar Christina <tamar.christina@arm.com> writes: > ;; Perform a logical operation on operands 2 and 3, using operand 1 as > @@ -6676,38 +6690,42 @@ (define_insn "@aarch64_pred_<optab><mode>_z" > (define_insn "*<optab><mode>3_cc" > [(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") > (match_operand 4) > (match_operand:SI 5 "aarch64_sve_ptrue_flag") > (and:PRED_ALL > (LOGICAL:PRED_ALL > - (match_operand:PRED_ALL 2 "register_operand" "Upa") > - (match_operand:PRED_ALL 3 "register_operand" "Upa")) > + (match_operand:PRED_ALL 2 "register_operand") > + (match_operand:PRED_ALL 3 "register_operand")) > (match_dup 4))] > UNSPEC_PTEST)) > - (set (match_operand:PRED_ALL 0 "register_operand" "=Upa") > + (set (match_operand:PRED_ALL 0 "register_operand") > (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3)) > (match_dup 4)))] > "TARGET_SVE" > - "<logical>s\t%0.b, %1/z, %2.b, %3.b" > + {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] > + [ Upa , Upa, Upa, Upa, , ] <logical>s\t%0.b, %1/z, %2.b, %3.b > + } > ) Could we leave out these empty trailing constraints? They're quite common in SVE & SME patterns and are specifically not meant to influence instruction selection. E.g. we've done the same thing for *cnot<mode> (to pick a random example). Agree with Kyrill's ok otherwise. Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
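To make the request above concrete: with the empty trailing constraints dropped, the alternative block from the quoted *<optab><mode>3_cc hunk would look roughly as follows. This is a sketch for illustration, not text from the posted patch; operands 4 and 5 stay in the pattern but simply get no constraint column.

```lisp
;; Sketch only: the same compact-syntax alternatives with the
;; unconstrained operands 4 and 5 left out of the constraint list.
  {@ [ cons: =0, 1  , 2  , 3   ]
     [ Upa     , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b
  }
```

This mirrors how other SVE/SME patterns such as *cnot<mode> (the example Richard cites) handle operands that deliberately carry no constraint.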
* [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber 2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina 2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina @ 2024-05-15 10:28 ` Tamar Christina 2024-05-15 10:56 ` Richard Sandiford 2024-05-15 10:29 ` [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina ` (2 subsequent siblings) 4 siblings, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-15 10:28 UTC (permalink / raw) To: gcc-patches Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford [-- Attachment #1: Type: text/plain, Size: 3308 bytes --] Hi All, This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 to allow us to conditionally enable the early clobber alternatives based on the tuning models. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-tuning-flags.def (EARLY_CLOBBER_SVE_PRED_DEST): New. * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New. * config/aarch64/aarch64.md (pred_clobber): New. (arch_enabled): Use it. --- diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index d5bcaebce770f0b217aac783063d39135f754c77..49fbad3ff28bc82b25c61ac501ccf533ec4b4c3f 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA) +/* Enable if the target prefers to use a fresh register for predicate outputs + rather than re-use an input predicate register. 
*/ +AARCH64_EXTRA_TUNING_OPTION ("early_clobber_sve_pred_dest", EARLY_CLOBBER_SVE_PRED_DEST) + #undef AARCH64_EXTRA_TUNING_OPTION diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d56b46c74084ba7c3c 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; enabled through +gcs. */ #define TARGET_GCS (AARCH64_ISA_GCS) +/* Prefer different predicate registers for the output of a predicated operation over + re-using an existing input predicate. */ +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ + && (aarch64_tune_params.extra_tuning_flags \ + & AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST)) /* Standard register usage. */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a53473b478c5ddba82 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any")) ;; target-independent code. (define_attr "is_call" "no,yes" (const_string "no")) +;; Indicates whether we want to enable the pattern with an optional early +;; clobber for SVE predicates. +(define_attr "pred_clobber" "no,yes" (const_string "no")) + ;; [For compatibility with Arm in pipeline models] ;; Attribute that specifies whether or not the instruction touches fp ;; registers. 
@@ -461,7 +465,8 @@ (define_attr "fp" "no,yes" (define_attr "arch_enabled" "no,yes" (if_then_else (ior - (eq_attr "arch" "any") + (and (eq_attr "arch" "any") + (eq_attr "pred_clobber" "no")) (and (eq_attr "arch" "rcpc8_4") (match_test "AARCH64_ISA_RCPC8_4")) @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes" (match_test "TARGET_SVE")) (and (eq_attr "arch" "sme") - (match_test "TARGET_SME"))) + (match_test "TARGET_SME")) + + (and (eq_attr "pred_clobber" "yes") + (match_test "TARGET_SVE_PRED_CLOBBER"))) (const_string "yes") (const_string "no"))) -- [-- Attachment #2: rb18355.patch --] [-- Type: text/x-diff, Size: 2793 bytes --] diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index d5bcaebce770f0b217aac783063d39135f754c77..49fbad3ff28bc82b25c61ac501ccf533ec4b4c3f 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA) +/* Enable is the target prefers to use a fresh register for predicate outputs + rather than re-use an input predicate register. */ +AARCH64_EXTRA_TUNING_OPTION ("early_clobber_sve_pred_dest", EARLY_CLOBBER_SVE_PRED_DEST) + #undef AARCH64_EXTRA_TUNING_OPTION diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d56b46c74084ba7c3c 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; enabled through +gcs. */ #define TARGET_GCS (AARCH64_ISA_GCS) +/* Prefer different predicate registers for the output of a predicated operation over + re-using an existing input predicate. 
*/ +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ + && (aarch64_tune_params.extra_tuning_flags \ + & AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST)) /* Standard register usage. */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a53473b478c5ddba82 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any")) ;; target-independent code. (define_attr "is_call" "no,yes" (const_string "no")) +;; Indicates whether we want to enable the pattern with an optional early +;; clobber for SVE predicates. +(define_attr "pred_clobber" "no,yes" (const_string "no")) + ;; [For compatibility with Arm in pipeline models] ;; Attribute that specifies whether or not the instruction touches fp ;; registers. @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes" (define_attr "arch_enabled" "no,yes" (if_then_else (ior - (eq_attr "arch" "any") + (and (eq_attr "arch" "any") + (eq_attr "pred_clobber" "no")) (and (eq_attr "arch" "rcpc8_4") (match_test "AARCH64_ISA_RCPC8_4")) @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes" (match_test "TARGET_SVE")) (and (eq_attr "arch" "sme") - (match_test "TARGET_SME"))) + (match_test "TARGET_SME")) + + (and (eq_attr "pred_clobber" "yes") + (match_test "TARGET_SVE_PRED_CLOBBER"))) (const_string "yes") (const_string "no"))) ^ permalink raw reply [flat|nested] 25+ messages in thread
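For orientation, the later patches in the series (not quoted in this message) are expected to consume the new pred_clobber attribute per-alternative via the compact syntax's attrs column. The following is a hypothetical sketch only; the exact constraint letters and alternative ordering are assumptions, not taken from this thread.

```lisp
;; Hypothetical example.  The first alternative early-clobbers the
;; predicate destination (&Upa) and, through pred_clobber and
;; TARGET_SVE_PRED_CLOBBER, is offered only on tunings that prefer a
;; fresh destination register; the second keeps the usual behaviour
;; where the destination may tie with an input predicate.
  {@ [ cons: =0 , 1   , 3 , 4 ; attrs: pred_clobber ]
     [ &Upa     , Upl , w , w ; yes                 ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
     [ Upa      , Upl , w , w ; no                  ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
  }
```

When the tuning flag is not set, only the pred_clobber=no alternative is arch_enabled, so register allocation behaves exactly as before; when it is set, both are available and the allocator can weigh the early-clobber alternative against the cost of a reload.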
* Re: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber 2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina @ 2024-05-15 10:56 ` Richard Sandiford 2024-05-15 11:03 ` Tamar Christina 2024-05-22 9:29 ` Tamar Christina 0 siblings, 2 replies; 25+ messages in thread From: Richard Sandiford @ 2024-05-15 10:56 UTC (permalink / raw) To: Tamar Christina Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov Tamar Christina <tamar.christina@arm.com> writes: > Hi All, > > This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 to > allow us to conditionally enable the early clobber alternatives based on the > tuning models. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-tuning-flags.def > (EARLY_CLOBBER_SVE_PRED_DEST): New. > * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New. > * config/aarch64/aarch64.md (pred_clobber): New. > (arch_enabled): Use it. > > --- > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def > index d5bcaebce770f0b217aac783063d39135f754c77..49fbad3ff28bc82b25c61ac501ccf533ec4b4c3f 100644 > --- a/gcc/config/aarch64/aarch64-tuning-flags.def > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def > @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) > > AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA) > > +/* Enable is the target prefers to use a fresh register for predicate outputs > + rather than re-use an input predicate register. */ > +AARCH64_EXTRA_TUNING_OPTION ("early_clobber_sve_pred_dest", EARLY_CLOBBER_SVE_PRED_DEST) Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"? (I'm open to other suggestions.) 
Just looking for something that describes either the architecture or the end result that we want to achieve. And preferably something fairly short :) avoid_* would be consistent with the existing "avoid_cross_loop_fma". > + > #undef AARCH64_EXTRA_TUNING_OPTION > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d56b46c74084ba7c3c 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; > enabled through +gcs. */ > #define TARGET_GCS (AARCH64_ISA_GCS) > > +/* Prefer different predicate registers for the output of a predicated operation over > + re-using an existing input predicate. */ > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ > + && (aarch64_tune_params.extra_tuning_flags \ > + & AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST)) > > /* Standard register usage. */ > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a53473b478c5ddba82 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any")) > ;; target-independent code. > (define_attr "is_call" "no,yes" (const_string "no")) > > +;; Indicates whether we want to enable the pattern with an optional early > +;; clobber for SVE predicates. > +(define_attr "pred_clobber" "no,yes" (const_string "no")) > + > ;; [For compatibility with Arm in pipeline models] > ;; Attribute that specifies whether or not the instruction touches fp > ;; registers.
> @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes" > (define_attr "arch_enabled" "no,yes" > (if_then_else > (ior > - (eq_attr "arch" "any") > + (and (eq_attr "arch" "any") > + (eq_attr "pred_clobber" "no")) > > (and (eq_attr "arch" "rcpc8_4") > (match_test "AARCH64_ISA_RCPC8_4")) > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes" > (match_test "TARGET_SVE")) > > (and (eq_attr "arch" "sme") > - (match_test "TARGET_SME"))) > + (match_test "TARGET_SME")) > + > + (and (eq_attr "pred_clobber" "yes") > + (match_test "TARGET_SVE_PRED_CLOBBER"))) IMO it'd be better to handle pred_clobber separately from arch, as a new top-level AND: (and (ior (eq_attr "pred_clobber" "no") (match_test "!TARGET_...")) (ior ...existing arch tests...)) Thanks, Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber 2024-05-15 10:56 ` Richard Sandiford @ 2024-05-15 11:03 ` Tamar Christina 2024-05-22 9:29 ` Tamar Christina 1 sibling, 0 replies; 25+ messages in thread From: Tamar Christina @ 2024-05-15 11:03 UTC (permalink / raw) To: Richard Sandiford Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov > -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Wednesday, May 15, 2024 11:56 AM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Marcus Shawcroft > <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org > Subject: Re: [PATCH 2/4]AArch64: add new tuning param and attribute for > enabling conditional early clobber > > Tamar Christina <tamar.christina@arm.com> writes: > > Hi All, > > > > This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 > to > > allow us to conditionally enable the early clobber alternatives based on the > > tuning models. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-tuning-flags.def > > (EARLY_CLOBBER_SVE_PRED_DEST): New. > > * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New. > > * config/aarch64/aarch64.md (pred_clobber): New. > > (arch_enabled): Use it. 
> > > > --- > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def > b/gcc/config/aarch64/aarch64-tuning-flags.def > > index > d5bcaebce770f0b217aac783063d39135f754c77..49fbad3ff28bc82b25c61ac50 > 1ccf533ec4b4c3f 100644 > > --- a/gcc/config/aarch64/aarch64-tuning-flags.def > > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def > > @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION > ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) > > > > AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", > FULLY_PIPELINED_FMA) > > > > +/* Enable is the target prefers to use a fresh register for predicate outputs > > + rather than re-use an input predicate register. */ > > +AARCH64_EXTRA_TUNING_OPTION ("early_clobber_sve_pred_dest", > EARLY_CLOBBER_SVE_PRED_DEST) > > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"? > (I'm open to other suggestions.) Just looking for something that describes > either the architecture or the end result that we want to achieve. > And preferable something fairly short :) > > avoid_* would be consistent with the existing "avoid_cross_loop_fma". Sure, happy to, it's something we initially struggled with naming internally as well. It sounds there's precedence so the avoid_ naming, so happy to use this naming. Will respin with it. Thanks, Tamar > > > + > > #undef AARCH64_EXTRA_TUNING_OPTION > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > > index > bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5 > 6b46c74084ba7c3c 100644 > > --- a/gcc/config/aarch64/aarch64.h > > +++ b/gcc/config/aarch64/aarch64.h > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = > AARCH64_FL_SM_OFF; > > enabled through +gcs. */ > > #define TARGET_GCS (AARCH64_ISA_GCS) > > > > +/* Prefer different predicate registers for the output of a predicated operation > over > > + re-using an existing input predicate. 
*/ > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ > > + && (aarch64_tune_params.extra_tuning_flags \ > > + & > AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST)) > > > > /* Standard register usage. */ > > > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > > index > dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a > 53473b478c5ddba82 100644 > > --- a/gcc/config/aarch64/aarch64.md > > +++ b/gcc/config/aarch64/aarch64.md > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string > "any")) > > ;; target-independent code. > > (define_attr "is_call" "no,yes" (const_string "no")) > > > > +;; Indicates whether we want to enable the pattern with an optional early > > +;; clobber for SVE predicates. > > +(define_attr "pred_clobber" "no,yes" (const_string "no")) > > + > > ;; [For compatibility with Arm in pipeline models] > > ;; Attribute that specifies whether or not the instruction touches fp > > ;; registers. > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes" > > (define_attr "arch_enabled" "no,yes" > > (if_then_else > > (ior > > - (eq_attr "arch" "any") > > + (and (eq_attr "arch" "any") > > + (eq_attr "pred_clobber" "no")) > > > > (and (eq_attr "arch" "rcpc8_4") > > (match_test "AARCH64_ISA_RCPC8_4")) > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes" > > (match_test "TARGET_SVE")) > > > > (and (eq_attr "arch" "sme") > > - (match_test "TARGET_SME"))) > > + (match_test "TARGET_SME")) > > + > > + (and (eq_attr "pred_clobber" "yes") > > + (match_test "TARGET_SVE_PRED_CLOBBER"))) > > IMO it'd be bettero handle pred_clobber separately from arch, as a new > top-level AND: > > (and > (ior > (eq_attr "pred_clobber" "no") > (match_test "!TARGET_...")) > (ior > ...existing arch tests...)) > > Thanks, > Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber 2024-05-15 10:56 ` Richard Sandiford 2024-05-15 11:03 ` Tamar Christina @ 2024-05-22 9:29 ` Tamar Christina 2024-05-28 9:37 ` Tamar Christina 1 sibling, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-22 9:29 UTC (permalink / raw) To: Richard Sandiford Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov [-- Attachment #1: Type: text/plain, Size: 6016 bytes --] > > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"? > (I'm open to other suggestions.) Just looking for something that describes > either the architecture or the end result that we want to achieve. > And preferable something fairly short :) > > avoid_* would be consistent with the existing "avoid_cross_loop_fma". > > > + > > #undef AARCH64_EXTRA_TUNING_OPTION > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > > index > bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5 > 6b46c74084ba7c3c 100644 > > --- a/gcc/config/aarch64/aarch64.h > > +++ b/gcc/config/aarch64/aarch64.h > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = > AARCH64_FL_SM_OFF; > > enabled through +gcs. */ > > #define TARGET_GCS (AARCH64_ISA_GCS) > > > > +/* Prefer different predicate registers for the output of a predicated operation > over > > + re-using an existing input predicate. */ > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ > > + && (aarch64_tune_params.extra_tuning_flags \ > > + & > AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST)) > > > > /* Standard register usage. 
*/ > > > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > > index > dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a > 53473b478c5ddba82 100644 > > --- a/gcc/config/aarch64/aarch64.md > > +++ b/gcc/config/aarch64/aarch64.md > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string > "any")) > > ;; target-independent code. > > (define_attr "is_call" "no,yes" (const_string "no")) > > > > +;; Indicates whether we want to enable the pattern with an optional early > > +;; clobber for SVE predicates. > > +(define_attr "pred_clobber" "no,yes" (const_string "no")) > > + > > ;; [For compatibility with Arm in pipeline models] > > ;; Attribute that specifies whether or not the instruction touches fp > > ;; registers. > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes" > > (define_attr "arch_enabled" "no,yes" > > (if_then_else > > (ior > > - (eq_attr "arch" "any") > > + (and (eq_attr "arch" "any") > > + (eq_attr "pred_clobber" "no")) > > > > (and (eq_attr "arch" "rcpc8_4") > > (match_test "AARCH64_ISA_RCPC8_4")) > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes" > > (match_test "TARGET_SVE")) > > > > (and (eq_attr "arch" "sme") > > - (match_test "TARGET_SME"))) > > + (match_test "TARGET_SME")) > > + > > + (and (eq_attr "pred_clobber" "yes") > > + (match_test "TARGET_SVE_PRED_CLOBBER"))) > > IMO it'd be bettero handle pred_clobber separately from arch, as a new > top-level AND: > > (and > (ior > (eq_attr "pred_clobber" "no") > (match_test "!TARGET_...")) > (ior > ...existing arch tests...)) > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-tuning-flags.def (AVOID_PRED_RMW): New. * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New. * config/aarch64/aarch64.md (pred_clobber): New. (arch_enabled): Use it. 
-- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA) +/* Enable is the target prefers to use a fresh register for predicate outputs + rather than re-use an input predicate register. */ +AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW) + #undef AARCH64_EXTRA_TUNING_OPTION diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; enabled through +gcs. */ #define TARGET_GCS (AARCH64_ISA_GCS) +/* Prefer different predicate registers for the output of a predicated operation over + re-using an existing input predicate. */ +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ + && (aarch64_tune_params.extra_tuning_flags \ + & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW)) /* Standard register usage. */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index dbde066f7478bec51a8703b017ea553aa98be309..52e5adba4172e14b794b5df9394e58ce49ef8b7f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any")) ;; target-independent code. (define_attr "is_call" "no,yes" (const_string "no")) +;; Indicates whether we want to enable the pattern with an optional early +;; clobber for SVE predicates. 
+(define_attr "pred_clobber" "no,yes" (const_string "no")) + ;; [For compatibility with Arm in pipeline models] ;; Attribute that specifies whether or not the instruction touches fp ;; registers. @@ -460,7 +464,12 @@ (define_attr "fp" "no,yes" (define_attr "arch_enabled" "no,yes" (if_then_else - (ior + (and + (ior + (eq_attr "pred_clobber" "no") + (match_test "TARGET_SVE_PRED_CLOBBER")) + + (ior (eq_attr "arch" "any") (and (eq_attr "arch" "rcpc8_4") @@ -488,7 +497,7 @@ (define_attr "arch_enabled" "no,yes" (match_test "TARGET_SVE")) (and (eq_attr "arch" "sme") - (match_test "TARGET_SME"))) + (match_test "TARGET_SME")))) (const_string "yes") (const_string "no"))) [-- Attachment #2: rb18355.patch --] [-- Type: application/octet-stream, Size: 2666 bytes --]
* RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber 2024-05-22 9:29 ` Tamar Christina @ 2024-05-28 9:37 ` Tamar Christina 2024-05-30 14:59 ` Richard Sandiford 0 siblings, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-28 9:37 UTC (permalink / raw) To: Tamar Christina, Richard Sandiford Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov [-- Attachment #1: Type: text/plain, Size: 6779 bytes --] > -----Original Message----- > From: Tamar Christina <Tamar.Christina@arm.com> > Sent: Wednesday, May 22, 2024 10:29 AM > To: Richard Sandiford <Richard.Sandiford@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Marcus Shawcroft > <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org > Subject: RE: [PATCH 2/4]AArch64: add new tuning param and attribute for > enabling conditional early clobber > > > > > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"? > > (I'm open to other suggestions.) Just looking for something that describes > > either the architecture or the end result that we want to achieve. > > And preferable something fairly short :) > > > > avoid_* would be consistent with the existing "avoid_cross_loop_fma". > > > > > + > > > #undef AARCH64_EXTRA_TUNING_OPTION > > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > > > index > > > bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5 > > 6b46c74084ba7c3c 100644 > > > --- a/gcc/config/aarch64/aarch64.h > > > +++ b/gcc/config/aarch64/aarch64.h > > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = > > AARCH64_FL_SM_OFF; > > > enabled through +gcs. */ > > > #define TARGET_GCS (AARCH64_ISA_GCS) > > > > > > +/* Prefer different predicate registers for the output of a predicated operation > > over > > > + re-using an existing input predicate. 
*/ > > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ > > > + && (aarch64_tune_params.extra_tuning_flags \ > > > + & > > AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST)) > > > > > > /* Standard register usage. */ > > > > > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > > > index > > > dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a > > 53473b478c5ddba82 100644 > > > --- a/gcc/config/aarch64/aarch64.md > > > +++ b/gcc/config/aarch64/aarch64.md > > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string > > "any")) > > > ;; target-independent code. > > > (define_attr "is_call" "no,yes" (const_string "no")) > > > > > > +;; Indicates whether we want to enable the pattern with an optional early > > > +;; clobber for SVE predicates. > > > +(define_attr "pred_clobber" "no,yes" (const_string "no")) > > > + > > > ;; [For compatibility with Arm in pipeline models] > > > ;; Attribute that specifies whether or not the instruction touches fp > > > ;; registers. > > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes" > > > (define_attr "arch_enabled" "no,yes" > > > (if_then_else > > > (ior > > > - (eq_attr "arch" "any") > > > + (and (eq_attr "arch" "any") > > > + (eq_attr "pred_clobber" "no")) > > > > > > (and (eq_attr "arch" "rcpc8_4") > > > (match_test "AARCH64_ISA_RCPC8_4")) > > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes" > > > (match_test "TARGET_SVE")) > > > > > > (and (eq_attr "arch" "sme") > > > - (match_test "TARGET_SME"))) > > > + (match_test "TARGET_SME")) > > > + > > > + (and (eq_attr "pred_clobber" "yes") > > > + (match_test "TARGET_SVE_PRED_CLOBBER"))) > > > > IMO it'd be bettero handle pred_clobber separately from arch, as a new > > top-level AND: > > > > (and > > (ior > > (eq_attr "pred_clobber" "no") > > (match_test "!TARGET_...")) > > (ior > > ...existing arch tests...)) > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? 
Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-tuning-flags.def (AVOID_PRED_RMW): New. * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New. * config/aarch64/aarch64.md (pred_clobber): New. (arch_enabled): Use it. -- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA) +/* Enable is the target prefers to use a fresh register for predicate outputs + rather than re-use an input predicate register. */ +AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW) + #undef AARCH64_EXTRA_TUNING_OPTION diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; enabled through +gcs. */ #define TARGET_GCS (AARCH64_ISA_GCS) +/* Prefer different predicate registers for the output of a predicated operation over + re-using an existing input predicate. */ +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ + && (aarch64_tune_params.extra_tuning_flags \ + & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW)) /* Standard register usage. */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index dbde066f7478bec51a8703b017ea553aa98be309..a7da3c01617eb8411029c7d2e32f13fa2cc1c833 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any")) ;; target-independent code. 
(define_attr "is_call" "no,yes" (const_string "no")) +;; Indicates whether we want to enable the pattern with an optional early +;; clobber for SVE predicates. +(define_attr "pred_clobber" "any,no,yes" (const_string "any")) + ;; [For compatibility with Arm in pipeline models] ;; Attribute that specifies whether or not the instruction touches fp ;; registers. @@ -460,7 +464,17 @@ (define_attr "fp" "no,yes" (define_attr "arch_enabled" "no,yes" (if_then_else - (ior + (and + (ior + (and + (eq_attr "pred_clobber" "no") + (match_test "!TARGET_SVE_PRED_CLOBBER")) + (and + (eq_attr "pred_clobber" "yes") + (match_test "TARGET_SVE_PRED_CLOBBER")) + (eq_attr "pred_clobber" "any")) + + (ior (eq_attr "arch" "any") (and (eq_attr "arch" "rcpc8_4") @@ -488,7 +502,7 @@ (define_attr "arch_enabled" "no,yes" (match_test "TARGET_SVE")) (and (eq_attr "arch" "sme") - (match_test "TARGET_SME"))) + (match_test "TARGET_SME")))) (const_string "yes") (const_string "no"))) [-- Attachment #2: rb18355 (1).patch --] [-- Type: application/octet-stream, Size: 2803 bytes --]
* Re: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber 2024-05-28 9:37 ` Tamar Christina @ 2024-05-30 14:59 ` Richard Sandiford 0 siblings, 0 replies; 25+ messages in thread From: Richard Sandiford @ 2024-05-30 14:59 UTC (permalink / raw) To: Tamar Christina Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov Tamar Christina <Tamar.Christina@arm.com> writes: >> -----Original Message----- >> From: Tamar Christina <Tamar.Christina@arm.com> >> Sent: Wednesday, May 22, 2024 10:29 AM >> To: Richard Sandiford <Richard.Sandiford@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft >> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org >> Subject: RE: [PATCH 2/4]AArch64: add new tuning param and attribute for >> enabling conditional early clobber >> >> > >> > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"? >> > (I'm open to other suggestions.) Just looking for something that describes >> > either the architecture or the end result that we want to achieve. >> > And preferable something fairly short :) >> > >> > avoid_* would be consistent with the existing "avoid_cross_loop_fma". >> > >> > > + >> > > #undef AARCH64_EXTRA_TUNING_OPTION >> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h >> > > index >> > >> bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5 >> > 6b46c74084ba7c3c 100644 >> > > --- a/gcc/config/aarch64/aarch64.h >> > > +++ b/gcc/config/aarch64/aarch64.h >> > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = >> > AARCH64_FL_SM_OFF; >> > > enabled through +gcs. */ >> > > #define TARGET_GCS (AARCH64_ISA_GCS) >> > > >> > > +/* Prefer different predicate registers for the output of a predicated operation >> > over >> > > + re-using an existing input predicate. 
*/ >> > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ >> > > + && (aarch64_tune_params.extra_tuning_flags \ >> > > + & >> > AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST)) >> > > >> > > /* Standard register usage. */ >> > > >> > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >> > > index >> > >> dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a >> > 53473b478c5ddba82 100644 >> > > --- a/gcc/config/aarch64/aarch64.md >> > > +++ b/gcc/config/aarch64/aarch64.md >> > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string >> > "any")) >> > > ;; target-independent code. >> > > (define_attr "is_call" "no,yes" (const_string "no")) >> > > >> > > +;; Indicates whether we want to enable the pattern with an optional early >> > > +;; clobber for SVE predicates. >> > > +(define_attr "pred_clobber" "no,yes" (const_string "no")) >> > > + >> > > ;; [For compatibility with Arm in pipeline models] >> > > ;; Attribute that specifies whether or not the instruction touches fp >> > > ;; registers. 
>> > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes" >> > > (define_attr "arch_enabled" "no,yes" >> > > (if_then_else >> > > (ior >> > > - (eq_attr "arch" "any") >> > > + (and (eq_attr "arch" "any") >> > > + (eq_attr "pred_clobber" "no")) >> > > >> > > (and (eq_attr "arch" "rcpc8_4") >> > > (match_test "AARCH64_ISA_RCPC8_4")) >> > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes" >> > > (match_test "TARGET_SVE")) >> > > >> > > (and (eq_attr "arch" "sme") >> > > - (match_test "TARGET_SME"))) >> > > + (match_test "TARGET_SME")) >> > > + >> > > + (and (eq_attr "pred_clobber" "yes") >> > > + (match_test "TARGET_SVE_PRED_CLOBBER"))) >> > >> > IMO it'd be bettero handle pred_clobber separately from arch, as a new >> > top-level AND: >> > >> > (and >> > (ior >> > (eq_attr "pred_clobber" "no") >> > (match_test "!TARGET_...")) >> > (ior >> > ...existing arch tests...)) >> > >> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-tuning-flags.def > (AVOID_PRED_RMW): New. > * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New. > * config/aarch64/aarch64.md (pred_clobber): New. > (arch_enabled): Use it. > > -- inline copy of patch -- > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def > index d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb 100644 > --- a/gcc/config/aarch64/aarch64-tuning-flags.def > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def > @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) > > AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA) > > +/* Enable is the target prefers to use a fresh register for predicate outputs > + rather than re-use an input predicate register. 
*/ > +AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW) > + > #undef AARCH64_EXTRA_TUNING_OPTION > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; > enabled through +gcs. */ > #define TARGET_GCS (AARCH64_ISA_GCS) > > +/* Prefer different predicate registers for the output of a predicated operation over > + re-using an existing input predicate. */ Formatting nit (sorry for not noticing last time): /* Prefer different predicate registers for the output of a predicated operation over re-using an existing input predicate. */ (avoiding an extra space after "/*" and wrapping at 80 columns). OK with that change, thanks. Richard > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ > + && (aarch64_tune_params.extra_tuning_flags \ > + & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW)) > > /* Standard register usage. */ > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index dbde066f7478bec51a8703b017ea553aa98be309..a7da3c01617eb8411029c7d2e32f13fa2cc1c833 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string "any")) > ;; target-independent code. > (define_attr "is_call" "no,yes" (const_string "no")) > > +;; Indicates whether we want to enable the pattern with an optional early > +;; clobber for SVE predicates. > +(define_attr "pred_clobber" "any,no,yes" (const_string "any")) > + > ;; [For compatibility with Arm in pipeline models] > ;; Attribute that specifies whether or not the instruction touches fp > ;; registers. 
> @@ -460,7 +464,17 @@ (define_attr "fp" "no,yes" > > (define_attr "arch_enabled" "no,yes" > (if_then_else > - (ior > + (and > + (ior > + (and > + (eq_attr "pred_clobber" "no") > + (match_test "!TARGET_SVE_PRED_CLOBBER")) > + (and > + (eq_attr "pred_clobber" "yes") > + (match_test "TARGET_SVE_PRED_CLOBBER")) > + (eq_attr "pred_clobber" "any")) > + > + (ior > (eq_attr "arch" "any") > > (and (eq_attr "arch" "rcpc8_4") > @@ -488,7 +502,7 @@ (define_attr "arch_enabled" "no,yes" > (match_test "TARGET_SVE")) > > (and (eq_attr "arch" "sme") > - (match_test "TARGET_SME"))) > + (match_test "TARGET_SME")))) > (const_string "yes") > (const_string "no")))
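Concretely, a pattern opts in per alternative through `attrs: pred_clobber` in the compact syntax. A schematic sketch, modeled on the `and<mode>3` change in patch 3/4 (simplified for illustration):

```
{@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
   [ &Upa   , Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b
   [ Upa    , Upa, Upa; *   ] ^
}
```

When `TARGET_SVE_PRED_CLOBBER` holds, the early-clobber (`&Upa`) alternative becomes available and, being listed first, is preferred whenever enough predicate registers are free; the plain `Upa` alternative remains as the fallback so that under high register pressure the tie is still chosen over a reload.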
* [PATCH 3/4]AArch64: add new alternative with early clobber to patterns 2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina 2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina 2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina @ 2024-05-15 10:29 ` Tamar Christina 2024-05-15 10:29 ` [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores Tamar Christina 2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener 4 siblings, 0 replies; 25+ messages in thread From: Tamar Christina @ 2024-05-15 10:29 UTC (permalink / raw) To: gcc-patches Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford [-- Attachment #1: Type: text/plain, Size: 18016 bytes --] Hi All, This patch adds new alternatives to the affected patterns. The new alternatives with the conditional early clobbers are placed before the normal ones so that LRA prefers them whenever there are enough free registers to accommodate them. When register pressure is too high the normal alternatives will be preferred before a reload is considered, since we would rather have the tie than a spill. Tests are in the next patch. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master?
Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-sve.md (and<mode>3, @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, *<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>, *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest, @aarch64_pred_cmp<cmp_op><mode>_wide, *aarch64_pred_cmp<cmp_op><mode>_wide_cc, *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest, *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber alternative. * config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Likewise. --- diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index 839ab0627747d7a49bef7b0192ee9e7a42587ca0..93ec59e58afee260b85082c472db2abfea7386b6 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -1161,8 +1161,9 @@ (define_insn "aarch64_rdffr_z" (reg:VNx16BI FFRT_REGNUM) (match_operand:VNx16BI 1 "register_operand")))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffr\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z + [ Upa , Upa; * ] ^ } ) @@ -1179,8 +1180,9 @@ (define_insn "*aarch64_rdffr_z_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 , 2; attrs: pred_clobber ] + [ &Upa , Upa, ; yes ] rdffrs\t%0.b, %1/z + [ Upa , Upa, ; * ] ^ } ) @@ -1195,8 +1197,9 @@ (define_insn "*aarch64_rdffr_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ 
cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ Upa , Upa; * ] ^ } ) @@ -1216,8 +1219,9 @@ (define_insn "*aarch64_rdffr_z_cc" (reg:VNx16BI FFRT_REGNUM) (match_dup 1)))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 , 2; attrs: pred_clobber ] + [ &Upa , Upa, ; yes ] rdffrs\t%0.b, %1/z + [ Upa , Upa, ; * ] ^ } ) @@ -1233,8 +1237,9 @@ (define_insn "*aarch64_rdffr_cc" (set (match_operand:VNx16BI 0 "register_operand") (reg:VNx16BI FFRT_REGNUM))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 , 2; attrs: pred_clobber ] + [ &Upa , Upa, ; yes ] rdffrs\t%0.b, %1/z + [ Upa , Upa, ; * ] ^ } ) @@ -6651,8 +6656,9 @@ (define_insn "and<mode>3" (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand") (match_operand:PRED_ALL 2 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b + [ Upa , Upa, Upa; * ] ^ } ) @@ -6679,8 +6685,9 @@ (define_insn "@aarch64_pred_<optab><mode>_z" (match_operand:PRED_ALL 3 "register_operand")) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical>\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6703,8 +6710,9 @@ (define_insn "*<optab><mode>3_cc" (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3)) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] - [ Upa , Upa, Upa, Upa, , ] <logical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, , ; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , 
Upa, Upa, Upa, , ; * ] ^ } ) @@ -6723,8 +6731,9 @@ (define_insn "*<optab><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] - [ Upa , Upa, Upa, Upa, , ] <logical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, , ; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa, , ; * ] ^ } ) @@ -6745,8 +6754,9 @@ (define_insn "aarch64_pred_<nlogical><mode>_z" (match_operand:PRED_ALL 2 "register_operand")) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6770,8 +6780,9 @@ (define_insn "*<nlogical><mode>3_cc" (match_dup 2)) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] - [ Upa , Upa, Upa, Upa, , ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, , ; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa, , ; * ] ^ } ) @@ -6791,8 +6802,9 @@ (define_insn "*<nlogical><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] - [ Upa , Upa, Upa, Upa, , ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, , ; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa, , ; * ] ^ } ) @@ -6813,8 +6825,9 @@ (define_insn "aarch64_pred_<logical_nn><mode>_z" (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>\t%0.b, 
%1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6839,8 +6852,9 @@ (define_insn "*<logical_nn><mode>3_cc" (not:PRED_ALL (match_dup 3))) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] - [ Upa , Upa, Upa, Upa, , ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, , ; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa, , ; * ] ^ } ) @@ -6861,8 +6875,9 @@ (define_insn "*<logical_nn><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4, 5 ] - [ Upa , Upa, Upa, Upa, , ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, , ; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa, , ; * ] ^ } ) @@ -8104,9 +8119,11 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 3 , 4 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 + [ Upa , Upl , w , <sve_imm_con>; * ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ Upa , Upl , w , w ; * ] ^ } ) @@ -8136,9 +8153,11 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ Upa , 
Upl , w , <sve_imm_con>; * ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ Upa , Upl , w , w ; * ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8166,9 +8185,11 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upl, w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl, w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl, w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ Upa , Upl, w , <sve_imm_con>; * ] ^ + [ &Upa , Upl, w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ Upa , Upl, w , w ; * ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8221,8 +8242,9 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2, 3, 4 ] - [ Upa , Upl, , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + [ Upa , Upl, , w, w; * ] ^ } ) @@ -8254,8 +8276,9 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 4, 5, 6 , 7 ] - [ Upa , Upl, w, w, , , Upl, ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 4, 5, 6 , 7; attrs: pred_clobber ] + [ &Upa , Upl, w, w, , , Upl, ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ Upa , Upl, w, w, , , Upl, ; * ] ^ } ) @@ -8279,8 +8302,9 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p 
(&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 4, 5, 6 , 7 ] - [ Upa , Upl, w, w, , , Upl, ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 4, 5, 6 , 7; attrs: pred_clobber ] + [ &Upa , Upl, w, w, , , Upl, ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ Upa , Upl, w, w, , , Upl, ; * ] ^ } ) @@ -9948,9 +9972,11 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "aarch64_simd_reg_or_zero")] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upa , Upa , Dz ] brk<brk_op>\t%0.b, %1/z, %2.b - [ Upa , Upa , Upa , 0 ] brk<brk_op>\t%0.b, %1/m, %2.b + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa , Upa , Dz; yes ] brk<brk_op>\t%0.b, %1/z, %2.b + [ Upa , Upa , Upa , Dz; * ] ^ + [ &Upa , Upa , Upa , 0 ; yes ] brk<brk_op>\t%0.b, %1/m, %2.b + [ Upa , Upa , Upa , 0 ; * ] ^ } ) @@ -9974,8 +10000,9 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3, 4 ] - [ Upa , Upa, Upa, , ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 , 3, 4; attrs: pred_clobber ] + [ &Upa , Upa, Upa, , ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ Upa , Upa, Upa, , ; * ] ^ } ) @@ -9994,8 +10021,9 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3, 4 ] - [ Upa , Upa, Upa, , ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 , 3, 4; attrs: pred_clobber ] + [ &Upa , Upa, Upa, , ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ Upa , Upa, Upa, , ; * ] ^ } ) @@ -10020,8 +10048,9 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "register_operand")] SVE_BRK_BINARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, <brk_reg_con>; yes ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + [ Upa , Upa, Upa, 
<brk_reg_con>; * ] ^ } ) @@ -10046,8 +10075,9 @@ (define_insn_and_rewrite "*aarch64_brkn_cc" (match_dup 3)] UNSPEC_BRKN))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3, 4, 5 ] - [ Upa , Upa, Upa, 0, , ] brkns\t%0.b, %1/z, %2.b, %0.b + {@ [ cons: =0, 1 , 2 , 3, 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0, , ; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ Upa , Upa, Upa, 0, , ; * ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10072,8 +10102,9 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3, 4, 5 ] - [ Upa , Upa, Upa, 0, , ] brkns\t%0.b, %1/z, %2.b, %0.b + {@ [ cons: =0, 1 , 2 , 3, 4, 5; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0, , ; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ Upa , Upa, Upa, 0, , ; * ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10103,8 +10134,9 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRKP))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4 ] - [ Upa , Upa, Upa, Upa, ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa, ; * ] ^ } ) @@ -10123,8 +10155,9 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 , 4 ] - [ Upa , Upa, Upa, Upa, ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ Upa , Upa, Upa, Upa, ; * ] ^ } ) diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index aa12baf48355358ca4fefe88157df3aac6eb09bd..771c346b8a3188dd7e3f3a98ee28f0ca5f928215 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -3349,8 +3349,9 @@ (define_insn 
"@aarch64_pred_<sve_int_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE2 && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 , 2, 3, 4 ] - [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ Upa , Upl, , w, w; * ] ^ } ) -- [-- Attachment #2: rb18357.patch --] [-- Type: text/x-diff, Size: 16523 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores. 2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina ` (2 preceding siblings ...) 2024-05-15 10:29 ` [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina @ 2024-05-15 10:29 ` Tamar Christina 2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener 4 siblings, 0 replies; 25+ messages in thread From: Tamar Christina @ 2024-05-15 10:29 UTC (permalink / raw) To: gcc-patches Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford [-- Attachment #1: Type: text/plain, Size: 7715 bytes --] Hi All, This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2. It is kept off for generic codegen. Note: the reason for the +sve pragma, even though the tests are in aarch64-sve.exp, is that if the testsuite is run with an option that forces SVE off, e.g. -march=armv8-a+nosve, the intrinsics end up being disabled because the -march is preferred over the -mcpu even though the -mcpu comes later. The pragma prevents the tests from failing in such runs. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST. * config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST. * config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pred_clobber_1.c: New test. * gcc.target/aarch64/sve/pred_clobber_2.c: New test. * gcc.target/aarch64/sve/pred_clobber_3.c: New test. * gcc.target/aarch64/sve/pred_clobber_4.c: New test. * gcc.target/aarch64/sve/pred_clobber_5.c: New test.
--- diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h index 7e799bbe762fe862e31befed50e54040a7fd1f2f..0d8f3f6be67f3583b00473bef97ea3ae4fcea4ec 100644 --- a/gcc/config/aarch64/tuning_models/neoversen2.h +++ b/gcc/config/aarch64/tuning_models/neoversen2.h @@ -236,7 +236,8 @@ static const struct tune_params neoversen2_tunings = (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT + | AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST), /* tune_flags. */ &generic_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h index 9363f2ad98a5279cc99f2f9b1509ba921d582e84..d28d0b1c0498ed250b0a93ca69720fe10c65c93d 100644 --- a/gcc/config/aarch64/tuning_models/neoversev1.h +++ b/gcc/config/aarch64/tuning_models/neoversev1.h @@ -227,7 +227,8 @@ static const struct tune_params neoversev1_tunings = (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT - | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ + | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND + | AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST), /* tune_flags. */ &generic_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h index bc01ed767c9b690504eb98456402df5d9d64eee3..3b2f9797bd777e73ca9c21501fa97448d96cb65e 100644 --- a/gcc/config/aarch64/tuning_models/neoversev2.h +++ b/gcc/config/aarch64/tuning_models/neoversev2.h @@ -236,7 +236,8 @@ static const struct tune_params neoversev2_tunings = (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS - | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT + | AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST), /* tune_flags. */ &generic_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c new file mode 100644 index 0000000000000000000000000000000000000000..934a00a38531c5fd4139d99ff33414904b2c104f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=neoverse-n2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#pragma GCC target "+sve" + +#include <arm_sve.h> + +extern void use(svbool_t); + +/* +** foo: +** ... +** ptrue p([1-9][0-9]?).b, all +** cmplo p0.h, p\1/z, z0.h, z[0-9]+.h +** ... 
+*/ +void foo (svuint16_t a, uint16_t b) +{ + svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b); + use (p0); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c new file mode 100644 index 0000000000000000000000000000000000000000..58badb66a43b1ac50eeec153b9cac44fc831b145 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_2.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=neoverse-v2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#pragma GCC target "+sve" + +#include <arm_sve.h> + +extern void use(svbool_t); + +/* +** foo: +** ... +** ptrue p([1-9][0-9]?).b, all +** cmplo p0.h, p\1/z, z0.h, z[0-9]+.h +** ... +*/ +void foo (svuint16_t a, uint16_t b) +{ + svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b); + use (p0); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_3.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_3.c new file mode 100644 index 0000000000000000000000000000000000000000..c67c2bd3422e0bb0c694b5fe0adf0d83e4d967c6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_3.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=neoverse-v1" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#pragma GCC target "+sve" + +#include <arm_sve.h> + +extern void use(svbool_t); + +/* +** foo: +** ... +** ptrue p([1-9][0-9]?).b, all +** cmplo p0.h, p\1/z, z0.h, z[0-9]+.h +** ... 
+*/ +void foo (svuint16_t a, uint16_t b) +{ + svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b); + use (p0); +} + diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_4.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_4.c new file mode 100644 index 0000000000000000000000000000000000000000..c0120afe5d523eff8297fadd4fc4c678676413d6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_4.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#pragma GCC target "+sve" + +#include <arm_sve.h> + +extern void use(svbool_t); + +/* +** foo: +** ... +** ptrue p0.b, all +** cmplo p0.h, p0/z, z0.h, z[0-9]+.h +** ... +*/ +void foo (svuint16_t a, uint16_t b) +{ + svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b); + use (p0); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_5.c b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_5.c new file mode 100644 index 0000000000000000000000000000000000000000..63f0669abd23d45c0ffd77c53859a098a21e0192 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_5.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p15" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#pragma GCC target "+sve" + +#include <arm_sve.h> + +extern void use(svbool_t); + +/* +** foo: +** ... +** ptrue p0.b, all +** cmplo p0.h, p0/z, z0.h, z[0-9]+.h +** ... 
+*/ void foo (svuint16_t a, uint16_t b) { + svbool_t p0 = svcmplt_n_u16 (svptrue_b16 (), a, b); + use (p0); +} -- [-- Attachment #2: rb18356.patch --] [-- Type: text/x-diff, Size: 6482 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations. 2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina ` (3 preceding siblings ...) 2024-05-15 10:29 ` [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores Tamar Christina @ 2024-05-15 11:20 ` Richard Biener 2024-05-15 11:23 ` Tamar Christina 4 siblings, 1 reply; 25+ messages in thread From: Richard Biener @ 2024-05-15 11:20 UTC (permalink / raw) To: Tamar Christina Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford On Wed, May 15, 2024 at 12:29 PM Tamar Christina <tamar.christina@arm.com> wrote: > > Hi All, > > Some Neoverse Software Optimization Guides (SWoG) have a clause that state > that for predicated operations that also produce a predicate it is preferred > that the codegen should use a different register for the destination than that > of the input predicate in order to avoid a performance overhead. > > This of course has the problem that it increases register pressure and so should > be done with care. Additionally not all micro-architectures have this > consideration and so it shouldn't be done as a default thing. > > The patch series adds support for doing conditional early clobbers through a > combination of new alternatives and attributes to control their availability. You could have two alternatives, one with early clobber and one with a matching constraint where you'd disparage the matching constraint one? > On high register pressure we also use LRA's costing to prefer not to use the > alternative and instead just use the tie as this is preferable to a reload. 
> > Concretely this patch series does: > > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 > > foo: > mov z31.h, w0 > ptrue p3.b, all > cmplo p0.h, p3/z, z0.h, z31.h > b use > > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve > > foo: > mov z31.h, w0 > ptrue p0.b, all > cmplo p0.h, p0/z, z0.h, z31.h > b use > > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15] > > foo: > mov z31.h, w0 > ptrue p0.b, all > cmplo p0.h, p0/z, z0.h, z31.h > b use > > Testcases for the changes are in the last patch of the series. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Thanks, > Tamar > > --- > > --
* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations. 2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener @ 2024-05-15 11:23 ` Tamar Christina 2024-05-15 14:51 ` Richard Sandiford 0 siblings, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-15 11:23 UTC (permalink / raw) To: Richard Biener Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov, Richard Sandiford > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Wednesday, May 15, 2024 12:20 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Marcus Shawcroft > <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org; Richard Sandiford > <Richard.Sandiford@arm.com> > Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain > operations. > > On Wed, May 15, 2024 at 12:29 PM Tamar Christina > <tamar.christina@arm.com> wrote: > > > > Hi All, > > > > Some Neoverse Software Optimization Guides (SWoG) have a clause that state > > that for predicated operations that also produce a predicate it is preferred > > that the codegen should use a different register for the destination than that > > of the input predicate in order to avoid a performance overhead. > > > > This of course has the problem that it increases register pressure and so should > > be done with care. Additionally not all micro-architectures have this > > consideration and so it shouldn't be done as a default thing. > > > > The patch series adds support for doing conditional early clobbers through a > > combination of new alternatives and attributes to control their availability. > > You could have two alternatives, one with early clobber and one with > a matching constraint where you'd disparage the matching constraint one? 
> Yeah, that's what I do, though there's no need to disparage the non-early clobber alternative as the early clobber alternative will naturally get a penalty if it needs a reload. Cheers, Tamar > > On high register pressure we also use LRA's costing to prefer not to use the > > alternative and instead just use the tie as this is preferable to a reload. > > > > Concretely this patch series does: > > > > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 > > > > foo: > > mov z31.h, w0 > > ptrue p3.b, all > > cmplo p0.h, p3/z, z0.h, z31.h > > b use > > > > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve > > > > foo: > > mov z31.h, w0 > > ptrue p0.b, all > > cmplo p0.h, p0/z, z0.h, z31.h > > b use > > > > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 - > ffixed-p[1-15] > > > > foo: > > mov z31.h, w0 > > ptrue p0.b, all > > cmplo p0.h, p0/z, z0.h, z31.h > > b use > > > > Testcases for the changes are in the last patch of the series. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Thanks, > > Tamar > > > > --- > > > > --
* Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations. 2024-05-15 11:23 ` Tamar Christina @ 2024-05-15 14:51 ` Richard Sandiford 2024-05-15 15:56 ` Tamar Christina 0 siblings, 1 reply; 25+ messages in thread From: Richard Sandiford @ 2024-05-15 14:51 UTC (permalink / raw) To: Tamar Christina Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov Tamar Christina <Tamar.Christina@arm.com> writes: >> -----Original Message----- >> From: Richard Biener <richard.guenther@gmail.com> >> Sent: Wednesday, May 15, 2024 12:20 PM >> To: Tamar Christina <Tamar.Christina@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft >> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org; Richard Sandiford >> <Richard.Sandiford@arm.com> >> Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain >> operations. >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina >> <tamar.christina@arm.com> wrote: >> > >> > Hi All, >> > >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that state >> > that for predicated operations that also produce a predicate it is preferred >> > that the codegen should use a different register for the destination than that >> > of the input predicate in order to avoid a performance overhead. >> > >> > This of course has the problem that it increases register pressure and so should >> > be done with care. Additionally not all micro-architectures have this >> > consideration and so it shouldn't be done as a default thing. >> > >> > The patch series adds support for doing conditional early clobbers through a >> > combination of new alternatives and attributes to control their availability. >> >> You could have two alternatives, one with early clobber and one with >> a matching constraint where you'd disparage the matching constraint one? 
>> > > Yeah, that's what I do, though there's no need to disparage the non-early clobber > alternative as the early clobber alternative will naturally get a penalty if it needs a > reload. But I think Richard's suggestion was to disparage the one with a matching constraint (not the earlyclobber), to reflect the increased cost of reusing the register. We did take that approach for gathers, e.g.: [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] [?w, Z, 0, Ui1, Ui1, Upl] ^ The (supposed) advantage is that, if register pressure is so tight that using matching registers is the only alternative, we still have the opportunity to do that, as a last resort. Providing only an earlyclobber version means that using the same register is prohibited outright. If no other register is free, the RA would need to spill something else to free up a temporary register. And it might then do the equivalent of (pseudo-code): not p1.b, ..., p0.b mov p0.d, p1.d after spilling what would otherwise have occupied p1. In that situation it would be better use: not p0.b, ..., p0.b and not introduce the spill of p1. Another case where using matching registers is natural is for loop-carried dependencies. Do we want to keep them in: loop: ...no other sets of p0.... not p0.b, ..., p0.b ...no other sets of p0.... bne loop or should we split it to: loop: ...no other sets of p0.... not p1.b, ..., p0.b mov p0.d, p1.d ...no other sets of p0.... bne loop ? Thanks, Richard > > Cheers, > Tamar > >> > On high register pressure we also use LRA's costing to prefer not to use the >> > alternative and instead just use the tie as this is preferable to a reload. 
>> > >> > Concretely this patch series does: >> > >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 >> > >> > foo: >> > mov z31.h, w0 >> > ptrue p3.b, all >> > cmplo p0.h, p3/z, z0.h, z31.h >> > b use >> > >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve >> > >> > foo: >> > mov z31.h, w0 >> > ptrue p0.b, all >> > cmplo p0.h, p0/z, z0.h, z31.h >> > b use >> > >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 - >> ffixed-p[1-15] >> > >> > foo: >> > mov z31.h, w0 >> > ptrue p0.b, all >> > cmplo p0.h, p0/z, z0.h, z31.h >> > b use >> > >> > Testcases for the changes are in the last patch of the series. >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Thanks, >> > Tamar >> > >> > --- >> > >> > --
* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations. 2024-05-15 14:51 ` Richard Sandiford @ 2024-05-15 15:56 ` Tamar Christina 2024-05-15 21:31 ` Richard Sandiford 0 siblings, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-15 15:56 UTC (permalink / raw) To: Richard Sandiford Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov > >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina > >> <tamar.christina@arm.com> wrote: > >> > > >> > Hi All, > >> > > >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that state > >> > that for predicated operations that also produce a predicate it is preferred > >> > that the codegen should use a different register for the destination than that > >> > of the input predicate in order to avoid a performance overhead. > >> > > >> > This of course has the problem that it increases register pressure and so > should > >> > be done with care. Additionally not all micro-architectures have this > >> > consideration and so it shouldn't be done as a default thing. > >> > > >> > The patch series adds support for doing conditional early clobbers through a > >> > combination of new alternatives and attributes to control their availability. > >> > >> You could have two alternatives, one with early clobber and one with > >> a matching constraint where you'd disparage the matching constraint one? > >> > > > > Yeah, that's what I do, though there's no need to disparage the non-early clobber > > alternative as the early clobber alternative will naturally get a penalty if it needs a > > reload. > > But I think Richard's suggestion was to disparage the one with a matching > constraint (not the earlyclobber), to reflect the increased cost of > reusing the register. 
> > We did take that approach for gathers, e.g.: > > [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] > [?w, Z, 0, Ui1, Ui1, Upl] ^ > > The (supposed) advantage is that, if register pressure is so tight > that using matching registers is the only alternative, we still > have the opportunity to do that, as a last resort. > > Providing only an earlyclobber version means that using the same > register is prohibited outright. If no other register is free, the RA > would need to spill something else to free up a temporary register. > And it might then do the equivalent of (pseudo-code): > > not p1.b, ..., p0.b > mov p0.d, p1.d > > after spilling what would otherwise have occupied p1. In that > situation it would be better use: > > not p0.b, ..., p0.b > > and not introduce the spill of p1. I think I understood what Richi meant, but I thought it was already working that way. i.e. as one of the testcases I had: > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15] foo: mov z31.h, w0 ptrue p0.b, all cmplo p0.h, p0/z, z0.h, z31.h b use and reload did not force a spill. My understanding of how this works, and how it seems to be working is that since reload costs Alternative from front to back the cheapest one wins and it stops evaluating the rest. The early clobber case is first and preferred, however when it's not possible, i.e. requires a non-pseudo reload, the reload cost is added to the alternative. However you're right that in the following testcase: -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload i.e. 
giving it an extra free register inexplicably causes a spill: foo: addvl sp, sp, #-1 mov z31.h, w0 ptrue p0.b, all str p15, [sp] cmplo p15.h, p0/z, z0.h, z31.h mov p0.b, p15.b ldr p15, [sp] addvl sp, sp, #1 b use so that's unexpected and is very weird as p15 has no defined value.. Now adding the ? as suggested to the non-early clobber alternative does not fix it, and my mental model for how this is supposed to work does not quite line up.. Why would making the non-clobber alternative even more expensive help it during high register pressure?? But with that suggestion the above case does not get fixed and the following case -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload ICEs: pred-clobber.c: In function 'foo': pred-clobber.c:9:1: error: unable to find a register to spill 9 | } | ^ pred-clobber.c:9:1: error: this is the insn: (insn 10 22 19 2 (parallel [ (set (reg:VNx8BI 110 [104]) (unspec:VNx8BI [ (reg:VNx8BI 112) (const_int 1 [0x1]) (ltu:VNx8BI (reg:VNx8HI 32 v0) (reg:VNx8HI 63 v31)) ] UNSPEC_PRED_Z)) (clobber (reg:CC_NZC 66 cc)) ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi} (expr_list:REG_DEAD (reg:VNx8BI 112) (expr_list:REG_DEAD (reg:VNx8HI 63 v31) (expr_list:REG_DEAD (reg:VNx8HI 32 v0) (expr_list:REG_UNUSED (reg:CC_NZC 66 cc) (nil)))))) during RTL pass: reload dump file: pred-clobber.c.315r.reload and this is because the use of ? has the unintended side-effect of blocking a register class entirely during Sched1 as we've recently discovered. i.e. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 in this case it marked the alternative as NO_REGS during sched1 and so it's completely dead. the use of the ? alternatives has caused quite the code bloat as we've recently discovered because of this unexpected and undocumented behavior. 
To me, diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index 93ec59e58af..2ee3d8ea35e 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -8120,10 +8120,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>" (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] - [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 - [ Upa , Upl , w , <sve_imm_con>; * ] ^ - [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> - [ Upa , Upl , w , w ; * ] ^ + [ ^&Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 + [ Upa , Upl , w , <sve_imm_con>; * ] ^ + [ ^&Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ Upa , Upl , w , w ; * ] ^ } ) Would have been the right approach, i.e. we prefer the alternative unless a reload is needed, which should work no? well. if ^ wasn't broken the same way as ?. Perhaps I need to use Wilco's new alternative that doesn't block a register class? But I'm probably missing something... > > Another case where using matching registers is natural is for > loop-carried dependencies. Do we want to keep them in: > > loop: > ...no other sets of p0.... > not p0.b, ..., p0.b > ...no other sets of p0.... > bne loop > > or should we split it to: > > loop: > ...no other sets of p0.... > not p1.b, ..., p0.b > mov p0.d, p1.d > ...no other sets of p0.... > bne loop > > ? On the uarches that this affects they are equivalent (I'm happy to expand on this internally if you'd like), So in those cases the first one is preferred as it won't matter. Thanks for the review an explanation! Tamar > > Thanks, > Richard > > > > > Cheers, > > Tamar > > > >> > On high register pressure we also use LRA's costing to prefer not to use the > >> > alternative and instead just use the tie as this is preferable to a reload. 
> >> > > >> > Concretely this patch series does: > >> > > >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 > >> > > >> > foo: > >> > mov z31.h, w0 > >> > ptrue p3.b, all > >> > cmplo p0.h, p3/z, z0.h, z31.h > >> > b use > >> > > >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve > >> > > >> > foo: > >> > mov z31.h, w0 > >> > ptrue p0.b, all > >> > cmplo p0.h, p0/z, z0.h, z31.h > >> > b use > >> > > >> > > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 - > >> ffixed-p[1-15] > >> > > >> > foo: > >> > mov z31.h, w0 > >> > ptrue p0.b, all > >> > cmplo p0.h, p0/z, z0.h, z31.h > >> > b use > >> > > >> > Testcases for the changes are in the last patch of the series. > >> > > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >> > > >> > Thanks, > >> > Tamar > >> > > >> > --- > >> > > >> > --
* Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations. 2024-05-15 15:56 ` Tamar Christina @ 2024-05-15 21:31 ` Richard Sandiford 2024-05-16 2:45 ` Tamar Christina 2024-05-21 3:24 ` Tamar Christina 0 siblings, 2 replies; 25+ messages in thread From: Richard Sandiford @ 2024-05-15 21:31 UTC (permalink / raw) To: Tamar Christina Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov Tamar Christina <Tamar.Christina@arm.com> writes: >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina >> >> <tamar.christina@arm.com> wrote: >> >> > >> >> > Hi All, >> >> > >> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that state >> >> > that for predicated operations that also produce a predicate it is preferred >> >> > that the codegen should use a different register for the destination than that >> >> > of the input predicate in order to avoid a performance overhead. >> >> > >> >> > This of course has the problem that it increases register pressure and so >> should >> >> > be done with care. Additionally not all micro-architectures have this >> >> > consideration and so it shouldn't be done as a default thing. >> >> > >> >> > The patch series adds support for doing conditional early clobbers through a >> >> > combination of new alternatives and attributes to control their availability. >> >> >> >> You could have two alternatives, one with early clobber and one with >> >> a matching constraint where you'd disparage the matching constraint one? >> >> >> > >> > Yeah, that's what I do, though there's no need to disparage the non-early clobber >> > alternative as the early clobber alternative will naturally get a penalty if it needs a >> > reload. >> >> But I think Richard's suggestion was to disparage the one with a matching >> constraint (not the earlyclobber), to reflect the increased cost of >> reusing the register. 
>> >> We did take that approach for gathers, e.g.: >> >> [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] >> [?w, Z, 0, Ui1, Ui1, Upl] ^ >> >> The (supposed) advantage is that, if register pressure is so tight >> that using matching registers is the only alternative, we still >> have the opportunity to do that, as a last resort. >> >> Providing only an earlyclobber version means that using the same >> register is prohibited outright. If no other register is free, the RA >> would need to spill something else to free up a temporary register. >> And it might then do the equivalent of (pseudo-code): >> >> not p1.b, ..., p0.b >> mov p0.d, p1.d >> >> after spilling what would otherwise have occupied p1. In that >> situation it would be better use: >> >> not p0.b, ..., p0.b >> >> and not introduce the spill of p1. > > I think I understood what Richi meant, but I thought it was already working that way. The suggestion was to use matching constraints (like "0") though, whereas the patch doesn't. I think your argument is that you don't need to use matching constraints. But that's different from the suggestion (and from how we handle gathers). I was going to say in response to patch 3 (but got distracted, sorry): I don't think we should have: &Upa, Upa, ... Upa, Upa, ... (taken from the pure logic ops) enabled at the same time. Even though it works for the testcases, I don't think it has well-defined semantics. The problem is that, taken on its own, the second alternative says that matching operands are free. And fundamentally, I don't think the costs *must* take the earlyclobber alternative over the non-earlyclobber one (when costing during IRA, for instance). In principle, the cheapest is best. The aim of the gather approach is to make each alternative correct in isolation. 
In: [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] [?w, Z, 0, Ui1, Ui1, Upl] ^ the second alternative says that it is possible to have operands 0 and 2 be the same vector register, but using that version has the cost of an extra reload. In that sense the alternatives are (essentially) consistent about the restriction. > i.e. as one of the testcases I had: > >> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15] > > foo: > mov z31.h, w0 > ptrue p0.b, all > cmplo p0.h, p0/z, z0.h, z31.h > b use > > and reload did not force a spill. > > My understanding of how this works, and how it seems to be working is that since reload costs > Alternative from front to back the cheapest one wins and it stops evaluating the rest. > > The early clobber case is first and preferred, however when it's not possible, i.e. requires a non-pseudo > reload, the reload cost is added to the alternative. > > However you're right that in the following testcase: > > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload > > i.e. giving it an extra free register inexplicably causes a spill: > > foo: > addvl sp, sp, #-1 > mov z31.h, w0 > ptrue p0.b, all > str p15, [sp] > cmplo p15.h, p0/z, z0.h, z31.h > mov p0.b, p15.b > ldr p15, [sp] > addvl sp, sp, #1 > b use > > so that's unexpected and is very weird as p15 has no defined value.. This is because the function implicitly uses the SVE PCS, and so needs to preserve p15 for the caller. It looks like the correct behaviour. > Now adding the ? as suggested to the non-early clobber alternative does not fix it, and my mental model for how this is supposed to work does not quite line up.. > Why would making the non-clobber alternative even more expensive help it during high register pressure?? Hopefully the above answers this. 
The non-clobber alternative has zero extra cost as things stand. The costs from one alternative (the earlyclobber one) don't carry forward to other alternatives. > But with that suggestion the above case does not get fixed > and the following case > > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed-p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 -ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload > > ICEs: > > pred-clobber.c: In function 'foo': > pred-clobber.c:9:1: error: unable to find a register to spill > 9 | } > | ^ > pred-clobber.c:9:1: error: this is the insn: > (insn 10 22 19 2 (parallel [ > (set (reg:VNx8BI 110 [104]) > (unspec:VNx8BI [ > (reg:VNx8BI 112) > (const_int 1 [0x1]) > (ltu:VNx8BI (reg:VNx8HI 32 v0) > (reg:VNx8HI 63 v31)) > ] UNSPEC_PRED_Z)) > (clobber (reg:CC_NZC 66 cc)) > ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi} > (expr_list:REG_DEAD (reg:VNx8BI 112) > (expr_list:REG_DEAD (reg:VNx8HI 63 v31) > (expr_list:REG_DEAD (reg:VNx8HI 32 v0) > (expr_list:REG_UNUSED (reg:CC_NZC 66 cc) > (nil)))))) > during RTL pass: reload > dump file: pred-clobber.c.315r.reload Which pattern did you use? > and this is because the use of ? has the unintended side-effect of blocking a register class entirely during Sched1 as we've recently discovered. > i.e. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 (Is sched1 the problem here, or is it purely an RA thing? What happens when scheduling is disabled?) > in this case it marked the alternative as NO_REGS during sched1 and so it's completely dead. > the use of the ? alternatives has caused quite the code bloat as we've recently discovered because of this unexpected and undocumented behavior. 
> > To me, > > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md > > index 93ec59e58af..2ee3d8ea35e 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -8120,10 +8120,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>" > > (clobber (reg:CC_NZC CC_REGNUM))] > > "TARGET_SVE" > > {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] > > - [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > - [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > - [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > > - [ Upa , Upl , w , w ; * ] ^ > > + [ ^&Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > + [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > + [ ^&Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > > + [ Upa , Upl , w , w ; * ] ^ > > } > > ) > > > > Would have been the right approach, i.e. we prefer the alternative unless a reload is needed, which should work no? well. if ^ wasn't broken the same way > > as ?. Perhaps I need to use Wilco's new alternative that doesn't block a register class? Hmm, I'm not sure. It seems odd to mark only the output with ^, since reloading the output isn't fundamentally different (costwise) from reloading the input. But to me, it's the alternative without the earlyclobber that should be disparaged, since it's the inherently expensive one. The gather-like approach would be something like: [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ [ Upa , Upl , w , <sve_imm_con>; no ] ^ [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> [ ?Upl , 0 , w , w ; yes ] ^ [ Upa , Upl , w , w ; no ] ^ with: (define_attr "pred_clobber" "any,no,yes" (const_string "any")) Thanks, Richard
* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations. 2024-05-15 21:31 ` Richard Sandiford @ 2024-05-16 2:45 ` Tamar Christina 2024-05-21 3:24 ` Tamar Christina 1 sibling, 0 replies; 25+ messages in thread From: Tamar Christina @ 2024-05-16 2:45 UTC (permalink / raw) To: Richard Sandiford Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov > -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Wednesday, May 15, 2024 10:31 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org; nd > <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus > Shawcroft <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org > Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain > operations. > > Tamar Christina <Tamar.Christina@arm.com> writes: > >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina > >> >> <tamar.christina@arm.com> wrote: > >> >> > > >> >> > Hi All, > >> >> > > >> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that > state > >> >> > that for predicated operations that also produce a predicate it is preferred > >> >> > that the codegen should use a different register for the destination than > that > >> >> > of the input predicate in order to avoid a performance overhead. > >> >> > > >> >> > This of course has the problem that it increases register pressure and so > >> should > >> >> > be done with care. Additionally not all micro-architectures have this > >> >> > consideration and so it shouldn't be done as a default thing. > >> >> > > >> >> > The patch series adds support for doing conditional early clobbers through > a > >> >> > combination of new alternatives and attributes to control their availability. 
> >> >> > >> >> You could have two alternatives, one with early clobber and one with > >> >> a matching constraint where you'd disparage the matching constraint one? > >> >> > >> > > >> > Yeah, that's what I do, though there's no need to disparage the non-early > clobber > >> > alternative as the early clobber alternative will naturally get a penalty if it > needs a > >> > reload. > >> > >> But I think Richard's suggestion was to disparage the one with a matching > >> constraint (not the earlyclobber), to reflect the increased cost of > >> reusing the register. > >> > >> We did take that approach for gathers, e.g.: > >> > >> [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] > >> [?w, Z, 0, Ui1, Ui1, Upl] ^ > >> > >> The (supposed) advantage is that, if register pressure is so tight > >> that using matching registers is the only alternative, we still > >> have the opportunity to do that, as a last resort. > >> > >> Providing only an earlyclobber version means that using the same > >> register is prohibited outright. If no other register is free, the RA > >> would need to spill something else to free up a temporary register. > >> And it might then do the equivalent of (pseudo-code): > >> > >> not p1.b, ..., p0.b > >> mov p0.d, p1.d > >> > >> after spilling what would otherwise have occupied p1. In that > >> situation it would be better use: > >> > >> not p0.b, ..., p0.b > >> > >> and not introduce the spill of p1. > > > > I think I understood what Richi meant, but I thought it was already working that > way. > > The suggestion was to use matching constraints (like "0") though, > whereas the patch doesn't. I think your argument is that you don't > need to use matching constraints. But that's different from the > suggestion (and from how we handle gathers). > > I was going to say in response to patch 3 (but got distracted, sorry): > I don't think we should have: > > &Upa, Upa, ... > Upa, Upa, ... > > (taken from the pure logic ops) enabled at the same time. 
Even though > it works for the testcases, I don't think it has well-defined semantics. > > The problem is that, taken on its own, the second alternative says that > matching operands are free. And fundamentally, I don't think the costs > *must* take the earlyclobber alternative over the non-earlyclobber one > (when costing during IRA, for instance). In principle, the cheapest > is best. > > The aim of the gather approach is to make each alternative correct in > isolation. In: > > [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] > [?w, Z, 0, Ui1, Ui1, Upl] ^ > > the second alternative says that it is possible to have operands 0 > and 2 be the same vector register, but using that version has the > cost of an extra reload. In that sense the alternatives are > (essentially) consistent about the restriction. > Oh I see! Sorry read over the explicit tie in the first mail! I understand now, The idea is to explicitly model the tie, and non-tie versions. Got it. > > i.e. as one of the testcases I had: > > > >> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed- > p[1-15] > > > > foo: > > mov z31.h, w0 > > ptrue p0.b, all > > cmplo p0.h, p0/z, z0.h, z31.h > > b use > > > > and reload did not force a spill. > > > > My understanding of how this works, and how it seems to be working is that > since reload costs > > Alternative from front to back the cheapest one wins and it stops evaluating the > rest. > > > > The early clobber case is first and preferred, however when it's not possible, i.e. > requires a non-pseudo > > reload, the reload cost is added to the alternative. > > > > However you're right that in the following testcase: > > > > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed- > p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 - > ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload > > > > i.e. 
giving it an extra free register inexplicably causes a spill: > > > > foo: > > addvl sp, sp, #-1 > > mov z31.h, w0 > > ptrue p0.b, all > > str p15, [sp] > > cmplo p15.h, p0/z, z0.h, z31.h > > mov p0.b, p15.b > > ldr p15, [sp] > > addvl sp, sp, #1 > > b use > > > > so that's unexpected and is very weird as p15 has no defined value.. > > This is because the function implicitly uses the SVE PCS, and so needs > to preserve p15 for the caller. It looks like the correct behaviour. > > > Now adding the ? as suggested to the non-early clobber alternative does not fix > it, and my mental model for how this is supposed to work does not quite line up.. > > Why would making the non-clobber alternative even more expensive help it > during high register pressure?? > > Hopefully the above answers this. The non-clobber alternative has > zero extra cost as things stand. The costs from one alternative > (the earlyclobber one) don't carry forward to other alternatives. > > > But with that suggestion the above case does not get fixed > > and the following case > > > > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed- > p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 - > ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload > > > > ICEs: > > > > pred-clobber.c: In function 'foo': > > pred-clobber.c:9:1: error: unable to find a register to spill > > 9 | } > > | ^ > > pred-clobber.c:9:1: error: this is the insn: > > (insn 10 22 19 2 (parallel [ > > (set (reg:VNx8BI 110 [104]) > > (unspec:VNx8BI [ > > (reg:VNx8BI 112) > > (const_int 1 [0x1]) > > (ltu:VNx8BI (reg:VNx8HI 32 v0) > > (reg:VNx8HI 63 v31)) > > ] UNSPEC_PRED_Z)) > > (clobber (reg:CC_NZC 66 cc)) > > ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi} > > (expr_list:REG_DEAD (reg:VNx8BI 112) > > (expr_list:REG_DEAD (reg:VNx8HI 63 v31) > > (expr_list:REG_DEAD (reg:VNx8HI 32 v0) > > (expr_list:REG_UNUSED (reg:CC_NZC 66 cc) > > (nil)))))) > > during RTL pass: 
reload > > dump file: pred-clobber.c.315r.reload > > Which pattern did you use? > > > and this is because the use of ? has the unintended side-effect of blocking a > register class entirely during Sched1 as we've recently discovered. > > i.e. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 > > (Is sched1 the problem here, or is it purely an RA thing? What happens > when scheduling is disabled?) > > > in this case it marked the alternative as NO_REGS during sched1 and so it's > completely dead. > > the use of the ? alternatives has caused quite the code bloat as we've recently > discovered because of this unexpected and undocumented behavior. > > > > To me, > > > > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64- > sve.md > > index 93ec59e58af..2ee3d8ea35e 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -8120,10 +8120,10 @@ (define_insn > "@aarch64_pred_cmp<cmp_op><mode>" > > (clobber (reg:CC_NZC CC_REGNUM))] > > "TARGET_SVE" > > {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] > > - [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > - [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > - [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, %3.<Vetype>, %4.<Vetype> > > - [ Upa , Upl , w , w ; * ] ^ > > + [ ^&Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > + [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > + [ ^&Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, %3.<Vetype>, %4.<Vetype> > > + [ Upa , Upl , w , w ; * ] ^ > > } > > ) > > > > Would have been the right approach, i.e. we prefer the alternative unless a reload > is needed, which should work no? well. if ^ wasn't broken the same way > > as ?. Perhaps I need to use Wilco's new alternative that doesn't block a register > class? > > Hmm, I'm not sure. 
It seems odd to mark only the output with ^, since > reloading the output isn't fundamentally different (costwise) from > reloading the input. > > But to me, it's the alternative without the earlyclobber that should be > disparaged, since it's the inherently expensive one. > > The gather-like approach would be something like: > > [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ > [ Upa , Upl , w , <sve_imm_con>; no ] ^ > [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, %4.<Vetype> > [ ?Upl , 0 , w , w ; yes ] ^ > [ Upa , Upl , w , w ; no ] ^ > > with: > > (define_attr "pred_clobber" "any,no,yes" (const_string "any")) Yeah, this makes sense to me. Sorry, I completely misunderstood: the alternative with the tie was suggested in addition to, not instead of, the earlyclobber one. I'll respin the patches this way. Thanks both! Tamar > > Thanks, > Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
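For context on the layout agreed above: the three-valued "pred_clobber" attribute is meant to gate which rows of the pattern are active for a given tuning, via GCC's standard "enabled" attribute. The following is only a sketch, not text from the thread: TARGET_SVE_PRED_CLOBBER is an assumed name standing in for whatever tuning test patch 2/4 of the series actually introduces, and in aarch64.md this condition would be merged into the existing "enabled" definition rather than added as a second one.

```lisp
;; Sketch only: wiring a three-valued "pred_clobber" attribute into the
;; standard "enabled" attribute.  TARGET_SVE_PRED_CLOBBER is an assumed
;; placeholder for the tuning test introduced elsewhere in the series.
(define_attr "pred_clobber" "any,no,yes" (const_string "any"))

;; An alternative is enabled when its pred_clobber value matches the
;; active tuning: "any" rows are always available, "yes" rows only on
;; tunings that prefer a distinct destination predicate, "no" rows only
;; on the remaining tunings.
(define_attr "enabled" "no,yes"
  (if_then_else
    (ior (eq_attr "pred_clobber" "any")
	 (ior (and (eq_attr "pred_clobber" "yes")
		   (match_test "TARGET_SVE_PRED_CLOBBER"))
	      (and (eq_attr "pred_clobber" "no")
		   (match_test "!TARGET_SVE_PRED_CLOBBER"))))
    (const_string "yes")
    (const_string "no")))
```

With this gating, a tuning that sets the flag exposes only the earlyclobber row and the ?-disparaged tied row, so the allocator prefers a fresh destination but can still fall back to the tie at the cost of the ? penalty; on other tunings only the plain tied row is enabled and code generation is unchanged.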
* RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations. 2024-05-15 21:31 ` Richard Sandiford 2024-05-16 2:45 ` Tamar Christina @ 2024-05-21 3:24 ` Tamar Christina 1 sibling, 0 replies; 25+ messages in thread From: Tamar Christina @ 2024-05-21 3:24 UTC (permalink / raw) To: Richard Sandiford Cc: Richard Biener, gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov > -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Wednesday, May 15, 2024 10:31 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org; nd > <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus > Shawcroft <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org > Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain > operations. > > Tamar Christina <Tamar.Christina@arm.com> writes: > >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina > >> >> <tamar.christina@arm.com> wrote: > >> >> > > >> >> > Hi All, > >> >> > > >> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that > state > >> >> > that for predicated operations that also produce a predicate it is preferred > >> >> > that the codegen should use a different register for the destination than > that > >> >> > of the input predicate in order to avoid a performance overhead. > >> >> > > >> >> > This of course has the problem that it increases register pressure and so > >> should > >> >> > be done with care. Additionally not all micro-architectures have this > >> >> > consideration and so it shouldn't be done as a default thing. > >> >> > > >> >> > The patch series adds support for doing conditional early clobbers through > a > >> >> > combination of new alternatives and attributes to control their availability. 
> >> >> > >> >> You could have two alternatives, one with early clobber and one with > >> >> a matching constraint where you'd disparage the matching constraint one? > >> >> > >> > > >> > Yeah, that's what I do, though there's no need to disparage the non-early > clobber > >> > alternative as the early clobber alternative will naturally get a penalty if it > needs a > >> > reload. > >> > >> But I think Richard's suggestion was to disparage the one with a matching > >> constraint (not the earlyclobber), to reflect the increased cost of > >> reusing the register. > >> > >> We did take that approach for gathers, e.g.: > >> > >> [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] > >> [?w, Z, 0, Ui1, Ui1, Upl] ^ > >> > >> The (supposed) advantage is that, if register pressure is so tight > >> that using matching registers is the only alternative, we still > >> have the opportunity to do that, as a last resort. > >> > >> Providing only an earlyclobber version means that using the same > >> register is prohibited outright. If no other register is free, the RA > >> would need to spill something else to free up a temporary register. > >> And it might then do the equivalent of (pseudo-code): > >> > >> not p1.b, ..., p0.b > >> mov p0.d, p1.d > >> > >> after spilling what would otherwise have occupied p1. In that > >> situation it would be better use: > >> > >> not p0.b, ..., p0.b > >> > >> and not introduce the spill of p1. > > > > I think I understood what Richi meant, but I thought it was already working that > way. > > The suggestion was to use matching constraints (like "0") though, > whereas the patch doesn't. I think your argument is that you don't > need to use matching constraints. But that's different from the > suggestion (and from how we handle gathers). > > I was going to say in response to patch 3 (but got distracted, sorry): > I don't think we should have: > > &Upa, Upa, ... > Upa, Upa, ... > > (taken from the pure logic ops) enabled at the same time. 
Even though > it works for the testcases, I don't think it has well-defined semantics. > > The problem is that, taken on its own, the second alternative says that > matching operands are free. And fundamentally, I don't think the costs > *must* take the earlyclobber alternative over the non-earlyclobber one > (when costing during IRA, for instance). In principle, the cheapest > is best. > > The aim of the gather approach is to make each alternative correct in > isolation. In: > > [&w, Z, w, Ui1, Ui1, Upl] ld1<Vesize>\t%0.s, %5/z, [%2.s] > [?w, Z, 0, Ui1, Ui1, Upl] ^ > > the second alternative says that it is possible to have operands 0 > and 2 be the same vector register, but using that version has the > cost of an extra reload. In that sense the alternatives are > (essentially) consistent about the restriction. > > > i.e. as one of the testcases I had: > > > >> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed- > p[1-15] > > > > foo: > > mov z31.h, w0 > > ptrue p0.b, all > > cmplo p0.h, p0/z, z0.h, z31.h > > b use > > > > and reload did not force a spill. > > > > My understanding of how this works, and how it seems to be working is that > since reload costs > > Alternative from front to back the cheapest one wins and it stops evaluating the > rest. > > > > The early clobber case is first and preferred, however when it's not possible, i.e. > requires a non-pseudo > > reload, the reload cost is added to the alternative. > > > > However you're right that in the following testcase: > > > > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed- > p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 - > ffixed-p13 -ffixed-p14 -ffixed-p14 -fdump-rtl-reload > > > > i.e. 
giving it an extra free register inexplicably causes a spill: > > > > foo: > > addvl sp, sp, #-1 > > mov z31.h, w0 > > ptrue p0.b, all > > str p15, [sp] > > cmplo p15.h, p0/z, z0.h, z31.h > > mov p0.b, p15.b > > ldr p15, [sp] > > addvl sp, sp, #1 > > b use > > > > so that's unexpected and is very weird as p15 has no defined value.. > > This is because the function implicitly uses the SVE PCS, and so needs > to preserve p15 for the caller. It looks like the correct behaviour. Sure, but p15 isn't live after the call. It is somewhat of a regression in that if it had chosen the tie version then p15 wouldn't need preserving. It's a bit of an artificial case I guess but are we ok with this regression? Or is there a way to query df to see if a value is live after the call? I can only see ways to tell if the register is live before the call. Thanks, Tamar > > > Now adding the ? as suggested to the non-early clobber alternative does not fix > it, and my mental model for how this is supposed to work does not quite line up.. > > Why would making the non-clobber alternative even more expensive help it > during high register pressure?? > > Hopefully the above answers this. The non-clobber alternative has > zero extra cost as things stand. The costs from one alternative > (the earlyclobber one) don't carry forward to other alternatives. 
> > > But with that suggestion the above case does not get fixed > > and the following case > > > > -mcpu=neoverse-n2 -ffixed-p1 -ffixed-p2 -ffixed-p3 -ffixed-p4 -ffixed-p5 -ffixed- > p6 -ffixed-p7 -ffixed-p8 -ffixed-p9 -ffixed-p10 -ffixed-p11 -ffixed-p12 -ffixed-p12 - > ffixed-p13 -ffixed-p14 -ffixed-p15 -fdump-rtl-reload > > > > ICEs: > > > > pred-clobber.c: In function 'foo': > > pred-clobber.c:9:1: error: unable to find a register to spill > > 9 | } > > | ^ > > pred-clobber.c:9:1: error: this is the insn: > > (insn 10 22 19 2 (parallel [ > > (set (reg:VNx8BI 110 [104]) > > (unspec:VNx8BI [ > > (reg:VNx8BI 112) > > (const_int 1 [0x1]) > > (ltu:VNx8BI (reg:VNx8HI 32 v0) > > (reg:VNx8HI 63 v31)) > > ] UNSPEC_PRED_Z)) > > (clobber (reg:CC_NZC 66 cc)) > > ]) "pred-clobber.c":7:19 8687 {aarch64_pred_cmplovnx8hi} > > (expr_list:REG_DEAD (reg:VNx8BI 112) > > (expr_list:REG_DEAD (reg:VNx8HI 63 v31) > > (expr_list:REG_DEAD (reg:VNx8HI 32 v0) > > (expr_list:REG_UNUSED (reg:CC_NZC 66 cc) > > (nil)))))) > > during RTL pass: reload > > dump file: pred-clobber.c.315r.reload > > Which pattern did you use? > > > and this is because the use of ? has the unintended side-effect of blocking a > register class entirely during Sched1 as we've recently discovered. > > i.e. see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 > > (Is sched1 the problem here, or is it purely an RA thing? What happens > when scheduling is disabled?) > > > in this case it marked the alternative as NO_REGS during sched1 and so it's > completely dead. > > the use of the ? alternatives has caused quite the code bloat as we've recently > discovered because of this unexpected and undocumented behavior. 
> > > > To me, > > > > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64- > sve.md > > index 93ec59e58af..2ee3d8ea35e 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -8120,10 +8120,10 @@ (define_insn > "@aarch64_pred_cmp<cmp_op><mode>" > > (clobber (reg:CC_NZC CC_REGNUM))] > > "TARGET_SVE" > > {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] > > - [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > - [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > - [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, %3.<Vetype>, %4.<Vetype> > > - [ Upa , Upl , w , w ; * ] ^ > > + [ ^&Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > + [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > + [ ^&Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, %3.<Vetype>, %4.<Vetype> > > + [ Upa , Upl , w , w ; * ] ^ > > } > > ) > > > > Would have been the right approach, i.e. we prefer the alternative unless a reload > is needed, which should work no? well. if ^ wasn't broken the same way > > as ?. Perhaps I need to use Wilco's new alternative that doesn't block a register > class? > > Hmm, I'm not sure. It seems odd to mark only the output with ^, since > reloading the output isn't fundamentally different (costwise) from > reloading the input. > > But to me, it's the alternative without the earlyclobber that should be > disparaged, since it's the inherently expensive one. 
> > The gather-like approach would be something like: > > [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ > [ Upa , Upl , w , <sve_imm_con>; no ] ^ > [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, %4.<Vetype> > [ ?Upl , 0 , w , w ; yes ] ^ > [ Upa , Upl , w , w ; no ] ^ > > with: > > (define_attr "pred_clobber" "any,no,yes" (const_string "any")) > > Thanks, > Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 3/4]AArch64: add new alternative with early clobber to patterns @ 2024-05-22 9:29 Tamar Christina 2024-05-22 9:47 ` Richard Sandiford 0 siblings, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-22 9:29 UTC (permalink / raw) To: gcc-patches Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov, richard.sandiford [-- Attachment #1: Type: text/plain, Size: 19544 bytes --] Hi All, This patch adds new alternatives to the affected patterns. The new alternatives with the conditional early clobbers are added before the normal ones in order for LRA to prefer them in the event that we have enough free registers to accommodate them. If register pressure is too high, the normal alternatives will be preferred before a reload is considered, as we would rather have the tie than a spill. Tests are in the next patch. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-sve.md (and<mode>3, @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, *<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>, *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest, @aarch64_pred_cmp<cmp_op><mode>_wide, *aarch64_pred_cmp<cmp_op><mode>_wide_cc, *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest, *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber alternative. * config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Likewise. 
--- diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c297428c85fe46 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z" (reg:VNx16BI FFRT_REGNUM) (match_operand:VNx16BI 1 "register_operand")))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffr\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z + [ ?Upa , Upa; yes ] ^ + [ Upa , Upa; * ] ^ } ) @@ -1179,8 +1181,10 @@ (define_insn "*aarch64_rdffr_z_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , Upa; yes ] ^ + [ Upa , Upa; * ] ^ } ) @@ -1195,8 +1199,10 @@ (define_insn "*aarch64_rdffr_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , Upa; yes ] ^ + [ Upa , Upa; * ] ^ } ) @@ -1216,8 +1222,10 @@ (define_insn "*aarch64_rdffr_z_cc" (reg:VNx16BI FFRT_REGNUM) (match_dup 1)))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , Upa; yes ] ^ + [ Upa , Upa; * ] ^ } ) @@ -1233,8 +1241,10 @@ (define_insn "*aarch64_rdffr_cc" (set (match_operand:VNx16BI 0 "register_operand") (reg:VNx16BI FFRT_REGNUM))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , Upa; yes ] ^ + [ Upa , Upa; * ] ^ } ) @@ -6651,8 +6661,10 
@@ (define_insn "and<mode>3" (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand") (match_operand:PRED_ALL 2 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b + [ ?Upa , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa; * ] ^ } ) @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred_<optab><mode>_z" (match_operand:PRED_ALL 3 "register_operand")) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6703,8 +6717,10 @@ (define_insn "*<optab><mode>3_cc" (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3)) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6723,8 +6739,10 @@ (define_insn "*<optab><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6745,8 +6763,10 @@ (define_insn "aarch64_pred_<nlogical><mode>_z" (match_operand:PRED_ALL 2 "register_operand")) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , 
Upa, Upa, Upa; yes ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6770,8 +6790,10 @@ (define_insn "*<nlogical><mode>3_cc" (match_dup 2)) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6791,8 +6813,10 @@ (define_insn "*<nlogical><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6813,8 +6837,10 @@ (define_insn "aarch64_pred_<logical_nn><mode>_z" (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6839,8 +6865,10 @@ (define_insn "*<logical_nn><mode>3_cc" (not:PRED_ALL (match_dup 3))) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6861,8 +6889,10 @@ (define_insn "*<logical_nn><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] 
<logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -8104,9 +8134,13 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 3 , 4 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ + [ Upa , Upl , w , <sve_imm_con>; * ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ ?Upa , Upl , w , w ; yes ] ^ + [ Upa , Upl , w , w ; * ] ^ } ) @@ -8136,9 +8170,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ + [ Upa , Upl , w , <sve_imm_con>; * ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ ?Upa , Upl , w , w ; yes ] ^ + [ Upa , Upl , w , w ; * ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8166,9 +8204,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upl, w , <sve_imm_con> ] 
cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl, w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl, w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ ?Upa , Upl, w , <sve_imm_con>; yes ] ^ + [ Upa , Upl, w , <sve_imm_con>; * ] ^ + [ &Upa , Upl, w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ ?Upa , Upl, w , w ; yes ] ^ + [ Upa , Upl, w , w ; * ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8221,8 +8263,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2, 3, 4 ] - [ Upa , Upl, , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + [ ?Upa , Upl, , w, w; yes ] ^ + [ Upa , Upl, , w, w; * ] ^ } ) @@ -8254,8 +8298,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 6 ] - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ ?Upa , Upl, w, w, Upl; yes ] ^ + [ Upa , Upl, w, w, Upl; * ] ^ } ) @@ -8279,8 +8325,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 6 ] - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ ?Upa , Upl, w, w, Upl; yes ] ^ + [ Upa , Upl, w, w, Upl; * ] ^ } ) @@ 
-9948,9 +9996,13 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "aarch64_simd_reg_or_zero")] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upa , Upa , Dz ] brk<brk_op>\t%0.b, %1/z, %2.b - [ Upa , Upa , Upa , 0 ] brk<brk_op>\t%0.b, %1/m, %2.b + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa , Upa , Dz; yes ] brk<brk_op>\t%0.b, %1/z, %2.b + [ ?Upa , Upa , Upa , Dz; yes ] ^ + [ Upa , Upa , Upa , Dz; * ] ^ + [ &Upa , Upa , Upa , 0 ; yes ] brk<brk_op>\t%0.b, %1/m, %2.b + [ ?Upa , Upa , Upa , 0 ; yes ] ^ + [ Upa , Upa , Upa , 0 ; * ] ^ } ) @@ -9974,8 +10026,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ ?Upa , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa; * ] ^ } ) @@ -9994,8 +10048,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ ?Upa , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa; * ] ^ } ) @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "register_operand")] SVE_BRK_BINARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, <brk_reg_con>; yes ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + [ ?Upa , Upa, Upa, <brk_reg_con>; yes ] ^ + [ Upa , Upa, Upa, <brk_reg_con>; * ] ^ } ) @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc" (match_dup 3)] UNSPEC_BRKN))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b + 
{@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ ?Upa , Upa, Upa, 0; yes ] ^ + [ Upa , Upa, Upa, 0; * ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10072,8 +10132,10 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ ?Upa , Upa, Upa, 0; yes ] ^ + [ Upa , Upa, Upa, 0; * ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10103,8 +10165,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRKP))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa, ; yes ] ^ + [ Upa , Upa, Upa, Upa, ; * ] ^ } ) @@ -10123,8 +10187,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index aa12baf48355358ca4fefe88157df3aac6eb09bd..1a49494a69d8335e5f7d3ef4bd3a90d0805bba84 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -3349,8 +3349,10 @@ (define_insn "@aarch64_pred_<sve_int_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE2 && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 , 2, 3, 
4 ] - [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ ?Upa , Upl, , w, w; yes ] ^ + [ Upa , Upl, , w, w; * ] ^ } ) -- [-- Attachment #2: rb18357.patch --] [-- Type: text/x-diff, Size: 18051 bytes --]
yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6745,8 +6763,10 @@ (define_insn "aarch64_pred_<nlogical><mode>_z" (match_operand:PRED_ALL 2 "register_operand")) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6770,8 +6790,10 @@ (define_insn "*<nlogical><mode>3_cc" (match_dup 2)) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6791,8 +6813,10 @@ (define_insn "*<nlogical><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6813,8 +6837,10 @@ (define_insn "aarch64_pred_<logical_nn><mode>_z" (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6839,8 +6865,10 @@ (define_insn "*<logical_nn><mode>3_cc" (not:PRED_ALL (match_dup 3))) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 
1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -6861,8 +6889,10 @@ (define_insn "*<logical_nn><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) @@ -8104,9 +8134,13 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 3 , 4 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ + [ Upa , Upl , w , <sve_imm_con>; * ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ ?Upa , Upl , w , w ; yes ] ^ + [ Upa , Upl , w , w ; * ] ^ } ) @@ -8136,9 +8170,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ + [ Upa , Upl , w , <sve_imm_con>; * ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ ?Upa , Upl , w , 
w ; yes ] ^ + [ Upa , Upl , w , w ; * ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8166,9 +8204,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upl, w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl, w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl, w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ ?Upa , Upl, w , <sve_imm_con>; yes ] ^ + [ Upa , Upl, w , <sve_imm_con>; * ] ^ + [ &Upa , Upl, w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ ?Upa , Upl, w , w ; yes ] ^ + [ Upa , Upl, w , w ; * ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8221,8 +8263,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2, 3, 4 ] - [ Upa , Upl, , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + [ ?Upa , Upl, , w, w; yes ] ^ + [ Upa , Upl, , w, w; * ] ^ } ) @@ -8254,8 +8298,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 6 ] - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ ?Upa , Upl, w, w, Upl; yes ] ^ + [ Upa , Upl, w, w, Upl; * ] ^ } ) @@ -8279,8 +8325,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p 
(&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 6 ] - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ ?Upa , Upl, w, w, Upl; yes ] ^ + [ Upa , Upl, w, w, Upl; * ] ^ } ) @@ -9948,9 +9996,13 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "aarch64_simd_reg_or_zero")] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upa , Upa , Dz ] brk<brk_op>\t%0.b, %1/z, %2.b - [ Upa , Upa , Upa , 0 ] brk<brk_op>\t%0.b, %1/m, %2.b + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa , Upa , Dz; yes ] brk<brk_op>\t%0.b, %1/z, %2.b + [ ?Upa , Upa , Upa , Dz; yes ] ^ + [ Upa , Upa , Upa , Dz; * ] ^ + [ &Upa , Upa , Upa , 0 ; yes ] brk<brk_op>\t%0.b, %1/m, %2.b + [ ?Upa , Upa , Upa , 0 ; yes ] ^ + [ Upa , Upa , Upa , 0 ; * ] ^ } ) @@ -9974,8 +10026,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ ?Upa , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa; * ] ^ } ) @@ -9994,8 +10048,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ ?Upa , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa; * ] ^ } ) @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "register_operand")] SVE_BRK_BINARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 
<brk_reg_con>; yes ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + [ ?Upa , Upa, Upa, <brk_reg_con>; yes ] ^ + [ Upa , Upa, Upa, <brk_reg_con>; * ] ^ } ) @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc" (match_dup 3)] UNSPEC_BRKN))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ ?Upa , Upa, Upa, 0; yes ] ^ + [ Upa , Upa, Upa, 0; * ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10072,8 +10132,10 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ ?Upa , Upa, Upa, 0; yes ] ^ + [ Upa , Upa, Upa, 0; * ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10103,8 +10165,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRKP))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa, ; yes ] ^ + [ Upa , Upa, Upa, Upa, ; * ] ^ } ) @@ -10123,8 +10187,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , Upa, Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; * ] ^ } ) diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 
aa12baf48355358ca4fefe88157df3aac6eb09bd..1a49494a69d8335e5f7d3ef4bd3a90d0805bba84 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -3349,8 +3349,10 @@ (define_insn "@aarch64_pred_<sve_int_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE2 && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 , 2, 3, 4 ] - [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ ?Upa , Upl, , w, w; yes ] ^ + [ Upa , Upl, , w, w; * ] ^ } ) ^ permalink raw reply [flat|nested] 25+ messages in thread
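To see the kind of source that exercises these patterns, the cover letter's pred-clobber.c can be approximated as follows. This is a hypothetical reconstruction inferred from the quoted assembly (mov/ptrue/cmplo/b use); the actual testcases are in patch 4/4 and may differ.

```
/* Hypothetical reconstruction of pred-clobber.c, based only on the
   assembly shown in the cover letter; the real tests are in patch 4/4.  */
#include <arm_sve.h>

extern void use (svbool_t);

void
foo (svuint16_t a, uint16_t b)
{
  /* An unsigned "<" compare maps to CMPLO: a predicated operation that
     both reads a governing predicate and writes a predicate result,
     which is exactly the case the conditional early clobber targets.  */
  use (svcmplt_n_u16 (svptrue_b16 (), a, b));
}
```

With -mcpu=neoverse-n2 the destination predicate should then differ from the governing predicate (p0 vs p3 in the cover letter), while tunings without the SWoG clause keep the tie.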
* Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns 2024-05-22 9:29 [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina @ 2024-05-22 9:47 ` Richard Sandiford 2024-05-22 11:00 ` Tamar Christina 0 siblings, 1 reply; 25+ messages in thread From: Richard Sandiford @ 2024-05-22 9:47 UTC (permalink / raw) To: Tamar Christina Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, ktkachov Tamar Christina <tamar.christina@arm.com> writes: > Hi All, > > This patch adds new alternatives to the patterns which are affected. The new > alternatives with the conditional early clobbers are added before the normal > ones in order for LRA to prefer them in the event that we have enough free > registers to accommodate them. > > In case register pressure is too high the normal alternatives will be preferred > before a reload is considered as we rather have the tie than a spill. > > Tests are in the next patch. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve.md (and<mode>3, > @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, > *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, > *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, > aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, > *<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>, > *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest, > @aarch64_pred_cmp<cmp_op><mode>_wide, > *aarch64_pred_cmp<cmp_op><mode>_wide_cc, > *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>, > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, > @aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest, > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, > aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, > *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber > alternative. 
> * config/aarch64/aarch64-sve2.md > (@aarch64_pred_<sve_int_op><mode>): Likewise. > > --- > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md > index e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c297428c85fe46 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z" > (reg:VNx16BI FFRT_REGNUM) > (match_operand:VNx16BI 1 "register_operand")))] > "TARGET_SVE && TARGET_NON_STREAMING" > - {@ [ cons: =0, 1 ] > - [ Upa , Upa ] rdffr\t%0.b, %1/z > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z > + [ ?Upa , Upa; yes ] ^ > + [ Upa , Upa; * ] ^ > } > ) Sorry for not explaining it very well, but in the previous review I suggested: > The gather-like approach would be something like: > > [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ > [ Upa , Upl , w , <sve_imm_con>; no ] ^ > [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > [ ?Upl , 0 , w , w ; yes ] ^ > [ Upa , Upl , w , w ; no ] ^ > > with: > > (define_attr "pred_clobber" "any,no,yes" (const_string "any")) (with emphasis on the last line). What I didn't say explicitly is that "no" should require !TARGET_SVE_PRED_CLOBBER. The premise of that review was that we shouldn't enable things like: [ Upa , Upl , w , w ; no ] ^ for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber alternative. So we should enable either the pred_clobber=yes alternatives or the pred_clobber=no alternatives, but not both. The default "any" is then for other non-predicate instructions that don't care about TARGET_SVE_PRED_CLOBBER either way. In contrast, this patch makes pred_clobber=yes enable the alternatives that correctly describe the restriction (good!) 
but then also enables the normal alternatives too, which IMO makes the semantics unclear. Thanks, Richard > > @@ -1179,8 +1181,10 @@ (define_insn "*aarch64_rdffr_z_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE && TARGET_NON_STREAMING" > - {@ [ cons: =0, 1 ] > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > + [ ?Upa , Upa; yes ] ^ > + [ Upa , Upa; * ] ^ > } > ) > > @@ -1195,8 +1199,10 @@ (define_insn "*aarch64_rdffr_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE && TARGET_NON_STREAMING" > - {@ [ cons: =0, 1 ] > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > + [ ?Upa , Upa; yes ] ^ > + [ Upa , Upa; * ] ^ > } > ) > > @@ -1216,8 +1222,10 @@ (define_insn "*aarch64_rdffr_z_cc" > (reg:VNx16BI FFRT_REGNUM) > (match_dup 1)))] > "TARGET_SVE && TARGET_NON_STREAMING" > - {@ [ cons: =0, 1 ] > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > + [ ?Upa , Upa; yes ] ^ > + [ Upa , Upa; * ] ^ > } > ) > > @@ -1233,8 +1241,10 @@ (define_insn "*aarch64_rdffr_cc" > (set (match_operand:VNx16BI 0 "register_operand") > (reg:VNx16BI FFRT_REGNUM))] > "TARGET_SVE && TARGET_NON_STREAMING" > - {@ [ cons: =0, 1 ] > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > + [ ?Upa , Upa; yes ] ^ > + [ Upa , Upa; * ] ^ > } > ) > > @@ -6651,8 +6661,10 @@ (define_insn "and<mode>3" > (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand") > (match_operand:PRED_ALL 2 "register_operand")))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 ] > - [ Upa , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b > + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b > + [ ?Upa , Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa; * ] ^ > 
} > ) > > @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred_<optab><mode>_z" > (match_operand:PRED_ALL 3 "register_operand")) > (match_operand:PRED_ALL 1 "register_operand")))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <logical>\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6703,8 +6717,10 @@ (define_insn "*<optab><mode>3_cc" > (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3)) > (match_dup 4)))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6723,8 +6739,10 @@ (define_insn "*<optab><mode>3_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6745,8 +6763,10 @@ (define_insn "aarch64_pred_<nlogical><mode>_z" > (match_operand:PRED_ALL 2 "register_operand")) > (match_operand:PRED_ALL 1 "register_operand")))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6770,8 +6790,10 @@ (define_insn "*<nlogical><mode>3_cc" > (match_dup 2)) > (match_dup 4)))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ 
Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6791,8 +6813,10 @@ (define_insn "*<nlogical><mode>3_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6813,8 +6837,10 @@ (define_insn "aarch64_pred_<logical_nn><mode>_z" > (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))) > (match_operand:PRED_ALL 1 "register_operand")))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6839,8 +6865,10 @@ (define_insn "*<logical_nn><mode>3_cc" > (not:PRED_ALL (match_dup 3))) > (match_dup 4)))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -6861,8 +6889,10 @@ (define_insn "*<logical_nn><mode>3_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , 
Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > @@ -8104,9 +8134,13 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>" > UNSPEC_PRED_Z)) > (clobber (reg:CC_NZC CC_REGNUM))] > "TARGET_SVE" > - {@ [ cons: =0 , 1 , 3 , 4 ] > - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > + {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] > + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ > + [ Upa , Upl , w , <sve_imm_con>; * ] ^ > + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > + [ ?Upa , Upl , w , w ; yes ] ^ > + [ Upa , Upl , w , w ; * ] ^ > } > ) > > @@ -8136,9 +8170,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_cc" > UNSPEC_PRED_Z))] > "TARGET_SVE > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > - {@ [ cons: =0 , 1 , 2 , 3 ] > - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 > - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> > + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 > + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ > + [ Upa , Upl , w , <sve_imm_con>; * ] ^ > + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> > + [ ?Upa , Upl , w , w ; yes ] ^ > + [ Upa , Upl , w , w ; * ] ^ > } > "&& !rtx_equal_p (operands[4], operands[6])" > { > @@ -8166,9 +8204,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest" > (clobber (match_scratch:<VPRED> 0))] > "TARGET_SVE > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upl, w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 > - [ Upa , Upl, w , w ] 
cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upl, w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 > + [ ?Upa , Upl, w , <sve_imm_con>; yes ] ^ > + [ Upa , Upl, w , <sve_imm_con>; * ] ^ > + [ &Upa , Upl, w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> > + [ ?Upa , Upl, w , w ; yes ] ^ > + [ Upa , Upl, w , w ; * ] ^ > } > "&& !rtx_equal_p (operands[4], operands[6])" > { > @@ -8221,8 +8263,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide" > UNSPEC_PRED_Z)) > (clobber (reg:CC_NZC CC_REGNUM))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2, 3, 4 ] > - [ Upa , Upl, , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d > + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] > + [ &Upa , Upl, , w, w; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d > + [ ?Upa , Upl, , w, w; yes ] ^ > + [ Upa , Upl, , w, w; * ] ^ > } > ) > > @@ -8254,8 +8298,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" > UNSPEC_PRED_Z))] > "TARGET_SVE > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > - {@ [ cons: =0, 1 , 2, 3, 6 ] > - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d > + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] > + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d > + [ ?Upa , Upl, w, w, Upl; yes ] ^ > + [ Upa , Upl, w, w, Upl; * ] ^ > } > ) > > @@ -8279,8 +8325,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest" > (clobber (match_scratch:<VPRED> 0))] > "TARGET_SVE > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > - {@ [ cons: =0, 1 , 2, 3, 6 ] > - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d > + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] > + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d > + [ ?Upa , Upl, w, w, Upl; yes ] ^ > + [ Upa , Upl, w, w, 
Upl; * ] ^ > } > ) > > @@ -9948,9 +9996,13 @@ (define_insn "@aarch64_brk<brk_op>" > (match_operand:VNx16BI 3 "aarch64_simd_reg_or_zero")] > SVE_BRK_UNARY))] > "TARGET_SVE" > - {@ [ cons: =0 , 1 , 2 , 3 ] > - [ Upa , Upa , Upa , Dz ] brk<brk_op>\t%0.b, %1/z, %2.b > - [ Upa , Upa , Upa , 0 ] brk<brk_op>\t%0.b, %1/m, %2.b > + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa , Upa , Dz; yes ] brk<brk_op>\t%0.b, %1/z, %2.b > + [ ?Upa , Upa , Upa , Dz; yes ] ^ > + [ Upa , Upa , Upa , Dz; * ] ^ > + [ &Upa , Upa , Upa , 0 ; yes ] brk<brk_op>\t%0.b, %1/m, %2.b > + [ ?Upa , Upa , Upa , 0 ; yes ] ^ > + [ Upa , Upa , Upa , 0 ; * ] ^ > } > ) > > @@ -9974,8 +10026,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" > (match_dup 3)] > SVE_BRK_UNARY))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 ] > - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b > + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b > + [ ?Upa , Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa; * ] ^ > } > ) > > @@ -9994,8 +10048,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 ] > - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b > + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b > + [ ?Upa , Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa; * ] ^ > } > ) > > @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk<brk_op>" > (match_operand:VNx16BI 3 "register_operand")] > SVE_BRK_BINARY))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, <brk_reg_con>; yes ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b > + [ ?Upa , Upa, Upa, <brk_reg_con>; yes ] ^ > + [ Upa , Upa, Upa, <brk_reg_con>; * ] ^ > } > ) > > @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite 
"*aarch64_brkn_cc" > (match_dup 3)] > UNSPEC_BRKN))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b > + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b > + [ ?Upa , Upa, Upa, 0; yes ] ^ > + [ Upa , Upa, Upa, 0; * ] ^ > } > "&& (operands[4] != CONST0_RTX (VNx16BImode) > || operands[5] != CONST0_RTX (VNx16BImode))" > @@ -10072,8 +10132,10 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b > + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b > + [ ?Upa , Upa, Upa, 0; yes ] ^ > + [ Upa , Upa, Upa, 0; * ] ^ > } > "&& (operands[4] != CONST0_RTX (VNx16BImode) > || operands[5] != CONST0_RTX (VNx16BImode))" > @@ -10103,8 +10165,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" > (match_dup 3)] > SVE_BRKP))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 , 4; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa, ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa, ; yes ] ^ > + [ Upa , Upa, Upa, Upa, ; * ] ^ > } > ) > > @@ -10123,8 +10187,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" > UNSPEC_PTEST)) > (clobber (match_scratch:VNx16BI 0))] > "TARGET_SVE" > - {@ [ cons: =0, 1 , 2 , 3 ] > - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upa, Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > + [ Upa , Upa, Upa, Upa; * ] ^ > } > ) > > diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md > index aa12baf48355358ca4fefe88157df3aac6eb09bd..1a49494a69d8335e5f7d3ef4bd3a90d0805bba84 100644 > --- 
a/gcc/config/aarch64/aarch64-sve2.md > +++ b/gcc/config/aarch64/aarch64-sve2.md > @@ -3349,8 +3349,10 @@ (define_insn "@aarch64_pred_<sve_int_op><mode>" > UNSPEC_PRED_Z)) > (clobber (reg:CC_NZC CC_REGNUM))] > "TARGET_SVE2 && TARGET_NON_STREAMING" > - {@ [ cons: =0, 1 , 2, 3, 4 ] > - [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] > + [ &Upa , Upl, , w, w; yes ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > + [ ?Upa , Upl, , w, w; yes ] ^ > + [ Upa , Upl, , w, w; * ] ^ > } > ) ^ permalink raw reply [flat|nested] 25+ messages in thread
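Spelling out the gating Richard describes — "yes" alternatives enabled only for TARGET_SVE_PRED_CLOBBER, "no" alternatives only for !TARGET_SVE_PRED_CLOBBER, and "any" always enabled — could look roughly like this. This is an illustrative sketch assembled from the review comments, not necessarily the committed code:

```
;; Per-alternative attribute: "yes" alternatives carry the early clobber,
;; "no" alternatives keep the plain constraints, "any" (the default) is
;; for instructions that do not care about TARGET_SVE_PRED_CLOBBER.
(define_attr "pred_clobber" "any,no,yes" (const_string "any"))

;; Disable an alternative whose pred_clobber setting contradicts the
;; active tuning, so only one of the two sets is ever in play.
(define_attr "enabled" "no,yes"
  (cond [(ior
	   (and (eq_attr "pred_clobber" "yes")
		(match_test "!TARGET_SVE_PRED_CLOBBER"))
	   (and (eq_attr "pred_clobber" "no")
		(match_test "TARGET_SVE_PRED_CLOBBER")))
	  (const_string "no")]
	(const_string "yes")))
```

Under this scheme the quoted cmp pattern would list the "&Upa ... yes", "?Upl, 0 ... yes" and "Upa ... no" rows from Richard's example, and LRA only ever sees the subset matching the tuning, avoiding the contradiction between the earlyclobber and tied alternatives.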
* RE: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns 2024-05-22 9:47 ` Richard Sandiford @ 2024-05-22 11:00 ` Tamar Christina 2024-05-22 11:24 ` Richard Sandiford 0 siblings, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-22 11:00 UTC (permalink / raw) To: Richard Sandiford Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov > -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Wednesday, May 22, 2024 10:48 AM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Marcus Shawcroft > <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org > Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to > patterns > > Tamar Christina <tamar.christina@arm.com> writes: > > Hi All, > > > > This patch adds new alternatives to the patterns which are affected. The new > > alternatives with the conditional early clobbers are added before the normal > > ones in order for LRA to prefer them in the event that we have enough free > > registers to accommodate them. > > > > In case register pressure is too high the normal alternatives will be preferred > > before a reload is considered as we rather have the tie than a spill. > > > > Tests are in the next patch. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? 
> > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-sve.md (and<mode>3, > > @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, > > *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, > > *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, > > aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, > > *<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>, > > *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest, > > @aarch64_pred_cmp<cmp_op><mode>_wide, > > *aarch64_pred_cmp<cmp_op><mode>_wide_cc, > > *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, > @aarch64_brk<brk_op>, > > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, > > @aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest, > > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, > > aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, > > *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber > > alternative. > > * config/aarch64/aarch64-sve2.md > > (@aarch64_pred_<sve_int_op><mode>): Likewise. 
> > > > --- > > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64- > sve.md > > index > e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c > 297428c85fe46 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z" > > (reg:VNx16BI FFRT_REGNUM) > > (match_operand:VNx16BI 1 "register_operand")))] > > "TARGET_SVE && TARGET_NON_STREAMING" > > - {@ [ cons: =0, 1 ] > > - [ Upa , Upa ] rdffr\t%0.b, %1/z > > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > > + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z > > + [ ?Upa , Upa; yes ] ^ > > + [ Upa , Upa; * ] ^ > > } > > ) > > Sorry for not explaining it very well, but in the previous review I suggested: > > > The gather-like approach would be something like: > > > > [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ > > [ Upa , Upl , w , <sve_imm_con>; no ] ^ > > [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, %4.<Vetype> > > [ ?Upl , 0 , w , w ; yes ] ^ > > [ Upa , Upl , w , w ; no ] ^ > > > > with: > > > > (define_attr "pred_clobber" "any,no,yes" (const_string "any")) > > (with emphasis on the last line). What I didn't say explicitly is > that "no" should require !TARGET_SVE_PRED_CLOBBER. > > The premise of that review was that we shouldn't enable things like: > > [ Upa , Upl , w , w ; no ] ^ > > for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber > alternative. So we should enable either the pred_clobber=yes > alternatives or the pred_clobber=no alternatives, but not both. > > The default "any" is then for other non-predicate instructions that > don't care about TARGET_SVE_PRED_CLOBBER either way. > > In contrast, this patch makes pred_clobber=yes enable the alternatives > that correctly describe the restriction (good!) 
but then also enables > the normal alternatives too, which IMO makes the semantics unclear. Sure, the reason I still had that is because this ICEs under high register pressure: {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 [ ?Upa , 0 , w , <sve_imm_con>; yes ] ^ [ Upa , Upl , w , <sve_imm_con>; no ] ^ [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> [ ?Upa , 0 , w , w ; yes ] ^ [ Upa , Upl , w , w ; no ] ^ } So now in the `yes` case reload does: Considering alt=0 of insn 10: (0) =&Upa (1) Upl (3) w (4) vsd 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ 0 Early clobber: reject++ Bad operand -- refuse Considering alt=1 of insn 10: (0) ?Upa (1) 0 (3) w (4) vsd Staticly defined alt reject+=6 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ 1 Dying matched operand reload: reject++ 1 Small class reload: reject+=3 Bad operand -- refuse Considering alt=3 of insn 10: (0) &Upa (1) Upl (3) w (4) w 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ 0 Early clobber: reject++ overall=11,losers=1,rld_nregs=1 Considering alt=4 of insn 10: (0) ?Upa (1) 0 (3) w (4) w Staticly defined alt reject+=6 0 Small class reload: reject+=3 0 Non input pseudo reload: reject++ overall=16,losers=1 -- refuse Choosing alt 3 in insn 10: (0) &Upa (1) Upl (3) w (4) w {aarch64_pred_cmplovnx8hi} And the penalty of alt=4 makes it pick alt=3 even though it doesn't have the free registers for it. alt=4 would have worked. I believe this now follows exactly what was suggested: 1. provide an early clobber alternative 2. provide an explicit tie alternative with an increase in cost for using it 3. provide a general/normal alternative that is only enabled when the first two aren't. Having read the email a number of times.. did I somehow miss something? 
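The scores in the dump above can be re-derived with a toy model. The assumption (from LRA's behaviour in lra-constraints.cc, and consistent with both totals in the dump) is that each alternative's final score is `overall = reject + losers * LRA_LOSER_COST_FACTOR` with a factor of 6; this is an illustration of the costing argument, not LRA itself:

```python
# Toy re-derivation of the "overall" scores in the LRA dump above.
# Assumption: overall = reject + losers * LRA_LOSER_COST_FACTOR (factor 6).
LOSER_COST_FACTOR = 6

def overall(reject, losers):
    # LRA keeps the alternative with the smallest "overall" score
    # among those it does not refuse outright.
    return reject + losers * LOSER_COST_FACTOR

# alt=3 (&Upa, Upl, w, w): small class +3, non-input pseudo +1,
# earlyclobber +1  ->  reject = 5
alt3 = overall(reject=3 + 1 + 1, losers=1)

# alt=4 (?Upa, 0, w, w): static '?' penalty +6, small class +3,
# non-input pseudo +1  ->  reject = 10
alt4 = overall(reject=6 + 3 + 1, losers=1)

print(alt3, alt4)  # 11 16
```

This reproduces `overall=11` for alt=3 and `overall=16` for alt=4, showing why the static `?` penalty makes LRA commit to the earlyclobber alternative even when no free predicate register is available for it.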
Tamar > > Thanks, > Richard > > > > > @@ -1179,8 +1181,10 @@ (define_insn "*aarch64_rdffr_z_ptest" > > UNSPEC_PTEST)) > > (clobber (match_scratch:VNx16BI 0))] > > "TARGET_SVE && TARGET_NON_STREAMING" > > - {@ [ cons: =0, 1 ] > > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > > + [ ?Upa , Upa; yes ] ^ > > + [ Upa , Upa; * ] ^ > > } > > ) > > > > @@ -1195,8 +1199,10 @@ (define_insn "*aarch64_rdffr_ptest" > > UNSPEC_PTEST)) > > (clobber (match_scratch:VNx16BI 0))] > > "TARGET_SVE && TARGET_NON_STREAMING" > > - {@ [ cons: =0, 1 ] > > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > > + [ ?Upa , Upa; yes ] ^ > > + [ Upa , Upa; * ] ^ > > } > > ) > > > > @@ -1216,8 +1222,10 @@ (define_insn "*aarch64_rdffr_z_cc" > > (reg:VNx16BI FFRT_REGNUM) > > (match_dup 1)))] > > "TARGET_SVE && TARGET_NON_STREAMING" > > - {@ [ cons: =0, 1 ] > > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > > + [ ?Upa , Upa; yes ] ^ > > + [ Upa , Upa; * ] ^ > > } > > ) > > > > @@ -1233,8 +1241,10 @@ (define_insn "*aarch64_rdffr_cc" > > (set (match_operand:VNx16BI 0 "register_operand") > > (reg:VNx16BI FFRT_REGNUM))] > > "TARGET_SVE && TARGET_NON_STREAMING" > > - {@ [ cons: =0, 1 ] > > - [ Upa , Upa ] rdffrs\t%0.b, %1/z > > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > > + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z > > + [ ?Upa , Upa; yes ] ^ > > + [ Upa , Upa; * ] ^ > > } > > ) > > > > @@ -6651,8 +6661,10 @@ (define_insn "and<mode>3" > > (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand") > > (match_operand:PRED_ALL 2 "register_operand")))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 ] > > - [ Upa , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b > > + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b > > + [ ?Upa , 
Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred_<optab><mode>_z" > > (match_operand:PRED_ALL 3 "register_operand")) > > (match_operand:PRED_ALL 1 "register_operand")))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <logical>\t%0.b, %1/z, %2.b, %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6703,8 +6717,10 @@ (define_insn "*<optab><mode>3_cc" > > (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3)) > > (match_dup 4)))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6723,8 +6739,10 @@ (define_insn "*<optab><mode>3_ptest" > > UNSPEC_PTEST)) > > (clobber (match_scratch:VNx16BI 0))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6745,8 +6763,10 @@ (define_insn "aarch64_pred_<nlogical><mode>_z" > > (match_operand:PRED_ALL 2 "register_operand")) > > (match_operand:PRED_ALL 1 "register_operand")))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>\t%0.b, %1/z, %2.b, %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > 
> ) > > > > @@ -6770,8 +6790,10 @@ (define_insn "*<nlogical><mode>3_cc" > > (match_dup 2)) > > (match_dup 4)))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6791,8 +6813,10 @@ (define_insn "*<nlogical><mode>3_ptest" > > UNSPEC_PTEST)) > > (clobber (match_scratch:VNx16BI 0))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6813,8 +6837,10 @@ (define_insn > "aarch64_pred_<logical_nn><mode>_z" > > (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))) > > (match_operand:PRED_ALL 1 "register_operand")))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>\t%0.b, %1/z, %2.b, > %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6839,8 +6865,10 @@ (define_insn "*<logical_nn><mode>3_cc" > > (not:PRED_ALL (match_dup 3))) > > (match_dup 4)))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, > %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -6861,8 +6889,10 @@ (define_insn "*<logical_nn><mode>3_ptest" > > UNSPEC_PTEST)) > > (clobber 
(match_scratch:VNx16BI 0))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, > %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > @@ -8104,9 +8134,13 @@ (define_insn > "@aarch64_pred_cmp<cmp_op><mode>" > > UNSPEC_PRED_Z)) > > (clobber (reg:CC_NZC CC_REGNUM))] > > "TARGET_SVE" > > - {@ [ cons: =0 , 1 , 3 , 4 ] > > - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, #%4 > > - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, %4.<Vetype> > > + {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] > > + [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > > + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ > > + [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, %3.<Vetype>, %4.<Vetype> > > + [ ?Upa , Upl , w , w ; yes ] ^ > > + [ Upa , Upl , w , w ; * ] ^ > > } > > ) > > > > @@ -8136,9 +8170,13 @@ (define_insn_and_rewrite > "*cmp<cmp_op><mode>_cc" > > UNSPEC_PRED_Z))] > > "TARGET_SVE > > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > > - {@ [ cons: =0 , 1 , 2 , 3 ] > > - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %2.<Vetype>, #%3 > > - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %2.<Vetype>, %3.<Vetype> > > + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 > > + [ ?Upa , Upl , w , <sve_imm_con>; yes ] ^ > > + [ Upa , Upl , w , <sve_imm_con>; * ] ^ > > + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, %2.<Vetype>, %3.<Vetype> > > + [ ?Upa , Upl , w , w ; yes ] ^ > > + [ Upa , Upl , w , w ; * ] ^ > > } > > "&& !rtx_equal_p 
(operands[4], operands[6])" > > { > > @@ -8166,9 +8204,13 @@ (define_insn_and_rewrite > "*cmp<cmp_op><mode>_ptest" > > (clobber (match_scratch:<VPRED> 0))] > > "TARGET_SVE > > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upl, w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %2.<Vetype>, #%3 > > - [ Upa , Upl, w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %2.<Vetype>, %3.<Vetype> > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upl, w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 > > + [ ?Upa , Upl, w , <sve_imm_con>; yes ] ^ > > + [ Upa , Upl, w , <sve_imm_con>; * ] ^ > > + [ &Upa , Upl, w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, %2.<Vetype>, %3.<Vetype> > > + [ ?Upa , Upl, w , w ; yes ] ^ > > + [ Upa , Upl, w , w ; * ] ^ > > } > > "&& !rtx_equal_p (operands[4], operands[6])" > > { > > @@ -8221,8 +8263,10 @@ (define_insn > "@aarch64_pred_cmp<cmp_op><mode>_wide" > > UNSPEC_PRED_Z)) > > (clobber (reg:CC_NZC CC_REGNUM))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2, 3, 4 ] > > - [ Upa , Upl, , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, > %4.d > > + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] > > + [ &Upa , Upl, , w, w; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, %4.d > > + [ ?Upa , Upl, , w, w; yes ] ^ > > + [ Upa , Upl, , w, w; * ] ^ > > } > > ) > > > > @@ -8254,8 +8298,10 @@ (define_insn > "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" > > UNSPEC_PRED_Z))] > > "TARGET_SVE > > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > > - {@ [ cons: =0, 1 , 2, 3, 6 ] > > - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, > %3.d > > + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] > > + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %2.<Vetype>, %3.d > > + [ ?Upa , Upl, w, w, Upl; yes ] ^ > > + [ Upa , Upl, w, w, Upl; * ] ^ > > } > > ) > > > > @@ -8279,8 +8325,10 @@ 
(define_insn > "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest" > > (clobber (match_scratch:<VPRED> 0))] > > "TARGET_SVE > > && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" > > - {@ [ cons: =0, 1 , 2, 3, 6 ] > > - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, > %3.d > > + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] > > + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, > %2.<Vetype>, %3.d > > + [ ?Upa , Upl, w, w, Upl; yes ] ^ > > + [ Upa , Upl, w, w, Upl; * ] ^ > > } > > ) > > > > @@ -9948,9 +9996,13 @@ (define_insn "@aarch64_brk<brk_op>" > > (match_operand:VNx16BI 3 "aarch64_simd_reg_or_zero")] > > SVE_BRK_UNARY))] > > "TARGET_SVE" > > - {@ [ cons: =0 , 1 , 2 , 3 ] > > - [ Upa , Upa , Upa , Dz ] brk<brk_op>\t%0.b, %1/z, %2.b > > - [ Upa , Upa , Upa , 0 ] brk<brk_op>\t%0.b, %1/m, %2.b > > + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa , Upa , Dz; yes ] brk<brk_op>\t%0.b, %1/z, %2.b > > + [ ?Upa , Upa , Upa , Dz; yes ] ^ > > + [ Upa , Upa , Upa , Dz; * ] ^ > > + [ &Upa , Upa , Upa , 0 ; yes ] brk<brk_op>\t%0.b, %1/m, %2.b > > + [ ?Upa , Upa , Upa , 0 ; yes ] ^ > > + [ Upa , Upa , Upa , 0 ; * ] ^ > > } > > ) > > > > @@ -9974,8 +10026,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" > > (match_dup 3)] > > SVE_BRK_UNARY))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 ] > > - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b > > + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b > > + [ ?Upa , Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa; * ] ^ > > } > > ) > > > > @@ -9994,8 +10048,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" > > UNSPEC_PTEST)) > > (clobber (match_scratch:VNx16BI 0))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 ] > > - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b > > + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b > > + [ ?Upa , Upa, Upa; yes ] ^ > > + [ 
Upa , Upa, Upa; * ] ^ > > } > > ) > > > > @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk<brk_op>" > > (match_operand:VNx16BI 3 "register_operand")] > > SVE_BRK_BINARY))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, > %<brk_reg_opno>.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, <brk_reg_con>; yes ] brk<brk_op>\t%0.b, %1/z, > %2.b, %<brk_reg_opno>.b > > + [ ?Upa , Upa, Upa, <brk_reg_con>; yes ] ^ > > + [ Upa , Upa, Upa, <brk_reg_con>; * ] ^ > > } > > ) > > > > @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc" > > (match_dup 3)] > > UNSPEC_BRKN))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b > > + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b > > + [ ?Upa , Upa, Upa, 0; yes ] ^ > > + [ Upa , Upa, Upa, 0; * ] ^ > > } > > "&& (operands[4] != CONST0_RTX (VNx16BImode) > > || operands[5] != CONST0_RTX (VNx16BImode))" > > @@ -10072,8 +10132,10 @@ (define_insn_and_rewrite > "*aarch64_brkn_ptest" > > UNSPEC_PTEST)) > > (clobber (match_scratch:VNx16BI 0))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b > > + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b > > + [ ?Upa , Upa, Upa, 0; yes ] ^ > > + [ Upa , Upa, Upa, 0; * ] ^ > > } > > "&& (operands[4] != CONST0_RTX (VNx16BImode) > > || operands[5] != CONST0_RTX (VNx16BImode))" > > @@ -10103,8 +10165,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" > > (match_dup 3)] > > SVE_BRKP))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 , 4; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa, ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, > %3.b > > + [ ?Upa , Upa, 
Upa, Upa, ; yes ] ^ > > + [ Upa , Upa, Upa, Upa, ; * ] ^ > > } > > ) > > > > @@ -10123,8 +10187,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" > > UNSPEC_PTEST)) > > (clobber (match_scratch:VNx16BI 0))] > > "TARGET_SVE" > > - {@ [ cons: =0, 1 , 2 , 3 ] > > - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b > > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > > + [ &Upa , Upa, Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, > %3.b > > + [ ?Upa , Upa, Upa, Upa; yes ] ^ > > + [ Upa , Upa, Upa, Upa; * ] ^ > > } > > ) > > > > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64-sve2.md > > index > aa12baf48355358ca4fefe88157df3aac6eb09bd..1a49494a69d8335e5f7d3ef4b > d3a90d0805bba84 100644 > > --- a/gcc/config/aarch64/aarch64-sve2.md > > +++ b/gcc/config/aarch64/aarch64-sve2.md > > @@ -3349,8 +3349,10 @@ (define_insn > "@aarch64_pred_<sve_int_op><mode>" > > UNSPEC_PRED_Z)) > > (clobber (reg:CC_NZC CC_REGNUM))] > > "TARGET_SVE2 && TARGET_NON_STREAMING" > > - {@ [ cons: =0, 1 , 2, 3, 4 ] > > - [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, > %4.<Vetype> > > + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] > > + [ &Upa , Upl, , w, w; yes ] <sve_int_op>\t%0.<Vetype>, %1/z, > %3.<Vetype>, %4.<Vetype> > > + [ ?Upa , Upl, , w, w; yes ] ^ > > + [ Upa , Upl, , w, w; * ] ^ > > } > > ) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns 2024-05-22 11:00 ` Tamar Christina @ 2024-05-22 11:24 ` Richard Sandiford 2024-05-28 9:38 ` Tamar Christina 0 siblings, 1 reply; 25+ messages in thread From: Richard Sandiford @ 2024-05-22 11:24 UTC (permalink / raw) To: Tamar Christina Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov Tamar Christina <Tamar.Christina@arm.com> writes: >> -----Original Message----- >> From: Richard Sandiford <richard.sandiford@arm.com> >> Sent: Wednesday, May 22, 2024 10:48 AM >> To: Tamar Christina <Tamar.Christina@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft >> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org >> Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to >> patterns >> >> Tamar Christina <tamar.christina@arm.com> writes: >> > Hi All, >> > >> > This patch adds new alternatives to the patterns which are affected. The new >> > alternatives with the conditional early clobbers are added before the normal >> > ones in order for LRA to prefer them in the event that we have enough free >> > registers to accommodate them. >> > >> > In case register pressure is too high the normal alternatives will be preferred >> > before a reload is considered as we rather have the tie than a spill. >> > >> > Tests are in the next patch. >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Ok for master? 
>> > >> > Thanks, >> > Tamar >> > >> > gcc/ChangeLog: >> > >> > * config/aarch64/aarch64-sve.md (and<mode>3, >> > @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, >> > *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, >> > *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, >> > aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, >> > *<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>, >> > *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest, >> > @aarch64_pred_cmp<cmp_op><mode>_wide, >> > *aarch64_pred_cmp<cmp_op><mode>_wide_cc, >> > *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, >> @aarch64_brk<brk_op>, >> > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, >> > @aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest, >> > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, >> > aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, >> > *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber >> > alternative. >> > * config/aarch64/aarch64-sve2.md >> > (@aarch64_pred_<sve_int_op><mode>): Likewise. 
>> > >> > --- >> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64- >> sve.md >> > index >> e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c >> 297428c85fe46 100644 >> > --- a/gcc/config/aarch64/aarch64-sve.md >> > +++ b/gcc/config/aarch64/aarch64-sve.md >> > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z" >> > (reg:VNx16BI FFRT_REGNUM) >> > (match_operand:VNx16BI 1 "register_operand")))] >> > "TARGET_SVE && TARGET_NON_STREAMING" >> > - {@ [ cons: =0, 1 ] >> > - [ Upa , Upa ] rdffr\t%0.b, %1/z >> > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] >> > + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z >> > + [ ?Upa , Upa; yes ] ^ >> > + [ Upa , Upa; * ] ^ >> > } >> > ) >> >> Sorry for not explaining it very well, but in the previous review I suggested: >> >> > The gather-like approach would be something like: >> > >> > [ &Upa , Upl , w , <sve_imm_con>; yes ] >> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 >> > [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ >> > [ Upa , Upl , w , <sve_imm_con>; no ] ^ >> > [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, >> %3.<Vetype>, %4.<Vetype> >> > [ ?Upl , 0 , w , w ; yes ] ^ >> > [ Upa , Upl , w , w ; no ] ^ >> > >> > with: >> > >> > (define_attr "pred_clobber" "any,no,yes" (const_string "any")) >> >> (with emphasis on the last line). What I didn't say explicitly is >> that "no" should require !TARGET_SVE_PRED_CLOBBER. >> >> The premise of that review was that we shouldn't enable things like: >> >> [ Upa , Upl , w , w ; no ] ^ >> >> for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber >> alternative. So we should enable either the pred_clobber=yes >> alternatives or the pred_clobber=no alternatives, but not both. >> >> The default "any" is then for other non-predicate instructions that >> don't care about TARGET_SVE_PRED_CLOBBER either way. >> >> In contrast, this patch makes pred_clobber=yes enable the alternatives >> that correctly describe the restriction (good!) 
but then also enables >> the normal alternatives too, which IMO makes the semantics unclear. > > Sure, the reason I still had that is because this ICEs under high register > pressure: > > {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] > [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > [ ?Upa , 0 , w , <sve_imm_con>; yes ] ^ > [ Upa , Upl , w , <sve_imm_con>; no ] ^ > [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > [ ?Upa , 0 , w , w ; yes ] ^ > [ Upa , Upl , w , w ; no ] ^ > } > > So now in the `yes` case reload does: > > Considering alt=0 of insn 10: (0) =&Upa (1) Upl (3) w (4) vsd > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > 0 Early clobber: reject++ > Bad operand -- refuse > Considering alt=1 of insn 10: (0) ?Upa (1) 0 (3) w (4) vsd > Staticly defined alt reject+=6 > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > 1 Dying matched operand reload: reject++ > 1 Small class reload: reject+=3 > Bad operand -- refuse > Considering alt=3 of insn 10: (0) &Upa (1) Upl (3) w (4) w > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > 0 Early clobber: reject++ > overall=11,losers=1,rld_nregs=1 > Considering alt=4 of insn 10: (0) ?Upa (1) 0 (3) w (4) w > Staticly defined alt reject+=6 > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > overall=16,losers=1 -- refuse > Choosing alt 3 in insn 10: (0) &Upa (1) Upl (3) w (4) w {aarch64_pred_cmplovnx8hi} > > And the penalty of alt=4 makes it pick alt=3 even though it doesn't have the free registers > for it. alt=4 would have worked. By "high register pressure", do you mean if predicate registers are disabled using -ffixed? If so, that's ok in itself. That kind of ICE shouldn't occur in real use. > I believe this now follows exactly what was suggested: > > 1. provide an early clobber alternative > 2. 
provide an explicit tie alternative with an increase in cost for using it > 3. provide a general/normal alternative that is only enabled when the first two aren't. > > Having read the email a number of times.. did I somehow miss something? But how is (3) arranged? It looks like the normal alternative is enabled unconditionally, in the sense that the "enabled" attribute is always "yes". Thanks, Richard ^ permalink raw reply [flat|nested] 25+ messages in thread
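Richard's closing question points at the missing piece: the plain rows need a condition of their own so that only one of the `yes`/`no` alternative sets is live on any given target. One way to wire up the attribute he proposed earlier in the thread could look like the sketch below. This is illustrative only: the helper attribute name is hypothetical, and in practice the condition would be folded into aarch64's existing `enabled` attribute rather than defined standalone.

```lisp
;; Sketch (assumed wiring, not the committed form): rows marked "yes" are
;; live only with TARGET_SVE_PRED_CLOBBER, rows marked "no" only without
;; it, and "any" leaves unrelated instructions unaffected.
(define_attr "pred_clobber" "any,no,yes" (const_string "any"))

(define_attr "pred_clobber_enabled" "no,yes"
  (cond [(and (eq_attr "pred_clobber" "yes")
	      (match_test "TARGET_SVE_PRED_CLOBBER"))
	 (const_string "yes")
	 (and (eq_attr "pred_clobber" "no")
	      (match_test "!TARGET_SVE_PRED_CLOBBER"))
	 (const_string "yes")
	 (eq_attr "pred_clobber" "any")
	 (const_string "yes")]
	(const_string "no")))
```

With this shape, a `pred_clobber=no` row can never contradict the earlyclobber row on a TARGET_SVE_PRED_CLOBBER core, because the two sets are mutually exclusive by construction.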
* RE: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns 2024-05-22 11:24 ` Richard Sandiford @ 2024-05-28 9:38 ` Tamar Christina 2024-05-30 20:12 ` Richard Sandiford 0 siblings, 1 reply; 25+ messages in thread From: Tamar Christina @ 2024-05-28 9:38 UTC (permalink / raw) To: Richard Sandiford Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov [-- Attachment #1: Type: text/plain, Size: 27884 bytes --] > -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Wednesday, May 22, 2024 12:24 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Marcus Shawcroft > <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org > Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to > patterns > > Tamar Christina <Tamar.Christina@arm.com> writes: > >> -----Original Message----- > >> From: Richard Sandiford <richard.sandiford@arm.com> > >> Sent: Wednesday, May 22, 2024 10:48 AM > >> To: Tamar Christina <Tamar.Christina@arm.com> > >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw > >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft > >> <Marcus.Shawcroft@arm.com>; ktkachov@gcc.gnu.org > >> Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to > >> patterns > >> > >> Tamar Christina <tamar.christina@arm.com> writes: > >> > Hi All, > >> > > >> > This patch adds new alternatives to the patterns which are affected. The new > >> > alternatives with the conditional early clobbers are added before the normal > >> > ones in order for LRA to prefer them in the event that we have enough free > >> > registers to accommodate them. > >> > > >> > In case register pressure is too high the normal alternatives will be preferred > >> > before a reload is considered as we rather have the tie than a spill. > >> > > >> > Tests are in the next patch. 
> >> > > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >> > > >> > Ok for master? > >> > > >> > Thanks, > >> > Tamar > >> > > >> > gcc/ChangeLog: > >> > > >> > * config/aarch64/aarch64-sve.md (and<mode>3, > >> > @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, > >> > *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, > >> > *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, > >> > aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, > >> > *<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>, > >> > *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest, > >> > @aarch64_pred_cmp<cmp_op><mode>_wide, > >> > *aarch64_pred_cmp<cmp_op><mode>_wide_cc, > >> > *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, > >> @aarch64_brk<brk_op>, > >> > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, > >> > @aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest, > >> > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, > >> > aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, > >> > *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber > >> > alternative. > >> > * config/aarch64/aarch64-sve2.md > >> > (@aarch64_pred_<sve_int_op><mode>): Likewise. 
> >> > > >> > --- > >> > diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64- > >> sve.md > >> > index > >> > e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c > >> 297428c85fe46 100644 > >> > --- a/gcc/config/aarch64/aarch64-sve.md > >> > +++ b/gcc/config/aarch64/aarch64-sve.md > >> > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z" > >> > (reg:VNx16BI FFRT_REGNUM) > >> > (match_operand:VNx16BI 1 "register_operand")))] > >> > "TARGET_SVE && TARGET_NON_STREAMING" > >> > - {@ [ cons: =0, 1 ] > >> > - [ Upa , Upa ] rdffr\t%0.b, %1/z > >> > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] > >> > + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z > >> > + [ ?Upa , Upa; yes ] ^ > >> > + [ Upa , Upa; * ] ^ > >> > } > >> > ) > >> > >> Sorry for not explaining it very well, but in the previous review I suggested: > >> > >> > The gather-like approach would be something like: > >> > > >> > [ &Upa , Upl , w , <sve_imm_con>; yes ] > >> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > >> > [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ > >> > [ Upa , Upl , w , <sve_imm_con>; no ] ^ > >> > [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, > %1/z, > >> %3.<Vetype>, %4.<Vetype> > >> > [ ?Upl , 0 , w , w ; yes ] ^ > >> > [ Upa , Upl , w , w ; no ] ^ > >> > > >> > with: > >> > > >> > (define_attr "pred_clobber" "any,no,yes" (const_string "any")) > >> > >> (with emphasis on the last line). What I didn't say explicitly is > >> that "no" should require !TARGET_SVE_PRED_CLOBBER. > >> > >> The premise of that review was that we shouldn't enable things like: > >> > >> [ Upa , Upl , w , w ; no ] ^ > >> > >> for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber > >> alternative. So we should enable either the pred_clobber=yes > >> alternatives or the pred_clobber=no alternatives, but not both. > >> > >> The default "any" is then for other non-predicate instructions that > >> don't care about TARGET_SVE_PRED_CLOBBER either way. 
> >> In contrast, this patch makes pred_clobber=yes enable the alternatives
> >> that correctly describe the restriction (good!) but then also enables
> >> the normal alternatives too, which IMO makes the semantics unclear.
> >
> > Sure, the reason I still had that is because this ICEs under high
> > register pressure:
> >
> >   {@ [ cons: =0 , 1   , 3 , 4            ; attrs: pred_clobber ]
> >      [ &Upa     , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4
> >      [ ?Upa     , 0   , w , <sve_imm_con>; yes ] ^
> >      [ Upa      , Upl , w , <sve_imm_con>; no  ] ^
> >      [ &Upa     , Upl , w , w            ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>
> >      [ ?Upa     , 0   , w , w            ; yes ] ^
> >      [ Upa      , Upl , w , w            ; no  ] ^
> >   }
> >
> > So now in the `yes` case reload does:
> >
> >          Considering alt=0 of insn 10:   (0) =&Upa  (1) Upl  (3) w  (4) vsd
> >             0 Small class reload: reject+=3
> >             0 Non input pseudo reload: reject++
> >             0 Early clobber: reject++
> >             Bad operand -- refuse
> >          Considering alt=1 of insn 10:   (0) ?Upa  (1) 0  (3) w  (4) vsd
> >             Staticly defined alt reject+=6
> >             0 Small class reload: reject+=3
> >             0 Non input pseudo reload: reject++
> >             1 Dying matched operand reload: reject++
> >             1 Small class reload: reject+=3
> >             Bad operand -- refuse
> >          Considering alt=3 of insn 10:   (0) &Upa  (1) Upl  (3) w  (4) w
> >             0 Small class reload: reject+=3
> >             0 Non input pseudo reload: reject++
> >             0 Early clobber: reject++
> >           overall=11,losers=1,rld_nregs=1
> >          Considering alt=4 of insn 10:   (0) ?Upa  (1) 0  (3) w  (4) w
> >             Staticly defined alt reject+=6
> >             0 Small class reload: reject+=3
> >             0 Non input pseudo reload: reject++
> >           overall=16,losers=1 -- refuse
> >       Choosing alt 3 in insn 10:  (0) &Upa  (1) Upl  (3) w  (4) w  {aarch64_pred_cmplovnx8hi}
> >
> > And the penalty of alt=4 makes it pick alt=3 even though it doesn't
> > have the free registers for it.  alt=4 would have worked.
>
> By "high register pressure", do you mean if predicate registers are
> disabled using -ffixed?
> If so, that's ok in itself.  That kind of ICE shouldn't occur in real use.
>
> > I believe this now follows exactly what was suggested:
> >
> > 1. provide an early clobber alternative
> > 2. provide an explicit tie alternative with an increase in cost for using it
> > 3. provide a general/normal alternative that is only enabled when the
> >    first two aren't.
> >
> > Having read the email a number of times.. did I somehow miss something?
>
> But how is (3) arranged?  It looks like the normal alternative is enabled
> unconditionally, in the sense that the "enabled" attribute is always "yes".

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
	@aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
	aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
	*aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber
	alternative.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_pred_<sve_int_op><mode>): Likewise.
-- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..ca97750f5c3549bbb3a89aa41acb4edfac3f1b85 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z" (reg:VNx16BI FFRT_REGNUM) (match_operand:VNx16BI 1 "register_operand")))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffr\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z + [ ?Upa , 0 ; yes ] ^ + [ Upa , Upa; no ] ^ } ) @@ -1179,8 +1181,10 @@ (define_insn "*aarch64_rdffr_z_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , 0 ; yes ] ^ + [ Upa , Upa; no ] ^ } ) @@ -1195,8 +1199,10 @@ (define_insn "*aarch64_rdffr_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , 0 ; yes ] ^ + [ Upa , Upa; no ] ^ } ) @@ -1216,8 +1222,10 @@ (define_insn "*aarch64_rdffr_z_cc" (reg:VNx16BI FFRT_REGNUM) (match_dup 1)))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , 0 ; yes ] ^ + [ Upa , Upa; no ] ^ } ) @@ -1233,8 +1241,10 @@ (define_insn "*aarch64_rdffr_cc" (set (match_operand:VNx16BI 0 "register_operand") (reg:VNx16BI FFRT_REGNUM))] "TARGET_SVE && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 ] - [ Upa , Upa ] rdffrs\t%0.b, %1/z + {@ [ cons: =0, 1 ; attrs: pred_clobber ] + [ &Upa , Upa; yes ] rdffrs\t%0.b, %1/z + [ ?Upa , 0 ; yes ] ^ + [ Upa , Upa; no ] ^ } 
) @@ -6651,8 +6661,10 @@ (define_insn "and<mode>3" (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand") (match_operand:PRED_ALL 2 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b + [ ?Upa , 0 , Upa; yes ] ^ + [ Upa , Upa, Upa; no ] ^ } ) @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred_<optab><mode>_z" (match_operand:PRED_ALL 3 "register_operand")) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6703,8 +6717,10 @@ (define_insn "*<optab><mode>3_cc" (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3)) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6723,8 +6739,10 @@ (define_insn "*<optab><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6745,8 +6763,10 @@ (define_insn "aarch64_pred_<nlogical><mode>_z" (match_operand:PRED_ALL 2 "register_operand")) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: 
pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6770,8 +6790,10 @@ (define_insn "*<nlogical><mode>3_cc" (match_dup 2)) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6791,8 +6813,10 @@ (define_insn "*<nlogical><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <nlogical>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6813,8 +6837,10 @@ (define_insn "aarch64_pred_<logical_nn><mode>_z" (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand"))) (match_operand:PRED_ALL 1 "register_operand")))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6839,8 +6865,10 @@ (define_insn "*<logical_nn><mode>3_cc" (not:PRED_ALL (match_dup 3))) (match_dup 4)))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -6861,8 +6889,10 @@ (define_insn "*<logical_nn><mode>3_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, 
Upa, Upa ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] <logical_nn>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) @@ -8104,9 +8134,13 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 3 , 4 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 + [ ?Upa , 0 , w , <sve_imm_con>; yes ] ^ + [ Upa , Upl , w , <sve_imm_con>; no ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ ?Upa , 0 , w , w ; yes ] ^ + [ Upa , Upl , w , w ; no ] ^ } ) @@ -8136,9 +8170,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upl , w , <sve_imm_con> ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl , w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl , w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ ?Upa , 0 , w , <sve_imm_con>; yes ] ^ + [ Upa , Upl , w , <sve_imm_con>; no ] ^ + [ &Upa , Upl , w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ ?Upa , 0 , w , w ; yes ] ^ + [ Upa , Upl , w , w ; no ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8166,9 +8204,13 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upl, w , <sve_imm_con> ] 
cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 - [ Upa , Upl, w , w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upl, w , <sve_imm_con>; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3 + [ ?Upa , 0 , w , <sve_imm_con>; yes ] ^ + [ Upa , Upl, w , <sve_imm_con>; no ] ^ + [ &Upa , Upl, w , w ; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype> + [ ?Upa , 0 , w , w ; yes ] ^ + [ Upa , Upl, w , w ; no ] ^ } "&& !rtx_equal_p (operands[4], operands[6])" { @@ -8221,8 +8263,10 @@ (define_insn "@aarch64_pred_cmp<cmp_op><mode>_wide" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2, 3, 4 ] - [ Upa , Upl, , w, w ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.d + [ ?Upa , 0 , , w, w; yes ] ^ + [ Upa , Upl, , w, w; no ] ^ } ) @@ -8254,8 +8298,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_cc" UNSPEC_PRED_Z))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 6 ] - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ ?Upa , 0 , w, w, Upl; yes ] ^ + [ Upa , Upl, w, w, Upl; no ] ^ } ) @@ -8279,8 +8325,10 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest" (clobber (match_scratch:<VPRED> 0))] "TARGET_SVE && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" - {@ [ cons: =0, 1 , 2, 3, 6 ] - [ Upa , Upl, w, w, Upl ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + {@ [ cons: =0, 1 , 2, 3, 6 ; attrs: pred_clobber ] + [ &Upa , Upl, w, w, Upl; yes ] cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d + [ ?Upa , 0 , w, w, Upl; yes ] ^ + [ Upa , Upl, w, w, Upl; no ] ^ } ) @@ 
-9948,9 +9996,13 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "aarch64_simd_reg_or_zero")] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0 , 1 , 2 , 3 ] - [ Upa , Upa , Upa , Dz ] brk<brk_op>\t%0.b, %1/z, %2.b - [ Upa , Upa , Upa , 0 ] brk<brk_op>\t%0.b, %1/m, %2.b + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa , Upa , Dz; yes ] brk<brk_op>\t%0.b, %1/z, %2.b + [ ?Upa , 0 , Upa , Dz; yes ] ^ + [ Upa , Upa , Upa , Dz; no ] ^ + [ &Upa , Upa , Upa , 0 ; yes ] brk<brk_op>\t%0.b, %1/m, %2.b + [ ?Upa , 0 , Upa , 0 ; yes ] ^ + [ Upa , Upa , Upa , 0 ; no ] ^ } ) @@ -9974,8 +10026,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRK_UNARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ ?Upa , 0 , Upa; yes ] ^ + [ Upa , Upa, Upa; no ] ^ } ) @@ -9994,8 +10048,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 ] - [ Upa , Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b + {@ [ cons: =0, 1 , 2 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b + [ ?Upa , 0 , Upa; yes ] ^ + [ Upa , Upa, Upa; no ] ^ } ) @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk<brk_op>" (match_operand:VNx16BI 3 "register_operand")] SVE_BRK_BINARY))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, <brk_reg_con>; yes ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b + [ ?Upa , 0 , Upa, <brk_reg_con>; yes ] ^ + [ Upa , Upa, Upa, <brk_reg_con>; no ] ^ } ) @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc" (match_dup 3)] UNSPEC_BRKN))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b + 
{@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ ?Upa , 0 , Upa, 0; yes ] ^ + [ Upa , Upa, Upa, 0; no ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10072,8 +10132,10 @@ (define_insn_and_rewrite "*aarch64_brkn_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b + {@ [ cons: =0, 1 , 2 , 3; attrs: pred_clobber ] + [ &Upa , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b + [ ?Upa , 0 , Upa, 0; yes ] ^ + [ Upa , Upa, Upa, 0; no ] ^ } "&& (operands[4] != CONST0_RTX (VNx16BImode) || operands[5] != CONST0_RTX (VNx16BImode))" @@ -10103,8 +10165,10 @@ (define_insn "*aarch64_brk<brk_op>_cc" (match_dup 3)] SVE_BRKP))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 , 4; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa, ; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa, ; yes ] ^ + [ Upa , Upa, Upa, Upa, ; no ] ^ } ) @@ -10123,8 +10187,10 @@ (define_insn "*aarch64_brk<brk_op>_ptest" UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0))] "TARGET_SVE" - {@ [ cons: =0, 1 , 2 , 3 ] - [ Upa , Upa, Upa, Upa ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] + [ &Upa , Upa, Upa, Upa; yes ] brk<brk_op>s\t%0.b, %1/z, %2.b, %3.b + [ ?Upa , 0 , Upa, Upa; yes ] ^ + [ Upa , Upa, Upa, Upa; no ] ^ } ) diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index aa12baf48355358ca4fefe88157df3aac6eb09bd..62634e16daad577c5de710c1fc5c8fa542282265 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -3349,8 +3349,10 @@ (define_insn "@aarch64_pred_<sve_int_op><mode>" UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] "TARGET_SVE2 && TARGET_NON_STREAMING" - {@ [ cons: =0, 1 , 2, 3, 
4 ] - [ Upa , Upl, , w, w ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + {@ [ cons: =0, 1 , 2, 3, 4; attrs: pred_clobber ] + [ &Upa , Upl, , w, w; yes ] <sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> + [ ?Upa , 0 , , w, w; yes ] ^ + [ Upa , Upl, , w, w; no ] ^ } )

[-- Attachment #2: rb18357.patch --]
[-- Type: application/octet-stream, Size: 18048 bytes --]
* Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns
  2024-05-28  9:38             ` Tamar Christina
@ 2024-05-30 20:12               ` Richard Sandiford
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Sandiford @ 2024-05-30 20:12 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, ktkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
> [...]
> @@ -6651,8 +6661,10 @@ (define_insn "and<mode>3"
> 	(and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
> 		      (match_operand:PRED_ALL 2 "register_operand")))]
>   "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2   ]
> -     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
> +  {@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b
> +     [ ?Upa    , 0  , Upa; yes ] ^
> +     [ Upa     , Upa, Upa; no  ] ^

I think this ought to be:

> +  {@ [ cons: =0, 1   , 2   ; attrs: pred_clobber ]
> +     [ &Upa    , Upa , Upa ; yes ] and\t%0.b, %1/z, %2.b, %2.b
> +     [ ?Upa    , 0Upa, 0Upa; yes ] ^
> +     [ Upa     , Upa , Upa ; no  ] ^

so that operand 2 can be tied to operand 0 in the worst case.  Similarly:

> }
> )
>
> @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred_<optab><mode>_z"
> 	  (match_operand:PRED_ALL 3 "register_operand"))
> 	 (match_operand:PRED_ALL 1 "register_operand")))]
>   "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3   ]
> -     [ Upa     , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b
> +  {@ [ cons: =0, 1  , 2  , 3  ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, Upa; yes ] <logical>\t%0.b, %1/z, %2.b, %3.b
> +     [ ?Upa    , 0  , Upa, Upa; yes ] ^
> +     [ Upa     , Upa, Upa, Upa; no  ] ^
> }
> )

this would be:

  {@ [ cons: =0, 1   , 2   , 3   ; attrs: pred_clobber ]
     [ &Upa    , Upa , Upa , Upa ; yes ] <logical>\t%0.b, %1/z, %2.b, %3.b
     [ ?Upa    , 0Upa, 0Upa, 0Upa; yes ] ^
     [ Upa     , Upa , Upa , Upa ; no  ] ^
  }

Same idea for the rest.
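[Editorial gloss of the constraint notation being reviewed, summarizing GCC's documented operand-constraint syntax; the fragment below is a sketch in the shape of the `and<mode>3` pattern above, not code from the patch series. `=` marks an output operand; `&` marks it earlyclobber, meaning it may not share a register with any input; `?` slightly disparages its alternative in the allocator's costing; `0` is a matching constraint tying an operand to operand 0; `Upa` is the AArch64 class of SVE predicate registers. Letters within one constraint string are alternatives, so `0Upa` means "either tied to operand 0, or any predicate register".]

```
;; Sketch only -- not a pattern from the patch series.
;; Alternative 1 ("&Upa"): earlyclobber output, so %0 must differ from
;;   every input; preferred on cores with the pred_clobber tuning.
;; Alternative 2 ("?Upa" / "0Upa"): slightly disparaged fallback that
;;   still lets the allocator tie an input to %0 under register
;;   pressure, which is cheaper than a reload.
;; Alternative 3 (plain "Upa"): unrestricted form for other cores.
{@ [ cons: =0, 1   , 2   ; attrs: pred_clobber ]
   [ &Upa    , Upa , Upa ; yes ] and\t%0.b, %1/z, %2.b, %2.b
   [ ?Upa    , 0Upa, 0Upa; yes ] ^
   [ Upa     , Upa , Upa ; no  ] ^
}
```

[The `^` lines reuse the first alternative's output template; which of the `yes`/`no` alternatives is enabled on a given core is controlled by the `pred_clobber` attribute introduced in patch 2/4.]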
I tried this on:

----------------------------------------------------------------------
#include <arm_sve.h>

void use (svbool_t, svbool_t, svbool_t);

void f1 (svbool_t p0, svbool_t p1, svbool_t p2, int n, svbool_t *ptr)
{
  while (n--)
    p2 = svand_z (p0, p1, p2);
  *ptr = p2;
}

void f2 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  *ptr = svand_z (p0, p1, p2);
}

void f3 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (svand_z (p0, p1, p2), p1, p2);
}

void f4 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, svand_z (p0, p1, p2), p2);
}

void f5 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, p1, svand_z (p0, p1, p2));
}
----------------------------------------------------------------------

and it seemed to produce the right output:

----------------------------------------------------------------------
f1:
	cbz	w0, .L2
	sub	w0, w0, #1
	.p2align 5,,15
.L3:
	and	p2.b, p0/z, p1.b, p2.b
	sub	w0, w0, #1
	cmn	w0, #1
	bne	.L3
.L2:
	str	p2, [x1]
	ret
f2:
	and	p3.b, p0/z, p1.b, p2.b
	str	p3, [x0]
	ret
f3:
	and	p0.b, p0/z, p1.b, p2.b
	b	use
f4:
	and	p1.b, p0/z, p1.b, p2.b
	b	use
f5:
	and	p2.b, p0/z, p1.b, p2.b
	b	use
----------------------------------------------------------------------

(with that coming directly from RA, rather than being cleaned up later)

> [...]
> @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
> 		   (match_dup 3)]
> 		  UNSPEC_BRKN))]
>   "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3 ]
> -     [ Upa     , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b
> +  {@ [ cons: =0, 1  , 2  , 3; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, %0.b
> +     [ ?Upa    , 0  , Upa, 0; yes ] ^
> +     [ Upa     , Upa, Upa, 0; no  ] ^
> }
> "&& (operands[4] != CONST0_RTX (VNx16BImode)
>      || operands[5] != CONST0_RTX (VNx16BImode))"

Probably best to leave this out.  All alternatives require operand 3 to
match operand 0.  So operands 1 and 2 will only match operand 0 if
they're the same as operand 3.
In that case it'd be better to allow the sharing rather than force the
same value to be stored in two registers.  That is, if op1 != op3
&& op2 != op3 then we get what we want naturally, regardless of tuning.

The same thing would apply to the BRKN instances of <brk_reg_con>:

> @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk<brk_op>"
> 		   (match_operand:VNx16BI 3 "register_operand")]
> 		  SVE_BRK_BINARY))]
>   "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3             ]
> -     [ Upa     , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
> +  {@ [ cons: =0, 1  , 2  , 3            ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, <brk_reg_con>; yes ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
> +     [ ?Upa    , 0  , Upa, <brk_reg_con>; yes ] ^
> +     [ Upa     , Upa, Upa, <brk_reg_con>; no  ] ^
> }
> )

but I think we should keep this factoring/abstraction and just add the
extra alternatives regardless.  I.e.:

  {@ [ cons: =0, 1   , 2   , 3              ; attrs: pred_clobber ]
     [ &Upa    , Upa , Upa , <brk_reg_con>  ; yes ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
     [ ?Upa    , 0Upa, 0Upa, 0<brk_reg_con> ; yes ] ^
     [ Upa     , Upa , Upa , <brk_reg_con>  ; no  ] ^

(even though this gives "00", which is valid but redundant).

OK with those changes, thanks.

Richard
end of thread, other threads:[~2024-05-30 20:12 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-15 10:28 [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Tamar Christina
2024-05-15 10:28 ` [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax Tamar Christina
2024-05-15 10:35   ` Kyrill Tkachov
2024-05-15 11:06   ` Richard Sandiford
2024-05-15 10:28 ` [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber Tamar Christina
2024-05-15 10:56   ` Richard Sandiford
2024-05-15 11:03     ` Tamar Christina
2024-05-22  9:29       ` Tamar Christina
2024-05-28  9:37         ` Tamar Christina
2024-05-30 14:59           ` Richard Sandiford
2024-05-15 10:29 ` [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina
2024-05-15 10:29 ` [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores Tamar Christina
2024-05-15 11:20 ` [PATCH 0/4]AArch64: support conditional early clobbers on certain operations Richard Biener
2024-05-15 11:23   ` Tamar Christina
2024-05-15 14:51     ` Richard Sandiford
2024-05-15 15:56       ` Tamar Christina
2024-05-15 21:31         ` Richard Sandiford
2024-05-16  2:45           ` Tamar Christina
2024-05-21  3:24             ` Tamar Christina
2024-05-22  9:29 [PATCH 3/4]AArch64: add new alternative with early clobber to patterns Tamar Christina
2024-05-22  9:47 ` Richard Sandiford
2024-05-22 11:00   ` Tamar Christina
2024-05-22 11:24     ` Richard Sandiford
2024-05-28  9:38       ` Tamar Christina
2024-05-30 20:12         ` Richard Sandiford