From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 0F943385828E for ; Thu, 3 Aug 2023 12:51:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0F943385828E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C656E113E; Thu, 3 Aug 2023 05:52:23 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E2C353F6C4; Thu, 3 Aug 2023 05:51:39 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina ,"gcc-patches\@gcc.gnu.org" , nd , Richard Earnshaw , Marcus Shawcroft , Kyrylo Tkachov , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , nd , Richard Earnshaw , Marcus Shawcroft , Kyrylo Tkachov Subject: Re: [PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346] References: Date: Thu, 03 Aug 2023 13:51:38 +0100 In-Reply-To: (Tamar Christina's message of "Thu, 3 Aug 2023 12:32:02 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-26.0 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_LOTSOFHASH,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Tamar Christina writes: >> > + >> > +(define_constraint "D3" >> > + "@internal >> > + A constraint that matches vector of immediates that is with 0 to >> > +(bits(mode)/2)-1." >> > + (and (match_code "const,const_vector") >> > + (match_test "aarch64_const_vec_all_same_in_range_p (op, 0, >> > + (GET_MODE_UNIT_BITSIZE (mode) / 2) - 1)"))) >>=20 >> Having this mapping for D2 and D3, with D2 corresponded to prec/2, kind-= of >> makes D3 a false mnemonic. How about DL instead? (L for "left-shift lo= ng" or >> "low-part", take your pick) >>=20 >> Looks good otherwise. >>=20 > > Wasn't sure if this was an ok with changes or not, so here's the final pa= tch =F0=9F=98=8A I was hoping to have another look before it went in. But... > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? ...yeah, LGTM, thanks. Richard > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/106346 > * config/aarch64/aarch64-simd.md (vec_widen_shiftl_lo_, > vec_widen_shiftl_hi_): Remove. > (aarch64_shll_internal): Renamed to... > (aarch64_shll): .. This. > (aarch64_shll2_internal): Renamed to... > (aarch64_shll2): .. This. > (aarch64_shll_n, aarch64_shll2_n): Re-use new > optabs. > * config/aarch64/constraints.md (D2, DL): New. > * config/aarch64/predicates.md (aarch64_simd_shll_imm_vec): New. > > gcc/testsuite/ChangeLog: > > PR target/106346 > * gcc.target/aarch64/pr98772.c: Adjust assembly. > * gcc.target/aarch64/vect-widen-shift.c: New test. > > --- inline copy of patch --- > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarc= h64-simd.md > index d95394101470446e55f25a2397dd112239b6a54d..f67eb70577d0c2d9911d8c867= d38a4d0b390337c 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -6387,105 +6387,67 @@ (define_insn "aarch64_qshl<= vczbe>" > [(set_attr "type" "neon_sat_shift_reg")] > ) >=20=20 > -(define_expand "vec_widen_shiftl_lo_" > - [(set (match_operand: 0 "register_operand" "=3Dw") > - (unspec: [(match_operand:VQW 1 "register_operand" "w") > - (match_operand:SI 2 > - "aarch64_simd_shift_imm_bitsize_" "i")] > - VSHLL))] > - "TARGET_SIMD" > - { > - rtx p =3D aarch64_simd_vect_par_cnst_half (mode, , fal= se); > - emit_insn (gen_aarch64_shll_internal (operands[0], operan= ds[1], > - p, operands[2])); > - DONE; > - } > -) > - > -(define_expand "vec_widen_shiftl_hi_" > - [(set (match_operand: 0 "register_operand") > - (unspec: [(match_operand:VQW 1 "register_operand" "w") > - (match_operand:SI 2 > - "immediate_operand" "i")] > - VSHLL))] > - "TARGET_SIMD" > - { > - rtx p =3D aarch64_simd_vect_par_cnst_half (mode, , tru= e); > - emit_insn (gen_aarch64_shll2_internal (operands[0], opera= nds[1], > - p, operands[2])); > - DONE; > - } > -) > - > ;; vshll_n >=20=20 > -(define_insn "aarch64_shll_internal" > - [(set (match_operand: 0 "register_operand" "=3Dw") > - (unspec: [(vec_select: > - (match_operand:VQW 1 "register_operand" "w") > - (match_operand:VQW 2 "vect_par_cnst_lo_half" "")) > - (match_operand:SI 3 > - "aarch64_simd_shift_imm_bitsize_" "i")] > - VSHLL))] > +(define_insn "aarch64_shll" > + [(set (match_operand: 0 "register_operand") > + (ashift: (ANY_EXTEND: > + (match_operand:VD_BHSI 1 "register_operand")) > + (match_operand: 2 > + "aarch64_simd_shll_imm_vec")))] > "TARGET_SIMD" > - { > - if (INTVAL (operands[3]) =3D=3D GET_MODE_UNIT_BITSIZE (mode)) > - return "shll\\t%0., %1., %3"; > - else > - return "shll\\t%0., %1., %3"; > + {@ [cons: =3D0, 1, 2] > + [w, w, D2] shll\t%0., %1., %I2 > + [w, w, DL] shll\t%0., %1., %I2 > } > [(set_attr "type" "neon_shift_imm_long")] > ) >=20=20 > -(define_insn "aarch64_shll2_internal" > - [(set (match_operand: 0 "register_operand" "=3Dw") > - (unspec: [(vec_select: > - (match_operand:VQW 1 "register_operand" "w") > - (match_operand:VQW 2 "vect_par_cnst_hi_half" "")) > - (match_operand:SI 3 > - "aarch64_simd_shift_imm_bitsize_" "i")] > +(define_expand "aarch64_shll_n" > + [(set (match_operand: 0 "register_operand") > + (unspec: [(match_operand:VD_BHSI 1 "register_operand") > + (match_operand:SI 2 > + "aarch64_simd_shift_imm_bitsize_")] > VSHLL))] > "TARGET_SIMD" > { > - if (INTVAL (operands[3]) =3D=3D GET_MODE_UNIT_BITSIZE (mode)) > - return "shll2\\t%0., %1., %3"; > - else > - return "shll2\\t%0., %1., %3"; > + rtx shft =3D gen_const_vec_duplicate (mode, operands[2]); > + emit_insn (gen_aarch64_shll (operands[0], operands[1], sh= ft)); > + DONE; > } > - [(set_attr "type" "neon_shift_imm_long")] > ) >=20=20 > -(define_insn "aarch64_shll_n" > - [(set (match_operand: 0 "register_operand" "=3Dw") > - (unspec: [(match_operand:VD_BHSI 1 "register_operand" "w") > - (match_operand:SI 2 > - "aarch64_simd_shift_imm_bitsize_" "i")] > - VSHLL))] > +;; vshll_high_n > + > +(define_insn "aarch64_shll2" > + [(set (match_operand: 0 "register_operand") > + (ashift: (ANY_EXTEND: > + (vec_select: > + (match_operand:VQW 1 "register_operand") > + (match_operand:VQW 2 "vect_par_cnst_hi_half"))) > + (match_operand: 3 > + "aarch64_simd_shll_imm_vec")))] > "TARGET_SIMD" > - { > - if (INTVAL (operands[2]) =3D=3D GET_MODE_UNIT_BITSIZE (mode)) > - return "shll\\t%0., %1., %2"; > - else > - return "shll\\t%0., %1., %2"; > + {@ [cons: =3D0, 1, 2, 3] > + [w, w, , D2] shll2\t%0., %1., %I3 > + [w, w, , DL] shll2\t%0., %1., %I3 > } > [(set_attr "type" "neon_shift_imm_long")] > ) >=20=20 > -;; vshll_high_n > - > -(define_insn "aarch64_shll2_n" > - [(set (match_operand: 0 "register_operand" "=3Dw") > - (unspec: [(match_operand:VQW 1 "register_operand" "w") > - (match_operand:SI 2 "immediate_operand" "i")] > - VSHLL))] > +(define_expand "aarch64_shll2_n" > + [(set (match_operand: 0 "register_operand") > + (unspec: [(match_operand:VQW 1 "register_operand") > + (match_operand:SI 2 > + "aarch64_simd_shift_imm_bitsize_")] > + VSHLL))] > "TARGET_SIMD" > { > - if (INTVAL (operands[2]) =3D=3D GET_MODE_UNIT_BITSIZE (mode)) > - return "shll2\\t%0., %1., %2"; > - else > - return "shll2\\t%0., %1., %2"; > + rtx shft =3D gen_const_vec_duplicate (mode, operands[2]); > + rtx p =3D aarch64_simd_vect_par_cnst_half (mode, , tru= e); > + emit_insn (gen_aarch64_shll2 (operands[0], operands[1], p= , shft)); > + DONE; > } > - [(set_attr "type" "neon_shift_imm_long")] > ) >=20=20 > ;; vrshr_n > diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/const= raints.md > index 6df1dbec2a8097abe9783ed1670c77a8fad4ca57..371a00827d84d8ea4a06ba2b0= 0a761d3b179ae90 100644 > --- a/gcc/config/aarch64/constraints.md > +++ b/gcc/config/aarch64/constraints.md > @@ -468,6 +468,20 @@ (define_constraint "D1" > GET_MODE_UNIT_BITSIZE (mode) - 1, > GET_MODE_UNIT_BITSIZE (mode) - 1)"))) >=20=20 > +(define_constraint "D2" > + "@internal > + A constraint that matches vector of immediates that is bits(mode)/2." > + (and (match_code "const,const_vector") > + (match_test "aarch64_simd_shift_imm_vec_exact_top (op, mode)"))) > + > +(define_constraint "DL" > + "@internal > + A constraint that matches vector of immediates for left shift long. > + That is immediates between 0 to (bits(mode)/2)-1." > + (and (match_code "const,const_vector") > + (match_test "aarch64_const_vec_all_same_in_range_p (op, 0, > + (GET_MODE_UNIT_BITSIZE (mode) / 2) - 1)"))) > + > (define_constraint "Dr" > "@internal > A constraint that matches vector of immediates for right shifts." > diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predic= ates.md > index d5a4a1cd9bf8cde8e779de6e0afa531f04892a7b..2d8d1fe25c1de35cb5a238605= 8cb2901ee46cd82 100644 > --- a/gcc/config/aarch64/predicates.md > +++ b/gcc/config/aarch64/predicates.md > @@ -638,6 +638,11 @@ (define_predicate "aarch64_simd_raddsubhn_imm_vec" > HOST_WIDE_INT_1U > << (GET_MODE_UNIT_BITSIZE (mode) / 2 - 1))"))) >=20=20 > +(define_predicate "aarch64_simd_shll_imm_vec" > + (and (match_code "const_vector") > + (match_test "aarch64_const_vec_all_same_in_range_p (op, 0, > + GET_MODE_UNIT_BITSIZE (mode) / 2)"))) > + > (define_predicate "aarch64_simd_shift_imm_bitsize_qi" > (and (match_code "const_int") > (match_test "IN_RANGE (INTVAL (op), 0, 8)"))) > diff --git a/gcc/testsuite/gcc.target/aarch64/pr98772.c b/gcc/testsuite/g= cc.target/aarch64/pr98772.c > index 8259251a7c0b64ae8362ea29ec3cf1d2a9d63547..52ad012dcfe72721b8c987bb8= 26c0ffb8ba3f31e 100644 > --- a/gcc/testsuite/gcc.target/aarch64/pr98772.c > +++ b/gcc/testsuite/gcc.target/aarch64/pr98772.c > @@ -155,4 +155,4 @@ int main () > /* { dg-final { scan-assembler-times "uaddl\\tv" 2 } } */ > /* { dg-final { scan-assembler-times "usubl\\tv" 2 } } */ > /* { dg-final { scan-assembler-times "umull\\tv" 2 } } */ > -/* { dg-final { scan-assembler-times "shl\\tv" 2 } } */ > +/* { dg-final { scan-assembler-times "shll\\tv" 2 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-shift.c b/gcc/te= stsuite/gcc.target/aarch64/vect-widen-shift.c > new file mode 100644 > index 0000000000000000000000000000000000000000..6ee41f63ef8a145c0eb7f2139= 50e7501e058b2fa > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-shift.c > @@ -0,0 +1,50 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -save-temps" } */ > +/* { dg-final { check-function-bodies "**" "" "" } } */ > +#include > +#include > + > +#pragma GCC target "+nosve" > + > +#define ARR_SIZE 1024 > + > +/* Should produce an shll,shll2 pair*/ > +/* > +** sshll_opt1: > +** ... > +** shll v[0-9]+.4s, v[0-9]+.4h, 16 > +** shll2 v[0-9]+.4s, v[0-9]+.8h, 16 > +** ... > +*/ > +void sshll_opt1 (int32_t *foo, int16_t *a, int16_t *b) > +{ > + for( int i =3D 0; i < ARR_SIZE - 3;i=3Di+4) > + { > + foo[i] =3D a[i] << 16; > + foo[i+1] =3D a[i+1] << 16; > + foo[i+2] =3D a[i+2] << 16; > + foo[i+3] =3D a[i+3] << 16; > + } > +} > + > +/* > +** sshll_opt2: > +** ... > +** sxtl v[0-9]+.4s, v[0-9]+.4h > +** sxtl2 v[0-9]+.4s, v[0-9]+.8h > +** sshl v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s > +** sshl v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s > +** ... > +*/ > +void sshll_opt2 (int32_t *foo, int16_t *a, int16_t *b) > +{ > + for( int i =3D 0; i < ARR_SIZE - 3;i=3Di+4) > + { > + foo[i] =3D a[i] << 16; > + foo[i+1] =3D a[i+1] << 15; > + foo[i+2] =3D a[i+2] << 14; > + foo[i+3] =3D a[i+3] << 17; > + } > +} > + > +