From: Richard Sandiford
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov
Subject: Re: [PATCH]AArch64 Make use of FADDP in simple reductions.
Date: Mon, 01 Nov 2021 08:49:36 +0000
In-Reply-To: (Tamar Christina's message of "Mon, 1 Nov 2021 07:39:51 +0000")

Tamar Christina writes:
>> -----Original Message-----
>> From: Richard Sandiford
>> Sent: Friday, October 8, 2021 5:24 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
>> Kyrylo Tkachov
>> Subject: Re: [PATCH]AArch64 Make use of FADDP in simple reductions.
>>
>> Tamar Christina writes:
>> > Hi All,
>> >
>> > This is a respin of an older patch which never got reviewed upstream
>> > by a maintainer.  It's been updated to fit the current GCC codegen.
>> >
>> > This patch adds a pattern to support the (F)ADDP (scalar) instruction.
>> >
>> > Before the patch, the C code
>> >
>> > typedef float v4sf __attribute__((vector_size (16)));
>> >
>> > float
>> > foo1 (v4sf x)
>> > {
>> >   return x[0] + x[1];
>> > }
>> >
>> > generated:
>> >
>> > foo1:
>> >         dup     s1, v0.s[1]
>> >         fadd    s0, s1, s0
>> >         ret
>> >
>> > After the patch:
>> >
>> > foo1:
>> >         faddp   s0, v0.2s
>> >         ret
>> >
>> > The double case is now handled by SLP, but the remaining cases still
>> > need help from combine.  I have kept the integer and floating-point
>> > patterns separate because the integer one only supports V2DI, and
>> > sharing it with the float one would have required defining a few new
>> > iterators for just a single use.
>> >
>> > I provide support for the case where both elements are subregs as a
>> > separate pattern, as there's no way to tell reload with constraints
>> > alone that the two registers must be equal.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >	* config/aarch64/aarch64-simd.md (*aarch64_faddp_scalar,
>> >	*aarch64_addp_scalarv2di, *aarch64_faddp_scalar2,
>> >	*aarch64_addp_scalar2v2di): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >	* gcc.target/aarch64/simd/scalar_faddp.c: New test.
>> >	* gcc.target/aarch64/simd/scalar_faddp2.c: New test.
>> >	* gcc.target/aarch64/simd/scalar_addp.c: New test.
>> >
>> > Co-authored-by: Tamar Christina
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> > index 6814dae079c9ff40aaa2bb625432bf9eb8906b73..b49f8b79b11cbb1888c503d9a9384424f44bde05 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -3414,6 +3414,70 @@ (define_insn "aarch64_faddp"
>> >    [(set_attr "type" "neon_fp_reduc_add_")]
>> > )
>> >
>> > +;; For the case where both operands are a subreg we need to use a
>> > +;; match_dup since reload cannot enforce that the registers are
>> > +;; the same with a constraint in this case.
>> > +(define_insn "*aarch64_faddp_scalar2"
>> > +  [(set (match_operand: 0 "register_operand" "=w")
>> > +	(plus:
>> > +	  (vec_select:
>> > +	    (match_operator: 1 "subreg_lowpart_operator"
>> > +	      [(match_operand:VHSDF 2 "register_operand" "w")])
>> > +	    (parallel [(match_operand 3 "const_int_operand" "n")]))
>> > +	  (match_dup: 2)))]
>> > +  "TARGET_SIMD
>> > +   && ENDIAN_LANE_N (, INTVAL (operands[3])) == 1"
>> > +  "faddp\t%0, %2.2"
>> > +  [(set_attr "type" "neon_fp_reduc_add_")]
>> > +)
>>
>> The difficulty with using match_dup here is that the first vec_select
>> operand ought to fold to a REG after reload, rather than stay as a
>> subreg.  From that POV we're forcing the generation of non-canonical rtl.
>>
>> Also…
>>
>> > +(define_insn "*aarch64_faddp_scalar"
>> > +  [(set (match_operand: 0 "register_operand" "=w")
>> > +	(plus:
>> > +	  (vec_select:
>> > +	    (match_operand:VHSDF 1 "register_operand" "w")
>> > +	    (parallel [(match_operand 2 "const_int_operand" "n")]))
>> > +	  (match_operand: 3 "register_operand" "1")))]
>> > +  "TARGET_SIMD
>> > +   && ENDIAN_LANE_N (, INTVAL (operands[2])) == 1
>> > +   && SUBREG_P (operands[3]) && !SUBREG_P (operands[1])
>> > +   && subreg_lowpart_p (operands[3])"
>> > +  "faddp\t%0, %1.2"
>> > +  [(set_attr "type" "neon_fp_reduc_add_")]
>> > +)

Also: It'd probably be better to use V2F for the iterator, since it
excludes V4 and V8 modes.  I think we can use vect_par_cnst_hi_half
for operand 2.

>> …matching constraints don't work reliably between two inputs:
>> the RA doesn't know how to combine two different inputs into one
>> input in order to make them match.
>>
>> Have you tried doing this as a define_peephole2 instead?
>> That fits this kind of situation better (from an rtl representation
>> point of view), but peephole2s are admittedly less powerful than
>> combine.
>>
>> If peephole2s don't work then I think we'll have to provide a pattern
>> that accepts two distinct inputs and then split the instruction if
>> the inputs aren't in the same register.  That sounds a bit ugly
>> though, so it'd be good news if the peephole thing works out.
>
> Unfortunately peepholes don't work very well here because e.g. addp
> can be assigned by the regalloc to the integer side instead of SIMD,
> in which case you can't use the instruction anymore.

Ah, yeah, we wouldn't be able to recover easily from that.

> The peepholes seem to only detect the simple FP cases.
>
> I tried adding something like a post-reload split
>
> +  "&& reload_completed && REGNO (operands[1]) != REGNO (operands[3])"
> +  [(clobber (match_scratch: 4 "=w"))
> +   (set (match_dup 4)
> +	(vec_select:
> +	  (match_dup 1)
> +	  (parallel [(match_dup 2)])))
> +   (set (match_dup 0)
> +	(plus:
> +	  (match_dup 4)
> +	  (match_dup 3)))]
> +  ""
>
> But this doesn't seem like it'll work as it needs a scratch?

The clobber should go in the pattern itself, rather than in the split.
E.g.:

  [(set (match_operand: 0 "register_operand" "=w")
	(plus:
	  (vec_select:
	    (match_operand:V2F 1 "register_operand" "w")
	    (match_operand 2 "vect_par_cnst_hi_half"))
	  (match_operand: 3 "register_operand" "w")))
   (clobber (match_scratch:V2F 4 "=&w"))]

recog will add the clobber when needed.

However, it's probably simpler to make operand 0 itself earlyclobber:

  [(set (match_operand: 0 "register_operand" "=&w")
	(plus:
	  (vec_select:
	    (match_operand:V2F 1 "register_operand" "w")
	    (match_operand 2 "vect_par_cnst_hi_half"))
	  (match_operand: 3 "register_operand" "w")))]

The output code should return "#" if the register numbers are different.

All of this complication makes me think: couldn't we match this in
gimple instead?  I.e. check for an addition of the elements in a
2-element vector and match it to IFN_REDUC_PLUS (when supported)?

Thanks,
Richard
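
[To illustrate the IFN_REDUC_PLUS suggestion above in plain C (this is
not GCC-internals code): for a 2-element vector, the explicit lane-wise
sum x[0] + x[1] computes exactly the same value as a reduction over all
lanes, which is why a gimple matcher could legitimately rewrite one into
the other when the target supports the reduction.  The function names
below are illustrative only.]

```c
typedef double v2df __attribute__((vector_size (16)));

/* What the user writes: an explicit addition of both lanes.  */
double
lane_sum (v2df x)
{
  return x[0] + x[1];
}

/* What a REDUC_PLUS-style internal function computes: the sum over all
   lanes of the vector.  For two lanes the results coincide exactly,
   including floating-point rounding, since both perform the single
   addition x[0] + x[1].  */
double
reduc_plus (v2df x)
{
  double sum = 0.0;
  for (int i = 0; i < 2; i++)
    sum += x[i];
  return sum;
}
```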