From: Richard Sandiford
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov
Subject: Re: [PATCH]AArch64 Make use of FADDP in simple reductions.
Date: Mon, 01 Nov 2021 08:49:36 +0000
In-Reply-To: (Tamar Christina's message of "Mon, 1 Nov 2021 07:39:51 +0000")

Tamar Christina writes:
>> -----Original Message-----
>> From: Richard Sandiford
>> Sent: Friday, October 8, 2021 5:24 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
>> Kyrylo Tkachov
>> Subject: Re: [PATCH]AArch64 Make use of FADDP in simple reductions.
>>
>> Tamar Christina writes:
>> > Hi All,
>> >
>> > This is a respin of an older patch which never got reviewed upstream
>> > by a maintainer.  It's been updated to fit the current GCC codegen.
>> >
>> > This patch adds a pattern to support the (F)ADDP (scalar) instruction.
>> >
>> > Before the patch, the C code
>> >
>> > typedef float v4sf __attribute__((vector_size (16)));
>> >
>> > float
>> > foo1 (v4sf x)
>> > {
>> >   return x[0] + x[1];
>> > }
>> >
>> > generated:
>> >
>> > foo1:
>> >         dup     s1, v0.s[1]
>> >         fadd    s0, s1, s0
>> >         ret
>> >
>> > After the patch:
>> >
>> > foo1:
>> >         faddp   s0, v0.2s
>> >         ret
>> >
>> > The double case is now handled by SLP, but the remaining cases still
>> > need help from combine.  I have kept the integer and floating-point
>> > patterns separate because the integer one only supports V2DI, and
>> > sharing it with the float one would have required defining a few new
>> > iterators for just a single use.
>> >
>> > I provide support for the case where both elements are subregs as a
>> > separate pattern, as there's no way to tell reload with constraints
>> > alone that the two registers must be equal.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >	* config/aarch64/aarch64-simd.md (*aarch64_faddp_scalar,
>> >	*aarch64_addp_scalarv2di, *aarch64_faddp_scalar2,
>> >	*aarch64_addp_scalar2v2di): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >	* gcc.target/aarch64/simd/scalar_faddp.c: New test.
>> >	* gcc.target/aarch64/simd/scalar_faddp2.c: New test.
>> >	* gcc.target/aarch64/simd/scalar_addp.c: New test.
>> >
>> > Co-authored-by: Tamar Christina
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> > index 6814dae079c9ff40aaa2bb625432bf9eb8906b73..b49f8b79b11cbb1888c503d9a9384424f44bde05 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -3414,6 +3414,70 @@ (define_insn "aarch64_faddp"
>> >    [(set_attr "type" "neon_fp_reduc_add_")]
>> > )
>> >
>> > +;; For the case where both operands are a subreg we need to use a
>> > +;; match_dup since reload cannot enforce that the registers are
>> > +;; the same with a constraint in this case.
>> > +(define_insn "*aarch64_faddp_scalar2"
>> > +  [(set (match_operand: 0 "register_operand" "=w")
>> > +	(plus:
>> > +	  (vec_select:
>> > +	    (match_operator: 1 "subreg_lowpart_operator"
>> > +	      [(match_operand:VHSDF 2 "register_operand" "w")])
>> > +	    (parallel [(match_operand 3 "const_int_operand" "n")]))
>> > +	  (match_dup: 2)))]
>> > +  "TARGET_SIMD
>> > +   && ENDIAN_LANE_N (, INTVAL (operands[3])) == 1"
>> > +  "faddp\t%0, %2.2"
>> > +  [(set_attr "type" "neon_fp_reduc_add_")]
>> > +)
>>
>> The difficulty with using match_dup here is that the first vec_select
>> operand ought to fold to a REG after reload, rather than stay as a
>> subreg.  From that POV we're forcing the generation of non-canonical rtl.
>>
>> Also…
>>
>> > +(define_insn "*aarch64_faddp_scalar"
>> > +  [(set (match_operand: 0 "register_operand" "=w")
>> > +	(plus:
>> > +	  (vec_select:
>> > +	    (match_operand:VHSDF 1 "register_operand" "w")
>> > +	    (parallel [(match_operand 2 "const_int_operand" "n")]))
>> > +	  (match_operand: 3 "register_operand" "1")))]
>> > +  "TARGET_SIMD
>> > +   && ENDIAN_LANE_N (, INTVAL (operands[2])) == 1
>> > +   && SUBREG_P (operands[3]) && !SUBREG_P (operands[1])
>> > +   && subreg_lowpart_p (operands[3])"
>> > +  "faddp\t%0, %1.2"
>> > +  [(set_attr "type" "neon_fp_reduc_add_")]
>> > +)

Also: It'd probably be better to use V2F for the iterator, since it
excludes V4 and V8 modes.  I think we can use vect_par_cnst_hi_half
for operand 2.

>> …matching constraints don't work reliably between two inputs:
>> the RA doesn't know how to combine two different inputs into one
>> input in order to make them match.
>>
>> Have you tried doing this as a define_peephole2 instead?
>> That fits this kind of situation better (from an rtl representation
>> point of view), but peephole2s are admittedly less powerful than
>> combine.
>>
>> If peephole2s don't work then I think we'll have to provide a pattern
>> that accepts two distinct inputs and then split the instruction if
>> the inputs aren't in the same register.  That sounds a bit ugly
>> though, so it'd be good news if the peephole thing works out.
>
> Unfortunately peepholes don't work very well here because e.g. addp
> can be assigned by the regalloc to the integer side instead of SIMD,
> in which case you can't use the instruction anymore.

Ah, yeah, we wouldn't be able to recover easily from that.

> The peepholes seem to only detect the simple FP cases.
>
> I tried adding something like a post-reload split
>
> +  "&& reload_completed && REGNO (operands[1]) != REGNO (operands[3])"
> +  [(clobber (match_scratch: 4 "=w"))
> +   (set (match_dup 4)
> +	(vec_select:
> +	  (match_dup 1)
> +	  (parallel [(match_dup 2)])))
> +   (set (match_dup 0)
> +	(plus:
> +	  (match_dup 4)
> +	  (match_dup 3)))]
> +  ""
>
> But this doesn't seem like it'll work as it needs a scratch?

The clobber should go in the pattern itself, rather than in the split.
E.g.:

  [(set (match_operand: 0 "register_operand" "=w")
	(plus:
	  (vec_select:
	    (match_operand:V2F 1 "register_operand" "w")
	    (match_operand 2 "vect_par_cnst_hi_half"))
	  (match_operand: 3 "register_operand" "w")))
   (clobber (match_scratch:V2F 4 "=&w"))]

recog will add the clobber when needed.

However, it's probably simpler to make operand 0 itself earlyclobber:

  [(set (match_operand: 0 "register_operand" "=&w")
	(plus:
	  (vec_select:
	    (match_operand:V2F 1 "register_operand" "w")
	    (match_operand 2 "vect_par_cnst_hi_half"))
	  (match_operand: 3 "register_operand" "w")))]

The output code should return "#" if the register numbers are different.

All of this complication makes me think: couldn't we match this in
gimple instead?  I.e. check for an addition of the elements in a
2-element vector and match it to IFN_REDUC_PLUS (when supported)?

Thanks,
Richard
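
[To illustrate the IFN_REDUC_PLUS suggestion above in plain C (this is
not GCC-internals code): for a 2-element vector, the explicit lane-wise
sum x[0] + x[1] computes exactly the same value as a reduction over all
lanes, which is why a gimple matcher could legitimately rewrite one into
the other when the target supports the reduction.  The function names
below are illustrative only.]

```c
typedef double v2df __attribute__((vector_size (16)));

/* What the user writes: an explicit addition of both lanes.  */
double
lane_sum (v2df x)
{
  return x[0] + x[1];
}

/* What a REDUC_PLUS-style internal function computes: the sum over all
   lanes of the vector.  For two lanes the results coincide exactly,
   including floating-point rounding, since both perform the single
   addition x[0] + x[1].  */
double
reduc_plus (v2df x)
{
  double sum = 0.0;
  for (int i = 0; i < 2; i++)
    sum += x[i];
  return sum;
}
```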