From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=/xIy=4E=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 40F773856B58
	for <gcc-patches@gcc.gnu.org>; Tue,  6 Dec 2022 11:05:21 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 40F773856B58
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8298C23A;
	Tue,  6 Dec 2022 03:05:27 -0800 (PST)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.99.50])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DB4DD3F73D;
	Tue,  6 Dec 2022 03:05:19 -0800 (PST)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Mail-Followup-To: Tamar Christina <Tamar.Christina@arm.com>,"gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  nd <nd@arm.com>,  Richard Earnshaw <Richard.Earnshaw@arm.com>,  Marcus Shawcroft <Marcus.Shawcroft@arm.com>,  Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>, richard.sandiford@arm.com
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  nd <nd@arm.com>,  Richard Earnshaw <Richard.Earnshaw@arm.com>,  Marcus Shawcroft <Marcus.Shawcroft@arm.com>,  Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
References: <Y1+4euF0rUwFIjTL@arm.com> <mptfsf2vhzj.fsf@arm.com>
	<VI1PR08MB5325335D195073D1E5B2AB33FF009@VI1PR08MB5325.eurprd08.prod.outlook.com>
	<mptzgc0bzcq.fsf@arm.com>
	<VI1PR08MB5325AE85029C5D3294F6EF47FF1B9@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Tue, 06 Dec 2022 11:05:18 +0000
In-Reply-To: <VI1PR08MB5325AE85029C5D3294F6EF47FF1B9@VI1PR08MB5325.eurprd08.prod.outlook.com>
	(Tamar Christina's message of "Tue, 6 Dec 2022 10:58:24 +0000")
Message-ID: <mpttu28bxn5.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-38.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, December 6, 2022 10:28 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > Hi,
>> >
>> >
>> >> This name might cause confusion with the SVE iterators, where FULL
>> >> means "every bit of the register is used".  How about something like
>> >> VMOVE instead?
>> >>
>> >> With this change, I guess VALL_F16 represents "The set of all modes
>> >> for which the vld1 intrinsics are provided" and VMOVE or whatever is
>> >> "All Advanced SIMD modes suitable for moving, loading, and storing".
>> >> That is, VMOVE extends VALL_F16 with modes that are not manifested
>> >> via intrinsics.
>> >>
>> >
>> > Done.
>> >
>> >> Where is the 2h used, and is it valid syntax in that context?
>> >>
>> >> Same for later instances of 2h.
>> >
>> > They are, but they weren't meant to be in this patch.  They belong in
>> > a separate FP16 series that I won't get to finish for GCC 13 due to not
>> > being able to finish writing all the tests.  I have moved them to that
>> > patch series though.
>> >
>> > While the addp patch series has been killed, this patch is still good
>> > standalone and improves codegen as shown in the updated testcase.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> > 	* config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
>> > 	(mov<mode>, movmisalign<mode>, aarch64_dup_lane<mode>,
>> > 	aarch64_store_lane0<mode>, aarch64_simd_vec_set<mode>,
>> > 	@aarch64_simd_vec_copy_lane<mode>, vec_set<mode>,
>> > 	reduc_<optab>_scal_<mode>, reduc_<fmaxmin>_scal_<mode>,
>> > 	aarch64_reduc_<optab>_internal<mode>, aarch64_get_lane<mode>,
>> > 	vec_init<mode><Vel>, vec_extract<mode><Vel>): Support V2HF.
>> > 	(aarch64_simd_dupv2hf): New.
>> > 	* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
>> > 	Add E_V2HFmode.
>> > 	* config/aarch64/iterators.md (VHSDF_P): New.
>> > 	(V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
>> > 	Vel, q, vp): Add V2HF.
>> > 	* config/arm/types.md (neon_fp_reduc_add_h): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > 	* gcc.target/aarch64/sve/slp_1.c: Update testcase.
>> >
>> > --- inline copy of patch ---
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> > index f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661e6c2d578fca4b7 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -19,10 +19,10 @@
>> >  ;; <http://www.gnu.org/licenses/>.
>> >
>> >  (define_expand "mov<mode>"
>> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
>> > -	(match_operand:VALL_F16 1 "general_operand"))]
>> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
>> > +	(match_operand:VMOVE 1 "general_operand"))]
>> >    "TARGET_SIMD"
>> > -  "
>> > +{
>> >    /* Force the operand into a register if it is not an
>> >       immediate whose use can be replaced with xzr.
>> >       If the mode is 16 bytes wide, then we will be doing
>> > @@ -46,12 +46,11 @@ (define_expand "mov<mode>"
>> >        aarch64_expand_vector_init (operands[0], operands[1]);
>> >        DONE;
>> >      }
>> > -  "
>> > -)
>> > +})
>> >
>> >  (define_expand "movmisalign<mode>"
>> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
>> > -        (match_operand:VALL_F16 1 "general_operand"))]
>> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
>> > +        (match_operand:VMOVE 1 "general_operand"))]
>> >    "TARGET_SIMD && !STRICT_ALIGNMENT"
>> >  {
>> >    /* This pattern is not permitted to fail during expansion: if both arguments
>> > @@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup<mode>"
>> >    [(set_attr "type" "neon_dup<q>, neon_from_gp<q>")]
>> >  )
>> >
>> > +(define_insn "aarch64_simd_dupv2hf"
>> > +  [(set (match_operand:V2HF 0 "register_operand" "=w")
>> > +	(vec_duplicate:V2HF
>> > +	  (match_operand:HF 1 "register_operand" "0")))]
>> 
>> Seems like this should be "w" rather than "0", since SLI is a two-register
>> instruction.
>
> Yes, but for a dup it's only valid when the same register is used, i.e. it
> has to write into the original src register.

Ah, right.  In that case it might be better to use %d0 for the source
operand in the output template, as the documentation on matching
constraints recommends:

  For operands to match in a particular case usually means that they
  are identical-looking RTL expressions.  But in a few special cases
  specific kinds of dissimilarity are allowed.  For example, @code{*x}
  as an input operand will match @code{*x++} as an output operand.
  For proper results in such cases, the output template should always
  use the output-operand's number when printing the operand.
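
As a rough sketch only (the insn condition, the output template and the
type attribute are not visible in the hunk quoted above, so all three
are assumptions here), the pattern might then look something like:

  ;; Operand 1 is tied to operand 0 by the "0" constraint, so the
  ;; template prints operand 0 in both positions instead of using %d1.
  (define_insn "aarch64_simd_dupv2hf"
    [(set (match_operand:V2HF 0 "register_operand" "=w")
          (vec_duplicate:V2HF
            (match_operand:HF 1 "register_operand" "0")))]
    "TARGET_SIMD"
    "sli\\t%d0, %d0, 16"
    [(set_attr "type" "neon_shift_imm")]
  )

Since the two operands are allocated to the same register, this should
not change the assembly that gets emitted.  It just keeps the template
in line with the rule quoted above.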

Thanks,
Richard