From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Snz2=AD=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id B1DA63858401
	for <gcc-patches@gcc.gnu.org>; Wed, 12 Apr 2023 08:59:45 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B1DA63858401
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 838F0D75;
	Wed, 12 Apr 2023 02:00:28 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 805C83F587;
	Wed, 12 Apr 2023 01:59:43 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Mail-Followup-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,Richard Biener <rguenther@suse.de>,  gcc Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: Richard Biener <rguenther@suse.de>,  gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector
References: <CAAgBjM=0mHW4Aw2u-Kksy=OV5KY-G7_CW+mrT1QKPyKMrBi80g@mail.gmail.com>
	<CAAgBjM=AW6O_fy-bxajgA0ZSu1-F0K8dPtJ3M847TG7SFD+5jQ@mail.gmail.com>
	<CAAgBjMnMpzNDYcDFOmzyLUi4zhepdp3AttbhC1ODKK2NQ=xvnQ@mail.gmail.com>
	<mptmt6nvjhu.fsf@arm.com>
	<CAAgBjMndgAd5eS52rKq+5MsqzA2FRiXM_3CLiovgD9rn8f6TBw@mail.gmail.com>
	<mptv8kle4hd.fsf@arm.com>
	<CAAgBjMkxZXVPYoX_C=deX1P83ZXXqxoWWAkhuFMVE2ha3XJG+A@mail.gmail.com>
	<mpta61wccvr.fsf@arm.com>
	<CAAgBjMktt7DN30efqEnjhPnkbVufTqrqSgrBkSw-0aytA8rf6A@mail.gmail.com>
	<CAAgBjMmTke8Qp2yzXYfDLASpW_LxPTR=AkKqbkMUia=MgwQrXA@mail.gmail.com>
	<mpt357maicn.fsf@arm.com>
	<CAAgBjM=v66TMXjC3+KYHEgmjuf88zGxJ4mQHFGc1jzLWd+H_Gw@mail.gmail.com>
	<mpth6vz7zzx.fsf@arm.com>
	<CAAgBjMkczsYmdE_JU86Dy6_tcA4E2URgk+pkk7bOz=W2_+4XVA@mail.gmail.com>
	<mpt4jqsxyvd.fsf@arm.com>
	<nycvar.YFH.7.77.849.2303130729320.18795@jbgna.fhfr.qr>
	<CAAgBjM=1qkfP=f-_LTXJtqoexpGMzNdUpEJY2y1nRyTn8XUcow@mail.gmail.com>
	<mpth6tvo73w.fsf@arm.com>
	<CAAgBjMn9K95aieL4iv6Kn_XCATzZWSN9bODionsF5-Ow4VMS6A@mail.gmail.com>
	<mptfs9dmh7w.fsf@arm.com>
	<CAAgBjM=JXdWiUtqarjfxP91_Oay9g2rEpHXopARQErfzHSfc9A@mail.gmail.com>
Date: Wed, 12 Apr 2023 09:59:42 +0100
In-Reply-To: <CAAgBjM=JXdWiUtqarjfxP91_Oay9g2rEpHXopARQErfzHSfc9A@mail.gmail.com>
	(Prathamesh Kulkarni's message of "Thu, 6 Apr 2023 16:51:08 +0530")
Message-ID: <mptedopjx1d.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-31.2 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Thu, 6 Apr 2023 at 16:05, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > On Tue, 4 Apr 2023 at 23:35, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > index cd9cace3c9b..3de79060619 100644
>> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > @@ -817,6 +817,62 @@ public:
>> >> >
>> >> >  class svdupq_impl : public quiet<function_base>
>> >> >  {
>> >> > +private:
>> >> > +  gimple *
>> >> > +  fold_nonconst_dupq (gimple_folder &f, unsigned factor) const
>> >> > +  {
>> >> > +    /* Lower lhs = svdupq (arg0, arg1, ..., argN) into:
>> >> > +       tmp = {arg0, arg1, ..., arg<N-1>}
>> >> > +       lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
>> >> > +
>> >> > +    /* TODO: Revisit to handle factor by padding zeros.  */
>> >> > +    if (factor > 1)
>> >> > +      return NULL;
>> >>
>> >> Isn't the key thing here predicate vs. vector rather than factor == 1 vs.
>> >> factor != 1?  Do we generate good code for b8, where factor should be 1?
>> > Hi,
>> > It generates the following code for svdup_n_b8:
>> > https://pastebin.com/ypYt590c
>>
>> Hmm, yeah, not pretty :-)  But it's not pretty without either.
>>
>> > I suppose lowering to ctor+vec_perm_expr is not really useful
>> > for this case because it won't simplify ctor, unlike the above case of
>> > svdupq_s32 (x[0], x[1], x[2], x[3]);
>> > However I wonder if it's still a good idea to lower svdupq for predicates, for
>> > representing svdupq (or other intrinsics) using GIMPLE constructs as
>> > far as possible?
>>
>> It's possible, but I think we'd need an example in which it's a clear
>> benefit.
> Sorry, I posted the wrong test case above.
> For the following test:
> svbool_t f(uint8x16_t x)
> {
>   return svdupq_n_b8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
>                       x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15]);
> }
>
> Code-gen:
> https://pastebin.com/maexgeJn
>
> I suppose it's equivalent to the following?
>
> svbool_t f2(uint8x16_t x)
> {
>   svuint8_t tmp = svdupq_n_u8 ((bool) x[0], (bool) x[1], (bool) x[2],
>                                (bool) x[3], (bool) x[4], (bool) x[5],
>                                (bool) x[6], (bool) x[7], (bool) x[8],
>                                (bool) x[9], (bool) x[10], (bool) x[11],
>                                (bool) x[12], (bool) x[13], (bool) x[14],
>                                (bool) x[15]);
>   return svcmpne_n_u8 (svptrue_b8 (), tmp, 0);
> }

Yeah, this is essentially the transformation that the svdupq rtl
expander uses.  It would probably be a good idea to do that in
gimple too.
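
For illustration only, the equivalence between the two forms can be
modelled in plain scalar C.  This is a rough sketch, not the GCC
implementation: VL_BYTES and the model_* and models_agree names are
made up for the example, the vector length is fixed at an arbitrary
32 bytes, and the lane semantics (predicate lane i takes argument
i % 16) reflect my reading of the intrinsics rather than anything
checked against the expander.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime SVE vector length in bytes.
   32 is an arbitrary choice for this scalar model.  */
#define VL_BYTES 32

/* Model of svdupq_n_b8: predicate lane I is true iff argument
   (I % 16) is nonzero.  */
static void
model_dupq_b8 (const uint8_t x[16], bool pred[VL_BYTES])
{
  for (size_t i = 0; i < VL_BYTES; ++i)
    pred[i] = x[i % 16] != 0;
}

/* Model of the f2 sequence: normalise each argument to 0 or 1,
   replicate the 128-bit block across the vector, then compare every
   lane against zero under an all-true governing predicate.  */
static void
model_dupq_u8_cmpne (const uint8_t x[16], bool pred[VL_BYTES])
{
  uint8_t quad[16];
  uint8_t tmp[VL_BYTES];

  for (size_t i = 0; i < 16; ++i)
    quad[i] = (uint8_t) (bool) x[i];	/* (bool) x[i] */
  for (size_t i = 0; i < VL_BYTES; ++i)
    tmp[i] = quad[i % 16];		/* svdupq_n_u8 */
  for (size_t i = 0; i < VL_BYTES; ++i)
    pred[i] = tmp[i] != 0;		/* svcmpne_n_u8 (ptrue, tmp, 0) */
}

/* Return true if both models produce the same predicate for X.  */
static bool
models_agree (const uint8_t x[16])
{
  bool a[VL_BYTES];
  bool b[VL_BYTES];

  assert (x != NULL);
  model_dupq_b8 (x, a);
  model_dupq_u8_cmpne (x, b);
  for (size_t i = 0; i < VL_BYTES; ++i)
    if (a[i] != b[i])
      return false;
  return true;
}
```

Calling models_agree on any 16-byte input should return true, since
both paths reduce to testing x[i % 16] against zero.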

Thanks,
Richard

>
> which generates:
> f2:
> .LFB3901:
>         .cfi_startproc
>         movi    v1.16b, 0x1
>         ptrue   p0.b, all
>         cmeq    v0.16b, v0.16b, #0
>         bic     v0.16b, v1.16b, v0.16b
>         dup     z0.q, z0.q[0]
>         cmpne   p0.b, p0/z, z0.b, #0
>         ret
>
> Thanks,
> Prathamesh