From: Richard Sandiford
To: Prathamesh Kulkarni
Cc: Richard Biener, gcc Patches
Subject: Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector
Date: Wed, 12 Apr 2023 09:59:42 +0100
In-Reply-To: (Prathamesh Kulkarni's message of "Thu, 6 Apr 2023 16:51:08 +0530")

Prathamesh Kulkarni writes:
> On Thu, 6 Apr 2023 at 16:05, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Tue, 4 Apr 2023 at 23:35, Richard Sandiford
>> > wrote:
>> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > index cd9cace3c9b..3de79060619 100644
>> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > @@ -817,6 +817,62 @@ public:
>> >> >
>> >> >  class svdupq_impl : public quiet
>> >> >  {
>> >> > +private:
>> >> > +  gimple *
>> >> > +  fold_nonconst_dupq (gimple_folder &f, unsigned factor) const
>> >> > +  {
>> >> > +    /* Lower lhs = svdupq (arg0, arg1, ..., argN} into:
>> >> > +       tmp = {arg0, arg1, ..., arg}
>> >> > +       lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
>> >> > +
>> >> > +    /* TODO: Revisit to handle factor by padding zeros.  */
>> >> > +    if (factor > 1)
>> >> > +      return NULL;
>> >>
>> >> Isn't the key thing here predicate vs. vector rather than factor == 1 vs.
>> >> factor != 1?  Do we generate good code for b8, where factor should be 1?
>> > Hi,
>> > It generates the following code for svdup_n_b8:
>> > https://pastebin.com/ypYt590c
>>
>> Hmm, yeah, not pretty :-)  But it's not pretty without either.
>>
>> > I suppose lowering to ctor+vec_perm_expr is not really useful
>> > for this case because it won't simplify ctor, unlike the above case of
>> > svdupq_s32 (x[0], x[1], x[2], x[3]);
>> > However I wonder if it's still a good idea to lower svdupq for predicates, for
>> > representing svdupq (or other intrinsics) using GIMPLE constructs as
>> > far as possible ?
>>
>> It's possible, but I think we'd need an example in which its a clear
>> benefit.
> Sorry I posted for wrong test case above.
> For the following test:
> svbool_t f(uint8x16_t x)
> {
>   return svdupq_n_b8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
>                       x[8], x[9], x[10], x[11], x[12],
>                       x[13], x[14], x[15]);
> }
>
> Code-gen:
> https://pastebin.com/maexgeJn
>
> I suppose it's equivalent to following ?
>
> svbool_t f2(uint8x16_t x)
> {
>   svuint8_t tmp = svdupq_n_u8 ((bool) x[0], (bool) x[1], (bool) x[2], (bool) x[3],
>                                (bool) x[4], (bool) x[5], (bool) x[6], (bool) x[7],
>                                (bool) x[8], (bool) x[9], (bool) x[10], (bool) x[11],
>                                (bool) x[12], (bool) x[13], (bool) x[14], (bool) x[15]);
>   return svcmpne_n_u8 (svptrue_b8 (), tmp, 0);
> }

Yeah, this is essentially the transformation that the svdupq rtl expander
uses.  It would probably be a good idea to do that in gimple too.

Thanks,
Richard

>
> which generates:
> f2:
> .LFB3901:
>         .cfi_startproc
>         movi    v1.16b, 0x1
>         ptrue   p0.b, all
>         cmeq    v0.16b, v0.16b, #0
>         bic     v0.16b, v1.16b, v0.16b
>         dup     z0.q, z0.q[0]
>         cmpne   p0.b, p0/z, z0.b, #0
>         ret
>
> Thanks,
> Prathamesh
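
The equivalence between f and f2 discussed in this thread can be checked
with a scalar model in portable C, without SVE hardware.  This is only a
sketch under stated assumptions: the function names, QUAD_BYTES and the
vl_bytes parameter are made up for illustration and are not part of GCC
or ACLE.  It assumes that lane i of an svdupq_n_b8 result is true exactly
when argument i % 16 is nonzero, and that the vector length in bytes is a
multiple of 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: number of byte lanes in one 128-bit quadword.  */
#define QUAD_BYTES 16

/* Model of f: predicate lane i is true when x[i % 16] is nonzero.
   vl_bytes stands in for the runtime SVE vector length in bytes.  */
void
model_dupq_b8_direct (const uint8_t x[QUAD_BYTES], bool *pred,
                      size_t vl_bytes)
{
  assert (vl_bytes % QUAD_BYTES == 0);
  for (size_t i = 0; i < vl_bytes; i++)
    pred[i] = x[i % QUAD_BYTES] != 0;
}

/* Model of f2: normalise each element to 0 or 1, replicate the
   quadword across the vector, then compare every lane against 0.  */
void
model_dupq_b8_via_u8 (const uint8_t x[QUAD_BYTES], bool *pred,
                      size_t vl_bytes)
{
  uint8_t quad[QUAD_BYTES];

  assert (vl_bytes % QUAD_BYTES == 0);

  /* Corresponds to the (bool) x[i] casts.  */
  for (size_t i = 0; i < QUAD_BYTES; i++)
    quad[i] = (uint8_t) (x[i] != 0);

  /* Corresponds to svdupq_n_u8 followed by svcmpne_n_u8 (..., 0)
     under an all-true governing predicate.  */
  for (size_t i = 0; i < vl_bytes; i++)
    {
      uint8_t lane = quad[i % QUAD_BYTES];
      pred[i] = lane != 0;
    }
}
```

Calling both functions on the same 16 input bytes with, say, vl_bytes = 32
should fill the two predicate arrays identically, which is the property
the f/f2 rewrite relies on.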