From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x52f.google.com (mail-ed1-x52f.google.com [IPv6:2a00:1450:4864:20::52f]) by sourceware.org (Postfix) with ESMTPS id 8C4623861C54 for ; Tue, 6 Jul 2021 09:25:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8C4623861C54 Received: by mail-ed1-x52f.google.com with SMTP id l24so27088145edr.11 for ; Tue, 06 Jul 2021 02:25:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=iFXQNAk0WhnZZcH5fN3Qgq5nmc24L8uFNSAZi7WZG18=; b=HVnh1p5U1TzowRJZD0S9rjm4ivBYZxiegpIfQ6zoNZATH1pUrjHFuCBquTnPx0u8AE Ydcz6+E488PQW5Lk4/RqZlC7Y7/ml6YMO7NQRN8Ki17lyfIbj7r3XeMqsj+osVyIE9Lz 9SrxaCbVD78caAAqVJqJQbkpfwtl5bor3kCosAX82tMEoEUXPzEfXitc5JdjHKo6QPIg EBvbL6subZngnLQFlTX5Ilniuh3rGSCDsHBZ6QLH/Z4PlexG8e3KurYgd5fq95zwRGoq xJnhum+7wQC0/Er5PruPfZnywKrrtWrwo3IE8qRzkSzCudgUySxNUzQulyPrOLScHwDg i2xA== X-Gm-Message-State: AOAM530uuasBKt6VbYngolzpGHB6SgVCiDxFyT+xTeoKJ37w6+FKQWBZ CQq4sddVHVap+eWLpiYu0paVVha6eWQQ0QeEWK2DNg== X-Google-Smtp-Source: ABdhPJz+Ukkw8ifpxe/tuI1QA479DKYFz2XGXAiB0JAm8qCCxc0cMYGqny4NqBQ/s3bRSoI4mpsTC9LpNlfVI04gZGA= X-Received: by 2002:a05:6402:22e4:: with SMTP id dn4mr6475467edb.371.1625563552507; Tue, 06 Jul 2021 02:25:52 -0700 (PDT) MIME-Version: 1.0 References: <9fd8bc30-f7d1-0171-4147-d570413f7a62@foss.st.com> <1d306b96-daa5-3a47-5e3e-d07ddd56dcf4@foss.st.com> In-Reply-To: From: Prathamesh Kulkarni Date: Tue, 6 Jul 2021 14:55:16 +0530 Message-ID: Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor To: Kyrylo Tkachov Cc: Christophe LYON , gcc Patches Content-Type: multipart/mixed; boundary="0000000000001bad8305c6710244" X-Spam-Status: No, score=-8.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Jul 2021 09:25:56 -0000 --0000000000001bad8305c6710244 Content-Type: text/plain; charset="UTF-8" On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov wrote: > > > > > -----Original Message----- > > From: Prathamesh Kulkarni > > Sent: 06 July 2021 08:06 > > To: Christophe LYON > > Cc: Kyrylo Tkachov ; gcc Patches > patches@gcc.gnu.org> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > > constructor > > > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni > > wrote: > > > > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON > > > wrote: > > > > > > > > > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote: > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON > > > > > wrote: > > > > >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote: > > > > >>>> -----Original Message----- > > > > >>>> From: Prathamesh Kulkarni > > > > >>>> Sent: 28 June 2021 09:38 > > > > >>>> To: Kyrylo Tkachov > > > > >>>> Cc: Christophe Lyon ; gcc Patches > > > > > >>>> patches@gcc.gnu.org> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding > > vector > > > > >>>> constructor > > > > >>>> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov > > > > > > >>>> wrote: > > > > >>>>> > > > > >>>>>> -----Original Message----- > > > > >>>>>> From: Prathamesh Kulkarni > > > > >>>>>> Sent: 14 June 2021 09:02 > > > > >>>>>> To: Christophe Lyon > > > > >>>>>> Cc: gcc Patches ; Kyrylo Tkachov > > > > >>>>>> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding > > vector > > > > >>>>>> constructor > > > > >>>>>> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni > > > > >>>>>> wrote: > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon > > > > >>>> > > > > >>>>>> wrote: > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc- > > patches > > > > >>>>>>>> wrote: > > > > >>>>>>>>> Hi, > > > > >>>>>>>>> As mentioned in PR, for the following test-case: > > > > >>>>>>>>> > > > > >>>>>>>>> #include > > > > >>>>>>>>> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a) > > > > >>>>>>>>> { > > > > >>>>>>>>> return vdup_n_bf16 (a); > > > > >>>>>>>>> } > > > > >>>>>>>>> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a) > > > > >>>>>>>>> { > > > > >>>>>>>>> return (bfloat16x4_t) {a, a, a, a}; > > > > >>>>>>>>> } > > > > >>>>>>>>> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat- > > > > >>>> abi=softfp > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being > > vectorized: > > > > >>>>>>>>> > > > > >>>>>>>>> f1: > > > > >>>>>>>>> vdup.16 d16, r0 > > > > >>>>>>>>> vmov r0, r1, d16 @ v4bf > > > > >>>>>>>>> bx lr > > > > >>>>>>>>> > > > > >>>>>>>>> f2: > > > > >>>>>>>>> mov r3, r0 @ __bf16 > > > > >>>>>>>>> adr r1, .L4 > > > > >>>>>>>>> ldrd r0, [r1] > > > > >>>>>>>>> mov r2, r3 @ __bf16 > > > > >>>>>>>>> mov ip, r3 @ __bf16 > > > > >>>>>>>>> bfi r1, r2, #0, #16 > > > > >>>>>>>>> bfi r0, ip, #0, #16 > > > > >>>>>>>>> bfi r1, r3, #16, #16 > > > > >>>>>>>>> bfi r0, r2, #16, #16 > > > > >>>>>>>>> bx lr > > > > >>>>>>>>> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md > > has VDQ > > > > >>>>>> mode > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I > > changed > > > > >>>>>>>>> mode > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the > > compiler > > > > >>>> now > > > > >>>>>> generates: > > > > >>>>>>>>> f2: > > > > >>>>>>>>> vdup.16 d16, r0 > > > > >>>>>>>>> vmov r0, r1, d16 @ v4bf > > > > >>>>>>>>> bx lr > > > > >>>>>>>>> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE > > and I am > > > > >>>>>> not > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since > > MVE > > > > >>>> has > > > > >>>>>>>>> only 128-bit vectors ? > > > > >>>>>>>>> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be > > moved to > > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in > > > > >>>> neon.md. > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both > > NEON and > > > > >>>> MVE, > > > > >>>>>>> I am not sure if we should separate the pattern ? > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for > > MVE as > > > > >>>>>>> in attached patch so > > > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ? > > > > >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to > > me > > > > >>>> either. > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021- > > June/572342.html > > > > >>>>>> (attaching patch as text). > > > > >>>>>> > > > > >>>>> --- a/gcc/config/arm/neon.md > > > > >>>>> +++ b/gcc/config/arm/neon.md > > > > >>>>> @@ -459,10 +459,12 @@ > > > > >>>>> ) > > > > >>>>> > > > > >>>>> (define_expand "vec_init" > > > > >>>>> - [(match_operand:VDQ 0 "s_register_operand") > > > > >>>>> + [(match_operand:VDQX 0 "s_register_operand") > > > > >>>>> (match_operand 1 "" "")] > > > > >>>>> "TARGET_NEON || TARGET_HAVE_MVE" > > > > >>>>> { > > > > >>>>> + if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE > > > > >>>> (operands[0])) != 16) > > > > >>>>> + FAIL; > > > > >>>>> neon_expand_vector_init (operands[0], operands[1]); > > > > >>>>> DONE; > > > > >>>>> }) > > > > >>>>> > > > > >>>>> I think we should move this to vec-common.md like Christophe > > said. > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just > > disable it in > > > > >>>> the expander condition? > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< > > > > >>>> VDQ>mode) != 16)" > > > > >>>> Is it OK to use mode ? Because using mode resulted > > in lot > > > > >>>> of build errors. > > > > >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE > > > > >>>> (mode) == 16 since > > > > >>>> we want to make the pattern pass if target is MVE and vector size is > > 16 bytes ? > > > > >>>> Do these changes in attached patch look OK ? > > > > >>> Yes, you're right. > > > > >> > > > > >> Can't this be ARM_HAVE__ARITH like in most expanders in > > vec-common.md? > > > > >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed) > > > > > I wonder if this should be ARM_HAVE__LDST instead since > > we're > > > > > initializing the vector ? > > > > > > > > > > > > Well, it really depends on which modes you want to enable. > > > > > > > > > > > > Looks like your move VDQ -> VDQ adds V4BF, V8BF and DI. > > > > > > > > Are they all OK for Neon? > > > > > > > > They are not OK for MVE. > > > > > > > > Ideally you could add testcases to cover to the supported and > > > > unsupported modes for both Neon and MVE.\ > > > > > > > > Before your patch, the expander is enabled for MVE for 64 bit modes > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash > > > > or is there something else preventing the match? > > > Hi, > > > Apparently there is VALID_MVE_MODE macro, so is it better to use: > > > TARGET_NEON || (TARGET_HAVE_MVE && > > VALID_MVE_MODE(mode)) > > > as in the attached patch ? > > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality. Hi Kyrill, As mentioned in the first email, the patch improves code-gen for following test-case: bfloat16x4_t f (bfloat16_t a) { return (bfloat16x4_t) {a, a, a, a}; } Before patch: f: mov r3, r0 @ __bf16 adr r1, .L4 ldrd r0, [r1] mov r2, r3 @ __bf16 mov ip, r3 @ __bf16 bfi r1, r2, #0, #16 bfi r0, ip, #0, #16 bfi r1, r3, #16, #16 bfi r0, r2, #16, #16 bx lr After patch: f: vdup.16 d16, r0 vmov r0, r1, d16 @ v4bf bx lr because the patch changes mode from VDQ to VDQX to accommodate bf modes. I have included the test in the attached patch. I think Christophe's concerns were mainly about the right modes getting enabled for MVE. Unfortunately, I am not sure how to test for that because the FE catches invalid modes, and we don't end up hitting the pattern. Thanks, Prathamesh > Thanks, > Kyrill > > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html > > > > Thanks, > > Prathamesh > > > > > > Thanks, > > > Prathamesh > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Christophe > > > > > > > > > > > > > Thanks, > > > > > Prathamesh > > > > >> > > > > >> Christophe > > > > >> > > > > >> > > > > >>> Ok. > > > > >>> Thanks, > > > > >>> Kyrill > > > > >>> > > > > >>> > > > > >>>> Thanks, > > > > >>>> Prathamesh > > > > >>>>> Thanks, > > > > >>>>> Kyrill > > > > >>>>> > > > > >>>>>> Thanks, > > > > >>>>>> Prathamesh > > > > >>>>>>> Thanks, > > > > >>>>>>> Prathamesh > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in > > > > >>>>>>>> vec-common.md, most of which are gated on > > > > >>>>>>>> ARM_HAVE__ARITH > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-) > > > > >>>>>>>> > > > > >>>>>>>> Christophe > > > > >>>>>>>> > > > > >>>>>>>>> Thanks, > > > > >>>>>>>>> Prathamesh --0000000000001bad8305c6710244 Content-Type: text/plain; charset="US-ASCII"; name="pr98435-5.txt" Content-Disposition: attachment; filename="pr98435-5.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_kqrufmn50 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYXJtL25lb24ubWQgYi9nY2MvY29uZmlnL2FybS9uZW9u Lm1kCmluZGV4IDZhNjU3MzMxN2NmLi4wYzk4YjNhOGYyMyAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZp Zy9hcm0vbmVvbi5tZAorKysgYi9nY2MvY29uZmlnL2FybS9uZW9uLm1kCkBAIC00NTgsMTUgKzQ1 OCw2IEBACiAgIFsoc2V0X2F0dHIgInR5cGUiICJuZW9uX3N0b3JlMV9vbmVfbGFuZV9xLG5lb25f dG9fZ3BfcSIpXQogKQogCi0oZGVmaW5lX2V4cGFuZCAidmVjX2luaXQ8bW9kZT48Vl9lbGVtX2w+ IgotICBbKG1hdGNoX29wZXJhbmQ6VkRRIDAgInNfcmVnaXN0ZXJfb3BlcmFuZCIpCi0gICAobWF0 Y2hfb3BlcmFuZCAxICIiICIiKV0KLSAgIlRBUkdFVF9ORU9OIHx8IFRBUkdFVF9IQVZFX01WRSIK LXsKLSAgbmVvbl9leHBhbmRfdmVjdG9yX2luaXQgKG9wZXJhbmRzWzBdLCBvcGVyYW5kc1sxXSk7 Ci0gIERPTkU7Ci19KQotCiA7OyBEb3VibGV3b3JkIGFuZCBxdWFkd29yZCBhcml0aG1ldGljLgog CiA7OyBOT1RFOiBzb21lIG90aGVyIGluc3RydWN0aW9ucyBhbHNvIHN1cHBvcnQgNjQtYml0IGlu dGVnZXIKZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYXJtL3ZlYy1jb21tb24ubWQgYi9nY2MvY29u ZmlnL2FybS92ZWMtY29tbW9uLm1kCmluZGV4IDhlMzUxNTFkYTQ2Li43ODU4YmU5ZjI4ZSAxMDA2 NDQKLS0tIGEvZ2NjL2NvbmZpZy9hcm0vdmVjLWNvbW1vbi5tZAorKysgYi9nY2MvY29uZmlnL2Fy bS92ZWMtY29tbW9uLm1kCkBAIC01NjUsMyArNTY1LDEyIEBACiAKICAgRE9ORTsKIH0pCisKKyhk ZWZpbmVfZXhwYW5kICJ2ZWNfaW5pdDxtb2RlPjxWX2VsZW1fbD4iCisgIFsobWF0Y2hfb3BlcmFu ZDpWRFFYIDAgInNfcmVnaXN0ZXJfb3BlcmFuZCIpCisgICAobWF0Y2hfb3BlcmFuZCAxICIiICIi KV0KKyAgIlRBUkdFVF9ORU9OIHx8IChUQVJHRVRfSEFWRV9NVkUgJiYgVkFMSURfTVZFX01PREUg KDxNT0RFPm1vZGUpKSIgCit7CisgIG5lb25fZXhwYW5kX3ZlY3Rvcl9pbml0IChvcGVyYW5kc1sw XSwgb3BlcmFuZHNbMV0pOworICBET05FOworfSkKZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUv Z2NjLnRhcmdldC9hcm0vc2ltZC9wcjk4NDM1LmMgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQv YXJtL3NpbWQvcHI5ODQzNS5jCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAw Li4wYWY4NjMzZmQ1NgotLS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdl dC9hcm0vc2ltZC9wcjk4NDM1LmMKQEAgLTAsMCArMSwxNSBAQAorLyogeyBkZy1kbyBjb21waWxl IH0gKi8KKy8qIHsgZGctb3B0aW9ucyAiLU8yIC1mZmFzdC1tYXRoIiB9ICovCisvKiB7IGRnLXJl cXVpcmUtZWZmZWN0aXZlLXRhcmdldCBhcm1fdjhfMmFfYmYxNl9uZW9uX29rIH0gKi8KKy8qIHsg ZGctYWRkLW9wdGlvbnMgYXJtX3Y4XzJhX2JmMTZfbmVvbiB9ICovCisvKiB7IGRnLWFkZGl0aW9u YWwtb3B0aW9ucyAiLW1mbG9hdC1hYmk9c29mdGZwIC1tYXJjaD1hcm12OC4yLWErYmYxNitmcDE2 IiB9ICovCisKKyNpbmNsdWRlIDxhcm1fbmVvbi5oPgorCitiZmxvYXQxNng0X3QgZiAoYmZsb2F0 MTZfdCBhKQoreworICByZXR1cm4gKGJmbG9hdDE2eDRfdCkge2EsIGEsIGEsIGF9OworfQorCisv KiB7IGRnLWZpbmFsIHsgc2Nhbi1hc3NlbWJsZXIge1x0dmR1cC4xNlx0ZFswLTldKywgcjB9IH0g fSAqLworLyogeyBkZy1maW5hbCB7IHNjYW4tYXNzZW1ibGVyIHtcdHZtb3ZcdHIwLCByMSwgZFsw LTldK30gfSB9ICovCg== --0000000000001bad8305c6710244--