From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [IPv6:2a00:1450:4864:20::531]) by sourceware.org (Postfix) with ESMTPS id 55674384F02A for ; Thu, 1 Jul 2021 10:57:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 55674384F02A Received: by mail-ed1-x531.google.com with SMTP id j11so7782897edq.6 for ; Thu, 01 Jul 2021 03:57:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=0ar1TLSDxd0/wox1KJt14mVYb8RTTIOipOtoWqIorfo=; b=RwaoaGKsfRQQig+4FGKHX+BG/U2HBzDBBKP86GGC+uAXBsWSaLZdzNNh1ZZAMeTWLq 123OIcM24rcXg9GKsg9J2ULxsQUXfgqNjo3nwDGZJl5N4RUCjoaCBDIjVf+tCJOKIWIr 8gimdPp3hT6BzN18ygfb2QdxOATOFeLb1n2LUj40g/Aeqfi2Ah1b6pJwxlj47phqiilb YusimGvFmHpZCCf6ml/itI93WQ4zPWSbQsGGhUlTZZ162ZeaOYjBws1RToc18evaZW1t Wanx6HF4SUgK7D3rIRygC5RsIjgjF5X4a8cg3fNBTWQVlKSfTE8oS15TYvwFeFmjM2R6 761g== X-Gm-Message-State: AOAM530rKvm7N/95S5oq6wWwJxwMQwKGx6CoCDi1KXr09CEHzJEyKT8t RbtQDwmyNOTAQMR6+c6fzS0FR1pSEohVxcxoWGNF3A== X-Google-Smtp-Source: ABdhPJxGSbMHlPua8kq0oQlBmiysPUay92mq3TZ4Br8/fVS07BTj+35g/mj88gRJ5GJoDB5Ilu6mZRZZCD1qiOhxZVY= X-Received: by 2002:aa7:cd85:: with SMTP id x5mr23628091edv.115.1625137054337; Thu, 01 Jul 2021 03:57:34 -0700 (PDT) MIME-Version: 1.0 References: <9fd8bc30-f7d1-0171-4147-d570413f7a62@foss.st.com> <1d306b96-daa5-3a47-5e3e-d07ddd56dcf4@foss.st.com> In-Reply-To: <1d306b96-daa5-3a47-5e3e-d07ddd56dcf4@foss.st.com> From: Prathamesh Kulkarni Date: Thu, 1 Jul 2021 16:26:57 +0530 Message-ID: Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor To: Christophe LYON Cc: Kyrylo Tkachov , gcc Patches Content-Type: multipart/mixed; boundary="000000000000d604c405c60db499" X-Spam-Status: No, score=-8.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jul 2021 10:57:37 -0000 --000000000000d604c405c60db499 Content-Type: text/plain; charset="UTF-8" On Wed, 30 Jun 2021 at 20:51, Christophe LYON wrote: > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote: > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON > > wrote: > >> > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote: > >>>> -----Original Message----- > >>>> From: Prathamesh Kulkarni > >>>> Sent: 28 June 2021 09:38 > >>>> To: Kyrylo Tkachov > >>>> Cc: Christophe Lyon ; gcc Patches >>>> patches@gcc.gnu.org> > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > >>>> constructor > >>>> > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov > >>>> wrote: > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: Prathamesh Kulkarni > >>>>>> Sent: 14 June 2021 09:02 > >>>>>> To: Christophe Lyon > >>>>>> Cc: gcc Patches ; Kyrylo Tkachov > >>>>>> > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > >>>>>> constructor > >>>>>> > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni > >>>>>> wrote: > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon > >>>> > >>>>>> wrote: > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches > >>>>>>>> wrote: > >>>>>>>>> Hi, > >>>>>>>>> As mentioned in PR, for the following test-case: > >>>>>>>>> > >>>>>>>>> #include > >>>>>>>>> > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a) > >>>>>>>>> { > >>>>>>>>> return vdup_n_bf16 (a); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a) > >>>>>>>>> { > >>>>>>>>> return (bfloat16x4_t) {a, a, a, a}; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat- > >>>> abi=softfp > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized: > >>>>>>>>> > >>>>>>>>> f1: > >>>>>>>>> vdup.16 d16, r0 > >>>>>>>>> vmov r0, r1, d16 @ v4bf > >>>>>>>>> bx lr > >>>>>>>>> > >>>>>>>>> f2: > >>>>>>>>> mov r3, r0 @ __bf16 > >>>>>>>>> adr r1, .L4 > >>>>>>>>> ldrd r0, [r1] > >>>>>>>>> mov r2, r3 @ __bf16 > >>>>>>>>> mov ip, r3 @ __bf16 > >>>>>>>>> bfi r1, r2, #0, #16 > >>>>>>>>> bfi r0, ip, #0, #16 > >>>>>>>>> bfi r1, r3, #16, #16 > >>>>>>>>> bfi r0, r2, #16, #16 > >>>>>>>>> bx lr > >>>>>>>>> > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ > >>>>>> mode > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed > >>>>>>>>> mode > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler > >>>> now > >>>>>> generates: > >>>>>>>>> f2: > >>>>>>>>> vdup.16 d16, r0 > >>>>>>>>> vmov r0, r1, d16 @ v4bf > >>>>>>>>> bx lr > >>>>>>>>> > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am > >>>>>> not > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE > >>>> has > >>>>>>>>> only 128-bit vectors ? > >>>>>>>>> > >>>>>>>> I think patterns common to both Neon and MVE should be moved to > >>>>>>>> vec-common.md, I don't know why such patterns were left in > >>>> neon.md. > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and > >>>> MVE, > >>>>>>> I am not sure if we should separate the pattern ? > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as > >>>>>>> in attached patch so > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ? > >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me > >>>> either. > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html > >>>>>> (attaching patch as text). > >>>>>> > >>>>> --- a/gcc/config/arm/neon.md > >>>>> +++ b/gcc/config/arm/neon.md > >>>>> @@ -459,10 +459,12 @@ > >>>>> ) > >>>>> > >>>>> (define_expand "vec_init" > >>>>> - [(match_operand:VDQ 0 "s_register_operand") > >>>>> + [(match_operand:VDQX 0 "s_register_operand") > >>>>> (match_operand 1 "" "")] > >>>>> "TARGET_NEON || TARGET_HAVE_MVE" > >>>>> { > >>>>> + if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE > >>>> (operands[0])) != 16) > >>>>> + FAIL; > >>>>> neon_expand_vector_init (operands[0], operands[1]); > >>>>> DONE; > >>>>> }) > >>>>> > >>>>> I think we should move this to vec-common.md like Christophe said. > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in > >>>> the expander condition? > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< > >>>> VDQ>mode) != 16)" > >>>> Is it OK to use mode ? Because using mode resulted in lot > >>>> of build errors. > >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE > >>>> (mode) == 16 since > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ? > >>>> Do these changes in attached patch look OK ? > >>> Yes, you're right. > >> > >> Can't this be ARM_HAVE__ARITH like in most expanders in vec-common.md? > >> > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed) > > I wonder if this should be ARM_HAVE__LDST instead since we're > > initializing the vector ? > > > Well, it really depends on which modes you want to enable. > > > Looks like your move VDQ -> VDQ adds V4BF, V8BF and DI. > > Are they all OK for Neon? > > They are not OK for MVE. > > Ideally you could add testcases to cover to the supported and > unsupported modes for both Neon and MVE.\ > > Before your patch, the expander is enabled for MVE for 64 bit modes > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash > or is there something else preventing the match? Hi, Apparently there is VALID_MVE_MODE macro, so is it better to use: TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(mode)) as in the attached patch ? Thanks, Prathamesh > > > Thanks, > > > Christophe > > > > Thanks, > > Prathamesh > >> > >> Christophe > >> > >> > >>> Ok. > >>> Thanks, > >>> Kyrill > >>> > >>> > >>>> Thanks, > >>>> Prathamesh > >>>>> Thanks, > >>>>> Kyrill > >>>>> > >>>>>> Thanks, > >>>>>> Prathamesh > >>>>>>> Thanks, > >>>>>>> Prathamesh > >>>>>>>> That being said, I suggest you look at other similar patterns in > >>>>>>>> vec-common.md, most of which are gated on > >>>>>>>> ARM_HAVE__ARITH > >>>>>>>> and possibly beware of issues with iwmmxt :-) > >>>>>>>> > >>>>>>>> Christophe > >>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Prathamesh --000000000000d604c405c60db499 Content-Type: text/plain; charset="US-ASCII"; name="pr98435-4.txt" Content-Disposition: attachment; filename="pr98435-4.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_kqkskcxh0 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYXJtL25lb24ubWQgYi9nY2MvY29uZmlnL2FybS9uZW9u Lm1kCmluZGV4IDZhNjU3MzMxN2NmLi4wYzk4YjNhOGYyMyAxMDA2NDQKLS0tIGEvZ2NjL2NvbmZp Zy9hcm0vbmVvbi5tZAorKysgYi9nY2MvY29uZmlnL2FybS9uZW9uLm1kCkBAIC00NTgsMTUgKzQ1 OCw2IEBACiAgIFsoc2V0X2F0dHIgInR5cGUiICJuZW9uX3N0b3JlMV9vbmVfbGFuZV9xLG5lb25f dG9fZ3BfcSIpXQogKQogCi0oZGVmaW5lX2V4cGFuZCAidmVjX2luaXQ8bW9kZT48Vl9lbGVtX2w+ IgotICBbKG1hdGNoX29wZXJhbmQ6VkRRIDAgInNfcmVnaXN0ZXJfb3BlcmFuZCIpCi0gICAobWF0 Y2hfb3BlcmFuZCAxICIiICIiKV0KLSAgIlRBUkdFVF9ORU9OIHx8IFRBUkdFVF9IQVZFX01WRSIK LXsKLSAgbmVvbl9leHBhbmRfdmVjdG9yX2luaXQgKG9wZXJhbmRzWzBdLCBvcGVyYW5kc1sxXSk7 Ci0gIERPTkU7Ci19KQotCiA7OyBEb3VibGV3b3JkIGFuZCBxdWFkd29yZCBhcml0aG1ldGljLgog CiA7OyBOT1RFOiBzb21lIG90aGVyIGluc3RydWN0aW9ucyBhbHNvIHN1cHBvcnQgNjQtYml0IGlu dGVnZXIKZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYXJtL3ZlYy1jb21tb24ubWQgYi9nY2MvY29u ZmlnL2FybS92ZWMtY29tbW9uLm1kCmluZGV4IDhlMzUxNTFkYTQ2Li43ODU4YmU5ZjI4ZSAxMDA2 NDQKLS0tIGEvZ2NjL2NvbmZpZy9hcm0vdmVjLWNvbW1vbi5tZAorKysgYi9nY2MvY29uZmlnL2Fy bS92ZWMtY29tbW9uLm1kCkBAIC01NjUsMyArNTY1LDEyIEBACiAKICAgRE9ORTsKIH0pCisKKyhk ZWZpbmVfZXhwYW5kICJ2ZWNfaW5pdDxtb2RlPjxWX2VsZW1fbD4iCisgIFsobWF0Y2hfb3BlcmFu ZDpWRFFYIDAgInNfcmVnaXN0ZXJfb3BlcmFuZCIpCisgICAobWF0Y2hfb3BlcmFuZCAxICIiICIi KV0KKyAgIlRBUkdFVF9ORU9OIHx8IChUQVJHRVRfSEFWRV9NVkUgJiYgVkFMSURfTVZFX01PREUg KDxNT0RFPm1vZGUpKSIgCit7CisgIG5lb25fZXhwYW5kX3ZlY3Rvcl9pbml0IChvcGVyYW5kc1sw XSwgb3BlcmFuZHNbMV0pOworICBET05FOworfSkK --000000000000d604c405c60db499--