From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by sourceware.org (Postfix) with ESMTPS id 32036385503C for ; Tue, 6 Jul 2021 07:06:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 32036385503C Received: by mail-ej1-x62c.google.com with SMTP id b2so32540122ejg.8 for ; Tue, 06 Jul 2021 00:06:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DxtJqbtyGy4Kv8tXpfN0bhNT+RAKhDzMtFxvqrCkUy0=; b=LdmxBnVKlmP9rf8KNBJdYxU+c5yPVYnIVqafn2AwqLwJaHyTG4DJ+HnGUweIThL37/ AKp0hj0GndGNDxly0nZUJpKqUMv8uKe+98/D5SQzZsESBq46MjnX4XXoCMO7mt7Mg2cr q1BLDJXUHravZswSdNO3GxmmZ5rcMeuaLzWVRgDClFJyJMEbtQByfGyq2y3HBrnY1Kw/ bivfv7G3vx9CrhUsPjS0+plRLrUwGEPQORQTiqD1SdKchBvZBLzDyBODW6vtSToLElL1 y3MPY8o/Ght3CPPObjKRdJ9vnSgBgPut/JP4a2zi7u1JrEucw28vEc5pWagYdQM9wT7f zPpw== X-Gm-Message-State: AOAM5306nqRPpEo0/tJAxI+kn4gDeXD6X+fH4MAmDxFyOqHxywJQgSd2 d+YevkrVfV6jFDK4oKp0QtFAxbcAZbGZ9kf5QaaDeg== X-Google-Smtp-Source: ABdhPJy0dlkjEJ4wzOFWEIwJO5pim6l49kI/l/hUEULLY7O8JtDhd6ngzjLlMUPoV9xTewwjP2eQvTcUrRdaSWlVO2A= X-Received: by 2002:a17:906:1344:: with SMTP id x4mr17225027ejb.44.1625555187310; Tue, 06 Jul 2021 00:06:27 -0700 (PDT) MIME-Version: 1.0 References: <9fd8bc30-f7d1-0171-4147-d570413f7a62@foss.st.com> <1d306b96-daa5-3a47-5e3e-d07ddd56dcf4@foss.st.com> In-Reply-To: From: Prathamesh Kulkarni Date: Tue, 6 Jul 2021 12:35:49 +0530 Message-ID: Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor To: Christophe LYON Cc: Kyrylo Tkachov , gcc Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Jul 2021 07:06:30 -0000 On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni wrote: > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON > wrote: > > > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote: > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON > > > wrote: > > >> > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote: > > >>>> -----Original Message----- > > >>>> From: Prathamesh Kulkarni > > >>>> Sent: 28 June 2021 09:38 > > >>>> To: Kyrylo Tkachov > > >>>> Cc: Christophe Lyon ; gcc Patches > >>>> patches@gcc.gnu.org> > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > > >>>> constructor > > >>>> > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov > > >>>> wrote: > > >>>>> > > >>>>>> -----Original Message----- > > >>>>>> From: Prathamesh Kulkarni > > >>>>>> Sent: 14 June 2021 09:02 > > >>>>>> To: Christophe Lyon > > >>>>>> Cc: gcc Patches ; Kyrylo Tkachov > > >>>>>> > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > > >>>>>> constructor > > >>>>>> > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni > > >>>>>> wrote: > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon > > >>>> > > >>>>>> wrote: > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches > > >>>>>>>> wrote: > > >>>>>>>>> Hi, > > >>>>>>>>> As mentioned in PR, for the following test-case: > > >>>>>>>>> > > >>>>>>>>> #include > > >>>>>>>>> > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a) > > >>>>>>>>> { > > >>>>>>>>> return vdup_n_bf16 (a); > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a) > > >>>>>>>>> { > > >>>>>>>>> return (bfloat16x4_t) {a, a, a, a}; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat- > > >>>> abi=softfp > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized: > > >>>>>>>>> > > >>>>>>>>> f1: > > >>>>>>>>> vdup.16 d16, r0 > > >>>>>>>>> vmov r0, r1, d16 @ v4bf > > >>>>>>>>> bx lr > > >>>>>>>>> > > >>>>>>>>> f2: > > >>>>>>>>> mov r3, r0 @ __bf16 > > >>>>>>>>> adr r1, .L4 > > >>>>>>>>> ldrd r0, [r1] > > >>>>>>>>> mov r2, r3 @ __bf16 > > >>>>>>>>> mov ip, r3 @ __bf16 > > >>>>>>>>> bfi r1, r2, #0, #16 > > >>>>>>>>> bfi r0, ip, #0, #16 > > >>>>>>>>> bfi r1, r3, #16, #16 > > >>>>>>>>> bfi r0, r2, #16, #16 > > >>>>>>>>> bx lr > > >>>>>>>>> > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ > > >>>>>> mode > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed > > >>>>>>>>> mode > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler > > >>>> now > > >>>>>> generates: > > >>>>>>>>> f2: > > >>>>>>>>> vdup.16 d16, r0 > > >>>>>>>>> vmov r0, r1, d16 @ v4bf > > >>>>>>>>> bx lr > > >>>>>>>>> > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am > > >>>>>> not > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE > > >>>> has > > >>>>>>>>> only 128-bit vectors ? > > >>>>>>>>> > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to > > >>>>>>>> vec-common.md, I don't know why such patterns were left in > > >>>> neon.md. > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and > > >>>> MVE, > > >>>>>>> I am not sure if we should separate the pattern ? > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as > > >>>>>>> in attached patch so > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ? > > >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me > > >>>> either. > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html > > >>>>>> (attaching patch as text). > > >>>>>> > > >>>>> --- a/gcc/config/arm/neon.md > > >>>>> +++ b/gcc/config/arm/neon.md > > >>>>> @@ -459,10 +459,12 @@ > > >>>>> ) > > >>>>> > > >>>>> (define_expand "vec_init" > > >>>>> - [(match_operand:VDQ 0 "s_register_operand") > > >>>>> + [(match_operand:VDQX 0 "s_register_operand") > > >>>>> (match_operand 1 "" "")] > > >>>>> "TARGET_NEON || TARGET_HAVE_MVE" > > >>>>> { > > >>>>> + if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE > > >>>> (operands[0])) != 16) > > >>>>> + FAIL; > > >>>>> neon_expand_vector_init (operands[0], operands[1]); > > >>>>> DONE; > > >>>>> }) > > >>>>> > > >>>>> I think we should move this to vec-common.md like Christophe said. > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in > > >>>> the expander condition? > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< > > >>>> VDQ>mode) != 16)" > > >>>> Is it OK to use mode ? Because using mode resulted in lot > > >>>> of build errors. > > >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE > > >>>> (mode) == 16 since > > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ? > > >>>> Do these changes in attached patch look OK ? > > >>> Yes, you're right. > > >> > > >> Can't this be ARM_HAVE__ARITH like in most expanders in vec-common.md? > > >> > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed) > > > I wonder if this should be ARM_HAVE__LDST instead since we're > > > initializing the vector ? > > > > > > Well, it really depends on which modes you want to enable. > > > > > > Looks like your move VDQ -> VDQ adds V4BF, V8BF and DI. > > > > Are they all OK for Neon? > > > > They are not OK for MVE. > > > > Ideally you could add testcases to cover to the supported and > > unsupported modes for both Neon and MVE.\ > > > > Before your patch, the expander is enabled for MVE for 64 bit modes > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash > > or is there something else preventing the match? > Hi, > Apparently there is VALID_MVE_MODE macro, so is it better to use: > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(mode)) > as in the attached patch ? ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html Thanks, Prathamesh > > Thanks, > Prathamesh > > > > > > Thanks, > > > > > > Christophe > > > > > > > Thanks, > > > Prathamesh > > >> > > >> Christophe > > >> > > >> > > >>> Ok. > > >>> Thanks, > > >>> Kyrill > > >>> > > >>> > > >>>> Thanks, > > >>>> Prathamesh > > >>>>> Thanks, > > >>>>> Kyrill > > >>>>> > > >>>>>> Thanks, > > >>>>>> Prathamesh > > >>>>>>> Thanks, > > >>>>>>> Prathamesh > > >>>>>>>> That being said, I suggest you look at other similar patterns in > > >>>>>>>> vec-common.md, most of which are gated on > > >>>>>>>> ARM_HAVE__ARITH > > >>>>>>>> and possibly beware of issues with iwmmxt :-) > > >>>>>>>> > > >>>>>>>> Christophe > > >>>>>>>> > > >>>>>>>>> Thanks, > > >>>>>>>>> Prathamesh