On Wed, 30 Jun 2021 at 20:51, Christophe LYON wrote: > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote: > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON > > wrote: > >> > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote: > >>>> -----Original Message----- > >>>> From: Prathamesh Kulkarni > >>>> Sent: 28 June 2021 09:38 > >>>> To: Kyrylo Tkachov > >>>> Cc: Christophe Lyon ; gcc Patches >>>> patches@gcc.gnu.org> > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > >>>> constructor > >>>> > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov > >>>> wrote: > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: Prathamesh Kulkarni > >>>>>> Sent: 14 June 2021 09:02 > >>>>>> To: Christophe Lyon > >>>>>> Cc: gcc Patches ; Kyrylo Tkachov > >>>>>> > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > >>>>>> constructor > >>>>>> > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni > >>>>>> wrote: > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon > >>>> > >>>>>> wrote: > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches > >>>>>>>> wrote: > >>>>>>>>> Hi, > >>>>>>>>> As mentioned in PR, for the following test-case: > >>>>>>>>> > >>>>>>>>> #include > >>>>>>>>> > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a) > >>>>>>>>> { > >>>>>>>>> return vdup_n_bf16 (a); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a) > >>>>>>>>> { > >>>>>>>>> return (bfloat16x4_t) {a, a, a, a}; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat- > >>>> abi=softfp > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized: > >>>>>>>>> > >>>>>>>>> f1: > >>>>>>>>> vdup.16 d16, r0 > >>>>>>>>> vmov r0, r1, d16 @ v4bf > >>>>>>>>> bx lr > >>>>>>>>> > >>>>>>>>> f2: > >>>>>>>>> mov r3, r0 @ __bf16 > >>>>>>>>> adr r1, .L4 > >>>>>>>>> ldrd r0, [r1] > >>>>>>>>> mov r2, r3 @ __bf16 > >>>>>>>>> mov ip, r3 @ __bf16 > >>>>>>>>> bfi r1, r2, #0, #16 > >>>>>>>>> bfi r0, ip, #0, #16 > >>>>>>>>> bfi r1, r3, #16, #16 > >>>>>>>>> bfi r0, r2, #16, #16 > >>>>>>>>> bx lr > >>>>>>>>> > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ > >>>>>> mode > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed > >>>>>>>>> mode > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler > >>>> now > >>>>>> generates: > >>>>>>>>> f2: > >>>>>>>>> vdup.16 d16, r0 > >>>>>>>>> vmov r0, r1, d16 @ v4bf > >>>>>>>>> bx lr > >>>>>>>>> > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am > >>>>>> not > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE > >>>> has > >>>>>>>>> only 128-bit vectors ? > >>>>>>>>> > >>>>>>>> I think patterns common to both Neon and MVE should be moved to > >>>>>>>> vec-common.md, I don't know why such patterns were left in > >>>> neon.md. > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and > >>>> MVE, > >>>>>>> I am not sure if we should separate the pattern ? > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as > >>>>>>> in attached patch so > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ? > >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me > >>>> either. > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html > >>>>>> (attaching patch as text). > >>>>>> > >>>>> --- a/gcc/config/arm/neon.md > >>>>> +++ b/gcc/config/arm/neon.md > >>>>> @@ -459,10 +459,12 @@ > >>>>> ) > >>>>> > >>>>> (define_expand "vec_init" > >>>>> - [(match_operand:VDQ 0 "s_register_operand") > >>>>> + [(match_operand:VDQX 0 "s_register_operand") > >>>>> (match_operand 1 "" "")] > >>>>> "TARGET_NEON || TARGET_HAVE_MVE" > >>>>> { > >>>>> + if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE > >>>> (operands[0])) != 16) > >>>>> + FAIL; > >>>>> neon_expand_vector_init (operands[0], operands[1]); > >>>>> DONE; > >>>>> }) > >>>>> > >>>>> I think we should move this to vec-common.md like Christophe said. > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in > >>>> the expander condition? > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< > >>>> VDQ>mode) != 16)" > >>>> Is it OK to use mode ? Because using mode resulted in lot > >>>> of build errors. > >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE > >>>> (mode) == 16 since > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ? > >>>> Do these changes in attached patch look OK ? > >>> Yes, you're right. > >> > >> Can't this be ARM_HAVE__ARITH like in most expanders in vec-common.md? > >> > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed) > > I wonder if this should be ARM_HAVE__LDST instead since we're > > initializing the vector ? > > > Well, it really depends on which modes you want to enable. > > > Looks like your move VDQ -> VDQ adds V4BF, V8BF and DI. > > Are they all OK for Neon? > > They are not OK for MVE. > > Ideally you could add testcases to cover to the supported and > unsupported modes for both Neon and MVE.\ > > Before your patch, the expander is enabled for MVE for 64 bit modes > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash > or is there something else preventing the match? Hi, Apparently there is VALID_MVE_MODE macro, so is it better to use: TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(mode)) as in the attached patch ? Thanks, Prathamesh > > > Thanks, > > > Christophe > > > > Thanks, > > Prathamesh > >> > >> Christophe > >> > >> > >>> Ok. > >>> Thanks, > >>> Kyrill > >>> > >>> > >>>> Thanks, > >>>> Prathamesh > >>>>> Thanks, > >>>>> Kyrill > >>>>> > >>>>>> Thanks, > >>>>>> Prathamesh > >>>>>>> Thanks, > >>>>>>> Prathamesh > >>>>>>>> That being said, I suggest you look at other similar patterns in > >>>>>>>> vec-common.md, most of which are gated on > >>>>>>>> ARM_HAVE__ARITH > >>>>>>>> and possibly beware of issues with iwmmxt :-) > >>>>>>>> > >>>>>>>> Christophe > >>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Prathamesh