From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx07-00178001.pphosted.com (mx08-00178001.pphosted.com [91.207.212.93]) by sourceware.org (Postfix) with ESMTPS id 40257383541F for ; Wed, 30 Jun 2021 15:21:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 40257383541F Received: from pps.filterd (m0046661.ppops.net [127.0.0.1]) by mx07-00178001.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15UFCmIf007350; Wed, 30 Jun 2021 17:21:46 +0200 Received: from beta.dmz-eu.st.com (beta.dmz-eu.st.com [164.129.1.35]) by mx07-00178001.pphosted.com with ESMTP id 39gnbpk2jy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 30 Jun 2021 17:21:46 +0200 Received: from euls16034.sgp.st.com (euls16034.sgp.st.com [10.75.44.20]) by beta.dmz-eu.st.com (STMicroelectronics) with ESMTP id BCCC410002A; Wed, 30 Jun 2021 17:21:45 +0200 (CEST) Received: from Webmail-eu.st.com (sfhdag2node3.st.com [10.75.127.6]) by euls16034.sgp.st.com (STMicroelectronics) with ESMTP id 8EC0C241865; Wed, 30 Jun 2021 17:21:45 +0200 (CEST) Received: from [10.211.4.206] (10.75.127.48) by SFHDAG2NODE3.st.com (10.75.127.6) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Wed, 30 Jun 2021 17:21:44 +0200 Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor To: Prathamesh Kulkarni CC: Kyrylo Tkachov , gcc Patches References: <9fd8bc30-f7d1-0171-4147-d570413f7a62@foss.st.com> From: Christophe LYON Message-ID: <1d306b96-daa5-3a47-5e3e-d07ddd56dcf4@foss.st.com> Date: Wed, 30 Jun 2021 17:21:44 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Originating-IP: [10.75.127.48] X-ClientProxiedBy: SFHDAG2NODE1.st.com (10.75.127.4) To SFHDAG2NODE3.st.com (10.75.127.6) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-30_06:2021-06-30, 2021-06-30 signatures=0 X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Jun 2021 15:21:52 -0000 On 29/06/2021 12:46, Prathamesh Kulkarni wrote: > On Mon, 28 Jun 2021 at 14:48, Christophe LYON > wrote: >> >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote: >>>> -----Original Message----- >>>> From: Prathamesh Kulkarni >>>> Sent: 28 June 2021 09:38 >>>> To: Kyrylo Tkachov >>>> Cc: Christophe Lyon ; gcc Patches >>> patches@gcc.gnu.org> >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector >>>> constructor >>>> >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov >>>> wrote: >>>>> >>>>>> -----Original Message----- >>>>>> From: Prathamesh Kulkarni >>>>>> Sent: 14 June 2021 09:02 >>>>>> To: Christophe Lyon >>>>>> Cc: gcc Patches ; Kyrylo Tkachov >>>>>> >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector >>>>>> constructor >>>>>> >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni >>>>>> wrote: >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon >>>> >>>>>> wrote: >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches >>>>>>>> wrote: >>>>>>>>> Hi, >>>>>>>>> As mentioned in PR, for the following test-case: >>>>>>>>> >>>>>>>>> #include >>>>>>>>> >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a) >>>>>>>>> { >>>>>>>>> return vdup_n_bf16 (a); >>>>>>>>> } >>>>>>>>> >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a) >>>>>>>>> { >>>>>>>>> return (bfloat16x4_t) {a, a, a, a}; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat- >>>> abi=softfp >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized: >>>>>>>>> >>>>>>>>> f1: >>>>>>>>> vdup.16 d16, r0 >>>>>>>>> vmov r0, r1, d16 @ v4bf >>>>>>>>> bx lr >>>>>>>>> >>>>>>>>> f2: >>>>>>>>> mov r3, r0 @ __bf16 >>>>>>>>> adr r1, .L4 >>>>>>>>> ldrd r0, [r1] >>>>>>>>> mov r2, r3 @ __bf16 >>>>>>>>> mov ip, r3 @ __bf16 >>>>>>>>> bfi r1, r2, #0, #16 >>>>>>>>> bfi r0, ip, #0, #16 >>>>>>>>> bfi r1, r3, #16, #16 >>>>>>>>> bfi r0, r2, #16, #16 >>>>>>>>> bx lr >>>>>>>>> >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ >>>>>> mode >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed >>>>>>>>> mode >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler >>>> now >>>>>> generates: >>>>>>>>> f2: >>>>>>>>> vdup.16 d16, r0 >>>>>>>>> vmov r0, r1, d16 @ v4bf >>>>>>>>> bx lr >>>>>>>>> >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am >>>>>> not >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE >>>> has >>>>>>>>> only 128-bit vectors ? >>>>>>>>> >>>>>>>> I think patterns common to both Neon and MVE should be moved to >>>>>>>> vec-common.md, I don't know why such patterns were left in >>>> neon.md. >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and >>>> MVE, >>>>>>> I am not sure if we should separate the pattern ? >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as >>>>>>> in attached patch so >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ? >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me >>>> either. >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html >>>>>> (attaching patch as text). >>>>>> >>>>> --- a/gcc/config/arm/neon.md >>>>> +++ b/gcc/config/arm/neon.md >>>>> @@ -459,10 +459,12 @@ >>>>> ) >>>>> >>>>> (define_expand "vec_init" >>>>> - [(match_operand:VDQ 0 "s_register_operand") >>>>> + [(match_operand:VDQX 0 "s_register_operand") >>>>> (match_operand 1 "" "")] >>>>> "TARGET_NEON || TARGET_HAVE_MVE" >>>>> { >>>>> + if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE >>>> (operands[0])) != 16) >>>>> + FAIL; >>>>> neon_expand_vector_init (operands[0], operands[1]); >>>>> DONE; >>>>> }) >>>>> >>>>> I think we should move this to vec-common.md like Christophe said. >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in >>>> the expander condition? >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< >>>> VDQ>mode) != 16)" >>>> Is it OK to use mode ? Because using mode resulted in lot >>>> of build errors. >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE >>>> (mode) == 16 since >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ? >>>> Do these changes in attached patch look OK ? >>> Yes, you're right. >> >> Can't this be ARM_HAVE__ARITH like in most expanders in vec-common.md? >> >> (maybe with a && !TARGET_REALLY_IWMMXT if needed) > I wonder if this should be ARM_HAVE__LDST instead since we're > initializing the vector ? Well, it really depends on which modes you want to enable. Looks like your move VDQ -> VDQ adds V4BF, V8BF and DI. Are they all OK for Neon? They are not OK for MVE. Ideally you could add testcases to cover to the supported and unsupported modes for both Neon and MVE.\ Before your patch, the expander is enabled for MVE for 64 bit modes (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash or is there something else preventing the match? Thanks, Christophe > Thanks, > Prathamesh >> >> Christophe >> >> >>> Ok. >>> Thanks, >>> Kyrill >>> >>> >>>> Thanks, >>>> Prathamesh >>>>> Thanks, >>>>> Kyrill >>>>> >>>>>> Thanks, >>>>>> Prathamesh >>>>>>> Thanks, >>>>>>> Prathamesh >>>>>>>> That being said, I suggest you look at other similar patterns in >>>>>>>> vec-common.md, most of which are gated on >>>>>>>> ARM_HAVE__ARITH >>>>>>>> and possibly beware of issues with iwmmxt :-) >>>>>>>> >>>>>>>> Christophe >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Prathamesh