From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx07-00178001.pphosted.com (mx08-00178001.pphosted.com [91.207.212.93]) by sourceware.org (Postfix) with ESMTPS id 4AF1A3951C26 for ; Mon, 28 Jun 2021 09:18:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4AF1A3951C26 Received: from pps.filterd (m0046660.ppops.net [127.0.0.1]) by mx07-00178001.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15S9IIkL007384; Mon, 28 Jun 2021 11:18:22 +0200 Received: from beta.dmz-eu.st.com (beta.dmz-eu.st.com [164.129.1.35]) by mx07-00178001.pphosted.com with ESMTP id 39f5hg2000-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Jun 2021 11:18:22 +0200 Received: from euls16034.sgp.st.com (euls16034.sgp.st.com [10.75.44.20]) by beta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 1A60410002A; Mon, 28 Jun 2021 11:18:22 +0200 (CEST) Received: from Webmail-eu.st.com (sfhdag2node3.st.com [10.75.127.6]) by euls16034.sgp.st.com (STMicroelectronics) with ESMTP id 0570321A209; Mon, 28 Jun 2021 11:18:22 +0200 (CEST) Received: from [10.211.5.9] (10.75.127.48) by SFHDAG2NODE3.st.com (10.75.127.6) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 28 Jun 2021 11:18:21 +0200 Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor To: Kyrylo Tkachov , Prathamesh Kulkarni CC: gcc Patches References: From: Christophe LYON Message-ID: <9fd8bc30-f7d1-0171-4147-d570413f7a62@foss.st.com> Date: Mon, 28 Jun 2021 11:17:56 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Originating-IP: [10.75.127.48] X-ClientProxiedBy: SFHDAG1NODE2.st.com (10.75.127.2) To SFHDAG2NODE3.st.com (10.75.127.6) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-28_07:2021-06-25, 2021-06-28 signatures=0 X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Jun 2021 09:18:27 -0000 On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote: > >> -----Original Message----- >> From: Prathamesh Kulkarni >> Sent: 28 June 2021 09:38 >> To: Kyrylo Tkachov >> Cc: Christophe Lyon ; gcc Patches > patches@gcc.gnu.org> >> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector >> constructor >> >> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov >> wrote: >>> >>> >>>> -----Original Message----- >>>> From: Prathamesh Kulkarni >>>> Sent: 14 June 2021 09:02 >>>> To: Christophe Lyon >>>> Cc: gcc Patches ; Kyrylo Tkachov >>>> >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector >>>> constructor >>>> >>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni >>>> wrote: >>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon >> >>>> wrote: >>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches >>>>>> wrote: >>>>>>> Hi, >>>>>>> As mentioned in PR, for the following test-case: >>>>>>> >>>>>>> #include >>>>>>> >>>>>>> bfloat16x4_t f1 (bfloat16_t a) >>>>>>> { >>>>>>> return vdup_n_bf16 (a); >>>>>>> } >>>>>>> >>>>>>> bfloat16x4_t f2 (bfloat16_t a) >>>>>>> { >>>>>>> return (bfloat16x4_t) {a, a, a, a}; >>>>>>> } >>>>>>> >>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat- >> abi=softfp >>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized: >>>>>>> >>>>>>> f1: >>>>>>> vdup.16 d16, r0 >>>>>>> vmov r0, r1, d16 @ v4bf >>>>>>> bx lr >>>>>>> >>>>>>> f2: >>>>>>> mov r3, r0 @ __bf16 >>>>>>> adr r1, .L4 >>>>>>> ldrd r0, [r1] >>>>>>> mov r2, r3 @ __bf16 >>>>>>> mov ip, r3 @ __bf16 >>>>>>> bfi r1, r2, #0, #16 >>>>>>> bfi r0, ip, #0, #16 >>>>>>> bfi r1, r3, #16, #16 >>>>>>> bfi r0, r2, #16, #16 >>>>>>> bx lr >>>>>>> >>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ >>>> mode >>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed >>>>>>> mode >>>>>>> to VDQX which seems to work for the test-case, and the compiler >> now >>>> generates: >>>>>>> f2: >>>>>>> vdup.16 d16, r0 >>>>>>> vmov r0, r1, d16 @ v4bf >>>>>>> bx lr >>>>>>> >>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am >>>> not >>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE >> has >>>>>>> only 128-bit vectors ? >>>>>>> >>>>>> I think patterns common to both Neon and MVE should be moved to >>>>>> vec-common.md, I don't know why such patterns were left in >> neon.md. >>>>> Since we end up calling neon_expand_vector_init for both NEON and >> MVE, >>>>> I am not sure if we should separate the pattern ? >>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as >>>>> in attached patch so >>>>> it will call neon_expand_vector_init only for 128-bit vectors ? >>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me >> either. >>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html >>>> (attaching patch as text). >>>> >>> --- a/gcc/config/arm/neon.md >>> +++ b/gcc/config/arm/neon.md >>> @@ -459,10 +459,12 @@ >>> ) >>> >>> (define_expand "vec_init" >>> - [(match_operand:VDQ 0 "s_register_operand") >>> + [(match_operand:VDQX 0 "s_register_operand") >>> (match_operand 1 "" "")] >>> "TARGET_NEON || TARGET_HAVE_MVE" >>> { >>> + if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE >> (operands[0])) != 16) >>> + FAIL; >>> neon_expand_vector_init (operands[0], operands[1]); >>> DONE; >>> }) >>> >>> I think we should move this to vec-common.md like Christophe said. >>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in >> the expander condition? >>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< >> VDQ>mode) != 16)" >> Is it OK to use mode ? Because using mode resulted in lot >> of build errors. >> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE >> (mode) == 16 since >> we want to make the pattern pass if target is MVE and vector size is 16 bytes ? >> Do these changes in attached patch look OK ? > Yes, you're right. Can't this be ARM_HAVE__ARITH like in most expanders in vec-common.md? (maybe with a && !TARGET_REALLY_IWMMXT if needed) Christophe > Ok. > Thanks, > Kyrill > > >> Thanks, >> Prathamesh >>> Thanks, >>> Kyrill >>> >>>> Thanks, >>>> Prathamesh >>>>> Thanks, >>>>> Prathamesh >>>>>> That being said, I suggest you look at other similar patterns in >>>>>> vec-common.md, most of which are gated on >>>>>> ARM_HAVE__ARITH >>>>>> and possibly beware of issues with iwmmxt :-) >>>>>> >>>>>> Christophe >>>>>> >>>>>>> Thanks, >>>>>>> Prathamesh