From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 21113 invoked by alias); 26 Jul 2018 16:52:24 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 21088 invoked by uid 89); 26 Jul 2018 16:52:23 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.2 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 spammy=aarch64-simd.md, aarch64simdmd, UD:aarch64-simd.md, for X-HELO: EUR02-AM5-obe.outbound.protection.outlook.com Received: from mail-eopbgr00084.outbound.protection.outlook.com (HELO EUR02-AM5-obe.outbound.protection.outlook.com) (40.107.0.84) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 26 Jul 2018 16:52:21 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=hCpBJ/YkAwNs+Zz3UYJw6YbzdPfzR1KdXW7NmnYaW3M=; b=Lwzaa7+XvY8FszU2sytWD/MeZaayzHimMNZZzjmCRIzLGIHHZ2uedTdUe5kDnkt3R1E28HZ2eppPWmCkaRMfqPRChKWCNmifLqGj7iCYNWt2MpkCORvL5NSzJGizQHCOOx1nhZq2rvQ3wK9U75ltfL+EfL0JV+6HG35xft4+6Xg= Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Sam.Tebbs@arm.com; Received: from [10.2.206.193] (217.140.96.140) by DB7PR08MB3435.eurprd08.prod.outlook.com (2603:10a6:10:42::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.995.16; Thu, 26 Jul 2018 16:52:17 +0000 Subject: Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode To: Sudakshina Das , "gcc-patches@gcc.gnu.org" Cc: Marcus Shawcroft , nd , Richard Earnshaw , James Greenhalgh References: <953dbdd2-e20c-4587-3e0d-ad1a65fc93c6@arm.com> <85b58ddb-3da2-67c6-1514-e308201191d3@arm.com> <74da7cb6-485b-3ce4-7901-d10cb6f1ed95@arm.com> From: Sam Tebbs Message-ID: <9e7ccf5b-55b9-543d-1f9e-f9ab36e93376@arm.com> Date: Thu, 26 Jul 2018 16:52:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <74da7cb6-485b-3ce4-7901-d10cb6f1ed95@arm.com> Content-Type: multipart/mixed; boundary="------------07B2CC4968BE90498A91CCE8" Return-Path: sam.tebbs@arm.com Received-SPF: None (protection.outlook.com: arm.com does not designate permitted sender hosts) X-IsSubscribed: yes X-SW-Source: 2018-07/txt/msg01677.txt.bz2 This is a multi-part message in MIME format. --------------07B2CC4968BE90498A91CCE8 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-length: 4612 On 07/25/2018 07:08 PM, Sudakshina Das wrote: > Hi Sam > > On 25/07/18 14:08, Sam Tebbs wrote: >> On 07/23/2018 05:01 PM, Sudakshina Das wrote: >>> Hi Sam >>> >>> >>> On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote: >>>> Hi all, >>>> >>>> This patch extends the aarch64_get_lane_zero_extendsi instruction >>>> definition to >>>> also cover DI mode. This prevents a redundant AND instruction from >>>> being >>>> generated due to the pattern failing to be matched. >>>> >>>> Example: >>>> >>>> typedef char v16qi __attribute__ ((vector_size (16))); >>>> >>>> unsigned long long >>>> foo (v16qi a) >>>> { >>>>   return a[0]; >>>> } >>>> >>>> Previously generated: >>>> >>>> foo: >>>>         umov    w0, v0.b[0] >>>>         and     x0, x0, 255 >>>>         ret >>>> >>>> And now generates: >>>> >>>> foo: >>>>         umov    w0, v0.b[0] >>>>         ret >>>> >>>> Bootstrapped on aarch64-none-linux-gnu and tested on >>>> aarch64-none-elf with no >>>> regressions. >>>> >>>> gcc/ >>>> 2018-07-23  Sam Tebbs >>>> >>>>         * config/aarch64/aarch64-simd.md >>>>     (*aarch64_get_lane_zero_extendsi): >>>>         Rename to... >>>> (*aarch64_get_lane_zero_extend): ... This. >>>>         Use GPI iterator instead of SI mode. >>>> >>>> gcc/testsuite >>>> 2018-07-23  Sam Tebbs >>>> >>>>         * gcc.target/aarch64/extract_zero_extend.c: New file >>>> >>> You will need an approval from a maintainer, but I would only add >>> one request to this: >>> >>> diff --git a/gcc/config/aarch64/aarch64-simd.md >>> b/gcc/config/aarch64/aarch64-simd.md >>> index 89e38e6..15fb661 100644 >>> --- a/gcc/config/aarch64/aarch64-simd.md >>> +++ b/gcc/config/aarch64/aarch64-simd.md >>> @@ -3032,15 +3032,16 @@ >>>    [(set_attr "type" "neon_to_gp")] >>>  ) >>> >>> -(define_insn "*aarch64_get_lane_zero_extendsi" >>> -  [(set (match_operand:SI 0 "register_operand" "=r") >>> -    (zero_extend:SI >>> +(define_insn "*aarch64_get_lane_zero_extend" >>> +  [(set (match_operand:GPI 0 "register_operand" "=r") >>> +    (zero_extend:GPI >>> >>> Since you are adding 4 new patterns with this change, could you add >>> more cases in your test as well to make sure you have coverage for >>> each of them. >>> >>> Thanks >>> Sudi >> >> Hi Sudi, >> >> Thanks for the feedback. Here is an updated patch that adds more >> testcases to cover the patterns generated by the different mode >> combinations. The changelog and description from my original email >> still apply. >> > > Thanks for making the changes and adding more test cases. I do however > see that you are only covering 2 out of 4 new > *aarch64_get_lane_zero_extenddi<> patterns. The > *aarch64_get_lane_zero_extendsi<> were already existing. I don't mind > those tests. I would just ask you to add the other two new patterns > as well. Also since the different versions of the instruction generate > same instructions (like foo_16qi and foo_8qi both give out the same > instruction), I would suggest using a -fdump-rtl-final (or any relevant > rtl dump) with the dg-options and using a scan-rtl-dump to scan the > pattern name. Something like: > /* { dg-do compile } */ > /* { dg-options "-O3 -fdump-rtl-final" } */ > ... > ... > /* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" > "final" } } */ > > Thanks > Sudi Hi Sudi, Thanks again. Here's an update that adds 4 more tests, so all 8 patterns generated are now tested for! Below is the updated changelog gcc/ 2018-07-26  Sam Tebbs          * config/aarch64/aarch64-simd.md         (*aarch64_get_lane_zero_extendsi):         Rename to... (*aarch64_get_lane_zero_extend): ... This.         Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-26  Sam Tebbs          * gcc.target/aarch64/extract_zero_extend.c: New file > >>> >>>        (vec_select: >>>          (match_operand:VDQQH 1 "register_operand" "w") >>>          (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))] >>>    "TARGET_SIMD" >>>    { >>> -    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL >>> (operands[2])); >>> +    operands[2] = aarch64_endian_lane_rtx (mode, >>> +                       INTVAL (operands[2])); >>>      return "umov\\t%w0, %1.[%2]"; >>>    } >>>    [(set_attr "type" "neon_to_gp")] >> > --------------07B2CC4968BE90498A91CCE8 Content-Type: text/x-patch; name="latest.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="latest.patch" Content-length: 3430 diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f1784d72e55c412d076de43f2f7aad4632d55ecb..e92a3b49c65e84d2a16a2a480c359a0b4d8fa3e3 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3033,15 +3033,16 @@ [(set_attr "type" "neon_to_gp")] ) -(define_insn "*aarch64_get_lane_zero_extendsi" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI +(define_insn "*aarch64_get_lane_zero_extend" + [(set (match_operand:GPI 0 "register_operand" "=r") + (zero_extend:GPI (vec_select: (match_operand:VDQQH 1 "register_operand" "w") (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "umov\\t%w0, %1.[%2]"; } [(set_attr "type" "neon_to_gp")] diff --git a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c new file mode 100644 index 0000000000000000000000000000000000000000..a294b261909a1d67ab339c929f2609dcda01c067 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c @@ -0,0 +1,81 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-rtl-final" } */ + +/* Tests div16qi. */ +typedef unsigned char div16qi __attribute__ ((vector_size (16))); +/* Tests div8qi. */ +typedef unsigned char div8qi __attribute__ ((vector_size (8))); +/* Tests div8hi. */ +typedef unsigned short div8hi __attribute__ ((vector_size (16))); +/* Tests div4hi. */ +typedef unsigned short div4hi __attribute__ ((vector_size (8))); + +/* Tests siv16qi. */ +typedef unsigned char siv16qi __attribute__ ((vector_size (16))); +/* Tests siv8qi. */ +typedef unsigned char siv8qi __attribute__ ((vector_size (8))); +/* Tests siv8hi. */ +typedef unsigned short siv8hi __attribute__ ((vector_size (16))); +/* Tests siv4hi. */ +typedef unsigned short siv4hi __attribute__ ((vector_size (8))); + + +unsigned long long +foo_div16qi (div16qi a) +{ + return a[0]; +} + +unsigned long long +foo_div8qi (div8qi a) +{ + return a[0]; +} + +unsigned long long +foo_div8hi (div8hi a) +{ + return a[0]; +} + +unsigned long long +foo_div4hi (div4hi a) +{ + return a[0]; +} + +unsigned int +foo_siv16qi (siv16qi a) +{ + return a[0]; +} + +unsigned int +foo_siv8qi (siv8qi a) +{ + return a[0]; +} + +unsigned int +foo_siv8hi (siv8hi a) +{ + return a[0]; +} + +unsigned int +foo_siv4hi (siv4hi a) +{ + return a[0]; +} + +/* { dg-final { scan-assembler-times "umov\\t" 8 } } */ +/* { dg-final { scan-assembler-not "and\\t" } } */ + +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */ +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8qi" "final" } } */ +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8hi" "final" } } */ +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv4hi" "final" } } */ +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv16qi" "final" } } */ +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8qi" "final" } } */ +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8hi" "final" } } */ +/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv4hi" "final" } } */ --------------07B2CC4968BE90498A91CCE8--