From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hao Liu OS
To: "GCC-patches@gcc.gnu.org"
CC: "richard.sandiford@arm.com"
Subject: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Date: Wed, 19 Jul 2023 04:33:48 +0000

This only affects the new cost model in the AArch64 backend.  Currently, the
reduction latency of the vector body is too large because it is multiplied by
the statement count.  Since the scalar reduction latency is small, the new cost
model may decide that "scalar code would issue more quickly" and greatly
increase the vector body cost, which causes missed vectorization
opportunities.

Tested by bootstrapping on aarch64-linux-gnu.

gcc/ChangeLog:

        PR target/110625
        * config/aarch64/aarch64.cc (count_ops): Remove the '* count'
        for reduction_latency.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/pr110625.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc               |  5 +--
 gcc/testsuite/gcc.target/aarch64/pr110625.c | 46 +++++++++++++++++++++
 2 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..27afa64b7d5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16788,10 +16788,7 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
     {
       unsigned int base
         = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
-
-      /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
-         that's not yet the case.  */
-      ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+      ops->reduction_latency = MAX (ops->reduction_latency, base);
     }
 
   /* Assume that multiply-adds will become a single operation.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625.c b/gcc/testsuite/gcc.target/aarch64/pr110625.c
new file mode 100644
index 00000000000..0965cac33a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details -fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
+
+/* Do not increase the vector body cost due to the incorrect reduction latency
+    Original vector body cost = 51
+    Scalar issue estimate:
+      ...
+      reduction latency = 2
+      estimated min cycles per iteration = 2.000000
+      estimated cycles per vector iteration (for VF 2) = 4.000000
+    Vector issue estimate:
+      ...
+      reduction latency = 8      <-- Too large
+      estimated min cycles per iteration = 8.000000
+    Increasing body cost to 102 because scalar code would issue more quickly
+      ...
+    missed:  cost model: the vector iteration cost = 102 divided by the scalar iteration cost = 44 is greater or equal to the vectorization factor = 2.
+    missed:  not vectorized: vectorization not profitable.  */
+
+typedef struct
+{
+  unsigned short m1, m2, m3, m4;
+} the_struct_t;
+typedef struct
+{
+  double m1, m2, m3, m4, m5;
+} the_struct2_t;
+
+double
+bar (the_struct2_t *);
+
+double
+foo (double *k, unsigned int n, the_struct_t *the_struct)
+{
+  unsigned int u;
+  the_struct2_t result;
+  for (u = 0; u < n; u++, k--)
+    {
+      result.m1 += (*k) * the_struct[u].m1;
+      result.m2 += (*k) * the_struct[u].m2;
+      result.m3 += (*k) * the_struct[u].m3;
+      result.m4 += (*k) * the_struct[u].m4;
+    }
+  return bar (&result);
+}
-- 
2.34.1
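
For reference, the numbers quoted in the testcase comment can be reproduced with a small standalone sketch.  This is not the code in aarch64.cc.  The helper names scaled_body_cost and passes_ratio_check are hypothetical, and the sketch assumes (simplifying to whole cycles) that the body cost is scaled by the ratio of vector cycles to scalar cycles whenever the scalar code is estimated to issue more quickly, and that vectorization is rejected when the vector iteration cost divided by the scalar iteration cost is greater than or equal to the vectorization factor, as the dump message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a GCC function): scale BODY_COST by
   VECTOR_CYCLES / SCALAR_CYCLES, rounding up, but only when the scalar
   code is estimated to issue more quickly than the vector code.
   SCALAR_CYCLES is the scalar estimate for one vector iteration
   (4 in the dump above, for VF 2).  */
static unsigned int
scaled_body_cost (unsigned int body_cost, unsigned int scalar_cycles,
                  unsigned int vector_cycles)
{
  assert (scalar_cycles > 0);
  if (scalar_cycles < vector_cycles)
    return (body_cost * vector_cycles + scalar_cycles - 1) / scalar_cycles;
  return body_cost;
}

/* Hypothetical helper (not a GCC function): mirror the "missed" message
   in the dump.  The check fails when vector_cost / scalar_cost >= vf,
   which is written here without a division.  */
static bool
passes_ratio_check (unsigned int vector_cost, unsigned int scalar_cost,
                    unsigned int vf)
{
  return vector_cost < scalar_cost * vf;
}
```

With the values from the dump, scaled_body_cost (51, 4, 8) gives 102, and passes_ratio_check (102, 44, 2) fails because 102 is not below 88.  If the vector estimate did not exceed the 4 scalar cycles (an assumption, since the other issue estimates are elided in the dump), the cost would stay at 51 and this ratio check would pass.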