From: Hao Liu OS
To: "GCC-patches@gcc.gnu.org"
CC: "richard.sandiford@arm.com"
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Date: Mon, 24 Jul 2023 01:58:30 +0000

Hi Richard,

Gentle ping.  Is this OK for trunk?  Or will you have a patch that covers
this fix?

Thanks,
-Hao

________________________________________
From: Hao Liu OS
Sent: Wednesday, July 19, 2023 12:33
To: GCC-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com
Subject: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

This only affects the new cost model in the aarch64 backend.  Currently, the
reduction latency of the vector body is too large because it is multiplied by
the statement count.  As the scalar reduction latency is small, the new cost
model may conclude that "scalar code would issue more quickly" and greatly
increase the vector body cost, which causes missed vectorization
opportunities.

Tested by bootstrapping on aarch64-linux-gnu.

gcc/ChangeLog:

	PR target/110625
	* config/aarch64/aarch64.cc (count_ops): Remove the '* count'
	for reduction_latency.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr110625.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc               |  5 +--
 gcc/testsuite/gcc.target/aarch64/pr110625.c | 46 +++++++++++++++++++++
 2 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..27afa64b7d5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16788,10 +16788,7 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
     {
       unsigned int base
         = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
-
-      /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
-         that's not yet the case.  */
-      ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+      ops->reduction_latency = MAX (ops->reduction_latency, base);
     }
 
   /* Assume that multiply-adds will become a single operation.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625.c b/gcc/testsuite/gcc.target/aarch64/pr110625.c
new file mode 100644
index 00000000000..0965cac33a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details -fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
+
+/* Do not increase the vector body cost due to the incorrect reduction latency
+    Original vector body cost = 51
+    Scalar issue estimate:
+      ...
+      reduction latency = 2
+      estimated min cycles per iteration = 2.000000
+      estimated cycles per vector iteration (for VF 2) = 4.000000
+    Vector issue estimate:
+      ...
+      reduction latency = 8      <-- Too large
+      estimated min cycles per iteration = 8.000000
+    Increasing body cost to 102 because scalar code would issue more quickly
+      ...
+    missed:  cost model: the vector iteration cost = 102 divided by the scalar iteration cost = 44 is greater or equal to the vectorization factor = 2.
+    missed:  not vectorized: vectorization not profitable.  */
+
+typedef struct
+{
+  unsigned short m1, m2, m3, m4;
+} the_struct_t;
+typedef struct
+{
+  double m1, m2, m3, m4, m5;
+} the_struct2_t;
+
+double
+bar (the_struct2_t *);
+
+double
+foo (double *k, unsigned int n, the_struct_t *the_struct)
+{
+  unsigned int u;
+  the_struct2_t result;
+  for (u = 0; u < n; u++, k--)
+    {
+      result.m1 += (*k) * the_struct[u].m1;
+      result.m2 += (*k) * the_struct[u].m2;
+      result.m3 += (*k) * the_struct[u].m3;
+      result.m4 += (*k) * the_struct[u].m4;
+    }
+  return bar (&result);
+}
-- 
2.34.1