From: Hao Liu OS
To: Richard Sandiford
CC: Richard Biener, "GCC-patches@gcc.gnu.org"
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Date: Tue, 1 Aug 2023 09:43:19 +0000

Hi Richard,

This is a quick fix for the several ICEs. It seems that even when STMT_VINFO_LIVE_P is true, some reduction stmts still don't have a REDUC_DEF, so I changed the check to STMT_VINFO_REDUC_DEF.

Is it OK for trunk?

---
Fix the ICEs on an empty reduction definition. Even when STMT_VINFO_LIVE_P is true, some reduction stmts still don't have a definition.

gcc/ChangeLog:

	PR target/110625
	* config/aarch64/aarch64.cc (aarch64_force_single_cycle): check
	STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d76025545..5b8d8fa8e2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
 static bool
 aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
 {
-  if (!STMT_VINFO_LIVE_P (stmt_info))
+  if (!STMT_VINFO_REDUC_DEF (stmt_info))
     return false;

   auto reduc_info = info_for_reduction (vinfo, stmt_info);
-- 
2.40.0

________________________________________
From: Richard Sandiford
Sent: Monday, July 31, 2023 17:11
To: Hao Liu OS
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

Hao Liu OS writes:
>> Which test case do you see this for?  The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests. It is still the existing case causing the previous ICE (i.e. assertion problem): gcc.target/aarch64/sve/cost_model_13.c.
>
> It's not that the test case itself failed, but the vect dump says the "reduction latency" is 0:
>
> Before the change:
> cost_model_13.c:7:21: note: Original vector body cost = 6
> cost_model_13.c:7:21: note: Scalar issue estimate:
> cost_model_13.c:7:21: note:   load operations = 1
> cost_model_13.c:7:21: note:   store operations = 0
> cost_model_13.c:7:21: note:   general operations = 1
> cost_model_13.c:7:21: note:   reduction latency = 1
> cost_model_13.c:7:21: note:   estimated min cycles per iteration = 1.000000
> cost_model_13.c:7:21: note:   estimated cycles per vector iteration (for VF 8) = 8.000000
> cost_model_13.c:7:21: note: Vector issue estimate:
> cost_model_13.c:7:21: note:   load operations = 1
> cost_model_13.c:7:21: note:   store operations = 0
> cost_model_13.c:7:21: note:   general operations = 1
> cost_model_13.c:7:21: note:   reduction latency = 2
> cost_model_13.c:7:21: note:   estimated min cycles per iteration = 2.000000
>
> After the change:
> cost_model_13.c:7:21: note: Original vector body cost = 6
> cost_model_13.c:7:21: note: Scalar issue estimate:
> cost_model_13.c:7:21: note:   load operations = 1
> cost_model_13.c:7:21: note:   store operations = 0
> cost_model_13.c:7:21: note:   general operations = 1
> cost_model_13.c:7:21: note:   reduction latency = 0    <--- seems not consistent with above result
> cost_model_13.c:7:21: note:   estimated min cycles per iteration = 1.000000
> cost_model_13.c:7:21: note:   estimated cycles per vector iteration (for VF 8) = 8.000000
> cost_model_13.c:7:21: note: Vector issue estimate:
> cost_model_13.c:7:21: note:   load operations = 1
> cost_model_13.c:7:21: note:   store operations = 0
> cost_model_13.c:7:21: note:   general operations = 1
> cost_model_13.c:7:21: note:   reduction latency = 0    <--- seems not consistent with above result
> cost_model_13.c:7:21: note:   estimated min cycles per iteration = 1.000000    <--- seems not consistent with above result
>
> BTW, this is probably because the reduction stmt is not live. Liveness indicates whether the stmt is part of a computation whose result is used outside the loop (tree-vectorizer.h:1204):
> :
> # res_18 = PHI
> # i_20 = PHI
> _1 = (long unsigned int) i_20;
> _2 = _1 * 2;
> _3 = x_14(D) + _2;
> _4 = *_3;
> _5 = (unsigned short) _4;
> res.0_6 = (unsigned short) res_18;
> _7 = _5 + res.0_6;    <-- This is not live, may be caused by the below type cast stmt.
> res_15 = (short int) _7;
> i_16 = i_20 + 1;
> if (n_11(D) > i_16)
>   goto ;
> else
>   goto ;
>
> :
> goto ;

Ah, I see, thanks.  My concern was: if requiring !STMT_VINFO_LIVE_P stmts
can cause "normal" reductions to have a latency of 0, could the same thing
happen for single-cycle reductions?  But I suppose the answer is "no".
Introducing a cast like the above would cause reduc_chain_length > 1,
and so:

  if (ncopies > 1
      && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
      && reduc_chain_length == 1
      && loop_vinfo->suggested_unroll_factor == 1)
    single_defuse_cycle = true;

wouldn't trigger.  Which makes the single-cycle thing a bit hit-and-miss...

So yeah, I agree the patch is safe after all.

Please split the check out into a helper though, to avoid the awkward
formatting:

/* Return true if STMT_INFO is part of a reduction that has the form:

     r = r op ...;
     r = r op ...;

   with the single accumulator being read and written multiple times.  */
static bool
aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
{
  if (!STMT_VINFO_LIVE_P (stmt_info))
    return false;

  auto reduc_info = info_for_reduction (vinfo, stmt_info);
  return STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
}

OK with that change, thanks.

Richard
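
For reference, the GIMPLE quoted in this thread has the shape one would expect from a loop that accumulates into a short. The C below is a minimal sketch reconstructed from that dump under that assumption; it is not copied from gcc.target/aarch64/sve/cost_model_13.c, and the function name sum_shorts and the short return type are made up for illustration. The addition is done in a promoted type and the result is converted back to short, which would account for the separate cast stmt that follows the add in the dump.

```c
/* Hypothetical reconstruction of the reduction loop discussed above;
   the name and return type are assumptions, not taken from the test.  */
short
sum_shorts (short *x, int n)
{
  short res = 0;
  for (int i = 0; i < n; ++i)
    /* x[i] and res are promoted for the addition, then the sum is
       converted back to short before being stored in res.  */
    res += x[i];
  return res;
}
```

Calling sum_shorts on an array holding 1, 2, 3, 4 with n = 4 adds the four elements in order; with n = 0 the loop body never runs and the initial 0 is returned.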