From: Hao Liu OS
To: Richard Sandiford
CC: Richard Biener, "GCC-patches@gcc.gnu.org"
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Date: Mon, 31 Jul 2023 09:25:13 +0000

Sure, the helper makes the code simpler.  I'll test the new patch and push if there is no other issue.

Thanks,
Hao

________________________________________
From: Richard Sandiford
Sent: Monday, July 31, 2023 17:11
To: Hao Liu OS
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

Hao Liu OS writes:
>> Which test case do you see this for?  The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests.  It is still the existing case causing the previous ICE (i.e. assertion problem): gcc.target/aarch64/sve/cost_model_13.c.
>
> It's not that the test case itself failed, but the vect dump says the "reduction latency" is 0:
>
> Before the change:
> cost_model_13.c:7:21: note:  Original vector body cost = 6
> cost_model_13.c:7:21: note:  Scalar issue estimate:
> cost_model_13.c:7:21: note:    load operations = 1
> cost_model_13.c:7:21: note:    store operations = 0
> cost_model_13.c:7:21: note:    general operations = 1
> cost_model_13.c:7:21: note:    reduction latency = 1
> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.000000
> cost_model_13.c:7:21: note:    estimated cycles per vector iteration (for VF 8) = 8.000000
> cost_model_13.c:7:21: note:  Vector issue estimate:
> cost_model_13.c:7:21: note:    load operations = 1
> cost_model_13.c:7:21: note:    store operations = 0
> cost_model_13.c:7:21: note:    general operations = 1
> cost_model_13.c:7:21: note:    reduction latency = 2
> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 2.000000
>
> After the change:
> cost_model_13.c:7:21: note:  Original vector body cost = 6
> cost_model_13.c:7:21: note:  Scalar issue estimate:
> cost_model_13.c:7:21: note:    load operations = 1
> cost_model_13.c:7:21: note:    store operations = 0
> cost_model_13.c:7:21: note:    general operations = 1
> cost_model_13.c:7:21: note:    reduction latency = 0      <--- seems not consistent with above result
> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.000000
> cost_model_13.c:7:21: note:    estimated cycles per vector iteration (for VF 8) = 8.000000
> cost_model_13.c:7:21: note:  Vector issue estimate:
> cost_model_13.c:7:21: note:    load operations = 1
> cost_model_13.c:7:21: note:    store operations = 0
> cost_model_13.c:7:21: note:    general operations = 1
> cost_model_13.c:7:21: note:    reduction latency = 0      <--- seems not consistent with above result
> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.000000      <--- seems not consistent with above result
>
> BTW, this should be caused by the reduction stmt not being live, where "live" indicates whether the stmt is part of a computation whose result is used outside the loop (tree-vectorizer.h:1204):
>   :
>   # res_18 = PHI
>   # i_20 = PHI
>   _1 = (long unsigned int) i_20;
>   _2 = _1 * 2;
>   _3 = x_14(D) + _2;
>   _4 = *_3;
>   _5 = (unsigned short) _4;
>   res.0_6 = (unsigned short) res_18;
>   _7 = _5 + res.0_6;                             <-- This is not live, may be caused by the below type cast stmt.
>   res_15 = (short int) _7;
>   i_16 = i_20 + 1;
>   if (n_11(D) > i_16)
>     goto ;
>   else
>     goto ;
>
>   :
>   goto ;

Ah, I see, thanks.  My concern was: if requiring !STMT_VINFO_LIVE_P stmts
can cause "normal" reductions to have a latency of 0, could the same thing
happen for single-cycle reductions?  But I suppose the answer is "no".
Introducing a cast like the above would cause reduc_chain_length > 1,
and so:

  if (ncopies > 1
      && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
      && reduc_chain_length == 1
      && loop_vinfo->suggested_unroll_factor == 1)
    single_defuse_cycle = true;

wouldn't trigger.  Which makes the single-cycle thing a bit hit-and-miss...

So yeah, I agree the patch is safe after all.

Please split the check out into a helper though, to avoid the awkward
formatting:

/* Return true if STMT_INFO is part of a reduction that has the form:

      r = r op ...;
      r = r op ...;

   with the single accumulator being read and written multiple times.  */
static bool
aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
{
  if (!STMT_VINFO_LIVE_P (stmt_info))
    return false;

  auto reduc_info = info_for_reduction (vinfo, stmt_info);
  return STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
}

OK with that change, thanks.

Richard
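
For readers following the thread without a GCC tree at hand, the two checks discussed above can be modelled in isolation.  This is only a rough sketch: MockReduction, MockStmt and both function names are hypothetical stand-ins, not GCC's real stmt_vec_info or loop_vec_info types, and the chain length of 2 for the cast case is an assumed illustrative value (the mail only says it would be > 1).

```cpp
#include <cassert>

// Hypothetical stand-in for the values tested in the quoted condition.
// These are NOT GCC's real data structures.
struct MockReduction {
  int ncopies;                  // number of vector copies of the stmt
  bool at_most_used_only_live;  // models STMT_VINFO_RELEVANT (...) <= vect_used_only_live
  int reduc_chain_length;       // stmts in the reduction chain
  int suggested_unroll_factor;  // models loop_vinfo->suggested_unroll_factor
};

// Hypothetical stand-in for a statement plus its reduction info.
struct MockStmt {
  bool live;                      // models STMT_VINFO_LIVE_P (stmt_info)
  bool reduc_force_single_cycle;  // models STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info)
};

// Same shape as the quoted condition that sets single_defuse_cycle.
bool single_defuse_cycle(const MockReduction &r) {
  return r.ncopies > 1
         && r.at_most_used_only_live
         && r.reduc_chain_length == 1
         && r.suggested_unroll_factor == 1;
}

// Same shape as the proposed helper: a non-live stmt is rejected before
// the reduction info is consulted.
bool force_single_cycle(const MockStmt &s) {
  if (!s.live)
    return false;
  return s.reduc_force_single_cycle;
}

// Small self-check covering the cases discussed in the thread.
void check_examples() {
  MockReduction plain{2, true, 1, 1};
  MockReduction with_cast{2, true, 2, 1};  // assumed: the cast lengthens the chain
  assert(single_defuse_cycle(plain));
  assert(!single_defuse_cycle(with_cast));
  assert(!force_single_cycle(MockStmt{false, true}));
  assert(force_single_cycle(MockStmt{true, true}));
}
```

Under these assumptions the cast variant never reaches the single-cycle path, and a non-live stmt is filtered out by the early return, which is the argument for why the patch is safe.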