Date: Fri, 3 May 2024 09:55:16 +0100
From: Alex Coplan
To: Ajit Agarwal
Cc: Richard Sandiford, "Kewen.Lin", Segher Boessenkool, Michael Meissner, David Edelsohn, Peter Bergner, gcc-patches
Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file
References: <595cd863-3ba5-492b-84d1-ab470411507e@linux.ibm.com> <52a437cc-4452-4e0e-86a9-84e653401275@linux.ibm.com>

On 22/04/2024 13:01, Ajit Agarwal wrote:
> Hello Alex:
>
> On 14/04/24 10:29 pm, Ajit Agarwal wrote:
> > Hello Alex:
> >
> > On 12/04/24 11:02 pm, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 12/04/24 8:15 pm, Alex Coplan wrote:
> >>> On 12/04/2024 20:02, Ajit Agarwal wrote:
> >>>> Hello Alex:
> >>>>
> >>>> On 11/04/24 7:55 pm, Alex Coplan wrote:
> >>>>> On 10/04/2024 23:48, Ajit Agarwal wrote:
> >>>>>> Hello Alex:
> >>>>>>
> >>>>>> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>>>>>> Hi Ajit,
> >>>>>>>
> >>>>>>> On 10/04/2024 15:31, Ajit Agarwal wrote:
> >>>>>>>> Hello Alex:
> >>>>>>>>
> >>>>>>>> On 10/04/24 1:42 pm, Alex Coplan wrote:
> >>>>>>>>> Hi Ajit,
> >>>>>>>>>
> >>>>>>>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
> >>>>>>>>>> Hello Alex:
> >>>>>>>>>>
> >>>>>>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>>>>>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
> >>>>>>>>>>>> Hello Alex:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>>>>>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>>>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>>>>>>>>>>>> Hello Alex/Richard:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> All review comments are incorporated.
> >>>>>>>
> >>>>>>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
> >>>>>>>>>>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, call
> >>>>>>>>>>>>>>>>  // merge_pairs.
> >>>>>>>>>>>>>>>>  void
> >>>>>>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>>>>> -                                 access_group &group)
> >>>>>>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>>>>> +                                         access_group &group)
> >>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>>    const auto lfs = decode_lfs (encoded_lfs);
> >>>>>>>>>>>>>>>>    const unsigned access_size = lfs.size;
> >>>>>>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>>>>>                     access.cand_insns,
> >>>>>>>>>>>>>>>>                     lfs.load_p,
> >>>>>>>>>>>>>>>>                     access_size);
> >>>>>>>>>>>>>>>> -      skip_next = access.cand_insns.empty ();
> >>>>>>>>>>>>>>>> +      skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> As above, why is this needed?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For rs6000 we want to return always true. as load store pair
> >>>>>>>>>>>>>> that are to be merged with 8/16 16/32 32/64 is occurring for rs6000.
> >>>>>>>>>>>>>> And we want load store pair to 8/16 32/64. That's why we want
> >>>>>>>>>>>>>> to generate always true for rs6000 to skip pairs as above.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hmm, sorry, I'm not sure I follow.
> >>>>>>>>>>>>> Are you saying that for rs6000 you have
> >>>>>>>>>>>>> load/store pair instructions where the two arms of the access are storing
> >>>>>>>>>>>>> operands of different sizes?  Or something else?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> As it stands the logic is to skip the next iteration only if we
> >>>>>>>>>>>>> exhausted all the candidate insns for the current access.  In the case
> >>>>>>>>>>>>> that we didn't exhaust all such candidates, then the idea is that when
> >>>>>>>>>>>>> access becomes prev_access, we can attempt to use those candidates as
> >>>>>>>>>>>>> the "left-hand side" of a pair in the next iteration since we failed to
> >>>>>>>>>>>>> use them as the "right-hand side" of a pair in the current iteration.
> >>>>>>>>>>>>> I don't see why you wouldn't want that behaviour.  Please can you
> >>>>>>>>>>>>> explain?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> In merge_pair we get the 2 load candidates one load from 0 offset and
> >>>>>>>>>>>> other load is from 16th offset. Then in next iteration we get load
> >>>>>>>>>>>> from 16th offset and other load from 32 offset. In next iteration
> >>>>>>>>>>>> we get load from 32 offset and other load from 48 offset.
> >>>>>>>>>>>>
> >>>>>>>>>>>> For example:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Currently we get the load candidates as follows.
> >>>>>>>>>>>>
> >>>>>>>>>>>> pairs:
> >>>>>>>>>>>>
> >>>>>>>>>>>> load from 0th offset.
> >>>>>>>>>>>> load from 16th offset.
> >>>>>>>>>>>>
> >>>>>>>>>>>> next pairs:
> >>>>>>>>>>>>
> >>>>>>>>>>>> load from 16th offset.
> >>>>>>>>>>>> load from 32nd offset.
> >>>>>>>>>>>>
> >>>>>>>>>>>> next pairs:
> >>>>>>>>>>>>
> >>>>>>>>>>>> load from 32nd offset
> >>>>>>>>>>>> load from 48th offset.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Instead in rs6000 we should get:
> >>>>>>>>>>>>
> >>>>>>>>>>>> pairs:
> >>>>>>>>>>>>
> >>>>>>>>>>>> load from 0th offset
> >>>>>>>>>>>> load from 16th offset.
> >>>>>>>>>>>>
> >>>>>>>>>>>> next pairs:
> >>>>>>>>>>>>
> >>>>>>>>>>>> load from 32nd offset
> >>>>>>>>>>>> load from 48th offset.
> >>>>>>>>>>>
> >>>>>>>>>>> Hmm, so then I guess my question is: why wouldn't you consider merging
> >>>>>>>>>>> the pair with offsets (16,32) for rs6000?  Is it because you have a
> >>>>>>>>>>> stricter alignment requirement on the base pair offsets (i.e. they have
> >>>>>>>>>>> to be a multiple of 32 when the operand size is 16)?  So the pair
> >>>>>>>>>>> offsets have to be a multiple of the entire pair size rather than a
> >>>>>>>>>>> single operand size?
> >>>>>>>>>>
> >>>>>>>>>> We get load pair at a certain point with (0,16) and other program
> >>>>>>>>>> point we get load pair (32, 48).
> >>>>>>>>>>
> >>>>>>>>>> In current implementation it takes offsets loads as (0, 16),
> >>>>>>>>>> (16, 32), (32, 48).
> >>>>>>>>>>
> >>>>>>>>>> But In rs6000 we want the load pair to be merged at different points
> >>>>>>>>>> as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with
> >>>>>>>>>> 0 offset and other load (32, 48) with lxvp with 32 offset.
> >>>>>>>>>>
> >>>>>>>>>> In current case it will merge with lxvp with 0 offset and lxvp with
> >>>>>>>>>> 16 offset, then lxvp with 32 offset and lxvp with 48 offset which
> >>>>>>>>>> is incorrect in our case as the (16-32) case 16 offset will not
> >>>>>>>>>> load from even register and break for rs6000.
> >>>>>>>>>
> >>>>>>>>> Sorry, I think I'm still missing something here.  Why does the address offset
> >>>>>>>>> affect the parity of the transfer register?  ISTM they needn't be related at
> >>>>>>>>> all (and indeed we can't even know the parity of the transfer register before
> >>>>>>>>> RA, but perhaps you're only intending on running the pass after RA?)
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> We have load pair with (0,16) wherein these loads are adjacent and
> >>>>>>>> replaced with lxvp.
> >>>>>>>>
> >>>>>>>> Semantic of lxvp instruction is that it loads adjacent load pair in
> >>>>>>>> even register and even_register + 1.
> >>>>>>>>
> >>>>>>>> We replace the above load pair with lxvp instruction and then we
> >>>>>>>> don't need to merge (16,32) as (0, 16) is already merged and instead
> >>>>>>>> we merge (32,48).
> >>>>>>>
> >>>>>>> Ok, but the existing logic should already account for this.  I.e. if we
> >>>>>>> successfully merge (0,16), then we don't attempt to merge (16,32).  We'd only
> >>>>>>> attempt to merge (16,32) if the merge of (0,16) failed (for whatever reason).
> >>>>>>> So I don't see that there's anything specific to lxvp that requires this logic
> >>>>>>> to change, _unless_ you have a stricter alignment requirement on the offsets as
> >>>>>>> I mentioned before.
> >>>>>>>
> >>>>>>
> >>>>>> Thanks for the suggestion. It worked for rs6000 also with current changes.
> >>>>>> Sorry for the confusion.
> >>>>>
> >>>>> Alright, glad we got to the bottom of this!
> >>>>
> >>>> Thanks.
> >>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>> Yes you are correct, the address offset doesn't affect the parity of
> >>>>>>>> the register transfer as we are doing fusion pass before RA.
> >>>>>>>> If that is the case then I think it would be better to introduce a
> >>>>>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
> >>>>>>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I guess
> >>>>>>>>>>> it would also take access_size as a parameter and for aarch64 it should
> >>>>>>>>>>> check:
> >>>>>>>>>>>
> >>>>>>>>>>>   multiple_p (offset, access_size)
> >>>>>>>>>>>
> >>>>>>>>>>> and for rs6000 it could check:
> >>>>>>>>>>>
> >>>>>>>>>>>   multiple_p (offset, access_size * 2)
> >>>>>>>>>>>
> >>>>>>>>>>> and we would then incorporate a call to that predicate in the else if
> >>>>>>>>>>> condition of transform_for_base.
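To make the suggestion quoted above concrete, here is a minimal, self-contained sketch of how such a predicate could gate the pairing loop. It is only an illustration under stated assumptions, not the code from the patch: offsets are plain integers rather than poly_int64, multiple_p is a simplified stand-in for the GCC helper of the same name, rs6000_pair_fusion and form_pairs are hypothetical names, and every attempted merge is assumed to succeed and to use up all candidates of the right-hand access.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for GCC's multiple_p, restricted to plain integers.
static bool
multiple_p (int64_t value, int64_t factor)
{
  return factor != 0 && value % factor == 0;
}

// Target-independent pass interface (only the hook discussed above).
struct pair_fusion
{
  virtual ~pair_fusion () = default;

  // Return true if OFFSET is an acceptable base offset for a pair whose
  // individual accesses are ACCESS_SIZE bytes wide.
  virtual bool pair_offset_alignment_ok_p (int64_t offset,
                                           unsigned access_size) const = 0;
};

// aarch64: the base offset only needs to be a multiple of one operand.
struct aarch64_pair_fusion : pair_fusion
{
  bool pair_offset_alignment_ok_p (int64_t offset,
                                   unsigned access_size) const override
  {
    return multiple_p (offset, access_size);
  }
};

// Hypothetical rs6000 variant: the base offset must be a multiple of the
// whole pair size.
struct rs6000_pair_fusion : pair_fusion
{
  bool pair_offset_alignment_ok_p (int64_t offset,
                                   unsigned access_size) const override
  {
    return multiple_p (offset, int64_t (access_size) * 2);
  }
};

using offset_pair = std::pair<int64_t, int64_t>;

// Simplified model of the loop in transform_for_base.  OFFSETS must be
// sorted.  A merge is assumed to always succeed and to exhaust the
// candidates of the right-hand access, so skip_next is set after it.
static std::vector<offset_pair>
form_pairs (const pair_fusion &pass, const std::vector<int64_t> &offsets,
            unsigned access_size)
{
  std::vector<offset_pair> pairs;
  bool skip_next = true; // the first access has no predecessor
  int64_t prev = 0;
  for (int64_t off : offsets)
    {
      if (skip_next)
        skip_next = false;
      else if (off == prev + access_size
               && pass.pair_offset_alignment_ok_p (prev, access_size))
        {
          pairs.emplace_back (prev, off);
          skip_next = true;
        }
      prev = off;
    }
  return pairs;
}
```

With 16-byte accesses at offsets 16, 32, 48 and 64, the aarch64-style check would pair (16,32) and (48,64), whereas the stricter check rejects 16 as a base offset and pairs only (32,48).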
> >>>>>>>>>>>
> >>>>>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset
> >>>>>>>>>>> alignment is a good proxy for RTL offset alignment, but we already
> >>>>>>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
> >>>>>>>>>>>
> >>>>>>>> I have addressed the above hooks and it worked fine with both rs6000
> >>>>>>>> and aarch64. I am sending subsequent patch in some time that address
> >>>>>>>> above.
> >>>>>>>>
> >>>>>>>>> How do you plan on handling this even-odd requirement for rs6000?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> We plan to handle with V16QI subreg: 0 and V16QI subreg : 16 to
> >>>>>>>> generate register pair and that's what we generate and implement
> >>>>>>>> in rs6000 target
> >>>>>>>> code.
> >>>>>>>
> >>>>>>> Ah, this is coming back to me now.  Sorry, I should have remembered this from
> >>>>>>> the previous discussion with Richard S.
> >>>>>>>
> >>>>>>> Apologies for going on a slight tangent here, but if you're running
> >>>>>>> before RA are you planning to create a new OImode pseudo register for
> >>>>>>> the lxvp insn and then somehow update uses of the old transfer registers
> >>>>>>> to replace them with subregs of that OImode pseudo?
> >>>>>>
> >>>>>> Yes I do the same as you have mentioned. We generate register pairs
> >>>>>> with 256 bit mode with two subregs of 128 bit modes with 0 and
> >>>>>> 16 offset.
> >>>>>>
> >>>>>> Or do you just plan
> >>>>>>> on replacing the individual loads with moves (instead of deleting them)?
> >>>>>>> I guess the latter would be simpler and might work better in the
> >>>>>>> presence of hard regs.
> >>>>>>>
> >>>>>>
> >>>>>> Would you mind explaining how to generate register pairs with lxvp by
> >>>>>> replacing loads with moves.
> >>>>>
> >>>>> Yeah, so suppose you have something like:
> >>>>>
> >>>>> (set (reg:V16QI v1) (mem:V16QI addr))
> >>>>> (set (reg:V16QI v2) (mem:V16QI addr+16))
> >>>>>
> >>>>> then when you insert the lxvp you can then (logically) replace the
> >>>>> original load instructions with moves from the appropriate subreg, as
> >>>>> follows:
> >>>>>
> >>>>> (set (reg:OI pair-pseudo) (mem:OI addr)) ; lxvp
> >>>>> (set (reg:V16QI v1) (subreg:V16QI (reg:OI pair-pseudo) 0))
> >>>>> (set (reg:V16QI v2) (subreg:V16QI (reg:OI pair-pseudo) 16))
> >>>>>
> >>>>
> >>>> Any Pseudo created with gen_rtx_REG like
> >>>> gen_rtx_REG (OOmode, REGNO (dest_exp)) will error'ed out
> >>>> with unrecognize insn by LRA.
> >>>
> >>> I'm not surprised that goes wrong: you can't just create a new REG
> >>> rtx in a different mode and reuse the regno of an existing pseudo.
> >>>
> >
> > Thanks for the suggestion.
> >>>>
> >>>> If I create pseudo with gen_reg_rtx (OOmode) will error'ed
> >>>> out with new_defs Pseudo register is not found in
> >>>> change->new_defs.
> >>>
> >>> Yeah, I suppose you'd need to add an RTL-SSA def for the new pseudo.
> >>>
> >>
> >> Would you mind explaining how can I add an RTL-SSA def for the
> >> new pseudo.
> >
> > I have added an RTL-SSA def for the new pseudo. With that I could
> > get register pairs correctly.
> >>
> >>>>
> >>>> Also the sequential register pairs are not generated by
> >>>> Register Allocator.
> >>>
> >>> So how do you get the sequential pairs with your current approach?  My
> >>> understanding was that what I suggested above doesn't really change what
> >>> you're doing w.r.t the lxvp insn itself, but maybe I didn't properly
> >>> understand the approach taken in your initial patchset.
> >>>
> >>
> >> I generate (set (reg:OO pair-pseudo) (mem:OI addr)) ; lxvp
> >> and then at the use point of pair_pseudo generate the following.
> >>
> >> (subreg:V16QI (reg:OI pair-pseudo) 0))
> >> (subreg:V16QI (reg:OI pair-pseudo) 16))
> >>
> >
> > I get register pairs correctly generating RTL as you have
> > suggested but we get extra moves that would impact the performance.
> >
> > Please let me know what do you think.
> >
>
> For the below testcase:
>
> #include
>
> void
> foo (__vector_quad *dst, vector unsigned char *ptr, vector unsigned char src)
> {
>   __vector_quad acc;
>   __builtin_mma_xvf32ger(&acc, src, ptr[0]);
>   __builtin_mma_xvf32gerpp(&acc, src, ptr[1]);
>   *dst = acc;
> }
>
>
> My earlier implementation without moves generates the following assembly
> for the above testcase:
>
> .LFB0:
>         .cfi_startproc
>         .localentry     _Z3fooPu13__vector_quadPDv16_hS1_,1
>         lxvp %vs32,0(%r4)
>         xvf32ger 0,%vs34,%vs33
>         xvf32gerpp 0,%vs34,%vs32
>         xxmfacc 0
>         stxvp %vs2,0(%r3)
>         stxvp %vs0,32(%r3)
>         blr
>
>
> With moves you have suggested I get the below generated assembly:
> .LFB0:
>         .cfi_startproc
>         .localentry     _Z3fooPu13__vector_quadPDv16_hS1_,1
>         lxvp %vs0,0(%r4)
>         xxlor %vs33,%vs1,%vs1
>         xxlor %vs32,%vs0,%vs0
>         xvf32ger 0,%vs34,%vs33
>         xvf32gerpp 0,%vs34,%vs32
>         xxmfacc 0
>         stxvp %vs2,0(%r3)
>         stxvp %vs0,32(%r3)
>         blr
>
>
> As you see with register moves two extra xxlor instructions are
> generated.

I see.  There's no need to change your approach for now, then, it was
just a suggestion.  I was hoping the RA would be able to eliminate the
redundant moves, but it looks like that isn't happening here.

Richard S might have some pointers for you w.r.t. this approach when he
gets back.

Sorry for the delay in getting back to you, I was busy with other things
last week.  I'll try to look at the latest version of the patch you sent
for the pass.

Thanks,
Alex

>
> Please let me know what do you think.
>
> Thanks & Regards
> Ajit
> > Thanks & Regards
> > Ajit
> >> Thanks & Regards
> >> Ajit
> >>> Thanks,
> >>> Alex
> >>>
> >>>>
> >>>> Thats why I haven't used above method as I also thought
> >>>> through.
> >>>>
> >>>> Please let me know what do you think.
> >>>>
> >>>> Thanks & Regards
> >>>> Ajit
> >>>>> now I'm not sure off-hand if this exact combination of subregs and mode
> >>>>> changes is valid, but hopefully you get the idea.  The benefit of this
> >>>>> approach is that it keeps the transformation local and is more
> >>>>> compile-time efficient (we don't have to look through all mentions of
> >>>>> v1/v2 and replace them with subregs).  I'd hope that the RA can then
> >>>>> clean up any redundant moves (especially in the case that v1/v2 are
> >>>>> pseudos).
> >>>>>
> >>>>> That would mean that you don't need all the grubbing around with DF
> >>>>> looking for occurrences of the transfer regs.  Instead we'd just need
> >>>>> some logic to insert the extra moves after the lxvp for rs6000, and I
> >>>>> think this might fit more cleanly into the current pass.
> >>>>>
> >>>>> Does that make sense?  In any case, this shouldn't really affect the
> >>>>> preparatory aarch64 patch because we were planning to defer adding any
> >>>>> hooks that are only needed for rs6000 from the initial aarch64/generic
> >>>>> split.
> >>>>>
> >>>>> Thanks,
> >>>>> Alex
> >>>>>
> >>>>>>
> >>>>>> Thanks & Regards
> >>>>>> Ajit
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>> Alex
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks & Regards
> >>>>>>>> Ajit
> >>>>>>>>> Thanks,
> >>>>>>>>> Alex
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> lxvp should load from even registers and then loaded value will
> >>>>>>>>>> be in even register and even register +1 (which is odd).
> >>>>>>>>>>
> >>>>>>>>>> Thanks & Regards
> >>>>>>>>>> Ajit
> >>>>>>>>>>> If that is the case then I think it would be better to introduce a
> >>>>>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
> >>>>>>>>>>> offset of the pair (prev_access->offset in transform_for_base).
> >>>>>>>>>>> I guess it would also take access_size as a parameter and for aarch64
> >>>>>>>>>>> it should check:
> >>>>>>>>>>>
> >>>>>>>>>>>   multiple_p (offset, access_size)
> >>>>>>>>>>>
> >>>>>>>>>>> and for rs6000 it could check:
> >>>>>>>>>>>
> >>>>>>>>>>>   multiple_p (offset, access_size * 2)
> >>>>>>>>>>>
> >>>>>>>>>>> and we would then incorporate a call to that predicate in the else if
> >>>>>>>>>>> condition of transform_for_base.
> >>>>>>>>>>>
> >>>>>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset
> >>>>>>>>>>> alignment is a good proxy for RTL offset alignment, but we already
> >>>>>>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Alex
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks & Regards
> >>>>>>>>>>>> Ajit
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Alex
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>        prev_access = &access;
> >>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>>>>>  // and remove all the tombstone insns, being sure to reparent any uses
> >>>>>>>>>>>>>>>>  // of mem to previous defs when we do this.
> >>>>>>>>>>>>>>>>  void
> >>>>>>>>>>>>>>>> -ldp_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>>>>>> +pair_fusion_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>>    // No need to do anything if we didn't emit a tombstone insn for this BB.
> >>>>>>>>>>>>>>>>    if (!m_emitted_tombstone)
> >>>>>>>>>>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  template<typename Map>
> >>>>>>>>>>>>>>>>  void
> >>>>>>>>>>>>>>>> -ldp_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>>    for (auto kv : map)
> >>>>>>>>>>>>>>>>      {
> >>>>>>>>>>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>>>>>>  }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  void
> >>>>>>>>>>>>>>>> -ldp_bb_info::transform ()
> >>>>>>>>>>>>>>>> +pair_fusion_bb_info::transform ()
> >>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>>    traverse_base_map (expr_map);
> >>>>>>>>>>>>>>>>    traverse_base_map (def_map);
> >>>>>>>>>>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
> >>>>>>>>>>>>>>>>  // for load/store candidates. If running after RA, also try and promote
> >>>>>>>>>>>>>>>>  // non-writeback pairs to use writeback addressing. Then try to fuse
> >>>>>>>>>>>>>>>>  // candidates into pairs.
> >>>>>>>>>>>>>>>> -void ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>> -  const bool track_loads
> >>>>>>>>>>>>>>>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>>>>>>>>>>>>>> -  const bool track_stores
> >>>>>>>>>>>>>>>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>>>>>>>>>>>>>> +  const bool track_loads = track_load_p ();
> >>>>>>>>>>>>>>>> +  const bool track_stores = track_store_p ();
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -  ldp_bb_info bb_state (bb);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> +  aarch64_pair_fusion derived;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> can be deleted and then:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> +  pair_fusion_bb_info bb_info (bb, &derived);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> can just be:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>   pair_fusion_bb_info bb_info (bb, this);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> (or you can pass *this if you make bb_info take a reference).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I don't think there's a particular need to change the variable name
> >>>>>>>>>>>>>>> (bb_state -> bb_info). I chose the former because it doesn't clash
> >>>>>>>>>>>>>>> with the RTL-SSA structure of the same name as the latter.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Addressed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>    for (auto insn : bb->nondebug_insns ())
> >>>>>>>>>>>>>>>>      {
> >>>>>>>>>>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>>>>>>          continue;
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>        rtx pat = PATTERN (rti);
> >>>>>>>>>>>>>>>> -      if (reload_completed
> >>>>>>>>>>>>>>>> -          && aarch64_ldp_writeback > 1
> >>>>>>>>>>>>>>>> -          && GET_CODE (pat) == PARALLEL
> >>>>>>>>>>>>>>>> -          && XVECLEN (pat, 0) == 2)
> >>>>>>>>>>>>>>>> +      if (pair_mem_promote_writeback_p (pat))
> >>>>>>>>>>>>>>>>          try_promote_writeback (insn);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It looks like try_promote_writeback itself will need some further work
> >>>>>>>>>>>>>>> to make it target-independent. I suppose this check:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>   auto rti = insn->rtl ();
> >>>>>>>>>>>>>>>   const auto attr = get_attr_ldpstp (rti);
> >>>>>>>>>>>>>>>   if (attr == LDPSTP_NONE)
> >>>>>>>>>>>>>>>     return;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>   bool load_p = (attr == LDPSTP_LDP);
> >>>>>>>>>>>>>>>   gcc_checking_assert (load_p || attr == LDPSTP_STP);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> will need to become part of the pair_mem_promote_writeback_p hook that you
> >>>>>>>>>>>>>>> added, potentially changing it to return a boolean for load_p.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Then I guess we will need hooks for destructuring the pair insn and
> >>>>>>>>>>>>>>> another hook to wrap aarch64_gen_writeback_pair.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Addressed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>        if (GET_CODE (pat) != SET)
> >>>>>>>>>>>>>>>>          continue;
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>        if (track_stores && MEM_P (XEXP (pat, 0)))
> >>>>>>>>>>>>>>>> -        bb_state.track_access (insn, false, XEXP (pat, 0));
> >>>>>>>>>>>>>>>> +        bb_info.track_access (insn, false, XEXP (pat, 0));
> >>>>>>>>>>>>>>>>        else if (track_loads && MEM_P (XEXP (pat, 1)))
> >>>>>>>>>>>>>>>> -        bb_state.track_access (insn, true, XEXP (pat, 1));
> >>>>>>>>>>>>>>>> +        bb_info.track_access (insn, true, XEXP (pat, 1));
> >>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -  bb_state.transform ();
> >>>>>>>>>>>>>>>> -  bb_state.cleanup_tombstones ();
> >>>>>>>>>>>>>>>> +  bb_info.transform ();
> >>>>>>>>>>>>>>>> +  bb_info.cleanup_tombstones ();
> >>>>>>>>>>>>>>>>  }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  void ldp_fusion ()
> >>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>>    ldp_fusion_init ();
> >>>>>>>>>>>>>>>> +  pair_fusion *pfuse;
> >>>>>>>>>>>>>>>> +  aarch64_pair_fusion derived;
> >>>>>>>>>>>>>>>> +  pfuse = &derived;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is indeed the one place where I think it is acceptable to
> >>>>>>>>>>>>>>> instantiate aarch64_pair_fusion. But again there's no need to create a
> >>>>>>>>>>>>>>> pointer to the parent class, just call any function you like directly.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Addressed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>    for (auto bb : crtl->ssa->bbs ())
> >>>>>>>>>>>>>>>> -    ldp_fusion_bb (bb);
> >>>>>>>>>>>>>>>> +    pfuse->ldp_fusion_bb (bb);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think even the code to iterate over bbs should itself be a member
> >>>>>>>>>>>>>>> function of pair_fusion (say "run") and then that becomes part of the
> >>>>>>>>>>>>>>> generic code.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So this function would just become:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>   aarch64_pair_fusion pass;
> >>>>>>>>>>>>>>>   pass.run ();
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> and could be inlined into the caller.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Addressed.
> >>>>>>>>>>>>>>> Perhaps you could also add an early return like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>   if (!track_loads_p () && !track_stores_p ())
> >>>>>>>>>>>>>>>     return;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> in pair_fusion::run () and then remove the corresponding code from
> >>>>>>>>>>>>>>> pass_ldp_fusion::gate?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Addressed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>    ldp_fusion_destroy ();
> >>>>>>>>>>>>>>>>  }
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> 2.39.3
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Alex
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks & Regards
> >>>>>>>>>>>>>> Ajit
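
For reference, the structure suggested in the quoted review (a generic
pair_fusion class with target hooks, a run () entry point with an early
return, and an offset-alignment predicate) could look roughly like the
standalone sketch below. This is not GCC code and makes several assumptions:
offsets are plain int64_t rather than poly_int64, multiple_p is a simplified
stand-in for GCC's helper, run () takes a list of candidate offsets instead
of walking crtl->ssa->bbs (), and rs6000_pair_fusion and the enable flag are
hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for GCC's multiple_p, which really operates on
// poly_int values.  Assumption: plain integers are enough for the sketch.
static bool multiple_p (int64_t value, int64_t factor)
{
  return factor != 0 && value % factor == 0;
}

// Generic driver.  Target policy is expressed through virtual hooks.
struct pair_fusion
{
  virtual ~pair_fusion () = default;
  virtual bool track_loads_p () const = 0;
  virtual bool track_stores_p () const = 0;
  // Vet the base offset of a candidate pair for this target.
  virtual bool pair_offset_alignment_ok_p (int64_t offset,
                                           int64_t access_size) const = 0;

  // Toy replacement for the per-BB walk: returns how many candidate
  // base offsets the target would accept.
  int run (const std::vector<int64_t> &offsets, int64_t access_size) const
  {
    assert (access_size > 0);
    // Early return when the target tracks neither loads nor stores.
    if (!track_loads_p () && !track_stores_p ())
      return 0;

    int accepted = 0;
    for (int64_t offset : offsets)
      if (pair_offset_alignment_ok_p (offset, access_size))
        ++accepted;
    return accepted;
  }
};

// aarch64-style policy: the offset must be a multiple of the access size.
struct aarch64_pair_fusion : pair_fusion
{
  explicit aarch64_pair_fusion (bool enable = true) : m_enable (enable) {}
  bool track_loads_p () const override { return m_enable; }
  bool track_stores_p () const override { return m_enable; }
  bool pair_offset_alignment_ok_p (int64_t offset,
                                   int64_t access_size) const override
  {
    return multiple_p (offset, access_size);
  }
  bool m_enable;  // Hypothetical flag standing in for the tuning policy.
};

// Hypothetical rs6000-style policy: the offset must be a multiple of the
// size of the whole pair, i.e. twice the access size.
struct rs6000_pair_fusion : pair_fusion
{
  bool track_loads_p () const override { return true; }
  bool track_stores_p () const override { return true; }
  bool pair_offset_alignment_ok_p (int64_t offset,
                                   int64_t access_size) const override
  {
    return multiple_p (offset, access_size * 2);
  }
};
```

With this shape the pass body reduces to instantiating the derived class and
calling run () on it, with no pointer to the parent class needed, as
suggested above.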