From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 9 Apr 2024 16:09:32 +0100
From: Alex Coplan
To: Ajit Agarwal
Cc: Richard Sandiford, "Kewen.Lin", Segher Boessenkool, Michael Meissner,
	David Edelsohn, Peter Bergner, gcc-patches
Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent
	changed code in one file
Content-Type: text/plain; charset=utf-8
MIME-Version: 1.0
On 09/04/2024 20:01, Ajit Agarwal wrote:
> Hello Alex:
>
> On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>> Hello Alex/Richard:
> >>>>
> >>>> All review comments are incorporated.
> >>>
> >>> Thanks, I was kind-of expecting you to also send the renaming patch as
> >>> a preparatory patch, as we discussed.
> >>>
> >>> Sorry for another meta comment, but: I think the reason that the Linaro
> >>> CI isn't running tests on your patches is actually because you're
> >>> sending 1/3 of a series but not sending the rest of the series.
> >>>
> >>> So please can you either send this as an individual preparatory patch
> >>> (not marked as a series) or, if you're going to send a series (e.g. with
> >>> a preparatory rename patch as 1/2 and this as 2/2), send the entire
> >>> series when you make updates.  That way the CI should test your patches,
> >>> which would be helpful.
> >>
> >> Addressed.
> >>
> >>>> Common infrastructure of load/store pair fusion is divided into target
> >>>> independent and target dependent changed code.
> >>>>
> >>>> Target independent code is the generic code, with pure virtual functions
> >>>> to interface between target independent and dependent code.
> >>>>
> >>>> Target dependent code is the implementation of the pure virtual functions
> >>>> for the aarch64 target and the call to target independent code.
> >>>>
> >>>> Thanks & Regards
> >>>> Ajit
> >>>>
> >>>>
> >>>> aarch64: Place target independent and dependent changed code in one file
> >>>>
> >>>> Common infrastructure of load/store pair fusion is divided into target
> >>>> independent and target dependent changed code.
> >>>>
> >>>> Target independent code is the generic code, with pure virtual functions
> >>>> to interface between target independent and dependent code.
> >>>>
> >>>> Target dependent code is the implementation of the pure virtual functions
> >>>> for the aarch64 target and the call to target independent code.
> >>>>
> >>>> 2024-04-06  Ajit Kumar Agarwal
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>> 	* config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>>> 	independent and dependent changed code.
> >>>
> >>> You're going to need a proper ChangeLog eventually, but I guess there's
> >>> no need for that right now.
> >>>
> >>>> ---
> >>>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++++++++++++++--------
> >>>>  1 file changed, 249 insertions(+), 122 deletions(-)
> >>>>
> >>>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> index 22ed95eb743..cb21b514ef7 100644
> >>>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> @@ -138,8 +138,122 @@ struct alt_base
> >>>>    poly_int64 offset;
> >>>>  };
> >>>>
> >>>> +// Virtual base class for load/store walkers used in alias analysis.
> >>>> +struct alias_walker
> >>>> +{
> >>>> +  virtual bool conflict_p (int &budget) const = 0;
> >>>> +  virtual insn_info *insn () const = 0;
> >>>> +  virtual bool valid () const = 0;
> >>>
> >>> Heh, looking at this made me realise there is a whitespace bug here in
> >>> the existing code (double space after const).  Sorry about that!  I'll
> >>> push an obvious fix for that.
> >>>
> >>>> +  virtual void advance () = 0;
> >>>> +};
> >>>> +
> >>>> +struct pair_fusion {
> >>>> +
> >>>> +  pair_fusion () {};
> >>>
> >>> This ctor looks pointless at the moment.  Perhaps instead we could put
> >>> the contents of ldp_fusion_init in here and then delete that function?
> >>>
> >>
> >> Addressed.
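[Editor's note: a minimal, self-contained C++ sketch of the RAII shape the review asks for above — folding the `ldp_fusion_init`/`ldp_fusion_destroy` helpers into the pass object's ctor/dtor. The type and counter below are hypothetical stand-ins, not the actual GCC code.]

```cpp
#include <cassert>

// Hypothetical sketch: once setup/teardown live in the ctor/dtor, creating
// the pass object anywhere gives correct init/cleanup automatically.
struct pass_resource_sketch
{
  static int live;                      // stands in for global pass state
  pass_resource_sketch () { ++live; }   // was: ldp_fusion_init ()
  ~pass_resource_sketch () { --live; }  // was: ldp_fusion_destroy ()
};
int pass_resource_sketch::live = 0;
```

The payoff is that callers can no longer forget the teardown call: the destructor runs when the pass object goes out of scope.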
> >>
> >>>> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >>>> +			    bool load_p) = 0;
> >>>
> >>> Please can we have comments above each of these virtual functions
> >>> describing any parameters, what the purpose of the hook is, and the
> >>> interpretation of the return value?  This will serve as the
> >>> documentation for other targets that want to make use of the pass.
> >>>
> >>> It might make sense to have a default-false implementation for
> >>> fpsimd_op_p, especially if you don't want to make use of this bit for
> >>> rs6000.
> >>>
> >>
> >> Addressed.
> >>
> >>>> +
> >>>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> >>>> +  virtual bool pair_trailing_writeback_p () = 0;
> >>>
> >>> Sorry for the run-around, but: I think this and
> >>> handle_writeback_opportunities () should be the same function, either
> >>> returning an enum or taking an enum and returning a boolean.
> >>>
> >>> At a minimum they should have more similar sounding names.
> >>>
> >>
> >> Addressed.
> >>
> >>>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> >>>> +				      machine_mode mem_mode) = 0;
> >>>> +  virtual int pair_mem_alias_check_limit () = 0;
> >>>> +  virtual bool handle_writeback_opportunities () = 0 ;
> >>>> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> >>>> +					machine_mode mode) = 0;
> >>>> +  virtual rtx gen_mem_pair (rtx *pats,  rtx writeback,
> >>>
> >>> Nit: excess whitespace after pats,
> >>>
> >>>> +			    bool load_p) = 0;
> >>>> +  virtual bool pair_mem_promote_writeback_p (rtx pat) = 0;
> >>>> +  virtual bool track_load_p () = 0;
> >>>> +  virtual bool track_store_p () = 0;
> >>>
> >>> I think it would probably make more sense for these two to have
> >>> default-true implementations rather than being pure virtual functions.
> >>>
> >>> Also, sorry for the bikeshedding, but please can we keep the plural
> >>> names (so track_loads_p and track_stores_p).
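[Editor's note: a self-contained C++ sketch of the default-implementation pattern suggested above — default-true `track_loads_p`/`track_stores_p` and a default-false `fpsimd_op_p` — so a new target only overrides the hooks it actually needs. All names here are illustrative models of the patch's interface, not the actual GCC code.]

```cpp
#include <cassert>

// Illustrative base class: virtual hooks with sensible defaults instead of
// pure virtuals, so e.g. an rs6000 port can ignore the FP/SIMD split.
struct pair_fusion_sketch
{
  virtual bool track_loads_p () { return true; }   // default-true
  virtual bool track_stores_p () { return true; }  // default-true
  virtual bool fpsimd_op_p () { return false; }    // default-false
  virtual ~pair_fusion_sketch () {}
};

// A hypothetical target that deviates from only one default.
struct stores_only_target : pair_fusion_sketch
{
  bool track_loads_p () override { return false; }
};
```

Calling through a base reference still dispatches to the override, so generic pass code never needs to know which target it is driving.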
> >>>
> >>>> +  virtual bool cand_insns_empty_p (std::list &insns) = 0;
> >>>
> >>> Why does this need to be virtualised?  I would expect it to
> >>> just be insns.empty () on all targets.
> >>>
> >>>> +  virtual bool pair_mem_in_range_p (HOST_WIDE_INT off) = 0;
> >>>> +  void ldp_fusion_bb (bb_info *bb);
> >>>
> >>> Just to check, are you planning to rename this function in a separate
> >>> patch?
> >>>
> >>
> >> Yes.
> >>
> >>>> +
> >>>> +  ~pair_fusion() { }
> >>>
> >>> As with the ctor, perhaps we could put the contents of ldp_fusion_destroy
> >>> in here and then delete that function?
> >>>
> >>
> >> Addressed.
> >>
> >>>> +};
> >>>> +
> >>>> +struct aarch64_pair_fusion : public pair_fusion
> >>>> +{
> >>>> +public:
> >>>
> >>
> >> Addressed.
> >>
> >>> Nit: there shouldn't be a need for this as public is already the default
> >>> for structs.
> >>>
> >>>> +  aarch64_pair_fusion () : pair_fusion () {};
> >>>
> >>> This ctor looks pointless, the default one should do, so can we remove
> >>> this?
> >>>
> >>
> >> Addressed.
> >>
> >>>> +  bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >>>> +		    bool load_p) override final
> >>>> +  {
> >>>> +    const bool fpsimd_op_p
> >>>> +      = reload_completed
> >>>> +      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> >>>> +      : (GET_MODE_CLASS (mem_mode) != MODE_INT
> >>>> +	 && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> >>>> +    return fpsimd_op_p;
> >>>
> >>> Can this just directly return the boolean expression instead of storing
> >>> it into a variable first?
> >>>
> >> Addressed.
> >>>> +  }
> >>>> +
> >>>> +  bool pair_mem_promote_writeback_p (rtx pat)
> >>>> +  {
> >>>> +    if (reload_completed
> >>>> +	&& aarch64_ldp_writeback > 1
> >>>> +	&& GET_CODE (pat) == PARALLEL
> >>>> +	&& XVECLEN (pat, 0) == 2)
> >>>> +      return true;
> >>>> +
> >>>> +    return false;
> >>>> +  }
> >>>> +
> >>>> +  bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> >>>> +				machine_mode mode)
> >>>> +  {
> >>>> +    return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
> >>>> +						    load_p,
> >>>> +						    mode);
> >>>> +  }
> >>>> +  bool pair_operand_mode_ok_p (machine_mode mode);
> >>>
> >>> Why have you implemented this (one-line function) out-of-line but the
> >>> others are implemented inline?  Please be consistent.
> >>>
> >>
> >> Addressed.
> >>
> >>>> +
> >>>> +  rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
> >>>> +
> >>>> +  bool pair_trailing_writeback_p ()
> >>>> +  {
> >>>> +    return aarch64_ldp_writeback > 1;
> >>>> +  }
> >>>> +  bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> >>>> +			      machine_mode mem_mode)
> >>>> +  {
> >>>> +    return (load_p
> >>>> +	    ? aarch64_ldp_reg_operand (reg_op, mem_mode)
> >>>> +	    : aarch64_stp_reg_operand (reg_op, mem_mode));
> >>>> +  }
> >>>> +  int pair_mem_alias_check_limit ()
> >>>> +  {
> >>>> +    return aarch64_ldp_alias_check_limit;
> >>>> +  }
> >>>> +  bool handle_writeback_opportunities ()
> >>>> +  {
> >>>> +    return aarch64_ldp_writeback;
> >>>> +  }
> >>>> +  bool track_load_p ()
> >>>> +  {
> >>>> +    const bool track_loads
> >>>> +      = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> +    return track_loads;
> >>>> +  }
> >>>
> >>
> >> Addressed.
> >>
> >>> As above, can this just return the boolean expression directly without
> >>> the temporary?
> >>>
> >>>> +  bool track_store_p ()
> >>>> +  {
> >>>> +    const bool track_stores
> >>>> +      = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> +    return track_stores;
> >>>> +  }
> >>>
> >>
> >> Addressed.
> >>
> >>> Likewise.
> >>>
> >>>> +  bool cand_insns_empty_p (std::list &insns)
> >>>> +  {
> >>>> +    return insns.empty();
> >>>> +  }
> >>>
> >>> As mentioned on the declaration, I don't see why this needs to be
> >>> virtualised.
> >>>
> >>>> +  bool pair_mem_in_range_p (HOST_WIDE_INT off)
> >>>> +  {
> >>>> +    return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
> >>>> +  }
> >>>
> >>> Hmm, this looks to test exactly the opposite condition from what the
> >>> name suggests, i.e. the implementation looks like what I would expect
> >>> for a function called pair_mem_out_of_range_p.
> >>>
> >>> Please can you invert the condition and fix up callers appropriately?
> >>>
> >>> This suggests you're not really thinking about the code you're writing,
> >>> please try to think a bit more carefully before posting patches.
> >>>
> >>
> >> Sorry for my mistake as I have overlooked it.
> >>
> >>>> +};
> >>>> +
> >>>> +
> >>>> // State used by the pass for a given basic block.
> >>>> -struct ldp_bb_info
> >>>> +struct pair_fusion_bb_info
> >>>> {
> >>>>   using def_hash = nofree_ptr_hash;
> >>>>   using expr_key_t = pair_hash>;
> >>>> @@ -160,14 +274,17 @@ struct ldp_bb_info
> >>>>
> >>>>   static const size_t obstack_alignment = sizeof (void *);
> >>>>   bb_info *m_bb;
> >>>> +  pair_fusion *bb_state;
> >>>
> >>
> >> Addressed.
> >>
> >>> Can we call this m_pass instead of bb_state?  The pair_fusion class is
> >>> explicitly at the pass level rather than the bb level.
> >>>
> >>> Also I think this (and indeed m_bb, although that's pre-existing) could
> >>> be private members.  Not sure there's a good reason for making them
> >>> public.
> >>>
> >>
> >> Addressed.
> >>
> >>> It might be slightly nicer to use a reference for m_pass unless there's
> >>> a good reason to use a pointer, although I don't feel strongly about
> >>> this.
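[Editor's note: a self-contained C++ sketch of the predicate inversion requested above — the name `pair_mem_in_range_p` should return true when the offset *is* in range, with callers bailing out on the negation. The bounds below are illustrative stand-ins for `LDP_MIN_IMM`/`LDP_MAX_IMM`, not the actual aarch64 values.]

```cpp
#include <cassert>

// Illustrative immediate-range bounds (stand-ins, not the real constants).
constexpr long MIN_IMM = -64;
constexpr long MAX_IMM = 63;

// Name now matches behaviour: true iff OFF is encodable in a pair insn.
bool pair_mem_in_range_p (long off)
{
  return off >= MIN_IMM && off <= MAX_IMM;
}

// A caller that previously punted when the old (inverted) predicate was
// true now punts on the negation.
bool punt_on_offset_p (long off)
{
  return !pair_mem_in_range_p (off);
}
```

With the positive sense, both the declaration and every call site read the way the name suggests, which is exactly what the review is after.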
> >>>
> >>>>
> >>>> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> >>>> +  pair_fusion_bb_info (bb_info *bb,
> >>>> +		       aarch64_pair_fusion *d) : m_bb (bb),
> >>>
> >>> This generic class needs to take the generic pass type, i.e. just a
> >>> pair_fusion * rather than an aarch64_pair_fusion *.
> >>>
> >>>> +		       bb_state (d), m_emitted_tombstone (false)
> >>>>   {
> >>>>     obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
> >>>> 				obstack_alignment, obstack_chunk_alloc,
> >>>> 				obstack_chunk_free);
> >>>>   }
> >>>> -  ~ldp_bb_info ()
> >>>> +  ~pair_fusion_bb_info ()
> >>>>   {
> >>>>     obstack_free (&m_obstack, nullptr);
> >>>>
> >>>> @@ -177,10 +294,32 @@ struct ldp_bb_info
> >>>>       bitmap_obstack_release (&m_bitmap_obstack);
> >>>>     }
> >>>>   }
> >>>> +  void track_access (insn_info *, bool load, rtx mem);
> >>>> +  void transform ();
> >>>> +  void cleanup_tombstones ();
> >>>
> >>> I think most (or indeed all?) of the changes here shouldn't be needed.
> >>> AFAICS, there shouldn't be a need to:
> >>>  - drop inline from these functions.
> >>>  - change the accessibility of any members.
> >>>
> >>
> >> Addressed.
> >>
> >>> Please can you drop these changes or explain why they're needed (from
> >>> here until the end of the class).
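[Editor's note: a self-contained C++ sketch of the two ctor suggestions above — the bb-level state should accept the *generic* pass type, and a reference member reads slightly better than a pointer. Type and member names are illustrative models of the patch, not the actual GCC code.]

```cpp
#include <cassert>

// Generic pass interface (stand-in for pair_fusion).
struct pass_sketch
{
  virtual int alias_check_limit () = 0;
  virtual ~pass_sketch () {}
};

// Target-specific pass (stand-in for aarch64_pair_fusion).
struct target_pass : pass_sketch
{
  int alias_check_limit () override { return 8; }
};

// Per-bb state takes the base type by reference, as the review suggests,
// so the same bb code serves any target's derived pass.
struct bb_info_sketch
{
  explicit bb_info_sketch (pass_sketch &pass) : m_pass (pass) {}
  int limit () { return m_pass.alias_check_limit (); }
private:
  pass_sketch &m_pass;  // m_pass, private, reference: all three suggestions
};
```

Because the constructor parameter is the base type, swapping in a different target's pass object requires no change to the bb-level code.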
> >>>
> >>>> +  void merge_pairs (insn_list_t &, insn_list_t &,
> >>>> +		    bool load_p, unsigned access_size);
> >>>> +  void transform_for_base (int load_size, access_group &group);
> >>>> +
> >>>> +  bool try_fuse_pair (bool load_p, unsigned access_size,
> >>>> +		      insn_info *i1, insn_info *i2);
> >>>> +
> >>>> +  bool fuse_pair (bool load_p, unsigned access_size,
> >>>> +		  int writeback,
> >>>> +		  insn_info *i1, insn_info *i2,
> >>>> +		  base_cand &base,
> >>>> +		  const insn_range_info &move_range);
> >>>>
> >>>> -  inline void track_access (insn_info *, bool load, rtx mem);
> >>>> -  inline void transform ();
> >>>> -  inline void cleanup_tombstones ();
> >>>> +  void do_alias_analysis (insn_info *alias_hazards[4],
> >>>> +			  alias_walker *walkers[4],
> >>>> +			  bool load_p);
> >>>> +
> >>>> +  void track_tombstone (int uid);
> >>>> +
> >>>> +  bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> >>>> +
> >>>> +  template
> >>>> +  void traverse_base_map (Map &map);
> >>>>
> >>>> private:
> >>>>   obstack m_obstack;
> >>>> @@ -191,30 +330,32 @@ private:
> >>>>   bool m_emitted_tombstone;
> >>>>
> >>>>   inline splay_tree_node *node_alloc (access_record *);
> >>>> +};
> >>>>
> >>>> -  template
> >>>> -  inline void traverse_base_map (Map &map);
> >>>> -  inline void transform_for_base (int load_size, access_group &group);
> >>>> -
> >>>> -  inline void merge_pairs (insn_list_t &, insn_list_t &,
> >>>> -			   bool load_p, unsigned access_size);
> >>>> -
> >>>> -  inline bool try_fuse_pair (bool load_p, unsigned access_size,
> >>>> -			     insn_info *i1, insn_info *i2);
> >>>> -
> >>>> -  inline bool fuse_pair (bool load_p, unsigned access_size,
> >>>> -			 int writeback,
> >>>> -			 insn_info *i1, insn_info *i2,
> >>>> -			 base_cand &base,
> >>>> -			 const insn_range_info &move_range);
> >>>> -
> >>>> -  inline void track_tombstone (int uid);
> >>>> +rtx aarch64_pair_fusion::gen_mem_pair (rtx *pats,
> >>>
> >>> Style nit: newline after rtx.
> >>>
> >> Addressed.
> >>
> >>>> +				       rtx writeback,
> >>>> +				       bool load_p)
> >>>> +  {
> >>>> +    rtx pair_pat;
> >>>>
> >>>> -  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> >>>> -};
> >>>> +    if (writeback)
> >>>> +      {
> >>>> +	auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
> >>>> +	pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
> >>>> +      }
> >>>> +    else if (load_p)
> >>>> +      pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
> >>>> +					XEXP (pats[1], 0),
> >>>> +					XEXP (pats[0], 1));
> >>>> +    else
> >>>> +      pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
> >>>> +					 XEXP (pats[0], 1),
> >>>> +					 XEXP (pats[1], 1));
> >>>> +    return pair_pat;
> >>>
> >>> Can the assignments just be direct returns here, please?  Then
> >>> get rid of the temporary.
> >>>
> >>
> >> Addressed.
> >>
> >>>> +  }
> >>>>
> >>>> splay_tree_node *
> >>>> -ldp_bb_info::node_alloc (access_record *access)
> >>>> +pair_fusion_bb_info::node_alloc (access_record *access)
> >>>> {
> >>>>   using T = splay_tree_node;
> >>>>   void *addr = obstack_alloc (&m_obstack, sizeof (T));
> >>>> @@ -262,7 +403,7 @@ drop_writeback (rtx mem)
> >>>> // RTX_AUTOINC addresses.  The interface is like strip_offset except we take a
> >>>> // MEM so that we know the mode of the access.
> >>>> static rtx
> >>>> -ldp_strip_offset (rtx mem, poly_int64 *offset)
> >>>> +pair_mem_strip_offset (rtx mem, poly_int64 *offset)
> >>>
> >>> I thought we discussed splitting the renaming into a separate patch?
> >>>
> >>>> {
> >>>>   rtx addr = XEXP (mem, 0);
> >>>>
> >>>> @@ -332,6 +473,12 @@ ldp_operand_mode_ok_p (machine_mode mode)
> >>>>   return reload_completed || mode != TImode;
> >>>> }
> >>>>
> >>>> +bool
> >>>> +aarch64_pair_fusion::pair_operand_mode_ok_p (machine_mode mode)
> >>>> +{
> >>>> +  return ldp_operand_mode_ok_p (mode);
> >>>> +}
> >>>> +
> >>>> // Given LFS (load_p, fpsimd_p, size) fields in FIELDS, encode these
> >>>> // into an integer for use as a hash table key.
> >>>> static int
> >>>> @@ -396,7 +543,7 @@ access_group::track (Alloc alloc_node, poly_int64 offset, insn_info *insn)
> >>>> // MEM_EXPR base (i.e. a tree decl) relative to which we can track the access.
> >>>> // LFS is used as part of the key to the hash table, see track_access.
> >>>>  bool
> >>>> -ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>> +pair_fusion_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>
> >>> Again, please do any renaming in a separate patch.
> >>>
> >>
> >> Addressed.
> >>
> >>>> {
> >>>>   if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem))
> >>>>     return false;
> >>>> @@ -412,9 +559,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>>   const machine_mode mem_mode = GET_MODE (mem);
> >>>>   const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
> >>>>
> >>>> -  // Punt on misaligned offsets.  LDP/STP instructions require offsets to be a
> >>>> -  // multiple of the access size, and we believe that misaligned offsets on
> >>>> -  // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
> >>>> +  // Punt on misaligned offsets.  Paired memory access instructions require
> >>>> +  // offsets to be a multiple of the access size, and we believe that
> >>>> +  // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
> >>>> +  // offsets w.r.t. RTL bases.
> >>>>   if (!multiple_p (offset, mem_size))
> >>>>     return false;
> >>>>
> >>>> @@ -438,46 +586,38 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>> }
> >>>>
> >>>> // Main function to begin pair discovery.  Given a memory access INSN,
> >>>> -// determine whether it could be a candidate for fusing into an ldp/stp,
> >>>> +// determine whether it could be a candidate for fusing into an pair mem,
> >>>
> >>> s/an pair mem/a paired access/
> >>>
> >>>> // and if so, track it in the appropriate data structure for this basic
> >>>> // block.  LOAD_P is true if the access is a load, and MEM is the mem
> >>>> // rtx that occurs in INSN.
> >>>> void
> >>>> -ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> +pair_fusion_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> {
> >>>>   // We can't combine volatile MEMs, so punt on these.
> >>>>   if (MEM_VOLATILE_P (mem))
> >>>>     return;
> >>>>
> >>>> -  // Ignore writeback accesses if the param says to do so.
> >>>> -  if (!aarch64_ldp_writeback
> >>>> +  // Ignore writeback accesses if the param says to do so
> >>>
> >>
> >> Addressed.
> >>
> >>> This comment needs updating, I guess something like "if the hook says to
> >>> do so".  But also please don't delete the full stop.
> >>>
> >>>> +  if (!bb_state->handle_writeback_opportunities ()
> >>>>       && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
> >>>>     return;
> >>>>
> >>>>   const machine_mode mem_mode = GET_MODE (mem);
> >>>> -  if (!ldp_operand_mode_ok_p (mem_mode))
> >>>> +
> >>>
> >>> I don't think we need the extra whitespace here.
> >>>
> >>>> +  if (!bb_state->pair_operand_mode_ok_p (mem_mode))
> >>>>     return;
> >>>>
> >>>>   rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
> >>>>
> >>>> -  // Ignore the access if the register operand isn't suitable for ldp/stp.
> >>>> -  if (load_p
> >>>> -      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
> >>>> -      : !aarch64_stp_reg_operand (reg_op, mem_mode))
> >>>> +  if (!bb_state->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
> >>>>     return;
> >>>> -
> >>>>   // We want to segregate FP/SIMD accesses from GPR accesses.
> >>>>   //
> >>>>   // Before RA, we use the modes, noting that stores of constant zero
> >>>>   // operands use GPRs (even in non-integer modes).  After RA, we use
> >>>>   // the hard register numbers.
> >>>> -  const bool fpsimd_op_p
> >>>> -    = reload_completed
> >>>> -    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> >>>> -    : (GET_MODE_CLASS (mem_mode) != MODE_INT
> >>>> -       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> >>>> -
> >>>> -  // Note ldp_operand_mode_ok_p already rejected VL modes.
> >>>> +  const bool fpsimd_op_p = bb_state->fpsimd_op_p (reg_op, mem_mode, load_p);
> >>>> +  // Note pair_operand_mode_ok_p already rejected VL modes.
> >>>>   const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
> >>>>   const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
> >>>>
> >>>> @@ -487,7 +627,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>>   poly_int64 mem_off;
> >>>>   rtx addr = XEXP (mem, 0);
> >>>>   const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
> >>>> -  rtx base = ldp_strip_offset (mem, &mem_off);
> >>>> +  rtx base = pair_mem_strip_offset (mem, &mem_off);
> >>>
> >>> Again, let's split the renaming off into a separate patch.
> >>>
> >>
> >> Addressed.
> >>
> >>>>   if (!REG_P (base))
> >>>>     return;
> >>>>
> >>>> @@ -506,8 +646,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>>   // elimination offset pre-RA, we should postpone forming pairs on such
> >>>>   // accesses until after RA.
> >>>>   //
> >>>> -  // As it stands, addresses with offsets in range for LDR but not
> >>>> -  // in range for LDP/STP are currently reloaded inefficiently,
> >>>> +  // As it stands, addresses in range for an individual load/store but not
> >>>> +  // for a paired access are currently reloaded inefficiently,
> >>>>   // ending up with a separate base register for each pair.
> >>>>   //
> >>>>   // In theory LRA should make use of
> >>>> @@ -519,8 +659,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>>   // that calls targetm.legitimize_address_displacement.
> >>>>   //
> >>>>   // So for now, it's better to punt when we can't be sure that the
> >>>> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
> >>>> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
> >>>> +  // offset is in range for paired access.  Out-of-range cases can then be
> >>>> +  // handled after RA by the out-of-range PAIR MEM peepholes.  Eventually, it
> >>>>   // would be nice to handle known out-of-range opportunities in the
> >>>>   // pass itself (for stack accesses, this would be in the post-RA pass).
> >>>>   if (!reload_completed
> >>>> @@ -573,8 +713,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>>     gcc_unreachable (); // Base defs should be unique.
> >>>>   }
> >>>>
> >>>> -  // Punt on misaligned offsets.  LDP/STP require offsets to be a multiple of
> >>>> -  // the access size.
> >>>> +  // Punt on misaligned offsets.  Paired memory accesses require offsets
> >>>> +  // to be a multiple of the access size.
> >>>>   if (!multiple_p (mem_off, mem_size))
> >>>>     return;
> >>>>
> >>>> @@ -614,7 +754,7 @@ static bool no_ignore (insn_info *) { return false; }
> >>>> // making use of alias disambiguation.
> >>>> static insn_info *
> >>>> latest_hazard_before (insn_info *insn, rtx *ignore,
> >>>> -		      insn_info *ignore_insn = nullptr)
> >>>> +		      insn_info *ignore_insn = 0)
> >>>
> >>> Is this change needed?
> >>>
> >>
> >> Addressed.
> >>>> { > >>>> insn_info *result = nullptr; > >>>> > >>>> @@ -1150,7 +1290,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed) > >>>> const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC; > >>>> > >>>> poly_int64 offset; > >>>> - rtx this_base = ldp_strip_offset (mem, &offset); > >>>> + rtx this_base = pair_mem_strip_offset (mem, &offset); > >>>> gcc_assert (REG_P (this_base)); > >>>> if (base_reg) > >>>> gcc_assert (rtx_equal_p (base_reg, this_base)); > >>>> @@ -1286,7 +1426,11 @@ find_trailing_add (insn_info *insns[2], > >>>> > >>>> off_hwi /= access_size; > >>>> > >>>> - if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM) > >>>> + pair_fusion *pfuse; > >>>> + aarch64_pair_fusion derived; > >>>> + pfuse = &derived; > >>> > >>> Huh? We'll want to use the same instance of the pass structure (pair_fusion) > >>> that gets created at the top level, so that should get passed around everywhere. > >>> > >>> In the case of find_trailing_add we probably don't want to add any more > >>> parameters (as Segher noted there are already too many), so perhaps the > >>> easiest thing would be to make find_trailing_add itself a (non-virtual) > >>> member function of pair_fusion. > >>> > >>> Then here we can just query pair_mem_in_range_p directly. > >>> > >>> For try_promote_writeback itself you can either make it a non-virtual > >>> member function of pair_fusion or just pass a reference to the pass > >>> object in when you call it. > >>> > >>> Note also that you don't have to create a pointer of the parent type in > >>> order to call a virtual function on a derived object, you can just call > >>> it directly. > >>> > >> > >> Addressed. > >>>> + > >>>> + if (pfuse->pair_mem_in_range_p (off_hwi)) > >>>> return nullptr; > >>> > >>> Note of course this condition needs inverting when you invert the > >>> implementation. > >>> > >> > >> Addressed. 
> >>>> > >>>> auto dump_prefix = [&]() > >>>> @@ -1328,7 +1472,7 @@ find_trailing_add (insn_info *insns[2], > >>>> // We just emitted a tombstone with uid UID, track it in a bitmap for > >>>> // this BB so we can easily identify it later when cleaning up tombstones. > >>>> void > >>>> -ldp_bb_info::track_tombstone (int uid) > >>>> +pair_fusion_bb_info::track_tombstone (int uid) > >>> > >>> Again, please split the renaming off. Same with other instances below. > >>> > >>>> { > >>>> if (!m_emitted_tombstone) > >>>> { > >>>> @@ -1528,7 +1672,7 @@ fixup_debug_uses (obstack_watermark &attempt, > >>>> gcc_checking_assert (GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) > >>>> == RTX_AUTOINC); > >>>> > >>>> - base = ldp_strip_offset (mem, &offset); > >>>> + base = pair_mem_strip_offset (mem, &offset); > >>>> gcc_checking_assert (REG_P (base) && REGNO (base) == base_regno); > >>>> } > >>>> fixup_debug_use (attempt, use, def, base, offset); > >>>> @@ -1664,7 +1808,7 @@ fixup_debug_uses (obstack_watermark &attempt, > >>>> // BASE gives the chosen base candidate for the pair and MOVE_RANGE is > >>>> // a singleton range which says where to place the pair. > >>>> bool > >>>> -ldp_bb_info::fuse_pair (bool load_p, > >>>> +pair_fusion_bb_info::fuse_pair (bool load_p, > >>>> unsigned access_size, > >>>> int writeback, > >>>> insn_info *i1, insn_info *i2, > >>>> @@ -1800,7 +1944,7 @@ ldp_bb_info::fuse_pair (bool load_p, > >>>> { > >>>> if (dump_file) > >>>> fprintf (dump_file, > >>>> - " ldp: i%d has wb but subsequent i%d has non-wb " > >>>> + " load pair: i%d has wb but subsequent i%d has non-wb " > >>>> "update of base (r%d), dropping wb\n", > >>>> insns[0]->uid (), insns[1]->uid (), base_regno); > >>>> gcc_assert (writeback_effect); > >>>> @@ -1823,7 +1967,7 @@ ldp_bb_info::fuse_pair (bool load_p, > >>>> } > >>>> > >>>> // If either of the original insns had writeback, but the resulting pair insn > >>>> - // does not (can happen e.g. 
in the ldp edge case above, or if the writeback > >>>> + // does not (can happen e.g. in the load pair edge case above, or if the writeback > >>>> // effects cancel out), then drop the def(s) of the base register as > >>>> // appropriate. > >>>> // > >>>> @@ -1842,7 +1986,7 @@ ldp_bb_info::fuse_pair (bool load_p, > >>>> // update of the base register and try and fold it in to make this into a > >>>> // writeback pair. > >>>> insn_info *trailing_add = nullptr; > >>>> - if (aarch64_ldp_writeback > 1 > >>>> + if (bb_state->pair_trailing_writeback_p () > >>>> && !writeback_effect > >>>> && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1, > >>>> XEXP (pats[0], 0), nullptr) > >>>> @@ -1863,14 +2007,14 @@ ldp_bb_info::fuse_pair (bool load_p, > >>>> } > >>>> > >>>> // Now that we know what base mem we're going to use, check if it's OK > >>>> - // with the ldp/stp policy. > >>>> + // with the pair mem policy. > >>>> rtx first_mem = XEXP (pats[0], load_p); > >>>> - if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem, > >>>> - load_p, > >>>> - GET_MODE (first_mem))) > >>>> + if (!bb_state->pair_mem_ok_with_policy (first_mem, > >>>> + load_p, > >>>> + GET_MODE (first_mem))) > >>>> { > >>>> if (dump_file) > >>>> - fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n", > >>>> + fprintf (dump_file, "punting on pair (%d,%d), pair mem policy says no\n", > >>>> i1->uid (), i2->uid ()); > >>>> return false; > >>>> } > >>>> @@ -1878,21 +2022,10 @@ ldp_bb_info::fuse_pair (bool load_p, > >>>> rtx reg_notes = combine_reg_notes (first, second, load_p); > >>>> > >>>> rtx pair_pat; > >>>> - if (writeback_effect) > >>>> - { > >>>> - auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]); > >>>> - pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec); > >>>> - } > >>>> - else if (load_p) > >>>> - pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0), > >>>> - XEXP (pats[1], 0), > >>>> - XEXP (pats[0], 1)); > >>>> - else > >>>> - pair_pat = 
aarch64_gen_store_pair (XEXP (pats[0], 0), > >>>> - XEXP (pats[0], 1), > >>>> - XEXP (pats[1], 1)); > >>>> > >>>> + pair_pat = bb_state->gen_mem_pair (pats, writeback_effect, load_p); > >>>> insn_change *pair_change = nullptr; > >>>> + > >>>> auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) { > >>>> rtx_insn *rti = change->insn ()->rtl (); > >>>> validate_unshare_change (rti, &PATTERN (rti), pair_pat, true); > >>>> @@ -2133,15 +2266,6 @@ load_modified_by_store_p (insn_info *load, > >>>> return false; > >>>> } > >>>> > >>>> -// Virtual base class for load/store walkers used in alias analysis. > >>>> -struct alias_walker > >>>> -{ > >>>> - virtual bool conflict_p (int &budget) const = 0; > >>>> - virtual insn_info *insn () const = 0; > >>>> - virtual bool valid () const = 0; > >>>> - virtual void advance () = 0; > >>>> -}; > >>>> - > >>>> // Implement some common functionality used by both store_walker > >>>> // and load_walker. > >>>> template > >>>> @@ -2259,13 +2383,13 @@ public: > >>>> // > >>>> // We try to maintain the invariant that if a walker becomes invalid, we > >>>> // set its pointer to null. > >>>> -static void > >>>> -do_alias_analysis (insn_info *alias_hazards[4], > >>>> +void > >>>> +pair_fusion_bb_info::do_alias_analysis (insn_info *alias_hazards[4], > >>> > >>> It would probably make more sense for this to be a (non-virtual) member > >>> function of the pair_fusion class, since it doesn't need to access the > >>> BB-specific state. > >>> > >> > >> Addressed. 
> >>>> alias_walker *walkers[4], > >>>> bool load_p) > >>>> { > >>>> const int n_walkers = 2 + (2 * !load_p); > >>>> - int budget = aarch64_ldp_alias_check_limit; > >>>> + int budget = bb_state->pair_mem_alias_check_limit (); > >>>> > >>>> auto next_walker = [walkers,n_walkers](int current) -> int { > >>>> for (int j = 1; j <= n_walkers; j++) > >>>> @@ -2365,7 +2489,7 @@ get_viable_bases (insn_info *insns[2], > >>>> { > >>>> const bool is_lower = (i == reversed); > >>>> poly_int64 poly_off; > >>>> - rtx base = ldp_strip_offset (cand_mems[i], &poly_off); > >>>> + rtx base = pair_mem_strip_offset (cand_mems[i], &poly_off); > >>>> if (GET_RTX_CLASS (GET_CODE (XEXP (cand_mems[i], 0))) == RTX_AUTOINC) > >>>> writeback |= (1 << i); > >>>> > >>>> @@ -2373,7 +2497,7 @@ get_viable_bases (insn_info *insns[2], > >>>> continue; > >>>> > >>>> // Punt on accesses relative to eliminable regs. See the comment in > >>>> - // ldp_bb_info::track_access for a detailed explanation of this. > >>>> + // pair_fusion_bb_info::track_access for a detailed explanation of this. > >>>> if (!reload_completed > >>>> && (REGNO (base) == FRAME_POINTER_REGNUM > >>>> || REGNO (base) == ARG_POINTER_REGNUM)) > >>>> @@ -2397,7 +2521,11 @@ get_viable_bases (insn_info *insns[2], > >>>> if (!is_lower) > >>>> base_off--; > >>>> > >>>> - if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM) > >>>> + pair_fusion *pfuse; > >>>> + aarch64_pair_fusion derived; > >>>> + pfuse = &derived; > >>> > >>> Again, please don't do this but instead use the same instance > >>> everywhere. > >>> > >> > >> Addressed. > >>>> + > >>>> + if (pfuse->pair_mem_in_range_p (base_off)) > >>>> continue; > >>>> > >>>> use_info *use = find_access (insns[i]->uses (), REGNO (base)); > >>>> @@ -2454,12 +2582,12 @@ get_viable_bases (insn_info *insns[2], > >>>> } > >>>> > >>>> // Given two adjacent memory accesses of the same size, I1 and I2, try > >>>> -// and see if we can merge them into a ldp or stp. 
> >>>> +// and see if we can merge them into a paired accesses load and store. > >>> > >>> "into a load pair or store pair" or "into a paired access" if you > >>> prefer. > >>> > >> > >> Addressed. > >>>> // > >>>> // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true > >>>> // if the accesses are both loads, otherwise they are both stores. > >>>> bool > >>>> -ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size, > >>>> +pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size, > >>>> insn_info *i1, insn_info *i2) > >>>> { > >>>> if (dump_file) > >>>> @@ -2494,7 +2622,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size, > >>>> { > >>>> if (dump_file) > >>>> fprintf (dump_file, > >>>> - "punting on ldp due to reg conflcits (%d,%d)\n", > >>>> + "punting on pair mem load due to reg conflcits (%d,%d)\n", > >>>> insns[0]->uid (), insns[1]->uid ()); > >>>> return false; > >>>> } > >>>> @@ -2843,7 +2971,7 @@ debug (const insn_list_t &l) > >>>> // we can't re-order them anyway, so provided earlier passes have cleaned up > >>>> // redundant loads, we shouldn't miss opportunities by doing this. > >>>> void > >>>> -ldp_bb_info::merge_pairs (insn_list_t &left_list, > >>>> +pair_fusion_bb_info::merge_pairs (insn_list_t &left_list, > >>>> insn_list_t &right_list, > >>>> bool load_p, > >>>> unsigned access_size) > >>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list, > >>>> // of accesses. If we find two sets of adjacent accesses, call > >>>> // merge_pairs. 
> >>>>  void
> >>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>> -				 access_group &group)
> >>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>> +					 access_group &group)
> >>>>  {
> >>>>    const auto lfs = decode_lfs (encoded_lfs);
> >>>>    const unsigned access_size = lfs.size;
> >>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>> 		       access.cand_insns,
> >>>> 		       lfs.load_p,
> >>>> 		       access_size);
> >>>> -	  skip_next = access.cand_insns.empty ();
> >>>> +	  skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >>>
> >>> As above, why is this needed?
> >>
> >> For rs6000 we want to always return true: the load/store pairs to be
> >> merged occur for rs6000 at offsets 8/16, 16/32 and 32/64, and we only
> >> want the pairs at 8/16 and 32/64.  That's why we want to always return
> >> true for rs6000, to skip pairs as above.
> >
> > Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 you have
> > load/store pair instructions where the two arms of the access are storing
> > operands of different sizes?  Or something else?
> >
> > As it stands the logic is to skip the next iteration only if we
> > exhausted all the candidate insns for the current access.  In the case
> > that we didn't exhaust all such candidates, then the idea is that when
> > access becomes prev_access, we can attempt to use those candidates as
> > the "left-hand side" of a pair in the next iteration since we failed to
> > use them as the "right-hand side" of a pair in the current iteration.
> > I don't see why you wouldn't want that behaviour.  Please can you
> > explain?
> >
>
> In merge_pairs we get the two load candidates: one load from offset 0
> and the other load from offset 16.  Then in the next iteration we get a
> load from offset 16 and another from offset 32.  In the next iteration
> we get a load from offset 32 and another from offset 48.
>
> For example:
>
> Currently we get the load candidates as follows.
>
> pairs:
>
> load from offset 0.
> load from offset 16.
>
> next pairs:
>
> load from offset 16.
> load from offset 32.
>
> next pairs:
>
> load from offset 32.
> load from offset 48.
>
> Instead, for rs6000 we should get:
>
> pairs:
>
> load from offset 0.
> load from offset 16.
>
> next pairs:
>
> load from offset 32.
> load from offset 48.

Hmm, so then I guess my question is: why wouldn't you consider merging
the pair with offsets (16,32) for rs6000?  Is it because you have a
stricter alignment requirement on the base pair offsets (i.e. they have
to be a multiple of 32 when the operand size is 16)?  So the pair
offsets have to be a multiple of the entire pair size rather than a
single operand size?

If that is the case then I think it would be better to introduce a
virtual function (say pair_offset_alignment_ok_p) that vets the base
offset of the pair (prev_access->offset in transform_for_base).  I guess
it would also take access_size as a parameter and for aarch64 it should
check:

  multiple_p (offset, access_size)

and for rs6000 it could check:

  multiple_p (offset, access_size * 2)

and we would then incorporate a call to that predicate in the else if
condition of transform_for_base.

It would have the same limitation whereby we assume that MEM_EXPR offset
alignment is a good proxy for RTL offset alignment, but we already
depend on that with the multiple_p check in track_via_mem_expr.

Thanks,
Alex

>
> Thanks & Regards
> Ajit
> > Thanks,
> > Alex
> >
> >>
> >>>
> >>>> 	}
> >>>>       prev_access = &access;
> >>>>     }
> >>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>  // and remove all the tombstone insns, being sure to reparent any uses
> >>>>  // of mem to previous defs when we do this.
> >>>>  void
> >>>> -ldp_bb_info::cleanup_tombstones ()
> >>>> +pair_fusion_bb_info::cleanup_tombstones ()
> >>>>  {
> >>>>    // No need to do anything if we didn't emit a tombstone insn for this BB.
> >>>>    if (!m_emitted_tombstone)
> >>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
> >>>>
> >>>>  template
> >>>>  void
> >>>> -ldp_bb_info::traverse_base_map (Map &map)
> >>>> +pair_fusion_bb_info::traverse_base_map (Map &map)
> >>>>  {
> >>>>    for (auto kv : map)
> >>>>      {
> >>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
> >>>>  }
> >>>>
> >>>>  void
> >>>> -ldp_bb_info::transform ()
> >>>> +pair_fusion_bb_info::transform ()
> >>>>  {
> >>>>    traverse_base_map (expr_map);
> >>>>    traverse_base_map (def_map);
> >>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
> >>>>  // for load/store candidates.  If running after RA, also try and promote
> >>>>  // non-writeback pairs to use writeback addressing.  Then try to fuse
> >>>>  // candidates into pairs.
> >>>> -void ldp_fusion_bb (bb_info *bb)
> >>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
> >>>>  {
> >>>> -  const bool track_loads
> >>>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> -  const bool track_stores
> >>>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> +  const bool track_loads = track_load_p ();
> >>>> +  const bool track_stores = track_store_p ();
> >>>>
> >>>> -  ldp_bb_info bb_state (bb);
> >>>
> >>> This:
> >>>
> >>>> +  aarch64_pair_fusion derived;
> >>>
> >>> can be deleted and then:
> >>>
> >>>> +  pair_fusion_bb_info bb_info (bb, &derived);
> >>>
> >>> can just be:
> >>>
> >>>   pair_fusion_bb_info bb_info (bb, this);
> >>>
> >>> (or you can pass *this if you make bb_info take a reference).
> >>>
> >>> I don't think there's a particular need to change the variable name
> >>> (bb_state -> bb_info).  I chose the former because it doesn't clash
> >>> with the RTL-SSA structure of the same name as the latter.
> >>>
> >>
> >> Addressed.
> >>>>
> >>>>    for (auto insn : bb->nondebug_insns ())
> >>>>      {
> >>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
> >>>>  	continue;
> >>>>
> >>>>        rtx pat = PATTERN (rti);
> >>>> -      if (reload_completed
> >>>> -	  && aarch64_ldp_writeback > 1
> >>>> -	  && GET_CODE (pat) == PARALLEL
> >>>> -	  && XVECLEN (pat, 0) == 2)
> >>>> +      if (pair_mem_promote_writeback_p (pat))
> >>>>  	try_promote_writeback (insn);
> >>>
> >>> It looks like try_promote_writeback itself will need some further work
> >>> to make it target-independent.  I suppose this check:
> >>>
> >>>   auto rti = insn->rtl ();
> >>>   const auto attr = get_attr_ldpstp (rti);
> >>>   if (attr == LDPSTP_NONE)
> >>>     return;
> >>>
> >>>   bool load_p = (attr == LDPSTP_LDP);
> >>>   gcc_checking_assert (load_p || attr == LDPSTP_STP);
> >>>
> >>> will need to become part of the pair_mem_promote_writeback_p hook that you
> >>> added, potentially changing it to return a boolean for load_p.
> >>>
> >>> Then I guess we will need hooks for destructuring the pair insn and
> >>> another hook to wrap aarch64_gen_writeback_pair.
> >>>
> >>
> >> Addressed.
> >>>>
> >>>>        if (GET_CODE (pat) != SET)
> >>>>  	continue;
> >>>>
> >>>>        if (track_stores && MEM_P (XEXP (pat, 0)))
> >>>> -	bb_state.track_access (insn, false, XEXP (pat, 0));
> >>>> +	bb_info.track_access (insn, false, XEXP (pat, 0));
> >>>>        else if (track_loads && MEM_P (XEXP (pat, 1)))
> >>>> -	bb_state.track_access (insn, true, XEXP (pat, 1));
> >>>> +	bb_info.track_access (insn, true, XEXP (pat, 1));
> >>>>      }
> >>>>
> >>>> -  bb_state.transform ();
> >>>> -  bb_state.cleanup_tombstones ();
> >>>> +  bb_info.transform ();
> >>>> +  bb_info.cleanup_tombstones ();
> >>>>  }
> >>>>
> >>>>  void ldp_fusion ()
> >>>>  {
> >>>>    ldp_fusion_init ();
> >>>> +  pair_fusion *pfuse;
> >>>> +  aarch64_pair_fusion derived;
> >>>> +  pfuse = &derived;
> >>>
> >>> This is indeed the one place where I think it is acceptable to
> >>> instantiate aarch64_pair_fusion.  But again there's no need to create a
> >>> pointer to the parent class, just call any function you like directly.
> >>>
> >>
> >> Addressed.
> >>>>
> >>>>    for (auto bb : crtl->ssa->bbs ())
> >>>> -    ldp_fusion_bb (bb);
> >>>> +    pfuse->ldp_fusion_bb (bb);
> >>>
> >>> I think even the code to iterate over bbs should itself be a member
> >>> function of pair_fusion (say "run") and then that becomes part of the
> >>> generic code.
> >>>
> >>> So this function would just become:
> >>>
> >>>   aarch64_pair_fusion pass;
> >>>   pass.run ();
> >>>
> >>> and could be inlined into the caller.
> >>>
> >>
> >> Addressed.
> >>> Perhaps you could also add an early return like:
> >>>
> >>>   if (!track_loads_p () && !track_stores_p ())
> >>>     return;
> >>>
> >>> in pair_fusion::run () and then remove the corresponding code from
> >>> pass_ldp_fusion::gate?
> >>>
> >>
> >> Addressed.
> >>
> >>>>
> >>>>    ldp_fusion_destroy ();
> >>>>  }
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>
> >>> Thanks,
> >>> Alex
> >>
> >> Thanks & Regards
> >> Ajit