From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR05-AM6-obe.outbound.protection.outlook.com (mail-am6eur05on2052.outbound.protection.outlook.com [40.107.22.52]) by sourceware.org (Postfix) with ESMTPS id 4DF413858D38 for ; Fri, 12 Apr 2024 14:45:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4DF413858D38 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4DF413858D38 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.22.52 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1712933162; cv=pass; b=RXjybE+glaCBI/H4r1T7d6GJTTNfNcxNMxE+IYBHmjgjA6YX1pj7UyD9ndmF6zPlGTHnH3fR5iOhSKfzQ0JxxdtQ/FMwgesJpN2F7Q9RAC3HQZpL+GOg2dbbOmOs4/+ZkjkiWlq6eFnFOVh0MFlZhgElleCOdrEaD1kuVzBfuS0= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1712933162; c=relaxed/simple; bh=rK7OL5vY6t+1wr6VQ2E9n1WoBXGeI9782Xrb4HSxbAw=; h=DKIM-Signature:DKIM-Signature:Date:From:To:Subject:Message-ID: MIME-Version; b=LMknG48ItNzGL9M6Fj/pZppYwlU7Ee1d8ypRY7rxqH8uggMjkDiHsUn4/0Nd3ydC+kyhjPbMKbFm1DUn0grgUGkVTVw6sMQcP5QTdVhLBF11aVoAqbYkXP2Z809+Ihp9cyhlrO0YSzHg/Xz2nycCy7YcjC2ogy3G43y/pXgCohw= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=ZpUE4dwXHoLUflnCbnElymC66HwBNCTeFS4BeyhG+N6H+He7lGRLxjpDMXLO2/O8ws7l1qqEek1vIAvyxPw0FVze9w9aXGgSZz++h4FzuV/2UfaaQZUYSspPggd2fMxETx9AqVTLv+xPgv7I9JMXT/NAGaBLrWWnLNXpy73MbX1A0HZYDYCVxnJhKfIQ9bMNW0P4S3jXWqedYvWUj+dmbcJIijeWJjsVHOg1B5MGcCNrmScncRyPzti1riONVVZtUceUWGAMOEdtKRf9eBp+F4SPmPqOydOLCZTdccn0wpUhUlNj/J8ZbwZnTGKzNRlWfZnzpGUT6pqkqPYlVOoW8g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=dXTbdIJQapi8qu+WrqnK5qNjOuhfY+X1nxS1twI0IeI=; b=DiKG5KQrFWvBW8xnsYMWAIEgywIWnP++5KjCUT4a3yN/qh01nf+L7rf/5u3+I4N9TpYeK3qzC30aa9QS6v6T5H/4lvgDR2chngBB/GMZKU5TqPvPPb9UlhOmgGrPd7S+Gkf5OoO1RIu90GbS05QXDE8zyynFAjlV8vkN9I03KdDx6FOAJ4JMAjfBBXZqeor3d7AIIHfLMNC32Z9Vr0Cumsq8RjT12CMJClO1LPnvFoVq0vKoqY5+5oSwyCAu9t+BvhMSdelDVFv7uAa7gGecvnqtpvz8UMQQQr/uAtBWV3OnmkFfuOUykw3pi/Na9P68SIwXtSNhbAopVBJYllShPA== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=dXTbdIJQapi8qu+WrqnK5qNjOuhfY+X1nxS1twI0IeI=; b=vF5tDiVmztuHI6sGh5eq8MKdPPH1MlJIJNIlZuwfaShjJPPbFvkf0PzN8xQ01bbbdGRopCIL26lcB/Ce9EdJqPnzP0azphTeuhB00VcQPs1S6SSWbIpWCyU7XEHDdSQctZlSCS+QoZKAjs0VYDoN648zcV4zly0vQpnHSfg9SWg= Received: from AM6P195CA0025.EURP195.PROD.OUTLOOK.COM (2603:10a6:209:81::38) by AM8PR08MB6532.eurprd08.prod.outlook.com (2603:10a6:20b:316::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7409.46; Fri, 12 Apr 2024 14:45:56 +0000 Received: from AM1PEPF000252E1.eurprd07.prod.outlook.com (2603:10a6:209:81:cafe::55) by AM6P195CA0025.outlook.office365.com (2603:10a6:209:81::38) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7472.22 via Frontend Transport; Fri, 12 Apr 2024 14:45:56 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM1PEPF000252E1.mail.protection.outlook.com (10.167.16.59) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7452.22 via Frontend Transport; Fri, 12 Apr 2024 14:45:56 +0000 Received: ("Tessian outbound 01a47eb2eb85:v313"); Fri, 12 Apr 2024 14:45:56 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 7482bc597c992b9a X-CR-MTA-TID: 64aa7808 Received: from 7561f31b21b8.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id C19653D6-C40D-4178-9B5C-F057B8244BE8.1; Fri, 12 Apr 2024 14:45:45 +0000 Received: from EUR04-VI1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 7561f31b21b8.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 12 Apr 2024 14:45:45 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=GHLuhhC2KZFUA/gtmlM0Hv9z5zEg6jfloICyxEf9wvqDOlr873K3gUVi69+9g0xN1tKWptZI4kuj6TUA4sEuhz2pS5seghGlQduHdc1a8wC4CYYdRAdlWMQ1ZRQ5b9TGcbbgtP0/vn9IBFxWL2hVFwOr0hyt32UlqhprQI9ILaqxdufMCvSMaCA+pcKCVU3j//0RFqbr9cZ1mCNvG38h7KuZDFsogI8GsNLFWZkuuISrdFEUROAxfILR42/5m4mJbhUJvIwdTxzReAX1dt5JdXuidtgG2alO9wgKTo+HbTuiEvens5RXSwdjHpoCzu20MCaUN0FamHGOngrxDASPqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=dXTbdIJQapi8qu+WrqnK5qNjOuhfY+X1nxS1twI0IeI=; b=bKCJQRnvJCo0uBoUagMKkGdf35WKRwEyIp66Q3LQH/EQ+AiZHyz+eclUiSMv23Z1NZ6BzU/5R7LK7l9oNAzpvqSbiLolYlbs6I2hPGR6HqNwEc8Kgdr1iNfD45wzf8Q1jhOy4uUH+Z6iLtcjEQrbwJaRXPxeWOoSu5rxjQoIUU10rP4qNUhb8anTSXduFWPBSIgxP9XPKiejh5u91w55ujfKXcnugJ4Utyfv5F3HzUElGDjyMSQrSzLe+0zRH/GGJgZfi5xmtmoQGDFrgY7KBpcRyA43s1vfvktDn5z7k7+V8DmyhTCbKxEvYEvaiLlANDeQdqR14PnNIOAfw64HEQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=dXTbdIJQapi8qu+WrqnK5qNjOuhfY+X1nxS1twI0IeI=; b=vF5tDiVmztuHI6sGh5eq8MKdPPH1MlJIJNIlZuwfaShjJPPbFvkf0PzN8xQ01bbbdGRopCIL26lcB/Ce9EdJqPnzP0azphTeuhB00VcQPs1S6SSWbIpWCyU7XEHDdSQctZlSCS+QoZKAjs0VYDoN648zcV4zly0vQpnHSfg9SWg= Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from PAWPR08MB8958.eurprd08.prod.outlook.com (2603:10a6:102:33e::15) by GV2PR08MB9949.eurprd08.prod.outlook.com (2603:10a6:150:ba::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7409.46; Fri, 12 Apr 2024 14:45:41 +0000 Received: from PAWPR08MB8958.eurprd08.prod.outlook.com ([fe80::9561:db03:3f6c:5634]) by PAWPR08MB8958.eurprd08.prod.outlook.com ([fe80::9561:db03:3f6c:5634%5]) with mapi id 15.20.7409.042; Fri, 12 Apr 2024 14:45:41 +0000 Date: Fri, 12 Apr 2024 15:45:38 +0100 From: Alex Coplan To: Ajit Agarwal Cc: Richard Sandiford , "Kewen.Lin" , Segher Boessenkool , Michael Meissner , David Edelsohn , Peter Bergner , gcc-patches Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file Message-ID: References: <54ab422a-4ffa-4ca9-8029-eb23c42ed653@linux.ibm.com> <595cd863-3ba5-492b-84d1-ab470411507e@linux.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <595cd863-3ba5-492b-84d1-ab470411507e@linux.ibm.com> X-ClientProxiedBy: LO2P265CA0112.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:c::28) To PAWPR08MB8958.eurprd08.prod.outlook.com (2603:10a6:102:33e::15) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: PAWPR08MB8958:EE_|GV2PR08MB9949:EE_|AM1PEPF000252E1:EE_|AM8PR08MB6532:EE_ X-MS-Office365-Filtering-Correlation-Id: 504aa5fb-77a0-45b2-aec6-08dc5aff4306 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: CeDgwBtLKd7qDkmrTkKzUTynLgHyEnaAX3qz+owB3IHhMZJLdVYPjMhuWhlLlYEbNG60aNRCeEPqhy5a8C8v1H7MoUoDVbpoJxRvzdfCNHJpBCCkV7Yr4D/qHQf1FvrOtXW+M6MXaft2mZHS20NuWylX7dtQGeopeZ/DSK99arkkukEddFZ2RgVfHuCUdftctqnrywZy4YvQuqpeyQf0ySPtAMqAxGDIwn2SN3NeWWKPpR73OGix0tqe5xWChZKYMYJN3Z5AH4+Kt+oNGPDHBHpwMmSCZ+wlKHeO32arE/oIutBq6uv8BHqtWWHMq77T9uZjuNh+89PrOi7c7r9oLPxApM/PEnjoTMlL981rxiQ/h1nEO1fuJVArGzcrKPe7jtqV9Gkc3onULh1SBkuhdGZwifrjvhhM0CkEgnynrGhIXKfpo7hmBIH+Zd3RDGsNYZZiH89jvdevxaZwu95u0ZI3vUvGtLbpYrruizktJOxOW3YHk1RJlQPeTL8J4IItuVgxcl7KR4yMc8Z4J2LiH+Ai5GIFZtMXmqP+q0oxQEi3lIsYhF1r1HiA/82/HvlUes5jeMOtR9SvwC/S64Exur7+3omyazGA089TWZlUNwj7y5lU1rW/uZSL6j6+kKf68MMLQk425EcPVQWXf2uf5lG1lyPek8XaXIu8o6V0zdo= X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:PAWPR08MB8958.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(366007)(1800799015)(376005);DIR:OUT;SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: GV2PR08MB9949 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM1PEPF000252E1.eurprd07.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: ff79be24-f398-427a-0afd-08dc5aff397e X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Vn1XgwDUn7puRnZm/Qvpgjw7OBtELyp0e2B6En1J79D/sCIYaLB/pXgjXZYIMhQu3/Vr2QNdwNXPp1+UZ3QN1kV22UTV6U2woVx8ON8+NFIM5bGsp6NJ4RT+TbPDqbd8EgXu+LJ5C/09mgA2smWXAX5iGVMG492dvDVew8JwO2unDNRAy07B9dn6NPxTtlLHuqVE+bYWd/B/pCZE5fk91E4rGmxoo2ls2po0WQNeBQeBkJwhFokswTIPVXXxoH40UHykTaEak/FAfjLsGcGBi/UMq6Xmp6hGEf10Fy8P7X68XujAklznscQixX9+JnKU3x2ZgRFnGYw3P78i5PFC0uw3YsMInCYz9D+kA+9smW3h7BhDFPCRIHHbgc6tFzClKfGKsXt1exmJ2lzofbaovMMzCtM/tE5y/qbXlZBlE2euXO9fLwQR+4q5mMwEMmpLxbRSTmK3VBLT6kdkM21d5/mxvb5CwcNEe8niIotxdf76OpuvExJYFKmVE47vw3wf8x55EPphP5EtAM5Mz+97Nzh4S8MqbaltBa0jUG44l5amiP2LL2uHjTa6z1LdJTF19wAblndr0cMEBNoBGtzdT3Qon+jmnvAQiJjNRSxDe8rZ2+I5Qug9/B/+RsqPkr1AYNb5gM6zvYHzVrvkq7JyIT8xteB18WoqKgdYianLYi6IsppuoaLdPyxzOnlfZZ4/NtEA0GsUWT2jGM/MOWRQmjWRmM0Ao2t6uDt+c3XuTBBenn2xz0jPf5Z0VEXnuors X-Forefront-Antispam-Report: CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230031)(376005)(1800799015)(82310400014)(36860700004);DIR:OUT;SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 12 Apr 2024 14:45:56.6814 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 504aa5fb-77a0-45b2-aec6-08dc5aff4306 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM1PEPF000252E1.eurprd07.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM8PR08MB6532 X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,KAM_DMARC_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,UNPARSEABLE_RELAY autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 12/04/2024 20:02, Ajit Agarwal wrote: > Hello Alex: > > On 11/04/24 7:55 pm, Alex Coplan wrote: > > On 10/04/2024 23:48, Ajit Agarwal wrote: > >> Hello Alex: > >> > >> On 10/04/24 7:52 pm, Alex Coplan wrote: > >>> Hi Ajit, > >>> > >>> On 10/04/2024 15:31, Ajit Agarwal wrote: > >>>> Hello Alex: > >>>> > >>>> On 10/04/24 1:42 pm, Alex Coplan wrote: > >>>>> Hi Ajit, > >>>>> > >>>>> On 09/04/2024 20:59, Ajit Agarwal wrote: > >>>>>> Hello Alex: > >>>>>> > >>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote: > >>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote: > >>>>>>>> Hello Alex: > >>>>>>>> > >>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote: > >>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote: > >>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote: > >>>>>>>>>>>> Hello Alex/Richard: > >>>>>>>>>>>> > >>>>>>>>>>>> All review comments are incorporated. > >>> > >>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list, > >>>>>>>>>>>> // of accesses. If we find two sets of adjacent accesses, call > >>>>>>>>>>>> // merge_pairs. > >>>>>>>>>>>> void > >>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>>>>>> - access_group &group) > >>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>>>>>> + access_group &group) > >>>>>>>>>>>> { > >>>>>>>>>>>> const auto lfs = decode_lfs (encoded_lfs); > >>>>>>>>>>>> const unsigned access_size = lfs.size; > >>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>>>>>> access.cand_insns, > >>>>>>>>>>>> lfs.load_p, > >>>>>>>>>>>> access_size); > >>>>>>>>>>>> - skip_next = access.cand_insns.empty (); > >>>>>>>>>>>> + skip_next = bb_state->cand_insns_empty_p (access.cand_insns); > >>>>>>>>>>> > >>>>>>>>>>> As above, why is this needed? > >>>>>>>>>> > >>>>>>>>>> For rs6000 we want to return always true. as load store pair > >>>>>>>>>> that are to be merged with 8/16 16/32 32/64 is occuring for rs6000. > >>>>>>>>>> And we want load store pair to 8/16 32/64. Thats why we want > >>>>>>>>>> to generate always true for rs6000 to skip pairs as above. > >>>>>>>>> > >>>>>>>>> Hmm, sorry, I'm not sure I follow. Are you saying that for rs6000 you have > >>>>>>>>> load/store pair instructions where the two arms of the access are storing > >>>>>>>>> operands of different sizes? Or something else? > >>>>>>>>> > >>>>>>>>> As it stands the logic is to skip the next iteration only if we > >>>>>>>>> exhausted all the candidate insns for the current access. In the case > >>>>>>>>> that we didn't exhaust all such candidates, then the idea is that when > >>>>>>>>> access becomes prev_access, we can attempt to use those candidates as > >>>>>>>>> the "left-hand side" of a pair in the next iteration since we failed to > >>>>>>>>> use them as the "right-hand side" of a pair in the current iteration. > >>>>>>>>> I don't see why you wouldn't want that behaviour. Please can you > >>>>>>>>> explain? > >>>>>>>>> > >>>>>>>> > >>>>>>>> In merge_pair we get the 2 load candiates one load from 0 offset and > >>>>>>>> other load is from 16th offset. Then in next iteration we get load > >>>>>>>> from 16th offset and other load from 32 offset. In next iteration > >>>>>>>> we get load from 32 offset and other load from 48 offset. > >>>>>>>> > >>>>>>>> For example: > >>>>>>>> > >>>>>>>> Currently we get the load candiates as follows. > >>>>>>>> > >>>>>>>> pairs: > >>>>>>>> > >>>>>>>> load from 0th offset. > >>>>>>>> load from 16th offset. > >>>>>>>> > >>>>>>>> next pairs: > >>>>>>>> > >>>>>>>> load from 16th offset. > >>>>>>>> load from 32th offset. > >>>>>>>> > >>>>>>>> next pairs: > >>>>>>>> > >>>>>>>> load from 32th offset > >>>>>>>> load from 48th offset. > >>>>>>>> > >>>>>>>> Instead in rs6000 we should get: > >>>>>>>> > >>>>>>>> pairs: > >>>>>>>> > >>>>>>>> load from 0th offset > >>>>>>>> load from 16th offset. > >>>>>>>> > >>>>>>>> next pairs: > >>>>>>>> > >>>>>>>> load from 32th offset > >>>>>>>> load from 48th offset. > >>>>>>> > >>>>>>> Hmm, so then I guess my question is: why wouldn't you consider merging > >>>>>>> the pair with offsets (16,32) for rs6000? Is it because you have a > >>>>>>> stricter alignment requirement on the base pair offsets (i.e. they have > >>>>>>> to be a multiple of 32 when the operand size is 16)? So the pair > >>>>>>> offsets have to be a multiple of the entire pair size rather than a > >>>>>>> single operand size> > >>>>>> > >>>>>> We get load pair at a certain point with (0,16) and other program > >>>>>> point we get load pair (32, 48). > >>>>>> > >>>>>> In current implementation it takes offsets loads as (0, 16), > >>>>>> (16, 32), (32, 48). > >>>>>> > >>>>>> But In rs6000 we want the load pair to be merged at different points > >>>>>> as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with > >>>>>> 0 offset and other load (32, 48) with lxvp with 32 offset. > >>>>>> > >>>>>> In current case it will merge with lxvp with 0 offset and lxvp with > >>>>>> 16 offset, then lxvp with 32 offset and lxvp with 48 offset which > >>>>>> is incorrect in our case as the (16-32) case 16 offset will not > >>>>>> load from even register and break for rs6000. > >>>>> > >>>>> Sorry, I think I'm still missing something here. Why does the address offset > >>>>> affect the parity of the tranfser register? ISTM they needn't be related at > >>>>> all (and indeed we can't even know the parity of the transfer register before > >>>>> RA, but perhaps you're only intending on running the pass after RA?) > >>>>> > >>>> > >>>> We have load pair with (0,16) wherein these loads are adjacent and > >>>> replaced with lxvp. > >>>> > >>>> Semantic of lxvp instruction is that it loads adjacent load pair in > >>>> even register and even_register + 1. > >>>> > >>>> We replace the above load pair with lxvp instruction and then we > >>>> dont need to merge (16,32) as (0, 16) is already merged and instead > >>>> we merge (32,48). > >>> > >>> Ok, but the existing logic should already account for this. I.e. if we > >>> successfully merge (0,16), then we don't attempt to merge (16,32). We'd only > >>> attempt to merge (16,32) if the merge of (0,16) failed (for whatever reason). > >>> So I don't see that there's anything specific to lxvp that requires this logic > >>> to change, _unless_ you have a stricter alignment requirement on the offsets as > >>> I mentioned before. > >>> > >> > >> Thanks for the suggestion. It worked for rs6000 also with current changes. > >> Sorry for the confusion. > > > > Alright, glad we got to the bottom of this! > > Thanks. > > > >> > >>>> > >>>> Yes you are correct, the addresss offset doesn't affect the parity of > >>>> the register transfer as we are doing fusion pass before RA. > >>>> If that is the case then I think it would be better to introduce a > >>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base > >>>>>>> offset of the pair (prev_access->offset in transform_for_base). I guess > >>>>>>> it would also take access_size as a parameter and for aarch64 it should > >>>>>>> check: > >>>>>>> > >>>>>>> multiple_p (offset, access_size) > >>>>>>> > >>>>>>> and for rs6000 it could check: > >>>>>>> > >>>>>>> multiple_p (offset, access_size * 2) > >>>>>>> > >>>>>>> and we would then incorporate a call to that predicate in the else if > >>>>>>> condition of tranform_for_base. > >>>>>>> > >>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset > >>>>>>> alignment is a good proxy for RTL offset alignment, but we already > >>>>>>> depend on that with the multiple_p check in track_via_mem_expr. > >>>>>>> > >>>> I have addressed the above hooks and it worked fine with both rs6000 > >>>> and aarch64. I am sending subsequent patch in some time that address > >>>> above. > >>>> > >>>>> How do you plan on handling this even-odd requirement for rs6000? > >>>>> > >>>> > >>>> We plan to handle with V16QI subreg: 0 and V16QI subreg : 16 to > >>>> generate register pair and thats what we generate and implement > >>>> in rs6000 target > >>>> code. > >>> > >>> Ah, this is coming back to me now. Sorry, I should have remembered this from > >>> the previous discussion with Richard S. > >>> > >>> Apologies for going on a slight tangent here, but if you're running > >>> before RA are you planning to create a new OImode pseudo register for > >>> the lxvp insn and then somehow update uses of the old transfer registers > >>> to replace them with subregs of that OImode pseudo? > >> > >> Yes I do the same as you have mentioned. We generate register pairs > >> with 256 bit mode with two subregs of 128 bit modes with 0 and > >> 16 offset. > >> > >> Or do you just plan > >>> or replacing the individual loads with moves (instead of deleting them)? > >>> I guess the latter would be simpler and might work better in the > >>> presence of hard regs. > >>> > >> > >> Would you mind explaining how to generate register pairs with lxvp by > >> replacing loads with moves. > > > > Yeah, so suppose you have something like: > > > > (set (reg:V16QI v1) (mem:V16QI addr)) > > (set (reg:V16QI v2) (mem:V16QI addr+16)) > > > > then when you insert the lxvp you can then (logically) replace the > > original load instructions with moves from the appropriate subreg, as > > follows: > > > > (set (reg:OI pair-pseudo) (mem:OI addr)) ; lxvp > > (set (reg:V16QI v1) (subreg:V16QI (reg:OI pair-pseudo) 0)) > > (set (reg:V16QI v2) (subreg:V16QI (reg:OI pair-pseudo) 16)) > > > > Any Pseudo created with gen_rtx_REG like > gen_rtx_REG (OOmode, REGNO (dest_exp) will error'ed out > with unrecognize insn by LRA. I'm not surprised that goes wrong: you can't just create a new REG rtx in a different mode and reuse the regno of an existing pseudo. > > If I create pseudo with gen_reg_rtx (OOmode) will error'ed > out with new_defs Pseudo register is not found in > change->new_defs. Yeah, I suppose you'd need to add an RTL-SSA def for the new pseudo. > > Also the sequential register pairs are not generated by > Register Allocator. So how do you get the sequential pairs with your current approach? My understanding was that what I suggested above doesn't really change what you're doing w.r.t the lxvp insn itself, but maybe I didn't properly understand the approach taken in your initial patchset. Thanks, Alex > > Thats why I haven't used above method as I also thought > through. > > Please let me know what do you think. > > Thanks & Regards > Ajit > > now I'm not sure off-hand if this exact combination of subregs and mode > > changes is valid, but hopefully you get the idea. The benefit of this > > approach is that it keeps the transformation local and is more > > compile-time efficient (we don't have to look through all mentions of > > v1/v2 and replace them with subregs). I'd hope that the RA can then > > clean up any redundant moves (especially in the case that v1/v2 are > > pseudos). > > > > That would mean that you don't need all the grubbing around with DF > > looking for occurrences of the transfer regs. Instead we'd just need > > some logic to insert the extra moves after the lxvp for rs6000, and I > > think this might fit more cleanly into the current pass. > > > > Does that make sense? In any case, this shouldn't really affect the > > preparatory aarch64 patch because we were planning to defer adding any > > hooks that are only needed for rs6000 from the initial aarch64/generic > > split. > > > > Thanks, > > Alex > > > >> > >> Thanks & Regards > >> Ajit > >> > >>> Thanks, > >>> Alex > >>> > >>>> > >>>> Thanks & Regards > >>>> Ajit > >>>>> Thanks, > >>>>> Alex > >>>>> > >>>>>> > >>>>>> lxvp should load from even registers and then loaded value will > >>>>>> be in even register and even register +1 (which is odd). > >>>>>> > >>>>>> Thanks & Regards > >>>>>> Ajit > >>>>>>> If that is the case then I think it would be better to introduce a > >>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base > >>>>>>> offset of the pair (prev_access->offset in transform_for_base). I guess > >>>>>>> it would also take access_size as a parameter and for aarch64 it should > >>>>>>> check: > >>>>>>> > >>>>>>> multiple_p (offset, access_size) > >>>>>>> > >>>>>>> and for rs6000 it could check: > >>>>>>> > >>>>>>> multiple_p (offset, access_size * 2) > >>>>>>> > >>>>>>> and we would then incorporate a call to that predicate in the else if > >>>>>>> condition of tranform_for_base. > >>>>>>> > >>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset > >>>>>>> alignment is a good proxy for RTL offset alignment, but we already > >>>>>>> depend on that with the multiple_p check in track_via_mem_expr. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Alex > >>>>>>> > >>>>>>>> > >>>>>>>> Thanks & Regards > >>>>>>>> Ajit > >>>>>>>>> Thanks, > >>>>>>>>> Alex > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> } > >>>>>>>>>>>> prev_access = &access; > >>>>>>>>>>>> } > >>>>>>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>>>>>> // and remove all the tombstone insns, being sure to reparent any uses > >>>>>>>>>>>> // of mem to previous defs when we do this. > >>>>>>>>>>>> void > >>>>>>>>>>>> -ldp_bb_info::cleanup_tombstones () > >>>>>>>>>>>> +pair_fusion_bb_info::cleanup_tombstones () > >>>>>>>>>>>> { > >>>>>>>>>>>> // No need to do anything if we didn't emit a tombstone insn for this BB. > >>>>>>>>>>>> if (!m_emitted_tombstone) > >>>>>>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones () > >>>>>>>>>>>> > >>>>>>>>>>>> template > >>>>>>>>>>>> void > >>>>>>>>>>>> -ldp_bb_info::traverse_base_map (Map &map) > >>>>>>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map) > >>>>>>>>>>>> { > >>>>>>>>>>>> for (auto kv : map) > >>>>>>>>>>>> { > >>>>>>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map) > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> void > >>>>>>>>>>>> -ldp_bb_info::transform () > >>>>>>>>>>>> +pair_fusion_bb_info::transform () > >>>>>>>>>>>> { > >>>>>>>>>>>> traverse_base_map (expr_map); > >>>>>>>>>>>> traverse_base_map (def_map); > >>>>>>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn) > >>>>>>>>>>>> // for load/store candidates. If running after RA, also try and promote > >>>>>>>>>>>> // non-writeback pairs to use writeback addressing. Then try to fuse > >>>>>>>>>>>> // candidates into pairs. > >>>>>>>>>>>> -void ldp_fusion_bb (bb_info *bb) > >>>>>>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb) > >>>>>>>>>>>> { > >>>>>>>>>>>> - const bool track_loads > >>>>>>>>>>>> - = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; > >>>>>>>>>>>> - const bool track_stores > >>>>>>>>>>>> - = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; > >>>>>>>>>>>> + const bool track_loads = track_load_p (); > >>>>>>>>>>>> + const bool track_stores = track_store_p (); > >>>>>>>>>>>> > >>>>>>>>>>>> - ldp_bb_info bb_state (bb); > >>>>>>>>>>> > >>>>>>>>>>> This: > >>>>>>>>>>> > >>>>>>>>>>>> + aarch64_pair_fusion derived; > >>>>>>>>>>> > >>>>>>>>>>> can be deleted and then: > >>>>>>>>>>> > >>>>>>>>>>>> + pair_fusion_bb_info bb_info (bb, &derived); > >>>>>>>>>>> > >>>>>>>>>>> can just be: > >>>>>>>>>>> > >>>>>>>>>>> pair_fusion_bb_info bb_info (bb, this); > >>>>>>>>>>> > >>>>>>>>>>> (or you can pass *this if you make bb_info take a reference). > >>>>>>>>>>> > >>>>>>>>>>> I don't think there's a particular need to change the variable name > >>>>>>>>>>> (bb_state -> bb_info). I chose the former because it doens't clash > >>>>>>>>>>> with the RTL-SSA structure of the same name as the latter. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Addressed. > >>>>>>>>>>>> > >>>>>>>>>>>> for (auto insn : bb->nondebug_insns ()) > >>>>>>>>>>>> { > >>>>>>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb) > >>>>>>>>>>>> continue; > >>>>>>>>>>>> > >>>>>>>>>>>> rtx pat = PATTERN (rti); > >>>>>>>>>>>> - if (reload_completed > >>>>>>>>>>>> - && aarch64_ldp_writeback > 1 > >>>>>>>>>>>> - && GET_CODE (pat) == PARALLEL > >>>>>>>>>>>> - && XVECLEN (pat, 0) == 2) > >>>>>>>>>>>> + if (pair_mem_promote_writeback_p (pat)) > >>>>>>>>>>>> try_promote_writeback (insn); > >>>>>>>>>>> > >>>>>>>>>>> It looks like try_promote_writeback itself will need some further work > >>>>>>>>>>> to make it target-independent. I suppose this check: > >>>>>>>>>>> > >>>>>>>>>>> auto rti = insn->rtl (); > >>>>>>>>>>> const auto attr = get_attr_ldpstp (rti); > >>>>>>>>>>> if (attr == LDPSTP_NONE) > >>>>>>>>>>> return; > >>>>>>>>>>> > >>>>>>>>>>> bool load_p = (attr == LDPSTP_LDP); > >>>>>>>>>>> gcc_checking_assert (load_p || attr == LDPSTP_STP); > >>>>>>>>>>> > >>>>>>>>>>> will need to become part of the pair_mem_promote_writeback_p hook that you > >>>>>>>>>>> added, potentially changing it to return a boolean for load_p. > >>>>>>>>>>> > >>>>>>>>>>> Then I guess we will need hooks for destructuring the pair insn and > >>>>>>>>>>> another hook to wrap aarch64_gen_writeback_pair. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Addressed. > >>>>>>>>>>>> > >>>>>>>>>>>> if (GET_CODE (pat) != SET) > >>>>>>>>>>>> continue; > >>>>>>>>>>>> > >>>>>>>>>>>> if (track_stores && MEM_P (XEXP (pat, 0))) > >>>>>>>>>>>> - bb_state.track_access (insn, false, XEXP (pat, 0)); > >>>>>>>>>>>> + bb_info.track_access (insn, false, XEXP (pat, 0)); > >>>>>>>>>>>> else if (track_loads && MEM_P (XEXP (pat, 1))) > >>>>>>>>>>>> - bb_state.track_access (insn, true, XEXP (pat, 1)); > >>>>>>>>>>>> + bb_info.track_access (insn, true, XEXP (pat, 1)); > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> - bb_state.transform (); > >>>>>>>>>>>> - bb_state.cleanup_tombstones (); > >>>>>>>>>>>> + bb_info.transform (); > >>>>>>>>>>>> + bb_info.cleanup_tombstones (); > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> void ldp_fusion () > >>>>>>>>>>>> { > >>>>>>>>>>>> ldp_fusion_init (); > >>>>>>>>>>>> + pair_fusion *pfuse; > >>>>>>>>>>>> + aarch64_pair_fusion derived; > >>>>>>>>>>>> + pfuse = &derived; > >>>>>>>>>>> > >>>>>>>>>>> This is indeed the one place where I think it is acceptable to > >>>>>>>>>>> instantiate aarch64_pair_fusion. But again there's no need to create a > >>>>>>>>>>> pointer to the parent class, just call any function you like directly. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Addressed. > >>>>>>>>>>>> > >>>>>>>>>>>> for (auto bb : crtl->ssa->bbs ()) > >>>>>>>>>>>> - ldp_fusion_bb (bb); > >>>>>>>>>>>> + pfuse->ldp_fusion_bb (bb); > >>>>>>>>>>> > >>>>>>>>>>> I think even the code to iterate over bbs should itself be a member > >>>>>>>>>>> function of pair_fusion (say "run") and then that becomes part of the > >>>>>>>>>>> generic code. > >>>>>>>>>>> > >>>>>>>>>>> So this function would just become: > >>>>>>>>>>> > >>>>>>>>>>> aarch64_pair_fusion pass; > >>>>>>>>>>> pass.run (); > >>>>>>>>>>> > >>>>>>>>>>> and could be inlined into the caller. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Addressed. > >>>>>>>>>>> Perhaps you could also add an early return like: > >>>>>>>>>>> > >>>>>>>>>>> if (!track_loads_p () && !track_stores_p ()) > >>>>>>>>>>> return; > >>>>>>>>>>> > >>>>>>>>>>> in pair_fusion::run () and then remove the corresponding code from > >>>>>>>>>>> pass_ldp_fusion::gate? > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Addressed. > >>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> ldp_fusion_destroy (); > >>>>>>>>>>>> } > >>>>>>>>>>>> -- > >>>>>>>>>>>> 2.39.3 > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Alex > >>>>>>>>>> > >>>>>>>>>> Thanks & Regards > >>>>>>>>>> Ajit