From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05on2045.outbound.protection.outlook.com [40.107.20.45]) by sourceware.org (Postfix) with ESMTPS id 847A4385840A for ; Wed, 10 Apr 2024 14:23:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 847A4385840A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 847A4385840A Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.20.45 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1712758993; cv=pass; b=hfcjvD11yZDV3NGDCZga65Gk32BtdIlJu2/hO1/2WKLEAtf6+dQCX5RnZoTwx5DR0x3wfaJ+L0CuvKJZpDk3bZX7wOYzEjGJ4LUghpfzTwpxNYdYoYL1Zl06abDIhuy2leFIDAOIbzR2pg1eRIILqV758imMjfYJUxe4Sjp5VQs= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1712758993; c=relaxed/simple; bh=L4nc1SOQWW+gurARetwLgiIzNPosDNvbabynOnyqfGA=; h=DKIM-Signature:DKIM-Signature:Date:From:To:Subject:Message-ID: MIME-Version; b=EBZryc+NVg6uY2BgS44fdVXSKA2sNaTPuBe2Qbv6jxPuGGYLHb+pPvz1wSTjNm8jZe6NTOPUWTSM1E2DO4xKPfHRI9Mp076bp0DlXluoZB9a+1nsPZnRu4rt/GaMKyz/j/8MfoJMeByIyJJbtk7JQy8mMwLsn6+6vfCy0SU+DE4= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=OwakHthuHzfEMYCR9LzI68QK8cCi9NI3iZuI6+VxbfkkkAv+0Z51jdL9F1Fbjycb4si2u/IqRRAPsXp0MLDQVNsOw9yTe2T7Du/5qh0wnqIMuHYP/BF8Dzu1w3GzZgpuWPEcQJvXmwaVsXzZI960ryuCTMgLCcSyGZtTOorV9mmo5tk0xjhkOEMeSkMSg+0r8AWZ/yC6c7y30mUq2Dya6wb49LqZH9ioPp31eKyg3wt+BrghBHc2AD/t2JIpPvVPcyRGlCpvYUNQT5T1oXsFiIXT5P2ru1aF149fVgcIOQzqA/73AclWCAGuoxO1luG614KJc+EGx14wnRgz4kbdAw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=QUCK40j/MKbmAltgkpt0LRa+HZ3KZMwNAU4kq3URtlw=; b=DNIF2c5Dr6ZLuJnked75ac6NPbzOSxp8wI6B9a0DeLo9OaSEaXbIdxjsr3xUCOJpNCHS0YqfxPo83CBVtheh7URMPKgqqBMLxKrOGARk9D3plYdTHuoDGgMCsS9knZ/W3o8qdjlljvwA9B0onWIxOtHUcR7W7Iz5S0rmHAIjh0QmyDEXotS2JpQ6LeN2Us0PWyuM4E0WxXGj1hTQyUJxyEZMSzIFupTpIPrqhtIeX5fIi49pVpUP/TNvJU/04+EatZRwXfZkT3dbJCMjyHB4kREQKfgZsh7KimePDLAgv4Zpkx8pTxUGU3t4w/0LjKYUUekIVQ+eBTPda5yyjHCESQ== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=QUCK40j/MKbmAltgkpt0LRa+HZ3KZMwNAU4kq3URtlw=; b=K/asQl2ZvhzWM3E8YNai1w1cvqJM1g1oEKfF8FqkrEYAix3aHLF4EuBFwmqZhCnLfFSKkrSgL4n6cJ0EqvcsZQZZD1UO5GLQ3ci5IcxdLbErfoVYkPnOU7anl53vGUWYe2W/wIMyfjk4kJZ0tTwMxKVCDyC6h0pIUAxsc9x1uNM= Received: from DUZP191CA0063.EURP191.PROD.OUTLOOK.COM (2603:10a6:10:4fa::7) by VI0PR08MB10538.eurprd08.prod.outlook.com (2603:10a6:800:205::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7409.46; Wed, 10 Apr 2024 14:23:06 +0000 Received: from DB1PEPF00039233.eurprd03.prod.outlook.com (2603:10a6:10:4fa:cafe::62) by DUZP191CA0063.outlook.office365.com (2603:10a6:10:4fa::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7452.30 via Frontend Transport; Wed, 10 Apr 2024 14:23:05 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DB1PEPF00039233.mail.protection.outlook.com (10.167.8.106) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7452.22 via Frontend Transport; Wed, 10 Apr 2024 14:23:05 +0000 Received: ("Tessian outbound e26069fc76b9:v300"); Wed, 10 Apr 2024 14:23:05 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: fdd05763612484c3 X-CR-MTA-TID: 64aa7808 Received: from 7089f17375cc.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id C3DB048C-8AFE-46A6-BA95-55FAFF310ABF.1; Wed, 10 Apr 2024 14:22:59 +0000 Received: from EUR03-AM7-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 7089f17375cc.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Wed, 10 Apr 2024 14:22:59 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=aWkIvnVM9TLUqy4LIteXa82+hsX5I4tCQj0345OM1xAUAOlGDMGW/af3Y4ol9mJWgGLQLsazZMPCbO+GYsevOb+WF0FawzI4m9VBWr8V1+Q4sH4/TYLQ6VY9gHUR1MV00rK7rpL7J9X+W1vIbwiQ1T8CF7XzCG8XAQgrG40K9gkNmaU6gvBPTyPx7HL7x59TX+QhQna2Gdx0oMUHljrMktbfz/G7Mrip4oLSvxhTqOkKdFTlkknAvPyez89Ip5klVKd/k0amfajnqX/wROq5D2vPUMnF+h4wrTUeug1HbkdOUxRDWn4ZHHU9FyiBO8FH1H/AJf1qJ0QMkLR+GFVPkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=QUCK40j/MKbmAltgkpt0LRa+HZ3KZMwNAU4kq3URtlw=; b=jBl5d8d/NoGhEJ46Ex7gwpxG1WGQVnTADJa2q3qq8Qp0mkSmFhTcveJaeqAwQiCqwjgpaWs38OE8QQBjTRp7lX3+P+6gan8zesJ+Sfe1aYKBvCzNDJtayPBdIBuT9DyCuwgj+P4XQ7/5RapG5Do+46nRijewcZLTnBU6giLirtx1mC49fQX2Uu55IdgY72NygIAmoldc8IqYwD0++5ZnDHQNRIrJ8igdo3gHUeFNRshdiVKy8pkLcCFHXvwdM0eoEUD2SJEjmm69OLB+J6wlP5t4Tt+RGdjBh4psIDkPIzPhTDrLbHERm7arNgBXghBMruEOwsQIvjngr7JukNdryA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=QUCK40j/MKbmAltgkpt0LRa+HZ3KZMwNAU4kq3URtlw=; b=K/asQl2ZvhzWM3E8YNai1w1cvqJM1g1oEKfF8FqkrEYAix3aHLF4EuBFwmqZhCnLfFSKkrSgL4n6cJ0EqvcsZQZZD1UO5GLQ3ci5IcxdLbErfoVYkPnOU7anl53vGUWYe2W/wIMyfjk4kJZ0tTwMxKVCDyC6h0pIUAxsc9x1uNM= Received: from PAWPR08MB8958.eurprd08.prod.outlook.com (2603:10a6:102:33e::15) by DBBPR08MB5884.eurprd08.prod.outlook.com (2603:10a6:10:1f6::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7409.46; Wed, 10 Apr 2024 14:22:57 +0000 Received: from PAWPR08MB8958.eurprd08.prod.outlook.com ([fe80::9561:db03:3f6c:5634]) by PAWPR08MB8958.eurprd08.prod.outlook.com ([fe80::9561:db03:3f6c:5634%5]) with mapi id 15.20.7409.042; Wed, 10 Apr 2024 14:22:57 +0000 Date: Wed, 10 Apr 2024 15:22:54 +0100 From: Alex Coplan To: Ajit Agarwal Cc: Richard Sandiford , "Kewen.Lin" , Segher Boessenkool , Michael Meissner , David Edelsohn , Peter Bergner , gcc-patches Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file Message-ID: References: <54ab422a-4ffa-4ca9-8029-eb23c42ed653@linux.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: AM0PR02CA0209.eurprd02.prod.outlook.com (2603:10a6:20b:28f::16) To PAWPR08MB8958.eurprd08.prod.outlook.com (2603:10a6:102:33e::15) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: PAWPR08MB8958:EE_|DBBPR08MB5884:EE_|DB1PEPF00039233:EE_|VI0PR08MB10538:EE_ x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: jAwhX+MTqFt6sgiXgvqruVy2HeTt4x4gqAVnGb+Boz99SAvj9PJe3EdxlAzX3wulT4fu8d443Dbn49p62NNisxhPex4s0HOSplnVHmweJ/dvlrbRT3LVM9Qsv8TnKAh6tffoxUDdi2H3yn1WvElJQnILrz5UFqzavzWGglmbNTY7cMACNYIL6awJveXjPFkwl9pfNtSiq3xnWILJUWJJqhz461EQJy3b+Ym7GWZJ5dGHh7pLQKfwUjOASi4FxG7VWvnqUuhDBvtEmLuV5XCqrJ0IwQvCPDHqXfB4Iq583TLNu3xX1hoLJ6SvGLVwNdOmw6x96KKn+SQqhXepUxb7A1tTGpvoJfHKbJpNssp9TmPqC97so/polGVLxDVnyA857F1Ra50uj/jEWEh+pkl6zhlpsGidCudgTO7du1n0rSSL7Ra7Kfg6SnzOP/BIwUdfkOFbkYbFDcuVdTK/HKgn5wVQNPTsn/OYMRFLqLPJshXeqfike+zxUnhZ+lAvD7xdHul30+cJSZ7bTngHruxTSQHcZ0VaJAeoxxlyCOxJ7ZFbjpPdDoj/d6tmyK0Cm7abkYnR4ww0QJWPE9z4ONySLKeZXw+jAYDTa2NUFxsWFYXHazQJ6ntteEfahE+mjcS8cfMlEpr4wPl3XYfchngcjAWQxz+l7Dm7zVYjZtxnobc= X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:PAWPR08MB8958.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(1800799015)(366007)(376005);DIR:OUT;SFP:1102; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DBBPR08MB5884 X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB1PEPF00039233.eurprd03.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 570e1628-65cc-4d12-36db-08dc5969bcf2 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: VAHz92lb+hk/BSS7fbQJ9FwgpGw4JHUrS0LvzsFJpRj1p9NJseOD7FKAAkprZPvhHSLaDCx5axfdxX1k/y4R2KKaWOsoThou4G/WnsqlIieSU0ZOXbrviAoHybeOaxCiJStT+NKhHi8d2IpoCsdFfeJL7je7LA4ToYjo2s+GT+pX7eHhzgupZn5fwEBOhK81ZOtXuzhzehtj+jZ/IKlPYYAT4UjVrrDWjzDhH++A8mTAXabHiZLtAb1ckosOBr0lj7/Kzqr/L9vTFBYkV32bY/wjUvZs5OTQF684UYPwQmLXShYCjplHyPMCLCT8eRW7uSvIbsmFYL3OOyA/ScXPn8EZuGT2jhoEJsts7ldw6+TlAgxGyu7cWrE99Zx+h5F0MlELBj2IurAToTSelIfSfBYqisL+qgCdtLT4qY64DLupNyEew98Ctx1LntZOR4ieMWMKKwZS/7Gk2Ci/3l3utkUUnzyjvPwLmeK8u7znPMzwNLsdliWcUoG6JqV/VlRjq7Xrnq1AX58ZOgQjscVwROQ7ib2ywB+RlR7Ik7SnoseFzZ3F7NiQjXxC4Bz7frDA09OkE4ts6IX9RjUL89OnKScgFqWzRCnlUJjJ7hVaBjC+EnXtyI/RIrj6L16HB/dQo/ekdQO2nJfW5F9N/IiznDe4lVI+Oke133/YMD6/PzuE7RVuDKgQyP3QWWKMS4zN4FsH6fMAQ6jGeNwEPhN+HI4kWl9ce+lojg7UHfD0q6fJ6Ksyo6Vu3M72Zklz0IT/ X-Forefront-Antispam-Report: CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230031)(82310400014)(1800799015)(36860700004)(376005);DIR:OUT;SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Apr 2024 14:23:05.6196 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 570e1628-65cc-4d12-36db-08dc5969bcf2 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DB1PEPF00039233.eurprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI0PR08MB10538 X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,KAM_DMARC_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,UNPARSEABLE_RELAY autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Ajit, On 10/04/2024 15:31, Ajit Agarwal wrote: > Hello Alex: > > On 10/04/24 1:42 pm, Alex Coplan wrote: > > Hi Ajit, > > > > On 09/04/2024 20:59, Ajit Agarwal wrote: > >> Hello Alex: > >> > >> On 09/04/24 8:39 pm, Alex Coplan wrote: > >>> On 09/04/2024 20:01, Ajit Agarwal wrote: > >>>> Hello Alex: > >>>> > >>>> On 09/04/24 7:29 pm, Alex Coplan wrote: > >>>>> On 09/04/2024 17:30, Ajit Agarwal wrote: > >>>>>> > >>>>>> > >>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote: > >>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote: > >>>>>>>> Hello Alex/Richard: > >>>>>>>> > >>>>>>>> All review comments are incorporated. > >>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list, > >>>>>>>> // of accesses. If we find two sets of adjacent accesses, call > >>>>>>>> // merge_pairs. > >>>>>>>> void > >>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>> - access_group &group) > >>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>> + access_group &group) > >>>>>>>> { > >>>>>>>> const auto lfs = decode_lfs (encoded_lfs); > >>>>>>>> const unsigned access_size = lfs.size; > >>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>> access.cand_insns, > >>>>>>>> lfs.load_p, > >>>>>>>> access_size); > >>>>>>>> - skip_next = access.cand_insns.empty (); > >>>>>>>> + skip_next = bb_state->cand_insns_empty_p (access.cand_insns); > >>>>>>> > >>>>>>> As above, why is this needed? > >>>>>> > >>>>>> For rs6000 we want to return always true. as load store pair > >>>>>> that are to be merged with 8/16 16/32 32/64 is occuring for rs6000. > >>>>>> And we want load store pair to 8/16 32/64. Thats why we want > >>>>>> to generate always true for rs6000 to skip pairs as above. > >>>>> > >>>>> Hmm, sorry, I'm not sure I follow. Are you saying that for rs6000 you have > >>>>> load/store pair instructions where the two arms of the access are storing > >>>>> operands of different sizes? Or something else? > >>>>> > >>>>> As it stands the logic is to skip the next iteration only if we > >>>>> exhausted all the candidate insns for the current access. In the case > >>>>> that we didn't exhaust all such candidates, then the idea is that when > >>>>> access becomes prev_access, we can attempt to use those candidates as > >>>>> the "left-hand side" of a pair in the next iteration since we failed to > >>>>> use them as the "right-hand side" of a pair in the current iteration. > >>>>> I don't see why you wouldn't want that behaviour. Please can you > >>>>> explain? > >>>>> > >>>> > >>>> In merge_pair we get the 2 load candiates one load from 0 offset and > >>>> other load is from 16th offset. Then in next iteration we get load > >>>> from 16th offset and other load from 32 offset. In next iteration > >>>> we get load from 32 offset and other load from 48 offset. > >>>> > >>>> For example: > >>>> > >>>> Currently we get the load candiates as follows. > >>>> > >>>> pairs: > >>>> > >>>> load from 0th offset. > >>>> load from 16th offset. > >>>> > >>>> next pairs: > >>>> > >>>> load from 16th offset. > >>>> load from 32th offset. > >>>> > >>>> next pairs: > >>>> > >>>> load from 32th offset > >>>> load from 48th offset. > >>>> > >>>> Instead in rs6000 we should get: > >>>> > >>>> pairs: > >>>> > >>>> load from 0th offset > >>>> load from 16th offset. > >>>> > >>>> next pairs: > >>>> > >>>> load from 32th offset > >>>> load from 48th offset. > >>> > >>> Hmm, so then I guess my question is: why wouldn't you consider merging > >>> the pair with offsets (16,32) for rs6000? Is it because you have a > >>> stricter alignment requirement on the base pair offsets (i.e. they have > >>> to be a multiple of 32 when the operand size is 16)? So the pair > >>> offsets have to be a multiple of the entire pair size rather than a > >>> single operand size> > >> > >> We get load pair at a certain point with (0,16) and other program > >> point we get load pair (32, 48). > >> > >> In current implementation it takes offsets loads as (0, 16), > >> (16, 32), (32, 48). > >> > >> But In rs6000 we want the load pair to be merged at different points > >> as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with > >> 0 offset and other load (32, 48) with lxvp with 32 offset. > >> > >> In current case it will merge with lxvp with 0 offset and lxvp with > >> 16 offset, then lxvp with 32 offset and lxvp with 48 offset which > >> is incorrect in our case as the (16-32) case 16 offset will not > >> load from even register and break for rs6000. > > > > Sorry, I think I'm still missing something here. Why does the address offset > > affect the parity of the tranfser register? ISTM they needn't be related at > > all (and indeed we can't even know the parity of the transfer register before > > RA, but perhaps you're only intending on running the pass after RA?) > > > > We have load pair with (0,16) wherein these loads are adjacent and > replaced with lxvp. > > Semantic of lxvp instruction is that it loads adjacent load pair in > even register and even_register + 1. > > We replace the above load pair with lxvp instruction and then we > dont need to merge (16,32) as (0, 16) is already merged and instead > we merge (32,48). Ok, but the existing logic should already account for this. I.e. if we successfully merge (0,16), then we don't attempt to merge (16,32). We'd only attempt to merge (16,32) if the merge of (0,16) failed (for whatever reason). So I don't see that there's anything specific to lxvp that requires this logic to change, _unless_ you have a stricter alignment requirement on the offsets as I mentioned before. > > Yes you are correct, the addresss offset doesn't affect the parity of > the register transfer as we are doing fusion pass before RA. > If that is the case then I think it would be better to introduce a > >>> virtual function (say pair_offset_alignment_ok_p) that vets the base > >>> offset of the pair (prev_access->offset in transform_for_base). I guess > >>> it would also take access_size as a parameter and for aarch64 it should > >>> check: > >>> > >>> multiple_p (offset, access_size) > >>> > >>> and for rs6000 it could check: > >>> > >>> multiple_p (offset, access_size * 2) > >>> > >>> and we would then incorporate a call to that predicate in the else if > >>> condition of tranform_for_base. > >>> > >>> It would have the same limitation whereby we assume that MEM_EXPR offset > >>> alignment is a good proxy for RTL offset alignment, but we already > >>> depend on that with the multiple_p check in track_via_mem_expr. > >>> > I have addressed the above hooks and it worked fine with both rs6000 > and aarch64. I am sending subsequent patch in some time that address > above. > > > How do you plan on handling this even-odd requirement for rs6000? > > > > We plan to handle with V16QI subreg: 0 and V16QI subreg : 16 to > generate register pair and thats what we generate and implement > in rs6000 target > code. Ah, this is coming back to me now. Sorry, I should have remembered this from the previous discussion with Richard S. Apologies for going on a slight tangent here, but if you're running before RA are you planning to create a new OImode pseudo register for the lxvp insn and then somehow update uses of the old transfer registers to replace them with subregs of that OImode pseudo? Or do you just plan or replacing the individual loads with moves (instead of deleting them)? I guess the latter would be simpler and might work better in the presence of hard regs. Thanks, Alex > > Thanks & Regards > Ajit > > Thanks, > > Alex > > > >> > >> lxvp should load from even registers and then loaded value will > >> be in even register and even register +1 (which is odd). > >> > >> Thanks & Regards > >> Ajit > >>> If that is the case then I think it would be better to introduce a > >>> virtual function (say pair_offset_alignment_ok_p) that vets the base > >>> offset of the pair (prev_access->offset in transform_for_base). I guess > >>> it would also take access_size as a parameter and for aarch64 it should > >>> check: > >>> > >>> multiple_p (offset, access_size) > >>> > >>> and for rs6000 it could check: > >>> > >>> multiple_p (offset, access_size * 2) > >>> > >>> and we would then incorporate a call to that predicate in the else if > >>> condition of tranform_for_base. > >>> > >>> It would have the same limitation whereby we assume that MEM_EXPR offset > >>> alignment is a good proxy for RTL offset alignment, but we already > >>> depend on that with the multiple_p check in track_via_mem_expr. > >>> > >>> Thanks, > >>> Alex > >>> > >>>> > >>>> Thanks & Regards > >>>> Ajit > >>>>> Thanks, > >>>>> Alex > >>>>> > >>>>>> > >>>>>>> > >>>>>>>> } > >>>>>>>> prev_access = &access; > >>>>>>>> } > >>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs, > >>>>>>>> // and remove all the tombstone insns, being sure to reparent any uses > >>>>>>>> // of mem to previous defs when we do this. > >>>>>>>> void > >>>>>>>> -ldp_bb_info::cleanup_tombstones () > >>>>>>>> +pair_fusion_bb_info::cleanup_tombstones () > >>>>>>>> { > >>>>>>>> // No need to do anything if we didn't emit a tombstone insn for this BB. > >>>>>>>> if (!m_emitted_tombstone) > >>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones () > >>>>>>>> > >>>>>>>> template > >>>>>>>> void > >>>>>>>> -ldp_bb_info::traverse_base_map (Map &map) > >>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map) > >>>>>>>> { > >>>>>>>> for (auto kv : map) > >>>>>>>> { > >>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map) > >>>>>>>> } > >>>>>>>> > >>>>>>>> void > >>>>>>>> -ldp_bb_info::transform () > >>>>>>>> +pair_fusion_bb_info::transform () > >>>>>>>> { > >>>>>>>> traverse_base_map (expr_map); > >>>>>>>> traverse_base_map (def_map); > >>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn) > >>>>>>>> // for load/store candidates. If running after RA, also try and promote > >>>>>>>> // non-writeback pairs to use writeback addressing. Then try to fuse > >>>>>>>> // candidates into pairs. > >>>>>>>> -void ldp_fusion_bb (bb_info *bb) > >>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb) > >>>>>>>> { > >>>>>>>> - const bool track_loads > >>>>>>>> - = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; > >>>>>>>> - const bool track_stores > >>>>>>>> - = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; > >>>>>>>> + const bool track_loads = track_load_p (); > >>>>>>>> + const bool track_stores = track_store_p (); > >>>>>>>> > >>>>>>>> - ldp_bb_info bb_state (bb); > >>>>>>> > >>>>>>> This: > >>>>>>> > >>>>>>>> + aarch64_pair_fusion derived; > >>>>>>> > >>>>>>> can be deleted and then: > >>>>>>> > >>>>>>>> + pair_fusion_bb_info bb_info (bb, &derived); > >>>>>>> > >>>>>>> can just be: > >>>>>>> > >>>>>>> pair_fusion_bb_info bb_info (bb, this); > >>>>>>> > >>>>>>> (or you can pass *this if you make bb_info take a reference). > >>>>>>> > >>>>>>> I don't think there's a particular need to change the variable name > >>>>>>> (bb_state -> bb_info). I chose the former because it doens't clash > >>>>>>> with the RTL-SSA structure of the same name as the latter. > >>>>>>> > >>>>>> > >>>>>> Addressed. > >>>>>>>> > >>>>>>>> for (auto insn : bb->nondebug_insns ()) > >>>>>>>> { > >>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb) > >>>>>>>> continue; > >>>>>>>> > >>>>>>>> rtx pat = PATTERN (rti); > >>>>>>>> - if (reload_completed > >>>>>>>> - && aarch64_ldp_writeback > 1 > >>>>>>>> - && GET_CODE (pat) == PARALLEL > >>>>>>>> - && XVECLEN (pat, 0) == 2) > >>>>>>>> + if (pair_mem_promote_writeback_p (pat)) > >>>>>>>> try_promote_writeback (insn); > >>>>>>> > >>>>>>> It looks like try_promote_writeback itself will need some further work > >>>>>>> to make it target-independent. I suppose this check: > >>>>>>> > >>>>>>> auto rti = insn->rtl (); > >>>>>>> const auto attr = get_attr_ldpstp (rti); > >>>>>>> if (attr == LDPSTP_NONE) > >>>>>>> return; > >>>>>>> > >>>>>>> bool load_p = (attr == LDPSTP_LDP); > >>>>>>> gcc_checking_assert (load_p || attr == LDPSTP_STP); > >>>>>>> > >>>>>>> will need to become part of the pair_mem_promote_writeback_p hook that you > >>>>>>> added, potentially changing it to return a boolean for load_p. > >>>>>>> > >>>>>>> Then I guess we will need hooks for destructuring the pair insn and > >>>>>>> another hook to wrap aarch64_gen_writeback_pair. > >>>>>>> > >>>>>> > >>>>>> Addressed. > >>>>>>>> > >>>>>>>> if (GET_CODE (pat) != SET) > >>>>>>>> continue; > >>>>>>>> > >>>>>>>> if (track_stores && MEM_P (XEXP (pat, 0))) > >>>>>>>> - bb_state.track_access (insn, false, XEXP (pat, 0)); > >>>>>>>> + bb_info.track_access (insn, false, XEXP (pat, 0)); > >>>>>>>> else if (track_loads && MEM_P (XEXP (pat, 1))) > >>>>>>>> - bb_state.track_access (insn, true, XEXP (pat, 1)); > >>>>>>>> + bb_info.track_access (insn, true, XEXP (pat, 1)); > >>>>>>>> } > >>>>>>>> > >>>>>>>> - bb_state.transform (); > >>>>>>>> - bb_state.cleanup_tombstones (); > >>>>>>>> + bb_info.transform (); > >>>>>>>> + bb_info.cleanup_tombstones (); > >>>>>>>> } > >>>>>>>> > >>>>>>>> void ldp_fusion () > >>>>>>>> { > >>>>>>>> ldp_fusion_init (); > >>>>>>>> + pair_fusion *pfuse; > >>>>>>>> + aarch64_pair_fusion derived; > >>>>>>>> + pfuse = &derived; > >>>>>>> > >>>>>>> This is indeed the one place where I think it is acceptable to > >>>>>>> instantiate aarch64_pair_fusion. But again there's no need to create a > >>>>>>> pointer to the parent class, just call any function you like directly. > >>>>>>> > >>>>>> > >>>>>> Addressed. > >>>>>>>> > >>>>>>>> for (auto bb : crtl->ssa->bbs ()) > >>>>>>>> - ldp_fusion_bb (bb); > >>>>>>>> + pfuse->ldp_fusion_bb (bb); > >>>>>>> > >>>>>>> I think even the code to iterate over bbs should itself be a member > >>>>>>> function of pair_fusion (say "run") and then that becomes part of the > >>>>>>> generic code. > >>>>>>> > >>>>>>> So this function would just become: > >>>>>>> > >>>>>>> aarch64_pair_fusion pass; > >>>>>>> pass.run (); > >>>>>>> > >>>>>>> and could be inlined into the caller. > >>>>>>> > >>>>>> > >>>>>> Addressed. > >>>>>>> Perhaps you could also add an early return like: > >>>>>>> > >>>>>>> if (!track_loads_p () && !track_stores_p ()) > >>>>>>> return; > >>>>>>> > >>>>>>> in pair_fusion::run () and then remove the corresponding code from > >>>>>>> pass_ldp_fusion::gate? > >>>>>>> > >>>>>> > >>>>>> Addressed. > >>>>>> > >>>>>>>> > >>>>>>>> ldp_fusion_destroy (); > >>>>>>>> } > >>>>>>>> -- > >>>>>>>> 2.39.3 > >>>>>>>> > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Alex > >>>>>> > >>>>>> Thanks & Regards > >>>>>> Ajit