From: Kyrylo Tkachov
To: Manos Anagnostakis, "gcc-patches@gcc.gnu.org"
CC: Philipp Tomsich
Subject: RE: [PATCH] aarch64: Fine-grained ldp and stp policies with test-cases.
Date: Mon, 25 Sep 2023 10:59:10 +0000
In-Reply-To: <20230818074943.41754-1-manos.anagnostakis@vrull.eu>

Hi Manos,

Apologies for the long delay.

> -----Original Message-----
> From: Manos Anagnostakis
> Sent: Friday, August 18, 2023 8:50 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Philipp Tomsich; Manos Anagnostakis
> Subject: [PATCH] aarch64: Fine-grained ldp and stp policies with test-cases.
>
> This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> to provide the requested behaviour for handling ldp and stp:
>
>   /* Allow the tuning structure to disable LDP instruction formation
>      from combining instructions (e.g., in peephole2).
>      TODO: Implement fine-grained tuning control for LDP and STP:
>        1. control policies for load and store separately;
>        2. support the following policies:
>           - default (use what is in the tuning structure)
>           - always
>           - never
>           - aligned (only if the compiler can prove that the
>             load will be aligned to 2 * element_size)  */
>
> It adds two new command-line options, -mldp-policy and -mstp-policy,
> so that load and store policies can be controlled separately, as
> required by part 1 of the TODO.
>
> The accepted values for both options are:
> - default: Use the ldp/stp policy defined in the corresponding tuning
>   structure.
> - always: Emit ldp/stp regardless of alignment.
> - never: Do not emit ldp/stp.
> - aligned: Emit ldp/stp only if the load/store is known to be
>   aligned to 2 * element_size.
>
> gcc/ChangeLog:
>	* config/aarch64/aarch64-protos.h (struct tune_params): Add
>	appropriate enums for the policies.
>	* config/aarch64/aarch64-tuning-flags.def
>	(AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
>	options.
>	* config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
>	function to parse ldp-policy option.
>	(aarch64_parse_stp_policy): New function to parse stp-policy option.
>	(aarch64_override_options_internal): Call parsing functions.
>	(aarch64_operands_ok_for_ldpstp): Add option-value check and
>	alignment check and remove superseded ones.
>	(aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
>	alignment check and remove superseded ones.
>	* config/aarch64/aarch64.opt: Add options.
>
> gcc/testsuite/ChangeLog:
>	* gcc.target/aarch64/ldp_aligned.c: New test.
>	* gcc.target/aarch64/ldp_always.c: New test.
>	* gcc.target/aarch64/ldp_never.c: New test.
>	* gcc.target/aarch64/stp_aligned.c: New test.
>	* gcc.target/aarch64/stp_always.c: New test.
>	* gcc.target/aarch64/stp_never.c: New test.
>
> Signed-off-by: Manos Anagnostakis
> ---
>
>  gcc/config/aarch64/aarch64-protos.h           |  24 ++
>  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
>  gcc/config/aarch64/aarch64.cc                 | 229 ++++++++++++++++++++++++++----
>  gcc/config/aarch64/aarch64.opt                |   8 +
>  .../gcc.target/aarch64/ldp_aligned.c          |  64 +++++
>  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  64 +++++
>  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  64 +++++
>  .../gcc.target/aarch64/stp_aligned.c          |  60 +++++
>  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +++++
>  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +++++
>  10 files changed, 580 insertions(+), 61 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 70303d6fd95..be1d73490ed 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -568,6 +568,30 @@ struct tune_params
>    /* Place prefetch struct pointer at the end to enable type checking
>       errors when tune_params misses elements (e.g., from erroneous merges).  */
>    const struct cpu_prefetch_tune *prefetch;
> +/* An enum specifying how to handle load pairs using a fine-grained policy:
> +   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
> +   to at least double the alignment of the type.
> +   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
> +   - LDP_POLICY_NEVER: Do not emit ldp.  */
> +
> +  enum aarch64_ldp_policy_model
> +  {
> +    LDP_POLICY_ALIGNED,
> +    LDP_POLICY_ALWAYS,
> +    LDP_POLICY_NEVER
> +  } ldp_policy_model;
> +/* An enum specifying how to handle store pairs using a fine-grained policy:
> +   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
> +   to at least double the alignment of the type.
> +   - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
> +   - STP_POLICY_NEVER: Do not emit stp.  */
> +
> +  enum aarch64_stp_policy_model
> +  {
> +    STP_POLICY_ALIGNED,
> +    STP_POLICY_ALWAYS,
> +    STP_POLICY_NEVER
> +  } stp_policy_model;
>  };
>
>  /* Classifies an address.
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 52112ba7c48..774568e9106 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -30,11 +30,6 @@
>
>  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
>
> -/* Don't create non-8 byte aligned load/store pair.  That is if the
> -two load/stores are not at least 8 byte aligned don't create load/store
> -pairs.   */
> -AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
> -
>  /* Some of the optional shift to some arthematic instructions are
>     considered cheap.  Logical shift left <=4 with or without a
>     zero extend are considered cheap.  Sign extend; non logical shift left
> @@ -44,9 +39,6 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
>  /* Disallow load/store pair instructions on Q-registers.  */
>  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
>
> -/* Disallow load-pair instructions to be formed in combine/peephole.  */
> -AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
> -
>  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
>
>  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 560e5431636..51c94804f12 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1356,7 +1356,9 @@ static const struct tune_params generic_tunings =
>       Neoverse V1.  It does not have a noticeable effect on A64FX and should
>       have at most a very minor effect on SVE2 cores.  */
>    (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params cortexa35_tunings =
> @@ -1390,7 +1392,9 @@ static const struct tune_params cortexa35_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params cortexa53_tunings =
> @@ -1424,7 +1428,9 @@ static const struct tune_params cortexa53_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params cortexa57_tunings =
> @@ -1458,7 +1464,9 @@ static const struct tune_params cortexa57_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params cortexa72_tunings =
> @@ -1492,7 +1500,9 @@ static const struct tune_params cortexa72_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params cortexa73_tunings =
> @@ -1526,7 +1536,9 @@ static const struct tune_params cortexa73_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>
> @@ -1561,7 +1573,9 @@ static const struct tune_params exynosm1_tunings =
>    48, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &exynosm1_prefetch_tune
> +  &exynosm1_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params thunderxt88_tunings =
> @@ -1593,8 +1607,10 @@ static const struct tune_params thunderxt88_tunings =
>    2, /* min_div_recip_mul_df.  */
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW), /* tune_flags.  */
> -  &thunderxt88_prefetch_tune
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  &thunderxt88_prefetch_tune,
> +  tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALIGNED /* stp_policy_model.  */
>  };
>
>  static const struct tune_params thunderx_tunings =
> @@ -1626,9 +1642,10 @@ static const struct tune_params thunderx_tunings =
>    2, /* min_div_recip_mul_df.  */
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW
> -   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
> -  &thunderx_prefetch_tune
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
> +  &thunderx_prefetch_tune,
> +  tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALIGNED /* stp_policy_model.  */
>  };
>
>  static const struct tune_params tsv110_tunings =
> @@ -1662,7 +1679,9 @@ static const struct tune_params tsv110_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &tsv110_prefetch_tune
> +  &tsv110_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params xgene1_tunings =
> @@ -1695,7 +1714,9 @@ static const struct tune_params xgene1_tunings =
>    17, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags.  */
> -  &xgene1_prefetch_tune
> +  &xgene1_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params emag_tunings =
> @@ -1728,7 +1749,9 @@ static const struct tune_params emag_tunings =
>    17, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags.  */
> -  &xgene1_prefetch_tune
> +  &xgene1_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params qdf24xx_tunings =
> @@ -1762,7 +1785,9 @@ static const struct tune_params qdf24xx_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
> -  &qdf24xx_prefetch_tune
> +  &qdf24xx_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  /* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
> @@ -1798,7 +1823,9 @@ static const struct tune_params saphira_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params thunderx2t99_tunings =
> @@ -1832,7 +1859,9 @@ static const struct tune_params thunderx2t99_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &thunderx2t99_prefetch_tune
> +  &thunderx2t99_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params thunderx3t110_tunings =
> @@ -1866,7 +1895,9 @@ static const struct tune_params thunderx3t110_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &thunderx3t110_prefetch_tune
> +  &thunderx3t110_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params neoversen1_tunings =
> @@ -1899,7 +1930,9 @@ static const struct tune_params neoversen1_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params ampere1_tunings =
> @@ -1935,8 +1968,10 @@ static const struct tune_params ampere1_tunings =
>    2, /* min_div_recip_mul_df.  */
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
> -  &ampere1_prefetch_tune
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  &ampere1_prefetch_tune,
> +  tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALIGNED /* stp_policy_model.  */
>  };
>
>  static const struct tune_params ampere1a_tunings =
> @@ -1973,8 +2008,10 @@ static const struct tune_params ampere1a_tunings =
>    2, /* min_div_recip_mul_df.  */
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
> -  &ampere1_prefetch_tune
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  &ampere1_prefetch_tune,
> +  tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALIGNED /* stp_policy_model.  */
>  };
>
>  static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
> @@ -2155,7 +2192,9 @@ static const struct tune_params neoversev1_tunings =
>     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
>     | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const sve_vec_cost neoverse512tvb_sve_vector_cost =
> @@ -2292,7 +2331,9 @@ static const struct tune_params neoverse512tvb_tunings =
>    (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
> @@ -2482,7 +2523,9 @@ static const struct tune_params neoversen2_tunings =
>     | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
> @@ -2672,7 +2715,9 @@ static const struct tune_params neoversev2_tunings =
>     | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags.  */
> -  &generic_prefetch_tune
> +  &generic_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  static const struct tune_params a64fx_tunings =
> @@ -2705,7 +2750,9 @@ static const struct tune_params a64fx_tunings =
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &a64fx_prefetch_tune
> +  &a64fx_prefetch_tune,
> +  tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model.  */
> +  tune_params::STP_POLICY_ALWAYS /* stp_policy_model.  */
>  };
>
>  /* Support for fine-grained override of the tuning structures.  */
> @@ -17645,6 +17692,50 @@ aarch64_parse_tune (const char *to_parse, const struct processor **res)
>    return AARCH_PARSE_INVALID_ARG;
>  }
>
> +/* Validate a command-line -mldp-policy option.  Parse the policy
> +   specified in STR and throw errors if appropriate.  */
> +
> +static bool
> +aarch64_parse_ldp_policy (const char *str, struct tune_params* tune)
> +{
> +  /* Check the value of the option to be one of the accepted.  */
> +  if (strcmp (str, "always") == 0)
> +    tune->ldp_policy_model = tune_params::LDP_POLICY_ALWAYS;
> +  else if (strcmp (str, "never") == 0)
> +    tune->ldp_policy_model = tune_params::LDP_POLICY_NEVER;
> +  else if (strcmp (str, "aligned") == 0)
> +    tune->ldp_policy_model = tune_params::LDP_POLICY_ALIGNED;
> +  else if (strcmp (str, "default") != 0)
> +    {
> +      error ("unknown value %qs for %<-mldp-policy%>", str);
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Validate a command-line -mstp-policy option.  Parse the policy
> +   specified in STR and throw errors if appropriate.  */
> +
> +static bool
> +aarch64_parse_stp_policy (const char *str, struct tune_params* tune)
> +{
> +  /* Check the value of the option to be one of the accepted.  */
> +  if (strcmp (str, "always") == 0)
> +    tune->stp_policy_model = tune_params::STP_POLICY_ALWAYS;
> +  else if (strcmp (str, "never") == 0)
> +    tune->stp_policy_model = tune_params::STP_POLICY_NEVER;
> +  else if (strcmp (str, "aligned") == 0)
> +    tune->stp_policy_model = tune_params::STP_POLICY_ALIGNED;
> +  else if (strcmp (str, "default") != 0)
> +    {
> +      error ("unknown value %qs for %<-mstp-policy%>", str);
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
>  /* Parse TOKEN, which has length LENGTH to see if it is an option
>     described in FLAG.  If it is, return the index bit for that fusion type.
>     If not, error (printing OPTION_NAME) and return zero.  */
> @@ -17993,6 +18084,14 @@ aarch64_override_options_internal (struct gcc_options *opts)
>      aarch64_parse_override_string (opts->x_aarch64_override_tune_string,
>                                     &aarch64_tune_params);
>
> +  if (opts->x_aarch64_ldp_policy_string)
> +    aarch64_parse_ldp_policy (opts->x_aarch64_ldp_policy_string,
> +                              &aarch64_tune_params);
> +
> +  if (opts->x_aarch64_stp_policy_string)
> +    aarch64_parse_stp_policy (opts->x_aarch64_stp_policy_string,
> +                              &aarch64_tune_params);
> +
>    /* This target defaults to strict volatile bitfields.  */
>    if (opts->x_flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
>      opts->x_flag_strict_volatile_bitfields = 1;
> @@ -26301,18 +26400,14 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>    enum reg_class rclass_1, rclass_2;
>    rtx mem_1, mem_2, reg_1, reg_2;
>
> -  /* Allow the tuning structure to disable LDP instruction formation
> -     from combining instructions (e.g., in peephole2).
> -     TODO: Implement fine-grained tuning control for LDP and STP:
> -       1. control policies for load and store separately;
> -       2. support the following policies:
> -          - default (use what is in the tuning structure)
> -          - always
> -          - never
> -          - aligned (only if the compiler can prove that the
> -            load will be aligned to 2 * element_size)  */
> -  if (load && (aarch64_tune_params.extra_tuning_flags
> -               & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
> +  /* If we have LDP_POLICY_NEVER, reject the load pair.  */
> +  if (load
> +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_NEVER)
> +    return false;
> +
> +  /* If we have STP_POLICY_NEVER, reject the store pair.  */
> +  if (!load
> +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_NEVER)
>      return false;
>
>    if (load)
> @@ -26339,13 +26434,22 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>    if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>      return false;
>
> -  /* If we have SImode and slow unaligned ldp,
> -     check the alignment to be at least 8 byte. */
> -  if (mode == SImode
> -      && (aarch64_tune_params.extra_tuning_flags
> -          & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
> +  /* If we have LDP_POLICY_ALIGNED,
> +     do not emit the load pair unless the alignment is checked to be
> +     at least double the alignment of the type.  */
> +  if (load
> +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_ALIGNED
>        && !optimize_size
> -      && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
> +      && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
> +    return false;
> +
> +  /* If we have STP_POLICY_ALIGNED,
> +     do not emit the store pair unless the alignment is checked to be
> +     at least double the alignment of the type.  */
> +  if (!load
> +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_ALIGNED
> +      && !optimize_size
> +      && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
>      return false;

I appreciate there is an existing use of optimize_size above, but the recommended way of checking this is optimize_function_for_size_p (cfun).

>
>    /* Check if the addresses are in the form of [base+offset].  */
> @@ -26475,6 +26579,16 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>    HOST_WIDE_INT offvals[num_insns], msize;
>    rtx mem[num_insns], reg[num_insns], base[num_insns], offset[num_insns];
>
> +  /* If we have LDP_POLICY_NEVER, reject the load pair.  */
> +  if (load
> +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_NEVER)
> +    return false;
> +
> +  /* If we have STP_POLICY_NEVER, reject the store pair.  */
> +  if (!load
> +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_NEVER)
> +    return false;
> +
>    if (load)
>      {
>        for (int i = 0; i < num_insns; i++)
> @@ -26564,13 +26678,22 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>    if (offvals[0] % msize != offvals[2] % msize)
>      return false;
>
> -  /* If we have SImode and slow unaligned ldp,
> -     check the alignment to be at least 8 byte. */
> -  if (mode == SImode
> -      && (aarch64_tune_params.extra_tuning_flags
> -          & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
> +  /* If we have LDP_POLICY_ALIGNED,
> +     do not emit the load pair unless the alignment is checked to be
> +     at least double the alignment of the type.  */
> +  if (load
> +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_ALIGNED
> +      && !optimize_size
> +      && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
> +    return false;
> +
> +  /* If we have STP_POLICY_ALIGNED,
> +     do not emit the store pair unless the alignment is checked to be
> +     at least double the alignment of the type.  */
> +  if (!load
> +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_ALIGNED
>        && !optimize_size
> -      && MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT)
> +      && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
>      return false;
>
>    return true;
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 4a0580435a8..e5302947ce7 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -205,6 +205,14 @@ msign-return-address=
>  Target WarnRemoved RejectNegative Joined Enum(aarch_ra_sign_scope_t) Var(aarch_ra_sign_scope) Init(AARCH_FUNCTION_NONE) Save
>  Select return address signing scope.
>
> +mldp-policy=
> +Target RejectNegative Joined Var(aarch64_ldp_policy_string) Save
> +Fine-grained policy for load pairs.
> +
> +mstp-policy=
> +Target RejectNegative Joined Var(aarch64_stp_policy_string) Save
> +Fine-grained policy for store pairs.

I'd like to avoid having -m* options for such low-level codegen tweaks. -m* options should be used for options that enable/disable user-visible features, ABI things, etc.
We have target-specific params these days, so I'd recommend you implement this in a similar way to --param=aarch64-autovec-preference=.
It will have to take a number rather than a string, but that should be okay, as long as the right values are documented in invoke.texi.

Otherwise the approach looks good.
Thanks,
Kyrill

> +
>  Enum
>  Name(aarch_ra_sign_scope_t) Type(enum aarch_function_type)
>  Supported AArch64 return address signing scope (for use with -msign-return-address= option):
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> new file mode 100644
> index 00000000000..895018f6b53
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> @@ -0,0 +1,64 @@
> +/* { dg-options "-O3 -mldp-policy=aligned" } */
> +
> +#include
> +#include
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define LDP_TEST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    a_0 = arr[0]; \
> +    a_1 = arr[1]; \
> +    return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a_0 = a[0]; \
> +    a_1 = a[1]; \
> +    return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1, a_2, a_3; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    a_0 = arr[100]; \
> +    a_1 = arr[101]; \
> +    a_2 = arr[102]; \
> +    a_3 = arr[103]; \
> +    return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1, a_2, a_3; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a_0 = a[100]; \
> +    a_1 = a[101]; \
> +    a_2 = a[102]; \
> +    a_3 = a[103]; \
> +    return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +LDP_TEST_ALIGNED(int32_t);
> +LDP_TEST_ALIGNED(int64_t);
> +LDP_TEST_ALIGNED(v4si);
> +LDP_TEST_UNALIGNED(int32_t);
> +LDP_TEST_UNALIGNED(int64_t);
> +LDP_TEST_UNALIGNED(v4si);
> +LDP_TEST_ADJUST_ALIGNED(int32_t);
> +LDP_TEST_ADJUST_ALIGNED(int64_t);
> +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_always.c b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> new file mode 100644
> index 00000000000..ead4fe41891
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> @@ -0,0 +1,64 @@
> +/* { dg-options "-O3 -mldp-policy=always" } */
> +
> +#include
> +#include
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define LDP_TEST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    a_0 = arr[0]; \
> +    a_1 = arr[1]; \
> +    return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a_0 = a[0]; \
> +    a_1 = a[1]; \
> +    return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1, a_2, a_3; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    a_0 = arr[100]; \
> +    a_1 = arr[101]; \
> +    a_2 = arr[102]; \
> +    a_3 = arr[103]; \
> +    return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1, a_2, a_3; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a_0 = a[100]; \
> +    a_1 = a[101]; \
> +    a_2 = a[102]; \
> +    a_3 = a[103]; \
> +    return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +LDP_TEST_ALIGNED(int32_t);
> +LDP_TEST_ALIGNED(int64_t);
> +LDP_TEST_ALIGNED(v4si);
> +LDP_TEST_UNALIGNED(int32_t);
> +LDP_TEST_UNALIGNED(int64_t);
> +LDP_TEST_UNALIGNED(v4si);
> +LDP_TEST_ADJUST_ALIGNED(int32_t);
> +LDP_TEST_ADJUST_ALIGNED(int64_t);
> +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_never.c b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> new file mode 100644
> index 00000000000..aae2f087241
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> @@ -0,0 +1,64 @@
> +/* { dg-options "-O3 -mldp-policy=never" } */
> +
> +#include
> +#include
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define LDP_TEST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    a_0 = arr[0]; \
> +    a_1 = arr[1]; \
> +    return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a_0 = a[0]; \
> +    a_1 = a[1]; \
> +    return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1, a_2, a_3; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    a_0 = arr[100]; \
> +    a_1 = arr[101]; \
> +    a_2 = arr[102]; \
> +    a_3 = arr[103]; \
> +    return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> +    TYPE a_0, a_1, a_2, a_3; \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a_0 = a[100]; \
> +    a_1 = a[101]; \
> +    a_2 = a[102]; \
> +    a_3 = a[103]; \
> +    return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +LDP_TEST_ALIGNED(int32_t);
> +LDP_TEST_ALIGNED(int64_t);
> +LDP_TEST_ALIGNED(v4si);
> +LDP_TEST_UNALIGNED(int32_t);
> +LDP_TEST_UNALIGNED(int64_t);
> +LDP_TEST_UNALIGNED(v4si);
> +LDP_TEST_ADJUST_ALIGNED(int32_t);
> +LDP_TEST_ADJUST_ALIGNED(int64_t);
> +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_aligned.c b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> new file mode 100644
> index 00000000000..07b49629292
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> @@ -0,0 +1,60 @@
> +/* { dg-options "-O3 -mstp-policy=aligned" } */
> +
> +#include
> +#include
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define STP_TEST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    arr[0] = x; \
> +    arr[1] = x; \
> +    return arr; \
> +}
> +
> +#define STP_TEST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a[0] = x; \
> +    a[1] = x; \
> +    return a; \
> +}
> +
> +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    arr[100] = x; \
> +    arr[101] = x; \
> +    arr[102] = x; \
> +    arr[103] = x; \
> +    return arr; \
> +}
> +
> +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a[100] = x; \
> +    a[101] = x; \
> +    a[102] = x; \
> +    a[103] = x; \
> +    return a; \
> +}
> +
> +STP_TEST_ALIGNED(int32_t);
> +STP_TEST_ALIGNED(int64_t);
> +STP_TEST_ALIGNED(v4si);
> +STP_TEST_UNALIGNED(int32_t);
> +STP_TEST_UNALIGNED(int64_t);
> +STP_TEST_UNALIGNED(v4si);
> +STP_TEST_ADJUST_ALIGNED(int32_t);
> +STP_TEST_ADJUST_ALIGNED(int64_t);
> +STP_TEST_ADJUST_UNALIGNED(int32_t);
> +STP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_always.c b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> new file mode 100644
> index 00000000000..6a1c671f02c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> @@ -0,0 +1,60 @@
> +/* { dg-options "-O3 -mstp-policy=always" } */
> +
> +#include
> +#include
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define STP_TEST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    arr[0] = x; \
> +    arr[1] = x; \
> +    return arr; \
> +}
> +
> +#define STP_TEST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a[0] = x; \
> +    a[1] = x; \
> +    return a; \
> +}
> +
> +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    arr[100] = x; \
> +    arr[101] = x; \
> +    arr[102] = x; \
> +    arr[103] = x; \
> +    return arr; \
> +}
> +
> +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a[100] = x; \
> +    a[101] = x; \
> +    a[102] = x; \
> +    a[103] = x; \
> +    return a; \
> +}
> +
> +STP_TEST_ALIGNED(int32_t);
> +STP_TEST_ALIGNED(int64_t);
> +STP_TEST_ALIGNED(v4si);
> +STP_TEST_UNALIGNED(int32_t);
> +STP_TEST_UNALIGNED(int64_t);
> +STP_TEST_UNALIGNED(v4si);
> +STP_TEST_ADJUST_ALIGNED(int32_t);
> +STP_TEST_ADJUST_ALIGNED(int64_t);
> +STP_TEST_ADJUST_UNALIGNED(int32_t);
> +STP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_never.c b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> new file mode 100644
> index 00000000000..9cd703995b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> @@ -0,0 +1,60 @@
> +/* { dg-options "-O3 -mstp-policy=never" } */
> +
> +#include
> +#include
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define STP_TEST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    arr[0] = x; \
> +    arr[1] = x; \
> +    return arr; \
> +}
> +
> +#define STP_TEST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a[0] = x; \
> +    a[1] = x; \
> +    return a; \
> +}
> +
> +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    arr[100] = x; \
> +    arr[101] = x; \
> +    arr[102] = x; \
> +    arr[103] = x; \
> +    return arr; \
> +}
> +
> +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> +    TYPE *a = arr+1; \
> +    a[100] = x; \
> +    a[101] = x; \
> +    a[102] = x; \
> +    a[103] = x; \
> +    return a; \
> +}
> +
> +STP_TEST_ALIGNED(int32_t);
> +STP_TEST_ALIGNED(int64_t);
> +STP_TEST_ALIGNED(v4si);
> +STP_TEST_UNALIGNED(int32_t);
> +STP_TEST_UNALIGNED(int64_t);
> +STP_TEST_UNALIGNED(v4si);
> +STP_TEST_ADJUST_ALIGNED(int32_t);
> +STP_TEST_ADJUST_ALIGNED(int64_t);
> +STP_TEST_ADJUST_UNALIGNED(int32_t);
> +STP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> +
> --
> 2.40.1