From: Kyrylo Tkachov
To: Philipp Tomsich, "gcc-patches@gcc.gnu.org"
CC: Di Zhao
Subject: RE: [PATCH v2] aarch64: disable LDP via tuning structure for -mcpu=ampere1/1a
Date: Mon, 17 Apr 2023 09:56:22 +0000
References: <20230414180543.1497603-1-philipp.tomsich@vrull.eu>
In-Reply-To: <20230414180543.1497603-1-philipp.tomsich@vrull.eu>

> -----Original Message-----
> From: Philipp Tomsich
> Sent: Friday, April 14, 2023 7:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Philipp Tomsich; Di Zhao
> Subject: [PATCH v2] aarch64: disable LDP via tuning structure for -mcpu=ampere1/1a
>
> AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
> Given the chance that this causes instructions to slip into the next
> decoding cycle and the additional overheads when handling
> cacheline-crossing LDP instructions, we disable the generation of LDP
> instructions through the tuning structure from instruction combining
> (such as in peephole2).
>
> Given the code-density benefits in builtins and prologue/epilogue
> expansion, we allow LDPs there.
>
> This commit:
> * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
> * allows -moverride=tune=... to override this
>
> These changes are benchmark-driven, yielding the following changes
> (with a net-overall improvement):
>   503.bwaves_r      -0.88%
>   507.cactuBSSN_r    0.35%
>   508.namd_r         3.09%
>   510.parest_r      -2.99%
>   511.povray_r       5.54%
>   519.lbm_r         15.83%
>   521.wrf_r          0.56%
>   526.blender_r      2.47%
>   527.cam4_r         0.70%
>   538.imagick_r      0.00%
>   544.nab_r         -0.33%
>   549.fotonik3d_r   -0.42%
>   554.roms_r         0.00%
>   -------------------------
>   = total            1.79%
>
> Signed-off-by: Philipp Tomsich
> Co-Authored-By: Di Zhao

Ok.
Thanks,
Kyrill

>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-tuning-flags.def
>   (AARCH64_EXTRA_TUNING_OPTION):
>   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
>   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
>   Check for the above tuning option when processing loads.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
>
> ---
>
> Changes in v2:
> - apply both to -mcpu=ampere1 and -mcpu=ampere1a
> - add TODO: tag, per discussions on the mailing list
> - add testcase
>
>  gcc/config/aarch64/aarch64-tuning-flags.def   |  3 +++
>  gcc/config/aarch64/aarch64.cc                 | 18 ++++++++++++++++--
>  .../aarch64/ampere1-no_ldp_combine.c          | 11 +++++++++++
>  3 files changed, 30 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
>
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 712895a5263..52112ba7c48 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
>  /* Disallow load/store pair instructions on Q-registers. */
>  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
>
> +/* Disallow load-pair instructions to be formed in combine/peephole. */
> +AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
> +
>  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
>
>  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index f4ef22ce02f..0f04ab9fba0 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1933,7 +1933,7 @@ static const struct tune_params ampere1_tunings =
>    2, /* min_div_recip_mul_df. */
>    0, /* max_case_values. */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> -  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> +  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */
>    &ampere1_prefetch_tune
>  };
>
> @@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings =
>    2, /* min_div_recip_mul_df. */
>    0, /* max_case_values. */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> -  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> +  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */
>    &ampere1_prefetch_tune
>  };
>
> @@ -26053,6 +26053,20 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>    enum reg_class rclass_1, rclass_2;
>    rtx mem_1, mem_2, reg_1, reg_2;
>
> +  /* Allow the tuning structure to disable LDP instruction formation
> +     from combining instructions (e.g., in peephole2).
> +     TODO: Implement fine-grained tuning control for LDP and STP:
> +           1. control policies for load and store separately;
> +           2. support the following policies:
> +              - default (use what is in the tuning structure)
> +              - always
> +              - never
> +              - aligned (only if the compiler can prove that the
> +                load will be aligned to 2 * element_size) */
> +  if (load && (aarch64_tune_params.extra_tuning_flags
> +               & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
> +    return false;
> +
>    if (load)
>      {
>        mem_1 = operands[1];
> diff --git a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> new file mode 100644
> index 00000000000..bc871f4481d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> @@ -0,0 +1,11 @@
> +/* { dg-options "-O3 -mtune=ampere1" } */
> +
> +long
> +foo (long a[])
> +{
> +  return a[0] + a[1];
> +}
> +
> +/* We should see two ldrs instead of one ldp. */
> +/* { dg-final { scan-assembler {\tldr\t} } } */
> +/* { dg-final { scan-assembler-not {\tldp\t} } } */
> --
> 2.34.1
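
For readers following the thread, the gate added to aarch64_operands_ok_for_ldpstp can be modelled outside of GCC roughly as below. This is only a sketch under stated assumptions: the sketch_* names, the flag values, and the reduced structure are hypothetical stand-ins (the real flags are generated from aarch64-tuning-flags.def, and the real function performs many more operand checks). It shows only that the flag rejects load pairs formed by combining while leaving store pairs untouched.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the tuning flags; the values here are
   illustrative and are not the ones GCC generates.  */
enum sketch_extra_tune
{
  SKETCH_EXTRA_TUNE_NONE = 0,
  SKETCH_EXTRA_TUNE_NO_LDP_STP_QREGS = 1u << 0,
  SKETCH_EXTRA_TUNE_NO_LDP_COMBINE = 1u << 1
};

/* Hypothetical, much-reduced tuning structure holding only the flags.  */
struct sketch_tune_params
{
  unsigned int extra_tuning_flags;
};

/* Return true if two adjacent accesses may be fused into a pair.
   Only the tuning-flag gate from the patch is modelled: loads are
   rejected when NO_LDP_COMBINE is set, stores are never affected.  */
static bool
sketch_pair_ok_for_combine (const struct sketch_tune_params *tune, bool load)
{
  if (load && (tune->extra_tuning_flags & SKETCH_EXTRA_TUNE_NO_LDP_COMBINE))
    return false;

  return true;
}
```

With a structure whose flags are SKETCH_EXTRA_TUNE_NO_LDP_COMBINE, the helper returns false for load == true and true for load == false; with SKETCH_EXTRA_TUNE_NONE it returns true for both.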