From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Wilco Dijkstra
To: GCC Patches, James Greenhalgh
CC: nd
Subject: Re: [PATCH][AArch64] Model Cortex-A53 load forwarding
Date: Thu, 20 Apr 2017 15:59:00 -0000
X-SW-Source: 2017-04/txt/msg00900.txt.bz2

ping

From: Wilco Dijkstra
Sent: 05 April 2017 13:29
To: GCC Patches
Cc: nd; James Greenhalgh
Subject: [PATCH][AArch64] Model Cortex-A53 load forwarding

Code scheduling for Cortex-A53 isn't as good as it could be.  It turns out
code runs faster overall if we place loads and stores with a dependency
closer together.  To achieve this effect, this patch adds a bypass between
cortex_a53_load1 and cortex_a53_load*/cortex_a53_store* if the result of an
earlier load is used in an address calculation.  This significantly improved
benchmark scores in a proprietary benchmark suite.

Passes AArch64 bootstrap and regress.  OK for stage 1?

ChangeLog:
2017-04-05  Wilco Dijkstra

        * config/arm/aarch-common.c (arm_early_load_addr_dep_ptr):
        New function.
        (arm_early_store_addr_dep_ptr): Likewise.
        * config/arm/aarch-common-protos.h
        (arm_early_load_addr_dep_ptr): Add prototype.
        (arm_early_store_addr_dep_ptr): Likewise.
        * config/arm/cortex-a53.md: Add new bypasses.
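
For illustration only (not part of the patch): a minimal sketch of the kind
of source code in which the result of one load feeds the address of a later
load or store, which is the dependency the description above refers to.  The
struct and function names are hypothetical, and whether the scheduler
actually applies the new bypass to code like this depends on the generated
RTL and the tuning options, which this sketch does not verify.

```c
#include <stddef.h>

/* Hypothetical list node, used only for illustration.  */
struct node
{
  struct node *next;
  long value;
};

/* Load -> load: each iteration loads n->next, and that pointer is the
   base address for the loads performed in the following iteration.  */
long
sum_list (const struct node *n)
{
  long sum = 0;

  for (; n != NULL; n = n->next)
    sum += n->value;

  return sum;
}

/* Load -> store: the pointer loaded from *slot is used as the address
   of the store that follows.  */
void
store_through (long **slot, long v)
{
  long *p = *slot;	/* The load produces a pointer-sized value.  */
  *p = v;		/* The store uses that value as its address.  */
}
```

In both functions the consumer needs the loaded pointer before it can form
its address, so the distance between producer and consumer matters to the
scheduler.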
---
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 8e9fb7a895b0a4aaf1585eb3368443899b061c9b..5298172e6b6930a110388a40a7533ff208a87095 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -30,7 +30,9 @@ extern bool aarch_rev16_p (rtx);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
 extern bool aarch_rev16_shright_mask_imm_p (rtx, machine_mode);
 extern int arm_early_load_addr_dep (rtx, rtx);
+extern int arm_early_load_addr_dep_ptr (rtx, rtx);
 extern int arm_early_store_addr_dep (rtx, rtx);
+extern int arm_early_store_addr_dep_ptr (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
 extern int arm_mac_accumulator_is_result (rtx, rtx);
 extern int arm_no_early_alu_shift_dep (rtx, rtx);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index dd37be0291a633f606d95ec8acacc598435828b3..74b80b272550028919c4274387944867ffed43d1 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -241,6 +241,24 @@ arm_early_load_addr_dep (rtx producer, rtx consumer)
   return reg_overlap_mentioned_p (value, addr);
 }
 
+/* Return nonzero if the CONSUMER instruction (a load) does need
+   a Pmode PRODUCER's value to calculate the address.  */
+
+int
+arm_early_load_addr_dep_ptr (rtx producer, rtx consumer)
+{
+  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
+  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
+
+  if (!value || !addr || !MEM_P (SET_SRC (value)))
+    return 0;
+
+  value = SET_DEST (value);
+  addr = SET_SRC (addr);
+
+  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
+}
+
 /* Return nonzero if the CONSUMER instruction (an ALU op) does not
    have an early register shift value or amount dependency on the
    result of PRODUCER.  */
@@ -336,6 +354,24 @@ arm_early_store_addr_dep (rtx producer, rtx consumer)
   return !arm_no_early_store_addr_dep (producer, consumer);
 }
 
+/* Return nonzero if the CONSUMER instruction (a store) does need
+   a Pmode PRODUCER's value to calculate the address.  */
+
+int
+arm_early_store_addr_dep_ptr (rtx producer, rtx consumer)
+{
+  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
+  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
+
+  if (!value || !addr || !MEM_P (SET_SRC (value)))
+    return 0;
+
+  value = SET_DEST (value);
+  addr = SET_DEST (addr);
+
+  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
+}
+
 /* Return non-zero iff the consumer (a multiply-accumulate or a
    multiple-subtract instruction) has an accumulator dependency on the
    result of the producer and no other dependency on that result.  It
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index b367ad403a4a641da34521c17669027b87092737..f8225f33c7a06485147b30fe2633309ac252d0c7 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -246,6 +246,16 @@
                  "cortex_a53_store*"
                  "arm_no_early_store_addr_dep")
 
+;; Model a bypass for load to load/store address.
+
+(define_bypass 3 "cortex_a53_load1"
+                "cortex_a53_load*"
+                "arm_early_load_addr_dep_ptr")
+
+(define_bypass 3 "cortex_a53_load1"
+                "cortex_a53_store*"
+                "arm_early_store_addr_dep_ptr")
+
 ;; Model a GP->FP register move as similar to stores.
 
 (define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"