From: Wilco Dijkstra
To: Richard Sandiford
CC: GCC Patches
Subject: Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]
Date: Wed, 23 Nov 2022 18:10:01 +0000

Hi Richard,

>> A smart reassociation pass could form more FMAs while also increasing
>> parallelism, but the way it currently works always results in fewer FMAs.
>
> Yeah, as Richard said, that seems the right long-term fix.
> It would also avoid the hack of treating PLUS_EXPR as a signal
> of an FMA, which has the drawback of assuming (for 2-FMA cores)
> that plain addition never benefits from reassociation in its own right.

True but it's hard to separate them. You will have a mix of FADD and FMAs
to reassociate (since FMA still counts as an add), and the ratio between
them as well as the number of operations may affect the best reassociation
width.

> Still, I guess the hackiness is pre-existing and the patch is removing
> the hackiness for some cores, so from that point of view it's a strict
> improvement over the status quo. And it's too late in the GCC 13
> cycle to do FMA reassociation properly. So I'm OK with the patch
> in principle, but could you post an update with more commentary?

Sure, here is an update with a longer comment in aarch64_reassociation_width:


Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
FMA pipes.
This improves SPECFP2017 on Neoverse V1 by ~1.5%.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
        PR 107413
        * config/aarch64/aarch64.cc (struct tune_params): Add
        fma_reassoc_width to all CPU tuning structures.
        (aarch64_reassociation_width): Use fma_reassoc_width.
        * config/aarch64/aarch64-protos.h (struct tune_params): Add
        fma_reassoc_width.

---
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 238820581c5ee7617f8eed1df2cf5418b1127e19..4be93c93c26e091f878bc8e4cf06e90888405fb2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -540,6 +540,7 @@ struct tune_params
   const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
+  int fma_reassoc_width;
   int vec_reassoc_width;
   int min_div_recip_mul_sf;
   int min_div_recip_mul_df;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index c91df6f5006c257690aafb75398933d628a970e1..15d478c77ceb2d6c52a70b6ffd8fdadcfa8deba0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings =
   "4", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings =
   "8", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings =
   "16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1710,6 +1721,7 @@ static const struct tune_params emag_tunings =
   "16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1743,6 +1755,7 @@ static const struct tune_params qdf24xx_tunings =
   "16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1778,6 +1791,7 @@ static const struct tune_params saphira_tunings =
   "16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   1, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1811,6 +1825,7 @@ static const struct tune_params thunderx2t99_tunings =
   "16", /* loop_align. */
   3, /* int_reassoc_width. */
   2, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1844,6 +1859,7 @@ static const struct tune_params thunderx3t110_tunings =
   "16", /* loop_align. */
   3, /* int_reassoc_width. */
   2, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1876,6 +1892,7 @@ static const struct tune_params neoversen1_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1912,6 +1929,7 @@ static const struct tune_params ampere1_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -1949,6 +1967,7 @@ static const struct tune_params ampere1a_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -2126,6 +2145,7 @@ static const struct tune_params neoversev1_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  4, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -2263,6 +2283,7 @@ static const struct tune_params neoverse512tvb_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  4, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -2451,6 +2472,7 @@ static const struct tune_params neoversen2_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -2640,6 +2662,7 @@ static const struct tune_params neoversev2_tunings =
   "32:16", /* loop_align. */
   3, /* int_reassoc_width. */
   6, /* fp_reassoc_width. */
+  4, /* fma_reassoc_width. */
   3, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -2675,6 +2698,7 @@ static const struct tune_params a64fx_tunings =
   "32", /* loop_align. */
   4, /* int_reassoc_width. */
   2, /* fp_reassoc_width. */
+  1, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
@@ -3387,9 +3411,15 @@ aarch64_reassociation_width (unsigned opc, machine_mode mode)
     return aarch64_tune_params.vec_reassoc_width;
   if (INTEGRAL_MODE_P (mode))
     return aarch64_tune_params.int_reassoc_width;
-  /* Avoid reassociating floating point addition so we emit more FMAs. */
-  if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR)
-    return aarch64_tune_params.fp_reassoc_width;
+  /* Reassociation reduces the number of FMAs which may result in worse
+     performance. Use a per-CPU setting for FMA reassociation which allows
+     narrow CPUs with few FP pipes to switch it off (value of 1), and wider
+     CPUs with many FP pipes to enable reassociation.
+     Since the reassociation pass doesn't understand FMA at all, assume
+     that any FP addition might turn into FMA. */
+  if (FLOAT_MODE_P (mode))
+    return opc == PLUS_EXPR ? aarch64_tune_params.fma_reassoc_width
+                            : aarch64_tune_params.fp_reassoc_width;
   return 1;
 }
 
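For readers who want to see the effect of the last hunk in isolation, here is a minimal standalone sketch of the width selection. It is not GCC code: `Op`, `ModeClass`, `TuneWidths` and `reassociation_width` are hypothetical stand-ins for GCC's tree codes, machine-mode predicates and `aarch64_tune_params`. The vector-mode check is assumed, since its condition lies outside the hunk shown above. The two sets of widths are copied from the generic and Neoverse V1 entries in the patch.

```cpp
// Hypothetical stand-ins for GCC's tree codes and machine-mode classes.
enum class Op { Plus, Mult, Other };
enum class ModeClass { Vector, Integer, Float, Other };

// Only the reassociation fields of tune_params are modelled here,
// in the same order as in the patched struct.
struct TuneWidths {
  int int_reassoc_width;
  int fp_reassoc_width;
  int fma_reassoc_width;
  int vec_reassoc_width;
};

// Follows the order of checks in the patched function. The vector test
// is an assumption: only its return statement is visible in the hunk.
int reassociation_width(const TuneWidths &t, Op opc, ModeClass mode) {
  if (mode == ModeClass::Vector)
    return t.vec_reassoc_width;
  if (mode == ModeClass::Integer)
    return t.int_reassoc_width;
  // Any FP addition is treated as a potential FMA, so it uses the
  // FMA width; every other FP operation uses the plain FP width.
  if (mode == ModeClass::Float)
    return opc == Op::Plus ? t.fma_reassoc_width : t.fp_reassoc_width;
  return 1;
}

// Widths taken from the generic_tunings and neoversev1_tunings hunks.
constexpr TuneWidths generic_widths{2, 4, 1, 1};
constexpr TuneWidths neoversev1_widths{2, 4, 4, 2};
```

With these values an FP addition gets width 1 under the generic tuning, which matches the old behaviour of not reassociating it, and width 4 under the Neoverse V1 tuning, while an FP multiplication gets width 4 under both.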