From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Sandiford
To: Di Zhao OS
Mail-Followup-To: Di Zhao OS, "gcc-patches@gcc.gnu.org", richard.sandiford@arm.com
Cc: "gcc-patches@gcc.gnu.org"
Subject: Re: [PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA'
Date: Fri, 29 Dec 2023 10:23:46 +0000
In-Reply-To: (Di Zhao's message of "Wed, 27 Dec 2023 10:40:57 +0000")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Di Zhao OS writes:
> This patch adds a new tuning option 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA',
> to consider fully pipelined FMAs in reassociation. Also, set this option
> by default for Ampere CPUs.
>
> Tested on aarch64-unknown-linux-gnu. Is this OK for trunk?
>
> Thanks,
> Di Zhao
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
>         New tuning option AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
>         * config/aarch64/aarch64.cc (aarch64_override_options_internal): Set
>         param_fully_pipelined_fma according to tuning option.
>         * config/aarch64/tuning_models/ampere1.h: Add
>         AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA to tune_flags.
>         * config/aarch64/tuning_models/ampere1a.h: Likewise.
>         * config/aarch64/tuning_models/ampere1b.h: Likewise.
>
> ---
>  gcc/config/aarch64/aarch64-tuning-flags.def | 2 ++
>  gcc/config/aarch64/aarch64.cc               | 6 ++++++
>  gcc/config/aarch64/tuning_models/ampere1.h  | 3 ++-
>  gcc/config/aarch64/tuning_models/ampere1a.h | 3 ++-
>  gcc/config/aarch64/tuning_models/ampere1b.h | 3 ++-
>  5 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> index f28a73839a6..256f17bad60 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -49,4 +49,6 @@ AARCH64_EXTRA_TUNING_OPTION ("matched_vector_throughput", MATCHED_VECTOR_THROUGH
>
>  AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
>
> +AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_FMA", FULLY_PIPELINED_FMA)

Could you change this to all-lowercase, i.e. fully_pipelined_fma,
for consistency with avoid_cross_loop_fma above?

> +
>  #undef AARCH64_EXTRA_TUNING_OPTION
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index f9850320f61..1b3b288cdf9 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18289,6 +18289,12 @@ aarch64_override_options_internal (struct gcc_options *opts)
>      SET_OPTION_IF_UNSET (opts, &global_options_set, param_avoid_fma_max_bits,
>                           512);
>
> +  /* Consider fully pipelined FMA in reassociation. */
> +  if (aarch64_tune_params.extra_tuning_flags
> +      & AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
> +    SET_OPTION_IF_UNSET (opts, &global_options_set, param_fully_pipelined_fma,
> +                         1);
> +
>    aarch64_override_options_after_change_1 (opts);
>  }
>
> diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h
> index a144e8f94b3..d63788528a7 100644
> --- a/gcc/config/aarch64/tuning_models/ampere1.h
> +++ b/gcc/config/aarch64/tuning_models/ampere1.h
> @@ -104,7 +104,8 @@ static const struct tune_params ampere1_tunings =
>    2, /* min_div_recip_mul_df. */
>    0, /* max_case_values. */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> -  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */
> +  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
> +   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA), /* tune_flags. */

Formatting nit, but GCC style is to put the "|" at the start of the
following line:

  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA
   | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA), /* tune_flags. */

Same for the others.

OK with those changes, thanks.

Richard

>    &ampere1_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */
>    AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */
> diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h
> index f688ed08a79..63506e1d1c6 100644
> --- a/gcc/config/aarch64/tuning_models/ampere1a.h
> +++ b/gcc/config/aarch64/tuning_models/ampere1a.h
> @@ -56,7 +56,8 @@ static const struct tune_params ampere1a_tunings =
>    2, /* min_div_recip_mul_df. */
>    0, /* max_case_values. */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> -  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */
> +  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
> +   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA), /* tune_flags. */
>    &ampere1_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */
>    AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */
> diff --git a/gcc/config/aarch64/tuning_models/ampere1b.h b/gcc/config/aarch64/tuning_models/ampere1b.h
> index a98b6a980f7..7894e730174 100644
> --- a/gcc/config/aarch64/tuning_models/ampere1b.h
> +++ b/gcc/config/aarch64/tuning_models/ampere1b.h
> @@ -106,7 +106,8 @@ static const struct tune_params ampere1b_tunings =
>    0, /* max_case_values. */
>    tune_params::AUTOPREFETCHER_STRONG, /* autoprefetcher_model. */
>    (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND |
> -   AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */
> +   AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
> +   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA), /* tune_flags. */
>    &ampere1b_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */
>    AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */
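
For readers less familiar with the mechanism the patch relies on, here is a
minimal standalone sketch of the pattern: a list of (name, identifier) pairs
is expanded into one bit flag per tuning option, a tuning table ORs the flags
together, and a parameter default is applied only when the user has not set
it. Every identifier below (the SKETCH_ and sketch_ names, the inline option
list standing in for the .def file, and sketch_set_if_unset) is hypothetical
and chosen for illustration; this is not the GCC source, and the real macros
may differ in detail.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a .def file: each entry pairs a lowercase
// user-visible name with an identifier suffix, as the review requests.
#define SKETCH_TUNING_OPTIONS(X)                     \
  X ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)   \
  X ("fully_pipelined_fma", FULLY_PIPELINED_FMA)

// First expansion: one consecutive index per option.
enum sketch_tune_index
{
#define X(NAME, ID) SKETCH_TUNE_##ID##_index,
  SKETCH_TUNING_OPTIONS (X)
#undef X
  SKETCH_TUNE_index_END
};

// Second expansion: one distinct bit per option, derived from its index.
enum sketch_tune_flag : unsigned
{
#define X(NAME, ID) SKETCH_TUNE_##ID = 1u << SKETCH_TUNE_##ID##_index,
  SKETCH_TUNING_OPTIONS (X)
#undef X
  SKETCH_TUNE_NONE = 0
};

// Third expansion: the user-visible names, indexed like the options.
static const char *const sketch_tune_names[] = {
#define X(NAME, ID) NAME,
  SKETCH_TUNING_OPTIONS (X)
#undef X
};

// A cut-down tuning table holding only the flag word.
struct sketch_tune_params
{
  unsigned extra_tuning_flags;
};

// Flags combined with "|" at the start of the continuation line,
// which is the layout asked for in the review.
static const sketch_tune_params sketch_ampere1_tunings = {
  (SKETCH_TUNE_AVOID_CROSS_LOOP_FMA
   | SKETCH_TUNE_FULLY_PIPELINED_FMA)
};

// A parameter that remembers whether the user set it explicitly.
struct sketch_param
{
  int value;
  bool set_by_user;
};

// Apply a tuning default without overriding an explicit user choice.
static void
sketch_set_if_unset (sketch_param &param, int value)
{
  if (!param.set_by_user)
    param.value = value;
}

// Test whether a tuning table requests the fully-pipelined-FMA behaviour.
static bool
sketch_fully_pipelined_fma_p (const sketch_tune_params &tune)
{
  return (tune.extra_tuning_flags & SKETCH_TUNE_FULLY_PIPELINED_FMA) != 0;
}
```

In this sketch a caller would check sketch_fully_pipelined_fma_p on the active
tuning table and, if it holds, call sketch_set_if_unset with the value 1, so
that an explicit user setting still takes precedence over the CPU default.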