* [PATCH] Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms
@ 2023-11-22 4:15 liuhongt
2023-11-30 8:38 ` Hongtao Liu
From: liuhongt @ 2023-11-22 4:15 UTC
To: gcc-patches; +Cc: crazylht, hjl.tools, Zhang, Annita
From: "Zhang, Annita" <annita.zhang@intel.com>
Avoid_fma_chain is already enabled for m_SAPPHIRERAPIDS, m_ALDERLAKE
and m_CORE_HYBRID. It can also be enabled for m_GENERIC to improve
the performance of -march=x86-64-v3/v4, where -mtune=generic is the
default. The SPEC2017 benchmark 510.parest_r benefits greatly from
it: in our experiments, a single-thread run with -O2
-march=x86-64-v3 improves by 26% on SPR and by 15% on Zen3.
Meanwhile, it caused no notable regression on earlier platforms,
including Cascade Lake and Ice Lake Server.
On znver4, it looks like fadd (3 cycles) is still faster than fma (4
cycles), so in theory avoid_fma_chain should also be better for
znver4. And according to [1], enabling fma_chain is not a generic win
on znver4?
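
To make the latency argument concrete, here is a minimal sketch (not
part of the patch). The dot() loop is the kind of reduction the tuning
is about; the two helper functions are a rough critical-path model
only. The function names and the multiply latency passed to
split_chain_cycles() are assumptions for illustration; the fadd and
fma latencies (3 and 4 cycles) are the figures quoted above.

```c
#include <stddef.h>

/* A reduction whose accumulator is carried from one iteration to the
   next.  If the multiply and add are contracted into an FMA, the whole
   a[i] * b[i] + sum operation sits on the loop-carried path; if they
   stay separate, only the add does.  */
static double
dot (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Rough critical-path estimate when every iteration is one chained
   FMA: each FMA has to wait for the previous result.  */
static unsigned long
fused_chain_cycles (unsigned long n, unsigned fma_latency)
{
  return n * fma_latency;
}

/* Rough critical-path estimate when the multiply is issued on its own:
   the multiplies do not depend on the accumulator and can be pipelined,
   so only the first one is exposed (mul_latency is an assumed value).  */
static unsigned long
split_chain_cycles (unsigned long n, unsigned mul_latency,
                    unsigned add_latency)
{
  return n == 0 ? 0 : mul_latency + n * add_latency;
}
```

Under this simplified model, 1000 iterations cost 4000 cycles as an
FMA chain and 3003 cycles in the split form (assuming a 3-cycle
multiply). This ignores throughput limits, vector width and unrolling.
To compare the code GCC actually generates for such a loop without
rebuilding the compiler, the tuning can be toggled with
-mtune-ctrl=avoid_fma256_chains or -mtune-ctrl=^avoid_fma256_chains.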
----cut from [1]---------------
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in
zen4 this flag may not be a win except for very specific benchmarks. I
am still doing some more detailed testing here.
-----cut end--------------
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607962.html
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog
* config/i386/x86-tune.def (AVOID_256FMA_CHAINS): Add
m_GENERIC.
---
gcc/config/i386/x86-tune.def | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 43fa9e8fd6d..a2e57e01550 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -521,7 +521,7 @@ DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2
/* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
smaller FMA chain. */
DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3
- | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
+ | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
/* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
smaller FMA chain. */
--
2.31.1
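
For readers unfamiliar with x86-tune.def, the change above simply ORs
one more processor bit into the mask of the tuning flag. A
self-contained sketch of that idea follows; the enum, the M() macro
and tune_enabled() are hypothetical stand-ins, not GCC's actual
definitions, and only the set of processors is taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical processor indices; GCC's real list is much longer.  */
enum proc
{
  PROC_ZNVER2,
  PROC_ZNVER3,
  PROC_CORE_HYBRID,
  PROC_SAPPHIRERAPIDS,
  PROC_CORE_ATOM,
  PROC_GENERIC,
  PROC_ICELAKE_SERVER
};

/* One mask bit per processor (stand-in for the m_* macros).  */
#define M(p) (1u << (p))

/* Mask of the tuning flag before the patch, as shown in the diff.  */
#define AVOID_256FMA_BEFORE \
  (M (PROC_ZNVER2) | M (PROC_ZNVER3) | M (PROC_CORE_HYBRID) \
   | M (PROC_SAPPHIRERAPIDS) | M (PROC_CORE_ATOM))

/* Mask after the patch: the same set plus the generic tuning.  */
#define AVOID_256FMA_AFTER (AVOID_256FMA_BEFORE | M (PROC_GENERIC))

/* A tuning flag applies when the selected -mtune processor's bit is
   set in the flag's mask.  */
static bool
tune_enabled (unsigned mask, enum proc tune)
{
  return (mask & M (tune)) != 0;
}
```

With this model, tune_enabled (AVOID_256FMA_AFTER, PROC_GENERIC) is
true while the same query against AVOID_256FMA_BEFORE is false, and
processors that were already in the mask are unaffected.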
* Re: [PATCH] Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms
2023-11-22 4:15 [PATCH] Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms liuhongt
@ 2023-11-30 8:38 ` Hongtao Liu
From: Hongtao Liu @ 2023-11-30 8:38 UTC
To: Richard Biener, Jan Hubicka; +Cc: gcc-patches, hjl.tools, Zhang, Annita
Any comments?
--
BR,
Hongtao