[gcc r12-9250] Enable 512 bit vector for zen4

From: Jan Hubicka @ 2023-03-14  8:48 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:2166a87cefde15ec7798914095d9a61566e7fccd

commit r12-9250-g2166a87cefde15ec7798914095d9a61566e7fccd
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Feb 7 05:23:00 2023 +0100

    Enable 512 bit vector for zen4
    
    While 512-bit registers are internally split into two 256-bit halves,
    512-bit vectors reduce the number of instructions to retire and have a
    chance to improve parallelism.  A few tsvc benchmarks improve
    significantly:
    
               runtime
    benchmark  256bit  512bit   change
    s2275      48.57   20.67    -58%
    s311       32.29   16.06    -50%
    s312       32.30   16.07    -50%
    vsumr      32.30   16.07    -50%
    s314       10.77   5.42     -50%
    s313       21.52   10.85    -50%
    vdotr      43.05   21.69    -50%
    s316       10.80   5.64     -48%
    s235       61.72   33.91    -45%
    s161       15.91   9.95     -38%
    s3251      32.13   20.31    -36%
    
    No benchmark regresses beyond noise.  The basic matrix
    multiplication loop improves by 32%.  512-bit vectors are also expected
    to be more power efficient (I can't measure that).
    
    The downside is that loops with low trip counts may get slower, because
    the unvectorized prologue and epilogue are hit more often.  With SPECfp
    this problem happens with x264 (12% regression) and bwaves (6%
    regression); it is tracked in
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
    and will need more work on the vectorizer to support masked epilogues.
    
    After some additional testing, it seems that using 512-bit vectors by
    default is now the better choice overall.
    
    Bootstrapped/regtested x86_64-linux. Plan to commit it tomorrow.
    
            * config/i386/x86-tune.def (X86_TUNE_AVX256_OPTIMAL): Turn off
            for znver4.
    
    (cherry picked from commit a7502c4a614238ac3f80271886b217b156bdf923)
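
The low-trip-count concern in the message above can be illustrated with a
little arithmetic.  The following is only a sketch, not vectorizer code:
the helper name split_trip_count and the loop_split struct are hypothetical,
and the model is simplified (no alignment prologue, and the epilogue is
assumed to be fully scalar, whereas GCC may in practice vectorize it with
narrower vectors).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: how a loop of n iterations is divided between the
   vector body and a scalar epilogue for a given vector width.  */
typedef struct
{
  size_t vector_iters;  /* iterations of the vectorized loop body */
  size_t scalar_iters;  /* leftover iterations handled one element at a time */
} loop_split;

loop_split
split_trip_count (size_t n, size_t vector_bits, size_t elem_bits)
{
  /* Precondition: at least one element fits into a vector.  */
  assert (elem_bits != 0 && vector_bits >= elem_bits);

  size_t lanes = vector_bits / elem_bits;  /* elements per vector */
  loop_split s = { n / lanes, n % lanes };
  return s;
}
```

Under these assumptions, a loop over 15 floats runs one 256-bit vector
iteration plus 7 scalar iterations, but with 512-bit vectors it never enters
the vector body and runs all 15 iterations in the scalar epilogue.  For 1000
floats the split is 125 + 0 versus 62 + 8, so the epilogue cost is
negligible.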

Diff:
---
 gcc/config/i386/x86-tune.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 8c3c1b41e79..9f15ae7bb2e 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -537,7 +537,7 @@ DEF_TUNE (X86_TUNE_AVX128_OPTIMAL, "avx128_optimal", m_BDVER | m_BTVER2
 
 /* X86_TUNE_AVX256_OPTIMAL: Use 256-bit AVX instructions instead of 512-bit AVX
    instructions in the auto-vectorizer.  */
-DEF_TUNE (X86_TUNE_AVX256_OPTIMAL, "avx256_optimal", m_CORE_AVX512 | m_ZNVER4)
+DEF_TUNE (X86_TUNE_AVX256_OPTIMAL, "avx256_optimal", m_CORE_AVX512)
 
 /* X86_TUNE_AVX256_SPLIT_REGS: if true, AVX512 ops are split into two AVX256 ops.  */
 DEF_TUNE (X86_TUNE_AVX512_SPLIT_REGS, "avx512_split_regs", m_ZNVER4)
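
One way to observe the effect of this tuning change is to compile a small
reduction kernel with both vector widths and compare the generated assembly.
The function below is an illustrative dot product written for this purpose;
it is not the TSVC source, and the file name dot.c is an assumption.  The
-mprefer-vector-width option is an existing GCC x86 option that overrides
the preferred width chosen by the tuning.

```c
#include <stddef.h>

/* Illustrative kernel, assumed to live in dot.c.  Compare, for example:

     gcc -O3 -ffast-math -march=znver4 -mprefer-vector-width=256 -S dot.c
     gcc -O3 -ffast-math -march=znver4 -mprefer-vector-width=512 -S dot.c

   -ffast-math is used because vectorizing a floating-point reduction
   requires permission to reassociate the additions.  */
float
dot (const float *a, const float *b, size_t n)
{
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```

With the 512-bit setting the loop body would be expected to use zmm
registers rather than ymm registers; after this change that should also be
the default choice for -march=znver4 without the explicit option.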
