public inbox for gcc-patches@gcc.gnu.org
From: Jan Hubicka <hubicka@ucw.cz>
To: gcc-patches@gcc.gnu.org, mjambor@suse.cz
Subject: Enable AVX512 512bit vectors by default on Zen4
Date: Sun, 5 Feb 2023 01:42:13 +0100
Message-ID: <Y977ZULaCk6iP74+@kam.mff.cuni.cz>

Hi,
this patch makes the auto-vectorizer use 512-bit AVX-512 vectors by
default on Zen4.  While 512-bit registers are internally split into two
256-bit halves, 512-bit vectors reduce the number of instructions to
retire and have a chance to improve parallelism.
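
As a rough illustration (a sketch, not taken from the patch or from the
TSVC sources), the sum reduction below is the kind of loop the vsumr
benchmark measures.  Assuming 32-bit floats, a 256-bit vector holds 8
lanes and a 512-bit vector 16, so the main vector loop runs about half
as many iterations.  The build lines in the comment are an assumed way
to compare the two widths by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Sum reduction similar in spirit to the TSVC vsumr kernel (a sketch,
   not the benchmark source).  Vectorizing a float reduction reorders
   the additions, so on x86 GCC generally needs -fassociative-math,
   e.g. via -ffast-math.  One might compare vector widths with:
     gcc -O3 -ffast-math -march=znver4 -mprefer-vector-width=256 -c sum.c
     gcc -O3 -ffast-math -march=znver4 -mprefer-vector-width=512 -c sum.c
   With 16 floats per 512-bit vector instead of 8 per 256-bit vector,
   the main vector loop needs roughly half as many iterations.  */
float sum_floats(const float *a, size_t n)
{
    assert(a != NULL || n == 0);   /* an empty array may be NULL */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```
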
There are a few TSVC benchmarks that improve significantly:

            runtime
benchmark   256-bit   512-bit   change
s2275       48.57     20.67     -58%
s311        32.29     16.06     -50%
s312        32.30     16.07     -50%
vsumr       32.30     16.07     -50%
s314        10.77      5.42     -50%
s313        21.52     10.85     -50%
vdotr       43.05     21.69     -50%
s316        10.80      5.64     -48%
s235        61.72     33.91     -45%
s161        15.91      9.95     -38%
s3251       32.13     20.31     -36%
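
The change column appears to be the relative difference in runtime,
(t512 - t256) / t256.  A minimal sketch of that arithmetic; the function
names are illustrative only.

```c
#include <assert.h>

/* Relative runtime change in percent going from the 256-bit build to
   the 512-bit build; negative values mean the 512-bit build is faster.  */
double pct_change(double t256, double t512)
{
    assert(t256 > 0.0);
    return (t512 - t256) / t256 * 100.0;
}

/* Equivalent speedup factor of the 512-bit build.  */
double speedup(double t256, double t512)
{
    assert(t512 > 0.0);
    return t256 / t512;
}
```

For s311, pct_change(32.29, 16.06) is about -50.3, i.e. a speedup of
roughly 2.0x.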

There are no benchmarks with regressions beyond noise.  The basic matrix
multiplication loop improves by 32% (for 1000x1000 matrices).  512-bit
vectors are also expected to be more power efficient (I can't measure
that).
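
The measured matrix multiplication loop is not part of the patch; the
following is an assumed stand-in (the double element type and the i-k-j
loop order are guesses), showing the kind of unit-stride inner loop the
vectorizer can widen from 4 to 8 doubles per vector.

```c
#include <assert.h>
#include <stddef.h>

/* Naive n x n matrix multiplication, c += a * b, all row-major.
   The i-k-j loop order makes the innermost loop a unit-stride
   multiply-add over rows of b and c, which GCC's auto-vectorizer can
   turn into full-width vector operations (4 doubles per 256-bit vector,
   8 per 512-bit vector).  restrict promises the arrays do not overlap.  */
void matmul(size_t n, const double *restrict a, const double *restrict b,
            double *restrict c)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Calling it with n = 1000 corresponds to the 1000x1000 case above.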

The downside is that loops with low trip counts become slower for
iteration ranges where the epilogue is hit more often.  In SPEC CPU 2017
this problem shows up with x264 (12% regression) and bwaves (6%
regression).  It is tracked in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
and will need more work on the vectorizer to support masked epilogues.
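
A simplified model of why short loops can lose (an illustration, not
GCC's cost model): with a vectorization factor vf (8 floats per 256-bit
vector, 16 per 512-bit vector), and ignoring alignment peeling and GCC's
vectorized epilogues with narrower vectors, the main loop runs n / vf
times and the remaining n % vf iterations fall to the epilogue.  A
masked epilogue would instead finish the remainder in one masked vector
iteration.  The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a loop of n iterations vectorized with vf lanes.
   Ignores alignment peeling and vectorized epilogues.  */

/* Iterations of the main vector loop.  */
size_t main_loop_iters(size_t n, size_t vf)
{
    assert(vf > 0);
    return n / vf;
}

/* Scalar iterations left over for an unmasked epilogue.  */
size_t epilogue_iters(size_t n, size_t vf)
{
    assert(vf > 0);
    return n % vf;
}

/* Total vector iterations if the remainder is instead handled by a
   single masked vector iteration, as AVX-512 masking allows.  */
size_t masked_total_iters(size_t n, size_t vf)
{
    assert(vf > 0);
    return n / vf + (n % vf != 0);
}
```

For n = 15 floats, the 256-bit loop in this model runs one vector
iteration plus 7 epilogue iterations, while the 512-bit loop skips the
vector body and leaves all 15 to the epilogue; with a masked epilogue it
would be a single 512-bit iteration.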

After some additional testing, it seems that using 512-bit vectors by
default is now the better choice overall.

Bootstrapped/regtested x86_64-linux. Plan to commit it tomorrow.


	* config/i386/x86-tune.def (X86_TUNE_AVX256_OPTIMAL): Turn off
	for znver4.

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index c78dad07c88..3054656a12c 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -551,7 +551,7 @@ DEF_TUNE (X86_TUNE_AVX128_OPTIMAL, "avx128_optimal", m_BDVER | m_BTVER2
 
 /* X86_TUNE_AVX256_OPTIMAL: Use 256-bit AVX instructions instead of 512-bit AVX
    instructions in the auto-vectorizer.  */
-DEF_TUNE (X86_TUNE_AVX256_OPTIMAL, "avx256_optimal", m_CORE_AVX512 | m_ZNVER4)
+DEF_TUNE (X86_TUNE_AVX256_OPTIMAL, "avx256_optimal", m_CORE_AVX512)
 
 /* X86_TUNE_AVX256_SPLIT_REGS: if true, AVX512 ops are split into two AVX256 ops.  */
 DEF_TUNE (X86_TUNE_AVX512_SPLIT_REGS, "avx512_split_regs", m_ZNVER4)
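
For readers unfamiliar with x86-tune.def: each DEF_TUNE entry pairs a
tuning flag with a mask of the processors it applies to, so removing
m_ZNVER4 from the avx256_optimal mask means -mtune=znver4 (and
-march=znver4) no longer prefers 256-bit vectors.  The sketch below
models that membership test with a hypothetical, heavily simplified
processor enum and masks; GCC's real definitions differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for GCC's processor enum and m_* masks; the
   real ones in the i386 backend cover many more processors.  */
enum processor {
    PROC_SKYLAKE_AVX512,
    PROC_ICELAKE_SERVER,
    PROC_ZNVER3,
    PROC_ZNVER4
};

#define PROC_MASK(p)  (UINT64_C(1) << (p))
#define M_CORE_AVX512 (PROC_MASK(PROC_SKYLAKE_AVX512) | PROC_MASK(PROC_ICELAKE_SERVER))
#define M_ZNVER4      PROC_MASK(PROC_ZNVER4)

/* The avx256_optimal mask before and after the patch (modeled).  */
const uint64_t avx256_optimal_before = M_CORE_AVX512 | M_ZNVER4;
const uint64_t avx256_optimal_after  = M_CORE_AVX512;

/* A tuning flag is on for -mtune=TUNE if TUNE's bit is in its mask.  */
bool tune_enabled(uint64_t mask, enum processor tune)
{
    assert((unsigned) tune < 64);   /* keep the shift in range */
    return (mask & PROC_MASK(tune)) != 0;
}
```

In this model, tune_enabled(avx256_optimal_after, PROC_ZNVER4) is false
while the CORE_AVX512 processors keep the flag.  With this patch
applied, the previous behaviour should still be selectable with
-mprefer-vector-width=256, or by re-enabling the tuning with
-mtune-ctrl=avx256_optimal.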
