* [Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning
From: jamborm at gcc dot gnu.org @ 2022-01-19 17:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122
Bug ID: 104122
Summary: On Zen3, 510.parest_r (built with -Ofast) is faster
with generic than with native tuning
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
On Zen3-based CPUs, benchmark 510.parest_r from the SPEC 2017 FPrate suite is
faster with -march=generic than with -march=native. LNT reports an 11% regression:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=463.457.0&plot.1=471.457.0&
However, my own measurements on a different but similar EPYC machine suggest it
can be as high as 26%. On yet another machine, a Ryzen, I can see almost 10%
too. I only have older-than-LNT data from the Ryzen machine, and we did not see
the regression when GCC 11 was released. However, it seems that the generic
tuning improved while the native one did not.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning
From: jamborm at gcc dot gnu.org @ 2022-01-19 17:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122
--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
On the said EPYC machine, I could see a 6% regression at -O2 as well and then
confirmed it on the Ryzen. Again, historical data suggest that generic improved
more than native, and we already had a 4% regression when GCC 11 was released.
* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
From: rguenth at gcc dot gnu.org @ 2022-01-20 7:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|On Zen3, 510.parest_r |On Zen3, 510.parest_r
|(built with -Ofast) is |(built with -Ofast) is
|faster with generic than |faster with generic than
|with native tuning |with native ISA
Keywords| |missed-optimization
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's ISA, not tuning. I suppose -march=native -mtune=generic is still bad? I
wonder if you tried the obvious -mprefer-avx128?
* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
From: jamborm at gcc dot gnu.org @ 2022-01-20 9:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122
--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> It's ISA, not tuning.
You are of course correct; unfortunately, I am too accustomed to
using the wrong term.
> I suppose -march=native -mtune=generic is still bad?
I don't know, I'd have to manually check.
> I wonder if you tried the obvious -mprefer-avx128?
I hope that is equivalent to -mprefer-vector-width=128.
If it is, -march=native -mtune=native -mprefer-vector-width=128 is
quite a bit slower still than -march=native -mtune=native.
* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
From: jamborm at gcc dot gnu.org @ 2022-01-20 10:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122
--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #3)
> (In reply to Richard Biener from comment #2)
>
> > I suppose -march=native -mtune=generic is still bad?
>
> I don't know, I'd have to manually check.
>
It turns out that (at least on the Ryzen machine) -march=native
-mtune=generic is actually 15% better than using neither of the two
options.
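
The flag combinations compared in this thread can be written down as a small
build matrix. The following is only a sketch under stated assumptions: the
source file name `bench.c`, the configuration labels, and the helper names are
hypothetical, and SPEC itself is built through its own config files rather
than like this. The compiler flags are the ones quoted in the comments above.

```python
import shlex

# Flag sets quoted in this thread; the dictionary keys are ad hoc labels.
CONFIGS = {
    "default": [],
    "native": ["-march=native", "-mtune=native"],
    "native-isa-generic-tune": ["-march=native", "-mtune=generic"],
    "native-128bit": [
        "-march=native",
        "-mtune=native",
        "-mprefer-vector-width=128",
    ],
}


def build_command(label, opt_level="-Ofast", source="bench.c"):
    """Return the gcc argv for one configuration.

    `source` is a hypothetical stand-in for the real benchmark sources.
    """
    flags = CONFIGS[label]
    return ["gcc", opt_level, *flags, source, "-o", f"bench-{label}"]


def slowdown_percent(baseline_seconds, candidate_seconds):
    """Relative slowdown of the candidate against the baseline, in percent."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    # Print one compile command per configuration; nothing is executed.
    for label in CONFIGS:
        print(shlex.join(build_command(label)))
```

`slowdown_percent` is how a figure such as "6% regression" would be derived
from two measured run times; no timings are included here because the thread
does not give the raw numbers.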
* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
From: jamborm at gcc dot gnu.org @ 2023-01-18 17:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
According to the LNT plot, this was fixed last spring.