[Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning

public inbox for gcc-bugs@sourceware.org

* [Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning
@ 2022-01-19 17:20 jamborm at gcc dot gnu.org
  2022-01-19 17:54 ` [Bug target/104122] " jamborm at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2022-01-19 17:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122

            Bug ID: 104122
           Summary: On Zen3, 510.parest_r (built with -Ofast) is faster
                    with generic than with native tuning
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

On Zen3-based CPUs, the 510.parest_r benchmark from SPEC 2017 FPrate is faster
with -march=generic than with -march=native.  LNT reports an 11% regression:

 
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=463.457.0&plot.1=471.457.0&

However, my own measurements on a different but similar EPYC machine suggest it
can be as high as 26%.  On yet another machine, a Ryzen, I see almost 10% as
well.  I only have older-than-LNT data from the Ryzen machine, and we did not
see the regression when GCC 11 was released.  It seems, however, that the
generic tuning improved while the native one did not.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)


* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning
  2022-01-19 17:20 [Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning jamborm at gcc dot gnu.org
@ 2022-01-19 17:54 ` jamborm at gcc dot gnu.org
  2022-01-20  7:54 ` [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2022-01-19 17:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122

--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
On the said EPYC machine, I could see a 6% regression at -O2 as well and then
confirmed it on the Ryzen.  Again, historical data suggest generic improved
more than native, and we already had a 4% regression when GCC 11 was released.


* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
  2022-01-19 17:20 [Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning jamborm at gcc dot gnu.org
  2022-01-19 17:54 ` [Bug target/104122] " jamborm at gcc dot gnu.org
@ 2022-01-20  7:54 ` rguenth at gcc dot gnu.org
  2022-01-20  9:40 ` jamborm at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-20  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|On Zen3, 510.parest_r       |On Zen3, 510.parest_r
                   |(built with -Ofast) is      |(built with -Ofast) is
                   |faster with generic than    |faster with generic than
                   |with native tuning          |with native ISA
           Keywords|                            |missed-optimization

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's ISA, not tuning.  I suppose -march=native -mtune=generic is still bad?  I
wonder if you tried the obvious -mprefer-avx128?


* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
  2022-01-19 17:20 [Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning jamborm at gcc dot gnu.org
  2022-01-19 17:54 ` [Bug target/104122] " jamborm at gcc dot gnu.org
  2022-01-20  7:54 ` [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA rguenth at gcc dot gnu.org
@ 2022-01-20  9:40 ` jamborm at gcc dot gnu.org
  2022-01-20 10:55 ` jamborm at gcc dot gnu.org
  2023-01-18 17:41 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2022-01-20  9:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122

--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> It's ISA, not tuning.

You are of course correct, unfortunately I am too accustomed to
using the wrong term.

> I suppose -march=native -mtune=generic is still bad?

I don't know, I'd have to manually check.

> I wonder if you tried the obvious -mprefer-avx128?

I hope that is equivalent to -mprefer-vector-width=128.

If it is, -march=native -mtune=native -mprefer-vector-width=128 is
even quite a bit slower than -march=native -mtune=native.


* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
  2022-01-19 17:20 [Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning jamborm at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-01-20  9:40 ` jamborm at gcc dot gnu.org
@ 2022-01-20 10:55 ` jamborm at gcc dot gnu.org
  2023-01-18 17:41 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2022-01-20 10:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122

--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #3)
> (In reply to Richard Biener from comment #2)
>
> > I suppose -march=native -mtune=generic is still bad?
> 
> I don't know, I'd have to manually check.
> 

It turns out that (at least on the Ryzen machine) -march=native
-mtune=generic is actually 15% better than using neither of the two
options.
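
For anyone re-running the comparison, here is a minimal sketch of the flag
combinations mentioned in this thread and of one way to express the difference
between two run times as a percentage.  The configuration labels, the g++
driver, the source file name and the formula are assumptions made for
illustration; the thread does not say how LNT or the manual measurements
compute their percentages, and no run times from this bug are reproduced here.

```python
# Flag sets discussed in this thread. The dictionary keys are hypothetical
# labels chosen for this sketch, not names used by LNT or SPEC.
CONFIGS = {
    "generic": ["-Ofast"],
    "native": ["-Ofast", "-march=native"],
    "native-isa-generic-tune": ["-Ofast", "-march=native", "-mtune=generic"],
    "native-128bit": [
        "-Ofast",
        "-march=native",
        "-mtune=native",
        "-mprefer-vector-width=128",
    ],
}


def command_line(config, source="parest_kernel.cpp"):
    """Return a compile command for one configuration.

    The g++ driver and the source file name are placeholders.
    """
    return " ".join(["g++", *CONFIGS[config], "-c", source])


def percent_slower(baseline_seconds, candidate_seconds):
    """Return how much slower the candidate is than the baseline, in percent.

    Assumed definition: (candidate - baseline) / baseline * 100.
    A negative result means the candidate is faster.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    for name in CONFIGS:
        print(f"{name}: {command_line(name)}")
```

Comparing "native-isa-generic-tune" against both "generic" and "native" is what
separates the effect of the ISA selection from the effect of the tuning
selection.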


* [Bug target/104122] On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native ISA
  2022-01-19 17:20 [Bug target/104122] New: On Zen3, 510.parest_r (built with -Ofast) is faster with generic than with native tuning jamborm at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-01-20 10:55 ` jamborm at gcc dot gnu.org
@ 2023-01-18 17:41 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-01-18 17:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104122

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
According to the LNT plot, this was fixed last spring.

