* [Bug gcov-profile/94472] New: 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
@ 2020-04-03 14:54 jamborm at gcc dot gnu.org
  2020-04-28  2:30 ` [Bug ipa/94472] " edlinger at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-04-03 14:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

            Bug ID: 94472
           Summary: 400.perlbench is slower when compiled at -O2 with both
                    PGO and LTO on AMD Zen CPUs
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

400.perlbench compiled at -O2 (with generic march/mtune) and with both
PGO and LTO is slower when built with master (26b3e568a60) than when
built with GCC 9: by 13% on Zen2 and by 7% on Zen1.  The performance is
comparable on an Intel Cascade Lake server CPU.

I attempted to bisect the problem on the Zen2 CPU but was only
partially successful because a lot of the slowdown seems to have
happened gradually.  The first bigger slowdown, almost 4%, came
with:

  562d1e9556777988ae46c5d1357af2636bc272ea is the first bad commit
  commit 562d1e9556777988ae46c5d1357af2636bc272ea
  Author: Jan Hubicka <hubicka@gcc.gnu.org>
  Date:   Wed Oct 2 16:01:47 2019 +0000

    cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT, [...]): New.


            * cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT,
            MAX_INLINE_INSNS_AUTO_O2_LIMIT): New.

  ...
    From-SVN: r276469

About the same performance loss was then introduced by:

commit 2925cad2151842daa387950e62d989090e47c91d
Author: Jan Hubicka <hubicka@ucw.cz>
Date:   Thu Oct 3 17:08:21 2019 +0200

    params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT, [...]): New.

            * params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
            PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2): New.
            * doc/invoke.texi (inline-heuristics-hint-percent,
            inline-heuristics-hint-percent-O2): Document.
            * tree-inline.c (inline_insns_single, inline_insns_auto): Add new
            hint attribute.
            (can_inline_edge_by_limits_p): Use it.


And finally, throughout March the benchmark was quite jumpy but
ended up about 5% slower than at the beginning of the month.
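
As a rough sanity check, the three steps above can be compounded.  This is
only a sketch under two assumptions that are not measurements from this
report: each slowdown multiplies run time relative to the preceding state,
and "almost 4%" is rounded up to 4%.

```python
# Rough sanity check: compound the three slowdown steps described above.
# Assumptions (not measured here): the steps multiply, and "almost 4%"
# is rounded up to exactly 4%.
steps = [0.04, 0.04, 0.05]

total = 1.0
for s in steps:
    total *= 1.0 + s

combined_percent = (total - 1.0) * 100.0
print(f"combined slowdown: {combined_percent:.1f}%")
```

Under those assumptions the result lands in the same range as the 13%
reported for Zen2, which suggests the three steps may account for most of
the regression there.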


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28  2:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

Bernd Edlinger <edlinger at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-04-28
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |edlinger at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |rguenther at suse dot de

--- Comment #1 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
This looks like an important issue to me.
Maybe P2?

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28  2:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #2 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
Martin, can you try changing the limits?
Maybe it is just an inline-expansion limit
that is not right.

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: jamborm at gcc dot gnu.org @ 2020-04-28  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
My benchmarking setup is currently gone so unfortunately no, not easily.  I'll
be re-measuring everything on a different computer with a slightly different
CPU model soon, so after that I guess I could.  But it is most likely the
limits, yes.

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #4 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #3)
> My benchmarking setup is currently gone so unfortunately no, not easily. 
> I'll be re-measuring everything on a different computer with a slightly
> different CPU model soon, so after that I guess I could.  But it is most
> likely the limits, yes.

Yeah, easy to fix, but it takes some time.
But this is not more important than your life.

Shall I raise this to P1 so it prevents gcc-10 release?

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: jakub at gcc dot gnu.org @ 2020-04-28  8:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
No, we can't block GCC 10 release indefinitely, we are already behind the usual
schedule.  We need to resolve the C++ ABI issues and get the release out.

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #6 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #5)
> No, we can't block GCC 10 release indefinitely, we are already behind the
> usual schedule.  We need to resolve the C++ ABI issues and get the release
> out.

Sorry, have you heard of the Corona pandemic out there?

This is not like olympic games 2020, which has been cancelled?
I just say I would delay gcc 10 right now, before it is too
late, this performance regression will make the damage worse.

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

Bernd Edlinger <edlinger at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: rguenth at gcc dot gnu.org @ 2020-04-28  9:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Bernd Edlinger from comment #4)
> (In reply to Martin Jambor from comment #3)
> > My benchmarking setup is currently gone so unfortunately no, not easily. 
> > I'll be re-measuring everything on a different computer with a slightly
> > different CPU model soon, so after that I guess I could.  But it is most
> > likely the limits, yes.
> 
> Yeah, easy to fix, but it takes some time.
> But this is not more important than your life.

Note that tuning parameters is hard and takes a lot of time.  If we adjust things
to make 400.perlbench happy (which is, btw., from SPEC 2006!), we're going to
regress things elsewhere.  It's going to be a whack-a-mole game and definitely
not suitable at this stage (inliner re-tuning is also prone to triggering
latent GCC issues in apps that previously compiled fine).

> Shall I raise this to P1 so it prevents gcc-10 release?

Definitely not.  Setting priority is the release managers' job, and btw.
bug priority is meaningless for non-regression bug reports.

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: rguenth at gcc dot gnu.org @ 2020-04-28  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and bug fixing requires first understanding the bug.  Especially for
performance-related issues, understanding what goes wrong is important.
I see no analysis being performed to date.

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: hubicka at ucw dot cz @ 2020-04-28  9:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #9 from Jan Hubicka <hubicka at ucw dot cz> ---
> --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
> Oh, and bug fixing requires first understanding the bug.  Especially for
> performance-related issues, understanding what goes wrong is important.
> I see no analysis being performed to date.

The problem here is that -O2 -fprofile-use is now using the -O2 inliner
limits while previously it used the -O3 inliner limits (because -fprofile-use
enables -finline-functions).

I can see this on SPEC GCC, perl, Firefox, real GCC and clang. We now
have a performance difference between -O2+FDO and -O3+FDO.

It is something I kind of missed in my testing, because I was testing
-O2 and -O3 + FDO but not -O2+FDO.  I realize that -O2+FDO is kind of
important because we use it in our bootstrap. So I was collecting data
over the weekend for Clang, GCC and Firefox.

The question is how aggressive we want to be at -O2+FDO, but the
observation is that in all these programs the code size growth with
-O3-style limits is quite small (below 2%), simply because training
coverage is quite small in all those programs (under 10%).  The code
size growth from inlining hot calls is therefore acceptable, and I think
the current defaults are really suboptimal.

I think there are a few ways to proceed:
 1) make the inline limits with FDO the -O3 ones
 2) invent yet another set of parameters for FDO
 3) increase the importance of the known_hot hint, which is set for calls
 that are known to be hot (either by inlining or by hot attribute).

1 is the easiest but a bit non-systematic. I am not really keen on 2
because of parameter explosion.
However, 3 looks like a good alternative, so I am running benchmarks with
a few settings of it, but they take some time.

Honza
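
The three options listed in this comment can be contrasted with a toy
model.  This is only an illustrative sketch, not GCC's inliner: the
`Limits` fields, the function names and all numbers are hypothetical, and
real inlining decisions depend on much more than a single size limit.

```python
# Toy model only: this is NOT GCC's inliner. Every name and number here is
# hypothetical, picked just to contrast the three options.
from dataclasses import dataclass


@dataclass
class Limits:
    o2: int           # callee size limit at -O2 (made-up units)
    o3: int           # callee size limit at -O3
    fdo: int          # separate FDO-only limit (option 2)
    hot_percent: int  # scaling of the -O2 limit for known-hot calls (option 3)


def limit_option1(lim: Limits, fdo: bool, known_hot: bool) -> int:
    # Option 1: with FDO, use the -O3 limit for every call.
    return lim.o3 if fdo else lim.o2


def limit_option2(lim: Limits, fdo: bool, known_hot: bool) -> int:
    # Option 2: with FDO, use a dedicated extra parameter.
    return lim.fdo if fdo else lim.o2


def limit_option3(lim: Limits, fdo: bool, known_hot: bool) -> int:
    # Option 3: keep the -O2 limit, scaled up only for known-hot calls.
    return lim.o2 * lim.hot_percent // 100 if known_hot else lim.o2


lim = Limits(o2=30, o3=60, fdo=45, hot_percent=200)
callee_size = 50
for name, pick in [("1", limit_option1), ("2", limit_option2), ("3", limit_option3)]:
    hot = callee_size <= pick(lim, True, True)
    cold = callee_size <= pick(lim, True, False)
    print(f"option {name}: hot call inlined={hot}, cold call inlined={cold}")
```

With these made-up numbers, option 1 raises the limit for every call once
FDO is on, while option 3 raises it only for calls marked known-hot and
leaves cold calls at the -O2 limit.  Option 2 behaves like option 1 but
needs its own extra value to tune.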

* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28 13:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472

--- Comment #10 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> 
> > Shall I raise this to P1 so it prevents gcc-10 release?
> 
> Definitely not.  Setting priority is the release managers' job, and btw.
> bug priority is meaningless for non-regression bug reports.

Okay, Richard,

Is this P2 or P3 then?  I just wanted you to think about it.
;-)



Thanks
Bernd.
