public inbox for gcc-bugs@sourceware.org
* [Bug gcov-profile/94472] New: 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
@ 2020-04-03 14:54 jamborm at gcc dot gnu.org
2020-04-28 2:30 ` [Bug ipa/94472] " edlinger at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-04-03 14:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
Bug ID: 94472
Summary: 400.perlbench is slower when compiled at -O2 with both
PGO and LTO on AMD Zen CPUs
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
400.perlbench, compiled at -O2 (and generic march/mtune) with both PGO
and LTO, is slower when built with master (26b3e568a60) than when
built with GCC 9: by 13% on Zen2 and by 7% on Zen1. The performance
is comparable on an Intel Cascade Lake server CPU.
I attempted to bisect the problem on the Zen2 CPU but was only
partially successful because a lot of the slowdown seems to have
happened gradually. The first bigger slowdown, almost 4%, came
with:
562d1e9556777988ae46c5d1357af2636bc272ea is the first bad commit
commit 562d1e9556777988ae46c5d1357af2636bc272ea
Author: Jan Hubicka <hubicka@gcc.gnu.org>
Date: Wed Oct 2 16:01:47 2019 +0000
cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT, [...]): New.
* cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT,
MAX_INLINE_INSNS_AUTO_O2_LIMIT): New.
...
From-SVN: r276469
About the same performance loss was then introduced by:
commit 2925cad2151842daa387950e62d989090e47c91d
Author: Jan Hubicka <hubicka@ucw.cz>
Date: Thu Oct 3 17:08:21 2019 +0200
params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT, [...]): New.
* params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2): New.
* doc/invoke.texi (inline-heuristics-hint-percent,
inline-heuristics-hint-percent-O2): Document.
* tree-inline.c (inline_insns_single, inline_insns_auto): Add new
hint attribute.
(can_inline_edge_by_limits_p): Use it.
And finally, throughout March the benchmark was quite jumpy but
ended up about 5% slower than at the beginning of the
month.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
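As a rough sanity check on the bisection, the partial slowdowns mentioned in the report can be compounded and compared with the 13% total measured on Zen2. This is only a sketch: the step sizes are the approximate figures from the report ("almost 4%", "about the same", "about 5%"), it assumes the steps multiply independently, and `compound_slowdown` is a hypothetical helper name, not something from the bug report.

```python
def compound_slowdown(steps):
    """Combine successive relative slowdowns (0.04 means 4% slower)."""
    total = 1.0
    for step in steps:
        total *= 1.0 + step
    return total - 1.0


# Approximate figures taken from the report: two steps of roughly 4%
# (the two bisected commits) and roughly 5% accumulated during March.
reported_steps = [0.04, 0.04, 0.05]
print(f"{compound_slowdown(reported_steps):.1%}")  # prints 13.6%
```

Under those assumptions the three steps account for roughly the whole Zen2 regression, which fits the description of a gradual slowdown rather than a single bad commit.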
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28 2:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
Bernd Edlinger <edlinger at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2020-04-28
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
CC| |edlinger at gcc dot gnu.org,
| |jakub at gcc dot gnu.org,
| |law at gcc dot gnu.org,
| |rguenther at suse dot de
--- Comment #1 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
This looks like an important issue to me.
Maybe P2?
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28 2:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #2 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
Martin, can you try changing the limits?
Maybe it is just a limit for inline expansions
that is not right.
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: jamborm at gcc dot gnu.org @ 2020-04-28 8:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
My benchmarking setup is currently gone so unfortunately no, not easily. I'll
be re-measuring everything on a different computer with a slightly different
CPU model soon, so after that I guess I could. But it is most likely the
limits, yes.
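One way to try different limits, once a benchmarking setup is available, would be to rebuild with a few candidate values of an inliner parameter and time each binary. The sketch below only generates the command lines. It assumes the profile data was already collected with a `-fprofile-generate` build; `sweep_commands`, the source file name and the candidate values are placeholders chosen for illustration, not values proposed anywhere in this thread.

```python
import shlex

# Flags matching the configuration in the report: -O2 with LTO and profile use.
BASE_FLAGS = ["-O2", "-flto", "-fprofile-use"]


def sweep_commands(param, values, sources, compiler="gcc"):
    """Return one compile command line per candidate value of an inliner param."""
    commands = []
    for value in values:
        argv = [compiler, *BASE_FLAGS, "--param", f"{param}={value}", *sources]
        commands.append(shlex.join(argv))
    return commands


# The values 15 and 30 are arbitrary placeholders, not recommended settings.
for cmd in sweep_commands("max-inline-insns-auto", [15, 30], ["perlbench.c"]):
    print(cmd)
```

Each generated command differs only in the `--param` value, so any timing difference between the resulting binaries can be attributed to that one limit.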
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28 8:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #4 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #3)
> My benchmarking setup is currently gone so unfortunately no, not easily.
> I'll be re-measuring everything on a different computer with a slightly
> different CPU model soon, so after that I guess I could. But it is most
> likely the limits, yes.
Yeah, easy to fix, but it takes some time.
But this is not more important than your life.
Shall I raise this to P1 so it prevents gcc-10 release?
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: jakub at gcc dot gnu.org @ 2020-04-28 8:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
No, we can't block GCC 10 release indefinitely, we are already behind the usual
schedule. We need to resolve the C++ ABI issues and get the release out.
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28 8:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #6 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #5)
> No, we can't block GCC 10 release indefinitely, we are already behind the
> usual schedule. We need to resolve the C++ ABI issues and get the release
> out.
Sorry, have you heard of the Corona pandemic out there?
This is not like olympic games 2020, which has been cancelled?
I just say I would delay gcc 10 right now, before it is too
late, this performance regression will make the damage worse.
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28 8:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
Bernd Edlinger <edlinger at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P1
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: rguenth at gcc dot gnu.org @ 2020-04-28 9:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Bernd Edlinger from comment #4)
> (In reply to Martin Jambor from comment #3)
> > My benchmarking setup is currently gone so unfortunately no, not easily.
> > I'll be re-measuring everything on a different computer with a slightly
> > different CPU model soon, so after that I guess I could. But it is most
> > likely the limits, yes.
>
> Yeah, easy to fix, but it takes some time.
> But this is not more important than your life.
Note that tuning parameters is hard and takes a lot of time. If we adjust
things to make 400.perlbench happy (which is, by the way, from SPEC 2006!)
we're going to regress things elsewhere. It's going to be a whack-a-mole game
and definitely not suitable at this stage (inliner re-tuning is also prone to
trigger latent GCC issues in apps that previously compiled fine).
> Shall I raise this to P1 so it prevents gcc-10 release?
Definitely not. Setting priority is the release manager's job, and by the way,
bug priority is meaningless for non-regression bug reports.
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: rguenth at gcc dot gnu.org @ 2020-04-28 9:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |WAITING
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and bug fixing requires first understanding the bug. Especially for
performance-related issues, understanding what goes wrong is important.
I see no analysis having been performed to date.
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: hubicka at ucw dot cz @ 2020-04-28 9:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #9 from Jan Hubicka <hubicka at ucw dot cz> ---
> --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
> Oh, and bug fixing requires first understanding the bug. Especially for
> performance-related issues, understanding what goes wrong is important.
> I see no analysis having been performed to date.
The problem here is that -O2 -fprofile-use now uses the -O2 inliner
limits, while previously it used the -O3 inliner limits (because
-fprofile-use enables -finline-functions).
I can see this on SPEC GCC, perl, Firefox, real GCC and clang. We now
have a performance difference between -O2+FDO and -O3+FDO.
It is something I kind of missed in my testing, because I was testing
-O2 and -O3+FDO but not -O2+FDO. I realize that -O2+FDO is kind of
important because we use it in our bootstrap. So I was collecting data
over the weekend for Clang, GCC and Firefox.
The question is how aggressive we want to be at -O2+FDO, but the
observation is that in all these programs the code size growth for -O3
style limits is quite small (below 2%), simply because training
coverage is quite small in all those programs (sub 10%). The
code size growth for inlining hot calls is therefore acceptable,
and thus I think the current defaults are really suboptimal.
I think there are a few ways to proceed:
1) make the inline limits with FDO the -O3 ones
2) invent yet another set of parameters for FDO
3) increase the importance of the known_hot hint, which is set for calls that
are known to be hot (either by inlining or by the hot attribute).
1 is easiest but a bit non-systematic. I am not really keen on 2 because
of parameter explosion.
However, 3 looks like a good alternative, so I am running benchmarks with
a few settings of it, but they take some time.
Honza
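The change in behaviour described in comment #9 can be summarised as a small decision table. The following is a toy model only, assuming nothing beyond what the comment states; the function name and the policy labels are hypothetical and do not correspond to GCC options or internals.

```python
def inliner_limits(opt_level, profile_use, policy):
    """Return which set of inliner limits ("O2" or "O3") a build would use.

    policy is a hypothetical label for illustration, not a GCC option:
      "gcc9"        - behaviour described for GCC 9
      "master"      - behaviour described for master at the time of the report
      "fdo-uses-O3" - proposal 1 from the comment (FDO builds use -O3 limits)
    Only opt_level 2 and 3 are modelled.
    """
    if opt_level >= 3:
        return "O3"
    # At -O2 the outcome depends on whether profile use promotes the limits.
    if profile_use and policy in ("gcc9", "fdo-uses-O3"):
        return "O3"
    return "O2"


for policy in ("gcc9", "master", "fdo-uses-O3"):
    print(policy, inliner_limits(2, True, policy))
```

In this model only the "master" policy leaves an -O2 build with profile feedback on the -O2 limits, which is the gap between -O2+FDO and -O3+FDO mentioned in the comment. Proposal 3 (raising the weight of the known_hot hint) is not modelled here because it changes the heuristics rather than the choice of limit set.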
* [Bug ipa/94472] 400.perlbench is slower when compiled at -O2 with both PGO and LTO on AMD Zen CPUs
From: edlinger at gcc dot gnu.org @ 2020-04-28 13:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94472
--- Comment #10 from Bernd Edlinger <edlinger at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
>
> > Shall I raise this to P1 so it prevents gcc-10 release?
>
> Definitely not. Setting priority is the release manager's job, and by the way,
> bug priority is meaningless for non-regression bug reports.
Okay, Richard,
is this P2 or P3 then? I just wanted you to think about it.
;-)
Thanks
Bernd.