public inbox for gcc-bugs@sourceware.org
* [Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
@ 2020-03-27 16:34 jamborm at gcc dot gnu.org
2020-03-27 16:55 ` [Bug ipa/94360] " marxin at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-03-27 16:34 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
Bug ID: 94360
Summary: 6% run-time regression of 502.gcc_r against GCC 9 when
compiled with -O2 and both PGO and LTO
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
When built at -O2, generic march/mtune and with both PGO and LTO and
current trunk/master, SPEC 2017 INTrate 502.gcc_r is 6% slower than
with GCC 9 when run on an AMD Zen2-based CPU, and about 4.8% slower on
Intel Cascade Lake.
Looking at how the run-time of the benchmark evolved over the course
of GCC 10 development cycle, the first and biggest regression (9%)
comes with:
commit 2925cad2151842daa387950e62d989090e47c91d
Author: Jan Hubicka <hubicka@ucw.cz>
Date: Thu Oct 3 17:08:21 2019 +0200
params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT, [...]): New.
* params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2): New.
* doc/invoke.texi (inline-heuristics-hint-percent,
inline-heuristics-hint-percent-O2): Document.
* tree-inline.c (inline_insns_single, inline_insns_auto): Add new
hint attribute.
(can_inline_edge_by_limits_p): Use it.
From-SVN: r276516
Then between Wed Nov 6 (72d6aeecd95) and Mon Nov 18 (58c036c8354) it
improved to about 103% of GCC 9 run-time (I did not find exactly what
caused it because in much of this range the compiler was segfaulting
in the LTO phase). Eventually, the benchmark regresses to the current
106% of GCC 9 run-time with Honza's:
- 9340d34599e Convert inliner to function specific param infrastructure, or
- 1e83bd7003e Convert inliner to new param infrastructure.
The former cannot be built without the latter.
Symbol profiles are:
trunk (26b3e568a60):
Overhead  Samples  Shared Object         Symbol
........  .......  ....................  ....................................
   4.04%    42371  cpugcc_r_peak.pgolto  bitmap_ior_into
   2.91%    30281  cpugcc_r_peak.pgolto  df_worklist_dataflow
   2.24%    23342  cpugcc_r_peak.pgolto  df_note_compute
   1.92%    20120  cpugcc_r_peak.pgolto  bitmap_set_bit
   1.75%    18148  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
   1.58%    16580  libc-2.31.so          __memset_avx2_unaligned_erms
   1.40%    14514  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
   1.39%    14732  libc-2.31.so          _int_malloc
   1.33%    13824  cpugcc_r_peak.pgolto  bitmap_copy
   1.24%    12962  cpugcc_r_peak.pgolto  bitmap_bit_p
   1.19%    12346  cpugcc_r_peak.pgolto  bitmap_and
   1.18%    12242  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
   1.02%    10618  cpugcc_r_peak.pgolto  cleanup_cfg.isra.0
vs gcc 9 (releases/gcc-9.3.0):
Overhead  Samples  Shared Object         Symbol
........  .......  ....................  .....................................
   6.81%    66967  cpugcc_r_peak.pgolto  df_worklist_dataflow
   2.83%    28063  cpugcc_r_peak.pgolto  bitmap_ior_into
   2.80%    27489  cpugcc_r_peak.pgolto  df_note_compute.lto_priv.0
   2.17%    21334  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
   1.69%    16671  libc-2.31.so          __memset_avx2_unaligned_erms
   1.51%    14876  cpugcc_r_peak.pgolto  try_optimize_cfg.lto_priv.0
   1.50%    14990  libc-2.31.so          _int_malloc
   1.50%    14715  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
   1.36%    13406  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
   1.20%    11926  cpugcc_r_peak.pgolto  remove_unused_locals
   1.06%    10433  cpugcc_r_peak.pgolto  sched_analyze_insn
   1.04%    10210  cpugcc_r_peak.pgolto  init_alias_analysis
   1.04%    10188  cpugcc_r_peak.pgolto  prescan_insns_for_dce.lto_priv.0
   1.00%     9876  cpugcc_r_peak.pgolto  compute_transp
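To make the two profiles easier to compare, here is a small sketch that
lines up the sample counts by symbol. The sample counts are copied from
the tables above; the helper names (base_name, compare, only_in) are made
up for this illustration, and stripping everything after the first dot is
only an assumption about how clone suffixes such as .lto_priv.0 or .isra.0
should be matched. Raw sample counts are compared directly, so the ratios
are not normalized for the different total run-times.

```python
# Sample counts copied verbatim from the perf profiles in this report.
trunk = {
    "bitmap_ior_into": 42371,
    "df_worklist_dataflow": 30281,
    "df_note_compute": 23342,
    "bitmap_set_bit": 20120,
    "rest_of_handle_fast_dce.lto_priv.0": 18148,
    "__memset_avx2_unaligned_erms": 16580,
    "extract_new_fences_from.lto_priv.0": 14514,
    "_int_malloc": 14732,
    "bitmap_copy": 13824,
    "bitmap_bit_p": 12962,
    "bitmap_and": 12346,
    "df_lr_local_compute.lto_priv.0": 12242,
    "cleanup_cfg.isra.0": 10618,
}
gcc9 = {
    "df_worklist_dataflow": 66967,
    "bitmap_ior_into": 28063,
    "df_note_compute.lto_priv.0": 27489,
    "rest_of_handle_fast_dce.lto_priv.0": 21334,
    "__memset_avx2_unaligned_erms": 16671,
    "try_optimize_cfg.lto_priv.0": 14876,
    "_int_malloc": 14990,
    "extract_new_fences_from.lto_priv.0": 14715,
    "df_lr_local_compute.lto_priv.0": 13406,
    "remove_unused_locals": 11926,
    "sched_analyze_insn": 10433,
    "init_alias_analysis": 10210,
    "prescan_insns_for_dce.lto_priv.0": 10188,
    "compute_transp": 9876,
}


def base_name(symbol):
    """Drop clone suffixes (assumption: everything after the first dot)."""
    return symbol.split(".", 1)[0]


def normalize(profile):
    """Map base symbol name -> sample count."""
    return {base_name(name): count for name, count in profile.items()}


def compare(old, new):
    """Return new/old sample ratio for symbols listed in both profiles."""
    old_n, new_n = normalize(old), normalize(new)
    return {s: new_n[s] / old_n[s] for s in sorted(set(old_n) & set(new_n))}


def only_in(profile, other):
    """Symbols listed in `profile` but absent from `other`."""
    return sorted(set(normalize(profile)) - set(normalize(other)))


ratios = compare(gcc9, trunk)
only_trunk = only_in(trunk, gcc9)

for symbol, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{symbol:32s} trunk/gcc9 samples = {ratio:.2f}")
print("listed only for trunk:", ", ".join(only_trunk))
```

Among the symbols listed in both profiles, df_worklist_dataflow drops the
most on trunk and bitmap_ior_into grows the most, while several small
bitmap_* helpers show up only in the trunk listing. That may be consistent
with less inlining on trunk, but the listings are truncated at about 1%,
so this is a reading of the data rather than a confirmed cause.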
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
From: marxin at gcc dot gnu.org @ 2020-03-27 16:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Unfortunately, the mentioned configuration is not tested on LNT periodic
benchmarks.
* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
From: jamborm at gcc dot gnu.org @ 2020-03-30 18:00 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
--- Comment #2 from Martin Jambor <jamborm at gcc dot gnu.org> ---
PR94410 is another O2 PGO+LTO bug where g:2925cad2151 caused a slowdown.
* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
From: jamborm at gcc dot gnu.org @ 2023-01-18 15:42 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
LNT can still see this, on the zen2 and zen3 machines at least:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=700.337.0&plot.1=711.337.0&plot.2=740.337.0&plot.3=694.337.0&
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=690.337.0&plot.1=745.337.0&plot.2=777.337.0&plot.3=687.337.0&
(GCC 9 is the dot in the bottom left corner).
* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
From: hubicka at gcc dot gnu.org @ 2023-01-18 15:47 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2023-01-18
Ever confirmed|0 |1
--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With -O2 -fprofile-use we now use -O2 inliner limits while previously we
switched to effectively -O3 inlining.
In a way it makes sense for -O2 -fprofile-use to produce smaller and a bit
slower code than -O3 -fprofile-use, but it seems that the current limits are
way too low, i.e. the code size savings do not seem to justify the
performance loss.
From a maintenance perspective it kind of sucks to have 3 sets of values (-O2,
-O3 and -O2 + -fprofile-use), but maybe we can get out cheaply by simply making
the "known hot" hint be taken seriously with FDO. FDO inlining is kind of easy
since hot calls are known well.
I will take a look.
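As a rough illustration of the trade-off described in this comment, the
toy model below picks a callee size limit for a single call edge. It is
not GCC code: the Limits class, the size_limit function, the
trust_hot_with_fdo switch and all numeric values are hypothetical, and
reusing the -O3 values for hot edges is just one possible reading of
taking the "known hot" hint seriously with FDO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_insns: int     # size limit for an inline candidate (made-up unit)
    hint_percent: int  # scaling applied when a hint marks the edge as hot


# Hypothetical values, chosen only to make the example readable.
LIMITS_O2 = Limits(max_insns=100, hint_percent=200)
LIMITS_O3 = Limits(max_insns=300, hint_percent=400)


def size_limit(opt_level, known_hot, profile_use, trust_hot_with_fdo):
    """Return the callee size limit for one call edge in this toy model."""
    limits = LIMITS_O3 if opt_level >= 3 else LIMITS_O2
    if not known_hot:
        # Cold or unknown edges always keep the limit of their own -O level.
        return limits.max_insns
    if profile_use and trust_hot_with_fdo:
        # Assumed variant: with profile feedback the hot hint is reliable,
        # so hot edges get the generous -O3 scaling even at -O2, without
        # introducing a third set of values.
        return LIMITS_O3.max_insns * LIMITS_O3.hint_percent // 100
    return limits.max_insns * limits.hint_percent // 100


# -O2 -fprofile-use, hot edge, hint scaled by the -O2 values only:
print(size_limit(2, known_hot=True, profile_use=True, trust_hot_with_fdo=False))
# Same edge when the hot hint is trusted under FDO:
print(size_limit(2, known_hot=True, profile_use=True, trust_hot_with_fdo=True))
```

In this model only edges known to be hot get the larger limit, so the
rest of an -O2 -fprofile-use build keeps the -O2 code size behavior.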
* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
From: jamborm at gcc dot gnu.org @ 2023-01-19 10:00 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Well, if the current behavior is a good one (I have not looked at how the
size/performance trade-off works out) then I am also fine with declaring
this bug invalid.