[Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO

public inbox for gcc-bugs@sourceware.org

* [Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
@ 2020-03-27 16:34 jamborm at gcc dot gnu.org
  2020-03-27 16:55 ` [Bug ipa/94360] " marxin at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-03-27 16:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360

            Bug ID: 94360
           Summary: 6% run-time regression of 502.gcc_r against GCC 9 when
                    compiled with -O2 and both PGO and LTO
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

When built at -O2 with generic march/mtune and both PGO and LTO on
current trunk/master, SPEC 2017 INTrate 502.gcc_r is 6% slower than with
GCC 9 when run on an AMD Zen2-based CPU, and about 4.8% slower on Intel
Cascade Lake.
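
For readers who want to reproduce this kind of build outside the SPEC
harness, here is a minimal sketch of the usual three-step PGO+LTO flow
(instrument, train, rebuild).  The flags -flto, -fprofile-generate and
-fprofile-use are real GCC options; the helper name, the source file
names and the binary name are hypothetical, and the SPEC configuration
actually used for the numbers above is not shown in this report.

```python
def pgo_lto_build_steps(sources, binary="bench", opt="-O2"):
    """Return the command lines for an instrument/train/optimize cycle.

    Nothing is executed here; the caller decides how to run each step.
    No -march/-mtune is passed, which matches a "generic" build.
    """
    common = ["gcc", opt, "-flto"]
    instrumented = common + ["-fprofile-generate", *sources, "-o", binary]
    # The training run writes .gcda profile data next to the objects.
    train = ["./" + binary]
    optimized = common + ["-fprofile-use", *sources, "-o", binary]
    return [("instrument", instrumented),
            ("train", train),
            ("optimize", optimized)]


if __name__ == "__main__":
    # Hypothetical source files, for illustration only.
    for name, cmd in pgo_lto_build_steps(["a.c", "b.c"]):
        print(f"{name}: {' '.join(cmd)}")
```

The same helper can be called with opt="-O3" to compare against the
-O3 + PGO + LTO configuration.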

Looking at how the run-time of the benchmark evolved over the course
of the GCC 10 development cycle, the first and biggest regression (9%)
comes with:

  commit 2925cad2151842daa387950e62d989090e47c91d
  Author: Jan Hubicka <hubicka@ucw.cz>
  Date:   Thu Oct 3 17:08:21 2019 +0200

    params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT, [...]): New.

            * params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
            PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2): New.
            * doc/invoke.texi (inline-heuristics-hint-percent,
            inline-heuristics-hint-percent-O2): Document.
            * tree-inline.c (inline_insns_single, inline_insns_auto): Add new
            hint attribute.
            (can_inline_edge_by_limits_p): Use it.

   From-SVN: r276516

Then between Wed Nov 6 (72d6aeecd95) and Mon Nov 18 (58c036c8354) it
improved to about 103% of GCC 9 run-time (I did not find exactly what
caused it because in much of this range the compiler was segfaulting
in the LTO phase).  Eventually, the benchmark regressed to the current
106% of GCC 9 run-time with one of Honza's commits:

  - 9340d34599e Convert inliner to function specific param infrastructure, or
  - 1e83bd7003e Convert inliner to new param infrastructure.

The former cannot be built without the latter.
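
The timeline above can be restated as step-to-step changes.  This is
only an arithmetic sketch: it assumes the initial "9%" regression means
109% of the GCC 9 run-time, which the report does not state explicitly,
and the milestone labels are shorthand introduced here.

```python
# Run-time relative to GCC 9 (= 100), taken from the text above.
# Assumption: the first "9%" regression is read as 109% of GCC 9.
milestones = [
    ("GCC 9 baseline", 100.0),
    ("after r276516 (2925cad2151)", 109.0),
    ("by Nov 18 2019 (58c036c8354)", 103.0),
    ("after the inliner param conversion", 106.0),
]


def step_changes(points):
    """Return (label, percent change versus the previous milestone)."""
    changes = []
    for (_, prev), (label, cur) in zip(points, points[1:]):
        changes.append((label, round((cur / prev - 1.0) * 100.0, 1)))
    return changes


if __name__ == "__main__":
    for label, change in step_changes(milestones):
        print(f"{label}: {change:+.1f}%")
```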

Symbol profiles are:

trunk (26b3e568a60):
  Overhead    Samples  Shared Object         Symbol
  ........  .........  ....................  ....................................

     4.04%      42371  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.91%      30281  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.24%      23342  cpugcc_r_peak.pgolto  df_note_compute
     1.92%      20120  cpugcc_r_peak.pgolto  bitmap_set_bit
     1.75%      18148  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
     1.58%      16580  libc-2.31.so          __memset_avx2_unaligned_erms
     1.40%      14514  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
     1.39%      14732  libc-2.31.so          _int_malloc
     1.33%      13824  cpugcc_r_peak.pgolto  bitmap_copy
     1.24%      12962  cpugcc_r_peak.pgolto  bitmap_bit_p
     1.19%      12346  cpugcc_r_peak.pgolto  bitmap_and
     1.18%      12242  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
     1.02%      10618  cpugcc_r_peak.pgolto  cleanup_cfg.isra.0


vs gcc 9 (releases/gcc-9.3.0):


  Overhead    Samples  Shared Object         Symbol
  ........  .........  ....................  .....................................

     6.81%      66967  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.83%      28063  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.80%      27489  cpugcc_r_peak.pgolto  df_note_compute.lto_priv.0
     2.17%      21334  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
     1.69%      16671  libc-2.31.so          __memset_avx2_unaligned_erms
     1.51%      14876  cpugcc_r_peak.pgolto  try_optimize_cfg.lto_priv.0
     1.50%      14990  libc-2.31.so          _int_malloc
     1.50%      14715  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
     1.36%      13406  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
     1.20%      11926  cpugcc_r_peak.pgolto  remove_unused_locals
     1.06%      10433  cpugcc_r_peak.pgolto  sched_analyze_insn
     1.04%      10210  cpugcc_r_peak.pgolto  init_alias_analysis
     1.04%      10188  cpugcc_r_peak.pgolto  prescan_insns_for_dce.lto_priv.0
     1.00%       9876  cpugcc_r_peak.pgolto  compute_transp
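
To compare the two profiles symbol by symbol, a small script along the
following lines can help.  It is a sketch only: it embeds the first
three rows of each table above, and it folds clone suffixes such as
.lto_priv.0 or .isra.0 into the base symbol name, which is a
simplification (distinct clones of one function get merged).  The
function names parse and sample_deltas are introduced here for
illustration.

```python
import re

# First three rows of each profile above, copied verbatim.
TRUNK = """\
     4.04%      42371  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.91%      30281  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.24%      23342  cpugcc_r_peak.pgolto  df_note_compute
"""
GCC9 = """\
     6.81%      66967  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.83%      28063  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.80%      27489  cpugcc_r_peak.pgolto  df_note_compute.lto_priv.0
"""

ROW = re.compile(r"^\s*([\d.]+)%\s+(\d+)\s+(\S+)\s+(\S+)\s*$")
# Clone suffixes GCC appends to local or specialized functions.
SUFFIX = re.compile(r"\.(lto_priv|isra|constprop|part)\.\d+$")


def parse(report):
    """Map base symbol name -> sample count for one profile table."""
    samples = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, separators and blank lines
        symbol = m.group(4)
        while SUFFIX.search(symbol):  # suffixes can be stacked
            symbol = SUFFIX.sub("", symbol)
        samples[symbol] = samples.get(symbol, 0) + int(m.group(2))
    return samples


def sample_deltas(new, old):
    """Return (symbol, new - old) for shared symbols, most improved first."""
    common = new.keys() & old.keys()
    return sorted(((s, new[s] - old[s]) for s in common), key=lambda x: x[1])


if __name__ == "__main__":
    for symbol, delta in sample_deltas(parse(TRUNK), parse(GCC9)):
        print(f"{symbol:24s} {delta:+d}")
```

Raw sample deltas are only a rough guide here, since changed inlining
decisions move time between callers and callees rather than removing it.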


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)


* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
  2020-03-27 16:34 [Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO jamborm at gcc dot gnu.org
@ 2020-03-27 16:55 ` marxin at gcc dot gnu.org
  2020-03-30 18:00 ` jamborm at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-03-27 16:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Unfortunately, the mentioned configuration is not tested on LNT periodic
benchmarks.


* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
  2020-03-27 16:34 [Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO jamborm at gcc dot gnu.org
  2020-03-27 16:55 ` [Bug ipa/94360] " marxin at gcc dot gnu.org
@ 2020-03-30 18:00 ` jamborm at gcc dot gnu.org
  2023-01-18 15:42 ` jamborm at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-03-30 18:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360

--- Comment #2 from Martin Jambor <jamborm at gcc dot gnu.org> ---
PR94410 is another O2 PGO+LTO bug where g:2925cad2151 caused a slowdown.


* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
  2020-03-27 16:34 [Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO jamborm at gcc dot gnu.org
  2020-03-27 16:55 ` [Bug ipa/94360] " marxin at gcc dot gnu.org
  2020-03-30 18:00 ` jamborm at gcc dot gnu.org
@ 2023-01-18 15:42 ` jamborm at gcc dot gnu.org
  2023-01-18 15:47 ` hubicka at gcc dot gnu.org
  2023-01-19 10:00 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-01-18 15:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360

--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
LNT can still see this, on the zen2 and zen3 machines at least:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=700.337.0&plot.1=711.337.0&plot.2=740.337.0&plot.3=694.337.0&

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=690.337.0&plot.1=745.337.0&plot.2=777.337.0&plot.3=687.337.0&

(gcc 9 is the dot in the bottom-left corner).


* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
  2020-03-27 16:34 [Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO jamborm at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-01-18 15:42 ` jamborm at gcc dot gnu.org
@ 2023-01-18 15:47 ` hubicka at gcc dot gnu.org
  2023-01-19 10:00 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-01-18 15:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-01-18
     Ever confirmed|0                           |1

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With -O2 -fprofile-use we now use the -O2 inliner limits, while previously we
effectively switched to -O3 inlining.
In a way it makes sense for -O2 -fprofile-use to produce smaller and a bit
slower code than -O3 -fprofile-use, but it seems that the current limits are
way too low, i.e. the code size savings do not seem to justify the performance
loss.

From a maintenance perspective it kind of sucks to have 3 sets of values (-O2,
-O3 and -O2 + -fprofile-use), but maybe we can get out cheaply by simply making
the "known hot" hint be taken seriously with FDO.  FDO inlining is kind of easy
since hot calls are known well.

I will take a look.
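
The trade-off described in this comment can be illustrated with a toy
model.  This is not GCC's inliner code: the class, the function names
and every number are made-up placeholders, chosen only to show how a
callee that fits the -O3 limit can be rejected under the -O2 limit, and
how trusting a "known hot" hint could recover it without a third set of
values.

```python
from dataclasses import dataclass


# All numbers are illustrative placeholders, NOT GCC's real defaults.
@dataclass(frozen=True)
class InlineLimits:
    max_insns_single: int  # size limit for inlining one call site
    hint_percent: int      # scale applied to the limit when a hint fires


O2_LIMITS = InlineLimits(max_insns_single=30, hint_percent=200)
O3_LIMITS = InlineLimits(max_insns_single=70, hint_percent=600)


def pick_limits(opt_level, profile_use, old_behavior):
    """Select a limit set.

    old_behavior=True models the earlier state described in the comment,
    where -O2 -fprofile-use effectively used -O3 inlining.
    """
    if opt_level >= 3 or (old_behavior and profile_use):
        return O3_LIMITS
    return O2_LIMITS


def may_inline(callee_size, limits, known_hot=False, trust_hot_hint=False):
    """Size check only; the hot hint raises the limit if it is trusted."""
    limit = limits.max_insns_single
    if known_hot and trust_hot_hint:
        limit = limit * limits.hint_percent // 100
    return callee_size <= limit


if __name__ == "__main__":
    size = 50  # hypothetical size of a hot callee
    old = pick_limits(2, profile_use=True, old_behavior=True)
    new = pick_limits(2, profile_use=True, old_behavior=False)
    print("old behavior:", may_inline(size, old))
    print("new behavior:", may_inline(size, new))
    print("new + trusted hot hint:",
          may_inline(size, new, known_hot=True, trust_hot_hint=True))
```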


* [Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
  2020-03-27 16:34 [Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO jamborm at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-01-18 15:47 ` hubicka at gcc dot gnu.org
@ 2023-01-19 10:00 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-01-19 10:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360

--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Well, if the current behavior is a good one (I have not looked at how the
size/performance trade-off works out) then I am also fine with declaring this
bug invalid.


