[Bug driver/114531] New: Feature proposal for an `-finline-functions-aggressive` compiler option

public inbox for gcc-bugs@sourceware.org

* [Bug driver/114531] New: Feature proposal for an `-finline-functions-aggressive` compiler option
@ 2024-03-30  1:25 rvmallad at amazon dot com
  2024-03-30  1:37 ` [Bug ipa/114531] " pinskia at gcc dot gnu.org
                   ` (19 more replies)
  0 siblings, 20 replies; 22+ messages in thread
From: rvmallad at amazon dot com @ 2024-03-30  1:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

            Bug ID: 114531
           Summary: Feature proposal for an
                    `-finline-functions-aggressive` compiler option
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: driver
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rvmallad at amazon dot com
                CC: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57837
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57837&action=edit
patch to implement -finline-functions-aggressive option in GCC

This is a proposal for a user-visible GCC compiler option for aggressive
inlining that is currently only available at -O3 as internal inline parameters
(--param=early-inlining-insns=14 --param=inline-heuristics-hint-percent=600
--param=inline-min-speedup=15 --param=max-inline-insns-auto=30
--param=max-inline-insns-single=200).
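
For illustration only, the parameter set above can be spelled out programmatically. The following is a minimal Python sketch; the helper name `inline_flags` is hypothetical and not part of GCC or the attached patch. It turns the five --param values quoted above into the arguments one would append to -O2.

```python
# Hypothetical helper for illustration; the parameter names and values are the
# ones quoted in this report.
O3_INLINE_PARAMS = {
    "early-inlining-insns": 14,
    "inline-heuristics-hint-percent": 600,
    "inline-min-speedup": 15,
    "max-inline-insns-auto": 30,
    "max-inline-insns-single": 200,
}


def inline_flags(params=O3_INLINE_PARAMS):
    """Return one --param=name=value argument per entry of the dict."""
    return [f"--param={name}={value}" for name, value in params.items()]


if __name__ == "__main__":
    # Print a command line equivalent to "-O2 plus the listed inline params".
    print(" ".join(["gcc", "-O2", *inline_flags()]))
```

Keeping the values in one table makes it easy to see which limits are being raised when the flags are added to a build configuration.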

I collected performance data for Envoy (https://github.com/envoyproxy/envoy)
and the SPEC CPU2017 intrate benchmarks on a C7g.2xlarge instance with Ubuntu
22.04 and gcc-11.4.0. We see performance gains (2% - 5%) from these aggressive
inline parameters when they are added to -O2. Attached is a patch for this
change.

We do not want to make these inline limits part of `-O2` itself, because one
of the SPEC CPU tests got slower with them. Also, more inline tuning at -O2
would make some symbols unavailable for probing/debugging that are available
when these aggressive inline params are not used.

-----------------------------------------------------------------------
Envoy load_balancer_benchmark – using only 1 CPU – iterations, higher is better
$ bazel run -c opt //test/common/upstream:load_balancer_benchmark

The inline parameters/options can be added by editing
bazel-envoy/external/local_config_cc/BUILD.

-----------------------------------------------------------------------------------------------
Benchmark (iterations)                                    Baseline O2   + Inline Params   Gain
-----------------------------------------------------------------------------------------------
benchmarkRoundRobinLoadBalancerBuild/500/50/50            1518          1596              1.05x
benchmarkLeastRequestLoadBalancerChooseHost/100/3/1000    1465          1514              1.03x
benchmarkRingHashLoadBalancerChooseHost/100/65536/100000  33            34                1.03x
benchmarkMaglevLoadBalancerWeighted/500/95/75/25/10000    69            72                1.04x
-----------------------------------------------------------------------------------------------
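
As a sanity check on the Gain column, the ratios can be recomputed from the iteration counts. This is a small Python sketch; the short benchmark labels and the function name `gain` are shorthand introduced here, not names from Envoy.

```python
# Iteration counts copied from the table above: (baseline -O2, -O2 + inline params).
# The keys are abbreviated labels chosen for this sketch.
ENVOY_ITERATIONS = {
    "RoundRobin": (1518, 1596),
    "LeastRequest": (1465, 1514),
    "RingHash": (33, 34),
    "Maglev": (69, 72),
}


def gain(baseline, tuned):
    """Ratio of tuned to baseline iterations; above 1.0 means faster."""
    return tuned / baseline


if __name__ == "__main__":
    for name, (baseline, tuned) in ENVOY_ITERATIONS.items():
        print(f"{name:<14}{gain(baseline, tuned):.2f}x")
```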

copies=8        -O2     -Ofast   Gain with      -O2 +           Gain with
                                 -Ofast         inlining        inlining
500.perlbench_r 36.5    34.3     94.0%          34.4            94.2%
502.gcc_r       45.4    47.6     104.8%         47.5            104.6%
505.mcf_r       44.6    48.2     108.1%         44.3            99.3%
520.omnetpp_r   22.1    24.9     112.7%         21.9            99.1%
523.xalancbmk_r 43.8    46.3     105.7%         45.4            103.7%
525.x264_r      44.3    89       200.9%         43.8            98.9%
531.deepsjeng_r 36      37.3     103.6%         37.5            104.2%
541.leela_r     33.5    33.9     101.2%         34.2            102.1%
548.exchange2_r 65.4    76.6     117.1%         65.3            99.8%
557.xz_r        19.8    19.9     100.5%         19.9            100.5%
SPECrate..base  37.1    41.6     112.1%         37.3            100.5%
-----------------------------------------------------------------------
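
To make the per-benchmark picture explicit, the following Python sketch recomputes the "gain with inlining" percentages from the two score columns and lists the benchmarks whose score dropped. The function names are introduced here for illustration; the scores are copied from the table above.

```python
# SPECrate scores from the table above: (-O2, -O2 + inlining params).
SPEC_SCORES = {
    "500.perlbench_r": (36.5, 34.4),
    "502.gcc_r": (45.4, 47.5),
    "505.mcf_r": (44.6, 44.3),
    "520.omnetpp_r": (22.1, 21.9),
    "523.xalancbmk_r": (43.8, 45.4),
    "525.x264_r": (44.3, 43.8),
    "531.deepsjeng_r": (36.0, 37.5),
    "541.leela_r": (33.5, 34.2),
    "548.exchange2_r": (65.4, 65.3),
    "557.xz_r": (19.8, 19.9),
}


def gain_percent(base, new):
    """Score with the new flags as a percentage of the baseline score."""
    return 100.0 * new / base


def regressions(scores):
    """Names of benchmarks whose score is lower with the inline params."""
    return [name for name, (base, new) in scores.items() if new < base]


if __name__ == "__main__":
    for name, (base, new) in SPEC_SCORES.items():
        print(f"{name:<17}{gain_percent(base, new):6.1f}%")
    print("slower with inline params:", ", ".join(regressions(SPEC_SCORES)))
```

Several of the drops are under 1% and may well be within run-to-run noise; the sketch only compares the reported numbers.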

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: pinskia at gcc dot gnu.org @ 2024-03-30  1:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Maybe we should figure out why increasing the limits helps and add extra
code to get better heuristics rather than just tweaking the limits.

I know that there were already some improvements to the heuristics for C++
code in GCC 14.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: pinskia at gcc dot gnu.org @ 2024-03-30  1:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect the implementation of the option should be changed slightly: how does
it interact with the user supplying the params too?

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: pinskia at gcc dot gnu.org @ 2024-03-30  1:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also, do you have numbers with LTO enabled? Or are these without LTO?

Does LTO improve the situation for Envoy too?

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-03-30  2:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #4 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Andrew Pinski from comment #1)
> Maybe we should figure out why increasing the limits helps and add extra
> code to get better heuristics rather than just tweaking the limits.
> 
> I know that there were already some improvements to the heuristics for C++
> code in GCC 14.

interesting... thank you.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-03-30  2:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #5 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Andrew Pinski from comment #3)
> Also, do you have numbers with LTO enabled? Or are these without LTO?
> 
> Does LTO improve the situation for Envoy too?

These numbers are without LTO. I haven't tried it yet, but I can try and post
an update.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: pinskia at gcc dot gnu.org @ 2024-03-30  2:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
See PR 109849 for some of the improvements

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-04-01 11:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #7 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Rama Malladi from comment #5)
> (In reply to Andrew Pinski from comment #3)
> > Also, do you have numbers with LTO enabled? Or are these without LTO?
> > 
> > Does LTO improve the situation for Envoy too?
> 
> These numbers are without lto. I haven't tried it but I can try and post an
> update.

I checked and found that the Envoy run was without LTO, but SPEC cpu2017
intrate was with LTO.

I tried building Envoy with LTO and it failed. I need to debug that issue
further.

Below are the perf results without LTO. gcc version 11.4.0 (Ubuntu
11.4.0-1ubuntu1~22.04).

copies=8        -O2     -Ofast  Gain w  -O2 + inlining  Gain w
                noLTO   noLTO   Ofast   noLTO           inlining
500.perlbench_r 33.7    33.3    98.8%   33.2            98.5%
502.gcc_r       45.2    46.9    103.8%  46.3            102.4%
505.mcf_r       44.7    44.3    99.1%   44.6            99.8%
520.omnetpp_r   21.4    24.4    114.0%  21.3            99.5%
523.xalancbmk_r 41.6    45.5    109.4%  44              105.8%
525.x264_r      44.2    89      201.4%  43.9            99.3%
531.deepsjeng_r 32.8    32.8    100.0%  33.1            100.9%
541.leela_r     28.6    30.5    106.6%  30.3            105.9%
548.exchange2_r 64.1    64.6    100.8%  64.1            100.0%
557.xz_r        20.3    20.4    100.5%  20.3            100.0%
SPECrate..base  35.6    39.4    110.7%  36              101.1%
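
The SPECrate base row is a geometric mean of the per-benchmark rates, so the summary line can be approximately reproduced from the columns above. The Python sketch below does that for the two -O2 columns; since it starts from the rounded values in the table rather than the unrounded rates SPEC uses, small differences in the last digit are expected. The function name is introduced here for illustration.

```python
import math

# Per-benchmark rates copied from the table above, in table order (no LTO).
O2_NOLTO = [33.7, 45.2, 44.7, 21.4, 41.6, 44.2, 32.8, 28.6, 64.1, 20.3]
O2_INLINE_NOLTO = [33.2, 46.3, 44.6, 21.3, 44.0, 43.9, 33.1, 30.3, 64.1, 20.3]


def geometric_mean(values):
    """Geometric mean computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    base = geometric_mean(O2_NOLTO)
    tuned = geometric_mean(O2_INLINE_NOLTO)
    print(f"-O2:            {base:.1f}")
    print(f"-O2 + inlining: {tuned:.1f}")
    print(f"ratio:          {tuned / base:.3f}")
```

Because the mean is geometric, a 5% gain on one benchmark moves the composite by only about 0.5%, which is why the overall gain is much smaller than the best individual gains.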

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rguenth at gcc dot gnu.org @ 2024-04-02  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-04-08  9:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

Rama Malladi <rvmallad at amazon dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rvmallad at amazon dot com

--- Comment #8 from Rama Malladi <rvmallad at amazon dot com> ---
Created attachment 57898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57898&action=edit
Updated patch for `-finline-functions-aggressive` GCC option.

This is an updated patch to include a new GCC option:
`-finline-functions-aggressive`. It has the `-O3` inlining heuristics replaced
with an entry that implies `OPT_finline_functions_aggressive` is enabled. It
also has an entry in `invoke.texi` for documentation stating that this option
selects the same inlining heuristics as `-O3`.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-05-31 13:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #9 from Rama Malladi <rvmallad at amazon dot com> ---
I would like us to review this feature implementation now that GCC 15 Stage 1
development has started. Thank you.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: wilco at gcc dot gnu.org @ 2024-06-25 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #10 from Wilco <wilco at gcc dot gnu.org> ---
A 1.1% overall performance gain looks good - is there a significant codesize
hit from this? If so, are there slightly less aggressive settings that still
get most of the performance gains but at a lower (acceptable) codesize cost? It
seems there may be scope to improve the default settings of -O2.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-06-25 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #11 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #10)
> A 1.1% overall performance gain looks good - is there a significant codesize
> hit from this? If so, are there slightly less aggressive settings that still
> get most of the performance gains but at a lower (acceptable) codesize cost?
> It seems there may be scope to improve the default settings of -O2.

Here is a code size comparison of -O2 vs. -O2 + inline params for the SPEC
cpu2017 Int Rate benchmarks. One of the reasons for not modifying the default
inline parameters at -O2 is the loss of some function observability due to
aggressive inlining.

Benchmark       "-O2"   "-O2 + inline   size
                         params"        increase
500.perlbench_r 8.5M    11M             1.29
502.gcc_r       51M     56M             1.10
505.mcf_r       102K    106K            1.04
520.omnetpp_r   22M     23M             1.05
523.xalancbmk_r 52M     53M             1.02
525.x264_r      2.5M    2.8M            1.12
531.deepsjeng_r 416K    441K            1.06
541.leela_r     2.6M    2.6M            1.00
548.exchange2_r 115K    115K            1.00
557.xz_r        818K    871K            1.06
999.specrand_ir 24K     24K             1.00
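
The "size increase" column is the ratio of the two sizes. A short Python sketch that recomputes it from the K/M-suffixed strings follows; treating K and M as 1024-based is an assumption made here (it does not affect ratios between sizes with the same suffix), and the function names are hypothetical.

```python
# Assumption for this sketch: K and M are binary multiples.
UNITS = {"K": 1024, "M": 1024 ** 2}

# (size at -O2, size at -O2 + inline params), copied from the table above.
SIZES = {
    "500.perlbench_r": ("8.5M", "11M"),
    "502.gcc_r": ("51M", "56M"),
    "505.mcf_r": ("102K", "106K"),
    "525.x264_r": ("2.5M", "2.8M"),
    "531.deepsjeng_r": ("416K", "441K"),
    "557.xz_r": ("818K", "871K"),
}


def parse_size(text):
    """Convert a string such as '8.5M' or '102K' to a number of bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def size_increase(before, after):
    """Ratio of the size after to the size before."""
    return parse_size(after) / parse_size(before)


if __name__ == "__main__":
    for name, (before, after) in SIZES.items():
        print(f"{name:<17}{size_increase(before, after):.2f}")
```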

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: hubicka at ucw dot cz @ 2024-06-25 16:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #12 from Jan Hubicka <hubicka at ucw dot cz> ---
If this is without LTO, can you also try the LTO numbers?
The inliner behaves significantly differently with and without LTO, since LTO
introduces many (and often too many) inlining opportunities, which
sometimes makes things get out of hand.

Overall, SPEC2k17 without LTO is not the most representative inlining
benchmark, since most programs there are relatively old and written with
a small abstraction penalty.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-06-25 16:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #13 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Jan Hubicka from comment #12)
> If this is without LTO, can you also try the LTO numbers?
> The inliner behaves significantly differently with and without LTO, since LTO
> introduces many (and often too many) inlining opportunities, which
> sometimes makes things get out of hand.
> 
> Overall, SPEC2k17 without LTO is not the most representative inlining
> benchmark, since most programs there are relatively old and written with
> a small abstraction penalty.

The numbers listed above in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531#c11 are with LTO. Here are
the base flags used for these runs:

`-O2 -march=armv8-a+crc+crypto -mtune=native -flto`

+ inlining params:
`--param=early-inlining-insns=14 --param=inline-heuristics-hint-percent=600
--param=inline-min-speedup=15 --param=max-inline-insns-auto=30
--param=max-inline-insns-single=200`

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: hubicka at ucw dot cz @ 2024-06-25 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #14 from Jan Hubicka <hubicka at ucw dot cz> ---
As for a bit of history on this: I introduced the split -O2 and -O3
limits in order to be able to enable -finline-small-functions at -O2,
which we found to be really important for C++ codebases that no longer
care much about explicit use of the inline keyword.

To do that it was necessary to find settings that do not grow -O2
binaries significantly (or that shrink them) and that yield measurably
better performance.  Without LTO, on SPEC CPU, the differences were quite
small.  With LTO they were more noticeable, and with Firefox/clang and
similar with LTO they were significant (often double-digit).

Pushing up the -O2 limits can make sense, but needs to be done carefully -
in the longer term IMO we do not want to let -O2 binaries grow faster
than their performance. Sadly this figure is not that great.

https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch
loads slowly but has some data.

SPEC2k17 with -O2 -flto on 2nd generation Zen performs as follows:
        gcc-7   gcc-8   gcc-9   gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
SPECint 2.55%   2.90%   ~       4.55%   4.47%   11.29%  12.60%  14.13%  13.42%
SPECfp  ~       ~       ~       ~       ~       4.15%   4.98%   5.30%   5.18%

Those are scores (bigger is better) relative to gcc-6, in percent. ~ is noise.

The large improvement in gcc-12 is due to enabling the vectorizer; for SPECint
it comes primarily from x264.

While text section size:
        gcc-7   gcc-8   gcc-9   gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
int     ~       ~       ~       9.77%   9.57%   8.72%   8.26%   10.68%  10.59%
fp      ~       2.40%   ~       18.30%  18.24%  18.92%  18.66%  22.23%  22.27%
Those are sizes (smaller is better).  So we do get considerable bloat.
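
One crude way to read the two SPECint rows together is to ask, per release, whether text size has grown by more than the score relative to gcc-6. The Python sketch below does only that; counting the noise marker ~ as 0% and comparing the two percentages directly are simplifying assumptions of this sketch, not a metric used by the author.

```python
RELEASES = ["gcc-7", "gcc-8", "gcc-9", "gcc-10", "gcc-11",
            "gcc-12", "gcc-13", "gcc-14", "gcc-trunk"]

# Percent change relative to gcc-6, copied from the tables above.
# None stands for "~" (noise).
SPECINT_SCORE = [2.55, 2.90, None, 4.55, 4.47, 11.29, 12.60, 14.13, 13.42]
INT_TEXT_SIZE = [None, None, None, 9.77, 9.57, 8.72, 8.26, 10.68, 10.59]


def size_outpaces_perf(scores, sizes, releases=RELEASES):
    """Releases where text size grew by more than the score did."""
    result = []
    for release, score, size in zip(releases, scores, sizes):
        # Assumption: treat noise entries as no change.
        if (size or 0.0) > (score or 0.0):
            result.append(release)
    return result


if __name__ == "__main__":
    print(size_outpaces_perf(SPECINT_SCORE, INT_TEXT_SIZE))
```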

In GCC 10 the Fortran ABI changed, and an important part of the 18% FP bloat
is caused by it.  Here are the individual changes:


runtime (only benchmarks with off-noise changes):

Test Name       gcc-7   gcc-8   gcc-9   gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
FP/538.imagick  25.01%  25.64%  27.57%  21.51%  21.75%  19.46%  19.88%  23.20%  22.91%
INT/525.x264_r  7.25%   6.20%   6.58%   7.48%   ~       -37.7%  -40.4%  -41.6%  -39.90%
INT/548.exchan  -17.9%  -17.8%  -14.9%  -14.1%  -5.88%  -13.9%  -21.6%  -25.0%  -26.48%
INT/531.deepsj  -2.46%  ~       ~       -15.0%  -16.1%  -17.9%  -18.8%  -19.3%  -19.62%
FP/503.bwaves_  -6.30%  ~       -2.71%  16.95%  16.71%  16.65%  16.94%  16.94%  16.70%
FP/527.cam4_r   -2.99%  -2.33%  -10.7%  -11.3%  -10.9%  -11.8%  -11.9%  -12.5%  -11.37%
FP/521.wrf_r    ~       -2.40%  -5.99%  -6.10%  -5.66%  -9.45%  -9.28%  -9.82%  -9.95%
FP/554.roms_r   ~       5.79%   2.51%   ~       5.24%   7.95%   9.35%   9.11%   9.68%
INT/520.omnetp  -3.26%  -3.45%  ~       -3.82%  -6.71%  -7.37%  -6.57%  -6.83%  -5.62%
FP/549.fotonik  ~       ~       -5.60%  -8.26%  -8.61%  -3.80%  -4.82%  -3.26%  -5.48%
INT/541.leela_  -2.47%  -2.19%  ~       -4.57%  -6.32%  -4.76%  -5.69%  -6.72%  -5.88%
INT/500.perlbe  ~       -2.11%  -2.34%  -6.03%  -4.51%  ~       ~       -5.01%  -4.52%
INT/523.xalanc  -2.42%  -3.18%  -2.26%  -3.75%  -2.31%  -5.95%  -2.02%  -3.52%  ~
FP/511.povray_  ~       ~       5.21%   -6.54%  ~       ~       ~       ~       ~
INT/505.mcf_r   ~       ~       ~       ~       ~       -2.82%  -3.32%  -3.71%  -4.14%
FP/510.parest_  ~       ~       ~       ~       -3.31%  ~       -2.28%  -3.03%  -3.39%
FP/519.lbm_r    3.33%   ~       ~       -4.72%  ~       ~       ~       ~       ~
FP/544.nab_r    ~       ~       ~       ~       ~       ~       -2.43%  ~       -3.15%
FP/508.namd_r   ~       ~       ~       ~       4.20%   ~       ~       -2.35%  -2.02%
Those are times (smaller is better)

- The ImageMagick regression since GCC 7 is a store-to-load forwarding issue,
  where we vectorize a load in one function of a value stored by pieces in
  another.
- The x264 improvement in GCC 12 is vectorization at -O2 (which may be
  argued to primarily help code that should be built with -Ofast/-O3
  anyway).
- The exchange improvement in GCC 7 is special handling of self-recursive
  functions with nested loops (quite specific to the benchmark).
- I forgot what caused the changes in deepsjeng in GCC 10 and cam4 in GCC 9.

size

                GCC 6 size      gcc-7   gcc-8   gcc-9   gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
FP/521.wrf_rg   11.85 MB        ~       5.78%   4.43%   33.11%  33.11%  34.41%  34.41%  38.42%  38.41%
INT/557.xz_rg   75.53 KB        ~       ~       ~       30.10%  29.47%  29.18%  30.30%  33.28%  33.57%
FP/totalg       28.08 MB        ~       2.40%   ~       18.30%  18.24%  18.92%  18.66%  22.23%  22.27%
INT/523.xalanc  1.98 MB         ~       ~       15.05%  14.85%  14.54%  13.62%  13.80%  17.31%  17.07%
FP/526.blender  6.21 MB         ~       ~       -2.50%  15.93%  15.97%  15.70%  14.08%  18.47%  18.40%
INT/541.leela   74.37 KB        ~       ~       13.36%  -8.84%  -8.54%  -15.7%  -15.3%  -9.58%  -10.34%
INT/500.perlb   1.50 MB         ~       ~       ~       9.20%   9.08%   10.08%  9.69%   12.44%  12.38%
INT/502.gcc_r   6.16 MB         ~       ~       -2.18%  10.40%  10.59%  8.50%   8.07%   10.14%  10.10%
FP/549.fotoni   325.23 KB       ~       ~       ~       4.39%   4.33%   8.35%   9.28%   11.76%  10.82%
FP/519.lbm_rg   10.53 KB        -3.90%  -5.54%  -4.72%  -3.83%  -3.60%  -6.58%  -6.43%  -5.28%  -5.28%
FP/538.imagic   1.03 MB         ~       2.32%   ~       7.36%   7.47%   6.49%   6.22%   5.17%   4.74%
FP/544.nab_rg   83.99 KB        ~       -2.37%  -3.63%  -5.02%  -5.43%  -5.33%  -7.50%  -5.35%  -5.49%
INT/531.deeps   60.41 KB        ~       ~       ~       2.54%   2.81%   7.06%   6.65%   9.91%   10.01%
FP/511.povray   771.09 KB       ~       ~       ~       7.83%   9.25%   6.44%   2.73%   5.84%   5.68%
FP/507.cactuB   2.54 MB         6.59%   ~       -3.85%  ~       ~       2.81%   5.32%   7.56%   9.22%
FP/527.cam4_r   2.60 MB         ~       2.25%   ~       4.21%   3.92%   3.96%   5.02%   6.37%   6.11%
INT/548.excha   65.35 KB        -7.62%  2.14%   ~       -3.24%  -3.58%  ~       ~       5.92%   6.14%
FP/510.parest   1.29 MB         -2.11%  ~       9.79%   ~       -2.17%  -3.89%  -4.43%  3.44%   3.22%
INT/520.omnet   1.07 MB         ~       ~       -3.96%  4.31%   ~       4.71%   2.48%   3.83%   3.52%
FP/508.namd_r   829.33 KB       ~       ~       13.51%  ~       ~       ~       ~       4.11%   3.25%
INT/505.mcf_r   12.59 KB        ~       -3.25%  -5.23%  -2.24%  -4.12%  -2.88%  -2.26%  2.71%   ~
INT/525.x264_   404.39 KB       ~       ~       ~       -5.23%  -5.15%  -3.84%  -3.88%  ~       ~
FP/554.roms_r   563.96 KB       -2.55%  ~       ~       ~       ~       ~       -4.60%  -4.17%  -4.59%
FP/503.bwaves   30.62 KB        2.74%   ~       -2.24%  -2.52%  -2.43%  ~       ~       ~       ~

So the GCC binary, for example, got 10% bigger.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-06-25 17:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #15 from Rama Malladi <rvmallad at amazon dot com> ---
Thanks for the comments and for giving us some history/perspective. I agree
with this statement,

> Pushing up -O2 limits can make sense, but needs to be done carefully -
> in longer term IMO we do not want to let -O2 binaries to grow faster
> than their performance. Sadly this figure is not that great.

and hence this option was proposed to let the user explicitly enable it and
get more performance gains from inlining in addition to LTO. The initial
description I posted had the performance gains for individual SPEC cpu2017 Int
rate benchmarks with LTO. Note that not all benchmarks benefit; indeed,
`500.perlbench_r` performance went down while its code size increased. But
some other benchmarks such as `502.gcc_r`, `523.xalancbmk_r`,
`531.deepsjeng_r` and `541.leela_r` saw better performance. Customer
applications such as Envoy saw higher performance with these inline parameters.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
From: rvmallad at amazon dot com @ 2024-06-25 17:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #16 from Rama Malladi <rvmallad at amazon dot com> ---
I have posted a patch for this feature at the URL below:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655506.html

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
  2024-03-30  1:25 [Bug driver/114531] New: Feature proposal for an `-finline-functions-aggressive` compiler option rvmallad at amazon dot com
                   ` (16 preceding siblings ...)
  2024-06-25 17:48 ` rvmallad at amazon dot com
@ 2024-06-25 18:40 ` rsandifo at gcc dot gnu.org
  2024-06-25 22:25   ` Jan Hubicka
  2024-06-25 22:25 ` hubicka at ucw dot cz
  2024-06-27 11:46 ` rvmallad at amazon dot com
  19 siblings, 1 reply; 22+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-06-25 18:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #17 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
I can see that it's useful to ask whether the current -O2 & -O3 inlining
heuristics are making the right trade-off.  But I think that's really a
different issue from the one that is raised in the PR.  (Unless we think that
-O2 and -O3 should always have the same inlining heuristics henceforward, but
that seems unlikely.)

At the moment, -O3 is essentially -O2 + some -f options + some --param options.
 Users who want to pick & choose some of the -f options can do so, and can add
them to stable build systems.  Normally, obsolete -f options are turned into
no-ops rather than removed.  But users can't pick & choose the --params, and
add them to stable build systems, because we reserve the right to remove
--params without warning.

So IMO, we should have an -f option that represents “the inlining parameters
enabled by -O3”, whatever they happen to be for a given release.  It's OK if
the set is empty.

For such a change, it doesn't really matter whether the current --params are
the right ones.  It just matters that the --params are the ones that we
currently use.  If the --params are changed later, the -f option and -O3 will
automatically stay in sync.
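
For illustration only, here is roughly what a build script has to do today to get "-O2 plus the -O3 inlining parameters". This is a minimal Python sketch: the helper name `o2_with_o3_inlining` is hypothetical, and the values are the ones quoted in the original report, so they are a snapshot that may not match what -O3 uses in any given release.

```python
# Inline-related --param values that the original report lists as the -O3
# settings. GCC reserves the right to change or remove --params, so treat
# this table as a snapshot, not as a stable interface.
O3_INLINE_PARAMS = {
    "early-inlining-insns": 14,
    "inline-heuristics-hint-percent": 600,
    "inline-min-speedup": 15,
    "max-inline-insns-auto": 30,
    "max-inline-insns-single": 200,
}


def o2_with_o3_inlining(extra_flags=()):
    """Return a flag list: -O2, the inline params above, then any extra flags.

    Hypothetical helper for a build script; not part of GCC.
    """
    flags = ["-O2"]
    flags += [f"--param={name}={value}" for name, value in O3_INLINE_PARAMS.items()]
    flags += list(extra_flags)
    return flags


if __name__ == "__main__":
    # Print a command-line fragment that could be appended to CFLAGS/CXXFLAGS.
    print(" ".join(o2_with_o3_inlining(["-flto"])))
```

The hard-coded table is the part that cannot safely live in a stable build system; a single -f option would replace it with one flag whose meaning follows each release.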

* Re: [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
  2024-06-25 18:40 ` rsandifo at gcc dot gnu.org
@ 2024-06-25 22:25   ` Jan Hubicka
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Hubicka @ 2024-06-25 22:25 UTC (permalink / raw)
  To: rsandifo at gcc dot gnu.org; +Cc: gcc-bugs

> different issue from the one that is raised in the PR.  (Unless we think that
> -O2 and -O3 should always have the same inlining heuristics henceforward, but
> that seems unlikely.)

Yes, I think the point of -O3 is to let the compiler be more aggressive than
what seems desirable for your average distro build defaults (which need
to balance speed and size).
> 
> At the moment, -O3 is essentially -O2 + some -f options + some --param options.
>  Users who want to pick & choose some of the -f options can do so, and can add
> them to stable build systems.  Normally, obsolete -f options are turned into
> no-ops rather than removed.  But users can't pick & choose the --params, and
> add them to stable build systems, because we reserve the right to remove
> --params without warning.

Moreover, those --params are slowly changing their meaning over time.  I
need to retune the inliner when early inlining gets smarter.
> 
> So IMO, we should have an -f option that represents “the inlining parameters
> enabled by -O3”, whatever they happen to be for a given release.  It's OK if
> the set is empty.
> 
> For such a change, it doesn't really matter whether the current --params are
> the right ones.  It just matters that the --params are the ones that we
> currently use.  If the --params are changed later, the -f option and -O3 will
> automatically stay in sync.

I am trying to understand how useful this is.  I am basically worried
about two things:
 1) We have other optimization passes that behave differently at -O2 and
    -O3 (vectorizer, unrolling etc.) and I think we may want to have
    more. We also have -Os and -O1.

    So perhaps we want a more systematic solution. We already have
    -fvect-cost-model, which is a kind of vectorizer version of the proposed
    inliner option.
 2) The inliner is already quite painful to tune, especially since
     one really needs to benchmark packages significantly bigger than
     SPEC, which tend to be a bit hard to set up and benchmark
     meaningfully. I usually do at least Firefox and clang, where the
     first is always quite some work to get working well with the latest
     GCC. With SUSE's LNT we also run "C++ benchmarks", which were
     initially collected as a kind of inliner test with higher
     abstraction penalty (tramp3d etc.).

     For many years I benchmarked primarily -O3 and -O3 + profile
     feedback on x86-64 only, with an occasional look at -O2 and -Os
     behaviour, which were generally more stable.
     I also tested other targets (power and aarch64) but just
     sporadically, which is not good.

     After GCC 5 I doubled testing to include both lto/non-lto variants.
     Since GCC 10, -O2 started to evolve and needed re-testing too
     (lto/non-lto). One metric I know I ought to tune is -O2 -flto and
     FDO, which used to be essentially -O3 before the optimization-level
     --params were introduced, but now -O2 + FDO inlining is more
     conservative, which hurts, for example, profiledbootstrapped GCC.

     So naturally I am a bit worried about introducing even more combinations
     that need testing and maintenance.  If we add a user-friendly way to
     tweak this, we also make a promise to keep it sane.
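
To make the size of that testing matrix concrete, here is a rough Python sketch that enumerates the configurations mentioned above (optimization level, LTO, FDO) and then adds the proposed option as one more axis. The axis lists are assumptions chosen for illustration: `-fprofile-use` stands in for FDO, and `-finline-functions-aggressive` is the option proposed in this PR, not an existing GCC flag. Treating it as a fully independent axis is an upper bound, since it would presumably be redundant at -O3.

```python
from itertools import product

# Axes named in the message above; an empty string means "flag not passed".
OPT_LEVELS = ["-O2", "-O3"]
LTO = ["", "-flto"]
FDO = ["", "-fprofile-use"]
# Hypothetical extra axis: the option proposed in this PR.
INLINE = ["", "-finline-functions-aggressive"]


def configurations(*axes):
    """Return every flag combination across the given axes as a string."""
    return [" ".join(flag for flag in combo if flag) for combo in product(*axes)]


current = configurations(OPT_LEVELS, LTO, FDO)
proposed = configurations(OPT_LEVELS, LTO, FDO, INLINE)

if __name__ == "__main__":
    print(len(current), "configurations without the new option")
    print(len(proposed), "configurations with the new option as an extra axis")
```

Each added on/off axis doubles the count, before multiplying by targets (x86-64, power, aarch64) and by the benchmark set.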

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
  2024-03-30  1:25 [Bug driver/114531] New: Feature proposal for an `-finline-functions-aggressive` compiler option rvmallad at amazon dot com
                   ` (17 preceding siblings ...)
  2024-06-25 18:40 ` rsandifo at gcc dot gnu.org
@ 2024-06-25 22:25 ` hubicka at ucw dot cz
  2024-06-27 11:46 ` rvmallad at amazon dot com
  19 siblings, 0 replies; 22+ messages in thread
From: hubicka at ucw dot cz @ 2024-06-25 22:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #18 from Jan Hubicka <hubicka at ucw dot cz> ---
> different issue from the one that is raised in the PR.  (Unless we think that
> -O2 and -O3 should always have the same inlining heuristics henceforward, but
> that seems unlikely.)

Yes, I think the point of -O3 is to let the compiler be more aggressive than
what seems desirable for your average distro build defaults (which need
to balance speed and size).
> 
> At the moment, -O3 is essentially -O2 + some -f options + some --param options.
>  Users who want to pick & choose some of the -f options can do so, and can add
> them to stable build systems.  Normally, obsolete -f options are turned into
> no-ops rather than removed.  But users can't pick & choose the --params, and
> add them to stable build systems, because we reserve the right to remove
> --params without warning.

Moreover, those --params are slowly changing their meaning over time.  I
need to retune the inliner when early inlining gets smarter.
> 
> So IMO, we should have an -f option that represents “the inlining parameters
> enabled by -O3”, whatever they happen to be for a given release.  It's OK if
> the set is empty.
> 
> For such a change, it doesn't really matter whether the current --params are
> the right ones.  It just matters that the --params are the ones that we
> currently use.  If the --params are changed later, the -f option and -O3 will
> automatically stay in sync.

I am trying to understand how useful this is.  I am basically worried
about two things:
 1) We have other optimization passes that behave differently at -O2 and
    -O3 (vectorizer, unrolling etc.) and I think we may want to have
    more. We also have -Os and -O1.

    So perhaps we want a more systematic solution. We already have
    -fvect-cost-model, which is a kind of vectorizer version of the proposed
    inliner option.
 2) The inliner is already quite painful to tune, especially since
     one really needs to benchmark packages significantly bigger than
     SPEC, which tend to be a bit hard to set up and benchmark
     meaningfully. I usually do at least Firefox and clang, where the
     first is always quite some work to get working well with the latest
     GCC. With SUSE's LNT we also run "C++ benchmarks", which were
     initially collected as a kind of inliner test with higher
     abstraction penalty (tramp3d etc.).

     For many years I benchmarked primarily -O3 and -O3 + profile
     feedback on x86-64 only, with an occasional look at -O2 and -Os
     behaviour, which were generally more stable.
     I also tested other targets (power and aarch64) but just
     sporadically, which is not good.

     After GCC 5 I doubled testing to include both lto/non-lto variants.
     Since GCC 10, -O2 started to evolve and needed re-testing too
     (lto/non-lto). One metric I know I ought to tune is -O2 -flto and
     FDO, which used to be essentially -O3 before the optimization-level
     --params were introduced, but now -O2 + FDO inlining is more
     conservative, which hurts, for example, profiledbootstrapped GCC.

     So naturally I am a bit worried about introducing even more combinations
     that need testing and maintenance.  If we add a user-friendly way to
     tweak this, we also make a promise to keep it sane.

* [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
  2024-03-30  1:25 [Bug driver/114531] New: Feature proposal for an `-finline-functions-aggressive` compiler option rvmallad at amazon dot com
                   ` (18 preceding siblings ...)
  2024-06-25 22:25 ` hubicka at ucw dot cz
@ 2024-06-27 11:46 ` rvmallad at amazon dot com
  19 siblings, 0 replies; 22+ messages in thread
From: rvmallad at amazon dot com @ 2024-06-27 11:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #19 from Rama Malladi <rvmallad at amazon dot com> ---
Thank you, Hubicka, for the inputs. I see your intent and that we have to
revisit the inline parameter tuning. As Richard S and I mentioned, the intent
of this feature request or PR is to expose such an option to the user for
getting aggressive inline optimizations enabled by the compiler.

Coming to equivalent flags such as `-fvect-cost-model`, those flags allow
for choosing one of multiple models such as 'dynamic', 'cheap'... In the case
of the inline parameters, we have only 2 choices: 'default' and 'aggressive'.
Hence, adding an option such as `-finline-functions-aggressive` would be fine
to toggle between the default and aggressive settings.

Thread overview: 22+ messages
2024-03-30  1:25 [Bug driver/114531] New: Feature proposal for an `-finline-functions-aggressive` compiler option rvmallad at amazon dot com
2024-03-30  1:37 ` [Bug ipa/114531] " pinskia at gcc dot gnu.org
2024-03-30  1:38 ` pinskia at gcc dot gnu.org
2024-03-30  1:56 ` pinskia at gcc dot gnu.org
2024-03-30  2:19 ` rvmallad at amazon dot com
2024-03-30  2:21 ` rvmallad at amazon dot com
2024-03-30  2:29 ` pinskia at gcc dot gnu.org
2024-04-01 11:41 ` rvmallad at amazon dot com
2024-04-02  8:36 ` rguenth at gcc dot gnu.org
2024-04-08  9:52 ` rvmallad at amazon dot com
2024-05-31 13:57 ` rvmallad at amazon dot com
2024-06-25 13:03 ` wilco at gcc dot gnu.org
2024-06-25 15:30 ` rvmallad at amazon dot com
2024-06-25 16:20 ` hubicka at ucw dot cz
2024-06-25 16:25 ` rvmallad at amazon dot com
2024-06-25 16:49 ` hubicka at ucw dot cz
2024-06-25 17:40 ` rvmallad at amazon dot com
2024-06-25 17:48 ` rvmallad at amazon dot com
2024-06-25 18:40 ` rsandifo at gcc dot gnu.org
2024-06-25 22:25   ` Jan Hubicka
2024-06-25 22:25 ` hubicka at ucw dot cz
2024-06-27 11:46 ` rvmallad at amazon dot com
