public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/96337] New: GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: aros at gmx dot com @ 2020-07-27 15:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

            Bug ID: 96337
           Summary: GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC
                    9.3/8.4
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aros at gmx dot com
  Target Milestone: ---

All the pertinent details are on this page:

https://www.phoronix.com/scan.php?page=article&item=gcc-10900k-compiler&num=4

Options used: -O2

CPU used: Intel Core i9 10900K

I wonder what could have caused such a huge regression. Many Linux distros
compile their code just for -march=x86-64 and could be affected by the bug.

* [Bug rtl-optimization/96337] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: david.bolvansky at gmail dot com @ 2020-07-27 18:36 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Dávid Bolvanský <david.bolvansky at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.bolvansky at gmail dot com

--- Comment #1 from Dávid Bolvanský <david.bolvansky at gmail dot com> ---
Inliner changes?

* [Bug rtl-optimization/96337] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: aros at gmx dot com @ 2020-07-27 21:13 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #2 from Artem S. Tashkinov <aros at gmx dot com> ---
Looks like even kernel performance is affected:
https://lore.kernel.org/lkml/20200507224530.2993316-1-Jason@zx2c4.com/

That was surely not a change for the better.

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: rguenth at gcc dot gnu.org @ 2020-07-28  6:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |ipa
            Summary|GCC 10.2: twice as slow for |[10/11 Regression] GCC
                   |-O2 -march=x86-64 vs. GCC   |10.2: twice as slow for -O2
                   |9.3/8.4                     |-march=x86-64 vs. GCC
                   |                            |9.3/8.4
           Keywords|                            |missed-optimization
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, the workloads tested are not -O2 workloads, but yes, distros will likely
still use -O2 for them unless the package itself overrides it.

But IIRC the main change was that -O2 -fprofile-use no longer uses -O3
inliner settings; the settings for -O2 itself were not changed much?  Honza?

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: rguenth at gcc dot gnu.org @ 2020-07-28  6:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |10.3

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at gcc dot gnu.org @ 2020-07-28  6:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
There were changes to the -O2 inliner.  I have
 - enabled auto-inlining
 - reduced early inlining a bit
 - reduced limits for inlining functions declared inline
The last two were needed to keep code size under control and did well on
overall -O2 SPEC and Firefox performance (without FDO; with FDO we indeed had
some performance loss and code size gains, which I plan to revisit).

This should not be visible on the Linux kernel, though, since it always inlines.
The linked patch to enable -O3 by default does not make much sense to me.

I will see if I can reproduce the Phoronix benchmarks; indeed, those workloads
are not typical -O2 workloads and may be affected by the inline limits.

Honza

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at gcc dot gnu.org @ 2020-07-28  6:42 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
OK, I started by checking Himeno, where Phoronix reports 4377->2681.
On my notebook (Intel(R) Core(TM) i7-6600U CPU) there may be a regression of
around 1-5% that is not inliner-related:

GCC 10
 Loop executed for 7445 times
 Gosa : 2.924613e-08 
 MFLOPS measured : 2346.645663  cpu : 50.172505
 Score based on Pentium III 600MHz using Fortran 77: 28.617630

GCC 9
 Loop executed for 8253 times
 Gosa : 9.062229e-09 
 MFLOPS measured : 2454.019320  cpu : 53.184180
 Score based on Pentium III 600MHz using Fortran 77: 29.927065

The internal loops and inlining look almost identical.
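The size of the Himeno difference can be checked with a plain relative-change calculation. Below is a minimal Python sketch that uses only the MFLOPS figures quoted in this comment. The helper name `pct_change` is illustrative and does not come from any tool.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent (negative means lower)."""
    return (new - old) / old * 100.0

# MFLOPS measured, copied from the Himeno output above (higher is better).
gcc9_mflops = 2454.019320
gcc10_mflops = 2346.645663

print(f"GCC 10 vs GCC 9: {pct_change(gcc9_mflops, gcc10_mflops):+.1f}% MFLOPS")
# The Phoronix scores quoted at the start of this comment, for scale.
print(f"Phoronix report: {pct_change(4377, 2681):+.1f}%")
```

The local result, about -4.4%, lands inside the 1-5% range mentioned above. The drop Phoronix reported is roughly -38.7% by the same formula.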

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at gcc dot gnu.org @ 2020-07-28  6:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Coremark.

GCC 9 run1:
CoreMark Size    : 666
Total ticks      : 12310
Total time (secs): 12.310000
Iterations/Sec   : 24370.430544
Iterations       : 300000
Compiler version : GCC9.3.1 20200406 [revision 6db837a5288ee3ca5ec504fbd5a765817e556ac2]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt

GCC 9 run2:
CoreMark Size    : 666
Total ticks      : 12471
Total time (secs): 12.471000
Iterations/Sec   : 24055.809478
Iterations       : 300000
Compiler version : GCC9.3.1 20200406 [revision 6db837a5288ee3ca5ec504fbd5a765817e556ac2]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt

GCC 10 run1:
CoreMark Size    : 666
Total ticks      : 15269
Total time (secs): 15.269000
Iterations/Sec   : 26196.869474
Iterations       : 400000
Compiler version : GCC10.1.1 20200507 [revision dd38686d9c810cecbaa80bb82ed91caaa58ad635]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt

GCC 10 run2:
CoreMark Size    : 666
Total ticks      : 11770
Total time (secs): 11.770000
Iterations/Sec   : 25488.530161
Iterations       : 300000
Compiler version : GCC10.1.1 20200507 [revision dd38686d9c810cecbaa80bb82ed91caaa58ad635]
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt
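GCC 10 run1 executed 400000 iterations rather than 300000, so total ticks cannot be compared directly between runs. Iterations/Sec can be. Below is a minimal Python sketch that averages the two runs per compiler. It assumes Iterations/Sec is simply iterations divided by total seconds, which matches every figure printed above. The helper name is illustrative.

```python
from statistics import mean

def iterations_per_sec(iterations, seconds):
    # Iterations completed per second of measured run time.
    return iterations / seconds

# (iterations, total time in seconds) copied from the CoreMark runs above.
runs = {
    "GCC 9": [(300000, 12.310), (300000, 12.471)],
    "GCC 10": [(400000, 15.269), (300000, 11.770)],
}

avg = {cc: mean(iterations_per_sec(i, s) for i, s in r) for cc, r in runs.items()}
for cc, rate in avg.items():
    print(f"{cc}: {rate:.1f} iterations/sec (mean of 2 runs)")

speedup = (avg["GCC 10"] / avg["GCC 9"] - 1) * 100
print(f"GCC 10 vs GCC 9: {speedup:+.1f}%")
```

With only two runs per compiler this is a rough indication. It suggests GCC 10 is about 7% faster on CoreMark at -O2 on this machine, not slower.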

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at gcc dot gnu.org @ 2020-07-28  8:08 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
X265
GCC 9:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 9.3.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:      3, Avg QP:27.57  kb/s: 14018.64                      
x265 [info]: frame P:    146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:    451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 279.98s (2.14 fps), 1273.22 kb/s, Avg QP:33.68
1056.04user 1.31system 4:40.01elapsed 377%CPU (0avgtext+0avgdata 432688maxresident)k
0inputs+0outputs (0major+102385minor)pagefaults 0swaps


GCC 10:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 10.1.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:      3, Avg QP:27.57  kb/s: 14018.64                      
x265 [info]: frame P:    146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:    451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 292.63s (2.05 fps), 1273.22 kb/s, Avg QP:33.68
1079.80user 1.76system 4:52.65elapsed 369%CPU (0avgtext+0avgdata 427464maxresident)k
0inputs+0outputs (0major+73644minor)pagefaults 0swaps
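The two "encoded 600 frames in ..." lines can be turned into throughput and a relative slowdown. Below is a minimal Python sketch that uses only the figures printed in this comment. The helper names are illustrative.

```python
FRAMES = 600  # both builds encoded the same 600 frames

def fps(frames, seconds):
    # Encoding throughput in frames per second.
    return frames / seconds

# Encode times in seconds for the -O2 override build, copied from above.
gcc9_s, gcc10_s = 279.98, 292.63

print(f"GCC 9:  {fps(FRAMES, gcc9_s):.2f} fps")
print(f"GCC 10: {fps(FRAMES, gcc10_s):.2f} fps")
print(f"GCC 10 takes {(gcc10_s / gcc9_s - 1) * 100:.1f}% longer")
```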

So a 5% difference instead of 50%. This is a codebase that I would build with
-O3.  Looking at perf reports, there is a difference in inlining.

GCC 9:
   8.74%  x265     libx265.so.176       [.] (anonymous namespace)::satd_8x4
   5.67%  x265     libx265.so.176       [.] (anonymous namespace)::filterVertical_sp_c<8>
   4.44%  x265     libx265.so.176       [.] (anonymous namespace)::pixelavg_pp<8, 8>
   4.11%  x265     libx265.so.176       [.] (anonymous namespace)::psyCost_pp<3>
   3.81%  x265     libx265.so.176       [.] (anonymous namespace)::interp_horiz_ps_c<8, 64, 64>
   3.33%  x265     libx265.so.176       [.] (anonymous namespace)::sad<8, 8>
   3.29%  x265     libx265.so.176       [.] partialButterfly32

GCC 10:
   9.17%  x265     libx265.so.176       [.] (anonymous namespace)::_sa8d_8x8
   8.70%  x265     libx265.so.176       [.] (anonymous namespace)::satd_8x4
   5.80%  x265     libx265.so.176       [.] (anonymous namespace)::pixelavg_pp<8, 8>
   5.55%  x265     libx265.so.176       [.] (anonymous namespace)::filterVertical_sp_c<8>
   3.90%  x265     libx265.so.176       [.] (anonymous namespace)::sad<8, 8>
   3.71%  x265     libx265.so.176       [.] (anonymous namespace)::interp_horiz_ps_c<8, 64, 64>
   3.48%  x265     libx265.so.176       [.] (anonymous namespace)::sad_x4<8, 8>

I build with
  cmake ../source/ -DCMAKE_CXX_FLAGS=-O2 -DCMAKE_CXX_FLAGS_RELEASE=-DNDEBUG -DCMAKE_CXX_COMPILER=g++-9
I think Phoronix may be missing the release flags override, so it may be
testing an -O3 build.
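This suspicion rests on three documented behaviors:

- GCC honours the last -O option on its command line.
- CMake passes CMAKE_CXX_FLAGS_<CONFIG> after CMAKE_CXX_FLAGS.
- CMake's default CMAKE_CXX_FLAGS_RELEASE for GCC is "-O3 -DNDEBUG".

Below is a toy Python sketch of that precedence. It assumes a Release build type and ignores every other flag CMake adds. `effective_opt_level` and `cmake_release_flags` are hypothetical helpers, not part of CMake or GCC.

```python
def effective_opt_level(flags):
    """Return the optimization level GCC would use for a list of flags.

    Toy model: the last -O option wins, a bare "-O" means -O1, and
    all non -O flags are ignored.
    """
    level = "0"  # GCC defaults to -O0 when no -O option is given
    for flag in flags:
        if flag.startswith("-O"):
            level = flag[2:] or "1"
    return level

def cmake_release_flags(cxx_flags, cxx_flags_release="-O3 -DNDEBUG"):
    # Simplification: CMAKE_CXX_FLAGS first, then CMAKE_CXX_FLAGS_RELEASE.
    return cxx_flags.split() + cxx_flags_release.split()

# Without the override, the release default's -O3 comes last and wins.
print(effective_opt_level(cmake_release_flags("-O2")))
# With -DCMAKE_CXX_FLAGS_RELEASE=-DNDEBUG, -O2 is the only -O option left.
print(effective_opt_level(cmake_release_flags("-O2", "-DNDEBUG")))
```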

GCC 9 inlines _sa8d_8x8 while GCC 10 does not. The inliner estimates it at
159 insns, so this is indeed caused by --param max-inline-insns-single
dropping from 200 to 70 for -O2. The old default of 200 did not make much
sense for -O2, since the inline keyword is abused by C++ codebases (this was
the main point of the retuning).
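The argument above reduces to a size comparison. Below is a deliberately simplified Python sketch. It assumes the only thing deciding this call site is whether the inliner's size estimate fits within max-inline-insns-single. GCC's real heuristics also weigh call-site benefit, hotness and growth limits, so this is not GCC's algorithm. The 159, 200 and 70 figures are the ones quoted in the comment; `fits_single_limit` is a hypothetical name.

```python
# Figures quoted in the comment above.
SA8D_8X8_ESTIMATED_INSNS = 159
O2_MAX_INLINE_INSNS_SINGLE = {"GCC 9": 200, "GCC 10": 70}

def fits_single_limit(estimated_insns, limit):
    # Toy rule: a callee declared inline is a candidate only if its
    # estimated size does not exceed the limit.
    return estimated_insns <= limit

for version, limit in O2_MAX_INLINE_INSNS_SINGLE.items():
    verdict = fits_single_limit(SA8D_8X8_ESTIMATED_INSNS, limit)
    print(f"{version} (-O2, limit {limit}): inline _sa8d_8x8? {verdict}")
```

Passing `--param max-inline-insns-single=200` explicitly would be the corresponding knob to experiment with for a single build. Its effect on x265 has not been verified here.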

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at gcc dot gnu.org @ 2020-07-28  8:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This is the build without the release flags override, as Phoronix seems to
do:

GCC 9:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 9.3.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:      3, Avg QP:27.57  kb/s: 14018.64                      
x265 [info]: frame P:    146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:    451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 171.30s (3.50 fps), 1273.22 kb/s, Avg QP:33.68
599.58user 1.62system 2:51.33elapsed 350%CPU (0avgtext+0avgdata 416976maxresident)k
225384inputs+0outputs (0major+95380minor)pagefaults 0swaps

GCC 10:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 10.1.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:      3, Avg QP:27.57  kb/s: 14018.64                      
x265 [info]: frame P:    146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:    451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 168.97s (3.55 fps), 1273.22 kb/s, Avg QP:33.68
592.69user 1.89system 2:49.00elapsed 351%CPU (0avgtext+0avgdata 416184maxresident)k
476408inputs+0outputs (1major+95191minor)pagefaults 0swaps

So a small improvement.
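Here is the same throughput arithmetic on this build's encode times, as a minimal Python sketch using only the figures printed in this comment. The helper names are illustrative.

```python
FRAMES = 600  # both builds encoded the same 600 frames

def fps(frames, seconds):
    # Encoding throughput in frames per second.
    return frames / seconds

# Encode times in seconds for the build without the release flags override.
gcc9_s, gcc10_s = 171.30, 168.97

print(f"GCC 9:  {fps(FRAMES, gcc9_s):.2f} fps")
print(f"GCC 10: {fps(FRAMES, gcc10_s):.2f} fps")
print(f"GCC 10 time change: {(gcc10_s / gcc9_s - 1) * 100:+.1f}%")
```

GCC 10 finishes roughly 1.4% faster here, consistent with the small improvement noted above.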

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at gcc dot gnu.org @ 2020-07-28  8:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
scimark
GCC 9:
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1062.28
FFT             Mflops:   189.17    (N=1048576)
SOR             Mflops:   947.53    (1000 x 1000)
MonteCarlo:     Mflops:   710.10
Sparse matmult  Mflops:  1402.08    (N=100000, nz=1000000)
LU              Mflops:  2062.49    (M=1000, N=1000)

GCC 10:
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1176.22
FFT             Mflops:   201.17    (N=1048576)
SOR             Mflops:   961.33    (1000 x 1000)
MonteCarlo:     Mflops:   708.62
Sparse matmult  Mflops:  1639.66    (N=100000, nz=1000000)
LU              Mflops:  2370.30    (M=1000, N=1000)

So again, around a 10% improvement for GCC 10.
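A per-kernel breakdown shows where the SciMark2 gain comes from. Below is a minimal Python sketch using only the Mflops values printed above. The helper name `pct_change` is illustrative.

```python
# SciMark2 results copied from above (Mflops; higher is better).
gcc9 = {"Composite": 1062.28, "FFT": 189.17, "SOR": 947.53,
        "MonteCarlo": 710.10, "Sparse matmult": 1402.08, "LU": 2062.49}
gcc10 = {"Composite": 1176.22, "FFT": 201.17, "SOR": 961.33,
         "MonteCarlo": 708.62, "Sparse matmult": 1639.66, "LU": 2370.30}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

changes = {k: pct_change(gcc9[k], gcc10[k]) for k in gcc9}
for kernel, delta in changes.items():
    print(f"{kernel:15s} {delta:+5.1f}%")
```

By this breakdown the composite gain of about 10.7% is driven mostly by Sparse matmult and LU. MonteCarlo is essentially unchanged.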

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: david.bolvansky at gmail dot com @ 2020-07-28  8:47 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #10 from Dávid Bolvanský <david.bolvansky at gmail dot com> ---
>> Compiler version : GCC10.1.1

Maybe you want to use same GCC version as phoronix used (GCC 10.2)?

* Re: [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: Jan Hubicka @ 2020-07-28  9:37 UTC
  To: david.bolvansky at gmail dot com; +Cc: gcc-bugs

> 
> Maybe you want to use same GCC version as phoronix used (GCC 10.2)?
OK, I will give it a try, but there are no inliner changes in gcc 10.2
compared to 10.1.

Honza

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: aros at gmx dot com @ 2020-07-29  7:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #12 from Artem S. Tashkinov <aros at gmx dot com> ---
Michael has admitted that this might be a CPU-specific regression:

> Been running some more tests today:
> - Tried on a i9-10980XE Cascade Lake and Cascade Lake Xeon systems and did not reproduce...
> - I went back to the i9-10900K and picked just a few of the tests where it was impacted the hardest, but then surprisingly the results were similar that run.

Source:
https://www.phoronix.com/forums/forum/software/programming-compilers/1196789-gcc-benchmarks-at-varying-optimization-levels-with-core-i9-10900k-show-an-unexpected-surprise?p=1197196#post1197196

The plot thickens.

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: andysem at mail dot ru @ 2020-08-01 18:53 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #13 from andysem at mail dot ru ---
I think, this inliner change needs to be reverted. People expect -O2 to produce
decently optimized binaries, and starting with gcc 10.x it doesn't deliver. -O3
traditionally enabled optimizations that may or may not improve performance
(and historically, sometimes even break code), so most projects don't use it.

If there needs to be an optimization mode that prioritizes compilation speed,
then let that be a separate mode, e.g. -O1.

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: david.bolvansky at gmail dot com @ 2020-08-01 19:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #14 from Dávid Bolvanský <david.bolvansky at gmail dot com> ---
Or change -Os to be GCC 10's -O2 with less inlining, revert -O2 to GCC 9's -O2,
and implement -Oz as an aggressive “-Os”.

* Re: [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: Jan Hubicka @ 2020-08-01 19:30 UTC
  To: andysem at mail dot ru; +Cc: gcc-bugs

> I think, this inliner change needs to be reverted. People expect -O2 to produce
> decently optimized binaries, and starting with gcc 10.x it doesn't deliver. -O3
> traditionally enabled optimizations that may or may not improve performance
> (and historically, sometimes even break code), so most projects don't use it.
I wrote a short description of the inliner changes in the Phoronix
discussion (comment 44):
https://www.phoronix.com/forums/forum/software/programming-compilers/1196789-gcc-benchmarks-at-varying-optimization-levels-with-core-i9-10900k-show-an-unexpected-surprise/page5

The inliner changes were not aimed at making compilation faster at the cost of
slower compiled code. They were intended to reflect modern C++ codebases more
closely and to produce faster binaries (at -O2 and -O2 -flto) without
regressing code size.  In fact, more inlining happens, and thus we
needed to optimize the inliner code carefully to avoid regressions with LTO.

The changes were benchmarked on a wide range of benchmarks, including some
where Phoronix measured a degradation before the GCC 10 release.

The presented benchmark results do not reproduce and seem odd.  50% on very
simple benchmarks is a bit too much for a change in one optimization.  It
seems more like thermal throttling.  Michael promised to re-run the tests,
and he is still talking about that in his last reply from the 31st.

Testcases are very welcome.

Honza

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at gcc dot gnu.org @ 2020-09-19 21:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
It seems that the benchmarks were flawed. We can reopen this if Phoronix
succeeds in reproducing them.

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: vz-gcc at zeitlins dot org @ 2020-09-19 21:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #17 from Vadim Zeitlin <vz-gcc at zeitlins dot org> ---
I've just subscribed to this bug because we see big slowdowns in our project
when switching from 8.3 to 10.2 (89% slower in an important use case, and a 30%
slowdown more or less across the board), without any other changes. We don't
have a simple test showing this (yet), but there is definitely something very
wrong here and I don't think this bug should be closed.

FWIW, in our case using -O3 doesn't help (it does make the code marginally
faster, but an improvement of <0.01% is not worth a 10% longer build time).

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at ucw dot cz @ 2020-09-19 21:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #18 from Jan Hubicka <hubicka at ucw dot cz> ---
> I've just subscribed to this bug because we see big slowdowns in our project
> when switching from 8.3 to 10.2 (89% slower in an important use case, and a 30%
> slowdown more or less across the board), without any other changes. We don't
> have a simple test showing this (yet), but there is definitely something very
> wrong here and I don't think this bug should be closed.
> 
> FWIW, in our case using -O3 doesn't help (it does make the code marginally
> faster, but an improvement of <0.01% is not worth a 10% longer build time).

We need a reproducer to fix bugs. So if you have an actual testcase that
slows down, it would be great to open a separate bug report for it.
A self-contained testcase is best; if that is not possible, please provide
at least a perf profile, and we can discuss with you what to do next.

Honza

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: vz-gcc at zeitlins dot org @ 2020-09-19 21:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #19 from Vadim Zeitlin <vz-gcc at zeitlins dot org> ---
(In reply to Jan Hubicka from comment #18)
> We need a reproducer to fix bugs.

Yes, of course, I understand this. I just haven't had time to make one yet;
we literally discovered the issue only today (well, maybe yesterday,
depending on the time zone).

> So if you have an actual testcase that
> slows down, it would be great to open a separate bug report for it.

OK, will do, but, at least superficially, our situation seems very similar to
this one, so I thought it would be better to keep this one going. But, again,
I'll open the new one as soon as I can make a test case for it, if this is your
preference.

* [Bug ipa/96337] [10/11 Regression] GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4
From: hubicka at ucw dot cz @ 2020-09-19 21:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #20 from Jan Hubicka <hubicka at ucw dot cz> ---
> OK, will do, but, at least superficially, our situation seems very similar to
> this one, so I thought it would be better to keep this one going. But, again,
> I'll open the new one as soon as I can make a test case for it, if this is your
> preference.

Yes, please file a new bug report. There should be one issue per bug
report, with occasional metabugs linking them together.

Honza

Thread overview: 24+ messages
2020-07-27 15:32 [Bug rtl-optimization/96337] New: GCC 10.2: twice as slow for -O2 -march=x86-64 vs. GCC 9.3/8.4 aros at gmx dot com
2020-07-27 18:36 ` [Bug rtl-optimization/96337] " david.bolvansky at gmail dot com
2020-07-27 21:13 ` aros at gmx dot com
2020-07-28  6:04 ` [Bug ipa/96337] [10/11 Regression] " rguenth at gcc dot gnu.org
2020-07-28  6:04 ` rguenth at gcc dot gnu.org
2020-07-28  6:20 ` hubicka at gcc dot gnu.org
2020-07-28  6:42 ` hubicka at gcc dot gnu.org
2020-07-28  6:49 ` hubicka at gcc dot gnu.org
2020-07-28  8:08 ` hubicka at gcc dot gnu.org
2020-07-28  8:24 ` hubicka at gcc dot gnu.org
2020-07-28  8:39 ` hubicka at gcc dot gnu.org
2020-07-28  8:47 ` david.bolvansky at gmail dot com
2020-07-28  9:37   ` Jan Hubicka
2020-07-28  9:37 ` hubicka at ucw dot cz
2020-07-29  7:56 ` aros at gmx dot com
2020-08-01 18:53 ` andysem at mail dot ru
2020-08-01 19:30   ` Jan Hubicka
2020-08-01 19:14 ` david.bolvansky at gmail dot com
2020-08-01 19:30 ` hubicka at ucw dot cz
2020-09-19 21:34 ` hubicka at gcc dot gnu.org
2020-09-19 21:39 ` vz-gcc at zeitlins dot org
2020-09-19 21:46 ` hubicka at ucw dot cz
2020-09-19 21:49 ` vz-gcc at zeitlins dot org
2020-09-19 21:52 ` hubicka at ucw dot cz
