public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/101555] New: Compile slowdown in tree PRE
@ 2021-07-21 14:56 wsnyder at wsnyder dot org
  2021-07-21 14:59 ` [Bug c++/101555] " wsnyder at wsnyder dot org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: wsnyder at wsnyder dot org @ 2021-07-21 14:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

            Bug ID: 101555
           Summary: Compile slowdown in tree PRE
           Product: gcc
           Version: 9.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wsnyder at wsnyder dot org
  Target Milestone: ---

Running

   g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

as

   g++ -I. -c -Os -ftime-report -o /dev/null Vara___024root__114_0.cpp

takes 297 CPU seconds.

On clang version 10.0.0-4ubuntu1 it takes 37 seconds total.

In GCC, 276 of 297 seconds are in the "tree PRE" stage. Splitting the large
function into roughly three pieces (Vara___024root__114_0.cpp, _1 and _2)
the "tree PRE" stage takes 12.7, 7.8 and 3.9 seconds (totalling 24.4).
Therefore I suspect some O(n^2) issue in tree PRE.

Note removing -Os makes gcc much faster, but still signifcantly slower than
clang.

This issue was originally reported using gcc 8.2.0 by Matheus Cavalcante 
at https://github.com/verilator/verilator/issues/3072

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/101555] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
@ 2021-07-21 14:59 ` wsnyder at wsnyder dot org
  2021-07-22  8:41 ` [Bug tree-optimization/101555] [12 Regression] " rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: wsnyder at wsnyder dot org @ 2021-07-21 14:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

--- Comment #1 from Wilson Snyder <wsnyder at wsnyder dot org> ---
Created attachment 51187
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51187&action=edit
Examples and runtimes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/101555] [12 Regression] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
  2021-07-21 14:59 ` [Bug c++/101555] " wsnyder at wsnyder dot org
@ 2021-07-22  8:41 ` rguenth at gcc dot gnu.org
  2021-07-22  9:24 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-22  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
           Keywords|                            |needs-bisection
            Summary|Compile slowdown in tree    |[12 Regression] Compile
                   |PRE                         |slowdown in tree PRE
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
            Version|9.3.0                       |12.0
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-07-22

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed with g++-9 (SUSE Linux) 9.3.1 20200406 [revision
6db837a5288ee3ca5ec504fbd5a765817e556ac2]

 tree PRE                           : 645.41 ( 97%)   0.16 ( 34%) 645.60 ( 97%)
 177896 kB ( 15%)
 TOTAL                              : 665.14          0.47        665.73       
1154121 kB

with -fno-code-hoisting that get's down to

 tree PRE                           : 219.50 ( 93%)   0.04 ( 11%) 219.69 ( 93%)
  87813 kB ( 11%)
 TOTAL                              : 235.70          0.37        236.12       
 807306 kB

GCC 10.3.0 shows

 tree PRE                           : 267.62 ( 93%)   0.13 ( 32%) 267.69 ( 93%)
 140858 kB ( 13%)
 TOTAL                              : 288.92          0.41        289.37       
1122606 kB

GCC 11 does

 tree PRE                           : 251.41 ( 91%)   0.10 ( 23%) 251.71 ( 91%)
  137M ( 16%)
 TOTAL                              : 275.01          0.43        275.51       
  837M

Note that trunk regresses again:

 tree PRE                           : 312.12 ( 93%)   0.07 ( 15%) 312.46 ( 93%)
  137M ( 16%)
 TOTAL                              : 337.03          0.48        337.58       
  836M

bisection for that would be nice.

So part of this PR is a dup of PR97623, but there's still slowness of PRE
(as said elsewhere PRE is known quadratic).

I'm turning this PR into the GCC 12 regression that shows above.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/101555] [12 Regression] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
  2021-07-21 14:59 ` [Bug c++/101555] " wsnyder at wsnyder dot org
  2021-07-22  8:41 ` [Bug tree-optimization/101555] [12 Regression] " rguenth at gcc dot gnu.org
@ 2021-07-22  9:24 ` rguenth at gcc dot gnu.org
  2021-09-06 14:44 ` [Bug tree-optimization/101555] " rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-22  9:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, it behaves as "intended":

> grep  -i iteration Vara___024root__114.cpp.138t.pre
RPO iteration over 1050 blocks visited 1050 blocks in total discovering 1050
executable blocks iterating 1.0 times, a block was visited max. 1 times
Starting iteration 0
Starting iteration 1
Starting insert iteration 1
worklist iteration #1

so all "iterating" stuff proceeds optimally.

> grep 'Insert' Vara___024root__114.cpp.138t.pre | wc -l
398400

so that's a lot of inserts.  GCC 11 has the same.
Compared to GCC 11 PRE and VN didn't change in any significant way, so I'm
really wondering what this is about.  The IL into PRE is also quite similar.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/101555] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
                   ` (2 preceding siblings ...)
  2021-07-22  9:24 ` rguenth at gcc dot gnu.org
@ 2021-09-06 14:44 ` rguenth at gcc dot gnu.org
  2021-09-07  7:04 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-06 14:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[12 Regression] Compile     |Compile slowdown in tree
                   |slowdown in tree PRE        |PRE
   Target Milestone|12.0                        |---
           Keywords|needs-bisection             |

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Meanwhile GCC 12 seems to be as fast as GCC 11 (again).

Still PRE is slow, in this case the profile seems to be mostly alias walk
related.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/101555] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
                   ` (3 preceding siblings ...)
  2021-09-06 14:44 ` [Bug tree-optimization/101555] " rguenth at gcc dot gnu.org
@ 2021-09-07  7:04 ` rguenth at gcc dot gnu.org
  2021-09-07  9:14 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-07  7:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so most of the time is spent in ANTIC compute, specifically PHI translation
and there translate_vuse_through_block doing the (rate limited)
stmt_may_clobber_ref_p_1 query.

It's a bit fishy that we're doing these things "twice", in particular
we are supposed to translate ANTIC_IN to the predecessor ANTIC_OUT, thus
all VUSEs should be already top-of-block.  But then in compute_antic_aux
we're doing

  /* Prune expressions that are clobbered in block and thus become
     invalid if translated from ANTIC_OUT to ANTIC_IN.  */
  prune_clobbered_mems (ANTIC_OUT, block);

which is supposed to do the "translation" through the block but that does
not adjust VUSEs of expressions.  That uses value_dies_in_block_x, something
with a cache but also following a somewhat different logic with respect
to the VUSEs in the expression.

The main complication here is that the expression set we start from is
taken from the VN tables which may put "pre-translated" VUSEs in rather
than starting with the VUSEs as they were.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/101555] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
                   ` (4 preceding siblings ...)
  2021-09-07  7:04 ` rguenth at gcc dot gnu.org
@ 2021-09-07  9:14 ` cvs-commit at gcc dot gnu.org
  2021-09-07  9:22 ` rguenth at gcc dot gnu.org
  2022-07-22  9:48 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-07  9:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:f387ff788f63c1974479644edae728047f843ec4

commit r12-3378-gf387ff788f63c1974479644edae728047f843ec4
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Sep 7 10:35:42 2021 +0200

    tree-optimization/101555 - avoid redundant alias queries in PRE

    This avoids doing redundant work during PHI translation to invalidate
    mems when translating their corresponding VUSE through the blocks
    virtual PHI node.  All the invalidation work is already done by
    prune_clobbered_mems.

    This speeds up the compile of the testcase from 275s with PRE
    taking 91% of the compile-time down to 43s with PRE taking 16%
    of the compile-time.

    2021-09-07  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101555
            * tree-ssa-pre.c (translate_vuse_through_block): Do not
            perform an alias walk to determine the validity of the
            mem at the start of the block which is already guaranteed
            by means of prune_clobbered_mems.
            (phi_translate_1): Pass edge to translate_vuse_through_block.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/101555] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
                   ` (5 preceding siblings ...)
  2021-09-07  9:14 ` cvs-commit at gcc dot gnu.org
@ 2021-09-07  9:22 ` rguenth at gcc dot gnu.org
  2022-07-22  9:48 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-07  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
The committed change improves compile-time to less than 50s, in principle it
also applies to the GCC 11 and 10 branches where the related issue was fixed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/101555] Compile slowdown in tree PRE
  2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
                   ` (6 preceding siblings ...)
  2021-09-07  9:22 ` rguenth at gcc dot gnu.org
@ 2022-07-22  9:48 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-22  9:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101555

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed in GCC 12.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-07-22  9:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-21 14:56 [Bug c++/101555] New: Compile slowdown in tree PRE wsnyder at wsnyder dot org
2021-07-21 14:59 ` [Bug c++/101555] " wsnyder at wsnyder dot org
2021-07-22  8:41 ` [Bug tree-optimization/101555] [12 Regression] " rguenth at gcc dot gnu.org
2021-07-22  9:24 ` rguenth at gcc dot gnu.org
2021-09-06 14:44 ` [Bug tree-optimization/101555] " rguenth at gcc dot gnu.org
2021-09-07  7:04 ` rguenth at gcc dot gnu.org
2021-09-07  9:14 ` cvs-commit at gcc dot gnu.org
2021-09-07  9:22 ` rguenth at gcc dot gnu.org
2022-07-22  9:48 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).