public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
@ 2011-04-28 16:04 ` rguenth at gcc dot gnu.org
  2011-12-02 13:06 ` steven at gcc dot gnu.org
                   ` (48 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-04-28 16:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.6                       |---



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
  2011-04-28 16:04 ` [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo rguenth at gcc dot gnu.org
@ 2011-12-02 13:06 ` steven at gcc dot gnu.org
  2011-12-02 13:25 ` matz at gcc dot gnu.org
                   ` (47 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: steven at gcc dot gnu.org @ 2011-12-02 13:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #50 from Steven Bosscher <steven at gcc dot gnu.org> 2011-12-02 13:06:12 UTC ---
Michael, you were working on this. Have your patches completely resolved this
bug by now?



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
  2011-04-28 16:04 ` [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo rguenth at gcc dot gnu.org
  2011-12-02 13:06 ` steven at gcc dot gnu.org
@ 2011-12-02 13:25 ` matz at gcc dot gnu.org
  2012-05-27 23:24 ` steven at gcc dot gnu.org
                   ` (46 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: matz at gcc dot gnu.org @ 2011-12-02 13:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #51 from Michael Matz <matz at gcc dot gnu.org> 2011-12-02 13:23:57 UTC ---
Nope, right now I don't have more than a couple of hacks that try different
approaches.  I should dust them off for the next stage1.



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2011-12-02 13:25 ` matz at gcc dot gnu.org
@ 2012-05-27 23:24 ` steven at gcc dot gnu.org
  2012-05-29  7:53 ` Joost.VandeVondele at mat dot ethz.ch
                   ` (45 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: steven at gcc dot gnu.org @ 2012-05-27 23:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
                URL|                            |http://gcc.gnu.org/ml/gcc-p
                   |                            |atches/2012-05/msg01813.htm
                   |                            |l

--- Comment #52 from Steven Bosscher <steven at gcc dot gnu.org> 2012-05-27 23:15:19 UTC ---
Patch set posted by matz.



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2012-05-27 23:24 ` steven at gcc dot gnu.org
@ 2012-05-29  7:53 ` Joost.VandeVondele at mat dot ethz.ch
  2012-05-29 13:09 ` matz at gcc dot gnu.org
                   ` (44 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2012-05-29  7:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #53 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 2012-05-29 07:45:36 UTC ---
For the original testcase, trunk (gcc version 4.8.0 20120516
(experimental) [trunk revision 187595] (GCC)) gives very reasonable times (1min)
at -O0, but is pretty slow (20min) at -O2. At -O2, most of the time (80%) goes
to 'alias stmt walking' (826.02 s usr). Time reports below:

gfortran -ftime-report -ffree-line-length-512 -g -c testcase.f90

Execution times (seconds)
 phase setup             :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    243 kB ( 0%) ggc
 phase parsing           :   3.59 ( 6%) usr   0.05 ( 5%) sys   3.64 ( 6%) wall  47592 kB ( 7%) ggc
 phase cgraph            :  60.02 (94%) usr   0.90 (95%) sys  60.94 (94%) wall 649547 kB (93%) ggc
 phase generate          :  60.03 (94%) usr   0.90 (95%) sys  60.95 (94%) wall 649948 kB (93%) ggc
 garbage collection      :   1.04 ( 2%) usr   0.00 ( 0%) sys   1.04 ( 2%) wall      0 kB ( 0%) ggc
 callgraph construction  :   0.18 ( 0%) usr   0.01 ( 1%) sys   0.20 ( 0%) wall  15909 kB ( 2%) ggc
 callgraph optimization  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    201 kB ( 0%) ggc
 cfg construction        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      7 kB ( 0%) ggc
 cfg cleanup             :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      0 kB ( 0%) ggc
 CFG verifier            :   1.16 ( 2%) usr   0.00 ( 0%) sys   1.18 ( 2%) wall      0 kB ( 0%) ggc
 trivially dead code     :   0.34 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 1%) wall      0 kB ( 0%) ggc
 df scan insns           :   1.00 ( 2%) usr   0.25 (26%) sys   1.23 ( 2%) wall     11 kB ( 0%) ggc
 df live regs            :   0.46 ( 1%) usr   0.00 ( 0%) sys   0.49 ( 1%) wall      0 kB ( 0%) ggc
 df reg dead/unused notes:   0.45 ( 1%) usr   0.01 ( 1%) sys   0.47 ( 1%) wall  19416 kB ( 3%) ggc
 register information    :   0.20 ( 0%) usr   0.01 ( 1%) sys   0.19 ( 0%) wall      0 kB ( 0%) ggc
 alias analysis          :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall   8336 kB ( 1%) ggc
 rebuild jump labels     :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall      0 kB ( 0%) ggc
 parser (global)         :   3.59 ( 6%) usr   0.05 ( 5%) sys   3.64 ( 6%) wall  47587 kB ( 7%) ggc
 inline heuristics       :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     54 kB ( 0%) ggc
 tree gimplify           :   0.48 ( 1%) usr   0.01 ( 1%) sys   0.49 ( 1%) wall  26304 kB ( 4%) ggc
 tree eh                 :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     39 kB ( 0%) ggc
 tree CFG construction   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    190 kB ( 0%) ggc
 tree find ref. vars     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall   3263 kB ( 0%) ggc
 tree PHI insertion      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      0 kB ( 0%) ggc
 tree SSA rewrite        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     43 kB ( 0%) ggc
 tree SSA other          :   0.04 ( 0%) usr   0.02 ( 2%) sys   0.01 ( 0%) wall     18 kB ( 0%) ggc
 tree operand scan       :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.06 ( 0%) wall    118 kB ( 0%) ggc
 tree SSA verifier       :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      0 kB ( 0%) ggc
 tree STMT verifier      :   0.58 ( 1%) usr   0.06 ( 6%) sys   0.62 ( 1%) wall      0 kB ( 0%) ggc
 callgraph verifier      :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall      0 kB ( 0%) ggc
 out of ssa              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      0 kB ( 0%) ggc
 expand vars             :  21.72 (34%) usr   0.02 ( 2%) sys  21.74 (34%) wall  10086 kB ( 1%) ggc
 expand                  :   6.18 (10%) usr   0.15 (16%) sys   6.31 (10%) wall 251886 kB (36%) ggc
 post expand cleanups    :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall   1744 kB ( 0%) ggc
 integrated RA           :  10.75 (17%) usr   0.16 (17%) sys  10.87 (17%) wall 128826 kB (18%) ggc
 reload                  :   5.72 ( 9%) usr   0.15 (16%) sys   5.92 ( 9%) wall 123587 kB (18%) ggc
 thread pro- & epilogue  :   2.51 ( 4%) usr   0.00 ( 0%) sys   2.50 ( 4%) wall    198 kB ( 0%) ggc
 machine dep reorg       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      0 kB ( 0%) ggc
 final                   :   2.61 ( 4%) usr   0.04 ( 4%) sys   2.65 ( 4%) wall   7227 kB ( 1%) ggc
 symout                  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall   4914 kB ( 1%) ggc
 rest of compilation     :   2.36 ( 4%) usr   0.00 ( 0%) sys   2.35 ( 4%) wall  47578 kB ( 7%) ggc
 verify RTL sharing      :   1.02 ( 2%) usr   0.00 ( 0%) sys   1.04 ( 2%) wall      0 kB ( 0%) ggc
 TOTAL                 :  63.65             0.95            64.62             697784 kB


gfortran -ftime-report -ffree-line-length-512 -O2 -g -c testcase.f90

Execution times (seconds)
 phase setup             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    243 kB ( 0%) ggc
 phase parsing           :   3.59 ( 0%) usr   0.07 ( 5%) sys   3.66 ( 0%) wall  47596 kB ( 7%) ggc
 phase cgraph            : 1031.34 (100%) usr   1.36 (93%) sys 1032.77 (100%) wall 630545 kB (91%) ggc
 phase generate          : 1031.85 (100%) usr   1.39 (95%) sys 1033.30 (100%) wall 643621 kB (93%) ggc
 garbage collection      :   1.74 ( 0%) usr   0.01 ( 1%) sys   1.74 ( 0%) wall      0 kB ( 0%) ggc
 callgraph construction  :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall  15908 kB ( 2%) ggc
 callgraph optimization  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    201 kB ( 0%) ggc
 ipa cp                  :   0.40 ( 0%) usr   0.12 ( 8%) sys   0.59 ( 0%) wall   6174 kB ( 1%) ggc
 ipa reference           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      0 kB ( 0%) ggc
 ipa profile             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      0 kB ( 0%) ggc
 ipa pure const          :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall      0 kB ( 0%) ggc
 cfg construction        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall      0 kB ( 0%) ggc
 cfg cleanup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      1 kB ( 0%) ggc
 CFG verifier            :   1.84 ( 0%) usr   0.01 ( 1%) sys   1.83 ( 0%) wall      0 kB ( 0%) ggc
 trivially dead code     :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.47 ( 0%) wall      0 kB ( 0%) ggc
 df scan insns           :   0.75 ( 0%) usr   0.08 ( 5%) sys   0.83 ( 0%) wall     11 kB ( 0%) ggc
 df multiple defs        :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.49 ( 0%) wall      0 kB ( 0%) ggc
 df reaching defs        :   1.28 ( 0%) usr   0.02 ( 1%) sys   1.30 ( 0%) wall      0 kB ( 0%) ggc
 df live regs            :   2.22 ( 0%) usr   0.00 ( 0%) sys   2.22 ( 0%) wall      0 kB ( 0%) ggc
 df live&initialized regs:   1.18 ( 0%) usr   0.00 ( 0%) sys   1.19 ( 0%) wall      0 kB ( 0%) ggc
 df use-def / def-use chains:   0.46 ( 0%) usr   0.01 ( 1%) sys   0.46 ( 0%) wall      0 kB ( 0%) ggc
 df reg dead/unused notes:   2.26 ( 0%) usr   0.00 ( 0%) sys   2.29 ( 0%) wall  15547 kB ( 2%) ggc
 register information    :   1.16 ( 0%) usr   0.00 ( 0%) sys   1.15 ( 0%) wall      0 kB ( 0%) ggc
 alias analysis          :   0.60 ( 0%) usr   0.00 ( 0%) sys   0.59 ( 0%) wall  18809 kB ( 3%) ggc
 alias stmt walking      : 826.02 (80%) usr   0.21 (14%) sys 826.21 (80%) wall      0 kB ( 0%) ggc
 register scan           :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     10 kB ( 0%) ggc
 rebuild jump labels     :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall      0 kB ( 0%) ggc
 parser (global)         :   3.59 ( 0%) usr   0.07 ( 5%) sys   3.66 ( 0%) wall  47591 kB ( 7%) ggc
 inline heuristics       :   0.38 ( 0%) usr   0.01 ( 1%) sys   0.35 ( 0%) wall    161 kB ( 0%) ggc
 integration             :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall    285 kB ( 0%) ggc
 tree gimplify           :   0.48 ( 0%) usr   0.02 ( 1%) sys   0.49 ( 0%) wall  26299 kB ( 4%) ggc
 tree eh                 :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     39 kB ( 0%) ggc
 tree CFG cleanup        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      4 kB ( 0%) ggc
 tree tail merge         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      0 kB ( 0%) ggc
 tree VRP                :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall    394 kB ( 0%) ggc
 tree copy propagation   :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall    175 kB ( 0%) ggc
 tree find ref. vars     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall   3262 kB ( 0%) ggc
 tree PTA                :  26.17 ( 3%) usr   0.25 (17%) sys  26.41 ( 3%) wall  36473 kB ( 5%) ggc
 tree SSA rewrite        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall  15685 kB ( 2%) ggc
 tree SSA other          :   0.01 ( 0%) usr   0.02 ( 1%) sys   0.05 ( 0%) wall     18 kB ( 0%) ggc
 tree SSA incremental    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     14 kB ( 0%) ggc
 tree operand scan       :   0.09 ( 0%) usr   0.01 ( 1%) sys   0.12 ( 0%) wall   8291 kB ( 1%) ggc
 dominator optimization  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     46 kB ( 0%) ggc
 tree SRA                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      0 kB ( 0%) ggc
 tree CCP                :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall    107 kB ( 0%) ggc
 tree reassociation      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     23 kB ( 0%) ggc
 tree PRE                :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    180 kB ( 0%) ggc
 tree FRE                :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall    313 kB ( 0%) ggc
 tree linearize phis     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     12 kB ( 0%) ggc
 tree forward propagate  :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      0 kB ( 0%) ggc
 tree conservative DCE   :   0.19 ( 0%) usr   0.03 ( 2%) sys   0.25 ( 0%) wall      0 kB ( 0%) ggc
 tree aggressive DCE     :   0.11 ( 0%) usr   0.02 ( 1%) sys   0.10 ( 0%) wall    295 kB ( 0%) ggc
 tree buildin call DCE   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      0 kB ( 0%) ggc
 tree DSE                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      0 kB ( 0%) ggc
 tree rename SSA copies  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      0 kB ( 0%) ggc
 tree SSA verifier       :   3.58 ( 0%) usr   0.00 ( 0%) sys   3.64 ( 0%) wall      0 kB ( 0%) ggc
 tree STMT verifier      :   8.91 ( 1%) usr   0.03 ( 2%) sys   8.87 ( 1%) wall      0 kB ( 0%) ggc
 tree strlen optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      0 kB ( 0%) ggc
 callgraph verifier      :   0.48 ( 0%) usr   0.00 ( 0%) sys   0.46 ( 0%) wall      0 kB ( 0%) ggc
 dominance computation   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      0 kB ( 0%) ggc
 out of ssa              :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      0 kB ( 0%) ggc
 expand vars             :  96.25 ( 9%) usr   0.00 ( 0%) sys  96.26 ( 9%) wall   9993 kB ( 1%) ggc
 expand                  :   1.34 ( 0%) usr   0.06 ( 4%) sys   1.38 ( 0%) wall 119840 kB (17%) ggc
 post expand cleanups    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    109 kB ( 0%) ggc
 lower subreg            :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      0 kB ( 0%) ggc
 forward prop            :   3.47 ( 0%) usr   0.02 ( 1%) sys   3.48 ( 0%) wall   6588 kB ( 1%) ggc
 CSE                     :   0.99 ( 0%) usr   0.01 ( 1%) sys   1.01 ( 0%) wall  13067 kB ( 2%) ggc
 dead code elimination   :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall      0 kB ( 0%) ggc
 dead store elim1        :   0.65 ( 0%) usr   0.02 ( 1%) sys   0.68 ( 0%) wall   1535 kB ( 0%) ggc
 dead store elim2        :   2.52 ( 0%) usr   0.00 ( 0%) sys   2.53 ( 0%) wall   4393 kB ( 1%) ggc
 CPROP                   :   1.83 ( 0%) usr   0.00 ( 0%) sys   1.83 ( 0%) wall     10 kB ( 0%) ggc
 PRE                     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      1 kB ( 0%) ggc
 CSE 2                   :   0.78 ( 0%) usr   0.01 ( 1%) sys   0.78 ( 0%) wall   7470 kB ( 1%) ggc
 branch prediction       :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     41 kB ( 0%) ggc
 combiner                :   0.47 ( 0%) usr   0.01 ( 1%) sys   0.48 ( 0%) wall     12 kB ( 0%) ggc
 if-conversion           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     22 kB ( 0%) ggc
 regmove                 :   0.40 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall      0 kB ( 0%) ggc
 integrated RA           :  13.95 ( 1%) usr   0.12 ( 8%) sys  14.05 ( 1%) wall  67921 kB (10%) ggc
 reload                  :   3.03 ( 0%) usr   0.04 ( 3%) sys   3.04 ( 0%) wall  52942 kB ( 8%) ggc
 reload CSE regs         :   1.24 ( 0%) usr   0.00 ( 0%) sys   1.25 ( 0%) wall  10704 kB ( 2%) ggc
 ree                     :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall      1 kB ( 0%) ggc
 thread pro- & epilogue  :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall    256 kB ( 0%) ggc
 if-conversion 2         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     22 kB ( 0%) ggc
 combine stack adjustments:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      0 kB ( 0%) ggc
 peephole 2              :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall     15 kB ( 0%) ggc
 hard reg cprop          :   1.12 ( 0%) usr   0.00 ( 0%) sys   1.11 ( 0%) wall      3 kB ( 0%) ggc
 scheduling 2            :   3.33 ( 0%) usr   0.11 ( 8%) sys   3.44 ( 0%) wall  39009 kB ( 6%) ggc
 machine dep reorg       :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall      0 kB ( 0%) ggc
 reorder blocks          :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      2 kB ( 0%) ggc
 final                   :   0.87 ( 0%) usr   0.02 ( 1%) sys   0.92 ( 0%) wall  11744 kB ( 2%) ggc
 symout                  :   0.74 ( 0%) usr   0.06 ( 4%) sys   0.78 ( 0%) wall 111089 kB (16%) ggc
 variable tracking       :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall  29367 kB ( 4%) ggc
 var-tracking dataflow   :   3.52 ( 0%) usr   0.00 ( 0%) sys   3.52 ( 0%) wall      0 kB ( 0%) ggc
 var-tracking emit       :   4.20 ( 0%) usr   0.00 ( 0%) sys   4.20 ( 0%) wall   5616 kB ( 1%) ggc
 rest of compilation     :   1.16 ( 0%) usr   0.02 ( 1%) sys   1.27 ( 0%) wall   2733 kB ( 0%) ggc
 remove unused locals    :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall      0 kB ( 0%) ggc
 address taken           :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall      0 kB ( 0%) ggc
 unaccounted todo        :   0.62 ( 0%) usr   0.03 ( 2%) sys   0.66 ( 0%) wall      0 kB ( 0%) ggc
 verify RTL sharing      :   3.87 ( 0%) usr   0.00 ( 0%) sys   3.89 ( 0%) wall      0 kB ( 0%) ggc
 TOTAL                 :1035.46             1.46          1036.99             691461 kB



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2012-05-29  7:53 ` Joost.VandeVondele at mat dot ethz.ch
@ 2012-05-29 13:09 ` matz at gcc dot gnu.org
  2012-05-29 13:12 ` matz at gcc dot gnu.org
                   ` (43 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: matz at gcc dot gnu.org @ 2012-05-29 13:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #54 from Michael Matz <matz at gcc dot gnu.org> 2012-05-29 12:47:29 UTC ---
Yes, only the expand vars problem is attacked by my patch.  The alias walking
seems to come from an IPA analysis via ipa_compute_jump_functions.

detect_type_change runs the walker from every call statement; it is
called by compute_known_type_jump_func, from compute_scalar_jump_functions,
from ipa_compute_jump_functions_for_edge.  The latter is called for each
callee.  The yukawa_gn_full function has a great many calls, so this
seems to amount to a quadratic problem.
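
As a rough illustration of why one walk per call edge can add up to
quadratic work, the sketch below counts steps in a simplified model.  It
assumes, purely for illustration, that the walk started at a call may visit
every earlier call in the same function.  count_walk_steps is a made-up
name, not a GCC function, and the real walker follows memory dependencies
rather than a plain list.

```c
#include <stddef.h>

/* Hypothetical model, not GCC code: a function body contains n_calls call
   statements, and the analysis starts one walk per call.  Each walk is
   assumed to look at every call that precedes it.  Returns the total
   number of statements visited.  */
unsigned long long
count_walk_steps (size_t n_calls)
{
  unsigned long long steps = 0;

  /* One walk per call edge ...  */
  for (size_t call = 0; call < n_calls; call++)
    /* ... each of which may visit all earlier calls.  */
    for (size_t prev = 0; prev < call; prev++)
      steps++;

  return steps;
}
```

In this model n calls cost n*(n-1)/2 steps, so doubling the number of calls
roughly quadruples the work.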



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2012-05-29 13:09 ` matz at gcc dot gnu.org
@ 2012-05-29 13:12 ` matz at gcc dot gnu.org
  2012-05-29 15:08 ` hubicka at gcc dot gnu.org
                   ` (42 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: matz at gcc dot gnu.org @ 2012-05-29 13:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #55 from Michael Matz <matz at gcc dot gnu.org> 2012-05-29 13:08:52 UTC ---
FWIW the node->callees list in yukawa_gn_full has 25076 entries.



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2012-05-29 13:12 ` matz at gcc dot gnu.org
@ 2012-05-29 15:08 ` hubicka at gcc dot gnu.org
  2012-05-29 16:00 ` jamborm at gcc dot gnu.org
                   ` (41 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-05-29 15:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mjambor at suse dot cz

--- Comment #56 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-05-29 14:57:30 UTC ---
That function is Martin's. Martin, can you please take a look?



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2012-05-29 15:08 ` hubicka at gcc dot gnu.org
@ 2012-05-29 16:00 ` jamborm at gcc dot gnu.org
  2012-06-15 14:56 ` matz at gcc dot gnu.org
                   ` (40 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-05-29 16:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #57 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-05-29 15:08:39 UTC ---
(In reply to comment #56)
> That function is Martin's. Martin, can you please take a look?

I will, on Monday.



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2012-05-29 16:00 ` jamborm at gcc dot gnu.org
@ 2012-06-15 14:56 ` matz at gcc dot gnu.org
  2012-06-15 15:13 ` matz at gcc dot gnu.org
                   ` (39 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: matz at gcc dot gnu.org @ 2012-06-15 14:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #58 from Michael Matz <matz at gcc dot gnu.org> 2012-06-15 14:56:33 UTC ---
Author: matz
Date: Fri Jun 15 14:56:26 2012
New Revision: 188667

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188667
Log:
    PR middle-end/38474
    * cfgexpand.c (add_alias_set_conflicts): Remove.
    (expand_used_vars): Don't call it.
    (aggregate_contains_union_type): Remove.
    * function.c (n_temp_slots_in_use): New static data.
    (make_slot_available, assign_stack_temp_for_type): Update it.
    (init_temp_slots): Zero it.
    (remove_unused_temp_slot_addresses): Use it for quicker removal.
    (remove_unused_temp_slot_addresses_1): Use htab_clear_slot.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/function.c
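
The ChangeLog entry does not spell out how the new counter gives "quicker
removal".  One plausible reading, sketched below under that assumption, is
that a running count of in-use slots lets the cleanup empty the whole table
at once when nothing is live, instead of inspecting each entry.  The struct,
the fixed-size arrays and the name remove_unused are all hypothetical
stand-ins, not the code in function.c.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8   /* hypothetical, tiny for illustration */

/* Hypothetical stand-in for an address table keyed by temp slot.  */
struct addr_table
{
  int in_use[TABLE_SIZE];   /* nonzero if the entry's slot is still in use */
  int present[TABLE_SIZE];  /* nonzero if the entry exists in the table */
  size_t n_in_use;          /* running count, kept up to date by the caller */
};

/* Drop every entry whose slot is no longer in use.  Returns the number of
   entries that had to be inspected one by one.  */
size_t
remove_unused (struct addr_table *t)
{
  size_t inspected = 0;

  if (t->n_in_use == 0)
    {
      /* Fast path: nothing is live, so empty the whole table at once.  */
      memset (t->present, 0, sizeof t->present);
      return 0;
    }

  /* Slow path: check each entry individually.  */
  for (size_t i = 0; i < TABLE_SIZE; i++)
    {
      inspected++;
      if (t->present[i] && !t->in_use[i])
        t->present[i] = 0;
    }
  return inspected;
}
```

The counter only pays off if every place that marks a slot as used or free
also updates it, which matches the ChangeLog's note that several functions
"update it".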



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2012-06-15 14:56 ` matz at gcc dot gnu.org
@ 2012-06-15 15:13 ` matz at gcc dot gnu.org
  2012-06-15 15:26 ` Joost.VandeVondele at mat dot ethz.ch
                   ` (38 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: matz at gcc dot gnu.org @ 2012-06-15 15:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #59 from Michael Matz <matz at gcc dot gnu.org> 2012-06-15 15:12:59 UTC ---
There should be no compile performance problems in expand anymore.
The alias stmt walker as used from IPA remains a problem, though.



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2012-06-15 15:13 ` matz at gcc dot gnu.org
@ 2012-06-15 15:26 ` Joost.VandeVondele at mat dot ethz.ch
  2012-06-26 14:26 ` jamborm at gcc dot gnu.org
                   ` (37 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2012-06-15 15:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #60 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 2012-06-15 15:26:20 UTC ---
(In reply to comment #59)
> There should be no compile performance problems in expand anymore.
> The alias stmt walker as used from IPA remains a problem, though.

Thanks... expand is now indeed essentially gone from the timing report.

> gfortran -ftime-report -ffree-line-length-512 -g -c testcase.f90

Execution times (seconds)
 phase setup             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    243 kB ( 0%) ggc
 phase parsing           :   3.57 ( 9%) usr   0.06 ( 7%) sys   3.63 ( 9%) wall  47592 kB ( 7%) ggc
 phase cgraph            :  36.49 (91%) usr   0.86 (93%) sys  37.34 (91%) wall 647436 kB (93%) ggc
 phase generate          :  36.50 (91%) usr   0.86 (93%) sys  37.36 (91%) wall 647838 kB (93%) ggc
 garbage collection      :   1.04 ( 3%) usr   0.00 ( 0%) sys   1.04 ( 3%) wall      0 kB ( 0%) ggc
 callgraph construction  :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall  15909 kB ( 2%) ggc
 callgraph optimization  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    201 kB ( 0%) ggc
 cfg construction        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      7 kB ( 0%) ggc
 cfg cleanup             :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      0 kB ( 0%) ggc
 CFG verifier            :   1.26 ( 3%) usr   0.00 ( 0%) sys   1.25 ( 3%) wall      0 kB ( 0%) ggc
 trivially dead code     :   0.43 ( 1%) usr   0.00 ( 0%) sys   0.41 ( 1%) wall      0 kB ( 0%) ggc
 df scan insns           :   0.98 ( 2%) usr   0.24 (26%) sys   1.24 ( 3%) wall     11 kB ( 0%) ggc
 df live regs            :   0.58 ( 1%) usr   0.01 ( 1%) sys   0.57 ( 1%) wall      0 kB ( 0%) ggc
 df reg dead/unused notes:   0.43 ( 1%) usr   0.01 ( 1%) sys   0.45 ( 1%) wall  19416 kB ( 3%) ggc
 register information    :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall      0 kB ( 0%) ggc
 alias analysis          :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall   8337 kB ( 1%) ggc
 rebuild jump labels     :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%) wall      0 kB ( 0%) ggc
 parser (global)         :   3.57 ( 9%) usr   0.06 ( 7%) sys   3.63 ( 9%) wall  47587 kB ( 7%) ggc
 inline heuristics       :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     54 kB ( 0%) ggc
 tree gimplify           :   0.51 ( 1%) usr   0.01 ( 1%) sys   0.51 ( 1%) wall  26304 kB ( 4%) ggc
 tree eh                 :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     39 kB ( 0%) ggc
 tree CFG construction   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    190 kB ( 0%) ggc
 tree CFG cleanup        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      0 kB ( 0%) ggc
 tree find ref. vars     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall   3263 kB ( 0%) ggc
 tree PHI insertion      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      0 kB ( 0%) ggc
 tree SSA other          :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.02 ( 0%) wall     18 kB ( 0%) ggc
 tree operand scan       :   0.03 ( 0%) usr   0.03 ( 3%) sys   0.05 ( 0%) wall    118 kB ( 0%) ggc
 tree SSA verifier       :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall      0 kB ( 0%) ggc
 tree STMT verifier      :   0.56 ( 1%) usr   0.05 ( 5%) sys   0.63 ( 2%) wall      0 kB ( 0%) ggc
 callgraph verifier      :   0.25 ( 1%) usr   0.00 ( 0%) sys   0.27 ( 1%) wall      0 kB ( 0%) ggc
 out of ssa              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      0 kB ( 0%) ggc
 expand vars             :   1.02 ( 3%) usr   0.02 ( 2%) sys   1.03 ( 3%) wall  10086 kB ( 1%) ggc
 expand                  :   2.03 ( 5%) usr   0.12 (13%) sys   2.18 ( 5%) wall 249774 kB (36%) ggc
 post expand cleanups    :   0.14 ( 0%) usr   0.01 ( 1%) sys   0.14 ( 0%) wall   1744 kB ( 0%) ggc
 integrated RA           :  10.75 (27%) usr   0.15 (16%) sys  10.93 (27%) wall 128826 kB (19%) ggc
 reload                  :   5.56 (14%) usr   0.16 (17%) sys   5.77 (14%) wall 123587 kB (18%) ggc
 thread pro- & epilogue  :   2.65 ( 7%) usr   0.00 ( 0%) sys   2.64 ( 6%) wall    198 kB ( 0%) ggc
 machine dep reorg       :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      0 kB ( 0%) ggc
 final                   :   3.11 ( 8%) usr   0.04 ( 4%) sys   3.15 ( 8%) wall   7227 kB ( 1%) ggc
 symout                  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall   4914 kB ( 1%) ggc
 rest of compilation     :   2.46 ( 6%) usr   0.00 ( 0%) sys   2.39 ( 6%) wall  47578 kB ( 7%) ggc
 unaccounted todo        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      0 kB ( 0%) ggc
 verify RTL sharing      :   1.49 ( 4%) usr   0.00 ( 0%) sys   1.48 ( 4%) wall      0 kB ( 0%) ggc
 TOTAL                 :  40.09             0.92            41.02             695674 kB



* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2012-06-15 15:26 ` Joost.VandeVondele at mat dot ethz.ch
@ 2012-06-26 14:26 ` jamborm at gcc dot gnu.org
  2012-06-26 14:45 ` matz at gcc dot gnu.org
                   ` (36 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-06-26 14:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #61 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-06-26 14:26:34 UTC ---
(In reply to comment #57)
> 
> I will, on Monday.

And by Monday I obviously meant yesterday ;-)

Anyway, on the machine where we debugged this, compilation at -O3
took over 16 seconds which dropped to about 13.5 seconds when I also
added -fno-devirtualize (-ftime-report showed that alias stmt walking
dropped from 82% to 75%).  This is mainly due to calls to
detect_type_change from compute_known_type_jump_func; there are 36454
of them and all are of course completely pointless because we do not
devirtualize in Fortran.

Looking into the code, it is apparent that I even attempted to avoid
such situations but somehow was not paying enough attention.  The
rather obvious patch below fixes that.  With it, the compile time at
-O3 drops to 13.5 without any additional options (~50 calls to
detect_type_change_ssa and detect_type_change from other places remain,
but those are not a big problem here; they are not so easy to get rid
of, and I hope to eventually remove the type detection machinery from
IPA altogether, so I'll keep those for later).

I'll bootstrap and test the patch and post it to the mailing list
soon.

Index: gcc/ipa-prop.c
===================================================================
--- gcc/ipa-prop.c      (revision 188931)
+++ gcc/ipa-prop.c      (working copy)
@@ -912,8 +912,8 @@ compute_known_type_jump_func (tree op, s
       || is_global_var (base))
     return;

-  if (detect_type_change (op, base, call, jfunc, offset)
-      || !TYPE_BINFO (TREE_TYPE (base)))
+  if (!TYPE_BINFO (TREE_TYPE (base))
+      || detect_type_change (op, base, call, jfunc, offset))
     return;
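
The reordering works because || evaluates left to right and stops at
the first true operand.  A minimal sketch of that effect follows; the
two helper functions are hypothetical stand-ins for the BINFO test and
for detect_type_change, not GCC code, and the counters exist only to
make the difference visible.

```cpp
#include <cassert>

// Call counters so the effect of the reordering can be observed.
struct Counters {
  int cheap_calls = 0;
  int expensive_calls = 0;
};

// Hypothetical stand-in for !TYPE_BINFO (TREE_TYPE (base)): a cheap test.
bool lacks_binfo(bool has_binfo, Counters &c) {
  ++c.cheap_calls;
  return !has_binfo;
}

// Hypothetical stand-in for detect_type_change (...): imagine a costly walk.
bool type_may_change(Counters &c) {
  ++c.expensive_calls;
  return false;
}

// Old order: the expensive check always runs first.
bool bail_out_old(bool has_binfo, Counters &c) {
  return type_may_change(c) || lacks_binfo(has_binfo, c);
}

// New order: || short-circuits, so the expensive check is skipped
// whenever the type has no BINFO.
bool bail_out_new(bool has_binfo, Counters &c) {
  return lacks_binfo(has_binfo, c) || type_may_change(c);
}

// Runs n queries on types without a BINFO and returns how many times
// the expensive check was invoked.
int expensive_calls_for(int n, bool use_new_order) {
  Counters c;
  for (int i = 0; i < n; ++i) {
    bool bailed = use_new_order ? bail_out_new(false, c)
                                : bail_out_old(false, c);
    assert(bailed);  // both orders reach the same decision
  }
  return c.expensive_calls;
}
```

In this model, expensive_calls_for (36454, false) performs one
expensive call per query, while expensive_calls_for (36454, true)
performs none, and the decision taken is the same in both orders.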


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2012-06-26 14:26 ` jamborm at gcc dot gnu.org
@ 2012-06-26 14:45 ` matz at gcc dot gnu.org
  2012-06-26 14:58 ` rguenth at gcc dot gnu.org
                   ` (35 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: matz at gcc dot gnu.org @ 2012-06-26 14:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #62 from Michael Matz <matz at gcc dot gnu.org> 2012-06-26 14:44:58 UTC ---
(In reply to comment #61)
> (In reply to comment #57)
> 
> Anyway, on the machine where we debugged this, compilation at -O3
> took over 16 seconds which dropped to about 13.5 seconds when I also

What?  Must be a future machine.  On everything I have access to, the reduced
testcase (6309 lines) takes about 800 to 1000 seconds.  Do you build without
any checking?

In any case, the proposed patch does reduce the time to basically nothing for
the alias tree walker, so: thanks :)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2012-06-26 14:45 ` matz at gcc dot gnu.org
@ 2012-06-26 14:58 ` rguenth at gcc dot gnu.org
  2012-06-26 15:01 ` jamborm at gcc dot gnu.org
                   ` (34 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-06-26 14:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #63 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-26 14:58:31 UTC ---
(In reply to comment #61)
> (In reply to comment #57)
> > 
> > I will, on Monday.
> 
> And by Monday I obviously meant yesterday ;-)
> 
> Anyway, on the machine where we debugged this, compilation at -O3
> took over 16 seconds which dropped to about 13.5 seconds when I also
> added -fno-devirtualize (-ftime-report showed that alias stmt walking
> dropped from 82% to 75%).  This is mainly due to calls to
> detect_type_change from compute_known_type_jump_func, there are 36454
> of them and all are of course completely pointless because we do not
> devirtualize in Fortran.
> 
> Looking into the code, it is apparent that I even attempted to avoid
> such situations but somehow was not paying enough attention.  The
> rather obvious patch below fixes that.  With it, the compile time at
> -O3 drops to 13.5 without any additional options (~50 calls to
> detect_type_change_ssa and detect_type_change from other places remain
> but those are not a big problem here, they are not so easy to get rid
> of and I hope to eventually remove the type detection machinery from
> IPA altogether so I'll keep those for later).
> 
> I'll bootstrap and test the patch and post it to the mailing list
> soon.
> 
> Index: gcc/ipa-prop.c
> ===================================================================
> --- gcc/ipa-prop.c      (revision 188931)
> +++ gcc/ipa-prop.c      (working copy)
> @@ -912,8 +912,8 @@ compute_known_type_jump_func (tree op, s
>        || is_global_var (base))
>      return;
> 
> -  if (detect_type_change (op, base, call, jfunc, offset)
> -      || !TYPE_BINFO (TREE_TYPE (base)))
> +  if (!TYPE_BINFO (TREE_TYPE (base))
> +      || detect_type_change (op, base, call, jfunc, offset))
>      return;

That change qualifies for a backport to all branches it applies to ...


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2012-06-26 14:58 ` rguenth at gcc dot gnu.org
@ 2012-06-26 15:01 ` jamborm at gcc dot gnu.org
  2012-06-29 14:34 ` jamborm at gcc dot gnu.org
                   ` (33 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-06-26 15:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #64 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-06-26 15:01:28 UTC ---
(In reply to comment #62)
> (In reply to comment #61)
> > (In reply to comment #57)
> > 
> > Anyway, on the machine where we debugged this, compilation at -O3
> > took over 16 seconds which dropped to about 13.5 seconds when I also
> 
> What?  Must be a future machine.  On everything I have access to the reduced
> testcase (6309 lines) takes about 800 to 1000 seconds.  Do you build without
> any checking?

Minutes! Of course I meant minutes; the drop is thus from ~1000
seconds to ~810 seconds.  I forgot I was using bash time instead of
/usr/bin/time -f%U, which I was regularly using only a few days ago.

> 
> In any case, the proposed patch does reduce the time to basically nothing for
> the alias tree walker, so: thanks :)

I've experimentally disabled the walker in
is_parm_modified_before_stmt and am now waiting for results but I
guess it won't have any measurable impact.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2012-06-26 15:01 ` jamborm at gcc dot gnu.org
@ 2012-06-29 14:34 ` jamborm at gcc dot gnu.org
  2012-07-02 15:28 ` jamborm at gcc dot gnu.org
                   ` (32 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-06-29 14:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #65 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-06-29 14:34:34 UTC ---
I have posted the patch to the mailing list:

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01928.html

along with an equivalent one for the 4.6 branch:

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01929.html


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2012-06-29 14:34 ` jamborm at gcc dot gnu.org
@ 2012-07-02 15:28 ` jamborm at gcc dot gnu.org
  2012-07-02 15:44 ` jamborm at gcc dot gnu.org
                   ` (31 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-07-02 15:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #66 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-07-02 15:28:17 UTC ---
Author: jamborm
Date: Mon Jul  2 15:28:11 2012
New Revision: 189163

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189163
Log:
2012-07-02  Martin Jambor  <mjambor@suse.cz>

    PR middle-end/38474
    * ipa-prop.c (compute_known_type_jump_func): Put BINFO check before a
    dynamic type change check.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-prop.c


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2012-07-02 15:28 ` jamborm at gcc dot gnu.org
@ 2012-07-02 15:44 ` jamborm at gcc dot gnu.org
  2012-07-02 15:53 ` jamborm at gcc dot gnu.org
                   ` (30 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-07-02 15:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #67 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-07-02 15:44:01 UTC ---
Author: jamborm
Date: Mon Jul  2 15:43:56 2012
New Revision: 189164

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189164
Log:
2012-07-02  Martin Jambor  <mjambor@suse.cz>

    PR middle-end/38474
    * ipa-prop.c (compute_known_type_jump_func): Put BINFO check before a
    dynamic type change check.


Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/ipa-prop.c


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2012-07-02 15:44 ` jamborm at gcc dot gnu.org
@ 2012-07-02 15:53 ` jamborm at gcc dot gnu.org
  2012-08-28  8:25 ` steven at gcc dot gnu.org
                   ` (29 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-07-02 15:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #68 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-07-02 15:53:29 UTC ---
Author: jamborm
Date: Mon Jul  2 15:53:21 2012
New Revision: 189165

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189165
Log:
2012-07-02  Martin Jambor  <mjambor@suse.cz>

    PR middle-end/38474
    * ipa-prop.c (compute_known_type_jump_func): Check for a BINFO before
    checking for a dynamic type change.


Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/ipa-prop.c


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2012-07-02 15:53 ` jamborm at gcc dot gnu.org
@ 2012-08-28  8:25 ` steven at gcc dot gnu.org
  2012-08-28 11:28 ` Joost.VandeVondele at mat dot ethz.ch
                   ` (28 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-28  8:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |WAITING

--- Comment #69 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-28 08:25:06 UTC ---
Is there still a problem here?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2012-08-28  8:25 ` steven at gcc dot gnu.org
@ 2012-08-28 11:28 ` Joost.VandeVondele at mat dot ethz.ch
  2012-08-28 14:55 ` Joost.VandeVondele at mat dot ethz.ch
                   ` (27 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2012-08-28 11:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #70 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 2012-08-28 11:28:06 UTC ---
(In reply to comment #69)
> Is there still a problem here?

For current trunk and the original testcase, timings are reasonable at -O0,
-O1 and -O2, but very long at -O3 (>60min):

report.O0.txt: TOTAL                 :  38.78             0.89            39.67   691166 kB
report.O1.txt: TOTAL                 :  70.04             1.13            71.22   634523 kB
report.O2.txt: TOTAL                 : 204.51             1.16           205.71   691522 kB

The biggest consumers are:

-O0:
====
integrated RA           :  10.36
reload                  :   5.16

-O1:
====
tree PTA                :   7.77
integrated RA           :  13.36

-O2:
====
expand vars             :  83.15
tree PTA                :  35.04

-O3: (also needs about 4 GB of memory)
====
??? not yet finished (>60min)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2012-08-28 11:28 ` Joost.VandeVondele at mat dot ethz.ch
@ 2012-08-28 14:55 ` Joost.VandeVondele at mat dot ethz.ch
  2012-08-28 15:07 ` steven at gcc dot gnu.org
                   ` (26 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: Joost.VandeVondele at mat dot ethz.ch @ 2012-08-28 14:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #71 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 2012-08-28 14:54:54 UTC ---
The -O3 compile is still running 3 hours later and needs >20 GB of RAM. The
issue now seems to be in variable_tracking_main:

#0  0x0000000000b7b8ce in dataflow_set_preserve_mem_locs(void**, void*) ()
#1  0x0000000000e76168 in htab_traverse_noresize ()
#2  0x0000000000b770e0 in dataflow_set_clear_at_call(dataflow_set_def*) ()
#3  0x0000000000b7c613 in vt_emit_notes() ()
#4  0x0000000000b847ea in variable_tracking_main() ()
#5  0x00000000008e8acf in execute_one_pass(opt_pass*) ()


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2012-08-28 14:55 ` Joost.VandeVondele at mat dot ethz.ch
@ 2012-08-28 15:07 ` steven at gcc dot gnu.org
  2013-03-06 11:01 ` [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3 steven at gcc dot gnu.org
                   ` (25 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-28 15:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aoliva at gcc dot gnu.org

--- Comment #72 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-28 15:06:56 UTC ---
Thus, another var-tracking issue.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2012-08-28 15:07 ` steven at gcc dot gnu.org
@ 2013-03-06 11:01 ` steven at gcc dot gnu.org
  2013-03-07 10:32 ` rguenth at gcc dot gnu.org
                   ` (24 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: steven at gcc dot gnu.org @ 2013-03-06 11:01 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                URL|http://gcc.gnu.org/ml/gcc-p |
                   |atches/2012-05/msg01813.htm |
                   |l                           |
                 CC|hubicka at gcc dot gnu.org, |
                   |jamborm at gcc dot gnu.org, |
                   |matz at gcc dot gnu.org,    |
                   |vmakarov at redhat dot com  |
         AssignedTo|matz at gcc dot gnu.org     |unassigned at gcc dot
                   |                            |gnu.org
            Summary|slow compilation at -O0 due |compile time explosion in
                   |to expand's temp slot goo   |dataflow_set_preserve_mem_l
                   |                            |ocs at -O3
      Known to fail|                            |4.8.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2013-03-06 11:01 ` [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3 steven at gcc dot gnu.org
@ 2013-03-07 10:32 ` rguenth at gcc dot gnu.org
  2013-03-07 14:55 ` rguenth at gcc dot gnu.org
                   ` (23 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-07 10:32 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
             Blocks|                            |47344
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #73 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-07 10:31:53 UTC ---
On trunk with the reduced testcase I now see PTA taking 90% of compile-time ...

Argh.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (25 preceding siblings ...)
  2013-03-07 10:32 ` rguenth at gcc dot gnu.org
@ 2013-03-07 14:55 ` rguenth at gcc dot gnu.org
  2013-12-06 13:43 ` rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-03-07 14:55 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #74 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-07 14:55:27 UTC ---
(In reply to comment #73)
> On trunk with the reduced testcase I now see PTA taking 90% of compile-time ...
> 
> Argh.

I can speed it up by

@@ -1631,7 +1619,20 @@ do_sd_constraint (constraint_graph_t gra
            flag |= bitmap_set_bit (sol, escaped_id);
          else if (v->may_have_pointers
                   && add_graph_edge (graph, lhs, t))
-           flag |= bitmap_ior_into (sol, get_varinfo (t)->solution);
+           {
+             /* For transitive closures, x = *(x + UNKNOWN), delay
+                propagation of the solution across the added edges
+                by marking sources as changed.  */
+             if (lhs == c->rhs.var)
+               {
+                 bitmap_set_bit (changed, t);
+                 flag |= true;
+               }
+             /* Else speedup solving by doing that here to save
+                iterations.  */
+             else
+               flag |= bitmap_ior_into (sol, get_varinfo (t)->solution);
+           }

          /* If the variable is not exactly at the requested offset
             we have to include the next one.  */

as repeatedly walking all of 'sol' for 'sol = *(sol + UNKNOWN)' when
adding the solution of one of delta(sol)'s members is slow.  Using
a temporary bitmap to collect all changes doesn't speed it up though,
so the only effect of the above is that the forwarding is delayed until
the next solver iteration (where eventually we discover and eliminate
more indirect cycles).


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (26 preceding siblings ...)
  2013-03-07 14:55 ` rguenth at gcc dot gnu.org
@ 2013-12-06 13:43 ` rguenth at gcc dot gnu.org
  2013-12-06 14:20 ` rguenth at gcc dot gnu.org
                   ` (21 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-12-06 13:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #75 from Richard Biener <rguenth at gcc dot gnu.org> ---
On trunk with the reduced testcase and -O2 (no -g):

 ipa inlining heuristics :   9.85 ( 5%) usr   0.00 ( 0%) sys   9.93 ( 5%) wall   1448 kB ( 0%) ggc
 tree PTA                : 161.26 (78%) usr   0.30 (45%) sys 162.00 (78%) wall  42484 kB ( 8%) ggc
 expand vars             :   3.06 ( 1%) usr   0.01 ( 2%) sys   3.06 ( 1%) wall  16074 kB ( 3%) ggc
 integrated RA           :   4.46 ( 2%) usr   0.11 (17%) sys   4.56 ( 2%) wall  87144 kB (17%) ggc
 TOTAL                 : 205.49             0.66           206.80             513245 kB

-O3 is in the same ballpark.  I'll look at this again.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (27 preceding siblings ...)
  2013-12-06 13:43 ` rguenth at gcc dot gnu.org
@ 2013-12-06 14:20 ` rguenth at gcc dot gnu.org
  2013-12-09 15:13 ` rguenth at gcc dot gnu.org
                   ` (20 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-12-06 14:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #76 from Richard Biener <rguenth at gcc dot gnu.org> ---
There are a lot of calls with fnspec, almost all constraints look like

D.12770.0+16 = allalltmp
D.12770.64+128 = allalltmp
D.12770.192+64 = allalltmp
callarg = &READONLY
callarg = *callarg
callarg = callarg + UNKNOWN
CALLUSED = callarg
ESCAPED = &NONLOCAL
allalltmp = CALLUSED
allalltmp = NONLOCAL
D.12771.0+16 = allalltmp
D.12771.64+128 = allalltmp
D.12771.192+64 = allalltmp
callarg = &D.12770.0+16
callarg = *callarg
callarg = callarg + UNKNOWN

They get unified pretty quickly though.

Still, we end up with many very large sets that include ESCAPED but also some
members of ESCAPED explicitly (that's redundant).  I have some idea on how
to mitigate this, which eventually should speed things up (or at least reduce
memory usage).

Like

Index: tree-ssa-structalias.c
===================================================================
--- tree-ssa-structalias.c      (revision 205739)
+++ tree-ssa-structalias.c      (working copy)
@@ -1600,6 +1600,14 @@ do_sd_constraint (constraint_graph_t gra
       goto done;
     }

+  /* If the solution of Y contains escaped then filter all bits from
+     that from the delta to reduce work.  */
+  if (bitmap_bit_p (delta, escaped_id))
+    {
+      bitmap_and_compl_into (delta, get_varinfo (find (escaped_id))->solution);
+      flag |= bitmap_set_bit (sol, escaped_id);
+    }
+  
   /* If we do not know at with offset the rhs is dereferenced compute
      the reachability set of DELTA, conservatively assuming it is
      dereferenced at all valid offsets.  */

will check next week.
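
To spell out the idea behind the filter: once delta contains ESCAPED
itself, every member of ESCAPED's solution that also sits in delta is
redundant and can be dropped before the expensive expansion.  Below is
a rough sketch with std::set standing in for GCC's bitmaps; the names
VarSet and filter_escaped are made up for illustration and the real
solver works on varinfo ids and representative lookups that are not
modelled here.

```cpp
#include <cassert>
#include <set>

using VarSet = std::set<int>;  // stand-in for a GCC bitmap of variable ids

// If delta contains escaped_id, remove from delta every variable that is
// already in ESCAPED's solution (the analogue of bitmap_and_compl_into)
// and make sure sol contains escaped_id.  Returns true if sol changed.
bool filter_escaped(VarSet &delta, VarSet &sol,
                    const VarSet &escaped_solution, int escaped_id) {
  if (delta.count(escaped_id) == 0)
    return false;
  for (int v : escaped_solution)
    delta.erase(v);
  return sol.insert(escaped_id).second;
}

// Small worked example with made-up variable ids.
void example() {
  const int escaped_id = 1;
  VarSet escaped_solution = {5, 6};
  VarSet delta = {1, 5, 6, 7};
  VarSet sol;
  bool changed = filter_escaped(delta, sol, escaped_solution, escaped_id);
  assert(changed);
  assert((delta == VarSet{1, 7}));  // 5 and 6 are covered by ESCAPED
  assert((sol == VarSet{1}));
}
```

The smaller delta is what saves work: later steps only have to process
the variables that ESCAPED does not already cover.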


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (28 preceding siblings ...)
  2013-12-06 14:20 ` rguenth at gcc dot gnu.org
@ 2013-12-09 15:13 ` rguenth at gcc dot gnu.org
  2013-12-10 12:31 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-12-09 15:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #77 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Mon Dec  9 15:13:07 2013
New Revision: 205808

URL: http://gcc.gnu.org/viewcvs?rev=205808&root=gcc&view=rev
Log:
2013-12-09  Richard Biener  <rguenther@suse.de>

    PR middle-end/38474
    * tree-ssa-structalias.c (set_union_with_increment): Remove
    unreachable code.
    (do_complex_constraint): Call set_union_with_increment with
    the solution delta, not the full solution.
    (make_transitive_closure_constraints): Merge the two
    constraints.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-structalias.c


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (29 preceding siblings ...)
  2013-12-09 15:13 ` rguenth at gcc dot gnu.org
@ 2013-12-10 12:31 ` rguenth at gcc dot gnu.org
  2021-02-10 14:52 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-12-10 12:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #78 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Tue Dec 10 12:31:39 2013
New Revision: 205857

URL: http://gcc.gnu.org/viewcvs?rev=205857&root=gcc&view=rev
Log:
2013-12-10  Richard Biener  <rguenther@suse.de>

    PR middle-end/38474
    * tree-ssa-structalias.c (solution_set_expand): Expand into
    a different possibly cached bitmap and return the result.
    (set_union_with_increment): Pass in a shared expanded bitmap
    and adjust.
    (do_sd_constraint): Likewise.
    (do_ds_constraint): Likewise.
    (do_complex_constraint): Likewise.
    (solve_graph): Manage the shared expanded bitmap.

    * gcc.dg/ipa/ipa-pta-14.c: Un-XFAIL.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
    trunk/gcc/tree-ssa-structalias.c


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (30 preceding siblings ...)
  2013-12-10 12:31 ` rguenth at gcc dot gnu.org
@ 2021-02-10 14:52 ` rguenth at gcc dot gnu.org
  2021-02-10 15:03 ` rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 14:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #83 from Richard Biener <rguenth at gcc dot gnu.org> ---
Meh.  On trunk (GCC 11) we now have for the reduced testcase

> ./f951 -quiet testcase_reduced.f90 -ffree-line-length-512 -ftime-report -O3

Time variable                                   usr           sys          wall           GGC
...
 callgraph ipa passes               :  28.09 (  8%)   0.23 ( 38%)  28.33 (  8%)    68M ( 12%)
 ipa inlining heuristics            :   5.13 (  1%)   0.01 (  2%)   5.13 (  1%)    14M (  3%)
 alias stmt walking                 :   7.03 (  2%)   0.09 ( 15%)   7.15 (  2%)   277k (  0%)
 tree PTA                           :  26.20 (  7%)   0.17 ( 28%)  26.39 (  7%)    25M (  5%)
 store merging                      : 308.60 ( 84%)   0.01 (  2%) 308.70 ( 84%)  3858k (  1%)
 TOTAL                              : 365.68          0.61        366.42          557M

so store-merging goes bollocks.  I will try to dig into it a bit but I'm not
very familiar with the code.  GCC 10 behaves similarly here but not as badly:

 store merging                      : 232.10 ( 82%)   0.02 (  4%) 232.19 ( 82%)    3837 kB (  1%)
 TOTAL                              : 283.51          0.45        284.05        582957 kB

while GCC 9 is sane:

 store merging                      :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)    2700 kB (  1%)
 TOTAL                              :  88.59          0.70         89.34        521364 kB

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (31 preceding siblings ...)
  2021-02-10 14:52 ` rguenth at gcc dot gnu.org
@ 2021-02-10 15:03 ` rguenth at gcc dot gnu.org
  2021-02-10 15:46 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 15:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #84 from Richard Biener <rguenth at gcc dot gnu.org> ---
So it's the usual (quadratic) culprit:

Samples: 1M of event 'cycles:u', Event count (approx.): 1675893461671
Overhead       Samples  Command  Shared Object     Symbol
  20.61%        316521  f951     f951              [.] get_ref_base_and_extent
  14.42%        221221  f951     f951              [.] (anonymous namespace)::pass_store_merging::terminate_all_aliasing_chains
   5.77%         88586  f951     f951              [.] special_function_p
I'll see whether I can do some surgery.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (32 preceding siblings ...)
  2021-02-10 15:03 ` rguenth at gcc dot gnu.org
@ 2021-02-10 15:46 ` rguenth at gcc dot gnu.org
  2021-02-10 15:47 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 15:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #85 from Richard Biener <rguenth at gcc dot gnu.org> ---
Starting new chain with statement:
D.31414 ={v} {CLOBBER};
The base object is:
&D.31414
Starting new chain with statement:
D.31415 ={v} {CLOBBER};
The base object is:
&D.31415
...

but those are all the last use of the base object, so they just add up and
are never invalidated, lengthening the m_stores_head list and thus
making terminate_all_aliasing_chains more expensive.

Jakub, were the clobbers ever supposed to _start_ a chain?

With

diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index f0f4a068de5..fa9a092d544 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -5175,6 +5175,9 @@ pass_store_merging::process_store (gimple *stmt)

   /* Store aliases any existing chain?  */
   ret |= terminate_all_aliasing_chains (NULL, stmt);
+  /* Do not start a new chain from a CLOBBER.  */
+  if (gimple_clobber_p (stmt))
+    return ret;
   /* Start a new chain.  */
   class imm_store_chain_info *new_chain
     = new imm_store_chain_info (m_stores_head, base_addr);

compile-time gets down to

 store merging                      :   1.18 (  2%)   0.00 (  0%)   1.18 (  2%)  3858k (  1%)
 TOTAL                              :  59.84          0.57         60.43          557M

I'm checking if it has any testsuite fallout.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (33 preceding siblings ...)
  2021-02-10 15:46 ` rguenth at gcc dot gnu.org
@ 2021-02-10 15:47 ` rguenth at gcc dot gnu.org
  2021-02-10 15:51 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 15:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #86 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so clobber handling was added as a fix for PR92038

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (34 preceding siblings ...)
  2021-02-10 15:47 ` rguenth at gcc dot gnu.org
@ 2021-02-10 15:51 ` jakub at gcc dot gnu.org
  2021-02-10 15:55 ` rguenther at suse dot de
                   ` (13 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-02-10 15:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #87 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
At least for PR92038 it is important to see CLOBBERs in the chain, including
the first position in there.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (35 preceding siblings ...)
  2021-02-10 15:51 ` jakub at gcc dot gnu.org
@ 2021-02-10 15:55 ` rguenther at suse dot de
  2021-02-10 16:02 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenther at suse dot de @ 2021-02-10 15:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #88 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 10 Feb 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
> 
> --- Comment #87 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> At least for PR92038 it is important to see CLOBBERs in the chain, including
> the first position in there.

Hmm, OK.  I'll look closer tomorrow but can you try to explain why
it's ever important at the first position?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (36 preceding siblings ...)
  2021-02-10 15:55 ` rguenther at suse dot de
@ 2021-02-10 16:02 ` rguenth at gcc dot gnu.org
  2021-02-10 16:06 ` jakub at gcc dot gnu.org
                   ` (11 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 16:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #89 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fallout includes

FAIL: g++.dg/opt/store-merging-1.C  scan-tree-dump store-merging "New sequence of [12] stores to replace old one of 2 stores"

which shows

Starting new chain with statement:
s ={v} {CLOBBER};
The base object is:
&s
Recording immediate store from stmt:
s.a = 0;
Recording immediate store from stmt:
s.b = 0;
stmt causes chain termination:
foo (s);

and the CLOBBER allows us to use zeros for padding:

Store 0:
bitsize:64 bitpos:0 val:{CLOBBER}
Store 1:
bitsize:32 bitpos:0 val:0
Store 2:
bitsize:8 bitpos:32 val:0
After writing {CLOBBER} of size 64 at position 0
  the merged value contains 00 00 00 00 00 00 00 00
  the merged mask contains  00 00 00 00 00 00 00 00
After writing 0 of size 32 at position 0
  the merged value contains 00 00 00 00 00 00 00 00
  the merged mask contains  00 00 00 00 00 00 00 00
After writing 0 of size 8 at position 32
  the merged value contains 00 00 00 00 00 00 00 00
  the merged mask contains  00 00 00 00 00 00 00 00
Coalescing successful!
Merged into 1 stores
New sequence of 1 stores to replace old one of 2 stores
# .MEM_6 = VDEF <.MEM_5>
MEM <unsigned long> [(void *)&s] = 0;
Merging successful!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (37 preceding siblings ...)
  2021-02-10 16:02 ` rguenth at gcc dot gnu.org
@ 2021-02-10 16:06 ` jakub at gcc dot gnu.org
  2021-02-10 16:06 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-02-10 16:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #90 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Because it says that the whole range is uninitialized, so the store merging
code doesn't need to care about pre-existing content in any gaps between the
stored values.  So say when the whole var is clobbered and then the code stores
to every second bitfield, we don't need to read the old content, mask it, OR it
with the stored bits and store that, but can just put some suitable value into
the gaps (0 or all ones or whatever is best).
For the quadratic behavior, I wonder if we shouldn't just check how many chains
we are currently tracking and, if we have too many (some param), terminate all
of them.
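
A minimal sketch of the point above, under stated assumptions: it is a toy
model, not the GCC store-merging code.  The names store, merged,
merge_stores and apply are hypothetical, and the object is assumed to be
64 bits wide.  It shows that a leading clobber leaves nothing to preserve,
so the merged value can be written with a plain store, while without the
clobber the gaps force a read, mask and OR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one recorded store: `width` bits of `value`
// written at bit offset `pos` of a 64-bit object (pos + width <= 64).
struct store { unsigned pos; unsigned width; std::uint64_t value; };

// Result of merging: the combined value and a mask with a bit set for
// every bit that must be preserved from the old memory contents.
struct merged { std::uint64_t value; std::uint64_t keep_mask; };

static merged merge_stores (const std::vector<store> &stores,
                            bool clobbered_first)
{
  // After a clobber nothing needs preserving; otherwise everything does
  // until some store overwrites it.
  merged m = { 0, clobbered_first ? 0 : ~std::uint64_t (0) };
  for (const store &s : stores)
    {
      std::uint64_t low = s.width == 64 ? ~std::uint64_t (0)
                                        : (std::uint64_t (1) << s.width) - 1;
      std::uint64_t bits = low << s.pos;
      m.value = (m.value & ~bits) | ((s.value << s.pos) & bits);
      m.keep_mask &= ~bits;
    }
  return m;
}

// Final memory contents given the old contents.
static std::uint64_t apply (const merged &m, std::uint64_t old)
{
  if (m.keep_mask == 0)
    return m.value;                       // plain store, no load needed
  // Gaps hold live data: read the old content, mask it, OR in the new bits.
  return (old & m.keep_mask) | (m.value & ~m.keep_mask);
}
```

With the two stores from the dump in comment #89 (32 bits at position 0 and
8 bits at position 32, both zero), a leading clobber gives keep_mask == 0 in
this model, so one 64-bit store of zero suffices; without it the upper 24
bits would have to be kept.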

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (38 preceding siblings ...)
  2021-02-10 16:06 ` jakub at gcc dot gnu.org
@ 2021-02-10 16:06 ` rguenth at gcc dot gnu.org
  2021-02-10 16:28 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 16:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #91 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the other simple idea I have is to limit the number of active store groups
and force-terminate in either a LRU or FIFO manner.

For the testcase at hand the decls we start the chain for are all only
used in full but knowing that would require some pre-analysis of the IL
similar to what SRA does for example (collecting all accesses).  It's
then also still easy to "break" such heuristic so limiting is in the end
the only (and I guess best) option.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (39 preceding siblings ...)
  2021-02-10 16:06 ` rguenth at gcc dot gnu.org
@ 2021-02-10 16:28 ` rguenth at gcc dot gnu.org
  2021-02-10 16:49 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-10 16:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #92 from Richard Biener <rguenth at gcc dot gnu.org> ---
Simple and stupid like the below works and does

 store merging                      :   0.42 (  1%)   0.00 (  0%)   0.43 (  1%)  3858k (  1%)
 TOTAL                              :  56.86          0.56         57.45          557M

we have a limit of 64 stores in a single chain, so the product of both limits
limits the number of alias queries done in terminate_all_aliasing_chains.  I'll
polish it up tomorrow (and will refrain from trying to avoid the linear
walk here by keeping a counter or even a pointer to the last element).

diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index f0f4a068de5..c6ec6b2cbce 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -5175,6 +5175,19 @@ pass_store_merging::process_store (gimple *stmt)

   /* Store aliases any existing chain?  */
   ret |= terminate_all_aliasing_chains (NULL, stmt);
+  unsigned cnt = 0;
+  imm_store_chain_info **e = &m_stores_head;
+  while (*e)
+    if (++cnt > 16)
+      {
+       if (dump_file && (dump_flags & TDF_DETAILS))
+         fprintf (dump_file, "Too many chains, terminating oldest before "
+                  "starting a new chain.\n");
+       terminate_and_process_chain (*e);
+      }
+    else
+      e = &(*e)->next;
+
   /* Start a new chain.  */
   class imm_store_chain_info *new_chain
     = new imm_store_chain_info (m_stores_head, base_addr);
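
To illustrate the effect of such a cap, here is a toy model, assuming that
every store which starts a new chain costs one alias query per live chain.
It is not the GCC implementation: chain_tracker, start_chain and the
counters are hypothetical names, and unlike the patch above it counts the
new chain against the cap.

```cpp
#include <cstddef>
#include <deque>

// Toy model (not GCC code): each tracked chain is just an id.
struct chain_tracker
{
  std::deque<int> chains;            // front = newest, back = oldest
  std::size_t max_chains;            // hypothetical cap on live chains
  unsigned long alias_queries;       // queries issued so far
  unsigned long forced_terminations; // chains dropped because of the cap

  explicit chain_tracker (std::size_t cap)
    : max_chains (cap), alias_queries (0), forced_terminations (0) {}

  // Model of a store that starts a new chain.
  void start_chain (int id)
  {
    // Stand-in for terminate_all_aliasing_chains: one query per live chain.
    alias_queries += chains.size ();
    // Terminate the oldest chains until there is room for the new one.
    while (!chains.empty () && chains.size () >= max_chains)
      {
        chains.pop_back ();
        ++forced_terminations;
      }
    chains.push_front (id);
  }
};
```

In this model, starting 1000 chains that never alias costs 499500 queries
without a cap and 15864 with a cap of 16, which is the difference between
quadratic and linear growth in the number of chains.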

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (40 preceding siblings ...)
  2021-02-10 16:28 ` rguenth at gcc dot gnu.org
@ 2021-02-10 16:49 ` jakub at gcc dot gnu.org
  2021-02-11  9:32 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-02-10 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #93 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think I'd go for more chains by default, at least 64 or even 256, with a
param and tracking on how many we have in a counter.  The class has a
ctor/dtor, so the increment/decrement of the counter can be done there.
And I think it is doubly-linked, so tail should be prev on the head.
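
A sketch of the counter idea, assuming nothing about the real class layout:
tracked_chain is a hypothetical stand-in, and the count is kept in a
function-local static purely to keep the example self-contained.

```cpp
#include <cstddef>

// Hypothetical stand-in for a tracked chain; not the GCC class.
class tracked_chain
{
public:
  tracked_chain () { ++counter (); }
  tracked_chain (const tracked_chain &) { ++counter (); }
  ~tracked_chain () { --counter (); }

  // Number of currently live chains, available without walking any list.
  static std::size_t live_count () { return counter (); }

private:
  // Shared counter, incremented on construction and decremented on
  // destruction.
  static std::size_t &counter ()
  {
    static std::size_t n = 0;
    return n;
  }
};
```

Checking the limit then becomes a single comparison against live_count ()
rather than a walk over the list.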

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (41 preceding siblings ...)
  2021-02-10 16:49 ` jakub at gcc dot gnu.org
@ 2021-02-11  9:32 ` rguenth at gcc dot gnu.org
  2021-02-12  8:57 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-11  9:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #94 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 50165
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50165&action=edit
patch experiment

(In reply to Jakub Jelinek from comment #93)
> I think I'd go for more chains by default, at least 64 or even 256, with a
> param and tracking on how many we have in a counter.  The class has a
> ctor/dtor, so the increment/decrement of the counter can be done there.
> And I think it is doubly-linked, so tail should be prev on the head.

So the list is not circular but adding a counter is easy and avoids the
linked list walk in the usual cases when we do not hit the limit.

Note that the number of alias queries we do is
(param_max_store_chains_to_track * param_max_stores_to_merge) squared;
with a default of 64 for both that's already 64 ^ 4.  So a less restrictive
option might be to limit the product of both numbers.
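
The arithmetic can be checked with a few lines; the function names here are
made up for illustration and are not GCC identifiers.

```cpp
#include <cstdint>

// Worst-case number of stores tracked at once: chains times stores per chain.
constexpr std::uint64_t
max_tracked_stores (std::uint64_t chains, std::uint64_t stores_per_chain)
{
  return chains * stores_per_chain;
}

// Quadratic bound on alias queries as described above: every tracked store
// may be tested against every other tracked store.
constexpr std::uint64_t
alias_query_bound (std::uint64_t chains, std::uint64_t stores_per_chain)
{
  return max_tracked_stores (chains, stores_per_chain)
         * max_tracked_stores (chains, stores_per_chain);
}

static_assert (max_tracked_stores (64, 64) == 4096, "64 * 64");
static_assert (alias_query_bound (64, 64) == 16777216, "64 ^ 4");
```

Raising the chain limit to 256 while keeping 64 stores per chain multiplies
the bound by 16, to 268435456.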

Now, I was thinking that we eventually can get rid of most alias queries
by, instead of disambiguating against all stores in each chain, only
disambiguating against the whole base of each chain, effectively making it
param_max_store_chains_to_track squared then.  That might lose some odd
cases.  The profile also shows that the caching of ao_refs (and thus
get_ref_base_and_extent calls) is imperfect - for the store chains
we could record ao_refs for example (it effectively already records
all necessary info anyway).

A simple patch caching the ao_ref in store_immediate_info improves compile
time to

 store merging                      : 265.07 ( 83%)   0.00 (  0%) 265.14 ( 82%)  3858k (  1%)
 TOTAL                              : 321.25          0.53        321.88          557M

shaving off some 50s and thus possibly worth the extra memory cost here.

Btw, I've done some statistics and for this testcase we indeed mostly
have chains with a single store (the clobber) and thus the idea of
reducing the number of alias queries by globbing all stores of a chain
into a single effective ao_ref wouldn't help too much
(but it would save us from caching the ao_ref in each store to a single
ao_ref per chain):

  36422 Terminating chain with 1 stores
  10889 Terminating chain with 3 stores

and the largest number of active chains is 32661.

The idea of limiting the number of overall tracked stores instead of the
number of chains would be similarly simple and put a limit on the overall
alias queries (but then we have the limit on the number of stores in a
single chain, possibly because of non-linearities in chain processing).

Still, limiting the number of tracked chains is more obvious in behavior
and thus IMHO superior for users (even if it means we have to put a
more conservative default on that).

Statistics on the param defaults:

 64    1.31 (  2%) 
128    2.64 (  4%)
256    4.86 (  8%)

but given the above statistics the testcase isn't a good example to tune
these with just either one or three stores on each chain.  Enabling
the ao_ref caching only marginally improves the testcase with the
limiting in place:

256    4.28 (  7%)

so any comments?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (42 preceding siblings ...)
  2021-02-11  9:32 ` rguenth at gcc dot gnu.org
@ 2021-02-12  8:57 ` cvs-commit at gcc dot gnu.org
  2021-02-12 10:29 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-12  8:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #95 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:95d94b52ea8478334fb92cca545f0bd904bd0034

commit r11-7205-g95d94b52ea8478334fb92cca545f0bd904bd0034
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Feb 11 11:13:47 2021 +0100

    tree-optimization/38474 - fix store-merging compile-time regression

    The following puts a limit on the number of alias tests we do in
    terminate_all_aliasing_chains which is quadratic in the number of
    overall stores currently tracked.  There is already a limit in
    place on the maximum number of stores in a single chain so the
    following adds a limit on the number of chains tracked.  The
    worst number of overall stores tracked from the defaults (64 and 64)
    is then 4096 which when imposed as the sole limit for the testcase
    still causes

     store merging                      :  71.65 ( 56%)

    because the testcase is somewhat degenerate with most chains
    consisting only of a single store (and 25% of exactly three stores).
    The single stores are all CLOBBERs at the point variables go out of
    scope.  Note unpatched we have

     store merging                      : 308.60 ( 84%)

    Limiting the number of chains to 64 brings this down to

     store merging                      :   1.52 (  3%)

    which is more reasonable.  There are ideas on how to make
    terminate_all_aliasing_chains cheaper but for this degenerate case
    they would not have any effect so I'll defer those to GCC 12.

    I'm not sure we want to have both --params, just keeping the
    more to-the-point max-stores-to-track works but makes the
    degenerate case above slower.
    I made the current default 1024 which for the testcase
    (without limiting chains) results in 25% compile time and 20s
    putting it in the same ballpark as the next offender (which is PTA).

    This is a regression on trunk and the GCC 10 branch btw.

    2021-02-11  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/38474
            * params.opt (-param=max-store-chains-to-track=): New param.
            (-param=max-stores-to-track=): Likewise.
            * doc/invoke.texi (max-store-chains-to-track): Document.
            (max-stores-to-track): Likewise.
            * gimple-ssa-store-merging.c (pass_store_merging::m_n_chains):
            New.
            (pass_store_merging::m_n_stores): Likewise.
            (pass_store_merging::terminate_and_process_chain): Update
            m_n_stores and m_n_chains.
            (pass_store_merging::process_store): Likewise.   Terminate
            oldest chains if the number of stores or chains get too large.
            (imm_store_chain_info::terminate_and_process_chain): Dump
            chain length.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (43 preceding siblings ...)
  2021-02-12  8:57 ` cvs-commit at gcc dot gnu.org
@ 2021-02-12 10:29 ` rguenth at gcc dot gnu.org
  2021-02-12 10:35 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-12 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #96 from Richard Biener <rguenth at gcc dot gnu.org> ---
The full testcase on trunk (g:95d94b52ea8478334fb92cca545f0bd904bd0034) at -O0 -g
now takes 9s to compile and uses 1GB ram.

With -O1 -g we have

Time variable                                   usr           sys          wall           GGC
 callgraph functions expansion      :  13.41 ( 12%)   0.21 ( 60%)  13.63 ( 12%)   439M ( 73%)
 callgraph ipa passes               :  94.79 ( 86%)   0.13 ( 37%)  94.95 ( 86%)    75M ( 13%)
 ipa function summary               :  91.46 ( 83%)   0.02 (  6%)  91.53 ( 83%)    17M (  3%)
 tree PTA                           :   5.78 (  5%)   0.05 ( 14%)   5.85 (  5%)    23M (  4%)
 TOTAL                              : 109.96          0.35        110.37          597M
109.97user 0.37system 1:50.38elapsed 99%CPU (0avgtext+0avgdata 1110568maxresident)k
0inputs+0outputs (0major+350549minor)pagefaults 0swaps

where perf shows

Samples: 448K of event 'cycles:u', Event count (approx.): 483237005145
Overhead       Samples  Command   Shared Object     Symbol
  17.26%         77187  f951      f951              [.] get_ref_base_and_extent
   8.36%         37385  f951      f951              [.] stmt_may_clobber_ref_p_1
   7.16%         32045  f951      f951              [.] default_binds_local_p_3
   6.40%         28628  f951      f951              [.] bitmap_bit_p
   6.39%         28557  f951      f951              [.] determine_known_aggregate_parts
   5.92%         26464  f951      f951              [.] pt_solution_includes_1
   4.66%         20834  f951      f951              [.] call_may_clobber_ref_p_1
   3.44%         15406  f951      f951              [.] flags_from_decl_or_type
   3.35%         14971  f951      f951              [.] refs_may_alias_p_1
   3.05%         13667  f951      f951              [.] gimple_call_flags
   2.55%         11387  f951      f951              [.] cgraph_node::get_availability
   2.40%         10739  f951      libc-2.26.so      [.] __strncmp_sse42
   2.32%         10372  f951      f951              [.] check_fnspec
   1.89%          8411  f951      f951              [.] bitmap_set_bit
   1.71%          7635  f951      f951              [.] private_lookup_attribute
   1.68%          7512  f951      f951              [.] get_modref_function_summary
   1.52%          6805  f951      f951              [.] decl_binds_to_current_def_p
   1.46%          6512  f951      f951              [.] gimple_call_fnspec
   1.26%          5582  f951      f951              [.] bitmap_clear_bit
   0.94%          4212  f951      f951              [.] cgraph_node::function_or_virtual_thunk_symbol

We need to do something about the IPA fnsummary cost; it looks unreasonable
compared to all the rest, at least for -O1.  Cutting down --param
ipa-max-aa-steps doesn't seem to help, but it looks like the accounting is
simply broken.

And with -O2 or -O3 we have

Time variable                                   usr           sys          wall           GGC
 callgraph functions expansion      : 201.23 ( 20%)   0.77 ( 46%) 202.05 ( 20%)  1230M ( 82%)
 callgraph ipa passes               : 807.58 ( 80%)   0.86 ( 52%) 808.75 ( 80%)   201M ( 13%)
 ipa inlining heuristics            :  40.25 (  4%)   0.01 (  1%)  40.24 (  4%)    41M (  3%)
 alias stmt walking                 :  21.48 (  2%)   0.20 ( 12%)  21.72 (  2%)   601k (  0%)
 tree PTA                           : 788.36 ( 78%)   0.76 ( 46%) 789.43 ( 78%)   101M (  7%)
 tree slp vectorization             :  13.97 (  1%)   0.04 (  2%)  14.01 (  1%)   225M ( 15%)
 expand vars                        :  92.66 (  9%)   0.00 (  0%)  92.72 (  9%)    63M (  4%)
 TOTAL                              :1010.42          1.66       1012.46         1509M
1010.42user 1.73system 16:52.53elapsed 99%CPU (0avgtext+0avgdata 4764428maxresident)k
0inputs+0outputs (0major+1199966minor)pagefaults 0swaps

surprisingly the IPA fnsummary issue is -O1 only but maybe it's an accounting
issue.  perf with callgraph points to (if I interpret correctly) the
determine_known_aggregate_parts function which, while accounting alias
queries done via get_continuation_for_phi, does not account those done
by walking the VDEF chain itself.  I'm testing a fix.
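
The accounting gap can be modelled in a few lines.  This is a sketch under
stated assumptions, not GCC code: walk_state, walk_accounted and
walk_unaccounted are hypothetical, the field name aa_walk_budget is
borrowed only for readability, and each visited definition is assumed to
cost exactly one alias query.

```cpp
// Hypothetical walker state shared across many walks.
struct walk_state
{
  int aa_walk_budget;     // remaining alias queries we allow ourselves
  unsigned long queries;  // alias queries actually issued
};

// Walk up to `chain_len` definitions, one alias query each, and charge the
// shared budget for every query.  Returns the number of definitions visited.
static unsigned walk_accounted (walk_state &st, unsigned chain_len)
{
  unsigned visited = 0;
  while (visited < chain_len && st.aa_walk_budget > 0)
    {
      ++st.queries;
      --st.aa_walk_budget;
      ++visited;
    }
  return visited;
}

// Same walk, but it only checks the budget and never charges it, so the
// limit has no effect across repeated walks.
static unsigned walk_unaccounted (walk_state &st, unsigned chain_len)
{
  unsigned visited = 0;
  while (visited < chain_len && st.aa_walk_budget > 0)
    {
      ++st.queries;
      ++visited;
    }
  return visited;
}
```

With a budget of 100 and 50 walks of length 10, the accounted variant stops
after 100 queries in total while the unaccounted one issues all 500.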

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (44 preceding siblings ...)
  2021-02-12 10:29 ` rguenth at gcc dot gnu.org
@ 2021-02-12 10:35 ` rguenth at gcc dot gnu.org
  2021-02-12 11:42 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-12 10:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #97 from Richard Biener <rguenth at gcc dot gnu.org> ---
So fixing that makes GCC 11 compile the full testcase at -O1 -g in 18 seconds
using about 1GB of memory.

That leaves PTA at -O2+ as the biggest offender (it also shows up with the
reduced testcase).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (45 preceding siblings ...)
  2021-02-12 10:35 ` rguenth at gcc dot gnu.org
@ 2021-02-12 11:42 ` cvs-commit at gcc dot gnu.org
  2021-02-12 14:40 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  49 siblings, 0 replies; 50+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-12 11:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #98 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:6cc886bf4279461b8931c4ca544185a85cd69f26

commit r11-7208-g6cc886bf4279461b8931c4ca544185a85cd69f26
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 12 11:13:36 2021 +0100

    middle-end/38474 - fix alias walk budget accounting in IPA analysis

    The walk_aliased_vdef calls do not update the walking budget until
    it is hit by a single call (and then in one case it resumes with
    no limit at all).  The following rectifies this in multiple places.
    It also makes the updates more consistent and fixes
    determine_known_aggregate_parts to account its own alias queries.

    2021-02-12  Richard Biener  <rguenther@suse.de>

            PR middle-end/38474
            * ipa-fnsummary.c (unmodified_parm_1): Only walk when
            fbi->aa_walk_budget is bigger than zero.  Update
            fbi->aa_walk_budget.
            (param_change_prob): Likewise.
            * ipa-prop.c (detect_type_change_from_memory_writes):
            Properly account walk_aliased_vdefs.
            (parm_preserved_before_stmt_p): Canonicalize updates.
            (parm_ref_data_preserved_p): Likewise.
            (parm_ref_data_pass_through_p): Likewise.
            (determine_known_aggregate_parts): Account own alias queries.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (46 preceding siblings ...)
  2021-02-12 11:42 ` cvs-commit at gcc dot gnu.org
@ 2021-02-12 14:40 ` rguenth at gcc dot gnu.org
  2021-02-16 12:38 ` cvs-commit at gcc dot gnu.org
  2021-04-29  8:04 ` cvs-commit at gcc dot gnu.org
  49 siblings, 0 replies; 50+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-12 14:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #99 from Richard Biener <rguenth at gcc dot gnu.org> ---
Just a short brain-dump for the PTA issue:

--param max-fields-for-field-sensitive=1 helps, so some magic limit and
auto-degrading might be a good idea.

Solver stats are not so bad:

 Total vars:               21507
Non-pointer vars:          16
Statically unified vars:  6800
Dynamically unified vars: 0
Iterations:               4
Number of edges:          43380
Number of implicit edges: 23056

but varmap "compression" happens before unifying those 6800 vars which
means bitmaps are less dense than possible.  That there's nothing
dynamically unified also says that likely iteration order is sub-optimal.
We don't have entries of the forward graph and so likely do multiple
DFSs starting from somewhere inside for example.  That we add both
succ and pred edges during solving itself makes integrating the DFS
into the iteration itself look attractive eventually.

More stats are needed to judge iteration order tweaks.

We have IL like

  D.335748 = __result_mpfr_division_mp_mp;
  __result_mpfr_division_mp_mp ={v} {CLOBBER};
  D.76250 = D.335748;
  D.335748 ={v} {CLOBBER};
...
  mpfr_add (&__result_mpfr_addition_mp_mp, &D.76250, &D.76256, 0);

that just generates a lot of initial constraints and variables.  D.335748
becomes live here, so does D.76250.  This happens early also, like with

__attribute__((fn spec (". r r ")))
struct mpfr_type mpfr_division_mp_mp (struct mpfr_type & restrict a1, struct mpfr_type & restrict a2)
{
  struct mpfr_type __result_mpfr_division_mp_mp;
  integer(kind=4) retval;

  <bb 2> :
  mpfr_init (&__result_mpfr_division_mp_mp);
  retval_6 = mpfr_div (&__result_mpfr_division_mp_mp, a1_3(D), a2_4(D), 0);
  <retval> = __result_mpfr_division_mp_mp;
  __result_mpfr_division_mp_mp ={v} {CLOBBER};
  return <retval>;

having some early pass after inlining clean up the result would be nice
[simply renaming and eliding 1:1 copies here].  It takes until SRA to fix this
and the PTA pass after it (run as part of PRE) is fast:

 tree PTA                           :  20.26 (  7%)

another thing to notice would be not splitting vars that just occur
"bare" in the IL but that would require some pre-scanning of the IL
to note interesting (and uninteresting) variables.  That's something
we might need anyway though for improved allocated array handling for
example.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (47 preceding siblings ...)
  2021-02-12 14:40 ` rguenth at gcc dot gnu.org
@ 2021-02-16 12:38 ` cvs-commit at gcc dot gnu.org
  2021-04-29  8:04 ` cvs-commit at gcc dot gnu.org
  49 siblings, 0 replies; 50+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-16 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #100 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:3f16a1678156035bbe73b217fbce4d9c27d1d559

commit r11-7254-g3f16a1678156035bbe73b217fbce4d9c27d1d559
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Feb 16 12:42:26 2021 +0100

    tree-optimization/38474 - improve PTA varinfo sorting

    This improves a previous heuristic to sort address-taken variables
    first (because those appear in points-to bitmaps) by tracking which
    variables appear in ADDRESSOF constraints (there's also
    graph->address_taken but that's computed only later).

    This shaves off 30s worth of compile-time for the full testcase in
    PR38474 (which then still takes 965s to compile at -O2).

    2021-02-16  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/38474
            * tree-ssa-structalias.c (variable_info::address_taken): New.
            (new_var_info): Initialize address_taken.
            (process_constraint): Set address_taken.
            (solve_constraints): Use the new address_taken flag rather
            than is_reg_var for sorting variables.
            (dump_constraint): Dump the variable number if the name
            is just NULL.
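
The sorting idea in this commit can be illustrated with a small standalone sketch. It is not the code from tree-ssa-structalias.c: the `VarInfo` struct and the `sort_address_taken_first` helper are hypothetical names made up for the illustration. The assumption is that only address-taken variables can show up as bits in points-to sets, so giving them the lowest IDs keeps those bits clustered in a small range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a PTA variable record (not GCC's variable_info).
struct VarInfo {
    std::string name;
    bool address_taken;  // true if the variable appears in an ADDRESSOF constraint
};

// Reorder `vars` so that address-taken variables come first, keeping the
// relative order inside each group.  Returns a table with
// result[old_id] == new_id so that constraints can be renumbered.
std::vector<std::size_t> sort_address_taken_first(std::vector<VarInfo>& vars) {
    std::vector<std::size_t> order(vars.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Stable partition: address-taken IDs move to the front.
    std::stable_partition(order.begin(), order.end(),
                          [&vars](std::size_t id) { return vars[id].address_taken; });

    std::vector<std::size_t> old_to_new(vars.size());
    std::vector<VarInfo> sorted;
    sorted.reserve(vars.size());
    for (std::size_t new_id = 0; new_id < order.size(); ++new_id) {
        assert(order[new_id] < vars.size());
        old_to_new[order[new_id]] = new_id;
        sorted.push_back(vars[order[new_id]]);
    }
    vars = std::move(sorted);
    return old_to_new;
}
```

With four variables of which the second and fourth are address-taken, the two address-taken ones end up with IDs 0 and 1, so a points-to bitmap over them only touches its lowest bits.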


* [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
       [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
                   ` (48 preceding siblings ...)
  2021-02-16 12:38 ` cvs-commit at gcc dot gnu.org
@ 2021-04-29  8:04 ` cvs-commit at gcc dot gnu.org
  49 siblings, 0 replies; 50+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-04-29  8:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #101 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:c57a8aea0c3ab8394f7dbfa417ee27b4613f63b7

commit r12-280-gc57a8aea0c3ab8394f7dbfa417ee27b4613f63b7
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Apr 29 08:32:00 2021 +0200

    middle-end/38474 - speedup PTA constraint solving

    In testcases like PR38474 and PR99912 we're seeing very slow
    PTA solving.  One can observe an excessive amount of forwarding,
    mostly during sd constraint solving.  The way we solve the graph
    does not avoid forwarding the same bits through multiple paths,
    and especially when such an alternate path involves ESCAPED as an
    intermediate node, this causes the ESCAPED solution to be expanded
    in receivers.

    The following adds a heuristic to add_graph_edge, which adds
    forwarding edges but also guards the initial solution forwarding
    (which is the expensive part).  The heuristic detects the case of
    ESCAPED receiving the same set while the destination already
    contains ESCAPED.

    This speeds up the PTA solving process by more than 50%.

    2021-04-29  Richard Biener  <rguenther@suse.de>

            PR middle-end/38474
            * tree-ssa-structalias.c (add_graph_edge): Avoid direct
            forwarding when indirect forwarding through ESCAPED
            already happens.
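
The heuristic described above can be sketched on a toy constraint graph. This is only a simplified model under stated assumptions, not the actual add_graph_edge: the `Graph` struct, its `std::set`-based fields and the `add_edge` function are hypothetical, and the real solver uses bitmaps and unified nodes. The sketch refuses a direct edge FROM -> TO when FROM already forwards to ESCAPED and TO's solution already contains ESCAPED, so the caller can skip the expensive initial solution forwarding.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy constraint graph (hypothetical; GCC uses bitmaps, not std::set).
struct Graph {
    std::size_t escaped;                          // node ID standing in for ESCAPED
    std::vector<std::set<std::size_t>> succs;     // succs[n]: nodes n forwards its solution to
    std::vector<std::set<std::size_t>> solution;  // solution[n]: points-to set of n
};

// Try to add the forwarding edge from -> to.  Returns true only if a new
// edge was added, in which case the caller would forward the current
// solution of `from` to `to`.
bool add_edge(Graph& g, std::size_t to, std::size_t from) {
    if (to == from)
        return false;

    // Heuristic: `from` already forwards into ESCAPED and `to` already
    // contains ESCAPED, so the same bits reach `to` indirectly.
    if (g.succs[from].count(g.escaped) != 0 &&
        g.solution[to].count(g.escaped) != 0)
        return false;

    // std::set::insert reports whether the element is new.
    return g.succs[from].insert(to).second;
}
```

As in the commit message, this is a heuristic only: it avoids one common source of duplicate forwarding but does not guarantee that the same bits never travel along two paths.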


end of thread, other threads:[~2021-04-29  8:04 UTC | newest]

Thread overview: 50+ messages
     [not found] <bug-38474-4@http.gcc.gnu.org/bugzilla/>
2011-04-28 16:04 ` [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo rguenth at gcc dot gnu.org
2011-12-02 13:06 ` steven at gcc dot gnu.org
2011-12-02 13:25 ` matz at gcc dot gnu.org
2012-05-27 23:24 ` steven at gcc dot gnu.org
2012-05-29  7:53 ` Joost.VandeVondele at mat dot ethz.ch
2012-05-29 13:09 ` matz at gcc dot gnu.org
2012-05-29 13:12 ` matz at gcc dot gnu.org
2012-05-29 15:08 ` hubicka at gcc dot gnu.org
2012-05-29 16:00 ` jamborm at gcc dot gnu.org
2012-06-15 14:56 ` matz at gcc dot gnu.org
2012-06-15 15:13 ` matz at gcc dot gnu.org
2012-06-15 15:26 ` Joost.VandeVondele at mat dot ethz.ch
2012-06-26 14:26 ` jamborm at gcc dot gnu.org
2012-06-26 14:45 ` matz at gcc dot gnu.org
2012-06-26 14:58 ` rguenth at gcc dot gnu.org
2012-06-26 15:01 ` jamborm at gcc dot gnu.org
2012-06-29 14:34 ` jamborm at gcc dot gnu.org
2012-07-02 15:28 ` jamborm at gcc dot gnu.org
2012-07-02 15:44 ` jamborm at gcc dot gnu.org
2012-07-02 15:53 ` jamborm at gcc dot gnu.org
2012-08-28  8:25 ` steven at gcc dot gnu.org
2012-08-28 11:28 ` Joost.VandeVondele at mat dot ethz.ch
2012-08-28 14:55 ` Joost.VandeVondele at mat dot ethz.ch
2012-08-28 15:07 ` steven at gcc dot gnu.org
2013-03-06 11:01 ` [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3 steven at gcc dot gnu.org
2013-03-07 10:32 ` rguenth at gcc dot gnu.org
2013-03-07 14:55 ` rguenth at gcc dot gnu.org
2013-12-06 13:43 ` rguenth at gcc dot gnu.org
2013-12-06 14:20 ` rguenth at gcc dot gnu.org
2013-12-09 15:13 ` rguenth at gcc dot gnu.org
2013-12-10 12:31 ` rguenth at gcc dot gnu.org
2021-02-10 14:52 ` rguenth at gcc dot gnu.org
2021-02-10 15:03 ` rguenth at gcc dot gnu.org
2021-02-10 15:46 ` rguenth at gcc dot gnu.org
2021-02-10 15:47 ` rguenth at gcc dot gnu.org
2021-02-10 15:51 ` jakub at gcc dot gnu.org
2021-02-10 15:55 ` rguenther at suse dot de
2021-02-10 16:02 ` rguenth at gcc dot gnu.org
2021-02-10 16:06 ` jakub at gcc dot gnu.org
2021-02-10 16:06 ` rguenth at gcc dot gnu.org
2021-02-10 16:28 ` rguenth at gcc dot gnu.org
2021-02-10 16:49 ` jakub at gcc dot gnu.org
2021-02-11  9:32 ` rguenth at gcc dot gnu.org
2021-02-12  8:57 ` cvs-commit at gcc dot gnu.org
2021-02-12 10:29 ` rguenth at gcc dot gnu.org
2021-02-12 10:35 ` rguenth at gcc dot gnu.org
2021-02-12 11:42 ` cvs-commit at gcc dot gnu.org
2021-02-12 14:40 ` rguenth at gcc dot gnu.org
2021-02-16 12:38 ` cvs-commit at gcc dot gnu.org
2021-04-29  8:04 ` cvs-commit at gcc dot gnu.org
