[Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression

public inbox for gcc-bugs@sourceware.org

* [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression
@ 2015-02-16 13:00 trippels at gcc dot gnu.org
  2015-02-16 13:14 ` [Bug ipa/65076] " rguenth at gcc dot gnu.org
                   ` (51 more replies)
  0 siblings, 52 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-02-16 13:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

            Bug ID: 65076
           Summary: [5 Regression] 16% tramp3d-v4.cpp compile time
                    regression
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: trippels at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org

Trunk build with --enable-checking=release and LTO/PGO is currently
~16% slower when compiling tramp3d-v4.cpp compared to 4.9:

Trunk:
markus@x4 ~ % time g++ -w -Ofast tramp3d-v4.cpp
g++ -w -Ofast tramp3d-v4.cpp  25.92s user 0.35s system 99% cpu 26.303 total
4.9:
markus@x4 ~ % time g++ -w -Ofast tramp3d-v4.cpp
g++ -w -Ofast tramp3d-v4.cpp  21.56s user 0.36s system 99% cpu 21.963 total

It looks like r219452 is the culprit.

phase opt and generate  :  19.43 (87%) usr   1.44 (71%) sys  20.88 (85%) wall  674052 kB (64%) ggc
vs.
phase opt and generate  :  23.69 (89%) usr   1.51 (70%) sys  25.20 (87%) wall  710029 kB (66%) ggc
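
For reference, the slowdown can be expressed relative to either build. A minimal Python sketch using the wall-clock totals from the two `time` runs above (the helper name `slowdown_pct` is made up for this illustration):

```python
def slowdown_pct(old_seconds, new_seconds, relative_to="old"):
    """Return the time difference as a percentage of the chosen baseline."""
    base = old_seconds if relative_to == "old" else new_seconds
    return (new_seconds - old_seconds) / base * 100.0

gcc49_wall = 21.963   # "total" from the 4.9 run
trunk_wall = 26.303   # "total" from the trunk run

# Extra wall time of trunk, measured against the 4.9 run.
print(f"{slowdown_pct(gcc49_wall, trunk_wall):.1f}% more time than 4.9")
# The same difference, measured against the trunk run.
print(f"{slowdown_pct(gcc49_wall, trunk_wall, relative_to='new'):.1f}% of the trunk time")
```

With these inputs the two figures come out at about 19.8% and 16.5%, so the ~16% in the summary appears to be the difference taken relative to the trunk time.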



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
@ 2015-02-16 13:14 ` rguenth at gcc dot gnu.org
  2015-02-16 13:22 ` trippels at gcc dot gnu.org
                   ` (50 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-16 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |5.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So it's either time spent in the inliner (unlikely, though the patch has an
extra update_callee_keys call) or different (early) inlining decisions.

I remember you saying that without LTO/PGO-ing GCC the difference is a lot
smaller?



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
  2015-02-16 13:14 ` [Bug ipa/65076] " rguenth at gcc dot gnu.org
@ 2015-02-16 13:22 ` trippels at gcc dot gnu.org
  2015-02-16 18:31 ` hubicka at gcc dot gnu.org
                   ` (49 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-02-16 13:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #2 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> So it's either time spent in the inliner (unlikely, though the patch has an
> extra update_callee_keys call) or different (early) inlining decisions.
> 
> I remember you saying that without LTO/PGO-ing GCC the difference is a lot
> smaller?

No, with --disable-bootstrap it still goes from 27.133 (r219451)
to 31.116 (r219452).



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
  2015-02-16 13:14 ` [Bug ipa/65076] " rguenth at gcc dot gnu.org
  2015-02-16 13:22 ` trippels at gcc dot gnu.org
@ 2015-02-16 18:31 ` hubicka at gcc dot gnu.org
  2015-02-16 19:07 ` trippels at gcc dot gnu.org
                   ` (48 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-02-16 18:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Yep, I looked into this somewhat while preparing the patch. With the new metric
we manage to do a lot more inlining before hitting the limits. This is kind of a
positive effect - clearly the inliner does things that pay off, and it is
possible to lower INLINE_UNIT_GROWTH.

On the other hand it produces a somewhat longer text segment, which also leads
to slower compile times.

I finally got Firefox working in all setups and plan to do some inliner tuning
this week, so I will look into that. What is the full -Q report you get?

Honza



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2015-02-16 18:31 ` hubicka at gcc dot gnu.org
@ 2015-02-16 19:07 ` trippels at gcc dot gnu.org
  2015-02-16 19:15 ` trippels at gcc dot gnu.org
                   ` (47 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-02-16 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #4 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
markus@x4 ~ % g++ -ftime-report -Ofast -w tramp3d-v4.cpp

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1419 kB ( 0%) ggc
 phase parsing           :   1.10 ( 4%) usr   0.41 (20%) sys   1.51 ( 5%) wall  167809 kB (16%) ggc
 phase lang. deferred    :   1.94 ( 7%) usr   0.22 (11%) sys   2.16 ( 7%) wall  196631 kB (18%) ggc
 phase opt and generate  :  23.73 (89%) usr   1.46 (70%) sys  25.18 (87%) wall  710026 kB (66%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 |name lookup            :   0.20 ( 1%) usr   0.05 ( 2%) sys   0.36 ( 1%) wall   25424 kB ( 2%) ggc
 |overload resolution    :   0.96 ( 4%) usr   0.10 ( 5%) sys   1.11 ( 4%) wall  116881 kB (11%) ggc
 garbage collection      :   1.14 ( 4%) usr   0.00 ( 0%) sys   1.14 ( 4%) wall       0 kB ( 0%) ggc
 dump files              :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.20 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.31 ( 1%) usr   0.02 ( 1%) sys   0.46 ( 2%) wall   17921 kB ( 2%) ggc
 callgraph optimization  :   0.37 ( 1%) usr   0.14 ( 7%) sys   0.42 ( 1%) wall   14318 kB ( 1%) ggc
 ipa dead code removal   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 ipa virtual call target :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 ipa cp                  :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    4221 kB ( 0%) ggc
 ipa inlining heuristics :   0.35 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 1%) wall   10474 kB ( 1%) ggc
 ipa function splitting  :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall     797 kB ( 0%) ggc
 ipa comdats             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ipa reference           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.06 ( 0%) wall     100 kB ( 0%) ggc
 ipa icf                 :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       1 kB ( 0%) ggc
 ipa SRA                 :   0.19 ( 1%) usr   0.06 ( 3%) sys   0.20 ( 1%) wall   25432 kB ( 2%) ggc
 ipa free inline summary :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1645 kB ( 0%) ggc
 cfg cleanup             :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall     984 kB ( 0%) ggc
 trivially dead code     :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall     107 kB ( 0%) ggc
 df multiple defs        :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs        :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 1%) wall       0 kB ( 0%) ggc
 df live regs            :   0.70 ( 3%) usr   0.03 ( 1%) sys   0.80 ( 3%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.27 ( 1%) usr   0.01 ( 0%) sys   0.18 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.35 ( 1%) usr   0.03 ( 1%) sys   0.43 ( 1%) wall    4732 kB ( 0%) ggc
 register information    :   0.15 ( 1%) usr   0.01 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.29 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall   16094 kB ( 1%) ggc
 alias stmt walking      :   0.63 ( 2%) usr   0.01 ( 0%) sys   0.64 ( 2%) wall     795 kB ( 0%) ggc
 register scan           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     544 kB ( 0%) ggc
 rebuild jump labels     :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.07 ( 0%) usr   0.13 ( 6%) sys   0.18 ( 1%) wall    3198 kB ( 0%) ggc
 parser (global)         :   0.13 ( 0%) usr   0.16 ( 8%) sys   0.27 ( 1%) wall   50197 kB ( 5%) ggc
 parser struct body      :   0.16 ( 1%) usr   0.02 ( 1%) sys   0.25 ( 1%) wall   24480 kB ( 2%) ggc
 parser enumerator list  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     291 kB ( 0%) ggc
 parser function body    :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    7849 kB ( 1%) ggc
 parser inl. func. body  :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall    4142 kB ( 0%) ggc
 parser inl. meth. body  :   0.15 ( 1%) usr   0.04 ( 2%) sys   0.17 ( 1%) wall   17265 kB ( 2%) ggc
 template instantiation  :   2.00 ( 7%) usr   0.27 (13%) sys   2.30 ( 8%) wall  256355 kB (24%) ggc
 early inlining heuristics:   0.05 ( 0%) usr   0.02 ( 1%) sys   0.08 ( 0%) wall    8349 kB ( 1%) ggc
 inline parameters       :   0.15 ( 1%) usr   0.01 ( 0%) sys   0.22 ( 1%) wall    7914 kB ( 1%) ggc
 integration             :   0.81 ( 3%) usr   0.14 ( 7%) sys   1.01 ( 3%) wall  113530 kB (11%) ggc
 tree gimplify           :   0.24 ( 1%) usr   0.06 ( 3%) sys   0.31 ( 1%) wall   26559 kB ( 2%) ggc
 tree eh                 :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall   11189 kB ( 1%) ggc
 tree CFG construction   :   0.07 ( 0%) usr   0.02 ( 1%) sys   0.06 ( 0%) wall   19071 kB ( 2%) ggc
 tree CFG cleanup        :   0.35 ( 1%) usr   0.05 ( 2%) sys   0.25 ( 1%) wall    1323 kB ( 0%) ggc
 tree tail merge         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       3 kB ( 0%) ggc
 tree VRP                :   0.35 ( 1%) usr   0.01 ( 0%) sys   0.49 ( 2%) wall   11832 kB ( 1%) ggc
 tree copy propagation   :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     285 kB ( 0%) ggc
 tree PTA                :   0.67 ( 3%) usr   0.06 ( 3%) sys   0.77 ( 3%) wall    4489 kB ( 0%) ggc
 tree PHI insertion      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2920 kB ( 0%) ggc
 tree SSA rewrite        :   0.17 ( 1%) usr   0.01 ( 0%) sys   0.15 ( 1%) wall   16594 kB ( 2%) ggc
 tree SSA other          :   0.09 ( 0%) usr   0.03 ( 1%) sys   0.12 ( 0%) wall    1393 kB ( 0%) ggc
 tree SSA incremental    :   0.34 ( 1%) usr   0.03 ( 1%) sys   0.36 ( 1%) wall    5838 kB ( 1%) ggc
 tree operand scan       :   0.49 ( 2%) usr   0.02 ( 1%) sys   0.38 ( 1%) wall   49421 kB ( 5%) ggc
 dominator optimization  :   0.17 ( 1%) usr   0.02 ( 1%) sys   0.27 ( 1%) wall    6572 kB ( 1%) ggc
 tree SRA                :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    1215 kB ( 0%) ggc
 isolate eroneous paths  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      13 kB ( 0%) ggc
 tree CCP                :   0.62 ( 2%) usr   0.08 ( 4%) sys   0.60 ( 2%) wall   13927 kB ( 1%) ggc
 tree PHI const/copy prop:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      67 kB ( 0%) ggc
 tree split crit edges   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    5764 kB ( 1%) ggc
 tree reassociation      :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     151 kB ( 0%) ggc
 tree PRE                :   1.00 ( 4%) usr   0.03 ( 1%) sys   1.07 ( 4%) wall   14480 kB ( 1%) ggc
 tree FRE                :   0.88 ( 3%) usr   0.04 ( 2%) sys   0.79 ( 3%) wall    8148 kB ( 1%) ggc
 tree code sinking       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    1194 kB ( 0%) ggc
 tree linearize phis     :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.06 ( 0%) wall    1296 kB ( 0%) ggc
 tree forward propagate  :   0.27 ( 1%) usr   0.05 ( 2%) sys   0.20 ( 1%) wall    2269 kB ( 0%) ggc
 tree conservative DCE   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall     300 kB ( 0%) ggc
 tree aggressive DCE     :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.22 ( 1%) wall   12882 kB ( 1%) ggc
 tree DSE                :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     242 kB ( 0%) ggc
 PHI merge               :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      95 kB ( 0%) ggc
 tree loop bounds        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     764 kB ( 0%) ggc
 tree loop invariant motion:   0.09 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall      26 kB ( 0%) ggc
 tree canonical iv       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1366 kB ( 0%) ggc
 scev constant prop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     771 kB ( 0%) ggc
 tree loop unswitching   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     606 kB ( 0%) ggc
 complete unrolling      :   0.27 ( 1%) usr   0.01 ( 0%) sys   0.30 ( 1%) wall   20912 kB ( 2%) ggc
 tree vectorization      :   0.06 ( 0%) usr   0.01 ( 0%) sys   0.15 ( 1%) wall    9013 kB ( 1%) ggc
 tree slp vectorization  :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall   12751 kB ( 1%) ggc
 tree loop distribution  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     902 kB ( 0%) ggc
 tree iv optimization    :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.30 ( 1%) wall   17198 kB ( 2%) ggc
 predictive commoning    :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    3265 kB ( 0%) ggc
 tree copy headers       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1128 kB ( 0%) ggc
 tree SSA uncprop        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree strlen optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 dominance frontiers     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.60 ( 2%) usr   0.03 ( 1%) sys   0.54 ( 2%) wall       0 kB ( 0%) ggc
 control dependences     :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa              :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     140 kB ( 0%) ggc
 expand vars             :   0.06 ( 0%) usr   0.02 ( 1%) sys   0.03 ( 0%) wall    4200 kB ( 0%) ggc
 expand                  :   0.27 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall   33291 kB ( 3%) ggc
 post expand cleanups    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    2580 kB ( 0%) ggc
 varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      20 kB ( 0%) ggc
 forward prop            :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 1%) wall    3934 kB ( 0%) ggc
 CSE                     :   0.35 ( 1%) usr   0.02 ( 1%) sys   0.28 ( 1%) wall    1353 kB ( 0%) ggc
 dead code elimination   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1        :   0.16 ( 1%) usr   0.01 ( 0%) sys   0.25 ( 1%) wall    3232 kB ( 0%) ggc
 dead store elim2        :   0.18 ( 1%) usr   0.01 ( 0%) sys   0.24 ( 1%) wall    4237 kB ( 0%) ggc
 loop init               :   0.29 ( 1%) usr   0.06 ( 3%) sys   0.25 ( 1%) wall   18739 kB ( 2%) ggc
 loop invariant motion   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall     401 kB ( 0%) ggc
 loop fini               :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 CPROP                   :   0.44 ( 2%) usr   0.02 ( 1%) sys   0.32 ( 1%) wall    4941 kB ( 0%) ggc
 PRE                     :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall    1519 kB ( 0%) ggc
 CSE 2                   :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall     703 kB ( 0%) ggc
 branch prediction       :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.13 ( 0%) wall    4087 kB ( 0%) ggc
 combiner                :   0.79 ( 3%) usr   0.00 ( 0%) sys   0.76 ( 3%) wall   15418 kB ( 1%) ggc
 if-conversion           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     175 kB ( 0%) ggc
 integrated RA           :   1.44 ( 5%) usr   0.03 ( 1%) sys   1.41 ( 5%) wall   56992 kB ( 5%) ggc
 LRA non-specific        :   0.40 ( 1%) usr   0.01 ( 0%) sys   0.39 ( 1%) wall    5387 kB ( 1%) ggc
 LRA virtuals elimination:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    2587 kB ( 0%) ggc
 LRA reload inheritance  :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall     899 kB ( 0%) ggc
 LRA create live ranges  :   0.32 ( 1%) usr   0.00 ( 0%) sys   0.40 ( 1%) wall     819 kB ( 0%) ggc
 LRA hard reg assignment :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall       0 kB ( 0%) ggc
 LRA rematerialization   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       1 kB ( 0%) ggc
 reload                  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 reload CSE regs         :   0.40 ( 1%) usr   0.02 ( 1%) sys   0.51 ( 2%) wall    6295 kB ( 1%) ggc
 load CSE after reload   :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     157 kB ( 0%) ggc
 ree                     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     135 kB ( 0%) ggc
 thread pro- & epilogue  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall    2825 kB ( 0%) ggc
 if-conversion 2         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      10 kB ( 0%) ggc
 combine stack adjustments:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2              :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     516 kB ( 0%) ggc
 hard reg cprop          :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      79 kB ( 0%) ggc
 scheduling 2            :   0.72 ( 3%) usr   0.03 ( 1%) sys   0.57 ( 2%) wall    1696 kB ( 0%) ggc
 machine dep reorg       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      12 kB ( 0%) ggc
 reorder blocks          :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    2858 kB ( 0%) ggc
 shorten branches        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 final                   :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%) wall    7228 kB ( 1%) ggc
 straight-line strength reduction:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     369 kB ( 0%) ggc
 rest of compilation     :   0.42 ( 2%) usr   0.03 ( 1%) sys   0.37 ( 1%) wall    5498 kB ( 1%) ggc
 unaccounted late compilation:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 remove unused locals    :   0.15 ( 1%) usr   0.02 ( 1%) sys   0.19 ( 1%) wall      36 kB ( 0%) ggc
 address taken           :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.38 ( 1%) usr   0.03 ( 1%) sys   0.50 ( 2%) wall       0 kB ( 0%) ggc
 repair loop structures  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  26.77             2.09            28.87            1075919 kB
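
A report of this length is easier to read once it is sorted. The following Python sketch ranks the individual timevars by user time; the function name `top_passes` and the regular expression are assumptions made for this illustration, and the parser only relies on the `name : seconds (pct%) usr` prefix of each row, so it should work whether or not the ggc column has been wrapped onto its own line:

```python
import re

# Matches the start of a -ftime-report row: "<name> : <usr seconds> (<pct>%) usr".
ROW_RE = re.compile(
    r"^\s*\|?(?P<name>.+?)\s*:\s+(?P<usr>\d+\.\d+) \(\s*\d+%\) usr")

def top_passes(report_text, count=5):
    """Return the `count` timevars with the largest user time, largest first."""
    rows = []
    for line in report_text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # wrapped continuation lines, headers, TOTAL
        name = m.group("name").strip()
        if name.startswith("phase "):
            continue  # phase rows are sums over the individual passes
        rows.append((float(m.group("usr")), name))
    rows.sort(reverse=True)
    return [(name, usr) for usr, name in rows[:count]]
```

Running `top_passes` on the saved output of `g++ -ftime-report ...` from both compilers and comparing the two lists is one way to see which passes absorb the extra time.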



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2015-02-16 19:07 ` trippels at gcc dot gnu.org
@ 2015-02-16 19:15 ` trippels at gcc dot gnu.org
  2015-02-17 10:49 ` rguenth at gcc dot gnu.org
                   ` (46 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-02-16 19:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #5 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
Perf shows:

Overhead  Command   Shared Object               Symbol
   2.45%  cc1plus   libc-2.21.90.so             [.] _int_malloc
   1.88%  cc1plus   cc1plus                     [.] bitmap_find_bit
   1.72%  cc1plus   cc1plus                     [.] gt_ggc_mx_lang_tree_node
   1.36%  cc1plus   libc-2.21.90.so             [.] _int_free
   1.05%  cc1plus   cc1plus                     [.] ggc_set_mark
   0.97%  cc1plus   cc1plus                     [.] record_reg_classes
   0.96%  cc1plus   cc1plus                     [.] df_worklist_dataflow
   0.91%  cc1plus   cc1plus                     [.] build_qualified_type
   0.88%  cc1plus   cc1plus                     [.] df_note_compute
   0.84%  cc1plus   libc-2.21.90.so             [.] malloc_consolidate

(Using a faster malloc implementation speeds up compile time by ~5%:

markus@x4 ~ % time g++ -w -Ofast tramp3d-v4.cpp
g++ -w -Ofast tramp3d-v4.cpp  26.00s user 0.32s system 99% cpu 26.341 total
markus@x4 ~ % time LD_PRELOAD=/usr/lib/libllalloc.so.1.3 g++ -w -Ofast tramp3d-v4.cpp
LD_PRELOAD=/usr/lib/libllalloc.so.1.3 g++ -w -Ofast tramp3d-v4.cpp  24.60s user 0.37s system 99% cpu 24.997 total)
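
To see how much of the listed profile sits in the allocator, the perf rows can be summed per shared object. A small Python sketch, assuming the column layout shown above (overhead, command, shared object, `[.]`, symbol); the function name `overhead_by_object` is hypothetical:

```python
from collections import defaultdict

def overhead_by_object(perf_rows):
    """Sum the overhead percentages of perf report rows per shared object."""
    totals = defaultdict(float)
    for row in perf_rows.splitlines():
        fields = row.split()
        # Expected fields: "2.45%", command, shared object, "[.]", symbol
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue  # header or unrelated line
        totals[fields[2]] += float(fields[0].rstrip("%"))
    return dict(totals)
```

For the ten rows listed above this gives about 4.65% for libc-2.21.90.so (the three malloc-related symbols) and about 8.37% for cc1plus itself.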



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2015-02-16 19:15 ` trippels at gcc dot gnu.org
@ 2015-02-17 10:49 ` rguenth at gcc dot gnu.org
  2015-03-04  9:10 ` rguenth at gcc dot gnu.org
                   ` (45 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-17 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I wonder where the main _int_malloc load comes from.



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2015-02-17 10:49 ` rguenth at gcc dot gnu.org
@ 2015-03-04  9:10 ` rguenth at gcc dot gnu.org
  2015-03-18 12:56 ` rguenth at gcc dot gnu.org
                   ` (44 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-04  9:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> I wonder where the main _int_malloc load comes from.

To answer this question myself: 84% of the _int_malloc load comes from calling
malloc, of which 55% comes from calling xmalloc (27% xrealloc, 12% operator
new). Among the callers there is quite an even distribution, with alloc-pool.c
and obstack.c as accumulators at the top; sbitmap allocations are also high in
the list.  callgrind thinks _obstack_begin and pool_alloc are the top ones
cost-wise, with a large gap to third place.  _obstack_begin is mostly called
from bitmap obstack init, which is then reasonably distributed.  alloc-pools
are mostly used by the DF and DOM (et-forest) machineries.  xrealloc is vec<>s,
operator new is SCEV.

tramp3d has a lot of functions thus we gobble up many one-per-function or
one-per-pass allocations.

But even callgrind confirms that _int_malloc is the third most costly function
(in terms of "self" cost, w/o callees).

But unfortunately there is nothing obvious to cut off a significant amount
of the allocation load.
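
The nested percentages can be turned into shares of the whole _int_malloc load. This Python sketch assumes the figures nest as "84% via malloc, and of that 55% / 27% / 12% via xmalloc / xrealloc / operator new"; that reading, and the helper name `absolute_shares`, are assumptions of the sketch rather than something callgrind printed:

```python
# Percentages quoted above, read as nested shares (an assumption of this sketch).
via_malloc = 0.84            # share of the _int_malloc load reached through malloc
callers_of_malloc = {
    "xmalloc": 0.55,
    "xrealloc": 0.27,
    "operator new": 0.12,
}

def absolute_shares(parent_share, child_shares):
    """Scale each child's share of the parent to a share of the whole."""
    return {name: parent_share * share for name, share in child_shares.items()}

for name, share in absolute_shares(via_malloc, callers_of_malloc).items():
    print(f"{name:13s} {share * 100:5.1f}% of the _int_malloc load")
```

Under that reading xmalloc accounts for roughly 46% of the total, xrealloc for roughly 23% and operator new for roughly 10%.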


* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2015-03-04  9:10 ` rguenth at gcc dot gnu.org
@ 2015-03-18 12:56 ` rguenth at gcc dot gnu.org
  2015-03-21  5:32 ` hubicka at gcc dot gnu.org
                   ` (43 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-18 12:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
The "culprit" basically refactors things and in the process screws
code-generation with sreals?



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2015-03-18 12:56 ` rguenth at gcc dot gnu.org
@ 2015-03-21  5:32 ` hubicka at gcc dot gnu.org
  2015-03-21 10:25 ` hubicka at gcc dot gnu.org
                   ` (42 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-21  5:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Concerning comment #8, I do not think the sreal refactoring screwed things up.
sreals are not high in the profile and the generated code is not worse
(performance-wise). It is not better either, but that is no surprise - we
traditionally keep inline limits high enough to get good tramp3d performance.

GCC 5 currently generates 11% bigger code than GCC 4.9. Not a good thing, but
from the POV of the inliner heuristics it is - it is trying to do as much code
duplication as possible before it hits the limits.  The new badness metric
performs better on tramp3d - it does all useful inlining at 10% of unit growth,
while the old one needed 25%. It prioritizes more of the cases where the inliner
knows DCE will happen because of propagation across arguments, and thus it
manages to keep the later optimizers busier.

I still plan to look into this more, but I think it is kind of a non-bug (it
just shows that the inliner is stupid to believe that all inlining is a good
idea, but there is not much to do except getting realistic static program
profiles, which is beyond the current state of the art).
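
To illustrate how the priority order and the unit growth limit interact, here is a toy model of a greedy inliner with a size budget. It is only a sketch: the function name, the (badness, growth, name) tuples and the skip-on-overflow policy are assumptions for illustration and not a rendering of GCC's actual inliner.

```python
import heapq

def greedy_inline(unit_size, candidates, max_growth_pct):
    """Inline candidates in order of increasing badness within a size budget.

    candidates: iterable of (badness, growth, name) tuples, all hypothetical.
    Returns the final unit size and the names inlined, in inlining order.
    """
    size_limit = unit_size * (100 + max_growth_pct) // 100
    queue = list(candidates)
    heapq.heapify(queue)               # smallest badness is popped first
    size = unit_size
    inlined = []
    while queue:
        badness, growth, name = heapq.heappop(queue)
        if size + growth > size_limit:
            continue                   # would exceed the unit growth budget
        size += growth
        inlined.append(name)
    return size, inlined

# Hypothetical call sites: (badness, size growth, name).
calls = [(0.1, 60, "a"), (0.5, 50, "b"), (0.2, 30, "c")]
print(greedy_inline(1000, calls, 10))   # (1090, ['a', 'c'])
```

In such a model the badness values alone decide which candidates get in before the budget runs out, so a metric that ranks the profitable candidates first reaches them at a smaller unit growth.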


* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2015-03-21  5:32 ` hubicka at gcc dot gnu.org
@ 2015-03-21 10:25 ` hubicka at gcc dot gnu.org
  2015-03-21 10:48 ` hubicka at gcc dot gnu.org
                   ` (41 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-21 10:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I can re-confirm the 16% compile time regression.  I went through some comparisons.

$ wc -l *.ssa
299231 tramp3d-v4.ii.015t.ssa
$ wc -l ../5/*.ssa
331115 ../5/tramp3d-v4.ii.018t.ssa

so as a lame comparison, we already have 10% more statements to start with.
Now einline

$ wc -l *.einline
692812 tramp3d-v4.ii.018t.einline
$ wc -l ../5/*.einline
724090 ../5/tramp3d-v4.ii.026t.einline

so after einline we seem to have 4% more statements; we do about the same
number of inlinings:

$ grep Inlining tramp3d-v4.ii.*einline | wc -l
28003
$ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
28685

but at release_ssa we still have about 4% more.

$ wc -l *release_ssa*
348378 tramp3d-v4.ii.036t.release_ssa
$ wc -l ../5/*release_ssa*
365689 ../5/tramp3d-v4.ii.043t.release_ssa

There is no difference in number of functions in ssa and release_ssa dumps. 
What makes the functions bigger in GCC 5?

$ grep "^  .* = " *.release_ssa | wc -l
65028
$ grep "^  .* = " ../5/*.release_ssa | wc -l
72636

The number of statements is about the same.

During the actual inlining GCC 4.9 reports:
 Unit growth for small function inlining: 88536->114049 (28%)
and GCC 5 reports:
 Unit growth for small function inlining: 87943->97699 (11%)

Statement count seems to remain 7% higher in .optimized dumps.  So perhaps the
slowdown is not really caused that much by the IPA passes as by the fact that we
somehow manage to produce more code out of the C++ FE.
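
The relative differences behind the counts above can be recomputed in a few lines. A Python sketch with the numbers copied from the `wc` and `grep` output; the helper name `pct_more` and the dictionary labels are made up for this illustration:

```python
def pct_more(old_count, new_count):
    """Percentage by which new_count exceeds old_count."""
    return (new_count - old_count) / old_count * 100.0

# (GCC 4.9 count, GCC 5 count), copied from the commands above.
counts = {
    "ssa dump lines":           (299231, 331115),
    "einline dump lines":       (692812, 724090),
    "'Inlining' lines":         (28003, 28685),
    "release_ssa dump lines":   (348378, 365689),
    "release_ssa assignments":  (65028, 72636),
}

for what, (old, new) in counts.items():
    print(f"{what:24s} {pct_more(old, new):5.1f}% more in GCC 5")

# The two "Unit growth for small function inlining" reports, before -> after.
print(f"first report:  {pct_more(88536, 114049):.1f}% growth")
print(f"second report: {pct_more(87943, 97699):.1f}% growth")
```

The unit growth figures come out slightly above the 28% and 11% printed in the dumps, which suggests the dump output drops the fractional part.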

I looked for interesting differences in the SSA dumps.  Here are a few:

-;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8436, symbol_order=127)
+;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)

 int __gthread_active_p() ()
 {
-  bool _1;
-  int _2;
+  static void * const __gthread_active_ptr = (void *) __gthrw_pthread_cancel;
+  void * __gthread_active_ptr.111_2;
+  bool _3;
+  int _4;

   <bb 2>:
-  _1 = __gthrw_pthread_cancel != 0B;
-  _2 = (int) _1;
-  return _2;
+  __gthread_active_ptr.111_2 = __gthread_active_ptr;
+  _3 = __gthread_active_ptr.111_2 != 0B;
+  _4 = (int) _3;
+  return _4;

 }

... this looks like header change, perhaps ...

 ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
 {
-  int _6;
+  int (*__vtbl_ptr_type) () * _2;
+  int _7;

   <bb 2>:
-  this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
-  *this_3(D) ={v} {CLOBBER};
-  _6 = 0;
-  if (_6 != 0)
+  _2 = &_ZTV13ObserverEvent + 16;
+  this_4(D)->_vptr.ObserverEvent = _2;
+  MEM[(struct  &)this_4(D)] ={v} {CLOBBER};
+  _7 = 0;
+  if (_7 != 0)

... extra temporary initializing vtbl pointer. This is repeated many times ...

-;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51649, symbol_order=884)
+;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)

 static Unique::Value_t Unique::get() ()
 {
   Value_t retval;
-  long int next_s.83_2;
-  long int next_s.84_3;
-  long int next_s.85_4;
-  Value_t _7;
+  long int next_s.83_3;
+  long int next_s.84_4;
+  long int next_s.85_5;
+  Value_t _9;

   <bb 2>:
-  Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
-  next_s.83_2 = next_s;
-  next_s.84_3 = next_s.83_2;
-  next_s.85_4 = next_s.84_3 + 1;
-  next_s = next_s.85_4;
-  retval_6 = next_s.84_3;
-  Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
-  _7 = retval_6;
-  return _7;
+  Pooma::DummyMutex::lock (&mutex_s);
+  next_s.83_3 = next_s;
+  next_s.84_4 = next_s.83_3;
+  next_s.85_5 = next_s.84_4 + 1;
+  next_s = next_s.85_5;
+  retval_7 = next_s.84_4;
+  Pooma::DummyMutex::unlock (&mutex_s);
+  _9 = retval_7;
+  return _9;

 }

... here we give up on ISRA....

and we have about twice as much EH:

$ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
4816
$ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
8671

which, however, is optimized out by the time of release_ssa.

Another thing that we may consider cleaning up in the next stage1 is getting rid
of dead stores:

-  MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
-  D.561702 ={v} {CLOBBER};
-  D.561702 ={v} {CLOBBER};
-  MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
-  MEM[(struct allocator *)_2] ={v} {CLOBBER};
-  MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
-  MEM[(struct basic_string *)_2] ={v} {CLOBBER};
-  *_2 ={v} {CLOBBER};
-  *this_1(D) ={v} {CLOBBER};
+  MEM[(struct  &)&D.570046] ={v} {CLOBBER};
+  MEM[(struct  &)&D.570046] ={v} {CLOBBER};
+  D.570046 ={v} {CLOBBER};
+  MEM[(struct  &)_2] ={v} {CLOBBER};
+  MEM[(struct  &)_2] ={v} {CLOBBER};
+  MEM[(struct  &)_2] ={v} {CLOBBER};
+  MEM[(struct  &)_2] ={v} {CLOBBER};
+  MEM[(struct  &)_2] ={v} {CLOBBER};
+  MEM[(struct  &)this_1(D)] ={v} {CLOBBER};

Clobbers are dangerously common. There are 18K clobbers in the release_ssa dump
out of 65K assignments, which makes them 29% of all the code. The number of
clobbers seems to go down only in the tramp3d-v4.ii.166t.ehcleanup dump, and we
still get a lot of redundancies:

  <bb 32>:
  D.581063 ={v} {CLOBBER};
  D.581063 ={v} {CLOBBER};
  D.164155 ={v} {CLOBBER};
  D.164155 ={v} {CLOBBER};
  operator delete [] (begbuf_18);

Why are those not considered dead stores and DCEd out earlier?



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-21 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #11 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Sorry, the number of clobbers drops at DSE1, not during ehcleanup2, I just
messed up my grep. 

I tried the following patch:

Index: passes.def
===================================================================
--- passes.def  (revision 221541)
+++ passes.def  (working copy)
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.
          NEXT_PASS (pass_build_ealias);
          NEXT_PASS (pass_fre);
          NEXT_PASS (pass_merge_phi);
+         NEXT_PASS (pass_dse);
          NEXT_PASS (pass_cd_dce);
          NEXT_PASS (pass_early_ipa_sra);
          NEXT_PASS (pass_tail_recursion);

This brings the number of CLOBBER statements at release_ssa time down to 7392 (50%
reduction).  A nice effect of this patch is that it often simplifies
destructors to empty ones, which makes them more inlinable:

 ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
 {
   <bb 2>:
-  this_2(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
   MEM[(struct  &)this_2(D)] ={v} {CLOBBER};
   return;

saves a lot of the clobbers:
 Engine<3, double, ExpressionTag<UnaryNode<FnNorm, BinaryNode<OpSubtract,
Reference<Field<NoMesh<3>, Vector<3, double, Full>, ViewEngine<3,
IndexFunction<GenericURM<MeshTraits<3, double, UniformRectilinearTag,
CartesianTag, 3> >::PositionsFunctor> > > >, Scalar<Vector<3, double, Full> > >
> > >::~Engine() (struct Engine * const this)
 {
   <bb 2>:
-  MEM[(struct  &)this_2(D) + 32] ={v} {CLOBBER};
-  MEM[(struct  &)this_2(D) + 32] ={v} {CLOBBER};
-  MEM[(struct  &)this_2(D) + 8] ={v} {CLOBBER};
-  MEM[(struct  &)this_2(D) + 8] ={v} {CLOBBER};
-  MEM[(struct  &)this_2(D) + 8] ={v} {CLOBBER};
-  MEM[(struct  &)this_2(D)] ={v} {CLOBBER};
-  MEM[(struct  &)this_2(D)] ={v} {CLOBBER};
-  MEM[(struct  &)this_2(D)] ={v} {CLOBBER};
+  MEM[(struct  &)this_1(D)] ={v} {CLOBBER};
   return;

which is especially nice for LTO streaming.

and saves about 7% of code apparently after inlining:

$ wc -l *copyprop2
200189 tramp3d-v4.ii.085t.copyprop2
$ wc -l ../5/*copyprop2
215060 ../5/tramp3d-v4.ii.084t.copyprop2

Even though the inline decisions do not seem to have changed considerably (at
least on tramp3d).

On an unrelated note, I noticed PR65502.

Still I guess this does not really explain the origin of regression in
statement count relative to 4.9...



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-21 11:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Also the number of statements is about the same at .cfg dump, so it is .ssa
that introduces all the differences. Why?



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: rguenth at gcc dot gnu.org @ 2015-03-24 14:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #10)
> I can re-confirm the 16% compile time regression.  I went through some
> compare.
> 
> $ wc -l *.ssa
> 299231 tramp3d-v4.ii.015t.ssa
> $ wc -l ../5/*.ssa
> 331115 ../5/tramp3d-v4.ii.018t.ssa
> 
> so as a lame compare, we already have 10% more statements to start with.
> Now einline
> 
> $ wc -l *.einline
> 692812 tramp3d-v4.ii.018t.einline
> $ wc -l ../5/*.einline
> 724090 ../5/tramp3d-v4.ii.026t.einline
> 
> so after einline we seem to have 4% statements more, we do about the same
> number of inlining:
> 
> $ grep Inlining tramp3d-v4.ii.*einline | wc -l
> 28003
> $ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
> 28685
> 
> but at release_ssa we still have about 4% more.
> 
> $ wc -l *release_ssa*
> 348378 tramp3d-v4.ii.036t.release_ssa
> $ wc -l ../5/*release_ssa*
> 365689 ../5/tramp3d-v4.ii.043t.release_ssa
> 
> There is no difference in number of functions in ssa and release_ssa dumps. 
> What makes the functions bigger in GCC 5?
> 
> $ grep "^  .* = " *.release_ssa | wc -l
> 65028
> $ grep "^  .* = " ../5/*.release_ssa | wc -l
> 72636
> 
> The number of statements is about the same.
> 
> During the actual inlining GCC 4.9 reports:
>  Unit growth for small function inlining: 88536->114049 (28%)
> and
>  Unit growth for small function inlining: 87943->97699 (11%)
> 
> Statement count seems to remain 7% in .optimized dumps.  So perhaps the
> slowdown is not really that much caused by IPA passes as we somehow manage
> to produce more code out of C++ FE.
> 
> I looked for interesting differences in SSA dump.  Here are few:
> 
> -;; Function int __gthread_active_p() (_ZL18__gthread_active_pv,
> funcdef_no=312, decl_uid=8436, symbol_order=127)
> +;; Function int __gthread_active_p() (_ZL18__gthread_active_pv,
> funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)
>  
>  int __gthread_active_p() ()
>  {
> -  bool _1;
> -  int _2;
> +  static void * const __gthread_active_ptr = (void *)
> __gthrw_pthread_cancel;
> +  void * __gthread_active_ptr.111_2;
> +  bool _3;
> +  int _4;
>  
>    <bb 2>:
> -  _1 = __gthrw_pthread_cancel != 0B;
> -  _2 = (int) _1;
> -  return _2;
> +  __gthread_active_ptr.111_2 = __gthread_active_ptr;
> +  _3 = __gthread_active_ptr.111_2 != 0B;
> +  _4 = (int) _3;
> +  return _4;
>  
>  }
> 
> ... this looks like header change, perhaps ...

Yep.  __gthrw_pthread_cancel is a function pointer (thus constant)
while __gthread_active_ptr is a global variable.

>  ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
>  {
> -  int _6;
> +  int (*__vtbl_ptr_type) () * _2;
> +  int _7;
>  
>    <bb 2>:
> -  this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
> -  *this_3(D) ={v} {CLOBBER};
> -  _6 = 0;
> -  if (_6 != 0)
> +  _2 = &_ZTV13ObserverEvent + 16;
> +  this_4(D)->_vptr.ObserverEvent = _2;
> +  MEM[(struct  &)this_4(D)] ={v} {CLOBBER};
> +  _7 = 0;
> +  if (_7 != 0)
> 
> ... extra temporary initializing vtbl pointer. This is repeated many times
> ...

This is because of

2015-03-20  Richard Biener  <rguenther@suse.de>

        PR middle-end/64715
...
        * gimplify.c (gimplify_expr): Remove premature folding of
        &X + CST to &MEM[&X, CST].

thus relatively recent.  It will be fixed up by ccp1.

> -;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv,
> funcdef_no=3030, decl_uid=51649, symbol_order=884)
> +;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv,
> funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)
>  
>  static Unique::Value_t Unique::get() ()
>  {
>    Value_t retval;
> -  long int next_s.83_2;
> -  long int next_s.84_3;
> -  long int next_s.85_4;
> -  Value_t _7;
> +  long int next_s.83_3;
> +  long int next_s.84_4;
> +  long int next_s.85_5;
> +  Value_t _9;
>  
>    <bb 2>:
> -  Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
> -  next_s.83_2 = next_s;
> -  next_s.84_3 = next_s.83_2;
> -  next_s.85_4 = next_s.84_3 + 1;
> -  next_s = next_s.85_4;
> -  retval_6 = next_s.84_3;
> -  Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
> -  _7 = retval_6;
> -  return _7;
> +  Pooma::DummyMutex::lock (&mutex_s);
> +  next_s.83_3 = next_s;
> +  next_s.84_4 = next_s.83_3;
> +  next_s.85_5 = next_s.84_4 + 1;
> +  next_s = next_s.85_5;
> +  retval_7 = next_s.84_4;
> +  Pooma::DummyMutex::unlock (&mutex_s);
> +  _9 = retval_7;
> +  return _9;
>  
>  }
> 
> ... here we give up on ISRA....

I believe because of

2015-02-13  Ilya Enkovich  <ilya.enkovich@intel.com>

        PR tree-optimization/65002
        * tree-cfg.c (pass_data_fixup_cfg): Don't update
        SSA on start.
        * tree-sra.c (some_callers_have_no_vuse_p): New.
        (ipa_early_sra): Reject functions whose callers
        assume function is read only.

or related changes.

> and we have about twice as much EH:
> 
> $ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
> 4816
> $ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
> 8671
> 
> which however is optimized out at a time of release_ssa.

That's maybe because we emit more CLOBBERs initially (do we?)

> Another thing that we may consider to cleanup in next stage1 is to get rid
> of dead stores:
> 
> -  MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
> -  D.561702 ={v} {CLOBBER};
> -  D.561702 ={v} {CLOBBER};
> -  MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
> -  MEM[(struct allocator *)_2] ={v} {CLOBBER};
> -  MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
> -  MEM[(struct basic_string *)_2] ={v} {CLOBBER};
> -  *_2 ={v} {CLOBBER};
> -  *this_1(D) ={v} {CLOBBER};
> +  MEM[(struct  &)&D.570046] ={v} {CLOBBER};
> +  MEM[(struct  &)&D.570046] ={v} {CLOBBER};
> +  D.570046 ={v} {CLOBBER};
> +  MEM[(struct  &)_2] ={v} {CLOBBER};
> +  MEM[(struct  &)_2] ={v} {CLOBBER};
> +  MEM[(struct  &)_2] ={v} {CLOBBER};
> +  MEM[(struct  &)_2] ={v} {CLOBBER};
> +  MEM[(struct  &)_2] ={v} {CLOBBER};
> +  MEM[(struct  &)this_1(D)] ={v} {CLOBBER};
> 
> Clobbers are dangerously common. There are 18K clobbers in release_ssa dump
> out of 65K assignments, that makes them to be 29% of all the code. The
> number of clobbers seems to go down only in tramp3d-v4.ii.166t.ehcleanup
> dump and we still get a lot of redundancies:

Yeah, well ... :/  I've already taught DCE to get rid of the really useless
ones...

>   <bb 32>:
>   D.581063 ={v} {CLOBBER};
>   D.581063 ={v} {CLOBBER};
>   D.164155 ={v} {CLOBBER};
>   D.164155 ={v} {CLOBBER};
>   operator delete [] (begbuf_18);
> 
> Why those are not considered a dead stores and DCEed out earlier?

Dead clobbers, you mean?  Well, they are only "dead" if there are no
uses/defs of their LHS dominating them.
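
To make the "repeated clobber" case from the <bb 32> dump concrete, here is a
minimal sketch of a block-local cleanup. It is a toy model, not GCC's DCE or
DSE: the Stmt type and drop_repeated_clobbers are hypothetical names, every
statement is assumed to name at most one object, and aliasing between
different MEM[...] references is not modeled at all.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy statement: either a clobber of `object`, or some other statement that
// reads or writes `object` (empty if it touches no tracked object).
enum class Kind { Clobber, Other };

struct Stmt {
    Kind kind;
    std::string object;
};

// Drop a clobber when the last statement touching the same object in this
// block was also a clobber. Hypothetical helper, not a GCC function.
std::vector<Stmt> drop_repeated_clobbers(const std::vector<Stmt>& block) {
    std::set<std::string> just_clobbered;
    std::vector<Stmt> out;
    for (const Stmt& s : block) {
        if (s.kind == Kind::Clobber) {
            assert(!s.object.empty());
            // insert() reports whether the object was newly added.
            if (!just_clobbered.insert(s.object).second)
                continue;  // repeated clobber, nothing touched it in between
        } else if (!s.object.empty()) {
            // A use or def in between makes a later clobber meaningful again.
            just_clobbered.erase(s.object);
        }
        out.push_back(s);
    }
    return out;
}
```

Fed the five statements of <bb 32> above, this keeps one clobber each for
D.581063 and D.164155 plus the call. A clobber that follows a use or def of
the same object is kept.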



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: jakub at gcc dot gnu.org @ 2015-03-24 17:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The only major bump in the length of the *.ssa dump (from 304028 to 336849) is
in r217125 aka the chkp stuff.



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-25  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The chkp stuff is IMO a bit problematic. I was thinking about cutting the
optimization queue but was always hesitant to do so because of the cache
locality and other implications. I am not sure if that was considered with the
chkp approach and if the split is really needed. Not something to track for
GCC 5 though.

The slowdown
http://gcc.opensuse.org/c++bench-frescobaldi/tramp3d/split-build.html
is quite gradual, while the code size jump
http://gcc.opensuse.org/c++bench-frescobaldi/tramp3d/
happened at one point.  I looked once into the code size with Jakub and part of
that seems to be due to unwind info no longer using cfi directives with older
versions of gas. Part is the new heuristics.

I am still hoping to understand the code size part better. To get performance
back, however, we probably need to look for several little factors contributing
to the slowdown.


* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at ucw dot cz @ 2015-03-25  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #17 from Jan Hubicka <hubicka at ucw dot cz> ---
> > Even though the inline decisions does not seem to be changed considerably
> > (at least on tramp3d).
> 
> Yeah, clobbers don't account for anything for size/inline estimates
> (well, I hope so!).

Yep, they are ignored.
> 
> And yes, doing DSE early is quite an old idea...  we should revisit it
> next stage1.

I have also run into it several times.  We should not forget this time ;)

Honza



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: rguenther at suse dot de @ 2015-03-25  8:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 25 Mar 2015, hubicka at ucw dot cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
> 
> --- Comment #17 from Jan Hubicka <hubicka at ucw dot cz> ---
> > > Even though the inline decisions does not seem to be changed considerably
> > > (at least on tramp3d).
> > 
> > Yeah, clobbers don't account for anything for size/inline estimates
> > (well, I hope so!).
> 
> Yep, they are ignored.

Btw, we do indeed generate more clobbers with GCC 5:

> grep CLOBBER a/tramp3d-v4.ii.015t.ssa | wc -l
8937
> grep CLOBBER b/tramp3d-v4.ii.018t.ssa | wc -l
12636

which is odd as we now have new code that replaces clobbers with
SSA default defs...

> grep CLOBBER a/tramp3d-v4.ii.010t.eh | wc -l
13012
> grep CLOBBER b/tramp3d-v4.ii.010t.eh | wc -l
13012
> grep CLOBBER a/tramp3d-v4.ii.011t.cfg | wc -l
12827
> grep CLOBBER b/tramp3d-v4.ii.011t.cfg | wc -l
12827

so we manage to drop a lot of them during into-SSA with GCC 4.9 but
fail to do that with GCC 5.  Which would mean we write less vars
into SSA form(?)

> grep '_[0-9]* =' a/tramp3d-v4.ii.015t.ssa | wc -l
46055
> grep '_[0-9]*(D)' a/tramp3d-v4.ii.015t.ssa | wc -l
30839

vs.

> grep '_[0-9]* =' b/tramp3d-v4.ii.018t.ssa | wc -l
45543
> grep '_[0-9]*(D)' b/tramp3d-v4.ii.018t.ssa | wc -l
33782

Possibly in 4.9 local_pure_const runs for a function's
callees before we fix up its CFG and write it into SSA form.  That might
cause a significant cleanup of EH before entering into-SSA for 4.9
(as callees might become nothrow).  At least that's now my theory.

I wonder if it makes sense to run local + IPA pure-const as we now
do "IPA into-SSA".  So something like

Index: gcc/passes.def
===================================================================
--- gcc/passes.def      (revision 221633)
+++ gcc/passes.def      (working copy)
@@ -58,7 +58,9 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_build_ssa);
       NEXT_PASS (pass_ubsan);
       NEXT_PASS (pass_early_warn_uninitialized);
+         NEXT_PASS (pass_local_pure_const);
   POP_INSERT_PASSES ()
+  NEXT_PASS (pass_ipa_pure_const);

   NEXT_PASS (pass_chkp_instrumentation_passes);
   PUSH_INSERT_PASSES_WITHIN (pass_chkp_instrumentation_passes)

(which likely doesn't work exactly like that, of course).  Or
keep doing local_pure_const only and make sure to process
pass_build_ssa_passes in a proper order.  Of course the real
power of local-pure-const only arrives when done after local
optimizations...

But I'd say the compile-time effect clearly comes from going into SSA
before cleaning up EH significantly.

> > And yes, doing DSE early is quite an old idea...  we should revisit it
> > next stage1.
> 
> I also run into it several time.  We should not forget this time ;)

;)



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-25 21:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #20 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks, Richard. I will take a look at the bootstrap ICE.  Running one extra
local pure-const should be cheap enough and dropping large regions of EH code
seems like a win.

Concerning the inline decision difference, perhaps you can have more insight. I
typically use tramp3d as a black box to test heuristic changes - it is almost
impossible for me to figure out why inlining decisions went wrong on this beast
;))

Why exactly can the chkp pass not run as part of early opts?



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-26  3:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #21 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Actually, looking at the code, I do not think we want the full pure/const pass
(which builds loops and attempts to prove finiteness). We only want to run
nothrow discovery, which is a lot cheaper, and perhaps we want to do that already
during the lowering passes - it does not care much about into_ssa.

We also may want to release ssa names after the ssa construction in case it
leads to a considerable amount of DCE.
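
As a rough illustration of why nothrow-only discovery is cheap, here is a toy
sketch. It is not GCC's pass: the Function type and discover_nothrow are
hypothetical, the call graph is given as plain indices, and calls wrapped in
handlers that catch everything are not modeled. A function may throw if its
own body may throw or if any callee may throw, and the loop simply repeats
until nothing changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a function in a call graph.
struct Function {
    bool has_throwing_stmt;            // body may throw by itself
    std::vector<std::size_t> callees;  // indices of directly called functions
};

// Returns, per function, whether it can be marked nothrow in this toy model.
std::vector<bool> discover_nothrow(const std::vector<Function>& fns) {
    const std::size_t n = fns.size();
    std::vector<bool> may_throw(n);
    for (std::size_t i = 0; i < n; ++i)
        may_throw[i] = fns[i].has_throwing_stmt;

    // Propagate "may throw" from callees to callers until a fixed point.
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < n; ++i) {
            if (may_throw[i])
                continue;
            for (std::size_t c : fns[i].callees) {
                assert(c < n);
                if (may_throw[c]) {
                    may_throw[i] = true;
                    changed = true;
                    break;
                }
            }
        }
    }

    std::vector<bool> nothrow(n);
    for (std::size_t i = 0; i < n; ++i)
        nothrow[i] = !may_throw[i];
    return nothrow;
}
```

No loop analysis or finiteness proof is involved. Visiting callees before
callers, as discussed above for 4.9, would let the loop settle in a single
sweep for acyclic call graphs.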



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-27  4:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-03-27
     Ever confirmed|0                           |1

--- Comment #22 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With the nothrow pass added to the into-ssa queue, I now get the same statement
counts after into_ssa as in 4.9.

At release_ssa time, the statement count is 4% higher (down from 7%).  This is
due to the change of the early-inlining-insns parameter; using --param
early-inlining-insns=11 makes the instruction count actually 2% lower than for
4.9.

A curious fact is that mainline does 4158 early inlines; reducing
early-inlining-insns to 11 increases the number of inlines to 4695, while 4.9
does 4843. So it seems that more early inlining somehow blocks itself later.
(In addition to the early inlining limit bump, the early inliner now uses
predicates to anticipate DCE.)

Statement count in the .optimized dump is 7% up, and with --param
early-inlining-insns=11 it is 3% up. Instruction count is 9% up.

Code segment is now 547066 compared to 4.9's 490807 (11% up).
Early-inlining-insns has minimal effect.

I will check if I can try to guide the inliner to get more out-of-line functions
removed.


* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-27  6:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #23 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Also with early-inlining-insns=11 the statement count is smaller for mainline
(compared to 4.9) until the bswap pass; it grows in PRE (by about 1%) and
then it continues growing with subsequent passes.  So it seems we make them
considerably more busy...



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: rguenth at gcc dot gnu.org @ 2015-03-27  9:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
So how is the compile-time regression now?



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-28 22:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #27 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Unfortunately for me it went down from about 18% to 15%, so still a
regression. One quantitative parameter I can measure is the increase in the
number of functions in the resulting binary from 1030 to 1140. I will try to
double check if that is a bug in unreachable code removal or a property of the
new inliner metrics preferring fewer full inlines.



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: trippels at gcc dot gnu.org @ 2015-03-29 14:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #28 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
Yes, it is still 15% on my machine, too:

markus@x4 ~ % time g++ -w -Ofast tramp3d-v4.cpp
g++ -w -Ofast tramp3d-v4.cpp  25.45s user 0.33s system 99% cpu 25.832 total

(At least this is still faster than clang:
markus@x4 ~ % time clang++ -w -Ofast tramp3d-v4.cpp
clang++ -w -Ofast tramp3d-v4.cpp  31.03s user 0.15s system 100% cpu 31.183
total)



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-30  6:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #29 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I also tested with -Os and compile times seem about the same as for 4.9, modulo
noise.

The following one-liner brings the instruction and function counts in the final
binary to the same as in 4.9:
Index: ipa-inline.c
===================================================================
--- ipa-inline.c        (revision 221757)
+++ ipa-inline.c        (working copy)
@@ -1099,7 +1099,7 @@ edge_badness (struct cgraph_edge *edge,
        numerator = numerator >> 11;
       denominator = growth;
       if (callee_info->growth > 0)
-       denominator *= callee_info->growth;
+       denominator *= callee_info->growth * callee_info->growth;

       badness = - numerator / denominator;



I think this makes a lot of sense - it somehow makes the growth of a given call
more important than the growth for inlining everything, and it also makes the
inliner prefer small growth more strongly, so I think I will commit it after
testing, with a plan for quick reversal if some benchmarks decrease
significantly. (In fact I have had that patch scheduled for benchmarking for a
couple of months.)

I am testing it on my copy of the C++ suite and will try to run Firefox
benchmarks and look for off-noise differences.
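
To show the arithmetic of the one-liner in isolation, here is a small sketch
with made-up numbers. It is not the real edge_badness: the Edge struct and both
function names are hypothetical, the numerator is assumed to be given, and
doubles are used only to keep the example short.

```cpp
#include <cassert>

// Simplified stand-ins for the values used in the patched branch.
struct Edge {
    double numerator;      // benefit estimate (assumed given)
    double growth;         // stands in for `growth` in the patch
    double callee_growth;  // stands in for `callee_info->growth`
};

// Badness before the patch: denominator scaled by callee growth once.
double badness_before(const Edge& e) {
    assert(e.growth > 0);
    double denominator = e.growth;
    if (e.callee_growth > 0)
        denominator *= e.callee_growth;
    return -e.numerator / denominator;
}

// Badness after the patch: denominator scaled by callee growth squared.
double badness_after(const Edge& e) {
    assert(e.growth > 0);
    double denominator = e.growth;
    if (e.callee_growth > 0)
        denominator *= e.callee_growth * e.callee_growth;
    return -e.numerator / denominator;
}
```

Take two hypothetical edges a = {1000, 10, 2} and b = {1000, 3, 5}. Before the
patch a scores -50 and b about -66.7; after it a scores -25 and b about -13.3.
Assuming the edge with the lowest badness is considered first, the patch flips
the order in favour of the edge whose callee_info->growth is small.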



* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
From: hubicka at gcc dot gnu.org @ 2015-03-30  6:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #30 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Author: hubicka
Date: Mon Mar 30 02:00:56 2015
New Revision: 221769

URL: https://gcc.gnu.org/viewcvs?rev=221769&root=gcc&view=rev
Log:

    PR ipa/65076
    * ipa-inline.c (edge_badness): Base denominator on callee's
    growth squared.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline.c


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2015-03-30  6:15 ` hubicka at gcc dot gnu.org
@ 2015-03-31 13:03 ` rguenth at gcc dot gnu.org
  2015-03-31 13:09 ` trippels at gcc dot gnu.org
                   ` (24 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-31 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
No negative effects seen.  Update on the regression?  P3->P1 before willfully
downgrading later...


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2015-03-31 13:03 ` rguenth at gcc dot gnu.org
@ 2015-03-31 13:09 ` trippels at gcc dot gnu.org
  2015-03-31 14:10 ` rguenther at suse dot de
                   ` (23 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-03-31 13:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #32 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #31)
> No negative effects seen.  Update on the regression?  P3->P1 before
> willfully downgrading later...

It depends on the target machine. On amdfam10 it is still 12%,
on sandybridge it is less than 10%.
But tramp3d-v4.cpp compiled with gcc-5 runs 3.5% faster thanks to -ipa-icf.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2015-03-31 13:09 ` trippels at gcc dot gnu.org
@ 2015-03-31 14:10 ` rguenther at suse dot de
  2015-03-31 14:13 ` trippels at gcc dot gnu.org
                   ` (22 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenther at suse dot de @ 2015-03-31 14:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #33 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Mar 2015, trippels at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
> 
> --- Comment #32 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #31)
> > No negative effects seen.  Update on the regression?  P3->P1 before
> > willfully downgrading later...
> 
> It depends on the target machine. On amdfam10 it is still 12%,
> on sandybridge it is less than 10%.
> But tramp3d-v4.cpp compiled with gcc-5 runs 3.5% faster thanks to -ipa-icf.

Ok, I'd say we should disable new passes for a fair comparison - is
using -fno-ipa-icf significantly better?

10% is a lot (IMHO).  I wonder how compile-time evolved on other
code-bases (I see ~3% on the CSiBE compile for example, at -Os).

How does -O2 compile-time compare?


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2015-03-31 14:10 ` rguenther at suse dot de
@ 2015-03-31 14:13 ` trippels at gcc dot gnu.org
  2015-03-31 14:47 ` trippels at gcc dot gnu.org
                   ` (21 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-03-31 14:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #34 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #33)
> On Tue, 31 Mar 2015, trippels at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
> > 
> > --- Comment #32 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #31)
> > > No negative effects seen.  Update on the regression?  P3->P1 before
> > > willfully downgrading later...
> > 
> > It depends on the target machine. On amdfam10 it is still 12%,
> > on sandybridge it is less than 10%.
> > But tramp3d-v4.cpp compiled with gcc-5 runs 3.5% faster thanks to -ipa-icf.
> 
> Ok, I'd say we should disable new passes for a fair comparison - is
> using -fno-ipa-icf significantly better?

No, it is in the noise.

> 10% is a lot (IMHO).  I wonder how compile-time evolved on other
> code-bases (I see ~3% on the CSiBE compile for example, at -Os).
> 
> How does -O2 compile-time compare?

amdfam10:    23.721 vs. 21.060 = 11.2179%
sandybridge: 16.978 vs. 15.360 = 9.52998%
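
The percentages above match the time difference divided by the larger of the
two timings. The following Python sketch reproduces them and also shows the
same difference relative to the smaller timing. It assumes that the first
number in each pair is the GCC 5 time and the second the 4.9 time, which the
comment does not state explicitly; the function names are illustrative.

```python
def saving_vs_new(new, old):
    """Time difference as a percentage of the slower (assumed GCC 5) time."""
    return (new - old) / new * 100

def slowdown_vs_old(new, old):
    """Time difference as a percentage of the faster (assumed 4.9) time."""
    return (new - old) / old * 100

# Seconds, copied from the -O2 timings above.
timings = {"amdfam10": (23.721, 21.060), "sandybridge": (16.978, 15.360)}

for machine, (new, old) in timings.items():
    print(f"{machine}: {saving_vs_new(new, old):.4f}% of the new time, "
          f"{slowdown_vs_old(new, old):.4f}% over the old time")
```

Measured against the smaller timing, the same differences come out at about
12.6% and 10.5%.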


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2015-03-31 14:13 ` trippels at gcc dot gnu.org
@ 2015-03-31 14:47 ` trippels at gcc dot gnu.org
  2015-03-31 15:08 ` trippels at gcc dot gnu.org
                   ` (20 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-03-31 14:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #35 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
POWER8     : 23.424 vs. 20.676 = 11.7316%

Firefox LTO buildtime on ppc64le:  5:18.12 total vs. 4:48.85 total = 6.25%


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2015-03-31 14:47 ` trippels at gcc dot gnu.org
@ 2015-03-31 15:08 ` trippels at gcc dot gnu.org
  2015-03-31 15:10 ` evstupac at gmail dot com
                   ` (19 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-03-31 15:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #36 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
(In reply to Markus Trippelsdorf from comment #35)
> Firefox LTO buildtime on ppc64le:  5:18.12 total vs. 4:48.85 total = 6.25%

Please ignore the Firefox buildtime comparison. It was a measuring error.
Build times are roughly the same with 5 vs 4.9.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2015-03-31 15:08 ` trippels at gcc dot gnu.org
@ 2015-03-31 15:10 ` evstupac at gmail dot com
  2015-03-31 15:29 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: evstupac at gmail dot com @ 2015-03-31 15:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #37 from Stupachenko Evgeny <evstupac at gmail dot com> ---
(In reply to Richard Biener from comment #31)
> No negative effects seen.  Update on the regression?  P3->P1 before
> willfully downgrading later...

Compiled with "-Ofast -flto -funroll-loops -m32" and corresponding "-march="

spec2000 252.eon performance regressed by 5%, 197.parser by 2%.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2015-03-31 15:10 ` evstupac at gmail dot com
@ 2015-03-31 15:29 ` rguenth at gcc dot gnu.org
  2015-03-31 16:06 ` hubicka at ucw dot cz
                   ` (17 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-31 15:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #38 from Richard Biener <rguenth at gcc dot gnu.org> ---
It looks like compile time with -Dleafify=flatten is basically unchanged.  So
it is definitely different inlining decisions for tramp3d-v4.cpp.  Maybe we
inline a lot more early now (due to early-insn param change?) and thus push
more stmts through the early pipeline before we get rid of the bodies via IPA
inlining?  That -Os behaves sanely hints at that (we ignore
early-inlining-insns there).  Though the base for inline-unit-growth is the
size after early inlining, and the former was dropped to 15%...

Would be nice to have 'phase opt and generate' split into lowering, early
opts, IPA, late opts and RTL phase.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2015-03-31 15:29 ` rguenth at gcc dot gnu.org
@ 2015-03-31 16:06 ` hubicka at ucw dot cz
  2015-03-31 16:25 ` hubicka at gcc dot gnu.org
                   ` (16 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at ucw dot cz @ 2015-03-31 16:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #39 from Jan Hubicka <hubicka at ucw dot cz> ---
Hi, yep, -Os or flatten is unchanged. It seems something regressed in the -O3
inline decisions, but it is somewhat hard to pinpoint.  I am on my way to
Victoria, so I will do more only tonight.

https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01615.html
makes it possible to get a comparable binary with the old early inlining
limits and unit growth
(--param early-inlining-insns=11 --param inline-unit-growth=30). But the
increased early inlining limits do not really seem to lead to more inlines for
tramp3d.
Honza


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2015-03-31 16:06 ` hubicka at ucw dot cz
@ 2015-03-31 16:25 ` hubicka at gcc dot gnu.org
  2015-03-31 17:36 ` hubicka at gcc dot gnu.org
                   ` (15 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-31 16:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #40 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
-O3 graph http://gcc.opensuse.org/c++bench/tramp3d/split-build.html seems to
show 3 bigger increases recently. Can we get the revisions for those?


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2015-03-31 16:25 ` hubicka at gcc dot gnu.org
@ 2015-03-31 17:36 ` hubicka at gcc dot gnu.org
  2015-03-31 17:53 ` hubicka at gcc dot gnu.org
                   ` (14 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-31 17:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #41 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
OK. I can actually look it up in raw files.

It is: 69s->80s between


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (36 preceding siblings ...)
  2015-03-31 17:36 ` hubicka at gcc dot gnu.org
@ 2015-03-31 17:53 ` hubicka at gcc dot gnu.org
  2015-03-31 17:54 ` hubicka at gcc dot gnu.org
                   ` (13 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-31 17:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #42 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Sorry, accidental message.

It is 69->80.5s between 141127.61083 and 150113.26056 (tester was down)
      66->69s between 141123.15275 and 141124.01653
      60->64 between 140807.80282 and 140808.66762

Now the other tester shows 
      59->50 between 150112.41636 and 150113.13858
      51->53 between 141124.14999 and 141123.43031
      45->48 between 140807.01584 and 140808.72560

Can we, please, restart the frescobaldi tester? It has been down since the 23rd.

There is an inline change
2015-01-12  Jan Hubicka  <hubicka@ucw.cz>

        PR ipa/63967
        PR ipa/64425
        * ipa-inline.c (compute_uninlined_call_time,
        compute_inlined_call_time): Use counts for extra precision when
        needed possible.
        (big_speedup_p): Fix formating.
        (RELATIVE_TIME_BENEFIT_RANGE): Remove.
        (relative_time_benefit): Remove.
        (edge_badness): Turn DECL_DISREGARD_INLINE_LIMITS into hint;
        merge guessed and read profile paths.
        (inline_small_functions): Count only !optimize_size functions into
        initial size; be more lax about sanity check when profile is used;
        be sure to update inlined function profile when profile is read.

2015-01-12  Jan Hubicka  <hubicka@ucw.cz>

        PR ipa/63470
        * ipa-inline-analysis.c (inline_edge_duplication_hook): Adjust
        cost when edge becomes direct.
        * ipa-prop.c (make_edge_direct): Do not adjust when speculation
        is resolved or when introducing new speculation.

2014-11-22  Jan Hubicka  <hubicka@ucw.cz>

        PR ipa/63671
        * ipa-inline-transform.c (can_remove_node_now_p_1): Handle alises
        and -fno-devirtualize more carefully.
        (can_remove_node_now_p): Update.

(it really is 24th)

2014-08-07  Jan Hubicka  <hubicka@ucw.cz>

        * ipa-devirt.c: Include gimple-pretty-print.h
        (referenced_from_vtable_p): Exclude DECL_EXTERNAL from
        further tests.
        (decl_maybe_in_construction_p): Fix conditional on cdtor check
        (get_polymorphic_call_info): Fix return value
        (type_change_info): New sturcture based on ipa-prop
        variant.
        (noncall_stmt_may_be_vtbl_ptr_store): New predicate
        based on ipa-prop variant.
        (extr_type_from_vtbl_ptr_store): New function
        based on ipa-prop variant.
        (record_known_type): New function.
        (check_stmt_for_type_change): New function.
        (get_dynamic_type): New function.
        * ipa-prop.c (ipa_analyze_call_uses): Use get_dynamic_type.
        * tree-ssa-pre.c: ipa-utils.h
        (eliminate_dom_walker::before_dom_children): Use ipa-devirt
        machinery; sanity check with ipa-prop devirtualization.
        * trans-mem.c (ipa_tm_insert_gettmclone_call): Clear
        polymorphic flag.

The last one is an ipa-cp change, or perhaps get_dynamic_type is expensive (it
has some chance of that). I will add a timevar for it.

The first two changes are not what I would expect. Perhaps checking these is a
good idea.  It may be something else committed the same day.
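
To relate the tester run identifiers to the ChangeLog dates, the sketch below
decodes them and computes the size of each jump on the first tester. It
assumes that the part before the dot is a YYMMDD date, which agrees with the
ChangeLog dates quoted above but is not stated anywhere in this thread; the
part after the dot is ignored because its meaning is not given. The helper
names are illustrative.

```python
from datetime import date

def run_date(run_id):
    """Decode the assumed YYMMDD prefix of a run id such as '141124.01653'."""
    stamp = run_id.split(".")[0]
    return date(2000 + int(stamp[:2]), int(stamp[2:4]), int(stamp[4:6]))

def jump_percent(before, after):
    """Increase from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100

# (seconds before, seconds after, earlier run, later run) for the first tester
jumps = [
    (69, 80.5, "141127.61083", "150113.26056"),
    (66, 69, "141123.15275", "141124.01653"),
    (60, 64, "140807.80282", "140808.66762"),
]

for before, after, earlier, later in jumps:
    print(f"{run_date(earlier)} .. {run_date(later)}: "
          f"{before} -> {after} s (+{jump_percent(before, after):.1f}%)")
```

Under that assumption the three jumps are roughly 16.7%, 4.5% and 6.7%, and
the middle one ends on 2014-11-24, matching the "(it really is 24th)" remark.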


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (37 preceding siblings ...)
  2015-03-31 17:53 ` hubicka at gcc dot gnu.org
@ 2015-03-31 17:54 ` hubicka at gcc dot gnu.org
  2015-03-31 20:31 ` trippels at gcc dot gnu.org
                   ` (12 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-31 17:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #43 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Markus, can you reproduce some consistent growth in -ftime-report for one of
the passes? Given that the code size difference is solved (please try to
double check that, we may have slightly different revisions of tramp3d), it
really seems that one of the passes simply regressed.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (38 preceding siblings ...)
  2015-03-31 17:54 ` hubicka at gcc dot gnu.org
@ 2015-03-31 20:31 ` trippels at gcc dot gnu.org
  2015-04-01  8:02 ` rguenther at suse dot de
                   ` (11 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-03-31 20:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #44 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #43)
> Markus, can you reproduce some consistent growth in -ftime-report for one of
> passes? Given that code size difference is solved (please try to double
> check that, we may have slightly different revisions of tramp3d), it really
> seems that one of the passes simply regress.

As Richard wrote in comment 38, it is "phase opt and generate" that regresses
for me.
The code size difference is OK now (1% bigger on x86, 2% smaller on ppc64).

And I pointed to r219452 in comment 0, but haven't double-checked yet.
(r219452 is:
2015-01-12  Jan Hubicka  <hubicka@ucw.cz>

        PR ipa/63967
        PR ipa/64425
        * ipa-inline.c (compute_uninlined_call_time,
...)


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (39 preceding siblings ...)
  2015-03-31 20:31 ` trippels at gcc dot gnu.org
@ 2015-04-01  8:02 ` rguenther at suse dot de
  2015-04-01  8:05 ` rguenther at suse dot de
                   ` (10 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenther at suse dot de @ 2015-04-01  8:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #47 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Mar 2015, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
> 
> --- Comment #42 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> Sorry, accidental message.
> 
> It is 69->80.5s between 141127.61083 and 150113.26056 (tester was down)
>       66->69s between 141123.15275 and 141124.01653
>       60->64 between 140807.80282 and 140808.66762
> 
> Now the other tester shows 
>       59->50 between 150112.41636 and 150113.13858
>       51->53 between 141124.14999 and 141123.43031
>       45->48 between 140807.01584 and 140808.72560
> 
> Can we, please, restart the frescobaldi tester? It is down sine 23rd.

Not sure what happened - trying to investigate.

Btw, terbium usually shows smoother graphs (ok, it's Itanic...)
(http://gcc.opensuse.org/c++bench-terbium/tramp3d/split-build.html)

I'll see whether I can find the time to set up C++ testing on czerny.

For terbium it's

  114 -> 124 between 141030.96758 (r216126) and 141211.32597 (r218621) (tester was down)
  123 -> 128 between 141228.22340 (r219074) and 141228.90737 (r219088)
  129 -> 160 between 150112.86585 (r219449) and 150113.55305 (r219508)
  162 -> 139 between 150119.02405 (r219836) and 150120.70780 (r219878)
  138 -> 127 between 150330.74477 (r221762) and 150330.46734 (r221770)
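
The revision windows in the terbium list can be checked against revisions
named elsewhere in this PR. The Python sketch below is only a lookup over the
numbers listed above; the function name is illustrative, and each window is
treated as (earlier revision, later revision], which is an assumption about
how the tester snapshots map to revisions.

```python
# (seconds before, seconds after, earlier revision, later revision),
# copied from the terbium list above
ranges = [
    (114, 124, 216126, 218621),
    (123, 128, 219074, 219088),
    (129, 160, 219449, 219508),
    (162, 139, 219836, 219878),
    (138, 127, 221762, 221770),
]

def ranges_containing(revision):
    """Return the timing changes whose window (low, high] contains `revision`."""
    return [r for r in ranges if r[2] < revision <= r[3]]

print(ranges_containing(219452))  # revision suspected in the original report
print(ranges_containing(221769))  # growth-squared commit from comment #30
```

Under that reading, r219452 falls inside the 129 -> 160 window and r221769
inside the 138 -> 127 window. Each window spans several revisions, so this
narrows the search but does not identify a single commit.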


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (40 preceding siblings ...)
  2015-04-01  8:02 ` rguenther at suse dot de
@ 2015-04-01  8:05 ` rguenther at suse dot de
  2015-04-01  8:28 ` hubicka at gcc dot gnu.org
                   ` (9 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenther at suse dot de @ 2015-04-01  8:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #48 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Mar 2015, trippels at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
> 
> --- Comment #46 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
> (In reply to Jan Hubicka from comment #45)
> > > Like Richard wrote in comment 38 it is "phase opt and generate" that regresses
> >
> > Yes, but is it regression because of one specific pass shown later or is it
> > just a cumulative  effect of many little slowdown?
> 
> Nothing pops into the eye, so it must be the cumulative effect.

Maybe we regressed optimizing GCC itself?  (Does compiling gcc 5 with
gcc 4.9, without bootstrapping, improve things?)

It could also be that replacing more libiberty htabs with hash_tables
(the GCed ones) and replacing pointer-set/map made things slower.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (41 preceding siblings ...)
  2015-04-01  8:05 ` rguenther at suse dot de
@ 2015-04-01  8:28 ` hubicka at gcc dot gnu.org
  2015-04-01  8:34 ` trippels at gcc dot gnu.org
                   ` (8 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-04-01  8:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #49 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I did some experiments on the increase of early-inlining-insns:

 - Early optimizers of both 4.9 and mainline process 9819 functions.
 - At release_ssa time, the statement count is 8%
 - At ipa-cp, we have 9% fewer functions on mainline (so inlining helps).
 - At copyrename2 time, GCC 4.9 has 2% more statements and the same number of
   functions. The difference drops to 1% at .optimized time.

I do not think pure statement count explains the problem: early optimization
is a small part of the queue. Also, early-inlining-insns does not have all
that much effect on GCC 4.9 (26.1s -> 16.9s or noise).

A difference may be in the fact that the original metric used relative time
benefits, computed as estimated time saved over estimated time for executing
both caller and callee. That metric drops to low values when the caller is
huge. The new metric does not have this property and does not consider it a
bad idea to inline into huge callers as long as time seems to improve
measurably.  I suppose it may account for the overall slowdown, as we get
large functions more often.

This seems supported by the fact that mainline hits the large-function-growth
limit 285 times (about 9% of all functions output), while 4.9 hits it 7 times.

I am also seeing some issues with Firefox and the new JavaScript interpreter.
It seems that the current limit of inline-unit-growth (reduced from 30% to
15%) is too small for new Firefox trees, and there is a very good improvement
from increasing it back to 30%.  This, of course, makes this PR worse.

I have a patch to re-implement the original badness metric in the current
tree; let's see.
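
The property described above, a relative benefit that shrinks as the caller
grows, can be seen in a small Python sketch. It follows only the prose
description (time saved divided by the time of caller plus callee) and is not
the formula GCC 4.9 actually used; the function name and all numbers are made
up.

```python
def relative_time_benefit(time_saved, caller_time, callee_time):
    """Estimated time saved over estimated time of caller plus callee."""
    return time_saved / (caller_time + callee_time)

# Made-up estimates in arbitrary units: same callee, same saving,
# increasingly large callers.
saved, callee = 10.0, 40.0
for caller in (50.0, 500.0, 5000.0):
    print(caller, round(relative_time_benefit(saved, caller, callee), 4))
```

The same absolute saving is worth less and less as the caller gets larger, so
a metric of this shape is reluctant to inline into huge callers.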

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (42 preceding siblings ...)
  2015-04-01  8:28 ` hubicka at gcc dot gnu.org
@ 2015-04-01  8:34 ` trippels at gcc dot gnu.org
  2015-04-02  5:18 ` hubicka at gcc dot gnu.org
                   ` (7 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: trippels at gcc dot gnu.org @ 2015-04-01  8:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #51 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #48)
>
> Maybe we regressed optimizing GCC itself?  (does not bootstrapping
> but compiling gcc 5 with gcc 4.9 improve things?)

No, gcc configured with "--disable-bootstrap --enable-checking=release" 
and built with gcc-5 vs. gcc-4.9 does not show any difference.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (43 preceding siblings ...)
  2015-04-01  8:34 ` trippels at gcc dot gnu.org
@ 2015-04-02  5:18 ` hubicka at gcc dot gnu.org
  2015-04-02  7:07 ` hubicka at gcc dot gnu.org
                   ` (6 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-04-02  5:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #52 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=1 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m34.232s
user    0m33.729s
sys     0m0.532s
i = 1    t = 0.00209225  dt = 0.00209225 (0.0175509s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.164169s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.167095s/it)

$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=10 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m34.226s
user    0m33.749s
sys     0m0.506s
i = 1    t = 0.00209225  dt = 0.00209225 (0.0187211s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.177041s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.181561s/it)
$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=100 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m34.012s
user    0m33.455s
sys     0m0.586s
i = 1    t = 0.00209225  dt = 0.00209225 (0.0175891s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.172188s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.175776s/it)
$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=500 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m35.720s
user    0m35.252s
sys     0m0.498s
i = 1    t = 0.00209225  dt = 0.00209225 (0.0190959s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.147543s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.151731s/it)

$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=2700 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m36.697s
user    0m36.192s
sys     0m0.536s

$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=1000 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m36.369s
user    0m35.900s
sys     0m0.500s
i = 1    t = 0.00209225  dt = 0.00209225 (0.00889301s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.137394s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.14172s/it)
$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=2700 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m36.730s
user    0m36.216s
sys     0m0.546s
i = 1    t = 0.00209225  dt = 0.00209225 (0.00888801s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.134414s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.137397s/it)


$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=10000 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m37.722s
user    0m37.215s
sys     0m0.539s
i = 1    t = 0.00209225  dt = 0.00209225 (0.00893092s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.171207s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.17444s/it)
$ time /aux/hubicka/trunk-install/bin/g++ -Ofast -fpermissive --param
large-function-insns=100000 tramp3d-v4.ii -w ;  ./a.out -n 3

real    0m37.675s
user    0m37.147s
sys     0m0.559s
i = 1    t = 0.00209225  dt = 0.00209225 (0.00888085s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.169823s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.173361s/it)


So there seems to be an 8% compile time performance drop somewhere between 100
and 1000 of large-function-insns, and the current default of 2700 seems to sit
in the sweet spot of the performance (below 500 or above 10000 it starts
dropping).

The text segment size is 589645 for 10000, 587750 for 2700, 591666 for 1000,
576922 for 500.

GCC 4.9 seems happy with growth of 100 and drops a bit at growth of 30. It also
delivers a smaller binary at those growth settings (489406 bytes).

So this seems to suggest that a noticeable part of the remaining regression
may be due to the new heuristics preferring large functions. I will experiment
with re-adding the combined function size into the denominator. But my first
experiments do not look amazing.
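
The sweep above can be summarised with a short Python sketch. The numbers are
copied from the transcripts (compile "real" time and the s/it value printed at
i = 3); for 2700 the run that also printed benchmark output is used. These are
single runs, so small differences may well be noise, and the variable names
are illustrative.

```python
# large-function-insns value -> (compile "real" seconds, s/it reported at i = 3)
sweep = {
    1: (34.232, 0.167095),
    10: (34.226, 0.181561),
    100: (34.012, 0.175776),
    500: (35.720, 0.151731),
    1000: (36.369, 0.14172),
    2700: (36.730, 0.137397),
    10000: (37.722, 0.17444),
    100000: (37.675, 0.173361),
}

fastest_compile = min(sweep, key=lambda p: sweep[p][0])
fastest_binary = min(sweep, key=lambda p: sweep[p][1])
# Extra compile time of the default (2700) over the fastest-compiling setting.
extra = (sweep[2700][0] / sweep[fastest_compile][0] - 1) * 100

print(fastest_compile, fastest_binary, round(extra, 1))
```

On this data the fastest compile is at 100, the fastest generated code is at
the default of 2700, and the default costs about 8% more compile time than
the setting of 100.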


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (44 preceding siblings ...)
  2015-04-02  5:18 ` hubicka at gcc dot gnu.org
@ 2015-04-02  7:07 ` hubicka at gcc dot gnu.org
  2015-04-02 23:44 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-04-02  7:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #53 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This patch makes the denominator use the resulting function size (not the
uninlined time as 4.9 did, but it gets the resulting fraction closer to the
4.9 style):
Index: ../../gcc/ipa-inline.c
===================================================================
--- ../../gcc/ipa-inline.c      (revision 221806)
+++ ../../gcc/ipa-inline.c      (working copy)
@@ -1167,6 +1168,7 @@ edge_badness (struct cgraph_edge *edge,
            overall_growth += 256 * 256 - 256;
          denominator *= overall_growth;
         }
+      denominator *= inline_summaries->get (caller)->self_size + growth;

       badness = - numerator / denominator;

Large function growth is now hit only 8 times and compile time seems down to:
real    0m33.670s
user    0m33.065s
sys     0m0.522s
code size seems 8% bigger than 4.9, runtime is good.

The performance stays good with large-function-insns=10 and compile time
further drops to:
real    0m32.127s
user    0m31.634s
sys     0m0.520s

I will run firefox benchmarks tonight.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (45 preceding siblings ...)
  2015-04-02  7:07 ` hubicka at gcc dot gnu.org
@ 2015-04-02 23:44 ` hubicka at gcc dot gnu.org
  2015-04-03 18:09 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-04-02 23:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #54 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I have a full set of firefox talos benchmarks with inline-unit-growth bumped
back to 30 (by accident I did not test the default value, but I am running it
now). We now get back the GCC 4.9 performance on dromaeo_dom/dromaeo_css that
was troubling me most. We also get an improvement on tsvgx and the startup time
benchmark (where mainline is already on par with GCC 4.9 after the wrapper
fix).

I am definitely surprised - I originally introduced the relative benefits as a
hack to get around the value range limitation in the fixed point badness
metric, but the explanation that many average size functions are better for
code generation than the case of some very small and some very big ones seems
reasonable. The combined badness metric in effect considers function growth
and unit growth to both be limiting factors, as they are.

Given the benchmark results, I think we should go ahead with the change. I will
work out the preferred limits for overall-unit-growth based on firefox
incrementally. We also need to reconsider the FDO metric. It seems that the new
one works better for tramp3d and spec; I will re-check how it behaves on
firefox.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (46 preceding siblings ...)
  2015-04-02 23:44 ` hubicka at gcc dot gnu.org
@ 2015-04-03 18:09 ` hubicka at gcc dot gnu.org
  2015-04-07  9:32 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-04-03 18:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #55 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Author: hubicka
Date: Fri Apr  3 18:09:13 2015
New Revision: 221859

URL: https://gcc.gnu.org/viewcvs?rev=221859&root=gcc&view=rev
Log:
    PR ipa/65076
    * ipa-inline.c (edge_badness): Add combined size to the denominator.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline.c


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (47 preceding siblings ...)
  2015-04-03 18:09 ` hubicka at gcc dot gnu.org
@ 2015-04-07  9:32 ` rguenth at gcc dot gnu.org
  2015-04-22  1:32 ` [Bug ipa/65076] [5/6 " hubicka at gcc dot gnu.org
                   ` (2 subsequent siblings)
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-04-07  9:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

--- Comment #56 from Richard Biener <rguenth at gcc dot gnu.org> ---
Let's demote this to P2, the compile-time regression is quite specific to
tramp3d-v4.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5/6 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (48 preceding siblings ...)
  2015-04-07  9:32 ` rguenth at gcc dot gnu.org
@ 2015-04-22  1:32 ` hubicka at gcc dot gnu.org
  2015-04-22 12:06 ` jakub at gcc dot gnu.org
  2015-07-16  9:19 ` rguenth at gcc dot gnu.org
  51 siblings, 0 replies; 53+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-04-22  1:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #57 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Author: hubicka
Date: Wed Apr 22 01:32:14 2015
New Revision: 222305

URL: https://gcc.gnu.org/viewcvs?rev=222305&root=gcc&view=rev
Log:
    PR ipa/65076
    * passes.def (early_optimizations): Add pass_dse.

    * g++.dg/tree-ssa/pr61034.C: Update template.
    * g++.dg/warn/Warray-bounds.C: Harden for DSE.
    * gcc.dg/Warray-bounds-11.c: Likewise.
    * gcc.dg/Warray-bounds.c: Likewise.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr61034.C
    trunk/gcc/testsuite/g++.dg/warn/Warray-bounds.C
    trunk/gcc/testsuite/gcc.dg/Warray-bounds-11.c
    trunk/gcc/testsuite/gcc.dg/Warray-bounds.c


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5/6 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (49 preceding siblings ...)
  2015-04-22  1:32 ` [Bug ipa/65076] [5/6 " hubicka at gcc dot gnu.org
@ 2015-04-22 12:06 ` jakub at gcc dot gnu.org
  2015-07-16  9:19 ` rguenth at gcc dot gnu.org
  51 siblings, 0 replies; 53+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-04-22 12:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|5.0                         |5.2

--- Comment #58 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 5.1 has been released.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bug ipa/65076] [5/6 Regression] 16% tramp3d-v4.cpp compile time regression
  2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
                   ` (50 preceding siblings ...)
  2015-04-22 12:06 ` jakub at gcc dot gnu.org
@ 2015-07-16  9:19 ` rguenth at gcc dot gnu.org
  51 siblings, 0 replies; 53+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-07-16  9:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|5.2                         |5.3

--- Comment #59 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 5.2 is being released, adjusting target milestone to 5.3.


^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2015-07-16  9:19 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
2015-02-16 13:00 [Bug ipa/65076] New: [5 Regression] 16% tramp3d-v4.cpp compile time regression trippels at gcc dot gnu.org
2015-02-16 13:14 ` [Bug ipa/65076] " rguenth at gcc dot gnu.org
2015-02-16 13:22 ` trippels at gcc dot gnu.org
2015-02-16 18:31 ` hubicka at gcc dot gnu.org
2015-02-16 19:07 ` trippels at gcc dot gnu.org
2015-02-16 19:15 ` trippels at gcc dot gnu.org
2015-02-17 10:49 ` rguenth at gcc dot gnu.org
2015-03-04  9:10 ` rguenth at gcc dot gnu.org
2015-03-18 12:56 ` rguenth at gcc dot gnu.org
2015-03-21  5:32 ` hubicka at gcc dot gnu.org
2015-03-21 10:25 ` hubicka at gcc dot gnu.org
2015-03-21 10:48 ` hubicka at gcc dot gnu.org
2015-03-21 11:32 ` hubicka at gcc dot gnu.org
2015-03-24 14:56 ` rguenth at gcc dot gnu.org
2015-03-24 17:21 ` jakub at gcc dot gnu.org
2015-03-25  8:42 ` hubicka at gcc dot gnu.org
2015-03-25  8:42 ` hubicka at ucw dot cz
2015-03-25  8:50 ` rguenther at suse dot de
2015-03-25 21:35 ` hubicka at gcc dot gnu.org
2015-03-26  3:23 ` hubicka at gcc dot gnu.org
2015-03-27  4:03 ` hubicka at gcc dot gnu.org
2015-03-27  6:21 ` hubicka at gcc dot gnu.org
2015-03-27  9:34 ` rguenth at gcc dot gnu.org
2015-03-28 22:17 ` hubicka at gcc dot gnu.org
2015-03-29 14:29 ` trippels at gcc dot gnu.org
2015-03-30  6:03 ` hubicka at gcc dot gnu.org
2015-03-30  6:15 ` hubicka at gcc dot gnu.org
2015-03-31 13:03 ` rguenth at gcc dot gnu.org
2015-03-31 13:09 ` trippels at gcc dot gnu.org
2015-03-31 14:10 ` rguenther at suse dot de
2015-03-31 14:13 ` trippels at gcc dot gnu.org
2015-03-31 14:47 ` trippels at gcc dot gnu.org
2015-03-31 15:08 ` trippels at gcc dot gnu.org
2015-03-31 15:10 ` evstupac at gmail dot com
2015-03-31 15:29 ` rguenth at gcc dot gnu.org
2015-03-31 16:06 ` hubicka at ucw dot cz
2015-03-31 16:25 ` hubicka at gcc dot gnu.org
2015-03-31 17:36 ` hubicka at gcc dot gnu.org
2015-03-31 17:53 ` hubicka at gcc dot gnu.org
2015-03-31 17:54 ` hubicka at gcc dot gnu.org
2015-03-31 20:31 ` trippels at gcc dot gnu.org
2015-04-01  8:02 ` rguenther at suse dot de
2015-04-01  8:05 ` rguenther at suse dot de
2015-04-01  8:28 ` hubicka at gcc dot gnu.org
2015-04-01  8:34 ` trippels at gcc dot gnu.org
2015-04-02  5:18 ` hubicka at gcc dot gnu.org
2015-04-02  7:07 ` hubicka at gcc dot gnu.org
2015-04-02 23:44 ` hubicka at gcc dot gnu.org
2015-04-03 18:09 ` hubicka at gcc dot gnu.org
2015-04-07  9:32 ` rguenth at gcc dot gnu.org
2015-04-22  1:32 ` [Bug ipa/65076] [5/6 " hubicka at gcc dot gnu.org
2015-04-22 12:06 ` jakub at gcc dot gnu.org
2015-07-16  9:19 ` rguenth at gcc dot gnu.org
