public inbox for gcc@gcc.gnu.org
* 3.4 / 3.5 / tree-ssa comparisons
@ 2004-04-03 20:50 Richard Guenther
  2004-04-03 20:52 ` Richard Guenther
  2004-04-03 20:54 ` Andrew Pinski
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Guenther @ 2004-04-03 20:50 UTC
  To: gcc

The automated tester at 
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/monitor-summary.html
completed its first 3.5 build.  I had never checked 3.5 before, so I'm 
surprised by the numbers it got:

Bootstrap time (52min) is in between 3.4 (50min) and tree-ssa (62min), 
and so are the build times for the tramp3d-v3 test(!).  I expected them 
to improve compared to 3.4, not to regress again already; they are now 
2.43min vs. 2.28min (3.4) and 2.75min (tree-ssa).  Also, performance of 
the resulting binary is better(!) for 3.5 (6.9s/it) than for tree-ssa 
(7.68s/it), and of course 3.4 is slowest (8.85s/it).  This means we'll 
regress relative to 3.5 in both compile time and runtime if we merge 
tree-ssa now, but we won't have a runtime regression relative to 3.4 
then, only a compile-time regression.
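
To put the figures above side by side, here is a minimal sketch that computes the relative changes between the three compilers.  It uses only the numbers quoted in this message; the dictionary keys and the helper names `percent_change` and `compare` are made up for illustration and are not part of the tester.

```python
# Figures as quoted in the message
# (bootstrap and build in minutes, runtime in seconds per iteration).
results = {
    "3.4":      {"bootstrap_min": 50.0, "build_min": 2.28, "runtime_s_per_it": 8.85},
    "3.5":      {"bootstrap_min": 52.0, "build_min": 2.43, "runtime_s_per_it": 6.90},
    "tree-ssa": {"bootstrap_min": 62.0, "build_min": 2.75, "runtime_s_per_it": 7.68},
}


def percent_change(new, old):
    """Relative change from old to new, in percent (positive means slower)."""
    return (new - old) / old * 100.0


def compare(base, other):
    """Percent change of every metric when moving from `base` to `other`."""
    return {
        metric: percent_change(results[other][metric], results[base][metric])
        for metric in results[base]
    }


if __name__ == "__main__":
    for base, other in [("3.4", "3.5"), ("3.5", "tree-ssa"), ("3.4", "tree-ssa")]:
        changes = compare(base, other)
        summary = ", ".join(f"{m}: {v:+.1f}%" for m, v in changes.items())
        print(f"{base} -> {other}: {summary}")
```

A negative runtime change means the binary got faster, so 3.4 -> 3.5 shows up as slower to compile but faster to run, and 3.5 -> tree-ssa as slower on all three metrics.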

The obvious question is, why is 3.5 so much better than 3.4?  And of 
course, why is tree-ssa not better than 3.5 for C++ expression template 
numeric code?

All compilers were built with checking disabled, tramp3d.cpp is built with
-O2 -funroll-loops -ffast-math, and the testing machine is a 1.3GHz 
Itanium2.

Richard.


* Re: 3.4 / 3.5 / tree-ssa comparisons
  2004-04-03 20:50 3.4 / 3.5 / tree-ssa comparisons Richard Guenther
@ 2004-04-03 20:52 ` Richard Guenther
  2004-04-03 20:54 ` Andrew Pinski
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Guenther @ 2004-04-03 20:52 UTC
  To: Richard Guenther; +Cc: gcc

Richard Guenther wrote:
> The automated tester at 
> http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/monitor-summary.html
> completed its first 3.5 build.  I never checked 3.5, and so I'm 
> surprised on the numbers it got.

> The obvious question is, why is 3.5 so much better than 3.4?  And of 
> course, why is tree-ssa not better than 3.5 for C++ expression template 
> numeric code?

This could be attributed to changed inlining decisions, as all tests have 
the leafify patch disabled (though I don't know of any changes there 
between 3.4 and 3.5).

Richard.


* Re: 3.4 / 3.5 / tree-ssa comparisons
  2004-04-03 20:50 3.4 / 3.5 / tree-ssa comparisons Richard Guenther
  2004-04-03 20:52 ` Richard Guenther
@ 2004-04-03 20:54 ` Andrew Pinski
  2004-04-04 13:57   ` Richard Guenther
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Pinski @ 2004-04-03 20:54 UTC
  To: Richard Guenther; +Cc: gcc, Andrew Pinski


On Apr 3, 2004, at 15:49, Richard Guenther wrote:

> The automated tester at  
> http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/monitor-summary.html
> completed its first 3.5 build.  I never checked 3.5, and so I'm  
> surprised on the numbers it got:
>
> bootstrap time (52min) is inbetween 3.4 (50min) and tree-ssa (62min),  
> build times for the tramp3d-v3 test, too(!), I did expect them to  
> improve compared to 3.4, not already regress again..., they are now
> 2.43min vs. 2.28min (3.4) and 2.75min (tree-ssa).  Also performance of  
> the resulting binary is better(!) for 3.5 (6.9s/it) than for tree-ssa  
> (7.68s/it) and of course 3.4 is slowest (8.85s/it).  This means we'll  
> regress in both compile and runtime if merging tree-ssa now, but we  
> won't have a runtime regression towards 3.4 then, only a compile time  
> performance regression.
>
> The obvious question is, why is 3.5 so much better than 3.4?  And of  
> course, why is tree-ssa not better than 3.5 for C++ expression  
> template numeric code?

You could check tree-ssa with my patch at  
<http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00169.html>;
it should give both a runtime improvement and a compile time  
improvement.


Thanks,
Andrew Pinski


* Re: 3.4 / 3.5 / tree-ssa comparisons
  2004-04-03 20:54 ` Andrew Pinski
@ 2004-04-04 13:57   ` Richard Guenther
  2004-04-04 16:45     ` Andrew Pinski
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Guenther @ 2004-04-04 13:57 UTC
  To: Andrew Pinski; +Cc: gcc

Andrew Pinski wrote:
> 
> On Apr 3, 2004, at 15:49, Richard Guenther wrote:
> 
>> The automated tester at  
>> http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/monitor-summary.html
>> completed its first 3.5 build.  I never checked 3.5, and so I'm  
>> surprised on the numbers it got:
>>
>> bootstrap time (52min) is inbetween 3.4 (50min) and tree-ssa (62min),  
>> build times for the tramp3d-v3 test, too(!), I did expect them to  
>> improve compared to 3.4, not already regress again..., they are now
>> 2.43min vs. 2.28min (3.4) and 2.75min (tree-ssa).  Also performance 
>> of  the resulting binary is better(!) for 3.5 (6.9s/it) than for 
>> tree-ssa  (7.68s/it) and of course 3.4 is slowest (8.85s/it).  This 
>> means we'll  regress in both compile and runtime if merging tree-ssa 
>> now, but we  won't have a runtime regression towards 3.4 then, only a 
>> compile time  performance regression.
>>
>> The obvious question is, why is 3.5 so much better than 3.4?  And of  
>> course, why is tree-ssa not better than 3.5 for C++ expression  
>> template numeric code?
> 
> 
> You could check the tree-ssa with my patch at  
> <http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00169.html>,
> it should give both a runtime improvement and a compile time  improvement.

With this patch applied, bootstrap time is 62min.  Build time is
  TOTAL                 : 151.44             3.21           154.66
before vs.
  TOTAL                 : 155.70             3.18           158.89
after applying the patch.
Runtime is 7.73s/it with the patch compared to 7.64s/it before.
So it's not helping, but instead pessimizing slightly!?

before:
  tree gimplify         :   2.04 ( 1%) usr   0.02 ( 1%) sys   2.06 ( 1%) wall
  tree eh               :   1.33 ( 1%) usr   0.01 ( 0%) sys   1.34 ( 1%) wall
  tree CFG construction :   0.77 ( 0%) usr   0.02 ( 1%) sys   0.80 ( 1%) wall
  tree CFG cleanup      :   0.96 ( 1%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall
  tree PTA              :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall
  tree alias analysis   :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall
  tree PHI insertion    :   1.70 ( 1%) usr   0.03 ( 1%) sys   1.72 ( 1%) wall
  tree SSA rewrite      :   1.53 ( 1%) usr   0.00 ( 0%) sys   1.52 ( 1%) wall
  tree SSA other        :   2.31 ( 1%) usr   0.16 ( 5%) sys   2.54 ( 2%) wall
  tree operand scan     :   2.08 ( 1%) usr   0.25 ( 8%) sys   2.27 ( 1%) wall
  dominator optimization:   6.37 ( 4%) usr   0.11 ( 3%) sys   6.49 ( 4%) wall
  tree SRA              :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
  tree CCP              :   0.65 ( 0%) usr   0.00 ( 0%) sys   0.66 ( 0%) wall
  tree split crit edges :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
  tree PRE              :   2.21 ( 1%) usr   0.01 ( 0%) sys   2.21 ( 1%) wall
  tree linearize phis   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
  tree forward propagate:   0.38 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall
  tree conservative DCE :   1.03 ( 1%) usr   0.00 ( 0%) sys   1.04 ( 1%) wall
  tree aggressive DCE   :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.46 ( 0%) wall
  tree DSE              :   0.91 ( 1%) usr   0.01 ( 0%) sys   0.91 ( 1%) wall
  tree copy headers     :   0.88 ( 1%) usr   0.01 ( 0%) sys   0.88 ( 1%) wall
  tree SSA to normal    :   1.13 ( 1%) usr   0.01 ( 0%) sys   1.16 ( 1%) wall
  tree rename SSA copies:   0.35 ( 0%) usr   0.01 ( 0%) sys   0.34 ( 0%) wall
  dominance frontiers   :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall
  control dependences   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
  expand                :   9.44 ( 6%) usr   0.05 ( 2%) sys   9.47 ( 6%) wall


after:
  tree gimplify         :   2.03 ( 1%) usr   0.02 ( 1%) sys   2.03 ( 1%) wall
  tree eh               :   1.31 ( 1%) usr   0.01 ( 0%) sys   1.31 ( 1%) wall
  tree CFG construction :   0.74 ( 0%) usr   0.02 ( 1%) sys   0.76 ( 0%) wall
  tree CFG cleanup      :   0.96 ( 1%) usr   0.00 ( 0%) sys   0.96 ( 1%) wall
  tree PTA              :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
  tree alias analysis   :   0.39 ( 0%) usr   0.01 ( 0%) sys   0.39 ( 0%) wall
  tree PHI insertion    :   1.64 ( 1%) usr   0.05 ( 2%) sys   1.71 ( 1%) wall
  tree SSA rewrite      :   1.47 ( 1%) usr   0.02 ( 0%) sys   1.49 ( 1%) wall
  tree SSA other        :   2.36 ( 2%) usr   0.15 ( 5%) sys   2.48 ( 2%) wall
  tree operand scan     :   2.23 ( 1%) usr   0.25 ( 8%) sys   2.48 ( 2%) wall
  dominator optimization:   6.44 ( 4%) usr   0.10 ( 3%) sys   6.54 ( 4%) wall
  tree SRA              :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
  tree CCP              :   0.59 ( 0%) usr   0.01 ( 0%) sys   0.60 ( 0%) wall
  tree split crit edges :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
  tree PRE              :   1.96 ( 1%) usr   0.01 ( 0%) sys   1.96 ( 1%) wall
  tree linearize phis   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
  tree remove casts     :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
  tree forward propagate:   0.36 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall
  tree conservative DCE :   1.05 ( 1%) usr   0.01 ( 0%) sys   1.06 ( 1%) wall
  tree aggressive DCE   :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall
  tree DSE              :   0.86 ( 1%) usr   0.01 ( 0%) sys   0.87 ( 1%) wall
  tree copy headers     :   0.82 ( 1%) usr   0.01 ( 0%) sys   0.83 ( 1%) wall
  tree SSA to normal    :   1.04 ( 1%) usr   0.02 ( 1%) sys   1.06 ( 1%) wall
  tree rename SSA copies:   0.34 ( 0%) usr   0.01 ( 0%) sys   0.35 ( 0%) wall
  dominance frontiers   :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall
  control dependences   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
  expand                :   9.26 ( 6%) usr   0.06 ( 2%) sys   9.32 ( 6%) wall

So it's not a win here.
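
Comparing two such per-pass listings by eye is error-prone, so here is a rough sketch of a script that parses both and prints the per-pass wall-time difference.  The row layout is inferred from the listings in this message only; the names `ROW`, `parse_report` and `diff_reports` are hypothetical, and the sample strings contain just a few of the rows.  The parser also tolerates a trailing "wall" that a mail client wrapped onto its own line.

```python
import re

# One row of the timing report, as it appears in this message:
#   "<pass> : <usr> (n%) usr <sys> (n%) sys <wall> (n%) wall"
ROW = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall"
)


def parse_report(text):
    """Return {pass name: wall seconds} for every row that matches ROW."""
    # Rejoin a trailing "wall" that was wrapped onto the next line.
    text = re.sub(r"\)\s*\n\s*wall", ") wall", text)
    passes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            passes[match.group("name")] = float(match.group("wall"))
    return passes


def diff_reports(before, after):
    """List (pass, after - before) sorted by delta; a missing pass counts as 0s."""
    names = set(before) | set(after)
    deltas = [(n, after.get(n, 0.0) - before.get(n, 0.0)) for n in names]
    return sorted(deltas, key=lambda item: item[1])


# A few rows copied from the listings in this message (not the full report).
BEFORE = """\
  tree operand scan     :   2.08 ( 1%) usr   0.25 ( 8%) sys   2.27 ( 1%)
wall
  tree PRE              :   2.21 ( 1%) usr   0.01 ( 0%) sys   2.21 ( 1%)
wall
  expand                :   9.44 ( 6%) usr   0.05 ( 2%) sys   9.47 ( 6%) wall
"""

AFTER = """\
  tree operand scan     :   2.23 ( 1%) usr   0.25 ( 8%) sys   2.48 ( 2%)
wall
  tree PRE              :   1.96 ( 1%) usr   0.01 ( 0%) sys   1.96 ( 1%)
wall
  tree remove casts     :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
  expand                :   9.26 ( 6%) usr   0.06 ( 2%) sys   9.32 ( 6%) wall
"""

if __name__ == "__main__":
    for name, delta in diff_reports(parse_report(BEFORE), parse_report(AFTER)):
        print(f"{name:<24} {delta:+.2f}s")
```

On these sample rows the gain in PRE is cancelled by the new cast-removal pass, and operand scan gets slower, which fits the picture of no net win.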

With the suggested -fno-gcse and --param max-cse-path-length=0 I get a 
compile time of
  TOTAL                 : 143.30             2.99           146.30
and runtimes of 7.87s/it.  With just -fno-gcse I get
  TOTAL                 : 144.75             3.08           147.83
and 7.89s/it, with just --param max-cse-path-length=0 it's
  TOTAL                 : 150.02             3.09           153.12
and 7.77s/it.
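
To see the trade-off at a glance, here is a small sketch that ranks the runs by compile time and by runtime.  It assumes the third column of each TOTAL line is wall-clock seconds (it is roughly the sum of the first two), and it lists the two earlier runs only for reference: the message does not say whether the flag experiments were done with or without the cast patch.  The labels and variable names are made up for illustration.

```python
# (label, compile seconds from the third TOTAL column, runtime in s/iteration)
# The third TOTAL column is assumed to be wall-clock time.
runs = [
    ("no patch",                                154.66, 7.64),
    ("cast patch",                              158.89, 7.73),
    ("-fno-gcse --param max-cse-path-length=0", 146.30, 7.87),
    ("-fno-gcse",                               147.83, 7.89),
    ("--param max-cse-path-length=0",           153.12, 7.77),
]

# Rank the same runs by each metric separately.
by_compile = sorted(runs, key=lambda run: run[1])
by_runtime = sorted(runs, key=lambda run: run[2])

if __name__ == "__main__":
    print("fastest to compile first:")
    for label, compile_s, runtime in by_compile:
        print(f"  {compile_s:7.2f}s  {runtime:.2f}s/it  {label}")
    print("fastest to run first:")
    for label, compile_s, runtime in by_runtime:
        print(f"  {runtime:.2f}s/it  {compile_s:7.2f}s  {label}")
```

All three flag combinations compile faster than either earlier run, and all three produce a slower binary than either earlier run.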

But maybe I'm chasing the wrong effects without enabling leafify as 
there are no nice loops to optimize then...

Richard.


* Re: 3.4 / 3.5 / tree-ssa comparisons
  2004-04-04 13:57   ` Richard Guenther
@ 2004-04-04 16:45     ` Andrew Pinski
  2004-04-05  9:21       ` Richard Guenther
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Pinski @ 2004-04-04 16:45 UTC
  To: Richard Guenther; +Cc: gcc, Andrew Pinski


On Apr 4, 2004, at 09:57, Richard Guenther wrote:
> Runtime is 7.73s/it compared to 7.64s/it before.
> So it's not helping, but instead pessimizing slightly!?

This does not make sense because I looked into the tree
dumps for this code and it looked like it would improve
it and not hurt it.  But there is another patch which is
in the works which should also help but only in combination
with the patch which you tested.

Also I was still getting cast removals which should make SRA
do its work, but we would have to run aliasing again and do
DOM another time after this second aliasing run so the compile
time will/should go up.

Also I did a compile time comparison with and without this
cast patch on PR8361, and the patch was a win in compile
time by 2 seconds out of a run of 40 seconds, an
improvement of 5%.  Maybe with the leafify patch added you
will see that this patch helps more than what you have seen
so far.

Thanks,
Andrew Pinski


* Re: 3.4 / 3.5 / tree-ssa comparisons
  2004-04-04 16:45     ` Andrew Pinski
@ 2004-04-05  9:21       ` Richard Guenther
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Guenther @ 2004-04-05  9:21 UTC
  To: Andrew Pinski; +Cc: gcc

On Sun, 4 Apr 2004, Andrew Pinski wrote:

>
> On Apr 4, 2004, at 09:57, Richard Guenther wrote:
> > Runtime is 7.73s/it compared to 7.64s/it before.
> > So it's not helping, but instead pessimizing slightly!?
>
> This does not make sense because I looked into the tree
> dumps for this code and it looked like it would improve
> it and not hurt it.  But there is another patch which is
> in the works which should also help but only in combination
> with the patch which you tested.

I tried to do comparisons with leafify enabled, but I don't seem to be
able to switch off the cast pass with -fno-tree-cast?  Using the tree-ssa
build from tonight (which has only an extra gfortran patch applied) I get
segfaults of the leafified binaries... - so I have only a compile time
comparison, which is (with tree-cast)
 TOTAL                 : 233.36             4.04           237.42
vs. (without tree-cast)
 TOTAL                 : 253.57             3.94           257.53

so for leafify enabled builds it definitely helps compile time.  Main
improvements are for PHI insertion, DOM, DSE and the thing that is most
helped is PRE which dropped from 16s to 10s.  Also improvements all over
the place as we seem to emit a lot less code with the patch (stripped size
4324216 vs. 4447272).
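
For a sense of scale, here is a back-of-the-envelope sketch of the relative changes.  It assumes the third TOTAL column is wall-clock seconds and that the smaller stripped size belongs to the build with the patch; the PRE times are the rounded 16s and 10s quoted above, and the helper name is made up for illustration.

```python
def percent_change(new, old):
    """Relative change from old to new, in percent (negative means smaller)."""
    return (new - old) / old * 100.0


# Leafify-enabled builds, with tree-cast vs. without tree-cast.
compile_change = percent_change(237.42, 257.53)   # third TOTAL column, assumed wall
pre_change = percent_change(10.0, 16.0)           # rounded PRE times from the text
size_change = percent_change(4324216, 4447272)    # stripped binary size in bytes

if __name__ == "__main__":
    print(f"compile time: {compile_change:+.1f}%")
    print(f"PRE time:     {pre_change:+.1f}%")
    print(f"stripped size: {size_change:+.2f}%")
```

So the patch saves a bit under 8% of compile time here, with PRE alone dropping by more than a third.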

Now it remains to find out whether tree-cast is "fixing" a bug in tree-ssa
that used to miscompile the testcase in the leafify case, or just masking
it, or whether the gfortran update caused the miscompilation (unlikely).

=> I like the patch ;)

Richard.

> Also I was still getting cast removals which should make SRA
> do its work, but we would have to run aliasing again and do
> DOM another time after this second aliasing run so the compile
> time will/should go up.
>
> Also I did a compile time comparison with and without this
> cast patch on PR8361, and the patch was a win in compile
> time by 2 seconds out of a run of 40 seconds so an
> improvement of 5%, maybe adding the leafy patch you will
> see that this patch helps more than what you have looked
> at so far.
>
> Thanks,
> Andrew Pinski
>

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
