* povray: Revised numbers
From: Scott Robert Ladd @ 2004-01-17 21:34 UTC (permalink / raw)
To: gcc mailing list
I love live peer review. No, I really do... ;)
If we're going to debate the relative quality of mainline and tree-ssa,
it seems to me we need some real numbers. I'm an engineer, not a fortune
teller.
Here are revised numbers for my povray test, based on comments both
public and private:
                     compile    benchmark
                       time       time
                     --------   ---------
gcc mainline           1:43       7:59
  w/ -mfpmath=sse      1:46       6:30
gcc tree-ssa           1:46       7:35
  w/ -mfpmath=sse      ** SEG fault **
icc 8.0                1:53       5:50
Previously, I was unaware of the --disable-checking switch; using it to
build tree-ssa improved its compile time such that it performs at the
same speed as mainline.
Someone suggested privately that I try -mfpmath=sse with GCC; I hadn't
had good results with that option on some other code, but I tried it
just to see what would happen.
Mainline's code became *much* faster with -mfpmath=sse; however,
tree-ssa generated a povray that segfaulted during picture generation.
No, I have not had time to see if this is a known bug.
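To put the benchmark column above into relative terms, here is a minimal sketch that converts the m:ss times to seconds and computes speedups against plain mainline. The helper names (to_seconds, speedup) are made up for illustration; only the times come from the table.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string such as '7:59' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Benchmark (run) times copied from the table above.
RUN_TIMES = {
    "gcc mainline": "7:59",
    "gcc mainline -mfpmath=sse": "6:30",
    "gcc tree-ssa": "7:35",
    "icc 8.0": "5:50",
}


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` runs than `baseline`."""
    return to_seconds(RUN_TIMES[baseline]) / to_seconds(RUN_TIMES[candidate])


if __name__ == "__main__":
    for name in RUN_TIMES:
        factor = speedup("gcc mainline", name)
        print(f"{name:28s} {factor:.2f}x vs. gcc mainline")
```

By this measure -mfpmath=sse makes mainline roughly 1.2x faster on this test, and icc 8.0 is roughly 1.4x faster than plain mainline.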
Mainline looks *much* better now; however, much as I want tree-ssa to
move forward, I find myself concurring with many of Mark's views.
I may, if possible, run my Acovea program on GCC with povray, to see if
I can find a set of switches that equal Intel's performance. I suspect,
however, that most of Intel's advantage stems from its vectorization of
code.
--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing
* Re: povray: Revised numbers
From: Jan Hubicka @ 2004-01-17 21:48 UTC (permalink / raw)
To: Scott Robert Ladd; +Cc: gcc mailing list
> I love live peer review. No, I really do... ;)
>
> If we're going to debate the relative quality of mainline and tree-ssa,
> it seems to me we need some real numbers. I'm an engineer, not a fortune
> teller.
>
> Here are revised numbers for my povray test, based on comments both
> public and private:
>
>                      compile    benchmark
>                        time       time
>                      --------   ---------
> gcc mainline           1:43       7:59
>   w/ -mfpmath=sse      1:46       6:30
>
> gcc tree-ssa           1:46       7:35
>   w/ -mfpmath=sse      ** SEG fault **
>
> icc 8.0                1:53       5:50
>
> Previously, I was unaware of the --disable-checking switch; using it to
> build tree-ssa improved its compile time such that it performs at the
> same speed as mainline.
>
> Someone suggested privately that I try -mfpmath=sse with GCC; I hadn't
> had good results with that option on some other code, but I tried it
> just to see what would happen.
Thanks for testing.
If you have code where -mfpmath=sse loses considerably on mainline
(the 3.3 branch had a very fresh SSE implementation and lacked P4 tuning), I
would be interested in seeing it.
Honza
* Re: povray: Revised numbers
From: Richard Guenther @ 2004-01-17 22:00 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Scott Robert Ladd, gcc mailing list
On Sat, 17 Jan 2004, Jan Hubicka wrote:
> > Here are revised numbers for my povray test, based on comments both
> > public and private:
> >
> >                      compile    benchmark
> >                        time       time
> >                      --------   ---------
> > gcc mainline           1:43       7:59
> >   w/ -mfpmath=sse      1:46       6:30
> >
> > gcc tree-ssa           1:46       7:35
> >   w/ -mfpmath=sse      ** SEG fault **
> >
> > icc 8.0                1:53       5:50
> >
> Thanks for testing.
> If you have code where -mfpmath=sse loses considerably on mainline
> (the 3.3 branch had a very fresh SSE implementation and lacked P4 tuning), I
> would be interested in seeing it.
As can Scott, so can I ;) Here are some benchmarks for my favorite POOMA-based
application, for current 3.3, 3.4 and Intel 8.0 (tree-ssa ICEs).
For a fair comparison, I dropped __attribute__((leafify)) in one round of
testing, as I cannot hack Intel 8.0 for my needs.
Compiler flags are -O2 -funroll-loops -ffast-math -march=athlon for gcc
and -O2 -ip for icc. Tests are done on an Athlon 1GHz with 1GB ram.
gcc 3.3 is 3.3.3 20040112, gcc 3.4 is 3.4.0 20040114, icc 8.0 is build
20031231Z.
          compile time   run time   binary size (static, stripped)
gcc3.3    6m57s          1m24s      2877232
          8m23s          0m34s      2963408   __attribute__((leafify))
gcc3.4    3m41s          1m10s      2584088
          5m04s          0m39s      2682520   __attribute__((leafify))
icc8.0    12m41s         0m42s      5046476
So, as one can clearly see, Intel 8.0 loses across the board in compilation
speed and binary size, but is a lot better in runtime than plain gcc. And
it should be clear why I'm still maintaining the leafify attribute
patch... :)
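For anyone who wants the ratios rather than the raw columns, here is a rough sketch that derives them from the table above. The row labels, the RESULTS layout and the helper names are invented for the example; the numbers themselves are copied from the table.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a time such as '12m41s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


# (compile time, run time, binary size in bytes), copied from the table above.
RESULTS = {
    "gcc3.3": ("6m57s", "1m24s", 2877232),
    "gcc3.3 leafify": ("8m23s", "0m34s", 2963408),
    "gcc3.4": ("3m41s", "1m10s", 2584088),
    "gcc3.4 leafify": ("5m04s", "0m39s", 2682520),
    "icc8.0": ("12m41s", "0m42s", 5046476),
}

COMPILE, RUN, SIZE = 0, 1, 2


def ratio(a: str, b: str, column: int) -> float:
    """Value of row `a` divided by the value of row `b` for one column."""
    def as_number(value):
        # Times are stored as strings, sizes as plain integers.
        return to_seconds(value) if isinstance(value, str) else value

    return as_number(RESULTS[a][column]) / as_number(RESULTS[b][column])


if __name__ == "__main__":
    print(f"icc8.0 compile time vs gcc3.4: {ratio('icc8.0', 'gcc3.4', COMPILE):.2f}x")
    print(f"icc8.0 binary size vs gcc3.4:  {ratio('icc8.0', 'gcc3.4', SIZE):.2f}x")
    print(f"gcc3.4 run time vs icc8.0:     {ratio('gcc3.4', 'icc8.0', RUN):.2f}x")
    print(f"gcc3.3 leafify speedup:        {ratio('gcc3.3', 'gcc3.3 leafify', RUN):.2f}x")
    print(f"gcc3.4 leafify speedup:        {ratio('gcc3.4', 'gcc3.4 leafify', RUN):.2f}x")
```

In these terms icc 8.0 takes about 3.4x as long to compile and produces a binary about 2x the size of gcc 3.4's, while plain gcc 3.4 needs about 1.7x the run time of icc 8.0.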
The leafify test also shows that mainline/3.4 has somewhat regressed in
speed compared to 3.3 - performance I expect to get back after rtlopt
has been merged. Together with the new parser and improved ISO compliance I'm
quite happy with 3.4.
I'd love to test tree-ssa, but that doesn't build my application at the
moment (and I don't have the leafify attribute fixed wrt the latest
cgraph changes yet).
Richard.
* Re: povray: Revised numbers
From: Jan Hubicka @ 2004-01-17 22:12 UTC (permalink / raw)
To: Richard Guenther; +Cc: Jan Hubicka, Scott Robert Ladd, gcc mailing list
> On Sat, 17 Jan 2004, Jan Hubicka wrote:
>
> > > Here are revised numbers for my povray test, based on comments both
> > > public and private:
> > >
> > >                      compile    benchmark
> > >                        time       time
> > >                      --------   ---------
> > > gcc mainline           1:43       7:59
> > >   w/ -mfpmath=sse      1:46       6:30
> > >
> > > gcc tree-ssa           1:46       7:35
> > >   w/ -mfpmath=sse      ** SEG fault **
> > >
> > > icc 8.0                1:53       5:50
> > >
> > Thanks for testing.
> > If you have code where -mfpmath=sse loses considerably on mainline
> > (the 3.3 branch had a very fresh SSE implementation and lacked P4 tuning), I
> > would be interested in seeing it.
>
> As can Scott, so can I ;) Here are some benchmarks for my favorite POOMA
> based application. For current 3.3, 3.4 and Intel 8.0 (tree-ssa ICEs).
> For a fair comparison, I dropped __attribute__((leafify)) in one round of
> testing, as I cannot hack Intel 8.0 for my needs.
>
> Compiler flags are -O2 -funroll-loops -ffast-math -march=athlon for gcc
> and -O2 -ip for icc. Tests are done on an Athlon 1GHz with 1GB ram.
> gcc 3.3 is 3.3.3 20040112, gcc 3.4 is 3.4.0 20040114, icc 8.0 is build
> 20031231Z.
>
>           compile time   run time   binary size (static, stripped)
> gcc3.3    6m57s          1m24s      2877232
>           8m23s          0m34s      2963408   __attribute__((leafify))
> gcc3.4    3m41s          1m10s      2584088
>           5m04s          0m39s      2682520   __attribute__((leafify))
> icc8.0    12m41s         0m42s      5046476
>
> So, as one can clearly see, Intel 8.0 loses across the board in compilation
> speed and binary size, but is a lot better in runtime than plain gcc. And
> it should be clear why I'm still maintaining the leafify attribute
> patch... :)
>
> The leafify test also shows that mainline/3.4 has somewhat regressed in
> speed compared to 3.3 - performance I expect to get back after rtlopt
> has been merged. Together with the new parser and improved ISO compliance I'm
I think this may actually be interference from the leafify code. If you
leafify a really huge function, you easily reach the inline-unit-growth
limits and, in effect, suppress all other inlining.
I was thinking that perhaps it may make sense to apply inline limits only to
the growth caused by non-always_inline/leafify/extern inline functions.
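To illustrate the effect described here, the following is a toy model only, not GCC's actual inlining heuristic. It assumes a simple greedy budget: the unit may grow by at most growth_limit times its original size, forced inlines (standing in for leafify/always_inline) are always performed, and the exempt_forced switch decides whether their growth is charged against the budget. All names and numbers are hypothetical.

```python
def inline_candidates(unit_size, growth_limit, candidates, exempt_forced=False):
    """Return the names of the candidates that end up inlined.

    candidates: list of (name, growth, forced) tuples, visited in order.
    Forced candidates are always inlined; ordinary ones only while the
    accumulated growth stays within unit_size * growth_limit.
    """
    budget = unit_size * growth_limit
    used = 0
    inlined = []
    for name, growth, forced in candidates:
        if forced:
            inlined.append(name)
            if not exempt_forced:
                # Forced growth eats into the shared budget.
                used += growth
        elif used + growth <= budget:
            inlined.append(name)
            used += growth
    return inlined


# Hypothetical unit: one huge forced inline followed by two small candidates.
CANDIDATES = [
    ("huge_leafified", 600, True),
    ("small_a", 50, False),
    ("small_b", 80, False),
]

if __name__ == "__main__":
    print(inline_candidates(1000, 0.5, CANDIDATES))
    print(inline_candidates(1000, 0.5, CANDIDATES, exempt_forced=True))
```

With the forced growth charged to the budget, the two small candidates are rejected; with it exempted, all three are inlined, which is the behaviour the proposal above is after.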
Honza
> quite happy with 3.4.
>
> I'd love to test tree-ssa, but that doesn't build my application at the
> moment (and I don't have the leafify attribute fixed wrt the latest
> cgraph changes yet).
>
> Richard.
* Re: povray: Revised numbers
From: Richard Guenther @ 2004-01-17 22:24 UTC (permalink / raw)
To: Jan Hubicka; +Cc: gcc mailing list
On Sat, 17 Jan 2004, Jan Hubicka wrote:
> > On Sat, 17 Jan 2004, Jan Hubicka wrote:
> >
> >           compile time   run time   binary size (static, stripped)
> > gcc3.3    6m57s          1m24s      2877232
> >           8m23s          0m34s      2963408   __attribute__((leafify))
> > gcc3.4    3m41s          1m10s      2584088
> >           5m04s          0m39s      2682520   __attribute__((leafify))
> > icc8.0    12m41s         0m42s      5046476
> >
> > So, as one can clearly see, Intel 8.0 loses across the board in compilation
> > speed and binary size, but is a lot better in runtime than plain gcc. And
> > it should be clear why I'm still maintaining the leafify attribute
> > patch... :)
> >
> > The leafify test also shows that mainline/3.4 has somewhat regressed in
> > speed compared to 3.3 - performance I expect to get back after rtlopt
> > has been merged. Together with the new parser and improved ISO compliance I'm
>
> I think this may actually be interference from the leafify code. If you
> leafify a really huge function, you easily reach the inline-unit-growth
> limits and, in effect, suppress all other inlining.
> I was thinking that perhaps it may make sense to apply inline limits only to
> the growth caused by non-always_inline/leafify/extern inline functions.
I'm not counting the leafify inlining at all, so leafifying shouldn't have
an effect here.
Richard.
* Re: povray: Revised numbers
From: Richard Henderson @ 2004-01-17 22:41 UTC (permalink / raw)
To: Richard Guenther; +Cc: Jan Hubicka, Scott Robert Ladd, gcc mailing list
On Sat, Jan 17, 2004 at 11:00:05PM +0100, Richard Guenther wrote:
> I'd love to test tree-ssa, but that doesn't build my application at the
> moment ...
Please file a bug report.
r~
* Re: povray: Revised numbers
From: Richard Guenther @ 2004-01-20 19:24 UTC (permalink / raw)
To: gcc; +Cc: Jan Hubicka, Scott Robert Ladd
On Sat, 17 Jan 2004, Richard Guenther wrote:
> Here are some benchmarks for my favorite POOMA
> based application. For current 3.3, 3.4 and Intel 8.0 (tree-ssa ICEs).
> For a fair comparison, I dropped __attribute__((leafify)) in one round of
> testing, as I cannot hack Intel 8.0 for my needs.
>
> Compiler flags are -O2 -funroll-loops -ffast-math -march=athlon for gcc
> and -O2 -ip for icc. Tests are done on an Athlon 1GHz with 1GB ram.
> gcc 3.3 is 3.3.3 20040112, gcc 3.4 is 3.4.0 20040114, icc 8.0 is build
> 20031231Z.
>
>           compile time   run time   binary size (static, stripped)
> gcc3.3    6m57s          1m24s      2877232
>           8m23s          0m34s      2963408   __attribute__((leafify))
> gcc3.4    3m41s          1m10s      2584088
>           5m04s          0m39s      2682520   __attribute__((leafify))
> icc8.0    12m41s         0m42s      5046476
And here come the tree-ssa numbers (very slightly different program and
runtime test, so again with gcc3.4 + leafify numbers for comparison):
tree-ssa  3m35s          1m48s      1712992
gcc3.4    3m33s          0m51s      1738984   __attribute__((leafify))
So the tree-ssa compilation times are not too bad, and neither is the
runtime. Using -ftree-sra -ftree-points-to=andersen doesn't improve runtime
performance, though.
Richard.
* Re: povray: Revised numbers
From: Daniel Berlin @ 2004-01-20 20:32 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc, Jan Hubicka, Scott Robert Ladd
> So the tree-ssa compilation times are not too bad, runtime, too. Using
> -ftree-sra -ftree-points-to=andersen doesn't improve runtime performance,
> though.
tree-sra is on by default, so I don't know why you'd think that would
improve performance.
As for points-to, as I've stated multiple times before, it's not going
to help much (more than 1%) until I finish the code that uses the
results to disambiguate which vars are call clobbered, alias global
memory, or point to global memory.
Most of our optimizations are *not* blocked by aliasing other
variables; they are blocked by thinking variables are call clobbered,
alias global memory, or point to global memory.
One of the reasons I haven't requested that points-to be turned on by
default is that it won't have a performance impact until I've finished
the aforementioned code.
>
> Richard.
* Re: povray: Revised numbers
From: Steven Bosscher @ 2004-01-20 20:38 UTC (permalink / raw)
To: Daniel Berlin, Richard Guenther; +Cc: gcc, Jan Hubicka, Scott Robert Ladd
On Tuesday 20 January 2004 21:31, Daniel Berlin wrote:
> One of the reasons i haven't requested that points-to be turned on by
> default is that it won't have a performance impact until i've finished
> the aforementioned code.
It would give you some testing input though. If it doesn't hurt
and it doesn't help, then by all means turn it on, so if there are
still bugs in it, we catch them early.
Gr.
Steven
* Re: povray: Revised numbers
From: Daniel Berlin @ 2004-01-20 20:45 UTC (permalink / raw)
To: Steven Bosscher; +Cc: gcc, Jan Hubicka, Scott Robert Ladd, Richard Guenther
On Jan 20, 2004, at 3:36 PM, Steven Bosscher wrote:
> On Tuesday 20 January 2004 21:31, Daniel Berlin wrote:
>> One of the reasons i haven't requested that points-to be turned on by
>> default is that it won't have a performance impact until i've finished
>> the aforementioned code.
>
> It would give you some testing input though.
Yeah, but so does the bootstrap and regression test.
:)
It might stop people from breaking it every third week by slightly
changing GIMPLE predicates and whatnot (so we get operations we never
had to handle before), but then they'd just ask me to fix it anyway.
Hey, i'm all for turning it on by default, don't get me wrong, I just
don't see it doing anything for the majority of cases yet. But if
people are okay with that, ...
* Re: povray: Revised numbers
From: Steven Bosscher @ 2004-01-20 23:08 UTC (permalink / raw)
To: Daniel Berlin; +Cc: gcc, Jan Hubicka, Scott Robert Ladd, Richard Guenther
On Tuesday 20 January 2004 21:45, Daniel Berlin wrote:
> On Jan 20, 2004, at 3:36 PM, Steven Bosscher wrote:
> > On Tuesday 20 January 2004 21:31, Daniel Berlin wrote:
> >> One of the reasons i haven't requested that points-to be turned on by
> >> default is that it won't have a performance impact until i've finished
> >> the aforementioned code.
> >
> > It would give you some testing input though.
>
> Yeah, but so does the bootstrap and regression test.
>
> :)
>
> It might stop people from breaking it every third week by slightly
> changing GIMPLE predicates and whatnot (so we get operations we never
> had to handle before), but then they'd just ask me to fix it anyway.
>
> Hey, i'm all for turning it on by default, don't get me wrong, I just
> don't see it doing anything for the majority of cases yet. But if
> people are okay with that, ...
I would like to turn it on by default at some optimization level
just to get it some testing.
I have bootstrapped with Andersen PTA enabled, and I've posted the test
results in http://gcc.gnu.org/ml/gcc-testresults/2004-01/msg00853.html.
While PTA is not used much yet, one FAIL is already fixed by this:
< FAIL: gcc.dg/tree-ssa/sra-3.c scan-tree-dump-times link_error 0
Nice.
Gr.
Steven