povray: Revised numbers

public inbox for gcc@gcc.gnu.org

* povray: Revised numbers
@ 2004-01-17 21:34 Scott Robert Ladd
  2004-01-17 21:48 ` Jan Hubicka
  0 siblings, 1 reply; 11+ messages in thread
From: Scott Robert Ladd @ 2004-01-17 21:34 UTC (permalink / raw)
  To: gcc mailing list

I love live peer review. No, I really do... ;)

If we're going to debate the relative quality of mainline and tree-ssa, 
it seems to me we need some real numbers. I'm an engineer, not a fortune 
teller.

Here are revised numbers for my povray test, based on comments both 
public and private:

                    compile  benchmark
                     time      time
                   --------  ---------
gcc mainline        1:43      7:59
   w/ -mfpmath=sse   1:46      6:30

gcc tree-ssa        1:46      7:35
   w/ -mfpmath=sse  ** SEG fault **

icc 8.0             1:53      5:50

Previously, I was unaware of the --disable-checking switch; using it to 
build tree-ssa improved its compile time such that it performs at the 
same speed as mainline.

Someone suggested privately that I try -mfpmath=sse with GCC; I hadn't 
had good results with that option on some other code, but I tried it 
just to see what would happen.

Mainline's code became *much* faster with -mfpmath=sse; however, 
tree-ssa generated a povray that segfaulted during picture generation. 
No, I have not had time to see if this is a known bug.
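
To put the run times in the table above on one scale, here is a minimal sketch that converts them to seconds and compares each build against icc 8.0. The helper names (`to_seconds`, `percent_slower`) are illustrative assumptions; only the timings come from the table.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' timing string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Benchmark run times copied from the table above (the tree-ssa
# -mfpmath=sse build is omitted because it segfaulted).
RUN_TIMES = {
    "gcc mainline": "7:59",
    "gcc mainline -mfpmath=sse": "6:30",
    "gcc tree-ssa": "7:35",
    "icc 8.0": "5:50",
}


def percent_slower(time: str, baseline: str) -> float:
    """Return how much longer `time` is than `baseline`, in percent."""
    base = to_seconds(baseline)
    return 100.0 * (to_seconds(time) - base) / base


if __name__ == "__main__":
    baseline = RUN_TIMES["icc 8.0"]
    for name, timing in RUN_TIMES.items():
        print(f"{name:28s} {to_seconds(timing):4d} s "
              f"{percent_slower(timing, baseline):+6.1f}% vs icc 8.0")
```

On these figures, mainline with -mfpmath=sse is roughly 11% slower than icc 8.0, tree-ssa 30% slower, and plain mainline roughly 37% slower.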

Mainline looks *much* better now; however, much as I want tree-ssa to 
move forward, I find myself concurring with many of Mark's views.

I may, if possible, run my Acovea program on GCC with povray, to see if 
I can find a set of switches that equal Intel's performance. I suspect, 
however, that most of Intel's advantage stems from its vectorization of 
code.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing


* Re: povray: Revised numbers
  2004-01-17 21:34 povray: Revised numbers Scott Robert Ladd
@ 2004-01-17 21:48 ` Jan Hubicka
  2004-01-17 22:00   ` Richard Guenther
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2004-01-17 21:48 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc mailing list

> I love live peer review. No, I really do... ;)
> 
> If we're going to debate the relative quality of mainline and tree-ssa, 
> it seems to me we need some real numbers. I'm an engineer, not a fortune 
> teller.
> 
> Here are revised numbers for my povray test, based on comments both 
> public and private:
> 
>                    compile  benchmark
>                     time      time
>                   --------  ---------
> gcc mainline        1:43      7:59
>   w/ -mfpmath=sse   1:46      6:30
> 
> gcc tree-ssa        1:46      7:35
>   w/ -mfpmath=sse  ** SEG fault **
> 
> icc 8.0             1:53      5:50
> 
> Previously, I was unaware of the --disable-checking switch; using it to 
> build tree-ssa improved its compile time such that it performs at the 
> same speed as mainline.
> 
> Someone suggested privately that I try -mfpmath=sse with GCC; I hadn't 
> had good results with that option on some other code, but I tried it 
> just to see what would happen.
Thanks for testing.
If you have code where -mfpmath=sse loses considerably on mainline
(the 3.3 branch had a very fresh SSA implementation and lacked P4 tuning), I
would be interested in seeing it.

Honza


* Re: povray: Revised numbers
  2004-01-17 21:48 ` Jan Hubicka
@ 2004-01-17 22:00   ` Richard Guenther
  2004-01-17 22:12     ` Jan Hubicka
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Richard Guenther @ 2004-01-17 22:00 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Scott Robert Ladd, gcc mailing list

On Sat, 17 Jan 2004, Jan Hubicka wrote:

> > Here are revised numbers for my povray test, based on comments both
> > public and private:
> >
> >                    compile  benchmark
> >                     time      time
> >                   --------  ---------
> > gcc mainline        1:43      7:59
> >   w/ -mfpmath=sse   1:46      6:30
> >
> > gcc tree-ssa        1:46      7:35
> >   w/ -mfpmath=sse  ** SEG fault **
> >
> > icc 8.0             1:53      5:50
> >
> Thanks for testing.
> If you have code where -mfpmath=sse loses considerably on mainline
> (the 3.3 branch had a very fresh SSA implementation and lacked P4 tuning), I
> would be interested in seeing it.

As can Scott, so can I ;)  Here are some benchmarks for my favorite
POOMA-based application, covering current 3.3, 3.4 and Intel 8.0
(tree-ssa ICEs).  For a fair comparison, I dropped __attribute__((leafify))
in one round of testing, as I cannot hack Intel 8.0 for my needs.

Compiler flags are -O2 -funroll-loops -ffast-math -march=athlon for gcc
and -O2 -ip for icc.  Tests are done on an Athlon 1GHz with 1GB ram.
gcc 3.3 is 3.3.3 20040112, gcc 3.4 is 3.4.0 20040114, icc 8.0 is build
20031231Z.

compiler  compile time  run time  binary size (static, stripped)
gcc3.3    6m57s         1m24s     2877232
          8m23s         0m34s     2963408  __attribute__((leafify))
gcc3.4    3m41s         1m10s     2584088
          5m04s         0m39s     2682520  __attribute__((leafify))
icc8.0    12m41s        0m42s     5046476

As one can clearly see, Intel 8.0 loses across the board in compilation
speed and binary size, but is a lot better in run time than plain gcc.  It
should also be clear why I'm still maintaining the leafify attribute
patch... :)
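
The comparison above can be checked numerically. The following is a minimal sketch that parses the table's durations and computes run-time ratios; the function names and the `RESULTS` layout are illustrative assumptions, while the figures are copied from the table.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '6m57s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


# (compile time, run time, binary size in bytes) from the table above.
RESULTS = {
    "gcc3.3": ("6m57s", "1m24s", 2877232),
    "gcc3.3 leafify": ("8m23s", "0m34s", 2963408),
    "gcc3.4": ("3m41s", "1m10s", 2584088),
    "gcc3.4 leafify": ("5m04s", "0m39s", 2682520),
    "icc8.0": ("12m41s", "0m42s", 5046476),
}


def run_time_ratio(slower: str, faster: str) -> float:
    """Run time of `slower` divided by run time of `faster`."""
    return (parse_duration(RESULTS[slower][1])
            / parse_duration(RESULTS[faster][1]))


if __name__ == "__main__":
    print(f"gcc3.4 vs icc8.0:         {run_time_ratio('gcc3.4', 'icc8.0'):.2f}x")
    print(f"gcc3.3 plain vs leafify:  {run_time_ratio('gcc3.3', 'gcc3.3 leafify'):.2f}x")
    print(f"gcc3.4 plain vs leafify:  {run_time_ratio('gcc3.4', 'gcc3.4 leafify'):.2f}x")
    print(f"leafify, gcc3.4 vs gcc3.3: {run_time_ratio('gcc3.4 leafify', 'gcc3.3 leafify'):.2f}x")
```

By these numbers plain gcc 3.4 takes about 1.67 times as long as icc 8.0 to run, while the leafified 3.4 build takes about 1.15 times as long as the leafified 3.3 build.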

The leafify test also shows that mainline/3.4 has somewhat regressed in
speed compared to 3.3 - performance I expect to get back after rtlopt
has been merged.  Together with the new parser and improved ISO compliance,
I'm quite happy with 3.4.

I'd love to test tree-ssa, but that doesn't build my application at the
moment (and I don't have the leafify attribute fixed wrt the latest
cgraph changes yet).

Richard.


* Re: povray: Revised numbers
  2004-01-17 22:00   ` Richard Guenther
@ 2004-01-17 22:12     ` Jan Hubicka
  2004-01-17 22:24       ` Richard Guenther
  2004-01-17 22:41     ` Richard Henderson
  2004-01-20 19:24     ` Richard Guenther
  2 siblings, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2004-01-17 22:12 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, Scott Robert Ladd, gcc mailing list

> On Sat, 17 Jan 2004, Jan Hubicka wrote:
> 
> > > Here are revised numbers for my povray test, based on comments both
> > > public and private:
> > >
> > >                    compile  benchmark
> > >                     time      time
> > >                   --------  ---------
> > > gcc mainline        1:43      7:59
> > >   w/ -mfpmath=sse   1:46      6:30
> > >
> > > gcc tree-ssa        1:46      7:35
> > >   w/ -mfpmath=sse  ** SEG fault **
> > >
> > > icc 8.0             1:53      5:50
> > >
> > Thanks for testing.
> > If you have code where -mfpmath=sse loses considerably on mainline
> > (the 3.3 branch had a very fresh SSA implementation and lacked P4 tuning), I
> > would be interested in seeing it.
> 
> As can Scott, so can I ;)  Here are some benchmarks for my favorite POOMA
> based application.  For current 3.3, 3.4 and Intel 8.0 (tree-ssa ICEs).
> For a fair comparison, I dropped __attribute__((leafify)) in one round of
> testing, as I cannot hack Intel 8.0 for my needs.
> 
> Compiler flags are -O2 -funroll-loops -ffast-math -march=athlon for gcc
> and -O2 -ip for icc.  Tests are done on an Athlon 1GHz with 1GB ram.
> gcc 3.3 is 3.3.3 20040112, gcc 3.4 is 3.4.0 20040114, icc 8.0 is build
> 20031231Z.
> 
> 	compile time	run time	binary size (static, stripped)
> gcc3.3	6m57s		1m24s		2877232
> 	8m23s		0m34s		2963408  __attribute__((leafify))
> gcc3.4	3m41s		1m10s		2584088
> 	5m04s		0m39s		2682520  __attribute__((leafify))
> icc8.0	12m41s		0m42s		5046476
> 
> As one can clearly see, Intel 8.0 loses across the board in compilation
> speed and binary size, but is a lot better in run time than plain gcc.  It
> should also be clear why I'm still maintaining the leafify attribute
> patch... :)
> 
> The leafify test also shows that mainline/3.4 has somewhat regressed in
> speed compared to 3.3 - performance I expect to get back after rtlopt
> been merged.  Together with the new parser and improved ISO compliance I'm

I think this may actually be interference from the leafify code.  If you
leafify a really huge function, you easily reach the inline-unit-growth
limits and in effect suppress all other inlining.
I was thinking that perhaps it may make sense to apply inline limits only to
the growth caused by non-always_inline/leafify/extern inline functions.
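
The effect described here can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not GCC's inliner, and the `Candidate` type, the `select_inlines` function, the sizes, and the 50% limit are all hypothetical. It contrasts charging forced (always_inline/leafify-style) growth against a unit-growth budget with exempting it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    growth: int   # size increase if inlined, in arbitrary units
    forced: bool  # True for always_inline/leafify-style requests


def select_inlines(unit_size: int, limit_percent: int,
                   candidates: List[Candidate],
                   count_forced: bool) -> List[str]:
    """Pick inlines under a unit-growth budget (toy model).

    Forced candidates are always inlined.  If `count_forced` is true,
    their growth is charged against the budget as well, which can
    leave nothing for ordinary candidates.
    """
    budget = unit_size * limit_percent // 100
    used = 0
    chosen = []
    for cand in candidates:
        if cand.forced:
            chosen.append(cand.name)
            if count_forced:
                used += cand.growth
    for cand in candidates:
        if not cand.forced and used + cand.growth <= budget:
            chosen.append(cand.name)
            used += cand.growth
    return chosen


if __name__ == "__main__":
    cands = [
        Candidate("big_kernel", 800, forced=True),
        Candidate("small_a", 40, forced=False),
        Candidate("small_b", 60, forced=False),
    ]
    # Forced growth charged: the budget of 500 is already exceeded.
    print(select_inlines(1000, 50, cands, count_forced=True))
    # Forced growth exempt: the small functions still fit.
    print(select_inlines(1000, 50, cands, count_forced=False))
```

In the first call only the forced inline survives; in the second, the ordinary candidates are inlined as well, which is the behaviour the proposal aims for.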

Honza
> quite happy with 3.4.
> 
> I'd love to test tree-ssa, but that doesn't build my application at the
> moment (and I don't have the leafify attribute fixed wrt the latest
> cgraph changes yet).
> 
> Richard.


* Re: povray: Revised numbers
  2004-01-17 22:12     ` Jan Hubicka
@ 2004-01-17 22:24       ` Richard Guenther
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2004-01-17 22:24 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc mailing list

On Sat, 17 Jan 2004, Jan Hubicka wrote:

> > On Sat, 17 Jan 2004, Jan Hubicka wrote:
> >
> > 	compile time	run time	binary size (static, stripped)
> > gcc3.3	6m57s		1m24s		2877232
> > 	8m23s		0m34s		2963408  __attribute__((leafify))
> > gcc3.4	3m41s		1m10s		2584088
> > 	5m04s		0m39s		2682520  __attribute__((leafify))
> > icc8.0	12m41s		0m42s		5046476
> >
> > As one can clearly see, Intel 8.0 loses across the board in compilation
> > speed and binary size, but is a lot better in run time than plain gcc.  It
> > should also be clear why I'm still maintaining the leafify attribute
> > patch... :)
> >
> > The leafify test also shows that mainline/3.4 has somewhat regressed in
> > speed compared to 3.3 - performance I expect to get back after rtlopt
> > been merged.  Together with the new parser and improved ISO compliance I'm
>
> I think this may actually be interference from the leafify code.  If you
> leafify a really huge function, you easily reach the inline-unit-growth
> limits and in effect suppress all other inlining.
> I was thinking that perhaps it may make sense to apply inline limits only to
> the growth caused by non-always_inline/leafify/extern inline functions.

I'm not counting the leafify inlining at all, so leafifying shouldn't have
an effect here.

Richard.


* Re: povray: Revised numbers
  2004-01-17 22:00   ` Richard Guenther
  2004-01-17 22:12     ` Jan Hubicka
@ 2004-01-17 22:41     ` Richard Henderson
  2004-01-20 19:24     ` Richard Guenther
  2 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2004-01-17 22:41 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, Scott Robert Ladd, gcc mailing list

On Sat, Jan 17, 2004 at 11:00:05PM +0100, Richard Guenther wrote:
> I'd love to test tree-ssa, but that doesn't build my application at the
> moment ...

Please file a bug report.


r~


* Re: povray: Revised numbers
  2004-01-17 22:00   ` Richard Guenther
  2004-01-17 22:12     ` Jan Hubicka
  2004-01-17 22:41     ` Richard Henderson
@ 2004-01-20 19:24     ` Richard Guenther
  2004-01-20 20:32       ` Daniel Berlin
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2004-01-20 19:24 UTC (permalink / raw)
  To: gcc; +Cc: Jan Hubicka, Scott Robert Ladd

On Sat, 17 Jan 2004, Richard Guenther wrote:

> Here are some benchmarks for my favorite POOMA
> based application.  For current 3.3, 3.4 and Intel 8.0 (tree-ssa ICEs).
> For a fair comparison, I dropped __attribute__((leafify)) in one round of
> testing, as I cannot hack Intel 8.0 for my needs.
>
> Compiler flags are -O2 -funroll-loops -ffast-math -march=athlon for gcc
> and -O2 -ip for icc.  Tests are done on an Athlon 1GHz with 1GB ram.
> gcc 3.3 is 3.3.3 20040112, gcc 3.4 is 3.4.0 20040114, icc 8.0 is build
> 20031231Z.
>
> 	compile time	run time	binary size (static, stripped)
> gcc3.3	6m57s		1m24s		2877232
> 		8m23s		0m34s		2963408  __attribute__((leafify))
> gcc3.4	3m41s		1m10s		2584088
> 		5m04s		0m39s		2682520  __attribute__((leafify))
> icc8.0	12m41s		0m42s		5046476

And here come the tree-ssa numbers (very slightly different program and
run-time test, so again with gcc3.4 + leafify numbers):

compiler  compile time  run time  binary size
tree-ssa  3m35s         1m48s     1712992
gcc3.4    3m33s         0m51s     1738984  __attribute__((leafify))

So the tree-ssa compilation times are not too bad, and neither is the run
time.  Using -ftree-sra -ftree-points-to=andersen doesn't improve run-time
performance, though.

Richard.


* Re: povray: Revised numbers
  2004-01-20 19:24     ` Richard Guenther
@ 2004-01-20 20:32       ` Daniel Berlin
  2004-01-20 20:38         ` Steven Bosscher
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Berlin @ 2004-01-20 20:32 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc, Jan Hubicka, Scott Robert Ladd

> So the tree-ssa compilation times are not too bad, runtime, too.  Using
> -ftree-sra -ftree-points-to=andersen doesn't improve runtime performance,
> though.

tree-sra is on by default, so I don't know why you'd think that would 
improve performance.
As for points-to, as I've stated multiple times before, it's not going 
to help much (more than 1%) until I finish the code that uses the 
results to disambiguate which vars are call clobbered, alias global 
memory, and point to global memory.
Most of our optimizations are *not* blocked by aliasing other 
variables; they are blocked by thinking variables are call clobbered, 
alias global memory, or point to global memory.

One of the reasons I haven't requested that points-to be turned on by 
default is that it won't have a performance impact until I've finished 
the aforementioned code.
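
For readers unfamiliar with the analysis behind -ftree-points-to=andersen: Andersen-style points-to analysis is a flow-insensitive, inclusion-based analysis that iterates over pointer constraints until the points-to sets stop growing. The sketch below is a textbook-style toy version, not the tree-ssa implementation; the constraint encoding and the `andersen` function name are assumptions made for illustration.

```python
from collections import defaultdict


def andersen(constraints):
    """Compute points-to sets for a list of toy pointer constraints.

    Each constraint is a (kind, lhs, rhs) tuple:
      ("addr",  p, a)  models  p = &a
      ("copy",  p, q)  models  p = q
      ("load",  p, q)  models  p = *q
      ("store", p, q)  models  *p = q
    """
    pts = defaultdict(set)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for kind, lhs, rhs in constraints:
            if kind == "addr":
                updates = [(lhs, {rhs})]
            elif kind == "copy":
                updates = [(lhs, set(pts[rhs]))]
            elif kind == "load":
                merged = set()
                for target in set(pts[rhs]):
                    merged |= pts[target]
                updates = [(lhs, merged)]
            elif kind == "store":
                updates = [(target, set(pts[rhs]))
                           for target in set(pts[lhs])]
            else:
                raise ValueError(f"unknown constraint kind: {kind!r}")
            for dest, new in updates:
                if not new <= pts[dest]:
                    pts[dest] |= new
                    changed = True
    # Drop variables that point to nothing.
    return {var: targets for var, targets in pts.items() if targets}


if __name__ == "__main__":
    program = [
        ("addr", "p", "a"),   # p = &a
        ("addr", "q", "b"),   # q = &b
        ("addr", "r", "p"),   # r = &p
        ("store", "r", "q"),  # *r = q
        ("load", "s", "r"),   # s = *r
    ]
    for var, targets in sorted(andersen(program).items()):
        print(var, "->", sorted(targets))
```

In this example the store through `r` makes `p` point to both `a` and `b`, and `s` inherits the same set. The analysis only says which variables may alias; deciding which are call clobbered or touch global memory is the separate step described above.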

>
> Richard.


* Re: povray: Revised numbers
  2004-01-20 20:32       ` Daniel Berlin
@ 2004-01-20 20:38         ` Steven Bosscher
  2004-01-20 20:45           ` Daniel Berlin
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Bosscher @ 2004-01-20 20:38 UTC (permalink / raw)
  To: Daniel Berlin, Richard Guenther; +Cc: gcc, Jan Hubicka, Scott Robert Ladd

On Tuesday 20 January 2004 21:31, Daniel Berlin wrote:
> One of the reasons i haven't requested that points-to be turned on by
> default is that it won't have a performance impact until i've finished
> the aforementioned code.

It would give you some testing input though.  If it doesn't hurt
and it doesn't help, then by all means turn it on, so if there are
still bugs in it, we catch them early.

Gr.
Steven


* Re: povray: Revised numbers
  2004-01-20 20:38         ` Steven Bosscher
@ 2004-01-20 20:45           ` Daniel Berlin
  2004-01-20 23:08             ` Steven Bosscher
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Berlin @ 2004-01-20 20:45 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, Jan Hubicka, Scott Robert Ladd, Richard Guenther

On Jan 20, 2004, at 3:36 PM, Steven Bosscher wrote:

> On Tuesday 20 January 2004 21:31, Daniel Berlin wrote:
>> One of the reasons i haven't requested that points-to be turned on by
>> default is that it won't have a performance impact until i've finished
>> the aforementioned code.
>
> It would give you some testing input though.

Yeah, but so do the bootstrap and regression tests. :)
It might stop people from breaking it every third week by slightly 
changing GIMPLE predicates and whatnot (so we get operations we never 
had to handle before), but then they'd just ask me to fix it anyway.

Hey, I'm all for turning it on by default, don't get me wrong, I just 
don't see it doing anything for the majority of cases yet.  But if 
people are okay with that, ...


* Re: povray: Revised numbers
  2004-01-20 20:45           ` Daniel Berlin
@ 2004-01-20 23:08             ` Steven Bosscher
  0 siblings, 0 replies; 11+ messages in thread
From: Steven Bosscher @ 2004-01-20 23:08 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc, Jan Hubicka, Scott Robert Ladd, Richard Guenther

On Tuesday 20 January 2004 21:45, Daniel Berlin wrote:
> On Jan 20, 2004, at 3:36 PM, Steven Bosscher wrote:
> > On Tuesday 20 January 2004 21:31, Daniel Berlin wrote:
> >> One of the reasons i haven't requested that points-to be turned on by
> >> default is that it won't have a performance impact until i've finished
> >> the aforementioned code.
> >
> > It would give you some testing input though.
>
> Yeah, but so does the bootstrap and regression test.
>
> :)
>
> It might stop people from breaking it every third week by slightly
> changing GIMPLE predicates and whatnot (so we get operations we never
> had to handle before), but then they'd just ask me to fix it anyway.
>
> Hey, i'm all for turning it on by default, don't get me wrong, I just
> don't see it doing anything for the majority of cases yet.  But if
> people are okay with that, ...

I would like to turn it on by default at some optimization level
just to get it some testing.

I have bootstrapped with Andersen PTA enabled, and I've posted the test
results at http://gcc.gnu.org/ml/gcc-testresults/2004-01/msg00853.html.
While PTA is not used much yet, one FAIL is already fixed by this:

< FAIL: gcc.dg/tree-ssa/sra-3.c scan-tree-dump-times link_error 0

Nice.

Gr.
Steven



Thread overview: 11+ messages
2004-01-17 21:34 povray: Revised numbers Scott Robert Ladd
2004-01-17 21:48 ` Jan Hubicka
2004-01-17 22:00   ` Richard Guenther
2004-01-17 22:12     ` Jan Hubicka
2004-01-17 22:24       ` Richard Guenther
2004-01-17 22:41     ` Richard Henderson
2004-01-20 19:24     ` Richard Guenther
2004-01-20 20:32       ` Daniel Berlin
2004-01-20 20:38         ` Steven Bosscher
2004-01-20 20:45           ` Daniel Berlin
2004-01-20 23:08             ` Steven Bosscher
