public inbox for gcc@gcc.gnu.org
* Loop unrolling-related SPEC regressions?
@ 2002-02-01 10:32 Paolo Carlini
  2002-02-01 10:46 ` Richard Henderson
  2002-02-01 11:14 ` Joe Buck
  0 siblings, 2 replies; 41+ messages in thread
From: Paolo Carlini @ 2002-02-01 10:32 UTC (permalink / raw)
  To: gcc; +Cc: rth

Hi,

browsing the latest results from Andreas, it looks like a few of them (e.g.,
164.gzip, 186.crafty, 200.sixtrack) show a definite regression in the PEAK
case, which is built with -funroll-all-loops. Could it be related to the recent:

    http://gcc.gnu.org/ml/gcc-patches/2002-01/msg02199.html

??

Thanks,
Paolo.




* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 10:32 Loop unrolling-related SPEC regressions? Paolo Carlini
@ 2002-02-01 10:46 ` Richard Henderson
  2002-02-01 10:51   ` Paolo Carlini
  2002-02-01 11:14 ` Joe Buck
  1 sibling, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2002-02-01 10:46 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc

> browsing the latest results from Andreas, it looks like a few of them (e.g.,
> 164.gzip, 186.crafty, 200.sixtrack) are showing a definite regression in the
> PEAK case, characterized by -funroll-all-loops. May it be related to the
> recent:

Possibly.  If someone debugs this and finds a test case for which loop
unrolling fails where it succeeded before, I'll look at it.  I can't
promise to fix it though, since the fix may break the original test case.


r~


* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 10:46 ` Richard Henderson
@ 2002-02-01 10:51   ` Paolo Carlini
  2002-02-01 10:57     ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Carlini @ 2002-02-01 10:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: aj, gcc

Richard Henderson wrote:

> > browsing the latest results from Andreas, it looks like a few of them (e.g.,
> > 164.gzip, 186.crafty, 200.sixtrack) are showing a definite regression in the
> > PEAK case, characterized by -funroll-all-loops. May it be related to the
> > recent:
>
> Possibly.  If someone debugs this and finds a test case for which loop
> unrolling fails where it succeeded before, I'll look at it.

Thanks for your prompt reply. The problem is that SPEC is not publicly
available :-( Only Andreas could try to distil a testcase from the original
codes...

Otherwise, we have to find one somewhere else. Where??

>  I can't
> promise to fix it though, since the fix may break the original test case.

I see.

Cheers,
P.



* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 10:51   ` Paolo Carlini
@ 2002-02-01 10:57     ` Richard Henderson
  2002-02-01 11:12       ` Paolo Carlini
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2002-02-01 10:57 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: aj, gcc

On Fri, Feb 01, 2002 at 07:50:03PM +0100, Paolo Carlini wrote:
> Otherwise, we should find one somewhere else. Where??

There are other benchmarks.  Some of them are on gcc.gnu.org
somewhere (there's a link off the web pages).  Try them and
see if we regress -funroll-loops.

Note that I have no confidence that -funroll-all-loops is a
useful thing to try.  You're overriding the logic in the 
unroller that tries to decide if the unrolling would pay off.
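
To make the distinction concrete, here is a minimal hand-written sketch of
what unrolling does; it is an illustration only, not output from GCC's
unroller. As documented in the GCC manual, -funroll-loops unrolls loops whose
iteration count can be determined at compile time or on entry to the loop,
while -funroll-all-loops unrolls every loop, even when the count is uncertain
on entry. The function names below are hypothetical.

```c
#include <stddef.h>

/* Rolled loop: the trip count (n) is known on entry, so this is the
   kind of loop -funroll-loops is allowed to consider. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by hand, four elements per iteration, with a
   remainder loop for the last n % 4 elements.  Fewer branches, more code. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)          /* remainder iterations */
        s += a[i];
    return s;
}

/* Trip count unknown on entry: the loop stops at the first zero element.
   Only -funroll-all-loops would force unrolling of a loop like this. */
size_t count_nonzero_prefix(const int *a)
{
    size_t i = 0;
    while (a[i] != 0)
        i++;
    return i;
}
```

The unrolled version trades code size for fewer loop-control instructions,
which is why the unroller normally weighs whether the transformation pays off.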


r~


* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 10:57     ` Richard Henderson
@ 2002-02-01 11:12       ` Paolo Carlini
  2002-02-04  8:37         ` Andreas Jaeger
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Carlini @ 2002-02-01 11:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc, aj

Richard Henderson wrote:

> On Fri, Feb 01, 2002 at 07:50:03PM +0100, Paolo Carlini wrote:
> > Otherwise, we should find one somewhere else. Where??
>
> There are other benchmarks.  Some of them are on gcc.gnu.org
> somewhere (there's a link off the web pages).  Try them and
> see if we regress -funroll-loops.

Ok. I will try to do my best during the weekend.

> Note that I have no confidence that -funroll-all-loops is a
> useful thing to try.  You're overriding the logic in the
> unroller that tries to decide if the unrolling would pay off.

I see.
Perhaps we could ask Andreas to help by doing a one-off SPEC run with
-funroll-loops instead (ideally two runs, one before and one after the unroller
patch).

Cheers,
Paolo.



* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 10:32 Loop unrolling-related SPEC regressions? Paolo Carlini
  2002-02-01 10:46 ` Richard Henderson
@ 2002-02-01 11:14 ` Joe Buck
  2002-02-04  8:45   ` Andreas Jaeger
  2002-02-04 10:58   ` Jan Hubicka
  1 sibling, 2 replies; 41+ messages in thread
From: Joe Buck @ 2002-02-01 11:14 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc, rth


> browsing the latest results from Andreas, it looks like a few of them (e.g.,
> 164.gzip, 186.crafty, 200.sixtrack) are showing a definite regression in the
> PEAK case, characterized by -funroll-all-loops.

It's not clear to me that -funroll-all-loops is the correct setting for
PEAK, as bloating out the code may make the cache perform worse.


* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 11:12       ` Paolo Carlini
@ 2002-02-04  8:37         ` Andreas Jaeger
  2002-02-04  9:05           ` Paolo Carlini
  0 siblings, 1 reply; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-04  8:37 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: Richard Henderson, gcc

Paolo Carlini <pcarlini@unitus.it> writes:

> Perhaps we could ask Andreas to help by running an exceptional SPEC test with
> -funroll-loops instead (ideally, 2 different runs, pre- and post- the unroller
> patch).

Sorry for joining in late, I've been travelling.

Tell me exactly which patch I should revert and which compiler flags I
should use and I'll bootstrap two GCCs and run one SPECint run using
the different compilers for base and peak.

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 11:14 ` Joe Buck
@ 2002-02-04  8:45   ` Andreas Jaeger
  2002-02-04 10:58   ` Jan Hubicka
  1 sibling, 0 replies; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-04  8:45 UTC (permalink / raw)
  To: Joe Buck; +Cc: Paolo Carlini, gcc, rth

Joe Buck <jbuck@synopsys.COM> writes:

>> browsing the latest results from Andreas, it looks like a few of them (e.g.,
>> 164.gzip, 186.crafty, 200.sixtrack) are showing a definite regression in the
>> PEAK case, characterized by -funroll-all-loops.
>
> It's not clear to me that -funroll-all-loops is the correct setting for
> PEAK, as bloating out the code may make the cache perform worse.

I know it's not the best setting, but I'm not going for the best
numbers but for consistency, and I like to test different areas of the
compiler.

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04  8:37         ` Andreas Jaeger
@ 2002-02-04  9:05           ` Paolo Carlini
  2002-02-04 11:15             ` Andreas Jaeger
  2002-02-06  0:59             ` Andreas Jaeger
  0 siblings, 2 replies; 41+ messages in thread
From: Paolo Carlini @ 2002-02-04  9:05 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: gcc, rth

Andreas Jaeger wrote:

>Paolo Carlini <pcarlini@unitus.it> writes:
>
>>Perhaps we could ask Andreas to help by running an exceptional SPEC test with
>>-funroll-loops instead (ideally, 2 different runs, pre- and post- the unroller
>>patch).
>>
>Sorry for joining in late, I've been travelling.
>
>Tell me exactly which patch I should revert and which compiler flags I
>should use and I'll bootstrap two GCCs and run one SPECint run using
>the different compilers for base and peak.
>
Thank you very much for your feedback Andreas.
With your help we could try to answer the following question: does RTH's patch
negatively affect SPEC runs (*) for a PEAK setup identical to the one you
currently use *but* with -funroll-loops (instead of -funroll-all-loops)? In
the process, we could also learn more about the -funroll-all-loops vs.
-funroll-loops issue itself.
Therefore, if you agree, this is the patch which should be tentatively
reverted:

    http://gcc.gnu.org/ml/gcc-patches/2002-01/msg02199.html

Thanks,
Paolo.

(*) When I say "affect negatively" I really mean the following: there is
a good amount of evidence that, due to that patch, the following tests
lose many points: 164.gzip, 186.crafty, 200.sixtrack.





>
>
>Andreas
>


-- 
Paolo Carlini
Dipartimento di Scienze Ambientali
Università degli Studi della Tuscia
Largo dell'Università, I-01100, Viterbo, ITALY




* Re: Loop unrolling-related SPEC regressions?
  2002-02-01 11:14 ` Joe Buck
  2002-02-04  8:45   ` Andreas Jaeger
@ 2002-02-04 10:58   ` Jan Hubicka
  2002-02-04 11:07     ` Paolo Carlini
  2002-02-04 11:20     ` Joe Buck
  1 sibling, 2 replies; 41+ messages in thread
From: Jan Hubicka @ 2002-02-04 10:58 UTC (permalink / raw)
  To: Joe Buck; +Cc: Paolo Carlini, gcc, rth

> 
> > browsing the latest results from Andreas, it looks like a few of them (e.g.,
> > 164.gzip, 186.crafty, 200.sixtrack) are showing a definite regression in the
> > PEAK case, characterized by -funroll-all-loops.
> 
> It's not clear to me that -funroll-all-loops is the correct setting for
> PEAK, as bloating out the code may make the cache perform worse.

We use them in the testing runs for exactly this purpose.
It tends to expose the "bugs" that cause unnecessary code growth in some
areas and go unnoticed by other benchmarks.
The base/peak flags are not supposed to bring the best performance,
but to be good for testing the majority of gcc features.

Honza


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 10:58   ` Jan Hubicka
@ 2002-02-04 11:07     ` Paolo Carlini
  2002-02-04 12:12       ` Andreas Jaeger
  2002-02-04 11:20     ` Joe Buck
  1 sibling, 1 reply; 41+ messages in thread
From: Paolo Carlini @ 2002-02-04 11:07 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

Jan Hubicka wrote:

> >
> > > browsing the latest results from Andreas, it looks like a few of them (e.g.,
> > > 164.gzip, 186.crafty, 200.sixtrack) are showing a definite regression in the
> > > PEAK case, characterized by -funroll-all-loops.
> >
> > It's not clear to me that -funroll-all-loops is the correct setting for
> > PEAK, as bloating out the code may make the cache perform worse.
>
> We do use them in the testing runs for exactly these purposes.
> It tends to show the "bugs" that cause unnecessary code growth in some
> areas unnoticed by other benchmarks.
> The base/peak flags are not supposed to bring best performance,
> but be good for testing majority of gcc features.

That's really enlightening, Honza! Thanks for the clarification.
We should also remember this when someone compares the SPEC numbers made available
by other compiler producers with those of GCC: my guess is that this kind of
rationale for choosing the PEAK flags is unfortunately not so widespread...

Cheers,
Paolo.



* Re: Loop unrolling-related SPEC regressions?
  2002-02-04  9:05           ` Paolo Carlini
@ 2002-02-04 11:15             ` Andreas Jaeger
  2002-02-06  0:59             ` Andreas Jaeger
  1 sibling, 0 replies; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-04 11:15 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc, rth

Paolo Carlini <pcarlini@unitus.it> writes:

> Andreas Jaeger wrote:
>
>>Paolo Carlini <pcarlini@unitus.it> writes:
>>
>>>Perhaps we could ask Andreas to help by running an exceptional SPEC test with
>>>-funroll-loops instead (ideally, 2 different runs, pre- and post- the unroller
>>>patch).
>>>
>>Sorry for joining in late, I've been travelling.
>>
>>Tell me exactly which patch I should revert and which compiler flags I
>>should use and I'll bootstrap two GCCs and run one SPECint run using
>>the different compilers for base and peak.
>>
> Thank you very much for your feedback Andreas.
> With your help we could try to understand the following: RTH patch
> affects negatively SPEC runs (*) for a PEAK setup identical to that
> which you currently use *but* with -funroll-loops (instead of
> -funroll-all-loops) or not? In the process, we could also understand
> more of the issue itself -funroll-all-loops vs. -funroll-loops.
> Therefore, if you agree, this is the patch which should be tentatively
> reverted:
>
>     http://gcc.gnu.org/ml/gcc-patches/2002-01/msg02199.html
>
> Thanks,
> Paolo.
>
> (*) When I say "affect negatively" I really mean the following: there
> is a good amount of evidence that due to that patch the following
> tests lose many points: 164.gzip, 186.crafty, 200.sixtrack.

I have some scripts [1] that bootstrap GCC and automatically run SPEC.
I'll try to set up some tests tomorrow and send the results.

Andreas


Footnotes: 
[1]  If anybody would like to have my scripts, just ask me.

-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 10:58   ` Jan Hubicka
  2002-02-04 11:07     ` Paolo Carlini
@ 2002-02-04 11:20     ` Joe Buck
  2002-02-04 11:33       ` Jan Hubicka
  1 sibling, 1 reply; 41+ messages in thread
From: Joe Buck @ 2002-02-04 11:20 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joe Buck, Paolo Carlini, gcc, rth

Honza writes:

> The base/peak flags are not supposed to bring best performance,
> but be good for testing majority of gcc features.

gcc's competition, though, tends to use them that way (choosing options
that meet the criteria but give best performance).

Not that I want to get into a war on that front, but ...


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 11:20     ` Joe Buck
@ 2002-02-04 11:33       ` Jan Hubicka
  2002-02-04 11:47         ` Jan Hubicka
  2002-02-05  9:59         ` David Edelsohn
  0 siblings, 2 replies; 41+ messages in thread
From: Jan Hubicka @ 2002-02-04 11:33 UTC (permalink / raw)
  To: Joe Buck; +Cc: Jan Hubicka, Paolo Carlini, gcc, rth

> Honza writes:
> 
> > The base/peak flags are not supposed to bring best performance,
> > but be good for testing majority of gcc features.
> 
> gcc's competition, though, tends to use them that way (choosing options
> that meet the criteria but give best performance).
> 
> Not that I want to get into a war on that front, but ...
Hmm, perhaps we can try to produce a kind of "official" SPEC result when the 3.1 release
is out.  Andreas did some experimentation with the various options (it is
linked from the page), and as I remember, loop unrolling with -funroll-loops
had a neutral effect overall, while -funroll-all-loops caused a slight performance
drop.

Things may have changed, as I tried to investigate some of the regressions and
address some of the code size issues. The code produced by 3.1 should be
considerably smaller than the code produced by 3.0.

Anyway, I would like to see the recent regressions solved. Some of them
appear to be due to these patches:

2001-11-17  Corey Minyard  <minyard@acm.org>
	    Richard Henderson  <rth@redhat.com>

	* unroll.c (loop_iterations): Detect one situation in which we
	overestimate the number of iterations.

And:

2001-11-30  Zoltan Hidvegi  <hzoli@hzoli.2y.net>

	* unroll.c (unroll_loop): Correct special exit cases.

I tried to investigate these, but lacking a simple testcase I found it quite
difficult.  I fixed some defects, but the overall results still do not
improve.  For 3.2 we hope to have new loop unroller code ready, but for
3.1 this appears to be an important issue.
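
As a rough illustration of why an overestimated iteration count matters to an
unroller, the sketch below computes the trip count of a simple counted loop
and contrasts it with a naive estimate that is too large by one in some cases.
This is not the code in unroll.c and does not reproduce the situation the
ChangeLog entry refers to; the function names are hypothetical and the
arithmetic assumes small ranges where end - start does not overflow.

```c
/* Exact trip count of:  for (i = start; i < end; i += step)
   assuming step > 0 and no overflow in (end - start). */
unsigned long trip_count(long start, long end, long step)
{
    if (step <= 0 || start >= end)
        return 0;
    unsigned long span = (unsigned long)(end - start);
    unsigned long s = (unsigned long)step;
    return (span + s - 1) / s;      /* ceiling division */
}

/* Naive estimate: overestimates by one whenever (end - start) is an
   exact multiple of step, e.g. start=0, end=8, step=2 gives 5, not 4. */
unsigned long naive_trip_count(long start, long end, long step)
{
    if (step <= 0 || start >= end)
        return 0;
    return (unsigned long)((end - start) / step) + 1;
}
```

An unroller that trusts the larger number could, for example, emit one copy
of the loop body too many, or skip a remainder loop that is actually needed.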

Honza


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 11:33       ` Jan Hubicka
@ 2002-02-04 11:47         ` Jan Hubicka
  2002-02-05  9:59         ` David Edelsohn
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Hubicka @ 2002-02-04 11:47 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joe Buck, Paolo Carlini, gcc, rth

> > Honza writes:
> > 
> > > The base/peak flags are not supposed to bring best performance,
> > > but be good for testing majority of gcc features.
> > 
> > gcc's competition, though, tends to use them that way (choosing options
> > that meet the criteria but give best performance).
> > 
> > Not that I want to get into a war on that front, but ...
> Hmm, perhaps we can try to make kind of "official" SPEC results when 3.1 release
> is out.  Andreas did some experimentation with the various options (it is
> linked from the page), and as I remember the loop unrolling -funroll-loops
> had neutral effect overall, while -funroll-all-loops caused slight performance
> drop.
Oops, sorry.
Looking at the numbers, -funroll-loops and -funroll-all-loops are equivalent in
Andreas' testing and both slightly (6 seconds) better.
For 3.1 I would guess them to be more of a win because of code size savings
around the compiler.

Note that for Athlon the optimizer manual recommends heavy inlining and unrolling
as the cache sizes are pretty big.  This is the case for common benchmarks, even
for spec2000, which are often smaller than real-world applications.

Honza


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 11:07     ` Paolo Carlini
@ 2002-02-04 12:12       ` Andreas Jaeger
  2002-02-04 16:36         ` Paolo Carlini
  2002-02-04 21:34         ` Tim Prince
  0 siblings, 2 replies; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-04 12:12 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: Jan Hubicka, gcc

Paolo Carlini <pcarlini@unitus.it> writes:

> Jan Hubicka wrote:
>
>> >
>> > > browsing the latest results from Andreas, it looks like a few of them (e.g.,
>> > > 164.gzip, 186.crafty, 200.sixtrack) are showing a definite regression in the
>> > > PEAK case, characterized by -funroll-all-loops.
>> >
>> > It's not clear to me that -funroll-all-loops is the correct setting for
>> > PEAK, as bloating out the code may make the cache perform worse.
>>
>> We do use them in the testing runs for exactly these purposes.
>> It tends to show the "bugs" that cause unnecessary code growth in some
>> areas unnoticed by other benchmarks.
>> The base/peak flags are not supposed to bring best performance,
>> but be good for testing majority of gcc features.
>
> That's really enlightening Honza! Thanks for the clarification.
> We should also remember this when someone compares the SPEC numbers made available
> by other compiler producers with those of GCC: my guess is that this kind of
> rationale for choosing the PEAK flags is unfortunately not so widespread...

Didn't I mention it that way?  Feel free to send a patch for my SPEC
page to clarify what we're doing...

thanks,
Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 12:12       ` Andreas Jaeger
@ 2002-02-04 16:36         ` Paolo Carlini
  2002-02-04 21:34         ` Tim Prince
  1 sibling, 0 replies; 41+ messages in thread
From: Paolo Carlini @ 2002-02-04 16:36 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: gcc

Andreas Jaeger wrote:

> > That's really enlightening Honza! Thanks for the clarification.
> > We should also remember this when someone compares the SPEC numbers made available
> > by other compiler producers with those of GCC: my guess is that this kind of
> > rationale for choosing the PEAK flags is unfortunately not so widespread...
>
> Didn't I mention it that way?  Feel free to send a patch for my SPEC
> page to clarify what we're doing...

No, your pages do indeed present the tests exactly this way. It's my fault for not having
read the descriptive text attentively before.

Anyway, I look forward to seeing your numbers for the PEAK-type flags (but with
-funroll-loops) with and without RTH's unroller fix.

Thanks,
Paolo.



* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 12:12       ` Andreas Jaeger
  2002-02-04 16:36         ` Paolo Carlini
@ 2002-02-04 21:34         ` Tim Prince
  2002-02-05  4:32           ` Jan Hubicka
  1 sibling, 1 reply; 41+ messages in thread
From: Tim Prince @ 2002-02-04 21:34 UTC (permalink / raw)
  To: Andreas Jaeger, Paolo Carlini; +Cc: Jan Hubicka, gcc

On Monday 04 February 2002 11:48, Andreas Jaeger wrote:
> Paolo Carlini <pcarlini@unitus.it> writes:
> > Jan Hubicka wrote:
> 
> >> The base/peak flags are not supposed to bring best performance,
> >> but be good for testing majority of gcc features.
> >
> > That's really enlightening Honza! Thanks for the clarification.
> > We should also remember this when someone compares the SPEC numbers made
> > available by other compiler producers with those of GCC: my guess is that
> > this kind of rationale for choosing the PEAK flags is unfortunately not
> > so widespread...
>
> Didn't I mention it that way?  Feel free to send a patch for my SPEC
> page to clarify what we're doing...
Of course, compilers which are sold on the basis of SPEC base performance
have a different approach to default options than gcc.  One expects the
base option set to be the one which is the best single setting conforming to 
the limit on number of options, to obtain the highest rating.  Thus, a 
compiler such as Intel's makes a simple option package such as 
'icc -xW -Oi-'
roughly equivalent to
'gcc -msse2 -march=pentium4 -Os -funroll-loops -mpreferred-stack-boundary=4 
-ffast-math'
with even the base rating depending on Profile Guided Optimization.
Of course, one expects the peak rating to be found with a set of options 
which produces the fastest acceptable result for each test, not necessarily 
the most aggressive group of optimizations.  In that light, the SPEC 
disclosures allow one to speculate as to how much trial and error work was 
needed to obtain the results submitted, and how much more might be needed to 
achieve equivalent performance on a typical application.

I thank Andreas and Honza for explaining the difference between what they 
have done and what some of us may have expected.


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 21:34         ` Tim Prince
@ 2002-02-05  4:32           ` Jan Hubicka
  2002-02-05 14:05             ` Geoff Keating
  2002-02-05 20:57             ` Tim Prince
  0 siblings, 2 replies; 41+ messages in thread
From: Jan Hubicka @ 2002-02-05  4:32 UTC (permalink / raw)
  To: Tim Prince; +Cc: Andreas Jaeger, Paolo Carlini, Jan Hubicka, gcc

> On Monday 04 February 2002 11:48, Andreas Jaeger wrote:
> > Paolo Carlini <pcarlini@unitus.it> writes:
> > > Jan Hubicka wrote:
> > 
> > >> The base/peak flags are not supposed to bring best performance,
> > >> but be good for testing majority of gcc features.
> > >
> > > That's really enlightening Honza! Thanks for the clarification.
> > > We should also remember this when someone compares the SPEC numbers made
> > > available by other compiler producers with those of GCC: my guess is that
> > > this kind of rationale for choosing the PEAK flags is unfortunately not
> > > so widespread...
> >
> > Didn't I mention it that way?  Feel free to send a patch for my SPEC
> > page to clarify what we're doing...
> Of course, compilers which are sold on the basis of SPEC base performance 
> have different approach to default options than gcc.  One expects the 
> base option set to be the one which is the best single setting conforming to 
> the limit on number of options, to obtain the highest rating.  Thus, a 
> compiler such as Intel's makes a simple option package such as 
> 'icc -xW -Oi-'
> roughly equivalent to
> 'gcc -msse2 -march=pentium4 -Os -funroll-loops -mpreferred-stack-boundary=4 
> -ffast-math'

I am playing with the idea of making -O behave like -f and allowing -Ospeed
"optimize for maximal speed under common circumstances" and -Osize.  We could also
invent -O[no]debug "prohibit optimizations that make debugging difficult, like
tail call optimization, frame pointer elimination, or (currently) register
renaming", or -Odangerous "enable transformations that break the language standard".

Perhaps that would be useful not only to "fit in" the spec2000 rules, but also
to avoid confusing users. Many published benchmarks use far from "sane"
compilation switches.
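
For readers wondering why tail call optimization is on the "makes debugging
difficult" list, here is a small sketch, with a hypothetical function name.
When the compiler turns the final recursive call into a jump (in GCC this is
controlled by -foptimize-sibling-calls), the intermediate stack frames no
longer exist, so a debugger backtrace taken inside the function may show only
one frame instead of one per recursion level.

```c
/* Sums 1..n into acc.  The recursive call is in tail position: nothing
   remains to be done in the caller after it returns, so an optimizing
   compiler may reuse the current frame instead of pushing a new one. */
long sum_to(long n, long acc)
{
    if (n <= 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail call */
}
```

The result is the same either way; only the shape of the call stack that a
debugger can observe changes.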

Honza
> with even the base rating depending on Profile Guided Optimization.
> Of course, one expects the peak rating to be found with a set of options 
> which produces the fastest acceptable result for each test, not necessarily 
> the most aggressive group of optimizations.  In that light, the SPEC 
> disclosures allow one to speculate as to how much trial and error work was 
> needed to obtain the results submitted, and how much more might be needed to 
> achieve equivalent performance on a typical application.
> 
> I thank Andreas and Honza for explaining the difference between what they 
> have done and what some of us may have expected.


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04 11:33       ` Jan Hubicka
  2002-02-04 11:47         ` Jan Hubicka
@ 2002-02-05  9:59         ` David Edelsohn
  1 sibling, 0 replies; 41+ messages in thread
From: David Edelsohn @ 2002-02-05  9:59 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joe Buck, Paolo Carlini, gcc, rth

>>>>> Jan Hubicka writes:

Jan> Anyway I would like to see the recent regressions solved. Some of them
Jan> appears to be due to patch:

Jan> 2001-11-17  Corey Minyard  <minyard@acm.org>
Jan> Richard Henderson  <rth@redhat.com>

Jan> * unroll.c (loop_iterations): Detect one situation in which we
Jan> overestimate the number of iterations.

Jan> And:

Jan> 2001-11-30  Zoltan Hidvegi  <hzoli@hzoli.2y.net>

Jan> * unroll.c (unroll_loop): Correct special exit cases.

	Does this regression also exist on the GCC 3.0 branch?  When I
tried to integrate Zoli's patch into the GCC trunk, I ran into conflicts
with Richard's patch.  Richard duplicated some of Zoli's work with a
similar patch.  Richard's patch to unroll.c is a subset of Zoli's patch.

	After the problem getting Zoli's original patch reviewed and
approved, I decided to wait and see if Richard's subset was enough.  Given
the hostile reception to Zoli's original patches, it is unfortunate that
Richard had to duplicate the work and create nearly identical fixes.

	I would suggest you investigate whether replacing Richard's
partial patch with Zoli's complete version of the patch for unroll.c fixes
the problem.

	Zoli's patch already was approved by Mark Mitchell.  Zoli's
complete version of the patch is in GCC 3.0 branch.  Unless Richard
objects, if Zoli's complete patch fixes the problem, we should be able to
substitute Zoli's patch for Richard's.

David


* Re: Loop unrolling-related SPEC regressions?
  2002-02-05  4:32           ` Jan Hubicka
@ 2002-02-05 14:05             ` Geoff Keating
  2002-02-06  3:32               ` Jan Hubicka
  2002-02-05 20:57             ` Tim Prince
  1 sibling, 1 reply; 41+ messages in thread
From: Geoff Keating @ 2002-02-05 14:05 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

Jan Hubicka <jh@suse.cz> writes:

> I am playing with the idea of making -O behaving like -f and allowing -Ospeed
> "optimize for maximal speed for common circumstances" and -Osize.  We can also
> invent -O[no]debug "prohibit optimizations that make debugging difficult, like
> tail call optimization, frame pointer elimination, or (currently) register
> renaming", 

Don't we already have these?  -O3 is what you call 'Ospeed', -Os is
what you call 'Osize', '-O0' is Odebug.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>


* Re: Loop unrolling-related SPEC regressions?
  2002-02-05  4:32           ` Jan Hubicka
  2002-02-05 14:05             ` Geoff Keating
@ 2002-02-05 20:57             ` Tim Prince
  1 sibling, 0 replies; 41+ messages in thread
From: Tim Prince @ 2002-02-05 20:57 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Andreas Jaeger, Paolo Carlini, Jan Hubicka, gcc

On Tuesday 05 February 2002 04:29, Jan Hubicka wrote:
> > On Monday 04 February 2002 11:48, Andreas Jaeger wrote:
> > > Paolo Carlini <pcarlini@unitus.it> writes:
> > > > Jan Hubicka wrote:
> > > >> The base/peak flags are not supposed to bring best performance,
> > > >> but be good for testing majority of gcc features.
> > > >
> > > > That's really enlightening Honza! Thanks for the clarification.
> > > > We should also remember this when someone compares the SPEC numbers
> > > > made available by other compiler producers with those of GCC: my
> > > > guess is that this kind of rationale for choosing the PEAK flags is
> > > > unfortunately not so widespread...
> > >
> > > Didn't I mention it that way?  Feel free to send a patch for my SPEC
> > > page to clarify what we're doing...
> >
> > Of course, compilers which are sold on the basis of SPEC base performance
> > have different approach to default options than gcc.  One expects the
> > base option set to be the one which is the best single setting conforming
> > to the limit on number of options, to obtain the highest rating.  Thus, a
> > compiler such as Intel's makes a simple option package such as
> > 'icc -xW -Oi-'
> > roughly equivalent to
> > 'gcc -msse2 -march=pentium4 -Os -funroll-loops
> > -mpreferred-stack-boundary=4 -ffast-math'
>
> I am playing with the idea of making -O behaving like -f and allowing
> -Ospeed "optimize for maximal speed for common circumstances" and -Osize.  We
> can also invent -O[no]debug "prohibit optimizations that make debugging
> difficult, like tail call optimization, frame pointer elimination, or
> (currently) register renaming", or -Odangerous "enable language standard
> breaking transformations".
I thought that was the meaning of -ffast-math, or do you mean some 
combination of optimization which includes -ffast-math and -funroll-loops?
>
> Perhaps that can be useful not only to "fit in" the spec2000 rules, but
> also to avoid confusion of users. Many benchmarks published use far from
> "sane" compilation switches.
Yes, there is a need for a simple switch which includes most "sane" 
optimizations which are useful for a specified architecture, even if it is 
not quite sufficient for a good SPEC score. 
>
> Honza
>
> > with even the base rating depending on Profile Guided Optimization.
> > Of course, one expects the peak rating to be found with a set of options
> > which produces the fastest acceptable result for each test, not
> > necessarily the most aggressive group of optimizations.  In that light,
> > the SPEC disclosures allow one to speculate as to how much trial and
> > error work was needed to obtain the results submitted, and how much more
> > might be needed to achieve equivalent performance on a typical
> > application.
> >
> > I thank Andreas and Honza for explaining the difference between what they
> > have done and what some of us may have expected.


* Re: Loop unrolling-related SPEC regressions?
  2002-02-04  9:05           ` Paolo Carlini
  2002-02-04 11:15             ` Andreas Jaeger
@ 2002-02-06  0:59             ` Andreas Jaeger
  2002-02-06  1:05               ` Paolo Carlini
  2002-02-06 13:08               ` Andreas Jaeger
  1 sibling, 2 replies; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-06  0:59 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc, rth

[-- Attachment #1: Type: text/plain, Size: 388 bytes --]


Here're the results of:
Base Compiler: GCC CVS as of Feb 5 2002 8am UTC
Peak Compiler: base plus reversion of loop unrolling patch
cflags base: -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
cflags peak: -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
Iterations: 3

Tell me if you would like to see other runs,

Andreas


[-- Attachment #2: CINT2000.010.asc --]
[-- Type: text/plain, Size: 7228 bytes --]

##############################################################################
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
#                                                                            #
# 'reportable' flag not set during run                                       #
#                                                                            #
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
##############################################################################
                            SPEC CINT2000 Summary
                               Unknown Unknown
                             Tested by SuSE GmbH
                           Tue Feb  5 10:02:30 2002

SPEC License #1922  Test date: 2002-02-05   Hardware availability: June 2001
Tester: Andreas Jaeger, SuSE GmbH           Software availability: Now

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400       562       249      1400       560       250 
   164.gzip          1400       561       250*     1400       559       250*
   164.gzip          1400       559       251      1400       559       251 
   175.vpr           1400       700       200*     1400       699       200*
   175.vpr           1400       701       200      1400       700       200 
   175.vpr           1400       699       200      1400       699       200 
   176.gcc           1100       449       245      1100       451       244*
   176.gcc           1100       450       244      1100       451       244 
   176.gcc           1100       449       245*     1100       451       244 
   181.mcf           1800      1048       172*     1800      1048       172 
   181.mcf           1800      1048       172      1800      1045       172 
   181.mcf           1800      1047       172      1800      1047       172*
   186.crafty        1000       279       358*     1000       279       358*
   186.crafty        1000       279       358      1000       279       358 
   186.crafty        1000       279       358      1000       279       358 
   197.parser        1800       795       226      1800       796       226 
   197.parser        1800       795       226      1800       795       226 
   197.parser        1800       795       226*     1800       796       226*
   252.eon           1300       983       132      1300       983       132 
   252.eon           1300       983       132*     1300       983       132*
   252.eon           1300       984       132      1300       983       132 
   253.perlbmk       1800       592       304      1800       592       304 
   253.perlbmk       1800       592       304      1800       592       304*
   253.perlbmk       1800       592       304*     1800       592       304 
   254.gap           1100       526       209      1100       525       209 
   254.gap           1100       525       210      1100       525       209*
   254.gap           1100       525       209*     1100       525       210 
   255.vortex        1900       595       319*     1900       595       319*
   255.vortex        1900       597       318      1900       596       319 
   255.vortex        1900       595       319      1900       595       319 
   256.bzip2         1500       827       181      1500       827       181 
   256.bzip2         1500       826       182      1500       826       182 
   256.bzip2         1500       826       182*     1500       826       182*
   300.twolf         3000      1327       226      3000      1329       226 
   300.twolf         3000      1334       225      3000      1340       224*
   300.twolf         3000      1329       226*     3000      1341       224 
   ========================================================================
   164.gzip          1400       561       250*     1400       559       250*
   175.vpr           1400       700       200*     1400       699       200*
   176.gcc           1100       449       245*     1100       451       244*
   181.mcf           1800      1048       172*     1800      1047       172*
   186.crafty        1000       279       358*     1000       279       358*
   197.parser        1800       795       226*     1800       796       226*
   252.eon           1300       983       132*     1300       983       132*
   253.perlbmk       1800       592       304*     1800       592       304*
   254.gap           1100       525       209*     1100       525       209*
   255.vortex        1900       595       319*     1900       595       319*
   256.bzip2         1500       826       182*     1500       826       182*
   300.twolf         3000      1329       226*     3000      1340       224*
   Est. SPECint_base2000                  227
   Est. SPECint2000                                                     227


                                   HARDWARE
                                   --------
     Hardware Vendor: Unknown
          Model Name: Unknown
                 CPU: AMD Athlon(tm) Processor
             CPU MHz: 1102.541
                 FPU: Integrated
      CPU(s) enabled: 1
    CPU(s) orderable: 1
            Parallel: No
       Primary Cache:  
     Secondary Cache: 256 KB
            L3 Cache: N/A
         Other Cache: N/A
              Memory: 496 MB
      Disk Subsystem: Unknown
      Other Hardware: Ethernet


                                   SOFTWARE
                                   --------
    Operating System: SuSE Linux 7.3 (i386)
            Compiler: GCC CVS
         File System: Linux/ReiserFS
        System State: Multi-User


                                    NOTES
                                    -----
     Base flags: -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
     base plus reversion of loop unrolling patch
     Peak flags: -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
     Unspecified
     To compile and execute eon correctly the following extra flags
     are used for compilation: -ffast-math -fwritable-strings.
##############################################################################
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
#                                                                            #
# 'reportable' flag not set during run                                       #
#                                                                            #
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
##############################################################################
-----------------------------------------------------------------------------
For questions about this result, please contact the tester.
For other inquiries, please contact webmaster@spec.org.
Copyright 1999-2000 Standard Performance Evaluation Corporation
Generated on Wed Feb  6 00:44:09 2002 by SPEC CPU2000 ASCII formatter v2.1
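
For readers of the table above, here is a minimal sketch of how the columns appear to fit together, assuming each ratio is 100 * reference time / run time and the overall estimate is the geometric mean of the starred (median) ratios. The names BASE_MEDIANS, spec_ratio and geometric_mean are made up for this illustration, and the run times are the rounded values printed in the report, so the last digit can differ from what the SPEC tools compute internally.

```python
from math import prod

# (reference time, median run time) in seconds, copied from the starred
# base rows of the summary table.
BASE_MEDIANS = {
    "164.gzip": (1400, 561),
    "175.vpr": (1400, 700),
    "176.gcc": (1100, 449),
    "181.mcf": (1800, 1048),
    "186.crafty": (1000, 279),
    "197.parser": (1800, 795),
    "252.eon": (1300, 983),
    "253.perlbmk": (1800, 592),
    "254.gap": (1100, 525),
    "255.vortex": (1900, 595),
    "256.bzip2": (1500, 826),
    "300.twolf": (3000, 1329),
}


def spec_ratio(ref_time, run_time):
    """Assumed definition: 100 times reference time over run time."""
    return 100.0 * ref_time / run_time


def geometric_mean(values):
    """n-th root of the product of n values."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


ratios = {name: spec_ratio(ref, run) for name, (ref, run) in BASE_MEDIANS.items()}
overall = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:6.0f}")
print(f"geometric mean {overall:.0f}")
```

Because a geometric mean is used, a benchmark that doubles its ratio moves the overall figure by the same factor whichever benchmark it is.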

[-- Attachment #3: Type: text/plain, Size: 100 bytes --]


-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06  0:59             ` Andreas Jaeger
@ 2002-02-06  1:05               ` Paolo Carlini
  2002-02-06  1:39                 ` Andreas Jaeger
  2002-02-06 13:08               ` Andreas Jaeger
  1 sibling, 1 reply; 41+ messages in thread
From: Paolo Carlini @ 2002-02-06  1:05 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: gcc, rth

Andreas Jaeger wrote:

>Tell me if you like to see other runs,
>
Thank you very much, Andreas.

>164.gzip          1400       561       250*     1400       559       250*
>
>186.crafty        1000       279       358*     1000       279       358*
>
Really indistinguishable, right?
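
One rough way to check "indistinguishable" is to compare the gap between the base and peak medians with the spread among the three runs of each. The sketch below does this for 164.gzip using the run times from the report; the variable names are hypothetical, and three runs are far too few for a real statistical test, so this is only a sanity check.

```python
from statistics import median

# Run times in seconds for 164.gzip, three iterations each, from the report.
gzip_base = [562, 561, 559]
gzip_peak = [560, 559, 559]

base_median = median(gzip_base)
peak_median = median(gzip_peak)

# Largest run-to-run spread seen within either configuration.
run_spread = max(
    max(gzip_base) - min(gzip_base),
    max(gzip_peak) - min(gzip_peak),
)

# Gap between the two configurations, measured on the medians.
median_gap = abs(base_median - peak_median)

# If the gap is no larger than the spread, the runs cannot tell the two apart.
within_noise = median_gap <= run_spread

print(base_median, peak_median, run_spread, median_gap, within_noise)
```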

>Est. SPECint_base2000                  227
>   Est. SPECint2000                                                     227
>
Andreas, please excuse my *very* stupid question (for sure I could find
this explained somewhere in your pages ;-)
How do these SPEC indexes compare with those you publish on the web? I mean,
they are roughly half the size.

Thanks, Paolo.



* Re: Loop unrolling-related SPEC regressions?
  2002-02-06  1:05               ` Paolo Carlini
@ 2002-02-06  1:39                 ` Andreas Jaeger
  2002-02-06  1:43                   ` Paolo Carlini
  0 siblings, 1 reply; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-06  1:39 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc, rth

Paolo Carlini <pcarlini@unitus.it> writes:

> Andreas Jaeger wrote:
>
>>Tell me if you like to see other runs,
>>
> Thank you very much, Andreas.
>
>>164.gzip          1400       561       250*     1400       559       250*
>>
>>186.crafty        1000       279       358*     1000       279       358*
>>
> Really indistinguishable, right?

Yes.

>>Est. SPECint_base2000                  227
>>   Est. SPECint2000                                                     227
>>
> Andreas, please excuse my *very* stupid question (for sure I could
> find this explained somewhere in your pages ;-)
> How do these SPEC indexes compare with those you publish on the web? I
> mean, they are roughly half the size.

Let's check... Oh, it's a different machine; it's the one I
use for:
http://www.suse.de/%7Eaj/SPEC/CINT/sandbox/index.html

But nevertheless the numbers look wrong...

I found it - I forgot to add -O3 :-(

OK, I am rerunning the tests with the correct flags now:
cflags base: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
cflags peak: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double

Sorry - and thanks for looking closer into these...

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06  1:43                   ` Paolo Carlini
@ 2002-02-06  1:43                     ` Andreas Jaeger
  2002-02-06  1:50                     ` Andreas Jaeger
  1 sibling, 0 replies; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-06  1:43 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc

Paolo Carlini <pcarlini@unitus.it> writes:

> Andreas Jaeger wrote:
>
>>I found it - I forgot to add -O3 :-(
>>
>>OK, I am rerunning the tests with the correct flags now:
>>cflags base: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
>>cflags peak: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
>>
>>Sorry - and thanks for looking closer into these...
>>
> Ok :-)
>
> Any chance you can also run SPECfp2000 (at least 200.sixtrack)?

No problem, I'll run everything.

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06  1:39                 ` Andreas Jaeger
@ 2002-02-06  1:43                   ` Paolo Carlini
  2002-02-06  1:43                     ` Andreas Jaeger
  2002-02-06  1:50                     ` Andreas Jaeger
  0 siblings, 2 replies; 41+ messages in thread
From: Paolo Carlini @ 2002-02-06  1:43 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: gcc

Andreas Jaeger wrote:

>I found it - I forgot to add -O3 :-(
>
>OK, I am rerunning the tests with the correct flags now:
>cflags base: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
>cflags peak: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
>
>Sorry - and thanks for looking closer into these...
>
Ok :-)

Any chance you can also run SPECfp2000 (at least 200.sixtrack)?

Thanks, Paolo.



* Re: Loop unrolling-related SPEC regressions?
  2002-02-06  1:43                   ` Paolo Carlini
  2002-02-06  1:43                     ` Andreas Jaeger
@ 2002-02-06  1:50                     ` Andreas Jaeger
  1 sibling, 0 replies; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-06  1:50 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc



Paolo, I'll send you the results automatically when they're finished
(I use a script that mails the output ;-) - and send both of them
tomorrow to the mailing list manually.

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-05 14:05             ` Geoff Keating
@ 2002-02-06  3:32               ` Jan Hubicka
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Hubicka @ 2002-02-06  3:32 UTC (permalink / raw)
  To: Geoff Keating; +Cc: Jan Hubicka, gcc

> Jan Hubicka <jh@suse.cz> writes:
> 
> > I am playing with the idea of making -O behave like -f and allowing -Ospeed
> > "optimize for maximal speed for common circumstances" and -Osize.  We can also
> > invent -O[no]debug "prohibit optimizations that make debugging difficult, like
> > tail call optimization, frame pointer elimination, or (currently) register
> > renaming", 
> 
> Don't we already have these?  -O3 is what you call 'Ospeed', -Os is
> what you call 'Osize', '-O0' is Odebug.

Yes and no. Definitely, to get the best performance it is not enough to
enable just -march=mymodel -O3.

We disable some options at higher optimization levels just to allow debugging
(like -fomit-frame-pointer, register renaming, loop unrolling).

It is somewhat difficult for users to find their way around the optimization
levels, and I am not sure what exactly we can do about the situation.

Honza
> 
> -- 
> - Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06  0:59             ` Andreas Jaeger
  2002-02-06  1:05               ` Paolo Carlini
@ 2002-02-06 13:08               ` Andreas Jaeger
  2002-02-06 14:09                 ` Laurent Guerby
  1 sibling, 1 reply; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-06 13:08 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: gcc, rth

[-- Attachment #1: Type: text/plain, Size: 384 bytes --]


I hope these results are fine now - the differences are minimal,

Andreas

Base Compiler: GCC CVS as of Feb 5 2002 8am UTC
Peak Compiler: base plus reversion of loop unrolling patch
cflags base: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
cflags peak: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double


[-- Attachment #2: CINT2000.012.asc --]
[-- Type: text/plain, Size: 7257 bytes --]

##############################################################################
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
#                                                                            #
# 'reportable' flag not set during run                                       #
#                                                                            #
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
##############################################################################
                            SPEC CINT2000 Summary
                               Unknown Unknown
                             Tested by SuSE GmbH
                           Wed Feb  6 10:42:56 2002

SPEC License #1922  Test date: 2002-02-06   Hardware availability: June 2001
Tester: Andreas Jaeger, SuSE GmbH           Software availability: Now

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400       303       462      1400       304       460 
   164.gzip          1400       303       461*     1400       301       466*
   164.gzip          1400       304       461      1400       301       466 
   175.vpr           1400       541       259      1400       541       259*
   175.vpr           1400       541       259*     1400       541       259 
   175.vpr           1400       541       259      1400       543       258 
   176.gcc           1100       343       320      1100       343       320 
   176.gcc           1100       341       323*     1100       342       322*
   176.gcc           1100       339       324      1100       341       323 
   181.mcf           1800      1001       180      1800      1003       179 
   181.mcf           1800      1001       180*     1800      1001       180 
   181.mcf           1800       999       180      1800      1003       179*
   186.crafty        1000       191       524*     1000       191       524 
   186.crafty        1000       191       524      1000       191       525*
   186.crafty        1000       191       525      1000       191       525 
   197.parser        1800       554       325*     1800       554       325 
   197.parser        1800       554       325      1800       554       325 
   197.parser        1800       554       325      1800       554       325*
   252.eon           1300       203       642*     1300       203       640 
   252.eon           1300       203       641      1300       203       642 
   252.eon           1300       202       642      1300       203       641*
   253.perlbmk       1800       364       494      1800       364       494 
   253.perlbmk       1800       364       495*     1800       364       495 
   253.perlbmk       1800       364       495      1800       364       495*
   254.gap           1100       302       365      1100       303       363 
   254.gap           1100       301       366*     1100       300       366*
   254.gap           1100       301       366      1100       300       366 
   255.vortex        1900       464       410*     1900       465       409 
   255.vortex        1900       464       410      1900       463       411 
   255.vortex        1900       464       409      1900       463       410*
   256.bzip2         1500       493       304      1500       493       304 
   256.bzip2         1500       492       305      1500       492       305*
   256.bzip2         1500       492       305*     1500       492       305 
   300.twolf         3000      1056       284*     3000      1060       283 
   300.twolf         3000      1058       284      3000      1060       283*
   300.twolf         3000      1056       284      3000      1057       284 
   ========================================================================
   164.gzip          1400       303       461*     1400       301       466*
   175.vpr           1400       541       259*     1400       541       259*
   176.gcc           1100       341       323*     1100       342       322*
   181.mcf           1800      1001       180*     1800      1003       179*
   186.crafty        1000       191       524*     1000       191       525*
   197.parser        1800       554       325*     1800       554       325*
   252.eon           1300       203       642*     1300       203       641*
   253.perlbmk       1800       364       495*     1800       364       495*
   254.gap           1100       301       366*     1100       300       366*
   255.vortex        1900       464       410*     1900       463       410*
   256.bzip2         1500       492       305*     1500       492       305*
   300.twolf         3000      1056       284*     3000      1060       283*
   Est. SPECint_base2000                  361
   Est. SPECint2000                                                     361


                                   HARDWARE
                                   --------
     Hardware Vendor: Unknown
          Model Name: Unknown
                 CPU: AMD Athlon(tm) Processor
             CPU MHz: 1102.541
                 FPU: Integrated
      CPU(s) enabled: 1
    CPU(s) orderable: 1
            Parallel: No
       Primary Cache:  
     Secondary Cache: 256 KB
            L3 Cache: N/A
         Other Cache: N/A
              Memory: 496 MB
      Disk Subsystem: Unknown
      Other Hardware: Ethernet


                                   SOFTWARE
                                   --------
    Operating System: SuSE Linux 7.3 (i386)
            Compiler: GCC CVS
         File System: Linux/ReiserFS
        System State: Multi-User


                                    NOTES
                                    -----
     Base flags: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
     GCC CVS as of Feb 5 2002 8am UTC
     Peak flags: -O3 -fomit-frame-pointer -march=athlon -funroll-loops -fstrict-aliasing -malign-double
     base plus reversion of loop unrolling patch
     To compile and execute eon correctly the following extra flags
     are used for compilation: -ffast-math -fwritable-strings.
##############################################################################
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
#                                                                            #
# 'reportable' flag not set during run                                       #
#                                                                            #
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
##############################################################################
-----------------------------------------------------------------------------
For questions about this result, please contact the tester.
For other inquiries, please contact webmaster@spec.org.
Copyright 1999-2000 Standard Performance Evaluation Corporation
Generated on Wed Feb  6 20:47:55 2002 by SPEC CPU2000 ASCII formatter v2.1
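
To put a number on "the differences are minimal", here is a small sketch that compares the median base and peak run times from the table above. In this run base is current CVS and peak is the same tree with the loop unrolling patch reverted, so a positive change means reverting the patch made the benchmark slower. The dictionary and function names are made up for this illustration.

```python
# Median (starred) run times in seconds from the summary rows of the report.
BASE_RUN = {
    "164.gzip": 303, "175.vpr": 541, "176.gcc": 341, "181.mcf": 1001,
    "186.crafty": 191, "197.parser": 554, "252.eon": 203, "253.perlbmk": 364,
    "254.gap": 301, "255.vortex": 464, "256.bzip2": 492, "300.twolf": 1056,
}
PEAK_RUN = {
    "164.gzip": 301, "175.vpr": 541, "176.gcc": 342, "181.mcf": 1003,
    "186.crafty": 191, "197.parser": 554, "252.eon": 203, "253.perlbmk": 364,
    "254.gap": 300, "255.vortex": 463, "256.bzip2": 492, "300.twolf": 1060,
}


def percent_change(base, peak):
    """Run-time change of peak relative to base, in percent."""
    return 100.0 * (peak - base) / base


changes = {name: percent_change(BASE_RUN[name], PEAK_RUN[name]) for name in BASE_RUN}

# Benchmark whose run time moved the most in either direction.
largest = max(changes, key=lambda name: abs(changes[name]))

for name, change in changes.items():
    print(f"{name:12s} {change:+.2f}%")
print("largest change:", largest)
```

Every benchmark stays within one percent, which is about the size of the run-to-run variation visible in the three iterations.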

[-- Attachment #3: Type: text/plain, Size: 100 bytes --]


-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 13:08               ` Andreas Jaeger
@ 2002-02-06 14:09                 ` Laurent Guerby
  2002-02-06 14:45                   ` Dale Johannesen
  2002-02-06 15:00                   ` Andreas Jaeger
  0 siblings, 2 replies; 41+ messages in thread
From: Laurent Guerby @ 2002-02-06 14:09 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: Paolo Carlini, gcc, rth

I just merged your base results with:

<http://www.spec.org/osg/cpu2000/results/res2000q4/cpu2000-20001204-00426.asc>

                 GCC   S   G/S   SP  G/SP
    164.gzip     461 472 0.976  563 0.818
    175.vpr      259 255 1.015  285 0.908
    176.gcc      323 248 1.302  355 0.909
    181.mcf      180 194 0.927  196 0.918
    186.crafty   524 632 0.829  678 0.772
    197.parser   325 372 0.873  373 0.871
    252.eon      642 692 0.927 1056 0.607
    253.perlbmk  495 668 0.741  720 0.687
    254.gap      366 441 0.829  441 0.829
    255.vortex   410 702 0.584  731 0.560
    256.bzip2    305 335 0.910  343 0.889
    300.twolf    284 340 0.835  360 0.788

GCC = GCC base, S = SPEC base, SP = SPEC peak

This was with the closest SPEC run I found, however
the MHz are different, so I don't know if a rescale is needed:

SPEC web: CPU: 1.2GHz AMD Athlon processor A1200AMT3B
Andreas : CPU MHz: 1102.541
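
As a rough sketch of what a rescale would look like, the snippet below recomputes G/S for a few rows and then scales the GCC figures by the clock ratio. Scaling linearly with MHz is only an assumption made for illustration (memory-bound tests such as mcf will not follow it), and the names used here are hypothetical.

```python
# GCC base and published SPEC base ratios for a few rows of the table above.
GCC_BASE = {"176.gcc": 323, "253.perlbmk": 495, "255.vortex": 410}
SPEC_BASE = {"176.gcc": 248, "253.perlbmk": 668, "255.vortex": 702}

# Clock of the published system over the clock of the tested system.
CLOCK_SCALE = 1200.0 / 1102.541


def gcc_over_spec(name):
    """GCC base ratio divided by the published base ratio, unscaled."""
    return GCC_BASE[name] / SPEC_BASE[name]


def gcc_over_spec_rescaled(name):
    """Same quotient, with GCC's ratio scaled linearly to 1.2 GHz (assumption)."""
    return CLOCK_SCALE * GCC_BASE[name] / SPEC_BASE[name]


for name in GCC_BASE:
    print(f"{name:12s} {gcc_over_spec(name):.3f} {gcc_over_spec_rescaled(name):.3f}")
```

Under that assumption the perlbmk gap shrinks from about 26% to about 19%, and vortex remains the weakest of the three.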

Apparent weaknesses on base are vortex and perlbmk; has
anyone looked at them? perl might be interesting: with a 25%
base performance hit on such a complex piece of free software,
there must be some critical interpreter piece of code
completely miscompiled by CVS GCC (performance-wise).

Any perl hacker willing to zoom in on it?
Does anyone know if it is a performance regression from a previous GCC?

I assume eon and vortex are easy targets for "one
optimisation gets all" and might be less interesting
to look at.

-- 
Laurent Guerby <guerby@acm.org>


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 14:09                 ` Laurent Guerby
@ 2002-02-06 14:45                   ` Dale Johannesen
  2002-02-06 15:08                     ` Andreas Jaeger
  2002-02-07  3:48                     ` Jan Hubicka
  2002-02-06 15:00                   ` Andreas Jaeger
  1 sibling, 2 replies; 41+ messages in thread
From: Dale Johannesen @ 2002-02-06 14:45 UTC (permalink / raw)
  To: Laurent Guerby; +Cc: Dale Johannesen, Andreas Jaeger, Paolo Carlini, gcc, rth


On Wednesday, February 6, 2002, at 02:08 PM, Laurent Guerby wrote:
>                 GCC   S   G/S   SP  G/SP
>    164.gzip     461 472 0.976  563 0.818
>    175.vpr      259 255 1.015  285 0.908
>    176.gcc      323 248 1.302  355 0.909
>    181.mcf      180 194 0.927  196 0.918
>    186.crafty   524 632 0.829  678 0.772
>    197.parser   325 372 0.873  373 0.871
>    252.eon      642 692 0.927 1056 0.607
>    253.perlbmk  495 668 0.741  720 0.687
>    254.gap      366 441 0.829  441 0.829
>    255.vortex   410 702 0.584  731 0.560
>    256.bzip2    305 335 0.910  343 0.889
>    300.twolf    284 340 0.835  360 0.788
>
> Apparent weaknesses on base are vortex and perlbmk; has
> anyone looked at them? perl might be interesting: with a 25%
> base performance hit on such a complex piece of free software,
> there must be some critical interpreter piece of code
> completely miscompiled by CVS GCC (performance-wise).

I looked at Spec quite a bit for my last job and I can
suggest some things that are important.

Intelligent use of profiling info from the first pass is
important.  You'll see the published numbers do this.
Last time I looked gcc used this only for branch
straightening; it can also be used effectively to
drive inlining and register allocation.

crafty is heavily dependent on efficiency of "long long".
It's a chess program, full of 64-bit bitmasks.
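
To illustrate why "long long" efficiency matters on a 32-bit target, here is a sketch that carries out 64-bit bitmask operations using only 32-bit halves, which is roughly the work the compiler has to emit for each such operation on i386. This is not crafty's code; the helper names are made up, and Python is used only to show the arithmetic.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split64(x):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Rebuild a 64-bit value from its 32-bit halves."""
    return (hi << 32) | lo


def and64(a, b):
    """64-bit AND done as two independent 32-bit ANDs."""
    (alo, ahi), (blo, bhi) = split64(a), split64(b)
    return join64(alo & blo, ahi & bhi)


def shl64(a, n):
    """64-bit left shift by 0 < n < 32: bits cross from the low half to the high half."""
    lo, hi = split64(a)
    new_hi = ((hi << n) | (lo >> (32 - n))) & MASK32
    new_lo = (lo << n) & MASK32
    return join64(new_lo, new_hi)


def popcount64(a):
    """Count set bits by counting each half separately."""
    lo, hi = split64(a)
    return bin(lo).count("1") + bin(hi).count("1")


board = 0x00FF00000000FF00
print(hex(and64(board, 0x0000FFFFFFFF0000)), hex(shl64(board, 8)), popcount64(board))
```

Every operation touches two registers instead of one, and shifts need an extra step to carry bits across the halves, so register pressure and instruction count both go up.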

eon is the only one in C++.  If there are any problems
in exception handling they will show up here.  The program
does not actually throw any exceptions, so turning off
the handling for peak may help (SPEC won't let you turn
it off for base).  Good inlining decisions are also important.

The two most heavily executed functions in perl are big;
IME register allocation & scheduling don't always work
well for big functions.  They also both call setjmp; if
this disables any substantial amount of optimization it
will hurt.

vortex accesses a huge amount of virtual memory.  Good
malloc/free performance is critical.


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 14:09                 ` Laurent Guerby
  2002-02-06 14:45                   ` Dale Johannesen
@ 2002-02-06 15:00                   ` Andreas Jaeger
  2002-02-07  5:22                     ` Laurent Guerby
  1 sibling, 1 reply; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-06 15:00 UTC (permalink / raw)
  To: Laurent Guerby; +Cc: Paolo Carlini, gcc, rth

Laurent Guerby <guerby@acm.org> writes:

> I just merged your base results with:
>
> <http://www.spec.org/osg/cpu2000/results/res2000q4/cpu2000-20001204-00426.asc>
>
>                  GCC   S   G/S   SP  G/SP
>     164.gzip     461 472 0.976  563 0.818
>     175.vpr      259 255 1.015  285 0.908
>     176.gcc      323 248 1.302  355 0.909
>     181.mcf      180 194 0.927  196 0.918
>     186.crafty   524 632 0.829  678 0.772
>     197.parser   325 372 0.873  373 0.871
>     252.eon      642 692 0.927 1056 0.607
>     253.perlbmk  495 668 0.741  720 0.687
>     254.gap      366 441 0.829  441 0.829
>     255.vortex   410 702 0.584  731 0.560
>     256.bzip2    305 335 0.910  343 0.889
>     300.twolf    284 340 0.835  360 0.788
>
> GCC = GCC base, S = SPEC base, SP = SPEC peak
>
> This was with the closest SPEC run I found, however
> the MHz are different, so I don't know if a rescale is needed:
>
> SPEC web: CPU: 1.2GHz AMD Athlon processor A1200AMT3B
> Andreas : CPU MHz: 1102.541

A rescale is needed.

Better take these values that I measured under Linux with the same CPU
as in
http://www.spec.org/osg/cpu2000/results/res2001q2/cpu2000-20010519-00651.asc
(use those values for comparison!):

Compiler  GCC 3.1 from CVS of 2001-11-07
Base flags: -O3 -march=athlon -fomit-frame-pointer

Peak flags: -O3 -march=athlon -fomit-frame-pointer, FDO: Pass1:
-fprofile-arcs, Pass2 -fbranch-probabilities
                                      Estimated                        Estimated
                    Base      Base      Base       Peak       Peak       Peak
   Benchmarks     Ref Time  Run Time   Ratio     Ref Time   Run Time    Ratio

   164.gzip           1400        274        511      1400        264        530
   175.vpr            1400        451        310      1400        464        302
   176.gcc            1100        286        384      1100        275        400
   181.mcf            1800        829        217      1800        897        201
   186.crafty         1000        171        585      1000        163        614
   197.parser         1800        470        383      1800        465        387
   252.eon            1300        211        616      1300        210        618
   253.perlbmk        1800        321        562      1800        309        583
   254.gap            1100        235        467      1100        228        482
   255.vortex         1900        401        474      1900        379        501
   256.bzip2          1500        378        397      1500        398        377
   300.twolf          3000        908        331      3000        892        336
   SPECint_base2000                          420       
   SPECint2000                                                               424

Hardware: Dual AMD Athlon 1.2 GHz, 1 GB Memory, SCSI system
Software: SuSE Linux 7.3.
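
For a quick view of what the profile feedback passes buy in the table above, here is a small sketch that computes the peak over base gain per benchmark and lists the ones that got slower. The dictionary names are hypothetical; the figures are the ratios from the table.

```python
# Base and peak (FDO) ratios from the GCC 3.1 CVS 2001-11-07 table.
BASE_RATIO = {
    "164.gzip": 511, "175.vpr": 310, "176.gcc": 384, "181.mcf": 217,
    "186.crafty": 585, "197.parser": 383, "252.eon": 616, "253.perlbmk": 562,
    "254.gap": 467, "255.vortex": 474, "256.bzip2": 397, "300.twolf": 331,
}
PEAK_RATIO = {
    "164.gzip": 530, "175.vpr": 302, "176.gcc": 400, "181.mcf": 201,
    "186.crafty": 614, "197.parser": 387, "252.eon": 618, "253.perlbmk": 583,
    "254.gap": 482, "255.vortex": 501, "256.bzip2": 377, "300.twolf": 336,
}

# Percentage gain of peak over base; negative means FDO made it slower.
gain = {
    name: 100.0 * (PEAK_RATIO[name] - BASE_RATIO[name]) / BASE_RATIO[name]
    for name in BASE_RATIO
}

slower_with_fdo = [name for name, g in gain.items() if g < 0]

for name, g in gain.items():
    print(f"{name:12s} {g:+.1f}%")
print("slower with FDO:", slower_with_fdo)
```

Three of the twelve tests lose with profile feedback here, which fits the small difference between the overall base and peak figures.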


> Apparent weaknesses on base are vortex and perlbmk; has
> anyone looked at them? perl might be interesting: with a 25%
> base performance hit on such a complex piece of free software,
> there must be some critical interpreter piece of code
> completely miscompiled by CVS GCC (performance-wise).
>
> Any perl hacker willing to zoom in on it?
> Does anyone know if it is a performance regression from a previous GCC?
>
> I assume eon and vortex are easy targets for "one
> optimisation gets all" and might be less interesting
> to look at.

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 14:45                   ` Dale Johannesen
@ 2002-02-06 15:08                     ` Andreas Jaeger
  2002-02-07  3:59                       ` Jan Hubicka
  2002-02-07  7:27                       ` Michael Matz
  2002-02-07  3:48                     ` Jan Hubicka
  1 sibling, 2 replies; 41+ messages in thread
From: Andreas Jaeger @ 2002-02-06 15:08 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: Laurent Guerby, Paolo Carlini, gcc, rth

Dale Johannesen <dalej@apple.com> writes:

> Intelligent use of profiling info from the first pass is
> important.  You'll see the published numbers do this.
> Last time I looked gcc used this only for branch
> straightening; it can also be used effectively to
> drive inlining and register allocation.

AFAIK the infrastructure is there, it only needs to be used for
inlining - and also in the new register allocator.  Michal, is this
possible?


> crafty is heavily dependent on efficiency of "long long".
> It's a chess program, full of 64-bit bitmasks.
>
> eon is the only one in C++.  If there are any problems
> in exception handling they will show up here.  The program
> does not actually throw any exceptions, so turning off
> the handling for peak may help (SPEC won't let you turn
> it off for base).  Good inlining decisions are also important.

The inlining change contributed in August by Kurt Garloff brought a real
performance improvement; see my graphs at http://www.suse.de/~aj/SPEC

> the two most heavily executed functions in perl are big;
> IME register allocation & scheduling don't always work
> well for big functions.  They also both call setjmp; if
> this disables any substantial amount of optimization it
> will hurt.
>
> vortex accesses a huge amount of virtual memory.  Good
> malloc/free performance is critical.


Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 14:45                   ` Dale Johannesen
  2002-02-06 15:08                     ` Andreas Jaeger
@ 2002-02-07  3:48                     ` Jan Hubicka
  2002-02-07  9:42                       ` Richard Henderson
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Hubicka @ 2002-02-07  3:48 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: Laurent Guerby, Andreas Jaeger, Paolo Carlini, gcc, rth

> I looked at Spec quite a bit for my last job and I can
> suggest some things that are important.
> 
> Intelligent use of profiling info from the first pass is
> important.  You'll see the published numbers do this.
> Last time I looked gcc used this only for branch
> straightening; it can also be used effectively to
> drive inlining and register allocation.

I have been actively developing this.  In 3.1, gcc can already do some
optimizations based on profile info, such as register allocation.
On the cfg-branch we are still focusing on this path for the 3.2
timeframe.
> 
> crafty is heavily dependent on efficiency of "long long".
> It's a chess program, full of 64-bit bitmasks.

This is actually a big problem for gcc.  It may be possible to work
around it by using SSE/MMX arithmetic when available.
> 
> eon is the only one in C++.  If there are any problems
> in exception handling they will show up here.  The program
> does not actually throw any exceptions, so turning off
> the handling for peak may help (SPEC won't let you turn
> it off for base).  Good inlining decisions are also important.

Yes, eon basically appears to be very large, so everything
that shrinks the footprint is useful.
> 
> the two most heavily executed functions in perl are big;
> IME register allocation & scheduling don't always work
> well for big functions.  They also both call setjmp; if
> this disables any substantial amount of optimization it
> will hurt.

Our setjmp handling should be aggressive enough.  We represent
it as an abnormal edge in the CFG and thus optimize the rest of
the function without much degradation.

Honza


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 15:08                     ` Andreas Jaeger
@ 2002-02-07  3:59                       ` Jan Hubicka
  2002-02-07  7:27                       ` Michael Matz
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Hubicka @ 2002-02-07  3:59 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: Dale Johannesen, Laurent Guerby, Paolo Carlini, gcc, rth

> Dale Johannesen <dalej@apple.com> writes:
> 
> > Intelligent use of profiling info from the first pass is
> > important.  You'll see the published numbers do this.
> > Last time I looked gcc used this only for branch
> > straightening; it can also be used effectively to
> > drive inlining and register allocation.
> 
> AFAIK the infrastructure is there, it only needs to be used for
> inlining - and also in the new register allocator.  Michal, is this
> possible?

It is not possible to use it for inlining yet, as our loop optimization
pass kills profile info.  Unless we want to require multiple optimization
passes, we need to rewrite that pass, as well as RTL code generation,
first.  We are working on that.

It is already used for register allocation in the mainline, as well as
for a few other decisions.  The CFG branch is more aggressive, already
deriving about 5% benefit.  I hope this will improve as the code matures.

Honza


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 15:00                   ` Andreas Jaeger
@ 2002-02-07  5:22                     ` Laurent Guerby
  2002-02-07  5:46                       ` Cannot allocate 92625564 bytes after allocating 198356992 bytes Dimitri Frederickx
       [not found]                       ` <ho8za5ijlt.fsf@gee.suse.de>
  0 siblings, 2 replies; 41+ messages in thread
From: Laurent Guerby @ 2002-02-07  5:22 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: Paolo Carlini, gcc, rth

Here is the new comparison between

AMD     (S): http://www.spec.org/osg/cpu2000/results/res2000q4/cpu2000-20001204-00426.asc
Andreas (G): http://www.spec.org/osg/cpu2000/results/res2001q2/cpu2000-20010519-00651.asc

                  G   S   G/S   GP   SP  GP/SP
    164.gzip     608 472 1.288  619  563 1.099
    175.vpr      319 255 1.250  342  285 1.200
    176.gcc      357 248 1.439  430  355 1.211
    181.mcf      229 194 1.180  232  196 1.183
    186.crafty   665 632 1.052  692  678 1.020
    197.parser   436 372 1.172  434  373 1.163
    252.eon      839 692 1.212 1046 1056 0.990
    253.perlbmk  759 668 1.136  748  720 1.038
    254.gap      581 441 1.317  579  441 1.312
    255.vortex   764 702 1.088  796  731 1.088
    256.bzip2    401 335 1.197  423  343 1.233
    300.twolf    417 340 1.226  421  360 1.169

I assume I have made a stoopid mistake somewhere,
otherwise why is AMD still submitting SPEC results
based on the Intel compiler?

Maybe SPECfp shows a different picture, though.

-- 
Laurent Guerby <guerby@acm.org>


* Cannot allocate 92625564 bytes after allocating 198356992 bytes
  2002-02-07  5:22                     ` Laurent Guerby
@ 2002-02-07  5:46                       ` Dimitri Frederickx
       [not found]                       ` <ho8za5ijlt.fsf@gee.suse.de>
  1 sibling, 0 replies; 41+ messages in thread
From: Dimitri Frederickx @ 2002-02-07  5:46 UTC (permalink / raw)
  To: gcc-help, gcc

While compiling my sourcecode with gcc I get the following message:
(I use jam to compile my sources)

Cc /home/Administrator/open-wonka/build-x86-winnt/wonka/bin/unicode.o

cc1.exe: Cannot allocate 92625564 bytes after allocating 198356992 bytes

gcc  -c -Wall -Wsign-compare -Wshadow -Wpointer-arith -Wstrict-prototypes -W
inline -Wconversion -DDEBUG_LEVEL=7 -DVERSION_STRING='"WONKA-0-8-RELEASE"' -
D__NO_STRING_INLINES -DWINNT -DBOOTCLASSDIR='"boot"' -DBOOTCLASSFILE='"class
es.zip"' -DDEBUG -DRUNTIME_CHECKS -ggdb -DFICL -DENABLE_GC -DDISABLE_PS -DOS
WALD -DFSROOT='"./fsroot"' -fno-leading-underscore  -O2  -I/home/Administrat
or/open-wonka/wonka/src/vm -I/home/Administrator/open-wonka/kernel/oswald/in
clude -I/home/Administrator/open-wonka/kernel/oswald/hal/host/winnt/include 
-I/home/Administrator/open-wonka/kernel/oswald/hal/cpu/x86/include -I/home/A
dministrator/open-wonka/wonka/include -I/home/Administrator/open-wonka/wonka
/hal/cpu/x86/include -I/home/Administrator/open-wonka/wonka/hal/hostos/winnt
/include -I/home/Administrator/open-wonka/network/none/include -I/home/Admin
istrator/open-wonka/wonka/include -I/home/Administrator/open-wonka/wonka/hal
/cpu/x86/include -I/home/Administrator/open-wonka/wonka/hal/hostos/winnt/inc
lude -I/home/Administrator/open-wonka/build-x86-winnt/wonka/bin -I/home/Admi
nistrator/open-wonka/build-x86-winnt/awt/none/bin -I/home/Administrator/open
-wonka/fs/native/hal/hostos/winnt/include -I/home/Administrator/open-wonka/f
s/native/include -I/home/Administrator/open-wonka/jpda/jdwp/include  -o
/home/Administrator/open-wonka/build-x86-winnt/wonka/bin/unicode.o
/home/Administrator/open-wonka/build-x86-winnt/wonka/bin/unicode.c

...failed Cc
/home/Administrator/open-wonka/build-x86-winnt/wonka/bin/unicode.o ...


What does this message mean? Why do I get it? Why can't I compile my source?
How do I solve it?

Dimitri Frederickx
Student Industrial Engineer


* Re: Loop unrolling-related SPEC regressions?
  2002-02-06 15:08                     ` Andreas Jaeger
  2002-02-07  3:59                       ` Jan Hubicka
@ 2002-02-07  7:27                       ` Michael Matz
  1 sibling, 0 replies; 41+ messages in thread
From: Michael Matz @ 2002-02-07  7:27 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: Dale Johannesen, Laurent Guerby, Paolo Carlini, gcc, rth

Hi,

On Thu, 7 Feb 2002, Andreas Jaeger wrote:

> > Intelligent use of profiling info from the first pass is
> > important.  You'll see the published numbers do this.
> > Last time I looked gcc used this only for branch
> > straightening; it can also be used effectively to
> > drive inlining and register allocation.
>
> AFAIK the infrastructure is there, it only needs to be used for
> inlining - and also in the new register allocator.  Michal, is this
> possible?

In fact, if the numbers on which I base decisions in the allocator are
themselves based on profiling, then the allocator can already be
profile-guided.
Basically what I need is only cost estimations for basic blocks (i.e. how
often they are run compared to the other blocks in the function).  I don't
care if they are estimated statically from loop structure or precisely
from profiling.
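
To illustrate the point about block cost estimates, here is a hypothetical
Python sketch (none of these names exist in GCC) in which the cost function
only sees relative block frequencies, whether they were guessed from loop
nesting or measured by profiling. The factor of 10 per loop level and the
block names are assumptions chosen purely for illustration.

```python
def static_frequencies(loop_depth, per_level=10.0):
    """Estimate relative block frequencies from loop nesting depth alone.

    loop_depth maps block name -> nesting depth (0 = outside any loop).
    """
    return {bb: per_level ** depth for bb, depth in loop_depth.items()}

def profiled_frequencies(exec_counts):
    """Derive relative frequencies from measured execution counts,
    normalised so that the block named 'entry' has frequency 1.0."""
    entry = exec_counts["entry"]
    return {bb: count / entry for bb, count in exec_counts.items()}

def spill_cost(uses_per_block, freq):
    """Weight each use of a pseudo register by how often its block runs."""
    return sum(n * freq[bb] for bb, n in uses_per_block.items())

# One use in the entry block, two uses inside a single loop.
uses = {"entry": 1, "loop_body": 2}
static = static_frequencies({"entry": 0, "loop_body": 1})
profiled = profiled_frequencies({"entry": 1, "loop_body": 1000})

print(spill_cost(uses, static))    # cost under the static guess
print(spill_cost(uses, profiled))  # cost under measured counts
```

The same `spill_cost` function serves both cases; only the source of the
frequencies changes.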


Ciao,
Michael.


* Re: Loop unrolling-related SPEC regressions?
  2002-02-07  3:48                     ` Jan Hubicka
@ 2002-02-07  9:42                       ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2002-02-07  9:42 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Dale Johannesen, Laurent Guerby, Andreas Jaeger, Paolo Carlini, gcc

On Thu, Feb 07, 2002 at 12:46:45PM +0100, Jan Hubicka wrote:
> Our setjmp handling should be aggressive enough.  We represent
> it as an abnormal edge in the CFG and thus optimize the rest of
> the function without much degradation.

No we don't.  We _talked_ about doing that.


r~


* Re: Loop unrolling-related SPEC regressions?
       [not found]                       ` <ho8za5ijlt.fsf@gee.suse.de>
@ 2002-02-07 11:52                         ` Laurent Guerby
  0 siblings, 0 replies; 41+ messages in thread
From: Laurent Guerby @ 2002-02-07 11:52 UTC (permalink / raw)
  To: Andreas Jaeger, gcc

Andreas Jaeger wrote:

> ??? Both pages uses Intel's compiler.  You just compared two windows
> results - or am I missing something?

I misinterpreted the URL you gave as being your results.  Indeed, my first
comparison was just showing off the yearly progress of the Intel compiler
dudes on generating better AMD Athlon code, quite impressive :).

 > What is interesting is the comparison of my GCC values with
 > AMD's values from 2001q2,

G: http://gcc.gnu.org/ml/gcc/2002-02/msg00448.html
A: http://www.spec.org/osg/cpu2000/results/res2001q2/cpu2000-20010519-00651.asc

               G   A   G/A    GP   AP GP/AP
164.gzip    511 608 0.840   530  619 0.856
175.vpr     310 319 0.971   302  342 0.883
176.gcc     384 357 1.075   400  430 0.930
181.mcf     217 229 0.947   201  232 0.866 (1)
186.crafty  585 665 0.879   614  692 0.887
197.parser  383 436 0.878   387  434 0.891 (2)
252.eon     616 839 0.734   618 1046 0.590
253.perlbmk 562 759 0.740   583  748 0.779 (2)
254.gap     467 581 0.803   482  579 0.832 (2)
255.vortex  474 764 0.620   501  796 0.629
256.bzip2   397 401 0.990   377  423 0.891 (1)
300.twolf   331 417 0.793   336  421 0.798

(1) GCC peak < GCC base, profile feedback not working as intended :).
(2) AMD peak < AMD base (???)

Still 20-something % on perl.
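
The ratio columns and the peak-below-base notes can be rechecked mechanically.
A small Python sketch, using only the figures from the table above (the
variable names are illustrative):

```python
# (GCC base, AMD base, GCC peak, AMD peak), copied from the table above.
results = {
    "164.gzip":    (511, 608, 530, 619),
    "175.vpr":     (310, 319, 302, 342),
    "176.gcc":     (384, 357, 400, 430),
    "181.mcf":     (217, 229, 201, 232),
    "186.crafty":  (585, 665, 614, 692),
    "197.parser":  (383, 436, 387, 434),
    "252.eon":     (616, 839, 618, 1046),
    "253.perlbmk": (562, 759, 583, 748),
    "254.gap":     (467, 581, 482, 579),
    "255.vortex":  (474, 764, 501, 796),
    "256.bzip2":   (397, 401, 377, 423),
    "300.twolf":   (331, 417, 336, 421),
}

# Print the base and peak ratios, flagging any peak score below base.
for name, (g, a, gp, ap) in results.items():
    notes = []
    if gp < g:
        notes.append("GCC peak < base")
    if ap < a:
        notes.append("AMD peak < base")
    print(f"{name:12s} {g / a:.3f} {gp / ap:.3f} {', '.join(notes)}")

gcc_regressed = [n for n, (g, a, gp, ap) in results.items() if gp < g]
amd_regressed = [n for n, (g, a, gp, ap) in results.items() if ap < a]
```

On these figures the check also flags 175.vpr (302 peak against 310 base),
which the table does not mark with (1).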

-- 
Laurent Guerby <guerby@acm.org>


Thread overview: 41+ messages
2002-02-01 10:32 Loop unrolling-related SPEC regressions? Paolo Carlini
2002-02-01 10:46 ` Richard Henderson
2002-02-01 10:51   ` Paolo Carlini
2002-02-01 10:57     ` Richard Henderson
2002-02-01 11:12       ` Paolo Carlini
2002-02-04  8:37         ` Andreas Jaeger
2002-02-04  9:05           ` Paolo Carlini
2002-02-04 11:15             ` Andreas Jaeger
2002-02-06  0:59             ` Andreas Jaeger
2002-02-06  1:05               ` Paolo Carlini
2002-02-06  1:39                 ` Andreas Jaeger
2002-02-06  1:43                   ` Paolo Carlini
2002-02-06  1:43                     ` Andreas Jaeger
2002-02-06  1:50                     ` Andreas Jaeger
2002-02-06 13:08               ` Andreas Jaeger
2002-02-06 14:09                 ` Laurent Guerby
2002-02-06 14:45                   ` Dale Johannesen
2002-02-06 15:08                     ` Andreas Jaeger
2002-02-07  3:59                       ` Jan Hubicka
2002-02-07  7:27                       ` Michael Matz
2002-02-07  3:48                     ` Jan Hubicka
2002-02-07  9:42                       ` Richard Henderson
2002-02-06 15:00                   ` Andreas Jaeger
2002-02-07  5:22                     ` Laurent Guerby
2002-02-07  5:46                       ` Cannot allocate 92625564 bytes after allocating 198356992 bytes Dimitri Frederickx
     [not found]                       ` <ho8za5ijlt.fsf@gee.suse.de>
2002-02-07 11:52                         ` Loop unrolling-related SPEC regressions? Laurent Guerby
2002-02-01 11:14 ` Joe Buck
2002-02-04  8:45   ` Andreas Jaeger
2002-02-04 10:58   ` Jan Hubicka
2002-02-04 11:07     ` Paolo Carlini
2002-02-04 12:12       ` Andreas Jaeger
2002-02-04 16:36         ` Paolo Carlini
2002-02-04 21:34         ` Tim Prince
2002-02-05  4:32           ` Jan Hubicka
2002-02-05 14:05             ` Geoff Keating
2002-02-06  3:32               ` Jan Hubicka
2002-02-05 20:57             ` Tim Prince
2002-02-04 11:20     ` Joe Buck
2002-02-04 11:33       ` Jan Hubicka
2002-02-04 11:47         ` Jan Hubicka
2002-02-05  9:59         ` David Edelsohn
