public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [GCC 3.0] Bad regression, binary size
@ 2001-07-23 19:42 dewar
  2001-07-24  9:27 ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: dewar @ 2001-07-23 19:42 UTC (permalink / raw)
  To: jthorn, torvalds; +Cc: gcc

<<My argument is really the same, but with a twist: keep it alive, but
don't make it the default if it makes non-FP code bigger.
>>

It is always tricky to argue about defaults. One critical consideration
for defaults is making benchmarks work well out of the box, but the
default of -O0 seriously undermines this design criterion in any case.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-23 19:42 [GCC 3.0] Bad regression, binary size dewar
@ 2001-07-24  9:27 ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-24  9:27 UTC (permalink / raw)
  To: dewar; +Cc: jthorn, gcc

On Mon, 23 Jul 2001 dewar@gnat.com wrote:
>
> <<My argument is really the same, but with a twist: keep it alive, but
> don't make it the default if it makes non-FP code bigger.
> >>
>
> It is always tricky to argue about defaults. One critical consideration
> for defaults is making benchmarks work well out of the box, but the
> default of -O0 seriously undermines this design criterion in any case.

I suspect that everybody who does benchmarking is so aware of the -O flag
that the "default" of not optimizing is not really a default at all except
in a very theoretical sense. Although maybe gcc could make the default -O
a bit stronger.

But talking about the -O flag, I _really_ think that the fact that -O3
generally generates much worse code than -O2 is non-intuitive and can
really throw some people.

Why does -O3 imply "-finline-functions", when it's been shown again and
again to just make things worse? Now the current gcc seems to have fixed
this to some degree by just making "-finline-functions" much weaker
(good), but still, shouldn't we always have the rule that higher
optimization numbers tend to make code run faster for at least a
meaningful subset of the word "code" ;)

If somebody wants to pessimize their code by inlining everything, let them
use -O-1 ("inline everything, but don't bother with things like CSE" ;^),
or just explicitly say "-finline-functions".
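
(If you want to measure just the inliner's contribution, something like
this should isolate it -- a sketch with a made-up file name, since -O3
is essentially -O2 plus -finline-functions:

        g++ -O2 -c bigfile.cc -o plain.o
        g++ -O2 -finline-functions -c bigfile.cc -o inlined.o
        size plain.o inlined.o

and compare the text sizes.)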

This certainly threw me the first time I used gcc. I remember how the
original Linux makefiles used "-O6" because some place had (incorrectly)
mentioned that -Ox modified gcc behaviour up to the value 6, and I assumed
that -O6 would be better than -O2.

I think it would make much more sense if -O3 meant everything that -O2
does, plus the things we don't do because they make debugging harder (ie
-fomit-frame-pointer and similar). That actually speeds things up and
makes code visibly smaller at times. Unlike the current fairly strange
thing -O3 does..

(Yeah, I know, we already turn on -fomit-frame-pointer for -O, but only on
the few targets where it doesn't hurt debugging. I'm just saying that
maybe we should do it for everything once you hit -O3, and then gently
warn about the combination of -O3 and -g).

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-23  4:46 Jonathan Thornburg
@ 2001-07-23 15:14 ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-23 15:14 UTC (permalink / raw)
  To: jthorn; +Cc: gcc

In article < 200107231146.NAA11586@mach.thp.univie.ac.at > you write:
>In message <URL: http://gcc.gnu.org/ml/gcc/2001-07/msg00617.html >,
>Linus Torvalds <torvalds at transmeta dot com> wrote
>> I would argue that floating point is a lot more special than kernels are.
>> I bet the code generation and optimization issues for kernels tend to be
>> closer to most programs than FP code tends to be. Which is why I think it
>> is the FP code that should be special-cased, not the kernel.
>
>I think this is a fundamental philosophical point... with which I
>strongly disagree.  The problem is that referring to "most" programs
>begs the question of just which sort of workload you're sampling.
>
>I'd like to put in a voice to urge that fp performance continue to be
>treated as "important".  Indeed, the integer programs that _I_ use
>(including OS kernels, shells, web browsers, not to mention gcc itself)
>are generally either not CPU-bound, or are already "fast enough", so
>I really don't care very much about integer performance.  But the fp
>codes I develop and use for a living are all painfully slow, and the
>science I'm doing with them would be a lot better if they were faster,
>so I care a lot about fp performance.

I'm absolutely not arguing against you.

I would, for example, suspect that a "correct" optimization strategy for
99% of all real-world cases (not benchmarks) is:

 - if it doesn't have floating point, optimize for the smallest size
   possible. Never do loop unrolling or anything fancy like that. 

 - if it has floating point or MMX, go wild at that point.

(And make the "has floating point" decision something more clever than
just "oh, I saw a fp reg here").

>So please, keep those fp-performance options (eg 16-byte stack alignment
>without having to always use a frame pointer on x86) alive for those of
>us who care!

My argument is really the same, but with a twist: keep it alive, but
don't make it the default if it makes non-FP code bigger.

Think of it this way: the single biggest slowdown that people react to
badly on most machines is the initial loading time for a binary. There
really aren't that many machines made today that "feel slow" from a CPU
standpoint.

Which means that loop optimizations seldom make sense for most things. 
Most integer workloads have fairly low repeat-rates: we're talking
hundreds or maybe a few thousand repeats of most loops.  You have to make
those loops a heck of a lot faster to make up for a single page-fault
brought on by making the code larger.  Even just a single icache miss
(and you're pretty much _guaranteed_ an icache miss on program startup)
tends to be enough to wipe out a win of a cycle or two.

Most programs have become _less_ loopy, and the loops have grown much
bigger. 

The exception is FP and vector-integer stuff (MMX, AltiVec, whatever you
call it). 

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
@ 2001-07-23  4:46 Jonathan Thornburg
  2001-07-23 15:14 ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: Jonathan Thornburg @ 2001-07-23  4:46 UTC (permalink / raw)
  To: gcc; +Cc: Jonathan Thornburg

In message <URL: http://gcc.gnu.org/ml/gcc/2001-07/msg00617.html >,
Linus Torvalds <torvalds at transmeta dot com> wrote
> I would argue that floating point is a lot more special than kernels are.
> I bet the code generation and optimization issues for kernels tend to be
> closer to most programs than FP code tends to be. Which is why I think it
> is the FP code that should be special-cased, not the kernel.

I think this is a fundamental philosophical point... with which I
strongly disagree.  The problem is that referring to "most" programs
begs the question of just which sort of workload you're sampling.

I'd like to put in a voice to urge that fp performance continue to be
treated as "important".  Indeed, the integer programs that _I_ use
(including OS kernels, shells, web browsers, not to mention gcc itself)
are generally either not CPU-bound, or are already "fast enough", so
I really don't care very much about integer performance.  But the fp
codes I develop and use for a living are all painfully slow, and the
science I'm doing with them would be a lot better if they were faster,
so I care a lot about fp performance.

And yes, there are people doing serious fp on x86.  It may be a kludge
of an architecture, but sometimes money talks... and the newer x86 cpus
have significantly improved fp over older ones.

So please, keep those fp-performance options (eg 16-byte stack alignment
without having to always use a frame pointer on x86) alive for those of
us who care!

-- 
-- Jonathan Thornburg <jthorn@thp.univie.ac.at>
   Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
   Golm, Germany             http://www.aei.mpg.de/~jthorn/home.html
   "C++ is to programming as sex is to reproduction. Better ways might
    technically exist but they're not nearly as much fun." -- Nikolai Irgens

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 14:59         ` Phil Edwards
  2001-07-09 15:04           ` Marc Espie
@ 2001-07-09 16:43           ` Daniel Berlin
  1 sibling, 0 replies; 44+ messages in thread
From: Daniel Berlin @ 2001-07-09 16:43 UTC (permalink / raw)
  To: Phil Edwards; +Cc: Justin Guyett, Neil Booth, Gerald Pfeifer, Marc Espie, gcc

Phil Edwards <pedwards@disaster.jaj.com> writes:

> On Mon, Jul 09, 2001 at 02:54:21PM -0700, Justin Guyett wrote:
>> I dunno about the 1minute+ helloworld compile time someone just suggested,
>> but this is disturbing:
>> (v2.95) g++ -o hello hello.cc  0.15s user 0.03s system 96% cpu 0.186 total
>> (v3.0)  g++ -o hello hello.cc  2.04s user 0.06s system 99% cpu 2.102 total
>> 
>> if there's a 2 second loading overhead for g++v3, that's 2 seconds * (#
>> files) which can be quite a lot in large trees.  Is this simply from
>> linking in the new libstdc++?
> 
> Parsing the libstdc++ headers takes time, mostly because nearly all the
> code is in the headers.  Once we have 'export' this should be
> better.

This isn't the trouble, it's the inlining.
We expand something from 300 insns to 15k insns, which itself takes
time, memory, and then the backend has to deal with a lot more.

Parsing took < 5 seconds on my example.
It was the rest of compilation that took > 30 seconds.
Without inlining, it took 5 seconds at -O3 total.
There must be a happy medium.
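
Easy enough to check, by the way.  Something along these lines (a
sketch; "testcase.cc" stands in for my example above) shows the split:

        time g++ -O3 -c testcase.cc              # tree inliner on
        time g++ -O3 -fno-inline -c testcase.cc  # same flags, no inlining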

 
> 
> Phil
> 
> -- 
> Would I had phrases that are not known, utterances that are strange, in
> new language that has not been used, free from repetition, not an utterance
> which has grown stale, which men of old have spoken.
>                                      - anonymous Egyptian scribe, c.1700 BC

-- 
"I put tape on the mirrors in my house so I don't accidentally
walk through into another dimension.
"-Steven Wright

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 16:05             ` Richard Henderson
@ 2001-07-09 16:24               ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-09 16:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

On Mon, 9 Jul 2001, Richard Henderson wrote:
>
> That said, for the newest processors, P4, Athlon, and x86-64,
> it's a win to allocate all of the outgoing argument space in
> the prologue and use mov instead of push.  At which point keeping
> the stack aligned takes zero extra instructions.

Hmm. It's obviously not true at least for one case: functions that do not
need a frame and that call other functions without arguments. But at that
point it might not matter any more.

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 15:55             ` Joern Rennecke
@ 2001-07-09 16:14               ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-09 16:14 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: rth, gcc

On Mon, 9 Jul 2001, Joern Rennecke wrote:
>
> You can only use your prologue variant when you could otherwise use
> -fomit-frame-pointer.

No. Read the thing again. It's a perfectly fine prologue - it's kind of
what gcc does already for alloca.

But I already stated that yes, this basically ends up using a frame
pointer for FP stuff. gcc does that anyway, even for -O2, so considering
that we speed up everything else, that sounds like a really good tradeoff.

Have you seen some of the benchmarks, and how much the stack alignment
hits newer gcc's?

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 15:23           ` Linus Torvalds
  2001-07-09 15:55             ` Joern Rennecke
@ 2001-07-09 16:05             ` Richard Henderson
  2001-07-09 16:24               ` Linus Torvalds
  1 sibling, 1 reply; 44+ messages in thread
From: Richard Henderson @ 2001-07-09 16:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gcc

On Mon, Jul 09, 2001 at 03:22:09PM -0700, Linus Torvalds wrote:
> Ehh, what's wrong with just
> 
>         pushl %ebp
>         movl %esp,%ebp
>         subl $needed+16,%esp
>         andl $-16,%esp
> 
> which is just one instruction longer than the current regular frame 
> prologue, and leaves you with an aligned stack pointer AND a perfectly
> good way to get to the argument frame (the frame pointer).

Hmm.  Normally we eliminate all stack frame references to EBP.
This would work ok except when alloca is used.

> Sure, if people use "alloca()" or other fancy stuff (variable-sized
> automatic variables etc), then you might need an extra instruction or   
> two to actually align the stack. How common is that?

It's not uncommon, but certainly less common than needing
aligned stack frames.

> Statistics, anyone? I bet heavy FP code has FP arguments, which means
> that the stack needs alignment very seldom indeed. Same is probably true
> for XMM etc too.

It'd be a decent thing to collect.

That said, for the newest processors, P4, Athlon, and x86-64,
it's a win to allocate all of the outgoing argument space in
the prologue and use mov instead of push.  At which point keeping
the stack aligned takes zero extra instructions.
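
That is, roughly (a sketch; the constants are made up):

        pushl $2                # today: pushes move %esp around, so
        pushl $1                # alignment can cost extra adjustment
        call foo
        addl $8,%esp

becomes

        subl $16,%esp           # prologue: allocate the largest outgoing
                                # argument area once, keeping alignment
        ...
        movl $1,(%esp)
        movl $2,4(%esp)
        call foo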


r~

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 15:23           ` Linus Torvalds
@ 2001-07-09 15:55             ` Joern Rennecke
  2001-07-09 16:14               ` Linus Torvalds
  2001-07-09 16:05             ` Richard Henderson
  1 sibling, 1 reply; 44+ messages in thread
From: Joern Rennecke @ 2001-07-09 15:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: rth, gcc

> >Not quite -- you have to play games to get a proper pointer 
> >to the argument list so that you can do your step 3.
> 
> Ehh, what's wrong with just
> 
>         pushl %ebp
>         movl %esp,%ebp
>         subl $needed+16,%esp
>         andl $-16,%esp
> 
> which is just one instruction longer than the current regular frame
> prologue, and leaves you with an aligned stack pointer AND a perfectly
> good way to get to the argument frame (the frame pointer).
> 
> Sure, if people use "alloca()" or other fancy stuff (variable-sized
> automatic variables etc), then you might need an extra instruction or   
> two to actually align the stack. How common is that? That case does need
> some extra work, I agree.

You can only use your prologue variant when you could otherwise use
-fomit-frame-pointer.  And you also pay the price of having prefix bytes
for all the variable accesses on the stack.  Yet you won't get the benefit
of using %ebp as a general purpose register.

> Anyway, there's a simpler approach for that: when passing in values that
> need alignment, we align the stack in the caller.  That way you know
> that your frame is always as aligned as it can be, AND the callee has
> the additional knowledge that because it was passed in a FP value it
> doesn't need to do the extra alignment itself.

IMHO, marking the functions in libc that have callbacks to user code, or
that generally call - directly or indirectly - functions that do floating
point processing or callbacks, so that these specifically preserve 16-byte
alignment, should go a long way towards avoiding the misalignment slowdown
for programs that want alignment, while not significantly slowing down
integer code due to data cache misses.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 14:49         ` Richard Henderson
@ 2001-07-09 15:23           ` Linus Torvalds
  2001-07-09 15:55             ` Joern Rennecke
  2001-07-09 16:05             ` Richard Henderson
  0 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-09 15:23 UTC (permalink / raw)
  To: rth, gcc

In article < 20010709144855.B10117@redhat.com > you write:
>On Sun, Jul 08, 2001 at 02:28:04PM -0700, Linus Torvalds wrote:
>> You can fix it in three easy-ish steps (famous last words):
>
>Oh, no.  You can't get away that easily.
>
>>  - make an "align frame pointer prologue" (which is not that different
>>    from the existing one - one extra "and")
>
>Not quite -- you have to play games to get a proper pointer 
>to the argument list so that you can do your step 3.

Ehh, what's wrong with just

        pushl %ebp
        movl %esp,%ebp
        subl $needed+16,%esp
        andl $-16,%esp

which is just one instruction longer than the current regular frame
prologue, and leaves you with an aligned stack pointer AND a perfectly
good way to get to the argument frame (the frame pointer).

Sure, if people use "alloca()" or other fancy stuff (variable-sized
automatic variables etc), then you might need an extra instruction or   
two to actually align the stack. How common is that? That case does need
some extra work, I agree.
  
>>  - load all alignment-wanting arguments into pseudos in the prologue
>>    after aligning the frame pointer (and let the normal spill code spill
>>    them if required - now they are aligned or are cached in registers). 
>
>Which might work ok for small numbers of arguments, but
>is going to be excruciating when someone passes in 30
>arguments, or worse, passes a structure by value.

Anyway, there's a simpler approach for that: when passing in values that
need alignment, we align the stack in the caller.  That way you know
that your frame is always as aligned as it can be, AND the callee has
the additional knowledge that because it was passed in a FP value it
doesn't need to do the extra alignment itself.

So the only reason to ever do the above alignment stuff is
 - you haven't been passed in an aligned value, but you need to spill
   one, or you need to call one with one on the stack.

Statistics, anyone? I bet heavy FP code has FP arguments, which means
that the stack needs alignment very seldom indeed. Same is probably true
for XMM etc too.

			Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 14:59         ` Phil Edwards
@ 2001-07-09 15:04           ` Marc Espie
  2001-07-09 16:43           ` Daniel Berlin
  1 sibling, 0 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-09 15:04 UTC (permalink / raw)
  To: Phil Edwards; +Cc: gcc

On Mon, Jul 09, 2001 at 06:00:15PM -0400, Phil Edwards wrote:

> Parsing the libstdc++ headers takes time, mostly because nearly all the
> code is in the headers.  Once we have 'export' this should be better.

Right, sometimes this century maybe, hopefully...

I wouldn't bitch too much about C++... it's probably going to be a
nightmare when compiling kde (which already takes insanely long), but C
code apparently compiles only slightly slower, maybe 10%...

Much less bad than the 2.8 -> 2.95 transition.

Anyways, I hope 3.1 is going to be faster than 3.0, for a change...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 14:54       ` Justin Guyett
@ 2001-07-09 14:59         ` Phil Edwards
  2001-07-09 15:04           ` Marc Espie
  2001-07-09 16:43           ` Daniel Berlin
  0 siblings, 2 replies; 44+ messages in thread
From: Phil Edwards @ 2001-07-09 14:59 UTC (permalink / raw)
  To: Justin Guyett; +Cc: Neil Booth, Gerald Pfeifer, Marc Espie, gcc

On Mon, Jul 09, 2001 at 02:54:21PM -0700, Justin Guyett wrote:
> I dunno about the 1minute+ helloworld compile time someone just suggested,
> but this is disturbing:
> (v2.95) g++ -o hello hello.cc  0.15s user 0.03s system 96% cpu 0.186 total
> (v3.0)  g++ -o hello hello.cc  2.04s user 0.06s system 99% cpu 2.102 total
> 
> if there's a 2 second loading overhead for g++v3, that's 2 seconds * (#
> files) which can be quite a lot in large trees.  Is this simply from
> linking in the new libstdc++?

Parsing the libstdc++ headers takes time, mostly because nearly all the
code is in the headers.  Once we have 'export' this should be better.


Phil

-- 
Would I had phrases that are not known, utterances that are strange, in
new language that has not been used, free from repetition, not an utterance
which has grown stale, which men of old have spoken.
                                     - anonymous Egyptian scribe, c.1700 BC

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 13:28     ` Neil Booth
  2001-07-09 14:13       ` Gerald Pfeifer
@ 2001-07-09 14:54       ` Justin Guyett
  2001-07-09 14:59         ` Phil Edwards
  1 sibling, 1 reply; 44+ messages in thread
From: Justin Guyett @ 2001-07-09 14:54 UTC (permalink / raw)
  To: Neil Booth; +Cc: Gerald Pfeifer, Marc Espie, gcc

On Mon, 9 Jul 2001, Neil Booth wrote:

> Gerald Pfeifer wrote:-
>
> > Indeed, GCC 3.0 takes an order of magnitude longer to compile my sources,
>
> That's not fair - it's not compiling the same sources if you're
> talking about C++ with the standard library, which I presume you are.

I dunno about the 1minute+ helloworld compile time someone just suggested,
but this is disturbing:
(v2.95) g++ -o hello hello.cc  0.15s user 0.03s system 96% cpu 0.186 total
(v3.0)  g++ -o hello hello.cc  2.04s user 0.06s system 99% cpu 2.102 total

if there's a 2 second loading overhead for g++v3, that's 2 seconds * (#
files) which can be quite a lot in large trees.  Is this simply from
linking in the new libstdc++?
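
(For reference, hello.cc is nothing special -- roughly the canonical

        #include <iostream>

        int main()
        {
            std::cout << "hello, world" << std::endl;
            return 0;
        }

so whatever the 2.95 vs 3.0 difference is, that one include triggers it.)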

C performance is still fine afaict

openssl 0.9.6a (probably not the best choice in the world, but it'll do):

v3.0 (733MHz)
(-O3 -m486)       make  183.20s user 10.17s system 98% cpu 3:16.49 total
(-O2 -march=i686) make  177.66s user 10.44s system 99% cpu 3:08.32 total
(-O  -march=i686) make  144.05s user 9.64s  system 99% cpu 2:33.84 total

vs v2.95 (850MHz) (but slower hdd)
(-O3 -m486)       make  130.88s user 15.06s system 90% cpu 2:41.03 total
(-O2 -march=i686) make  128.23s user 15.73s system 92% cpu 2:36.26 total
(-O  -march=i686) make  128.82s user 15.17s system 92% cpu 2:35.58 total

A better rebuttal for C compilation might be that gcc 3.0 is slower with
significant optimization, but 2.95 doesn't seem to be doing much more
optimizing in -O2 vs -O.  Keep in mind the CPU speed difference.

If you're compiling so much, perhaps the better model would be to use -O
or -O0 until you need a release-speed build to do stress testing with, at
which point it makes sense to spend longer compiling for incrementally
smaller code speed increases.  It may not have made very much difference
in 2.95, but it makes sense that it _should_, given a compiler that
actually does significantly more processing for higher optimization
levels.


justin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-08 14:28       ` Linus Torvalds
  2001-07-08 23:59         ` Marc Espie
@ 2001-07-09 14:49         ` Richard Henderson
  2001-07-09 15:23           ` Linus Torvalds
  1 sibling, 1 reply; 44+ messages in thread
From: Richard Henderson @ 2001-07-09 14:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gcc

On Sun, Jul 08, 2001 at 02:28:04PM -0700, Linus Torvalds wrote:
> You can fix it in three easy-ish steps (famous last words):

Oh, no.  You can't get away that easily.

>>  - make an "align frame pointer prologue" (which is not that different
>    from the existing one - one extra "and")

Not quite -- you have to play games to get a proper pointer 
to the argument list so that you can do your step 3.

>  - load all alignment-wanting arguments into pseudos in the prologue
>>    after aligning the frame pointer (and let the normal spill code spill
>    them if required - now they are aligned or are cached in registers). 

Which might work ok for small numbers of arguments, but
is going to be excruciating when someone passes in 30
arguments, or worse, passes a structure by value.

> This would not have any negative impact on most code,
> and I don't see it as being noticeably slower than the
> "always align" even for the FP-heavy code.

I think you'd be surprised how ugly forced stack alignment
is in practice.


r~

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 14:13       ` Gerald Pfeifer
@ 2001-07-09 14:36         ` Bobby McNulty
  0 siblings, 0 replies; 44+ messages in thread
From: Bobby McNulty @ 2001-07-09 14:36 UTC (permalink / raw)
  To: Gerald Pfeifer, Neil Booth; +Cc: Marc Espie, gcc

----- Original Message -----
From: "Gerald Pfeifer" <pfeifer@dbai.tuwien.ac.at>
To: "Neil Booth" <neil@daikokuya.demon.co.uk>
Cc: "Marc Espie" <espie@quatramaran.ens.fr>; <gcc@gcc.gnu.org>
Sent: Monday, July 09, 2001 4:13 PM
Subject: Re: [GCC 3.0] Bad regression, binary size


> On Mon, 9 Jul 2001, Neil Booth wrote:
> >> Indeed, GCC 3.0 takes an order of magnitude longer to compile my sources,
> > That's not fair - it's not compiling the same sources if you're
> > talking about C++ with the standard library, which I presume you are.
>
> Well, yes. As I wrote "Plus, a lot of that slowdown probably is due to
> libstdc++-v3 and its much more standards compliant implementation of
> iterators etc."
>
> But, frankly, as a user I'm mainly seeing the following: It's exactly
> the same source that compiles without warning under both compilers, and
> one compiler takes significantly more time and more memory to generate a
> binary that is both larger and slower.
>
> Gerald
> --
> Gerald "Jerry" pfeifer@dbai.tuwien.ac.at
http://www.dbai.tuwien.ac.at/~pfeifer/
>
I noticed in Linux Mandrake 8.0, with the GCC 3.0.1 prerelease, that g++ took
about a minute to compile a simple "Hello, World" program, but gcc took less
than a second. BTW I did not recompile GLIBC or the kernel.
Now, I'm back with Windows 98 and Cygwin. I was just reading the list when I
found the slowdown messages.
Perhaps size does have a lot to do with it.
        Bobby McNulty
        Computer Enthusiast.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 13:28     ` Neil Booth
@ 2001-07-09 14:13       ` Gerald Pfeifer
  2001-07-09 14:36         ` Bobby McNulty
  2001-07-09 14:54       ` Justin Guyett
  1 sibling, 1 reply; 44+ messages in thread
From: Gerald Pfeifer @ 2001-07-09 14:13 UTC (permalink / raw)
  To: Neil Booth; +Cc: Marc Espie, gcc

On Mon, 9 Jul 2001, Neil Booth wrote:
>> Indeed, GCC 3.0 takes an order of magnitude longer to compile my sources,
> That's not fair - it's not compiling the same sources if you're
> talking about C++ with the standard library, which I presume you are.

Well, yes. As I wrote "Plus, a lot of that slowdown probably is due to
libstdc++-v3 and its much more standards compliant implementation of
iterators etc."

But, frankly, as a user I'm mainly seeing the following: It's exactly
the same source that compiles without warning under both compilers, and
one compiler takes significantly more time and more memory to generate a
binary that is both larger and slower.

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 13:12   ` Gerald Pfeifer
@ 2001-07-09 13:28     ` Neil Booth
  2001-07-09 14:13       ` Gerald Pfeifer
  2001-07-09 14:54       ` Justin Guyett
  0 siblings, 2 replies; 44+ messages in thread
From: Neil Booth @ 2001-07-09 13:28 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Marc Espie, gcc

Gerald Pfeifer wrote:-

> Indeed, GCC 3.0 takes an order of magnitude longer to compile my sources,

That's not fair - it's not compiling the same sources if you're
talking about C++ with the standard library, which I presume you are.

Neil.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09  0:06 ` Marc Espie
@ 2001-07-09 13:12   ` Gerald Pfeifer
  2001-07-09 13:28     ` Neil Booth
  0 siblings, 1 reply; 44+ messages in thread
From: Gerald Pfeifer @ 2001-07-09 13:12 UTC (permalink / raw)
  To: Marc Espie; +Cc: gcc

On Mon, 9 Jul 2001, Marc Espie wrote:
> <devil's advocate>
> Err... New compilers are supposed to be BETTER than old ones. Ideally, there
> should be no regression at all! What's the damn point of -Os if successive
> releases of the compiler keep putting out larger and larger code?

Indeed, GCC 3.0 takes an order of magnitude longer to compile my sources,
uses about three times as much memory, and the generated code runs slower
(on ia32, one of the platforms most improved for GCC 3.0).

> In fact, what's the point of adding and adding more optimizing passes to
> the compiler if all it does is keep the code size mostly identical, keep
> the running time mostly identical, and slow the compiler down even more?

That's a question I've asked myself as well, yes. :-(

One point, and a crucial one, is infrastructure. The new C++ inliner got
rid of a lot of bugs for us and will allow for significant improvements in the
near future. Plus, a lot of that slowdown probably is due to libstdc++-v3
and its much more standards compliant implementation of iterators etc.

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09 10:13 dewar
@ 2001-07-09 10:37 ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-09 10:37 UTC (permalink / raw)
  To: dewar; +Cc: Marc.Espie, gcc

On Mon, 9 Jul 2001 dewar@gnat.com wrote:
>
> <<I would argue that floating point is a lot more special than kernels are.
> I bet the code generation and optimization issues for kernels tend to be
> closer to most programs than FP code tends to be. Which is why I think it
> is the FP code that should be special-cased, not the kernel.
> >>
>
> But it is not just floating-point, it is any 8 or 16 byte object, such
> as a 64-bit integer, or a struct with two 32-bit fields.

Structs with two 32-bit fields? Why? They are accessed with 32-bit
operations, they don't need any alignment.

Using MMX opcodes for moving 8-byte entities around is _stupid_. You get
FP save/restore exceptions everywhere. The break-even point for using MMX
(and trust me, the kernel people have done the numbers) tends to be in the
kilobyte range. So something like memcpy()/memset() will use it, but as it
only makes sense with large areas and with XMM anyway (and XMM has support
for unaligned start/end stuff), I seriously doubt it's an issue.

But read my email again: I did say that "floating point" is "any large
object". Yes it happens. But yes, it also tends to be very localized.
Whether we're talking about "long long" or "long double" or
"xmm_fp_type_t". And I claim that my heuristic should work rather
efficiently.

There are numbers to back up the problems with stack alignment. Just
search the archives on this list.

Show me the numbers that back up the claim that stack alignment is globally
necessary. I doubt you will find a _single_ benchmark that would hurt from
doing it just locally.

I have the numbers showing that the current practice is BAD. You show me yours
to back up YOUR claims.  Until you do, you're just spouting opinions and
hot air.

And that's not how you should do optimization work.

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
@ 2001-07-09 10:13 dewar
  2001-07-09 10:37 ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: dewar @ 2001-07-09 10:13 UTC (permalink / raw)
  To: Marc.Espie, torvalds; +Cc: gcc

<<I would argue that floating point is a lot more special than kernels are.
I bet the code generation and optimization issues for kernels tend to be
closer to most programs than FP code tends to be. Which is why I think it
is the FP code that should be special-cased, not the kernel.
>>

But it is not just floating-point, it is any 8 or 16 byte object, such
as a 64-bit integer, or a struct with two 32-bit fields.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09  9:10           ` Linus Torvalds
  2001-07-09  9:28             ` Marc Espie
@ 2001-07-09 10:10             ` Paolo Carlini
  1 sibling, 0 replies; 44+ messages in thread
From: Paolo Carlini @ 2001-07-09 10:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gcc

Hi,

Linus Torvalds wrote:

> Now, that report may obviously have been bogus, of course. There may be
> something else going on. Linux uses "-mpreferred-stack-boundary=2" exactly
> because the default behaviour of newer gcc's is very much suboptimal
> (certainly for the kernel, and I will bet that it is for most applications
> compiled with gcc).

The "flops" benchmark (f.i., http://ftp.sunet.se/pub2/benchmark/aburto/flops/ )
clearly demostrates this point: just try building it with 3.0.x with and
without -mpreferred-stack-boundary=2. The difference is impressive.

My two cents,
Paolo Carlini.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09  9:28             ` Marc Espie
@ 2001-07-09  9:58               ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-09  9:58 UTC (permalink / raw)
  To: Marc.Espie; +Cc: gcc

On Mon, 9 Jul 2001, Marc Espie wrote:
>
> No, you read this a bit too quick. Someone suggested to use
> -mpreferred-stack-boundary to get the size back down, which actually does
> work, but still does not give me any idea why my kernel is suddenly bigger.

Ahh, ok.

> The actual object size difference is +10K,
> the resulting kernel is +60K,
> and the compressed size difference is even larger.

Bad compression? Doesn't sound like alignment or lack of merging. Although
the code alignment is using the "multi-byte nops", which might not
compress that well. If the default architecture has changed..

> Now, the rationale for keeping -mpreferred-stack-boundary `high' by default
> is that  mixed code (`normal' code + floating point code) does need that
> boundary by default, so if you turn that off, say in library code, then
> all floating point code loses (because it's somewhat inefficient to regain
> proper boundary, and the compiler does not really know when to emit that
> code, e.g., where is the boundary between floating point code and
> non-floating point code, since this is global stuff that isn't even
> `fixed').

But that's a bogus argument.

The compiler _does_ know when to emit the code. You only need to emit the
code when:
 - you have floating point arguments to a function you're calling (so that
   you can try to align the arguments - but realize that none of the x86
   ABI's really allow for true alignment ANYWAY)
 - you need to spill floating point

If you add these rules together, you also know that
 - you are already aligned if you got a floating point argument.

(Exchange "large object" for "floating point" - I realize that things like
XMM etc also want alignment. Also, you may want to modify the rule about
fp arguments - maybe not try to keep the stack itself aligned, but keep
instead the first fp argument aligned).

Now, floating point doesn't just happen on its own. When you have floating
point spills, I will bet you that in most cases you have floating point
arguments to your function too. Which means that with the above two
heuristics (and the rule you can infer from them), you will probably see
very little "unnecessary" alignment generation.
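
(In pseudo-code, with hypothetical predicate names, the heuristic is just:

        /* Emit the aligning "and" only when alignment is produced here
           and wasn't already inherited from the caller.  */
        need_align = (spills_fp (fn) || calls_with_fp_args (fn))
                     && !has_fp_argument (fn);

Everything else gets the plain, short prologue.)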

And the win? You see _no_ unnecessary alignment for the normal case.

And remember: you're supposed to optimize for the normal case. That's
rule#1 in optimization. Always.

> Kernels are special beasts, since they are completely isolated from
> userland, and hence their stack boundary issues are very well isolated.

I agree that kernels tend to have more well-defined behaviour, and for
that reason kernels (and embedded projects) can often use rules that are
illegal in general. HOWEVER, I disagree strongly that kernels are that
"special".

I would argue that floating point is a lot more special than kernels are.
I bet the code generation and optimization issues for kernels tend to be
closer to most programs than FP code tends to be. Which is why I think it
is the FP code that should be special-cased, not the kernel.

> I can hardly fault the gcc team on preferred-stack-boundary...

Why? Do you argue that it makes code better on average? I seriously doubt
it. And if you don't argue that, then why just accept the "we will always
do this thing" decision?

Now, I can accept the fact that some people may think it is _easier_ to
just always keep the stack aligned. That's a technical judgement, and
likely to be true. But if THAT is the reason for the alignment, then I
would still fault rth for saying "it will always be so". Maybe somebody
comes along and is willing to fix it, and I would hope that the gcc
developers would jump for joy, instead of being set on a course of bad
code generation for the default case.

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09  9:10           ` Linus Torvalds
@ 2001-07-09  9:28             ` Marc Espie
  2001-07-09  9:58               ` Linus Torvalds
  2001-07-09 10:10             ` Paolo Carlini
  1 sibling, 1 reply; 44+ messages in thread
From: Marc Espie @ 2001-07-09  9:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gcc

On Mon, Jul 09, 2001 at 09:09:08AM -0700, Linus Torvalds wrote:
> 
> On Mon, 9 Jul 2001, Marc Espie wrote:
> >
> > In article < 200107082128.f68LS4x08156@penguin.transmeta.com > you write:
> > >gcc-3.0 already gets complaints for generating slower and more bloated
> > >code than previous gcc releases. That should tell people something.
> >
> > Try not to jumble things together, please. The ix86 alignment properties
> > is nothing new at all.
> 
> Hmm.. It was reported as the potential reason for the bigger OpenBSD
> kernel.

No, you read this a bit too quick. Someone suggested to use
-mpreferred-stack-boundary to get the size back down, which actually does
work, but still does not give me any idea why my kernel is suddenly bigger.

The actual object size difference is +10K,
the resulting kernel is +60K,
and the compressed size difference is even larger.

As far as I know, my linker gets confused and no longer merges stuff it
used to, which I now want to track down.

Now, the rationale for keeping -mpreferred-stack-boundary `high' by default
is that  mixed code (`normal' code + floating point code) does need that
boundary by default, so if you turn that off, say in library code, then
all floating point code loses (because it's somewhat inefficient to regain
proper boundary, and the compiler does not really know when to emit that
code, e.g., where is the boundary between floating point code and
non-floating point code, since this is global stuff that isn't even
`fixed').

Kernels are special beasts, since they are completely isolated from
userland, and hence their stack boundary issues are very well isolated.

All this is even documented in the gcc documentation (e.g., 
preferred-stack-boundary is marked as a beneficial tweak for embedded
systems).


I can hardly fault the gcc team on preferred-stack-boundary...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-08 23:59         ` Marc Espie
  2001-07-09  6:06           ` Tim Prince
@ 2001-07-09  9:10           ` Linus Torvalds
  2001-07-09  9:28             ` Marc Espie
  2001-07-09 10:10             ` Paolo Carlini
  1 sibling, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-09  9:10 UTC (permalink / raw)
  To: Marc Espie; +Cc: gcc

On Mon, 9 Jul 2001, Marc Espie wrote:
>
> In article < 200107082128.f68LS4x08156@penguin.transmeta.com > you write:
> >gcc-3.0 already gets complaints for generating slower and more bloated
> >code than previous gcc releases. That should tell people something.
>
> Try not to jumble things together, please. The ix86 alignment properties
> are nothing new at all.

Hmm.. It was reported as the potential reason for the bigger OpenBSD
kernel.

Now, that report may obviously have been bogus, of course. There may be
something else going on. Linux uses "-mpreferred-stack-boundary=2" exactly
because the default behaviour of newer gcc's is very much suboptimal
(certainly for the kernel, and I will bet that it is for most applications
compiled with gcc).

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
@ 2001-07-09  7:50 dewar
  0 siblings, 0 replies; 44+ messages in thread
From: dewar @ 2001-07-09  7:50 UTC (permalink / raw)
  To: Marc.Espie, dewar; +Cc: gcc

<<If it causes us not to be able to switch to gcc 3.0.x six months from now,
it is quite serious, from my point of view...
>>

There are many reasons why a specific project might not want to switch
to a new version of the compiler, but that does not mean there is a serious
regression. For example, a project might be depending on a bug, now fixed,
and thus be delayed in switching versions of the compiler, but that does
not make it a regression.

It is really helpful NOT to overuse the technical term regression, which
simply means that *with respect to the target specifications* there is
an instance of failing to meet specs where before the spec was met.

That's not the case here; there was no spec that your particular program
occupy 0.98X space instead of 1.00X space.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-09  7:18 dewar
@ 2001-07-09  7:22 ` Marc Espie
  0 siblings, 0 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-09  7:22 UTC (permalink / raw)
  To: dewar; +Cc: gcc

On Mon, Jul 09, 2001 at 10:17:54AM -0400, dewar@gnat.com wrote:

> The issue is with calling this a "serious regression"; that's obviously
> inappropriate.

If it causes us not to be able to switch to gcc 3.0.x six months from now,
it is quite serious, from my point of view...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
@ 2001-07-09  7:18 dewar
  2001-07-09  7:22 ` Marc Espie
  0 siblings, 1 reply; 44+ messages in thread
From: dewar @ 2001-07-09  7:18 UTC (permalink / raw)
  To: espie, gcc

<<Err... New compilers are supposed to be BETTER than old ones. Ideally, there
should be no regression at all ! What's the damn point of -Os if successive
releases of the compiler keep getting out larger and larger code ?
>>

A requirement that -Os not increase code size for any possible program is
clearly unreasonable. It may well be that some particular *size* optimization
helps nearly all programs, but not that particular one.

Or, as discussed in the thread on alignment, a change in ABI conventions
may cause code size increases that are justified in terms of general
performance improvement (obviously -Os should not affect the ABI).

<<In fact, as I stated already, this seems to be a weird linker/compiler
interaction, which I'm going to try to figure out.
>>

Well of course anytime we see some anomalous behavior like this, we should
try to understand it, and if in fact the behavior can be improved, then we
should indeed fix it. No one argues with that.

The issue is with calling this a "serious regression"; that's obviously
inappropriate.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-08 23:59         ` Marc Espie
@ 2001-07-09  6:06           ` Tim Prince
  2001-07-09  9:10           ` Linus Torvalds
  1 sibling, 0 replies; 44+ messages in thread
From: Tim Prince @ 2001-07-09  6:06 UTC (permalink / raw)
  To: Marc Espie, torvalds; +Cc: gcc

----- Original Message -----
From: "Marc Espie" <espie@quatramaran.ens.fr>
To: <torvalds@transmeta.com>
Cc: <gcc@gcc.gnu.org>
Sent: Sunday, July 08, 2001 11:59 PM
Subject: Re: [GCC 3.0] Bad regression, binary size


> In article < 200107082128.f68LS4x08156@penguin.transmeta.com > you write:
> >gcc-3.0 already gets complaints for generating slower and more bloated
> >code than previous gcc releases. That should tell people something.
>
> Try not to jumble things together, please. The ix86 alignment properties
> are nothing new at all. You can't really use the `gcc 3.0 is slower
> than previous releases' argument as a lever against it.
>
> If gcc 3.0 is a problem for you, please dig deeper, and get the stack
> alignment issues in a separate thread.

Several commercial compilers have options to make the stack 4-, 8-, or
16-byte aligned.  I'd rather pay the occasional price of a
16-byte aligned stack than continue with the current dilemma of certain
g++ implementations breaking with more than 4-byte alignment, and the
consequent poor performance on larger objects.  In spite of public
pronouncements that gcc is not meant to support technical computing,
there are people who use 64-bit or larger objects.  gcc on linux has not
followed the MS tactic of disabling support for long double,  for
example.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  8:42 dewar
@ 2001-07-09  0:06 ` Marc Espie
  2001-07-09 13:12   ` Gerald Pfeifer
  0 siblings, 1 reply; 44+ messages in thread
From: Marc Espie @ 2001-07-09  0:06 UTC (permalink / raw)
  To: gcc

In article < 20010705154207.74D63F2B5E@nile.gnat.com > you write:
><<This is a significant regression from 2.95.2. Fortunately, it does seem to
>come from some linker issue which I hope to figure out, but be assured this
>kind of problem is very serious.

>I really can't agree that a 2% difference in object size is a significant
>regression. Optimization is a trade off between size and execution speed,
>and you may well get variations in this range from one version of a compiler
>to another.

<devil's advocate>
Err... New compilers are supposed to be BETTER than old ones. Ideally, there
should be no regression at all! What's the damn point of -Os if successive
releases of the compiler keep putting out larger and larger code?

In fact, what's the point of adding and adding more optimizing passes to
the compiler if all it does is keep the code size mostly identical, keep
the running time mostly identical, and slow the compiler down even more?

And I'm not even talking about some obscure, half-supported platform like
m68k, which has had ICEs every new release since 2.8, but i386...

One issue is that updating is necessary to get progress on really
important things, like C++ support. Why does this have to come with plain
old C regressions?
</devil's advocate>

In fact, as I stated already, this seems to be a weird linker/compiler 
interaction, which I'm going to try to figure out.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-08 14:28       ` Linus Torvalds
@ 2001-07-08 23:59         ` Marc Espie
  2001-07-09  6:06           ` Tim Prince
  2001-07-09  9:10           ` Linus Torvalds
  2001-07-09 14:49         ` Richard Henderson
  1 sibling, 2 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-08 23:59 UTC (permalink / raw)
  To: torvalds; +Cc: gcc

In article < 200107082128.f68LS4x08156@penguin.transmeta.com > you write:
>gcc-3.0 already gets complaints for generating slower and more bloated
>code than previous gcc releases. That should tell people something.

Try not to jumble things together, please. The ix86 alignment properties
are nothing new at all. You can't really use the `gcc 3.0 is slower
than previous releases' argument as a lever against it.

If gcc 3.0 is a problem for you, please dig deeper, and get the stack
alignment issues in a separate thread.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-07 16:45     ` Richard Henderson
@ 2001-07-08 14:28       ` Linus Torvalds
  2001-07-08 23:59         ` Marc Espie
  2001-07-09 14:49         ` Richard Henderson
  0 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2001-07-08 14:28 UTC (permalink / raw)
  To: gcc

In article < 20010707164531.C2847@redhat.com > you write:
>On Sat, Jul 07, 2001 at 11:41:48PM +0200, Jamie Lokier wrote:
>> By the way, is this 16 byte stack alignment going to be a permanent
>> feature?
>
>Yes.

Ugh.  It's a bad, bad, misoptimization.

99% of all code tends to be pretty much integer only.  You do not get
any wins from the alignment, and you do get clear losses.  For one,
function calls are clearly more expensive, and your dcache isn't as
dense. 

So you optimize for the 1% of benchmarks that care, at the expense of
the 99% of real-life code that doesn't?

>> As far as I can tell, it is not necessary to 16-byte align all functions
>> just to help the performance of a few.  Instead, functions which pass
>> aligned arguments or use aligned locals, and which don't receive any
>> aligned argument, could use "and" to align the stack themselves.  It is
>> necessary to record the original stack pointer, but that is easy enough.
>
>Yes, this can be done, but it is _really_ expensive.  You can
>no longer directly address stack arguments, and wind up burning
>yet another register to address them.

You can fix it in three easy-ish steps (famous last words):
 - require that functions that have alignment requirements have a frame
   pointer
 - make a "align frame pointer prologue" (which is not that different
   from the existing one - one extra "and")
 - load all alignment-wanting arguments into pseudos in the prologue
   after aligning the frame pointer (and let the normal spill code spill
   them if required - now they are aligned or are cached in registers). 

This would not have any negative impact on most code, and I don't see it
as being noticeably slower than the "always align" even for the FP-heavy
code. Especially as gcc _already_ defaults to having a frame pointer
even when optimizing.

gcc-3.0 already gets complaints for generating slower and more bloated
code than previous gcc releases. That should tell people something.

		Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-07 14:41   ` Jamie Lokier
@ 2001-07-07 16:45     ` Richard Henderson
  2001-07-08 14:28       ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: Richard Henderson @ 2001-07-07 16:45 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Marc.Espie, gcc

On Sat, Jul 07, 2001 at 11:41:48PM +0200, Jamie Lokier wrote:
> By the way, is this 16 byte stack alignment going to be a permanent
> feature?

Yes.

> As far as I can tell, it is not necessary to 16-byte align all functions
> just to help the performance of a few.  Instead, functions which pass
> aligned arguments or use aligned locals, and which don't receive any
> aligned argument, could use "and" to align the stack themselves.  It is
> necessary to record the original stack pointer, but that is easy enough.

Yes, this can be done, but it is _really_ expensive.  You can
no longer directly address stack arguments, and wind up burning
yet another register to address them.

The one thing that really ought to happen is to notice that
leaf functions don't need extra alignment.  Then remember
alignment required by previously compiled functions.


r~

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-07 13:09 ` Richard Henderson
@ 2001-07-07 14:41   ` Jamie Lokier
  2001-07-07 16:45     ` Richard Henderson
  0 siblings, 1 reply; 44+ messages in thread
From: Jamie Lokier @ 2001-07-07 14:41 UTC (permalink / raw)
  To: Richard Henderson, Marc.Espie, gcc

Richard Henderson wrote:
> > -rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd
> > 
> > 3.0:
> > -rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd
> > 
> > Exact same source. Both are using -O2 -Os to compile.
> 
> Also try -mpreferred-stack-boundary=2.  The default is 16 byte
> alignment, which your kernel almost certainly doesn't need, and
> there is extra code to maintain that alignment.

By the way, is this 16 byte stack alignment going to be a permanent
feature?

As far as I can tell, it is not necessary to 16-byte align all functions
just to help the performance of a few.  Instead, functions which pass
aligned arguments or use aligned locals, and which don't receive any
aligned argument, could use "and" to align the stack themselves.  It is
necessary to record the original stack pointer, but that is easy enough.
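
Something like this, say (a sketch -- the saved frame pointer doubles as
the record of the original stack pointer):

        pushl %ebp
        movl %esp,%ebp          # %ebp still reaches the incoming arguments
        subl $locals,%esp
        andl $-16,%esp          # align this function's frame only
        ...
        movl %ebp,%esp          # restore the recorded stack pointer
        popl %ebp
        ret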

(The one case where this rule isn't perfect is variadic functions, but
alignment-related performance is not really an issue for those).

It seems to me this would improve performance, code density, and reduce
stack usage of functions which do not need the extra alignment, while at
the same time allow functions which do benefit from stack alignment to
perform as well as they do now.

This would also have the benefit that if a larger alignment is required
for some object, for example some structures like to be cache-line
aligned, then that can be accommodated too.
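
For example (a hypothetical type, but gcc already accepts the attribute):

        struct bucket {
            char data[64];
        } __attribute__ ((aligned (64)));      /* one cache line */

A local of this type wants more alignment than any fixed stack boundary
guarantees, and the same "and" in the prologue could provide it.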

cheers,
-- Jamie

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  7:39 Marc Espie
  2001-07-05  7:51 ` Marc Espie
  2001-07-05 15:14 ` Geoff Keating
@ 2001-07-07 13:09 ` Richard Henderson
  2001-07-07 14:41   ` Jamie Lokier
  2 siblings, 1 reply; 44+ messages in thread
From: Richard Henderson @ 2001-07-07 13:09 UTC (permalink / raw)
  To: Marc.Espie; +Cc: gcc

On Thu, Jul 05, 2001 at 04:39:30PM +0200, Marc Espie wrote:
> 2.95.3 + peephole to handle stack adjustment, backported from current:
> -rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd
> 
> 3.0:
> -rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd
> 
> Exact same source. Both are using -O2 -Os to compile.

Also try -mpreferred-stack-boundary=2.  The default is 16 byte
alignment, which your kernel almost certainly doesn't need, and
there is extra code to maintain that alignment.
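
The value is log2 of the alignment in bytes -- 2 means 4-byte, and the
default of 4 gives the 16-byte behaviour -- so something like (file name
made up):

        gcc -O2 -Os -mpreferred-stack-boundary=2 -c machdep.c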


r~

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  7:39 Marc Espie
  2001-07-05  7:51 ` Marc Espie
@ 2001-07-05 15:14 ` Geoff Keating
  2001-07-07 13:09 ` Richard Henderson
  2 siblings, 0 replies; 44+ messages in thread
From: Geoff Keating @ 2001-07-05 15:14 UTC (permalink / raw)
  To: Marc.Espie; +Cc: gcc

Marc Espie <espie@schutzenberger.liafa.jussieu.fr> writes:

> After fixing a few warnings, I've managed to compile OpenBSD kernels.
> The results are BAD:
> 
> 2.95.3 + peephole to handle stack adjustment, backported from current:
> -rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd
> 
> 3.0:
> -rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd
> 
> 
> Exact same source. Both are using -O2 -Os to compile.
> 
> This is on an i386.
> 
> This is pretty bad. Basically, it means we can't switch to 3.0 at all.
> 
> I'm going to look at the corresponding generated files, see if I can spot
> some pattern.

You might consider looking at the ix86_*_alignment functions in
i386.c.  From what I can see, they don't honour -Os at all.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
@ 2001-07-05  8:42 dewar
  2001-07-09  0:06 ` Marc Espie
  0 siblings, 1 reply; 44+ messages in thread
From: dewar @ 2001-07-05  8:42 UTC (permalink / raw)
  To: Marc.Espie, dewar; +Cc: gcc

<<This is a significant regression from 2.95.2. Fortunately, it does seem to
come from some linker issue which I hope to figure out, but be assured this
kind of problem is very serious.
>>

I really can't agree that a 2% difference in object size is a significant
regression. Optimization is a trade off between size and execution speed,
and you may well get variations in this range from one version of a compiler
to another.

An embedded system with only 2% headroom is an abomination.

As for fitting on a floppy, that is really entirely marginal. It sounds like
you have been desperately close to the floppy limit for a while, and at this
stage nothing gets distributed on floppies anyway.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  8:31 dewar
@ 2001-07-05  8:38 ` Marc Espie
  0 siblings, 0 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-05  8:38 UTC (permalink / raw)
  To: dewar; +Cc: gcc

On Thu, Jul 05, 2001 at 11:31:35AM -0400, dewar@gnat.com wrote:
> <<2.95.3 + peephole to handle stack adjustment, backported from current:
> -rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd

> 3.0:
> -rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd
> >>

> I don't understand; are you saying that the less-than-2% increase in
> size is so worrisome that you cannot switch? If so, that's an odd
> claim indeed.

Think `embedded systems'. In the case at hand, whether or not a compressed
image fits on a floppy makes all the difference in the world.

This is a significant regression from 2.95.2. Fortunately, it does seem to
come from some linker issue which I hope to figure out, but be assured this
kind of problem is very serious.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
@ 2001-07-05  8:31 dewar
  2001-07-05  8:38 ` Marc Espie
  0 siblings, 1 reply; 44+ messages in thread
From: dewar @ 2001-07-05  8:31 UTC (permalink / raw)
  To: Marc.Espie, gcc

<<2.95.3 + peephole to handle stack adjustment, backported from current:
-rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd

3.0:
-rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd
>>

I don't understand; are you saying that the less-than-2% increase in
size is so worrisome that you cannot switch? If so, that's an odd
claim indeed.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  8:17   ` David Edelsohn
@ 2001-07-05  8:22     ` Marc Espie
  0 siblings, 0 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-05  8:22 UTC (permalink / raw)
  To: gcc

On Thu, Jul 05, 2001 at 11:17:05AM -0400, David Edelsohn wrote:
> 	Alignment?  Libraries (does OpenBSD link with libgcc.a?)?

This is probably an alignment issue... Both binaries are linked by hand, using
ld -z -Ttext E0100000 -e start -x -o bsd ${SYSTEM_OBJ} vers.o


On gcc 3.0, total object size (wc -c) is 5459492, vs. 5450269  on 
gcc 2.95.3...

Next, I'm going to mix and match object files from both and see if I
notice a pattern.

Sorry if I give details slowly; this thing is confusing me a lot...
I really wouldn't have expected the size difference to show up that way.
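
One quick way to see where the extra bytes go (a sketch, assuming GNU
binutils; bsd.295 and bsd.30 are stand-in names for the two kernels):

    size bsd.295 bsd.30     # compare text/data/bss totals
    objdump -h bsd.30       # per-section sizes and alignments

Padding inserted for alignment shows up in the section sizes of the
linked image, not in the byte counts of the individual objects.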

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  7:51 ` Marc Espie
  2001-07-05  8:07   ` Joern Rennecke
@ 2001-07-05  8:17   ` David Edelsohn
  2001-07-05  8:22     ` Marc Espie
  1 sibling, 1 reply; 44+ messages in thread
From: David Edelsohn @ 2001-07-05  8:17 UTC (permalink / raw)
  To: Marc Espie; +Cc: gcc

>>>>> Marc Espie writes:

Marc> In article < 20010705163930.A29149@schutzenberger.liafa.jussieu.fr > you write:
>> 2.95.3 + peephole to handle stack adjustment, backported from current:
>> -rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd

>> 3.0:
>> -rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd


>> Exact same source. Both are using -O2 -Os to compile.

Marc> Hmm, this is weird... the corresponding object sizes are not that far apart...
Marc> Does anyone have any idea of a linking issue that might explain this problem?

	Alignment?  Libraries (does OpenBSD link with libgcc.a?)?

David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  8:07   ` Joern Rennecke
@ 2001-07-05  8:12     ` Marc Espie
  0 siblings, 0 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-05  8:12 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: gcc

On Thu, Jul 05, 2001 at 04:07:42PM +0100, Joern Rennecke wrote:
> > Does anyone have any idea of a linking issue that might explain this problem?
> 
> The shared libgcc contains a lot of functions that are never used.

It's a static binary. In fact -nostdlib; I forgot to say (BSD kernel).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  7:51 ` Marc Espie
@ 2001-07-05  8:07   ` Joern Rennecke
  2001-07-05  8:12     ` Marc Espie
  2001-07-05  8:17   ` David Edelsohn
  1 sibling, 1 reply; 44+ messages in thread
From: Joern Rennecke @ 2001-07-05  8:07 UTC (permalink / raw)
  To: Marc Espie; +Cc: gcc

> Does anyone have any idea of a linking issue that might explain this problem?

The shared libgcc contains a lot of functions that are never used.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [GCC 3.0] Bad regression, binary size
  2001-07-05  7:39 Marc Espie
@ 2001-07-05  7:51 ` Marc Espie
  2001-07-05  8:07   ` Joern Rennecke
  2001-07-05  8:17   ` David Edelsohn
  2001-07-05 15:14 ` Geoff Keating
  2001-07-07 13:09 ` Richard Henderson
  2 siblings, 2 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-05  7:51 UTC (permalink / raw)
  To: gcc

In article < 20010705163930.A29149@schutzenberger.liafa.jussieu.fr > you write:

>2.95.3 + peephole to handle stack adjustment, backported from current:
>-rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd

>3.0:
>-rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd


>Exact same source. Both are using -O2 -Os to compile.

Hmm, this is weird... the corresponding object sizes are not that far apart...
Does anyone have any idea of a linking issue that might explain this problem?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [GCC 3.0] Bad regression, binary size
@ 2001-07-05  7:39 Marc Espie
  2001-07-05  7:51 ` Marc Espie
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Marc Espie @ 2001-07-05  7:39 UTC (permalink / raw)
  To: gcc

After fixing a few warnings, I've managed to compile OpenBSD kernels.
The results are BAD:

2.95.3 + peephole to handle stack adjustment, backported from current:
-rwxr-xr-x   1 espie    wheel     3994332 Jul  5 16:08 bsd

3.0:
-rwxr-xr-x   1 espie    wheel     4068067 Jul  5 16:35 bsd


Exact same source. Both are using -O2 -Os to compile.

This is on an i386.

This is pretty bad. Basically, it means we can't switch to 3.0 at all.

I'm going to look at the corresponding generated files and see if I can spot
some pattern.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2001-07-24  9:27 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-23 19:42 [GCC 3.0] Bad regression, binary size dewar
2001-07-24  9:27 ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2001-07-23  4:46 Jonathan Thornburg
2001-07-23 15:14 ` Linus Torvalds
2001-07-09 10:13 dewar
2001-07-09 10:37 ` Linus Torvalds
2001-07-09  7:50 dewar
2001-07-09  7:18 dewar
2001-07-09  7:22 ` Marc Espie
2001-07-05  8:42 dewar
2001-07-09  0:06 ` Marc Espie
2001-07-09 13:12   ` Gerald Pfeifer
2001-07-09 13:28     ` Neil Booth
2001-07-09 14:13       ` Gerald Pfeifer
2001-07-09 14:36         ` Bobby McNulty
2001-07-09 14:54       ` Justin Guyett
2001-07-09 14:59         ` Phil Edwards
2001-07-09 15:04           ` Marc Espie
2001-07-09 16:43           ` Daniel Berlin
2001-07-05  8:31 dewar
2001-07-05  8:38 ` Marc Espie
2001-07-05  7:39 Marc Espie
2001-07-05  7:51 ` Marc Espie
2001-07-05  8:07   ` Joern Rennecke
2001-07-05  8:12     ` Marc Espie
2001-07-05  8:17   ` David Edelsohn
2001-07-05  8:22     ` Marc Espie
2001-07-05 15:14 ` Geoff Keating
2001-07-07 13:09 ` Richard Henderson
2001-07-07 14:41   ` Jamie Lokier
2001-07-07 16:45     ` Richard Henderson
2001-07-08 14:28       ` Linus Torvalds
2001-07-08 23:59         ` Marc Espie
2001-07-09  6:06           ` Tim Prince
2001-07-09  9:10           ` Linus Torvalds
2001-07-09  9:28             ` Marc Espie
2001-07-09  9:58               ` Linus Torvalds
2001-07-09 10:10             ` Paolo Carlini
2001-07-09 14:49         ` Richard Henderson
2001-07-09 15:23           ` Linus Torvalds
2001-07-09 15:55             ` Joern Rennecke
2001-07-09 16:14               ` Linus Torvalds
2001-07-09 16:05             ` Richard Henderson
2001-07-09 16:24               ` Linus Torvalds
