Re: ix86 double alignment (was Re: egcs-1.1 release schedule)

public inbox for gcc@gcc.gnu.org

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
@ 1998-06-24 17:12 John Wehle
  1998-06-24 21:23 ` Jeffrey A Law
  0 siblings, 1 reply; 23+ messages in thread
From: John Wehle @ 1998-06-24 17:12 UTC
  To: law; +Cc: burley, d.love, egcs, davem

> It's an interesting question to think about.  HP recommends a 64-byte
> alignment for the stack on PAs.  It has some *really* nice benefits
> as far as the dcache is concerned.  And until about a year ago we
> actually followed that guideline -- by setting STACK_BOUNDARY appropriately :-)
> 
> That's how I know about the problems that combine will cause if you
> end up with a mis-aligned stack pointer relative to STACK_BOUNDARY.
> It turned out the crt0 code on hpux10 only provided 8 byte alignment
> for the stack pointer.  Oops.

What about defining PREFERRED_STACK_BOUNDARY to mean the optimal stack
alignment and having it default to STACK_BOUNDARY?  Then change the
places which align the stack based on STACK_BOUNDARY to use
PREFERRED_STACK_BOUNDARY.  Leave code which implements optimizations
(and records the stack alignment) based on STACK_BOUNDARY alone.  This
way gcc will attempt to align the stack based on PREFERRED_STACK_BOUNDARY
and assume STACK_BOUNDARY when implementing optimizations which should
be safe (assuming that PREFERRED_STACK_BOUNDARY >= STACK_BOUNDARY is
enforced).

I know ... I've probably oversimplified the issue. :-)
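A minimal C sketch of that split, assuming gcc's convention of expressing these macros in bits. The values 32 and 64 and the helpers `round_frame_size` and `assumed_stack_alignment` are hypothetical; they only show which of the two macros each kind of code would consult:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ix86-like values, in bits. */
#define STACK_BOUNDARY 32            /* alignment the ABI guarantees */
#define PREFERRED_STACK_BOUNDARY 64  /* alignment we try to maintain */

/* Proposed default: prefer exactly what is guaranteed. */
#ifndef PREFERRED_STACK_BOUNDARY
#define PREFERRED_STACK_BOUNDARY STACK_BOUNDARY
#endif

#if PREFERRED_STACK_BOUNDARY < STACK_BOUNDARY
#error "PREFERRED_STACK_BOUNDARY must be >= STACK_BOUNDARY"
#endif

/* Code that *allocates* stack (prologue, outgoing args) rounds sizes
   up to the preferred boundary. */
size_t round_frame_size(size_t bytes)
{
    size_t align = PREFERRED_STACK_BOUNDARY / 8;
    assert((align & (align - 1)) == 0);   /* must be a power of two */
    return (bytes + align - 1) & ~(align - 1);
}

/* Code that *optimizes* (e.g. folds address masking) may only assume
   the guaranteed boundary, since crt0 or code compiled elsewhere
   might not honor the preferred one. */
size_t assumed_stack_alignment(void)
{
    return STACK_BOUNDARY / 8;
}
```

With these values a 20-byte frame is allocated as 24 bytes, while optimizations still assume only 4-byte alignment of the stack pointer.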

-- John
-------------------------------------------------------------------------
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: john@feith.com  |
|    John Wehle    |     Fax: 1-215-540-5495  |                         |
-------------------------------------------------------------------------

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-24 17:12 ix86 double alignment (was Re: egcs-1.1 release schedule) John Wehle
@ 1998-06-24 21:23 ` Jeffrey A Law
  0 siblings, 0 replies; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-24 21:23 UTC
  To: John Wehle; +Cc: burley, d.love, egcs, davem

  In message <199806242143.RAA15394@jwlab.FEITH.COM> you write:
  > What about defining PREFERRED_STACK_BOUNDARY to mean the optimal stack
  > alignment and having it default to STACK_BOUNDARY?  Then change the
  > places which align the stack based on STACK_BOUNDARY to use
  > PREFERRED_STACK_BOUNDARY.  Leave code which implements optimizations
  > (and records the stack alignment) based on STACK_BOUNDARY alone.  This
  > way gcc will attempt to align the stack based on PREFERRED_STACK_BOUNDARY
  > and assume STACK_BOUNDARY when implementing optimizations which should
  > be safe (assuming that PREFERRED_STACK_BOUNDARY >= STACK_BOUNDARY is
  > enforced).
  > 
  > I know ... I've probably oversimplified the issue. :-)
That's basically what I expect to happen, or something very similar.

jeff

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-25  0:25           ` Jeffrey A Law
  1998-06-25  9:59             ` Tim Hollebeek
@ 1998-06-28 18:01             ` Marc Lehmann
  1 sibling, 0 replies; 23+ messages in thread
From: Marc Lehmann @ 1998-06-28 18:01 UTC
  To: egcs

On Wed, Jun 24, 1998 at 06:11:55PM -0600, Jeffrey A Law wrote:

>   > "by value" (C style) at all, now that I think about it more.  It'd
>   > surely break the ABI.
> Right.  That's basically what I was trying to explain in one or more
> of those longer messages.  You can't align stuff in the arglist without
> either breaking the ABI or blowing away the alignment we want for the
> stack pointer.

-malign-double already breaks the ABI, and is reeeeally useful. Considering
it doesn't cost anything at all to implement -marg-align-double (except
fixing a bug/deficiency in calls.c), I'd guess people would be happy for the
additional 1%.
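To see why -malign-double changes the ABI, here is a small C model of struct layout for `struct { int i; double d; }`. It assumes the i386 System V rule that `double` members are 4-byte aligned by default and 8-byte aligned under -malign-double; `layout_struct` is a hypothetical helper that only mimics what the compiler does:

```c
#include <assert.h>
#include <stddef.h>

/* Lay out members in order, padding each to its alignment; write the
   offsets to `offsets' and return the total size rounded up to the
   largest member alignment. */
size_t layout_struct(const size_t *sizes, const size_t *aligns,
                     size_t n, size_t *offsets)
{
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        size_t a = aligns[i];
        off = (off + a - 1) / a * a;   /* pad up to member alignment */
        offsets[i] = off;
        off += sizes[i];
        if (a > max_align)
            max_align = a;
    }
    return (off + max_align - 1) / max_align * max_align;
}
```

With alignments {4, 4} the double lands at offset 4 in a 12-byte struct; with {4, 8} it lands at offset 8 in a 16-byte struct, so the two conventions cannot share such structs.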

      -----==-                                              |
      ----==-- _                                            |
      ---==---(_)__  __ ____  __       Marc Lehmann       +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com       |e|
      -=====/_/_//_/\_,_/ /_/\_\                          --+
    The choice of a GNU generation                        |
                                                          |

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-25  0:25           ` Jeffrey A Law
@ 1998-06-25  9:59             ` Tim Hollebeek
  1998-06-28 18:01             ` Marc Lehmann
  1 sibling, 0 replies; 23+ messages in thread
From: Tim Hollebeek @ 1998-06-25  9:59 UTC
  To: law; +Cc: burley, d.love, toon, egcs

Jeffrey A Law writes ...
> 
> Right.  That's basically what I was trying to explain in one or more
> of those longer messages.  You can't align stuff in the arglist without
> either breaking the ABI or blowing away the alignment we want for the
> stack pointer.

Would it be possible to have the argument read from its unaligned
location the first time it is used, but if it is ever spilled out of a
register before it goes dead to have it written to an aligned
location?  This seems like a workable idea which is half way between
the 'copy to an aligned location' and the 'always use the unaligned
value' ideas.  Might be a bitch to implement, though.

---------------------------------------------------------------------------
Tim Hollebeek                           | "Everything above is a true
email: tim@wfn-shop.princeton.edu       |  statement, for sufficiently
URL: http://wfn-shop.princeton.edu/~tim |  false values of true."

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-24 14:50         ` Craig Burley
@ 1998-06-25  0:25           ` Jeffrey A Law
  1998-06-25  9:59             ` Tim Hollebeek
  1998-06-28 18:01             ` Marc Lehmann
  0 siblings, 2 replies; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-25  0:25 UTC
  To: Craig Burley; +Cc: d.love, toon, egcs

  In message <199806241507.LAA11762@melange.gnu.org> you write:
  > >Basically the ABI just mandates a 4-byte alignment; we'd get better
  > >performance if we could get the args 8-byte aligned.  But I'll be
  > >a little surprised if we can do this without having the callee copy
  > >it out of the arglist to an aligned memory slot.
  > 
  > I guess we shouldn't try aligning outgoing/incoming doubles passed
  > "by value" (C style) at all, now that I think about it more.  It'd
  > surely break the ABI.
Right.  That's basically what I was trying to explain in one or more
of those longer messages.  You can't align stuff in the arglist without
either breaking the ABI or blowing away the alignment we want for the
stack pointer.

  > And, I suspect the real performance penalties come mostly from arrays
  > and such inside loops anyway.  People don't pass arrays by value
  > (not usually ;-), and if an incoming by-value double is used
  > frequently in a loop, all that's really needed is to make the
  > compiler smart enough to make an aligned copy of that argument...but
  > let's wait until we see real code that could benefit from that.
Right.  It may also be the case that we'll need to align stack slots
for pseudos that don't get hard regs.  But that can wait until we
determine it's important.


  > So, AFAICT, the doubles end up where they end up, either aligned
  > or not, and there's nothing we can do about it at that point.
Right.


  > >Before we can do *anything* about the alignment of args and autos we
  > >first need to get the stack pointer aligned at all times.  Let's deal
  > >with that first, then try to come up with solutions for the auto and
  > >argument alignment afterwards.
  > 
  > Uh-guh-reed!  :) 
Yeppers. :-)  

jeff

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-24 10:08   ` Dave Love
@ 1998-06-24 21:23     ` Jeffrey A Law
  0 siblings, 0 replies; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-24 21:23 UTC
  To: Dave Love; +Cc: Craig Burley, egcs

  In message <rzqd8byol2i.fsf@djlvig.dl.ac.uk> you write:
  > >>>>> "Craig" == Craig Burley <burley@gnu.org> writes:
  > 
  >  Craig> My current assumption is we are shooting for only 8-byte
  >  Craig> alignment of the stack frame to obtain 8-byte alignment of
  >  Craig> doubles within the frame.  I think crt0 (or whatever) already
  >  Craig> assures this, but don't know for sure about that or whether it
  >  Craig> further assures 16-byte or 32-byte alignment.
  > 
  > glibc2 does 8-byte alignment, Linux libc5 only does 4-byte (or did in
  > the last version I checked, but seemed easy to change); likewise
  > DJGPP.  I can't remember the story on Cygwin -- is that glibc2-based?
  > No info on x86 Solaris et al.
While these are important (since you have to know what alignment you
were initially given), the bigger issue is keeping the proper alignment
in the compiler itself.  That's the real work here :-0 

jeff

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-24  2:28       ` Jeffrey A Law
@ 1998-06-24 14:50         ` Craig Burley
  1998-06-25  0:25           ` Jeffrey A Law
  0 siblings, 1 reply; 23+ messages in thread
From: Craig Burley @ 1998-06-24 14:50 UTC
  To: law; +Cc: d.love, toon, egcs

>Basically the ABI just mandates a 4-byte alignment; we'd get better
>performance if we could get the args 8-byte aligned.  But I'll be
>a little surprised if we can do this without having the callee copy
>it out of the arglist to an aligned memory slot.

I guess we shouldn't try aligning outgoing/incoming doubles passed
"by value" (C style) at all, now that I think about it more.  It'd
surely break the ABI.

>My gut tells me aligning variables inside the arglist isn't going to
>win as much as the other cases noted above.

Especially not for Fortran, since g77 doesn't generally pass doubles
(or anything) by value, with some exceptions for the run-time
library.

And, I suspect the real performance penalties come mostly from arrays
and such inside loops anyway.  People don't pass arrays by value
(not usually ;-), and if an incoming by-value double is used
frequently in a loop, all that's really needed is to make the
compiler smart enough to make an aligned copy of that argument...but
let's wait until we see real code that could benefit from that.

>  > Again, what is the *real* problem with just doing what is currently
>  > done for that case, ending up with a misaligned double arg for
>  > the incoming procedure -- must it really assume its double is
>  > 64-bit aligned?  Or is this really just an internal problem with
>  > gcc's housekeeping?
>There's no problem other than the performance issues.  The code will
>still work.  Maybe that's where we're mis-communicating :-)

Oh, okay, good, indeed we were.  My priorities here are first to
make sure nothing that does work stops working; second to make
sure nothing reasonable suddenly goes lots slower; third to
make lots of stuff go faster.  It's the third priority we're discussing,
of course, but some of the solutions that have been proposed
(including, I thought, mine) might violate the first two.

But I now don't see how we can align doubles in an arglist while
both aligning the callee's incoming stack frame *and* meeting the
ABI requirements.  After all, any arglist consisting of arbitrary
float, double, and int (32-bit) must be laid out with no padding
between args and no padding between the last-pushed arg (the
first arg) and the return address `call' pushes, right?  The only
way to ensure that the incoming stack frame is aligned is to
optionally reserve a 4-byte pad before pushing any of the args,
as we've discussed.

So, AFAICT, the doubles end up where they end up, either aligned
or not, and there's nothing we can do about it at that point.
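A small C model of that reasoning, assuming the caller has already padded so that %esp is 8-byte aligned on entry to the callee (pointing at the 4-byte return address), and that args are packed with no padding, first arg at the lowest address. `arg_offset` and `arg_is_8_aligned` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offset of argument `i' from the callee's entry %esp, which points
   at the 4-byte return address pushed by `call'. */
size_t arg_offset(const size_t *sizes, size_t i)
{
    size_t off = 4;                 /* skip the return address */
    for (size_t k = 0; k < i; k++)
        off += sizes[k];            /* packed: no inter-arg padding */
    return off;
}

/* With an 8-byte-aligned entry %esp, a double is 8-byte aligned
   only if its offset is itself a multiple of 8. */
bool arg_is_8_aligned(const size_t *sizes, size_t i)
{
    return arg_offset(sizes, i) % 8 == 0;
}
```

For (int, double, int) the double sits at offset 8 and happens to be aligned; for (double, double) the doubles sit at offsets 4 and 12 and neither is, with nothing the caller can do about it without changing the layout.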

Of course, the callee can, as you point out, copy them to aligned
locations on its stack frame and use that, which is worthwhile
if it sees the potential for frequent references during the call.
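A byte-level C simulation of the callee-copy idea, assuming a simulated arglist buffer in which the incoming double sits at a 4-byte-aligned offset. `load_aligned_copy` and `sum_scaled` are hypothetical; gcc would do the equivalent in its generated code rather than through memcpy:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a double out of the (possibly only 4-byte-aligned) arglist into
   an 8-byte-aligned slot.  This is the single misaligned access. */
void load_aligned_copy(double *slot, const unsigned char *arglist,
                       size_t offset)
{
    assert(((uintptr_t)slot & 7) == 0);   /* caller supplies an aligned slot */
    memcpy(slot, arglist + offset, sizeof *slot);
}

/* Uses the incoming double many times; every use reads the aligned copy. */
double sum_scaled(const double *v, size_t n,
                  const unsigned char *arglist, size_t offset)
{
    _Alignas(8) double x;                 /* C11: force an 8-byte-aligned local */
    load_aligned_copy(&x, arglist, offset);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i] * x;
    return sum;
}
```

The copy costs one extra load and store per call, which is why it only pays off when the argument is referenced frequently, e.g. inside a loop.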

>Before we can do *anything* about the alignment of args and autos we
>first need to get the stack pointer aligned at all times.  Let's deal
>with that first, then try to come up with solutions for the auto and
>argument alignment afterwards.

Uh-guh-reed!  :)

        tq vm, (burley)

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-23 15:06 ` Craig Burley
  1998-06-23 22:55   ` Jeffrey A Law
@ 1998-06-24 10:08   ` Dave Love
  1998-06-24 21:23     ` Jeffrey A Law
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Love @ 1998-06-24 10:08 UTC
  To: Craig Burley; +Cc: egcs

>>>>> "Craig" == Craig Burley <burley@gnu.org> writes:

 Craig> My current assumption is we are shooting for only 8-byte
 Craig> alignment of the stack frame to obtain 8-byte alignment of
 Craig> doubles within the frame.  I think crt0 (or whatever) already
 Craig> assures this, but don't know for sure about that or whether it
 Craig> further assures 16-byte or 32-byte alignment.

glibc2 does 8-byte alignment, Linux libc5 only does 4-byte (or did in
the last version I checked, but seemed easy to change); likewise
DJGPP.  I can't remember the story on Cygwin -- is that glibc2-based?
No info on x86 Solaris et al.

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-23  5:13     ` Craig Burley
@ 1998-06-24  2:28       ` Jeffrey A Law
  1998-06-24 14:50         ` Craig Burley
  0 siblings, 1 reply; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-24  2:28 UTC
  To: Craig Burley; +Cc: d.love, egcs

  In message <199806230722.DAA02276@melange.gnu.org> you write:
  > I'm a little curious, though, how such an operation comes to pass.
The code to implement va_arg has address masking of this nature.  I
also believe it occurs internally to handle memory accesses smaller
than 32 bits on the Alpha.
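To illustrate the failure mode, here is a C sketch with hypothetical helpers: `round_up` is the kind of masking a va_arg implementation may perform, and `folded_round_up` models an optimizer that, trusting the stack pointer to be aligned, rewrites round_up(sp + offset, align) as sp + round_up(offset, align). This is an assumed simplification, not the exact transformation combine performs:

```c
#include <assert.h>
#include <stdint.h>

/* Round an address up to a power-of-two alignment, as a va_arg
   implementation might before fetching an 8-byte argument. */
uintptr_t round_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* The "optimized" form: equal to round_up(sp + offset, align) only if
   sp really is a multiple of `align'; otherwise it silently computes
   a different address. */
uintptr_t folded_round_up(uintptr_t sp, uintptr_t offset, uintptr_t align)
{
    return sp + round_up(offset, align);
}
```

With sp = 0x1000 both forms give 0x1010. With sp = 0x1004 (only 4-byte aligned), the real rounding gives 0x1010 but the folded form gives 0x1014, four bytes off.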

  > I came up with worse examples.  If STACK_BOUNDARY (or anything that
  > might break the ABI) is adjusted based on whether the processor
  > is [56]86 vs. [34]86, then code/libraries that happen to be compiled
  > on different variants of the x86 architecture could be magically
  > incompatible, producing subtly wrong results.
Yes.  Though I would strongly recommend against ABI things changing
based on the processor for this exact reason :-)

  > My question is, just why, *conceptually*, is it a problem on the
  > x86 architecture to try to align the argument list so the caller
  > frame is 64-bit aligned *and* at least some of the doubles in
  > the list are 64-bit aligned, but some aren't?
  > 
  > That is, is there a reason that x86 code *must* be generated either
  > to always assume doubles are 32-bit aligned *or* are always
  > 64-bit aligned?  I can't think of any.
It's not an architecture issue, but an ABI issue.  The architecture
should be able to handle just about any alignment we throw at it.

Basically the ABI just mandates a 4-byte alignment; we'd get better
performance if we could get the args 8-byte aligned.  But I'll be
a little surprised if we can do this without having the callee copy
it out of the arglist to an aligned memory slot.

There is an interesting problem from a gcc internals standpoint that
we may hit if we ever tried to align doubles relative to the start of
the argument list.  I ran into it on the v850 a couple years ago, but
I don't remember the details.  Basically the info we needed to do
this wasn't available and the scheme broke down when handling 
varargs/stdarg.


  > So can we at least come up with a short-term way to say "*try*
  > to align outgoing doubles to 64-bits, but don't assume incoming
  > doubles are 64-bit aligned", and in the long run make a better
  > overall architecture for representing alignments?
Let's defer trying to align arglists until we get the stack pointer
itself aligned and until after we're aligning auto variables relative
to the stack pointer and possibly stack slots for pseudos that don't
get hard registers aligned :-)

My gut tells me aligning variables inside the arglist isn't going to
win as much as the other cases noted above.

  > >You might think we could compensate for this by pushing an extra 
  > >dummy word before the first integer to ensure the double gets 
  > >aligned.  But that loses if we have:
  > >
  > >foo (int2, double, int1)
  > >
  > >If we pushed an extra 4 byte hunk before int1, then the total 
  > >size of the arglist would be 20 bytes -- not a multiple of 8.
  > >
  > >And as I'll explain below, we must always make sure to allocate
  > >in 8 byte lumps -- we can't depend on the callee to round the stack.
  > 
  > Again, what is the *real* problem with just doing what is currently
  > done for that case, ending up with a misaligned double arg for
  > the incoming procedure -- must it really assume its double is
  > 64-bit aligned?  Or is this really just an internal problem with
  > gcc's housekeeping?
There's no problem other than the performance issues.  The code will
still work.  Maybe that's where we're mis-communicating :-)

Before we can do *anything* about the alignment of args and autos we
first need to get the stack pointer aligned at all times.  Let's deal
with that first, then try to come up with solutions for the auto and
argument alignment afterwards.


  > >Instead we must make sure that we always allocate stacks in 8 byte
  > >hunks in the prologue *and* that we push an extra dummy word on the stack
  > >when performing function calls where the arg list + return pointer
  > >are not a multiple of 8 bytes in size.
  > >
  > >[ Remember, the call itself will push a 4 byte word on the stack
  > >  too, so we have to account for it too. ]
  > 
  > Right.  Okay.
OK.  We agree on this.  And since any work which involves trying to
align autos depends on first getting the stack aligned let's solve the
alignment of the stack pointer problem first.  That work can happen
while we debate the other issues :-)
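The padding rule is easy to write down. A C sketch, assuming the caller's %esp is already a multiple of 8 before the call sequence starts; `outgoing_pad` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of dummy space the caller must reserve before pushing the
   arguments so that args + the 4-byte return address pushed by `call'
   occupy a multiple of `align' bytes. */
size_t outgoing_pad(size_t args_bytes, size_t align)
{
    size_t total = args_bytes + 4;          /* include return address */
    return (align - total % align) % align;
}
```

For 16 bytes of arguments the pad is 4 (16 + 4 = 20, rounded to 24); for 12 bytes it is 0, since 12 + 4 is already a multiple of 8.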


  > Also, presumably we don't actually have to *push* an arg, but just
  > subtract 4 from %esp, right?
Right.  We don't actually have to make the memory reference, just allocate
an extra outgoing arg slot before we push any of the real outgoing
arguments.

  > I am quite willing to do this work myself.  But I say that well-
  > knowing I'm not the best person for the job; just someone sufficiently
  > enthused, with a spot of time, a Pentium II, a trackball, and
  > half the g77 user base hounding me for the past couple of years,
  > etc. etc. etc.  So I'd need some initial hand-holding, probably.  :)
Well, I don't have the time to tackle it myself, but I can try to help
you (or anyone else) through the twisty maze of ABI related code in
gcc to try and make this happen.

jeff

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-23 10:23 ix86 `double' " John Wehle
  1998-06-23 14:56 ` Craig Burley
@ 1998-06-23 22:55 ` Jeffrey A Law
  1 sibling, 0 replies; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-23 22:55 UTC
  To: John Wehle; +Cc: d.love, egcs, burley

  In message <199806231723.NAA12151@jwlab.FEITH.COM> you write:
  > > Instead we must make sure that we always allocate stacks in 8 byte
  > > hunks in the prologue *and* that we push an extra dummy word on the stack
  > > when performing function calls where the arg list + return pointer
  > > are not a multiple of 8 bytes in size.
  > 
  > I'm seeing a lot of references to double and aligning on 8 byte boundaries.
  > As long as we are looking at this we may want to view it as a more general
  > problem.  I.e. Intel recommends that
  > 
  >   doubles be aligned on 8 byte boundaries
  > 
  >   long doubles be aligned on 16 byte boundaries
  > 
  >   objects >= 32 bytes in size be aligned on 32 byte boundaries
  > 
  > I believe that they all have similar issues with regards to encouraging
  > gcc to align them for optimal performance on the stack so it's possible
  > that they may all be "solved" with the same solution in the back end.
  > 
  > Just something to think about. :-)
Well, I suspect to make this work we are going to have to make some
changes to the generic code (as well as the x86 target files).  So
presumably we'd be able to re-use some of the generic stuff again for
any other targets where it would be useful.

jeff

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-23 15:06 ` Craig Burley
@ 1998-06-23 22:55   ` Jeffrey A Law
  1998-06-24 10:08   ` Dave Love
  1 sibling, 0 replies; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-23 22:55 UTC
  To: Craig Burley; +Cc: john, d.love, egcs, davem

  In message <199806232206.SAA00584@melange.gnu.org> you write:
  > >Though it's not as fine-grained as what's mentioned above it may be
  > >worthwhile to consider using the DATA_ALIGNMENT macro when laying
  > >out variables for the stack.  On the i386 it currently returns the
  > >Intel recommended alignment for doubles, long doubles, arrays, etc.
  > >(the recommended alignment for long doubles is different from doubles).
  > 
  > I realized later that we can't really do alignment of stack variables
  > on more than 8-byte boundaries unless we're willing to align the
  > stack frames themselves to those larger boundaries.
Yup.  Good point.


  > In a program
  > that doesn't itself use long double or arrays, would this be a
  > win or lose anyway, in terms of performance?  On the one hand, almost
  > always subtracting 4-28 bytes from %sp before pushing arguments onto
  > the stack before a call seems a waste.  On the other hand, maybe the
  > new stack frame itself meets Intel's definition of an object
  > greater than 32 bytes long (at least in most cases, I'd guess).
It's an interesting question to think about.  HP recommends a 64-byte
alignment for the stack on PAs.  It has some *really* nice benefits
as far as the dcache is concerned.  And until about a year ago we
actually followed that guideline -- by setting STACK_BOUNDARY appropriately :-)

That's how I know about the problems that combine will cause if you
end up with a mis-aligned stack pointer relative to STACK_BOUNDARY.
It turned out the crt0 code on hpux10 only provided 8 byte alignment
for the stack pointer.  Oops.

Once in a great while folks would complain about the stack wastage,
but it was rather rare -- even the embedded folks working on the PA
didn't complain (which in retrospect I find amazing).

  > My current assumption is we are shooting for only 8-byte alignment
  > of the stack frame to obtain 8-byte alignment of doubles within
  > the frame.
Right.  The nice thing is whatever mechanism we come up with could
later be used to increase the stack pointer alignment if we deemed
it useful.

  > (I'm guessing DATA_ALIGNMENT is already used for static and automatic
  > stuff, but somebody should verify this.)
Correct.  We changed the definition of DATA_ALIGNMENT to apply to 
basically anything in the static store -- including constants which
turned out to be a big win for some spec codes on the x86.

jeff

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-23  3:32 ix86 double " John Wehle
@ 1998-06-23 15:06 ` Craig Burley
  1998-06-23 22:55   ` Jeffrey A Law
  1998-06-24 10:08   ` Dave Love
  0 siblings, 2 replies; 23+ messages in thread
From: Craig Burley @ 1998-06-23 15:06 UTC
  To: john; +Cc: law, d.love, egcs, davem

>Though it's not as fine-grained as what's mentioned above it may be
>worthwhile to consider using the DATA_ALIGNMENT macro when laying
>out variables for the stack.  On the i386 it currently returns the
>Intel recommended alignment for doubles, long doubles, arrays, etc.
>(the recommended alignment for long doubles is different from doubles).

I realized later that we can't really do alignment of stack variables
on more than 8-byte boundaries unless we're willing to align the
stack frames themselves to those larger boundaries.  In a program
that doesn't itself use long double or arrays, would this be a
win or lose anyway, in terms of performance?  On the one hand, almost
always subtracting 4-28 bytes from %sp before pushing arguments onto
the stack before a call seems a waste.  On the other hand, maybe the
new stack frame itself meets Intel's definition of an object
greater than 32 bytes long (at least in most cases, I'd guess).

My current assumption is we are shooting for only 8-byte alignment
of the stack frame to obtain 8-byte alignment of doubles within
the frame.  I think crt0 (or whatever) already assures this, but
don't know for sure about that or whether it further assures
16-byte or 32-byte alignment.  But without more than 8-byte
alignment of frames, we won't get Intel's recommended alignment
for long double or large objects when they're on the stack.
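A C sketch of the two points above, with hypothetical helper names: a stack variable can be aligned no more strictly than the frame base is, and realigning %esp down to a 32-byte boundary costs between 0 and 28 bytes when %esp is only known to be a multiple of 4:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alignment a stack variable actually gets: what it asks for, capped
   by the alignment guaranteed for the frame base. */
size_t effective_stack_alignment(size_t wanted, size_t frame_align)
{
    return wanted < frame_align ? wanted : frame_align;
}

/* Bytes to subtract from a stack pointer to reach the next lower
   multiple of `align' (a power of two). */
uintptr_t sp_realign_cost(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & (align - 1);
}
```

With 8-byte frames, a long double that wants 16 and a large object that wants 32 both end up with only 8-byte alignment.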

(I'm guessing DATA_ALIGNMENT is already used for static and automatic
stuff, but somebody should verify this.)

        tq vm, (burley)

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-23 10:23 ix86 `double' " John Wehle
@ 1998-06-23 14:56 ` Craig Burley
  1998-06-23 22:55 ` Jeffrey A Law
  1 sibling, 0 replies; 23+ messages in thread
From: Craig Burley @ 1998-06-23 14:56 UTC
  To: john; +Cc: law, d.love, toon, egcs

>  doubles be aligned on 8 byte boundaries
>
>  long doubles be aligned on 16 byte boundaries
>
>  objects >= 32 bytes in size be aligned on 32 byte boundaries
>
>I believe that they all have similar issues with regards to encouraging
>gcc to align them for optimal performance on the stack so it's possible
>that they may all be "solved" with the same solution in the back end.
>
>Just something to think about. :-)

I'm all for that.  g77 doesn't (yet) support long doubles (these
are 80-bit, IIRC), but certainly both g77 and its run-time
library make use of larger objects (EQUIVALENCE aggregates on the
stack, plus plenty of internally generated structures to pass
between the generated code and the library).  Though I'd be
interested in seeing an example of an app that is significantly
speeded up by aligning long doubles and aggregates according to
Intel's spec, over and above what is achieved if we simply align
non-ABI doubles according to that spec -- it'd surprise me if
such an app existed and was written in Fortran.  (I'd guess it'd
have to have tight inner loops using local EQUIVALENCE vars,
because I think that anytime internal stack-based aggregates are used,
the run-time library is involved, and that usually means I/O
and other operations that are inherently slow, i.e. not tight and
probably not helped noticeably by improving alignment.)

So IMO it'd indeed be nice to keep the additional alignment advice
in mind when we improve the `double' alignment situation, but
not necessary to actually implement it.  The `double' alignment
stuff really means big speedups for some codes.

        tq vm, (burley)

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
@ 1998-06-23 10:23 John Wehle
  1998-06-23 14:56 ` Craig Burley
  1998-06-23 22:55 ` Jeffrey A Law
  0 siblings, 2 replies; 23+ messages in thread
From: John Wehle @ 1998-06-23 10:23 UTC
  To: law; +Cc: d.love, egcs, burley

> Instead we must make sure that we always allocate stacks in 8 byte
> hunks in the prologue *and* that we push an extra dummy word on the stack
> when performing function calls where the arg list + return pointer
> are not a multiple of 8 bytes in size.

I'm seeing a lot of references to double and aligning on 8 byte boundaries.
As long as we are looking at this we may want to view it as a more general
problem.  I.e. Intel recommends that

  doubles be aligned on 8 byte boundaries

  long doubles be aligned on 16 byte boundaries

  objects >= 32 bytes in size be aligned on 32 byte boundaries

I believe that they all have similar issues with regards to encouraging
gcc to align them for optimal performance on the stack so it's possible
that they may all be "solved" with the same solution in the back end.

Just something to think about. :-)
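One way to express those three recommendations as a single rule is the C sketch below. The enum and `recommended_alignment` are hypothetical, and the function only encodes the three cases listed above; everything else keeps its natural alignment:

```c
#include <assert.h>
#include <stddef.h>

enum obj_kind { OBJ_OTHER, OBJ_DOUBLE, OBJ_LONG_DOUBLE };

/* Recommended alignment in bytes, given the object's kind, size, and
   natural (ABI) alignment.  Never returns less than `natural'. */
size_t recommended_alignment(enum obj_kind kind, size_t size,
                             size_t natural)
{
    size_t align = natural;
    if (kind == OBJ_DOUBLE && align < 8)
        align = 8;
    if (kind == OBJ_LONG_DOUBLE && align < 16)
        align = 16;
    if (size >= 32 && align < 32)
        align = 32;
    return align;
}
```

For example, a 12-byte long double with 4-byte natural alignment gets 16, and a 40-byte array gets 32.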

-- John
-------------------------------------------------------------------------
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: john@feith.com  |
|    John Wehle    |     Fax: 1-215-540-5495  |                         |
-------------------------------------------------------------------------

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-23  3:32   ` David S. Miller
@ 1998-06-23  6:30     ` Craig Burley
  0 siblings, 0 replies; 23+ messages in thread
From: Craig Burley @ 1998-06-23  6:30 UTC
  To: davem; +Cc: law, d.love, toon, egcs

>   Date: Mon, 22 Jun 1998 14:29:42 -0400 (EDT)
>   From: Craig Burley <burley@gnu.org>
>
>   A case we can't 64-bit align is:
>
>	   real r(2)
>	   double precision d1, d2
>	   equivalence (r(1),d1)
>	   equivalence (r(2),d2)
>
>   Regardless of whether this is stack, static, or even part of a common
>   block, we can't 64-bit align both d1 and d2.  (Well, not without
>   an option to completely change the way we implement Fortran; I wonder
>   if Sun does that to support weird-but-conforming code on SPARCs,
>   such as the above.)
>
>I can give some insight to this on certain cases on the Sparc.
[...]
>This would need some investigation before such a scheme is enabled in
>egcs, so I'd say defer thinking about it until after the 1.1 release
>happens.
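The EQUIVALENCE example above can be checked with a little arithmetic. This C sketch (hypothetical helper) models d1 at byte offset `base` and d2 starting one 4-byte REAL later, as the two EQUIVALENCE statements force:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* d1 overlays r(1) and d2 overlays r(2), so d2 starts 4 bytes
   (one REAL) after d1.  Can both be 8-byte aligned for this base? */
bool both_doubles_aligned(size_t base)
{
    size_t d1 = base, d2 = base + 4;   /* sizeof(REAL) == 4 */
    return d1 % 8 == 0 && d2 % 8 == 0;
}
```

Whatever the base offset, the two doubles are 4 bytes apart, so at most one of them can be 8-byte aligned.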

Yes, let me make this clear.  I brought up SPARC only to illustrate
my awareness that this issue is a general one for all gcc targets
in theory at least and in practice for at least two targets (x86
and SPARC), probably others as well (`sh', whatever that is, already
seems to have its own work done in this area...can someone send me
a `sh' machine please? ;-).

I am *not* recommending any of the changes we're discussing actually
applying to code generation on SPARCs at all, at least not in the
short term (1.1).

In particular, while this align-doubles-properly-or-performance-will-
really-really-suck issue has *frequently* come to my attention for
the x86 architecture, basically *never* can I recall anyone complaining
about not being able to compile real Fortran code using standard-
conforming (but, as Dave Love rightly points out, nonportable due
to machines like SPARC) constructs like the one I illustrated above.

My impression: SPARC users, even of g77, have already been "trained"
by Sun to not expect some weird force-bad-alignment code to work
on that architecture.  So they don't complain to us, either.

(And, yes, there is a way to "fix" this problem: provide a command-line
option that the compiler handles by making all ints and floats 64-bit,
all doubles 128-bit, and adjusting the run-time libraries and so on
accordingly.  Not that the *precision* goes up, just that the Fortran
INTEGER, LOGICAL, and REAL types suddenly have 32 useful bits and 32
junk bits in 64 bits of space, DOUBLE PRECISION and COMPLEX have 64+64
bits, and so on.  With only a relatively small amount of hair, this
makes full Fortran standards conformance possible, and makes most
data needs expand by a factor of two with no increase in precision or
range and a general decrease in run-time performance.  A great way to
discourage people from insisting that their old codes run without
modification.  Can't recall what I've been told, if anything, about
whether Sun provides such an option in any of its Fortran compilers.
Perhaps just the *threat* of "solving" this problem this way is all
that has ever been needed, like the cold-war MAD policy regarding use
of nuclear armaments.  But I can't recall anybody asking for this option
in g77, though I might be so quick to say "fine, send us lots of money"
that I forgot all about any such requests.  :)

So, the tempting thing some people have thought is "well, SPARC
users have lived with a `broken' Fortran implementation for years,
being happier with the performance gains; why not foist this on
Intel users?"  Unfortunately, the existing user base, library
code, and iron base make this worth solving only for newer
instances of those, not worth breaking for the older ones.

The upshot: when [56]86 users aren't using "dangerous" options like
`-malign-double', we still need to do 64-bit alignment wherever
possible but without breaking compatibility to get decent
performance.

BTW, as much as I want this problem "cured" for [56]86 users in egcs
1.1, I *really* hope the libm performance problems on Alpha are
generally fixed by the time such a version of egcs is released,
because it disturbs me a bit that we might make [56]86 appear
even better price/performance-wise relative to Alphas than it has in
the past, especially since we *know* we can speed up the Alphas
more than the [56]86 just by fixing the software.  (And,
intentionally, I've kept the Alpha architecture manual within
reach, but not the ix86 one, for over a year now.  :)

        tq vm, (burley)

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-23  3:32   ` Jeffrey A Law
@ 1998-06-23  5:13     ` Craig Burley
  0 siblings, 0 replies; 23+ messages in thread
From: Craig Burley @ 1998-06-23  5:13 UTC (permalink / raw)
  To: law; +Cc: davem, d.love, egcs

[Need to study your later email with more technical issues more
carefully next, but here are some quick points.]

>  In message < 199806221829.OAA07477@melange.gnu.org >you write:
>  > Well, I'm willing to not try to do any special aligning for
>  > EQUIVALENCE and COMMON for now.  If we can just get 64-bit
>  > alignment for stack-allocated VAR_DECLs -- which generally
>  > won't include EQUIVALENCE (and certainly not COMMON) -- we'll
>  > have made a *huge* improvement in g77 performance, especially
>  > its *repeatability* of performance measurements.
>Yup.  But considering the release schedule, I'd be happy if we could
>just get the stack aligned properly without breaking the ABI, then
>iterate to getting automatic variables aligned relative to the stack.

I'd say getting the stack aligned properly without breaking the ABI
and *also* getting VAR_DECLs that are type `double' aligned within
those frames (whether arrays or scalars) is not only the most
important combination, but solves ~95% of the problems the g77
user community sees.

At least getting this to work when -malign-double is specified
would be kind of a "minimum" for making egcs 1.1 not noticeably
worse than g77 0.5.21 and 0.5.22 were.

>If we can get more done before the release, then great, but I wouldn't
>want to hold things up on this issue if we can avoid it.

I, hesitantly, agree.  Dave rightly points out that he and others
have been yelling about this for, well, it seems like years now.
I still have email from him (weeks ago) asking me to make a more
noticeable push for this on the egcs list, and am now sorry I didn't
take his advice sooner.  I can only plead insanity; I guess it's
all those soccer balls I hit with my head playing for Scotland that
made my brain mushy (just kidding, that's another Craig Burley ;-).

>  > (Without this improvement, egcs 1.1 will often appear *substantially*
>  > worse than the combination of g77 0.5.22 and gcc 2.7.2.3 on lots of
>  > widely used Fortran code, assuming users are using -malign-double.)
>Well, we still have -malign-double as an option for the x86 port, so
>if they use it they presumably would see comparable performance, right?

No.  I haven't yet finished up my totally-over-engineered diagnostic
program to expose all this, but the preliminary results are:

g77 0.5.21, 0.5.22:

  -  Without -malign-double, basically no doubles get aligned properly,
     with the exception of doubles that can be lazily aligned in COMMON
     (without conflicts; and I'm not quite sure why this is, maybe my
     program isn't working quite right, it could easily be an accident
     that these appear to be aligned, the sort of accident the new
     version of my program should make much less likely).

  -  With -malign-double, stack-based doubles still not aligned
     properly, but static/COMMON ones are (even if it breaks the
     COMMON ABI).

  -  With -malign-double -O, all doubles are aligned properly.  This is
     surprising; I didn't realize one needed -O to get this.  (And, yes,
     this breaks the COMMON ABI, etc.)

g77 0.5.23, egcs 1.0.3:

  -  Without -malign-double, no doubles get aligned properly.

  -  With -malign-double, static and automatic doubles get aligned,
     but not other stack doubles.  (Automatics are stack-based with
     dynamic size.)  -O makes no difference.

egcs 19980615:

  -  Without -malign-double, only static doubles (and non-conflicting
     COMMON doubles) get aligned properly, the rest don't.

  -  With -malign-double, same as g77 0.5.23 and egcs 1.0.3.

So, the one huge improvement we should try to make for 1.1 is, IMO,
to achieve this:

  -  Without -malign-double, static, stack, and automatic doubles
     get aligned properly, but not if they're in EQUIVALENCE or
     COMMON blocks.  (Basically, any VAR_DECL the back end sees.)

  -  With -malign-double, same as g77 0.5.21 and 0.5.22 when -O is
     specified, except -O wouldn't be needed here.

The ideal situation would be to align all doubles that aren't
involved in ABI issues, but I think just doing the non-aggregate
ones handles ~95% of the important performance cases (as I've said,
and that's a *real* seat-of-the-pants guess).

Then, the only reason to use -malign-double is when the user knows
ABI issues are consistent across all pertinent modules (e.g. they're
all compiled with -malign-double) and the last ounce of performance
is needed.

>Actually, I'd expect generally better performance because we do handle
>alignments for static store items in a reasonable manner, which is a
>significant improvement by itself.

Right, which makes egcs *default* to better performance than the *default*
for 0.5.22, but to *worse* performance than 0.5.22 with -malign-double,
*even if -malign-double is used for egcs*.

The reason is that neither egcs nor gcc 2.8 will align stack-based doubles
(except automatics, i.e. dynamically-sized stack doubles).

So that's the main performance "regression" we currently have, which
means that at least getting -malign-double to align all doubles,
including stack-based ones, would seem to be worth making a
"required" item for egcs 1.1.

But I still would prefer it if we wouldn't effectively persuade most
users to risk using -malign-double just to get stack-based doubles
aligned for reasonable performance, and the way to do that is to make
alignment of stack-based (and automatic) doubles the default, again,
as long as the ABI isn't broken.

So not needing to use -malign-double to align stack-based doubles would
be a *huge* win, making egcs 1.1 *obviously* better than g77 0.5.21,
0.5.22, or 0.5.23 for most g77 users on x86.  That's because lots of
g77 users probably benchmark without reading up on (or feeling safe about
using) -malign-double.

>  > Note that I suggested the gcc architecture (machine descriptions,
>  > etc.) be modified to include a more fine-grained expression of
>  > alignment requirements.  E.g. distinguishing hardware requirements
>  > (even instruction requirements, such as `ld' vs. `ldd' on SPARCv8)
>  > from ABI requirements from ideal performance settings.  But this
>  > suggestion was turned down at the time -- some seven years ago!
>Sigh.  Yeah, it really seems like something we should have -- then
>again, there's been little gcc emphasis on the x86 in the past and
>it's the most likely beneficiary of such stuff.

When I asked for it, I thought it would have helped with the SPARC
system I was working on at the time.  Though, I might have been
wrong.  The x86 surely seems to have the most variety of alignment
flavors I've ever seen for any given type: e.g. `double' alone has
at least three alignments, 1 byte (minimum hardware alignment),
4 bytes (ABI alignment), and 8 bytes (ideal performance alignment)!

If you really want to see how sick I am about this "let's architect
the thing right so programmers have distinct things to specify,
instead of one-size-fits-all straitjackets", take a look at the
(now-ancient) g77 internals in this area.  E.g. egcs/gcc/f/target.h.

You'll find it not only tracks the alignment for each type, but
the "modulo".  That is, g77 can be taught (if given a suitable
back end ;-) that a given type is to be aligned such that it begins
on byte M of an N-byte-aligned block.  gcc and other tools
are architected to always align on byte 0 of an N-byte-aligned
block, but before I had worked on g77, I had some awareness of
the possibility of, e.g., a 10-byte type whose *last* 8 bytes
had to be 8-byte aligned, so the object as a whole would have
to be aligned to byte 6 of an 8-byte block.
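A minimal C sketch of the "modulo" idea, for readers who haven't looked at target.h (the function name `pad_to_modulo` is hypothetical and is not g77's actual code): it computes how much padding moves an address onto byte M of an N-byte-aligned block, with ordinary alignment being the special case M == 0.

```c
#include <assert.h>
#include <stddef.h>

/* Padding needed so that ADDR + pad falls on byte M of an N-byte-aligned
   block, i.e. (ADDR + pad) % N == M.  Ordinary alignment is M == 0.  */
static size_t
pad_to_modulo (size_t addr, size_t m, size_t n)
{
  assert (n > 0 && m < n);
  return (m + n - addr % n) % n;
}
```

For the 10-byte example, `pad_to_modulo (addr, 6, 8)` gives the padding that puts the trailing 8 bytes on an 8-byte boundary.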

Of course, this proved useful when cutting the code to handle
oddly-alignable aggregates, though never yet for fundamental
types.  Specifically, aligning an EQUIVALENCE block such that
REAL R must be immediately followed by DOUBLE PRECISION D fits
neatly into this scheme.  E.g. "EQUIVALENCE (R,S(1)), (D,S(2))"
is something g77 can and does handle by intuiting that the
entire EQUIVALENCE block must be aligned to byte 4 of an 8-byte-
aligned block (assuming compiling for a SPARC, or with -malign-double
on an x86, or any machine with 8-byte-aligned doubles and 4-byte
floats).  That means g77 can also handle "COMMON C, D",
where C is CHARACTER*1 and D is DOUBLE PRECISION, though it warns
that it has to insert pre-padding (of 7 bytes, usually) between
the linker's idea of where the common block starts and where
it actually starts (where C starts).  (The warning is in case
the same common area is declared with a different type layout,
resulting in a different pre-padding being used, e.g. none.)
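The "intuiting" step can be sketched the same way, assuming a 1-byte CHARACTER*1, a 4-byte REAL, and an 8-byte ideal alignment for DOUBLE PRECISION (the helper `block_modulo` is hypothetical): a member at byte OFFSET that wants ALIGN-byte alignment forces the whole area to start on byte (-OFFSET mod ALIGN) of an ALIGN-byte block.

```c
#include <assert.h>
#include <stddef.h>

/* A member at byte OFFSET of an EQUIVALENCE or COMMON area that should
   be ALIGN-byte aligned forces the whole area to start on byte
   (-OFFSET mod ALIGN) of an ALIGN-byte-aligned block.  */
static size_t
block_modulo (size_t offset, size_t align)
{
  assert (align > 0);
  return (align - offset % align) % align;
}
```

For "COMMON C, D", D sits at offset 1, so `block_modulo (1, 8)` is 7: the 7 bytes of pre-padding needed when the linker starts the block on an 8-byte boundary.  For the EQUIVALENCE example, D sits at offset 4, giving byte 4.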

And, by "sick" I mean I designed it this way back around 1988,
before seeing barely a shred of GNU code!  My usual philosophy
is: if I can't think of *one* clear-cut meaning for a value,
that usually means I shouldn't be using just *one* value (or
constant or macro or whatever).  gcc's DECL_ALIGN field, and
the relevant muck, violates that principle in spades, of course.

        tq vm, (burley)

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-23  3:32   ` Jeffrey A Law
@ 1998-06-23  5:13     ` Craig Burley
  1998-06-24  2:28       ` Jeffrey A Law
  0 siblings, 1 reply; 23+ messages in thread
From: Craig Burley @ 1998-06-23  5:13 UTC (permalink / raw)
  To: law; +Cc: d.love, egcs

>  > >	  If the stack gets mis-aligned relative to STACK_BOUNDARY
>  > >	  combine could end up removing a seemingly useless
>  > >	  stack operation/address calculation.
>  > 
>  > I don't understand this, but presumably I need to look into it
>  > further.
>I explained it a little in a message to Toon.  Basically combine knows
>how to remove a redundant "and" operation which just turns off some
>low order bits in an address.  If the stack isn't aligned to
>STACK_BOUNDARY, then combine could end up removing a mask operation
>that wasn't redundant.

I'm a little curious, though, how such an operation comes to pass.
Is it only likely because the user code does something like
"&foo & 7", or are there internally-generated reasons?  It's okay
if you can't think of any examples; I agree with the overall
sentiment that we don't want to lie to the compiler in this area,
even if we can't come up with a reason it'd bite us right away!

>  > Okay, that makes sense to me.  We want to hit a majority of cases
>  > anyway.  We don't care (for now) about cases where users are
>  > combining multiple languages in weird ways, for example.
>Well, we care about it from a correctness standpoint.  Things still
>have to work if they're combining .o files from old compilers,
>callbacks from library routines like qsort, etc.  But we aren't really
>worried about performance in those cases.

I came up with worse examples.  If STACK_BOUNDARY (or anything that
might break the ABI) is adjusted based on whether the processor
is [56]86 vs. [34]86, then code/libraries that happen to be compiled
on different variants of the x86 architecture could be magically
incompatible, producing subtly wrong results.

The other bad one is if we made only the g77 compiler "break" the
ABI to get this performance the "easy way", and some poor user then
believed the g77 docs about f2c compatibility and tried to link
f2c-and-gcc-compiled code with g77-compiled code.  Even on the
same machine with the same version of gcc/g77 (egcs 1.1, if we
went down this rat-hole :), the result would be a subtly broken
executable, because the g77-compiled code would lay out its
COMMON areas differently than the f2c-and-gcc-compiled code!

>Sure.  Think about cases where the alignment of the double in the
>arglist isn't naturally aligned (think C, pass by value :-).
>
>foo (double, int)
>
>we push args back to front.
>
>So we push the int on the stack, which means the double will be
>at only a 4 byte aligned stack address if we assume our stack
>was 8 byte aligned before we pushed the args.

Note that, in practice, what g77 most often does is:

foo (double *, int *)

But, aside from that...

My question is, just why, *conceptually*, is it a problem on the
x86 architecture to try to align the argument list so the caller
frame is 64-bit aligned *and* at least some of the doubles in
the list are 64-bit aligned, but some aren't?

That is, is there a reason that x86 code *must* be generated either
to always assume doubles are 32-bit aligned *or* are always
64-bit aligned?  I can't think of any.

If that's the case, then IMO this whole problem is indeed, as I
thought, the result of gcc just not having a flexible-enough
architecture, that is, its "housekeeping staff" can't cope with
meeting these meetable requirements.

So can we at least come up with a short-term way to say "*try*
to align outgoing doubles to 64-bits, but don't assume incoming
doubles are 64-bit aligned", and in the long run make a better
overall architecture for representing alignments?  (Do I need
to write a "white paper" on what I mean by all this -- would
that help anyone understand what I'm talking about?  I've thought
it through quite a bit lately, so I could probably bang it out
with a few days' work.)

>You might think we could compensate for this by pushing an extra 
>dummy word before the first integer to ensure the double gets 
>aligned.  But that loses if we have:
>
>foo (int2, double, int1)
>
>If we pushed an extra 4 byte hunk before int1, then the total 
>size of the arglist would be 20 bytes -- not a multiple of 8.
>
>And as I'll explain below, we must always make sure to allocate
>in 8 byte lumps -- we can't depend on the callee to round the stack.

Again, what is the *real* problem with just doing what is currently
done for that case, ending up with a misaligned double arg for
the incoming procedure -- must it really assume its double is
64-bit aligned?  Or is this really just an internal problem with
gcc's housekeeping?

(Guess I should start reading my 486 handbook again!  :)

>  > Is it reasonable
>  > to just subtract an extra 8 bytes when creating the frame
>  > pointer upon procedure entry and then AND it with ~7 to align
>  > it?  Or would that make for problems with debugger, profiling,
>  > and/or exception support, or is there no quick way to AND the
>  > frame pointer on the x86?
>Nope.  Because you then don't have a constant offset to get to the
>arguments that were passed to the function.    To make this work
>you'd have to dedicate a hard register to serve as an argument
>pointer, which will be horrible.
>
>[ Think about it, how can you generate code to find an argument if
>  at entry to the procedure you may adjust the stack by a varying
>  value (0 or 4). ]

Duh, okay, that's right.  That's my limited SPARC/VLIW thinking
tripping me up again.  Even though they have the same problem, I
don't think about it, because so often all the incoming arguments
arrive in registers.  (I have spent most of the last 13 years or
so working on machines with plenty of registers; I'd forgotten my
earlier experiences working on machines with hardly any, sorry.  :)

>Instead we must make sure that we always allocate stacks in 8 byte
>hunks in the prologue *and* that we push an extra dummy word on the stack
>when performing function calls where the arg list + return pointer
>are not a multiple of 8 bytes in size.
>
>[ Remember, the call itself will push a 4 byte word on the stack
>  too, so we have to account for it too. ]

Right.  Okay.

>  > It seems like everyone else thinks the right way to do this is
>  > to try to always assure %sp is 64-bit aligned across calls by
>  > modifying all the code that is in the procedure-call chain.
>  > That probably means an extra dummy push before odd-number-of-args
>  > calls, etc., right?
>Close.  It's not the number of args, but the total size of the arg
>list.  If the size of the arg list is a multiple of 8 bytes, then
>we have to push a dummy arg. so that the stack is 8 byte aligned
>when we enter the callee.

Well, if we can arrange for internal gcc housekeeping to do this
by default, *without* having other gcc housekeeping assume that
incoming double arguments, or the stack frame itself, are aligned,
is that basically enough to cover what I've been asking for?

(Note that, ideally, -malign-double would not be needed to do the
above.  I wouldn't mind a new option to disable that new behavior,
but IMO it should be enabled by default.)

Also, presumably we don't actually have to *push* an arg, but just
subtract 4 from %esp, right?

I am quite willing to do this work myself.  But I say that well-
knowing I'm not the best person for the job; just someone sufficiently
enthused, with a spot of time, a Pentium II, a trackball, and
half the g77 user base hounding me for the past couple of years,
etc. etc. etc.  So I'd need some initial hand-holding, probably.  :)

        tq vm, (burley)

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
@ 1998-06-23  3:32 John Wehle
  1998-06-23 15:06 ` Craig Burley
  0 siblings, 1 reply; 23+ messages in thread
From: John Wehle @ 1998-06-23  3:32 UTC (permalink / raw)
  To: burley; +Cc: law, d.love, egcs, davem

> [...]
> The latter uses automatic arrays (which gcc and g77 support); it'd
> be great to get those 64-bit aligned as well.  The former is the
> most important thing we *aren't* aligning, currently, even with
> `-malign-double'.  (It should be aligned especially if `a' is an
> array, of course.)
> [...]
> Note that I suggested the gcc architecture (machine descriptions,
> etc.) be modified to include a more fine-grained expression of
> alignment requirements.  E.g. distinguishing hardware requirements
> (even instruction requirements, such as `ld' vs. `ldd' on SPARCv8)
> from ABI requirements from ideal performance settings.  But this
> suggestion was turned down at the time -- some seven years ago!

Though it's not as fine-grained as what's mentioned above, it may be
worthwhile to consider using the DATA_ALIGNMENT macro when laying
out variables for the stack.  On the i386 it currently returns the
Intel recommended alignment for doubles, long doubles, arrays, etc.
(the recommended alignment for long doubles is different from doubles).
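A rough sketch of that idea, written in plain C rather than in gcc's internals.  The enum, `preferred_alignment` (a hypothetical stand-in for what a DATA_ALIGNMENT-style hook would return) and `alloc_slot` are all invented for illustration.  Only doubles get bumped, and offsets are counted upward from a frame base that is assumed to be 8-byte aligned already, which is exactly what the stack-alignment work has to guarantee.

```c
#include <assert.h>
#include <stddef.h>

enum var_kind { VAR_INT, VAR_FLOAT, VAR_DOUBLE };

/* Hypothetical stand-in for a DATA_ALIGNMENT-style hook: never lower the
   ABI ("basic") alignment, but raise doubles to 8 bytes.  */
static size_t
preferred_alignment (enum var_kind kind, size_t basic_align)
{
  if (kind == VAR_DOUBLE && basic_align < 8)
    return 8;
  return basic_align;
}

/* Give a variable the next suitably aligned offset from a frame base
   assumed to be 8-byte aligned; *NEXT is the frame size so far.  */
static size_t
alloc_slot (size_t *next, enum var_kind kind, size_t size, size_t basic_align)
{
  size_t align = preferred_alignment (kind, basic_align);
  assert (align > 0);
  size_t off = (*next + align - 1) / align * align;
  *next = off + size;
  return off;
}
```

An int placed first lands at offset 0, and a following double with a 4-byte ABI alignment is pushed out to offset 8 rather than 4.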

-- John
-------------------------------------------------------------------------
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: john@feith.com  |
|    John Wehle    |     Fax: 1-215-540-5495  |                         |
-------------------------------------------------------------------------


* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-22 18:20 ` ix86 double alignment (was Re: egcs-1.1 release schedule) Craig Burley
@ 1998-06-23  3:32   ` David S. Miller
  1998-06-23  6:30     ` Craig Burley
  1998-06-23  3:32   ` Jeffrey A Law
  1 sibling, 1 reply; 23+ messages in thread
From: David S. Miller @ 1998-06-23  3:32 UTC (permalink / raw)
  To: burley; +Cc: law, d.love, egcs

   Date: Mon, 22 Jun 1998 14:29:42 -0400 (EDT)
   From: Craig Burley <burley@gnu.org>

   A case we can't 64-bit align is:

	   real r(2)
	   double precision d1, d2
	   equivalence (r(1),d1)
	   equivalence (r(2),d2)

   Regardless of whether this is stack, static, or even part of a common
   block, we can't 64-bit align both d1 and d2.  (Well, not without
   an option to completely change the way we implement Fortran; I wonder
   if Sun does that to support weird-but-conforming code on SPARCs,
   such as the above.)

I can give some insight to this on certain cases on the Sparc.

The UltraSparc has specific high performance trap vectors dedicated to
handling the case where a double float load is done to an address
which is not aligned correctly.  The UltraSparc users manual suggests
that if the compiler can determine that the likelihood of bad
alignment is 50/50 or less, it should output the double float loads.

However there is an OS level side issue to this.  Sparc is rather
strict about unaligned loads by default on most systems I know about.
However, most systems provide some way to tell the OS to "allow
unaligned memory accesses, and fix them up for the program".

This would need some investigation before such a scheme is enabled in
egcs, so I'd say defer thinking about it until after the 1.1 release
happens.

Later,
David S. Miller
davem@dm.cobaltmicro.com

* Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-22 12:04 ` ix86 `double' alignment (was Re: egcs-1.1 release schedule) Craig Burley
@ 1998-06-23  3:32   ` Jeffrey A Law
  1998-06-23  5:13     ` Craig Burley
  0 siblings, 1 reply; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-23  3:32 UTC (permalink / raw)
  To: Craig Burley; +Cc: d.love, egcs

  In message < 199806221811.OAA07410@melange.gnu.org >you write:
  > For Fortran code, we can usually hand-wave that; this case would
  > only come up when the call tree has an *embedded* procedure
  > that doesn't maintain proper alignment, and since the big
  > computational problem with g77 performance is in code compiled
  > by g77, and such code is rarely called by C code, I don't think
  > this would represent a huge deficiency.
Right, but changing STACK_BOUNDARY is not an option because it does
affect C code.


  > >	  If the stack gets mis-aligned relative to STACK_BOUNDARY
  > >	  combine could end up removing a seemingly useless
  > >	  stack operation/address calculation.
  > 
  > I don't understand this, but presumably I need to look into it
  > further.
I explained it a little in a message to Toon.  Basically combine knows
how to remove a redundant "and" operation which just turns off some
low order bits in an address.  If the stack isn't aligned to
STACK_BOUNDARY, then combine could end up removing a mask operation
that wasn't redundant.
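The arithmetic behind this hazard fits in a few lines of C.  This models the reasoning only, not combine itself, and the helper names are hypothetical.  If the stack pointer is believed to be 8-byte aligned, then `(sp + 16) & ~7` equals `sp + 16`, so the mask looks removable; if the stack pointer is really only 4-byte aligned, the mask was doing real work.

```c
#include <assert.h>
#include <stdint.h>

/* Round ADDR down to a multiple of 8 by clearing its low three bits.  */
static uintptr_t
align_down_8 (uintptr_t addr)
{
  return addr & ~(uintptr_t) 7;
}

/* The mask is a no-op exactly when SP + OFFSET is already 8-byte
   aligned.  With OFFSET a multiple of 8, that holds iff SP itself is
   8-byte aligned, which is what an optimizer trusting STACK_BOUNDARY
   would assume.  */
static int
mask_is_redundant (uintptr_t sp, uintptr_t offset)
{
  assert (offset % 8 == 0);
  uintptr_t addr = sp + offset;
  return align_down_8 (addr) == addr;
}
```

With sp = 0x1000 the mask is redundant; with sp = 0x1004 (4-byte aligned only) deleting it would change 0x1014 into a different address than the 0x1010 the program asked for.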



  > Okay, that makes sense to me.  We want to hit a majority of cases
  > anyway.  We don't care (for now) about cases where users are
  > combining multiple languages in weird ways, for example.
Well, we care about it from a correctness standpoint.  Things still
have to work if they're combining .o files from old compilers,
callbacks from library routines like qsort, etc.  But we aren't really
worried about performance in those cases.

  > 
  > >	* The ABI is still going to mandate that some doubles in
  > >	  argument lists are going to be mis-aligned.  We'd have
  > >	  to arrange to copy them from the arglist into a suitable
  > >	  stack slot.  This may be more trouble than its worth.
  > 
  > I'm not sure how this can ever happen in the x86 architecture?
Sure.  Think about cases where the alignment of the double in the
arglist isn't naturally aligned (think C, pass by value :-).

foo (double, int)

we push args back to front.

So we push the int on the stack, which means the double will be
at only a 4 byte aligned stack address if we assume our stack
was 8 byte aligned before we pushed the args.

You might think we could compensate for this by pushing an extra 
dummy word before the first integer to ensure the double gets 
aligned.  But that loses if we have:

foo (int2, double, int1)

If we pushed an extra 4 byte hunk before int1, then the total 
size of the arglist would be 20 bytes -- not a multiple of 8.

And as I'll explain below, we must always make sure to allocate
in 8 byte lumps -- we can't depend on the callee to round the stack.
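To make the two cases concrete, here is a small C model (hypothetical, not gcc code).  It assumes 4-byte push slots, arguments pushed last-to-first so argument 0 ends up at the lowest address, an 8-byte-aligned stack pointer before the pushes, and optional padding pushed before any argument.  It reports where each argument lands modulo 8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of i386 argument pushing.  BASE is the stack
   pointer before any pushes; PAD is extra bytes pushed before any
   argument, so it sits above them.  Arguments are pushed last-to-first,
   so argument 0 ends up lowest.  Stores each argument's address mod 8
   in RESIDUE and returns the total bytes pushed (the return address
   is not counted).  */
static size_t
layout_args (size_t base, size_t pad, const size_t *sizes, size_t n,
             size_t *residue)
{
  size_t total = pad;
  for (size_t i = 0; i < n; i++)
    {
      assert (sizes[i] % 4 == 0);   /* 4-byte push slots */
      total += sizes[i];
    }
  assert (base >= total);
  size_t addr = base - total;       /* address of argument 0 */
  for (size_t i = 0; i < n; i++)
    {
      residue[i] = addr % 8;
      addr += sizes[i];
    }
  return total;
}
```

With a base of 0x1000, `foo (double, int)` puts the double at 4 mod 8 unless 4 bytes of padding go first (16 bytes total).  `foo (int2, double, int1)` needs the same 4 bytes to align its double, but then totals 20 bytes, which is not a multiple of 8.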


  > Is it reasonable
  > to just subtract an extra 8 bytes when creating the frame
  > pointer upon procedure entry and then AND it with ~7 to align
  > it?  Or would that make for problems with debugger, profiling,
  > and/or exception support, or is there no quick way to AND the
  > frame pointer on the x86?
Nope.  Because you then don't have a constant offset to get to the
arguments that were passed to the function.    To make this work
you'd have to dedicate a hard register to serve as an argument
pointer, which will be horrible.

[ Think about it, how can you generate code to find an argument if
  at entry to the procedure you may adjust the stack by a varying
  value (0 or 4). ]
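A short C model of that objection, using a hypothetical prologue: reserve 8 bytes, round the stack pointer down with `& ~7`, then measure how far the realigned pointer is from the incoming one, where the return address and arguments live.  The distance comes out as 8 or 12 depending on the caller, so no fixed offset from the new stack pointer reaches the arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical prologue: reserve 8 bytes, then round the stack
   pointer down to a multiple of 8.  */
static uintptr_t
realign_sp (uintptr_t incoming_sp)
{
  assert (incoming_sp >= 8);
  return (incoming_sp - 8) & ~(uintptr_t) 7;
}

/* Distance from the realigned stack pointer up to the incoming one,
   where the return address and the arguments start.  */
static uintptr_t
arg_distance (uintptr_t incoming_sp)
{
  return incoming_sp - realign_sp (incoming_sp);
}
```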

Instead we must make sure that we always allocate stacks in 8 byte
hunks in the prologue *and* that we push an extra dummy word on the stack
when performing function calls where the arg list + return pointer
are not a multiple of 8 bytes in size.

[ Remember, the call itself will push a 4 byte word on the stack
  too, so we have to account for it too. ]
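The rule reduces to one line of arithmetic.  A sketch in C (the function name is hypothetical), assuming arguments occupy 4-byte slots and the convention described here, namely that the argument list plus the 4-byte return address must total a multiple of 8:

```c
#include <assert.h>
#include <stddef.h>

/* Dummy bytes (0 or 4) to allocate before pushing the arguments so
   that ARGLIST_BYTES plus the 4-byte return address pushed by `call'
   is a multiple of 8.  */
static size_t
call_padding (size_t arglist_bytes)
{
  assert (arglist_bytes % 4 == 0);   /* i386 pushes in 4-byte slots */
  return (8 - (arglist_bytes + 4) % 8) % 8;
}
```

So a 12-byte argument list needs no padding, while an empty or 8-byte one needs 4 bytes; since nothing reads that word, it presumably need not be a real push.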


  > It seems like everyone else thinks the right way to do this is
  > to try to always assure %sp is 64-bit aligned across calls by
  > modifying all the code that is in the procedure-call chain.
  > That probably means an extra dummy push before odd-number-of-args
  > calls, etc., right?
Close.  It's not the number of args, but the total size of the arg
list.  If the size of the arg list is a multiple of 8 bytes, then
we have to push a dummy arg. so that the stack is 8 byte aligned
when we enter the callee.


jeff

* Re: ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-22 18:20 ` ix86 double alignment (was Re: egcs-1.1 release schedule) Craig Burley
  1998-06-23  3:32   ` David S. Miller
@ 1998-06-23  3:32   ` Jeffrey A Law
  1998-06-23  5:13     ` Craig Burley
  1 sibling, 1 reply; 23+ messages in thread
From: Jeffrey A Law @ 1998-06-23  3:32 UTC (permalink / raw)
  To: Craig Burley; +Cc: davem, d.love, egcs

  In message < 199806221829.OAA07477@melange.gnu.org >you write:
  > Well, I'm willing to not try to do any special aligning for
  > EQUIVALENCE and COMMON for now.  If we can just get 64-bit
  > alignment for stack-allocated VAR_DECLs -- which generally
  > won't include EQUIVALENCE (and certainly not COMMON) -- we'll
  > have made a *huge* improvement in g77 performance, especially
  > its *repeatability* of performance measurements.
Yup.  But considering the release schedule, I'd be happy if we could
just get the stack aligned properly without breaking the ABI, then
iterate to getting automatic variables aligned relative to the stack.

If we can get more done before the release, then great, but I wouldn't
want to hold things up on this issue if we can avoid it.

  > (Without this improvement, egcs 1.1 will often appear *substantially*
  > worse than the combination of g77 0.5.22 and gcc 2.7.2.3 on lots of
  > widely used Fortran code, assuming users are using -malign-double.)
Well, we still have -malign-double as an option for the x86 port, so
if they use it they presumably would see comparable performance, right?

Actually, I'd expect generally better performance because we do handle
alignments for static store items in a reasonable manner, which is a
significant improvement by itself.

  > Note that I suggested the gcc architecture (machine descriptions,
  > etc.) be modified to include a more fine-grained expression of
  > alignment requirements.  E.g. distinguishing hardware requirements
  > (even instruction requirements, such as `ld' vs. `ldd' on SPARCv8)
  > from ABI requirements from ideal performance settings.  But this
  > suggestion was turned down at the time -- some seven years ago!
Sigh.  Yeah, it really seems like something we should have -- then
again, there's been little gcc emphasis on the x86 in the past and
it's the most likely beneficiary of such stuff.

jeff

* ix86 double alignment (was Re: egcs-1.1 release schedule)
  1998-06-22  5:19 egcs-1.1 release schedule David S. Miller
@ 1998-06-22 18:20 ` Craig Burley
  1998-06-23  3:32   ` David S. Miller
  1998-06-23  3:32   ` Jeffrey A Law
  0 siblings, 2 replies; 23+ messages in thread
From: Craig Burley @ 1998-06-22 18:20 UTC (permalink / raw)
  To: davem; +Cc: law, d.love, egcs

>   Date: Sun, 21 Jun 1998 22:31:31 -0600
>   From: Jeffrey A Law <law@cygnus.com>
>
>	   * The ABI is still going to mandate that some doubles in
>	     argument lists are going to be mis-aligned.  We'd have
>	     to arrange to copy them from the arglist into a suitable
>	     stack slot.  This may be more trouble than its worth.
>
>And there are still going to be issues with equivalence statements.

Well, I'm willing to not try to do any special aligning for
EQUIVALENCE and COMMON for now.  If we can just get 64-bit
alignment for stack-allocated VAR_DECLs -- which generally
won't include EQUIVALENCE (and certainly not COMMON) -- we'll
have made a *huge* improvement in g77 performance, especially
its *repeatability* of performance measurements.

(Without this improvement, egcs 1.1 will often appear *substantially*
worse than the combination of g77 0.5.22 and gcc 2.7.2.3 on lots of
widely used Fortran code, assuming users are using -malign-double.)

I hope to have a fairly thorough sample program put together soon
(tomorrow?) to illustrate this, but the simple cases we want
to align for now are like

	subroutine x
	double precision a
	...
	end

and:

	subroutine y(n)
	double precision a(n)
	...
	end

The latter uses automatic arrays (which gcc and g77 support); it'd
be great to get those 64-bit aligned as well.  The former is the
most important thing we *aren't* aligning, currently, even with
`-malign-double'.  (It should be aligned especially if `a' is an
array, of course.)
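For anyone wanting to check this on their own system, a rough C analogue of subroutines x and y simply inspects the addresses.  The names are hypothetical, and a C99 variable-length array stands in for the Fortran automatic array.  Whether the locals come out 8-byte aligned depends on the compiler, its version and its flags, which is the whole problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes by which PTR lies past the previous ALIGN-byte boundary
   (0 means properly aligned).  */
static size_t
misalignment (const void *ptr, size_t align)
{
  assert (align > 0);
  return (size_t) ((uintptr_t) ptr % align);
}

/* Analogue of subroutines x and y: a scalar local double and an
   automatic (variable-length) array.  Returns 1 if either is not
   8-byte aligned, 0 otherwise; the answer depends on the compiler.  */
static int
locals_misaligned (size_t n)
{
  double a = 0.0;
  double v[n > 0 ? n : 1];
  v[0] = a;
  return misalignment (&a, 8) != 0 || misalignment (v, 8) != 0;
}
```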

A case we can't 64-bit align is:

	real r(2)
	double precision d1, d2
	equivalence (r(1),d1)
	equivalence (r(2),d2)

Regardless of whether this is stack, static, or even part of a common
block, we can't 64-bit align both d1 and d2.  (Well, not without
an option to completely change the way we implement Fortran; I wonder
if Sun does that to support weird-but-conforming code on SPARCs,
such as the above.)
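The conflict is just arithmetic: d1 and d2 start 4 bytes apart, so wherever the storage lands, at most one of them can sit on a multiple of 8. A small C check of that claim, assuming 4-byte REAL and 8-byte DOUBLE PRECISION (the offsets and `aligned_doubles` are illustrative, not g77 internals):

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets implied by
       real r(2)
       double precision d1, d2
       equivalence (r(1),d1)
       equivalence (r(2),d2)
   assuming 4-byte REAL and 8-byte DOUBLE PRECISION. */
enum { OFF_R1 = 0, OFF_R2 = 4, OFF_D1 = OFF_R1, OFF_D2 = OFF_R2 };

/* How many of d1 and d2 fall on a 64-bit boundary when the storage
   for the equivalence starts at address base. */
static int aligned_doubles(uintptr_t base)
{
    int count = 0;
    if ((base + OFF_D1) % 8 == 0)
        count++;
    if ((base + OFF_D2) % 8 == 0)
        count++;
    return count;
}
```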

What we *can* do is *implement* the above, perhaps warning about
the suboptimal alignment.  That is, there's no reason we can't
go ahead and 32-bit align d1 and d2, so one of them is not
64-bit aligned.  The programmer asked for it, after all!

What we can also 64-bit align is this:

	real r(2)
	double precision d
	equivalence (r(2),d)

We can do that because we can see that there are no actual *conflicts*
of alignment.  We can implement this by either inserting a dummy
unused 32-bit variable before r(1) and aligning *that* to a 64-bit
boundary (stack or static, doesn't matter), or, if we have a
smart-enough back end (or linker, for static memory I guess), simply
use a directive that means "place r(1) 32 bits past a 64-bit boundary".
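A minimal C model of the first option, the dummy 32-bit slot in front of r(1) with the whole area 64-bit aligned, using C11 `alignas`. The struct and its field names are hypothetical stand-ins for whatever layout the compiler would actually emit:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Storage for
       real r(2)
       double precision d
       equivalence (r(2),d)
   r(1) is at area+0 and r(2)/d at area+4 (12 bytes in all).  A dummy
   32-bit pad in front of the area, with the whole struct 64-bit
   aligned, puts r(1) at 4 mod 8 and therefore d at 0 mod 8. */
struct equiv_storage {
    alignas(8) unsigned char pad[4];   /* hypothetical dummy variable */
    unsigned char area[12];            /* r(1) at +0, r(2) and d at +4 */
};

/* Address of d within a given block. */
static uintptr_t address_of_d(const struct equiv_storage *s)
{
    return (uintptr_t)(s->area + 4);
}
```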

But it's not *important* to 64-bit align the above EQUIVALENCE case,
certainly not for egcs 1.1.

And what we also need to continue to support is stuff like

	real r1, r2
	real s(6)
	double precision d1, d2
	common r1, d1, r2, d2
	equivalence (r1,s)

which requires that s(1) overlays r1, s(2) and s(3) overlay d1, s(4)
overlays r2, and s(5) and s(6) overlay d2.
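The overlay can be checked with a C struct that mimics the COMMON block. `#pragma pack(push, 4)` (accepted by GCC, Clang and MSVC) suppresses any padding a C compiler might add before a double, since COMMON has none; `blank_common` and `s_offset` are illustrative names, and 4-byte REAL and 8-byte DOUBLE PRECISION are assumed:

```c
#include <assert.h>
#include <stddef.h>

/* C model of  common r1, d1, r2, d2  : Fortran inserts no padding,
   so cap member alignment at 4 bytes to reproduce the layout. */
#pragma pack(push, 4)
struct blank_common {
    float  r1;
    double d1;
    float  r2;
    double d2;
};
#pragma pack(pop)

/* Byte offset of s(k), k = 1..6, implied by  equivalence (r1,s). */
static size_t s_offset(int k)
{
    assert(k >= 1 && k <= 6);
    return (size_t)(k - 1) * sizeof(float);
}
```

d1 ends up at offset 4, so if the block itself is 64-bit aligned, d1 cannot be.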

Again, we can do this by seeing that there are no "hard" conflicts
(at the machine or ABI level), and punting (and warning?) over the
fact that the "soft" conflicts (the ideal 64-bit alignment of
double for performance reasons) prevent "ideal" alignment.  Again,
"so what": the programmer has specified a layout that rules out
64-bit alignment, so we don't give it to him in cases like that --
but we can still compile correct, fairly fast, ABI-compatible code.

Note that I suggested the gcc architecture (machine descriptions,
etc.) be modified to include a more fine-grained expression of
alignment requirements.  E.g. distinguishing hardware requirements
(even instruction requirements, such as `ld' vs. `ldd' on SPARCv8)
from ABI requirements from ideal performance settings.  But this
suggestion was turned down at the time -- some seven years ago!
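As a rough illustration of that finer-grained scheme, each object could carry three alignment values instead of one, with the compiler falling back from the preferred value only when a layout forces it. Nothing like `struct align_req` or `choose_alignment` exists in gcc; both are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-level description of an object's alignment. */
struct align_req {
    size_t hardware;   /* what the instructions need in order to work */
    size_t abi;        /* what the ABI requires for layout and calls */
    size_t preferred;  /* what is best for performance */
};

/* Use the preferred alignment unless the layout (e.g. EQUIVALENCE or
   COMMON) conflicts with it; then fall back to the ABI value, which
   the precondition guarantees is never below the hardware minimum. */
static size_t choose_alignment(struct align_req r, bool layout_conflict)
{
    assert(r.hardware <= r.abi && r.abi <= r.preferred);
    return layout_conflict ? r.abi : r.preferred;
}
```

For a double on 32-bit x86 the three values might be 1 (no hardware requirement), 4 (the i386 System V ABI) and 8 (performance), so the preferred 8 is used unless an EQUIVALENCE or COMMON conflict drops it to 4.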

Maybe it's time we finally got this all "right", and I'm sure
willing to help.  But I think we can only manage to get a bit of
it "right" to improve x86 performance for egcs 1.1.

        tq vm, (burley)

* ix86 `double' alignment (was Re: egcs-1.1 release schedule)
  1998-06-21 23:07 egcs-1.1 release schedule Jeffrey A Law
@ 1998-06-22 12:04 ` Craig Burley
  1998-06-23  3:32   ` Jeffrey A Law
  0 siblings, 1 reply; 23+ messages in thread
From: Craig Burley @ 1998-06-22 12:04 UTC (permalink / raw)
  To: law; +Cc: d.love, egcs

>  In message <rzqaf79nru7.fsf@djlvig.dl.ac.uk> you write:
>  > _Please_ include some means of allowing Fortran (at least) to get
>  > stack-allocated doubles double-aligned on x86 (modulo libc).  (I hope
>  > I haven't missed this going in at some stage!)  The one-line patch for
>  > STACK_BOUNDARY used by g77 0.5.22 is good enough.
>I'm still waiting on some kind of solution that doesn't totally break
>the ABI.

I'm willing to do some serious work to make this happen for 1.1,
which assumes it can be done in the next couple of weeks, right?

>To do this "right" you have to:
>
>	* Make sure gcc always allocates stack in multiples of 8 bytes,
>	  adding dummy outgoing args as necessary to keep the stack
>	  properly aligned at call points.
>
>	  You can't do this with STACK_BOUNDARY since that says we
>	  will 100% always have a properly aligned stack, which can
>	  never be true since we might be linking in code from
>	  another compiler which didn't keep the stack suitably
>	  aligned.

For Fortran code, we can usually hand-wave that.  This case would
only come up when the call tree has an *embedded* procedure
that doesn't maintain proper alignment.  Since the big
computational problem with g77 performance is in code compiled
by g77, and such code is rarely called by C code, I don't think
this would represent a huge deficiency.

>	  If the stack gets mis-aligned relative to STACK_BOUNDARY
>	  combine could end up removing a seemingly useless
>	  stack operation/address calculation.

I don't understand this, but presumably I need to look into it
further.
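One way to see the hazard Jeff describes: if STACK_BOUNDARY tells gcc the stack pointer is always a multiple of 8, then rounding a stack address down to an 8-byte boundary is provably a no-op, and combine is entitled to delete it. The C sketch below models that reasoning with two hypothetical helpers; it illustrates the failure mode and is not combine's actual logic:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align (a power of two). */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* What an optimizer may produce if it *assumes* addr is already
   aligned: align_down(addr, align) would equal addr, so the masking
   is treated as useless and dropped. */
static uintptr_t align_down_assuming_aligned(uintptr_t addr, uintptr_t align)
{
    (void)align;   /* the AND has been "optimized away" */
    return addr;
}
```

With an aligned stack (say 0x1000) both give the same answer; with a stack that an old module left at 0x1004 they differ, which is exactly when the "useless" instruction mattered.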

>	  The idea is to make sure the stack is 8 byte aligned in the
>	  common cases, but not absolutely rely on it for correct code
>	  generation.

Absolutely.

>	* Second, assuming that gcc always keeps the pointer aligned
>	  for itself, then arrange for doubles to end up 8 byte
>	  aligned relative to the stack pointer.
>
>	  If the stack gets mis-aligned due to an old module, then
>	  our doubles won't be aligned correctly, but the vast majority
>	  of the time they will be suitably aligned.
>
>	  I don't think there's any mechanism to do this when the
>	  desired alignment is less than STACK_BOUNDARY.  In fact
>	  I know that to be the case since I worked on a similar
>	  problem recently.

Okay, that makes sense to me.  We want to hit a majority of cases
anyway.  We don't care (for now) about cases where users are
combining multiple languages in weird ways, for example.
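A sketch of the second quoted point, assuming the frame is laid out at fixed offsets from the stack pointer: if every double gets an offset that is a multiple of 8, the doubles are aligned whenever the stack pointer is, and are all 4 bytes off when an old module left it misaligned. Both helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offset (from the stack pointer) of the i-th double in a frame whose
   first free byte is at first_free; the doubles start at the next
   multiple of 8 and are packed back to back. */
static size_t double_slot_offset(size_t first_free, size_t i)
{
    size_t start = (first_free + 7) & ~(size_t)7;   /* round up to 8 */
    return start + i * sizeof(double);
}

/* The slot is 64-bit aligned only if the stack pointer itself is. */
static bool slot_is_aligned(uintptr_t sp, size_t offset)
{
    return (sp + offset) % 8 == 0;
}
```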

>	* The ABI is still going to mandate that some doubles in
>	  argument lists are going to be mis-aligned.  We'd have
>	  to arrange to copy them from the arglist into a suitable
>	  stack slot.  This may be more trouble than it's worth.

I'm not sure how this can ever happen on the x86 architecture.

Well, I mean, not when passing arguments by reference, which is
generally how g77 works anyway.
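For the by-value case Jeff mentions, the i386 convention passes stack arguments in 4-byte units with no extra padding, so a double that follows a single int (as in a C prototype such as `void f(int, double)`) starts 4 bytes into the argument area and is misaligned whenever that area is 8-byte aligned. A small offset calculator under that assumption (`arg_offsets` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* Offsets of stack-passed arguments relative to the first argument
   slot, assuming each argument occupies its size rounded up to a
   multiple of 4 bytes, with no other padding. */
static void arg_offsets(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        off += (sizes[i] + 3) & ~(size_t)3;   /* round up to 4 bytes */
    }
}
```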

>Note that some non-ABI breaking changes to align doubles and other
>values have gone into the x86 compiler.  In particular we should be
>properly aligning all data in the static store.

Right.  The Next Big Thing is to, by default, 64-bit-align any
stack-based VAR_DECLs.  Just doing that would be Great.

What I'd like to see, and think wouldn't be too hard, is a change
that'd leave TYPE_ALIGN for doubles at 32, so g77 would still
be able to produce COMMON and EQUIVALENCE blocks ("aggregates")
containing doubles without breaking the ABI or rejecting
standard-conforming code.  (Never mind that g77 already does this
for systems like SPARC; SPARC users expect that, apparently,
while x86 users don't, mostly.)

But this change would set DECL_ALIGN for stack-based VAR_DECLs
to 64, and implement that, presumably by assuring that the
stack frame is itself 64-bit aligned.
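Modern C has a close analogue of this TYPE_ALIGN/DECL_ALIGN split: C11 `alignas` raises the alignment of one declared object without changing the alignment of the type, so structure (and COMMON-like) layouts are unaffected. This is an analogy only, not how gcc or g77 exposed the idea in 1998, and `local_double_is_8_aligned` is a hypothetical name:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* The type double keeps its ABI alignment (4 bytes inside structures
   on 32-bit x86), but this one stack-allocated object requests 8.
   8 is a fundamental alignment on common targets, so a conforming
   C11 compiler must honor the request. */
static bool local_double_is_8_aligned(void)
{
    alignas(8) double a[4] = { 0.0 };
    return ((uintptr_t)a % 8) == 0;
}
```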

What I don't know (having not looked into it in any detail) is
how best to ensure the stack frame is 64-bit aligned.  Presumably
%sp will always be 32-bit aligned upon entry to any procedure
(according to the ABI; perhaps the hardware?).  Is it reasonable
to just subtract an extra 8 bytes when creating the frame
pointer upon procedure entry and then AND it with ~7 to align
it?  Or would that make for problems with debugger, profiling,
and/or exception support, or is there no quick way to mask the
frame pointer on the x86?
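The entry sequence asked about above can be modeled in C as plain address arithmetic: reserve 8 extra bytes below the incoming stack pointer, then clear the low three bits (an AND with ~7, i.e. -8). `realigned_frame` is a hypothetical helper; on x86 the masking step could be a single instruction such as `andl $-8, %ebp`.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract 8 from the incoming stack pointer, then clear the low
   three bits so the result is a multiple of 8 lying between
   incoming_sp - 15 and incoming_sp - 8. */
static uintptr_t realigned_frame(uintptr_t incoming_sp)
{
    assert(incoming_sp % 4 == 0);   /* 32-bit aligned on entry */
    return (incoming_sp - 8) & ~(uintptr_t)7;
}
```

One cost of this design is that incoming arguments are no longer at a fixed offset from the realigned pointer, so the pre-alignment value has to be kept somewhere, which is the kind of thing that can complicate debuggers and exception unwinding.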

It seems like everyone else thinks the right way to do this is
to try to always assure %sp is 64-bit aligned across calls by
modifying all the code that is in the procedure-call chain.
That probably means an extra dummy push before odd-number-of-args
calls, etc., right?
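The approach in Jeff's first quoted point (keep %sp 64-bit aligned at every call) comes down to two roundings: allocate frames in multiples of 8 bytes, and add a dummy push when the outgoing arguments total an odd number of 4-byte words. A sketch with hypothetical helper names, assuming every push is a multiple of 4 bytes and the stack is 8-byte aligned before the pushes begin:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of dummy padding to push before arg_bytes of outgoing
   arguments so the stack is 8-byte aligned at the call. */
static size_t call_padding(size_t arg_bytes)
{
    assert(arg_bytes % 4 == 0);
    return (8 - arg_bytes % 8) % 8;
}

/* Round a frame size up so allocating it preserves 8-byte alignment. */
static size_t round_frame(size_t frame_bytes)
{
    return (frame_bytes + 7) & ~(size_t)7;
}
```

Every call with an odd number of argument words pays for the extra push whether or not the callee uses doubles, which is the cost compared in the next paragraph.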

The reason I'd generally prefer the former approach to the latter
is that either one is likely to cost some performance, but the
latter *always* costs performance since the caller doesn't know
whether the callee uses doubles, whereas the former costs only
when the procedure doing the extra dance actually uses doubles.
(Whether we can teach gcc not to do the AND(%fp,~7) if there
are no doubles on the stack is another issue.)

        tq vm, (burley)

end of thread

Thread overview: 23+ messages
1998-06-24 17:12 ix86 double alignment (was Re: egcs-1.1 release schedule) John Wehle
1998-06-24 21:23 ` Jeffrey A Law
  -- strict thread matches above, loose matches on Subject: below --
1998-06-23 10:23 ix86 `double' " John Wehle
1998-06-23 14:56 ` Craig Burley
1998-06-23 22:55 ` Jeffrey A Law
1998-06-23  3:32 ix86 double " John Wehle
1998-06-23 15:06 ` Craig Burley
1998-06-23 22:55   ` Jeffrey A Law
1998-06-24 10:08   ` Dave Love
1998-06-24 21:23     ` Jeffrey A Law
1998-06-22  5:19 egcs-1.1 release schedule David S. Miller
1998-06-22 18:20 ` ix86 double alignment (was Re: egcs-1.1 release schedule) Craig Burley
1998-06-23  3:32   ` David S. Miller
1998-06-23  6:30     ` Craig Burley
1998-06-23  3:32   ` Jeffrey A Law
1998-06-23  5:13     ` Craig Burley
1998-06-21 23:07 egcs-1.1 release schedule Jeffrey A Law
1998-06-22 12:04 ` ix86 `double' alignment (was Re: egcs-1.1 release schedule) Craig Burley
1998-06-23  3:32   ` Jeffrey A Law
1998-06-23  5:13     ` Craig Burley
1998-06-24  2:28       ` Jeffrey A Law
1998-06-24 14:50         ` Craig Burley
1998-06-25  0:25           ` Jeffrey A Law
1998-06-25  9:59             ` Tim Hollebeek
1998-06-28 18:01             ` Marc Lehmann