From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeffrey A Law <law@cygnus.com>
To: Craig Burley <burley@gnu.org>
Cc: d.love@dl.ac.uk, egcs@cygnus.com
Subject: Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule) 
Date: Wed, 24 Jun 1998 02:28:00 -0000
Message-id: <21108.898661578@hurl.cygnus.com>
References: <199806230722.DAA02276@melange.gnu.org>
X-SW-Source: 1998-06/msg00838.html

  In message < 199806230722.DAA02276@melange.gnu.org >you write:
  > I'm a little curious, though, how such an operation comes to pass.
The code to implement va_arg has address masking of this nature.  I
also believe it occurs internally to handle memory accesses smaller
than 32 bits on the Alpha.
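
For illustration, the kind of address masking meant here can be sketched
in C as below.  This is only a minimal sketch, not the actual va_arg
expansion in gcc; the helper names round_up and round_down are
hypothetical, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical helper: round addr up to the next multiple of align.
   Assumes align is a power of two, so (align - 1) is a mask of the
   low-order bits that must end up zero.  */
uintptr_t round_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: round addr down to a multiple of align by
   clearing the low-order bits.  Same power-of-two assumption.  */
uintptr_t round_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}
```

With an 8 byte alignment, round_up(13, 8) gives 16 and round_down(13, 8)
gives 8; an address that is already aligned is left unchanged.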

  > I came up with worse examples.  If STACK_BOUNDARY (or anything that
  > might break the ABI) is adjusted based on whether the processor
  > is [56]86 vs. [34]86, then code/libraries that happen to be compiled
  > on different variants of the x86 architecture could be magically
  > incompatible, producing subtly wrong results.
Yes.  Though I would strongly recommend against ABI things changing
based on the processor for this exact reason :-)

  > My question is, just why, *conceptually*, is it a problem on the
  > x86 architecture to try to align the argument list so the caller
  > frame is 64-bit aligned *and* at least some of the doubles in
  > the list are 64-bit aligned, but some aren't?
  > 
  > That is, is there a reason that x86 code *must* be generated either
  > to always assume doubles are 32-bit aligned *or* are always
  > 64-bit aligned?  I can't think of any.
It's not an architecture issue, but an ABI issue.  The architecture
should be able to handle just about any alignment we throw at it.

Basically the ABI just mandates a 4 byte alignment, but we would get
better performance if we could get the args 8 byte aligned.  I'll be
a little surprised if we can do this without having the callee copy
the value out of the arglist to an aligned memory slot.
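
As a rough picture of what "copy it out of the arglist to an aligned
memory slot" means, here is a sketch written at the C level.  It is an
assumption-laden illustration, not what gcc emits: the function name
load_double_arg is hypothetical, and the arglist slot is modelled as a
plain byte pointer that may be only 4 byte aligned.

```c
#include <string.h>

/* Hypothetical helper: fetch a double from an argument slot that may
   be only 4 byte aligned.  The value is copied into a local, which
   the compiler is free to place at an aligned address, and all
   later uses go through that local copy.  */
double load_double_arg(const unsigned char *slot)
{
    double aligned_copy;

    memcpy(&aligned_copy, slot, sizeof aligned_copy);
    return aligned_copy;
}
```

The copy costs something on every call, which is part of why it is not
obvious that aligning inside the arglist is a net win.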

There is an interesting problem from a gcc internals standpoint that
we may hit if we ever tried to align doubles relative to the start of
the argument list.  I ran into it on the v850 a couple years ago, but
I don't remember the details.  Basically the info we needed to do
this wasn't available and the scheme broke down when handling 
varargs/stdarg.


  > So can we at least come up with a short-term way to say "*try*
  > to align outgoing doubles to 64-bits, but don't assume incoming
  > doubles are 64-bit aligned", and in the long run make a better
  > overall architecture for representing alignments?
Let's defer trying to align arglists until we have the stack pointer
itself aligned, and until we're aligning auto variables relative to
the stack pointer (and possibly also the stack slots for pseudos that
don't get hard registers) :-)

My gut tells me aligning variables inside the arglist isn't going to
win as much as the other cases noted above.

  > >You might think we could compensate for this by pushing an extra 
  > >dummy word before the first integer to ensure the double gets 
  > >aligned.  But that loses if we have:
  > >
  > >foo (int2, double, int1)
  > >
  > >If we pushed an extra 4 byte hunk before int1, then the total 
  > >size of the arglist would be 20 bytes -- not a multiple of 8.
  > >
  > >And as I'll explain below, we must always make sure to allocate
  > >in 8 byte lumps -- we can't depend on the callee to round the stack.
  > 
  > Again, what is the *real* problem with just doing what is currently
  > done for that case, ending up with a misaligned double arg for
  > the incoming procedure -- must it really assume its double is
  > 64-bit aligned?  Or is this really just an internal problem with
  > gcc's housekeeping?
There's no problem other than the performance issues.  The code will
still work.  Maybe that's where we're mis-communicating :-)

Before we can do *anything* about the alignment of args and autos we
first need to get the stack pointer aligned at all times.  Let's deal
with that first, then try to come up with solutions for the auto and
argument alignment afterwards.


  > >Instead we must make sure that we always allocate stacks in 8 byte
  > >hunks in the prologue *and* that we push an extra dummy word on the stack
  > >when performing function calls where the arg list + return pointer
  > >are not a multiple of 8 bytes in size.
  > >
  > >[ Remember, the call itself will push a 4 byte word on the stack
  > >  too, so we have to account for it too. ]
  > 
  > Right.  Okay.
OK.  We agree on this.  And since any work which involves trying to
align autos depends on first getting the stack aligned, let's solve the
stack pointer alignment problem first.  That work can happen
while we debate the other issues :-)
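
The padding rule quoted above works out to a small piece of arithmetic,
sketched here in C.  The names call_padding, RETURN_ADDR_BYTES and
STACK_ALIGN_BYTES are hypothetical and not taken from gcc; the sketch
assumes a 4 byte return address and an 8 byte stack alignment, as in
the discussion.

```c
#include <stddef.h>

/* Assumed sizes from the discussion: the call instruction pushes a
   4 byte return address, and we want 8 byte stack alignment.  */
enum { RETURN_ADDR_BYTES = 4, STACK_ALIGN_BYTES = 8 };

/* Hypothetical helper: number of dummy bytes to allocate before
   pushing the real arguments, so that arguments plus return address
   total a multiple of STACK_ALIGN_BYTES.  */
size_t call_padding(size_t arg_bytes)
{
    size_t total = arg_bytes + RETURN_ADDR_BYTES;

    return (STACK_ALIGN_BYTES - total % STACK_ALIGN_BYTES)
           % STACK_ALIGN_BYTES;
}
```

For foo (int2, double, int1) the arguments take 16 bytes, so with the
return address the total is 20 and call_padding(16) is 4, giving 24.
For three ints (12 bytes) the total is already 16 and no padding is
needed.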


  > Also, presumably we don't actually have to *push* an arg, but just
  > subtract 4 from %esp, right?
Right.  We don't actually have to make the memory reference, just allocate
an extra outgoing arg slot before we push any of the real outgoing
arguments.

  > I am quite willing to do this work myself.  But I say that well-
  > knowing I'm not the best person for the job; just someone sufficiently
  > enthused, with a spot of time, a Pentium II, a trackball, and
  > half the g77 user base hounding me for the past couple of years,
  > etc. etc. etc.  So I'd need some initial hand-holding, probably.  :)
Well, I don't have the time to tackle it myself, but I can try to help
you (or anyone else) through the twisty maze of ABI related code in
gcc to try and make this happen.

jeff