Re: A proposal to align GCC stack

public inbox for gcc@gcc.gnu.org

* Re: A proposal to align GCC stack
@ 2007-12-19  1:46 Ross Ridge
  0 siblings, 0 replies; 27+ messages in thread
From: Ross Ridge @ 2007-12-19  1:46 UTC (permalink / raw)
  To: gcc

Robert Dewar writes:
>Well if we have local variables of type float (and we have specified
>use of SSE), we are in trouble, no?

Non-vector SSE instructions, like the ones that operate on scalar floats,
don't require memory operands to be aligned.
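
To make the distinction concrete, here is a minimal C sketch using the SSE intrinsics from <xmmintrin.h>. It assumes an x86 target where __SSE__ is defined; other targets take a plain-C fallback. The pointer &buf[1] is correctly aligned for a float but deliberately not 16-byte aligned. The scalar load is fine there, while a packed load has to use the unaligned form.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Adds the float at buf[1] to itself, loading it two different ways.
   &buf[1] is 4-byte (float) aligned but NOT 16-byte aligned. */
float double_second_element(void)
{
    _Alignas(16) float buf[8] = {1.0f, 2.0f, 3.0f, 4.0f,
                                 5.0f, 6.0f, 7.0f, 8.0f};
    const float *p = &buf[1];

#if defined(__SSE__)
    /* Scalar load: needs only the float's natural alignment. */
    __m128 s = _mm_load_ss(p);
    /* A packed load from p must use the unaligned variant;
       _mm_load_ps(p) requires a 16-byte-aligned address. */
    __m128 v = _mm_loadu_ps(p);
    /* Scalar add of lane 0 of each: 2.0f + 2.0f. */
    return _mm_cvtss_f32(_mm_add_ss(s, v));
#else
    return p[0] + p[0];
#endif
}
```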

					Ross Ridge

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: A proposal to align GCC stack
  2007-12-19  3:51 Ross Ridge
  2007-12-19 10:33 ` Andrew Pinski
  2007-12-20  9:11 ` Ye, Joey
@ 2008-03-20 20:18 ` Ye, Joey
  2 siblings, 0 replies; 27+ messages in thread
From: Ye, Joey @ 2008-03-20 20:18 UTC (permalink / raw)
  To: Ross Ridge, cschueler; +Cc: gcc

Ross, Christian,

Here are the patches implementing the idea we discussed before. Could you
take a look at them or try them out?

http://gcc.gnu.org/ml/gcc-patches/2008-03/msg01200.html
http://gcc.gnu.org/ml/gcc-patches/2008-03/msg01199.html

Thanks - Joey

* Re: A proposal to align GCC stack
  2007-12-18  4:25 Ye, Joey
@ 2007-12-21 20:25 ` Christian Schüler
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Schüler @ 2007-12-21 20:25 UTC (permalink / raw)
  To: gcc

Ye, Joey <joey.ye <at> intel.com> writes:

> 

Please go forward with this idea!

The current implementation of force_align_arg_pointer has never worked for me.
I have a DLL which may be called by code out of my control and I already have
manual stub functions to align the stack. I would love to rely on compiler
facilities for this but if I do, the host program crashes when my DLL is loaded.
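
Christian's stubs aren't shown, so as a hypothetical illustration only: the core arithmetic of any software realignment is rounding an address to a power-of-two boundary. This C sketch carves a 16-byte-aligned scratch area out of an over-sized local buffer, which works whatever alignment the caller left the stack in. The helper names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr up/down to a multiple of align (a power of two).
   A realigning prologue rounds the stack pointer DOWN, because the
   stack grows toward lower addresses. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}

/* Over-allocate by (16 - 1) bytes, then round up inside the buffer.
   Returns true if the 64-byte area is 16-byte aligned and in bounds. */
bool scratch_is_aligned(void)
{
    unsigned char raw[64 + 15];
    uintptr_t base = (uintptr_t)raw;
    uintptr_t area = align_up(base, 16);
    return (area % 16 == 0) && (area + 64 <= base + sizeof raw);
}
```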

* RE: A proposal to align GCC stack
  2007-12-19 10:33 ` Andrew Pinski
@ 2007-12-20  9:32   ` Ye, Joey
  0 siblings, 0 replies; 27+ messages in thread
From: Ye, Joey @ 2007-12-20  9:32 UTC (permalink / raw)
  To: Andrew Pinski, Ross Ridge; +Cc: gcc

Andrew,

My proposal is not meant to be limited to i386/x86_64. Would you please
spend some time reviewing it and see whether it can really solve the problem on PowerPC?
Your comments are welcome.

Thanks - Joey  

-----Original Message-----
From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of Andrew Pinski
Sent: December 19, 2007 18:07
To: Ross Ridge
Cc: gcc@gcc.gnu.org
Subject: Re: A proposal to align GCC stack

On 12/18/07, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Look at it another way.  Let's say you were compiling x86_64 code with
> -fpreferred-stack-boundary=3, an 8-byte PREFERRED alignment.

Can we stop talking about x86/x86_64-specific issues here?  I have a
use case for the PowerPC side of the Cell BE for variables greater
than the normal stack boundary alignment of 16 bytes.  They need to be
128-byte aligned for DMA transfers to the SPUs.

I already proposed a patch [1] to fix this use case but I have not
seen many replies yet.

Thanks,
Andrew Pinski

[1] http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html

* RE: A proposal to align GCC stack
  2007-12-19  3:51 Ross Ridge
  2007-12-19 10:33 ` Andrew Pinski
@ 2007-12-20  9:11 ` Ye, Joey
  2008-03-20 20:18 ` Ye, Joey
  2 siblings, 0 replies; 27+ messages in thread
From: Ye, Joey @ 2007-12-20  9:11 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

Ye, Joey writes:
>> This proposal values correctness at first place. So when the compiler
>> can't make sure a function is only called from functions with the same
>> or bigger preferred-stack-boundary, it will conservatively align the
>> stack. One optimization is to set INCOMING = PREFERRED for local
>> functions. Do you think that is more acceptable?

Ross Ridge wrote:
> Not really.  It might reduce the amount of unnecessary stack adjustment,
> but the performance regression would remain.  Changing the behaviour of
> -fpreferred-stack-boundary doesn't make it more correct.  It's supposed
> to change the ABI, it works as documented and, yes, if it's misused it
> will cause problems.  So will any number of GCC's ABI-changing options.

> Look at it another way.  Let's say you were compiling x86_64 code with
> -fpreferred-stack-boundary=3, an 8-byte PREFERRED alignment.  As you
> know, this is different from the standard x86_64 ABI, which requires a
> 16-byte alignment.  Now with your proposal, GCC's behaviour won't
> change, because it's safe to assume that the incoming stack is at least
> 8-byte aligned.  There should be no change in the code GCC generates,
> with or without your proposal.  However, the outgoing stack won't be
> 16-byte aligned as the x86_64 ABI requires.  In this case, what also
> doesn't change is the fact that mixing code compiled with different
> -fpreferred-stack-boundary values doesn't work.  It's just as problematic
> and unsafe as it was before.

> So when you said "this proposal values correctness at first place",
> that really isn't true.  The proposal only addresses safety when the
> preferred alignment is raised from the standard ABI's alignment.  You're
> conservatively aligning the incoming stack, but not the outgoing stack.
> You don't seem to be concerned about the problems that can arise when
> the preferred alignment is lowered below the ABI's.  Why?  My guess is
> that it's because "correctness" in this case would cause unacceptable
> regressions when compiling the x86_64 Linux kernel.
You are right. My proposal doesn't guarantee 100% correctness. In case
of PREFERRED < ABI, we hope the author knows what will happen.

> If you can understand why it would be unacceptable to change how
> -fpreferred-stack-boundary behaves when compiling the Linux kernel,
> then maybe you can understand why I don't find it acceptable for it to
> change when compiling my code.
I think I understand now. My updated proposal sets
INCOMING == PREFERRED, so -fpreferred-stack-boundary works
the same as before.

Thanks - Joey

* Re: A proposal to align GCC stack
  2007-12-19 10:06 Ross Ridge
@ 2007-12-19 15:32 ` H.J. Lu
  0 siblings, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2007-12-19 15:32 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Wed, Dec 19, 2007 at 04:12:59AM -0500, Ross Ridge wrote:
> 
> >STACK_BOUNDARY is the minimum stack boundary. MAX(STACK_BOUNDARY,
> >PREFERRED_STACK_BOUNDARY) == PREFERRED_STACK_BOUNDARY.  So the question is
> >if we should assume INCOMING == PREFERRED_STACK_BOUNDARY in all cases:
> 
> Doing this would also remove need for ABI_STACK_BOUNDARY in your proposal.

In our proposal, ABI_STACK_BOUNDARY provides the default value for
PREFERRED_STACK_BOUNDARY. It can be different for different OSes.
For a given OS, you can change PREFERRED_STACK_BOUNDARY, but you can't
change ABI_STACK_BOUNDARY. You can think of it as a software STACK_BOUNDARY.

> 
> >Pros:
> >  1. Keep the current behaviour of -mpreferred-stack-boundary.
> >
> >Cons:
> >  1. The generated code is incompatible with the other object files.
> 
> Well, your proposal wouldn't completely solve that problem, either.
> You can't guarantee compatibility with object files compiled with different
> values of -mpreferred-stack-boundary, including those compiled with the current
> implementation, unless you assume the incoming stack is aligned to
> the lowest value the flag can have and align the outgoing stack to the
> highest value that the flag can have.

We can align the outgoing stack to PREFERRED_STACK_BOUNDARY and assume
INCOMING = MIN (ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY), which
is our original proposal.
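
A small C model of that rule, with every value in bits like GCC's *_STACK_BOUNDARY macros. The function names are hypothetical; the point is only which combinations force a realigning prologue: a function realigns when something in its frame needs more alignment than INCOMING guarantees.

```c
#include <stdbool.h>

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Original proposal: INCOMING = MIN(ABI_STACK_BOUNDARY,
   PREFERRED_STACK_BOUNDARY); outgoing calls keep PREFERRED. */
unsigned incoming_boundary(unsigned abi, unsigned preferred)
{
    return min_u(abi, preferred);
}

/* Realign only if the frame needs more than INCOMING guarantees. */
bool needs_realign(unsigned abi, unsigned preferred, unsigned required)
{
    return required > incoming_boundary(abi, preferred);
}
```

For example, on ia32 (psABI 32 bits) with a 128-bit preferred boundary, a function holding a 128-bit SSE local gets a realigning prologue, while the same function on x86-64 (psABI 128 bits) does not.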


H.J.

* Re: A proposal to align GCC stack
  2007-12-19  9:13 Ross Ridge
@ 2007-12-19 14:30 ` H.J. Lu
  0 siblings, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2007-12-19 14:30 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Wed, Dec 19, 2007 at 04:12:55AM -0500, Ross Ridge wrote:
> 
> I'm suggesting a different definition of STACK_BOUNDARY which wouldn't,
> if strictly followed, result in STACK_BOUNDARY being defined as 8 on
> the i386.  The i386 hardware doesn't enforce a minimum alignment on the
> stack pointer.

On i386, you can only push/pop 2 or 4 bytes. On x86-64, you can only
push/pop 2 or 8 bytes.

> stack.  In the context of your proposal, defining STACK_BOUNDARY this way,
> as a requirement imposed on GCC by an ABI (or at least by convention),
> not the hardware, is important.  Without an ABI requirement, there's
> nothing that would prohibit an i386 leaf function from adjusting the
> stack in a way that leaves the stack 1- or 2-byte aligned.
> 

I don't mind changing the definition of STACK_BOUNDARY. It won't
affect our proposal. However, please don't use the ABI when defining
STACK_BOUNDARY, since a given piece of hardware can have more than one ABI
but only one STACK_BOUNDARY.


H.J.

* Re: A proposal to align GCC stack
@ 2007-12-19 11:52 Ross Ridge
  0 siblings, 0 replies; 27+ messages in thread
From: Ross Ridge @ 2007-12-19 11:52 UTC (permalink / raw)
  To: gcc

Andrew Pinski writes:
> Can we stop talking about x86/x86_64-specific issues here?

No.

>I have a use case for the PowerPC side of the Cell BE for variables
>greater than the normal stack boundary alignment of 16 bytes.  They need
>to be 128-byte aligned for DMA transfers to the SPUs.
>
>I already proposed a patch [1] to fix this use case but I have not
>seen many replies yet.

Complaining about someone talking about x86/x86_64 specific issues and
then bringing up a PowerPC/Cell specific issue is probably not the best
way to go about getting your patch approved.

						Ross Ridge

* Re: A proposal to align GCC stack
  2007-12-19  3:51 Ross Ridge
@ 2007-12-19 10:33 ` Andrew Pinski
  2007-12-20  9:32   ` Ye, Joey
  2007-12-20  9:11 ` Ye, Joey
  2008-03-20 20:18 ` Ye, Joey
  2 siblings, 1 reply; 27+ messages in thread
From: Andrew Pinski @ 2007-12-19 10:33 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On 12/18/07, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Look at it another way.  Let's say you were compiling x86_64 code with
> -fpreferred-stack-boundary=3, an 8-byte PREFERRED alignment.

Can we stop talking about x86/x86_64-specific issues here?  I have a
use case for the PowerPC side of the Cell BE for variables greater
than the normal stack boundary alignment of 16 bytes.  They need to be
128-byte aligned for DMA transfers to the SPUs.

I already proposed a patch [1] to fix this use case but I have not
seen many replies yet.

Thanks,
Andrew Pinski

[1] http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html
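
For reference, the kind of variable described above can be written in C11 as an over-aligned automatic object. This is a sketch assuming a compiler that honours _Alignas on locals beyond the ABI stack alignment (recent GCC and Clang releases accept this); the buffer name and size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical DMA staging buffer that must be 128-byte aligned,
   well above the 16-byte stack alignment the ABI guarantees. */
bool dma_buffer_is_aligned(void)
{
    _Alignas(128) unsigned char dma_buf[256];
    memset(dma_buf, 0, sizeof dma_buf);
    return (uintptr_t)dma_buf % 128 == 0;
}
```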

* Re: A proposal to align GCC stack
@ 2007-12-19 10:06 Ross Ridge
  2007-12-19 15:32 ` H.J. Lu
  0 siblings, 1 reply; 27+ messages in thread
From: Ross Ridge @ 2007-12-19 10:06 UTC (permalink / raw)
  To: gcc

H.J. Lu writes:
> What value did you use for -mpreferred-stack-boundary? The x86 backend
> defaults to 16 bytes.

On Windows the 16-byte default pretty much just wastes space, so I use
-mpreferred-stack-boundary=2 where it might make a difference.  In the
case where I wanted to use SSE vector instructions, I explicitly used
-mpreferred-stack-boundary=4 (16-byte alignment).
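
As a reminder of the units involved: per the GCC manual, -mpreferred-stack-boundary=N asks for a 2^N-byte alignment, while GCC's internal *_STACK_BOUNDARY values are in bits. A trivial C helper (names hypothetical) makes the conversions explicit.

```c
/* -mpreferred-stack-boundary=N requests 2^N bytes of alignment. */
unsigned long boundary_bytes(unsigned n)
{
    return 1UL << n;
}

/* The same boundary expressed in bits, as GCC's macros use. */
unsigned long boundary_bits(unsigned n)
{
    return 8UL << n;
}
```

So =2 is 4 bytes (32 bits) and =4 is 16 bytes (128 bits).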

>STACK_BOUNDARY is the minimum stack boundary. MAX(STACK_BOUNDARY,
>PREFERRED_STACK_BOUNDARY) == PREFERRED_STACK_BOUNDARY.  So the question is
>if we should assume INCOMING == PREFERRED_STACK_BOUNDARY in all cases:

Doing this would also remove need for ABI_STACK_BOUNDARY in your proposal.

>Pros:
>  1. Keep the current behaviour of -mpreferred-stack-boundary.
>
>Cons:
>  1. The generated code is incompatible with the other object files.

Well, your proposal wouldn't completely solve that problem, either.
You can't guarantee compatibility with object files compiled with different
values of -mpreferred-stack-boundary, including those compiled with the current
implementation, unless you assume the incoming stack is aligned to
the lowest value the flag can have and align the outgoing stack to the
highest value that the flag can have.

						Ross Ridge

* Re: A proposal to align GCC stack
@ 2007-12-19  9:13 Ross Ridge
  2007-12-19 14:30 ` H.J. Lu
  0 siblings, 1 reply; 27+ messages in thread
From: Ross Ridge @ 2007-12-19  9:13 UTC (permalink / raw)
  To: gcc

Ross Ridge writes:
> As I mentioned later in my message STACK_BOUNDARY shouldn't be defined in
> terms of hardware, but in terms of the ABI.  While the i386 allows the
> stack pointer to be set to any value, by convention the stack pointer
> is always kept 4-byte aligned at all times.  GCC should never generate
> code that would violate this requirement, even in leaf functions
> or transitorily during the prologue/epilogue.

H.J. Lu writes:
> From gcc internal manual

I'm suggesting a different definition of STACK_BOUNDARY which wouldn't,
if strictly followed, result in STACK_BOUNDARY being defined as 8 on
the i386.  The i386 hardware doesn't enforce a minimum alignment on the
stack pointer.

> Since x86 always pushes/pops the stack by decrementing/incrementing it
> by the address size, it makes sense to define STACK_BOUNDARY as the
> address size.

The i386 PUSH and POP instructions adjust the stack pointer by the
operand size of the instruction.  The address size of the instruction
has no effect.  For example, GCC should never generate code like this:

	pushw $0
	pushw %ax

because the stack is temporarily misaligned.  This could result in a
signal, trap, interrupt or other asynchronous handler using a misaligned
stack.  In the context of your proposal, defining STACK_BOUNDARY this way,
as a requirement imposed on GCC by an ABI (or at least by convention),
not the hardware, is important.  Without an ABI requirement, there's
nothing that would prohibit an i386 leaf function from adjusting the
stack in a way that leaves the stack 1- or 2-byte aligned.
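
A minimal C model of the point (stack-pointer arithmetic only; names hypothetical): each PUSH subtracts its operand size, and what matters is whether SP stays aligned after every push, not just at the end of the sequence. Two pushw instructions end 4-byte aligned but pass through a 2-byte-aligned state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulate a sequence of PUSHes (each decrements SP by its operand
   size in bytes) and report whether SP stays `align`-byte aligned
   after every single push. */
bool pushes_keep_alignment(uintptr_t sp, const unsigned *sizes,
                           size_t n, uintptr_t align)
{
    for (size_t i = 0; i < n; i++) {
        sp -= sizes[i];
        if (sp % align != 0)
            return false; /* an async handler here would see a misaligned SP */
    }
    return true;
}
```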

					Ross Ridge

* Re: A proposal to align GCC stack
@ 2007-12-19  3:51 Ross Ridge
  2007-12-19 10:33 ` Andrew Pinski
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Ross Ridge @ 2007-12-19  3:51 UTC (permalink / raw)
  To: gcc

Ross Ridge wrote:
> I'm currently using -fpreferred-stack-boundary without any trouble.
> Your proposal would in fact generate code to align stack when it's
> not necessary.  This would change the behaviour of
> -fpreferred-stack-boundary, hurting performance and that's unacceptable
> to me.

Ye, Joey writes:
> This proposal values correctness at first place. So when the compiler can't
> make sure a function is only called from functions with the same or bigger
> preferred-stack-boundary, it will conservatively align the stack. One
> optimization is to set INCOMING = PREFERRED for local functions. Do you
> think that is more acceptable?

Not really.  It might reduce the amount of unnecessary stack adjustment,
but the performance regression would remain.  Changing the behaviour of
-fpreferred-stack-boundary doesn't make it more correct.  It's supposed
to change the ABI, it works as documented and, yes, if it's misused it
will cause problems.  So will any number of GCC's ABI-changing options.

Look at it another way.  Let's say you were compiling x86_64 code with
-fpreferred-stack-boundary=3, an 8-byte PREFERRED alignment.  As you
know, this is different from the standard x86_64 ABI, which requires a
16-byte alignment.  Now with your proposal, GCC's behaviour won't
change, because it's safe to assume that the incoming stack is at least
8-byte aligned.  There should be no change in the code GCC generates,
with or without your proposal.  However, the outgoing stack won't be
16-byte aligned as the x86_64 ABI requires.  In this case, what also
doesn't change is the fact that mixing code compiled with different
-fpreferred-stack-boundary values doesn't work.  It's just as problematic
and unsafe as it was before.
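
The mismatch can be modelled in a few lines of C (a simplification that ignores the return-address push; names hypothetical). A caller built with a PREFERRED alignment of P bytes only promises that SP is a multiple of P, so a callee that assumes a larger alignment is unsafe whenever P is not a multiple of the callee's value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does one concrete stack pointer meet the callee's entry assumption? */
bool callee_assumption_holds(uintptr_t sp_at_call, uintptr_t assumed_bytes)
{
    return sp_at_call % assumed_bytes == 0;
}

/* Is every SP the caller may produce acceptable to the callee?
   The caller guarantees only multiples of caller_bytes. */
bool mixing_is_safe(uintptr_t caller_bytes, uintptr_t callee_bytes)
{
    return caller_bytes % callee_bytes == 0;
}
```

With an 8-byte caller and a 16-byte callee, an SP such as 0x7ff8 satisfies the caller's promise but not the callee's assumption.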

So when you said "this proposal values correctness at first place",
that really isn't true.  The proposal only addresses safety when the
preferred alignment is raised from the standard ABI's alignment.  You're
conservatively aligning the incoming stack, but not the outgoing stack.
You don't seem to be concerned about the problems that can arise when
the preferred alignment is lowered below the ABI's.  Why?  My guess is
that it's because "correctness" in this case would cause unacceptable
regressions when compiling the x86_64 Linux kernel.

If you can understand why it would be unacceptable to change how
-fpreferred-stack-boundary behaves when compiling the Linux kernel,
then maybe you can understand why I don't find it acceptable for it to
change when compiling my code.

					Ross Ridge

* Re: A proposal to align GCC stack
  2007-12-18 23:31 Ross Ridge
  2007-12-19  1:25 ` Robert Dewar
@ 2007-12-19  2:18 ` H.J. Lu
  1 sibling, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2007-12-19  2:18 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Tue, Dec 18, 2007 at 06:31:25PM -0500, Ross Ridge wrote:
> Ye, Joey writes: 
> >i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386
> >and 64 for x86_64. It is the minimum stack boundary. It is fixed.
> 
> Ross Ridge wrote:
> >Strictly speaking by the above definition it would be 8 for i386.
> >The hardware doesn't force the stack to be 32-bit aligned, it just
> >performs poorly if it isn't.
> 
> Robert Dewar writes:
> >First, although for some types, the accesses may work, the optimizer
> >is allowed to assume that data is properly aligned, and could possibly
> >generate incorrect code ...
> 
> That's not enforced by hardware.
> 
> >Second, I am pretty sure there are SSE types that require
> >alignment at the hardware level, even on the i386
> 
> This isn't a restriction on stack alignment.  It's a restriction on what
> kinds of machine types can be accessed on the stack.
> 
> As I mentioned later in my message STACK_BOUNDARY shouldn't be defined in
> terms of hardware, but in terms of the ABI.  While the i386 allows the
> stack pointer to be set to any value, by convention the stack pointer
> is always kept 4-byte aligned at all times.  GCC should never generate
> code that would violate this requirement, even in leaf functions
> or transitorily during the prologue/epilogue.

From gcc internal manual:

 -- Macro: STACK_BOUNDARY
     Define this macro to the minimum alignment enforced by hardware
     for the stack pointer on this machine.  The definition is a C
     expression for the desired alignment (measured in bits).  This
     value is used as a default if `PREFERRED_STACK_BOUNDARY' is not
     defined.  On most machines, this should be the same as
     `PARM_BOUNDARY'.

Since x86 always pushes/pops the stack by decrementing/incrementing it
by the address size, it makes sense to define STACK_BOUNDARY as the
address size. It has nothing to do with the application binary
interface (ABI).

> 
> This is different than the proposed ABI_STACK_BOUNDARY macro which defines

The proposed ABI_STACK_BOUNDARY defines the value specified by the various
psABIs to which gcc conforms.

> the possibly stricter alignment the ABI requires at function entry.  Since
> most i386 ABIs don't require a stricter alignment, that has meant that
> SSE types couldn't be located on the stack.  Currently you can get around
> this problem by changing the ABI using -fpreferred-stack-boundary or by

No, gcc works around by setting

ix86_preferred_stack_boundary = 128;

by default.

> forcing an SSE compatible alignment using -mstackrealign or __attribute__
> ((force_align_arg_pointer)).  Joey Ye's proposal is another solution
> to this problem where GCC would automatically force an SSE compatible
> alignment when SSE types are used on the stack.
> 

Our proposal isn't just "another" solution. It is a solution for generic
stack alignment problems.


H.J.

* Re: A proposal to align GCC stack
  2007-12-19  1:00 Ross Ridge
  2007-12-19  1:53 ` Ye, Joey
@ 2007-12-19  2:07 ` H.J. Lu
  1 sibling, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2007-12-19  2:07 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Tue, Dec 18, 2007 at 06:31:26PM -0500, Ross Ridge wrote:
> Ross Ridge wrote:
> > The -fpreferred-stack-boundary flag currently generates code that
> > assumes the stack aligned to the preferred alignment on function entry.
> > If you assume a worse incoming alignment you'll be aligning the stack
> > unnecessarily and generating code that this flag doesn't require.
> 
> H.J. Lu writes:
> > That is how we get into trouble in the first place. The only place I
> > think of where you can guarantee everything is compiled with the same
> > -fpreferred-stack-boundary is the kernel. Our proposal will align the stack
> > only when needed. PREFERRED_STACK_BOUNDARY > ABI_STACK_BOUNDARY will
> > generate a larger stack unnecessarily.
> 
> I'm currently using -fpreferred-stack-boundary without any trouble.

BTW, it is -mpreferred-stack-boundary. What value did you use for
-mpreferred-stack-boundary? The x86 backend defaults to 16 bytes.
The x86-64 psABI specifies 16-byte stack alignment, but the ia32 psABI
only specifies 4-byte stack alignment. That means that the object
files generated by gcc may be incompatible with libs or objects
compiled by other ia32 psABI-conforming compilers.

> Your proposal would in fact generate code to align stack when it's not
> necessary.  This would change the behaviour of -fpreferred-stack-boundary,
> hurting performance and that's unacceptable to me.
> 
> >> Ok, if people are using this flag to change the alignment to something
> >> smaller than used by the standard ABI, then INCOMING should be
> >> MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY).
> >
> > On x86-64, ABI_STACK_BOUNDARY is 16byte, but the Linux kernel may
> > want to use 8 byte for PREFERRED_STACK_BOUNDARY. INCOMING will
> > be MIN(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) == 8 byte.

A typo, I meant "INCOMING will be MIN(ABI_STACK_BOUNDARY,
PREFERRED_STACK_BOUNDARY) == 8".

> 
> Using MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) also equals 8 in that
> case and preserves the behaviour of -fpreferred-stack-boundary in every case.

STACK_BOUNDARY is the minimum stack boundary. MAX(STACK_BOUNDARY,
PREFERRED_STACK_BOUNDARY) == PREFERRED_STACK_BOUNDARY. So the question is
if we should assume INCOMING == PREFERRED_STACK_BOUNDARY in all cases:

Pros:
  1. Keep the current behaviour of -mpreferred-stack-boundary.

Cons:
  1. The generated code is incompatible with the other object files.

I guess we can live with that.


H.J.

* RE: A proposal to align GCC stack
  2007-12-19  1:00 Ross Ridge
@ 2007-12-19  1:53 ` Ye, Joey
  2007-12-19  2:07 ` H.J. Lu
  1 sibling, 0 replies; 27+ messages in thread
From: Ye, Joey @ 2007-12-19  1:53 UTC (permalink / raw)
  To: Ross Ridge, gcc

 
Ross Ridge wrote:
> I'm currently using -fpreferred-stack-boundary without any trouble.
> Your proposal would in fact generate code to align stack when it's not
> necessary.  This would change the behaviour of -fpreferred-stack-boundary,
> hurting performance and that's unacceptable to me.
This proposal values correctness at first place. So when the compiler can't
make sure a function is only called from functions with the same or bigger
preferred-stack-boundary, it will conservatively align the stack. One
optimization is to set INCOMING = PREFERRED for local functions. Do you
think that is more acceptable?

>> Ok, if people are using this flag to change the alignment to something
>> smaller than used by the standard ABI, then INCOMING should be
>> MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY).
>
> On x86-64, ABI_STACK_BOUNDARY is 16byte, but the Linux kernel may
> want to use 8 byte for PREFERRED_STACK_BOUNDARY. INCOMING will
> be MIN(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) == 8 byte.

> Using MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) also equals 8 in that
> case and preserves the behaviour of -fpreferred-stack-boundary in every case.
I think HJ means MIN(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY). 
MAX(ABI, PREFERRED) == 16 in this case.
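
Plugging in the kernel numbers from this exchange, in bits (STACK_BOUNDARY = 64 per Joey's definition for x86_64, ABI_STACK_BOUNDARY = 128, PREFERRED_STACK_BOUNDARY = 64), shows why the two MAX expressions differ. The enum names are just shorthand for this sketch.

```c
/* Kernel case on x86-64, all values in bits. */
enum { STACK_B = 64, ABI_B = 128, PREF_B = 64 };

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

unsigned min_abi_pref(void)   { return min_u(ABI_B, PREF_B); }   /* 64 */
unsigned max_stack_pref(void) { return max_u(STACK_B, PREF_B); } /* 64 */
unsigned max_abi_pref(void)   { return max_u(ABI_B, PREF_B); }   /* 128 */
```

MIN(ABI, PREFERRED) and MAX(STACK_BOUNDARY, PREFERRED) agree at 64 bits (8 bytes); only MAX(ABI, PREFERRED) gives 128 bits (16 bytes).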

Thanks - Joey

* Re: A proposal to align GCC stack
  2007-12-18 23:31 Ross Ridge
@ 2007-12-19  1:25 ` Robert Dewar
  2007-12-19  2:18 ` H.J. Lu
  1 sibling, 0 replies; 27+ messages in thread
From: Robert Dewar @ 2007-12-19  1:25 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

Ross Ridge wrote:
> Ye, Joey writes: 
>> i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386
>> and 64 for x86_64. It is the minimum stack boundary. It is fixed.
> 
> Ross Ridge wrote:
>> Strictly speaking by the above definition it would be 8 for i386.
>> The hardware doesn't force the stack to be 32-bit aligned, it just
>> performs poorly if it isn't.
> 
> Robert Dewar writes:
>> First, although for some types, the accesses may work, the optimizer
>> is allowed to assume that data is properly aligned, and could possibly
>> generate incorrect code ...
> 
> That's not enforced by hardware.

But suppose we have something like int(&k) & 1. The optimizer
is permitted to replace this with 0 if it knows that the type
of k is four byte aligned.
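
Robert's example, written out as a small C sketch (names hypothetical). Because C requires an int object to be suitably aligned, and int alignment is at least 2 on mainstream targets, the expression is 0 for any real int, and an optimizer that knows the alignment is entitled to fold it to a constant. If the stack were actually misaligned, the folded code and the run-time address would disagree.

```c
#include <stdint.h>

/* Low bit of an int's address: 0 whenever the int is at least
   2-byte aligned, which the optimizer is allowed to assume. */
unsigned low_bit_of_address(const int *p)
{
    return (unsigned)((uintptr_t)p & 1u);
}

unsigned low_bit_of_local(void)
{
    int k = 0;
    return low_bit_of_address(&k);
}
```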
> 
>> Second, I am pretty sure there are SSE types that require
>> alignment at the hardware level, even on the i386
> 
> This isn't a restriction on stack alignment.  It's a restriction on what
> kinds of machine types can be accessed on the stack.

Well if we have local variables of type float (and we have specified
use of SSE), we are in trouble, no?

> This is different than the proposed ABI_STACK_BOUNDARY macro which defines
> the possibly stricter alignment the ABI requires at function entry.  Since
> most i386 ABIs don't require a stricter alignment, that has meant that
> SSE types couldn't be located on the stack.  Currently you can get around
> this problem by changing the ABI using -fpreferred-stack-boundary or by
> forcing an SSE compatible alignment using -mstackrealign or __attribute__
> ((force_align_arg_pointer)).  Joey Ye's proposal is another solution
> to this problem where GCC would automatically force an SSE compatible
> alignment when SSE types are used on the stack.

right ...
> 
> 					Ross Ridge

* Re: A proposal to align GCC stack
@ 2007-12-19  1:00 Ross Ridge
  2007-12-19  1:53 ` Ye, Joey
  2007-12-19  2:07 ` H.J. Lu
  0 siblings, 2 replies; 27+ messages in thread
From: Ross Ridge @ 2007-12-19  1:00 UTC (permalink / raw)
  To: gcc

Ross Ridge wrote:
> The -fpreferred-stack-boundary flag currently generates code that
> assumes the stack aligned to the preferred alignment on function entry.
> If you assume a worse incoming alignment you'll be aligning the stack
> unnecessarily and generating code that this flag doesn't require.

H.J. Lu writes:
> That is how we get into trouble in the first place. The only place I
> think of where you can guarantee everything is compiled with the same
> -fpreferred-stack-boundary is the kernel. Our proposal will align the stack
> only when needed. PREFERRED_STACK_BOUNDARY > ABI_STACK_BOUNDARY will
> generate a larger stack unnecessarily.

I'm currently using -fpreferred-stack-boundary without any trouble.
Your proposal would in fact generate code to align stack when it's not
necessary.  This would change the behaviour of -fpreferred-stack-boundary,
hurting performance and that's unacceptable to me.

>> Ok, if people are using this flag to change the alignment to something
>> smaller than used by the standard ABI, then INCOMING should be
>> MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY).
>
> On x86-64, ABI_STACK_BOUNDARY is 16byte, but the Linux kernel may
> want to use 8 byte for PREFERRED_STACK_BOUNDARY. INCOMING will
> be MIN(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) == 8 byte.

Using MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) also equals 8 in that
case and preserves the behaviour of -fpreferred-stack-boundary in every case.

					Ross Ridge

* Re: A proposal to align GCC stack
@ 2007-12-18 23:31 Ross Ridge
  2007-12-19  1:25 ` Robert Dewar
  2007-12-19  2:18 ` H.J. Lu
  0 siblings, 2 replies; 27+ messages in thread
From: Ross Ridge @ 2007-12-18 23:31 UTC (permalink / raw)
  To: gcc

Ye, Joey writes: 
>i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386
>and 64 for x86_64. It is the minimum stack boundary. It is fixed.

Ross Ridge wrote:
>Strictly speaking by the above definition it would be 8 for i386.
>The hardware doesn't force the stack to be 32-bit aligned, it just
>performs poorly if it isn't.

Robert Dewar writes:
>First, although for some types, the accesses may work, the optimizer
>is allowed to assume that data is properly aligned, and could possibly
>generate incorrect code ...

That's not enforced by hardware.

>Second, I am pretty sure there are SSE types that require
>alignment at the hardware level, even on the i386

This isn't a restriction on stack alignment.  It's a restriction on what
kinds of machine types can be accessed on the stack.

As I mentioned later in my message STACK_BOUNDARY shouldn't be defined in
terms of hardware, but in terms of the ABI.  While the i386 allows the
stack pointer to be set to any value, by convention the stack pointer
is always kept 4-byte aligned at all times.  GCC should never generate
code that would violate this requirement, even in leaf functions
or transitorily during the prologue/epilogue.

This is different than the proposed ABI_STACK_BOUNDARY macro which defines
the possibly stricter alignment the ABI requires at function entry.  Since
most i386 ABIs don't require a stricter alignment, that has meant that
SSE types couldn't be located on the stack.  Currently you can get around
this problem by changing the ABI using -fpreferred-stack-boundary or by
forcing an SSE compatible alignment using -mstackrealign or __attribute__
((force_align_arg_pointer)).  Joey Ye's proposal is another solution
to this problem where GCC would automatically force an SSE compatible
alignment when SSE types are used on the stack.
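
For comparison with the automatic approach, this is roughly what the per-function workaround looks like in C. force_align_arg_pointer is a GCC function attribute on x86, and the -mstackrealign option applies the same kind of realigning prologue program-wide. The macro is a hypothetical convenience that expands to nothing on other targets, and the function is an illustrative callback, not taken from the thread.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Realign the stack on entry; for functions that may be called from
   code keeping only 4-byte stack alignment. */
#define REALIGN_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ON_ENTRY /* not available: no-op on other targets */
#endif

/* Hypothetical callback exported to code we don't control. */
REALIGN_ON_ENTRY
float sum4(const float *in)
{
    /* A 16-byte-aligned local, as an SSE vector spill slot would need. */
    _Alignas(16) float tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = in[i];
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```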

					Ross Ridge

* Re: A proposal to align GCC stack
  2007-12-18 13:52 ` Daniel Jacobowitz
@ 2007-12-18 18:05   ` H.J. Lu
  0 siblings, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2007-12-18 18:05 UTC (permalink / raw)
  To: Ross Ridge, gcc

On Tue, Dec 18, 2007 at 08:47:35AM -0500, Daniel Jacobowitz wrote:
> On Mon, Dec 17, 2007 at 11:25:35PM -0500, Ross Ridge wrote:
> > >//  Reserve two stack slots and save return address 
> > >//  and previous frame pointer into them. By
> > >//  pointing new ebp to them, we build a pseudo 
> > >//  stack for unwinding
> > 
> > Hmmm... I don't know much about the DWARF unwind information, but
> > couldn't it handle this case without creating the "pseudo frame"?
> > Or at least be extended so it could?
> 
> In practice, there are non-DWARF unwinders scattered all over that
> work on i386 and folks want to keep them working.  DWARF has no
> trouble handling this sort of thing.
> 

Another thing is that we may need to update the prologue analyzer in gdb
to support the new prologue.


H.J.

* Re: A proposal to align GCC stack
  2007-12-18 11:55 Ross Ridge
@ 2007-12-18 16:14 ` H.J. Lu
  0 siblings, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2007-12-18 16:14 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Tue, Dec 18, 2007 at 03:39:42AM -0500, Ross Ridge wrote:
> >> changes the ABI.  According to your definitions, I would think
> >> that INCOMING should be ABI_STACK_BOUNDARY in the first case,
> >> and MAX(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the second.
> >
> > That isn't true since some .o files may not be compiled with
> > -fpreferred-stack-boundary or with a different value of
> > -fpreferred-stack-boundary.
> 
> Like with any ABI changing flag, that's not supported:
> 
> 	... Further, every function must be generated such that it keeps
> 	the stack aligned.  Thus calling a function compiled with a higher
> 	preferred stack boundary from a function compiled with a lower
> 	preferred stack boundary will most likely misalign the stack.
> 
> The -fpreferred-stack-boundary flag currently generates code that
> assumes the stack is aligned to the preferred alignment on function entry.
> If you assume a worse incoming alignment you'll be aligning the stack
> unnecessarily and generating code that this flag doesn't require.

That is how we get into trouble in the first place. The only place
I can think of where you can guarantee everything is compiled with the
same -fpreferred-stack-boundary is the kernel. Our proposal will
align the stack only when needed. PREFERRED_STACK_BOUNDARY >
ABI_STACK_BOUNDARY will generate a larger stack unnecessarily.

We have considered adding a new option, -fincoming-stack-boundary.
But we need to consider local/global functions as well as function
pointers. If a function can only be called locally, its incoming
stack boundary will be PREFERRED_STACK_BOUNDARY. Otherwise, its
incoming stack boundary will be MIN(INCOMING_STACK_BOUNDARY,
INCOMING_STACK_BOUNDARY,PREFERRED_STACK_BOUNDARY). We aren't sure
if its benefit will be worth its complexity.

> 
> > On x86-64, ABI_STACK_BOUNDARY is 16byte, but the Linux kernel may
> > want to use 8 byte for PREFERRED_STACK_BOUNDARY.
> 
> Ok, if people are using this flag to change the alignment to something
> smaller than used by the standard ABI, then INCOMING should be
> MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY).

On x86-64, ABI_STACK_BOUNDARY is 16 bytes, but the Linux kernel may
want to use 8 bytes for PREFERRED_STACK_BOUNDARY. INCOMING will
be MIN(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) == 8 bytes.

H.J.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: A proposal to align GCC stack
  2007-12-18  4:29 Ross Ridge
  2007-12-18  6:15 ` H.J. Lu
  2007-12-18 13:52 ` Daniel Jacobowitz
@ 2007-12-18 14:41 ` Robert Dewar
  2 siblings, 0 replies; 27+ messages in thread
From: Robert Dewar @ 2007-12-18 14:41 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

Ross Ridge wrote:
> Ye, Joey writes:
>> i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386
>> and 64 for x86_64. It is the minimum stack boundary. It is fixed.
> 
> Strictly speaking by the above definition it would be 8 for i386.
> The hardware doesn't force the stack to be 32-bit aligned, it just
> performs poorly if it isn't.

This seems wrong to me.

First, although for some types the accesses may work, the optimizer
is allowed to assume that data is properly aligned, and could possibly
generate incorrect code (in Ada it is formally erroneous to have any
variable that is not properly aligned to its type's alignment, unless
the alignment is specifically set to some other value).

Second, I am pretty sure there are SSE types that require
alignment at the hardware level, even on the i386.
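Both points can be seen in a small C sketch, assuming an x86 target with SSE and GCC's <xmmintrin.h>: the unaligned load `_mm_loadu_ps` (MOVUPS) accepts any address, while the aligned load `_mm_load_ps` (MOVAPS) requires a 16-byte-aligned address and faults otherwise. That hardware requirement is what makes __m128 locals on a 4-byte-aligned i386 stack a problem. The function names are illustrative only.

```c
#include <xmmintrin.h>

/* Sum the four lanes of an __m128 by storing it to an aligned buffer. */
static float hsum(__m128 v)
{
    float out[4] __attribute__((aligned(16)));
    _mm_store_ps(out, v);            /* aligned store: out is 16-byte aligned */
    return out[0] + out[1] + out[2] + out[3];
}

/* Aligned load from a 16-byte-aligned address: legal for MOVAPS. */
float sum_aligned(void)
{
    float buf[8] __attribute__((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};
    return hsum(_mm_load_ps(buf));   /* lanes 1, 2, 3, 4 */
}

/* Misaligned address: _mm_load_ps(p) could fault here, the unaligned
   load (MOVUPS) is fine. */
float sum_misaligned(void)
{
    float buf[8] __attribute__((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};
    const float *p = buf + 1;        /* 4 bytes past a 16-byte boundary */
    return hsum(_mm_loadu_ps(p));    /* lanes 2, 3, 4, 5 */
}
```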

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: A proposal to align GCC stack
  2007-12-18  4:29 Ross Ridge
  2007-12-18  6:15 ` H.J. Lu
@ 2007-12-18 13:52 ` Daniel Jacobowitz
  2007-12-18 18:05   ` H.J. Lu
  2007-12-18 14:41 ` Robert Dewar
  2 siblings, 1 reply; 27+ messages in thread
From: Daniel Jacobowitz @ 2007-12-18 13:52 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Mon, Dec 17, 2007 at 11:25:35PM -0500, Ross Ridge wrote:
> >//  Reserve two stack slots and save return address 
> >//  and previous frame pointer into them. By
> >//  pointing new ebp to them, we build a pseudo 
> >//  stack for unwinding
> 
> Hmmm... I don't know much about the DWARF unwind information, but
> couldn't it handle this case without creating the "pseudo frame"?
> Or at least be extended so it could?

In practice, there are non-DWARF unwinders scattered all over that
work on i386 and folks want to keep them working.  DWARF has no
trouble handling this sort of thing.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: A proposal to align GCC stack
@ 2007-12-18 11:55 Ross Ridge
  2007-12-18 16:14 ` H.J. Lu
  0 siblings, 1 reply; 27+ messages in thread
From: Ross Ridge @ 2007-12-18 11:55 UTC (permalink / raw)
  To: gcc

Ross Ridge writes:
> This section doesn't make sense to me.  The force_align_arg_pointer
> attribute and -mstackrealign assume that the ABI is being
> followed, while the -fpreferred-stack-boundary option effectively

"H.J. Lu" <hjl at lucon dot org> writes
> According to the Apple engineer who implemented -mstackrealign,
> on MacOS/ia32 the psABI is 16 bytes, but -mstackrealign will assume
> 4 bytes, which is STACK_BOUNDARY.

Ok.  The important thing is that, for backwards compatibility, it needs
to continue to assume 4-byte alignment on entry and align the stack to
a 16-byte alignment on x86 targets, so that makes more sense.

>> changes the ABI.  According to your definitions, I would think
>> that INCOMING should be ABI_STACK_BOUNDARY in the first case,
>> and MAX(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the second.
>
> That isn't true since some .o files may not be compiled with
> -fpreferred-stack-boundary or with a different value of
> -fpreferred-stack-boundary.

Like with any ABI changing flag, that's not supported:

	... Further, every function must be generated such that it keeps
	the stack aligned.  Thus calling a function compiled with a higher
	preferred stack boundary from a function compiled with a lower
	preferred stack boundary will most likely misalign the stack.

The -fpreferred-stack-boundary flag currently generates code that
assumes the stack is aligned to the preferred alignment on function entry.
If you assume a worse incoming alignment you'll be aligning the stack
unnecessarily and generating code that this flag doesn't require.

> On x86-64, ABI_STACK_BOUNDARY is 16byte, but the Linux kernel may
> want to use 8 byte for PREFERRED_STACK_BOUNDARY.

Ok, if people are using this flag to change the alignment to something
smaller than used by the standard ABI, then INCOMING should be
MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY).

					Ross Ridge

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: A proposal to align GCC stack
  2007-12-18  6:15 ` H.J. Lu
@ 2007-12-18  7:50   ` Ye, Joey
  0 siblings, 0 replies; 27+ messages in thread
From: Ye, Joey @ 2007-12-18  7:50 UTC (permalink / raw)
  To: H.J. Lu, Ross Ridge; +Cc: gcc

Ross, HJ,

> 
> >Because I386 PIC requires BX as GOT pointer and I386 may use AX, DX
> >and CX as parameter passing registers, there are limited candidates for
> >this proposal to choose. Current proposal suggests EDI, because it won't
> >conflict with i386 PIC or regparm.
> 
> Could you pick a call-clobbered register in cases where one is available?
I think it is doable. In the Apple engineer's current code supporting -mstackrealign,
hard register ECX is used. We need to add code to find a caller-saved
register that is not used to pass parameters. If none of them is available,
we still have to use a callee-saved register like EDI.
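A rough sketch of that selection logic, in C. Everything here is a hypothetical model: the register enum, the function name `pick_realign_reg`, and the candidate order (ECX first, since that is what the -mstackrealign code uses) are assumptions. The facts it encodes are that regparm passes up to three arguments in EAX, EDX and ECX, in that order, and that nested functions receive the static chain in ECX by default.

```c
#include <stdbool.h>

/* Hypothetical model of the i386 registers relevant to this choice. */
enum reg { REG_EAX, REG_EDX, REG_ECX, REG_EDI, REG_NONE };

/* regparm(n) passes the first n arguments in EAX, EDX, ECX. */
static const enum reg regparm_order[3] = { REG_EAX, REG_EDX, REG_ECX };

/* Pick a register to hold the incoming argument pointer during
   realignment: prefer a call-clobbered register that carries no incoming
   value; otherwise fall back to callee-saved EDI, which the prologue must
   then save and restore. */
enum reg pick_realign_reg(int regparm, bool uses_static_chain)
{
    bool busy[REG_NONE] = { false };
    static const enum reg candidates[3] = { REG_ECX, REG_EDX, REG_EAX };

    for (int i = 0; i < regparm && i < 3; i++)
        busy[regparm_order[i]] = true;
    if (uses_static_chain)
        busy[REG_ECX] = true;        /* nested function: static chain in ECX */

    for (int i = 0; i < 3; i++)
        if (!busy[candidates[i]])
            return candidates[i];
    return REG_EDI;
}
```

With no register parameters this returns ECX; with regparm(3), or with regparm(2) plus a static chain, every call-clobbered candidate is taken and it falls back to EDI.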

> 
> >//  Reserve two stack slots and save return address 
> >//  and previous frame pointer into them. By
> >//  pointing new ebp to them, we build a pseudo 
> >//  stack for unwinding
> 
> Hmmm... I don't know much about the DWARF unwind information, but
> couldn't it handle this case without creating the "pseudo frame"?
> Or at least be extended so it could?

I haven't spent time investigating it yet. I agree it would be much cleaner
without the "pseudo frame", and I will be happy if a solution can be found or suggested here.
But I doubt it is a worthwhile effort. Remember that the "pseudo frame" is generated only when
stack adjustment and alloca are both present, so it may not be common enough to impact
performance.


-----Original Message-----
From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of H.J. Lu
Sent: December 18, 2007 13:17
To: Ross Ridge
Cc: gcc@gcc.gnu.org
Subject: Re: A proposal to align GCC stack

On Mon, Dec 17, 2007 at 11:25:35PM -0500, Ross Ridge wrote:
> Ye, Joey writes:
> >i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386
> >and 64 for x86_64. It is the minimum stack boundary. It is fixed.
> 
> Strictly speaking by the above definition it would be 8 for i386.
> The hardware doesn't force the stack to be 32-bit aligned, it just
> performs poorly if it isn't.

We can change the wording.

> 
> >v. INCOMING_STACK_BOUNDARY in bits, which is the stack boundary
> >at function entry. If a function is marked with __attribute__
> >((force_align_arg_pointer)) or -mstackrealign option is provided,
> >INCOMING = STACK_BOUNDARY.  Otherwise, INCOMING == MIN(ABI_STACK_BOUNDARY,
> >PREFERRED_STACK_BOUNDARY) because a function can be called via psABI
> >externally or called locally with PREFERRED_STACK_BOUNDARY.
> 
> This section doesn't make sense to me.  The force_align_arg_pointer
> attribute and -mstackrealign assume that the ABI is being
> followed, while the -fpreferred-stack-boundary option effectively

According to the Apple engineer who implemented -mstackrealign,
on MacOS/ia32 the psABI is 16 bytes, but -mstackrealign will assume
4 bytes, which is STACK_BOUNDARY.

> changes the ABI.  According to your definitions, I would think
> that INCOMING should be ABI_STACK_BOUNDARY in the first case,
> and MAX(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the second.

That isn't true since some .o files may not be compiled with
-fpreferred-stack-boundary or with a different value of
-fpreferred-stack-boundary.

> (Or just PREFERRED_STACK_BOUNDARY because a boundary less than the ABI's
> should be rejected during command line processing.)

On x86-64, ABI_STACK_BOUNDARY is 16byte, but the Linux kernel may
want to use 8 byte for PREFERRED_STACK_BOUNDARY.

> 
> >vi. REQUIRED_STACK_ALIGNMENT in bits, which is stack alignment required
> >by local variables and calling other function. REQUIRED_STACK_ALIGNMENT
> >== MAX(LOCAL_STACK_BOUNDARY,PREFERRED_STACK_BOUNDARY) in case of a
> >non-leaf function. For a leaf function, REQUIRED_STACK_ALIGNMENT ==
> >LOCAL_STACK_BOUNDARY.
> 
> Hmm... I think you should define STACK_BOUNDARY as the minimum
> alignment that ABI requires the stack pointer to keep at all times.
> ABI_STACK_BOUNDARY should be defined as the stack alignment the
> ABI requires at function entry.  In that case a leaf function's
> REQUIRED_STACK_ALIGNMENT should be MAX(LOCAL_STACK_BOUNDARY,
> STACK_BOUNDARY).

That is true, since if the only local variable is a char, LOCAL_STACK_BOUNDARY
will be 8 bits (1 byte). But we want the stack to be aligned at STACK_BOUNDARY.
We will update our proposal.



H.J.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: A proposal to align GCC stack
  2007-12-18  4:29 Ross Ridge
@ 2007-12-18  6:15 ` H.J. Lu
  2007-12-18  7:50   ` Ye, Joey
  2007-12-18 13:52 ` Daniel Jacobowitz
  2007-12-18 14:41 ` Robert Dewar
  2 siblings, 1 reply; 27+ messages in thread
From: H.J. Lu @ 2007-12-18  6:15 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Mon, Dec 17, 2007 at 11:25:35PM -0500, Ross Ridge wrote:
> Ye, Joey writes:
> >i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386
> >and 64 for x86_64. It is the minimum stack boundary. It is fixed.
> 
> Strictly speaking by the above definition it would be 8 for i386.
> The hardware doesn't force the stack to be 32-bit aligned, it just
> performs poorly if it isn't.

We can change the wording.

> 
> >v. INCOMING_STACK_BOUNDARY in bits, which is the stack boundary
> >at function entry. If a function is marked with __attribute__
> >((force_align_arg_pointer)) or -mstackrealign option is provided,
> >INCOMING = STACK_BOUNDARY.  Otherwise, INCOMING == MIN(ABI_STACK_BOUNDARY,
> >PREFERRED_STACK_BOUNDARY) because a function can be called via psABI
> >externally or called locally with PREFERRED_STACK_BOUNDARY.
> 
> This section doesn't make sense to me.  The force_align_arg_pointer
> attribute and -mstackrealign assume that the ABI is being
> followed, while the -fpreferred-stack-boundary option effectively

According to the Apple engineer who implemented -mstackrealign,
on MacOS/ia32 the psABI is 16 bytes, but -mstackrealign will assume
4 bytes, which is STACK_BOUNDARY.

> changes the ABI.  According to your definitions, I would think
> that INCOMING should be ABI_STACK_BOUNDARY in the first case,
> and MAX(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the second.

That isn't true since some .o files may not be compiled with
-fpreferred-stack-boundary or with a different value of
-fpreferred-stack-boundary.

> (Or just PREFERRED_STACK_BOUNDARY because a boundary less than the ABI's
> should be rejected during command line processing.)

On x86-64, ABI_STACK_BOUNDARY is 16byte, but the Linux kernel may
want to use 8 byte for PREFERRED_STACK_BOUNDARY.

> 
> >vi. REQUIRED_STACK_ALIGNMENT in bits, which is stack alignment required
> >by local variables and calling other function. REQUIRED_STACK_ALIGNMENT
> >== MAX(LOCAL_STACK_BOUNDARY,PREFERRED_STACK_BOUNDARY) in case of a
> >non-leaf function. For a leaf function, REQUIRED_STACK_ALIGNMENT ==
> >LOCAL_STACK_BOUNDARY.
> 
> Hmm... I think you should define STACK_BOUNDARY as the minimum
> alignment that ABI requires the stack pointer to keep at all times.
> ABI_STACK_BOUNDARY should be defined as the stack alignment the
> ABI requires at function entry.  In that case a leaf function's
> REQUIRED_STACK_ALIGNMENT should be MAX(LOCAL_STACK_BOUNDARY,
> STACK_BOUNDARY).

That is true, since if the only local variable is a char, LOCAL_STACK_BOUNDARY
will be 8 bits (1 byte). But we want the stack to be aligned at STACK_BOUNDARY.
We will update our proposal.

> 
> >Because I386 PIC requires BX as GOT pointer and I386 may use AX, DX
> >and CX as parameter passing registers, there are limited candidates for
> >this proposal to choose. Current proposal suggests EDI, because it won't
> >conflict with i386 PIC or regparm.
> 
> Could you pick a call-clobbered register in cases where one is available?

Joey, Xuepeng, is that doable?

> 
> >//  Reserve two stack slots and save return address 
> >//  and previous frame pointer into them. By
> >//  pointing new ebp to them, we build a pseudo 
> >//  stack for unwinding
> 
> Hmmm... I don't know much about the DWARF unwind information, but
> couldn't it handle this case without creating the "pseudo frame"?
> Or at least be extended so it could?


Joey, Xuepeng, what do you think?


H.J.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: A proposal to align GCC stack
@ 2007-12-18  4:29 Ross Ridge
  2007-12-18  6:15 ` H.J. Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Ross Ridge @ 2007-12-18  4:29 UTC (permalink / raw)
  To: gcc

Ye, Joey writes:
>i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386
>and 64 for x86_64. It is the minimum stack boundary. It is fixed.

Strictly speaking by the above definition it would be 8 for i386.
The hardware doesn't force the stack to be 32-bit aligned, it just
performs poorly if it isn't.

>v. INCOMING_STACK_BOUNDARY in bits, which is the stack boundary
>at function entry. If a function is marked with __attribute__
>((force_align_arg_pointer)) or -mstackrealign option is provided,
>INCOMING = STACK_BOUNDARY.  Otherwise, INCOMING == MIN(ABI_STACK_BOUNDARY,
>PREFERRED_STACK_BOUNDARY) because a function can be called via psABI
>externally or called locally with PREFERRED_STACK_BOUNDARY.

This section doesn't make sense to me.  The force_align_arg_pointer
attribute and -mstackrealign assume that the ABI is being
followed, while the -fpreferred-stack-boundary option effectively
changes the ABI.  According to your definitions, I would think
that INCOMING should be ABI_STACK_BOUNDARY in the first case,
and MAX(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the second.
(Or just PREFERRED_STACK_BOUNDARY because a boundary less than the ABI's
should be rejected during command line processing.)

>vi. REQUIRED_STACK_ALIGNMENT in bits, which is stack alignment required
>by local variables and calling other function. REQUIRED_STACK_ALIGNMENT
>== MAX(LOCAL_STACK_BOUNDARY,PREFERRED_STACK_BOUNDARY) in case of a
>non-leaf function. For a leaf function, REQUIRED_STACK_ALIGNMENT ==
>LOCAL_STACK_BOUNDARY.

Hmm... I think you should define STACK_BOUNDARY as the minimum
alignment that ABI requires the stack pointer to keep at all times.
ABI_STACK_BOUNDARY should be defined as the stack alignment the
ABI requires at function entry.  In that case a leaf function's
REQUIRED_STACK_ALIGNMENT should be MAX(LOCAL_STACK_BOUNDARY,
STACK_BOUNDARY).

>Because I386 PIC requires BX as GOT pointer and I386 may use AX, DX
>and CX as parameter passing registers, there are limited candidates for
>this proposal to choose. Current proposal suggests EDI, because it won't
>conflict with i386 PIC or regparm.

Could you pick a call-clobbered register in cases where one is available?

>//  Reserve two stack slots and save return address 
>//  and previous frame pointer into them. By
>//  pointing new ebp to them, we build a pseudo 
>//  stack for unwinding

Hmmm... I don't know much about the DWARF unwind information, but
couldn't it handle this case without creating the "pseudo frame"?
Or at least be extended so it could?

					Ross Ridge

^ permalink raw reply	[flat|nested] 27+ messages in thread

* A proposal to align GCC stack
@ 2007-12-18  4:25 Ye, Joey
  2007-12-21 20:25 ` Christian Schüler
  0 siblings, 1 reply; 27+ messages in thread
From: Ye, Joey @ 2007-12-18  4:25 UTC (permalink / raw)
  To: gcc; +Cc: H.J. Lu, xuepeng.guo

-- 0. MOTIVATION --
Some local variables (such as those of __m128 type or marked with an
alignment attribute) require the stack to be aligned at a boundary larger
than the default stack boundary. Current GCC supports this only partially,
with limitations. We are proposing a new design to fully solve the problem.

-- 1. CURRENT IMPLEMENTATION --
There are two ways current GCC supports larger-than-default stack
alignment.  One is to make sure that the stack is aligned at the program
entry point, and then ensure that for each non-leaf function, its frame
size is aligned. This approach doesn't work when linking with libraries or
objects compiled by other psABI-conforming compilers. Some problems are
logged as PR 33721. The other is to adjust stack alignment at the entry
point of a function if it is marked with __attribute__
((force_align_arg_pointer)) or the -mstackrealign option is provided. This
method guarantees the alignment in most cases but has the following
problems and limitations:

*  Only 16-byte alignment is supported
*  Adjusting stack alignment in every function prologue hurts performance
   unnecessarily, because not all functions need a larger alignment. In
   fact, usually only functions that define SSE variables locally (either
   declared by the user or compiler-generated internal temporaries) need
   the corresponding alignment.
*  Doesn't support x86_64 when the required stack alignment is > 16 bytes
*  Emits inefficient and complicated prologue/epilogue code to adjust
   stack alignment
*  Doesn't work with nested functions
*  Has a bug handling register parameters, which resulted in a CPU2006
   failure. A patch is available as a workaround.

-- 2. NEW PROPOSAL: DESIGN --
Here, we propose a new design to fully support stack alignment while
overcoming the above problems. The new design will
*  Support arbitrary alignment values, including 4, 8, 16, 32, ...
*  Adjust function stack alignment only when necessary
*  Initially be developed on i386 and x86_64, but be extensible to other
   platforms
*  Emit more efficient prologue/epilogue code
*  Coexist with special features like dynamic stack allocation (alloca),
   nested functions, register parameter passing, PIC code and tail call
   optimization
*  Be able to debug and unwind the stack

2.1 Support arbitrary alignment value
Different source code and optimizations require different stack
alignments, as in the following table:

Feature          Alignment (bytes)
i386 ABI         4
x86_64 ABI       16
char             1
short            2
int              4
long             4 (i386), 8 (x86_64)
long long        8
__m64            8
__m128           16
float            4
double           8
long double      4 (i386), 16 (x86_64)
user specified   any power of 2

The new design will support any alignment value in this table.
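Several rows of the table can be spot-checked with C11 `alignof` on an x86 target. A small sketch, assuming GCC (or a compatible compiler) with <xmmintrin.h>; the helper names are hypothetical. The last two functions show the case this proposal is about: a local whose alignment (32 or 16 bytes) exceeds what the stack is guaranteed to have at entry (4 bytes on i386 per the table), so the compiler has to realign the frame at run time.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Nonzero if address p is a multiple of align (a power of 2). */
static int aligned_to(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* A user-specified 32-byte alignment on a local variable. */
int local_is_aligned32(void)
{
    alignas(32) volatile char buf[32];
    buf[0] = 0;
    return aligned_to((const void *)buf, 32);
}

/* An __m128 local, which needs 16 bytes. */
int m128_local_is_aligned16(void)
{
    volatile __m128 m = _mm_set_ps1(0.f);
    return aligned_to((const void *)&m, 16);
}
```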

2.2 Adjust function stack alignment only when necessary

Current GCC defines the following macros related to stack alignment:
i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386 and
64 for x86_64. It is the minimum stack boundary. It is fixed.
ii. PREFERRED_STACK_BOUNDARY. It sets the stack alignment when calling a
function. It may be set on the command line and has no impact on stack
alignment at function entry. This proposal requires PREFERRED >= STACK,
and by default sets it to ABI_STACK_BOUNDARY.

This design will define a few more macros, or concepts not explicitly
defined in code:
iii. ABI_STACK_BOUNDARY in bits, which is the stack boundary specified by
the psABI, 32 for i386 and 128 for x86_64.  ABI_STACK_BOUNDARY >=
STACK_BOUNDARY. It is fixed for a given psABI.
iv. LOCAL_STACK_BOUNDARY in bits. Each function has its own stack
alignment requirement, which depends on the alignment of its stack
variables: LOCAL_STACK_BOUNDARY = MAX (alignment of each effective stack
variable).
v. INCOMING_STACK_BOUNDARY in bits, which is the stack boundary at
function entry. If a function is marked with __attribute__
((force_align_arg_pointer)) or the -mstackrealign option is provided,
INCOMING = STACK_BOUNDARY.  Otherwise, INCOMING ==
MIN(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY), because a function can
be called externally via the psABI or locally with
PREFERRED_STACK_BOUNDARY.
vi. REQUIRED_STACK_ALIGNMENT in bits, which is the stack alignment
required by local variables and by calls to other functions.
REQUIRED_STACK_ALIGNMENT == MAX(LOCAL_STACK_BOUNDARY,
PREFERRED_STACK_BOUNDARY) for a non-leaf function. For a leaf function,
REQUIRED_STACK_ALIGNMENT == LOCAL_STACK_BOUNDARY.

This proposal won't adjust the stack when INCOMING_STACK_BOUNDARY >=
REQUIRED_STACK_ALIGNMENT. Only when INCOMING_STACK_BOUNDARY <
REQUIRED_STACK_ALIGNMENT will it adjust the stack to
REQUIRED_STACK_ALIGNMENT in the prologue.
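The rules in 2.2 reduce to a few MIN/MAX comparisons. A sketch in C, with every value in bits as in the proposal; `struct frame_info` and the three functions are hypothetical names for illustration, not GCC internals, and the leaf rule follows the text as written (REQUIRED == LOCAL for a leaf function).

```c
#include <stdbool.h>

/* Per-function facts the proposal needs; values in bits. */
struct frame_info {
    unsigned local_stack_boundary;  /* MAX alignment of stack variables */
    bool is_leaf;                   /* makes no calls */
    bool force_realign;             /* force_align_arg_pointer / -mstackrealign */
};

static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }
static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Rule v: INCOMING_STACK_BOUNDARY. */
unsigned incoming_boundary(const struct frame_info *f, unsigned stack_boundary,
                           unsigned abi_boundary, unsigned preferred)
{
    if (f->force_realign)
        return stack_boundary;
    return min_u(abi_boundary, preferred);
}

/* Rule vi: REQUIRED_STACK_ALIGNMENT. */
unsigned required_alignment(const struct frame_info *f, unsigned preferred)
{
    if (f->is_leaf)
        return f->local_stack_boundary;
    return max_u(f->local_stack_boundary, preferred);
}

/* Realign in the prologue only when INCOMING < REQUIRED. */
bool needs_realign(const struct frame_info *f, unsigned stack_boundary,
                   unsigned abi_boundary, unsigned preferred)
{
    return incoming_boundary(f, stack_boundary, abi_boundary, preferred)
           < required_alignment(f, preferred);
}
```

With the i386 values (STACK = ABI = PREFERRED = 32), a leaf function whose only aligned local is an __m128 (LOCAL = 128) needs realignment; the same function on x86_64 (64, 128, 128) does not, while a 32-byte-aligned local (LOCAL = 256) does.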

2.3 Initial development on i386 and x86_64
We initially support i386 and x86_64. In this document we focus more on
i386 because it is harder to implement there, due to its small register
file.  But everything we discuss can easily be applied to x86_64.

2.4 Emit more efficient prologue/epilogue
When a function needs to adjust stack alignment and has no dynamic stack
allocation, this design will generate the following example
prologue/epilogue code:
IA32 example Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        andl      $-16, %esp
        subl      $4, %esp         // allocate the local frame (4 bytes here)
Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        ret
Locals will be addressed as esp + offset and parameters as ebp + offset.

Add x86_64 example here.

Thus EBP points to the parameter frame and ESP points to the local frame.
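The `andl $-16, %esp` step is plain bit masking. A minimal C model (hypothetical helper names) of what it computes: the stack pointer is rounded down to a multiple of the alignment, adding between 0 and align - 1 bytes of padding. Since the stack grows downward, the padding lies below the data already pushed, and EBP still holds the pre-alignment value, so parameters stay reachable at a fixed EBP offset however much padding was added.

```c
#include <stdint.h>

/* Models "andl $-N, %esp": clear the low bits so sp becomes a multiple
   of align (which must be a power of 2).  For align == 16 the mask
   ~(align - 1) is 0xFFFFFFF0, i.e. -16. */
uint32_t align_down(uint32_t sp, uint32_t align)
{
    return sp & ~(align - 1);
}

/* Padding bytes the "andl" inserts: always in [0, align - 1]. */
uint32_t realign_padding(uint32_t sp, uint32_t align)
{
    return sp - align_down(sp, align);
}
```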

2.5 Coexist with special features
Stack alignment adjustment will coexist with various GCC features
that have special calling conventions and frame layouts, such as dynamic
stack allocation (alloca), nested functions and parameter passing via
registers to local functions.

I386 hard register usage is the major obstacle to making the proposal
friendly to various GCC features. This design requires an additional hard
register in the prologue/epilogue in case of dynamic stack allocation.
Because I386 PIC requires BX as the GOT pointer and I386 may use AX, DX
and CX as parameter passing registers, there are limited candidates for
this proposal to choose from. The current proposal suggests EDI, because
it won't conflict with i386 PIC or regparm.

X86_64 is much easier. This proposal just chooses RBX.

2.5.1 When stack alignment adjustment comes together with alloca, the
following example prologue/epilogue will be emitted:
Prologue:
       pushl     %edi                     // Save callee save reg edi
       leal      8(%esp), %edi            // Save address of parameter frame
       andl      $-16, %esp               // Align local stack

//  Reserve two stack slots and save return address
//  and previous frame pointer into them. By
//  pointing new ebp to them, we build a pseudo
//  stack for unwinding.
       pushl     -4(%edi)                 //  save return address
       pushl     %ebp                     //  save old ebp
       movl      %esp, %ebp               //  point ebp to pseudo frame start

       subl      $24, %esp                // adjust local frame size
       movl      %edi, vreg1

epilogue:
       movl      vreg1, %edi
       movl      %ebp, %esp               // Restore esp to pseudo frame start
       popl      %ebp
       leal      -8(%edi), %esp           // restore esp to real frame start
       popl      %edi                     // Restore edi
       ret

Locals will be addressed as ebp - offset, parameters as vreg1 + offset.

Here EDI is used to set up the virtual parameter frame pointer, EBP points
to the local frame and ESP points to the dynamic allocation frame.
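One way to check the 2.5.1 arithmetic is to replay the prologue and epilogue on a toy stack. The C sketch below is a model, not GCC code, and `run_realign_frame` and `struct result` are hypothetical names. It assumes the stack is an array of 4-byte slots (entry %esp a multiple of 4 and inside the modeled range), omits the vreg1 spill/reload because nothing clobbers %edi in the model, and reads the return address from -4(%edi): after `pushl %edi` and `leal 8(%esp), %edi`, %edi points 4 bytes above the return-address slot.

```c
#include <stdint.h>
#include <string.h>

/* Toy 32-bit stack: byte addresses [STACK_BASE, STACK_BASE + STACK_BYTES),
   stored as 4-byte slots. */
#define STACK_BASE  0x1000u
#define STACK_BYTES 0x400u

struct cpu {
    uint32_t esp, ebp, edi;
    uint32_t mem[STACK_BYTES / 4];
};

struct result {
    uint32_t ret_addr;        /* value consumed by the final "ret" */
    uint32_t esp_after_ret;   /* should be entry %esp + 4 */
    uint32_t ebp_after;       /* caller's %ebp, restored */
    uint32_t edi_after;       /* caller's %edi, restored */
    uint32_t first_param;     /* 0(%edi) inside the body */
    uint32_t pseudo_ret;      /* 4(%ebp): return address seen by an unwinder */
    uint32_t pseudo_prev;     /* 0(%ebp): caller frame seen by an unwinder */
    int frame_aligned;        /* realigned boundary (%ebp + 8) is 16-aligned */
    int alloca_aligned;       /* dynamic area is 16-aligned */
};

static uint32_t *slot(struct cpu *c, uint32_t addr)
{
    return &c->mem[(addr - STACK_BASE) / 4];
}

static void push(struct cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }

static uint32_t pop(struct cpu *c)
{
    uint32_t v = *slot(c, c->esp);
    c->esp += 4;
    return v;
}

struct result run_realign_frame(uint32_t entry_esp, uint32_t ret_addr,
                                uint32_t param0, uint32_t caller_ebp,
                                uint32_t caller_edi, uint32_t alloca_size)
{
    struct cpu c;
    struct result r;

    memset(&c, 0, sizeof c);
    c.esp = entry_esp;
    c.ebp = caller_ebp;
    c.edi = caller_edi;
    *slot(&c, entry_esp) = ret_addr;      /* pushed by the caller's "call" */
    *slot(&c, entry_esp + 4) = param0;    /* first stack argument */

    /* Prologue */
    push(&c, c.edi);                      /* pushl %edi */
    c.edi = c.esp + 8;                    /* leal 8(%esp), %edi */
    c.esp &= ~15u;                        /* andl $-16, %esp */
    push(&c, *slot(&c, c.edi - 4));       /* pushl -4(%edi): return address */
    push(&c, c.ebp);                      /* pushl %ebp */
    c.ebp = c.esp;                        /* movl %esp, %ebp */
    c.esp -= 24;                          /* subl $24, %esp */

    r.frame_aligned = (c.ebp + 8) % 16 == 0;
    r.first_param = *slot(&c, c.edi);
    r.pseudo_ret = *slot(&c, c.ebp + 4);
    r.pseudo_prev = *slot(&c, c.ebp);

    /* Body: alloca(alloca_size), keeping 16-byte alignment */
    c.esp -= alloca_size;
    c.esp &= ~15u;
    r.alloca_aligned = c.esp % 16 == 0;

    /* Epilogue */
    c.esp = c.ebp;                        /* movl %ebp, %esp */
    c.ebp = pop(&c);                      /* popl %ebp */
    c.esp = c.edi - 8;                    /* leal -8(%edi), %esp */
    c.edi = pop(&c);                      /* popl %edi */
    r.ret_addr = pop(&c);                 /* ret */
    r.esp_after_ret = c.esp;
    r.ebp_after = c.ebp;
    r.edi_after = c.edi;
    return r;
}
```

Running it with entry %esp values that are 4 and 8 modulo 16 confirms that the pseudo frame holds the real return address and caller %ebp, and that the epilogue returns with every caller register intact.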

2.5.2 Nested functions will automatically work because they use CX as the
static chain pointer, which won't conflict with any registers used by
stack alignment adjustment, even when nested functions are called via a
function pointer and a function stub on the stack.

2.5.3 GCC may optimize to use registers to pass parameters. At most AX,
DX and CX will be used. Such optimization won't conflict with stack
alignment adjustment, so it should automatically work.

2.5.4 I386 PIC uses EBX as the GOT pointer. This design works well under
i386 PIC:

For example:
i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     -4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp
        call      .L1
.L1:
        popl      %ebx
        movl      %edi, vreg1

Body:  // code for alloca
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

Locals will be addressed as ebp - offset, parameters as vreg1 + offset,
ebx has the GOT pointer.

2.6 Debug and unwind will work since DWARF2 has the flexibility to define
different frame pointers.

2.7 Some intrinsics rely on stack layout and need to be handled
accordingly: __builtin_return_address and __builtin_frame_address. This
proposal will set up a pseudo frame slot to help the unwinder find the
return address and parent frame address by emitting the following
prologue code after adjusting alignment:
        pushl     -4(%edi)
        pushl     %ebp
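__builtin_return_address and __builtin_frame_address are real GCC builtins. A minimal check, assuming GCC (or a compatible compiler) on x86, calls both inside a non-inlined function whose 32-byte-aligned local forces stack realignment. It only verifies that the values are available and that the local is aligned; it does not inspect the pseudo frame itself. `frame_info_ok` is a hypothetical name.

```c
#include <stdint.h>

/* noinline so the function keeps its own (realigned) frame. */
__attribute__((noinline))
int frame_info_ok(void)
{
    _Alignas(32) volatile char buf[32];   /* forces realignment */
    buf[0] = 1;

    void *ra = __builtin_return_address(0);  /* caller's resume address */
    void *fa = __builtin_frame_address(0);   /* this function's frame */

    return ra != 0 && fa != 0 && ((uintptr_t)buf % 32) == 0;
}
```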

-- 3. NEW PROPOSAL: IMPLEMENTATION --
The proposed implementation can be partitioned into the following subtasks.
*  Alignment requirement collection
*  Frames addressing
*  Alignment code generation
*  Debug and unwind information

3.1 Collect alignment requirement
Collect each function's alignment requirement from the frontend or from
optimization passes like the vectorizer, and inform the backend.

Current GCC uses cfun->stack_alignment_needed to store MIN(largest stack
variable alignment, PREFERRED_STACK_BOUNDARY). We will reuse this field
and define its value only as "largest stack variable alignment".

3.2 Frames addressing
Add a parameter frame, local frame, static frame and dynamic frame with
appropriate pointers, either hard registers or virtual registers.

The backend will customize the CAN_ELIMINATE hook to assign hard registers
to the corresponding virtual registers.

3.3 Alignment code generation
Emit prologue/epilogue code to guarantee correct stack alignment based on
each function's alignment requirement collected previously.

Modifications should happen in ix86_expand_prologue and
ix86_expand_epilogue. Code to be emitted can follow the above design in a
straightforward manner.

3.4 Debug information
Emit debug and unwind information for aligned stacks. This also happens in
ix86_expand_prologue and ix86_expand_epilogue, corresponding to the
prologue/epilogue code emitted.

4. Code Example

Simple function:
void foo()
{

   volatile int local;
   ...
}

i686 Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        subl      $4, %esp         // Adjust local frame size by 4
i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        ret

x86_64 Prologue:
        pushq     %rbp
        movq      %rsp, %rbp
        subq      $16, %rsp
x86_64 Epilogue:
        movq      %rbp, %rsp
        popq      %rbp
        ret

Pure 16 bytes align:
void foo()
{
    volatile __m128 m = _mm_set_ps1(0.f);
}

i686 Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        andl      $-16, %esp
        subl      $16, %esp     // this is space for m, 16 byte aligned
i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        ret

x86_64 Prologue:
        pushq     %rbp
        movq      %rsp, %rbp
        andq      $-16, %rsp
        subq      $16, %rsp
x86_64 Epilogue:
        movq      %rbp, %rsp
        popq      %rbp
        ret

32 bytes align with alloca:
void foo(int size)
{
    char * ptr=alloca(size);
    volatile int __attribute__((aligned(32))) m = 0;
    ...
}

i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-32, %esp
        pushl     -4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp

Body:  // code for alloca
        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

void foo(int dummy1, int dummy2, int dummy3, int dummy4,
         int dummy5, int dummy6, int size)
{
    char * ptr=alloca(size);
    volatile int __attribute__((aligned(32))) m = 0;
    ...
}
x86_64 Prologue:
        pushq     %rbx
        leaq      16(%rsp), %rbx
        andq      $-32, %rsp
        pushq     -8(%rbx)
        pushq     %rbp
        movq      %rsp, %rbp
        subq      $24, %rsp

Body:
        movq      %rbx, vreg1
        movl      (vreg1), %eax
        subq      %rax, %rsp
        andq      $-16, %rsp
        movq      %rsp, %rax

x86_64 Epilogue:
        movq      %rbp, %rsp
        popq      %rbp
        leaq      -16(%rbx), %rsp
        popq      %rbx
        ret

m128 and PIC
int g_i;
void foo()
{
    volatile __m128 m = _mm_set_ps1(0.f);
    g_i = 123;
    ...
}

i686 Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        andl      $-16, %esp
        pushl     %ebx
        subl      $16, %esp
        call      .L1
.L1:
        popl      %ebx
	...

i686 Epilogue:
        addl      $16, %esp
        popl      %ebx
        movl      %ebp, %esp
        popl      %ebp
        ret

m128 + alloca + PIC
void foo(int size)
{
    char * ptr=alloca(size);
    volatile __m128 m = _mm_set_ps1(0.f);
    ...
}
i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp
        call      .L1
.L1:
        popl      %ebx

Body:
        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

m128 + alloca + PIC + library call
void foo(int size)
{
    char * ptr=alloca(size);
    volatile __m128 m = _mm_set_ps1(0.f);
    printf("Hello\n");
    ...
}

i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     -4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24, %esp
        call      .L1
.L1:
        popl      %ebx

i686 Body:
        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

Body:
        call      printf@PLT

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

m128 and nested function and PIC
void foo()
{
    void bar(int arg1, int arg2)
    {
         volatile __m128 m = _mm_set_ps1(0.f);
         ...
    }
    bar(1,2);
}

i686:
foo:
        ...
        movl      %ebp, %ecx
        call      bar@PLT
        ...

bar:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     -4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24, %esp
        call      .L1
.L1:
        popl      %ebx

        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax
        ...

        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

m128, dynamic stack alloc and register parameter function call
static void bar(int arg1, int arg2, int arg3)
{
    char * ptr=alloca(size);
    volatile __m128 m = _mm_set_ps1(0.f);
    ...
}

void foo()
{
    bar(1,2,3);
}

i686 foo:
        movl      $1, %eax
        movl      $2, %edx
        movl      $3, %ecx
        call      bar
        ...
bar:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     -4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24, %esp

        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax
        ...

        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

Thanks - Joey

Thread overview: 27+ messages
2007-12-19  1:46 A proposal to align GCC stack Ross Ridge
  -- strict thread matches above, loose matches on Subject: below --
2007-12-19 11:52 Ross Ridge
2007-12-19 10:06 Ross Ridge
2007-12-19 15:32 ` H.J. Lu
2007-12-19  9:13 Ross Ridge
2007-12-19 14:30 ` H.J. Lu
2007-12-19  3:51 Ross Ridge
2007-12-19 10:33 ` Andrew Pinski
2007-12-20  9:32   ` Ye, Joey
2007-12-20  9:11 ` Ye, Joey
2008-03-20 20:18 ` Ye, Joey
2007-12-19  1:00 Ross Ridge
2007-12-19  1:53 ` Ye, Joey
2007-12-19  2:07 ` H.J. Lu
2007-12-18 23:31 Ross Ridge
2007-12-19  1:25 ` Robert Dewar
2007-12-19  2:18 ` H.J. Lu
2007-12-18 11:55 Ross Ridge
2007-12-18 16:14 ` H.J. Lu
2007-12-18  4:29 Ross Ridge
2007-12-18  6:15 ` H.J. Lu
2007-12-18  7:50   ` Ye, Joey
2007-12-18 13:52 ` Daniel Jacobowitz
2007-12-18 18:05   ` H.J. Lu
2007-12-18 14:41 ` Robert Dewar
2007-12-18  4:25 Ye, Joey
2007-12-21 20:25 ` Christian Schüler
