public inbox for gcc@gcc.gnu.org
* RE: g++ 3.4.0 cygwin, codegen SSE & alignement issues
@ 2004-04-29  4:49 Ross Ridge
  2004-04-29  6:12 ` DJ Delorie
  0 siblings, 1 reply; 7+ messages in thread
From: Ross Ridge @ 2004-04-29  4:49 UTC (permalink / raw)
  To: gcc

> Well, it's quicker to allocate a constant size stack frame than to
>dynamically calculate the alignment requirements, but only by two or
>three fairly trivial instructions.  And although aligning the frame just
>once at startup and keeping it aligned by always allocating aligned-size
>stack frames avoids that cost, in some situations stack memory is a
>limited resource, and particularly since not all code uses vector
>registers, there's a lot of stack memory usage to be saved by not making
>all the stack frames bigger just for the sake of the very few frames for
>functions that actually use the vector regs.  So I'd say it's probably
>one of those trade-offs for which there's no one 'right' answer.

I think a happy middle-way, at least for the i386 port, would be to
implement a special function attribute, say __attribute__((align_stack)),
that only dynamically aligns the stack of functions defined with the
attribute.
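
A rough sketch of what opting in per function might look like from the user's side. `align_stack` is only the name proposed above and is not an existing attribute, so the macro here (a hypothetical `ALIGN_STACK`) maps to `force_align_arg_pointer`, which GCC documents for x86 targets, where the compiler reports it, and to nothing otherwise. The function name `sum4` is likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ALIGN_STACK is a hypothetical stand-in for the proposed attribute.
   It uses force_align_arg_pointer where the compiler reports support
   for it and expands to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(force_align_arg_pointer)
#    define ALIGN_STACK __attribute__((force_align_arg_pointer))
#  endif
#endif
#ifndef ALIGN_STACK
#  define ALIGN_STACK
#endif

/* Only this function asks for realignment; its callers keep plain frames. */
ALIGN_STACK float sum4(const float *in)
{
    /* A local that needs 16-byte alignment, as an SSE spill slot would. */
    _Alignas(16) float lanes[4];

    assert(in != NULL);
    assert(((uintptr_t)lanes % 16u) == 0);  /* what aligned SSE loads rely on */

    for (int i = 0; i < 4; i++)
        lanes[i] = in[i];
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Functions that never touch vector registers would simply omit the macro and keep the cheaper constant-size prologue.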

						Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/u/rridge/ 
 db  //	  


* Re: g++ 3.4.0 cygwin, codegen SSE & alignement issues
  2004-04-29  4:49 g++ 3.4.0 cygwin, codegen SSE & alignement issues Ross Ridge
@ 2004-04-29  6:12 ` DJ Delorie
  0 siblings, 0 replies; 7+ messages in thread
From: DJ Delorie @ 2004-04-29  6:12 UTC (permalink / raw)
  To: rridge; +Cc: gcc


> I think a happy middle-way, at least for the i386 port, would be to
> implement a special function attribute, say __attribute__((align_stack)),
> that only dynamically aligns the stack of functions defined with the
> attribute.

The existing align-stack code is target independent.  Why not create a
target independent attribute?  The x86 isn't the only target that has
this problem, just the only one with an OS that makes it so obvious
(although Windows runs on various embedded chips also).


* Re: g++ 3.4.0 cygwin, codegen SSE & alignement issues
@ 2004-04-29  6:46 Ross Ridge
  0 siblings, 0 replies; 7+ messages in thread
From: Ross Ridge @ 2004-04-29  6:46 UTC (permalink / raw)
  To: gcc

Ross Ridge wrote:
> I think a happy middle-way, at least for the i386 port, would be to
> implement a special function attribute, say __attribute__((align_stack)),
> that only dynamically aligns the stack of functions defined with the
> attribute.

DJ Delorie wrote:
> The existing align-stack code is target independent.  Why not create a
> target independent attribute?  The x86 isn't the only target that has
> this problem, just the only one with an OS that makes it so obvious
> (although Windows runs on various embedded chips also).

Well, the existing target independent forced align-stack code, which only
handles main(), is less than ideal on i386 Windows targets.  It generates
an unnecessary call to __alloca and wastes 16 bytes of stack space.
This isn't a problem since it currently only happens in main(), but it
would probably be better to have the forced, dynamic, stack alignment code
done more intelligently in the backend where all the other prologue
stack allocation is handled.

						Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/u/rridge/ 
 db  //	  


* RE: g++ 3.4.0 cygwin, codegen SSE & alignement issues
  2004-04-28 17:04   ` Tim Prince
  2004-04-28 19:17     ` Dave Korn
@ 2004-04-28 20:42     ` Brian Ford
  1 sibling, 0 replies; 7+ messages in thread
From: Brian Ford @ 2004-04-28 20:42 UTC (permalink / raw)
  To: Tim Prince; +Cc: Dave Korn, cygwin, gcc

On Wed, 28 Apr 2004, Tim Prince wrote:

> At 08:51 AM 4/28/2004, Dave Korn wrote:
>
> > > From: cygwin-owner On Behalf Of tbp
> > > Sent: 28 April 2004 16:16
> >
> >[  Now x-posted to gcc list, since it's seemingly a gcc issue rather than a
> >cygwin environment issue.

It's an interoperability issue.

> >   I'd recommend doing that in the startup code in gcc's crt0.s myself.

That won't help for threads.  See:

http://www.cygwin.com/ml/cygwin/2004-04/msg01134.html

> > The real question is, is the compiler generating code that guarantees
> > the stack stays aligned, so you can do that just once at startup?  It
> > certainly ought to.

It is supposed to, given the callback and new-thread caveats.

> As Dave said, this is more of a gcc than a cygwin issue,

For threads, it happens to be easiest to fix this in Cygwin.

> gcc made a decision, which is different from commercial compilers,

and the ABI.

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
the best safety device in any aircraft is a well-trained pilot...


* RE: g++ 3.4.0 cygwin, codegen SSE & alignement issues
  2004-04-28 17:04   ` Tim Prince
@ 2004-04-28 19:17     ` Dave Korn
  2004-04-28 20:42     ` Brian Ford
  1 sibling, 0 replies; 7+ messages in thread
From: Dave Korn @ 2004-04-28 19:17 UTC (permalink / raw)
  To: cygwin; +Cc: gcc

> -----Original Message-----
> From: Tim Prince 
> Sent: 28 April 2004 17:19

> Because of the different division of responsibilities, if a function built
> by gcc is called by a function built by a commercial compiler (or by gcc
> -Os), the stack has a 75% probability of being mis-aligned.  It may be
> possible to overcome this by having a wrapper function between, which is
> built by gcc with alignment specified, but does not use SSE.

  I once wrote a patch for gcc (for the ppc backend, but the principles
should be applicable if not the actual code) to add a new -m option, the
effect of which was to modify prolog generation code so that instead of just
subtracting a constant from the sp to allocate the new frame, it also
dynamically calculated how much extra to subtract to get the correct
alignment for the resulting new sp value.  It was pretty simple, involving
just a few extra assembler instructions in each prolog.
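
A minimal sketch of the arithmetic such a prologue performs, written in C rather than as the actual ppc patch (which is not shown in this thread). The function names are hypothetical, the stack is assumed to grow towards lower addresses, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Stack pointer after a prologue that allocates frame_size bytes and then
   rounds the result down to a multiple of align. */
uintptr_t aligned_prologue_sp(uintptr_t sp, uintptr_t frame_size,
                              uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    return (sp - frame_size) & ~(align - 1);
}

/* Extra bytes subtracted beyond the constant frame size. */
uintptr_t extra_padding(uintptr_t sp, uintptr_t frame_size, uintptr_t align)
{
    return (sp - frame_size) - aligned_prologue_sp(sp, frame_size, align);
}
```

With an incoming sp of 0x7ffc and a 40-byte frame, the plain prologue would leave sp at 0x7fd4; the aligning one lands on 0x7fd0, so 4 extra bytes are allocated.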

[  In fact, it may not be as simple as that (...any more).  With the ppc
eabi, the effect of allocating more space on the stack than you've actually
defined in the stack frame is that a gap opens up between the outgoing args
area, which grows up from the bottom of the frame, and the local vars and
saved regs area, which grow down from the top of the frame.  This didn't do
any harm in 2.95.x, but it might well go wrong in gcc-3.x.x, where the
handling of eliminable regs and starting frame offset is different.  I'm
also unsure about how badly this sort of malarkey might break gdb's
understanding of what is going on in a function's frame, but I would imagine
it would do so quite badly.  ]

  It's a total waste of bytes in a situation where you know that the OS or
CRT gets it right for you, but it would be useful in a mixed
objects/abis/compilers situation.  Looks like there might be call for the
same sort of thing for the i.86 backend?

> Presumably, there is a performance advantage to gcc of assuming that the
> caller passes an aligned stack, but not enough to persuade commercial
> compilers to adopt a compatible scheme.

  Well, it's quicker to allocate a constant size stack frame than to
dynamically calculate the alignment requirements, but only by two or three
fairly trivial instructions.  And although aligning the frame just once at
startup and keeping it aligned by always allocating aligned-size stack
frames avoids that cost, in some situations stack memory is a limited
resource, and particularly since not all code uses vector registers, there's
a lot of stack memory usage to be saved by not making all the stack frames
bigger just for the sake of the very few frames for functions that actually
use the vector regs.  So I'd say it's probably one of those trade-offs for
which there's no one 'right' answer.

    cheers,
       DaveK
-- 
Can't think of a witty .sigline today....


* RE: g++ 3.4.0 cygwin, codegen SSE & alignement issues
  2004-04-28 16:56 ` Dave Korn
@ 2004-04-28 17:04   ` Tim Prince
  2004-04-28 19:17     ` Dave Korn
  2004-04-28 20:42     ` Brian Ford
  0 siblings, 2 replies; 7+ messages in thread
From: Tim Prince @ 2004-04-28 17:04 UTC (permalink / raw)
  To: Dave Korn, cygwin; +Cc: gcc

At 08:51 AM 4/28/2004, Dave Korn wrote:

> > -----Original Message-----
> > From: cygwin-owner On Behalf Of tbp
> > Sent: 28 April 2004 16:16
>
>[  Now x-posted to gcc list, since it's seemingly a gcc issue rather than a
>cygwin environment issue.  You might also care to refer to the current
>discussion on the gcc-patches mailing list under the thread
>
>"Re: [Bug target/15106] Vector varargs failure for AltiVec on ppc32 linux"
>
>which is discussing the same problem arising on ppc targets.  ]
>
>
> > > It's an ABI incompatibility issue, GCC expects a 16-byte aligned stack,
> > > but the Windows ABI, to the extent one actually exists, only assumes
> > > a 4-byte aligned stack (and even that's not a strict requirement).
> > Is there an official or semi-official way to fix it or do I have to
> > insert something like "mov esp, eax; and 0x15, eax; sub eax, esp" where
> > it helps?
>
>   I'd recommend doing that in the startup code in gcc's crt0.s myself.  The
>real question is, is the compiler generating code that guarantees the stack
>stays aligned, so you can do that just once at startup?  It certainly ought
>to.
>
> > I understand -mfpmath=sse is flagged as experimental. What I don't get
> > is why the compiler emits totally bogus code when using default
> > switches: -O3 -march=k8 -> boom. -O3 -march=pentium4 -> boom.
>
>   The division of responsibility between OS, CRT/startup and compiler leaves
>it unclear as to who is supposed to ensure the alignment of the stack.  IMO,
>it's a compiler's problem to see to it that if the stack starts off aligned
>it remains that way, by always generating stack frames that are a multiple
>of the alignment requirement, and it's the CRT/startup code that is
>responsible for mediating between what the compiled code requires and what
>the underlying OS/arch provides for stack pointer alignment at startup.  Of
>course, that's IMO, and my opinion is hardly definitive.
>

As Dave said, this is more of a gcc than a cygwin issue, provided that
cygwin doesn't defeat one of the functions of binutils or gcc (as it did in
the past).  gcc made a decision, which is different from commercial
compilers, that stack alignment requires each function to be compiled with
-mpreferred-stack-boundary=4 (that is, a 2^4 = 16-byte boundary), so that it
passes an aligned stack to the callee.  binutils has to be built with
maximum alignment set to at least 4 (as cygwin has done for some months now).

-mpreferred-stack-boundary=4 is the default for gcc, except when -Os is
specified.  If you use -Os, and any called function uses SSE, you must
override the stack alignment set by -Os.  gcc lowers the alignment under -Os
because the larger stack frames caused some applications (which don't use
SSE) to fail with stack overflow.  I suspect this could happen in cygwin.

Because of the different division of responsibilities, if a function built 
by gcc is called by a function built by a commercial compiler (or by gcc 
-Os), the stack has a 75% probability of being mis-aligned.  It may be 
possible to overcome this by having a wrapper function between, which is 
built by gcc with alignment specified, but does not use SSE.
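
The 75% figure can be reproduced with a small count, under the assumption that the caller only guarantees 4-byte alignment and that each possible remainder of the stack pointer modulo 16 is equally likely at the call. The function name is hypothetical.

```c
#include <assert.h>

/* Count the stack pointer remainders (mod callee_align) that a caller
   guaranteeing only caller_align can produce and that are NOT aligned
   to callee_align. */
unsigned misaligned_residues(unsigned caller_align, unsigned callee_align)
{
    unsigned bad = 0;

    assert(caller_align != 0 && callee_align % caller_align == 0);
    for (unsigned r = 0; r < callee_align; r += caller_align)
        if (r % callee_align != 0)
            bad++;
    return bad;
}
```

For a 4-byte caller and a 16-byte callee the candidates are 0, 4, 8 and 12; three of the four are misaligned, hence 3/4.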

Presumably, there is a performance advantage to gcc of assuming that the 
caller passes an aligned stack, but not enough to persuade commercial 
compilers to adopt a compatible scheme.


Tim Prince 


* RE: g++ 3.4.0 cygwin, codegen SSE & alignement issues
       [not found] <408FCABF.2050702@ompf.org>
@ 2004-04-28 16:56 ` Dave Korn
  2004-04-28 17:04   ` Tim Prince
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Korn @ 2004-04-28 16:56 UTC (permalink / raw)
  To: cygwin; +Cc: gcc

> -----Original Message-----
> From: cygwin-owner On Behalf Of tbp
> Sent: 28 April 2004 16:16

[  Now x-posted to gcc list, since it's seemingly a gcc issue rather than a
cygwin environment issue.  You might also care to refer to the current
discussion on the gcc-patches mailing list under the thread 

"Re: [Bug target/15106] Vector varargs failure for AltiVec on ppc32 linux"

which is discussing the same problem arising on ppc targets.  ]


> > It's an ABI incompatibility issue, GCC expects a 16-byte aligned stack,
> > but the Windows ABI, to the extent one actually exists, only assumes
> > a 4-byte aligned stack (and even that's not a strict requirement).
> Is there an official or semi-official way to fix it or do I have to
> insert something like "mov esp, eax; and 0x15, eax; sub eax, esp" where
> it helps?

  I'd recommend doing that in the startup code in gcc's crt0.s myself.  The
real question is, is the compiler generating code that guarantees the stack
stays aligned, so you can do that just once at startup?  It certainly ought
to.
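
For reference, the three-instruction sequence in the quoted question can be rendered in C as below. This is only a sketch with a hypothetical function name. Note that the mask for a 16-byte boundary is 15 (0x0f); the value 0x15 in the quoted snippet is 21, which would clear bits 0, 2 and 4 rather than the low four bits.

```c
#include <assert.h>
#include <stdint.h>

/* C rendering of "mov esp, eax; and MASK, eax; sub eax, esp":
   take the low bits of the stack pointer and subtract them off. */
uintptr_t align_sp_down(uintptr_t sp, uintptr_t mask)
{
    assert((mask & (mask + 1)) == 0);   /* mask must be 2^k - 1 */

    uintptr_t low_bits = sp & mask;     /* and MASK, eax */
    return sp - low_bits;               /* sub eax, esp  */
}
```

Calling `align_sp_down(0x22ffec, 15)` gives 0x22ffe0, and an already aligned value is returned unchanged.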

> I understand -mfpmath=sse is flagged as experimental. What I don't get
> is why the compiler emits totally bogus code when using default
> switches: -O3 -march=k8 -> boom. -O3 -march=pentium4 -> boom.

  The division of responsibility between OS, CRT/startup and compiler leaves
it unclear as to who is supposed to ensure the alignment of the stack.  IMO,
it's a compiler's problem to see to it that if the stack starts off aligned
it remains that way, by always generating stack frames that are a multiple
of the alignment requirement, and it's the CRT/startup code that is
responsible for mediating between what the compiled code requires and what
the underlying OS/arch provides for stack pointer alignment at startup.  Of
course, that's IMO, and my opinion is hardly definitive.
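
The compiler's half of that division can be sketched as follows: if every frame size is rounded up to a multiple of the alignment, a stack pointer that starts aligned stays aligned after each allocation. This is a simplified model with hypothetical names; it ignores the return address and saved registers, which a real prologue also has to account for.

```c
#include <assert.h>
#include <stddef.h>

/* Round a frame size up to the next multiple of align (a power of two). */
size_t round_up_frame(size_t frame_size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (frame_size + align - 1) & ~(align - 1);
}

/* Bytes of padding the rounding adds to one frame. */
size_t frame_padding(size_t frame_size, size_t align)
{
    return round_up_frame(frame_size, align) - frame_size;
}
```

A 20-byte frame becomes 32 bytes under 16-byte alignment, so 12 bytes are padding; that per-frame padding is the stack cost paid in exchange for not realigning dynamically.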


    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

