From: "Dave Korn"
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Subject: RE: g++ 3.4.0 cygwin, codegen SSE & alignement issues
Date: Wed, 28 Apr 2004 19:17:00 -0000

> -----Original Message-----
> From: Tim Prince
> Sent: 28 April 2004 17:19

> Because of the different division of responsibilities, if a function built
> by gcc is called by a function built by a commercial compiler (or by gcc
> -Os), the stack has a 75% probability of being mis-aligned. It may be
> possible to overcome this by having a wrapper function between, which is
> built by gcc with alignment specified, but does not use SSE.

  I once wrote a patch for gcc (for the ppc backend, but the principles
should be applicable if not the actual code) to add a new -m option, the
effect of which was to modify prolog generation code so that instead of just
subtracting a constant from the sp to allocate the new frame, it also
dynamically calculated how much extra to subtract to get the correct
alignment for the resulting new sp value. It was pretty simple, involving
just a few extra assembler instructions in each prolog.

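  For illustration, the arithmetic such a prolog has to do can be sketched in
C. This is not the patch itself, just a minimal sketch assuming a
downward-growing stack and a power-of-two alignment; the names
aligned_frame_sp and frame_padding are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the sp a prolog would end up with if it allocated
   frame_size bytes and then rounded the result down to an align-byte
   boundary. Assumes the stack grows downwards and that align is a
   power of two. */
uintptr_t aligned_frame_sp(uintptr_t sp, uintptr_t frame_size,
                           uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (sp - frame_size) & ~(align - 1);
}

/* Hypothetical helper: the extra bytes subtracted on top of the constant
   frame size, i.e. the part that has to be calculated dynamically. */
uintptr_t frame_padding(uintptr_t sp, uintptr_t frame_size,
                        uintptr_t align)
{
    return (sp - frame_size) - aligned_frame_sp(sp, frame_size, align);
}
```

  With a 32-byte frame and 16-byte alignment, incoming sp values of 0x1000,
0x1004, 0x1008 and 0x100c need 0, 4, 8 and 12 bytes of padding respectively.
Only one of those four entry values is already on a 16-byte boundary, which
is presumably where the quoted 75% figure comes from if the caller only
guarantees 4-byte alignment.
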
[ In fact, it may not be as simple as that (...any more). With the ppc
eabi, the effect of allocating more space on the stack than you've actually
defined in the stack frame is that a gap opens up between the outgoing args
area, which grows up from the bottom of the frame, and the local vars and
saved regs area, which grows down from the top of the frame. This didn't do
any harm in 2.95.x, but it might well go wrong in gcc-3.x.x, where the
handling of eliminable regs and starting frame offset is different. I'm
also unsure how badly this sort of malarkey might break gdb's understanding
of what is going on in a function's frame, but I would imagine it would do
so quite badly. ]

  It's a total waste of bytes in a situation where you know that the OS or
CRT gets it right for you, but it would be useful in a mixed
objects/abis/compilers situation. Looks like there might be call for the
same sort of thing for the i.86 backend?

> Presumably, there is a performance advantage to gcc of assuming that the
> caller passes an aligned stack, but not enough to persuade commercial
> compilers to adopt a compatible scheme.

  Well, it's quicker to allocate a constant size stack frame than to
dynamically calculate the alignment requirements, but only by two or three
fairly trivial instructions. And although aligning the frame just once at
startup, and keeping it aligned by always allocating aligned-size stack
frames, avoids even that cost, in some situations stack memory is a limited
resource. Since not all code uses vector registers, there's a lot of stack
memory to be saved by not making all the stack frames bigger just for the
sake of the very few frames belonging to functions that actually use the
vector regs. So I'd say it's probably one of those trade-offs for which
there's no one 'right' answer.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....