From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeffrey A Law
To: Craig Burley
Cc: d.love@dl.ac.uk, egcs@cygnus.com
Subject: Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)
Date: Tue, 23 Jun 1998 03:32:00 -0000
Message-id: <17885.898580970@hurl.cygnus.com>
References: <199806221811.OAA07410@melange.gnu.org>
X-SW-Source: 1998-06/msg00784.html

In message <199806221811.OAA07410@melange.gnu.org> you write:
> For Fortran code, we can usually hand-wave that; this case would
> only come up when the call tree has an *embedded* procedure
> that doesn't maintain proper alignment, and since the big
> computational problem with g77 performance is in code compiled
> by g77, and such code is rarely called by C code, I don't think
> this would represent a huge deficiency.

Right, but changing STACK_BOUNDARY is not an option because it does
affect C code.

> > If the stack gets mis-aligned relative to STACK_BOUNDARY
> > combine could end up removing a seemingly useless
> > stack operation/address calculation.
>
> I don't understand this, but presumably I need to look into it
> further.

I explained it a little in a message to Toon. Basically combine knows
how to remove a redundant "and" operation which just turns off some
low order bits in an address. If the stack isn't aligned to
STACK_BOUNDARY, then combine could end up removing a mask operation
that wasn't redundant.

> Okay, that makes sense to me. We want to hit a majority of cases
> anyway. We don't care (for now) about cases where users are
> combining multiple languages in weird ways, for example.

Well, we care about it from a correctness standpoint. Things still
have to work if they're combining .o files from old compilers,
callbacks from the library like qsort, etc. But we aren't really
worried about performance in those cases.

>
> > * The ABI is still going to mandate that some doubles in
> > argument lists are going to be mis-aligned.
> > We'd have to arrange to copy them from the arglist into a suitable
> > stack slot. This may be more trouble than it's worth.
>
> I'm not sure how this can ever happen in the x86 architecture?

Sure. Think about cases where the double in the arglist isn't
naturally aligned (think C, pass by value :-).

    foo (double, int)

We push args back to front. So we push the int on the stack, which
means the double will be at only a 4 byte aligned stack address if we
assume our stack was 8 byte aligned before we pushed the args.

You might think we could compensate for this by pushing an extra
dummy word before the first integer to ensure the double gets
aligned. But that loses if we have:

    foo (int2, double, int1)

If we pushed an extra 4 byte hunk before int1, then the total size of
the arglist would be 20 bytes -- not a multiple of 8. And as I'll
explain below, we must always make sure to allocate in 8 byte lumps --
we can't depend on the callee to round the stack.

> Is it reasonable
> to just subtract an extra 8 bytes when creating the frame
> pointer upon procedure entry and then NAND it with 7 to align
> it? Or would that make for problems with debugger, profiling,
> and/or exception support, or is there no quick way to NAND the
> frame pointer on the x86?

Nope, because you then don't have a constant offset to get to the
arguments that were passed to the function. To make this work you'd
have to dedicate a hard register to serve as an argument pointer,
which will be horrible.

[ Think about it, how can you generate code to find an argument if
  at entry to the procedure you may adjust the stack by a varying
  value (0 or 4)? ]

Instead we must make sure that we always allocate stacks in 8 byte
hunks in the prologue *and* that we push an extra dummy word on the
stack when performing function calls where the arg list + return
pointer is not a multiple of 8 bytes in size.
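To make the arithmetic concrete, here is a small sketch in Python (not
GCC code). It only models stack addresses, assuming 4 byte stack words,
a 4 byte return address, args whose sizes are multiples of 4, and a
stack that is 8 byte aligned before the call sequence starts. The names
push_args, dummy_words_needed, WORD and ALIGN, and the starting address
0x1000, are made up for the illustration.

```python
WORD = 4   # assumed size of one stack word and of the return address
ALIGN = 8  # alignment we want for doubles and for the stack at entry


def push_args(sp, sizes):
    """Push args back to front (last arg first) on a downward-growing stack.

    sizes[i] is the size in bytes of argument i.
    Returns ({arg index: address}, stack pointer after the pushes).
    """
    addrs = {}
    for i in reversed(range(len(sizes))):
        sp -= sizes[i]
        addrs[i] = sp
    return addrs, sp


def dummy_words_needed(arglist_bytes):
    """Padding words so that arg list + return address is a multiple of 8."""
    total = arglist_bytes + WORD  # the call pushes the return address
    return (-total % ALIGN) // WORD


# foo (double, int): the int is pushed first, then the double.
addrs, sp = push_args(0x1000, [8, 4])
print(addrs[0] % ALIGN)               # 4: the double is only 4 byte aligned

# foo (int2, double, int1) with one dummy word pushed before int1.
print((sum([4, 8, 4]) + WORD) % ALIGN)  # 4: 20 bytes, not a multiple of 8

# The rule stated above: pad when arg list + return address is not a
# multiple of 8.
print(dummy_words_needed(16))         # 1
print(dummy_words_needed(12))         # 0
```

Under those assumptions a 12 byte arg list needs no padding, since the
return address already rounds it up to 16, while an 8 or 16 byte arg
list needs one dummy word.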
[ Remember, the call itself will push a 4 byte word on the stack too,
  so we have to account for that as well. ]

> It seems like everyone else thinks the right way to do this is
> to try to always assure %sp is 64-bit aligned across calls by
> modifying all the code that is in the procedure-call chain.
> That probably means an extra dummy push before odd-number-of-args
> calls, etc., right?

Close. It's not the number of args, but the total size of the arg
list. If the size of the arg list is a multiple of 8 bytes, then we
have to push a dummy arg so that the stack is 8 byte aligned when we
enter the callee.

jeff