* stack allocation
From: matt smith @ 2004-12-12 19:21 UTC
To: gcc-help

This issue has been discussed in a few threads that I have found on
google but there was no conclusive answer given for the phenomenon.

example1.c

    void function(int a, int b, int c)
    {
        char buffer1[5];
        char buffer2[10];
    }

    void main()
    {
        function(1,2,3);
    }

When you issue the "gcc -S -o example1.s example1.c" command and view
the function prolog you see that the compiler reserves 40 bytes for
these two arrays.

    subl $40, %esp

To me the expected behavior should be

    subl $24, %esp

or in other words reserving 24 bytes of stack space.

Why the discrepancy?

Thanks
Josh
* Re: stack allocation
From: jlh @ 2004-12-16 19:08 UTC
To: matt smith, gcc-help

matt smith wrote:
> Why the discrepancy?

I think I might have found the reason for this; here's what I've
been experimenting with today:

    extern int i;
    extern void f2();

    void f()
    {
        f2();
        i = 3;
    }

If I compile with "gcc-4.0 -O2" I get this (on x86):

    f:
            pushl   %ebp
            movl    %esp, %ebp
            subl    $8, %esp
            call    f2
            movl    $3, %eax
            movl    %eax, i
            leave
            ret

The pushed %ebp uses 4 bytes on the stack and GCC reserves another
8 bytes (which are never used) for a total of 12 bytes. Now if I
compile the same with the option "-fomit-frame-pointer" added I get
this:

    f:
            subl    $12, %esp
            call    f2
            movl    $3, %eax
            movl    %eax, i
            addl    $12, %esp
            ret

No more %ebp on the stack, but now GCC reserves 12 bytes. In both
cases the function f() uses 12 bytes of stack and, together with the
4 bytes of return address already on the stack, that totals 16 bytes,
which is a nice alignment. And as you know, proper alignment makes
code faster. If f() does not call any function, GCC does not reserve
any unnecessary space.

In your sample code, you didn't use optimization at all, so it
probably did the alignment anyway, even if no other function gets
called from your function. This might be the reason why it allocates
40 bytes instead of only what it requires for storage.

Then I did some measurements and apparently, calling a function with
the stack not aligned to 16 bytes is slower. So GCC actually does a
good job here.

Voilà, I hope this wasn't nonsense. :)

jlh
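The byte counting in the reply above can be checked with a few lines of C. This is only a sketch of the arithmetic, not of how GCC decides on a frame size: the helper names `pad_to_alignment` and `frame_total` are made up for illustration, and the 4-byte return address and 4-byte saved %ebp are the 32-bit x86 values from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes must be added to `used` so that
 * the result is a multiple of `align` (align must be a power of two). */
size_t pad_to_alignment(size_t used, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (used & (align - 1))) & (align - 1);
}

/* Hypothetical helper: total stack consumed by one call, counting the
 * return address pushed by `call`, any saved registers, and the bytes
 * the prologue subtracts from %esp. */
size_t frame_total(size_t return_addr, size_t saved_regs, size_t reserved)
{
    return return_addr + saved_regs + reserved;
}
```

With the frame pointer, 4 (return address) + 4 (%ebp) are already in use, so `pad_to_alignment(8, 16)` gives the 8 bytes seen in `subl $8, %esp`. Without it, only the return address is in use, and `pad_to_alignment(4, 16)` gives the 12 bytes seen in `subl $12, %esp`. Both frames total 16 bytes.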
[parent not found: <CAPhGq=bY8hS0DF2rf7_5E8ycYS52uJR8UH=Yjb0NiDBdaSR+6Q@mail.gmail.com>]
* Stack allocation
From: Alexandru Juncu @ 2011-11-18 17:57 UTC
To: gcc-help

Hello!

[I sent an email to the main gcc list by mistake, and I am moving
the discussion here]

I am curious about something I once tested. I took a simple C
program and made an assembly file with gcc -S.

The C file looks something like this:

    int main(void)
    {
        int a=1, b=2;
        return 0;
    }

The assembly instructions look like this:

    subl $16, %esp
    movl $1, -4(%ebp)
    movl $2, -8(%ebp)

The subl $16 is the allocation of local variables on the stack,
right? 16 bytes are enough for four 32-bit integers.
If I declare 1, 2, 3 or 4 local variables, I get those 16 bytes.
If I have 5 variables, we have "subl $32, %esp". 5, 6, 7 or 8
variables are the same. With 9, 10, 11 or 12, it is 48 bytes.

The observation is that gcc allocates in increments of 4 variables
(if they are integers). If I allocate 8-bit chars, in increments of
16 chars.

So the allocation is in increments of 16 bytes no matter what.

OK, that's the observation... my question is why? What's the reason
for this? Is it an optimization (does it matter which -O level is
used?), or is it architecture dependent (I ran it on x86)? And is
this just in gcc, just in a certain version of gcc, or is it
universal?

I got a response that it is related to cache line alignment, to
optimize cache hits.
But I tried to compile the program with --param l1-cache-size and
got the same .s file. Is this ok?

Thank you!

--
Alexandru Juncu
ROSEdu
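The pattern reported above (16, 32, 48 bytes) is what rounding the size of the locals up to the next multiple of 16 would produce. The sketch below models only that observation. It is not GCC's actual frame-layout code, and `round_up` and `locals_reservation` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical model of the observation: `count` locals of `elem_size`
 * bytes each, reserved in 16-byte steps. */
size_t locals_reservation(size_t count, size_t elem_size)
{
    return round_up(count * elem_size, 16);
}
```

For 32-bit integers, `locals_reservation(4, 4)` is 16, `locals_reservation(5, 4)` is 32 and `locals_reservation(9, 4)` is 48, matching the `subl` values in the message. For chars, `locals_reservation(17, 1)` is 32.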
* Re: Stack allocation
From: Andrew Haley @ 2011-11-18 19:35 UTC
To: gcc-help

On 11/18/2011 03:03 PM, Alexandru Juncu wrote:
> Hello!
>
> [I sent an email on the gcc main list by my mistake, and I am moving
> the discussion here]
>
> I have a curiosity with something I once tested. I took a simple C
> program and made an assembly file with gcc -S.
>
> The C file looks something like this:
> int main(void)
> {
>     int a=1, b=2;
>     return 0;
> }
>
> The assembly instructions look like this:
>
> subl $16, %esp
> movl $1, -4(%ebp)
> movl $2, -8(%ebp)
>
> The subl $16, means the allocation of local variables on the stack,
> right? 16 bytes are enough for 4 32bit integers.
> If I have 1,2,3 or 4 local variables declared, you get those 16 bytes.
> If I have 5 variables, we have " subl $32, %esp". 5,6,7,8 variables ar
> the same. 9, 10,11,12, 48 bytes.
>
> The observation is that gcc allocates increments of 4 variables (if
> they are integers). If I allocate 8bit chars, increments of 16 chars.
>
> So the allocation is in increments of 16 bytes no matter what.
>
> OK, that's the observation... my question is why? What's the reason
> for this, is it an optimization (does is matter what's the -O used?)
> or is it architecture dependent (I ran it on x86) and is this just in
> gcc, just in a certain version of gcc or this is universal?
>
> I got a response that is related to the cache line alignment, to
> optimize cache hits.
> But I tried to compile the program with the --param l1-cache-size and
> got the same .s file. Is this ok?

You're not optimizing. Nothing much will happen with optimization
options when you're not optimizing.

This is x86-specific, but other processors have similar needs. gcc
must 16-align the stack because some structures (such as MMX data)
must be aligned. Given that the data must be aligned, so must the
stack.

Also, fetches and stores that straddle cache line boundaries can be
very slow.

Andrew.
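Both points in the reply above come down to simple address arithmetic, sketched here in C. The function names are hypothetical, and the 64-byte line size in the usage note is an assumption about typical x86 hardware, not something stated in the thread. For reference, an aligned 16-byte SSE access such as `movaps` is one well-known x86 case where a misaligned address is not merely slow but faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of align (align must be a power of two). */
int is_aligned_addr(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Returns 1 if an access of `size` bytes starting at `addr` touches more
 * than one cache line of `line` bytes. */
int straddles_cache_line(uintptr_t addr, size_t size, size_t line)
{
    assert(size != 0 && line != 0);
    return (addr / line) != ((addr + size - 1) / line);
}
```

Assuming 64-byte lines, an 8-byte access at address 60 covers bytes 60 to 67 and so crosses from the first line into the second, while any 16-byte access at a 16-byte-aligned address stays inside one line, because 64 is a multiple of 16.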
end of thread, other threads:[~2011-11-18 15:10 UTC | newest]

Thread overview: 4+ messages

2004-12-12 19:21 stack allocation matt smith
2004-12-16 19:08 ` jlh
     [not found] <CAPhGq=bY8hS0DF2rf7_5E8ycYS52uJR8UH=Yjb0NiDBdaSR+6Q@mail.gmail.com>
2011-11-18 17:57 ` Stack allocation Alexandru Juncu
2011-11-18 19:35   ` Andrew Haley