* Useless assembly
@ 2004-11-28 21:40 Sam Lauber
2004-11-28 21:41 ` Kazu Hirata
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Sam Lauber @ 2004-11-28 21:40 UTC (permalink / raw)
To: gcc
When I run GCC 3.4.3 on this code:
#include <stdio.h>
int main(void)
{
    printf("Hello World!\n");
    return 0;
}
it generates the assembly code (this is i686 assembly)
        .file   "test.c"
        .section        .rodata
.LC0:
        .string "Hello World!\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        subl    %eax, %esp
        movl    $.LC0, (%esp)
        call    printf
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.3"
By hand-optimizing the assembly, I made this:
.LC0:
        .string "Hello World!\n"
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        movl    $.LC0, (%esp)
        call    printf
        leave
It worked exactly the same. Clearly a lot was unnecessary. Removing "call printf" or "movl $.LC0, (%esp)" caused it not to print "Hello World!". Removing ".globl main" caused the linker to fail with an undefined reference to main. Removing anything else caused it to print Hello World and then generate a segmentation fault.
Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP) just made the compilation take longer and generate more assembly than the first time. If someone could just automate removing that code in cc1, it would probably mean a lot of optimization.
Also, when I just compiled a "return 0;", I would expect that to just generate main and the instruction "ret" (or is it "retn"?). Instead, it generated a lot of extra instructions, then finally "ret". Clearly this is unsatisfactory. It makes cc1, as, and ld slower because they have to compile, assemble, and link more instructions. Each extra instruction makes a compilation time penalty of -3 and a runtime penalty of -1. At least that could be put onto DCE. If extra code gets generated by cc1, as and ld can't optimize it.
Samuel Lauber
--
_____________________________________________________________
Web-based SMS services available at http://www.operamail.com.
From your mailbox to local or overseas cell phones.
Powered by Outblaze
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-28 21:40 Useless assembly Sam Lauber
@ 2004-11-28 21:41 ` Kazu Hirata
2004-11-28 21:56 ` Eric Botcazou
2004-11-28 22:06 ` Steven Bosscher
2004-11-28 22:46 ` jlh
` (2 subsequent siblings)
3 siblings, 2 replies; 11+ messages in thread
From: Kazu Hirata @ 2004-11-28 21:41 UTC (permalink / raw)
To: sam124; +Cc: gcc
Hi Sam,
> Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP)
> just made the compilation take longer and generate more assembly
> than the first time.
-O3 does not enable SSA-based optimizations.
> Also, when I just compiled a "return 0;", I would expect that to
> just generate main and the instruction "ret" (or is it
> "retn?"). Instead, it generated a lot of extra instructions, then
> finally "ret".
Add an option -fomit-frame-pointer.
Kazu Hirata
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-28 21:41 ` Kazu Hirata
@ 2004-11-28 21:56 ` Eric Botcazou
2004-11-28 22:06 ` Steven Bosscher
1 sibling, 0 replies; 11+ messages in thread
From: Eric Botcazou @ 2004-11-28 21:56 UTC (permalink / raw)
To: gcc; +Cc: Kazu Hirata, sam124
> Add an option -fomit-frame-pointer.
And don't call your function main. :-)
--
Eric Botcazou
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-28 21:41 ` Kazu Hirata
2004-11-28 21:56 ` Eric Botcazou
@ 2004-11-28 22:06 ` Steven Bosscher
1 sibling, 0 replies; 11+ messages in thread
From: Steven Bosscher @ 2004-11-28 22:06 UTC (permalink / raw)
To: gcc; +Cc: Kazu Hirata, sam124
On Sunday 28 November 2004 22:35, Kazu Hirata wrote:
> Hi Sam,
>
> > Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP)
> > just made the compilation take longer and generate more assembly
> > than the first time.
>
> -O3 does not enable SSA-based optimizations.
In fact with GCC 3.4.3, nothing enables SSA-based optimizations.
How could it, there are none.
Gr.
Steven
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-28 21:40 Useless assembly Sam Lauber
2004-11-28 21:41 ` Kazu Hirata
@ 2004-11-28 22:46 ` jlh
2004-11-29 4:25 ` Robert Dewar
2004-11-29 12:04 ` Andreas Schwab
3 siblings, 0 replies; 11+ messages in thread
From: jlh @ 2004-11-28 22:46 UTC (permalink / raw)
To: gcc
> When I run GCC 3.4.3 on this code:
> [...]
> it generates the assembly code (this is i686 assembly)
When doing such a test, you shouldn't run it on the function main(),
as it is obviously treated specially. For instance, take this example:
int hello()
{
    return(0);
}
int main()
{
    return(0);
}
And compile it with GCC 3.4.3 on x86. It gives (simplified):
hello:  pushl   %ebp
        movl    %esp, %ebp
        movl    $0, %eax
        popl    %ebp
        ret
main:   pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        subl    %eax, %esp
        movl    $0, %eax
        leave
        ret
You see the difference. Note that this was done without any optimization
enabled, the command line was "gcc -c -S a.c"
> Also, when I just compiled a "return 0;", I would expect that to just
> generate main and the instruction "ret" (or is it "retn?").
No, because frame pointers aren't omitted by default. Compile the
above example with the option "-fomit-frame-pointer" and you get what
you want, but only for hello(), not for main(). And that still without
optimization turned on. To make main() a bit shorter, the option "-O"
is enough.
There probably are many reasons to not omit those additional instructions
from main. And it's not necessary either, because main() only gets
executed once.
jlh
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-28 21:40 Useless assembly Sam Lauber
2004-11-28 21:41 ` Kazu Hirata
2004-11-28 22:46 ` jlh
@ 2004-11-29 4:25 ` Robert Dewar
2004-11-29 12:04 ` Andreas Schwab
3 siblings, 0 replies; 11+ messages in thread
From: Robert Dewar @ 2004-11-29 4:25 UTC (permalink / raw)
To: Sam Lauber; +Cc: gcc
Sam Lauber wrote:
> By hand-optimizing the assembly, I made this:
>
> .LC0:
> .string "Hello World!\n"
> .globl main
> .type main, @function
> main:
> pushl %ebp
> movl %esp, %ebp
> movl $.LC0, (%esp)
> call printf
> leave
>
> It worked exactly the same. Clearly a lot was unnecessary. Removing "call printf" or "movl $.LC0, (%esp)" caused it not to print "Hello World!". Removing ".globl main" caused the linker to fail with an undefined reference to main. Removing anything else caused it to print Hello World and then generate a segmentation fault. Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP) just made the compilation take longer and generate more assembly than the first time. If someone could just automate removing that code in cc1, it would probably mean a lot of optimization. Also, when I just compiled a "return 0;", I would expect that to just generate main and the instruction "ret" (or is it "retn"?). Instead, it generated a lot of extra instructions, then finally "ret". Clearly this is unsatisfactory. It makes cc1, as, and ld slower because they have to compile, assemble, and link more instructions. Each extra instruction makes a compilation time penalty of -3 and a runtime penalty of -1. At least that could be put onto DCE. If extra code gets generated by cc1, as and ld can't optimize it.
>
> Samuel Lauber
Yes, but you fail to establish 16-byte alignment for the
stack, which is the normal default. If you don't want
16-byte alignment for the stack, supply an appropriate
switch (e.g. -mpreferred-stack-boundary=2).
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-28 21:40 Useless assembly Sam Lauber
` (2 preceding siblings ...)
2004-11-29 4:25 ` Robert Dewar
@ 2004-11-29 12:04 ` Andreas Schwab
3 siblings, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 2004-11-29 12:04 UTC (permalink / raw)
To: Sam Lauber; +Cc: gcc
"Sam Lauber" <sam124@operamail.com> writes:
> By hand-optimizing the assembly, I made this:
>
> .LC0:
> .string "Hello World!\n"
> .globl main
> .type main, @function
> main:
> pushl %ebp
> movl %esp, %ebp
> movl $.LC0, (%esp)
> call printf
> leave
>
> It worked the exact same.
It doesn't return the correct exit code.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-30 14:12 ` Dave Korn
@ 2004-11-30 16:20 ` Robert Dewar
0 siblings, 0 replies; 11+ messages in thread
From: Robert Dewar @ 2004-11-30 16:20 UTC (permalink / raw)
To: Dave Korn; +Cc: 'Sam Lauber', gcc
Dave Korn wrote:
> I guess now would not be a good time to mention stackless python ? :-)
Well of course it is possible to implement without local stack frames,
but the calling sequence for C does not work that way!
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Useless assembly
2004-11-30 4:36 ` Robert Dewar
@ 2004-11-30 14:12 ` Dave Korn
2004-11-30 16:20 ` Robert Dewar
0 siblings, 1 reply; 11+ messages in thread
From: Dave Korn @ 2004-11-30 14:12 UTC (permalink / raw)
To: 'Robert Dewar', 'Sam Lauber'; +Cc: gcc
> -----Original Message-----
> From: gcc-owner On Behalf Of Robert Dewar
> Sent: 30 November 2004 04:24
> Sam Lauber wrote:
> > What's the stack have to do with it? I thought I had a 32-bit i686.
> > Why would we have to align the stack to a 16-bit boundary?
>
> It's 16-byte alignment, and the code you eliminated was
> performing many critical
> functions including this alignment, which is required for
> maximum efficiency.
Not just efficiency but correctness also, when using vector instructions.
Throw away the stack alignment code in main and everything down the call
hierarchy will not have the alignment it expects from the stack, meaning it
will place local variables at unaligned addresses. I don't know whether the
mmx/sse unit would trap or just round the unaligned addresses (thereby
causing locals to stomp all over each other) but the resulting mess would
_not_ be a pretty sight.
> > And why would there have to be a stack at all?
>
> That's a peculiar question, the stack is fundamental to the calling
> sequence and the call instruction.
I guess now would not be a good time to mention stackless python ? :-)
cheers,
DaveK
--
Can't think of a witty .sigline today....
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
2004-11-30 3:12 Sam Lauber
@ 2004-11-30 4:36 ` Robert Dewar
2004-11-30 14:12 ` Dave Korn
0 siblings, 1 reply; 11+ messages in thread
From: Robert Dewar @ 2004-11-30 4:36 UTC (permalink / raw)
To: Sam Lauber; +Cc: gcc
Sam Lauber wrote:
> What's the stack have to do with it? I thought I had a 32-bit i686.
> Why would we have to align the stack to a 16-bit boundary?
It's 16-byte alignment, and the code you eliminated was performing many critical
functions including this alignment, which is required for maximum efficiency.
> And why would there have to be a stack at all?
That's a peculiar question, the stack is fundamental to the calling
sequence and the call instruction.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Useless assembly
@ 2004-11-30 3:12 Sam Lauber
2004-11-30 4:36 ` Robert Dewar
0 siblings, 1 reply; 11+ messages in thread
From: Sam Lauber @ 2004-11-30 3:12 UTC (permalink / raw)
To: Robert Dewar, gcc
What's the stack have to do with it? I thought I had a 32-bit i686. Why would we have to align the stack to a 16-bit boundary? And why would there have to be a stack at all?
----- Original Message -----
From: "Robert Dewar" <dewar@gnat.com>
To: "Sam Lauber" <sam124@operamail.com>
Subject: Re: Useless assembly
Date: Sun, 28 Nov 2004 22:26:40 -0500
>
> Sam Lauber wrote:
>
> > By hand-optimizing the assembly, I made this:
> >
> > .LC0:
> > .string "Hello World!\n"
> > .globl main
> > .type main, @function
> > main:
> > pushl %ebp
> > movl %esp, %ebp
> > movl $.LC0, (%esp)
> > call printf
> > leave
> >
> > It worked exactly the same. Clearly a lot was unnecessary. Removing "call printf" or "movl $.LC0, (%esp)" caused it not to print "Hello World!". Removing ".globl main" caused the linker to fail with an undefined reference to main. Removing anything else caused it to print Hello World and then generate a segmentation fault. Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP) just made the compilation take longer and generate more assembly than the first time. If someone could just automate removing that code in cc1, it would probably mean a lot of optimization. Also, when I just compiled a "return 0;", I would expect that to just generate main and the instruction "ret" (or is it "retn"?). Instead, it generated a lot of extra instructions, then finally "ret". Clearly this is unsatisfactory. It makes cc1, as, and ld slower because they have to compile, assemble, and link more instructions. Each extra instruction makes a compilation time penalty of -3 and a runtime penalty of -1. At least that could be put onto DCE. If extra code gets generated by cc1, as and ld can't optimize it.
> >
> > Samuel Lauber
>
> Yes, but you fail to establish 16-byte alignment for the
> stack, which is the normal default. If you don't want
> 16-byte alignment for the stack, supply an appropriate
> switch (e.g. -mpreferred-stack-boundary=2).
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
Thread overview: 11+ messages
2004-11-28 21:40 Useless assembly Sam Lauber
2004-11-28 21:41 ` Kazu Hirata
2004-11-28 21:56 ` Eric Botcazou
2004-11-28 22:06 ` Steven Bosscher
2004-11-28 22:46 ` jlh
2004-11-29 4:25 ` Robert Dewar
2004-11-29 12:04 ` Andreas Schwab
2004-11-30 3:12 Sam Lauber
2004-11-30 4:36 ` Robert Dewar
2004-11-30 14:12 ` Dave Korn
2004-11-30 16:20 ` Robert Dewar