public inbox for gcc@gcc.gnu.org
* Useless assembly
@ 2004-11-28 21:40 Sam Lauber
  2004-11-28 21:41 ` Kazu Hirata
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Sam Lauber @ 2004-11-28 21:40 UTC (permalink / raw)
  To: gcc

When I run GCC 3.4.3 on this code:

#include <stdio.h>
int main(void)
{
        printf("Hello World!\n");
        return 0;
}
                                                                                                                                                                                                                                                          
it generates the assembly code (this is i686 assembly)

        .file   "test.c"
        .section        .rodata
.LC0:
        .string "Hello World!\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        subl    %eax, %esp
        movl    $.LC0, (%esp)
        call    printf
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.3"
                                                                                                                             
By hand-optimizing the assembly, I made this:

.LC0:
        .string "Hello World!\n"
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        movl    $.LC0, (%esp)
        call    printf
        leave

It worked the exact same. Clearly a lot was unnecessary. Removing "call printf" or "movl $.LC0, (%esp)" caused it not to print "Hello World!". Removing ".globl main" caused the linker to fail with an undefined reference to main. Removing anything else caused it to print Hello World and then generate a segmentation fault.

Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP) just made the compilation take longer and generate more assembly than the first time. If someone could just automate removing that code in cc1, it would probably mean a lot of optimization.

Also, when I just compiled a "return 0;", I would expect that to just generate main and the instruction "ret" (or is it "retn?"). Instead, it generated a lot of extra instructions, then finally "ret". Clearly this is unsatisfactory. It makes cc1, as, and ld slower because they have to compile, assemble, and link more instructions. Each extra instruction makes a compilation time penalty of -3 and a runtime penalty of -1. At least that could be put onto DCE. If extra code gets generated by cc1, as and ld can't optimize it.

Samuel Lauber
-- 
_____________________________________________________________
Web-based SMS services available at http://www.operamail.com.
From your mailbox to local or overseas cell phones.

Powered by Outblaze


* Re: Useless assembly
  2004-11-28 21:40 Useless assembly Sam Lauber
@ 2004-11-28 21:41 ` Kazu Hirata
  2004-11-28 21:56   ` Eric Botcazou
  2004-11-28 22:06   ` Steven Bosscher
  2004-11-28 22:46 ` jlh
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Kazu Hirata @ 2004-11-28 21:41 UTC (permalink / raw)
  To: sam124; +Cc: gcc

Hi Sam,

> Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP)
> just made the compilation take longer and generate more assembly
> than the first time.

-O3 does not enable SSA-based optimizations.

> Also, when I just compiled a "return 0;", I would expect that to
> just generate main and the instruction "ret" (or is it
> "retn?"). Instead, it generated a lot of extra instructions, then
> finally "ret".

Add an option -fomit-frame-pointer.

Kazu Hirata


* Re: Useless assembly
  2004-11-28 21:41 ` Kazu Hirata
@ 2004-11-28 21:56   ` Eric Botcazou
  2004-11-28 22:06   ` Steven Bosscher
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Botcazou @ 2004-11-28 21:56 UTC (permalink / raw)
  To: gcc; +Cc: Kazu Hirata, sam124

> Add an option -fomit-frame-pointer.

And don't call your function main. :-)

-- 
Eric Botcazou


* Re: Useless assembly
  2004-11-28 21:41 ` Kazu Hirata
  2004-11-28 21:56   ` Eric Botcazou
@ 2004-11-28 22:06   ` Steven Bosscher
  1 sibling, 0 replies; 11+ messages in thread
From: Steven Bosscher @ 2004-11-28 22:06 UTC (permalink / raw)
  To: gcc; +Cc: Kazu Hirata, sam124

On Sunday 28 November 2004 22:35, Kazu Hirata wrote:
> Hi Sam,
>
> > Optimizing it at -O3 (which I'm sure includes SSA, DCE, and CCP)
> > just made the compilation take longer and generate more assembly
> > than the first time.
>
> -O3 does not enable SSA-based optimizations.

In fact, with GCC 3.4.3, nothing enables SSA-based optimizations.
How could it? There are none.

Gr.
Steven


* Re: Useless assembly
  2004-11-28 21:40 Useless assembly Sam Lauber
  2004-11-28 21:41 ` Kazu Hirata
@ 2004-11-28 22:46 ` jlh
  2004-11-29  4:25 ` Robert Dewar
  2004-11-29 12:04 ` Andreas Schwab
  3 siblings, 0 replies; 11+ messages in thread
From: jlh @ 2004-11-28 22:46 UTC (permalink / raw)
  To: gcc


> When I run GCC 3.4.3 on this code:
> [...]
> it generates the assembly code (this is i686 assembly)

When doing such a test, you shouldn't test it on the function main(),
as it is obviously treated specially.  For example, take this:

int hello()
{
         return(0);
}

int main()
{
         return(0);
}

And compile it with GCC 3.4.3 on x86.  It gives (simplified):

hello:  pushl   %ebp
         movl    %esp, %ebp
         movl    $0, %eax
         popl    %ebp
         ret

main:   pushl   %ebp
         movl    %esp, %ebp
         subl    $8, %esp
         andl    $-16, %esp
         movl    $0, %eax
         addl    $15, %eax
         addl    $15, %eax
         shrl    $4, %eax
         sall    $4, %eax
         subl    %eax, %esp
         movl    $0, %eax
         leave
         ret

You see the difference.  Note that this was done without any optimization
enabled, the command line was "gcc -c -S a.c"

 > Also, when I just compiled a "return 0;", I would expect that to just
 > generate main and the instruction "ret" (or is it "retn?").

No, because frame pointers aren't omitted by default.  Compile the
above example with the option "-fomit-frame-pointer" and you get what
you want, but only for hello(), not for main(), and that is still
without optimization turned on.  To make main() a bit shorter, the
option "-O" is enough.

There probably are many reasons to not omit those additional instructions
from main.  And it's not necessary either, because main() only gets
executed once.

jlh


* Re: Useless assembly
  2004-11-28 21:40 Useless assembly Sam Lauber
  2004-11-28 21:41 ` Kazu Hirata
  2004-11-28 22:46 ` jlh
@ 2004-11-29  4:25 ` Robert Dewar
  2004-11-29 12:04 ` Andreas Schwab
  3 siblings, 0 replies; 11+ messages in thread
From: Robert Dewar @ 2004-11-29  4:25 UTC (permalink / raw)
  To: Sam Lauber; +Cc: gcc

Sam Lauber wrote:

> By hand-optimizing the assembly, I made this:
> 
> .LC0:
>         .string "Hello World!\n"
> .globl main
>         .type   main, @function
> main:
>         pushl   %ebp
>         movl    %esp, %ebp
>         movl    $.LC0, (%esp)
>         call    printf
>         leave
> 
> It worked the exact same. Clearly a lot was unnecessary. Removing
> "call printf" or "movl $.LC0, (%esp)" caused it not to print "Hello
> World!". Removing ".globl main" caused the linker to fail with an
> undefined reference to main. Removing anything else caused it to
> print Hello World and then generate a segmentation fault. Optimizing
> it at -O3 (which I'm sure includes SSA, DCE, and CCP) just made the
> compilation take longer and generate more assembly than the first
> time. If someone could just automate removing that code in cc1, it
> would probably mean a lot of optimization. Also, when I just compiled
> a "return 0;", I would expect that to just generate main and the
> instruction "ret" (or is it "retn?"). Instead, it generated a lot of
> extra instructions, then finally "ret". Clearly this is
> unsatisfactory. It makes cc1, as, and ld slower because they have to
> compile, assemble, and link more instructions. Each extra instruction
> makes a compilation time penalty of -3 and a runtime penalty of -1.
> At least that could be put onto DCE. If extra code gets generated by
> cc1, as and ld can't optimize it.
> 
> Samuel Lauber

Yes, but you fail to establish 16-byte alignment for the
stack, which is the normal default. If you don't want
16-byte alignment for the stack, supply an appropriate
switch (e.g. -mpreferred-stack-boundary=2).
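
For readers following along, the alignment arithmetic in the generated
prologue can be mirrored in C. This is only an illustrative sketch: the
function names are made up for this note, and the stack pointer value in
the self-check is a hypothetical number, not one taken from a real run.

```c
#include <assert.h>
#include <stdint.h>

/* andl $-16, %esp : clear the low four bits, which rounds the stack
   pointer down to a multiple of 16 (the x86 stack grows downward). */
uint32_t align_down_16(uint32_t sp)
{
    return sp & (uint32_t)-16;
}

/* movl $0,%eax; addl $15,%eax; addl $15,%eax; shrl $4,%eax; sall $4,%eax
   computes ((n + 15 + 15) >> 4) << 4, with n = 0 in the listing. */
uint32_t prologue_adjustment(uint32_t n)
{
    uint32_t eax = n;
    eax += 15;
    eax += 15;
    eax >>= 4;
    eax <<= 4;
    return eax;
}

/* Self-check; 0xbffff7ac is a hypothetical stack pointer value. */
void check_alignment_math(void)
{
    assert(align_down_16(0xbffff7acu) == 0xbffff7a0u);
    assert(prologue_adjustment(0) == 16);  /* 30 >> 4 == 1, 1 << 4 == 16 */
}
```

The amount subtracted by "subl %eax, %esp" is always a multiple of 16,
so the alignment set up by "andl $-16, %esp" is preserved.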


* Re: Useless assembly
  2004-11-28 21:40 Useless assembly Sam Lauber
                   ` (2 preceding siblings ...)
  2004-11-29  4:25 ` Robert Dewar
@ 2004-11-29 12:04 ` Andreas Schwab
  3 siblings, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 2004-11-29 12:04 UTC (permalink / raw)
  To: Sam Lauber; +Cc: gcc

"Sam Lauber" <sam124@operamail.com> writes:

> By hand-optimizing the assembly, I made this:
>
> .LC0:
>         .string "Hello World!\n"
> .globl main
>         .type   main, @function
> main:
>         pushl   %ebp
>         movl    %esp, %ebp
>         movl    $.LC0, (%esp)
>         call    printf
>         leave
>
> It worked the exact same.

It doesn't return the correct exit code.
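
A rough C analogue of the difference, as a sketch only: the hand-edited
version drops the "movl $0, %eax", so on i386 the value printf left in
%eax is presumably what main ends up returning. The function names
below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the hand-edited main: it hands back whatever printf
   returned (the number of characters written) instead of an explicit 0. */
int hand_edited_status(void)
{
    return printf("Hello World!\n");
}

/* Stand-in for the compiler-generated main: movl $0, %eax before ret. */
int generated_status(void)
{
    printf("Hello World!\n");
    return 0;
}

/* Self-check: "Hello World!\n" is 13 characters long. */
void check_status(void)
{
    assert(hand_edited_status() == 13);
    assert(generated_status() == 0);
}
```

Running the hand-edited program and then checking the shell's "$?" is
one way to see whether the status is nonzero on a given system.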

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: Useless assembly
  2004-11-30 14:12   ` Dave Korn
@ 2004-11-30 16:20     ` Robert Dewar
  0 siblings, 0 replies; 11+ messages in thread
From: Robert Dewar @ 2004-11-30 16:20 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'Sam Lauber', gcc

Dave Korn wrote:

>   I guess now would not be a good time to mention Stackless Python? :-)

Well of course it is possible to implement without local stack frames,
but the calling sequence for C does not work that way!


* RE: Useless assembly
  2004-11-30  4:36 ` Robert Dewar
@ 2004-11-30 14:12   ` Dave Korn
  2004-11-30 16:20     ` Robert Dewar
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Korn @ 2004-11-30 14:12 UTC (permalink / raw)
  To: 'Robert Dewar', 'Sam Lauber'; +Cc: gcc

> -----Original Message-----
> From: gcc-owner On Behalf Of Robert Dewar
> Sent: 30 November 2004 04:24

> Sam Lauber wrote:
> > What's the stack have to do with it? I thought I had a 32-bit i686.
> > Why would we have to align the stack to a 16-bit boundary?
> 
> It's 16-byte alignment, and the code you eliminated was 
> performing many critical
> functions including this alignment, which is required for 
> maximum efficiency.

  Not just efficiency but correctness also, when using vector instructions.
Throw away the stack alignment code in main and everything down the call
hierarchy will not have the alignment it expects from the stack, meaning it
will place local variables at unaligned addresses.  I don't know whether the
mmx/sse unit would trap or just round the unaligned addresses (thereby
causing locals to stomp all over each other) but the resulting mess would
_not_ be a pretty sight.
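
  A small sketch of what "the alignment it expects" means in C terms.
It assumes a C11 compiler (for _Alignas), and the helper names are
invented for this example; it shows only the address check, not any
actual vector instructions.

```c
#include <assert.h>
#include <stdint.h>

/* True when the address is a multiple of 16, i.e. its low four bits
   are all zero. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* A local that asks for 16-byte alignment, as a vector-sized temporary
   would. The compiler places it relative to the stack pointer, so it
   relies on the stack alignment it has been promised (or realigns). */
int aligned_local_is_aligned(void)
{
    _Alignas(16) float v[4] = {0};
    return is_aligned_16(v);
}

/* Self-check with made-up addresses. */
void check_alignment(void)
{
    assert(is_aligned_16((const void *)(uintptr_t)0x1000));
    assert(!is_aligned_16((const void *)(uintptr_t)0x1004));
    assert(aligned_local_is_aligned());
}
```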

>  > And why would there have to be a stack at all?
> 
> That's a peculiar question, the stack is fundamental to the calling
> sequence and the call instruction.

  I guess now would not be a good time to mention Stackless Python? :-)

    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....


* Re: Useless assembly
  2004-11-30  3:12 Sam Lauber
@ 2004-11-30  4:36 ` Robert Dewar
  2004-11-30 14:12   ` Dave Korn
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Dewar @ 2004-11-30  4:36 UTC (permalink / raw)
  To: Sam Lauber; +Cc: gcc

Sam Lauber wrote:
> What's the stack have to do with it? I thought I had a 32-bit i686.
> Why would we have to align the stack to a 16-bit boundary?

It's 16-byte alignment, and the code you eliminated was performing many critical
functions including this alignment, which is required for maximum efficiency.

 > And why would there have to be a stack at all?

That's a peculiar question, the stack is fundamental to the calling
sequence and the call instruction.



* Re: Useless assembly
@ 2004-11-30  3:12 Sam Lauber
  2004-11-30  4:36 ` Robert Dewar
  0 siblings, 1 reply; 11+ messages in thread
From: Sam Lauber @ 2004-11-30  3:12 UTC (permalink / raw)
  To: Robert Dewar, gcc

What's the stack have to do with it? I thought I had a 32-bit i686. Why would we have to align the stack to a 16-bit boundary? And why would there have to be a stack at all?

----- Original Message -----
From: "Robert Dewar" <dewar@gnat.com>
To: "Sam Lauber" <sam124@operamail.com>
Subject: Re: Useless assembly
Date: Sun, 28 Nov 2004 22:26:40 -0500

> 
> Sam Lauber wrote:
> 
> > By hand-optimizing the assembly, I made this:
> > 
> > .LC0:
> >         .string "Hello World!\n"
> > .globl main
> >         .type   main, @function
> > main:
> >         pushl   %ebp
> >         movl    %esp, %ebp
> >         movl    $.LC0, (%esp)
> >         call    printf
> >         leave
> > 
> > It worked the exact same. Clearly a lot was unnecessary. Removing
> > "call printf" or "movl $.LC0, (%esp)" caused it not to print "Hello
> > World!". Removing ".globl main" caused the linker to fail with an
> > undefined reference to main. Removing anything else caused it to
> > print Hello World and then generate a segmentation fault. Optimizing
> > it at -O3 (which I'm sure includes SSA, DCE, and CCP) just made the
> > compilation take longer and generate more assembly than the first
> > time. If someone could just automate removing that code in cc1, it
> > would probably mean a lot of optimization. Also, when I just compiled
> > a "return 0;", I would expect that to just generate main and the
> > instruction "ret" (or is it "retn?"). Instead, it generated a lot of
> > extra instructions, then finally "ret". Clearly this is
> > unsatisfactory. It makes cc1, as, and ld slower because they have to
> > compile, assemble, and link more instructions. Each extra instruction
> > makes a compilation time penalty of -3 and a runtime penalty of -1.
> > At least that could be put onto DCE. If extra code gets generated by
> > cc1, as and ld can't optimize it.
> > 
> > Samuel Lauber
> 
> Yes, but you fail to establish 16-byte alignment for the
> stack, which is the normal default. If you don't want
> 16-byte alignment for the stack, supply an appropriate
> switch (e.g. -mpreferred-stack-boundary=2).
> 
> 

