public inbox for gcc@gcc.gnu.org
* RE: pushl vs movl + movl on x86
@ 2005-08-23 20:30 Menezes, Evandro
  0 siblings, 0 replies; 3+ messages in thread
From: Menezes, Evandro @ 2005-08-23 20:30 UTC (permalink / raw)
  To: Dan Nicolaescu, gcc

Dan, 

> Is there a performance difference between the movl + movl and 
> pushl code sequences? 

Not in this example, but movl is faster than pushl in some circumstances.  A sequence of pushl instructions carries an implicit dependency chain on %esp, since each pushl modifies it, whereas a sequence of movl instructions can enjoy better ILP.  However, movl is considerably longer than pushl, as you pointed out, which may hurt instruction-cache efficiency.

Therefore, the sweet spot is somewhere in the middle.  Using movl wisely matters more in prologs and epilogs than when passing arguments, though.  As RTH mentioned, -maccumulate-outgoing-args is desirable because it avoids frequent stack maintenance.

That being said, the trade-off depends largely on the underlying implementation of the architecture.

HTH


-- 
_______________________________________________________
Evandro Menezes            AMD               Austin, TX


* Re: pushl vs movl + movl on x86
  2005-08-23 19:08 Dan Nicolaescu
@ 2005-08-23 20:21 ` Richard Henderson
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Henderson @ 2005-08-23 20:21 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: gcc

On Tue, Aug 23, 2005 at 11:40:16AM -0700, Dan Nicolaescu wrote:
> Is there a performance difference between the movl + movl and pushl
> code sequences?

In this case, no.

> If not maybe then gcc should generate pushl for -O2
> too because it is smaller code.

It's not quite as simple as you make out.  You can get pushes out
of gcc with -mno-accumulate-outgoing-args, but then we have to add
other compensation code elsewhere.

IIRC, it was fairly well explored that we get equal or better
performance by not using pushes on P2 class machines and later.


r~


* pushl vs movl + movl on x86
@ 2005-08-23 19:08 Dan Nicolaescu
  2005-08-23 20:21 ` Richard Henderson
  0 siblings, 1 reply; 3+ messages in thread
From: Dan Nicolaescu @ 2005-08-23 19:08 UTC (permalink / raw)
  To: gcc


For this code (from PR23525):

extern int waiting_for_initial_map;
extern int cp_pipe[2];
extern int pc_pipe[2];
extern int close (int __fd);

void
first_map_occurred(void)
{
    close(cp_pipe[0]);
    close(pc_pipe[1]);
    waiting_for_initial_map = 0;
}

gcc -march=i686 -O2 generates: 

        movl    cp_pipe, %eax
        movl    %eax, (%esp)
        call    close
        movl    pc_pipe+4, %eax
        movl    %eax, (%esp)
        call    close

The Intel compiler with the same flags generates:

        pushl     cp_pipe                                       #9.11
        call      close                                         #9.5
        pushl     4+pc_pipe                                     #10.11
        call      close                                         #10.5
 

gcc -march=i686 -Os generates similar code to the Intel compiler.

Is there a performance difference between the movl + movl and pushl
code sequences? If not maybe then gcc should generate pushl for -O2
too because it is smaller code.

Thanks

