* RE: pushl vs movl + movl on x86
From: Menezes, Evandro @ 2005-08-23 20:30 UTC
To: Dan Nicolaescu, gcc
Dan,
> Is there a performance difference between the movl + movl and
> pushl code sequences?
Not in this example, but movl is faster than pushl in some circumstances. A sequence of pushl instructions has an implicit dependency chain on %esp, since %esp changes after each pushl, whereas a sequence of movl instructions can enjoy better ILP. However, as you pointed out, movl is considerably longer than pushl, which may hurt cache efficiency.
Therefore, the sweet spot is somewhere in the middle. It is more important to use movl wisely in prologues and epilogues than when passing arguments, though, because for argument passing, as RTH mentioned, -maccumulate-outgoing-args is desirable to avoid frequent stack maintenance.
That being said, it depends largely on how the underlying architecture is implemented.
HTH
--
_______________________________________________________
Evandro Menezes AMD Austin, TX
* Re: pushl vs movl + movl on x86
From: Richard Henderson @ 2005-08-23 20:21 UTC
To: Dan Nicolaescu; +Cc: gcc
On Tue, Aug 23, 2005 at 11:40:16AM -0700, Dan Nicolaescu wrote:
> Is there a performance difference between the movl + movl and pushl
> code sequences?
In this case, no.
> If not, maybe gcc should generate pushl for -O2
> too, because it is smaller code.
It's not quite as simple as you make out. You can get pushes out
of gcc with -mno-accumulate-outgoing-args, but then we have to add
other compensation code elsewhere.
IIRC, it was fairly well established that we get equal or better
performance by not using pushes on P2-class (Pentium II) machines and later.
r~
* pushl vs movl + movl on x86
From: Dan Nicolaescu @ 2005-08-23 19:08 UTC
To: gcc
For this code (from PR23525):
    extern int waiting_for_initial_map;
    extern int cp_pipe[2];
    extern int pc_pipe[2];
    extern int close (int __fd);

    void
    first_map_occurred(void)
    {
      close(cp_pipe[0]);
      close(pc_pipe[1]);
      waiting_for_initial_map = 0;
    }
gcc -march=i686 -O2 generates:
    movl    cp_pipe, %eax
    movl    %eax, (%esp)
    call    close
    movl    pc_pipe+4, %eax
    movl    %eax, (%esp)
    call    close
The Intel compiler with the same flags generates:
    pushl   cp_pipe         #9.11
    call    close           #9.5
    pushl   4+pc_pipe       #10.11
    call    close           #10.5
gcc -march=i686 -Os generates similar code to the Intel compiler.
Is there a performance difference between the movl + movl and pushl
code sequences? If not, maybe gcc should generate pushl for -O2
too, because it is smaller code.
Thanks