public inbox for gcc-bugs@sourceware.org
* [Bug target/18766] New: Inefficient code with -mfpmath=387,sse
@ 2004-12-01 20:54 bangerth at dealii dot org
2004-12-06 21:35 ` [Bug target/18766] " pinskia at gcc dot gnu dot org
0 siblings, 1 reply; 4+ messages in thread
From: bangerth at dealii dot org @ 2004-12-01 20:54 UTC (permalink / raw)
To: gcc-bugs
This is spinoff #1 of PR 17619:
Take this simple piece of code:
---------------------
float a[2], b[2];
float foobar () {
  return a[0] * b[0]
       + a[1] * b[1];
}
---------------------
Compiled with
-O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387
we get this code:
---------------------
pushl %ebp
movl %esp, %ebp
flds b
fmuls a
flds b+4
fmuls a+4
faddp %st, %st(1)
popl %ebp
ret
-----------------------------
That's certainly optimal.
On the other hand, if we let the compiler use sse registers as well (though
we do not force it, we simply want the most efficient code), the code
we get with flags
-O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387,sse
looks like this:
-----------------------------
pushl %ebp
movl %esp, %ebp
subl $4, %esp
flds b
fmuls a
movss b+4, %xmm0
mulss a+4, %xmm0
movss %xmm0, -4(%ebp)
flds -4(%ebp)
faddp %st, %st(1)
leave
ret
---------------------------
The code is almost equivalent, except that the SSE result makes one extra
round trip through the stack (movss to -4(%ebp), then flds) to satisfy the
system ABI requirement that return values are passed through st(0).
In essence, the compiler should just generate the first code sequence,
even if given the flag -mfpmath=387,sse.
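For anyone who wants to reproduce the comparison, a minimal self-contained
variant of the test case might look like the sketch below. The initial values
and the check_foobar helper are assumptions added for illustration; they are
not part of the original test case. The values are chosen so that every
intermediate result is exactly representable in float.

```c
#include <assert.h>

/* Same globals as in the report. The initializers are illustrative
   assumptions, picked so the arithmetic is exact. */
float a[2] = { 1.5f, 2.0f };
float b[2] = { 2.0f, 0.5f };

/* Function under test, as in the report. */
float foobar (void)
{
  return a[0] * b[0]
       + a[1] * b[1];
}

/* Hypothetical helper: checks that both code sequences would have to
   produce the same value, 1.5*2.0 + 2.0*0.5 = 4.0. */
void check_foobar (void)
{
  assert (foobar () == 4.0f);
}
```

Compiling this file with -S and each of the two flag sets quoted above should
allow the two assembly listings to be compared side by side. On an x86-64
host, -m32 is probably needed in addition, since -march=pentium4 is a 32-bit
target.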
W.
--
Summary: Inefficient code with -mfpmath=387,sse
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: bangerth at dealii dot org
CC: gcc-bugs at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766
* [Bug target/18766] Inefficient code with -mfpmath=387,sse
2004-12-01 20:54 [Bug target/18766] New: Inefficient code with -mfpmath=387,sse bangerth at dealii dot org
@ 2004-12-06 21:35 ` pinskia at gcc dot gnu dot org
0 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-06 21:35 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-06 21:34 -------
Confirmed.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed| |1
Last reconfirmed|0000-00-00 00:00:00 |2004-12-06 21:34:55
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766
* [Bug target/18766] Inefficient code with -mfpmath=387,sse
[not found] <bug-18766-102@http.gcc.gnu.org/bugzilla/>
2005-10-24 3:08 ` pinskia at gcc dot gnu dot org
@ 2008-08-03 17:01 ` ubizjak at gmail dot com
1 sibling, 0 replies; 4+ messages in thread
From: ubizjak at gmail dot com @ 2008-08-03 17:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from ubizjak at gmail dot com 2008-08-03 16:59 -------
GNU C (GCC) version 4.4.0 20080803 (experimental) is now much smarter; several
rewrites of math ops now result in:
foobar:
pushl %ebp
movl %esp, %ebp
flds a
fmuls b
flds a+4
fmuls b+4
faddp %st, %st(1)
popl %ebp
ret
So, fixed for 4.4.
--
ubizjak at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
Target Milestone|--- |4.4.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766
* [Bug target/18766] Inefficient code with -mfpmath=387,sse
[not found] <bug-18766-102@http.gcc.gnu.org/bugzilla/>
@ 2005-10-24 3:08 ` pinskia at gcc dot gnu dot org
2008-08-03 17:01 ` ubizjak at gmail dot com
1 sibling, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-10-24 3:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from pinskia at gcc dot gnu dot org 2005-10-24 03:08 -------
What is happening is that the register allocator selects the return
position for the last add, which is an x87 register, so it does the add in
x87 instead of SSE, which causes the rest to go bonkers.
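One way to probe this explanation would be a variant in which the sum is
stored to memory instead of being returned, so that the final add is not tied
to the x87 return register. The sketch below is only an assumption about a
useful experiment; foobar_store, result and check_foobar_store are
hypothetical names, and the thread does not say what code this variant
generates.

```c
#include <assert.h>

/* Same globals as in the report; the initializers are illustrative
   assumptions with exactly representable products. */
float a[2] = { 1.5f, 2.0f };
float b[2] = { 2.0f, 0.5f };

/* Hypothetical destination for the sum. */
float result;

/* Hypothetical variant of foobar: the value goes to memory, so the ABI
   does not require it to end up in st(0). */
void foobar_store (void)
{
  result = a[0] * b[0]
         + a[1] * b[1];
}

/* Hypothetical helper: the stored value must equal 4.0 regardless of
   which register file performs the arithmetic. */
void check_foobar_store (void)
{
  foobar_store ();
  assert (result == 4.0f);
}
```

If the explanation holds, compiling this with -mfpmath=387,sse would be
expected to show whether the mixed x87/SSE sequence still appears when the
return-register constraint is absent.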
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |minor
Keywords| |ra
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766