[Bug target/18766] New: Inefficient code with -mfpmath=387,sse

public inbox for gcc-bugs@sourceware.org

* [Bug target/18766] New: Inefficient code with -mfpmath=387,sse
@ 2004-12-01 20:54 bangerth at dealii dot org
  2004-12-06 21:35 ` [Bug target/18766] " pinskia at gcc dot gnu dot org
  0 siblings, 1 reply; 4+ messages in thread
From: bangerth at dealii dot org @ 2004-12-01 20:54 UTC
  To: gcc-bugs

This is spinoff #1 of PR 17619: 
 
Take this simple piece of code: 
--------------------- 
float a[2],b[2];  
  
float foobar () {  
  return a[0] * b[0] 
    + a[1] * b[1];  
}  
--------------------- 
 
Compiled with  
  -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387 
we get this code: 
--------------------- 
	pushl	%ebp 
	movl	%esp, %ebp 
	flds	b 
	fmuls	a 
	flds	b+4 
	fmuls	a+4 
	faddp	%st, %st(1) 
	popl	%ebp 
	ret 
----------------------------- 
That's certainly optimal. 
 
On the other hand, if we also allow the compiler to use SSE registers
(without forcing it to; we simply want the most efficient code), the code
we get with the flags
  -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387,sse 
looks like this: 
----------------------------- 
	pushl	%ebp 
	movl	%esp, %ebp 
	subl	$4, %esp 
	flds	b 
	fmuls	a 
	movss	b+4, %xmm0 
	mulss	a+4, %xmm0 
	movss	%xmm0, -4(%ebp) 
	flds	-4(%ebp) 
	faddp	%st, %st(1) 
	leave 
	ret 
--------------------------- 
The code is almost equivalent, except that it needs one extra store to
the stack and one extra load from it (plus the stack adjustment) to move
the SSE product onto the x87 stack, since the system ABI requires return
values to be passed through st(0).
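
A side note on "almost equivalent": apart from the extra store and reload,
the two sequences need not give bit-identical results. This is not raised
in the report; it follows from mulss rounding its product to single
precision, whereas fmuls keeps the product in the wider x87 format. A
minimal sketch in C, using double as a stand-in for the x87 register
format (an assumption; both helper names are made up for illustration):

```c
/* Product as mulss leaves it: rounded to single precision.
   The volatile store forces that rounding even if the compiler
   evaluates float expressions in a wider format. */
float product_rounded_to_float(float x, float y)
{
    volatile float p = x * y;
    return p;
}

/* Product as fmuls leaves it, modelled with double (assumption):
   two 24-bit significands multiply to at most 48 bits, so the
   product is exact in this wider format. */
double product_kept_wide(float x, float y)
{
    return (double)x * (double)y;
}
```

For x = y = 1 + 2^-23 the two helpers differ by 2^-46, so the mixed
sequence can feed a slightly different operand into the final faddp than
the pure x87 sequence does.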
 
In essence, the compiler should just generate the first code sequence, 
even if given the flag -mfpmath=387,sse. 
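
To compare the two outputs locally, something like the following should
work. The file name pr18766.c and the FOOBAR_DEMO_MAIN macro are made up
for this sketch, the initial values are arbitrary (the testcase above
leaves the arrays zero-initialised), and -m32 is an assumption that is
only needed on a 64-bit host:

```c
/* pr18766.c (hypothetical file name)
 *
 * Generate assembly for both cases with the flags from the report:
 *   FLAGS="-m32 -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4"
 *   gcc $FLAGS -mfpmath=387     -S pr18766.c -o x87.s
 *   gcc $FLAGS -mfpmath=387,sse -S pr18766.c -o mixed.s
 *   diff x87.s mixed.s
 */
#include <stdio.h>

float a[2] = { 1.0f, 2.0f };   /* arbitrary values, not from the report */
float b[2] = { 3.0f, 4.0f };

float foobar(void)
{
    return a[0] * b[0]
         + a[1] * b[1];
}

#ifdef FOOBAR_DEMO_MAIN        /* hypothetical guard for a stand-alone run */
int main(void)
{
    printf("%g\n", foobar());
    return 0;
}
#endif
```

Adding -DFOOBAR_DEMO_MAIN and dropping -S builds a small program that
prints the dot product, which is handy for checking that both variants
still compute the same value for a given input.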
 
W.

-- 
           Summary: Inefficient code with -mfpmath=387,sse
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bangerth at dealii dot org
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766



* [Bug target/18766] Inefficient code with -mfpmath=387,sse
  2004-12-01 20:54 [Bug target/18766] New: Inefficient code with -mfpmath=387,sse bangerth at dealii dot org
@ 2004-12-06 21:35 ` pinskia at gcc dot gnu dot org
  0 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-06 21:35 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-06 21:34 -------
Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-12-06 21:34:55
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766



* [Bug target/18766] Inefficient code with -mfpmath=387,sse
       [not found] <bug-18766-102@http.gcc.gnu.org/bugzilla/>
  2005-10-24  3:08 ` pinskia at gcc dot gnu dot org
@ 2008-08-03 17:01 ` ubizjak at gmail dot com
  1 sibling, 0 replies; 4+ messages in thread
From: ubizjak at gmail dot com @ 2008-08-03 17:01 UTC
  To: gcc-bugs



------- Comment #3 from ubizjak at gmail dot com  2008-08-03 16:59 -------
GNU C (GCC) version 4.4.0 20080803 (experimental) is now much smarter; several
rewrites of the math ops now result in:

foobar:
        pushl   %ebp
        movl    %esp, %ebp
        flds    a
        fmuls   b
        flds    a+4
        fmuls   b+4
        faddp   %st, %st(1)
        popl    %ebp
        ret

So, fixed for 4.4.
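
A regression test for this could look roughly like the sketch below. It is
not taken from the GCC testsuite; the scan pattern (no movss in the output)
is an assumption based on the assembly shown in this report.

```c
/* { dg-do compile } */
/* { dg-require-effective-target ia32 } */
/* { dg-options "-O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387,sse" } */

float a[2], b[2];

float foobar (void)
{
  return a[0] * b[0]
    + a[1] * b[1];
}

/* Assumed check: neither product should take a detour through an
   SSE register and the stack on its way to st(0).  */
/* { dg-final { scan-assembler-not "movss" } } */
```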


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766



* [Bug target/18766] Inefficient code with -mfpmath=387,sse
       [not found] <bug-18766-102@http.gcc.gnu.org/bugzilla/>
@ 2005-10-24  3:08 ` pinskia at gcc dot gnu dot org
  2008-08-03 17:01 ` ubizjak at gmail dot com
  1 sibling, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-10-24  3:08 UTC
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2005-10-24 03:08 -------
What is happening is that the register allocator selects the return
position, which is an x87 register, for the last add, so it does the add in
x87 instead of SSE, which causes the rest to go bonkers.
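
One way to probe this explanation is to take the ABI return register out of
the picture, for example by storing the sum to a global instead of returning
it. The function foobar_store and the variable result below are made up for
this sketch, and what a given GCC version emits for it has not been checked
here:

```c
float a[2], b[2];
float result;                /* hypothetical sink for the sum */

/* Same arithmetic as foobar, but the final add no longer has to end up
   in st(0), so the return position does not constrain where it runs.  */
void foobar_store (void)
{
  result = a[0] * b[0]
    + a[1] * b[1];
}
```

If the explanation holds, the add in this variant is no longer tied to an
x87 register, and comparing its assembly with that of foobar under
-mfpmath=387,sse should show whether the store/reload pair goes away.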


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
           Keywords|                            |ra


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766


