public inbox for gcc-bugs@sourceware.org
* [Bug target/17619] New: Non-optimal code for -mfpmath=387,sse
@ 2004-09-22 19:19 bangerth at dealii dot org
From: bangerth at dealii dot org @ 2004-09-22 19:19 UTC
  To: gcc-bugs

I know that -mfpmath=387,sse is not considered production quality. 
Nevertheless, I thought I might give it a try. So here's some
example code that computes the scalar product between two 
vectors of length 4: 
-------------------------------- 
struct X { float array[4]; }; 
 
X a,b; 
 
float foobar () { 
  float s = 0; 
  for (unsigned int d=0; d<4; ++d) 
    s += a.array[d] * b.array[d]; 
  return s; 
} 
-------------------------- 
In the following, I will always use compile flags  
  -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 
in addition to whatever setting for -mfpmath is described.
 
With -mfpmath=387 we get this (reasonable) piece of code: 
_Z6foobarv: 
	pushl	%ebp 
	movl	%esp, %ebp 
	flds	b 
	fmuls	a 
	fadds	.LC0 
	flds	b+4 
	fmuls	a+4 
	faddp	%st, %st(1) 
	flds	b+8 
	fmuls	a+8 
	faddp	%st, %st(1) 
	flds	b+12 
	fmuls	a+12 
	faddp	%st, %st(1) 
	popl	%ebp 
	ret 
Here, we load each pair of vector elements, multiply them, and add
the product to the accumulator. The only non-optimal part is that
the initial addition of zero in "fadds	.LC0" could be avoided (.LC0
is the label of a floating-point zero constant).
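
For illustration, here is a minimal source-level sketch of what dropping
that initial addition amounts to: the accumulator is seeded with the first
product instead of zero. The names foobar_peeled and
check_peeled_matches_original are hypothetical and not part of the report.
One caveat, stated as an assumption about why the compiler keeps the add:
under IEEE arithmetic 0 + (-0) yields +0, so the two variants can differ in
the sign of a zero result, and a compiler would presumably need something
like -fno-signed-zeros before making this change itself.

```cpp
#include <cassert>

struct X { float array[4]; };

X a, b;

// Loop as given in the report: the accumulator starts at zero.
float foobar() {
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}

// Hypothetical hand-peeled variant: seed the accumulator with the first
// product, so no addition of zero is needed. May differ from foobar()
// in the sign of a zero result.
float foobar_peeled() {
  float s = a.array[0] * b.array[0];
  for (unsigned int d = 1; d < 4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}

// Fill the globals with small integers (exactly representable as float)
// and check that both variants agree.
void check_peeled_matches_original() {
  for (unsigned int d = 0; d < 4; ++d) {
    a.array[d] = static_cast<float>(d + 1);      // 1, 2, 3, 4
    b.array[d] = static_cast<float>(2 * d + 1);  // 1, 3, 5, 7
  }
  assert(foobar() == foobar_peeled());
}
```

Note that operator== treats +0 and -0 as equal, so the check above would not
catch the signed-zero difference; it only confirms agreement on ordinary
inputs.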
 
If one tries to compile with -mfpmath=sse, one gets very similar 
code, with the exception that multiplications and additions are 
performed in xmm? registers. 
 
However, here comes the catch: I thought that specifying -mfpmath=387,sse
should produce code at least as good as with either unit alone. But I get this:
_Z6foobarv: 
	pushl	%ebp 
	movl	%esp, %ebp 
	subl	$4, %esp 
	flds	b 
	fmuls	a 
	fadds	.LC0 
	movss	b+4, %xmm0 
	mulss	a+4, %xmm0 
	movss	%xmm0, -4(%ebp) 
	flds	-4(%ebp) 
	faddp	%st, %st(1) 
	movss	b+8, %xmm0 
	mulss	a+8, %xmm0 
	movss	%xmm0, -4(%ebp) 
	flds	-4(%ebp) 
	faddp	%st, %st(1) 
	movss	b+12, %xmm0 
	mulss	a+12, %xmm0 
	movss	%xmm0, -4(%ebp) 
	flds	-4(%ebp) 
	faddp	%st, %st(1) 
	leave 
	ret 
That is decidedly not optimal: we compute each product in an xmm
register, but then store it to a stack slot, reload it into an
st(?) register, and accumulate there. Surely the whole thing
can be done without these stack round trips and be more efficient. In
particular, using just -mfpmath=sse shows that this is possible.
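
To make "done without these stack operations" concrete, here is a hedged
sketch that keeps both the products and the accumulator in an xmm register
by using the scalar SSE intrinsics from <xmmintrin.h>. The function name
foobar_sse is hypothetical, and this is not the code GCC generates; it only
shows the data flow one would hope for. On targets without SSE the sketch
falls back to the plain loop. On 32-bit x86 a float return value is passed
in st(0), so a single store and reload at the very end would still be
expected there.

```cpp
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

struct X { float array[4]; };

X a, b;

// Hypothetical variant of foobar(): every multiply and add uses the
// scalar SSE instructions, so no intermediate value has to travel
// through memory to reach the x87 stack.
float foobar_sse() {
#if defined(__SSE__)
  __m128 acc = _mm_setzero_ps();  // accumulator lives in lane 0
  for (unsigned int d = 0; d < 4; ++d) {
    __m128 prod = _mm_mul_ss(_mm_load_ss(&a.array[d]),
                             _mm_load_ss(&b.array[d]));
    acc = _mm_add_ss(acc, prod);  // adds lane 0 only
  }
  return _mm_cvtss_f32(acc);      // extract lane 0
#else
  // Portable fallback when SSE is not available.
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += a.array[d] * b.array[d];
  return s;
#endif
}
```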
 
W.

-- 
           Summary: Non-optimal code for -mfpmath=387,sse
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bangerth at dealii dot org
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17619



Thread overview: 11+ messages
2004-09-22 19:19 [Bug target/17619] New: Non-optimal code for -mfpmath=387,sse bangerth at dealii dot org
2004-09-22 19:33 ` [Bug target/17619] " bangerth at dealii dot org
2004-09-22 21:16 ` pinskia at gcc dot gnu dot org
2004-09-22 21:22 ` bangerth at dealii dot org
2004-09-22 21:25 ` bangerth at dealii dot org
2004-09-22 21:35 ` bangerth at dealii dot org
2004-12-01 14:07 ` uros at gcc dot gnu dot org
2004-12-01 14:27 ` pinskia at gcc dot gnu dot org
2004-12-01 16:03 ` uros at gcc dot gnu dot org
2004-12-01 20:49 ` bangerth at dealii dot org
2004-12-01 20:59 ` bangerth at dealii dot org
