public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug other/21195] New: SSE intrinsics not inlined, sometimes.
@ 2005-04-24 18:03 tbptbp at gmail dot com
  2005-04-24 18:05 ` [Bug target/21195] " pinskia at gcc dot gnu dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: tbptbp at gmail dot com @ 2005-04-24 18:03 UTC (permalink / raw)
  To: gcc-bugs

Under some conditions (generally if you upset the inlining heuristic ie by force
inlining something), SSE intrinsics don't get inlined and some truely horrible
code ensues; the fix, tinkering with params, isn't much prettier.
Happened to me with various 4.x versions, on x86 or x86-64.

silly testcase:
#include <xmmintrin.h>



static __attribute__ ((always_inline)) bool bloatit(const __m128 a, const __m128
b) {

	const __m128

		v0 = _mm_max_ps(a,b),

		v1 = _mm_min_ps(a,b),

		v2 = _mm_mul_ps(a,b),

		v3 = _mm_div_ps(a,b),

		g0 = _mm_or_ps(_mm_or_ps(_mm_or_ps(v0,v1), v2), v3);

	

	return _mm_movemask_ps(g0);

}



bool finalblow(const __m128 a, const __m128 b, const __m128 c, const __m128 d,
const __m128 e, const __m128 f) {

	return bloatit(a,b) & bloatit(c,d) & bloatit(e,f) & bloatit(a,c) & bloatit(b,d)
& bloatit(c,e) & bloatit(d,f);

}


int main() { return 0; }


At -O3, on x86-64-linux, g++-4120050417 gets funky with:
0000000000400540 <_mm_mul_ps(float __vector, float __vector)>:
  400540:       mulps  %xmm1,%xmm0
  400543:       retq
...
0000000000400550 <_mm_div_ps(float __vector, float __vector)>:
  400550:       divps  %xmm1,%xmm0
  400553:       retq
...
0000000000400560 <_mm_min_ps(float __vector, float __vector)>:
  400560:       minps  %xmm1,%xmm0
  400563:       retq
...
0000000000400570 <_mm_max_ps(float __vector, float __vector)>:
  400570:       maxps  %xmm1,%xmm0
  400573:       retq
...
0000000000400580 <_mm_or_ps(float __vector, float __vector)>:
  400580:       orps   %xmm1,%xmm0
  400583:       retq
...
0000000000400590 <_mm_movemask_ps(float __vector)>:
  400590:       movmskps %xmm0,%eax
  400593:       retq

... only to conclude with this wonder
00000000004005b0 <finalblow(float __vector, float __vector, float __vector,
float __vector, float __vector, float __vector)>:
  4005b0:       push   %rbx
  4005b1:       xor    %ebx,%ebx
  4005b3:       sub    $0x1b0,%rsp
  4005ba:       movaps %xmm2,0x180(%rsp)
  4005c2:       movaps %xmm3,0x170(%rsp)
  4005ca:       movaps %xmm4,0x160(%rsp)
  4005d2:       movaps %xmm5,0x150(%rsp)
  4005da:       movaps %xmm1,0x190(%rsp)
  4005e2:       movaps %xmm0,0x1a0(%rsp)
  4005ea:       callq  400550 <_mm_div_ps(float __vector, float __vector)>
  4005ef:       movaps %xmm0,0x140(%rsp)
  4005f7:       movaps 0x190(%rsp),%xmm1
  4005ff:       movaps 0x1a0(%rsp),%xmm0
  400607:       callq  400540 <_mm_mul_ps(float __vector, float __vector)>
  40060c:       movaps 0x190(%rsp),%xmm1
  400614:       movaps %xmm0,0x130(%rsp)
  40061c:       movaps 0x1a0(%rsp),%xmm0
  400624:       callq  400560 <_mm_min_ps(float __vector, float __vector)>
  400629:       movaps 0x190(%rsp),%xmm1
  400631:       movaps %xmm0,0x120(%rsp)
  400639:       movaps 0x1a0(%rsp),%xmm0
  400641:       callq  400570 <_mm_max_ps(float __vector, float __vector)>
  400646:       movaps 0x120(%rsp),%xmm1
  40064e:       callq  400580 <_mm_or_ps(float __vector, float __vector)>
  400653:       movaps 0x130(%rsp),%xmm1
  40065b:       callq  400580 <_mm_or_ps(float __vector, float __vector)>
  400660:       movaps 0x140(%rsp),%xmm1
  400668:       callq  400580 <_mm_or_ps(float __vector, float __vector)>
  40066d:       callq  400590 <_mm_movemask_ps(float __vector)>
  400672:       movaps 0x170(%rsp),%xmm1
etc...


As said earlier, that's just one way to make that happen.
It would be a real plus if those intrinsics could be inconditionnaly inlined.

-- 
           Summary: SSE intrinsics not inlined, sometimes.
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: other
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbptbp at gmail dot com
                CC: gcc-bugs at gcc dot gnu dot org
  GCC host triplet: x86*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-07-21  8:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
2005-04-24 18:05 ` [Bug target/21195] " pinskia at gcc dot gnu dot org
2005-04-25  4:15 ` pinskia at gcc dot gnu dot org
2005-04-26 12:47 ` tbptbp at gmail dot com
2005-04-26 13:26 ` pinskia at gcc dot gnu dot org
2005-04-26 14:29 ` tbptbp at gmail dot com
2005-05-05 23:58 ` tbptbp at gmail dot com
2005-06-14  8:55 ` steven at gcc dot gnu dot org
2005-06-29 16:50 ` stuart at apple dot com
2005-07-21  8:35 ` uros at kss-loka dot si

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).