[Bug other/21195] New: SSE intrinsics not inlined, sometimes.

public inbox for gcc-bugs@sourceware.org

* [Bug other/21195] New: SSE intrinsics not inlined, sometimes.
@ 2005-04-24 18:03 tbptbp at gmail dot com
  2005-04-24 18:05 ` [Bug target/21195] " pinskia at gcc dot gnu dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: tbptbp at gmail dot com @ 2005-04-24 18:03 UTC (permalink / raw)
  To: gcc-bugs

Under some conditions (generally when you upset the inlining heuristics, i.e. by
forcing something to be inlined), SSE intrinsics don't get inlined and some truly
horrible code ensues; the fix, tinkering with params, isn't much prettier.
This has happened to me with various 4.x versions, on x86 and x86-64.

silly testcase:
#include <xmmintrin.h>

static __attribute__ ((always_inline)) bool bloatit(const __m128 a, const __m128 b) {
	const __m128
		v0 = _mm_max_ps(a,b),
		v1 = _mm_min_ps(a,b),
		v2 = _mm_mul_ps(a,b),
		v3 = _mm_div_ps(a,b),
		g0 = _mm_or_ps(_mm_or_ps(_mm_or_ps(v0,v1), v2), v3);

	return _mm_movemask_ps(g0);
}

bool finalblow(const __m128 a, const __m128 b, const __m128 c, const __m128 d,
               const __m128 e, const __m128 f) {
	return bloatit(a,b) & bloatit(c,d) & bloatit(e,f) & bloatit(a,c) & bloatit(b,d)
		& bloatit(c,e) & bloatit(d,f);
}

int main() { return 0; }


At -O3, on x86-64-linux, g++-4120050417 gets funky with:
0000000000400540 <_mm_mul_ps(float __vector, float __vector)>:
  400540:       mulps  %xmm1,%xmm0
  400543:       retq
...
0000000000400550 <_mm_div_ps(float __vector, float __vector)>:
  400550:       divps  %xmm1,%xmm0
  400553:       retq
...
0000000000400560 <_mm_min_ps(float __vector, float __vector)>:
  400560:       minps  %xmm1,%xmm0
  400563:       retq
...
0000000000400570 <_mm_max_ps(float __vector, float __vector)>:
  400570:       maxps  %xmm1,%xmm0
  400573:       retq
...
0000000000400580 <_mm_or_ps(float __vector, float __vector)>:
  400580:       orps   %xmm1,%xmm0
  400583:       retq
...
0000000000400590 <_mm_movemask_ps(float __vector)>:
  400590:       movmskps %xmm0,%eax
  400593:       retq

... only to conclude with this wonder
00000000004005b0 <finalblow(float __vector, float __vector, float __vector, float __vector, float __vector, float __vector)>:
  4005b0:       push   %rbx
  4005b1:       xor    %ebx,%ebx
  4005b3:       sub    $0x1b0,%rsp
  4005ba:       movaps %xmm2,0x180(%rsp)
  4005c2:       movaps %xmm3,0x170(%rsp)
  4005ca:       movaps %xmm4,0x160(%rsp)
  4005d2:       movaps %xmm5,0x150(%rsp)
  4005da:       movaps %xmm1,0x190(%rsp)
  4005e2:       movaps %xmm0,0x1a0(%rsp)
  4005ea:       callq  400550 <_mm_div_ps(float __vector, float __vector)>
  4005ef:       movaps %xmm0,0x140(%rsp)
  4005f7:       movaps 0x190(%rsp),%xmm1
  4005ff:       movaps 0x1a0(%rsp),%xmm0
  400607:       callq  400540 <_mm_mul_ps(float __vector, float __vector)>
  40060c:       movaps 0x190(%rsp),%xmm1
  400614:       movaps %xmm0,0x130(%rsp)
  40061c:       movaps 0x1a0(%rsp),%xmm0
  400624:       callq  400560 <_mm_min_ps(float __vector, float __vector)>
  400629:       movaps 0x190(%rsp),%xmm1
  400631:       movaps %xmm0,0x120(%rsp)
  400639:       movaps 0x1a0(%rsp),%xmm0
  400641:       callq  400570 <_mm_max_ps(float __vector, float __vector)>
  400646:       movaps 0x120(%rsp),%xmm1
  40064e:       callq  400580 <_mm_or_ps(float __vector, float __vector)>
  400653:       movaps 0x130(%rsp),%xmm1
  40065b:       callq  400580 <_mm_or_ps(float __vector, float __vector)>
  400660:       movaps 0x140(%rsp),%xmm1
  400668:       callq  400580 <_mm_or_ps(float __vector, float __vector)>
  40066d:       callq  400590 <_mm_movemask_ps(float __vector)>
  400672:       movaps 0x170(%rsp),%xmm1
etc...


As said earlier, that's just one way to make it happen.
It would be a real plus if those intrinsics were unconditionally inlined.

-- 
           Summary: SSE intrinsics not inlined, sometimes.
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: other
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbptbp at gmail dot com
                CC: gcc-bugs at gcc dot gnu dot org
  GCC host triplet: x86*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195



* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
@ 2005-04-24 18:05 ` pinskia at gcc dot gnu dot org
  2005-04-25  4:15 ` pinskia at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-04-24 18:05 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|other                       |target
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195



* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
  2005-04-24 18:05 ` [Bug target/21195] " pinskia at gcc dot gnu dot org
@ 2005-04-25  4:15 ` pinskia at gcc dot gnu dot org
  2005-04-26 12:47 ` tbptbp at gmail dot com
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-04-25  4:15 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ssemmx


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195



* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
  2005-04-24 18:05 ` [Bug target/21195] " pinskia at gcc dot gnu dot org
  2005-04-25  4:15 ` pinskia at gcc dot gnu dot org
@ 2005-04-26 12:47 ` tbptbp at gmail dot com
  2005-04-26 13:26 ` pinskia at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: tbptbp at gmail dot com @ 2005-04-26 12:47 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From tbptbp at gmail dot com  2005-04-26 12:45 -------
Let's have some more fun.

Take the silly testcase up there, add this:
struct foo_t {
  bool dummy;
  __attribute__ ((always_inline)) foo_t() {}
};

change finalblow into that:
bool finalblow(const __m128 a, const __m128 b, const __m128 c, const __m128 d,
               const __m128 e, const __m128 f) {
  foo_t bar[4];

  return bar[0].dummy &
         bloatit(a,b) & bloatit(c,d) & bloatit(e,f) & bloatit(a,c) &
         bloatit(b,d) & bloatit(c,e) & bloatit(d,f);
}

and with the same compiler & flags you'll get this interesting snippet, from
finalblow:
...
  4005ea:       data16  <-- sure that loop deserves to be aligned
  4005eb:       data16
  4005ec:       nop
  4005ed:       data16
  4005ee:       data16
  4005ef:       nop
  4005f0:       inc    %eax
  4005f2:       cmp    $0x4,%eax
  4005f5:       jne    4005f0 <finalblow(float __vector, float __vector, float __vector, float __vector, float __vector, float __vector)+0x40>
...

In case you're wondering, yes that's the constructor.

Again, that testcase is a bit artificial.
But I've just spent an hour tracking down what was producing such an interesting
aligned empty loop in my app: same symptoms, but triggered differently; the
constructor was empty and not always_inline, but apparently some threshold was
met (lots of inlining around) and ta-da... instant contribution to global
warming for peanuts :)

I'm certainly not qualified, but I'll dare to say that something's fishy wrt
inlining.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195


* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
                   ` (2 preceding siblings ...)
  2005-04-26 12:47 ` tbptbp at gmail dot com
@ 2005-04-26 13:26 ` pinskia at gcc dot gnu dot org
  2005-04-26 14:29 ` tbptbp at gmail dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-04-26 13:26 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-04-26 13:25 -------
(In reply to comment #1)
> and with the same compiler & flags you'll get this interesting snippet, from
> finalblow:
> In case you're wondering, yes that's the constructor.

That is PR 19639.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195



* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
                   ` (3 preceding siblings ...)
  2005-04-26 13:26 ` pinskia at gcc dot gnu dot org
@ 2005-04-26 14:29 ` tbptbp at gmail dot com
  2005-05-05 23:58 ` tbptbp at gmail dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: tbptbp at gmail dot com @ 2005-04-26 14:29 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From tbptbp at gmail dot com  2005-04-26 14:29 -------
Subject: Re:  SSE intrinsics not inlined, sometimes.

On 26 Apr 2005 13:25:20 -0000, pinskia at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> That is PR 19639.
Oh! A patch.

Sorry for the additional noise, but I'm getting a bit overreactive
about that inlining business.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195



* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
                   ` (4 preceding siblings ...)
  2005-04-26 14:29 ` tbptbp at gmail dot com
@ 2005-05-05 23:58 ` tbptbp at gmail dot com
  2005-06-14  8:55 ` steven at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: tbptbp at gmail dot com @ 2005-05-05 23:58 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From tbptbp at gmail dot com  2005-05-05 23:58 -------
For future reference, I'm including my end-user answer to Uros, sent off-list,
regarding always_inline usage.

Here we go:
> I was trying to take a quick look at your bug report regarding
> the always_inline attribute. Just a quick remark - using only a plain static
> inline bool .... fixes the problem for me and at -O3 code looks like it
Doesn't surprise me.

> should. Is there a specific reason to have an attribute always_inline
> declared for the function you would like to inline? (Please note, that
> Jan Hubicka is currently working in this area.)
Yes, because inline alone is in practice next to useless. You say
below that reg<->mem movements are expensive, but my prime concern in
a hot path is branches.
And if you expect code to be inlined (or more precisely, you expect no
function call) then you have no alternative but to use always_inline.
Though once you start using always_inline you upset the compiler and you
step into a world of pain where you have to babysit it for dependent
code with a combo of always_inline/noinline.

In fact, the always_inline/noinline combo is the only kludge for a number
of other problems:
. when gcc goes nuts, they are useful containment measures (so the
silliness doesn't propagate)
. as said earlier, inline being an (ignored) hint, if you have, say, a
member function doing just one op (like those intrinsics in the
testcase), it makes absolutely no sense not to inline it. Ever. Yet
sometimes it happens.
. gcc doesn't like long sequences of branchless vectorized code, which
are quite common, and a static always_inline function is a way to tell
it to look somewhere else.
. those same static always_inline functions are also a way to tell it
to look closer at some portion of code and to try to map its working set
into registers; it also has to do with the lack of an unroll pragma
and generally the lack of any directive to tell the compiler to pay
special attention to specific code.

So in the hot path my code typically ends up being a bunch of
always_inline functions coalesced into a noinline.
For the non-speed-critical path, I leave it up to the compiler. In that
regard, gcc4.x (and specifically gcc4.1) got a lot wiser, perhaps as
good as icc, but obviously not fail-proof :)
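
To make the pattern above concrete, here is a minimal sketch of "a bunch of
always_inline functions coalesced into a noinline". The names any_sign_bit and
hot_path are hypothetical and not taken from any real code; the sketch assumes
an x86 target with SSE enabled and a GCC-compatible compiler.

```cpp
#include <xmmintrin.h>

// Hypothetical helper: one small branchless step of a hot path.
// always_inline asks the compiler to inline it regardless of its heuristics.
static inline __attribute__((always_inline))
int any_sign_bit(const __m128 a, const __m128 b) {
    const __m128 hi = _mm_max_ps(a, b);
    const __m128 lo = _mm_min_ps(a, b);
    // movmskps packs the four lane sign bits into the low 4 bits of an int.
    return _mm_movemask_ps(_mm_or_ps(hi, lo));
}

// Hypothetical hot-path entry point: noinline keeps the inlined helpers
// contained in this one function instead of spreading into its callers.
__attribute__((noinline))
int hot_path(const __m128 a, const __m128 b, const __m128 c) {
    return any_sign_bit(a, b) & any_sign_bit(b, c) & any_sign_bit(a, c);
}
```

Whether the helpers really end up inlined is best checked in the disassembly,
as in the original report; the attributes only state the intent.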

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195


* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
                   ` (5 preceding siblings ...)
  2005-05-05 23:58 ` tbptbp at gmail dot com
@ 2005-06-14  8:55 ` steven at gcc dot gnu dot org
  2005-06-29 16:50 ` stuart at apple dot com
  2005-07-21  8:35 ` uros at kss-loka dot si
  8 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-14  8:55 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-06-14 08:55:35
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195



* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
                   ` (6 preceding siblings ...)
  2005-06-14  8:55 ` steven at gcc dot gnu dot org
@ 2005-06-29 16:50 ` stuart at apple dot com
  2005-07-21  8:35 ` uros at kss-loka dot si
  8 siblings, 0 replies; 10+ messages in thread
From: stuart at apple dot com @ 2005-06-29 16:50 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stuart at apple dot com  2005-06-29 16:49 -------
I marked all the x86 vector intrinsics with always_inline, and this seems to fix both the testcases here.

http://gcc.gnu.org/ml/gcc-cvs/2005-06/msg01059.html
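
For readers who want to see the idea in miniature: the sketch below is not the
patch linked above. It shows the same approach applied to hypothetical
one-operation wrappers (my_mul_ps, my_div_ps) written with the GCC vector
extension, so it does not depend on the contents of xmmintrin.h. The helper
scaled_first_lane is likewise made up for illustration.

```cpp
// GCC/Clang vector extension: four packed floats, similar in spirit to __m128.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical one-operation wrappers. Marking them always_inline means the
// inliner's size/growth heuristics no longer decide whether a call is emitted.
static inline __attribute__((always_inline))
v4sf my_mul_ps(v4sf a, v4sf b) {
    return a * b;  // element-wise multiply
}

static inline __attribute__((always_inline))
v4sf my_div_ps(v4sf a, v4sf b) {
    return a / b;  // element-wise divide
}

// Hypothetical caller: computes (x * s) / s in every lane, returns lane 0.
float scaled_first_lane(float x, float s) {
    const v4sf a = {x, x, x, x};
    const v4sf b = {s, s, s, s};
    const v4sf r = my_div_ps(my_mul_ps(a, b), b);
    return r[0];
}
```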

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195



* [Bug target/21195] SSE intrinsics not inlined, sometimes.
  2005-04-24 18:03 [Bug other/21195] New: SSE intrinsics not inlined, sometimes tbptbp at gmail dot com
                   ` (7 preceding siblings ...)
  2005-06-29 16:50 ` stuart at apple dot com
@ 2005-07-21  8:35 ` uros at kss-loka dot si
  8 siblings, 0 replies; 10+ messages in thread
From: uros at kss-loka dot si @ 2005-07-21  8:35 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2005-07-21 08:28 -------
(In reply to comment #5)
> I marked all the x86 vector intrinsics with always_inline, and this seems to
> fix both the testcases here.

  Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-
                   |                            |cvs/2005-06/msg01059.html
             Status|NEW                         |RESOLVED
           Keywords|                            |patch
         Resolution|                            |FIXED
   Target Milestone|---                         |4.1.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195


