public inbox for gcc-bugs@sourceware.org
* [Bug other/21195] New: SSE intrinsics not inlined, sometimes.
@ 2005-04-24 18:03 tbptbp at gmail dot com
2005-04-24 18:05 ` [Bug target/21195] " pinskia at gcc dot gnu dot org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: tbptbp at gmail dot com @ 2005-04-24 18:03 UTC (permalink / raw)
To: gcc-bugs
Under some conditions (generally if you upset the inlining heuristic, i.e. by
force-inlining something), SSE intrinsics don't get inlined and some truly
horrible code ensues; the fix, tinkering with params, isn't much prettier.
This happened to me with various 4.x versions, on x86 and x86-64.
silly testcase:
#include <xmmintrin.h>
static __attribute__ ((always_inline)) bool bloatit(const __m128 a, const __m128 b) {
    const __m128
        v0 = _mm_max_ps(a,b),
        v1 = _mm_min_ps(a,b),
        v2 = _mm_mul_ps(a,b),
        v3 = _mm_div_ps(a,b),
        g0 = _mm_or_ps(_mm_or_ps(_mm_or_ps(v0,v1), v2), v3);
    return _mm_movemask_ps(g0);
}
bool finalblow(const __m128 a, const __m128 b, const __m128 c, const __m128 d,
               const __m128 e, const __m128 f) {
    return bloatit(a,b) & bloatit(c,d) & bloatit(e,f) & bloatit(a,c) & bloatit(b,d)
         & bloatit(c,e) & bloatit(d,f);
}
int main() { return 0; }
At -O3, on x86-64-linux, g++-4120050417 gets funky with:
0000000000400540 <_mm_mul_ps(float __vector, float __vector)>:
400540: mulps %xmm1,%xmm0
400543: retq
...
0000000000400550 <_mm_div_ps(float __vector, float __vector)>:
400550: divps %xmm1,%xmm0
400553: retq
...
0000000000400560 <_mm_min_ps(float __vector, float __vector)>:
400560: minps %xmm1,%xmm0
400563: retq
...
0000000000400570 <_mm_max_ps(float __vector, float __vector)>:
400570: maxps %xmm1,%xmm0
400573: retq
...
0000000000400580 <_mm_or_ps(float __vector, float __vector)>:
400580: orps %xmm1,%xmm0
400583: retq
...
0000000000400590 <_mm_movemask_ps(float __vector)>:
400590: movmskps %xmm0,%eax
400593: retq
... only to conclude with this wonder
00000000004005b0 <finalblow(float __vector, float __vector, float __vector,
float __vector, float __vector, float __vector)>:
4005b0: push %rbx
4005b1: xor %ebx,%ebx
4005b3: sub $0x1b0,%rsp
4005ba: movaps %xmm2,0x180(%rsp)
4005c2: movaps %xmm3,0x170(%rsp)
4005ca: movaps %xmm4,0x160(%rsp)
4005d2: movaps %xmm5,0x150(%rsp)
4005da: movaps %xmm1,0x190(%rsp)
4005e2: movaps %xmm0,0x1a0(%rsp)
4005ea: callq 400550 <_mm_div_ps(float __vector, float __vector)>
4005ef: movaps %xmm0,0x140(%rsp)
4005f7: movaps 0x190(%rsp),%xmm1
4005ff: movaps 0x1a0(%rsp),%xmm0
400607: callq 400540 <_mm_mul_ps(float __vector, float __vector)>
40060c: movaps 0x190(%rsp),%xmm1
400614: movaps %xmm0,0x130(%rsp)
40061c: movaps 0x1a0(%rsp),%xmm0
400624: callq 400560 <_mm_min_ps(float __vector, float __vector)>
400629: movaps 0x190(%rsp),%xmm1
400631: movaps %xmm0,0x120(%rsp)
400639: movaps 0x1a0(%rsp),%xmm0
400641: callq 400570 <_mm_max_ps(float __vector, float __vector)>
400646: movaps 0x120(%rsp),%xmm1
40064e: callq 400580 <_mm_or_ps(float __vector, float __vector)>
400653: movaps 0x130(%rsp),%xmm1
40065b: callq 400580 <_mm_or_ps(float __vector, float __vector)>
400660: movaps 0x140(%rsp),%xmm1
400668: callq 400580 <_mm_or_ps(float __vector, float __vector)>
40066d: callq 400590 <_mm_movemask_ps(float __vector)>
400672: movaps 0x170(%rsp),%xmm1
etc...
As said earlier, that's just one way to make that happen.
It would be a real plus if those intrinsics could be unconditionally inlined.
--
Summary: SSE intrinsics not inlined, sometimes.
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: other
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tbptbp at gmail dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC host triplet: x86*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: pinskia at gcc dot gnu dot org @ 2005-04-24 18:05 UTC (permalink / raw)
To: gcc-bugs
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|other                       |target
           Keywords|                            |missed-optimization
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: pinskia at gcc dot gnu dot org @ 2005-04-25 4:15 UTC (permalink / raw)
To: gcc-bugs
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ssemmx
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: tbptbp at gmail dot com @ 2005-04-26 12:47 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From tbptbp at gmail dot com 2005-04-26 12:45 -------
Let's have some more fun.
Take the silly testcase up there, add this:
struct foo_t {
    bool dummy;
    __attribute__ ((always_inline)) foo_t() {}
};
change finalblow into that:
bool finalblow(const __m128 a, const __m128 b, const __m128 c, const __m128 d,
               const __m128 e, const __m128 f) {
    foo_t bar[4];
    return bar[0].dummy &
        bloatit(a,b) & bloatit(c,d) & bloatit(e,f) & bloatit(a,c) &
        bloatit(b,d) & bloatit(c,e) & bloatit(d,f);
}
and with the same compiler & flags you'll get this interesting snippet, from
finalblow:
...
4005ea: data16 <-- sure that loop deserves to be aligned
4005eb: data16
4005ec: nop
4005ed: data16
4005ee: data16
4005ef: nop
4005f0: inc %eax
4005f2: cmp $0x4,%eax
4005f5: jne 4005f0 <finalblow(float __vector, float __vector, float
__vector, float __vector, float __vector, float __vector)+0x40>
...
In case you're wondering, yes, that's the constructor.
Again, that testcase is a bit artificial.
But I've just spent an hour tracking down what was producing such an
interesting aligned empty loop in my app: same symptoms, but triggered
differently; the constructor was empty and not always_inline, but apparently
some threshold was met (lots of inlining around) and tada... instant
contribution to global warming for peanuts :)
I'm certainly not qualified, but I'll dare to say that something's fishy wrt
inlining.
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: pinskia at gcc dot gnu dot org @ 2005-04-26 13:26 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-04-26 13:25 -------
(In reply to comment #1)
> and with the same compiler & flags you'll get this interesting snippet, from
> finalblow:
> In case you're wondering, yes that's the constructor.
That is PR 19639.
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: tbptbp at gmail dot com @ 2005-04-26 14:29 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From tbptbp at gmail dot com 2005-04-26 14:29 -------
Subject: Re: SSE intrinsics not inlined, sometimes.
On 26 Apr 2005 13:25:20 -0000, pinskia at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> That is PR 19639.
Oh! A patch.
Sorry for the additional noise, but I'm getting a bit overreactive
about that inlining business.
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: tbptbp at gmail dot com @ 2005-05-05 23:58 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From tbptbp at gmail dot com 2005-05-05 23:58 -------
For future reference, I'm including my end-user offline answer to Uros regarding
always_inline usage.
Here we go:
> I was trying to take a quick look at your bug report regarding the
> always_inline attribute. Just a quick remark - using only a plain static
> inline bool .... fixes the problem for me and at -O3 code looks like it
> should.
Doesn't surprise me.
> Is there a specific reason to have an attribute always_inline
> declared for the function you would like to inline? (Please note, that
> Jan Hubicka is currently working in this area.)
Yes, because inline alone is in practice next to useless. You say
below that reg<->mem movements are expensive, but my prime concern in
a hot path is branches.
And if you expect code to be inlined (or more precisely, you expect no
function call) then you have no alternative but to use always_inline.
Though once you start using always_inline you upset the compiler and you
step into a world of pain where you have to babysit it for dependent
code with a combo of always_inline/noinline.
In fact, the always_inline/noinline combo is the only kludge for a number
of other problems:
. when gcc goes nuts, they are useful containment measures (so the
silliness doesn't propagate)
. as said earlier, inline being an (ignored) hint, if you have, say, a
member function doing just one op (like those intrinsics in the
testcase), it makes absolutely no sense not to inline it. Ever. Yet
sometimes it happens.
. gcc doesn't like long sequences of branchless vectorized code, which
are quite common, and a static always_inline function is a way to tell
it to look somewhere else.
. those same static always_inline functions are also a way to tell it
to look closer at some code portion and to try to map its working set
into registers; it also has to do with the lack of an unroll pragma
and generally the lack of any directive to tell the compiler to pay
special attention to specific code.
So in the hotpath my code typically ends up being a bunch of
always_inline functions coalesced into a noinline.
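A minimal sketch of that hot-path layout, assuming GCC-style attributes and an x86 target with SSE. The names sign_mask, hot_path and hot_path_example are hypothetical and do not come from the application discussed here; the point is only to show small always_inline helpers gathered behind a single noinline entry point.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Hypothetical helper: tiny, branch-free, and forced inline so that no call
// instruction should remain inside the hot path.
static inline __attribute__((always_inline))
int sign_mask(const __m128 a, const __m128 b) {
    const __m128 hi = _mm_max_ps(a, b);
    const __m128 lo = _mm_min_ps(a, b);
    // Bit i of the result is the sign bit of lane i of (hi | lo).
    return _mm_movemask_ps(_mm_or_ps(hi, lo));
}

// Hypothetical hot-path entry point: noinline marks the boundary, so the
// helpers are coalesced here and the result is not inlined any further.
__attribute__((noinline))
int hot_path(const __m128 a, const __m128 b, const __m128 c, const __m128 d) {
    return sign_mask(a, b) & sign_mask(c, d);
}

// Usage example with values whose sign bits are easy to follow.
void hot_path_example() {
    const __m128 neg = _mm_set1_ps(-1.0f);
    const __m128 pos = _mm_set1_ps(1.0f);
    assert(hot_path(neg, pos, pos, neg) == 0xF);  // a negative lane in both pairs
    assert(hot_path(neg, pos, pos, pos) == 0x0);  // second pair has no negative lane
}
```

Whether the helpers really end up without calls still has to be checked in the disassembly, as in the original report.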
For the non-speed-critical path, I leave it up to the compiler. In that
regard, gcc4.x (and specifically gcc4.1) got a lot wiser, perhaps as
good as icc, but obviously not failproof :)
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: steven at gcc dot gnu dot org @ 2005-06-14 8:55 UTC (permalink / raw)
To: gcc-bugs
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-06-14 08:55:35
               date|                            |
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: stuart at apple dot com @ 2005-06-29 16:50 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stuart at apple dot com 2005-06-29 16:49 -------
I marked all the x86 vector intrinsics with always_inline, and this seems to fix both the testcases here.
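For readers who have not looked at the headers, here is a rough sketch of what marking a one-instruction wrapper always_inline looks like. It is not the text of the committed patch: the names sketch_mul_ps, sketch_div_ps and sketch_demo are hypothetical, and the bodies use the compiler's generic vector operators (an assumption that holds for current GCC and Clang) rather than whatever the real header uses.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Hypothetical stand-ins for header intrinsics. The always_inline attribute
// asks the compiler to inline these regardless of its size heuristics.
static inline __attribute__((always_inline))
__m128 sketch_mul_ps(__m128 a, __m128 b) {
    return a * b;  // lane-wise multiply via the vector extension
}

static inline __attribute__((always_inline))
__m128 sketch_div_ps(__m128 a, __m128 b) {
    return a / b;  // lane-wise divide via the vector extension
}

// Usage example: (2 * 3) / 4 in every lane; returns lane 0.
float sketch_demo() {
    const __m128 two = _mm_set1_ps(2.0f);
    const __m128 three = _mm_set1_ps(3.0f);
    const __m128 four = _mm_set1_ps(4.0f);
    const float lane0 = _mm_cvtss_f32(sketch_div_ps(sketch_mul_ps(two, three), four));
    assert(lane0 == 1.5f);
    return lane0;
}
```

With wrappers declared this way, the out-of-line copies seen in the first disassembly should not be emitted, though that is worth confirming against the compiler version at hand.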
http://gcc.gnu.org/ml/gcc-cvs/2005-06/msg01059.html
* [Bug target/21195] SSE intrinsics not inlined, sometimes.
From: uros at kss-loka dot si @ 2005-07-21 8:35 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-07-21 08:28 -------
(In reply to comment #5)
> I marked all the x86 vector intrinsics with always_inline, and this seems to
> fix both the testcases here.
Confirmed.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-cvs/2005-06/msg01059.html
             Status|NEW                         |RESOLVED
           Keywords|                            |patch
         Resolution|                            |FIXED
   Target Milestone|---                         |4.1.0