public inbox for gcc-bugs@sourceware.org
* [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-05 1:31 UTC (permalink / raw)
To: gcc-bugs
PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12902
Summary: Invalid assembly generated when using SSE / xmmintrin.h
Product: gcc
Version: 3.3.2
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: kbowers at lanl dot gov
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
I have a code which heavily uses SSE. Some new functions written for the code
seg-faulted due to an unaligned "movaps" instruction. I stripped one of those
functions down as far as I could to make a self-contained program which
reproduces the bug. That program is attached to the end of this bug report. I
apologize in advance that I could not find a simpler faulting program.
It generates invalid assembly when compiled with gcc-3.3 and gcc-3.3.2 at
optimization levels -O, -O1, -O2 and -O3.
The following commands can demonstrate the bug.
% g++-3.3.2 -v
Reading specs from
/home/kbowers/local/bin/../lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: ../gcc-3.3.2/configure --prefix=/local_home/kbowers
--program-suffix=-3.3.2
Thread model: posix
gcc version 3.3.2
% g++-3.3.2 -S -fverbose-asm -O -msse compiler_bug.cpp
% cat compiler_bug.s
... snip to around line 45 ...
movl 8(%ebp), %eax # a
movlps (%eax), %xmm3 # <anonymous>
movaps 8(%eax), %xmm0
movlps %xmm0, -200(%ebp)
movhps 16(%eax), %xmm3 # <anonymous>
movaps 24(%eax), %xmm2
... snip ...
The first instruction loads the value of the "a" pointer into eax. When the loop
is run, "a" is a 16-byte aligned pointer. The third instruction tries to do an
aligned 16-byte load from a non-aligned address.
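The alignment arithmetic behind this fault can be checked in a few lines of C++. This is only an illustrative sketch: is_aligned and effective_address are hypothetical helper names, not part of the attached program or of any library, and the base value used in the note below is an arbitrary 16-byte aligned stand-in for the pointer "a".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if addr is a multiple of alignment (alignment must be a power of two).
inline bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

// Hypothetical helper: models the effective address of an operand such as
// 8(%eax), i.e. base register plus displacement.
inline std::uintptr_t effective_address(std::uintptr_t base, std::ptrdiff_t disp) {
    return base + static_cast<std::uintptr_t>(disp);
}
```

With a base of 0x1000, effective_address(base, 8) is aligned to 8 bytes but not to 16. An 8-byte movlps access is therefore fine at 8(%eax), while movaps, which requires a 16-byte aligned memory operand, is not.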
Thanks.
---- BEGIN PROGRAM ----
#include <xmmintrin.h>
typedef struct {
  int i;
  float f[3];
} a_t;
typedef struct {
  float f[8];
} b_t;
typedef union {
  int i[4];
  float f[4];
  __m128 v;
} vector4;
inline void swizzle( const void *a0, const void *a1,
                     const void *a2, const void *a3,
                     vector4 &a, vector4 &b, vector4 &c, vector4 &d ) {
  __m128 t, u;
  a.v = _mm_loadl_pi(a.v, (__m64 *)a0);
  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1);
  a.v = _mm_loadh_pi(a.v, (__m64 *)a1);
  c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1);
  b.v = a.v;
  d.v = c.v;
  t = _mm_loadl_pi(b.v, (__m64 *)a2); // b.v to avoid warn
  u = _mm_loadl_pi(d.v,((__m64 *)a2)+1); // d.v to avoid warn
  t = _mm_loadh_pi(t, (__m64 *)a3);
  u = _mm_loadh_pi(u,((__m64 *)a3)+1);
  a.v = _mm_shuffle_ps(a.v,t,0x88);
  b.v = _mm_shuffle_ps(b.v,t,0xdd);
  c.v = _mm_shuffle_ps(c.v,u,0x88);
  d.v = _mm_shuffle_ps(d.v,u,0xdd);
}
void foo( const a_t *a, const b_t *b, int n ) {
  vector4 ai, a0, a1, a2, b0, b1, v0, v1, v2;
  __m128 *p0, *p1, *p2, *p3;
  for(;n;n--,a+=4) {
    swizzle(a,a+1,a+2,a+3,ai,a0,a1,a2);
    p0 = (__m128 *)(b + ai.i[0]);
    p1 = (__m128 *)(b + ai.i[1]);
    p2 = (__m128 *)(b + ai.i[2]);
    p3 = (__m128 *)(b + ai.i[3]);
    swizzle(p0++,p1++,p2++,p3++,b0,v0,v1,v2);
    b0.v = _mm_add_ps( _mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)),
                       _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v))));
    swizzle(p0,p1,p2,p3,b1,v0,v1,v2);
    b1.v = _mm_add_ps( _mm_add_ps(b1.v,_mm_mul_ps(a2.v,v0.v)),
                       _mm_mul_ps(a0.v,_mm_add_ps(v1.v,_mm_mul_ps(a2.v,v2.v))));
  }
}
---- END PROGRAM ----
* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: pinskia at gcc dot gnu dot org @ 2003-11-05 4:08 UTC (permalink / raw)
To: gcc-bugs
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
------- Additional Comments From pinskia at gcc dot gnu dot org 2003-11-05 04:08 -------
The "a" in the asm corresponds to the variable "a" in foo, not the "a" in the inlined function swizzle.
So the movaps is coming from:
c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); <--- 8(%eax)
Is that really your problem, or is it because main's stack is unaligned and you inlined functions into
main that use SSE?
* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-05 5:51 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kbowers at lanl dot gov 2003-11-05 05:51 -------
Subject: Re: Invalid assembly generated when using SSE / xmmintrin.h
pinskia at gcc dot gnu dot org wrote:
> The "a" in the asm corresponds to the variable "a" in foo, not the "a" in the inlined function swizzle.
> So the movaps is coming from:
> c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); <--- 8(%eax)
>
> Is that really your problem, or is it because main's stack is unaligned and you inlined functions into
> main that use SSE?
I don't think this is a stack alignment problem. The memory pointed to
by "swizzle:a0"/"foo:a" is not on the stack and is 16-byte aligned (a0
is the pointer "a" in foo). Also, _mm_loadl_pi is intended for 8-byte
loads from 8-byte aligned memory, which ((__m64 *)a0)+1 is.
I think gcc has a bug in the implementation of the
__builtin_ia32_loadlps and __builtin_ia32_loadhps intrinsics when
dealing with xmm registers stored in a stack temporary.
Here is foo again with the b1 swizzle / math removed.
void foo( const a_t *a, const b_t *b, int n ) {
  vector4 ai, a0, a1, a2, b0, v0, v1, v2;
  __m128 *p0, *p1, *p2, *p3;
  for(;n;n--,a+=4) {
    swizzle(a,a+1,a+2,a+3,ai,a0,a1,a2);
    p0 = (__m128 *)(b + ai.i[0]);
    p1 = (__m128 *)(b + ai.i[1]);
    p2 = (__m128 *)(b + ai.i[2]);
    p3 = (__m128 *)(b + ai.i[3]);
    swizzle(p0++,p1++,p2++,p3++,b0,v0,v1,v2);
    b0.v = _mm_add_ps(_mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)),
                      _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v))));
  }
}
Here is the relevant assembly snippet:
% gcc-3.3.2 -S -fverbose-asm -O -msse bug_12902_followup.cpp
% cat bug_12902_followup.s
... snip to around line 45 ...
movaps -40(%ebp), %xmm3 # <variable>.v, <anonymous>
movlps (%esi), %xmm3 # * a, <anonymous>
movlps 8(%esi), %xmm6 # <anonymous>
movhps 16(%esi), %xmm3 # <anonymous>
movhps 24(%esi), %xmm6 # <anonymous>
movaps %xmm6, %xmm4 # <anonymous>, <anonymous>
movaps %xmm3, %xmm1 # <anonymous>, <anonymous>
movlps 32(%esi), %xmm1 # <anonymous>
movaps %xmm6, %xmm2 # <anonymous>, <anonymous>
movlps 40(%esi), %xmm2 # <anonymous>
movhps 48(%esi), %xmm1 # <anonymous>
movhps 56(%esi), %xmm2 # <anonymous>
movaps %xmm3, %xmm0 # <anonymous>
... snip ...
This is the correct assembly and the function does not crash. I also
have much more complex versions of the loop which do not crash (in those
cases, I guess I got lucky with the register spills).
The third instruction above corresponds to the line you cited:
c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1);
I tried to work out what gcc is doing in the original foo, and I think
I've figured out the problem. Here is a translation of the first
swizzle assembly from the original foo():
# a.v = _mm_loadl_pi(a.v, (__m64 *)a0);
* a.v is in xmm3
movlps (%eax), %xmm3
# c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); ... FAULTS
* c.v is in -200(%ebp)
movaps 8(%eax), %xmm0
movlps %xmm0, -200(%ebp)
# a.v = _mm_loadh_pi(a.v, (__m64 *)a1);
* a.v is in xmm3
movhps 16(%eax), %xmm3
# c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1); ... WOULD FAULT
* c.v is in -200(%ebp)
movaps 24(%eax), %xmm2
movhps %xmm2, -200(%ebp)
# b.v = a.v;
* b.v is in xmm3 (a.v and b.v are copies)
# d.v = c.v;
* d.v is in xmm4 (d.v and c.v are copies)
movaps -200(%ebp), %xmm4
# t = _mm_loadl_pi(b.v, (__m64 *)a2);
* t is in xmm1
movaps %xmm3, %xmm1
movlps 32(%eax), %xmm1
# u = _mm_loadl_pi(d.v,((__m64 *)a2)+1);
* u is in xmm2
movaps %xmm4, %xmm2
movlps 40(%eax), %xmm2
# t = _mm_loadh_pi(t, (__m64 *)a3);
* t is in xmm1
movhps 48(%eax), %xmm1
# u = _mm_loadh_pi(u,((__m64 *)a3)+1);
* u is in xmm2
movhps 56(%eax), %xmm2
# a.v = _mm_shuffle_ps(a.v,t,0x88);
* a.v is stored in "foo:ai"
* b.v no longer shares xmm3 with a.v
movaps %xmm3, %xmm0
shufps $136, %xmm1, %xmm0
movaps %xmm0, -40(%ebp)
# b.v = _mm_shuffle_ps(b.v,t,0xdd);
* b.v is stored in "foo:a0"
shufps $221, %xmm1, %xmm3
movaps %xmm3, -216(%ebp)
# c.v = _mm_shuffle_ps(c.v,u,0x88);
* c.v is stored in "foo:a1"
* d.v no longer shares xmm4 with c.v
movaps %xmm4, %xmm0
shufps $136, %xmm2, %xmm0
movaps %xmm0, -200(%ebp)
# d.v = _mm_shuffle_ps(d.v,u,0xdd);
* d.v is stored in "foo:a2"
shufps $221, %xmm2, %xmm4
movaps %xmm4, -232(%ebp)
_mm_loadl_pi (__builtin_ia32_loadlps) and _mm_loadh_pi (__builtin_ia32_loadhps)
seem to work fine when doing an m64 -> xmm register transfer. However, when
the destination xmm value lives in a stack temporary, they emit invalid
assembly; that is where the two faulting instructions come from. The
above assembly sequence makes sense if the following changes are made:
# c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); ... FIXED
* c.v is in -200(%ebp)
movaps -200(%ebp), %xmm0 # Retrieve c.v from stack
movlps 8(%eax), %xmm0 # A valid 64-bit load
movaps %xmm0, -200(%ebp) # Store modified c.v onto stack
and
# c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1); ... FIXED
* c.v is in -200(%ebp)
movaps -200(%ebp), %xmm2 # Retrieve c.v from stack
movhps 24(%eax), %xmm2 # A valid 64-bit load
movaps %xmm2, -200(%ebp) # Store modified c.v onto stack
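The corrected sequences are a read-modify-write on the spilled vector: fetch all four lanes from the stack slot, overwrite two of them with the 8 bytes from memory, and store the result back. A scalar C++ model of that data flow is sketched below. Vec4, load_low, load_high and merge_low_into_spill are hypothetical names chosen for this illustration; they model the lane movement only and are not the intrinsics or anything gcc emits.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // hypothetical stand-in for an xmm register

// Models "movlps mem64, xmm": lanes 0-1 come from memory, lanes 2-3 are kept.
inline Vec4 load_low(Vec4 reg, const float* mem64) {
    assert(mem64 != nullptr);
    reg[0] = mem64[0];
    reg[1] = mem64[1];
    return reg;
}

// Models "movhps mem64, xmm": lanes 2-3 come from memory, lanes 0-1 are kept.
inline Vec4 load_high(Vec4 reg, const float* mem64) {
    assert(mem64 != nullptr);
    reg[2] = mem64[0];
    reg[3] = mem64[1];
    return reg;
}

// Models the corrected three-instruction sequence for a vector that lives
// in a stack slot. Only two floats (8 bytes) are ever read from mem64.
inline void merge_low_into_spill(Vec4& spill_slot, const float* mem64) {
    Vec4 tmp = spill_slot;       // movaps -200(%ebp), %xmm0
    tmp = load_low(tmp, mem64);  // movlps 8(%eax), %xmm0
    spill_slot = tmp;            // movaps %xmm0, -200(%ebp)
}
```

The point of the model is that the source operand is only ever read as 8 bytes, so its alignment to 16 bytes never matters; the 16-byte accesses touch the stack slot alone.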
Thanks for your prompt reply.
* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-06 18:53 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kbowers at lanl dot gov 2003-11-06 18:53 -------
I replied to the comments a couple of days ago but the bug status is still
marked as waiting.
* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-07 1:02 UTC (permalink / raw)
To: gcc-bugs
kbowers at lanl dot gov changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
------- Additional Comments From kbowers at lanl dot gov 2003-11-07 01:02 -------
I've done some further diagnostics and I think the instruction pattern matching
is confused about when to emit a memory-store movlps/movhps versus a memory-load
movlps/movhps.
Here is a snippet from gcc/config/i386/i386.c:ix86_expand_builtin() around
line 13470. Given this is my first serious look into gcc's internals, I've
annotated it with my best guess as to what it is doing:
// op0 is the "__A" of the xmmintrin.h:_mm_loadl_pi
// op1 is the "__P" of the xmmintrin.h:_mm_loadl_pi
// If A is not a nonimmediate V4SF-mode operand, copy A into a
// V4SF-mode register temporary
if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
op0 = copy_to_mode_reg (mode0, op0);
// Copy P into a register and mark that register as a V4SF-mode memory
// operand
op1 = gen_rtx_MEM (mode1, copy_to_mode_reg (Pmode, op1));
// Create a temporary V4SF-mode register target if one of the following
// is true:
// - There is no return target
// - The target is not a V4SF-mode operand
// - The target is not a nonimmediate V4SF-mode operand
if (target == 0
|| GET_MODE (target) != tmode
|| ! (*insn_data[icode].operand[0].predicate) (target, tmode))
target = gen_reg_rtx (tmode);
// Create the appropriate RTL instructions.
// Return failure if we could not generate the RTL.
// Otherwise, emit the RTL and return where the result went.
//
// In the LOADLPS case, GEN_FCN(icode) calls gen_sse_movlps.
// For this case, the "pat =" line is equivalent to:
// pat = gen_rtx_SET( VOIDmode, target,
// gen_rtx_fmt_eee( VEC_MERGE, V4SFmode, op0, op1, GEN_INT(3) ));
pat = GEN_FCN (icode) (target, op0, op1);
if (! pat)
return 0;
emit_insn (pat);
return target;
Here is the movlps description from gcc/config/i386/i386.md (line 18549):
(define_insn "sse_movlps"
[(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,m")
(vec_merge:V4SF
(match_operand:V4SF 1 "nonimmediate_operand" "0,0")
(match_operand:V4SF 2 "nonimmediate_operand" "m,x")
(const_int 3)))]
"TARGET_SSE
&& (GET_CODE (operands[1]) == MEM || GET_CODE (operands[2]) == MEM)"
"movlps\t{%2, %0|%0, %2}"
[(set_attr "type" "ssecvt")
(set_attr "mode" "V4SF")])
If my annotations are correct, how does the instruction pattern matching
determine if a memory-store "movlps" or a memory-load "movlps" is emitted in the
case where both "target" and "op0" are memory operands? If the instruction
pattern matching was confused and picked the storing movlps case instead of the
loading movlps case, it would explain the fault I am seeing.
So, should the movlps/movhps descriptions in the i386 machine description be
split into separate load and store cases to eliminate this ambiguity?
For example, maybe something like this for the loadlps case:
(define_insn "sse_loadlps"
[(set (match_operand:V4SF 0 "nonimmediate_operand" "=x")
(vec_merge:V4SF
(match_operand:V4SF 1 "nonimmediate_operand" "0")
(match_operand:V4SF 2 "nonimmediate_operand" "m")
(const_int 3)))]
"TARGET_SSE
&& (GET_CODE (operands[1]) == MEM || GET_CODE (operands[2]) == MEM)"
"movlps\t{%2, %0|%0, %2}"
[(set_attr "type" "ssecvt")
(set_attr "mode" "V4SF")])
* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-07 10:42 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kbowers at lanl dot gov 2003-11-07 10:41 -------
I think I've found and fixed the bug. The bug appears to be the ambiguity issue
from my prior comment. I replaced the mov[h,l]p[s,d] instructions with more
appropriate versions as mentioned in my prior note. The attachments are the
diffs of the changes I made to i386.c and i386.md that allowed the original
program to compile properly:
$ gcc-3.3.2a -S -fverbose-asm -O -msse bug_12902.cpp
$ cat bug_12902.s
... snip ...
.L45:
movaps -40(%ebp), %xmm0 # <variable>.v, __A
movl 8(%ebp), %eax # a
movlps (%eax), %xmm0 # <anonymous>
movaps -200(%ebp), %xmm1
movlps 8(%eax), %xmm1
movhps 16(%eax), %xmm0 # <anonymous>
movhps 24(%eax), %xmm1
movaps %xmm1, -200(%ebp)
I don't know if the given diffs are the ideal way to make these modifications
but they seem to work. I learned the .md format about an hour ago by examining
other SSE instructions and so it is quite possible I've done it all wrong. To
that end, I am mildly concerned that load[h,l]p[s,d] need to be split like
sse_load_ss / sse_load_ss_1 to properly initialize some elements of the
vec_duplicate:V4SF.
* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-12-09 18:01 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kbowers at lanl dot gov 2003-12-09 18:01 -------
Over a month ago, I reported this bug and then went to the effort of learning
gcc's internals in order to submit patches. However, the bug is still listed as
new and there has been no response for roughly a month.
Will these patches (or some other fix) be incorporated into a future release of gcc?
The reason I ask is it is unreasonable for me to require users to both patch and
build their own compiler to install some scientific codes of mine on large
clusters. I would like to see this bug resolved quickly so I can continue to
recommend gcc as the preferred compiler. Given I've already submitted patches,
this should be easy to address.
* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: dhazeghi at yahoo dot com @ 2003-12-09 20:14 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dhazeghi at yahoo dot com 2003-12-09 20:14 -------
Would you mind regenerating the patches with diff -up (as requested at
http://gcc.gnu.org/contribute.html), and then pinging gcc-patches@gcc.gnu.org
(preferably including a changelog, and noting that you don't have cvs write-access)?
Regarding your general comment, gcc is more or less a volunteer project, with all the caveats
involved. Ideally people would scan the bug database for new bugs and patches as they
come in, but the fact is that most gcc folk are quite busy. I can't guarantee your patch will be
approved (I have no authority), but if you send it to the list (with the changes I mentioned), it'll be a
lot more likely. I admit it isn't the best development project. But it's what we have
(suggestions/comments are welcome).
--
What |Removed |Added
----------------------------------------------------------------------------
Component|c++ |target
Keywords| |patch, wrong-code
* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: dhazeghi at yahoo dot com @ 2003-12-09 20:17 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From dhazeghi at yahoo dot com 2003-12-09 20:17 -------
s/project/model... Oh, and if you really want a prompt response, you can add the relevant
MAINTAINER (from gcc/MAINTAINERS) to the cc: list for a bug. Not necessarily recommended but...
* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: bangerth at dealii dot org @ 2003-12-11 16:07 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From bangerth at dealii dot org 2003-12-11 16:07 -------
I think Jan has done most of the sse work, so I'll CC him. Jan: here's
a patch that you may want to look into.
W.
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |hubicka at gcc dot gnu dot
| |org
* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: bangerth at dealii dot org @ 2004-12-13 20:54 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From bangerth at dealii dot org 2004-12-13 20:54 -------
I really have not much of an idea what I am doing here, but this
is a shorter testcase:
-------------------------
#include <xmmintrin.h>
typedef struct {
  int i;
  float f[3];
} a_t;
typedef struct {
  float f[8];
} b_t;
typedef union {
  int i[4];
  float f[4];
  __m128 v;
} vector4;
void swizzle( const void *a0, const void *a1,
              const void *a2, const void *a3,
              vector4 *a, vector4 *b, vector4 *c, vector4 *d ) {
  __m128 t, u;
  a->v = _mm_loadl_pi(a->v, (__m64 *)a0);
  c->v = _mm_loadl_pi(c->v,((__m64 *)a0)+1);
  a->v = _mm_loadh_pi(a->v, (__m64 *)a1);
  c->v = _mm_loadh_pi(c->v,((__m64 *)a1)+1);
  t = _mm_loadl_pi(b->v, (__m64 *)a2);
  u = _mm_loadl_pi(d->v,((__m64 *)a2)+1);
  a->v = _mm_shuffle_ps(a->v,t,0);
  b->v = _mm_shuffle_ps(b->v,t,0);
  c->v = _mm_shuffle_ps(c->v,u,0);
  d->v = _mm_shuffle_ps(d->v,u,0);
}
int main () {
  a_t a[128];
  b_t b[128];
  vector4 ai, a0, a1, a2, b0, v0, v1, v2;
  __m128 *p0, *p1, *p2, *p3;
  int n = 1;
  for(;n;n--) {
    swizzle(a,a+1,a+2,a+3,&ai,&a0,&a1,&a2);
    p0 = (__m128 *)(b + ai.i[0]);
    p1 = (__m128 *)(b + ai.i[1]);
    p2 = (__m128 *)(b + ai.i[2]);
    p3 = (__m128 *)(b + ai.i[3]);
    swizzle(p0++,p1++,p2++,p3++,&b0,&v0,&v1,&v2);
    _mm_add_ps(_mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)),
               _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v))));
  }
}
-----------------------------
It fails on 3.4 and mainline, but not with icc:
g/x> /home/bangerth/bin/gcc-3.4.*-pre/bin/g++ -O -msse2 -g x.cc ; ./a.out
Segmentation fault
g/x> /home/bangerth/bin/gcc-4.*-pre/bin/g++ -O -msse2 -g x.cc ; ./a.out
Segmentation fault
g/x> icc x.cc ; ./a.out
On mainline, I get this from a gdb session:
(gdb) r
Starting program: /home/bangerth/tmp/g/x/a.out
Program received signal SIGSEGV, Segmentation fault.
swizzle (a0=0xbfffe250, a1=0xbfffe260, a2=0xbfffe270, a3=0xbfffe280,
a=0xbfffeac0, b=0xbfffeab0, c=0xbfffeaa0, d=0xbfffea90) at x.cc:23
(gdb) i reg
[...]
eip 0x80483ba 0x80483ba
(gdb) disass
0x080483ba <_Z7swizzlePKvS0_S0_S0_P7vector4S2_S2_S2_+38>: movaps 0x8(%esi),%xmm0
As I said, I have no idea what this program does, or whether it is still
well-formed after my attempts to reduce it. Maybe it helps anyway.
W.
* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: uros at kss-loka dot si @ 2004-12-14 10:54 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2004-12-14 10:52 -------
The problem here is in the combiner, which, in combination with the reload pass,
produces an incorrect pattern.
The line that segfaults is:
c->v = _mm_loadl_pi(c->v,((__m64 *)a0)+1);
This line is represented by the following RTL sequence (pr12902.c.00.expand):
(insn 26 24 27 1 (parallel [
(set (reg:SI 80)
(plus:SI (reg:SI 70 [ a0.26 ])
(const_int 8 [0x8])))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(nil))
(insn 27 26 28 1 (set (reg:SI 81)
(reg:SI 80)) -1 (nil)
(nil))
(insn 28 27 30 1 (set (reg:V4SF 60 [ D.3679 ])
(vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16
A128])
(mem:V4SF (reg:SI 81) [0 S16 A8])
(const_int 3 [0x3]))) -1 (nil)
(nil))
(insn 30 28 32 1 (set (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16 A128])
(reg:V4SF 60 [ D.3679 ])) -1 (nil)
(nil))
This whole sequence is combined into one RTL insn (pr12902.c.17.combine) that
satisfies "sse_movlps" pattern constraints:
(insn 30 28 35 0 (set (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16 A128])
(vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16
A128])
(mem:V4SF (plus:SI (reg/v/f:SI 71 [ a0 ])
(const_int 8 [0x8])) [0 S16 A8])
(const_int 3 [0x3]))) 541 {sse_movlps} (insn_list:REG_DEP_TRUE 12 (nil))
(expr_list:REG_DEAD (reg/v/f:SI 71 [ a0 ])
(nil)))
Following this, reload generates what it thinks is the best reg/mem combination
to satisfy the register constraints of the "sse_movlps" pattern (pr12902.c.24.postreload):
(insn 80 28 30 0 (set (reg:V4SF 21 xmm0)
(mem:V4SF (plus:SI (reg/v/f:SI 4 si [orig:71 a0 ] [71])
(const_int 8 [0x8])) [0 S16 A8])) 509 {movv4sf_internal} (nil)
(nil))
(insn:HI 30 80 35 0 (set (mem/s:V4SF (reg/v/f:SI 1 dx [orig:77 c ] [77]) [0
<variable>.v+0 S16 A128])
(vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 1 dx [orig:77 c ] [77]) [0
<variable>.v+0 S16 A128])
(reg:V4SF 21 xmm0)
(const_int 3 [0x3]))) 541 {sse_movlps} (insn_list:REG_DEP_TRUE 12 (nil))
(nil))
Unfortunately, insn 80 will crash because it results in an unaligned load:
...
movaps 8(%esi), %xmm0 <- crash here
movlps %xmm0, (%edx)
...
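To make the difference between the two forms concrete, here is a toy C++ model of the two loads. It is only a sketch under stated assumptions: Vec4, movaps_load and movlps_load are hypothetical names, the alignment fault is modelled as a false return value rather than a real trap, and nothing here reflects how reload actually chooses alternatives.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Vec4 = std::array<float, 4>;  // hypothetical stand-in for an xmm register
static_assert(sizeof(Vec4) == 16, "model assumes four 4-byte floats, no padding");

// Toy movaps load: copies 16 bytes, but only from a 16-byte aligned address.
// Returns false (dst untouched) where real hardware would fault.
inline bool movaps_load(const void* src, Vec4& dst) {
    assert(src != nullptr);
    if (reinterpret_cast<std::uintptr_t>(src) % 16 != 0) {
        return false;
    }
    std::memcpy(dst.data(), src, sizeof(Vec4));
    return true;
}

// Toy movlps load: copies 8 bytes into lanes 0-1 and keeps lanes 2-3.
// It has no 16-byte alignment requirement.
inline void movlps_load(const void* src, Vec4& dst) {
    assert(src != nullptr);
    std::memcpy(dst.data(), src, 8);
}
```

In this model, a source that sits 8 bytes past a 16-byte boundary (like 8(%esi) above) is rejected by movaps_load but accepted by movlps_load, which matches the observation that only the full V4SF load in insn 80 is the problem.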
Uros.
* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
From: uros at kss-loka dot si @ 2005-01-05 9:43 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-01-05 09:43 -------
I think this bug should be fixed for 4.0.
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |uros at kss-loka dot si
Summary|Invalid assembly generated |[4.0 Regression] Invalid
|when using SSE / xmmintrin.h|assembly generated when
| |using SSE / xmmintrin.h
Target Milestone|--- |4.0.0
* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
From: rth at gcc dot gnu dot org @ 2005-01-05 12:14 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |rth at gcc dot gnu dot org
|dot org |
Status|NEW |ASSIGNED
* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
From: cvs-commit at gcc dot gnu dot org @ 2005-01-05 19:14 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From cvs-commit at gcc dot gnu dot org 2005-01-05 19:14 -------
Subject: Bug 12902
CVSROOT: /cvs/gcc
Module name: gcc
Changes by: rth@gcc.gnu.org 2005-01-05 19:14:39
Modified files:
gcc : ChangeLog
gcc/config/i386: i386.c i386.md
Added files:
gcc/testsuite/gcc.target/i386: sse-1.c
Log message:
PR target/12902
* config/i386/i386.md (sse_movhps, sse_movlps): Remove.
(sse_shufps): Change operand 3 to const_int_operand.
(sse2_storelps): Fix typo in template.
(sse_storehps, sse_loadhps, sse_storelps, sse_loadlps): New.
* config/i386/i386.c (ix86_expand_vector_move_misalign): Use them.
(ix86_expand_builtin): Likewise.
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7038&r2=2.7039
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.767&r2=1.768
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.599&r2=1.600
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.target/i386/sse-1.c.diff?cvsroot=gcc&r1=NONE&r2=1.1
* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
From: rth at gcc dot gnu dot org @ 2005-01-05 20:04 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-05 20:03 -------
Fixed.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
From: uros at kss-loka dot si @ 2005-01-06 8:25 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-01-06 08:25 -------
(In reply to comment #18)
There are still problems: the testcase from comment #14 fails with:
gcc -O2 -msse pr12902-1.c
pr12902-1.c: In function 'swizzle':
pr12902-1.c:32: error: unrecognizable insn:
(insn 105 97 46 0 (set (mem/s:V2SF (plus:SI (reg/v/f:SI 0 ax [orig:75 a ] [75])
(const_int 8 [0x8])) [0 <variable>.v+8 S8 A64])
(mem:V2SF (reg/v/f:SI 5 di [orig:72 a1 ] [72]) [0 S8 A8])) -1 (nil)
(nil))
pr12902-1.c:32: internal compiler error: in extract_insn, at recog.c:2020