public inbox for gcc-bugs@sourceware.org
* [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h
@ 2003-11-05  1:31 kbowers at lanl dot gov
  2003-11-05  4:08 ` [Bug c++/12902] " pinskia at gcc dot gnu dot org
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: kbowers at lanl dot gov @ 2003-11-05  1:31 UTC (permalink / raw)
  To: gcc-bugs

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12902

           Summary: Invalid assembly generated when using SSE / xmmintrin.h
           Product: gcc
           Version: 3.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kbowers at lanl dot gov
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

I have a code which heavily uses SSE. Some new functions written for the code
seg-faulted due to an unaligned "movaps" instruction. I stripped one of those
functions down as far as I could to make a self-contained program which
reproduces the bug. That program is attached to the end of this bug report. I
apologize in advance that I could not find a simpler faulting program.

It generates invalid assembly when compiled with gcc-3.3 and gcc-3.3.2 at
optimization levels -O, -O1, -O2 and -O3.

The following commands can demonstrate the bug.

% g++-3.3.2 -v

Reading specs from
/home/kbowers/local/bin/../lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: ../gcc-3.3.2/configure --prefix=/local_home/kbowers
--program-suffix=-3.3.2
Thread model: posix
gcc version 3.3.2

% g++-3.3.2 -S -fverbose-asm -O -msse compiler_bug.cpp
% cat compiler_bug.s

... snip to around line 45 ...

	movl	8(%ebp), %eax	#  a
	movlps	(%eax), %xmm3	#  <anonymous>
	movaps	8(%eax), %xmm0
	movlps	%xmm0, -200(%ebp)
	movhps	16(%eax), %xmm3	#  <anonymous>
	movaps	24(%eax), %xmm2

... snip ...

The first instruction loads the value of the "a" pointer into eax. When the loop
is run, "a" is a 16-byte aligned pointer, so 8(%eax) is only 8-byte aligned. The
third instruction (movaps 8(%eax), %xmm0) tries to do an aligned 16-byte load
from that non-aligned address.
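
The alignment arithmetic behind that claim can be checked with a minimal sketch. The helpers is_aligned and operand_8_eax are hypothetical names introduced only for illustration (they are not part of the attached program), and the sketch assumes 4-byte int and float:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors a_t from the attached program; 16 bytes with 4-byte int and float.
struct a_t {
  int i;
  float f[3];
};
static_assert(sizeof(a_t) == 16, "sketch assumes 4-byte int and float");

// Hypothetical helper: true when p lies on an `alignment`-byte boundary.
// `alignment` must be a non-zero power of two.
inline bool is_aligned(const void *p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: the address that the operand 8(%eax) refers to
// when %eax holds `a`.
inline const void *operand_8_eax(const a_t *a) {
  return reinterpret_cast<const unsigned char *>(a) + 8;
}
```

For any 16-byte aligned "a", is_aligned(operand_8_eax(a), 8) holds but is_aligned(operand_8_eax(a), 16) does not, which is why an aligned 16-byte load from that address faults.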

Thanks.

---- BEGIN PROGRAM ----
#include <xmmintrin.h>

typedef struct {
  int i;
  float f[3];
} a_t;

typedef struct {
  float f[8];
} b_t;

typedef union {
  int i[4];
  float f[4];
  __m128 v;
} vector4;

inline void swizzle( const void *a0, const void *a1,
                     const void *a2, const void *a3,
                     vector4 &a, vector4 &b, vector4 &c, vector4 &d ) {
  __m128 t, u;
  a.v = _mm_loadl_pi(a.v, (__m64 *)a0);
  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1);
  a.v = _mm_loadh_pi(a.v, (__m64 *)a1);
  c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1);
  b.v = a.v;
  d.v = c.v;
  t   = _mm_loadl_pi(b.v, (__m64 *)a2);    // b.v to avoid warn
  u   = _mm_loadl_pi(d.v,((__m64 *)a2)+1); // d.v to avoid warn
  t   = _mm_loadh_pi(t, (__m64 *)a3);
  u   = _mm_loadh_pi(u,((__m64 *)a3)+1);
  a.v = _mm_shuffle_ps(a.v,t,0x88);
  b.v = _mm_shuffle_ps(b.v,t,0xdd);
  c.v = _mm_shuffle_ps(c.v,u,0x88);
  d.v = _mm_shuffle_ps(d.v,u,0xdd);
}

void foo( const a_t *a, const b_t *b, int n ) {
  vector4 ai, a0, a1, a2, b0, b1, v0, v1, v2;
  __m128 *p0, *p1, *p2, *p3;

  for(;n;n--,a+=4) {
    swizzle(a,a+1,a+2,a+3,ai,a0,a1,a2);
    p0 = (__m128 *)(b + ai.i[0]);
    p1 = (__m128 *)(b + ai.i[1]);
    p2 = (__m128 *)(b + ai.i[2]);
    p3 = (__m128 *)(b + ai.i[3]);
    swizzle(p0++,p1++,p2++,p3++,b0,v0,v1,v2);
    b0.v = _mm_add_ps(                _mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)),
                      _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v))));
    swizzle(p0,p1,p2,p3,b1,v0,v1,v2);
    b1.v = _mm_add_ps(                _mm_add_ps(b1.v,_mm_mul_ps(a2.v,v0.v)),
                      _mm_mul_ps(a0.v,_mm_add_ps(v1.v,_mm_mul_ps(a2.v,v2.v))));
  }
}
---- END PROGRAM ----


* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: pinskia at gcc dot gnu dot org @ 2003-11-05  4:08 UTC (permalink / raw)
  To: gcc-bugs

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING


------- Additional Comments From pinskia at gcc dot gnu dot org  2003-11-05 04:08 -------
The "a" in the asm corresponds to the variable "a" in foo, not the a in the inlined function, swizzle.
So the movaps is coming from:
  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); <--- 8(%eax)

Is that really your problem, or is it because main's stack is unaligned and you inlined functions into 
main that use sse?


* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-05  5:51 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From kbowers at lanl dot gov  2003-11-05 05:51 -------
Subject: Re:  Invalid assembly generated when using SSE / xmmintrin.h

pinskia at gcc dot gnu dot org wrote:
> PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12902
> 
> 
> pinskia at gcc dot gnu dot org changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |WAITING
> 
> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2003-11-05 04:08 -------
> The "a" in the asm corresponds to the variable "a" in foo, not the a in the inlined function, swizzle.
> So the movaps is coming from:
>   c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); <--- 8(%eax)
>
> Is that really your problem, or is it because main's stack is unaligned and you inlined functions into 
> main that use sse?

I don't think this is a stack alignment problem. The memory pointed to 
by "swizzle:a0"/"foo:a" is not on the stack and is 16-byte aligned (a0 
is the pointer "a" in foo). Also, _mm_loadl_pi is intended for 8-byte 
loads from 8-byte aligned memory, which ((__m64 *)a0)+1 is.
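
As a minimal sketch of that point, the following shows _mm_loadl_pi and _mm_loadh_pi each reading only 8 bytes, so a source address that is 8-byte aligned but not 16-byte aligned is fine. The helper name merge_halves is hypothetical and exists only for this illustration; the sketch assumes an x86 target with SSE enabled:

```cpp
#include <xmmintrin.h>

// Hypothetical helper: build a vector whose low two floats come from `lo`
// and whose high two floats come from `hi`, then write it to `out`.
// Each intrinsic reads exactly 8 bytes from its pointer argument.
inline void merge_halves(const float *lo, const float *hi, float out[4]) {
  __m128 v = _mm_setzero_ps();
  v = _mm_loadl_pi(v, reinterpret_cast<const __m64 *>(lo));  // low 64 bits
  v = _mm_loadh_pi(v, reinterpret_cast<const __m64 *>(hi));  // high 64 bits
  _mm_storeu_ps(out, v);  // unaligned store, so `out` needs no alignment
}
```

Given a 16-byte aligned array src of eight floats, merge_halves(src + 2, src + 6, out) reads from addresses that are only 8-byte aligned and produces src[2], src[3], src[6], src[7].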

I think gcc has a bug in the implementation of the 
__builtin_ia32_loadlps and __builtin_ia32_loadhps intrinsics when 
dealing with xmm registers stored in a stack temporary.

Here is foo again with the b1 swizzle / math removed.

void foo( const a_t *a, const b_t *b, int n ) {
   vector4 ai, a0, a1, a2, b0, v0, v1, v2;
   __m128 *p0, *p1, *p2, *p3;

   for(;n;n--,a+=4) {
     swizzle(a,a+1,a+2,a+3,ai,a0,a1,a2);
     p0 = (__m128 *)(b + ai.i[0]);
     p1 = (__m128 *)(b + ai.i[1]);
     p2 = (__m128 *)(b + ai.i[2]);
     p3 = (__m128 *)(b + ai.i[3]);
     swizzle(p0++,p1++,p2++,p3++,b0,v0,v1,v2);
     b0.v = _mm_add_ps(_mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)),
            _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v))));
   }
}

Here is the relevant assembly snippet:

% gcc-3.3.2 -S -fverbose-asm -O -msse bug_12902_followup.cpp
% cat bug_12902_followup.s

... snip to around line 45 ...

movaps	-40(%ebp), %xmm3	#  <variable>.v,  <anonymous>
movlps	(%esi), %xmm3	# * a,  <anonymous>
movlps	8(%esi), %xmm6	#  <anonymous>
movhps	16(%esi), %xmm3	#  <anonymous>
movhps	24(%esi), %xmm6	#  <anonymous>
movaps	%xmm6, %xmm4	#  <anonymous>,  <anonymous>
movaps	%xmm3, %xmm1	#  <anonymous>,  <anonymous>
movlps	32(%esi), %xmm1	#  <anonymous>
movaps	%xmm6, %xmm2	#  <anonymous>,  <anonymous>
movlps	40(%esi), %xmm2	#  <anonymous>
movhps	48(%esi), %xmm1	#  <anonymous>
movhps	56(%esi), %xmm2	#  <anonymous>
movaps	%xmm3, %xmm0	#  <anonymous>

... snip ...

This is the correct assembly and the function does not crash. I also 
have much more complex versions of the loop which do not crash (in those 
cases, I guess I got lucky with the register spills).

The third instruction above corresponds to the line you cited:

c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1);

I've tried to figure out what gcc is doing in the original foo but I'm 
not sure I follow:

I think I've figured out the problem. Here is a translation of the first 
swizzle assembly from the original foo():

#  a.v = _mm_loadl_pi(a.v, (__m64 *)a0);
*  a.v is in xmm3
	movlps	(%eax), %xmm3

#  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); ... FAULTS
*  c.v is in -200(%ebp)
	movaps	8(%eax), %xmm0
	movlps	%xmm0, -200(%ebp)

#  a.v = _mm_loadh_pi(a.v, (__m64 *)a1);
*  a.v is in xmm3
	movhps	16(%eax), %xmm3

#  c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1); ... WOULD FAULT
*  c.v is in -200(%ebp)
	movaps	24(%eax), %xmm2
	movhps	%xmm2, -200(%ebp)

#  b.v = a.v;
*  b.v is in xmm3 (a.v and b.v are copies)

#  d.v = c.v;
*  d.v is in xmm4 (d.v and c.v are copies)
	movaps	-200(%ebp), %xmm4

# t   = _mm_loadl_pi(b.v, (__m64 *)a2);
* t is in xmm1
	movaps	%xmm3, %xmm1
	movlps	32(%eax), %xmm1

# u   = _mm_loadl_pi(d.v,((__m64 *)a2)+1);
* u is in xmm2
	movaps	%xmm4, %xmm2
	movlps	40(%eax), %xmm2

# t   = _mm_loadh_pi(t, (__m64 *)a3);
* t is in xmm1
	movhps	48(%eax), %xmm1

# u   = _mm_loadh_pi(u,((__m64 *)a3)+1);
* u is in xmm2
	movhps	56(%eax), %xmm2

# a.v = _mm_shuffle_ps(a.v,t,0x88);
* a.v is stored in "foo:ai"
* b.v no longer shares xmm3 with a.v
	movaps	%xmm3, %xmm0
	shufps	$136, %xmm1, %xmm0
	movaps	%xmm0, -40(%ebp)

# b.v = _mm_shuffle_ps(b.v,t,0xdd);
* b.v is stored in "foo:a0"
	shufps	$221, %xmm1, %xmm3
	movaps	%xmm3, -216(%ebp)

# c.v = _mm_shuffle_ps(c.v,u,0x88);
* c.v is stored in "foo:a1"
* d.v no longer shares xmm4 with c.v
	movaps	%xmm4, %xmm0
	shufps	$136, %xmm2, %xmm0
	movaps	%xmm0, -200(%ebp)

# d.v = _mm_shuffle_ps(d.v,u,0xdd);
* d.v is stored in "foo:a2"
	shufps	$221, %xmm2, %xmm4
	movaps	%xmm4, -232(%ebp)

_mm_loadl_pi (__builtin_ia32_loadlps) and _mm_loadh_pi 
(__builtin_ia32_loadhps) seem to work fine when doing an m64 -> xmm 
register transfer. However, when the destination is an xmm value held in 
a stack temporary, they emit invalid assembly; both faulting instructions 
occur in that case. The above assembly sequence makes sense if the 
following changes are made:

#  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); ... FIXED
*  c.v is in -200(%ebp)
	movaps  -200(%ebp), %xmm0 # Retrieve c.v from stack
	movlps	8(%eax), %xmm0    # A valid 64-bit load
	movaps	%xmm0, -200(%ebp) # Store modified c.v onto stack

and

#  c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1); ... FIXED
*  c.v is in -200(%ebp)
	movaps  -200(%ebp), %xmm2 # Retrieve c.v from stack
	movhps	24(%eax), %xmm2   # A valid 64-bit load
	movaps	%xmm2, -200(%ebp) # Store modified c.v onto stack
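
The corrected sequences amount to a read-modify-write of the stack slot. A portable sketch of that behaviour, using plain arrays rather than SSE registers, might look like the following. The names Vec4, loadl_into_slot and loadh_into_slot are hypothetical and only model the three-instruction sequences above:

```cpp
#include <array>
#include <cstring>

using Vec4 = std::array<float, 4>;

// Models the FIXED sequence for c.v = _mm_loadl_pi(c.v, p)
// when c.v lives in a stack slot.
inline void loadl_into_slot(Vec4 &slot, const float *p) {
  Vec4 reg = slot;                                 // movaps slot, %xmm0
  std::memcpy(reg.data(), p, 2 * sizeof(float));   // movlps (p), %xmm0
  slot = reg;                                      // movaps %xmm0, slot
}

// Models the FIXED sequence for c.v = _mm_loadh_pi(c.v, p).
inline void loadh_into_slot(Vec4 &slot, const float *p) {
  Vec4 reg = slot;                                     // movaps slot, %xmm2
  std::memcpy(reg.data() + 2, p, 2 * sizeof(float));   // movhps (p), %xmm2
  slot = reg;                                          // movaps %xmm2, slot
}
```

Only 8 bytes are ever read through p, so p needs no 16-byte alignment; the 16-byte accesses touch the slot alone, which the compiler keeps aligned.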

Thanks for your prompt reply.


* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-06 18:53 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From kbowers at lanl dot gov  2003-11-06 18:53 -------
I replied to the comments a couple of days ago but the bug status is still
marked as waiting.


* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-07  1:02 UTC (permalink / raw)
  To: gcc-bugs

kbowers at lanl dot gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW


------- Additional Comments From kbowers at lanl dot gov  2003-11-07 01:02 -------
I've done some further diagnostics and I think the instruction pattern matching
is confused about when to emit a memory-store movlps/movhps versus a memory-load
movlps/movhps.

Here is a snippet from gcc/config/i386/i386.c:ix86_expand_builtin() around
line 13470. Given this is my first serious look into gcc's internals, I've
annotated it with my best guess as to what it is doing:

      // op0 is the "__A" of the xmmintrin.h:_mm_loadl_pi
      // op1 is the "__P" of the xmmintrin.h:_mm_loadl_pi

      // If A is not a nonimmediate V4SF-mode operand, copy A into a
      // V4SF-mode register temporary
      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0)) 
	op0 = copy_to_mode_reg (mode0, op0); 

      // Copy P into a register and mark that register as a V4SF-mode memory
      // operand
      op1 = gen_rtx_MEM (mode1, copy_to_mode_reg (Pmode, op1)); 

      // Create a temporary V4SF-mode register target if one of the following
      // is true:
      // - There is no return target
      // - The target is not a V4SF-mode operand
      // - The target is not a nonimmediate V4SF-mode operand
      if (target == 0
	  || GET_MODE (target) != tmode
	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
	target = gen_reg_rtx (tmode);

      // Create the appropriate RTL instructions.
      // Return failure if we could not generate the RTL.
      // Otherwise, emit the RTL and return where the result went.
      //
      // In the LOADLPS case, GEN_FCN(icode) calls gen_sse_movlps.
      // For this case, the "pat =" line is equivalent to:
      // pat = gen_rtx_SET( VOIDmode, target,
      //   gen_rtx_fmt_eee( VEC_MERGE, V4SFmode, op0, op1, GEN_INT(3) ));
      pat = GEN_FCN (icode) (target, op0, op1); 
      if (! pat)
	return 0; 
      emit_insn (pat); 
      return target; 

Here is the movlps description from gcc/config/i386/i386.md(line 18549):

(define_insn "sse_movlps"
  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,m")
	(vec_merge:V4SF
	 (match_operand:V4SF 1 "nonimmediate_operand" "0,0")
	 (match_operand:V4SF 2 "nonimmediate_operand" "m,x")
	 (const_int 3)))]
  "TARGET_SSE
   && (GET_CODE (operands[1]) == MEM || GET_CODE (operands[2]) == MEM)"
  "movlps\t{%2, %0|%0, %2}"
  [(set_attr "type" "ssecvt")
   (set_attr "mode" "V4SF")])

If my annotations are correct, how does the instruction pattern matching
determine if a memory-store "movlps" or a memory-load "movlps" is emitted in the
case where both "target" and "op0" are memory operands? If the instruction
pattern matching was confused and picked the storing movlps case instead of the
loading movlps case, it would explain the fault I am seeing.
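
To illustrate the difference between the two forms, here is a hedged sketch using intrinsics. The function names via_load_form and via_store_form are hypothetical, and _mm_loadu_ps stands in for the faulting movaps so that the sketch itself does not crash. It assumes an x86 target with SSE enabled and that at least 16 bytes are readable at p:

```cpp
#include <xmmintrin.h>

// Load form: movlps m64 -> xmm. Reads 8 bytes from p.
inline __m128 via_load_form(__m128 c, const float *p) {
  return _mm_loadl_pi(c, reinterpret_cast<const __m64 *>(p));
}

// Store form, as in the faulting sequence: a full 16-byte load from p
// followed by movlps xmm -> m64 into the slot that holds c.
inline __m128 via_store_form(__m128 c, const float *p) {
  alignas(16) float slot[4];
  _mm_store_ps(slot, c);                                // c.v in a stack slot
  __m128 tmp = _mm_loadu_ps(p);                         // stand-in for movaps (p)
  _mm_storel_pi(reinterpret_cast<__m64 *>(slot), tmp);  // movlps %xmm0, slot
  return _mm_load_ps(slot);
}
```

Both functions return the same value, so the store form is not wrong arithmetically. The difference is that it reads 16 bytes from p, and with movaps in place of _mm_loadu_ps that read requires 16-byte alignment, which ((__m64 *)a0)+1 does not have.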

So, should the i386 machine description split the movlps/movhps descriptions
into separate load and store cases to eliminate this ambiguity?

For example, maybe something like this for the loadlps case:

(define_insn "sse_loadlps"
  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x")
	(vec_merge:V4SF
	 (match_operand:V4SF 1 "nonimmediate_operand" "0")
	 (match_operand:V4SF 2 "nonimmediate_operand" "m")
	 (const_int 3)))]
  "TARGET_SSE
   && (GET_CODE (operands[1]) == MEM || GET_CODE (operands[2]) == MEM)"
  "movlps\t{%2, %0|%0, %2}"
  [(set_attr "type" "ssecvt")
   (set_attr "mode" "V4SF")])


* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-11-07 10:42 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From kbowers at lanl dot gov  2003-11-07 10:41 -------
I think I've found and fixed the bug. The bug appears to be the ambiguity issue
from my prior comment. I replaced the mov[h,l]p[s,d] instructions with more
appropriate versions as mentioned in my prior note. The attachments are the
diffs of the changes I made to i386.c and i386.md that allowed the original
program to compile properly:

$ gcc-3.3.2a -S -fverbose-asm -O -msse bug_12902.cpp
$ cat bug_12902.s
... snip ...
.L45:
        movaps  -40(%ebp), %xmm0        #  <variable>.v,  __A
        movl    8(%ebp), %eax   #  a
        movlps  (%eax), %xmm0   #  <anonymous>
        movaps  -200(%ebp), %xmm1
        movlps  8(%eax), %xmm1
        movhps  16(%eax), %xmm0 #  <anonymous>
        movhps  24(%eax), %xmm1
        movaps  %xmm1, -200(%ebp)

I don't know if the given diffs are the ideal way to make these modifications
but it seems to work. I learned the .md format about an hour ago by examining
other SSE instructions and so it is quite possible I've done it all wrong. To
that end, I am mildly concerned that load[h,l]p[s,d] need to be split like
sse_load_ss / sse_load_ss_1 to properly initialize some elements of the
vec_duplicate:V4SF.


* [Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: kbowers at lanl dot gov @ 2003-12-09 18:01 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kbowers at lanl dot gov  2003-12-09 18:01 -------
Over a month ago, I reported this bug and then went to the effort of learning
gcc's internals in order to submit patches. However, the bug is still listed as
new and there has been no response for roughly a month.

Will these patches (or some other fix) be incorporated into a future release of gcc?

The reason I ask is that it is unreasonable for me to require users to both patch and
build their own compiler to install some scientific codes of mine on large
clusters. I would like to see this bug resolved quickly so I can continue to
recommend gcc as the preferred compiler. Given I've already submitted patches,
this should be easy to address.


* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: dhazeghi at yahoo dot com @ 2003-12-09 20:14 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dhazeghi at yahoo dot com  2003-12-09 20:14 -------
Would you mind regenerating the patches with diff -up (as requested at
http://gcc.gnu.org/contribute.html), and then pinging gcc-patches@gcc.gnu.org
(preferably including a changelog, and noting that you don't have cvs
write-access)?

Regarding your general comment, gcc is more or less a volunteer project, with all the caveats 
involved. Anyhow, ideally people would scan the bug database for new bugs and patches as they 
came in, but the fact is that most gcc folk are quite busy. I can't guarantee your patch will be 
approved (I have no authority), but if you send it to the list (with the changes I mentioned), it'll be a 
lot more likely. I admit it isn't the best development project. But it's what we have
(suggestions/comments are welcome).

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |target
           Keywords|                            |patch, wrong-code


* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: dhazeghi at yahoo dot com @ 2003-12-09 20:17 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dhazeghi at yahoo dot com  2003-12-09 20:17 -------
s/project/model... Oh, and if you really want a prompt response, you can add the relevant 
MAINTAINER (from gcc/MAINTAINERS) to the cc: list for a bug. Not necessarily recommended but...

* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: bangerth at dealii dot org @ 2003-12-11 16:07 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bangerth at dealii dot org  2003-12-11 16:07 -------
I think Jan has done most of the sse work, so I'll CC him. Jan: here's 
a patch that you may want to look into. 
 
W. 

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu dot
                   |                            |org


* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: bangerth at dealii dot org @ 2004-12-13 20:54 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bangerth at dealii dot org  2004-12-13 20:54 -------
I really have not much of an idea what I am doing here, but this 
is a shorter testcase: 
------------------------- 
#include <xmmintrin.h> 
 
typedef struct { 
  int i; 
  float f[3]; 
} a_t; 
 
typedef struct { 
  float f[8]; 
} b_t; 
 
typedef union { 
  int i[4]; 
  float f[4]; 
  __m128 v; 
} vector4; 
 
void swizzle( const void *a0, const void *a1, 
              const void *a2, const void *a3, 
              vector4 *a, vector4 *b, vector4 *c, vector4 *d ) { 
  __m128 t, u; 
  a->v = _mm_loadl_pi(a->v, (__m64 *)a0); 
  c->v = _mm_loadl_pi(c->v,((__m64 *)a0)+1); 
  a->v = _mm_loadh_pi(a->v, (__m64 *)a1); 
  c->v = _mm_loadh_pi(c->v,((__m64 *)a1)+1); 
  t   = _mm_loadl_pi(b->v, (__m64 *)a2); 
  u   = _mm_loadl_pi(d->v,((__m64 *)a2)+1); 
  a->v = _mm_shuffle_ps(a->v,t,0); 
  b->v = _mm_shuffle_ps(b->v,t,0); 
  c->v = _mm_shuffle_ps(c->v,u,0); 
  d->v = _mm_shuffle_ps(d->v,u,0); 
} 
 
int main () { 
  a_t a[128]; 
  b_t b[128]; 
   vector4 ai, a0, a1, a2, b0, v0, v1, v2; 
   __m128 *p0, *p1, *p2, *p3; 
 
   int n = 1; 
   for(;n;n--) { 
     swizzle(a,a+1,a+2,a+3,&ai,&a0,&a1,&a2); 
     p0 = (__m128 *)(b + ai.i[0]); 
     p1 = (__m128 *)(b + ai.i[1]); 
     p2 = (__m128 *)(b + ai.i[2]); 
     p3 = (__m128 *)(b + ai.i[3]); 
     swizzle(p0++,p1++,p2++,p3++,&b0,&v0,&v1,&v2); 
     _mm_add_ps(_mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)), 
                _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v)))); 
   } 
} 
----------------------------- 
 
It fails on 3.4 and mainline, but not with icc: 
g/x> /home/bangerth/bin/gcc-3.4.*-pre/bin/g++  -O -msse2 -g x.cc ; ./a.out  
Segmentation fault 
 
g/x> /home/bangerth/bin/gcc-4.*-pre/bin/g++  -O -msse2 -g x.cc ; ./a.out  
Segmentation fault 
 
g/x> icc x.cc ; ./a.out  
 
On mainline, I get this from a gdb session: 
 
(gdb) r 
Starting program: /home/bangerth/tmp/g/x/a.out  
 
Program received signal SIGSEGV, Segmentation fault. 
swizzle (a0=0xbfffe250, a1=0xbfffe260, a2=0xbfffe270, a3=0xbfffe280,  
    a=0xbfffeac0, b=0xbfffeab0, c=0xbfffeaa0, d=0xbfffea90) at x.cc:23 
 
(gdb) i reg 
[...] 
eip            0x80483ba	0x80483ba 
(gdb) disass 
0x080483ba <_Z7swizzlePKvS0_S0_S0_P7vector4S2_S2_S2_+38>:	movaps 0x8(%esi),%xmm0 
 
As I said, I have no idea what this program does, or whether it is still 
well-formed after my attempts to reduce it. Maybe it helps anyway. 
 
W. 
 

* [Bug target/12902] Invalid assembly generated when using SSE / xmmintrin.h
From: uros at kss-loka dot si @ 2004-12-14 10:54 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2004-12-14 10:52 -------
The problem here is in the combiner, which, in combination with the reload pass,
produces an incorrect pattern.

The line that segfaults is:

  c->v = _mm_loadl_pi(c->v,((__m64 *)a0)+1); 


This line is represented by the following RTL sequence (pr12902.c.00.expand):

(insn 26 24 27 1 (parallel [
            (set (reg:SI 80)
                (plus:SI (reg:SI 70 [ a0.26 ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 27 26 28 1 (set (reg:SI 81)
        (reg:SI 80)) -1 (nil)
    (nil))

(insn 28 27 30 1 (set (reg:V4SF 60 [ D.3679 ])
        (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16 A128])
            (mem:V4SF (reg:SI 81) [0 S16 A8])
            (const_int 3 [0x3]))) -1 (nil)
    (nil))

(insn 30 28 32 1 (set (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16 A128])
        (reg:V4SF 60 [ D.3679 ])) -1 (nil)
    (nil))


This whole sequence is combined into one RTL insn (pr12902.c.17.combine) that
satisfies "sse_movlps" pattern constraints:

(insn 30 28 35 0 (set (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16 A128])
        (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 77 [ c ]) [0 <variable>.v+0 S16 A128])
            (mem:V4SF (plus:SI (reg/v/f:SI 71 [ a0 ])
                    (const_int 8 [0x8])) [0 S16 A8])
            (const_int 3 [0x3]))) 541 {sse_movlps} (insn_list:REG_DEP_TRUE 12 (nil))
    (expr_list:REG_DEAD (reg/v/f:SI 71 [ a0 ])
        (nil)))


Following this, reload generates what it thinks is the best reg/mem combination
to satisfy the register constraints of the "sse_movlps" pattern
(pr12902.c.24.postreload):

(insn 80 28 30 0 (set (reg:V4SF 21 xmm0)
        (mem:V4SF (plus:SI (reg/v/f:SI 4 si [orig:71 a0 ] [71])
                (const_int 8 [0x8])) [0 S16 A8])) 509 {movv4sf_internal} (nil)
    (nil))

(insn:HI 30 80 35 0 (set (mem/s:V4SF (reg/v/f:SI 1 dx [orig:77 c ] [77]) [0 <variable>.v+0 S16 A128])
        (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 1 dx [orig:77 c ] [77]) [0 <variable>.v+0 S16 A128])
            (reg:V4SF 21 xmm0)
            (const_int 3 [0x3]))) 541 {sse_movlps} (insn_list:REG_DEP_TRUE 12 (nil))
    (nil))


Unfortunately, insn 80 will crash, because it results in an unaligned load:

	...
        movaps  8(%esi), %xmm0    <- crash here
        movlps  %xmm0, (%edx)
	...

Uros.

* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
From: uros at kss-loka dot si @ 2005-01-05  9:43 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2005-01-05 09:43 -------
I think this bug should be fixed for 4.0.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at kss-loka dot si
            Summary|Invalid assembly generated  |[4.0 Regression] Invalid
                   |when using SSE / xmmintrin.h|assembly generated when
                   |                            |using SSE / xmmintrin.h
   Target Milestone|---                         |4.0.0


* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
From: rth at gcc dot gnu dot org @ 2005-01-05 12:14 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rth at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED



* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
  2003-11-05  1:31 [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h kbowers at lanl dot gov
                   ` (12 preceding siblings ...)
  2005-01-05 12:14 ` rth at gcc dot gnu dot org
@ 2005-01-05 19:14 ` cvs-commit at gcc dot gnu dot org
  2005-01-05 20:04 ` rth at gcc dot gnu dot org
  2005-01-06  8:25 ` uros at kss-loka dot si
  15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2005-01-05 19:14 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2005-01-05 19:14 -------
Subject: Bug 12902

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rth@gcc.gnu.org	2005-01-05 19:14:39

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.c i386.md 
Added files:
	gcc/testsuite/gcc.target/i386: sse-1.c 

Log message:
	PR target/12902
	* config/i386/i386.md (sse_movhps, sse_movlps): Remove.
	(sse_shufps): Change operand 3 to const_int_operand.
	(sse2_storelps): Fix typo in template.
	(sse_storehps, sse_loadhps, sse_storelps, sse_loadlps): New.
	* config/i386/i386.c (ix86_expand_vector_move_misalign): Use them.
	(ix86_expand_builtin): Likewise.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7038&r2=2.7039
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.767&r2=1.768
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.599&r2=1.600
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.target/i386/sse-1.c.diff?cvsroot=gcc&r1=NONE&r2=1.1




* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
  2003-11-05  1:31 [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h kbowers at lanl dot gov
                   ` (13 preceding siblings ...)
  2005-01-05 19:14 ` cvs-commit at gcc dot gnu dot org
@ 2005-01-05 20:04 ` rth at gcc dot gnu dot org
  2005-01-06  8:25 ` uros at kss-loka dot si
  15 siblings, 0 replies; 17+ messages in thread
From: rth at gcc dot gnu dot org @ 2005-01-05 20:04 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rth at gcc dot gnu dot org  2005-01-05 20:03 -------
Fixed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



* [Bug target/12902] [4.0 Regression] Invalid assembly generated when using SSE / xmmintrin.h
  2003-11-05  1:31 [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h kbowers at lanl dot gov
                   ` (14 preceding siblings ...)
  2005-01-05 20:04 ` rth at gcc dot gnu dot org
@ 2005-01-06  8:25 ` uros at kss-loka dot si
  15 siblings, 0 replies; 17+ messages in thread
From: uros at kss-loka dot si @ 2005-01-06  8:25 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2005-01-06 08:25 -------
(In reply to comment #18)
There are still problems: the testcase from comment #14 fails with:

gcc -O2 -msse pr12902-1.c

pr12902-1.c: In function 'swizzle':
pr12902-1.c:32: error: unrecognizable insn:
(insn 105 97 46 0 (set (mem/s:V2SF (plus:SI (reg/v/f:SI 0 ax [orig:75 a ] [75])
                (const_int 8 [0x8])) [0 <variable>.v+8 S8 A64])
        (mem:V2SF (reg/v/f:SI 5 di [orig:72 a1 ] [72]) [0 S8 A8])) -1 (nil)
    (nil))
pr12902-1.c:32: internal compiler error: in extract_insn, at recog.c:2020
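
The insn rejected above moves a V2SF value directly from one memory
location to another. SSE has no memory-to-memory move, so at the
instruction level the 8 bytes have to pass through a register, for
example a movlps load followed by a movlps store. A rough source-level
sketch of that shape, using the xmmintrin.h intrinsics _mm_loadl_pi and
_mm_storel_pi (copy_low_pair is a made-up name, and whether swizzle in the
testcase does exactly this is an assumption; the memcpy branch is only
there so the sketch builds without SSE):

```c
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: copy two floats (8 bytes) from src to dst.
   Neither pointer needs 16-byte alignment. */
static void copy_low_pair(float *dst, const float *src)
{
#if defined(__SSE__)
    /* Load 8 bytes into the low half of a register (movlps load) ... */
    __m128 tmp = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)src);
    /* ... then store that low half back out (movlps store). */
    _mm_storel_pi((__m64 *)dst, tmp);
#else
    /* Fallback without SSE: plain 8-byte copy. */
    memcpy(dst, src, 2 * sizeof(float));
#endif
}
```

Only the two floats at dst[0] and dst[1] are written; the neighbouring
elements are left alone, which is what distinguishes this from a full
16-byte movaps/movups store.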



Thread overview: 17+ messages
2003-11-05  1:31 [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h kbowers at lanl dot gov
2003-11-05  4:08 ` [Bug c++/12902] " pinskia at gcc dot gnu dot org
2003-11-05  5:51 ` kbowers at lanl dot gov
2003-11-06 18:53 ` kbowers at lanl dot gov
2003-11-07  1:02 ` kbowers at lanl dot gov
2003-11-07 10:42 ` kbowers at lanl dot gov
2003-12-09 18:01 ` kbowers at lanl dot gov
2003-12-09 20:14 ` [Bug target/12902] " dhazeghi at yahoo dot com
2003-12-09 20:17 ` dhazeghi at yahoo dot com
2003-12-11 16:07 ` bangerth at dealii dot org
2004-12-13 20:54 ` bangerth at dealii dot org
2004-12-14 10:54 ` uros at kss-loka dot si
2005-01-05  9:43 ` [Bug target/12902] [4.0 Regression] " uros at kss-loka dot si
2005-01-05 12:14 ` rth at gcc dot gnu dot org
2005-01-05 19:14 ` cvs-commit at gcc dot gnu dot org
2005-01-05 20:04 ` rth at gcc dot gnu dot org
2005-01-06  8:25 ` uros at kss-loka dot si
