public inbox for gcc-bugs@sourceware.org
* [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h
@ 2003-11-05  1:31 kbowers at lanl dot gov
From: kbowers at lanl dot gov @ 2003-11-05  1:31 UTC (permalink / raw)
  To: gcc-bugs

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12902

           Summary: Invalid assembly generated when using SSE / xmmintrin.h
           Product: gcc
           Version: 3.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kbowers at lanl dot gov
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

I have a code base that uses SSE heavily. Some new functions written for it
segfaulted because of a "movaps" instruction with an unaligned memory operand.
I stripped one of those functions down as far as I could to make a
self-contained program that reproduces the bug. That program is appended to the
end of this bug report. I apologize in advance that I could not find a simpler
faulting program.

The compiler generates invalid assembly for it with gcc-3.3 and gcc-3.3.2 at
optimization levels -O, -O1, -O2 and -O3.

The following commands demonstrate the bug.

% g++-3.3.2 -v

Reading specs from
/home/kbowers/local/bin/../lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: ../gcc-3.3.2/configure --prefix=/local_home/kbowers
--program-suffix=-3.3.2
Thread model: posix
gcc version 3.3.2

% g++-3.3.2 -S -fverbose-asm -O -msse compiler_bug.cpp
% cat compiler_bug.s

... snip to around line 45 ...

	movl	8(%ebp), %eax	#  a
	movlps	(%eax), %xmm3	#  <anonymous>
	movaps	8(%eax), %xmm0
	movlps	%xmm0, -200(%ebp)
	movhps	16(%eax), %xmm3	#  <anonymous>
	movaps	24(%eax), %xmm2

... snip ...

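The first instruction loads the value of the "a" pointer into eax. When the loop
runs, "a" is a 16-byte aligned pointer, so 8(%eax) is only 8-byte aligned. The
third instruction (movaps 8(%eax), %xmm0) therefore attempts an aligned 16-byte
load from an address that is not 16-byte aligned, and faults. The later movaps
from 24(%eax) has the same problem.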
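
The alignment argument can be checked without any SSE code. The following is
only a sketch and not part of the original test case: is_aligned and
any_movaps_operand_aligned are hypothetical helper names, and the array is
forced to 16-byte alignment with alignas to mirror the situation described
above. It tests the two offsets (8 and 24) that appear as movaps operands in
the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the report): true when p is a multiple of
// `alignment` bytes. movaps needs this to hold with alignment == 16.
inline bool is_aligned(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Same layout as a_t in the report: an int followed by three floats.
struct a_t {
  int i;
  float f[3];
};

// For a 16-byte aligned base, reports whether either address used by the
// movaps instructions in the listing, 8(%eax) or 24(%eax), is itself
// 16-byte aligned.
inline bool any_movaps_operand_aligned() {
  alignas(16) static a_t a[4] = {};
  const unsigned char *base = reinterpret_cast<const unsigned char *>(a);
  assert(is_aligned(base, 16));  // the precondition stated in the report
  return is_aligned(base + 8, 16) || is_aligned(base + 24, 16);
}
```

Both offsets are 8 modulo 16, so neither address can satisfy movaps when the
base pointer is 16-byte aligned; an 8-byte load such as movlps or movhps at
those offsets would be fine.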

Thanks.

---- BEGIN PROGRAM ----
#include <xmmintrin.h>

typedef struct {
  int i;
  float f[3];
} a_t;

typedef struct {
  float f[8];
} b_t;

typedef union {
  int i[4];
  float f[4];
  __m128 v;
} vector4;

inline void swizzle( const void *a0, const void *a1,
                     const void *a2, const void *a3,
                     vector4 &a, vector4 &b, vector4 &c, vector4 &d ) {
  __m128 t, u;
  a.v = _mm_loadl_pi(a.v, (__m64 *)a0);
  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1);
  a.v = _mm_loadh_pi(a.v, (__m64 *)a1);
  c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1);
  b.v = a.v;
  d.v = c.v;
  t   = _mm_loadl_pi(b.v, (__m64 *)a2);    // b.v to avoid warn
  u   = _mm_loadl_pi(d.v,((__m64 *)a2)+1); // d.v to avoid warn
  t   = _mm_loadh_pi(t, (__m64 *)a3);
  u   = _mm_loadh_pi(u,((__m64 *)a3)+1);
  a.v = _mm_shuffle_ps(a.v,t,0x88);
  b.v = _mm_shuffle_ps(b.v,t,0xdd);
  c.v = _mm_shuffle_ps(c.v,u,0x88);
  d.v = _mm_shuffle_ps(d.v,u,0xdd);
}

void foo( const a_t *a, const b_t *b, int n ) {
  vector4 ai, a0, a1, a2, b0, b1, v0, v1, v2;
  __m128 *p0, *p1, *p2, *p3;

  for(;n;n--,a+=4) {
    swizzle(a,a+1,a+2,a+3,ai,a0,a1,a2);
    p0 = (__m128 *)(b + ai.i[0]);
    p1 = (__m128 *)(b + ai.i[1]);
    p2 = (__m128 *)(b + ai.i[2]);
    p3 = (__m128 *)(b + ai.i[3]);
    swizzle(p0++,p1++,p2++,p3++,b0,v0,v1,v2);
    b0.v = _mm_add_ps(                _mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)),
                      _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v))));
    swizzle(p0,p1,p2,p3,b1,v0,v1,v2);
    b1.v = _mm_add_ps(                _mm_add_ps(b1.v,_mm_mul_ps(a2.v,v0.v)),
                      _mm_mul_ps(a0.v,_mm_add_ps(v1.v,_mm_mul_ps(a2.v,v2.v))));
  }
}
---- END PROGRAM ----
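
For readers following the test case: as far as I can tell from the intrinsics
used, swizzle() reads four 32-bit words from each of its four input pointers
using only 8-byte loads (_mm_loadl_pi / _mm_loadh_pi) and writes them out
transposed, so nothing in the source asks for a 16-byte load from "a". A scalar
sketch of the same data movement is below. It is not part of the original
program; swizzle_scalar is a hypothetical name, and it works on raw 32-bit
words so that the int field of a_t is carried through unchanged.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for the report's swizzle(): reads four 32-bit
// words from each of a0..a3 and stores word k of input r into output k at
// position r (a 4x4 transpose). memcpy is used so that no alignment beyond
// one byte is assumed for the inputs.
inline void swizzle_scalar(const void *a0, const void *a1,
                           const void *a2, const void *a3,
                           std::uint32_t a[4], std::uint32_t b[4],
                           std::uint32_t c[4], std::uint32_t d[4]) {
  const void *rows[4] = {a0, a1, a2, a3};
  for (int r = 0; r < 4; ++r) {
    std::uint32_t row[4];
    std::memcpy(row, rows[r], sizeof row);
    a[r] = row[0];
    b[r] = row[1];
    c[r] = row[2];
    d[r] = row[3];
  }
}
```

With this reading, the first call in foo() gathers the "i" fields of four
consecutive a_t elements into ai and their f[0], f[1], f[2] fields into a0, a1
and a2.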


Thread overview: 17+ messages
2003-11-05  1:31 [Bug c++/12902] New: Invalid assembly generated when using SSE / xmmintrin.h kbowers at lanl dot gov
2003-11-05  4:08 ` [Bug c++/12902] " pinskia at gcc dot gnu dot org
2003-11-05  5:51 ` kbowers at lanl dot gov
2003-11-06 18:53 ` kbowers at lanl dot gov
2003-11-07  1:02 ` kbowers at lanl dot gov
2003-11-07 10:42 ` kbowers at lanl dot gov
2003-12-09 18:01 ` kbowers at lanl dot gov
2003-12-09 20:14 ` [Bug target/12902] " dhazeghi at yahoo dot com
2003-12-09 20:17 ` dhazeghi at yahoo dot com
2003-12-11 16:07 ` bangerth at dealii dot org
2004-12-13 20:54 ` bangerth at dealii dot org
2004-12-14 10:54 ` uros at kss-loka dot si
2005-01-05  9:43 ` [Bug target/12902] [4.0 Regression] " uros at kss-loka dot si
2005-01-05 12:14 ` rth at gcc dot gnu dot org
2005-01-05 19:14 ` cvs-commit at gcc dot gnu dot org
2005-01-05 20:04 ` rth at gcc dot gnu dot org
2005-01-06  8:25 ` uros at kss-loka dot si
