public inbox for gcc-prs@sourceware.org
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 18:06 Ian Ollmann
0 siblings, 0 replies; 15+ messages in thread
From: Ian Ollmann @ 2002-10-10 18:06 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Ian Ollmann <iano@cco.caltech.edu>
To: Jan Hubicka <jh@suse.cz>
Cc: hubicka@gcc.gnu.org, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>,
<gcc-gnats@gcc.gnu.org>
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 17:58:22 -0700
I think the problem is that the compiler is not properly padding data.
_mm_store_ps is nearly unique in that its argument order is
_mm_store_ps( float*, vector );
The 0x4 offset is to protect the float* that is written to the stack a few
bytes earlier. The compiler needs to insert an extra 12 bytes of padding.
In fact, if I swap the order of the arguments as follows:
_mm_store_ps( vector, float*)
then the crash goes away.
Ian
---------------------------------------------------
Ian Ollmann, Ph.D. iano@cco.caltech.edu
---------------------------------------------------
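The 12-byte figure follows from simple offset arithmetic. The sketch below only illustrates that arithmetic, assuming a 4-byte pointer and a 16-byte alignment requirement for the vector on 32-bit x86. The names (`round_up`, `vector_offset_packed`, and so on) are hypothetical, and this is not a description of how GCC itself lays out outgoing arguments.

```c
#include <stddef.h>

/* Assumed sizes for 32-bit x86: a float* is 4 bytes, and movaps
   requires its memory operand to sit on a 16-byte boundary. */
enum { PTR_SIZE = 4, VEC_ALIGN = 16 };

/* Round offset up to the next multiple of align (a power of two). */
static size_t round_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Offset of the vector when it is packed directly after the pointer,
   as in the crashing code: movaps %xmm0,0x4(%esp,1). */
static size_t vector_offset_packed(void)
{
    return PTR_SIZE;
}

/* Offset of the vector once it is pushed up to a 16-byte boundary. */
static size_t vector_offset_padded(void)
{
    return round_up(PTR_SIZE, VEC_ALIGN);
}

/* Padding that would have to be inserted between the two arguments. */
static size_t padding_needed(void)
{
    return vector_offset_padded() - vector_offset_packed();
}
```

Under these assumptions `padding_needed()` evaluates to 12. With the vector passed first it would sit at offset 0 and need no padding, which is consistent with the crash going away when the arguments are swapped.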
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2003-04-08 13:14 hubicka
From: hubicka @ 2003-04-08 13:14 UTC (permalink / raw)
To: bernds, gcc-bugs, gcc-prs, iano
Synopsis: SSE unaligned vector stores crash with -O0
State-Changed-From-To: analyzed->closed
State-Changed-By: hubicka
State-Changed-When: Tue Apr 8 13:14:16 2003
State-Changed-Why:
The outgoing arguments are finally aligned properly. Dynamic stack alignment still needs to be resolved, but there are other PRs about that.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8049
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-11 12:41 hubicka
From: hubicka @ 2002-10-11 12:41 UTC (permalink / raw)
To: bernds, gcc-bugs, gcc-prs, iano, nobody
Synopsis: SSE unaligned vector stores crash with -O0
Responsible-Changed-From-To: unassigned->bernds
Responsible-Changed-By: hubicka
Responsible-Changed-When: Fri Oct 11 12:41:36 2002
Responsible-Changed-Why:
Bernd, can you please finish the merge?
State-Changed-From-To: closed->analyzed
State-Changed-By: hubicka
State-Changed-When: Fri Oct 11 12:41:36 2002
State-Changed-Why:
One of the testcases shows a still-existing problem: the patch to align outgoing arguments didn't get merged into mainline.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8049
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-11 2:36 Jan Hubicka
From: Jan Hubicka @ 2002-10-11 2:36 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Jan Hubicka <jh@suse.cz>
To: Ian Ollmann <iano@cco.caltech.edu>
Cc: Jan Hubicka <jh@suse.cz>, hubicka@gcc.gnu.org,
gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Fri, 11 Oct 2002 11:32:48 +0200
> On Thu, 10 Oct 2002, Jan Hubicka wrote:
>
> > This really is a load; however, it was stored previously as:
> > > 0x80484c9 <MatrixMultiply+177>: movaps %xmm0,0xffffff68(%ebp)
>
> Why is that a store and this a load?
>
> 0x804888b <MatrixMultiply+1139> movaps %xmm0, 0x4 (%esp,1)
I see, I got confused by the disassembly.
I thought you were getting the trap on the previous load, not on this store.
In this case it is not the start of the frame that is misaligned but the
outgoing argument area, which looks like an unrelated bug. (We crash while
storing the argument for the function call, not while reading the stack
frame copy of C1.) I will check what is going on here.
Thanks,
Honza
>
>
> ---------------------------------------------------
> Ian Ollmann, Ph.D. iano@cco.caltech.edu
> ---------------------------------------------------
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 17:46 Ian Ollmann
From: Ian Ollmann @ 2002-10-10 17:46 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Ian Ollmann <iano@cco.caltech.edu>
To: Jan Hubicka <jh@suse.cz>
Cc: hubicka@gcc.gnu.org, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>,
<gcc-gnats@gcc.gnu.org>
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 17:45:16 -0700
On Thu, 10 Oct 2002, Jan Hubicka wrote:
> This really is a load; however, it was stored previously as:
> > 0x80484c9 <MatrixMultiply+177>: movaps %xmm0,0xffffff68(%ebp)
>
> And it didn't generate a trap, so ebp was aligned at that time, I suppose,
> so it is really strange to see that it is not in this case.
> Perhaps it is _mm_store_ps that messed up the stack and restored ebp
> incorrectly?
I checked. I stepped through the entire function. %esp stays the same the
whole time (so long as we are in MatrixMultiply and after return from
various sub functions) and is 16 byte aligned. The problem is that the
compiler seems to have generated a 4 byte offset from the 16 byte aligned
stack pointer and tried to do an aligned vector store there. For the other
earlier stores, the offset is an even multiple of 16 bytes.
0x804888b <MatrixMultiply+1139> movaps %xmm0, 0x4(%esp,1)
                                              ^^^
esp: 0xbffff790
The store looks to me like a write to pass the vector on the stack, just
before the call is made.
Perhaps it is time to reopen this bug? I think this is a compiler
alignment bug. It happens for nearly every call to _mm_store_ps() if -O0
is set.
Ian
---------------------------------------------------
Ian Ollmann, Ph.D. iano@cco.caltech.edu
---------------------------------------------------
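The check described in this message can be reproduced with a few lines of address arithmetic. This is a minimal sketch using the values quoted above (esp = 0xbffff790, offset 0x4); the helper names `is_aligned16` and `esp_slot` are hypothetical.

```c
#include <stdint.h>

/* Nonzero if addr is a multiple of 16, which is what movaps requires
   of its memory operand. */
static int is_aligned16(uint32_t addr)
{
    return (addr & 0xFu) == 0;
}

/* Address touched by a store to disp(%esp) for a given stack pointer. */
static uint32_t esp_slot(uint32_t esp, uint32_t disp)
{
    return esp + disp;
}
```

For esp = 0xbffff790, `esp_slot(esp, 0x4)` is 0xbffff794, which fails the check, while the 0x0 and 0x10 slots used by the earlier calls pass it.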
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 12:46 Ian Ollmann
From: Ian Ollmann @ 2002-10-10 12:46 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Ian Ollmann <iano@cco.caltech.edu>
To: Jan Hubicka <jh@suse.cz>
Cc: hubicka@gcc.gnu.org, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>,
<gcc-gnats@gcc.gnu.org>
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 12:44:08 -0700
On Thu, 10 Oct 2002, Jan Hubicka wrote:
> This really is a load; however, it was stored previously as:
> > 0x80484c9 <MatrixMultiply+177>: movaps %xmm0,0xffffff68(%ebp)
Why is that a store and this a load?
0x804888b <MatrixMultiply+1139> movaps %xmm0, 0x4 (%esp,1)
---------------------------------------------------
Ian Ollmann, Ph.D. iano@cco.caltech.edu
---------------------------------------------------
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 12:26 Jan Hubicka
From: Jan Hubicka @ 2002-10-10 12:26 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Jan Hubicka <jh@suse.cz>
To: Ian Ollmann <iano@cco.caltech.edu>
Cc: Jan Hubicka <jh@suse.cz>, hubicka@gcc.gnu.org,
gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 21:20:58 +0200
> On Thu, 10 Oct 2002, Jan Hubicka wrote:
>
> > > The original example is one. The three buffers passed into MatrixMultiply
> > > happen to be aligned on my system. The actual crash happens when the stack
> > > copy of C1 is loaded, before the _mm_store_ps (uninlined) function is
> > > called.
> >
> > This looks strange. All stores to C1 work properly and then you get
> > a movaps crash when you load it to store into the output?
> > I believe it is the output that is misaligned (the destination of the store)
> > because the destination array is misaligned from the caller.
>
> Hmm, perhaps I got confused between the Intel and AT&T argument ordering
> schemes. Perhaps this is actually a store to the stack from the earlier
> _mm_add_ps and gdb misreported the line from the source? What do you think?
>
> (gdb) run
> Program received signal SIGSEGV, Segmentation Fault
> 0x0804888b in MatrixMultiply() (A = 0xbffff9d0, B=0xbffff990, C=0xbffff950)
> at main.c:74
>
> 74 _mm_store_ps( C + 0, C1 ); //....
>
>
> 0x8048884 <MatrixMultiply+1132>: movaps 0xffffff68 (%ebp),%xmm0
> 0x804888b <MatrixMultiply+1139>: movaps %xmm0, 0x4 (%esp, 1)
> 0x8048890 <MatrixMultiply+1144>: call 0x8048930 <_mm_store_ps>
This really is a load; however, it was stored previously as:
> 0x80484c9 <MatrixMultiply+177>: movaps %xmm0,0xffffff68(%ebp)
And it didn't generate a trap, so ebp was aligned at that time, I suppose,
so it is really strange to see that it is not in this case.
Perhaps it is _mm_store_ps that messed up the stack and restored ebp
incorrectly?
Honza
>
> (gdb) info registers
>
> esp 0xbffff800
> ebp 0xbffff928
>
>
>
> Dump of assembler code for function MatrixMultiply:
> 0x8048418 <MatrixMultiply>: push %ebp
> 0x8048419 <MatrixMultiply+1>: mov %esp,%ebp
> 0x804841b <MatrixMultiply+3>: sub $0x128,%esp
> 0x8048421 <MatrixMultiply+9>: mov 0x8(%ebp),%eax
> 0x8048424 <MatrixMultiply+12>: mov %eax,(%esp,1)
> 0x8048427 <MatrixMultiply+15>: call 0x8048925 <_mm_load_ps>
> 0x804842c <MatrixMultiply+20>: movaps %xmm0,0xffffffe8(%ebp)
> 0x8048430 <MatrixMultiply+24>: mov 0x8(%ebp),%eax
> 0x8048433 <MatrixMultiply+27>: add $0x10,%eax
> 0x8048436 <MatrixMultiply+30>: mov %eax,(%esp,1)
> 0x8048439 <MatrixMultiply+33>: call 0x8048925 <_mm_load_ps>
> 0x804843e <MatrixMultiply+38>: movaps %xmm0,0xffffffd8(%ebp)
> 0x8048442 <MatrixMultiply+42>: mov 0x8(%ebp),%eax
> 0x8048445 <MatrixMultiply+45>: add $0x20,%eax
> 0x8048448 <MatrixMultiply+48>: mov %eax,(%esp,1)
> 0x804844b <MatrixMultiply+51>: call 0x8048925 <_mm_load_ps>
> 0x8048450 <MatrixMultiply+56>: movaps %xmm0,0xffffffc8(%ebp)
> 0x8048454 <MatrixMultiply+60>: mov 0x8(%ebp),%eax
> 0x8048457 <MatrixMultiply+63>: add $0x30,%eax
> 0x804845a <MatrixMultiply+66>: mov %eax,(%esp,1)
> 0x804845d <MatrixMultiply+69>: call 0x8048925 <_mm_load_ps>
> 0x8048462 <MatrixMultiply+74>: movaps %xmm0,0xffffffb8(%ebp)
> 0x8048466 <MatrixMultiply+78>: mov 0xc(%ebp),%eax
> 0x8048469 <MatrixMultiply+81>: mov %eax,(%esp,1)
> 0x804846c <MatrixMultiply+84>: call 0x8048925 <_mm_load_ps>
> 0x8048471 <MatrixMultiply+89>: movaps %xmm0,0xffffffa8(%ebp)
> 0x8048475 <MatrixMultiply+93>: mov 0xc(%ebp),%eax
> 0x8048478 <MatrixMultiply+96>: add $0x10,%eax
> 0x804847b <MatrixMultiply+99>: mov %eax,(%esp,1)
> 0x804847e <MatrixMultiply+102>: call 0x8048925 <_mm_load_ps>
> 0x8048483 <MatrixMultiply+107>: movaps %xmm0,0xffffff98(%ebp)
> 0x8048487 <MatrixMultiply+111>: mov 0xc(%ebp),%eax
> 0x804848a <MatrixMultiply+114>: add $0x20,%eax
> 0x804848d <MatrixMultiply+117>: mov %eax,(%esp,1)
> 0x8048490 <MatrixMultiply+120>: call 0x8048925 <_mm_load_ps>
> 0x8048495 <MatrixMultiply+125>: movaps %xmm0,0xffffff88(%ebp)
> 0x8048499 <MatrixMultiply+129>: mov 0xc(%ebp),%eax
> 0x804849c <MatrixMultiply+132>: add $0x30,%eax
> 0x804849f <MatrixMultiply+135>: mov %eax,(%esp,1)
> 0x80484a2 <MatrixMultiply+138>: call 0x8048925 <_mm_load_ps>
> 0x80484a7 <MatrixMultiply+143>: movaps %xmm0,0xffffff78(%ebp)
> 0x80484ae <MatrixMultiply+150>: movaps 0xffffffe8(%ebp),%xmm0
> 0x80484b2 <MatrixMultiply+154>: shufps $0x0,0xffffffe8(%ebp),%xmm0
> 0x80484b7 <MatrixMultiply+159>: movaps %xmm0,(%esp,1)
> 0x80484bb <MatrixMultiply+163>: movaps 0xffffffa8(%ebp),%xmm0
> 0x80484bf <MatrixMultiply+167>: movaps %xmm0,0x10(%esp,1)
> 0x80484c4 <MatrixMultiply+172>: call 0x8048905 <_mm_mul_ps>
> 0x80484c9 <MatrixMultiply+177>: movaps %xmm0,0xffffff68(%ebp)
> 0x80484d0 <MatrixMultiply+184>: movaps 0xffffffd8(%ebp),%xmm0
> 0x80484d4 <MatrixMultiply+188>: shufps $0x0,0xffffffd8(%ebp),%xmm0
> 0x80484d9 <MatrixMultiply+193>: movaps %xmm0,(%esp,1)
> 0x80484dd <MatrixMultiply+197>: movaps 0xffffff98(%ebp),%xmm0
> 0x80484e1 <MatrixMultiply+201>: movaps %xmm0,0x10(%esp,1)
> 0x80484e6 <MatrixMultiply+206>: call 0x8048905 <_mm_mul_ps>
> 0x80484eb <MatrixMultiply+211>: movaps %xmm0,0xffffff58(%ebp)
> 0x80484f2 <MatrixMultiply+218>: movaps 0xffffffc8(%ebp),%xmm0
> 0x80484f6 <MatrixMultiply+222>: shufps $0x0,0xffffffc8(%ebp),%xmm0
> 0x80484fb <MatrixMultiply+227>: movaps %xmm0,(%esp,1)
> 0x80484ff <MatrixMultiply+231>: movaps 0xffffff88(%ebp),%xmm0
> 0x8048503 <MatrixMultiply+235>: movaps %xmm0,0x10(%esp,1)
> 0x8048508 <MatrixMultiply+240>: call 0x8048905 <_mm_mul_ps>
> 0x804850d <MatrixMultiply+245>: movaps %xmm0,0xffffff48(%ebp)
> 0x8048514 <MatrixMultiply+252>: movaps 0xffffffb8(%ebp),%xmm0
> 0x8048518 <MatrixMultiply+256>: shufps $0x0,0xffffffb8(%ebp),%xmm0
> 0x804851d <MatrixMultiply+261>: movaps %xmm0,(%esp,1)
> 0x8048521 <MatrixMultiply+265>: movaps 0xffffff78(%ebp),%xmm0
> 0x8048528 <MatrixMultiply+272>: movaps %xmm0,0x10(%esp,1)
> 0x804852d <MatrixMultiply+277>: call 0x8048905 <_mm_mul_ps>
> 0x8048532 <MatrixMultiply+282>: movaps %xmm0,0xffffff38(%ebp)
> 0x8048539 <MatrixMultiply+289>: movaps 0xffffffe8(%ebp),%xmm0
> 0x804853d <MatrixMultiply+293>: shufps $0x55,0xffffffe8(%ebp),%xmm0
> 0x8048542 <MatrixMultiply+298>: movaps %xmm0,(%esp,1)
> 0x8048546 <MatrixMultiply+302>: movaps 0xffffffa8(%ebp),%xmm0
> 0x804854a <MatrixMultiply+306>: movaps %xmm0,0x10(%esp,1)
> 0x804854f <MatrixMultiply+311>: call 0x8048905 <_mm_mul_ps>
> 0x8048554 <MatrixMultiply+316>: movaps %xmm0,0xffffff28(%ebp)
> 0x804855b <MatrixMultiply+323>: movaps 0xffffffd8(%ebp),%xmm0
> 0x804855f <MatrixMultiply+327>: shufps $0x55,0xffffffd8(%ebp),%xmm0
> 0x8048564 <MatrixMultiply+332>: movaps %xmm0,(%esp,1)
> 0x8048568 <MatrixMultiply+336>: movaps 0xffffff98(%ebp),%xmm0
> 0x804856c <MatrixMultiply+340>: movaps %xmm0,0x10(%esp,1)
> 0x8048571 <MatrixMultiply+345>: call 0x8048905 <_mm_mul_ps>
> 0x8048576 <MatrixMultiply+350>: movaps %xmm0,0xffffff18(%ebp)
> 0x804857d <MatrixMultiply+357>: movaps 0xffffffc8(%ebp),%xmm0
> 0x8048581 <MatrixMultiply+361>: shufps $0x55,0xffffffc8(%ebp),%xmm0
> 0x8048586 <MatrixMultiply+366>: movaps %xmm0,(%esp,1)
> 0x804858a <MatrixMultiply+370>: movaps 0xffffff88(%ebp),%xmm0
> 0x804858e <MatrixMultiply+374>: movaps %xmm0,0x10(%esp,1)
> 0x8048593 <MatrixMultiply+379>: call 0x8048905 <_mm_mul_ps>
> 0x8048598 <MatrixMultiply+384>: movaps %xmm0,0xffffff08(%ebp)
> 0x804859f <MatrixMultiply+391>: movaps 0xffffffb8(%ebp),%xmm0
> 0x80485a3 <MatrixMultiply+395>: shufps $0x55,0xffffffb8(%ebp),%xmm0
> 0x80485a8 <MatrixMultiply+400>: movaps %xmm0,(%esp,1)
> 0x80485ac <MatrixMultiply+404>: movaps 0xffffff78(%ebp),%xmm0
> 0x80485b3 <MatrixMultiply+411>: movaps %xmm0,0x10(%esp,1)
> 0x80485b8 <MatrixMultiply+416>: call 0x8048905 <_mm_mul_ps>
> 0x80485bd <MatrixMultiply+421>: movaps %xmm0,0xfffffef8(%ebp)
> 0x80485c4 <MatrixMultiply+428>: movaps 0xffffff68(%ebp),%xmm0
> 0x80485cb <MatrixMultiply+435>: movaps %xmm0,(%esp,1)
> 0x80485cf <MatrixMultiply+439>: movaps 0xffffff28(%ebp),%xmm0
> 0x80485d6 <MatrixMultiply+446>: movaps %xmm0,0x10(%esp,1)
> 0x80485db <MatrixMultiply+451>: call 0x80488e5 <_mm_add_ps>
> 0x80485e0 <MatrixMultiply+456>: movaps %xmm0,0xffffff68(%ebp)
> 0x80485e7 <MatrixMultiply+463>: movaps 0xffffff58(%ebp),%xmm0
> 0x80485ee <MatrixMultiply+470>: movaps %xmm0,(%esp,1)
> 0x80485f2 <MatrixMultiply+474>: movaps 0xffffff18(%ebp),%xmm0
> 0x80485f9 <MatrixMultiply+481>: movaps %xmm0,0x10(%esp,1)
> 0x80485fe <MatrixMultiply+486>: call 0x80488e5 <_mm_add_ps>
> 0x8048603 <MatrixMultiply+491>: movaps %xmm0,0xffffff58(%ebp)
> 0x804860a <MatrixMultiply+498>: movaps 0xffffff48(%ebp),%xmm0
> 0x8048611 <MatrixMultiply+505>: movaps %xmm0,(%esp,1)
> 0x8048615 <MatrixMultiply+509>: movaps 0xffffff08(%ebp),%xmm0
> 0x804861c <MatrixMultiply+516>: movaps %xmm0,0x10(%esp,1)
> 0x8048621 <MatrixMultiply+521>: call 0x80488e5 <_mm_add_ps>
> 0x8048626 <MatrixMultiply+526>: movaps %xmm0,0xffffff48(%ebp)
> 0x804862d <MatrixMultiply+533>: movaps 0xffffff38(%ebp),%xmm0
> 0x8048634 <MatrixMultiply+540>: movaps %xmm0,(%esp,1)
> 0x8048638 <MatrixMultiply+544>: movaps 0xfffffef8(%ebp),%xmm0
> 0x804863f <MatrixMultiply+551>: movaps %xmm0,0x10(%esp,1)
> 0x8048644 <MatrixMultiply+556>: call 0x80488e5 <_mm_add_ps>
> 0x8048649 <MatrixMultiply+561>: movaps %xmm0,0xffffff38(%ebp)
> 0x8048650 <MatrixMultiply+568>: movaps 0xffffffe8(%ebp),%xmm0
> 0x8048654 <MatrixMultiply+572>: shufps $0xaa,0xffffffe8(%ebp),%xmm0
> 0x8048659 <MatrixMultiply+577>: movaps %xmm0,(%esp,1)
> 0x804865d <MatrixMultiply+581>: movaps 0xffffffa8(%ebp),%xmm0
> 0x8048661 <MatrixMultiply+585>: movaps %xmm0,0x10(%esp,1)
> 0x8048666 <MatrixMultiply+590>: call 0x8048905 <_mm_mul_ps>
> 0x804866b <MatrixMultiply+595>: movaps %xmm0,0xffffff28(%ebp)
> 0x8048672 <MatrixMultiply+602>: movaps 0xffffffd8(%ebp),%xmm0
> 0x8048676 <MatrixMultiply+606>: shufps $0xaa,0xffffffd8(%ebp),%xmm0
> 0x804867b <MatrixMultiply+611>: movaps %xmm0,(%esp,1)
> 0x804867f <MatrixMultiply+615>: movaps 0xffffff98(%ebp),%xmm0
> 0x8048683 <MatrixMultiply+619>: movaps %xmm0,0x10(%esp,1)
> 0x8048688 <MatrixMultiply+624>: call 0x8048905 <_mm_mul_ps>
> 0x804868d <MatrixMultiply+629>: movaps %xmm0,0xffffff18(%ebp)
> 0x8048694 <MatrixMultiply+636>: movaps 0xffffffc8(%ebp),%xmm0
> 0x8048698 <MatrixMultiply+640>: shufps $0xaa,0xffffffc8(%ebp),%xmm0
> 0x804869d <MatrixMultiply+645>: movaps %xmm0,(%esp,1)
> 0x80486a1 <MatrixMultiply+649>: movaps 0xffffff88(%ebp),%xmm0
> 0x80486a5 <MatrixMultiply+653>: movaps %xmm0,0x10(%esp,1)
> 0x80486aa <MatrixMultiply+658>: call 0x8048905 <_mm_mul_ps>
> 0x80486af <MatrixMultiply+663>: movaps %xmm0,0xffffff08(%ebp)
> 0x80486b6 <MatrixMultiply+670>: movaps 0xffffffb8(%ebp),%xmm0
> 0x80486ba <MatrixMultiply+674>: shufps $0xaa,0xffffffb8(%ebp),%xmm0
> 0x80486bf <MatrixMultiply+679>: movaps %xmm0,(%esp,1)
> 0x80486c3 <MatrixMultiply+683>: movaps 0xffffff78(%ebp),%xmm0
> 0x80486ca <MatrixMultiply+690>: movaps %xmm0,0x10(%esp,1)
> 0x80486cf <MatrixMultiply+695>: call 0x8048905 <_mm_mul_ps>
> 0x80486d4 <MatrixMultiply+700>: movaps %xmm0,0xfffffef8(%ebp)
> 0x80486db <MatrixMultiply+707>: movaps 0xffffff68(%ebp),%xmm0
> 0x80486e2 <MatrixMultiply+714>: movaps %xmm0,(%esp,1)
> 0x80486e6 <MatrixMultiply+718>: movaps 0xffffff28(%ebp),%xmm0
> 0x80486ed <MatrixMultiply+725>: movaps %xmm0,0x10(%esp,1)
> 0x80486f2 <MatrixMultiply+730>: call 0x80488e5 <_mm_add_ps>
> 0x80486f7 <MatrixMultiply+735>: movaps %xmm0,0xffffff68(%ebp)
> 0x80486fe <MatrixMultiply+742>: movaps 0xffffff58(%ebp),%xmm0
> 0x8048705 <MatrixMultiply+749>: movaps %xmm0,(%esp,1)
> 0x8048709 <MatrixMultiply+753>: movaps 0xffffff18(%ebp),%xmm0
> 0x8048710 <MatrixMultiply+760>: movaps %xmm0,0x10(%esp,1)
> 0x8048715 <MatrixMultiply+765>: call 0x80488e5 <_mm_add_ps>
> 0x804871a <MatrixMultiply+770>: movaps %xmm0,0xffffff58(%ebp)
> 0x8048721 <MatrixMultiply+777>: movaps 0xffffff48(%ebp),%xmm0
> 0x8048728 <MatrixMultiply+784>: movaps %xmm0,(%esp,1)
> 0x804872c <MatrixMultiply+788>: movaps 0xffffff08(%ebp),%xmm0
> 0x8048733 <MatrixMultiply+795>: movaps %xmm0,0x10(%esp,1)
> 0x8048738 <MatrixMultiply+800>: call 0x80488e5 <_mm_add_ps>
> 0x804873d <MatrixMultiply+805>: movaps %xmm0,0xffffff48(%ebp)
> 0x8048744 <MatrixMultiply+812>: movaps 0xffffff38(%ebp),%xmm0
> 0x804874b <MatrixMultiply+819>: movaps %xmm0,(%esp,1)
> 0x804874f <MatrixMultiply+823>: movaps 0xfffffef8(%ebp),%xmm0
> 0x8048756 <MatrixMultiply+830>: movaps %xmm0,0x10(%esp,1)
> 0x804875b <MatrixMultiply+835>: call 0x80488e5 <_mm_add_ps>
> 0x8048760 <MatrixMultiply+840>: movaps %xmm0,0xffffff38(%ebp)
> 0x8048767 <MatrixMultiply+847>: movaps 0xffffffe8(%ebp),%xmm0
> 0x804876b <MatrixMultiply+851>: shufps $0xff,0xffffffe8(%ebp),%xmm0
> 0x8048770 <MatrixMultiply+856>: movaps %xmm0,(%esp,1)
> 0x8048774 <MatrixMultiply+860>: movaps 0xffffffa8(%ebp),%xmm0
> 0x8048778 <MatrixMultiply+864>: movaps %xmm0,0x10(%esp,1)
> 0x804877d <MatrixMultiply+869>: call 0x8048905 <_mm_mul_ps>
> 0x8048782 <MatrixMultiply+874>: movaps %xmm0,0xffffff28(%ebp)
> 0x8048789 <MatrixMultiply+881>: movaps 0xffffffd8(%ebp),%xmm0
> 0x804878d <MatrixMultiply+885>: shufps $0xff,0xffffffd8(%ebp),%xmm0
> 0x8048792 <MatrixMultiply+890>: movaps %xmm0,(%esp,1)
> 0x8048796 <MatrixMultiply+894>: movaps 0xffffff98(%ebp),%xmm0
> 0x804879a <MatrixMultiply+898>: movaps %xmm0,0x10(%esp,1)
> 0x804879f <MatrixMultiply+903>: call 0x8048905 <_mm_mul_ps>
> 0x80487a4 <MatrixMultiply+908>: movaps %xmm0,0xffffff18(%ebp)
> 0x80487ab <MatrixMultiply+915>: movaps 0xffffffc8(%ebp),%xmm0
> 0x80487af <MatrixMultiply+919>: shufps $0xff,0xffffffc8(%ebp),%xmm0
> 0x80487b4 <MatrixMultiply+924>: movaps %xmm0,(%esp,1)
> 0x80487b8 <MatrixMultiply+928>: movaps 0xffffff88(%ebp),%xmm0
> 0x80487bc <MatrixMultiply+932>: movaps %xmm0,0x10(%esp,1)
> 0x80487c1 <MatrixMultiply+937>: call 0x8048905 <_mm_mul_ps>
> 0x80487c6 <MatrixMultiply+942>: movaps %xmm0,0xffffff08(%ebp)
> 0x80487cd <MatrixMultiply+949>: movaps 0xffffffb8(%ebp),%xmm0
> 0x80487d1 <MatrixMultiply+953>: shufps $0xff,0xffffffb8(%ebp),%xmm0
> 0x80487d6 <MatrixMultiply+958>: movaps %xmm0,(%esp,1)
> 0x80487da <MatrixMultiply+962>: movaps 0xffffff78(%ebp),%xmm0
> 0x80487e1 <MatrixMultiply+969>: movaps %xmm0,0x10(%esp,1)
> 0x80487e6 <MatrixMultiply+974>: call 0x8048905 <_mm_mul_ps>
> 0x80487eb <MatrixMultiply+979>: movaps %xmm0,0xfffffef8(%ebp)
> 0x80487f2 <MatrixMultiply+986>: movaps 0xffffff68(%ebp),%xmm0
> 0x80487f9 <MatrixMultiply+993>: movaps %xmm0,(%esp,1)
> 0x80487fd <MatrixMultiply+997>: movaps 0xffffff28(%ebp),%xmm0
> 0x8048804 <MatrixMultiply+1004>: movaps %xmm0,0x10(%esp,1)
> 0x8048809 <MatrixMultiply+1009>: call 0x80488e5 <_mm_add_ps>
> 0x804880e <MatrixMultiply+1014>: movaps %xmm0,0xffffff68(%ebp)
> 0x8048815 <MatrixMultiply+1021>: movaps 0xffffff58(%ebp),%xmm0
> 0x804881c <MatrixMultiply+1028>: movaps %xmm0,(%esp,1)
> 0x8048820 <MatrixMultiply+1032>: movaps 0xffffff18(%ebp),%xmm0
> 0x8048827 <MatrixMultiply+1039>: movaps %xmm0,0x10(%esp,1)
> 0x804882c <MatrixMultiply+1044>: call 0x80488e5 <_mm_add_ps>
> 0x8048831 <MatrixMultiply+1049>: movaps %xmm0,0xffffff58(%ebp)
> 0x8048838 <MatrixMultiply+1056>: movaps 0xffffff48(%ebp),%xmm0
> 0x804883f <MatrixMultiply+1063>: movaps %xmm0,(%esp,1)
> 0x8048843 <MatrixMultiply+1067>: movaps 0xffffff08(%ebp),%xmm0
> 0x804884a <MatrixMultiply+1074>: movaps %xmm0,0x10(%esp,1)
> 0x804884f <MatrixMultiply+1079>: call 0x80488e5 <_mm_add_ps>
> 0x8048854 <MatrixMultiply+1084>: movaps %xmm0,0xffffff48(%ebp)
> 0x804885b <MatrixMultiply+1091>: movaps 0xffffff38(%ebp),%xmm0
> 0x8048862 <MatrixMultiply+1098>: movaps %xmm0,(%esp,1)
> 0x8048866 <MatrixMultiply+1102>: movaps 0xfffffef8(%ebp),%xmm0
> 0x804886d <MatrixMultiply+1109>: movaps %xmm0,0x10(%esp,1)
> 0x8048872 <MatrixMultiply+1114>: call 0x80488e5 <_mm_add_ps>
> 0x8048877 <MatrixMultiply+1119>: movaps %xmm0,0xffffff38(%ebp)
> 0x804887e <MatrixMultiply+1126>: mov 0x10(%ebp),%eax
> 0x8048881 <MatrixMultiply+1129>: mov %eax,(%esp,1)
> 0x8048884 <MatrixMultiply+1132>: movaps 0xffffff68(%ebp),%xmm0
> 0x804888b <MatrixMultiply+1139>: movaps %xmm0,0x4(%esp,1)
> 0x8048890 <MatrixMultiply+1144>: call 0x8048930 <_mm_store_ps>
> 0x8048895 <MatrixMultiply+1149>: mov 0x10(%ebp),%eax
> 0x8048898 <MatrixMultiply+1152>: add $0x10,%eax
> 0x804889b <MatrixMultiply+1155>: mov %eax,(%esp,1)
> 0x804889e <MatrixMultiply+1158>: movaps 0xffffff58(%ebp),%xmm0
> 0x80488a5 <MatrixMultiply+1165>: movaps %xmm0,0x4(%esp,1)
> 0x80488aa <MatrixMultiply+1170>: call 0x8048930 <_mm_store_ps>
> 0x80488af <MatrixMultiply+1175>: mov 0x10(%ebp),%eax
> 0x80488b2 <MatrixMultiply+1178>: add $0x20,%eax
> 0x80488b5 <MatrixMultiply+1181>: mov %eax,(%esp,1)
> 0x80488b8 <MatrixMultiply+1184>: movaps 0xffffff48(%ebp),%xmm0
> 0x80488bf <MatrixMultiply+1191>: movaps %xmm0,0x4(%esp,1)
> 0x80488c4 <MatrixMultiply+1196>: call 0x8048930 <_mm_store_ps>
> 0x80488c9 <MatrixMultiply+1201>: mov 0x10(%ebp),%eax
> 0x80488cc <MatrixMultiply+1204>: add $0x30,%eax
> 0x80488cf <MatrixMultiply+1207>: mov %eax,(%esp,1)
> 0x80488d2 <MatrixMultiply+1210>: movaps 0xffffff38(%ebp),%xmm0
> 0x80488d9 <MatrixMultiply+1217>: movaps %xmm0,0x4(%esp,1)
> 0x80488de <MatrixMultiply+1222>: call 0x8048930 <_mm_store_ps>
> 0x80488e3 <MatrixMultiply+1227>: leave
> 0x80488e4 <MatrixMultiply+1228>: ret
> End of assembler dump.
>
>
> ---------------------------------------------------
> Ian Ollmann, Ph.D. iano@cco.caltech.edu
> ---------------------------------------------------
>
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 12:16 Ian Ollmann
From: Ian Ollmann @ 2002-10-10 12:16 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Ian Ollmann <iano@cco.caltech.edu>
To: Jan Hubicka <jh@suse.cz>
Cc: hubicka@gcc.gnu.org, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>,
<gcc-gnats@gcc.gnu.org>
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 12:10:12 -0700
On Thu, 10 Oct 2002, Jan Hubicka wrote:
> > The original example is one. The three buffers passed into MatrixMultiply
> > happen to be aligned on my system. The actual crash happens when the stack
> > copy of C1 is loaded, before the _mm_store_ps (uninlined) function is
> > called.
>
> This looks strange. All stores to C1 work properly and then you get
> a movaps crash when you load it to store into the output?
> I believe it is the output that is misaligned (the destination of the store)
> because the destination array is misaligned from the caller.
Hmm, perhaps I got confused between the Intel and AT&T argument ordering
schemes. Perhaps this is actually a store to the stack from the earlier
_mm_add_ps and gdb misreported the line from the source? What do you think?
(gdb) run
Program received signal SIGSEGV, Segmentation Fault
0x0804888b in MatrixMultiply() (A = 0xbffff9d0, B=0xbffff990, C=0xbffff950)
at main.c:74
74 _mm_store_ps( C + 0, C1 ); //....
0x8048884 <MatrixMultiply+1132>: movaps 0xffffff68 (%ebp),%xmm0
0x804888b <MatrixMultiply+1139>: movaps %xmm0, 0x4 (%esp, 1)
0x8048890 <MatrixMultiply+1144>: call 0x8048930 <_mm_store_ps>
(gdb) info registers
esp 0xbffff800
ebp 0xbffff928
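The register values above are enough to check both memory operands by hand. The sketch below assumes gdb is printing the 32-bit displacement 0xffffff68 as an unsigned number (that is, -0x98 in two's complement); the function names are hypothetical and serve only to illustrate the arithmetic.

```c
#include <stdint.h>

/* Effective address of disp(%reg) on 32-bit x86. The sum wraps modulo
   2^32, so a displacement printed as 0xffffff68 acts as -0x98. */
static uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;   /* unsigned arithmetic wraps like the CPU's */
}

/* The same displacement viewed as a signed byte count. */
static int64_t signed_disp(uint32_t disp)
{
    return (disp & 0x80000000u) ? (int64_t)disp - 4294967296LL
                                : (int64_t)disp;
}

/* Nonzero if addr is a multiple of 16, as movaps requires. */
static int is_aligned16(uint32_t addr)
{
    return (addr & 0xFu) == 0;
}
```

With ebp = 0xbffff928, the load from 0xffffff68(%ebp) resolves to 0xbffff890, which is 16-byte aligned. With esp = 0xbffff800, the store to 0x4(%esp) resolves to 0xbffff804, which is not, and that store is the instruction at the faulting address 0x0804888b.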
Dump of assembler code for function MatrixMultiply:
0x8048418 <MatrixMultiply>: push %ebp
0x8048419 <MatrixMultiply+1>: mov %esp,%ebp
0x804841b <MatrixMultiply+3>: sub $0x128,%esp
0x8048421 <MatrixMultiply+9>: mov 0x8(%ebp),%eax
0x8048424 <MatrixMultiply+12>: mov %eax,(%esp,1)
0x8048427 <MatrixMultiply+15>: call 0x8048925 <_mm_load_ps>
0x804842c <MatrixMultiply+20>: movaps %xmm0,0xffffffe8(%ebp)
0x8048430 <MatrixMultiply+24>: mov 0x8(%ebp),%eax
0x8048433 <MatrixMultiply+27>: add $0x10,%eax
0x8048436 <MatrixMultiply+30>: mov %eax,(%esp,1)
0x8048439 <MatrixMultiply+33>: call 0x8048925 <_mm_load_ps>
0x804843e <MatrixMultiply+38>: movaps %xmm0,0xffffffd8(%ebp)
0x8048442 <MatrixMultiply+42>: mov 0x8(%ebp),%eax
0x8048445 <MatrixMultiply+45>: add $0x20,%eax
0x8048448 <MatrixMultiply+48>: mov %eax,(%esp,1)
0x804844b <MatrixMultiply+51>: call 0x8048925 <_mm_load_ps>
0x8048450 <MatrixMultiply+56>: movaps %xmm0,0xffffffc8(%ebp)
0x8048454 <MatrixMultiply+60>: mov 0x8(%ebp),%eax
0x8048457 <MatrixMultiply+63>: add $0x30,%eax
0x804845a <MatrixMultiply+66>: mov %eax,(%esp,1)
0x804845d <MatrixMultiply+69>: call 0x8048925 <_mm_load_ps>
0x8048462 <MatrixMultiply+74>: movaps %xmm0,0xffffffb8(%ebp)
0x8048466 <MatrixMultiply+78>: mov 0xc(%ebp),%eax
0x8048469 <MatrixMultiply+81>: mov %eax,(%esp,1)
0x804846c <MatrixMultiply+84>: call 0x8048925 <_mm_load_ps>
0x8048471 <MatrixMultiply+89>: movaps %xmm0,0xffffffa8(%ebp)
0x8048475 <MatrixMultiply+93>: mov 0xc(%ebp),%eax
0x8048478 <MatrixMultiply+96>: add $0x10,%eax
0x804847b <MatrixMultiply+99>: mov %eax,(%esp,1)
0x804847e <MatrixMultiply+102>: call 0x8048925 <_mm_load_ps>
0x8048483 <MatrixMultiply+107>: movaps %xmm0,0xffffff98(%ebp)
0x8048487 <MatrixMultiply+111>: mov 0xc(%ebp),%eax
0x804848a <MatrixMultiply+114>: add $0x20,%eax
0x804848d <MatrixMultiply+117>: mov %eax,(%esp,1)
0x8048490 <MatrixMultiply+120>: call 0x8048925 <_mm_load_ps>
0x8048495 <MatrixMultiply+125>: movaps %xmm0,0xffffff88(%ebp)
0x8048499 <MatrixMultiply+129>: mov 0xc(%ebp),%eax
0x804849c <MatrixMultiply+132>: add $0x30,%eax
0x804849f <MatrixMultiply+135>: mov %eax,(%esp,1)
0x80484a2 <MatrixMultiply+138>: call 0x8048925 <_mm_load_ps>
0x80484a7 <MatrixMultiply+143>: movaps %xmm0,0xffffff78(%ebp)
0x80484ae <MatrixMultiply+150>: movaps 0xffffffe8(%ebp),%xmm0
0x80484b2 <MatrixMultiply+154>: shufps $0x0,0xffffffe8(%ebp),%xmm0
0x80484b7 <MatrixMultiply+159>: movaps %xmm0,(%esp,1)
0x80484bb <MatrixMultiply+163>: movaps 0xffffffa8(%ebp),%xmm0
0x80484bf <MatrixMultiply+167>: movaps %xmm0,0x10(%esp,1)
0x80484c4 <MatrixMultiply+172>: call 0x8048905 <_mm_mul_ps>
0x80484c9 <MatrixMultiply+177>: movaps %xmm0,0xffffff68(%ebp)
0x80484d0 <MatrixMultiply+184>: movaps 0xffffffd8(%ebp),%xmm0
0x80484d4 <MatrixMultiply+188>: shufps $0x0,0xffffffd8(%ebp),%xmm0
0x80484d9 <MatrixMultiply+193>: movaps %xmm0,(%esp,1)
0x80484dd <MatrixMultiply+197>: movaps 0xffffff98(%ebp),%xmm0
0x80484e1 <MatrixMultiply+201>: movaps %xmm0,0x10(%esp,1)
0x80484e6 <MatrixMultiply+206>: call 0x8048905 <_mm_mul_ps>
0x80484eb <MatrixMultiply+211>: movaps %xmm0,0xffffff58(%ebp)
0x80484f2 <MatrixMultiply+218>: movaps 0xffffffc8(%ebp),%xmm0
0x80484f6 <MatrixMultiply+222>: shufps $0x0,0xffffffc8(%ebp),%xmm0
0x80484fb <MatrixMultiply+227>: movaps %xmm0,(%esp,1)
0x80484ff <MatrixMultiply+231>: movaps 0xffffff88(%ebp),%xmm0
0x8048503 <MatrixMultiply+235>: movaps %xmm0,0x10(%esp,1)
0x8048508 <MatrixMultiply+240>: call 0x8048905 <_mm_mul_ps>
0x804850d <MatrixMultiply+245>: movaps %xmm0,0xffffff48(%ebp)
0x8048514 <MatrixMultiply+252>: movaps 0xffffffb8(%ebp),%xmm0
0x8048518 <MatrixMultiply+256>: shufps $0x0,0xffffffb8(%ebp),%xmm0
0x804851d <MatrixMultiply+261>: movaps %xmm0,(%esp,1)
0x8048521 <MatrixMultiply+265>: movaps 0xffffff78(%ebp),%xmm0
0x8048528 <MatrixMultiply+272>: movaps %xmm0,0x10(%esp,1)
0x804852d <MatrixMultiply+277>: call 0x8048905 <_mm_mul_ps>
0x8048532 <MatrixMultiply+282>: movaps %xmm0,0xffffff38(%ebp)
0x8048539 <MatrixMultiply+289>: movaps 0xffffffe8(%ebp),%xmm0
0x804853d <MatrixMultiply+293>: shufps $0x55,0xffffffe8(%ebp),%xmm0
0x8048542 <MatrixMultiply+298>: movaps %xmm0,(%esp,1)
0x8048546 <MatrixMultiply+302>: movaps 0xffffffa8(%ebp),%xmm0
0x804854a <MatrixMultiply+306>: movaps %xmm0,0x10(%esp,1)
0x804854f <MatrixMultiply+311>: call 0x8048905 <_mm_mul_ps>
0x8048554 <MatrixMultiply+316>: movaps %xmm0,0xffffff28(%ebp)
0x804855b <MatrixMultiply+323>: movaps 0xffffffd8(%ebp),%xmm0
0x804855f <MatrixMultiply+327>: shufps $0x55,0xffffffd8(%ebp),%xmm0
0x8048564 <MatrixMultiply+332>: movaps %xmm0,(%esp,1)
0x8048568 <MatrixMultiply+336>: movaps 0xffffff98(%ebp),%xmm0
0x804856c <MatrixMultiply+340>: movaps %xmm0,0x10(%esp,1)
0x8048571 <MatrixMultiply+345>: call 0x8048905 <_mm_mul_ps>
0x8048576 <MatrixMultiply+350>: movaps %xmm0,0xffffff18(%ebp)
0x804857d <MatrixMultiply+357>: movaps 0xffffffc8(%ebp),%xmm0
0x8048581 <MatrixMultiply+361>: shufps $0x55,0xffffffc8(%ebp),%xmm0
0x8048586 <MatrixMultiply+366>: movaps %xmm0,(%esp,1)
0x804858a <MatrixMultiply+370>: movaps 0xffffff88(%ebp),%xmm0
0x804858e <MatrixMultiply+374>: movaps %xmm0,0x10(%esp,1)
0x8048593 <MatrixMultiply+379>: call 0x8048905 <_mm_mul_ps>
0x8048598 <MatrixMultiply+384>: movaps %xmm0,0xffffff08(%ebp)
0x804859f <MatrixMultiply+391>: movaps 0xffffffb8(%ebp),%xmm0
0x80485a3 <MatrixMultiply+395>: shufps $0x55,0xffffffb8(%ebp),%xmm0
0x80485a8 <MatrixMultiply+400>: movaps %xmm0,(%esp,1)
0x80485ac <MatrixMultiply+404>: movaps 0xffffff78(%ebp),%xmm0
0x80485b3 <MatrixMultiply+411>: movaps %xmm0,0x10(%esp,1)
0x80485b8 <MatrixMultiply+416>: call 0x8048905 <_mm_mul_ps>
0x80485bd <MatrixMultiply+421>: movaps %xmm0,0xfffffef8(%ebp)
0x80485c4 <MatrixMultiply+428>: movaps 0xffffff68(%ebp),%xmm0
0x80485cb <MatrixMultiply+435>: movaps %xmm0,(%esp,1)
0x80485cf <MatrixMultiply+439>: movaps 0xffffff28(%ebp),%xmm0
0x80485d6 <MatrixMultiply+446>: movaps %xmm0,0x10(%esp,1)
0x80485db <MatrixMultiply+451>: call 0x80488e5 <_mm_add_ps>
0x80485e0 <MatrixMultiply+456>: movaps %xmm0,0xffffff68(%ebp)
0x80485e7 <MatrixMultiply+463>: movaps 0xffffff58(%ebp),%xmm0
0x80485ee <MatrixMultiply+470>: movaps %xmm0,(%esp,1)
0x80485f2 <MatrixMultiply+474>: movaps 0xffffff18(%ebp),%xmm0
0x80485f9 <MatrixMultiply+481>: movaps %xmm0,0x10(%esp,1)
0x80485fe <MatrixMultiply+486>: call 0x80488e5 <_mm_add_ps>
0x8048603 <MatrixMultiply+491>: movaps %xmm0,0xffffff58(%ebp)
0x804860a <MatrixMultiply+498>: movaps 0xffffff48(%ebp),%xmm0
0x8048611 <MatrixMultiply+505>: movaps %xmm0,(%esp,1)
0x8048615 <MatrixMultiply+509>: movaps 0xffffff08(%ebp),%xmm0
0x804861c <MatrixMultiply+516>: movaps %xmm0,0x10(%esp,1)
0x8048621 <MatrixMultiply+521>: call 0x80488e5 <_mm_add_ps>
0x8048626 <MatrixMultiply+526>: movaps %xmm0,0xffffff48(%ebp)
0x804862d <MatrixMultiply+533>: movaps 0xffffff38(%ebp),%xmm0
0x8048634 <MatrixMultiply+540>: movaps %xmm0,(%esp,1)
0x8048638 <MatrixMultiply+544>: movaps 0xfffffef8(%ebp),%xmm0
0x804863f <MatrixMultiply+551>: movaps %xmm0,0x10(%esp,1)
0x8048644 <MatrixMultiply+556>: call 0x80488e5 <_mm_add_ps>
0x8048649 <MatrixMultiply+561>: movaps %xmm0,0xffffff38(%ebp)
0x8048650 <MatrixMultiply+568>: movaps 0xffffffe8(%ebp),%xmm0
0x8048654 <MatrixMultiply+572>: shufps $0xaa,0xffffffe8(%ebp),%xmm0
0x8048659 <MatrixMultiply+577>: movaps %xmm0,(%esp,1)
0x804865d <MatrixMultiply+581>: movaps 0xffffffa8(%ebp),%xmm0
0x8048661 <MatrixMultiply+585>: movaps %xmm0,0x10(%esp,1)
0x8048666 <MatrixMultiply+590>: call 0x8048905 <_mm_mul_ps>
0x804866b <MatrixMultiply+595>: movaps %xmm0,0xffffff28(%ebp)
0x8048672 <MatrixMultiply+602>: movaps 0xffffffd8(%ebp),%xmm0
0x8048676 <MatrixMultiply+606>: shufps $0xaa,0xffffffd8(%ebp),%xmm0
0x804867b <MatrixMultiply+611>: movaps %xmm0,(%esp,1)
0x804867f <MatrixMultiply+615>: movaps 0xffffff98(%ebp),%xmm0
0x8048683 <MatrixMultiply+619>: movaps %xmm0,0x10(%esp,1)
0x8048688 <MatrixMultiply+624>: call 0x8048905 <_mm_mul_ps>
0x804868d <MatrixMultiply+629>: movaps %xmm0,0xffffff18(%ebp)
0x8048694 <MatrixMultiply+636>: movaps 0xffffffc8(%ebp),%xmm0
0x8048698 <MatrixMultiply+640>: shufps $0xaa,0xffffffc8(%ebp),%xmm0
0x804869d <MatrixMultiply+645>: movaps %xmm0,(%esp,1)
0x80486a1 <MatrixMultiply+649>: movaps 0xffffff88(%ebp),%xmm0
0x80486a5 <MatrixMultiply+653>: movaps %xmm0,0x10(%esp,1)
0x80486aa <MatrixMultiply+658>: call 0x8048905 <_mm_mul_ps>
0x80486af <MatrixMultiply+663>: movaps %xmm0,0xffffff08(%ebp)
0x80486b6 <MatrixMultiply+670>: movaps 0xffffffb8(%ebp),%xmm0
0x80486ba <MatrixMultiply+674>: shufps $0xaa,0xffffffb8(%ebp),%xmm0
0x80486bf <MatrixMultiply+679>: movaps %xmm0,(%esp,1)
0x80486c3 <MatrixMultiply+683>: movaps 0xffffff78(%ebp),%xmm0
0x80486ca <MatrixMultiply+690>: movaps %xmm0,0x10(%esp,1)
0x80486cf <MatrixMultiply+695>: call 0x8048905 <_mm_mul_ps>
0x80486d4 <MatrixMultiply+700>: movaps %xmm0,0xfffffef8(%ebp)
0x80486db <MatrixMultiply+707>: movaps 0xffffff68(%ebp),%xmm0
0x80486e2 <MatrixMultiply+714>: movaps %xmm0,(%esp,1)
0x80486e6 <MatrixMultiply+718>: movaps 0xffffff28(%ebp),%xmm0
0x80486ed <MatrixMultiply+725>: movaps %xmm0,0x10(%esp,1)
0x80486f2 <MatrixMultiply+730>: call 0x80488e5 <_mm_add_ps>
0x80486f7 <MatrixMultiply+735>: movaps %xmm0,0xffffff68(%ebp)
0x80486fe <MatrixMultiply+742>: movaps 0xffffff58(%ebp),%xmm0
0x8048705 <MatrixMultiply+749>: movaps %xmm0,(%esp,1)
0x8048709 <MatrixMultiply+753>: movaps 0xffffff18(%ebp),%xmm0
0x8048710 <MatrixMultiply+760>: movaps %xmm0,0x10(%esp,1)
0x8048715 <MatrixMultiply+765>: call 0x80488e5 <_mm_add_ps>
0x804871a <MatrixMultiply+770>: movaps %xmm0,0xffffff58(%ebp)
0x8048721 <MatrixMultiply+777>: movaps 0xffffff48(%ebp),%xmm0
0x8048728 <MatrixMultiply+784>: movaps %xmm0,(%esp,1)
0x804872c <MatrixMultiply+788>: movaps 0xffffff08(%ebp),%xmm0
0x8048733 <MatrixMultiply+795>: movaps %xmm0,0x10(%esp,1)
0x8048738 <MatrixMultiply+800>: call 0x80488e5 <_mm_add_ps>
0x804873d <MatrixMultiply+805>: movaps %xmm0,0xffffff48(%ebp)
0x8048744 <MatrixMultiply+812>: movaps 0xffffff38(%ebp),%xmm0
0x804874b <MatrixMultiply+819>: movaps %xmm0,(%esp,1)
0x804874f <MatrixMultiply+823>: movaps 0xfffffef8(%ebp),%xmm0
0x8048756 <MatrixMultiply+830>: movaps %xmm0,0x10(%esp,1)
0x804875b <MatrixMultiply+835>: call 0x80488e5 <_mm_add_ps>
0x8048760 <MatrixMultiply+840>: movaps %xmm0,0xffffff38(%ebp)
0x8048767 <MatrixMultiply+847>: movaps 0xffffffe8(%ebp),%xmm0
0x804876b <MatrixMultiply+851>: shufps $0xff,0xffffffe8(%ebp),%xmm0
0x8048770 <MatrixMultiply+856>: movaps %xmm0,(%esp,1)
0x8048774 <MatrixMultiply+860>: movaps 0xffffffa8(%ebp),%xmm0
0x8048778 <MatrixMultiply+864>: movaps %xmm0,0x10(%esp,1)
0x804877d <MatrixMultiply+869>: call 0x8048905 <_mm_mul_ps>
0x8048782 <MatrixMultiply+874>: movaps %xmm0,0xffffff28(%ebp)
0x8048789 <MatrixMultiply+881>: movaps 0xffffffd8(%ebp),%xmm0
0x804878d <MatrixMultiply+885>: shufps $0xff,0xffffffd8(%ebp),%xmm0
0x8048792 <MatrixMultiply+890>: movaps %xmm0,(%esp,1)
0x8048796 <MatrixMultiply+894>: movaps 0xffffff98(%ebp),%xmm0
0x804879a <MatrixMultiply+898>: movaps %xmm0,0x10(%esp,1)
0x804879f <MatrixMultiply+903>: call 0x8048905 <_mm_mul_ps>
0x80487a4 <MatrixMultiply+908>: movaps %xmm0,0xffffff18(%ebp)
0x80487ab <MatrixMultiply+915>: movaps 0xffffffc8(%ebp),%xmm0
0x80487af <MatrixMultiply+919>: shufps $0xff,0xffffffc8(%ebp),%xmm0
0x80487b4 <MatrixMultiply+924>: movaps %xmm0,(%esp,1)
0x80487b8 <MatrixMultiply+928>: movaps 0xffffff88(%ebp),%xmm0
0x80487bc <MatrixMultiply+932>: movaps %xmm0,0x10(%esp,1)
0x80487c1 <MatrixMultiply+937>: call 0x8048905 <_mm_mul_ps>
0x80487c6 <MatrixMultiply+942>: movaps %xmm0,0xffffff08(%ebp)
0x80487cd <MatrixMultiply+949>: movaps 0xffffffb8(%ebp),%xmm0
0x80487d1 <MatrixMultiply+953>: shufps $0xff,0xffffffb8(%ebp),%xmm0
0x80487d6 <MatrixMultiply+958>: movaps %xmm0,(%esp,1)
0x80487da <MatrixMultiply+962>: movaps 0xffffff78(%ebp),%xmm0
0x80487e1 <MatrixMultiply+969>: movaps %xmm0,0x10(%esp,1)
0x80487e6 <MatrixMultiply+974>: call 0x8048905 <_mm_mul_ps>
0x80487eb <MatrixMultiply+979>: movaps %xmm0,0xfffffef8(%ebp)
0x80487f2 <MatrixMultiply+986>: movaps 0xffffff68(%ebp),%xmm0
0x80487f9 <MatrixMultiply+993>: movaps %xmm0,(%esp,1)
0x80487fd <MatrixMultiply+997>: movaps 0xffffff28(%ebp),%xmm0
0x8048804 <MatrixMultiply+1004>: movaps %xmm0,0x10(%esp,1)
0x8048809 <MatrixMultiply+1009>: call 0x80488e5 <_mm_add_ps>
0x804880e <MatrixMultiply+1014>: movaps %xmm0,0xffffff68(%ebp)
0x8048815 <MatrixMultiply+1021>: movaps 0xffffff58(%ebp),%xmm0
0x804881c <MatrixMultiply+1028>: movaps %xmm0,(%esp,1)
0x8048820 <MatrixMultiply+1032>: movaps 0xffffff18(%ebp),%xmm0
0x8048827 <MatrixMultiply+1039>: movaps %xmm0,0x10(%esp,1)
0x804882c <MatrixMultiply+1044>: call 0x80488e5 <_mm_add_ps>
0x8048831 <MatrixMultiply+1049>: movaps %xmm0,0xffffff58(%ebp)
0x8048838 <MatrixMultiply+1056>: movaps 0xffffff48(%ebp),%xmm0
0x804883f <MatrixMultiply+1063>: movaps %xmm0,(%esp,1)
0x8048843 <MatrixMultiply+1067>: movaps 0xffffff08(%ebp),%xmm0
0x804884a <MatrixMultiply+1074>: movaps %xmm0,0x10(%esp,1)
0x804884f <MatrixMultiply+1079>: call 0x80488e5 <_mm_add_ps>
0x8048854 <MatrixMultiply+1084>: movaps %xmm0,0xffffff48(%ebp)
0x804885b <MatrixMultiply+1091>: movaps 0xffffff38(%ebp),%xmm0
0x8048862 <MatrixMultiply+1098>: movaps %xmm0,(%esp,1)
0x8048866 <MatrixMultiply+1102>: movaps 0xfffffef8(%ebp),%xmm0
0x804886d <MatrixMultiply+1109>: movaps %xmm0,0x10(%esp,1)
0x8048872 <MatrixMultiply+1114>: call 0x80488e5 <_mm_add_ps>
0x8048877 <MatrixMultiply+1119>: movaps %xmm0,0xffffff38(%ebp)
0x804887e <MatrixMultiply+1126>: mov 0x10(%ebp),%eax
0x8048881 <MatrixMultiply+1129>: mov %eax,(%esp,1)
0x8048884 <MatrixMultiply+1132>: movaps 0xffffff68(%ebp),%xmm0
0x804888b <MatrixMultiply+1139>: movaps %xmm0,0x4(%esp,1)
0x8048890 <MatrixMultiply+1144>: call 0x8048930 <_mm_store_ps>
0x8048895 <MatrixMultiply+1149>: mov 0x10(%ebp),%eax
0x8048898 <MatrixMultiply+1152>: add $0x10,%eax
0x804889b <MatrixMultiply+1155>: mov %eax,(%esp,1)
0x804889e <MatrixMultiply+1158>: movaps 0xffffff58(%ebp),%xmm0
0x80488a5 <MatrixMultiply+1165>: movaps %xmm0,0x4(%esp,1)
0x80488aa <MatrixMultiply+1170>: call 0x8048930 <_mm_store_ps>
0x80488af <MatrixMultiply+1175>: mov 0x10(%ebp),%eax
0x80488b2 <MatrixMultiply+1178>: add $0x20,%eax
0x80488b5 <MatrixMultiply+1181>: mov %eax,(%esp,1)
0x80488b8 <MatrixMultiply+1184>: movaps 0xffffff48(%ebp),%xmm0
0x80488bf <MatrixMultiply+1191>: movaps %xmm0,0x4(%esp,1)
0x80488c4 <MatrixMultiply+1196>: call 0x8048930 <_mm_store_ps>
0x80488c9 <MatrixMultiply+1201>: mov 0x10(%ebp),%eax
0x80488cc <MatrixMultiply+1204>: add $0x30,%eax
0x80488cf <MatrixMultiply+1207>: mov %eax,(%esp,1)
0x80488d2 <MatrixMultiply+1210>: movaps 0xffffff38(%ebp),%xmm0
0x80488d9 <MatrixMultiply+1217>: movaps %xmm0,0x4(%esp,1)
0x80488de <MatrixMultiply+1222>: call 0x8048930 <_mm_store_ps>
0x80488e3 <MatrixMultiply+1227>: leave
0x80488e4 <MatrixMultiply+1228>: ret
End of assembler dump.
---------------------------------------------------
Ian Ollmann, Ph.D. iano@cco.caltech.edu
---------------------------------------------------
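For reference when reading the dump: movaps faults unless its memory operand is 16-byte aligned. The calls to _mm_mul_ps and _mm_add_ps place their vector arguments at (%esp) and 0x10(%esp), while the four _mm_store_ps call sites place the vector at 0x4(%esp). A minimal sketch of that arithmetic follows, assuming the stack pointer itself is 16-byte aligned at the call site. The helper names and any sample %esp value are hypothetical, not taken from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when addr is a multiple of align.
   align must be a nonzero power of two. */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Address of an outgoing argument slot at a byte offset from the
   stack pointer (hypothetical helper, for illustration only). */
static uintptr_t arg_slot(uintptr_t esp, uintptr_t offset)
{
    return esp + offset;
}
```

With a 16-byte-aligned esp, is_aligned(arg_slot(esp, 0x10), 16) holds but is_aligned(arg_slot(esp, 0x4), 16) does not, which matches a fault on the movaps that precedes each call to _mm_store_ps.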
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 11:06 Jan Hubicka
0 siblings, 0 replies; 15+ messages in thread
From: Jan Hubicka @ 2002-10-10 11:06 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Jan Hubicka <jh@suse.cz>
To: Ian Ollmann <iano@cco.caltech.edu>
Cc: Jan Hubicka <jh@suse.cz>, hubicka@gcc.gnu.org,
gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 20:03:17 +0200
> On Thu, 10 Oct 2002, Jan Hubicka wrote:
>
> > > On 10 Oct 2002 hubicka@gcc.gnu.org wrote:
> > >
> > > > Synopsis: SSE unaligned vector stores crash with -O0
> > > >
> > > > State-Changed-From-To: open->closed
> > > > State-Changed-By: hubicka
> > > > State-Changed-When: Thu Oct 10 09:45:59 2002
> > > > State-Changed-Why:
> > > > It is a runtime bug not to align the stack properly for main.
> > > > It will go away with a runtime compiled using gcc 3.2, or can be worked around by avoiding vector code in main.
> > > >
> > > > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8049
> > >
> > > I see this in functions that are not main(). Didn't I provide an example
> > > of one?
> >
> > I didn't see it. Can you send me some?
>
> The original example is one. The three buffers passed into MatrixMultiply
> happen to be aligned on my system. The actual crash happens when the stack
> copy of C1 is loaded, before the _mm_store_ps (uninlined) function is
> called.
This looks strange. All stores to C1 work properly, and then you get a
movaps crash when you load it to store into the output?
I believe it is the output that is misaligned (the destination of the store),
because the destination array arrives misaligned from the caller.
Honza
>
> Ian
>
> ---------------------------------------------------
> Ian Ollmann, Ph.D. iano@cco.caltech.edu
> ---------------------------------------------------
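One way to test the hypothesis that the destination array arrives misaligned is to print the low four address bits of each pointer on entry to MatrixMultiply. A minimal sketch, assuming hypothetical helpers named misalignment16 and report_alignment (neither is part of the original test case):

```c
#include <stdint.h>
#include <stdio.h>

/* Byte offset of p within its 16-byte block; 0 means movaps-safe. */
static unsigned misalignment16(const void *p)
{
    return (unsigned)((uintptr_t)p & 15u);
}

/* Print a pointer together with its offset from 16-byte alignment. */
static void report_alignment(const char *name, const void *p)
{
    fprintf(stderr, "%s = %p (offset %u from 16-byte alignment)\n",
            name, p, misalignment16(p));
}
```

Calling report_alignment("C", C) as the first statement of MatrixMultiply would show whether the caller's buffer or a stack slot inside the function is the misaligned operand.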
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 10:56 Ian Ollmann
0 siblings, 0 replies; 15+ messages in thread
From: Ian Ollmann @ 2002-10-10 10:56 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Ian Ollmann <iano@cco.caltech.edu>
To: Jan Hubicka <jh@suse.cz>
Cc: hubicka@gcc.gnu.org, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>,
<gcc-gnats@gcc.gnu.org>
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 10:54:51 -0700
On Thu, 10 Oct 2002, Jan Hubicka wrote:
> > On 10 Oct 2002 hubicka@gcc.gnu.org wrote:
> >
> > > Synopsis: SSE unaligned vector stores crash with -O0
> > >
> > > State-Changed-From-To: open->closed
> > > State-Changed-By: hubicka
> > > State-Changed-When: Thu Oct 10 09:45:59 2002
> > > State-Changed-Why:
> > > It is a runtime bug not to align the stack properly for main.
> > > It will go away with a runtime compiled using gcc 3.2, or can be worked around by avoiding vector code in main.
> > >
> > > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8049
> >
> > I see this in functions that are not main(). Didn't I provide an example
> > of one?
>
> I didn't see it. Can you send me some?
The original example is one. The three buffers passed into MatrixMultiply
happen to be aligned on my system. The actual crash happens when the stack
copy of C1 is loaded, before the _mm_store_ps (uninlined) function is
called.
Ian
---------------------------------------------------
Ian Ollmann, Ph.D. iano@cco.caltech.edu
---------------------------------------------------
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 10:56 Jan Hubicka
0 siblings, 0 replies; 15+ messages in thread
From: Jan Hubicka @ 2002-10-10 10:56 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Jan Hubicka <jh@suse.cz>
To: Ian Ollmann <iano@cco.caltech.edu>
Cc: hubicka@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org,
gcc-gnats@gcc.gnu.org
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 19:47:20 +0200
> On 10 Oct 2002 hubicka@gcc.gnu.org wrote:
>
> > Synopsis: SSE unaligned vector stores crash with -O0
> >
> > State-Changed-From-To: open->closed
> > State-Changed-By: hubicka
> > State-Changed-When: Thu Oct 10 09:45:59 2002
> > State-Changed-Why:
> > It is a runtime bug not to align the stack properly for main.
> > It will go away with a runtime compiled using gcc 3.2, or can be worked around by avoiding vector code in main.
> >
> > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8049
>
> I see this in functions that are not main(). Didn't I provide an example
> of one?
I didn't see it. Can you send me some?
>
> Ian
>
> ---------------------------------------------------
> Ian Ollmann, Ph.D. iano@cco.caltech.edu
> ---------------------------------------------------
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 10:46 Ian Ollmann
0 siblings, 0 replies; 15+ messages in thread
From: Ian Ollmann @ 2002-10-10 10:46 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Ian Ollmann <iano@cco.caltech.edu>
To: hubicka@gcc.gnu.org, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>,
<gcc-gnats@gcc.gnu.org>
Cc:
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Thu, 10 Oct 2002 10:46:47 -0700
On 10 Oct 2002 hubicka@gcc.gnu.org wrote:
> Synopsis: SSE unaligned vector stores crash with -O0
>
> State-Changed-From-To: open->closed
> State-Changed-By: hubicka
> State-Changed-When: Thu Oct 10 09:45:59 2002
> State-Changed-Why:
> It is a runtime bug not to align the stack properly for main.
> It will go away with a runtime compiled using gcc 3.2, or can be worked around by avoiding vector code in main.
>
> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8049
I see this in functions that are not main(). Didn't I provide an example
of one?
Ian
---------------------------------------------------
Ian Ollmann, Ph.D. iano@cco.caltech.edu
---------------------------------------------------
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-10-10 9:46 hubicka
0 siblings, 0 replies; 15+ messages in thread
From: hubicka @ 2002-10-10 9:46 UTC (permalink / raw)
To: gcc-bugs, gcc-prs, iano, nobody
Synopsis: SSE unaligned vector stores crash with -O0
State-Changed-From-To: open->closed
State-Changed-By: hubicka
State-Changed-When: Thu Oct 10 09:45:59 2002
State-Changed-Why:
It is a runtime bug not to align the stack properly for main.
It will go away with a runtime compiled using gcc 3.2, or can be worked around by avoiding vector code in main.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8049
* Re: optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-09-25 20:16 Tim Prince
0 siblings, 0 replies; 15+ messages in thread
From: Tim Prince @ 2002-09-25 20:16 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/8049; it has been noted by GNATS.
From: Tim Prince <tprince@computer.org>
To: iano@cco.caltech.edu, gcc-gnats@gcc.gnu.org
Cc:
Subject: Re: optimization/8049: SSE unaligned vector stores crash with -O0
Date: Wed, 25 Sep 2002 20:13:05 -0700
On Wednesday 25 September 2002 15:58, iano@cco.caltech.edu wrote:
> >Number: 8049
> >Category: optimization
> >Synopsis: SSE unaligned vector stores crash with -O0
> >Confidential: no
> >Severity: critical
> >Priority: medium
> >Responsible: unassigned
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Wed Sep 25 16:06:00 PDT 2002
> >Closed-Date:
> >Last-Modified:
> >Originator: Ian Ollmann
> >Release: gcc 3.3 20020925 (experimental)
> >Organization:
> >Environment:
>
> Red Hat Linux 7.3 (P4/2530)
>
> >Description:
>
> I have encountered several cases at optimization level zero (-O0) wherein
> the app seg faults on a movaps instruction just prior to calling
> _mm_store_ps().
>
> gcc -O0 -msse -g main.c
>
> main.c:
> -------
> #include <xmmintrin.h>
> #include <stdlib.h>
>
> void MatrixMultiply( float A[16], float B[16], float C[16] );
>
> int main( void )
> {
> float A[16] __attribute__ ((aligned (16) ) );
> float B[16] __attribute__ ((aligned (16) ) );
> float C[16] __attribute__ ((aligned (16) ) );
> int i;
>
I've seen several discussions indicating why gcc is unable to support these
alignments in main(). In a search of gcc info, I don't find this point
explained adequately. Ideally, the compiler would diagnose this as an error.
Also, somewhere, the measures required to get alignments on Windows targets
should be documented.
--
Tim Prince
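Where the aligned attribute cannot be relied on for locals in main(), one common alternative is to over-allocate a plain float array and round the pointer up by hand. A minimal sketch, assuming hypothetical names align_up16 and ALIGN16_SLACK_FLOATS (neither comes from gcc or from the report):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extra floats to reserve so that a 16-byte boundary always fits. */
#define ALIGN16_SLACK_FLOATS 4

/* Round raw up to the next 16-byte boundary (hypothetical helper).
   raw must already be aligned for float; at most 12 bytes of the
   reserved slack are then consumed as padding. */
static float *align_up16(float *raw)
{
    uintptr_t addr = ((uintptr_t)raw + 15u) & ~(uintptr_t)15u;
    assert((addr - (uintptr_t)raw) % sizeof(float) == 0);
    return (float *)addr;
}
```

In main this would look like float A_storage[16 + ALIGN16_SLACK_FLOATS]; float *A = align_up16(A_storage);. This only controls where the buffers land. It does not change how the compiler aligns its own __m128 temporaries or outgoing arguments, which is the issue discussed elsewhere in this thread.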
* optimization/8049: SSE unaligned vector stores crash with -O0
@ 2002-09-25 16:06 iano
0 siblings, 0 replies; 15+ messages in thread
From: iano @ 2002-09-25 16:06 UTC (permalink / raw)
To: gcc-gnats
>Number: 8049
>Category: optimization
>Synopsis: SSE unaligned vector stores crash with -O0
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: unassigned
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Sep 25 16:06:00 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator: Ian Ollmann
>Release: gcc 3.3 20020925 (experimental)
>Organization:
>Environment:
Red Hat Linux 7.3 (P4/2530)
>Description:
I have encountered several cases at optimization level zero (-O0) wherein the app seg faults on a movaps instruction just prior to calling _mm_store_ps().
gcc -O0 -msse -g main.c
main.c:
-------
#include <xmmintrin.h>
#include <stdlib.h>

void MatrixMultiply( float A[16], float B[16], float C[16] );

int main( void )
{
    float A[16] __attribute__ ((aligned (16) ) );
    float B[16] __attribute__ ((aligned (16) ) );
    float C[16] __attribute__ ((aligned (16) ) );
    int i;

    for( i = 0; i < 16; i++ )
    {
        A[i] = (double) (rand() - RAND_MAX/2) / (double) (RAND_MAX );
        B[i] = (double) (rand() - RAND_MAX/2) / (double) (RAND_MAX );
        C[i] = 0.0;
    }

    MatrixMultiply( A, B, C ); //the crasher

    return 0;
}

void MatrixMultiply( float A[16], float B[16], float C[16] )
{
    __m128 A1 = _mm_load_ps( A );
    __m128 A2 = _mm_load_ps( A + 4 );
    __m128 A3 = _mm_load_ps( A + 8 );
    __m128 A4 = _mm_load_ps( A + 12 );
    __m128 B1 = _mm_load_ps( B );
    __m128 B2 = _mm_load_ps( B + 4 );
    __m128 B3 = _mm_load_ps( B + 8 );
    __m128 B4 = _mm_load_ps( B + 12 );

    __m128 C1 = _mm_mul_ps( _mm_shuffle_ps( A1, A1, _MM_SHUFFLE(0,0,0,0) ), B1 );
    __m128 C2 = _mm_mul_ps( _mm_shuffle_ps( A2, A2, _MM_SHUFFLE(0,0,0,0) ), B2 );
    __m128 C3 = _mm_mul_ps( _mm_shuffle_ps( A3, A3, _MM_SHUFFLE(0,0,0,0) ), B3 );
    __m128 C4 = _mm_mul_ps( _mm_shuffle_ps( A4, A4, _MM_SHUFFLE(0,0,0,0) ), B4 );
    __m128 D1 = _mm_mul_ps( _mm_shuffle_ps( A1, A1, _MM_SHUFFLE(1,1,1,1) ), B1 );
    __m128 D2 = _mm_mul_ps( _mm_shuffle_ps( A2, A2, _MM_SHUFFLE(1,1,1,1) ), B2 );
    __m128 D3 = _mm_mul_ps( _mm_shuffle_ps( A3, A3, _MM_SHUFFLE(1,1,1,1) ), B3 );
    __m128 D4 = _mm_mul_ps( _mm_shuffle_ps( A4, A4, _MM_SHUFFLE(1,1,1,1) ), B4 );
    C1 = _mm_add_ps( C1, D1 );
    C2 = _mm_add_ps( C2, D2 );
    C3 = _mm_add_ps( C3, D3 );
    C4 = _mm_add_ps( C4, D4 );

    D1 = _mm_mul_ps( _mm_shuffle_ps( A1, A1, _MM_SHUFFLE(2,2,2,2) ), B1 );
    D2 = _mm_mul_ps( _mm_shuffle_ps( A2, A2, _MM_SHUFFLE(2,2,2,2) ), B2 );
    D3 = _mm_mul_ps( _mm_shuffle_ps( A3, A3, _MM_SHUFFLE(2,2,2,2) ), B3 );
    D4 = _mm_mul_ps( _mm_shuffle_ps( A4, A4, _MM_SHUFFLE(2,2,2,2) ), B4 );
    C1 = _mm_add_ps( C1, D1 );
    C2 = _mm_add_ps( C2, D2 );
    C3 = _mm_add_ps( C3, D3 );
    C4 = _mm_add_ps( C4, D4 );

    D1 = _mm_mul_ps( _mm_shuffle_ps( A1, A1, _MM_SHUFFLE(3,3,3,3) ), B1 );
    D2 = _mm_mul_ps( _mm_shuffle_ps( A2, A2, _MM_SHUFFLE(3,3,3,3) ), B2 );
    D3 = _mm_mul_ps( _mm_shuffle_ps( A3, A3, _MM_SHUFFLE(3,3,3,3) ), B3 );
    D4 = _mm_mul_ps( _mm_shuffle_ps( A4, A4, _MM_SHUFFLE(3,3,3,3) ), B4 );
    C1 = _mm_add_ps( C1, D1 );
    C2 = _mm_add_ps( C2, D2 );
    C3 = _mm_add_ps( C3, D3 );
    C4 = _mm_add_ps( C4, D4 );

    _mm_store_ps( C + 0, C1 ); //Crashes here on movaps that reads C1 into xmm0 prior to call _mm_store_ps if -O0
    _mm_store_ps( C + 4, C2 );
    _mm_store_ps( C + 8, C3 );
    _mm_store_ps( C + 12, C4 );
}
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
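To check results once the crash is avoided, a scalar version of the same arithmetic can help. The sketch below mirrors the statements as posted: each broadcast element of an A vector multiplies the B vector from the same group of four, and the four products are summed in order (this is not the textbook 4x4 matrix product). The function name is hypothetical, and results may differ from the SSE version in the last bits depending on how the compiler evaluates float expressions.

```c
/* Scalar equivalent of the SSE statements in MatrixMultiply as posted.
   For each group of four floats g and each lane j:
   C[4g+j] = A[4g+0]*B[4g+j] + A[4g+1]*B[4g+j]
           + A[4g+2]*B[4g+j] + A[4g+3]*B[4g+j]            */
static void matrix_multiply_scalar(const float A[16], const float B[16],
                                   float C[16])
{
    for (int g = 0; g < 4; g++) {
        for (int j = 0; j < 4; j++) {
            float b = B[4 * g + j];
            /* Same accumulation order as the C1..C4 / D1..D4 updates. */
            float acc = A[4 * g + 0] * b;
            acc += A[4 * g + 1] * b;
            acc += A[4 * g + 2] * b;
            acc += A[4 * g + 3] * b;
            C[4 * g + j] = acc;
        }
    }
}
```

Because it uses no vector types, this version has no 16-byte alignment requirement and can be compared against the output buffer of the SSE routine element by element.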
end of thread, other threads:[~2003-04-08 13:14 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-10 18:06 optimization/8049: SSE unaligned vector stores crash with -O0 Ian Ollmann
-- strict thread matches above, loose matches on Subject: below --
2003-04-08 13:14 hubicka
2002-10-11 12:41 hubicka
2002-10-11 2:36 Jan Hubicka
2002-10-10 17:46 Ian Ollmann
2002-10-10 12:46 Ian Ollmann
2002-10-10 12:26 Jan Hubicka
2002-10-10 12:16 Ian Ollmann
2002-10-10 11:06 Jan Hubicka
2002-10-10 10:56 Jan Hubicka
2002-10-10 10:56 Ian Ollmann
2002-10-10 10:46 Ian Ollmann
2002-10-10 9:46 hubicka
2002-09-25 20:16 Tim Prince
2002-09-25 16:06 iano