public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
@ 2005-09-16 5:38 jeff at panasas dot com
2005-09-16 5:45 ` [Bug c/23909] " jeff at panasas dot com
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: jeff at panasas dot com @ 2005-09-16 5:38 UTC (permalink / raw)
To: gcc-bugs
We use some optimized XOR routines for software RAID. Unfortunately, the
compiler generated incorrect code when this was compiled for Redhat 7.3 +
2.4.24 (this is normally kernel code). I later found out that all versions of
gcc that I tested (up to FC4 - 4.0.0 20050519 (Red Hat 4.0.0-8)) had this
issue.
gcc -v on RH 7.3:
build-lin3> gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)
build-lin3> uname -a
Linux build-lin3 2.4.21-kdb #2 SMP Tue Apr 6 12:52:57 EDT 2004 i686 unknown
I've also tested on gcc 4.0.0:
rack-lin9$ gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --
infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-
checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-
exceptions --enable-libgcj-multifile --enable-
languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-
home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
Thread model: posix
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)
rack-lin9$ uname -a
Linux rack-lin9 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:08:39 EDT 2005 i686
i686 i386 GNU/Linux
Compile command line when test fails:
gcc -o xor_fail -fomit-frame-pointer -O2 xor.c
Compile command line when test PASSES:
gcc -o xor xor.c
I'll attach the test program to the bug.
The generated code runs into problems in the loop:
/* now perform the xor across a stride */
for (offset = stride; offset < maxoffs; offset += 32) {
/* load first strip unit */
__asm__ __volatile__(
"add %1, %0\n"
"movaps 0(%0), %%xmm0\n"
"movaps 16(%0), %%xmm1\n"
: : "r" (bptr[0]), "r" (offset));
/* now xor the next N-1 strip units */
for (j = 1; j < num_of_buffers; j++){
__asm__ __volatile__(
"add %1, %0\n"
"xorps 0(%0), %%xmm0\n"
"xorps 16(%0), %%xmm1\n"
: : "r" (bptr[j]), "r" (offset) );
}
/* now write out the result */
__asm__ __volatile__(
"add %1, %0\n"
"movntps %%xmm0, 0(%0)\n"
"movntps %%xmm1, 16(%0)\n"
: : "r" (dest), "r" (offset) );
}
Specifically, in first loading the data:
__asm__ __volatile__(
"add %1, %0\n"
"movaps 0(%0), %%xmm0\n"
"movaps 16(%0), %%xmm1\n"
: : "r" (bptr[0]), "r" (offset));
We end up referencing memory off the end of the array bptr[0]. This is
because the loop doesn't initialize %ebx and %ebx ends up being too large to
access this array. The loop jumps to .L261, but .L261 is below movl (%ebp), %
ebx.
movl (%ebp), %ebx
.p2align 2
.L261:
.stabn 68,0,168,.LM68-sse_multi_xor_gen
.LM68:
#APP
add %edx, %ebx
movaps 0(%ebx), %xmm0
movaps 16(%ebx), %xmm1
.stabn 68,0,175,.LM69-sse_multi_xor_gen
.LM69:
#NO_APP
movl $1, %ecx
cmpl %edi, %ecx
jge .L273
.p2align 2
.L265:
.stabn 68,0,176,.LM70-sse_multi_xor_gen
.LM70:
movl (%ebp,%ecx,4), %eax
#APP
add %edx, %eax
xorps 0(%eax), %xmm0
xorps 16(%eax), %xmm1
.stabn 68,0,175,.LM71-sse_multi_xor_gen
.LM71:
#NO_APP
incl %ecx
cmpl %edi, %ecx
jl .L265
.L273:
.stabn 68,0,183,.LM72-sse_multi_xor_gen
.LM72:
movl 88(%esp), %eax
#APP
add %edx, %eax
movntps %xmm0, 0(%eax)
movntps %xmm1, 16(%eax)
.stabn 68,0,166,.LM73-sse_multi_xor_gen
.LM73:
#NO_APP
addl $32, %edx
cmpl %esi, %edx
jb .L261
The workaround fix is to just remove -fomit-frame-pointer. Though I'm fairly
concerned since the Linux kernel uses -fomit-frame-pointer for the kernel
sources.
--
Summary: Incorrect code generated for SSE2 based xor routine when
compiled with -O2 -fomit-frame-pointer
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jeff at panasas dot com
CC: gcc-bugs at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug c/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
2005-09-16 5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
@ 2005-09-16 5:45 ` jeff at panasas dot com
2005-09-16 14:03 ` [Bug target/23909] " pinskia at gcc dot gnu dot org
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: jeff at panasas dot com @ 2005-09-16 5:45 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From jeff at panasas dot com 2005-09-16 05:44 -------
Created an attachment (id=9738)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9738&action=view)
Xor test program
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
2005-09-16 5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
2005-09-16 5:45 ` [Bug c/23909] " jeff at panasas dot com
@ 2005-09-16 14:03 ` pinskia at gcc dot gnu dot org
2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-16 14:03 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Component|c |target
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
2005-09-16 5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
2005-09-16 5:45 ` [Bug c/23909] " jeff at panasas dot com
2005-09-16 14:03 ` [Bug target/23909] " pinskia at gcc dot gnu dot org
@ 2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
2005-09-16 14:50 ` jeff at panasas dot com
4 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-16 14:04 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-16 14:04 -------
You should be using the SSE instrincs functions.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
2005-09-16 5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
` (2 preceding siblings ...)
2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
@ 2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
2005-09-16 14:50 ` jeff at panasas dot com
4 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-16 14:41 UTC (permalink / raw)
To: gcc-bugs
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 548 bytes --]
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-16 14:40 -------
This works for me with 4.1.0 but I really think it is just an accident.
/*@unused@*/ int cr0; /* really, it's used, but lclint doesn't "get" __asm__ */
This comment does not make sense.
Specificly since GCC also warns about it:
t.c:110: warning: unused variable cr0
t.c:109: warning: unused variable reg_store
I still think you are making a mistake in your code by using inline-asm.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
2005-09-16 5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
` (3 preceding siblings ...)
2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
@ 2005-09-16 14:50 ` jeff at panasas dot com
4 siblings, 0 replies; 7+ messages in thread
From: jeff at panasas dot com @ 2005-09-16 14:50 UTC (permalink / raw)
To: gcc-bugs
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]
------- Additional Comments From jeff at panasas dot com 2005-09-16 14:50 -------
(In reply to comment #3)
> This works for me with 4.1.0 but I really think it is just an accident.
> /*@unused@*/ int cr0; /* really, it's used, but lclint doesn't "get"
__asm__ */
> This comment does not make sense.
> Specificly since GCC also warns about it:
> t.c:110: warning: unused variable cr0
> t.c:109: warning: unused variable reg_store
> I still think you are making a mistake in your code by using inline-asm.
I pulled out some of the code required in the kernel. When using SSE in the
kernel you have to save and restore cr0 but you can't do that at userlevel.
The comment only refers to lclint, a tool that we use to statically check our
code. lclint doesn't parse the inline asm, so we have to annotate the code.
reg_store is also another local that is used to save and restore the xmm
registers when running in the kernel. you can just ignore this at user-level.
I'm not sure why SSE intrinsics will help here? This bug is will "go-away"
when even very small changes are made to that loop, so just about any change
will mask the bug.
The other problem that we have is that this code is compiled with several
versions of gcc (2.95.2 -> 3.4) so the inline asm is a good common
denominator. I'd be willing to move to intrinisics if it solved the problem
rather than masked it. Are there other reasons that the inline asm is a bad
idea? I believe the code is completely legal inline asm.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <bug-23909-11364@http.gcc.gnu.org/bugzilla/>]
* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
[not found] <bug-23909-11364@http.gcc.gnu.org/bugzilla/>
@ 2006-09-03 20:31 ` pinskia at gcc dot gnu dot org
0 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-03 20:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from pinskia at gcc dot gnu dot org 2006-09-03 20:30 -------
__asm__ __volatile__(
"add %1, %0\n"
"movaps 0(%0), %%xmm0\n"
"movaps 16(%0), %%xmm1\n"
: : "r" (bptr[0]), "r" (offset));
This asm is wrong, it does not tell the compiler that it modifies memory or
even xmmm? registers.
Really you should be using SSE instrincs functions.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |INVALID
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2006-09-03 20:31 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-16 5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
2005-09-16 5:45 ` [Bug c/23909] " jeff at panasas dot com
2005-09-16 14:03 ` [Bug target/23909] " pinskia at gcc dot gnu dot org
2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
2005-09-16 14:50 ` jeff at panasas dot com
[not found] <bug-23909-11364@http.gcc.gnu.org/bugzilla/>
2006-09-03 20:31 ` pinskia at gcc dot gnu dot org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).