public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
@ 2005-09-16  5:38 jeff at panasas dot com
  2005-09-16  5:45 ` [Bug c/23909] " jeff at panasas dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: jeff at panasas dot com @ 2005-09-16  5:38 UTC (permalink / raw)
  To: gcc-bugs

We use some optimized XOR routines for software RAID.  Unfortunately, the 
compiler generated incorrect code when this was compiled for Redhat 7.3 + 
2.4.24 (this is normally kernel code).  I later found out that all versions of 
gcc that I tested (up to FC4 - 4.0.0 20050519 (Red Hat 4.0.0-8)) had this 
issue.

gcc -v on RH 7.3:

build-lin3> gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)

build-lin3> uname -a
Linux build-lin3 2.4.21-kdb #2 SMP Tue Apr 6 12:52:57 EDT 2004 i686 unknown

I've also tested on gcc 4.0.0:

rack-lin9$ gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --
infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-
checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-
exceptions --enable-libgcj-multifile --enable-
languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-
home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
Thread model: posix
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)

rack-lin9$ uname -a
Linux rack-lin9 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:08:39 EDT 2005 i686 
i686 i386 GNU/Linux


Compile command line when test fails: 

gcc -o xor_fail -fomit-frame-pointer -O2 xor.c

Compile command line when test PASSES:

gcc -o xor xor.c

I'll attach the test program to the bug.

The generated code runs into problems in the loop:


    /* now perform the xor across a stride */
    for (offset = stride; offset < maxoffs; offset += 32) {
      /* load first strip unit */
      __asm__ __volatile__(
                           "add     %1,       %0\n"
                           "movaps   0(%0),   %%xmm0\n"
                           "movaps  16(%0),   %%xmm1\n"
                           : : "r" (bptr[0]), "r" (offset));

      /* now xor the next N-1 strip units */
      for (j = 1; j < num_of_buffers; j++){
        __asm__ __volatile__(
                             "add    %1,     %0\n"
                             "xorps  0(%0),  %%xmm0\n"
                             "xorps 16(%0),  %%xmm1\n"
                             : :  "r" (bptr[j]), "r" (offset) );
      }
      /* now write out the result */
      __asm__ __volatile__(
                           "add     %1, %0\n"
                           "movntps %%xmm0,  0(%0)\n"
                           "movntps %%xmm1, 16(%0)\n"
                           : : "r" (dest), "r"  (offset) );

    }

Specifically, in first loading the data:
      __asm__ __volatile__(
                           "add     %1,       %0\n"
                           "movaps   0(%0),   %%xmm0\n"
                           "movaps  16(%0),   %%xmm1\n"
                           : : "r" (bptr[0]), "r" (offset));

We end up referencing memory off the end of the array bptr[0].  This is 
because the loop doesn't initialize %ebx and %ebx ends up being too large to 
access this array.  The loop jumps to .L261, but .L261 is below movl (%ebp), %
ebx.

        movl    (%ebp), %ebx
        .p2align 2
.L261:
        .stabn 68,0,168,.LM68-sse_multi_xor_gen
.LM68:
#APP
        add     %edx,       %ebx
movaps   0(%ebx),   %xmm0
movaps  16(%ebx),   %xmm1

        .stabn 68,0,175,.LM69-sse_multi_xor_gen
.LM69:
#NO_APP
        movl    $1, %ecx
        cmpl    %edi, %ecx
        jge     .L273
        .p2align 2
.L265:
        .stabn 68,0,176,.LM70-sse_multi_xor_gen
.LM70:
        movl    (%ebp,%ecx,4), %eax
#APP
        add    %edx,     %eax
xorps  0(%eax),  %xmm0
xorps 16(%eax),  %xmm1

        .stabn 68,0,175,.LM71-sse_multi_xor_gen
.LM71:
#NO_APP
        incl    %ecx
        cmpl    %edi, %ecx
        jl      .L265
.L273:
        .stabn 68,0,183,.LM72-sse_multi_xor_gen
.LM72:
        movl    88(%esp), %eax
#APP
        add     %edx, %eax
movntps %xmm0,  0(%eax)
movntps %xmm1, 16(%eax)

        .stabn 68,0,166,.LM73-sse_multi_xor_gen
.LM73:
#NO_APP
        addl    $32, %edx
        cmpl    %esi, %edx
        jb      .L261

The workaround fix is to just remove -fomit-frame-pointer.  Though I'm fairly 
concerned since the Linux kernel uses -fomit-frame-pointer for the kernel 
sources.

-- 
           Summary: Incorrect code generated for SSE2 based xor routine when
                    compiled with -O2 -fomit-frame-pointer
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jeff at panasas dot com
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug c/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
  2005-09-16  5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
@ 2005-09-16  5:45 ` jeff at panasas dot com
  2005-09-16 14:03 ` [Bug target/23909] " pinskia at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: jeff at panasas dot com @ 2005-09-16  5:45 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From jeff at panasas dot com  2005-09-16 05:44 -------
Created an attachment (id=9738)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9738&action=view)
Xor test program


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
  2005-09-16  5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
  2005-09-16  5:45 ` [Bug c/23909] " jeff at panasas dot com
@ 2005-09-16 14:03 ` pinskia at gcc dot gnu dot org
  2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-16 14:03 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
  2005-09-16  5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
  2005-09-16  5:45 ` [Bug c/23909] " jeff at panasas dot com
  2005-09-16 14:03 ` [Bug target/23909] " pinskia at gcc dot gnu dot org
@ 2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
  2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
  2005-09-16 14:50 ` jeff at panasas dot com
  4 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-16 14:04 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-09-16 14:04 -------
You should be using the SSE instrincs functions.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
  2005-09-16  5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
                   ` (2 preceding siblings ...)
  2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
@ 2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
  2005-09-16 14:50 ` jeff at panasas dot com
  4 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-16 14:41 UTC (permalink / raw)
  To: gcc-bugs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 548 bytes --]


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-09-16 14:40 -------
This works for me with 4.1.0 but I really think it is just an accident.

  /*@unused@*/ int cr0;   /* really, it's used, but lclint doesn't "get" __asm__ */

This comment does not make sense.
Specificly since GCC also warns about it:
t.c:110: warning: unused variable ‘cr0’
t.c:109: warning: unused variable ‘reg_store’


I still think you are making a mistake in your code by using inline-asm.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
  2005-09-16  5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
                   ` (3 preceding siblings ...)
  2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
@ 2005-09-16 14:50 ` jeff at panasas dot com
  4 siblings, 0 replies; 7+ messages in thread
From: jeff at panasas dot com @ 2005-09-16 14:50 UTC (permalink / raw)
  To: gcc-bugs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]


------- Additional Comments From jeff at panasas dot com  2005-09-16 14:50 -------
(In reply to comment #3)
> This works for me with 4.1.0 but I really think it is just an accident.
>   /*@unused@*/ int cr0;   /* really, it's used, but lclint doesn't "get" 
__asm__ */
> This comment does not make sense.
> Specificly since GCC also warns about it:
> t.c:110: warning: unused variable ‘cr0’
> t.c:109: warning: unused variable ‘reg_store’
> I still think you are making a mistake in your code by using inline-asm.

I pulled out some of the code required in the kernel.  When using SSE in the 
kernel you have to save and restore cr0 but you can't do that at userlevel.  
The comment only refers to lclint, a tool that we use to statically check our 
code.  lclint doesn't parse the inline asm, so we have to annotate the code.

reg_store is also another local that is used to save and restore the xmm 
registers when running in the kernel.  you can just ignore this at user-level.

I'm not sure why SSE intrinsics will help here?  This bug is will "go-away" 
when even very small changes are made to that loop, so just about any change 
will mask the bug.

The other problem that we have is that this code is compiled with several 
versions of gcc (2.95.2 -> 3.4) so the inline asm is a good common 
denominator.  I'd be willing to move to intrinisics if it solved the problem 
rather than masked it.  Are there other reasons that the inline asm is a bad 
idea?  I believe the code is completely legal inline asm.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer
       [not found] <bug-23909-11364@http.gcc.gnu.org/bugzilla/>
@ 2006-09-03 20:31 ` pinskia at gcc dot gnu dot org
  0 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-03 20:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2006-09-03 20:30 -------
      __asm__ __volatile__(
                           "add     %1,       %0\n"
                           "movaps   0(%0),   %%xmm0\n"
                           "movaps  16(%0),   %%xmm1\n"
                           : : "r" (bptr[0]), "r" (offset));

This asm is wrong, it does not tell the compiler that it modifies memory or
even xmmm? registers.

Really you should be using SSE instrincs functions.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-09-03 20:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-16  5:38 [Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer jeff at panasas dot com
2005-09-16  5:45 ` [Bug c/23909] " jeff at panasas dot com
2005-09-16 14:03 ` [Bug target/23909] " pinskia at gcc dot gnu dot org
2005-09-16 14:04 ` pinskia at gcc dot gnu dot org
2005-09-16 14:41 ` pinskia at gcc dot gnu dot org
2005-09-16 14:50 ` jeff at panasas dot com
     [not found] <bug-23909-11364@http.gcc.gnu.org/bugzilla/>
2006-09-03 20:31 ` pinskia at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).