[Bug c/50065] New: -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC

public inbox for gcc-bugs@sourceware.org

* [Bug c/50065] New: -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
@ 2011-08-12 21:49 tanzhangxi at gmail dot com
  2011-08-13  4:33 ` [Bug rtl-optimization/50065] " pinskia at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: tanzhangxi at gmail dot com @ 2011-08-12 21:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

             Bug #: 50065
           Summary: -Os, -O2, -O3 optimization breaks LD/ST ordering on
                    32-bit SPARC
    Classification: Unclassified
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: tanzhangxi@gmail.com


Consider the following C code.

static inline int spinlock_trylock(char* lock)
{
    int reg;
    __asm__ __volatile__ ("ldstub [%1],%0"
                          : "=r"(reg) : "r"(lock) : "memory", "cc");
    return reg;
}

static inline int spinlock_is_locked(char* lock)
{
    int reg;
    __asm__ __volatile__ ("ldub [%1],%0" : "=r"(reg) : "r"(lock) : "memory");
    return reg;
}


static inline void spinlock_lock(char* lock)
{
    while(spinlock_trylock(lock))
      while(spinlock_is_locked(lock));
}

static inline void spinlock_unlock(char* lock)
{
    *(volatile unsigned char*)lock = 0;
}

char remap_lock;
int remap_barrier;

void inc_remap_barrier()
{

  spinlock_lock(&remap_lock);
  remap_barrier++;
  spinlock_unlock(&remap_lock);
}

A simple ++ counter is protected by a spinlock implemented with the ldstub
atomic instruction on SPARC.

If I use -Os/-O2/-O3 to optimize the code, e.g.
sparc-ramp-gcc -c -Os test.c -o test.o

gcc generates the following code:
00000000 <inc_remap_barrier>:
   0:   03 00 00 00     sethi  %hi(0), %g1
   4:   10 80 00 06     b  1c <inc_remap_barrier+0x1c>
   8:   82 10 60 00     mov  %g1, %g1   ! 0 <inc_remap_barrier>
   c:   c4 08 40 00     ldub  [ %g1 ], %g2
  10:   80 a0 a0 00     cmp  %g2, 0
  14:   12 bf ff fe     bne  c <inc_remap_barrier+0xc>
  18:   01 00 00 00     nop 
  1c:   c4 68 40 00     ldstub  [ %g1 ], %g2
  20:   80 a0 a0 00     cmp  %g2, 0
  24:   12 bf ff fa     bne  c <inc_remap_barrier+0xc>
  28:   05 00 00 00     sethi  %hi(0), %g2
  2c:   c0 28 40 00     clrb  [ %g1 ]
  30:   c6 00 a0 00     ld  [ %g2 ], %g3
  34:   86 00 e0 01     inc  %g3
  38:   81 c3 e0 08     retl 
  3c:   c6 20 a0 00     st  %g3, [ %g2 ]


instruction 2C, clrb [%g1] corresponds to inline function 'spinlock_unlock'
    *(volatile unsigned char*)lock = 0;

This happens before the lock protected content 'remap_barrier++', i.e.

  30:   c6 00 a0 00     ld  [ %g2 ], %g3
  34:   86 00 e0 01     inc  %g3
  38:   81 c3 e0 08     retl 
  3c:   c6 20 a0 00     st  %g3, [ %g2 ]     ---> use the branch delay slot

This is wrong and will cause serious lock issues under a multithreading
environment.

However, the same code works fine with -O1 and -O0.

This problem happens with a couple of GCC cross builds (4.3.2 / 4.6.1) on our
side with various configurations (with glibc or a bare-metal setting).
It breaks with the following configurations:

1. 
COLLECT_GCC=sparc-ramp-gcc
COLLECT_LTO_WRAPPER=/home/charming/toolchain/sparc-ramp/libexec/gcc/sparc-ramp-elf/4.6.1/lto-wrapper
Target: sparc-ramp-elf
Configured with: /home/charming/toolchain/.build/src/gcc-4.6.1/configure
--build=i686-build_pc-linux-gnu --host=i686-build_pc-linux-gnu
--target=sparc-ramp-elf --prefix=/home/charming/toolchain/sparc-ramp
--with-local-prefix=/home/charming/toolchain/sparc-ramp/sparc-ramp-elf/sysroot
--disable-multilib --disable-libmudflap
--with-sysroot=/home/charming/toolchain/sparc-ramp/sparc-ramp-elf/sysroot
--with-newlib --enable-threads=no --disable-shared
--with-pkgversion='crosstool-NG 1.12.0' --disable-__cxa_atexit
--with-gmp=/home/charming/toolchain/.build/sparc-ramp-elf/build/static
--with-mpfr=/home/charming/toolchain/.build/sparc-ramp-elf/build/static
--with-mpc=/home/charming/toolchain/.build/sparc-ramp-elf/build/static
--with-ppl=/home/charming/toolchain/.build/sparc-ramp-elf/build/static
--with-cloog=/home/charming/toolchain/.build/sparc-ramp-elf/build/static
--with-libelf=/home/charming/toolchain/.build/sparc-ramp-elf/build/static
--enable-lto
--with-host-libstdcxx='-L/home/charming/toolchain/.build/sparc-ramp-elf/build/static/lib
-lpwl' --enable-target-optspace --disable-libgomp --disable-libmudflap
--disable-nls --enable-languages=c --with-cpu=v8
Thread model: single
gcc version 4.6.1 (crosstool-NG 1.12.0) 

2.
Using built-in specs.
COLLECT_GCC=sparc-ramp-gcc
COLLECT_LTO_WRAPPER=/home/xtan/toolchain/sparc-ramp/libexec/gcc/sparc-ramp-linux-gnu/4.6.1/lto-wrapper
Target: sparc-ramp-linux-gnu
Configured with: /home/xtan/toolchain/.build/src/gcc-4.6.1/configure
--build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu
--target=sparc-ramp-linux-gnu --prefix=/home/xtan/toolchain/sparc-ramp
--with-sysroot=/home/xtan/toolchain/sparc-ramp/sparc-ramp-linux-gnu/sysroot
--enable-languages=c,c++ --disable-multilib
--with-pkgversion=crosstool-NG-1.11.3 --enable-__cxa_atexit
--disable-libmudflap --disable-libgomp --disable-libssp
--with-gmp=/home/xtan/toolchain/.build/sparc-ramp-linux-gnu/build/static
--with-mpfr=/home/xtan/toolchain/.build/sparc-ramp-linux-gnu/build/static
--with-mpc=/home/xtan/toolchain/.build/sparc-ramp-linux-gnu/build/static
--with-ppl=/home/xtan/toolchain/.build/sparc-ramp-linux-gnu/build/static
--with-cloog=/home/xtan/toolchain/.build/sparc-ramp-linux-gnu/build/static
--with-libelf=/home/xtan/toolchain/.build/sparc-ramp-linux-gnu/build/static
--with-host-libstdcxx='-L/home/xtan/toolchain/.build/sparc-ramp-linux-gnu/build/static/lib
-lpwl' --enable-threads=posix --enable-target-optspace
--with-local-prefix=/home/xtan/toolchain/sparc-ramp/sparc-ramp-linux-gnu/sysroot
--disable-nls --enable-symvers=gnu --enable-c99 --enable-long-long
Thread model: posix
gcc version 4.6.1 (crosstool-NG-1.11.3)


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: pinskia at gcc dot gnu.org @ 2011-08-13  4:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-08-13 02:21:05 UTC ---
I fixed a volatile bug dealing with inlining and ipa-sra.  Maybe this was fixed
by that too.


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: ebotcazou at gcc dot gnu.org @ 2011-08-13 10:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

Eric Botcazou <ebotcazou at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |ebotcazou at gcc dot
                   |                            |gnu.org
         Resolution|                            |INVALID

--- Comment #2 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-08-13 10:10:47 UTC ---
> instruction 2C, clrb [%g1] corresponds to inline function 'spinlock_unlock'
>     *(volatile unsigned char*)lock = 0;
> 
> This happens before the lock protected content 'remap_barrier++', i.e.
> 
>   30:   c6 00 a0 00     ld  [ %g2 ], %g3
>   34:   86 00 e0 01     inc  %g3
>   38:   81 c3 e0 08     retl 
>   3c:   c6 20 a0 00     st  %g3, [ %g2 ]     ---> use the branch delay slot
> 
> This is wrong and will cause serious lock issues under a multithreading
> environment.

On what grounds is this wrong exactly?  The end of the code is equivalent to:

volatile unsigned char lock;
int remap_barrier;

remap_barrier++;
lock = 0;

It is perfectly valid for an optimizing C compiler to swap the two lines.

You want something like:

static inline void spin_unlock(char *lock)
{
    __asm__ __volatile__("stb %%g0, [%0]" : : "r" (lock) : "memory");
}
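
To see why putting the store inside an asm with a "memory" clobber pins the
order, here is a minimal sketch of the tail of inc_remap_barrier() using the
spin_unlock above. The SPARC branch is the statement from this comment. The
non-SPARC branch is only a stand-in (an assumption, not from the thread) that
uses GCC's __sync_lock_release builtin so the sketch also compiles on other
targets. The name locked_increment_tail is hypothetical.

```c
/* Sketch: the tail of inc_remap_barrier() with the asm-based unlock. */
char remap_lock = 1;   /* assume the lock is already held */
int remap_barrier;

static inline void spin_unlock(char *lock)
{
#if defined(__sparc__)
    /* The store is inside the asm; the "memory" clobber tells GCC not to
       move other memory accesses across this statement. */
    __asm__ __volatile__("stb %%g0, [%0]" : : "r" (lock) : "memory");
#else
    /* Stand-in for other targets (assumption): GCC builtin that writes 0
       to *lock with release semantics. */
    __sync_lock_release(lock);
#endif
}

/* Hypothetical helper name, used only for this illustration. */
void locked_increment_tail(void)
{
    remap_barrier++;           /* protected update */
    spin_unlock(&remap_lock);  /* must not be hoisted above the update */
}
```

With this shape the compiler has to finish the load, increment and store of
remap_barrier before it emits the store that releases the lock.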


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: tanzhangxi at gmail dot com @ 2011-08-14  1:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #3 from Zhangxi Tan <tanzhangxi at gmail dot com> 2011-08-14 00:57:07 UTC ---
The code is equivalent to

volatile unsigned char lock;
int remap_barrier;

while (atomic_test_and_set(lock)) {
   while (lock) {
     ;
   }
}
remap_barrier++;
lock = 0;

Eric: could you let me know why you think the code inside function
spinlock_lock(&remap_lock) is a NOP? This lock implementation is the one
suggested by the SPARC spec. Also, the arch_write_lock/unlock in the SPARC port
of Linux uses a very similar implementation.

Andrew: could you let me know in which version I can find this ipa-sra fix? At
least, the stable 4.6.1 doesn't work.


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: tanzhangxi at gmail dot com @ 2011-08-14  4:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #4 from Zhangxi Tan <tanzhangxi at gmail dot com> 2011-08-14 01:30:33 UTC ---
I don't think this is a valid optimization.

There are only two memory models in SPARC32, TSO and PSO (not RMO in the 64-bit
v9). Neither allows relaxing the read->write order, i.e. 'LD remap_barrier'
should always be executed before 'ST lock'.

This optimization violates the memory model and therefore should be prohibited.

In addition, I still

(In reply to comment #2)
> > instruction 2C, clrb [%g1] corresponds to inline function 'spinlock_unlock'
> >     *(volatile unsigned char*)lock = 0;
> > 
> > This happens before the lock protected content 'remap_barrier++', i.e.
> > 
> >   30:   c6 00 a0 00     ld  [ %g2 ], %g3
> >   34:   86 00 e0 01     inc  %g3
> >   38:   81 c3 e0 08     retl 
> >   3c:   c6 20 a0 00     st  %g3, [ %g2 ]     ---> use the branch delay slot
> > 
> > This is wrong and will cause serious lock issues under a multithreading
> > environment.
> 
> On what grounds is this wrong exactly?  The end of the code is equivalent to:
> 
> volatile unsigned char lock;
> int remap_barrier;
> 
> remap_barrier++;
> lock = 0;
> 
> It is perfectly valid for an optimizing C compiler to swap the two lines.
> 
> You want something like:
> 
> static inline void spin_unlock(char *lock)
> {
>     __asm__ __volatile__("stb %%g0, [%0]" : : "r" (lock) : "memory");
> }


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: mikpe at it dot uu.se @ 2011-08-14  9:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #5 from Mikael Pettersson <mikpe at it dot uu.se> 2011-08-14 09:24:31 UTC ---
You need a _compiler_ barrier before the store in _unlock().
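
A minimal sketch of that suggestion, keeping the reporter's volatile store and
placing GCC's empty asm with a "memory" clobber in front of it. The macro name
compiler_barrier is made up for this illustration. It constrains only the
compiler; whether the processor also needs a barrier is a separate question.

```c
/* GCC-style compiler barrier: emits no instruction, but the "memory"
   clobber makes the compiler assume memory may be read or written here,
   so it may not move memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

static inline void spinlock_unlock(char *lock)
{
    compiler_barrier();                   /* remap_barrier++ stays above */
    *(volatile unsigned char *)lock = 0;  /* release the lock */
}
```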


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: ebotcazou at gcc dot gnu.org @ 2011-08-14 13:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #6 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-08-14 12:42:47 UTC ---
> The code is equivalent to
> 
> volatile unsigned char lock;
> int remap_barrier;
> 
> while (atomic_test_and_set(lock)) {
>    while (lock) {
>      ;
>    }
> }
> remap_barrier++;
> lock = 0;
> 
> Eric: could you let me know why you think the code inside function
> spinlock_lock(&remap_lock) is a NOP?

I don't, you simply misquoted, I wrote "the end of the code".  The first part
of the spinlock implementation is correct, in particular you have the required
memory barrier in spinlock_is_locked.  The second part is not correct, as you
don't have the memory barrier in spinlock_unlock.

> Also, the arch_write_lock/unlock in the SPARC port of Linux uses a very 
> similar implementation.

No, it precisely doesn't, it has the memory barrier in spinlock_unlock.


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: ebotcazou at gcc dot gnu.org @ 2011-08-14 13:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #7 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-08-14 13:00:06 UTC ---
> I don't think this is a valid optimization.
> 
> There are only two memory models in SPARC32, TSO and PSO (not RMO in the 64-bit
> v9). Neither allows relaxing the read->write order, i.e. 'LD remap_barrier'
> should always be executed before 'ST lock'.
> 
> This optimization violates the memory model and therefore should be prohibited.

You're apparently confusing 2 different concepts:

  1. What an optimizing C compiler is permitted to do.  This is defined by the
ISO Standard in terms of an abstract machine that is somewhat simplistic.  In
particular, there is no concept of concurrency or memory model, and the whole
thing is essentially target-independent.  The kind of reordering we have here
is allowed by the Standard as it doesn't change the "external state" of the
abstract machine.

  2. The memory model implemented by the SPARC processor, under which loads and
stores can be reordered, even though the compiler itself doesn't reorder them.

A proper implementation of spinlocks needs to take them both into account.

For the first part, you need a compiler memory barrier, i.e.:

__asm__ __volatile__ ("" : : : "memory");

For the second part, you need a processor memory barrier, i.e. to put a stbar
instruction if you're running PSO, plus an atomic instruction that is the only
memory barrier available in V8 for TSO.
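
For comparison, a sketch of the same lock written with C11 atomics. This is
not what the thread uses: it assumes a compiler that provides <stdatomic.h>
(the GCC 4.6.1 from the report predates it). Acquire/release ordering on the
flag asks the implementation for both things listed above, the compiler-level
constraint and whatever processor barrier the target needs. Names follow the
original report.

```c
#include <stdatomic.h>

static atomic_flag remap_lock = ATOMIC_FLAG_INIT;
static int remap_barrier;

static inline void spinlock_lock(atomic_flag *lock)
{
    /* Acquire: later accesses may not move above a successful
       test-and-set. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static inline void spinlock_unlock(atomic_flag *lock)
{
    /* Release: earlier accesses may not move below this store. */
    atomic_flag_clear_explicit(lock, memory_order_release);
}

void inc_remap_barrier(void)
{
    spinlock_lock(&remap_lock);
    remap_barrier++;
    spinlock_unlock(&remap_lock);
}
```

The point of the design is that the ordering requirement is stated on the
lock operations themselves, so no separate barrier statement can be forgotten
in the unlock path.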

* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: tanzhangxi at gmail dot com @ 2011-08-14 22:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #8 from Zhangxi Tan <tanzhangxi at gmail dot com> 2011-08-14 21:00:40 UTC ---
Thanks for the clear explanation.
I agree that a memory barrier would solve this issue.

Regarding the spinlock_unlock in Linux, the regular arch_spin_unlock is
implemented with a single inline assembly statement. That prevents the memory
reordering in C. However, for the 32-bit port the arch_write_unlock is still
defined as follows, without a memory barrier, in
arch/sparc/include/asm/spinlock_32.h:

#define arch_write_unlock(rw)   do { (rw)->lock = 0; } while(0)

OTOH, the 64-bit implementation is OK. Or did I miss something here?
Anyway, I think this is a separate issue from this thread.

(In reply to comment #6)
> > The code is equivalent to
> > 
> > volatile unsigned char lock;
> > int remap_barrier;
> > 
> > while (atomic_test_and_set(lock)) {
> >    while (lock) {
> >      ;
> >    }
> > }
> > remap_barrier++;
> > lock = 0;
> > 
> > Eric: could you let me know why you think the code inside function
> > spinlock_lock(&remap_lock) is a NOP?
> 
> I don't, you simply misquoted, I wrote "the end of the code".  The first part
> of the spinlock implementation is correct, in particular you have the required
> memory barrier in spinlock_is_locked.  The second part is not correct, as you
> don't have the memory barrier in spinlock_unlock.
> 
> > Also, the arch_write_lock/unlock in the SPARC port of Linux uses a very 
> > similar implementation.
> 
> No, it precisely doesn't, it has the memory barrier in spinlock_unlock.


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: ebotcazou at gcc dot gnu.org @ 2011-08-15  8:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #9 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-08-15 08:47:08 UTC ---
> Regarding the spinlock_unlock in Linux, the regular arch_spin_unlock is
> implemented with a single inline assembly statement. That prevents the memory
> reordering in C. However, for the 32-bit port the arch_write_unlock is still
> defined as follows, without a memory barrier, in
> arch/sparc/include/asm/spinlock_32.h:
> 
> #define arch_write_unlock(rw)   do { (rw)->lock = 0; } while(0)
> 
> OTOH, the 64-bit implementation is OK. Or did I miss something here?
> Anyway, I think this is a separate issue from this thread.

The discrepancy is a little surprising, indeed.


* [Bug rtl-optimization/50065] -Os, -O2, -O3 optimization breaks LD/ST ordering on 32-bit SPARC
From: mikpe at it dot uu.se @ 2011-08-16  7:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065

--- Comment #10 from Mikael Pettersson <mikpe at it dot uu.se> 2011-08-16 07:24:48 UTC ---
(In reply to comment #9)
> > Regarding the spinlock_unlock in Linux, the regular arch_spin_unlock is
> > implemented with a single inline assembly statement. That prevents the memory
> > reordering in C. However, for the 32-bit port the arch_write_unlock is still
> > defined as follows, without a memory barrier, in
> > arch/sparc/include/asm/spinlock_32.h:
> > 
> > #define arch_write_unlock(rw)   do { (rw)->lock = 0; } while(0)
> > 
> > OTOH, the 64-bit implementation is OK. Or did I miss something here?
> > Anyway, I think this is a separate issue from this thread.
> 
> The discrepancy is a little surprising, indeed.

That was a bug in the sparc32 Linux kernel.  I sent a patch yesterday to fix
it.

