[Bug nptl/11291] New: potential deadlock in sem_*wait and sem

public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures
@ 2010-02-17 15:07 mischa dot jonker at viragelogic dot com
  2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 15:07 UTC (permalink / raw)
  To: glibc-bugs

I've encountered the following issue while heavily using glibc's
sem_wait and sem_post simultaneously in a lot of threads: it seems
that the sem_post and sem_wait functions can enter an endless loop in
some corner cases. After diving into the code, it seems that (for the
case of sem_post, nptl/sysdeps/unix/sysv/linux/sem_post.c) the
following code is causing the problem:

int
__new_sem_post (sem_t *sem)
{
 struct new_sem *isem = (struct new_sem *) sem;

 __typeof (isem->value) cur;
 do
   {
     cur = isem->value;
     if (isem->value == SEM_VALUE_MAX)
       {
         __set_errno (EOVERFLOW);
         return -1;
       }
   }
 while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1, cur));


The problem occurs because gcc is optimizing this piece of code, and
puts 'isem->value' in a register before the while loop actually
begins. The atomic_compare_and_exchange_bool_acq macro then compares
'cur' (from the register, which is never updated in the loop) with
'isem->value' (from memory, which could be updated by another thread).
When we have multiple threads running and using the same semaphore,
the isem->value might be updated by another thread after the register
is filled with the value from memory, and then we run into an endless
loop...

I was able to fix this (and more issues of the same kind in the other
semaphore functions) by adding a 'volatile' keyword:

int
__new_sem_post (sem_t *sem)
{
 struct new_sem volatile *isem = (struct new_sem *) sem;

===8<===8<===8<=== Additional info
glibc 2.9, but still present in git

kernel Linux 2.6.28.10  Wed Feb 17 13:32:34 CET 2010 mips unknown

$ mipsel-linux-gcc -v
Using built-in specs.
Target: mipsel-linux
Configured with: --with-gnu-ld --enable-shared --enable-target-optspace --
enable-languages=c,c++,objc --enable-threads=posix --enable-multilib --enable-
c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-
prefix=mipsel-linux- --enable-libssp --disable-bootstrap --enable-libgomp --
disable-libmudflap --disable-libunwind-exceptions --enable-libssp --enable-
libgomp --disable-libmudflap --enable-__cxa_atexit
Thread model: posix
gcc version 4.2.4

$ mipsel-linux-ld -v
GNU ld (Linux/GNU Binutils) 2.18.50.0.7.20080502

===8<===8<===8<=== Patch that solved the problem for me:

diff --git a/nptl/sysdeps/unix/sysv/linux/sem_post.c 
b/nptl/sysdeps/unix/sysv/linux/sem_post.c
index 58b226f..0d4a6d6 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_post.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_post.c
@@ -29,7 +29,7 @@
 int
 __new_sem_post (sem_t *sem)
 {
-  struct new_sem *isem = (struct new_sem *) sem;
+  struct new_sem volatile *isem = (struct new_sem *) sem;
 
   __typeof (isem->value) cur;
   do
@@ -64,7 +64,7 @@ int
 attribute_compat_text_section
 __old_sem_post (sem_t *sem)
 {
-  int *futex = (int *) sem;
+  int volatile *futex = (int *) sem;
 
   int nr = atomic_increment_val (futex);
   /* We always have to assume it is a shared semaphore.  */
diff --git a/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c 
b/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
index fdf0d74..a241ad6 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
@@ -34,7 +34,7 @@ extern void __sem_wait_cleanup (void *arg) attribute_hidden;
 int
 sem_timedwait (sem_t *sem, const struct timespec *abstime)
 {
-  struct new_sem *isem = (struct new_sem *) sem;
+  struct new_sem volatile *isem = (struct new_sem *) sem;
   int err;
 
   if (atomic_decrement_if_positive (&isem->value) > 0)
diff --git a/nptl/sysdeps/unix/sysv/linux/sem_trywait.c 
b/nptl/sysdeps/unix/sysv/linux/sem_trywait.c
index f500361..74e1170 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_trywait.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_trywait.c
@@ -30,7 +30,7 @@
 int
 __new_sem_trywait (sem_t *sem)
 {
-  int *futex = (int *) sem;
+  int volatile *futex = (int *) sem;
   int val;
 
   if (*futex > 0)
diff --git a/nptl/sysdeps/unix/sysv/linux/sem_wait.c 
b/nptl/sysdeps/unix/sysv/linux/sem_wait.c
index 20e2b48..5601e1a 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_wait.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_wait.c
@@ -32,7 +32,7 @@ void
 attribute_hidden
 __sem_wait_cleanup (void *arg)
 {
-  struct new_sem *isem = (struct new_sem *) arg;
+  struct new_sem volatile *isem = (struct new_sem *) arg;
 
   atomic_decrement (&isem->nwaiters);
 }
@@ -41,7 +41,7 @@ __sem_wait_cleanup (void *arg)
 int
 __new_sem_wait (sem_t *sem)
 {
-  struct new_sem *isem = (struct new_sem *) sem;
+  struct new_sem volatile *isem = (struct new_sem *) sem;
   int err;
 
   if (atomic_decrement_if_positive (&isem->value) > 0)
@@ -90,7 +90,7 @@ int
 attribute_compat_text_section
 __old_sem_wait (sem_t *sem)
 {
-  int *futex = (int *) sem;
+  int volatile *futex = (int *) sem;
   int err;
 
   do

-- 
           Summary: potential deadlock in sem_*wait and sem_post for MIPS
                    architectures
           Product: glibc
           Version: 2.9
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: mischa dot jonker at viragelogic dot com
                CC: glibc-bugs at sources dot redhat dot com,mischa dot
                    jonker at viragelogic dot com
 GCC build triplet: i386-linux-gnu
  GCC host triplet: mipsel-linux-gnu
GCC target triplet: mipsel-linux-gnu


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug nptl/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
@ 2010-02-17 15:08 ` mischa dot jonker at viragelogic dot com
  2010-02-17 15:19 ` [Bug ports/11291] " jakub at redhat dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 15:08 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From mischa dot jonker at viragelogic dot com  2010-02-17 15:07 -------
Created an attachment (id=4606)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=4606&action=view)
Patch that fixes the problem for me


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
  2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
@ 2010-02-17 15:19 ` jakub at redhat dot com
  2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at redhat dot com @ 2010-02-17 15:19 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From jakub at redhat dot com  2010-02-17 15:19 -------
The patch is wrong.  Most likely just mips have wrong definition of the
atomic_compare_and_swap_bool macros where it doesn't tell the compiler that *mem
actually changes (missing "=m" (*mem) ? ).

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|nptl                        |ports


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
  2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
  2010-02-17 15:19 ` [Bug ports/11291] " jakub at redhat dot com
@ 2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
  2010-02-17 16:59 ` schwab at linux-m68k dot org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 16:37 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From mischa dot jonker at viragelogic dot com  2010-02-17 16:37 -------
It looks like the problem is not in the macro, the "m" (*mem) is actually there:

#define __arch_compare_and_exchange_xxx_32_int(mem, newval, oldval, rel, acq) \
     __asm__ __volatile__ (                                                   \
     ".set      push\n\t"                                                     \
     MIPS_PUSH_MIPS2                                                          \
     rel        "\n"                                                          \
     "1:\t"                                                                   \
     "ll        %0,%4\n\t"                                                    \
     "move      %1,$0\n\t"                                                    \
     "bne       %0,%2,2f\n\t"                                                 \
     "move      %1,%3\n\t"                                                    \
     "sc        %1,%4\n\t"                                                    \
     "beqz      %1,1b\n"                                                      \
     acq        "\n\t"                                                        \
     ".set      pop\n"                                                        \
     "2:\n\t"                                                                 \
              : "=&r" (__prev), "=&r" (__cmp)                                 \
              : "r" (oldval), "r" (newval), "m" (*mem)                        \
              : "memory")


If we look at the disassembled output of gcc, we see that in below loop (inside __new_sem_post), the "cur = isem->value" 
is taken out of the loop, as isem->value does not seem to change. 

The relevant code from sem_post:
 do
    {
      cur = isem->value; // <-- taken out of the loop
      if (isem->value == SEM_VALUE_MAX) //<--- taken out of the loop as well
        {
          __set_errno (EOVERFLOW);
          return -1;
        }
    }
  while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1, cur));

And here the disassembly, with explanation:
0000e530 <sem_post>:
    e530:       3c1c0002        lui     gp,0x2
    e534:       279ceb00        addiu   gp,gp,-5376
    e538:       0399e021        addu    gp,gp,t9
    e53c:       8c860000        lw      a2,0(a0)        // a2 (=cur) = isem->value
    e540:       3c037fff        lui     v1,0x7fff
    e544:       3462ffff        ori     v0,v1,0xffff    // v0 = SEM_VALUE_MAX
    e548:       14c20007        bne     a2,v0,e568 <sem_post+0x38>  //check for overflow
    e54c:       24c50001        addiu   a1,a2,1         // a1 = cur + 1 (note: branch delay slot!!)

<-- code to return error in case of overflow
    e550:       8f84833c        lw      a0,-31940(gp)
    e554:       7c03e83b        rdhwr   v1,$29
    e558:       00831021        addu    v0,a0,v1
    e55c:       2404ffff        li      a0,-1
    e560:       10000021        b       e5e8 <sem_post+0xb8>
    e564:       2403004f        li      v1,79 -->

// Execution continues here after overflow check. Please note that the ll/sc combination
// is used, which is in fact atomic. Furthermore, it fetches 0(a0) again, which is isem->value.
    e568:       c0830000        ll      v1,0(a0)        

// And here is the problem!! It compares the just fetched value with a2, which is initialised
// at the beginning of the loop, but never loaded again. The fact that 'll' is used for the load
// above (instead of lw), makes me believe that the atomic_compare_and_exchange_bool_acq macro
// is in fact correct (the ll belongs to the macro, since it's a special instruction for
// atomic operations and not generated by gcc). 
    e56c:       14660006        bne     v1,a2,e588 <sem_post+0x58>
    e570:       00003821        move    a3,zero
    e574:       00a03821        move    a3,a1
    e578:       e0870000        sc      a3,0(a0)
    e57c:       10e0fffa        beqz    a3,e568 <sem_post+0x38>
    e580:       00000000        nop
    e584:       0000000f        sync
    ....



-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
                   ` (2 preceding siblings ...)
  2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
@ 2010-02-17 16:59 ` schwab at linux-m68k dot org
  2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: schwab at linux-m68k dot org @ 2010-02-17 16:59 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From schwab at linux-m68k dot org  2010-02-17 16:59 -------
"m" (*mem) should be "+m" (*mem) since *mem is rw.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
                   ` (3 preceding siblings ...)
  2010-02-17 16:59 ` schwab at linux-m68k dot org
@ 2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
  2010-02-17 19:14 ` jakub at redhat dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 17:56 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From mischa dot jonker at viragelogic dot com  2010-02-17 17:56 -------
Indeed, the + is missing. This could cause statements after the macro to read a 
value from before the macro (as the compiler doesn't expect the macro to change it 
without the +, and therefore might use some old value stored in a register). 

But this is all within the same thread/instance: it doesn't fix the problem. What 
if a different thread changes isem->value?? The only way to tell the compiler 
(that I know of I must say;-)) is by means of the 'volatile' keyword.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
                   ` (4 preceding siblings ...)
  2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
@ 2010-02-17 19:14 ` jakub at redhat dot com
  2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at redhat dot com @ 2010-02-17 19:14 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From jakub at redhat dot com  2010-02-17 19:14 -------
"+m" and "=m"/"0" is not recommended, you should really use "=m" (*mem) ... "m"
(*mem).

And all you need to fix is exactly this, the atomic_* insn takes care of other
threads accessing the same field.
Even volatile wouldn't change anything.  You just load the value the field has
at some point in the current thread, compute the new value you want to use and then
atomic_* will succeed only if the memory has the same value as it had when it
has been loaded.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
                   ` (5 preceding siblings ...)
  2010-02-17 19:14 ` jakub at redhat dot com
@ 2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
  2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
  2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org
  8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-18 12:18 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From mischa dot jonker at viragelogic dot com  2010-02-18 12:18 -------
I understand that adding 'volatile' does not change anything to the behaviour of the macro, and that the atomic 
instructions take care of other threads _while inside the atomic_compare_and_exchange_bool_acq macro_. The problem 
is, that it doesn't take care of concurrency in the instructions before that.

without volatile the compiler optimizes the code to:
  cur = isem->value;
  if (isem->value == SEM_VALUE_MAX)
   {
     __set_errno (EOVERFLOW);
     return -1;
   }
// if we get a context switch here, and someone else increases isem->value, 
// we get a deadlock, because cur contains isem->value+1, and
// atomic_compare_and_exchange_bool_acq can never succeed.
  do
   {
   }
 while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1, cur));


So, while it doesn't alter the behaviour of atomic_compare_and_exchange_bool_acq, it does put the 'cur = isem->value' 
statement inside the while loop. However, changing the patch as proposed does this as well (although for different 
reasons: now the compiler is aware that the macro itself (instead of other threads) can change isem->value).


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
                   ` (6 preceding siblings ...)
  2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
@ 2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
  2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org
  8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-18 12:19 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From mischa dot jonker at viragelogic dot com  2010-02-18 12:19 -------
Created an attachment (id=4608)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=4608&action=view)
Adds *mem as an output to atomic macro's for MIPS


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
  2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
                   ` (7 preceding siblings ...)
  2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
@ 2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org
  8 siblings, 0 replies; 10+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2010-03-23 15:04 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From jsm28 at gcc dot gnu dot org  2010-03-23 15:04 -------
Thanks, I've applied the patch to the atomic macros.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://sourceware.org/bugzilla/show_bug.cgi?id=11291

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-03-23 15:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
2010-02-17 15:19 ` [Bug ports/11291] " jakub at redhat dot com
2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
2010-02-17 16:59 ` schwab at linux-m68k dot org
2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
2010-02-17 19:14 ` jakub at redhat dot com
2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).