public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures
@ 2010-02-17 15:07 mischa dot jonker at viragelogic dot com
2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 15:07 UTC (permalink / raw)
To: glibc-bugs
I've encountered the following issue while heavily using glibc's
sem_wait and sem_post simultaneously in a lot of threads: it seems
that the sem_post and sem_wait functions can enter an endless loop in
some corner cases. After diving into the code, it seems that (for the
case of sem_post, nptl/sysdeps/unix/sysv/linux/sem_post.c) the
following code is causing the problem:
int
__new_sem_post (sem_t *sem)
{
struct new_sem *isem = (struct new_sem *) sem;
__typeof (isem->value) cur;
do
{
cur = isem->value;
if (isem->value == SEM_VALUE_MAX)
{
__set_errno (EOVERFLOW);
return -1;
}
}
while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1, cur));
The problem occurs because gcc is optimizing this piece of code, and
puts 'isem->value' in a register before the while loop actually
begins. The atomic_compare_and_exchange_bool_acq macro then compares
'cur' (from the register, which is never updated in the loop) with
'isem->value' (from memory, which could be updated by another thread).
When we have multiple threads running and using the same semaphore,
the isem->value might be updated by another thread after the register
is filled with the value from memory, and then we run into an endless
loop...
I was able to fix this (and more issues of the same kind in the other
semaphore functions) by adding a 'volatile' keyword:
int
__new_sem_post (sem_t *sem)
{
struct new_sem volatile *isem = (struct new_sem *) sem;
===8<===8<===8<=== Additional info
glibc 2.9, but still present in git
kernel Linux 2.6.28.10 Wed Feb 17 13:32:34 CET 2010 mips unknown
$ mipsel-linux-gcc -v
Using built-in specs.
Target: mipsel-linux
Configured with: --with-gnu-ld --enable-shared --enable-target-optspace --
enable-languages=c,c++,objc --enable-threads=posix --enable-multilib --enable-
c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-
prefix=mipsel-linux- --enable-libssp --disable-bootstrap --enable-libgomp --
disable-libmudflap --disable-libunwind-exceptions --enable-libssp --enable-
libgomp --disable-libmudflap --enable-__cxa_atexit
Thread model: posix
gcc version 4.2.4
$ mipsel-linux-ld -v
GNU ld (Linux/GNU Binutils) 2.18.50.0.7.20080502
===8<===8<===8<=== Patch that solved the problem for me:
diff --git a/nptl/sysdeps/unix/sysv/linux/sem_post.c
b/nptl/sysdeps/unix/sysv/linux/sem_post.c
index 58b226f..0d4a6d6 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_post.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_post.c
@@ -29,7 +29,7 @@
int
__new_sem_post (sem_t *sem)
{
- struct new_sem *isem = (struct new_sem *) sem;
+ struct new_sem volatile *isem = (struct new_sem *) sem;
__typeof (isem->value) cur;
do
@@ -64,7 +64,7 @@ int
attribute_compat_text_section
__old_sem_post (sem_t *sem)
{
- int *futex = (int *) sem;
+ int volatile *futex = (int *) sem;
int nr = atomic_increment_val (futex);
/* We always have to assume it is a shared semaphore. */
diff --git a/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
b/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
index fdf0d74..a241ad6 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_timedwait.c
@@ -34,7 +34,7 @@ extern void __sem_wait_cleanup (void *arg) attribute_hidden;
int
sem_timedwait (sem_t *sem, const struct timespec *abstime)
{
- struct new_sem *isem = (struct new_sem *) sem;
+ struct new_sem volatile *isem = (struct new_sem *) sem;
int err;
if (atomic_decrement_if_positive (&isem->value) > 0)
diff --git a/nptl/sysdeps/unix/sysv/linux/sem_trywait.c
b/nptl/sysdeps/unix/sysv/linux/sem_trywait.c
index f500361..74e1170 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_trywait.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_trywait.c
@@ -30,7 +30,7 @@
int
__new_sem_trywait (sem_t *sem)
{
- int *futex = (int *) sem;
+ int volatile *futex = (int *) sem;
int val;
if (*futex > 0)
diff --git a/nptl/sysdeps/unix/sysv/linux/sem_wait.c
b/nptl/sysdeps/unix/sysv/linux/sem_wait.c
index 20e2b48..5601e1a 100644
--- a/nptl/sysdeps/unix/sysv/linux/sem_wait.c
+++ b/nptl/sysdeps/unix/sysv/linux/sem_wait.c
@@ -32,7 +32,7 @@ void
attribute_hidden
__sem_wait_cleanup (void *arg)
{
- struct new_sem *isem = (struct new_sem *) arg;
+ struct new_sem volatile *isem = (struct new_sem *) arg;
atomic_decrement (&isem->nwaiters);
}
@@ -41,7 +41,7 @@ __sem_wait_cleanup (void *arg)
int
__new_sem_wait (sem_t *sem)
{
- struct new_sem *isem = (struct new_sem *) sem;
+ struct new_sem volatile *isem = (struct new_sem *) sem;
int err;
if (atomic_decrement_if_positive (&isem->value) > 0)
@@ -90,7 +90,7 @@ int
attribute_compat_text_section
__old_sem_wait (sem_t *sem)
{
- int *futex = (int *) sem;
+ int volatile *futex = (int *) sem;
int err;
do
--
Summary: potential deadlock in sem_*wait and sem_post for MIPS
architectures
Product: glibc
Version: 2.9
Status: NEW
Severity: normal
Priority: P2
Component: nptl
AssignedTo: drepper at redhat dot com
ReportedBy: mischa dot jonker at viragelogic dot com
CC: glibc-bugs at sources dot redhat dot com,mischa dot
jonker at viragelogic dot com
GCC build triplet: i386-linux-gnu
GCC host triplet: mipsel-linux-gnu
GCC target triplet: mipsel-linux-gnu
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug nptl/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
@ 2010-02-17 15:08 ` mischa dot jonker at viragelogic dot com
2010-02-17 15:19 ` [Bug ports/11291] " jakub at redhat dot com
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 15:08 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From mischa dot jonker at viragelogic dot com 2010-02-17 15:07 -------
Created an attachment (id=4606)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=4606&action=view)
Patch that fixes the problem for me
--
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
@ 2010-02-17 15:19 ` jakub at redhat dot com
2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at redhat dot com @ 2010-02-17 15:19 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From jakub at redhat dot com 2010-02-17 15:19 -------
The patch is wrong. Most likely just mips have wrong definition of the
atomic_compare_and_swap_bool macros where it doesn't tell the compiler that *mem
actually changes (missing "=m" (*mem) ? ).
--
What |Removed |Added
----------------------------------------------------------------------------
Component|nptl |ports
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
2010-02-17 15:19 ` [Bug ports/11291] " jakub at redhat dot com
@ 2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
2010-02-17 16:59 ` schwab at linux-m68k dot org
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 16:37 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From mischa dot jonker at viragelogic dot com 2010-02-17 16:37 -------
It looks like the problem is not in the macro, the "m" (*mem) is actually there:
#define __arch_compare_and_exchange_xxx_32_int(mem, newval, oldval, rel, acq) \
__asm__ __volatile__ ( \
".set push\n\t" \
MIPS_PUSH_MIPS2 \
rel "\n" \
"1:\t" \
"ll %0,%4\n\t" \
"move %1,$0\n\t" \
"bne %0,%2,2f\n\t" \
"move %1,%3\n\t" \
"sc %1,%4\n\t" \
"beqz %1,1b\n" \
acq "\n\t" \
".set pop\n" \
"2:\n\t" \
: "=&r" (__prev), "=&r" (__cmp) \
: "r" (oldval), "r" (newval), "m" (*mem) \
: "memory")
If we look at the disassembled output of gcc, we see that in below loop (inside __new_sem_post), the "cur = isem->value"
is taken out of the loop, as isem->value does not seem to change.
The relevant code from sem_post:
do
{
cur = isem->value; // <-- taken out of the loop
if (isem->value == SEM_VALUE_MAX) //<--- taken out of the loop as well
{
__set_errno (EOVERFLOW);
return -1;
}
}
while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1, cur));
And here the disassembly, with explanation:
0000e530 <sem_post>:
e530: 3c1c0002 lui gp,0x2
e534: 279ceb00 addiu gp,gp,-5376
e538: 0399e021 addu gp,gp,t9
e53c: 8c860000 lw a2,0(a0) // a2 (=cur) = isem->value
e540: 3c037fff lui v1,0x7fff
e544: 3462ffff ori v0,v1,0xffff // v0 = SEM_VALUE_MAX
e548: 14c20007 bne a2,v0,e568 <sem_post+0x38> //check for overflow
e54c: 24c50001 addiu a1,a2,1 // a1 = cur + 1 (note: branch delay slot!!)
<-- code to return error in case of overflow
e550: 8f84833c lw a0,-31940(gp)
e554: 7c03e83b rdhwr v1,$29
e558: 00831021 addu v0,a0,v1
e55c: 2404ffff li a0,-1
e560: 10000021 b e5e8 <sem_post+0xb8>
e564: 2403004f li v1,79 -->
// Execution continues here after overflow check. Please note that the ll/sc combination
// is used, which is in fact atomic. Furthermore, it fetches 0(a0) again, which is isem->value.
e568: c0830000 ll v1,0(a0)
// And here is the problem!! It compares the just fetched value with a2, which is initialised
// at the beginning of the loop, but never loaded again. The fact that 'll' is used for the load
// above (instead of lw), makes me believe that the atomic_compare_and_exchange_bool_acq macro
// is in fact correct (the ll belongs to the macro, since it's a special instruction for
// atomic operations and not generated by gcc).
e56c: 14660006 bne v1,a2,e588 <sem_post+0x58>
e570: 00003821 move a3,zero
e574: 00a03821 move a3,a1
e578: e0870000 sc a3,0(a0)
e57c: 10e0fffa beqz a3,e568 <sem_post+0x38>
e580: 00000000 nop
e584: 0000000f sync
....
--
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
` (2 preceding siblings ...)
2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
@ 2010-02-17 16:59 ` schwab at linux-m68k dot org
2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: schwab at linux-m68k dot org @ 2010-02-17 16:59 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From schwab at linux-m68k dot org 2010-02-17 16:59 -------
"m" (*mem) should be "+m" (*mem) since *mem is rw.
--
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
` (3 preceding siblings ...)
2010-02-17 16:59 ` schwab at linux-m68k dot org
@ 2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
2010-02-17 19:14 ` jakub at redhat dot com
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-17 17:56 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From mischa dot jonker at viragelogic dot com 2010-02-17 17:56 -------
Indeed, the + is missing. This could cause statements after the macro to read a
value from before the macro (as the compiler doesn't expect the macro to change it
without the +, and therefore might use some old value stored in a register).
But this is all within the same thread/instance: it doesn't fix the problem. What
if a different thread changes isem->value?? The only way to tell the compiler
(that I know of I must say;-)) is by means of the 'volatile' keyword.
--
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
` (4 preceding siblings ...)
2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
@ 2010-02-17 19:14 ` jakub at redhat dot com
2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at redhat dot com @ 2010-02-17 19:14 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From jakub at redhat dot com 2010-02-17 19:14 -------
"+m" and "=m"/"0" is not recommended, you should really use "=m" (*mem) ... "m"
(*mem).
And all you need to fix is exactly this, the atomic_* insn takes care of other
threads accessing the same field.
Even volatile wouldn't change anything. You just load the value the field has
at some point in the current thread, compute the new value you want to use and then
atomic_* will succeed only if the memory has the same value as it had when it
has been loaded.
--
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
` (5 preceding siblings ...)
2010-02-17 19:14 ` jakub at redhat dot com
@ 2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org
8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-18 12:18 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From mischa dot jonker at viragelogic dot com 2010-02-18 12:18 -------
I understand that adding 'volatile' does not change anything to the behaviour of the macro, and that the atomic
instructions take care of other threads _while inside the atomic_compare_and_exchange_bool_acq macro_. The problem
is, that it doesn't take care of concurrency in the instructions before that.
without volatile the compiler optimizes the code to:
cur = isem->value;
if (isem->value == SEM_VALUE_MAX)
{
__set_errno (EOVERFLOW);
return -1;
}
// if we get a context switch here, and someone else increases isem->value,
// we get a deadlock, because cur contains isem->value+1, and
// atomic_compare_and_exchange_bool_acq can never succeed.
do
{
}
while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1, cur));
So, while it doesn't alter the behaviour of atomic_compare_and_exchange_bool_acq, it does put the 'cur = isem->value'
statement inside the while loop. However, changing the patch as proposed does this as well (although for different
reasons: now the compiler is aware that the macro itself (instead of other threads) can change isem->value).
--
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
` (6 preceding siblings ...)
2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
@ 2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org
8 siblings, 0 replies; 10+ messages in thread
From: mischa dot jonker at viragelogic dot com @ 2010-02-18 12:19 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From mischa dot jonker at viragelogic dot com 2010-02-18 12:19 -------
Created an attachment (id=4608)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=4608&action=view)
Adds *mem as an output to atomic macro's for MIPS
--
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug ports/11291] potential deadlock in sem_*wait and sem_post for MIPS architectures
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
` (7 preceding siblings ...)
2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
@ 2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org
8 siblings, 0 replies; 10+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2010-03-23 15:04 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From jsm28 at gcc dot gnu dot org 2010-03-23 15:04 -------
Thanks, I've applied the patch to the atomic macros.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
http://sourceware.org/bugzilla/show_bug.cgi?id=11291
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2010-03-23 15:04 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-17 15:07 [Bug nptl/11291] New: potential deadlock in sem_*wait and sem_post for MIPS architectures mischa dot jonker at viragelogic dot com
2010-02-17 15:08 ` [Bug nptl/11291] " mischa dot jonker at viragelogic dot com
2010-02-17 15:19 ` [Bug ports/11291] " jakub at redhat dot com
2010-02-17 16:37 ` mischa dot jonker at viragelogic dot com
2010-02-17 16:59 ` schwab at linux-m68k dot org
2010-02-17 17:56 ` mischa dot jonker at viragelogic dot com
2010-02-17 19:14 ` jakub at redhat dot com
2010-02-18 12:18 ` mischa dot jonker at viragelogic dot com
2010-02-18 12:19 ` mischa dot jonker at viragelogic dot com
2010-03-23 15:04 ` jsm28 at gcc dot gnu dot org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).