public inbox for glibc-bugs@sourceware.org
* [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state.
@ 2007-11-01 17:03 rsa at us dot ibm dot com
  2007-11-01 17:05 ` [Bug libc/5240] " rsa at us dot ibm dot com
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: rsa at us dot ibm dot com @ 2007-11-01 17:03 UTC (permalink / raw)
  To: glibc-bugs

A customer identified a potential race condition in
nptl/sysdeps/unix/sysv/linux/lowlevellock.c (__lll_timedlock_wait) which causes
waiting threads not to be woken up under certain circumstances.

Reproduced on:
SMP PowerPC 970
SMP POWER6 (with SMT)
SMP POWER5 (with SMT)
Uni PowerPC 440

Not reproduced on:
Uni Intel Pentium M.
SMP Intel Core 2 Duo.

Consider three threads: "A" holding a lock, "B" blocked in a timed
wait on the same lock, and "C" also blocked on that lock. The value of
the futex is 2 (locked, with waiters).  Then:

- "A" releases the lock, setting the futex value to 0 and waking up
  "B".
- Before "B" performs any further action, "A" continues to execute and
  acquires the lock again, setting the futex value to 1.
- "B" checks the while condition in __lll_timedlock_wait:
  while (atomic_compare_and_exchange_bool_acq (futex, 2, 0) != 0);
  The exchange fails because the futex value is 1 rather than 0, so the
  condition is true and "B" iterates the do-while loop again.
- "B" hits the timeout and returns ETIMEDOUT.
- "A" releases the lock, changing the futex value from 1 to 0. Because
  the value was 1 rather than 2, no wakeup is issued.

At the end, "C" is left waiting, and the futex value is 0.
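The interleaving above can be replayed in a small single-threaded model.
This is only a sketch: struct toy_lock and the toy_* helpers are
hypothetical names, not glibc code, and the model assumes the convention
implied by the steps above (0 = unlocked, 1 = locked with no waiters,
2 = locked with possible waiters, wakeup issued only when unlocking from 2).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the lock word, not glibc code.
   0 = unlocked, 1 = locked with no waiters, 2 = locked with possible waiters. */
struct toy_lock {
    int futex;     /* value of the lock word */
    int sleepers;  /* threads still blocked waiting on the word */
};

/* Unlock: store 0, and wake one sleeper only if the old value was 2. */
static void toy_unlock(struct toy_lock *l)
{
    int old = l->futex;
    l->futex = 0;
    if (old == 2 && l->sleepers > 0)
        l->sleepers--;
}

/* Compare-and-swap on the lock word; returns true if the swap happened. */
static bool toy_cas(struct toy_lock *l, int expected, int desired)
{
    if (l->futex != expected)
        return false;
    l->futex = desired;
    return true;
}

/* Replays the interleaving; returns the number of threads left asleep. */
static int toy_replay(struct toy_lock *l)
{
    l->futex = 2;                       /* "A" holds the lock */
    l->sleepers = 2;                    /* "B" and "C" are blocked */

    toy_unlock(l);                      /* "A" unlocks: word 0, "B" is woken */
    bool a_relocked = toy_cas(l, 0, 1); /* "A" reacquires: word 1 */
    assert(a_relocked);
    bool b_locked = toy_cas(l, 0, 2);   /* "B" retries 0 -> 2 and fails */
    assert(!b_locked);
    /* "B" times out here and returns ETIMEDOUT without touching the word. */
    toy_unlock(l);                      /* "A" unlocks: 1 -> 0, nobody is woken */

    return l->sleepers;                 /* "C" is still asleep */
}
```

In this model the replay ends with the word at 0 and one sleeper left,
which matches the state described above.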

Testcase forthcoming...

-- 
           Summary: Pthread hang where there are still waiters when mutex is
                    in "unlocked" state.
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: critical
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: rsa at us dot ibm dot com
                CC: glibc-bugs at sources dot redhat dot com,rsa at us dot
                    ibm dot com
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux


http://sourceware.org/bugzilla/show_bug.cgi?id=5240

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug libc/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
@ 2007-11-01 17:05 ` rsa at us dot ibm dot com
  2007-11-01 17:55 ` rsa at us dot ibm dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rsa at us dot ibm dot com @ 2007-11-01 17:05 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From rsa at us dot ibm dot com  2007-11-01 17:05 -------
Created an attachment (id=2070)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=2070&action=view)
pthread hang

Associated test-case to reproduce on PowerPC hardware.


* [Bug libc/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
  2007-11-01 17:05 ` [Bug libc/5240] " rsa at us dot ibm dot com
@ 2007-11-01 17:55 ` rsa at us dot ibm dot com
  2007-11-01 19:48 ` [Bug nptl/5240] " rsa at us dot ibm dot com
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rsa at us dot ibm dot com @ 2007-11-01 17:55 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From rsa at us dot ibm dot com  2007-11-01 17:55 -------
Created an attachment (id=2071)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=2071&action=view)
patch to avoid the hang by awakening waiters before returning TIMEOUT.

The attached patch ensures that waiters are woken before the timeout is
returned, while avoiding an unnecessary system call in the usual timeout
case.

A simpler solution if we don't care about the system call cost would be to
unconditionally invoke lll_futex_wake before returning.
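As a rough illustration of the simpler variant, the sketch below replays
the reported interleaving in a single-threaded model, with and without a
wake issued on the timeout path. It is not the attached patch. The names
struct lockmodel and lm_* are hypothetical, lm_wake_one only stands in for
lll_futex_wake, and the model assumes that a woken waiter which finds the
lock still held marks the word as 2 before sleeping again.

```c
#include <assert.h>

/* Toy model, not glibc code.
   word: 0 = unlocked, 1 = locked with no waiters, 2 = locked with waiters. */
struct lockmodel {
    int word;      /* the lock word */
    int sleepers;  /* threads blocked waiting on the word */
};

/* Stand-in for a futex wake of one waiter. */
static void lm_wake_one(struct lockmodel *m)
{
    if (m->sleepers > 0)
        m->sleepers--;
}

/* Unlock: store 0 and wake one waiter only if the old value was 2. */
static void lm_unlock(struct lockmodel *m)
{
    int old = m->word;
    m->word = 0;
    if (old == 2)
        lm_wake_one(m);
}

/* A woken waiter retries. Assumption of this model: if the lock is still
   held, it marks the word as 2 and goes back to sleep. */
static void lm_waiter_retry(struct lockmodel *m)
{
    if (m->word == 0) {
        m->word = 2;        /* lock acquired */
        return;
    }
    m->word = 2;            /* record that a waiter exists */
    m->sleepers++;          /* sleep again */
}

/* Replays the reported interleaving; returns the sleepers left at the end. */
static int lm_replay(int wake_on_timeout)
{
    struct lockmodel m = { 2, 2 };  /* "A" holds; "B" and "C" are asleep */

    lm_unlock(&m);                  /* "A" unlocks: word 0, "B" is woken */
    assert(m.word == 0);
    m.word = 1;                     /* "A" reacquires on the fast path */

    /* "B" fails its 0 -> 2 exchange and then times out. */
    if (wake_on_timeout) {
        lm_wake_one(&m);            /* "B" wakes "C" before ETIMEDOUT */
        lm_waiter_retry(&m);        /* "C" sees the lock held: word 2 */
    }

    lm_unlock(&m);                  /* "A" unlocks; wakes only from 2 */
    return m.sleepers;
}
```

In the model, lm_replay(0) leaves "C" asleep, while lm_replay(1) ends with
no sleepers because "C" has re-marked the word as 2 before "A" unlocks.
The cost is one extra wake call on every timeout, which is the trade-off
mentioned above.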

I've verified that this patch does indeed prevent the hang scenario described.



* [Bug nptl/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
  2007-11-01 17:05 ` [Bug libc/5240] " rsa at us dot ibm dot com
  2007-11-01 17:55 ` rsa at us dot ibm dot com
@ 2007-11-01 19:48 ` rsa at us dot ibm dot com
  2007-11-09  9:10 ` drepper at redhat dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rsa at us dot ibm dot com @ 2007-11-01 19:48 UTC (permalink / raw)
  To: glibc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|libc                        |nptl



* [Bug nptl/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
                   ` (2 preceding siblings ...)
  2007-11-01 19:48 ` [Bug nptl/5240] " rsa at us dot ibm dot com
@ 2007-11-09  9:10 ` drepper at redhat dot com
  2007-11-09 13:54 ` rsa at us dot ibm dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: drepper at redhat dot com @ 2007-11-09  9:10 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2007-11-09 09:10 -------
The analysis is correct but the patch is less than optimal.  I've checked in
something different and also fixed x86 and x86-64.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



* [Bug nptl/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
                   ` (3 preceding siblings ...)
  2007-11-09  9:10 ` drepper at redhat dot com
@ 2007-11-09 13:54 ` rsa at us dot ibm dot com
  2007-11-09 23:05 ` rsa at us dot ibm dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rsa at us dot ibm dot com @ 2007-11-09 13:54 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From rsa at us dot ibm dot com  2007-11-09 13:54 -------
Thanks, Ulrich.  For future reference:

        "(__lll_timedlock_wait): If we time out, try one last time to lock the
        futex to avoid losing a wakeup signal."

lowlevellock.c
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/lowlevellock.c.diff?cvsroot=glibc&r1=1.17&r2=1.18

i386/lowlevellock.S
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/i386/i486/lowlevellock.S.diff?cvsroot=glibc&r1=1.19&r2=1.20

x86_64/lowlevellock.S
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S.diff?cvsroot=glibc&r1=1.21&r2=1.22


* [Bug nptl/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
                   ` (4 preceding siblings ...)
  2007-11-09 13:54 ` rsa at us dot ibm dot com
@ 2007-11-09 23:05 ` rsa at us dot ibm dot com
  2007-11-24  4:46 ` drepper at redhat dot com
  2007-11-27 23:20 ` rsa at us dot ibm dot com
  7 siblings, 0 replies; 9+ messages in thread
From: rsa at us dot ibm dot com @ 2007-11-09 23:05 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From rsa at us dot ibm dot com  2007-11-09 23:05 -------
I tested the patch on a Power5 machine and I'm still encountering the hang. 
Others indicate that they're getting the hang as well on different classes of
PowerPC hardware.  Is there any information you'd like me to gather to determine
why it's still happening?

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |



* [Bug nptl/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
                   ` (5 preceding siblings ...)
  2007-11-09 23:05 ` rsa at us dot ibm dot com
@ 2007-11-24  4:46 ` drepper at redhat dot com
  2007-11-27 23:20 ` rsa at us dot ibm dot com
  7 siblings, 0 replies; 9+ messages in thread
From: drepper at redhat dot com @ 2007-11-24  4:46 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2007-11-24 04:46 -------
I've made some more changes (and some optimizations).  The current code should work.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED



* [Bug nptl/5240] Pthread hang where there are still waiters when mutex is in "unlocked" state.
  2007-11-01 17:03 [Bug libc/5240] New: Pthread hang where there are still waiters when mutex is in "unlocked" state rsa at us dot ibm dot com
                   ` (6 preceding siblings ...)
  2007-11-24  4:46 ` drepper at redhat dot com
@ 2007-11-27 23:20 ` rsa at us dot ibm dot com
  7 siblings, 0 replies; 9+ messages in thread
From: rsa at us dot ibm dot com @ 2007-11-27 23:20 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From rsa at us dot ibm dot com  2007-11-27 23:20 -------
Created an attachment (id=2112)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=2112&action=view)
Simplified testcase with cleaner termination path.

The fix worked perfectly on POWER6.  On POWER5 I kept running into a
segmentation fault in the exit() path of the test-case.

The test-case itself is problematic: the exit() call in the child thread's
thread_exit() function triggers process termination, which ends up sending
two threads down the glibc exit() pipeline at the same time.  Both walk the
linked list of exit handlers, and one of them ends up dereferencing a
pointer which has already been zeroed.

I've modified the test case to demonstrate a more appropriate exit strategy
(which also ends up simplifying the testcase).
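One shape such an exit strategy could take is sketched below. It is not the
attached test case. The names worker, run_and_join and MAX_WORKERS are
hypothetical, and the sketch assumes only that worker threads return from
their start routine and leave process termination to the main thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 8

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static int done_count;

/* Worker: record completion and return.  It never calls exit(). */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done_count++;
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Start n workers and join them all.
   Returns the number that finished, or -1 on error. */
static int run_and_join(int n)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;

    done_count = 0;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }

    /* Only after every worker has been joined does the caller go on to
       return from main() or call exit(), so a single thread runs the
       exit handlers. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == n ? done_count : -1;
}
```

A main() built on this would call run_and_join() and then return, so the
exit handlers are walked exactly once.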

I think this bug is resolved.

Thanks for the fix, Ulrich.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #2070 is|0                           |1
           obsolete|                            |


