public inbox for glibc-bugs@sourceware.org
* [Bug nptl/300] New: pthread_cond_timedwait does not reacquire the mutex on cancelation
@ 2004-08-05  8:19 sebastien dot decugis at ext dot bull dot net
  2004-08-05  8:23 ` [Bug nptl/300] " sebastien dot decugis at ext dot bull dot net
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: sebastien dot decugis at ext dot bull dot net @ 2004-08-05  8:19 UTC (permalink / raw)
  To: glibc-bugs

On a 2-way box (2x i686), a cancelation cleanup handler sometimes fails to
unlock the mutex (errorcheck type) when the thread is canceled during a
pthread_cond_timedwait call.

The glibc is a CVS checkout from 2004-08-03, configured with NPTL support.

The compiler is gcc 3.4.1

The kernel is 2.6.7 #2 SMP Thu Jul 22 15:00:28 CEST 2004 i686 i686 i386 GNU/Linux

I can provide a sample to reproduce the problem (but it happens randomly :-( )
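
The pattern at issue can be sketched roughly as follows. This is not the
sample mentioned above; it is a minimal illustration that assumes one worker
thread and an errorcheck mutex, and the names cleanup_unlock, worker and
run_once are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t mtx;
static pthread_cond_t cnd = PTHREAD_COND_INITIALIZER;
static int waiting;             /* set under mtx just before the worker waits */
static int never_true;          /* predicate that nobody ever sets */
static int unlock_result = -1;  /* what pthread_mutex_unlock returned in the handler */

/* Cleanup handler: the mutex is expected to be held again when this runs. */
static void cleanup_unlock(void *arg)
{
    unlock_result = pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *unused)
{
    struct timespec deadline;
    (void)unused;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 60;      /* far enough away that no timeout happens here */

    pthread_mutex_lock(&mtx);
    pthread_cleanup_push(cleanup_unlock, &mtx);
    waiting = 1;
    while (!never_true)         /* pthread_cond_timedwait is the cancellation point */
        pthread_cond_timedwait(&cnd, &mtx, &deadline);
    pthread_cleanup_pop(1);
    return NULL;
}

/* Cancel one waiting worker; return what the handler's unlock call returned. */
int run_once(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int rc, ready = 0;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mtx, &attr);
    pthread_mutexattr_destroy(&attr);

    waiting = 0;
    unlock_result = -1;
    rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);

    /* Once we hold mtx and see waiting == 1, the worker has released mtx
       inside pthread_cond_timedwait. */
    while (!ready) {
        pthread_mutex_lock(&mtx);
        ready = waiting;
        pthread_mutex_unlock(&mtx);
    }

    pthread_cancel(tid);
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&mtx);
    return unlock_result;
}
```

run_once() returns 0 when the mutex was reacquired before the handler ran. On
an errorcheck mutex, EPERM would mean the canceled thread did not own the
mutex at that point. Since the failure is a race, a single call is very
unlikely to show it.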

-- 
           Summary: pthread_cond_timedwait does not reacquire the mutex on
                    cancelation
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: sebastien dot decugis at ext dot bull dot net
                CC: glibc-bugs at sources dot redhat dot com


http://sources.redhat.com/bugzilla/show_bug.cgi?id=300

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: sebastien dot decugis at ext dot bull dot net @ 2004-08-05  8:23 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From sebastien dot decugis at ext dot bull dot net  2004-08-05 08:23 -------
Created an attachment (id=150)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=150&action=view)
Sample which will reproduce the problem after a while

This sample will reproduce the bug after a while:
$ ./showbug 
[09:22:58][parent] All condvars & mutex are ready 
[09:22:58][parent] Signal handler registered
[09:22:58][parent] All 10 manager threads are running...
[09:28:05]Test showbug.c unresolved: got 1 (Operation not permitted) on line
139 (Failed to unlock mutex in cancel handler (THIS IS A BUG))



* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: sebastien dot decugis at ext dot bull dot net @ 2004-08-06  8:26 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From sebastien dot decugis at ext dot bull dot net  2004-08-06 08:26 -------
The problem also appears on a 1-way i486 box with a base Fedora Core 2
distribution (glibc compiled 2004-05-11).


* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: jakub at redhat dot com @ 2004-08-06 15:13 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From jakub at redhat dot com  2004-08-06 15:13 -------
I have been able to reproduce this on 8-way and 4-way P4 machines, but not on
x86_64 (4-way and 1-way) or ia64 (4-way).
Also, if pthread_cond_wait is used instead of pthread_cond_timedwait, I can't
reproduce it either (yet I have never seen the thread hit ETIMEDOUT in
pthread_cond_timedwait, so pthread_cond_wait and pthread_cond_timedwait should
behave the same).
That makes the suspect
nptl/sysdeps/unix/sysv/linux/i386/i486/pthread_cond_timedwait.S



* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: jakub at redhat dot com @ 2004-08-09 21:42 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From jakub at redhat dot com  2004-08-09 21:42 -------
This seems to be a bug on the GCC side.
The test only seems to fail if the SIGCANCEL signal interrupts the worker thread
on the very first instruction of the __pthread_disable_asynccancel function
(FYI, compiled with -fasynchronous-unwind-tables), called from
pthread_cond_timedwait.S.
MD_FALLBACK_FRAME_STATE_FOR sets (FS)->regs.reg[8].loc.offset to the address
where __pthread_disable_asynccancel+0 is stored; during uw_update_context
this results in context->ra being set to __pthread_disable_asynccancel+0.
The problem is that the next uw_frame_state_for will call
fde = _Unwind_Find_FDE (context->ra - 1, &context->bases);
but context->ra - 1 is __pthread_disable_asynccancel-1, which isn't covered by
.eh_frame (and if it were, it would be for a different function).

So, to me it looks like MD_FALLBACK_FRAME_STATE_FOR should ensure that
context->ra will be set to pctx->uc_mcontext.gregs[REG_IP] + 1, not just
pctx->uc_mcontext.gregs[REG_IP].
Richard, do you agree?
If so, the question is how to ensure it.  I remember MD_FALLBACK_FRAME_STATE_FOR
used to have ugly hacks to store an incremented address somewhere and point
(FS)->regs.reg[8].loc.offset to it.  But the hacks can't be too ugly, since
they need to be used in both MD_FALLBACK_FRAME_STATE_FOR and in kernel vDSO's
sigreturn .eh_frame description too.
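
The off-by-one described above can be illustrated with a toy range table.
This is only a sketch: toy_fde, toy_table, toy_find_fde and toy_lookup_for_ra
are hypothetical names, the addresses are invented, and the real lookup is
done by _Unwind_Find_FDE over .eh_frame, not by a linear scan like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one FDE: covers [start, start + len). */
struct toy_fde {
    const char *name;
    uintptr_t start;
    uintptr_t len;
};

/* Invented layout with a padding gap between the two functions. */
static const struct toy_fde toy_table[] = {
    { "some_other_function",           0x1000, 0x40 },
    { "__pthread_disable_asynccancel", 0x1050, 0x30 },
};

/* Toy analogue of an FDE search: return the entry covering pc, or NULL. */
const struct toy_fde *toy_find_fde(uintptr_t pc)
{
    for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
        if (pc >= toy_table[i].start &&
            pc - toy_table[i].start < toy_table[i].len)
            return &toy_table[i];
    return NULL;
}

/* Search at ra - 1, as an unwinder does when ra is assumed to follow a call. */
const struct toy_fde *toy_lookup_for_ra(uintptr_t ra)
{
    assert(ra > 0);
    return toy_find_fde(ra - 1);
}
```

With ra equal to the function's first byte (0x1050 here), toy_lookup_for_ra
finds nothing, because 0x104f lies in the gap. With ra set to 0x1050 + 1 the
same call finds the __pthread_disable_asynccancel entry. Without the gap, the
first lookup would return the entry for the preceding function instead.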


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rth at gcc dot gnu dot org



* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: rth at redhat dot com @ 2004-08-11 18:51 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From rth at redhat dot com  2004-08-11 18:51 -------
Subject: Re:  pthread_cond_timedwait does not reacquire the mutex on cancelation

On Mon, Aug 09, 2004 at 09:42:30PM -0000, jakub at redhat dot com wrote:
> So, to me it looks like MD_FALLBACK_FRAME_STATE_FOR should ensure that
> context->ra will be set to pctx->uc_mcontext.gregs[REG_IP] + 1, not just
> pctx->uc_mcontext.gregs[REG_IP].
> Richard, do you agree?

I dunno.  Perhaps Uli Weigand is right and we should handle
all of this +- 1 signal stuff in MD_FALLBACK_FRAME_STATE_FOR.

Recall that signals like SIGILL and the like will tend to
point to the *next* instruction, so if you did the +1
unconditionally, you would now be pointing past the next insn, and
you could have moved to a different EH region.

I see no solution that is sure to be 100% correct.


r~
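
The overshoot described above can be shown with a toy model of two adjacent
EH regions. This is an illustrative sketch only: toy_region, toy_regions,
toy_region_for and toy_region_for_ra are hypothetical names, and the
addresses and the 5-byte instruction length are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical EH region inside one function: [start, end) with an id. */
struct toy_region {
    uintptr_t start;
    uintptr_t end;
    int id;
};

static const struct toy_region toy_regions[] = {
    { 0x100, 0x105, 1 },  /* region 1: one 5-byte instruction at 0x100 */
    { 0x105, 0x110, 2 },  /* region 2: begins at the following instruction */
};

/* Return the id of the region containing addr, or 0 if there is none. */
int toy_region_for(uintptr_t addr)
{
    for (size_t i = 0; i < sizeof toy_regions / sizeof toy_regions[0]; i++)
        if (addr >= toy_regions[i].start && addr < toy_regions[i].end)
            return toy_regions[i].id;
    return 0;
}

/* Region chosen by an unwinder that searches at ra - 1. */
int toy_region_for_ra(uintptr_t ra)
{
    assert(ra > 0);
    return toy_region_for(ra - 1);
}
```

If the saved pc already points after the instruction (0x105), using it as ra
selects region 1, which is where the instruction lives. Adding 1 first makes
the search land on 0x105 and selects region 2. If the saved pc points at the
instruction itself (0x100), it is the unadjusted value that misses and the
+1 that selects region 1.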



* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: jakub at redhat dot com @ 2004-08-11 20:25 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From jakub at redhat dot com  2004-08-11 20:25 -------
If needed it can be a switch based on signal number or siginfo_t content.
But what I see as a bigger problem is how to do this + 1 thingie.

On x86-64 (and some other arches) there are a few spare words in struct sigcontext,
so MD_FALLBACK_FRAME_STATE_FOR can store the incremented value there and point
REG_SAVED_OFFSET at it.
But on i386 I haven't found such a spare word anywhere.
DW_CFA_expression is only able to compute the address of some value, not the actual
value itself (nor is it able to store something to a memory slot).
Using some hack that sets a flag in MD_FALLBACK_FRAME_STATE_FOR for signal
frames is not good either, since .eh_frame for the signal trampoline is these
days in the vDSO on some arches, so there needs to be a way to express it there.
Even if, e.g., a DW_CFA_* value is allocated to say that retaddr points
before the instruction, not after it, once the kernel starts using it in its vDSO,
old libgcc_s.so.1's will stop working.

NPTL could go the libgcj way in its sigcancel_handler and increment (in an
architecture-specific macro) the pc value, but I don't think that's a good idea
(it would stop working if GCC ever decides to somehow solve this issue and, more
importantly, backtrace () would never work through signal frames, and neither
would generic forced unwinding or exception handling through signal frames).



* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: rth at gcc dot gnu dot org @ 2004-08-11 21:07 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From rth at gcc dot gnu dot org  2004-08-11 21:07 -------
To handle vdso's, I really only see a couple of solutions.

One, a dedicated column for the signal number.  We'd have to hack libgcc to
look for this column being set each frame, as well as hard-code the signal
numbers that stop before the pc, rather than after.

Two, a DW_GNU_CFA_foo that has an expression that evaluates non-zero when
the pc is before the instruction rather than after.  As you say, this causes
old libgcc's to abort.  Maybe we don't care, and backport the change (at least
to recognize and skip) to all gcc branches.  It's not like we need unwinding
in order to boot.

Three, a new augmentation letter that indicates the presence of an expression
that evaluates non-zero.  You'd probably want a uleb128 at the beginning that
gave the size of the expression, so that we could easily skip it to find data
for subsequent letters.  This last has the advantage that when we see an unknown
letter, we're *supposed* to skip processing the rest of the augmentation.  I
don't know that that code has been tested though; we'd have to see if we do or
don't actually handle that properly.
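
The size-prefix idea in the third option can be sketched as follows: decode
a uleb128 length, then step over that many bytes. This is a generic ULEB128
reader, not libgcc's code; read_uleb128 and skip_sized_expression are
hypothetical names, and the byte layout is only the one proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one ULEB128 value from [p, end) into *out.
   Returns the first byte after the value, or NULL if it is truncated
   or does not fit in 64 bits' worth of 7-bit groups. */
const uint8_t *read_uleb128(const uint8_t *p, const uint8_t *end, uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;

    assert(p <= end);
    while (p < end && shift < 64) {
        uint8_t byte = *p++;
        value |= (uint64_t)(byte & 0x7f) << shift;  /* low 7 bits carry data */
        if ((byte & 0x80) == 0) {                   /* high bit clear: last byte */
            *out = value;
            return p;
        }
        shift += 7;
    }
    return NULL;
}

/* Skip a uleb128 size followed by that many expression bytes.
   Returns the first byte after the expression, or NULL if it overruns end. */
const uint8_t *skip_sized_expression(const uint8_t *p, const uint8_t *end)
{
    uint64_t size;

    p = read_uleb128(p, end, &size);
    if (p == NULL || size > (uint64_t)(end - p))
        return NULL;
    return p + size;
}
```

A reader that does not understand the expression can call
skip_sized_expression and continue with the data for the following
augmentation letters, which is the property the size prefix is meant to give.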


* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: drepper at redhat dot com @ 2005-09-26  0:02 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2005-09-26 00:02 -------
Did anything ever get resolved?

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=300


* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: jakub at redhat dot com @ 2006-02-21 17:55 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From jakub at redhat dot com  2006-02-21 17:55 -------
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26208


* [Bug nptl/300] pthread_cond_timedwait does not reacquire the mutex on cancelation
From: drepper at redhat dot com @ 2006-04-23 17:59 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2006-04-23 17:59 -------
Should be fixed when binaries are compiled with a recent gcc.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


