public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
@ 2011-06-10 17:57 martin at lispworks dot com
  2011-06-11 13:36 ` [Bug nptl/12875] " drepper.fsp at gmail dot com
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: martin at lispworks dot com @ 2011-06-10 17:57 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

           Summary: pthread_cond_timedwait can steal the wakeup of slower
                    thread in pthread_cond_wait
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: martin@lispworks.com


Created attachment 5787
  --> http://sourceware.org/bugzilla/attachment.cgi?id=5787
Example source code

According to the definition of pthread_cond_signal, it should only unblock
threads that are blocked at the time it is called.

The attached example demonstrates a bug in pthread_cond_timedwait that can
allow it to "steal" the signal from a thread that was blocked in
pthread_cond_wait when pthread_cond_signal was called, even though
pthread_cond_timedwait was called after pthread_cond_signal.

This was tested on an Intel Core i7 970 CPU (6 cores, 12 threads) running
Fedora 14 x86_64 with the master branch of glibc from 2011-06-10 and also with
older releases.

There is no easy way to repeat this, because it depends very much on the timing
of the threads, so I've had to cheat by using pthread_kill to artificially
delay the thread that called pthread_cond_wait (cs_ptA).

The expected output of the program is

A waits
cs_timewaster starts
B signals
cs_delaywaster starts
C waits
D waits
B signals
C wakes
D wakes with ETIMEDOUT
cs_delaywaster ends
A wakes

Note that D wakes with ETIMEDOUT and A wakes afterwards.

Often the program hangs after outputing

A waits
cs_timewaster starts
B signals
cs_delaywaster starts
C waits
D waits
B signals
C wakes
D wakes with code 0
cs_delaywaster ends

Note that D wakes with code 0 and A never wakes.

The calls to usleep in the example are such that the first signal from thread B
should cause thread A to wake and second signal should cause thread C to wake. 
Thread D should wake on the timeout.

I think the problem is that if thread A wakes from the futex but doesn't
execute the rest of pthread_cond_wait quickly enough (e.g. because the signal
handler cs_delaywaster runs), then thread D can steal the wakeup when its futex
returns on the timeout.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
@ 2011-06-11 13:36 ` drepper.fsp at gmail dot com
  2011-06-13 12:39 ` martin at lispworks dot com
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: drepper.fsp at gmail dot com @ 2011-06-11 13:36 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

Ulrich Drepper <drepper.fsp at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Ulrich Drepper <drepper.fsp at gmail dot com> 2011-06-11 13:35:58 UTC ---
The current implementation is correct.  There is no guarantee of fairness.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
  2011-06-11 13:36 ` [Bug nptl/12875] " drepper.fsp at gmail dot com
@ 2011-06-13 12:39 ` martin at lispworks dot com
  2011-09-28 21:55 ` bugdal at aerifal dot cx
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: martin at lispworks dot com @ 2011-06-13 12:39 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

martin at lispworks dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #2 from martin at lispworks dot com 2011-06-13 12:39:04 UTC ---
Can you take another look at this please?  It isn't about fairness because the
man page (and spec AFAIK) for pthread_cond_signal says:

"The pthread_cond_signal() function shall unblock at least one of the threads
that are blocked on the specified condition variable cond (if any threads are
blocked on cond)."

In the example, only thread A is blocked at the time thread B signals for the
first time, so I think it should wake whatever happens later  The blocking of
threads C and D occurs after thread B signals for the first time so shouldn't
be affected by that signal.

The problem is that if pthread_cond_timedwait in thread D reaches its timeout
before thread A has had a chance to increment __woken_seq, then thread D will
claim the signal even though its true reason for waking is the timeout.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
  2011-06-11 13:36 ` [Bug nptl/12875] " drepper.fsp at gmail dot com
  2011-06-13 12:39 ` martin at lispworks dot com
@ 2011-09-28 21:55 ` bugdal at aerifal dot cx
  2012-03-17 20:41 ` bugdal at aerifal dot cx
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bugdal at aerifal dot cx @ 2011-09-28 21:55 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

Rich Felker <bugdal at aerifal dot cx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx

--- Comment #3 from Rich Felker <bugdal at aerifal dot cx> 2011-09-28 21:55:17 UTC ---
This is definitely a bug. I hope this won't be another case where Mr. Drepper
has got too much time invested in the over-engineered code for minimizing
spurious wakes to admit that it needs to be thrown out and replaced with a
correct implementation.

Actually I have a possible alternate explanation for his position: it's
possible that he's viewing condition signals as events that always correspond
to a new item in a queue that will be removed by any waiter who wakes. If this
is all you use cond vars for, there's no need to avoid having a new waiter
(arriving after the signal) avoid stealing an existing waiter's wake, because
the new waiter could just as easily have experienced a spurious wake, and upon
waking up and checking the queue, found and removed the next queued item - at
which point, even if the existing waiter woke, it would find an empty queue
again and wait again.

Of course condition variables have plenty of other legitimate uses, and the
requirements on the implementation are governed by the specification, not by
one narrow-minded idea of what they should be used for...

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (2 preceding siblings ...)
  2011-09-28 21:55 ` bugdal at aerifal dot cx
@ 2012-03-17 20:41 ` bugdal at aerifal dot cx
  2012-04-29  3:01 ` bugdal at aerifal dot cx
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bugdal at aerifal dot cx @ 2012-03-17 20:41 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #4 from Rich Felker <bugdal at aerifal dot cx> 2012-03-17 20:40:32 UTC ---
Ping. There was a lot of analysis on this bug but as far as I know, no work
towards fixing it...

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (3 preceding siblings ...)
  2012-03-17 20:41 ` bugdal at aerifal dot cx
@ 2012-04-29  3:01 ` bugdal at aerifal dot cx
  2012-09-18 14:18 ` triegel at redhat dot com
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bugdal at aerifal dot cx @ 2012-04-29  3:01 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

Rich Felker <bugdal at aerifal dot cx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|drepper.fsp at gmail dot    |unassigned at sourceware
                   |com                         |dot org

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (4 preceding siblings ...)
  2012-04-29  3:01 ` bugdal at aerifal dot cx
@ 2012-09-18 14:18 ` triegel at redhat dot com
  2012-09-18 14:28 ` triegel at redhat dot com
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: triegel at redhat dot com @ 2012-09-18 14:18 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

Torvald Riegel <triegel at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |triegel at redhat dot com

--- Comment #5 from Torvald Riegel <triegel at redhat dot com> 2012-09-18 14:18:13 UTC ---
(In reply to comment #2)
> Can you take another look at this please?  It isn't about fairness because the
> man page (and spec AFAIK) for pthread_cond_signal says:
> 
> "The pthread_cond_signal() function shall unblock at least one of the threads
> that are blocked on the specified condition variable cond (if any threads are
> blocked on cond)."
> 
> In the example, only thread A is blocked at the time thread B signals for the
> first time, so I think it should wake whatever happens later  The blocking of
> threads C and D occurs after thread B signals for the first time so shouldn't
> be affected by that signal.

Please revise your test case so that it is properly synchronized for what you
intend to test.  usleep() is not a reliable way to enforce a happens-before
order between operations in different threads.  Cond vars, locks, or something
similar can enforce happens-before.

> The problem is that if pthread_cond_timedwait in thread D reaches its timeout
> before thread A has had a chance to increment __woken_seq, then thread D will
> claim the signal even though its true reason for waking is the timeout.

If A didn't consume a signal before D did, why shouldn't D consume an available
signal?  The manpage bit you quote says that _at least_ one of the blocked
threads should be unblocked.  This seems to be what's happening.

If you find a statement in the spec that disallows the behavior you see, please
quote this statement instead.  I guess what you might find to be surprising is
that a signal operation might not have finished wakening a thread even though
it already returned to the caller.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (5 preceding siblings ...)
  2012-09-18 14:18 ` triegel at redhat dot com
@ 2012-09-18 14:28 ` triegel at redhat dot com
  2012-09-19  3:21 ` bugdal at aerifal dot cx
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: triegel at redhat dot com @ 2012-09-18 14:28 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #6 from Torvald Riegel <triegel at redhat dot com> 2012-09-18 14:28:05 UTC ---
(In reply to comment #3)
> This is definitely a bug.

If so, please quote the specification or manpage that this violates.  At the
very least, we need this for documentation purposes.

> I hope this won't be another case where Mr. Drepper
> has got too much time invested in the over-engineered code for minimizing
> spurious wakes to admit that it needs to be thrown out and replaced with a
> correct implementation.

This is a bug report.  Please focus on the technical matter.

> Actually I have a possible alternate explanation for his position: it's
> possible that he's viewing condition signals as events that always correspond
> to a new item in a queue that will be removed by any waiter who wakes. If this
> is all you use cond vars for, there's no need to avoid having a new waiter
> (arriving after the signal) avoid stealing an existing waiter's wake, because
> the new waiter could just as easily have experienced a spurious wake, and upon
> waking up and checking the queue, found and removed the next queued item - at
> which point, even if the existing waiter woke, it would find an empty queue
> again and wait again.

What you describe is a fairness issue.  I'm not aware of any guarantee of
fairness or absence of starvation.  The cond var should try to be fair if
possible, but the test case is not even for a PI cond var, so threads aren't
even guaranteed to get to run on the CPU.

> Of course condition variables have plenty of other legitimate uses, and the
> requirements on the implementation are governed by the specification, not by
> one narrow-minded idea of what they should be used for...

So, please cite the bits of the specification that disallow the behavior.  What
Martin cite from the manpage does not seem to conflict with the current
behavior.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (6 preceding siblings ...)
  2012-09-18 14:28 ` triegel at redhat dot com
@ 2012-09-19  3:21 ` bugdal at aerifal dot cx
  2012-09-19  6:24 ` jakub at redhat dot com
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bugdal at aerifal dot cx @ 2012-09-19  3:21 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #7 from Rich Felker <bugdal at aerifal dot cx> 2012-09-19 03:21:17 UTC ---
> If so, please quote the specification or manpage that this violates.  At the
> very least, we need this for documentation purposes.

"The pthread_cond_signal() function shall unblock at least one of the threads
that are blocked on the specified condition variable cond (if any threads are
blocked on cond)."

Source:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_signal.html

If it's provable that threads 1 through N are waiting on a condition variable C
when thread X calls pthread_cond_signal on C N times, then in order to satisfy
the above, all N must be unblocked. If another waiter arriving after the N
signals consumes any of the signals and prevents those N from being blocked,
then the above cited requirement has been violated; at least one call to
pthread_cond_signal did not "unblock at least one of the threads that are
blocked".

I admit the test case is poorly written and rather obfuscated, but I believe
that it is showing a bug of this nature. Contrary to yours and Mr. Dreppers
characterizations, this is NOT A FAIRNESS ISSUE. No complaint about fairness
has been made by myself or the original poster of the bug. If the bug really
exists in the described form -- and it seems to me that it does, although I'd
like to find a simpler test case -- then it's an issue of the interface
violating its contract.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (7 preceding siblings ...)
  2012-09-19  3:21 ` bugdal at aerifal dot cx
@ 2012-09-19  6:24 ` jakub at redhat dot com
  2012-09-19  8:22 ` triegel at redhat dot com
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jakub at redhat dot com @ 2012-09-19  6:24 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

Jakub Jelinek <jakub at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at redhat dot com

--- Comment #8 from Jakub Jelinek <jakub at redhat dot com> 2012-09-19 06:23:24 UTC ---
(In reply to comment #7)
> I admit the test case is poorly written and rather obfuscated, but I believe
> that it is showing a bug of this nature. Contrary to yours and Mr. Dreppers
> characterizations, this is NOT A FAIRNESS ISSUE. No complaint about fairness
> has been made by myself or the original poster of the bug. If the bug really
> exists in the described form -- and it seems to me that it does, although I'd
> like to find a simpler test case -- then it's an issue of the interface
> violating its contract.

The testcase doesn't show anything like that, there are absolutely no
guarantees that the usleeps result in whatever ordering of events in the
threaded program.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (8 preceding siblings ...)
  2012-09-19  6:24 ` jakub at redhat dot com
@ 2012-09-19  8:22 ` triegel at redhat dot com
  2012-09-19  8:24 ` bugdal at aerifal dot cx
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: triegel at redhat dot com @ 2012-09-19  8:22 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #9 from Torvald Riegel <triegel at redhat dot com> 2012-09-19 08:21:49 UTC ---
(In reply to comment #7)
> > If so, please quote the specification or manpage that this violates.  At the
> > very least, we need this for documentation purposes.
> 
> "The pthread_cond_signal() function shall unblock at least one of the threads
> that are blocked on the specified condition variable cond (if any threads are
> blocked on cond)."
> 
> Source:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_signal.html
> 
> If it's provable that threads 1 through N are waiting on a condition variable C
> when thread X calls pthread_cond_signal on C N times, then in order to satisfy
> the above, all N must be unblocked.

First, I don't think you can prove that they are waiting in this test case --
usleep doesn't give you any happens-before or other ordering guarantees.

Second, the sentence from the spec that you quote is vague on when those other
threads would block (e.g., is there any guarantee of linearizability or such?).
Specifically, it doesn't state which blocked threads should be wakened, or
which threads blocked at which time.

> If another waiter arriving after the N
> signals consumes any of the signals and prevents those N from being blocked,
> then the above cited requirement has been violated;

I don't see how this is violated. You say it unblocked the waiter and N-1 other
threads, so N overall for N signals.  That's what the guarantee is, isn't it?

> at least one call to
> pthread_cond_signal did not "unblock at least one of the threads that are
> blocked".

It did unblock one of the threads that are blocked.  Even at the end right
before their timeout they are still blocked, right?

> I admit the test case is poorly written and rather obfuscated but I believe
> that it is showing a bug of this nature.

> Contrary to yours and Mr. Dreppers
> characterizations, this is NOT A FAIRNESS ISSUE.

Please look again at the paragraph that you wrote that I replied to in comment
#6.  You describe a fairness issue there (a newer waiter "stealing" an older
waiter's signal).  I did not say that this bug report is just a fairness issue.
 In some executions it might show that there's no fairness, but the core
problems with this bug report are the two items I point out above.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (9 preceding siblings ...)
  2012-09-19  8:22 ` triegel at redhat dot com
@ 2012-09-19  8:24 ` bugdal at aerifal dot cx
  2012-09-19  8:50 ` bugdal at aerifal dot cx
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bugdal at aerifal dot cx @ 2012-09-19  8:24 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #10 from Rich Felker <bugdal at aerifal dot cx> 2012-09-19 08:23:42 UTC ---
Use of usleep does not automatically invalidate the test. However, looking at
the test source again, I see this is a different test case (and different bug
report) from the one I was thinking of, which claims the same issue (stolen
wakes). I had a conversation with the author of that other bug report and was
convinced it's valid; I'll have to go back and find it so that this report can
be considered as a possible duplicate...

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (10 preceding siblings ...)
  2012-09-19  8:24 ` bugdal at aerifal dot cx
@ 2012-09-19  8:50 ` bugdal at aerifal dot cx
  2012-09-19 15:35 ` triegel at redhat dot com
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bugdal at aerifal dot cx @ 2012-09-19  8:50 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #11 from Rich Felker <bugdal at aerifal dot cx> 2012-09-19 08:49:36 UTC ---
Here's the related bug report, whose thread concluded with the claim that this
bug (12875) is probably a manifestation of it:
http://sourceware.org/bugzilla/show_bug.cgi?id=13165

With that said, some comments on Torvald's last reply:

> Second, the sentence from the spec that you quote is vague on when those other
> threads would block (e.g., is there any guarantee of linearizability or such?).
> Specifically, it doesn't state which blocked threads should be wakened, or
> which threads blocked at which time.

The relevant text is in the specification for pthread_cond_wait regarding
atomically unlocking the mutex and blocking:

"These functions atomically release mutex and cause the calling thread to block
on the condition variable cond; atomically here means "atomically with respect
to access by another thread to the mutex and then the condition variable". That
is, if another thread is able to acquire the mutex after the about-to-block
thread has released it, then a subsequent call to pthread_cond_broadcast() or
pthread_cond_signal() in that thread shall behave as if it were issued after
the about-to-block thread has blocked."

That is, if the mutex was held by a thread entering pthread_cond_wait and
another thread successfully acquires the mutex, the first thread "has blocked".

> > If another waiter arriving after the N
> > signals consumes any of the signals and prevents those N from being blocked,
> > then the above cited requirement has been violated;
> 
> I don't see how this is violated. You say it unblocked the waiter and N-1
> other threads, so N overall for N signals.  That's what the guarantee is,
> isn't it?

Perhaps my typo in the above-quoted text is the source of confusion. It should
have read "prevents those N from being _unblocked_".

With the above reading of the standard in mind, at the moment
pthread_cond_signal is called, it's provable that exactly those N, and not the
new waiter that's about to arrive, "have blocked" on the condition variable. So
if the N signals don't wake all N of them, it's a bug - each signal is required
to unblock at least one thread that "has blocked" in the above sense.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (11 preceding siblings ...)
  2012-09-19  8:50 ` bugdal at aerifal dot cx
@ 2012-09-19 15:35 ` triegel at redhat dot com
  2012-09-19 17:30 ` bugdal at aerifal dot cx
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: triegel at redhat dot com @ 2012-09-19 15:35 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #12 from Torvald Riegel <triegel at redhat dot com> 2012-09-19 15:34:48 UTC ---
(In reply to comment #11)
> Here's the related bug report, whose thread concluded with the claim that this
> bug (12875) is probably a manifestation of it:
> http://sourceware.org/bugzilla/show_bug.cgi?id=13165
> 
> With that said, some comments on Torvald's last reply:
> 
> > Second, the sentence from the spec that you quote is vague on when those other
> > threads would block (e.g., is there any guarantee of linearizability or such?).
> > Specifically, it doesn't state which blocked threads should be wakened, or
> > which threads blocked at which time.
> 
> The relevant text is in the specification for pthread_cond_wait regarding
> atomically unlocking the mutex and blocking:
> 
> "These functions atomically release mutex and cause the calling thread to block
> on the condition variable cond; atomically here means "atomically with respect
> to access by another thread to the mutex and then the condition variable". That
> is, if another thread is able to acquire the mutex after the about-to-block
> thread has released it, then a subsequent call to pthread_cond_broadcast() or
> pthread_cond_signal() in that thread shall behave as if it were issued after
> the about-to-block thread has blocked."

That states that the signaler needs to respect a happens-before relation
established by the mutex.  Thus, the prior cond_wait will be considered as a
wake-up target.  There is no guarantee there that it is the first to be
wakened. There is no statement there about the relationship to other writers,
nor when the signal is actually delivered.

> That is, if the mutex was held by a thread entering pthread_cond_wait and
> another thread successfully acquires the mutex, the first thread "has blocked".

Indeed.  However, that doesn't mean that it will be the first to be unblocked. 
If there are no other waiters, it is the only candidate though, and will be
unblocked (i.e., there are no lost wake-ups).

> > > If another waiter arriving after the N
> > > signals consumes any of the signals and prevents those N from being blocked,
> > > then the above cited requirement has been violated;
> > 
> > I don't see how this is violated. You say it unblocked the waiter and N-1
> > other threads, so N overall for N signals.  That's what the guarantee is,
> > isn't it?
> 
> Perhaps my typo in the above-quoted text is the source of confusion. It should
> have read "prevents those N from being _unblocked_".

I read this as unblocked.

> With the above reading of the standard in mind, at the moment
> pthread_cond_signal is called, it's provable that exactly those N, and not the
> new waiter that's about to arrive, "have blocked" on the condition variable. > So

It says that those have blocked, not that those are exactly the ones that can
be considered to have blocked (i.e., the first "if" in what you quoted is an
"if", not an "iff" (if and only if)).

> if the N signals don't wake all N of them, it's a bug - each signal is required
> to unblock at least one thread that "has blocked" in the above sense.

No.  See above.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (12 preceding siblings ...)
  2012-09-19 15:35 ` triegel at redhat dot com
@ 2012-09-19 17:30 ` bugdal at aerifal dot cx
  2012-09-20 12:29 ` triegel at redhat dot com
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bugdal at aerifal dot cx @ 2012-09-19 17:30 UTC (permalink / raw)
  To: glibc-bugs


http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #13 from Rich Felker <bugdal at aerifal dot cx> 2012-09-19 17:30:17 UTC ---
> It says that those have blocked, not that those are exactly the ones that can
> be considered to have blocked (i.e., the first "if" in what you quoted is an
> "if", not an "iff" (if and only if)).

Short of an implementation-defined extension that must be documented, there is
no way a waiter can block on a condition variable short of calling
pthread_cond_wait or pthread_cond_timedwait. Under your proposed
interpretation, pthread_cond_signal would be useless; in order to be useful, at
least one thread needs to unblock, and the application has to be able to know
(based on its knowledge of which threads have blocked) that the signal will
unblock a thread that will allow the program to make forward progress. This of
course requires that every thread which has blocked on the condition at the
time of pthread_cond_signal be a thread whose unblocking would allow forward
progress; if other threads have blocked on the same condition, then it's an
application bug.

As I stated before, I'm not sure this bug report is valid and I was thinking of
the other one. But there is a real issue here that the implementation needs to
take care to satisfy.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (13 preceding siblings ...)
  2012-09-19 17:30 ` bugdal at aerifal dot cx
@ 2012-09-20 12:29 ` triegel at redhat dot com
  2012-09-20 14:02 ` martin at lispworks dot com
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: triegel at redhat dot com @ 2012-09-20 12:29 UTC (permalink / raw)
  To: glibc-bugs


http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #14 from Torvald Riegel <triegel at redhat dot com> 2012-09-20 12:28:36 UTC ---
(In reply to comment #13)
> > It says that those have blocked, not that those are exactly the ones that can
> > be considered to have blocked (i.e., the first "if" in what you quoted is an
> > "if", not an "iff" (if and only if)).
> 
> Short of an implementation-defined extension that must be documented, there is
> no way a waiter can block on a condition variable short of calling
> pthread_cond_wait or pthread_cond_timedwait. Under your proposed
> interpretation, pthread_cond_signal would be useless;

It's not useless. It just doesn't give all the guarantees you thought it would
give.

> in order to be useful, at
> least one thread needs to unblock, and the application has to be able to know
> (based on its knowledge of which threads have blocked) that the signal will
> unblock a thread that will allow the program to make forward progress.

If the application needs to associate certain waiters with signalers, it can
just use a separate cond var for this, for example.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (14 preceding siblings ...)
  2012-09-20 12:29 ` triegel at redhat dot com
@ 2012-09-20 14:02 ` martin at lispworks dot com
  2012-09-20 16:59 ` triegel at redhat dot com
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: martin at lispworks dot com @ 2012-09-20 14:02 UTC (permalink / raw)
  To: glibc-bugs


http://sourceware.org/bugzilla/show_bug.cgi?id=12875

martin at lispworks dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5787|0                           |1
        is obsolete|                            |

--- Comment #15 from martin at lispworks dot com 2012-09-20 14:01:28 UTC ---
Created attachment 6639
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6639
Test case with explicit happens-before logic rather than usleep

As requested, I've attached a version of the test case that uses the lock,
barriers and atomic instructions to enforce happens-before.  There are 2
remaining calls to usleep, which are needed to ensure ordering between the
calling thread and the internals of pthread_cond_wait than can't be controlled.

This bug may be different from Bug 13165, because it is caused by waking from a
timeout rather than an extra signal.

Why I think this is a bug: my reading of the sentence "The
pthread_cond_signal() function shall unblock at least one of the threads that
are blocked on the specified condition variable cond (if any threads are
blocked on cond)." is that it only affects threads that "are blocked" at the
time pthread_cond_signal() is called, not those that call pthread_cond_wait
afterwards.  The wording for pthread_cond_broadcast() says "currently blocked",
but is that an intentional difference?

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (15 preceding siblings ...)
  2012-09-20 14:02 ` martin at lispworks dot com
@ 2012-09-20 16:59 ` triegel at redhat dot com
  2012-09-21  8:05 ` siddhesh at redhat dot com
  2014-06-24  8:36 ` fweimer at redhat dot com
  18 siblings, 0 replies; 20+ messages in thread
From: triegel at redhat dot com @ 2012-09-20 16:59 UTC (permalink / raw)
  To: glibc-bugs


http://sourceware.org/bugzilla/show_bug.cgi?id=12875

--- Comment #16 from Torvald Riegel <triegel at redhat dot com> 2012-09-20 16:59:10 UTC ---
(In reply to comment #15)
> Created attachment 6639 [details]
> Test case with explicit happens-before logic rather than usleep
> 
> As requested, I've attached a version of the test case that uses the lock,
> barriers and atomic instructions to enforce happens-before.

Thanks.

> This bug may be different from Bug 13165, because it is caused by waking from a
> timeout rather than an extra signal.

Even this happens with cond_timedwait, it seems to me that it is conceptually
the same issue.

> Why I think this is a bug: my reading of the sentence "The
> pthread_cond_signal() function shall unblock at least one of the threads that
> are blocked on the specified condition variable cond (if any threads are
> blocked on cond)." is that it only affects threads that "are blocked" at the
> time pthread_cond_signal() is called, not those that call pthread_cond_wait
> afterwards.

I don't read it that way.  Please see the discussion in bug #13165 about this. 
To summarize, the spec only requires the threads to be considered as blocked
that happen before the signal, but it does not require threads that wait after
the signal to NOT be considered blocked.  In your example, the waiter in thread
D is not disallowed to be considered as a blocked thread.

> The wording for pthread_cond_broadcast() says "currently blocked",
> but is that an intentional difference?

I can't speak about the intent of the authors of the spec. A more detailed
specification would certainly be easier to understand.  But unless there is a
change, I would stick to what's allowed/required by the current wording.

OK to classify as not a bug?

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (16 preceding siblings ...)
  2012-09-20 16:59 ` triegel at redhat dot com
@ 2012-09-21  8:05 ` siddhesh at redhat dot com
  2014-06-24  8:36 ` fweimer at redhat dot com
  18 siblings, 0 replies; 20+ messages in thread
From: siddhesh at redhat dot com @ 2012-09-21  8:05 UTC (permalink / raw)
  To: glibc-bugs


http://sourceware.org/bugzilla/show_bug.cgi?id=12875

Siddhesh Poyarekar <siddhesh at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siddhesh at redhat dot com

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug nptl/12875] pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait
  2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
                   ` (17 preceding siblings ...)
  2012-09-21  8:05 ` siddhesh at redhat dot com
@ 2014-06-24  8:36 ` fweimer at redhat dot com
  18 siblings, 0 replies; 20+ messages in thread
From: fweimer at redhat dot com @ 2014-06-24  8:36 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=12875

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

-- 
You are receiving this mail because:
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-06-24  8:36 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-10 17:57 [Bug nptl/12875] New: pthread_cond_timedwait can steal the wakeup of slower thread in pthread_cond_wait martin at lispworks dot com
2011-06-11 13:36 ` [Bug nptl/12875] " drepper.fsp at gmail dot com
2011-06-13 12:39 ` martin at lispworks dot com
2011-09-28 21:55 ` bugdal at aerifal dot cx
2012-03-17 20:41 ` bugdal at aerifal dot cx
2012-04-29  3:01 ` bugdal at aerifal dot cx
2012-09-18 14:18 ` triegel at redhat dot com
2012-09-18 14:28 ` triegel at redhat dot com
2012-09-19  3:21 ` bugdal at aerifal dot cx
2012-09-19  6:24 ` jakub at redhat dot com
2012-09-19  8:22 ` triegel at redhat dot com
2012-09-19  8:24 ` bugdal at aerifal dot cx
2012-09-19  8:50 ` bugdal at aerifal dot cx
2012-09-19 15:35 ` triegel at redhat dot com
2012-09-19 17:30 ` bugdal at aerifal dot cx
2012-09-20 12:29 ` triegel at redhat dot com
2012-09-20 14:02 ` martin at lispworks dot com
2012-09-20 16:59 ` triegel at redhat dot com
2012-09-21  8:05 ` siddhesh at redhat dot com
2014-06-24  8:36 ` fweimer at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).