* RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-01 8:37 Robert Collins
2002-05-01 9:37 ` Michael Beach
2002-05-01 20:37 ` Oops! Correction, that should have been 1.3.3 " Michael Beach
0 siblings, 2 replies; 14+ messages in thread
From: Robert Collins @ 2002-05-01 8:37 UTC (permalink / raw)
To: Michael Beach; +Cc: cygwin
> -----Original Message-----
> From: Michael Beach [mailto:michaelb@ieee.org]
> Sent: Thursday, May 02, 2002 12:21 AM
> Thanks for taking the time to look at this issue, but I must
> disagree that
> this is the problem.
You're going to have to debug this yourself. I've given you my opinion
:].
> If the test thread locks the mutex first, sure it will
> probably signal before
> the main thread is waiting, but that doesn't matter because
> the main thread
Does this sequence look plausible to you? I don't claim it is what's
happening, because the string output doesn't fit, but it illustrates
the race. On a dual-processor machine this is much more likely than on
a single-processor one.
thread - lock
thread - state=run
thread - signal
main - lock
main - test state (passes)
thread - test state (fails)
main - state = acknowledged
main - signal
thread wait
main - unlock
main - join
thread is hung.
what are we seeing:
main - lock
main - test state fails
main - wait
thread - lock
thread - state=run
thread - signal
-- test thread has signal()ed
thread - test state (fails)
-- test thread about to wait()...
thread wait
-- main thread wakes!
main - state = acknowledged
-- main thread about to signal()
main - signal
main - unlock
-- main thread waiting for exit...
thread should wake here.
>
> If the above hand-wavy explanation does not seem convincing,
...
> the different platforms does not seem to hold much water...
Without a few more output statements, I'll not buy into that. However I
do accept your hand waving. Particularly since I've noticed something
useful out of this: pthread_join's argument should not be 0. I have to
dig up the spec to confirm this though.... but our code will segfault
like crazy on you as it stands.
> However, that said, I will be trying 1.3.10 to see if it
> makes a difference.
> If not, then I guess I will just have to make the move to the
> Win32 threading
> and synchronization APIs. Blech!
You could always help us debug the pthreads code... I wonder if the
recent patches I haven't reviewed properly yet address this. If you had
time, you could try them and see...
> > You should also _always_ test for the return value when
> using pthreads
> > calls. They don't throw exceptions and they don't set errno, so the
> > only way you can tell an error has occurred is to record the return
> > value.
>
> Yes I know. The reason for this sloppy coding is that this
> test program is
> ...
Please don't remove error handling. If I were to run this program I'd
expect to have error handling so I don't have to add it in. And running
the code w/o error handling won't help me id anything non-trivial.
Rob (Cygwin pthreads maintainer).
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
2002-05-01 8:37 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal() Robert Collins
@ 2002-05-01 9:37 ` Michael Beach
2002-05-01 20:37 ` Oops! Correction, that should have been 1.3.3 " Michael Beach
1 sibling, 0 replies; 14+ messages in thread
From: Michael Beach @ 2002-05-01 9:37 UTC (permalink / raw)
To: Robert Collins; +Cc: cygwin
On Thursday 02 May 2002 01:37, Robert Collins wrote:
> > -----Original Message-----
> > From: Michael Beach [mailto:michaelb@ieee.org]
> > Sent: Thursday, May 02, 2002 12:21 AM
> >
> >
> > Thanks for taking the time to look at this issue, but I must
> > disagree that
> > this is the problem.
>
> You're going to have to debug this yourself. I've given you my opinion
>
> :].
> :
> > If the test thread locks the mutex first, sure it will
> > probably signal before
> > the main thread is waiting, but that doesn't matter because
> > the main thread
>
> does this sequence look plausible to you? I don't claim it is whats
> happening because the string output doesn't fit.. but it illustrates
> the race. On a dual processor machine this is much more likely than a
> single.
>
> thread - lock
> thread - state=run
> thread - signal
> main - lock
> main - test state (passes)
No, I don't think it's plausible. In particular, we can't get to "main-lock"
until we get to "thread wait" because it's not until then that "thread" has
(implicitly) released the mutex. The OS can pre-empt "thread" all it likes,
but as soon as "main" has progressed to the pthread_mutex_lock() call it (ie
"main") will no longer be runnable and so won't be scheduled, until "thread"
calls pthread_cond_wait().
> thread - test state (fails)
> main - state = acknowledged
> main - signal
> thread wait
> main - unlock
> main - join
> thread is hung.
>
>
> what are we seeing:
> main - lock
> main - test state fails
> main - wait
> thread - lock
> thread - state=run
> thread - signal
> -- test thread has signal()ed
> thread - test state (fails)
> -- test thread about to wait()...
> thread wait
> -- main thread wakes!
> main - state = acknowledged
> -- main thread about to signal()
> main - signal
> main - unlock
> -- main thread waiting for exit...
> thread should wake here.
>
> > If the above hand-wavy explanation does not seem convincing,
>
> ...
>
> > the different platforms does not seem to hold much water...
>
> Without a few more output statements, I'll not buy into that.
Fair enough.
> However I
> do accept your hand waving. Particularly since I've noticed something
> useful out of this: pthread_join's argument should not be 0. I have to
> dig up the spec to confirm this though.... but our code will segfault
> like crazy on you as it stands.
Well, I'm not sure what the standard says on this either, and I've not had an
authoritative reference book handy lately, so I've just been going with
what's legal according to the manpages on SuSE 7.2. So my excuse is "Linux
made me do it".
>
> > However, that said, I will be trying 1.3.10 to see if it
> > makes a difference.
> > If not, then I guess I will just have to make the move to the
> > Win32 threading
> > and synchronization APIs. Blech!
>
> You could always help us debug the pthreads code... I wonder if the
> recent patches I haven't reviewed properly yet address this. If you had
> time, you could try them and see...
In principle I'd be pleased to help, but in practice my time is a bit tight
right now as I've been doing the public spirited thing for one or two bugs
I've encountered in other open source projects I've been using, and now I
think my employer would like me to focus more closely on Real Work (TM) ;-)
However if you're not expecting high bandwidth, if you could point me at a
document or whatnot that explains how to set up a development environment I'd
be willing to have a go.
>
> > > You should also _always_ test for the return value when
> >
> > using pthreads
> >
> > > calls. They don't throw exceptions and they don't set errno, so the
> > > only way you can tell an error has occurred is to record the return
> > > value.
> >
> > Yes I know. The reason for this sloppy coding is that this
> > test program is
> > ...
>
> Please don't remove error handling. If I were to run this program I'd
> expect to have error handling so I don't have to add it in. And running
> the code w/o error handling won't help me id anything non-trivial.
Sure. The quick'n'dirty pthreads calls were only so I didn't have to post
half of our source tree in order to illustrate the problem with an example
that actually compiles. If you're serious about wanting to run it, give me a
shout and I'll give you a version with error handling.
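For readers following along, here is a rough sketch of what the same handshake might look like with every return value checked. It is a reconstruction under assumptions, not the version offered above: the names Handshake, rcCheck, testThread and runHandshake are made up for illustration, and the error paths are deliberately simple (a failure is reported, but a peer left waiting is not recovered).

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical names throughout; a reconstruction, not the poster's code.
enum State { START, NEW_THREAD_RUNNING, NEW_THREAD_ACKNOWLEDGED };

struct Handshake
{
    pthread_mutex_t m;
    pthread_cond_t cv;
    State state;
    int threadError;   // last error seen by the test thread, 0 if none
};

// pthreads calls return the error code directly; log it and pass it on.
int rcCheck(int rc, const char *what)
{
    if (rc != 0)
        std::fprintf(stderr, "%s failed: %s\n", what, std::strerror(rc));
    return rc;
}

void *testThread(void *arg)
{
    Handshake *h = static_cast<Handshake *>(arg);
    int rc = rcCheck(pthread_mutex_lock(&h->m), "thread lock");
    if (rc != 0)
    {
        h->threadError = rc;
        return 0;
    }
    h->state = NEW_THREAD_RUNNING;
    rc = rcCheck(pthread_cond_signal(&h->cv), "thread signal");
    // Wait on the predicate, not on the signal itself.
    while (rc == 0 && h->state != NEW_THREAD_ACKNOWLEDGED)
        rc = rcCheck(pthread_cond_wait(&h->cv, &h->m), "thread wait");
    h->threadError = rc;
    rcCheck(pthread_mutex_unlock(&h->m), "thread unlock");
    return 0;
}

// Returns 0 if the whole handshake completed, else the first error code.
int runHandshake()
{
    Handshake h;
    h.state = START;
    h.threadError = 0;
    int rc = rcCheck(pthread_mutex_init(&h.m, 0), "mutex init");
    if (rc != 0)
        return rc;
    rc = rcCheck(pthread_cond_init(&h.cv, 0), "cond init");
    if (rc != 0)
        return rc;
    pthread_t th;
    rc = rcCheck(pthread_create(&th, 0, testThread, &h), "create");
    if (rc != 0)
        return rc;

    rc = rcCheck(pthread_mutex_lock(&h.m), "main lock");
    if (rc == 0)
    {
        while (rc == 0 && h.state != NEW_THREAD_RUNNING)
            rc = rcCheck(pthread_cond_wait(&h.cv, &h.m), "main wait");
        if (rc == 0)
        {
            h.state = NEW_THREAD_ACKNOWLEDGED;
            rc = rcCheck(pthread_cond_signal(&h.cv), "main signal");
        }
        rcCheck(pthread_mutex_unlock(&h.m), "main unlock");
    }

    // Pass a real pointer to pthread_join rather than 0.
    void *ret = 0;
    int joinRc = rcCheck(pthread_join(th, &ret), "join");
    rcCheck(pthread_cond_destroy(&h.cv), "cond destroy");
    rcCheck(pthread_mutex_destroy(&h.m), "mutex destroy");
    if (rc != 0)
        return rc;
    if (joinRc != 0)
        return joinRc;
    return h.threadError;
}
```

Calling runHandshake() from main() and printing its result would stand in for the "%% PASSED" line of the original test. On a working pthreads implementation it should return 0 whichever thread reaches the mutex first.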
>
> Rob (Cygwin pthreads maintainer).
Regards
M.Beach
^ permalink raw reply [flat|nested] 14+ messages in thread
* Oops! Correction, that should have been 1.3.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
2002-05-01 8:37 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal() Robert Collins
2002-05-01 9:37 ` Michael Beach
@ 2002-05-01 20:37 ` Michael Beach
1 sibling, 0 replies; 14+ messages in thread
From: Michael Beach @ 2002-05-01 20:37 UTC (permalink / raw)
To: Robert Collins; +Cc: cygwin
Sorry, it was late and I misread the output from uname! The version of the DLL
I've been using is in fact 1.3.3, not 1.1.3.
Regards
M.Beach
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-17 0:48 Robert Collins
0 siblings, 0 replies; 14+ messages in thread
From: Robert Collins @ 2002-05-17 0:48 UTC (permalink / raw)
To: Tim and Kathy Andvaag, cygwin
> -----Original Message-----
> From: Tim and Kathy Andvaag [mailto:andvaag@sk.sympatico.ca]
> Sent: Friday, May 17, 2002 2:24 PM
> $ nice -n -1 nice
> 0
>
> Yet the adjusted priorities are clearly occurring when you
> look at the windows task manager?
Rounding error IIRC. The +-20 that unix uses doesn't map all that well
to NT's class + level approach.
Rob
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-17 0:18 Tim and Kathy Andvaag
0 siblings, 0 replies; 14+ messages in thread
From: Tim and Kathy Andvaag @ 2002-05-17 0:18 UTC (permalink / raw)
To: cygwin
>> Between 1.1.3 and 1.3.0 a huge change occurred in the pthreads code
>> base, so this assumption is not safe. (It's not necessarily wrong
>> either.) I'd definitely be using 1.3.10 though.
>>
>> > #include <pthread.h>
>> > #include <iostream>
>>
>> The cygwin c++ libgcc, stdlibc++ and gcc are not built with thread
>> support, so C++ and threads may not work well together. C should work
>> fine, and if anyone convinces Chris to release a thread-enabled gcc, C++
>> should get better.
>Arrrgh - so that explains why so much of my source crashes randomly every
>now and again...... (!)
>Please, please release thread-enabled C++ libs.
>Chris
Thanks for this hint.
I rebuilt gcc with
./configure --enable-threads=yes
make
make install
and it solved my intermittent problems with running/compiling the iperf
package (uses threads and C++ libs).
Is there a reason this is not the default?
Tim
P.S. why is:
$ nice -n -1 nice
0
Yet the adjusted priorities are clearly occurring when you look at the
windows task manager?
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-02 4:29 Robert Collins
2002-05-02 6:57 ` Michael Beach
2002-05-02 7:32 ` Jason Tishler
0 siblings, 2 replies; 14+ messages in thread
From: Robert Collins @ 2002-05-02 4:29 UTC (permalink / raw)
To: Michael Beach; +Cc: cygwin, Jason Tishler
Michael,
that patch I included in my last email fixed the problem, and
didn't introduce any regressions as far as I could tell, so I've checked
it in. If you build yourself a cygwin dll, or grab the next snapshot to
be generated, it will be fixed.
Jason - -this- bug may be the one that killed python. Could you retest
with a new dll when you have a few free moments?
Cheers,
Rob
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
2002-05-02 4:29 Robert Collins
@ 2002-05-02 6:57 ` Michael Beach
2002-05-02 7:32 ` Jason Tishler
1 sibling, 0 replies; 14+ messages in thread
From: Michael Beach @ 2002-05-02 6:57 UTC (permalink / raw)
To: Robert Collins; +Cc: cygwin, Jason Tishler
On Thursday 02 May 2002 21:28, Robert Collins wrote:
> Michael,
> that patch I included in my last email fixed the problem, and
> didn't introduce any regressions as far as I could tell, so I've checked
> it in. If you build yourself a cygwin dll, or grab the next snapshot to
> be generated, it will be fixed.
>
Thanks very much for that Robert! At the moment I'm just downloading Cygwin
to my home system, then I'll grab the sources from CVS and do a build.
Regards
M.Beach
> Jason - -this- bug may be the one that killed python. Could you retest
> with a new dll when you have a few free moments?
>
> Cheers,
> Rob
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
2002-05-02 4:29 Robert Collins
2002-05-02 6:57 ` Michael Beach
@ 2002-05-02 7:32 ` Jason Tishler
1 sibling, 0 replies; 14+ messages in thread
From: Jason Tishler @ 2002-05-02 7:32 UTC (permalink / raw)
To: cygwin
Rob,
On Thu, May 02, 2002 at 09:28:54PM +1000, Robert Collins wrote:
> Jason - -this- bug may be the one that killed python. Could you retest
> with a new dll when you have a few free moments?
Bingo! This one is finally squashed!
I am running the Python test_threadedtempfile regression test in a
loop and have already run 600+ iterations without a hang. Previously,
it would hang before 30. So except for tuckering out my wimpy laptop,
I think that we are good to go. :,)
Thanks,
Jason
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-02 1:43 Robert Collins
0 siblings, 0 replies; 14+ messages in thread
From: Robert Collins @ 2002-05-02 1:43 UTC (permalink / raw)
To: Michael Beach; +Cc: cygwin
> -----Original Message-----
> From: Michael Beach [mailto:michaelb@ieee.org]
> Sent: Thursday, May 02, 2002 2:16 AM
> >
> > thread - lock
> > thread - state=run
> > thread - signal
> > main - lock
> > main - test state (passes)
>
> calls pthread_cond_wait().
Doh. I need some real serious sleep.
"Linux
> made me do it".
:].
> However if you're not expecting high bandwidth, if you could
> point me at a
> document or whatnot that explains how to set up a development
> environment I'd
> be willing to have a go.
There are very few developers contributing to pthreads code - right now
I'm swamped, and a new contributor has offered some high quality
patches. Http://www.cygwin.com/cvs.html explains how to grab the current
source. You could also just click on the 'src' checkbox beside the
cygwin package in setup.exe, to get it to download a snapshot.
> Sure. The quick'n'dirty pthreads calls were only so I didn't
> have to post
> half of our source tree in order to illustrate the problem
> with an example
> that actually compiles. If you're serious about wanting to
> run it, give me a
> shout and I'll give you a version with error handling.
I can duplicate the hang. What appears to be happening is that signals
sent from a thread while another thread is entering (or exiting?) the
wait routine get dropped.
The main signal() routine finds 0 waiting threads (see thread.cc:495)
when it is called, so it does nothing.
A - main thread
B - new thread
L - lock
W - wait
S - signal
J - join
U - unlock
Fails
A                   B
L
W
                    L
                    S (1)
                    W
S <-- is dropped
U
                    U
J
Ok, in detail:
S (1) does this:
  locks the cond variable
  signals A
  waits for A to wake, to prevent dropped signals
  unlocks the cond struct
Then the W:
  locks the cond variable
  increases the waiting count
  waits, releasing the mutex and unlocking the cond variable
A, on waking, does this:
  decrements the waiting count (now 0)
  tells the S (1) routine that it's woken up
  locks the mutex that it's waiting on
  (*) clears the cond structure's cached mutex entry if it's the last
      waking thread
  locks the cond structure
  decrements the mutex's wait reference
  unlocks the cond structure
(*) was buggy. What is happening is that the W released the mutex AFTER
A had tested for being the last thread, so A's test was flawed. I've a
fix ready; I just need to get some time to test, which I will do
tonight. If you want to test it, it's
Index: thread.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/thread.cc,v
retrieving revision 1.65
diff -u -p -r1.65 thread.cc
--- thread.cc 28 Feb 2002 13:50:41 -0000 1.65
+++ thread.cc 2 May 2002 08:42:21 -0000
@@ -1791,20 +1791,22 @@ __pthread_cond_dowait (pthread_cond_t *c
InterlockedIncrement (&((*themutex)->condwaits));
if (pthread_mutex_unlock (&(*cond)->cond_access))
system_printf ("Failed to unlock condition variable access mutex, this %p", *cond);
+ /* At this point calls to Signal will progress even if we aren't yet waiting
+ * However, the loop there should allow us to get scheduled and call wait,
+ * and have them call PulseEvent again if we don't respond.
+ */
rv = (*cond)->TimedWait (waitlength);
/* this may allow a race on the mutex acquisition and waits..
* But doing this within the cond access mutex creates a different race
*/
- bool last = false;
- if (InterlockedDecrement (&((*cond)->waiting)) == 0)
- last = true;
+ InterlockedDecrement (&((*cond)->waiting));
/* Tell Signal that we have been released */
InterlockedDecrement (&((*cond)->ExitingWait));
(*themutex)->Lock ();
- if (last == true)
- (*cond)->mutex = NULL;
if (pthread_mutex_lock (&(*cond)->cond_access))
system_printf ("Failed to lock condition variable access mutex, this %p", *cond);
+ if ((*cond)->waiting == 0)
+ (*cond)->mutex = NULL;
InterlockedDecrement (&((*themutex)->condwaits));
if (pthread_mutex_unlock (&(*cond)->cond_access))
system_printf ("Failed to unlock condition variable access mutex, this %p", *cond);
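The ordering problem the patch addresses can be illustrated with a small single-threaded model. This is only a sketch under assumptions, not the Cygwin source: CondModel, enterWait, oldOrdering and patchedOrdering are hypothetical names, and the model assumes a waiter caches the mutex when it registers. It replays the interleaving described above, where B registers as a waiter while A is still blocked reacquiring the mutex.

```cpp
#include <cassert>

// Simplified model of the bookkeeping discussed above.
// All names are hypothetical; this is not the Cygwin code.
struct CondModel
{
    int waiting;        // number of threads registered as waiters
    bool mutexCached;   // stands in for (*cond)->mutex != NULL
};

// B enters wait: registers itself and (by assumption) caches the mutex.
void enterWait(CondModel &c)
{
    ++c.waiting;
    c.mutexCached = true;
}

// Old ordering: A decides "I am the last waiter" when it decrements the
// count, B then registers while A is blocked on the mutex, and A finally
// acts on the flag it computed earlier.
CondModel oldOrdering()
{
    CondModel c = {1, true};           // A is the only waiter
    bool last = (--c.waiting == 0);    // A wakes: test taken now
    enterWait(c);                      // B calls wait before A gets the mutex
    if (last)
        c.mutexCached = false;         // A clears the cache on stale information
    return c;
}

// Patched ordering: A decrements, B registers, and only then does A look
// at the live count before deciding whether to clear the cache.
CondModel patchedOrdering()
{
    CondModel c = {1, true};
    --c.waiting;
    enterWait(c);
    if (c.waiting == 0)
        c.mutexCached = false;
    return c;
}
```

In the old ordering the model ends with one registered waiter but a cleared cache entry; in the patched ordering the entry survives because the test sees B's registration.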
Rob
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-01 5:57 Robert Collins
0 siblings, 0 replies; 14+ messages in thread
From: Robert Collins @ 2002-05-01 5:57 UTC (permalink / raw)
To: Christopher January, cygwin
> -----Original Message-----
> From: Christopher January [mailto:chris@atomice.net]
> Sent: Wednesday, May 01, 2002 10:40 PM
> To: cygwin@cygwin.com
> Subject: Re: 1.1.3 and upwards: apparent bug with
> pthread_cond_wait() and/or signal()
>
>
> > Between 1.1.3 and 1.3.0 a huge change occurred in the pthreads code
> > base, so this assumption is not safe. (It's not necessarily wrong
> > either.) I'd definitely be using 1.3.10 though.
> >
> > > #include <pthread.h>
> > > #include <iostream>
> >
> > The cygwin c++ libgcc, stdlibc++ and gcc are not built with thread
> > support, so C++ and threads may not work well together. C
> should work
> > fine, and if anyone convinces Chris to release a
> thread-enabled gcc,
> > C++ should get better.
> Arrrgh - so that explains why so much of my source crashes
> randomly every now
> and again...... (!)
But not that test program - the test program was buggy.
Rob
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-01 5:22 Robert Collins
2002-05-01 5:39 ` Christopher January
2002-05-01 7:41 ` Michael Beach
0 siblings, 2 replies; 14+ messages in thread
From: Robert Collins @ 2002-05-01 5:22 UTC (permalink / raw)
To: Michael Beach, cygwin
> -----Original Message-----
> From: Michael Beach [mailto:michaelb@ieee.org]
> Sent: Wednesday, May 01, 2002 9:44 PM
> To: cygwin@cygwin.com
> Subject: 1.1.3 and upwards: apparent bug with
> pthread_cond_wait() and/or signal()
>
>
> Hi all, I've just been wrestling with some code I've been
> writing, trying to
> get pthreads condition variables to work under Cygwin on
> Windows 2000. I've
> tried DLL versions 1.1.3 and the 20020409 snapshot, and
> neither are working
> for me, so I'm assuming that no versions in between will work
> either...
Between 1.1.3 and 1.3.0 a huge change occurred in the pthreads code
base, so this assumption is not safe. (It's not necessarily wrong
either.) I'd definitely be using 1.3.10 though.
> #include <pthread.h>
> #include <iostream>
The cygwin c++ libgcc, stdlibc++ and gcc are not built with thread
support, so C++ and threads may not work well together. C should work
fine, and if anyone convinces Chris to release a thread-enabled gcc, C++
should get better.
>
>
> int main(int argc, char *argv[])
>
> {
>
> CondVarTestData td;
>
> pthread_mutex_init(&td.m, 0);
>
> pthread_cond_init(&td.cv, 0);
>
> td.state = CondVarTestData::START;
>
> pthread_t th;
>
> pthread_create(&th, 0, condVarTestThreadEntry, &td);
>
> {
>
> pthread_mutex_lock(&td.m);
you should lock this before starting your thread. It's a potential race.
And due to cygwin's implementation, it *is* racing, and your other
thread is entering the mutex and signalling before you enter the mutex
and wait. That early signal with no waiters gets lost (as it should).
You should also _always_ test for the return value when using pthreads
calls. They don't throw exceptions and they don't set errno, so the only
way you can tell an error has occurred is to record the return value.
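A minimal sketch of that advice, assuming nothing beyond standard POSIX calls. The helper names pthreadOk and tryLockHeldMutex are made up for illustration. It locks a default (non-recursive) mutex and then calls pthread_mutex_trylock() on it from the same thread, so the failure shows up only in the return value.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: report a non-zero pthreads return code.
// The code is the function's return value; errno is not consulted.
bool pthreadOk(int rc, const char *what)
{
    if (rc == 0)
        return true;
    std::fprintf(stderr, "%s failed: %s\n", what, std::strerror(rc));
    return false;
}

// Locks a default mutex, then tries to lock it again without blocking.
// Returns the trylock result so the caller sees the error code directly,
// or -1 if the setup calls themselves failed.
int tryLockHeldMutex()
{
    pthread_mutex_t m;
    if (!pthreadOk(pthread_mutex_init(&m, 0), "pthread_mutex_init"))
        return -1;
    if (!pthreadOk(pthread_mutex_lock(&m), "pthread_mutex_lock"))
    {
        pthread_mutex_destroy(&m);
        return -1;
    }
    int rc = pthread_mutex_trylock(&m);   // mutex is already held
    pthreadOk(rc, "pthread_mutex_trylock");
    pthreadOk(pthread_mutex_unlock(&m), "pthread_mutex_unlock");
    pthreadOk(pthread_mutex_destroy(&m), "pthread_mutex_destroy");
    return rc;
}
```

With a default mutex the trylock call is expected to come back with EBUSY, which a caller would never notice without looking at the return value.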
Rob
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
2002-05-01 5:22 Robert Collins
@ 2002-05-01 5:39 ` Christopher January
2002-05-01 7:41 ` Michael Beach
1 sibling, 0 replies; 14+ messages in thread
From: Christopher January @ 2002-05-01 5:39 UTC (permalink / raw)
To: cygwin
> Between 1.1.3 and 1.3.0 a huge change occurred in the pthreads code
> base, so this assumption is not safe. (It's not necessarily wrong
> either.) I'd definitely be using 1.3.10 though.
>
> > #include <pthread.h>
> > #include <iostream>
>
> The cygwin c++ libgcc, stdlibc++ and gcc are not built with thread
> support, so C++ and threads may not work well together. C should work
> fine, and if anyone convinces Chris to release a thread-enabled gcc, C++
> should get better.
Arrrgh - so that explains why so much of my source crashes randomly every now
and again...... (!)
Please, please release thread-enabled C++ libs.
Chris
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
2002-05-01 5:22 Robert Collins
2002-05-01 5:39 ` Christopher January
@ 2002-05-01 7:41 ` Michael Beach
1 sibling, 0 replies; 14+ messages in thread
From: Michael Beach @ 2002-05-01 7:41 UTC (permalink / raw)
To: Robert Collins; +Cc: cygwin
On Wednesday 01 May 2002 22:22, Robert Collins wrote:
> > -----Original Message-----
> > From: Michael Beach [mailto:michaelb@ieee.org]
> > Sent: Wednesday, May 01, 2002 9:44 PM
> > To: cygwin@cygwin.com
> > Subject: 1.1.3 and upwards: apparent bug with
> > pthread_cond_wait() and/or signal()
> >
> >
> > Hi all, I've just been wrestling with some code I've been
> > writing, trying to
> > get pthreads condition variables to work under Cygwin on
> > Windows 2000. I've
> > tried DLL versions 1.1.3 and the 20020409 snapshot, and
> > neither are working
> > for me, so I'm assuming that no versions in between will work
> > either...
>
> Between 1.1.3 and 1.3.0 a huge change occurred in the pthreads code
> base, so this assumption is not safe. (It's not necessarily wrong
> either.) I'd definitely be using 1.3.10 though.
I'll give it a try, but I'm not too hopeful considering that the snapshot
(which postdates 1.3.10) doesn't seem to work.
>
> > #include <pthread.h>
> > #include <iostream>
>
> The cygwin c++ libgcc, stdlibc++ and gcc are not built with thread
> support, so C++ and threads may not work well together. C should work
> fine, and if anyone convinces Chris to release a thread-enabled gcc, C++
> should get better.
>
> > int main(int argc, char *argv[])
> >
> > {
> >
> > CondVarTestData td;
> >
> > pthread_mutex_init(&td.m, 0);
> >
> > pthread_cond_init(&td.cv, 0);
> >
> > td.state = CondVarTestData::START;
> >
> > pthread_t th;
> >
> > pthread_create(&th, 0, condVarTestThreadEntry, &td);
> >
> > {
> >
> > pthread_mutex_lock(&td.m);
>
> you should lock this before starting your thread. It's a potential race.
> And due to cygwin's implementation, it *is* racing, and your other
> thread is entering the mutex and signalling before you enter the mutex
> and wait. That early signal with no waiters gets lost (as it should).
Thanks for taking the time to look at this issue, but I must disagree that
this is the problem. There *is* indeterminacy here (vis-a-vis what is
guaranteed by the pthreads spec) as to which thread locks the mutex first,
but I'd hesitate to call it a race condition since the completion of the test
program (by design) does not *depend* on which thread gets to the mutex
first. I've included relevant parts of the program again below to illustrate
my point.
If the test thread locks the mutex first, sure it will probably signal before
the main thread is waiting, but that doesn't matter because the main thread
won't sleep since it tests the condition (that the shared state is
NEW_THREAD_RUNNING) to see whether or not it should call pthread_cond_wait(),
and the test thread ensures that that condition is satisfied before it
signals. So the test thread will then end up waiting for the main thread to
signal it, which it will do. Then the test thread exits, the main thread
joins it and the program terminates successfully.
On the other hand, if the main thread gets to the mutex first then it will
wait (as the NEW_THREAD_RUNNING condition will not be satisfied). At this
point the test thread will get to run and will signal the waiting main thread
after setting the state to NEW_THREAD_RUNNING. The main thread will then wake
when the test thread itself calls pthread_cond_wait() (and so releases the
mutex). Then the main thread will signal the waiting test thread, which then
exits, and so the program then terminates much as before.
If the above hand-wavy explanation does not seem convincing, then I'd also
like to tender the empirical evidence of the printed output from the test
runs on Cygwin and Linux. In both cases the output is the same, up until the
point when the Cygwin built version just stops producing output at all. This
tends to indicate that the underlying thread systems are making the same
scheduling decisions with respect to those two threads, so the argument that
it works on Linux but not on Cygwin due to an inherent race condition
resolving itself differently (due to different scheduling of the threads) on
the different platforms does not seem to hold much water...
However, that said, I will be trying 1.3.10 to see if it makes a difference.
If not, then I guess I will just have to make the move to the Win32 threading
and synchronization APIs. Blech!
int main(int argc, char *argv[])
{
    CondVarTestData td;
    pthread_mutex_init(&td.m, 0);
    pthread_cond_init(&td.cv, 0);
    td.state = CondVarTestData::START;
    pthread_t th;
    pthread_create(&th, 0, condVarTestThreadEntry, &td);
    {
        pthread_mutex_lock(&td.m);
        while (td.state != CondVarTestData::NEW_THREAD_RUNNING)
        {
            pthread_cond_wait(&td.cv, &td.m);
            clog << "-- main thread wakes!" << endl;
        }
        td.state = CondVarTestData::NEW_THREAD_ACKNOWLEDGED;
        clog << "-- main thread about to signal()" << endl;
        pthread_cond_signal(&td.cv);
        pthread_mutex_unlock(&td.m);
    }
    clog << "-- main thread waiting for exit..." << endl;
    pthread_join(th, 0);
    cout << "%% PASSED" << endl;
    return 0;
}

void *condVarTestThreadEntry(void *arg)
{
    CondVarTestData *td = (CondVarTestData *)arg;
    pthread_mutex_lock(&td->m);
    td->state = CondVarTestData::NEW_THREAD_RUNNING;
    pthread_cond_signal(&td->cv);
    clog << "-- test thread has signal()ed" << endl;
    while (td->state != CondVarTestData::NEW_THREAD_ACKNOWLEDGED)
    {
        clog << "-- test thread about to wait()..." << endl;
        pthread_cond_wait(&td->cv, &td->m);
        clog << "-- test thread wakes!" << endl;
    }
    pthread_mutex_unlock(&td->m);
    clog << "-- test thread about to exit..." << endl;
    return 0;
}
>
> You should also _always_ test for the return value when using pthreads
> calls. They don't throw exceptions and they don't set errno, so the only
> way you can tell an error has occurred is to record the return value.
Yes I know. The reason for this sloppy coding is that this test program is
the result of quickly stripping out calls to a C++ threading library (which
in the case of Cygwin simply wraps pthreads quite thinly) and replacing them
with raw pthreads calls. The library does handle error returns, but I wanted
to demonstrate the problem without any "noise" from the library before
posting to the list.
>
> Rob
Regards
M.Beach
^ permalink raw reply [flat|nested] 14+ messages in thread
* 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-01 4:46 Michael Beach
0 siblings, 0 replies; 14+ messages in thread
From: Michael Beach @ 2002-05-01 4:46 UTC (permalink / raw)
To: cygwin
Hi all, I've just been wrestling with some code I've been writing, trying to
get pthreads condition variables to work under Cygwin on Windows 2000. I've
tried DLL versions 1.1.3 and the 20020409 snapshot, and neither are working
for me, so I'm assuming that no versions in between will work either...
When I build and run my test program (attached below) I get the following
results...
$ ./a.exe
-- test thread has signal()ed
-- test thread about to wait()...
-- main thread wakes!
-- main thread about to signal()
-- main thread waiting for exit...
and the program hangs. Presumably the test thread does not wake. On the other
hand, when I compile the same test program on SuSE Linux 7.2 (gcc 2.95.3,
glibc 2.2.2) I get what I consider to be correct results...
michaelb@gilgamesh:~ > ./a.out
-- test thread has signal()ed
-- test thread about to wait()...
-- main thread wakes!
-- main thread about to signal()
-- main thread waiting for exit...
-- test thread wakes!
-- test thread about to exit...
%% PASSED
I've done a lot of staring at my test program, and can't see any problem with
it (though I'm willing to be proved wrong in this, I can stand the shame!),
so I'm thinking that this is a Cygwin bug.
Regards
M.Beach
/*
 * foo.cpp
 */
#include <pthread.h>
#include <iostream>
using namespace std;

void *condVarTestThreadEntry(void *arg);

struct CondVarTestData
{
    pthread_mutex_t m;
    pthread_cond_t cv;
    enum { START, NEW_THREAD_RUNNING, NEW_THREAD_ACKNOWLEDGED } state;
};

int main(int argc, char *argv[])
{
    CondVarTestData td;
    pthread_mutex_init(&td.m, 0);
    pthread_cond_init(&td.cv, 0);
    td.state = CondVarTestData::START;
    pthread_t th;
    pthread_create(&th, 0, condVarTestThreadEntry, &td);
    {
        pthread_mutex_lock(&td.m);
        while (td.state != CondVarTestData::NEW_THREAD_RUNNING)
        {
            pthread_cond_wait(&td.cv, &td.m);
            clog << "-- main thread wakes!" << endl;
        }
        td.state = CondVarTestData::NEW_THREAD_ACKNOWLEDGED;
        clog << "-- main thread about to signal()" << endl;
        pthread_cond_signal(&td.cv);
        pthread_mutex_unlock(&td.m);
    }
    clog << "-- main thread waiting for exit..." << endl;
    pthread_join(th, 0);
    cout << "%% PASSED" << endl;
    return 0;
}

void *condVarTestThreadEntry(void *arg)
{
    CondVarTestData *td = (CondVarTestData *)arg;
    pthread_mutex_lock(&td->m);
    td->state = CondVarTestData::NEW_THREAD_RUNNING;
    pthread_cond_signal(&td->cv);
    clog << "-- test thread has signal()ed" << endl;
    while (td->state != CondVarTestData::NEW_THREAD_ACKNOWLEDGED)
    {
        clog << "-- test thread about to wait()..." << endl;
        pthread_cond_wait(&td->cv, &td->m);
        clog << "-- test thread wakes!" << endl;
    }
    pthread_mutex_unlock(&td->m);
    clog << "-- test thread about to exit..." << endl;
    return 0;
}
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2002-05-17 5:30 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-01 8:37 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal() Robert Collins
2002-05-01 9:37 ` Michael Beach
2002-05-01 20:37 ` Oops! Correction, that should have been 1.3.3 " Michael Beach
-- strict thread matches above, loose matches on Subject: below --
2002-05-17 0:48 1.1.3 " Robert Collins
2002-05-17 0:18 Tim and Kathy Andvaag
2002-05-02 4:29 Robert Collins
2002-05-02 6:57 ` Michael Beach
2002-05-02 7:32 ` Jason Tishler
2002-05-02 1:43 Robert Collins
2002-05-01 5:57 Robert Collins
2002-05-01 5:22 Robert Collins
2002-05-01 5:39 ` Christopher January
2002-05-01 7:41 ` Michael Beach
2002-05-01 4:46 Michael Beach