* RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
@ 2002-05-01 8:37 Robert Collins
2002-05-01 9:37 ` Michael Beach
2002-05-01 20:37 ` Oops! Correction, that should have been 1.3.3 " Michael Beach
0 siblings, 2 replies; 3+ messages in thread
From: Robert Collins @ 2002-05-01 8:37 UTC (permalink / raw)
To: Michael Beach; +Cc: cygwin
> -----Original Message-----
> From: Michael Beach [mailto:michaelb@ieee.org]
> Sent: Thursday, May 02, 2002 12:21 AM
> Thanks for taking the time to look at this issue, but I must
> disagree that
> this is the problem.
You're going to have to debug this yourself. I've given you my opinion
:].
> If the test thread locks the mutex first, sure it will
> probably signal before
> the main thread is waiting, but that doesn't matter because
> the main thread
Does this sequence look plausible to you? I don't claim it is what's
happening, because the string output doesn't fit, but it illustrates
the race. On a dual-processor machine this is much more likely than on a
single-processor one.
thread - lock
thread - state=run
thread - signal
main - lock
main - test state (passes)
thread - test state (fails)
main - state = acknowledged
main - signal
thread wait
main - unlock
main - join
thread is hung.
What we are seeing:
main - lock
main - test state fails
main - wait
thread - lock
thread - state=run
thread - signal
-- test thread has signal()ed
thread - test state (fails)
-- test thread about to wait()...
thread wait
-- main thread wakes!
main - state = acknowledged
-- main thread about to signal()
main - signal
main - unlock
-- main thread waiting for exit...
thread should wake here.
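
For reference, the handshake that both traces describe can be sketched roughly as below. This is not the original test program, which is not shown in this thread; the names (state, RUN, ACKNOWLEDGED, test_thread, run_handshake, CHECK) are made up for illustration, and the sketch assumes both sides re-test the state in a loop around pthread_cond_wait().

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Abort on any pthreads failure; these calls return the error number. */
#define CHECK(call)                                                   \
    do {                                                              \
        int rc_ = (call);                                             \
        if (rc_ != 0) {                                               \
            fprintf(stderr, "%s: %s\n", #call, strerror(rc_));        \
            abort();                                                  \
        }                                                             \
    } while (0)

/* Hypothetical state machine standing in for the test program's state. */
enum state { IDLE, RUN, ACKNOWLEDGED };

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static enum state state = IDLE;

static void *test_thread(void *arg)
{
    (void)arg;
    CHECK(pthread_mutex_lock(&mtx));          /* thread - lock       */
    state = RUN;                              /* thread - state=run  */
    CHECK(pthread_cond_signal(&cond));        /* thread - signal     */
    while (state != ACKNOWLEDGED)             /* thread - test state */
        CHECK(pthread_cond_wait(&cond, &mtx)); /* thread - wait      */
    CHECK(pthread_mutex_unlock(&mtx));
    return NULL;
}

/* Runs one handshake and returns the final state. */
int run_handshake(void)
{
    pthread_t tid;
    void *ret = NULL;

    state = IDLE;
    CHECK(pthread_create(&tid, NULL, test_thread, NULL));

    CHECK(pthread_mutex_lock(&mtx));          /* main - lock         */
    while (state != RUN)                      /* main - test state   */
        CHECK(pthread_cond_wait(&cond, &mtx)); /* main - wait        */
    state = ACKNOWLEDGED;                     /* main - state = acknowledged */
    CHECK(pthread_cond_signal(&cond));        /* main - signal       */
    CHECK(pthread_mutex_unlock(&mtx));        /* main - unlock       */

    CHECK(pthread_join(tid, &ret));           /* main - join         */
    return (int)state;
}
```

Because each side loops on the state rather than on the wakeup itself, a signal sent before the other side is waiting is not lost: the state change is still visible once the mutex is acquired.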
>
> If the above hand-wavy explanation does not seem convincing,
...
> the different platforms does not seem to hold much water...
Without a few more output statements, I won't buy into that. However, I
do accept your hand-waving, particularly since I've noticed something
useful out of this: pthread_join's argument should not be 0. I still have
to dig up the spec to confirm this, but our code will segfault
like crazy on you as it stands.
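
A minimal sketch of a join that avoids passing 0, assuming the argument in question is the second one (the pointer that receives the thread's return value). As far as I know POSIX does permit a null pointer there, so this only sidesteps the crash described above. The names worker and join_with_result are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static int worker_result = 42;

/* Hypothetical worker: hands back a pointer to a static result. */
static void *worker(void *arg)
{
    (void)arg;
    return &worker_result;
}

/* Returns the worker's result, or -1 if a pthreads call fails. */
int join_with_result(void)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* Pass a real pointer as the second argument instead of 0. */
    if (pthread_join(tid, &ret) != 0)
        return -1;

    return ret != NULL ? *(int *)ret : -1;
}
```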
> However, that said, I will be trying 1.3.10 to see if it
> makes a difference.
> If not, then I guess I will just have to make the move to the
> Win32 threading
> and synchronization APIs. Blech!
You could always help us debug the pthreads code... I wonder if the
recent patches I haven't reviewed properly yet address this. If you had
time, you could try them and see...
> > You should also _always_ test for the return value when
> > using pthreads
> > calls. They don't throw exceptions and they don't set errno, so the
> > only way you can tell an error has occurred is to record the return
> > value.
>
> Yes I know. The reason for this sloppy coding is that this
> test program is
> ...
Please don't remove error handling. If I were to run this program I'd
expect to have error handling so I don't have to add it in. And running
the code without error handling won't help me identify anything non-trivial.
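
As a sketch of the kind of error handling meant here: pthreads calls return the error number directly, so every return value has to be inspected. The helper names check and second_trylock_result are made up for this example. It locks a mutex and then calls pthread_mutex_trylock() on it again, which should report EBUSY through its return value.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: log a failed pthreads call and pass the code through. */
static int check(int rc, const char *what)
{
    if (rc != 0)
        fprintf(stderr, "%s: %s\n", what, strerror(rc));
    return rc;
}

/* Locks a default mutex, then tries to lock it again without blocking.
 * Returns the error code of the second attempt, or -1 if setup fails. */
int second_trylock_result(void)
{
    pthread_mutex_t m;
    int rc;

    if (check(pthread_mutex_init(&m, NULL), "pthread_mutex_init") != 0)
        return -1;
    if (check(pthread_mutex_lock(&m), "pthread_mutex_lock") != 0)
        return -1;

    /* The mutex is already held, so the failure shows up in the return value. */
    rc = check(pthread_mutex_trylock(&m), "pthread_mutex_trylock");

    check(pthread_mutex_unlock(&m), "pthread_mutex_unlock");
    check(pthread_mutex_destroy(&m), "pthread_mutex_destroy");
    return rc;
}
```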
Rob (Cygwin pthreads maintainer).
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
* Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
From: Michael Beach @ 2002-05-01 9:37 UTC (permalink / raw)
To: Robert Collins; +Cc: cygwin
On Thursday 02 May 2002 01:37, Robert Collins wrote:
> > -----Original Message-----
> > From: Michael Beach [mailto:michaelb@ieee.org]
> > Sent: Thursday, May 02, 2002 12:21 AM
> >
> >
> > Thanks for taking the time to look at this issue, but I must
> > disagree that
> > this is the problem.
>
> You're going to have to debug this yourself. I've given you my opinion
> :].
> > If the test thread locks the mutex first, sure it will
> > probably signal before
> > the main thread is waiting, but that doesn't matter because
> > the main thread
>
> Does this sequence look plausible to you? I don't claim it is what's
> happening, because the string output doesn't fit, but it illustrates
> the race. On a dual-processor machine this is much more likely than on a
> single-processor one.
>
> thread - lock
> thread - state=run
> thread - signal
> main - lock
> main - test state (passes)
No, I don't think it's plausible. In particular, we can't get to "main - lock"
until we get to "thread wait", because it is not until then that "thread" has
(implicitly) released the mutex. The OS can pre-empt "thread" all it likes,
but as soon as "main" reaches the pthread_mutex_lock() call it blocks (i.e.
"main" is no longer runnable and so won't be scheduled) until "thread"
calls pthread_cond_wait().
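
The mutex argument above can be checked with a small sketch. All names here (waiter, thread_has_locked, go, main_locks_only_while_waiter_waits, CHECK) are invented for the example; it is not the test program from this thread. The waiter sets a flag while holding the mutex and gives the mutex up only inside pthread_cond_wait(), so if "main" obtains the lock and sees the flag set, the waiter must be blocked in the wait.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Abort on any pthreads failure; these calls return the error number. */
#define CHECK(call)                                                   \
    do {                                                              \
        int rc_ = (call);                                             \
        if (rc_ != 0) {                                               \
            fprintf(stderr, "%s: %s\n", #call, strerror(rc_));        \
            abort();                                                  \
        }                                                             \
    } while (0)

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int thread_has_locked = 0; /* set by the waiter while it holds mtx */
static int go = 0;                /* predicate the waiter sleeps on */

static void *waiter(void *arg)
{
    (void)arg;
    CHECK(pthread_mutex_lock(&mtx));
    thread_has_locked = 1;
    /* Until go is set, mtx is released only inside pthread_cond_wait(). */
    while (!go)
        CHECK(pthread_cond_wait(&cond, &mtx));
    CHECK(pthread_mutex_unlock(&mtx));
    return NULL;
}

/* Returns 1 once "main" has taken mtx after the waiter locked it. */
int main_locks_only_while_waiter_waits(void)
{
    pthread_t tid;
    void *ret = NULL;
    int seen = 0;

    thread_has_locked = 0;
    go = 0;
    CHECK(pthread_create(&tid, NULL, waiter, NULL));

    while (!seen) {
        CHECK(pthread_mutex_lock(&mtx));
        seen = thread_has_locked;
        if (seen) {
            /* We hold mtx and the waiter has locked it before: it is waiting. */
            go = 1;
            CHECK(pthread_cond_signal(&cond));
        }
        CHECK(pthread_mutex_unlock(&mtx));
        if (!seen)
            sched_yield(); /* the waiter has not run yet; try again */
    }

    CHECK(pthread_join(tid, &ret));
    return seen;
}
```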
> thread - test state (fails)
> main - state = acknowledged
> main - signal
> thread wait
> main - unlock
> main - join
> thread is hung.
>
>
> What we are seeing:
> main - lock
> main - test state fails
> main - wait
> thread - lock
> thread - state=run
> thread - signal
> -- test thread has signal()ed
> thread - test state (fails)
> -- test thread about to wait()...
> thread wait
> -- main thread wakes!
> main - state = acknowledged
> -- main thread about to signal()
> main - signal
> main - unlock
> -- main thread waiting for exit...
> thread should wake here.
>
> > If the above hand-wavy explanation does not seem convincing,
>
> ...
>
> > the different platforms does not seem to hold much water...
>
> Without a few more output statements, I'll not buy into that.
Fair enough.
> However I
> do accept your hand waving. Particularly since I've noticed something
> useful out of this: pthread_join's argument should not be 0. I have to
> dig up the spec to confirm this though.... but our code will segfault
> like crazy on you as it stands.
Well, I'm not sure what the standard says on this either, and I've not had an
authoritative reference book handy lately, so I've just been going with
what's legal according to the manpages on SuSE 7.2. So my excuse is "Linux
made me do it".
>
> > However, that said, I will be trying 1.3.10 to see if it
> > makes a difference.
> > If not, then I guess I will just have to make the move to the
> > Win32 threading
> > and synchronization APIs. Blech!
>
> You could always help us debug the pthreads code... I wonder if the
> recent patches I haven't reviewed properly yet address this. If you had
> time, you could try them and see...
In principle I'd be pleased to help, but in practice my time is a bit tight
right now. I've been doing the public-spirited thing for one or two bugs
I've encountered in other open source projects I've been using, and now I
think my employer would like me to focus more closely on Real Work (TM) ;-)
However, if you're not expecting high bandwidth and could point me at a
document or whatnot that explains how to set up a development environment, I'd
be willing to have a go.
>
> > > You should also _always_ test for the return value when
> > > using pthreads
> > > calls. They don't throw exceptions and they don't set errno, so the
> > > only way you can tell an error has occurred is to record the return
> > > value.
> >
> > Yes I know. The reason for this sloppy coding is that this
> > test program is
> > ...
>
> Please don't remove error handling. If I were to run this program I'd
> expect to have error handling so I don't have to add it in. And running
> the code w/o error handling won't help me id anything non-trivial.
Sure. The quick'n'dirty pthreads calls were only so I didn't have to post
half of our source tree in order to illustrate the problem with an example
that actually compiles. If you're serious about wanting to run it, give me a
shout and I'll give you a version with error handling.
>
> Rob (Cygwin pthreads maintainer).
Regards
M.Beach
* Oops! Correction, that should have been 1.3.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
From: Michael Beach @ 2002-05-01 20:37 UTC (permalink / raw)
To: Robert Collins; +Cc: cygwin
Sorry, it was late and I misread the output from uname! The version of the DLL
I've been using is in fact 1.3.3, not 1.1.3.
Regards
M.Beach