[ECOS] questions about condition variable example from eCos book

public inbox for ecos-discuss@sourceware.org
 help / color / mirror / Atom feed

* [ECOS] questions about condition variable example from eCos book
@ 2003-04-28  8:18 Eric Smith
  2003-04-28 12:16 ` Nick Garnett
  0 siblings, 1 reply; 2+ messages in thread
From: Eric Smith @ 2003-04-28  8:18 UTC (permalink / raw)
  To: ecos-discuss

I've got some questions about the condition variable example code in
the eCos book on pages 107-109.  I'm new to eCos so it's possible that
I'm misinterpreting something.

First, the description of cyg_cond_signal() says that it will "wake up
exactly one thread waiting on the condition variable [...]  If there are
no threads waiting for the condition variable when it is signaled, nothing
happens."  But it also says "a race condition could arise if more than
one thread is waiting for the condition variable.  This is why it is
important for the waiting thread to retest the condition variable to
ensure its proper state."

I don't understand where this race condition comes from.  Even if there
are multiple threads waiting on the same condition, cyg_cond_signal() will
only wake up one, right?  So how could the thread wake up without the
condition having been signalled?

The example code contains two threads.  Here's an excerpt,
slightly reformatted and without most of the comments:

line
#
11    void thread_a (cyg_addrword_t index)
12    {
14      while (1)
15        {
16          // Acquire data into the buffer...
19          buffer_empty = false;
22          cyg_mutex_lock (& mut_cond_var);
25          cyg_cond_signal (& cond_var);
28          cyg_mutex_unlock (& mut_cond_var);
29        }
30    }

35    void thread_b (cyg_addrword_t index)
36    {
38      while (1)
39        {
41          cyg_mutex_lock (& mut_cond_var);
44          while (buffer_empty == true)
45            {
46              cyg_cond_wait (& cond_var);
47            }
49          // get the buffer data...
52          buffer_empty = true;
55          cyg_mutex_unlock (& mut_cond_var);
57          // Process the data in the buffer...
58        }
59    }

So in this example, why is it not adequate for the while statement
on line 44 to be an if statement instead?  I'll concede that a while
is better defensive programming to use while, but it doesn't seem
strictly necessary as the text claims.

Secondly, I think this code does have two actual race conditions.
Suppose the two threads are at the same priority, and timeslicing is
enabled.  They handshake successfully once.  Here's one possible
scenario of the suceeding events:

   thread_b   line 49  gets data from the buffer (still holding mutex)

   thread_a   line 16  acquires data
              line 19  sets buffer_empty = false
              line 22  wait for mutex (still held by thread_b)

   thread_b   line 52  sets buffer_empty = true
              line 55  gives up mutex
              line 57  processes the data
              line 41  locks the mutex
              line 44  checks buffer_empty, it is true!
              line 46  waits for condition to be signalled (yeilding mutex)

   thread_a   line 22  acquires mutex and wakes up
              line 25  signals condition
              line 28  releases mutex
              line 16  starts acquiring more data

   thread_b   line 46  wakes up (re-acquiring the mutex)
              line 44  tests buffer_empty, finds it true!
              line 46  waits for condition to be signalled (yeilding mutex)

thread_b has effectively missed the signal from thread_a, though it
should get it the next time around.  However, that could be
arbitrarily far in the future, and in the meanwhile it's not
processing the data thread_a was trying to pass to it.

It seems fairly clear that line 19 "buffer_empty =false" in thread_a should
actually come after line 22 acquires the mutex, in order to prevent exactly
this sort of race condition.

In general, any time you use a single variable to communicate state between
two processes, it should be protected by a mutual exclusion mechanism of
some sort.  There are some exceptions to this rule, but any time you think
you can avoid the need for mutual exclusion, you should study the problem
*very* carefully.

A more general problem with the example code is that if there is a single
buffer shared between thread_a and thread_b, there needs to be something
to prevent thread_a from refilling the buffer before thread_b is done
with it.  To solve this problem, it may be necessary to move the
"acquire data" portion of thread_a after the mutex has been locked.  So
the code with these two fixes would be:

14      while (1)
15        {
            cyg_mutex_lock (& mut_cond_var);
16          // Acquire data into the buffer...
19          buffer_empty = false;
25          cyg_cond_signal (& cond_var);
28          cyg_mutex_unlock (& mut_cond_var);
29        }

Since the thread has the mutex locked during the acquisition of data and
setting buffer_empty = false, those could be done in either order, but as
a matter of style it seems best to not set the variable until after the
acquisition is completed.

Eric

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [ECOS] questions about condition variable example from eCos book
  2003-04-28  8:18 [ECOS] questions about condition variable example from eCos book Eric Smith
@ 2003-04-28 12:16 ` Nick Garnett
  0 siblings, 0 replies; 2+ messages in thread
From: Nick Garnett @ 2003-04-28 12:16 UTC (permalink / raw)
  To: Eric Smith; +Cc: ecos-discuss

"Eric Smith" <eric-ecd@brouhaha.com> writes:

> I've got some questions about the condition variable example code in
> the eCos book on pages 107-109.  I'm new to eCos so it's possible that
> I'm misinterpreting something.
> 
> First, the description of cyg_cond_signal() says that it will "wake up
> exactly one thread waiting on the condition variable [...]  If there are
> no threads waiting for the condition variable when it is signaled, nothing
> happens."  But it also says "a race condition could arise if more than
> one thread is waiting for the condition variable.  This is why it is
> important for the waiting thread to retest the condition variable to
> ensure its proper state."
> 
> I don't understand where this race condition comes from.  Even if there
> are multiple threads waiting on the same condition, cyg_cond_signal() will
> only wake up one, right?  So how could the thread wake up without the
> condition having been signalled?

The problem is not really in what happens in the condition variable,
but in what happens in the mutex. When a thread waiting on a condition
variable is signalled, it has to re-acquire the mutex before
proceeding. Obviously it cannot do this immediately, since the thread
doing the signalling currently has the mutex, so it must wait until
the mutex becomes free. However, there may be other threads ahead of
it on the mutex queue, or a higher priority thread may jump in and
grab the mutex before it can run. So, by the time the thread actually
gets to run, other threads may have changed the state of the protected
data (stolen the buffer, consumed the resource, read the bytes from
the serial device...) and the condition it was waiting for is no
longer true. Hence, it has to re-test the condition and wait again if
it is not true.

There are other, more obscure, sources of potential race
conditions. Some implementations of condition variables, particularly
on multiprocessors, may occasionally cause more threads to be woken
than is strictly necessary. It is perfectly reasonable for any
condition variable implementation to treat signal() as if it were
broadcast().

> 
> The example code contains two threads.  Here's an excerpt,
> slightly reformatted and without most of the comments:
> 

[snip code]

> 
> 
> So in this example, why is it not adequate for the while statement
> on line 44 to be an if statement instead?  I'll concede that a while
> is better defensive programming to use while, but it doesn't seem
> strictly necessary as the text claims.
>

In this specific example, with only two threads, an if() might be
adequate. However, as soon as you add a second consumer thread, there
is the potential for a thread to come out of cyg_cond_wait() when
buffer_empty is true and then proceed to process bogus data.

In the real world, this is very common, and one might forget to go
back an change that if() to a while(). So it's best to get into the
habit of using while() all the time.

> 
> It seems fairly clear that line 19 "buffer_empty =false" in thread_a should
> actually come after line 22 acquires the mutex, in order to prevent exactly
> this sort of race condition.

I agree. This looks like a error. Probably the result of some "tidying
up".

> 
> A more general problem with the example code is that if there is a single
> buffer shared between thread_a and thread_b, there needs to be something
> to prevent thread_a from refilling the buffer before thread_b is done
> with it.  To solve this problem, it may be necessary to move the
> "acquire data" portion of thread_a after the mutex has been locked.  So
> the code with these two fixes would be:
> 
> 14      while (1)
> 15        {
>             cyg_mutex_lock (& mut_cond_var);
> 16          // Acquire data into the buffer...
> 19          buffer_empty = false;
> 25          cyg_cond_signal (& cond_var);
> 28          cyg_mutex_unlock (& mut_cond_var);
> 29        }
> 
> Since the thread has the mutex locked during the acquisition of data and
> setting buffer_empty = false, those could be done in either order, but as
> a matter of style it seems best to not set the variable until after the
> acquisition is completed.
> 

A possible problem with this approach is that if we have priority
inversion protection enabled, it is possible for the thread to end up
doing the entire data acquisition at an unnaturally raised priority.
It would be nice to avoid that.

An alternative approach would be to change the buffer_empty variable
into a buffer_owner, or a buffer_state variable that contains slightly
more information. This would allow thread_a to distinguish between the
buffer being unused, and being in use by thread_b.

-- 
Nick Garnett                    eCos Kernel Architect
http://www.ecoscentric.com/     The eCos and RedBoot experts

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2003-04-28 12:03 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-28  8:18 [ECOS] questions about condition variable example from eCos book Eric Smith
2003-04-28 12:16 ` Nick Garnett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).