public inbox for ecos-devel@sourceware.org
 help / color / mirror / Atom feed
* Cyg_Mutex::check_this() fails
@ 2013-02-13 11:36 René Schipp von Branitz Nielsen
  2013-03-15 17:56 ` jamescmaki
  0 siblings, 1 reply; 3+ messages in thread
From: René Schipp von Branitz Nielsen @ 2013-02-13 11:36 UTC (permalink / raw)
  To: ecos-devel

...Sending this from my private mail account because the eCos mailserver
blocks mail containing confidentiality disclaimers, which I can't prevent
when sending from my work mail account... :( :(

--------------------oOo--------------------

Hello folks,

I have a condition variable, c, tied to a mutex, m, and used like this with
a FIFO, f:

void producer(void)
{
    cyg_mutex_lock(&m);
    copy_to_fifo(&f, some_data);
    cyg_cond_signal(&c);
    cyg_mutex_unlock(&m);
}

void consumer_thread(cyg_addrword_t data) {
    cyg_mutex_lock(&m);
    while (1) {
        while (fifo_empty(&f)) {
            cyg_cond_wait(&c);
        }

        // m is locked here.

        // Empty FIFO.
        while (!fifo_empty(&f)) {
            copy_from_fifo(&f, &some_data);
            cyg_mutex_unlock();

            // Do something with some_data
            ...

            cyg_mutex_lock();
        }
    }
}

The following description refers to line numbers of rev. 1.15 of mutex.cxx.

When the system is under heavy interrupt load, threads may get scheduled in
and out more frequently than when not. Under these circumstances, I
sometimes get an assertion (cyg_assert_msg()) stemming from line 197 of
mutex.cxx, which is Cyg_Mutex::check_this(). Placing a breakpoint in this
function reveals that it happens when the consumer is about to wake up, that
is, in line 651, which is the second half of
Cyg_Condition_Variable::wait_inner( Cyg_Mutex *mx ).

A closer look at wait_inner() shows that when CYG_ASSERTCLASS( mx, "Corrupt
mutex") is invoked, the scheduler is not locked, which in turn means that
Cyg_Mutex::check_this() line 197 is tested non-atomically. Line 197
contains:
    if (( locked && owner == NULL ) return false;

So if the preemptive scheduler schedules the caller of cyg_cond_wait() out
in between the test of "locked" and "owner == NULL", and the mutex state
changes while scheduled out, we have a problem.

As I see it, CYG_ASSERTCLASS(some_obj, "some message") serves two purposes:
  1) check that some_obj is non-NULL and
  2) check that some_obj->check_this() returns TRUE.

IMO, only the first check needs to be made by wait_inner(), because line
#677 attempts to reacquire the mutex (mx->lock()), which itself performs the
check_this() check - and with the scheduler locked.

There are other places in mutex.cxx where CYG_ASSERTCLASS(Cyg_Mutex,
"message") is invoked without the scheduler locked, but I can't judge
whether these are OK or not.

Bottomline is that I suggest to change line #651 of mutex.cxx from
  CYG_ASSERTCLASS( mx, "Corrupt mutex"); to
  CYG_ASSERT(mx, "Invalid mutex pointer"); /* Or some other message */

Or is there anything fundamental, I have missed here?
Comments are appreciated.

Thanks in advance,
René Schipp von Branitz Nielsen
Vitesse Semiconductor Corporation


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Cyg_Mutex::check_this() fails
  2013-02-13 11:36 Cyg_Mutex::check_this() fails René Schipp von Branitz Nielsen
@ 2013-03-15 17:56 ` jamescmaki
  2013-03-15 19:12   ` jamescmaki
  0 siblings, 1 reply; 3+ messages in thread
From: jamescmaki @ 2013-03-15 17:56 UTC (permalink / raw)
  To: ecos-devel

Hi Rene,

At this point in the code the scheduler is not locked and the mutex is not
owned. This means that it is a bad idea to be looking at the internal state
of the mutex, i.e., calling CYG_ASSERTCLASS( mx, "Corrupt mutex").

I can reproduce this issue consistently now and I am testing a fix that
changes CYG_ASSERTCLASS to CYG_ASSERT. I am hoping that this will resolve
the problem instead of just moving the failed assert into Cyg_Mutex::lock().

However even if this does fix the problem, I am still wary. From what I can
see, locked and owner are never modified without locking the scheduler. This
means that they are both updated atomically. Because of this, when the
thread in  wait_inner() resumes it should see both of these variables in a
consistent state regardless of whether it holds a lock or not. I am
wondering if this can be explained by reordering and by delaying the check
until the scheduler is locked again it avoids this problem.

-James





--
View this message in context: http://sourceware-org.1504.n7.nabble.com/Cyg-Mutex-check-this-fails-tp222044p225589.html
Sent from the Sourceware - ecos-devel mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Cyg_Mutex::check_this() fails
  2013-03-15 17:56 ` jamescmaki
@ 2013-03-15 19:12   ` jamescmaki
  0 siblings, 0 replies; 3+ messages in thread
From: jamescmaki @ 2013-03-15 19:12 UTC (permalink / raw)
  To: ecos-devel

OK I didn't think that through fully.

if (( locked && owner == NULL )

The thread sees locked == true, gets rescheduled, then sees owner == NULL.



--
View this message in context: http://sourceware-org.1504.n7.nabble.com/Cyg-Mutex-check-this-fails-tp222044p225606.html
Sent from the Sourceware - ecos-devel mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-03-15 19:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-13 11:36 Cyg_Mutex::check_this() fails René Schipp von Branitz Nielsen
2013-03-15 17:56 ` jamescmaki
2013-03-15 19:12   ` jamescmaki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).