public inbox for frysk@sourceware.org
 help / color / mirror / Atom feed
* EventLoop.isCurrentThread() race condition
@ 2007-06-22 11:39 Kris Van Hees
  2007-06-22 16:31 ` Andrew Cagney
  0 siblings, 1 reply; 10+ messages in thread
From: Kris Van Hees @ 2007-06-22 11:39 UTC (permalink / raw)
  To: frysk

I was wondering whether anyone is actually depending on the side effect
of the isCurrentThread() method in frysk.event.EventLoop, in that it
assigns the tid of the current thread as the identifier for the event
loop thread if it wasn't assigned before.

Reason for asking is described in bugzilla #4688.  Essentially, we end
up with a race condition on what thread ends up setting the tid variable
in the EventLoop class.  There are cases where the EventLoop is
instantiated (tid is -1 at that point), and then other threads are
started that may trigger operations in the core that (usually
indirectly) call isCurrentThread() to determine whether work can be done
in-line or needs to be placed on the queue.  If that call to
isCurrentThread() is reached before the actual event loop thread
executes either runPolling() or runPending(), the event loop will
register the wrong tid, causing an exception later on (in updateTid())
when runPolling() or runPending() is called (from the correct thread).

The fix is of course to remove the side effect from isCurrentThread(),
but I'd rather not commit that until I know that there isn't someone
who actually depends on this behaviour since it has been in existance
since Apr 18, 2007.

For reference, I ran all tests with the current side effect in place,
and then re-ran all tests again with the patch applied.  The results
were identical (modulo tests that intermittently fail due to the
updateTid() esception).

	Cheers,
	Kris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-22 11:39 EventLoop.isCurrentThread() race condition Kris Van Hees
@ 2007-06-22 16:31 ` Andrew Cagney
  2007-06-22 17:14   ` Kris Van Hees
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cagney @ 2007-06-22 16:31 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: frysk

There are two ways to run the eventLoop:

a) in the main thread of a single threaded app, as the frysk.proc tests 
and utilities all do
b) in a separate thread, started early by the main thread by having it 
call eventLoop.start() (the gui has a bug here, npremji was looking at 
it, part of which is not extending Thread)

That logic detects when neither of those strict models are being 
followed - and a deadlock will likely result (the signal goes to the 
wrong thread, or there is no thread to wake up).

Andrew


the code you refer
Kris Van Hees wrote:
> I was wondering whether anyone is actually depending on the side effect
> of the isCurrentThread() method in frysk.event.EventLoop, in that it
> assigns the tid of the current thread as the identifier for the event
> loop thread if it wasn't assigned before.
>
> Reason for asking is described in bugzilla #4688.  Essentially, we end
> up with a race condition on what thread ends up setting the tid variable
> in the EventLoop class.  There are cases where the EventLoop is
> instantiated (tid is -1 at that point), and then other threads are
> started that may trigger operations in the core that (usually
> indirectly) call isCurrentThread() to determine whether work can be done
> in-line or needs to be placed on the queue.  If that call to
> isCurrentThread() is reached before the actual event loop thread
> executes either runPolling() or runPending(), the event loop will
> register the wrong tid, causing an exception later on (in updateTid())
> when runPolling() or runPending() is called (from the correct thread).
>
> The fix is of course to remove the side effect from isCurrentThread(),
> but I'd rather not commit that until I know that there isn't someone
> who actually depends on this behaviour since it has been in existance
> since Apr 18, 2007.
>
> For reference, I ran all tests with the current side effect in place,
> and then re-ran all tests again with the patch applied.  The results
> were identical (modulo tests that intermittently fail due to the
> updateTid() esception).
>
> 	Cheers,
> 	Kris
>   

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-22 16:31 ` Andrew Cagney
@ 2007-06-22 17:14   ` Kris Van Hees
  2007-06-22 19:04     ` Andrew Cagney
  0 siblings, 1 reply; 10+ messages in thread
From: Kris Van Hees @ 2007-06-22 17:14 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Kris Van Hees, frysk

Agreed, and thanks for pointing out that I did actually overlook one
call to updateTid() (from run() in EventLoop).  However, I still do not
really see why isCurrentThread() would record the tid if it hasn't been
set yet.  It seems to be an undesirable complication that can give rise
to a race condition where none should exist.  Some tests in frysk-core
are actually using runPending or runPolling in a multi-threaded context,
which is obviously incorrect given your explanation below, but instead
of it triggering a clear exception, it causes an intermittent failure.
Such indeterministic behaviour should be avoided I think, especially as
it contributes to intermittent testsuite failures, reducing the
effectiveness of tests.

Also note that the comments on e.g. the run() method of EventLoop state
that existing pending events are processed before the first poll.  But
(especially in view of existing code in classes derived from Request)
that again opens up the problem that isCurrentThread() may be called
prior to EventLoop.start(), i.e. before the tid is assigned the
"correct" value.  Prohibiting pending events prior to the cal to start()
might be one way to go for multi-threaded applications, but that seems
to be the wrong way to go I think.

The proposed patch (if tid is -1, isCurrentThread() wil return false)
solves the problem of being indeterministic, and acts appropriately (as
far as I can determine) in both single threaded and multi threaded apps:

1. single threaded: No call has been made to runPending or runPolling, so
   the event loop has not executed yet.  If a call to isCurrentThread()
   is used to determine on whether a request should be executed
   synchronously or be added as an event, an event will be added.  It
   will still executed within the context of the single thread, albeit
   not until runPending or runPolling is executed.
2. multi threaded: The event loop is not running yet, so there isn't
   really an event loop thread yet.  Adding the request as an event to
   be processed as soonas the event loop thread starts seems quite
   correct in that case.  Alternatively, executing the request in the
   currently executing thread would mean later requests execute in a
   different thread (actual event loop), which is probably not what is
   supposed to be supported anyway.

Finally, if it is important for a request to be able to determine
whether isCurrentThread() returns false as a result of the event loop
not having been started (either as a separate thread or because
runPending/runPolling was not called yet) vs being a result of the
currently executing thread not being the event loop thread, it is easy
enough to add a method to EventLoop to query for that very situation
(e.g. hasStarted() or something similar).

	Cheers,
	Kris

On Fri, Jun 22, 2007 at 11:09:52AM -0400, Andrew Cagney wrote:
> There are two ways to run the eventLoop:
> 
> a) in the main thread of a single threaded app, as the frysk.proc tests 
> and utilities all do
> b) in a separate thread, started early by the main thread by having it 
> call eventLoop.start() (the gui has a bug here, npremji was looking at 
> it, part of which is not extending Thread)
> 
> That logic detects when neither of those strict models are being 
> followed - and a deadlock will likely result (the signal goes to the 
> wrong thread, or there is no thread to wake up).
> 
> Andrew
> 
> 
> the code you refer
> Kris Van Hees wrote:
> >I was wondering whether anyone is actually depending on the side effect
> >of the isCurrentThread() method in frysk.event.EventLoop, in that it
> >assigns the tid of the current thread as the identifier for the event
> >loop thread if it wasn't assigned before.
> >
> >Reason for asking is described in bugzilla #4688.  Essentially, we end
> >up with a race condition on what thread ends up setting the tid variable
> >in the EventLoop class.  There are cases where the EventLoop is
> >instantiated (tid is -1 at that point), and then other threads are
> >started that may trigger operations in the core that (usually
> >indirectly) call isCurrentThread() to determine whether work can be done
> >in-line or needs to be placed on the queue.  If that call to
> >isCurrentThread() is reached before the actual event loop thread
> >executes either runPolling() or runPending(), the event loop will
> >register the wrong tid, causing an exception later on (in updateTid())
> >when runPolling() or runPending() is called (from the correct thread).
> >
> >The fix is of course to remove the side effect from isCurrentThread(),
> >but I'd rather not commit that until I know that there isn't someone
> >who actually depends on this behaviour since it has been in existance
> >since Apr 18, 2007.
> >
> >For reference, I ran all tests with the current side effect in place,
> >and then re-ran all tests again with the patch applied.  The results
> >were identical (modulo tests that intermittently fail due to the
> >updateTid() esception).
> >
> >	Cheers,
> >	Kris
> >  
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-22 17:14   ` Kris Van Hees
@ 2007-06-22 19:04     ` Andrew Cagney
  2007-06-22 19:36       ` Kris Van Hees
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cagney @ 2007-06-22 19:04 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: frysk

Kris Van Hees wrote:
> Some tests in frysk-core
> are actually using runPending or runPolling in a multi-threaded context
Can you be more specific here, a thread and perhaps with a sequence 
diagram?  .runPending and .runPolling should be robust in a 
multi-threaded context provided they are being called from the main thread.

My guess is that the above is holding, but then a secondary thread is 
calling into the eventLoop before the primary thread has had a chance to 
pin down that it is the eventLoop thread.  But without knowing what 
specific problem you're refering to it remains a guess :-(

To expand on my notes:

There are two ways to run the eventLoop:
> 
> a) in the main thread of a single threaded app, as the frysk.proc tests 
> and utilities all do

Here the event-loop is considered to always be running and is bound to the main thread.  Hence events can be added at any time; the first such call locks this in.

> b) in a separate thread, started early by the main thread by having it 
> call eventLoop.start() (the gui has a bug here, npremji was looking at 
> it, part of which is not extending Thread)

Here the eventLoop isn't started until .start returns; so attempts to add events prior to that are not valid; code needing an eventLoop thread should start it before anything else.  When .start returns the event-loop is guaranteed to be ready.

Or as you put it, pending events before the event-loop is started is indeed prohibited.  Is this too onerous?  I don't think so:

- the intent is to stop/start the event-loop in which case it _must_ remain on the main thread (otherwize ptrace gets scrambled)

- the intent is to be multi-threaded; start that thread

To make this more concrete, can you explain what you are trying to do?  Perhaps with that understanding we can appreciate where the problem is.

Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-22 19:04     ` Andrew Cagney
@ 2007-06-22 19:36       ` Kris Van Hees
  2007-06-22 19:42         ` Andrew Cagney
  0 siblings, 1 reply; 10+ messages in thread
From: Kris Van Hees @ 2007-06-22 19:36 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Kris Van Hees, frysk

On Fri, Jun 22, 2007 at 01:14:06PM -0400, Andrew Cagney wrote:
> Kris Van Hees wrote:
> >Some tests in frysk-core
> >are actually using runPending or runPolling in a multi-threaded context
> Can you be more specific here, a thread and perhaps with a sequence
> diagram?  .runPending and .runPolling should be robust in a
> multi-threaded context provided they are being called from the main thread.
>
> My guess is that the above is holding, but then a secondary thread is
> calling into the eventLoop before the primary thread has had a chance to
> pin down that it is the eventLoop thread.  But without knowing what
> specific problem you're refering to it remains a guess :-(

To quote from my original email:
>>Reason for asking is described in bugzilla #4688.  Essentially, we end
>>up with a race condition on what thread ends up setting the tid variable
>>in the EventLoop class.  There are cases where the EventLoop is
>>instantiated (tid is -1 at that point), and then other threads are
>>started that may trigger operations in the core that (usually
>>indirectly) call isCurrentThread() to determine whether work can be done
>>in-line or needs to be placed on the queue.  If that call to
>>isCurrentThread() is reached before the actual event loop thread
>>executes either runPolling() or runPending(), the event loop will
>>register the wrong tid, causing an exception later on (in updateTid())
>>when runPolling() or runPending() is called (from the correct thread).

I believe that explains what you mention above rather well.

> Or as you put it, pending events before the event-loop is started is indeed
> prohibited.  Is this too onerous?  I don't think so:

I assumed that would be the case, which means that the comment block
preceding the run() method in EventLoop is incorrect and definitely
should be updated.

I quote:
    /**
     * Run the event-loop.
     *
     * The event loop is stopped by calling requestStop, and is
     * stopped after all pending events have been processed.  Any
     * existing pending events are always processed before performing
     * the first poll.
     */

Obviously, there should not be existing pending events, and processing
them would in fact lead to an exception.

Can you explain what you are failing to understand in my description of
the problem?

	Cheers,
	Kris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-22 19:36       ` Kris Van Hees
@ 2007-06-22 19:42         ` Andrew Cagney
  2007-06-22 22:25           ` Kris Van Hees
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cagney @ 2007-06-22 19:42 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: frysk

Kris Van Hees wrote:
> On Fri, Jun 22, 2007 at 01:14:06PM -0400, Andrew Cagney wrote:
>   
>> Kris Van Hees wrote:
>>     
>>> Some tests in frysk-core
>>> are actually using runPending or runPolling in a multi-threaded context
>>>       
>> Can you be more specific here, a thread and perhaps with a sequence
>> diagram?  .runPending and .runPolling should be robust in a
>> multi-threaded context provided they are being called from the main thread.
>>
>>     

There's a typo by me in the above - more specific as in the test[s] - 
i.e., which specific test is causing this problem?  This will give me 
some context so that I can understand the events that that lead to you 
identifying this issue.

It sounds like the underlying problem is a misplaced or missing call to 
.runPending() or .start; but with no context, I can't tell.  If that 
logic is correctly detecting that the event-loop thread switched, then 
that isn't a bug.

> I assumed that would be the case, which means that the comment block
> preceding the run() method in EventLoop is incorrect and definitely
> should be updated.
>
>   

Yes.

> I quote:
>     /**
>      * Run the event-loop.
>      *
>      * The event loop is stopped by calling requestStop, and is
>      * stopped after all pending events have been processed.  Any
>      * existing pending events are always processed before performing
>      * the first poll.
>      */
>
> Obviously, there should not be existing pending events, and processing
> them would in fact lead to an exception.
>
> Can you explain what you are failing to understand in my description of
> the problem?
>   
> 	Cheers,
> 	Kris
>   

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-22 19:42         ` Andrew Cagney
@ 2007-06-22 22:25           ` Kris Van Hees
  2007-06-23  1:15             ` Andrew Cagney
  0 siblings, 1 reply; 10+ messages in thread
From: Kris Van Hees @ 2007-06-22 22:25 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Kris Van Hees, frysk

Take frysk.proc.ptrace.TestByteBuffer.
Comment out all tests up until the definition of the AsyncModify private class.
(I.e. comment out from the beginning of the verifySlice() method up to
but not including the declaration of the AsyncModify private class.)
Run the test a few times.

On Fri, Jun 22, 2007 at 03:36:08PM -0400, Andrew Cagney wrote:
> Kris Van Hees wrote:
> >On Fri, Jun 22, 2007 at 01:14:06PM -0400, Andrew Cagney wrote:
> >  
> >>Kris Van Hees wrote:
> >>    
> >>>Some tests in frysk-core
> >>>are actually using runPending or runPolling in a multi-threaded context
> >>>      
> >>Can you be more specific here, a thread and perhaps with a sequence
> >>diagram?  .runPending and .runPolling should be robust in a
> >>multi-threaded context provided they are being called from the main 
> >>thread.
> >>
> >>    
> 
> There's a typo by me in the above - more specific as in the test[s] - 
> i.e., which specific test is causing this problem?  This will give me 
> some context so that I can understand the events that that lead to you 
> identifying this issue.
> 
> It sounds like the underlying problem is a misplaced or missing call to 
> .runPending() or .start; but with no context, I can't tell.  If that 
> logic is correctly detecting that the event-loop thread switched, then 
> that isn't a bug.
> 
> >I assumed that would be the case, which means that the comment block
> >preceding the run() method in EventLoop is incorrect and definitely
> >should be updated.
> >
> >  
> 
> Yes.
> 
> >I quote:
> >    /**
> >     * Run the event-loop.
> >     *
> >     * The event loop is stopped by calling requestStop, and is
> >     * stopped after all pending events have been processed.  Any
> >     * existing pending events are always processed before performing
> >     * the first poll.
> >     */
> >
> >Obviously, there should not be existing pending events, and processing
> >them would in fact lead to an exception.
> >
> >Can you explain what you are failing to understand in my description of
> >the problem?
> >  
> >	Cheers,
> >	Kris
> >  
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-22 22:25           ` Kris Van Hees
@ 2007-06-23  1:15             ` Andrew Cagney
  2007-06-25 13:54               ` Kris Van Hees
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cagney @ 2007-06-23  1:15 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: frysk

Kris Van Hees wrote:
> Take frysk.proc.ptrace.TestByteBuffer.
> Comment out all tests up until the definition of the AsyncModify private class.
> (I.e. comment out from the beginning of the verifySlice() method up to
> but not including the declaration of the AsyncModify private class.)
> Run the test a few times.
>   

or ./TestRunner -r 1000 frysk.proc.ptrace.TestByteBuffer (although I 
didn't see it fail).

Anyway, yes there is a race with the secondary thread getting to the 
event-loop before the main thread; I've fixed that test.  Thanks!

Andrew

> On Fri, Jun 22, 2007 at 03:36:08PM -0400, Andrew Cagney wrote:
>   
>> Kris Van Hees wrote:
>>     
>>> On Fri, Jun 22, 2007 at 01:14:06PM -0400, Andrew Cagney wrote:
>>>  
>>>       
>>>> Kris Van Hees wrote:
>>>>    
>>>>         
>>>>> Some tests in frysk-core
>>>>> are actually using runPending or runPolling in a multi-threaded context
>>>>>      
>>>>>           
>>>> Can you be more specific here, a thread and perhaps with a sequence
>>>> diagram?  .runPending and .runPolling should be robust in a
>>>> multi-threaded context provided they are being called from the main 
>>>> thread.
>>>>
>>>>    
>>>>         
>> There's a typo by me in the above - more specific as in the test[s] - 
>> i.e., which specific test is causing this problem?  This will give me 
>> some context so that I can understand the events that that lead to you 
>> identifying this issue.
>>
>> It sounds like the underlying problem is a misplaced or missing call to 
>> .runPending() or .start; but with no context, I can't tell.  If that 
>> logic is correctly detecting that the event-loop thread switched, then 
>> that isn't a bug.
>>
>>     
>>> I assumed that would be the case, which means that the comment block
>>> preceding the run() method in EventLoop is incorrect and definitely
>>> should be updated.
>>>
>>>  
>>>       
>> Yes.
>>
>>     
>>> I quote:
>>>    /**
>>>     * Run the event-loop.
>>>     *
>>>     * The event loop is stopped by calling requestStop, and is
>>>     * stopped after all pending events have been processed.  Any
>>>     * existing pending events are always processed before performing
>>>     * the first poll.
>>>     */
>>>
>>> Obviously, there should not be existing pending events, and processing
>>> them would in fact lead to an exception.
>>>
>>> Can you explain what you are failing to understand in my description of
>>> the problem?
>>>  
>>> 	Cheers,
>>> 	Kris
>>>  
>>>       

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-23  1:15             ` Andrew Cagney
@ 2007-06-25 13:54               ` Kris Van Hees
  2007-06-27  1:20                 ` Andrew Cagney
  0 siblings, 1 reply; 10+ messages in thread
From: Kris Van Hees @ 2007-06-25 13:54 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Kris Van Hees, frysk

On Fri, Jun 22, 2007 at 07:54:07PM -0400, Andrew Cagney wrote:
> Kris Van Hees wrote:
> >Take frysk.proc.ptrace.TestByteBuffer.
> >Comment out all tests up until the definition of the AsyncModify private 
> >class.
> >(I.e. comment out from the beginning of the verifySlice() method up to
> >but not including the declaration of the AsyncModify private class.)
> >Run the test a few times.
> >  
> 
> or ./TestRunner -r 1000 frysk.proc.ptrace.TestByteBuffer (although I 
> didn't see it fail).
> 
> Anyway, yes there is a race with the secondary thread getting to the 
> event-loop before the main thread; I've fixed that test.  Thanks!

I see that the way you "fixed" the test is functionally identical to the
"hack" I use in TestMemorySpaceByteBuffer to work around this EventLoop
problem.  I hope this hack is not meant to be the final solution to this
race condition?  This really should be resolved in EventLoop, rather than
introducing a hack to make this work, and potentially getting stuck with
this problem popping up in other existing and future code.  Depending on
side effects in what appears to be a basic query method is dangerous (as
we can see from the issue with TestByteBuffer).

By the way, the comment to run() in EventLoop is yet to be updated to
reflect the correct behaviour?

	Cheers,
	Kris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: EventLoop.isCurrentThread() race condition
  2007-06-25 13:54               ` Kris Van Hees
@ 2007-06-27  1:20                 ` Andrew Cagney
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cagney @ 2007-06-27  1:20 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: frysk

Kris Van Hees wrote:
> I see that the way you "fixed" the test is functionally identical to the
> "hack" I use in TestMemorySpaceByteBuffer to work around this EventLoop
> problem.  I hope this hack is not meant to be the final solution to this
> race condition?  This really should be resolved in EventLoop, rather than
> introducing a hack to make this work, and potentially getting stuck with
> this problem popping up in other existing and future code.  Depending on
> side effects in what appears to be a basic query method is dangerous (as
> we can see from the issue with TestByteBuffer).
>
>   

We could explore ways of creating the event-loop so that the intent is 
dictated up-front; I'll put that on my to-do list.  The event-loop is 
due for another pass anyway.

Btw, the test you added for Memory transfers, that is to tickle the 
multi-byte transfer interface, right?

Andrew


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-06-25 13:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-22 11:39 EventLoop.isCurrentThread() race condition Kris Van Hees
2007-06-22 16:31 ` Andrew Cagney
2007-06-22 17:14   ` Kris Van Hees
2007-06-22 19:04     ` Andrew Cagney
2007-06-22 19:36       ` Kris Van Hees
2007-06-22 19:42         ` Andrew Cagney
2007-06-22 22:25           ` Kris Van Hees
2007-06-23  1:15             ` Andrew Cagney
2007-06-25 13:54               ` Kris Van Hees
2007-06-27  1:20                 ` Andrew Cagney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).