Bugs in eCos SMP scheduler

public inbox for ecos-devel@sourceware.org

* Bugs in eCos SMP scheduler
@ 2004-09-06 11:29 sandeep
  2004-09-07 12:54 ` sandeep
From: sandeep @ 2004-09-06 11:29 UTC
  To: ecos-devel; +Cc: Nick Garnett

hi nick,

Recently I observed a new situation related to the scheduler, where the sched_lock
count was becoming negative (0xFFFFFFFF/0xFFFFFFFE) and the holder of the lock was
HAL_SMP_CPU_NONE (i.e. no processor was the owner of the scheduler lock).

Well, the cause is this: although the code that increments sched_lock makes sure
that only the owner can increment the count, zero_sched_lock/set_sched_lock/
get_sched_lock don't respect the notion of an owner (processor) of sched_lock.
This also introduces race conditions into the system, with predictable results.

Consider a sample two-processor configuration, with threads T1, T2, T3, T4, ...
running on a system comprising processors P1 and P2.

- Currently no processor owns the lock and the sched_lock count is 0.
- T1 (on P2) completes its execution of thread_entry and later, in the
   user-specified thread entry function, takes the scheduler lock
   (owner = P2, count = 1).
- T2 (on P1) is executing the thread_entry function and calls
   zero_sched_lock (owner = NONE, count = 0).
- T1 (on P2) unlocks the scheduler __AND__ the scheduler lock count becomes -1
   (0xFFFFFFFF, assuming a 32-bit data type), with owner NONE.
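The first interleaving can be replayed deterministically on a single host
thread. The sketch below is a simplified model, not the eCos code: it assumes
the lock is just an owning CPU plus a 32-bit count, that incrementing respects
ownership while zeroing and decrementing do not (as described above), and it
uses hypothetical names (sched_lock_model, CPU_NONE standing in for
HAL_SMP_CPU_NONE, CPU_P1/CPU_P2).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: CPU_NONE plays the role of HAL_SMP_CPU_NONE,
   CPU_P1 and CPU_P2 are the two processors in the example. */
enum { CPU_NONE = -1, CPU_P1 = 0, CPU_P2 = 1 };

/* Simplified model of the SMP scheduler lock: an owning CPU plus a
   32-bit nesting count. An illustration only, not the eCos sources. */
typedef struct {
    int      owner;
    uint32_t count;
} sched_lock_model;

/* Increment respects ownership: the first acquisition records the owner,
   later ones must come from the same CPU. In this single-threaded replay a
   different owner would mean the real code has to spin, so we assert. */
static void inc_sched_lock(sched_lock_model *l, int cpu)
{
    if (l->owner == CPU_NONE)
        l->owner = cpu;
    else
        assert(l->owner == cpu);
    l->count++;
}

/* Decrement has no ownership check; ownership is released at zero. */
static void dec_sched_lock(sched_lock_model *l)
{
    l->count--;
    if (l->count == 0)
        l->owner = CPU_NONE;
}

/* Zeroing ignores ownership, so any CPU can wipe another CPU's lock. */
static void zero_sched_lock(sched_lock_model *l)
{
    l->owner = CPU_NONE;
    l->count = 0;
}

/* Replays the first interleaving in program order on one host thread. */
static sched_lock_model scenario_1(void)
{
    sched_lock_model l = { CPU_NONE, 0 };
    inc_sched_lock(&l, CPU_P2);  /* T1 on P2 locks: owner P2, count 1   */
    zero_sched_lock(&l);         /* T2 on P1 in thread_entry: NONE, 0   */
    dec_sched_lock(&l);          /* T1 on P2 unlocks: 0 - 1 wraps round */
    return l;                    /* owner NONE, count 0xFFFFFFFF        */
}
```

Calling scenario_1() ends with owner CPU_NONE and count 0xFFFFFFFF, the same
state reported above.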

Another variation of the previous scenario:

- T1 (on P2) takes the sched lock (owner = P2, count = 1).
- T2 (on P1) executes zero_sched_lock (owner = NONE, count = 0).
- T1 (on P2) takes the sched lock again (owner = P2, count = 1)
   ** count should have been 2 **
- T1 (on P2) unlocks the scheduler (which causes it to enter unlock_inner and
   choose another thread to run).
   ** scheduling shouldn't have happened at this point **
- The next time T1 runs on any processor, it continues with its second
   scheduler unlock (which decrements the current sched lock value regardless
   of which processor, if any, is the owner), and the mess continues.
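The same kind of single-threaded replay works for this second variation.
Again this is a hypothetical model rather than the eCos implementation: a
return value of 1 from dec_sched_lock() simply marks the point where the real
unlock path would run unlock_inner and possibly switch threads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for HAL_SMP_CPU_NONE and the two processors. */
enum { CPU_NONE = -1, CPU_P1 = 0, CPU_P2 = 1 };

/* Simplified owner + nesting-count model of the SMP scheduler lock
   (an illustration only, not the eCos sources). */
typedef struct {
    int      owner;
    uint32_t count;
} sched_lock_model;

static void inc_sched_lock(sched_lock_model *l, int cpu)
{
    if (l->owner == CPU_NONE)
        l->owner = cpu;            /* first acquisition by this CPU     */
    else
        assert(l->owner == cpu);   /* other owner: real code would spin */
    l->count++;
}

/* Returns 1 when the count reaches 0, i.e. where the real unlock path
   would go on to run the scheduler and possibly switch threads. */
static int dec_sched_lock(sched_lock_model *l)
{
    l->count--;
    if (l->count == 0) {
        l->owner = CPU_NONE;
        return 1;
    }
    return 0;
}

static void zero_sched_lock(sched_lock_model *l)
{
    l->owner = CPU_NONE;           /* no ownership check */
    l->count = 0;
}

typedef struct {
    sched_lock_model lock;
    int early_reschedule;          /* 1 if T1's first unlock hit zero */
} scenario_2_result;

static scenario_2_result scenario_2(void)
{
    scenario_2_result r = { { CPU_NONE, 0 }, 0 };
    inc_sched_lock(&r.lock, CPU_P2);   /* T1 on P2: owner P2, count 1    */
    zero_sched_lock(&r.lock);          /* T2 on P1: owner NONE, count 0  */
    inc_sched_lock(&r.lock, CPU_P2);   /* T1 nests: count 1, should be 2 */
    r.early_reschedule = dec_sched_lock(&r.lock); /* zero one unlock early */
    dec_sched_lock(&r.lock);           /* T1's second unlock: wraps round */
    return r;                          /* owner NONE, count 0xFFFFFFFF   */
}
```

In the replay, T1's first unlock already drops the count to 0 (the premature
reschedule point), and its second unlock wraps the count to 0xFFFFFFFF.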

The current sched-lock incrementing code does a lock++ when a CPU becomes the
owner of the lock for the first time (instead of setting the count to 1). Hence,
if the sched_lock value had become -1 as in the previous case and no CPU was
the owner, it will become 0 rather than 1, and the system is in a mess.

For this small aspect, the fix is to replace "lock ++" with "lock = 1", but the
larger problem still remains.
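A minimal sketch contrasting the two ways of taking ownership, under the same
simplified owner + count model (hypothetical names, not the kernel sources).
Both variants start from the corrupted state left by the first scenario: no
owner, count 0xFFFFFFFF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for HAL_SMP_CPU_NONE and one processor. */
enum { CPU_NONE = -1, CPU_P1 = 0 };

/* Simplified owner + nesting-count model (illustration only). */
typedef struct {
    int      owner;
    uint32_t count;
} sched_lock_model;

/* Behaviour as described: a CPU taking ownership does count++,
   trusting that an unowned lock always has count 0. */
static void inc_with_increment(sched_lock_model *l, int cpu)
{
    if (l->owner == CPU_NONE)
        l->owner = cpu;
    else
        assert(l->owner == cpu);   /* other owner: real code would spin */
    l->count++;                    /* 0xFFFFFFFF + 1 wraps to 0         */
}

/* Suggested tweak: taking ownership sets the count to exactly 1;
   only nested acquisitions by the owner increment it. */
static void inc_with_set_one(sched_lock_model *l, int cpu)
{
    if (l->owner == CPU_NONE) {
        l->owner = cpu;
        l->count = 1;
    } else {
        assert(l->owner == cpu);   /* other owner: real code would spin */
        l->count++;
    }
}
```

With lock++ the CPU ends up owning a lock whose count is 0, so its next unlock
wraps around again; with lock = 1 the count is sane. Neither variant stops
another CPU from zeroing a lock it does not own, which is the larger problem.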

Possible (partial) solutions could be:

* Change the sched_lock zeroing code to check whether the processor executing
it is the owner of the scheduler lock, and proceed with zeroing only in that
case.

But this breaks the assumption that every eCos thread starts with a sched_lock
value of 0 (an assumption carried over from non-SMP eCos). The impact of this?
Possibly none.
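A sketch of the owner-checked zeroing proposed above, replayed against the
first scenario. The model and names are hypothetical simplifications; in
particular, returning 0 from zero_sched_lock_if_owner() is just how this
sketch reports that the lock belonged to another CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for HAL_SMP_CPU_NONE and the two processors. */
enum { CPU_NONE = -1, CPU_P1 = 0, CPU_P2 = 1 };

/* Simplified owner + nesting-count model (illustration only). */
typedef struct {
    int      owner;
    uint32_t count;
} sched_lock_model;

static void inc_sched_lock(sched_lock_model *l, int cpu)
{
    if (l->owner == CPU_NONE)
        l->owner = cpu;
    else
        assert(l->owner == cpu);   /* other owner: real code would spin */
    l->count++;
}

static void dec_sched_lock(sched_lock_model *l)
{
    assert(l->count > 0);          /* an underflow would be the bug again */
    l->count--;
    if (l->count == 0)
        l->owner = CPU_NONE;
}

/* Proposed zeroing: only the owning CPU may clear the lock.
   Returns 1 if the lock was cleared, 0 if it belonged to someone else. */
static int zero_sched_lock_if_owner(sched_lock_model *l, int cpu)
{
    if (l->owner != cpu)
        return 0;
    l->owner = CPU_NONE;
    l->count = 0;
    return 1;
}

/* The first interleaving again, using the owner-checked zeroing. */
static sched_lock_model scenario_1_checked(int *t2_zeroed)
{
    sched_lock_model l = { CPU_NONE, 0 };
    inc_sched_lock(&l, CPU_P2);                        /* T1 on P2: P2, 1    */
    *t2_zeroed = zero_sched_lock_if_owner(&l, CPU_P1); /* T2 on P1: refused  */
    dec_sched_lock(&l);                                /* T1 unlocks: NONE, 0 */
    return l;
}
```

In this replay T2's zeroing is refused, T1's unlock brings the lock cleanly
back to owner NONE and count 0, and the assert in dec_sched_lock() never
fires. The refusal is also exactly the caveat above: T2 no longer gets an
unconditional fresh count of 0.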

Unless I am missing something obvious, a fair amount of change might be
required to make eCos SMP-safe, ranging from changes to unlock_inner to the
possibility that the non-SMP eCos scheduling model cannot be extended to SMP
at all.

I have sat over these observations and tried to analyse them before writing
this mail, and I am still allowing for flaws in my analysis/understanding of
eCos. But the observations mentioned in this mail (and others not mentioned)
can't be explained any other way, at least as of now.

To save the reader's time, the SMP startup flow is as follows:

  Cyg_scheduler :: start ()
  --> HAL_SMP_CPU_START
  --> cyg_hal_smp_start
  --> cyg_hal_smp_startup
  --> cyg_kernel_smp_startup (takes scheduler lock and calls start_cpu)
  --> start_cpu (gets next thread to schedule and loads it)
  --> thread_entry (zeroes the scheduler lock and calls the actual thread entry
      point specified by the user at thread creation)

-- 
regards
sandeep
--------------------------------------------------------------------------
Walk softly and carry a megawatt laser.
--------------------------------------------------------------------------


* Re: Bugs in eCos SMP scheduler
  2004-09-06 11:29 Bugs in eCos SMP scheduler sandeep
@ 2004-09-07 12:54 ` sandeep
From: sandeep @ 2004-09-07 12:54 UTC
  To: Nick Garnett; +Cc: ecos-devel

hi nick,
Apologies for the previous post. I guess looking at hal_smp.h and smp.hxx
simultaneously led to it. Since you folks have tested SMP eCos on i386 using the
Red Hat/eCosCentric testbeds, the bugs are probably not in SMP eCos itself, so
the problem(s) are more likely in the HAL.
I think I have identified at least one problem: the example situation given in
my earlier post could happen with the existing HAL (in some scenarios). I will
handle it locally, and if I observe issues with the non-HAL parts of eCos, I
will share them with the list, from a non-company email address, to help make
eCos sturdier.
--
regards
sandeep
--------------------------------------------------------------------------
God doesn't play dice.
		-- Albert Einstein
--------------------------------------------------------------------------

