public inbox for ecos-devel@sourceware.org
 help / color / mirror / Atom feed
* Subtle bug not calling DSRs.
@ 2005-12-22 19:11 Sergei Organov
  2006-01-03 10:40 ` Nick Garnett
  0 siblings, 1 reply; 3+ messages in thread
From: Sergei Organov @ 2005-12-22 19:11 UTC (permalink / raw)
  To: ecos-devel

Hello,

I believe I've found bug when eCos doesn't call pended DSR(s) until
_next_ time the scheduler lock is released. The scenario I see is like
this:

1. Start single thread that is run as soon as scheduler is enabled.
2. The thread suspends itself that way or another (wait on a primitive
   or just sleep, -- it doesn't matter).
3. The (2) results in context switch from this thread to the idle thread
   from within Cyg_Scheduler::unlock_inner(0).
4. During context switch interrupt happens that is requesting
   corresponding DSR to be run. However, as scheduler is locked during
   context switch, the DSR is not invoked on exit from the interrupt.
5. Context switch continues and transfers control to the idle thread
   entry address being Cyg_HardwareThread::thread_entry(). This is
   because the idle thread had no chance to run before this point.
6. Cyg_HardwareThread::thread_entry() has no code to check for pended
   DSRs, so the DSR posted in (4) is not run.

In my case system just hangs (loops forever in idle thread) at this
point as my thread (1) in fact waits for the interrupt, but
corresponding DSR that is meant to wakeup the thread is never called.

The quick-and-dirty fix to the problem is:

--- thread.cxx	14 Dec 2005 13:48:31 -0000
+++ thread.cxx	22 Dec 2005 18:24:03 -0000
@@ -102,6 +102,9 @@
     Cyg_Scheduler::zero_sched_lock();     // the assignment into the code above.
     HAL_REORDER_BARRIER();
 
+    Cyg_Scheduler::lock();
+    Cyg_Scheduler::unlock();
+    
     // Call entry point in a loop.
 
     for(;;)

But while we are at it, it seems that entire prologue of the
Cyg_HardwareThread::thread_entry() should be moved to a new method of
the Cyg_Scheduler class as the prologue consists entirely of the
intimate details of how scheduler works. Moreover, this prologue is in
fact a slightly modified copy of the code found in the
Cyg_Scheduler::unlock_inner() executing after the line

                HAL_THREAD_SWITCH_CONTEXT( &current->stack_ptr,
                                           &next->stack_ptr );

Even if this common part can't be easily factored out, it is much better
to at least keep two slightly different copies in one place close to
each other I think.

Anyway, somebody more experienced in the eCos internal architecture
(Nick?) may wish to take a look at this and could probably come up with
a better patch.

-- 
Sergei.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Subtle bug not calling DSRs.
  2005-12-22 19:11 Subtle bug not calling DSRs Sergei Organov
@ 2006-01-03 10:40 ` Nick Garnett
  2006-01-10 13:49   ` Sergei Organov
  0 siblings, 1 reply; 3+ messages in thread
From: Nick Garnett @ 2006-01-03 10:40 UTC (permalink / raw)
  To: Sergei Organov; +Cc: ecos-devel

Sergei Organov <osv@javad.com> writes:

> Hello,
> 
> I believe I've found bug when eCos doesn't call pended DSR(s) until
> _next_ time the scheduler lock is released. The scenario I see is like
> this:
> 
> 1. Start single thread that is run as soon as scheduler is enabled.
> 2. The thread suspends itself that way or another (wait on a primitive
>    or just sleep, -- it doesn't matter).
> 3. The (2) results in context switch from this thread to the idle thread
>    from within Cyg_Scheduler::unlock_inner(0).
> 4. During context switch interrupt happens that is requesting
>    corresponding DSR to be run. However, as scheduler is locked during
>    context switch, the DSR is not invoked on exit from the interrupt.
> 5. Context switch continues and transfers control to the idle thread
>    entry address being Cyg_HardwareThread::thread_entry(). This is
>    because the idle thread had no chance to run before this point.
> 6. Cyg_HardwareThread::thread_entry() has no code to check for pended
>    DSRs, so the DSR posted in (4) is not run.

Good catch.

This is a somewhat obscure bug since the number of applications that
start the scheduler and then do nothing at all is very small. However,
failing to handle DSRs across the context switch to start a thread is
clearly wrong.

> 
> Even if this common part can't be easily factored out, it is much better
> to at least keep two slightly different copies in one place close to
> each other I think.

Yep, that makes sense. The contents of Cyg_Thread::thread_entry() has
become more elaborate since it was originally written. A little
refactoring now won't go amiss.

> 
> Anyway, somebody more experienced in the eCos internal architecture
> (Nick?) may wish to take a look at this and could probably come up with
> a better patch.

I think the following patch fixes the bug and reorganizes things to be
a little tidier. If this proves to be the case I'll generate a
ChangeLog entry and check it in in a couple of days.


Index: kernel/current/include/sched.hxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/include/sched.hxx,v
retrieving revision 1.12
diff -u -5 -r1.12 sched.hxx
--- kernel/current/include/sched.hxx	23 May 2002 23:06:48 -0000	1.12
+++ kernel/current/include/sched.hxx	3 Jan 2006 10:23:36 -0000
@@ -174,10 +174,13 @@
     // decrement the lock but also look for a reschedule opportunity
     static void             unlock_reschedule();
 
     // release the preemption lock without rescheduling
     static void             unlock_simple();
+
+    // perform thread startup housekeeping
+    void Cyg_Scheduler::thread_entry( Cyg_Thread *thread );
     
     // Start execution of the scheduler
     static void start() CYGBLD_ATTRIB_NORET;
 
     // Start execution of the scheduler on the current CPU
Index: kernel/current/src/common/thread.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/src/common/thread.cxx,v
retrieving revision 1.20
diff -u -5 -r1.20 thread.cxx
--- kernel/current/src/common/thread.cxx	30 Jan 2003 07:02:53 -0000	1.20
+++ kernel/current/src/common/thread.cxx	3 Jan 2006 10:23:37 -0000
@@ -84,28 +84,14 @@
 void
 Cyg_HardwareThread::thread_entry( Cyg_Thread *thread )
 {
     CYG_REPORT_FUNCTION();
 
-    Cyg_Scheduler::scheduler.clear_need_reschedule(); // finished rescheduling
-    Cyg_Scheduler::scheduler.set_current_thread(thread); // restore current thread pointer
-
-    CYG_INSTRUMENT_THREAD(ENTER,thread,0);
-    
-#ifdef CYGSEM_KERNEL_SCHED_TIMESLICE
-    // Reset the timeslice counter so that this thread gets a full
-    // quantum. 
-    Cyg_Scheduler::reset_timeslice_count();
-#endif
+    // Call the scheduler to do any housekeeping
+    Cyg_Scheduler::scheduler.thread_entry( thread );
     
-    // Zero the lock
-    HAL_REORDER_BARRIER ();            // Prevent the compiler from moving
-    Cyg_Scheduler::zero_sched_lock();     // the assignment into the code above.
-    HAL_REORDER_BARRIER();
-
     // Call entry point in a loop.
-
     for(;;)
     {
         thread->entry_point(thread->entry_data);
         thread->exit();
     }
@@ -1223,15 +1209,15 @@
 
 #  endif // CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE < CYGNUM_HAL_STACK_SIZE_MINIMUM
 # endif // CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE
 #endif // CYGNUM_HAL_STACK_SIZE_MINIMUM
 
-static char idle_thread_stack[CYGNUM_KERNEL_CPU_MAX][CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE];
-
 // Loop counter for debugging/housekeeping
 cyg_uint32 idle_thread_loops[CYGNUM_KERNEL_CPU_MAX];
 
+static char idle_thread_stack[CYGNUM_KERNEL_CPU_MAX][CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE];
+
 // -------------------------------------------------------------------------
 // Idle thread code.
 
 void
 idle_thread_main( CYG_ADDRESS data )
Index: kernel/current/src/sched/sched.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/src/sched/sched.cxx,v
retrieving revision 1.16
diff -u -5 -r1.16 sched.cxx
--- kernel/current/src/sched/sched.cxx	9 Aug 2002 17:13:28 -0000	1.16
+++ kernel/current/src/sched/sched.cxx	3 Jan 2006 10:23:38 -0000
@@ -307,10 +307,32 @@
 
     CYG_FAIL( "Should not be executed" );
 }
 
 // -------------------------------------------------------------------------
+// Thread startup. This is called from Cyg_Thread::thread_entry() and
+// performs some housekeeping for a newly started thread.
+
+void Cyg_Scheduler::thread_entry( Cyg_Thread *thread )
+{
+    clear_need_reschedule();            // finished rescheduling
+    set_current_thread(thread);         // restore current thread pointer
+
+    CYG_INSTRUMENT_THREAD(ENTER,thread,0);
+    
+#ifdef CYGSEM_KERNEL_SCHED_TIMESLICE
+    // Reset the timeslice counter so that this thread gets a full
+    // quantum. 
+    reset_timeslice_count();
+#endif
+    
+    // Finally unlock the scheduler. As well as clearing the scheduler
+    // lock this allows any pending DSRs to execute.
+    unlock();    
+}
+
+// -------------------------------------------------------------------------
 // Start the scheduler. This is called after the initial threads have been
 // created to start scheduling. It gets any other CPUs running, and then
 // enters the scheduler.
 
 void Cyg_Scheduler::start()




-- 
Nick Garnett                                     eCos Kernel Architect
http://www.ecoscentric.com                The eCos and RedBoot experts


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Subtle bug not calling DSRs.
  2006-01-03 10:40 ` Nick Garnett
@ 2006-01-10 13:49   ` Sergei Organov
  0 siblings, 0 replies; 3+ messages in thread
From: Sergei Organov @ 2006-01-10 13:49 UTC (permalink / raw)
  To: Nick Garnett; +Cc: ecos-devel

Nick Garnett <nickg@ecoscentric.com> writes:
[...]
> I think the following patch fixes the bug and reorganizes things to be
> a little tidier. If this proves to be the case I'll generate a
> ChangeLog entry and check it in in a couple of days.

Looks fine to me, -- I've applied the patch and it does fix the problem.

-- 
Sergei.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2006-01-10 13:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-22 19:11 Subtle bug not calling DSRs Sergei Organov
2006-01-03 10:40 ` Nick Garnett
2006-01-10 13:49   ` Sergei Organov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).