public inbox for ecos-discuss@sourceware.org
 help / color / mirror / Atom feed
From: andrew.lunn@ascom.ch (Andrew Lunn)
To: ecos-discuss@sourceware.cygnus.com
Subject: [ECOS] Scheduler bug found and fixed
Date: Wed, 20 Sep 2000 05:02:00 -0000	[thread overview]
Message-ID: <200009201202.OAA01003@biferten.ma.tech.ascom.ch> (raw)

Hi Nick and others.

I found a bug in the mlqueue schedular. It goes something like this...

An alarm function calls cyg_thread_suspend() on the current thread, ie
the one that was running when the timer ISR went off.

cyg_thread_suspend correctly takes the thread off the run_queue[pri]
and clears the queue_map bit. The alarm func finishes, all the
remaining DSR are called. 

After this it decided to do a timeslice on the clock tick. The
timeslice function then rotates the current run queue. Unfortunatly,
the run queue is now empty, so it follows a null pointer into random
memory and makes that the head of the queue. It then forces a
reschedule. This works becasue the queue_map bit is not set on the now
corrupt run_queue. The correct thread gets to run and the system seems
happy.

Then something causes an add_thread onto the now corrupt run_queue. In
my cause a call the cyg_thread_kill(). add_thread checks the
consitancy of the run_queue. It finds the queue is not empty, but the
queue_map bit is not set and throws a CYG_ASSERT.

The fix is to change the timeslice function to check the queue it
wants to rotate actually has at least one thread on it. If not ask for
a reschedule.

Here is a patch for the fix and to put an assert into the rotate
function to check for rotating and empty queue.

        Andrew

Index: ChangeLog
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/ChangeLog,v
retrieving revision 1.43
diff -c -r1.43 ChangeLog
*** ChangeLog	2000/09/19 05:53:59	1.43
--- ChangeLog	2000/09/20 12:01:30
***************
*** 1,3 ****
--- 1,8 ----
+ 2000-09-20  Andrew Lunn  <andrew.lunn@ascom.ch>
+ 
+ 	* src/sched/mlqueue.cxx (timeslice) don't rotate an empty queue. 
+ 	* src/sched/mlqueue.cxx (rotate) Assert queue is not empty
+ 
  2000-09-13  Jesper Skov  <jskov@redhat.com>
  
  	* tests/kexcept1.c (cause_exception): Use separate cause_fpe function.
Index: src/sched/mlqueue.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/src/sched/mlqueue.cxx,v
retrieving revision 1.7
diff -c -r1.7 mlqueue.cxx
*** mlqueue.cxx	2000/09/11 02:43:00	1.7
--- mlqueue.cxx	2000/09/20 12:01:30
***************
*** 283,297 ****
      
          cyg_priority pri                               = thread->priority;
          Cyg_SchedulerThreadQueue_Implementation *queue = &sched->run_queue[pri];
! 
!         queue->rotate();
! 
!         if( queue->highpri() != thread )
              sched->need_reschedule = true;
! 
          timeslice_count = CYGNUM_KERNEL_SCHED_TIMESLICE_TICKS;
      }
- 
      
      CYG_ASSERT( queue_map & (1<<CYG_THREAD_MIN_PRIORITY), "Idle thread vanished!!!");
      CYG_ASSERT( !run_queue[CYG_THREAD_MIN_PRIORITY].empty(), "Idle thread vanished!!!");
--- 283,298 ----
      
          cyg_priority pri                               = thread->priority;
          Cyg_SchedulerThreadQueue_Implementation *queue = &sched->run_queue[pri];
!         if ( !queue->empty() ) {
!           queue->rotate();
!           
!           if( queue->highpri() != thread )
              sched->need_reschedule = true;
!         } else {
!             sched->need_reschedule = true;
!         }
          timeslice_count = CYGNUM_KERNEL_SCHED_TIMESLICE_TICKS;
      }
      
      CYG_ASSERT( queue_map & (1<<CYG_THREAD_MIN_PRIORITY), "Idle thread vanished!!!");
      CYG_ASSERT( !run_queue[CYG_THREAD_MIN_PRIORITY].empty(), "Idle thread vanished!!!");
***************
*** 656,662 ****
  Cyg_ThreadQueue_Implementation::rotate(void)
  {
      CYG_REPORT_FUNCTION();
!         
      queue = queue->next;
  
      CYG_REPORT_RETURN();
--- 657,664 ----
  Cyg_ThreadQueue_Implementation::rotate(void)
  {
      CYG_REPORT_FUNCTION();
!     CYG_ASSERT(queue != 0, "Rotating an empty queue");
!  
      queue = queue->next;
  
      CYG_REPORT_RETURN();


             reply	other threads:[~2000-09-20  5:02 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-09-20  5:02 Andrew Lunn [this message]
2000-09-26  6:24 ` Nick Garnett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200009201202.OAA01003@biferten.ma.tech.ascom.ch \
    --to=andrew.lunn@ascom.ch \
    --cc=ecos-discuss@sourceware.cygnus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).