RE: [ECOS] Re: DSR Scheduling Problem

public inbox for ecos-discuss@sourceware.org

* RE: [ECOS] Re: DSR Scheduling Problem
@ 2006-01-14  0:45 Jay Foster
  2006-01-14  2:12 ` Grant Edwards
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Jay Foster @ 2006-01-14  0:45 UTC (permalink / raw)
  To: 'Grant Edwards', ecos-discuss

I still think that FIFO queuing of the DSRs is better than LIFO queuing,
because in the absence of any DSR priority information, the best that can be
done is temporal priority (ie FIFO).  This prevents the case (that I'm
seeing) where a lower priority ISR's DSR preempts a higher priority ISR's
DSR (the priority is lost in the LIFO DSR queue).

I located the kernel versions of the DSR code
(kernel/current/src/intr/intr.cxx), and discovered that there are two
implementations for the DSR handling (CYGIMP_KERNEL_INTERRUPTS_DSRS_LIST,
and CYGIMP_KERNEL_INTERRUPTS_DSRS_TABLE).  The default is to use the LIST,
which is LIFO, but the TABLE implementation is FIFO.  I switched my
configuration to the TABLE implementation, and my code works.  So a second
reason to use FIFO for the DSR LIST implementation is to match the behavior
of the TABLE implementation.

Jay
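
The LIFO/FIFO difference discussed above comes down to where a newly
posted DSR is linked into the pending list. The following is a minimal
sketch in C, not the eCos code: the types and functions (struct dsr,
struct dsr_queue, post_lifo, post_fifo, next_dsr) are hypothetical names
chosen for illustration. Both variants dequeue from the head; only the
posting side differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending-DSR record; not the eCos cyg_interrupt layout. */
struct dsr {
    int id;              /* identifies the interrupt source that posted it */
    struct dsr *next;
};

struct dsr_queue {
    struct dsr *head;    /* next DSR to run */
    struct dsr *tail;    /* last DSR in the list; used by FIFO posting */
};

/* LIFO posting: push onto the head, so the newest DSR runs first. */
static void post_lifo(struct dsr_queue *q, struct dsr *d)
{
    assert(q != NULL && d != NULL);
    d->next = q->head;
    q->head = d;
    if (q->tail == NULL)
        q->tail = d;
}

/* FIFO posting: append at the tail, so DSRs run in posting order. */
static void post_fifo(struct dsr_queue *q, struct dsr *d)
{
    assert(q != NULL && d != NULL);
    d->next = NULL;
    if (q->tail != NULL)
        q->tail->next = d;
    else
        q->head = d;
    q->tail = d;
}

/* Both policies take the next DSR from the head of the list. */
static struct dsr *next_dsr(struct dsr_queue *q)
{
    struct dsr *d = q->head;
    if (d != NULL) {
        q->head = d->next;
        if (q->head == NULL)
            q->tail = NULL;
        d->next = NULL;
    }
    return d;
}
```

With an RX DSR posted first and a TX DSR posted second, next_dsr()
returns the TX DSR first under post_lifo() and the RX DSR first under
post_fifo(). The FIFO variant costs one extra pointer per queue, which
is the kind of overhead the thread is weighing.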

-----Original Message-----
From: Grant Edwards [mailto:grante@visi.com]
Sent: Friday, January 13, 2006 3:42 PM
To: ecos-discuss@ecos.sourceware.org
Subject: [ECOS] Re: DSR Scheduling Problem

> The test begins by transmitting data, which is looped back to the
> receiver.
> It starts out with:
> 	TX ISR -> TX DSR
> 	TX ISR -> TX DSR
> 	...
> 	TX-ISR -> TX DSR
>
> Then I get the RX ISR during the TX DSR, which just schedules
> the RX DSR. However, the RX DSR does not run until 39 ms
> later,

And TX DSRs are running during that entire 38ms?

> resulting in an overrun error.  During this time period, the
> TX ISR and TX DSR continue their work transmitting the
> remaining data.  After all of the data has been sent, THEN the
> RX DSR runs.

It appears you don't have enough CPU time to run all of the
DSRs you want in the allotted time.

> Looking at the code post_dsr() and call_dsr() in
> hal/common/current/src/drv_api.c, I noticed that the DSRs are
> queued at the head of the list, and dequeued also from the
> head of the list.

Yup.  DSRs are scheduled in a LIFO manner. 

> This seems wrong,

It seems to work for everybody else. ;)

> as it can (and apparently does) cause DSRs to get delayed by
> other DSRs that are queued later.  Seems like it would be
> better to queue them on the end of the list and dequeue them
> from the head of the list, so that the DSRs would get run in
> the order in which they are queued.

If the DSRs that you're scheduling require 150% of the
available CPU time, then something's going to fail.  

In your particular case, perhaps it is better to fail in manner
B than in manner A. But, very few eCos users have the option of
failing, so nobody put in much extra effort to make things fail
in manner B rather than in manner A.  

Did that make sense?

-- 
Grant Edwards                   grante             Yow!  I'm having an
                                  at               EMOTIONAL OUTBURST!! But,
                               visi.com            uh, WHY is there a WAFFLE
                                                   in my PAJAMA POCKET??

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* [ECOS]  Re: DSR Scheduling Problem
  2006-01-14  0:45 [ECOS] Re: DSR Scheduling Problem Jay Foster
@ 2006-01-14  2:12 ` Grant Edwards
  2006-01-14  3:04   ` Paul D. DeRocco
                     ` (2 more replies)
  2006-01-14  8:23 ` Andrew Lunn
  2006-01-16 10:27 ` Nick Garnett
  2 siblings, 3 replies; 33+ messages in thread
From: Grant Edwards @ 2006-01-14  2:12 UTC (permalink / raw)
  To: ecos-discuss

On 2006-01-14, Jay Foster <jay@systech.com> wrote:

> I still think that FIFO queuing of the DSRs is better than
> LIFO queuing, because in the absence of any DSR priority
> information, the best that can be done is temporal priority
> (ie FIFO).

That happens to work for your application, but I don't see how
you can say that FIFO is best in the general case.

> This prevents the case (that I'm seeing) where a lower
> priority ISR's DSR preempts a higher priority ISR's DSR (the
> priority is lost in the LIFO DSR queue).

I still maintain that your application is either broken or you
don't have enough CPU.  If one interrupt source requires so
much DSR time that others can't run, then there is simply
something wrong.  You seem to prefer a tx underrun error to an
rx overrun error.  I guarantee you're going to get one or the
other.  On the systems I work on, either is equally fatal, so
it is not the case that FIFO is better than LIFO.  Both work
equally well.

> The default is to use the LIST, which is LIFO, but the TABLE
> implementation is FIFO. I switched my configuration to the
> TABLE implementation, and my code works.  So a second reason
> to use FIFO for the DSR LIST implementation is to match the
> behavior of the TABLE implementation.

As long as the FIFO list approach doesn't require any more
overhead, it's fine with me.  Like I said, if you're not
allowed to starve any of the DSRs, either works equally well.

-- 
Grant Edwards                   grante             Yow!  Imagine--a WORLD
                                  at               without POODLES...
                               visi.com            


* RE: [ECOS]  Re: DSR Scheduling Problem
  2006-01-14  2:12 ` Grant Edwards
@ 2006-01-14  3:04   ` Paul D. DeRocco
  2006-01-14  3:40     ` Grant Edwards
  2006-01-16  8:27   ` Dirk Husemann
  2006-02-13 10:41   ` Sergei Organov
  2 siblings, 1 reply; 33+ messages in thread
From: Paul D. DeRocco @ 2006-01-14  3:04 UTC (permalink / raw)
  To: eCos Discuss

> From: Grant Edwards
>
> I still maintain that your application is either broken or you
> don't have enough CPU.  If one interrupt source requires so
> much DSR time that others can't run, then there is simply
> something wrong.  You seem to prefer a tx underrun error to an
> rx overrun error.  I guarantee you're going to get one or the
> other.  On the systems I work on, either is equally fatal, so
> it is not the case that FIFO is better than LIFO.  Both work
> equally well.

If the transmitter has a hardware FIFO, and the software transmits one byte
per interrupt, then presenting a block of data to it after an idle period
will invoke the ISR/DSR a slew of times until the FIFO is full. This will
happen even if the average interrupt rate is eventually throttled to a
reasonable value by the serial transmission rate, once the FIFO is full. I
don't know if that accounts for the 38ms in this person's situation, but if
the FIFO is large it could certainly tie things up for a significant amount
of time.

--

Ciao,               Paul D. DeRocco
Paul                mailto:pderocco@ix.netcom.com


* [ECOS]  Re: DSR Scheduling Problem
  2006-01-14  3:04   ` Paul D. DeRocco
@ 2006-01-14  3:40     ` Grant Edwards
  2006-01-16  8:40       ` Daniel Néri
  0 siblings, 1 reply; 33+ messages in thread
From: Grant Edwards @ 2006-01-14  3:40 UTC (permalink / raw)
  To: ecos-discuss

On 2006-01-14, Paul D. DeRocco <pderocco@ix.netcom.com> wrote:

>> I still maintain that your application is either broken or you
>> don't have enough CPU.  If one interrupt source requires so
>> much DSR time that others can't run, then there is simply
>> something wrong.  You seem to prefer a tx underrun error to an
>> rx overrun error.  I guarantee you're going to get one or the
>> other.  On the systems I work on, either is equally fatal, so
>> it is not the case that FIFO is better than LIFO.  Both work
>> equally well.
>
> If the transmitter has a hardware FIFO, and the software
> transmits one byte per interrupt,

Then the software is completely and utterly broken.  It doesn't
deserve to work.

> then presenting a block of data to it after an idle period
> will invoke the ISR/DSR a slew of times until the FIFO is
> full.

That's insane.  Nobody with a clue would write software like
that.  When you get a TX interrupt you write data to the tx
FIFO until it's full.
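
A minimal sketch of that "fill until full" policy, in C. The UART here is
a hypothetical software model (struct uart_model, tx_fifo_full, tx_write,
tx_service are illustrative names, and the 16-byte depth is an
assumption); a real driver would read a hardware status register instead.

```c
#include <assert.h>
#include <stddef.h>

#define TX_FIFO_DEPTH 16u   /* assumed depth; depends on the actual UART */

/* Hypothetical software model of a UART transmit FIFO. */
struct uart_model {
    unsigned char fifo[TX_FIFO_DEPTH];
    size_t used;            /* bytes currently queued in the FIFO */
};

static int tx_fifo_full(const struct uart_model *u)
{
    return u->used == TX_FIFO_DEPTH;
}

static void tx_write(struct uart_model *u, unsigned char c)
{
    assert(!tx_fifo_full(u));
    u->fifo[u->used++] = c;
}

/*
 * Work done for ONE TX interrupt: keep writing until the FIFO is full
 * or there is no more data. Returns the number of bytes consumed.
 */
static size_t tx_service(struct uart_model *u,
                         const unsigned char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len && !tx_fifo_full(u)) {
        tx_write(u, buf[sent]);
        sent++;
    }
    return sent;
}
```

Under these assumptions a 40-byte block needs three interrupts (16 + 16
+ 8 bytes) rather than 40, which is where the reduction in ISR/DSR
invocations would come from.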

> This will happen even if the average interrupt rate is
> eventually throttled to a reasonable value by the serial
> transmission rate, once the FIFO is full. I don't know if that
> accounts for the 38ms in this person's situation, but if the
> FIFO is large it could certainly tie things up for a
> significant amount of time.

You're describing completely broken software.  It needs to be
fixed.

-- 
Grant Edwards                   grante             Yow!  That's a decision
                                  at               that can only be made
                               visi.com            between you & SY SPERLING!!



* Re: [ECOS] Re: DSR Scheduling Problem
  2006-01-14  0:45 [ECOS] Re: DSR Scheduling Problem Jay Foster
  2006-01-14  2:12 ` Grant Edwards
@ 2006-01-14  8:23 ` Andrew Lunn
  2006-01-16 10:27 ` Nick Garnett
  2 siblings, 0 replies; 33+ messages in thread
From: Andrew Lunn @ 2006-01-14  8:23 UTC (permalink / raw)
  To: Jay Foster; +Cc: 'Grant Edwards', ecos-discuss

On Fri, Jan 13, 2006 at 04:44:34PM -0800, Jay Foster wrote:
> I still think that FIFO queuing of the DSRs is better than LIFO queuing,

Well you have the source. Make eCos do what you want. If you add the
appropriate CDL to control it I might consider integrating it into
anoncvs.

        Andrew


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-01-14  2:12 ` Grant Edwards
  2006-01-14  3:04   ` Paul D. DeRocco
@ 2006-01-16  8:27   ` Dirk Husemann
  2006-01-16 15:11     ` Grant Edwards
  2006-02-13 10:41   ` Sergei Organov
  2 siblings, 1 reply; 33+ messages in thread
From: Dirk Husemann @ 2006-01-16  8:27 UTC (permalink / raw)
  To: Grant Edwards; +Cc: ecos-discuss

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

Grant Edwards wrote:

>On 2006-01-14, Jay Foster <jay@systech.com> wrote:
>
>  
>
>>I still think that FIFO queuing of the DSRs is better than
>>LIFO queuing, because in the absence of any DSR priority
>>information, the best that can be done is temporal priority
>>(ie FIFO).
>>    
>>
>
>That happens to work for your application, but I don't see how
>you can say that FIFO is best in the general case.
>  
>
hmm...i think jay has a point here: we are apparently losing temporal 
ordering --- so even if a system would be capable of handling the load 
if the DSRs were executed in the order the associated ISRs occurred, 
with the current LIFO implementation it fails --- as demonstrated by jay.

>  
>
>>This prevents the case (that I'm seeing) where a lower
>>priority ISR's DSR preempts a higher priority ISR's DSR (the
>>priority is lost in the LIFO DSR queue).
>>    
>>
>
>I still maintain that your application is either broken or you
>don't have enough CPU.  If one interrupt source requires so
>much DSR time that others can't run, then there is simply
>something wrong.  You seem to prefer a tx underrun error to an
>rx overrun error.  I guarantee you're going to get one or the
>other.  On the systems I work on, either is equally fatal, so
>it is not the case that FIFO is better than LIFO.  Both work
>equally well.
>  
>
hmm...apparently jay is seeing neither of your predicted results since 
he switched to FIFO...

-- 
Dr Dirk Husemann, Pervasive Computing, IBM Research, Zurich Research Lab
	hud@zurich.ibm.com --- http://www.zurich.ibm.com/~hud/
       PGP key: http://www.zurich.ibm.com/~hud/contact/PGP
  PGP Fingerprint: 983C 48E7 0A78 A313 401C  C4AD 3C0A 278E 6431 A149
	     Email only authentic if signed with PGP key.

Appended to this email is an electronic signature attachment. You can
ignore it if your email program does not know how to verify such a
signature. If you'd like to learn more about this topic, www.gnupg.org
is a good starting point.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]


* [ECOS]  Re: DSR Scheduling Problem
  2006-01-14  3:40     ` Grant Edwards
@ 2006-01-16  8:40       ` Daniel Néri
  2006-01-16 10:36         ` Nick Garnett
                           ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Daniel Néri @ 2006-01-16  8:40 UTC (permalink / raw)
  To: ecos-discuss

[-- Attachment #1: Type: text/plain, Size: 807 bytes --]

Grant Edwards <grante@visi.com> writes:

> On 2006-01-14, Paul D. DeRocco <pderocco@ix.netcom.com> wrote:
>
>> If the transmitter has a hardware FIFO, and the software
>> transmits one byte per interrupt,
>
> Then the software is completely and utterly broken.  It doesn't
> deserve to work.
>
>> then presenting a block of data to it after an idle period
>> will invoke the ISR/DSR a slew of times until the FIFO is
>> full.
>
> That's insane. Nobody with a clue would write software like that.

Actually, the generic 16x5x serial driver in eCos works exactly like
that.

> When you get a TX interrupt you write data to the tx FIFO until it's
> full.

Yep. I've made a somewhat quick-and-dirty fix that is attached below.


Regards,
-- 
Daniel Néri <daniel.neri@sigicom.se>
Sigicom AB, Stockholm, Sweden



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 1939 bytes --]

diff -r bc43d0ddc306 -r 851938281d52 devs/serial/generic/16x5x/current/src/ser_16x5x.c
--- a/devs/serial/generic/16x5x/current/src/ser_16x5x.c	Mon Nov 28 14:51:31 2005
+++ b/devs/serial/generic/16x5x/current/src/ser_16x5x.c	Thu Dec  1 10:44:47 2005
@@ -212,6 +212,8 @@
         s16550,
         s16550a
     } deviceType;
+    unsigned tx_fifo_size;
+    volatile unsigned tx_fifo_avail;
 #endif
 } pc_serial_info;
 
@@ -332,10 +334,15 @@
                 _fcr_thresh=FCR_RT14; break;
             }
             _fcr_thresh|=FCR_FE|FCR_CRF|FCR_CTF;
+            ser_chan->tx_fifo_size = 16;
             HAL_WRITE_UINT8(base+REG_fcr, _fcr_thresh); // Enable and clear FIFO
         }
-        else
+        else {
+            ser_chan->tx_fifo_size = 1;
             HAL_WRITE_UINT8(base+REG_fcr, 0); // make sure it's disabled
+        }
+
+        ser_chan->tx_fifo_avail = ser_chan->tx_fifo_size;
 #endif
         if (chan->out_cbuf.len != 0) {
             _ier = IER_RCV;
@@ -423,16 +430,26 @@
 static bool
 pc_serial_putc(serial_channel *chan, unsigned char c)
 {
+#ifndef CYGPKG_IO_SERIAL_GENERIC_16X5X_FIFO
     cyg_uint8 _lsr;
+#endif
     pc_serial_info *ser_chan = (pc_serial_info *)chan->dev_priv;
     cyg_addrword_t base = ser_chan->base;
 
+#ifdef CYGPKG_IO_SERIAL_GENERIC_16X5X_FIFO
+    if (ser_chan->tx_fifo_avail > 0) {
+        HAL_WRITE_UINT8(base+REG_thr, c);
+        --ser_chan->tx_fifo_avail;
+        return true;
+    }
+#else
     HAL_READ_UINT8(base+REG_lsr, _lsr);
     if (_lsr & LSR_THE) {
         // Transmit buffer is empty
         HAL_WRITE_UINT8(base+REG_thr, c);
         return true;
     }
+#endif
     // No space
     return false;
 }
@@ -626,6 +643,9 @@
             break;
         }
         case ISR_Tx:
+#ifdef CYGPKG_IO_SERIAL_GENERIC_16X5X_FIFO
+            ser_chan->tx_fifo_avail = ser_chan->tx_fifo_size;
+#endif
             (chan->callbacks->xmt_char)(chan);
             break;
 



* Re: [ECOS] Re: DSR Scheduling Problem
  2006-01-14  0:45 [ECOS] Re: DSR Scheduling Problem Jay Foster
  2006-01-14  2:12 ` Grant Edwards
  2006-01-14  8:23 ` Andrew Lunn
@ 2006-01-16 10:27 ` Nick Garnett
  2 siblings, 0 replies; 33+ messages in thread
From: Nick Garnett @ 2006-01-16 10:27 UTC (permalink / raw)
  To: Jay Foster; +Cc: 'Grant Edwards', ecos-discuss

Jay Foster <jay@systech.com> writes:

> I still think that FIFO queuing of the DSRs is better than LIFO queuing,
> because in the absence of any DSR priority information, the best that can be
> done is temporal priority (ie FIFO).  This prevents the case (that I'm
> seeing) where a lower priority ISR's DSR preempts a higher priority ISR's
> DSR (the priority is lost in the LIFO DSR queue).

I think the main thing to try and discover is why your application is
so sensitive to the mere order in which DSRs are called. What
relationship exists between these DSRs that causes a problem when they
are called in a particular order? What might your main program be
doing to delay or otherwise interfere with DSR calling?

In general DSRs are independent of each other, the order in which two
or more pending DSRs get called should not matter. In any event they
all get called before any threads get to run, so the order does not
propagate to the threads, which will be run in priority order.

> 
> I located the kernel versions of the DSR code
> (kernel/current/src/intr/intr.cxx), and discovered that there are two
> implementations for the DSR handling (CYGIMP_KERNEL_INTERRUPTS_DSRS_LIST,
> and CYGIMP_KERNEL_INTERRUPTS_DSRS_TABLE).  The default is to use the LIST,
> which is LIFO, but the TABLE implementation is FIFO.  I switched my
> configuration to the TABLE implementation, and my code works.  So a second
> reason to use FIFO for the DSR LIST implementation is to match the behavior
> of the TABLE implementation.

I suspect that changing from LIFO to FIFO order is actually just
masking some other underlying problem. For example that the CPU is
just not up to the job of handling the load being asked of it. For
there to be more than one DSR pending more than very occasionally you
would have to have a very high interrupt rate, virtually saturating
the CPU. Your figure of a 39ms delay before the DSR runs is very
suggestive of something like this. It looks like your transmitter is
simply saturating the CPU.

The LIFO queueing method was chosen because it is fast, deterministic
and simple to maintain. I would be very reluctant to see a change to
this unless a *very* good case is made for FIFO order.

-- 
Nick Garnett                                     eCos Kernel Architect
http://www.ecoscentric.com                The eCos and RedBoot experts


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-01-16  8:40       ` Daniel Néri
@ 2006-01-16 10:36         ` Nick Garnett
  2006-01-16 11:45           ` [ECOS] Generic 16x5x serial driver use of transmit FIFO (was: DSR Scheduling Problem) Daniel Néri
  2006-01-16 15:13         ` [ECOS] Re: DSR Scheduling Problem Grant Edwards
  2006-01-17  9:43         ` Andrew Lunn
  2 siblings, 1 reply; 33+ messages in thread
From: Nick Garnett @ 2006-01-16 10:36 UTC (permalink / raw)
  To: Daniel Néri; +Cc: ecos-discuss

daniel.neri@sigicom.se (Daniel Néri) writes:

> Grant Edwards <grante@visi.com> writes:
> 
> > On 2006-01-14, Paul D. DeRocco <pderocco@ix.netcom.com> wrote:
> >
> >> If the transmitter has a hardware FIFO, and the software
> >> transmits one byte per interrupt,
> >
> > Then the software is completely and utterly broken.  It doesn't
> > deserve to work.
> >
> >> then presenting a block of data to it after an idle period
> >> will invoke the ISR/DSR a slew of times until the FIFO is
> >> full.
> >
> > That's insane. Nobody with a clue would write software like that.
> 
> Actually, the generic 16x5x serial driver in eCos works exactly like
> that.

Actually, it doesn't.

> 
> > When you get a TX interrupt you write data to the tx FIFO until it's
> > full.
> 
> Yep. I've made a somewhat quick-and-dirty fix that is attached below.

This already happens. pc_serial_putc() returns true or false,
depending on whether it transmitted the byte. In the 16550 this means
that it will only return false when the FIFO fills up. The generic
serial code calls the putc() driver routine in a loop to transmit
bytes until it returns false.

-- 
Nick Garnett                                     eCos Kernel Architect
http://www.ecoscentric.com                The eCos and RedBoot experts



* [ECOS]  Generic 16x5x serial driver use of transmit FIFO (was: DSR Scheduling Problem)
  2006-01-16 10:36         ` Nick Garnett
@ 2006-01-16 11:45           ` Daniel Néri
  2006-01-16 12:23             ` Nick Garnett
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Néri @ 2006-01-16 11:45 UTC (permalink / raw)
  To: ecos-discuss

Nick Garnett <nickg@ecoscentric.com> writes:

> Actually, it doesn't.

I think you're wrong.

> This already happens. pc_serial_putc() returns true or false,
> depending on whether it transmitted the byte. In the 16550 this means
> that it will only return false when the FIFO fills up.

No. LSR bit 5 (THE) is set when the TX FIFO is empty, not when it's
non-full.

> The generic serial code calls the putc() driver routine in a loop to
> transmit bytes until it returns false.

Writing to the THR clears the THE so it will typically return false
after the first byte, and the FIFO will never fill up.
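
The difference can be illustrated with a small software model in C. This
is a sketch under simplifying assumptions, not driver code: struct
uart16550, struct tx_state and the functions below are hypothetical, the
transmit shift register is ignored, and the FIFO depth of 16 is taken
from the patch posted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define FIFO_SIZE 16u   /* TX FIFO depth assumed for a 16550A */

/* Hypothetical model of the 16550 transmit side. */
struct uart16550 {
    unsigned queued;    /* bytes currently sitting in the TX FIFO */
};

/* LSR bit 5 (THE): set only while the TX FIFO is completely empty. */
static bool lsr_the(const struct uart16550 *u)
{
    return u->queued == 0;
}

static void write_thr(struct uart16550 *u)
{
    assert(u->queued < FIFO_SIZE);
    u->queued++;
}

/* putc gated on THE, as in the unpatched driver. */
static bool putc_poll_the(struct uart16550 *u)
{
    if (lsr_the(u)) {
        write_thr(u);
        return true;
    }
    return false;
}

/* putc gated on a software count of free slots, in the style of the patch. */
struct tx_state {
    unsigned avail;     /* reset to FIFO_SIZE on each TX interrupt */
};

static bool putc_counted(struct uart16550 *u, struct tx_state *s)
{
    if (s->avail > 0) {
        write_thr(u);
        s->avail--;
        return true;
    }
    return false;
}

/* Generic-layer loop: call putc until it refuses; returns bytes accepted. */
static unsigned fill_with_the(struct uart16550 *u, unsigned pending)
{
    unsigned n = 0;
    while (n < pending && putc_poll_the(u))
        n++;
    return n;
}

static unsigned fill_with_count(struct uart16550 *u, struct tx_state *s,
                                unsigned pending)
{
    unsigned n = 0;
    while (n < pending && putc_counted(u, s))
        n++;
    return n;
}
```

Starting from an empty FIFO with 100 bytes pending, fill_with_the()
accepts one byte per call in this model, while fill_with_count() accepts
16. On real hardware the first byte may move to the shift register
quickly enough for a second write to succeed, which is why "typically"
is the right word above.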



Regards,
-- 
Daniel Néri <daniel.neri@sigicom.se>
Sigicom AB, Stockholm, Sweden



* Re: [ECOS]  Generic 16x5x serial driver use of transmit FIFO (was: DSR Scheduling Problem)
  2006-01-16 11:45           ` [ECOS] Generic 16x5x serial driver use of transmit FIFO (was: DSR Scheduling Problem) Daniel Néri
@ 2006-01-16 12:23             ` Nick Garnett
  0 siblings, 0 replies; 33+ messages in thread
From: Nick Garnett @ 2006-01-16 12:23 UTC (permalink / raw)
  To: Daniel Néri; +Cc: ecos-discuss

daniel.neri@sigicom.se (Daniel Néri) writes:

> Nick Garnett <nickg@ecoscentric.com> writes:
> 
> > Actually, it doesn't.
> 
> I think you're wrong.
> 
> > This already happens. pc_serial_putc() returns true or false,
> > depending on whether it transmitted the byte. In the 16550 this means
> > that it will only return false when the FIFO fills up.
> 
> No. LSR bit 5 (THE) is set when the TX FIFO is empty, not when it's
> non-full.
> 
> > The generic serial code calls the putc() driver routine in a loop to
> > transmit bytes until it returns false.
> 
> Writing to the THR clears the THE so it will typically return false
> after the first byte, and the FIFO will never fill up.

Oh, yes, you're right. Other UARTs have better behaviour in this
regard, and I was assuming that the 16550 did the right thing here
too. I guess this is a pitfall of dealing with legacy hardware. :-(

-- 
Nick Garnett                                     eCos Kernel Architect
http://www.ecoscentric.com                The eCos and RedBoot experts



* [ECOS] Re: DSR Scheduling Problem
  2006-01-16  8:27   ` Dirk Husemann
@ 2006-01-16 15:11     ` Grant Edwards
  0 siblings, 0 replies; 33+ messages in thread
From: Grant Edwards @ 2006-01-16 15:11 UTC (permalink / raw)
  To: ecos-discuss

In gmane.os.ecos.general, you wrote:

>>>I still think that FIFO queuing of the DSRs is better than
>>>LIFO queuing, because in the absence of any DSR priority
>>>information, the best that can be done is temporal priority
>>>(ie FIFO).
>>
>>That happens to work for your application, but I don't see how
>>you can say that FIFO is best in the general case.
>>
> hmm...i think jay has a point here: we are apparently losing temporal 
> ordering

True.

> --- so even if a system would be capable of handling the load 
> if the DSRs were executed in the order the associated ISRs occurred, 
> with the current LIFO implementation it fails --- as demonstrated by jay.

I don't see how he's demonstrated that at all.   If you need to
run 150ms of DSRs in a 100ms time period, you're not going to
be able to run them all by re-ordering them.  Jay didn't
demonstrate that changing the order made all his DSRs run as
often as they needed to.  

>>I still maintain that your application is either broken or you
>>don't have enough CPU.  If one interrupt source requires so
>>much DSR time that others can't run, then there is simply
>>something wrong.  You seem to prefer a tx underrun error to an
>>rx overrun error.  I guarantee you're going to get one or the
>>other.  On the systems I work on, either is equally fatal, so
>>it is not the case that FIFO is better than LIFO.  Both work
>>equally well.
>>  
>>
> hmm...apparently jay is seeing neither of your predicted results since 
> he switched to FIFO...

How do we know he's not getting tx FIFO underruns?

-- 
Grant Edwards                   grante             Yow!  .. I see TOILET
                                  at               SEATS...
                               visi.com            


* [ECOS] Re: DSR Scheduling Problem
  2006-01-16  8:40       ` Daniel Néri
  2006-01-16 10:36         ` Nick Garnett
@ 2006-01-16 15:13         ` Grant Edwards
  2006-01-17  9:43         ` Andrew Lunn
  2 siblings, 0 replies; 33+ messages in thread
From: Grant Edwards @ 2006-01-16 15:13 UTC (permalink / raw)
  To: ecos-discuss

>>> If the transmitter has a hardware FIFO, and the software
>>> transmits one byte per interrupt,
>>
>> Then the software is completely and utterly broken.  It doesn't
>> deserve to work.
>>
>>> then presenting a block of data to it after an idle period
>>> will invoke the ISR/DSR a slew of times until the FIFO is
>>> full.
>>
>> That's insane. Nobody with a clue would write software like
>> that.
>
> Actually, the generic 16x5x serial driver in eCos works
> exactly like that.

You're kidding!

>> When you get a TX interrupt you write data to the tx FIFO until it's
>> full.
>
> Yep. I've made a somewhat quick-and-dirty fix that is attached below.

That should provide a drastic reduction in load for that driver.

-- 
Grant Edwards                   grante             Yow!  FIRST, I'm covering
                                  at               you with OLIVE OIL and
                               visi.com            PRUNE WHIP!!


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-01-16  8:40       ` Daniel Néri
  2006-01-16 10:36         ` Nick Garnett
  2006-01-16 15:13         ` [ECOS] Re: DSR Scheduling Problem Grant Edwards
@ 2006-01-17  9:43         ` Andrew Lunn
  2 siblings, 0 replies; 33+ messages in thread
From: Andrew Lunn @ 2006-01-17  9:43 UTC (permalink / raw)
  To: Daniel Néri; +Cc: ecos-discuss

> > That's insane. Nobody with a clue would write software like that.
> 
> Actually, the generic 16x5x serial driver in eCos works exactly like
> that.
> 
> > When you get a TX interrupt you write data to the tx FIFO until it's
> > full.
> 
> Yep. I've made a somewhat quick-and-dirty fix that is attached below.

Would somebody like to make a clean fix for this? I will then commit
it.

        Thanks
                Andrew               


* [ECOS]  Re: DSR Scheduling Problem
  2006-01-14  2:12 ` Grant Edwards
  2006-01-14  3:04   ` Paul D. DeRocco
  2006-01-16  8:27   ` Dirk Husemann
@ 2006-02-13 10:41   ` Sergei Organov
  2006-02-15  2:06     ` Brett Delmage
  2 siblings, 1 reply; 33+ messages in thread
From: Sergei Organov @ 2006-02-13 10:41 UTC (permalink / raw)
  To: ecos-discuss

> On 2006-01-14, Jay Foster <jay@systech.com> wrote:
>> I still think that FIFO queuing of the DSRs is better than
>> LIFO queuing, because in the absence of any DSR priority
>> information, the best that can be done is temporal priority
>> (ie FIFO).

Nick Garnett <nickg@ecoscentric.com> writes:
> The LIFO queueing method was chosen because it is fast, deterministic
> and simple to maintain. I would be very reluctant to see a change to
> this unless a *very* good case is made for FIFO order.

Grant Edwards <grante@visi.com> writes:
> That happens to work for your application, but I don't see how
> you can say that FIFO is best in the general case.

That brought an interesting question, so I've tried to figure out some
numbers. I took IRQ-to-corresponding-DSR latency as the comparison
criteria, and my analysis of rather simple system suggests that LIFO
could be 2 times worse than FIFO at some conditions and is never
better. Though I've tried my best to make the comparison fair, I must
admit that I'm somewhat biased in favor of FIFO (LIFO wait queues just
sound wrong in the first place), and I'd be thankful should somebody
find a mistake in my reasoning below.

For the purpose of comparison I considered rather simple yet usual
system with the following properties:

1. There are N independent asynchronous interrupt sources
   IRQ1-IRQN. IRQ1 has highest priority and IRQN has lowest priority.

2. System load is low and each of interrupts happens so rarely that an
   IRQ never occurs when ISR/DSR corresponding to previous IRQ of this
   particular kind is active.

3. ISRs don't nest.

4. Difference between FIFO and LIFO management overheads is negligible
   compared to ISR + DSR execution times.

Now let's try to calculate *maximum* IRQ-to-DSR latency for IRQ1 that
has the highest priority. For simplicity let's assume every ISR handler
takes roughly the same time Ti for execution and every DSR handler takes
roughly the same time Td.

First consider FIFO case.

If Td >= Ti, maximum latency is reached, e.g., when all the IRQs but
IRQ1 happen almost simultaneously, then IRQ1 happens just before the
beginning of DSR2 execution:
             IRQ1 
              v
ISR2,...,ISRN, ISR1,DSR2,...,DSRN,DSR1

and maximum IRQ1-to-DSR1 latency LF1 = Ti + Td * (N - 1)

If Ti >= Td, maximum latency is reached when, say, IRQ2 happens, ISR2
begins to execute, then IRQ1,IRQ3,...,IRQN happen almost immediately (in
whatever order). In this case the order of execution is:

IRQ1
v   
ISR2,ISR1,ISR3,...,ISRN,DSR2,DSR1,...,DSRN

and LF2 = Ti * (N - 1) + Td

Overall, maximum IRQ1-to-DSR1 latency for FIFO policy is:

        | Td * (N - 1) + Ti, Ti <= Td 
LFmax = |
        | Ti * (N - 1) + Td, Ti >= Td

Now consider the LIFO case.

The worst case w.r.t. DSR1 latency is:

IRQ1
v   
 ISR1,ISR2,....,ISRN,DSRN,...,DSR2,DSR1

and LL2 = Ti * N + Td * (N - 1)

and maximum IRQ1-to-DSR1 latency for LIFO policy is:

LLmax = Ti * N + Td * (N - 1)
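
The reversed DSR order in the LIFO diagram above can be reproduced with a minimal sketch. It is not the eCos code: `PendingIrq`, `post_lifo`, `run_pending_lifo` and `lifo_order` are hypothetical names, interrupt masking is left out, and the list is assumed to be a plain singly linked list that is pushed and popped at the head.

```cpp
#include <vector>

// Hypothetical stand-in for an interrupt object (not eCos' Cyg_Interrupt).
struct PendingIrq {
    int         id;        // 1 = highest priority source
    PendingIrq* next_dsr;  // link used while the DSR is pending
};

static PendingIrq* lifo_list = nullptr;  // head of the pending list

// LIFO post: push at the head of the list, O(1).
void post_lifo(PendingIrq* irq) {
    irq->next_dsr = lifo_list;
    lifo_list = irq;
}

// Pop from the head until the list is empty; returns ids in run order.
std::vector<int> run_pending_lifo() {
    std::vector<int> order;
    while (lifo_list != nullptr) {
        PendingIrq* irq = lifo_list;
        lifo_list = irq->next_dsr;
        irq->next_dsr = nullptr;
        order.push_back(irq->id);
    }
    return order;
}

// ISR1..ISRn post their DSRs in that order; returns the DSR run order.
std::vector<int> lifo_order(int n) {
    std::vector<PendingIrq> irqs(n);  // sized once, so pointers stay valid
    for (int i = 0; i < n; ++i) {
        irqs[i] = PendingIrq{i + 1, nullptr};
        post_lifo(&irqs[i]);
    }
    return run_pending_lifo();
}
```

With four sources posted as ISR1, ISR2, ISR3, ISR4, `lifo_order(4)` yields 4, 3, 2, 1, so the DSR of the first (highest priority) source runs last.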

Please note that:

1. LLmax >= LFmax

2. If Ti = Td,  LLmax/LFmax = 2 - 1/N,

   so the maximum DSR latency for the highest priority IRQ could be
   nearly 2(!) times higher under the LIFO policy than under FIFO.

3. Minimum latencies are the same and are equal to Ti (FIFO vs. LIFO
   overhead aside).

4. Mean latencies are almost the same for the LIFO and FIFO policies in
   our system, where interrupts occur so rarely that the most probable
   case is a single ISR followed by its corresponding DSR (Lmean ~= Ti).
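
The ratio in note 2 follows from the two maxima stated above once both handler times are set to a common value T (T is notation introduced only for this derivation):

```latex
\begin{align*}
L_{L\max} &= T_i N + T_d (N - 1) = T(2N - 1), \\
L_{F\max} &= T_i (N - 1) + T_d = T N, \\
\frac{L_{L\max}}{L_{F\max}} &= \frac{2N - 1}{N} = 2 - \frac{1}{N}.
\end{align*}
```

For N = 4 this gives 1.75, and the ratio approaches 2 as N grows.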

Overall, the above analysis suggests LIFO is never better than FIFO and
it could be much worse than FIFO.
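
For readers who want to plug in their own numbers, the bounds can be written down as small functions. This is only a transcription of the formulas as stated in this message, not a measurement or a simulation; the function names `lf_max`, `ll_max` and `latency_ratio` are made up for the sketch.

```cpp
// Worst-case IRQ1-to-DSR1 latency under FIFO, as stated in the analysis:
// Td*(N-1)+Ti when Ti <= Td, otherwise Ti*(N-1)+Td.
double lf_max(int n, double ti, double td) {
    return (ti <= td) ? td * (n - 1) + ti
                      : ti * (n - 1) + td;
}

// Worst-case IRQ1-to-DSR1 latency under LIFO, as stated in the analysis.
double ll_max(int n, double ti, double td) {
    return ti * n + td * (n - 1);
}

// How many times worse the LIFO bound is than the FIFO bound.
double latency_ratio(int n, double ti, double td) {
    return ll_max(n, ti, td) / lf_max(n, ti, td);
}
```

For example, with N = 4 and Ti = Td = 1 the bounds are 7 and 4 time units, a ratio of 1.75.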

Isn't it time to finally get rid of LIFO wait queues in eCos? Any
objections?

-- Sergei.

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-02-13 10:41   ` Sergei Organov
@ 2006-02-15  2:06     ` Brett Delmage
  2006-02-15  9:57       ` Sergei Organov
  0 siblings, 1 reply; 33+ messages in thread
From: Brett Delmage @ 2006-02-15  2:06 UTC (permalink / raw)
  To: Sergei Organov; +Cc: ecos-discuss

On Mon, 13 Feb 2006, Sergei Organov wrote:

<nice analysis deleted>

> Overall, the above analysis suggests LIFO is never better than FIFO and
> it could be much worse than FIFO.
> 
> Isn't it time to finally get rid of LIFO wait queues in eCos? Any
> objections?

Isn't ECOS about choice?
Make this configurable option and allow users to try both. :-)
May the best algorithm win!

> -- Sergei.

Brett


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-02-15  2:06     ` Brett Delmage
@ 2006-02-15  9:57       ` Sergei Organov
  2006-02-15 13:23         ` Stefan Sommerfeld
                           ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Sergei Organov @ 2006-02-15  9:57 UTC (permalink / raw)
  To: ecos-discuss

Brett Delmage <brett@twobikes.ottawa.on.ca> writes:

> On Mon, 13 Feb 2006, Sergei Organov wrote:
>
> <nice analysis deleted>
>
>> Overall, the above analysis suggests LIFO is never better than FIFO and
>> it could be much worse than FIFO.
>> 
>> Isn't it time to finally get rid of LIFO wait queues in eCos? Any
>> objections?
>
> Isn't ECOS about choice?

Well, unfortunately choice doesn't come for free. More testing, more
opportunities for bugs, more confusion, etc. I believe useless choices
are evil.

> Make this configurable option and allow users to try both. :-)

Which of the options do you suggest should be the default? How do you
explain to users the criteria for choosing one algorithm or the other?
How will a user compare the choices in his tests when most of the time
the algorithms behave exactly the same? How do you explain why the LIFO
choice is there in the first place if it has no advantages?

For example, the "array" choice for the DSR queue does have an excuse in
being interrupts-disable-free, and it has an excuse for not being the
default as it has potential problems with missed DSRs. What's the excuse
for keeping the LIFO choice? The only one I see is backward
compatibility, but since eCos never specified the exact order of DSRs,
it shouldn't matter.
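
The overflow hazard mentioned for the fixed-size table can be pictured with a small ring buffer. This is an illustration under assumptions only: `DsrTable`, `TABLE_SIZE`, `post` and `take` are made-up names, the sketch ignores interrupt safety entirely, and it signals a lost post by returning false, which is not necessarily how the eCos TABLE implementation reacts when it fills up.

```cpp
#include <cstddef>

// Hypothetical capacity; the real table size is a configuration matter.
constexpr std::size_t TABLE_SIZE = 4;

// Fixed-size FIFO table of pending DSR ids (illustrative only).
struct DsrTable {
    int         slots[TABLE_SIZE] = {};
    std::size_t head  = 0;  // next entry to run
    std::size_t tail  = 0;  // next free slot
    std::size_t count = 0;  // entries currently pending

    // Queue a DSR; returns false if the table is full and the post is lost.
    bool post(int id) {
        if (count == TABLE_SIZE) return false;
        slots[tail] = id;
        tail = (tail + 1) % TABLE_SIZE;
        ++count;
        return true;
    }

    // Fetch the oldest pending DSR; returns false if nothing is pending.
    bool take(int& id) {
        if (count == 0) return false;
        id = slots[head];
        head = (head + 1) % TABLE_SIZE;
        --count;
        return true;
    }
};
```

A linked list threaded through the interrupt objects cannot run out of slots in this way, which is the trade-off the option description for LIST refers to.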

-- Sergei.


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-02-15  9:57       ` Sergei Organov
@ 2006-02-15 13:23         ` Stefan Sommerfeld
  2006-02-15 14:07           ` Sergei Organov
  2006-02-15 15:54           ` Grant Edwards
  2006-02-15 15:53         ` Grant Edwards
  2006-02-15 16:34         ` Brett Delmage
  2 siblings, 2 replies; 33+ messages in thread
From: Stefan Sommerfeld @ 2006-02-15 13:23 UTC (permalink / raw)
  To: ecos-discuss

[-- Attachment #1: Type: text/plain, Size: 174 bytes --]

Hi,

I've added a DSR FIFO option which works like the LIST version, but in FIFO 
order. I've tested it for some time now and it works well. Please give it a 
try.

Bye.... 

[-- Attachment #2: fifo_dsr.patch --]
[-- Type: application/octet-stream, Size: 5002 bytes --]

Index: cdl/interrupts.cdl
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/cdl/interrupts.cdl,v
retrieving revision 1.4
diff -w -u -r1.4 interrupts.cdl
--- cdl/interrupts.cdl	23 May 2002 23:06:45 -0000	1.4
+++ cdl/interrupts.cdl	15 Feb 2006 13:07:22 -0000
@@ -87,6 +87,21 @@
             possibility of a table overflow occurring."
     }
 
+    cdl_option CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO {
+        display       "Use FIFO for DSRs "
+        default_value 0
+        implements    CYGINT_KERNEL_INTERRUPTS_DSRS
+        description   "
+            When DSR support is enabled the kernel must keep track of all
+            the DSRs that are pending. This information can be kept in a
+            fixed-size table or in a linked list. The list implementation
+            requires that the kernel disable interrupts for a very short
+            period of time outside interrupt handlers, but there is no
+            possibility of a table overflow occurring. Unlike LIST, this
+            implementation processes the DSRs in first come, first served
+            order, which reduces the ISR to DSR delay."
+    }
+
     cdl_component CYGIMP_KERNEL_INTERRUPTS_DSRS_TABLE {
         display       "Use fixed-size table for DSRs"
         default_value 0
Index: include/intr.hxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/include/intr.hxx,v
retrieving revision 1.11
diff -w -u -r1.11 intr.hxx
--- include/intr.hxx	23 May 2002 23:06:47 -0000	1.11
+++ include/intr.hxx	15 Feb 2006 12:41:10 -0000
@@ -187,6 +187,7 @@
                                                CYGBLD_ANNOTATE_VARIABLE_INTR;
 
 #endif
+
 #ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_LIST
 
     // Number of DSR posts made
@@ -201,6 +202,22 @@
     
 #endif
 
+#ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO
+
+    // Number of DSR posts made
+    volatile cyg_ucount32 dsr_count CYGBLD_ANNOTATE_VARIABLE_INTR; 
+
+    // next DSR in list
+    Cyg_Interrupt* volatile next_dsr CYGBLD_ANNOTATE_VARIABLE_INTR; 
+
+    // static list of pending DSRs
+    static Cyg_Interrupt* volatile dsr_list[CYGNUM_KERNEL_CPU_MAX]
+                                           CYGBLD_ANNOTATE_VARIABLE_INTR;
+    static Cyg_Interrupt* volatile last_dsr[CYGNUM_KERNEL_CPU_MAX]
+                                           CYGBLD_ANNOTATE_VARIABLE_INTR;
+    
+#endif
+
 #ifdef CYGIMP_KERNEL_INTERRUPTS_CHAIN
 
     // The default mechanism for handling interrupts is to attach just
Index: src/intr/intr.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/src/intr/intr.cxx,v
retrieving revision 1.17
diff -w -u -r1.17 intr.cxx
--- src/intr/intr.cxx	23 May 2002 23:06:54 -0000	1.17
+++ src/intr/intr.cxx	15 Feb 2006 12:38:38 -0000
@@ -100,6 +100,13 @@
 
 #endif
 
+#ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO
+
+    dsr_count   = 0;
+    next_dsr    = NULL;
+
+#endif
+
 #ifdef CYGIMP_KERNEL_INTERRUPTS_CHAIN
 
     next        = NULL;
@@ -139,6 +146,13 @@
 
 #endif
 
+#ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO
+
+Cyg_Interrupt* volatile Cyg_Interrupt::dsr_list[CYGNUM_KERNEL_CPU_MAX];
+Cyg_Interrupt* volatile Cyg_Interrupt::last_dsr[CYGNUM_KERNEL_CPU_MAX];
+
+#endif
+
 // -------------------------------------------------------------------------
 // Call any pending DSRs
 
@@ -193,6 +207,38 @@
     
 #endif
     
+#ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO
+
+    cyg_uint32 old_intr;
+    HAL_DISABLE_INTERRUPTS(old_intr);
+    while( dsr_list[cpu] != NULL )
+    {
+        Cyg_Interrupt* intr;
+        cyg_count32 count;
+        
+        
+        intr = dsr_list[cpu];
+        dsr_list[cpu] = intr->next_dsr;
+        count = intr->dsr_count;
+        intr->dsr_count = 0;
+        intr->next_dsr = 0;
+        if (dsr_list[cpu] == NULL)
+        {
+            last_dsr[cpu] = NULL;
+        }
+        
+        HAL_RESTORE_INTERRUPTS(old_intr);
+        
+        CYG_ASSERT( intr->dsr != NULL , "No DSR defined");
+
+        intr->dsr( intr->vector, count, (CYG_ADDRWORD)intr->data );
+        
+        HAL_DISABLE_INTERRUPTS(old_intr);
+    }
+    HAL_RESTORE_INTERRUPTS(old_intr);
+    
+#endif
+    
 };
 
 externC void
@@ -257,6 +303,27 @@
     
 #endif
     
+#ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO
+
+    // Only add the interrupt to the dsr list if this is
+    // the first DSR call.
+    
+    if( dsr_count++ == 0 )
+    {
+        Cyg_Interrupt* cur_last_dsr = last_dsr[cpu];
+        if (cur_last_dsr)
+        {
+            cur_last_dsr->next_dsr = this;
+        }
+        else
+        {
+            dsr_list[cpu] = this;
+        }
+        last_dsr[cpu] = this;
+    }
+    
+#endif
+    
     HAL_RESTORE_INTERRUPTS(old_intr);    
 };
 



* [ECOS]  Re: DSR Scheduling Problem
  2006-02-15 13:23         ` Stefan Sommerfeld
@ 2006-02-15 14:07           ` Sergei Organov
  2006-02-15 14:14             ` Stefan Sommerfeld
  2006-02-15 15:54           ` Grant Edwards
  1 sibling, 1 reply; 33+ messages in thread
From: Sergei Organov @ 2006-02-15 14:07 UTC (permalink / raw)
  To: ecos-discuss

"Stefan Sommerfeld" <sommerfeld@mikrom.de> writes:

> Hi,
>
> I've added a DSR FIFO option which works like the LIST version, but in
> FIFO order. I've tested it for some time now and it works well. Please
> give it a try.

A doubt about the names:

+#ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO
+
+Cyg_Interrupt* volatile Cyg_Interrupt::dsr_list[CYGNUM_KERNEL_CPU_MAX];
+Cyg_Interrupt* volatile Cyg_Interrupt::last_dsr[CYGNUM_KERNEL_CPU_MAX];
+
+#endif
+

To be consistent with the array implementation, wouldn't it be better to
call the above variables "dsr_list_head" and "dsr_list_tail", respectively?
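
To make the head/tail naming concrete, here is a minimal standalone sketch of a FIFO pending list. It mirrors the shape of the patch but is not the patch itself: `Source`, `post_fifo`, `run_pending_fifo` and `fifo_order` are hypothetical names, there is a single list instead of one per CPU, and the interrupt disable/restore calls are omitted.

```cpp
#include <vector>

// Hypothetical stand-in for an interrupt object (not eCos' Cyg_Interrupt).
struct Source {
    int      id;         // 1 = highest priority source
    unsigned dsr_count;  // posts made since the DSR last ran
    Source*  next_dsr;   // link used while the DSR is pending
};

static Source* dsr_list_head = nullptr;  // oldest pending entry
static Source* dsr_list_tail = nullptr;  // newest pending entry

// FIFO post: append at the tail, O(1). Only the first post links the
// object into the list; later posts just bump the count.
void post_fifo(Source* s) {
    if (s->dsr_count++ == 0) {
        s->next_dsr = nullptr;
        if (dsr_list_tail != nullptr) {
            dsr_list_tail->next_dsr = s;
        } else {
            dsr_list_head = s;
        }
        dsr_list_tail = s;
    }
}

// Remove entries from the head, O(1) each; returns ids in run order.
std::vector<int> run_pending_fifo() {
    std::vector<int> order;
    while (dsr_list_head != nullptr) {
        Source* s = dsr_list_head;
        dsr_list_head = s->next_dsr;
        if (dsr_list_head == nullptr) {
            dsr_list_tail = nullptr;  // list became empty
        }
        s->next_dsr = nullptr;
        s->dsr_count = 0;
        order.push_back(s->id);
    }
    return order;
}

// ISR1..ISRn post in that order, then source 1 posts a second time.
std::vector<int> fifo_order(int n) {
    std::vector<Source> srcs(n);  // sized once, so pointers stay valid
    for (int i = 0; i < n; ++i) {
        srcs[i] = Source{i + 1, 0u, nullptr};
        post_fifo(&srcs[i]);
    }
    if (n > 0) {
        post_fifo(&srcs[0]);  // repeated post does not re-link the entry
    }
    return run_pending_fifo();
}
```

In this sketch both the append and the removal touch a fixed number of pointers, and `fifo_order(4)` yields 1, 2, 3, 4, i.e. DSRs run in the order their ISRs posted them.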

-- Sergei.



* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-02-15 14:07           ` Sergei Organov
@ 2006-02-15 14:14             ` Stefan Sommerfeld
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Sommerfeld @ 2006-02-15 14:14 UTC (permalink / raw)
  To: ecos-discuss

Hi,
>>
>> I've added a DSR FIFO option which works like the LIST version, but in
>> FIFO order. I've tested it for some time now and it works well. Please
>> give it a try.
> 
> A doubt about the names:
> 
> +#ifdef CYGIMP_KERNEL_INTERRUPTS_DSRS_FIFO
> +
> +Cyg_Interrupt* volatile Cyg_Interrupt::dsr_list[CYGNUM_KERNEL_CPU_MAX];
> +Cyg_Interrupt* volatile Cyg_Interrupt::last_dsr[CYGNUM_KERNEL_CPU_MAX];
> +
> +#endif
> +
> 
> To be consistent with array implementation, isn't it better to call the
> above variables "dsr_list_head" and "dsr_list_tail", respectively?

Yes, no problem. It's just a (working) suggestion for the implementation.

Bye...


* [ECOS] Re: DSR Scheduling Problem
  2006-02-15  9:57       ` Sergei Organov
  2006-02-15 13:23         ` Stefan Sommerfeld
@ 2006-02-15 15:53         ` Grant Edwards
  2006-02-15 18:30           ` Nick Garnett
  2006-02-15 19:36           ` Sergei Organov
  2006-02-15 16:34         ` Brett Delmage
  2 siblings, 2 replies; 33+ messages in thread
From: Grant Edwards @ 2006-02-15 15:53 UTC (permalink / raw)
  To: ecos-discuss

In gmane.os.ecos.general, you wrote:

>> Isn't ECOS about choice?

Yes.

> Well, unfortunately choice doesn't come for free.

And the new feature you want does?

> More testing, more opportunities for bugs, more confusion,
> etc. I believe useless choices are evil.

Adding a new DSR scheduler doesn't come for free either.
Keeping the old one, however, _is_ free.

>> Make this configurable option and allow users to try both. :-)
>
> Which of the options do you suggest to be the default?

The existing one, of course.  Always default to existing
behavior when adding new options.

> How do you explain users the criteria to choose one algorithm
> or another?

You don't.  Just explain the differences between the two
algorithms.  It's up to the user to determine the criteria on
which he's making his choice.

> How will user compare the choices in his tests when most of
> time the algorithms behave exactly the same?

That's up to the user.

> How do you explain why LIFO choice is there in the first place
> if it has no advantages?

It's already been explained:  LIFO is fast and dirt-simple.

> For example, the "array" choice for DSR queue does have an
> excuse as being interrupts-disable-free, and it has an excuse
> of not being the default as it has potential problems with
> missing DSRs. What's an excuse for keeping LIFO choice?

The most important excuse is not changing things for people who
have working systems.  I don't care how much you want FIFO DSR
scheduling -- you don't get to force it down my throat.

> The only one I see is backward compatibility, but due to the
> fact that eCos never specified exact order of DSRs it
> shouldn't matter.

Lots of things that shouldn't matter do.  Don't arbitrarily
force changes on everybody just to make a tiny minority happy.
I'm all for allowing that minority to add another DSR
scheduling option if they want.  I'm _not_ for allowing them to
force that change on everybody else.

-- 
Grant Edwards                   grante             Yow!  Put FIVE DOZEN red
                                  at               GIRDLES in each CIRCULAR
                               visi.com            OPENING!!


* [ECOS] Re: DSR Scheduling Problem
  2006-02-15 13:23         ` Stefan Sommerfeld
  2006-02-15 14:07           ` Sergei Organov
@ 2006-02-15 15:54           ` Grant Edwards
  1 sibling, 0 replies; 33+ messages in thread
From: Grant Edwards @ 2006-02-15 15:54 UTC (permalink / raw)
  To: ecos-discuss


> I've added a DSR FIFO option which works like the LIST version, but in FIFO 
> order. I've tested it for some time now and it works well. Please give it a 
> try.

That's the spirit!

Instead of arguing about why everybody should do things your
way, just do it and let others use it if they please.

-- 
Grant Edwards                   grante             Yow!  They
                                  at               collapsed... like nuns
                               visi.com            in the street... they had
                                                   no teenappeal!


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-02-15  9:57       ` Sergei Organov
  2006-02-15 13:23         ` Stefan Sommerfeld
  2006-02-15 15:53         ` Grant Edwards
@ 2006-02-15 16:34         ` Brett Delmage
  2 siblings, 0 replies; 33+ messages in thread
From: Brett Delmage @ 2006-02-15 16:34 UTC (permalink / raw)
  To: Sergei Organov; +Cc: ecos-discuss

On Wed, 15 Feb 2006, Sergei Organov wrote:

> Brett Delmage <brett@twobikes.ottawa.on.ca> writes:
> 
> > On Mon, 13 Feb 2006, Sergei Organov wrote:
> >
> > <nice analysis deleted>
> >
> >> Overall, the above analysis suggests LIFO is never better than FIFO and
> >> it could be much worse than FIFO.
> >> 
> >> Isn't it time to finally get rid of LIFO wait queues in eCos? Any
> >> objections?
> >
> > Isn't eCos about choice?
> 
> Well, unfortunately choice doesn't come for free. More testing, more
> opportunities for bugs, more confusion, etc. I believe useless choices
> are evil.
> 
> > Make this configurable option and allow users to try both. :-)
> 
> Which of the options do you suggest to be the default?

I have to agree with Grant that the current implementation should continue 
to be the default, because it is the expected behavior.

> How do you explain users the criteria to choose one algorithm or 
> another?

Again, I agree with Grant that the user should be provided with the 
information (pros, cons) about each option so they can make a choice, and 
ideally benchmark their own application. As we have already seen, depending 
on the application's implementation, the advantages and disadvantages of 
each can change completely or not matter at all.

> How will user compare the choices in his tests when most of time the 
> algorithms behave exactly the same? How do you explain why LIFO choice 
> is there in the first place if it has no advantages?

Put the mathematical model that was recently posted into the package 
documentation and let the user decide ;-)

As with other open software, a significant benefit/feature/characteristic 
of eCos is its value as an educational tool. When I first started 
learning about eCos, I really liked, for example, the *different* 
implementations of the scheduler being available, with explanations. 
Design is always about tradeoffs, especially in embedded systems. The 
different implementations of functions in eCos are really useful as 
optimizable building blocks, but they also help developers to see and 
understand tradeoffs, such as efficiency vs latency. This knowledge can 
also be applied in their application code.

For the case in point, by including two DSR algorithms, people get to 
think more about what makes a good and bad algorithm, and study the 
differences. If a "bad" algorithm (whatever that may be!) was removed, 
then that learning opportunity would be lost.

The educational value of a F/LOSS eCos is truly important and something 
to be proud of. I urge contributors (of which one day I hope to be one, 
after I get my first port working!) to remember that as they contribute.
You're not just making a great tool: you're advancing human knowledge.

Brett
embedded software developer
Ottawa, Canada


* Re: [ECOS] Re: DSR Scheduling Problem
  2006-02-15 15:53         ` Grant Edwards
@ 2006-02-15 18:30           ` Nick Garnett
  2006-02-15 19:30             ` Sergei Organov
  2006-02-15 19:36           ` Sergei Organov
  1 sibling, 1 reply; 33+ messages in thread
From: Nick Garnett @ 2006-02-15 18:30 UTC (permalink / raw)
  To: Grant Edwards; +Cc: ecos-discuss

Grant Edwards <grante@visi.com> writes:

> In gmane.os.ecos.general, you wrote:
> 
> >> Isn't ECOS about choice?
> 
> Yes.
> 
> > Well, unfortunately choice doesn't come for free.
> 
> And the new feature you want does?
> 
> > More testing, more opportunities for bugs, more confusion,
> > etc. I believe useless choices are evil.
> 
> Adding a new DSR scheduler doesn't come for free either.
> Keeping the old one, however, _is_ free.
> 
> >> Make this configurable option and allow users to try both. :-)
> >
> > Which of the options do you suggest to be the default?
> 
> The existing one, of course.  Always default to existing
> behavior when adding new options.
> 
> > How do you explain users the criteria to choose one algorithm
> > or another?
> 
> You don't.  Just explain the differences between the two
> algorithms.  It's up to the user to determine the criteria on
> which he's making his choice.
> 
> > How will user compare the choices in his tests when most of
> > time the algorithms behave exactly the same?
> 
> That's up to the user.
> 
> > How do you explain why LIFO choice is there in the first place
> > if it has no advantages?
> 
> It's already been explained:  LIFO is fast and dirt-simple.
> 
> > For example, the "array" choice for DSR queue does have an
> > excuse as being interrupts-disable-free, and it has an excuse
> > of not being the default as it has potential problems with
> > missing DSRs. What's an excuse for keeping LIFO choice?
> 
> The most important excuse is not changing things for people who
> have working systems.  I don't care how much you want FIFO DSR
> scheduling -- you don't get to force it down my throat.
> 
> > The only one I see is backward compatibility, but due to the
> > fact that eCos never specified exact order of DSRs it
> > shouldn't matter.
> 
> Lots of things that shouldn't matter do.  Don't arbitrarily
> force changes on everybody just to make a tiny minority happy.
> I'm all for allowing that minority to add another DSR
> scheduling option if they want.  I'm _not_ for allowing them to
> force that change on everybody else.


I agree with everything that Grant says here.

Stefan's patch, give or take a bit of tidying and name changing, looks
just fine. Adding and removing DSRs remains deterministic, if not
quite constant-time, and resolves any reservations I might have about
adding FIFO queueing.

It should only go in as a configuration option and the original LIFO
mechanism should be the default. Most systems do not have interrupts
occurring at the sort of rate that would make DSR queueing order make
any difference. So we should default to the very simplest approach and
document the tradeoffs of each mechanism. Subsystems and drivers that
really want FIFO queueing can always have a "requires" statement in
their CDL for this option.


-- 
Nick Garnett                                          eCos Kernel Architect
http://www.ecoscentric.com                     The eCos and RedBoot experts
Besuchen Sie uns vom 14.-16.02.06 auf der Embedded World 2006, Stand 11-222
Visit us at Embedded World 2006, Nürnberg, Germany, 14-16 Feb, Stand 11-222



* [ECOS]  Re: DSR Scheduling Problem
  2006-02-15 18:30           ` Nick Garnett
@ 2006-02-15 19:30             ` Sergei Organov
  2006-02-16 10:00               ` Nick Garnett
  0 siblings, 1 reply; 33+ messages in thread
From: Sergei Organov @ 2006-02-15 19:30 UTC (permalink / raw)
  To: ecos-discuss

Nick Garnett <nickg@ecoscentric.com> writes:
[...]
> Stefan's patch, give or take a bit of tidying and name changing, looks
> just fine.

Please consider the documentation worries I raise in my reply to Grant's
objections, which I'm going to post in a few minutes.

> Adding and removing DSRs remains deterministic, if not quite
> constant-time, and resolves any reservations I might have about adding
> FIFO queueing.

What's less constant-time in FIFO w.r.t. LIFO? Did I miss something?

> Most systems do not have interrupts occurring at the sort of rate that
> would make DSR queueing order make any difference.

I'm confused. Did you miss in my analysis that maximum DSR latency
doesn't depend on the rate of interrupts? It was in fact one of my
primary assumptions that interrupts are rare. I.e., FIFO wins at low
interrupt rates.

> So we should default to the very simplest approach and document the
> tradeoffs of each mechanism. Subsystems and drivers that really want
> FIFO queueing can always have a "requires" statement in their CDL for
> this option.

If the above indeed were the case, I'd have no objections, but I still
fail to see any *real* trade-offs of FIFO w.r.t. LIFO.

-- Sergei.


* [ECOS]  Re: DSR Scheduling Problem
  2006-02-15 15:53         ` Grant Edwards
  2006-02-15 18:30           ` Nick Garnett
@ 2006-02-15 19:36           ` Sergei Organov
  2006-02-15 19:57             ` Grant Edwards
  1 sibling, 1 reply; 33+ messages in thread
From: Sergei Organov @ 2006-02-15 19:36 UTC (permalink / raw)
  To: ecos-discuss

Grant Edwards <grante@visi.com> writes:

> In gmane.os.ecos.general, you wrote:
>
>>> Isn't ECOS about choice?
>
> Yes.
>
>> Well, unfortunately choice doesn't come for free.
>
> And the new feature you want does?

No, it doesn't, but a new feature along with the choice is definitely
more cost than either one of those ;) Besides, I didn't want a new
feature, I wanted to see if the old one could be improved.

>> More testing, more opportunities for bugs, more confusion,
>> etc. I believe useless choices are evil.
>
> Adding a new DSR scheduler doesn't come for free either.

Sure, but the same is true for any other improvement. I suggested
improving the existing LIST scheduler, not adding yet another one.

> Keeping the old one, however, _is_ free.

I don't believe it is, provided a new one is added. Try to look at the
resulting code for DSR management and I'm sure you'll see... It's
not free for those who need to maintain the code and it's not free for
newcomers, IMHO. Being neither of those, I don't care that much though.

Besides, we have another problem with that. Consider the documentation
Stefan has been so kind to provide along with his nice patch:

+            ... Unlike LIST, this
+            implementation processes the DSRs in first come, first served
+            order, which reduces the ISR to DSR delay."

Do you see the problem? Refer to the documentation for LIST and notice
that it doesn't mention anything about the DSR processing order. As far
as I understand from Nick's earlier posts, it was a system decision not
to specify the order, and now we end up breaking that decision :(

To be consistent, one would need to change old option from
CYGIMP_KERNEL_INTERRUPTS_DSRS_LIST to, say,
CYGIMP_KERNEL_INTERRUPTS_DSRS_LIST_LIFO, call the new option
CYGIMP_KERNEL_INTERRUPTS_DSRS_LIST_FIFO, and then try hard to explain
why in the hell both are there.

To summarize, it seems that instead of improving behavior by changing a
minor implementation detail, we need to introduce two new options,
remove one old option, and simultaneously make implementation details
public. To me that doesn't seem reasonable from an architectural point
of view, sorry. (Even though Nick just posted his agreement with your
arguments).

>>> Make this configurable option and allow users to try both. :-)
>>
>> Which of the options do you suggest to be the default?
>
> The existing one, of course.  Always default to existing behavior when
> adding new options.

So the result in this given case is that the worst option is the default
one :( I don't in fact care much about theory, but the result does
matter, IMHO. Backward compatibility is a very valuable goal, but keeping
it is not *always* the right choice, especially given that an
application that could break relies on an undocumented implementation
detail and thus is already somewhat broken. We just need to come to a
reasonable decision after weighing all the pros and cons; that's why I
posted this in the first place.

>
>> How do you explain users the criteria to choose one algorithm
>> or another?
>
> You don't.  Just explain the differences between the two
> algorithms.  It's up to the user to determine the criteria on
> which he's making his choice.

So are we going to give the user a hint? I mean something along the
lines of:

"LIFO policy may introduce 2 times higher DSR latencies than FIFO and
ARRAY under some rare conditions, but it's there and is the default for
backward compatibility. Please consider using either FIFO or ARRAY for
new projects."

>> How will user compare the choices in his tests when most of time the
>> algorithms behave exactly the same?
>
> That's up to the user.

Seems like putting on the user the responsibilities he can't cope with
:(

>
>> How do you explain why LIFO choice is there in the first place
>> if it has no advantages?
>
> It's already been explained:  LIFO is fast and dirt-simple.

... with the worst real-time properties of all the available options :(

BTW, FIFO is fast and dirt-simple as well.

>> For example, the "array" choice for DSR queue does have an
>> excuse as being interrupts-disable-free, and it has an excuse
>> of not being the default as it has potential problems with
>> missing DSRs. What's an excuse for keeping LIFO choice?
>
> The most important excuse is not changing things for people who
> have working systems.

Strictly speaking, not to break working system means not to change
anything, that in turn means not to switch to a new eCos version.

> I don't care how much you want FIFO DSR scheduling -- you don't get to
> force it down my throat.

I'm happy I can't force it due to the nature of an open-source project.

[What in fact bothers me is why you don't care. Do you in fact still
have a feeling that LIFO could be better in some cases? I'd be thankful
if you shared it with me.]

>> The only one I see is backward compatibility, but due to the fact
>> that eCos never specified exact order of DSRs it shouldn't matter.
>
> Lots of things that shouldn't matter do.

Yes, indeed.

> Don't arbitrarily force changes on everybody just to make a tiny
> minority happy. I'm all for allowing that minority to add another DSR
> scheduling option if they want.

1. The resulting DSR scheduling option after the change is still exactly
   as documented.

2. The "tiny minority" (if any) are in fact those applications that will
   break, and they are already somewhat broken anyway (as they rely on
   undocumented implementation detail).

> I'm _not_ for allowing them to force that change on everybody else.

I don't believe you really think that *every* change to eCos sources
should be put under yet another option as some "working" system
somewhere may break, right?

That was in fact the aim of my post: to decide how to qualify this
particular change, as an unconditional implementation improvement or
otherwise, and I still think it's the former, though I'm willing to
change my opinion should somebody come up with a conflicting case.

Anyway, I have neither rights nor desire to force anything on anybody,
so please take all the above slightly easier.

Thanks for the valuable discussion.

-- Sergei.


* [ECOS] Re: DSR Scheduling Problem
  2006-02-15 19:36           ` Sergei Organov
@ 2006-02-15 19:57             ` Grant Edwards
  2006-02-16 14:08               ` Sergei Organov
  0 siblings, 1 reply; 33+ messages in thread
From: Grant Edwards @ 2006-02-15 19:57 UTC (permalink / raw)
  To: ecos-discuss

In gmane.os.ecos.general, you wrote:

>> The existing one, of course.  Always default to existing behavior when
>> adding new options.
>
> So the result in this given case is that the worst option is
> the default one :( I don't in fact care much about theory, --
> but the result does matter, IMHO. Backward compatibility is
> a very valuable goal, but keeping it is not *always* the right
> choice, especially provided that an application that could
> break relies on an undocumented implementation detail and thus is
> already somewhat broken.

I hate to be overly pragmatic, but that's often just the way
things are.

> We just need to come to a reasonable decision after weighing
> all pros and cons, -- that's why I have posted this in the
> first place.

As I've said, I think keeping the old DSR scheduler and adding
an optional new one is the reasonable choice.  For the vast
majority of applications it just doesn't matter, and leaving
the current one as the default has the lowest risk of breaking
existing applications.   I guess I just don't think there are
that many applications where FIFO has a measurable enough
advantage to justify taking the risk.

>>> How do you explain to users the criteria for choosing one
>>> algorithm or another?
>>
>> You don't.  Just explain the differences between the two
>> algorithms.  It's up to the user to determine the criteria on
>> which he's making his choice.
>
> So are we going to give the user a hint? I mean something along the lines of:
>
> "The LIFO policy may, under some rare conditions, introduce DSR latencies
> twice as high as FIFO and ARRAY, but it's there and is the default for
> backward compatibility. Please consider using either FIFO or ARRAY for
> new projects."

That's fine with me.

>>> How will the user compare the choices in his tests when most of the
>>> time the algorithms behave exactly the same?
>>
>> That's up to the user.
>
> Seems like putting on the user responsibilities he can't cope with :(

How are we supposed to run/evaluate tests of the user's
application?

>>> How do you explain why the LIFO choice is there in the first place
>>> if it has no advantages?
>>
>> It's already been explained:  LIFO is fast and dirt-simple.
>
> ... with the worst real-time properties of all the available options :(
>
> BTW, FIFO is fast and dirt-simple as well.

>> The most important excuse is not changing things for people who
>> have working systems.
>
> Strictly speaking, not breaking a working system means not changing
> anything, which in turn means not switching to a new eCos version.

That's true.  A lot of work is involved in switching to a new
eCos version.  Changing the DSR scheduling algorithm on top of
that just adds a bit more risk.

> [What actually bothers me is why you don't care, -- do you in
> fact still have a feeling that LIFO could be better in some
> cases? I'd be thankful if you shared it with me if you do.]

I have the feeling that for everything I've done, LIFO works
just as well as FIFO would.  I'm convinced that changing to a
new DSR scheduling scheme will be of no benefit to my
applications and represents a small (but non-zero) risk.

For example: We recently switched from the NetBSD TCP stack to
the FreeBSD stack because the latter is what's recommended and
what is being actively maintained.  There was a fringe benefit
of somewhat lower CPU load and higher TCP/IP throughput.

However, it broke our application in certain scenarios.  There
was a bug in the FreeBSD stack.  It was fixed 6 years ago in
the NetBSD stack, but never got fixed in the FreeBSD stack.  We
now have a rather frustrated and irate customer and have spent
quite a few hours duplicating the problem and tracking down the
bug in the FreeBSD stack.

Change is risk.

>>> The only one I see is backward compatibility, but due to the fact
>>> that eCos never specified the exact order of DSRs it shouldn't matter.
>>
>> Lots of things that shouldn't matter do.
>
> Yes, indeed.

> I don't believe you really think that *every* change to eCos sources
> should be put under yet another option as some "working" system
> somewhere may break, right?

I think that in general, new features or fundamental changes to
existing features should be optional if possible.  Sometimes
that's simply not practical, but I think it is in this case.

-- 
Grant Edwards                   grante             Yow!  My ELBOW is a remote
                                  at               FRENCH OUTPOST!!
                               visi.com            


* Re: [ECOS]  Re: DSR Scheduling Problem
  2006-02-15 19:30             ` Sergei Organov
@ 2006-02-16 10:00               ` Nick Garnett
  2006-02-16 13:09                 ` Sergei Organov
  0 siblings, 1 reply; 33+ messages in thread
From: Nick Garnett @ 2006-02-16 10:00 UTC (permalink / raw)
  To: Sergei Organov; +Cc: ecos-discuss

Sergei Organov <osv@javad.com> writes:

> Nick Garnett <nickg@ecoscentric.com> writes:
> [...]
> > Stefan's patch, give or take a bit of tidying and name changing, looks
> > just fine.
> 
> Please consider documentation worries I have in the reply to Grant's
> objections I'm going to post in a few minutes.

You have some valid points there. The issues of making the tradeoffs
apparent to the user and how the config options actually work are
tricky, but not insoluble.

> 
> > Adding and removing DSRs remains deterministic, if not quite
> > constant-time, and resolves any reservations I might have about adding
> > FIFO queueing.
> 
> What's less constant-time in FIFO w.r.t. LIFO? Did I miss something?

The LIFO code is straight-line with no tests. Its execution time is
therefore very predictable. The FIFO code contains tests in both the
post and call routines that introduce several different execution
paths that depend on factors outside the control of the current
caller. This introduces more jitter into the execution time. It also
accesses more data in RAM, allowing more scope for cache effects to
change execution time.

In both cases an upper bound on the execution time can be calculated,
so they are both deterministic.

However, the difference in the number of cycles is small and any
application that is sensitive to such a small jitter in DSR post/call
times probably has other more important timing problems to deal with.
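
A minimal sketch of that difference, assuming a singly linked queue. This
is not the eCos source: struct dsr, lifo_post, fifo_post and the take
routines are hypothetical names used only for illustration. The LIFO post
is two unconditional stores, while the FIFO post and take each need a test
for the empty queue.

```c
#include <stddef.h>

/* Hypothetical DSR record; names are illustrative, not the eCos ones. */
struct dsr {
    struct dsr *next;
    int id;
};

/* LIFO: push at the head. Straight-line code, no tests. */
static struct dsr *lifo_head = NULL;

static void lifo_post(struct dsr *d)
{
    d->next = lifo_head;
    lifo_head = d;
}

/* Take from the head; returns NULL when nothing is pending. */
static struct dsr *lifo_take(void)
{
    struct dsr *d = lifo_head;
    if (d != NULL)
        lifo_head = d->next;
    return d;
}

/* FIFO: append at the tail, take from the head. */
static struct dsr *fifo_head = NULL;
static struct dsr *fifo_tail = NULL;

static void fifo_post(struct dsr *d)
{
    d->next = NULL;
    if (fifo_tail == NULL)      /* extra path: queue was empty */
        fifo_head = d;
    else
        fifo_tail->next = d;
    fifo_tail = d;
}

static struct dsr *fifo_take(void)
{
    struct dsr *d = fifo_head;
    if (d != NULL) {
        fifo_head = d->next;
        if (fifo_head == NULL)  /* extra path: queue became empty */
            fifo_tail = NULL;
    }
    return d;
}
```

Posting three records and taking them back yields them newest-first from
the LIFO queue and oldest-first from the FIFO queue. A real kernel would
also have to protect these updates against interrupts, which the sketch
leaves out.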

> 
> > Most systems do not have interrupts occurring at the sort of rate that
> > would make DSR queueing order make any difference.
> 
> I'm confused. Did you miss in my analysis that maximum DSR latency
> doesn't depend on the rate of interrupts? It was in fact one of my
> primary assumptions that interrupts are rare. I.e., FIFO wins at low
> interrupt rates.

The queueing order only ever matters when there is more than one DSR
on the queue. In the vast majority of systems, interrupts do not occur
at such a high rate. Where multiple DSRs do get queued they are for
unrelated interrupts (e.g. serial+timer+ethernet) and the order in
which they are handled does not matter. What's more important is the
order in which any subsequent threads execute, which is determined by
priority.

The order only really matters when the DSRs are related in some
way. The serial driver problem that kicked this off is an example, and
the CAN driver is another one. Both of these are unusual in various
ways, and I believe could be fixed by a bit of recoding of the ISRs
and DSRs to deal with all pending events on each call.

> 
> > So we should default to the very simplest approach and document the
> > tradeoffs of each mechanism. Subsystems and drivers that really want
> > FIFO queueing can always have a "requires" statement in their CDL for
> > this option.
> 
> If the above indeed were the case, I'd have no objections, but I still
> fail to see *real* trade-offs of FIFO w.r.t. LIFO.

The main advantage of the LIFO approach over FIFO is its lower
jitter. However, I admit that this is a relatively small effect.

I still think that the LIFO mechanism should be present as an
option. However I would be reasonably happy to see the FIFO mechanism
become the default. The code can be determined correct by inspection,
and the runtime effects of using it are minimal at best.

-- 
Nick Garnett                                          eCos Kernel Architect
http://www.ecoscentric.com                     The eCos and RedBoot experts
Visit us at Embedded World 2006, Nürnberg, Germany, 14-16 Feb, Stand 11-222


* [ECOS]  Re: DSR Scheduling Problem
  2006-02-16 10:00               ` Nick Garnett
@ 2006-02-16 13:09                 ` Sergei Organov
  0 siblings, 0 replies; 33+ messages in thread
From: Sergei Organov @ 2006-02-16 13:09 UTC (permalink / raw)
  To: ecos-discuss

Nick Garnett <nickg@ecoscentric.com> writes:
> Sergei Organov <osv@javad.com> writes:
>
>> Nick Garnett <nickg@ecoscentric.com> writes:
>> [...]
[...]
>> What's less constant-time in FIFO w.r.t. LIFO? Did I miss something?
>
> The LIFO code is straight-line with no tests. Its execution time is
> therefore very predictable. The FIFO code contains tests in both the
> post and call routines that introduce several different execution
> paths that depend on factors outside the control of the current
> caller. This introduces more jitter into the execution time. It also
> accesses more data in RAM, allowing more scope for cache effects to
> change execution time.
>
> In both cases an upper bound on the execution time can be calculated,
> so they are both deterministic.
>
> However, the difference in the number of cycles is small and any
> application that is sensitive to such a small jitter in DSR post/call
> times probably has other more important timing problems to deal with.

Thanks for the precise explanation, -- now I see what you mean.

>> > Most systems do not have interrupts occurring at the sort of rate that
>> > would make DSR queueing order make any difference.
>> 
>> I'm confused. Did you miss in my analysis that maximum DSR latency
>> doesn't depend on the rate of interrupts? It was in fact one of my
>> primary assumptions that interrupts are rare. I.e., FIFO wins at low
>> interrupt rates.
>
> The queueing order only ever matters when there is more than one DSR
> on the queue.

Sure.

> In the vast majority of systems, interrupts do not occur at such a
> high rate.

???

If you mean that in most systems the probability of more than one DSR
being active at any given time is low, I agree. However, the maximum
possible number of simultaneously posted DSRs is still not less than the
number of asynch interrupt sources in the system.

> Where multiple DSRs do get queued they are for unrelated interrupts
> (e.g. serial+timer+ethernet) and the order in which they are handled
> does not matter.

The order by itself doesn't matter, but the IRQ-to-DSR latency does
matter, since we are still talking about real-time systems, aren't we?
And it's in IRQ-to-DSR latency that FIFO wins, as I've hopefully shown in
my analysis.

The fact that FIFO schedules DSRs in a more "natural" order than LIFO is
IMHO nice, but it's not the primary advantage of FIFO and was not
taken into account in the analysis I've made; nevertheless FIFO won.

> What's more important is the order in which any subsequent threads
> execute, which is determined by priority.

By your own logic, the order of execution of otherwise independent
threads also should not matter. What does matter is minimizing
IRQ-to-thread latency, and it's true that either the highest-priority or
the longest-waiting thread should (and does) win this race. Sorry, but
I fail to see why you consider IRQ-to-thread latency to be essential and
IRQ-to-DSR latency not to be that essential.

> The order only really matters when the DSRs are related in some
> way.

Once again, I don't care about the order of DSRs by itself. The
essential thing is that the FIFO scheduling policy happens to minimize
the maximum IRQ-to-DSR latency, making it possible to meet some deadlines
that could be impossible to meet with the LIFO policy.

[...]
>> > So we should default to the very simplest approach and document the
>> > tradeoffs of each mechanism. Subsystems and drivers that really want
>> > FIFO queueing can always have a "requires" statement in their CDL for
>> > this option.
>> 
>> If the above indeed were the case, I'd have no objections, but I still
>> fail to see *real* trade-offs of FIFO w.r.t. LIFO.
>
> The main advantage of the LIFO approach over FIFO is its lower
> jitter. However, I admit that this is a relatively small effect.

Here I do agree. What I've been trying to say is that this
relatively small main advantage is probably too small to justify adding
yet another configuration option, but if you guys insist it must be there,
it's OK with me.

> I still think that the LIFO mechanism should be present as an
> option. However I would be reasonably happy to see the FIFO mechanism
> become the default. The code can be determined correct by inspection,
> and the runtime effects of using it are minimal at best.

At the risk of being considered too annoying, here is my last attempt to
change your mind. What would you say if I suggested a patch that adds a
CYGIMP_KERNEL_SCHED_LIFO_QUEUES option in addition to the existing
CYGIMP_KERNEL_SCHED_SORTED_QUEUES and the implicit default FIFO, using
your own arguments about the trade-offs of FIFO vs LIFO ;)

Well, I think we all in fact understand each other now, and whatever the
final decision is, it's OK with me.

Still, I wouldn't wish anybody to end up in a LIFO wait queue in
real life ;)

-- Sergei.


* [ECOS]  Re: DSR Scheduling Problem
  2006-02-15 19:57             ` Grant Edwards
@ 2006-02-16 14:08               ` Sergei Organov
  0 siblings, 0 replies; 33+ messages in thread
From: Sergei Organov @ 2006-02-16 14:08 UTC (permalink / raw)
  To: ecos-discuss

Grant Edwards <grante@visi.com> writes:
> In gmane.os.ecos.general, you wrote:
>
[...]
>> We just need to come to a reasonable decision after weighing
>> all pros and cons, -- that's why I have posted this in the
>> first place.
>
> As I've said, I think keeping the old DSR scheduler and adding
> an optional new one is the reasonable choice.  For the vast
> majority of applications it just doesn't matter,

What bothers me is that it's hard to tell whether it matters or not for a
given application. It could be that the unfortunate worst-case
behavior of LIFO just hasn't happened yet due to its low probability. It's
somewhat similar to having a race somewhere, -- it could work for years
then suddenly break next Friday the 13th :(

> and leaving the current one as the default has the lowest risk of
> breaking existing applications.  I guess I just don't think there are
> that many applications where FIFO has a measurable enough advantage to
> justify taking the risk.

And I have tried to perform the analysis, and it made me feel that almost
any application could be affected :( Well, we can believe that the
probability of the worst-case behavior is so small that we can safely
ignore it, but I still feel uneasy about it.

[...]
>>>> How will the user compare the choices in his tests when most of the
>>>> time the algorithms behave exactly the same?
>>>
>>> That's up to the user.
>>
>> Seems like putting on the user responsibilities he can't cope with :(
>
> How are we supposed to run/evaluate tests of the user's application?

The problem is that such problems are very difficult, if not
impossible, to find in tests. Even if an application behaves well for a
few days, it can still suddenly break tomorrow, -- that's what I meant, --
the user probably has less chance of finding the problem in his tests than
we have by analyzing the system :(

[...]
>> [What actually bothers me is why you don't care, -- do you in
>> fact still have a feeling that LIFO could be better in some
>> cases? I'd be thankful if you shared it with me if you do.]
>
> I have the feeling that for everything I've done, LIFO works
> just as well as FIFO would.  I'm convinced that changing to a
> new DSR scheduling scheme will be of no benefit to my
> applications and represents a small (but non-zero) risk.

In fact we are in roughly the same situation. Please tell me how you
managed to convince yourself that your applications aren't affected, since
after the analysis I still feel uneasy about my own application, which has
worked fine with LIFO so far.

To be unaffected for sure, either the application should have fewer than 3
asynch IRQ sources, or a DSR latency equal to the sum of the execution
times of all the ISRs and DSRs should not be a problem. If you have only
such applications, then indeed there is no reason to bother.
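
The rule of thumb above can be written down as a small check. This is only
a sketch under the assumptions stated in this message: struct irq_source,
the two function names and the microsecond figures in the note below are
hypothetical, and the bound is the pessimistic one just quoted (the sum of
all ISR and DSR execution times).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one asynchronous interrupt source. */
struct irq_source {
    unsigned isr_us;  /* worst-case ISR execution time, microseconds */
    unsigned dsr_us;  /* worst-case DSR execution time, microseconds */
};

/* Sum of the execution times of all ISRs and DSRs in the system. */
static unsigned total_isr_dsr_time_us(const struct irq_source *src, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += src[i].isr_us + src[i].dsr_us;
    return total;
}

/*
 * True when the application is unaffected for sure by the rule above:
 * fewer than 3 sources, or the summed time fits the latency budget.
 * False only means "needs a closer look", not "will fail".
 */
static bool lifo_surely_ok(const struct irq_source *src, size_t n,
                           unsigned tolerable_latency_us)
{
    if (n < 3)
        return true;
    return total_isr_dsr_time_us(src, n) <= tolerable_latency_us;
}
```

For example, three sources with ISR+DSR times of 45, 23 and 12 us sum to
80 us, which passes a 100 us budget and fails a 50 us one.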

> For example: We recently switched from the NetBSD TCP stack to
> the FreeBSD stack because the latter is what's recommended and
> what is being actively maintained.  There was a fringe benefit
> of somewhat lower CPU load and higher TCP/IP throughput.
>
> However, it broke our application in certain scenarios.  There
> was a bug in the FreeBSD stack.  It was fixed 6 years ago in
> the NetBSD stack, but never got fixed in the FreeBSD stack.  We
> now have a rather frustrated and irate customer and have spent
> quite a few hours duplicating the problem and tracking down the
> bug in the FreeBSD stack.

Well, we both know such things happen :( Sorry you've run into such
troubles. Though switching from one TCP stack implementation to another
is IMHO many orders of magnitude riskier than switching from LIFO to FIFO,
where things are much simpler and easier to understand.

>
> Change is risk.
>

Yes, my switch from 1.3.1 to 2.0 also resulted in hard-to-find breakage
(due to a CPU hardware bug that had been hidden by the 1.3.1 ARM HAL code
and was unveiled by the HAL re-implementation). Do I think the
old HAL should have been left there under an option?  No, I
don't. Progress requires changes, changes may break things, -- we are all
used to coping with it.

>>>> The only one I see is backward compatibility, but due to the fact
>>>> that eCos never specified the exact order of DSRs it shouldn't matter.
>>>
>>> Lots of things that shouldn't matter do.
>>
>> Yes, indeed.
>
>> I don't believe you really think that *every* change to eCos sources
>> should be put under yet another option as some "working" system
>> somewhere may break, right?
>
> I think that in general, new features or fundamental changes to
> existing features should be optional if possible.  Sometimes
> that's simply not practical, but I think it is in this case.

Well, in this particular case we still disagree, as the risk is IMHO
negligible, but my disagreement is not strong enough to continue arguing
against the new options.

Keeping LIFO the default may have another unfortunate effect. The issues
involved in selecting a particular option are so non-obvious that most
people will probably just leave the default, driven by the reasonable
rule "if you aren't sure what the option is about, leave it at the
default". I still think the safest option should be made the default, but I
do understand you probably won't agree with me.

-- Sergei.


* [ECOS] Re: DSR Scheduling Problem
  2006-02-13 14:51 Uwe Kindler
@ 2006-02-13 15:26 ` Grant Edwards
  0 siblings, 0 replies; 33+ messages in thread
From: Grant Edwards @ 2006-02-13 15:26 UTC (permalink / raw)
  To: ecos-discuss

In gmane.os.ecos.general, you wrote:

[...]

> When an ISR of a message buffer is called, then the next message buffer 
> will be enabled by this ISR. Disabling the active buffer and enabling 
> the next one is quite fast and guarantees that no message will be lost. 
> When such a burst of messages arrives it may happen, that a number of 
> message buffer ISRs will be called without any chance for a DSR to run. 
> This is not really a problem because we have up to 16 message buffers. 
> As soon as all messages are received the DSRs will run. And they will 
> run in LIFO order. So the interesting thing is, if we receive the CAN 
> messages with the following IDs:
>
> 0x001, 0x002, 0x003, 0x004, 0x005
>
> our application will  see the following:
>
> 0x005, 0x004, 0x003, 0x002, 0x001
>
> So the eCos DSR handling changes the order of the received data here.
> At the moment I have not run into any trouble with my CAN
> application and it seems not to be a problem but I still have
> a slightly bad feeling about this behaviour. It was quite
> tricky to write the CAN driver in a way that it works well
> with both the list and table implementations of DSR scheduling.

I'd probably use a single DSR that "knows" what order the
buffers are in.  I assume that the buffers are filled in a
predictable order?  Even if the hardware can't tell you which
buffers are full and which are empty, it should be relatively
simple to keep track in software if you need to, since it turns
into a standard "ring-buffer" interface between the ISRs and the
DSR.

Just put a loop in the DSR to process all of the "full"
buffers. This will also result in less overhead, since the DSR
will only be invoked once for a "burst" of CAN frames.
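
A rough sketch of that idea, assuming the buffers are filled in a fixed
round-robin order. Everything here is hypothetical (rx_isr, rx_dsr, the
mailbox array standing in for the hardware buffers); it is not the FlexCAN
driver and it leaves out overrun handling and ISR/DSR synchronization.

```c
#include <stdint.h>

#define NUM_BUFS 16u               /* 16 message buffers, as described */

static uint32_t mailbox[NUM_BUFS]; /* stand-in for hardware buffers */
static volatile uint32_t isr_count;/* buffers filled so far (ISR side) */
static uint32_t dsr_count;         /* buffers consumed so far (DSR side) */

/* ISR side: note that the next buffer in sequence has been filled. */
static void rx_isr(uint32_t can_id)
{
    mailbox[isr_count % NUM_BUFS] = can_id;
    isr_count++;
}

/*
 * DSR side: one invocation drains every full buffer in arrival order.
 * Copies at most max IDs into out and returns how many were copied.
 * Assumes fewer than NUM_BUFS frames are outstanding at any time.
 */
static unsigned rx_dsr(uint32_t *out, unsigned max)
{
    unsigned n = 0;
    while (n < max && dsr_count != isr_count) {
        out[n++] = mailbox[dsr_count % NUM_BUFS];
        dsr_count++;
    }
    return n;
}
```

Because the single DSR walks the buffers by its own counter, the order in
which the kernel happens to run queued DSRs no longer affects the order in
which frames reach the application.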

-- 
Grant Edwards                   grante             Yow!  Being a BALD HERO
                                  at               is almost as FESTIVE as a
                               visi.com            TATTOOED KNOCKWURST.


* [ECOS] Re: DSR Scheduling Problem
@ 2006-02-13 14:51 Uwe Kindler
  2006-02-13 15:26 ` Grant Edwards
  0 siblings, 1 reply; 33+ messages in thread
From: Uwe Kindler @ 2006-02-13 14:51 UTC (permalink / raw)
  To: ecos-discuss; +Cc: osv

Hello,

I followed the "DSR scheduling problem" thread with little interest - 
till now. In the last few days I improved the eCos CAN driver for the 
FlexCAN module of Motorola Coldfire processors and ran into trouble with 
DSR scheduling.

The FlexCAN module contains 16 message buffers for reception of CAN 
messages. Each buffer has its own interrupt vector. There is always only 
one message buffer active. That means, if a CAN message arrives, the 
message buffer that received this message will be locked immediately 
and the next message buffer will be enabled. So it is possible to read 
data from one buffer while listening for the next message with another 
buffer. In some situations there are bursts of CAN messages on the CAN 
bus - that is absolutely normal and does not mean that bus load is too high.

When an ISR of a message buffer is called, then the next message buffer 
will be enabled by this ISR. Disabling the active buffer and enabling 
the next one is quite fast and guarantees that no message will be lost. 
When such a burst of messages arrives it may happen, that a number of 
message buffer ISRs will be called without any chance for a DSR to run. 
This is not really a problem because we have up to 16 message buffers. 
As soon as all messages are received the DSRs will run. And they will 
run in LIFO order. So the interesting thing is, if we receive the CAN 
messages with the following IDs:

0x001, 0x002, 0x003, 0x004, 0x005

our application will  see the following:

0x005, 0x004, 0x003, 0x002, 0x001

So the eCos DSR handling changes the order of the received data here. At 
the moment I have not run into any trouble with my CAN application and it 
seems not to be a problem but I still have a slightly bad feeling about 
this behaviour. It was quite tricky to write the CAN driver in a way 
that it works well with both the list and table implementations of DSR 
scheduling.
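
The reordering can be reproduced with a toy model. This is only an
illustration, assuming one DSR is posted per received frame and that no
DSR runs until the burst is over; post_lifo and run_lifo are made-up names
and do not come from the eCos sources.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

static uint16_t pending[MAX_PENDING]; /* CAN IDs whose DSRs are queued */
static size_t   pending_n;

/* Called from each buffer's ISR: queue a DSR for this frame. */
static void post_lifo(uint16_t can_id)
{
    if (pending_n < MAX_PENDING)
        pending[pending_n++] = can_id;
}

/*
 * Run all queued DSRs newest-first, as a LIFO list does.
 * Records in seen the order in which the application gets the IDs
 * and returns how many were delivered.
 */
static size_t run_lifo(uint16_t *seen)
{
    size_t n = 0;
    while (pending_n > 0)
        seen[n++] = pending[--pending_n];
    return n;
}
```

Posting 0x001 through 0x005 and then running the queue delivers 0x005
first and 0x001 last, matching the sequence listed above.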

Regards,

Uwe Kindler
Softwareentwicklung

--

cetoni GmbH
Am Wiesenring 6
D-07554 Korbussen

Tel.: +49 (0) 36602 338 28
Fax:  +49 (0) 36602 338 11
uwe.kindler@cetoni.de
www.cetoni.de


* [ECOS] Re: DSR Scheduling Problem
  2006-01-13 23:01 [ECOS] " Jay Foster
@ 2006-01-13 23:38 ` Grant Edwards
  0 siblings, 0 replies; 33+ messages in thread
From: Grant Edwards @ 2006-01-13 23:38 UTC (permalink / raw)
  To: ecos-discuss

> The test begins by transmitting data, which is looped back to the receiver.
> It starts out with:
> 	TX ISR -> TX DSR
> 	TX ISR -> TX DSR
> 	...
> 	TX-ISR -> TX DSR
>
> Then I get the RX ISR during the TX DSR, which just schedules
> the RX DSR. However, the RX DSR does not run until 39 ms
> later,

And TX DSRs are running during that entire 39 ms?

> resulting in an overrun error.  During this time period, the
> TX ISR and TX DSR continue their work transmitting the
> remaining data.  After all of the data has been sent, THEN the
> RX DSR runs.

It appears you don't have enough CPU time to run all of the
DSRs you want in the allotted time.

> Looking at the code post_dsr() and call_dsr() in
> hal/common/current/src/drv_api.c, I noticed that the DSRs are
> queued at the head of the list, and dequeued also from the
> head of the list.

Yup.  DSRs are scheduled in a LIFO manner. 

> This seems wrong,

It seems to work for everybody else. ;)

> as it can (and apparently does) cause DSRs to get delayed by
> other DSRs that are queued later.  Seems like it would be
> better to queue them on the end of the list and dequeue them
> from the head of the list, so that the DSRs would get run in
> the order in which they are queued.

If the DSRs that you're scheduling require 150% of the
available CPU time, then something's going to fail.  

In your particular case, perhaps it is better to fail in manner
B than in manner A. But, very few eCos users have the option of
failing, so nobody put in much extra effort to make things fail
in manner B rather than in manner A.  

Did that make sense?

-- 
Grant Edwards                   grante             Yow!  I'm having an
                                  at               EMOTIONAL OUTBURST!! But,
                               visi.com            uh, WHY is there a WAFFLE
                                                   in my PAJAMA POCKET??


Thread overview: 33+ messages
2006-01-14  0:45 [ECOS] Re: DSR Scheduling Problem Jay Foster
2006-01-14  2:12 ` Grant Edwards
2006-01-14  3:04   ` Paul D. DeRocco
2006-01-14  3:40     ` Grant Edwards
2006-01-16  8:40       ` Daniel Néri
2006-01-16 10:36         ` Nick Garnett
2006-01-16 11:45           ` [ECOS] Generic 16x5x serial driver use of transmit FIFO (was: DSR Scheduling Problem) Daniel Néri
2006-01-16 12:23             ` Nick Garnett
2006-01-16 15:13         ` [ECOS] Re: DSR Scheduling Problem Grant Edwards
2006-01-17  9:43         ` Andrew Lunn
2006-01-16  8:27   ` Dirk Husemann
2006-01-16 15:11     ` Grant Edwards
2006-02-13 10:41   ` Sergei Organov
2006-02-15  2:06     ` Brett Delmage
2006-02-15  9:57       ` Sergei Organov
2006-02-15 13:23         ` Stefan Sommerfeld
2006-02-15 14:07           ` Sergei Organov
2006-02-15 14:14             ` Stefan Sommerfeld
2006-02-15 15:54           ` Grant Edwards
2006-02-15 15:53         ` Grant Edwards
2006-02-15 18:30           ` Nick Garnett
2006-02-15 19:30             ` Sergei Organov
2006-02-16 10:00               ` Nick Garnett
2006-02-16 13:09                 ` Sergei Organov
2006-02-15 19:36           ` Sergei Organov
2006-02-15 19:57             ` Grant Edwards
2006-02-16 14:08               ` Sergei Organov
2006-02-15 16:34         ` Brett Delmage
2006-01-14  8:23 ` Andrew Lunn
2006-01-16 10:27 ` Nick Garnett
  -- strict thread matches above, loose matches on Subject: below --
2006-02-13 14:51 Uwe Kindler
2006-02-13 15:26 ` Grant Edwards
2006-01-13 23:01 [ECOS] " Jay Foster
2006-01-13 23:38 ` [ECOS] " Grant Edwards
