[ECOS] Interrupt stacking issues

public inbox for ecos-discuss@sourceware.org

* [ECOS] Interrupt stacking issues
@ 2012-07-19  9:46 Alan Bowman
  2012-07-19  9:58 ` Martin Hans
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Alan Bowman @ 2012-07-19  9:46 UTC (permalink / raw)
  To: ecos-discuss

I am using a board based upon an STM3210E development board, only with
the processor clocked much slower (8MHz).  I am seeing problems with
interrupt handling when using the serial ports.  I believe that my
system should be fast enough to handle the data coming in.

Recently, I started getting assertions that a stack base was corrupt.
It turns out that the thread that is running when characters arrive at
the serial port has many instances of the ISR stacked one on top of the
other until the stack overflows.  I understand that the ISR uses the
stack of the running thread, but I can't see how the same ISR can be
called enough times to overflow the stack.

Since the stack check code is being asserted, I can see that the
PendSV bit is being set, and the scheduler is starting the context
switch - so the ISR itself is running to completion.  This also seems
to be quite timing dependent - when I have a setup that sees the
failure, it's quite consistent as soon as a modest amount of data is
written.  However, inserting a few lines of debugging code into ISRs
can make the issue (apparently) vanish.

Has anyone else had problems with recurring ISRs like this?  I'm not
sufficiently clear on the inner workings of the Cortex port to be able
to see if this is a problem with all interrupts, the serial ports or
something that I've done.  The stack dump doesn't enlighten me much
(beyond showing what seems to be repetition of the ISR call and
surrounding code), but I can post a printout if anyone thinks it would
help.

Thanks in advance

Alan Bowman

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* RE: [ECOS] Interrupt stacking issues
  2012-07-19  9:46 [ECOS] Interrupt stacking issues Alan Bowman
@ 2012-07-19  9:58 ` Martin Hans
       [not found]   ` <CAGT=Sv+C1C2Tqt05uLzphZfc=dCrhfOBBcDJwYg6uhKMFs1pKA@mail.gmail.com>
  2012-07-19 10:10 ` Manuel Borchers
  2012-07-19 15:46 ` Bernard Fouché
  2 siblings, 1 reply; 9+ messages in thread
From: Martin Hans @ 2012-07-19  9:58 UTC (permalink / raw)
  To: ecos-discuss

Hi Alan,
what are you using the serial port for?

I've had the problem that I would miss interrupts because I was using diag_printf. This function apparently disables interrupts. Switching to printf and configuring printf such that it is interrupt driven fixed that situation for me.

To configure printf to use interrupts, I enabled CYGPKG_IO_SERIAL_DEVICES and changed CYGDAT_IO_SERIAL_TTY_CONSOLE to /dev/tty0.

Hope this helps,
Martin


* Re: [ECOS] Interrupt stacking issues
  2012-07-19  9:46 [ECOS] Interrupt stacking issues Alan Bowman
  2012-07-19  9:58 ` Martin Hans
@ 2012-07-19 10:10 ` Manuel Borchers
  2012-07-19 10:22   ` Alan Bowman
  2012-07-19 15:46 ` Bernard Fouché
  2 siblings, 1 reply; 9+ messages in thread
From: Manuel Borchers @ 2012-07-19 10:10 UTC (permalink / raw)
  To: Alan Bowman; +Cc: ecos-discuss

Hi Alan,

Could this be related to
http://sourceware.org/ml/ecos-discuss/2011-05/msg00001.html ? You didn't
say which eCos version (3.0, CVS) and compiler you are using, so it's
just a guess on my part.

Cheers,
Manuel

On Thursday, 19.07.2012 at 10:46 +0100, Alan Bowman wrote:
> I am using a board based upon an STM3210E development board, only with
> the processor clocked much slower (8MHz).  I am seeing problems with
> interrupt handling when using the serial ports.  I believe that my
> system should be fast enough to handle the data coming in.

-- 
Manuel Borchers

Web: http://www.matronix.de
eMail: manuel@matronix.de


* [ECOS] Fwd: [ECOS] Interrupt stacking issues
       [not found]   ` <CAGT=Sv+C1C2Tqt05uLzphZfc=dCrhfOBBcDJwYg6uhKMFs1pKA@mail.gmail.com>
@ 2012-07-19 10:11     ` Alan Bowman
  0 siblings, 0 replies; 9+ messages in thread
From: Alan Bowman @ 2012-07-19 10:11 UTC (permalink / raw)
  To: ecos-discuss

Sorry - forgot to send it to the mailing list...

---------- Forwarded message ----------
From: Alan Bowman <alan.michael.bowman@gmail.com>
Date: 19 July 2012 11:10
Subject: Re: [ECOS] Interrupt stacking issues
To: Martin Hans <mha@sophion.com>

> what are you using the serial port for?

I'm sending commands to my board, and some of them run to hundreds of
characters long.  I'm also receiving responses back, although I've
never had any issue with that side of the link.  I have three ports in
use, and I think I've seen the same issue with all three.  It's more
repeatable with one, though, as I have more control over the data
being sent.

> I've had the problem that I would miss interrupts because I was using diag_printf. This function apparently disables interrupts. Switching to printf and configuring printf such that it is interrupt driven fixed that situation for me.

I don't think this is quite the same thing.  I don't think I'm missing
interrupts (at least, I'm definitely not when I have a "working"
configuration).  When it breaks, though, I have what seems to be the
same interrupt piled on top of itself a load of times until the stack
overflows.  It's like there's a place where the old ISR is still on
the stack when it gets re-enabled and run again... and again.. etc.
It's more likely the DSR/scheduler, rather than the ISR itself,
though.

Alan


* Re: [ECOS] Interrupt stacking issues
  2012-07-19 10:10 ` Manuel Borchers
@ 2012-07-19 10:22   ` Alan Bowman
  0 siblings, 0 replies; 9+ messages in thread
From: Alan Bowman @ 2012-07-19 10:22 UTC (permalink / raw)
  To: Manuel Borchers; +Cc: ecos-discuss

> Could this be related to
> http://sourceware.org/ml/ecos-discuss/2011-05/msg00001.html ? You didn't
> say which eCos version (3.0, CVS) and compiler you are using, so it's
> just a guess on my part.

Sorry, I should have specified.  I'm relatively near to top-of-trunk -
our tree doesn't yet have the support for the STM32F2 and STM32F4
families.  I'm using gcc-4.3.2 (downloaded from the eCos site) on
Cygwin.

The patch linked in that message has been included in our code, so I
assume it's not (directly) related to that.

Alan


* Re: [ECOS] Interrupt stacking issues
  2012-07-19  9:46 [ECOS] Interrupt stacking issues Alan Bowman
  2012-07-19  9:58 ` Martin Hans
  2012-07-19 10:10 ` Manuel Borchers
@ 2012-07-19 15:46 ` Bernard Fouché
  2012-07-19 16:28   ` Alan Bowman
  2 siblings, 1 reply; 9+ messages in thread
From: Bernard Fouché @ 2012-07-19 15:46 UTC (permalink / raw)
  To: ecos-discuss

On 19/07/2012 11:46, Alan Bowman wrote:
> I am using a board based upon an STM3210E development board, only with
> the processor clocked much slower (8MHz).  I am seeing problems with
> interrupt handling when using the serial ports.  I believe that my
> system should be fast enough to handle the data coming in.
>
Hello Alan,

I spent considerable time on ISR-related issues with the NXP LPC17XX
family (also a Cortex-M3). My target also clocks at 8MHz.

It seems you hit this problem:

- Because we clock slowly, we increase the likelihood of an IRQ being
triggered while the corresponding DSR code is running, or even before
the ISR was triggered.

- The Cortex-M3 keeps track of pending interrupts: processing an IRQ
does not clear the pending interrupt bit, and if this bit isn't cleared,
the core will trigger a new interrupt as soon as it can, even for the
same interrupt source.

- eCos does not provide a platform-independent way to clear the pending
interrupt bit. So if you use drivers that come from another
environment, these drivers will do nothing to handle the pending
interrupt bit.

- Clearing the pending interrupt bit must be done at the correct time. I
don't know about the STM3210E, but in my case the UART driver is based
on the generic serial driver, and I can't use the generic serial driver
as provided by eCos because of this pending interrupt bit problem. To
solve the issue, I reorganized the DSR as follows:

     for (;;) {
         - read interrupt sources (more than one can be triggered by
           the time the DSR code is scheduled)
         - if there is no interrupt to process, exit the loop
         - clear the pending interrupt bit
         - do the work related to the interrupts
     }

Of course the number of iterations is capped to avoid getting stuck in
the DSR. For instance, after a connection error on the hardware, such
as RTS being connected to a signal that changes state very frequently,
one could otherwise stay in the DSR forever.

- I had to modify the UART, CAN and SPI drivers in a similar way.

- I filed a lengthy bug report describing all of this
(http://bugs.ecos.sourceware.org/show_bug.cgi?id=1001456). I submitted a
patch that adds a supplementary call to the eCos kernel to handle the
interrupt pending bit problem, but it has been stuck in Bugzilla since
February because it's a kernel API change.
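
As an illustration of the reorganized DSR loop described above, here is
a minimal C sketch. It is not the code from the LPC17XX driver or from
the Bugzilla patch. The hardware is replaced by a small simulated
structure so the control flow can be run on a host, and every name in
it (fake_uart_t, read_interrupt_sources, clear_pending_bit,
service_sources, DSR_MAX_ITERATIONS and its value of 8) is a
hypothetical stand-in. A real driver would read the UART's interrupt
status register and clear the pending bit in the interrupt controller
instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt source bits for a simulated UART. */
#define UART_INT_RX (1u << 0)
#define UART_INT_TX (1u << 1)

/* Assumed cap on loop iterations; tune for the application. */
#define DSR_MAX_ITERATIONS 8u

/* Simulated device state standing in for real hardware registers. */
typedef struct {
    uint32_t int_sources;   /* currently asserted interrupt sources   */
    uint32_t stuck_sources; /* sources that re-assert after servicing */
    bool     nvic_pending;  /* simulated pending bit for this IRQ     */
    unsigned rx_handled;    /* number of RX events serviced           */
    unsigned tx_handled;    /* number of TX events serviced           */
} fake_uart_t;

static uint32_t read_interrupt_sources(const fake_uart_t *u)
{
    return u->int_sources;
}

static void clear_pending_bit(fake_uart_t *u)
{
    u->nvic_pending = false;
}

/* Service every source in 'sources'. In this model servicing deasserts
 * the source, except for sources marked as stuck, which come back. */
static void service_sources(fake_uart_t *u, uint32_t sources)
{
    if (sources & UART_INT_RX) {
        u->rx_handled++;
    }
    if (sources & UART_INT_TX) {
        u->tx_handled++;
    }
    u->int_sources &= ~sources;
    u->int_sources |= u->stuck_sources;
}

/* DSR body: loop until no source is asserted or the cap is reached.
 * Returns the number of iterations that serviced at least one source. */
static unsigned uart_dsr(fake_uart_t *u)
{
    unsigned iterations = 0;

    while (iterations < DSR_MAX_ITERATIONS) {
        uint32_t sources = read_interrupt_sources(u);
        if (sources == 0) {
            break;                    /* nothing left to process */
        }
        clear_pending_bit(u);         /* cleared before servicing */
        service_sources(u, sources);
        iterations++;
    }
    return iterations;
}
```

With RX and TX both asserted on entry, the loop services both in one
pass and exits on the second read. With a source that never deasserts,
the cap ends the loop after DSR_MAX_ITERATIONS passes instead of
spinning forever.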

Hope it helps!

    Bernard




* Re: [ECOS] Interrupt stacking issues
  2012-07-19 15:46 ` Bernard Fouché
@ 2012-07-19 16:28   ` Alan Bowman
  2012-07-19 16:58     ` Bernard Fouché
  0 siblings, 1 reply; 9+ messages in thread
From: Alan Bowman @ 2012-07-19 16:28 UTC (permalink / raw)
  To: Bernard Fouché; +Cc: ecos-discuss

> - I did a lengthy bug report describing all of this
> (http://bugs.ecos.sourceware.org/show_bug.cgi?id=1001456). I submitted a
> patch to add into eCos kernel a supplementary call to handle the interrupt
> pending bit problem, but it's stuck in bugzilla since February since it's a
> kernel API change..
>
> Hope it helps!

Wow, that's a lot of detail to sort through - I'm going to have to
think about that for a bit.  On the face of it, we may well have that
problem.  However, I haven't worked out how that matches our symptoms.
 I thought that all of the ISRs (and the PendSV/DSR) could only run
once at a time.  If an ISR triggers the PendSV, it will be off the
stack by the time PendSV runs.  It can then pre-empt the PendSV
exception (so both will appear on the stack at once), but only once.
The PendSV exception bit may well be set again, but it will have to
wait for the first one to finish (and presumably be off the stack)
before running again.  Lots of different ISRs could all pre-empt each
other, but I don't _think_ that's what's happening.

Did you see any issues like this when you were investigating, or were
unnecessary ISR/DSR calls the key issue?

Thanks very much

Alan


* Re: [ECOS] Interrupt stacking issues
  2012-07-19 16:28   ` Alan Bowman
@ 2012-07-19 16:58     ` Bernard Fouché
  2012-07-23  8:42       ` Alan Bowman
  0 siblings, 1 reply; 9+ messages in thread
From: Bernard Fouché @ 2012-07-19 16:58 UTC (permalink / raw)
  To: Alan Bowman; +Cc: ecos-discuss

On 19/07/2012 18:28, Alan Bowman wrote:
>
> Wow, that's a lot of detail to sort through - I'm going to have to
> think about that for a bit.  On the face of it, we may well have that
> problem.  However, I haven't worked out how that matches our symptoms.
>   I thought that all of the ISRs (and the PendSV/DSR) could only run
> once at a time.  If an ISR triggers the PendSV, it will be off the
> stack by the time PendSV runs.  It can then pre-empt the PendSV
> exception (so both will appear on the stack at once), but only once.
Your situation may be related to nested interrupts. I didn't look at
this potential issue since I didn't suffer from it :-)

For instance, if you set the UART interrupt priority to a lower value
than the one eCos uses to disable interrupts (the default is 8, from
what I recall, for the LPC17XX), then a UART interrupt can be triggered
even when you think you have disabled interrupts! If you have several
UARTs with different interrupt priority levels and all have a value < 8
(if 8 is the value set up by eCos in your case), then the ISR for one
UART can be triggered even while the ISR of another UART with a lower
priority is running, and this could explain what you see.

> The PendSV exception bit may well be set again, but it will have to
> wait for the first one to finish (and presumably be off the stack)
> before running again.  Lots of different ISRs could all pre-empt each
> other, but I don't _think_ that's what's happening.
>
> Did you see any issues like this when you were investigating, or were
> unnecessary ISR/DSR calls the key issue?
My problem was unnecessary ISR/DSR calls, and since my target clocks
slowly, I didn't want to waste any time or power on superfluous
processing. If the problem is related to interrupt priority, it's a
different story, but my guess is that you will also be interested in
the pending bit problem, because if you clock at 8MHz you may have
constraints similar to mine.
Bernard



* Re: [ECOS] Interrupt stacking issues
  2012-07-19 16:58     ` Bernard Fouché
@ 2012-07-23  8:42       ` Alan Bowman
  0 siblings, 0 replies; 9+ messages in thread
From: Alan Bowman @ 2012-07-23  8:42 UTC (permalink / raw)
  To: ecos-discuss

I think I've got some understanding of why I'm seeing my stack
overflow, and I'm not quite sure what to do about it.  I think that
this describes the circumstances:

1 A character arrives at the serial port, triggering the ISR.
2 The ISR runs to completion, signalling the PendSV handler.
3 The PendSV handler pushes a "fake" stack frame onto the stack of the
executing thread, adding hal_interrupt_end and hal_interrupt_end_done
into the backtrace.
4 PendSV exits, returning the processor to its normal mode, but
running hal_interrupt_end.
5 hal_interrupt_end ultimately results in DSRs being called.
6 Normal thread execution is resumed.

I think that the problem might occur if another interrupt arrives
while step 4 is occurring.  There is an extra frame on the stack, but
all the interrupts (and PendSV) have run and are capable of being
re-run.  If another interrupt arrives at this point, the ISR/PendSV
runs again, and _another_ fake frame is pushed to the stack.  A series
of evenly spaced interrupts (such as one would find in a serial
stream) will push fake frames to the stack until overflow.  I could
disable serial interrupts in the ISR and re-enable in the DSR, but
that defeats the object of the STM32 serial driver copying data into
an intermediate buffer in the ISR.  This would also be affected by any
other unfortunately timed interrupts.

If I've understood this right, one doesn't just need to size stacks
based on the number of simultaneously triggered ISRs that might occur,
but also on the number that might occur in a row one after another.
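
To put numbers on that sizing concern, here is a deliberately crude C
model of the hypothesis above. It is not derived from the eCos
Cortex-M HAL sources. It assumes that an interrupt arriving before the
frame left by the previous one has unwound adds one more frame, and
that an interrupt arriving after the unwind starts again from a single
frame. The function names and the byte counts in the example are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Peak number of stacked frames over a burst of n interrupts.
 * arrived_early[i] is true when interrupt i arrives before the frame
 * left by the previous interrupt has been unwound. */
static unsigned peak_frames(const bool *arrived_early, size_t n)
{
    unsigned depth = 0;
    unsigned peak = 0;

    for (size_t i = 0; i < n; i++) {
        if (!arrived_early[i]) {
            depth = 0;          /* earlier frames have already unwound */
        }
        depth++;                /* this interrupt pushes its own frame */
        if (depth > peak) {
            peak = depth;
        }
    }
    return peak;
}

/* True if 'frames' stacked frames of frame_bytes each need more space
 * than the free_stack_bytes left on the interrupted thread's stack. */
static bool stack_overflows(unsigned frames, size_t frame_bytes,
                            size_t free_stack_bytes)
{
    return (size_t)frames * frame_bytes > free_stack_bytes;
}
```

Under this model a burst in which every character arrives early has a
peak equal to the burst length, so with an assumed 64 bytes per frame
and 256 bytes of free stack, the fifth back-to-back character
overflows the stack.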

I really hope that either a) I've misunderstood the issue, or b) I've
missed an obvious fix.

Can anyone confirm that I've understood this, or suggest any options?

Thanks

Alan

