[ECOS] DSR stops running after heavy interrupts.

* [ECOS] DSR stops running after heavy interrupts.
@ 2006-04-05 21:09 Joe Porthouse
  2006-04-06  6:49 ` Andrew Lunn
  0 siblings, 1 reply; 25+ messages in thread
From: Joe Porthouse @ 2006-04-05 21:09 UTC (permalink / raw)
  To: 'eCos Discussion'

In a nutshell:
	The real time clock DSR stops getting called after several minutes
of heavy UART ISR traffic.  I have been running into this on and off for a
while.  Lowering the serial ISR priority seems to help some, but does not
eliminate the problem.

Background:
	Application is on a custom XScale PXA255 board without RedBoot.
When the problem occurs the real-time tick clock simply stops updating.  All
other aspects of the program seem to work correctly.  The real-time clock ISR
is still getting called, as are other ISRs, but the real-time clock DSR is
no longer called.

	In the vectors.S file I can step through the execution and see what
is happening.  On return from the ISR the return code is examined to
determine if a DSR call should be added to the DSR list.  This check is done
here:

        // The return value from the handler (in r0) will indicate whether a
        // DSR is to be posted. Pass this together with a pointer to the
        // interrupt object we have just used to the interrupt tidy up
        cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE
        beq     17f

	When the problem occurs, the branch (beq) is taken, which skips
adding the DSR to the list and ends the ISR.  I can see that R0 is correctly
0x03 but the branch is still taken.  The problem may be in how this is getting
compiled.  In my JTAG tool I see the above code as:

00008C5C e3740001   CMN       R4,#00000001
00008C60 0a000003   BEQ       00008c74 

	Obviously there is some assembler substitution going on.  I'm not
sure why, if the value is in r0, v1 is being checked (I am not familiar with
the "v" register notation).  I am also not sure why the resulting code refers
to R4.  R4 has a different value than R1 at this point in the execution.

	Any ideas on this?

Many thanks,
Joe Porthouse
Toptech Systems, Inc.




-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [ECOS] DSR stops running after heavy interrupts.
  2006-04-05 21:09 [ECOS] DSR stops running after heavy interrupts Joe Porthouse
@ 2006-04-06  6:49 ` Andrew Lunn
  2006-04-06  9:02   ` Stefan Sommerfeld
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Lunn @ 2006-04-06  6:49 UTC (permalink / raw)
  To: Joe Porthouse; +Cc: 'eCos Discussion'

On Wed, Apr 05, 2006 at 05:09:41PM -0400, Joe Porthouse wrote:
> 	Any ideas on this?

The fact that this works for a while and then breaks suggests that
something unusual is going on.

When the problem occurs, take a look at the actual contents of the memory
which contains these instructions. Has it been corrupted? Be careful
with your debugger here. If you just ask it to disassemble the code it
might show you what is in the ELF file, not what is in memory. Do a
hex dump and compare the machine code bytes.
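
For the byte comparison, it can help to decode the instruction word by
hand rather than trust the disassembler. The sketch below (C++, for
illustration only) pulls apart the word e3740001 from the JTAG listing
using the standard ARM data-processing encoding. The struct and function
names are made up for this example. The result (CMN, Rn = 4, immediate 1)
is what one would expect from "cmp v1,#CYGNUM_HAL_INTERRUPT_NONE" if v1 is
the APCS alias for r4 and the constant is -1: -1 cannot be encoded as a CMP
immediate, so the assembler emits the equivalent CMN with #1. That the
constant is -1 is an assumption inferred from the listing.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an ARM data-processing instruction with an immediate operand.
struct DataProcImm {
    unsigned cond;       // bits 31..28, 0xE = always
    unsigned opcode;     // bits 24..21, 0xA = CMP, 0xB = CMN
    bool set_flags;      // bit 20
    unsigned rn;         // bits 19..16, first operand register
    std::uint32_t imm;   // 8-bit value rotated right by 2 * rotate field
};

// Rotate a 32-bit value right by n bits.
inline std::uint32_t ror32(std::uint32_t v, unsigned n) {
    n &= 31u;
    return n == 0 ? v : (v >> n) | (v << (32u - n));
}

// Decode one instruction word. The caller must already know it is a
// data-processing instruction with the immediate bit (bit 25) set.
inline DataProcImm decode_data_proc_imm(std::uint32_t word) {
    assert(((word >> 26) & 0x3u) == 0u && ((word >> 25) & 0x1u) == 1u);
    DataProcImm d;
    d.cond = word >> 28;
    d.opcode = (word >> 21) & 0xFu;
    d.set_flags = ((word >> 20) & 0x1u) != 0;
    d.rn = (word >> 16) & 0xFu;
    d.imm = ror32(word & 0xFFu, 2u * ((word >> 8) & 0xFu));
    return d;
}
```

Calling decode_data_proc_imm(0xe3740001) gives opcode 0xB (CMN), rn 4 and
imm 1. If the word read back from target memory decodes differently, that
would point at corruption rather than at the assembler.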

        Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [ECOS] DSR stops running after heavy interrupts.
  2006-04-06  6:49 ` Andrew Lunn
@ 2006-04-06  9:02   ` Stefan Sommerfeld
  2006-04-06 21:09     ` Joe Porthouse
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Sommerfeld @ 2006-04-06  9:02 UTC (permalink / raw)
  To: ecos-discuss

Hi,

> The fact that this works for a while and then breaks suggests that
> something unusual is going on.
>
> When the problem occurs, take a look at the actual contents of the memory
> which contains these instructions. Has it been corrupted? Be careful
> with your debugger here. If you just ask it to disassemble the code it
> might show you what is in the ELF file, not what is in memory. Do a
> hex dump and compare the machine code bytes.

I have had that experience too, and not just with the UART (I'm using a
PXA270). It currently works better with my FIFO DSRs, but please keep
investigating so that this can be fixed once and for all.

Bye...


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [ECOS] DSR stops running after heavy interrupts.
  2006-04-06  9:02   ` Stefan Sommerfeld
@ 2006-04-06 21:09     ` Joe Porthouse
  2006-04-06 21:19       ` Andrew Lunn
  0 siblings, 1 reply; 25+ messages in thread
From: Joe Porthouse @ 2006-04-06 21:09 UTC (permalink / raw)
  To: ecos-discuss

Stefan, thanks.  I'm glad to know I'm not the only one experiencing this
problem.

I have made a little more progress.

I still can't explain the code from my first message that checks the return
value from the ISR, but I believe it is somehow working correctly.  I still
believe there may be a problem with R4 being checked instead of R0.  I did
verify that the memory was the same as my code window, as well as the flash
image.

This is what I did find.

DSR calls are being added to the table... thousands of them... just not
getting serviced.  All the calls that lead to "call_pending_DSRs" seem to
originate from the unlock_inner() routine.  This routine stops getting
called when the problem occurs.  (You can see the logic below.)


inline void Cyg_Scheduler::unlock() 
{ 
    // This is an inline wrapper for the real scheduler unlock function in 
    // Cyg_Scheduler::unlock_inner(). 
        
    // Only do anything if the lock is about to go zero, otherwise we simply
    // decrement and return. As with lock() we do not need any special code
    // to decrement the lock counter.
     
    CYG_INSTRUMENT_SCHED(UNLOCK,get_sched_lock(),0); 
          
    HAL_REORDER_BARRIER(); 
          
    cyg_ucount32 __lock = get_sched_lock() - 1; 
         
    if( __lock == 0 )
      unlock_inner(0); 
    else
      set_sched_lock(__lock); 
   
    HAL_REORDER_BARRIER(); 
}

Upon examination the __lock value is "6" when unlock() is called at the end
of the ISR, thus unlock_inner() never gets called.  If I set the variable
read by get_sched_lock() back to 1, my DSR calls resume.
Mmmmmmm....
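
To make the effect concrete, here is a minimal sketch of why one leaked
lock is enough to stall every DSR. It is a toy model in C++, not the eCos
Cyg_Scheduler class: ToyScheduler and its members are invented names, and
the stand-in for unlock_inner() only counts DSRs instead of running them.

```cpp
#include <cassert>

// Toy stand-in for the scheduler lock; not the eCos Cyg_Scheduler class.
struct ToyScheduler {
    unsigned sched_lock = 0;    // nesting depth of the scheduler lock
    unsigned pending_dsrs = 0;  // DSRs posted but not yet run
    unsigned dsrs_run = 0;      // DSRs that have actually executed

    void lock() { ++sched_lock; }

    // Same shape as unlock() above: only the final unlock does real work.
    void unlock() {
        assert(sched_lock > 0);
        unsigned new_lock = sched_lock - 1;
        if (new_lock == 0) {
            unlock_inner();
        } else {
            sched_lock = new_lock;
        }
    }

    // Stand-in for unlock_inner(): drain pending DSRs, then release the lock.
    void unlock_inner() {
        dsrs_run += pending_dsrs;
        pending_dsrs = 0;
        sched_lock = 0;
    }

    // One interrupt: take the lock, post a DSR, drop the lock again.
    void interrupt_with_dsr() {
        lock();
        ++pending_dsrs;
        unlock();
    }
};
```

With a balanced lock count each interrupt_with_dsr() call runs its DSR
straight away. After a single extra lock() with no matching unlock(), the
count never drops back to zero, so pending_dsrs grows without bound while
dsrs_run stays where it was, which matches the "thousands of them" symptom.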

So somehow locks are being taken without matching unlocks.  I am at a loss to
figure out how this is occurring since I do not make lock calls in any of my
code.  Could interrupt preemption somehow be occurring?  Do the
hal_disable/enable interrupt calls mess with the lock?

Any good ideas on how to track this down?


Joe Porthouse


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [ECOS] DSR stops running after heavy interrupts.
  2006-04-06 21:09     ` Joe Porthouse
@ 2006-04-06 21:19       ` Andrew Lunn
  2006-04-08  4:18         ` [ECOS] DSR stops running after heavy interrupts. Bug found? Joe Porthouse
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Lunn @ 2006-04-06 21:19 UTC (permalink / raw)
  To: Joe Porthouse; +Cc: ecos-discuss

On Thu, Apr 06, 2006 at 05:08:45PM -0400, Joe Porthouse wrote:
> Any good ideas on how to track this down?

Kernel instrumentation. 

CYG_INSTRUMENT_SCHED(UNLOCK,get_sched_lock(),0); 

Locks and unlocks are logged. See if you can find a case of a lock
without an unlock.
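
Once the log has been captured, the search for an unmatched lock can be
done offline. The following is only a sketch of that search in C++. It
assumes the log has already been reduced to a plain sequence of lock and
unlock events and that locks nest strictly. SchedEvent and unmatched_locks
are hypothetical names and do not reflect the real instrumentation buffer
layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, already-decoded trace events.
enum class SchedEvent { Lock, Unlock };

// Return the indices of Lock events that never get a matching Unlock,
// pairing events like brackets (strict nesting assumed).
inline std::vector<std::size_t> unmatched_locks(
        const std::vector<SchedEvent>& trace) {
    std::vector<std::size_t> open;  // indices of locks still held
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (trace[i] == SchedEvent::Lock) {
            open.push_back(i);
        } else if (!open.empty()) {
            open.pop_back();        // this unlock closes the latest lock
        }
    }
    return open;
}
```

Any index left in the result marks a lock that was still held when the
trace ended. Those are the places to start looking for the caller that
forgot to unlock.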

        Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [ECOS] DSR stops running after heavy interrupts. Bug found?
  2006-04-06 21:19       ` Andrew Lunn
@ 2006-04-08  4:18         ` Joe Porthouse
  2006-04-09 12:33           ` Andrew Lunn
  0 siblings, 1 reply; 25+ messages in thread
From: Joe Porthouse @ 2006-04-08  4:18 UTC (permalink / raw)
  To: ecos-discuss

Found it!!!  

It took two days to figure out what was happening but I think I have a
handle on it.  See if this sounds right.

After an ISR executes, if there is an associated DSR to execute, the DSR is
added to the DSR list and a scheduler lock is made.  Since DSRs are run with
interrupts enabled, the scheduler lock will prevent the application code
from running until all DSRs finish and release each of the scheduler locks.
After adding the DSR to the list, if there is only one scheduler lock (the
one just added), then a call must be made to start the first DSR executing.
If more than one scheduler lock is in place, then execution must resume from
where it left off (DSR or other critical section).  The DSR will start after
the next scheduler unlock is called.

If the ISR does not have an associated DSR, nothing is added to the DSR list
and the scheduler lock is not made, allowing the application or DSR to
resume when the ISR finishes.

The problem is in the /hal/arm/arch/current/src/vectors.S file at line 951.

  // The return value from the handler (in r0) will indicate whether a 
  // DSR is to be posted. Pass this together with a pointer to the
  // interrupt object we have just used to the interrupt tidy up routine.

  // don't run this for spurious interrupts!
  cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE   <-- Incorrectly references R4

  cmp     r0,#CYGNUM_HAL_INTERRUPT_NONE   <-- Change to this

The wrong register is referenced to determine if the ISR has a DSR to add to
the DSR list.  Since any value in R4 other than 0x0001 will call the
routines to add a DSR, and I assume most ISRs have a DSR, the default
behavior seems to work by chance in most configurations.

In my application my ISR does NOT have an associated DSR.  Even though the
correct 0x0001 is returned by the ISR, the call to add the DSR is still
made.  This includes performing a scheduler lock since it expects to release
it after the DSR runs, but there is no DSR.  I believe there is some type of
race condition here that allows the lock to not be released correctly since
there is no corresponding DSR in the DSR list.

Modifying only the above line has so far completely solved my issue of
losing DSR execution.

Can someone review the proposed change, and if warranted, add it into
CVS?  This problem could affect any ARM eCos application.  Since "v1"
may have correctly referenced "r0" at some time in the past, the other half
dozen "v1, v2...v6" references in vectors.S could also be incorrect.

Joe Porthouse
Toptech Systems, Inc.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [ECOS] DSR stops running after heavy interrupts. Bug found?
  2006-04-08  4:18         ` [ECOS] DSR stops running after heavy interrupts. Bug found? Joe Porthouse
@ 2006-04-09 12:33           ` Andrew Lunn
  2006-04-10  4:50             ` [ECOS] " Sergei Organov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Lunn @ 2006-04-09 12:33 UTC (permalink / raw)
  To: Joe Porthouse; +Cc: ecos-discuss

On Sat, Apr 08, 2006 at 12:18:24AM -0400, Joe Porthouse wrote:
> The problem is in the /hal/arm/arch/current/src/vectors.S file at line 951.
> 
>   // The return value from the handler (in r0) will indicate whether a 
>   // DSR is to be posted. Pass this together with a pointer to the
>   // interrupt object we have just used to the interrupt tidy up routine.
> 
>   // don't run this for spurious interrupts!
>   cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE   <-- Incorrectly references R4
> 
>   cmp     r0,#CYGNUM_HAL_INTERRUPT_NONE   <-- Change to this
> 
> The wrong register is referenced to determine if the ISR has a DSR to add to
> the DSR list.  Since any value in R4 other than 0x0001 will call the
> routines to add a DSR, and I assume most ISRs have a DSR, the default
> behavior seems to work by chance in most configurations.

I don't think this is the correct interpretation of this code.

Line 914:

        bl      hal_IRQ_handler         // determine interrupt source
        mov     v1,r0                   // returned vector #

So the vector is now in both r0 and v1 (== r4).

#if defined(CYGPKG_KERNEL_INSTRUMENT) && \
    defined(CYGDBG_KERNEL_INSTRUMENT_INTR)
        ldr     r0,=RAISE_INTR          // arg0 = type = INTR,RAISE
        mov     r1,v1                   // arg1 = vector
        mov     r2,#0                   // arg2 = 0
        bl      cyg_instrument          // call instrument function
#endif

        ARM_MODE(r0,10)

        mov     r0,v1                   // vector #

The code above destroys the vector value in r0. Reload it from v1.

#if defined(CYGDBG_HAL_DEBUG_GDB_CTRLC_SUPPORT) \
    || defined(CYGDBG_HAL_DEBUG_GDB_BREAK_SUPPORT)
        // If we are supporting Ctrl-C interrupts from GDB, we must squirrel
        // away a pointer to the save interrupt state here so that we can
        // plant a breakpoint at some later time.

       .extern  hal_saved_interrupt_state
        ldr     r2,=hal_saved_interrupt_state
        str     v6,[r2]
#endif

        cmp     r0,#CYGNUM_HAL_INTERRUPT_NONE   // spurious interrupt
        bne     10f

Here we check if the vector returned indicates there is a spurious
interrupt. If it is not we jump over this next bit of code.

#ifdef  CYGIMP_HAL_COMMON_INTERRUPTS_IGNORE_SPURIOUS
        // Acknowledge the interrupt
        THUMB_CALL(r1,12,hal_interrupt_acknowledge)
#else
        mov     r0,v6                   // register frame
        THUMB_CALL(r1,12,hal_spurious_IRQ)
#endif // CYGIMP_HAL_COMMON_INTERRUPTS_IGNORE_SPURIOUS
        b       spurious_IRQ

This cleans up the interrupt controller after the spurious interrupt. v1
still contains the interrupt vector, i.e. CYGNUM_HAL_INTERRUPT_NONE.
        
10:     ldr     r1,.hal_interrupt_data
        ldr     r1,[r1,v1,lsl #2]       // handler data
        ldr     r2,.hal_interrupt_handlers
        ldr     v3,[r2,v1,lsl #2]       // handler (indexed by vector #)
        mov     r2,v6                   // register frame (this is necessary
                                        // for the ISR too, for ^C detection)

Calculates the address of the function to call. 

#ifdef __thumb__
        ldr     lr,=10f
        bx      v3                      // invoke handler (thumb mode)
        .pool
        .code   16
        .thumb_func
IRQ_10T:
10:     ldr     r2,=15f
        bx      r2                      // switch back to ARM mode
        .pool
        .code   32
15:
IRQ_15A:
#else
        mov     lr,pc                   // invoke handler (call indirect
        mov     pc,v3                   // thru v3)
#endif

Calls the handler for the vector, in either Thumb or ARM mode. v1 (== r4)
still contains the vector.

#ifdef CYGIMP_HAL_COMMON_INTERRUPTS_USE_INTERRUPT_STACK
        // If we are returning from the last nested interrupt, move back
        // to the thread stack. interrupt_end() must be called on the
        // thread stack since it potentially causes a context switch.
        ldr     r2,.irq_level
        ldr     r3,[r2]
        subs    r1,r3,#1
        str     r1,[r2]
        ldreq   sp,[sp]         // This should be the saved stack pointer
#endif
        // The return value from the handler (in r0) will indicate whether a 
        // DSR is to be posted. Pass this together with a pointer to the
        // interrupt object we have just used to the interrupt tidy up routine.

        // don't run this for spurious interrupts!
        cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE
        beq     17f

v1 is still the vector. If the vector indicates a spurious interrupt
we don't have a DSR to call so skip the next bit of code.

        ldr     r1,.hal_interrupt_objects
        ldr     r1,[r1,v1,lsl #2]
        mov     r2,v6           // register frame

        THUMB_MODE(r3,10)
        
        bl      interrupt_end   // post any bottom layer handler
                                // threads and call scheduler

Now run the DSRs if possible, etc.

        ARM_MODE(r1,10)
17:

//      mov     r0,sp
//      bl      show_frame_out

	// return from IRQ is same as return from exception
	b	return_from_exception        

So I think the comparison with v1 is correct. However, from what you
are saying it sounds like there needs to be another comparison
afterwards. Something like:

        and     r0,r0,#2 // CYG_ISR_CALL_DSR
        beq     17f


        Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ECOS]  Re: DSR stops running after heavy interrupts. Bug found?
  2006-04-09 12:33           ` Andrew Lunn
@ 2006-04-10  4:50             ` Sergei Organov
  2006-04-10  9:36               ` Nick Garnett
  0 siblings, 1 reply; 25+ messages in thread
From: Sergei Organov @ 2006-04-10  4:50 UTC (permalink / raw)
  To: ecos-discuss

Andrew Lunn <andrew@lunn.ch> writes:

[...]

> However, from what you are saying it sounds like there needs to be
> another comparison afterwards. Something like:
>
>         and     r0,r0,#2 // CYG_ISR_CALL_DSR
>         beq     17f

No, bit checking of the ISR return value is performed inside the
interrupt_end() routine:

    if( isr_ret & Cyg_Interrupt::CALL_DSR && intr != NULL ) intr->post_dsr()
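
As a small illustration of that test, the sketch below treats the ISR
return value as a bit mask. The value 2 for the call-DSR bit is taken from
the CYG_ISR_CALL_DSR comment quoted above. The value 1 for the handled bit
is an assumption that fits the 0x03 and 0x0001 return values reported
earlier in the thread. IsrResult and should_post_dsr are names invented for
this example.

```cpp
// Bit values as discussed in this thread; ISR_HANDLED = 1 is an assumption.
enum IsrResult : unsigned {
    ISR_HANDLED  = 1u,  // interrupt was handled by this ISR
    ISR_CALL_DSR = 2u   // ask for the DSR to be posted
};

// Stand-in for the check quoted from interrupt_end(): a DSR is posted only
// when the CALL_DSR bit is set and there is an interrupt object to post to.
// The explicit parentheses match how "a & b && c" already parses.
inline bool should_post_dsr(unsigned isr_ret, const void* intr) {
    return (isr_ret & ISR_CALL_DSR) != 0u && intr != nullptr;
}
```

So an ISR returning 0x03 gets its DSR posted, while one returning 0x01
still goes through interrupt_end() but posts nothing.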

-- Sergei.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Bug found?
  2006-04-10  4:50             ` [ECOS] " Sergei Organov
@ 2006-04-10  9:36               ` Nick Garnett
  2006-04-10 10:44                 ` Sergei Organov
                                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Nick Garnett @ 2006-04-10  9:36 UTC (permalink / raw)
  To: Sergei Organov; +Cc: ecos-discuss

 Sergei Organov <osv@javad.com> writes:

> Andrew Lunn <andrew@lunn.ch> writes:
> 
> [...]
> 
> > However, from what you are saying it sounds like there needs to be
> > another comparison afterwards. Something like:
> >
> >         and     r0,r0,#2 // CYG_ISR_CALL_DSR
> >         beq     17f
> 
> No, bit checking of the ISR return value is performed inside the
> interrupt_end() routine:
> 
>     if( isr_ret & Cyg_Interrupt::CALL_DSR && intr != NULL ) intr->post_dsr()

Exactly. And there are other housekeeping things that go on in
interrupt_end() which cannot be skipped. The most important of these
is decrementing the scheduler lock.

I don't really see how the original poster's problem is fixed by
trying to skip interrupt_end(), I would only expect doing that to
aggravate the problem. The scheduler lock is acquired early in
interrupt processing -- before the ISR is called and we know whether
there is a DSR to call. interrupt_end() decrements the scheduler lock
and as a side-effect may cause any DSRs to be called.
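
The ordering described here can be sketched as a toy model, which also
shows why branching past interrupt_end() would make things worse. This is
an illustration in C++ under simplifying assumptions, not the eCos sources:
ToyKernel, its members and the skip_end flag are invented, and DSRs are
only counted rather than run.

```cpp
#include <cassert>

// Toy model of the interrupt path described above; not the eCos sources.
struct ToyKernel {
    unsigned sched_lock = 0;
    unsigned pending_dsrs = 0;
    unsigned dsrs_run = 0;

    // Stand-in for interrupt_end(): post the DSR if requested, then always
    // drop the lock taken on interrupt entry. DSRs run when it reaches zero.
    void interrupt_end(bool call_dsr) {
        if (call_dsr) ++pending_dsrs;
        assert(sched_lock > 0);
        if (--sched_lock == 0) {
            dsrs_run += pending_dsrs;
            pending_dsrs = 0;
        }
    }

    // One interrupt. skip_end models branching past interrupt_end().
    void interrupt(bool call_dsr, bool skip_end) {
        ++sched_lock;  // taken before the ISR runs
        // ... the ISR itself would run here ...
        if (!skip_end) interrupt_end(call_dsr);
    }
};
```

In this model an interrupt whose ISR requests no DSR still has to reach
interrupt_end() to give back the lock. Skipping it once leaves sched_lock
at 1, and every DSR posted afterwards stays pending.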

As Andrew has suggested, I think Joe's best way of working out what is
happening is to switch on instrumentation and see if he can track down
the extra increments of the scheduler lock.

-- 
Nick Garnett                                 eCos Kernel Architect
http://www.ecoscentric.com            The eCos and RedBoot experts


* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Bug  found?
  2006-04-10  9:36               ` Nick Garnett
@ 2006-04-10 10:44                 ` Sergei Organov
  2006-04-10 10:59                   ` Nick Garnett
  2006-04-10 16:41                 ` [ECOS] Re: DSR stops running after heavy interrupts. Spurious Interrupt! Joe Porthouse
  2006-04-13  7:58                 ` [ECOS] How to use the ARM directive DCB in Vectors.S Birahim Larou Fall
  2 siblings, 1 reply; 25+ messages in thread
From: Sergei Organov @ 2006-04-10 10:44 UTC (permalink / raw)
  To: ecos-discuss; +Cc: Nick Garnett

Nick Garnett <nickg@ecoscentric.com> writes:
>  Sergei Organov <osv@javad.com> writes:
>
>> Andrew Lunn <andrew@lunn.ch> writes:
>> 
>> [...]
>> 
>> > However, from what you are saying it sounds like there needs to be
>> > another comparison afterwards. Something like:
>> >
>> >         and     r0,r0,#2 // CYG_ISR_CALL_DSR
>> >         beq     17f
>> 
>> No, bit checking of the ISR return value is performed inside the
>> interrupt_end() routine:
>> 
>>     if( isr_ret & Cyg_Interrupt::CALL_DSR && intr != NULL ) intr->post_dsr()
>
>
> Exactly. And there are other housekeeping things that go on in
> interrupt_end() which cannot be skipped. The most important of these
> is decrementing the scheduler lock.
>
> I don't really see how the original poster's problem is fixed by
> trying to skip interrupt_end(), I would only expect doing that to
> aggravate the problem. The scheduler lock is acquired early in
> interrupt processing -- before the ISR is called and we know whether
> there is a DSR to call. interrupt_end() decrements the scheduler lock
> and as a side-effect may cause any DSRs to be called.

A little OT while we are at interrupt_end(). Could you please explain
why

#ifdef CYGPKG_KERNEL_SMP_SUPPORT
    Cyg_Scheduler::lock();
#endif

is there at the beginning? It looks like an extra scheduler lock without
a corresponding unlock for the SMP case. If it is not a bug, a comment
would be nice to have there.

-- Sergei.


* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Bug  found?
  2006-04-10 10:44                 ` Sergei Organov
@ 2006-04-10 10:59                   ` Nick Garnett
  2006-04-10 11:15                     ` Sergei Organov
  0 siblings, 1 reply; 25+ messages in thread
From: Nick Garnett @ 2006-04-10 10:59 UTC (permalink / raw)
  To: Sergei Organov; +Cc: ecos-discuss

Sergei Organov <osv@javad.com> writes:

> A little OT while we are at interrupt_end(). Could you please explain
> why
> 
> #ifdef CYGPKG_KERNEL_SMP_SUPPORT
>     Cyg_Scheduler::lock();
> #endif
> 
> is there at the beginning, -- looks like extra scheduler lock without
> corresponding unlock for SMP case. If not a bug, it seems a comment
> would be nice to have there.

In SMP configurations we don't want to claim the scheduler lock in the
interrupt VSR because it would block interrupts and scheduler
operations on other CPUs. It also requires a spinlock to be claimed,
which would require special code to be written -- it's much easier to
do the job later. In HALs where SMP is supported, the usual scheduler
lock increment is ifdeffed out.

Perhaps a comment would be useful, but it seemed like the ifdef
surrounding it would be sufficient indication that this was for SMP
only. 

-- 
Nick Garnett                                 eCos Kernel Architect
http://www.ecoscentric.com            The eCos and RedBoot experts


* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Bug   found?
  2006-04-10 10:59                   ` Nick Garnett
@ 2006-04-10 11:15                     ` Sergei Organov
  2006-04-10 13:20                       ` Joe Porthouse
  0 siblings, 1 reply; 25+ messages in thread
From: Sergei Organov @ 2006-04-10 11:15 UTC (permalink / raw)
  To: ecos-discuss

Nick Garnett <nickg@ecoscentric.com> writes:

> Sergei Organov <osv@javad.com> writes:
>
>> A little OT while we are at interrupt_end(). Could you please explain
>> why
>> 
>> #ifdef CYGPKG_KERNEL_SMP_SUPPORT
>>     Cyg_Scheduler::lock();
>> #endif
>> 
>> is there at the beginning, -- looks like extra scheduler lock without
>> corresponding unlock for SMP case. If not a bug, it seems a comment
>> would be nice to have there.
>
> In SMP configurations we don't want to claim the scheduler lock in the
> interrupt VSR because it would block interrupts and scheduler
> operations on other CPUs. It also requires a spinlock to be claimed,
> which would require special code to be written -- it's much easier to
> do the job later. In HALs where SMP is supported, the usual scheduler
> lock increment is ifdeffed out.

Ah, now I see, thanks. It seems non-SMP targets could benefit from
this approach as well, couldn't they? Or is there some fundamental
difference here?

I just think that the SMP variant makes some things better even for the
single-CPU case, so it could be a good idea to use the SMP variant for
the single-CPU case in those places. Fewer ifdefs would be another gain.

>
> Perhaps a comment would be useful, but it seemed like the ifdef
> surrounding it would be sufficient indication that this was for SMP
> only.

Please try to look at it from the POV of a reader of
interrupt_end(): it's clear that it's for SMP only, but it's
absolutely unclear why SMP requires one more scheduler lock. When I
looked at it, I failed to find the corresponding unlock(), but didn't pay
much attention as I'm not currently interested in SMP.

I believe your above "In SMP configurations we..." phrase would be a
nice comment for this piece of code.

-- Sergei.


* RE: [ECOS]  Re: DSR stops running after heavy interrupts. Bug   found?
  2006-04-10 11:15                     ` Sergei Organov
@ 2006-04-10 13:20                       ` Joe Porthouse
  0 siblings, 0 replies; 25+ messages in thread
From: Joe Porthouse @ 2006-04-10 13:20 UTC (permalink / raw)
  To: ecos-discuss

All,
Many thanks for the replies.

I now see my misunderstanding on the intent of the vectors.S line 951,
   cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE

I also see the bit check in the interrupt_end routine for the isr_ret value.

I am still at a loss for why my change solved my issue.  I do believe it is
an issue with servicing an ISR that does not have a DSR.

I am still debugging.

BTW, where is the initial scheduler lock performed when an interrupt is
generated?

Joe Porthouse
Toptech Systems, Inc.


* RE: [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-10  9:36               ` Nick Garnett
  2006-04-10 10:44                 ` Sergei Organov
@ 2006-04-10 16:41                 ` Joe Porthouse
  2006-04-10 17:20                   ` Nick Garnett
  2006-04-13  7:58                 ` [ECOS] How to use the ARM directive DCB in Vectors.S Birahim Larou Fall
  2 siblings, 1 reply; 25+ messages in thread
From: Joe Porthouse @ 2006-04-10 16:41 UTC (permalink / raw)
  To: ecos-discuss

Nick,
Thanks for your reply.

You're right.  Not calling the interrupt_end() routine would cause the lock
not to be released.  After your comment I started looking closer at my
modification.

My original modification of:
/hal/arm/arch/current/src/vectors.S file at line 951.
  cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE   <-- from this
  cmp     r0,#CYGNUM_HAL_INTERRUPT_NONE   <-- to this

v1 originally contained the interrupt vector.  But I mistakenly believed
this was the check of the return value from the ISR.  I modified it to look
at r0, the return value from the isr. The return value from the isr will be
0-3 (really 1 or 3).  The CYGNUM_HAL_INTERRUPT_NONE is -1! (for some reason
I thought it was +1 looking at the assembly listing) 
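
The +1 in the listing comes from the encoding: -1 (0xFFFFFFFF) cannot be expressed as an ARM data-processing immediate, so the assembler emits CMN (compare negative) with #1 instead, and v1 is simply the procedure-call-standard name for r4. A small sketch of why the two forms are equivalent; the function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define INTERRUPT_NONE (-1)  /* CYGNUM_HAL_INTERRUPT_NONE, per this thread */

/* What the source line asks: "cmp v1,#-1 ; beq ...". */
bool is_none_cmp(int32_t vector)
{
    return vector == INTERRUPT_NONE;
}

/* What the emitted "cmn r4,#1 ; beq ..." computes: add 1 (modulo 2^32)
   and test the result for zero. */
bool is_none_cmn(int32_t vector)
{
    return (uint32_t)vector + 1u == 0u;
}
```

Both functions agree for every input, so the branch is taken only when the decoded vector is -1, never for an ISR return value such as 1 or 3.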

Bottom line, my modification made sure that interrupt_end() would always be
called, even when v1 == CYGNUM_HAL_INTERRUPT_NONE (spurious interrupt).

I just did a quick test with the original code and verified that when a
spurious interrupt occurs, the interrupt_end() routine is not called and the
lock is not released and my problem occurs.
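
The failure can be reproduced in a toy model. This is only a sketch under simplifying assumptions (one CPU, no nesting, DSRs drained whenever the lock count returns to zero); the type and function names are hypothetical and the vector number 5 used in the usage note is arbitrary.

```c
#include <stdbool.h>

#define VECTOR_NONE (-1)

typedef struct {
    int sched_lock;     /* scheduler lock count; DSRs run only at zero */
    int dsrs_pending;   /* DSRs posted but not yet run */
    int dsrs_run;       /* DSRs that have actually executed */
} kernel_model_t;

/* Model of interrupt_end(): post the DSR if requested, then drop the lock. */
void model_interrupt_end(kernel_model_t *k, bool call_dsr)
{
    if (call_dsr)
        k->dsrs_pending++;
    if (--k->sched_lock == 0) {
        k->dsrs_run += k->dsrs_pending;
        k->dsrs_pending = 0;
    }
}

/* Model of the IRQ path. skip_end_on_spurious=true mimics the original
   "cmp v1,#NONE ; beq 17f" branch; false mimics the modified code. */
void model_irq(kernel_model_t *k, int vector, bool call_dsr,
               bool skip_end_on_spurious)
{
    k->sched_lock++;                 /* taken before the vector is decoded */
    if (vector == VECTOR_NONE) {
        if (skip_end_on_spurious)
            return;                  /* lock never decremented: the leak */
        model_interrupt_end(k, false);
        return;
    }
    model_interrupt_end(k, call_dsr);
}
```

With skip_end_on_spurious set, a single model_irq(&k, VECTOR_NONE, ...) leaves sched_lock at 1, and every later model_irq(&k, 5, true, ...) only queues its DSR without running it, which matches the stalled clock tick. With the flag clear, the count returns to zero and DSRs keep running.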

Calling the interrupt_end() routine with a spurious interrupt did not seem
to break anything.  Was there a reason why interrupt_end() should not be
called on spurious interrupts?

Now to figure out why I am getting a spurious interrupt with the simple UART
code listed below?

What should I look for in attempting to eliminate spurious interrupts?  Can
they be eliminated?

What modifications to the eCos source or my project are in order for dealing
with spurious interrupts correctly?

#define CYGNUM_HAL_INTERRUPT_22   22
#define CYGNUM_HAL_INTERRUPT_21   21
#define CYGNUM_HAL_INTERRUPT_20   20
#define CYG_HAL_PRI_HIGH           0

static cyg_interrupt  btuart_interrupt_new_object;
static cyg_handle_t   btuart_interrupt_handle;
static cyg_vector_t   btuart_interrupt_vector   = CYGNUM_HAL_INTERRUPT_21;
static cyg_priority_t btuart_interrupt_priority = CYG_HAL_PRI_HIGH;

unsigned int isr_rx_count = 0;

cyg_uint32 btuart_interrupt_isr(
  cyg_vector_t vector,
  cyg_addrword_t data)
{
  unsigned int iir, lsr, rbr;

  iir = PXA255_BTIIR;

  // check if RX FIFO interrupt
  if((iir & PXA255_IIR_IID_INT_ID_MASK) == PXA255_IIR_IID_RX_FIFO_INT_PENDING)
  {
    // drain the RX FIFO: each read of RBR pops one byte (discarded here)
    lsr = PXA255_BTLSR;
    while(lsr & PXA255_LSR_DR_DATA_READY)
    {
      rbr = PXA255_BTRBR;
      isr_rx_count++;
      lsr = PXA255_BTLSR;   // re-check whether more data is ready
    }
  }

  cyg_interrupt_acknowledge(vector);

  return(CYG_ISR_HANDLED);
}

void serial_port_start(void)
{
  // GPIO and UART inits here...

  cyg_interrupt_create(
    btuart_interrupt_vector,
    btuart_interrupt_priority,
    0,
    &btuart_interrupt_isr,
    0,
    &btuart_interrupt_handle,
    &btuart_interrupt_new_object);

  cyg_interrupt_attach(btuart_interrupt_handle);

  cyg_interrupt_unmask(btuart_interrupt_vector);

  // UART interrupt enabled here...
}

Joe Porthouse
Toptech Systems, Inc.


* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-10 16:41                 ` [ECOS] Re: DSR stops running after heavy interrupts. Spurious Interrupt! Joe Porthouse
@ 2006-04-10 17:20                   ` Nick Garnett
  2006-04-10 17:44                     ` Andrew Lunn
                                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Nick Garnett @ 2006-04-10 17:20 UTC (permalink / raw)
  To: jporthouse; +Cc: ecos-discuss

"Joe Porthouse" <jporthouse@toptech.com> writes:

> Nick,
> Thanks for you reply.
> 
> Your right.  Not calling the interrupt_end() routine would cause the lock
> not to be released.  After your comment I started looking closer at my
> modification.
> 
> My original modification of:
> /hal/arm/arch/current/src/vectors.S file at line 951.
>   cmp     v1,#CYGNUM_HAL_INTERRUPT_NONE   <-- from this
>   cmp     r0,#CYGNUM_HAL_INTERRUPT_NONE   <-- to this
> 
> v1 originally contained the interrupt vector.  But I mistakenly believed
> this was the check of the return value from the ISR.  I modified it to look
> at r0, the return value from the isr. The return value from the isr will be
> 0-3 (really 1 or 3).  The CYGNUM_HAL_INTERRUPT_NONE is -1! (for some reason
> I thought it was +1 looking at the assembly listing) 
> 
> Bottom line, my modification made sure that interrupt_end() would always be
> called, even when v1 == CYGNUM_HAL_INTERRUPT_NONE (spurious interrupt).
> 
> I just did a quick test with the original code and verified that when a
> spurious interrupt occurs, the interrupt_end() routine is not called and the
> lock is not released and my problem occurs.
> 
> Calling the interrupt_end() routine with a spurious interrupt did not seem
> to break anything.

This all makes sense.

> Was there a reason why interrupt_end() should not be
> called on spurious interrupts?

I guess it was an attempt to avoid doing more than the absolute
minimum on spurious interrupts. It looks like there is a bug in there,
since the scheduler lock doesn't get decremented. In general, spurious
interrupts shouldn't happen, which is why it has managed to lurk here
for so long.

> 
> Now to figure out why I am getting a spurious interrupt with the simple UART
> code listed below?
> 
> What should I look for in attempting to eliminate spurious interrupts?  Can
> they be eliminated?

The CYGNUM_HAL_INTERRUPT_NONE return from hal_IRQ_handler() only
happens when an interrupt occurs but the interrupt controller denies
all knowledge of it. One possibility is that hal_IRQ_handler() is
decoding a real interrupt wrongly and generating -1 by mistake.

What you need to do is find out why hal_IRQ_handler() is returning
this value. If you can put a breakpoint in hal_IRQ_handler()
where CYGNUM_HAL_INTERRUPT_NONE is returned, then you should be able
to look at all the relevant device and interrupt controller registers
and find out what is going on.
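
As a rough picture of the decode step, assuming a controller with a 32-bit pending register and a 32-bit enable mask (the real hal_IRQ_handler() for a given platform may differ in detail; decode_pending() is a made-up name):

```c
#include <stdint.h>

#define VECTOR_NONE (-1)

/* Return the lowest-numbered source that is both pending and enabled,
   or VECTOR_NONE when the controller reports nothing. */
int decode_pending(uint32_t pending, uint32_t mask_enabled)
{
    uint32_t active = pending & mask_enabled;
    for (int bit = 0; bit < 32; bit++) {
        if (active & (UINT32_C(1) << bit))
            return bit;
    }
    return VECTOR_NONE;   /* CPU took an IRQ, but no source claims it */
}
```

A breakpoint on the final return is the place to inspect the pending and mask registers: if pending is already zero at that point, the request was withdrawn between the CPU taking the exception and the handler reading the register.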

Also, enable assertions, it might tell you something.

-- 
Nick Garnett                                 eCos Kernel Architect
http://www.ecoscentric.com            The eCos and RedBoot experts



* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-10 17:20                   ` Nick Garnett
@ 2006-04-10 17:44                     ` Andrew Lunn
  2006-04-10 20:49                     ` Joe Porthouse
  2006-04-11  4:15                     ` Sergei Organov
  2 siblings, 0 replies; 25+ messages in thread
From: Andrew Lunn @ 2006-04-10 17:44 UTC (permalink / raw)
  To: Nick Garnett; +Cc: jporthouse, ecos-discuss

> > What should I look for in attempting to eliminate spurious interrupts?  Can
> > they be eliminated?
> 
> The CYGNUM_HAL_INTERRUPT_NONE return from hal_IRQ_handler() only
> happens when an interrupt occurs but the interrupt controller denies
> all knowledge of it. One possibility is that hal_IRQ_handler() is
> decoding a real interrupt wrongly and generating -1 by mistake.
> 
> What you need to do is find out why hal_IRQ_handler() is returning
> this value. If you can put a breakpoint in hal_IRQ_handler()
> where CYGNUM_HAL_INTERRUPT_NONE is returned, then you should be able
> to look at all the relevant device and interrupt controller registers
> and find out what is going on.

Also check that you have level vs edge trigger correct for your
hardware.

It could be a hardware error, e.g. a pulse that is too short, a floating
interrupt signal, some device which does not get reset when the
processor does, etc.

        Andrew


* RE: [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-10 17:20                   ` Nick Garnett
  2006-04-10 17:44                     ` Andrew Lunn
@ 2006-04-10 20:49                     ` Joe Porthouse
  2006-04-11  4:07                       ` Sergei Organov
  2006-04-11  8:31                       ` Nick Garnett
  2006-04-11  4:15                     ` Sergei Organov
  2 siblings, 2 replies; 25+ messages in thread
From: Joe Porthouse @ 2006-04-10 20:49 UTC (permalink / raw)
  To: ecos-discuss

Ok, found the source of the spurious interrupts (you're really going to love
this one).

While testing to determine why I was receiving occasional spurious
interrupts, I noticed that the PXA2X0_ICIP (IRQ Interrupt Pending) register
was completely clear, even though the processor had just jumped to the IRQ
vector.

I checked the interrupt level register and even turned off all other
interrupts in the system but the problem still occurred.  It was like
something was clearing the interrupt before the IRQ vector jump occurred, or
some intermediate IRQ hardware flag was not getting cleared when the
PXA2X0_ICIP was getting cleared from the last interrupt.  I was really
scratching my head at this point.

My target is an Intel PXA255 XScale processor and I'm using the three
built-in UARTs.

One of the events on which the built-in UART can generate an interrupt is a
"Character Timeout".  This event occurs when at least one character is in
the Receive FIFO and no data has been received for four character times.

Clearing this interrupt occurs if you read from the Receiver FIFO, set the
FCR[RESETRF] bit or A NEW START BIT IS RECEIVED!!!

So if the RX FIFO is below the trigger point and a timeout occurs an IRQ
request is generated, but if a new start bit is detected the IRQ request is
then immediately cleared. :(

Wow an interrupt that can clear its own IRQ request before service occurs!!!
That would surely cause a Spurious Interrupt.
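
The sequence can be written down as a small state model. This is only a sketch of the behaviour as described above, not taken from the datasheet; all names are hypothetical, and vector 21 is the one used in the UART code earlier in the thread.

```c
#include <stdbool.h>

#define VECTOR_NONE (-1)
#define UART_VECTOR 21

typedef struct {
    int  fifo_level;       /* bytes waiting in the RX FIFO */
    bool timeout_request;  /* UART's character-timeout interrupt request */
    bool cpu_irq_taken;    /* CPU has already branched to the IRQ vector */
} uart_race_model_t;

/* Four character times pass with data in the FIFO: request is raised. */
void model_char_timeout(uart_race_model_t *u)
{
    if (u->fifo_level > 0)
        u->timeout_request = true;
}

/* The CPU samples the request and commits to taking the IRQ exception. */
void model_cpu_sample(uart_race_model_t *u)
{
    if (u->timeout_request)
        u->cpu_irq_taken = true;
}

/* A new start bit arrives: the request is withdrawn by the UART. */
void model_start_bit(uart_race_model_t *u)
{
    u->timeout_request = false;
}

/* The handler reads the pending state; nothing latched it, so it may
   already be empty by the time it is read. */
int model_decode(const uart_race_model_t *u)
{
    return u->timeout_request ? UART_VECTOR : VECTOR_NONE;
}
```

Running timeout, CPU sample, start bit, decode in that order yields VECTOR_NONE even though the CPU is already in the IRQ path; without the start bit the decode returns 21 as expected.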

If my conclusions are correct and I don't want characters to hang out
in my RX FIFO, I will need to do one of the following:
#1.  Stop using the UART FIFO.
#2.  Poll the FIFO for trailing characters.
#3.  Live with the Spurious Interrupts as a processor UART design issue.

I will probably follow through with #3 by commenting out lines 951 and 952 in
the /hal/arm/arch/current/src/vectors.S file.

Joe Porthouse
Toptech Systems, Inc.


* [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-10 20:49                     ` Joe Porthouse
@ 2006-04-11  4:07                       ` Sergei Organov
  2006-04-11  8:31                       ` Nick Garnett
  1 sibling, 0 replies; 25+ messages in thread
From: Sergei Organov @ 2006-04-11  4:07 UTC (permalink / raw)
  To: ecos-discuss

"Joe Porthouse" <jporthouse@toptech.com> writes:
> Ok, found the source of the Spurious Interrupts, (your really going to love
> this one). 
[...]
> Clearing this interrupt occurs if you read from the Receiver FIFO, set the
> FCR[RESETRF] bit or A NEW START BIT IS RECEIVED!!!
>
> So if the RX FIFO is below the trigger point and a timeout occurs an IRQ
> request is generated, but if a new start bit is detected the IRQ request is
> then immediately cleared. :(
>
> Wow an interrupt that can clear its own IRQ request before service occurs!!!
> That would surely cause a Spurious Interrupt.

Sounds like yet another piece of broken hardware design from Intel, --
they failed to deliver a reasonable RS232 implementation in the early
days of the PC, and still fail to do it right 20 years later :(

-- Sergei.



* [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-10 17:20                   ` Nick Garnett
  2006-04-10 17:44                     ` Andrew Lunn
  2006-04-10 20:49                     ` Joe Porthouse
@ 2006-04-11  4:15                     ` Sergei Organov
  2006-04-11  8:43                       ` Nick Garnett
  2 siblings, 1 reply; 25+ messages in thread
From: Sergei Organov @ 2006-04-11  4:15 UTC (permalink / raw)
  To: ecos-discuss

Nick Garnett <nickg@ecoscentric.com> writes:
> "Joe Porthouse" <jporthouse@toptech.com> writes:
[...]

>> Was there a reason why interrupt_end() should not be
>> called on spurious interrupts?
>
> I guess it was an attempt to avoid doing more than the absolute
> minimum on spurious interrupts. It looks like there is a bug in there,
> since the scheduler lock doesn't get decremented. In general, spurious
> interrupts shouldn't happen, which is why it has managed to lurk here
> for so long.

Well, I think the right question here is why the scheduler lock is
incremented at all. I mean, if SMP implementations happen to increment it
inside interrupt_end(), then it should be safe for the ARM HAL to
increment it just before calling interrupt_end(), shouldn't it? This way
the spurious interrupt handling code would avoid both the scheduler lock
increment and the interrupt_end() call. Makes sense?

-- Sergei.



* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-10 20:49                     ` Joe Porthouse
  2006-04-11  4:07                       ` Sergei Organov
@ 2006-04-11  8:31                       ` Nick Garnett
  1 sibling, 0 replies; 25+ messages in thread
From: Nick Garnett @ 2006-04-11  8:31 UTC (permalink / raw)
  To: jporthouse; +Cc: ecos-discuss

"Joe Porthouse" <jporthouse@toptech.com> writes:

> Clearing this interrupt occurs if you read from the Receiver FIFO, set the
> FCR[RESETRF] bit or A NEW START BIT IS RECEIVED!!!
> 
> So if the RX FIFO is below the trigger point and a timeout occurs an IRQ
> request is generated, but if a new start bit is detected the IRQ request is
> then immediately cleared. :(

That certainly sounds like the cause of your problems. It sounds like
there is no way to fix it without disabling the timeout interrupt. I'm
not sure whether it is a bug in the UART for cancelling an interrupt
it has raised, or a bug in the interrupt controller for not latching
interrupt requests. At least it seems to be a fairly narrow race
window, so isn't going to interfere with performance too much.

> 
> Wow an interrupt that can clear its own IRQ request before service occurs!!!
> That would surely cause a Spurious Interrupt.
> 
> If my conclusions are correct and I want don't want characters to hang out
> in my RX FIFO I will either need to:
> #1.  Stop using the UART FIFO.
> #2.  Poll the FIFO for trailing characters.
> #3.  Live with the Spurious Interrupts as a processor UART design issue.
> 
> I will probably follow through with #3 by commenting out lien 951 and 952 in
> the /hal/arm/arch/current/src/vectors.S file.
>

That sounds like the best approach. I guess we ought to take a look at
making this a permanent feature.

-- 
Nick Garnett                                 eCos Kernel Architect
http://www.ecoscentric.com            The eCos and RedBoot experts



* Re: [ECOS]  Re: DSR stops running after heavy interrupts. Spurious Interrupt!
  2006-04-11  4:15                     ` Sergei Organov
@ 2006-04-11  8:43                       ` Nick Garnett
  0 siblings, 0 replies; 25+ messages in thread
From: Nick Garnett @ 2006-04-11  8:43 UTC (permalink / raw)
  To: Sergei Organov; +Cc: ecos-discuss

 Sergei Organov <osv@javad.com> writes:

> Nick Garnett <nickg@ecoscentric.com> writes:
> > "Joe Porthouse" <jporthouse@toptech.com> writes:
> [...]
> 
> >> Was there a reason why interrupt_end() should not be
> >> called on spurious interrupts?
> >
> > I guess it was an attempt to avoid doing more than the absolute
> > minimum on spurious interrupts. It looks like there is a bug in there,
> > since the scheduler lock doesn't get decremented. In general, spurious
> > interrupts shouldn't happen, which is why it has managed to lurk here
> > for so long.
> 
> Well, I think the right question here is why scheduler lock is
> incremented at all? I mean if SMP implementations happen to increment it
> inside the interrupt_end(), then it should be safe for ARM HAL to
> increment it just before calling interrupt_end(), isn't it? This way
> spurious interrupt handling code will avoid both scheduler lock
> increment and interrupt_end() call. Makes sense?

The scheduler lock has several duties. As well as disabling thread
suspension and controlling when DSRs are called, it also does duty as
an interrupt nesting counter. We only want DSRs to be called when all
nested interrupts have been unwound and we are about to return from
the first one. The scheduler lock count does this implicitly. But for
this to work properly, the scheduler lock must be incremented in the
VSR before interrupts are re-enabled and the ISR is called.
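
The nesting-counter role can be sketched in C as follows. This is a toy
model under my own assumptions, not the kernel source: vsr_enter(),
vsr_leave() and the two counters are hypothetical names, and a lock held
by a thread at interrupt time is left out for brevity.

```c
#include <assert.h>

static int sched_lock = 0;   /* doubles as the interrupt nesting depth */
static int posted_dsrs = 0;  /* DSRs requested by ISRs, not yet called */

/* Entry half of the VSR: runs with interrupts still disabled,
 * before the ISR is called and before interrupts are re-enabled. */
void vsr_enter(void)
{
    sched_lock++;
}

/* Exit half (the interrupt_end() step). Returns how many DSRs were
 * called on this exit; nonzero only when the outermost interrupt is
 * returning and the lock count drops back to zero. */
int vsr_leave(int isr_wants_dsr)
{
    int called = 0;

    if (isr_wants_dsr)
        posted_dsrs++;
    assert(sched_lock > 0);
    if (--sched_lock == 0) {
        called = posted_dsrs;
        posted_dsrs = 0;
    }
    return called;
}
```

If a second interrupt nests inside the first, the inner vsr_leave() returns
0 and the outer one calls both DSRs. If the increment were delayed until
after interrupts are re-enabled, a nested interrupt could see a count of
zero on its way out and call DSRs while the outer ISR is still running.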

In the SMP case I decided, at least initially, that nested interrupts
would not be supported. It was hard enough keeping track of interrupts
going off on different CPUs. This allowed me to move the lock
operation into interrupt_end(), and avoided having to write any asm
code to go into the VSR.

SMP is really still in its development phase, and there are a number of
things that are a little experimental in there. Moving the scheduler
locking to interrupt_end() was one of them. I certainly would not want
to do that for any other configuration.

-- 
Nick Garnett                                 eCos Kernel Architect
http://www.ecoscentric.com            The eCos and RedBoot experts


* [ECOS] How to use the ARM directive DCB in Vectors.S
  2006-04-10  9:36               ` Nick Garnett
  2006-04-10 10:44                 ` Sergei Organov
  2006-04-10 16:41                 ` [ECOS] Re: DSR stops running after heavy interrupts. Spurious Interrupt! Joe Porthouse
@ 2006-04-13  7:58                 ` Birahim Larou Fall
  2006-04-13 13:28                   ` Andrew Lunn
  2 siblings, 1 reply; 25+ messages in thread
From: Birahim Larou Fall @ 2006-04-13  7:58 UTC (permalink / raw)
  To: ecos-discuss

I have modified the source file vectors.S for the ARM architecture, and I
can't compile this file because DCB is seen as a bad instruction:
packages/hal/arm/arch/current/src/vectors.S:517: Error: bad instruction 
`c_string DCB "C_string",0'
How do I tell eCos to support ARM directives (DCD, DCB, ...)?
Thanks!
Fall Birahim


* Re: [ECOS] How to use the ARM directive DCB in Vectors.S
  2006-04-13  7:58                 ` [ECOS] How to use the ARM directive DCB in Vectors.S Birahim Larou Fall
@ 2006-04-13 13:28                   ` Andrew Lunn
  2006-04-13 13:32                     ` Birahim Larou Fall
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Lunn @ 2006-04-13 13:28 UTC (permalink / raw)
  To: Birahim Larou Fall; +Cc: ecos-discuss

On Thu, Apr 13, 2006 at 09:53:30AM +0200, Birahim Larou Fall wrote:
> I have modified the source file vectors.S for the ARM architecture, and I
> can't compile this file because DCB is seen as a bad instruction:
> packages/hal/arm/arch/current/src/vectors.S:517: Error: bad instruction 
> `c_string DCB "C_string",0'
> How do I tell eCos to support ARM directives (DCD, DCB, ...)?

The problem is that DCD and DCB are directives for ARM's assembler. eCos
uses gas, so you need to use the gas equivalent. I suggest you read the gas
documentation.
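
As a rough guide, DCB maps to .byte, .ascii or .asciz in gas, and DCD maps
to .word (32 bits on ARM) or .long. gas also wants a colon after the label,
which is why it tried to parse c_string as an instruction. The sketch below
wraps the directives in top-level asm in a C file so that it can be built
and checked on a host; it assumes GCC or Clang targeting ELF, and the
c_words symbol and its values are invented for the example. In vectors.S
itself only the lines inside the asm strings would be needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Top-level asm is handed to gas as written. */
__asm__(
    ".pushsection .rodata\n"
    ".globl c_string\n"
    "c_string:\n"
    "    .asciz \"C_string\"\n"    /* armasm: c_string DCB "C_string",0 */
    ".balign 4\n"
    ".globl c_words\n"
    "c_words:\n"
    "    .long 0x12345678, 42\n"   /* armasm: c_words DCD 0x12345678, 42 */
    ".popsection\n"
);

extern const char c_string[];
extern const uint32_t c_words[];

/* Checks that the assembled bytes match what DCB would have produced:
 * eight characters followed by the terminating zero from .asciz. */
void check_c_string(void)
{
    assert(strlen(c_string) == 8);
    assert(memcmp(c_string, "C_string", 9) == 0);
}
```

.asciz appends the terminating zero itself, so the explicit ",0" from the
DCB line is dropped. .long is used rather than .word here because .word is
only 16 bits wide on some other gas targets.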

        Andrew


* Re: [ECOS] How to use the ARM directive DCB in Vectors.S
  2006-04-13 13:28                   ` Andrew Lunn
@ 2006-04-13 13:32                     ` Birahim Larou Fall
  2006-04-21  7:40                       ` [ECOS] " Daniel Néri
  0 siblings, 1 reply; 25+ messages in thread
From: Birahim Larou Fall @ 2006-04-13 13:32 UTC (permalink / raw)
  To: ecos-discuss

Thanks, Andrew. Where can I find the gas documentation?
Fall Birahim






* [ECOS]  Re: How to use the ARM directive DCB in Vectors.S
  2006-04-13 13:32                     ` Birahim Larou Fall
@ 2006-04-21  7:40                       ` Daniel Néri
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel Néri @ 2006-04-21  7:40 UTC (permalink / raw)
  To: ecos-discuss

Birahim Larou Fall <BLFall@scmmicro.fr> writes:

> Thanks, Andrew. Where can I find the gas documentation?

gas is a member of the GNU binutils tool collection:

  http://sourceware.org/binutils/



Regards,
-- 
Daniel Néri <daniel.neri@sigicom.se>
Sigicom AB, Stockholm, Sweden



Thread overview: 25+ messages
2006-04-05 21:09 [ECOS] DSR stops running after heavy interrupts Joe Porthouse
2006-04-06  6:49 ` Andrew Lunn
2006-04-06  9:02   ` Stefan Sommerfeld
2006-04-06 21:09     ` Joe Porthouse
2006-04-06 21:19       ` Andrew Lunn
2006-04-08  4:18         ` [ECOS] DSR stops running after heavy interrupts. Bug found? Joe Porthouse
2006-04-09 12:33           ` Andrew Lunn
2006-04-10  4:50             ` [ECOS] " Sergei Organov
2006-04-10  9:36               ` Nick Garnett
2006-04-10 10:44                 ` Sergei Organov
2006-04-10 10:59                   ` Nick Garnett
2006-04-10 11:15                     ` Sergei Organov
2006-04-10 13:20                       ` Joe Porthouse
2006-04-10 16:41                 ` [ECOS] Re: DSR stops running after heavy interrupts. Spurious Interrupt! Joe Porthouse
2006-04-10 17:20                   ` Nick Garnett
2006-04-10 17:44                     ` Andrew Lunn
2006-04-10 20:49                     ` Joe Porthouse
2006-04-11  4:07                       ` Sergei Organov
2006-04-11  8:31                       ` Nick Garnett
2006-04-11  4:15                     ` Sergei Organov
2006-04-11  8:43                       ` Nick Garnett
2006-04-13  7:58                 ` [ECOS] How to use the ARM directive DCB in Vectors.S Birahim Larou Fall
2006-04-13 13:28                   ` Andrew Lunn
2006-04-13 13:32                     ` Birahim Larou Fall
2006-04-21  7:40                       ` [ECOS] " Daniel Néri
