Please do not reply to this email. Use the web interface provided at:
http://bugs.ecos.sourceware.org/show_bug.cgi?id=1001456

--- Comment #29 from Bernard Fouché <bernard.fouche@kuantic.com> 2012-09-27 14:36:11 BST ---
(In reply to comment #28)
>
> this is in fact how most drivers are already written today.
> 
> wouldn't that be OK for you too?
> 
> Regards,
> Bernd Edlinger

Hi Bernd,

If the current interrupt handling were able to fix this issue, the problem
would not exist. Please look at the sequence of events:

1) When a first interrupt occurs, the ISR is called; that's fine.
2) The ISR schedules a DSR and disables the interrupt for that vector.
3) However, a second interrupt condition occurs at that point: the pending bit
is set, because the disabled interrupt cannot be taken by the MCU, so the
pending bit could not have been cleared earlier in ISR code. This generally
happens with hardware that has FIFOs or buffers, because such devices have many
interrupt conditions.
4) The DSR runs and processes all interrupt conditions, including the one that
caused the pending bit to be set. This is the general design of every eCos DSR
I've seen: do as much as possible in a single run.
5) There is no API to clear the pending bit at the end of the DSR, hence:
6) the ISR is called again because of the pending bit set earlier, even though
the DSR has already done all the required work.
7) The DSR is triggered again and has no work to do.
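The seven steps above can be replayed in a small host-side model. This is a sketch under stated assumptions, not eCos code: the struct, the function names and the `clear_pending` flag are hypothetical. "Masking" stands for what a real driver does with cyg_drv_interrupt_mask() in the ISR and cyg_drv_interrupt_unmask() at the end of the DSR. The model assumes, as described above, that the controller latches the pending bit while the vector is masked and clears it only on ISR entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host model of one interrupt vector and one device. */
struct sim {
    bool masked;         /* vector masked in the interrupt controller */
    bool pending;        /* latched controller pending bit */
    int  device_work;    /* interrupt conditions waiting in the device */
    int  isr_calls;
    int  dsr_calls;
    int  idle_dsr_calls; /* DSR runs that found nothing to do */
};

/* A device event adds work and latches the pending bit, masked or not. */
static void device_event(struct sim *s)
{
    s->device_work++;
    s->pending = true;
}

/* ISR: taken only if pending and unmasked; entry clears the pending bit.
 * It masks the vector and defers all work to the DSR (step 2). */
static bool run_isr(struct sim *s)
{
    if (!s->pending || s->masked)
        return false;
    s->pending = false;
    s->isr_calls++;
    s->masked = true;
    return true;
}

/* DSR: drains every condition in one run (step 4), then unmasks.
 * clear_pending models the proposed, currently missing, API (step 5). */
static void run_dsr(struct sim *s, bool clear_pending)
{
    s->dsr_calls++;
    if (s->device_work == 0)
        s->idle_dsr_calls++;
    s->device_work = 0;
    if (clear_pending)
        s->pending = false;
    s->masked = false;
}

/* Replays steps 1-7 and returns the resulting counters. */
static struct sim replay(bool clear_pending)
{
    struct sim s = {0};
    device_event(&s);                /* 1) first interrupt */
    if (run_isr(&s)) {               /* 2) ISR masks and posts the DSR */
        device_event(&s);            /* 3) second condition while masked */
        run_dsr(&s, clear_pending);  /* 4) DSR handles both conditions */
    }
    while (run_isr(&s))              /* 6) stale pending bit fires again */
        run_dsr(&s, clear_pending);  /* 7) DSR finds nothing to do */
    assert(s.device_work == 0);
    return s;
}
```

Called as replay(false), the model records two ISR calls and two DSR calls, one of them idle (steps 6 and 7); replay(true), which models a clear-pending call at the end of the DSR, records one of each.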

So one has to choose between:

- Consider this problem to be Cortex-M specific. Side effect: it is not
possible to have generic drivers (UART, SSP, CAN, etc.) that run efficiently
on both Cortex-M and other architectures, because there is no common API call
to clear the pending bit.

- Abandon the idea of generic drivers shared between Cortex-M and architectures
that don't need to manage the pending bit: re-adapt each driver, mostly by
copying files that are 99.9% identical from another architecture to Cortex-M
(IMHO this is where the real bloat risk is).

- Change DSRs to do as little as possible in a single run. It sounds sarcastic,
but today, if a driver doesn't handle this issue, this is what happens anyway:
the ISR and DSR are triggered multiple times for nothing, so why bother writing
an efficient DSR?

- Ignore the problem. But Cortex-M isn't some bizarre architecture that will
disappear in a few months.

- Extend the eCos API with a call that clears the pending bit.
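The last option could look roughly like this on Cortex-M. This is a hedged sketch: the NVIC Interrupt Clear-Pending Registers (write 1 to clear, one bit per external interrupt, 32 per word, starting at 0xE000E280) are part of the Cortex-M architecture, but hal_nvic_clear_pending(), cyg_drv_interrupt_clear_pending() and the HYPOTHETICAL_CORTEXM_TARGET switch are invented names, not existing eCos API. The IRQ argument is assumed to be the NVIC external interrupt number, which is not necessarily the same as an eCos vector number.

```c
#include <assert.h>
#include <stdint.h>

/* NVIC Interrupt Clear-Pending Registers, ICPR0 onwards. Writing 1 to a
 * bit clears that IRQ's pending state; writing 0 has no effect, so no
 * read-modify-write is needed. */
#define NVIC_ICPR_BASE ((volatile uint32_t *)0xE000E280u)

/* Hypothetical HAL helper. The register base is a parameter so that the
 * bit arithmetic can be exercised off-target. */
static void hal_nvic_clear_pending(volatile uint32_t *icpr, unsigned irq)
{
    assert(irq < 16u * 32u);             /* ICPR0..ICPR15 on ARMv7-M */
    icpr[irq / 32u] = UINT32_C(1) << (irq % 32u);
}

/* Hypothetical portable wrapper: a real write on Cortex-M, nothing on
 * architectures without a latched pending bit. */
#ifdef HYPOTHETICAL_CORTEXM_TARGET
#define cyg_drv_interrupt_clear_pending(irq) \
    hal_nvic_clear_pending(NVIC_ICPR_BASE, (irq))
#else
#define cyg_drv_interrupt_clear_pending(irq) ((void)(irq))
#endif
```

For example, clearing IRQ 37 writes `1u << 5` to ICPR1. A DSR would call the wrapper after draining the device and before unmasking the vector.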

If there is another solution I would be glad to hear about it!


  Bernard

-- 
Configure bugmail: http://bugs.ecos.sourceware.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
From: bugzilla-daemon@bugs.ecos.sourceware.org
To: ecos-bugs@ecos.sourceware.org
Subject: [Bug 1001456] HAL misses Interrupt Clear-Pending Registers handling:
 wasted processing power
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: eCos
X-Bugzilla-Component: HAL
X-Bugzilla-Keywords:
X-Bugzilla-Severity: major
X-Bugzilla-Who: nickg@ecoscentric.com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: low
X-Bugzilla-Assigned-To: nickg@ecoscentric.com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Date: Thu, 27 Sep 2012 14:59:00 -0000


--- Comment #30 from Nick Garnett <nickg@ecoscentric.com> 2012-09-27 15:59:38 BST ---

I'm not at all happy about adding an extra set of HAL and kernel functions to
all architectures just to solve an obscure problem on a single architecture.
Either a better solution needs to be found that can be applied only to the
Cortex-M architecture, or we simply have to live with the consequences.

The proposed change is, in any case, clearly a misuse of the NVIC hardware. The
feature being used is intended to allow individual interrupts to be set pending
by software for testing. The clear register appears to be present mainly as a
side effect of using a common interface for all these NVIC bit masks. I'm sure
ARM do not expect interrupts to be cleared in this way under normal
circumstances; this should happen as a consequence of entering the ISR.

The timing diagram in comment #2 suggests that the real problem will only occur
if the CPU is too slow for the rate of interrupts being delivered.

A better version of that timing diagram might be as follows:

HW |  E1      E2
---|----------------------------
ISR|    I1      I2
---|----------------------------
DSR|         D1=======D2==

The ===== marks show the time during which the DSR is running. I2 runs during
the execution of D1, posting a second DSR call (D2), which will run immediately
after D1 and, in theory, will find nothing to do.

I can see two situations in which this can happen.

1. The CPU is simply too slow to finish running D1 before E2/I2 run, even if D1
was started immediately after I1 completed. If the events are coming at this
rate continually, then the CPU simply won't keep up. If they come in infrequent
bursts, then the odd extra ISR/DSR is of little consequence, and is part of the
cost of dealing with a temporary overload.

2. The start of D1 was delayed because eCos had the scheduler locked when I1
ran. This is a consequence of the ISR/DSR model. If I2 ran before D1 started,
then the DSR would only be called once, with a larger count value. If I2 runs
after D1 starts, it may post a separate DSR; but this is true for all
architectures, not just this one.
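Case 2 depends on how DSR posts are counted. The following is a simplified model, assuming (as in the eCos kernel) that each ISR returning CYG_ISR_CALL_DSR increments a per-interrupt count, and that the count is handed to the DSR and reset when the DSR is called. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of one interrupt object's DSR bookkeeping. */
struct dsr_obj {
    unsigned count;      /* posts not yet delivered to a DSR run */
    unsigned runs;       /* DSR invocations so far */
    unsigned last_count; /* count argument of the most recent run */
};

/* Called from the ISR when it asks for a DSR. */
static void post_dsr(struct dsr_obj *d)
{
    d->count++;
}

/* Called once the scheduler lock allows DSRs to run: all posts
 * accumulated so far are delivered to a single DSR call. */
static void run_pending_dsr(struct dsr_obj *d)
{
    unsigned count = d->count;
    if (count == 0)
        return;
    d->count = 0;         /* posts from now on need another run */
    d->runs++;
    d->last_count = count;
    /* the driver's DSR would execute here with this count */
}
```

Two posts before the DSR runs give one run with count 2 (I2 before D1 starts); a post after the run has started gives a second run with count 1 (I2 during D1).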

Adding an interrupt cancel anywhere in D1 would only deal with any new events
that were posted before that point. E2 could occur just after the cancel, and
would still result in an extra ISR/DSR. The proposed solution can only reduce
the number of extra ISR/DSRs, never eliminate them entirely.
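The point about the position of the cancel can be checked with a small timeline model. It is a sketch with hypothetical names: times are abstract integers, the vector is masked from I1 (time 0) until D1 unmasks it at unmask_time, and the proposed clear-pending call happens at clear_time inside D1. Events that arrive while the vector is masked latch the pending bit; the clear removes only events that arrived before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns 1 if the latched pending bit causes another ISR/DSR pair
 * after D1, else 0. Events at or after unmask_time are ordinary
 * interrupts and are not counted. */
static int extra_pairs(const int *events, size_t n,
                       int clear_time, int unmask_time)
{
    assert(0 <= clear_time && clear_time <= unmask_time);
    bool pending = false;
    for (size_t i = 0; i < n; i++) {
        if (events[i] < 0 || events[i] >= unmask_time)
            continue;            /* outside the masked window */
        if (events[i] >= clear_time)
            pending = true;      /* set after the clear: survives it */
        /* events before clear_time are wiped out by the clear */
    }
    return pending ? 1 : 0;
}
```

With the clear at time 8 and the unmask at 10, an event at 3 is absorbed, but one at 9 still leaves the pending bit set and produces another ISR/DSR pair, however late in D1 the clear is placed.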

I also don't believe this is entirely an eCos problem. It is also present in
the Cortex-M nested interrupt model, and is the expected/intended behaviour.
Consider a system that is only using ISRs. Here's a timing diagram:

HW  |  E1   E2   E1
----|----------------------------
ISR1|    I1===      ===I1=====
----|----------------------------
ISR2|         I2===

Here there are two devices, 1 and 2, with associated ISRs; ISR1 is lower
priority than ISR2. If ISR1 is running when device 2 raises an interrupt, then
it will be pre-empted and ISR2 will run. If ISR2 runs for long enough then it
may delay the completion of ISR1 until after a new device 1 interrupt is
posted. This sets the pending bit again, so ISR1 will be re-entered immediately
after it returns. The same will happen in the absence of nested ISRs if ISR1
just takes too long to process the first event before the second occurs.

This is similar to the eCos situation. So long as these things occur
infrequently, then extra ISRs are simply a cost of handling bursts of
interrupts. If it happens frequently then that is an indication that the CPU is
too slow to keep up with the interrupt rate.


I wasn't sure what conclusion I would come to when I started writing this, but
I think I have convinced myself that this is actually a non-issue. The proposal
cannot eliminate these extra ISR/DSR calls completely; the problem is not eCos
specific; it is not Cortex-M specific either; the issue only seriously affects
systems that are on the edge of being too slow to cope with the interrupt rate.
The worst aspect of the proposal is that it spreads its tentacles into all
other architectures and device drivers.

However, comment #7 contains a seed of a better solution. Many device drivers
are somewhat lazy, using cyg_drv_interrupt_mask() and friends to control
interrupt delivery, and this is the main cause of the problem. They
should really use peripheral registers to do this, where possible. Certainly
generic drivers like the 16x5x driver should. I switched the eCosCentric
version of this driver over to doing exactly this earlier this year and can
contribute a patch to do that for the public version. Other drivers should be
converted as and when convenient. Those devices that don't have local control
of interrupts will just have to continue with the current approach and accept
the consequences.
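The approach described above, disabling interrupts in the peripheral instead of at the interrupt controller, can be sketched with a model of a 16x5x-style UART. Only the IER bit value (bit 0, ERBFI, enables the received-data-available interrupt) comes from the 16550 register layout; everything else is a hypothetical host model, not the eCosCentric patch. It assumes the controller latches the pending bit only while the device's interrupt line is asserted, so an ISR that clears IER stops new characters from re-pending the vector while the DSR is outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IER_ERBFI 0x01u  /* 16550 IER bit 0: received data available */

/* Hypothetical model of a UART plus the latched pending bit of its
 * interrupt vector, which is never masked in this scheme. */
struct uart_model {
    uint8_t  ier;        /* device interrupt enable register */
    unsigned rx_fifo;    /* characters waiting in the receive FIFO */
    bool     pending;    /* controller pending bit (latched) */
    unsigned isr_calls, dsr_calls, idle_dsr_calls;
};

/* The device asserts its line only for enabled, active sources. */
static void update_line(struct uart_model *u)
{
    if ((u->ier & IER_ERBFI) && u->rx_fifo > 0)
        u->pending = true;
}

static void rx_char(struct uart_model *u)   /* a character arrives */
{
    u->rx_fifo++;
    update_line(u);
}

/* ISR: clear IER instead of calling cyg_drv_interrupt_mask(). */
static bool uart_isr(struct uart_model *u)
{
    if (!u->pending)
        return false;
    u->pending = false;  /* cleared on ISR entry */
    u->isr_calls++;
    u->ier = 0;          /* line deasserts; new data cannot re-pend */
    return true;
}

/* DSR: drain the FIFO, then re-enable; the line re-asserts only if
 * data is actually waiting. */
static void uart_dsr(struct uart_model *u)
{
    u->dsr_calls++;
    if (u->rx_fifo == 0)
        u->idle_dsr_calls++;
    u->rx_fifo = 0;
    u->ier = IER_ERBFI;
    update_line(u);
}
```

A character that arrives between the ISR and the DSR is drained by the DSR without leaving a stale pending bit, so no idle ISR/DSR pair follows. Devices without such an enable register still need cyg_drv_interrupt_mask(), with the consequences described above.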
