Please do not reply to this email. Use the web interface provided at:
http://bugs.ecos.sourceware.org/show_bug.cgi?id=1001456

--- Comment #29 from Bernard Fouché 2012-09-27 14:36:11 BST ---
(In reply to comment #28)
> > this is in fact how most drivers are already written today.
> > wouldn't that be OK for you too?
> >
> Regards,
> Bernd Edlinger

Hi Bernd,

if the current interrupt handling were able to fix this issue, the
problem would not exist. Please look at the timing diagrams:

1) When a first interrupt occurs, the ISR handler is called; that's OK.

2) The ISR handler schedules a DSR and disables the interrupt for the
   vector concerned.

3) However, a second interrupt condition occurs at that point. Because
   the vector is masked, the MCU cannot take the interrupt, so the
   pending bit is raised; the ISR code could not have cleared it
   earlier because it was not yet set. This generally happens with
   hardware that has FIFOs or buffers, because such devices have many
   interrupt conditions.

4) The DSR runs and processes all interrupt conditions, including the
   one that caused the pending bit to be raised. This is the general
   design of all the DSRs I've seen in eCos: do as much as possible in
   a single DSR run.

5) There is no API to clear the pending bit at the end of the DSR,
   hence:

6) The ISR is called again because of the pending bit set previously,
   even though the DSR did all the required work.

7) The DSR is triggered and has no work to do.

So one has to choose between:

- consider this problem to be Cortex-M specific. Side effect: it is not
  possible to have generic drivers (UART, SSP, CAN, etc.) that run
  efficiently across Cortex-M and other architectures, because there
  isn't a common API call to cancel the pending bit.
- abandon the idea of generic drivers shared between Cortex-M and
  architectures that don't need to manage the pending bit: re-adapt
  each driver, mainly by copying files that are 99.9% identical from
  some architecture to Cortex-M (IMHO this is where the real bloat
  risk is)

- change DSRs to do as little as possible in a single run: it sounds
  sarcastic, but today, if a driver doesn't handle this issue, this is
  what happens: the ISR and DSR are triggered multiple times for
  nothing, so why bother to make an efficient DSR?

- ignore the problem. But Cortex-M isn't some bizarre arch that will
  disappear in a few months.

- extend the eCos API with a call that clears the pending bit.

If there is another solution I would be glad to hear about it!

Bernard

--
Configure bugmail: http://bugs.ecos.sourceware.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

>From ecos-bugs-return-9839-listarch-ecos-bugs=sources.redhat.com@sourceware.org Thu Sep 27 14:59:55 2012
Return-Path:
Delivered-To: listarch-ecos-bugs@sources.redhat.com
Received: (qmail 28236 invoked by alias); 27 Sep 2012 14:59:54 -0000
Received: (qmail 28228 invoked by uid 22791); 27 Sep 2012 14:59:53 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=AWL,BAYES_00,KHOP_THREADED
X-Spam-Check-By: sourceware.org
Received: from hagrid.ecoscentric.com (HELO mail.ecoscentric.com) (212.13.207.197)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 27 Sep 2012 14:59:49 +0000
Received: from localhost (hagrid.ecoscentric.com [127.0.0.1])
    by mail.ecoscentric.com (Postfix) with ESMTP id 449DA2FB082E
    for ; Thu, 27 Sep 2012 15:59:48 +0100 (BST)
Received: from mail.ecoscentric.com ([127.0.0.1])
    by localhost (hagrid.ecoscentric.com [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id xJnw2cuy+TWx; Thu, 27 Sep 2012 15:59:48 +0100 (BST)
From: bugzilla-daemon@bugs.ecos.sourceware.org
To: ecos-bugs@ecos.sourceware.org
Subject: [Bug 1001456] HAL misses Interrupt Clear-Pending Registers handling: wasted processing power
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: eCos
X-Bugzilla-Component: HAL
X-Bugzilla-Keywords:
X-Bugzilla-Severity: major
X-Bugzilla-Who: nickg@ecoscentric.com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: low
X-Bugzilla-Assigned-To: nickg@ecoscentric.com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
In-Reply-To:
References:
X-Bugzilla-URL: http://bugs.ecos.sourceware.org/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Thu, 27 Sep 2012 14:59:00 -0000
Message-Id: <20120927145942.1DA1F2FB0830@mail.ecoscentric.com>
Mailing-List: contact ecos-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: ecos-bugs-owner@sourceware.org
Delivered-To: mailing list ecos-bugs@sourceware.org
X-SW-Source: 2012/txt/msg01268.txt.bz2
Content-length: 5121

--- Comment #30 from Nick Garnett 2012-09-27 15:59:38 BST ---
I'm not at all happy about adding an extra set of HAL and kernel
functions to all architectures just to solve an obscure problem on a
single architecture. Either a better solution needs to be found that
can be applied only to the Cortex-M architecture, or we simply have to
live with the consequences.

The proposed change is, in any case, clearly a misuse of the NVIC
hardware. The feature being used is intended to allow individual
interrupts to be set pending by software for testing. The clear
register appears to be present mainly as a side effect of using a
common interface for all these NVIC bit masks. I'm sure ARM do not
expect interrupts to be cleared in this way under normal
circumstances; this should be done as a consequence of entering the
ISR.
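
For reference, the set-pending and clear-pending registers under discussion are banks of 32-bit words in the NVIC, with one bit per external interrupt. The following is a minimal sketch in C, not eCos code: the helper names are hypothetical, `irq` is assumed to be the NVIC external interrupt number (which may differ from the eCos vector number), and the base addresses are the ARMv7-M locations of NVIC_ISPR0 and NVIC_ICPR0.

```c
#include <stdint.h>

/* ARMv7-M NVIC register banks: one bit per external interrupt. */
#define NVIC_ISPR_BASE 0xE000E200u  /* Interrupt Set-Pending Registers   */
#define NVIC_ICPR_BASE 0xE000E280u  /* Interrupt Clear-Pending Registers */

/* Hypothetical helper: address of the ICPR word that holds `irq`. */
static inline uint32_t nvic_icpr_addr(unsigned irq)
{
    return NVIC_ICPR_BASE + 4u * (irq / 32u);
}

/* Hypothetical helper: bit mask for `irq` within its 32-bit word. */
static inline uint32_t nvic_irq_mask(unsigned irq)
{
    return 1u << (irq % 32u);
}

/* Writing 1 to a bit clears that interrupt's pending state; writing 0
 * has no effect, so no read-modify-write is needed.
 * Only meaningful on a Cortex-M target; do not call this on a host. */
static inline void nvic_clear_pending(unsigned irq)
{
    *(volatile uint32_t *)(uintptr_t)nvic_icpr_addr(irq) = nvic_irq_mask(irq);
}
```

The address and mask helpers are pure arithmetic and can be checked on a host; only `nvic_clear_pending()` touches hardware.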

The timing diagram in comment #2 suggests that the real problem will
only occur if the CPU is too slow for the rate of interrupts being
delivered. A better version of that timing diagram might be as
follows:

HW |  E1          E2
---|----------------------------
ISR|   I1          I2
---|----------------------------
DSR|      D1==========D2=

The ====== show the time during which the DSR is running. I2 runs
during the execution of D1, posting a second DSR call, which will run
immediately after D1, and in theory will find nothing to do.

I can see two situations in which this can happen.

1. The CPU is simply too slow to finish running D1 before E2/I2 run,
   even if D1 was started immediately after I1 completed. If the
   events are coming at this rate continually, then the CPU simply
   won't keep up. If they come in infrequent bursts, then the odd
   extra ISR/DSR is of little consequence, and is part of the cost of
   dealing with a temporary overload.

2. The start of D1 was delayed because eCos had the scheduler locked
   when I1 ran. This is a consequence of the ISR/DSR model. If I2 ran
   before D1 started, then the DSR would only be called once, with a
   larger count value. If I2 runs after D1 starts, it may post a
   separate DSR; but this is true for all architectures, not just this
   one.

Adding an interrupt cancel anywhere in D1 would only deal with any new
events that were posted before that point. E2 could occur just after
the cancel, and would still result in an extra ISR/DSR. The proposed
solution can only reduce the number of extra ISR/DSRs, never eliminate
them entirely.

I also don't believe this is entirely an eCos problem. It is also
present in the Cortex-M nested interrupt model, and is the
expected/intended behaviour. Consider a system that is only using
ISRs. Here's a timing diagram:

HW  |  E1     E2    E1
----|----------------------------
ISR1|   I1===      ===I1=====
----|----------------------------
ISR2|          I2===

Here there are two devices, 1 and 2, with associated ISRs; ISR1 is
lower priority than ISR2.
If ISR1 is running when device 2 raises an interrupt, then it will be
pre-empted and ISR2 will run. If ISR2 runs for long enough then it may
delay the completion of ISR1 until after a new device 1 interrupt is
posted. This will re-set the pending bit and immediately after ISR1
returns, it will be re-entered. The same will happen in the absence of
nested ISRs if ISR1 just takes too long to process the first event
before the second occurs.

This is similar to the eCos situation. So long as these things occur
infrequently, then extra ISRs are simply a cost of handling bursts of
interrupts. If it happens frequently then that is an indication that
the CPU is too slow to keep up with the interrupt rate.

I wasn't sure what conclusion I would come to when I started writing
this, but I think I have convinced myself that this is actually a
non-issue. The proposal cannot eliminate these extra ISR/DSR calls
completely; the problem is not eCos specific; it is not Cortex-M
specific either; the issue only seriously affects systems that are on
the edge of being too slow to cope with the interrupt rate. The worst
aspect of the proposal is that it spreads its tentacles into all other
architectures and device drivers.

However, comment #7 contains a seed of a better solution. Many device
drivers are somewhat lazy in using cyg_drv_interrupt_mask() and
friends to control interrupt delivery; and it is this that is the main
cause of the problem. They should really use peripheral registers to
do this, where possible. Certainly generic drivers like the 16x5x
driver should. I switched the eCosCentric version of this driver over
to doing exactly this earlier this year and can contribute a patch to
do that for the public version. Other drivers should be converted as
and when convenient. Those devices that don't have local control of
interrupts will just have to continue with the current approach and
accept the consequences.
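
The sequence from comment #29 and the alternatives discussed in this thread can be compared with a small host-side model. This is a rough sketch under stated assumptions, not eCos or NVIC code: every name in it is hypothetical, the pending bit is modelled as latching whenever the device is allowed to signal, and the second event is assumed to arrive after the ISR has run but before the DSR has drained the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt vector and one device. */
typedef enum {
    MASK_AT_NVIC,               /* ISR masks the vector, DSR unmasks it      */
    MASK_AT_NVIC_CLEAR_PENDING, /* as above, plus clear-pending in the DSR   */
    MASK_AT_PERIPHERAL          /* ISR/DSR toggle the device's own IRQ enable */
} strategy_t;

typedef struct {
    bool vector_enabled;  /* interrupt controller enable for this vector */
    bool pending;         /* latched pending bit                          */
    bool dev_irq_enabled; /* device-local interrupt enable                */
    int  dev_work;        /* unserviced conditions in the device          */
    int  isr_calls, dsr_calls, empty_dsr_calls;
} sim_t;

/* A device condition latches the pending bit if the device may signal. */
static void device_event(sim_t *s)
{
    s->dev_work++;
    if (s->dev_irq_enabled)
        s->pending = true;
}

/* Take the interrupt if possible; returns true when the ISR ran. */
static bool try_isr(sim_t *s, strategy_t st)
{
    if (!s->vector_enabled || !s->pending)
        return false;
    s->pending = false;              /* cleared on ISR entry */
    s->isr_calls++;
    if (st == MASK_AT_PERIPHERAL)
        s->dev_irq_enabled = false;
    else
        s->vector_enabled = false;
    return true;
}

/* Drain all device work in one run, then re-enable interrupt delivery. */
static void run_dsr(sim_t *s, strategy_t st)
{
    s->dsr_calls++;
    if (s->dev_work == 0)
        s->empty_dsr_calls++;        /* DSR ran with nothing to do */
    s->dev_work = 0;
    if (st == MASK_AT_NVIC_CLEAR_PENDING)
        s->pending = false;
    if (st == MASK_AT_PERIPHERAL)
        s->dev_irq_enabled = true;
    else
        s->vector_enabled = true;
}

/* E1, I1, then E2 before D1 drains the device, then whatever follows. */
sim_t simulate(strategy_t st)
{
    sim_t s = { true, false, true, 0, 0, 0, 0 };
    bool taken;

    device_event(&s);            /* E1 */
    taken = try_isr(&s, st);     /* I1 */
    assert(taken);               /* vector starts enabled, so I1 must run */
    (void)taken;
    device_event(&s);            /* E2, while the DSR is still outstanding */
    run_dsr(&s, st);             /* D1 services both conditions */
    while (try_isr(&s, st))      /* I2, if the pending bit survived */
        run_dsr(&s, st);         /* D2 */
    return s;
}
```

Under this model, `simulate(MASK_AT_NVIC)` reports two ISR calls and one DSR run with nothing to do, while the other two strategies report a single ISR/DSR pair. The model does not cover an event that arrives just after the clear, which, as noted above, would still produce an extra ISR/DSR.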