From: "Joe Porthouse"
Date: Sat, 08 Apr 2006 04:18:00 -0000
Subject: RE: [ECOS] DSR stops running after heavy interrupts. Bug found?
In-Reply-To: <20060406211847.GH12221@lunn.ch>
Mailing-List: contact ecos-discuss-help@ecos.sourceware.org; run by ezmlm
X-SW-Source: 2006-04/txt/msg00093.txt.bz2

Found it!!!
It took two days to figure out what was happening, but I think I have a handle on it. See if this sounds right.

After an ISR executes, if there is an associated DSR to execute, the DSR is added to the DSR list and a scheduler lock is taken. Since DSRs run with interrupts enabled, the scheduler lock prevents the application code from running until all DSRs finish and release each of the scheduler locks.

After adding the DSR to the list, if there is only one scheduler lock (the one just added), then a call must be made to start the first DSR executing. If more than one scheduler lock is in place, then execution must resume from where it left off (a DSR or other critical section). The DSR will start after the next scheduler unlock is called.

If the ISR does not have an associated DSR, nothing is added to the DSR list and the scheduler lock is not taken, allowing the application or DSR to resume when the ISR finishes.

The problem is in the /hal/arm/arch/current/src/vectors.S file at line 951.

    // The return value from the handler (in r0) will indicate whether a
    // DSR is to be posted. Pass this together with a pointer to the
    // interrupt object we have just used to the interrupt tidy up routine.

    // don't run this for spurious interrupts!
    cmp v1,#CYGNUM_HAL_INTERRUPT_NONE   <-- Incorrectly references R4 (v1 is the alias for r4)
    cmp r0,#CYGNUM_HAL_INTERRUPT_NONE   <-- Change to this

The wrong register is referenced to determine whether the ISR has a DSR to add to the DSR list. Since any value in R4 other than 0x0001 will call the routines to add a DSR, and I assume most ISRs have a DSR, the default behavior seems to work by chance in most configurations.

In my application my ISR does NOT have an associated DSR. Even though the correct 0x0001 is returned by the ISR, the call to add the DSR is still made. This includes performing a scheduler lock, since it expects to release it after the DSR runs, but there is no DSR.
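The lock-counting behavior described above can be sketched as a small C model. This is only a toy under stated assumptions, not eCos source: the names `model_lock`, `model_unlock`, `model_isr_end` and the counters are hypothetical, and the model assumes the lock is taken and released around every ISR exit, which is a simplification. It shows only that pending DSRs run when the lock count is about to reach zero, so a single lock that is never released leaves later DSRs queued.

```c
#include <assert.h>

/* Toy model of scheduler-lock counting. All names are hypothetical
   and are NOT eCos APIs. */
static unsigned sched_lock = 0;    /* nesting depth of scheduler locks */
static unsigned dsrs_pending = 0;  /* DSRs posted but not yet run */
static unsigned dsrs_run = 0;      /* DSRs that have completed */

static void model_reset(void)
{
    sched_lock = 0;
    dsrs_pending = 0;
    dsrs_run = 0;
}

static void model_lock(void)
{
    sched_lock++;
}

/* Pending DSRs are drained only when the lock is about to drop to
   zero; otherwise the counter is simply decremented. */
static void model_unlock(void)
{
    assert(sched_lock > 0);
    if (sched_lock == 1) {
        while (dsrs_pending > 0) {
            dsrs_pending--;
            dsrs_run++;
        }
    }
    sched_lock--;
}

/* End of an ISR. wants_dsr is nonzero when the ISR asked for its DSR
   to be posted (assumption: lock and unlock are always paired here). */
static void model_isr_end(int wants_dsr)
{
    model_lock();
    if (wants_dsr) {
        dsrs_pending++;
    }
    model_unlock();
}
```

In this model, calling `model_lock()` once without a matching `model_unlock()` and then calling `model_isr_end(1)` repeatedly leaves `dsrs_pending` growing while `dsrs_run` stays fixed. Releasing the extra lock drains the queue.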
I believe there is some type of race condition here that allows the lock to not be released correctly, since there is no corresponding DSR in the DSR list. Modifying only the above line has so far completely solved my issue of losing my DSR's execution.

Can someone review the proposed change and, if warranted, add it into CVS? This problem could/will affect any ARM eCos application.

Since "v1" may have correctly referenced "r0" at some time in the past, the other half dozen "v1, v2...v6" references in vectors.S could also be incorrect.

Joe Porthouse
Toptech Systems, Inc.

-----Original Message-----
From: ecos-discuss-owner@ecos.sourceware.org [mailto:ecos-discuss-owner@ecos.sourceware.org] On Behalf Of Andrew Lunn
Sent: Thursday, April 06, 2006 5:19 PM
To: Joe Porthouse
Cc: ecos-discuss@ecos.sourceware.org
Subject: Re: [ECOS] DSR stops running after heavy interrupts.

On Thu, Apr 06, 2006 at 05:08:45PM -0400, Joe Porthouse wrote:
> Stefan, thanks. I'm glad to know I'm not the only one experiencing this
> problem.
>
> I have made a little more progress.
>
> I still can't explain the issues with the code listed in my first message
> with the code checking the return value from the ISR, but I believe it is
> somehow working correctly. I still believe there may be a problem with R4
> being checked instead of R0. I did verify that the memory was the same as
> my code window, as well as the flash image.
>
> This is what I did find.
>
> DSR calls are being added to the table... thousands of them... just not
> getting serviced. All the calls that lead to "call_pending_DSRs" seem to
> originate from the unlock_inner() routine getting called. This routine
> stops getting called when the problem occurs. (You can see the logic below.)
>
> inline void Cyg_Scheduler::unlock()
> {
>     // This is an inline wrapper for the real scheduler unlock function in
>     // Cyg_Scheduler::unlock_inner().
>
>     // Only do anything if the lock is about to go zero, otherwise we simply
>     // decrement and return. As with lock() we do not need any special code
>     // to decrement the lock counter.
>
>     CYG_INSTRUMENT_SCHED(UNLOCK,get_sched_lock(),0);
>
>     HAL_REORDER_BARRIER();
>
>     cyg_ucount32 __lock = get_sched_lock() - 1;
>
>     if( __lock == 0 )
>         unlock_inner(0);
>     else
>         set_sched_lock(__lock);
>
>     HAL_REORDER_BARRIER();
> }
>
> Upon examination the __lock value is "6" when unlock() is called at the end
> of the ISR, thus unlock_inner never gets called. If I get the variable
> location in the get_sched_lock() back to 1, my DSR calls resume.
> Mmmmmmm....
>
> So somehow locks are being done without unlocks. I am at a loss to figure
> out how this is occurring since I do not make lock calls in any of my code.
> Could interrupt preemption somehow be occurring? Do the
> hal_disable/enable interrupt calls mess with the lock?
>
> Any good ideas on how to track this down?

Kernel instrumentation.

    CYG_INSTRUMENT_SCHED(UNLOCK,get_sched_lock(),0);

Locks and unlocks are logged. See if you can find a case of a lock
without an unlock.

        Andrew

--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss