From: Michael Jones
To: christophe
Cc: Lambrecht Jürgen, ecos discuss
Date: Tue, 04 Mar 2014 16:51:00 -0000
Subject: Re: [ECOS] Scheduler startup question
In-Reply-To: <5315FC76.8030002@stmi.com>
References: <496B24D9-62B6-48F2-BD53-1F6B9ABE2083@linear.com> <5315F31D.9060007@stmi.com> <55AB601E-FC02-4510-B3A7-C1970FA2E187@linear.com> <5315FC76.8030002@stmi.com>
Mailing-List: contact ecos-discuss-help@ecos.sourceware.org; run by ezmlm
X-SW-Source: 2014-03/txt/msg00004.txt.bz2

Christophe,

What I mean is that the lock shown in the code you put below is not in the eCos code database. So when I said I added code, I meant that I added the code you put below.

I removed that code and moved it to Vectors.S, where it is now a trylock rather than the main lock call. (My latest code on SourceForge does not have the lock call shown below.)

When the lock was called in interrupt_end, it did not deadlock. When it was called in Vectors.S, it deadlocked.

The functional difference is that when the lock was called in Vectors.S, it was called before the ISR was called.

But as I said, I have not tried to find the root cause of the deadlock.

Perhaps I can try the kernel instrumentation when I have some time this weekend.

Mike

On Mar 4, 2014, at 9:16 AM, christophe wrote:

> Michael,
>
> I am not sure what you mean by adding code in interrupt_end to take the lock. The locking mechanism is present for SMP targets, no change required:
>
> externC void
> interrupt_end(
>     cyg_uint32 isr_ret,
>     Cyg_Interrupt *intr,
>     HAL_SavedRegisters *regs
>     )
> {
> //    CYG_REPORT_FUNCTION();
>
> #ifdef CYGPKG_KERNEL_SMP_SUPPORT
>     Cyg_Scheduler::lock();
> #endif
>
> The macro for incrementing the lock in SMP looks at the current owner of the lock and spins when required.
>
> I found the kernel instrumentation option very useful for debugging deadlocks. I was using the CodeConfidence plugin in Eclipse to analyze the trace, which makes debugging pretty efficient.
>
> Christophe
>
> On 3/4/2014 4:58 PM, Michael Jones wrote:
>> Christophe,
>>
>> When I first got SMP to work I added some code in interrupt_end to take the lock, but I moved it back to Vectors.S because I was trying to reduce changes to the kernel. Functionally, the only difference is whether the lock is taken before the ISR is executed or not.
>>
>> My bigger concern is how the lock is taken. When I increase the lock count, the core doing so (core 0) may not be the holder of the lock, which leads to assertions. And if it spins while taking the lock, it deadlocks. I have not tracked down the deadlock, but I think the problem is in the scheduler, where some secondary CPU is waiting.
>>
>> My current solution is to use a trylock in Vectors.S and live with the fact that when it fails, it will take another real time clock interrupt to try again. So interrupt_end is not guaranteed to be called on each interrupt. This keeps things simple. All interrupts go to core 0 except inter-CPU interrupts. Some latency is added because taking the lock is not guaranteed.
>>
>> Other ways to handle this are to send interrupts to all cores, use inter-core interrupts, etc., in an effort to guarantee that the lock is incremented by the core that holds it.
>>
>> I was not able to figure out how i386 handled this. Does anyone know how the i386 SMP incremented the lock if the core that got the interrupt did not hold the lock?
>>
>> Mike
>>
>> On Mar 4, 2014, at 8:37 AM, christophe wrote:
>>
>>> Hi Michael,
>>>
>>> I might remember wrong, but I think that in the case of an SMP target, the lock is not taken in Vectors.S but directly after entering interrupt_end. Of course this is spinlock based, so it might delay posting/scheduling of the DSR.
>>>
>>> Christophe
>>>
>>> On 3/2/2014 9:19 PM, Michael Jones wrote:
>>>> Jurgen,
>>>>
>>>> I think I fully understand how the scheduler locking works during interrupt now.
>>>> Vectors.S takes the lock, and interrupt_end clears it. However, the normal technique of incrementing the lock count does not work with SMP. The problem is that another CPU may have the lock. Incrementing anyway leads to assertions. Attempting to take the lock with the spinlock can lead to deadlocks or an unresponsive network application.
>>>>
>>>> So I changed things so that in Vectors.S, during an interrupt, an attempt at locking is made. This means trying to take a spinlock that might fail. If the lock is taken, interrupt_end is called. If the lock fails, interrupt_end is not called.
>>>>
>>>> This means that a DSR may not be posted on that interrupt. This can cause some latency based on the real time clock interrupt rate, or time until a thread switch. However, it is stable and assertion free. Also, a HAL could implement a timeout on the try spinlock which might reduce latency.
>>>>
>>>> To support the try and testing if the lock was taken, I had to add some functions to the kernel. The following wiki page has been updated to reflect the kernel changes.
>>>>
>>>> https://sourceforge.net/p/ecosfreescale/wiki/SMP%20Kernel/
>>>>
>>>> Anyone with SMP knowledge might want to take a look. There may be better solutions to some of these problems. But at least for now, the IMX6 SMP HAL seems stable and I can run IO intensive Lua scripts over telnet reliably, even when the client aborts.
>>>>
>>>> The client abort means telnet has to kill a thread. This was quite a challenge. Telnet is creating a separate heap for Lua so it can kill the thread and reclaim memory. The remaining problem is closing file handles. I still get some assertions when a handle is sometimes killed by a thread that does not own it. I don't think that can be solved without adding some new functions dedicated to clean up of file handles by an outside thread.
>>>>
>>>> Mike
>>>>
>>>> On Feb 26, 2014, at 11:40 PM, Lambrecht Jürgen wrote:
>>>>
>>>>> As far as I know, the scheduler is started after cyg_user_start(), which your application uses to initialize everything. Do you use cyg_user_start?
>>>>>
>>>>> Sent from Samsung Mobile
>>>>>
>>>>> -------- Original message --------
>>>>> From: Michael Jones
>>>>> Date:
>>>>> To: ecos discuss
>>>>> Subject: [ECOS] Scheduler startup question
>>>>>
>>>>> I have a question about proper scheduler locking startup behavior.
>>>>>
>>>>> The context is that I am cleaning up my iMX6 HAL and attempting to make things work without a couple of kernel hacks I added to make it work.
>>>>>
>>>>> The question has to do with sched_lock. By default this has a value of 1, so during startup the scheduler is locked.
>>>>>
>>>>> When there is an interrupt, sched_lock is incremented in Vectors.S, and decremented in interrupt_end.
>>>>>
>>>>> However, I am getting an assert in sync.h, which is part of the BSD stack. The assert is because it expects the lock to be zero.
>>>>>
>>>>> The question is, during the startup process, how does the lock get set to zero after initialization? Is it supposed to stay 1 while hardware is initialized and through all the constructors, etc.? Is it cleared by the scheduler somehow? Is the HAL supposed to zero it at some point during startup?
>>>>>
>>>>> My HAL is part of the ARM HAL, so if this is device specific, it is the ARM HAL I am working with.
>>>>>
>>>>> Mike
>>>>> --
>>>>> Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
>>>>> and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
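
For readers following the thread, the trylock interrupt path described in it (attempt the scheduler lock on interrupt entry, call interrupt_end only if the attempt succeeds) can be modeled roughly as below. This is only a sketch under stated assumptions, not eCos kernel code: SchedLockModel, IrqStats and handle_interrupt are hypothetical names, the lock is modeled as an owner CPU plus a nesting count, and the ISR is assumed to run whether or not the lock was obtained.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model, NOT the eCos API: a scheduler lock with one
// owning CPU and a nesting count.
constexpr int kNoOwner = -1;

class SchedLockModel {
public:
    // Try to take the lock for `cpu` without spinning.
    // Succeeds if the lock is free or already held by `cpu` (nesting).
    bool try_lock(int cpu) {
        int expected = kNoOwner;
        if (owner_.compare_exchange_strong(expected, cpu)) {
            count_ = 1;          // lock was free, now ours
            return true;
        }
        if (expected == cpu) {   // already the owner: nest one level
            ++count_;
            return true;
        }
        return false;            // held by another CPU: give up
    }

    // Release one nesting level; only the owner may call this.
    void unlock(int cpu) {
        assert(owner_.load() == cpu && count_ > 0);
        if (--count_ == 0) {
            owner_.store(kNoOwner);
        }
    }

    int owner() const { return owner_.load(); }
    int count() const { return count_; }

private:
    std::atomic<int> owner_{kNoOwner};
    int count_ = 0;              // only modified by the owning CPU
};

// Counters standing in for the real side effects.
struct IrqStats {
    int isr_runs = 0;   // ISR executed
    int dsr_posts = 0;  // interrupt_end reached, DSR could be posted
    int deferred = 0;   // trylock failed, DSR work left for later
};

// Interrupt path as described in the thread: trylock first, run the
// ISR (assumed to run in both cases), and do the interrupt_end work
// only when the lock was obtained.
bool handle_interrupt(SchedLockModel& lock, int cpu, IrqStats& stats) {
    const bool locked = lock.try_lock(cpu);
    ++stats.isr_runs;
    if (!locked) {
        ++stats.deferred;        // retried on a later interrupt
        return false;
    }
    ++stats.dsr_posts;           // stands in for interrupt_end()
    lock.unlock(cpu);
    return true;
}
```

In this model an interrupt on core 0 that arrives while core 1 holds the lock returns false and is only counted as deferred, which mirrors the added latency mentioned above: the DSR work waits for a later interrupt that does get the lock.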