From: Michael Jones
Date: Tue, 04 Mar 2014 17:09:00 -0000
To: Michael Jones
Cc: christophe, Lambrecht Jürgen, ecos discuss
Subject: Re: [ECOS] Scheduler startup question

Christophe,

If you are looking at my SourceForge code, the integration branch has my latest code, with the trylock in Vectors.S.

Mike

On Mar 4, 2014, at 9:51 AM, Michael Jones wrote:

> Christophe,
>
> What I mean is that the lock shown in the code you quoted below is not in the eCos code base. So when I said I added code, I meant that I added the code you quoted below.
>
> I removed that code and moved it to Vectors.S, where it is now a trylock rather than the main lock call. (My latest code on SourceForge does not have the lock call shown below.)
>
> When the lock was called in interrupt_end, it did not deadlock. When it was called in Vectors.S, it deadlocked.
>
> The functional difference is that when the lock was called in Vectors.S, it was called before the ISR ran.
>
> But as I said, I have not tried to find the root cause of the deadlock.
>
> Perhaps I can try the kernel instrumentation when I have some time this weekend.
>
> Mike
>
> On Mar 4, 2014, at 9:16 AM, christophe wrote:
>
>> Michael,
>>
>> I am not sure what you mean by adding code in interrupt_end to take the lock.
>> The locking mechanism is already present for SMP targets; no change is required:
>>
>> externC void
>> interrupt_end(
>>     cyg_uint32          isr_ret,
>>     Cyg_Interrupt       *intr,
>>     HAL_SavedRegisters  *regs
>>     )
>> {
>> //    CYG_REPORT_FUNCTION();
>>
>> #ifdef CYGPKG_KERNEL_SMP_SUPPORT
>>     Cyg_Scheduler::lock();
>> #endif
>>
>> In SMP builds, the macro that increments the lock checks the current owner of the lock and spins when required.
>>
>> I found the kernel instrumentation option very useful for debugging deadlocks. I used the CodeConfidence plugin in Eclipse to analyze the trace, which makes debugging pretty efficient.
>>
>> Christophe
>>
>> On 3/4/2014 4:58 PM, Michael Jones wrote:
>>> Christophe,
>>>
>>> When I first got SMP to work, I added some code in interrupt_end to take the lock, but I moved it back to Vectors.S because I was trying to reduce changes to the kernel. Functionally, the only difference is whether the lock is taken before or after the ISR executes.
>>>
>>> My bigger concern is how the lock is taken. When I increment the lock count, the core doing so (core 0) may not be the holder of the lock, which leads to assertions. And if it spins while taking the lock, it deadlocks. I have not traced down the deadlock, but I think the problem is in the scheduler, where some secondary CPU is waiting.
>>>
>>> My current solution is to use a trylock in Vectors.S and live with the fact that when it fails, it takes another real-time clock interrupt to try again. So interrupt_end is not guaranteed to be called on every interrupt. This keeps things simple. All interrupts go to core 0 except inter-CPU interrupts. Some latency is added because taking the lock is not guaranteed.
>>>
>>> Other ways to handle this are to send interrupts to all cores, use inter-core interrupts, etc., in an effort to guarantee that the lock is incremented by the core that holds it.
>>>
>>> I was not able to figure out how the i386 port handled this.
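A minimal C sketch of the owner-aware, nesting lock Christophe describes, assuming C11 atomics and an explicit CPU id argument. The names (sched_lock_t, sched_lock_inc, sched_lock_dec, NO_OWNER) are hypothetical; this is not the eCos kernel's implementation. It shows why the owning CPU can simply nest while any other CPU has to spin, and spinning from an interrupt handler is the path Mike reports deadlocking.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)   /* hypothetical "nobody holds the lock" marker */

typedef struct {
    atomic_int owner;   /* id of the CPU holding the lock, or NO_OWNER */
    int count;          /* nesting depth; only touched by the owner */
} sched_lock_t;

void sched_lock_init(sched_lock_t *l)
{
    atomic_init(&l->owner, NO_OWNER);
    l->count = 0;
}

/* Take one level of the lock on behalf of `cpu`. The owner just nests;
 * any other CPU spins until the lock is free, then claims it. */
void sched_lock_inc(sched_lock_t *l, int cpu)
{
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu) {
        l->count++;
        return;
    }
    int expected = NO_OWNER;
    while (!atomic_compare_exchange_weak_explicit(&l->owner, &expected, cpu,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = NO_OWNER;   /* a failed CAS stores the current owner here */
    }
    l->count = 1;
}

/* Drop one level; only the owning CPU may do this. The last release
 * makes the lock available to other CPUs again. */
void sched_lock_dec(sched_lock_t *l, int cpu)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu);
    assert(l->count > 0);
    if (--l->count == 0)
        atomic_store_explicit(&l->owner, NO_OWNER, memory_order_release);
}
```

If core 0 takes an interrupt while core 1 holds the lock, sched_lock_inc(&lock, 0) spins until core 1 releases it; if core 1 is itself waiting on core 0, neither makes progress.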
>>> Does anyone know how the i386 SMP port incremented the lock if the core that got the interrupt did not hold the lock?
>>>
>>> Mike
>>>
>>>
>>> On Mar 4, 2014, at 8:37 AM, christophe wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> I may be remembering wrong, but I think that on SMP targets the lock is not taken in Vectors.S but directly after entering interrupt_end. Of course, this is spinlock-based, so it might delay posting/scheduling of the DSR.
>>>>
>>>> Christophe
>>>>
>>>> On 3/2/2014 9:19 PM, Michael Jones wrote:
>>>>> Jurgen,
>>>>>
>>>>> I think I now fully understand how scheduler locking works during an interrupt. Vectors.S takes the lock, and interrupt_end releases it. However, the normal technique of incrementing the lock count does not work with SMP. The problem is that another CPU may hold the lock. Incrementing anyway leads to assertions. Attempting to take the lock with the spinlock can lead to deadlocks or an unresponsive network application.
>>>>>
>>>>> So I changed Vectors.S so that, during an interrupt, it attempts to take the lock. This means trying to take a spinlock, which might fail. If the lock is taken, interrupt_end is called. If taking the lock fails, interrupt_end is not called.
>>>>>
>>>>> This means that a DSR may not be posted on that interrupt. This can add some latency, depending on the real-time clock interrupt rate or the time until a thread switch. However, it is stable and assertion-free. Also, a HAL could implement a timeout on the try-spinlock, which might reduce latency.
>>>>>
>>>>> To support trying the lock and testing whether it was taken, I had to add some functions to the kernel. The following wiki page has been updated to reflect the kernel changes.
>>>>>
>>>>> https://sourceforge.net/p/ecosfreescale/wiki/SMP%20Kernel/
>>>>>
>>>>> Anyone with SMP knowledge might want to take a look. There may be better solutions to some of these problems.
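A C sketch of the trylock approach described above, including the bounded-retry variant a HAL could use in place of a timeout. It assumes C11 atomics and assumes the ISR itself still runs when the trylock fails; sched_lock_try_inc, interrupt_end_stub, handle_interrupt and the max_tries parameter are hypothetical stand-ins, not the functions documented on the wiki page or the kernel's interrupt_end().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OWNER (-1)   /* hypothetical "nobody holds the lock" marker */

typedef struct {
    atomic_int owner;   /* id of the CPU holding the lock, or NO_OWNER */
    int count;          /* nesting depth; only touched by the owner */
} sched_lock_t;

/* Try to take one level of the lock for `cpu`, making at most
 * `max_tries` attempts, so the interrupt path never spins forever. */
bool sched_lock_try_inc(sched_lock_t *l, int cpu, int max_tries)
{
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu) {
        l->count++;             /* this CPU already holds it: nest */
        return true;
    }
    for (int i = 0; i < max_tries; i++) {
        int expected = NO_OWNER;
        if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, cpu,
                                                    memory_order_acquire,
                                                    memory_order_relaxed)) {
            l->count = 1;
            return true;
        }
    }
    return false;               /* another CPU holds it: give up */
}

/* Stand-in for interrupt_end(): post the DSR, then drop one level. */
void interrupt_end_stub(sched_lock_t *l, int cpu, int *dsrs_posted)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu);
    assert(l->count > 0);
    (*dsrs_posted)++;
    if (--l->count == 0)
        atomic_store_explicit(&l->owner, NO_OWNER, memory_order_release);
}

/* Interrupt path: the ISR always runs, but interrupt_end is skipped if
 * the lock could not be taken. Returns true if interrupt_end ran. */
bool handle_interrupt(sched_lock_t *l, int cpu, void (*isr)(void),
                      int max_tries, int *dsrs_posted)
{
    bool locked = sched_lock_try_inc(l, cpu, max_tries);
    if (isr != NULL)
        isr();
    if (locked)
        interrupt_end_stub(l, cpu, dsrs_posted);
    return locked;
}
```

With max_tries set to 1 this behaves like a plain trylock; a larger value trades a bounded amount of spinning in the interrupt path for fewer skipped DSR posts.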
>>>>> But at least for now, the iMX6 SMP HAL seems stable, and I can run I/O-intensive Lua scripts over telnet reliably, even when the client aborts.
>>>>>
>>>>> A client abort means telnet has to kill a thread. This was quite a challenge. Telnet creates a separate heap for Lua so that it can kill the thread and reclaim its memory. The remaining problem is closing file handles. I still sometimes get assertions when a handle is killed by a thread that does not own it. I don't think that can be solved without adding some new functions dedicated to cleaning up file handles from an outside thread.
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>> On Feb 26, 2014, at 11:40 PM, Lambrecht Jürgen wrote:
>>>>>
>>>>>> As far as I know, the scheduler is started after cyg_user_start(), which your application uses to initialize everything. Do you use cyg_user_start?
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: Michael Jones
>>>>>> Date:
>>>>>> To: ecos discuss
>>>>>> Subject: [ECOS] Scheduler startup question
>>>>>>
>>>>>>
>>>>>> I have a question about the proper scheduler-locking behavior at startup.
>>>>>>
>>>>>> The context is that I am cleaning up my iMX6 HAL and trying to make it work without a couple of kernel hacks I added earlier.
>>>>>>
>>>>>> The question concerns sched_lock. By default it has a value of 1, so the scheduler is locked during startup.
>>>>>>
>>>>>> When there is an interrupt, sched_lock is incremented in Vectors.S and decremented in interrupt_end.
>>>>>>
>>>>>> However, I am getting an assertion in sync.h, which is part of the BSD stack. The assertion fires because it expects the lock to be zero.
>>>>>>
>>>>>> The question is: during the startup process, how does the lock get set to zero after initialization?
>>>>>> Is it supposed to stay 1 while the hardware is initialized and through all the constructors, etc.? Is it cleared by the scheduler somehow? Is the HAL supposed to zero it at some point during startup?
>>>>>>
>>>>>> My HAL is part of the ARM HAL, so if this is device-specific, it is the ARM HAL I am working with.
>>>>>>
>>>>>> Mike
>>>>>> --
>>>>>> Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
>>>>>> and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
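A single-CPU C model of the counting described in the original question: sched_lock starts at 1, an interrupt increments it on entry, and interrupt_end decrements it. The model assumes that pending DSRs run only when the count drops to zero, and that starting the scheduler (after cyg_user_start() returns, per Jürgen's reply) releases the startup lock; model_interrupt, model_unlock and model_scheduler_start are hypothetical names, not eCos functions. Under those assumptions it shows why interrupts taken during initialization leave the count at 1 with their DSRs still pending.

```c
#include <assert.h>

/* Single-CPU model; every name here is hypothetical. */
static int sched_lock = 1;   /* locked from reset, as in the question */
static int pending_dsrs = 0;
static int dsrs_run = 0;

/* Assumed behavior: DSRs run only when the lock is released to zero. */
static void model_unlock(void)
{
    assert(sched_lock > 0);
    if (--sched_lock == 0) {
        dsrs_run += pending_dsrs;
        pending_dsrs = 0;
    }
}

void model_interrupt(void)
{
    sched_lock++;      /* entry code (Vectors.S): lock before the ISR */
    pending_dsrs++;    /* the ISR asks for its DSR to be posted */
    model_unlock();    /* interrupt_end: post the DSR, then unlock */
}

/* Assumption: starting the scheduler drops the startup lock to zero. */
void model_scheduler_start(void)
{
    assert(sched_lock == 1);
    sched_lock = 0;
}
```

In this model, thread code that checks for sched_lock == 0 (as the sync.h assertion appears to) would fail at any point before model_scheduler_start() runs.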