From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ecos-discuss-return-46252-listarch-ecos-discuss=sources.redhat.com@ecos.sourceware.org>
Received: (qmail 16599 invoked by alias); 4 Mar 2014 17:09:09 -0000
Mailing-List: contact ecos-discuss-help@ecos.sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <ecos-discuss.ecos.sourceware.org>
List-Subscribe: <mailto:ecos-discuss-subscribe@ecos.sourceware.org>
List-Archive: <http://ecos.sourceware.org/ml/ecos-discuss/>
List-Post: <mailto:ecos-discuss@ecos.sourceware.org>
List-Help: <mailto:ecos-discuss-help@ecos.sourceware.org>, <http://ecos.sourceware.org/ml/#faqs>
Sender: ecos-discuss-owner@ecos.sourceware.org
Received: (qmail 16586 invoked by uid 89); 4 Mar 2014 17:09:07 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.2
X-HELO: p01c11o141.mxlogic.net
Received: from p01c11o141.mxlogic.net (HELO p01c11o141.mxlogic.net) (208.65.144.64) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Tue, 04 Mar 2014 17:09:05 +0000
Received: from unknown [12.218.215.72] (EHLO smtpauth1.linear.com)	by p01c11o141.mxlogic.net(mxl_mta-7.2.4-1)	with ESMTP id ea806135.0.31453.00-049.81591.p01c11o141.mxlogic.net (envelope-from <mjones@linear.com>);	Tue, 04 Mar 2014 10:09:05 -0700 (MST)
X-MXL-Hash: 531608b154cb8c3e-bd9bef9a072cb3a27946fffcf1701ed8a8d0a990
Received: from alabar.engineering.linear.com (unknown [10.186.3.96])	by smtpauth1.linear.com (Postfix) with ESMTPSA id 58C19740B1;	Tue,  4 Mar 2014 09:08:58 -0800 (PST)
Content-Type: text/plain; charset=iso-8859-1
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
From: Michael Jones <mjones@linear.com>
In-Reply-To: <F9FAA5F5-0443-425A-BB56-F763877BE911@linear.com>
Date: Tue, 04 Mar 2014 17:09:00 -0000
Cc: christophe <ccoutand@stmi.com>, =?iso-8859-1?Q?Lambrecht_J=FCrgen?= <J.Lambrecht@TELEVIC.com>, ecos discuss <ecos-discuss@sourceware.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <19DF3277-4197-4FC8-8308-882BACA16013@linear.com>
References: <jbx041itlgmp5ctby5fmojvm.1393483201068@email.android.com> <496B24D9-62B6-48F2-BD53-1F6B9ABE2083@linear.com> <5315F31D.9060007@stmi.com> <55AB601E-FC02-4510-B3A7-C1970FA2E187@linear.com> <5315FC76.8030002@stmi.com> <F9FAA5F5-0443-425A-BB56-F763877BE911@linear.com>
To: Michael Jones <mjones@linear.com>
X-Spam: [F=0.5000000000; CM=0.500; MH=0.500(2014030416); S=0.200(2010122901)]
X-MAIL-FROM: <mjones@linear.com>
X-IsSubscribed: yes
Subject: Re: [ECOS] Scheduler startup question
X-SW-Source: 2014-03/txt/msg00005.txt.bz2

Christophe,

If you are looking at my SourceForge code, the integration branch has my latest code with the trylock in Vectors.S.

Mike

On Mar 4, 2014, at 9:51 AM, Michael Jones <mjones@linear.com> wrote:

> Christophe,
>
> What I mean is that the lock shown in the code you quoted below is not in the eCos code base. So when I said I added code, I meant the code you quoted below.
>
> I removed that code and moved it to Vectors.S, where it is now a trylock rather than the main lock call. (My latest code on SourceForge does not have the lock call shown below.)
>
> When the lock was taken in interrupt_end, it did not deadlock. When it was taken in Vectors.S, it deadlocked.
>
> The functional difference is that when the lock was taken in Vectors.S, it was taken before the ISR ran.
>
> But as I said, I have not tried to find the root cause of the deadlock.
>
> Perhaps I can try the kernel instrumentation when I have some time this weekend.
>
> Mike
>
> On Mar 4, 2014, at 9:16 AM, christophe <ccoutand@stmi.com> wrote:
>
>> Michael,
>>
>> I am not sure what you mean by adding code in interrupt_end to take the lock. The locking mechanism is already present for SMP targets; no change is required:
>>
>> externC void
>> interrupt_end(
>>   cyg_uint32          isr_ret,
>>   Cyg_Interrupt       *intr,
>>   HAL_SavedRegisters  *regs
>>   )
>> {
>> //    CYG_REPORT_FUNCTION();
>>
>> #ifdef CYGPKG_KERNEL_SMP_SUPPORT
>>   Cyg_Scheduler::lock();
>> #endif
>>
>> The macro for incrementing the lock in SMP looks at the current owner of the lock and spins when required.
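To illustrate the owner check Christophe describes, here is a minimal C11 sketch of an owner-aware, nesting scheduler lock. It is a simplified model under stated assumptions, not the eCos implementation: the names smp_sched_lock, smp_lock, smp_unlock and NO_OWNER are hypothetical, and the CPU number is passed in explicitly rather than read from the hardware.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of an owner-aware SMP scheduler lock.
 * Not the eCos kernel's data structure or API. */
#define NO_OWNER (-1)

typedef struct {
    atomic_int owner;   /* CPU currently holding the lock, or NO_OWNER */
    int count;          /* nesting depth; only written by the owner */
} smp_sched_lock;

static void smp_lock_init(smp_sched_lock *l)
{
    atomic_init(&l->owner, NO_OWNER);
    l->count = 0;
}

/* Take (or nest) the lock on behalf of `cpu`. If another CPU owns it,
 * spin until that CPU releases it. */
static void smp_lock(smp_sched_lock *l, int cpu)
{
    if (atomic_load(&l->owner) == cpu) {
        l->count++;                 /* already ours: just nest */
        return;
    }
    int expected = NO_OWNER;
    while (!atomic_compare_exchange_weak(&l->owner, &expected, cpu))
        expected = NO_OWNER;        /* a failed CAS stored the current owner here */
    l->count = 1;
}

/* Drop one nesting level; the last unlock frees the lock for other CPUs. */
static void smp_unlock(smp_sched_lock *l, int cpu)
{
    assert(atomic_load(&l->owner) == cpu);  /* a non-owner must not touch count */
    if (--l->count == 0)
        atomic_store(&l->owner, NO_OWNER);
}
```

In this model, a core that simply incremented count while another core owned the lock would corrupt the owner's nesting depth, which is the kind of assertion Michael reports; calling smp_lock instead makes that core spin until the owner's final smp_unlock.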
>>
>> I found the kernel instrumentation option very useful for debugging deadlocks. I used the CodeConfidence plugin in Eclipse to analyze the trace, which makes debugging quite efficient.
>>
>> Christophe
>>
>> On 3/4/2014 4:58 PM, Michael Jones wrote:
>>> Christophe,
>>>
>>> When I first got SMP working, I added some code in interrupt_end to take the lock, but I moved it back to Vectors.S because I was trying to minimize changes to the kernel. Functionally, the only difference is whether the lock is taken before the ISR executes or not.
>>>
>>> My bigger concern is how the lock is taken. When I increment the lock count, the core doing so (core 0) may not be the holder of the lock, which leads to assertions. And if it spins while taking the lock, it deadlocks. I have not tracked down the deadlock, but I think the problem is in the scheduler, where some secondary CPU is waiting.
>>>
>>> My current solution is to use a trylock in Vectors.S and live with the fact that, when it fails, it takes another real-time clock interrupt to try again. So interrupt_end is not guaranteed to be called on every interrupt. This keeps things simple. All interrupts go to core 0 except inter-CPU interrupts. Some latency is added because taking the lock is not guaranteed.
>>>
>>> Other ways to handle this would be to send interrupts to all cores, use inter-core interrupts, etc., in an effort to guarantee that the lock is incremented by the core that holds it.
>>>
>>> I was not able to figure out how i386 handled this. Does anyone know how the i386 SMP port incremented the lock if the core that got the interrupt did not hold the lock?
>>>
>>> Mike
>>>
>>>
>>> On Mar 4, 2014, at 8:37 AM, christophe <ccoutand@stmi.com> wrote:
>>>>
>>>> Hi Michael,
>>>>
>>>> I might remember wrong, but I think that for SMP targets the lock is not taken in Vectors.S but directly after entering interrupt_end. Of course, this is spinlock based, so it might delay posting/scheduling of the DSR.
>>>>
>>>> Christophe
>>>>
>>>> On 3/2/2014 9:19 PM, Michael Jones wrote:
>>>>> Jurgen,
>>>>>
>>>>> I think I now fully understand how scheduler locking works during an interrupt. Vectors.S takes the lock, and interrupt_end clears it. However, the normal technique of incrementing the lock count does not work with SMP. The problem is that another CPU may hold the lock. Incrementing anyway leads to assertions. Attempting to take the lock with the spinlock can lead to deadlocks or an unresponsive network application.
>>>>>
>>>>> So I changed things so that, during an interrupt, Vectors.S makes an attempt at locking. This means trying to take a spinlock, an attempt that might fail. If the lock is taken, interrupt_end is called. If the attempt fails, interrupt_end is not called.
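The control flow described above can be modeled roughly as follows. This is a hypothetical single-file sketch, not the code on the SourceForge branch: sched_lock_t, sched_trylock, sched_unlock and irq_entry are illustrative names, the ISR is assumed to run whether or not the lock is obtained, and a counter stands in for interrupt_end posting the DSR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of "trylock at interrupt entry". Not eCos APIs. */
#define NO_OWNER (-1)

typedef struct {
    atomic_int owner;   /* CPU holding the scheduler lock, or NO_OWNER */
    int count;          /* nesting depth, written only by the owner */
} sched_lock_t;

static void sched_lock_init(sched_lock_t *l)
{
    atomic_init(&l->owner, NO_OWNER);
    l->count = 0;
}

/* Never spins: nests if `cpu` already owns the lock, grabs it if free,
 * and otherwise reports failure immediately. */
static bool sched_trylock(sched_lock_t *l, int cpu)
{
    if (atomic_load(&l->owner) == cpu) {
        l->count++;
        return true;
    }
    int expected = NO_OWNER;
    if (!atomic_compare_exchange_strong(&l->owner, &expected, cpu))
        return false;               /* another CPU holds it: give up */
    l->count = 1;
    return true;
}

static void sched_unlock(sched_lock_t *l, int cpu)
{
    assert(atomic_load(&l->owner) == cpu);
    if (--l->count == 0)
        atomic_store(&l->owner, NO_OWNER);
}

/* One interrupt on `cpu`. Only the interrupt_end step (DSR posting plus
 * unlock) depends on the lock. Returns true if the DSR was posted. */
static bool irq_entry(sched_lock_t *l, int cpu, int *dsrs_posted)
{
    bool locked = sched_trylock(l, cpu);
    /* ... the ISR would run here in either case (assumption) ... */
    if (locked) {
        (*dsrs_posted)++;           /* stands in for interrupt_end posting the DSR */
        sched_unlock(l, cpu);       /* interrupt_end drops the lock again */
    }
    return locked;
}
```

Because the trylock never spins, a core that loses the race cannot deadlock at this point; the cost is that the DSR for that interrupt is not posted.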
>>>>>
>>>>> This means that a DSR may not be posted on that interrupt. This can cause some latency that depends on the real-time clock interrupt rate or on the time until a thread switch. However, it is stable and assertion free. Also, a HAL could implement a timeout on the trylock, which might reduce latency.
>>>>>
>>>>> To support trying the lock and testing whether it was taken, I had to add some functions to the kernel. The following wiki page has been updated to reflect the kernel changes.
>>>>>
>>>>> https://sourceforge.net/p/ecosfreescale/wiki/SMP%20Kernel/
>>>>>
>>>>> Anyone with SMP knowledge might want to take a look. There may be better solutions to some of these problems, but at least for now the IMX6 SMP HAL seems stable, and I can run IO-intensive Lua scripts over telnet reliably, even when the client aborts.
>>>>>
>>>>> A client abort means telnet has to kill a thread. This was quite a challenge. Telnet creates a separate heap for Lua so it can kill the thread and reclaim the memory. The remaining problem is closing file handles. I still sometimes get assertions when a handle is closed by a thread that does not own it. I don't think that can be solved without adding some new functions dedicated to cleaning up file handles from an outside thread.
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>> On Feb 26, 2014, at 11:40 PM, Lambrecht Jürgen <J.Lambrecht@TELEVIC.com> wrote:
>>>>>
>>>>>> As far as I know, the scheduler is started after cyg_user_start(), which your application uses to initialize everything. Do you use cyg_user_start?
>>>>>>
>>>>>>
>>>>>> Sent from Samsung Mobile
>>>>>>
>>>>>>
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: Michael Jones <mjones@linear.com>
>>>>>> Date:
>>>>>> To: ecos discuss <ecos-discuss@sourceware.org>
>>>>>> Subject: [ECOS] Scheduler startup question
>>>>>>
>>>>>>
>>>>>> I have a question about proper scheduler locking behavior at startup.
>>>>>>
>>>>>> The context is that I am cleaning up my iMX6 HAL and attempting to make it work without a couple of kernel hacks I had added earlier.
>>>>>>
>>>>>> The question has to do with sched_lock. By default it has a value of 1, so the scheduler is locked during startup.
>>>>>>
>>>>>> When there is an interrupt, sched_lock is incremented in Vectors.S and decremented in interrupt_end.
>>>>>>
>>>>>> However, I am getting an assert in sync.h, which is part of the BSD stack. The assert fires because it expects the lock to be zero.
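The behavior described above can be summarized with a tiny single-CPU model: the lock starts at 1, each interrupt raises and lowers it, and so a check that expects zero fails for as long as the scheduler has not started. The names are hypothetical, and scheduler_start() encodes an assumption (that the lock is brought down to zero when the first thread is started, after cyg_user_start() returns) rather than a description of the eCos source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model of the startup scheduler lock. */
static int sched_lock = 1;      /* locked from reset until the scheduler runs */

static void irq_enter(void) { sched_lock++; }  /* models the increment in Vectors.S */
static void irq_exit(void)  { sched_lock--; }  /* models the decrement in interrupt_end */

/* Stand-in for the sync.h check: true when that code would not assert. */
static bool lock_is_zero(void) { return sched_lock == 0; }

/* Assumption: starting the scheduler unlocks until the count reaches zero. */
static void scheduler_start(void)
{
    while (sched_lock > 0)
        sched_lock--;
}
```

In this model, any code that runs before the scheduler starts (constructors, hardware initialization, interrupts taken during init) always sees a lock of at least 1, so a zero check can only pass once the scheduler is running.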
>>>>>>
>>>>>> The question is: during the startup process, how does the lock get set to zero after initialization? Is it supposed to stay 1 while hardware is initialized and through all the constructors, etc.? Is it cleared by the scheduler somehow? Is the HAL supposed to zero it at some point during startup?
>>>>>>
>>>>>> My HAL is part of the ARM HAL, so if this is device specific, it is the ARM HAL I am working with.
>>>>>>
>>>>>> Mike
>>>>>> --
>>>>>> Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
>>>>>> and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss