From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ecos-discuss-return-38643-listarch-ecos-discuss=sources.redhat.com@ecos.sourceware.org>
Received: (qmail 21244 invoked by alias); 27 Aug 2007 08:12:40 -0000
Received: (qmail 21091 invoked by uid 22791); 27 Aug 2007 08:12:39 -0000
X-Spam-Check-By: sourceware.org
Received: from londo.lunn.ch (HELO londo.lunn.ch) (80.238.139.98)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 27 Aug 2007 08:12:30 +0000
Received: from lunn by londo.lunn.ch with local (Exim 3.36 #1 (Debian)) 	id 1IPZhx-0007lv-00; Mon, 27 Aug 2007 10:12:25 +0200
Date: Mon, 27 Aug 2007 08:12:00 -0000
From: Andrew Lunn <andrew@lunn.ch>
To: Rick Davis <rickdavisjr@comcast.net>
Cc: ecos-discuss@ecos.sourceware.org
Message-ID: <20070827081225.GZ31057@lunn.ch>
Mail-Followup-To: Rick Davis <rickdavisjr@comcast.net>, 	ecos-discuss@ecos.sourceware.org
References: <003601c7e876$4e9ee690$ebdcb3b0$@net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <003601c7e876$4e9ee690$ebdcb3b0$@net>
User-Agent: Mutt/1.5.16 (2007-06-11)
X-IsSubscribed: yes
Mailing-List: contact ecos-discuss-help@ecos.sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <ecos-discuss.ecos.sourceware.org>
List-Subscribe: <mailto:ecos-discuss-subscribe@ecos.sourceware.org>
List-Archive: <http://ecos.sourceware.org/ml/ecos-discuss/>
List-Post: <mailto:ecos-discuss@ecos.sourceware.org>
List-Help: <mailto:ecos-discuss-help@ecos.sourceware.org>, <http://ecos.sourceware.org/ml/#faqs>
Sender: ecos-discuss-owner@ecos.sourceware.org
Subject: Re: [ECOS] network problem
X-SW-Source: 2007-08/txt/msg00145.txt.bz2

On Mon, Aug 27, 2007 at 02:48:42AM -0400, Rick Davis wrote:
> I have a device using the MPC859T processor that has a small web server
> running using the standard eCos web server. I have a status page that
> auto-refreshes every 15 seconds and I am pinging the unit every second (Yes,
> I have a customer that is actually doing this). I don't really know what
> other network activity is occurring at the customer's site but my test lab
> has Windows network chatter going on. After about 12 or so hours the web
> stops responding and the unit can no longer be pinged. The FEC Ethernet
> driver is receiving packets and is calling the eth_drv_dsr but the deliver
> function is never called.
> 
> I have been tracking this down for some time and have noticed the
> following...
> 
> 1. The alarm thread in timeout.c is getting blocked when calling
> splx_internal() just before the call to eth_drv_run_deliveries().
> 2. The current value of spl_state in sync.c is 4 (SPL_NET)
> 
> Any ideas why the network would not release the splx_mutex?
> Any suggestion on how to further track this down?
> I don't have a GDB interface on my platform. :(

What vintage of eCos are you using? If you go back far enough into the
mists of time, there was at least one bug fix for alarms. But that is
a long time ago.

Do you have asserts enabled? They might give some clues.

You could also enable CYGIMPL_TRACE_SPLX and call show_sched_events()
when you hit the deadlock. That should tell you which function is
holding the mutex. You might want to add
__builtin_return_address(0) to the log structure, so you can see one
more level up the call stack. Otherwise I think you will just get
spi_slpnet, which is not much use.
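
A rough sketch of the return-address idea, as standalone C rather than
the actual eCos trace code. All names here (spl_trace_entry, TRACE_SPL,
demo_splnet, demo_caller, trace_dump) are made up for illustration; the
real log structure and its fields will differ, so treat this only as a
pattern to adapt. The one real piece is the GCC builtin
__builtin_return_address(0).

```c
#include <stdio.h>

#define TRACE_DEPTH 16u          /* hypothetical ring-buffer size */

/* Hypothetical log entry: the usual function/file/line fields plus
 * the return address of the traced function. */
struct spl_trace_entry {
    const char *func;            /* function that logged the event */
    const char *file;
    int         line;
    void       *caller;          /* address the traced function returns to */
};

static struct spl_trace_entry trace_log[TRACE_DEPTH];
static unsigned trace_count;     /* total events recorded so far */

static void trace_record(const char *func, const char *file, int line,
                         void *caller)
{
    struct spl_trace_entry *e = &trace_log[trace_count % TRACE_DEPTH];

    e->func = func;
    e->file = file;
    e->line = line;
    e->caller = caller;
    trace_count++;
}

/* A macro, so that __builtin_return_address(0) is evaluated inside the
 * traced function and not inside trace_record(). */
#define TRACE_SPL() \
    trace_record(__func__, __FILE__, __LINE__, __builtin_return_address(0))

/* Stand-in for an spl-style function that takes the lock. */
static __attribute__((noinline)) int demo_splnet(void)
{
    TRACE_SPL();
    return 0;
}

/* Stand-in for the code you actually want to identify. With
 * optimisation off, the logged address points into this function. */
static __attribute__((noinline)) void demo_caller(void)
{
    (void)demo_splnet();
}

/* Print the recorded events, oldest first. */
static void trace_dump(void)
{
    unsigned n = trace_count < TRACE_DEPTH ? trace_count : TRACE_DEPTH;
    unsigned first = trace_count - n;
    unsigned i;

    for (i = 0; i < n; i++) {
        const struct spl_trace_entry *e =
            &trace_log[(first + i) % TRACE_DEPTH];
        printf("%s at %s:%d, called from %p\n",
               e->func, e->file, e->line, e->caller);
    }
}
```

Without GDB you would look the printed address up in the linker map
file, or with something like addr2line on the ELF image, to find which
function took the lock last.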

    Andrew

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss