public inbox for ecos-discuss@sourceware.org
* [ECOS] network problem
From: Rick Davis @ 2007-08-30 23:09 UTC
  To: Ecos-Discuss

I have still been trying to track down the network failure I described in
previous discussions. After days of looking into the issue, I have noticed
that the "Misc mpool" loses 32 bytes every time I access a web page. The
pool eventually runs out of memory, and the network stack locks up waiting
for memory to be returned. Any ideas?
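For a rough sense of scale, the sketch below estimates how long a pool lasts when every page access leaks a fixed number of bytes. The 32-byte leak per access comes from this message; the pool size passed in is your own assumption, since the real size of the "Misc mpool" depends on the configuration. The helper names are hypothetical and nothing here queries eCos.

```c
/* Rough leak-exhaustion estimate. All inputs are caller-supplied
 * assumptions; nothing here reads the real eCos pools. */

/* Accesses until a pool of pool_bytes is used up when each access
 * leaks leak_bytes (leak_bytes must be non-zero). Integer division:
 * a final partial leak that no longer fits is not counted. */
unsigned long accesses_until_exhausted(unsigned long pool_bytes,
                                       unsigned long leak_bytes)
{
    return pool_bytes / leak_bytes;
}

/* Seconds until exhaustion when one access happens every
 * interval_s seconds (e.g. a status page that auto-refreshes). */
unsigned long seconds_until_exhausted(unsigned long pool_bytes,
                                      unsigned long leak_bytes,
                                      unsigned long interval_s)
{
    return accesses_until_exhausted(pool_bytes, leak_bytes) * interval_s;
}
```

For example, with a hypothetical 64 KiB pool and a page refreshed every 15 seconds, seconds_until_exhausted(65536, 32, 15) gives 30720 seconds, roughly 8.5 hours. The time scales linearly with the pool size, so a deliberately smaller pool reproduces the failure proportionally sooner.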

Thanks,
Rick Davis



-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* Re: [ECOS] network problem
From: Andrew Lunn @ 2007-09-01 12:32 UTC
  To: Rick Davis
  Cc: 'Andrew Lunn', 'John Mills', 'eCos Users'

> I am
> currently going to be using the latest eCos for the MPC85xx processor. A&M
> has a port for their Python board (MPC8541) but it hasn't been applied?

Probably because A&M have not contributed the port. 

> As for the zpool, a couple/few years ago there was a thread regarding the
> lack of locking and they added a mutex that was used to prevent pre-emption
> inside of zalloci and zfreei.

I found a mention of this. It does not look like a patch was
submitted. I've not looked in much detail, but it looks like there
could be a race here, so if somebody could submit a patch, that would
be great.

      Andrew


* RE: [ECOS] network problem
From: Rick Davis @ 2007-09-01 12:05 UTC
  To: 'Andrew Lunn', 'John Mills'; +Cc: 'eCos Users'

Andrew,

The issue I reported yesterday about bsd_accept was specific to my old
snapshot. The latest version of bsd_accept is good to go. I am using an
older eCos snapshot because of the modifications I had to make for
processors/platforms that aren't supported by eCos, and I lack the free
time to merge all of these changes into the latest snapshot. For the
MPC85xx processor, however, I am going to use the latest eCos. A&M has a
port for their Python board (MPC8541), but it hasn't been applied. Why not?

As for the zpool, a few years ago there was a thread about the lack of
locking, and the people in that thread added a mutex to prevent preemption
inside zalloci and zfreei.

Rick


* Re: [ECOS] network problem
From: Andrew Lunn @ 2007-09-01 10:38 UTC
  To: John Mills; +Cc: eCos Users, Rick Davis

Hi Folks

I've lost track of the different threads about memory leaks in the
network stack. It seems like one of the leaks being talked about here
was fixed a long time ago:

2003-07-28  Jay Foster  <jay@systech.com>

        * src/sys/kern/sockio.c:
        Fixed memory leak in accept() call.

Do we still need locking in socreate()? socreate calls soalloc. That
has a comment /* XXX race condition for reentrant kernel */. The
actual problem is in zalloci which does not perform locking on the
linked list of elements in the pool.
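To illustrate the race: taking an element off a free list means reading the list head and then writing the head's successor back, and if two threads interleave between those two steps they can both get the same element. The sketch below is not the eCos zalloci()/zfreei() code; it is a hypothetical fixed-size pool (zone_init, zone_alloc and zone_free are made-up names) that uses a POSIX mutex as a stand-in for whatever lock the stack would really use, to show which steps the lock has to cover.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed-size pool. Each free element stores the link to
 * the next free element in its first bytes (a common free-list layout). */
struct zone_elem { struct zone_elem *next; };

struct zone {
    pthread_mutex_t lock;          /* protects free_list and nfree */
    struct zone_elem *free_list;
    size_t nfree;
};

/* Carve buf into count elements of elem_size bytes and chain them.
 * elem_size must be at least sizeof(struct zone_elem) and buf must be
 * suitably aligned for a pointer. */
void zone_init(struct zone *z, void *buf, size_t elem_size, size_t count)
{
    char *p = buf;
    pthread_mutex_init(&z->lock, NULL);
    z->free_list = NULL;
    z->nfree = 0;
    for (size_t i = 0; i < count; i++) {
        struct zone_elem *e = (struct zone_elem *)(p + i * elem_size);
        e->next = z->free_list;
        z->free_list = e;
        z->nfree++;
    }
}

/* Pop one element. Reading free_list and writing free_list->next back
 * must happen under the same lock; without it, two callers can read the
 * same head and hand out one element twice. Returns NULL when empty. */
void *zone_alloc(struct zone *z)
{
    struct zone_elem *e;
    pthread_mutex_lock(&z->lock);
    e = z->free_list;
    if (e != NULL) {
        z->free_list = e->next;
        z->nfree--;
    }
    pthread_mutex_unlock(&z->lock);
    return e;
}

/* Push an element back onto the free list under the same lock. */
void zone_free(struct zone *z, void *ptr)
{
    struct zone_elem *e = ptr;
    pthread_mutex_lock(&z->lock);
    e->next = z->free_list;
    z->free_list = e;
    z->nfree++;
    pthread_mutex_unlock(&z->lock);
}
```

In the real stack the equivalent protection could come either from splnet() held by the callers, as in the socreate() change discussed in this thread, or from a lock inside the zone functions themselves.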

Are we re-entrant? socreate() is only called from bsd_socket().
bsd_socket() should only be called from socket(). socket() performs
locking, depending on what synchronisation protocol is in
use. bsd_tcpip has no synchronisation protocol, so two simultaneous
calls to socket() could result in a race!

Could somebody please submit a full patch for socreate.

Are there any other issues left?

    Thanks
        Andrew


* RE: [ECOS] network problem
From: John Mills @ 2007-08-29 19:12 UTC
  To: 'eCos Users'

Hello -

In response to (and in solidarity with?) Rick Davis's problems with
progressive deterioration of network communications: I had a
similar-sounding problem and traced it to a disconnect between
'tcp_input.c:tcp_input()' and 'uipc_socket.c:sofree()'. I am about to post
my patch on the 'ecos-patches' mailing list, which will be my first post
there. Please let me know if this is not an appropriate use of the
respective lists; I naturally welcome technical suggestions as well.

Thanks for your past help, and I look forward to future participation.

John Mills
AirDefense, Inc.
Alpharetta, GA



* RE: [ECOS] network problem
From: Rick Davis @ 2007-08-28 18:42 UTC
  To: 'John Mills', 'eCos Users'

John,

Thanks for the information. On my product it is also the web server that
is being pounded on and dying; I used telnet to verify the error. Do you
know if there is a way to return these "stale TCP connection records" to
the pool? Did you find a fix for the problem? I'm not a network stack
expert at all.

Thanks again,
Rick Davis



* RE: [ECOS] network problem
From: John Mills @ 2007-08-28 18:42 UTC
  To: eCos Users; +Cc: Rick Davis

Rick -

You could try drastically reducing the capacity of the pool you think
you're depleting. That way you might generate the lockup in a few minutes
or an hour, instead of 12 hours.

 - John Mills

-- 
 - John Mills
   john.m.mills@alum.mit.edu



* RE: [ECOS] network problem
From: John Mills @ 2007-08-28 18:10 UTC
  To: eCos Users; +Cc: Rick Davis

Rick -

I have just run to ground a problem with very similar symptoms. It turned
out that the 'socket' pool ("zone") was depleted by unrecoverable, stale
TCP connection records. I tracked this down by adding a counter for data
structures allocated from and freed to that pool, with diagnostic
printouts of the count as sockets were allocated or freed. Though we
noticed the problem with web inquiries, it turned out to have other
effects, like your inability to open a 'telnet' connection.

As I understand eCos 'zones', each is initially allocated a fixed memory
block based on the size of a specific data structure and the number of
such structures it is expected to provide. Thus a simple counter will
show where you stand with respect to a particular pool's capacity, and
you probably don't have to dig into the zone alloc/dealloc mechanism.
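The counter idea can be sketched as a small bookkeeping structure kept alongside a pool: count every successful allocation and every free, and the difference is the number of elements still in use. The structure and function names below are hypothetical; in the stack the hooks would be called wherever the socket zone is allocated from and freed to, under the same lock that protects the pool, and the values would be printed with diag_printf().

```c
#include <stddef.h>

/* Hypothetical bookkeeping for one fixed-capacity pool. */
struct pool_stats {
    size_t capacity;    /* number of elements the pool was created with */
    size_t allocs;      /* successful allocations so far */
    size_t frees;       /* frees so far */
    size_t high_water;  /* largest number in use at any one time */
};

/* Elements currently handed out and not yet returned. */
size_t stats_in_use(const struct pool_stats *s)
{
    return s->allocs - s->frees;
}

/* Call after every successful allocation from the pool. */
void stats_on_alloc(struct pool_stats *s)
{
    s->allocs++;
    if (stats_in_use(s) > s->high_water)
        s->high_water = stats_in_use(s);
}

/* Call after every free back to the pool. */
void stats_on_free(struct pool_stats *s)
{
    s->frees++;
}

/* Elements still available. If this keeps shrinking under a steady
 * load (e.g. a page refreshed at a fixed rate), something is
 * allocating without freeing. */
size_t stats_remaining(const struct pool_stats *s)
{
    return s->capacity - stats_in_use(s);
}
```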

HTH.

 - John Mills 

DISCLAIMER: I'm a relative beginner with eCos.


* RE: [ECOS] network problem
From: Rick Davis @ 2007-08-28 16:29 UTC
  To: 'Andrew Lunn'; +Cc: ecos-discuss

Andrew,

After I sent the e-mail this morning, it stopped working in another way:
HTTP stopped responding but pings still worked. I have a simple telnet
server in my application, and it was failing to bind with "Try again
later". in_pcbbind was failing because in_pcbinshash was failing,
indicating it couldn't MALLOC memory. I turned on fancy asserts and
tracing and am testing again; it usually takes 12 hours or more to fail.
in_pcbhash MALLOCs from the network pool, so I am monitoring that. Any
ideas why the network pool would run out of memory? Can it get fragmented?

Thanks,
Rick Davis



* Re: [ECOS] network problem
From: Andrew Lunn @ 2007-08-28 16:09 UTC
  To: Rick Davis; +Cc: ecos-discuss

> I guess I need to submit a patch because this issue is still in the latest
> eCos repository which I am getting ready to use the latest eCos for a new
> project/processor.

Please do submit a patch. 

       Andrew


* RE: [ECOS] network problem
From: Rick Davis @ 2007-08-28  9:00 UTC
  To: 'Andrew Lunn'; +Cc: ecos-discuss

Andrew,

I am using a snapshot from the 2005 era. Just after I sent my e-mail I went
through the archives and found a thread from 16-Nov-2005 with the subject

Possible sockets/fd race condition.

I did what they did in socreate in uipc_socket.c, and it appears to have
fixed my problem. The latest eCos repository does not contain this fix.
In socreate() I added a call to splnet and the matching calls to splx.
This affects both
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c and
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c.

I guess I need to submit a patch, because this issue is still in the
latest eCos repository, which I am getting ready to use for a new
project/processor.

Below is the new socreate function in
net/bsd_tcpip/current/src/sys/kern/uipc_socket.c

int
socreate(dom, aso, type, proto, p)
	int dom;
	struct socket **aso;
	register int type;
	int proto;
	struct proc *p;
{
	register struct protosw *prp;
	register struct socket *so;
	register int error;
	/* Hold off the rest of the network stack for the whole function;
	 * every return path below must restore the level with splx(s). */
	int s = splnet();

	if (proto)
		prp = pffindproto(dom, proto, type);
	else
		prp = pffindtype(dom, type);

	if (prp == 0 || prp->pr_usrreqs->pru_attach == 0) {
		splx(s);
		return (EPROTONOSUPPORT);
	}
	if (prp->pr_type != type) {
		splx(s);
		return (EPROTOTYPE);
	}
	/* soalloc() takes an element from the socket zone; holding splnet
	 * here is intended to stop two callers racing in zalloci(). */
	so = soalloc(p != 0);
	if (so == 0) {
		splx(s);
		return (ENOBUFS);
	}

	TAILQ_INIT(&so->so_incomp);
	TAILQ_INIT(&so->so_comp);
	so->so_type = type;
	so->so_proto = prp;
	error = (*prp->pr_usrreqs->pru_attach)(so, proto, p);
	if (error) {
		/* Attach failed: release the new socket before returning. */
		so->so_state |= SS_NOFDREF;
		sofree(so);
		splx(s);
		return (error);
	}
	*aso = so;
	splx(s);
	return (0);
}
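Every early return in this version has to remember its own splx(s); a missed one would leave the stack at SPL_NET with the lock still held. A common alternative in BSD-derived code is a single exit path. The sketch below shows that shape with stand-in fake_splnet()/fake_splx() functions that only track an integer level; they, create_thing() and its flags are hypothetical and are not the eCos spl implementation.

```c
#include <errno.h>

/* Stand-ins for the BSD spl calls: one global "priority" level.
 * fake_splnet() raises it and returns the previous level;
 * fake_splx() restores a saved level. */
#define FAKE_SPL_NET 4
static int spl_level = 0;

int fake_splnet(void)
{
    int old = spl_level;
    if (spl_level < FAKE_SPL_NET)
        spl_level = FAKE_SPL_NET;
    return old;
}

void fake_splx(int s)
{
    spl_level = s;
}

int current_spl(void)
{
    return spl_level;
}

/* Hypothetical create routine with several failure points. Every path
 * leaves through the "out" label, so the saved level is restored
 * exactly once whichever check fails. */
int create_thing(int proto_ok, int type_ok, int alloc_ok)
{
    int error = 0;
    int s = fake_splnet();

    if (!proto_ok) {
        error = EPROTONOSUPPORT;
        goto out;
    }
    if (!type_ok) {
        error = EPROTOTYPE;
        goto out;
    }
    if (!alloc_ok) {
        error = ENOBUFS;
        goto out;
    }
    /* ... attach and initialise here ... */
out:
    fake_splx(s);
    return error;
}
```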

Thanks for your response,
Rick Davis


* Re: [ECOS] network problem
From: Andrew Lunn @ 2007-08-27  8:12 UTC
  To: Rick Davis; +Cc: ecos-discuss

On Mon, Aug 27, 2007 at 02:48:42AM -0400, Rick Davis wrote:
> I have a device using the MPC859T processor that has a small web server
> running using the standard eCos web server. I have a status page that
> auto-refreshes every 15 seconds and I am pinging the unit every second (Yes,
> I have a customer that is actually doing this). I don't really know what
> other network activity is occurring at the customer's site but my test lab
> has Windows network chatter going on. After about 12 or so hours the web
> stops responding and the unit can no longer be pinged. The FEC Ethernet
> driver is receiving packets and is calling the eth_drv_dsr but the deliver
> function is never called.
> 
> I have been tracking this down for some time and have noticed the
> following...
> 
> 1. The alarm thread in timeout.c is getting blocked when calling
> splx_internal() just before the call to eth_drv_run_deliveries().
> 2. The current value of spl_state in sync.c is 4 (SPL_NET)
> 
> Any ideas why the network would not release the splx_mutex?
> Any suggestion on how to further track this down?
> I don't have a GDB interface on my platform. :(

What vintage of eCos are you using? If you go back far enough into the
mists of time, there was at least one bug fix for alarms, but that was
a long time ago.

Do you have asserts enabled? They might give some clues.

You could also enable CYGIMPL_TRACE_SPLX and call show_sched_events()
when you hit the deadlock. That should tell you which function is
holding the mutex. You might want to add __builtin_return_address(0)
to the log structure, so you can see one more level up the call
stack. Otherwise I think you will just get spi_slpnet, which is
not much use.
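The logging idea can be sketched independently of eCos: lock and unlock wrappers write __builtin_return_address(0) into a small ring buffer, so each entry names the caller rather than the wrapper. This assumes GCC or a compatible compiler (for __builtin_return_address and the noinline attribute); trace_lock(), trace_unlock(), trace_last() and the entry layout are hypothetical and are not the CYGIMPL_TRACE_SPLX code or show_sched_events().

```c
#include <stddef.h>

#define TRACE_LEN 16  /* entries kept; older ones are overwritten */

struct trace_entry {
    void *caller;  /* return address of whoever called the wrapper */
    int   op;      /* 1 = lock taken, 0 = lock released */
};

static struct trace_entry trace_log[TRACE_LEN];
static size_t trace_next;  /* total number of entries ever written */

static void trace_record(void *caller, int op)
{
    struct trace_entry *e = &trace_log[trace_next % TRACE_LEN];
    e->caller = caller;
    e->op = op;
    trace_next++;
}

/* noinline keeps the wrappers as real calls, so
 * __builtin_return_address(0) yields the caller's address. */
__attribute__((noinline)) void trace_lock(void)
{
    trace_record(__builtin_return_address(0), 1);
    /* ... take the real lock here ... */
}

__attribute__((noinline)) void trace_unlock(void)
{
    /* ... release the real lock here ... */
    trace_record(__builtin_return_address(0), 0);
}

/* Most recent entry, or NULL if nothing has been logged yet. */
const struct trace_entry *trace_last(void)
{
    if (trace_next == 0)
        return NULL;
    return &trace_log[(trace_next - 1) % TRACE_LEN];
}
```

After a hang, the most recent "lock taken" entry without a later release points at the code still holding the lock; the addresses can be mapped back to functions with the linker map or addr2line.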

    Andrew


* [ECOS] network problem
From: Rick Davis @ 2007-08-27  6:48 UTC
  To: ecos-discuss

I have a device using the MPC859T processor that runs a small web server
built on the standard eCos web server. I have a status page that
auto-refreshes every 15 seconds, and I am pinging the unit every second
(yes, I have a customer that is actually doing this). I don't really know
what other network activity is occurring at the customer's site, but my
test lab has Windows network chatter going on. After about 12 hours the
web server stops responding and the unit can no longer be pinged. The FEC
Ethernet driver is receiving packets and is calling eth_drv_dsr, but the
deliver function is never called.

I have been tracking this down for some time and have noticed the
following...

1. The alarm thread in timeout.c is getting blocked when calling
splx_internal() just before the call to eth_drv_run_deliveries().
2. The current value of spl_state in sync.c is 4 (SPL_NET)

Any ideas why the network would not release the splx_mutex?
Any suggestion on how to further track this down?
I don't have a GDB interface on my platform. :(

Thanks in advance,
Rick Davis




* Re: [ECOS] : Network problem
From: Gary Thomas @ 2005-05-16  5:13 UTC
  To: vamshi; +Cc: eCos Discussion

On Sat, 2005-05-14 at 12:22 +0530, vamshi@cse.iitb.ac.in wrote:
> We are writing an application that needs two ethernet cards.
> Our target platform is i386pc .
> 
> When we try to initialize the ethernet cards, we get the folowing error.
> 
> 
> [root@rclab5 try2]# ./a.out -t default.tdf
> [cyg_net_init] Init: mbinit(0x00000000)
> [cyg_net_init] Init: cyg_net_init_devs(0x00000000)
> Init device 'synth_eth1'
> Init device 'synth_eth0'
> [cyg_net_init] Init: loopattach(0x00000000)
> [cyg_net_init] Init: ifinit(0x00000000)
> [cyg_net_init] Init: domaininit(0x00000000)
> [cyg_net_init] Init: cyg_net_add_domain(0x020025c0)
> New domain internet at 0x00000000
> [cyg_net_init] Init: cyg_net_add_domain(0x02001ee0)
> New domain route at 0x00000000
> [cyg_net_init] Init: call_route_init(0x00000000)
> [cyg_net_init] Done
> Start PING test
> [eth_drv_ioctl] Warning: Driver can't set multi-cast mode
> [eth_drv_ioctl] Warning: Driver can't set multi-cast mode
> 

I don't see any error here. The warning about multi-cast is just
a warning and should not affect operation (unless you are trying
to use IPv6).

Since you're obviously trying to use the synthetic Linux target,
does your I/O adapter [the auxiliary program that runs on Linux and
provides network access] know about both interfaces?

Have you made sure that you understand the process with a single
network interface? The manual has a good explanation of how to
run programs on the synthetic target, how to set up the auxiliary,
etc. Run some of the standard network test programs under this
scenario, then proceed to getting two networks running.

-- 
------------------------------------------------------------
Gary Thomas                 |  Consulting for the
MLB Associates              |    Embedded world
------------------------------------------------------------



* [ECOS] : Network problem
From: vamshi @ 2005-05-14 19:55 UTC
  To: ecos-discuss


We are writing an application that needs two Ethernet cards.
Our target platform is i386pc.

When we try to initialize the Ethernet cards, we get the following error.


[root@rclab5 try2]# ./a.out -t default.tdf
[cyg_net_init] Init: mbinit(0x00000000)
[cyg_net_init] Init: cyg_net_init_devs(0x00000000)
Init device 'synth_eth1'
Init device 'synth_eth0'
[cyg_net_init] Init: loopattach(0x00000000)
[cyg_net_init] Init: ifinit(0x00000000)
[cyg_net_init] Init: domaininit(0x00000000)
[cyg_net_init] Init: cyg_net_add_domain(0x020025c0)
New domain internet at 0x00000000
[cyg_net_init] Init: cyg_net_add_domain(0x02001ee0)
New domain route at 0x00000000
[cyg_net_init] Init: call_route_init(0x00000000)
[cyg_net_init] Done
Start PING test
[eth_drv_ioctl] Warning: Driver can't set multi-cast mode
[eth_drv_ioctl] Warning: Driver can't set multi-cast mode



Any assistance would be greatly appreciated.


This is our source code:


// PING test code

#include <network.h>
#ifdef CYGPKG_NET_INET6
#include <netinet/ip6.h>
#include <netinet/icmp6.h>
#endif

#include <pkgconf/system.h>
#include <pkgconf/net.h>

#include <cyg/infra/testcase.h>

#ifdef CYGBLD_DEVS_ETH_DEVICE_H    // Get the device config if it exists
#include CYGBLD_DEVS_ETH_DEVICE_H  // May provide CYGTST_DEVS_ETH_TEST_NET_REALTIME
#endif

#ifdef CYGPKG_NET_TESTS_USE_RT_TEST_HARNESS // do we use the rt test?
# ifdef CYGTST_DEVS_ETH_TEST_NET_REALTIME // Get the test ancilla if it exists
#  include CYGTST_DEVS_ETH_TEST_NET_REALTIME
# endif
#endif

// Fill in the blanks if necessary
#ifndef TNR_OFF
# define TNR_OFF()
#endif
#ifndef TNR_ON
# define TNR_ON()
#endif
#ifndef TNR_INIT
# define TNR_INIT()
#endif
#ifndef TNR_PRINT_ACTIVITY
# define TNR_PRINT_ACTIVITY()
#endif



#ifndef CYGPKG_LIBC_STDIO
#define perror(s) diag_printf(#s ": %s\n", strerror(errno))
#endif

#define STACK_SIZE (CYGNUM_HAL_STACK_SIZE_TYPICAL + 0x1000)
static char stack[STACK_SIZE];
static cyg_thread thread_data;
static cyg_handle_t thread_handle;

#define NUM_PINGS 16
#define MAX_PACKET 4096
#define MIN_PACKET   64
#define MAX_SEND   4000

#define PACKET_ADD  ((MAX_SEND - MIN_PACKET)/NUM_PINGS)
#define nPACKET_ADD  1

static unsigned char pkt1[MAX_PACKET], pkt2[MAX_PACKET];

#define UNIQUEID 0x1234

void
pexit(char *s)
{
    CYG_TEST_FAIL_FINISH(s);
}

// Compute INET checksum
int
inet_cksum(u_short *addr, int len)
{
    register int nleft = len;
    register u_short *w = addr;
    register u_short answer;
    register u_int sum = 0;
    u_short odd_byte = 0;

    /*
     *  Our algorithm is simple, using a 32 bit accumulator (sum),
     *  we add sequential 16 bit words to it, and at the end, fold
     *  back all the carry bits from the top 16 bits into the lower
     *  16 bits.
     */
    while( nleft > 1 )  {
        sum += *w++;
        nleft -= 2;
    }

    /* mop up an odd byte, if necessary */
    if( nleft == 1 ) {
        *(u_char *)(&odd_byte) = *(u_char *)w;
        sum += odd_byte;
    }

    /*
     * add back carry outs from top 16 bits to low 16 bits
     */
    sum = (sum >> 16) + (sum & 0x0000ffff); /* add hi 16 to low 16 */
    sum += (sum >> 16);                     /* add carry */
    answer = ~sum;                          /* truncate to 16 bits */
    return (answer);
}

static int
show_icmp(unsigned char *pkt, int len,
          struct sockaddr_in *from, struct sockaddr_in *to)
{
    cyg_tick_count_t *tp, tv;
    struct ip *ip;
    struct icmp *icmp;
    tv = cyg_current_time();
    ip = (struct ip *)pkt;
    if ((len < sizeof(*ip)) || ip->ip_v != IPVERSION) {
        diag_printf("%s: Short packet or not IP! - Len: %d, Version: %d\n",
                    inet_ntoa(from->sin_addr), len, ip->ip_v);
        return 0;
    }
    icmp = (struct icmp *)(pkt + sizeof(*ip));
    len -= (sizeof(*ip) + 8);
    tp = (cyg_tick_count_t *)&icmp->icmp_data;
    if (icmp->icmp_type != ICMP_ECHOREPLY) {
        diag_printf("%s: Invalid ICMP - type: %d\n",
                    inet_ntoa(from->sin_addr), icmp->icmp_type);
        return 0;
    }
    if (icmp->icmp_id != UNIQUEID) {
        diag_printf("%s: ICMP received for wrong id - sent: %x, recvd: %x\n",
                    inet_ntoa(from->sin_addr), UNIQUEID, icmp->icmp_id);
    }
    diag_printf("%d bytes from %s: ", len, inet_ntoa(from->sin_addr));
    diag_printf("icmp_seq=%d", icmp->icmp_seq);
    diag_printf(", time=%dms\n", (int)(tv - *tp)*10);  // assumes a 10 ms system tick
    return (from->sin_addr.s_addr == to->sin_addr.s_addr);
}

static void
ping_host(int s, struct sockaddr_in *host)
{
    struct icmp *icmp = (struct icmp *)pkt1;
    int icmp_len = MIN_PACKET;
    int seq, ok_recv, bogus_recv;
    cyg_tick_count_t *tp;
    long *dp;
    struct sockaddr_in from;
    int i, len, fromlen;

    ok_recv = 0;
    bogus_recv = 0;
    diag_printf("PING server(h) %s\n", inet_ntoa(host->sin_addr));
    for (seq = 0;  seq < NUM_PINGS;  seq++, icmp_len += PACKET_ADD ) {
        // TNR_ON();
        // Build ICMP packet
        icmp->icmp_type = ICMP_ECHO;
        icmp->icmp_code = 0;
        icmp->icmp_cksum = 0;
        icmp->icmp_seq = seq;
        icmp->icmp_id = 0x1234;
        // Set up ping data
        tp = (cyg_tick_count_t *)&icmp->icmp_data;
        *tp++ = cyg_current_time();
        dp = (long *)tp;
        for (i = sizeof(*tp);  i < icmp_len;  i += sizeof(*dp)) {
            *dp++ = i;
        }
        // Add checksum
        icmp->icmp_cksum = inet_cksum( (u_short *)icmp, icmp_len+8);
        // Send it off
        if (sendto(s, icmp, icmp_len+8, 0, (struct sockaddr *)host,
                   sizeof(*host)) < 0) {
            // TNR_OFF();
            perror("sendto");
            continue;
        }
        // Wait for a response
        fromlen = sizeof(from);
        len = recvfrom(s, pkt2, sizeof(pkt2), 0, (struct sockaddr *)&from,
                       &fromlen);
        TNR_OFF();
        if (len < 0) {
            perror("recvfrom");
            icmp_len = MIN_PACKET - PACKET_ADD; // just in case - long routes
        } else {
            if (show_icmp(pkt2, len, &from, host)) {
                ok_recv++;
            } else {
                bogus_recv++;
            }
        }
    }
    // TNR_OFF();
    diag_printf("Sent %d packets, received %d OK, %d bad\n", NUM_PINGS,
                ok_recv, bogus_recv);
}


#ifdef CYGPKG_PROFILE_GPROF
#include <cyg/profile/profile.h>

extern char _stext, _etext;  // Defined by the linker

static void
start_profile(void)
{
    // This starts up the system-wide profiling, gathering
    // profile information on all of the code, with a 16 byte
    // "bucket" size, at a rate of 100us/profile hit.
    // Note: a bucket size of 16 will give pretty good function
    //       resolution.  Much smaller and the buffer becomes
    //       much too large for very little gain.
    // Note: a timer period of 100us is also a reasonable
    //       compromise.  Any smaller and the overhead of
    //       handling the timer (profile) interrupt could
    //       swamp the system.  A fast processor might get
    //       by with a smaller value, but a slow one could
    //       even be swamped by this value.  If the value is
    //       too large, the usefulness of the profile is reduced.
    profile_on(&_stext, &_etext, 16, 100);
}
#endif

static void
ping_test(struct bootp *bp)
{
    struct protoent *p;
    struct timeval tv;
    struct sockaddr_in host;
    int s;

    if ((p = getprotobyname("icmp")) == (struct protoent *)0) {
        pexit("getprotobyname");
        return;
    }
    s = socket(AF_INET, SOCK_RAW, p->p_proto);
    if (s < 0) {
        pexit("socket");
        return;
    }
    tv.tv_sec = 1;
    tv.tv_usec = 0;
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    // Set up host address
    host.sin_family = AF_INET;
    host.sin_len = sizeof(host);
    host.sin_addr = bp->bp_siaddr;
    host.sin_port = 0;
    ping_host(s, &host);
    // Now try a bogus host
    host.sin_addr.s_addr = htonl(ntohl(host.sin_addr.s_addr) + 32);
    ping_host(s, &host);
}

void
net_test(cyg_addrword_t p)
{
    diag_printf("Start PING test\n");

//    TNR_INIT();
    init_all_network_interfaces();

#ifdef CYGPKG_PROFILE_GPROF
    start_profile();
#endif


#ifdef CYGHWR_NET_DRIVER_ETH1
    if (eth1_up) {
        ping_test(&eth1_bootp_data);
    }
#endif

    // TNR_PRINT_ACTIVITY();
    //CYG_TEST_PASS_FINISH("Ping test OK");

}

void
cyg_start(void)
{
    // Create a main thread, so we can run the scheduler and have time 'pass'
    cyg_thread_create(10,                // Priority - just a number
                      net_test,          // entry
                      0,                 // entry parameter
                      "Network test",    // Name
                      &stack[0],         // Stack
                      STACK_SIZE,        // Size
                      &thread_handle,    // Handle
                      &thread_data       // Thread data structure
            );
    cyg_thread_resume(thread_handle);  // Start it
    cyg_scheduler_start();
}
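The carry-folding lines at the end of inet_cksum() implement the standard Internet checksum: sum the data as 16-bit words, fold every carry out of the top 16 bits back into the low 16 bits, then take the one's complement. The host-side sketch below shows the same arithmetic. The name `internet_checksum` is hypothetical, and unlike inet_cksum() it assembles each word in network (big-endian) byte order, so the returned integer is the checksum as it appears on the wire, whatever the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Internet checksum over a byte buffer.
 * Words are assembled in network (big-endian) order, so the result
 * is the value that would be placed in the packet. */
static uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add up the data as 16-bit big-endian words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* Mop up an odd byte: it becomes the high byte of a final word
     * whose low byte is zero. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold carries from the top 16 bits into the low 16 bits until
     * none remain (inet_cksum() does this in two fixed steps and then
     * relies on truncation to 16 bits). */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);

    /* One's complement of the folded sum, truncated to 16 bits. */
    return (uint16_t)~sum;
}
```

For example, the bytes 00 01 f2 03 f4 f5 f6 f7 add up to 0x2ddf0, which folds to 0x0002 + 0xddf0 = 0xddf2, so the checksum is ~0xddf2 = 0x220d. Appending the two checksum bytes 22 0d and recomputing gives 0, which is how a receiver verifies the packet.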

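show_icmp() reports the round-trip time as `(tv - *tp)*10`, which is only right when one tick of the real-time clock lasts 10 ms. A minimal sketch of a tick-rate-independent conversion follows, assuming the clock resolution is available as a dividend/divisor pair giving nanoseconds per tick, which is how the eCos kernel's `cyg_clock_get_resolution()` describes the clock returned by `cyg_real_time_clock()` (check the kernel documentation for your release). The helper name `ticks_to_ms` is hypothetical, and it takes the resolution as plain parameters so it can be tried on a host.

```c
#include <stdint.h>

/* Hypothetical helper: convert a tick count to milliseconds, given the
 * clock resolution as the fraction dividend/divisor = ns per tick.
 * Assumption: on an eCos target these two values would be read once
 * from the real-time clock's resolution; here they are parameters. */
static uint64_t ticks_to_ms(uint64_t ticks, uint32_t dividend, uint32_t divisor)
{
    /* Multiply before dividing to keep precision; 64-bit arithmetic
     * avoids overflow for any realistic round-trip tick count. */
    uint64_t ns = ticks * dividend / divisor;
    return ns / 1000000u;   /* 1 ms = 1,000,000 ns */
}
```

With a 100 Hz clock (10 ms per tick) this reproduces the existing `*10`, while on a 1 kHz clock it reports 1 ms per tick instead of 10. In show_icmp(), the last diag_printf() would then print `(int)ticks_to_ms(tv - *tp, ...)` with the resolution values read at start-up, rather than multiplying by 10.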
* Re: [ECOS] network problem
  2002-01-11  8:40 ` Andrew Lunn
@ 2002-01-11  9:04   ` Andrea Acquaviva
  0 siblings, 0 replies; 18+ messages in thread
From: Andrea Acquaviva @ 2002-01-11  9:04 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: ecos-discuss

[-- Attachment #1: Type: text/plain, Size: 2151 bytes --]

Andrew Lunn wrote:

> On Fri, Jan 11, 2002 at 05:27:33PM +0100, Andrea Acquaviva wrote:
> > Hi,
> >
> > I found a problem while using the CF Ethernet interface on the Assabet
> > board.
> > After the network initialization performed by init_all_network_interfaces(),
> > I put the program into an idle loop (while(1)) and try to ping the
> > interface. The interface replies to the ping requests for a certain amount
> > of time and then hangs.
> > The strange thing is that this amount of time increases if I add some
> > debugging output when packets are received.
> >
> > Can anyone suggest an explanation?
>
> Remember that eCos is an RTOS. A high priority thread which is
> runnable will always be run instead of a low priority thread. The
> network stack is implemented as threads as well. So if your endless
> loop is running at a higher priority than the network stack, don't
> expect the network stack to work.

I used the endless loop only for debugging. I first ran into this problem
with a client (eCos) / server (Linux) application, which fails after a
certain number of successful transfers.

>
>
> Now the strange thing. You say it works for a while. That I don't
> understand. It should work, or it should not work. What are you
> actually pinging: the application stack or the RedBoot stack? The
> RedBoot stack may keep working under these conditions since it's not
> thread-based. The debug output would also help, since it gives RedBoot
> more time to actually process network traffic for it.

I don't use RedBoot; I use gdb as a ROM monitor.

>
>
> Also, some network drivers have a low-priority tickle thread. This
> thread is used to recover from hardware errors in some Ethernet
> devices. They lock up under some conditions, and the tickle thread will
> bring them back to life. Maybe your endless loop is preventing this
> tickle thread from running, and so you are seeing the hardware error.
>
>        Andrew

I attached the client, which is very simple.
Thanks a lot for your help!

Andrea.

--
Ing. Andrea Acquaviva
D.E.I.S. - Universita' di Bologna
V.le Risorgimento, 2    40136 BOLOGNA (ITALY)
Tel: (+39) 051 20 93787 Fax: (+39) 051 2093786



[-- Attachment #2: ECOS_client.c --]
[-- Type: image/x-xbitmap, Size: 2729 bytes --]

* Re: [ECOS] network problem
  2002-01-11  8:30 [ECOS] network problem Andrea Acquaviva
@ 2002-01-11  8:40 ` Andrew Lunn
  2002-01-11  9:04   ` Andrea Acquaviva
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2002-01-11  8:40 UTC (permalink / raw)
  To: Andrea Acquaviva; +Cc: eCos Disuss

On Fri, Jan 11, 2002 at 05:27:33PM +0100, Andrea Acquaviva wrote:
> Hi,
> 
> I found a problem while using the CF Ethernet interface on the Assabet
> board.
> After the network initialization performed by init_all_network_interfaces(),
> I put the program into an idle loop (while(1)) and try to ping the
> interface. The interface replies to the ping requests for a certain amount
> of time and then hangs.
> The strange thing is that this amount of time increases if I add some
> debugging output when packets are received.
>
> Can anyone suggest an explanation?

Remember that eCos is an RTOS. A high priority thread which is
runnable will always be run instead of a low priority thread. The
network stack is implemented as threads as well. So if your endless
loop is running at a higher priority than the network stack, don't
expect the network stack to work.

Now the strange thing. You say it works for a while. That I don't
understand. It should work, or it should not work. What are you
actually pinging: the application stack or the RedBoot stack? The
RedBoot stack may keep working under these conditions since it's not
thread-based. The debug output would also help, since it gives RedBoot
more time to actually process network traffic for it.

Also, some network drivers have a low-priority tickle thread. This
thread is used to recover from hardware errors in some Ethernet
devices. They lock up under some conditions, and the tickle thread will
bring them back to life. Maybe your endless loop is preventing this
tickle thread from running, and so you are seeing the hardware error.

       Andrew
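Andrew's first point follows from strict priority scheduling: on every tick the highest-priority runnable thread gets the CPU, and in eCos a lower number means a higher priority. The sketch below is not eCos code; it is a toy host-side model with hypothetical priorities (5 for the application thread, 7 for a network thread that wants the CPU every tick). It shows a `while(1)` loop starving the network thread completely, while the same application thread that blocks for one tick per iteration lets it run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not eCos code) of strict preemptive priority scheduling.
 * As in eCos, a LOWER priority number means a MORE urgent thread. */
typedef struct {
    const char *name;
    int  priority;     /* lower value = higher priority */
    bool busy_loops;   /* true: spins like while(1), never blocks */
    int  sleep_ticks;  /* ticks to block after each run, if !busy_loops */
    int  wake_tick;    /* first tick at which the thread is runnable */
    int  runs;         /* number of ticks this thread was given the CPU */
} sim_thread;

/* On each tick, give the CPU to the highest-priority runnable thread. */
static void simulate(sim_thread *t, size_t n, int n_ticks)
{
    for (int tick = 0; tick < n_ticks; tick++) {
        sim_thread *best = NULL;

        for (size_t i = 0; i < n; i++) {
            bool runnable = t[i].wake_tick <= tick;
            if (runnable && (best == NULL || t[i].priority < best->priority))
                best = &t[i];
        }
        if (best == NULL)
            continue;                 /* every thread is blocked: idle tick */

        best->runs++;
        if (!best->busy_loops)        /* block, as a delay call would */
            best->wake_tick = tick + 1 + best->sleep_ticks;
    }
}
```

Over 100 ticks the busy application thread gets all 100 and the network thread none; with a one-tick sleep per iteration they alternate, 50 each. On a real target the corresponding fix is to make the idle loop block, for example by calling `cyg_thread_delay()` inside it, or to create the application thread at a lower priority (a larger number) than the network threads, whose priorities are set in the stack's configuration. `cyg_thread_yield()` would not be enough, because it only hands the CPU to other runnable threads at the same priority.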

* [ECOS] network problem
@ 2002-01-11  8:30 Andrea Acquaviva
  2002-01-11  8:40 ` Andrew Lunn
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Acquaviva @ 2002-01-11  8:30 UTC (permalink / raw)
  To: ecos-discuss

Hi,

I found a problem while using the CF Ethernet interface on the Assabet
board.
After the network initialization performed by init_all_network_interfaces(),
I put the program into an idle loop (while(1)) and try to ping the
interface. The interface replies to the ping requests for a certain amount
of time and then hangs.
The strange thing is that this amount of time increases if I add some
debugging output when packets are received.

Can anyone suggest an explanation?

Thanks a lot,
Andrea.


--
Ing. Andrea Acquaviva
D.E.I.S. - Universita' di Bologna
V.le Risorgimento, 2    40136 BOLOGNA (ITALY)
Tel: (+39) 051 20 93787 Fax: (+39) 051 2093786



end of thread

Thread overview: 18+ messages
2007-08-30 23:09 [ECOS] network problem Rick Davis
  -- strict thread matches above, loose matches on Subject: below --
2007-08-27  6:48 Rick Davis
2007-08-27  8:12 ` Andrew Lunn
2007-08-28  9:00   ` Rick Davis
2007-08-28 16:09     ` Andrew Lunn
2007-08-28 16:29       ` Rick Davis
2007-08-28 18:10         ` John Mills
2007-08-28 18:42           ` John Mills
2007-08-28 18:42           ` Rick Davis
2007-08-29 19:12             ` John Mills
2007-09-01 10:38           ` Andrew Lunn
2007-09-01 12:05             ` Rick Davis
2007-09-01 12:32               ` Andrew Lunn
2005-05-14 19:55 [ECOS] : Network problem vamshi
2005-05-16  5:13 ` Gary Thomas
2002-01-11  8:30 [ECOS] network problem Andrea Acquaviva
2002-01-11  8:40 ` Andrew Lunn
2002-01-11  9:04   ` Andrea Acquaviva
