[ECOS] eth_recv out of MBUFs

public inbox for ecos-discuss@sourceware.org

* [ECOS] eth_recv out of MBUFs
@ 2011-07-20 19:16 Stanislav Meduna
  2011-07-20 21:27 ` Stanislav Meduna
  2011-08-05  7:52 ` Lambrecht Jürgen
  0 siblings, 2 replies; 20+ messages in thread
From: Stanislav Meduna @ 2011-07-20 19:16 UTC (permalink / raw)
  To: ecos-discuss

Hi,

I am quite reproducibly able to produce

  eth_recv out of MBUFs

in the FreeBSD stack. The setup is two incoming TCP connections
sending data, a UDP socket sending data, pinging the device,
closing and reopening the TCP connections a few times, and
unplugging and replugging the Ethernet cable.

The device then never recovers. The partner sends ARP requests
with no response from eCos, as there is no mbuf to accept them.
eCos never sends anything. eth_drv_tickle_devices runs, but
finds IF_IS_EMPTY(&ifp->if_snd) true and never touches
the interface. Maybe it wants to ARP first, but was unable
to send the request - I don't know. Unless I am blind,
I am calling tx_done for every packet I got in send
(but I am going to double-check this).

The mbufs as shown by cyg_net_show_mbufs are full of DATA
with short size (60 or 64) and flags 2 (M_PKTHDR).

Has anyone seen something like that? I am not very familiar
with TCP/IP stacks in general, so it is quite difficult
for me to debug something there. Are there constraints
of the "mbuf space has to be larger than the TCP window of active
connections" type or something like that?

I am reluctant to try to "fix" it by enlarging the space for mbufs,
as I think this should never happen regardless of the mbuf space, and I am
also on a quite memory-constrained device where every 100 kB
matters.

Thanks for any hints
-- 
                                       Stano

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* Re: [ECOS] eth_recv out of MBUFs
  2011-07-20 19:16 [ECOS] eth_recv out of MBUFs Stanislav Meduna
@ 2011-07-20 21:27 ` Stanislav Meduna
  2011-08-05  7:52 ` Lambrecht Jürgen
  1 sibling, 0 replies; 20+ messages in thread
From: Stanislav Meduna @ 2011-07-20 21:27 UTC (permalink / raw)
  To: ecos-discuss

On 20.07.2011 21:15, Stanislav Meduna wrote:

> The device then never recovers.

Okay - it does recover if left sitting for minutes (or tens
of minutes) with the cable unplugged. So this is not
an unrecoverable leak; the mbufs will eventually get
drained.

Well, time to reproduce it again and look at _what_ is stuck
in those buffers...

-- 
                                    Stano


* Re: [ECOS] eth_recv out of MBUFs
  2011-07-20 19:16 [ECOS] eth_recv out of MBUFs Stanislav Meduna
  2011-07-20 21:27 ` Stanislav Meduna
@ 2011-08-05  7:52 ` Lambrecht Jürgen
  2011-08-09  6:37   ` Stanislav Meduna
  1 sibling, 1 reply; 20+ messages in thread
From: Lambrecht Jürgen @ 2011-08-05  7:52 UTC (permalink / raw)
  To: ecos-discuss

(try again to send to ecos)

On 07/20/2011 09:15 PM, Stanislav Meduna wrote:
>
> Hi,
>
> I am quite reproducibly able to produce
>
>   eth_recv out of MBUFs
>
> in the FreeBSD stack. The setup is two incoming TCP connections
> sending data, a UDP socket sending data, pinging the device,
> closing and reopening the TCP connections a few times and
> to pull and put back the ethernet cable.
>
Be careful: the eCos FreeBSD stack has no timeouts on TCP connections, I
believe, but the PC stack often has!
So the eCos TCP session stays alive, but the stack on the PC (used for testing)
shuts down the TCP connection when the cable breaks (after a timeout, or
because it detects the cable fault). The two TCP endpoints are then out of sync
(something like that), which uses up a lot of sockets, and indeed (as
in your next mail) the timeouts for freeing sockets are very long.
If that is your problem, I can ask my colleague: he recompiled eCos to
change the timeout from minutes to seconds.
>
>
> The device then never recovers. The partner sends ARP requests
> with no response from eCos, as there is no mbuf to accept them.
> The eCos never send anything. eth_drv_tickle_devices runs, but
> finds IF_IS_EMPTY(&ifp->if_snd) true and never touches
> the interface. Maybe it wants to ARP first, but was unable
> to send the request - I don't know. Unless I am blind
> I am caling tx_done for every packet I got into send
> (but I am going to double-check this).
>
> The mbufs as shown by cyg_net_show_mbufs are full of DATA
> with short size (60 or 64) and flags 2 (M_PKTHDR).
>
> Anyone has seen something like that? I am not very familiar
> with the TCP/IP stacks in general - so it is quite problematic
> for me to debug something there. Are there some constraints
> of the "mbuf space has to be larger than tcp window of active
> connections" type or something like that?
>
> I am reluctant to try to "fix" it enlarging the space for mbufs,
> as I think this should never happen regardless of mbufs and I am
> also on a quite memory-constrained device where every 100 kB
> matter.
>
With a slow device and big bursts of data, I had to increase the space
for mbufs.
I have had some problems with segmented data (using IP packets of >
1518B(-header sizes)), but I believe the FreeBSD stack is OK.

Better use lwIP on memory-constrained devices. And lwIP is more actively
maintained.

Success,
Jürgen


-- 
Jürgen Lambrecht
R&D Associate
Tel: +32 (0)51 303045    Fax: +32 (0)51 310670
http://www.televic-rail.com
Televic Rail NV - Leo Bekaertlaan 1 - 8870 Izegem - Belgium
Company number 0825.539.581 - RPR Kortrijk


* Re: [ECOS] eth_recv out of MBUFs
  2011-08-05  7:52 ` Lambrecht Jürgen
@ 2011-08-09  6:37   ` Stanislav Meduna
  2011-08-09 12:55     ` Wayne Visser
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Meduna @ 2011-08-09  6:37 UTC (permalink / raw)
  To: ecos-discuss

On 05.08.2011 09:51, Lambrecht Jürgen wrote:

> Be careful: the eCos FreeBSD stack has no timeouts on TCP connections, I
> believe, but the PC stack often has!

Well the TCP timeouts that are in place when the connection
is not sending any data are basically unusable on any
standard-conforming TCP/IP stack. Whoever designed
SO_KEEPALIVE the way it is specified was smoking something
quite strong...

> So the eCos TCP session stays alive, but the PC (used to test) stack 
> shuts down the TCP connection when the cable breaks (after a timeout, or 
> it can also detect the cable fault). And the TCPs are out of sync 
> (something like that), and then it uses a lot of sockets, and indeed (as 
> in your next mail) the timeouts are very big to free sockets.
> If that is your problem, I can ask my colleague: he recompiled eCos to
> change the timeout from minutes to seconds.

This is most probably not the source of the problem I am seeing
in this test setup. Thanks for the hint anyway.

> With a slow device and big burst of data, I had to increase the space 
> for mbufs.

Well, but to what size? I'd like to know the formula to calculate
the worst-case mbuf requirement based on the number of active TCP
and UDP sockets... I am probably asking for too much :(

My gut feeling is that the TCP windows of all open connections
do not fit into the allocated mbufs. I'll try to tune this
stuff - unfortunately the FreeBSD stack is not really
friendly here.

> Better use lwIP on memory-constrained devices. And lwIP is more actively 
> maintained..

This has been on my to-do list for quite a long time - unfortunately
there are issues that have to be solved first. AFAIK lwIP
is not thread-safe at the level of a single socket, and our present
framework does read from and write to a single socket from
different threads :/

I also have no idea whether everything from eCos we are
using/planning to use is able to work with lwIP (e.g. SNMP).

Regards
-- 
                                       Stano


* Re: [ECOS] eth_recv out of MBUFs
  2011-08-09  6:37   ` Stanislav Meduna
@ 2011-08-09 12:55     ` Wayne Visser
  2011-08-11 16:52       ` Stanislav Meduna
  0 siblings, 1 reply; 20+ messages in thread
From: Wayne Visser @ 2011-08-09 12:55 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: ecos-discuss

Hi Stano,

I've had a similar problem and found some help in this message:

http://ecos.sourceware.org/ml/ecos-discuss/2003-10/msg00350.html

and the followup:

http://ecos.sourceware.org/ml/ecos-discuss/2003-10/msg00380.html

Basically, I've used the suggested hack to reduce the time required for
TCP connections to time out.  Fiddling around with the number of
permissible open connections did not end well...

--
Wayne Visser
LSZ PaperTech Inc.





* Re: [ECOS] eth_recv out of MBUFs
  2011-08-09 12:55     ` Wayne Visser
@ 2011-08-11 16:52       ` Stanislav Meduna
  2011-11-22 17:50         ` Stefan Sommerfeld
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Meduna @ 2011-08-11 16:52 UTC (permalink / raw)
  To: ecos-discuss

Hi all,

thank you all for the hints. I think I now know the cause
of the problem - most probably the way our application
communicates does not go well with the mbuf allocation.

Our application sends many (about 50/sec in my test
setup, in practice it varies wildly) short (about 60 bytes)
TCP packets, and as latency is more important than
throughput, it does so on a socket with the TCP_NODELAY
option set.
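
For readers unfamiliar with the option, setting TCP_NODELAY on a connected socket looks roughly like the sketch below. The helper name set_tcp_nodelay is made up for illustration, and the headers are the usual POSIX/BSD ones; the exact headers to include on eCos may differ.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: disable Nagle's algorithm on a TCP socket so that
 * small writes are sent immediately instead of being held back and merged.
 * Returns 0 on success and -1 on error, exactly as setsockopt() does.
 */
int set_tcp_nodelay(int sock)
{
    int on = 1;

    assert(sock >= 0);
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

With the option set, every small write tends to go out as its own segment, which is good for latency but means many small packets are queued individually.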

It looks like the stack does not coalesce the data into clusters
in this setup and allocates a separate mbuf for each
such packet. Of course, if the cable is unplugged or something similar happens,
this quickly leads to mbuf pressure that is about twice what one would
naively expect because of the overhead (a 128-byte mbuf used
for 60 bytes of data). Considering that the amount of data held
in the mbufs can be (I guess) at least the number of connections times
the TCP send window, this is not easy to manage in a system where
512 kB for network buffers is quite a luxury...
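
As a rough illustration of that estimate, the sketch below computes how much mbuf memory is tied up if every connection has a full send window queued as small, uncoalesced segments. The 128-byte mbuf and 60-byte payload are the figures from this mail; the function name, the 16 KiB window used in the note below, and the model itself (one mbuf per small segment, no clusters, receive side ignored) are assumptions for illustration and are not taken from the eCos sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes of mbuf memory needed when each connection
 * has a full send window queued as small segments, one mbuf per segment.
 * Assumes no coalescing into clusters and ignores the receive path.
 */
size_t mbuf_bytes_needed(size_t connections, size_t window_bytes,
                         size_t payload_per_mbuf, size_t mbuf_size)
{
    assert(payload_per_mbuf > 0);

    /* Round up: a partly filled mbuf still occupies a whole mbuf. */
    size_t mbufs_per_conn =
        (window_bytes + payload_per_mbuf - 1) / payload_per_mbuf;

    return connections * mbufs_per_conn * mbuf_size;
}
```

Under these assumptions, mbuf_bytes_needed(2, 16384, 60, 128) gives 70144 bytes: 274 mbufs per connection at 128 bytes each, for two connections.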

I am now testing a patch that allows tuning a few parameters
in the TCP/IP stack, including the ones suggested by some
of you (I know there is sysctl, but not for all of them),
and I hope I will be able to come up with values
that satisfy our needs.

The problem causing the 'unrecoverable' mbuf overflows was
a completely different beast - something (a mangled packet?)
sometimes puts the Ethernet controller of the LM3S9B90
processor into a state where it always thinks it has a packet,
returns bogus data when we try to read it, and can only be
fixed by resetting the controller. Oh well...

Regards
-- 
                                           Stano


* Re: [ECOS] eth_recv out of MBUFs
  2011-08-11 16:52       ` Stanislav Meduna
@ 2011-11-22 17:50         ` Stefan Sommerfeld
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Sommerfeld @ 2011-11-22 17:50 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: ecos-discuss

Hi Stanislav,

did you manage to create a patch? I have some MBUF problems too: running out of
MBUFs, or big latencies which can sometimes be reduced by adding debug output
inside the network driver. The whole MBUF stuff seems to be quite critical and I
need a stable network stack.

Bye...



* Re: [ECOS] eth_recv out of MBUFs
  2010-07-21  1:12     ` John Mills
@ 2010-07-21  1:58       ` cyl cyl
  0 siblings, 0 replies; 20+ messages in thread
From: cyl cyl @ 2010-07-21  1:58 UTC (permalink / raw)
  To: John Mills; +Cc: eCos Users

Memory designated for network buffers                        294912
Memory designated for network dynamically allocated memory    69632
MBUFs memory size                                             34816  (default was 69632)
Clusters size                                                139264
Max number of open sockets                                       32
Number of supported pending network events                        8
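
To put the "MBUFs memory size" values tried in this thread into perspective, the sketch below converts a pool size into a count of mbufs. It assumes 128-byte mbufs (the figure mentioned elsewhere in this thread) and no bookkeeping overhead in the pool; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: whole mbufs that fit in a pool of pool_bytes. */
size_t mbufs_in_pool(size_t pool_bytes, size_t mbuf_size)
{
    assert(mbuf_size > 0);
    return pool_bytes / mbuf_size;
}
```

Under that assumption, 17408 bytes hold 136 mbufs, 34816 hold 272, 69632 hold 544 and 139264 hold 1088.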

- cyl

2010/7/21 John Mills <johnmills@speakeasy.net>:
> Cyl -
>
> How much memory is your code allocating dynamically? How much statically?
>
>  - Mills


* Re: [ECOS] eth_recv out of MBUFs
  2010-07-20  7:45   ` cyl cyl
@ 2010-07-21  1:12     ` John Mills
  2010-07-21  1:58       ` cyl cyl
  0 siblings, 1 reply; 20+ messages in thread
From: John Mills @ 2010-07-21  1:12 UTC (permalink / raw)
  To: eCos Users; +Cc: cyl cyl

Cyl -

How much memory is your code allocating dynamically? How much statically?

  - Mills



* Re: [ECOS] eth_recv out of MBUFs
  2010-07-17 12:44 ` John Mills
  2010-07-18 21:07   ` Ross Younger
@ 2010-07-20  7:45   ` cyl cyl
  2010-07-21  1:12     ` John Mills
  1 sibling, 1 reply; 20+ messages in thread
From: cyl cyl @ 2010-07-20  7:45 UTC (permalink / raw)
  To: John Mills; +Cc: eCos Users

Hello:
        A strange thing.   If I double the "MBUFs memory size" from
69632 to 139264, it seems worse.  No warnings of "out of MBUFs",
but a few seconds after I took out the hardware (my computer, not the
development board), I can't ping it (from another computer).
        If I set the "MBUFs memory size" to 69632 (default value), the
same problem occurs, but not as soon as with 139264.
        If set to 34816 (half), it works well. I can always ping its
IP, and the "write" to the "abnormally closed" connection eventually
returns -1.
        If set to 17408 (1/4), the "out of MBUFs" warning occurs.
        Any suggestions?
Thank you.

- cyl


>


* Re: [ECOS] eth_recv out of MBUFs
  2010-07-18 21:07   ` Ross Younger
@ 2010-07-19  1:55     ` John Mills
  0 siblings, 0 replies; 20+ messages in thread
From: John Mills @ 2010-07-19  1:55 UTC (permalink / raw)
  To: ecos; +Cc: Ross Younger

All -

I was not able to use 'valgrind' successfully on a multithreaded eCos 
application, but for another reason: We combine a number of packages from 
other sources and parties and these contained so many questionable coding 
structures that valgrind gave up the job in a flurry of sarcastic 
comments. We were confident that our leaks were in our own code and not 
those third-party items because we have a long history of stable 
performance with those components. There were also some new pieces we had 
not written so we couldn't be 100% sure where our problems were arising.

Since dynamic allocation and freeing are centralized, we used the "smart 
macro" approach and hooked a simple memory monitor that way. Indeed the 
problems were in code I had written. (Big surprise!)

  - John Mills



* Re: [ECOS] eth_recv out of MBUFs
  2010-07-17 12:44 ` John Mills
@ 2010-07-18 21:07   ` Ross Younger
  2010-07-19  1:55     ` John Mills
  2010-07-20  7:45   ` cyl cyl
  1 sibling, 1 reply; 20+ messages in thread
From: Ross Younger @ 2010-07-18 21:07 UTC (permalink / raw)
  To: ecos

John Mills wrote:
>  4. Use a free ('valgrind') or commercial code analyzer to track memory 
> allocation (another "learning opportunity"!).

valgrind is a wonderful idea in general, but very Linux-specific at the 
moment. I wondered about valgrinding an eCos app running on the synth 
target the other month; I got as far as determining that it seems to 
Just Work for a simple single-threaded eCos configuration, but as soon 
as you bring in the scheduler the threading model - which doesn't use 
ordinary Linux threads - causes valgrind to get very very confused.

I found myself staring at the valgrind code, wondering how much effort 
it would be to teach valgrind how to notice the synth spawning threads 
(at the moment it does so by intercepting the clone syscall) but haven't 
had time to actually try as yet...

Ross

-- 
eCosCentric Ltd, Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK
Registered in England no. 4422071.                 www.ecoscentric.com


* Re: [ECOS] eth_recv out of MBUFs
  2010-07-16  8:26 cyl cyl
  2010-07-16 11:52 ` James Hunter
@ 2010-07-17 12:44 ` John Mills
  2010-07-18 21:07   ` Ross Younger
  2010-07-20  7:45   ` cyl cyl
  1 sibling, 2 replies; 20+ messages in thread
From: John Mills @ 2010-07-17 12:44 UTC (permalink / raw)
  To: eCos Users; +Cc: cyl cyl

Cyl -

You may be running out of free memory to allocate and the Ethernet
buffer is just the place where it shows up.

Short answer:
You may need to look for a memory leak somewhere in your code. These are 
no fun to find.

Long answer: eCos sets up a pool of available memory for allocation when 
it starts up. This pool is broken into blocks (the 'mbuf's in question, I 
think) and when a service or application allocates memory, eCos first 
checks whether any space is available in the one or more mbufs already 
assigned to that level of your code. If space is available then storage is 
allocated from that source for your data object; if not, the memory 
manager attempts to get another mbuf to work with. If none is available, 
you are "out of MBUFs".

This block-allocation is a common approach in RTOS as it protects memory 
allocation and helps keep overhead code under control.

Diagnosis: If you have generous read-write memory available, increase the
amount which is broken up into mbufs at eCos startup. (Sorry - I can't
point you to the correct code at the moment.) If this solves the problem,
you had some kind of worst-case collision. If this makes the problem less
frequent, that may still be the issue. If this only delays the problem, I
would bet on a memory leak (unbalanced 'malloc' and 'free' usage in some
thread). These are no fun to find.

Suggestions:
  1. Inactivate different threads and see if this solves the problem, then 
go looking for memory leakage in the thread(s) that were running when the 
problem occurred. Recently added code is the prime suspect, of course.

  2. Write macros to "wrap" the system's 'malloc' and 'free' code so that
they print where they are called from and for what objects they are
allocating. Then replace all calls to the system 'malloc' and 'free' with
calls to your macros. (This approach makes it easy to turn the debug code
on and off in one spot by changing the macros' definitions: through a
compilation switch, for example.)

  3. Use static rather than dynamic ('malloc') storage allocation wherever 
this is practical (not a bad idea in embedded real-time applications that 
may need to run stably for indefinite periods.)

  4. Use a free ('valgrind') or commercial code analyzer to track memory 
allocation (another "learning opportunity"!).
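
A minimal sketch of the wrapper macros from suggestion 2 might look like the following. All names here (DBG_MALLOC, DBG_FREE, dbg_outstanding, MEM_TRACE) are made up for illustration; the sketch only counts outstanding blocks and optionally prints the call site, and it is not thread-safe as written.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of blocks handed out by DBG_MALLOC and not yet released. */
long dbg_outstanding = 0;

void *dbg_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);

    if (p != NULL)
        dbg_outstanding++;
#ifdef MEM_TRACE
    /* Compile with -DMEM_TRACE to log every allocation and its call site. */
    fprintf(stderr, "malloc(%lu) = %p at %s:%d\n",
            (unsigned long)size, p, file, line);
#else
    (void)file;
    (void)line;
#endif
    return p;
}

void dbg_free(void *p, const char *file, int line)
{
    if (p != NULL) {
        /* More frees than allocations would indicate a double free. */
        assert(dbg_outstanding > 0);
        dbg_outstanding--;
    }
#ifdef MEM_TRACE
    fprintf(stderr, "free(%p) at %s:%d\n", p, file, line);
#else
    (void)file;
    (void)line;
#endif
    free(p);
}

/* Use these in place of direct malloc()/free() calls. */
#define DBG_MALLOC(size) dbg_malloc((size), __FILE__, __LINE__)
#define DBG_FREE(ptr)    dbg_free((ptr), __FILE__, __LINE__)
```

If dbg_outstanding keeps growing while the application is in a steady state, some code path is allocating without freeing; the trace output then shows which call sites are responsible.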

DISCLAIMERS:
  1. I don't have eCos code handy so I may be misunderstanding the problem 
completely. Sorry.

  2. I'm using eCos-2.0 so the issue may be quite different in 3.x.

  3. If I missed the boat here, I hope folks will correct me and we will 
both learn from it.

  - John Mills



* RE: [ECOS] eth_recv out of MBUFs
  2010-07-16  8:26 cyl cyl
@ 2010-07-16 11:52 ` James Hunter
  2010-07-17 12:44 ` John Mills
  1 sibling, 0 replies; 20+ messages in thread
From: James Hunter @ 2010-07-16 11:52 UTC (permalink / raw)
  To: ecos-discuss, cyl cyl

Hi,

Could be a number of things..

1. Your network alarm and network support threads are not high enough priority to give the TCP/IP stack time to deal with all your data
2. You're simply overloading your CPU, it just can't handle that amount of traffic
3. There's a bug in your network driver that is not releasing MBUF's back to the system
4. You may need to increase the amount of memory allocated to MBUF's and clusters, but this usually masks the problem 

You could use some of the debug calls to see when the MBUF's are increasing.

I'm not sure what you mean by "write" returning -1 - do you mean SendTo? Or Send?

James



* [ECOS] eth_recv out of MBUFs
@ 2010-07-16  8:26 cyl cyl
  2010-07-16 11:52 ` James Hunter
  2010-07-17 12:44 ` John Mills
  0 siblings, 2 replies; 20+ messages in thread
From: cyl cyl @ 2010-07-16  8:26 UTC (permalink / raw)
  To: ecos-discuss

Hello!
I got the warning "eth_recv out of MBUFs" when I was writing
(frequently) to a connection which was NOT closed normally (the
hardware was removed). Usually "write" returns -1, but sometimes it
shows that message. I'm using eCos 3.0, by the way.
Any help is appreciated!
Thank you!


* Re: [ECOS]  eth_recv out of MBUFs
  2001-10-16 17:19   ` Rosimildo da Silva
@ 2001-10-17  1:49     ` Peter Graf
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Graf @ 2001-10-17  1:49 UTC (permalink / raw)
  To: ecos-discuss

Rosimildo da Silva wrote:

>I have observed the same problems. I have checked on 2 platforms,
>MIPS and x86 PC. In both cases, under heavy load, the system always
>runs out of MBUFs. A second problem that I see is that there is no
>recovery from that: once you reach it, even if the other end
>stops sending anything, eCos never recovers.
>
>I did some tests, and I can only reproduce this with TCP sockets.
>With UDP sockets, I've had the same targets running for over 24 hours.

Same for me (CS8900 Ethernet, Hitachi SH3). It recovers, but needs minutes
after the other end stops sending.

Peter


* Re: [ECOS]  eth_recv out of MBUFs
  2001-10-16 14:55 ` Jonathan Larmour
  2001-10-16 17:19   ` Rosimildo da Silva
@ 2001-10-17  1:03   ` Christoph Csebits
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Csebits @ 2001-10-17  1:03 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: eCos mailing list

On Tue, Oct 16, 2001 at 10:55:18PM +0100, Jonathan Larmour wrote:
> Christoph Csebits wrote:
> > 
> > Hi,
> > 
> > I am working on an FCC driver for the MPC8260ADS.
> > (The FCC driver is based on the Viper FEC driver.)
> > CVS version: 02-Oct-2001
> > 
> > I am getting "warning: eth_recv out of MBUFs" messages
> > after a while when running the tcp_echo "test suite".
> > Sending only 100 packets instead of 1024 works fine.
> > 
> > What does this message mean, and where should I
> > start looking in my driver code?
> 
> It probably means the mbufs your driver uses for received data aren't
> getting freed after use. Perhaps it isn't getting read fast enough or you
> need to increase CYGPKG_NET_MEM_USAGE if you are hammering it really hard.
> It may not indicate a problem.
> 
> Alternatively if the buffers stay allocated even after you've stopped
> hammering it, then that _would_ be a problem :-).

Hi, I think I solved my mbuf problem.

My FCC driver is based on the QUICC/FEC drivers, and
according to Jon Hartley ->
http://sources.redhat.com/ml/ecos-discuss/2001-07/msg00089.html
there are still various bugs in them.

So I worked out the corresponding patch for my FCC driver, and now it
seems to work, at least better than before :-) (waiting for further
problems to arise).

thanks to Jon, 
best regards, christoph

* Re: [ECOS]  eth_recv out of MBUFs
  2001-10-16 14:55 ` Jonathan Larmour
@ 2001-10-16 17:19   ` Rosimildo da Silva
  2001-10-17  1:49     ` Peter Graf
  2001-10-17  1:03   ` Christoph Csebits
  1 sibling, 1 reply; 20+ messages in thread
From: Rosimildo da Silva @ 2001-10-16 17:19 UTC (permalink / raw)
  To: eCos mailing list

From: "Jonathan Larmour" <jlarmour@redhat.com>
To: "Christoph Csebits" <christoph.csebits@frequentis.com>
Cc: "eCos mailing list" <ecos-discuss@sources.redhat.com>
Sent: Tuesday, October 16, 2001 4:55 PM
Subject: Re: [ECOS] eth_recv out of MBUFs


> Christoph Csebits wrote:
> >
> > Hi,
> >
> > I am working on an FCC driver for the MPC8260ADS.
> > (The FCC driver is based on the Viper FEC driver.)
> > CVS version: 02-Oct-2001
> >
> > I am getting "warning: eth_recv out of MBUFs" messages
> > after a while when running the tcp_echo "test suite".
> > Sending only 100 packets instead of 1024 works fine.
> >
> > What does this message mean, and where should I
> > start looking in my driver code?
>
> It probably means the mbufs your driver uses for received data aren't
> getting freed after use. Perhaps it isn't getting read fast enough or you
> need to increase CYGPKG_NET_MEM_USAGE if you are hammering it really hard.
> It may not indicate a problem.
>
> Alternatively if the buffers stay allocated even after you've stopped
> hammering it, then that _would_ be a problem :-).

I have observed the same problems. I have checked on 2 platforms,
MIPS and x86 PC. In both cases, under heavy load, the system always
runs out of MBUFs. A second problem that I see is that there is no
recovery from that: once you reach it, even if the other end
stops sending anything, eCos never recovers.

I did some tests, and I can only reproduce this with TCP sockets.
With UDP sockets, I've had the same targets running for over 24 hours.

Rosimildo.


* Re: [ECOS]  eth_recv out of MBUFs
  2001-10-16  9:21 Christoph Csebits
@ 2001-10-16 14:55 ` Jonathan Larmour
  2001-10-16 17:19   ` Rosimildo da Silva
  2001-10-17  1:03   ` Christoph Csebits
  0 siblings, 2 replies; 20+ messages in thread
From: Jonathan Larmour @ 2001-10-16 14:55 UTC (permalink / raw)
  To: Christoph Csebits; +Cc: eCos mailing list

Christoph Csebits wrote:
> 
> Hi,
> 
> I am working on an FCC driver for the MPC8260ADS.
> (The FCC driver is based on the Viper FEC driver.)
> CVS version: 02-Oct-2001
> 
> I am getting "warning: eth_recv out of MBUFs" messages
> after a while when running the tcp_echo "test suite".
> Sending only 100 packets instead of 1024 works fine.
> 
> What does this message mean, and where should I
> start looking in my driver code?

It probably means the mbufs your driver uses for received data aren't
getting freed after use. Perhaps it isn't getting read fast enough or you
need to increase CYGPKG_NET_MEM_USAGE if you are hammering it really hard.
It may not indicate a problem.

Alternatively if the buffers stay allocated even after you've stopped
hammering it, then that _would_ be a problem :-).
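
One way to apply that test is to sample the number of mbufs in use a few times after the traffic has stopped and see whether it drops back. The following is only a host-side sketch: `classify_after_load`, the `pool_verdict` enum and the `baseline` threshold are illustrative names, not part of eCos, and how the samples are collected on the target is left open.

```c
#include <stddef.h>

/* Hypothetical verdicts for this sketch. */
enum pool_verdict {
    POOL_RECOVERED,  /* usage fell back to the idle baseline */
    POOL_DRAINING,   /* still above the baseline, but falling */
    POOL_STUCK       /* did not fall at all: buffers look leaked */
};

/* in_use[] holds samples of "mbufs in use", oldest first, all taken after
 * the load was removed. baseline is what an idle system normally holds. */
static enum pool_verdict classify_after_load(const unsigned *in_use,
                                             size_t n, unsigned baseline)
{
    if (n == 0)
        return POOL_STUCK;            /* no evidence of recovery */
    if (in_use[n - 1] <= baseline)
        return POOL_RECOVERED;
    if (in_use[n - 1] < in_use[0])
        return POOL_DRAINING;
    return POOL_STUCK;
}
```

A "draining" result would mean the buffers do come back, only slowly. A "stuck" result is the case described above as a real problem.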

Jifl
-- 
Red Hat, Rustat House, Clifton Road, Cambridge, UK. Tel: +44 (1223) 271062
Maybe this world is another planet's Hell -Aldous Huxley || Opinions==mine


* [ECOS]  eth_recv out of MBUFs
@ 2001-10-16  9:21 Christoph Csebits
  2001-10-16 14:55 ` Jonathan Larmour
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Csebits @ 2001-10-16  9:21 UTC (permalink / raw)
  To: eCos mailing list

Hi,

I am working on an FCC driver for the MPC8260ADS.
(The FCC driver is based on the Viper FEC driver.)
CVS version: 02-Oct-2001

I am getting "warning: eth_recv out of MBUFs" messages
after a while when running the tcp_echo "test suite".
Sending only 100 packets instead of 1024 works fine.

What does this message mean, and where should I
start looking in my driver code?

Any suggestions are appreciated.

regards, christoph

target log:
SINK connection from 172.17.2.1:3993
SOURCE connection from 172.17.2.1:3996
Using 100 buffers of 8192 bytes each, 0%  background load
Set no background load
Set no background load
439 ticks elapsed, 3273 kloops predicted for an idle system
actual kloops 3084, CPU was 94%  idle during transfer
Set no background load
Set no background load
Final load[1458] = 742341 => 1% 
Net malloc:
  count:     80, min:       0.18, max:       1.17, total:      38.65, ave:       0.47
Net free:
  count:     32, min:       0.35, max:       4.50, total:      23.12, ave:       0.71
Mbuf alloc:
  count:   4109, min:       0.24, max:       6.23, total:    1603.69, ave:       0.38
Mbuf free:
  count:   3747, min:       0.32, max:       1.08, total:    1771.50, ave:       0.47
Cluster alloc:
  count:    200, min:       0.24, max:       0.53, total:      70.28, ave:       0.35
Checksum:
  count:   3689, min:       0.18, max:       7.91, total:    3214.81, ave:       0.86
Net memcpy:
  count:   7119, min:       0.02, max:      21.57, total:    4152.86, ave:       0.58
Net memset:
  count:   1128, min:       0.05, max:       1.77, total:     245.93, ave:       0.21
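
A note on reading these counters: subtracting a "free" count from the matching "alloc" count gives the number of buffers still held when the statistics were printed. The helper below is a sketch for that arithmetic only; `outstanding` is a made-up name, not an eCos function. Whether the result is abnormal depends on the pool size and on what was still queued at that moment, which the log does not show.

```c
/* Buffers still held = allocations minus frees.
 * Returns 0 if the counters are inconsistent (more frees than allocs). */
static unsigned long outstanding(unsigned long allocs, unsigned long frees)
{
    return allocs > frees ? allocs - frees : 0UL;
}
```

With the counts above, `outstanding(4109, 3747)` gives 362 mbufs and `outstanding(80, 32)` gives 48 net malloc blocks not yet freed.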

Thread overview: 20+ messages
2011-07-20 19:16 [ECOS] eth_recv out of MBUFs Stanislav Meduna
2011-07-20 21:27 ` Stanislav Meduna
2011-08-05  7:52 ` Lambrecht Jürgen
2011-08-09  6:37   ` Stanislav Meduna
2011-08-09 12:55     ` Wayne Visser
2011-08-11 16:52       ` Stanislav Meduna
2011-11-22 17:50         ` Stefan Sommerfeld
  -- strict thread matches above, loose matches on Subject: below --
2010-07-16  8:26 cyl cyl
2010-07-16 11:52 ` James Hunter
2010-07-17 12:44 ` John Mills
2010-07-18 21:07   ` Ross Younger
2010-07-19  1:55     ` John Mills
2010-07-20  7:45   ` cyl cyl
2010-07-21  1:12     ` John Mills
2010-07-21  1:58       ` cyl cyl
2001-10-16  9:21 Christoph Csebits
2001-10-16 14:55 ` Jonathan Larmour
2001-10-16 17:19   ` Rosimildo da Silva
2001-10-17  1:49     ` Peter Graf
2001-10-17  1:03   ` Christoph Csebits
