public inbox for ecos-discuss@sourceware.org
From: "Michael O'Dowd" <michael.odowd@kuantic.com>
To: Elad Yosef <elad.yosef@gmail.com>
Cc: ecos-discuss@ecos.sourceware.org
Subject: Re: [ECOS] pbuf_alloc failures with LwIP
Date: Thu, 14 Jun 2012 14:37:00 -0000
Message-ID: <4FD9F74E.7090104@kuantic.com>
In-Reply-To: <CAOFa9c3iCX6gHesxtQbd40yhXsBdAXbmx53JEQnP0DYcTa=2eA@mail.gmail.com>

Hi Elad,

Hmm, I've had a quick look at the pbuf management in eCos 3.0. It's 
quite different from the CVS version, so I'm not that familiar with it.

Nonetheless, I'm surprised by the PBUF statistics:

   PBUF - "each pbuf is 1024 bytes"
           avail: 30
           used: 1
           max: 11
           err: 2
           alloc_locked: 0
           refresh_locked: 0

There's something wrong here. Considering that "alloc_locked = 0", the 
only way for "err" to be incremented is if you run out of pbufs. 
However, the sign that you have run out of pbufs is that "max" equals 
"avail". Yet, in your case, max = 11, while avail = 30. So you didn't 
run out of pbufs, you only used 11 out of 30.

Digging a bit more, it appears that "err" is incremented when 
pbuf_pool_alloc() returns NULL. This happens when the linked list of 
available pbufs is empty.

So, how come the linked-list of available pbufs is empty when max = 11? 
In my opinion, the linked-list of available pbufs is corrupt or truncated.

Are you sure that you're respecting the thread-safe requirements of 
lwIP? Are you using multiple threads? If so, make sure that the 
SYS_ARCH_PROTECT macro (in lwip/sys.h) is defined to do something 
useful, rather than being an empty definition.

Regards,

Michael.

On 14/06/2012 06:43, Elad Yosef wrote:
> Hi Michael,
> Thanks for the detailed reply.
>
> I think I have exactly the same problem that you have - the networking
> stops working.
>
> I got the LwIP stats after the networking stopped working, see
>
>
>
> LINK
>          xmit: 0
>          rexmit: 0
>          recv: 0
>          fw: 0
>          drop: 0
>          chkerr: 0
>          lenerr: 0
>          memerr: 0
>          rterr: 0
>          proterr: 0
>          opterr: 0
>          err: 0
>          cachehit: 0
>
> IP_FRAG
>          xmit: 0
>          rexmit: 0
>          recv: 0
>          fw: 0
>          drop: 0
>          chkerr: 0
>          lenerr: 0
>          memerr: 0
>          rterr: 0
>          proterr: 0
>          opterr: 0
>          err: 0
>          cachehit: 0
>
> IP
>          xmit: 17643
>          rexmit: 0
>          recv: 63100
>          fw: 0
>          drop: 0
>          chkerr: 0
>          lenerr: 0
>          memerr: 0
>          rterr: 0
>          proterr: 0
>          opterr: 0
>          err: 0
>          cachehit: 0
>
> ICMP
>          xmit: 2775
>          rexmit: 0
>          recv: 2950
>          fw: 0
>          drop: 175
>          chkerr: 0
>          lenerr: 0
>          memerr: 0
>          rterr: 0
>          proterr: 175
>          opterr: 0
>          err: 0
>          cachehit: 0
>
> UDP
>          xmit: 4714
>          rexmit: 0
>          recv: 53209
>          fw: 0
>          drop: 0
>          chkerr: 0
>          lenerr: 0
>          memerr: 0
>          rterr: 0
>          proterr: 0
>          opterr: 0
>          err: 0
>          cachehit: 0
>
> TCP
>          xmit: 6715
>          rexmit: 0
>          recv: 6941
>          fw: 0
>          drop: 0
>          chkerr: 0
>          lenerr: 0
>          memerr: 2705
>          rterr: 0
>          proterr: 0
>          opterr: 0
>          err: 0
>          cachehit: 0
>
> PBUF - "each pbuf is 1024 bytes"
>          avail: 30
>          used: 1
>          max: 11
>          err: 2
>          alloc_locked: 0
>          refresh_locked: 0
>
>   MEM HEAP
>          avail: 1024
>          used: 0
>          max: 720
>          err: 0
>
>   MEM PBUF
>          avail: 8
>          used: 0
>          max: 2
>          err: 0
>
>   MEM RAW_PCB
>          avail: 4
>          used: 0
>          max: 0
>          err: 0
>
>   MEM UDP_PCB
>          avail: 3
>          used: 3
>          max: 3
>          err: 0
>
>   MEM TCP_PCB
>          avail: 16
>          used: 0
>          max: 8
>          err: 0
>
>   MEM TCP_PCB_LISTEN
>          avail: 1
>          used: 1
>          max: 1
>          err: 0
>
>   MEM TCP_SEG
>          avail: 6
>          used: 0
>          max: 4
>          err: 0
>
>   MEM NETBUF
>          avail: 10
>          used: 0
>          max: 6
>          err: 0
>
>   MEM NETCONN
>          avail: 12
>          used: 4
>          max: 7
>          err: 0
>
>   MEM API_MSG
>          avail: 6
>          used: 0
>          max: 2
>          err: 0
>
>   MEM TCP_MSG
>          avail: 12
>          used: 0
>          max: 7
>          err: 0
>
>   MEM TIMEOUT
>          avail: 4
>          used: 2
>          max: 3
>          err: 0
>
>
> I would appreciate it if you could take a look.
>
> Elad
>
>
> On Wed, Jun 13, 2012 at 6:47 PM, Michael O'Dowd
> <michael.odowd@kuantic.com> wrote:
>> Hi Elad,
>>
>> I ran into a similar problem recently. I'm using a recent CVS checkout
>> rather than 3.0. Also, I'm probably not using the same ethernet HW, so I
>> don't know how well my reply corresponds to your case.
>>
>> The eth_drv.c file is the glue between lwIP and the underlying ethernet
>> driver, so the issue that you are encountering may be specific to the
>> driver. In my case, when under stress, eth_drv.c generates the error
>> message: "cannot allocate pbuf to receive packet". Soon after that, the
>> ethernet driver stops receiving traffic permanently, but does not crash. In
>> your case, if I understand correctly, your system crashes.
>>
>> The issue is that when eth_drv_recv() fails to allocate a pbuf, it returns
>> without calling the ethernet driver recv() function: (sc->funs->recv)(). In
>> my case, the driver requires that its recv() function be called, in order
>> to complete the processing of the packet reception and to free up the
>> receive buffer(s). Failing to call it apparently causes the receive path to
>> cease functioning (I'm still investigating the details). In your case, I
>> gather that it crashes the system.
>>
>> Note: I'm running on an NXP 1788 (Cortex-M3), using the
>> "devs/arm/lpc2xxx/current/src/if_lpc2xxx.c" ethernet driver.
>>
>> There are two aspects to this problem:
>>
>> 1) In my opinion, there is a bug in eth_drv_recv(). If there are no pbufs
>> available, then it should at least cause the received packet to be
>> discarded. Otherwise, the system may fail whenever there is a minor burst of
>> traffic on the network. It doesn't take much: there are only 16 pbufs
>> available by default. Whether or not the system fails depends on how the
>> ethernet driver reacts to the failure to call its recv() function. I hope
>> to fix this on my platform in the near future.
>>
>> 2) You should also keep an eye on your pbuf usage, just to make sure that
>> you don't have a pbuf memory leak. You could also try to allocate more
>> pbufs, if you have the available memory.
>>
>> If you are using the default lwip configuration, the pbuf memory allocation
>> is handled by memp.[hc]. It has a fixed number of pbufs available. The
>> default is 16 pbufs, and can be changed in the configtool under: [lwIP
>> networking stack/Memory options/Number of memp struct pbufs].
>>
>> Alternatively, if you have lots of memory, you could enable the checkbox:
>> [lwIP networking stack/Memory options/Use malloc for pool allocations]. This
>> bypasses the memp pools and their static limitations. Though this will make
>> it harder to spot a pbuf memory leak. I haven't tried this personally.
>>
>> Finally, (when using memp) the pbuf usage can be monitored with
>> lwip/stats.h. If you have access to a serial port, try calling
>> stats_display(). Here is a snippet of the pbuf related output:
>>
>>>   MEM PBUF_POOL
>>>           avail: 16
>>>           used: 0
>>>           max: 3
>>>           err: 0
>> The "err" counter increases when pbuf_alloc() fails.
>>
>> Hope that helps,
>>
>> Regards,
>>
>> Michael O'Dowd
>> Kuantic SAS
>>
>>
>> On 12/06/2012 22:40, Elad Yosef wrote:
>>> Hi all,
>>> I'm using LwIP stack on my target and experiencing crashes under stress.
>>>
>>> function eth_drv_recv() from ../io/eth/v3_0/ser/lwip/eth_drv.c
>>> calls pbuf_alloc() and this allocation fails.
>>>
>>> Is this result of some bad configuration?
>>>
>>> Thanks
>>> Elad
>>>

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
