From: Elad Yosef
To: "Michael O'Dowd"
Cc: ecos-discuss@ecos.sourceware.org
Date: Thu, 14 Jun 2012 04:43:00 -0000
Subject: Re: [ECOS] pbuf_alloc failures with LwIP
In-Reply-To: <4FD8B62E.6080005@kuantic.com>
References: <4FD8B62E.6080005@kuantic.com>

Hi Michael,

Thanks for the detailed reply.
I think I have exactly the same problem that you have - the networking
stops working.
I got the LwIP stats after the networking stopped working, see below:

          xmit  rexmit   recv  fw  drop  chkerr  lenerr  memerr  rterr  proterr  opterr  err  cachehit
LINK         0       0      0   0     0       0       0       0      0        0       0    0         0
IP_FRAG      0       0      0   0     0       0       0       0      0        0       0    0         0
IP       17643       0  63100   0     0       0       0       0      0        0       0    0         0
ICMP      2775       0   2950   0   175       0       0       0      0      175       0    0         0
UDP       4714       0  53209   0     0       0       0       0      0        0       0    0         0
TCP       6715       0   6941   0     0       0       0    2705      0        0       0    0         0

PBUF - "each pbuf is 1024 bytes"
          avail: 30  used: 1  max: 11  err: 2  alloc_locked: 0  refresh_locked: 0

MEM pool          avail  used  max  err
HEAP               1024     0  720    0
PBUF                  8     0    2    0
RAW_PCB               4     0    0    0
UDP_PCB               3     3    3    0
TCP_PCB              16     0    8    0
TCP_PCB_LISTEN        1     1    1    0
TCP_SEG               6     0    4    0
NETBUF               10     0    6    0
NETCONN              12     4    7    0
API_MSG               6     0    2    0
TCP_MSG              12     0    7    0
TIMEOUT               4     2    3    0

I would appreciate it if you could take a look.

Elad

On Wed, Jun 13, 2012 at 6:47 PM, Michael O'Dowd wrote:
> Hi Elad,
>
> I ran into a similar problem recently. I'm using a recent CVS checkout
> rather than 3.0. Also, I'm probably not using the same ethernet HW, so I
> don't know how well my reply corresponds to your case.
>
> The eth_drv.c file is the glue between lwIP and the underlying ethernet
> driver, so the issue that you are encountering may be specific to the
> driver.
> In my case, when under stress, eth_drv.c generates the error
> message: "cannot allocate pbuf to receive packet". Soon after that, the
> ethernet driver stops receiving traffic permanently, but does not crash. In
> your case, if I understand correctly, your system crashes.
>
> The issue is that when eth_drv_recv() fails to allocate a pbuf, it returns
> without calling the ethernet driver recv() function: (sc->funs->recv)(). In
> my case, the driver requires that its recv() function be called in order
> to complete the processing of the packet reception and to free up the
> receive buffer(s). Failing to call it apparently causes the receive path to
> cease functioning (I'm still investigating the details). In your case, I
> gather that it crashes the system.
>
> Note: I'm running on an NXP 1788 (Cortex-M3), using the
> "devs/arm/lpc2xxx/current/src/if_lpc2xxx.c" ethernet driver.
>
> There are two aspects to this problem:
>
> 1) In my opinion, there is a bug in eth_drv_recv(). If there are no pbufs
> available, then it should at least cause the received packet to be
> discarded. Otherwise, the system may fail whenever there is a minor burst of
> traffic on the network. It doesn't take much: there are only 16 pbufs
> available by default. Whether or not the system fails depends on how the
> ethernet driver reacts to the failure to call its recv() function. I hope
> to fix this on my platform in the near future.
>
> 2) You should also keep an eye on your pbuf usage, just to make sure that
> you don't have a pbuf memory leak. You could also try to allocate more
> pbufs, if you have the available memory.
>
> If you are using the default lwip configuration, the pbuf memory allocation
> is handled by memp.[hc]. It has a fixed number of pbufs available. The
> default is 16 pbufs, and can be changed in the configtool under: [lwIP
> networking stack/Memory options/Number of memp struct pbufs].
>
> Alternatively, if you have lots of memory, you could enable the checkbox:
> [lwIP networking stack/Memory options/Use malloc for pool allocations]. This
> bypasses the memp pools and their static limitations, though it will make
> it harder to spot a pbuf memory leak. I haven't tried this personally.
>
> Finally, (when using memp) the pbuf usage can be monitored with
> lwip/stats.h. If you have access to a serial port, try calling
> stats_display(). Here is a snippet of the pbuf related output:
>
>>  MEM PBUF_POOL
>>          avail: 16
>>          used: 0
>>          max: 3
>>          err: 0
>
> The "err" counter increases when pbuf_alloc() fails.
>
> Hope that helps,
>
> Regards,
>
> Michael O'Dowd
> Kuantic SAS
>
>
> On 12/06/2012 22:40, Elad Yosef wrote:
>>
>> Hi all,
>> I'm using the LwIP stack on my target and experiencing crashes under stress.
>>
>> The function eth_drv_recv() from ../io/eth/v3_0/ser/lwip/eth_drv.c
>> calls pbuf_alloc() and this allocation fails.
>>
>> Is this the result of some bad configuration?
>>
>> Thanks
>> Elad
>>
>

--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
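
Illustrative note on point 1 of the quoted reply: the idea of discarding a received frame when no pbuf is available, while still letting the driver complete the reception, can be sketched with a small mock in C. Everything below is hypothetical. mock_driver, mock_buf_alloc(), mock_driver_recv() and glue_recv() are stand-ins invented for this illustration; they are not the eCos eth_drv.c or lwIP APIs, and the way a real driver expects to be told "discard this frame" may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins only; none of these names exist in eCos or lwIP. */
#define POOL_SIZE 2            /* tiny pool so exhaustion is easy to trigger */

struct mock_driver {
    int pending;               /* frames still held in hardware receive buffers */
    int delivered;             /* frames copied out to the stack */
    int discarded;             /* frames dropped because no buffer was free */
};

int pool_free = POOL_SIZE;     /* buffers are never returned in this sketch */

/* Mock allocator: returns 1 on success, 0 when the pool is exhausted. */
int mock_buf_alloc(void)
{
    if (pool_free == 0)
        return 0;
    pool_free--;
    return 1;
}

/* Mock driver recv(): always releases the hardware receive buffer.
 * have_buf == 0 means "no destination buffer, just drop the frame". */
void mock_driver_recv(struct mock_driver *drv, int have_buf)
{
    assert(drv->pending > 0);
    drv->pending--;
    if (have_buf)
        drv->delivered++;
    else
        drv->discarded++;
}

/* Glue-layer receive handler: the driver's recv() is called on BOTH paths,
 * so an allocation failure costs one frame instead of stalling reception. */
void glue_recv(struct mock_driver *drv)
{
    int have_buf = mock_buf_alloc();

    if (!have_buf)
        fprintf(stderr, "cannot allocate buffer, discarding frame\n");
    mock_driver_recv(drv, have_buf);
}
```

With a pool of two buffers, passing four pending frames through glue_recv() delivers two and discards two, and no frame is left stuck in the mock driver.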