From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 17949 invoked by alias); 17 Jun 2012 06:28:31 -0000
Received: (qmail 17938 invoked by uid 22791); 17 Jun 2012 06:28:29 -0000
X-SWARE-Spam-Status: No, hits=-5.0 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,KHOP_RCVD_TRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE
X-Spam-Check-By: sourceware.org
Received: from mail-pb0-f49.google.com (HELO mail-pb0-f49.google.com) (209.85.160.49)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 17 Jun 2012 06:28:15 +0000
Received: by pbbrq13 with SMTP id rq13so7321097pbb.36
    for ; Sat, 16 Jun 2012 23:28:15 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.68.240.99 with SMTP id vz3mr38170877pbc.60.1339914495323;
    Sat, 16 Jun 2012 23:28:15 -0700 (PDT)
Received: by 10.68.46.33 with HTTP; Sat, 16 Jun 2012 23:28:15 -0700 (PDT)
In-Reply-To: <4FD9F74E.7090104@kuantic.com>
References: <4FD8B62E.6080005@kuantic.com> <4FD9F74E.7090104@kuantic.com>
Date: Sun, 17 Jun 2012 06:28:00 -0000
Message-ID:
From: Elad Yosef
To: "Michael O'Dowd"
Cc: ecos-discuss@ecos.sourceware.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact ecos-discuss-help@ecos.sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: ecos-discuss-owner@ecos.sourceware.org
Subject: Re: [ECOS] pbuf_alloc failures with LwIP
X-SW-Source: 2012-06/txt/msg00005.txt.bz2

Hi,

I'm using multiple threads in my application.
The SYS_ARCH_PROTECT macro is not enabled at all. How do I enable it?
Is it enabled in the current CVS version?

Is the current LwIP in CVS compatible with ecos-3.0?
If yes, I would consider upgrading the LwIP package only.

Thanks
Elad

On Thu, Jun 14, 2012 at 5:38 PM, Michael O'Dowd wrote:
> Hi Elad,
>
> Hmm, I've had a quick look at the pbuf management in eCos 3.0.
> It's quite different from the CVS version, so I'm not that familiar with it.
>
> Nonetheless, I'm surprised by the PBUF statistics:
>
>  PBUF - "each pbuf is 1024 bytes"
>          avail: 30
>          used: 1
>          max: 11
>          err: 2
>          alloc_locked: 0
>          refresh_locked: 0
>
> There's something wrong here. Considering that "alloc_locked = 0", the only
> way for "err" to be incremented is if you run out of pbufs. However, the
> sign that you have run out of pbufs is that "max" equals "avail". Yet, in
> your case, max = 11, while avail = 30. So you didn't run out of pbufs, you
> only used 11 out of 30.
>
> Digging a bit more, it appears that "err" is increased when
> pbuf_pool_alloc() returns NULL. This happens when the linked list of
> available pbufs is empty.
>
> So, how come the linked list of available pbufs is empty when max = 11? In
> my opinion, the linked list of available pbufs is corrupt or truncated.
>
> Are you sure that you're respecting the thread-safety requirements of lwIP?
> Are you using multiple threads? If so, make sure that the SYS_ARCH_PROTECT
> macro (in lwip/sys.h) is defined to do something useful, rather than being
> an empty definition.
>
> Regards,
>
> Michael.
>
> On 14/06/2012 06:43, Elad Yosef wrote:
>>
>> Hi Michael,
>> Thanks for the detailed reply.
>>
>> I think I have exactly the same problem that you have - the networking
>> stops working.
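
[Illustration of Michael's reasoning about the PBUF statistics and SYS_ARCH_PROTECT. This is a minimal host-side sketch, not the eCos 3.0 pbuf code. The pool, the names pool_alloc(), pool_free() and stats_inconsistent(), and the counter used as a stand-in for a real critical section are all assumptions made for the example. The macro shapes follow the usual lwip/sys.h pattern, but check them against your lwIP version. On a real target, sys_arch_protect() would have to lock the scheduler or mask interrupts.]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in critical section (hypothetical): a nesting counter, NOT a lock.
 * A real port would disable preemption or interrupts here. */
typedef int sys_prot_t;
static int lock_depth;

sys_prot_t sys_arch_protect(void)       { return lock_depth++; }
void sys_arch_unprotect(sys_prot_t lev) { lock_depth = lev; }

/* Non-empty definitions, in the style of lwip/sys.h. */
#define SYS_ARCH_DECL_PROTECT(lev) sys_prot_t lev
#define SYS_ARCH_PROTECT(lev)      lev = sys_arch_protect()
#define SYS_ARCH_UNPROTECT(lev)    sys_arch_unprotect(lev)

#define POOL_SIZE 4

struct buf { struct buf *next; };
struct pool_stats { unsigned avail, used, max, err; };

static struct buf pool_mem[POOL_SIZE];
static struct buf *free_list;       /* singly linked list of free buffers */
struct pool_stats pstats;

void pool_init(void)
{
    int i;
    free_list = NULL;
    for (i = 0; i < POOL_SIZE; i++) {   /* push every buffer on the list */
        pool_mem[i].next = free_list;
        free_list = &pool_mem[i];
    }
    pstats.avail = POOL_SIZE;
    pstats.used = pstats.max = pstats.err = 0;
}

struct buf *pool_alloc(void)
{
    struct buf *b;
    SYS_ARCH_DECL_PROTECT(lev);

    SYS_ARCH_PROTECT(lev);              /* list and counters change together */
    b = free_list;
    if (b != NULL) {
        free_list = b->next;
        if (++pstats.used > pstats.max)
            pstats.max = pstats.used;
    } else {
        pstats.err++;                   /* empty list: allocation fails */
    }
    SYS_ARCH_UNPROTECT(lev);
    return b;
}

void pool_free(struct buf *b)
{
    SYS_ARCH_DECL_PROTECT(lev);

    SYS_ARCH_PROTECT(lev);
    b->next = free_list;
    free_list = b;
    pstats.used--;
    SYS_ARCH_UNPROTECT(lev);
}

/* With an intact list, "err" can only grow once "max" has reached "avail".
 * err > 0 with max < avail therefore hints at a damaged free list. */
int stats_inconsistent(const struct pool_stats *s)
{
    return s->err > 0 && s->max < s->avail;
}
```

[Feeding the reported figures (avail 30, used 1, max 11, err 2) to stats_inconsistent() flags them, whereas a pool that is genuinely exhausted shows max equal to avail and is not flagged. The check ignores the alloc_locked path, which was 0 in the reported statistics.]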
>>
>> I got the LwIP stats after the networking stopped working, see
>>
>>
>>
>> LINK
>>         xmit: 0
>>         rexmit: 0
>>         recv: 0
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> IP_FRAG
>>         xmit: 0
>>         rexmit: 0
>>         recv: 0
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> IP
>>         xmit: 17643
>>         rexmit: 0
>>         recv: 63100
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> ICMP
>>         xmit: 2775
>>         rexmit: 0
>>         recv: 2950
>>         fw: 0
>>         drop: 175
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 175
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> UDP
>>         xmit: 4714
>>         rexmit: 0
>>         recv: 53209
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> TCP
>>         xmit: 6715
>>         rexmit: 0
>>         recv: 6941
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 2705
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> PBUF - "each pbuf is 1024 bytes"
>>         avail: 30
>>         used: 1
>>         max: 11
>>         err: 2
>>         alloc_locked: 0
>>         refresh_locked: 0
>>
>>  MEM HEAP
>>         avail: 1024
>>         used: 0
>>         max: 720
>>         err: 0
>>
>>  MEM PBUF
>>         avail: 8
>>         used: 0
>>         max: 2
>>         err: 0
>>
>>  MEM RAW_PCB
>>         avail: 4
>>         used: 0
>>         max: 0
>>         err: 0
>>
>>  MEM UDP_PCB
>>         avail: 3
>>         used: 3
>>         max: 3
>>         err: 0
>>
>>  MEM TCP_PCB
>>         avail: 16
>>         used: 0
>>         max: 8
>>         err: 0
>>
>>  MEM TCP_PCB_LISTEN
>>         avail: 1
>>         used: 1
>>         max: 1
>>         err: 0
>>
>>  MEM TCP_SEG
>>         avail: 6
>>         used: 0
>>         max: 4
>>         err: 0
>>
>>  MEM NETBUF
>>         avail: 10
>>         used: 0
>>         max: 6
>>         err: 0
>>
>>  MEM NETCONN
>>         avail: 12
>>         used: 4
>>         max: 7
>>         err: 0
>>
>>  MEM API_MSG
>>         avail: 6
>>         used: 0
>>         max: 2
>>         err: 0
>>
>>  MEM TCP_MSG
>>         avail: 12
>>         used: 0
>>         max: 7
>>         err: 0
>>
>>  MEM TIMEOUT
>>         avail: 4
>>         used: 2
>>         max: 3
>>         err: 0
>>
>>
>> I would appreciate it if you could take a look.
>>
>> Elad
>>
>>
>> On Wed, Jun 13, 2012 at 6:47 PM, Michael O'Dowd wrote:
>>>
>>> Hi Elad,
>>>
>>> I ran into a similar problem recently. I'm using a recent CVS checkout
>>> rather than 3.0. Also, I'm probably not using the same ethernet HW, so I
>>> don't know how well my reply corresponds to your case.
>>>
>>> The eth_drv.c file is the glue between lwIP and the underlying ethernet
>>> driver, so the issue that you are encountering may be specific to the
>>> driver. In my case, when under stress, eth_drv.c generates the error
>>> message: "cannot allocate pbuf to receive packet". Soon after that, the
>>> ethernet driver stops receiving traffic permanently, but does not crash.
>>> In your case, if I understand correctly, your system crashes.
>>>
>>> The issue is that when eth_drv_recv() fails to allocate a pbuf, it returns
>>> without calling the ethernet driver recv() function: (sc->funs->recv)().
>>> In my case, the driver requires that its recv() function be called, in
>>> order to complete the processing of the packet reception and to free up
>>> the receive buffer(s). Failing to call it apparently causes the receive
>>> path to cease functioning (I'm still investigating the details). In your
>>> case, I gather that it crashes the system.
>>>
>>> Note: I'm running on an NXP 1788 (Cortex-M3), using the
>>> "devs/arm/lpc2xxx/current/src/if_lpc2xxx.c" ethernet driver.
>>>
>>> There are two aspects to this problem:
>>>
>>> 1) In my opinion, there is a bug in eth_drv_recv(). If there are no pbufs
>>> available, then it should at least cause the received packet to be
>>> discarded. Otherwise, the system may fail whenever there is a minor burst
>>> of traffic on the network. It doesn't take much: there are only 16 pbufs
>>> available by default.
>>> Whether or not the system fails depends on how the
>>> ethernet driver reacts to the failure to call its recv() function. I
>>> hope to fix this on my platform in the near future.
>>>
>>> 2) You should also keep an eye on your pbuf usage, just to make sure that
>>> you don't have a pbuf memory leak. You could also try to allocate more
>>> pbufs, if you have the available memory.
>>>
>>> If you are using the default lwip configuration, the pbuf memory
>>> allocation is handled by memp.[hc]. It has a fixed number of pbufs
>>> available. The default is 16 pbufs, and can be changed in the configtool
>>> under: [lwIP networking stack/Memory options/Number of memp struct pbufs].
>>>
>>> Alternatively, if you have lots of memory, you could enable the checkbox:
>>> [lwIP networking stack/Memory options/Use malloc for pool allocations].
>>> This bypasses the memp pools and their static limitations, though it will
>>> make it harder to spot a pbuf memory leak. I haven't tried this personally.
>>>
>>> Finally, (when using memp) the pbuf usage can be monitored with
>>> lwip/stats.h. If you have access to a serial port, try calling
>>> stats_display(). Here is a snippet of the pbuf related output:
>>>
>>>>  MEM PBUF_POOL
>>>>          avail: 16
>>>>          used: 0
>>>>          max: 3
>>>>          err: 0
>>>
>>> The "err" counter increases when pbuf_alloc() fails.
>>>
>>> Hope that helps,
>>>
>>> Regards,
>>>
>>> Michael O'Dowd
>>> Kuantic SAS
>>>
>>>
>>> On 12/06/2012 22:40, Elad Yosef wrote:
>>>>
>>>> Hi all,
>>>> I'm using the LwIP stack on my target and experiencing crashes under
>>>> stress.
>>>>
>>>> The function eth_drv_recv() from ../io/eth/v3_0/ser/lwip/eth_drv.c
>>>> calls pbuf_alloc() and this allocation fails.
>>>>
>>>> Is this the result of some bad configuration?
>>>>
>>>> Thanks
>>>> Elad
>>>>
>

--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
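
[Illustration of point 1 in Michael's first reply: when no buffer can be allocated, the glue layer can still call the driver's recv() so that the frame is discarded and the hardware buffer is released. This is a sketch under stated assumptions, not the eCos eth_drv.c code. The types demo_sc, demo_funs and sg_entry, the mock driver demo_hw_recv() and the allocator demo_alloc() are hypothetical stand-ins. Treating an empty scatter-gather list as "discard" is also an assumption, so check how your driver handles that case.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct sg_entry { unsigned char *buf; int len; };

struct demo_sc;
struct demo_funs {
    void (*recv)(struct demo_sc *sc, struct sg_entry *sg, int sg_len);
};
struct demo_sc {
    const struct demo_funs *funs;
    int hw_pending;   /* frames still held in hardware receive buffers */
    int delivered;    /* frames copied out to the stack */
    int dropped;      /* frames discarded for lack of a buffer */
};

static unsigned char frame_buf[1536];
int alloc_should_fail;            /* test hook: simulate an exhausted pool */

/* Stand-in for pbuf_alloc(): returns NULL when the pool is "empty". */
static unsigned char *demo_alloc(int len)
{
    if (alloc_should_fail || len < 0 || (size_t)len > sizeof frame_buf)
        return NULL;
    return frame_buf;
}

/* Mock driver recv(): counts the frame as delivered or dropped, and
 * releases the hardware buffer in both cases. */
static void demo_hw_recv(struct demo_sc *sc, struct sg_entry *sg, int sg_len)
{
    (void)sg;
    if (sg_len == 0)
        sc->dropped++;
    else
        sc->delivered++;
    sc->hw_pending--;
}

const struct demo_funs demo_funs_table = { demo_hw_recv };

/* Glue-layer receive: the driver's recv() is called even when the
 * allocation fails, with an empty list meaning "discard this frame". */
void demo_eth_recv(struct demo_sc *sc, int total_len)
{
    struct sg_entry sg[1];
    int sg_len = 0;
    unsigned char *p = demo_alloc(total_len);

    if (p != NULL) {
        sg[0].buf = p;
        sg[0].len = total_len;
        sg_len = 1;
    } else {
        sg[0].buf = NULL;
        sg[0].len = 0;
    }
    sc->funs->recv(sc, sg, sg_len);   /* never skipped */
}
```

[With this shape, a failed allocation costs one dropped frame instead of leaving the driver waiting on a receive buffer that is never released.]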