* [ECOS] Re: accept() FreeBSD hangs when out of resources
From: Tad @ 2007-06-11 23:15 UTC
To: ecos-discuss
>> accept() won't return and won't timeout (>12hrs) when listen() indicates
>> a new connection, if out of sockets/file-descriptors and all TCP
>> connections are in ESTABLISHED state.
>
> Where exactly is it blocked. Please could you provide a call stack.

I couldn't see why it would hang either, Andrew, but it seems to do so reliably.
I wish I could help more; I have already put in 20 hours of digging. My system
doesn't have any gdb or printf capabilities. I think I gave enough of a
reproduction scenario for someone with gdb capabilities to take it further.
--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss

* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
From: Andrew Lunn @ 2007-06-12 3:51 UTC
To: Tad; +Cc: ecos-discuss

On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
> >>accept() won't return and won't timeout (>12hrs) when listen() indicates
> >>a new connection, if out of sockets/file-descriptors and all TCP
> >>connections are in ESTABLISHED state.
> >
> >Where exactly is it blocked. Please could you provide a call stack.
>
> Couldn't see why it would hang either, Andrew, but seems to reliably.
> Wish I could help more. Submitted 20 hrs of digging. My system doesn't
> have any gdb or printf capabilities. Think I gave enough reproduction
> situation for someone with gdb capabilities to take it further.

For situations like this I find working on the synthetic target much
better. You have full gdb support, diag_printf etc.

What I would ideally like is a test case we can add to the standard
tests. The test case should fail now, but once we have fixed the problem
we can keep the test case for regression tests.

        Andrew

* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
From: Tad @ 2007-06-12 3:57 UTC
To: ecos-discuss

Andrew Lunn wrote:
> On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
>
>>>> accept() won't return and won't timeout (>12hrs) when listen() indicates
>>>> a new connection, if out of sockets/file-descriptors and all TCP
>>>> connections are in ESTABLISHED state.
>>>>
>>> Where exactly is it blocked. Please could you provide a call stack.

It's possible that the block is somewhere such as this "FIXME" code that
wasn't finished in sys/kern/sockio.c

    /*
     * At this point we know that there is at least one connection
     * ready to be accepted. Remove it from the queue prior to
     * allocating the file descriptor for it since falloc() may
     * block allowing another process to accept the connection
     * instead.
     */
    so = TAILQ_FIRST(&head->so_comp);
    TAILQ_REMOVE(&head->so_comp, so, so_list);
    head->so_qlen--;

#if 0 // FIXME
    fflag = lfp->f_flag;
    error = falloc(p, &nfp, &fd);
    if (error) {
        /*
         * Probably ran out of file descriptors. Put the
         * unaccepted connection back onto the queue and
         * do another wakeup so some other process might
         * have a chance at it.
         */
        TAILQ_INSERT_HEAD(&head->so_comp, so, so_list);
        head->so_qlen++;
        wakeup_one(&head->so_timeo);
        splx(s);
        goto done;
    }
    fhold(nfp);
    p->p_retval[0] = fd;

    /* connection has been removed from the listen queue */
    KNOTE(&head->so_rcv.sb_sel.si_note, 0);
#endif

    so->so_state &= ~SS_COMP;
    so->so_head = NULL;

    cyg_selinit(&so->so_rcv.sb_sel);
    cyg_selinit(&so->so_snd.sb_sel);

    new_fp->f_type = DTYPE_SOCKET;
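
The disabled block quoted above follows a "dequeue first, put it back if descriptor allocation fails" pattern. A minimal host-side sketch of that pattern, using the standard <sys/queue.h> TAILQ macros, might look like the following. The names struct conn, struct listener, struct fd_pool, fd_alloc() and accept_one() are hypothetical and exist only for this illustration; this is not the eCos code and says nothing about where the reported hang actually occurs.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a completed connection (not the eCos struct socket). */
struct conn {
    int id;
    TAILQ_ENTRY(conn) link;
};
TAILQ_HEAD(conn_queue, conn);

/* Hypothetical listener: queue of completed connections plus its length. */
struct listener {
    struct conn_queue done;
    int qlen;
};

/* Hypothetical descriptor pool: hands out slots until none remain. */
struct fd_pool {
    int free_slots;
};

/* Returns 0 on success, -1 when the pool is out of descriptors. */
int fd_alloc(struct fd_pool *pool)
{
    if (pool->free_slots == 0)
        return -1;
    pool->free_slots--;
    return 0;
}

/*
 * Take the first completed connection off the queue, then try to get a
 * descriptor for it. If that fails, put the connection back at the head
 * so a later attempt can still accept it.
 * Returns the connection, or NULL if the queue is empty or no descriptor
 * was available.
 */
struct conn *accept_one(struct listener *l, struct fd_pool *pool)
{
    struct conn *c = TAILQ_FIRST(&l->done);

    if (c == NULL)
        return NULL;

    TAILQ_REMOVE(&l->done, c, link);
    l->qlen--;
    assert(l->qlen >= 0);

    if (fd_alloc(pool) != 0) {
        /* Out of descriptors: restore the queue exactly as it was. */
        TAILQ_INSERT_HEAD(&l->done, c, link);
        l->qlen++;
        return NULL;
    }
    return c;
}
```

The point of re-inserting at the head rather than the tail is that the connection keeps its place in line, so a failed attempt does not reorder pending connections.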

* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
From: Andrew Lunn @ 2007-06-12 6:54 UTC
To: Tad; +Cc: eCos Disuss

On Mon, Jun 11, 2007 at 04:05:57PM -0800, Tad wrote:
> Andrew Lunn wrote:
> >On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
> >
> >>>>accept() won't return and won't timeout (>12hrs) when listen()
> >>>>indicates a new connection, if out of sockets/file-descriptors and all
> >>>>TCP connections are in ESTABLISHED state.
> >>>>
> >>>Where exactly is it blocked. Please could you provide a call stack.
>
> It's possible that the block is somewhere such as this "FIXME" code that
> wasn't finished in sys/kern/sockio.c

Yes, I already looked at this code. However, this code is creating a new
file descriptor, whereas the way eCos works is that the file descriptor
has already been allocated and is passed into the function as a
parameter. So I went back and looked at what calls this function and
where the file descriptor is allocated. That code does appear to
correctly handle insufficient resources.

So I really need more information, e.g. the test case, or a backtrace
taken when the thread is blocked.

        Andrew

* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
From: Tad @ 2007-06-12 15:37 UTC
To: eCos Disuss

Andrew Lunn wrote:
> On Mon, Jun 11, 2007 at 04:05:57PM -0800, Tad wrote:
>
>> Andrew Lunn wrote:
>>
>>> On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
>>>
>>>>>> accept() won't return and won't timeout (>12hrs) when listen()
>>>>>> indicates a new connection, if out of sockets/file-descriptors and all
>>>>>> TCP connections are in ESTABLISHED state.
>>>>>>
>>>>> Where exactly is it blocked. Please could you provide a call stack.
>>>>>

More info: the hang seems to depend on CYGNUM_FILEIO_NFILE rather than
CYGPKG_NET_MAXSOCKETS. Reducing NFILE below MAXSOCKETS causes accept() to
hang with fewer established connections than before the reduction.

* RE: [ECOS] Re: accept() FreeBSD hangs when out of resources
From: Lars Povlsen @ 2007-06-12 15:49 UTC
To: eCos Disuss

This seems a lot like the problem I've seen, and reported on 17/4-07.

I've occasionally been able to reproduce it manually with a browser
(MSIE), but enabling TCP debug logging makes the problem go away (it
does not occur).

AFAICS, it is a race condition in the TCP stack causing socket buffers
to be leaked (forever). Calling cyg_kmem_print_stats() displays the
problem (but you need a reset to recover :-() :

Network stack mbuf stats:
   mbufs 97, clusters 60, free clusters 1
   Failed to get 0 times
   Waited to get 0 times
   Drained queues to get 0 times
VM zone 'ripcb':
   Total: 64, Free: 64, Allocs: 0, Frees: 0, Fails: 0
VM zone 'tcpcb':
   Total: 64, Free: 61, Allocs: 353, Frees: 350, Fails: 0
VM zone 'udpcb':
   Total: 64, Free: 63, Allocs: 4, Frees: 3, Fails: 0
VM zone 'socket':
   Total: 64, *Free: 0*, Allocs: 365, Frees: 293, Fails: 8
Misc mpool: total 98304, free 4192, max free block 3748
Mbufs pool: total 81792, free 69248, blocksize 128
Clust pool: total 163840, free 38912, blocksize 2048

FWIW, I have not had time to dig into this (as my attempts to produce a
test bench have failed...)

---Lars

-----Original Message-----
From: ecos-discuss-owner@ecos.sourceware.org
[mailto:ecos-discuss-owner@ecos.sourceware.org] On Behalf Of Tad
Sent: 12. juni 2007 14:05
To: eCos Disuss
Subject: Re: [ECOS] Re: accept() FreeBSD hangs when out of resources

Andrew Lunn wrote:
> On Mon, Jun 11, 2007 at 04:05:57PM -0800, Tad wrote:
>
>> Andrew Lunn wrote:
>>
>>> On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
>>>
>>>>>> accept() won't return and won't timeout (>12hrs) when listen()
>>>>>> indicates a new connection, if out of sockets/file-descriptors and all
>>>>>> TCP connections are in ESTABLISHED state.
>>>>>>
>>>>> Where exactly is it blocked. Please could you provide a call stack.
>>>>>
more info. seems to be dependent on CYGNUM_FILEIO_NFILE rather than
CYGPKG_NET_MAXSOCKETS. reducing NFILE < MAXSOCKETS causes accept to hang
with fewer established connections than before reduction.
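
The symptom Lars points at in the dump above is the 'socket' zone line: nothing free and a non-zero failure count. As a rough illustration only, a host-side check for that signature could be sketched as below. The struct and function names (zone_stats, zone_parse, zone_exhausted) are made up for this example, the parser assumes the plain line format without the asterisks Lars added for emphasis, and none of this is part of eCos.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counters from one "VM zone" line of the dump (field names are this sketch's own). */
struct zone_stats {
    int total, free, allocs, frees, fails;
};

/* Parse "Total: N, Free: N, Allocs: N, Frees: N, Fails: N"; true on success. */
bool zone_parse(const char *line, struct zone_stats *z)
{
    return sscanf(line, " Total: %d, Free: %d, Allocs: %d, Frees: %d, Fails: %d",
                  &z->total, &z->free, &z->allocs, &z->frees, &z->fails) == 5;
}

/* A zone counts as exhausted when nothing is free and an allocation has failed. */
bool zone_exhausted(const struct zone_stats *z)
{
    assert(z != NULL);
    return z->free == 0 && z->fails > 0;
}
```

Fed the 'socket' line from the dump, zone_exhausted() reports true; fed the 'tcpcb' line, it reports false.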

* [ECOS] listen (x, 0) on new TCP incoming connections doesn't stop select()/accept()
From: Tad @ 2007-06-12 16:08 UTC
To: eCos Disuss

FWIW, as far as I can tell, and without fully understanding the
internals of listen():

It appears that attempting to stop accepting incoming TCP connections by
calling listen (x, backlog=0) (after an initial listen (x, >0), if that
is relevant) doesn't stop incoming TCP SYN connection requests from
being ACK'd and appearing in select() and accept(). I thought it should
have.
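
For what it's worth, POSIX says a backlog of 0 may still allow the socket to accept connections, with the queue length set to an implementation-defined minimum, so listen(x, 0) is not a portable way to refuse connections. One alternative is to close the listening socket and reopen it later. The sketch below shows that on a hosted POSIX system using loopback; the helper names (open_listener, listener_port, stop_listening, can_connect) are invented for the example, and whether the eCos stack behaves the same way is an assumption that would need checking on a target.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in a loopback address for the given port (host byte order). */
static void loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(port);
}

/* Open a TCP listener on 127.0.0.1:port (0 = any free port); returns fd or -1. */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, backlog) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Port the listener is bound to, in host byte order; 0 on error. */
unsigned short listener_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    if (getsockname(fd, (struct sockaddr *)&addr, &len) != 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Stop taking new connections by closing the listener; returns -1 as the new "fd". */
int stop_listening(int fd)
{
    assert(fd >= 0);
    close(fd);
    return -1;
}

/* Try one loopback connect; returns 1 if it connected, 0 otherwise. */
int can_connect(unsigned short port)
{
    struct sockaddr_in addr;
    int ok;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return 0;
    loopback_addr(&addr, port);
    ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok;
}
```

Already established connections are unaffected by closing the listener; only new connection attempts are refused until open_listener() is called again.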

* Re: [ECOS] Re: Re: accept() FreeBSD hangs when out of resources
From: Tad @ 2007-06-12 4:05 UTC
To: ecos-discuss

Andrew Lunn wrote:
> What i would ideally like is a test case we can add to the standard
> tests. The test case should fail now, but once we have fix the problem
> we can keep the test case for regression tests.

16 rapid HTTP POSTs to any ATHTTP server compiled with 16 max sockets
should lock the server up forever (as long as they arrive within
CYG_HTTPD_SOCKET_IDLE_TIMEOUT (300) secs, so the TCP conns stay in
ESTABLISHED rather than TIME_WAIT).

FWIW, I raised both of the max file limits (NFD, NFILES?) while keeping
MAX_SOCKETS the same, with no change.

* Re: [ECOS] Re: Re: accept() FreeBSD hangs when out of resources
From: Andrew Lunn @ 2007-06-12 11:06 UTC
To: Tad; +Cc: ecos-discuss

On Mon, Jun 11, 2007 at 04:14:45PM -0800, Tad wrote:
> Andrew Lunn wrote:
> >What i would ideally like is a test case we can add to the standard
> >tests. The test case should fail now, but once we have fix the problem
> >we can keep the test case for regression tests.
>
> 16 rapid http POSTS to any ATHTTP server compiled with 16 max sockets
> should lock the server up forever (as long as they're
> <CYG_HTTPD_SOCKET_IDLE_TIMEOUT(300) secs so the TCP conns stay in
> ESTABLISHED rather than TIME_WAIT)

So, maybe you can modify the server test case, assuming there is one,
by adding a new thread which makes 16 connections to 127.0.0.1:80.

        Andrew

* Re: [ECOS] Re: Re: accept() FreeBSD hangs when out of resources
From: Andrew Lunn @ 2007-06-12 11:19 UTC
To: Tad; +Cc: eCos Disuss

On Tue, Jun 12, 2007 at 05:51:04AM +0200, Andrew Lunn wrote:
> On Mon, Jun 11, 2007 at 04:14:45PM -0800, Tad wrote:
> > Andrew Lunn wrote:
> > >What i would ideally like is a test case we can add to the standard
> > >tests. The test case should fail now, but once we have fix the problem
> > >we can keep the test case for regression tests.
> >
> > 16 rapid http POSTS to any ATHTTP server compiled with 16 max sockets
> > should lock the server up forever (as long as they're
> > <CYG_HTTPD_SOCKET_IDLE_TIMEOUT(300) secs so the TCP conns stay in
> > ESTABLISHED rather than TIME_WAIT)
>
> So, maybe you can modify the server test case, assuming there is one,
> by adding a new thread which makes 16 connections to 127.0.0.1:80.

Actually, tcp_lo_test.c probably has 90% of the code you need for
writing a much simpler test case.

        Andrew
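
As a starting point for the kind of loopback test discussed above, here is a minimal sketch written against plain POSIX sockets on a hosted system. It is not taken from tcp_lo_test.c and does not use the eCos test infrastructure. It also differs from Andrew's suggestion in two ways that are choices of this sketch: it runs in a single thread with connect and accept interleaved, and it binds an ephemeral port instead of port 80. The names NUM_CLIENTS, accept_with_timeout() and run_loopback_accepts() are hypothetical. Note that select() only bounds the wait for a pending connection; on a stack with the bug described in this thread, accept() itself could still block.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define NUM_CLIENTS 16   /* from the thread: 16 connections against 16 max sockets */

/* Wait up to `seconds` for a pending connection, then accept it.
 * Returns the new fd, -1 on error, -2 on timeout. */
int accept_with_timeout(int lfd, int seconds)
{
    fd_set rd;
    struct timeval tv;
    int n;

    assert(lfd >= 0 && lfd < FD_SETSIZE);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    FD_ZERO(&rd);
    FD_SET(lfd, &rd);
    n = select(lfd + 1, &rd, NULL, NULL, &tv);
    if (n == 0)
        return -2;
    if (n < 0)
        return -1;
    return accept(lfd, NULL, NULL);
}

/* Open a loopback listener, connect up to `count` clients and accept each,
 * keeping every connection open (ESTABLISHED) until the end.
 * Returns the number of accepted connections, or -1 if setup failed. */
int run_loopback_accepts(int count)
{
    int clients[NUM_CLIENTS], servers[NUM_CLIENTS];
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int accepted = 0, opened = 0, i, lfd;

    if (count > NUM_CLIENTS)
        count = NUM_CLIENTS;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the stack pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, count) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) != 0) {
        close(lfd);
        return -1;
    }

    for (i = 0; i < count; i++) {
        int s;

        clients[opened] = socket(AF_INET, SOCK_STREAM, 0);
        if (clients[opened] < 0)
            break;
        opened++;
        if (connect(clients[opened - 1], (struct sockaddr *)&addr, sizeof addr) != 0)
            break;
        s = accept_with_timeout(lfd, 5);
        if (s < 0)
            break;                        /* error or timeout: stop and report */
        servers[accepted++] = s;
    }

    for (i = 0; i < opened; i++)
        close(clients[i]);
    for (i = 0; i < accepted; i++)
        close(servers[i]);
    close(lfd);
    return accepted;
}
```

A regression test built on this idea would compare the return value against the number of connections the configured socket limit should allow, and treat a timeout or a hang as a failure.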
[parent not found: <466F2FC7.8060704@ds3switch.com>]

* [ECOS] "Fix" for atHTTP and HTTP socket requirements with mozilla POSTS
From: Tad @ 2007-06-13 0:09 UTC
To: ecos-discuss

"Fix" for hanging atHTTP client requests on out-of-sockets.

Background:
It's somewhat known that atHTTP will "pause" for several minutes when
running out of sockets. One reason this can happen is that Mozilla
opens a new TCP connection for each POST or chunked-transfer (I think)
GET, and each one requires a new socket. The leftover sockets are
eventually (300 sec default) shut down by atHTTP and then enter the TCP
TIME_WAIT state, which is 2xMSL or something like another 2-4 minutes
-- but that's a long time.

BTW, this assumes you don't hit the bug where accept() hangs when out
of sockets. Look for a solution to that in a couple of days.

"Solution:"
Mozilla (on XP) appears to get smart after opening about 10 TCP
connections, and starts FIN,ACK-ing them to shut them down, so atHTTP
doesn't have to sit in TIME_WAIT or wait for the atHTTP timeout on the
old connections.

So, setting CYGPKG_NET_MAXSOCKETS to something > 10x the number of
users, plus a couple for sockets that other eCos apps have open, will
allow "unlimited" Mozilla (XP) client requests without any timeouts.
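
The sizing rule above is simple arithmetic, sketched here for concreteness. The function names are hypothetical, the figure of 10 connections per user is the observation reported in the post rather than a documented browser limit, and the MSL value used in any calculation is an assumption (the post only estimates TIME_WAIT at roughly 2-4 minutes).

```c
#include <assert.h>

/* Smallest socket count strictly greater than
 * users * conns_per_user + other_sockets, per the rule of thumb above. */
int min_max_sockets(int users, int conns_per_user, int other_sockets)
{
    assert(users >= 0 && conns_per_user >= 0 && other_sockets >= 0);
    return users * conns_per_user + other_sockets + 1;
}

/* Worst-case seconds a leftover connection ties up a socket:
 * the server's idle timeout, followed by TIME_WAIT at 2 x MSL. */
int socket_hold_seconds(int idle_timeout_s, int msl_s)
{
    assert(idle_timeout_s >= 0 && msl_s >= 0);
    return idle_timeout_s + 2 * msl_s;
}
```

For example, two users at 10 connections each plus two sockets for other applications gives min_max_sockets(2, 10, 2) = 23, and with the 300 second idle timeout and an assumed MSL of 60 seconds, socket_hold_seconds(300, 60) = 420.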