* [ECOS] Re: accept() FreeBSD hangs when out of resources
@ 2007-06-11 23:15 Tad
2007-06-12 3:51 ` Andrew Lunn
0 siblings, 1 reply; 11+ messages in thread
From: Tad @ 2007-06-11 23:15 UTC (permalink / raw)
To: ecos-discuss
>> accept() won't return and won't timeout (>12hrs) when listen() indicates
>> a new connection, if out of sockets/file-descriptors and all TCP
>> connections are in ESTABLISHED state.
>
> Where exactly is it blocked. Please could you provide a call stack.
Couldn't see why it would hang either, Andrew, but it seems to do so reliably.
Wish I could help more. I've submitted the results of 20 hrs of digging. My
system doesn't have any gdb or printf capabilities. I think I gave enough of a
reproduction scenario for someone with gdb capabilities to take it further.
--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
2007-06-11 23:15 [ECOS] Re: accept() FreeBSD hangs when out of resources Tad
@ 2007-06-12 3:51 ` Andrew Lunn
2007-06-12 3:57 ` Tad
2007-06-12 4:05 ` [ECOS] Re: Re: accept() FreeBSD hangs when out of resources Tad
0 siblings, 2 replies; 11+ messages in thread
From: Andrew Lunn @ 2007-06-12 3:51 UTC (permalink / raw)
To: Tad; +Cc: ecos-discuss
On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
> >>accept() won't return and won't timeout (>12hrs) when listen() indicates
> >>a new connection, if out of sockets/file-descriptors and all TCP
> >>connections are in ESTABLISHED state.
> >
> >Where exactly is it blocked. Please could you provide a call stack.
>
> Couldn't see why it would hang either, Andrew, but seems to reliably.
>
> Wish I could help more. Submitted 20 hrs of digging. My system doesn't
> have any gdb or printf capablities. Think I gave enough reproduction
> situation for someone with gdb capabilities to take it further.
For situations like this I find working on the synthetic target much
better. You have full gdb support, diag_printf etc.
What I would ideally like is a test case we can add to the standard
tests. The test case should fail now, but once we have fixed the problem
we can keep the test case for regression tests.
Andrew
* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
2007-06-12 3:51 ` Andrew Lunn
@ 2007-06-12 3:57 ` Tad
2007-06-12 6:54 ` Andrew Lunn
2007-06-12 4:05 ` [ECOS] Re: Re: accept() FreeBSD hangs when out of resources Tad
1 sibling, 1 reply; 11+ messages in thread
From: Tad @ 2007-06-12 3:57 UTC (permalink / raw)
To: ecos-discuss
Andrew Lunn wrote:
> On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
>
>>>> accept() won't return and won't timeout (>12hrs) when listen() indicates
>>>> a new connection, if out of sockets/file-descriptors and all TCP
>>>> connections are in ESTABLISHED state.
>>>>
>>> Where exactly is it blocked. Please could you provide a call stack.
It's possible that the block is somewhere around this unfinished "FIXME"
code in sys/kern/sockio.c:
    /*
     * At this point we know that there is at least one connection
     * ready to be accepted. Remove it from the queue prior to
     * allocating the file descriptor for it since falloc() may
     * block allowing another process to accept the connection
     * instead.
     */
    so = TAILQ_FIRST(&head->so_comp);
    TAILQ_REMOVE(&head->so_comp, so, so_list);
    head->so_qlen--;

#if 0 // FIXME
    fflag = lfp->f_flag;
    error = falloc(p, &nfp, &fd);
    if (error) {
        /*
         * Probably ran out of file descriptors. Put the
         * unaccepted connection back onto the queue and
         * do another wakeup so some other process might
         * have a chance at it.
         */
        TAILQ_INSERT_HEAD(&head->so_comp, so, so_list);
        head->so_qlen++;
        wakeup_one(&head->so_timeo);
        splx(s);
        goto done;
    }
    fhold(nfp);
    p->p_retval[0] = fd;

    /* connection has been removed from the listen queue */
    KNOTE(&head->so_rcv.sb_sel.si_note, 0);
#endif

    so->so_state &= ~SS_COMP;
    so->so_head = NULL;

    cyg_selinit(&so->so_rcv.sb_sel);
    cyg_selinit(&so->so_snd.sb_sel);

    new_fp->f_type = DTYPE_SOCKET;
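
For illustration only, the pattern in the disabled block (dequeue the completed connection, try to allocate a descriptor, and put the connection back at the head of the queue if that fails) can be modelled in plain user-space C. Everything below is hypothetical: `struct conn`, `struct listener`, `alloc_fd()` and `accept_one()` are stand-ins invented for this sketch, and it only models the queue handling. It is not the eCos or FreeBSD code and says nothing about where the real hang occurs.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
struct conn {
    int id;
    TAILQ_ENTRY(conn) link;
};

TAILQ_HEAD(conn_queue, conn);

struct listener {
    struct conn_queue completed;   /* plays the role of so_comp */
    int qlen;                      /* plays the role of so_qlen */
    int fds_free;                  /* descriptors still available */
};

/* Hypothetical allocator: returns 0 on success, -1 when none are left. */
static int alloc_fd(struct listener *l)
{
    if (l->fds_free <= 0)
        return -1;
    l->fds_free--;
    return 0;
}

/*
 * Take one completed connection off the queue.  If no descriptor can be
 * allocated, put the connection back at the head so it is not lost, and
 * report the failure to the caller instead of blocking.
 */
static struct conn *accept_one(struct listener *l)
{
    struct conn *c = TAILQ_FIRST(&l->completed);

    if (c == NULL)
        return NULL;               /* nothing pending */

    TAILQ_REMOVE(&l->completed, c, link);
    l->qlen--;

    if (alloc_fd(l) != 0) {
        TAILQ_INSERT_HEAD(&l->completed, c, link);
        l->qlen++;
        return NULL;               /* caller sees the failure */
    }
    return c;
}
```

The point of the sketch is that the failure path leaves the queue exactly as it found it, so a later call can still pick up the same connection once a descriptor has been freed.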
* Re: [ECOS] Re: Re: accept() FreeBSD hangs when out of resources
2007-06-12 3:51 ` Andrew Lunn
2007-06-12 3:57 ` Tad
@ 2007-06-12 4:05 ` Tad
2007-06-12 11:06 ` Andrew Lunn
1 sibling, 1 reply; 11+ messages in thread
From: Tad @ 2007-06-12 4:05 UTC (permalink / raw)
To: ecos-discuss
Andrew Lunn wrote:
> What i would ideally like is a test case we can add to the standard
> tests. The test case should fail now, but once we have fix the problem
> we can keep the test case for regression tests.
16 rapid HTTP POSTs to any ATHTTP server compiled with 16 max sockets
should lock the server up forever (as long as they arrive within
CYG_HTTPD_SOCKET_IDLE_TIMEOUT (300) secs, so the TCP conns stay in
ESTABLISHED rather than TIME_WAIT).
FWIW, I raised both of the max file limits (NFD, NFILES?) while keeping
MAX_SOCKETS the same, with no change.
* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
2007-06-12 3:57 ` Tad
@ 2007-06-12 6:54 ` Andrew Lunn
2007-06-12 15:37 ` Tad
2007-06-12 16:08 ` [ECOS] listen (x, 0) on new TCP incoming connections doesn't stop select()/accept() Tad
0 siblings, 2 replies; 11+ messages in thread
From: Andrew Lunn @ 2007-06-12 6:54 UTC (permalink / raw)
To: Tad; +Cc: eCos Disuss
On Mon, Jun 11, 2007 at 04:05:57PM -0800, Tad wrote:
> Andrew Lunn wrote:
> >On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
> >
> >>>>accept() won't return and won't timeout (>12hrs) when listen()
> >>>>indicates a new connection, if out of sockets/file-descriptors and all
> >>>>TCP connections are in ESTABLISHED state.
> >>>>
> >>>Where exactly is it blocked. Please could you provide a call stack.
>
> It's possible that the block is somewhere such as this "FIXME" code that
> wasn't finished in sys/kern/sockio.c
Yes, I already looked at this code. However, this code is creating a
new file descriptor, whereas in eCos the file descriptor has already
been allocated and is passed into the function as a parameter. So I
went back and looked at what calls this function and where the file
descriptor is allocated. That code does appear to handle insufficient
resources correctly.
So I really need more information, e.g. the test case, or a backtrace
taken while the thread is blocked.
Andrew
* Re: [ECOS] Re: Re: accept() FreeBSD hangs when out of resources
2007-06-12 4:05 ` [ECOS] Re: Re: accept() FreeBSD hangs when out of resources Tad
@ 2007-06-12 11:06 ` Andrew Lunn
2007-06-12 11:19 ` Andrew Lunn
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2007-06-12 11:06 UTC (permalink / raw)
To: Tad; +Cc: ecos-discuss
On Mon, Jun 11, 2007 at 04:14:45PM -0800, Tad wrote:
> Andrew Lunn wrote:
> >What i would ideally like is a test case we can add to the standard
> >tests. The test case should fail now, but once we have fix the problem
> >we can keep the test case for regression tests.
>
> 16 rapid http POSTS to any ATHTTP server compiled with 16 max sockets
> should lock the server up forever (as long as they're
> <CYG_HTTPD_SOCKET_IDLE_TIMEOUT(300) secs so the TCP conns stay in
> ESTABLISHED rather than TIMED_WAIT)
So, maybe you can modify the server test case, assuming there is one,
by adding a new thread which makes 16 connections to 127.0.0.1:80.
Andrew
* Re: [ECOS] Re: Re: accept() FreeBSD hangs when out of resources
2007-06-12 11:06 ` Andrew Lunn
@ 2007-06-12 11:19 ` Andrew Lunn
[not found] ` <466F2FC7.8060704@ds3switch.com>
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2007-06-12 11:19 UTC (permalink / raw)
To: Tad; +Cc: eCos Disuss
On Tue, Jun 12, 2007 at 05:51:04AM +0200, Andrew Lunn wrote:
> On Mon, Jun 11, 2007 at 04:14:45PM -0800, Tad wrote:
> > Andrew Lunn wrote:
> > >What i would ideally like is a test case we can add to the standard
> > >tests. The test case should fail now, but once we have fix the problem
> > >we can keep the test case for regression tests.
> >
> > 16 rapid http POSTS to any ATHTTP server compiled with 16 max sockets
> > should lock the server up forever (as long as they're
> > <CYG_HTTPD_SOCKET_IDLE_TIMEOUT(300) secs so the TCP conns stay in
> > ESTABLISHED rather than TIMED_WAIT)
>
> So, maybe you can modify the server test case, assuming there is one,
> by adding a new thread which makes 16 connections to 127.0.0.1:80.
Actually, tcp_lo_test.c probably has 90% of the code you need for
writing a much simpler test case.
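
A rough sketch of the client side of such a test, under stated assumptions: it uses only the standard BSD/POSIX socket calls, the helper names `open_loopback_conns()` and `close_conns()` are hypothetical, and nothing here is taken from tcp_lo_test.c. On the eCos BSD stack the `sin_len` field of `struct sockaddr_in` may also need to be set; that is not shown.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open up to `count` TCP connections to 127.0.0.1:`port` and keep them
 * open, storing the descriptors in `fds`.  Returns how many connections
 * were established.  Hypothetical helper, not part of tcp_lo_test.c.
 */
static int open_loopback_conns(unsigned short port, int *fds, int count)
{
    struct sockaddr_in addr;
    int n = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    while (n < count) {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0)
            break;                  /* out of sockets or descriptors */
        if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(s);
            break;
        }
        fds[n++] = s;               /* hold the connection open */
    }
    return n;
}

/* Close everything opened by open_loopback_conns(). */
static void close_conns(const int *fds, int n)
{
    for (int i = 0; i < n; i++)
        close(fds[i]);
}
```

A test thread could call `open_loopback_conns(80, fds, 16)` against the server under test, hold the connections for less than the idle timeout, and then check whether the server thread still returns from accept().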
Andrew
* Re: [ECOS] Re: accept() FreeBSD hangs when out of resources
2007-06-12 6:54 ` Andrew Lunn
@ 2007-06-12 15:37 ` Tad
2007-06-12 15:49 ` Lars Povlsen
2007-06-12 16:08 ` [ECOS] listen (x, 0) on new TCP incoming connections doesn't stop select()/accept() Tad
1 sibling, 1 reply; 11+ messages in thread
From: Tad @ 2007-06-12 15:37 UTC (permalink / raw)
To: eCos Disuss
Andrew Lunn wrote:
> On Mon, Jun 11, 2007 at 04:05:57PM -0800, Tad wrote:
>
>> Andrew Lunn wrote:
>>
>>> On Mon, Jun 11, 2007 at 03:42:07PM -0800, Tad wrote:
>>>
>>>
>>>>>> accept() won't return and won't timeout (>12hrs) when listen()
>>>>>> indicates a new connection, if out of sockets/file-descriptors and all
>>>>>> TCP connections are in ESTABLISHED state.
>>>>>>
>>>>>>
>>>>> Where exactly is it blocked. Please could you provide a call stack.
>>>>>
More info:
The hang seems to depend on CYGNUM_FILEIO_NFILE rather than
CYGPKG_NET_MAXSOCKETS. Reducing NFILE below MAXSOCKETS causes accept() to
hang with fewer established connections than before the reduction.
* RE: [ECOS] Re: accept() FreeBSD hangs when out of resources
2007-06-12 15:37 ` Tad
@ 2007-06-12 15:49 ` Lars Povlsen
0 siblings, 0 replies; 11+ messages in thread
From: Lars Povlsen @ 2007-06-12 15:49 UTC (permalink / raw)
To: eCos Disuss
This seems a lot like the problem I've seen and reported on 17/4-07.
I've been able to reproduce it occasionally by hand with a browser
(MSIE), but enabling TCP debug logging makes the problem go away.
AFAICS, it is a race condition in the TCP stack that causes socket buffers
to be leaked (forever). Calling cyg_kmem_print_stats() displays the
problem (but you need a reset to recover :-() :
Network stack mbuf stats:
   mbufs 97, clusters 60, free clusters 1
   Failed to get 0 times
   Waited to get 0 times
   Drained queues to get 0 times
VM zone 'ripcb':
   Total: 64, Free: 64, Allocs: 0, Frees: 0, Fails: 0
VM zone 'tcpcb':
   Total: 64, Free: 61, Allocs: 353, Frees: 350, Fails: 0
VM zone 'udpcb':
   Total: 64, Free: 63, Allocs: 4, Frees: 3, Fails: 0
VM zone 'socket':
   Total: 64, *Free: 0*, Allocs: 365, Frees: 293, Fails: 8
Misc mpool: total 98304, free 4192, max free block 3748
Mbufs pool: total 81792, free 69248, blocksize 128
Clust pool: total 163840, free 38912, blocksize 2048
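
As a small aid for reading output like this, here is a hedged sketch that parses one "Total/Free/Allocs/Frees/Fails" line and flags an exhausted zone. The names `struct zone_stats`, `parse_zone_line()` and `zone_exhausted()` are hypothetical, the line format is assumed from the listing above, and the asterisks around "Free: 0" are assumed to be emphasis added in the mail rather than part of the real output.

```c
#include <stdio.h>

/* Counters from one "VM zone" line of the statistics output. */
struct zone_stats {
    int total, free_count, allocs, frees, fails;
};

/*
 * Parse one "Total: ..., Free: ..., Allocs: ..., Frees: ..., Fails: ..."
 * line.  Returns 1 on success, 0 if the line does not match.
 */
static int parse_zone_line(const char *line, struct zone_stats *z)
{
    return sscanf(line,
                  " Total: %d, Free: %d, Allocs: %d, Frees: %d, Fails: %d",
                  &z->total, &z->free_count, &z->allocs, &z->frees,
                  &z->fails) == 5;
}

/* A zone is suspicious when nothing is free or allocations have failed. */
static int zone_exhausted(const struct zone_stats *z)
{
    return z->free_count == 0 || z->fails > 0;
}
```

Applied to the listing above, only the 'socket' zone line (Free: 0, Fails: 8) would be flagged; the 'tcpcb' line would not.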
FWIW, I have not had time to dig into this (as my attempts to produce a
test bench have failed...)
---Lars
* [ECOS] listen (x, 0) on new TCP incoming connections doesn't stop select()/accept()
2007-06-12 6:54 ` Andrew Lunn
2007-06-12 15:37 ` Tad
@ 2007-06-12 16:08 ` Tad
1 sibling, 0 replies; 11+ messages in thread
From: Tad @ 2007-06-12 16:08 UTC (permalink / raw)
To: eCos Disuss
FWIW, as far as I can tell, and without fully understanding the internals of
listen():
Attempting to stop accepting incoming TCP connections by calling
listen (x, backlog=0) (after an initial listen (x, >0), if that's
relevant) doesn't stop incoming TCP SYN connection requests from being
ACK'd and showing up in select() and accept(). I thought it should.
* [ECOS] "Fix" for atHTTP and HTTP socket requirements with mozilla POSTS
[not found] ` <466F2FC7.8060704@ds3switch.com>
@ 2007-06-13 0:09 ` Tad
0 siblings, 0 replies; 11+ messages in thread
From: Tad @ 2007-06-13 0:09 UTC (permalink / raw)
To: ecos-discuss
"Fix" for hanging atHTTP client requests on out-of-sockets.
Background:
It's somewhat known that atHTTP will "pause" for several minutes when
it runs out of sockets. One reason this can happen is that Mozilla
opens a new TCP connection for each POST or (I think) chunked-transfer
GET, and each connection requires a new socket. The leftover sockets
are eventually (after 300 sec by default) shut down by atHTTP and then
enter the TCP TIME_WAIT state, which lasts 2xMSL, or something like
another 2-4 minutes, and that's a long time. BTW, this assumes you don't
hit the bug where accept() hangs when out of sockets. A solution for that
should follow in a couple of days.
"Solution:"
Mozilla (on XP) appears to get smart after opening about 10 TCP
connections and starts closing them with FIN,ACK, so atHTTP doesn't
have to wait out TIME_WAIT or the atHTTP idle timeout on the old connections.
So setting CYGPKG_NET_MAXSOCKETS to more than 10x the number of users, plus a
couple for sockets that other eCos apps have open, will allow "unlimited"
Mozilla (XP) client requests without any timeouts.
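
The sizing rule above can be written down as a small helper. This is only a sketch: `min_max_sockets()` and both constants are hypothetical names, "about 10" connections per client and "a couple" of other sockets are taken loosely from the text, and the result is a rule-of-thumb lower bound rather than a guaranteed value.

```c
/* Rule of thumb from the message above; both constants are assumptions. */
#define CONNS_PER_CLIENT  10   /* connections Mozilla opened before closing old ones */
#define OTHER_APP_SOCKETS 2    /* "a couple" for other eCos applications */

/*
 * Smallest socket count that is strictly greater than
 * 10 x users + a couple, as suggested in the text.
 */
static int min_max_sockets(int users)
{
    return CONNS_PER_CLIENT * users + OTHER_APP_SOCKETS + 1;
}
```

For three simultaneous users this gives 33, which would then be the value to consider for CYGPKG_NET_MAXSOCKETS.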