[ECOS] accept() behaviour (out of file descriptors)


* [ECOS] accept() behaviour (out of file descriptors)
@ 2003-09-25 13:36 Christoph Csebits
  2003-09-25 17:28 ` Nick Garnett
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Csebits @ 2003-09-25 13:36 UTC (permalink / raw)
  To: ecos discuss list

hi,

I am a bit confused about how accept() works in eCos.

In eCos, accept() allocates a file descriptor (with a new file
pointer) as soon as it is called, and then blocks until a connection
is established.

When a client connects, a new socket is allocated and
somehow related to the previously allocated file descriptor
(and file pointer).

The socket is returned, and when accept() is called again it
gets a new file descriptor and then blocks again.

Now let us assume there are no file descriptors available.
In that case accept() does not block but returns
-1 (errno=EMFILE) immediately.

accept() "returns" EMFILE even though no client has connected.

Using accept() as shown below results in an endless loop (busy waiting!)
until a file descriptor is freed.

for (;;)
{
   int s;
   if ((s = accept(...)) < 0)
   {
      /* with no free descriptor this fails at once, so the loop spins */
      perror("accept");
   }
   else
   {
      /* handle connection */
   }
}

On Linux, accept() allocates a new file descriptor _only_after_
a connection has been established. If a client connects and
no file descriptor is available, the connection is aborted
("Connection closed by foreign host."),
accept() returns -1 (errno=EMFILE), and when called again
it blocks until the next client wants to connect.

I think this is the right way to handle such a situation.

What do you think about how accept() should behave?

Note that we have implemented a Linux-like accept() for eCos (FreeBSD stack only).
I can send a patch if someone is interested.

regards, Christoph
-- 

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss


* Re: [ECOS] accept() behaviour (out of file descriptors)
  2003-09-25 13:36 [ECOS] accept() behaviour (out of file descriptors) Christoph Csebits
@ 2003-09-25 17:28 ` Nick Garnett
  2003-09-26  7:00   ` Thomas BINDER
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Garnett @ 2003-09-25 17:28 UTC (permalink / raw)
  To: Christoph Csebits; +Cc: ecos discuss list

Christoph Csebits <christoph.csebits@frequentis.com> writes:

> 
> In linux accept() does allocate a new file descriptor _only_after_
> a connection has established. If a client connects and
> no file descriptor is available the connection is aborted.
> ("Connection closed by foreign host.")
> And accept() returns -1 (errno=EMFILE) and when called again
> it blocks until the next client wants to connect.

I'm not convinced that Linux has it right here. It seems unreasonable
to accept the connection if there is insufficient local resource to
complete the operation. BSD seems to go to some effort to put the
pending connection back on the queue if it cannot accept it -- which
seems a better thing to do.

> 
> I think this is the right way to handle such a situation.
> 
> What do you think about how accept() should behave?
> 

Even if we move the allocation of the descriptor to after the call to
the stack's accept routine, we still need to allocate a cyg_file
object before, and exactly the same thing can happen.

Unfortunately, the different division of responsibility between layers
we have in eCos means that we may end up with these minor semantic
differences that we cannot fix. BSD (and probably Linux) has the
layers mixed up together in sys_accept().
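
The ordering described above can be modelled in a few lines of C. This is only a sketch under stated assumptions, not eCos source: DEMO_NFD, demo_fd_alloc(), demo_fd_free(), demo_accept(), demo_stack_accept() and the stack_accept_fn hook are all hypothetical names, and the descriptor table is a plain array. It shows why a full table yields EMFILE before the stack is consulted at all.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NFD 4                 /* hypothetical descriptor table size */

/* Hypothetical stand-in for the stack's accept entry point. */
typedef int (*stack_accept_fn)(void *listener, void **conn);

static int fd_in_use[DEMO_NFD];    /* 1 = slot taken */
int demo_stack_calls;              /* how often the stack was entered */

/* Return the lowest free descriptor, or -1 if the table is full. */
int demo_fd_alloc(void)
{
    for (int fd = 0; fd < DEMO_NFD; fd++) {
        if (!fd_in_use[fd]) {
            fd_in_use[fd] = 1;
            return fd;
        }
    }
    return -1;
}

void demo_fd_free(int fd)
{
    if (fd >= 0 && fd < DEMO_NFD)
        fd_in_use[fd] = 0;
}

/* Toy stack: pretends a connection is always ready. */
int demo_stack_accept(void *listener, void **conn)
{
    demo_stack_calls++;
    *conn = listener;
    return 0;
}

/* Descriptor layer: reserve the descriptor first, then enter the stack,
 * which is the only place that could block. */
int demo_accept(void *listener, stack_accept_fn stack_accept)
{
    void *conn = NULL;
    int fd = demo_fd_alloc();

    if (fd < 0) {
        errno = EMFILE;            /* fails before the stack is asked */
        return -1;
    }
    if (stack_accept(listener, &conn) != 0) {
        demo_fd_free(fd);          /* give the reservation back */
        return -1;
    }
    (void)conn;                    /* a real layer would bind conn to fd */
    return fd;
}
```

Once DEMO_NFD descriptors are handed out, every further demo_accept() call returns -1 at once and demo_stack_calls stops increasing, which is the busy-loop condition from the original report.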

> Note that we implemented a linux-like accept() for eCos (only FreeBSD).
> I can send a patch if someone is interested.
> 

If it is a simple patch then I see no reason not to apply it. While it
won't totally fix the problem, it will at least reduce the number of
surprises by one.

-- 
Nick Garnett                    eCos Kernel Architect
http://www.ecoscentric.com      The eCos and RedBoot experts


* Re: [ECOS] accept() behaviour (out of file descriptors)
  2003-09-25 17:28 ` Nick Garnett
@ 2003-09-26  7:00   ` Thomas BINDER
  2003-09-26 12:01     ` Nick Garnett
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas BINDER @ 2003-09-26  7:00 UTC (permalink / raw)
  To: Nick Garnett; +Cc: Christoph Csebits, ecos discuss list

Nick Garnett wrote:
> 
> Christoph Csebits <christoph.csebits@frequentis.com> writes:
> 
> >
> > In linux accept() does allocate a new file descriptor _only_after_
> > a connection has established. If a client connects and
> > no file descriptor is available the connection is aborted.
> > ("Connection closed by foreign host.")
> > And accept() returns -1 (errno=EMFILE) and when called again
> > it blocks until the next client wants to connect.
> 
> I'm not convinced that Linux has it right here. It seems unreasonable
> to accept the connection if there is insufficient local resource to
> complete the operation. BSD seems to go to some effort to put the
> pending connection back on the queue if it cannot accept it -- which
> seems a better thing to do.

I am not sure what the intended behaviour of accept() should be. The Linux man page claims that the Linux accept conforms to SVr4 *and* BSD. According to this man page accept() blocks *until* a connection is attempted.

As to putting the connection back onto the queue, one must not forget that operating systems like Linux or BSD have several sets of file descriptors (one per process). Therefore, it certainly makes (more) sense there to let another process try to accept the connection.

In my opinion, however, it does not make much sense to run into an endless loop (and consume lots of CPU) when no process has the resources to accept the connection.

> 
> >
> > I think this is the right way to handle such a situation.
> >
> > What do you think about how accept() should behave?
> >
> 
> Even if we move the allocation of the descriptor to after the call to
> the stack's accept routine, we still need to allocate a cyg_file
> object before, and exactly the same thing can happen.

Well, actually no. We allocate the file descriptors *in* bsd_accept, *after* tsleep returns (right at the place where someone put a big FIXME comment :-) ). This ensures that accept blocks until a connection is actually attempted. If no more file descriptors (or files) are available, there is not really much one can do (in eCos).

We found that aborting the connection in that case (the Linux behaviour) is better than using sleeps around successive calls to accept (necessary with eCos). What do you think?
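
For comparison, the "sleep around accept" workaround mentioned above could look roughly like this. It is a minimal sketch assuming a POSIX-style environment where nanosleep() is available; accept_out_of_descriptors(), backoff_sleep() and accept_with_backoff() are hypothetical helper names, not part of eCos or of any patch discussed here.

```c
#define _POSIX_C_SOURCE 200809L    /* for nanosleep() on strict compilers */

#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <time.h>

/* Nonzero when `err` means "out of descriptors or file objects". */
int accept_out_of_descriptors(int err)
{
    return err == EMFILE || err == ENFILE;
}

/* Sleep for `ms` milliseconds so that other threads get to run. */
void backoff_sleep(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Accept one connection. While the descriptor table is exhausted,
 * sleep and retry instead of spinning. Any other error is returned
 * to the caller as -1 with errno left as accept() set it. */
int accept_with_backoff(int listen_fd, long backoff_ms)
{
    for (;;) {
        int s = accept(listen_fd, NULL, NULL);

        if (s >= 0)
            return s;
        if (!accept_out_of_descriptors(errno))
            return -1;
        backoff_sleep(backoff_ms);
    }
}
```

The delay keeps lower-priority threads alive, but it also adds latency for the first client that connects after a descriptor becomes free, which is the trade-off in question.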

best regards,
T.Binder

* Re: [ECOS] accept() behaviour (out of file descriptors)
  2003-09-26  7:00   ` Thomas BINDER
@ 2003-09-26 12:01     ` Nick Garnett
  2003-09-29 10:28       ` Thomas BINDER
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Garnett @ 2003-09-26 12:01 UTC (permalink / raw)
  To: Thomas BINDER; +Cc: Christoph Csebits, ecos discuss list

Thomas BINDER <Thomas.Binder@frequentis.com> writes:

> 
> As to putting the connection back onto the queue, one must not
> forget that Operating Systems like Linux or BSD have several sets of
> filedescriptors (one per process). Therefore, it certainly makes
> (more) sense to let another process try to accept the connection.
>

It is fairly rare for more than one process to be accepting on the
same port. I can only think of some sort of high-throughput load
balanced SMP server that might need to do that. These operating
systems also have much larger numbers of file descriptors per
process. But we can always increase the number of file descriptors if
an application is going to make lots of connections. It's just a
config option.

> In my opinion, however, it does not make much sense to run into an
> endless loop (and consume lots of CPU) in case no process has
> resources to accept the connection.
>

I'll agree that looping is bad. The question is how to stop it in a
clean way. My preferred approach would be to address the source of the
problem and increase the file descriptor table size.

> 
> > 
> > >
> > > I think this is the right way to handle such a situation.
> > >
> > > What do you think about how accept() should behave?
> > >
> > 
> > Even if we move the allocation of the descriptor to after the call to
> > the stack's accept routine, we still need to allocate a cyg_file
> > object before, and exactly the same thing can happen.
> 
> Well, actually no. We allocate the filedescriptors *in* bsd_accept,
> *after* tsleep returns (right at the place where someone put a big
> FIXME comment :-) ). This ensures that accept will block until a
> connection is actually attempted. In case no more filedescriptors
> (or files) are available, there is not really much one can do (in
> eCos).

I don't like that at all. It breaks the layering and would make the
introduction of different network stacks difficult. 

> 
> We found that aborting the connection in that case (the Linux
> behaviour) is better than using sleeps around successive calls to
> accept (necessary with eCos). What do you think?

I suspect that the best approach might just be a documentation
fix. Point out somewhere that if you get EMFILE or ENFILE errors then
increase the number of descriptors and/or file objects. eCos is, after
all, meant to be an embedded OS with deliberately restricted
resources, not a full Linux/BSD clone. So we will reach limits sooner
than on those systems and the solutions are often to incrementally
adjust the configured resource allocations rather than try to always
recover at runtime.

-- 
Nick Garnett                    eCos Kernel Architect
http://www.ecoscentric.com      The eCos and RedBoot experts


* Re: [ECOS] accept() behaviour (out of file descriptors)
  2003-09-26 12:01     ` Nick Garnett
@ 2003-09-29 10:28       ` Thomas BINDER
  2003-10-01 14:32         ` Nick Garnett
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas BINDER @ 2003-09-29 10:28 UTC (permalink / raw)
  To: Nick Garnett; +Cc: Christoph Csebits, ecos discuss list

Nick Garnett wrote:
> 
> Thomas BINDER <Thomas.Binder@frequentis.com> writes:
> 
> > In my opinion, however, it does not make much sense to run into an
> > endless loop (and consume lots of CPU) in case no process has
> > resources to accept the connection.
> >
> 
> I'll agree that looping is bad. The question is how to stop it in a
> clean way. My preferred approach would be to address the source of the
> problem and increase the file descriptor table size.

Unfortunately the problem runs a little deeper. We are not talking about regular use of (lots of) file descriptors here. Think about the case where file descriptors are consumed erroneously. In our application the thread (telnet server) that waits for incoming connections would suddenly run into an endless loop, and some of the other threads (those with lower or equal priority) would no longer get the CPU. Now go ahead and find the real problem :-). Increasing the number of file descriptors does not help either.

Now, one could certainly argue that a telnet server should sleep for a certain period when accept fails. But what about a web server (which we also use in a different project)? Is it a good idea to sleep between consecutive (failed) accepts? From a quick look at the eCos web server I believe that this problem is not properly handled there either (consecutive array lookup with index -1).

How do you suggest to use accept() in eCos?
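
One application-level alternative to sleeping is the "spare descriptor" idiom known from other event-loop code. It is not something proposed in this thread. The sketch below assumes a POSIX-style system where "/dev/null" can be opened; spare_fd_reserve() and accept_or_shed() are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

static int spare_fd = -1;          /* descriptor held in reserve */

/* Reserve one descriptor. Call once at start-up, while descriptors
 * are still plentiful. Returns 0 on success, -1 on failure. */
int spare_fd_reserve(void)
{
    if (spare_fd < 0)
        spare_fd = open("/dev/null", O_RDONLY);   /* assumed to exist */
    return spare_fd < 0 ? -1 : 0;
}

/* Accept a connection. On EMFILE/ENFILE, give up the spare, accept
 * the next connection only to close it again, then re-reserve the
 * spare. The caller still sees -1 with the original errno. */
int accept_or_shed(int listen_fd)
{
    int s = accept(listen_fd, NULL, NULL);

    if (s >= 0 || (errno != EMFILE && errno != ENFILE))
        return s;

    int saved = errno;

    if (spare_fd >= 0) {
        close(spare_fd);
        spare_fd = -1;
        s = accept(listen_fd, NULL, NULL);   /* blocks until a client connects */
        if (s >= 0)
            close(s);                        /* shed the connection */
        spare_fd_reserve();
    }
    errno = saved;
    return -1;
}
```

Because the second accept() blocks on a blocking listener, the loop no longer spins, and the client sees its connection closed, much like the Linux behaviour described at the start of the thread. The idiom is racy if other threads open descriptors at the same moment, so treat it as an illustration only.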

> > Well, actually no. We allocate the filedescriptors *in* bsd_accept,
> > *after* tsleep returns (right at the place where someone put a big
> > FIXME comment :-) ). This ensures that accept will block until a
> > connection is actually attempted. In case no more filedescriptors
> > (or files) are available, there is not really much one can do (in
> > eCos).
> 
> I don't like that at all. It breaks the layering and would make the
> introduction of different network stacks difficult.

I am afraid I don't understand that. All network stacks use callbacks (into mempools) to allocate/de-allocate resources (mbufs, sockets). What's the catch of using a callback to allocate a file descriptor / pointer (as the original FreeBSD stack does)? What else was the FIXME originally meant for?
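
To make the question concrete, here is a toy model of what such a callback could look like. It is purely illustrative: fd_alloc_fn, struct late_listener, late_accept() and the two demo allocators are hypothetical, and the blocking wait is modelled by returning EAGAIN rather than by a real tsleep.

```c
#include <errno.h>

/* Hypothetical callback: returns a descriptor, or -1 if none is free. */
typedef int (*fd_alloc_fn)(void);

struct late_listener {
    int pending;                   /* connections waiting in the queue */
    int aborted;                   /* connections dropped for lack of an fd */
};

/* Stack-side accept that asks for a descriptor only after a
 * connection has arrived. */
int late_accept(struct late_listener *l, fd_alloc_fn fd_alloc)
{
    if (l->pending == 0) {
        errno = EAGAIN;            /* a real stack would sleep here */
        return -1;
    }
    l->pending--;                  /* take the connection off the queue */

    int fd = fd_alloc();

    if (fd < 0) {
        l->aborted++;              /* no descriptor: drop the connection */
        errno = EMFILE;
        return -1;
    }
    return fd;
}

/* Two trivial allocators for experimenting with the model. */
int demo_alloc_none(void)  { return -1; }
int demo_alloc_three(void) { return 3; }
```

In this model EMFILE can only be reported once per incoming connection, so a caller looping on late_accept() waits between failures instead of spinning.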

best regards,
Tom

* Re: [ECOS] accept() behaviour (out of file descriptors)
  2003-09-29 10:28       ` Thomas BINDER
@ 2003-10-01 14:32         ` Nick Garnett
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Garnett @ 2003-10-01 14:32 UTC (permalink / raw)
  To: Thomas BINDER; +Cc: Christoph Csebits, ecos discuss list

Thomas BINDER <Thomas.Binder@frequentis.com> writes:

> Unfortunately the problem has a little deeper impact. We are not
> talking about regular use of (lots of) file descriptors here. Think
> about the case where file descriptors are consumed erroneously. In
> our application the thread (telnet server) that waits for incoming
> connections would suddenly run into an endless loop and some of the
> other threads (those with lower or equal prio) would not get the CPU
> any longer. Now go ahead and find the real problem :-). Increasing
> the number of filedescriptors does not help either.

But this sort of thing should only happen during development. Anything
that causes an application to eat up all the file descriptors during
deployment is a bug.

> 
> Now, one could certainly argue that a telnet server should sleep for
> a certain period when accept fails. But what about a Web-Server
> (which we also use in a different project)?. Is it a good idea to
> sleep between consecutive (failed) accepts? From a quick look at the
> eCos Web-Server I believe that this problem is also not properly
> handled there (consecutive array lookup with index -1).
>

Adding a delay to the loop while debugging the problem may allow other
threads to run and print an error message and so help you eliminate
the bug. But it does not need to be there permanently.

> How do you suggest to use accept() in eCos?
>

I suggest you find the bug that is eating up all your file descriptors
and fix that, rather than worry about a symptom. There will always be
obscure corner cases where eCos will behave slightly differently from
Linux or BSD. This is a consequence of being an embedded OS rather
than a fully-featured general purpose OS. We have to make compromises
in things like the amount of resource we devote to certain aspects, or
the complexity of the code we use to implement them. If we made the
effort to fully duplicate the behaviour of Linux/BSD we would end up
just as large and complex.
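
A small diagnostic can help with tracking down such a leak. The following is a sketch that assumes fcntl() with F_GETFD behaves as POSIX specifies; whether a given eCos configuration supports it is an assumption to check. fd_is_open() and count_open_fds() are hypothetical helper names.

```c
#include <fcntl.h>

/* Nonzero if `fd` currently refers to an open descriptor.
 * F_GETFD fails (with EBADF) when the descriptor is not open. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Count the open descriptors in the range [0, limit). Pass the
 * configured size of the descriptor table as `limit`. */
int count_open_fds(int limit)
{
    int open_count = 0;

    for (int fd = 0; fd < limit; fd++) {
        if (fd_is_open(fd))
            open_count++;
    }
    return open_count;
}
```

Logging count_open_fds() from a low-rate housekeeping thread, or right after each connection is closed, shows whether the number creeps upwards over time, which points at the code path that forgets to close().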

> > I don't like that at all. It breaks the layering and would make the
> > introduction of different network stacks difficult.
>
> I am afraid I don't understand that. All network stacks use
> callbacks (into mempools) to allocate/de-allocate resources (mbufs,
> sockets). What's the catch of using a callback to allocate a file
> descriptor / pointer (as the original FreeBSD stack does)? What else
> was the FIXME originally meant for?

Those callbacks are into other parts of the same package. BSD is just
one big lump of code with very loose interfaces between modules; Linux
is even worse and doesn't seem to have any clean interfaces at all.

One of the compromises we have made in the design of eCos is to keep
the interface between the FILEIO package and network stacks
simple. The FILEIO package deals entirely with file descriptors, the
network stacks know nothing about them. This gives us the freedom to
reimplement or even eliminate code and data if we want. Moving this
knowledge down into the stack makes the interface more complex,
exposes routines that were never intended to be an API and makes the
task of porting a network stack to eCos more onerous. All of this to
fix one obscure corner case that is itself merely a symptom of a more
serious application bug.

-- 
Nick Garnett                    eCos Kernel Architect
http://www.ecoscentric.com      The eCos and RedBoot experts
