From: <sten.kristian.ivarsson@gmail.com>
To: "'Ken Brown'" <kbrown@cornell.edu>
Cc: "'cygwin'" <cygwin@cygwin.com>
Subject: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers
Date: Thu, 2 Apr 2020 10:05:49 +0200
Message-ID: <000901d608c5$86361880$92a24980$@gmail.com>
In-Reply-To: <7897bc10-439d-64aa-c173-f0bf4ec82468@cornell.edu>

> On 4/1/2020 2:34 PM, Ken Brown via Cygwin wrote:
> > On 4/1/2020 1:14 PM, sten.kristian.ivarsson@gmail.com wrote:
> >>> On 4/1/2020 4:52 AM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>> On 3/31/2020 5:10 PM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
> >>>>>>>>> On 3/28/2020 8:10 AM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>>>>>> On 3/27/2020 10:53 AM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>>>>>>>>>>> The ENXIO occurs when parallel child processes
> >>>>>>>>>>>>>>>> simultaneously open the descriptor using O_NONBLOCK.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is consistent with my guess that the error is
> >>>>>>>>>>>>>>> generated by fhandler_fifo::wait.  I have a feeling that
> >>>>>>>>>>>>>>> read_ready should have been created as a manual-reset
> >>>>>>>>>>>>>>> event, and that more care is needed to make sure it's
> >>>>>>>>>>>>>>> set when it should be.
> >>
> >> [snip]
> >>
> >>>>>>>> Never mind.  I was able to reproduce the problem and find the
> >>>>>>>> cause.  What happens is that when the first subprocess exits,
> >>>>>>>> fhandler_fifo::close resets read_ready.  That causes the second
> >>>>>>>> and subsequent subprocesses to think that there's no reader
> >>>>>>>> open, so their attempts to open a writer with O_NONBLOCK fail
> >>>>>>>> with ENXIO.
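
For reference, the POSIX rule behind that error is that opening a FIFO
with O_WRONLY | O_NONBLOCK fails with ENXIO when no process has it open
for reading. A minimal C sketch of that check follows. The helper name
try_open_writer is made up for illustration and is not part of Cygwin.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to open the write end of a FIFO without
   blocking.  Returns 0 on success (and stores the descriptor in *fd),
   otherwise the errno value from open(); ENXIO means "no reader". */
int try_open_writer(const char *path, int *fd)
{
    *fd = open(path, O_WRONLY | O_NONBLOCK);
    return *fd >= 0 ? 0 : errno;
}
```

With a reader open, the same call is expected to succeed. The bug
described above made the writers see the FIFO as if the reader had gone
away.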
> >>
> >> [snip]
> >>
> >>>> I wrote in a previous mail in this topic that it seemed to work
> >>>> fine for me as well, but when I bumped up the number of writers
> >>>> and/or the number of messages (e.g. 25/25) it starts to fail again.
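
The test program itself is not included in this message. As a rough,
hypothetical stand-in for the "writers/messages" scenario (e.g. 25/25),
the C sketch below forks a number of children that each open the FIFO
with O_WRONLY | O_NONBLOCK and write fixed-size records, while the
parent keeps the read end open and reads only at the end. The function
name run_fifo_fanin and the record layout are assumptions, and the
sketch assumes the total data fits in the pipe buffer.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct record { int writer; int seq; };   /* one fixed-size message */

/* Hypothetical stand-in for the test: fork `writers` children that each
   write `messages` records to the FIFO at `path`.  Returns the number of
   records the parent reads back, or -1 on failure. */
int run_fifo_fanin(const char *path, int writers, int messages)
{
    unlink(path);
    if (mkfifo(path, 0600) != 0)
        return -1;

    /* Open the reader first, non-blocking, so open() does not wait. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0) {
        unlink(path);
        return -1;
    }

    int started = 0, failed = 0;
    for (int w = 0; w < writers; ++w) {
        pid_t pid = fork();
        if (pid < 0) {
            ++failed;
            break;
        }
        if (pid == 0) {
            /* Child: open the write end without blocking. */
            int wfd = open(path, O_WRONLY | O_NONBLOCK);
            if (wfd < 0)
                _exit(1);
            for (int m = 0; m < messages; ++m) {
                struct record r = { w, m };
                if (write(wfd, &r, sizeof r) != (ssize_t)sizeof r)
                    _exit(2);
            }
            close(wfd);
            _exit(0);
        }
        ++started;
    }

    /* Reap every child and note any non-zero exit status. */
    for (int i = 0; i < started; ++i) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ++failed;
    }

    /* Drain the FIFO; with no writers left, read() returns 0 at EOF. */
    struct record buf[64];
    int total = 0;
    ssize_t n;
    while ((n = read(rfd, buf, sizeof buf)) > 0)
        total += (int)(n / (ssize_t)sizeof(struct record));

    close(rfd);
    unlink(path);
    return failed ? -1 : total;
}
```

Calling run_fifo_fanin("/tmp/some_fifo", 25, 25) should report 625
records when every writer manages to open and write.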
> >>
> >> [snip]
> >>
> >>> Yes, it is a resource issue.  There is a limit on the number of
> >>> writers that can be open at one time, currently 64.  I chose that
> >>> number arbitrarily, with no idea what might actually be needed in
> >>> practice, and it can easily be changed.
> >>
> >> Does there have to be a limit at all?  We would rather let the
> >> application decide how many resources it wants to use.  In our
> >> particular case there will be a process manager with an incoming pipe
> >> that possibly several thousand processes will write to.
> >
> > I agree.
> >
> >> Just for fiddling around (to figure out whether this is the limit that
> >> makes other things behave a bit oddly), where is this limit of 64
> >> defined now?
> >
> > It's MAX_CLIENTS, defined in fhandler.h.  But there seem to be other
> > resource issues also; simply increasing MAX_CLIENTS doesn't solve the
> > problem.  I think there are also problems with the number of threads,
> > for example.  Each time your program forks, the subprocess inherits
> > the rfd file descriptor and its "fifo_reader_thread" starts up.  This
> > is unnecessary for your application, so I tried disabling it (in
> > fhandler_fifo::fixup_after_fork), just as an experiment.
> >
> > But then I ran into some deadlocks, suggesting that one of the locks
> > I'm using isn't robust enough.  So I've got a lot of things to work on.
> >
> >>> In addition, a writer isn't recognized as closed until a reader
> >>> tries to read and gets an error.  In your example with 25/25, the
> >>> list of writers quickly gets to 64 before the parent ever tries to
> >>> read.
> >>
> >> That explains the behaviour, but shouldn't some error be returned
> >> from open/write (maybe one is, but I'm missing it)?
> >
> > The error is discovered in add_client_handler, called from
> > thread_func.  I think you'll only see it if you run the program under
> > strace.  I'll see if I can find a way to report it.  Currently,
> > there's a retry loop in fhandler_fifo::open when a writer tries to
> > open, and I think I need to limit the number of retries and then
> > error out.
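
The retry loop in fhandler_fifo::open is not shown in this thread, so
the following is only an application-level analogue of "limit the number
of retries and then error out", not the Cygwin code. The helper name
open_writer_bounded and its parameters are assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open the write end of a FIFO without blocking,
   retrying on ENXIO (no reader yet) at most max_retries times with
   delay_ms between attempts.  Returns the descriptor, or -1 with errno
   set by the last failed open(). */
int open_writer_bounded(const char *path, int max_retries, long delay_ms)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L
    };

    for (int attempt = 0; ; ++attempt) {
        int fd = open(path, O_WRONLY | O_NONBLOCK);
        if (fd >= 0)
            return fd;
        if (errno != ENXIO || attempt >= max_retries)
            return -1;              /* give up and report the error */
        nanosleep(&delay, NULL);    /* brief pause before retrying */
    }
}
```

A caller would then see ENXIO after a bounded wait instead of looping
indefinitely, for example open_writer_bounded(path, 10, 50).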
> 
> I pushed a few improvements and bug fixes, and your 25/25 example now
> runs without a problem.  I increased MAX_CLIENTS to 1024 just for the
> sake of this example, but I'll work on letting the number of writers
> increase dynamically as needed.

I pulled it and tried it out, and yes, the sample test program with 25/25
worked well, and a whole bunch of our unit tests now pass.

We still have some issues, but I cannot yet tell whether they are related
to named pipes.

It is great that you're looking into a fully dynamic solution.

Kristian

> Ken

