From: sten.kristian.ivarsson@gmail.com
To: "'Ken Brown'"
Cc: "'cygwin'"
Subject: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers
Date: Thu, 2 Apr 2020 10:05:49 +0200
> On 4/1/2020 2:34 PM, Ken Brown via Cygwin wrote:
> > On 4/1/2020 1:14 PM, sten.kristian.ivarsson@gmail.com wrote:
> >>> On 4/1/2020 4:52 AM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>> On 3/31/2020 5:10 PM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
> >>>>>>>>> On 3/28/2020 8:10 AM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>>>>>> On 3/27/2020 10:53 AM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten.kristian.ivarsson@gmail.com wrote:
> >>>>>>>>>>>>>>>> The ENXIO occurs when parallel child processes
> >>>>>>>>>>>>>>>> simultaneously open the descriptor using O_NONBLOCK.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is consistent with my guess that the error is
> >>>>>>>>>>>>>>> generated by fhandler_fifo::wait.  I have a feeling that
> >>>>>>>>>>>>>>> read_ready should have been created as a manual-reset
> >>>>>>>>>>>>>>> event, and that more care is needed to make sure it's
> >>>>>>>>>>>>>>> set when it should be.
> >>
> >> [snip]
> >>
> >>>>>>>> Never mind.  I was able to reproduce the problem and find the
> >>>>>>>> cause.  What happens is that when the first subprocess exits,
> >>>>>>>> fhandler_fifo::close resets read_ready.  That causes the second
> >>>>>>>> and subsequent subprocesses to think that there's no reader
> >>>>>>>> open, so their attempts to open a writer with O_NONBLOCK fail
> >>>>>>>> with ENXIO.
> >>
> >> [snip]
> >>
> >>>> I wrote in a previous mail in this topic that it seemed to work
> >>>> fine for me as well, but when I bumped up the number of writers
> >>>> and/or the number of messages (e.g. 25/25), it started to fail
> >>>> again.
> >>
> >> [snip]
> >>
> >>> Yes, it is a resource issue.  There is a limit on the number of
> >>> writers that can be open at one time, currently 64.  I chose that
> >>> number arbitrarily, with no idea what might actually be needed in
> >>> practice, and it can easily be changed.
> >>
> >> Does it have to be a limit at all?  We would rather see the
> >> application decide how many resources it would like to use.  In our
> >> particular case there will be a process manager with an incoming
> >> pipe that possibly several thousand processes will write to.
> >
> > I agree.
> >
> >> Just for fiddling around (to figure out if this is the limit that
> >> makes other things work a bit oddly), where is this 64 limit
> >> defined now?
> >
> > It's MAX_CLIENTS, defined in fhandler.h.  But there seem to be other
> > resource issues also; simply increasing MAX_CLIENTS doesn't solve
> > the problem.  I think there are also problems with the number of
> > threads, for example.  Each time your program forks, the subprocess
> > inherits the rfd file descriptor and its "fifo_reader_thread" starts
> > up.  This is unnecessary for your application, so I tried disabling
> > it (in fhandler_fifo::fixup_after_fork), just as an experiment.
> >
> > But then I ran into some deadlocks, suggesting that one of the locks
> > I'm using isn't robust enough.  So I've got a lot of things to work
> > on.
> >
> >>> In addition, a writer isn't recognized as closed until a reader
> >>> tries to read and gets an error.  In your example with 25/25, the
> >>> list of writers quickly gets to 64 before the parent ever tries to
> >>> read.
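(For anyone else following the thread, the test program being discussed
is roughly this shape -- a simplified sketch, not the exact program;
the path, the 25/25 counts and the message format are illustrative
only, and most error handling is omitted.  Each message is shorter than
PIPE_BUF, so the writes are atomic and messages from different writers
don't interleave.)

/* One parent reader plus N_WRITERS forked children that each open
   the FIFO for writing with O_NONBLOCK and send N_MSGS messages.  */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define FIFO_PATH "/tmp/mw_fifo"   /* made-up path */
#define N_WRITERS 25
#define N_MSGS    25

int main (void)
{
  char buf[64];
  ssize_t n;
  int done = 0;

  mkfifo (FIFO_PATH, 0600);

  /* Open the reader first; the forked writers inherit this fd
     (the "rfd" mentioned above).  */
  int rfd = open (FIFO_PATH, O_RDONLY | O_NONBLOCK);

  for (int i = 0; i < N_WRITERS; i++)
    if (fork () == 0)
      {
        /* This is the open that intermittently failed with ENXIO.  */
        int wfd = open (FIFO_PATH, O_WRONLY | O_NONBLOCK);
        if (wfd < 0)
          {
            perror ("writer open");
            _exit (1);
          }
        for (int j = 0; j < N_MSGS; j++)
          {
            n = snprintf (buf, sizeof buf, "%d:%d\n", (int) getpid (), j);
            write (wfd, buf, n);   /* EAGAIN retry omitted in sketch */
          }
        close (wfd);
        _exit (0);
      }

  /* Parent: drain the FIFO until every writer has been reaped,
     then read off whatever is left.  */
  while (done < N_WRITERS)
    {
      n = read (rfd, buf, sizeof buf);
      if (n > 0)
        write (STDOUT_FILENO, buf, n);
      else if (waitpid (-1, NULL, WNOHANG) > 0)
        done++;
    }
  while ((n = read (rfd, buf, sizeof buf)) > 0)
    write (STDOUT_FILENO, buf, n);

  close (rfd);
  unlink (FIFO_PATH);
  return 0;
}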
> >> That explains the behaviour, but should there be some error
> >> returned from open/write (maybe there is but I'm missing it)?
> >
> > The error is discovered in add_client_handler, called from
> > thread_func.  I think you'll only see it if you run the program
> > under strace.  I'll see if I can find a way to report it.
> > Currently, there's a retry loop in fhandler_fifo::open when a
> > writer tries to open, and I think I need to limit the number of
> > retries and then error out.
>
> I pushed a few improvements and bug fixes, and your 25/25 example now
> runs without a problem.  I increased MAX_CLIENTS to 1024 just for the
> sake of this example, but I'll work on letting the number of writers
> increase dynamically as needed.
>
> Ken

I pulled it and tried it out, and yes, the sample test program with
25/25 worked well, and a whole bunch of our unit tests now pass.

We still have some issues, but I cannot yet tell whether they are
related to named pipes or not.

It is great that you're looking into a fully dynamic solution.

Kristian
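PS. For the archives: ENXIO itself is normal POSIX behaviour when a
FIFO really has no reader -- open() with O_WRONLY | O_NONBLOCK is
specified to fail that way if no process has the FIFO open for
reading.  The bug discussed above was that Cygwin reported it even
though the parent still had the reader open.  A minimal sketch of the
legitimate case (the path is made up):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main (void)
{
  mkfifo ("/tmp/enxio_demo", 0600);

  /* No process has the FIFO open for reading, so POSIX says this
     non-blocking open for writing must fail with ENXIO.  */
  int fd = open ("/tmp/enxio_demo", O_WRONLY | O_NONBLOCK);
  if (fd < 0 && errno == ENXIO)
    puts ("no reader yet: ENXIO, as specified");
  else if (fd >= 0)
    close (fd);

  unlink ("/tmp/enxio_demo");
  return 0;
}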