From: Ken Brown
To: sten.kristian.ivarsson@gmail.com
Cc: 'cygwin'
Subject: Re: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers
Date: Wed, 1 Apr 2020 14:34:12 -0400
Message-ID: <7b5b058e-5047-4d49-8c31-5553056f3845@cornell.edu>
In-Reply-To: <001601d60848$fcffd320$f6ff7960$@gmail.com>

On 4/1/2020 1:14 PM, sten.kristian.ivarsson@gmail.com wrote:
>> On 4/1/2020 4:52 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>> On 3/31/2020 5:10 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
>>>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
>>>>>>>> On 3/28/2020 8:10 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>> On 3/27/2020 10:53 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>>>>>> The ENXIO occurs when parallel child-processes
>>>>>>>>>>>>>>> simultaneously using O_NONBLOCK opening the descriptor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is consistent with my guess that the error is
>>>>>>>>>>>>>> generated by fhandler_fifo::wait.  I have a feeling that
>>>>>>>>>>>>>> read_ready should have been created as a manual-reset
>>>>>>>>>>>>>> event, and that more care is needed to make sure it's set
>>>>>>>>>>>>>> when it should be.
>
> [snip]
>
>>>>>>> Never mind.  I was able to reproduce the problem and find the cause.
>>>>>>> What happens is that when the first subprocess exits,
>>>>>>> fhandler_fifo::close resets read_ready.  That causes the second
>>>>>>> and subsequent subprocesses to think that there's no reader open,
>>>>>>> so their attempts to open a writer with O_NONBLOCK fail with ENXIO.
>
> [snip]
>
>>> I wrote in a previous mail in this topic that it seemed to work fine
>>> for me as well, but when I bumped up the numbers of writers and/or the
>>> number of messages (e.g. 25/25) it starts to fail again
>
> [snip]
>
>> Yes, it is a resource issue. There is a limit on the number of writers
>> that can be open at one time, currently 64. I chose that number
>> arbitrarily, with no idea what might actually be needed in practice,
>> and it can easily be changed.
>
> Does it have to be a limit at all ? We would rather see that the application
> decide how much resources it would like to use. In our particular case there
> will be a process-manager with an incoming pipe that possible several
> thousands of processes will write to

I agree.

> Just for fiddling around (to figure out if this is the limit that make other
> things work a bit odd), where's this 64 limit defined now ?

It's MAX_CLIENTS, defined in fhandler.h. But there seem to be other resource
issues also; simply increasing MAX_CLIENTS doesn't solve the problem. I think
there are also problems with the number of threads, for example. Each time your
program forks, the subprocess inherits the rfd file descriptor and its
"fifo_reader_thread" starts up. This is unnecessary for your application, so I
tried disabling it (in fhandler_fifo::fixup_after_fork), just as an experiment.
But then I ran into some deadlocks, suggesting that one of the locks I'm using
isn't robust enough. So I've got a lot of things to work on.

>> In addition, a writer isn't recognized as closed until a reader tries to
>> read and gets an error. In your example with 25/25, the list of writers
>> quickly gets to 64 before the parent ever tries to read.
>
> That explains the behaviour, but should there be some error returned from
> open/write (maybe it is but I'm missing it) ?

The error is discovered in add_client_handler, called from thread_func. I think
you'll only see it if you run the program under strace.
I'll see if I can find a way to report it. Currently, there's a retry loop in
fhandler_fifo::open when a writer tries to open, and I think I need to limit the
number of retries and then error out.

Ken
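
For illustration only, here is what "limit the number of retries and then error out" can look like from the application side, using nothing but standard POSIX calls. This is not the code in fhandler_fifo::open; the helper name open_writer_bounded, its parameters, and the demo function are all hypothetical. The sketch relies on the POSIX rule that opening a FIFO with O_WRONLY | O_NONBLOCK fails with ENXIO when no reader has it open.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of Cygwin): open a FIFO for writing
   without blocking.  While open fails with ENXIO (no reader yet), retry
   up to max_retries times, sleeping delay_ms between attempts.  Returns
   the descriptor, or -1 with errno left as set by the last open.  */
static int
open_writer_bounded (const char *path, int max_retries, int delay_ms)
{
  for (int attempt = 0; ; attempt++)
    {
      int fd = open (path, O_WRONLY | O_NONBLOCK);
      if (fd >= 0)
        return fd;
      if (errno != ENXIO || attempt >= max_retries)
        return -1;
      poll (NULL, 0, delay_ms);   /* plain millisecond sleep */
    }
}

/* Demo: with no reader the bounded open gives up with ENXIO; once a
   reader has the FIFO open, it succeeds.  Returns 0 if both held.  */
static int
demo (void)
{
  char dir[] = "/tmp/fifo-demo-XXXXXX";
  char path[64];
  int ok = 0;

  if (!mkdtemp (dir))
    return -1;
  snprintf (path, sizeof path, "%s/fifo", dir);
  if (mkfifo (path, 0600) == 0)
    {
      int wfd = open_writer_bounded (path, 3, 10);
      int no_reader_ok = (wfd == -1 && errno == ENXIO);
      int rfd = open (path, O_RDONLY | O_NONBLOCK);

      wfd = open_writer_bounded (path, 3, 10);
      ok = no_reader_ok && rfd >= 0 && wfd >= 0;
      if (wfd >= 0)
        close (wfd);
      if (rfd >= 0)
        close (rfd);
      unlink (path);
    }
  rmdir (dir);
  return ok ? 0 : -1;
}
```

Calling demo() from a test program should return 0 on a system with POSIX FIFO semantics. The retry count and delay are arbitrary; the point is only that the caller gets a definite error instead of looping forever.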