From: Ken Brown
To: sten.kristian.ivarsson@gmail.com
Cc: 'cygwin'
Subject: Re: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers
Date: Wed, 1 Apr 2020 22:19:29 -0400
Message-ID: <7897bc10-439d-64aa-c173-f0bf4ec82468@cornell.edu>
In-Reply-To: <7b5b058e-5047-4d49-8c31-5553056f3845@cornell.edu>
List-Id: General Cygwin discussions and problem reports

On 4/1/2020 2:34 PM, Ken Brown via Cygwin wrote:
> On 4/1/2020 1:14 PM, sten.kristian.ivarsson@gmail.com wrote:
>>> On 4/1/2020 4:52 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>> On 3/31/2020 5:10 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
>>>>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
>>>>>>>>> On 3/28/2020 8:10 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>> On 3/27/2020 10:53 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>>>>>>> The ENXIO occurs when parallel child-processes
>>>>>>>>>>>>>>>> simultaneously using O_NONBLOCK opening the descriptor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is consistent with my guess that the error is
>>>>>>>>>>>>>>> generated by fhandler_fifo::wait.  I have a feeling that
>>>>>>>>>>>>>>> read_ready should have been created as a manual-reset
>>>>>>>>>>>>>>> event, and that more care is needed to make sure it's set
>>>>>>>>>>>>>>> when it should be.
>>
>> [snip]
>>
>>>>>>>> Never mind.  I was able to reproduce the problem and find the cause.
>>>>>>>> What happens is that when the first subprocess exits,
>>>>>>>> fhandler_fifo::close resets read_ready.
>>>>>>>> That causes the second
>>>>>>>> and subsequent subprocesses to think that there's no reader open,
>>>>>>>> so their attempts to open a writer with O_NONBLOCK fail with ENXIO.
>>
>> [snip]
>>
>>>> I wrote in a previous mail in this topic that it seemed to work fine
>>>> for me as well, but when I bumped up the numbers of writers and/or the
>>>> number of messages (e.g. 25/25) it starts to fail again
>>
>> [snip]
>>
>>> Yes, it is a resource issue.  There is a limit on the number of writers
>>> that can be open at one time, currently 64.  I chose that number
>>> arbitrarily, with no idea what might actually be needed in practice,
>>> and it can easily be changed.
>>
>> Does it have to be a limit at all ? We would rather see that the application
>> decide how much resources it would like to use. In our particular case there
>> will be a process-manager with an incoming pipe that possible several
>> thousands of processes will write to
>
> I agree.
>
>> Just for fiddling around (to figure out if this is the limit that make other
>> things work a bit odd), where's this 64 limit defined now ?
>
> It's MAX_CLIENTS, defined in fhandler.h.  But there seem to be other resource
> issues also; simply increasing MAX_CLIENTS doesn't solve the problem.  I think
> there are also problems with the number of threads, for example.  Each time your
> program forks, the subprocess inherits the rfd file descriptor and its
> "fifo_reader_thread" starts up.  This is unnecessary for your application, so I
> tried disabling it (in fhandler_fifo::fixup_after_fork), just as an experiment.
>
> But then I ran into some deadlocks, suggesting that one of the locks I'm using
> isn't robust enough.  So I've got a lot of things to work on.
>
>>> In addition, a writer isn't recognized as closed until a reader tries to
>>> read and gets an error.  In your example with 25/25, the list of writers
>>> quickly gets to 64 before the parent ever tries to read.
>>
>> That explains the behaviour, but should there be some error returned from
>> open/write (maybe it is but I'm missing it) ?
>
> The error is discovered in add_client_handler, called from thread_func.  I think
> you'll only see it if you run the program under strace.  I'll see if I can find
> a way to report it.  Currently, there's a retry loop in fhandler_fifo::open when
> a writer tries to open, and I think I need to limit the number of retries and
> then error out.

I pushed a few improvements and bug fixes, and your 25/25 example now runs
without a problem.  I increased MAX_CLIENTS to 1024 just for the sake of this
example, but I'll work on letting the number of writers increase dynamically as
needed.

Ken
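
For readers following the thread, here is a minimal application-side sketch of
the "limit the number of retries and then error out" idea, using only POSIX
calls.  It is not the Cygwin code in fhandler_fifo::open; the helper name
open_writer_bounded and its max_tries/delay_ms parameters are made up for
illustration.  It relies on the standard behaviour that opening a FIFO with
O_WRONLY | O_NONBLOCK fails with ENXIO while no reader has it open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of Cygwin): open a FIFO for writing in
   non-blocking mode, retrying a bounded number of times while open()
   fails with ENXIO, i.e. while no reader has the FIFO open.
   Returns a file descriptor, or -1 with errno set.  */
static int
open_writer_bounded (const char *path, int max_tries, long delay_ms)
{
  assert (path != NULL && max_tries > 0 && delay_ms >= 0);
  struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

  for (int i = 0; i < max_tries; i++)
    {
      int fd = open (path, O_WRONLY | O_NONBLOCK);
      if (fd >= 0 || errno != ENXIO)
        return fd;              /* success, or an error not worth retrying */
      if (i + 1 < max_tries)
        nanosleep (&delay, NULL);   /* give a reader time to show up */
    }
  errno = ENXIO;                /* still no reader after max_tries attempts */
  return -1;
}
```

A writer subprocess would call something like open_writer_bounded (path, 10, 50)
and treat a final ENXIO as "no reader", instead of spinning forever.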