From: Ken Brown
To: sten.kristian.ivarsson@gmail.com, cygwin@cygwin.com
Subject: Re: AF_UNIX/SOCK_DGRAM is dropping messages
Date: Tue, 13 Apr 2021 18:43:03 -0400
Message-ID: <4380cdea-c95b-d9dc-50e3-e5adabb73b92@cornell.edu>
In-Reply-To: <19cf8626-c653-76db-a409-730a5aa5c955@cornell.edu>


On 4/13/2021 10:47 AM, Ken Brown via Cygwin wrote:
> On 4/13/2021 10:06 AM, sten.kristian.ivarsson@gmail.com wrote:
>> Hi Ken
>>
>>>>>>>>>>> Using AF_UNIX/SOCK_DGRAM with current version (3.2.0) seems to
>>>>>>>>>>> drop messages or at least they are not received in the same
>>>>>>>>>>> order they are sent
>>>>>>>
>>>>>>> [snip]
>>>>>>>
>>>>>>>> Thanks for the test case.  I can confirm the problem.  I'm not
>>>>>>>> familiar enough with the current AF_UNIX implementation to debug
>>>>>>>> this easily.  I'd rather spend my time on the new implementation
>>>>>>>> (on the topic/af_unix branch).  It turns out that your test case
>>>>>>>> fails there too, but in a completely different way, due to a bug
>>>>>>>> in sendto for datagrams.  I'll see if I can fix that bug and then try
>>>>>>>> again.
>>>>>>>>
>>>>>>>> Ken
>>>>>>>
>>>>>>> Ok, too bad it wasn't our own code base but good that the "mystery"
>>>>>>> is verified
>>>>>>>
>>>>>>> I finally succeeded in building topic/af_unix (after finding out what
>>>>>>> version of zlib was needed), but not with -D__WITH_AF_UNIX added to
>>>>>>> CXXFLAGS though and thus I haven't tested it yet
>>>>>>>
>>>>>>> Is it sufficient to add the define to the "main" Makefile or do you
>>>>>>> have to add it to all the Makefile:s ? I guess I can find out
>>>>>>> though
>>>>>>
>>>>>> I do it on the configure line, like this:
>>>>>>
>>>>>>     ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...
>>>>>>
>>>>>>> Is topic/af_unix fairly up to date with master branch ?
>>>>>>
>>>>>> Yes, I periodically cherry-pick commits from master to topic/af_unix.
>>>>>> I'll do that again right now.
>>>>>>
>>>>>>> Either way, I'll be glad to help out testing topic/af_unix
>>>>>>
>>>>>> Thanks!
>>>>>
>>>>> I've now pushed a fix for that sendto bug, and your test case runs
>>>>> without error on the topic/af_unix branch.
>>>>
>>>> It seems like the test-case does work now with topic/af_unix in blocking
>>>> mode, but when using non-blocking (with MSG_DONTWAIT) there are some
>>>> issues I think
>>>>
>>>> 1. When the queue is empty with non-blocking recv(), errno is set to
>>>> EPIPE but I think it should be EAGAIN (or maybe the pipe is getting
>>>> broken for real for some reason ?)
>>>>
>>>> 2. When using non-blocking recv() and no message is written at all, it
>>>> seems like recv() blocks forever
>>>>
>>>> 3. Using non-blocking recv() where the "client" sends fewer than
>>>> "count" messages, sometimes recv() blocks forever (as well)
>>>>
>>>>
>>>> My naïve analysis of this is that for the first issue (if any) the
>>>> wrong errno is set and for the second issue it blocks if no sendto()
>>>> is done after the first recv(), i.e. nothing kicks the "reader thread"
>>>> in the butt to realise the queue is empty. It is not super clear
>>>> though what POSIX says about creating blocking descriptors and then
>>>> using non-blocking-flags with recv(), but this works in Linux anyway
>>>
>>> The explanation is actually much simpler.  In the recv code where a bound
>>> datagram socket waits for a remote socket to connect to the pipe, I simply
>>> forgot to handle MSG_DONTWAIT.  I've pushed a fix.  Please retest.
>>>
>>> I should add that in all my work so far on the topic/af_unix branch, I've
>>> thought mainly about stream sockets.  So there may still be things remaining
>>> to be implemented for the datagram case.
>>
>> I finally got some time to test topic/af_unix in our "real" cygwin-application
>> (casual) and unfortunately very few of our unittests pass
>>
>> The symptoms are that there's unexpected eternal blocking, sometimes there's
>> unexpected EADDRNOTAVAIL, sometimes it looks like some memory corruption (and
>> core-dumps)
>>
>> Of course the memory corruption etc could be our own fault and the core-dumps
>> might be because of uncaught exceptions
>>
>> Needless to say, all unittests pass on Linux, but of course
>> cygwin-topic/af_unix could act according to POSIX-standard and the behaviour
>> could be due to our own misinterpretation of how POSIX works
>
> More likely it's due to bugs in the topic/af_unix branch.  This is still very
> much a work in progress.
>
>> I will try to narrow down the quite complex logic and reproduce the problems
>
> That would be ideal.
>
>> If you for some reason want to try it with casual, I'd be glad to help you out
>> (it should be easier now than last time (but there might be some documentation
>> missing for Cygwin still))
>>
>> https://bitbucket.org/casualcore/
>
> I'm going on vacation in a few days, but I might do this when I get back.
>
> Thanks for your testing.

By the way, if your code is using datagram sockets, then there are very serious
problems with our implementation (even aside from the performance issue that
we've already discussed).  For example, I don't know of any reasonable way for
select to test whether such a socket is ready for writing.  We'll need to solve
that somehow.

Ken
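
For reference, the datagram behaviour discussed in this thread (EAGAIN from a
non-blocking recv() on an empty queue, and select() reporting a datagram socket
as ready for writing) can be checked with something like the following.  This
is only a minimal sketch, not the test case from the thread: the helper names
and the socket path are hypothetical, and the expected results are an
assumption based on how Linux behaves, which is what the thread uses as the
point of comparison.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for `path` (a hypothetical test path). */
static void fill_addr(struct sockaddr_un *addr, const char *path)
{
    assert(strlen(path) < sizeof addr->sun_path);
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
}

/* Zero-timeout select(): 1 if fd is writable now, 0 if not, -1 on error. */
static int writable_now(int fd)
{
    fd_set wfds;
    struct timeval tv = { 0, 0 };

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}

/*
 * Returns 0 if every step behaves as assumed (Linux-like behaviour),
 * otherwise the number of the first step that did not.
 */
static int check_dgram_nonblocking(const char *path)
{
    struct sockaddr_un addr;
    char buf[16];
    int rx, tx, step = 0;

    fill_addr(&addr, path);
    unlink(path);                       /* remove a stale socket file, if any */
    rx = socket(AF_UNIX, SOCK_DGRAM, 0);
    tx = socket(AF_UNIX, SOCK_DGRAM, 0);

    /* Step 1: create both sockets and bind the receiver. */
    if (rx < 0 || tx < 0
        || bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0)
        step = 1;
    /* Step 2: empty queue + MSG_DONTWAIT must fail with EAGAIN, not block. */
    else if (recv(rx, buf, sizeof buf, MSG_DONTWAIT) != -1
             || (errno != EAGAIN && errno != EWOULDBLOCK))
        step = 2;
    /* Step 3: select() should report the sender as ready for writing. */
    else if (writable_now(tx) != 1)
        step = 3;
    /* Step 4: send one datagram to the bound receiver. */
    else if (sendto(tx, "ping", 4, 0,
                    (struct sockaddr *)&addr, sizeof addr) != 4)
        step = 4;
    /* Step 5: the datagram must now be readable without blocking. */
    else if (recv(rx, buf, sizeof buf, MSG_DONTWAIT) != 4
             || memcmp(buf, "ping", 4) != 0)
        step = 5;

    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    unlink(path);
    return step;
}
```

Calling check_dgram_nonblocking() with a writable path from a small main()
gives 0 when all five steps behave as assumed.  A non-zero result is the
number of the first step that differed, which should make it easier to tell a
wrong errno (step 2) apart from a select() problem (step 3).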