To: "'Ken Brown'"
Subject: RE: AF_UNIX/SOCK_DGRAM is dropping messages
Date: Tue, 13 Apr 2021 16:06:42 +0200
Message-ID: <000e01d7306e$3c265580$b4730080$@gmail.com>
In-Reply-To: <134074c1-4c0b-0842-b88b-536a1ed4aefe@cornell.edu>
List-Id: General Cygwin discussions and problem reports

Hi Ken

> >>>>>>>> Using AF_UNIX/SOCK_DGRAM with current version (3.2.0) seems to
> >>>>>>>> drop messages or at least they are not received in the same
> >>>>>>>> order they are sent
> >>>>
> >>>> [snip]
> >>>>
> >>>>> Thanks for the test case. I can confirm the problem.
> >>>>> I'm not familiar enough with the current AF_UNIX implementation to debug
> >>>>> this easily. I'd rather spend my time on the new implementation
> >>>>> (on the topic/af_unix branch). It turns out that your test case
> >>>>> fails there too, but in a completely different way, due to a bug
> >>>>> in sendto for datagrams. I'll see if I can fix that bug and then
> >>>>> try again.
> >>>>>
> >>>>> Ken
> >>>>
> >>>> Ok, too bad it wasn't our own code base but good that the "mystery"
> >>>> is verified
> >>>>
> >>>> I finally succeed to build topic/af_unix (after finding out what
> >>>> version of zlib was needed), but not with -D__WITH_AF_UNIX to
> >>>> CXXFLAGS though and thus I haven’t tested it yet
> >>>>
> >>>> Is it sufficient to add the define to the "main" Makefile or do you
> >>>> have to add it to all the Makefile:s ? I guess I can find out
> >>>> though
> >>>
> >>> I do it on the configure line, like this:
> >>>
> >>> ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...
> >>>
> >>>> Is topic/af_unix fairly up to date with master branch ?
> >>>
> >>> Yes, I periodically cherry-pick commits from master to topic/af_unix.
> >>> I'll do that again right now.
> >>>
> >>>> Either way, I'll be glad to help out testing topic/af_unix
> >>>
> >>> Thanks!
> >>
> >> I've now pushed a fix for that sendto bug, and your test case runs
> >> without error on the topic/af_unix branch.
> >
> > It seems like the test-case do work now with topic/af_unix in blocking
> > mode, but when using non-blocking (with MSG_DONTWAIT) there are some
> > issues I think
> >
> > 1. When the queue is empty with non-blocking recv(), errno is set to
> > EPIPE but I think it should be EAGAIN (or maybe the pipe is getting
> > broken for real of some reason ?)
> >
> > 2. When using non-blocking recv() and no message is written at all, it
> > seems like recv() blocks forever
> >
> > 3. Using non-blocking recv() where the "client" does send less than
> > "count" messages, sometimes recv() blocks forever (as well)
> >
> >
> > My naïve analysis of this is that for the first issue (if any) the
> > wrong errno is set and for the second issue it blocks if no sendto()
> > is done after the first recv(), i.e. nothing kicks the "reader thread"
> > in the butt to realise the queue is empty. It is not super clear
> > though what POSIX says about creating blocking descriptors and then
> > using non-blocking-flags with recv(), but this works in Linux anyway
>
> The explanation is actually much simpler. In the recv code where a bound
> datagram socket waits for a remote socket to connect to the pipe, I simply
> forget to handle MSG_DONTWAIT. I've pushed a fix. Please retest.
>
> I should add that in all my work so far on the topic/af_unix branch, I've
> thought mainly about stream sockets. So there may still be things remaining
> to be implemented for the datagram case.

I finally got some time to test topic/af_unix in our "real" Cygwin application (casual), and unfortunately very few of our unit tests pass.

The symptoms are unexpected eternal blocking, sometimes an unexpected EADDRNOTAVAIL, and sometimes what looks like memory corruption (and core dumps).

Of course the memory corruption etc. could be our own fault, and the core dumps might be caused by uncaught exceptions.

Needless to say, all unit tests pass on Linux, but of course cygwin-topic/af_unix could be acting according to the POSIX standard and the behaviour could be due to our own misinterpretation of how POSIX works.

I will try to narrow down the quite complex logic and reproduce the problems.

If you for some reason want to try it with casual, I'd be glad to help you out (it should be easier now than last time, but there might still be some documentation missing for Cygwin).

https://bitbucket.org/casualcore/

Best regards,
Kristian

> Ken
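
For reference, the expectation behind issue 1 in the quoted list (a recv() with MSG_DONTWAIT on an empty datagram queue should fail with EAGAIN rather than EPIPE) can be checked with a minimal sketch like the one below. It is written in Python only to keep it short and is not the test case discussed in this thread; the function name and the socket file name are made up for illustration, and it assumes a platform where Python exposes AF_UNIX and MSG_DONTWAIT.

```python
import errno
import os
import socket
import tempfile


def recv_nonblocking_on_empty_queue():
    """Return the errno raised by a MSG_DONTWAIT recv() on an empty queue.

    Returns None if recv() unexpectedly succeeded instead of failing.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical socket file name, used only for this illustration.
        path = os.path.join(tmp, "reader.sock")
        reader = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            # The descriptor itself stays blocking; only this one call
            # is made non-blocking through the MSG_DONTWAIT flag.
            reader.bind(path)
            try:
                reader.recv(1024, socket.MSG_DONTWAIT)
            except OSError as exc:
                return exc.errno
            return None
        finally:
            reader.close()


if __name__ == "__main__":
    err = recv_nonblocking_on_empty_queue()
    # Print the symbolic errno name when one is known.
    print(errno.errorcode.get(err, err))
```

On Linux this should report EAGAIN (the same value as EWOULDBLOCK there). A result of EPIPE, or a call that never returns, would correspond to issues 1 and 2 in the list.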