To: 'Ken Brown'
Subject: RE: AF_UNIX/SOCK_DGRAM is dropping messages
Date: Wed, 14 Apr 2021 19:14:20 +0200
List-Id: General Cygwin discussions and problem reports

> >> Hi Ken
> >>
> >>>>>>>>>>> Using AF_UNIX/SOCK_DGRAM with current version (3.2.0) seems to
> >>>>>>>>>>> drop messages or at least they are not received in the same
> >>>>>>>>>>> order they are sent
> >>>>>>>
> >>>>>>> [snip]
> >>>>>>>
> >>>>>>>> Thanks for the test case. I can confirm the problem. I'm not
> >>>>>>>> familiar enough with the current AF_UNIX implementation to
> >>>>>>>> debug this easily. I'd rather spend my time on the new
> >>>>>>>> implementation (on the topic/af_unix branch). It turns out
> >>>>>>>> that your test case fails there too, but in a completely
> >>>>>>>> different way, due to a bug in sendto for datagrams. I'll see
> >>>>>>>> if I can fix that bug and then try again.
> >>>>>>>>
> >>>>>>>> Ken
> >>>>>>>
> >>>>>>> Ok, too bad it wasn't our own code base but good that the "mystery"
> >>>>>>> is verified
> >>>>>>>
> >>>>>>> I finally succeeded in building topic/af_unix (after finding out what
> >>>>>>> version of zlib was needed), but not with -D__WITH_AF_UNIX in
> >>>>>>> CXXFLAGS though and thus I haven't tested it yet
> >>>>>>>
> >>>>>>> Is it sufficient to add the define to the "main" Makefile or do
> >>>>>>> you have to add it to all the Makefiles ? I guess I can find
> >>>>>>> out though
> >>>>>>
> >>>>>> I do it on the configure line, like this:
> >>>>>>
> >>>>>> ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...
> >>>>>>
> >>>>>>> Is topic/af_unix fairly up to date with master branch ?
> >>>>>>
> >>>>>> Yes, I periodically cherry-pick commits from master to topic/af_unix.
> >>>>>> I'll do that again right now.
> >>>>>>
> >>>>>>> Either way, I'll be glad to help out testing topic/af_unix
> >>>>>>
> >>>>>> Thanks!
> >>>>>
> >>>>> I've now pushed a fix for that sendto bug, and your test case runs
> >>>>> without error on the topic/af_unix branch.
> >>>>
> >>>> It seems like the test case does work now with topic/af_unix in
> >>>> blocking mode, but when using non-blocking (with MSG_DONTWAIT)
> >>>> there are some issues I think
> >>>>
> >>>> 1. When the queue is empty with non-blocking recv(), errno is set
> >>>> to EPIPE but I think it should be EAGAIN (or maybe the pipe is
> >>>> getting broken for real for some reason ?)
> >>>>
> >>>> 2. When using non-blocking recv() and no message is written at all,
> >>>> it seems like recv() blocks forever
> >>>>
> >>>> 3. Using non-blocking recv() where the "client" sends fewer than
> >>>> "count" messages, sometimes recv() blocks forever (as well)
> >>>>
> >>>>
> >>>> My naïve analysis of this is that for the first issue (if any) the
> >>>> wrong errno is set and for the second issue it blocks if no
> >>>> sendto() is done after the first recv(), i.e. nothing kicks the
> >>>> "reader thread" in the butt to realise the queue is empty. It is
> >>>> not super clear though what POSIX says about creating blocking
> >>>> descriptors and then using non-blocking flags with recv(), but
> >>>> this works in Linux anyway
> >>>
> >>> The explanation is actually much simpler. In the recv code where a
> >>> bound datagram socket waits for a remote socket to connect to the
> >>> pipe, I simply forgot to handle MSG_DONTWAIT. I've pushed a fix.
> >>> Please retest.
> >>>
> >>> I should add that in all my work so far on the topic/af_unix branch,
> >>> I've thought mainly about stream sockets. So there may still be
> >>> things remaining to be implemented for the datagram case.
> >>
> >> I finally got some time to test topic/af_unix in our "real"
> >> cygwin-application (casual) and unfortunately very few of our
> >> unittests pass
> >>
> >> The symptoms are that there's unexpected eternal blocking, sometimes
> >> there's unexpected EADDRNOTAVAIL, sometimes it looks like some memory
> >> corruption (and core-dumps)
> >>
> >> Of course the memory corruption etc could be our own fault and the
> >> core-dumps might be because of uncaught exceptions
> >>
> >> Needless to say, all unittests pass on Linux, but of course
> >> cygwin-topic/af_unix could act according to the POSIX standard and the
> >> behaviour could be due to our own misinterpretation of how POSIX works
> >
> > More likely it's due to bugs in the topic/af_unix branch. This is
> > still very much a work in progress.
> >
> >> I will try to narrow down the quite complex logic and reproduce the
> >> problems
> >
> > That would be ideal.
> >
> >> If you for some reason want to try it with casual, I'd be glad to help
> >> you out (it should be easier now than last time (but there might be
> >> some documentation missing for Cygwin still))
> >>
> >> https://bitbucket.org/casualcore/
> >
> > I'm going on vacation in a few days, but I might do this when I get back.
> >
> > Thanks for your testing.
>
> By the way, if your code is using datagram sockets, then there are very serious
> problems with our implementation (even aside from the performance issue
> that we've already discussed). For example, I don't know of any reasonable
> way for select to test whether such a socket is ready for writing. We'll need to
> solve that somehow.

If by that you mean whether we're using SOCK_DGRAM, the answer is yes

I tried SOCK_STREAM (and SOCK_SEQPACKET I think) with Cygwin 3.2.0 but that
didn't work at all

As far as I understand, all of these socket types preserve message ordering
on pretty much all implementations, though

I haven't tried SOCK_STREAM and/or SOCK_SEQPACKET with the topic/af_unix
branch. Is that worth a try ?

Best regards,
Kristian

> Ken
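
For reference, the behaviour discussed in this thread (datagrams arriving in
the order they were sent, and a non-blocking recv() on an empty queue failing
with EAGAIN rather than EPIPE) can be sketched roughly as below. This is not
the original test case: check_dgram_order() and BURST are hypothetical names
made up for illustration, and the sketch assumes a system that provides
socketpair() for AF_UNIX/SOCK_DGRAM and the MSG_DONTWAIT flag. It has not been
verified against Cygwin 3.2.0 or the topic/af_unix branch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical burst size: kept small so that a blocking send() never
 * fills the receiver's datagram queue before it is drained. */
enum { BURST = 4 };

/* Hypothetical helper, not taken from the original test case.
 * Sends `count` numbered datagrams over an AF_UNIX/SOCK_DGRAM socket pair
 * and reads them back with MSG_DONTWAIT.
 * Returns  0 if every message arrives intact and in order and the final
 *            recv() on the empty queue fails with EAGAIN/EWOULDBLOCK,
 *         -1 if socketpair() or send() fails,
 *          1 if a message is missing, truncated or out of order,
 *          2 if the empty-queue recv() does not fail with EAGAIN. */
int check_dgram_order(int count)
{
    int fds[2];
    int next_send = 0, next_recv = 0, value = 0, result = 0;

    assert(count > 0);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) == -1)
        return -1;

    while (result == 0 && next_recv < count) {
        /* Queue a small burst of sequence numbers. */
        for (int i = 0; i < BURST && next_send < count && result == 0; ++i) {
            ssize_t n = send(fds[0], &next_send, sizeof next_send, 0);
            if (n != (ssize_t)sizeof next_send)
                result = -1;
            else
                ++next_send;
        }
        /* Drain the burst without blocking and compare sequence numbers. */
        while (result == 0 && next_recv < next_send) {
            ssize_t n = recv(fds[1], &value, sizeof value, MSG_DONTWAIT);
            if (n != (ssize_t)sizeof value || value != next_recv)
                result = 1;
            else
                ++next_recv;
        }
    }

    /* The queue is empty now: a non-blocking recv() should report EAGAIN. */
    if (result == 0) {
        ssize_t n = recv(fds[1], &value, sizeof value, MSG_DONTWAIT);
        if (n != -1 || (errno != EAGAIN && errno != EWOULDBLOCK))
            result = 2;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling check_dgram_order(100) from a small main() and printing the return
value should be enough to compare platforms. On Linux the expected result is
0; any other value on another platform points at one of the cases listed in
the comment above the function.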