From: Ken Brown <kbrown@cornell.edu>
To: sten.kristian.ivarsson@gmail.com, cygwin@cygwin.com
Subject: Re: AF_UNIX/SOCK_DGRAM is dropping messages
Date: Wed, 14 Apr 2021 17:58:08 -0400
Message-ID: <2e64e918-b28b-753e-8337-c757cc62b9bb@cornell.edu>
In-Reply-To: <000701d73151$9c259660$d470c320$@gmail.com>
On 4/14/2021 1:14 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>> Hi Ken
>>>>
>>>>>>>>>>>>> Using AF_UNIX/SOCK_DGRAM with the current version (3.2.0) seems to
>>>>>>>>>>>>> drop messages, or at least they are not received in the same
>>>>>>>>>>>>> order they are sent
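One simple way to detect drops or reordering is to number each datagram and check the sequence on the receiving side. The sketch below is not the original test case. It assumes a single process and a connected socketpair(), and the function name `dgram_order_check` is hypothetical. The receive loop uses MSG_DONTWAIT so that a dropped datagram shows up as a short count rather than a hang.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends `count` datagrams, each carrying its own
 * sequence number, over a connected AF_UNIX/SOCK_DGRAM socketpair, then
 * reads them back. Returns how many arrived in the expected order, or -1
 * on a setup/send error. Keep `count` small: everything runs in one
 * thread, so a full receive queue would make send() block. */
int dgram_order_check(int count)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    for (int i = 0; i < count; i++) {
        if (send(sv[0], &i, sizeof i, 0) != (ssize_t)sizeof i) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }

    int in_order = 0;
    for (int i = 0; i < count; i++) {
        int seq;
        /* MSG_DONTWAIT: a missing datagram ends the loop instead of hanging */
        ssize_t n = recv(sv[1], &seq, sizeof seq, MSG_DONTWAIT);
        if (n != (ssize_t)sizeof seq)
            break;
        if (seq != i) {
            fprintf(stderr, "expected %d, got %d\n", i, seq);
            break;
        }
        in_order++;
    }

    close(sv[0]);
    close(sv[1]);
    return in_order;
}
```

On a correct implementation, `dgram_order_check(8)` should return 8.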
>>>>>>>>>
>>>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>>> Thanks for the test case. I can confirm the problem. I'm not
>>>>>>>>>> familiar enough with the current AF_UNIX implementation to
>>>>>>>>>> debug this easily. I'd rather spend my time on the new
>>>>>>>>>> implementation (on the topic/af_unix branch). It turns out
>>>>>>>>>> that your test case fails there too, but in a completely
>>>>>>>>>> different way, due to a bug in sendto for datagrams. I'll see
>>>>>>>>>> if I can fix that bug and then try again.
>>>>>>>>>>
>>>>>>>>>> Ken
>>>>>>>>>
>>>>>>>>> Ok, too bad it wasn't our own code base, but good that the "mystery"
>>>>>>>>> is confirmed
>>>>>>>>>
>>>>>>>>> I finally succeeded in building topic/af_unix (after finding out which
>>>>>>>>> version of zlib was needed), but not yet with -D__WITH_AF_UNIX added to
>>>>>>>>> CXXFLAGS, so I haven’t tested it yet
>>>>>>>>>
>>>>>>>>> Is it sufficient to add the define to the "main" Makefile, or do
>>>>>>>>> you have to add it to all the Makefiles? I guess I can find
>>>>>>>>> that out, though
>>>>>>>>
>>>>>>>> I do it on the configure line, like this:
>>>>>>>>
>>>>>>>> ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...
>>>>>>>>
>>>>>>>>> Is topic/af_unix fairly up to date with master branch ?
>>>>>>>>
>>>>>>>> Yes, I periodically cherry-pick commits from master to topic/af_unix.
>>>>>>>> I'll do that again right now.
>>>>>>>>
>>>>>>>>> Either way, I'll be glad to help out testing topic/af_unix
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>
>>>>>>> I've now pushed a fix for that sendto bug, and your test case runs
>>>>>>> without error on the topic/af_unix branch.
>>>>>>
>>>>>> It seems like the test case does work now with topic/af_unix in
>>>>>> blocking mode, but when using non-blocking mode (with MSG_DONTWAIT)
>>>>>> there are some issues, I think
>>>>>>
>>>>>> 1. When the queue is empty, a non-blocking recv() sets errno
>>>>>> to EPIPE, but I think it should be EAGAIN (or maybe the pipe really
>>>>>> is getting broken for some reason?)
>>>>>>
>>>>>> 2. When using non-blocking recv() and no message is written at all,
>>>>>> it seems like recv() blocks forever
>>>>>>
>>>>>> 3. When using non-blocking recv() and the "client" sends fewer than
>>>>>> "count" messages, recv() sometimes blocks forever as well
>>>>>>
>>>>>>
>>>>>> My naïve analysis is that for the first issue (if it is one) the
>>>>>> wrong errno is set, and for the second issue it blocks if no
>>>>>> sendto() is done after the first recv(), i.e. nothing kicks the "reader
>>>>>> thread" in the butt to make it realise the queue is empty. It is not
>>>>>> entirely clear what POSIX says about creating blocking descriptors and
>>>>>> then using non-blocking flags with recv(), but this works on Linux anyway
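On Linux, MSG_DONTWAIT applies only to the call it is passed to. The descriptor stays in blocking mode, and its O_NONBLOCK flag is not changed. The sketch below checks both properties on a socketpair; the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a *blocking* AF_UNIX/SOCK_DGRAM socketpair,
 * calls recv() once with MSG_DONTWAIT on the empty receiving end, and
 * reports:
 *   *err_out      - errno from that recv() (0 if it returned data)
 *   *nonblock_out - 1 if O_NONBLOCK is set on the descriptor afterwards,
 *                   0 if not, -1 if fcntl() failed
 * Returns 0 on success, -1 if the socketpair could not be created. */
int dontwait_on_blocking_socket(int *err_out, int *nonblock_out)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    char c;
    ssize_t n = recv(sv[1], &c, 1, MSG_DONTWAIT);
    *err_out = (n == -1) ? errno : 0;

    int flags = fcntl(sv[1], F_GETFL);
    *nonblock_out = (flags == -1) ? -1 : ((flags & O_NONBLOCK) != 0);

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```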
>>>>>
>>>>> The explanation is actually much simpler. In the recv code where a
>>>>> bound datagram socket waits for a remote socket to connect to the
>>>>> pipe, I simply forgot to handle MSG_DONTWAIT. I've pushed a fix.
>>>>> Please retest.
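The case described here, a bound datagram socket that no peer has sent to yet, can be tested directly. This sketch assumes /tmp is writable and uses a made-up socket path. The expected result of recv() with MSG_DONTWAIT is an immediate EAGAIN (or EWOULDBLOCK) rather than a wait for a peer.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: binds an AF_UNIX/SOCK_DGRAM socket to a temporary
 * path (the path scheme is invented for this sketch) and calls
 * recv(..., MSG_DONTWAIT) before anyone has sent to it.
 * Returns the errno of that recv(), 0 if it unexpectedly returned data,
 * or -1 if setup failed. */
int recv_dontwait_on_bound_socket(void)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof addr.sun_path,
             "/tmp/dgram-sketch-%ld.sock", (long)getpid());
    unlink(addr.sun_path);            /* remove a stale socket file, if any */

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }

    char buf[64];
    int result = 0;
    if (recv(fd, buf, sizeof buf, MSG_DONTWAIT) == -1)
        result = errno;               /* captured before close() can change it */

    close(fd);
    unlink(addr.sun_path);
    return result;
}
```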
>>>>>
>>>>> I should add that in all my work so far on the topic/af_unix branch,
>>>>> I've thought mainly about stream sockets. So there may still be
>>>>> things remaining to be implemented for the datagram case.
>>>>
>>>> I finally got some time to test topic/af_unix in our "real" Cygwin
>>>> application (casual), and unfortunately very few of our unit tests pass
>>>>
>>>> The symptoms are unexpected indefinite blocking, sometimes an
>>>> unexpected EADDRNOTAVAIL, and sometimes what looks like memory
>>>> corruption (and core dumps)
>>>>
>>>> Of course the memory corruption etc. could be our own fault, and the
>>>> core dumps might be caused by uncaught exceptions
>>>>
>>>> Needless to say, all unit tests pass on Linux, but of course
>>>> topic/af_unix on Cygwin could be acting according to the POSIX standard,
>>>> and the behaviour could be due to our own misinterpretation of how POSIX works
>>>
>>> More likely it's due to bugs in the topic/af_unix branch. This is
>>> still very much a work in progress.
>>>
>>>> I will try to narrow down the quite complex logic and reproduce the
>>>> problems
>>>
>>> That would be ideal.
>>>
>>>> If for some reason you want to try it with casual, I'd be glad to help
>>>> you out (it should be easier now than last time, but there might still be
>>>> some documentation missing for Cygwin)
>>>>
>>>> https://bitbucket.org/casualcore/
>>>
>>> I'm going on vacation in a few days, but I might do this when I get back.
>>>
>>> Thanks for your testing.
>>
>> By the way, if your code is using datagram sockets, then there are very serious
>> problems with our implementation (even aside from the performance issue
>> that we've already discussed). For example, I don't know of any reasonable
>> way for select to test whether such a socket is ready for writing. We'll need to
>> solve that somehow.
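For reference, the sketch below shows the kind of call an application makes to ask whether a socket is writable. This is the question select() has to answer for AF_UNIX datagram sockets. It uses a zero timeout so the call never blocks, and the helper name is hypothetical.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: asks select() whether `fd` is ready for writing
 * right now (zero timeout, so it never blocks). Returns 1 if writable,
 * 0 if not, -1 on error. For a datagram socket, "writable" roughly means
 * that a send() of a reasonably sized datagram would not block, which
 * depends on the state of the peer's receive queue. */
int writable_now(int fd)
{
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { 0, 0 };
    int r = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (r == -1)
        return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}
```

On Linux, the sending end of a freshly created datagram socketpair is reported as writable.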
>
> If by that you mean whether we're using SOCK_DGRAM, the answer is yes
>
> I tried SOCK_STREAM (and SOCK_SEQPACKET, I think) with Cygwin 3.2.0, but that didn't work at all
>
> As far as I understand, all of these socket types preserve message ordering on pretty much all implementations, though
>
> I haven't tried SOCK_STREAM and/or SOCK_SEQPACKET with the topic/af_unix branch. Is that worth a try?

SOCK_STREAM is definitely worth a try. The implementation of that should be
much more reliable than the implementation of SOCK_DGRAM at the moment. We
don't implement SOCK_SEQPACKET.

Ken
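Switching from SOCK_DGRAM to SOCK_STREAM has one consequence to keep in mind. A stream socket preserves byte order but not message boundaries, so the application has to frame its messages itself. The sketch below uses a 4-byte length prefix. The helper names are hypothetical, and the length is sent in native byte order on the assumption that both ends run on the same machine, which holds for AF_UNIX.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helpers: each message is framed as a native-order
 * uint32_t length followed by that many payload bytes. */

static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {                 /* send() may write only part */
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {                 /* recv() may return only part */
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;                /* peer closed mid-message */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

int send_msg(int fd, const void *data, uint32_t len)
{
    if (write_all(fd, &len, sizeof len) == -1)
        return -1;
    return write_all(fd, data, len);
}

/* Receives one framed message into buf (capacity cap). Returns its
 * length, or -1 on error. If the message is larger than cap, -1 is
 * returned and the stream is left out of sync (sketch only). */
long recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t len;
    if (read_all(fd, &len, sizeof len) == -1 || len > cap)
        return -1;
    if (read_all(fd, buf, len) == -1)
        return -1;
    return (long)len;
}
```

Two messages sent back to back on a SOCK_STREAM socketpair come out as two separate messages, even though the stream itself carries no boundaries.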