From: <sten.kristian.ivarsson@gmail.com>
To: "'Ken Brown'" <kbrown@cornell.edu>, <cygwin@cygwin.com>
Subject: RE: AF_UNIX/SOCK_DGRAM is dropping messages
Date: Wed, 14 Apr 2021 19:14:20 +0200
Message-ID: <000701d73151$9c259660$d470c320$@gmail.com>
In-Reply-To: <4380cdea-c95b-d9dc-50e3-e5adabb73b92@cornell.edu>
> >> Hi Ken
> >>
> >>>>>>>>>>> Using AF_UNIX/SOCK_DGRAM with current version (3.2.0) seems to
> >>>>>>>>>>> drop messages or at least they are not received in the same
> >>>>>>>>>>> order they are sent
> >>>>>>>
> >>>>>>> [snip]
> >>>>>>>
> >>>>>>>> Thanks for the test case. I can confirm the problem. I'm not
> >>>>>>>> familiar enough with the current AF_UNIX implementation to
> >>>>>>>> debug this easily. I'd rather spend my time on the new
> >>>>>>>> implementation (on the topic/af_unix branch). It turns out
> >>>>>>>> that your test case fails there too, but in a completely
> >>>>>>>> different way, due to a bug in sendto for datagrams. I'll see
> >>>>>>>> if I can fix that bug and then try again.
> >>>>>>>>
> >>>>>>>> Ken
> >>>>>>>
> >>>>>>> Ok, too bad it wasn't our own code base but good that the "mystery"
> >>>>>>> is verified
> >>>>>>>
> >>>>>>> I finally succeeded in building topic/af_unix (after finding out what
> >>>>>>> version of zlib was needed), but not with -D__WITH_AF_UNIX added to
> >>>>>>> CXXFLAGS, and thus I haven't tested it yet
> >>>>>>>
> >>>>>>> Is it sufficient to add the define to the "main" Makefile or do
> >>>>>>> you have to add it to all the Makefiles? I guess I can find
> >>>>>>> out though
> >>>>>>
> >>>>>> I do it on the configure line, like this:
> >>>>>>
> >>>>>> ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...
> >>>>>>
> >>>>>>> Is topic/af_unix fairly up to date with master branch ?
> >>>>>>
> >>>>>> Yes, I periodically cherry-pick commits from master to topic/af_unix.
> >>>>>> I'll do that again right now.
> >>>>>>
> >>>>>>> Either way, I'll be glad to help out testing topic/af_unix
> >>>>>>
> >>>>>> Thanks!
> >>>>>
> >>>>> I've now pushed a fix for that sendto bug, and your test case runs
> >>>>> without error on the topic/af_unix branch.
> >>>>
> >>>> It seems like the test case does work now with topic/af_unix in
> >>>> blocking mode, but when using non-blocking (with MSG_DONTWAIT)
> >>>> there are some issues, I think
> >>>>
> >>>> 1. When the queue is empty with non-blocking recv(), errno is set
> >>>> to EPIPE but I think it should be EAGAIN (or maybe the pipe is
> >>>> really getting broken for some reason?)
> >>>>
> >>>> 2. When using non-blocking recv() and no message is written at all,
> >>>> it seems like recv() blocks forever
> >>>>
> >>>> 3. Using non-blocking recv() where the "client" sends fewer than
> >>>> "count" messages, sometimes recv() blocks forever (as well)
> >>>>
> >>>>
> >>>> My naïve analysis of this is that for the first issue (if any) the
> >>>> wrong errno is set and for the second issue it blocks if no
> >>>> sendto() is done after the first recv(), i.e. nothing kicks the
> >>>> "reader thread" in the butt to realise the queue is empty. It is not
> >>>> super clear though what POSIX says about creating blocking descriptors
> >>>> and then using non-blocking flags with recv(), but this works in Linux
> >>>> anyway
> >>>
> >>> The explanation is actually much simpler. In the recv code where a
> >>> bound datagram socket waits for a remote socket to connect to the
> >>> pipe, I simply forgot to handle MSG_DONTWAIT. I've pushed a fix.
> >>> Please retest.
> >>>
> >>> I should add that in all my work so far on the topic/af_unix branch,
> >>> I've thought mainly about stream sockets. So there may still be
> >>> things remaining to be implemented for the datagram case.
> >>
> >> I finally got some time to test topic/af_unix in our "real"
> >> Cygwin application (casual) and unfortunately very few of our unit
> >> tests pass
> >>
> >> The symptoms are that there's unexpected eternal blocking, sometimes
> >> there's unexpected EADDRNOTAVAIL, and sometimes it looks like some
> >> memory corruption (and core dumps)
> >>
> >> Of course the memory corruption etc. could be our own fault and the
> >> core dumps might be because of uncaught exceptions
> >>
> >> Needless to say, all unit tests pass on Linux, but of course
> >> cygwin-topic/af_unix could be acting according to the POSIX standard
> >> and the behaviour could be due to our own misinterpretation of how
> >> POSIX works
> >
> > More likely it's due to bugs in the topic/af_unix branch. This is
> > still very much a work in progress.
> >
> >> I will try to narrow down the quite complex logic and reproduce the
> >> problems
> >
> > That would be ideal.
> >
> >> If you for some reason want to try it with casual, I'd be glad to help
> >> you out (it should be easier now than last time, but there might still
> >> be some documentation missing for Cygwin)
> >>
> >> https://bitbucket.org/casualcore/
> >
> > I'm going on vacation in a few days, but I might do this when I get back.
> >
> > Thanks for your testing.
>
> By the way, if your code is using datagram sockets, then there are very serious
> problems with our implementation (even aside from the performance issue
> that we've already discussed). For example, I don't know of any reasonable
> way for select to test whether such a socket is ready for writing. We'll need to
> solve that somehow.

If by that you mean whether we're using SOCK_DGRAM, the answer is yes.

I tried SOCK_STREAM (and SOCK_SEQPACKET, I think) with Cygwin 3.2.0, but that didn't work at all.

As far as I understand, all of these socket types preserve message ordering on pretty much all implementations, though.

I haven't tried SOCK_STREAM and/or SOCK_SEQPACKET with the topic/af_unix branch. Is that worth a try?

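For reference, here is a minimal sketch of the behaviour we rely on. It is not our actual test case: it assumes a socketpair() within a single process instead of bound sockets with sendto(), and check_dgram_order is just a hypothetical helper name. It sends a few numbered datagrams, expects to read them back in the same order, and then expects a non-blocking recv() on the empty queue to fail with EAGAIN (or EWOULDBLOCK) rather than EPIPE.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends `count` numbered datagrams over an
   AF_UNIX/SOCK_DGRAM socket pair and reads them back.
   Returns 0 if every datagram arrives in sending order and a final
   non-blocking recv() on the empty queue fails with EAGAIN/EWOULDBLOCK;
   returns -1 otherwise. Keep `count` small so that all sends fit in the
   socket buffer before anything is read. */
int check_dgram_order(int count)
{
    int fds[2];
    int result = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) == -1) {
        perror("socketpair");
        return -1;
    }

    /* Queue all messages first; each datagram carries its sequence number. */
    for (int i = 0; i < count && result == 0; ++i) {
        if (send(fds[0], &i, sizeof i, 0) != (ssize_t)sizeof i) {
            perror("send");
            result = -1;
        }
    }

    /* Read them back without blocking, so a dropped message shows up as
       an error instead of a hang. */
    for (int expected = 0; expected < count && result == 0; ++expected) {
        int got = -1;
        ssize_t n = recv(fds[1], &got, sizeof got, MSG_DONTWAIT);
        if (n != (ssize_t)sizeof got || got != expected) {
            fprintf(stderr, "expected %d, got %d (recv returned %ld)\n",
                    expected, got, (long)n);
            result = -1;
        }
    }

    /* The queue is now empty: a non-blocking recv() should report
       EAGAIN or EWOULDBLOCK, not EPIPE, and must not block. */
    if (result == 0) {
        int extra;
        errno = 0;
        ssize_t n = recv(fds[1], &extra, sizeof extra, MSG_DONTWAIT);
        if (n != -1 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
            fprintf(stderr, "empty queue: recv returned %ld, errno %d\n",
                    (long)n, errno);
            result = -1;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling check_dgram_order(8) from a small main() should return 0 on Linux. Since everything here runs in one process over a socket pair, it may well not trigger the problems we see with separate processes and bound sockets.
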
Best regards,
Kristian
> Ken