public inbox for cygwin-developers@cygwin.com
From: Ken Brown <kbrown@cornell.edu>
To: cygwin-developers@cygwin.com
Subject: Re: The unreliability of AF_UNIX datagram sockets
Date: Thu, 29 Apr 2021 12:44:48 -0400
Message-ID: <16e1d55e-15ea-6c0e-04e4-aa6cb2c0c1bd@cornell.edu>
In-Reply-To: <YIrLQezXLUnEo8BS@calimero.vinschen.de>

On 4/29/2021 11:05 AM, Corinna Vinschen wrote:
> On Apr 29 10:38, Ken Brown wrote:
>> On 4/29/2021 7:05 AM, Corinna Vinschen wrote:
>>> On Apr 27 11:47, Ken Brown wrote:
>>>> I'm willing to start working on the switch to native AF_UNIX sockets.  (I'm
>>>> frankly getting bored with working on the pipe implementation, and this
>>>             ^^^^^^^^^^^^^
>>> I'm not really surprised, Windows pipe semantics are annoying.
>>>
>>>> doesn't really seem like it has much of a future.)  But I'd like to be
>>>> confident that there's a good solution to the datagram problem before I
>>>> invest too much time in this.
>>>
>>> Summary of our short discussion on IRC:
>>>
>>> - Switching to SOCK_STREAM under the hood adds the necessary reliability
>>>     but breaks DGRAM message boundaries.
>>>
>>> - There appears to be no way in Winsock to handle send buffer overflow
>>>     gracefully so that user space knows that messages have been discarded.
>>>     Strangely enough there's a SIO_ENABLE_CIRCULAR_QUEUEING ioctl, but that
>>>     just makes things worse, by dropping older messages in favor of the
>>>     newer ones :-P
>>>
>>> I think it should be possible to switch to STREAM sockets to emulate
>>> DGRAM semantics.  Our advantage is that this is all local.  For all
>>> practical purposes there's no chance data gets really lost.  Windows has
>>> an almost indefinite send buffer.
>>>
>>> If you look at the STREAM as a kind of tunneling layer for getting DGRAM
>>> messages over the (local) line, the DGRAM content could simply be
>>> encapsulated in a tunnel packet or frame, basically the same way the
>>> new, boring AF_UNIX code does it.  A DGRAM message encapsulated in a
>>> STREAM message always has a header which at least contains the length of
>>> the actual DGRAM message.  So when the peer reads from the socket, it
>>> always only reads the header until it's complete.  Then it knows how
>>> much payload is expected and then it reads until the payload has been
>>> received.
>>
>> This should work.  We could even use MSG_PEEK to read the header and then
>> MSG_WAITALL to read the whole packet.
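A minimal sketch of the header-plus-payload framing discussed above, written in Python rather than Winsock C so it can be run on any POSIX host. The 4-byte length header and the names `send_dgram`, `recv_dgram` and `demo` are assumptions made for illustration; nothing here is the actual Cygwin implementation.

```python
import socket
import struct

# Hypothetical frame header: a 4-byte big-endian payload length.
HEADER = struct.Struct("!I")


def send_dgram(sock, payload):
    """Send one datagram as a single length-prefixed frame on a stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_dgram(sock):
    """Receive exactly one framed datagram, preserving its boundary."""
    # Peek at the header without consuming it, waiting until it is complete.
    hdr = sock.recv(HEADER.size, socket.MSG_PEEK | socket.MSG_WAITALL)
    if len(hdr) < HEADER.size:
        raise ConnectionError("peer closed before a full header arrived")
    (length,) = HEADER.unpack(hdr)
    # Now consume header and payload together in one blocking read.
    frame = sock.recv(HEADER.size + length, socket.MSG_WAITALL)
    if len(frame) < HEADER.size + length:
        raise ConnectionError("peer closed in the middle of a frame")
    return frame[HEADER.size:]


def demo():
    # socketpair() yields connected stream sockets (AF_UNIX where available).
    a, b = socket.socketpair()
    try:
        send_dgram(a, b"hello")
        send_dgram(a, b"")        # zero-length datagrams survive the framing
        send_dgram(a, b"world!")
        return [recv_dgram(b) for _ in range(3)]
    finally:
        a.close()
        b.close()
```

Because the receiver never reads past the end of a frame, back-to-back messages on the stream come out as separate datagrams.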
>>
>> I'd be happy to try to implement this.  Do you want to create a branch
>> (maybe topic/dgram or something like that) for working on it?
> 
> You can create topic branches as you see fit, don't worry about it.
> 
>>> Ultimately this would even allow us to emulate DGRAMs when using native
>>> Windows AF_UNIX sockets.  Then we'd just have to keep the old code for
>>> backward compat.
>>
>> Yep.
>>
>>> There's just one problem with this entire switch to non-pipes: Sending
>>> descriptors between peers running under different accounts requires being
>>> able to switch the user context.  You need this if the sender is a
>>> non-admin account to call ImpersonateNamedPipeClient in the receiver.
>>> So we might need to keep the pipes even if just for the purpose of being
>>> able to call ImpersonateNamedPipeClient...
>>>
>>>
>>> Thoughts?
>>
>> Sounds great.  Thanks.
> 
> Don't start just yet.
> 
> I'm still not quite sure if that's really the way to go.  As I see it we
> still have something to discuss here.
> 
> For one thing, using native AF_UNIX sockets will split our user base
> into two.  Those who are not using a recent enough Windows will get the
> old code and no descriptor passing.  However, if an application has been
> built with descriptor passing, it won't work for those running older
> Windows versions.  I don't think we want that for the distro, or do we?

Good point.  Sounds like a nightmare.

> Next problem... implementing actual STREAM sockets.  Even using native
> AF_UNIX sockets, these, too, would have to encapsulate the actual
> payload because of the ancillary data we want to send with them.
> Whether or not we use native AF_UNIX sockets, they won't be compatible
> with native applications...
> 
> So maybe we should really think hard about the alternative
> implementation using POSIX message queues, I guess.  And *if* we do
> that, this should be used likewise for STREAM as for DGRAM sockets, so
> the code is easier to maintain.  Obvious advantage: No problem with
> older OS versions.  And maybe it's even dirt easy to implement in
> comparison with using other methods, because the transport mechanism
> is already in place.

Yes, I don't think it should be too hard.  The one thing I can think of that's 
missing is a facility for doing a partial read of a message on the message 
queue.  (This would be needed for a recv call on a STREAM socket, in which the 
buffer is smaller than the payload of the next message on the queue.)  But this 
should be straightforward to implement.

Alternatively, I guess we could read the whole message and store the excess in a 
readahead buffer.
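
A rough sketch of the readahead idea, assuming a plain in-memory deque as a stand-in for the message queue. The class name `StreamOverMessages` and its methods are hypothetical and only illustrate how a STREAM-style recv could return part of a message and keep the rest for the next call.

```python
from collections import deque


class StreamOverMessages:
    """Byte-stream reads layered on top of a queue of whole messages."""

    def __init__(self):
        self._queue = deque()    # stand-in for the message queue (assumption)
        self._readahead = b""    # excess bytes left over from the last message

    def send(self, data):
        """Enqueue one whole message."""
        self._queue.append(bytes(data))

    def recv(self, bufsize):
        """Return at most bufsize bytes, as a stream socket would."""
        if not self._readahead:
            if not self._queue:
                return b""       # a real socket would block or report EOF here
            # Read the whole next message; the caller may want only part of it.
            self._readahead = self._queue.popleft()
        chunk = self._readahead[:bufsize]
        self._readahead = self._readahead[bufsize:]
        return chunk
```

For example, after `send(b"abcdef")` a `recv(4)` returns `b"abcd"` and the following `recv(4)` returns the stored `b"ef"` before the next message is touched.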

> What's missing is the ImpersonateNamedPipeClient stuff (but that's not
> different from using native AF_UNIX) and reflections about the permission
> handling.

On 4/29/2021 11:18 AM, Corinna Vinschen wrote:
> While searching the net I found this additional gem of information:
>
> Native AF_UNIX sockets don't support abstract sockets.  You must bind to
> a valid path, so you always have a visible file in the filesystem.
> Discussed here: https://github.com/microsoft/WSL/issues/4240
>
> We could work around that with our POSIX unlink semantics, probably,
> but it's YA downside

Agreed.  The more features that are missing from native AF_UNIX sockets, the 
less appealing they become.

Concerning abstract sockets, would we still have an issue if we used message 
queues?  Wouldn't there be a visible file under /dev/mqueue?  Or is there a way 
around that?

Ken
