public inbox for cygwin@cygwin.com
* Unix Domain Socket Limitation?
@ 2020-11-25 21:47 Norton Allen
  2020-11-25 22:27 ` Ken Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-11-25 21:47 UTC (permalink / raw)
  To: cygwin

In my recent tests, it appears as though it is not possible to 
successfully connect via two Unix Domain sockets from one client 
application to one server application.

Specifically, if I create a server that listens on a Unix Domain socket 
and a client that attempts to connect() twice, both seem to lock up. 
This is not the behavior under Linux.

I will be happy to work up a minimal example if it is helpful in 
tracking this down. I wanted to start by asking whether this is a known 
limitation and/or if there is something about the Cygwin implementation 
that makes this sort of thing very difficult.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unix Domain Socket Limitation?
  2020-11-25 21:47 Unix Domain Socket Limitation? Norton Allen
@ 2020-11-25 22:27 ` Ken Brown
       [not found]   ` <4260ad1b-4ab2-fa36-fd0e-7c9644560114@huarp.harvard.edu>
  0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-25 22:27 UTC (permalink / raw)
  To: cygwin

On 11/25/2020 4:47 PM, Norton Allen wrote:
> In my recent tests, it appears as though it is not possible to successfully 
> connect via two Unix Domain sockets from one client application to one server 
> application.
> 
> Specifically, if I create a server which listens on a Unix Domain socket and a 
> client, which attempts to connect() twice, both seem to lock up. This is not the 
> behavior under Linux.
> 
> I will be happy to work up a minimal example if it is helpful in tracking this 
> down. I wanted to start by asking whether this is a known limitation and/or if 
> there is something about the Cygwin implementation that makes this sort of thing 
> very difficult.

A minimal example would be extremely helpful.

Corinna can answer questions about limitations in the current implementation. 
But there is a new implementation under development.  It's in the topic/af_unix 
branch of the Cygwin git repository if you're interested in looking at it.

Corinna began working on this a couple years ago, and I've recently been trying 
to finish it.  I've made quite a bit of progress, but there's still more to do 
and undoubtedly many bugs.  So any test cases you have would be very useful.

Ken


* Re: Unix Domain Socket Limitation?
       [not found]   ` <4260ad1b-4ab2-fa36-fd0e-7c9644560114@huarp.harvard.edu>
@ 2020-11-26 17:13     ` Ken Brown
  2020-11-30 17:19       ` Norton Allen
  0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-26 17:13 UTC (permalink / raw)
  To: Norton Allen; +Cc: cygwin

[Adding the Cygwin list back to the Cc.]

On 11/26/2020 11:27 AM, Norton Allen wrote:
> On 11/25/2020 5:27 PM, Ken Brown via Cygwin wrote:
>> On 11/25/2020 4:47 PM, Norton Allen wrote:
>>> In my recent tests, it appears as though it is not possible to successfully 
>>> connect via two Unix Domain sockets from one client application to one server 
>>> application.
>>>
>>> Specifically, if I create a server which listens on a Unix Domain socket and 
>>> a client, which attempts to connect() twice, both seem to lock up. This is 
>>> not the behavior under Linux.
>>>
>>> I will be happy to work up a minimal example if it is helpful in tracking 
>>> this down. I wanted to start by asking whether this is a known limitation 
>>> and/or if there is something about the Cygwin implementation that makes this 
>>> sort of thing very difficult.
>>
>> A minimal example would be extremely helpful.
>>
>> Corinna can answer questions about limitations in the current implementation. 
>> But there is a new implementation under development. It's in the topic/af_unix 
>> branch of the Cygwin git repository if you're interested in looking at it.
>>
>> Corinna began working on this a couple years ago, and I've recently been 
>> trying to finish it.  I've made quite a bit of progress, but there's still 
>> more to do and undoubtedly many bugs. So any test cases you have would be very 
>> useful. 
> 
> Thanks Ken,
> 
> As it happens, attempting to produce a minimal example suggests my problem may 
> be somewhere else. I think I've worked in most of the features of my application 
> one by one but have not yet revealed a failure.

OK.  But if you ever do have occasion to write small test programs involving 
AF_UNIX sockets, please send them on.  The new AF_UNIX code needs as much 
testing as it can get.

Ken


* Re: Unix Domain Socket Limitation?
  2020-11-26 17:13     ` Ken Brown
@ 2020-11-30 17:19       ` Norton Allen
  2020-11-30 18:14         ` Ken Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-11-30 17:19 UTC (permalink / raw)
  To: Ken Brown; +Cc: cygwin

On 11/26/2020 12:13 PM, Ken Brown wrote:
> [Adding the Cygwin list back to the Cc.]
>
> On 11/26/2020 11:27 AM, Norton Allen wrote:
>> On 11/25/2020 5:27 PM, Ken Brown via Cygwin wrote:
>>> On 11/25/2020 4:47 PM, Norton Allen wrote:
>>>> In my recent tests, it appears as though it is not possible to 
>>>> successfully connect via two Unix Domain sockets from one client 
>>>> application to one server application.
>>>>
>>>> Specifically, if I create a server which listens on a Unix Domain 
>>>> socket and a client, which attempts to connect() twice, both seem 
>>>> to lock up. This is not the behavior under Linux.
>>>>
>>>> I will be happy to work up a minimal example if it is helpful in 
>>>> tracking this down. I wanted to start by asking whether this is a 
>>>> known limitation and/or if there is something about the Cygwin 
>>>> implementation that makes this sort of thing very difficult.
>>>
>>> A minimal example would be extremely helpful.
>>>
>>> Corinna can answer questions about limitations in the current 
>>> implementation. But there is a new implementation under development. 
>>> It's in the topic/af_unix branch of the Cygwin git repository if 
>>> you're interested in looking at it.
>>>
>>> Corinna began working on this a couple years ago, and I've recently 
>>> been trying to finish it.  I've made quite a bit of progress, but 
>>> there's still more to do and undoubtedly many bugs. So any test 
>>> cases you have would be very useful. 
>>
>> Thanks Ken,
>>
>> As it happens, attempting to produce a minimal example suggests my 
>> problem may be somewhere else. I think I've worked in most of the 
>> features of my application one by one but have not yet revealed a 
>> failure.
>
> OK.  But if you ever do have occasion to write small test programs 
> involving AF_UNIX sockets, please send them on.  The new AF_UNIX code 
> needs as much testing as it can get.
>
I have finally put together a start of a minimal example, although it 
seems to require a certain level of complexity before tripping on the 
bug. At the moment, I do not believe the issue is related to having 
multiple sockets between the client and server. I am thinking it is some 
sort of race condition related to non-blocking sockets, since I have 
only observed it when both the client and server are using non-blocking 
sockets.

I have yet to plunge into cygwin1.dll, but I think I have reached that point.

Here is the code: https://github.com/nthallen/cygwin_unix

Since I have only exercised this on my machine, I would be very 
interested to know if it is reproducible on anyone else's.




* Re: Unix Domain Socket Limitation?
  2020-11-30 17:19       ` Norton Allen
@ 2020-11-30 18:14         ` Ken Brown
  2020-11-30 18:26           ` Norton Allen
  0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-30 18:14 UTC (permalink / raw)
  To: cygwin

On 11/30/2020 12:19 PM, Norton Allen wrote:
> On 11/26/2020 12:13 PM, Ken Brown wrote:
>> [Adding the Cygwin list back to the Cc.]
>>
>> On 11/26/2020 11:27 AM, Norton Allen wrote:
>>> On 11/25/2020 5:27 PM, Ken Brown via Cygwin wrote:
>>>> On 11/25/2020 4:47 PM, Norton Allen wrote:
>>>>> In my recent tests, it appears as though it is not possible to successfully 
>>>>> connect via two Unix Domain sockets from one client application to one 
>>>>> server application.
>>>>>
>>>>> Specifically, if I create a server which listens on a Unix Domain socket 
>>>>> and a client, which attempts to connect() twice, both seem to lock up. This 
>>>>> is not the behavior under Linux.
>>>>>
>>>>> I will be happy to work up a minimal example if it is helpful in tracking 
>>>>> this down. I wanted to start by asking whether this is a known limitation 
>>>>> and/or if there is something about the Cygwin implementation that makes 
>>>>> this sort of thing very difficult.
>>>>
>>>> A minimal example would be extremely helpful.
>>>>
>>>> Corinna can answer questions about limitations in the current 
>>>> implementation. But there is a new implementation under development. It's in 
>>>> the topic/af_unix branch of the Cygwin git repository if you're interested 
>>>> in looking at it.
>>>>
>>>> Corinna began working on this a couple years ago, and I've recently been 
>>>> trying to finish it.  I've made quite a bit of progress, but there's still 
>>>> more to do and undoubtedly many bugs. So any test cases you have would be 
>>>> very useful. 
>>>
>>> Thanks Ken,
>>>
>>> As it happens, attempting to produce a minimal example suggests my problem 
>>> may be somewhere else. I think I've worked in most of the features of my 
>>> application one by one but have not yet revealed a failure.
>>
>> OK.  But if you ever do have occasion to write small test programs involving 
>> AF_UNIX sockets, please send them on.  The new AF_UNIX code needs as much 
>> testing as it can get.
>>
> I have finally put together a start of a minimal example, although it seems to 
> require a certain level of complexity before tripping on the bug. At the moment, 
> I do not believe the issue is related to having multiple sockets between the 
> client and server. I am thinking it is some sort of race condition related to 
> non-blocking sockets, since I have only observed it when both the client and 
> server are using non-blocking sockets.
> 
> I have yet to plunge into cygwin1.dll, but I think I have reached that point.
> 
> Here is the code: https://github.com/nthallen/cygwin_unix
> 
> Since I have only exercised this on my machine, I would be very interested to 
> know if it is reproducible on anyone else's.

I can reproduce the hang, and it happens if I use the new AF_UNIX code also. 
But what I'm seeing (at least with the new code) isn't exactly what you describe.

When the server's first select call returns, accept succeeds.  The server then 
calls select a second time, and that call doesn't return.  I haven't checked yet 
to see what's going on in the client, and I may not get to that for a while.

Ken


* Re: Unix Domain Socket Limitation?
  2020-11-30 18:14         ` Ken Brown
@ 2020-11-30 18:26           ` Norton Allen
  2020-11-30 23:19             ` Ken Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-11-30 18:26 UTC (permalink / raw)
  To: Ken Brown, cygwin

On 11/30/2020 1:14 PM, Ken Brown wrote:
> I can reproduce the hang, and it happens if I use the new AF_UNIX code 
> also. But what I'm seeing (at least with the new code) isn't exactly 
> what you describe.
>
> When the server's first select call returns, accept succeeds.  The 
> server then calls select a second time, and that call doesn't return.  
> I haven't checked yet to see what's going on in the client, and I may 
> not get to that for a while.
>
That's good news, and seems to be consistent with my theory that it is 
some sort of race condition that might be particularly sensitive to 
system-specific timing. I am compiling cygwin1.dll now.




* Re: Unix Domain Socket Limitation?
  2020-11-30 18:26           ` Norton Allen
@ 2020-11-30 23:19             ` Ken Brown
  2020-12-01  2:14               ` Norton Allen
  0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-30 23:19 UTC (permalink / raw)
  To: Norton Allen, cygwin

On 11/30/2020 1:26 PM, Norton Allen wrote:
> On 11/30/2020 1:14 PM, Ken Brown wrote:
>> I can reproduce the hang, and it happens if I use the new AF_UNIX code also. 
>> But what I'm seeing (at least with the new code) isn't exactly what you describe.
>>
>> When the server's first select call returns, accept succeeds.  The server then 
>> calls select a second time, and that call doesn't return. I haven't checked 
>> yet to see what's going on in the client, and I may not get to that for a while.
>>
> That's good news, and seems to be consistent with my theory that it is some sort 
> of race condition that might be particularly sensitive to system-specific 
> timing. I am compiling cygwin1.dll now.

Hi Norton,

I think there's a mistake in your test program.  Shouldn't client_pselect() be 
waiting for the socket to be write-ready rather than read-ready?  Here's a quote 
from the POSIX page for connect():

    If the connection cannot be established immediately and O_NONBLOCK is set
    for the file descriptor for the socket, connect() shall fail and set errno
    to [EINPROGRESS], but the connection request shall not be aborted, and the
    connection shall be established asynchronously....

    When the connection has been established asynchronously, pselect(),
    select(), and poll() shall indicate that the file descriptor for the
    socket is ready for writing.

Ken
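
[Editorial sketch] The pattern in the POSIX text quoted above can be
illustrated roughly as follows. This is a minimal sketch, not the code from
the cygwin_unix test program: the helper names connect_nonblocking() and
fail_close() and the timeout parameter are assumptions made for illustration.
After select() reports the socket as write-ready, the sketch also reads
SO_ERROR, since write-readiness alone does not say whether the asynchronous
connect succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: close fd while preserving errno, return -1. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: non-blocking connect to an AF_UNIX path.
 * Waits for WRITE-readiness (not read-readiness) if connect() reports
 * EINPROGRESS. Returns the connected fd, or -1 with errno set. */
int connect_nonblocking(const char *path, int timeout_sec)
{
    struct sockaddr_un addr;
    struct timeval tv;
    fd_set wfds;
    socklen_t len;
    int fd, flags, n, err = 0;

    assert(path != NULL && strlen(path) < sizeof addr.sun_path);
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return fail_close(fd);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;                      /* connected immediately */
    if (errno != EINPROGRESS)
        return fail_close(fd);          /* genuine failure */

    /* Connection is in progress: wait until the socket is writable. */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        return fail_close(fd);

    /* Writable: check whether the connect actually succeeded. */
    len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return fail_close(fd);
    if (err != 0) {
        errno = err;
        return fail_close(fd);
    }
    return fd;
}
```

A caller would use something like `int fd = connect_nonblocking(path, 5);`
and treat a negative return as failure, with errno describing the cause.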


* Re: Unix Domain Socket Limitation?
  2020-11-30 23:19             ` Ken Brown
@ 2020-12-01  2:14               ` Norton Allen
  2020-12-01  2:22                 ` Norton Allen
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-01  2:14 UTC (permalink / raw)
  To: Ken Brown, cygwin

On 11/30/2020 6:19 PM, Ken Brown wrote:
> On 11/30/2020 1:26 PM, Norton Allen wrote:
>> On 11/30/2020 1:14 PM, Ken Brown wrote:
>>> I can reproduce the hang, and it happens if I use the new AF_UNIX 
>>> code also. But what I'm seeing (at least with the new code) isn't 
>>> exactly what you describe.
>>>
>>> When the server's first select call returns, accept succeeds. The 
>>> server then calls select a second time, and that call doesn't 
>>> return. I haven't checked yet to see what's going on in the client, 
>>> and I may not get to that for a while.
>>>
>> That's good news, and seems to be consistent with my theory that it 
>> is some sort of race condition that might be particularly sensitive 
>> to system-specific timing. I am compiling cygwin1.dll now.
>
> Hi Norton,
>
> I think there's a mistake in your test program.  Shouldn't 
> client_pselect() be waiting for the socket to be write-ready rather 
> than read-ready?  Here's a quote from the Posix page for 'connect':
>
> If the connection cannot be established immediately and O_NONBLOCK is 
> set for the file descriptor for the socket, connect() shall fail and 
> set errno to [EINPROGRESS], but the connection request shall not be 
> aborted, and the connection shall be established asynchronously....
>
> When the connection has been established asynchronously, pselect(), 
> select(), and poll() shall indicate that the file descriptor for the 
> socket is ready for writing.
>
Yes, you are correct. In fact I had already fixed that bug on another 
branch, then forgot to update it on this one. I also noticed another bug 
in calculating width. Now I am not getting the blocking behavior but 
instead getting the wrong bits set in select(). I think I'd better pick 
this up in the morning when I am thinking straight!
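
[Editorial sketch] The message does not say what the width bug was. As a
general illustration only: the first argument to select() must be one more
than the highest-numbered descriptor in any of the sets. The helper name
build_fdset() below is an assumption, not something from the test program.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: fill *set with the descriptors in fds[] and
 * return the matching "width" (nfds) argument for select(), that is,
 * the highest descriptor plus one. Returns 0 for an empty list. */
int build_fdset(const int *fds, size_t count, fd_set *set)
{
    size_t i;
    int maxfd = -1;

    assert(fds != NULL && set != NULL);
    FD_ZERO(set);
    for (i = 0; i < count; i++) {
        /* select() cannot handle descriptors >= FD_SETSIZE */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

The return value would be passed directly as the first argument of select()
or pselect(); passing the highest descriptor itself, without the +1, leaves
that descriptor unchecked.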




* Re: Unix Domain Socket Limitation?
  2020-12-01  2:14               ` Norton Allen
@ 2020-12-01  2:22                 ` Norton Allen
  2020-12-02 17:30                   ` Norton Allen
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-01  2:22 UTC (permalink / raw)
  To: Ken Brown, cygwin

On 11/30/2020 9:14 PM, Norton Allen wrote:
> On 11/30/2020 6:19 PM, Ken Brown wrote:
>> On 11/30/2020 1:26 PM, Norton Allen wrote:
>>> On 11/30/2020 1:14 PM, Ken Brown wrote:
>>>> I can reproduce the hang, and it happens if I use the new AF_UNIX 
>>>> code also. But what I'm seeing (at least with the new code) isn't 
>>>> exactly what you describe.
>>>>
>>>> When the server's first select call returns, accept succeeds. The 
>>>> server then calls select a second time, and that call doesn't 
>>>> return. I haven't checked yet to see what's going on in the client, 
>>>> and I may not get to that for a while.
>>>>
>>> That's good news, and seems to be consistent with my theory that it 
>>> is some sort of race condition that might be particularly sensitive 
>>> to system-specific timing. I am compiling cygwin1.dll now.
>>
>> Hi Norton,
>>
>> I think there's a mistake in your test program.  Shouldn't 
>> client_pselect() be waiting for the socket to be write-ready rather 
>> than read-ready?  Here's a quote from the Posix page for 'connect':
>>
>> If the connection cannot be established immediately and O_NONBLOCK is 
>> set for the file descriptor for the socket, connect() shall fail and 
>> set errno to [EINPROGRESS], but the connection request shall not be 
>> aborted, and the connection shall be established asynchronously....
>>
>> When the connection has been established asynchronously, pselect(), 
>> select(), and poll() shall indicate that the file descriptor for the 
>> socket is ready for writing.
>>
> Yes, you are correct. In fact I had already fixed that bug on another 
> branch, then forgot to update it on this one. I also noticed another 
> bug in calculating width. Now I am not getting the blocking behavior 
> but instead getting the wrong bits set in select(). I think I'd better 
> pick this up in the morning when I am thinking straight!

Yeah, so now the example no longer blocks for me. Unfortunately these 
bugs are not present in my application, so I will need to keep working 
on this.




* Re: Unix Domain Socket Limitation?
  2020-12-01  2:22                 ` Norton Allen
@ 2020-12-02 17:30                   ` Norton Allen
  2020-12-04  1:11                     ` Ken Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-02 17:30 UTC (permalink / raw)
  To: Ken Brown, cygwin

On 11/30/2020 9:22 PM, Norton Allen wrote:
> Yeah, so now the example no longer blocks for me. Unfortunately these 
> bugs are not present in my application, so I will need to keep working 
> on this.
>

After paring the main application down and back up, I finally narrowed 
in on the condition that was causing this blocking behavior. The issue 
arises when a client connect()s twice to the same server with 
non-blocking unix-domain sockets before calling select().

There are a few pieces to this. With the client configured to connect() 
just once, I can see that the server's select() returns as soon as the 
client calls connect(), but then the server's accept() blocks until the 
client calls select(). That is not proper non-blocking behavior, but it 
appears that the implementation under Cygwin does require that client 
and server both be communicating synchronously to accomplish the 
connect() operation.

I tried running this under Ubuntu 16.04 and found that connect() 
succeeded immediately, so no subsequent select() is required, and there 
does not appear to be a possibility for this collision. That proves to 
hold true even if the server is not waiting in select() to process the 
connect() with accept().
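
[Editorial sketch] The client-side sequence described above (two non-blocking
connect() calls issued back to back, and only then a select()) looks roughly
like the following. This is not the rapid_connects source; the names
start_connect() and connect_twice() and the timeout parameter are assumptions
for illustration. On Linux both connects are expected to complete
immediately, so the select() loop is normally skipped there.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking AF_UNIX stream socket and
 * start connecting it. Sets *pending to 1 if connect() gave EINPROGRESS. */
static int start_connect(const char *path, int *pending)
{
    struct sockaddr_un addr;
    int fd, flags;

    assert(path != NULL && strlen(path) < sizeof addr.sun_path);
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    *pending = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (errno != EINPROGRESS) {
            close(fd);
            return -1;
        }
        *pending = 1;
    }
    return fd;
}

/* Issue both connects first, then wait in select() for whichever is still
 * in progress. Returns 0 with fds[0] and fds[1] connected, else -1. */
int connect_twice(const char *path, int fds[2], int timeout_sec)
{
    int pending[2] = {0, 0};
    int i;

    fds[0] = start_connect(path, &pending[0]);
    if (fds[0] < 0)
        return -1;
    fds[1] = start_connect(path, &pending[1]);
    if (fds[1] < 0) {
        close(fds[0]);
        return -1;
    }

    while (pending[0] || pending[1]) {
        struct timeval tv = { timeout_sec, 0 };
        fd_set wfds;
        int maxfd = -1;

        FD_ZERO(&wfds);
        for (i = 0; i < 2; i++) {
            if (pending[i]) {
                FD_SET(fds[i], &wfds);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
        }
        if (select(maxfd + 1, NULL, &wfds, NULL, &tv) <= 0)
            goto fail;                  /* timeout or error */
        for (i = 0; i < 2; i++) {
            int err = 0;
            socklen_t len = sizeof err;

            if (!pending[i] || !FD_ISSET(fds[i], &wfds))
                continue;
            if (getsockopt(fds[i], SOL_SOCKET, SO_ERROR, &err, &len) < 0
                || err != 0)
                goto fail;
            pending[i] = 0;             /* this connect has completed */
        }
    }
    return 0;

fail:
    close(fds[0]);
    close(fds[1]);
    return -1;
}
```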

A workaround for this issue may be to keep the socket blocking until 
after connect().

I have pushed the new minimal example program, 'rapid_connects', to 
https://github.com/nthallen/cygwin_unix

The server is run like before as:

    $ ./rapid_connects server

The client can be run in two different modes. To connect with just one 
socket:

    $ ./rapid_connects client1

To connect with two:

    $ ./rapid_connects client2

My immediate strategy will be to develop a workaround for my project. 
Having spent a day inside cygwin1.dll, I can see that I have a steep 
learning curve to make much of a contribution there.




* Re: Unix Domain Socket Limitation?
  2020-12-02 17:30                   ` Norton Allen
@ 2020-12-04  1:11                     ` Ken Brown
  2020-12-04 13:51                       ` Norton Allen
  0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-12-04  1:11 UTC (permalink / raw)
  To: Norton Allen, cygwin

On 12/2/2020 12:30 PM, Norton Allen wrote:
> On 11/30/2020 9:22 PM, Norton Allen wrote:
>> Yeah, so now the example no longer blocks for me. Unfortunately these bugs are 
>> not present in my application, so I will need to keep working on this.
>>
> 
> After paring the main application down and back up, I finally narrowed in on the 
> condition that was causing this blocking behavior. The issue arises when a 
> client connect()s twice to the same server with non-blocking unix-domain sockets 
> before calling select().
> 
> There are a few pieces to this. With the client configured to connect() just 
> once, I can see that the server's select() returns as soon as the client calls 
> connect(), but then the server's accept() blocks until the client calls 
> select(). That is not proper non-blocking behavior, but it appears that the 
> implementation under Cygwin does require that client and server both be 
> communicating synchronously to accomplish the connect() operation.
> 
> I tried running this under Ubuntu 16.04 and found that connect() succeeded 
> immediately, so no subsequent select() is required, and there does not appear to 
> be a possibility for this collision. That proves to hold true even if the server 
> is not waiting in select() to process the connect() with accept().
> 
> A workaround for this issue may be to keep the socket blocking until after 
> connect().
> 
> I have pushed the new minimal example program,  'rapid_connects' to 
> https://github.com/nthallen/cygwin_unix
> 
> The server is run like before as:
> 
>     $ ./rapid_connects server
> 
> The client can be run in two different modes. To connect with just one socket:
> 
>     $ ./rapid_connects client1
> 
> To connect with two:
> 
>     $ ./rapid_connects client2
> 
> My immediate strategy will be to develop a workaround for my project. Having 
> spent a day inside cygwin1.dll, I can see that I have a steep learning curve to 
> make much of a contribution there.

I'm traveling at the moment and unable to do any testing, but I wonder if you're 
bumping into an issue that was just discussed on the cygwin-developers list:

   https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html

A different workaround is described there.

If it's the same issue, then I don't think it will happen with the new AF_UNIX 
implementation.  More in a few days.

Ken


* Re: Unix Domain Socket Limitation?
  2020-12-04  1:11                     ` Ken Brown
@ 2020-12-04 13:51                       ` Norton Allen
  2020-12-05 23:52                         ` Ken Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-04 13:51 UTC (permalink / raw)
  To: Ken Brown, cygwin

On 12/3/2020 8:11 PM, Ken Brown wrote:
> On 12/2/2020 12:30 PM, Norton Allen wrote:
>> On 11/30/2020 9:22 PM, Norton Allen wrote:
>>> Yeah, so now the example no longer blocks for me. Unfortunately 
>>> these bugs are not present in my application, so I will need to keep 
>>> working on this.
>>>
>>
>> After paring the main application down and back up, I finally 
>> narrowed in on the condition that was causing this blocking behavior. 
>> The issue arises when a client connect()s twice to the same server 
>> with non-blocking unix-domain sockets before calling select().
>>
>> There are a few pieces to this. With the client configured to 
>> connect() just once, I can see that the server's select() returns as 
>> soon as the client calls connect(), but then the server's accept() 
>> blocks until the client calls select(). That is not proper 
>> non-blocking behavior, but it appears that the implementation under 
>> Cygwin does require that client and server both be communicating 
>> synchronously to accomplish the connect() operation.
>>
>> I tried running this under Ubuntu 16.04 and found that connect() 
>> succeeded immediately, so no subsequent select() is required, and 
>> there does not appear to be a possibility for this collision. That 
>> proves to hold true even if the server is not waiting in select() to 
>> process the connect() with accept().
>>
>> A workaround for this issue may be to keep the socket blocking until 
>> after connect().
>>
>> I have pushed the new minimal example program,  'rapid_connects' to 
>> https://github.com/nthallen/cygwin_unix
>>
>> The server is run like before as:
>>
>>     $ ./rapid_connects server
>>
>> The client can be run in two different modes. To connect with just 
>> one socket:
>>
>>     $ ./rapid_connects client1
>>
>> To connect with two:
>>
>>     $ ./rapid_connects client2
>>
>> My immediate strategy will be to develop a workaround for my project. 
>> Having spent a day inside cygwin1.dll, I can see that I have a steep 
>> learning curve to make much of a contribution there.
>
> I'm traveling at the moment and unable to do any testing, but I wonder 
> if you're bumping into an issue that was just discussed on the 
> cygwin-developers list:
>
> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
>
> A different workaround is described there.
>
> If it's the same issue, then I don't think it will happen with the new 
> AF_UNIX implementation.  More in a few days.
>
It does seem related.

A workaround that works for me is to do a blocking connect() and 
switch to non-blocking once that completes. In my application, the 
connect() generally occurs once at the beginning of a run, so blocking 
for a few milliseconds does not impact responsiveness.
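
[Editorial sketch] The workaround described above could look roughly like
this. It is a minimal sketch under stated assumptions, not the code from the
author's application: the helper name connect_then_nonblock() is
hypothetical. The socket is left in its default blocking mode for connect()
and O_NONBLOCK is set only after the connection is established.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: blocking connect() to an AF_UNIX path, then switch
 * the connected socket to non-blocking mode. Returns the fd, or -1. */
int connect_then_nonblock(const char *path)
{
    struct sockaddr_un addr;
    int fd, flags;

    assert(path != NULL && strlen(path) < sizeof addr.sun_path);
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The socket is still blocking here, so connect() either completes
     * or fails; there is no EINPROGRESS state to wait out in select(). */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* Only now make the socket non-blocking for subsequent I/O. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The trade-off is the one noted in the message: the caller blocks for the
duration of connect(), which is acceptable when connections are made once at
startup but may not suit a program that connects while servicing other
descriptors.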




* Re: Unix Domain Socket Limitation?
  2020-12-04 13:51                       ` Norton Allen
@ 2020-12-05 23:52                         ` Ken Brown
  2020-12-06 17:17                           ` Norton Allen
  0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-12-05 23:52 UTC (permalink / raw)
  To: Norton Allen, cygwin

On 12/4/2020 8:51 AM, Norton Allen wrote:
> On 12/3/2020 8:11 PM, Ken Brown wrote:
>> On 12/2/2020 12:30 PM, Norton Allen wrote:
>>> On 11/30/2020 9:22 PM, Norton Allen wrote:
>>>> Yeah, so now the example no longer blocks for me. Unfortunately these bugs 
>>>> are not present in my application, so I will need to keep working on this.
>>>>
>>>
>>> After paring the main application down and back up, I finally narrowed in on 
>>> the condition that was causing this blocking behavior. The issue arises when 
>>> a client connect()s twice to the same server with non-blocking unix-domain 
>>> sockets before calling select().
>>>
>>> There are a few pieces to this. With the client configured to connect() just 
>>> once, I can see that the server's select() returns as soon as the client 
>>> calls connect(), but then the server's accept() blocks until the client calls 
>>> select(). That is not proper non-blocking behavior, but it appears that the 
>>> implementation under Cygwin does require that client and server both be 
>>> communicating synchronously to accomplish the connect() operation.
>>>
>>> I tried running this under Ubuntu 16.04 and found that connect() succeeded 
>>> immediately, so no subsequent select() is required, and there does not appear 
>>> to be a possibility for this collision. That proves to hold true even if the 
>>> server is not waiting in select() to process the connect() with accept().
>>>
>>> A workaround for this issue may be to keep the socket blocking until after 
>>> connect().
>>>
>>> I have pushed the new minimal example program,  'rapid_connects' to 
>>> https://github.com/nthallen/cygwin_unix
>>>
>>> The server is run like before as:
>>>
>>>     $ ./rapid_connects server
>>>
>>> The client can be run in two different modes. To connect with just one socket:
>>>
>>>     $ ./rapid_connects client1
>>>
>>> To connect with two:
>>>
>>>     $ ./rapid_connects client2
>>>
>>> My immediate strategy will be to develop a workaround for my project. Having 
>>> spent a day inside cygwin1.dll, I can see that I have a steep learning curve 
>>> to make much of a contribution there.
>>
>> I'm traveling at the moment and unable to do any testing, but I wonder if 
>> you're bumping into an issue that was just discussed on the cygwin-developers 
>> list:
>>
>> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
>>
>> A different workaround is described there.
>>
>> If it's the same issue, then I don't think it will happen with the new AF_UNIX 
>> implementation.  More in a few days.
>>
> It does seem related.
> 
> A work around that is working for me is to do a blocking connect() and switch to 
> non-blocking when that completes. In my application, the connect() generally 
> occurs once at the beginning of a run, so blocking for a few milliseconds does 
> not impact responsiveness.

For the record, I can confirm that (a) the problem occurs with the current 
AF_UNIX implementation and (b) it does not occur with the new implementation (on 
the topic/af_unix branch).  With both client1 and client2, I see "connect() 
apparently succeeded immediately" using the new implementation.

The new implementation is not yet ready for prime time, but with any luck it 
might be ready within a few months.

Ken


* Re: Unix Domain Socket Limitation?
  2020-12-05 23:52                         ` Ken Brown
@ 2020-12-06 17:17                           ` Norton Allen
  2020-12-06 22:32                             ` Ken Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-06 17:17 UTC (permalink / raw)
  To: Ken Brown, cygwin

On 12/5/2020 6:52 PM, Ken Brown wrote:
> On 12/4/2020 8:51 AM, Norton Allen wrote:
>> On 12/3/2020 8:11 PM, Ken Brown wrote:
>>>
>>> I'm traveling at the moment and unable to do any testing, but I 
>>> wonder if you're bumping into an issue that was just discussed on 
>>> the cygwin-developers list:
>>>
>>> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html 
>>>
>>>
>>> A different workaround is described there.
>>>
>>> If it's the same issue, then I don't think it will happen with the 
>>> new AF_UNIX implementation.  More in a few days.
>>>
>> It does seem related.
>>
>> A work around that is working for me is to do a blocking connect() 
>> and switch to non-blocking when that completes. In my application, 
>> the connect() generally occurs once at the beginning of a run, so 
>> blocking for a few milliseconds does not impact responsiveness.
>
> For the record, I can confirm that (a) the problem occurs with the 
> current AF_UNIX implementation and (b) it does not occur with the new 
> implementation (on the topic/af_unix branch).  With both client1 and 
> client2, I see "connect() apparently succeeded immediately" using the 
> new implementation.
>
> The new implementation is not yet ready for prime time, but with any 
> luck it might be ready within a few months.
>
That sounds great, and exactly like the behavior under Linux. I'd 
certainly be happy to test the new implementation as it gets closer, and 
also happy to expand or improve the test apps to cover a wider range of 
functionality and/or usability (e.g. run both client and server via a 
fork.) Feel free to let me know what would be particularly useful.




* Re: Unix Domain Socket Limitation?
  2020-12-06 17:17                           ` Norton Allen
@ 2020-12-06 22:32                             ` Ken Brown
  0 siblings, 0 replies; 15+ messages in thread
From: Ken Brown @ 2020-12-06 22:32 UTC (permalink / raw)
  To: Norton Allen, cygwin

On 12/6/2020 12:17 PM, Norton Allen wrote:
> On 12/5/2020 6:52 PM, Ken Brown wrote:
>> On 12/4/2020 8:51 AM, Norton Allen wrote:
>>> On 12/3/2020 8:11 PM, Ken Brown wrote:
>>>>
>>>> I'm traveling at the moment and unable to do any testing, but I wonder if 
>>>> you're bumping into an issue that was just discussed on the 
>>>> cygwin-developers list:
>>>>
>>>> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
>>>>
>>>> A different workaround is described there.
>>>>
>>>> If it's the same issue, then I don't think it will happen with the new 
>>>> AF_UNIX implementation.  More in a few days.
>>>>
>>> It does seem related.
>>>
>>> A work around that is working for me is to do a blocking connect() and switch 
>>> to non-blocking when that completes. In my application, the connect() 
>>> generally occurs once at the beginning of a run, so blocking for a few 
>>> milliseconds does not impact responsiveness.
>>
>> For the record, I can confirm that (a) the problem occurs with the current 
>> AF_UNIX implementation and (b) it does not occur with the new implementation 
>> (on the topic/af_unix branch).  With both client1 and client2, I see 
>> "connect() apparently succeeded immediately" using the new implementation.
>>
>> The new implementation is not yet ready for prime time, but with any luck it 
>> might be ready within a few months.
>>
> That sounds great, and exactly like the behavior under Linux. I'd certainly be 
> happy to test the new implementation as it gets closer, and also happy to expand 
> or improve the test apps to cover a wider range of functionality and/or 
> usability (e.g. run both client and server via a fork.) Feel free to let me know 
> what would be particularly useful.

Thanks.  I'll take you up on that when the branch is in slightly better shape.

Ken

