* Unix Domain Socket Limitation?
@ 2020-11-25 21:47 Norton Allen
2020-11-25 22:27 ` Ken Brown
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-11-25 21:47 UTC (permalink / raw)
To: cygwin
In my recent tests, it appears as though it is not possible to
successfully connect via two Unix Domain sockets from one client
application to one server application.
Specifically, if I create a server which listens on a Unix Domain socket
and a client, which attempts to connect() twice, both seem to lock up.
This is not the behavior under Linux.
I will be happy to work up a minimal example if it is helpful in
tracking this down. I wanted to start by asking whether this is a known
limitation and/or if there is something about the Cygwin implementation
that makes this sort of thing very difficult.
* Re: Unix Domain Socket Limitation?
2020-11-25 21:47 Unix Domain Socket Limitation? Norton Allen
@ 2020-11-25 22:27 ` Ken Brown
[not found] ` <4260ad1b-4ab2-fa36-fd0e-7c9644560114@huarp.harvard.edu>
0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-25 22:27 UTC (permalink / raw)
To: cygwin
On 11/25/2020 4:47 PM, Norton Allen wrote:
> In my recent tests, it appears as though it is not possible to successfully
> connect via two Unix Domain sockets from one client application to one server
> application.
>
> Specifically, if I create a server which listens on a Unix Domain socket and a
> client, which attempts to connect() twice, both seem to lock up. This is not the
> behavior under Linux.
>
> I will be happy to work up a minimal example if it is helpful in tracking this
> down. I wanted to start by asking whether this is a known limitation and/or if
> there is something about the Cygwin implementation that makes this sort of thing
> very difficult.
A minimal example would be extremely helpful.
Corinna can answer questions about limitations in the current implementation.
But there is a new implementation under development. It's in the topic/af_unix
branch of the Cygwin git repository if you're interested in looking at it.
Corinna began working on this a couple years ago, and I've recently been trying
to finish it. I've made quite a bit of progress, but there's still more to do
and undoubtedly many bugs. So any test cases you have would be very useful.
Ken
* Re: Unix Domain Socket Limitation?
[not found] ` <4260ad1b-4ab2-fa36-fd0e-7c9644560114@huarp.harvard.edu>
@ 2020-11-26 17:13 ` Ken Brown
2020-11-30 17:19 ` Norton Allen
0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-26 17:13 UTC (permalink / raw)
To: Norton Allen; +Cc: cygwin
[Adding the Cygwin list back to the Cc.]
On 11/26/2020 11:27 AM, Norton Allen wrote:
> On 11/25/2020 5:27 PM, Ken Brown via Cygwin wrote:
>> On 11/25/2020 4:47 PM, Norton Allen wrote:
>>> In my recent tests, it appears as though it is not possible to successfully
>>> connect via two Unix Domain sockets from one client application to one server
>>> application.
>>>
>>> Specifically, if I create a server which listens on a Unix Domain socket and
>>> a client, which attempts to connect() twice, both seem to lock up. This is
>>> not the behavior under Linux.
>>>
>>> I will be happy to work up a minimal example if it is helpful in tracking
>>> this down. I wanted to start by asking whether this is a known limitation
>>> and/or if there is something about the Cygwin implementation that makes this
>>> sort of thing very difficult.
>>
>> A minimal example would be extremely helpful.
>>
>> Corinna can answer questions about limitations in the current implementation.
>> But there is a new implementation under development. It's in the topic/af_unix
>> branch of the Cygwin git repository if you're interested in looking at it.
>>
>> Corinna began working on this a couple years ago, and I've recently been
>> trying to finish it. I've made quite a bit of progress, but there's still
>> more to do and undoubtedly many bugs. So any test cases you have would be very
>> useful.
>
> Thanks Ken,
>
> As it happens, attempting to produce a minimal example suggests my problem may
> be somewhere else. I think I've worked in most of the features of my application
> one by one but have not yet revealed a failure.
OK. But if you ever do have occasion to write small test programs involving
AF_UNIX sockets, please send them on. The new AF_UNIX code needs as much
testing as it can get.
Ken
* Re: Unix Domain Socket Limitation?
2020-11-26 17:13 ` Ken Brown
@ 2020-11-30 17:19 ` Norton Allen
2020-11-30 18:14 ` Ken Brown
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-11-30 17:19 UTC (permalink / raw)
To: Ken Brown; +Cc: cygwin
On 11/26/2020 12:13 PM, Ken Brown wrote:
> [Adding the Cygwin list back to the Cc.]
>
> On 11/26/2020 11:27 AM, Norton Allen wrote:
>> On 11/25/2020 5:27 PM, Ken Brown via Cygwin wrote:
>>> On 11/25/2020 4:47 PM, Norton Allen wrote:
>>>> In my recent tests, it appears as though it is not possible to
>>>> successfully connect via two Unix Domain sockets from one client
>>>> application to one server application.
>>>>
>>>> Specifically, if I create a server which listens on a Unix Domain
>>>> socket and a client, which attempts to connect() twice, both seem
>>>> to lock up. This is not the behavior under Linux.
>>>>
>>>> I will be happy to work up a minimal example if it is helpful in
>>>> tracking this down. I wanted to start by asking whether this is a
>>>> known limitation and/or if there is something about the Cygwin
>>>> implementation that makes this sort of thing very difficult.
>>>
>>> A minimal example would be extremely helpful.
>>>
>>> Corinna can answer questions about limitations in the current
>>> implementation. But there is a new implementation under development.
>>> It's in the topic/af_unix branch of the Cygwin git repository if
>>> you're interested in looking at it.
>>>
>>> Corinna began working on this a couple years ago, and I've recently
>>> been trying to finish it. I've made quite a bit of progress, but
>>> there's still more to do and undoubtedly many bugs. So any test
>>> cases you have would be very useful.
>>
>> Thanks Ken,
>>
>> As it happens, attempting to produce a minimal example suggests my
>> problem may be somewhere else. I think I've worked in most of the
>> features of my application one by one but have not yet revealed a
>> failure.
>
> OK. But if you ever do have occasion to write small test programs
> involving AF_UNIX sockets, please send them on. The new AF_UNIX code
> needs as much testing as it can get.
>
I have finally put together a start of a minimal example, although it
seems to require a certain level of complexity before tripping on the
bug. At the moment, I do not believe the issue is related to having
multiple sockets between the client and server. I am thinking it is some
sort of race condition related to non-blocking sockets, since I have
only observed it when both the client and server are using non-blocking
sockets.
I have yet to plunge into cygwin1.dll, but I think I have reached that point.
Here is the code: https://github.com/nthallen/cygwin_unix
Since I have only exercised this on my machine, I would be very
interested to know if it is reproducible on anyone else's.
* Re: Unix Domain Socket Limitation?
2020-11-30 17:19 ` Norton Allen
@ 2020-11-30 18:14 ` Ken Brown
2020-11-30 18:26 ` Norton Allen
0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-30 18:14 UTC (permalink / raw)
To: cygwin
On 11/30/2020 12:19 PM, Norton Allen wrote:
> On 11/26/2020 12:13 PM, Ken Brown wrote:
>> [Adding the Cygwin list back to the Cc.]
>>
>> On 11/26/2020 11:27 AM, Norton Allen wrote:
>>> On 11/25/2020 5:27 PM, Ken Brown via Cygwin wrote:
>>>> On 11/25/2020 4:47 PM, Norton Allen wrote:
>>>>> In my recent tests, it appears as though it is not possible to successfully
>>>>> connect via two Unix Domain sockets from one client application to one
>>>>> server application.
>>>>>
>>>>> Specifically, if I create a server which listens on a Unix Domain socket
>>>>> and a client, which attempts to connect() twice, both seem to lock up. This
>>>>> is not the behavior under Linux.
>>>>>
>>>>> I will be happy to work up a minimal example if it is helpful in tracking
>>>>> this down. I wanted to start by asking whether this is a known limitation
>>>>> and/or if there is something about the Cygwin implementation that makes
>>>>> this sort of thing very difficult.
>>>>
>>>> A minimal example would be extremely helpful.
>>>>
>>>> Corinna can answer questions about limitations in the current
>>>> implementation. But there is a new implementation under development. It's in
>>>> the topic/af_unix branch of the Cygwin git repository if you're interested
>>>> in looking at it.
>>>>
>>>> Corinna began working on this a couple years ago, and I've recently been
>>>> trying to finish it. I've made quite a bit of progress, but there's still
>>>> more to do and undoubtedly many bugs. So any test cases you have would be
>>>> very useful.
>>>
>>> Thanks Ken,
>>>
>>> As it happens, attempting to produce a minimal example suggests my problem
>>> may be somewhere else. I think I've worked in most of the features of my
>>> application one by one but have not yet revealed a failure.
>>
>> OK. But if you ever do have occasion to write small test programs involving
>> AF_UNIX sockets, please send them on. The new AF_UNIX code needs as much
>> testing as it can get.
>>
> I have finally put together a start of a minimal example, although it seems to
> require a certain level of complexity before tripping on the bug. At the moment,
> I do not believe the issue is related to having multiple sockets between the
> client and server. I am thinking it is some sort of race condition related to
> non-blocking sockets, since I have only observed it when both the client and
> server are using non-blocking sockets.
>
> I have yet to plunge into cygwin1.dll, but I think I have reached that point.
>
> Here is the code: https://github.com/nthallen/cygwin_unix
>
> Since I have only exercised this on my machine, I would be very interested to
> know if it is reproducible on anyone else's.
I can reproduce the hang, and it happens if I use the new AF_UNIX code also.
But what I'm seeing (at least with the new code) isn't exactly what you describe.
When the server's first select call returns, accept succeeds. The server then
calls select a second time, and that call doesn't return. I haven't checked yet
to see what's going on in the client, and I may not get to that for a while.
Ken
* Re: Unix Domain Socket Limitation?
2020-11-30 18:14 ` Ken Brown
@ 2020-11-30 18:26 ` Norton Allen
2020-11-30 23:19 ` Ken Brown
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-11-30 18:26 UTC (permalink / raw)
To: Ken Brown, cygwin
On 11/30/2020 1:14 PM, Ken Brown wrote:
> I can reproduce the hang, and it happens if I use the new AF_UNIX code
> also. But what I'm seeing (at least with the new code) isn't exactly
> what you describe.
>
> When the server's first select call returns, accept succeeds. The
> server then calls select a second time, and that call doesn't return.
> I haven't checked yet to see what's going on in the client, and I may
> not get to that for a while.
>
That's good news, and seems to be consistent with my theory that it is
some sort of race condition that might be particularly sensitive to
system-specific timing. I am compiling cygwin1.dll now.
* Re: Unix Domain Socket Limitation?
2020-11-30 18:26 ` Norton Allen
@ 2020-11-30 23:19 ` Ken Brown
2020-12-01 2:14 ` Norton Allen
0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-11-30 23:19 UTC (permalink / raw)
To: Norton Allen, cygwin
On 11/30/2020 1:26 PM, Norton Allen wrote:
> On 11/30/2020 1:14 PM, Ken Brown wrote:
>> I can reproduce the hang, and it happens if I use the new AF_UNIX code also.
>> But what I'm seeing (at least with the new code) isn't exactly what you describe.
>>
>> When the server's first select call returns, accept succeeds. The server then
>> calls select a second time, and that call doesn't return. I haven't checked
>> yet to see what's going on in the client, and I may not get to that for a while.
>>
> That's good news, and seems to be consistent with my theory that it is some sort
> of race condition that might be particularly sensitive to system-specific
> timing. I am compiling cygwin1.dll now.
Hi Norton,
I think there's a mistake in your test program. Shouldn't client_pselect() be
waiting for the socket to be write-ready rather than read-ready? Here's a quote
from the POSIX page for 'connect':
If the connection cannot be established immediately and O_NONBLOCK is set for
the file descriptor for the socket, connect() shall fail and set errno to
[EINPROGRESS], but the connection request shall not be aborted, and the
connection shall be established asynchronously....
When the connection has been established asynchronously, pselect(), select(),
and poll() shall indicate that the file descriptor for the socket is ready for
writing.
Ken
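The POSIX text quoted above can be sketched in C roughly as follows. This is only an illustration, not code from the test program in this thread: the helper name connect_nonblocking() and its timeout parameter are made up for the example. It starts a non-blocking connect() on an AF_UNIX stream socket, waits for the socket to become write-ready if connect() fails with EINPROGRESS, and then checks SO_ERROR, since writability alone does not prove the connection succeeded.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread's test program): start a
 * non-blocking connect() to the Unix domain socket at `path` and wait up
 * to `timeout_sec` seconds for it to complete.
 * Returns the connected, non-blocking fd, or -1 with errno set. */
int connect_nonblocking(const char *path, int timeout_sec)
{
    struct sockaddr_un addr;
    struct timeval tv;
    fd_set wfds;
    socklen_t len;
    int fd, flags, n, err, saved;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;                  /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;

    /* Connection is pending: wait for WRITE readiness, not read readiness. */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        goto fail;

    /* Writable does not mean connected; ask the socket for the result. */
    err = 0;
    len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        goto fail;
    if (err != 0) {
        errno = err;
        goto fail;
    }
    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

A caller would use it as `int fd = connect_nonblocking("/path/to/socket", 5);` and treat a return of -1 as a failed or timed-out connection.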
* Re: Unix Domain Socket Limitation?
2020-11-30 23:19 ` Ken Brown
@ 2020-12-01 2:14 ` Norton Allen
2020-12-01 2:22 ` Norton Allen
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-01 2:14 UTC (permalink / raw)
To: Ken Brown, cygwin
On 11/30/2020 6:19 PM, Ken Brown wrote:
> On 11/30/2020 1:26 PM, Norton Allen wrote:
>> On 11/30/2020 1:14 PM, Ken Brown wrote:
>>> I can reproduce the hang, and it happens if I use the new AF_UNIX
>>> code also. But what I'm seeing (at least with the new code) isn't
>>> exactly what you describe.
>>>
>>> When the server's first select call returns, accept succeeds. The
>>> server then calls select a second time, and that call doesn't
>>> return. I haven't checked yet to see what's going on in the client,
>>> and I may not get to that for a while.
>>>
>> That's good news, and seems to be consistent with my theory that it
>> is some sort of race condition that might be particularly sensitive
>> to system-specific timing. I am compiling cygwin1.dll now.
>
> Hi Norton,
>
> I think there's a mistake in your test program. Shouldn't
> client_pselect() be waiting for the socket to be write-ready rather
> than read-ready? Here's a quote from the POSIX page for 'connect':
>
> If the connection cannot be established immediately and O_NONBLOCK is
> set for the file descriptor for the socket, connect() shall fail and
> set errno to [EINPROGRESS], but the connection request shall not be
> aborted, and the connection shall be established asynchronously....
>
> When the connection has been established asynchronously, pselect(),
> select(), and poll() shall indicate that the file descriptor for the
> socket is ready for writing.
>
Yes, you are correct. In fact I had already fixed that bug on another
branch, then forgot to update it on this one. I also noticed another bug
in calculating width. Now I am not getting the blocking behavior but
instead getting the wrong bits set in select(). I think I'd better pick
this up in the morning when I am thinking straight!
* Re: Unix Domain Socket Limitation?
2020-12-01 2:14 ` Norton Allen
@ 2020-12-01 2:22 ` Norton Allen
2020-12-02 17:30 ` Norton Allen
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-01 2:22 UTC (permalink / raw)
To: Ken Brown, cygwin
On 11/30/2020 9:14 PM, Norton Allen wrote:
> On 11/30/2020 6:19 PM, Ken Brown wrote:
>> On 11/30/2020 1:26 PM, Norton Allen wrote:
>>> On 11/30/2020 1:14 PM, Ken Brown wrote:
>>>> I can reproduce the hang, and it happens if I use the new AF_UNIX
>>>> code also. But what I'm seeing (at least with the new code) isn't
>>>> exactly what you describe.
>>>>
>>>> When the server's first select call returns, accept succeeds. The
>>>> server then calls select a second time, and that call doesn't
>>>> return. I haven't checked yet to see what's going on in the client,
>>>> and I may not get to that for a while.
>>>>
>>> That's good news, and seems to be consistent with my theory that it
>>> is some sort of race condition that might be particularly sensitive
>>> to system-specific timing. I am compiling cygwin1.dll now.
>>
>> Hi Norton,
>>
>> I think there's a mistake in your test program. Shouldn't
>> client_pselect() be waiting for the socket to be write-ready rather
>> than read-ready? Here's a quote from the POSIX page for 'connect':
>>
>> If the connection cannot be established immediately and O_NONBLOCK is
>> set for the file descriptor for the socket, connect() shall fail and
>> set errno to [EINPROGRESS], but the connection request shall not be
>> aborted, and the connection shall be established asynchronously....
>>
>> When the connection has been established asynchronously, pselect(),
>> select(), and poll() shall indicate that the file descriptor for the
>> socket is ready for writing.
>>
> Yes, you are correct. In fact I had already fixed that bug on another
> branch, then forgot to update it on this one. I also noticed another
> bug in calculating width. Now I am not getting the blocking behavior
> but instead getting the wrong bits set in select(). I think I'd better
> pick this up in the morning when I am thinking straight!
Yeah, so now the example no longer blocks for me. Unfortunately these
bugs are not present in my application, so I will need to keep working
on this.
* Re: Unix Domain Socket Limitation?
2020-12-01 2:22 ` Norton Allen
@ 2020-12-02 17:30 ` Norton Allen
2020-12-04 1:11 ` Ken Brown
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-02 17:30 UTC (permalink / raw)
To: Ken Brown, cygwin
On 11/30/2020 9:22 PM, Norton Allen wrote:
> Yeah, so now the example no longer blocks for me. Unfortunately these
> bugs are not present in my application, so I will need to keep working
> on this.
>
After paring the main application down and back up, I finally narrowed
in on the condition that was causing this blocking behavior. The issue
arises when a client connect()s twice to the same server with
non-blocking unix-domain sockets before calling select().
There are a few pieces to this. With the client configured to connect()
just once, I can see that the server's select() returns as soon as the
client calls connect(), but then the server's accept() blocks until the
client calls select(). That is not proper non-blocking behavior, but it
appears that the implementation under Cygwin does require that client
and server both be communicating synchronously to accomplish the
connect() operation.
I tried running this under Ubuntu 16.04 and found that connect()
succeeded immediately, so no subsequent select() is required, and there
does not appear to be a possibility for this collision. That proves to
hold true even if the server is not waiting in select() to process the
connect() with accept().
A workaround for this issue may be to keep the socket blocking until
after connect().
I have pushed the new minimal example program, 'rapid_connects' to
https://github.com/nthallen/cygwin_unix
The server is run like before as:
$ ./rapid_connects server
The client can be run in two different modes. To connect with just one
socket:
$ ./rapid_connects client1
To connect with two:
$ ./rapid_connects client2
My immediate strategy will be to develop a workaround for my project.
Having spent a day inside cygwin1.dll, I can see that I have a steep
learning curve to make much of a contribution there.
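For readers without the repository at hand, the triggering pattern described in this message (several non-blocking connect() calls issued back to back, before any select()) might be sketched as below. This is an illustration only and is not the source of rapid_connects; the function name start_connects() and its parameters are invented for the example. It counts how many connections completed immediately and how many were left pending with EINPROGRESS.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical sketch (not the rapid_connects source): open `count`
 * non-blocking AF_UNIX stream sockets and connect() each to `path`
 * without calling select() in between. Stores the descriptors in `fds`
 * and the number still in progress in `*pending`. Returns the number
 * that connected immediately, or -1 with errno set (all fds closed). */
int start_connects(const char *path, int *fds, int count, int *pending)
{
    struct sockaddr_un addr;
    int i, immediate = 0;

    *pending = 0;
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    for (i = 0; i < count; i++) {
        int flags, saved;

        fds[i] = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fds[i] >= 0) {
            flags = fcntl(fds[i], F_GETFL, 0);
            if (flags >= 0 &&
                fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) == 0) {
                if (connect(fds[i], (struct sockaddr *)&addr,
                            sizeof(addr)) == 0) {
                    immediate++;        /* completed at once */
                    continue;
                }
                if (errno == EINPROGRESS) {
                    (*pending)++;       /* must be finished via select() */
                    continue;
                }
            }
        }
        /* Failure: close everything opened so far, preserving errno. */
        saved = errno;
        if (fds[i] >= 0)
            close(fds[i]);
        while (i-- > 0)
            close(fds[i]);
        errno = saved;
        return -1;
    }
    return immediate;
}
```

With a count of 2 this corresponds to the client2 case. Any socket counted in `*pending` still has to be waited on for write readiness before it can be used.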
* Re: Unix Domain Socket Limitation?
2020-12-02 17:30 ` Norton Allen
@ 2020-12-04 1:11 ` Ken Brown
2020-12-04 13:51 ` Norton Allen
0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-12-04 1:11 UTC (permalink / raw)
To: Norton Allen, cygwin
On 12/2/2020 12:30 PM, Norton Allen wrote:
> On 11/30/2020 9:22 PM, Norton Allen wrote:
>> Yeah, so now the example no longer blocks for me. Unfortunately these bugs are
>> not present in my application, so I will need to keep working on this.
>>
>
> After paring the main application down and back up, I finally narrowed in on the
> condition that was causing this blocking behavior. The issue arises when a
> client connect()s twice to the same server with non-blocking unix-domain sockets
> before calling select().
>
> There are a few pieces to this. With the client configured to connect() just
> once, I can see that the server's select() returns as soon as the client calls
> connect(), but then the server's accept() blocks until the client calls
> select(). That is not proper non-blocking behavior, but it appears that the
> implementation under Cygwin does require that client and server both be
> communicating synchronously to accomplish the connect() operation.
>
> I tried running this under Ubuntu 16.04 and found that connect() succeeded
> immediately, so no subsequent select() is required, and there does not appear to
> be a possibility for this collision. That proves to hold true even if the server
> is not waiting in select() to process the connect() with accept().
>
> A workaround for this issue may be to keep the socket blocking until after
> connect().
>
> I have pushed the new minimal example program, 'rapid_connects' to
> https://github.com/nthallen/cygwin_unix
>
> The server is run like before as:
>
> $ ./rapid_connects server
>
> The client can be run in two different modes. To connect with just one socket:
>
> $ ./rapid_connects client1
>
> To connect with two:
>
> $ ./rapid_connects client2
>
> My immediate strategy will be to develop a workaround for my project. Having
> spent a day inside cygwin1.dll, I can see that I have a steep learning curve to
> make much of a contribution there.
I'm traveling at the moment and unable to do any testing, but I wonder if you're
bumping into an issue that was just discussed on the cygwin-developers list:
https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
A different workaround is described there.
If it's the same issue, then I don't think it will happen with the new AF_UNIX
implementation. More in a few days.
Ken
* Re: Unix Domain Socket Limitation?
2020-12-04 1:11 ` Ken Brown
@ 2020-12-04 13:51 ` Norton Allen
2020-12-05 23:52 ` Ken Brown
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-04 13:51 UTC (permalink / raw)
To: Ken Brown, cygwin
On 12/3/2020 8:11 PM, Ken Brown wrote:
> On 12/2/2020 12:30 PM, Norton Allen wrote:
>> On 11/30/2020 9:22 PM, Norton Allen wrote:
>>> Yeah, so now the example no longer blocks for me. Unfortunately
>>> these bugs are not present in my application, so I will need to keep
>>> working on this.
>>>
>>
>> After paring the main application down and back up, I finally
>> narrowed in on the condition that was causing this blocking behavior.
>> The issue arises when a client connect()s twice to the same server
>> with non-blocking unix-domain sockets before calling select().
>>
>> There are a few pieces to this. With the client configured to
>> connect() just once, I can see that the server's select() returns as
>> soon as the client calls connect(), but then the server's accept()
>> blocks until the client calls select(). That is not proper
>> non-blocking behavior, but it appears that the implementation under
>> Cygwin does require that client and server both be communicating
>> synchronously to accomplish the connect() operation.
>>
>> I tried running this under Ubuntu 16.04 and found that connect()
>> succeeded immediately, so no subsequent select() is required, and
>> there does not appear to be a possibility for this collision. That
>> proves to hold true even if the server is not waiting in select() to
>> process the connect() with accept().
>>
>> A workaround for this issue may be to keep the socket blocking until
>> after connect().
>>
>> I have pushed the new minimal example program, 'rapid_connects' to
>> https://github.com/nthallen/cygwin_unix
>>
>> The server is run like before as:
>>
>> $ ./rapid_connects server
>>
>> The client can be run in two different modes. To connect with just
>> one socket:
>>
>> $ ./rapid_connects client1
>>
>> To connect with two:
>>
>> $ ./rapid_connects client2
>>
>> My immediate strategy will be to develop a workaround for my project.
>> Having spent a day inside cygwin1.dll, I can see that I have a steep
>> learning curve to make much of a contribution there.
>
> I'm traveling at the moment and unable to do any testing, but I wonder
> if you're bumping into an issue that was just discussed on the
> cygwin-developers list:
>
> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
>
> A different workaround is described there.
>
> If it's the same issue, then I don't think it will happen with the new
> AF_UNIX implementation. More in a few days.
>
It does seem related.
A workaround that is working for me is to do a blocking connect() and
switch to non-blocking when that completes. In my application, the
connect() generally occurs once at the beginning of a run, so blocking
for a few milliseconds does not impact responsiveness.
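A minimal sketch of that workaround, assuming an AF_UNIX stream socket and a hypothetical helper name connect_then_nonblock() (this is not code from my application): connect() is called while the socket is still blocking, and O_NONBLOCK is set with fcntl() only after the connection is established.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: do a blocking connect() to the Unix domain socket
 * at `path`, then switch the descriptor to non-blocking mode.
 * Returns the connected, non-blocking fd, or -1 with errno set. */
int connect_then_nonblock(const char *path)
{
    struct sockaddr_un addr;
    int fd, flags, saved;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* The socket is still blocking here, so connect() returns only once
     * the connection is established or has failed. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    /* Connected: from now on all I/O on this socket is non-blocking. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;
    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

The trade-off is that the caller can stall in connect() if the server is slow to respond, which is acceptable only when connections are made rarely, as in the startup case described above.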
* Re: Unix Domain Socket Limitation?
2020-12-04 13:51 ` Norton Allen
@ 2020-12-05 23:52 ` Ken Brown
2020-12-06 17:17 ` Norton Allen
0 siblings, 1 reply; 15+ messages in thread
From: Ken Brown @ 2020-12-05 23:52 UTC (permalink / raw)
To: Norton Allen, cygwin
On 12/4/2020 8:51 AM, Norton Allen wrote:
> On 12/3/2020 8:11 PM, Ken Brown wrote:
>> On 12/2/2020 12:30 PM, Norton Allen wrote:
>>> On 11/30/2020 9:22 PM, Norton Allen wrote:
>>>> Yeah, so now the example no longer blocks for me. Unfortunately these bugs
>>>> are not present in my application, so I will need to keep working on this.
>>>>
>>>
>>> After paring the main application down and back up, I finally narrowed in on
>>> the condition that was causing this blocking behavior. The issue arises when
>>> a client connect()s twice to the same server with non-blocking unix-domain
>>> sockets before calling select().
>>>
>>> There are a few pieces to this. With the client configured to connect() just
>>> once, I can see that the server's select() returns as soon as the client
>>> calls connect(), but then the server's accept() blocks until the client calls
>>> select(). That is not proper non-blocking behavior, but it appears that the
>>> implementation under Cygwin does require that client and server both be
>>> communicating synchronously to accomplish the connect() operation.
>>>
>>> I tried running this under Ubuntu 16.04 and found that connect() succeeded
>>> immediately, so no subsequent select() is required, and there does not appear
>>> to be a possibility for this collision. That proves to hold true even if the
>>> server is not waiting in select() to process the connect() with accept().
>>>
>>> A workaround for this issue may be to keep the socket blocking until after
>>> connect().
>>>
>>> I have pushed the new minimal example program, 'rapid_connects' to
>>> https://github.com/nthallen/cygwin_unix
>>>
>>> The server is run like before as:
>>>
>>> $ ./rapid_connects server
>>>
>>> The client can be run in two different modes. To connect with just one socket:
>>>
>>> $ ./rapid_connects client1
>>>
>>> To connect with two:
>>>
>>> $ ./rapid_connects client2
>>>
>>> My immediate strategy will be to develop a workaround for my project. Having
>>> spent a day inside cygwin1.dll, I can see that I have a steep learning curve
>>> to make much of a contribution there.
>>
>> I'm traveling at the moment and unable to do any testing, but I wonder if
>> you're bumping into an issue that was just discussed on the cygwin-developers
>> list:
>>
>> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
>>
>> A different workaround is described there.
>>
>> If it's the same issue, then I don't think it will happen with the new AF_UNIX
>> implementation. More in a few days.
>>
> It does seem related.
>
> A workaround that is working for me is to do a blocking connect() and switch to
> non-blocking when that completes. In my application, the connect() generally
> occurs once at the beginning of a run, so blocking for a few milliseconds does
> not impact responsiveness.
For the record, I can confirm that (a) the problem occurs with the current
AF_UNIX implementation and (b) it does not occur with the new implementation (on
the topic/af_unix branch). With both client1 and client2, I see "connect()
apparently succeeded immediately" using the new implementation.
The new implementation is not yet ready for prime time, but with any luck it
might be ready within a few months.
Ken
* Re: Unix Domain Socket Limitation?
2020-12-05 23:52 ` Ken Brown
@ 2020-12-06 17:17 ` Norton Allen
2020-12-06 22:32 ` Ken Brown
0 siblings, 1 reply; 15+ messages in thread
From: Norton Allen @ 2020-12-06 17:17 UTC (permalink / raw)
To: Ken Brown, cygwin
On 12/5/2020 6:52 PM, Ken Brown wrote:
> On 12/4/2020 8:51 AM, Norton Allen wrote:
>> On 12/3/2020 8:11 PM, Ken Brown wrote:
>>>
>>> I'm traveling at the moment and unable to do any testing, but I
>>> wonder if you're bumping into an issue that was just discussed on
>>> the cygwin-developers list:
>>>
>>> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
>>>
>>>
>>> A different workaround is described there.
>>>
>>> If it's the same issue, then I don't think it will happen with the
>>> new AF_UNIX implementation. More in a few days.
>>>
>> It does seem related.
>>
>> A workaround that is working for me is to do a blocking connect()
>> and switch to non-blocking when that completes. In my application,
>> the connect() generally occurs once at the beginning of a run, so
>> blocking for a few milliseconds does not impact responsiveness.
>
> For the record, I can confirm that (a) the problem occurs with the
> current AF_UNIX implementation and (b) it does not occur with the new
> implementation (on the topic/af_unix branch). With both client1 and
> client2, I see "connect() apparently succeeded immediately" using the
> new implementation.
>
> The new implementation is not yet ready for prime time, but with any
> luck it might be ready within a few months.
>
That sounds great, and exactly like the behavior under Linux. I'd
certainly be happy to test the new implementation as it gets closer, and
also happy to expand or improve the test apps to cover a wider range of
functionality and/or usability (e.g. run both client and server via a
fork.) Feel free to let me know what would be particularly useful.
* Re: Unix Domain Socket Limitation?
2020-12-06 17:17 ` Norton Allen
@ 2020-12-06 22:32 ` Ken Brown
0 siblings, 0 replies; 15+ messages in thread
From: Ken Brown @ 2020-12-06 22:32 UTC (permalink / raw)
To: Norton Allen, cygwin
On 12/6/2020 12:17 PM, Norton Allen wrote:
> On 12/5/2020 6:52 PM, Ken Brown wrote:
>> On 12/4/2020 8:51 AM, Norton Allen wrote:
>>> On 12/3/2020 8:11 PM, Ken Brown wrote:
>>>>
>>>> I'm traveling at the moment and unable to do any testing, but I wonder if
>>>> you're bumping into an issue that was just discussed on the
>>>> cygwin-developers list:
>>>>
>>>> https://cygwin.com/pipermail/cygwin-developers/2020-December/012015.html
>>>>
>>>> A different workaround is described there.
>>>>
>>>> If it's the same issue, then I don't think it will happen with the new
>>>> AF_UNIX implementation. More in a few days.
>>>>
>>> It does seem related.
>>>
>>> A workaround that is working for me is to do a blocking connect() and switch
>>> to non-blocking when that completes. In my application, the connect()
>>> generally occurs once at the beginning of a run, so blocking for a few
>>> milliseconds does not impact responsiveness.
>>
>> For the record, I can confirm that (a) the problem occurs with the current
>> AF_UNIX implementation and (b) it does not occur with the new implementation
>> (on the topic/af_unix branch). With both client1 and client2, I see
>> "connect() apparently succeeded immediately" using the new implementation.
>>
>> The new implementation is not yet ready for prime time, but with any luck it
>> might be ready within a few months.
>>
> That sounds great, and exactly like the behavior under Linux. I'd certainly be
> happy to test the new implementation as it gets closer, and also happy to expand
> or improve the test apps to cover a wider range of functionality and/or
> usability (e.g. run both client and server via a fork.) Feel free to let me know
> what would be particularly useful.
Thanks. I'll take you up on that when the branch is in slightly better shape.
Ken
Thread overview: 15+ messages
2020-11-25 21:47 Unix Domain Socket Limitation? Norton Allen
2020-11-25 22:27 ` Ken Brown
[not found] ` <4260ad1b-4ab2-fa36-fd0e-7c9644560114@huarp.harvard.edu>
2020-11-26 17:13 ` Ken Brown
2020-11-30 17:19 ` Norton Allen
2020-11-30 18:14 ` Ken Brown
2020-11-30 18:26 ` Norton Allen
2020-11-30 23:19 ` Ken Brown
2020-12-01 2:14 ` Norton Allen
2020-12-01 2:22 ` Norton Allen
2020-12-02 17:30 ` Norton Allen
2020-12-04 1:11 ` Ken Brown
2020-12-04 13:51 ` Norton Allen
2020-12-05 23:52 ` Ken Brown
2020-12-06 17:17 ` Norton Allen
2020-12-06 22:32 ` Ken Brown