* Re: python fails asyncio tests (py 3.7 & 3.8)
@ 2020-12-02 9:01 Mark Geisert
2020-12-02 13:38 ` Corinna Vinschen
0 siblings, 1 reply; 6+ messages in thread
From: Mark Geisert @ 2020-12-02 9:01 UTC (permalink / raw)
To: cygwin-developers
Hi folks,
I'm following up on the OP's investigation supplied in
https://cygwin.com/pipermail/cygwin/2020-November/246832.html .
The situation is a socket select thread stuck in a wait-for-event loop that
doesn't realize select() is trying to cleanup that thread before returning a
result to the app. Here is the relevant part of an strace log:
> 114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38
> 68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778
> 76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C
> 122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295
> 78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1
> 77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0
> 115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded
> 95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered
> 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4. verifying
> 135 8507898 [main] python3.8 1987 select_stuff::wait: timed out
> 98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1
> 84 8508080 [main] python3.8 1987 select: sel.wait returns 1
> 73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines
> 78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778
[end of strace.. nothing further happens]
The 'socksel' thread is shown entering af_local_recv_secret(), so this is all part
of local socket connection startup, when a secret is sent and received, then
credentials are sent and received. The socksel thread is looping on a
WSAWaitForMultipleEvents() call. The OP suggested using select_info.stop_thread
to indicate that wait loops should exit. That would work further up the stack,
but at this level the code doesn't (currently) see the appropriate select_info.
I believe this is where I need to add such code, unless I'm missing something.
Here is the stack while the socksel thread is looping:
> (gdb) bt
> #0 0x00007ffdff2ec8f4 in ntdll!ZwWaitForMultipleObjects ()
> from /c/WINDOWS/SYSTEM32/ntdll.dll
> #1 0x00007ffdfce58910 in WaitForMultipleObjectsEx ()
> from /c/WINDOWS/System32/KERNELBASE.dll
> #2 0x00000001800a76fa in fhandler_socket_wsock::wait_for_events (
> this=0x180356450, event_mask=33, flags=<optimized out>)
> at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:426
> #3 0x00000001800a81b5 in fhandler_socket_wsock::wait_for_events (
> this=this@entry=0x180356450, event_mask=event_mask@entry=33,
> flags=flags@entry=0) at /oss/src/winsup/cygwin/fhandler.h:656
> #4 0x00000001800ab820 in fhandler_socket_local::recv_internal (
> this=0x180356450, wsamsg=0x5fc930, use_recvmsg=false)
> at /oss/src/winsup/cygwin/fhandler_socket_local.cc:1167
> #5 0x00000001800a515b in fhandler_socket_wsock::recvfrom (
> this=<optimized out>, in_ptr=<optimized out>, len=<optimized out>,
> flags=<optimized out>, from=0x0, fromlen=0x0)
> at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:1271
> #6 0x00000001800ad38f in fhandler_socket_local::af_local_recv_secret (
> this=this@entry=0x180356450)
> at /oss/src/winsup/cygwin/fhandler_socket_local.cc:437
> #7 0x00000001800ad929 in fhandler_socket_local::af_local_connect (
> this=0x180356450) at /oss/src/winsup/cygwin/fhandler_socket_local.cc:539
> #8 fhandler_socket_local::af_local_connect (this=0x180356450)
> at /oss/src/winsup/cygwin/fhandler_socket_local.cc:527
> #9 0x00000001800a75b6 in fhandler_socket_wsock::evaluate_events (
> this=0x180356450, event_mask=63, events=@0x5fcb68: 0,
> erase=erase@entry=false)
> at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:313
> #10 0x0000000180129052 in peek_socket (me=me@entry=0x8002f4590)
> at /oss/src/winsup/cygwin/select.cc:1634
> #11 0x0000000180129221 in thread_socket (arg=0x80028ce10)
> at /oss/src/winsup/cygwin/select.cc:1664
> #12 0x0000000180046686 in cygthread::callfunc (
> this=this@entry=0x180234778 <threads+88>,
> issimplestub=issimplestub@entry=false)
> at /oss/src/winsup/cygwin/cygthread.cc:48
> #13 0x0000000180046c46 in cygthread::stub (
> arg=arg@entry=0x180234778 <threads+88>)
> at /oss/src/winsup/cygwin/cygthread.cc:91
> #14 0x0000000180047706 in _cygtls::call2 (this=0x5fce00,
> func=0x180046bc0 <cygthread::stub(void*)>, arg=0x180234778 <threads+88>,
> buf=buf@entry=0x5fcd20) at /oss/src/winsup/cygwin/cygtls.cc:40
> #15 0x00000001800477b4 in _cygtls::call (func=<optimized out>,
> arg=<optimized out>) at /usr/include/w32api/psdk_inc/intrin-impl.h:790
> #16 0x00007ffdfda47034 in KERNEL32!BaseThreadInitThunk ()
> from /c/WINDOWS/System32/KERNEL32.DLL
> #17 0x00007ffdff29cec1 in ntdll!RtlUserThreadStart ()
> from /c/WINDOWS/SYSTEM32/ntdll.dll
> #18 0x0000000000000000 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> (gdb)
Just to rule out one slight possibility, I built a different Cygwin DLL from
topic/af_unix but the resulting strace output is the same. So this doesn't seem
to be related to Ken's work.
..mark
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: python fails asyncio tests (py 3.7 & 3.8)
2020-12-02 9:01 python fails asyncio tests (py 3.7 & 3.8) Mark Geisert
@ 2020-12-02 13:38 ` Corinna Vinschen
2020-12-03 2:09 ` Mark Geisert
0 siblings, 1 reply; 6+ messages in thread
From: Corinna Vinschen @ 2020-12-02 13:38 UTC (permalink / raw)
To: cygwin-developers
Hi Mark,
On Dec 2 01:01, Mark Geisert wrote:
> Hi folks,
> I'm following up on the OP's investigation supplied in
> https://cygwin.com/pipermail/cygwin/2020-November/246832.html .
> The situation is a socket select thread stuck in a wait-for-event loop that
> doesn't realize select() is trying to cleanup that thread before returning a
> result to the app. Here is the relevant part of an strace log:
>
> > 114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38
> > 68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778
> > 76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C
> > 122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295
> > 78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1
> > 77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0
> > 115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded
> > 95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered
> > 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4. verifying
> > 135 8507898 [main] python3.8 1987 select_stuff::wait: timed out
> > 98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1
> > 84 8508080 [main] python3.8 1987 select: sel.wait returns 1
> > 73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines
> > 78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778
> [end of strace.. nothing further happens]
>
> The 'socksel' thread is shown entering af_local_recv_secret(), so this is
> all part of local socket connection startup, when a secret is sent and
> received, then credentials are sent and received. The socksel thread is
> looping on a WSAWaitForMultipleEvents() call. The OP suggested using
> select_info.stop_thread to indicate that wait loops should exit. That would
> work further up the stack, but at this level the code doesn't (currently)
> see the appropriate select_info.
This is apparently an old problem in the still current AF_LOCAL
implementation. Christian Franke encountered it when porting postfix:
https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
The problem is the security handshake between listening/accepting socket
and connecting socket. The connecting socket send its half of the
handshake and waits for accept on the other side to return the other
half. However, if the listening side doesn't accept right away, the
connecting side hangs.
The workaround right now is to call
int peercred_off = 1;
fd = socket (AF_LOCAL, SOCK_STREAM, 0);
setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof peercred_off);
This disables the security handshake.
Corinna
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: python fails asyncio tests (py 3.7 & 3.8)
2020-12-02 13:38 ` Corinna Vinschen
@ 2020-12-03 2:09 ` Mark Geisert
2020-12-03 6:57 ` Mark Geisert
0 siblings, 1 reply; 6+ messages in thread
From: Mark Geisert @ 2020-12-03 2:09 UTC (permalink / raw)
To: cygwin-developers
Hi Corinna,
On Wed, 2 Dec 2020, Corinna Vinschen wrote:
[...]
>
> This is apparently an old problem in the still current AF_LOCAL
> implementation. Christian Franke encountered it when porting postfix:
>
> https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
>
> The problem is the security handshake between listening/accepting socket
> and connecting socket. The connecting socket send its half of the
> handshake and waits for accept on the other side to return the other
> half. However, if the listening side doesn't accept right away, the
> connecting side hangs.
>
> The workaround right now is to call
>
> int peercred_off = 1;
> fd = socket (AF_LOCAL, SOCK_STREAM, 0);
> setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof peercred_off);
>
> This disables the security handshake.
I see. For this problem report should the setsockopt() workaround be
applied to Python (yikes!) or can it be done for all apps at once in
fhandler_socket_local::socket()? I might be able to look into the
standing issue with the security handshake later on, if (my) time permits.
I appreciate the history/explanation on this.
Thanks & Regards,
..mark
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: python fails asyncio tests (py 3.7 & 3.8)
2020-12-03 2:09 ` Mark Geisert
@ 2020-12-03 6:57 ` Mark Geisert
2020-12-03 9:28 ` Corinna Vinschen
0 siblings, 1 reply; 6+ messages in thread
From: Mark Geisert @ 2020-12-03 6:57 UTC (permalink / raw)
To: cygwin-developers
[replying to myself.. sorry..]
On Wed, 2 Dec 2020, Mark Geisert wrote:
> Hi Corinna,
>
> On Wed, 2 Dec 2020, Corinna Vinschen wrote:
> [...]
>>
>> This is apparently an old problem in the still current AF_LOCAL
>> implementation. Christian Franke encountered it when porting postfix:
>>
>> https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
>>
>> The problem is the security handshake between listening/accepting socket
>> and connecting socket. The connecting socket send its half of the
>> handshake and waits for accept on the other side to return the other
>> half. However, if the listening side doesn't accept right away, the
>> connecting side hangs.
>>
>> The workaround right now is to call
>>
>> int peercred_off = 1;
>> fd = socket (AF_LOCAL, SOCK_STREAM, 0);
>> setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof
>> peercred_off);
>>
>> This disables the security handshake.
>
> I see. For this problem report should the setsockopt() workaround be applied
> to Python (yikes!) or can it be done for all apps at once in
> fhandler_socket_local::socket()? I might be able to look into the standing
> issue with the security handshake later on, if (my) time permits. I
> appreciate the history/explanation on this.
> Thanks & Regards,
>
> ..mark
Putting the workaround in fhandler_socket_local::socket() compiles fine
and the OP's Python testing issue seems fixed. OTOH XWin and X apps exit
soon after starting so that's no bueno.
It looks like the individual Python tests that exhibit the problem are
going to need patching. Does that sound about right?
Thanks again,
..mark
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: python fails asyncio tests (py 3.7 & 3.8)
2020-12-03 6:57 ` Mark Geisert
@ 2020-12-03 9:28 ` Corinna Vinschen
2021-02-01 10:17 ` Marco Atzeri
0 siblings, 1 reply; 6+ messages in thread
From: Corinna Vinschen @ 2020-12-03 9:28 UTC (permalink / raw)
To: cygwin-developers
On Dec 2 22:57, Mark Geisert wrote:
> [replying to myself.. sorry..]
>
> On Wed, 2 Dec 2020, Mark Geisert wrote:
>
> > Hi Corinna,
> >
> > On Wed, 2 Dec 2020, Corinna Vinschen wrote:
> > [...]
> > >
> > > This is apparently an old problem in the still current AF_LOCAL
> > > implementation. Christian Franke encountered it when porting postfix:
> > >
> > > https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
> > >
> > > The problem is the security handshake between listening/accepting socket
> > > and connecting socket. The connecting socket send its half of the
> > > handshake and waits for accept on the other side to return the other
> > > half. However, if the listening side doesn't accept right away, the
> > > connecting side hangs.
> > >
> > > The workaround right now is to call
> > >
> > > int peercred_off = 1;
> > > fd = socket (AF_LOCAL, SOCK_STREAM, 0);
> > > setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof
> > > peercred_off);
> > >
> > > This disables the security handshake.
> >
> > I see. For this problem report should the setsockopt() workaround be
> > applied to Python (yikes!) or can it be done for all apps at once in
> > fhandler_socket_local::socket()? I might be able to look into the
> > standing issue with the security handshake later on, if (my) time
> > permits. I appreciate the history/explanation on this.
> > Thanks & Regards,
> >
> > ..mark
>
> Putting the workaround in fhandler_socket_local::socket() compiles fine and
> the OP's Python testing issue seems fixed. OTOH XWin and X apps exit soon
> after starting so that's no bueno.
>
> It looks like the individual Python tests that exhibit the problem are going
> to need patching. Does that sound about right?
In theory it's just a testcase and we have a known workaround for
that. The testcase can be marked as XFAIL for Cygwin or get the
extra setsockopt() call for Cygwin, both is fine.
Corinna
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: python fails asyncio tests (py 3.7 & 3.8)
2020-12-03 9:28 ` Corinna Vinschen
@ 2021-02-01 10:17 ` Marco Atzeri
0 siblings, 0 replies; 6+ messages in thread
From: Marco Atzeri @ 2021-02-01 10:17 UTC (permalink / raw)
To: cygwin-developers
On 03.12.2020 10:28, Corinna Vinschen via Cygwin-developers wrote:
> On Dec 2 22:57, Mark Geisert wrote:
>> [replying to myself.. sorry..]
>>
>> On Wed, 2 Dec 2020, Mark Geisert wrote:
>>
>>> Hi Corinna,
>>>
>>> On Wed, 2 Dec 2020, Corinna Vinschen wrote:
>>> [...]
>>>>
>>>> This is apparently an old problem in the still current AF_LOCAL
>>>> implementation. Christian Franke encountered it when porting postfix:
>>>>
>>>> https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
>>>>
>>>> The problem is the security handshake between listening/accepting socket
>>>> and connecting socket. The connecting socket send its half of the
>>>> handshake and waits for accept on the other side to return the other
>>>> half. However, if the listening side doesn't accept right away, the
>>>> connecting side hangs.
>>>>
>>>> The workaround right now is to call
>>>>
>>>> int peercred_off = 1;
>>>> fd = socket (AF_LOCAL, SOCK_STREAM, 0);
>>>> setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof
>>>> peercred_off);
>>>>
>>>> This disables the security handshake.
>>>
>>> I see. For this problem report should the setsockopt() workaround be
>>> applied to Python (yikes!) or can it be done for all apps at once in
>>> fhandler_socket_local::socket()? I might be able to look into the
>>> standing issue with the security handshake later on, if (my) time
>>> permits. I appreciate the history/explanation on this.
>>> Thanks & Regards,
>>>
>>> ..mark
>>
>> Putting the workaround in fhandler_socket_local::socket() compiles fine and
>> the OP's Python testing issue seems fixed. OTOH XWin and X apps exit soon
>> after starting so that's no bueno.
>>
>> It looks like the individual Python tests that exhibit the problem are going
>> to need patching. Does that sound about right?
>
> In theory it's just a testcase and we have a known workaround for
> that. The testcase can be marked as XFAIL for Cygwin or get the
> extra setsockopt() call for Cygwin, both is fine.
>
>
> Corinna
>
the latest python packages to solve the asyncio issue are patched
to bypass Cygwin "socketpair" and use an internal alternative
Python3 version for system without that functionality.
But of course proper working socket functions will be appreciated.
https://sourceware.org/pipermail/cygwin/2021-January/247610.html
a, b = _socket.socketpair(family, type, proto)
SystemError: <built-in function socketpair> returned NULL without
setting an error
https://sourceware.org/pipermail/cygwin/2021-January/247679.html
Regards
Marco
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-02-01 10:17 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-02 9:01 python fails asyncio tests (py 3.7 & 3.8) Mark Geisert
2020-12-02 13:38 ` Corinna Vinschen
2020-12-03 2:09 ` Mark Geisert
2020-12-03 6:57 ` Mark Geisert
2020-12-03 9:28 ` Corinna Vinschen
2021-02-01 10:17 ` Marco Atzeri
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).