public inbox for cygwin-developers@cygwin.com
 help / color / mirror / Atom feed
* Re: python fails asyncio tests (py 3.7 & 3.8)
@ 2020-12-02  9:01 Mark Geisert
  2020-12-02 13:38 ` Corinna Vinschen
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Geisert @ 2020-12-02  9:01 UTC (permalink / raw)
  To: cygwin-developers

Hi folks,
I'm following up on the OP's investigation supplied in
https://cygwin.com/pipermail/cygwin/2020-November/246832.html .
The situation is a socket select thread stuck in a wait-for-event loop that 
doesn't realize select() is trying to cleanup that thread before returning a 
result to the app.  Here is the relevant part of an strace log:

>   114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38
>    68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778
>    76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C
>   122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295
>    78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1
>    77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0
>   115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded
>    95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered
> 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4.  verifying
>   135 8507898 [main] python3.8 1987 select_stuff::wait: timed out
>    98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1
>    84 8508080 [main] python3.8 1987 select: sel.wait returns 1
>    73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines
>    78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778
[end of strace.. nothing further happens]

The 'socksel' thread is shown entering af_local_recv_secret(), so this is all part 
of local socket connection startup, when a secret is sent and received, then 
credentials are sent and received.  The socksel thread is looping on a 
WSAWaitForMultipleEvents() call.  The OP suggested using select_info.stop_thread 
to indicate that wait loops should exit.  That would work further up the stack, 
but at this level the code doesn't (currently) see the appropriate select_info.

I believe this is where I need to add such code, unless I'm missing something.
Here is the stack while the socksel thread is looping:

> (gdb) bt
> #0  0x00007ffdff2ec8f4 in ntdll!ZwWaitForMultipleObjects ()
>    from /c/WINDOWS/SYSTEM32/ntdll.dll
> #1  0x00007ffdfce58910 in WaitForMultipleObjectsEx ()
>    from /c/WINDOWS/System32/KERNELBASE.dll
> #2  0x00000001800a76fa in fhandler_socket_wsock::wait_for_events (
>     this=0x180356450, event_mask=33, flags=<optimized out>)
>     at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:426
> #3  0x00000001800a81b5 in fhandler_socket_wsock::wait_for_events (
>     this=this@entry=0x180356450, event_mask=event_mask@entry=33,
>     flags=flags@entry=0) at /oss/src/winsup/cygwin/fhandler.h:656
> #4  0x00000001800ab820 in fhandler_socket_local::recv_internal (
>     this=0x180356450, wsamsg=0x5fc930, use_recvmsg=false)
>     at /oss/src/winsup/cygwin/fhandler_socket_local.cc:1167
> #5  0x00000001800a515b in fhandler_socket_wsock::recvfrom (
>     this=<optimized out>, in_ptr=<optimized out>, len=<optimized out>,
>     flags=<optimized out>, from=0x0, fromlen=0x0)
>     at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:1271
> #6  0x00000001800ad38f in fhandler_socket_local::af_local_recv_secret (
>     this=this@entry=0x180356450)
>     at /oss/src/winsup/cygwin/fhandler_socket_local.cc:437
> #7  0x00000001800ad929 in fhandler_socket_local::af_local_connect (
>     this=0x180356450) at /oss/src/winsup/cygwin/fhandler_socket_local.cc:539
> #8  fhandler_socket_local::af_local_connect (this=0x180356450)
>     at /oss/src/winsup/cygwin/fhandler_socket_local.cc:527
> #9  0x00000001800a75b6 in fhandler_socket_wsock::evaluate_events (
>     this=0x180356450, event_mask=63, events=@0x5fcb68: 0,
>     erase=erase@entry=false)
>     at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:313
> #10 0x0000000180129052 in peek_socket (me=me@entry=0x8002f4590)
>     at /oss/src/winsup/cygwin/select.cc:1634
> #11 0x0000000180129221 in thread_socket (arg=0x80028ce10)
>     at /oss/src/winsup/cygwin/select.cc:1664
> #12 0x0000000180046686 in cygthread::callfunc (
>     this=this@entry=0x180234778 <threads+88>,
>     issimplestub=issimplestub@entry=false)
>     at /oss/src/winsup/cygwin/cygthread.cc:48
> #13 0x0000000180046c46 in cygthread::stub (
>     arg=arg@entry=0x180234778 <threads+88>)
>     at /oss/src/winsup/cygwin/cygthread.cc:91
> #14 0x0000000180047706 in _cygtls::call2 (this=0x5fce00,
>     func=0x180046bc0 <cygthread::stub(void*)>, arg=0x180234778 <threads+88>,
>     buf=buf@entry=0x5fcd20) at /oss/src/winsup/cygwin/cygtls.cc:40
> #15 0x00000001800477b4 in _cygtls::call (func=<optimized out>,
>     arg=<optimized out>) at /usr/include/w32api/psdk_inc/intrin-impl.h:790
> #16 0x00007ffdfda47034 in KERNEL32!BaseThreadInitThunk ()
>    from /c/WINDOWS/System32/KERNEL32.DLL
> #17 0x00007ffdff29cec1 in ntdll!RtlUserThreadStart ()
>    from /c/WINDOWS/SYSTEM32/ntdll.dll
> #18 0x0000000000000000 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> (gdb)

Just to rule out one slight possibility, I built a different Cygwin DLL from 
topic/af_unix but the resulting strace output is the same.  So this doesn't seem 
to be related to Ken's work.

..mark

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: python fails asyncio tests (py 3.7 & 3.8)
  2020-12-02  9:01 python fails asyncio tests (py 3.7 & 3.8) Mark Geisert
@ 2020-12-02 13:38 ` Corinna Vinschen
  2020-12-03  2:09   ` Mark Geisert
  0 siblings, 1 reply; 6+ messages in thread
From: Corinna Vinschen @ 2020-12-02 13:38 UTC (permalink / raw)
  To: cygwin-developers

Hi Mark,

On Dec  2 01:01, Mark Geisert wrote:
> Hi folks,
> I'm following up on the OP's investigation supplied in
> https://cygwin.com/pipermail/cygwin/2020-November/246832.html .
> The situation is a socket select thread stuck in a wait-for-event loop that
> doesn't realize select() is trying to cleanup that thread before returning a
> result to the app.  Here is the relevant part of an strace log:
> 
> >   114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38
> >    68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778
> >    76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C
> >   122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295
> >    78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1
> >    77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0
> >   115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded
> >    95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered
> > 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4.  verifying
> >   135 8507898 [main] python3.8 1987 select_stuff::wait: timed out
> >    98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1
> >    84 8508080 [main] python3.8 1987 select: sel.wait returns 1
> >    73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines
> >    78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778
> [end of strace.. nothing further happens]
> 
> The 'socksel' thread is shown entering af_local_recv_secret(), so this is
> all part of local socket connection startup, when a secret is sent and
> received, then credentials are sent and received.  The socksel thread is
> looping on a WSAWaitForMultipleEvents() call.  The OP suggested using
> select_info.stop_thread to indicate that wait loops should exit.  That would
> work further up the stack, but at this level the code doesn't (currently)
> see the appropriate select_info.

This is apparently an old problem in the still current AF_LOCAL
implementation.  Christian Franke encountered it when porting postfix:

https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html

The problem is the security handshake between listening/accepting socket
and connecting socket.  The connecting socket send its half of the
handshake and waits for accept on the other side to return the other
half.  However, if the listening side doesn't accept right away, the
connecting side hangs.

The workaround right now is to call

  int peercred_off = 1;
  fd = socket (AF_LOCAL, SOCK_STREAM, 0);
  setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof peercred_off);

This disables the security handshake.


Corinna

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: python fails asyncio tests (py 3.7 & 3.8)
  2020-12-02 13:38 ` Corinna Vinschen
@ 2020-12-03  2:09   ` Mark Geisert
  2020-12-03  6:57     ` Mark Geisert
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Geisert @ 2020-12-03  2:09 UTC (permalink / raw)
  To: cygwin-developers

Hi Corinna,

On Wed, 2 Dec 2020, Corinna Vinschen wrote:
[...]
>
> This is apparently an old problem in the still current AF_LOCAL
> implementation.  Christian Franke encountered it when porting postfix:
>
> https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
>
> The problem is the security handshake between listening/accepting socket
> and connecting socket.  The connecting socket send its half of the
> handshake and waits for accept on the other side to return the other
> half.  However, if the listening side doesn't accept right away, the
> connecting side hangs.
>
> The workaround right now is to call
>
>  int peercred_off = 1;
>  fd = socket (AF_LOCAL, SOCK_STREAM, 0);
>  setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof peercred_off);
>
> This disables the security handshake.

I see.  For this problem report should the setsockopt() workaround be 
applied to Python (yikes!) or can it be done for all apps at once in 
fhandler_socket_local::socket()?  I might be able to look into the 
standing issue with the security handshake later on, if (my) time permits. 
I appreciate the history/explanation on this.
Thanks & Regards,

..mark

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: python fails asyncio tests (py 3.7 & 3.8)
  2020-12-03  2:09   ` Mark Geisert
@ 2020-12-03  6:57     ` Mark Geisert
  2020-12-03  9:28       ` Corinna Vinschen
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Geisert @ 2020-12-03  6:57 UTC (permalink / raw)
  To: cygwin-developers

[replying to myself.. sorry..]

On Wed, 2 Dec 2020, Mark Geisert wrote:

> Hi Corinna,
>
> On Wed, 2 Dec 2020, Corinna Vinschen wrote:
> [...]
>> 
>> This is apparently an old problem in the still current AF_LOCAL
>> implementation.  Christian Franke encountered it when porting postfix:
>> 
>> https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
>> 
>> The problem is the security handshake between listening/accepting socket
>> and connecting socket.  The connecting socket send its half of the
>> handshake and waits for accept on the other side to return the other
>> half.  However, if the listening side doesn't accept right away, the
>> connecting side hangs.
>> 
>> The workaround right now is to call
>>
>>  int peercred_off = 1;
>>  fd = socket (AF_LOCAL, SOCK_STREAM, 0);
>>  setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof 
>> peercred_off);
>> 
>> This disables the security handshake.
>
> I see.  For this problem report should the setsockopt() workaround be applied 
> to Python (yikes!) or can it be done for all apps at once in 
> fhandler_socket_local::socket()?  I might be able to look into the standing 
> issue with the security handshake later on, if (my) time permits. I 
> appreciate the history/explanation on this.
> Thanks & Regards,
>
> ..mark

Putting the workaround in fhandler_socket_local::socket() compiles fine 
and the OP's Python testing issue seems fixed.  OTOH XWin and X apps exit 
soon after starting so that's no bueno.

It looks like the individual Python tests that exhibit the problem are 
going to need patching.  Does that sound about right?
Thanks again,

..mark

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: python fails asyncio tests (py 3.7 & 3.8)
  2020-12-03  6:57     ` Mark Geisert
@ 2020-12-03  9:28       ` Corinna Vinschen
  2021-02-01 10:17         ` Marco Atzeri
  0 siblings, 1 reply; 6+ messages in thread
From: Corinna Vinschen @ 2020-12-03  9:28 UTC (permalink / raw)
  To: cygwin-developers

On Dec  2 22:57, Mark Geisert wrote:
> [replying to myself.. sorry..]
> 
> On Wed, 2 Dec 2020, Mark Geisert wrote:
> 
> > Hi Corinna,
> > 
> > On Wed, 2 Dec 2020, Corinna Vinschen wrote:
> > [...]
> > > 
> > > This is apparently an old problem in the still current AF_LOCAL
> > > implementation.  Christian Franke encountered it when porting postfix:
> > > 
> > > https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
> > > 
> > > The problem is the security handshake between listening/accepting socket
> > > and connecting socket.  The connecting socket send its half of the
> > > handshake and waits for accept on the other side to return the other
> > > half.  However, if the listening side doesn't accept right away, the
> > > connecting side hangs.
> > > 
> > > The workaround right now is to call
> > > 
> > >  int peercred_off = 1;
> > >  fd = socket (AF_LOCAL, SOCK_STREAM, 0);
> > >  setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof
> > > peercred_off);
> > > 
> > > This disables the security handshake.
> > 
> > I see.  For this problem report should the setsockopt() workaround be
> > applied to Python (yikes!) or can it be done for all apps at once in
> > fhandler_socket_local::socket()?  I might be able to look into the
> > standing issue with the security handshake later on, if (my) time
> > permits. I appreciate the history/explanation on this.
> > Thanks & Regards,
> > 
> > ..mark
> 
> Putting the workaround in fhandler_socket_local::socket() compiles fine and
> the OP's Python testing issue seems fixed.  OTOH XWin and X apps exit soon
> after starting so that's no bueno.
> 
> It looks like the individual Python tests that exhibit the problem are going
> to need patching.  Does that sound about right?

In theory it's just a testcase and we have a known workaround for
that.  The testcase can be marked as XFAIL for Cygwin or get the
extra setsockopt() call for Cygwin, both is fine.


Corinna

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: python fails asyncio tests (py 3.7 & 3.8)
  2020-12-03  9:28       ` Corinna Vinschen
@ 2021-02-01 10:17         ` Marco Atzeri
  0 siblings, 0 replies; 6+ messages in thread
From: Marco Atzeri @ 2021-02-01 10:17 UTC (permalink / raw)
  To: cygwin-developers

On 03.12.2020 10:28, Corinna Vinschen via Cygwin-developers wrote:
> On Dec  2 22:57, Mark Geisert wrote:
>> [replying to myself.. sorry..]
>>
>> On Wed, 2 Dec 2020, Mark Geisert wrote:
>>
>>> Hi Corinna,
>>>
>>> On Wed, 2 Dec 2020, Corinna Vinschen wrote:
>>> [...]
>>>>
>>>> This is apparently an old problem in the still current AF_LOCAL
>>>> implementation.  Christian Franke encountered it when porting postfix:
>>>>
>>>> https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html
>>>>
>>>> The problem is the security handshake between listening/accepting socket
>>>> and connecting socket.  The connecting socket send its half of the
>>>> handshake and waits for accept on the other side to return the other
>>>> half.  However, if the listening side doesn't accept right away, the
>>>> connecting side hangs.
>>>>
>>>> The workaround right now is to call
>>>>
>>>>   int peercred_off = 1;
>>>>   fd = socket (AF_LOCAL, SOCK_STREAM, 0);
>>>>   setsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peercred_off, sizeof
>>>> peercred_off);
>>>>
>>>> This disables the security handshake.
>>>
>>> I see.  For this problem report should the setsockopt() workaround be
>>> applied to Python (yikes!) or can it be done for all apps at once in
>>> fhandler_socket_local::socket()?  I might be able to look into the
>>> standing issue with the security handshake later on, if (my) time
>>> permits. I appreciate the history/explanation on this.
>>> Thanks & Regards,
>>>
>>> ..mark
>>
>> Putting the workaround in fhandler_socket_local::socket() compiles fine and
>> the OP's Python testing issue seems fixed.  OTOH XWin and X apps exit soon
>> after starting so that's no bueno.
>>
>> It looks like the individual Python tests that exhibit the problem are going
>> to need patching.  Does that sound about right?
> 
> In theory it's just a testcase and we have a known workaround for
> that.  The testcase can be marked as XFAIL for Cygwin or get the
> extra setsockopt() call for Cygwin, both is fine.
> 
> 
> Corinna
> 

the latest python packages to solve the asyncio issue are patched
to bypass  Cygwin "socketpair" and use an internal alternative
Python3 version for system without that functionality.

But of course proper working socket functions will be appreciated.

https://sourceware.org/pipermail/cygwin/2021-January/247610.html

      a, b = _socket.socketpair(family, type, proto)
SystemError: <built-in function socketpair> returned NULL without 
setting an error

https://sourceware.org/pipermail/cygwin/2021-January/247679.html

Regards
Marco

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-02-01 10:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-02  9:01 python fails asyncio tests (py 3.7 & 3.8) Mark Geisert
2020-12-02 13:38 ` Corinna Vinschen
2020-12-03  2:09   ` Mark Geisert
2020-12-03  6:57     ` Mark Geisert
2020-12-03  9:28       ` Corinna Vinschen
2021-02-01 10:17         ` Marco Atzeri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).