From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from m0.truegem.net (m0.truegem.net [69.55.228.47]) by sourceware.org (Postfix) with ESMTPS id 1D1CE3857824 for ; Wed, 2 Dec 2020 09:01:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 1D1CE3857824 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=maxrnd.com Authentication-Results: sourceware.org; spf=none smtp.mailfrom=mark@maxrnd.com Received: (from daemon@localhost) by m0.truegem.net (8.12.11/8.12.11) id 0B291XJv044444 for ; Wed, 2 Dec 2020 01:01:33 -0800 (PST) (envelope-from mark@maxrnd.com) Received: from 162-235-43-67.lightspeed.irvnca.sbcglobal.net(162.235.43.67), claiming to be "[192.168.1.100]" via SMTP by m0.truegem.net, id smtpdsHyPEy; Wed Dec 2 01:01:32 2020 X-Mozilla-News-Host: news://news.gmane.org:119 From: Mark Geisert Subject: Re: python fails asyncio tests (py 3.7 & 3.8) To: cygwin-developers@cygwin.com Message-ID: <9976c726-8bfd-febe-ac86-f7cbf3cc958b@maxrnd.com> Date: Wed, 2 Dec 2020 01:01:32 -0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.4 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_00, FAKE_REPLY_A1, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: cygwin-developers@cygwin.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Cygwin core component developers mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2020 09:01:45 -0000 Hi folks, I'm following up on the OP's investigation supplied in https://cygwin.com/pipermail/cygwin/2020-November/246832.html . The situation is a socket select thread stuck in a wait-for-event loop that doesn't realize select() is trying to cleanup that thread before returning a result to the app. Here is the relevant part of an strace log: > 114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38 > 68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778 > 76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C > 122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295 > 78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1 > 77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0 > 115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded > 95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered > 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4. verifying > 135 8507898 [main] python3.8 1987 select_stuff::wait: timed out > 98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1 > 84 8508080 [main] python3.8 1987 select: sel.wait returns 1 > 73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines > 78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778 [end of strace.. nothing further happens] The 'socksel' thread is shown entering af_local_recv_secret(), so this is all part of local socket connection startup, when a secret is sent and received, then credentials are sent and received. The socksel thread is looping on a WSAWaitForMultipleEvents() call. The OP suggested using select_info.stop_thread to indicate that wait loops should exit. That would work further up the stack, but at this level the code doesn't (currently) see the appropriate select_info. I believe this is where I need to add such code, unless I'm missing something. Here is the stack while the socksel thread is looping: > (gdb) bt > #0 0x00007ffdff2ec8f4 in ntdll!ZwWaitForMultipleObjects () > from /c/WINDOWS/SYSTEM32/ntdll.dll > #1 0x00007ffdfce58910 in WaitForMultipleObjectsEx () > from /c/WINDOWS/System32/KERNELBASE.dll > #2 0x00000001800a76fa in fhandler_socket_wsock::wait_for_events ( > this=0x180356450, event_mask=33, flags=) > at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:426 > #3 0x00000001800a81b5 in fhandler_socket_wsock::wait_for_events ( > this=this@entry=0x180356450, event_mask=event_mask@entry=33, > flags=flags@entry=0) at /oss/src/winsup/cygwin/fhandler.h:656 > #4 0x00000001800ab820 in fhandler_socket_local::recv_internal ( > this=0x180356450, wsamsg=0x5fc930, use_recvmsg=false) > at /oss/src/winsup/cygwin/fhandler_socket_local.cc:1167 > #5 0x00000001800a515b in fhandler_socket_wsock::recvfrom ( > this=, in_ptr=, len=, > flags=, from=0x0, fromlen=0x0) > at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:1271 > #6 0x00000001800ad38f in fhandler_socket_local::af_local_recv_secret ( > this=this@entry=0x180356450) > at /oss/src/winsup/cygwin/fhandler_socket_local.cc:437 > #7 0x00000001800ad929 in fhandler_socket_local::af_local_connect ( > this=0x180356450) at /oss/src/winsup/cygwin/fhandler_socket_local.cc:539 > #8 fhandler_socket_local::af_local_connect (this=0x180356450) > at /oss/src/winsup/cygwin/fhandler_socket_local.cc:527 > #9 0x00000001800a75b6 in fhandler_socket_wsock::evaluate_events ( > this=0x180356450, event_mask=63, events=@0x5fcb68: 0, > erase=erase@entry=false) > at /oss/src/winsup/cygwin/fhandler_socket_inet.cc:313 > #10 0x0000000180129052 in peek_socket (me=me@entry=0x8002f4590) > at /oss/src/winsup/cygwin/select.cc:1634 > #11 0x0000000180129221 in thread_socket (arg=0x80028ce10) > at /oss/src/winsup/cygwin/select.cc:1664 > #12 0x0000000180046686 in cygthread::callfunc ( > this=this@entry=0x180234778 , > issimplestub=issimplestub@entry=false) > at /oss/src/winsup/cygwin/cygthread.cc:48 > #13 0x0000000180046c46 in cygthread::stub ( > arg=arg@entry=0x180234778 ) > at /oss/src/winsup/cygwin/cygthread.cc:91 > #14 0x0000000180047706 in _cygtls::call2 (this=0x5fce00, > func=0x180046bc0 , arg=0x180234778 , > buf=buf@entry=0x5fcd20) at /oss/src/winsup/cygwin/cygtls.cc:40 > #15 0x00000001800477b4 in _cygtls::call (func=, > arg=) at /usr/include/w32api/psdk_inc/intrin-impl.h:790 > #16 0x00007ffdfda47034 in KERNEL32!BaseThreadInitThunk () > from /c/WINDOWS/System32/KERNEL32.DLL > #17 0x00007ffdff29cec1 in ntdll!RtlUserThreadStart () > from /c/WINDOWS/SYSTEM32/ntdll.dll > #18 0x0000000000000000 in ?? () > Backtrace stopped: previous frame inner to this frame (corrupt stack?) > (gdb) Just to rule out one slight possibility, I built a different Cygwin DLL from topic/af_unix but the resulting strace output is the same. So this doesn't seem to be related to Ken's work. ..mark