Date: Sat, 09 Apr 2022 22:35:03 +0300
From: Alexey Izbyshev
To: Takashi Yano
Cc: cygwin@cygwin.com
Subject: Re: Deadlock of the process tree when running make

On 2022-04-09 20:54, Takashi Yano wrote:
> Thanks for checking. This seems to be normal.
> Then, I cannot understand why the ClosePseudoConsole() call is blocked...
>
> The document by Microsoft mentions the blocking conditions of
> ClosePseudoConsole():
> https://docs.microsoft.com/en-us/windows/console/closepseudoconsole
> however, the thread above is draining the channel.

I've decided to check which object ClosePseudoConsole() waits for. The
wait happens inside the unexported KERNELBASE!_ClosePseudoConsoleMembers
function. Here is the relevant part:

76589fb5 8b4e08          mov     ecx,dword ptr [esi+8]
76589fb8 e8c2fdffff      call    KERNELBASE!_HandleIsValid (76589d7f)
76589fbd 84c0            test    al,al
76589fbf 7456            je      KERNELBASE!_ClosePseudoConsoleMembers+0x89 (7658a017)
76589fc1 8d45fc          lea     eax,[ebp-4]
76589fc4 895dfc          mov     dword ptr [ebp-4],ebx
76589fc7 50              push    eax
76589fc8 51              push    ecx
76589fc9 e8c23ef5ff      call    KERNELBASE!GetExitCodeProcess (764dde90)
76589fce 85c0            test    eax,eax
76589fd0 7414            je      KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
76589fd2 817dfc03010000  cmp     dword ptr [ebp-4],103h
76589fd9 750b            jne     KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
76589fdb 53              push    ebx
76589fdc 6aff            push    0FFFFFFFFh
76589fde ff7608          push    dword ptr [esi+8]
76589fe1 e8ba74f6ff      call    KERNELBASE!WaitForSingleObjectEx (764f14a0)

"esi" holds the argument of ClosePseudoConsole(), so the first mov
dereferences it at offset 8 and loads a process handle. If this handle is
valid, the function calls GetExitCodeProcess(), and if that call succeeds
and returns STILL_ACTIVE (0x103), it waits for that process with an
INFINITE (0xFFFFFFFF) timeout.

I've checked that the hanging bash process has only 3 process handles: one
for itself, one for the dead javac, and one for conhost.exe. So it is
obviously waiting for the latter to terminate. (After I did all this, I
realized there was a much easier way to get this result: the "Analyze wait
chain" feature of Task Manager.)

Unfortunately, I don't know anything about Windows consoles, but just in
case I also checked what the 5 threads of conhost.exe are waiting for:

1. Tries to enter a critical section (Task Manager claims it waits for
   thread 4, so the latter probably owns it).
2. Waits on a handle for the "pty1-from-master-nat" named pipe.
3. Waits for an anonymous event.
4. Waits on a handle for "\Device\ConDrv" (in DeviceIoControl()).
5. Blocked in GetMessageW().

It's also worth noting that this conhost.exe seems to be the only one
related to the Cygwin process tree (as well as the only related non-Cygwin
process). All other conhost.exe processes were created before I started my
stress test. My guess is that this conhost.exe was created for a native
app started from a Cygwin process. Could some race condition or bug have
prevented conhost.exe from terminating once the native process (probably
javac?) died?

Alexey