Date: Sun, 10 Apr 2022 16:34:32 +0900
From: Takashi Yano
To: cygwin@cygwin.com
Cc: Alexey Izbyshev
Subject: Re: Deadlock of the process tree when running make
Message-Id: <20220410163432.00dd7b9f81f8f322d97688f2@nifty.ne.jp>
In-Reply-To: <7204ed0aa2d6b3fcfb239010e6b67646@ispras.ru>

On Sat, 09 Apr 2022 23:26:51 +0300
Alexey Izbyshev wrote:
> On 2022-04-09 22:35, Alexey Izbyshev wrote:
> > On 2022-04-09 20:54, Takashi Yano wrote:
> >> Thanks for checking. This seems to be normal. Then, I cannot
> >> understand why the ClosePseudoConsole() call is blocked...
> >>
> >> The document by Microsoft mentions the blocking conditions of
> >> ClosePseudoConsole():
> >> https://docs.microsoft.com/en-us/windows/console/closepseudoconsole
> >> however, the thread above is draining the channel.
> >
> > I've decided to check what object ClosePseudoConsole() waits for. The
> > wait happens inside unexported KERNELBASE!_ClosePseudoConsoleMembers
> > function.
> > Here is the relevant part:
> >
> > 76589fb5 8b4e08          mov     ecx,dword ptr [esi+8]
> > 76589fb8 e8c2fdffff      call    KERNELBASE!_HandleIsValid (76589d7f)
> > 76589fbd 84c0            test    al,al
> > 76589fbf 7456            je      KERNELBASE!_ClosePseudoConsoleMembers+0x89 (7658a017)
> > 76589fc1 8d45fc          lea     eax,[ebp-4]
> > 76589fc4 895dfc          mov     dword ptr [ebp-4],ebx
> > 76589fc7 50              push    eax
> > 76589fc8 51              push    ecx
> > 76589fc9 e8c23ef5ff      call    KERNELBASE!GetExitCodeProcess (764dde90)
> > 76589fce 85c0            test    eax,eax
> > 76589fd0 7414            je      KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
> > 76589fd2 817dfc03010000  cmp     dword ptr [ebp-4],103h
> > 76589fd9 750b            jne     KERNELBASE!_ClosePseudoConsoleMembers+0x58 (76589fe6)
> > 76589fdb 53              push    ebx
> > 76589fdc 6aff            push    0FFFFFFFFh
> > 76589fde ff7608          push    dword ptr [esi+8]
> > 76589fe1 e8ba74f6ff      call    KERNELBASE!WaitForSingleObjectEx (764f14a0)
> >
> > "esi" is the argument of ClosePseudoConsole(), so the first mov
> > dereferences it with an offset and loads a process handle. Then, if
> > this handle is valid, it calls GetExitCodeProcess(), and if that
> > succeeds and returns STILL_ACTIVE, it waits for that process.
> >
> > I've checked that the hanging bash process has only 3 process handles:
> > for itself, for dead javac, and for conhost.exe. So obviously it waits
> > for the last of these to terminate. (After I did all this, I realized
> > there was a much easier way to get this result via the "Analyze wait
> > chain" feature of Task Manager).
> >
> > Unfortunately, I don't know anything about Windows consoles, but just
> > in case I also checked what the 5 threads of conhost.exe are waiting for:
> >
> > 1. Tries to enter a critical section (Task Manager claims it waits for
> >    thread 4, so probably the latter owns it).
> > 2. Waits on a handle for "pty1-from-master-nat" named pipe.
> > 3. Waits for an anonymous event.
> > 4. Waits on a handle for "\Device\ConDrv" (in DeviceIoControl()).
> > 5. Blocked in GetMessageW().
> >
> > It's also worth noting that this conhost.exe seems to be the only one
> > related to the Cygwin process tree (as well as the only related
> > non-Cygwin process). All other conhost.exe processes were created
> > before I started my stress test.
> >
> > My guess is that this conhost.exe was created for a native app started
> > from a Cygwin process. Could it be some race condition/bug that
> > prevented conhost.exe from terminating once the native process
> > (probably javac?) died?
> >
> A few more things that might be important:
>
> * Clarification: thread 2 of conhost.exe waits in KernelBase!ReadFile().
>
> * In the assembly part I omitted, before waiting on the conhost process,
> _ClosePseudoConsoleMembers() closes the handle obtained from "dword ptr
> [esi]", i.e. the "hWritePipe" member of the HPCON_INTERNAL struct.

Thanks for investigating. In the normal case, conhost.exe terminates
when hWritePipe is closed. Possibly, hWritePipe holds an incorrect
handle value.

-- 
Takashi Yano
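
The close sequence discussed in this thread can be modeled in a few lines. This is only a rough sketch in Python, not the KERNELBASE code: FakeConhost, FakePseudoConsole and close_members are hypothetical names, the handle numbers are arbitrary, and the rule that conhost.exe exits once hWritePipe is closed is taken from the statement above rather than from any documented guarantee. Timing is collapsed, so "returns" means the infinite wait would eventually be satisfied and "blocks" means nothing would ever wake it.

```python
from dataclasses import dataclass
from typing import Optional

STILL_ACTIVE = 0x103  # 259, the value compared against in "cmp dword ptr [ebp-4],103h"


@dataclass
class FakeConhost:
    """Hypothetical stand-in for conhost.exe (assumption, not real behavior)."""
    write_end: int                  # handle value whose closure makes it exit
    exit_code: int = STILL_ACTIVE   # still running until told otherwise
    query_fails: bool = False       # models GetExitCodeProcess() returning 0

    def on_handle_closed(self, handle: int) -> None:
        # Assumed rule: the process exits only when the real write end closes.
        if handle == self.write_end:
            self.exit_code = 0


@dataclass
class FakePseudoConsole:
    """Hypothetical model of the two fields mentioned in the thread."""
    h_write_pipe: int                  # models "dword ptr [esi]"
    h_process: Optional[FakeConhost]   # models "dword ptr [esi+8]"; None = invalid


def close_members(pcon: FakePseudoConsole) -> str:
    """Return "returns" or "blocks" for the simulated close sequence."""
    conhost = pcon.h_process
    # Omitted part of the listing: hWritePipe is closed before the wait.
    if conhost is not None:
        conhost.on_handle_closed(pcon.h_write_pipe)
    # _HandleIsValid: an invalid process handle skips the wait entirely.
    if conhost is None:
        return "returns"
    # GetExitCodeProcess failed ("test eax,eax / je"): skip the wait.
    if conhost.query_fails:
        return "returns"
    # Exit code is not STILL_ACTIVE ("cmp ... 103h / jne"): skip the wait.
    if conhost.exit_code != STILL_ACTIVE:
        return "returns"
    # WaitForSingleObjectEx with an INFINITE timeout on a process that,
    # in this model, will never exit.
    return "blocks"
```

In this model, a pseudo console whose h_write_pipe matches the value conhost is reading from gives "returns", while a mismatched value (the "incorrect handle value" hypothesis) gives "blocks", which is the hang being described.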