From: Alexey Izbyshev
To: Takashi Yano
Cc: cygwin@cygwin.com
Date: Sun, 10 Apr 2022 23:49:29 +0300
Subject: Re: Deadlock of the process tree when running make

On 2022-04-10 15:13, Alexey Izbyshev wrote:
> On 2022-04-10 10:34, Takashi Yano wrote:
>> On Sat, 09 Apr 2022 23:26:51 +0300
>> Thanks for investigating. In the normal case, conhost.exe is
>> terminated when hWritePipe is closed.
>
> Thanks for confirming.
>
>>
>> Possibly, the hWritePipe has incorrect handle value.
>
> I've verified that the handle was correct by attaching via gdb to the
> hanging bash and checking that hWritePipe field is now zeroed (which
> happens only in the branch where _HandleIsValid returns true and
> hWritePipe is closed).
>
> I've found something interesting though. I've modeled a similar
> situation on another machine:
>
> 1. I've run a native process via bash.
> 2. I've attached to bash via gdb and set a breakpoint on
>    ClosePseudoConsole().
> 3. I've killed the native process.
> 4. The breakpoint was hit, and I looked at hWritePipe value.
>
> ProcessHacker shows it as "Unnamed file: \FileSystem\Npfs". Both bash
> and conhost had a single handle with such name, and after I've
> forcibly closed it in the bash process (while it was still suspended
> by gdb), conhost.exe indeed died.
>
> Then I looked at the original hanging tree and found that the hanging
> bash.exe still has a single handle displayed as "Unnamed file:
> \FileSystem\Npfs". I don't know how to check what kernel object it
> refers to, but at least its access rights are the same as for
> hWritePipe that I've seen on another machine, and its handle count
> is 1. So could it be another copy of hWritePipe, e.g. due to some
> handle leak?
>
> I don't know how to verify whether this suspicious handle in bash.exe
> is paired with "Unnamed file: \FileSystem\Npfs" in conhost.exe, other
> than by forcibly closing it. If I close it and conhost.exe dies, it
> will confirm "the extra handle" theory, but will also prevent further
> investigation with the hanging tree. Do you have any advice?

I've found something that looked strange to me while checking handles
in the hanging process tree: the hanging conhost.exe and the hanging
bash.exe belong to different tests. Each test is a separate shell
script in a separate make recipe, so it looks like conhost.exe was
created by one test (which is still hanging at a later point in its
script, trying to run grep), but then bash.exe belonging to another
test somehow got a pseudoconsole referring to this conhost.exe and now
hangs trying to close it.

So it looks like Cygwin migrated the pseudoconsole between processes,
and indeed fhandler_pty_slave::close_pseudoconsole() contains something
that looks like migration logic. This logic contains the following
call:

    DuplicateHandle (GetCurrentProcess (), ttyp->h_pcon_write_pipe,
                     new_owner, &new_write_pipe, 0, TRUE,
                     DUPLICATE_SAME_ACCESS);

Is it safe to create an *inheritable* handle in another process here?
Could the target process spawn a child at the wrong moment (e.g.
before it even knows about the newly created handle), so that the
handle unintentionally leaks into the child and triggers the hang
afterwards?

Similarly suspicious code is also in
fhandler_pty_common::resize_pseudo_console():

    DuplicateHandle (pcon_owner, get_ttyp ()->h_pcon_write_pipe,
                     GetCurrentProcess (), &hpcon_local.hWritePipe,
                     0, TRUE, DUPLICATE_SAME_ACCESS);
    ResizePseudoConsole ((HPCON) &hpcon_local, size);
    CloseHandle (pcon_owner);
    CloseHandle (hpcon_local.hWritePipe);

If another thread spawns a child using CreateProcess(bInheritHandles=TRUE)
between DuplicateHandle() and CloseHandle(hpcon_local.hWritePipe), the
handle will leak into the child.

Sorry if this is a false lead; I haven't tried to really understand the
pseudoconsole-related code yet.

Thanks,
Alexey