From: Alexey Izbyshev
To: Takashi Yano
Cc: cygwin@cygwin.com
Subject: Re: Deadlock of the process tree when running make
Date: Wed, 13 Apr 2022 19:48:04 +0300
Message-ID: <1bdd5ac77277343fbff9b560fa98b15e@ispras.ru>

On 2022-04-11 13:10, Alexey Izbyshev wrote:
> On 2022-04-11 11:35, Takashi Yano wrote:
>> On Sun, 10 Apr 2022 23:49:29 +0300
>> A countermeasure version is available at the following location:
>> https://tyan0.yr32.net/cygwin/x86/test/cygwin1-20220411.dll.xz
>> https://tyan0.yr32.net/cygwin/x86_64/test/cygwin1-20220411.dll.xz
>>
>> Could you please test? To keep the hanging tree, please install
>> cygwin another directory, and replace cygwin1.dll with the
>> countermeasure version.
>>
> Thank you for providing the binaries! I've started testing in a
> separate cygwin installation on the same machine, as you suggested.
> The hang previously took many hours to reproduce, so I'll keep tests
> running for a while and then report back.
>
The good news is that the tests have been running for two days so far
without any cygwin-related issues, so the patched version doesn't seem
to introduce new issues.

The bad news is that my theory about the suspicious "Unnamed file:
\FileSystem\Npfs" in the hanging bash.exe being a leak seems to be
wrong. I've closed that handle, but conhost.exe hasn't unblocked. All of
its threads are doing the same things as before:

1. Tries to enter a critical section. (Task Manager claims it waits for
   thread 4, so probably the latter owns it).
2. ReadFile("pty1-from-master-nat" named pipe)
3. Waits for an anonymous event.
4. Waits on a handle for "\Device\ConDrv" (in DeviceIoControl()).
5. Blocked in GetMessageW().

I've again created a model situation on another machine, with bash.exe
stopped at a breakpoint in ClosePseudoConsole(), and it seems that last
time I missed that bash.exe contains *two* handles for (different)
"Unnamed file: \FileSystem\Npfs" here too, so that seems to be normal.
What's probably not normal is the behavior of the hanging conhost.exe.
I've compared the points where conhost.exe is blocked, and all but one
of the threads in the model case are doing the same things as in the
hanging case, but the remaining thread is blocked in
ReadFile("\Device\NamedPipe\") (i.e. the read end of "hWritePipe" of
pcon) instead of trying to enter a critical section like thread 1 above.

So now I'm starting to doubt that it's a cygwin bug and not some
conhost.exe bug. I'll try to poke around the hanging conhost.exe some
more, and may also try to create a faster reproducer.

Thanks for your help so far,
Alexey
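
For readers who want to repeat the comparison of blocking points between
the hanging and the model conhost.exe, here is a minimal sketch in
Python. It only diffs hand-transcribed labels and does not inspect a
live process. The labels paraphrase the observations in this message,
and the names HANGING, MODEL and diff_states are hypothetical, chosen
for illustration only.

```python
from collections import Counter

# Blocking points of the conhost.exe threads, transcribed by hand from
# the observations in this message (labels are paraphrased, not tool output).
HANGING = [
    "enter critical section",
    "ReadFile on pty1-from-master-nat pipe",
    "wait on anonymous event",
    "DeviceIoControl on ConDrv handle",
    "GetMessageW",
]

# Model case: bash.exe stopped at a breakpoint in ClosePseudoConsole().
MODEL = [
    "ReadFile on read end of hWritePipe",
    "ReadFile on pty1-from-master-nat pipe",
    "wait on anonymous event",
    "DeviceIoControl on ConDrv handle",
    "GetMessageW",
]


def diff_states(hanging, model):
    """Return (only_in_hanging, only_in_model) as sorted lists.

    Counter subtraction is used so that several threads blocked at the
    same point are still compared one-to-one.
    """
    h, m = Counter(hanging), Counter(model)
    return sorted((h - m).elements()), sorted((m - h).elements())


only_hanging, only_model = diff_states(HANGING, MODEL)
print("only in hanging case:", only_hanging)
print("only in model case:  ", only_model)
```

With the labels above, the only difference is the thread that tries to
enter a critical section in the hanging case and reads from the read end
of "hWritePipe" in the model case.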