public inbox for cygwin@cygwin.com
From: Alexey Izbyshev <izbyshev@ispras.ru>
To: Takashi Yano <takashi.yano@nifty.ne.jp>
Cc: cygwin@cygwin.com
Subject: Re: Deadlock of the process tree when running make
Date: Wed, 27 Apr 2022 15:19:11 +0300
Message-ID: <fa1203ef5ca93f2272754f0f7f7ffb6a@ispras.ru>
In-Reply-To: <20220427202216.4901f538e9916e4a8bde10d9@nifty.ne.jp>

Hi, Takashi,

On 2022-04-27 14:22, Takashi Yano wrote:
> Hi Alexey,
> 
> On Sat, 16 Apr 2022 16:21:34 +0300
> Alexey Izbyshev wrote:
>> On 2022-04-16 12:39, Takashi Yano wrote:
>> > I am not sure yet what is essential, but the current code closes
>> > pseudo console only if there is no other process which is attaching
>> > to the pseudo console. I wonder why javac.exe is remaining as
>> > zombie. The parent bash.exe calls ClosePseudoConsole() when
>> > child non-cygwin app is terminated, i.e., after WaitForSingleObject()
>> > for child process handle returns.
>> > https://www.cygwin.com/git/?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/spawn.cc;h=81dba5a941e919ea2514013069aef22c6fad8004;hb=7ac0767053e278f0ce9811bf6f77278bd2f49c20#l1009
>> >
>> > What does the "zombie" mean? Is it listed in the process list of
>> > ProcessHacker? I still suspect that the zombie javac.exe holds
>> > the hWritePipe handle leaked from parent bash.exe.
>> >
>> By "zombie" I meant the same thing as in the Linux kernel: a data
>> structure that remains after a process terminated, but hasn't been
>> waited for yet (I don't know how this is implemented in Cygwin). So
>> there is no javac.exe process in ProcessHacker, but "ps" and similar
>> tools in Cygwin still list "javac".
>> 
>> I'm now trying to create a small reproducer that I can share, and I've
>> had a first small success this night: I could get a very similar hang
>> with a simple Makefile and a script with Cygwin 3.3.4. Here is the tree:
>> 
>> make(14479)-+-bash(14484)---bash(14611)
>>              |-bash(14515)---bash(14618)
>>              |-bash(14491)---bash(14500)---bash(14612)
>>              |-bash(14501)---bash(14510)---bash(14605)
>>              |-bash(14505)---bash(14607)
>>              |-bash(14494)---bash(14617)
>>              |-bash(14506)---bash(14513)---bash(14610)
>>              |-bash(14512)---bash(14518)---bash(14615)
>>              |-bash(14486)---bash(14495)---bash(14606)
>>              |-bash(14483)---bash(14490)---bash(14609)
>>              |-bash(14509)---bash(14614)
>>              |-bash(14489)---bash(14608)
>>              |-bash(14499)---bash(14613)
>>              |-bash(14481)---bash(14485)---python(14588)
>>              |-bash(14496)---bash(14504)---bash(14616)
>>              `-bash(14482)---bash(14604)
>> 
>> 
>> "python" is a zombie, just as "javac" is in the original case. There is
>> also a single "conhost.exe" again, and all of its 5 threads are doing
>> the same things as in the original case (including the signal pipe
>> thread trying to EnterCriticalSection()). The only difference is that
>> the leaf bash.exe processes are trying to acquire the pcon mutex at a
>> different point [1],
>> but I guess this difference is not important.
>> 
>> I'll try this reproducer with your patched DLL as well as on another
>> machine and share it in case of success.
>> 
>> Thanks,
>> Alexey
>> 
>> [1]
>> https://www.cygwin.com/git?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/spawn.cc;h=81dba5a941e919ea2514013069aef22c6fad8004;hb=cygwin-3_3_4-release#l697
> 
> Is there any progress on this?

During the last week I reproduced the hang multiple times on a vanilla
Cygwin 3.3.4 with a small test. In one case the hung state was even
minimal: only a bash.exe waiting in ClosePseudoConsole() after its
native child had terminated, plus a conhost.exe, with no other processes
trying to acquire the pcon mutex. In all cases the conhost.exe
signal-pipe thread was blocked at the same EnterCriticalSection() call.

However, I couldn't reproduce the hang with your patched DLL [1] with the
same test running for multiple days. I can't explain how your change of
handle inheritability could affect the double-unlock bug in conhost.exe
that I referenced earlier, so either I'm missing something or I've been
very unlucky with reproducing it. I was going to investigate the
conhost.exe logic and state further (in particular, why one of its
threads still reads from "\Device\ConDrv" after all console clients have
detached) and then reply to you, but I haven't been able to do that yet.

If you want to try to reproduce the hang yourself with 3.3.4, here is
one of the small tests that I used (it looks strange because it is the
result of minimizing other code):

$ cat Makefile
T := $(shell echo {1..16})

all: $(T)

$(T):
	@./test.sh $@

$ cat test.sh
#!/bin/bash
set -eu

(
   for ((i = 0; i < 10; i++)); do
     python -c ""
   done
)

$ while make -j16; do echo $((i++)); done

The test can still take multiple hours to hang on my machine.
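
Since the hang can take hours to show up, a watchdog around each make
run may help to notice it without watching the terminal. The following
is only a sketch and not part of the test above: the function name
watch_for_hang, the 600-second limit and the 1-second poll interval are
illustrative assumptions. It deliberately does not kill the command, so
that the hung process tree stays available for inspection. Running make
in the background may also change timing or console behavior, so it is
not guaranteed to reproduce the hang in the same way.

```bash
#!/bin/bash
# Sketch (hypothetical helper): run a command in the background and
# report when it exceeds a time limit, leaving it alive for inspection.

watch_for_hang() {
  local limit=$1; shift          # limit in seconds, then the command
  local start=$SECONDS
  "$@" &                         # start the command in the background
  local pid=$!
  # Poll once per second while the command is still running.
  while kill -0 "$pid" 2>/dev/null; do
    if (( SECONDS - start >= limit )); then
      echo "possible hang: PID $pid still running after ${limit}s" >&2
      return 124                 # same convention as timeout(1)
    fi
    sleep 1
  done
  wait "$pid"                    # propagate the command's exit status
}
```

With the function defined in the current shell, the loop above could be
run as:

$ i=0; while watch_for_hang 600 make -j16; do echo $((i++)); done

The loop stops either when make fails or when a single run exceeds the
limit, in which case the process tree can be examined with "ps" or
ProcessHacker.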

If I get any new interesting data, I'll share it.

Thank you,
Alexey

[1] https://tyan0.yr32.net/cygwin/x86/test/cygwin1-20220418.dll.xz
