From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mout.kundenserver.de (mout.kundenserver.de [212.227.126.133]) by sourceware.org (Postfix) with ESMTPS id 9C68D383943C for ; Wed, 18 May 2022 14:54:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9C68D383943C Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=cygwin.com Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=cygwin.com Received: from calimero.vinschen.de ([24.134.7.25]) by mrelayeu.kundenserver.de (mreue012 [212.227.15.167]) with ESMTPSA (Nemesis) id 1N79Ey-1noIBV1WAO-017RfS; Wed, 18 May 2022 16:54:05 +0200 Received: by calimero.vinschen.de (Postfix, from userid 500) id 8F893A80B6C; Wed, 18 May 2022 16:54:03 +0200 (CEST) Date: Wed, 18 May 2022 16:54:03 +0200 From: Corinna Vinschen To: Jun T Cc: cygwin@cygwin.com Subject: Re: killpg(pgid, 0) fails if the process is in the middle of spawnve() Message-ID: Reply-To: cygwin@cygwin.com Mail-Followup-To: Jun T , cygwin@cygwin.com References: <08B78336-7554-4ACA-80EF-F87C3C04C781@kba.biglobe.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <08B78336-7554-4ACA-80EF-F87C3C04C781@kba.biglobe.ne.jp> X-Provags-ID: V03:K1:XSMjMmxii3TLHgU5TGSQDAR1mRibT6JnjuNufHvIZQ3yEpMDH/r OVC3e9xkV56OU265becD7veSvChno8bllIjYEISo13/w35dgurR7ukhRlsLKroXSQgv+RNr a3bErUiyCQIl2QEylHTusSLuLvtumy9fS3jXeOWx5JW9bu0BqY9PTw1TDSW4Dccj6qCiW9Z QByG2x5mXJ0077LVwl51Q== X-UI-Out-Filterresults: notjunk:1;V03:K0:DW1HNRsKwdc=:pOpnjVTFSbA/4y76R0HiRK Cl9FlvBLHXngnt3xOlKXbxhv3PqQQ74wQSlSToHqh4jU1YL/Gpd1DsgzhnP8s019XktPH3ix0 u7cEDOz3G8bkzarB7q5CAYsnAphtDa9VxjtrheBMF1u+HIAhABzY0mijq3wDW9tohEm5uXiJ+ AWMc44Gbwv6/UldO2SMHcxa1y76z+mNRZ7FzN8OsoOipV6/MdlKzSgkq6PldaLJj+GKeJW+kN r8Z2ghEJI+zKflEzmIM8kIg7JzmwNUmKBG1UsogApi46BID6PWwe/CHFgjTc2LuQNhEEDzZCq ILR4LRgTowezVcnD0IC/rpLNX5FqsAiLQv091BTYnxPMaaenan2ZYQIYaBm6LYCgYWRiV2vxv j+tuy58bK/UmKUgvuSiTizOTo1rjhCNjZO6ndLNrAalTDBtmkhHry13Iu80GEdTbHhWv+AVv2 tD3nNz8X9eVWDuNjv0wabvHeARMXDTJ0LR0sWwrFSHqcuCJFWxRtsCokgMHuOJdRxQRTpRXNP mdVisbaG55qnhmXAdN8HyHyi3DckHANA991GlZdf4jlt3ffRNTuJamOxxz/y1bqm9hJFOZj/J f2H877xnnjzew+2IyAgsLkuPtz+M40wFAx37C5ivUHPfI/I4fyK5hrTx06GW0ifyh4C7Aex+K PBzOodnakZ8cVLfQlMOb8T3iV49yeRGQqIP965V5oE8/k9mYpiPSEdofqLbMuVDqHdk8zUtfk ffGrEAkPFyU4taUrczt0sBXC5xZ3PmtqjX2C7DZMYvYKb5tzvL76bvvczZw= X-Spam-Status: No, score=-101.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, GOOD_FROM_CORINNA_CYGWIN, KAM_DMARC_NONE, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE, SPF_FAIL, SPF_HELO_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: cygwin@cygwin.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: General Cygwin discussions and problem reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2022 14:54:15 -0000 On May 18 20:19, Jun T wrote: > It seems killpg(2) on Cygwin has a problem as described below. > Can this be (easily) fixed? > > [1] The problem > > killpg(pgid, 0) (or kill_pgrp(pgid, si_signo=0), in signal.cc) > fails (returns -1) even when there is a process in the process > group pgid, if the process is in the middle of spawnve(). > > [2] A problem of zsh on Cygwin that is caused by [1] > [...] > zsh% ls --color | less > zsh: done ls --color | > zsh: suspended (tty output) less > > How frequently you get this may depend on your hardware, > but if it happens you will find it quite annoying. > > [3] How does [1] cause [2]? > > According to the strace output, what is happening is as follows: > > The main zsh (zsh0) fork() two subshells, zsh1 for ls and > zsh2 for less. > zsh1 becomes a process group leader (pid=pgid=101, for example), > gets tty (becomes foreground), and calls execve(ls). > zsh2 becomes a member of the process group pgid=101, and > calls execve(less). > > When ls exits, zsh0 gets SIGCHLD, and in the signal handler > it calls killpg(101, 0) to see if there are any process > remaining in the process group 101. At this point zsh2/less > is still in the process group 101, so killpg(101, 0) should > succeed. > > But when problem [2] happens, zsh2 has already called execve(less) > or spawnve(_P_OVERLAY,less), but spawnve() has not finished yet. > > There are two Windows processes (zsh2 and less), but it _seems_ > neither of them is included in the list of win-pids created by > winpids pids ((DWORD) PID_MAP_RW); > at line 358 of signal.cc. So kill_pgrp() fails, and zsh0 thinks > that there is no foreground process remaining, and regains tty. > [...] I'm not 100% sure, but I think the core of the problem is spawn.cc. Look into the mode == _P_OVERLAY (aka "exec'ing") starting at line 857. At line 870, the handle to the own winpid object is closed in the exec'ing process. However, the exec'ing process only creates the winpid symlink for the exec'ed process if it's a non-Cygwin process. For a Cygwin process, the assumption is that the exec'ed process creates its own symlink (in pinfo::thisproc() in pinfo.cc). If the exec'ing process calls NtClose on it's own winpid symlink, but the exec'ed process didn't progress enough into initialization, there's a slim chance that neither the exec'ing process, nor the exec'ed process has a winpid symlink attached. As for how to fix this... as a first rough idea I'd think that we should always create the winpid symlink in spawn.cc, even for exec'ed Cygwin processes. We also need to make sure that we dup the handle into the new process, and stop creating the winpid symlink in exec'ed processes. Here's a Q&D patch: diff --git a/winsup/cygwin/pinfo.cc b/winsup/cygwin/pinfo.cc index 4a8daefd924e..f501c470f34c 100644 --- a/winsup/cygwin/pinfo.cc +++ b/winsup/cygwin/pinfo.cc @@ -55,10 +55,11 @@ void pinfo::thisproc (HANDLE h) { procinfo = NULL; + bool execed = !!h; DWORD flags = PID_IN_USE | PID_ACTIVE; /* Forked process or process started from non-Cygwin parent needs a pid. */ - if (!h) + if (!execed) { cygheap->pid = create_cygwin_pid (); flags |= PID_NEW; @@ -72,7 +73,8 @@ pinfo::thisproc (HANDLE h) procinfo->dwProcessId = myself_initial.dwProcessId; procinfo->sendsig = myself_initial.sendsig; wcscpy (procinfo->progname, myself_initial.progname); - create_winpid_symlink (); + if (!execed) + create_winpid_symlink (); procinfo->exec_sendsig = NULL; procinfo->exec_dwProcessId = 0; debug_printf ("myself dwProcessId %u", procinfo->dwProcessId); diff --git a/winsup/cygwin/spawn.cc b/winsup/cygwin/spawn.cc index 98b588698c5c..905d949fc5c4 100644 --- a/winsup/cygwin/spawn.cc +++ b/winsup/cygwin/spawn.cc @@ -860,13 +860,14 @@ child_info_spawn::worker (const char *prog_arg, const char *const *argv, strace.execing = 1; myself.hProcess = hExeced = pi.hProcess; HANDLE old_winpid_hdl = myself.shared_winpid_handle (); - if (!real_path.iscygexec ()) - { - /* If the child process is not a Cygwin process, we have to - create a new winpid symlink on behalf of the child process - not being able to do this by itself. */ - myself.create_winpid_symlink (); - } + /* We have to create a new winpid symlink on behalf of the child + process. For Cygwin processes we also have to create a reference + in the child. */ + myself.create_winpid_symlink (); + if (real_path.iscygexec ()) + DuplicateHandle (GetCurrentProcess (), + myself.shared_winpid_handle (), + pi.hProcess, NULL, 0, 0, DUPLICATE_SAME_ACCESS); NtClose (old_winpid_hdl); real_path.get_wide_win32_path (myself->progname); // FIXME: race? sigproc_printf ("new process name %W", myself->progname); However, this *might* have side effects, like having two processes with the same pid in ps output or something like that. The tricky thing here is that creating the new winpid symlink and closing the old winpid symlink should be atomic from the POV of pid collecting processes. For a start, can you try the above patch? I probably won't have the time to come up with an atomic solution in the next 2 weeks, but in general, the problem should be clear now. If somebody else comes up with a neat atomic solution during my absence, I'd be happy, too. Corinna