* killpg(pgid, 0) fails if the process is in the middle of spawnve() @ 2022-05-18 11:19 Jun T 2022-05-18 14:54 ` Corinna Vinschen 0 siblings, 1 reply; 4+ messages in thread From: Jun T @ 2022-05-18 11:19 UTC (permalink / raw) To: cygwin Dear Cygwin developers, It seems killpg(2) on Cygwin has a problem as described below. Can this be (easily) fixed? [1] The problem killpg(pgid, 0) (or kill_pgrp(pgid, si_signo=0), in signal.cc) fails (returns -1) even when there is a process in the process group pgid, if the process is in the middle of spawnve(). [2] A problem of zsh on Cygwin that is caused by [1] More than a year ago, a user of zsh on Cygwin/MSYS2 reported to the zsh/workers mailing list: https://www.zsh.org/mla/workers/2021/msg00060.html As described in this post, it can sometimes (or frequently) happen that a pipeline like 'ls | less' results in: zsh% ls --color | less zsh: done ls --color | zsh: suspended (tty output) less How frequently you get this may depend on your hardware, but if it happens you will find it quite annoying. [3] How does [1] cause [2]? According to the strace output, what is happening is as follows: The main zsh (zsh0) fork() two subshells, zsh1 for ls and zsh2 for less. zsh1 becomes a process group leader (pid=pgid=101, for example), gets tty (becomes foreground), and calls execve(ls). zsh2 becomes a member of the process group pgid=101, and calls execve(less). When ls exits, zsh0 gets SIGCHLD, and in the signal handler it calls killpg(101, 0) to see if there are any process remaining in the process group 101. At this point zsh2/less is still in the process group 101, so killpg(101, 0) should succeed. But when problem [2] happens, zsh2 has already called execve(less) or spawnve(_P_OVERLAY,less), but spawnve() has not finished yet. There are two Windows processes (zsh2 and less), but it _seems_ neither of them is included in the list of win-pids created by winpids pids ((DWORD) PID_MAP_RW); at line 358 of signal.cc. So kill_pgrp() fails, and zsh0 thinks that there is no foreground process remaining, and regains tty. Later spawnve(less) completes, and less wants to write to stdout, but it has not tty, and is stopped by SIGTTOU. Is it possible to fix the problem [1], so that killpg(pgid, 0) succeeds even when all the process(es) in the process group pgid is/are in the middle of spawnve()? -- Jun ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: killpg(pgid, 0) fails if the process is in the middle of spawnve() 2022-05-18 11:19 killpg(pgid, 0) fails if the process is in the middle of spawnve() Jun T @ 2022-05-18 14:54 ` Corinna Vinschen 2022-05-19 4:51 ` Jun T 0 siblings, 1 reply; 4+ messages in thread From: Corinna Vinschen @ 2022-05-18 14:54 UTC (permalink / raw) To: Jun T; +Cc: cygwin On May 18 20:19, Jun T wrote: > It seems killpg(2) on Cygwin has a problem as described below. > Can this be (easily) fixed? > > [1] The problem > > killpg(pgid, 0) (or kill_pgrp(pgid, si_signo=0), in signal.cc) > fails (returns -1) even when there is a process in the process > group pgid, if the process is in the middle of spawnve(). > > [2] A problem of zsh on Cygwin that is caused by [1] > [...] > zsh% ls --color | less > zsh: done ls --color | > zsh: suspended (tty output) less > > How frequently you get this may depend on your hardware, > but if it happens you will find it quite annoying. > > [3] How does [1] cause [2]? > > According to the strace output, what is happening is as follows: > > The main zsh (zsh0) fork() two subshells, zsh1 for ls and > zsh2 for less. > zsh1 becomes a process group leader (pid=pgid=101, for example), > gets tty (becomes foreground), and calls execve(ls). > zsh2 becomes a member of the process group pgid=101, and > calls execve(less). > > When ls exits, zsh0 gets SIGCHLD, and in the signal handler > it calls killpg(101, 0) to see if there are any process > remaining in the process group 101. At this point zsh2/less > is still in the process group 101, so killpg(101, 0) should > succeed. > > But when problem [2] happens, zsh2 has already called execve(less) > or spawnve(_P_OVERLAY,less), but spawnve() has not finished yet. > > There are two Windows processes (zsh2 and less), but it _seems_ > neither of them is included in the list of win-pids created by > winpids pids ((DWORD) PID_MAP_RW); > at line 358 of signal.cc. So kill_pgrp() fails, and zsh0 thinks > that there is no foreground process remaining, and regains tty. > [...] I'm not 100% sure, but I think the core of the problem is spawn.cc. Look into the mode == _P_OVERLAY (aka "exec'ing") starting at line 857. At line 870, the handle to the own winpid object is closed in the exec'ing process. However, the exec'ing process only creates the winpid symlink for the exec'ed process if it's a non-Cygwin process. For a Cygwin process, the assumption is that the exec'ed process creates its own symlink (in pinfo::thisproc() in pinfo.cc). If the exec'ing process calls NtClose on it's own winpid symlink, but the exec'ed process didn't progress enough into initialization, there's a slim chance that neither the exec'ing process, nor the exec'ed process has a winpid symlink attached. As for how to fix this... as a first rough idea I'd think that we should always create the winpid symlink in spawn.cc, even for exec'ed Cygwin processes. We also need to make sure that we dup the handle into the new process, and stop creating the winpid symlink in exec'ed processes. Here's a Q&D patch: diff --git a/winsup/cygwin/pinfo.cc b/winsup/cygwin/pinfo.cc index 4a8daefd924e..f501c470f34c 100644 --- a/winsup/cygwin/pinfo.cc +++ b/winsup/cygwin/pinfo.cc @@ -55,10 +55,11 @@ void pinfo::thisproc (HANDLE h) { procinfo = NULL; + bool execed = !!h; DWORD flags = PID_IN_USE | PID_ACTIVE; /* Forked process or process started from non-Cygwin parent needs a pid. */ - if (!h) + if (!execed) { cygheap->pid = create_cygwin_pid (); flags |= PID_NEW; @@ -72,7 +73,8 @@ pinfo::thisproc (HANDLE h) procinfo->dwProcessId = myself_initial.dwProcessId; procinfo->sendsig = myself_initial.sendsig; wcscpy (procinfo->progname, myself_initial.progname); - create_winpid_symlink (); + if (!execed) + create_winpid_symlink (); procinfo->exec_sendsig = NULL; procinfo->exec_dwProcessId = 0; debug_printf ("myself dwProcessId %u", procinfo->dwProcessId); diff --git a/winsup/cygwin/spawn.cc b/winsup/cygwin/spawn.cc index 98b588698c5c..905d949fc5c4 100644 --- a/winsup/cygwin/spawn.cc +++ b/winsup/cygwin/spawn.cc @@ -860,13 +860,14 @@ child_info_spawn::worker (const char *prog_arg, const char *const *argv, strace.execing = 1; myself.hProcess = hExeced = pi.hProcess; HANDLE old_winpid_hdl = myself.shared_winpid_handle (); - if (!real_path.iscygexec ()) - { - /* If the child process is not a Cygwin process, we have to - create a new winpid symlink on behalf of the child process - not being able to do this by itself. */ - myself.create_winpid_symlink (); - } + /* We have to create a new winpid symlink on behalf of the child + process. For Cygwin processes we also have to create a reference + in the child. */ + myself.create_winpid_symlink (); + if (real_path.iscygexec ()) + DuplicateHandle (GetCurrentProcess (), + myself.shared_winpid_handle (), + pi.hProcess, NULL, 0, 0, DUPLICATE_SAME_ACCESS); NtClose (old_winpid_hdl); real_path.get_wide_win32_path (myself->progname); // FIXME: race? sigproc_printf ("new process name %W", myself->progname); However, this *might* have side effects, like having two processes with the same pid in ps output or something like that. The tricky thing here is that creating the new winpid symlink and closing the old winpid symlink should be atomic from the POV of pid collecting processes. For a start, can you try the above patch? I probably won't have the time to come up with an atomic solution in the next 2 weeks, but in general, the problem should be clear now. If somebody else comes up with a neat atomic solution during my absence, I'd be happy, too. Corinna ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: killpg(pgid, 0) fails if the process is in the middle of spawnve() 2022-05-18 14:54 ` Corinna Vinschen @ 2022-05-19 4:51 ` Jun T 2022-05-19 7:46 ` Corinna Vinschen 0 siblings, 1 reply; 4+ messages in thread From: Jun T @ 2022-05-19 4:51 UTC (permalink / raw) To: Lavrentiev, Anton (NIH/NLM/NCBI) [C] via Cygwin > 2022/05/18 23:54, Corinna Vinschen <corinna-cygwin@cygwin.com> wrote: > > For a start, can you try the above patch? Thanks for a quick response. The patch seems to solve the problem. Of course there is a possibility that it happens with very low probability, but I haven't get the problem by running 'cmd | less' many times, say fifty times or so. I didn't notice any bad side effects, and I guess the probability is very low if it ever happens (so not easy to detect it). # I tried running 'ls | less' several times while running # while true; do ps; done # in another terminal, but didn't get duplicated processes. PS: I'm new to building cygwin1.dll by myself, and have a few questions: (1) After make, I've manually copied new-cygwin1.dll to /bin/cygwin1.dll. I didn't copy/install any other files. Is this OK? (2) What are cygwin0.dll and cygwin1.dbg? The latter is a new cygwin1.dll with debug info? (3) Is there a simple way to build only new-cygwin1.dll (and anything required by it), without building (or trying to build) unnecessary things (especially documents)? -- Jun ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: killpg(pgid, 0) fails if the process is in the middle of spawnve() 2022-05-19 4:51 ` Jun T @ 2022-05-19 7:46 ` Corinna Vinschen 0 siblings, 0 replies; 4+ messages in thread From: Corinna Vinschen @ 2022-05-19 7:46 UTC (permalink / raw) To: Jun T; +Cc: cygwin On May 19 13:51, Jun T wrote: > > > 2022/05/18 23:54, Corinna Vinschen <corinna-cygwin@cygwin.com> wrote: > > > > For a start, can you try the above patch? > > Thanks for a quick response. The patch seems to solve the problem. > Of course there is a possibility that it happens with very low probability, > but I haven't get the problem by running 'cmd | less' many times, say > fifty times or so. > > I didn't notice any bad side effects, and I guess the probability is > very low if it ever happens (so not easy to detect it). > # I tried running 'ls | less' several times while running > # while true; do ps; done > # in another terminal, but didn't get duplicated processes. Great, thanks for testing. I'll push the patch as is for now. > PS: > I'm new to building cygwin1.dll by myself, and have a few questions: > (1) After make, I've manually copied new-cygwin1.dll to /bin/cygwin1.dll. > I didn't copy/install any other files. Is this OK? Yes. > (2) What are cygwin0.dll and cygwin1.dbg? The latter is a new cygwin1.dll > with debug info? cygwin0.dll is an intermediate stage during build, cygwin1.dbg is the debuginfo file. For quick debugging, you can just copy it alongside the DLL into /bin. > (3) Is there a simple way to build only new-cygwin1.dll (and anything > required by it), without building (or trying to build) unnecessary > things (especially documents)? configure --disable-doc Everything else is built unconditionally. If you tweak sources inside ${srcdir}/winsup/cygwin only, it's usually sufficient to rebuild inside ${builddir}/x86_64-pc-cygwin/winsup/cygwin only, unless you change the layout/size of data structures. A build from toplevel should be performed then. Thanks, Corinna ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-05-19 7:46 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-18 11:19 killpg(pgid, 0) fails if the process is in the middle of spawnve() Jun T 2022-05-18 14:54 ` Corinna Vinschen 2022-05-19 4:51 ` Jun T 2022-05-19 7:46 ` Corinna Vinschen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).