From: Jun T
Subject: killpg(pgid, 0) fails if the process is in the middle of spawnve()
Date: Wed, 18 May 2022 20:19:31 +0900
To: cygwin@cygwin.com

Dear Cygwin developers,

It seems killpg(2) on Cygwin has a problem as described below.
Can this be (easily) fixed?

[1] The problem

killpg(pgid, 0) (or kill_pgrp(pgid, si_signo=0), in signal.cc)
fails (returns -1) even when there is a process in the process
group pgid, if the process is in the middle of spawnve().
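For reference, the kind of check described in [1] can be sketched in C roughly as follows. This is only an illustration, not code taken from zsh or Cygwin; the helper name pgrp_has_members is hypothetical. It relies on the POSIX behaviour that signal 0 sends nothing and only performs the existence and permission checks, with errno set to ESRCH when no matching process group is found.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from zsh or Cygwin): probe a process group
 * with signal 0, which delivers nothing and only runs the checks.
 *
 * Returns  1 if killpg() succeeds, i.e. the group has a member that
 *            the caller is allowed to signal,
 *          0 if killpg() fails with ESRCH (no such process group),
 *         -1 on any other error (for example EPERM).
 */
static int pgrp_has_members(pid_t pgid)
{
    if (killpg(pgid, 0) == 0)
        return 1;
    return errno == ESRCH ? 0 : -1;
}
```

Called with the caller's own group, pgrp_has_members(getpgrp()), this should return 1 on a POSIX system, since the caller itself is a member. The report above is that, on Cygwin, the underlying killpg() call can fail while the only member of the group is still inside spawnve().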
[2] A problem of zsh on Cygwin that is caused by [1]

More than a year ago, a user of zsh on Cygwin/MSYS2 reported
the following to the zsh/workers mailing list:

https://www.zsh.org/mla/workers/2021/msg00060.html

As described in this post, it can sometimes (or frequently)
happen that a pipeline like 'ls | less' results in:

    zsh% ls --color | less
    zsh: done                    ls --color |
    zsh: suspended (tty output)  less

How frequently you get this may depend on your hardware, but
if it happens you will find it quite annoying.

[3] How does [1] cause [2]?

According to the strace output, what is happening is as follows:

The main zsh (zsh0) fork()s two subshells, zsh1 for ls and zsh2
for less.

zsh1 becomes a process group leader (pid=pgid=101, for example),
gets the tty (becomes foreground), and calls execve(ls).

zsh2 becomes a member of the process group pgid=101, and calls
execve(less).

When ls exits, zsh0 gets SIGCHLD, and in the signal handler it
calls killpg(101, 0) to see if there are any processes remaining
in the process group 101. At this point zsh2/less is still in
the process group 101, so killpg(101, 0) should succeed.

But when problem [2] happens, zsh2 has already called execve(less),
i.e. spawnve(_P_OVERLAY, less), and spawnve() has not finished yet.
There are two Windows processes (zsh2 and less), but it _seems_
neither of them is included in the list of win-pids created by

    winpids pids ((DWORD) PID_MAP_RW);

at line 358 of signal.cc.

So kill_pgrp() fails, and zsh0 concludes that there is no
foreground process remaining, and takes the tty back.

Later spawnve(less) completes, and less wants to write to stdout,
but its process group no longer has the tty, so it is stopped by
SIGTTOU.

Is it possible to fix problem [1], so that killpg(pgid, 0)
succeeds even when all the processes in the process group pgid
are in the middle of spawnve()?

--
Jun