Date: Sat, 13 Aug 2022 20:30:05 -0700
From: Rain
To: libc-help@sourceware.org
Subject: posix_spawn: parent can get stuck in uninterruptible sleep if child receives SIGTSTP early enough

Hi there -- I've been working on a CLI tool (in Rust) that spawns lots of processes with posix_spawn.
Specifically, I've been observing its behavior when Ctrl-Z is pressed in a terminal and the process group receives a SIGTSTP signal. I'm seeing an issue where, if the signal is received early enough during the posix_spawn call, the parent can get stuck in the middle of the clone3() syscall, in uninterruptible sleep.

Here are some backtraces, observed with glibc 2.35 and Linux kernel 5.18.10-76051810-generic on Ubuntu 22.04 (x86_64). I checked glibc master and I'm not seeing any code changes in this area, so I presume this issue still exists.

In this case, during setup, posix_spawnattr_setsigmask is called with an empty signal set. However, based on reading the source code, I don't think that's relevant.

--- parent process ---
(gdb) bt
#0 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
#1 0x00007f12a0a37a51 in __GI___clone_internal (cl_args=cl_args@entry=0x7f129a5ed9e0, func=func@entry=0x7f12a0a24300 <__spawni_child>, arg=arg@entry=0x7f129a5eda40) at ../sysdeps/unix/sysv/linux/clone-internal.c:54
#2 0x00007f12a0a241f3 in __spawnix (pid=0x7f129a5edd20, file=0x7f123405d030 "/home/rain/dev/tokio/target/debug/deps/sync_mutex-22a40a7c6051156b", file_actions=0x7f129a5edd60, attrp=, argv=, envp=0x7f123403f2e0, xflags=1, exec=0x7f12a09fcdd0 <__execvpex>) at ../sysdeps/unix/sysv/linux/spawni.c:388
#3 0x00007f12a0a2490b in __spawni (pid=, file=, acts=, attrp=, argv=, envp=, xflags=1) at ../sysdeps/unix/sysv/linux/spawni.c:436
#4 0x00007f12a0a2403f in __posix_spawnp (pid=, file=, file_actions=, attrp=, argv=, envp=) at ./posix/spawnp.c:30
#5 0x000056199dee0811 in std::sys::unix::process::process_common::Command::posix_spawn () at library/std/src/sys/unix/process/process_unix.rs:544
#6 std::sys::unix::process::process_common::Command::spawn () at library/std/src/sys/unix/process/process_unix.rs:57
#7 0x000056199ded68dc in std::process::Command::spawn () at library/std/src/process.rs:881

--- child process ---
(gdb) bt
#0 __GI___pthread_sigmask (how=how@entry=2, newmask=, oldmask=oldmask@entry=0x0) at ./nptl/pthread_sigmask.c:43
#1 0x00007faaf8edd71d in __GI___sigprocmask (how=how@entry=2, set=, oset=oset@entry=0x0) at ../sysdeps/unix/sysv/linux/sigprocmask.c:25
#2 0x00007faaf8fae4d8 in __spawni_child (arguments=) at ../sysdeps/unix/sysv/linux/spawni.c:287
#3 0x00007faaf8fc1a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
---

Based on these backtraces and reading the source code, here's what I believe is happening:

1. The parent calls __posix_spawnp, which in turn calls __spawni and __spawnix.
2. The parent calls clone3 and enters uninterruptible sleep. (posix_spawn clones with CLONE_VFORK, so the parent waits until the child calls execve or exits.)
3. The child enters __spawni_child and blocks all incoming signals.
---> 4. At this point the child receives a SIGTSTP signal. <---
5. The child unblocks signals by calling sigprocmask/pthread_sigmask (how=2, i.e. SIG_SETMASK).
6. At this point the pending SIGTSTP is delivered, and the child stops before it reaches execve.
7. However, clone3 hasn't returned in the parent, so the parent remains stuck in the clone3 syscall until the child receives a SIGCONT.

I'm not sure what a reasonable way to handle this would be on the part of my CLI tool. The tool currently just gets stuck in uninterruptible sleep, resulting in a bad user experience.

Here are solutions I've thought about that don't seem to work (please correct me if I'm wrong!):

1. Setting the signal mask to include SIGTSTP. I do want to be able to send the child SIGTSTP after the clone(), and in my case the child is a third-party process, so I can't depend on it to reset the signal mask.
2. Spawning a stub process that execves the real child. If I'm understanding the code correctly, the same issue exists when the main process spawns the stub process, so this won't help.

... though now as I'm writing this email out, maybe one solution is:

* my tool spawns a stub process with SIGTSTP masked.
* the stub unmasks SIGTSTP (so it could receive the SIGTSTP at this point, but at least that won't block the parent process), then execves the third-party process.

Is that the solution you would recommend?

Thanks.

-- 
Rain (they/she)
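The stub-process idea above could look roughly like the following C sketch. It assumes glibc's posix_spawn and a separate stub executable; the function names (init_stub_spawnattr, spawn_via_stub, unblock_sigtstp, stub_main) and the stub path are hypothetical, not part of glibc or Rust's std. The parent puts SIGTSTP in the spawn attribute's signal mask (POSIX_SPAWN_SETSIGMASK), so a SIGTSTP that arrives during the spawn should stay pending instead of stopping the child before execve; the stub then unblocks SIGTSTP and execs the real program.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* Parent side (hypothetical helper): spawn attributes whose signal mask
 * contains only SIGTSTP. The child keeps SIGTSTP blocked through execve,
 * so a SIGTSTP sent during the spawn window stays pending. */
int init_stub_spawnattr(posix_spawnattr_t *attr) {
    sigset_t mask;
    int rc = posix_spawnattr_init(attr);
    if (rc != 0)
        return rc;
    sigemptyset(&mask);
    sigaddset(&mask, SIGTSTP);
    rc = posix_spawnattr_setsigmask(attr, &mask);
    if (rc == 0)
        rc = posix_spawnattr_setflags(attr, POSIX_SPAWN_SETSIGMASK);
    if (rc != 0)
        posix_spawnattr_destroy(attr);
    return rc;
}

/* Parent side (hypothetical helper): argv is {"stub", "real-program",
 * args..., NULL}. Returns 0 or an error number, like posix_spawn. */
int spawn_via_stub(pid_t *pid, const char *stub_path, char *const argv[]) {
    posix_spawnattr_t attr;
    int rc = init_stub_spawnattr(&attr);
    if (rc != 0)
        return rc;
    rc = posix_spawn(pid, stub_path, NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    return rc;
}

/* Stub side: unblock SIGTSTP. By the time the stub's own code runs it
 * has already passed execve, so the parent should no longer be waiting
 * in clone3(); a pending SIGTSTP stops only the stub. */
int unblock_sigtstp(void) {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGTSTP);
    return sigprocmask(SIG_UNBLOCK, &mask, NULL);
}

/* Stub entry point (hypothetical): argv[1] is the real program and
 * argv[1..] is its argument vector. */
int stub_main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: stub PROGRAM [ARGS...]\n");
        return 2;
    }
    if (unblock_sigtstp() != 0) {
        perror("sigprocmask");
        return 127;
    }
    execvp(argv[1], &argv[1]);
    perror("execvp"); /* only reached if exec fails */
    return 127;
}
```

For example, a stub binary's main could just `return stub_main(argc, argv);`, and the tool would call `spawn_via_stub(&pid, "/path/to/stub", argv)`. The trade-off is that the tool may now see a stopped stub (observable with waitpid and WUNTRACED) rather than a stopped third-party child, but if the reasoning above holds it should no longer hang in uninterruptible sleep.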