Message-ID: <7727e4de-a8da-1e6b-4d7c-68a132750996@linaro.org>
Date: Mon, 22 Aug 2022 13:51:03 -0300
Subject: Re: posix_spawn: parent can get stuck in uninterruptible sleep if child receives SIGTSTP early enough
To: Rain, libc-help@sourceware.org
References: <2921668c-773e-465d-9480-0abb6f979bf9@www.fastmail.com>
From: Adhemerval Zanella Netto
Organization: Linaro
In-Reply-To: <2921668c-773e-465d-9480-0abb6f979bf9@www.fastmail.com>

On 14/08/22 00:30, Rain wrote:
> Hi there --
>
> I've been working on a CLI tool (in Rust) that spawns lots of processes with posix_spawn. Specifically, I've been observing its behavior when Ctrl-Z is pressed in a terminal, and the process group receives a SIGTSTP signal. I'm seeing an issue where if the signal is received early enough during the posix_spawn process, the parent can be stuck in the middle of the clone3() syscall, an uninterruptible sleep status.
>
> Here are some backtraces, observed with glibc 2.35 and Linux kernel 5.18.10-76051810-generic on Ubuntu 22.04 (x86_64). I checked glibc master and I'm not seeing any code changes in this area, so I presume this issue still exists.
>
> In this case, during setup, posix_spawnattr_setsigmask is called with an empty signal set. However, based on reading the source code. I don't think that's relevant.
>
> --- parent process ---
>
> (gdb) bt
> #0 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
> #1 0x00007f12a0a37a51 in __GI___clone_internal (cl_args=cl_args@entry=0x7f129a5ed9e0, func=func@entry=0x7f12a0a24300 <__spawni_child>, arg=arg@entry=0x7f129a5eda40)
>     at ../sysdeps/unix/sysv/linux/clone-internal.c:54
> #2 0x00007f12a0a241f3 in __spawnix (pid=0x7f129a5edd20, file=0x7f123405d030 "/home/rain/dev/tokio/target/debug/deps/sync_mutex-22a40a7c6051156b", file_actions=0x7f129a5edd60,
>     attrp=, argv=, envp=0x7f123403f2e0, xflags=1, exec=0x7f12a09fcdd0 <__execvpex>) at ../sysdeps/unix/sysv/linux/spawni.c:388
> #3 0x00007f12a0a2490b in __spawni (pid=, file=, acts=, attrp=, argv=, envp=, xflags=1)
>     at ../sysdeps/unix/sysv/linux/spawni.c:436
> #4 0x00007f12a0a2403f in __posix_spawnp (pid=, file=, file_actions=, attrp=, argv=, envp=)
>     at ./posix/spawnp.c:30
> #5 0x000056199dee0811 in std::sys::unix::process::process_common::Command::posix_spawn () at library/std/src/sys/unix/process/process_unix.rs:544
> #6 std::sys::unix::process::process_common::Command::spawn () at library/std/src/sys/unix/process/process_unix.rs:57
> #7 0x000056199ded68dc in std::process::Command::spawn () at library/std/src/process.rs:881
>
> --- child process ---
>
> (gdb) bt
> #0 __GI___pthread_sigmask (how=how@entry=2, newmask=, oldmask=oldmask@entry=0x0) at ./nptl/pthread_sigmask.c:43
> #1 0x00007faaf8edd71d in __GI___sigprocmask (how=how@entry=2, set=, oset=oset@entry=0x0) at ../sysdeps/unix/sysv/linux/sigprocmask.c:25
> #2 0x00007faaf8fae4d8 in __spawni_child (arguments=) at
>     ../sysdeps/unix/sysv/linux/spawni.c:287
> #3 0x00007faaf8fc1a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
> ---
>
> Based on these backtraces and reading the source code, here's what I believe is happening:
>
> 1. The parent calls __posix_spawnp, which in turn calls __spawni and __spawnix.
> 2. The parent calls clone3 and enters uninterruptible sleep.
> 3. The child enters __spawni_child and blocks all incoming signals.

In fact glibc does not block signals here; rather, it resets the handlers to SIG_DFL, either when the current disposition is not SIG_IGN or when POSIX_SPAWN_SETSIGDEF is set. However, this does not matter for SIGSTOP, since it cannot be set to SIG_IGN.

> ---> 4. At this point the child receives a SIGTSTP signal. <---
> 5. The child unblocks signals by calling sigprocmask/pthread_sigmask.
> 6. At this point the SIGTSTP is delivered to the child.

AFAIK SIGSTOP is not synchronous and can be delivered at any time during process execution.

> 7. However, the clone hasn't exited in the parent and so it remains stuck in the clone3 syscall until the child receives a SIGCONT.
>
> I'm not sure what a reasonable way to handle this would be on the part of my CLI tool. The tool currently just gets stuck in uninterruptible sleep, resulting in a bad user experience.

Reading through both your Twitter discussion and the bug report against your tool [1], I think that how to handle SIGSTOP for the helper process itself, in the tiny window between process creation and the setpgid, is outside the posix_spawn specification.

>
> Here are solutions I've thought about that don't seem to work (please correct me if I'm wrong!)
> 1. Setting the signal mask to include SIGTSTP. I do want to be able to send the child SIGTSTP after the clone(), and in my case the child is a third-party process so I can't depend on it to reset the signal mask.
> 2. Spawning a stub process that execves the real child.
>    It seems like the same issue exists when the main process calls the stub process, if I'm understanding the code correctly, so this won't help.
>
> ... though now as I'm writing this email out, maybe one solution is:
>
> * my tool spawns a stub process with SIGTSTP masked.
> * the subprocess unmasks SIGTSTP (so it could receive the SIGTSTP here, but at least it won't block the parent process), then execves the third-party process.
>
> Is that the solution you would recommend?

I am not sure this would work, since SIGSTOP cannot be caught, blocked, or ignored.

What I think might work is to spawn a stub process and make it a new session leader with setsid, so that it does not have a controlling terminal. The stub process would be responsible for spawning new processes, so any interaction with the controlling terminal (the Ctrl+Z) won't affect the posix_spawn helper thread. You will probably need to open the controlling terminal in raw mode so you can catch Ctrl-Z and pass it along to the expected process groups.

>
> Thanks.

[1] https://github.com/nextest-rs/nextest/pull/470#issue-1338100182
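
To make the stub-process suggestion above a bit more concrete, here is a rough C sketch. It is only an illustration under several assumptions: the names spawn_via_session_stub and wait_exit_code are hypothetical, the stub is created with fork() (my choice for the sketch, so the caller is not suspended the way it is inside posix_spawn), and the caller is assumed to be single-threaded. It does not cover the raw-mode terminal handling or forwarding Ctrl-Z to the process groups, and a stop signal can still reach the stub in the short window before setsid() returns.

```c
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: wait for pid, retrying on EINTR.
   Returns the exit code, or -1 on error or abnormal termination. */
static int wait_exit_code(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical entry point: run argv[0] (looked up in PATH) through a
   stub process that first becomes a session leader. Returns the exit
   code of the spawned program, or -1 on failure. */
int spawn_via_session_stub(char *const argv[])
{
    pid_t stub = fork();
    if (stub < 0)
        return -1;

    if (stub == 0) {
        /* Stub: a freshly forked child is never a process group leader,
           so setsid() may create a new session, which has no
           controlling terminal. */
        if (setsid() < 0)
            _exit(126);

        /* The posix_spawn helper now runs in the new session, so a
           Ctrl-Z typed at the original terminal is not sent to it. */
        pid_t child;
        if (posix_spawnp(&child, argv[0], NULL, NULL, argv, environ) != 0)
            _exit(127);

        int code = wait_exit_code(child);
        _exit(code < 0 ? 125 : code);
    }

    /* Original process: it only waits for the stub, using an ordinary
       interruptible waitpid(). */
    return wait_exit_code(stub);
}
```

A caller would pass a NULL-terminated argument vector, for example { "sh", "-c", "exit 3", NULL }, and get back the exit code of the spawned program. A real tool would more likely keep one long-lived stub and send it spawn requests over a pipe or socket instead of forking once per child.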