From: Adhemerval Zanella Netto
Organization: Linaro
To: Rain, libc-help@sourceware.org
Date: Mon, 22 Aug 2022 14:48:59 -0300
Subject: Re: posix_spawn: parent can get stuck in uninterruptible sleep if child receives SIGTSTP early enough
References: <2921668c-773e-465d-9480-0abb6f979bf9@www.fastmail.com> <7727e4de-a8da-1e6b-4d7c-68a132750996@linaro.org> <64917a2f-788b-4695-b799-63bbb8a4873f@www.fastmail.com>
In-Reply-To: <64917a2f-788b-4695-b799-63bbb8a4873f@www.fastmail.com>
List-Id: Libc-help mailing list

On 22/08/22 14:00, Rain wrote:
> On Mon, Aug 22, 2022, at 09:51, Adhemerval Zanella Netto wrote:
>
>>> ---
>>>
>>> Based on these backtraces and reading the source code, here's what I believe is happening:
>>>
>>> 1. The parent calls __posix_spawnp, which in turn calls __spawni and __spawnix.
>>> 2. The parent calls clone3 and enters uninterruptible sleep.
>>> 3. The child enters __spawni_child and blocks all incoming signals.
>>
>> In fact glibc do not block, but rather set all handlers to either SIG_DFL
>> if is not SIG_IGN, or SIG_DFL if POSIX_SPAWN_SETSIGDEF is set. However
>> it does not matter for SIGSTOP since we can not set it to SIG_IGN.
>>
>>> ---> 4. At this point the child receives a SIGTSTP signal. <---
>>> 5. The child unblocks signals by calling sigprocmask/pthread_sigmask.
>>> 6. At this point the SIGTSTP is delivered to the child.
>>
>> Afaik SIGSTOP is not synchronous and can be delivered any time during process
>> execution.
>
> Thank you for the response! To be clear, I'm referring to SIGTSTP (Ctrl+Z) [1], not
> SIGSTOP. I understand that SIGSTOP cannot be blocked. However, SIGTSTP (which is
> a different signal which can be blocked) is what I'm concerned about.

Right, my mistake. I understand the issue better now, although I am still
puzzled why SIGTSTP is only being triggered on sigprocmask (since its default
action is still to stop the process).

>
>>
>>> 7. However, the clone hasn't exited in the parent and so it remains stuck in the clone3 syscall until the child receives a SIGCONT.
>>>
>>> I'm not sure what a reasonable way to handle this would be on the part of my CLI tool. The tool currently just gets stuck in uninterruptible sleep, resulting in a bad user experience.
>>
>> Reading through both your twitter discussion and the bug report against your
>> tool [1] I think it is outside posix_spawn specification on how to handle
>> SIGSTOP for the helper process itself in the tiny window between process
>> creation and the setpgid.
>>
>>>
>>> Here are solutions I've thought about that don't seem to work (please correct me if I'm wrong!)
>>> 1. Setting the signal mask to include SIGTSTP. I do want to be able to send the child SIGTSTP after the clone(), and in my case the child is a third-party process so I can't depend on it to reset the signal mask.
>>> 2. Spawning a stub process that execves the real child.
>>>    It seems like the same issue exists when the main process calls the stub process, if I'm understanding the code correctly, so this won't help.
>>>
>>> ... though now as I'm writing this email out, maybe one solution is:
>>>
>>> * my tool spawns a stub process with SIGTSTP masked.
>>> * the subprocess unmasks SIGTSTP (so it could receive the SIGTSTP here, but at least it won't block the parent process), then execves the third-party process.
>>>
>>> Is that the solution you would recommend?
>>
>> I am not sure this would work, since SIGSTOP cannot be caught, blocked, or
>> ignored. What I think if might work is to spawn a stub process and make
>> it a new session leader with setsid so it will not have a controlling
>> terminal. The stub process will be responsible to spawn new processes,
>> so any interaction with the controlling terminal (the CTRL+Z) won't affected
>> the posix_spawn helper thread.
>
> That is definitely an interesting solution. However, is it necessary given that
> Ctrl+Z is actually SIGTSTP, which can be blocked?
>
> Thanks again.

I think one possibility would be to set the default signal actions to SIG_IGN,
similar to what POSIX_SPAWN_SETSIGDEF does for SIG_DFL (Solaris has
POSIX_SPAWN_SETSIGIGN_NP as an extension). It won't help much if the signal is
received in the tiny window between the helper process start and the sigaction
call, so I am afraid this will only decrease the likelihood of the deadlock,
not eliminate it.

>
> [1] https://www.gnu.org/software/libc/manual/html_node/Job-Control-Signals.html
>
>>
>> You will probably need to open the controlling terminal in raw mode so you
>> can catch ctrl-z and pass along the expected process groups.
>>
>>>
>>> Thanks.
>>
>>
>> [1] https://github.com/nextest-rs/nextest/pull/470#issue-1338100182
>
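
For reference, the stub idea Rain describes in the quoted text (spawn with
SIGTSTP masked, then have a stub unmask it and exec the real program) could be
sketched roughly as below. This is only an illustration of that proposal, not
something confirmed in this thread as a fix. The function names
spawn_with_sigtstp_blocked, unblock_sigtstp_and_exec and wait_exit_status are
hypothetical; the posix_spawnattr_* calls and POSIX_SPAWN_SETSIGMASK are the
standard POSIX interfaces. It assumes a single-threaded caller (sigprocmask
rather than pthread_sigmask).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper (parent side): spawn `path` with SIGTSTP added to the
   child's signal mask, so a pending SIGTSTP stays pending across the exec.
   Returns 0 on success or an errno value, like posix_spawn itself.  */
int spawn_with_sigtstp_blocked(pid_t *pid, const char *path, char *const argv[])
{
    posix_spawnattr_t attr;
    sigset_t mask;
    int err;

    assert(pid != NULL && path != NULL && argv != NULL);

    /* Start from the caller's current mask and add SIGTSTP to it.  */
    if (sigprocmask(SIG_BLOCK, NULL, &mask) != 0)
        return errno;
    sigaddset(&mask, SIGTSTP);

    err = posix_spawnattr_init(&attr);
    if (err != 0)
        return err;
    err = posix_spawnattr_setsigmask(&attr, &mask);
    if (err == 0)
        err = posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGMASK);
    if (err == 0)
        err = posix_spawn(pid, path, NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    return err;
}

/* Hypothetical helper (stub side): unblock SIGTSTP, then replace the stub
   with the real program.  Returns -1 only if something failed.  */
int unblock_sigtstp_and_exec(const char *path, char *const argv[])
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGTSTP);
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;
    execv(path, argv);
    return -1;  /* execv only returns on error */
}

/* Hypothetical helper: wait for `pid` and return its exit status,
   or -1 if it did not exit normally.  Retries on EINTR.  */
int wait_exit_status(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this sketch the tool would call spawn_with_sigtstp_blocked on the stub
binary, and the stub's main would call unblock_sigtstp_and_exec with the
third-party program. The stub can still be stopped once it unblocks SIGTSTP,
but by then the exec of the stub has completed, so the parent should already
have returned from posix_spawn.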