From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20230706134508.422526-1-adhemerval.zanella@linaro.org>
In-Reply-To: <20230706134508.422526-1-adhemerval.zanella@linaro.org>
From: Luca Boccassi
Date: Tue, 8 Aug 2023 20:35:26 +0100
Subject: Re: [PATCH v6 0/5] Add pidfd and cgroupv2 support for process creation (resend)
To: Adhemerval Zanella
Cc: libc-alpha@sourceware.org, Philip Withnall
Content-Type: text/plain; charset="UTF-8"

On Thu, 6 Jul 2023 at 14:45, Adhemerval Zanella wrote:
>
> glibc 2.36 added wrappers for the Linux syscalls pidfd_open,
> pidfd_getfd, and pidfd_send_signal, and exported P_PIDFD for use
> with waitid. The pidfd is a race-free interface; however,
> pidfd_open is subject to TOCTOU if the file descriptor
> is not obtained directly from the clone or clone3 syscall (there is
> still a small window between the clone return and the pidfd_open
> call where the process can be reaped and the process ID reused).
>
> A fully race-free interface alongside posix_spawn is being
> discussed by GNOME [1] [2], and Qt already uses one in its QProcess
> implementation [3]. The Qt implementation has some pitfalls:
>
> - It calls clone through the syscall symbol, which does not run the
>   pthread_atfork handlers even though it really intends to use the
>   clone semantic for fork (by only using CLONE_PIDFD | SIGCHLD).
>
> - It also does not reset any internal state, such as the internal IO,
>   malloc, loader, etc. locks.
>
> - It does not set the TCB tid field nor the robust list, used by
>   pthread code.
>
> - It does not optimize process creation by using CLONE_VM and
>   CLONE_VFORK.
>
> Also, recent Linux kernels (starting with 5.7) provide a way to
> create a new process in a cgroup version 2 different from the
> default one (through the clone3 CLONE_INTO_CGROUP flag). Providing it
> through glibc interfaces makes it usable without the risk of potential
> breakage from issuing the clone3 syscall directly (check the BZ#26371
> discussion).
>
> This patchset adds new interfaces that take care of these potential
> issues. The new posix_spawn / posix_spawnp extensions:
>
>   #define POSIX_SPAWN_SETCGROUP 0x100
>
>   int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *restrict attr,
>                                     int *cgroup);
>   int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
>                                     int cgroup);
>
> allow spawning a new process in a different cgroupv2.
>
> pidfd_spawn and pidfd_spawnp are similar to posix_spawn and
> posix_spawnp, but return a process file descriptor instead of a PID.
>
>   int pidfd_spawn (int *restrict pidfd,
>                    const char *restrict path,
>                    const posix_spawn_file_actions_t *restrict facts,
>                    const posix_spawnattr_t *restrict attrp,
>                    char *const argv[restrict_arr],
>                    char *const envp[restrict_arr]);
>
>   int pidfd_spawnp (int *restrict pidfd,
>                     const char *restrict file,
>                     const posix_spawn_file_actions_t *restrict facts,
>                     const posix_spawnattr_t *restrict attrp,
>                     char *const argv[restrict_arr],
>                     char *const envp[restrict_arr]);
>
> The implementation requires the kernel to support the complete
> pidfd interface, meaning that waitid (P_PIDFD) must be supported. This
> ensures that no racy workaround is required (such as reading the procfs
> fdinfo pid to use along with old wait interfaces). If the kernel does
> not have the required support, the interface returns ENOSYS.
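
For readers of the archive, here is a rough sketch of the pattern the quoted text contrasts pidfd_spawn with: posix_spawn followed by pidfd_open, then waitid (P_PIDFD). It is not the proposed pidfd_spawn interface. run_and_wait_pidfd is a hypothetical helper, the raw syscall is used in case the libc wrapper is missing, and the fallback constants 434 and 3 are assumptions for older headers (434 is not the right number on every architecture). The pidfd_open step is only safe here because nothing else reaps the child in between.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed fallbacks for older headers.  434 is the pidfd_open number on
   most architectures, but not all of them.  */
#ifndef SYS_pidfd_open
# define SYS_pidfd_open 434
#endif
#ifndef P_PIDFD
# define P_PIDFD 3
#endif

extern char **environ;

/* Hypothetical helper: run "/bin/sh -c CMD" and wait for it through a
   pidfd.  Returns the exit status, 128 + signal number if the child was
   killed by a signal, or a negative errno value on failure.  */
static int
run_and_wait_pidfd (const char *cmd)
{
  char *argv[] = { (char *) "sh", (char *) "-c", (char *) cmd, NULL };
  pid_t pid;

  /* posix_spawn returns an error number directly instead of setting errno.  */
  int err = posix_spawn (&pid, "/bin/sh", NULL, NULL, argv, environ);
  if (err != 0)
    return -err;

  /* The racy step in general: PID could be recycled before this call if
     some other code had already reaped the child.  */
  int pidfd = (int) syscall (SYS_pidfd_open, pid, 0);
  if (pidfd < 0)
    {
      int saved = errno;
      waitpid (pid, NULL, 0);   /* Reap the old way so no zombie is left.  */
      return -saved;
    }

  siginfo_t si = { 0 };
  int ret = waitid ((idtype_t) P_PIDFD, (id_t) pidfd, &si, WEXITED);
  int saved = errno;
  close (pidfd);
  if (ret < 0)
    {
      waitpid (pid, NULL, 0);   /* Kernel without P_PIDFD support.  */
      return -saved;
    }

  return si.si_code == CLD_EXITED ? si.si_status : 128 + si.si_status;
}
```

On a kernel with full pidfd support, run_and_wait_pidfd ("exit 7") should return 7; elsewhere it returns a negative errno value such as -ENOSYS.
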
>
> A new symbol is used instead of a posix_spawn extension to avoid
> possible issues with language bindings that might track the argument
> lifetime.
>
> Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and
> posix_spawnattr_t, to avoid either rehashing the posix_spawn API or
> adding a new one. It also means that both interfaces support the same
> attributes and file actions, and a new flag or file action added to
> posix_spawn is automatically available to pidfd_spawn as well. This
> includes POSIX_SPAWN_SETCGROUP.
>
> Along with the spawn interface, a fork-like one is also provided:
>
>   pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)
>
> If PIDFD is set to NULL, no file descriptor is returned and pidfd_fork
> acts as fork. Otherwise, a new file descriptor is returned, and the
> kernel already sets O_CLOEXEC on it by default. pidfd_fork follows the
> fork/_Fork convention of returning a positive or negative value to the
> parent (with negative indicating an error) and zero to the child.
>
> If cgroup is 0 or a positive value, it is interpreted as the cgroup
> in which to place the new process (check the CLONE_INTO_CGROUP clone
> flag).
>
> Similar to fork, pidfd_fork also runs the pthread_atfork handlers.
> This can be changed with the PIDFDFORK_ASYNCSAFE flag, which makes
> pidfd_fork act as _Fork. It also sends SIGCHLD to the parent when the
> process terminates.
>
> To interoperate between process IDs and process file descriptors,
> pidfd_getpid is also provided:
>
>   pid_t pidfd_getpid (int fd)
>
> It reads the procfs fdinfo entry of the file descriptor to get
> the process ID.

Hi,

any update on this series?

Kind regards,
Luca Boccassi
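
For reference, the procfs lookup that the quoted text describes for pidfd_getpid can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: parse_fdinfo_pid and pidfd_to_pid are hypothetical names, and the sketch assumes that the fdinfo file of a pidfd carries a line beginning with "Pid:" and that procfs is mounted at /proc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: scan fdinfo text line by line for one that starts
   with "Pid:" and return the number after it.  Returns -1 if no such line
   exists or if it carries no number.  */
static pid_t
parse_fdinfo_pid (const char *text)
{
  assert (text != NULL);

  for (const char *line = text; *line != '\0'; )
    {
      if (strncmp (line, "Pid:", 4) == 0)
        {
          char *end;
          long val = strtol (line + 4, &end, 10);   /* Skips the tab.  */
          return end == line + 4 ? -1 : (pid_t) val;
        }

      /* Advance to the start of the next line, if there is one.  */
      const char *nl = strchr (line, '\n');
      if (nl == NULL)
        break;
      line = nl + 1;
    }
  return -1;
}

/* Hypothetical helper: read /proc/self/fdinfo/<pidfd> and parse it.
   Returns -1 if the file cannot be opened.  */
static pid_t
pidfd_to_pid (int pidfd)
{
  char path[64];
  snprintf (path, sizeof path, "/proc/self/fdinfo/%d", pidfd);

  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    return -1;

  char buf[1024];
  size_t n = fread (buf, 1, sizeof buf - 1, fp);
  fclose (fp);
  buf[n] = '\0';

  return parse_fdinfo_pid (buf);
}
```

Keeping the parser separate from the file read lets it be checked against a fixed string without needing a live pidfd. Values the kernel reports on the "Pid:" line are passed through unchanged.
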