From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x34.google.com (mail-oa1-x34.google.com [IPv6:2001:4860:4864:20::34]) by sourceware.org (Postfix) with ESMTPS id 2F9733858401 for ; Fri, 18 Aug 2023 14:06:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2F9733858401 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oa1-x34.google.com with SMTP id 586e51a60fabf-1bff2d2c141so507485fac.1 for ; Fri, 18 Aug 2023 07:06:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1692367607; x=1692972407; h=content-transfer-encoding:mime-version:message-id:date:subject:to :from:from:to:cc:subject:date:message-id:reply-to; bh=ZSMOdc347t1/bDbGyY5CNm7KogTfPMJiG43YpTfgYsM=; b=KEZhXwLQIFMZmvf51QFDYwsLhEaX5VAi2ML1qyt5HCMcKkFfu1tjkTkVEhlQZ78Oq4 bRsNjqexoL1FtMJKMF6SOIKRAwBbDJhmHSyl5Cg1cvhZFfAhWnQTwC5HE96jsc320yrV M2u5GchhErdrJGZapFStQrGMuydc0H9WH1Cy1+Tla0zlOO4+tDhmQUCZa4xl25V0pinT 8HT5vQv6qIsPyX6ylRHeQ9Y45zXmoLYk3XU1DMtg74qrL+wnxnIMefPPsyjSbePGipUU HzmKAVoBDe59i86ujCLX2+9K6MpWPxTPpTHE9gQ619PG4P8d1aOGTC59dq9cIH+XZCyV c3aA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692367607; x=1692972407; h=content-transfer-encoding:mime-version:message-id:date:subject:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ZSMOdc347t1/bDbGyY5CNm7KogTfPMJiG43YpTfgYsM=; b=VaIh9Bb7Leo3BUKsnvFq8DJQyHFRZJ2hF9M636ENVm9cjn+hMt0C6daq987yyDuP0T h+42/eolL0t1WkXLmm2Ee0z1sX0wTxOF39lIEKPWOlNu2oN6M4Z9pnTv+hFuhkDSjdvi Vcs/F7pMHA1pt7VEGF9GpkQem2U6CjljwueNJL54P8xSBTGf0JSXYA/pndItFLdeMFZI g08vjPXfl0mUxMsOhkN/5vAffv4SZhGJNu7YVzwbCA4yhJeDpNIxSSNMx2HB3VM6cdOM LA44KRFjb8zZyMcyALc1h4llTlCfhp729LAUz2dB8+LGvv7yfanNXwqyHRVZEVeXHGxM 3+ZQ== X-Gm-Message-State: AOJu0YyEVXgUP5kA8U1OcmttqrVrZFisEW4VLoiSb0qnbXG5SRgE9IWJ K0q9ySf4p5DwFEx0ROns15a1Scx7/FShjjPRk1w= X-Google-Smtp-Source: AGHT+IGutmWEBqqf3YKI1S0LS53AlB/1yfgerNvNLjEyeIWP8n88enMKMyEKWAdbGOhYAj9w066Zdw== X-Received: by 2002:a05:6870:9709:b0:1b0:2686:5b23 with SMTP id n9-20020a056870970900b001b026865b23mr2927602oaq.51.1692367607395; Fri, 18 Aug 2023 07:06:47 -0700 (PDT) Received: from mandiga.. ([2804:1b3:a7c2:c275:7563:d0a5:7f5d:1313]) by smtp.gmail.com with ESMTPSA id r7-20020a056870734700b001c0e840ebcfsm1042841oal.35.2023.08.18.07.06.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Aug 2023 07:06:46 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org, Florian Weimer Subject: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Date: Fri, 18 Aug 2023 11:06:35 -0300 Message-Id: <20230818140642.1623571-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd, and pidfd_send_signal, and exported the P_PIDFD to use along with waitid. The pidfd is a race-free interface, however, the pidfd_open is subject to TOCTOU if the file descriptor is not obtained directly from the clone or clone3 syscall (there is still a small window between the clone return and the pidfd_getfd where the process can be reaped and the process ID reused). A fully race-free interface with posix_spawn interface is being discussed by GNOME [1] [2], and Qt already uses it in its QtProcess implementation [3]. The Qt implementation has some pitfalls: - It calls clone through the syscall symbol, which does not run the pthread_atfork handlers even though it intends to use the clone semantic for fork (by only using CLONE_PIDFD | SIGCHLD). - It also does not reset any internal state, such as internal IO, malloc, loader, etc. locks. - It does not set the TCB tid field nor the robust list, used by the pthread code. - It does not optimize process creation by using CLONE_VM and CLONE_VFORK. Also, the recent Linux kernel (starting with 5.7) provides a way to create a new process in a different cgroups version 2 than the default one (through clone3 CLONE_INTO_CGROUP flag). Providing it through glibc interfaces makes it usable without the risk of potential breakage by issuing clone3 syscall directly (check BZ#26371 discussion). This patch set adds new interfaces that take care of these potential issues. The new posix_spawn / posix_spawnp extensions: #define POSIX_SPAWN_SETCGROUP 0x100 int posix_spawnattr_getcgroup_np (const posix_spawnattr_t restrict *attr, int *cgroup); int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr, int cgroup); Allow spawning a new process on a different cgroupv2. The pidfd_spawn and pidfd_spawnp is similar to posix_spawn and posix_spawnp, but return a process file descriptor instead of a PID. int pidfd_spawn (int *restrict pidfd, const char *restrict file, const posix_spawn_file_actions_t *restrict facts, const posix_spawnattr_t *restrict attrp, char *const argv[restrict], char *const envp[restrict]); int pidfd_spawnp (int *restrict pidfd, const char *restrict path, const posix_spawn_file_actions_t *restrict facts, const posix_spawnattr_t *restrict attrp, char *const argv[restrict_arr], char *const envp[restrict_arr]); The implementation makes sure that kernel must support the complete pidfd interface, meaning that waitid (P_PIDFD) should be supported. It ensures that a non-racy workaround is required (such as reading procfs fdinfo pid to use along with old wait interfaces). If the kernel does not have the required support the interface returns ENOSYS. A new symbol is used instead of a posix_spawn extension to avoid possible issues with language bindings that might track the argument lifetime. Both symbols reuse the posix_spawn posix_spawn_file_actions_t and posix_spawnattr_t, to either avoid rehashing the posix_spawn API or add a new one. It also means that both interfaces support the same attribute and file actions, and a new flag or file action on posix_spawn is also added automatically for pidfd_spawn. It includes POSIX_SPAWN_SETCGROUP. Along with the spawn interface, a fork-like one is also provided: typedef union { struct { __uint64_t fork_np_flags; int fork_np_pidfd; int fork_np_cgroup; int fork_np_exit_signal; #define fork_np_flags __data.fork_np_flags #define fork_np_pidfd __data.fork_np_pidfd #define fork_np_cgroup __data.fork_np_cgroup #define fork_np_exit_signal __data.fork_np_exit_signal } __data; char __size [FORK_NP_ARGS_SIZE_VER0]; } fork_np_args_t; #define FORK_NP_PIDFD (1ULL << 1) #define FORK_NP_CGROUP (1ULL << 2) #define FORK_NP_ASYNCSAFE (1ULL << 3) #define FORK_NP_EXIT_SIGNAL (1ULL << 4) pid_t fork_np (fork_np_args_t *args, size_t size) The SIZE must represent a supported pidfd_fork_args_t type, otherwise, the function returns EINVAL. If ARGS has all members set to 0, no file descriptor is returned and pidfd_fork acts as fork. If PIDFDFORK_PIDFD is set on the flags member, a new file descriptor is returned on pidfd member and the kernel sets O_CLOEXEC as default. The pidfd_fork follows the fork/_Fork convention on returning a positive or negative value to the parent (with a negative indicating an error) and zero to the child. If PIDFDFORK_CGROUP is set, the value on the cgroup member is used as the cgroupv2 to be placed in the new process (by using the CLONE_INTO_CGROUP clone flag). If PIDFDFORK_ASYNCSAFE is set, pidfd_fork acts as _Fork, thus avoiding running pthread_atfork handlers. If PIDFDFORK_EXIT_SIGNAL is set, the signal on exit_signal is sent as process termination (SIGCHLD is the default). The 0 value is also valid, meaning no signal will be sent. The kernel already sets O_CLOEXEC as default and it follows the fork/_Fork convention on returning a positive or negative value to the parent (with negative indicating an error) and zero to the child. Similar to fork, pidfd_fork also runs the pthread_atfork handlers It can be changed by using the PIDFDFORK_ASYNCSAFE flag, which makes pidfd_fork act a _Fork. It also sends SIGCHLD to the parent when the new process terminates. To have a way to interop between process IDs and process file descriptors, the pidfd_getpid is also provided: pid_t pidfd_getpid (int fd) It reads the procfs fdinfo entry from the file descriptor to get the process ID. [1] https://gitlab.gnome.org/GNOME/glib/-/issues/1866 [2] https://sourceware.org/bugzilla/show_bug.cgi?id=30349 [3] https://codebrowser.dev/qt6/qtbase/src/3rdparty/forkfd/forkfd_linux.c.html --- Changes from v7: - Redefine __ASSUME_CLONE3 to 0 if the architecture does not support the syscall. - Fixed some failing errors to be reported by spawned processes. - Fixed pre-commit CI for AArch64 failures. - Rename pidfd_fork to fork_np and make the API extensible - Document more possible pidfd_getpid errors. Changes from v6: - Rebased against master, adjusted symbol version and NEWS entry. - Added arm/mips clone3 implementation. Changes from v5: - Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork. Changes from v4: - Changed pidfd_fork signature to return a pid_t instead of the PID file descriptor. - Changed pidfd_getpid to return EBADF for negative input, instead of EINVAL. - Added PIDFDFORK_NOSIGCHLD option. - Fixed nested __BEGIN_DECLS on spawn.h Changes from v3: - Remove strtoul usage. - Fixed patchwork tst-pidfd_getpid.c regression. - Fixed manual and NEWS typos. Changes from v2: - Added pidfd_fork and pidfd_getpid manual entries - Change pidfd_fork to act as fork as default, instead as _Fork. - Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE. - Added pidfd_getpid test for EREMOTE. Changes from v1: - Extended pidfd_getpid error codes to return EBADF if fdinfo does not have Pid entry or if the value is invalid, EREMOTE is pid is in a separate namespace, and ESRCH if is already terminated. - Extended tst-pidfd_getpid. - Rename PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid clashes with possible kernel extensions. Adhemerval Zanella (7): arm: Add the clone3 wrapper mips: Add the clone3 wrapper linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) posix: Add fork_np (BZ 26371) linux: Add pidfd_getpid NEWS | 24 ++ bits/spawn_ext.h | 21 ++ include/clone_internal.h | 21 ++ manual/process.texi | 122 ++++++++- posix/Makefile | 5 +- posix/fork-internal.c | 127 ++++++++++ posix/fork-internal.h | 36 +++ posix/fork.c | 107 +------- posix/spawn.h | 6 +- posix/spawn_int.h | 3 +- posix/spawnattr_setflags.c | 3 +- posix/tst-posix_spawn-setsid.c | 169 +++++++++---- posix/tst-spawn-chdir.c | 15 +- posix/tst-spawn.c | 24 +- posix/tst-spawn.h | 36 +++ posix/tst-spawn2.c | 17 +- posix/tst-spawn3.c | 100 ++++---- posix/tst-spawn4.c | 7 +- posix/tst-spawn5.c | 14 +- posix/tst-spawn6.c | 13 +- posix/tst-spawn7.c | 13 +- sysdeps/nptl/_Fork.c | 2 +- sysdeps/unix/sysv/linux/Makefile | 29 +++ sysdeps/unix/sysv/linux/Versions | 8 + sysdeps/unix/sysv/linux/aarch64/libc.abilist | 6 + .../unix/sysv/linux/alpha/kernel-features.h | 4 + sysdeps/unix/sysv/linux/alpha/libc.abilist | 6 + sysdeps/unix/sysv/linux/arc/libc.abilist | 6 + sysdeps/unix/sysv/linux/arch-fork.h | 16 +- sysdeps/unix/sysv/linux/arm/be/libc.abilist | 6 + sysdeps/unix/sysv/linux/arm/clone3.S | 80 ++++++ sysdeps/unix/sysv/linux/arm/le/libc.abilist | 6 + sysdeps/unix/sysv/linux/arm/sysdep.h | 1 + sysdeps/unix/sysv/linux/bits/spawn_ext.h | 71 ++++++ sysdeps/unix/sysv/linux/bits/unistd_ext.h | 51 ++++ sysdeps/unix/sysv/linux/clone-internal.c | 58 ++++- sysdeps/unix/sysv/linux/clone-pidfd-support.c | 60 +++++ sysdeps/unix/sysv/linux/csky/libc.abilist | 6 + sysdeps/unix/sysv/linux/fork_np.c | 97 +++++++ sysdeps/unix/sysv/linux/hppa/libc.abilist | 6 + sysdeps/unix/sysv/linux/i386/libc.abilist | 6 + .../unix/sysv/linux/ia64/kernel-features.h | 4 + sysdeps/unix/sysv/linux/ia64/libc.abilist | 6 + .../sysv/linux/loongarch/lp64/libc.abilist | 6 + .../sysv/linux/m68k/coldfire/libc.abilist | 6 + .../unix/sysv/linux/m68k/m680x0/libc.abilist | 6 + .../sysv/linux/microblaze/be/libc.abilist | 6 + .../sysv/linux/microblaze/le/libc.abilist | 6 + sysdeps/unix/sysv/linux/mips/clone3.S | 139 +++++++++++ .../sysv/linux/mips/mips32/fpu/libc.abilist | 6 + .../sysv/linux/mips/mips32/nofpu/libc.abilist | 6 + .../sysv/linux/mips/mips64/n32/libc.abilist | 6 + .../sysv/linux/mips/mips64/n64/libc.abilist | 6 + sysdeps/unix/sysv/linux/mips/sysdep.h | 2 + .../unix/sysv/linux/nios2/kernel-features.h | 24 ++ sysdeps/unix/sysv/linux/nios2/libc.abilist | 6 + sysdeps/unix/sysv/linux/or1k/libc.abilist | 6 + sysdeps/unix/sysv/linux/pidfd_getpid.c | 126 ++++++++++ sysdeps/unix/sysv/linux/pidfd_spawn.c | 30 +++ sysdeps/unix/sysv/linux/pidfd_spawnp.c | 30 +++ .../linux/powerpc/powerpc32/fpu/libc.abilist | 6 + .../powerpc/powerpc32/nofpu/libc.abilist | 6 + .../linux/powerpc/powerpc64/be/libc.abilist | 6 + .../linux/powerpc/powerpc64/le/libc.abilist | 6 + sysdeps/unix/sysv/linux/procutils.c | 97 +++++++ sysdeps/unix/sysv/linux/procutils.h | 43 ++++ .../unix/sysv/linux/riscv/rv32/libc.abilist | 6 + .../unix/sysv/linux/riscv/rv64/libc.abilist | 6 + .../unix/sysv/linux/s390/s390-32/libc.abilist | 6 + .../unix/sysv/linux/s390/s390-64/libc.abilist | 6 + sysdeps/unix/sysv/linux/sh/be/libc.abilist | 6 + sysdeps/unix/sysv/linux/sh/kernel-features.h | 4 + sysdeps/unix/sysv/linux/sh/le/libc.abilist | 6 + .../unix/sysv/linux/sparc/kernel-features.h | 4 + .../sysv/linux/sparc/sparc32/libc.abilist | 6 + .../sysv/linux/sparc/sparc64/libc.abilist | 6 + .../unix/sysv/linux/spawnattr_getcgroup_np.c | 28 +++ .../unix/sysv/linux/spawnattr_setcgroup_np.c | 27 ++ sysdeps/unix/sysv/linux/spawni.c | 42 +++- sysdeps/unix/sysv/linux/sys/pidfd.h | 4 + sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c | 170 +++++++++++++ sysdeps/unix/sysv/linux/tst-fork_np.c | 236 ++++++++++++++++++ sysdeps/unix/sysv/linux/tst-pidfd.c | 48 ++++ sysdeps/unix/sysv/linux/tst-pidfd_getpid.c | 126 ++++++++++ .../sysv/linux/tst-posix_spawn-setsid-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn-cgroup.c | 223 +++++++++++++++++ .../unix/sysv/linux/tst-spawn-chdir-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn-pidfd.h | 63 +++++ sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c | 20 ++ sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c | 20 ++ .../unix/sysv/linux/x86_64/64/libc.abilist | 6 + .../unix/sysv/linux/x86_64/x32/libc.abilist | 6 + 97 files changed, 2947 insertions(+), 267 deletions(-) create mode 100644 bits/spawn_ext.h create mode 100644 posix/fork-internal.c create mode 100644 posix/fork-internal.h create mode 100644 posix/tst-spawn.h create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c create mode 100644 sysdeps/unix/sysv/linux/fork_np.c create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c create mode 100644 sysdeps/unix/sysv/linux/procutils.c create mode 100644 sysdeps/unix/sysv/linux/procutils.h create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np.c create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c -- 2.34.1