From: Adhemerval Zanella
To: libc-alpha@sourceware.org, Luca Boccassi, Philip Withnall
Subject: [PATCH v6 0/5] Add pidfd and cgroupv2 support for process creation
Date: Wed, 5 Jul 2023 17:43:23 -0300
Message-Id: <20230705204328.4067751-1-adhemerval.zanella@linaro.org>

glibc 2.36 added wrappers for the Linux syscalls pidfd_open, pidfd_getfd,
and pidfd_send_signal, and exported P_PIDFD for use with waitid.  The
pidfd is a race-free interface; however, pidfd_open is subject to TOCTOU
if the file descriptor is not obtained directly from the clone or clone3
syscall (there is still a small window between the clone return and the
pidfd_open call in which the process can be reaped and the process ID
reused).

A fully race-free interface built on posix_spawn is being discussed by
GNOME [1] [2], and Qt already uses one in its QProcess implementation [3].
The Qt implementation has some pitfalls:

  - It calls clone through the syscall symbol, which does not run the
    pthread_atfork handlers even though it really intends to use the
    clone semantics for fork (by only using CLONE_PIDFD | SIGCHLD).

  - It also does not reset any internal state, such as the locks used
    internally by IO, malloc, the loader, etc.

  - It does not set the TCB tid field nor the robust list, both used by
    the pthread code.

  - It does not optimize process creation by using CLONE_VM and
    CLONE_VFORK.

Also, recent Linux kernels (starting with 5.7) provide a way to create a
new process in a cgroup version 2 other than the default one (through
the clone3 CLONE_INTO_CGROUP flag).  Providing it through glibc
interfaces makes it usable without the risk of potential breakage from
issuing the clone3 syscall directly (check the BZ#26371 discussion).

This patchset adds new interfaces that take care of these potential
issues.  The new posix_spawn / posix_spawnp extensions:

  #define POSIX_SPAWN_SETCGROUP 0x100

  int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *restrict attr,
                                    int *cgroup);
  int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
                                    int cgroup);

allow spawning a new process in a different cgroupv2.

The pidfd_spawn and pidfd_spawnp functions are similar to posix_spawn
and posix_spawnp, but return a process file descriptor instead of a PID.

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict file,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict path,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict_arr],
                    char *const envp[restrict_arr]);

The implementation requires the kernel to support the complete pidfd
interface, meaning that waitid (P_PIDFD) must be supported.
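
As a usage illustration only (not part of the patches), the sketch below
spawns "true" with pidfd_spawnp and reaps it with waitid (P_PIDFD).  The
helper name run_true_via_pidfd is hypothetical, and the
__GLIBC_PREREQ (2, 39) guard is an assumption about which glibc release
ships the interface (this cover letter does not say); with anything
older the helper simply reports ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: pidfd_spawnp is declared by <spawn.h> from glibc 2.39 on.  */
#if defined (__GLIBC__) && defined (__GLIBC_PREREQ)
# if __GLIBC_PREREQ (2, 39)
#  define HAVE_PIDFD_SPAWN 1
# endif
#endif

extern char **environ;

/* Hypothetical helper: run "true" and store its exit status in *EXIT_CODE.
   Returns 0 on success or an errno-style error number, following the
   posix_spawn convention.  */
int
run_true_via_pidfd (int *exit_code)
{
#ifdef HAVE_PIDFD_SPAWN
  char *const argv[] = { (char *) "true", NULL };
  int pidfd;

  /* Like posix_spawnp, but the child is identified by a descriptor.  */
  int err = pidfd_spawnp (&pidfd, "true", NULL, NULL, argv, environ);
  if (err != 0)
    return err;   /* For example ENOSYS without full pidfd support.  */

  /* Wait on the descriptor, so PID reuse cannot redirect the wait.  */
  siginfo_t info = { 0 };
  if (waitid (P_PIDFD, pidfd, &info, WEXITED) != 0)
    err = errno;
  else
    *exit_code = info.si_code == CLD_EXITED ? info.si_status : -1;

  close (pidfd);
  return err;
#else
  (void) exit_code;
  return ENOSYS;
#endif
}
```

A caller could treat an ENOSYS result as the cue to fall back to plain
posix_spawnp and the PID-based wait interfaces, accepting the race that
the pidfd variant is meant to avoid.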
Requiring this support ensures that no racy workaround is needed (such
as reading the procfs fdinfo pid to use along with the old wait
interfaces).  If the kernel does not have the required support, the
interface returns ENOSYS.

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.  Both symbols reuse the posix_spawn types
posix_spawn_file_actions_t and posix_spawnattr_t, to avoid either
rehashing the posix_spawn API or adding a new one.  It also means that
both interfaces support the same attributes and file actions, and that a
new flag or file action added to posix_spawn is automatically available
to pidfd_spawn.  This includes POSIX_SPAWN_SETCGROUP.

Along with the spawn interface, a fork-like one is also provided:

  pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)

If PIDFD is NULL, no file descriptor is returned and pidfd_fork acts as
fork.  Otherwise, a new file descriptor is returned, and the kernel
already sets O_CLOEXEC on it by default.  pidfd_fork follows the
fork/_Fork convention of returning a positive or negative value to the
parent (with negative indicating an error) and zero to the child.  If
cgroup is 0 or a positive value, it is interpreted as the cgroup in
which to place the new process (check the CLONE_INTO_CGROUP clone flag).

Similar to fork, pidfd_fork also runs the pthread_atfork handlers.  This
can be changed with the PIDFDFORK_ASYNCSAFE flag, which makes pidfd_fork
act as _Fork.  It also sends SIGCHLD to the parent when the process
terminates.

To interoperate between process IDs and process file descriptors,
pidfd_getpid is also provided:

  pid_t pidfd_getpid (int fd)

It reads the procfs fdinfo entry of the file descriptor to get the
process ID.
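
To make the fdinfo lookup concrete, here is a minimal sketch of how a
"Pid:" field could be extracted and mapped to the error codes listed in
the changelog below.  It is not the code from the patch: the helper
names parse_fdinfo_pid and pidfd_to_pid are hypothetical, and the
reading of "Pid: -1" as an already terminated process and "Pid: 0" as a
process in another PID namespace is an assumption about the kernel's
fdinfo format.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: find the "Pid:" line in fdinfo TEXT.
   Returns the PID, or -1 with errno set to EBADF, ESRCH or EREMOTE.  */
pid_t
parse_fdinfo_pid (const char *text)
{
  for (const char *p = text; p != NULL && *p != '\0'; )
    {
      if (strncmp (p, "Pid:", 4) == 0)
        {
          p += 4;
          while (*p == ' ' || *p == '\t')
            p++;
          if (*p == '-')                /* Assumed: "-1" means terminated.  */
            {
              errno = ESRCH;
              return -1;
            }
          if (*p < '0' || *p > '9')     /* No numeric value at all.  */
            {
              errno = EBADF;
              return -1;
            }
          long long value = 0;
          for (; *p >= '0' && *p <= '9'; p++)
            {
              value = value * 10 + (*p - '0');
              if (value > INT_MAX)      /* Out of range for a PID.  */
                {
                  errno = EBADF;
                  return -1;
                }
            }
          if (*p != '\n' && *p != '\0') /* Trailing garbage.  */
            {
              errno = EBADF;
              return -1;
            }
          if (value == 0)               /* Assumed: other PID namespace.  */
            {
              errno = EREMOTE;
              return -1;
            }
          return (pid_t) value;
        }
      /* Skip to the start of the next line, if there is one.  */
      p = strchr (p, '\n');
      if (p != NULL)
        p++;
    }
  errno = EBADF;                        /* No "Pid:" entry found.  */
  return -1;
}

/* Hypothetical helper: read /proc/self/fdinfo/FD and parse it.  */
pid_t
pidfd_to_pid (int fd)
{
  char path[64];
  char buf[1024];

  if (fd < 0)
    {
      errno = EBADF;
      return -1;
    }
  snprintf (path, sizeof path, "/proc/self/fdinfo/%d", fd);
  FILE *f = fopen (path, "r");
  if (f == NULL)
    {
      errno = EBADF;
      return -1;
    }
  size_t n = fread (buf, 1, sizeof buf - 1, f);
  fclose (f);
  buf[n] = '\0';
  return parse_fdinfo_pid (buf);
}
```

Only lines that start with "Pid:" are matched, so an "NSpid:" line in
the same file is not mistaken for the field.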
---

Changes from v5:
- Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.

Changes from v4:
- Changed pidfd_fork signature to return a pid_t instead of a PID file
  descriptor.
- Changed pidfd_getpid to return EBADF for negative input, instead of
  EINVAL.
- Added PIDFDFORK_NOSIGCHLD option.
- Fixed nested __BEGIN_DECLS in spawn.h.

Changes from v3:
- Removed strtoul usage.
- Fixed patchwork tst-pidfd_getpid.c regression.
- Fixed manual and NEWS typos.

Changes from v2:
- Added pidfd_fork and pidfd_getpid manual entries.
- Changed pidfd_fork to act as fork by default, instead of as _Fork.
- Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
- Added pidfd_getpid test for EREMOTE.

Changes from v1:
- Extended pidfd_getpid error codes to return EBADF if fdinfo does not
  have a Pid entry or if the value is invalid, EREMOTE if the pid is in
  a separate namespace, and ESRCH if the process has already terminated.
- Extended tst-pidfd_getpid.
- Renamed PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid a clash
  with possible kernel extensions.

Adhemerval Zanella (5):
  linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  posix: Add pidfd_fork (BZ 26371)
  posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork
  linux: Add pidfd_getpid

 NEWS | 22 ++
 bits/spawn_ext.h | 21 ++
 include/clone_internal.h | 21 ++
 manual/process.texi | 92 ++++++-
 posix/Makefile | 5 +-
 posix/fork-internal.c | 127 ++++++++++
 posix/fork-internal.h | 36 +++
 posix/fork.c | 107 +--------
 posix/spawn.h | 6 +-
 posix/spawn_int.h | 3 +-
 posix/spawnattr_setflags.c | 3 +-
 posix/tst-posix_spawn-setsid.c | 168 +++++++++----
 posix/tst-spawn-chdir.c | 15 +-
 posix/tst-spawn.c | 24 +-
 posix/tst-spawn.h | 36 +++
 posix/tst-spawn2.c | 17 +-
 posix/tst-spawn3.c | 100 ++++----
 posix/tst-spawn4.c | 7 +-
 posix/tst-spawn5.c | 14 +-
 posix/tst-spawn6.c | 15 +-
 posix/tst-spawn7.c | 13 +-
 sysdeps/nptl/_Fork.c | 2 +-
 sysdeps/unix/sysv/linux/Makefile | 29 +++
 sysdeps/unix/sysv/linux/Versions | 8 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/arc/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/arch-fork.h | 16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h | 60 +++++
 sysdeps/unix/sysv/linux/clone-internal.c | 60 ++++-
 sysdeps/unix/sysv/linux/clone-pidfd-support.c | 58 +++++
 sysdeps/unix/sysv/linux/csky/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/i386/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist | 6 +
 .../sysv/linux/loongarch/lp64/libc.abilist | 6 +
 .../sysv/linux/m68k/coldfire/libc.abilist | 6 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist | 6 +
 .../sysv/linux/microblaze/be/libc.abilist | 6 +
 .../sysv/linux/microblaze/le/libc.abilist | 6 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist | 6 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist | 6 +
 .../sysv/linux/mips/mips64/n32/libc.abilist | 6 +
 .../sysv/linux/mips/mips64/n64/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/pidfd_fork.c | 82 +++++++
 sysdeps/unix/sysv/linux/pidfd_getpid.c | 122 ++++++++++
 sysdeps/unix/sysv/linux/pidfd_spawn.c | 30 +++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c | 30 +++
 .../linux/powerpc/powerpc32/fpu/libc.abilist | 6 +
 .../powerpc/powerpc32/nofpu/libc.abilist | 6 +
 .../linux/powerpc/powerpc64/be/libc.abilist | 6 +
 .../linux/powerpc/powerpc64/le/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/procutils.c | 104 ++++++++
 sysdeps/unix/sysv/linux/procutils.h | 35 +++
 .../unix/sysv/linux/riscv/rv32/libc.abilist | 6 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist | 6 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist | 6 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist | 6 +
 .../sysv/linux/sparc/sparc32/libc.abilist | 6 +
 .../sysv/linux/sparc/sparc64/libc.abilist | 6 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c | 28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c | 27 +++
 sysdeps/unix/sysv/linux/spawni.c | 38 ++-
 sysdeps/unix/sysv/linux/sys/pidfd.h | 25 ++
 sysdeps/unix/sysv/linux/tst-pidfd.c | 47 ++++
 .../unix/sysv/linux/tst-pidfd_fork-cgroup.c | 162 +++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_fork.c | 227 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c | 187 +++++++++++++++
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c | 216 +++++++++++++++++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h | 63 +++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c | 20 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist | 6 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist | 6 +
 87 files changed, 2624 insertions(+), 268 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

--
2.34.1