public inbox for libc-alpha@sourceware.org
* [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
@ 2023-08-18 14:06 Adhemerval Zanella
  2023-08-18 14:06 ` [PATCH v8 1/7] arm: Add the clone3 wrapper Adhemerval Zanella
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC
  To: libc-alpha, Florian Weimer

glibc 2.36 added wrappers for the Linux syscalls pidfd_open, pidfd_getfd,
and pidfd_send_signal, and exported P_PIDFD for use with waitid.  The
pidfd is a race-free interface; however, pidfd_open is subject to TOCTOU
if the file descriptor is not obtained directly from the clone or clone3
syscall (there is still a small window between the clone return and the
pidfd_open call in which the process can be reaped and its process ID
reused).
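
To make the TOCTOU window concrete, here is a minimal sketch of the racy
pattern.  It assumes a Linux kernel with pidfd_open (5.3 or later) and
calls it through syscall with SYS_pidfd_open rather than the glibc
wrapper; the helper name fork_then_pidfd_open is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper showing the racy pattern: create the child with
   fork, then look it up by PID with pidfd_open.  Returns the pidfd (or
   -1 with errno set) and stores the child PID in *CHILD.  */
static int
fork_then_pidfd_open (pid_t *child)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      pause ();                 /* Child: wait until signalled.  */
      _exit (0);
    }
  *child = pid;

  /* RACE WINDOW: if the child exits here and is reaped elsewhere (for
     example by a SIGCHLD handler calling waitpid (-1, ...)), its PID can
     be reused by an unrelated process, and the pidfd_open below would
     then return a descriptor for that process instead.  */
#ifdef SYS_pidfd_open
  return syscall (SYS_pidfd_open, pid, 0);
#else
  errno = ENOSYS;
  return -1;
#endif
}
```

Obtaining the descriptor atomically from clone/clone3 (CLONE_PIDFD)
removes this window, which is the approach this series builds on.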

A fully race-free posix_spawn-like interface is being discussed by
GNOME [1] [2], and Qt already uses the race-free approach in its
QProcess implementation [3].  The Qt implementation has some pitfalls:

  - It calls clone through the syscall symbol, which does not run the
    pthread_atfork handlers even though it intends to use clone with
    fork semantics (by only using CLONE_PIDFD | SIGCHLD).

  - It also does not reset any internal state, such as the internal IO,
    malloc, and loader locks.

  - It sets neither the TCB tid field nor the robust list, both used by
    the pthread code.

  - It does not optimize process creation by using CLONE_VM and
    CLONE_VFORK.
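
For comparison, a minimal sketch of the raw approach described in the
list, assuming Linux 5.3+ UAPI headers (<linux/sched.h> providing
struct clone_args and CLONE_PIDFD).  The pidfd comes back atomically,
but glibc never learns about the new process: no atfork handlers run
and no locks, TCB tid, or robust list are fixed up, so the child only
calls _exit.  The helper name raw_clone3_pidfd is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_PIDFD.  */

/* Hypothetical helper: fork-like clone3 that returns the child PID and
   stores a race-free pidfd in *PIDFD.  The child exits immediately,
   because glibc's internal state was not prepared for it.  */
static pid_t
raw_clone3_pidfd (int *pidfd)
{
#ifdef SYS_clone3
  struct clone_args args;
  memset (&args, 0, sizeof args);
  args.flags = CLONE_PIDFD;                    /* Kernel writes a pidfd...  */
  args.pidfd = (uint64_t) (uintptr_t) pidfd;   /* ...to this address.  */
  args.exit_signal = SIGCHLD;                  /* Notify the parent like fork.  */
  long ret = syscall (SYS_clone3, &args, sizeof args);
  if (ret == 0)
    _exit (0);   /* Child: no atfork handlers ran; do nothing else.  */
  return ret;    /* Parent: child PID, or -1 with errno set.  */
#else
  (void) pidfd;
  errno = ENOSYS;
  return -1;
#endif
}
```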

Also, recent Linux kernels (starting with 5.7) provide a way to create
a new process in a cgroup version 2 other than the parent's (through the
clone3 CLONE_INTO_CGROUP flag).  Providing it through glibc interfaces
makes it usable without the risk of breakage from issuing the clone3
syscall directly (see the BZ#26371 discussion).

This patch set adds new interfaces that take care of these potential
issues.  The new posix_spawn / posix_spawnp extensions:

  #define POSIX_SPAWN_SETCGROUP 0x100

  int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *restrict attr,
                                    int *cgroup);
  int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
                                    int cgroup);

These allow spawning a new process in a different cgroupv2.
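
A minimal usage sketch, assuming headers that define
POSIX_SPAWN_SETCGROUP (with older headers the #else branch just reports
ENOTSUP) and a cgroupv2 directory the caller is allowed to move
processes into.  The helper name spawn_in_cgroup and its error
convention (returning an errno value, like posix_spawn) are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: spawn PATH in the cgroupv2 directory CGROUP_DIR.
   Returns 0 on success or an errno value.  */
static int
spawn_in_cgroup (pid_t *pid, const char *cgroup_dir, const char *path,
                 char *const argv[], char *const envp[])
{
  /* CLONE_INTO_CGROUP takes a descriptor for the cgroupv2 directory.  */
  int cgfd = open (cgroup_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (cgfd < 0)
    return errno;

  int ret;
#ifdef POSIX_SPAWN_SETCGROUP
  posix_spawnattr_t attr;
  ret = posix_spawnattr_init (&attr);
  if (ret == 0)
    {
      ret = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
      if (ret == 0)
        ret = posix_spawnattr_setcgroup_np (&attr, cgfd);
      if (ret == 0)
        ret = posix_spawn (pid, path, NULL, &attr, argv, envp);
      posix_spawnattr_destroy (&attr);
    }
#else
  (void) pid; (void) path; (void) argv; (void) envp;
  ret = ENOTSUP;   /* Headers predate the extension.  */
#endif
  close (cgfd);    /* Only needed for the process creation itself.  */
  return ret;
}
```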

pidfd_spawn and pidfd_spawnp are similar to posix_spawn and
posix_spawnp, but return a process file descriptor instead of a PID.

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict path,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict file,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict],
                    char *const envp[restrict]);

The implementation requires the kernel to support the complete pidfd
interface, meaning that waitid (P_PIDFD) must be supported.  This
ensures that no racy workaround is needed (such as reading the pid from
the procfs fdinfo to use with the older wait interfaces).  If the kernel
lacks the required support, the interface returns ENOSYS.
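
A minimal sketch of spawning and reaping a child purely through its
pidfd, assuming glibc 2.39 or later, the first release that ships
pidfd_spawn (older libraries take the ENOSYS branch).  The helper name
run_via_pidfd is hypothetical; it returns the child's exit status, or a
negated errno value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run PATH and wait for it via its pidfd.  */
static int
run_via_pidfd (const char *path, char *const argv[], char *const envp[])
{
#if defined __GLIBC__ && (__GLIBC__ > 2 || __GLIBC_MINOR__ >= 39)
  int pidfd;
  int ret = pidfd_spawn (&pidfd, path, NULL, NULL, argv, envp);
  if (ret != 0)
    return -ret;            /* For example ENOSYS without waitid (P_PIDFD).  */

  /* Reap through the descriptor: no PID is involved, so a recycled PID
     can never be mistaken for this child.  */
  siginfo_t info;
  if (waitid (P_PIDFD, pidfd, &info, WEXITED) != 0)
    ret = -errno;
  else
    ret = info.si_status;   /* Exit code (or signal number if killed).  */
  close (pidfd);
  return ret;
#else
  (void) path; (void) argv; (void) envp;
  return -ENOSYS;           /* pidfd_spawn not available.  */
#endif
}
```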

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.

Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid either rehashing the posix_spawn API or
adding a new one.  This also means that both interfaces support the same
attributes and file actions, and any new flag or file action added to
posix_spawn is automatically available to pidfd_spawn, including
POSIX_SPAWN_SETCGROUP.

Along with the spawn interface, a fork-like one is also provided:

  typedef union
  {
    struct
    {
      __uint64_t fork_np_flags;
      int fork_np_pidfd;
      int fork_np_cgroup;
      int fork_np_exit_signal;
  #define fork_np_flags       __data.fork_np_flags
  #define fork_np_pidfd       __data.fork_np_pidfd
  #define fork_np_cgroup      __data.fork_np_cgroup
  #define fork_np_exit_signal __data.fork_np_exit_signal
    } __data;
    char __size [FORK_NP_ARGS_SIZE_VER0];
  } fork_np_args_t;

  #define FORK_NP_PIDFD        (1ULL << 1)
  #define FORK_NP_CGROUP       (1ULL << 2)
  #define FORK_NP_ASYNCSAFE    (1ULL << 3)
  #define FORK_NP_EXIT_SIGNAL  (1ULL << 4)

  pid_t fork_np (fork_np_args_t *args, size_t size);

SIZE must be the size of a supported fork_np_args_t version; otherwise,
the function returns EINVAL.

If all members of ARGS are set to 0, no file descriptor is returned and
fork_np acts as fork.  If FORK_NP_PIDFD is set in fork_np_flags, a new
process file descriptor is returned in fork_np_pidfd, with O_CLOEXEC set
by the kernel by default.  fork_np follows the fork/_Fork convention of
returning a positive value to the parent on success, a negative value on
error, and zero to the child.

If FORK_NP_CGROUP is set, the file descriptor in fork_np_cgroup selects
the cgroupv2 in which the new process is placed (using the
CLONE_INTO_CGROUP clone flag).

If FORK_NP_ASYNCSAFE is set, fork_np acts as _Fork and does not run the
pthread_atfork handlers.

If FORK_NP_EXIT_SIGNAL is set, the signal in fork_np_exit_signal is sent
to the parent when the process terminates (SIGCHLD is the default).  A
value of 0 is also valid, meaning no signal is sent.


Like fork, fork_np runs the pthread_atfork handlers by default; the
FORK_NP_ASYNCSAFE flag changes this, making fork_np act as _Fork.  By
default, SIGCHLD is sent to the parent when the new process terminates.
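
Because fork_np is only proposed here, the following is a hypothetical
sketch of how its flags could map onto the clone3 arguments.  It
mirrors the proposed flag values and fork_np_args_t members in a local
struct (fork_np_req and fork_np_to_clone_args are made-up names) and
assumes Linux 5.7+ UAPI headers, where struct clone_args has the cgroup
field.

```c
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_PIDFD, CLONE_INTO_CGROUP.  */

/* Local copies of the flag values proposed in this series.  */
#define FORK_NP_PIDFD        (1ULL << 1)
#define FORK_NP_CGROUP       (1ULL << 2)
#define FORK_NP_ASYNCSAFE    (1ULL << 3)
#define FORK_NP_EXIT_SIGNAL  (1ULL << 4)

/* Hypothetical stand-in for the proposed fork_np_args_t members.  */
struct fork_np_req
{
  uint64_t flags;
  int pidfd;          /* Output: filled in by the kernel.  */
  int cgroup;         /* Input: cgroupv2 directory descriptor.  */
  int exit_signal;    /* Input: signal sent to the parent on exit.  */
};

/* Translate a request into clone3 arguments.  */
static void
fork_np_to_clone_args (struct fork_np_req *req, struct clone_args *ca)
{
  memset (ca, 0, sizeof *ca);
  ca->exit_signal = SIGCHLD;                    /* Default, as for fork.  */
  if (req->flags & FORK_NP_PIDFD)
    {
      ca->flags |= CLONE_PIDFD;
      ca->pidfd = (uint64_t) (uintptr_t) &req->pidfd;  /* Kernel stores the fd here.  */
    }
  if (req->flags & FORK_NP_CGROUP)
    {
      ca->flags |= CLONE_INTO_CGROUP;
      ca->cgroup = (uint64_t) req->cgroup;
    }
  if (req->flags & FORK_NP_EXIT_SIGNAL)
    ca->exit_signal = (uint64_t) req->exit_signal;   /* 0: no signal.  */
  /* FORK_NP_ASYNCSAFE has no clone3 counterpart: it only controls
     whether the pthread_atfork handlers run around the call.  */
}
```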

To interoperate between process IDs and process file descriptors,
pidfd_getpid is also provided:

  pid_t pidfd_getpid (int fd);

It reads the Pid entry of the file descriptor's procfs fdinfo to obtain
the process ID.
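
A minimal sketch of the parsing step, assuming the kernel's pidfd
fdinfo format, in which the Pid field is -1 once the process has exited
and 0 when it is not visible in the reader's PID namespace.  The error
mapping (EBADF, ESRCH, EREMOTE) follows the v1 changelog entry.
pid_from_fdinfo is a hypothetical name, and it uses strtol for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: find the "Pid:" line in the text of
   /proc/self/fdinfo/<pidfd> and return the PID, or -1 with errno set.  */
static pid_t
pid_from_fdinfo (const char *text)
{
  const char *p = text;

  /* Match "Pid:" only at the start of a line.  */
  while (p != NULL && strncmp (p, "Pid:", 4) != 0)
    {
      p = strchr (p, '\n');
      if (p != NULL)
        p++;
    }
  if (p == NULL)
    {
      errno = EBADF;            /* No Pid entry: not a pidfd.  */
      return -1;
    }

  char *end;
  errno = 0;
  long v = strtol (p + 4, &end, 10);   /* strtol skips the tab.  */
  if (end == p + 4 || errno != 0 || v < -1 || v > INT_MAX
      || (*end != '\n' && *end != '\0'))
    {
      errno = EBADF;            /* Malformed value.  */
      return -1;
    }
  if (v == -1)
    {
      errno = ESRCH;            /* Process already terminated.  */
      return -1;
    }
  if (v == 0)
    {
      errno = EREMOTE;          /* Not visible in this PID namespace.  */
      return -1;
    }
  return (pid_t) v;
}
```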

[1] https://gitlab.gnome.org/GNOME/glib/-/issues/1866
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=30349
[3] https://codebrowser.dev/qt6/qtbase/src/3rdparty/forkfd/forkfd_linux.c.html

---

Changes from v7:
- Redefine __ASSUME_CLONE3 to 0 if the architecture does not support the
  syscall.
- Fixed reporting of some errors by spawned processes.
- Fixed pre-commit CI failures on AArch64.
- Renamed pidfd_fork to fork_np and made the API extensible.
- Documented more possible pidfd_getpid errors.

Changes from v6:
- Rebased against master, adjusted symbol version and NEWS entry.
- Added arm/mips clone3 implementation.

Changes from v5:
- Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.

Changes from v4:
- Changed pidfd_fork signature to return a pid_t instead of the PID file
  descriptor.
- Changed pidfd_getpid to return EBADF for negative input, instead of
  EINVAL.
- Added PIDFDFORK_NOSIGCHLD option.
- Fixed nested __BEGIN_DECLS in spawn.h.

Changes from v3:
- Removed strtoul usage.
- Fixed patchwork tst-pidfd_getpid.c regression.
- Fixed manual and NEWS typos.

Changes from v2:
- Added pidfd_fork and pidfd_getpid manual entries.
- Changed pidfd_fork to act as fork by default, instead of as _Fork.
- Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
- Added pidfd_getpid test for EREMOTE.

Changes from v1:
- Extended pidfd_getpid error codes to return EBADF if fdinfo does not
  have a Pid entry or if the value is invalid, EREMOTE if the pid is in a
  separate namespace, and ESRCH if the process has already terminated.
- Extended tst-pidfd_getpid.
- Renamed PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid clashes
  with possible kernel extensions.

Adhemerval Zanella (7):
  arm: Add the clone3 wrapper
  mips: Add the clone3 wrapper
  linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and
    sparc
  linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  posix: Add fork_np (BZ 26371)
  linux: Add pidfd_getpid

 NEWS                                          |  24 ++
 bits/spawn_ext.h                              |  21 ++
 include/clone_internal.h                      |  21 ++
 manual/process.texi                           | 122 ++++++++-
 posix/Makefile                                |   5 +-
 posix/fork-internal.c                         | 127 ++++++++++
 posix/fork-internal.h                         |  36 +++
 posix/fork.c                                  | 107 +-------
 posix/spawn.h                                 |   6 +-
 posix/spawn_int.h                             |   3 +-
 posix/spawnattr_setflags.c                    |   3 +-
 posix/tst-posix_spawn-setsid.c                | 169 +++++++++----
 posix/tst-spawn-chdir.c                       |  15 +-
 posix/tst-spawn.c                             |  24 +-
 posix/tst-spawn.h                             |  36 +++
 posix/tst-spawn2.c                            |  17 +-
 posix/tst-spawn3.c                            | 100 ++++----
 posix/tst-spawn4.c                            |   7 +-
 posix/tst-spawn5.c                            |  14 +-
 posix/tst-spawn6.c                            |  13 +-
 posix/tst-spawn7.c                            |  13 +-
 sysdeps/nptl/_Fork.c                          |   2 +-
 sysdeps/unix/sysv/linux/Makefile              |  29 +++
 sysdeps/unix/sysv/linux/Versions              |   8 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   6 +
 .../unix/sysv/linux/alpha/kernel-features.h   |   4 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   6 +
 sysdeps/unix/sysv/linux/arch-fork.h           |  16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/arm/clone3.S          |  80 ++++++
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/arm/sysdep.h          |   1 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  71 ++++++
 sysdeps/unix/sysv/linux/bits/unistd_ext.h     |  51 ++++
 sysdeps/unix/sysv/linux/clone-internal.c      |  58 ++++-
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |  60 +++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/fork_np.c             |  97 +++++++
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   6 +
 .../unix/sysv/linux/ia64/kernel-features.h    |   4 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   6 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   6 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   6 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   6 +
 .../sysv/linux/microblaze/be/libc.abilist     |   6 +
 .../sysv/linux/microblaze/le/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/mips/clone3.S         | 139 +++++++++++
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   6 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   6 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   6 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/mips/sysdep.h         |   2 +
 .../unix/sysv/linux/nios2/kernel-features.h   |  24 ++
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/pidfd_getpid.c        | 126 ++++++++++
 sysdeps/unix/sysv/linux/pidfd_spawn.c         |  30 +++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c        |  30 +++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   6 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   6 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   6 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/procutils.c           |  97 +++++++
 sysdeps/unix/sysv/linux/procutils.h           |  43 ++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   6 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   6 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   6 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   6 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/sh/kernel-features.h  |   4 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   6 +
 .../unix/sysv/linux/sparc/kernel-features.h   |   4 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   6 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   6 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 ++
 sysdeps/unix/sysv/linux/spawni.c              |  42 +++-
 sysdeps/unix/sysv/linux/sys/pidfd.h           |   4 +
 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c  | 170 +++++++++++++
 sysdeps/unix/sysv/linux/tst-fork_np.c         | 236 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd.c           |  48 ++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c    | 126 ++++++++++
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 223 +++++++++++++++++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c   |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c     |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h     |  63 +++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c    |  20 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   6 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   6 +
 97 files changed, 2947 insertions(+), 267 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/fork_np.c
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

-- 
2.34.1



* [PATCH v8 1/7] arm: Add the clone3 wrapper
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
@ 2023-08-18 14:06 ` Adhemerval Zanella
  2023-08-18 14:06 ` [PATCH v8 2/7] mips: " Adhemerval Zanella
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC
  To: libc-alpha, Florian Weimer

It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
		    int (*__func) (void *__arg), void *__arg);

Checked on arm-linux-gnueabihf.
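
As a rough C model of what the assembly wrapper does (a hypothetical
clone3_model, assuming Linux 5.3+ UAPI headers): the kernel entry takes
only cl_args and size, and in the child the wrapper calls func (arg)
and passes its result to the exit syscall.  This model is only valid
for fork-like calls without CLONE_VM or a new stack; with a new stack
the child cannot safely keep running compiler-generated C code, which
is why the real wrapper is written in assembly.  Like the mips wrapper
in this series, it rejects a NULL cl_args or func with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/sched.h>   /* struct clone_args.  */

/* Hypothetical C model of the internal wrapper
   int clone3 (struct clone_args *, size_t, int (*) (void *), void *).
   Only valid for fork-like use: no CLONE_VM and no new stack.  */
static long
clone3_model (struct clone_args *cl_args, size_t size,
              int (*func) (void *), void *arg)
{
  if (cl_args == NULL || func == NULL)
    {
      errno = EINVAL;
      return -1;
    }
#ifdef SYS_clone3
  long ret = syscall (SYS_clone3, cl_args, size);
  if (ret == 0)
    /* Child: run FUNC and terminate with its result, using the exit
       syscall (as the assembly does) rather than exit_group.  */
    syscall (SYS_exit, func (arg));
  return ret;    /* Parent: child PID, or -1 with errno set.  */
#else
  (void) size;
  (void) arg;
  errno = ENOSYS;
  return -1;
#endif
}

/* Example callback for the child.  */
static int
return_seven (void *arg)
{
  (void) arg;
  return 7;
}
```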
---
 sysdeps/unix/sysv/linux/arm/clone3.S | 80 ++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/arm/sysdep.h |  1 +
 2 files changed, 81 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S

diff --git a/sysdeps/unix/sysv/linux/arm/clone3.S b/sysdeps/unix/sysv/linux/arm/clone3.S
new file mode 100644
index 0000000000..f236d18390
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arm/clone3.S
@@ -0,0 +1,80 @@
+/* The clone3 syscall wrapper.  Linux/arm version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#define _ERRNO_H	1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   r0: cl_args
+   r1: size
+   r2: func
+   r3: arg  */
+
+        .text
+ENTRY(__clone3)
+	/* Sanity check args.  */
+	cmp	r0, #0
+	ite	ne
+	cmpne	r1, #0
+	moveq	r0, #-EINVAL
+	beq	PLTJMP(syscall_error)
+
+	/* Do the syscall, the kernel expects:
+	   r7: system call number:
+	   r0: cl_args
+	   r1: size  */
+	push    { r7 }
+	cfi_adjust_cfa_offset (4)
+	cfi_rel_offset (r7, 0)
+	ldr     r7, =SYS_ify(clone3)
+	swi	0x0
+	cfi_endproc
+
+	cmp	r0, #0
+	beq	1f
+	pop     {r7}
+	blt	PLTJMP(C_SYMBOL_NAME(__syscall_error))
+	RETINSTR(, lr)
+
+	cfi_startproc
+PSEUDO_END (__clone3)
+
+1:
+	.fnstart
+	.cantunwind
+	mov	r0, r3
+	mov	ip, r2
+	BLX (ip)
+
+	/* And we are done, passing the return value through r0.  */
+	ldr	r7, =SYS_ify(exit)
+	swi	0x0
+
+	.fnend
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/arm/sysdep.h b/sysdeps/unix/sysv/linux/arm/sysdep.h
index 2f321881c8..57fc5f16bd 100644
--- a/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -362,6 +362,7 @@ __local_syscall_error:						\
 #define HAVE_CLOCK_GETTIME_VSYSCALL	"__vdso_clock_gettime"
 #define HAVE_CLOCK_GETTIME64_VSYSCALL	"__vdso_clock_gettime64"
 #define HAVE_GETTIMEOFDAY_VSYSCALL	"__vdso_gettimeofday"
+#define HAVE_CLONE3_WRAPPER		1
 
 #define LOAD_ARGS_0()
 #define ASM_ARGS_0



* [PATCH v8 2/7] mips: Add the clone3 wrapper
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
  2023-08-18 14:06 ` [PATCH v8 1/7] arm: Add the clone3 wrapper Adhemerval Zanella
@ 2023-08-18 14:06 ` Adhemerval Zanella
  2023-08-18 14:06 ` [PATCH v8 3/7] linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC
  To: libc-alpha, Florian Weimer

It follows the internal signature:

extern int clone3 (struct clone_args *__cl_args, size_t __size,
                   int (*__func) (void *__arg), void *__arg);

Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and
mipsel-linux-gnu.
---
 sysdeps/unix/sysv/linux/mips/clone3.S | 139 ++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/mips/sysdep.h |   2 +
 2 files changed, 141 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S

diff --git a/sysdeps/unix/sysv/linux/mips/clone3.S b/sysdeps/unix/sysv/linux/mips/clone3.S
new file mode 100644
index 0000000000..1d16bfcef6
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/clone3.S
@@ -0,0 +1,139 @@
+/* The clone3 syscall wrapper.  Linux/mips version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/asm.h>
+#include <sysdep.h>
+#define _ERRNO_H        1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   a0/$4: cl_args
+   a1/$5: size
+   a2/$6: func
+   a3/$7: arg  */
+
+	.text
+	.set		nomips16
+#if _MIPS_SIM == _ABIO32
+# define EXTRA_LOCALS 1
+#else
+# define EXTRA_LOCALS 0
+#endif
+#define FRAMESZ ((NARGSAVE*SZREG)+ALSZ)&ALMASK
+GPOFF= FRAMESZ-(1*SZREG)
+NESTED(__clone3, SZREG, sp)
+#ifdef __PIC__
+	SETUP_GP
+#endif
+#if FRAMESZ
+	PTR_SUBU sp, FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+#endif
+	SETUP_GP64_STACK (GPOFF, __clone3)
+#ifdef __PIC__
+	SAVE_GP (GPOFF)
+#endif
+#ifdef PROF
+	.set	noat
+	move	$1,ra
+	jal	_mcount
+	.set	at
+#endif
+
+	/* Sanity check args.  */
+	li	v0, EINVAL
+	beqz	a0, L(error)	/* No NULL cl_args pointer.  */
+	beqz	a2, L(error)	/* No NULL function pointer.  */
+
+	move	$8, a3		/* a3 is set to 0/1 for syscall success/error
+				   while a4/$8 is returned unmodified.  */
+
+	/* Do the system call, the kernel expects:
+	   v0: system call number
+	   a0: cl_args
+	   a1: size  */
+	li		v0, __NR_clone3
+	cfi_endproc
+	syscall
+
+	bnez		a3, L(error)
+	beqz		v0, L(thread_start_clone3)
+
+	/* Successful return from the parent */
+	cfi_startproc
+#if FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+#endif
+	SETUP_GP64_STACK_CFI (GPOFF)
+	cfi_remember_state
+	RESTORE_GP64_STACK
+#if FRAMESZ
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+#endif
+	ret
+
+L(error):
+	cfi_restore_state
+#ifdef __PIC__
+	PTR_LA		t9, __syscall_error
+	RESTORE_GP64_STACK
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	jr		t9
+#else
+	RESTORE_GP64_STACK
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	j		__syscall_error
+#endif
+END (__clone3)
+
+/* Load up the arguments to the function.  Put this block of code in
+   its own function so that we can terminate the stack trace with our
+   debug info.  */
+
+ENTRY(__thread_start_clone3)
+L(thread_start_clone3):
+	cfi_undefined ($31)
+	/* cp is already loaded.  */
+	SAVE_GP (GPOFF)
+	/* The stackframe has been created on entry of clone3.  */
+
+	/* Restore the arg for user's function.  */
+	move		t9, a2		/* Function pointer.  */
+	move		a0, $8		/* Argument pointer.  */
+
+	/* Call the user's function.  */
+	jal		t9
+
+	move		a0, v0
+	li		v0, __NR_exit
+	syscall
+END(__thread_start_clone3)
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/mips/sysdep.h b/sysdeps/unix/sysv/linux/mips/sysdep.h
index ff84a91b31..673aa08b57 100644
--- a/sysdeps/unix/sysv/linux/mips/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/sysdep.h
@@ -28,3 +28,5 @@
 #endif
 #define HAVE_GETTIMEOFDAY_VSYSCALL      "__vdso_gettimeofday"
 #define HAVE_CLOCK_GETRES_VSYSCALL      "__vdso_clock_getres"
+
+#define HAVE_CLONE3_WRAPPER		1



* [PATCH v8 3/7] linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
  2023-08-18 14:06 ` [PATCH v8 1/7] arm: Add the clone3 wrapper Adhemerval Zanella
  2023-08-18 14:06 ` [PATCH v8 2/7] mips: " Adhemerval Zanella
@ 2023-08-18 14:06 ` Adhemerval Zanella
  2023-08-24  6:06   ` Florian Weimer
  2023-08-18 14:06 ` [PATCH v8 4/7] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC
  To: libc-alpha, Florian Weimer

Not all architectures provide the clone3 syscall.
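
To illustrate why such a macro matters, here is a hypothetical sketch
of the usual pattern: where the architecture is known to lack clone3
the attempt is compiled out, and otherwise an ENOSYS from an older
kernel (or a seccomp filter) falls back to fork.  MY_TRY_CLONE3 is a
made-up stand-in loosely playing the role of glibc's internal
__ASSUME_CLONE3, approximated here from whether SYS_clone3 is defined;
the sketch assumes Linux 5.3+ UAPI headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/sched.h>   /* struct clone_args.  */

/* Hypothetical stand-in: 0 on architectures without a clone3 syscall
   number, such as those listed in this patch's subject.  */
#ifdef SYS_clone3
# define MY_TRY_CLONE3 1
#else
# define MY_TRY_CLONE3 0
#endif

/* Fork-like process creation that prefers clone3.  Returns the child
   PID in the parent, 0 in the child, or -1 with errno set.  */
static pid_t
fork_prefer_clone3 (void)
{
#if MY_TRY_CLONE3
  struct clone_args args;
  memset (&args, 0, sizeof args);
  args.exit_signal = SIGCHLD;
  long ret = syscall (SYS_clone3, &args, sizeof args);
  if (ret != -1 || errno != ENOSYS)
    return ret;       /* Success, or a genuine error from clone3.  */
  /* ENOSYS: old kernel or filtered syscall; fall back below.  */
#endif
  return fork ();
}
```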
---
 .../unix/sysv/linux/alpha/kernel-features.h   |  4 ++++
 .../unix/sysv/linux/ia64/kernel-features.h    |  4 ++++
 .../unix/sysv/linux/nios2/kernel-features.h   | 24 +++++++++++++++++++
 sysdeps/unix/sysv/linux/sh/kernel-features.h  |  4 ++++
 .../unix/sysv/linux/sparc/kernel-features.h   |  4 ++++
 5 files changed, 40 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h

diff --git a/sysdeps/unix/sysv/linux/alpha/kernel-features.h b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
index 3151e75449..d14c010333 100644
--- a/sysdeps/unix/sysv/linux/alpha/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
@@ -50,4 +50,8 @@
 /* Alpha requires old sysvipc even being a 64-bit architecture.  */
 #undef __ASSUME_SYSVIPC_DEFAULT_IPC_64
 
+/* Alpha does not provide clone3.  */
+#undef __ASSUME_CLONE3
+#define __ASSUME_CLONE3 0
+
 #endif /* _KERNEL_FEATURES_H */
diff --git a/sysdeps/unix/sysv/linux/ia64/kernel-features.h b/sysdeps/unix/sysv/linux/ia64/kernel-features.h
index 98ebfb74bf..398ec37328 100644
--- a/sysdeps/unix/sysv/linux/ia64/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/ia64/kernel-features.h
@@ -34,4 +34,8 @@
 #undef __ASSUME_CLONE_DEFAULT
 #define __ASSUME_CLONE2
 
+/* ia64 does not provide clone3.  */
+#undef __ASSUME_CLONE3
+#define __ASSUME_CLONE3 0
+
 #endif /* _KERNEL_FEATURES_H */
diff --git a/sysdeps/unix/sysv/linux/nios2/kernel-features.h b/sysdeps/unix/sysv/linux/nios2/kernel-features.h
new file mode 100644
index 0000000000..239e507272
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/nios2/kernel-features.h
@@ -0,0 +1,24 @@
+/* Set flags signalling availability of kernel features based on given
+   kernel version number.  NIOS2 version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include_next <kernel-features.h>
+
+/* nios2 does not provide clone3.  */
+#undef __ASSUME_CLONE3
+#define __ASSUME_CLONE3 0
diff --git a/sysdeps/unix/sysv/linux/sh/kernel-features.h b/sysdeps/unix/sysv/linux/sh/kernel-features.h
index 953fa8dff0..3eaf9e0857 100644
--- a/sysdeps/unix/sysv/linux/sh/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/sh/kernel-features.h
@@ -55,4 +55,8 @@
 # undef __ASSUME_STATX
 #endif
 
+/* sh does not provide clone3.  */
+#undef __ASSUME_CLONE3
+#define __ASSUME_CLONE3 0
+
 #endif
diff --git a/sysdeps/unix/sysv/linux/sparc/kernel-features.h b/sysdeps/unix/sysv/linux/sparc/kernel-features.h
index 98c938c16d..fcb343ef63 100644
--- a/sysdeps/unix/sysv/linux/sparc/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/sparc/kernel-features.h
@@ -87,3 +87,7 @@
    (INLINE_CLONE_SYSCALL).  */
 #undef __ASSUME_CLONE_DEFAULT
 #define __ASSUME_CLONE_BACKWARDS	1
+
+/* sparc does not provide clone3.  */
+#undef __ASSUME_CLONE3
+#define __ASSUME_CLONE3 0



* [PATCH v8 4/7] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (2 preceding siblings ...)
  2023-08-18 14:06 ` [PATCH v8 3/7] linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
@ 2023-08-18 14:06 ` Adhemerval Zanella
  2023-08-24  7:00   ` Florian Weimer
  2023-08-18 14:06 ` [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC
  To: libc-alpha, Florian Weimer

These functions allow posix_spawn and posix_spawnp to use
CLONE_INTO_CGROUP with clone3, so that the child process is created
in a different cgroup version 2.  These are GNU extensions that are
available only on Linux, and only on the architectures that implement
the clone3 wrapper (HAVE_CLONE3_WRAPPER).

To create a process in a different cgroupv2, one can use:

  posix_spawnattr_t attr;
  posix_spawnattr_init (&attr);
  posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
  posix_spawnattr_setcgroup_np (&attr, cgroup);
  posix_spawn (...);

Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP controls
whether the cgroup file descriptor is used with clone3.

There is no fallback if clone3 does not support the flag or if the
architecture does not provide the clone3 wrapper; in that case
posix_spawn returns ENOTSUP.

Checked on x86_64-linux-gnu.
---
 NEWS                                          |   6 +
 bits/spawn_ext.h                              |  21 ++
 posix/Makefile                                |   1 +
 posix/spawn.h                                 |   6 +-
 posix/spawnattr_setflags.c                    |   3 +-
 sysdeps/unix/sysv/linux/Makefile              |   5 +
 sysdeps/unix/sysv/linux/Versions              |   4 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  40 ++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../sysv/linux/microblaze/be/libc.abilist     |   2 +
 .../sysv/linux/microblaze/le/libc.abilist     |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 +++
 sysdeps/unix/sysv/linux/spawni.c              |  22 +-
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 223 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 46 files changed, 449 insertions(+), 5 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c

diff --git a/NEWS b/NEWS
index 1d9ce09488..0b9a247241 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,12 @@ Major new features:
   and under Linux a spare has been allocated: it was always zero
   in previous versions of glibc, and zero is not a valid result.
 
+* On Linux, the functions posix_spawnattr_getcgroup_np and
+  posix_spawnattr_setcgroup_np have been added, along with the
+  POSIX_SPAWN_SETCGROUP flag.  They allow posix_spawn and posix_spawnp
+  to set the cgroupv2 in the new process in a race-free manner.  These
+  functions are GNU extensions and require a kernel with clone3 support.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/bits/spawn_ext.h b/bits/spawn_ext.h
new file mode 100644
index 0000000000..75b504a768
--- /dev/null
+++ b/bits/spawn_ext.h
@@ -0,0 +1,21 @@
+/* POSIX spawn extensions.   Generic version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SPAWN_H
+# error "Never include <bits/spawn_ext.h> directly; use <spawn.h> instead."
+#endif
diff --git a/posix/Makefile b/posix/Makefile
index 3d368b91f6..70faad4b63 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -37,6 +37,7 @@ headers := \
   bits/pthreadtypes-arch.h \
   bits/pthreadtypes.h \
   bits/sched.h \
+  bits/spawn_ext.h \
   bits/thread-shared-types.h \
   bits/types.h \
   bits/types/idtype_t.h \
diff --git a/posix/spawn.h b/posix/spawn.h
index 04cc525fa5..731862cc5a 100644
--- a/posix/spawn.h
+++ b/posix/spawn.h
@@ -34,7 +34,8 @@ typedef struct
   sigset_t __ss;
   struct sched_param __sp;
   int __policy;
-  int __pad[16];
+  int __cgroup;
+  int __pad[15];
 } posix_spawnattr_t;
 
 
@@ -59,6 +60,7 @@ typedef struct
 #ifdef __USE_GNU
 # define POSIX_SPAWN_USEVFORK		0x40
 # define POSIX_SPAWN_SETSID		0x80
+# define POSIX_SPAWN_SETCGROUP         0x100
 #endif
 
 
@@ -231,4 +233,6 @@ posix_spawn_file_actions_addtcsetpgrp_np (posix_spawn_file_actions_t *,
 
 __END_DECLS
 
+#include <bits/spawn_ext.h>
+
 #endif /* spawn.h */
diff --git a/posix/spawnattr_setflags.c b/posix/spawnattr_setflags.c
index 97153948e4..e7bb217c6a 100644
--- a/posix/spawnattr_setflags.c
+++ b/posix/spawnattr_setflags.c
@@ -26,7 +26,8 @@
 		   | POSIX_SPAWN_SETSCHEDPARAM				      \
 		   | POSIX_SPAWN_SETSCHEDULER				      \
 		   | POSIX_SPAWN_SETSID					      \
-		   | POSIX_SPAWN_USEVFORK)
+		   | POSIX_SPAWN_USEVFORK				      \
+		   | POSIX_SPAWN_SETCGROUP)
 
 /* Store flags in the attribute structure.  */
 int
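The spawnattr_setflags.c hunk matters because posix_spawnattr_setflags rejects any bit outside its accepted mask with EINVAL. A minimal sketch of that check, using hypothetical SK_* stand-ins whose values are assumed to follow glibc's <spawn.h> (0x40 and 0x80 are visible in the spawn.h hunk, 0x100 is the new POSIX_SPAWN_SETCGROUP); sk_setflags is illustrative, not glibc code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the <spawn.h> flag bits.  */
enum
{
  SK_RESETIDS = 0x01,
  SK_SETPGROUP = 0x02,
  SK_SETSIGDEF = 0x04,
  SK_SETSIGMASK = 0x08,
  SK_SETSCHEDPARAM = 0x10,
  SK_SETSCHEDULER = 0x20,
  SK_USEVFORK = 0x40,
  SK_SETSID = 0x80,
  SK_SETCGROUP = 0x100
};

/* Accepted masks before and after this patch.  */
#define SK_FLAGS_BEFORE (SK_RESETIDS | SK_SETPGROUP | SK_SETSIGDEF      \
                         | SK_SETSIGMASK | SK_SETSCHEDPARAM             \
                         | SK_SETSCHEDULER | SK_SETSID | SK_USEVFORK)
#define SK_FLAGS_AFTER (SK_FLAGS_BEFORE | SK_SETCGROUP)

struct sk_attr
{
  short flags;   /* posix_spawnattr_t also keeps its flags in a short.  */
};

/* Store FLAGS in ATTR, rejecting any bit outside ALLOWED; ATTR is left
   unchanged on error.  */
static int
sk_setflags (struct sk_attr *attr, short flags, int allowed)
{
  if ((flags & ~allowed) != 0)
    return EINVAL;
  attr->flags = flags;
  return 0;
}
```

With the old mask, requesting the cgroup flag fails before posix_spawn is ever reached; adding POSIX_SPAWN_SETCGROUP to ALL_FLAGS is what makes the flag usable.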
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index be801e3be4..d7b020154a 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -493,11 +493,14 @@ sysdep_routines += \
   getcpu \
   oldglob \
   sched_getcpu \
+  spawnattr_getcgroup_np \
+  spawnattr_setcgroup_np \
   # sysdep_routines
 
 tests += \
   tst-affinity \
   tst-affinity-pid \
+  tst-spawn-cgroup \
   # tests
 
 tests-static += \
@@ -511,6 +514,8 @@ tests += \
 CFLAGS-fork.c = $(libio-mtsafe)
 CFLAGS-getpid.o = -fomit-frame-pointer
 CFLAGS-getpid.os = -fomit-frame-pointer
+
+tst-spawn-cgroup-ARGS = -- $(host-test-program-cmd)
 endif
 
 ifeq ($(subdir),inet)
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index bc59bce42f..6d8a67039e 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -321,6 +321,10 @@ libc {
     __ppoll64_chk;
 %endif
   }
+  GLIBC_2.39 {
+    posix_spawnattr_getcgroup_np;
+    posix_spawnattr_setcgroup_np;
+  }
   GLIBC_PRIVATE {
     # functions used in other libraries
     __syscall_rt_sigqueueinfo;
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index c49363e70e..0090827e01 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2673,3 +2673,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index d6b1dcaae6..9d099471b6 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2782,6 +2782,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index dfe0c3f7b6..d7ed2f66de 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2434,3 +2434,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index 6c75e5aa76..92e686defe 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -554,6 +554,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 03d6f7ae2d..b503e642fc 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -551,6 +551,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
new file mode 100644
index 0000000000..a3aa020d5c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
@@ -0,0 +1,40 @@
+/* POSIX spawn extensions.   Linux version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SPAWN_H
+# error "Never include <bits/spawn_ext.h> directly; use <spawn.h> instead."
+#endif
+
+__BEGIN_DECLS
+
+#ifdef __USE_MISC
+
+/* Get the cgroupv2 file descriptor from the attribute structure.  */
+extern int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *
+					 __restrict __attr,
+					 int *__restrict __cgroup)
+     __THROW __nonnull ((1, 2));
+
+/* Store the cgroupv2 file descriptor in the attribute structure.  */
+extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__attr,
+					 int __cgroup)
+     __THROW __nonnull ((1));
+
+#endif /* __USE_MISC */
+
+__END_DECLS
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index d858c108c6..ec9e209b8d 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2710,3 +2710,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 82a14f8ace..961f88bf14 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 1950b15d5d..b6f5a4ab83 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2843,6 +2843,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index d0b9cb279b..a404b99e68 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2608,6 +2608,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index e760a631dd..2f9f6e2332 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2194,3 +2194,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 35785a3d5f..b7e9ab4558 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -555,6 +555,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4ab2426e0a..c345da7e0a 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2786,6 +2786,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 38faa16232..a643d868a8 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2759,3 +2759,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 374d658988..fed535742c 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2756,3 +2756,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index fcc5e88e91..147bac3eaf 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2751,6 +2751,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 01eb96cd93..e550616576 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2749,6 +2749,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index a2748b7b74..56f414dbd0 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2757,6 +2757,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 0ae7ba499d..da704a2e2b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 947495a0e2..f5a157ea94 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2798,3 +2798,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 115f1039e7..85b552f1cb 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2180,3 +2180,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 19c4c325b0..cadb16c12f 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2825,6 +2825,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 3e043c4044..50c5b99728 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2858,6 +2858,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index e4f3a766bb..81c63385af 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2579,6 +2579,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index dafe1c4a59..af9be18108 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2893,3 +2893,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index b9740a1afc..2266a88ad5 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,3 +2436,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index e3b4656aa2..4776ae32b8 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,3 +2636,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 84cb7a50ed..5d1d7d07a5 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2823,6 +2823,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 33df3b1646..fffc32a0f4 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2616,6 +2616,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 94cbccd715..43ff21447d 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2666,6 +2666,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 3bb316a787..9ea18d5886 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2663,6 +2663,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 6341b491b4..c6607d5385 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2818,6 +2818,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 8ed1ea2926..a010a2bb16 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2631,6 +2631,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c b/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
new file mode 100644
index 0000000000..82fd8f4b71
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
@@ -0,0 +1,28 @@
+/* Copyright (C) 2000-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+
+/* Get the cgroupv2 file descriptor from the attribute structure.  */
+int
+posix_spawnattr_getcgroup_np (const posix_spawnattr_t *attr,
+			      int *cgroup)
+{
+  *cgroup = attr->__cgroup;
+
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c b/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
new file mode 100644
index 0000000000..74d60bb5ea
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
@@ -0,0 +1,27 @@
+/* Copyright (C) 2000-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+
+/* Store the cgroupv2 file descriptor in the attribute structure.  */
+int
+posix_spawnattr_setcgroup_np (posix_spawnattr_t *attr, int cgroup)
+{
+  attr->__cgroup = cgroup;
+
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index ec687cb423..f0d4c62ae6 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -380,14 +380,19 @@ __spawnix (pid_t * pid, const char *file,
      need for CLONE_SETTLS.  Although parent and child share the same TLS
      namespace, there will be no concurrent access for TLS variables (errno
      for instance).  */
+  bool set_cgroup = attrp ? (attrp->__flags & POSIX_SPAWN_SETCGROUP) : false;
   struct clone_args clone_args =
     {
       /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
 	 __clone_internal_fallback.  */
-      .flags = CLONE_CLEAR_SIGHAND | CLONE_VM | CLONE_VFORK,
+      .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
+	       | CLONE_CLEAR_SIGHAND
+	       | CLONE_VM
+	       | CLONE_VFORK,
       .exit_signal = SIGCHLD,
       .stack = (uintptr_t) stack,
       .stack_size = stack_size,
+      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
     };
 #ifdef HAVE_CLONE3_WRAPPER
   args.use_clone3 = true;
@@ -398,8 +403,19 @@ __spawnix (pid_t * pid, const char *file,
 #endif
     {
       args.use_clone3 = false;
-      new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
-					   &args);
+      if (!set_cgroup)
+	new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
+					     &args);
+      else
+	{
+	  /* No fallback for POSIX_SPAWN_SETCGROUP if clone3 is not
+	     supported.  */
+	  new_pid = -1;
+#ifdef HAVE_CLONE3_WRAPPER
+	  if (errno == ENOSYS)
+#endif
+	    errno = ENOTSUP;
+	}
     }
 
   /* It needs to collect the case where the auxiliary process was created
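The spawni.c change can be read as two small decisions: which clone_args to build, and which error to report when the clone3 path is not taken. The sketch below restates them as pure functions for illustration only; sketch_build_args, sketch_cgroup_fallback_errno and the SKC_* names are hypothetical, the flag values are assumed to match <linux/sched.h>, and it assumes (as the hunk suggests) that the non-clone3 branch is reached only when clone3 is unavailable or failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Clone flag values as in <linux/sched.h>; the SKC_ prefix avoids
   clashing with any system header definitions.  */
#define SKC_CLONE_VM            0x00000100ULL
#define SKC_CLONE_VFORK         0x00004000ULL
#define SKC_CLONE_CLEAR_SIGHAND 0x100000000ULL
#define SKC_CLONE_INTO_CGROUP   0x200000000ULL

/* Hypothetical subset of the kernel's struct clone_args.  */
struct sketch_clone_args
{
  uint64_t flags;
  uint64_t cgroup;
};

/* Mirror of how __spawnix fills clone_args: CLONE_INTO_CGROUP and the
   descriptor are set only when POSIX_SPAWN_SETCGROUP was requested.  */
static struct sketch_clone_args
sketch_build_args (bool set_cgroup, int cgroup_fd)
{
  struct sketch_clone_args a =
    {
      .flags = (set_cgroup ? SKC_CLONE_INTO_CGROUP : 0)
               | SKC_CLONE_CLEAR_SIGHAND | SKC_CLONE_VM | SKC_CLONE_VFORK,
      /* The int descriptor is widened to u64: -1 becomes UINT64_MAX.  */
      .cgroup = set_cgroup ? (uint64_t) cgroup_fd : 0,
    };
  return a;
}

/* errno reported in the non-clone3 branch when POSIX_SPAWN_SETCGROUP is
   set: without the wrapper, or when clone3 fails with ENOSYS, report
   ENOTSUP; any other clone3 error is passed through unchanged.  */
static int
sketch_cgroup_fallback_errno (bool have_clone3_wrapper, int clone3_errno)
{
  if (!have_clone3_wrapper || clone3_errno == ENOSYS)
    return ENOTSUP;
  return clone3_errno;
}
```

Since a descriptor of -1 reaches the kernel as UINT64_MAX, and the kernel is expected to reject CLONE_INTO_CGROUP values above INT_MAX with EINVAL, this is consistent with the test expecting EINVAL from do_test_cgroup_failure (NULL, -1).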
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
new file mode 100644
index 0000000000..84e24696eb
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
@@ -0,0 +1,223 @@
+/* Tests for posix_spawn cgroup extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <spawn.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xstdio.h>
+#include <support/xunistd.h>
+#include <support/temp_file.h>
+#include <sys/vfs.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#define CGROUPFS "/sys/fs/cgroup/"
+#ifndef CGROUP2_SUPER_MAGIC
+# define CGROUP2_SUPER_MAGIC 0x63677270
+#endif
+
+#define F_TYPE_EQUAL(a, b) (a == (typeof (a)) b)
+
+#define CGROUP_TEST "test-spawn-cgroup"
+
+/* Nonzero if the program gets called via `exec'.  */
+#define CMDLINE_OPTIONS \
+  { "restart", no_argument, &restart, 1 },
+static int restart;
+
+/* Hold the four initial arguments used to respawn the process, plus the
+   extra '--direct', '--restart', the path of the created cgroup, and a
+   final NULL.  */
+static char *spargs[8];
+
+static inline char *
+startswith (const char *s, const char *prefix)
+{
+  size_t l = strlen (prefix);
+  if (strncmp (s, prefix, l) == 0)
+    return (char *) s + l;
+  return NULL;
+}
+
+static char *
+get_cgroup (void)
+{
+  FILE *f = fopen ("/proc/self/cgroup", "re");
+  if (f == NULL)
+    FAIL_UNSUPPORTED ("no cgroup defined for the process: %m");
+
+  char *cgroup = NULL;
+
+  char *line = NULL;
+  size_t linesiz = 0;
+  while (xgetline (&line, &linesiz, f) > 0)
+    {
+      char *entry = startswith (line, "0:");
+      if (entry == NULL)
+	continue;
+
+      entry = strchr (entry, ':');
+      if (entry == NULL)
+	continue;
+
+      cgroup = entry + 1;
+      size_t l = strlen (cgroup);
+      if (cgroup[l - 1] == '\n')
+	cgroup[l - 1] = '\0';
+
+      cgroup = xstrdup (entry + 1);
+      break;
+    }
+
+  xfclose (f);
+  free (line);
+
+  return cgroup;
+}
+
+
+/* Called on process re-execution.  */
+static void
+handle_restart (int argc, char *argv[])
+{
+  assert (argc == 1);
+  char *newcgroup = argv[0];
+
+  char *current_cgroup = get_cgroup ();
+  TEST_VERIFY_EXIT (current_cgroup != NULL);
+  TEST_COMPARE_STRING (newcgroup, current_cgroup);
+}
+
+static int
+do_test_cgroup_failure (pid_t *pid, int cgroup)
+{
+  posix_spawnattr_t attr;
+  TEST_COMPARE (posix_spawnattr_init (&attr), 0);
+  TEST_COMPARE (posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP), 0);
+  TEST_COMPARE (posix_spawnattr_setcgroup_np (&attr, cgroup), 0);
+
+  int cgetgroup;
+  TEST_COMPARE (posix_spawnattr_getcgroup_np (&attr, &cgetgroup), 0);
+  TEST_COMPARE (cgroup, cgetgroup);
+
+  return posix_spawn (pid, spargs[0], NULL, &attr, spargs, environ);
+}
+
+static int
+create_new_cgroup (char **newcgroup)
+{
+  struct statfs fs;
+  if (statfs (CGROUPFS, &fs) < 0)
+    {
+      if (errno == ENOENT)
+	FAIL_UNSUPPORTED ("no cgroupv2 mount found");
+      FAIL_EXIT1 ("statfs (%s): %m\n", CGROUPFS);
+    }
+
+  if (!F_TYPE_EQUAL (fs.f_type, CGROUP2_SUPER_MAGIC))
+    FAIL_UNSUPPORTED ("%s is not a cgroupv2 (expected %jx, got %jx)",
+		      CGROUPFS, (intmax_t) CGROUP2_SUPER_MAGIC,
+		      (intmax_t) fs.f_type);
+
+  char *cgroup = get_cgroup ();
+  TEST_VERIFY_EXIT (cgroup != NULL);
+  *newcgroup = xasprintf ("%s/%s", cgroup, CGROUP_TEST);
+  char *cgpath = xasprintf ("%s%s/%s", CGROUPFS, cgroup, CGROUP_TEST);
+  free (cgroup);
+
+  if (mkdir (cgpath, 0755) == -1 && errno != EEXIST)
+    {
+      if (errno == EACCES || errno == EPERM || errno == EROFS)
+	FAIL_UNSUPPORTED ("can not create a new cgroupv2 group");
+      FAIL_EXIT1 ("mkdir (%s): %m", cgpath);
+    }
+  add_temp_file (cgpath);
+
+  return xopen (cgpath, O_DIRECTORY | O_RDONLY | O_CLOEXEC, 0666);
+}
+
+static int
+do_test (int argc, char *argv[])
+{
+  /* We must have either:
+
+     - one or four parameters if called initially:
+       + argv[1]: path for ld.so        optional
+       + argv[2]: "--library-path"      optional
+       + argv[3]: the library path      optional
+       + argv[4]: the application name
+
+     - six parameters left if called through re-execution:
+       + argv[4/1]: the application name
+       + argv[5/2]: the created cgroup
+
+     * When built with --enable-hardcoded-path-in-tests or issued without
+       using the loader directly.  */
+
+  if (restart)
+    {
+      handle_restart (argc - 1, &argv[1]);
+      return 0;
+    }
+
+  TEST_VERIFY_EXIT (argc == 2 || argc == 5);
+
+  char *newcgroup;
+  int cgroup = create_new_cgroup (&newcgroup);
+
+  int i;
+  for (i = 0; i < argc - 1; i++)
+    spargs[i] = argv[i + 1];
+  spargs[i++] = (char *) "--direct";
+  spargs[i++] = (char *) "--restart";
+  spargs[i++] = (char *) newcgroup;
+  spargs[i] = NULL;
+
+  /* Check if invalid cgroups returns an error.  */
+  {
+    int r = do_test_cgroup_failure (NULL, -1);
+    if (r == EOPNOTSUPP)
+      FAIL_UNSUPPORTED ("posix_spawn POSIX_SPAWN_SETCGROUP is not supported");
+    TEST_COMPARE (r, EINVAL);
+  }
+
+  {
+    pid_t pid;
+    TEST_COMPARE (do_test_cgroup_failure (&pid, cgroup), 0);
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  xclose (cgroup);
+  free (newcgroup);
+
+  return 0;
+}
+
+#define TEST_FUNCTION_ARGV do_test
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 57cfcc2086..3591b5de5e 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2582,6 +2582,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 3f0a9f6d82..ffbd8f3738 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2688,3 +2688,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
-- 
2.34.1



* [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (3 preceding siblings ...)
  2023-08-18 14:06 ` [PATCH v8 4/7] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
@ 2023-08-18 14:06 ` Adhemerval Zanella
  2023-08-24  7:13   ` Florian Weimer
  2023-08-18 14:06 ` [PATCH v8 6/7] posix: Add fork_np (BZ 26371) Adhemerval Zanella
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer

Returning a pidfd gives a process a race-free handle for a child
process.  Otherwise, the caller needs to either use pidfd_open (which
might still be subject to TOCTOU) or keep using the old racy interface
based on pid_t.
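To make the TOCTOU window concrete, here is a minimal sketch of the racy "spawn with pid_t, then call pidfd_open" pattern.  It assumes glibc 2.36 or later for the pidfd_open wrapper in <sys/pidfd.h> (older builds just report ENOSYS); the helper name spawn_then_pidfd_open is hypothetical and not part of glibc.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: pidfd_open is wrapped in <sys/pidfd.h> from glibc 2.36.  */
#if defined __GLIBC__ \
    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 36))
# include <sys/pidfd.h>
# define HAVE_PIDFD_OPEN 1
#endif

extern char **environ;

/* Hypothetical helper: spawn FILE with the pid_t interface and only then
   ask for a pidfd.  Stores the child PID in *PIDP and returns the pidfd,
   or -1 with errno set.  */
static int
spawn_then_pidfd_open (const char *file, char *const argv[], pid_t *pidp)
{
  int r = posix_spawnp (pidp, file, NULL, NULL, argv, environ);
  if (r != 0)
    {
      errno = r;
      return -1;
    }
  /* TOCTOU window: if the child exits and is reaped elsewhere (another
     thread calling waitpid (-1, ...), or SIGCHLD set to SIG_IGN) before
     pidfd_open runs, *PIDP may by then name an unrelated process.  */
#ifdef HAVE_PIDFD_OPEN
  return pidfd_open (*pidp, 0);
#else
  errno = ENOSYS;
  return -1;
#endif
}
```

The window only matters when something other than the caller can reap the child, which is the case pidfd_spawn closes by returning the descriptor obtained directly from the clone call.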

The implementation requires the kernel to support the complete pidfd
interface, meaning that waitid (P_PIDFD) must be supported (added in
Linux 5.4).  This ensures that no racy workaround is required (such as
reading the pid from procfs fdinfo to use along with the pid-based wait
interfaces).
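A rough sketch of how an application could probe for the same capability at run time.  The helper kernel_has_waitid_pidfd is hypothetical (it is not glibc's internal __clone_pidfd_supported) and assumes glibc 2.36 or later for pidfd_open and P_PIDFD.  It relies on an older kernel rejecting the unknown waitid idtype with EINVAL, while a kernel that understands P_PIDFD reports ECHILD for a pidfd that does not refer to a child of the caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: pidfd_open and P_PIDFD are available from glibc 2.36.  */
#if defined __GLIBC__ \
    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 36))
# include <sys/pidfd.h>
# define HAVE_PIDFD_API 1
#endif

/* Hypothetical probe: return true if the running kernel accepts
   waitid (P_PIDFD, ...).  The pidfd refers to the calling process itself,
   which is never its own child, so a P_PIDFD-aware kernel fails with
   ECHILD, while a kernel without P_PIDFD fails with EINVAL.  */
static bool
kernel_has_waitid_pidfd (void)
{
#ifdef HAVE_PIDFD_API
  int fd = pidfd_open (getpid (), 0);
  if (fd < 0)
    return false;   /* No usable pidfd_open (e.g. ENOSYS or EPERM).  */
  siginfo_t info;
  int r = waitid (P_PIDFD, fd, &info, WEXITED | WNOHANG);
  int saved_errno = errno;
  close (fd);
  return r == -1 && saved_errno == ECHILD;
#else
  return false;
#endif
}
```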

These interfaces are similar to posix_spawn and posix_spawnp, the only
difference being that they return a process file descriptor (int)
instead of a process ID (pid_t).  Their prototypes are:

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict path,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict file,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict],
                    char *const envp[restrict]);
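For illustration, a minimal sketch of how a caller might combine pidfd_spawnp with waitid (P_PIDFD).  It assumes these interfaces are available from glibc 2.39, and falls back to posix_spawnp when the function is absent or reports ENOSYS because the kernel lacks pidfd support.  The run_and_wait and exit_status helpers are hypothetical, and the usage assumes a sh in PATH.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: pidfd_spawn and pidfd_spawnp ship from glibc 2.39.  */
#if defined __GLIBC__ \
    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 39))
# define HAVE_PIDFD_SPAWN 1
#endif

extern char **environ;

/* Map a waitid result to the child's exit status, or -1.  */
static int
exit_status (const siginfo_t *info)
{
  return info->si_code == CLD_EXITED ? info->si_status : -1;
}

/* Hypothetical helper: run FILE (looked up in PATH) with ARGV and wait for
   it.  Returns the exit status (0-255) on a normal exit, or -1 on failure
   or if the child was killed by a signal.  */
static int
run_and_wait (const char *file, char *const argv[])
{
  siginfo_t info;
#ifdef HAVE_PIDFD_SPAWN
  int pidfd;
  int r = pidfd_spawnp (&pidfd, file, NULL, NULL, argv, environ);
  if (r == 0)
    {
      /* The pidfd refers to this particular child, so a recycled PID
         cannot be confused with it.  */
      int w = waitid (P_PIDFD, pidfd, &info, WEXITED);
      close (pidfd);
      return w == 0 ? exit_status (&info) : -1;
    }
  if (r != ENOSYS)
    return -1;
  /* ENOSYS: no child was created; fall back to the pid_t interface.  */
#endif
  pid_t pid;
  if (posix_spawnp (&pid, file, NULL, NULL, argv, environ) != 0)
    return -1;
  if (waitid (P_PID, pid, &info, WEXITED) != 0)
    return -1;
  return exit_status (&info);
}
```

For example, run_and_wait ("sh", argv) with argv = { "sh", "-c", "exit 3", NULL } returns 3 on either path.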

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the lifetime
of the returned argument.  Although pid_t and int are interchangeable
on Linux, POSIX only requires pid_t to be a signed integer type.

Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid rehashing the posix_spawn API or adding a
new one.  It also means that both interfaces support the same attributes
and file actions, and any new flag or file action added to posix_spawn
is automatically available to pidfd_spawn.
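As a sketch of what sharing the types means in practice: one set of file actions and attributes, built once, can be handed to both posix_spawnp and pidfd_spawnp.  The helper spawn_both_ways is hypothetical; it assumes glibc 2.39 or later for pidfd_spawnp and uses the GNU extension POSIX_SPAWN_SETSID as the example attribute.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: pidfd_spawnp ships from glibc 2.39.  */
#if defined __GLIBC__ \
    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 39))
# define HAVE_PIDFD_SPAWN 1
#endif

extern char **environ;

/* True if waitid reported a normal exit with status 0.  */
static int
exited_zero (const siginfo_t *info)
{
  return info->si_code == CLD_EXITED && info->si_status == 0;
}

/* Hypothetical helper: build one set of file actions (stdout redirected
   to /dev/null) and attributes (POSIX_SPAWN_SETSID), then spawn ARGV[0]
   with posix_spawnp and, when available, with pidfd_spawnp, reusing both
   objects.  Returns how many children exited with status 0, or -1 if the
   shared objects could not be set up.  */
static int
spawn_both_ways (char *const argv[])
{
  posix_spawn_file_actions_t fa;
  posix_spawnattr_t attr;
  if (posix_spawn_file_actions_init (&fa) != 0)
    return -1;
  if (posix_spawnattr_init (&attr) != 0)
    {
      posix_spawn_file_actions_destroy (&fa);
      return -1;
    }

  int ok = -1;
  if (posix_spawn_file_actions_addopen (&fa, STDOUT_FILENO, "/dev/null",
                                        O_WRONLY, 0) == 0
      && posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSID) == 0)
    {
      siginfo_t info;
      pid_t pid;
      ok = 0;
      if (posix_spawnp (&pid, argv[0], &fa, &attr, argv, environ) == 0
          && waitid (P_PID, pid, &info, WEXITED) == 0 && exited_zero (&info))
        ok++;
#ifdef HAVE_PIDFD_SPAWN
      /* Same FA and ATTR objects; only the returned handle differs.  */
      int pidfd;
      if (pidfd_spawnp (&pidfd, argv[0], &fa, &attr, argv, environ) == 0)
        {
          if (waitid (P_PIDFD, pidfd, &info, WEXITED) == 0
              && exited_zero (&info))
            ok++;
          close (pidfd);
        }
#endif
    }

  posix_spawnattr_destroy (&attr);
  posix_spawn_file_actions_destroy (&fa);
  return ok;
}
```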

Also, building on the posix_spawn plumbing allows most of the current
tests to be reused with some changes:

  - waitid is used instead of waitpid since it is a more generic
    interface.

  - tst-posix_spawn-setsid.c is adapted to take into consideration that
    the caller cannot check the session id directly.  The test now spawns
    itself, and the child writes its session id to a file instead.

  - tst-spawn3.c needs to know whether pidfd_spawn is used, so it keeps
    an extra file descriptor unused.
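The waitpid to waitid conversion applied throughout the tests boils down to the following pattern, shown here as a standalone sketch with hypothetical helper names; it assumes only the POSIX spawn and wait interfaces and a sh in PATH.

```c
#define _GNU_SOURCE
#include <spawn.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* waitpid reports termination through an encoded int that must be
   decoded with the W* macros.  */
static bool
exited_ok_waitpid (pid_t pid)
{
  int status;
  return waitpid (pid, &status, 0) == pid
         && WIFEXITED (status) && WEXITSTATUS (status) == 0;
}

/* waitid fills a siginfo_t instead, so the same checks become field
   tests, and the (idtype, id) pair also covers P_ALL, P_PGID and, on
   Linux 5.4 or later, P_PIDFD.  */
static bool
exited_ok_waitid (idtype_t idtype, id_t id)
{
  siginfo_t info;
  return waitid (idtype, id, &info, WEXITED) == 0
         && info.si_code == CLD_EXITED && info.si_status == 0;
}

/* Hypothetical helper: run SCRIPT through sh -c; return the PID or -1.  */
static pid_t
spawn_sh (const char *script)
{
  char *argv[] = { (char *) "sh", (char *) "-c", (char *) script, NULL };
  pid_t pid;
  return posix_spawnp (&pid, "sh", NULL, NULL, argv, environ) == 0 ? pid : -1;
}
```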

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
P_PIDFD support), Linux 5.4 (full support), and Linux 6.2.
---
 NEWS                                          |   7 +
 include/clone_internal.h                      |   4 +
 manual/process.texi                           |  14 +-
 posix/Makefile                                |   1 +
 posix/spawn_int.h                             |   3 +-
 posix/tst-posix_spawn-setsid.c                | 169 +++++++++++++-----
 posix/tst-spawn-chdir.c                       |  15 +-
 posix/tst-spawn.c                             |  24 +--
 posix/tst-spawn.h                             |  36 ++++
 posix/tst-spawn2.c                            |  17 +-
 posix/tst-spawn3.c                            | 100 ++++++-----
 posix/tst-spawn4.c                            |   7 +-
 posix/tst-spawn5.c                            |  14 +-
 posix/tst-spawn6.c                            |  13 +-
 posix/tst-spawn7.c                            |  13 +-
 sysdeps/unix/sysv/linux/Makefile              |  18 ++
 sysdeps/unix/sysv/linux/Versions              |   2 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  31 ++++
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |  60 +++++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../sysv/linux/microblaze/be/libc.abilist     |   2 +
 .../sysv/linux/microblaze/le/libc.abilist     |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/pidfd_spawn.c         |  30 ++++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c        |  30 ++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/spawni.c              |  22 ++-
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c |  20 +++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c   |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c     |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h     |  63 +++++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c    |  20 +++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 66 files changed, 791 insertions(+), 150 deletions(-)
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

diff --git a/NEWS b/NEWS
index 0b9a247241..97681e6796 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,13 @@ Major new features:
   to set the cgroupv2 in the new process in a race-free manner.  These
   functions are GNU extensions and require a kernel with clone3 support.
 
+* On Linux, the pidfd_spawn and pidfd_spawnp functions have been added.
+  They have prototypes and semantics similar to posix_spawn, but instead
+  of returning a process ID, they return a file descriptor that can be
+  used along with other pidfd functions (like pidfd_send_signal, poll, or
+  waitid).  The pidfd functionality avoids the issue of PID reuse with the
+  traditional posix_spawn interface.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/include/clone_internal.h b/include/clone_internal.h
index ad7b170f58..567160ebb5 100644
--- a/include/clone_internal.h
+++ b/include/clone_internal.h
@@ -35,6 +35,10 @@ extern int __clone_internal_fallback (struct clone_args *__cl_args,
 				      void *__arg)
      attribute_hidden;
 
+/* Return whether the kernel supports pid file descriptors, including
+   clone with CLONE_PIDFD and waitid with P_PIDFD.  */
+extern bool __clone_pidfd_supported (void) attribute_hidden;
+
 #ifndef _ISOMAC
 libc_hidden_proto (__clone3)
 libc_hidden_proto (__clone_internal)
diff --git a/manual/process.texi b/manual/process.texi
index c8413a5a58..68361c3f61 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -136,13 +136,13 @@ creating a process and making it run another program.
 @cindex parent process
 @cindex subprocess
 A new processes is created when one of the functions
-@code{posix_spawn}, @code{fork}, @code{_Fork} or @code{vfork} is called.
-(The @code{system} and @code{popen} also create new processes internally.)
-Due to the name of the @code{fork} function, the act of creating a new
-process is sometimes called @dfn{forking} a process.  Each new process
-(the @dfn{child process} or @dfn{subprocess}) is allocated a process
-ID, distinct from the process ID of the parent process.  @xref{Process
-Identification}.
+@code{posix_spawn}, @code{fork}, @code{_Fork}, @code{vfork}, or
+@code{pidfd_spawn} is called.  (The @code{system} and @code{popen} also
+create new processes internally.)  Due to the name of the @code{fork}
+function, the act of creating a new process is sometimes called
+@dfn{forking} a process.  Each new process (the @dfn{child process} or
+@dfn{subprocess}) is allocated a process ID, distinct from the process
+ID of the parent process.  @xref{Process Identification}.
 
 After forking a child process, both the parent and child processes
 continue to execute normally.  If you want your program to wait for a
diff --git a/posix/Makefile b/posix/Makefile
index 70faad4b63..905cf9fb54 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -602,6 +602,7 @@ tst-spawn-static-ARGS = $(tst-spawn-ARGS)
 tst-spawn5-ARGS = -- $(host-test-program-cmd)
 tst-spawn6-ARGS = -- $(host-test-program-cmd)
 tst-spawn7-ARGS = -- $(host-test-program-cmd)
+tst-posix_spawn-setsid-ARGS = -- $(host-test-program-cmd)
 tst-dir-ARGS = `pwd` `cd $(common-objdir)/$(subdir); pwd` `cd $(common-objdir); pwd` $(objpfx)tst-dir
 tst-chmod-ARGS = $(objdir)
 tst-vfork3-ARGS = --test-dir=$(objpfx)
diff --git a/posix/spawn_int.h b/posix/spawn_int.h
index aeb066c44f..64ee03e62d 100644
--- a/posix/spawn_int.h
+++ b/posix/spawn_int.h
@@ -76,12 +76,13 @@ struct __spawn_action
 
 #define SPAWN_XFLAGS_USE_PATH	0x1
 #define SPAWN_XFLAGS_TRY_SHELL	0x2
+#define SPAWN_XFLAGS_RET_PIDFD  0x4
 
 extern int __posix_spawn_file_actions_realloc (posix_spawn_file_actions_t *
 					       file_actions)
      attribute_hidden;
 
-extern int __spawni (pid_t *pid, const char *path,
+extern int __spawni (int *pid, const char *path,
 		     const posix_spawn_file_actions_t *file_actions,
 		     const posix_spawnattr_t *attrp, char *const argv[],
 		     char *const envp[], int xflags) attribute_hidden;
diff --git a/posix/tst-posix_spawn-setsid.c b/posix/tst-posix_spawn-setsid.c
index 124d878ce2..b47eb16cc5 100644
--- a/posix/tst-posix_spawn-setsid.c
+++ b/posix/tst-posix_spawn-setsid.c
@@ -18,78 +18,159 @@
 
 #include <errno.h>
 #include <fcntl.h>
+#include <getopt.h>
+#include <intprops.h>
+#include <paths.h>
 #include <spawn.h>
 #include <stdbool.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <sys/resource.h>
+#include <sys/wait.h>
 #include <unistd.h>
 
 #include <support/check.h>
+#include <support/xunistd.h>
+#include <support/temp_file.h>
+#include <tst-spawn.h>
 
+/* Nonzero if the program gets called via `exec'.  */
+static int restart;
+
+/* Hold the up to four initial arguments used to respawn the process,
+   plus the extra '--direct' and '--restart', and a final NULL.  */
+static char *initial_argv[7];
+static int initial_argv_count;
+
+#define CMDLINE_OPTIONS \
+  { "restart", no_argument, &restart, 1 },
+
+static char *pidfile;
+
+static pid_t
+read_child_sid (void)
+{
+  int pidfd = xopen (pidfile, O_RDONLY, 0);
+
+  char buf[INT_BUFSIZE_BOUND (pid_t)];
+  ssize_t n = read (pidfd, buf, sizeof (buf));
+  TEST_VERIFY_EXIT (n >= 0 && n < (ssize_t) sizeof buf);
+  buf[n] = '\0';
+
+  /* We only expect to read the PID.  */
+  char *endp;
+  long int rpid = strtol (buf, &endp, 10);
+  TEST_VERIFY (endp != buf);
+
+  xclose (pidfd);
+
+  return rpid;
+}
+
+/* Called on process re-execution, write down the session id on PIDFILE.  */
 static void
-do_test_setsid (bool test_setsid)
+handle_restart (const char *pidfile)
 {
-  pid_t sid, child_sid;
-  int res;
+  int pidfd = xopen (pidfile, O_WRONLY | O_TRUNC, 0);
+
+  char buf[INT_BUFSIZE_BOUND (pid_t)];
+  int s = snprintf (buf, sizeof buf, "%d", getsid (0));
+  ssize_t n = write (pidfd, buf, s);
+  TEST_COMPARE (n, s);
+
+  xclose (pidfd);
+}
 
+static void
+do_test_setsid (bool test_setsid)
+{
   /* Current session ID.  */
-  sid = getsid(0);
-  if (sid == (pid_t) -1)
-    FAIL_EXIT1 ("getsid (0): %m");
+  pid_t sid = getsid (0);
+  TEST_VERIFY (sid != (pid_t) -1);
 
   posix_spawnattr_t attrp;
-  /* posix_spawnattr_init should not fail (it basically memset the
-     attribute).  */
-  posix_spawnattr_init (&attrp);
+  TEST_COMPARE (posix_spawnattr_init (&attrp), 0);
   if (test_setsid)
-    {
-      res = posix_spawnattr_setflags (&attrp, POSIX_SPAWN_SETSID);
-      if (res != 0)
-	{
-	  errno = res;
-	  FAIL_EXIT1 ("posix_spawnattr_setflags: %m");
-	}
-    }
-
-  /* Program to run.  */
-  char *args[2] = { (char *) "true", NULL };
-  pid_t child;
-
-  res = posix_spawnp (&child, "true", NULL, &attrp, args, environ);
-  /* posix_spawnattr_destroy is noop.  */
-  posix_spawnattr_destroy (&attrp);
-
-  if (res != 0)
-    {
-      errno = res;
-      FAIL_EXIT1 ("posix_spawnp: %m");
-    }
+    TEST_COMPARE (posix_spawnattr_setflags (&attrp, POSIX_SPAWN_SETSID), 0);
+
+  /* 1 or 4 elements from initial_argv:
+       + path to ld.so          optional
+       + --library-path         optional
+       + the library path       optional
+       + application name
+       + --direct
+       + --restart
+       + pidfile  */
+  int argv_size = initial_argv_count + 2;
+  char *args[argv_size];
+  int argc = 0;
+
+  for (char **arg = initial_argv; *arg != NULL; arg++)
+    args[argc++] = *arg;
+  args[argc++] = pidfile;
+  args[argc] = NULL;
+  TEST_VERIFY (argc < argv_size);
+
+  PID_T_TYPE pid;
+  TEST_COMPARE (POSIX_SPAWN (&pid, args[0], NULL, &attrp, args, environ), 0);
+  TEST_COMPARE (posix_spawnattr_destroy (&attrp), 0);
+
+  siginfo_t sinfo;
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
+
+  pid_t child_sid = read_child_sid ();
 
   /* Child should have a different session ID than parent.  */
-  child_sid = getsid (child);
-
-  if (child_sid == (pid_t) -1)
-    FAIL_EXIT1 ("getsid (%i): %m", child);
+  TEST_VERIFY (child_sid != (pid_t) -1);
 
   if (test_setsid)
-    {
-      if (child_sid == sid)
-	FAIL_EXIT1 ("child session ID matched parent one");
-    }
+    TEST_VERIFY (child_sid != sid);
   else
-    {
-      if (child_sid != sid)
-	FAIL_EXIT1 ("child session ID did not match parent one");
-    }
+    TEST_VERIFY (child_sid == sid);
 }
 
 static int
-do_test (void)
+do_test (int argc, char *argv[])
 {
+  /* We must have either:
+
+     - one or four parameters if called initially:
+       + argv[1]: path for ld.so        optional
+       + argv[2]: "--library-path"      optional
+       + argv[3]: the library path      optional
+       + argv[4]: the application name
+
+     - six parameters left if called through re-execution:
+       + argv[5/1]: the application name
+       + argv[6/2]: the pidfile
+
+     * When built with --enable-hardcoded-path-in-tests or issued without
+       using the loader directly.  */
+
+  if (restart)
+    {
+      handle_restart (argv[1]);
+      return 0;
+    }
+
+  TEST_VERIFY_EXIT (argc == 2 || argc == 5);
+
+  int i;
+  for (i = 0; i < argc - 1; i++)
+    initial_argv[i] = argv[i + 1];
+  initial_argv[i++] = (char *) "--direct";
+  initial_argv[i++] = (char *) "--restart";
+  initial_argv_count = i;
+
+  create_temp_file ("tst-posix_spawn-setsid-", &pidfile);
+
   do_test_setsid (false);
   do_test_setsid (true);
 
   return 0;
 }
 
+#define TEST_FUNCTION_ARGV do_test
 #include <support/test-driver.c>
diff --git a/posix/tst-spawn-chdir.c b/posix/tst-spawn-chdir.c
index b335092d7f..c01ca6692d 100644
--- a/posix/tst-spawn-chdir.c
+++ b/posix/tst-spawn-chdir.c
@@ -29,7 +29,9 @@
 #include <support/test-driver.h>
 #include <support/xstdio.h>
 #include <support/xunistd.h>
+#include <sys/wait.h>
 #include <unistd.h>
+#include <tst-spawn.h>
 
 /* Reads the file at PATH, which must consist of exactly one line.
    Removes the line terminator at the end of the file.  */
@@ -169,17 +171,18 @@ do_test (void)
 
           char *const argv[] = { (char *) "pwd", NULL };
           char *const envp[] = { NULL } ;
-          pid_t pid;
+          PID_T_TYPE pid;
           if (do_spawnp)
-            TEST_COMPARE (posix_spawnp (&pid, "pwd", &actions,
+            TEST_COMPARE (POSIX_SPAWNP (&pid, "pwd", &actions,
                                         NULL, argv, envp), 0);
           else
-            TEST_COMPARE (posix_spawn (&pid, "subdir/pwd-symlink", &actions,
+            TEST_COMPARE (POSIX_SPAWN (&pid, "subdir/pwd-symlink", &actions,
                                        NULL, argv, envp), 0);
           TEST_VERIFY (pid > 0);
-          int status;
-          xwaitpid (pid, &status, 0);
-          TEST_COMPARE (status, 0);
+          siginfo_t sinfo;
+          TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+          TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+          TEST_COMPARE (sinfo.si_status, 0);
 
           /* Check that the current directory did not change.  */
           {
diff --git a/posix/tst-spawn.c b/posix/tst-spawn.c
index 6782a322fc..c44d90756a 100644
--- a/posix/tst-spawn.c
+++ b/posix/tst-spawn.c
@@ -25,11 +25,13 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/param.h>
+#include <sys/wait.h>
 
 #include <support/check.h>
 #include <support/xunistd.h>
 #include <support/temp_file.h>
 #include <support/support.h>
+#include <tst-spawn.h>
 
 
 /* Nonzero if the program gets called via `exec'.  */
@@ -143,9 +145,9 @@ handle_restart (const char *fd1s, const char *fd2s, const char *fd3s,
 static int
 do_test (int argc, char *argv[])
 {
-  pid_t pid;
+  PID_T_TYPE pid;
   int fd4;
-  int status;
+  siginfo_t sinfo;
   posix_spawn_file_actions_t actions;
   char fd1name[18];
   char fd2name[18];
@@ -233,17 +235,16 @@ do_test (int argc, char *argv[])
   spargv[i++] = fd5name;
   spargv[i] = NULL;
 
-  TEST_COMPARE (posix_spawn (&pid, argv[1], &actions, NULL, spargv, environ),
+  TEST_COMPARE (POSIX_SPAWN (&pid, argv[1], &actions, NULL, spargv, environ),
 		0);
 
   /* Wait for the children.  */
-  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 
   /* Same test but with a NULL pid argument.  */
-  TEST_COMPARE (posix_spawn (NULL, argv[1], &actions, NULL, spargv, environ),
+  TEST_COMPARE (POSIX_SPAWN (NULL, argv[1], &actions, NULL, spargv, environ),
 		0);
 
   /* Cleanup.  */
@@ -251,10 +252,9 @@ do_test (int argc, char *argv[])
   free (name3_copy);
 
   /* Wait for the children.  */
-  xwaitpid (-1, &status, 0);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 
   return 0;
 }
diff --git a/posix/tst-spawn.h b/posix/tst-spawn.h
new file mode 100644
index 0000000000..a6f2dc8680
--- /dev/null
+++ b/posix/tst-spawn.h
@@ -0,0 +1,36 @@
+/* Generic definitions for posix_spawn tests.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef PID_T_TYPE
+# define PID_T_TYPE pid_t
+#endif
+
+#ifndef POSIX_SPAWN
+# define POSIX_SPAWN(__child, __path, __actions, __attr, __argv, __envp) \
+  posix_spawn (__child, __path, __actions, __attr, __argv, __envp)
+#endif
+
+#ifndef POSIX_SPAWNP
+# define POSIX_SPAWNP(__child, __path, __actions, __attr, __argv, __envp) \
+  posix_spawnp (__child, __path, __actions, __attr, __argv, __envp)
+#endif
+
+#ifndef WAITID
+# define WAITID(__idtype, __id, __info, __opts) \
+  waitid (__idtype, __id, __info, __opts)
+#endif
diff --git a/posix/tst-spawn2.c b/posix/tst-spawn2.c
index 40dc692488..f5c1f13039 100644
--- a/posix/tst-spawn2.c
+++ b/posix/tst-spawn2.c
@@ -26,6 +26,7 @@
 #include <stdio.h>
 
 #include <support/check.h>
+#include <tst-spawn.h>
 
 int
 do_test (void)
@@ -35,9 +36,9 @@ do_test (void)
 
   const char *program = "/path/to/invalid/binary";
   char * const args[] = { 0 };
-  pid_t pid = -1;
+  PID_T_TYPE pid = -1;
 
-  int ret = posix_spawn (&pid, program, 0, 0, args, environ);
+  int ret = POSIX_SPAWN (&pid, program, 0, 0, args, environ);
   if (ret != ENOENT)
     {
       errno = ret;
@@ -51,14 +52,13 @@ do_test (void)
     FAIL_EXIT1 ("posix_spawn returned pid != -1 (%i)", (int) pid);
 
   /* Check if no child is actually created.  */
-  ret = waitpid (-1, NULL, 0);
-  if (ret != -1 || errno != ECHILD)
-    FAIL_EXIT1 ("waitpid: %m)");
+  TEST_COMPARE (WAITID (P_ALL, 0, NULL, WEXITED), -1);
+  TEST_COMPARE (errno, ECHILD);
 
   /* Same as before, but with posix_spawnp.  */
   char *args2[] = { (char*) program, 0 };
 
-  ret = posix_spawnp (&pid, args2[0], 0, 0, args2, environ);
+  ret = POSIX_SPAWNP (&pid, args2[0], 0, 0, args2, environ);
   if (ret != ENOENT)
     {
       errno = ret;
@@ -68,9 +68,8 @@ do_test (void)
   if (pid != -1)
     FAIL_EXIT1 ("posix_spawnp returned pid != -1 (%i)", (int) pid);
 
-  ret = waitpid (-1, NULL, 0);
-  if (ret != -1 || errno != ECHILD)
-    FAIL_EXIT1 ("waitpid: %m)");
+  TEST_COMPARE (WAITID (P_ALL, 0, NULL, WEXITED), -1);
+  TEST_COMPARE (errno, ECHILD);
 
   return 0;
 }
diff --git a/posix/tst-spawn3.c b/posix/tst-spawn3.c
index e7ce0fb386..64052dc911 100644
--- a/posix/tst-spawn3.c
+++ b/posix/tst-spawn3.c
@@ -16,6 +16,7 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <assert.h>
 #include <stdio.h>
 #include <spawn.h>
 #include <error.h>
@@ -27,9 +28,12 @@
 #include <sys/resource.h>
 #include <fcntl.h>
 #include <paths.h>
+#include <intprops.h>
 
 #include <support/check.h>
 #include <support/temp_file.h>
+#include <support/xunistd.h>
+#include <tst-spawn.h>
 
 static int
 do_test (void)
@@ -48,7 +52,6 @@ do_test (void)
 
   struct rlimit rl;
   int max_fd = 24;
-  int ret;
 
   /* Set maximum number of file descriptor to a low value to avoid open
      too many files in environments where RLIMIT_NOFILE is large and to
@@ -66,7 +69,7 @@ do_test (void)
   /* Exhauste the file descriptor limit with temporary files.  */
   int files[max_fd];
   int nfiles = 0;
-  for (;;)
+  for (; nfiles < max_fd; nfiles++)
     {
       int fd = create_temp_file ("tst-spawn3.", NULL);
       if (fd == -1)
@@ -75,75 +78,82 @@ do_test (void)
 	    FAIL_EXIT1 ("create_temp_file: %m");
 	  break;
 	}
-      files[nfiles++] = fd;
+      files[nfiles] = fd;
     }
+  assert (nfiles != 0);
 
   posix_spawn_file_actions_t a;
-  if (posix_spawn_file_actions_init (&a) != 0)
-    FAIL_EXIT1 ("posix_spawn_file_actions_init");
+  TEST_COMPARE (posix_spawn_file_actions_init (&a), 0);
 
   /* Executes a /bin/sh echo $$ 2>&1 > ${objpfx}tst-spawn3.pid .  */
   const char pidfile[] = OBJPFX "tst-spawn3.pid";
-  if (posix_spawn_file_actions_addopen (&a, STDOUT_FILENO, pidfile, O_WRONLY
-					| O_CREAT | O_TRUNC, 0644) != 0)
-    FAIL_EXIT1 ("posix_spawn_file_actions_addopen");
+  TEST_COMPARE (posix_spawn_file_actions_addopen (&a, STDOUT_FILENO, pidfile,
+						  O_WRONLY | O_CREAT | O_TRUNC,
+						  0644),
+		0);
 
-  if (posix_spawn_file_actions_adddup2 (&a, STDOUT_FILENO, STDERR_FILENO) != 0)
-    FAIL_EXIT1 ("posix_spawn_file_actions_adddup2");
+  TEST_COMPARE (posix_spawn_file_actions_adddup2 (&a, STDOUT_FILENO,
+						  STDERR_FILENO),
+		0);
 
   /* Since execve (called by posix_spawn) might require to open files to
      actually execute the shell script, setup to close the temporary file
      descriptors.  */
-  for (int i=0; i<nfiles; i++)
-    {
-      if (posix_spawn_file_actions_addclose (&a, files[i]))
-	FAIL_EXIT1 ("posix_spawn_file_actions_addclose");
-    }
+  int maxnfiles =
+#ifdef TST_SPAWN_PIDFD
+    /* The last file descriptor is closed before the pidfd_spawn retry so
+       it can be returned as the pidfd (otherwise clone fails with EMFILE),
+       so no close action is added for it.  */
+    nfiles - 1;
+#else
+    nfiles;
+#endif
+
+  for (int i = 0; i < maxnfiles; i++)
+    TEST_COMPARE (posix_spawn_file_actions_addclose (&a, files[i]), 0);
 
   char *spawn_argv[] = { (char *) _PATH_BSHELL, (char *) "-c",
 			 (char *) "echo $$", NULL };
-  pid_t pid;
-  if ((ret = posix_spawn (&pid, _PATH_BSHELL, &a, NULL, spawn_argv, NULL))
-       != 0)
-    {
-      errno = ret;
-      FAIL_EXIT1 ("posix_spawn: %m");
-    }
-
-  int status;
-  int err = waitpid (pid, &status, 0);
-  if (err != pid)
-    FAIL_EXIT1 ("waitpid: %m");
+  PID_T_TYPE pid;
+
+  {
+    int r = POSIX_SPAWN (&pid, _PATH_BSHELL, &a, NULL, spawn_argv, NULL);
+    if (r == ENOSYS)
+      FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+#ifdef TST_SPAWN_PIDFD
+    TEST_COMPARE (r, EMFILE);
+
+    /* Free up one file descriptor, so pidfd_spawn can return it.  */
+    xclose (files[nfiles - 1]);
+    nfiles--;
+    r = POSIX_SPAWN (&pid, _PATH_BSHELL, &a, NULL, spawn_argv, NULL);
+#endif
+    TEST_COMPARE (r, 0);
+  }
+
+  siginfo_t sinfo;
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 
   /* Close the temporary files descriptor so it can check posix_spawn
      output.  */
   for (int i=0; i<nfiles; i++)
-    {
-      if (close (files[i]))
-	FAIL_EXIT1 ("close: %m");
-    }
+    xclose (files[i]);
 
-  int pidfd = open (pidfile, O_RDONLY);
-  if (pidfd == -1)
-    FAIL_EXIT1 ("open: %m");
+  int pidfd = xopen (pidfile, O_RDONLY, 0);
 
-  char buf[64];
-  ssize_t n;
-  if ((n = read (pidfd, buf, sizeof (buf))) < 0)
-    FAIL_EXIT1 ("read: %m");
+  char buf[INT_BUFSIZE_BOUND (pid_t) + 1];
+  ssize_t n = read (pidfd, buf, sizeof (buf) - 1);
+  TEST_VERIFY_EXIT (n >= 0);
+  buf[n] = '\0';
 
-  unlink (pidfile);
+  xunlink (pidfile);
 
   /* We only expect to read the PID.  */
   char *endp;
   long int rpid = strtol (buf, &endp, 10);
-  if (*endp != '\n')
-    FAIL_EXIT1 ("*endp != \'n\'");
-  if (endp == buf)
-    FAIL_EXIT1 ("read empty line");
+  TEST_VERIFY (*endp == '\n' && endp != buf);
 
-  if (rpid != pid)
-    FAIL_EXIT1 ("found \"%s\", expected pid %ld\n", buf, (long int) pid);
+  TEST_COMPARE (rpid, sinfo.si_pid);
 
   return 0;
 }
diff --git a/posix/tst-spawn4.c b/posix/tst-spawn4.c
index 327f04ea6c..8bf8bd52df 100644
--- a/posix/tst-spawn4.c
+++ b/posix/tst-spawn4.c
@@ -24,6 +24,7 @@
 #include <support/xunistd.h>
 #include <support/check.h>
 #include <support/temp_file.h>
+#include <tst-spawn.h>
 
 static int
 do_test (void)
@@ -38,15 +39,15 @@ do_test (void)
 
   TEST_VERIFY_EXIT (chmod (scriptname, 0x775) == 0);
 
-  pid_t pid;
+  PID_T_TYPE pid;
   int status;
 
   /* Check if scripts without shebang are correctly not executed.  */
-  status = posix_spawn (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
+  status = POSIX_SPAWN (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
                         (char *[]) { 0 });
   TEST_VERIFY_EXIT (status == ENOEXEC);
 
-  status = posix_spawnp (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
+  status = POSIX_SPAWNP (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
                          (char *[]) { 0 });
   TEST_VERIFY_EXIT (status == ENOEXEC);
 
diff --git a/posix/tst-spawn5.c b/posix/tst-spawn5.c
index 6b3d11cf82..7850f3d7dd 100644
--- a/posix/tst-spawn5.c
+++ b/posix/tst-spawn5.c
@@ -33,6 +33,7 @@
 
 #include <arch-fd_to_filename.h>
 #include <array_length.h>
+#include <tst-spawn.h>
 
 /* Nonzero if the program gets called via `exec'.  */
 static int restart;
@@ -161,14 +162,13 @@ spawn_closefrom_test (posix_spawn_file_actions_t *fa, int lowfd, int highfd,
   args[argc] = NULL;
   TEST_VERIFY (argc < argv_size);
 
-  pid_t pid;
-  int status;
+  PID_T_TYPE pid;
+  siginfo_t sinfo;
 
-  TEST_COMPARE (posix_spawn (&pid, args[0], fa, NULL, args, environ), 0);
-  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (POSIX_SPAWN (&pid, args[0], fa, NULL, args, environ), 0);
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 }
 
 static void
diff --git a/posix/tst-spawn6.c b/posix/tst-spawn6.c
index 4e29d78168..94fb762f8b 100644
--- a/posix/tst-spawn6.c
+++ b/posix/tst-spawn6.c
@@ -32,6 +32,7 @@
 #include <sys/ioctl.h>
 #include <stdlib.h>
 #include <termios.h>
+#include <tst-spawn.h>
 
 #ifndef PATH_MAX
 # define PATH_MAX 1024
@@ -108,17 +109,15 @@ run_subprogram (int argc, char *argv[], const posix_spawnattr_t *attr,
   spargv[i] = NULL;
 
-  pid_t pid;
+  PID_T_TYPE pid;
-  TEST_COMPARE (posix_spawn (&pid, argv[1], actions, attr, spargv, environ),
+  TEST_COMPARE (POSIX_SPAWN (&pid, argv[1], actions, attr, spargv, environ),
 		exp_err);
   if (exp_err != 0)
     return;
 
-  int status;
-  TEST_COMPARE (xwaitpid (pid, &status, WUNTRACED), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSTOPPED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  siginfo_t sinfo;
+  TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 }
 
 static int
diff --git a/posix/tst-spawn7.c b/posix/tst-spawn7.c
index fb06915cb7..cc4498830b 100644
--- a/posix/tst-spawn7.c
+++ b/posix/tst-spawn7.c
@@ -24,7 +24,9 @@
 #include <support/check.h>
 #include <support/xsignal.h>
 #include <support/xunistd.h>
+#include <sys/wait.h>
 #include <unistd.h>
+#include <tst-spawn.h>
 
 /* Nonzero if the program gets called via `exec'.  */
 #define CMDLINE_OPTIONS \
@@ -81,14 +83,13 @@ spawn_signal_test (const char *type, const posix_spawnattr_t *attr)
 {
   spargs[check_type_argc] = (char*) type;
 
-  pid_t pid;
-  int status;
+  PID_T_TYPE pid;
+  siginfo_t sinfo;
 
-  TEST_COMPARE (posix_spawn (&pid, spargs[0], NULL, attr, spargs, environ), 0);
+  TEST_COMPARE (POSIX_SPAWN (&pid, spargs[0], NULL, attr, spargs, environ), 0);
-  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 }
 
 static void
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index d7b020154a..3ecfa184d0 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -62,6 +62,7 @@ sysdep_routines += \
   clock_adjtime \
   clone \
   clone-internal \
+  clone-pidfd-support \
   clone3 \
   closefrom_fallback \
   convert_scm_timestamps \
@@ -492,6 +493,8 @@ sysdep_headers += \
 sysdep_routines += \
   getcpu \
   oldglob \
+  pidfd_spawn \
+  pidfd_spawnp \
   sched_getcpu \
   spawnattr_getcgroup_np \
   spawnattr_setcgroup_np \
@@ -500,7 +503,16 @@ sysdep_routines += \
 tests += \
   tst-affinity \
   tst-affinity-pid \
+  tst-posix_spawn-setsid-pidfd \
   tst-spawn-cgroup \
+  tst-spawn-chdir-pidfd \
+  tst-spawn-pidfd \
+  tst-spawn2-pidfd \
+  tst-spawn3-pidfd \
+  tst-spawn4-pidfd \
+  tst-spawn5-pidfd \
+  tst-spawn6-pidfd \
+  tst-spawn7-pidfd \
   # tests
 
 tests-static += \
@@ -514,8 +526,14 @@ tests += \
 CFLAGS-fork.c = $(libio-mtsafe)
 CFLAGS-getpid.o = -fomit-frame-pointer
 CFLAGS-getpid.os = -fomit-frame-pointer
+CFLAGS-tst-spawn3-pidfd.c += -DOBJPFX=\"$(objpfx)\"
 
 tst-spawn-cgroup-ARGS = -- $(host-test-program-cmd)
+tst-spawn-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-spawn5-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-spawn6-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-spawn7-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-posix_spawn-setsid-pidfd-ARGS = -- $(host-test-program-cmd)
 endif
 
 ifeq ($(subdir),inet)
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 6d8a67039e..a8bae0c2a2 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -322,6 +322,8 @@ libc {
 %endif
   }
   GLIBC_2.39 {
+    pidfd_spawn;
+    pidfd_spawnp;
     posix_spawnattr_getcgroup_np;
     posix_spawnattr_setcgroup_np;
   }
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 0090827e01..6f23556067 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2673,5 +2673,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 9d099471b6..02c43beb13 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2782,6 +2782,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index d7ed2f66de..dd8e5912d8 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2434,5 +2434,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index 92e686defe..a751e5f5a9 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -554,6 +554,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index b503e642fc..0eda3459ed 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -551,6 +551,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
index a3aa020d5c..3254cfe9be 100644
--- a/sysdeps/unix/sysv/linux/bits/spawn_ext.h
+++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
@@ -37,4 +37,35 @@ extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__attr,
 
 #endif /* __USE_MISC */
 
+#ifdef __USE_GNU
+
+/* Spawn a new process executing PATH with the attributes described in *ATTRP.
+   Before running the process perform the actions described in FACTS.  Return
+   a PID file descriptor in PIDFD if process creation was successful and the
+   argument is non-null.
+
+   This function is a possible cancellation point and therefore not
+   marked with __THROW.  */
+extern int pidfd_spawn (int *__restrict __pidfd,
+			const char *__restrict __path,
+			const posix_spawn_file_actions_t *__restrict __facts,
+			const posix_spawnattr_t *__restrict __attrp,
+			char *const __argv[__restrict_arr],
+			char *const __envp[__restrict_arr])
+    __nonnull ((2, 5));
+
+/* Similar to `pidfd_spawn' but search for FILE in the PATH.
+
+   This function is a possible cancellation point and therefore not
+   marked with __THROW.  */
+extern int pidfd_spawnp (int *__restrict __pidfd,
+			 const char *__restrict __file,
+			 const posix_spawn_file_actions_t *__restrict __facts,
+			 const posix_spawnattr_t *__restrict __attrp,
+			 char *const __argv[__restrict_arr],
+			 char *const __envp[__restrict_arr])
+    __nonnull ((2, 5));
+
+#endif /* __USE_GNU */
+
 __END_DECLS
diff --git a/sysdeps/unix/sysv/linux/clone-pidfd-support.c b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
new file mode 100644
index 0000000000..4411e2b9ea
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
@@ -0,0 +1,60 @@
+/* Check if kernel supports PID file descriptors.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <atomic.h>
+#include <sys/wait.h>
+#include <sysdep.h>
+
+/* The PID file descriptors were added during multiple releases:
+   - Linux 5.2 added CLONE_PIDFD support for clone (the pidfd_send_signal
+     syscall was already available since Linux 5.1).
+   - Linux 5.3 added support for poll and CLONE_PIDFD for clone3.
+   - Linux 5.4 added P_PIDFD support on waitid.
+
+   For internal usage on spawn and fork, it only makes sense to return a
+   file descriptor if the caller can actually waitid on it.  */
+
+static int __waitid_pidfd_supported = 0;
+
+bool
+__clone_pidfd_supported (void)
+{
+  int state = atomic_load_relaxed (&__waitid_pidfd_supported);
+  if (state == 0)
+    {
+      /* Linux defines the maximum allocated file descriptor value as
+	 0x7fffffc0 (from fs/file.c):
+
+         #define __const_min(x, y) ((x) < (y) ? (x) : (y))
+         unsigned int sysctl_nr_open_max =
+	   __const_min(INT_MAX, ~(size_t)0/sizeof(void *)) & -BITS_PER_LONG;
+
+	 So we can detect whether the kernel supports all pidfd interfaces by
+	 using a valid but never allocated file descriptor: if it is not
+	 supported, waitid returns EINVAL, otherwise EBADF.
+
+         Also, waitid is a cancellation point, so issue the syscall
+	 directly.  */
+      int r = INTERNAL_SYSCALL_CALL (waitid, P_PIDFD, INT_MAX, NULL,
+				     WEXITED | WNOHANG);
+      state = r == -EBADF ? 1 : -1;
+      atomic_store_relaxed (&__waitid_pidfd_supported, state);
+    }
+
+  return state > 0;
+}
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index ec9e209b8d..4f4e99427b 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2710,5 +2710,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 961f88bf14..abc471dd0b 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index b6f5a4ab83..9f03c8a9a2 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2843,6 +2843,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index a404b99e68..ce1d20b722 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2608,6 +2608,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index 2f9f6e2332..8c3640b004 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2194,5 +2194,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index b7e9ab4558..a594916319 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -555,6 +555,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index c345da7e0a..7f61d4824d 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2786,6 +2786,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index a643d868a8..83ebb84ff3 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2759,5 +2759,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index fed535742c..89a0ff83bf 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2756,5 +2756,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 147bac3eaf..e21c752057 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2751,6 +2751,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index e550616576..42f470d397 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2749,6 +2749,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 56f414dbd0..6907f5f98b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2757,6 +2757,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index da704a2e2b..4b1f017a98 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index f5a157ea94..0d45902209 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2798,5 +2798,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 85b552f1cb..c59032ef14 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2180,5 +2180,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/pidfd_spawn.c b/sysdeps/unix/sysv/linux/pidfd_spawn.c
new file mode 100644
index 0000000000..cc76bf9935
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_spawn.c
@@ -0,0 +1,30 @@
+/* pidfd_spawn - Spawn a process and return a PID file descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+#include "spawn_int.h"
+
+int
+pidfd_spawn (int *pidfd, const char *path,
+	     const posix_spawn_file_actions_t *file_actions,
+	     const posix_spawnattr_t *attrp, char *const argv[],
+	     char *const envp[])
+{
+  return __spawni (pidfd, path, file_actions, attrp, argv, envp,
+		   SPAWN_XFLAGS_RET_PIDFD);
+}
diff --git a/sysdeps/unix/sysv/linux/pidfd_spawnp.c b/sysdeps/unix/sysv/linux/pidfd_spawnp.c
new file mode 100644
index 0000000000..858c0f3191
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_spawnp.c
@@ -0,0 +1,30 @@
+/* pidfd_spawnp - Spawn a process and return a PID file descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+#include "spawn_int.h"
+
+int
+pidfd_spawnp (int *pidfd, const char *path,
+	      const posix_spawn_file_actions_t *file_actions,
+	      const posix_spawnattr_t *attrp, char *const argv[],
+	      char *const envp[])
+{
+  return __spawni (pidfd, path, file_actions, attrp, argv, envp,
+		   SPAWN_XFLAGS_USE_PATH | SPAWN_XFLAGS_RET_PIDFD);
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index cadb16c12f..e014314d3e 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2825,6 +2825,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 50c5b99728..ac05154915 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2858,6 +2858,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 81c63385af..e13ee6e72a 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2579,6 +2579,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index af9be18108..0e8c9ab3fe 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2893,5 +2893,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 2266a88ad5..b0559a5a64 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,5 +2436,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 4776ae32b8..5f79a84016 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,5 +2636,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 5d1d7d07a5..498886ccb2 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2823,6 +2823,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index fffc32a0f4..51679c2990 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2616,6 +2616,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 43ff21447d..af7b6f5bc9 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2666,6 +2666,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 9ea18d5886..b766299f31 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2663,6 +2663,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index c6607d5385..f5b9200a33 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2818,6 +2818,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index a010a2bb16..f6012e6e17 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2631,6 +2631,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index f0d4c62ae6..d4ff23d955 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -68,6 +68,7 @@ struct posix_spawn_args
   int xflags;
   bool use_clone3;
   int err;
+  int pidfd;
 };
 
 /* Older version requires that shell script without shebang definition
@@ -309,7 +310,7 @@ fail:
 /* Spawn a new process executing PATH with the attributes describes in *ATTRP.
    Before running the process perform the actions described in FILE-ACTIONS. */
 static int
-__spawnix (pid_t * pid, const char *file,
+__spawnix (int *pid, const char *file,
 	   const posix_spawn_file_actions_t * file_actions,
 	   const posix_spawnattr_t * attrp, char *const argv[],
 	   char *const envp[], int xflags,
@@ -319,6 +320,17 @@ __spawnix (pid_t * pid, const char *file,
   struct posix_spawn_args args;
   int ec;
 
+  bool use_pidfd = xflags & SPAWN_XFLAGS_RET_PIDFD;
+
+  /* For CLONE_PIDFD, older kernels might not fail with unsupported flags or
+     some versions might not support waitid (P_PIDFD).  So to avoid the need
+     to handle the error on the helper process, check for full pidfd
+     support.
+     ENOSYS is returned because, without proper waitid support, pidfd_spawn
+     cannot be used correctly, regardless of its arguments.  */
+  if (use_pidfd && !__clone_pidfd_supported ())
+    return ENOSYS;
+
   /* To avoid imposing hard limits on posix_spawn{p} the total number of
      arguments is first calculated to allocate a mmap to hold all possible
      values.  */
@@ -368,6 +380,7 @@ __spawnix (pid_t * pid, const char *file,
   args.argv = argv;
   args.argc = argc;
   args.envp = envp;
+  args.pidfd = 0;
   args.xflags = xflags;
 
   internal_signal_block_all (&args.oldmask);
@@ -386,13 +399,16 @@ __spawnix (pid_t * pid, const char *file,
       /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
 	 __clone_internal_fallback.  */
       .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
+	       | (use_pidfd ? CLONE_PIDFD : 0)
 	       | CLONE_CLEAR_SIGHAND
 	       | CLONE_VM
 	       | CLONE_VFORK,
       .exit_signal = SIGCHLD,
       .stack = (uintptr_t) stack,
       .stack_size = stack_size,
-      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
+      .cgroup = (set_cgroup ? attrp->__cgroup : 0),
+      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
+      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
     };
 #ifdef HAVE_CLONE3_WRAPPER
   args.use_clone3 = true;
@@ -445,7 +461,7 @@ __spawnix (pid_t * pid, const char *file,
   __munmap (stack, stack_size);
 
   if ((ec == 0) && (pid != NULL))
-    *pid = new_pid;
+    *pid = use_pidfd ? args.pidfd : new_pid;
 
   internal_signal_restore_set (&args.oldmask);
 
diff --git a/sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c b/sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
new file mode 100644
index 0000000000..4372833f07
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-posix_spawn-setsid.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
new file mode 100644
index 0000000000..019527b31b
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn-chdir.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
new file mode 100644
index 0000000000..c430995af8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-pidfd.h b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
new file mode 100644
index 0000000000..ea51c22447
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
@@ -0,0 +1,63 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <spawn.h>
+#include <support/check.h>
+
+#define PID_T_TYPE int
+
+/* Call pidfd_spawn, treating ENOSYS as an unsupported test.  */
+static inline int
+pidfd_spawn_check (int *pidfd, const char *path,
+		   const posix_spawn_file_actions_t *fa,
+		   const posix_spawnattr_t *attr, char *const argv[],
+		   char *const envp[])
+{
+  int r = pidfd_spawn (pidfd, path, fa, attr, argv, envp);
+  if (r == ENOSYS)
+    FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+  return r;
+}
+
+#define POSIX_SPAWN(__pidfd, __path, __actions, __attr, __argv, __envp)	     \
+  pidfd_spawn_check (__pidfd, __path, __actions, __attr, __argv, __envp)
+
+static inline int
+pidfd_spawnp_check (int *pidfd, const char *file,
+		    const posix_spawn_file_actions_t *fa,
+		    const posix_spawnattr_t *attr,
+		    char *const argv[], char *const envp[])
+{
+  int r = pidfd_spawnp (pidfd, file, fa, attr, argv, envp);
+  if (r == ENOSYS)
+    FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+  return r;
+}
+
+#define POSIX_SPAWNP(__child, __path, __actions, __attr, __argv, __envp) \
+  pidfd_spawnp_check (__child, __path, __actions, __attr, __argv, __envp)
+
+#define WAITID(__idtype, __id, __info, __opts)				     \
+  ({									     \
+     __typeof (__idtype) __new_idtype = __idtype == P_PID		     \
+					? P_PIDFD : __idtype;		     \
+     waitid (__new_idtype, __id, __info, __opts);			     \
+  })
+
+#define TST_SPAWN_PIDFD 1
diff --git a/sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
new file mode 100644
index 0000000000..03ba7a3d15
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn2.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
new file mode 100644
index 0000000000..8ad9a16854
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
@@ -0,0 +1,20 @@
+/* Check posix_spawn add file actions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn3.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
new file mode 100644
index 0000000000..83922da7d1
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn4.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
new file mode 100644
index 0000000000..149c352bf8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn5.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
new file mode 100644
index 0000000000..d3f5859457
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn6.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c
new file mode 100644
index 0000000000..3aec86bec2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn7.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 3591b5de5e..e35bf54779 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2582,6 +2582,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index ffbd8f3738..e7d7eb61c0 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2688,5 +2688,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
-- 
2.34.1



* [PATCH v8 6/7] posix: Add fork_np (BZ 26371)
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (4 preceding siblings ...)
  2023-08-18 14:06 ` [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
@ 2023-08-18 14:06 ` Adhemerval Zanella
  2023-08-24  6:07   ` Florian Weimer
  2023-08-18 14:06 ` [PATCH v8 7/7] linux: Add pidfd_getpid Adhemerval Zanella
  2023-08-18 17:51 ` [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Rich Felker
  7 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer

Returning a pidfd allows a process to keep a race-free handle to a child
process.  Otherwise, to obtain a process file descriptor the caller needs
to use pidfd_open, which might still be subject to TOCTOU races.

The implementation requires that the kernel support the complete pidfd
interface, meaning that waitid (P_PIDFD) must be supported.  This ensures
that no racy workaround (such as reading the procfs fdinfo pid to use
along with the old wait interfaces) is required.  If the kernel does not
have the required support, the interface returns -1 and sets errno to
ENOSYS.

The interface is:

  typedef union
  {
    struct
    {
      __uint64_t fork_np_flags;
      int fork_np_pidfd;
      int fork_np_cgroup;
      int fork_np_exit_signal;
  #define fork_np_flags       __data.fork_np_flags
  #define fork_np_pidfd       __data.fork_np_pidfd
  #define fork_np_cgroup      __data.fork_np_cgroup
  #define fork_np_exit_signal __data.fork_np_exit_signal
    } __data;
    char __size [FORK_NP_ARGS_SIZE_VER0];
  } fork_np_args_t;

  #define FORK_NP_PIDFD        (1ULL << 1)
  #define FORK_NP_CGROUP       (1ULL << 2)
  #define FORK_NP_ASYNCSAFE    (1ULL << 3)
  #define FORK_NP_EXIT_SIGNAL  (1ULL << 4)

  pid_t fork_np (fork_np_args_t *args, size_t size)

SIZE must be the size of a supported fork_np_args_t version; otherwise,
the function fails with EINVAL.  Each member added in the future should
come with its own flag, so fork_np can be extended.

If ARGS has all members set to 0, no file descriptor is returned and
fork_np acts as fork.  If FORK_NP_PIDFD is set in the flags member, a
new file descriptor is returned in the pidfd member, with O_CLOEXEC set
by the kernel by default.  fork_np follows the fork/_Fork convention of
returning a positive value to the parent, a negative value on error, and
zero to the child.

If FORK_NP_CGROUP is set, the cgroup member (a file descriptor referring
to a cgroup v2 directory) specifies the cgroup into which the new process
is placed, by using the CLONE_INTO_CGROUP clone flag.

If FORK_NP_EXIT_SIGNAL is set, the signal in the exit_signal member is
sent to the parent when the new process terminates; if it is 0, no signal
is sent.  When using this flag, the parent process must specify the
__WALL or __WCLONE Linux-specific options when waiting for the child
with waitpid or waitid.

If FORK_NP_ASYNCSAFE is set, fork_np acts as _Fork, thus avoiding
running pthread_atfork handlers.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
---
 NEWS                                          |   7 +
 include/clone_internal.h                      |  17 ++
 manual/process.texi                           |  82 +++++-
 posix/Makefile                                |   3 +-
 posix/fork-internal.c                         | 127 ++++++++++
 posix/fork-internal.h                         |  36 +++
 posix/fork.c                                  | 107 +-------
 sysdeps/nptl/_Fork.c                          |   2 +-
 sysdeps/unix/sysv/linux/Makefile              |   3 +
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   1 +
 sysdeps/unix/sysv/linux/arch-fork.h           |  16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/bits/unistd_ext.h     |  51 ++++
 sysdeps/unix/sysv/linux/clone-internal.c      |  58 ++++-
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/fork_np.c             |  97 +++++++
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   1 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   1 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c  | 170 +++++++++++++
 sysdeps/unix/sysv/linux/tst-fork_np.c         | 236 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 50 files changed, 928 insertions(+), 119 deletions(-)
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/fork_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np.c

diff --git a/NEWS b/NEWS
index 97681e6796..00e9553e8f 100644
--- a/NEWS
+++ b/NEWS
@@ -27,6 +27,13 @@ Major new features:
   The pidfd functionality avoids the issue of PID reuse with the traditional
   posix_spawn interface.
 
+* On Linux, the fork_np function has been added.  It has semantics similar
+  to fork or _Fork, cloning the calling process, and extends the fork
+  functionality by allowing the return of a process file descriptor (as
+  pidfd_spawn does), setting the cgroup v2 of the new process (as
+  posix_spawnattr_setcgroup_np does), setting a different signal on process
+  termination, and making the function act as _Fork.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/include/clone_internal.h b/include/clone_internal.h
index 567160ebb5..340cc39a37 100644
--- a/include/clone_internal.h
+++ b/include/clone_internal.h
@@ -2,6 +2,8 @@
 #define _CLONE_INTERNAL_H
 
 #include <clone3.h>
+#include <stdbool.h>
+#include <stdint.h>
 
 /* The clone3 syscall provides a superset of the functionality of the clone
    interface.  The kernel might extend __CL_ARGS struct in the future, with
@@ -35,6 +37,21 @@ extern int __clone_internal_fallback (struct clone_args *__cl_args,
 				      void *__arg)
      attribute_hidden;
 
+/* Call the clone3/clone syscall with fork semantics (i.e. no stack setting
+   required).  EXTRA_FLAGS defines any additional flags to be used besides
+   CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID, PIDFD indicates where the
+   process file descriptor (set with CLONE_PIDFD) should be returned, and
+   CGROUP specifies the cgroup v2 (set with CLONE_INTO_CGROUP).
+
+   Similar to __clone3_internal, it uses a sticky check to avoid reissuing
+   the clone3 syscall if the kernel does not support it.
+
+   It does not provide a CLONE_INTO_CGROUP/CGROUP fallback if clone3 is not
+   supported; in this case the function returns -1/ENOTSUP.  */
+extern int __clone_fork (uint64_t __extra_flags, void *__pidfd, int __cgroup,
+			 int __exit_signal)
+     attribute_hidden;
+
 /* Return whether the kernel supports pid file descriptor, including clone
    with CLONE_PIDFD and waitid with P_PIDFD.  */
 extern bool __clone_pidfd_supported (void) attribute_hidden;
diff --git a/manual/process.texi b/manual/process.texi
index 68361c3f61..e6ac1f934f 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -137,12 +137,12 @@ creating a process and making it run another program.
 @cindex subprocess
 A new processes is created when one of the functions
 @code{posix_spawn}, @code{fork}, @code{_Fork}, @code{vfork}, or
-@code{pidfd_spawn} is called.  (The @code{system} and @code{popen} also
-create new processes internally.)  Due to the name of the @code{fork}
-function, the act of creating a new process is sometimes called
-@dfn{forking} a process.  Each new process (the @dfn{child process} or
-@dfn{subprocess}) is allocated a process ID, distinct from the process
-ID of the parent process.  @xref{Process Identification}.
+@code{pidfd_spawn}, or @code{fork_np} is called.  (The @code{system}
+and @code{popen} functions also create new processes internally.)  Due to
+the name of the @code{fork} function, the act of creating a new process is
+sometimes called @dfn{forking} a process.  Each new process (the
+@dfn{child process} or @dfn{subprocess}) is allocated a process ID,
+distinct from the process ID of the parent process.  @xref{Process Identification}.
 
 After forking a child process, both the parent and child processes
 continue to execute normally.  If you want your program to wait for a
@@ -153,10 +153,10 @@ limited information about why the child terminated---for example, its
 exit status code.
 
 A newly forked child process continues to execute the same program as
-its parent process, at the point where the @code{fork} or @code{_Fork}
-call returns.  You can use the return value from @code{fork} or
-@code{_Fork} to tell whether the program is running in the parent process
-or the child.
+its parent process, at the point where the @code{fork}, @code{_Fork},
+or @code{fork_np} call returns.  You can use the return value from
+@code{fork}, @code{_Fork}, or @code{fork_np} to tell whether the
+program is running in the parent process or the child.
 
 @cindex process image
 Having several processes run the same program is only occasionally
@@ -362,6 +362,68 @@ the proper precautions for using @code{vfork}, your program will still
 work even if the system uses @code{fork} instead.
 @end deftypefun
 
+@deftp {Data Type} {fork_np_args_t}
+@standards{GNU, unistd.h}
+This union is used along with @code{fork_np} to enable extra
+functionality.
+
+@table @code
+@item uint64_t fork_np_flags
+If @code{FORK_NP_PIDFD} is set, the process file descriptor will be
+returned in @code{fork_np_pidfd}.
+If @code{FORK_NP_CGROUP} is set, the value of @code{fork_np_cgroup} will be used
+to specify a different cgroup v2 in which to start the new process.
+If @code{FORK_NP_ASYNCSAFE} is set, @code{fork_np} will not run any
+atfork handlers (similar to @code{_Fork}).
+If @code{FORK_NP_EXIT_SIGNAL} is set, the signal defined in @code{fork_np_exit_signal}
+will be sent on process termination.
+
+@item int fork_np_pidfd
+Return the process file descriptor if @code{FORK_NP_PIDFD} is set.
+
+@item int fork_np_cgroup
+Set the cgroupv2 to be used on the new process.
+
+@item int fork_np_exit_signal
+Define which signal to send on process termination.
+@end table
+
+This union is a GNU extension.
+@end deftp
+
+@deftypefun pid_t fork_np (fork_np_args_t *@var{args}, size_t @var{len})
+@standards{GNU, unistd.h}
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+The @code{fork_np} function is similar to @code{fork} in both semantics and
+return value, but allows extra functionality through the @var{args}
+parameter.  @var{len} must be the size of the object pointed to by
+@var{args}; otherwise the function fails.
+
+If @code{FORK_NP_PIDFD} is set in @code{fork_np_flags} and the process is
+successfully created, @code{fork_np_pidfd} in @var{args} will contain a
+file descriptor that can be used with other pidfd functions (such as
+@code{pidfd_send_signal}, or @code{waitid} with @code{P_PIDFD}).
+
+If @code{FORK_NP_CGROUP} is set in @code{fork_np_flags}, the
+@code{fork_np_cgroup} value from @var{args} will be used as the cgroups v2
+control group on process creation.  There is no fallback implementation,
+meaning that if the kernel does not provide the required support, an error is returned.
+
+If @code{FORK_NP_ASYNCSAFE} is set on @code{fork_np_flags}, @code{fork_np}
+acts as @code{_Fork}, where it does not invoke any callbacks registered with
+@code{pthread_atfork}, nor does it reset internal state or locks (such as the
+@code{malloc} locks).
+
+If @code{FORK_NP_EXIT_SIGNAL} is set in @code{fork_np_flags}, the signal number
+@code{fork_np_exit_signal} from @var{args} will be sent to the parent on process
+termination.  The @code{0} value is also valid, meaning that no signal
+will be sent.  @strong{NB:} When using this flag, the parent process must
+specify the @code{__WALL} or @code{__WCLONE} options when waiting for the
+child with @code{waitpid} or @code{waitid}.
+
+This function is a GNU extension and specific to Linux.
+@end deftypefun
+
 @node Executing a File
 @section Executing a File
 @cindex executing a file
diff --git a/posix/Makefile b/posix/Makefile
index 905cf9fb54..949f5632eb 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -85,6 +85,7 @@ routines := \
   fexecve \
   fnmatch \
   fork \
+  fork-internal \
   fpathconf \
   gai_strerror \
   get_child_max \
@@ -589,7 +590,7 @@ CFLAGS-execl.os = -fomit-frame-pointer
 CFLAGS-execvp.os = -fomit-frame-pointer
 CFLAGS-execlp.os = -fomit-frame-pointer
 CFLAGS-nanosleep.c += -fexceptions -fasynchronous-unwind-tables
-CFLAGS-fork.c = $(libio-mtsafe) $(config-cflags-wno-ignored-attributes)
+CFLAGS-fork-internal.c = $(libio-mtsafe) $(config-cflags-wno-ignored-attributes)
 
 tstgetopt-ARGS = -a -b -cfoobar --required foobar --optional=bazbug \
 		--none random --col --color --colour
diff --git a/posix/fork-internal.c b/posix/fork-internal.c
new file mode 100644
index 0000000000..a5e47cbe53
--- /dev/null
+++ b/posix/fork-internal.c
@@ -0,0 +1,127 @@
+/* Internal fork definitions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <fork.h>
+#include <fork-internal.h>
+#include <ldsodefs.h>
+#include <libio/libioP.h>
+#include <malloc/malloc-internal.h>
+#include <register-atfork.h>
+#include <stdio-lock.h>
+#include <unwind-link.h>
+
+static void
+fresetlockfiles (void)
+{
+  _IO_ITER i;
+
+  for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
+    if ((_IO_iter_file (i)->_flags & _IO_USER_LOCK) == 0)
+      _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
+}
+
+uint64_t
+__fork_pre (bool multiple_threads, struct nss_database_data *nss_database_data)
+{
+  uint64_t lastrun = __run_prefork_handlers (multiple_threads);
+
+  /* If we are not running multiple threads, we do not have to
+     preserve lock state.  If fork runs from a signal handler, only
+     async-signal-safe functions can be used in the child.  These data
+     structures are only used by unsafe functions, so their state does
+     not matter if fork was called from a signal handler.  */
+  if (multiple_threads)
+    {
+      call_function_static_weak (__nss_database_fork_prepare_parent,
+				 nss_database_data);
+
+      _IO_list_lock ();
+
+      /* Acquire malloc locks.  This needs to come last because fork
+	 handlers may use malloc, and the libio list lock has an
+	 indirect malloc dependency as well (via the getdelim
+	 function).  */
+      call_function_static_weak (__malloc_fork_lock_parent);
+    }
+
+  return lastrun;
+}
+
+void
+__fork_post (struct fork_post_state_t *state,
+	     struct nss_database_data *nss_database_data)
+{
+  if (state->pid == 0)
+    {
+      fork_system_setup ();
+
+      /* Reset the lock state in the multi-threaded case.  */
+      if (state->multiple_threads)
+	{
+	  __libc_unwind_link_after_fork ();
+
+	  fork_system_setup_after_fork ();
+
+	  /* Release malloc locks.  */
+	  call_function_static_weak (__malloc_fork_unlock_child);
+
+	  /* Reset the file list.  These are recursive mutexes.  */
+	  fresetlockfiles ();
+
+	  /* Reset locks in the I/O code.  */
+	  _IO_list_resetlock ();
+
+	  call_function_static_weak (__nss_database_fork_subprocess,
+				     nss_database_data);
+	}
+
+      /* Reset the lock the dynamic loader uses to protect its data.  */
+      __rtld_lock_initialize (GL(dl_load_lock));
+
+      /* Reset the lock protecting dynamic TLS related data.  */
+      __rtld_lock_initialize (GL(dl_load_tls_lock));
+
+      reclaim_stacks ();
+
+      /* Run the handlers registered for the child.  */
+      __run_postfork_handlers (atfork_run_child, state->multiple_threads,
+			       state->lastrun);
+    }
+  else
+    {
+      /* If _Fork failed, preserve its errno value.  */
+      int save_errno = errno;
+
+      /* Release acquired locks in the multi-threaded case.  */
+      if (state->multiple_threads)
+	{
+	  /* Release malloc locks, parent process variant.  */
+	  call_function_static_weak (__malloc_fork_unlock_parent);
+
+	  /* We execute this even if the 'fork' call failed.  */
+	  _IO_list_unlock ();
+	}
+
+      /* Run the handlers registered for the parent.  */
+      __run_postfork_handlers (atfork_run_parent, state->multiple_threads,
+			       state->lastrun);
+
+      if (state->pid < 0)
+	__set_errno (save_errno);
+    }
+}
diff --git a/posix/fork-internal.h b/posix/fork-internal.h
new file mode 100644
index 0000000000..5017061e1e
--- /dev/null
+++ b/posix/fork-internal.h
@@ -0,0 +1,36 @@
+/* Internal fork definitions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _FORK_INTERNAL_H
+#define _FORK_INTERNAL_H
+
+#include <stdint.h>
+#include <nss/nss_database.h>
+
+struct fork_post_state_t
+{
+  bool multiple_threads;
+  pid_t pid;
+  uint64_t lastrun;
+};
+
+uint64_t __fork_pre (bool, struct nss_database_data *) attribute_hidden;
+void __fork_post (struct fork_post_state_t *, struct nss_database_data *)
+  attribute_hidden;
+
+#endif
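To make the pre/fork/post split concrete, here is a minimal standalone sketch (not glibc code) of how a caller threads a single state object through the three steps. It also shows the errno handling from the parent branch of __fork_post. All demo_* names are hypothetical stand-ins. The fork result is simulated, so the sketch never creates a process.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct fork_post_state_t.  */
struct demo_fork_state
{
  bool multiple_threads;
  pid_t pid;
  unsigned long lastrun;
};

/* Hypothetical "pre" step: pretend some atfork handlers ran.  */
static unsigned long
demo_pre (bool multiple_threads)
{
  return multiple_threads ? 3 : 0;
}

/* Hypothetical "post" step, parent side: the cleanup may clobber errno,
   so save it first and restore it if the fork step failed.  */
static void
demo_post (struct demo_fork_state *state)
{
  int save_errno = errno;
  errno = 0;			/* Simulates an unlock or handler changing errno.  */
  if (state->pid < 0)
    errno = save_errno;
}

/* Compose the three steps.  FAKE_PID and FAKE_ERRNO simulate the result
   of the process-creation syscall.  */
static pid_t
demo_fork (bool multiple_threads, pid_t fake_pid, int fake_errno)
{
  struct demo_fork_state state = { .multiple_threads = multiple_threads };

  state.lastrun = demo_pre (state.multiple_threads);
  state.pid = fake_pid;
  if (fake_pid < 0)
    errno = fake_errno;
  demo_post (&state);
  return state.pid;
}
```

For example, demo_fork (true, -1, EAGAIN) returns -1 with errno still equal to EAGAIN, even though demo_post overwrote errno during its simulated cleanup.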
diff --git a/posix/fork.c b/posix/fork.c
index b4aaa9fa6d..1708473e72 100644
--- a/posix/fork.c
+++ b/posix/fork.c
@@ -16,25 +16,10 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <fork.h>
-#include <libio/libioP.h>
-#include <ldsodefs.h>
-#include <malloc/malloc-internal.h>
-#include <nss/nss_database.h>
-#include <register-atfork.h>
-#include <stdio-lock.h>
+#include <fork-internal.h>
 #include <sys/single_threaded.h>
 #include <unwind-link.h>
-
-static void
-fresetlockfiles (void)
-{
-  _IO_ITER i;
-
-  for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
-    if ((_IO_iter_file (i)->_flags & _IO_USER_LOCK) == 0)
-      _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
-}
+#include <unistd.h>
 
 pid_t
 __libc_fork (void)
@@ -45,92 +30,18 @@ __libc_fork (void)
      requirement for fork (Austin Group tracker issue #62) this is
      best effort to make is async-signal-safe at least for single-thread
      case.  */
-  bool multiple_threads = !SINGLE_THREAD_P;
-  uint64_t lastrun;
-
-  lastrun = __run_prefork_handlers (multiple_threads);
-
+  struct fork_post_state_t state = {
+      .multiple_threads = !SINGLE_THREAD_P
+  };
   struct nss_database_data nss_database_data;
 
-  /* If we are not running multiple threads, we do not have to
-     preserve lock state.  If fork runs from a signal handler, only
-     async-signal-safe functions can be used in the child.  These data
-     structures are only used by unsafe functions, so their state does
-     not matter if fork was called from a signal handler.  */
-  if (multiple_threads)
-    {
-      call_function_static_weak (__nss_database_fork_prepare_parent,
-				 &nss_database_data);
-
-      _IO_list_lock ();
-
-      /* Acquire malloc locks.  This needs to come last because fork
-	 handlers may use malloc, and the libio list lock has an
-	 indirect malloc dependency as well (via the getdelim
-	 function).  */
-      call_function_static_weak (__malloc_fork_lock_parent);
-    }
-
-  pid_t pid = _Fork ();
-
-  if (pid == 0)
-    {
-      fork_system_setup ();
-
-      /* Reset the lock state in the multi-threaded case.  */
-      if (multiple_threads)
-	{
-	  __libc_unwind_link_after_fork ();
-
-	  fork_system_setup_after_fork ();
-
-	  /* Release malloc locks.  */
-	  call_function_static_weak (__malloc_fork_unlock_child);
-
-	  /* Reset the file list.  These are recursive mutexes.  */
-	  fresetlockfiles ();
-
-	  /* Reset locks in the I/O code.  */
-	  _IO_list_resetlock ();
-
-	  call_function_static_weak (__nss_database_fork_subprocess,
-				     &nss_database_data);
-	}
-
-      /* Reset the lock the dynamic loader uses to protect its data.  */
-      __rtld_lock_initialize (GL(dl_load_lock));
-
-      /* Reset the lock protecting dynamic TLS related data.  */
-      __rtld_lock_initialize (GL(dl_load_tls_lock));
-
-      reclaim_stacks ();
-
-      /* Run the handlers registered for the child.  */
-      __run_postfork_handlers (atfork_run_child, multiple_threads, lastrun);
-    }
-  else
-    {
-      /* If _Fork failed, preserve its errno value.  */
-      int save_errno = errno;
-
-      /* Release acquired locks in the multi-threaded case.  */
-      if (multiple_threads)
-	{
-	  /* Release malloc locks, parent process variant.  */
-	  call_function_static_weak (__malloc_fork_unlock_parent);
-
-	  /* We execute this even if the 'fork' call failed.  */
-	  _IO_list_unlock ();
-	}
+  state.lastrun = __fork_pre (state.multiple_threads, &nss_database_data);
 
-      /* Run the handlers registered for the parent.  */
-      __run_postfork_handlers (atfork_run_parent, multiple_threads, lastrun);
+  state.pid = _Fork ();
 
-      if (pid < 0)
-	__set_errno (save_errno);
-    }
+  __fork_post (&state, &nss_database_data);
 
-  return pid;
+  return state.pid;
 }
 weak_alias (__libc_fork, __fork)
 libc_hidden_def (__fork)
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index f8322ae557..397f059fb0 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -22,7 +22,7 @@
 pid_t
 _Fork (void)
 {
-  pid_t pid = arch_fork (&THREAD_SELF->tid);
+  pid_t pid = arch_fork (SIGCHLD, NULL, &THREAD_SELF->tid);
   if (pid == 0)
     {
       struct pthread *self = THREAD_SELF;
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 3ecfa184d0..c9164e9d0a 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -493,6 +493,7 @@ sysdep_headers += \
 sysdep_routines += \
   getcpu \
   oldglob \
+  fork_np \
   pidfd_spawn \
   pidfd_spawnp \
   sched_getcpu \
@@ -503,6 +504,8 @@ sysdep_routines += \
 tests += \
   tst-affinity \
   tst-affinity-pid \
+  tst-fork_np \
+  tst-fork_np-cgroup \
   tst-posix_spawn-setsid-pidfd \
   tst-spawn-cgroup \
   tst-spawn-chdir-pidfd \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index a8bae0c2a2..c677631f24 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -322,6 +322,7 @@ libc {
 %endif
   }
   GLIBC_2.39 {
+    fork_np;
     pidfd_spawn;
     pidfd_spawnp;
     posix_spawnattr_getcgroup_np;
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 6f23556067..dab02f0087 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2673,6 +2673,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 02c43beb13..1db00408cf 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2782,6 +2782,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index dd8e5912d8..032aacc1ba 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2434,6 +2434,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arch-fork.h b/sysdeps/unix/sysv/linux/arch-fork.h
index 0e0eccbf38..f978d4c4f4 100644
--- a/sysdeps/unix/sysv/linux/arch-fork.h
+++ b/sysdeps/unix/sysv/linux/arch-fork.h
@@ -32,24 +32,24 @@
    override it with one of the supported calling convention (check generic
    kernel-features.h for the clone abi variants).  */
 static inline pid_t
-arch_fork (void *ctid)
+arch_fork (int flags, void *ptid, void *ctid)
 {
-  const int flags = CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
   long int ret;
+  flags |= CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID;
 #ifdef __ASSUME_CLONE_BACKWARDS
 # ifdef INLINE_CLONE_SYSCALL
-  ret = INLINE_CLONE_SYSCALL (flags, 0, NULL, 0, ctid);
+  ret = INLINE_CLONE_SYSCALL (flags, 0, ptid, 0, ctid);
 # else
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, 0, ctid);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, ptid, 0, ctid);
 # endif
 #elif defined(__ASSUME_CLONE_BACKWARDS2)
-  ret = INLINE_SYSCALL_CALL (clone, 0, flags, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, 0, flags, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE_BACKWARDS3)
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, 0, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE2)
-  ret = INLINE_SYSCALL_CALL (clone2, flags, 0, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone2, flags, 0, 0, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE_DEFAULT)
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, ptid, ctid, 0);
 #else
 # error "Undefined clone variant"
 #endif
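Now that the caller supplies the flags, arch_fork follows the legacy clone(2) convention. In that convention, the low byte of the flags (CSIGNAL) carries the signal sent to the parent when the child exits. _Fork passes SIGCHLD in that byte, and __clone_fork ORs in exit_signal & 0xff. The small sketch below shows the encoding. It uses local copies of the <linux/sched.h> constant values so it does not depend on kernel headers, and the demo_* names are hypothetical.

```c
#include <signal.h>
#include <stdint.h>

/* Local copies of values from <linux/sched.h>.  */
#define DEMO_CSIGNAL              0x000000ffULL	/* Exit-signal byte.  */
#define DEMO_CLONE_CHILD_CLEARTID 0x00200000ULL
#define DEMO_CLONE_CHILD_SETTID   0x01000000ULL

/* Build legacy clone flags the way arch_fork's callers do: extra flags,
   the exit signal in the low byte, plus the TID bookkeeping flags.  */
static uint64_t
demo_legacy_clone_flags (uint64_t extra_flags, int exit_signal)
{
  uint64_t flags = extra_flags | ((uint64_t) exit_signal & DEMO_CSIGNAL);
  flags |= DEMO_CLONE_CHILD_SETTID | DEMO_CLONE_CHILD_CLEARTID;
  return flags;
}

/* Recover the exit signal encoded in legacy clone flags.  */
static int
demo_exit_signal_of (uint64_t flags)
{
  return (int) (flags & DEMO_CSIGNAL);
}
```

clone3 moves the exit signal into a separate exit_signal field, which is why the fallback path is the only place where the masking is needed.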
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index a751e5f5a9..9f3ef16280 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -554,6 +554,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 0eda3459ed..c2c6c8af6b 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -551,6 +551,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/bits/unistd_ext.h b/sysdeps/unix/sysv/linux/bits/unistd_ext.h
index c523ef67c1..1872728c51 100644
--- a/sysdeps/unix/sysv/linux/bits/unistd_ext.h
+++ b/sysdeps/unix/sysv/linux/bits/unistd_ext.h
@@ -47,4 +47,55 @@ extern __pid_t gettid (void) __THROW;
 # define CLOSE_RANGE_CLOEXEC (1U << 2)
 #endif
 
+#define FORK_NP_ARGS_SIZE_VER0 24
+typedef union
+{
+  struct
+  {
+    __uint64_t fork_np_flags;
+    int fork_np_pidfd;
+    int fork_np_cgroup;
+    int fork_np_exit_signal;
+#define fork_np_flags       __data.fork_np_flags
+#define fork_np_pidfd       __data.fork_np_pidfd
+#define fork_np_cgroup      __data.fork_np_cgroup
+#define fork_np_exit_signal __data.fork_np_exit_signal
+  } __data;
+  char __size [FORK_NP_ARGS_SIZE_VER0];
+} fork_np_args_t;
+
+/* Return the process file descriptor.  */
+#define FORK_NP_PIDFD        (1ULL << 1)
+/* Specify a different cgroup2 than the default one.  */
+#define FORK_NP_CGROUP       (1ULL << 2)
+/* Do not issue the pthread_atfork on process creation.  */
+#define FORK_NP_ASYNCSAFE    (1ULL << 3)
+/* Send a different signal to parent on child termination.  */
+#define FORK_NP_EXIT_SIGNAL  (1ULL << 4)
+
+/* Create a new process by duplicating the calling process, as fork does,
+   with additional control over how the child is created.
+
+   ARGS points to a fork_np_args_t and SIZE must be FORK_NP_ARGS_SIZE_VER0.
+
+   If FORK_NP_PIDFD is set in FLAGS, a process file descriptor for the
+   child is stored in PIDFD; it can be used with the other pidfd functions,
+   such as pidfd_send_signal.
+
+   If FORK_NP_CGROUP is set in FLAGS, CGROUP must be a file descriptor
+   referring to a cgroup v2 directory; the child is created in that cgroup.
+
+   If FORK_NP_ASYNCSAFE is set in FLAGS, fork_np does not invoke the
+   registered pthread_atfork callbacks (similar to _Fork).
+
+   If FORK_NP_EXIT_SIGNAL is set in FLAGS, EXIT_SIGNAL is sent to the
+   parent on child termination instead of SIGCHLD.
+
+   On success, the PID of the child process is returned in the parent,
+   and 0 is returned in the child.  On failure, -1 is returned in the
+   parent, errno is set, and no child process is created.  */
+extern __pid_t fork_np (fork_np_args_t *__args, __SIZE_TYPE__ __size)
+  __THROW;
+
+
 #endif /* __USE_GNU  */
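The fork_np_args_t layout can be checked in isolation. The sketch below replicates it under hypothetical demo_* names. The fields occupy 20 bytes, and the char array pins the union to FORK_NP_ARGS_SIZE_VER0 (24) bytes. This holds even on ABIs such as i386, where a uint64_t member is only 4-byte aligned inside a struct and the bare struct would be 20 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARGS_SIZE_VER0 24

/* Hypothetical replica of fork_np_args_t: the current fields overlaid
   with a fixed-size byte array, so sizeof is the same on every ABI.  */
typedef union
{
  struct
  {
    uint64_t flags;
    int pidfd;
    int cgroup;
    int exit_signal;
  } data;
  char size[DEMO_ARGS_SIZE_VER0];
} demo_args_t;

_Static_assert (sizeof (demo_args_t) == DEMO_ARGS_SIZE_VER0,
		"demo_args_t must be exactly VER0 bytes");

/* Bytes actually used by the version-0 fields; the rest is tail
   padding that a later version could assign to new fields.  */
static size_t
demo_used_bytes (void)
{
  return offsetof (demo_args_t, data.exit_signal) + sizeof (int);
}
```

Passing the size explicitly, as fork_np does, lets a future version grow the structure while still recognizing callers built against version 0.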
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index 790739cfce..d121be48bc 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -16,6 +16,7 @@
    License along with the GNU C Library.  If not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <arch-fork.h>
 #include <sysdep.h>
 #include <stddef.h>
 #include <errno.h>
@@ -43,6 +44,11 @@ _Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
 _Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
 		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
 
+#if !__ASSUME_CLONE3 && defined __NR_clone3
+/* Set to 0 if kernel does not support clone3 syscall.  */
+static int clone3_supported = 1;
+#endif
+
 int
 __clone_internal_fallback (struct clone_args *cl_args,
 			   int (*func) (void *arg), void *arg)
@@ -84,7 +90,6 @@ __clone3_internal (struct clone_args *cl_args, int (*func) (void *args),
 # if __ASSUME_CLONE3
   return __clone3 (cl_args, sizeof (*cl_args), func, arg);
 # else
-  static int clone3_supported = 1;
   if (atomic_load_relaxed (&clone3_supported) == 1)
     {
       int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
@@ -118,3 +123,54 @@ __clone_internal (struct clone_args *cl_args,
 }
 
 libc_hidden_def (__clone_internal)
+
+int
+__clone_fork (uint64_t extra_flags, void *pidfd, int cgroup, int exit_signal)
+{
+#ifdef __NR_clone3
+  struct clone_args clone_args =
+    {
+      .flags = extra_flags
+	       | CLONE_CHILD_SETTID
+	       | CLONE_CHILD_CLEARTID,
+      .exit_signal = exit_signal,
+      .cgroup = cgroup,
+      .child_tid = (uintptr_t) &THREAD_SELF->tid,
+      .pidfd = (uintptr_t) pidfd,
+      .parent_tid = (uintptr_t) pidfd
+    };
+#endif
+
+#if __ASSUME_CLONE3
+  return INLINE_SYSCALL_CALL (clone3, &clone_args, sizeof (clone_args));
+#else
+  /* Some architectures still do not provide clone3.  */
+  pid_t pid;
+# ifdef __NR_clone3
+  if (atomic_load_relaxed (&clone3_supported) == 1)
+    {
+      pid = INLINE_SYSCALL_CALL (clone3, &clone_args, sizeof (clone_args));
+      if (pid != -1 || errno != ENOSYS)
+	return pid;
+
+      atomic_store_relaxed (&clone3_supported, 0);
+    }
+# endif
+
+  if (!(extra_flags & CLONE_INTO_CGROUP))
+    {
+      int flags = extra_flags | (exit_signal & 0xff);
+      pid = arch_fork (flags, pidfd, &THREAD_SELF->tid);
+    }
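+  else
+    {
+      /* There is no fallback for CLONE_INTO_CGROUP without clone3.  Set
+	 errno unconditionally: if clone3 support was already ruled out
+	 by a previous call, it was not attempted here and errno would
+	 still hold a stale value.  */
+      pid = -1;
+      errno = ENOTSUP;
+    }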
+  return pid;
+#endif
+}
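The clone3 handling in __clone_fork is an instance of a common pattern. It tries the newer interface first and records an ENOSYS failure in a relaxed atomic flag, so later calls skip the probe. It then falls back to the older interface only for requests that interface can express. The standalone sketch below shows the pattern with simulated calls. The demo_* names are hypothetical, and ENOSYS comes from a stub rather than a real syscall.

```c
#include <errno.h>
#include <stdatomic.h>

/* 1 until the new call reports ENOSYS once; then it stays 0.  */
static atomic_int demo_new_supported = 1;
static int demo_new_calls;	/* Number of attempts, for illustration.  */
static int demo_kernel_has_new;	/* Simulated kernel capability.  */

/* Hypothetical stand-in for clone3.  */
static int
demo_new_call (void)
{
  demo_new_calls++;
  if (!demo_kernel_has_new)
    {
      errno = ENOSYS;
      return -1;
    }
  return 100;
}

/* Hypothetical stand-in for the legacy clone path.  */
static int
demo_old_call (void)
{
  return 200;
}

/* NEEDS_NEW_FEATURE plays the role of CLONE_INTO_CGROUP.  */
static int
demo_call (int needs_new_feature)
{
  if (atomic_load_explicit (&demo_new_supported, memory_order_relaxed) == 1)
    {
      int ret = demo_new_call ();
      if (ret != -1 || errno != ENOSYS)
	return ret;
      atomic_store_explicit (&demo_new_supported, 0, memory_order_relaxed);
    }

  /* The old interface cannot express this request.  */
  if (needs_new_feature)
    {
      errno = ENOTSUP;
      return -1;
    }
  return demo_old_call ();
}
```

Because later calls can skip the probe, the sketch sets ENOTSUP unconditionally on the no-fallback path instead of testing an errno value that may be stale.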
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 4f4e99427b..4112163af2 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2710,6 +2710,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/fork_np.c b/sysdeps/unix/sysv/linux/fork_np.c
new file mode 100644
index 0000000000..ca9a83bb22
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/fork_np.c
@@ -0,0 +1,97 @@
+/* fork_np - Duplicate the calling process, optionally returning a
+   process file descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+#include <clone_internal.h>
+#include <fork-internal.h>
+
+static pid_t
+fork_syscall (fork_np_args_t *args)
+{
+  bool use_pidfd = args->fork_np_flags & FORK_NP_PIDFD;
+  bool use_cgroup = args->fork_np_flags & FORK_NP_CGROUP;
+
+  int *pidfd = use_pidfd ? &args->fork_np_pidfd : NULL;
+  int cgroup = use_cgroup ? args->fork_np_cgroup : 0;
+
+  uint64_t extra_flags = (use_pidfd ? CLONE_PIDFD : 0)
+			 | (use_cgroup ? CLONE_INTO_CGROUP : 0);
+  int exit_signal = (args->fork_np_flags & FORK_NP_EXIT_SIGNAL)
+		     ? args->fork_np_exit_signal : SIGCHLD;
+
+  pid_t pid = __clone_fork (extra_flags, pidfd, cgroup, exit_signal);
+
+  if (pid == 0)
+    {
+      struct pthread *self = THREAD_SELF;
+
+      /* Initialize the robust mutex, check _Fork implementation for a full
+	 description why this is required.  */
+#if __PTHREAD_MUTEX_HAVE_PREV
+      self->robust_prev = &self->robust_head;
+#endif
+      self->robust_head.list = &self->robust_head;
+      INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
+			     sizeof (struct robust_list_head));
+    }
+  return pid;
+}
+
+#define SUPPORTED_FLAGS (FORK_NP_PIDFD			\
+			 | FORK_NP_CGROUP		\
+			 | FORK_NP_ASYNCSAFE		\
+			 | FORK_NP_EXIT_SIGNAL)
+
+_Static_assert (sizeof (fork_np_args_t) == FORK_NP_ARGS_SIZE_VER0,
+		"sizeof (fork_np_args_t) != FORK_NP_ARGS_SIZE_VER0");
+
+pid_t
+fork_np (fork_np_args_t *args, size_t size)
+{
+  if (size != FORK_NP_ARGS_SIZE_VER0)
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
+
+  if (args->fork_np_flags & ~(SUPPORTED_FLAGS))
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
+
+  if ((args->fork_np_flags & FORK_NP_CGROUP) && !__clone_pidfd_supported ())
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOSYS);
+
+  pid_t pid;
+  if (!(args->fork_np_flags & FORK_NP_ASYNCSAFE))
+    {
+      /* Run the atfork handlers and reset internal locks, as fork does.  */
+      struct fork_post_state_t state = {
+	  .multiple_threads = !SINGLE_THREAD_P
+      };
+      struct nss_database_data nss_database_data;
+
+      state.lastrun = __fork_pre (state.multiple_threads, &nss_database_data);
+      state.pid = fork_syscall (args);
+      /* Follow the usual fork semantics: the parent gets the child PID
+	 (or -1 on failure) and the child gets 0.  */
+      __fork_post (&state, &nss_database_data);
+
+      pid = state.pid;
+    }
+  else
+    pid = fork_syscall (args);
+
+  return pid;
+}
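The argument checks at the start of fork_np and the flag translation in fork_syscall can be read together as a pure function. The sketch below restates them using hypothetical demo_* copies of the FORK_NP_* values from this patch and the CLONE_PIDFD / CLONE_INTO_CGROUP values from <linux/sched.h>. It creates no process and omits the kernel support probe. FORK_NP_ASYNCSAFE has no clone counterpart, because it only decides whether the pthread_atfork handlers run.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copies of the FORK_NP_* values proposed in this patch.  */
#define DEMO_FORK_NP_PIDFD        (1ULL << 1)
#define DEMO_FORK_NP_CGROUP       (1ULL << 2)
#define DEMO_FORK_NP_ASYNCSAFE    (1ULL << 3)
#define DEMO_FORK_NP_EXIT_SIGNAL  (1ULL << 4)
#define DEMO_SUPPORTED (DEMO_FORK_NP_PIDFD | DEMO_FORK_NP_CGROUP \
			| DEMO_FORK_NP_ASYNCSAFE | DEMO_FORK_NP_EXIT_SIGNAL)

/* Values of CLONE_PIDFD and CLONE_INTO_CGROUP from <linux/sched.h>.  */
#define DEMO_CLONE_PIDFD        0x00001000ULL
#define DEMO_CLONE_INTO_CGROUP  0x200000000ULL

#define DEMO_ARGS_SIZE_VER0 24

/* Return 0 and fill *CLONE_FLAGS and *EXIT_SIGNAL, or return an errno
   value if SIZE or NP_FLAGS is not accepted.  */
static int
demo_translate (uint64_t np_flags, size_t size, int requested_signal,
		uint64_t *clone_flags, int *exit_signal)
{
  if (size != DEMO_ARGS_SIZE_VER0)
    return EINVAL;
  if (np_flags & ~DEMO_SUPPORTED)
    return EINVAL;

  *clone_flags = ((np_flags & DEMO_FORK_NP_PIDFD) ? DEMO_CLONE_PIDFD : 0)
		 | ((np_flags & DEMO_FORK_NP_CGROUP)
		    ? DEMO_CLONE_INTO_CGROUP : 0);
  *exit_signal = (np_flags & DEMO_FORK_NP_EXIT_SIGNAL)
		 ? requested_signal : SIGCHLD;
  return 0;
}
```

Rejecting unknown bits up front, instead of ignoring them, lets later versions add flags without older libraries silently misinterpreting them.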
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index abc471dd0b..b01734661b 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2659,6 +2659,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 9f03c8a9a2..14e58ef02d 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2843,6 +2843,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index ce1d20b722..25936400b8 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2608,6 +2608,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index 8c3640b004..4299a45d2f 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2194,6 +2194,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index a594916319..98d11f7e00 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -555,6 +555,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 7f61d4824d..311b17c166 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2786,6 +2786,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 83ebb84ff3..9a645345e7 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2759,6 +2759,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 89a0ff83bf..bc6b3094fc 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2756,6 +2756,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index e21c752057..14f2335c29 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2751,6 +2751,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 42f470d397..f41a1adaca 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2749,6 +2749,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 6907f5f98b..3500745aa0 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2757,6 +2757,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 4b1f017a98..64cc996c51 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2659,6 +2659,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 0d45902209..723956e4be 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2798,6 +2798,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index c59032ef14..97657be343 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2180,6 +2180,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index e014314d3e..a3fa2f4f87 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2825,6 +2825,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index ac05154915..bddf0f2d01 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2858,6 +2858,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index e13ee6e72a..ee9db4eff2 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2579,6 +2579,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 0e8c9ab3fe..0a0c4c4650 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2893,6 +2893,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index b0559a5a64..0c9a1648e1 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,6 +2436,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 5f79a84016..0acdd6fff4 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,6 +2636,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 498886ccb2..b94792e4c1 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2823,6 +2823,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 51679c2990..7d3a6e3c90 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2616,6 +2616,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index af7b6f5bc9..1c26740359 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2666,6 +2666,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index b766299f31..5b0bd8c6c8 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2663,6 +2663,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index f5b9200a33..9e18f09c1e 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2818,6 +2818,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index f6012e6e17..3a94cf17ee 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2631,6 +2631,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c b/sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c
new file mode 100644
index 0000000000..e024537fc8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c
@@ -0,0 +1,170 @@
+/* fork_np test using cgroupsv2.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <sched.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xstdio.h>
+#include <support/xunistd.h>
+#include <support/temp_file.h>
+#include <sys/pidfd.h>
+#include <sys/vfs.h>
+#include <sys/wait.h>
+
+#include <dirent.h>
+
+#define CGROUPFS "/sys/fs/cgroup/"
+#ifndef CGROUP2_SUPER_MAGIC
+# define CGROUP2_SUPER_MAGIC 0x63677270
+#endif
+
+#define F_TYPE_EQUAL(a, b) (a == (typeof(a)) b)
+
+static inline char *
+startswith(const char *s, const char *prefix)
+{
+  size_t l = strlen (prefix);
+  if (strncmp (s, prefix, l) == 0)
+    return (char*) s + l;
+  return NULL;
+}
+
+static char *
+get_cgroup (void)
+{
+  FILE *f = xfopen ("/proc/self/cgroup", "re");
+
+  char *cgroup = NULL;
+
+  char *line = NULL;
+  size_t linesiz = 0;
+  while (xgetline (&line, &linesiz, f) > 0)
+    {
+      char *entry = startswith (line, "0:");
+      if (entry == NULL)
+	continue;
+
+      entry = strchr (entry, ':');
+      if (entry == NULL)
+	continue;
+
+      cgroup = entry + 1;
+      size_t l = strlen (cgroup);
+      if (l > 0 && cgroup[l - 1] == '\n')
+	cgroup[l - 1] = '\0';
+
+      cgroup = xstrdup (entry + 1);
+      break;
+    }
+
+  xfclose (f);
+  free (line);
+
+  return cgroup;
+}
+
+static int
+do_test (void)
+{
+  struct statfs fs;
+  if (statfs (CGROUPFS, &fs) < 0)
+    {
+      if (errno == ENOENT)
+	FAIL_UNSUPPORTED ("no cgroupv2 mount found");
+      FAIL_EXIT1 ("statfs (%s): %m\n", CGROUPFS);
+    }
+
+  if (!F_TYPE_EQUAL (fs.f_type, CGROUP2_SUPER_MAGIC))
+    FAIL_UNSUPPORTED ("%s is not a cgroupv2", CGROUPFS);
+
+  char *cgroup = get_cgroup ();
+  TEST_VERIFY_EXIT (cgroup != NULL);
+  char *newcgroup = xasprintf ("%s/%s", cgroup, "test-fork_np-cgroup");
+  char *cgpath = xasprintf ("%s%s/test-fork_np-cgroup", CGROUPFS, cgroup);
+  free (cgroup);
+
+  if (mkdir (cgpath, 0755) == -1 && errno != EEXIST)
+    {
+      if (errno == EACCES || errno == EPERM || errno == EROFS)
+	FAIL_UNSUPPORTED ("can not create a new cgroupv2 group");
+      FAIL_EXIT1 ("mkdir (%s): %m", cgpath);
+    }
+  add_temp_file (cgpath);
+
+  int dfd = xopen (cgpath, O_DIRECTORY | O_RDONLY | O_CLOEXEC, 0666);
+
+  /* Check that the child is created in the requested cgroup (as reported
+     by the kernel) rather than in the parent's cgroup.  */
+  {
+    fork_np_args_t pidfd_args = {
+      .fork_np_flags = FORK_NP_CGROUP,
+      .fork_np_cgroup = dfd,
+    };
+    pid_t pid = fork_np (&pidfd_args, sizeof pidfd_args);
+    if (pid == -1 && errno == ENOTSUP)
+      FAIL_UNSUPPORTED ("kernel does not support CLONE_INTO_CGROUP clone flag");
+    TEST_VERIFY_EXIT (pid != -1);
+    if (pid == 0)
+      {
+	char *child_cgroup = get_cgroup ();
+	TEST_VERIFY_EXIT (child_cgroup != NULL);
+	TEST_COMPARE_STRING (newcgroup, child_cgroup);
+	_exit (EXIT_SUCCESS);
+      }
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  /* Same as before, but also request a pidfd and wait on it.  */
+  {
+    fork_np_args_t pidfd_args = {
+      .fork_np_flags = FORK_NP_PIDFD | FORK_NP_CGROUP,
+      .fork_np_cgroup = dfd,
+    };
+    pid_t pid = fork_np (&pidfd_args, sizeof pidfd_args);
+    TEST_VERIFY_EXIT (pid != -1);
+    if (pid == 0)
+      {
+	char *child_cgroup = get_cgroup ();
+	TEST_VERIFY_EXIT (child_cgroup != NULL);
+	TEST_COMPARE_STRING (newcgroup, child_cgroup);
+	_exit (EXIT_SUCCESS);
+      }
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PIDFD, pidfd_args.fork_np_pidfd, &sinfo,
+			  WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  free (cgpath);
+  free (newcgroup);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-fork_np.c b/sysdeps/unix/sysv/linux/tst-fork_np.c
new file mode 100644
index 0000000000..568d2245ee
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-fork_np.c
@@ -0,0 +1,236 @@
+/* Basic tests for fork_np.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <errno.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/temp_file.h>
+#include <support/xunistd.h>
+#include <support/xsignal.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#define SIG_PID_EXIT_CODE 20
+
+static bool atfork_prepare_var;
+static bool atfork_parent_var;
+static bool atfork_child_var;
+
+static volatile sig_atomic_t sigchld_called;
+
+static void
+sigchld_handler (int sig)
+{
+  sigchld_called = 1;
+}
+
+static void
+atfork_prepare (void)
+{
+  atfork_prepare_var = true;
+}
+
+static void
+atfork_parent (void)
+{
+  atfork_parent_var = true;
+}
+
+static void
+atfork_child (void)
+{
+  atfork_child_var = true;
+}
+
+static int
+singlethread_test (bool async, bool wait_with_pid, bool nosigchld)
+{
+  const char testdata1[] = "abcdefghijklmnopqrtuvwxz";
+  enum { testdatalen1 = array_length (testdata1) };
+  const char testdata2[] = "01234567890";
+  enum { testdatalen2 = array_length (testdata2) };
+
+  pid_t ppid = getpid ();
+
+  int tempfd = create_temp_file ("tst-fork_np", NULL);
+
+  /* Check that the opened file is shared between the processes by reading
+     and writing some data in both the parent and the child.  */
+  xwrite (tempfd, testdata1, testdatalen1);
+  off_t off = xlseek (tempfd, 0, SEEK_CUR);
+  TEST_COMPARE (off, testdatalen1);
+
+  sigchld_called = 0;
+
+  fork_np_args_t fork_args = {
+    .fork_np_flags = FORK_NP_PIDFD
+		     | (async ? FORK_NP_ASYNCSAFE : 0)
+		     | (nosigchld ? FORK_NP_EXIT_SIGNAL : 0)
+  };
+  pid_t pid = fork_np (&fork_args, sizeof fork_args);
+  TEST_VERIFY_EXIT (pid != -1);
+
+  if (pid == 0)
+    {
+      if (async)
+	TEST_VERIFY (!atfork_child_var);
+      else
+	TEST_VERIFY (atfork_child_var);
+
+      TEST_VERIFY_EXIT (getpid () != ppid);
+      TEST_COMPARE (getppid (), ppid);
+
+      TEST_COMPARE (xlseek (tempfd, 0, SEEK_CUR), testdatalen1);
+
+      xlseek (tempfd, 0, SEEK_SET);
+      char buf[testdatalen1];
+      TEST_COMPARE (read (tempfd, buf, sizeof (buf)), testdatalen1);
+      TEST_COMPARE_BLOB (buf, testdatalen1, testdata1, testdatalen1);
+
+      xlseek (tempfd, 0, SEEK_SET);
+      xwrite (tempfd, testdata2, testdatalen2);
+
+      xclose (tempfd);
+
+      _exit (EXIT_SUCCESS);
+    }
+
+  {
+    siginfo_t sinfo;
+    int options = WEXITED | (nosigchld ? __WCLONE : 0);
+    if (wait_with_pid)
+      TEST_COMPARE (waitid (P_PID, pid, &sinfo, options), 0);
+    else
+      TEST_COMPARE (waitid (P_PIDFD, fork_args.fork_np_pidfd, &sinfo,
+			    options), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+
+    /* If nosigchld is specified, the kernel should not send SIGCHLD.  */
+    TEST_COMPARE (sigchld_called, nosigchld ? 0 : 1);
+  }
+
+  TEST_COMPARE (xlseek (tempfd, 0, SEEK_CUR), testdatalen2);
+
+  xlseek (tempfd, 0, SEEK_SET);
+  char buf[testdatalen2];
+  TEST_COMPARE (read (tempfd, buf, sizeof (buf)), testdatalen2);
+
+  TEST_COMPARE_BLOB (buf, testdatalen2, testdata2, testdatalen2);
+
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  /* Sanity check: invalid arguments are rejected with EINVAL.  */
+  TEST_COMPARE (fork_np (NULL, -1), -1);
+  TEST_COMPARE (errno, EINVAL);
+
+  {
+    fork_np_args_t fork_args = {
+      .fork_np_flags = FORK_NP_PIDFD,
+    };
+    pid_t pid = fork_np (&fork_args, sizeof fork_args);
+    if (pid == -1 && errno == ENOSYS)
+      FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+    TEST_VERIFY_EXIT (pid != -1);
+    if (pid == 0)
+      _exit (EXIT_SUCCESS);
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  {
+    struct sigaction sa;
+    sa.sa_handler = sigchld_handler;
+    sa.sa_flags = 0;
+    sigemptyset (&sa.sa_mask);
+    xsigaction (SIGCHLD, &sa, NULL);
+  }
+
+  pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+
+  /* With default flags, fork_np acts as fork and runs the pthread_atfork
+     handlers.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (false, false, false);
+    TEST_VERIFY (atfork_prepare_var);
+    TEST_VERIFY (atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  /* Same as before, but also wait using the PID instead of pidfd.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (false, true, false);
+    TEST_VERIFY (atfork_prepare_var);
+    TEST_VERIFY (atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  /* Wait using the pidfd and disable SIGCHLD.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (false, false, true);
+    TEST_VERIFY (atfork_prepare_var);
+    TEST_VERIFY (atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  /* With FORK_NP_ASYNCSAFE, fork_np acts as _Fork.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+    singlethread_test (true, false, false);
+    TEST_VERIFY (!atfork_prepare_var);
+    TEST_VERIFY (!atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+    singlethread_test (true, true, false);
+    TEST_VERIFY (!atfork_prepare_var);
+    TEST_VERIFY (!atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (true, true, true);
+    TEST_VERIFY (!atfork_prepare_var);
+    TEST_VERIFY (!atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index e35bf54779..bf06381f82 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2582,6 +2582,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index e7d7eb61c0..032347e89c 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2688,6 +2688,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 fork_np F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
-- 
2.34.1



* [PATCH v8 7/7] linux: Add pidfd_getpid
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (5 preceding siblings ...)
  2023-08-18 14:06 ` [PATCH v8 6/7] posix: Add fork_np (BZ 26371) Adhemerval Zanella
@ 2023-08-18 14:06 ` Adhemerval Zanella
  2023-08-24  7:53   ` Florian Weimer
  2023-08-18 17:51 ` [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Rich Felker
  7 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer

This interface obtains the process ID associated with a process file
descriptor.  It does so by parsing the procfs fdinfo information
(/proc/self/fdinfo/<fd>).  Its prototype is:

   pid_t pidfd_getpid (int fd)

It returns the associated PID, or -1 on error with errno set
accordingly.  The possible errno values are those from open, read,
and close (used for the procfs parsing), along with:

   - EBADF if the FD is negative, does not have a PID associated, or if
     the fdinfo Pid field holds a value that does not fit in pid_t.

   - EREMOTE if the process has no PID in the caller's PID namespace.

   - ESRCH if the process has already terminated.
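The mapping from the fdinfo "Pid:" value to these errors can be
sketched as a locale-independent parser.  This is a hypothetical
helper (sketch_pid_from_fdinfo), not the patch's parse_fdinfo; it
assumes pid_t is a 32-bit int, as on Linux, and relies on the kernel
printing -1 for a process that has terminated and 0 for a process not
visible in the caller's PID namespace.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not the glibc code): extract the PID from the
   contents of a pidfd's fdinfo file.  Returns the PID, or -1 with errno
   set to EBADF, EREMOTE, or ESRCH.  */
static pid_t
sketch_pid_from_fdinfo (const char *text)
{
  assert (text != NULL);
  static const char field[] = "Pid:";
  const size_t fieldlen = sizeof field - 1;

  /* Find the line that starts with "Pid:".  */
  const char *line = text;
  while (strncmp (line, field, fieldlen) != 0)
    {
      line = strchr (line, '\n');
      if (line == NULL)
        {
          errno = EBADF;        /* No PID associated with this fd.  */
          return -1;
        }
      line++;
    }

  /* Skip blanks by hand; isspace would depend on the locale.  */
  const char *p = line + fieldlen;
  while (*p == ' ' || *p == '\t')
    p++;

  bool neg = *p == '-';
  if (neg)
    p++;
  if (*p < '0' || *p > '9')
    {
      errno = EBADF;
      return -1;
    }

  /* Accumulate digits, rejecting values that do not fit in pid_t.  */
  long long value = 0;
  for (; *p >= '0' && *p <= '9'; p++)
    {
      value = value * 10 + (*p - '0');
      if (value > INT_MAX)
        {
          errno = EBADF;
          return -1;
        }
    }
  if (*p != '\n' && *p != '\0')
    {
      errno = EBADF;
      return -1;
    }

  /* -1: the process has terminated; 0: not visible in our namespace.  */
  if (neg)
    {
      errno = value == 1 ? ESRCH : EBADF;
      return -1;
    }
  if (value == 0)
    {
      errno = EREMOTE;
      return -1;
    }
  return (pid_t) value;
}
```

For example, text containing the line "Pid:\t4242" yields 4242, while
"Pid:\t-1" yields -1 with errno set to ESRCH.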

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
---
 NEWS                                          |   4 +
 manual/process.texi                           |  38 ++++++
 sysdeps/unix/sysv/linux/Makefile              |   3 +
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |   2 +-
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   1 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/pidfd_getpid.c        | 126 ++++++++++++++++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/procutils.c           |  97 ++++++++++++++
 sysdeps/unix/sysv/linux/procutils.h           |  43 ++++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/sys/pidfd.h           |   4 +
 sysdeps/unix/sysv/linux/tst-pidfd.c           |  48 +++++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c    | 126 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 45 files changed, 525 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c

diff --git a/NEWS b/NEWS
index 00e9553e8f..45ccb2e64b 100644
--- a/NEWS
+++ b/NEWS
@@ -34,6 +34,10 @@ Major new features:
   posix_spawnattr_getcgroup_np), setting a different signal on process
   termination, and making the function act as _Fork.
 
+* On Linux, the pidfd_getpid function has been added.  It allows retrieving
+  the process ID associated with the process file descriptor created by
+  pidfd_spawn, fork_np, or pidfd_open.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/manual/process.texi b/manual/process.texi
index e6ac1f934f..1026b8d200 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -33,6 +33,7 @@ primitive functions to do each step individually instead.
 * Process Creation Concepts::   An overview of the hard way to do it.
 * Process Identification::      How to get the process ID of a process.
 * Creating a Process::          How to fork a child process.
+* Querying a Process::          How to query a child process.
 * Executing a File::            How to make a process execute another program.
 * Process Completion::          How to tell when a child process has completed.
 * Process Completion Status::   How to interpret the status value
@@ -424,6 +425,43 @@ child with @code{wait} or @code{waitid}.
 This function is a GNU extension and specific to Linux.
 @end deftypefun
 
+@node Querying a Process
+@section Querying a Process
+
+A process file descriptor, such as one created by @code{pidfd_spawn} or
+@code{fork_np}, can be used to query additional information about the process.
+
+@deftypefun pid_t pidfd_getpid (int @var{fd})
+@standards{GNU, sys/pidfd.h}
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+
+The @code{pidfd_getpid} function retrieves the process ID associated with a
+process file descriptor created with @code{pidfd_spawn}, @code{fork_np}, or
+@code{pidfd_open}.
+
+If the operation fails, @code{pidfd_getpid} returns @code{-1} and sets
+@code{errno} to one of the following error conditions:
+
+@table @code
+@item EBADF
+The input file descriptor is invalid, is not a process file descriptor, or
+an error occurred while parsing the kernel data.
+@item EREMOTE
+There is no process ID to denote the process in the current namespace.
+@item ESRCH
+The process to which the file descriptor refers has terminated.
+@item ENOENT
+The procfs file system is not mounted.
+@item ENFILE
+Too many open files in system (@code{pidfd_getpid} opens a procfs file and
+reads its contents).
+@item ENOMEM
+Insufficient kernel memory was available.
+@end table
+
+This function is specific to Linux.
+@end deftypefun
+
 @node Executing a File
 @section Executing a File
 @cindex executing a file
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index c9164e9d0a..2a8ab8b96b 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -213,6 +213,7 @@ tests += \
   tst-ofdlocks \
   tst-personality \
   tst-pidfd \
+  tst-pidfd_getpid \
   tst-pkey \
   tst-ppoll \
   tst-prctl \
@@ -494,8 +495,10 @@ sysdep_routines += \
   getcpu \
   oldglob \
   fork_np \
+  pidfd_getpid \
   pidfd_spawn \
   pidfd_spawnp \
+  procutils \
   sched_getcpu \
   spawnattr_getcgroup_np \
   spawnattr_setcgroup_np \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index c677631f24..ce3579e1c2 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -323,6 +323,7 @@ libc {
   }
   GLIBC_2.39 {
     fork_np;
+    pidfd_getpid;
     pidfd_spawn;
     pidfd_spawnp;
     posix_spawnattr_getcgroup_np;
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index dab02f0087..7463a0e073 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2674,6 +2674,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 1db00408cf..c708c87c78 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2783,6 +2783,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index 032aacc1ba..390fc7da90 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2435,6 +2435,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index 9f3ef16280..98971b6b4c 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -555,6 +555,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index c2c6c8af6b..2d1ae3768e 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -552,6 +552,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/clone-pidfd-support.c b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
index 4411e2b9ea..45378957b0 100644
--- a/sysdeps/unix/sysv/linux/clone-pidfd-support.c
+++ b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
@@ -56,5 +56,5 @@ __clone_pidfd_supported (void)
       atomic_store_relaxed (&__waitid_pidfd_supported, state);
     }
 
-  return state > 1;
+  return state > 0;
 }
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 4112163af2..99ca4e06e9 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2711,6 +2711,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index b01734661b..718c3c5545 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2660,6 +2660,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 14e58ef02d..2864b5b223 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2844,6 +2844,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 25936400b8..002f16a6cb 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2609,6 +2609,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index 4299a45d2f..c799d98fca 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2195,6 +2195,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 98d11f7e00..de1928d13f 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -556,6 +556,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 311b17c166..1f4c12e3fd 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2787,6 +2787,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 9a645345e7..9d51eb1eb1 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2760,6 +2760,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index bc6b3094fc..a5373cd308 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2757,6 +2757,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 14f2335c29..f71d0447ba 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2752,6 +2752,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index f41a1adaca..511a11f618 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2750,6 +2750,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 3500745aa0..e414749b65 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2758,6 +2758,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 64cc996c51..2e5ebb90ac 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2660,6 +2660,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 723956e4be..c8c2f8b5c3 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2799,6 +2799,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 97657be343..a50a669712 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2181,6 +2181,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/pidfd_getpid.c b/sysdeps/unix/sysv/linux/pidfd_getpid.c
new file mode 100644
index 0000000000..d03936e97f
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_getpid.c
@@ -0,0 +1,126 @@
+/* pidfd_getpid - Get the associated pid from the pid file descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <_itoa.h>
+#include <errno.h>
+#include <intprops.h>
+#include <procutils.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sysdep.h>
+#include <unistd.h>
+
+#define FDINFO_TO_FILENAME_PREFIX "/proc/self/fdinfo/"
+
+#define FDINFO_FILENAME_LEN \
+  (sizeof (FDINFO_TO_FILENAME_PREFIX) + INT_STRLEN_BOUND (int))
+
+struct parse_fdinfo_t
+{
+  bool found;
+  pid_t pid;
+};
+
+/* Parse the 'Pid:' field in the fdinfo entry, if present.  Avoid strtol
+   and similar functions so the result is not locale-dependent.  */
+static int
+parse_fdinfo (const char *l, void *arg)
+{
+  enum { fieldlen = sizeof ("Pid:") - 1 };
+  if (strncmp (l, "Pid:", fieldlen) != 0)
+    return 0;
+
+  l += fieldlen;
+
+  /* Skip leading spaces.  */
+  while (*l == ' ' || (unsigned int) (*l) -'\t' < 5)
+    l++;
+
+  bool neg = false;
+  switch (*l)
+    {
+    case '-':
+      neg = true;
+      l++;
+      break;
+    case '+':
+      return -1;
+    }
+
+  if (*l == '\0')
+    return 0;
+
+  int n = 0;
+  while (*l != '\0')
+    {
+      /* Check if '*l' is a digit.  */
+      if ('0' > *l || *l > '9')
+        return -1;
+
+      /* Ignore invalid large values.  */
+      if (INT_MULTIPLY_WRAPV (10, n, &n)
+          || INT_ADD_WRAPV (n, *l++ - '0', &n))
+        return -1;
+    }
+
+  /* -1 indicates that the process is terminated.  */
+  if (neg && n != 1)
+    return -1;
+
+  struct parse_fdinfo_t *fdinfo = arg;
+  fdinfo->pid = neg ? -n : n;
+  fdinfo->found = true;
+
+  return 1;
+}
+
+pid_t
+pidfd_getpid (int fd)
+{
+  if (__glibc_unlikely (fd < 0))
+    {
+      __set_errno (EBADF);
+      return -1;
+    }
+
+  char fdinfoname[FDINFO_FILENAME_LEN];
+
+  char *p = mempcpy (fdinfoname, FDINFO_TO_FILENAME_PREFIX,
+		     strlen (FDINFO_TO_FILENAME_PREFIX));
+  *_fitoa_word (fd, p, 10, 0) = '\0';
+
+  struct parse_fdinfo_t fdinfo = { .found = false, .pid = -1 };
+  if (!procutils_read_file (fdinfoname, parse_fdinfo, &fdinfo))
+    /* The fdinfo is unreadable or lacks a 'Pid:' entry (not a pidfd).  */
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EBADF);
+
+  /* The 'Pid:' entry exists but holds an invalid value.  */
+  if (!fdinfo.found)
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EBADF);
+
+  /* The pidfd cannot be resolved because it is in a separate pid
+     namespace.  */
+  if (fdinfo.pid == 0)
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EREMOTE);
+
+  /* A negative value means the process is terminated.  */
+  if (fdinfo.pid < 0)
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (ESRCH);
+
+  return fdinfo.pid;
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index a3fa2f4f87..becd490bef 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2826,6 +2826,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index bddf0f2d01..cc088490ae 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2859,6 +2859,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index ee9db4eff2..70872b29c5 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2580,6 +2580,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 0a0c4c4650..e2c7190ccb 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2894,6 +2894,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/procutils.c b/sysdeps/unix/sysv/linux/procutils.c
new file mode 100644
index 0000000000..30fc3a063a
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/procutils.c
@@ -0,0 +1,97 @@
+/* Utility functions to read/parse Linux procfs and sysfs.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <not-cancel.h>
+#include <procutils.h>
+#include <string.h>
+
+static int
+next_line (char **r, int fd, char *const buffer, char **cp, char **re,
+           char *const buffer_end)
+{
+  char *res = *cp;
+  char *nl = memchr (*cp, '\n', *re - *cp);
+  if (nl == NULL)
+    {
+      if (*cp != buffer)
+        {
+          if (*re == buffer_end)
+            {
+              memmove (buffer, *cp, *re - *cp);
+              *re = buffer + (*re - *cp);
+              *cp = buffer;
+
+              ssize_t n = TEMP_FAILURE_RETRY (
+		__read_nocancel (fd, *re, buffer_end - *re));
+              if (n < 0)
+                return -1;
+
+              *re += n;
+
+              nl = memchr (*cp, '\n', *re - *cp);
+	      if (nl == NULL)
+	        /* Line too long.  */
+		return 0;
+            }
+          else
+            nl = memchr (*cp, '\n', *re - *cp);
+
+          res = *cp;
+        }
+
+      if (nl == NULL)
+        nl = *re - 1;
+    }
+
+  *nl = '\0';
+  *cp = nl + 1;
+  assert (*cp <= *re);
+
+  if (res == *re)
+    return 0;
+
+  *r = res;
+  return 1;
+}
+
+bool
+procutils_read_file (const char *filename, procutils_closure_t closure,
+		     void *arg)
+{
+  enum { buffer_size = PROCUTILS_MAX_LINE_LEN };
+  char buffer[buffer_size];
+  char *buffer_end = buffer + buffer_size;
+  char *cp = buffer_end;
+  char *re = buffer_end;
+
+  int fd = TEMP_FAILURE_RETRY (
+    __open64_nocancel (filename, O_RDONLY | O_CLOEXEC));
+  if (fd == -1)
+    return false;
+
+  char *l;
+  int r;
+  while ((r = next_line (&l, fd, buffer, &cp, &re, buffer_end)) > 0)
+    if (closure (l, arg) != 0)
+      break;
+
+  __close_nocancel_nostatus (fd);
+
+  return r == 1;
+}
diff --git a/sysdeps/unix/sysv/linux/procutils.h b/sysdeps/unix/sysv/linux/procutils.h
new file mode 100644
index 0000000000..b3c2f6fb55
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/procutils.h
@@ -0,0 +1,43 @@
+/* Utility functions to read/parse Linux procfs and sysfs.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _PROCUTILS_H
+#define _PROCUTILS_H
+
+#include <stdbool.h>
+
+typedef int (*procutils_closure_t) (const char *line, void *arg);
+
+#define PROCUTILS_MAX_LINE_LEN 256
+
+/* Open and read the path FILENAME line by line, and call CLOSURE with
+   argument ARG on each line.  The read uses a fixed-size stack buffer
+   and non-cancellable calls, and each line is null terminated.
+
+   CLOSURE should return 0 to continue reading; any other value makes
+   the function stop and return early.
+
+   The '\n' is not included in the CLOSURE input argument.  A line that
+   does not fit in the PROCUTILS_MAX_LINE_LEN-byte buffer ends the read.
+
+   It returns true if CLOSURE returned a nonzero value, or false if the
+   file was fully read or could not be opened or read.  */
+bool procutils_read_file (const char *filename, procutils_closure_t closure,
+			  void *arg) attribute_hidden;
+
+#endif
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 0c9a1648e1..01b72f4a64 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2437,6 +2437,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 0acdd6fff4..8cfa87668c 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2637,6 +2637,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index b94792e4c1..3174961090 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2824,6 +2824,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 7d3a6e3c90..da6ec4a3c1 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2617,6 +2617,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 1c26740359..8a3fd3b4c7 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2667,6 +2667,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 5b0bd8c6c8..30346fae9e 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2664,6 +2664,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 9e18f09c1e..d9f40c1f5e 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2819,6 +2819,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 3a94cf17ee..162027b16f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2632,6 +2632,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sys/pidfd.h b/sysdeps/unix/sysv/linux/sys/pidfd.h
index 342e593288..5179e2e795 100644
--- a/sysdeps/unix/sysv/linux/sys/pidfd.h
+++ b/sysdeps/unix/sysv/linux/sys/pidfd.h
@@ -46,4 +46,8 @@ extern int pidfd_getfd (int __pidfd, int __targetfd,
 extern int pidfd_send_signal (int __pidfd, int __sig, siginfo_t *__info,
 			      unsigned int __flags) __THROW;
 
+/* Query the process ID (PID) from process descriptor FD.  Return the PID
+   or -1 in case of an error.  */
+extern pid_t pidfd_getpid (int __fd) __THROW;
+
 #endif /* _PIDFD_H  */
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c b/sysdeps/unix/sysv/linux/tst-pidfd.c
index 64d8a2ef40..b254020175 100644
--- a/sysdeps/unix/sysv/linux/tst-pidfd.c
+++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
@@ -18,6 +18,7 @@
 
 #include <errno.h>
 #include <fcntl.h>
+#include <limits.h>
 #include <support/capture_subprocess.h>
 #include <support/check.h>
 #include <support/process_state.h>
@@ -27,6 +28,8 @@
 #include <support/xsocket.h>
 #include <sys/pidfd.h>
 #include <sys/wait.h>
+#include <stdlib.h>
+#include <unistd.h>
 
 #define REMOTE_PATH "/dev/null"
 
@@ -102,6 +105,45 @@ do_test (void)
   ppid = getpid ();
   puid = getuid ();
 
+  /* Sanity check for invalid inputs.  */
+  TEST_COMPARE (pidfd_getpid (-1), -1);
+  TEST_COMPARE (errno, EBADF);
+
+  {
+    pid_t pid = pidfd_getpid (STDOUT_FILENO);
+    TEST_COMPARE (pid, -1);
+    TEST_COMPARE (errno, EBADF);
+  }
+
+  /* Check that pidfd_getpid returns ESRCH for an exited subprocess.  */
+  {
+    fork_np_args_t args = {
+      .fork_np_flags = FORK_NP_PIDFD,
+    };
+    pid_t pidfork = fork_np (&args, sizeof args);
+    if (pidfork == 0)
+      _exit (EXIT_SUCCESS);
+
+    /* The process might still be running or may already be a zombie; in
+       either case the PID is still allocated to it.  */
+    pid_t pid = pidfd_getpid (args.fork_np_pidfd);
+    if (pid > 0)
+      support_process_state_wait (pid, support_process_state_zombie);
+
+    siginfo_t info;
+    TEST_COMPARE (waitid (P_PIDFD, args.fork_np_pidfd, &info, WEXITED), 0);
+    TEST_COMPARE (info.si_pid, pidfork);
+    TEST_COMPARE (info.si_status, 0);
+    TEST_COMPARE (info.si_code, CLD_EXITED);
+
+    /* Once the process is reaped the associated PID is not available.  */
+    pid = pidfd_getpid (args.fork_np_pidfd);
+    TEST_COMPARE (pid, -1);
+    TEST_COMPARE (errno, ESRCH);
+
+    xclose (args.fork_np_pidfd);
+  }
+
   TEST_COMPARE (socketpair (AF_UNIX, SOCK_STREAM, 0, sockets), 0);
 
   pid_t pid = xfork ();
@@ -118,6 +160,12 @@ do_test (void)
   int pidfd = pidfd_open (pid, 0);
   TEST_VERIFY (pidfd != -1);
 
+  TEST_COMPARE (pidfd_getpid (INT_MAX), -1);
+  {
+    pid_t querypid = pidfd_getpid (pidfd);
+    TEST_COMPARE (querypid, pid);
+  }
+
   /* Wait for first sigtimedwait.  */
   support_process_state_wait (pid, support_process_state_sleeping);
   TEST_COMPARE (pidfd_send_signal (pidfd, SIGUSR1, NULL, 0), 0);
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
new file mode 100644
index 0000000000..fe418465c7
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
@@ -0,0 +1,126 @@
+/* Specific tests for Linux pidfd_getpid.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <sched.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/xunistd.h>
+#include <support/test-driver.h>
+#include <sys/pidfd.h>
+#include <sys/wait.h>
+#include <sys/mount.h>
+
+static int
+do_test (void)
+{
+  {
+    /* The pidfd_getfd syscall was the last in the set of pidfd related
+       syscalls added to the kernel.  Use pidfd_getfd to decide if this
+       kernel has pidfd support that we can test.  */
+    int r = pidfd_getfd (0, 0, 1);
+    TEST_VERIFY_EXIT (r == -1);
+    if (errno == ENOSYS)
+      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd, skipping test");
+    if (errno == EPERM)
+      FAIL_UNSUPPORTED ("kernel does not allow pidfd_getfd, skipping test");
+  }
+
+  /* Check that pidfd_getpid returns EREMOTE for a process that is not
+     in the current PID namespace.  */
+  {
+    fork_np_args_t child_args = {
+      .fork_np_flags = FORK_NP_PIDFD,
+    };
+    pid_t child0 = fork_np (&child_args, sizeof child_args);
+    TEST_VERIFY_EXIT (child0 >= 0);
+    if (child0 == 0)
+      {
+	/* Create another, unrelated pidfd so that child2 inherits the file
+	   descriptor.  */
+	fork_np_args_t subchild_args = {
+	  .fork_np_flags = FORK_NP_PIDFD,
+	};
+	pid_t child1 = fork_np (&subchild_args, sizeof subchild_args);
+	TEST_VERIFY_EXIT (child1 >= 0);
+        if (child1 == 0)
+	  _exit (0);
+
+	if (unshare (CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWPID) < 0)
+	  {
+	    /* Older kernels may not support all the options, or security
+	       policy may block this call.  */
+	    if (errno == EINVAL || errno == EPERM || errno == ENOSPC)
+	      exit (EXIT_UNSUPPORTED);
+	    FAIL_EXIT1 ("unshare user/fs/pid failed: %m");
+	  }
+
+	if (mount (NULL, "/", NULL, MS_REC | MS_PRIVATE, 0) != 0)
+	  {
+	    /* This happens if we are trying to create a nested container,
+	       for example if the build is running under podman, and we lack
+	       privileges.  */
+	    if (errno == EPERM)
+	      _exit (EXIT_UNSUPPORTED);
+	    else
+	      _exit (EXIT_FAILURE);
+	  }
+
+	pid_t child2 = xfork ();
+	if (child2 > 0)
+	  {
+	    int status;
+	    xwaitpid (child2, &status, 0);
+	    TEST_VERIFY (WIFEXITED (status));
+	    xwaitpid (child1, &status, 0);
+	    TEST_VERIFY (WIFEXITED (status));
+
+	    _exit (WEXITSTATUS (status));
+	  }
+
+	/* Now that we are PID 1 (effectively "root"), we can mount /proc.  */
+	if (mount ("proc", "/proc", "proc", 0, NULL) != 0)
+	  {
+	    if (errno == EPERM)
+	      _exit (EXIT_UNSUPPORTED);
+	    else
+	      _exit (EXIT_FAILURE);
+	  }
+
+	TEST_COMPARE (pidfd_getpid (subchild_args.fork_np_pidfd), -1);
+	TEST_COMPARE (errno, EREMOTE);
+
+	_exit (EXIT_SUCCESS);
+      }
+
+      pid_t child0pid = pidfd_getpid (child_args.fork_np_pidfd);
+
+      siginfo_t info;
+      TEST_COMPARE (waitid (P_PIDFD, child_args.fork_np_pidfd, &info,
+			    WEXITED), 0);
+      if (info.si_status == EXIT_UNSUPPORTED)
+	FAIL_UNSUPPORTED ("unable to unshare user/fs/pid");
+      TEST_COMPARE (info.si_status, 0);
+      TEST_COMPARE (info.si_code, CLD_EXITED);
+      TEST_COMPARE (info.si_pid, child0pid);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index bf06381f82..f1e4f83345 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2583,6 +2583,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 032347e89c..280bca96f3 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2689,6 +2689,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 fork_np F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
-- 
2.34.1
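The parsing step in parse_fdinfo () is the core of pidfd_getpid (): per the patch, the kernel reports a pidfd's target in /proc/self/fdinfo/<fd> on a "Pid:" line, using 0 when the process lives in a PID namespace the caller cannot see and -1 once it has been reaped. The sketch below pulls just that line parser out into a standalone function so it can be exercised without a pidfd. The name parse_pid_field and its 1/0/-1 return convention are hypothetical, modeled on the patch rather than taken from any glibc interface.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical standalone version of the patch's parse_fdinfo ().
   Returns 1 and stores the value in *PID for a well-formed "Pid:" line,
   0 for any other line (keep scanning), and -1 for a malformed or
   out-of-range "Pid:" value.  Hand-rolled digit parsing keeps the
   result independent of the current locale.  */
static int
parse_pid_field (const char *line, pid_t *pid)
{
  static const char field[] = "Pid:";
  if (strncmp (line, field, sizeof field - 1) != 0)
    return 0;
  line += sizeof field - 1;

  /* Skip blanks: ' ' and '\t' through '\r', as the patch does.  */
  while (*line == ' ' || (*line >= '\t' && *line <= '\r'))
    line++;

  bool neg = false;
  if (*line == '-')
    {
      neg = true;
      line++;
    }

  /* An empty value is treated like a missing field, as in the patch.  */
  if (*line == '\0')
    return 0;

  int n = 0;
  for (; *line != '\0'; line++)
    {
      if (*line < '0' || *line > '9')
        return -1;                      /* Also rejects a leading '+'.  */
      int digit = *line - '0';
      if (n > (INT_MAX - digit) / 10)   /* n * 10 + digit would overflow.  */
        return -1;
      n = n * 10 + digit;
    }

  /* Only -1 (process already reaped) is accepted as a negative value,
     matching the patch.  */
  if (neg && n != 1)
    return -1;

  *pid = neg ? -n : n;
  return 1;
}
```

A caller would then map the outcome the way pidfd_getpid () does: no usable entry to EBADF, a value of 0 to EREMOTE, and -1 to ESRCH.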


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (6 preceding siblings ...)
  2023-08-18 14:06 ` [PATCH v8 7/7] linux: Add pidfd_getpid Adhemerval Zanella
@ 2023-08-18 17:51 ` Rich Felker
  2023-08-18 18:34   ` Adhemerval Zanella Netto
  2023-08-21  6:53   ` Florian Weimer
  7 siblings, 2 replies; 29+ messages in thread
From: Rich Felker @ 2023-08-18 17:51 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Florian Weimer

On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote:
> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd,
> and pidfd_send_signal, and exported the P_PIDFD to use along with
> waitid. The pidfd is a race-free interface, however, the pidfd_open is
> subject to TOCTOU if the file descriptor is not obtained directly from
> the clone or clone3 syscall (there is still a small window between the
> clone return and the pidfd_getfd where the process can be reaped and the
> process ID reused).

Unless I'm missing something, that window is purely programmer error.
The pid belongs to the parent process, that called fork, posix_spawn,
clone, or whatever, and is responsible for not freeing it until it's
done using it.

Yes, this can happen if you install a SIGCHLD handler that reaps
anything it sees, or if you're calling wait without a pid. This is
programming error. If you're stuck with code outside your control that
makes that mistake, you can already avoid it with clone by setting the
child exit signal to 0 rather than SIGCHLD. But it's best just not to
do that.

IMO making a new, complex, highly nonstandard interface to work around
a problem that's programmer error, and getting this nonstandard and
nonportable pattern into mainstream software, has negative value.

Rich
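Rich's workaround can be sketched with the glibc clone () wrapper. Leaving the exit-signal byte of the flags at 0 instead of SIGCHLD makes the child a "clone" child: no SIGCHLD is sent when it exits, and wait () or waitpid (-1, ...) ignore it unless the caller passes __WCLONE or __WALL. The helper name run_quiet_child and the 64 KiB stack size are illustrative assumptions, and the sketch assumes the caller has no other children when it checks that a catch-all waitpid misses this one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

enum { QUIET_STACK_SIZE = 64 * 1024 };

/* Child entry point: its return value becomes the exit status.  */
static int
quiet_child_main (void *arg)
{
  return *(const int *) arg;
}

/* Hypothetical helper.  Starts a child with exit signal 0 (no SIGCHLD
   bits in the clone flags) and reaps it by PID.  *HIDDEN reports
   whether a catch-all waitpid (-1, ...) failed to see the child.
   Returns the child's exit status, or -1 on error.  */
static int
run_quiet_child (int exit_code, bool *hidden)
{
  char *stack = malloc (QUIET_STACK_SIZE);
  if (stack == NULL)
    return -1;

  /* Without CLONE_VM the child runs on its own copy of STACK, so the
     parent may free its copy right away.  The stack grows down on most
     architectures, hence the top-of-buffer pointer.  */
  pid_t pid = clone (quiet_child_main, stack + QUIET_STACK_SIZE, 0,
                     &exit_code);
  free (stack);
  if (pid == -1)
    return -1;

  /* Without __WCLONE or __WALL, waitpid only considers children that
     report with SIGCHLD, so this fails with ECHILD (given no other
     children) instead of stealing ours.  */
  int status;
  *hidden = waitpid (-1, &status, WNOHANG) == -1 && errno == ECHILD;

  /* The owner reaps explicitly by PID and opts in with __WCLONE.  */
  pid_t r;
  do
    r = waitpid (pid, &status, __WCLONE);
  while (r == -1 && errno == EINTR);
  if (r != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

This only hides the child from careless reapers; it does not yield a pidfd, which is the gap Florian points to in his reply.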

* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-18 17:51 ` [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Rich Felker
@ 2023-08-18 18:34   ` Adhemerval Zanella Netto
  2023-08-28 12:52     ` Luca Boccassi
  2023-08-21  6:53   ` Florian Weimer
  1 sibling, 1 reply; 29+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-18 18:34 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha, Florian Weimer



On 18/08/23 14:51, Rich Felker wrote:
> On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote:
>> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd,
>> and pidfd_send_signal, and exported the P_PIDFD to use along with
>> waitid. The pidfd is a race-free interface, however, the pidfd_open is
>> subject to TOCTOU if the file descriptor is not obtained directly from
>> the clone or clone3 syscall (there is still a small window between the
>> clone return and the pidfd_getfd where the process can be reaped and the
>> process ID reused).
> 
> Unless I'm missing something, that window is purely programmer error.
> The pid belongs to the parent process, that called fork, posix_spawn,
> clone, or whatever, and is responsible for not freeing it until it's
> done using it.
> 
> Yes this can happen if you install a SIGCHLD handler that reaps
> anything it sees, or if you're calling wait without a pid. This is
> programming error. If you're stuck with code outside your control that
> makes that mistake, you can already avoid it with clone by setting the
> child exit signal to 0 rather than SIGCHLD. But it's best just not to
> do that.
> 

Yes, this is the issue GNOME is having with their code base [1] and that
motivated this new interface.  Systemd also seems to be interested in
these interfaces, although I am not sure whether it is subject to the
same issue.

I don't have a strong opinion on whether this is a solid reason to
provide a new API; another option would be to close BZ#30349 [2] as
WONTFIX with this rationale.  However, that does not really provide a
workaround, and worse, it would convey the idea that fully resolving the
issue requires either accepting the race or issuing clone directly.

> IMO making a new, complex, highly nonstandard interface to work around
> a problem that's programmer error, and getting this nonstandard and
> nonportable pattern into mainstream software, has negative value.

Although I see that this makes sense for pidfd_spawn, there is still the
BZ 26371 requirement for cgroupv2 support, which would require at least
something analogous to posix_spawnattr_{get,set}cgroup_np.  I am not
sure whether my fork_np suggestion is really required here, although
the idea is to provide a more extensible fork interface for possible
future clone support.

Florian and Lennart seem to lean toward a clone3 interface similar to
the clone one, which I would really like to avoid.

[1] https://gitlab.gnome.org/GNOME/glib/-/issues/1866
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=30349

* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-18 17:51 ` [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Rich Felker
  2023-08-18 18:34   ` Adhemerval Zanella Netto
@ 2023-08-21  6:53   ` Florian Weimer
  2023-08-21 13:55     ` Rich Felker
  1 sibling, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2023-08-21  6:53 UTC (permalink / raw)
  To: Rich Felker; +Cc: Adhemerval Zanella, libc-alpha

* Rich Felker:

> On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote:
>> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd,
>> and pidfd_send_signal, and exported the P_PIDFD to use along with
>> waitid. The pidfd is a race-free interface, however, the pidfd_open is
>> subject to TOCTOU if the file descriptor is not obtained directly from
>> the clone or clone3 syscall (there is still a small window between the
>> clone return and the pidfd_getfd where the process can be reaped and the
>> process ID reused).
>
> Unless I'm missing something, that window is purely programmer error.
> The pid belongs to the parent process, that called fork, posix_spawn,
> clone, or whatever, and is responsible for not freeing it until it's
> done using it.
>
> Yes this can happen if you install a SIGCHLD handler that reaps
> anything it sees, or if you're calling wait without a pid. This is
> programming error. If you're stuck with code outside your control that
> makes that mistake, you can already avoid it with clone by setting the
> child exit signal to 0 rather than SIGCHLD. But it's best just not to
> do that.

I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the
creation of subprocesses that are difficult to observe by accident from
the rest of the process, while obtaining a stable identifier for the
process.  I do not think there is any other way to achieve that.  I
think it's desirable to expose this functionality in some way.

Thanks,
Florian
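Florian's combination (clone3 with CLONE_PIDFD and exit_signal set to 0) can be illustrated with a raw syscall, since glibc has no public clone3 wrapper. This is a sketch under several assumptions: Linux 5.3+ UAPI headers providing struct clone_args in <linux/sched.h> and SYS_clone3, a 5.4+ kernel for waitid (P_PIDFD, ...), and the hypothetical helper name run_pidfd_child. As the cover letter warns, a raw clone3 skips glibc's fork bookkeeping (atfork handlers, TCB tid, internal locks), so the child here does nothing but call _exit.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_PIDFD (Linux 5.3+ UAPI).  */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
# define P_PIDFD 3         /* Linux 5.4 value, for glibc older than 2.36.  */
#endif

/* Hypothetical helper: fork-like clone3 with CLONE_PIDFD and no exit
   signal; the child exits with EXIT_CODE and the parent waits on the
   pidfd.  Returns 0 and stores the exit status (or -1 if the child did
   not exit normally) in *STATUS, or -errno on failure.  */
static int
run_pidfd_child (int exit_code, int *status)
{
  int pidfd = -1;
  struct clone_args args;
  memset (&args, 0, sizeof args);
  args.flags = CLONE_PIDFD;
  args.pidfd = (uint64_t) (uintptr_t) &pidfd;
  args.exit_signal = 0;    /* No SIGCHLD on exit.  */

  long pid = syscall (SYS_clone3, &args, sizeof args);
  if (pid == -1)
    return -errno;
  if (pid == 0)
    /* Child: glibc's fork bookkeeping was skipped, so only an
       async-signal-safe exit is done here.  */
    _exit (exit_code);

  siginfo_t info;
  memset (&info, 0, sizeof info);
  /* A child with exit signal 0 is a "clone" child; without __WALL (or
     __WCLONE) waitid would not consider it.  */
  int r = waitid (P_PIDFD, pidfd, &info, WEXITED | __WALL);
  int saved_errno = errno;
  close (pidfd);
  if (r == -1)
    return -saved_errno;

  *status = info.si_code == CLD_EXITED ? info.si_status : -1;
  return 0;
}
```

If the kernel or a seccomp filter rejects clone3 (some container profiles report ENOSYS), the helper returns a negative errno and a caller would need a fallback, for example clone () with CLONE_PIDFD.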


* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-21  6:53   ` Florian Weimer
@ 2023-08-21 13:55     ` Rich Felker
  2023-08-24  7:25       ` Florian Weimer
  0 siblings, 1 reply; 29+ messages in thread
From: Rich Felker @ 2023-08-21 13:55 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Adhemerval Zanella, libc-alpha

On Mon, Aug 21, 2023 at 08:53:53AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote:
> >> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd,
> >> and pidfd_send_signal, and exported the P_PIDFD to use along with
> >> waitid. The pidfd is a race-free interface, however, the pidfd_open is
> >> subject to TOCTOU if the file descriptor is not obtained directly from
> >> the clone or clone3 syscall (there is still a small window between the
> >> clone return and the pidfd_getfd where the process can be reaped and the
> >> process ID reused).
> >
> > Unless I'm missing something, that window is purely programmer error.
> > The pid belongs to the parent process, that called fork, posix_spawn,
> > clone, or whatever, and is responsible for not freeing it until it's
> > done using it.
> >
> > Yes this can happen if you install a SIGCHLD handler that reaps
> > anything it sees, or if you're calling wait without a pid. This is
> > programming error. If you're stuck with code outside your control that
> > makes that mistake, you can already avoid it with clone by setting the
> > child exit signal to 0 rather than SIGCHLD. But it's best just not to
> > do that.
> 
> I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the
> creation of subprocesses that are difficult to observe by accident from
> the rest of the process, while obtaining a stable identifier for the
> process.  I do not think there is any other way to achieve that.  I
> think it's desirable to expose this functionality in some way.

Indeed that seems like useful functionality to expose for cases where
you can't fix some bad code, but there are lots of issues with how
clone3 (and even clone) should behave with respect to the child
environment when you don't exec -- is it _Fork-like or fork-like? Can
you use AS-unsafe interfaces in the child of a MT parent? Etc. This
should all be discussed on libc-coord not libc-alpha, IMO.

Independent of that, though, I'd like to focus on the fact that
randomly reaping children with no regard to what part of your process
owned those pids has been wrong and broken practice for something like
4 decades. Just like looping over a range of fds and closing
everything is broken. It's a type of UAF bug and it's not a pattern we
suddenly need to support because systemd or whoever says so. It's a
bug that's easily detected with static analysis (any calls to wait or
to waitpid/waitid without the pid specified, or SIGCHLD handlers) and
usually easy to fix -- and then the fix works on every POSIX-ish
platform that's existed, pretty much ever, not just on GNU/Linux.
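The ownership rule described here can be sketched as below: the code that created a child reaps it by its specific pid, and nothing else reaps "any" child. This is a minimal illustration, not code from the thread. `spawn_and_reap` is a hypothetical helper name, and returning 128 + signal number for a killed child is a shell-style convention chosen only for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child that exits with CODE, then reap
   exactly that child.  Returns its exit status (128 + signal number if
   it was killed), or -errno on failure.  */
static int
spawn_and_reap (int code)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -errno;
  if (pid == 0)
    _exit (code);               /* child: exit immediately */

  int status;
  pid_t r;
  do
    /* Wait for this pid only: never wait (), waitpid (-1, ...), or a
       SIGCHLD handler that reaps whatever it finds.  */
    r = waitpid (pid, &status, 0);
  while (r < 0 && errno == EINTR);

  if (r < 0)
    return -errno;
  return WIFEXITED (status) ? WEXITSTATUS (status) : 128 + WTERMSIG (status);
}
```

The pid is only released by that `waitpid` call, so it cannot be recycled while its owner still refers to it. The hazard appears only when some other part of the process calls `wait`, `waitpid (-1, ...)`, or reaps children from a SIGCHLD handler.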

Rich

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 3/7] linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc
  2023-08-18 14:06 ` [PATCH v8 3/7] linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
@ 2023-08-24  6:06   ` Florian Weimer
  0 siblings, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2023-08-24  6:06 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

* Adhemerval Zanella:

> Not all architectures have added the clone3 syscall.

Looks good.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 6/7] posix: Add fork_np (BZ 26371)
  2023-08-18 14:06 ` [PATCH v8 6/7] posix: Add fork_np (BZ 26371) Adhemerval Zanella
@ 2023-08-24  6:07   ` Florian Weimer
  0 siblings, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2023-08-24  6:07 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

* Adhemerval Zanella:

> Returning a pidfd allows a process to keep a race-free handle to a child
> process.  Otherwise, to obtain a process file descriptor the caller needs
> to use pidfd_open, which might still be subject to TOCTOU.
>
> The implementation requires the kernel to support the complete pidfd
> interface, meaning that waitid (P_PIDFD) must be supported.  This
> ensures that no racy workaround is required (such as reading the procfs
> fdinfo pid to use along with the old wait interfaces).  If the kernel
> does not have the required support, the interface returns -1 and sets
> errno to ENOSYS.

Please skip this for now.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 4/7] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  2023-08-18 14:06 ` [PATCH v8 4/7] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
@ 2023-08-24  7:00   ` Florian Weimer
  0 siblings, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2023-08-24  7:00 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

* Adhemerval Zanella:

> These functions allow posix_spawn and posix_spawnp to use
> CLONE_INTO_CGROUP with clone3, so that the child process is
> created in a different cgroup (cgroup v2).  These are GNU
> extensions that are available only on Linux, and only for
> the architectures that implement the clone3 wrapper
> (HAVE_CLONE3_WRAPPER).
>
> To create a process in a different cgroup v2, one can use:
>
>   posix_spawnattr_t attr;
>   posix_spawnattr_init (&attr);
>   posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
>   posix_spawnattr_setcgroup_np (&attr, cgroup);
>   posix_spawn (...)
>
> Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP controls
> whether the cgroup file descriptor is used with clone3.
>
> There is no fallback if either clone3 does not support the flag
> or the architecture does not provide the clone3 wrapper; in
> this case posix_spawn returns ENOTSUP.
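A fuller version of the usage outline above, with error handling and the cgroup directory opened as a file descriptor, might look like the following. It is a sketch that assumes a glibc providing `posix_spawnattr_setcgroup_np` and `POSIX_SPAWN_SETCGROUP` (the `#ifdef` guard treats the flag macro as the availability test) and a mounted cgroup v2 hierarchy. `spawn_in_cgroup` and any path passed to it are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run ARGV (looked up in PATH) inside the cgroup v2
   directory CGROUP_PATH and wait for it.  Returns the exit status, or a
   negated error number.  */
static int
spawn_in_cgroup (const char *cgroup_path, char *const argv[])
{
#ifdef POSIX_SPAWN_SETCGROUP
  int cgfd = open (cgroup_path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (cgfd < 0)
    return -errno;

  posix_spawnattr_t attr;
  int err = posix_spawnattr_init (&attr);
  if (err != 0)
    {
      close (cgfd);
      return -err;
    }
  err = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
  if (err == 0)
    err = posix_spawnattr_setcgroup_np (&attr, cgfd);

  pid_t pid;
  if (err == 0)
    /* Fails, with no fallback, if clone3 cannot honor CLONE_INTO_CGROUP.  */
    err = posix_spawnp (&pid, argv[0], NULL, &attr, argv, environ);
  posix_spawnattr_destroy (&attr);
  close (cgfd);            /* only needed during the clone3 call itself */
  if (err != 0)
    return -err;

  int status;
  pid_t r;
  do
    r = waitpid (pid, &status, 0);
  while (r < 0 && errno == EINTR);
  if (r < 0)
    return -errno;
  return WIFEXITED (status) ? WEXITSTATUS (status) : 128 + WTERMSIG (status);
#else
  (void) cgroup_path;
  (void) argv;
  return -ENOSYS;          /* this glibc lacks the cgroup spawn extension */
#endif
}
```

Opening the directory with `O_DIRECTORY` makes a wrong path fail early with a plain `open` error instead of inside the spawn call.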

typo: EOPNOTSUPP

Okay with that change.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-18 14:06 ` [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
@ 2023-08-24  7:13   ` Florian Weimer
  2023-08-24 15:43     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2023-08-24  7:13 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

* Adhemerval Zanella:

> Returning a pidfd allows a process to keep a race-free handle for a
> child process, otherwise, the caller will need to either use pidfd_open
> (which still might be subject to TOCTOU) or keep the old racy interface
> based on pid_t.
>
> The implementation makes sure that kernel must support the complete
> pidfd interface, meaning that waitid (P_PIDFD) should be supported
> (added on Linux 5.4).  It ensures that a non-racy workaround is required
> (such as reading procfs fdinfo pid to use along with wait interfaces).

Sorry, I don't understand the second sentence.

> diff --git a/posix/tst-spawn3.c b/posix/tst-spawn3.c
> index e7ce0fb386..64052dc911 100644
> --- a/posix/tst-spawn3.c
> +++ b/posix/tst-spawn3.c
> @@ -16,6 +16,7 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> +#include <assert.h>

Please use TEST_VERIFY_EXIT, see below.

> @@ -75,75 +78,82 @@ do_test (void)
>  	    FAIL_EXIT1 ("create_temp_file: %m");
>  	  break;
>  	}
> -      files[nfiles++] = fd;
> +      files[nfiles] = fd;
>      }
> +  assert (nfiles != 0);

TEST_VERIFY_EXIT (nfiles != 0);

> diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> index a3aa020d5c..3254cfe9be 100644
> --- a/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> +++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> @@ -37,4 +37,35 @@ extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__attr,
>  
>  #endif /* __USE_MISC */
>  
> +#ifdef __USE_GNU

Please use __USE_MISC, so this is available with _DEFAULT_SOURCE (like
the cgroups functions).

> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index f0d4c62ae6..d4ff23d955 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c

>    internal_signal_block_all (&args.oldmask);
> @@ -386,13 +399,16 @@ __spawnix (pid_t * pid, const char *file,
>        /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
>  	 __clone_internal_fallback.  */
>        .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
> +	       | (use_pidfd ? CLONE_PIDFD : 0)
>  	       | CLONE_CLEAR_SIGHAND
>  	       | CLONE_VM
>  	       | CLONE_VFORK,
>        .exit_signal = SIGCHLD,
>        .stack = (uintptr_t) stack,
>        .stack_size = stack_size,
> -      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
> +      .cgroup = (set_cgroup ? attrp->__cgroup : 0),
> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,

The .parent_tid line looks wrong?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-21 13:55     ` Rich Felker
@ 2023-08-24  7:25       ` Florian Weimer
  2023-08-24 12:21         ` Rich Felker
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2023-08-24  7:25 UTC (permalink / raw)
  To: Rich Felker; +Cc: Adhemerval Zanella, libc-alpha

* Rich Felker:

> On Mon, Aug 21, 2023 at 08:53:53AM +0200, Florian Weimer wrote:
>> I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the
>> creation of subprocesses that are difficult to observe by accident from
>> the rest of the process, while obtaining a stable identifier for the
>> process.  I do not think there is any other way to achieve that.  I
>> think it's desirable to expose this functionality in some way.
>
> Indeed that seems like useful functionality to expose for cases where
> you can't fix some bad code, but there are lots of issues with how

It's more about providing a conforming implementation of functions in the
wait family in case the implementation needs to create processes for
some reason.  It's not about “bad code”; these are standard POSIX
interfaces.
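The combination discussed here can be sketched with the raw clone3 system call. A child created with `exit_signal` set to 0 is not eligible for `wait` or `waitpid (-1, ...)` unless `__WALL` or `__WCLONE` is passed, while `CLONE_PIDFD` still hands the creator a stable handle. This is a sketch under stated assumptions (Linux 5.3 or later, kernel headers that define `struct clone_args`, no seccomp filter blocking clone3), and `run_hidden_child` is a hypothetical name. Because the syscall bypasses glibc, the child calls nothing but `_exit`; this sidesteps the fork-handler and internal-state questions raised in this thread rather than answering them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_PIDFD */
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a child with clone3, CLONE_PIDFD and
   exit_signal == 0, then reap it by pid with __WALL.  *HIDDEN is set to 1
   if a generic waitpid (-1, ..., WNOHANG) could not see the child.
   Returns the exit status, or -errno (for example -ENOSYS when clone3 is
   unavailable or filtered).  */
static int
run_hidden_child (int code, int *hidden)
{
  *hidden = 0;
#if defined SYS_clone3 && defined CLONE_ARGS_SIZE_VER0
  int pidfd = -1;
  struct clone_args args;
  memset (&args, 0, sizeof args);
  args.flags = CLONE_PIDFD;
  args.pidfd = (uint64_t) (uintptr_t) &pidfd;
  args.exit_signal = 0;                  /* no SIGCHLD when the child exits */

  long pid = syscall (SYS_clone3, &args, CLONE_ARGS_SIZE_VER0);
  if (pid < 0)
    return -errno;
  if (pid == 0)
    _exit (code);                        /* child of a raw syscall: _exit only */

  /* Code that reaps "any child" does not see this one: a child with a
     non-SIGCHLD exit signal is only eligible with __WCLONE or __WALL.  */
  int status;
  *hidden = waitpid (-1, &status, WNOHANG) == -1 && errno == ECHILD;

  /* The pidfd becomes readable once the child has exited.  */
  struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
  while (poll (&pfd, 1, -1) < 0 && errno == EINTR)
    ;

  pid_t r;
  do
    r = waitpid ((pid_t) pid, &status, __WALL);
  while (r < 0 && errno == EINTR);
  int err = errno;
  close (pidfd);
  if (r < 0)
    return -err;
  return WIFEXITED (status) ? WEXITSTATUS (status) : 128 + WTERMSIG (status);
#else
  (void) code;
  return -ENOSYS;
#endif
}
```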

> clone3 (and even clone) should behave with respect to the child
> environment when you don't exec -- is it _Fork-like or fork-like? Can
> you use AS-unsafe interfaces in the child of a MT parent? Etc. This
> should all be discussed on libc-coord not libc-alpha, IMO.

Seems more like linux-api material, given that it's Linux-specific?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 7/7] linux: Add pidfd_getpid
  2023-08-18 14:06 ` [PATCH v8 7/7] linux: Add pidfd_getpid Adhemerval Zanella
@ 2023-08-24  7:53   ` Florian Weimer
  0 siblings, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2023-08-24  7:53 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

* Adhemerval Zanella:

> diff --git a/sysdeps/unix/sysv/linux/clone-pidfd-support.c b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
> index 4411e2b9ea..45378957b0 100644
> --- a/sysdeps/unix/sysv/linux/clone-pidfd-support.c
> +++ b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
> @@ -56,5 +56,5 @@ __clone_pidfd_supported (void)
>        atomic_store_relaxed (&__waitid_pidfd_supported, state);
>      }
>  
> -  return state > 1;
> +  return state > 0;
>  }
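The hunk above changes how a cached probe result is interpreted. The sketch below shows the lazily cached, tri-state probe idiom. Its encoding (0 = not probed yet, 1 = supported, -1 = unsupported) and the probe itself are assumptions for illustration, not a copy of `clone-pidfd-support.c`. The probe calls `waitid` with `P_PIDFD` on the never-allocated descriptor number `INT_MAX`, and is expected to fail with `EBADF` when `P_PIDFD` is understood and with `EINVAL` otherwise. With that encoding, `state > 1` could never be true.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed encoding: 0 = not probed yet, 1 = supported, -1 = unsupported.  */
static atomic_int pidfd_wait_state;

static bool
waitid_pidfd_supported (void)
{
  int state = atomic_load_explicit (&pidfd_wait_state, memory_order_relaxed);
  if (state == 0)
    {
      /* 3 is P_PIDFD in the Linux UAPI.  INT_MAX is above the kernel's
         maximum fd number, so it is never allocated: kernels that know
         P_PIDFD fail with EBADF, older ones reject the idtype with EINVAL.
         The raw syscall avoids the cancellation point in waitid ().  */
      long r = syscall (SYS_waitid, 3, INT_MAX, NULL, WEXITED, NULL);
      state = (r == -1 && errno == EBADF) ? 1 : -1;
      /* Racing threads compute the same answer, so relaxed order suffices.  */
      atomic_store_explicit (&pidfd_wait_state, state, memory_order_relaxed);
    }
  return state > 0;
}
```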

Ugh, this should go into the previous patch.

Rest looks okay.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-24  7:25       ` Florian Weimer
@ 2023-08-24 12:21         ` Rich Felker
  0 siblings, 0 replies; 29+ messages in thread
From: Rich Felker @ 2023-08-24 12:21 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Adhemerval Zanella, libc-alpha

On Thu, Aug 24, 2023 at 09:25:05AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Mon, Aug 21, 2023 at 08:53:53AM +0200, Florian Weimer wrote:
> >> I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the
> >> creation of subprocesses that are difficult to observe by accident from
> >> the rest of the process, while obtaining a stable identifier for the
> >> process.  I do not think there is any other way to achieve that.  I
> >> think it's desirable to expose this functionality in some way.
> >
> > Indeed that seems like useful functionality to expose for cases where
> > you can't fix some bad code, but there are lots of issues with how
> 
> It's more about providing a conforming implementation of functions in the
> wait family in case the implementation needs to create processes for
> some reason.  It's not about “bad code”; these are standard POSIX
> interfaces.

You're talking about the implementation (exit_signal=0 is very useful
for this) and I thought we were talking about interfaces to expose to
the application (where yes calling wait is a standard interface you
can use, but the standard consequence of "bad code" doing that when
another part of the program might have pids it doesn't want reaped is
that things blow up and you get to keep both pieces).

> > clone3 (and even clone) should behave with respect to the child
> > environment when you don't exec -- is it _Fork-like or fork-like? Can
> > you use AS-unsafe interfaces in the child of a MT parent? Etc. This
> > should all be discussed on libc-coord not libc-alpha, IMO.
> 
> Seems more like linux-api material, given that it's Linux-specific?

Perhaps.

Rich

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-24  7:13   ` Florian Weimer
@ 2023-08-24 15:43     ` Adhemerval Zanella Netto
  2023-08-24 17:00       ` Florian Weimer
  0 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-24 15:43 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha



On 24/08/23 04:13, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> Returning a pidfd allows a process to keep a race-free handle for a
>> child process, otherwise, the caller will need to either use pidfd_open
>> (which still might be subject to TOCTOU) or keep the old racy interface
>> based on pid_t.
>>
>> The implementation makes sure that kernel must support the complete
>> pidfd interface, meaning that waitid (P_PIDFD) should be supported
>> (added on Linux 5.4).  It ensures that a non-racy workaround is required
>> (such as reading procfs fdinfo pid to use along with wait interfaces).
> 
> Sorry, I don't understand the second sentence.

It is indeed confusing, I will change to:

To correctly use pidfd_spawn, the kernel must support not only returning
the pidfd with clone/clone3 but also waitid (P_PIDFD) (added in Linux 5.4).
If the kernel does not support waitid (P_PIDFD), pidfd_spawn returns ENOSYS.
This avoids the need for racy workarounds, such as reading the procfs fdinfo
to get the pid to use along with other wait interfaces.
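From the caller's side, the interface described here means a raw pid never has to be handled at all: spawn, wait on the descriptor with `waitid (P_PIDFD, ...)`, then close it. A minimal sketch, assuming glibc 2.39 or later (the release in which `pidfd_spawn` and `pidfd_spawnp` appeared) and Linux 5.4 or later. `run_with_pidfd` is a hypothetical helper, and returning a negated error number is a convention of this example only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run ARGV (looked up in PATH) via pidfd_spawnp and
   wait on the returned pidfd.  Returns the exit status (128 + signal
   number if killed), or a negated error number.  */
static int
run_with_pidfd (char *const argv[])
{
#if defined __GLIBC__ && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 39))
  int pidfd;
  int err = pidfd_spawnp (&pidfd, argv[0], NULL, NULL, argv, environ);
  if (err != 0)
    return -err;       /* e.g. ENOSYS without waitid (P_PIDFD) support */

  siginfo_t info;
  int r;
  do
    r = waitid (P_PIDFD, pidfd, &info, WEXITED);   /* no pid involved */
  while (r < 0 && errno == EINTR);
  int saved = errno;
  close (pidfd);
  if (r < 0)
    return -saved;
  return info.si_code == CLD_EXITED ? info.si_status : 128 + info.si_status;
#else
  (void) argv;
  return -ENOSYS;      /* glibc too old to provide pidfd_spawnp */
#endif
}
```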

> 
>> diff --git a/posix/tst-spawn3.c b/posix/tst-spawn3.c
>> index e7ce0fb386..64052dc911 100644
>> --- a/posix/tst-spawn3.c
>> +++ b/posix/tst-spawn3.c
>> @@ -16,6 +16,7 @@
>>     License along with the GNU C Library; if not, see
>>     <https://www.gnu.org/licenses/>.  */
>>  
>> +#include <assert.h>
> 
> Please use TEST_VERIFY_EXIT, see below.
> 
>> @@ -75,75 +78,82 @@ do_test (void)
>>  	    FAIL_EXIT1 ("create_temp_file: %m");
>>  	  break;
>>  	}
>> -      files[nfiles++] = fd;
>> +      files[nfiles] = fd;
>>      }
>> +  assert (nfiles != 0);
> 
> TEST_VERIFY_EXIT (nfiles != 0);

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
>> index a3aa020d5c..3254cfe9be 100644
>> --- a/sysdeps/unix/sysv/linux/bits/spawn_ext.h
>> +++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
>> @@ -37,4 +37,35 @@ extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__attr,
>>  
>>  #endif /* __USE_MISC */
>>  
>> +#ifdef __USE_GNU
> 
> Please use __USE_MISC, so this is available with _DEFAULT_SOURCE (like
> the cgroups functions).

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
>> index f0d4c62ae6..d4ff23d955 100644
>> --- a/sysdeps/unix/sysv/linux/spawni.c
>> +++ b/sysdeps/unix/sysv/linux/spawni.c
> 
>>    internal_signal_block_all (&args.oldmask);
>> @@ -386,13 +399,16 @@ __spawnix (pid_t * pid, const char *file,
>>        /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
>>  	 __clone_internal_fallback.  */
>>        .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
>> +	       | (use_pidfd ? CLONE_PIDFD : 0)
>>  	       | CLONE_CLEAR_SIGHAND
>>  	       | CLONE_VM
>>  	       | CLONE_VFORK,
>>        .exit_signal = SIGCHLD,
>>        .stack = (uintptr_t) stack,
>>        .stack_size = stack_size,
>> -      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
>> +      .cgroup = (set_cgroup ? attrp->__cgroup : 0),
>> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
> 
> The .parent_tid line looks wrong?

It is required for clone (and that's why you can't use CLONE_PIDFD with
CLONE_PARENT_SETTID). It could only set parent_tid on clone fallback,
but I think this is simpler.  I will add a comment.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-24 15:43     ` Adhemerval Zanella Netto
@ 2023-08-24 17:00       ` Florian Weimer
  2023-08-24 17:10         ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2023-08-24 17:00 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

* Adhemerval Zanella Netto:

> On 24/08/23 04:13, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>> Returning a pidfd allows a process to keep a race-free handle for a
>>> child process, otherwise, the caller will need to either use pidfd_open
>>> (which still might be subject to TOCTOU) or keep the old racy interface
>>> based on pid_t.
>>>
>>> The implementation makes sure that kernel must support the complete
>>> pidfd interface, meaning that waitid (P_PIDFD) should be supported
>>> (added on Linux 5.4).  It ensures that a non-racy workaround is required
>>> (such as reading procfs fdinfo pid to use along with wait interfaces).
>> 
>> Sorry, I don't understand the second sentence.
>
> It is indeed confusing, I will change to:
>
> To correctly use pidfd_spawn, the kernel must support not only returning
> the pidfd with clone/clone3 but also waitid (P_PIDFD) (added in Linux 5.4).
> If the kernel does not support waitid (P_PIDFD), pidfd_spawn returns ENOSYS.
> This avoids the need for racy workarounds, such as reading the procfs fdinfo
> to get the pid to use along with other wait interfaces.

Okay.

>>> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
>>> index f0d4c62ae6..d4ff23d955 100644
>>> --- a/sysdeps/unix/sysv/linux/spawni.c
>>> +++ b/sysdeps/unix/sysv/linux/spawni.c
>> 
>>>    internal_signal_block_all (&args.oldmask);
>>> @@ -386,13 +399,16 @@ __spawnix (pid_t * pid, const char *file,
>>>        /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
>>>  	 __clone_internal_fallback.  */
>>>        .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
>>> +	       | (use_pidfd ? CLONE_PIDFD : 0)
>>>  	       | CLONE_CLEAR_SIGHAND
>>>  	       | CLONE_VM
>>>  	       | CLONE_VFORK,
>>>        .exit_signal = SIGCHLD,
>>>        .stack = (uintptr_t) stack,
>>>        .stack_size = stack_size,
>>> -      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
>>> +      .cgroup = (set_cgroup ? attrp->__cgroup : 0),
>>> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>> 
>> The .parent_tid line looks wrong?
>
> It is required for clone (and that's why you can't use CLONE_PIDFD with
> CLONE_PARENT_SETTID). It could only set parent_tid on clone fallback,
> but I think this is simpler.  I will add a comment.

Please use a separate variable, not args.pidfd, though.  The current
code depends on the order the kernel sets these fields, I think.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-24 17:00       ` Florian Weimer
@ 2023-08-24 17:10         ` Adhemerval Zanella Netto
  2023-08-24 18:18           ` Florian Weimer
  0 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-24 17:10 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha



On 24/08/23 14:00, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> On 24/08/23 04:13, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> Returning a pidfd allows a process to keep a race-free handle for a
>>>> child process, otherwise, the caller will need to either use pidfd_open
>>>> (which still might be subject to TOCTOU) or keep the old racy interface
>>>> based on pid_t.
>>>>
>>>> The implementation makes sure that kernel must support the complete
>>>> pidfd interface, meaning that waitid (P_PIDFD) should be supported
>>>> (added on Linux 5.4).  It ensures that a non-racy workaround is required
>>>> (such as reading procfs fdinfo pid to use along with wait interfaces).
>>>
>>> Sorry, I don't understand the second sentence.
>>
>> It is indeed confusing, I will change to:
>>
>> To correctly use pidfd_spawn, the kernel must support not only returning
>> the pidfd with clone/clone3 but also waitid (P_PIDFD) (added in Linux 5.4).
>> If the kernel does not support waitid (P_PIDFD), pidfd_spawn returns ENOSYS.
>> This avoids the need for racy workarounds, such as reading the procfs fdinfo
>> to get the pid to use along with other wait interfaces.
> 
> Okay.
> 
>>>> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
>>>> index f0d4c62ae6..d4ff23d955 100644
>>>> --- a/sysdeps/unix/sysv/linux/spawni.c
>>>> +++ b/sysdeps/unix/sysv/linux/spawni.c
>>>
>>>>    internal_signal_block_all (&args.oldmask);
>>>> @@ -386,13 +399,16 @@ __spawnix (pid_t * pid, const char *file,
>>>>        /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
>>>>  	 __clone_internal_fallback.  */
>>>>        .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
>>>> +	       | (use_pidfd ? CLONE_PIDFD : 0)
>>>>  	       | CLONE_CLEAR_SIGHAND
>>>>  	       | CLONE_VM
>>>>  	       | CLONE_VFORK,
>>>>        .exit_signal = SIGCHLD,
>>>>        .stack = (uintptr_t) stack,
>>>>        .stack_size = stack_size, 
>>>> -      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
>>>> +      .cgroup = (set_cgroup ? attrp->__cgroup : 0),
>>>> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>>> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>>
>>> The .parent_tid line looks wrong?
>>
>> It is required for clone (and that's why you can't use CLONE_PIDFD with
>> CLONE_PARENT_SETTID). It could only set parent_tid on clone fallback,
>> but I think this is simpler.  I will add a comment.
> 
> Please use a separate variable, not args.pidfd, though.  The current
> code depends on the order the kernel sets these fields, I think.

I can move the pidfd out of posix_spawn_args, but I don't think this
would change much here.  It would be a stack-allocated variable in both
cases, and the kernel only sets it when CLONE_PIDFD is used (afaik
there is no ordering involved here).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-24 17:10         ` Adhemerval Zanella Netto
@ 2023-08-24 18:18           ` Florian Weimer
  2023-08-24 18:22             ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2023-08-24 18:18 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

* Adhemerval Zanella Netto:

> On 24/08/23 14:00, Florian Weimer wrote:
>> Please use a separate variable, not args.pidfd, though.  The current
>> code depends on the order the kernel sets these fields, I think.
>
> I can move the pidfd out of posix_spawn_args, but I don't think this
> would change much here.  It would be a stack-allocated variable in both
> cases, and the kernel only sets it when CLONE_PIDFD is used (afaik
> there is no ordering involved here).

I'm concerned about this:

>>>>> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>>>> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,

&args.pidfd in both cases.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-24 18:18           ` Florian Weimer
@ 2023-08-24 18:22             ` Adhemerval Zanella Netto
  2023-08-25 10:38               ` Florian Weimer
  0 siblings, 1 reply; 29+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-24 18:22 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha



On 24/08/23 15:18, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> On 24/08/23 14:00, Florian Weimer wrote:
>>> Please use a separate variable, not args.pidfd, though.  The current
>>> code depends on the order the kernel sets these fields, I think.
>>
>> I can move the pidfd out of posix_spawn_args, but I don't think this
>> would change much here.  It would be a stack-allocated variable in both
>> cases, and the kernel only sets it when CLONE_PIDFD is used (afaik
>> there is no ordering involved here).
> 
> I'm concerned about this:
> 
>>>>>> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>>>>> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
> 
> &args.pidfd in both cases.

My understanding is that since it does not use CLONE_PARENT_SETTID, if
clone3 is called it will ignore parent_tid and only set pidfd.  If clone
is called, there is no pidfd argument, and the pidfd will be returned in
the parent_tid argument.  The kernel explicitly rejects combining
CLONE_PIDFD with CLONE_PARENT_SETTID:

kernel/fork.c

	/*
	 * For legacy clone() calls, CLONE_PIDFD uses the parent_tid argument
	 * to return the pidfd. Hence, CLONE_PIDFD and CLONE_PARENT_SETTID are
	 * mutually exclusive. With clone3() CLONE_PIDFD has grown a separate
	 * field in struct clone_args and it still doesn't make sense to have
	 * them both point at the same memory location. Performing this check
	 * here has the advantage that we don't need to have a separate helper
	 * to check for legacy clone().
	 */
	if ((args->flags & CLONE_PIDFD) &&
	    (args->flags & CLONE_PARENT_SETTID) &&
	    (args->pidfd == args->parent_tid))
		return -EINVAL;
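The kernel check quoted above explains why the legacy path must point `parent_tid` at the pidfd storage. As a standalone illustration (not the `__spawnix` code), glibc's `clone` wrapper can be called with `CLONE_PIDFD`, passing the pidfd location as the `parent_tid` argument. Assumptions: Linux 5.2 or later for the flag to take effect, a downward-growing stack, and the fallback `#define` only for headers that predate `CLONE_PIDFD`. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_PIDFD
# define CLONE_PIDFD 0x00001000     /* value from the Linux UAPI headers */
#endif

static int
child_main (void *arg)
{
  return *(int *) arg;              /* becomes the child's exit status */
}

/* Hypothetical helper: legacy clone () with CLONE_PIDFD.  The kernel
   stores the new pidfd through the parent_tid pointer, so this must not
   be combined with CLONE_PARENT_SETTID.  Returns the child's exit status
   (or -errno) and stores the pidfd, or -1, in *PIDFD_OUT.  */
static int
clone_with_pidfd (int code, int *pidfd_out)
{
  const size_t stack_size = 64 * 1024;
  char *stack = malloc (stack_size);
  *pidfd_out = -1;
  if (stack == NULL)
    return -ENOMEM;

  int pidfd = -1;
  /* Assumes the stack grows down, so pass the top of the buffer.  */
  pid_t pid = clone (child_main, stack + stack_size, CLONE_PIDFD | SIGCHLD,
                     &code, &pidfd /* parent_tid */, NULL /* tls */,
                     NULL /* child_tid */);
  if (pid < 0)
    {
      int err = errno;
      free (stack);
      return -err;
    }
  *pidfd_out = pidfd;

  int status;
  pid_t r;
  do
    r = waitpid (pid, &status, 0);
  while (r < 0 && errno == EINTR);
  int err = errno;
  free (stack);       /* without CLONE_VM the child ran on its own copy */
  if (r < 0)
    return -err;
  return WIFEXITED (status) ? WEXITSTATUS (status) : 128 + WTERMSIG (status);
}
```

With clone3 the same value comes back through the separate `pidfd` field of `struct clone_args`, so `parent_tid` only needs to be set on the fallback path.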

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-24 18:22             ` Adhemerval Zanella Netto
@ 2023-08-25 10:38               ` Florian Weimer
  2023-08-25 16:37                 ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2023-08-25 10:38 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

* Adhemerval Zanella Netto:

> On 24/08/23 15:18, Florian Weimer wrote:
>> * Adhemerval Zanella Netto:
>> 
>>> On 24/08/23 14:00, Florian Weimer wrote:
>>>> Please use a separate variable, not args.pidfd, though.  The current
>>>> code depends on the order the kernel sets these fields, I think.
>>>
>>> I can move the pidfd out of posix_spawn_args, but I don't think this
>>> would change much here.  It would be a stack-allocated variable in both
>>> cases, and the kernel only sets it when CLONE_PIDFD is used (afaik
>>> there is no ordering involved here).
>> 
>> I'm concerned about this:
>> 
>>>>>>> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>>>>>> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>> 
>> &args.pidfd in both cases.
>
> My understanding is that since it does not use CLONE_PARENT_SETTID, if
> clone3 is called it will ignore parent_tid and only set pidfd.  If clone
> is called, there is no pidfd argument, and the pidfd will be returned in
> the parent_tid argument.  The kernel explicitly rejects combining
> CLONE_PIDFD with CLONE_PARENT_SETTID:
>
> kernel/fork.c
>
> 	/*
> 	 * For legacy clone() calls, CLONE_PIDFD uses the parent_tid argument
> 	 * to return the pidfd. Hence, CLONE_PIDFD and CLONE_PARENT_SETTID are
> 	 * mutually exclusive. With clone3() CLONE_PIDFD has grown a separate
> 	 * field in struct clone_args and it still doesn't make sense to have
> 	 * them both point at the same memory location. Performing this check
> 	 * here has the advantage that we don't need to have a separate helper
> 	 * to check for legacy clone().
> 	 */
> 	if ((args->flags & CLONE_PIDFD) &&
> 	    (args->flags & CLONE_PARENT_SETTID) &&
> 	    (args->pidfd == args->parent_tid))
> 		return -EINVAL;

I still think the condition would be better written as

  /* Legacy clone writes the pidfd there with CLONE_PIDFD.  */
  if (!args.use_clone3)
    args.parent_tid = (uintptr_t) &args.pidfd;

after the initialization of args.use_clone3.  It makes the intent very
clear, at least to me.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-25 10:38               ` Florian Weimer
@ 2023-08-25 16:37                 ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 29+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-25 16:37 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha



On 25/08/23 07:38, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> On 24/08/23 15:18, Florian Weimer wrote:
>>> * Adhemerval Zanella Netto:
>>>
>>>> On 24/08/23 14:00, Florian Weimer wrote:
>>>>> Please use a separate variable, not args.pidfd, though.  The current
>>>>> code depends on the order the kernel sets these fields, I think.
>>>>
>>>> I can move the pidfd out of posix_spawn_args, but I don't think this
>>>> would change much here.  It would be a stack-allocated variable in both
>>>> cases, and the kernel only sets it when CLONE_PIDFD is used (afaik
>>>> there is no ordering involved here).
>>>
>>> I'm concerned about this:
>>>
>>>>>>>> +      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>>>>>>> +      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
>>>
>>> &args.pidfd in both cases.
>>
>> My understanding is that since it does not use CLONE_PARENT_SETTID, if
>> clone3 is called it will ignore parent_tid and only set pidfd.  If clone
>> is called, there is no pidfd argument, and the pidfd will be returned in
>> the parent_tid argument.  The kernel explicitly rejects combining
>> CLONE_PIDFD with CLONE_PARENT_SETTID:
>>
>> kernel/fork.c
>>
>> 	/*
>> 	 * For legacy clone() calls, CLONE_PIDFD uses the parent_tid argument
>> 	 * to return the pidfd. Hence, CLONE_PIDFD and CLONE_PARENT_SETTID are
>> 	 * mutually exclusive. With clone3() CLONE_PIDFD has grown a separate
>> 	 * field in struct clone_args and it still doesn't make sense to have
>> 	 * them both point at the same memory location. Performing this check
>> 	 * here has the advantage that we don't need to have a separate helper
>> 	 * to check for legacy clone().
>> 	 */
>> 	if ((args->flags & CLONE_PIDFD) &&
>> 	    (args->flags & CLONE_PARENT_SETTID) &&
>> 	    (args->pidfd == args->parent_tid))
>> 		return -EINVAL;
> 
> I still think the condition would be better written as
> 
>   /* Legacy clone writes the pidfd there with CLONE_PIDFD.  */
>   if (!args.use_clone3)
>     args.parent_tid = (uintptr_t) &args.pidfd;
> 
> after the initialization of args.use_clone3.  It makes the intent very
> clear, at least to me.

Alright, I changed to:

#ifdef HAVE_CLONE3_WRAPPER
  args.use_clone3 = true;
  new_pid = __clone3 (&clone_args, sizeof (clone_args), __spawni_child,
                      &args);
  /* clone3 was added in 5.3 and CLONE_CLEAR_SIGHAND in 5.5.  */
  if (new_pid == -1 && (errno == ENOSYS || errno == EINVAL))
#endif
    {
      /* Legacy clone writes the pidfd there with CLONE_PIDFD.  */
      clone_args.parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0;
      args.use_clone3 = false;
      if (!set_cgroup)
        new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
                                             &args);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-18 18:34   ` Adhemerval Zanella Netto
@ 2023-08-28 12:52     ` Luca Boccassi
  2023-08-28 13:21       ` Florian Weimer
  0 siblings, 1 reply; 29+ messages in thread
From: Luca Boccassi @ 2023-08-28 12:52 UTC (permalink / raw)
  To: libc-alpha


> On 18/08/23 14:51, Rich Felker wrote:
> > On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote:
> >> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd,
> >> and pidfd_send_signal, and exported the P_PIDFD to use along with
> >> waitid. The pidfd is a race-free interface, however, the pidfd_open is
> >> subject to TOCTOU if the file descriptor is not obtained directly from
> >> the clone or clone3 syscall (there is still a small window between the
> >> clone return and the pidfd_getfd where the process can be reaped and the
> >> process ID reused).
> > 
> > Unless I'm missing something, that window is purely programmer error.
> > The pid belongs to the parent process, that called fork, posix_spawn,
> > clone, or whatever, and is responsible for not freeing it until it's
> > done using it.
> > 
> > Yes this can happen if you install a SIGCHLD handler that reaps
> > anything it sees, or if you're calling wait without a pid. This is a
> > programming error. If you're stuck with code outside your control that
> > makes that mistake, you can already avoid it with clone by setting the
> > child exit signal to 0 rather than SIGCHLD. But it's best just not to
> > do that.
> > 
> 
> Yes, this is the issue GNOME is having with their code base [1] and that
> motivated this new interface.  Systemd also seems to be interested in
> these interfaces, although I am not sure if it is also subject to the
> same issue.
> 
> I don't have a strong opinion whether this should be considered a solid
> reason to provide a new API; another option would be to close BZ#30349
> [2] as wontfix with this rationale.  However, this does not really
> provide a workaround, and worse, it will pass the idea that to fully
> resolve it you will need either to allow the racy condition or issue
> clone directly.
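Rich's suggested workaround can be sketched as follows, assuming a raw clone call with no flags at all, which gives the child an exit signal of 0 instead of SIGCHLD. The kernel treats such a child as a "clone" child: waitpid (-1, ...) without __WCLONE (or __WALL) ignores it, so a catch-all reaper cannot reap it and let its pid be reused, and the owner has to pass __WCLONE when it reaps the child itself. The helper names are hypothetical. Passing 0 for every argument keeps the call independent of the per-architecture clone argument order, and the child only calls _exit because a raw clone bypasses glibc's fork handling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid, __WCLONE */
#include <unistd.h>

/* Hypothetical helper: fork-like child whose exit signal is 0 instead of
   SIGCHLD (flags == 0, stack == 0).  Returns the child pid, or -1.  */
static pid_t
spawn_quiet_child (int exit_code)
{
  long ret = syscall (SYS_clone, 0UL, 0UL, 0UL, 0UL, 0UL);
  if (ret == 0)
    _exit (exit_code);   /* Child: do nothing but exit.  */
  return (pid_t) ret;
}

/* A careless catch-all reaper, as in a SIGCHLD handler that reaps
   anything it sees.  Without __WCLONE it only considers children whose
   exit signal is SIGCHLD.  */
static pid_t
reap_anything (void)
{
  int status;
  return waitpid (-1, &status, WNOHANG);
}

/* The owner reaps its own child explicitly, passing __WCLONE.
   Returns the child's exit status, or -1 on error.  */
static int
reap_quiet_child (pid_t pid)
{
  int status;
  if (waitpid (pid, &status, __WCLONE) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```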

These are real race conditions that cannot be solved otherwise;
characterizing them as 'programming errors' is very misleading and
wrong.

We very much need both of those interfaces in systemd, and fully intend
to use them as soon as they are available. We are slowly moving towards
using pidfds everywhere to be able to do end-to-end race-free process
tracking and management, and these are fundamental pieces for this
effort. From what I can read, the GNOME developers feel the same way,
and I wouldn't be surprised if Qt followed suit too, given what you
mentioned in the cover letter.

Surely implementing useful, core functionality for the direct and
immediate benefit of 3 major Linux projects is a reason as solid as you
could ever find to add a new interface.
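As a small illustration of pidfd-based process management, once a parent holds a pidfd it can both signal and reap the child through that descriptor, so neither step can reach an unrelated process that reused the pid. This is a sketch under stated assumptions, not systemd code: the helper names are hypothetical, it uses raw syscalls so it also builds against glibc older than 2.36 (which adds pidfd_open and pidfd_send_signal wrappers in <sys/pidfd.h> and exports P_PIDFD), it hard-codes the kernel ABI value 3 for P_PIDFD, and waitid on a pidfd needs Linux 5.4 or later.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* P_PIDFD from the kernel ABI (<linux/wait.h>); glibc >= 2.36 also
   provides it in <sys/wait.h>.  */
#define IDTYPE_PIDFD ((idtype_t) 3)

/* Hypothetical raw-syscall wrappers; glibc >= 2.36 provides pidfd_open
   and pidfd_send_signal in <sys/pidfd.h>.  */
static int
sys_pidfd_open (pid_t pid)
{
  return (int) syscall (SYS_pidfd_open, pid, 0U);
}

static int
sys_pidfd_send_signal (int pidfd, int sig)
{
  return (int) syscall (SYS_pidfd_send_signal, pidfd, sig, NULL, 0U);
}

/* Hypothetical helper: kill and reap a child using only its pidfd.
   Returns the signal that terminated the child, or -1 on error.  */
static int
kill_and_reap (int pidfd)
{
  siginfo_t info;
  if (sys_pidfd_send_signal (pidfd, SIGKILL) != 0)
    return -1;
  if (waitid (IDTYPE_PIDFD, (id_t) pidfd, &info, WEXITED) != 0)
    return -1;
  return info.si_code == CLD_KILLED ? info.si_status : -1;
}
```

For example, a parent that has just forked a child, and has no catch-all reaper installed, can call sys_pidfd_open (pid) and then kill_and_reap; pidfd_spawn from this series would hand back the pidfd directly and remove even that dependency.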

-- 
Kind regards,
Luca Boccassi

* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-28 12:52     ` Luca Boccassi
@ 2023-08-28 13:21       ` Florian Weimer
  2023-08-28 13:50         ` Luca Boccassi
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2023-08-28 13:21 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: libc-alpha

* Luca Boccassi:

> These are real race conditions that cannot be solved otherwise;
> characterizing them as 'programming errors' is very misleading and
> wrong.
>
> We very much need both of those interfaces in systemd, and fully intend
> to use them as soon as they are available. We are slowly moving towards
> using pidfds everywhere to be able to do end-to-end race-free process
> tracking and management, and these are fundamental pieces for this
> effort. From what I can read, the GNOME developers feel the same way,
> and I wouldn't be surprised if Qt followed suit too, given what you
> mentioned in the cover letter.
>
> Surely implementing useful, core functionality for the direct and
> immediate benefit of 3 major Linux projects is a reason as solid as you
> could ever find to add a new interface.

I see value in adding fork support, too.

The fundamental issue with the fork part is that it's not future-proof
at all.  The programming model is completely different from the kernel
interface.

If I start a discussion about API alternatives, who should I Cc: except
you and Rich?

Thanks,
Florian


* Re: [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
  2023-08-28 13:21       ` Florian Weimer
@ 2023-08-28 13:50         ` Luca Boccassi
  0 siblings, 0 replies; 29+ messages in thread
From: Luca Boccassi @ 2023-08-28 13:50 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Mon, 28 Aug 2023 at 14:21, Florian Weimer <fweimer@redhat.com> wrote:
>
> * Luca Boccassi:
>
> > These are real race conditions that cannot be solved otherwise;
> > characterizing them as 'programming errors' is very misleading and
> > wrong.
> >
> > We very much need both of those interfaces in systemd, and fully intend
> > to use them as soon as they are available. We are slowly moving towards
> > using pidfds everywhere to be able to do end-to-end race-free process
> > tracking and management, and these are fundamental pieces for this
> > effort. From what I can read, the GNOME developers feel the same way,
> > and I wouldn't be surprised if Qt followed suit too, given what you
> > mentioned in the cover letter.
> >
> > Surely implementing useful, core functionality for the direct and
> > immediate benefit of 3 major Linux projects is a reason as solid as you
> > could ever find to add a new interface.
>
> I see value in adding fork support, too.
>
> The fundamental issue with the fork part is that it's not future-proof
> at all.  The programming model is completely different from the kernel
> interface.

Sure, that would be nice too. It is less urgent for us, as we are
moving towards spawn(), so for me it seems fine if that is handled
separately from this series.

> If I start a discussion about API alternatives, who should I Cc: except
> you and Rich?

Nobody else for now on my side, thanks.

end of thread, other threads:[~2023-08-28 13:51 UTC | newest]

Thread overview: 29+ messages
2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
2023-08-18 14:06 ` [PATCH v8 1/7] arm: Add the clone3 wrapper Adhemerval Zanella
2023-08-18 14:06 ` [PATCH v8 2/7] mips: " Adhemerval Zanella
2023-08-18 14:06 ` [PATCH v8 3/7] linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
2023-08-24  6:06   ` Florian Weimer
2023-08-18 14:06 ` [PATCH v8 4/7] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
2023-08-24  7:00   ` Florian Weimer
2023-08-18 14:06 ` [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
2023-08-24  7:13   ` Florian Weimer
2023-08-24 15:43     ` Adhemerval Zanella Netto
2023-08-24 17:00       ` Florian Weimer
2023-08-24 17:10         ` Adhemerval Zanella Netto
2023-08-24 18:18           ` Florian Weimer
2023-08-24 18:22             ` Adhemerval Zanella Netto
2023-08-25 10:38               ` Florian Weimer
2023-08-25 16:37                 ` Adhemerval Zanella Netto
2023-08-18 14:06 ` [PATCH v8 6/7] posix: Add fork_np (BZ 26371) Adhemerval Zanella
2023-08-24  6:07   ` Florian Weimer
2023-08-18 14:06 ` [PATCH v8 7/7] linux: Add pidfd_getpid Adhemerval Zanella
2023-08-24  7:53   ` Florian Weimer
2023-08-18 17:51 ` [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Rich Felker
2023-08-18 18:34   ` Adhemerval Zanella Netto
2023-08-28 12:52     ` Luca Boccassi
2023-08-28 13:21       ` Florian Weimer
2023-08-28 13:50         ` Luca Boccassi
2023-08-21  6:53   ` Florian Weimer
2023-08-21 13:55     ` Rich Felker
2023-08-24  7:25       ` Florian Weimer
2023-08-24 12:21         ` Rich Felker
