public inbox for libc-alpha@sourceware.org
* [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation
@ 2023-08-03 16:35 Adhemerval Zanella
  2023-08-03 16:35 ` [PATCH v7 1/8] arm: Add the clone3 wrapper Adhemerval Zanella
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

glibc 2.36 added wrappers for the Linux syscalls pidfd_open,
pidfd_getfd, and pidfd_send_signal, and exported P_PIDFD for use
with waitid.  The pidfd interface is race free; however,
pidfd_open is subject to TOCTOU if the file descriptor
is not obtained directly from the clone or clone3 syscall (there is
still a small window between the clone return and the pidfd_open
call in which the process can be reaped and the process ID reused).
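
As a minimal sketch of the race-free path described above, the fragment
below asks clone3 for the pidfd at creation time (CLONE_PIDFD) and then
waits through it with waitid (P_PIDFD).  It is an illustration only, not
code from this series: the struct mirror, the fallback constants, and the
name fork_and_wait_by_pidfd are assumptions made for the example, and the
raw syscall deliberately has the same limitations listed for Qt below (no
atfork handlers, no internal state reset).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older headers (values assumed from the Linux UAPI).  */
#ifndef SYS_clone3
# define SYS_clone3 435
#endif
#ifndef CLONE_PIDFD
# define CLONE_PIDFD 0x00001000
#endif
#ifndef P_PIDFD
# define P_PIDFD 3
#endif

/* Local mirror of the first 64 bytes of the kernel's struct clone_args,
   declared here so the sketch does not depend on <linux/sched.h>.  */
struct clone3_args_v0
{
  uint64_t flags, pidfd, child_tid, parent_tid;
  uint64_t exit_signal, stack, stack_size, tls;
};

/* Fork a child that exits with CODE and wait for it through the pidfd
   that clone3 returned.  Returns the exit status, or -errno on failure.  */
static int
fork_and_wait_by_pidfd (int code)
{
  assert (sizeof (struct clone3_args_v0) == 64);

  int pidfd = -1;
  struct clone3_args_v0 args;
  memset (&args, 0, sizeof args);
  args.flags = CLONE_PIDFD;
  args.pidfd = (uint64_t) (uintptr_t) &pidfd;  /* Kernel stores the fd here.  */
  args.exit_signal = SIGCHLD;

  long pid = syscall (SYS_clone3, &args, sizeof args);
  if (pid < 0)
    return -errno;
  if (pid == 0)
    _exit (code);               /* Child: keep to async-signal-safe calls.  */

  siginfo_t si;
  memset (&si, 0, sizeof si);
  int r;
  do
    r = waitid (P_PIDFD, pidfd, &si, WEXITED);
  while (r < 0 && errno == EINTR);
  int err = errno;
  close (pidfd);
  if (r < 0)
    {
      waitpid ((pid_t) pid, NULL, 0);   /* Reap by PID as a last resort.  */
      return -err;
    }
  return si.si_code == CLD_EXITED ? si.si_status : -ECHILD;
}
```

Because the descriptor is created together with the process, there is no
point at which the PID could be recycled before the descriptor refers to it.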

A fully race-free interface modeled on posix_spawn is being
discussed by GNOME [1] [2], and Qt already uses one in its QProcess
implementation [3].  The Qt implementation has some pitfalls:

  - It calls clone through the syscall symbol, which does not run the
    pthread_atfork handlers even though it really intends to use the
    clone semantic for fork (by only using CLONE_PIDFD | SIGCHLD).

  - It also does not reset any internal state, such as internal IO,
    malloc, loader, etc. locks.

  - It does not set the TCB tid field nor the robust list, used by
    pthread code.

  - It does not optimize process creation by using CLONE_VM and
    CLONE_VFORK.

Also, recent Linux kernels (starting with 5.7) provide a way to
create a new process in a version 2 cgroup different from the
default one (through the clone3 CLONE_INTO_CGROUP flag).  Providing it
through glibc interfaces makes it usable without the risk of potential
breakage from issuing the clone3 syscall directly (check the BZ#26371
discussion).

This patchset adds new interfaces that take care of these potential
issues.  The new posix_spawn / posix_spawnp extensions:


  #define POSIX_SPAWN_SETCGROUP 0x100

  int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *restrict attr,
                                    int *cgroup);
  int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
                                    int cgroup);

These allow a new process to be spawned in a different cgroupv2.

pidfd_spawn and pidfd_spawnp are similar to posix_spawn and
posix_spawnp, but return a process file descriptor instead of a PID.

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict path,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict file,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict],
                    char *const envp[restrict]);

The implementation requires that the kernel support the complete
pidfd interface, meaning that waitid (P_PIDFD) must be supported.  This
ensures that no racy workaround is required (such as reading the procfs
fdinfo pid to use along with the old wait interfaces).  If the kernel does
not have the required support, the interface returns ENOSYS.
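
One way such a support check could look is sketched below.  It is an
assumption for illustration, not necessarily what the series implements:
it calls waitid with P_PIDFD on a descriptor number that is known to be
closed and tells "idtype understood" from "idtype rejected" by the error
code.  The function name and the EBADF/EINVAL interpretation are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/wait.h>

#ifndef P_PIDFD
# define P_PIDFD 3              /* Assumed value from the Linux UAPI.  */
#endif

/* Probe whether waitid understands P_PIDFD.  CLOSED_FD must be a
   non-negative descriptor number that is not open in this process.  */
static bool
waitid_pidfd_supported (int closed_fd)
{
  assert (closed_fd >= 0);

  siginfo_t si;
  memset (&si, 0, sizeof si);
  if (waitid (P_PIDFD, closed_fd, &si, WEXITED | WNOHANG) == 0)
    return true;                /* Unexpected, but the idtype was accepted.  */

  /* A kernel that knows P_PIDFD looks up the descriptor and reports EBADF;
     one that does not is assumed to reject the idtype itself (EINVAL).  */
  return errno == EBADF;
}
```

A caller could evaluate waitid_pidfd_supported (INT_MAX) once and return
ENOSYS from the spawn entry point when it is false.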

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.

Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid either duplicating the posix_spawn API or
adding a new one.  It also means that both interfaces support the same
attributes and file actions, and that a new flag or file action added to
posix_spawn is automatically available to pidfd_spawn.  This includes
POSIX_SPAWN_SETCGROUP.
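
A usage sketch of the proposed interface follows.  It assumes the
prototype shown above, assumes that the functions land in the release
named in the NEWS hunk of this series (2.39) and are declared in
<spawn.h> under _GNU_SOURCE, and assumes the posix_spawn convention of
returning an error number.  The helper name run_with_pidfd and the
HAVE_PIDFD_SPAWN macro are made up for the example; on older libraries
the helper simply reports ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: pidfd_spawn is available from glibc 2.39 on Linux.  */
#if defined __GLIBC__ && defined __linux__
# if __GLIBC_PREREQ (2, 39)
#  define HAVE_PIDFD_SPAWN 1
# endif
#endif

extern char **environ;

/* Run ARGV[0] and return its exit status, or -errno on failure.  */
static int
run_with_pidfd (char *const argv[])
{
  assert (argv != NULL && argv[0] != NULL);
#ifdef HAVE_PIDFD_SPAWN
  int pidfd;
  int err = pidfd_spawn (&pidfd, argv[0], NULL, NULL, argv, environ);
  if (err != 0)
    return -err;                /* Error number is the return value.  */

  siginfo_t si;
  memset (&si, 0, sizeof si);
  int r;
  do
    r = waitid (P_PIDFD, pidfd, &si, WEXITED);
  while (r < 0 && errno == EINTR);
  err = errno;
  close (pidfd);                /* The pidfd is an ordinary descriptor.  */
  if (r < 0)
    return -err;
  return si.si_code == CLD_EXITED ? si.si_status : -ECHILD;
#else
  return -ENOSYS;
#endif
}
```

The same file actions and attributes that would be passed to posix_spawn
can be passed in place of the two NULL arguments.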

Along with the spawn interface, a fork like one is also provided:

  pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)

If PIDFD is set to NULL, no file descriptor is returned and pidfd_fork
acts as fork.  Otherwise, a new file descriptor is returned, with
O_CLOEXEC already set by the kernel by default.  pidfd_fork follows the
fork/_Fork convention of returning a positive or negative value to the
parent (with negative indicating an error) and zero to the child.

If cgroup is 0 or a positive value, it is interpreted as a different
cgroup in which to place the new process (check the CLONE_INTO_CGROUP
clone flag).

Similar to fork, pidfd_fork also runs the pthread_atfork handlers.
This can be changed with the PIDFDFORK_ASYNCSAFE flag, which makes
pidfd_fork act as _Fork.  It also sends SIGCHLD to the parent when the
process terminates.

To interoperate between process IDs and process file descriptors,
pidfd_getpid is also provided:

   pid_t pidfd_getpid (int fd)

It reads the procfs fdinfo entry from the file descriptor to get
the process ID.
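
The lookup can be sketched as below.  This is not the implementation in
the series (the changelog notes that it avoids strtoul, and this sketch
uses stdio for brevity); the names pidfd_to_pid and open_self_pidfd are
hypothetical, and the mapping of "Pid: 0" and "Pid: -1" to EREMOTE and
ESRCH is an assumption based on the error codes listed in the v1
changelog below.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
# define SYS_pidfd_open 434     /* Assumed value from the Linux UAPI.  */
#endif

/* Helper for experiments: open a pidfd for the calling process.  */
static int
open_self_pidfd (void)
{
  return (int) syscall (SYS_pidfd_open, getpid (), 0);
}

/* Return the PID behind FD by parsing its procfs fdinfo entry, or -1
   with errno set.  */
static pid_t
pidfd_to_pid (int fd)
{
  if (fd < 0)
    {
      errno = EBADF;
      return -1;
    }

  char path[64];
  int n = snprintf (path, sizeof path, "/proc/self/fdinfo/%d", fd);
  assert (n > 0 && (size_t) n < sizeof path);

  FILE *f = fopen (path, "r");
  if (f == NULL)
    {
      errno = EBADF;            /* Not open, or procfs is unavailable.  */
      return -1;
    }

  char line[128];
  long pid = 0;
  int found = 0;
  while (!found && fgets (line, sizeof line, f) != NULL)
    found = sscanf (line, "Pid: %ld", &pid) == 1;
  fclose (f);

  if (!found)
    errno = EBADF;              /* Not a pidfd: no "Pid:" entry.  */
  else if (pid == 0)
    errno = EREMOTE;            /* Assumed: process in another namespace.  */
  else if (pid < 0)
    errno = ESRCH;              /* Assumed: process already terminated.  */
  else
    return (pid_t) pid;
  return -1;
}
```

For a descriptor returned by open_self_pidfd, the result is expected to
match getpid () when procfs is mounted for the same PID namespace.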

---

Changes from v6:
- Rebased against master, adjusted symbol version and NEWS entry.
- Added arm/mips clone3 implementation.

Changes from v5:
- Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.

Changes from v4:
- Changed pidfd_fork signature to return a pid_t instead of PID file
  descriptor.
- Changed pidfd_getpid to return EBADF for negative input, instead of
  EINVAL.
- Added PIDFDFORK_NOSIGCHLD option.
- Fixed nested __BEGIN_DECLS on spawn.h

Changes from v3:
- Remove strtoul usage.
- Fixed patchwork tst-pidfd_getpid.c regression.
- Fixed manual and NEWS typos.

Changes from v2:
- Added pidfd_fork and pidfd_getpid manual entries
- Changed pidfd_fork to act as fork by default, instead of as _Fork.
- Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
- Added pidfd_getpid test for EREMOTE.

Changes from v1:
- Extended pidfd_getpid error codes to return EBADF if fdinfo does not
  have a Pid entry or if the value is invalid, EREMOTE if the pid is in a
  separate namespace, and ESRCH if the process has already terminated.
- Extended tst-pidfd_getpid.
- Rename PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid clash
  with possible kernel extensions.

Adhemerval Zanella (8):
  arm: Add the clone3 wrapper
  mips: Add the clone3 wrapper
  linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc
  linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  posix: Add pidfd_fork (BZ 26371)
  posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork
  linux: Add pidfd_getpid

 NEWS                                          |  22 +-
 bits/spawn_ext.h                              |  21 ++
 include/clone_internal.h                      |  21 ++
 manual/process.texi                           |  92 ++++++-
 posix/Makefile                                |   5 +-
 posix/fork-internal.c                         | 127 ++++++++++
 posix/fork-internal.h                         |  36 +++
 posix/fork.c                                  | 107 +--------
 posix/spawn.h                                 |   6 +-
 posix/spawn_int.h                             |   3 +-
 posix/spawnattr_setflags.c                    |   3 +-
 posix/tst-posix_spawn-setsid.c                | 168 +++++++++----
 posix/tst-spawn-chdir.c                       |  15 +-
 posix/tst-spawn.c                             |  24 +-
 posix/tst-spawn.h                             |  36 +++
 posix/tst-spawn2.c                            |  17 +-
 posix/tst-spawn3.c                            | 100 ++++----
 posix/tst-spawn4.c                            |   7 +-
 posix/tst-spawn5.c                            |  14 +-
 posix/tst-spawn6.c                            |  15 +-
 posix/tst-spawn7.c                            |  13 +-
 sysdeps/nptl/_Fork.c                          |   2 +-
 sysdeps/unix/sysv/linux/Makefile              |  29 +++
 sysdeps/unix/sysv/linux/Versions              |   8 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   6 +
 .../unix/sysv/linux/alpha/kernel-features.h   |   3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   6 +
 sysdeps/unix/sysv/linux/arch-fork.h           |  16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/arm/clone3.S          |  80 ++++++
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/arm/sysdep.h          |   1 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  71 ++++++
 sysdeps/unix/sysv/linux/clone-internal.c      |  62 ++++-
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |  58 +++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   6 +
 .../unix/sysv/linux/ia64/kernel-features.h    |   3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   6 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   6 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   6 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   6 +
 .../sysv/linux/microblaze/be/libc.abilist     |   6 +
 .../sysv/linux/microblaze/le/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/mips/clone3.S         | 139 +++++++++++
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   6 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   6 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   6 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/mips/sysdep.h         |   2 +
 .../unix/sysv/linux/nios2/kernel-features.h   |  23 ++
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/pidfd_fork.c          |  82 +++++++
 sysdeps/unix/sysv/linux/pidfd_getpid.c        | 122 ++++++++++
 sysdeps/unix/sysv/linux/pidfd_spawn.c         |  30 +++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c        |  30 +++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   6 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   6 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   6 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/procutils.c           | 104 ++++++++
 sysdeps/unix/sysv/linux/procutils.h           |  35 +++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   6 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   6 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   6 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   6 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/sh/kernel-features.h  |   3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   6 +
 .../unix/sysv/linux/sparc/kernel-features.h   |   3 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   6 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   6 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 +++
 sysdeps/unix/sysv/linux/spawni.c              |  40 ++-
 sysdeps/unix/sysv/linux/sys/pidfd.h           |  25 ++
 sysdeps/unix/sysv/linux/tst-pidfd.c           |  47 ++++
 .../unix/sysv/linux/tst-pidfd_fork-cgroup.c   | 162 +++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_fork.c      | 227 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c    | 187 +++++++++++++++
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 216 +++++++++++++++++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c   |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c     |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h     |  63 +++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c    |  20 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   6 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   6 +
 96 files changed, 2894 insertions(+), 270 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

-- 
2.34.1



* [PATCH v7 1/8] arm: Add the clone3 wrapper
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-11 10:17   ` Florian Weimer
  2023-08-03 16:35 ` [PATCH v7 2/8] mips: " Adhemerval Zanella
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
		    int (*__func) (void *__arg), void *__arg);

Checked on arm-linux-gnueabihf.
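
For readers unfamiliar with the contract, the following C model shows
what the four-argument signature means: the parent gets the syscall
result, while the child calls func (arg) and exits with its return value.
It is only a model under stated assumptions, valid for fork-like calls
(no CLONE_VM, no separate stack), since a child started on a fresh stack
cannot return through C frames, which is why the real wrappers are
assembly.  The names clone3_model, clone3_args_v0, and child_exit_code
are hypothetical, and the NULL checks follow the ones in the MIPS
version of this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_clone3
# define SYS_clone3 435         /* Assumed value from the Linux UAPI.  */
#endif

/* Local mirror of the first 64 bytes of the kernel's struct clone_args.  */
struct clone3_args_v0
{
  uint64_t flags, pidfd, child_tid, parent_tid;
  uint64_t exit_signal, stack, stack_size, tls;
};

/* Parent: returns the child's PID, or -1 with errno set.
   Child: never returns; runs FUNC (ARG) and exits with its result.  */
static long
clone3_model (struct clone3_args_v0 *cl_args, size_t size,
              int (*func) (void *), void *arg)
{
  if (cl_args == NULL || func == NULL)
    {
      errno = EINVAL;           /* Sanity check, as in the wrappers.  */
      return -1;
    }

  long ret = syscall (SYS_clone3, cl_args, size);
  if (ret != 0)
    return ret;                 /* Parent, or failed syscall.  */

  _exit (func (arg));           /* Child.  */
}

/* Example child function: exit with the integer ARG points to.  */
static int
child_exit_code (void *arg)
{
  return *(int *) arg;
}
```

With cl_args zeroed except for exit_signal = SIGCHLD, the call behaves
like a fork whose child body is child_exit_code.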
---
 sysdeps/unix/sysv/linux/arm/clone3.S | 80 ++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/arm/sysdep.h |  1 +
 2 files changed, 81 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S

diff --git a/sysdeps/unix/sysv/linux/arm/clone3.S b/sysdeps/unix/sysv/linux/arm/clone3.S
new file mode 100644
index 0000000000..f236d18390
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arm/clone3.S
@@ -0,0 +1,80 @@
+/* The clone3 syscall wrapper.  Linux/arm version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#define _ERRNO_H	1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   r0: cl_args
+   r1: size
+   r2: func
+   r3: arg  */
+
+        .text
+ENTRY(__clone3)
+	/* Sanity check args.  */
+	cmp	r0, #0
+	ite	ne
+	cmpne	r1, #0
+	moveq	r0, #-EINVAL
+	beq	PLTJMP(syscall_error)
+
+	/* Do the syscall, the kernel expects:
+	   r7: system call number:
+	   r0: cl_args
+	   r1: size  */
+	push    { r7 }
+	cfi_adjust_cfa_offset (4)
+	cfi_rel_offset (r7, 0)
+	ldr     r7, =SYS_ify(clone3)
+	swi	0x0
+	cfi_endproc
+
+	cmp	r0, #0
+	beq	1f
+	pop     {r7}
+	blt	PLTJMP(C_SYMBOL_NAME(__syscall_error))
+	RETINSTR(, lr)
+
+	cfi_startproc
+PSEUDO_END (__clone3)
+
+1:
+	.fnstart
+	.cantunwind
+	mov	r0, r3
+	mov	ip, r2
+	BLX (ip)
+
+	/* And we are done, passing the return value through r0.  */
+	ldr	r7, =SYS_ify(exit)
+	swi	0x0
+
+	.fnend
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/arm/sysdep.h b/sysdeps/unix/sysv/linux/arm/sysdep.h
index 2f321881c8..57fc5f16bd 100644
--- a/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -362,6 +362,7 @@ __local_syscall_error:						\
 #define HAVE_CLOCK_GETTIME_VSYSCALL	"__vdso_clock_gettime"
 #define HAVE_CLOCK_GETTIME64_VSYSCALL	"__vdso_clock_gettime64"
 #define HAVE_GETTIMEOFDAY_VSYSCALL	"__vdso_gettimeofday"
+#define HAVE_CLONE3_WRAPPER		1
 
 #define LOAD_ARGS_0()
 #define ASM_ARGS_0
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v7 2/8] mips: Add the clone3 wrapper
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
  2023-08-03 16:35 ` [PATCH v7 1/8] arm: Add the clone3 wrapper Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-03 16:35 ` [PATCH v7 3/8] linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
                     int (*__func) (void *__arg), void *__arg);

Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and
mipsel-linux-gnu.
---
 sysdeps/unix/sysv/linux/mips/clone3.S | 139 ++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/mips/sysdep.h |   2 +
 2 files changed, 141 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S

diff --git a/sysdeps/unix/sysv/linux/mips/clone3.S b/sysdeps/unix/sysv/linux/mips/clone3.S
new file mode 100644
index 0000000000..1d16bfcef6
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/clone3.S
@@ -0,0 +1,139 @@
+/* The clone3 syscall wrapper.  Linux/mips version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/asm.h>
+#include <sysdep.h>
+#define _ERRNO_H        1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   a0/$4: cl_args
+   a1/$5: size
+   a2/$6: func
+   a3/$7: arg  */
+
+	.text
+	.set		nomips16
+#if _MIPS_SIM == _ABIO32
+# define EXTRA_LOCALS 1
+#else
+# define EXTRA_LOCALS 0
+#endif
+#define FRAMESZ ((NARGSAVE*SZREG)+ALSZ)&ALMASK
+GPOFF= FRAMESZ-(1*SZREG)
+NESTED(__clone3, SZREG, sp)
+#ifdef __PIC__
+	SETUP_GP
+#endif
+#if FRAMESZ
+	PTR_SUBU sp, FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+#endif
+	SETUP_GP64_STACK (GPOFF, __clone3)
+#ifdef __PIC__
+	SAVE_GP (GPOFF)
+#endif
+#ifdef PROF
+	.set	noat
+	move	$1,ra
+	jal	_mcount
+	.set	at
+#endif
+
+	/* Sanity check args.  */
+	li	v0, EINVAL
+	beqz	a0, L(error)	/* No NULL cl_args pointer.  */
+	beqz	a2, L(error)	/* No NULL function pointer.  */
+
+	move	$8, a3		/* a3 is set to 0/1 for syscall success/error
+				   while a4/$8 is returned unmodified.  */
+
+	/* Do the system call, the kernel expects:
+	   v0: system call number
+	   a0: cl_args
+	   a1: size  */
+	li		v0, __NR_clone3
+	cfi_endproc
+	syscall
+
+	bnez		a3, L(error)
+	beqz		v0, L(thread_start_clone3)
+
+	/* Successful return from the parent */
+	cfi_startproc
+#if FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+#endif
+	SETUP_GP64_STACK_CFI (GPOFF)
+	cfi_remember_state
+	RESTORE_GP64_STACK
+#if FRAMESZ
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+#endif
+	ret
+
+L(error):
+	cfi_restore_state
+#ifdef __PIC__
+	PTR_LA		t9, __syscall_error
+	RESTORE_GP64_STACK
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	jr		t9
+#else
+	RESTORE_GP64_STACK
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	j		__syscall_error
+#endif
+END (__clone3)
+
+/* Load up the arguments to the function.  Put this block of code in
+   its own function so that we can terminate the stack trace with our
+   debug info.  */
+
+ENTRY(__thread_start_clone3)
+L(thread_start_clone3):
+	cfi_undefined ($31)
+	/* cp is already loaded.  */
+	SAVE_GP (GPOFF)
+	/* The stackframe has been created on entry of clone3.  */
+
+	/* Restore the arg for user's function.  */
+	move		t9, a2		/* Function pointer.  */
+	move		a0, $8		/* Argument pointer.  */
+
+	/* Call the user's function.  */
+	jal		t9
+
+	move		a0, v0
+	li		v0, __NR_exit
+	syscall
+END(__thread_start_clone3)
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/mips/sysdep.h b/sysdeps/unix/sysv/linux/mips/sysdep.h
index ff84a91b31..673aa08b57 100644
--- a/sysdeps/unix/sysv/linux/mips/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/sysdep.h
@@ -28,3 +28,5 @@
 #endif
 #define HAVE_GETTIMEOFDAY_VSYSCALL      "__vdso_gettimeofday"
 #define HAVE_CLOCK_GETRES_VSYSCALL      "__vdso_clock_getres"
+
+#define HAVE_CLONE3_WRAPPER		1
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v7 3/8] linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
  2023-08-03 16:35 ` [PATCH v7 1/8] arm: Add the clone3 wrapper Adhemerval Zanella
  2023-08-03 16:35 ` [PATCH v7 2/8] mips: " Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-11 10:34   ` Florian Weimer
  2023-08-03 16:35 ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

Not all architectures have added the clone3 syscall.
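
Where clone3 cannot be assumed at build time, code has to cope with its
absence at run time.  The probe below is a hedged sketch of one way to
do that from user space, not what glibc does internally: it issues
clone3 with a size of 0, which a kernel implementing the syscall is
expected to reject with EINVAL before creating anything, while a kernel
without it reports ENOSYS.  The name clone3_available is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_clone3
# define SYS_clone3 435         /* Assumed value from the Linux UAPI.  */
#endif

/* Return true if the running kernel appears to implement clone3.  */
static bool
clone3_available (void)
{
  long r = syscall (SYS_clone3, NULL, (size_t) 0);
  assert (r == -1);             /* A size of 0 can never create a process.  */

  /* Note: a seccomp filter may also answer ENOSYS or EPERM, so "true"
     only means the syscall number was recognized by something.  */
  return errno != ENOSYS;
}
```

A caller would fall back to clone or fork when this returns false.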
---
 .../unix/sysv/linux/alpha/kernel-features.h   |  3 +++
 .../unix/sysv/linux/ia64/kernel-features.h    |  3 +++
 .../unix/sysv/linux/nios2/kernel-features.h   | 23 +++++++++++++++++++
 sysdeps/unix/sysv/linux/sh/kernel-features.h  |  3 +++
 .../unix/sysv/linux/sparc/kernel-features.h   |  3 +++
 5 files changed, 35 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h

diff --git a/sysdeps/unix/sysv/linux/alpha/kernel-features.h b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
index 3151e75449..e298bf2bcc 100644
--- a/sysdeps/unix/sysv/linux/alpha/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
@@ -50,4 +50,7 @@
 /* Alpha requires old sysvipc even being a 64-bit architecture.  */
 #undef __ASSUME_SYSVIPC_DEFAULT_IPC_64
 
+/* Alpha does not provide clone3.  */
+#undef __ASSUME_CLONE3
+
 #endif /* _KERNEL_FEATURES_H */
diff --git a/sysdeps/unix/sysv/linux/ia64/kernel-features.h b/sysdeps/unix/sysv/linux/ia64/kernel-features.h
index 98ebfb74bf..6580ec20fe 100644
--- a/sysdeps/unix/sysv/linux/ia64/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/ia64/kernel-features.h
@@ -34,4 +34,7 @@
 #undef __ASSUME_CLONE_DEFAULT
 #define __ASSUME_CLONE2
 
+/* ia64 does not provide clone3.  */
+#undef __ASSUME_CLONE3
+
 #endif /* _KERNEL_FEATURES_H */
diff --git a/sysdeps/unix/sysv/linux/nios2/kernel-features.h b/sysdeps/unix/sysv/linux/nios2/kernel-features.h
new file mode 100644
index 0000000000..bb2d887dd5
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/nios2/kernel-features.h
@@ -0,0 +1,23 @@
+/* Set flags signalling availability of kernel features based on given
+   kernel version number.  NIOS2 version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include_next <kernel-features.h>
+
+/* nios2 does not provide clone3.  */
+#undef __ASSUME_CLONE3
diff --git a/sysdeps/unix/sysv/linux/sh/kernel-features.h b/sysdeps/unix/sysv/linux/sh/kernel-features.h
index 953fa8dff0..80a90d05bb 100644
--- a/sysdeps/unix/sysv/linux/sh/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/sh/kernel-features.h
@@ -55,4 +55,7 @@
 # undef __ASSUME_STATX
 #endif
 
+/* sh does not provide clone3.  */
+#undef __ASSUME_CLONE3
+
 #endif
diff --git a/sysdeps/unix/sysv/linux/sparc/kernel-features.h b/sysdeps/unix/sysv/linux/sparc/kernel-features.h
index 98c938c16d..fa9383081f 100644
--- a/sysdeps/unix/sysv/linux/sparc/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/sparc/kernel-features.h
@@ -87,3 +87,6 @@
    (INLINE_CLONE_SYSCALL).  */
 #undef __ASSUME_CLONE_DEFAULT
 #define __ASSUME_CLONE_BACKWARDS	1
+
+/* sparc does not provide clone3.  */
+#undef __ASSUME_CLONE3
-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v7 4/8] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (2 preceding siblings ...)
  2023-08-03 16:35 ` [PATCH v7 3/8] linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-11 10:51   ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get, set}cgroup_np " Florian Weimer
  2023-08-14 13:27   ` Carlos O'Donell
  2023-08-03 16:35 ` [PATCH v7 5/8] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

These functions allow posix_spawn and posix_spawnp to use
CLONE_INTO_CGROUP with clone3, so that the child process is
created in a different version 2 cgroup.  These are GNU
extensions that are available only for Linux, and only
for the architectures that implement the clone3 wrapper
(HAVE_CLONE3_WRAPPER).

To create a process in a different cgroupv2, one can use:

  posix_spawnattr_t attr;
  posix_spawnattr_init (&attr);
  posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
  posix_spawnattr_setcgroup_np (&attr, cgroup);
  posix_spawn (...)

Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP controls
whether the cgroup file descriptor will be used or not with
clone3.

There is no fallback if either clone3 does not support the flag
or the architecture does not provide the clone3 wrapper; in
this case posix_spawn returns ENOTSUP.
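
A fuller sketch of the sequence above, with error handling, could look
like this.  It assumes the interface exactly as proposed here and is
guarded on POSIX_SPAWN_SETCGROUP so that it still builds against a libc
without the extension (returning ENOTSUP in that case).  The helper name
spawn_in_cgroup is hypothetical, and the cgroup directory is whatever
path the caller supplies.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* Spawn ARGV[0] in the version 2 cgroup whose directory is CGROUP_PATH.
   Returns 0 and stores the child's PID in *PID, or an error number.  */
static int
spawn_in_cgroup (const char *cgroup_path, char *const argv[], pid_t *pid)
{
  assert (cgroup_path != NULL && argv != NULL && argv[0] != NULL
          && pid != NULL);
#ifdef POSIX_SPAWN_SETCGROUP
  /* The attribute takes a file descriptor for the cgroup directory.  */
  int cgroup = open (cgroup_path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (cgroup < 0)
    return errno;

  posix_spawnattr_t attr;
  int err = posix_spawnattr_init (&attr);
  if (err == 0)
    {
      err = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
      if (err == 0)
        err = posix_spawnattr_setcgroup_np (&attr, cgroup);
      if (err == 0)
        err = posix_spawn (pid, argv[0], NULL, &attr, argv, environ);
      posix_spawnattr_destroy (&attr);
    }
  close (cgroup);               /* Not needed once the child exists.  */
  return err;
#else
  (void) cgroup_path;
  (void) argv;
  (void) pid;
  return ENOTSUP;               /* Extension not provided by this libc.  */
#endif
}
```

An ENOTSUP result from posix_spawn itself would indicate the missing
clone3 support described above, so callers that need a fallback have to
move the child into the cgroup by other means.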

Checked on x86_64-linux-gnu.
---
 NEWS                                          |   6 +-
 bits/spawn_ext.h                              |  21 ++
 posix/Makefile                                |   1 +
 posix/spawn.h                                 |   6 +-
 posix/spawnattr_setflags.c                    |   3 +-
 sysdeps/unix/sysv/linux/Makefile              |   5 +
 sysdeps/unix/sysv/linux/Versions              |   4 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  40 ++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../sysv/linux/microblaze/be/libc.abilist     |   2 +
 .../sysv/linux/microblaze/le/libc.abilist     |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 +++
 sysdeps/unix/sysv/linux/spawni.c              |  22 +-
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 216 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 46 files changed, 441 insertions(+), 6 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c

diff --git a/NEWS b/NEWS
index 22875d5fa4..99824eab95 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,11 @@ Version 2.39
 
 Major new features:
 
-  [Add new features here]
+* On Linux, the functions posix_spawnattr_getcgroup_np and
+  posix_spawnattr_setcgroup_np have been added, along with the
+  POSIX_SPAWN_SETCGROUP flag.  They allow posix_spawn and posix_spawnp to
+  place the new process in a cgroupv2 in a race-free manner.  These functions
+  are GNU extensions and require a kernel with clone3 support.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/bits/spawn_ext.h b/bits/spawn_ext.h
new file mode 100644
index 0000000000..75b504a768
--- /dev/null
+++ b/bits/spawn_ext.h
@@ -0,0 +1,21 @@
+/* POSIX spawn extensions.   Generic version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SPAWN_H
+# error "Never include <bits/spawn_ext.h> directly; use <spawn.h> instead."
+#endif
diff --git a/posix/Makefile b/posix/Makefile
index 3d368b91f6..70faad4b63 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -37,6 +37,7 @@ headers := \
   bits/pthreadtypes-arch.h \
   bits/pthreadtypes.h \
   bits/sched.h \
+  bits/spawn_ext.h \
   bits/thread-shared-types.h \
   bits/types.h \
   bits/types/idtype_t.h \
diff --git a/posix/spawn.h b/posix/spawn.h
index 04cc525fa5..731862cc5a 100644
--- a/posix/spawn.h
+++ b/posix/spawn.h
@@ -34,7 +34,8 @@ typedef struct
   sigset_t __ss;
   struct sched_param __sp;
   int __policy;
-  int __pad[16];
+  int __cgroup;
+  int __pad[15];
 } posix_spawnattr_t;
 
 
@@ -59,6 +60,7 @@ typedef struct
 #ifdef __USE_GNU
 # define POSIX_SPAWN_USEVFORK		0x40
 # define POSIX_SPAWN_SETSID		0x80
+# define POSIX_SPAWN_SETCGROUP		0x100
 #endif
 
 
@@ -231,4 +233,6 @@ posix_spawn_file_actions_addtcsetpgrp_np (posix_spawn_file_actions_t *,
 
 __END_DECLS
 
+#include <bits/spawn_ext.h>
+
 #endif /* spawn.h */
diff --git a/posix/spawnattr_setflags.c b/posix/spawnattr_setflags.c
index 97153948e4..e7bb217c6a 100644
--- a/posix/spawnattr_setflags.c
+++ b/posix/spawnattr_setflags.c
@@ -26,7 +26,8 @@
 		   | POSIX_SPAWN_SETSCHEDPARAM				      \
 		   | POSIX_SPAWN_SETSCHEDULER				      \
 		   | POSIX_SPAWN_SETSID					      \
-		   | POSIX_SPAWN_USEVFORK)
+		   | POSIX_SPAWN_USEVFORK				      \
+		   | POSIX_SPAWN_SETCGROUP)
 
 /* Store flags in the attribute structure.  */
 int
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index be801e3be4..d7b020154a 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -493,11 +493,14 @@ sysdep_routines += \
   getcpu \
   oldglob \
   sched_getcpu \
+  spawnattr_getcgroup_np \
+  spawnattr_setcgroup_np \
   # sysdep_routines
 
 tests += \
   tst-affinity \
   tst-affinity-pid \
+  tst-spawn-cgroup \
   # tests
 
 tests-static += \
@@ -511,6 +514,8 @@ tests += \
 CFLAGS-fork.c = $(libio-mtsafe)
 CFLAGS-getpid.o = -fomit-frame-pointer
 CFLAGS-getpid.os = -fomit-frame-pointer
+
+tst-spawn-cgroup-ARGS = -- $(host-test-program-cmd)
 endif
 
 ifeq ($(subdir),inet)
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index bc59bce42f..6d8a67039e 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -321,6 +321,10 @@ libc {
     __ppoll64_chk;
 %endif
   }
+  GLIBC_2.39 {
+    posix_spawnattr_getcgroup_np;
+    posix_spawnattr_setcgroup_np;
+  }
   GLIBC_PRIVATE {
     # functions used in other libraries
     __syscall_rt_sigqueueinfo;
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index c49363e70e..0090827e01 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2673,3 +2673,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index d6b1dcaae6..9d099471b6 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2782,6 +2782,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index dfe0c3f7b6..d7ed2f66de 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2434,3 +2434,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index 6c75e5aa76..92e686defe 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -554,6 +554,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 03d6f7ae2d..b503e642fc 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -551,6 +551,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
new file mode 100644
index 0000000000..3bc10ab477
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
@@ -0,0 +1,40 @@
+/* POSIX spawn extensions.   Linux version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SPAWN_H
+# error "Never include <bits/spawn_ext.h> directly; use <spawn.h> instead."
+#endif
+
+__BEGIN_DECLS
+
+#ifdef __USE_MISC
+
+/* Get the cgroupv2 file descriptor from the attribute structure.  */
+extern int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *
+					 __restrict __attr,
+					 int *__cgroup)
+     __THROW __nonnull ((1, 2));
+
+/* Store the cgroupv2 file descriptor in the attribute structure.  */
+extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__restrict __attr,
+					 int __cgroup)
+     __THROW __nonnull ((1));
+
+#endif /* __USE_MISC */
+
+__END_DECLS
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index d858c108c6..ec9e209b8d 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2710,3 +2710,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 82a14f8ace..961f88bf14 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 1950b15d5d..b6f5a4ab83 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2843,6 +2843,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index d0b9cb279b..a404b99e68 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2608,6 +2608,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index e760a631dd..2f9f6e2332 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2194,3 +2194,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 35785a3d5f..b7e9ab4558 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -555,6 +555,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4ab2426e0a..c345da7e0a 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2786,6 +2786,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 38faa16232..a643d868a8 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2759,3 +2759,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 374d658988..fed535742c 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2756,3 +2756,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index fcc5e88e91..147bac3eaf 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2751,6 +2751,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 01eb96cd93..e550616576 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2749,6 +2749,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index a2748b7b74..56f414dbd0 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2757,6 +2757,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 0ae7ba499d..da704a2e2b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 947495a0e2..f5a157ea94 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2798,3 +2798,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 115f1039e7..85b552f1cb 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2180,3 +2180,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 19c4c325b0..cadb16c12f 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2825,6 +2825,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 3e043c4044..50c5b99728 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2858,6 +2858,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index e4f3a766bb..81c63385af 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2579,6 +2579,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index dafe1c4a59..af9be18108 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2893,3 +2893,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index b9740a1afc..2266a88ad5 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,3 +2436,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index e3b4656aa2..4776ae32b8 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,3 +2636,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 84cb7a50ed..5d1d7d07a5 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2823,6 +2823,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 33df3b1646..fffc32a0f4 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2616,6 +2616,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 94cbccd715..43ff21447d 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2666,6 +2666,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 3bb316a787..9ea18d5886 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2663,6 +2663,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 6341b491b4..c6607d5385 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2818,6 +2818,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 8ed1ea2926..a010a2bb16 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2631,6 +2631,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c b/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
new file mode 100644
index 0000000000..82fd8f4b71
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
@@ -0,0 +1,28 @@
+/* Copyright (C) 2000-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+
+/* Get the cgroupv2 file descriptor from the attribute structure.  */
+int
+posix_spawnattr_getcgroup_np (const posix_spawnattr_t *attr,
+			      int *cgroup)
+{
+  *cgroup = attr->__cgroup;
+
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c b/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
new file mode 100644
index 0000000000..74d60bb5ea
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
@@ -0,0 +1,27 @@
+/* Copyright (C) 2000-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+
+/* Store the cgroupv2 file descriptor in the attribute structure.  */
+int
+posix_spawnattr_setcgroup_np (posix_spawnattr_t *attr, int cgroup)
+{
+  attr->__cgroup = cgroup;
+
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index ec687cb423..f0d4c62ae6 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -380,14 +380,19 @@ __spawnix (pid_t * pid, const char *file,
      need for CLONE_SETTLS.  Although parent and child share the same TLS
      namespace, there will be no concurrent access for TLS variables (errno
      for instance).  */
+  bool set_cgroup = attrp ? (attrp->__flags & POSIX_SPAWN_SETCGROUP) : false;
   struct clone_args clone_args =
     {
       /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
 	 __clone_internal_fallback.  */
-      .flags = CLONE_CLEAR_SIGHAND | CLONE_VM | CLONE_VFORK,
+      .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
+	       | CLONE_CLEAR_SIGHAND
+	       | CLONE_VM
+	       | CLONE_VFORK,
       .exit_signal = SIGCHLD,
       .stack = (uintptr_t) stack,
       .stack_size = stack_size,
+      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
     };
 #ifdef HAVE_CLONE3_WRAPPER
   args.use_clone3 = true;
@@ -398,8 +403,19 @@ __spawnix (pid_t * pid, const char *file,
 #endif
     {
       args.use_clone3 = false;
-      new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
-					   &args);
+      if (!set_cgroup)
+	new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
+					     &args);
+      else
+	{
+	  /* No fallback for POSIX_SPAWN_SETCGROUP if clone3 is not
+	     supported.  */
+	  new_pid = -1;
+#ifdef HAVE_CLONE3_WRAPPER
+	  if (errno == ENOSYS)
+#endif
+	    errno = ENOTSUP;
+	}
     }
 
   /* It needs to collect the case where the auxiliary process was created
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
new file mode 100644
index 0000000000..6dba30ab29
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
@@ -0,0 +1,216 @@
+/* Tests for posix_spawn cgroup extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <spawn.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xstdio.h>
+#include <support/xunistd.h>
+#include <support/temp_file.h>
+#include <sys/vfs.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#define CGROUPFS "/sys/fs/cgroup/"
+#ifndef CGROUP2_SUPER_MAGIC
+# define CGROUP2_SUPER_MAGIC 0x63677270
+#endif
+
+#define F_TYPE_EQUAL(a, b) ((a) == (typeof (a)) (b))
+
+#define CGROUP_TEST "test-spawn-cgroup"
+
+/* Nonzero if the program gets called via `exec'.  */
+#define CMDLINE_OPTIONS \
+  { "restart", no_argument, &restart, 1 },
+static int restart;
+
+/* Hold the four initial arguments used to respawn the process, plus the
+   extra '--direct', '--restart', the name of the created cgroup, and a
+   final NULL.  */
+static char *spargs[8];
+
+static inline char *
+startswith (const char *s, const char *prefix)
+{
+  size_t l = strlen (prefix);
+  if (strncmp (s, prefix, l) == 0)
+    return (char *) s + l;
+  return NULL;
+}
+
+static char *
+get_cgroup (void)
+{
+  FILE *f = fopen ("/proc/self/cgroup", "re");
+  if (f == NULL)
+    FAIL_UNSUPPORTED ("no cgroup defined for the process");
+
+  char *cgroup = NULL;
+
+  char *line = NULL;
+  size_t linesiz = 0;
+  while (xgetline (&line, &linesiz, f) > 0)
+    {
+      char *entry = startswith (line, "0:");
+      if (entry == NULL)
+	continue;
+
+      entry = strchr (entry, ':');
+      if (entry == NULL)
+	continue;
+
+      cgroup = entry + 1;
+      size_t l = strlen (cgroup);
+      if (cgroup[l - 1] == '\n')
+	cgroup[l - 1] = '\0';
+
+      cgroup = xstrdup (entry + 1);
+      break;
+    }
+
+  xfclose (f);
+  free (line);
+
+  return cgroup;
+}
+
+
+/* Called on process re-execution.  */
+_Noreturn static void
+handle_restart (int argc, char *argv[])
+{
+  assert (argc == 1);
+  char *newcgroup = argv[0];
+
+  char *current_cgroup = get_cgroup ();
+  TEST_VERIFY_EXIT (current_cgroup != NULL);
+  TEST_COMPARE_STRING (newcgroup, current_cgroup);
+  exit (EXIT_SUCCESS);
+}
+
+static int
+do_test_cgroup_failure (pid_t *pid, int cgroup)
+{
+  posix_spawnattr_t attr;
+  TEST_COMPARE (posix_spawnattr_init (&attr), 0);
+  TEST_COMPARE (posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP), 0);
+  TEST_COMPARE (posix_spawnattr_setcgroup_np (&attr, cgroup), 0);
+
+  int cgetgroup;
+  TEST_COMPARE (posix_spawnattr_getcgroup_np (&attr, &cgetgroup), 0);
+  TEST_COMPARE (cgroup, cgetgroup);
+
+  return posix_spawn (pid, spargs[0], NULL, &attr, spargs, environ);
+}
+
+static int
+create_new_cgroup (char **newcgroup)
+{
+  struct statfs fs;
+  if (statfs (CGROUPFS, &fs) < 0)
+    {
+      if (errno == ENOENT)
+	FAIL_UNSUPPORTED ("no cgroupv2 mount found");
+      FAIL_EXIT1 ("statfs (%s): %m\n", CGROUPFS);
+    }
+
+  if (!F_TYPE_EQUAL (fs.f_type, CGROUP2_SUPER_MAGIC))
+    FAIL_UNSUPPORTED ("%s is not a cgroupv2", CGROUPFS);
+
+  char *cgroup = get_cgroup ();
+  TEST_VERIFY_EXIT (cgroup != NULL);
+  *newcgroup = xasprintf ("%s/%s", cgroup, CGROUP_TEST);
+  char *cgpath = xasprintf ("%s%s/%s", CGROUPFS, cgroup, CGROUP_TEST);
+  free (cgroup);
+
+  if (mkdir (cgpath, 0755) == -1 && errno != EEXIST)
+    {
+      if (errno == EACCES || errno == EPERM)
+	FAIL_UNSUPPORTED ("cannot create a new cgroupv2 group");
+      FAIL_EXIT1 ("mkdir (%s): %m", cgpath);
+    }
+  add_temp_file (cgpath);
+
+  return xopen (cgpath, O_DIRECTORY | O_RDONLY | O_CLOEXEC, 0666);
+}
+
+static int
+do_test (int argc, char *argv[])
+{
+  /* We must have either:
+
+     - one or four parameters if called initially:
+       + argv[1]: path for ld.so        optional (*)
+       + argv[2]: "--library-path"      optional (*)
+       + argv[3]: the library path      optional (*)
+       + argv[4/1]: the application name
+
+     - one parameter left if called through re-execution (the test driver
+       has already consumed '--direct' and '--restart'):
+       + argv[1]: the created cgroup
+
+     (*) Omitted when built with --enable-hardcoded-path-in-tests or issued
+       without using the loader directly.  */
+
+  if (restart)
+    handle_restart (argc - 1, &argv[1]);
+
+  TEST_VERIFY_EXIT (argc == 2 || argc == 5);
+
+  char *newcgroup;
+  int cgroup = create_new_cgroup (&newcgroup);
+
+  int i;
+  for (i = 0; i < argc - 1; i++)
+    spargs[i] = argv[i + 1];
+  spargs[i++] = (char *) "--direct";
+  spargs[i++] = (char *) "--restart";
+  spargs[i++] = (char *) newcgroup;
+  spargs[i] = NULL;
+
+  /* Check that an invalid cgroup returns an error.  */
+  {
+    TEST_COMPARE (do_test_cgroup_failure (NULL, -1), EINVAL);
+  }
+
+  {
+    pid_t pid;
+    TEST_COMPARE (do_test_cgroup_failure (&pid, cgroup), 0);
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  xclose (cgroup);
+  free (newcgroup);
+
+  return 0;
+}
+
+#define TEST_FUNCTION_ARGV do_test
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 57cfcc2086..3591b5de5e 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2582,6 +2582,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 3f0a9f6d82..ffbd8f3738 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2688,3 +2688,5 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 posix_spawnattr_getcgroup_np F
+GLIBC_2.39 posix_spawnattr_setcgroup_np F
-- 
2.34.1



* [PATCH v7 5/8] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (3 preceding siblings ...)
  2023-08-03 16:35 ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-11 11:45   ` Florian Weimer
  2023-08-03 16:35 ` [PATCH v7 6/8] posix: Add pidfd_fork (BZ 26371) Adhemerval Zanella
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

Returning a pidfd allows a process to keep a race-free handle to a child
process; otherwise the caller needs to either use pidfd_open (which
might still be subject to TOCTOU) or keep the old racy interface based
on pid_t.

The implementation makes sure that the kernel supports the complete
pidfd interface, meaning that waitid (P_PIDFD) must be supported
(added in Linux 5.4).  It ensures that no racy workaround is required
(such as reading procfs fdinfo pid to use along with wait interfaces).
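
For illustration only, one way such a kernel check could be written in
user code is sketched below.  The helper name and the SKETCH_P_PIDFD
macro are hypothetical and this is not necessarily how the patch probes
the kernel.  It assumes that a kernel without P_PIDFD support rejects
the id type with EINVAL, while a kernel with support fails with EBADF
for a descriptor that is not open.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Linux UAPI value of P_PIDFD, spelled out so the sketch also builds
   against C library headers that predate the constant.  */
#define SKETCH_P_PIDFD 3

/* Hypothetical helper: return true if waitid understands P_PIDFD.
   INT_MAX is not expected to be an open descriptor, so a kernel that
   knows P_PIDFD fails with EBADF, while an older kernel rejects the
   id type itself with EINVAL.  */
static bool
sketch_waitid_pidfd_supported (void)
{
  siginfo_t info;
  int r = waitid ((idtype_t) SKETCH_P_PIDFD, INT_MAX, &info,
                  WEXITED | WNOHANG);
  /* The descriptor cannot name a child, so the call must fail.  */
  assert (r == -1);
  return errno == EBADF;
}
```

A caller could run such a probe once and cache the result before
deciding whether a pidfd can be handed back to the user.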

These interfaces are similar to posix_spawn and posix_spawnp, with
the only difference being that they return a process file descriptor
(int) instead of a process ID (pid_t).  Their prototypes are:

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict path,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict file,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict],
                    char *const envp[restrict]);
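
A minimal usage sketch follows, assuming a C library that ships these
symbols (glibc 2.39 or later) and a kernel with waitid (P_PIDFD)
support.  The helper name and the SKETCH_HAVE_PIDFD_SPAWN macro are
hypothetical.  On older C libraries the helper simply reports ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined __GLIBC__
# if __GLIBC_PREREQ (2, 39)
#  define SKETCH_HAVE_PIDFD_SPAWN 1
# endif
#endif

extern char **environ;

/* Hypothetical helper: run "sh -c 'exit 7'" through pidfd_spawnp and
   store the child's exit status in *STATUS (-1 if it did not exit
   normally).  Returns 0 on success, an errno value from the spawn or
   wait step on failure, and ENOSYS when the C library headers do not
   provide pidfd_spawnp.  */
static int
sketch_run_with_pidfd (int *status)
{
  assert (status != NULL);
#ifdef SKETCH_HAVE_PIDFD_SPAWN
  char *argv[] = { (char *) "sh", (char *) "-c", (char *) "exit 7", NULL };
  int pidfd;
  int r = pidfd_spawnp (&pidfd, "sh", NULL, NULL, argv, environ);
  if (r != 0)
    return r;

  /* The descriptor, not a process ID, identifies the child.  */
  siginfo_t info;
  if (waitid (P_PIDFD, pidfd, &info, WEXITED) != 0)
    r = errno;
  else
    *status = info.si_code == CLD_EXITED ? info.si_status : -1;
  close (pidfd);
  return r;
#else
  return ENOSYS;
#endif
}
```

The descriptor is closed after the wait; keeping it open longer would
only be needed if the caller still wants to poll or signal the child.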

A new symbol is used instead of a posix_spawn extension to avoid possible
issues with language bindings that might track the return argument
lifetime.  Although on Linux pid_t and int are interchangeable, POSIX
only states that pid_t should be a signed integer.

Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid rehashing the posix_spawn API or adding a
new one.  It also means that both interfaces support the same attributes
and file actions, and a new flag or file action added to posix_spawn is
also automatically available to pidfd_spawn.

Also, using the posix_spawn plumbing allows reusing most of the current
tests with some changes:

  - waitid is used instead of waitpid, since it is a more generic
    interface.

  - tst-posix_spawn-setsid.c is adapted to take into consideration that,
    with a pidfd, the caller has no process ID to check the session ID
    directly.  The test now spawns itself and writes the session ID to
    a file instead.

  - tst-spawn3.c needs to know whether pidfd_spawn is used so it keeps
    an extra file descriptor unused.
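
The waitid pattern the adapted tests follow can be sketched as below,
here with plain posix_spawnp and P_PID.  The helper name is
hypothetical and the snippet is an illustration, not code from the
test suite.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn "sh -c 'exit 3'" and collect it with
   waitid.  Returns 0 and fills *INFO on success, otherwise an errno
   value.  */
static int
sketch_spawn_and_waitid (siginfo_t *info)
{
  char *argv[] = { (char *) "sh", (char *) "-c", (char *) "exit 3", NULL };
  pid_t pid;
  int r = posix_spawnp (&pid, "sh", NULL, NULL, argv, environ);
  if (r != 0)
    return r;
  if (waitid (P_PID, pid, info, WEXITED) != 0)
    return errno;
  /* waitid reports the child it reaped in si_pid.  */
  assert (info->si_pid == pid);
  return 0;
}
```

Unlike the waitpid status word, siginfo_t separates how the child
ended (si_code) from its exit status (si_status), and the same call
shape works with P_PIDFD once a descriptor is available.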

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
(P_PIDFD) support), Linux 5.15 (only clone3 support), and Linux 5.19
(full support).
---
 NEWS                                          |   7 +
 include/clone_internal.h                      |   4 +
 manual/process.texi                           |  14 +-
 posix/Makefile                                |   1 +
 posix/spawn_int.h                             |   3 +-
 posix/tst-posix_spawn-setsid.c                | 168 +++++++++++++-----
 posix/tst-spawn-chdir.c                       |  15 +-
 posix/tst-spawn.c                             |  24 +--
 posix/tst-spawn.h                             |  36 ++++
 posix/tst-spawn2.c                            |  17 +-
 posix/tst-spawn3.c                            | 100 ++++++-----
 posix/tst-spawn4.c                            |   7 +-
 posix/tst-spawn5.c                            |  14 +-
 posix/tst-spawn6.c                            |  15 +-
 posix/tst-spawn7.c                            |  13 +-
 sysdeps/unix/sysv/linux/Makefile              |  18 ++
 sysdeps/unix/sysv/linux/Versions              |   2 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  31 ++++
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |  58 ++++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../sysv/linux/microblaze/be/libc.abilist     |   2 +
 .../sysv/linux/microblaze/le/libc.abilist     |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/pidfd_spawn.c         |  30 ++++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c        |  30 ++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/spawni.c              |  20 ++-
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c |  20 +++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c   |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c     |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h     |  63 +++++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c    |  20 +++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c    |  20 +++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 66 files changed, 787 insertions(+), 151 deletions(-)
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

diff --git a/NEWS b/NEWS
index 99824eab95..ff41443896 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,13 @@ Major new features:
   set the cgroupv2 in the new process in a race free manner.  These functions
   are GNU extensions and require a kernel with clone3 support.
 
+* On Linux, the pidfd_spawn and pidfd_spawnp functions have been added.
+  They have prototypes and semantics similar to posix_spawn, but instead of
+  returning a process ID, they return a file descriptor that can be used
+  along with other pidfd functions (like pidfd_send_signal, poll, or waitid).
+  The pidfd functionality avoids the issue of PID reuse with the traditional
+  posix_spawn interface.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/include/clone_internal.h b/include/clone_internal.h
index ad7b170f58..567160ebb5 100644
--- a/include/clone_internal.h
+++ b/include/clone_internal.h
@@ -35,6 +35,10 @@ extern int __clone_internal_fallback (struct clone_args *__cl_args,
 				      void *__arg)
      attribute_hidden;
 
+/* Return whether the kernel supports pid file descriptor, including clone
+   with CLONE_PIDFD and waitid with P_PIDFD.  */
+extern bool __clone_pidfd_supported (void) attribute_hidden;
+
 #ifndef _ISOMAC
 libc_hidden_proto (__clone3)
 libc_hidden_proto (__clone_internal)
diff --git a/manual/process.texi b/manual/process.texi
index c8413a5a58..68361c3f61 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -136,13 +136,13 @@ creating a process and making it run another program.
 @cindex parent process
 @cindex subprocess
 A new processes is created when one of the functions
-@code{posix_spawn}, @code{fork}, @code{_Fork} or @code{vfork} is called.
-(The @code{system} and @code{popen} also create new processes internally.)
-Due to the name of the @code{fork} function, the act of creating a new
-process is sometimes called @dfn{forking} a process.  Each new process
-(the @dfn{child process} or @dfn{subprocess}) is allocated a process
-ID, distinct from the process ID of the parent process.  @xref{Process
-Identification}.
+@code{posix_spawn}, @code{fork}, @code{_Fork}, @code{vfork}, or
+@code{pidfd_spawn} is called.  (The @code{system} and @code{popen} also
+create new processes internally.)  Due to the name of the @code{fork}
+function, the act of creating a new process is sometimes called
+@dfn{forking} a process.  Each new process (the @dfn{child process} or
+@dfn{subprocess}) is allocated a process ID, distinct from the process
+ID of the parent process.  @xref{Process Identification}.
 
 After forking a child process, both the parent and child processes
 continue to execute normally.  If you want your program to wait for a
diff --git a/posix/Makefile b/posix/Makefile
index 70faad4b63..905cf9fb54 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -602,6 +602,7 @@ tst-spawn-static-ARGS = $(tst-spawn-ARGS)
 tst-spawn5-ARGS = -- $(host-test-program-cmd)
 tst-spawn6-ARGS = -- $(host-test-program-cmd)
 tst-spawn7-ARGS = -- $(host-test-program-cmd)
+tst-posix_spawn-setsid-ARGS = -- $(host-test-program-cmd)
 tst-dir-ARGS = `pwd` `cd $(common-objdir)/$(subdir); pwd` `cd $(common-objdir); pwd` $(objpfx)tst-dir
 tst-chmod-ARGS = $(objdir)
 tst-vfork3-ARGS = --test-dir=$(objpfx)
diff --git a/posix/spawn_int.h b/posix/spawn_int.h
index aeb066c44f..64ee03e62d 100644
--- a/posix/spawn_int.h
+++ b/posix/spawn_int.h
@@ -76,12 +76,13 @@ struct __spawn_action
 
 #define SPAWN_XFLAGS_USE_PATH	0x1
 #define SPAWN_XFLAGS_TRY_SHELL	0x2
+#define SPAWN_XFLAGS_RET_PIDFD  0x4
 
 extern int __posix_spawn_file_actions_realloc (posix_spawn_file_actions_t *
 					       file_actions)
      attribute_hidden;
 
-extern int __spawni (pid_t *pid, const char *path,
+extern int __spawni (int *pid, const char *path,
 		     const posix_spawn_file_actions_t *file_actions,
 		     const posix_spawnattr_t *attrp, char *const argv[],
 		     char *const envp[], int xflags) attribute_hidden;
diff --git a/posix/tst-posix_spawn-setsid.c b/posix/tst-posix_spawn-setsid.c
index 124d878ce2..751674165c 100644
--- a/posix/tst-posix_spawn-setsid.c
+++ b/posix/tst-posix_spawn-setsid.c
@@ -18,78 +18,158 @@
 
 #include <errno.h>
 #include <fcntl.h>
+#include <getopt.h>
+#include <intprops.h>
+#include <paths.h>
 #include <spawn.h>
 #include <stdbool.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <sys/resource.h>
+#include <sys/wait.h>
 #include <unistd.h>
 
 #include <support/check.h>
+#include <support/xunistd.h>
+#include <support/temp_file.h>
+#include <tst-spawn.h>
+
+/* Nonzero if the program gets called via `exec'.  */
+static int restart;
+
+/* Holds up to four initial arguments used to respawn the process, plus
+   the extra '--direct' and '--restart', and a final NULL.  */
+static char *initial_argv[7];
+static int initial_argv_count;
+
+#define CMDLINE_OPTIONS \
+  { "restart", no_argument, &restart, 1 },
+
+static char *pidfile;
+
+static pid_t
+read_child_sid (void)
+{
+  int pidfd = xopen (pidfile, O_RDONLY, 0);
+
+  char buf[INT_STRLEN_BOUND (pid_t)];
+  ssize_t n = read (pidfd, buf, sizeof (buf));
+  TEST_VERIFY_EXIT (n >= 0 && n < sizeof buf);
+  buf[n] = '\0';
+
+  /* We only expect to read the PID.  */
+  char *endp;
+  long int rpid = strtol (buf, &endp, 10);
+  TEST_VERIFY (endp != buf);
+
+  xclose (pidfd);
+
+  return rpid;
+}
+
+/* Called on process re-execution, write down the session id on PIDFILE.  */
+_Noreturn static void
+handle_restart (const char *pidfile)
+{
+  int pidfd = xopen (pidfile, O_WRONLY, 0);
+
+  char buf[INT_STRLEN_BOUND (pid_t)];
+  int s = snprintf (buf, sizeof buf, "%d", getsid (0));
+  size_t n = write (pidfd, buf, s);
+  TEST_VERIFY (n == s);
+
+  xclose (pidfd);
+
+  exit (EXIT_SUCCESS);
+}
 
 static void
 do_test_setsid (bool test_setsid)
 {
-  pid_t sid, child_sid;
-  int res;
-
   /* Current session ID.  */
-  sid = getsid(0);
-  if (sid == (pid_t) -1)
-    FAIL_EXIT1 ("getsid (0): %m");
+  pid_t sid = getsid (0);
+  TEST_VERIFY (sid != (pid_t) -1);
 
   posix_spawnattr_t attrp;
-  /* posix_spawnattr_init should not fail (it basically memset the
-     attribute).  */
-  posix_spawnattr_init (&attrp);
+  TEST_COMPARE (posix_spawnattr_init (&attrp), 0);
   if (test_setsid)
-    {
-      res = posix_spawnattr_setflags (&attrp, POSIX_SPAWN_SETSID);
-      if (res != 0)
-	{
-	  errno = res;
-	  FAIL_EXIT1 ("posix_spawnattr_setflags: %m");
-	}
-    }
-
-  /* Program to run.  */
-  char *args[2] = { (char *) "true", NULL };
-  pid_t child;
-
-  res = posix_spawnp (&child, "true", NULL, &attrp, args, environ);
-  /* posix_spawnattr_destroy is noop.  */
-  posix_spawnattr_destroy (&attrp);
-
-  if (res != 0)
-    {
-      errno = res;
-      FAIL_EXIT1 ("posix_spawnp: %m");
-    }
+    TEST_COMPARE (posix_spawnattr_setflags (&attrp, POSIX_SPAWN_SETSID), 0);
+
+  /* 1 or 4 elements from initial_argv:
+       + path to ld.so          optional
+       + --library-path         optional
+       + the library path       optional
+       + application name
+       + --direct
+       + --restart
+       + pidfile  */
+  int argv_size = initial_argv_count + 2;
+  char *args[argv_size];
+  int argc = 0;
+
+  for (char **arg = initial_argv; *arg != NULL; arg++)
+    args[argc++] = *arg;
+  args[argc++] = pidfile;
+  args[argc] = NULL;
+  TEST_VERIFY (argc < argv_size);
+
+  PID_T_TYPE pid;
+  TEST_COMPARE (POSIX_SPAWN (&pid, args[0], NULL, &attrp, args, environ), 0);
+  TEST_COMPARE (posix_spawnattr_destroy (&attrp), 0);
+
+  siginfo_t sinfo;
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
+
+  pid_t child_sid = read_child_sid ();
 
   /* Child should have a different session ID than parent.  */
-  child_sid = getsid (child);
-
-  if (child_sid == (pid_t) -1)
-    FAIL_EXIT1 ("getsid (%i): %m", child);
+  TEST_VERIFY (child_sid != (pid_t) -1);
 
   if (test_setsid)
-    {
-      if (child_sid == sid)
-	FAIL_EXIT1 ("child session ID matched parent one");
-    }
+    TEST_VERIFY (child_sid != sid);
   else
-    {
-      if (child_sid != sid)
-	FAIL_EXIT1 ("child session ID did not match parent one");
-    }
+    TEST_VERIFY (child_sid == sid);
 }
 
 static int
-do_test (void)
+do_test (int argc, char *argv[])
 {
+  /* We must have either:
+
+     - one or four parameters if called initially:
+       + argv[1]: path for ld.so        optional
+       + argv[2]: "--library-path"      optional
+       + argv[3]: the library path      optional
+       + argv[4]: the application name
+
+     - six parameters left if called through re-execution:
+       + argv[5/1]: the application name
+       + argv[6/2]: the pidfile
+
+     * When built with --enable-hardcoded-path-in-tests or issued without
+       using the loader directly.  */
+
+  if (restart)
+    handle_restart (argv[1]);
+
+  TEST_VERIFY_EXIT (argc == 2 || argc == 5);
+
+  int i;
+  for (i = 0; i < argc - 1; i++)
+    initial_argv[i] = argv[i + 1];
+  initial_argv[i++] = (char *) "--direct";
+  initial_argv[i++] = (char *) "--restart";
+  initial_argv_count = i;
+
+  create_temp_file ("tst-posix_spawn-setsid-", &pidfile);
+
   do_test_setsid (false);
   do_test_setsid (true);
 
   return 0;
 }
 
+#define TEST_FUNCTION_ARGV do_test
 #include <support/test-driver.c>
diff --git a/posix/tst-spawn-chdir.c b/posix/tst-spawn-chdir.c
index b335092d7f..c01ca6692d 100644
--- a/posix/tst-spawn-chdir.c
+++ b/posix/tst-spawn-chdir.c
@@ -29,7 +29,9 @@
 #include <support/test-driver.h>
 #include <support/xstdio.h>
 #include <support/xunistd.h>
+#include <sys/wait.h>
 #include <unistd.h>
+#include <tst-spawn.h>
 
 /* Reads the file at PATH, which must consist of exactly one line.
    Removes the line terminator at the end of the file.  */
@@ -169,17 +171,18 @@ do_test (void)
 
           char *const argv[] = { (char *) "pwd", NULL };
           char *const envp[] = { NULL } ;
-          pid_t pid;
+          PID_T_TYPE pid;
           if (do_spawnp)
-            TEST_COMPARE (posix_spawnp (&pid, "pwd", &actions,
+            TEST_COMPARE (POSIX_SPAWNP (&pid, "pwd", &actions,
                                         NULL, argv, envp), 0);
           else
-            TEST_COMPARE (posix_spawn (&pid, "subdir/pwd-symlink", &actions,
+            TEST_COMPARE (POSIX_SPAWN (&pid, "subdir/pwd-symlink", &actions,
                                        NULL, argv, envp), 0);
           TEST_VERIFY (pid > 0);
-          int status;
-          xwaitpid (pid, &status, 0);
-          TEST_COMPARE (status, 0);
+          siginfo_t sinfo;
+          TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+          TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+          TEST_COMPARE (sinfo.si_status, 0);
 
           /* Check that the current directory did not change.  */
           {
diff --git a/posix/tst-spawn.c b/posix/tst-spawn.c
index 6782a322fc..c44d90756a 100644
--- a/posix/tst-spawn.c
+++ b/posix/tst-spawn.c
@@ -25,11 +25,13 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/param.h>
+#include <sys/wait.h>
 
 #include <support/check.h>
 #include <support/xunistd.h>
 #include <support/temp_file.h>
 #include <support/support.h>
+#include <tst-spawn.h>
 
 
 /* Nonzero if the program gets called via `exec'.  */
@@ -143,9 +145,9 @@ handle_restart (const char *fd1s, const char *fd2s, const char *fd3s,
 static int
 do_test (int argc, char *argv[])
 {
-  pid_t pid;
+  PID_T_TYPE pid;
   int fd4;
-  int status;
+  siginfo_t sinfo;
   posix_spawn_file_actions_t actions;
   char fd1name[18];
   char fd2name[18];
@@ -233,17 +235,16 @@ do_test (int argc, char *argv[])
   spargv[i++] = fd5name;
   spargv[i] = NULL;
 
-  TEST_COMPARE (posix_spawn (&pid, argv[1], &actions, NULL, spargv, environ),
+  TEST_COMPARE (POSIX_SPAWN (&pid, argv[1], &actions, NULL, spargv, environ),
 		0);
 
   /* Wait for the children.  */
-  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 
   /* Same test but with a NULL pid argument.  */
-  TEST_COMPARE (posix_spawn (NULL, argv[1], &actions, NULL, spargv, environ),
+  TEST_COMPARE (POSIX_SPAWN (NULL, argv[1], &actions, NULL, spargv, environ),
 		0);
 
   /* Cleanup.  */
@@ -251,10 +252,9 @@ do_test (int argc, char *argv[])
   free (name3_copy);
 
   /* Wait for the children.  */
-  xwaitpid (-1, &status, 0);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 
   return 0;
 }
diff --git a/posix/tst-spawn.h b/posix/tst-spawn.h
new file mode 100644
index 0000000000..a6f2dc8680
--- /dev/null
+++ b/posix/tst-spawn.h
@@ -0,0 +1,36 @@
+/* Generic definitions for posix_spawn tests.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef PID_T_TYPE
+# define PID_T_TYPE pid_t
+#endif
+
+#ifndef POSIX_SPAWN
+# define POSIX_SPAWN(__child, __path, __actions, __attr, __argv, __envp) \
+  posix_spawn (__child, __path, __actions, __attr, __argv, __envp)
+#endif
+
+#ifndef POSIX_SPAWNP
+# define POSIX_SPAWNP(__child, __path, __actions, __attr, __argv, __envp) \
+  posix_spawnp (__child, __path, __actions, __attr, __argv, __envp)
+#endif
+
+#ifndef WAITID
+# define WAITID(__idtype, __id, __info, __opts) \
+  waitid (__idtype, __id, __info, __opts)
+#endif
diff --git a/posix/tst-spawn2.c b/posix/tst-spawn2.c
index 40dc692488..f5c1f13039 100644
--- a/posix/tst-spawn2.c
+++ b/posix/tst-spawn2.c
@@ -26,6 +26,7 @@
 #include <stdio.h>
 
 #include <support/check.h>
+#include <tst-spawn.h>
 
 int
 do_test (void)
@@ -35,9 +36,9 @@ do_test (void)
 
   const char *program = "/path/to/invalid/binary";
   char * const args[] = { 0 };
-  pid_t pid = -1;
+  PID_T_TYPE pid = -1;
 
-  int ret = posix_spawn (&pid, program, 0, 0, args, environ);
+  int ret = POSIX_SPAWN (&pid, program, 0, 0, args, environ);
   if (ret != ENOENT)
     {
       errno = ret;
@@ -51,14 +52,13 @@ do_test (void)
     FAIL_EXIT1 ("posix_spawn returned pid != -1 (%i)", (int) pid);
 
   /* Check if no child is actually created.  */
-  ret = waitpid (-1, NULL, 0);
-  if (ret != -1 || errno != ECHILD)
-    FAIL_EXIT1 ("waitpid: %m)");
+  TEST_COMPARE (WAITID (P_ALL, 0, NULL, WEXITED), -1);
+  TEST_COMPARE (errno, ECHILD);
 
   /* Same as before, but with posix_spawnp.  */
   char *args2[] = { (char*) program, 0 };
 
-  ret = posix_spawnp (&pid, args2[0], 0, 0, args2, environ);
+  ret = POSIX_SPAWNP (&pid, args2[0], 0, 0, args2, environ);
   if (ret != ENOENT)
     {
       errno = ret;
@@ -68,9 +68,8 @@ do_test (void)
   if (pid != -1)
     FAIL_EXIT1 ("posix_spawnp returned pid != -1 (%i)", (int) pid);
 
-  ret = waitpid (-1, NULL, 0);
-  if (ret != -1 || errno != ECHILD)
-    FAIL_EXIT1 ("waitpid: %m)");
+  TEST_COMPARE (WAITID (P_ALL, 0, NULL, WEXITED), -1);
+  TEST_COMPARE (errno, ECHILD);
 
   return 0;
 }
diff --git a/posix/tst-spawn3.c b/posix/tst-spawn3.c
index e7ce0fb386..bd21ac6c4b 100644
--- a/posix/tst-spawn3.c
+++ b/posix/tst-spawn3.c
@@ -16,6 +16,7 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <assert.h>
 #include <stdio.h>
 #include <spawn.h>
 #include <error.h>
@@ -27,9 +28,12 @@
 #include <sys/resource.h>
 #include <fcntl.h>
 #include <paths.h>
+#include <intprops.h>
 
 #include <support/check.h>
 #include <support/temp_file.h>
+#include <support/xunistd.h>
+#include <tst-spawn.h>
 
 static int
 do_test (void)
@@ -48,7 +52,6 @@ do_test (void)
 
   struct rlimit rl;
   int max_fd = 24;
-  int ret;
 
   /* Set maximum number of file descriptor to a low value to avoid open
      too many files in environments where RLIMIT_NOFILE is large and to
@@ -66,7 +69,7 @@ do_test (void)
   /* Exhauste the file descriptor limit with temporary files.  */
   int files[max_fd];
   int nfiles = 0;
-  for (;;)
+  for (; nfiles < max_fd; nfiles++)
     {
       int fd = create_temp_file ("tst-spawn3.", NULL);
       if (fd == -1)
@@ -75,75 +78,82 @@ do_test (void)
 	    FAIL_EXIT1 ("create_temp_file: %m");
 	  break;
 	}
-      files[nfiles++] = fd;
+      files[nfiles] = fd;
     }
+  assert (nfiles != 0);
 
   posix_spawn_file_actions_t a;
-  if (posix_spawn_file_actions_init (&a) != 0)
-    FAIL_EXIT1 ("posix_spawn_file_actions_init");
+  TEST_COMPARE (posix_spawn_file_actions_init (&a), 0);
 
   /* Executes a /bin/sh echo $$ 2>&1 > ${objpfx}tst-spawn3.pid .  */
   const char pidfile[] = OBJPFX "tst-spawn3.pid";
-  if (posix_spawn_file_actions_addopen (&a, STDOUT_FILENO, pidfile, O_WRONLY
-					| O_CREAT | O_TRUNC, 0644) != 0)
-    FAIL_EXIT1 ("posix_spawn_file_actions_addopen");
+  TEST_COMPARE (posix_spawn_file_actions_addopen (&a, STDOUT_FILENO, pidfile,
+						  O_WRONLY| O_CREAT | O_TRUNC,
+						  0644),
+		0);
 
-  if (posix_spawn_file_actions_adddup2 (&a, STDOUT_FILENO, STDERR_FILENO) != 0)
-    FAIL_EXIT1 ("posix_spawn_file_actions_adddup2");
+  TEST_COMPARE (posix_spawn_file_actions_adddup2 (&a, STDOUT_FILENO,
+						  STDERR_FILENO),
+		0);
 
   /* Since execve (called by posix_spawn) might require to open files to
      actually execute the shell script, setup to close the temporary file
      descriptors.  */
-  for (int i=0; i<nfiles; i++)
-    {
-      if (posix_spawn_file_actions_addclose (&a, files[i]))
-	FAIL_EXIT1 ("posix_spawn_file_actions_addclose");
-    }
+  int maxnfiles =
+#ifdef TST_SPAWN_PIDFD
+    /* The spare file descriptor will be returned as the pid descriptor,
+       otherwise clone fails with EMFILE.  */
+    nfiles - 1;
+#else
+    nfiles;
+#endif
+
+  for (int i=0; i<maxnfiles; i++)
+    TEST_COMPARE (posix_spawn_file_actions_addclose (&a, files[i]), 0);
 
   char *spawn_argv[] = { (char *) _PATH_BSHELL, (char *) "-c",
 			 (char *) "echo $$", NULL };
-  pid_t pid;
-  if ((ret = posix_spawn (&pid, _PATH_BSHELL, &a, NULL, spawn_argv, NULL))
-       != 0)
-    {
-      errno = ret;
-      FAIL_EXIT1 ("posix_spawn: %m");
-    }
-
-  int status;
-  int err = waitpid (pid, &status, 0);
-  if (err != pid)
-    FAIL_EXIT1 ("waitpid: %m");
+  PID_T_TYPE pid;
+
+  {
+    int r = POSIX_SPAWN (&pid, _PATH_BSHELL, &a, NULL, spawn_argv, NULL);
+    if (r == ENOSYS)
+      FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+#ifdef TST_SPAWN_PIDFD
+    TEST_COMPARE (r, EMFILE);
+
+    /* Free up one file descriptor, so pidfd_spawn can return it.  */
+    xclose (files[nfiles-1]);
+    nfiles--;
+    r = POSIX_SPAWN (&pid, _PATH_BSHELL, &a, NULL, spawn_argv, NULL);
+#endif
+    TEST_COMPARE (r, 0);
+  }
+
+  siginfo_t sinfo;
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 
   /* Close the temporary files descriptor so it can check posix_spawn
      output.  */
   for (int i=0; i<nfiles; i++)
-    {
-      if (close (files[i]))
-	FAIL_EXIT1 ("close: %m");
-    }
+    xclose (files[i]);
 
-  int pidfd = open (pidfile, O_RDONLY);
-  if (pidfd == -1)
-    FAIL_EXIT1 ("open: %m");
+  int pidfd = xopen (pidfile, O_RDONLY, 0);
 
-  char buf[64];
-  ssize_t n;
-  if ((n = read (pidfd, buf, sizeof (buf))) < 0)
-    FAIL_EXIT1 ("read: %m");
+  char buf[INT_STRLEN_BOUND (pid_t)];
+  ssize_t n = read (pidfd, buf, sizeof (buf));
+  TEST_VERIFY (n < sizeof buf && n >= 0);
 
-  unlink (pidfile);
+  xunlink (pidfile);
 
   /* We only expect to read the PID.  */
   char *endp;
   long int rpid = strtol (buf, &endp, 10);
-  if (*endp != '\n')
-    FAIL_EXIT1 ("*endp != \'n\'");
-  if (endp == buf)
-    FAIL_EXIT1 ("read empty line");
+  TEST_VERIFY (*endp == '\n' && endp != buf);
 
-  if (rpid != pid)
-    FAIL_EXIT1 ("found \"%s\", expected pid %ld\n", buf, (long int) pid);
+  TEST_COMPARE (rpid, sinfo.si_pid);
 
   return 0;
 }
diff --git a/posix/tst-spawn4.c b/posix/tst-spawn4.c
index 327f04ea6c..8bf8bd52df 100644
--- a/posix/tst-spawn4.c
+++ b/posix/tst-spawn4.c
@@ -24,6 +24,7 @@
 #include <support/xunistd.h>
 #include <support/check.h>
 #include <support/temp_file.h>
+#include <tst-spawn.h>
 
 static int
 do_test (void)
@@ -38,15 +39,15 @@ do_test (void)
 
   TEST_VERIFY_EXIT (chmod (scriptname, 0x775) == 0);
 
-  pid_t pid;
+  PID_T_TYPE pid;
   int status;
 
   /* Check if scripts without shebang are correctly not executed.  */
-  status = posix_spawn (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
+  status = POSIX_SPAWN (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
                         (char *[]) { 0 });
   TEST_VERIFY_EXIT (status == ENOEXEC);
 
-  status = posix_spawnp (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
+  status = POSIX_SPAWNP (&pid, scriptname, NULL, NULL, (char *[]) { 0 },
                          (char *[]) { 0 });
   TEST_VERIFY_EXIT (status == ENOEXEC);
 
diff --git a/posix/tst-spawn5.c b/posix/tst-spawn5.c
index 6b3d11cf82..7850f3d7dd 100644
--- a/posix/tst-spawn5.c
+++ b/posix/tst-spawn5.c
@@ -33,6 +33,7 @@
 
 #include <arch-fd_to_filename.h>
 #include <array_length.h>
+#include <tst-spawn.h>
 
 /* Nonzero if the program gets called via `exec'.  */
 static int restart;
@@ -161,14 +162,13 @@ spawn_closefrom_test (posix_spawn_file_actions_t *fa, int lowfd, int highfd,
   args[argc] = NULL;
   TEST_VERIFY (argc < argv_size);
 
-  pid_t pid;
-  int status;
+  PID_T_TYPE pid;
+  siginfo_t sinfo;
 
-  TEST_COMPARE (posix_spawn (&pid, args[0], fa, NULL, args, environ), 0);
-  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (POSIX_SPAWN (&pid, args[0], fa, NULL, args, environ), 0);
+  TEST_COMPARE (WAITID (P_PID, pid, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 }
 
 static void
diff --git a/posix/tst-spawn6.c b/posix/tst-spawn6.c
index 4e29d78168..ff36351cd6 100644
--- a/posix/tst-spawn6.c
+++ b/posix/tst-spawn6.c
@@ -32,6 +32,7 @@
 #include <sys/ioctl.h>
 #include <stdlib.h>
 #include <termios.h>
+#include <tst-spawn.h>
 
 #ifndef PATH_MAX
 # define PATH_MAX 1024
@@ -108,17 +109,15 @@ run_subprogram (int argc, char *argv[], const posix_spawnattr_t *attr,
   spargv[i] = NULL;
 
   pid_t pid;
-  TEST_COMPARE (posix_spawn (&pid, argv[1], actions, attr, spargv, environ),
+  TEST_COMPARE (POSIX_SPAWN (&pid, argv[1], actions, attr, spargv, environ),
 		exp_err);
   if (exp_err != 0)
     return;
 
-  int status;
-  TEST_COMPARE (xwaitpid (pid, &status, WUNTRACED), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSTOPPED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  siginfo_t sinfo;
+  TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 }
 
 static int
@@ -202,7 +201,7 @@ do_test (int argc, char *argv[])
   if (restart)
     return handle_restart (argv[1], argv[2]);
 
-  pid_t pid = xfork ();
+  PID_T_TYPE pid = xfork ();
   if (pid == 0)
     {
       /* Create a pseudo-terminal to avoid interfering with the one using by
diff --git a/posix/tst-spawn7.c b/posix/tst-spawn7.c
index fb06915cb7..cc4498830b 100644
--- a/posix/tst-spawn7.c
+++ b/posix/tst-spawn7.c
@@ -24,7 +24,9 @@
 #include <support/check.h>
 #include <support/xsignal.h>
 #include <support/xunistd.h>
+#include <sys/wait.h>
 #include <unistd.h>
+#include <tst-spawn.h>
 
 /* Nonzero if the program gets called via `exec'.  */
 #define CMDLINE_OPTIONS \
@@ -81,14 +83,13 @@ spawn_signal_test (const char *type, const posix_spawnattr_t *attr)
 {
   spargs[check_type_argc] = (char*) type;
 
-  pid_t pid;
-  int status;
+  PID_T_TYPE pid;
+  siginfo_t sinfo;
 
-  TEST_COMPARE (posix_spawn (&pid, spargs[0], NULL, attr, spargs, environ), 0);
+  TEST_COMPARE (POSIX_SPAWN (&pid, spargs[0], NULL, attr, spargs, environ), 0);
-  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
-  TEST_VERIFY (WIFEXITED (status));
-  TEST_VERIFY (!WIFSIGNALED (status));
-  TEST_COMPARE (WEXITSTATUS (status), 0);
+  TEST_COMPARE (WAITID (P_ALL, 0, &sinfo, WEXITED), 0);
+  TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+  TEST_COMPARE (sinfo.si_status, 0);
 }
 
 static void
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index d7b020154a..3ecfa184d0 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -62,6 +62,7 @@ sysdep_routines += \
   clock_adjtime \
   clone \
   clone-internal \
+  clone-pidfd-support \
   clone3 \
   closefrom_fallback \
   convert_scm_timestamps \
@@ -492,6 +493,8 @@ sysdep_headers += \
 sysdep_routines += \
   getcpu \
   oldglob \
+  pidfd_spawn \
+  pidfd_spawnp \
   sched_getcpu \
   spawnattr_getcgroup_np \
   spawnattr_setcgroup_np \
@@ -500,7 +503,16 @@ sysdep_routines += \
 tests += \
   tst-affinity \
   tst-affinity-pid \
+  tst-posix_spawn-setsid-pidfd \
   tst-spawn-cgroup \
+  tst-spawn-chdir-pidfd \
+  tst-spawn-pidfd \
+  tst-spawn2-pidfd \
+  tst-spawn3-pidfd \
+  tst-spawn4-pidfd \
+  tst-spawn5-pidfd \
+  tst-spawn6-pidfd \
+  tst-spawn7-pidfd \
   # tests
 
 tests-static += \
@@ -514,8 +526,14 @@ tests += \
 CFLAGS-fork.c = $(libio-mtsafe)
 CFLAGS-getpid.o = -fomit-frame-pointer
 CFLAGS-getpid.os = -fomit-frame-pointer
+CFLAGS-tst-spawn3-pidfd.c += -DOBJPFX=\"$(objpfx)\"
 
 tst-spawn-cgroup-ARGS = -- $(host-test-program-cmd)
+tst-spawn-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-spawn5-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-spawn6-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-spawn7-pidfd-ARGS = -- $(host-test-program-cmd)
+tst-posix_spawn-setsid-pidfd-ARGS = -- $(host-test-program-cmd)
 endif
 
 ifeq ($(subdir),inet)
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 6d8a67039e..bd96ad12ad 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -324,6 +324,8 @@ libc {
   GLIBC_2.39 {
     posix_spawnattr_getcgroup_np;
     posix_spawnattr_setcgroup_np;
+    pidfd_spawn;
+    pidfd_spawnp;
   }
   GLIBC_PRIVATE {
     # functions used in other libraries
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 0090827e01..6f23556067 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2673,5 +2673,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 9d099471b6..02c43beb13 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2782,6 +2782,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index d7ed2f66de..dd8e5912d8 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2434,5 +2434,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index 92e686defe..a751e5f5a9 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -554,6 +554,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index b503e642fc..0eda3459ed 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -551,6 +551,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
index 3bc10ab477..ff8550f264 100644
--- a/sysdeps/unix/sysv/linux/bits/spawn_ext.h
+++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
@@ -37,4 +37,35 @@ extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__restrict __attr,
 
 #endif /* __USE_MISC */
 
+#ifdef __USE_GNU
+
+/* Spawn a new process executing PATH with the attributes described in *ATTRP.
+   Before running the process perform the actions described in FACTS.  Return
+   a PID file descriptor in PIDFD if process creation was successful and the
+   argument is non-null.
+
+   This function is a possible cancellation point and therefore not
+   marked with __THROW. */
+extern int pidfd_spawn (int *__restrict __pidfd,
+			const char *__restrict __path,
+			const posix_spawn_file_actions_t *__restrict __facts,
+			const posix_spawnattr_t *__restrict __attrp,
+			char *const __argv[__restrict_arr],
+			char *const __envp[__restrict_arr])
+    __nonnull ((2, 5));
+
+/* Similar to `pidfd_spawn' but search for FILE in the PATH.
+
+   This function is a possible cancellation point and therefore not
+   marked with __THROW. */
+extern int pidfd_spawnp (int *__restrict __pidfd,
+			 const char *__restrict __file,
+			 const posix_spawn_file_actions_t *__restrict __facts,
+			 const posix_spawnattr_t *__restrict __attrp,
+			 char *const __argv[__restrict_arr],
+			 char *const __envp[__restrict_arr])
+    __nonnull ((2, 5));
+
+#endif /* __USE_GNU */
+
 __END_DECLS
diff --git a/sysdeps/unix/sysv/linux/clone-pidfd-support.c b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
new file mode 100644
index 0000000000..af2d213cc5
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
@@ -0,0 +1,58 @@
+/* Check if kernel supports PID file descriptors.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <atomic.h>
+#include <sys/wait.h>
+#include <sysdep.h>
+
+/* PID file descriptor support was added over multiple kernel releases:
+   - Linux 5.2 added CLONE_PIDFD support for the clone syscall (the file
+     descriptor is returned through the parent_tid argument).
+   - Linux 5.3 added support for poll and CLONE_PIDFD for clone3.
+   - Linux 5.4 added P_PIDFD support on waitid.
+
+   For internal usage on spawn and fork, it only makes sense to return a file
+   descriptor if the caller can actually use waitid on it.  */
+bool
+__clone_pidfd_supported (void)
+{
+  static int supported = 0;
+  int state = atomic_load_relaxed (&supported);
+  if (state == 0)
+    {
+      /* Linux defines the maximum allocated file descriptor value as
+	 0x7fffffc0 (from fs/file.c):
+
+         #define __const_min(x, y) ((x) < (y) ? (x) : (y))
+         unsigned int sysctl_nr_open_max =
+	   __const_min(INT_MAX, ~(size_t)0/sizeof(void *)) & -BITS_PER_LONG;
+
+	 So we can detect whether the kernel supports all pidfd interfaces by
+	 using a valid but never allocated file descriptor: if P_PIDFD is not
+	 supported, waitid returns EINVAL, otherwise EBADF.
+
+         Also, waitid is a cancellation point, so issue the syscall
+	 directly.  */
+      int r = INTERNAL_SYSCALL_CALL (waitid, P_PIDFD, INT_MAX, NULL,
+				     WEXITED | WNOHANG);
+      state = r == -EBADF ? 1 : -1;
+      atomic_store_relaxed (&supported, state);
+    }
+
+  return state == 1;
+}
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index ec9e209b8d..4f4e99427b 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2710,5 +2710,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 961f88bf14..abc471dd0b 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index b6f5a4ab83..9f03c8a9a2 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2843,6 +2843,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index a404b99e68..ce1d20b722 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2608,6 +2608,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index 2f9f6e2332..8c3640b004 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2194,5 +2194,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index b7e9ab4558..a594916319 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -555,6 +555,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index c345da7e0a..7f61d4824d 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2786,6 +2786,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index a643d868a8..83ebb84ff3 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2759,5 +2759,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index fed535742c..89a0ff83bf 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2756,5 +2756,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 147bac3eaf..e21c752057 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2751,6 +2751,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index e550616576..42f470d397 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2749,6 +2749,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 56f414dbd0..6907f5f98b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2757,6 +2757,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index da704a2e2b..4b1f017a98 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index f5a157ea94..0d45902209 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2798,5 +2798,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 85b552f1cb..c59032ef14 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2180,5 +2180,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/pidfd_spawn.c b/sysdeps/unix/sysv/linux/pidfd_spawn.c
new file mode 100644
index 0000000000..cc76bf9935
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_spawn.c
@@ -0,0 +1,30 @@
+/* pidfd_spawn - Spawn a process and return a PID file descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+#include "spawn_int.h"
+
+int
+pidfd_spawn (int *pidfd, const char *path,
+	     const posix_spawn_file_actions_t *file_actions,
+	     const posix_spawnattr_t *attrp, char *const argv[],
+	     char *const envp[])
+{
+  return __spawni (pidfd, path, file_actions, attrp, argv, envp,
+		   SPAWN_XFLAGS_RET_PIDFD);
+}
diff --git a/sysdeps/unix/sysv/linux/pidfd_spawnp.c b/sysdeps/unix/sysv/linux/pidfd_spawnp.c
new file mode 100644
index 0000000000..858c0f3191
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_spawnp.c
@@ -0,0 +1,30 @@
+/* pidfd_spawnp - Spawn a process via PATH and return a PID file descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+#include "spawn_int.h"
+
+int
+pidfd_spawnp (int *pidfd, const char *path,
+	      const posix_spawn_file_actions_t *file_actions,
+	      const posix_spawnattr_t *attrp, char *const argv[],
+	      char *const envp[])
+{
+  return __spawni (pidfd, path, file_actions, attrp, argv, envp,
+		   SPAWN_XFLAGS_USE_PATH | SPAWN_XFLAGS_RET_PIDFD);
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index cadb16c12f..e014314d3e 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2825,6 +2825,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 50c5b99728..ac05154915 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2858,6 +2858,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 81c63385af..e13ee6e72a 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2579,6 +2579,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index af9be18108..0e8c9ab3fe 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2893,5 +2893,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 2266a88ad5..b0559a5a64 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,5 +2436,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 4776ae32b8..5f79a84016 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,5 +2636,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 5d1d7d07a5..498886ccb2 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2823,6 +2823,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index fffc32a0f4..51679c2990 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2616,6 +2616,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 43ff21447d..af7b6f5bc9 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2666,6 +2666,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 9ea18d5886..b766299f31 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2663,6 +2663,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index c6607d5385..f5b9200a33 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2818,6 +2818,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index a010a2bb16..f6012e6e17 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2631,6 +2631,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index f0d4c62ae6..844abf1b0b 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -68,6 +68,7 @@ struct posix_spawn_args
   int xflags;
   bool use_clone3;
   int err;
+  int pidfd;
 };
 
 /* Older version requires that shell script without shebang definition
@@ -309,7 +310,7 @@ fail:
 /* Spawn a new process executing PATH with the attributes describes in *ATTRP.
    Before running the process perform the actions described in FILE-ACTIONS. */
 static int
-__spawnix (pid_t * pid, const char *file,
+__spawnix (int *pid, const char *file,
 	   const posix_spawn_file_actions_t * file_actions,
 	   const posix_spawnattr_t * attrp, char *const argv[],
 	   char *const envp[], int xflags,
@@ -319,6 +320,15 @@ __spawnix (pid_t * pid, const char *file,
   struct posix_spawn_args args;
   int ec;
 
+  bool use_pidfd = xflags & SPAWN_XFLAGS_RET_PIDFD;
+
+  /* For CLONE_PIDFD, older kernels might not fail with unsupported flags or
+     some versions might not support waitid (P_PIDFD).  So to avoid the need
+     to handle the error on the helper process, check for full pidfd
+     support.  */
+  if (use_pidfd && !__clone_pidfd_supported ())
+    return ENOSYS;
+
   /* To avoid imposing hard limits on posix_spawn{p} the total number of
      arguments is first calculated to allocate a mmap to hold all possible
      values.  */
@@ -368,6 +378,7 @@ __spawnix (pid_t * pid, const char *file,
   args.argv = argv;
   args.argc = argc;
   args.envp = envp;
+  args.pidfd = 0;
   args.xflags = xflags;
 
   internal_signal_block_all (&args.oldmask);
@@ -386,13 +397,16 @@ __spawnix (pid_t * pid, const char *file,
       /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
 	 __clone_internal_fallback.  */
       .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
+	       | (use_pidfd ? CLONE_PIDFD : 0)
 	       | CLONE_CLEAR_SIGHAND
 	       | CLONE_VM
 	       | CLONE_VFORK,
       .exit_signal = SIGCHLD,
       .stack = (uintptr_t) stack,
       .stack_size = stack_size,
-      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
+      .cgroup = (set_cgroup ? attrp->__cgroup : 0),
+      .pidfd = use_pidfd ? (uintptr_t) &args.pidfd : 0,
+      .parent_tid = use_pidfd ? (uintptr_t) &args.pidfd : 0,
     };
 #ifdef HAVE_CLONE3_WRAPPER
   args.use_clone3 = true;
@@ -445,7 +459,7 @@ __spawnix (pid_t * pid, const char *file,
   __munmap (stack, stack_size);
 
   if ((ec == 0) && (pid != NULL))
-    *pid = new_pid;
+    *pid = use_pidfd ? args.pidfd : new_pid;
 
   internal_signal_restore_set (&args.oldmask);
 
diff --git a/sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c b/sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
new file mode 100644
index 0000000000..4372833f07
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-posix_spawn-setsid.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
new file mode 100644
index 0000000000..019527b31b
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn-chdir.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
new file mode 100644
index 0000000000..c430995af8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn-pidfd.h b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
new file mode 100644
index 0000000000..ea51c22447
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
@@ -0,0 +1,63 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <spawn.h>
+#include <support/check.h>
+
+#define PID_T_TYPE int
+
+/* Call posix_spawn with POSIX_SPAWN_PIDFD set.  */
+static inline int
+pidfd_spawn_check (int *pidfd, const char *path,
+		   const posix_spawn_file_actions_t *fa,
+		   const posix_spawnattr_t *attr, char *const argv[],
+		   char *const envp[])
+{
+  int r = pidfd_spawn (pidfd, path, fa, attr, argv, envp);
+  if (r == ENOSYS)
+    FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+  return r;
+}
+
+#define POSIX_SPAWN(__pidfd, __path, __actions, __attr, __argv, __envp)	     \
+  pidfd_spawn_check (__pidfd, __path, __actions, __attr, __argv, __envp)
+
+static inline int
+pidfd_spawnp_check (int *pidfd, const char *file,
+		    const posix_spawn_file_actions_t *fa,
+		    const posix_spawnattr_t *attr,
+		    char *const argv[], char *const envp[])
+{
+  int r = pidfd_spawnp (pidfd, file, fa, attr, argv, envp);
+  if (r == ENOSYS)
+    FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+  return r;
+}
+
+#define POSIX_SPAWNP(__child, __path, __actions, __attr, __argv, __envp) \
+  pidfd_spawnp_check (__child, __path, __actions, __attr, __argv, __envp)
+
+#define WAITID(__idtype, __id, __info, __opts)				     \
+  ({									     \
+     __typeof (__idtype) __new_idtype = __idtype == P_PID		     \
+					? P_PIDFD : __idtype;		     \
+     waitid (__new_idtype, __id, __info, __opts);			     \
+  })
+
+#define TST_SPAWN_PIDFD 1
diff --git a/sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
new file mode 100644
index 0000000000..03ba7a3d15
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn2.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
new file mode 100644
index 0000000000..8ad9a16854
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
@@ -0,0 +1,20 @@
+/* Check posix_spawn add file actions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn3.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
new file mode 100644
index 0000000000..83922da7d1
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn4.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
new file mode 100644
index 0000000000..149c352bf8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn5.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
new file mode 100644
index 0000000000..d3f5859457
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn6.c>
diff --git a/sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c b/sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c
new file mode 100644
index 0000000000..3aec86bec2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c
@@ -0,0 +1,20 @@
+/* Tests for spawn pidfd extension.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <tst-spawn-pidfd.h>
+#include <posix/tst-spawn7.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 3591b5de5e..e35bf54779 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2582,6 +2582,8 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index ffbd8f3738..e7d7eb61c0 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2688,5 +2688,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_spawn F
+GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
 GLIBC_2.39 posix_spawnattr_setcgroup_np F
-- 
2.34.1



* [PATCH v7 6/8] posix: Add pidfd_fork (BZ 26371)
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (4 preceding siblings ...)
  2023-08-03 16:35 ` [PATCH v7 5/8] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-11 12:06   ` Florian Weimer
  2023-08-03 16:35 ` [PATCH v7 7/8] posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork Adhemerval Zanella
  2023-08-03 16:35 ` [PATCH v7 8/8] linux: Add pidfd_getpid Adhemerval Zanella
  7 siblings, 1 reply; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

Returning a pidfd allows a process to keep a race-free handle to a
child process; otherwise the caller needs to either use pidfd_open
(which might still be subject to TOCTOU) or keep using the old racy
interfaces.

The implementation requires the kernel to support the complete pidfd
interface, meaning that waitid (P_PIDFD) must also be supported.  This
ensures that no racy workaround is required (such as reading the pid
from procfs fdinfo to use along with the old wait interfaces).  If the
kernel does not have the required support, the interface returns -1
and sets errno to ENOSYS.

The interface is:

  pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)

If PIDFD is NULL, no file descriptor is returned and pidfd_fork acts
as fork.  Otherwise, a new file descriptor is returned, and the kernel
sets O_CLOEXEC on it by default.  pidfd_fork follows the fork/_Fork
convention of returning a positive or negative value to the parent
(with negative indicating an error) and zero to the child.

If cgroup is 0 or a positive value, it is interpreted as a file
descriptor for the cgroup in which to place the new process (check the
CLONE_INTO_CGROUP clone flag).

Similar to fork, pidfd_fork also runs the pthread_atfork handlers.
This can be changed with the PIDFDFORK_ASYNCSAFE flag, which makes
pidfd_fork act as _Fork.  It also sends SIGCHLD to the parent when the
process terminates.
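
For illustration only, the parent-side pattern this interface is meant
to serve can be approximated without pidfd_fork, which this patch
proposes and which released headers may lack.  The sketch below is not
code from the patch: run_child_and_wait is a hypothetical helper that
uses plain fork followed by the pidfd_open syscall, so it keeps the
small TOCTOU window described above; with pidfd_fork the descriptor
would come straight from the call.  It falls back to waitid (P_PID)
when the kernel lacks pidfd support.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Older headers may not expose P_PIDFD; 3 is the Linux idtype value.  */
#ifndef P_PIDFD
# define P_PIDFD 3
#endif

/* Hypothetical helper (not part of the patch): fork a child that exits
   with CODE, wait for it through a pidfd when possible, and return its
   exit status, or -1 on failure.  */
int
run_child_and_wait (int code)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);

  /* Assumption: the pidfd is obtained after the fact with pidfd_open,
     which is exactly the window pidfd_fork is meant to close.  */
#ifdef SYS_pidfd_open
  int pidfd = (int) syscall (SYS_pidfd_open, pid, 0);
#else
  int pidfd = -1;
#endif

  siginfo_t info = { 0 };
  int r = -1;
  if (pidfd >= 0)
    {
      r = waitid (P_PIDFD, (id_t) pidfd, &info, WEXITED);
      close (pidfd);
    }
  /* No pidfd, or no waitid (P_PIDFD) support: use the PID-based wait.  */
  if (r < 0)
    r = waitid (P_PID, (id_t) pid, &info, WEXITED);

  if (r < 0 || info.si_code != CLD_EXITED)
    return -1;
  return info.si_status;
}
```

A caller would use it as run_child_and_wait (7) and compare the result
against the exit code it asked the child to use.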

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
(P_PIDFD) support), Linux 5.15 (only clone support), and Linux 5.19
(full support including clone3).
---
 NEWS                                          |   5 +
 include/clone_internal.h                      |  16 ++
 manual/process.texi                           |  53 ++++-
 posix/Makefile                                |   3 +-
 posix/fork-internal.c                         | 127 ++++++++++++
 posix/fork-internal.h                         |  36 ++++
 posix/fork.c                                  | 107 +---------
 sysdeps/nptl/_Fork.c                          |   2 +-
 sysdeps/unix/sysv/linux/Makefile              |   3 +
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   1 +
 sysdeps/unix/sysv/linux/arch-fork.h           |  16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/clone-internal.c      |  58 +++++-
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   1 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/pidfd_fork.c          |  81 ++++++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/sys/pidfd.h           |  19 ++
 .../unix/sysv/linux/tst-pidfd_fork-cgroup.c   | 162 +++++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_fork.c      | 186 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 50 files changed, 789 insertions(+), 120 deletions(-)
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork.c

diff --git a/NEWS b/NEWS
index ff41443896..3d753a0f39 100644
--- a/NEWS
+++ b/NEWS
@@ -22,6 +22,11 @@ Major new features:
   The pidfd functionality avoid the issue of PID reuse with traditional
   posix_spawn interface.
 
+* On Linux, the pidfd_fork function has been added.  Its semantics are
+  similar to fork or _Fork, in that it clones the calling process.  However,
+  besides the process ID, it can also return a file descriptor that can be
+  used along with other pidfd functions.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/include/clone_internal.h b/include/clone_internal.h
index 567160ebb5..d9b5509f78 100644
--- a/include/clone_internal.h
+++ b/include/clone_internal.h
@@ -2,6 +2,8 @@
 #define _CLONE_INTERNAL_H
 
 #include <clone3.h>
+#include <stdbool.h>
+#include <stdint.h>
 
 /* The clone3 syscall provides a superset of the functionality of the clone
    interface.  The kernel might extend __CL_ARGS struct in the future, with
@@ -35,6 +37,20 @@ extern int __clone_internal_fallback (struct clone_args *__cl_args,
 				      void *__arg)
      attribute_hidden;
 
+/* Call the clone3/clone syscall with fork semantic (i.e. no stack setting
+   required).  The EXTRA_FLAGS define any additional flag to be used besides
+   CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID, the PIDFD indicates where
+   the process file descriptor (set with CLONE_PIDFD) should be returned,
+   and the CGROUP specifies the cgroupsv2 (set with CLONE_INTO_CGROUP).
+
+   Similar to __clone3_internal, it uses a sticky check to avoid re-issuing
+   the clone3 syscall if the kernel does not support it.
+
+   It does not provide CLONE_INTO_CGROUP/CGROUP fallback if clone3 is not
+   supported, in this case the function returns -1/ENOTSUP.  */
+extern int __clone_fork (uint64_t __extra_flags, void *__pidfd, int __cgroup)
+     attribute_hidden;
+
 /* Return whether the kernel supports pid file descriptor, including clone
    with CLONE_PIDFD and waitid with P_PIDFD.  */
 extern bool __clone_pidfd_supported (void) attribute_hidden;
diff --git a/manual/process.texi b/manual/process.texi
index 68361c3f61..a656df425b 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -137,12 +137,12 @@ creating a process and making it run another program.
 @cindex subprocess
 A new processes is created when one of the functions
 @code{posix_spawn}, @code{fork}, @code{_Fork}, @code{vfork}, or
-@code{pidfd_spawn} is called.  (The @code{system} and @code{popen} also
-create new processes internally.)  Due to the name of the @code{fork}
-function, the act of creating a new process is sometimes called
-@dfn{forking} a process.  Each new process (the @dfn{child process} or
-@dfn{subprocess}) is allocated a process ID, distinct from the process
-ID of the parent process.  @xref{Process Identification}.
+@code{pidfd_spawn}, or @code{pidfd_fork} is called.  (The @code{system}
+and @code{popen} also create new processes internally.)  Due to the name
+of the @code{fork} function, the act of creating a new process is
+sometimes called @dfn{forking} a process.  Each new process (the
+@dfn{child process} or @dfn{subprocess}) is allocated a process ID,
+distinct from the process ID of the parent process.  @xref{Process Identification}.
 
 After forking a child process, both the parent and child processes
 continue to execute normally.  If you want your program to wait for a
@@ -153,10 +153,10 @@ limited information about why the child terminated---for example, its
 exit status code.
 
 A newly forked child process continues to execute the same program as
-its parent process, at the point where the @code{fork} or @code{_Fork}
-call returns.  You can use the return value from @code{fork} or
-@code{_Fork} to tell whether the program is running in the parent process
-or the child.
+its parent process, at the point where the @code{fork}, @code{_Fork},
+or @code{pidfd_fork} call returns.  You can use the return value from
+@code{fork}, @code{_Fork}, or @code{pidfd_fork} to tell whether the
+program is running in the parent process or the child.
 
 @cindex process image
 Having several processes run the same program is only occasionally
@@ -362,6 +362,39 @@ the proper precautions for using @code{vfork}, your program will still
 work even if the system uses @code{fork} instead.
 @end deftypefun
 
+@deftypefun pid_t pidfd_fork (int *@var{pidfd}, int @var{cgroup}, unsigned int @var{flags})
+@standards{GNU, sys/pidfd.h}
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+The @code{pidfd_fork} function is similar to @code{fork} but can also
+return a file descriptor that refers to the child process.
+
+If the operation is successful, there are both parent and child processes
+and both see @code{pidfd_fork} return, but with different values: it returns
+a value of @code{0} in the child process and returns the child's process ID
+in the parent process.
+
+Also, if the process is correctly created and @var{pidfd} is not @code{NULL},
+the pointed-to object contains a file descriptor that can be used with other
+pidfd functions (like @code{pidfd_send_signal}) or with @code{waitid} along
+with @code{P_PIDFD}.
+
+The @var{cgroup} argument should be either -1 or a file descriptor to a cgroup v2
+directory used on process creation.  There is no fallback implementation, meaning
+that if the kernel does not provide the required support an error is returned.
+
+The @var{flags} argument should be either zero, or the bitwise OR of some of the
+following flags:
+
+@table @code
+@item PIDFDFORK_ASYNCSAFE
+Acts as @code{_Fork}, where it does not invoke any callbacks registered with
+@code{pthread_atfork}, nor does it reset internal state or locks (such as the
+@code{malloc} locks).
+@end table
+@end deftypefun
+
+This function is specific to Linux.
+
 @node Executing a File
 @section Executing a File
 @cindex executing a file
diff --git a/posix/Makefile b/posix/Makefile
index 905cf9fb54..949f5632eb 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -85,6 +85,7 @@ routines := \
   fexecve \
   fnmatch \
   fork \
+  fork-internal \
   fpathconf \
   gai_strerror \
   get_child_max \
@@ -589,7 +590,7 @@ CFLAGS-execl.os = -fomit-frame-pointer
 CFLAGS-execvp.os = -fomit-frame-pointer
 CFLAGS-execlp.os = -fomit-frame-pointer
 CFLAGS-nanosleep.c += -fexceptions -fasynchronous-unwind-tables
-CFLAGS-fork.c = $(libio-mtsafe) $(config-cflags-wno-ignored-attributes)
+CFLAGS-fork-internal.c = $(libio-mtsafe) $(config-cflags-wno-ignored-attributes)
 
 tstgetopt-ARGS = -a -b -cfoobar --required foobar --optional=bazbug \
 		--none random --col --color --colour
diff --git a/posix/fork-internal.c b/posix/fork-internal.c
new file mode 100644
index 0000000000..a5e47cbe53
--- /dev/null
+++ b/posix/fork-internal.c
@@ -0,0 +1,127 @@
+/* Internal fork definitions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <fork.h>
+#include <fork-internal.h>
+#include <ldsodefs.h>
+#include <libio/libioP.h>
+#include <malloc/malloc-internal.h>
+#include <register-atfork.h>
+#include <stdio-lock.h>
+#include <unwind-link.h>
+
+static void
+fresetlockfiles (void)
+{
+  _IO_ITER i;
+
+  for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
+    if ((_IO_iter_file (i)->_flags & _IO_USER_LOCK) == 0)
+      _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
+}
+
+uint64_t
+__fork_pre (bool multiple_threads, struct nss_database_data *nss_database_data)
+{
+  uint64_t lastrun = __run_prefork_handlers (multiple_threads);
+
+  /* If we are not running multiple threads, we do not have to
+     preserve lock state.  If fork runs from a signal handler, only
+     async-signal-safe functions can be used in the child.  These data
+     structures are only used by unsafe functions, so their state does
+     not matter if fork was called from a signal handler.  */
+  if (multiple_threads)
+    {
+      call_function_static_weak (__nss_database_fork_prepare_parent,
+				 nss_database_data);
+
+      _IO_list_lock ();
+
+      /* Acquire malloc locks.  This needs to come last because fork
+	 handlers may use malloc, and the libio list lock has an
+	 indirect malloc dependency as well (via the getdelim
+	 function).  */
+      call_function_static_weak (__malloc_fork_lock_parent);
+    }
+
+  return lastrun;
+}
+
+void
+__fork_post (struct fork_post_state_t *state,
+	     struct nss_database_data *nss_database_data)
+{
+  if (state->pid == 0)
+    {
+      fork_system_setup ();
+
+      /* Reset the lock state in the multi-threaded case.  */
+      if (state->multiple_threads)
+	{
+	  __libc_unwind_link_after_fork ();
+
+	  fork_system_setup_after_fork ();
+
+	  /* Release malloc locks.  */
+	  call_function_static_weak (__malloc_fork_unlock_child);
+
+	  /* Reset the file list.  These are recursive mutexes.  */
+	  fresetlockfiles ();
+
+	  /* Reset locks in the I/O code.  */
+	  _IO_list_resetlock ();
+
+	  call_function_static_weak (__nss_database_fork_subprocess,
+				     nss_database_data);
+	}
+
+      /* Reset the lock the dynamic loader uses to protect its data.  */
+      __rtld_lock_initialize (GL(dl_load_lock));
+
+      /* Reset the lock protecting dynamic TLS related data.  */
+      __rtld_lock_initialize (GL(dl_load_tls_lock));
+
+      reclaim_stacks ();
+
+      /* Run the handlers registered for the child.  */
+      __run_postfork_handlers (atfork_run_child, state->multiple_threads,
+			       state->lastrun);
+    }
+  else
+    {
+      /* If _Fork failed, preserve its errno value.  */
+      int save_errno = errno;
+
+      /* Release acquired locks in the multi-threaded case.  */
+      if (state->multiple_threads)
+	{
+	  /* Release malloc locks, parent process variant.  */
+	  call_function_static_weak (__malloc_fork_unlock_parent);
+
+	  /* We execute this even if the 'fork' call failed.  */
+	  _IO_list_unlock ();
+	}
+
+      /* Run the handlers registered for the parent.  */
+      __run_postfork_handlers (atfork_run_parent, state->multiple_threads,
+			       state->lastrun);
+
+      if (state->pid < 0)
+	__set_errno (save_errno);
+    }
+}
diff --git a/posix/fork-internal.h b/posix/fork-internal.h
new file mode 100644
index 0000000000..5017061e1e
--- /dev/null
+++ b/posix/fork-internal.h
@@ -0,0 +1,36 @@
+/* Internal fork definitions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _FORK_INTERNAL_H
+#define _FORK_INTERNAL_H
+
+#include <stdint.h>
+#include <nss/nss_database.h>
+
+struct fork_post_state_t
+{
+  bool multiple_threads;
+  pid_t pid;
+  uint64_t lastrun;
+};
+
+uint64_t __fork_pre (bool, struct nss_database_data *) attribute_hidden;
+void __fork_post (struct fork_post_state_t *, struct nss_database_data *)
+  attribute_hidden;
+
+#endif
diff --git a/posix/fork.c b/posix/fork.c
index b4aaa9fa6d..1708473e72 100644
--- a/posix/fork.c
+++ b/posix/fork.c
@@ -16,25 +16,10 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <fork.h>
-#include <libio/libioP.h>
-#include <ldsodefs.h>
-#include <malloc/malloc-internal.h>
-#include <nss/nss_database.h>
-#include <register-atfork.h>
-#include <stdio-lock.h>
+#include <fork-internal.h>
 #include <sys/single_threaded.h>
 #include <unwind-link.h>
-
-static void
-fresetlockfiles (void)
-{
-  _IO_ITER i;
-
-  for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
-    if ((_IO_iter_file (i)->_flags & _IO_USER_LOCK) == 0)
-      _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
-}
+#include <unistd.h>
 
 pid_t
 __libc_fork (void)
@@ -45,92 +30,18 @@ __libc_fork (void)
      requirement for fork (Austin Group tracker issue #62) this is
      best effort to make is async-signal-safe at least for single-thread
      case.  */
-  bool multiple_threads = !SINGLE_THREAD_P;
-  uint64_t lastrun;
-
-  lastrun = __run_prefork_handlers (multiple_threads);
-
+  struct fork_post_state_t state = {
+      .multiple_threads = !SINGLE_THREAD_P
+  };
   struct nss_database_data nss_database_data;
 
-  /* If we are not running multiple threads, we do not have to
-     preserve lock state.  If fork runs from a signal handler, only
-     async-signal-safe functions can be used in the child.  These data
-     structures are only used by unsafe functions, so their state does
-     not matter if fork was called from a signal handler.  */
-  if (multiple_threads)
-    {
-      call_function_static_weak (__nss_database_fork_prepare_parent,
-				 &nss_database_data);
-
-      _IO_list_lock ();
-
-      /* Acquire malloc locks.  This needs to come last because fork
-	 handlers may use malloc, and the libio list lock has an
-	 indirect malloc dependency as well (via the getdelim
-	 function).  */
-      call_function_static_weak (__malloc_fork_lock_parent);
-    }
-
-  pid_t pid = _Fork ();
-
-  if (pid == 0)
-    {
-      fork_system_setup ();
-
-      /* Reset the lock state in the multi-threaded case.  */
-      if (multiple_threads)
-	{
-	  __libc_unwind_link_after_fork ();
-
-	  fork_system_setup_after_fork ();
-
-	  /* Release malloc locks.  */
-	  call_function_static_weak (__malloc_fork_unlock_child);
-
-	  /* Reset the file list.  These are recursive mutexes.  */
-	  fresetlockfiles ();
-
-	  /* Reset locks in the I/O code.  */
-	  _IO_list_resetlock ();
-
-	  call_function_static_weak (__nss_database_fork_subprocess,
-				     &nss_database_data);
-	}
-
-      /* Reset the lock the dynamic loader uses to protect its data.  */
-      __rtld_lock_initialize (GL(dl_load_lock));
-
-      /* Reset the lock protecting dynamic TLS related data.  */
-      __rtld_lock_initialize (GL(dl_load_tls_lock));
-
-      reclaim_stacks ();
-
-      /* Run the handlers registered for the child.  */
-      __run_postfork_handlers (atfork_run_child, multiple_threads, lastrun);
-    }
-  else
-    {
-      /* If _Fork failed, preserve its errno value.  */
-      int save_errno = errno;
-
-      /* Release acquired locks in the multi-threaded case.  */
-      if (multiple_threads)
-	{
-	  /* Release malloc locks, parent process variant.  */
-	  call_function_static_weak (__malloc_fork_unlock_parent);
-
-	  /* We execute this even if the 'fork' call failed.  */
-	  _IO_list_unlock ();
-	}
+  state.lastrun = __fork_pre (state.multiple_threads, &nss_database_data);
 
-      /* Run the handlers registered for the parent.  */
-      __run_postfork_handlers (atfork_run_parent, multiple_threads, lastrun);
+  state.pid = _Fork ();
 
-      if (pid < 0)
-	__set_errno (save_errno);
-    }
+  __fork_post (&state, &nss_database_data);
 
-  return pid;
+  return state.pid;
 }
 weak_alias (__libc_fork, __fork)
 libc_hidden_def (__fork)
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index f8322ae557..aa99e05b5b 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -22,7 +22,7 @@
 pid_t
 _Fork (void)
 {
-  pid_t pid = arch_fork (&THREAD_SELF->tid);
+  pid_t pid = arch_fork (0, NULL, &THREAD_SELF->tid);
   if (pid == 0)
     {
       struct pthread *self = THREAD_SELF;
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 3ecfa184d0..58dc23a2fb 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -493,6 +493,7 @@ sysdep_headers += \
 sysdep_routines += \
   getcpu \
   oldglob \
+  pidfd_fork \
   pidfd_spawn \
   pidfd_spawnp \
   sched_getcpu \
@@ -503,6 +504,8 @@ sysdep_routines += \
 tests += \
   tst-affinity \
   tst-affinity-pid \
+  tst-pidfd_fork \
+  tst-pidfd_fork-cgroup \
   tst-posix_spawn-setsid-pidfd \
   tst-spawn-cgroup \
   tst-spawn-chdir-pidfd \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index bd96ad12ad..f50531ce8a 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -324,6 +324,7 @@ libc {
   GLIBC_2.39 {
     posix_spawnattr_getcgroup_np;
     posix_spawnattr_setcgroup_np;
+    pidfd_fork;
     pidfd_spawn;
     pidfd_spawnp;
   }
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 6f23556067..0d252d841b 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2673,6 +2673,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 02c43beb13..347c7e2de5 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2782,6 +2782,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index dd8e5912d8..78da9c8434 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2434,6 +2434,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arch-fork.h b/sysdeps/unix/sysv/linux/arch-fork.h
index 0e0eccbf38..9e8a449e2c 100644
--- a/sysdeps/unix/sysv/linux/arch-fork.h
+++ b/sysdeps/unix/sysv/linux/arch-fork.h
@@ -32,24 +32,24 @@
    override it with one of the supported calling convention (check generic
    kernel-features.h for the clone abi variants).  */
 static inline pid_t
-arch_fork (void *ctid)
+arch_fork (int flags, void *ptid, void *ctid)
 {
-  const int flags = CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
   long int ret;
+  flags |= CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
 #ifdef __ASSUME_CLONE_BACKWARDS
 # ifdef INLINE_CLONE_SYSCALL
-  ret = INLINE_CLONE_SYSCALL (flags, 0, NULL, 0, ctid);
+  ret = INLINE_CLONE_SYSCALL (flags, 0, ptid, 0, ctid);
 # else
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, 0, ctid);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, ptid, 0, ctid);
 # endif
 #elif defined(__ASSUME_CLONE_BACKWARDS2)
-  ret = INLINE_SYSCALL_CALL (clone, 0, flags, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, 0, flags, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE_BACKWARDS3)
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, 0, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE2)
-  ret = INLINE_SYSCALL_CALL (clone2, flags, 0, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone2, flags, 0, 0, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE_DEFAULT)
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, ptid, ctid, 0);
 #else
 # error "Undefined clone variant"
 #endif
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index a751e5f5a9..c99cc53158 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -554,6 +554,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 0eda3459ed..1d3412a4ec 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -551,6 +551,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index 790739cfce..d212f2591f 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -16,6 +16,7 @@
    License along with the GNU C Library.  If not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <arch-fork.h>
 #include <sysdep.h>
 #include <stddef.h>
 #include <errno.h>
@@ -43,6 +44,11 @@ _Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
 _Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
 		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
 
+#if !defined __ASSUME_CLONE3 && defined __NR_clone3
+/* Set to 0 if the kernel does not support the clone3 syscall.  */
+static int clone3_supported = 1;
+#endif
+
 int
 __clone_internal_fallback (struct clone_args *cl_args,
 			   int (*func) (void *arg), void *arg)
@@ -81,10 +87,9 @@ __clone3_internal (struct clone_args *cl_args, int (*func) (void *args),
 		   void *arg)
 {
 #ifdef HAVE_CLONE3_WRAPPER
-# if __ASSUME_CLONE3
+# ifdef __ASSUME_CLONE3
   return __clone3 (cl_args, sizeof (*cl_args), func, arg);
 # else
-  static int clone3_supported = 1;
   if (atomic_load_relaxed (&clone3_supported) == 1)
     {
       int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
@@ -118,3 +123,52 @@ __clone_internal (struct clone_args *cl_args,
 }
 
 libc_hidden_def (__clone_internal)
+
+int
+__clone_fork (uint64_t extra_flags, void *pidfd, int cgroup)
+{
+#ifdef __NR_clone3
+  struct clone_args clone_args =
+    {
+      .flags = extra_flags
+	       | CLONE_CHILD_SETTID
+	       | CLONE_CHILD_CLEARTID,
+      .exit_signal = SIGCHLD,
+      .cgroup = cgroup,
+      .child_tid = (uintptr_t) &THREAD_SELF->tid,
+      .pidfd = (uintptr_t) pidfd,
+      .parent_tid = (uintptr_t) pidfd
+    };
+#endif
+
+#ifdef __ASSUME_CLONE3
+  return INLINE_SYSCALL_CALL (clone3, &clone_args, sizeof (clone_args));
+#else
+  /* Some architectures still do not export clone3.  */
+  pid_t pid;
+# ifdef __NR_clone3
+  if (atomic_load_relaxed (&clone3_supported) == 1)
+    {
+      pid = INLINE_SYSCALL_CALL (clone3, &clone_args, sizeof (clone_args));
+      if (pid != -1 || errno != ENOSYS)
+	return pid;
+
+      atomic_store_relaxed (&clone3_supported, 0);
+    }
+# endif
+
+  bool set_cgroup = cgroup != -1;
+  bool use_pidfd = pidfd != NULL;
+
+  if (!set_cgroup)
+    pid = arch_fork (use_pidfd ? CLONE_PIDFD : 0, pidfd, &THREAD_SELF->tid);
+  else
+    {
+      /* No fallback for POSIX_SPAWN_SETCGROUP if clone3 is not supported.  */
+      pid = -1;
+      if (errno == ENOSYS)
+	errno = ENOTSUP;
+    }
+  return pid;
+#endif
+}
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 4f4e99427b..993eb6d8b7 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2710,6 +2710,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index abc471dd0b..a5825a7e27 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2659,6 +2659,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 9f03c8a9a2..696ef98aea 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2843,6 +2843,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index ce1d20b722..0cd4b92159 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2608,6 +2608,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index 8c3640b004..cfb358dc15 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2194,6 +2194,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index a594916319..c7de4154f4 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -555,6 +555,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 7f61d4824d..e99ce0daa7 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2786,6 +2786,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 83ebb84ff3..ffb3b1fa20 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2759,6 +2759,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 89a0ff83bf..120b0707ec 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2756,6 +2756,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index e21c752057..d0f2dce89c 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2751,6 +2751,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 42f470d397..3b33299274 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2749,6 +2749,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 6907f5f98b..0253aeb4e4 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2757,6 +2757,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 4b1f017a98..1613b18958 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2659,6 +2659,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 0d45902209..b2e2723a97 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2798,6 +2798,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index c59032ef14..800aec0661 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2180,6 +2180,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/pidfd_fork.c b/sysdeps/unix/sysv/linux/pidfd_fork.c
new file mode 100644
index 0000000000..983f8ade98
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_fork.c
@@ -0,0 +1,81 @@
+/* pidfd_fork - Duplicate the calling process and return a process file
+   descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <clone_internal.h>
+#include <fork-internal.h>
+#include <sys/pidfd.h>
+
+static pid_t
+forkfd (int *pidfd, int cgroup)
+{
+  bool use_pidfd = pidfd != NULL;
+  bool set_cgroup = cgroup != -1;
+
+  uint64_t extra_flags = (use_pidfd ? CLONE_PIDFD : 0)
+			 | (set_cgroup ? CLONE_INTO_CGROUP : 0);
+  pid_t pid = __clone_fork (extra_flags, use_pidfd ? pidfd : NULL,
+			    set_cgroup ? cgroup : 0);
+
+  if (pid == 0)
+    {
+      struct pthread *self = THREAD_SELF;
+
+      /* Initialize the robust mutex list; see the _Fork implementation for a
+	 full description of why this is required.  */
+#if __PTHREAD_MUTEX_HAVE_PREV
+      self->robust_prev = &self->robust_head;
+#endif
+      self->robust_head.list = &self->robust_head;
+      INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
+			     sizeof (struct robust_list_head));
+    }
+  return pid;
+}
+
+pid_t
+pidfd_fork (int *pidfd, int cgroup, unsigned int flags)
+{
+  if (!__clone_pidfd_supported ())
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOSYS);
+
+  if (flags & ~(PIDFDFORK_ASYNCSAFE))
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
+
+  pid_t pid;
+  if (!(flags & PIDFDFORK_ASYNCSAFE))
+    {
+      bool multiple_threads = !SINGLE_THREAD_P;
+      struct fork_post_state_t state = {
+	  .multiple_threads = !SINGLE_THREAD_P
+      };
+      struct nss_database_data nss_database_data;
+
+      state.lastrun = __fork_pre (multiple_threads, &nss_database_data);
+      state.pid = forkfd (pidfd, cgroup);
+      /* This follows the usual fork semantics: a positive or negative value
+	 is returned to the parent, and 0 to the child.  */
+      __fork_post (&state, &nss_database_data);
+
+      pid = state.pid;
+    }
+  else
+    pid = forkfd (pidfd, cgroup);
+
+  return pid;
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index e014314d3e..6ce440b9a0 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2825,6 +2825,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index ac05154915..d4cdb3f50b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2858,6 +2858,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index e13ee6e72a..e2bba8152b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2579,6 +2579,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 0e8c9ab3fe..8976c6b37b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2893,6 +2893,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index b0559a5a64..9dc6a3df1e 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,6 +2436,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 5f79a84016..4bafa24dba 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,6 +2636,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 498886ccb2..abb21da1ce 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2823,6 +2823,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 51679c2990..246a180900 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2616,6 +2616,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index af7b6f5bc9..493b9a53f9 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2666,6 +2666,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index b766299f31..0a23cc2a0d 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2663,6 +2663,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index f5b9200a33..40c8747a75 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2818,6 +2818,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index f6012e6e17..873949ce6c 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2631,6 +2631,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sys/pidfd.h b/sysdeps/unix/sysv/linux/sys/pidfd.h
index 342e593288..3e6d009ce7 100644
--- a/sysdeps/unix/sysv/linux/sys/pidfd.h
+++ b/sysdeps/unix/sysv/linux/sys/pidfd.h
@@ -46,4 +46,23 @@ extern int pidfd_getfd (int __pidfd, int __targetfd,
 extern int pidfd_send_signal (int __pidfd, int __sig, siginfo_t *__info,
 			      unsigned int __flags) __THROW;
 
+
+/* Do not run the pthread_atfork handlers on pidfd_fork.  */
+#define PIDFDFORK_ASYNCSAFE (1U << 1)
+
+/* Clone the calling process, creating an exact copy, and return a file
+   descriptor that can be used along with other pidfd functions.
+
+   The __CGROUP can be used to specify a different cgroup2 than the default
+   one, through the CLONE_INTO_CGROUP clone3 flag; a value of -1 disables it.
+   If a cgroup is requested and clone3 is not supported, the call fails.
+
+   The __FLAGS can be used to specify whether to run pthread_atfork handlers
+   and reset internal state.  The default is to do both, similar to fork.
+
+   Return -1 on error, 0 in the new process, and the process ID of the new
+   process in the parent process.  */
+extern pid_t pidfd_fork (int *__pidfd, int __cgroup, unsigned int __flags)
+  __THROW;
+
 #endif /* _PIDFD_H  */
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c b/sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c
new file mode 100644
index 0000000000..997bfa1c6d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c
@@ -0,0 +1,162 @@
+/* pidfd_fork test using cgroupsv2.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <sched.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xstdio.h>
+#include <support/xunistd.h>
+#include <support/temp_file.h>
+#include <sys/pidfd.h>
+#include <sys/vfs.h>
+#include <sys/wait.h>
+
+#include <dirent.h>
+
+#define CGROUPFS "/sys/fs/cgroup/"
+#ifndef CGROUP2_SUPER_MAGIC
+# define CGROUP2_SUPER_MAGIC 0x63677270
+#endif
+
+#define F_TYPE_EQUAL(a, b) (a == (typeof(a)) b)
+
+static inline char *
+startswith (const char *s, const char *prefix)
+{
+  size_t l = strlen (prefix);
+  if (strncmp (s, prefix, l) == 0)
+    return (char *) s + l;
+  return NULL;
+}
+
+static char *
+get_cgroup (void)
+{
+  FILE *f = xfopen ("/proc/self/cgroup", "re");
+
+  char *cgroup = NULL;
+
+  char *line = NULL;
+  size_t linesiz = 0;
+  while (xgetline (&line, &linesiz, f) > 0)
+    {
+      char *entry = startswith (line, "0:");
+      if (entry == NULL)
+	continue;
+
+      entry = strchr (entry, ':');
+      if (entry == NULL)
+	continue;
+
+      cgroup = entry + 1;
+      size_t l = strlen (cgroup);
+      if (cgroup[l - 1] == '\n')
+	cgroup[l - 1] = '\0';
+
+      cgroup = xstrdup (entry + 1);
+      break;
+    }
+
+  xfclose (f);
+  free (line);
+
+  return cgroup;
+}
+
+static int
+do_test (void)
+{
+  struct statfs fs;
+  if (statfs (CGROUPFS, &fs) < 0)
+    {
+      if (errno == ENOENT)
+	FAIL_UNSUPPORTED ("no cgroupv2 mount found");
+      FAIL_EXIT1 ("statfs (%s): %m\n", CGROUPFS);
+    }
+
+  if (!F_TYPE_EQUAL (fs.f_type, CGROUP2_SUPER_MAGIC))
+    FAIL_UNSUPPORTED ("%s is not a cgroupv2", CGROUPFS);
+
+  char *cgroup = get_cgroup ();
+  TEST_VERIFY_EXIT (cgroup != NULL);
+  char *newcgroup = xasprintf ("%s/%s", cgroup, "test-pidfd_fork-cgroup");
+  char *cgpath = xasprintf ("%s%s/test-pidfd_fork-cgroup", CGROUPFS, cgroup);
+  free (cgroup);
+
+  if (mkdir (cgpath, 0755) == -1 && errno != EEXIST)
+    {
+      if (errno == EACCES || errno == EPERM)
+	FAIL_UNSUPPORTED ("cannot create a new cgroupv2 group");
+      FAIL_EXIT1 ("mkdir (%s): %m", cgpath);
+    }
+  add_temp_file (cgpath);
+
+  int dfd = xopen (cgpath, O_DIRECTORY | O_RDONLY | O_CLOEXEC, 0666);
+
+  /* Check that the cgroup reported by the kernel for the child is the one
+     requested at creation, and not the parent's.  */
+  {
+    pid_t pid = pidfd_fork (NULL, dfd, 0);
+    if (pid == -1 && errno == ENOSYS)
+      FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+    TEST_VERIFY_EXIT (pid != -1);
+    if (pid == 0)
+      {
+	char *child_cgroup = get_cgroup ();
+	TEST_VERIFY_EXIT (child_cgroup != NULL);
+	TEST_COMPARE_STRING (newcgroup, child_cgroup);
+	_exit (EXIT_SUCCESS);
+      }
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  /* Same as before, but also wait through the process file descriptor.  */
+  {
+    int pidfd;
+    pid_t pid = pidfd_fork (&pidfd, dfd, 0);
+    TEST_VERIFY_EXIT (pid != -1);
+    if (pid == 0)
+      {
+	char *child_cgroup = get_cgroup ();
+	TEST_VERIFY_EXIT (child_cgroup != NULL);
+	TEST_COMPARE_STRING (newcgroup, child_cgroup);
+	_exit (EXIT_SUCCESS);
+      }
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PIDFD, pidfd, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  free (cgpath);
+  free (newcgroup);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_fork.c b/sysdeps/unix/sysv/linux/tst-pidfd_fork.c
new file mode 100644
index 0000000000..3e09c55d54
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pidfd_fork.c
@@ -0,0 +1,186 @@
+/* Basic tests for pidfd_fork.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <errno.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/temp_file.h>
+#include <support/xunistd.h>
+#include <sys/pidfd.h>
+#include <sys/wait.h>
+
+#define SIG_PID_EXIT_CODE 20
+
+static bool atfork_prepare_var;
+static bool atfork_parent_var;
+static bool atfork_child_var;
+
+static void
+atfork_prepare (void)
+{
+  atfork_prepare_var = true;
+}
+
+static void
+atfork_parent (void)
+{
+  atfork_parent_var = true;
+}
+
+static void
+atfork_child (void)
+{
+  atfork_child_var = true;
+}
+
+static int
+singlethread_test (unsigned int flags, bool wait_with_pid)
+{
+  const char testdata1[] = "abcdefghijklmnopqrtuvwxz";
+  enum { testdatalen1 = array_length (testdata1) };
+  const char testdata2[] = "01234567890";
+  enum { testdatalen2 = array_length (testdata2) };
+
+  pid_t ppid = getpid ();
+
+  int tempfd = create_temp_file ("tst-pidfd_fork", NULL);
+
+  /* Check that the open file is shared between processes by reading and
+     writing some data in both the parent and the child.  */
+  xwrite (tempfd, testdata1, testdatalen1);
+  off_t off = xlseek (tempfd, 0, SEEK_CUR);
+  TEST_COMPARE (off, testdatalen1);
+
+  int pidfd;
+  pid_t pid = pidfd_fork (&pidfd, -1, flags);
+  TEST_VERIFY_EXIT (pid != -1);
+
+  if (pid == 0)
+    {
+      if (flags & PIDFDFORK_ASYNCSAFE)
+	TEST_VERIFY (!atfork_child_var);
+      else
+	TEST_VERIFY (atfork_child_var);
+
+      TEST_VERIFY_EXIT (getpid () != ppid);
+      TEST_COMPARE (getppid(), ppid);
+
+      TEST_COMPARE (xlseek (tempfd, 0, SEEK_CUR), testdatalen1);
+
+      xlseek (tempfd, 0, SEEK_SET);
+      char buf[testdatalen1];
+      TEST_COMPARE (read (tempfd, buf, sizeof (buf)), testdatalen1);
+      TEST_COMPARE_BLOB (buf, testdatalen1, testdata1, testdatalen1);
+
+      xlseek (tempfd, 0, SEEK_SET);
+      xwrite (tempfd, testdata2, testdatalen2);
+
+      xclose (tempfd);
+
+      _exit (EXIT_SUCCESS);
+    }
+
+  {
+    siginfo_t sinfo;
+    if (wait_with_pid)
+      TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+    else
+      TEST_COMPARE (waitid (P_PIDFD, pidfd, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  TEST_COMPARE (xlseek (tempfd, 0, SEEK_CUR), testdatalen2);
+
+  xlseek (tempfd, 0, SEEK_SET);
+  char buf[testdatalen2];
+  TEST_COMPARE (read (tempfd, buf, sizeof (buf)), testdatalen2);
+
+  TEST_COMPARE_BLOB (buf, testdatalen2, testdata2, testdatalen2);
+
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  /* Sanity check for pidfd support, and check that passing NULL as the
+     pidfd argument makes pidfd_fork act as fork.  */
+  {
+    pid_t pid = pidfd_fork (NULL, -1, 0);
+    if (pid == -1 && errno == ENOSYS)
+      FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+    TEST_VERIFY_EXIT (pid != -1);
+    if (pid == 0)
+      _exit (EXIT_SUCCESS);
+
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+
+  /* With default flags, pidfd_fork acts as fork and runs the pthread_atfork
+     handlers.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (0, false);
+    TEST_VERIFY (atfork_prepare_var);
+    TEST_VERIFY (atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  /* Same as before, but also wait using the PID instead of pidfd.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (0, true);
+    TEST_VERIFY (atfork_prepare_var);
+    TEST_VERIFY (atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  /* With PIDFDFORK_ASYNCSAFE, pidfd_fork acts as _Fork.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+    singlethread_test (PIDFDFORK_ASYNCSAFE, false);
+    TEST_VERIFY (!atfork_prepare_var);
+    TEST_VERIFY (!atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+    singlethread_test (PIDFDFORK_ASYNCSAFE, true);
+    TEST_VERIFY (!atfork_prepare_var);
+    TEST_VERIFY (!atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index e35bf54779..3aac9a0fb9 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2582,6 +2582,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index e7d7eb61c0..9a53054b37 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2688,6 +2688,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 pidfd_fork F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
-- 
2.34.1



* [PATCH v7 7/8] posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (5 preceding siblings ...)
  2023-08-03 16:35 ` [PATCH v7 6/8] posix: Add pidfd_fork (BZ 26371) Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-03 16:35 ` [PATCH v7 8/8] linux: Add pidfd_getpid Adhemerval Zanella
  7 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

The new PIDFDFORK_NOSIGCHLD flag clones the process without setting
SIGCHLD as the termination signal.  When using this flag, the parent
process must specify the Linux-specific __WALL or __WCLONE option when
waiting for the child with waitpid or waitid.
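
As a rough illustration of the wait requirement (a sketch only, not part
of this patch): the hypothetical helper below does not call pidfd_fork.
It assumes that a raw clone syscall with an exit signal of 0 is a fair
stand-in for what PIDFDFORK_NOSIGCHLD requests.  A waitpid call without
the extra options then fails with ECHILD, while __WALL reaps the child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch).  Returns 0 if a child created
   without an exit signal is invisible to a plain waitpid but can be reaped
   with __WALL, and -1 otherwise.  */
static int
sketch_wait_without_sigchld (void)
{
  /* flags == 0 means no CLONE_* bits and exit signal 0.  All remaining
     arguments are zero, so the per-architecture argument order of the raw
     clone syscall does not matter here.  */
  long int ret = syscall (SYS_clone, 0UL, 0UL, 0UL, 0UL, 0UL);
  if (ret == -1)
    return -1;
  if (ret == 0)
    _exit (0);                  /* Child: terminate right away.  */

  pid_t pid = (pid_t) ret;
  int status;

  /* Without __WALL or __WCLONE the kernel does not consider this child.  */
  if (waitpid (pid, &status, WNOHANG) != -1 || errno != ECHILD)
    return -1;

  /* With __WALL the child is reaped as usual.  */
  if (waitpid (pid, &status, __WALL) != pid)
    return -1;
  return WIFEXITED (status) && WEXITSTATUS (status) == 0 ? 0 : -1;
}
```

The raw syscall is used only to keep the sketch independent of this
series; it skips the atfork handlers and internal state reset that
pidfd_fork performs, so it is not a substitute for the new interface.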

Checked on x86_64-linux-gnu and i686-linux-gnu.
---
 include/clone_internal.h                 |  3 +-
 manual/process.texi                      |  6 ++++
 sysdeps/nptl/_Fork.c                     |  2 +-
 sysdeps/unix/sysv/linux/arch-fork.h      |  2 +-
 sysdeps/unix/sysv/linux/clone-internal.c | 10 ++++--
 sysdeps/unix/sysv/linux/pidfd_fork.c     | 13 +++----
 sysdeps/unix/sysv/linux/sys/pidfd.h      |  2 ++
 sysdeps/unix/sysv/linux/tst-pidfd_fork.c | 45 ++++++++++++++++++++++--
 8 files changed, 69 insertions(+), 14 deletions(-)

diff --git a/include/clone_internal.h b/include/clone_internal.h
index d9b5509f78..4ec0c9198f 100644
--- a/include/clone_internal.h
+++ b/include/clone_internal.h
@@ -48,7 +48,8 @@ extern int __clone_internal_fallback (struct clone_args *__cl_args,
 
    It does not provide CLONE_INTO_CGROUP/CGROUP fallback if clone3 is not
    supported, in this case the function returns -1/ENOTSUP.  */
-extern int __clone_fork (uint64_t __extra_flags, void *__pidfd, int __cgroup)
+extern int __clone_fork (uint64_t __extra_flags, void *__pidfd, int __cgroup,
+			 bool nosigchld)
      attribute_hidden;
 
 /* Return whether the kernel supports pid file descriptor, including clone
diff --git a/manual/process.texi b/manual/process.texi
index a656df425b..c60701aeb8 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -390,6 +390,12 @@ following flags:
 Acts as @code{_Fork}, where it does not invoke any callbacks registered with
 @code{pthread_atfork}, nor does it reset internal state or locks (such as the
 @code{malloc} locks).
+
+@item PIDFDFORK_NOSIGCHLD
+Do not send a @code{SIGCHLD} termination signal when the child terminates.
+@strong{NB:} When using this flag, the parent process must specify the
+@code{__WALL} or @code{__WCLONE} options when waiting for the child with
+@code{waitpid} or @code{waitid}.
 @end table
 @end deftypefun
 
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index aa99e05b5b..397f059fb0 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -22,7 +22,7 @@
 pid_t
 _Fork (void)
 {
-  pid_t pid = arch_fork (0, NULL, &THREAD_SELF->tid);
+  pid_t pid = arch_fork (SIGCHLD, NULL, &THREAD_SELF->tid);
   if (pid == 0)
     {
       struct pthread *self = THREAD_SELF;
diff --git a/sysdeps/unix/sysv/linux/arch-fork.h b/sysdeps/unix/sysv/linux/arch-fork.h
index 9e8a449e2c..f978d4c4f4 100644
--- a/sysdeps/unix/sysv/linux/arch-fork.h
+++ b/sysdeps/unix/sysv/linux/arch-fork.h
@@ -35,7 +35,7 @@ static inline pid_t
 arch_fork (int flags, void *ptid, void *ctid)
 {
   long int ret;
-  flags |= CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
+  flags |= CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID;
 #ifdef __ASSUME_CLONE_BACKWARDS
 # ifdef INLINE_CLONE_SYSCALL
   ret = INLINE_CLONE_SYSCALL (flags, 0, ptid, 0, ctid);
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index d212f2591f..615e79a510 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -125,7 +125,7 @@ __clone_internal (struct clone_args *cl_args,
 libc_hidden_def (__clone_internal)
 
 int
-__clone_fork (uint64_t extra_flags, void *pidfd, int cgroup)
+__clone_fork (uint64_t extra_flags, void *pidfd, int cgroup, bool nosigchld)
 {
 #ifdef __NR_clone3
   struct clone_args clone_args =
@@ -133,7 +133,7 @@ __clone_fork (uint64_t extra_flags, void *pidfd, int cgroup)
       .flags = extra_flags
 	       | CLONE_CHILD_SETTID
 	       | CLONE_CHILD_CLEARTID,
-      .exit_signal = SIGCHLD,
+      .exit_signal = nosigchld ? 0 : SIGCHLD,
       .cgroup = cgroup,
       .child_tid = (uintptr_t) &THREAD_SELF->tid,
       .pidfd = (uintptr_t) pidfd,
@@ -161,7 +161,11 @@ __clone_fork (uint64_t extra_flags, void *pidfd, int cgroup)
   bool use_pidfd = pidfd != NULL;
 
   if (!set_cgroup)
-    pid = arch_fork (use_pidfd ? CLONE_PIDFD : 0, pidfd, &THREAD_SELF->tid);
+    {
+      int extra_flags = (use_pidfd ? CLONE_PIDFD : 0)
+			| (nosigchld ? 0 : SIGCHLD);
+      pid = arch_fork (extra_flags, pidfd, &THREAD_SELF->tid);
+    }
   else
     {
       /* No fallback for POSIX_SPAWN_SETCGROUP if clone3 is not supported.  */
diff --git a/sysdeps/unix/sysv/linux/pidfd_fork.c b/sysdeps/unix/sysv/linux/pidfd_fork.c
index 983f8ade98..f3b6b74375 100644
--- a/sysdeps/unix/sysv/linux/pidfd_fork.c
+++ b/sysdeps/unix/sysv/linux/pidfd_fork.c
@@ -22,7 +22,7 @@
 #include <sys/pidfd.h>
 
 static pid_t
-forkfd (int *pidfd, int cgroup)
+forkfd (int *pidfd, int cgroup, bool nosigchld)
 {
   bool use_pidfd = pidfd != NULL;
   bool set_cgroup = cgroup != -1;
@@ -30,8 +30,7 @@ forkfd (int *pidfd, int cgroup)
   uint64_t extra_flags = (use_pidfd ? CLONE_PIDFD : 0)
 			 | (set_cgroup ? CLONE_INTO_CGROUP : 0);
   pid_t pid = __clone_fork (extra_flags, use_pidfd ? pidfd : NULL,
-			    set_cgroup ? cgroup: 0);
-
+			    set_cgroup ? cgroup: 0, nosigchld);
   if (pid == 0)
     {
       struct pthread *self = THREAD_SELF;
@@ -54,9 +53,11 @@ pidfd_fork (int *pidfd, int cgroup, unsigned int flags)
   if (!__clone_pidfd_supported ())
     return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOSYS);
 
-  if (flags & ~(PIDFDFORK_ASYNCSAFE))
+  if (flags & ~(PIDFDFORK_ASYNCSAFE | PIDFDFORK_NOSIGCHLD))
     return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
 
+  bool nosigchld = flags & PIDFDFORK_NOSIGCHLD;
+
   pid_t pid;
   if (!(flags & PIDFDFORK_ASYNCSAFE))
     {
@@ -67,7 +68,7 @@ pidfd_fork (int *pidfd, int cgroup, unsigned int flags)
       struct nss_database_data nss_database_data;
 
       state.lastrun = __fork_pre (multiple_threads, &nss_database_data);
-      state.pid = forkfd (pidfd, cgroup);
+      state.pid = forkfd (pidfd, cgroup, nosigchld);
       /* It follow the usual fork semantic, where a positive or negative
 	 value is returned to parent, and 0 for the child.  */
       __fork_post (&state, &nss_database_data);
@@ -75,7 +76,7 @@ pidfd_fork (int *pidfd, int cgroup, unsigned int flags)
       pid = state.pid;
     }
   else
-    pid = forkfd (pidfd, cgroup);
+    pid = forkfd (pidfd, cgroup, nosigchld);
 
   return pid;
 }
diff --git a/sysdeps/unix/sysv/linux/sys/pidfd.h b/sysdeps/unix/sysv/linux/sys/pidfd.h
index 3e6d009ce7..87095212a7 100644
--- a/sysdeps/unix/sysv/linux/sys/pidfd.h
+++ b/sysdeps/unix/sysv/linux/sys/pidfd.h
@@ -49,6 +49,8 @@ extern int pidfd_send_signal (int __pidfd, int __sig, siginfo_t *__info,
 
 /* Do not issue the pthread_atfork on pidfd_fork.  */
 #define PIDFDFORK_ASYNCSAFE (1U << 1)
+/* Do not send a SIGCHLD termination signal.  */
+#define PIDFDFORK_NOSIGCHLD (1U << 2)
 
 /* Clone the calling process, creating an exact copy and return a file
    descriptor that can be used along other pidfd functions.
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_fork.c b/sysdeps/unix/sysv/linux/tst-pidfd_fork.c
index 3e09c55d54..ee3a72ba5d 100644
--- a/sysdeps/unix/sysv/linux/tst-pidfd_fork.c
+++ b/sysdeps/unix/sysv/linux/tst-pidfd_fork.c
@@ -24,6 +24,7 @@
 #include <support/check.h>
 #include <support/temp_file.h>
 #include <support/xunistd.h>
+#include <support/xsignal.h>
 #include <sys/pidfd.h>
 #include <sys/wait.h>
 
@@ -33,6 +34,14 @@ static bool atfork_prepare_var;
 static bool atfork_parent_var;
 static bool atfork_child_var;
 
+static sig_atomic_t sigchld_called;
+
+static void
+sigchld_handler (int sig)
+{
+  sigchld_called = 1;
+}
+
 static void
 atfork_prepare (void)
 {
@@ -69,6 +78,9 @@ singlethread_test (unsigned int flags, bool wait_with_pid)
   off_t off = xlseek (tempfd, 0, SEEK_CUR);
   TEST_COMPARE (off, testdatalen1);
 
+  bool check_nosigchld = flags & PIDFDFORK_NOSIGCHLD;
+  sigchld_called = 0;
+
   int pidfd;
   pid_t pid = pidfd_fork (&pidfd, -1, flags);
   TEST_VERIFY_EXIT (pid != -1);
@@ -100,13 +112,18 @@ singlethread_test (unsigned int flags, bool wait_with_pid)
 
   {
     siginfo_t sinfo;
+    int options = WEXITED | (check_nosigchld ? __WCLONE : 0);
     if (wait_with_pid)
-      TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
+      TEST_COMPARE (waitid (P_PID, pid, &sinfo, options), 0);
     else
-      TEST_COMPARE (waitid (P_PIDFD, pidfd, &sinfo, WEXITED), 0);
+      TEST_COMPARE (waitid (P_PIDFD, pidfd, &sinfo, options), 0);
     TEST_COMPARE (sinfo.si_signo, SIGCHLD);
     TEST_COMPARE (sinfo.si_code, CLD_EXITED);
     TEST_COMPARE (sinfo.si_status, 0);
+
+    /* If PIDFDFORK_NOSIGCHLD is specified no SIGCHLD should be sent by the
+       kernel.  */
+    TEST_COMPARE (sigchld_called, check_nosigchld ? 0 : 1);
   }
 
   TEST_COMPARE (xlseek (tempfd, 0, SEEK_CUR), testdatalen2);
@@ -140,6 +157,14 @@ do_test (void)
     TEST_COMPARE (sinfo.si_status, 0);
   }
 
+  {
+    struct sigaction sa;
+    sa.sa_handler = sigchld_handler;
+    sa.sa_flags = 0;
+    sigemptyset (&sa.sa_mask);
+    xsigaction (SIGCHLD, &sa, NULL);
+  }
+
   pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
 
   /* With default flags, pidfd_fork acts as fork and run the pthread_atfork
@@ -161,6 +186,14 @@ do_test (void)
     TEST_VERIFY (!atfork_child_var);
   }
 
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (PIDFDFORK_NOSIGCHLD, false);
+    TEST_VERIFY (atfork_prepare_var);
+    TEST_VERIFY (atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
   /* With PIDFDFORK_ASYNCSAFE, pidfd_fork acts as _Fork.  */
   {
     atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
@@ -180,6 +213,14 @@ do_test (void)
     TEST_VERIFY (!atfork_child_var);
   }
 
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (PIDFDFORK_NOSIGCHLD | PIDFDFORK_ASYNCSAFE, true);
+    TEST_VERIFY (!atfork_prepare_var);
+    TEST_VERIFY (!atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
   return 0;
 }
 
-- 
2.34.1



* [PATCH v7 8/8] linux: Add pidfd_getpid
  2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
                   ` (6 preceding siblings ...)
  2023-08-03 16:35 ` [PATCH v7 7/8] posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork Adhemerval Zanella
@ 2023-08-03 16:35 ` Adhemerval Zanella
  2023-08-11 14:36   ` Florian Weimer
  7 siblings, 1 reply; 23+ messages in thread
From: Adhemerval Zanella @ 2023-08-03 16:35 UTC (permalink / raw)
  To: libc-alpha

This interface obtains the process ID associated with a process file
descriptor.  It does so by parsing the procfs fdinfo entry for the
descriptor.  Its prototype is:

   pid_t pidfd_getpid (int fd)

It returns the associated pid or -1 in case of an error and sets
errno accordingly.  The possible errno values are those from open,
read, and close (used on procfs parsing), along with:

   - EBADF if the FD is negative, does not have a PID associated, or
     if the fdinfo field contains a value larger than pid_t.

   - EREMOTE if the PID is in a separate namespace.

   - ESRCH if the process is already terminated.
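
The lookup described above can be sketched roughly as follows.  This is
not the implementation added by this patch: sketch_pidfd_getpid and
sketch_roundtrip_self are hypothetical names, stdio is used for brevity,
and the mapping assumes that the kernel reports "Pid: 0" for a process
outside the current PID namespace and "Pid: -1" for a terminated one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the glibc implementation: look up the "Pid:"
   field in /proc/self/fdinfo/FD and map it to the errors listed above.  */
static pid_t
sketch_pidfd_getpid (int fd)
{
  if (fd < 0)
    {
      errno = EBADF;
      return -1;
    }

  char path[64];
  snprintf (path, sizeof path, "/proc/self/fdinfo/%d", fd);
  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    return -1;                  /* errno was set by fopen.  */

  char line[256];
  long long int value = 0;
  int found = 0;
  while (fgets (line, sizeof line, fp) != NULL)
    if (strncmp (line, "Pid:", 4) == 0)
      {
        errno = 0;
        value = strtoll (line + 4, NULL, 10);
        found = errno == 0;
        break;
      }
  fclose (fp);

  /* Assumed kernel convention: 0 means another PID namespace and -1
     means the process has already terminated.  */
  if (!found || value > INT_MAX || value < -1)
    errno = EBADF;
  else if (value == 0)
    errno = EREMOTE;
  else if (value == -1)
    errno = ESRCH;
  else
    return (pid_t) value;
  return -1;
}

/* Open a pidfd for the calling process with the raw pidfd_open syscall
   and read the PID back.  Returns 0 if both values match.  */
static int
sketch_roundtrip_self (void)
{
  int fd = (int) syscall (SYS_pidfd_open, getpid (), 0);
  if (fd < 0)
    return -1;                  /* For instance ENOSYS on old kernels.  */
  pid_t pid = sketch_pidfd_getpid (fd);
  close (fd);
  return pid == getpid () ? 0 : -1;
}
```

An fdinfo entry without a "Pid:" line, as for a regular file, ends up
as EBADF, which matches the "does not have a PID associated" case.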

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
P_PIDFD support), Linux 5.15 (only clone support), and Linux 5.19 (full
support including clone3).
---
 NEWS                                          |   4 +
 manual/process.texi                           |  31 +++
 sysdeps/unix/sysv/linux/Makefile              |   3 +
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   1 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/pidfd_getpid.c        | 122 ++++++++++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/procutils.c           | 104 ++++++++++
 sysdeps/unix/sysv/linux/procutils.h           |  35 ++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/sys/pidfd.h           |   4 +
 sysdeps/unix/sysv/linux/tst-pidfd.c           |  47 +++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c    | 187 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 44 files changed, 572 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c

diff --git a/NEWS b/NEWS
index 3d753a0f39..3a8cb00554 100644
--- a/NEWS
+++ b/NEWS
@@ -27,6 +27,10 @@ Major new features:
   of return a process ID, it returns a file descriptor that can be used
   along other pidfd functions.
 
+* On Linux, the pidfd_getpid function has been added.  It retrieves the
+  process ID associated with a process file descriptor created with
+  pidfd_spawn, pidfd_fork, or pidfd_open.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/manual/process.texi b/manual/process.texi
index c60701aeb8..a74f316ddc 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -33,6 +33,7 @@ primitive functions to do each step individually instead.
 * Process Creation Concepts::   An overview of the hard way to do it.
 * Process Identification::      How to get the process ID of a process.
 * Creating a Process::          How to fork a child process.
+* Querying a Process::          How to query a child process.
 * Executing a File::            How to make a process execute another program.
 * Process Completion::          How to tell when a child process has completed.
 * Process Completion Status::   How to interpret the status value
@@ -401,6 +402,36 @@ Do not send a @code{SIGCHLD} termination signal when child terminates.
 
 This function is specific to Linux.
 
+@node Querying a Process
+@section Querying a Process
+
+The file descriptor returned by the @code{pidfd_fork} function can be used to
+query extra information about the process.
+
+@deftypefun pid_t pidfd_getpid (int @var{fd})
+@standards{GNU, sys/pidfd.h}
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+
+The @code{pidfd_getpid} function retrieves the process ID associated with a
+process file descriptor created with @code{pidfd_spawn}, @code{pidfd_fork}, or
+@code{pidfd_open}.
+
+If the operation fails, @code{pidfd_getpid} returns @code{-1} and the following
+@code{errno} error conditions are defined:
+
+@table @code
+@item EBADF
+The input file descriptor is invalid, does not have a pidfd associated, or an
+error has occurred parsing the kernel data.
+@item EREMOTE
+There is no process ID to denote the process in the current namespace.
+@item ESRCH
+The process to which the file descriptor refers has terminated.
+@end table
+
+This function is specific to Linux.
+@end deftypefun
+
 @node Executing a File
 @section Executing a File
 @cindex executing a file
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 58dc23a2fb..ee1f40883c 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -213,6 +213,7 @@ tests += \
   tst-ofdlocks \
   tst-personality \
   tst-pidfd \
+  tst-pidfd_getpid \
   tst-pkey \
   tst-ppoll \
   tst-prctl \
@@ -494,8 +495,10 @@ sysdep_routines += \
   getcpu \
   oldglob \
   pidfd_fork \
+  pidfd_getpid \
   pidfd_spawn \
   pidfd_spawnp \
+  procutils \
   sched_getcpu \
   spawnattr_getcgroup_np \
   spawnattr_setcgroup_np \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index f50531ce8a..1310c009e8 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -325,6 +325,7 @@ libc {
     posix_spawnattr_getcgroup_np;
     posix_spawnattr_setcgroup_np;
     pidfd_fork;
+    pidfd_getpid;
     pidfd_spawn;
     pidfd_spawnp;
   }
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 0d252d841b..c230d2f9bf 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2674,6 +2674,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 347c7e2de5..735b230f3a 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2783,6 +2783,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index 78da9c8434..c959044259 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2435,6 +2435,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index c99cc53158..c88bc44bac 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -555,6 +555,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 1d3412a4ec..e67da18c51 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -552,6 +552,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 993eb6d8b7..24a7326b48 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2711,6 +2711,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index a5825a7e27..1ba625305d 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2660,6 +2660,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 696ef98aea..324bf961d4 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2844,6 +2844,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 0cd4b92159..564476beac 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2609,6 +2609,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index cfb358dc15..17364cb102 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2195,6 +2195,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index c7de4154f4..da63ffbd55 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -556,6 +556,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index e99ce0daa7..a6715fb165 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2787,6 +2787,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index ffb3b1fa20..9ef3686f7b 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2760,6 +2760,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 120b0707ec..6442415958 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2757,6 +2757,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index d0f2dce89c..8f184efc2f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2752,6 +2752,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 3b33299274..89667debc6 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2750,6 +2750,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 0253aeb4e4..ec7b79a4b1 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2758,6 +2758,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 1613b18958..afc87d46ac 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2660,6 +2660,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index b2e2723a97..f836f84740 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2799,6 +2799,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 800aec0661..d089993ca3 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2181,6 +2181,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/pidfd_getpid.c b/sysdeps/unix/sysv/linux/pidfd_getpid.c
new file mode 100644
index 0000000000..46848a5983
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_getpid.c
@@ -0,0 +1,122 @@
+/* pidfd_getpid - Get the associated pid from the pid file descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <_itoa.h>
+#include <errno.h>
+#include <intprops.h>
+#include <procutils.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sysdep.h>
+#include <unistd.h>
+
+#define FDINFO_TO_FILENAME_PREFIX "/proc/self/fdinfo/"
+
+#define FDINFO_FILENAME_LEN \
+  (sizeof (FDINFO_TO_FILENAME_PREFIX) + INT_STRLEN_BOUND (int))
+
+struct parse_fdinfo_t
+{
+  bool found;
+  pid_t pid;
+};
+
+/* Parse the PID field in the fdinfo entry, if existent.  Avoid strtol or
+   similar to not be locale dependent.  */
+static int
+parse_fdinfo (const char *l, void *arg)
+{
+  enum { fieldlen = sizeof ("Pid:") - 1 };
+  if (strncmp (l, "Pid:", fieldlen) != 0)
+    return 0;
+
+  l += fieldlen;
+
+  /* Skip leading spaces.  */
+  while (*l == ' ' || (unsigned int)(*l) -'\t' < 5)
+    l++;
+
+  bool neg = false;
+  switch (*l)
+    {
+    case '-': neg = true;
+    case '+': l++;
+    }
+
+  if (*l == '\0')
+    return 0;
+
+  int n = 0;
+  while (*l != '\0')
+    {
+      /* Check if '*l' is a digit.  */
+      if ((unsigned int)(*l) - '0' >= 10)
+        return 0;
+
+      /* Ignore invalid large values.  */
+      if (INT_MULTIPLY_WRAPV (10, n, &n)
+          || INT_ADD_WRAPV (n, *l++ - '0', &n))
+        return 0;
+    }
+
+  /* -1 indicates that the process is terminated.  */
+  if (neg && n != 1)
+    return 0;
+
+  struct parse_fdinfo_t *fdinfo = arg;
+  fdinfo->pid = neg ? -n : n;
+  fdinfo->found = true;
+
+  return 1;
+}
+
+pid_t
+pidfd_getpid (int fd)
+{
+  if (__glibc_unlikely (fd < 0))
+    {
+      __set_errno (EBADF);
+      return -1;
+    }
+
+  char fdinfoname[FDINFO_FILENAME_LEN];
+
+  char *p = mempcpy (fdinfoname, FDINFO_TO_FILENAME_PREFIX,
+		     strlen (FDINFO_TO_FILENAME_PREFIX));
+  *_fitoa_word (fd, p, 10, 0) = '\0';
+
+  struct parse_fdinfo_t fdinfo = { .found = false, .pid = -1 };
+  if (procutils_read_file (fdinfoname, parse_fdinfo, &fdinfo) == -1)
+    /* The fdinfo file could not be opened or read.  */
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EBADF);
+
+  /* The FD does not have a 'Pid:' entry associated.  */
+  if (!fdinfo.found)
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EBADF);
+
+  /* The pidfd cannot be resolved because it is in a separate pid
+     namespace.  */
+  if (fdinfo.pid == 0)
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EREMOTE);
+
+  /* A negative value means the process is terminated.  */
+  if (fdinfo.pid < 0)
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (ESRCH);
+
+  return fdinfo.pid;
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 6ce440b9a0..73f551946c 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2826,6 +2826,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index d4cdb3f50b..06777226b9 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2859,6 +2859,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index e2bba8152b..5b49433840 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2580,6 +2580,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 8976c6b37b..e82d5ef81d 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2894,6 +2894,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/procutils.c b/sysdeps/unix/sysv/linux/procutils.c
new file mode 100644
index 0000000000..83b327cb9a
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/procutils.c
@@ -0,0 +1,104 @@
+/* Utility functions to read/parse Linux procfs and sysfs.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <not-cancel.h>
+#include <procutils.h>
+#include <string.h>
+
+static int
+next_line (char **r, int fd, char *const buffer, char **cp, char **re,
+           char *const buffer_end)
+{
+  char *res = *cp;
+  char *nl = memchr (*cp, '\n', *re - *cp);
+  if (nl == NULL)
+    {
+      if (*cp != buffer)
+        {
+          if (*re == buffer_end)
+            {
+              memmove (buffer, *cp, *re - *cp);
+              *re = buffer + (*re - *cp);
+              *cp = buffer;
+
+              ssize_t n = __read_nocancel (fd, *re, buffer_end - *re);
+              if (n < 0)
+                return -1;
+
+              *re += n;
+
+              nl = memchr (*cp, '\n', *re - *cp);
+              while (nl == NULL && *re == buffer_end)
+                {
+                  /* Truncate too long lines.  */
+                  *re = buffer + 3 * (buffer_end - buffer) / 4;
+                  n = __read_nocancel (fd, *re, buffer_end - *re);
+                  if (n < 0)
+                    return -1;
+
+                  nl = memchr (*re, '\n', n);
+                  **re = '\0';
+                  *re += n;
+                }
+            }
+          else
+            nl = memchr (*cp, '\n', *re - *cp);
+
+          res = *cp;
+        }
+
+      if (nl == NULL)
+        nl = *re - 1;
+    }
+
+  *nl = '\0';
+  *cp = nl + 1;
+  assert (*cp <= *re);
+
+  if (res == *re)
+    return 0;
+
+  *r = res;
+  return 1;
+}
+
+int
+procutils_read_file (const char *filename, procutils_closure_t closure,
+		     void *arg)
+{
+  enum { buffer_size = 1024 };
+  char buffer[buffer_size];
+  char *buffer_end = buffer + buffer_size;
+  char *cp = buffer_end;
+  char *re = buffer_end;
+
+  int fd = __open64_nocancel (filename, O_RDONLY | O_CLOEXEC);
+  if (fd == -1)
+    return -1;
+
+  char *l;
+  int r;
+  while ((r = next_line (&l, fd, buffer, &cp, &re, buffer_end)) > 0)
+    if (closure (l, arg) != 0)
+      break;
+
+  __close_nocancel_nostatus (fd);
+
+  return r;
+}
diff --git a/sysdeps/unix/sysv/linux/procutils.h b/sysdeps/unix/sysv/linux/procutils.h
new file mode 100644
index 0000000000..64e1080920
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/procutils.h
@@ -0,0 +1,35 @@
+/* Utility functions to read/parse Linux procfs and sysfs.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _PROCUTILS_H
+#define _PROCUTILS_H
+
+typedef int (*procutils_closure_t)(const char *line, void *arg);
+
+/* Open and read the path FILENAME, line per line, and call CLOSURE with
+   argument ARG on each line.  The read is done with a static buffer,
+   with non-cancellable calls, and the line is null terminated.
+
+   The CLOSURE should return 0 if the read should continue, or a nonzero
+   value if the function should stop early.
+
+   It returns -1 on error, 1 if CLOSURE stopped early, or 0 at end of file.  */
+int procutils_read_file (const char *filename, procutils_closure_t closure,
+			 void *arg) attribute_hidden;
+
+#endif
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 9dc6a3df1e..ceb537ed1f 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2437,6 +2437,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 4bafa24dba..93b4237957 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2637,6 +2637,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index abb21da1ce..36d6d8b389 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2824,6 +2824,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 246a180900..4e49bdb79d 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2617,6 +2617,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 493b9a53f9..d6edb37baf 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2667,6 +2667,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 0a23cc2a0d..c63992b93e 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2664,6 +2664,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 40c8747a75..489002b9a5 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2819,6 +2819,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 873949ce6c..65a8d30c5a 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2632,6 +2632,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/sys/pidfd.h b/sysdeps/unix/sysv/linux/sys/pidfd.h
index 87095212a7..8cf4df6b81 100644
--- a/sysdeps/unix/sysv/linux/sys/pidfd.h
+++ b/sysdeps/unix/sysv/linux/sys/pidfd.h
@@ -67,4 +67,8 @@ extern int pidfd_send_signal (int __pidfd, int __sig, siginfo_t *__info,
 extern pid_t pidfd_fork (int *__pidfd, int __cgroup, unsigned int __flags)
   __THROW;
 
+/* Query the process ID (PID) from process descriptor __FD.  Return the PID
+   or -1 in case of an error.  */
+extern pid_t pidfd_getpid (int __fd) __THROW;
+
 #endif /* _PIDFD_H  */
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c b/sysdeps/unix/sysv/linux/tst-pidfd.c
index 64d8a2ef40..53d223f702 100644
--- a/sysdeps/unix/sysv/linux/tst-pidfd.c
+++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
@@ -18,6 +18,7 @@
 
 #include <errno.h>
 #include <fcntl.h>
+#include <limits.h>
 #include <support/capture_subprocess.h>
 #include <support/check.h>
 #include <support/process_state.h>
@@ -27,6 +28,9 @@
 #include <support/xsocket.h>
 #include <sys/pidfd.h>
 #include <sys/wait.h>
+#include <stdlib.h>
+
+#include <string.h>
 
 #define REMOTE_PATH "/dev/null"
 
@@ -102,6 +106,43 @@ do_test (void)
   ppid = getpid ();
   puid = getuid ();
 
+  /* Sanity check for invalid inputs.  */
+  TEST_COMPARE (pidfd_getpid (-1), -1);
+  TEST_COMPARE (errno, EBADF);
+
+  {
+    pid_t pid = pidfd_getpid (STDOUT_FILENO);
+    TEST_COMPARE (pid, -1);
+    TEST_COMPARE (errno, EBADF);
+  }
+
+  /* Check if pidfd_getpid returns ESRCH for exited subprocess.  */
+  {
+    int pidfd;
+    pid_t pidfork = pidfd_fork (&pidfd, -1, 0);
+    if (pidfork == 0)
+      _exit (EXIT_SUCCESS);
+
+    /* The process might be still running or already in zombie state, in any
+       case the PID is still allocated to the process.  */
+    pid_t pid = pidfd_getpid (pidfd);
+    if (pid > 0)
+      support_process_state_wait (pid, support_process_state_zombie);
+
+    siginfo_t info;
+    TEST_COMPARE (waitid (P_PIDFD, pidfd, &info, WEXITED), 0);
+    TEST_COMPARE (info.si_pid, pidfork);
+    TEST_COMPARE (info.si_status, 0);
+    TEST_COMPARE (info.si_code, CLD_EXITED);
+
+    /* Once the process is reaped the associated PID is not available.  */
+    pid = pidfd_getpid (pidfd);
+    TEST_COMPARE (pid, -1);
+    TEST_COMPARE (errno, ESRCH);
+
+    xclose (pidfd);
+  }
+
   TEST_COMPARE (socketpair (AF_UNIX, SOCK_STREAM, 0, sockets), 0);
 
   pid_t pid = xfork ();
@@ -118,6 +159,12 @@ do_test (void)
   int pidfd = pidfd_open (pid, 0);
   TEST_VERIFY (pidfd != -1);
 
+  TEST_COMPARE (pidfd_getpid (INT_MAX), -1);
+  {
+    pid_t querypid = pidfd_getpid (pidfd);
+    TEST_COMPARE (querypid, pid);
+  }
+
   /* Wait for first sigtimedwait.  */
   support_process_state_wait (pid, support_process_state_sleeping);
   TEST_COMPARE (pidfd_send_signal (pidfd, SIGUSR1, NULL, 0), 0);
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
new file mode 100644
index 0000000000..41d03a04ad
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
@@ -0,0 +1,187 @@
+/* Specific tests for Linux pidfd_getpid.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <sched.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/xsocket.h>
+#include <support/xunistd.h>
+#include <support/test-driver.h>
+#include <sys/pidfd.h>
+#include <sys/wait.h>
+#include <sys/mount.h>
+#include <string.h>
+
+#include <stdio.h>
+
+static int sockfd[2];
+
+static void
+send_fd (const int sock, const int fd)
+{
+  union
+    {
+      struct cmsghdr hdr;
+      char buf[CMSG_SPACE (sizeof (int))];
+    } cmsgbuf = {0};
+  struct cmsghdr *cmsg;
+  char ch = 'A';
+  struct iovec vec =
+    {
+      .iov_base = &ch,
+      .iov_len = sizeof ch
+    };
+  struct msghdr msg =
+    {
+      .msg_control = &cmsgbuf.buf,
+      .msg_controllen = sizeof (cmsgbuf.buf),
+      .msg_iov = &vec,
+      .msg_iovlen = 1,
+    };
+
+  cmsg = CMSG_FIRSTHDR (&msg);
+  cmsg->cmsg_len = CMSG_LEN (sizeof (int));
+  cmsg->cmsg_level = SOL_SOCKET;
+  cmsg->cmsg_type = SCM_RIGHTS;
+  memcpy (CMSG_DATA (cmsg), &fd, sizeof (fd));
+
+  ssize_t n;
+  while ((n = sendmsg (sock, &msg, 0)) == -1 && errno == EINTR);
+
+  TEST_VERIFY_EXIT (n == 1);
+}
+
+static int
+recv_fd (const int sock)
+{
+  union
+    {
+      struct cmsghdr hdr;
+      char buf[CMSG_SPACE(sizeof(int))];
+    } cmsgbuf = {0};
+  struct cmsghdr *cmsg;
+  char ch = '\0';
+  struct iovec vec =
+    {
+      .iov_base = &ch,
+      .iov_len = sizeof ch
+    };
+  struct msghdr msg =
+    {
+      .msg_control = &cmsgbuf.buf,
+      .msg_controllen = sizeof (cmsgbuf.buf),
+      .msg_iov = &vec,
+      .msg_iovlen = 1,
+    };
+
+  ssize_t n;
+  while ((n = recvmsg (sock, &msg, 0)) == -1 && errno == EINTR);
+  if (n != 1 || ch != 'A')
+    return -1;
+
+  cmsg = CMSG_FIRSTHDR (&msg);
+  if (cmsg == NULL)
+    return -1;
+  if (cmsg->cmsg_type != SCM_RIGHTS)
+    return -1;
+
+  int fd = -1;
+  memcpy (&fd, CMSG_DATA (cmsg), sizeof (fd));
+  if (fd < 0)
+    return -1;
+  return fd;
+}
+
+static int
+do_test (void)
+{
+  {
+    /* The pidfd_getfd syscall was the last in the set of pidfd related
+       syscalls added to the kernel.  Use pidfd_getfd to decide if this
+       kernel has pidfd support that we can test.  */
+    int r = pidfd_getfd (0, 0, 1);
+    TEST_VERIFY_EXIT (r == -1);
+    if (errno == ENOSYS)
+      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd, skipping test");
+  }
+
+  TEST_VERIFY_EXIT (socketpair (AF_UNIX, SOCK_STREAM, 0, sockfd) == 0);
+
+  /* Check if pidfd_getpid returns EREMOTE for process not in current
+     namespace.  */
+  {
+    int pidfd;
+    pid_t pid = pidfd_fork (&pidfd, -1, 0);
+    TEST_VERIFY_EXIT (pid >= 0);
+    if (pid == 0)
+      {
+        if (unshare (CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWPID) < 0)
+	  {
+	    /* Older kernels may not support all the options, or security
+	       policy may block this call.  */
+	    if (errno == EINVAL || errno == EPERM || errno == ENOSPC)
+	      exit (EXIT_UNSUPPORTED);
+	    FAIL_EXIT1 ("unshare user/fs/pid failed: %m");
+	  }
+
+	TEST_VERIFY_EXIT (mount (NULL, "/", NULL, MS_REC | MS_PRIVATE, 0)
+			  == 0);
+
+	pid_t child = xfork ();
+	if (child > 0)
+	  {
+	    int status;
+	    xwaitpid (child, &status, 0);
+	    TEST_VERIFY (WIFEXITED (status));
+	    exit (WEXITSTATUS (status));
+	  }
+
+	/* Now that we're pid 1 (effectively "root") we can mount /proc  */
+	if (mount ("proc", "/proc", "proc", 0, NULL) != 0)
+	  /* This happens if we're trying to create a nested container,
+	     like if the build is running under podman, and we lack
+	     privileges.  */
+	  {
+	    if (errno == EPERM)
+	      _exit (EXIT_UNSUPPORTED);
+	    else
+	      _exit (EXIT_FAILURE);
+	  }
+
+	int ppidfd = recv_fd (sockfd[0]);
+	TEST_COMPARE (pidfd_getpid (ppidfd), -1);
+	TEST_COMPARE (errno, EREMOTE);
+
+	_exit (EXIT_SUCCESS);
+      }
+
+    send_fd (sockfd[1], pidfd);
+
+    siginfo_t info;
+    TEST_COMPARE (waitid (P_PIDFD, pidfd, &info, WEXITED), 0);
+    if (info.si_status == EXIT_UNSUPPORTED)
+      FAIL_UNSUPPORTED ("unable to unshare user/fs/pid");
+    TEST_COMPARE (info.si_status, 0);
+    TEST_COMPARE (info.si_code, CLD_EXITED);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 3aac9a0fb9..072b92e51d 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2583,6 +2583,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 9a53054b37..0bbb88176e 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2689,6 +2689,7 @@ GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
 GLIBC_2.39 pidfd_fork F
+GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
 GLIBC_2.39 posix_spawnattr_getcgroup_np F
-- 
2.34.1



* Re: [PATCH v7 1/8] arm: Add the clone3 wrapper
  2023-08-03 16:35 ` [PATCH v7 1/8] arm: Add the clone3 wrapper Adhemerval Zanella
@ 2023-08-11 10:17   ` Florian Weimer
  2023-08-11 14:12     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2023-08-11 10:17 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha; +Cc: Adhemerval Zanella

* Adhemerval Zanella via Libc-alpha:

> +	/* Do the syscall, the kernel expects:
> +	   r7: system call number:
> +	   r0: cl_args
> +	   r1: size  */
> +	push    { r7 }
> +	cfi_adjust_cfa_offset (4)
> +	cfi_rel_offset (r7, 0)
> +	ldr     r7, =SYS_ify(clone3)
> +	swi	0x0
> +	cfi_endproc
> +
> +	cmp	r0, #0
> +	beq	1f
> +	pop     {r7}

> +1:
> +	.fnstart
> +	.cantunwind
> +	mov	r0, r3
> +	mov	ip, r2
> +	BLX (ip)

I think the stack is misaligned at the BLX call because only one 4-byte
register is pushed.
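
The alignment concern comes down to simple arithmetic, sketched below. This is only a model: push_words and is_aligned are hypothetical helpers, the starting address is made up, and it assumes the 8-byte stack alignment that the ARM procedure call standard requires at public interfaces. It does not establish which stack the callee actually runs on after clone3 returns in the child.

```c
#include <stdint.h>

/* Return nonzero if SP is a multiple of ALIGN (ALIGN must be a power of
   two).  Hypothetical helper for illustration only.  */
static int
is_aligned (uintptr_t sp, uintptr_t align)
{
  return (sp & (align - 1)) == 0;
}

/* Model pushing NREGS 4-byte registers on a full-descending stack:
   the stack pointer moves down by 4 bytes per register.  */
static uintptr_t
push_words (uintptr_t sp, unsigned int nregs)
{
  return sp - 4u * nregs;
}
```

Starting from an 8-byte aligned address, push_words (sp, 1) yields an address that is only 4-byte aligned, while push_words (sp, 2) keeps the 8-byte alignment. That is why pushing registers in pairs is the usual way to avoid the problem.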

Thanks,
Florian



* Re: [PATCH v7 3/8] linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc
  2023-08-03 16:35 ` [PATCH v7 3/8] linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
@ 2023-08-11 10:34   ` Florian Weimer
  2023-08-11 15:12     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2023-08-11 10:34 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha; +Cc: Adhemerval Zanella

* Adhemerval Zanella via Libc-alpha:

> Not all architectures added clone3 syscall.
> ---
>  .../unix/sysv/linux/alpha/kernel-features.h   |  3 +++
>  .../unix/sysv/linux/ia64/kernel-features.h    |  3 +++
>  .../unix/sysv/linux/nios2/kernel-features.h   | 23 +++++++++++++++++++
>  sysdeps/unix/sysv/linux/sh/kernel-features.h  |  3 +++
>  .../unix/sysv/linux/sparc/kernel-features.h   |  3 +++
>  5 files changed, 35 insertions(+)
>  create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
>
> diff --git a/sysdeps/unix/sysv/linux/alpha/kernel-features.h b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
> index 3151e75449..e298bf2bcc 100644
> --- a/sysdeps/unix/sysv/linux/alpha/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
> @@ -50,4 +50,7 @@
>  /* Alpha requires old sysvipc even being a 64-bit architecture.  */
>  #undef __ASSUME_SYSVIPC_DEFAULT_IPC_64
>  
> +/* Alpha does not provide clone3.  */
> +#undef __ASSUME_CLONE3

This is inconsistent with sysdeps/unix/sysv/linux/kernel-features.h,
which I think uses 0 to indicate no support:

/* The clone3 system call was introduced across on most architectures in
   Linux 5.3.  Not all ports implements it, so it should be used along
   HAVE_CLONE3_WRAPPER define.  */
#if __LINUX_KERNEL_VERSION >= 0x050300
# define __ASSUME_CLONE3 1
#else
# define __ASSUME_CLONE3 0

Maybe that comment needs updating in this series, too?
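
To make the difference concrete, here is a small sketch of how an undefined macro and a macro defined to 0 behave under the two common preprocessor tests. DEMO_ASSUME_ZERO and DEMO_ASSUME_UNDEF are hypothetical stand-ins for the feature macro; the sketch does not claim which form any particular glibc source file uses.

```c
/* Hypothetical stand-ins for a feature macro.  */
#define DEMO_ASSUME_ZERO 0
/* DEMO_ASSUME_UNDEF is deliberately left undefined.  */

/* '#if' treats both a zero value and an undefined name as false.  */
static int
zero_with_if (void)
{
#if DEMO_ASSUME_ZERO
  return 1;
#else
  return 0;
#endif
}

static int
undef_with_if (void)
{
#if DEMO_ASSUME_UNDEF
  return 1;
#else
  return 0;
#endif
}

/* '#ifdef' only checks for existence, so a macro defined to 0 is true.  */
static int
zero_with_ifdef (void)
{
#ifdef DEMO_ASSUME_ZERO
  return 1;
#else
  return 0;
#endif
}

static int
undef_with_ifdef (void)
{
#ifdef DEMO_ASSUME_UNDEF
  return 1;
#else
  return 0;
#endif
}
```

With '#if' the two conventions are interchangeable, but any '#ifdef' or 'defined' test gives different answers, so mixing "#undef" and "define to 0" is only safe if every user of the macro sticks to '#if'.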

Thanks,
Florian



* Re: [PATCH v7 4/8] linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26731)
  2023-08-03 16:35 ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
@ 2023-08-11 10:51   ` Florian Weimer
  2023-08-11 15:31     ` Adhemerval Zanella Netto
  2023-08-14 13:27   ` Carlos O'Donell
  1 sibling, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2023-08-11 10:51 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha; +Cc: Adhemerval Zanella

* Adhemerval Zanella via Libc-alpha:

> These function allow to posix_spawn and posix_spawnp to use
> CLONE_INTO_CGROUP with clone3, allowing the child process to
> be created in a different version 2 cgroup.  These are GNU
> extensions that are available only for Linux, and also only
> for the architectures that implement clone3 wrapper
> (HAVE_CLONE3_WRAPPER).
>
> To create a process on a different cgroupv2, one can use the:
>
>   posix_spawnattr_t attr;
>   posix_spawnattr_init (&attr);
>   posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
>   posix_spawnattr_setcgroup_np (&attr, cgroup);
>   posix_spawn (...)

Why are both POSIX_SPAWN_SETCGROUP and posix_spawnattr_setcgroup_np
needed?  Couldn't the latter imply the former?

> There is no fallback is either clone3 does not support the flag
> or if the architecture does not provide the clone3 wrapper, in
> this case posix_spawn returns ENOTSUP.

I think this really should be added to the manual, mayb

It's also not clear to me how you would probe for support properly.
The spawn operation might fail for other reasons.

I wonder if we have to probe as part of the 

> diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> new file mode 100644
> index 0000000000..3bc10ab477
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h

> +/* Get the cgroupsv2 the attribute structure.  */
> +extern int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *
> +					 __restrict __attr,
> +					 int *__cgroup)
> +     __THROW __nonnull ((1, 2));
> +
> +/* Store scheduling parameters in the attribute structure.  */
> +extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__restrict __attr,
> +					 int __cgroup)
> +     __THROW __nonnull ((1));

Second comment seems wrong.

> diff --git a/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
> new file mode 100644
> index 0000000000..6dba30ab29
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
> @@ -0,0 +1,216 @@
> +/* Tests for posix_spawn cgroup extension.

> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */

Should be “https://”.

> +#define F_TYPE_EQUAL(a, b) (a == (typeof(a)) b)

Missing space after “typeof”.

> +static char *
> +get_cgroup (void)
> +{
> +  FILE *f = fopen ("/proc/self/cgroup", "re");
> +  if (f == NULL)
> +    FAIL_UNSUPPORTED ("no cgroup defined for the process");

Maybe add %m here.

> +/* Called on process re-execution.  */
> +_Noreturn static void
> +handle_restart (int argc, char *argv[])
> +{
> +  assert (argc == 1);
> +  char *newcgroup = argv[0];
> +
> +  char *current_cgroup = get_cgroup ();
> +  TEST_VERIFY_EXIT (current_cgroup != NULL);
> +  TEST_COMPARE_STRING (newcgroup, current_cgroup);
> +  exit (EXIT_SUCCESS);
> +}

I think the exit (EXIT_SUCCESS) masks failures because after execve, the
shared mapping with failure status does not exist.

> +static int
> +create_new_cgroup (char **newcgroup)
> +{
> +  struct statfs fs;
> +  if (statfs (CGROUPFS, &fs) < 0)
> +    {
> +      if (errno == ENOENT)
> +	FAIL_UNSUPPORTED ("not cgroupv2 mount found");
> +      FAIL_EXIT1 ("statfs (%s): %m\n", CGROUPFS);

“no[] cgroupv2 found?”

> +    }
> +
> +  if (!F_TYPE_EQUAL (fs.f_type, CGROUP2_SUPER_MAGIC))
> +    FAIL_UNSUPPORTED ("%s is not a cgroupv2", CGROUPFS);

This could print fs.f_type.

Thanks,
Florian



* Re: [PATCH v7 5/8] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-03 16:35 ` [PATCH v7 5/8] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
@ 2023-08-11 11:45   ` Florian Weimer
  2023-08-11 16:14     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2023-08-11 11:45 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha; +Cc: Adhemerval Zanella

* Adhemerval Zanella via Libc-alpha:

> diff --git a/NEWS b/NEWS
> index 99824eab95..ff41443896 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -15,6 +15,13 @@ Major new features:
>    set the cgroupv2 in the new process in a race free manner.  These functions
>    are GNU extensions and require a kernel with clone3 support.
>  
> +* On Linux, the pidfd_spawn and pidfd_spawp functions have been added.
> +  They have similar prototype and semantic as posix_spawn, but instead of
> +  returning a process ID, they return a file descriptor that can be used
> +  along other pidfd functions (like pidfd_send_signal, poll, or waitid).
> +  The pidfd functionality avoid the issue of PID reuse with traditional
> +  posix_spawn interface.

“avoid[s]”

> diff --git a/posix/tst-posix_spawn-setsid.c b/posix/tst-posix_spawn-setsid.c
> index 124d878ce2..751674165c 100644
> --- a/posix/tst-posix_spawn-setsid.c
> +++ b/posix/tst-posix_spawn-setsid.c
> @@ -18,78 +18,158 @@

> +/* Called on process re-execution, write down the session id on PIDFILE.  */
> +_Noreturn static void
> +handle_restart (const char *pidfile)
> +{
> +  int pidfd = xopen (pidfile, O_WRONLY, 0);
> +
> +  char buf[INT_STRLEN_BOUND (pid_t)];
> +  int s = snprintf (buf, sizeof buf, "%d", getsid (0));
> +  size_t n = write (pidfd, buf, s);
> +  TEST_VERIFY (n == s);
> +
> +  xclose (pidfd);
> +
> +  exit (EXIT_SUCCESS);
> +}

I suspect this has an issue with hiding test failures (mapping not
shared after execve).

> diff --git a/posix/tst-spawn3.c b/posix/tst-spawn3.c
> index e7ce0fb386..bd21ac6c4b 100644
> --- a/posix/tst-spawn3.c
> +++ b/posix/tst-spawn3.c
> @@ -16,6 +16,7 @@

> +  char buf[INT_STRLEN_BOUND (pid_t)];

This should be INT_BUFSIZE_BOUND.
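
To illustrate the difference, here is a minimal sketch, assuming the
gnulib-style convention where INT_STRLEN_BOUND excludes the terminating
NUL and INT_BUFSIZE_BOUND adds one byte for it.  The DEMO_* macros and
the function names are hypothetical and hard-code a 32-bit int; they are
not the intprops.h definitions.

```c
#include <limits.h>
#include <stdio.h>

_Static_assert (INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* Hypothetical stand-ins: "-2147483648" has 11 characters.  */
#define DEMO_STRLEN_BOUND 11                        /* Excludes the NUL.  */
#define DEMO_BUFSIZE_BOUND (DEMO_STRLEN_BOUND + 1)  /* Includes the NUL.  */

/* Return 1 if formatting INT_MIN is truncated in a buffer sized by the
   string-length bound.  */
static int
truncated_with_strlen_bound (void)
{
  char buf[DEMO_STRLEN_BOUND];
  int s = snprintf (buf, sizeof buf, "%d", INT_MIN);
  return s >= (int) sizeof buf;
}

/* Same check with the buffer-size bound.  */
static int
truncated_with_bufsize_bound (void)
{
  char buf[DEMO_BUFSIZE_BOUND];
  int s = snprintf (buf, sizeof buf, "%d", INT_MIN);
  return s >= (int) sizeof buf;
}
```

With the smaller buffer snprintf has no room for the NUL and drops the
last digit, so a later write of the reported length would send a
wrong value.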

> diff --git a/posix/tst-spawn6.c b/posix/tst-spawn6.c
> index 4e29d78168..ff36351cd6 100644
> --- a/posix/tst-spawn6.c
> +++ b/posix/tst-spawn6.c

> @@ -202,7 +201,7 @@ do_test (int argc, char *argv[])
>    if (restart)
>      return handle_restart (argv[1], argv[2]);
>  
> -  pid_t pid = xfork ();
> +  PID_T_TYPE pid = xfork ();
>    if (pid == 0)
>      {
>        /* Create a pseudo-terminal to avoid interfering with the one using by

I think the result of xfork remains pid_t, so that switch seems wrong?


> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index 6d8a67039e..bd96ad12ad 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -324,6 +324,8 @@ libc {
>    GLIBC_2.39 {
>      posix_spawnattr_getcgroup_np;
>      posix_spawnattr_setcgroup_np;
> +    pidfd_spawn;
> +    pidfd_spawnp;
>    }

I'd prefer to maintain lexicographic order.

> diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> index 3bc10ab477..ff8550f264 100644
> --- a/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> +++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> @@ -37,4 +37,35 @@ extern int posix_spawnattr_setcgroup_np
>  (posix_spawnattr_t
> +/* Spawn a new process executing PATH with the attributes describes in *ATTRP.
> +   Before running the process perform the actions described in FACTS.  Return
> +   a PID file descriptor in PIDFD if process creation was successful and the
> +   argument is non-null.
> +
> +   This function is a possible cancellation point and therefore not
> +   marked with __THROW. */

Missing space after .

> +extern int pidfd_spawn (int *__restrict __pidfd,
> +			const char *__restrict __path,
> +			const posix_spawn_file_actions_t *__restrict __facts,
> +			const posix_spawnattr_t *__restrict __attrp,
> +			char *const __argv[__restrict_arr],
> +			char *const __envp[__restrict_arr])
> +    __nonnull ((2, 5));
> +
> +/* Similar to `pidfd_spawn' but search for FILE in the PATH.
> +
> +   This function is a possible cancellation point and therefore not
> +   marked with __THROW. */

Missing space after .

> +extern int pidfd_spawnp (int *__restrict __pidfd,
> +			 const char *__restrict __file,
> +			 const posix_spawn_file_actions_t *__restrict __facts,
> +			 const posix_spawnattr_t *__restrict __attrp,
> +			 char *const __argv[__restrict_arr],
> +			 char *const __envp[__restrict_arr])
> +    __nonnull ((2, 5));

I think we should mark PIDFD as nonnull.  If the caller ignores the
descriptor, it just leaks.  In that case, the caller should use the
non-descriptor variant of posix_spawn.

> diff --git a/sysdeps/unix/sysv/linux/clone-pidfd-support.c b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
> new file mode 100644
> index 0000000000..af2d213cc5
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone-pidfd-support.c

> +bool
> +__clone_pidfd_supported (void)
> +{
> +  static int supported = 0;

I suggest making this a file-level static with a non-colliding name;
that way it is easier to print its value with a debugger.

> +  int state = atomic_load_relaxed (&supported);
> +  if (state == 0)
> +    {

> +    }
> +
> +  return state == 1;

“return state > 0;” is probably more efficient.
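
Both suggestions together could look roughly like the sketch below.  It
uses C11 <stdatomic.h> rather than the glibc-internal atomic macros, and
demo_feature_state, demo_probe, and demo_feature_supported are
hypothetical names; the real probe body is elided in the quoted patch and
is not reproduced here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* File-level, so a debugger can print it by name.
   0: not probed yet, 1: supported, -1: not supported.  */
static atomic_int demo_feature_state;

/* Hypothetical probe; it counts calls so the caching can be observed.  */
static int demo_probe_calls;

static bool
demo_probe (void)
{
  ++demo_probe_calls;
  return true;
}

bool
demo_feature_supported (void)
{
  int state = atomic_load_explicit (&demo_feature_state,
                                    memory_order_relaxed);
  if (state == 0)
    {
      /* Racing threads may both probe, but they store the same value.  */
      state = demo_probe () ? 1 : -1;
      atomic_store_explicit (&demo_feature_state, state,
                             memory_order_relaxed);
    }
  return state > 0;
}
```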

> index f0d4c62ae6..844abf1b0b 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c

> @@ -319,6 +320,15 @@ __spawnix (pid_t * pid, const char *file,
>    struct posix_spawn_args args;
>    int ec;
>  
> +  bool use_pidfd = xflags & SPAWN_XFLAGS_RET_PIDFD;
> +
> +  /* For CLONE_PIDFD, older kernels might not fail with unsupported flags or
> +     some versions might not support waitid (P_PIDFD).  So to avoid the need
> +     to handle the error on the helper process, check for full pidfd
> +     support.  */
> +  if (use_pidfd && !__clone_pidfd_supported ())
> +    return ENOSYS;

Why not EOPNOTSUPP?  I think ENOSYS can be justified because the pidfd
functions are a separate family of functions, and not a sub-operation
that is failing.  Maybe add this to the comment?

Thanks,
Florian



* Re: [PATCH v7 6/8] posix: Add pidfd_fork (BZ 26371)
  2023-08-03 16:35 ` [PATCH v7 6/8] posix: Add pidfd_fork (BZ 26371) Adhemerval Zanella
@ 2023-08-11 12:06   ` Florian Weimer
  2023-08-11 16:26     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2023-08-11 12:06 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha; +Cc: Adhemerval Zanella

* Adhemerval Zanella via Libc-alpha:

> The interface is:
>
>   pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)
>
> If PIDFD is set to NULL, no file descriptor is returned and pidfd_fork
> acts as fork.  Otherwise, a new file descriptor is returned and the
> kernel already sets O_CLOEXEC as default.  The pidfd_fork follows
> fork/_Fork convention on returning a positive or negative value to the
> parent (with negative indicating an error) and zero to the child.

This interface isn't really extensible, and it looks like we'll soon
need an extension mechanism similar to posix_spawn.

Can we skip adding this for now?  I think we really need to expose some
sort of clone/clone3 wrapper, with some guardrails against unsupportable
scenarios (such as spawning new threads in the current process).

The pidfd_spawn stuff in this series seems independently useful.

Thanks,
Florian



* Re: [PATCH v7 1/8] arm: Add the clone3 wrapper
  2023-08-11 10:17   ` Florian Weimer
@ 2023-08-11 14:12     ` Adhemerval Zanella Netto
  2023-08-11 14:21       ` Florian Weimer
  0 siblings, 1 reply; 23+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-11 14:12 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 11/08/23 07:17, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> +	/* Do the syscall, the kernel expects:
>> +	   r7: system call number:
>> +	   r0: cl_args
>> +	   r1: size  */
>> +	push    { r7 }
>> +	cfi_adjust_cfa_offset (4)
>> +	cfi_rel_offset (r7, 0)
>> +	ldr     r7, =SYS_ify(clone3)
>> +	swi	0x0
>> +	cfi_endproc
>> +
>> +	cmp	r0, #0
>> +	beq	1f
>> +	pop     {r7}
> 
>> +1:
>> +	.fnstart
>> +	.cantunwind
>> +	mov	r0, r3
>> +	mov	ip, r2
>> +	BLX (ip)
> 
> I think the stack is misaligned at the BNLX call because only one 4-byte
> register is pushed.

It should not matter because the stack is defined by the cl_args::stack
argument.  We might add an alignment check in clone-internal.c, but
since it is used solely internally, I think it should not be required.


* Re: [PATCH v7 1/8] arm: Add the clone3 wrapper
  2023-08-11 14:12     ` Adhemerval Zanella Netto
@ 2023-08-11 14:21       ` Florian Weimer
  0 siblings, 0 replies; 23+ messages in thread
From: Florian Weimer @ 2023-08-11 14:21 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: Adhemerval Zanella via Libc-alpha

* Adhemerval Zanella Netto:

> On 11/08/23 07:17, Florian Weimer wrote:
>> * Adhemerval Zanella via Libc-alpha:
>> 
>>> +	/* Do the syscall, the kernel expects:
>>> +	   r7: system call number:
>>> +	   r0: cl_args
>>> +	   r1: size  */
>>> +	push    { r7 }
>>> +	cfi_adjust_cfa_offset (4)
>>> +	cfi_rel_offset (r7, 0)
>>> +	ldr     r7, =SYS_ify(clone3)
>>> +	swi	0x0
>>> +	cfi_endproc
>>> +
>>> +	cmp	r0, #0
>>> +	beq	1f
>>> +	pop     {r7}
>> 
>>> +1:
>>> +	.fnstart
>>> +	.cantunwind
>>> +	mov	r0, r3
>>> +	mov	ip, r2
>>> +	BLX (ip)
>> 
>> I think the stack is misaligned at the BNLX call because only one 4-byte
>> register is pushed.
>
> It should not matter because the stack is defined by the cl_args::stack
> argument.  We might add a alignment check on clone-internal.c, but I
> think since it uses solely on internal usage, it should be required.

You are right, my mistake.

Thanks,
Florian



* Re: [PATCH v7 8/8] linux: Add pidfd_getpid
  2023-08-03 16:35 ` [PATCH v7 8/8] linux: Add pidfd_getpid Adhemerval Zanella
@ 2023-08-11 14:36   ` Florian Weimer
  2023-08-11 17:29     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2023-08-11 14:36 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha; +Cc: Adhemerval Zanella

* Adhemerval Zanella via Libc-alpha:

> +If the operation fails, @code{pidfd_getpid} return @code{-1} and the following
> +@code{errno} error conditionas are defined:
> +
> +@table @code
> +@item EBADF
> +The input file descriptor is invalid, does not have a pidfd associated, or an
> +error has occurred parsing the kernel data.
> +@item EREMOTE
> +There is no process ID to denote the process in the current namespace.
> +@item ESRCH
> +The process for which the file descriptor refers to is terminated.
> +@end table

Maybe document ENOENT (/proc not mounted), ENFILE, EMFILE, ENOMEM as
well?

There are missing spaces in a few places:

+  while (*l == ' ' || (unsigned int)(*l) -'\t' < 5)
+      if ((unsigned int)(*l) - '0' >= 10)
+typedef int (*procutils_closure_t)(const char *line, void *arg);
+      char buf[CMSG_SPACE(sizeof(int))];

> diff --git a/sysdeps/unix/sysv/linux/pidfd_getpid.c b/sysdeps/unix/sysv/linux/pidfd_getpid.c
> new file mode 100644
> index 0000000000..46848a5983
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/pidfd_getpid.c

> +  bool neg = false;
> +  switch (*l)
> +    {
> +    case '-': neg = true;
> +    case '+': l++;

'+' should probably return -1?

> +    }
> +
> +  if (*l == '\0')
> +    return 0;
> +
> +  int n = 0;
> +  while (*l != '\0')
> +    {
> +      /* Check if '*l' is a digit.  */
> +      if ((unsigned int)(*l) - '0' >= 10)

It's a strange way to write this condition.  '0' <= *l && *l <= '9'
should work equally well.  I know the compiler is supposed to optimize
this into one condition, but it's not immediately obvious why this works
with an early cast to unsigned int instead of unsigned char.
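
For reference, a small sketch (not taken from the patch) comparing the
two forms.  The cast to unsigned int works even where char is signed
because a negative value converts to a very large unsigned number, which
stays far above 10 after subtracting '0'.  The function names are made
up for the illustration.

```c
#include <stdbool.h>

/* The form used in the patch: values below '0' wrap around to large
   unsigned numbers, so one comparison covers both bounds.  */
static bool
is_digit_wrap (char c)
{
  return (unsigned int) c - '0' < 10;
}

/* The explicit range check.  */
static bool
is_digit_range (char c)
{
  return '0' <= c && c <= '9';
}

/* Return true if both forms agree for every possible char value.  */
static bool
digit_checks_agree (void)
{
  for (int i = 0; i <= 255; i++)
    if (is_digit_wrap ((char) i) != is_digit_range ((char) i))
      return false;
  return true;
}
```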

> +      /* Ignore invalid large values.  */
> +      if (INT_MULTIPLY_WRAPV (10, n, &n)
> +          || INT_ADD_WRAPV (n, *l++ - '0', &n))
> +        return 0;

Shouldn't this return -1?

I think these error returns should set errno (EINVAL perhaps, since
that's unlikely to come from read or open), so that we have a chance to
identify parser problems.
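
Something along these lines is what I have in mind; this is only a
sketch under a few assumptions.  demo_parse_int is a hypothetical name,
the caller is assumed to have skipped leading whitespace already, the
line is assumed to be null terminated without a trailing '\n', and the
GCC/Clang __builtin_*_overflow builtins stand in for the INT_*_WRAPV
macros used in the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Parse a decimal int from L into *RESULT.  Return 0 on success, or -1
   with errno set to EINVAL on malformed or out-of-range input.  */
static int
demo_parse_int (const char *l, int *result)
{
  bool neg = false;
  if (*l == '-')
    {
      neg = true;
      l++;
    }

  /* A leading '+' or an empty digit sequence fails the digit test.  */
  int n = 0;
  do
    {
      if (!('0' <= *l && *l <= '9'))
        {
          errno = EINVAL;
          return -1;
        }
      if (__builtin_mul_overflow (n, 10, &n)
          || __builtin_add_overflow (n, *l++ - '0', &n))
        {
          errno = EINVAL;
          return -1;
        }
    }
  while (*l != '\0');

  *result = neg ? -n : n;
  return 0;
}
```

Keeping the value in a separate out parameter lets -1 remain a valid
parsed result while the return value carries only the error status.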


> diff --git a/sysdeps/unix/sysv/linux/procutils.c b/sysdeps/unix/sysv/linux/procutils.c
> new file mode 100644
> index 0000000000..83b327cb9a
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/procutils.c
> @@ -0,0 +1,104 @@
> +/* Utilities functions to read/parse Linux procfs and sysfs.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <assert.h>
> +#include <not-cancel.h>
> +#include <procutils.h>
> +#include <string.h>
> +
> +static int
> +next_line (char **r, int fd, char *const buffer, char **cp, char **re,
> +           char *const buffer_end)
> +{
> +  char *res = *cp;
> +  char *nl = memchr (*cp, '\n', *re - *cp);
> +  if (nl == NULL)
> +    {
> +      if (*cp != buffer)
> +        {
> +          if (*re == buffer_end)
> +            {
> +              memmove (buffer, *cp, *re - *cp);
> +              *re = buffer + (*re - *cp);
> +              *cp = buffer;
> +
> +              ssize_t n = __read_nocancel (fd, *re, buffer_end - *re);

Missing TEMP_FAILURE_RETRY, I would say (also below, and further below
for __open64_nocancel).

> +              if (n < 0)
> +                return -1;
> +
> +              *re += n;
> +
> +              nl = memchr (*cp, '\n', *re - *cp);
> +              while (nl == NULL && *re == buffer_end)
> +                {
> +                  /* Truncate too long lines.  */
> +                  *re = buffer + 3 * (buffer_end - buffer) / 4;
> +                  n = __read_nocancel (fd, *re, buffer_end - *re);
> +                  if (n < 0)
> +                    return -1;
> +
> +                  nl = memchr (*re, '\n', n);
> +                  **re = '\0';
> +                  *re += n;
> +                }

Should we just skip long lines?  The 3/4 business is a bit strange.

This results in an endless loop if the file does not end with '\n', I
think.
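
A rough sketch of the alternative, under stated assumptions: plain read
with an explicit EINTR loop stands in for TEMP_FAILURE_RETRY around
__read_nocancel, overlong lines are skipped entirely, and a read result
of 0 always ends the loop, so a file without a final '\n' terminates.
All demo_* names and the buffer size are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

enum { DEMO_BUFSIZE = 64 };
typedef bool (*demo_line_cb) (const char *line, void *arg);

/* Retry on EINTR, which is what TEMP_FAILURE_RETRY does.  */
static ssize_t
demo_read (int fd, void *buf, size_t len)
{
  ssize_t n;
  do
    n = read (fd, buf, len);
  while (n < 0 && errno == EINTR);
  return n;
}

/* Call CB for each line from FD, without the trailing '\n'.  Lines too
   long for the buffer are skipped.  A final line without '\n' is still
   delivered.  Return 0 on success, or -1 on a read error.  */
static int
demo_read_lines (int fd, demo_line_cb cb, void *arg)
{
  char buf[DEMO_BUFSIZE];
  size_t used = 0;
  bool skipping = false;        /* Discarding an overlong line.  */

  for (;;)
    {
      /* The length is never zero, so n == 0 can only mean EOF.  */
      ssize_t n = demo_read (fd, buf + used, sizeof buf - 1 - used);
      if (n < 0)
        return -1;
      bool eof = n == 0;
      used += n;

      char *start = buf, *end = buf + used, *nl;
      while ((nl = memchr (start, '\n', end - start)) != NULL)
        {
          *nl = '\0';
          if (!skipping && !cb (start, arg))
            return 0;
          skipping = false;
          start = nl + 1;
        }

      used = end - start;
      if (eof)
        {
          if (used > 0 && !skipping)
            {
              *end = '\0';
              cb (start, arg);
            }
          return 0;
        }
      if (used == sizeof buf - 1)
        {
          /* Full buffer without '\n': drop it, skip the rest.  */
          skipping = true;
          used = 0;
        }
      else
        memmove (buf, start, used);
    }
}

static bool
demo_count_cb (const char *line, void *arg)
{
  (void) line;
  ++*(int *) arg;
  return true;
}

/* Feed DATA through a pipe; return the number of lines seen, or -1.  */
static int
demo_count_lines (const char *data)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;
  size_t len = strlen (data);
  bool ok = write (fds[1], data, len) == (ssize_t) len;
  close (fds[1]);
  int count = 0;
  if (!ok || demo_read_lines (fds[0], demo_count_cb, &count) != 0)
    count = -1;
  close (fds[0]);
  return count;
}
```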

> diff --git a/sysdeps/unix/sysv/linux/procutils.h b/sysdeps/unix/sysv/linux/procutils.h
> new file mode 100644
> index 0000000000..64e1080920
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/procutils.h
> @@ -0,0 +1,35 @@

> +typedef int (*procutils_closure_t)(const char *line, void *arg);
> +
> +/* Open and read the path FILENAME, line per line, and call CLOSURE with
> +   argument ARG on each line.  The read is done with a static buffer,
> +   with non-cancellable calls, and the line is null terminated.
> +
> +   The CLOSURE should return true if the read should continue, or false
> +   if the function should stop.
> +
> +   It returns 0 in case of success, or -1 otherwise.  */
> +int procutils_read_file (const char *filename, procutils_closure_t closure,
> +			 void *arg) attribute_hidden;
> +

A comment should say whether '\n' is included in the line argument, and
what happens to overlong lines.

The return value for the closure should be an actual bool, or otherwise
the int return value should be passed through to the caller of
procutils_read_file.

> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c b/sysdeps/unix/sysv/linux/tst-pidfd.c
> index 64d8a2ef40..53d223f702 100644
> --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> @@ -18,6 +18,7 @@

> +  /* Check if pidfd_getpid returns ESRCH for exited subprocess.  */
> +  {
> +    int pidfd;
> +    pid_t pidfork = pidfd_fork (&pidfd, -1, 0);
> +    if (pidfork == 0)
> +      _exit (EXIT_SUCCESS);
> +
> +    /* The process might be still running or already in zombie state, in any
> +       case the PID is still allocated to the process.  */
> +    pid_t pid = pidfd_getpid (pidfd);
> +    if (pid > 0)
> +      support_process_state_wait (pid, support_process_state_zombie);

The condition does not match the comment.  I don't know which one is
correct.  Please verify that pid > 0 (if the PID remains available), or
change the comment to “in [either] case”.

> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
> new file mode 100644
> index 0000000000..41d03a04ad
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c

> +  TEST_VERIFY_EXIT (socketpair (AF_UNIX, SOCK_STREAM, 0, sockfd) == 0);
> +
> +  /* Check if pidfd_getpid returns EREMOTE for process not in current
> +     namespace.  */
> +  {
> +    int pidfd;
> +    pid_t pid = pidfd_fork (&pidfd, -1, 0);

I think you can avoid the file descriptor passing if you call pidfd_fork
to create an unrelated descriptor, and then do the namespace thing below
after another fork.  This way, the descriptor will just be inherited via
fork.

> +    send_fd (sockfd[1], pidfd);
> +
> +    siginfo_t info;
> +    TEST_COMPARE (waitid (P_PIDFD, pidfd, &info, WEXITED), 0);
> +    if (info.si_status == EXIT_UNSUPPORTED)
> +      FAIL_UNSUPPORTED ("unable to unshare user/fs/pid");
> +    TEST_COMPARE (info.si_status, 0);
> +    TEST_COMPARE (info.si_code, CLD_EXITED);

I think this could have a few more checks, like the pidfd_getpid value
matching what comes back subsequently in si_pid.

> diff --git a/sysdeps/unix/sysv/linux/sys/pidfd.h b/sysdeps/unix/sysv/linux/sys/pidfd.h
> index 87095212a7..8cf4df6b81 100644
> --- a/sysdeps/unix/sysv/linux/sys/pidfd.h
> +++ b/sysdeps/unix/sysv/linux/sys/pidfd.h
> @@ -67,4 +67,8 @@ extern int pidfd_send_signal (int __pidfd, int __sig, siginfo_t *__info,
>  extern pid_t pidfd_fork (int *__pidfd, int __cgroup, unsigned int __flags)
>    __THROW;
>  
> +/* Query the process ID (PID) from process descriptor __FD.  Return the PID
> +   or -1 in case of an error.  */
> +extern pid_t pidfd_getpid (int __fd) __THROW;
> +

__FD should be FD.


Thanks,
Florian



* Re: [PATCH v7 3/8] linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc
  2023-08-11 10:34   ` Florian Weimer
@ 2023-08-11 15:12     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-11 15:12 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 11/08/23 07:34, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> Not all architectures added clone3 syscall.
>> ---
>>  .../unix/sysv/linux/alpha/kernel-features.h   |  3 +++
>>  .../unix/sysv/linux/ia64/kernel-features.h    |  3 +++
>>  .../unix/sysv/linux/nios2/kernel-features.h   | 23 +++++++++++++++++++
>>  sysdeps/unix/sysv/linux/sh/kernel-features.h  |  3 +++
>>  .../unix/sysv/linux/sparc/kernel-features.h   |  3 +++
>>  5 files changed, 35 insertions(+)
>>  create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
>>
>> diff --git a/sysdeps/unix/sysv/linux/alpha/kernel-features.h b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
>> index 3151e75449..e298bf2bcc 100644
>> --- a/sysdeps/unix/sysv/linux/alpha/kernel-features.h
>> +++ b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
>> @@ -50,4 +50,7 @@
>>  /* Alpha requires old sysvipc even being a 64-bit architecture.  */
>>  #undef __ASSUME_SYSVIPC_DEFAULT_IPC_64
>>  
>> +/* Alpha does not provide clone3.  */
>> +#undef __ASSUME_CLONE3
> 
> This is inconsistent with sysdeps/unix/sysv/linux/kernel-features.h,
> which I think uses 0 to indicate no support:
> 
> /* The clone3 system call was introduced across on most architectures in
>    Linux 5.3.  Not all ports implements it, so it should be used along
>    HAVE_CLONE3_WRAPPER define.  */
> #if __LINUX_KERNEL_VERSION >= 0x050300
> # define __ASSUME_CLONE3 1
> #else
> # define __ASSUME_CLONE3 0
> 
> Maybe that comment needs updating in this series, too?

Are you sure?  It follows what __ASSUME_FACCESSAT2, __ASSUME_CLOSE_RANGE,
and __ASSUME_FUTEX_LOCK_PI2 do.

We really should eventually refactor kernel-features.h to avoid the
define/undef scheme, which is error-prone, and always define each
__ASSUME macro to either 1 or 0.
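
A minimal sketch of why mixing the two conventions is error-prone, using
a hypothetical DEMO_ASSUME_FEATURE macro rather than any real
kernel-features.h define: once a macro is defined to 0, an #ifdef test
and an #if test disagree.

```c
/* Hypothetical feature macro, defined to 0 to mean "not available".  */
#define DEMO_ASSUME_FEATURE 0

/* Tests only whether the macro exists, so a 0 definition still enables
   the branch.  */
static int
enabled_via_ifdef (void)
{
#ifdef DEMO_ASSUME_FEATURE
  return 1;
#else
  return 0;
#endif
}

/* Tests the value; an undefined macro also evaluates to 0 here.  */
static int
enabled_via_if (void)
{
#if DEMO_ASSUME_FEATURE
  return 1;
#else
  return 0;
#endif
}
```

With every macro always defined to 1 or 0, only the #if form is needed
and an #undef in a port header can no longer be confused with a 0.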


* Re: [PATCH v7 4/8] linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26731)
  2023-08-11 10:51   ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get, set}cgroup_np " Florian Weimer
@ 2023-08-11 15:31     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-11 15:31 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 11/08/23 07:51, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> These function allow to posix_spawn and posix_spawnp to use
>> CLONE_INTO_CGROUP with clone3, allowing the child process to
>> be created in a different version 2 cgroup.  These are GNU
>> extensions that are available only for Linux, and also only
>> for the architectures that implement clone3 wrapper
>> (HAVE_CLONE3_WRAPPER).
>>
>> To create a process on a different cgroupv2, one can use the:
>>
>>   posix_spawnattr_t attr;
>>   posix_spawnattr_init (&attr);
>>   posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
>>   posix_spawnattr_setcgroup_np (&attr, cgroup);
>>   posix_spawn (...)
> 
> Why are both POSIX_SPAWN_SETCGROUP and posix_spawnattr_setcgroup_np
> needed?  Couldn't the latter imply the former?

Besides being orthogonal with the other standard options, it allows
the caller to just set or reset a flag in a posix_spawnattr_t to enable
or disable the option, instead of creating and destroying a new
attribute for each posix_spawn call.
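
The same pattern already exists for the standard options.  As a sketch,
using POSIX_SPAWN_SETPGROUP as a stand-in for the new cgroup flag (the
function name demo_toggle_pgroup_flag is made up for the example):

```c
#include <spawn.h>
#include <sys/types.h>

/* Store a process group in the attribute once, then switch the option
   on and off through the flags only.  Return the stored value after the
   flag has been cleared, or -1 on error.  */
static pid_t
demo_toggle_pgroup_flag (pid_t pgroup)
{
  posix_spawnattr_t attr;
  pid_t stored = -1;

  if (posix_spawnattr_init (&attr) != 0)
    return -1;

  if (posix_spawnattr_setpgroup (&attr, pgroup) == 0
      /* Enable: a spawn with this attribute would apply the value.  */
      && posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETPGROUP) == 0
      /* Disable: the stored value stays in place for later reuse.  */
      && posix_spawnattr_setflags (&attr, 0) == 0)
    {
      if (posix_spawnattr_getpgroup (&attr, &stored) != 0)
        stored = -1;
    }

  posix_spawnattr_destroy (&attr);
  return stored;
}
```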

> 
>> There is no fallback is either clone3 does not support the flag
>> or if the architecture does not provide the clone3 wrapper, in
>> this case posix_spawn returns ENOTSUP.
> 
> I think this really should be added to the manual, mayb
> 
> It's also not clear to me how you would probe for support properly.
> The spawn operation might fail for other reasons.
> 
> I wonder if we have to probe as part of the 

Some comments seem to be truncated.  For probing,
posix_spawnattr_setflags fails with invalid flags, so trying to use
POSIX_SPAWN_SETCGROUP if it is not supported should return EINVAL.
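
As a sketch of that probe (demo_spawn_flag_known is a hypothetical
helper name, not part of the patch):

```c
#include <spawn.h>
#include <stdbool.h>

/* Return true if the C library accepts FLAG in
   posix_spawnattr_setflags.  This only shows that libc knows the flag;
   the spawn itself may still fail if the kernel lacks support.  */
static bool
demo_spawn_flag_known (short flag)
{
  posix_spawnattr_t attr;
  if (posix_spawnattr_init (&attr) != 0)
    return false;
  bool known = posix_spawnattr_setflags (&attr, flag) == 0;
  posix_spawnattr_destroy (&attr);
  return known;
}
```

A caller would pass POSIX_SPAWN_SETCGROUP, guarded by an #ifdef for
older headers, and would still have to handle ENOTSUP from posix_spawn
at run time.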

About the manual, I can add something, but since we do not have any sort
of posix_spawn documentation it would require adding a lot of stubs.

> 
>> diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
>> new file mode 100644
>> index 0000000000..3bc10ab477
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> 
>> +/* Get the cgroupsv2 the attribute structure.  */
>> +extern int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *
>> +					 __restrict __attr,
>> +					 int *__cgroup)
>> +     __THROW __nonnull ((1, 2));
>> +
>> +/* Store scheduling parameters in the attribute structure.  */
>> +extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__restrict __attr,
>> +					 int __cgroup)
>> +     __THROW __nonnull ((1));
> 
> Second comment seems wrong.

Indeed, and there is no need for __restrict here.  Also,
posix_spawnattr_getcgroup_np should have a __restrict for the
cgroup argument.

> 
>> diff --git a/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
>> new file mode 100644
>> index 0000000000..6dba30ab29
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
>> @@ -0,0 +1,216 @@
>> +/* Tests for posix_spawn cgroup extension.
> 
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <http://www.gnu.org/licenses/>.  */
> 
> Should be “https://”.

Ack.

> 
>> +#define F_TYPE_EQUAL(a, b) (a == (typeof(a)) b)
> 
> Missing space after “type”.

Ack.

> 
>> +static char *
>> +get_cgroup (void)
>> +{
>> +  FILE *f = fopen ("/proc/self/cgroup", "re");
>> +  if (f == NULL)
>> +    FAIL_UNSUPPORTED ("no cgroup defined for the process");
> 
> Maybe add %m here.

Ack.

> 
>> +/* Called on process re-execution.  */
>> +_Noreturn static void
>> +handle_restart (int argc, char *argv[])
>> +{
>> +  assert (argc == 1);
>> +  char *newcgroup = argv[0];
>> +
>> +  char *current_cgroup = get_cgroup ();
>> +  TEST_VERIFY_EXIT (current_cgroup != NULL);
>> +  TEST_COMPARE_STRING (newcgroup, current_cgroup);
>> +  exit (EXIT_SUCCESS);
>> +}
> 
> I think the exit (EXIT_SUCCESS) masks failures because after execve, the
> shared mapping with failure status does not exist.

The TEST_VERIFY_EXIT should trigger the waitid checks, but you are right
about TEST_COMPARE_STRING.  I removed the exit to let
support/test-driver.c return the expected exit code.

> 
>> +static int
>> +create_new_cgroup (char **newcgroup)
>> +{
>> +  struct statfs fs;
>> +  if (statfs (CGROUPFS, &fs) < 0)
>> +    {
>> +      if (errno == ENOENT)
>> +	FAIL_UNSUPPORTED ("not cgroupv2 mount found");
>> +      FAIL_EXIT1 ("statfs (%s): %m\n", CGROUPFS);
> 
> “no[] cgroupv2 found?”

Ack.

> 
>> +    }
>> +
>> +  if (!F_TYPE_EQUAL (fs.f_type, CGROUP2_SUPER_MAGIC))
>> +    FAIL_UNSUPPORTED ("%s is not a cgroupv2", CGROUPFS);
> 
> This could print fs.f_type.

Ack.


* Re: [PATCH v7 5/8] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  2023-08-11 11:45   ` Florian Weimer
@ 2023-08-11 16:14     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-11 16:14 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 11/08/23 08:45, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> diff --git a/NEWS b/NEWS
>> index 99824eab95..ff41443896 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -15,6 +15,13 @@ Major new features:
>>    set the cgroupv2 in the new process in a race free manner.  These functions
>>    are GNU extensions and require a kernel with clone3 support.
>>  
>> +* On Linux, the pidfd_spawn and pidfd_spawp functions have been added.
>> +  They have similar prototype and semantic as posix_spawn, but instead of
>> +  returning a process ID, they return a file descriptor that can be used
>> +  along other pidfd functions (like pidfd_send_signal, poll, or waitid).
>> +  The pidfd functionality avoid the issue of PID reuse with traditional
>> +  posix_spawn interface.
> 
> “avoid[s]”

Ack.

> 
>> diff --git a/posix/tst-posix_spawn-setsid.c b/posix/tst-posix_spawn-setsid.c
>> index 124d878ce2..751674165c 100644
>> --- a/posix/tst-posix_spawn-setsid.c
>> +++ b/posix/tst-posix_spawn-setsid.c
>> @@ -18,78 +18,158 @@
> 
>> +/* Called on process re-execution, write down the session id on PIDFILE.  */
>> +_Noreturn static void
>> +handle_restart (const char *pidfile)
>> +{
>> +  int pidfd = xopen (pidfile, O_WRONLY, 0);
>> +
>> +  char buf[INT_STRLEN_BOUND (pid_t)];
>> +  int s = snprintf (buf, sizeof buf, "%d", getsid (0));
>> +  size_t n = write (pidfd, buf, s);
>> +  TEST_VERIFY (n == s);
>> +
>> +  xclose (pidfd);
>> +
>> +  exit (EXIT_SUCCESS);
>> +}
> 

> I suspect this has an issue with hiding test failures (mapping not
> shared after execve).

Indeed, I removed the exit.

>> diff --git a/posix/tst-spawn3.c b/posix/tst-spawn3.c
>> index e7ce0fb386..bd21ac6c4b 100644
>> --- a/posix/tst-spawn3.c
>> +++ b/posix/tst-spawn3.c
>> @@ -16,6 +16,7 @@
> 
>> +  char buf[INT_STRLEN_BOUND (pid_t)];
> 
> This should be INT_BUFSIZE_BOUND.

Ack.

> 
>> diff --git a/posix/tst-spawn6.c b/posix/tst-spawn6.c
>> index 4e29d78168..ff36351cd6 100644
>> --- a/posix/tst-spawn6.c
>> +++ b/posix/tst-spawn6.c
> 
>> @@ -202,7 +201,7 @@ do_test (int argc, char *argv[])
>>    if (restart)
>>      return handle_restart (argv[1], argv[2]);
>>  
>> -  pid_t pid = xfork ();
>> +  PID_T_TYPE pid = xfork ();
>>    if (pid == 0)
>>      {
>>        /* Create a pseudo-terminal to avoid interfering with the one using by
> 
> I think the result of xfork remains pid_t, so that switch seems wrong?
> 

Indeed, I reverted that.

> 
>> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
>> index 6d8a67039e..bd96ad12ad 100644
>> --- a/sysdeps/unix/sysv/linux/Versions
>> +++ b/sysdeps/unix/sysv/linux/Versions
>> @@ -324,6 +324,8 @@ libc {
>>    GLIBC_2.39 {
>>      posix_spawnattr_getcgroup_np;
>>      posix_spawnattr_setcgroup_np;
>> +    pidfd_spawn;
>> +    pidfd_spawnp;
>>    }
> 
> I'd prefer to maintain lexicographic order.

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
>> index 3bc10ab477..ff8550f264 100644
>> --- a/sysdeps/unix/sysv/linux/bits/spawn_ext.h
>> +++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
>> @@ -37,4 +37,35 @@ extern int posix_spawnattr_setcgroup_np
>>  (posix_spawnattr_t
>> +/* Spawn a new process executing PATH with the attributes describes in *ATTRP.
>> +   Before running the process perform the actions described in FACTS.  Return
>> +   a PID file descriptor in PIDFD if process creation was successful and the
>> +   argument is non-null.
>> +
>> +   This function is a possible cancellation point and therefore not
>> +   marked with __THROW. */
> 
> Missing space after .

Ack.

> 
>> +extern int pidfd_spawn (int *__restrict __pidfd,
>> +			const char *__restrict __path,
>> +			const posix_spawn_file_actions_t *__restrict __facts,
>> +			const posix_spawnattr_t *__restrict __attrp,
>> +			char *const __argv[__restrict_arr],
>> +			char *const __envp[__restrict_arr])
>> +    __nonnull ((2, 5));
>> +
>> +/* Similar to `pidfd_spawn' but search for FILE in the PATH.
>> +
>> +   This function is a possible cancellation point and therefore not
>> +   marked with __THROW. */
> 
> Missing space after .

Ack.

> 
>> +extern int pidfd_spawnp (int *__restrict __pidfd,
>> +			 const char *__restrict __file,
>> +			 const posix_spawn_file_actions_t *__restrict __facts,
>> +			 const posix_spawnattr_t *__restrict __attrp,
>> +			 char *const __argv[__restrict_arr],
>> +			 char *const __envp[__restrict_arr])
>> +    __nonnull ((2, 5));
> 
> I think we should mark PIDFD as nonnull.  If the caller ignores the
> descriptor, it just leaks.  In that case, the caller should use the
> non-descriptor variant of posix_spawn.

Sounds reasonable.  Another option would be to just close the file descriptor
if clone is successful and pidfd is NULL.
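
As a rough illustration of that alternative, the parent side could hand over
or discard the descriptor as below.  This is only a sketch:
store_or_close_pidfd is a hypothetical helper name that is not part of the
patch, and plain close stands in for whatever internal call would be used.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: FD is the descriptor returned by a successful
   clone with CLONE_PIDFD.  Store it in *PIDFD_OUT if the caller asked
   for it, otherwise close it so that it does not leak.  */
static void
store_or_close_pidfd (int *pidfd_out, int fd)
{
  assert (fd >= 0);
  if (pidfd_out != NULL)
    *pidfd_out = fd;
  else
    close (fd);
}
```

With the nonnull attribute instead, a NULL argument can be flagged at compile
time and this branch is not needed.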

> 
>> diff --git a/sysdeps/unix/sysv/linux/clone-pidfd-support.c b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
>> new file mode 100644
>> index 0000000000..af2d213cc5
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/clone-pidfd-support.c
> 
>> +bool
>> +__clone_pidfd_supported (void)
>> +{
>> +  static int supported = 0;
> 
> I suggest to make this a file-level static with a non-colliding name,
> this way it's easier to print its value with a debugger.

Ack.

> 
>> +  int state = atomic_load_relaxed (&supported);
>> +  if (state == 0)
>> +    {
> 
>> +    }
>> +
>> +  return state == 1;
> 
> “return state > 0;” is probably more efficient.
> 

Ack.
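
For reference, a minimal sketch of the tri-state cache with both suggestions
applied (file-level static, "state > 0").  It uses C11 <stdatomic.h> instead
of the internal atomic_load_relaxed, and example_pidfd_state, fake_probe and
cached_supported are hypothetical names; the real kernel check is replaced by
a stand-in probe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* 0: not yet probed, 1: supported, -1: unsupported.  File-level so the
   value can be printed from a debugger.  */
static atomic_int example_pidfd_state;

/* Stand-in for the real kernel probe; counts how often it runs.  */
static int probe_calls;
static bool
fake_probe (void)
{
  probe_calls++;
  return true;
}

static bool
cached_supported (bool (*probe) (void))
{
  assert (probe != NULL);
  int state = atomic_load_explicit (&example_pidfd_state,
                                    memory_order_relaxed);
  if (state == 0)
    {
      /* Concurrent callers may probe more than once, but they all
         store the same answer, so relaxed ordering is enough.  */
      state = probe () ? 1 : -1;
      atomic_store_explicit (&example_pidfd_state, state,
                             memory_order_relaxed);
    }
  return state > 0;
}
```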

>> index f0d4c62ae6..844abf1b0b 100644
>> --- a/sysdeps/unix/sysv/linux/spawni.c
>> +++ b/sysdeps/unix/sysv/linux/spawni.c
> 
>> @@ -319,6 +320,15 @@ __spawnix (pid_t * pid, const char *file,
>>    struct posix_spawn_args args;
>>    int ec;
>>  
>> +  bool use_pidfd = xflags & SPAWN_XFLAGS_RET_PIDFD;
>> +
>> +  /* For CLONE_PIDFD, older kernels might not fail with unsupported flags or
>> +     some versions might not support waitid (P_PIDFD).  So to avoid the need
>> +     to handle the error on the helper process, check for full pidfd
>> +     support.  */
>> +  if (use_pidfd && !__clone_pidfd_supported ())
>> +    return ENOSYS;
> 
> Why not EOPNOTSUPP?  I think ENOSYS can be justified because the pidfd
> functions are a separate family of functions, and not a sub-operation
> that is failing.  Maybe add this to the comment?

I think ENOSYS fits better here, since the idea is that if pidfd waitid
is not supported then posix_spawn cannot be used regardless of its
arguments.  This is different from posix_spawn with cgroupv2, which
depends on whether the kernel supports clone with CLONE_INTO_CGROUP.

I have added a comment about it.


* Re: [PATCH v7 6/8] posix: Add pidfd_fork (BZ 26371)
  2023-08-11 12:06   ` Florian Weimer
@ 2023-08-11 16:26     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-11 16:26 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 11/08/23 09:06, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> The interface is:
>>
>>   pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)
>>
>> If PIDFD is set to NULL, no file descriptor is returned and pidfd_fork
>> acts as fork.  Otherwise, a new file descriptor is returned and the
>> kernel already sets O_CLOEXEC as default.  The pidfd_fork follows
>> fork/_Fork convention on returning a positive or negative value to the
>> parent (with negative indicating an error) and zero to the child.
> 
> This interface isn't really extensible, and it looks like we'll soon
> need an extension mechanism similar to posix_spawn.
> 
> Can we skip adding this for now?  I think we really need to expose some
> sort of clone/clone3 wrapper, with some guardrails against unsupportable
> scenarios (such as spawning new threads in the current process).
> 
> The pidfd_spawn stuff in this series seems independently useful.

The clone3 wrapper will always be tricky, especially due to the constraints
on clone flag combinations and the stack size/alignment requirements.
If extensibility is really desirable, I think we can either follow the
current practice of adding a struct with some reserved space, or follow
the newest kernel interface and add a struct plus size.
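
To illustrate the "struct plus size" option, here is a minimal sketch of how
a callee could accept argument blocks from older and newer callers.  Both
struct spawn_args_v1 and copy_versioned_args are hypothetical names, and the
rule of rejecting non-zero unknown trailing bytes with E2BIG is modeled on
how I understand clone3 to treat its argument struct, not on anything in
this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical first version of an extensible argument block.  */
struct spawn_args_v1
{
  int *pidfd;
  int cgroup;
  unsigned int flags;
};

/* Copy a caller-provided block of USIZE bytes into DST, which has the
   KSIZE bytes known to this build.  Return 0 or an errno value.  */
static int
copy_versioned_args (void *dst, size_t ksize, const void *src, size_t usize)
{
  assert (dst != NULL && src != NULL);

  /* A newer caller may pass a larger struct; accept it only if all
     the fields we do not know about are zero.  */
  const unsigned char *p = src;
  for (size_t i = ksize; i < usize; i++)
    if (p[i] != 0)
      return E2BIG;

  /* An older caller may pass a smaller struct; missing fields read
     as zero.  */
  memset (dst, 0, ksize);
  memcpy (dst, src, usize < ksize ? usize : ksize);
  return 0;
}
```

The reserved-space variant avoids the size argument, but it fixes the
maximum growth at the time the struct is first exported.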


* Re: [PATCH v7 8/8] linux: Add pidfd_getpid
  2023-08-11 14:36   ` Florian Weimer
@ 2023-08-11 17:29     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella Netto @ 2023-08-11 17:29 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 11/08/23 11:36, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> +If the operation fails, @code{pidfd_getpid} returns @code{-1} and the following
>> +@code{errno} error conditions are defined:
>> +
>> +@table @code
>> +@item EBADF
>> +The input file descriptor is invalid, does not have a pidfd associated, or an
>> +error has occurred parsing the kernel data.
>> +@item EREMOTE
>> +There is no process ID to denote the process in the current namespace.
>> +@item ESRCH
>> +The process to which the file descriptor refers has terminated.
>> +@end table
> 
> Maybe document ENOENT (/proc not mounted), ENFILE, EMFILE, ENOMEM as
> well?

Ack.

> 
> There are missing spaces in a few places:
> 
> +  while (*l == ' ' || (unsigned int)(*l) -'\t' < 5)
> +      if ((unsigned int)(*l) - '0' >= 10)
> +typedef int (*procutils_closure_t)(const char *line, void *arg);
> +      char buf[CMSG_SPACE(sizeof(int))];

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/pidfd_getpid.c b/sysdeps/unix/sysv/linux/pidfd_getpid.c
>> new file mode 100644
>> index 0000000000..46848a5983
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/pidfd_getpid.c
> 
>> +  bool neg = false;
>> +  switch (*l)
>> +    {
>> +    case '-': neg = true;
>> +    case '+': l++;
> 
> '+' should probably return -1?

I think it makes sense here indeed.

> 
>> +    }
>> +
>> +  if (*l == '\0')
>> +    return 0;
>> +
>> +  int n = 0;
>> +  while (*l != '\0')
>> +    {
>> +      /* Check if '*l' is a digit.  */
>> +      if ((unsigned int)(*l) - '0' >= 10)
> 
> It's a strange way to write this condition. '0' <= *l && *l <= '9' should
> work equally well.  I know is supposed to optimize this into one
> condition, but it's not immediately obvious why this works with an early
> cast of unsigned int instead of unsigned char.

'0' > *l || *l > '9' works for me as well.

> 
>> +      /* Ignore invalid large values.  */
>> +      if (INT_MULTIPLY_WRAPV (10, n, &n)
>> +          || INT_ADD_WRAPV (n, *l++ - '0', &n))
>> +        return 0;
> 
> Shouldn't this return -1?

I think there is no point in returning 0 for any parsing failure when a
'Pid:' line is found (since multiple 'Pid:' entries also do not make
sense).
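
Putting the review comments together, a strict parser could look roughly
like the sketch below.  It is not the code from the patch: parse_pid_value is
a hypothetical name, the GCC/Clang __builtin_*_overflow builtins stand in for
INT_MULTIPLY_WRAPV/INT_ADD_WRAPV, and the result goes through an out
parameter so that a parsed -1 cannot be confused with a failure.  It assumes
the line has no trailing newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Parse a decimal integer with an optional leading '-'.  Return false
   on any malformed input ('+', empty value, non-digit, overflow).  */
static bool
parse_pid_value (const char *l, int *result)
{
  assert (l != NULL && result != NULL);

  /* Skip the blanks that follow the field name.  */
  while (*l == ' ' || *l == '\t')
    l++;

  bool neg = false;
  if (*l == '-')
    {
      neg = true;
      l++;
    }

  if (*l == '\0')
    return false;

  int n = 0;
  while (*l != '\0')
    {
      /* Plain range check; '+' and other characters fail here.  */
      if (*l < '0' || *l > '9')
        return false;
      if (__builtin_mul_overflow (n, 10, &n)
          || __builtin_add_overflow (n, *l - '0', &n))
        return false;
      l++;
    }

  *result = neg ? -n : n;
  return true;
}
```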

> 
> I think these error returns should set errno (EINVAL perhaps, since
> that's unlikely to come from read or open), so that we have a chance to
> identify parser problems.

procutils_read_file already returns -1 when the closure returns before
reading the whole file, so pidfd_getpid can return EBADF in the case of an
invalid 'Pid:'.  It also returns the same error when 'Pid:' does not exist
(procutils_read_file returns 0, but parse_fdinfo_t is { false, ... }).

We can change the latter to EINVAL, but I am not sure it really fits
here.  A nonexistent 'Pid:' associated with the input file descriptor
argument means that it is not a pidfd; which imho does map to
EBADF.
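
The mapping described above can be sketched as follows.  The struct and
function names are hypothetical, and the treatment of 0 and negative values
is an assumption on my side about the fdinfo format (0 when the process has
no ID in the current namespace, -1 when it has terminated), matching the
EREMOTE and ESRCH entries in the manual text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of scanning the fdinfo file.  */
struct fdinfo_state
{
  bool found;   /* A valid 'Pid:' line was seen.  */
  int pid;      /* Value parsed from that line.  */
};

/* READ_RESULT is the return value of the file reader: -1 if reading
   failed or the closure bailed out, 0 otherwise.  Return 0 if ST holds
   a usable PID, or the errno value to report.  */
static int
fdinfo_to_errno (const struct fdinfo_state *st, int read_result)
{
  assert (st != NULL);
  if (read_result != 0 || !st->found)
    return EBADF;       /* Not a pidfd, or unparsable data.  */
  if (st->pid == 0)
    return EREMOTE;     /* Assumed: no PID in this namespace.  */
  if (st->pid < 0)
    return ESRCH;       /* Assumed: process already terminated.  */
  return 0;
}
```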

> 
> 
>> diff --git a/sysdeps/unix/sysv/linux/procutils.c b/sysdeps/unix/sysv/linux/procutils.c
>> new file mode 100644
>> index 0000000000..83b327cb9a
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/procutils.c
>> @@ -0,0 +1,104 @@
>> +/* Utilities functions to read/parse Linux procfs and sysfs.
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <assert.h>
>> +#include <not-cancel.h>
>> +#include <procutils.h>
>> +#include <string.h>
>> +
>> +static int
>> +next_line (char **r, int fd, char *const buffer, char **cp, char **re,
>> +           char *const buffer_end)
>> +{
>> +  char *res = *cp;
>> +  char *nl = memchr (*cp, '\n', *re - *cp);
>> +  if (nl == NULL)
>> +    {
>> +      if (*cp != buffer)
>> +        {
>> +          if (*re == buffer_end)
>> +            {
>> +              memmove (buffer, *cp, *re - *cp);
>> +              *re = buffer + (*re - *cp);
>> +              *cp = buffer;
>> +
>> +              ssize_t n = __read_nocancel (fd, *re, buffer_end - *re);
> 
> Missing TEMP_FAILURE_RETRY, I would (also below, and further below for
> __open64_nocancel).

Ack.
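
For readers not familiar with the macro, TEMP_FAILURE_RETRY simply repeats
the call while it fails with EINTR.  A minimal open-coded equivalent, using
the public read instead of __read_nocancel and a hypothetical helper name,
would be:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Like read, but restart the call if a signal interrupts it before
   any data was transferred.  */
static ssize_t
read_retry (int fd, void *buf, size_t len)
{
  assert (buf != NULL);
  ssize_t n;
  do
    n = read (fd, buf, len);
  while (n == -1 && errno == EINTR);
  return n;
}
```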

> 
>> +              if (n < 0)
>> +                return -1;
>> +
>> +              *re += n;
>> +
>> +              nl = memchr (*cp, '\n', *re - *cp);
>> +              while (nl == NULL && *re == buffer_end)
>> +                {
>> +                  /* Truncate too long lines.  */
>> +                  *re = buffer + 3 * (buffer_end - buffer) / 4;
>> +                  n = __read_nocancel (fd, *re, buffer_end - *re);
>> +                  if (n < 0)
>> +                    return -1;
>> +
>> +                  nl = memchr (*re, '\n', n);
>> +                  **re = '\0';
>> +                  *re += n;
>> +                }
> 
> Should we just skip long lines?  The 3/4 business is a bit strange.

So this is essentially the same code from sysdeps/unix/sysv/linux/getsysstats.c
(which I plan to consolidate later), and I agree with you that 3/4 is not really
clear.

> 
> This results in an endless loop if the file does not end with '\n', I
> think.

Since it is meant to be a generic interface to read procfs/sysfs, it
would be better to bail out on long lines.  I will change it.

> 
>> diff --git a/sysdeps/unix/sysv/linux/procutils.h b/sysdeps/unix/sysv/linux/procutils.h
>> new file mode 100644
>> index 0000000000..64e1080920
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/procutils.h
>> @@ -0,0 +1,35 @@
> 
>> +typedef int (*procutils_closure_t)(const char *line, void *arg);
>> +
>> +/* Open and read the path FILENAME, line per line, and call CLOSURE with
>> +   argument ARG on each line.  The read is done with a static buffer,
>> +   with non-cancellable calls, and the line is null terminated.
>> +
>> +   The CLOSURE should return true if the read should continue, or false
>> +   if the function should stop.
>> +
>> +   It returns 0 in case of success, or -1 otherwise.  */
>> +int procutils_read_file (const char *filename, procutils_closure_t closure,
>> +			 void *arg) attribute_hidden;
>> +
> 
> A comment should say whether '\n' is included in line argument, and what
> happens to overlong lines.
> 
> The return value for the closure should be an actual bool, or otherwise
> the int return value should be passed through to the caller of
> procutils_read_file.

Ack, I changed to bool.
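
A rough sketch of the semantics discussed for the reader follows; it is not
the revised patch.  All names are hypothetical, it works on an already open
descriptor with the public read, the '\n' is not passed to the closure, a
final line without '\n' is still delivered, and an overlong line makes the
function fail.  Stopping early is treated as success in this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

typedef bool (*line_closure_t) (const char *line, void *arg);

enum { LINE_BUFFER_SIZE = 256 };

/* Example closure: count the lines seen, never stop early.  */
static bool
count_lines (const char *line, void *arg)
{
  (void) line;
  ++*(int *) arg;
  return true;
}

/* Call CLOSURE on each null-terminated line read from FD.  Return 0 on
   success, or -1 on a read error or a line that does not fit.  */
static int
read_lines_fd (int fd, line_closure_t closure, void *arg)
{
  assert (closure != NULL);
  char buf[LINE_BUFFER_SIZE];
  size_t used = 0;

  for (;;)
    {
      ssize_t n;
      do
        n = read (fd, buf + used, sizeof buf - 1 - used);
      while (n == -1 && errno == EINTR);
      if (n < 0)
        return -1;
      bool eof = n == 0;
      used += n;

      /* Hand every complete line in the buffer to the closure.  */
      char *start = buf;
      char *nl;
      while ((nl = memchr (start, '\n', buf + used - start)) != NULL)
        {
          *nl = '\0';
          if (!closure (start, arg))
            return 0;
          start = nl + 1;
        }

      /* Keep the unterminated tail for the next round.  */
      used = buf + used - start;
      memmove (buf, start, used);

      if (eof)
        {
          /* Deliver a last line that lacks the trailing newline.  */
          if (used > 0)
            {
              buf[used] = '\0';
              closure (buf, arg);
            }
          return 0;
        }

      /* Buffer full without a newline: bail out on the long line.  */
      if (used == sizeof buf - 1)
        return -1;
    }
}
```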

> 
>> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c b/sysdeps/unix/sysv/linux/tst-pidfd.c
>> index 64d8a2ef40..53d223f702 100644
>> --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
>> +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
>> @@ -18,6 +18,7 @@
> 
>> +  /* Check if pidfd_getpid returns ESRCH for exited subprocess.  */
>> +  {
>> +    int pidfd;
>> +    pid_t pidfork = pidfd_fork (&pidfd, -1, 0);
>> +    if (pidfork == 0)
>> +      _exit (EXIT_SUCCESS);
>> +
>> +    /* The process might be still running or already in zombie state, in any
>> +       case the PID is still allocated to the process.  */
>> +    pid_t pid = pidfd_getpid (pidfd);
>> +    if (pid > 0)
>> +      support_process_state_wait (pid, support_process_state_zombie);
> 
> The condition does not match the comment.  I don't know which one is
> correct.  Please verify that pid > 0 (if the PID remains available), or
> change the comment to “in [either] case”.

I changed the comment.  What this snippet is trying to test is: if the
process is still executing, pid will be positive and thus we need to wait
for it to become a zombie.  Otherwise we can issue pidfd_getpid directly.

> 
>> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
>> new file mode 100644
>> index 0000000000..41d03a04ad
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
> 
>> +  TEST_VERIFY_EXIT (socketpair (AF_UNIX, SOCK_STREAM, 0, sockfd) == 0);
>> +
>> +  /* Check if pidfd_getpid returns EREMOTE for process not in current
>> +     namespace.  */
>> +  {
>> +    int pidfd;
>> +    pid_t pid = pidfd_fork (&pidfd, -1, 0);
> 
> I think you can avoid the file descriptor passing if you call pidfd_fork
> to create an unrelated descriptor, and then do the namespace thing below
> after another fork.  This way, the descriptor will just be inherited via
> fork.

Ok, I will check if this simplifies things.

> 
>> +    send_fd (sockfd[1], pidfd);
>> +
>> +    siginfo_t info;
>> +    TEST_COMPARE (waitid (P_PIDFD, pidfd, &info, WEXITED), 0);
>> +    if (info.si_status == EXIT_UNSUPPORTED)
>> +      FAIL_UNSUPPORTED ("unable to unshare user/fs/pid");
>> +    TEST_COMPARE (info.si_status, 0);
>> +    TEST_COMPARE (info.si_code, CLD_EXITED);
> 
> I think this could have a few tests, like the pidfd_getpid value
> matching what comes back subsequently in si_pid.

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/sys/pidfd.h b/sysdeps/unix/sysv/linux/sys/pidfd.h
>> index 87095212a7..8cf4df6b81 100644
>> --- a/sysdeps/unix/sysv/linux/sys/pidfd.h
>> +++ b/sysdeps/unix/sysv/linux/sys/pidfd.h
>> @@ -67,4 +67,8 @@ extern int pidfd_send_signal (int __pidfd, int __sig, siginfo_t *__info,
>>  extern pid_t pidfd_fork (int *__pidfd, int __cgroup, unsigned int __flags)
>>    __THROW;
>>  
>> +/* Query the process ID (PID) from process descriptor __FD.  Return the PID
>> +   or -1 in case of an error.  */
>> +extern pid_t pidfd_getpid (int __fd) __THROW;
>> +
> 
> __FD should be FD.

Ack.


* Re: [PATCH v7 4/8] linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26731)
  2023-08-03 16:35 ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
  2023-08-11 10:51   ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get, set}cgroup_np " Florian Weimer
@ 2023-08-14 13:27   ` Carlos O'Donell
  1 sibling, 0 replies; 23+ messages in thread
From: Carlos O'Donell @ 2023-08-14 13:27 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 8/3/23 12:35, Adhemerval Zanella via Libc-alpha wrote:
> These functions allow posix_spawn and posix_spawnp to use
> CLONE_INTO_CGROUP with clone3, allowing the child process to
> be created in a different version 2 cgroup.  These are GNU
> extensions that are available only for Linux, and also only
> for the architectures that implement clone3 wrapper
> (HAVE_CLONE3_WRAPPER).

This fails pre-commit CI for AArch64

https://patchwork.sourceware.org/project/glibc/patch/20230803163558.991832-5-adhemerval.zanella@linaro.org/

Please review the result.

> To create a process on a different cgroupv2, one can use:
> 
>   posix_spawnattr_t attr;
>   posix_spawnattr_init (&attr);
>   posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
>   posix_spawnattr_setcgroup_np (&attr, cgroup);
>   posix_spawn (...)
> 
> Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP controls
> whether the cgroup file descriptor will be used or not with
> clone3.
> 
> There is no fallback if either clone3 does not support the flag
> or the architecture does not provide the clone3 wrapper; in
> this case posix_spawn returns ENOTSUP.
> 
> Checked on x86_64-linux-gnu.
> ---
>  NEWS                                          |   6 +-
>  bits/spawn_ext.h                              |  21 ++
>  posix/Makefile                                |   1 +
>  posix/spawn.h                                 |   6 +-
>  posix/spawnattr_setflags.c                    |   3 +-
>  sysdeps/unix/sysv/linux/Makefile              |   5 +
>  sysdeps/unix/sysv/linux/Versions              |   4 +
>  sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
>  sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
>  sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
>  sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
>  sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
>  sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  40 ++++
>  sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
>  sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
>  sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
>  sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
>  .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
>  .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
>  .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
>  .../sysv/linux/microblaze/be/libc.abilist     |   2 +
>  .../sysv/linux/microblaze/le/libc.abilist     |   2 +
>  .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
>  .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
>  .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
>  .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
>  sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
>  sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
>  .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
>  .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
>  .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
>  .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
>  .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
>  .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
>  sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
>  sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
>  .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
>  .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
>  .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
>  .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 +++
>  sysdeps/unix/sysv/linux/spawni.c              |  22 +-
>  sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 216 ++++++++++++++++++
>  .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
>  .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
>  46 files changed, 441 insertions(+), 6 deletions(-)
>  create mode 100644 bits/spawn_ext.h
>  create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
>  create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
>  create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
> 
> diff --git a/NEWS b/NEWS
> index 22875d5fa4..99824eab95 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,11 @@ Version 2.39
>  
>  Major new features:
>  
> -  [Add new features here]
> +* On Linux, the functions posix_spawnattr_getcgroup_np and
> +  posix_spawnattr_setcgroup_np have been added, along with the
> +  POSIX_SPAWN_SETCGROUP flag.  They allow posix_spawn and posix_spawnp to
> +  set the cgroupv2 in the new process in a race free manner.  These functions
> +  are GNU extensions and require a kernel with clone3 support.
>  
>  Deprecated and removed features, and other changes affecting compatibility:
>  
> diff --git a/bits/spawn_ext.h b/bits/spawn_ext.h
> new file mode 100644
> index 0000000000..75b504a768
> --- /dev/null
> +++ b/bits/spawn_ext.h
> @@ -0,0 +1,21 @@
> +/* POSIX spawn extensions.   Generic version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SPAWN_H
> +# error "Never include <bits/spawn_ext.h> directly; use <spawn.h> instead."
> +#endif
> diff --git a/posix/Makefile b/posix/Makefile
> index 3d368b91f6..70faad4b63 100644
> --- a/posix/Makefile
> +++ b/posix/Makefile
> @@ -37,6 +37,7 @@ headers := \
>    bits/pthreadtypes-arch.h \
>    bits/pthreadtypes.h \
>    bits/sched.h \
> +  bits/spawn_ext.h \
>    bits/thread-shared-types.h \
>    bits/types.h \
>    bits/types/idtype_t.h \
> diff --git a/posix/spawn.h b/posix/spawn.h
> index 04cc525fa5..731862cc5a 100644
> --- a/posix/spawn.h
> +++ b/posix/spawn.h
> @@ -34,7 +34,8 @@ typedef struct
>    sigset_t __ss;
>    struct sched_param __sp;
>    int __policy;
> -  int __pad[16];
> +  int __cgroup;
> +  int __pad[15];
>  } posix_spawnattr_t;
>  
>  
> @@ -59,6 +60,7 @@ typedef struct
>  #ifdef __USE_GNU
>  # define POSIX_SPAWN_USEVFORK		0x40
>  # define POSIX_SPAWN_SETSID		0x80
> +# define POSIX_SPAWN_SETCGROUP         0x100
>  #endif
>  
>  
> @@ -231,4 +233,6 @@ posix_spawn_file_actions_addtcsetpgrp_np (posix_spawn_file_actions_t *,
>  
>  __END_DECLS
>  
> +#include <bits/spawn_ext.h>
> +
>  #endif /* spawn.h */
> diff --git a/posix/spawnattr_setflags.c b/posix/spawnattr_setflags.c
> index 97153948e4..e7bb217c6a 100644
> --- a/posix/spawnattr_setflags.c
> +++ b/posix/spawnattr_setflags.c
> @@ -26,7 +26,8 @@
>  		   | POSIX_SPAWN_SETSCHEDPARAM				      \
>  		   | POSIX_SPAWN_SETSCHEDULER				      \
>  		   | POSIX_SPAWN_SETSID					      \
> -		   | POSIX_SPAWN_USEVFORK)
> +		   | POSIX_SPAWN_USEVFORK				      \
> +		   | POSIX_SPAWN_SETCGROUP)
>  
>  /* Store flags in the attribute structure.  */
>  int
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index be801e3be4..d7b020154a 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -493,11 +493,14 @@ sysdep_routines += \
>    getcpu \
>    oldglob \
>    sched_getcpu \
> +  spawnattr_getcgroup_np \
> +  spawnattr_setcgroup_np \
>    # sysdep_routines
>  
>  tests += \
>    tst-affinity \
>    tst-affinity-pid \
> +  tst-spawn-cgroup \
>    # tests
>  
>  tests-static += \
> @@ -511,6 +514,8 @@ tests += \
>  CFLAGS-fork.c = $(libio-mtsafe)
>  CFLAGS-getpid.o = -fomit-frame-pointer
>  CFLAGS-getpid.os = -fomit-frame-pointer
> +
> +tst-spawn-cgroup-ARGS = -- $(host-test-program-cmd)
>  endif
>  
>  ifeq ($(subdir),inet)
> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index bc59bce42f..6d8a67039e 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -321,6 +321,10 @@ libc {
>      __ppoll64_chk;
>  %endif
>    }
> +  GLIBC_2.39 {
> +    posix_spawnattr_getcgroup_np;
> +    posix_spawnattr_setcgroup_np;
> +  }
>    GLIBC_PRIVATE {
>      # functions used in other libraries
>      __syscall_rt_sigqueueinfo;
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> index c49363e70e..0090827e01 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> @@ -2673,3 +2673,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> index d6b1dcaae6..9d099471b6 100644
> --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> @@ -2782,6 +2782,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
> index dfe0c3f7b6..d7ed2f66de 100644
> --- a/sysdeps/unix/sysv/linux/arc/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
> @@ -2434,3 +2434,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> index 6c75e5aa76..92e686defe 100644
> --- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> @@ -554,6 +554,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
>  GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
> diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> index 03d6f7ae2d..b503e642fc 100644
> --- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> @@ -551,6 +551,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
>  GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
> diff --git a/sysdeps/unix/sysv/linux/bits/spawn_ext.h b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> new file mode 100644
> index 0000000000..3bc10ab477
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/bits/spawn_ext.h
> @@ -0,0 +1,40 @@
> +/* POSIX spawn extensions.   Linux version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SPAWN_H
> +# error "Never include <bits/spawn_ext.h> directly; use <spawn.h> instead."
> +#endif
> +
> +__BEGIN_DECLS
> +
> +#ifdef __USE_MISC
> +
> +/* Get the cgroupsv2 file descriptor from the attribute structure.  */
> +extern int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *
> +					 __restrict __attr,
> +					 int *__cgroup)
> +     __THROW __nonnull ((1, 2));
> +
> +/* Store the cgroupsv2 file descriptor in the attribute structure.  */
> +extern int posix_spawnattr_setcgroup_np (posix_spawnattr_t *__restrict __attr,
> +					 int __cgroup)
> +     __THROW __nonnull ((1));
> +
> +#endif /* __USE_MISC */
> +
> +__END_DECLS
> diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
> index d858c108c6..ec9e209b8d 100644
> --- a/sysdeps/unix/sysv/linux/csky/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
> @@ -2710,3 +2710,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> index 82a14f8ace..961f88bf14 100644
> --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> @@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
> index 1950b15d5d..b6f5a4ab83 100644
> --- a/sysdeps/unix/sysv/linux/i386/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
> @@ -2843,6 +2843,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> index d0b9cb279b..a404b99e68 100644
> --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> @@ -2608,6 +2608,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
> index e760a631dd..2f9f6e2332 100644
> --- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
> @@ -2194,3 +2194,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> index 35785a3d5f..b7e9ab4558 100644
> --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> @@ -555,6 +555,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0x98
>  GLIBC_2.4 _IO_2_1_stdin_ D 0x98
> diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> index 4ab2426e0a..c345da7e0a 100644
> --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> @@ -2786,6 +2786,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> index 38faa16232..a643d868a8 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> @@ -2759,3 +2759,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> index 374d658988..fed535742c 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> @@ -2756,3 +2756,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> index fcc5e88e91..147bac3eaf 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> @@ -2751,6 +2751,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> index 01eb96cd93..e550616576 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> @@ -2749,6 +2749,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> index a2748b7b74..56f414dbd0 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> @@ -2757,6 +2757,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> index 0ae7ba499d..da704a2e2b 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> @@ -2659,6 +2659,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> index 947495a0e2..f5a157ea94 100644
> --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> @@ -2798,3 +2798,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> index 115f1039e7..85b552f1cb 100644
> --- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> @@ -2180,3 +2180,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> index 19c4c325b0..cadb16c12f 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> @@ -2825,6 +2825,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> index 3e043c4044..50c5b99728 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> @@ -2858,6 +2858,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> index e4f3a766bb..81c63385af 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> @@ -2579,6 +2579,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> index dafe1c4a59..af9be18108 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> @@ -2893,3 +2893,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> index b9740a1afc..2266a88ad5 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> @@ -2436,3 +2436,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> index e3b4656aa2..4776ae32b8 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> @@ -2636,3 +2636,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> index 84cb7a50ed..5d1d7d07a5 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> @@ -2823,6 +2823,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> index 33df3b1646..fffc32a0f4 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> @@ -2616,6 +2616,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> index 94cbccd715..43ff21447d 100644
> --- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> @@ -2666,6 +2666,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> index 3bb316a787..9ea18d5886 100644
> --- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> @@ -2663,6 +2663,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> index 6341b491b4..c6607d5385 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> @@ -2818,6 +2818,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> index 8ed1ea2926..a010a2bb16 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> @@ -2631,6 +2631,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c b/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
> new file mode 100644
> index 0000000000..82fd8f4b71
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
> @@ -0,0 +1,28 @@
> +/* Copyright (C) 2000-2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <spawn.h>
> +
> +/* Get the cgroup file descriptor from the attribute structure.  */
> +int
> +posix_spawnattr_getcgroup_np (const posix_spawnattr_t *attr,
> +			      int *cgroup)
> +{
> +  *cgroup = attr->__cgroup;
> +
> +  return 0;
> +}
> diff --git a/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c b/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
> new file mode 100644
> index 0000000000..74d60bb5ea
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
> @@ -0,0 +1,27 @@
> +/* Copyright (C) 2000-2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <spawn.h>
> +
> +/* Store the cgroup file descriptor in the attribute structure.  */
> +int
> +posix_spawnattr_setcgroup_np (posix_spawnattr_t *attr, int cgroup)
> +{
> +  attr->__cgroup = cgroup;
> +
> +  return 0;
> +}
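
For readers following along, a minimal sketch of how a caller might use the
two new attribute functions, assuming a glibc whose <spawn.h> provides
POSIX_SPAWN_SETCGROUP as proposed in this patch.  spawn_in_cgroup is a
hypothetical helper and not part of the patch; with older headers it simply
reports ENOTSUP.  As in the test further down, the value stored in the
attribute is a file descriptor for the cgroup directory.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: spawn argv[0] inside the cgroup v2 directory
   CGROUP_DIR.  Returns 0 on success or an errno-style code, following
   the posix_spawn convention.  */
int
spawn_in_cgroup (const char *cgroup_dir, char *const argv[], pid_t *pid)
{
  assert (cgroup_dir != NULL && argv != NULL && argv[0] != NULL);
#ifdef POSIX_SPAWN_SETCGROUP
  /* The attribute stores a file descriptor for the cgroup directory.  */
  int cgfd = open (cgroup_dir, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
  if (cgfd < 0)
    return errno;

  posix_spawnattr_t attr;
  int r = posix_spawnattr_init (&attr);
  if (r != 0)
    {
      close (cgfd);
      return r;
    }

  /* The flag enables the feature; the setter supplies the descriptor.  */
  r = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
  if (r == 0)
    r = posix_spawnattr_setcgroup_np (&attr, cgfd);
  if (r == 0)
    r = posix_spawn (pid, argv[0], NULL, &attr, argv, environ);

  posix_spawnattr_destroy (&attr);
  close (cgfd);
  return r;
#else
  /* Headers predate the extension (assumption: treat as unsupported).  */
  (void) pid;
  return ENOTSUP;
#endif
}
```

The descriptor is only needed for the duration of the posix_spawn call, so
the sketch closes it before returning.
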
> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index ec687cb423..f0d4c62ae6 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c
> @@ -380,14 +380,19 @@ __spawnix (pid_t * pid, const char *file,
>       need for CLONE_SETTLS.  Although parent and child share the same TLS
>       namespace, there will be no concurrent access for TLS variables (errno
>       for instance).  */
> +  bool set_cgroup = attrp ? (attrp->__flags & POSIX_SPAWN_SETCGROUP) : false;
>    struct clone_args clone_args =
>      {
>        /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
>  	 __clone_internal_fallback.  */
> -      .flags = CLONE_CLEAR_SIGHAND | CLONE_VM | CLONE_VFORK,
> +      .flags = (set_cgroup ? CLONE_INTO_CGROUP : 0)
> +	       | CLONE_CLEAR_SIGHAND
> +	       | CLONE_VM
> +	       | CLONE_VFORK,
>        .exit_signal = SIGCHLD,
>        .stack = (uintptr_t) stack,
>        .stack_size = stack_size,
> +      .cgroup = (set_cgroup ? attrp->__cgroup : 0)
>      };
>  #ifdef HAVE_CLONE3_WRAPPER
>    args.use_clone3 = true;
> @@ -398,8 +403,19 @@ __spawnix (pid_t * pid, const char *file,
>  #endif
>      {
>        args.use_clone3 = false;
> -      new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
> -					   &args);
> +      if (!set_cgroup)
> +	new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
> +					     &args);
> +      else
> +	{
> +	  /* No fallback for POSIX_SPAWN_SETCGROUP if clone3 is not
> +	     supported.  */
> +	  new_pid = -1;
> +#ifdef HAVE_CLONE3_WRAPPER
> +	  if (errno == ENOSYS)
> +#endif
> +	    errno = ENOTSUP;
> +	}
>      }
>  
>    /* It needs to collect the case where the auxiliary process was created
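
The flag selection in this hunk can be sketched in isolation as below.  This
is an illustration only: struct spawn_clone_request and
make_spawn_clone_request are hypothetical names that mirror just the two
fields the hunk touches, and the fallback #defines carry the Linux UAPI
values for headers that lack them.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallback values from the Linux UAPI, for headers that lack them.  */
#ifndef CLONE_VM
# define CLONE_VM 0x00000100
#endif
#ifndef CLONE_VFORK
# define CLONE_VFORK 0x00004000
#endif
#ifndef CLONE_CLEAR_SIGHAND
# define CLONE_CLEAR_SIGHAND 0x100000000ULL
#endif
#ifndef CLONE_INTO_CGROUP
# define CLONE_INTO_CGROUP 0x200000000ULL
#endif

/* Hypothetical stand-in for the two clone_args fields of interest.  */
struct spawn_clone_request
{
  uint64_t flags;
  uint64_t cgroup;
};

struct spawn_clone_request
make_spawn_clone_request (bool set_cgroup, int cgroup_fd)
{
  struct spawn_clone_request req =
    {
      .flags = (uint64_t) CLONE_CLEAR_SIGHAND
	       | (uint64_t) CLONE_VM
	       | (uint64_t) CLONE_VFORK,
      .cgroup = 0
    };

  if (set_cgroup)
    {
      /* Only request CLONE_INTO_CGROUP when the attribute flag is set.  */
      req.flags |= (uint64_t) CLONE_INTO_CGROUP;
      req.cgroup = (uint64_t) cgroup_fd;
    }

  /* The cgroup field is meaningful only with CLONE_INTO_CGROUP.  */
  assert (set_cgroup || req.cgroup == 0);
  return req;
}
```
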
> diff --git a/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
> new file mode 100644
> index 0000000000..6dba30ab29
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
> @@ -0,0 +1,216 @@
> +/* Tests for posix_spawn cgroup extension.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <assert.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <getopt.h>
> +#include <spawn.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <support/check.h>
> +#include <support/support.h>
> +#include <support/xstdio.h>
> +#include <support/xunistd.h>
> +#include <support/temp_file.h>
> +#include <sys/vfs.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +
> +#define CGROUPFS "/sys/fs/cgroup/"
> +#ifndef CGROUP2_SUPER_MAGIC
> +# define CGROUP2_SUPER_MAGIC 0x63677270
> +#endif
> +
> +#define F_TYPE_EQUAL(a, b) (a == (typeof(a)) b)
> +
> +#define CGROUP_TEST "test-spawn-cgroup"
> +
> +/* Nonzero if the program gets called via `exec'.  */
> +#define CMDLINE_OPTIONS \
> +  { "restart", no_argument, &restart, 1 },
> +static int restart;
> +
> +/* Hold the four initial arguments used to respawn the process, plus the
> +   extra '--direct', '--restart', the name of the created cgroup, and a
> +   final NULL.  */
> +static char *spargs[8];
> +
> +static inline char *
> +startswith (const char *s, const char *prefix)
> +{
> +  size_t l = strlen (prefix);
> +  if (strncmp (s, prefix, l) == 0)
> +    return (char *) s + l;
> +  return NULL;
> +}
> +
> +static char *
> +get_cgroup (void)
> +{
> +  FILE *f = fopen ("/proc/self/cgroup", "re");
> +  if (f == NULL)
> +    FAIL_UNSUPPORTED ("no cgroup defined for the process");
> +
> +  char *cgroup = NULL;
> +
> +  char *line = NULL;
> +  size_t linesiz = 0;
> +  while (xgetline (&line, &linesiz, f) > 0)
> +    {
> +      char *entry = startswith (line, "0:");
> +      if (entry == NULL)
> +	continue;
> +
> +      entry = strchr (entry, ':');
> +      if (entry == NULL)
> +	continue;
> +
> +      cgroup = entry + 1;
> +      size_t l = strlen (cgroup);
> +      if (cgroup[l - 1] == '\n')
> +	cgroup[l - 1] = '\0';
> +
> +      cgroup = xstrdup (entry + 1);
> +      break;
> +    }
> +
> +  xfclose (f);
> +  free (line);
> +
> +  return cgroup;
> +}
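
The per-line parsing in get_cgroup can be sketched on its own as below,
assuming the cgroup v2 entry in /proc/self/cgroup has the form "0::/path".
parse_cgroup2_line is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of the cgroup v2 path
   found in LINE ("0::/path", optionally newline-terminated), or NULL if
   LINE is not the v2 entry or allocation fails.  */
char *
parse_cgroup2_line (const char *line)
{
  assert (line != NULL);

  /* The unified (v2) hierarchy always has hierarchy ID 0.  */
  if (strncmp (line, "0:", 2) != 0)
    return NULL;

  /* Skip the (empty for v2) controller list up to the next ':'.  */
  const char *sep = strchr (line + 2, ':');
  if (sep == NULL)
    return NULL;

  /* Copy the path without the trailing newline, if any.  */
  const char *path = sep + 1;
  size_t len = strcspn (path, "\n");
  char *copy = malloc (len + 1);
  if (copy == NULL)
    return NULL;
  memcpy (copy, path, len);
  copy[len] = '\0';
  return copy;
}
```

Using strcspn to find the end of the path also covers an empty path, where
indexing the last character directly would read before the buffer.
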
> +
> +
> +/* Called on process re-execution.  */
> +_Noreturn static void
> +handle_restart (int argc, char *argv[])
> +{
> +  assert (argc == 1);
> +  char *newcgroup = argv[0];
> +
> +  char *current_cgroup = get_cgroup ();
> +  TEST_VERIFY_EXIT (current_cgroup != NULL);
> +  TEST_COMPARE_STRING (newcgroup, current_cgroup);
> +  exit (EXIT_SUCCESS);
> +}
> +
> +static int
> +do_test_cgroup_failure (pid_t *pid, int cgroup)
> +{
> +  posix_spawnattr_t attr;
> +  TEST_COMPARE (posix_spawnattr_init (&attr), 0);
> +  TEST_COMPARE (posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP), 0);
> +  TEST_COMPARE (posix_spawnattr_setcgroup_np (&attr, cgroup), 0);
> +
> +  int cgetgroup;
> +  TEST_COMPARE (posix_spawnattr_getcgroup_np (&attr, &cgetgroup), 0);
> +  TEST_COMPARE (cgroup, cgetgroup);
> +
> +  return posix_spawn (pid, spargs[0], NULL, &attr, spargs, environ);
> +}
> +
> +static int
> +create_new_cgroup (char **newcgroup)
> +{
> +  struct statfs fs;
> +  if (statfs (CGROUPFS, &fs) < 0)
> +    {
> +      if (errno == ENOENT)
> +	FAIL_UNSUPPORTED ("no cgroupv2 mount found");
> +      FAIL_EXIT1 ("statfs (%s): %m\n", CGROUPFS);
> +    }
> +
> +  if (!F_TYPE_EQUAL (fs.f_type, CGROUP2_SUPER_MAGIC))
> +    FAIL_UNSUPPORTED ("%s is not a cgroupv2", CGROUPFS);
> +
> +  char *cgroup = get_cgroup ();
> +  TEST_VERIFY_EXIT (cgroup != NULL);
> +  *newcgroup = xasprintf ("%s/%s", cgroup, CGROUP_TEST);
> +  char *cgpath = xasprintf ("%s%s/%s", CGROUPFS, cgroup, CGROUP_TEST);
> +  free (cgroup);
> +
> +  if (mkdir (cgpath, 0755) == -1 && errno != EEXIST)
> +    {
> +      if (errno == EACCES || errno == EPERM)
> +	FAIL_UNSUPPORTED ("cannot create a new cgroupv2 group");
> +      FAIL_EXIT1 ("mkdir (%s): %m", cgpath);
> +    }
> +  add_temp_file (cgpath);
> +
> +  return xopen (cgpath, O_DIRECTORY | O_RDONLY | O_CLOEXEC, 0666);
> +}
> +
> +static int
> +do_test (int argc, char *argv[])
> +{
> +  /* We must have either:
> +
> +     - one or four parameters if called initially:
> +       + argv[1]: path for ld.so        optional
> +       + argv[2]: "--library-path"      optional
> +       + argv[3]: the library path      optional
> +       + argv[4]: the application name
> +
> +     - one parameter left if called through re-execution (checked by
> +       handle_restart):
> +       + argv[1]: the created cgroup
> +
> +     * When built with --enable-hardcoded-path-in-tests or issued without
> +       using the loader directly.  */
> +
> +  if (restart)
> +    handle_restart (argc - 1, &argv[1]);
> +
> +  TEST_VERIFY_EXIT (argc == 2 || argc == 5);
> +
> +  char *newcgroup;
> +  int cgroup = create_new_cgroup (&newcgroup);
> +
> +  int i;
> +  for (i = 0; i < argc - 1; i++)
> +    spargs[i] = argv[i + 1];
> +  spargs[i++] = (char *) "--direct";
> +  spargs[i++] = (char *) "--restart";
> +  spargs[i++] = (char *) newcgroup;
> +  spargs[i] = NULL;
> +
> +  /* Check that an invalid cgroup returns an error.  */
> +  {
> +    TEST_COMPARE (do_test_cgroup_failure (NULL, -1), EINVAL);
> +  }
> +
> +  {
> +    pid_t pid;
> +    TEST_COMPARE (do_test_cgroup_failure (&pid, cgroup), 0);
> +
> +    siginfo_t sinfo;
> +    TEST_COMPARE (waitid (P_PID, pid, &sinfo, WEXITED), 0);
> +    TEST_COMPARE (sinfo.si_signo, SIGCHLD);
> +    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
> +    TEST_COMPARE (sinfo.si_status, 0);
> +  }
> +
> +  xclose (cgroup);
> +  free (newcgroup);
> +
> +  return 0;
> +}
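
The spawn-then-waitid pattern used in do_test, reduced to a small sketch
without the cgroup attribute.  spawn_and_wait is a hypothetical helper; it
folds the three siginfo_t checks from the test into a single return value.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn argv[0] and wait for it.  Returns the exit
   status of the child, or -1 if the spawn or the wait failed or the
   child did not exit normally.  */
int
spawn_and_wait (char *const argv[])
{
  assert (argv != NULL && argv[0] != NULL);

  pid_t pid;
  if (posix_spawn (&pid, argv[0], NULL, NULL, argv, environ) != 0)
    return -1;

  siginfo_t info;
  if (waitid (P_PID, (id_t) pid, &info, WEXITED) != 0)
    return -1;

  /* A normal exit is reported as SIGCHLD with code CLD_EXITED.  */
  if (info.si_signo != SIGCHLD || info.si_code != CLD_EXITED)
    return -1;

  return info.si_status;
}
```
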
> +
> +#define TEST_FUNCTION_ARGV do_test
> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> index 57cfcc2086..3591b5de5e 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> @@ -2582,6 +2582,8 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> index 3f0a9f6d82..ffbd8f3738 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> @@ -2688,3 +2688,5 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 posix_spawnattr_getcgroup_np F
> +GLIBC_2.39 posix_spawnattr_setcgroup_np F

-- 
Cheers,
Carlos.


end of thread, other threads:[~2023-08-14 13:27 UTC | newest]

Thread overview: 23+ messages
2023-08-03 16:35 [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
2023-08-03 16:35 ` [PATCH v7 1/8] arm: Add the clone3 wrapper Adhemerval Zanella
2023-08-11 10:17   ` Florian Weimer
2023-08-11 14:12     ` Adhemerval Zanella Netto
2023-08-11 14:21       ` Florian Weimer
2023-08-03 16:35 ` [PATCH v7 2/8] mips: " Adhemerval Zanella
2023-08-03 16:35 ` [PATCH v7 3/8] linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
2023-08-11 10:34   ` Florian Weimer
2023-08-11 15:12     ` Adhemerval Zanella Netto
2023-08-03 16:35 ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
2023-08-11 10:51   ` [PATCH v7 4/8] linux: Add posix_spawnattr_{get, set}cgroup_np " Florian Weimer
2023-08-11 15:31     ` Adhemerval Zanella Netto
2023-08-14 13:27   ` Carlos O'Donell
2023-08-03 16:35 ` [PATCH v7 5/8] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
2023-08-11 11:45   ` Florian Weimer
2023-08-11 16:14     ` Adhemerval Zanella Netto
2023-08-03 16:35 ` [PATCH v7 6/8] posix: Add pidfd_fork (BZ 26371) Adhemerval Zanella
2023-08-11 12:06   ` Florian Weimer
2023-08-11 16:26     ` Adhemerval Zanella Netto
2023-08-03 16:35 ` [PATCH v7 7/8] posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork Adhemerval Zanella
2023-08-03 16:35 ` [PATCH v7 8/8] linux: Add pidfd_getpid Adhemerval Zanella
2023-08-11 14:36   ` Florian Weimer
2023-08-11 17:29     ` Adhemerval Zanella Netto
