public inbox for libc-alpha@sourceware.org
* [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
@ 2023-08-18 14:06 Adhemerval Zanella
  2023-08-18 14:06 ` [PATCH v8 1/7] arm: Add the clone3 wrapper Adhemerval Zanella
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Adhemerval Zanella @ 2023-08-18 14:06 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer

glibc 2.36 added wrappers for the Linux syscalls pidfd_open, pidfd_getfd,
and pidfd_send_signal, and exported P_PIDFD for use with waitid.  The
pidfd is a race-free interface; however, pidfd_open is subject to TOCTOU
if the file descriptor is not obtained directly from the clone or clone3
syscall (there is still a small window between the clone return and the
pidfd_open call in which the process can be reaped and the process ID
reused).
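
The window described above can be sketched as follows.  This is only an
illustration, not code from the patch set: wait_child_via_pidfd is a
hypothetical helper, and it issues pidfd_open through syscall() so that it
does not depend on the libc wrapper being present.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
# define P_PIDFD 3  /* Linux value; older libc headers do not define it.  */
#endif

/* Hypothetical helper: fork a child that exits with STATUS, obtain a pidfd
   for it with pidfd_open, and reap it with waitid (P_PIDFD).  Returns the
   child's exit status, or -1 if the pidfd path is unavailable.  */
int
wait_child_via_pidfd (int status)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (status);

  /* TOCTOU window: nothing ties PID to the child yet.  If the child is
     reaped here (another thread calling wait, or SIGCHLD set to SIG_IGN),
     PID may be recycled and pidfd_open would refer to another process.  */
#ifdef SYS_pidfd_open
  int pidfd = (int) syscall (SYS_pidfd_open, pid, 0);
#else
  int pidfd = -1;
#endif
  if (pidfd < 0)
    {
      waitpid (pid, NULL, 0);   /* Fall back to the PID-based interface.  */
      return -1;
    }

  siginfo_t info = { 0 };
  int r = waitid ((idtype_t) P_PIDFD, (id_t) pidfd, &info, WEXITED);
  close (pidfd);
  if (r < 0)
    {
      waitpid (pid, NULL, 0);   /* Kernel without waitid (P_PIDFD).  */
      return -1;
    }
  return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```

Obtaining the descriptor from clone or clone3 itself closes this window,
because the kernel creates the pidfd together with the process.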

A fully race-free interface based on posix_spawn is being discussed by
GNOME [1] [2], and Qt already uses one in its QtProcess implementation
[3].  The Qt implementation has some pitfalls:

  - It calls clone through the syscall symbol, which does not run the
    pthread_atfork handlers even though it intends to use the clone
    semantics for fork (by only using CLONE_PIDFD | SIGCHLD).

  - It also does not reset any internal state, such as internal IO,
    malloc, loader, etc. locks.

  - It does not set the TCB tid field nor the robust list, used by the
    pthread code.

  - It does not optimize process creation by using CLONE_VM and
    CLONE_VFORK.

Also, recent Linux kernels (starting with 5.7) provide a way to create a
new process in a cgroup version 2 other than the default one (through
the clone3 CLONE_INTO_CGROUP flag).  Providing it through glibc
interfaces makes it usable without the risk of breakage from issuing the
clone3 syscall directly (see the BZ#26371 discussion).
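
For reference, a direct clone3 call of the kind the patch set wants callers
to avoid might look like the sketch below.  It is an illustration under
assumptions: sketch_clone_args is a local copy of the kernel's clone_args
layout up to the cgroup member, the SKETCH_* constants mirror the kernel
flag values, and sketch_clone3_pidfd is a hypothetical helper.  It performs
none of the libc bookkeeping listed above (atfork handlers, internal locks,
TCB tid), which is exactly why the child only calls _exit.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local mirror of the kernel's struct clone_args (88 bytes), so the sketch
   does not depend on the installed kernel header version.  */
struct sketch_clone_args
{
  uint64_t flags, pidfd, child_tid, parent_tid, exit_signal;
  uint64_t stack, stack_size, tls, set_tid, set_tid_size, cgroup;
};

#define SKETCH_CLONE_PIDFD        0x00001000ULL
#define SKETCH_CLONE_INTO_CGROUP  0x200000000ULL

/* Hypothetical helper: create a child with clone3, returning its PID and
   storing its pidfd in *PIDFD_OUT.  If CGROUP_FD is non-negative it must
   refer to a cgroupv2 directory, and the child is created inside it.
   Returns -1 with errno set on failure.  The child exits immediately.  */
pid_t
sketch_clone3_pidfd (int cgroup_fd, int *pidfd_out)
{
#ifdef SYS_clone3
  int pidfd = -1;
  struct sketch_clone_args args;
  memset (&args, 0, sizeof args);
  args.flags = SKETCH_CLONE_PIDFD;
  args.pidfd = (uint64_t) (uintptr_t) &pidfd;  /* Kernel stores the fd here.  */
  args.exit_signal = SIGCHLD;
  if (cgroup_fd >= 0)
    {
      args.flags |= SKETCH_CLONE_INTO_CGROUP;
      args.cgroup = (uint64_t) cgroup_fd;
    }

  long r = syscall (SYS_clone3, &args, sizeof args);
  if (r == 0)
    _exit (0);          /* Child: libc state is stale, do nothing else.  */
  if (r > 0)
    *pidfd_out = pidfd;
  return (pid_t) r;
#else
  (void) cgroup_fd;
  (void) pidfd_out;
  errno = ENOSYS;
  return -1;
#endif
}
```

Passing -1 as cgroup_fd keeps the child in the caller's cgroup; the returned
pidfd can then be used with waitid (P_PIDFD) or pidfd_send_signal.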

This patch set adds new interfaces that take care of these potential
issues.  The new posix_spawn / posix_spawnp extensions:

  #define POSIX_SPAWN_SETCGROUP 0x100

  int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *restrict attr,
                                    int *cgroup);
  int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
                                    int cgroup);

These allow spawning a new process in a different cgroupv2.

pidfd_spawn and pidfd_spawnp are similar to posix_spawn and
posix_spawnp, but return a process file descriptor instead of a PID.

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict path,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict file,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict],
                    char *const envp[restrict]);

The implementation requires the kernel to support the complete pidfd
interface, meaning that waitid (P_PIDFD) must be supported.  This
ensures that no racy workaround is needed (such as reading the procfs
fdinfo pid to use along with the old wait interfaces).  If the kernel
does not have the required support, the interface returns ENOSYS.
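
A possible use of the interface is sketched below.  It assumes a glibc that
ships pidfd_spawn with the signature shown above (2.39 or later is assumed
here) and, like posix_spawn, an error number as the return value;
sketch_spawn_and_wait and SKETCH_HAVE_PIDFD_SPAWN are hypothetical names,
and on older libcs the helper simply reports ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

#ifndef P_PIDFD
# define P_PIDFD 3  /* Linux value; older libc headers do not define it.  */
#endif

/* Assumption: the interface is available from glibc 2.39 onwards.  */
#ifdef __GLIBC_PREREQ
# if __GLIBC_PREREQ (2, 39)
#  define SKETCH_HAVE_PIDFD_SPAWN 1
# endif
#endif

/* Hypothetical helper: run PATH with no arguments and wait for it through
   the returned pidfd.  Returns the exit status, or -1 with errno set.  */
int
sketch_spawn_and_wait (const char *path)
{
#ifdef SKETCH_HAVE_PIDFD_SPAWN
  char *const argv[] = { (char *) path, NULL };
  int pidfd;
  int err = pidfd_spawn (&pidfd, path, NULL, NULL, argv, environ);
  if (err != 0)
    {
      errno = err;      /* ENOSYS if the kernel lacks full pidfd support.  */
      return -1;
    }

  /* No PID is ever looked up, so there is no reuse window.  */
  siginfo_t info = { 0 };
  int r = waitid ((idtype_t) P_PIDFD, (id_t) pidfd, &info, WEXITED);
  close (pidfd);
  if (r < 0 || info.si_code != CLD_EXITED)
    return -1;
  return info.si_status;
#else
  (void) path;
  errno = ENOSYS;
  return -1;
#endif
}
```

For example, sketch_spawn_and_wait ("/bin/true") would be expected to return
0 where the interface and the binary are available, and -1 otherwise.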

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.

Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid either duplicating the posix_spawn API or
adding a new one.  It also means that both interfaces support the same
attributes and file actions, and any new flag or file action added to
posix_spawn automatically becomes available to pidfd_spawn.  This
includes POSIX_SPAWN_SETCGROUP.

Along with the spawn interface, a fork-like one is also provided:

  typedef union
  {
    struct
    {
      __uint64_t fork_np_flags;
      int fork_np_pidfd;
      int fork_np_cgroup;
      int fork_np_exit_signal;
  #define fork_np_flags       __data.fork_np_flags
  #define fork_np_pidfd       __data.fork_np_pidfd
  #define fork_np_cgroup      __data.fork_np_cgroup
  #define fork_np_exit_signal __data.fork_np_exit_signal
    } __data;
    char __size [FORK_NP_ARGS_SIZE_VER0];
  } fork_np_args_t;

  #define FORK_NP_PIDFD        (1ULL << 1)
  #define FORK_NP_CGROUP       (1ULL << 2)
  #define FORK_NP_ASYNCSAFE    (1ULL << 3)
  #define FORK_NP_EXIT_SIGNAL  (1ULL << 4)

  pid_t fork_np (fork_np_args_t *args, size_t size)

SIZE must match a supported fork_np_args_t layout; otherwise the
function fails with EINVAL.
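
The size check can be pictured with the following sketch.  It is not the
patch's implementation: sketch_args_t, SKETCH_ARGS_SIZE_VER0 (with an
assumed value of 32) and sketch_check_args are hypothetical stand-ins used
only to show how a caller-supplied size selects a known structure version.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the first structure version; the real value of
   FORK_NP_ARGS_SIZE_VER0 is not given in the cover letter.  */
#define SKETCH_ARGS_SIZE_VER0 32

typedef union
{
  struct
  {
    uint64_t flags;
    int pidfd;
    int cgroup;
    int exit_signal;
  } data;
  char size[SKETCH_ARGS_SIZE_VER0];   /* Pads the union to a fixed size.  */
} sketch_args_t;

_Static_assert (sizeof (sketch_args_t) == SKETCH_ARGS_SIZE_VER0,
                "padding must cover the members");

/* Returns 0 if SIZE names a structure version this sketch understands,
   EINVAL otherwise.  A later version would add another accepted size.  */
int
sketch_check_args (const sketch_args_t *args, size_t size)
{
  if (args == NULL)
    return EINVAL;
  switch (size)
    {
    case SKETCH_ARGS_SIZE_VER0:
      return 0;
    default:
      return EINVAL;
    }
}
```

A caller would pass sizeof (sketch_args_t), so a binary built against an
older header keeps working after the structure grows, much like the size
argument of clone3.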

If ARGS has all members set to 0, no file descriptor is returned and
fork_np acts as fork.  If FORK_NP_PIDFD is set in the fork_np_flags
member, a new file descriptor is returned in the fork_np_pidfd member,
and the kernel sets O_CLOEXEC on it by default.  fork_np follows the
fork/_Fork convention of returning a positive or negative value to the
parent (with a negative value indicating an error) and zero to the
child.

If FORK_NP_CGROUP is set, the value of the fork_np_cgroup member is used
as the cgroupv2 in which to place the new process (by using the
CLONE_INTO_CGROUP clone flag).

If FORK_NP_ASYNCSAFE is set, fork_np acts as _Fork, thus avoiding
running the pthread_atfork handlers.

If FORK_NP_EXIT_SIGNAL is set, the signal in fork_np_exit_signal is sent
to the parent on process termination (SIGCHLD is the default).  The
value 0 is also valid, meaning no signal will be sent.

By default, fork_np behaves like fork: it runs the pthread_atfork
handlers, and SIGCHLD is sent to the parent when the new process
terminates.  FORK_NP_ASYNCSAFE and FORK_NP_EXIT_SIGNAL, respectively,
change these defaults.

To interoperate between process IDs and process file descriptors,
pidfd_getpid is also provided:

   pid_t pidfd_getpid (int fd)

It reads the procfs fdinfo entry from the file descriptor to get the
process ID.
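
The lookup can be sketched as below.  This is an illustration rather than
the patch's code: sketch_pidfd_to_pid is a hypothetical helper that uses
stdio for brevity and collapses every failure into -1, whereas the change
log describes distinct error codes for pidfd_getpid.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the process ID that the pidfd FD refers to,
   as seen in the "Pid:" line of /proc/self/fdinfo/FD, or -1 if the entry
   is missing or does not carry a positive PID.  */
pid_t
sketch_pidfd_to_pid (int fd)
{
  char path[64];
  snprintf (path, sizeof path, "/proc/self/fdinfo/%d", fd);

  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;

  char line[256];
  long pid = -1;
  while (fgets (line, sizeof line, f) != NULL)
    /* The match is anchored at the start of the line, so "NSpid:" is
       not mistaken for "Pid:".  */
    if (sscanf (line, "Pid: %ld", &pid) == 1)
      break;
  fclose (f);

  return pid > 0 ? (pid_t) pid : -1;
}
```

Given a descriptor returned by pidfd_open for the calling process, the
helper is expected to return the same value as getpid when procfs is
mounted.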

[1] https://gitlab.gnome.org/GNOME/glib/-/issues/1866
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=30349
[3] https://codebrowser.dev/qt6/qtbase/src/3rdparty/forkfd/forkfd_linux.c.html

---

Changes from v7:
- Redefine __ASSUME_CLONE3 to 0 if the architecture does not support the
  syscall.
- Fixed some failing errors to be reported by spawned processes.
- Fixed pre-commit CI for AArch64 failures.
- Rename pidfd_fork to fork_np and make the API extensible
- Document more possible pidfd_getpid errors.

Changes from v6:
- Rebased against master, adjusted symbol version and NEWS entry.
- Added arm/mips clone3 implementation.

Changes from v5:
- Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.

Changes from v4:
- Changed pidfd_fork signature to return a pid_t instead of the PID file
  descriptor.
- Changed pidfd_getpid to return EBADF for negative input, instead of
  EINVAL.
- Added PIDFDFORK_NOSIGCHLD option.
- Fixed nested __BEGIN_DECLS on spawn.h

Changes from v3:
- Remove strtoul usage.
- Fixed patchwork tst-pidfd_getpid.c regression.
- Fixed manual and NEWS typos.

Changes from v2:
- Added pidfd_fork and pidfd_getpid manual entries
- Change pidfd_fork to act as fork by default, instead of as _Fork.
- Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
- Added pidfd_getpid test for EREMOTE.

Changes from v1:
- Extended pidfd_getpid error codes to return EBADF if fdinfo does not
  have a Pid entry or if the value is invalid, EREMOTE if the PID is in
  a separate namespace, and ESRCH if the process has already terminated.
- Extended tst-pidfd_getpid.
- Rename PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid clashes
  with possible kernel extensions.

Adhemerval Zanella (7):
  arm: Add the clone3 wrapper
  mips: Add the clone3 wrapper
  linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and
    sparc
  linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  posix: Add fork_np (BZ 26371)
  linux: Add pidfd_getpid

 NEWS                                          |  24 ++
 bits/spawn_ext.h                              |  21 ++
 include/clone_internal.h                      |  21 ++
 manual/process.texi                           | 122 ++++++++-
 posix/Makefile                                |   5 +-
 posix/fork-internal.c                         | 127 ++++++++++
 posix/fork-internal.h                         |  36 +++
 posix/fork.c                                  | 107 +-------
 posix/spawn.h                                 |   6 +-
 posix/spawn_int.h                             |   3 +-
 posix/spawnattr_setflags.c                    |   3 +-
 posix/tst-posix_spawn-setsid.c                | 169 +++++++++----
 posix/tst-spawn-chdir.c                       |  15 +-
 posix/tst-spawn.c                             |  24 +-
 posix/tst-spawn.h                             |  36 +++
 posix/tst-spawn2.c                            |  17 +-
 posix/tst-spawn3.c                            | 100 ++++----
 posix/tst-spawn4.c                            |   7 +-
 posix/tst-spawn5.c                            |  14 +-
 posix/tst-spawn6.c                            |  13 +-
 posix/tst-spawn7.c                            |  13 +-
 sysdeps/nptl/_Fork.c                          |   2 +-
 sysdeps/unix/sysv/linux/Makefile              |  29 +++
 sysdeps/unix/sysv/linux/Versions              |   8 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   6 +
 .../unix/sysv/linux/alpha/kernel-features.h   |   4 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   6 +
 sysdeps/unix/sysv/linux/arch-fork.h           |  16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/arm/clone3.S          |  80 ++++++
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/arm/sysdep.h          |   1 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  71 ++++++
 sysdeps/unix/sysv/linux/bits/unistd_ext.h     |  51 ++++
 sysdeps/unix/sysv/linux/clone-internal.c      |  58 ++++-
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |  60 +++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/fork_np.c             |  97 +++++++
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   6 +
 .../unix/sysv/linux/ia64/kernel-features.h    |   4 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   6 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   6 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   6 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   6 +
 .../sysv/linux/microblaze/be/libc.abilist     |   6 +
 .../sysv/linux/microblaze/le/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/mips/clone3.S         | 139 +++++++++++
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   6 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   6 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   6 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/mips/sysdep.h         |   2 +
 .../unix/sysv/linux/nios2/kernel-features.h   |  24 ++
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/pidfd_getpid.c        | 126 ++++++++++
 sysdeps/unix/sysv/linux/pidfd_spawn.c         |  30 +++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c        |  30 +++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   6 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   6 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   6 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/procutils.c           |  97 +++++++
 sysdeps/unix/sysv/linux/procutils.h           |  43 ++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   6 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   6 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   6 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   6 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/sh/kernel-features.h  |   4 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   6 +
 .../unix/sysv/linux/sparc/kernel-features.h   |   4 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   6 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   6 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 ++
 sysdeps/unix/sysv/linux/spawni.c              |  42 +++-
 sysdeps/unix/sysv/linux/sys/pidfd.h           |   4 +
 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c  | 170 +++++++++++++
 sysdeps/unix/sysv/linux/tst-fork_np.c         | 236 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd.c           |  48 ++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c    | 126 ++++++++++
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 223 +++++++++++++++++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c   |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c     |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h     |  63 +++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c    |  20 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   6 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   6 +
 97 files changed, 2947 insertions(+), 267 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/fork_np.c
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

-- 
2.34.1


Thread overview: 29+ messages
2023-08-18 14:06 [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Adhemerval Zanella
2023-08-18 14:06 ` [PATCH v8 1/7] arm: Add the clone3 wrapper Adhemerval Zanella
2023-08-18 14:06 ` [PATCH v8 2/7] mips: " Adhemerval Zanella
2023-08-18 14:06 ` [PATCH v8 3/7] linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc Adhemerval Zanella
2023-08-24  6:06   ` Florian Weimer
2023-08-18 14:06 ` [PATCH v8 4/7] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
2023-08-24  7:00   ` Florian Weimer
2023-08-18 14:06 ` [PATCH v8 5/7] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
2023-08-24  7:13   ` Florian Weimer
2023-08-24 15:43     ` Adhemerval Zanella Netto
2023-08-24 17:00       ` Florian Weimer
2023-08-24 17:10         ` Adhemerval Zanella Netto
2023-08-24 18:18           ` Florian Weimer
2023-08-24 18:22             ` Adhemerval Zanella Netto
2023-08-25 10:38               ` Florian Weimer
2023-08-25 16:37                 ` Adhemerval Zanella Netto
2023-08-18 14:06 ` [PATCH v8 6/7] posix: Add fork_np (BZ 26371) Adhemerval Zanella
2023-08-24  6:07   ` Florian Weimer
2023-08-18 14:06 ` [PATCH v8 7/7] linux: Add pidfd_getpid Adhemerval Zanella
2023-08-24  7:53   ` Florian Weimer
2023-08-18 17:51 ` [PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation Rich Felker
2023-08-18 18:34   ` Adhemerval Zanella Netto
2023-08-28 12:52     ` Luca Boccassi
2023-08-28 13:21       ` Florian Weimer
2023-08-28 13:50         ` Luca Boccassi
2023-08-21  6:53   ` Florian Weimer
2023-08-21 13:55     ` Rich Felker
2023-08-24  7:25       ` Florian Weimer
2023-08-24 12:21         ` Rich Felker
