public inbox for libc-alpha@sourceware.org
* [PATCH v6 0/5] Add pidfd and cgroupv2 support for process creation (resend)
@ 2023-07-06 13:45 Adhemerval Zanella
  2023-07-06 13:45 ` [PATCH v6 1/5] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Adhemerval Zanella @ 2023-07-06 13:45 UTC
  To: libc-alpha, Luca Boccassi, Philip Withnall

glibc 2.36 added wrappers for the Linux syscalls pidfd_open,
pidfd_getfd, and pidfd_send_signal, and exported P_PIDFD for use
along with waitid.  A pidfd is a race-free interface; however,
pidfd_open is subject to TOCTOU if the file descriptor
is not obtained directly from the clone or clone3 syscall (there is
still a small window between the clone return and the pidfd_open
call where the process can be reaped and the process ID reused).

A fully race-free interface built on posix_spawn is being
discussed by GNOME [1] [2], and Qt already uses one in its QtProcess
implementation [3].  The Qt implementation has some pitfalls:

  - It calls clone through the syscall symbol, which does not run the
    pthread_atfork handlers even though it really intends to use the
    clone semantic for fork (by only using CLONE_PIDFD | SIGCHLD).

  - It also does not reset any internal state, such as internal IO,
    malloc, loader, etc. locks.

  - It does not set the TCB tid field nor the robust list, used by
    pthread code.

  - It does not optimize process creation by using CLONE_VM and
    CLONE_VFORK.

Also, recent Linux kernels (starting with 5.7) provide a way to
create a new process in a cgroup version 2 different from the
default one (through the clone3 CLONE_INTO_CGROUP flag).  Providing it
through glibc interfaces makes it usable without the risk of potential
breakage from issuing the clone3 syscall directly (check the BZ#26371
discussion).

This patchset adds new interfaces that take care of these potential
issues.  The new posix_spawn / posix_spawnp extensions:


  #define POSIX_SPAWN_SETCGROUP 0x100

  int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *restrict attr,
                                    int *cgroup);
  int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
                                    int cgroup);

These allow a new process to be spawned in a different cgroupv2.

pidfd_spawn and pidfd_spawnp are similar to posix_spawn and
posix_spawnp, but return a process file descriptor instead of a PID.

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict path,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict file,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict],
                    char *const envp[restrict]);

The implementation requires that the kernel support the complete
pidfd interface, meaning that waitid (P_PIDFD) must be supported.  This
ensures that no racy workaround is required (such as reading the procfs
fdinfo pid to use along with the old wait interfaces).  If the kernel
does not have the required support, the interface returns ENOSYS.

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.

Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid either rehashing the posix_spawn API or
adding a new one.  It also means that both interfaces support the same
attributes and file actions, and any new flag or file action added to
posix_spawn is automatically available for pidfd_spawn.  This includes
POSIX_SPAWN_SETCGROUP.

Along with the spawn interface, a fork-like one is also provided:

  pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)

If pidfd is NULL, no file descriptor is returned and pidfd_fork
acts as fork.  Otherwise, a new file descriptor is returned, and the
kernel already sets O_CLOEXEC on it by default.  pidfd_fork follows
the fork/_Fork convention of returning a positive or negative value to
the parent (with negative indicating an error) and zero to the child.

If cgroup is 0 or a positive value, it is interpreted as the cgroup
in which to place the new process (check the CLONE_INTO_CGROUP clone
flag).

Similar to fork, pidfd_fork also runs the pthread_atfork handlers.
This can be changed with the PIDFDFORK_ASYNCSAFE flag, which makes
pidfd_fork act as _Fork.  It also sends SIGCHLD to the parent when
the process terminates.

To provide a way to interoperate between process IDs and process file
descriptors, pidfd_getpid is also provided:

   pid_t pidfd_getpid (int fd)

It reads the procfs fdinfo entry from the file descriptor to get
the process ID.
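
For readers unfamiliar with the fdinfo format, the following is a
rough sketch of the lookup, not the implementation in this series.
The helper name pid_from_fdinfo is hypothetical.  The error mapping
follows the v1 changelog entry below; the encoding of the "Pid:"
field (0 for a process in another PID namespace, -1 for one that has
already terminated) is my understanding of the kernel behaviour and
should be treated as an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the process ID behind the pidfd FD by
   parsing /proc/self/fdinfo/FD, or -1 with errno set.  */
static pid_t
pid_from_fdinfo (int fd)
{
  if (fd < 0)
    {
      errno = EBADF;
      return -1;
    }

  char path[64];
  snprintf (path, sizeof path, "/proc/self/fdinfo/%d", fd);
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;			/* errno was set by fopen.  */

  char line[128];
  long pid = 0;
  int found = 0;
  while (fgets (line, sizeof line, f) != NULL)
    /* Matches only the "Pid:" line; "NSpid:" starts differently.  */
    if (sscanf (line, "Pid: %ld", &pid) == 1)
      {
	found = 1;
	break;
      }
  fclose (f);

  if (!found)
    {
      errno = EBADF;		/* Not a pidfd: no Pid entry.  */
      return -1;
    }
  if (pid == 0)
    {
      errno = EREMOTE;		/* Assumed: process in another namespace.  */
      return -1;
    }
  if (pid < 0)
    {
      errno = ESRCH;		/* Assumed: process already terminated.  */
      return -1;
    }
  return (pid_t) pid;
}
```

Calling it on a descriptor obtained from pidfd_open (getpid (), 0)
should return the caller's own process ID, provided procfs is mounted.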

---

Changes from v5:
- Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.

Changes from v4:
- Changed pidfd_fork signature to return a pid_t instead of PID file
  descriptor.
- Changed pidfd_getpid to return EBADF for negative input, instead of
  EINVAL.
- Added PIDFDFORK_NOSIGCHLD option.
- Fixed nested __BEGIN_DECLS on spawn.h

Changes from v3:
- Remove strtoul usage.
- Fixed patchwork tst-pidfd_getpid.c regression.
- Fixed manual and NEWS typos.

Changes from v2:
- Added pidfd_fork and pidfd_getpid manual entries
- Changed pidfd_fork to act as fork by default, instead of as _Fork.
- Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
- Added pidfd_getpid test for EREMOTE.

Changes from v1:
- Extended pidfd_getpid error codes to return EBADF if fdinfo does not
  have a Pid entry or if the value is invalid, EREMOTE if the pid is in
  a separate namespace, and ESRCH if the process has already terminated.
- Extended tst-pidfd_getpid.
- Rename PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid clash
  with possible kernel extensions.

Adhemerval Zanella (5):
  linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  posix: Add pidfd_fork (BZ 26371)
  posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork
  linux: Add pidfd_getpid

 NEWS                                          |  22 ++
 bits/spawn_ext.h                              |  21 ++
 include/clone_internal.h                      |  21 ++
 manual/process.texi                           |  92 ++++++-
 posix/Makefile                                |   5 +-
 posix/fork-internal.c                         | 127 ++++++++++
 posix/fork-internal.h                         |  36 +++
 posix/fork.c                                  | 107 +--------
 posix/spawn.h                                 |   6 +-
 posix/spawn_int.h                             |   3 +-
 posix/spawnattr_setflags.c                    |   3 +-
 posix/tst-posix_spawn-setsid.c                | 168 +++++++++----
 posix/tst-spawn-chdir.c                       |  15 +-
 posix/tst-spawn.c                             |  24 +-
 posix/tst-spawn.h                             |  36 +++
 posix/tst-spawn2.c                            |  17 +-
 posix/tst-spawn3.c                            | 100 ++++----
 posix/tst-spawn4.c                            |   7 +-
 posix/tst-spawn5.c                            |  14 +-
 posix/tst-spawn6.c                            |  15 +-
 posix/tst-spawn7.c                            |  13 +-
 sysdeps/nptl/_Fork.c                          |   2 +-
 sysdeps/unix/sysv/linux/Makefile              |  29 +++
 sysdeps/unix/sysv/linux/Versions              |   8 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   6 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   6 +
 sysdeps/unix/sysv/linux/arch-fork.h           |  16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  60 +++++
 sysdeps/unix/sysv/linux/clone-internal.c      |  60 ++++-
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |  58 +++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   6 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   6 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   6 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   6 +
 .../sysv/linux/microblaze/be/libc.abilist     |   6 +
 .../sysv/linux/microblaze/le/libc.abilist     |   6 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   6 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   6 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   6 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   6 +
 sysdeps/unix/sysv/linux/pidfd_fork.c          |  82 +++++++
 sysdeps/unix/sysv/linux/pidfd_getpid.c        | 122 ++++++++++
 sysdeps/unix/sysv/linux/pidfd_spawn.c         |  30 +++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c        |  30 +++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   6 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   6 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   6 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   6 +
 sysdeps/unix/sysv/linux/procutils.c           | 104 ++++++++
 sysdeps/unix/sysv/linux/procutils.h           |  35 +++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   6 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   6 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   6 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   6 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   6 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   6 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   6 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   6 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 +++
 sysdeps/unix/sysv/linux/spawni.c              |  38 ++-
 sysdeps/unix/sysv/linux/sys/pidfd.h           |  25 ++
 sysdeps/unix/sysv/linux/tst-pidfd.c           |  47 ++++
 .../unix/sysv/linux/tst-pidfd_fork-cgroup.c   | 162 +++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_fork.c      | 227 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c    | 187 +++++++++++++++
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 216 +++++++++++++++++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c   |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c     |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h     |  63 +++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c    |  20 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   6 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   6 +
 87 files changed, 2624 insertions(+), 268 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

-- 
2.34.1


Thread overview: 8+ messages
2023-07-06 13:45 [PATCH v6 0/5] Add pidfd and cgroupv2 support for process creation (resend) Adhemerval Zanella
2023-07-06 13:45 ` [PATCH v6 1/5] linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731) Adhemerval Zanella
2023-07-06 13:45 ` [PATCH v6 2/5] posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Adhemerval Zanella
2023-07-06 13:45 ` [PATCH v6 3/5] posix: Add pidfd_fork (BZ 26371) Adhemerval Zanella
2023-07-06 13:45 ` [PATCH v6 4/5] posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork Adhemerval Zanella
2023-07-06 13:45 ` [PATCH v6 5/5] linux: Add pidfd_getpid Adhemerval Zanella
2023-08-08 19:35 ` [PATCH v6 0/5] Add pidfd and cgroupv2 support for process creation (resend) Luca Boccassi
2023-08-10 18:19   ` Adhemerval Zanella Netto
