public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces
@ 2022-05-30 17:49 Adhemerval Zanella
  2022-05-30 17:49 ` [PATCH v5 1/2] support: Add support_enter_time_namespace Adhemerval Zanella
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Adhemerval Zanella @ 2022-05-30 17:49 UTC (permalink / raw)
  To: libc-alpha

The patchset also adds some support to tests the fallback code
to use only use CLONE_VFORK.  It uses unshare directly because
it simpler than add container support.

v5: Use a MAP_SHARED page to communicate error on prepare phase from
    helper process.

Adhemerval Zanella (2):
  support: Add support_enter_time_namespace
  linux: Add fallback for clone failure on posix_spawn (BZ #29115)

 posix/Makefile                         |  16 ++-
 posix/tst-spawn-chdir-timens.c         |   2 +
 posix/tst-spawn-chdir.c                |   7 ++
 posix/tst-spawn-timens.c               |   2 +
 posix/tst-spawn.c                      |   6 ++
 posix/tst-spawn2-timens.c              |   2 +
 posix/tst-spawn2.c                     |   8 ++
 posix/tst-spawn4-timens.c              |   2 +
 posix/tst-spawn4.c                     |  10 +-
 posix/tst-spawn5-timens.c              |   2 +
 posix/tst-spawn5.c                     |  10 +-
 posix/tst-spawn6-timens.c              |   2 +
 posix/tst-spawn6.c                     |  12 ++-
 support/Makefile                       |   1 +
 support/namespace.h                    |   5 +
 support/support_enter_time_namespace.c |  34 ++++++
 sysdeps/unix/sysv/linux/spawni.c       | 141 +++++++++++++++++--------
 17 files changed, 211 insertions(+), 51 deletions(-)
 create mode 100644 posix/tst-spawn-chdir-timens.c
 create mode 100644 posix/tst-spawn-timens.c
 create mode 100644 posix/tst-spawn2-timens.c
 create mode 100644 posix/tst-spawn4-timens.c
 create mode 100644 posix/tst-spawn5-timens.c
 create mode 100644 posix/tst-spawn6-timens.c
 create mode 100644 support/support_enter_time_namespace.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v5 1/2] support: Add support_enter_time_namespace
  2022-05-30 17:49 [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Adhemerval Zanella
@ 2022-05-30 17:49 ` Adhemerval Zanella
  2022-05-30 17:49 ` [PATCH v5 2/2] linux: Add fallback for clone failure on posix_spawn (BZ #29115) Adhemerval Zanella
  2022-07-11 15:32 ` [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Carlos O'Donell
  2 siblings, 0 replies; 8+ messages in thread
From: Adhemerval Zanella @ 2022-05-30 17:49 UTC (permalink / raw)
  To: libc-alpha

Enter a time namespace, where the new namespace isolates clock
values.  It requires either a root-like privileges (done with
support_become_root) or a previous user namespace (CLONE_NEWUSER).

A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time
namespace, but doesn't set it to the current process. Then all
children of the process will be born in the new time namespace.

It will be used on posix_spawn tests to exercise the BZ #29115
fix, where clone (CLONE_VFORK | CLONE_VM) fails if the process
enter a time namespace.
---
 support/Makefile                       |  1 +
 support/namespace.h                    |  5 ++++
 support/support_enter_time_namespace.c | 34 ++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)
 create mode 100644 support/support_enter_time_namespace.c

diff --git a/support/Makefile b/support/Makefile
index 9b50eac117..e4a1402c36 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -56,6 +56,7 @@ libsupport-routines = \
   support_descriptors \
   support_enter_mount_namespace \
   support_enter_network_namespace \
+  support_enter_time_namespace \
   support_format_address_family \
   support_format_addrinfo \
   support_format_dns_packet \
diff --git a/support/namespace.h b/support/namespace.h
index 23bad6403b..338000547c 100644
--- a/support/namespace.h
+++ b/support/namespace.h
@@ -56,6 +56,11 @@ bool support_enter_network_namespace (void);
    not affect the host system afterwards.  */
 bool support_enter_mount_namespace (void);
 
+/* Enter a time namespace, where the new namespace isolates clock
+   values.  It requires either a root-like privileges (done with
+   support_become_root) or a previous user namespace (CLONE_NEWUSER).  */
+bool support_enter_time_namespace (void);
+
 /* Return true if support_enter_network_namespace managed to enter a
    UTS namespace.  */
 bool support_in_uts_namespace (void);
diff --git a/support/support_enter_time_namespace.c b/support/support_enter_time_namespace.c
new file mode 100644
index 0000000000..a18caa878a
--- /dev/null
+++ b/support/support_enter_time_namespace.c
@@ -0,0 +1,34 @@
+/* Enter a time namespace.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <support/namespace.h>
+
+#include <sched.h>
+#include <stdio.h>
+
+bool
+support_enter_time_namespace (void)
+{
+#ifdef CLONE_NEWTIME
+  if (unshare (CLONE_NEWTIME) == 0)
+    return true;
+  else
+    printf ("warning: unshare (CLONE_NEWTIME) failed: %m\n");
+#endif /* CLONE_NEWNS */
+  return false;
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v5 2/2] linux: Add fallback for clone failure on posix_spawn (BZ #29115)
  2022-05-30 17:49 [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Adhemerval Zanella
  2022-05-30 17:49 ` [PATCH v5 1/2] support: Add support_enter_time_namespace Adhemerval Zanella
@ 2022-05-30 17:49 ` Adhemerval Zanella
  2022-07-11 15:32 ` [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Carlos O'Donell
  2 siblings, 0 replies; 8+ messages in thread
From: Adhemerval Zanella @ 2022-05-30 17:49 UTC (permalink / raw)
  To: libc-alpha

Linux clone with CLONE_VM may fail for some namespace restriction (for
instance if kernel does not allow processes in different time namespaces
to share the sameaddress space).

In this case clone fails with EINVAL and posix_spawn can not spawn a new
process.  However the same process can be spawned with fork and exec.

The patch fixes by retrying the clone syscall with just CLONE_VFORK
if clone fails with a non transient failure (ENOMEM and EAGAIN still
returns failure to caller).

Error on preparation phase or execve is returned by through a shared
memory for the case of only CLONE_VFORK.  Failure on prepare phase in
helper process does not trigger the fork and exec fallback.

Some spawn tests were adapted to run inside a created time namespace.

Checked on x86_64-linux-gnu.
---
 posix/Makefile                   |  16 +++-
 posix/tst-spawn-chdir-timens.c   |   2 +
 posix/tst-spawn-chdir.c          |   7 ++
 posix/tst-spawn-timens.c         |   2 +
 posix/tst-spawn.c                |   6 ++
 posix/tst-spawn2-timens.c        |   2 +
 posix/tst-spawn2.c               |   8 ++
 posix/tst-spawn4-timens.c        |   2 +
 posix/tst-spawn4.c               |  10 ++-
 posix/tst-spawn5-timens.c        |   2 +
 posix/tst-spawn5.c               |  10 ++-
 posix/tst-spawn6-timens.c        |   2 +
 posix/tst-spawn6.c               |  12 ++-
 sysdeps/unix/sysv/linux/spawni.c | 141 +++++++++++++++++++++----------
 14 files changed, 171 insertions(+), 51 deletions(-)
 create mode 100644 posix/tst-spawn-chdir-timens.c
 create mode 100644 posix/tst-spawn-timens.c
 create mode 100644 posix/tst-spawn2-timens.c
 create mode 100644 posix/tst-spawn4-timens.c
 create mode 100644 posix/tst-spawn5-timens.c
 create mode 100644 posix/tst-spawn6-timens.c

diff --git a/posix/Makefile b/posix/Makefile
index cfebb8ef06..e21abbe637 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -109,7 +109,9 @@ tests		:= test-errno tstgetopt testfnm runtests runptests \
 		   tst-glob-tilde test-ssize-max tst-spawn4 bug-regex37 \
 		   bug-regex38 tst-regcomp-truncated tst-spawn-chdir \
 		   tst-wordexp-nocmd tst-execveat tst-spawn5 \
-		   tst-sched_getaffinity tst-spawn6
+		   tst-sched_getaffinity tst-spawn6 \
+		   tst-spawn-chdir-timens tst-spawn4-timens tst-spawn5-timens \
+		   tst-spawn6-timens
 
 # Test for the glob symbol version that was replaced in glibc 2.27.
 ifeq ($(have-GLIBC_2.26)$(build-shared),yesyes)
@@ -130,7 +132,15 @@ xtests		:= tst-getaddrinfo4 tst-getaddrinfo5 tst-sched_rr_get_interval
 xtests-time64	:= tst-sched_rr_get_interval-time64
 ifeq (yes,$(build-shared))
 test-srcs	:= globtest
-tests           += wordexp-test tst-exec tst-spawn tst-spawn2 tst-spawn3
+tests           += \
+  tst-exec \
+  tst-spawn \
+  tst-spawn-timens \
+  tst-spawn2 \
+  tst-spawn2-timens \
+  tst-spawn3 \
+  wordexp-test \
+  # tests
 endif
 ifeq (yesyes,$(build-shared)$(have-thread-library))
 tests		+= tst-getopt-cancel tst-_Fork
@@ -290,7 +300,9 @@ tst-execvpe5-ARGS = -- $(host-test-program-cmd)
 tst-spawn-ARGS = -- $(host-test-program-cmd)
 tst-spawn-static-ARGS = $(tst-spawn-ARGS)
 tst-spawn5-ARGS = -- $(host-test-program-cmd)
+tst-spawn5-timens-ARGS = -- $(host-test-program-cmd)
 tst-spawn6-ARGS = -- $(host-test-program-cmd)
+tst-spawn6-timens-ARGS = -- $(host-test-program-cmd)
 tst-dir-ARGS = `pwd` `cd $(common-objdir)/$(subdir); pwd` `cd $(common-objdir); pwd` $(objpfx)tst-dir
 tst-chmod-ARGS = $(objdir)
 tst-vfork3-ARGS = --test-dir=$(objpfx)
diff --git a/posix/tst-spawn-chdir-timens.c b/posix/tst-spawn-chdir-timens.c
new file mode 100644
index 0000000000..a659e3201e
--- /dev/null
+++ b/posix/tst-spawn-chdir-timens.c
@@ -0,0 +1,2 @@
+#define CHECK_TIMENAMESPACE
+#include "tst-spawn-chdir.c"
diff --git a/posix/tst-spawn-chdir.c b/posix/tst-spawn-chdir.c
index 18e90c2957..cadcf6ddc2 100644
--- a/posix/tst-spawn-chdir.c
+++ b/posix/tst-spawn-chdir.c
@@ -29,6 +29,7 @@
 #include <support/test-driver.h>
 #include <support/xstdio.h>
 #include <support/xunistd.h>
+#include <support/namespace.h>
 #include <unistd.h>
 
 /* Reads the file at PATH, which must consist of exactly one line.
@@ -87,6 +88,12 @@ add_chdir (posix_spawn_file_actions_t *actions, const char *path,
 static int
 do_test (void)
 {
+#ifdef CHECK_TIMENAMESPACE
+  support_become_root();
+  if (!support_enter_time_namespace ())
+    return EXIT_UNSUPPORTED;
+#endif
+
   /* Directory for temporary file data.  Each subtest uses a numeric
      subdirectory.  */
   char *directory = support_create_temp_directory ("tst-spawn-chdir-");
diff --git a/posix/tst-spawn-timens.c b/posix/tst-spawn-timens.c
new file mode 100644
index 0000000000..a659e3201e
--- /dev/null
+++ b/posix/tst-spawn-timens.c
@@ -0,0 +1,2 @@
+#define CHECK_TIMENAMESPACE
+#include "tst-spawn-chdir.c"
diff --git a/posix/tst-spawn.c b/posix/tst-spawn.c
index e378c72f1b..2007384a90 100644
--- a/posix/tst-spawn.c
+++ b/posix/tst-spawn.c
@@ -178,6 +178,12 @@ do_test (int argc, char *argv[])
     return handle_restart (argv[1], argv[2], argv[3], argv[4], argv[5],
 			   argv[6]);
 
+#ifdef CHECK_TIMENAMESPACE
+  support_become_root();
+  if (!support_enter_time_namespace ())
+    return EXIT_UNSUPPORTED;
+#endif
+
   /* Prepare the test.  We are creating four files: two which file descriptor
      will be marked with FD_CLOEXEC, another which is not.  */
 
diff --git a/posix/tst-spawn2-timens.c b/posix/tst-spawn2-timens.c
new file mode 100644
index 0000000000..612531fc86
--- /dev/null
+++ b/posix/tst-spawn2-timens.c
@@ -0,0 +1,2 @@
+#define CHECK_TIMENAMESPACE
+#include "tst-spawn2.c"
diff --git a/posix/tst-spawn2.c b/posix/tst-spawn2.c
index 45ff17f9a7..16fcbe4295 100644
--- a/posix/tst-spawn2.c
+++ b/posix/tst-spawn2.c
@@ -26,10 +26,18 @@
 #include <stdio.h>
 
 #include <support/check.h>
+#include <support/test-driver.h>
+#include <support/namespace.h>
 
 int
 do_test (void)
 {
+#ifdef CHECK_TIMENAMESPACE
+  support_become_root();
+  if (!support_enter_time_namespace ())
+    return EXIT_UNSUPPORTED;
+#endif
+
   /* Check if posix_spawn correctly returns an error and an invalid pid
      by trying to spawn an invalid binary.  */
 
diff --git a/posix/tst-spawn4-timens.c b/posix/tst-spawn4-timens.c
new file mode 100644
index 0000000000..3f4e8ea089
--- /dev/null
+++ b/posix/tst-spawn4-timens.c
@@ -0,0 +1,2 @@
+#define CHECK_TIMENAMESPACE
+#include "tst-spawn4.c"
diff --git a/posix/tst-spawn4.c b/posix/tst-spawn4.c
index 9eac2d1080..29b7a6215d 100644
--- a/posix/tst-spawn4.c
+++ b/posix/tst-spawn4.c
@@ -21,13 +21,21 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include <support/xunistd.h>
 #include <support/check.h>
+#include <support/namespace.h>
 #include <support/temp_file.h>
+#include <support/test-driver.h>
+#include <support/xunistd.h>
 
 static int
 do_test (void)
 {
+#ifdef CHECK_TIMENAMESPACE
+  support_become_root();
+  if (!support_enter_time_namespace ())
+    return EXIT_UNSUPPORTED;
+#endif
+
   char *scriptname;
   int fd = create_temp_file ("tst-spawn4.", &scriptname);
   TEST_VERIFY_EXIT (fd >= 0);
diff --git a/posix/tst-spawn5-timens.c b/posix/tst-spawn5-timens.c
new file mode 100644
index 0000000000..8d46819222
--- /dev/null
+++ b/posix/tst-spawn5-timens.c
@@ -0,0 +1,2 @@
+#define CHECK_TIMENAMESPACE
+#include "tst-spawn5.c"
diff --git a/posix/tst-spawn5.c b/posix/tst-spawn5.c
index dd5b3f9357..045ab15358 100644
--- a/posix/tst-spawn5.c
+++ b/posix/tst-spawn5.c
@@ -28,8 +28,10 @@
 #include <limits.h>
 
 #include <support/check.h>
-#include <support/xunistd.h>
+#include <support/namespace.h>
 #include <support/support.h>
+#include <support/test-driver.h>
+#include <support/xunistd.h>
 
 #include <arch-fd_to_filename.h>
 #include <array_length.h>
@@ -278,6 +280,12 @@ do_test (int argc, char *argv[])
     /* Ignore the application name. */
     handle_restart (argc - 1, &argv[1]);
 
+#ifdef CHECK_TIMENAMESPACE
+  support_become_root();
+  if (!support_enter_time_namespace ())
+    return EXIT_UNSUPPORTED;
+#endif
+
   TEST_VERIFY_EXIT (argc == 2 || argc == 5);
 
   int i;
diff --git a/posix/tst-spawn6-timens.c b/posix/tst-spawn6-timens.c
new file mode 100644
index 0000000000..a23d068ce5
--- /dev/null
+++ b/posix/tst-spawn6-timens.c
@@ -0,0 +1,2 @@
+#define CHECK_TIMENAMESPACE
+#include "tst-spawn6.c"
diff --git a/posix/tst-spawn6.c b/posix/tst-spawn6.c
index 044abd8535..4c55a1c110 100644
--- a/posix/tst-spawn6.c
+++ b/posix/tst-spawn6.c
@@ -25,12 +25,14 @@
 #include <spawn.h>
 #include <stdbool.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
 #include <support/check.h>
+#include <support/namespace.h>
+#include <support/test-driver.h>
 #include <support/xunistd.h>
-#include <sys/wait.h>
 #include <sys/ioctl.h>
-#include <stdlib.h>
+#include <sys/wait.h>
 #include <termios.h>
 
 #ifndef PATH_MAX
@@ -124,6 +126,12 @@ run_subprogram (int argc, char *argv[], const posix_spawnattr_t *attr,
 static int
 run_test (int argc, char *argv[])
 {
+#ifdef CHECK_TIMENAMESPACE
+  support_become_root();
+  if (!support_enter_time_namespace ())
+    return EXIT_UNSUPPORTED;
+#endif
+
   /* We must have either:
      - four parameters left if called initially:
        + path to ld.so         optional
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index d6f5ca89cd..b57db64698 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -44,7 +44,12 @@
    third issue is done by a stack allocation in parent, and by using a
    field in struct spawn_args where the child can write an error
    code. CLONE_VFORK ensures that the parent does not run until the
-   child has either exec'ed successfully or exited.  */
+   child has either exec'ed successfully or exited.
+
+   If the clone with CLONE_VM and CLONE_VFORK fails (due any kernel limitation
+   such as time namespace), only CLONE_VFORK is used instead and the
+   preparation and execve failures are communicated through a shared
+   memory.  */
 
 
 /* The Unix standard contains a long explanation of the way to signal
@@ -67,6 +72,7 @@ struct posix_spawn_args
   char *const *envp;
   int xflags;
   int err;
+  int *errp;
 };
 
 /* Older version requires that shell script without shebang definition
@@ -97,8 +103,8 @@ maybe_script_execute (struct posix_spawn_args *args)
 /* Function used in the clone call to setup the signals mask, posix_spawn
    attributes, and file actions.  It run on its own stack (provided by the
    posix_spawn call).  */
-static int
-__spawni_child (void *arguments)
+static _Noreturn int
+spawni_child (void *arguments)
 {
   struct posix_spawn_args *args = arguments;
   const posix_spawnattr_t *restrict attr = args->attr;
@@ -300,10 +306,91 @@ fail:
      (EINTERNALBUG) describing that, use ECHILD.  Another option would
      be to set args->err to some negative sentinel and have the parent
      abort(), but that seems needlessly harsh.  */
-  args->err = errno ? : ECHILD;
+  *args->errp = errno ? : ECHILD;
+
   _exit (SPAWN_ERROR);
 }
 
+/* Spawn a new process using clone with FLAGS, STACK, and STACK_SIZE and
+   return TRUE if the helper was created or if the failure was not due
+   resource exhaustion.  */
+static bool
+spawni_clone (struct posix_spawn_args *args, pid_t *new_pid, int *ec,
+	      int flags, void *stack, size_t stack_size)
+{
+  /* The clone flags used will create a new child that will run in the same
+     memory space (CLONE_VM) and the execution of calling thread will be
+     suspend until the child calls execve or _exit.
+
+     Also since the calling thread execution will be suspend, there is not
+     need for CLONE_SETTLS.  Although parent and child share the same TLS
+     namespace, there will be no concurrent access for TLS variables (errno
+     for instance).  */
+  struct clone_args clone_args =
+    {
+      .flags = flags,
+      .exit_signal = SIGCHLD,
+      .stack = (uintptr_t) stack,
+      .stack_size = stack_size,
+    };
+  *new_pid = __clone_internal (&clone_args, spawni_child, args);
+
+  /* It needs to collect the case where the auxiliary process was created
+     but failed to execute the file (due either any preparation step or
+     for execve itself).  */
+  if (*new_pid > 0)
+    {
+      /* Also, it handles the unlikely case where the auxiliary process was
+	 terminated before calling execve as if it was successfully.  The
+	 args.err is set to 0 as default and changed to a positive value
+	 only in case of failure, so in case of premature termination
+	 due a signal args.err will remain zeroed and it will be up to
+	 caller to actually collect it.  */
+      *ec = *args->errp;
+      if (*ec > 0)
+	/* There still an unlikely case where the child is cancelled after
+	   setting args.err, due to a positive error value.  Also there is
+	   possible pid reuse race (where the kernel allocated the same pid
+	   to an unrelated process).  Unfortunately due synchronization
+	   issues where the kernel might not have the process collected
+	   the waitpid below can not use WNOHANG.  */
+	__waitpid (*new_pid, NULL, 0);
+    }
+  else
+    *ec = errno;
+
+  /* There is no much point in retrying with fork and exec if kernel returns a
+     failure due resource exhaustion.  */
+  return *new_pid > 0 || (errno == ENOMEM || errno == EAGAIN);
+}
+
+static bool
+spawni_clone_vfork_vm (struct posix_spawn_args *args, pid_t *new_pid, int *ec,
+		       void *stack, size_t stack_size)
+{
+  args->errp = &args->err;
+  bool r = spawni_clone (args, new_pid, ec, CLONE_VM | CLONE_VFORK,
+			 stack, stack_size);
+  return r;
+}
+
+static void
+spawni_clone_vfork (struct posix_spawn_args *args, pid_t *new_pid, int *ec,
+		    void *stack, size_t stack_size)
+{
+  int *errmap = __mmap (NULL, sizeof (int), PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+  if (__glibc_unlikely (errmap == MAP_FAILED))
+    {
+      *ec = errno;
+      return;
+    }
+
+  args->errp = errmap;
+  spawni_clone (args, new_pid, ec, CLONE_VFORK, stack, stack_size);
+  __munmap (errmap, sizeof (int));
+}
+
 /* Spawn a new process executing PATH with the attributes describes in *ATTRP.
    Before running the process perform the actions described in FILE-ACTIONS. */
 static int
@@ -370,47 +457,11 @@ __spawnix (pid_t * pid, const char *file,
 
   __libc_signal_block_all (&args.oldmask);
 
-  /* The clone flags used will create a new child that will run in the same
-     memory space (CLONE_VM) and the execution of calling thread will be
-     suspend until the child calls execve or _exit.
-
-     Also since the calling thread execution will be suspend, there is not
-     need for CLONE_SETTLS.  Although parent and child share the same TLS
-     namespace, there will be no concurrent access for TLS variables (errno
-     for instance).  */
-  struct clone_args clone_args =
-    {
-      .flags = CLONE_VM | CLONE_VFORK,
-      .exit_signal = SIGCHLD,
-      .stack = (uintptr_t) stack,
-      .stack_size = stack_size,
-    };
-  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
-
-  /* It needs to collect the case where the auxiliary process was created
-     but failed to execute the file (due either any preparation step or
-     for execve itself).  */
-  if (new_pid > 0)
-    {
-      /* Also, it handles the unlikely case where the auxiliary process was
-	 terminated before calling execve as if it was successfully.  The
-	 args.err is set to 0 as default and changed to a positive value
-	 only in case of failure, so in case of premature termination
-	 due a signal args.err will remain zeroed and it will be up to
-	 caller to actually collect it.  */
-      ec = args.err;
-      if (ec > 0)
-	/* There still an unlikely case where the child is cancelled after
-	   setting args.err, due to a positive error value.  Also there is
-	   possible pid reuse race (where the kernel allocated the same pid
-	   to an unrelated process).  Unfortunately due synchronization
-	   issues where the kernel might not have the process collected
-	   the waitpid below can not use WNOHANG.  */
-	__waitpid (new_pid, NULL, 0);
-    }
-  else
-    ec = errno;
-
+  /* First try with CLONE_VM | CLONE_VFORK with errors being communicate through
+     ARGS.  If clone fails, only CLONE_VFORK is used and error status is passed
+     through a MAP_SHARED memory.  */
+  if (!spawni_clone_vfork_vm (&args, &new_pid, &ec, stack, stack_size))
+    spawni_clone_vfork (&args, &new_pid, &ec, stack, stack_size);
   __munmap (stack, stack_size);
 
   if ((ec == 0) && (pid != NULL))
-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces
  2022-05-30 17:49 [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Adhemerval Zanella
  2022-05-30 17:49 ` [PATCH v5 1/2] support: Add support_enter_time_namespace Adhemerval Zanella
  2022-05-30 17:49 ` [PATCH v5 2/2] linux: Add fallback for clone failure on posix_spawn (BZ #29115) Adhemerval Zanella
@ 2022-07-11 15:32 ` Carlos O'Donell
  2022-07-11 16:56   ` Adhemerval Zanella
  2 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2022-07-11 15:32 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 5/30/22 13:49, Adhemerval Zanella via Libc-alpha wrote:
> The patchset also adds some support to tests the fallback code
> to use only use CLONE_VFORK.  It uses unshare directly because
> it simpler than add container support.
> 
> v5: Use a MAP_SHARED page to communicate error on prepare phase from
>     helper process.

The current progress is that it seems we may get the upstream Linux kernel
to change the semantics of the time namespace to take effect only after
exec for vfork. If that goes forward we won't need these changes?

I'm trying to determine if these changes constitute a blocker for 2.36.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces
  2022-07-11 15:32 ` [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Carlos O'Donell
@ 2022-07-11 16:56   ` Adhemerval Zanella
  2022-07-19  2:33     ` Carlos O'Donell
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella @ 2022-07-11 16:56 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha



> On 11 Jul 2022, at 12:32, Carlos O'Donell <carlos@redhat.com> wrote:
> 
> On 5/30/22 13:49, Adhemerval Zanella via Libc-alpha wrote:
>> The patchset also adds some support to tests the fallback code
>> to use only use CLONE_VFORK.  It uses unshare directly because
>> it simpler than add container support.
>> 
>> v5: Use a MAP_SHARED page to communicate error on prepare phase from
>>    helper process.
> 
> The current progress is that it seems we may get the upstream Linux kernel
> to change the semantics of the time namespace to take effect only after
> exec for vfork. If that goes forward we won't need these changes?
> 
> I'm trying to determine if these changes constitute a blocker for 2.36.

My understanding is even kernel does change or add a new flag to handle
it, we still have released kernels that might hit this issue.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces
  2022-07-11 16:56   ` Adhemerval Zanella
@ 2022-07-19  2:33     ` Carlos O'Donell
  2022-07-19 12:56       ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2022-07-19  2:33 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On 7/11/22 12:56, Adhemerval Zanella wrote:
> 
> 
>> On 11 Jul 2022, at 12:32, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> On 5/30/22 13:49, Adhemerval Zanella via Libc-alpha wrote:
>>> The patchset also adds some support to tests the fallback code
>>> to use only use CLONE_VFORK.  It uses unshare directly because
>>> it simpler than add container support.
>>>
>>> v5: Use a MAP_SHARED page to communicate error on prepare phase from
>>>    helper process.
>>
>> The current progress is that it seems we may get the upstream Linux kernel
>> to change the semantics of the time namespace to take effect only after
>> exec for vfork. If that goes forward we won't need these changes?
>>
>> I'm trying to determine if these changes constitute a blocker for 2.36.
> 
> My understanding is even kernel does change or add a new flag to handle
> it, we still have released kernels that might hit this issue.
 
Thanks. In which case I don't consider this a blocker for 2.36. It's a bug that we can fix
in 2.37 and backport to 2.36 when we've completed the review. I've started looking at these
changes but I think we need more testing in downstream. If we fix this when 2.37 opens we
can start testing it in Fedora Rawhide early, and backport after we know we don't have
further issues.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces
  2022-07-19  2:33     ` Carlos O'Donell
@ 2022-07-19 12:56       ` Adhemerval Zanella Netto
  2022-07-29 15:49         ` Carlos O'Donell
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-19 12:56 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha



On 18/07/22 23:33, Carlos O'Donell wrote:
> On 7/11/22 12:56, Adhemerval Zanella wrote:
>>
>>
>>> On 11 Jul 2022, at 12:32, Carlos O'Donell <carlos@redhat.com> wrote:
>>>
>>> On 5/30/22 13:49, Adhemerval Zanella via Libc-alpha wrote:
>>>> The patchset also adds some support to tests the fallback code
>>>> to use only use CLONE_VFORK.  It uses unshare directly because
>>>> it simpler than add container support.
>>>>
>>>> v5: Use a MAP_SHARED page to communicate error on prepare phase from
>>>>    helper process.
>>>
>>> The current progress is that it seems we may get the upstream Linux kernel
>>> to change the semantics of the time namespace to take effect only after
>>> exec for vfork. If that goes forward we won't need these changes?
>>>
>>> I'm trying to determine if these changes constitute a blocker for 2.36.
>>
>> My understanding is even kernel does change or add a new flag to handle
>> it, we still have released kernels that might hit this issue.
>  
> Thanks. In which case I don't consider this a blocker for 2.36. It's a bug that we can fix
> in 2.37 and backport to 2.36 when we've completed the review. I've started looking at these
> changes but I think we need more testing in downstream. If we fix this when 2.37 opens we
> can start testing it in Fedora Rawhide early, and backport after we know we don't have
> further issues.
> 

I agree, I have not added it as a 2.36 as blocker although it would be
a good thing to have.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces
  2022-07-19 12:56       ` Adhemerval Zanella Netto
@ 2022-07-29 15:49         ` Carlos O'Donell
  0 siblings, 0 replies; 8+ messages in thread
From: Carlos O'Donell @ 2022-07-29 15:49 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

On 7/19/22 08:56, Adhemerval Zanella Netto wrote:
> 
> 
> On 18/07/22 23:33, Carlos O'Donell wrote:
>> On 7/11/22 12:56, Adhemerval Zanella wrote:
>>>
>>>
>>>> On 11 Jul 2022, at 12:32, Carlos O'Donell <carlos@redhat.com> wrote:
>>>>
>>>> On 5/30/22 13:49, Adhemerval Zanella via Libc-alpha wrote:
>>>>> The patchset also adds some support to tests the fallback code
>>>>> to use only use CLONE_VFORK.  It uses unshare directly because
>>>>> it simpler than add container support.
>>>>>
>>>>> v5: Use a MAP_SHARED page to communicate error on prepare phase from
>>>>>    helper process.
>>>>
>>>> The current progress is that it seems we may get the upstream Linux kernel
>>>> to change the semantics of the time namespace to take effect only after
>>>> exec for vfork. If that goes forward we won't need these changes?
>>>>
>>>> I'm trying to determine if these changes constitute a blocker for 2.36.
>>>
>>> My understanding is even kernel does change or add a new flag to handle
>>> it, we still have released kernels that might hit this issue.
>>  
>> Thanks. In which case I don't consider this a blocker for 2.36. It's a bug that we can fix
>> in 2.37 and backport to 2.36 when we've completed the review. I've started looking at these
>> changes but I think we need more testing in downstream. If we fix this when 2.37 opens we
>> can start testing it in Fedora Rawhide early, and backport after we know we don't have
>> further issues.
>>
> 
> I agree, I have not added it as a 2.36 as blocker although it would be
> a good thing to have.
 
We'll review this again when 2.37 opens.

Since the details are hidden behind the interfaces in spawni.c it could be backported safely.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-07-29 15:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-30 17:49 [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Adhemerval Zanella
2022-05-30 17:49 ` [PATCH v5 1/2] support: Add support_enter_time_namespace Adhemerval Zanella
2022-05-30 17:49 ` [PATCH v5 2/2] linux: Add fallback for clone failure on posix_spawn (BZ #29115) Adhemerval Zanella
2022-07-11 15:32 ` [PATCH v5 0/2] Linux: Fix posix_spawn when user with time namespaces Carlos O'Donell
2022-07-11 16:56   ` Adhemerval Zanella
2022-07-19  2:33     ` Carlos O'Donell
2022-07-19 12:56       ` Adhemerval Zanella Netto
2022-07-29 15:49         ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).