public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3
@ 2022-09-30 19:26 Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 1/9] linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL Adhemerval Zanella
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

Currently posix_spawn has to issue at least NSIG sigaction syscalls
to obtain current signal disposition and more if they there are not
either SIG_IGN or SIG_DFL.  The new clone3 CLONE_CLEAR_SIGHAND flags
introduced with Linux 5.5 resets all signal handlers of the child not
set to SIG_IGN to SIG_DFL, thus allowing posix_spawn to skip this
preparation phase.

The exception is the signals defined by posix_spawnattr_setsigdefault
when POSIX_SPAWN_SETSIGDEF is set (since they can be SIG_IGN).  In
this case posix_spawn helper process needs to issue a sigction to 
set the signal disposition to SIG_DFL.

The patchset also adds clone3 implementation for aarch64, powerpc64,
s390x, riscv, arm, and mips.

Changes from v1:
  * Adeed arm and mips clone3 implementation.

Adhemerval Zanella (9):
  linux: Do not reset signal handler in posix_spawn if it is already
    SIG_DFL
  linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn
  powerpc64le: Add the clone3 wrapper
  aarch64: Add the clone3 wrapper
  s390x: Add the clone3 wrapper
  riscv: Add the clone3 wrapper
  arm: Add the clone3 wrapper
  mips: Add the clone3 wrapper
  Linux: optimize clone3 internal usage

 include/clone_internal.h                      |  10 +
 posix/Makefile                                |   3 +-
 posix/tst-spawn7.c                            | 179 ++++++++++++++++++
 sysdeps/unix/sysv/linux/aarch64/clone3.S      |  90 +++++++++
 sysdeps/unix/sysv/linux/aarch64/sysdep.h      |   2 +
 sysdeps/unix/sysv/linux/arm/clone3.S          |  80 ++++++++
 sysdeps/unix/sysv/linux/arm/sysdep.h          |   1 +
 sysdeps/unix/sysv/linux/clone-internal.c      |  59 ++++--
 sysdeps/unix/sysv/linux/clone3.h              |   6 +
 sysdeps/unix/sysv/linux/kernel-features.h     |   9 +
 sysdeps/unix/sysv/linux/mips/clone3.S         | 147 ++++++++++++++
 sysdeps/unix/sysv/linux/mips/sysdep.h         |   2 +
 .../sysv/linux/powerpc/powerpc64/clone3.S     | 145 ++++++++++++++
 sysdeps/unix/sysv/linux/powerpc/sysdep.h      |   1 +
 sysdeps/unix/sysv/linux/riscv/clone3.S        |  83 ++++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
 sysdeps/unix/sysv/linux/s390/s390-64/clone3.S |  84 ++++++++
 sysdeps/unix/sysv/linux/s390/sysdep.h         |   1 +
 sysdeps/unix/sysv/linux/spawni.c              |  33 +++-
 19 files changed, 910 insertions(+), 26 deletions(-)
 create mode 100644 posix/tst-spawn7.c
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/s390/s390-64/clone3.S

-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/9] linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2023-01-11 21:06   ` Carlos O'Donell
  2022-09-30 19:26 ` [PATCH v2 2/9] linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn Adhemerval Zanella
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

There is no need to issue another sigaction is the disposition is
already SIG_DFL.

Checked on x86_64-linux-gnu.
---
 sysdeps/unix/sysv/linux/spawni.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index ee843a2247..65ee03c804 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -129,7 +129,7 @@ __spawni_child (void *arguments)
 	  else
 	    {
 	      __libc_sigaction (sig, 0, &sa);
-	      if (sa.sa_handler == SIG_IGN)
+	      if (sa.sa_handler == SIG_IGN || sa.sa_handler == SIG_DFL)
 		continue;
 	      sa.sa_handler = SIG_DFL;
 	    }
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 2/9] linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 1/9] linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2023-01-11 21:06   ` Carlos O'Donell
  2022-09-30 19:26 ` [PATCH v2 3/9] powerpc64le: Add the clone3 wrapper Adhemerval Zanella
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

The clone3 flag resets all signal handlers of the child not set to
SIG_IGN to SIG_DFL.  It allows to skip most of the sigaction calls
to setup child signal handling, where previously a posix_spawn
has to issue 2 times NSIG sigaction calls (one to obtain the current
disposition and another to set either SIG_DFL or SIG_IGN).

The expection is POSIX_SPAWN_SETSIGDEF the child still setup the
signal for the case the disposition is SIG_IGN.

It also need to handle the fallback where clone3 is not available,
to set the fallback in child.  This is done by splitting of
__clone_internal_fallback from __clone_internal.

Checked on x86_64-linux-gnu.
---
 include/clone_internal.h                 |   5 +
 posix/Makefile                           |   3 +-
 posix/tst-spawn7.c                       | 179 +++++++++++++++++++++++
 sysdeps/unix/sysv/linux/clone-internal.c |  37 +++--
 sysdeps/unix/sysv/linux/clone3.h         |   6 +
 sysdeps/unix/sysv/linux/spawni.c         |  31 ++--
 6 files changed, 236 insertions(+), 25 deletions(-)
 create mode 100644 posix/tst-spawn7.c

diff --git a/include/clone_internal.h b/include/clone_internal.h
index 4b23ef33ce..320640e64b 100644
--- a/include/clone_internal.h
+++ b/include/clone_internal.h
@@ -7,6 +7,11 @@ extern __typeof (clone3) __clone3;
    -1 with ENOSYS, fall back to clone or clone2.  */
 extern int __clone_internal (struct clone_args *__cl_args,
 			     int (*__func) (void *__arg), void *__arg);
+/* The fallback code which calls clone/clone2 based on clone3 arguments.  */
+extern int __clone_internal_fallback (struct clone_args *__cl_args,
+				      int (*__func) (void *__arg),
+				      void *__arg)
+     attribute_hidden;
 
 #ifndef _ISOMAC
 libc_hidden_proto (__clone3)
diff --git a/posix/Makefile b/posix/Makefile
index d1df7c27cb..7887eb1c8b 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -109,7 +109,7 @@ tests		:= test-errno tstgetopt testfnm runtests runptests \
 		   tst-glob-tilde test-ssize-max tst-spawn4 bug-regex37 \
 		   bug-regex38 tst-regcomp-truncated tst-spawn-chdir \
 		   tst-wordexp-nocmd tst-execveat tst-spawn5 \
-		   tst-sched_getaffinity tst-spawn6
+		   tst-sched_getaffinity tst-spawn6 tst-spawn7
 
 # Test for the glob symbol version that was replaced in glibc 2.27.
 ifeq ($(have-GLIBC_2.26)$(build-shared),yesyes)
@@ -291,6 +291,7 @@ tst-spawn-ARGS = -- $(host-test-program-cmd)
 tst-spawn-static-ARGS = $(tst-spawn-ARGS)
 tst-spawn5-ARGS = -- $(host-test-program-cmd)
 tst-spawn6-ARGS = -- $(host-test-program-cmd)
+tst-spawn7-ARGS = -- $(host-test-program-cmd)
 tst-dir-ARGS = `pwd` `cd $(common-objdir)/$(subdir); pwd` `cd $(common-objdir); pwd` $(objpfx)tst-dir
 tst-chmod-ARGS = $(objdir)
 tst-vfork3-ARGS = --test-dir=$(objpfx)
diff --git a/posix/tst-spawn7.c b/posix/tst-spawn7.c
new file mode 100644
index 0000000000..7cb7347c58
--- /dev/null
+++ b/posix/tst-spawn7.c
@@ -0,0 +1,179 @@
+/* Tests for posix_spawn signal handling.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <getopt.h>
+#include <spawn.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/xsignal.h>
+#include <support/xunistd.h>
+#include <unistd.h>
+
+/* Nonzero if the program gets called via `exec'.  */
+#define CMDLINE_OPTIONS \
+  { "restart", no_argument, &restart, 1 },
+static int restart;
+
+/* Hold the four initial argument used to respawn the process, plus the extra
+   '--direct', '--restart', the check type ('SIG_IGN' or 'SIG_DFL'), and a
+   final NULL.  */
+static char *spargs[8];
+static int check_type_argc;
+
+/* Called on process re-execution.  */
+_Noreturn static void
+handle_restart (int argc, char *argv[])
+{
+  assert (argc == 1);
+
+  if (strcmp (argv[0], "SIG_DFL") == 0)
+    {
+      for (int i = 1; i < NSIG; i++)
+	{
+	  struct sigaction sa;
+	  int r = sigaction (i, NULL, &sa);
+	  /* Skip internal signals (such as SIGCANCEL).  */
+	  if (r == -1)
+	    continue;
+	  TEST_VERIFY_EXIT (sa.sa_handler == SIG_DFL);
+	}
+      exit (EXIT_SUCCESS);
+    }
+  else if (strcmp (argv[0], "SIG_IGN") == 0)
+    {
+      for (int i = 1; i < NSIG; i++)
+	{
+	  struct sigaction sa;
+	  int r = sigaction (i, NULL, &sa);
+	  /* Skip internal signals (such as SIGCANCEL).  */
+	  if (r == -1)
+	    continue;
+	  if (i == SIGUSR1 || i == SIGUSR2)
+	    TEST_VERIFY_EXIT (sa.sa_handler == SIG_IGN);
+	  else
+	    TEST_VERIFY_EXIT (sa.sa_handler == SIG_DFL);
+	}
+      exit (EXIT_SUCCESS);
+    }
+
+  exit (EXIT_FAILURE);
+}
+
+static void
+spawn_signal_test (const char *type, const posix_spawnattr_t *attr)
+{
+  spargs[check_type_argc] = (char*) type;
+
+  pid_t pid;
+  int status;
+
+  TEST_COMPARE (posix_spawn (&pid, spargs[0], NULL, attr, spargs, environ), 0);
+  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
+  TEST_VERIFY (WIFEXITED (status));
+  TEST_VERIFY (!WIFSIGNALED (status));
+  TEST_COMPARE (WEXITSTATUS (status), 0);
+}
+
+static void
+dummy_sa_handler (int)
+{
+}
+
+static void
+do_test_signals (void)
+{
+  {
+    /* Check if all signals handler are set to SIG_DFL on spawned process.  */
+    spawn_signal_test ("SIG_DFL", NULL);
+  }
+
+  {
+    /* Same as before, but set SIGUSR1 and SIGUSR2 to a handler different than
+       SIG_IGN or SIG_DFL.  */
+    struct sigaction sa = { 0 };
+    sa.sa_handler = dummy_sa_handler;
+    xsigaction (SIGUSR1, &sa, NULL);
+    xsigaction (SIGUSR2, &sa, NULL);
+    spawn_signal_test ("SIG_DFL", NULL);
+  }
+
+  {
+    /* Check if SIG_IGN is keep as is.  */
+    struct sigaction sa = { 0 };
+    sa.sa_handler = SIG_IGN;
+    xsigaction (SIGUSR1, &sa, NULL);
+    xsigaction (SIGUSR2, &sa, NULL);
+    spawn_signal_test ("SIG_IGN", NULL);
+  }
+
+  {
+    /* Check if SIG_IGN handlers are set to SIG_DFL.  */
+    posix_spawnattr_t attr;
+    posix_spawnattr_init (&attr);
+    sigset_t mask;
+    sigemptyset (&mask);
+    sigaddset (&mask, SIGUSR1);
+    sigaddset (&mask, SIGUSR2);
+    posix_spawnattr_setsigdefault (&attr, &mask);
+    posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSIGDEF);
+    spawn_signal_test ("SIG_DFL", &attr);
+    posix_spawnattr_destroy (&attr);
+  }
+}
+
+static int
+do_test (int argc, char *argv[])
+{
+  /* We must have either:
+
+     - one or four parameters if called initially:
+       + argv[1]: path for ld.so        optional
+       + argv[2]: "--library-path"      optional
+       + argv[3]: the library path      optional
+       + argv[4]: the application name
+
+     - six parameters left if called through re-execution:
+       + argv[1]: the application name
+       + argv[2]: check SIG_IGN/SIG_DFL.
+
+     * When built with --enable-hardcoded-path-in-tests or issued without
+       using the loader directly.  */
+
+  if (restart)
+    handle_restart (argc - 1, &argv[1]);
+
+  TEST_VERIFY_EXIT (argc == 2 || argc == 5);
+
+  int i;
+  for (i = 0; i < argc - 1; i++)
+    spargs[i] = argv[i + 1];
+  spargs[i++] = (char *) "--direct";
+  spargs[i++] = (char *) "--restart";
+  check_type_argc = i++;
+  spargs[i] = NULL;
+
+
+  do_test_signals ();
+
+  return 0;
+}
+
+#define TEST_FUNCTION_ARGV do_test
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index a71effcbd3..7bc991e033 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -44,27 +44,15 @@ _Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
 		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
 
 int
-__clone_internal (struct clone_args *cl_args,
-		  int (*func) (void *arg), void *arg)
+__clone_internal_fallback (struct clone_args *cl_args,
+			   int (*func) (void *arg), void *arg)
 {
-  int ret;
-#ifdef HAVE_CLONE3_WRAPPER
-  /* Try clone3 first.  */
-  int saved_errno = errno;
-  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
-  if (ret != -1 || errno != ENOSYS)
-    return ret;
-
-  /* NB: Restore errno since errno may be checked against non-zero
-     return value.  */
-  __set_errno (saved_errno);
-#endif
-
   /* Map clone3 arguments to clone arguments.  NB: No need to check
      invalid clone3 specific bits in flags nor exit_signal since this
      is an internal function.  */
   int flags = cl_args->flags | cl_args->exit_signal;
   void *stack = cast_to_pointer (cl_args->stack);
+  int ret;
 
 #ifdef __ia64__
   ret = __clone2 (func, stack, cl_args->stack_size,
@@ -88,4 +76,23 @@ __clone_internal (struct clone_args *cl_args,
   return ret;
 }
 
+
+int
+__clone_internal (struct clone_args *cl_args,
+		  int (*func) (void *arg), void *arg)
+{
+#ifdef HAVE_CLONE3_WRAPPER
+  int saved_errno = errno;
+  int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
+  if (ret != -1 || errno != ENOSYS)
+    return ret;
+
+  /* NB: Restore errno since errno may be checked against non-zero
+     return value.  */
+  __set_errno (saved_errno);
+#endif
+
+  return __clone_internal_fallback (cl_args, func, arg);
+}
+
 libc_hidden_def (__clone_internal)
diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
index 889014a6a9..a150363f89 100644
--- a/sysdeps/unix/sysv/linux/clone3.h
+++ b/sysdeps/unix/sysv/linux/clone3.h
@@ -25,6 +25,12 @@
 
 __BEGIN_DECLS
 
+/* Flags for the clone3 syscall.  */
+#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and
+					      reset to SIG_DFL.  */
+#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given
+					    the right permissions.  */
+
 /* The unsigned 64-bit and 8-byte aligned integer type.  */
 typedef __U64_TYPE __aligned_uint64_t __attribute__ ((__aligned__ (8)));
 
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index 65ee03c804..ebc3c3e521 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -66,6 +66,7 @@ struct posix_spawn_args
   ptrdiff_t argc;
   char *const *envp;
   int xflags;
+  bool use_clone3;
   int err;
 };
 
@@ -104,12 +105,11 @@ __spawni_child (void *arguments)
   const posix_spawnattr_t *restrict attr = args->attr;
   const posix_spawn_file_actions_t *file_actions = args->fa;
 
-  /* The child must ensure that no signal handler are enabled because it shared
-     memory with parent, so the signal disposition must be either SIG_DFL or
-     SIG_IGN.  It does by iterating over all signals and although it could
-     possibly be more optimized (by tracking which signal potentially have a
-     signal handler), it might requires system specific solutions (since the
-     sigset_t data type can be very different on different architectures).  */
+  /* The child must ensure that no signal handler is enabled because it
+     shares memory with parent, so all signal dispositions must be either
+     SIG_DFL or SIG_IGN.  If clone3/CLONE_CLEAR_SIGHAND is used, there is
+     only need to set the defined signals POSIX_SPAWN_SETSIGDEF to SIG_DFL;
+     otherwise, the code iterates over all signals.  */
   struct sigaction sa;
   memset (&sa, '\0', sizeof (sa));
 
@@ -122,7 +122,7 @@ __spawni_child (void *arguments)
 	{
 	  sa.sa_handler = SIG_DFL;
 	}
-      else if (__sigismember (&hset, sig))
+      else if (!args->use_clone3 && __sigismember (&hset, sig))
 	{
 	  if (is_internal_signal (sig))
 	    sa.sa_handler = SIG_IGN;
@@ -382,12 +382,25 @@ __spawnix (pid_t * pid, const char *file,
      for instance).  */
   struct clone_args clone_args =
     {
-      .flags = CLONE_VM | CLONE_VFORK,
+      /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
+	 __clone_internal_fallback.  */
+      .flags = CLONE_CLEAR_SIGHAND | CLONE_VM | CLONE_VFORK,
       .exit_signal = SIGCHLD,
       .stack = (uintptr_t) stack,
       .stack_size = stack_size,
     };
-  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
+#ifdef HAVE_CLONE3_WRAPPER
+  args.use_clone3 = true;
+  new_pid = __clone3 (&clone_args, sizeof (clone_args), __spawni_child,
+		      &args);
+  /* clone3 was added on 5.3 and CLONE_CLEAR_SIGHAND on 5.5.  */
+  if (new_pid == -1 && (errno == ENOSYS || errno == EINVAL))
+#endif
+    {
+      args.use_clone3 = false;
+      new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
+					   &args);
+    }
 
   /* It needs to collect the case where the auxiliary process was created
      but failed to execute the file (due either any preparation step or
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 3/9] powerpc64le: Add the clone3 wrapper
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 1/9] linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 2/9] linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 4/9] aarch64: " Adhemerval Zanella
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
 int (*__func) (void *__arg), void *__arg);

And x86_64 semantics to return EINVAL if either cl_args or func
is NULL.  The stack is 16-byte aligned prior executing func.

Checked on powerpc64le-linux-gnu.
---
 .../sysv/linux/powerpc/powerpc64/clone3.S     | 145 ++++++++++++++++++
 sysdeps/unix/sysv/linux/powerpc/sysdep.h      |   1 +
 2 files changed, 146 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S

diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S
new file mode 100644
index 0000000000..a7461a3a42
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S
@@ -0,0 +1,145 @@
+/* The clone3 syscall wrapper.  Linux/powerpc64 version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#define _ERRNO_H        1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   r3: cl_args
+   r4: size
+   r5: func
+   r6: arg  */
+
+        .text
+ENTRY(__clone3)
+	CALL_MCOUNT 4
+
+	/* Sanity checks args.  */
+	cmpdi	cr0, r3, 0
+	cmpdi	cr1, r5, 0
+	cror	cr0*4+eq, cr1*4+eq, cr0*4+eq
+	beq	cr0,L(badargs)
+
+	/* Save some regs in the "red zone".  */
+#ifdef USE_PPC_SCV
+	std	r28, -24(r1)
+	cfi_offset (r28, -24)
+#endif
+	std	r29, -16(r1)
+	std	r30, -8(r1)
+	cfi_offset (r29, -16)
+	cfi_offset (r30, -8)
+
+	/* Save fn and args across syscall.  */
+	mr	r30, r5		/* Function.  */
+	mr	r29, r6		/* Arguments.  */
+
+	/* End FDE now, because in the child the unwind info will be
+	   wrong.  */
+	cfi_endproc
+
+	/* Do the system call, the kernel expects:
+	   r0: system call numer
+	   r3: cl_args
+	   r4: size  */
+	li	r0, SYS_ify(clone3)
+#ifdef USE_PPC_SCV
+	CHECK_SCV_SUPPORT r28 0f
+	/* This is equivalent to DO_CALL_SCV, but we cannot use the macro here
+	   because it uses CFI directives and we just called cfi_endproc.  */
+	mflr 	r9
+	std 	r9, FRAME_LR_SAVE(r1)
+	.machine "push"
+	.machine "power9"
+	scv 	0
+	.machine "pop"
+	ld 	r9, FRAME_LR_SAVE(r1)
+	mtlr 	r9
+
+	/* When using scv, error is indicated by negative r3.  */
+	cmpdi	cr1, r3, 0
+	b	1f
+#endif
+0:      DO_CALL_SC
+
+	/* With sc, error is indicated by cr0.SO.  */
+	cmpdi	cr1, r3, 0
+	crandc	cr1*4+eq, cr1*4+eq, cr0*4+so
+
+1:	bne-	cr1,L(parent)
+
+	/* Child, load the function and arguments.  */
+
+	/* Align stack.  */
+	rldicr	r1, r1, 0, 59
+
+	std	r2, FRAME_TOC_SAVE(r1)
+	PPC64_LOAD_FUNCPTR r30
+	mr	r3, r29
+	bctrl
+	ld	r2, FRAME_TOC_SAVE(r1)
+
+	li	r0, SYS_ify(exit)
+	DO_CALL_SC
+	/* We won't ever get here but provide a nop so that the linker
+	   will insert a toc adjusting stub if necessary.  */
+	nop
+
+L(badargs):
+	cfi_startproc
+	li	r3, EINVAL
+	TAIL_CALL_SYSCALL_ERROR
+
+L(parent):
+	/* Check if svc is available.  */
+	cmpdi cr1, r28, 0
+
+	/* Parent.  Restore registers & return.  */
+#ifdef USE_PPC_SCV
+	cfi_offset (r28, -24)
+	ld	r28, -24(r1)
+	cfi_restore (r28)
+#endif
+	cfi_offset (r29,-16)
+	cfi_offset (r30,-8)
+	ld	r29, -16(r1)
+	ld	r30, -8(r1)
+	cfi_restore (r29)
+	cfi_restore (r30)
+
+#ifdef USE_PPC_SCV
+	beq	cr1, 0f
+	RET_SCV
+	b	1f
+#endif
+0:	RET_SC
+1:	TAIL_CALL_SYSCALL_ERROR
+
+PSEUDO_END (__clone3)
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/powerpc/sysdep.h b/sysdeps/unix/sysv/linux/powerpc/sysdep.h
index 4fb135aa8d..b86951207d 100644
--- a/sysdeps/unix/sysv/linux/powerpc/sysdep.h
+++ b/sysdeps/unix/sysv/linux/powerpc/sysdep.h
@@ -246,6 +246,7 @@
 #if defined(__PPC64__) || defined(__powerpc64__)
 #define HAVE_CLOCK_GETRES64_VSYSCALL	"__kernel_clock_getres"
 #define HAVE_CLOCK_GETTIME64_VSYSCALL	"__kernel_clock_gettime"
+#define HAVE_CLONE3_WRAPPER		1
 #else
 #define HAVE_CLOCK_GETRES_VSYSCALL	"__kernel_clock_getres"
 #define HAVE_CLOCK_GETTIME_VSYSCALL	"__kernel_clock_gettime"
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (2 preceding siblings ...)
  2022-09-30 19:26 ` [PATCH v2 3/9] powerpc64le: Add the clone3 wrapper Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2022-11-02 12:12   ` Szabolcs Nagy
  2022-09-30 19:26 ` [PATCH v2 5/9] s390x: " Adhemerval Zanella
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

It follow the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
 int (*__func) (void *__arg), void *__arg);

And x86_64 semantics to return EINVAL if either cl_args or func
is NULL.  The stack is 16-byte aligned prior executing func.

Checked on aarch64-linux-gnu.
---
 sysdeps/unix/sysv/linux/aarch64/clone3.S | 90 ++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/aarch64/sysdep.h |  2 +
 2 files changed, 92 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/clone3.S

diff --git a/sysdeps/unix/sysv/linux/aarch64/clone3.S b/sysdeps/unix/sysv/linux/aarch64/clone3.S
new file mode 100644
index 0000000000..dba93430eb
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/clone3.S
@@ -0,0 +1,90 @@
+/* The clone3 syscall wrapper.  Linux/aarch64 version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#define _ERRNO_H        1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   x0: cl_args
+   x1: size
+   x2: func
+   x3: arg  */
+
+        .text
+ENTRY(__clone3)
+	PTR_ARG (0)
+	PTR_ARG (1)
+	PTR_ARG (3)
+	PTR_ARG (4)
+	/* Save args for the child.  */
+	mov	x10, x0		/* cl_args  */
+	mov	x11, x2		/* func	 */
+	mov	x12, x3		/* args  */
+
+	/* Sanity check args.  */
+	mov	x0, #-EINVAL
+	cbz	x10, .Lsyscall_error	/* No NULL cl_args pointer.  */
+	cbz	x2, .Lsyscall_error	/* No NULL function pointer.  */
+
+	/* Do the system call, the kernel expects:
+	   x8: system call number
+	   x0: cl_args
+	   x1: size  */
+	mov	x0, x10
+	mov	x8, #SYS_ify(clone3)
+	svc	0x0
+
+	cmp	x0, #0
+	beq	thread_start
+	blt	.Lsyscall_error
+	RET
+PSEUDO_END (__clone3)
+
+	.align 4
+	.type thread_start, %function
+thread_start:
+	cfi_startproc
+	cfi_undefined (x30)
+	mov	x29, 0
+
+	/* Align sp.  */
+	mov	x0, sp
+	and	x0, x0, -16
+	mov	sp, x0
+
+	/* Pick the function arg and execute.  */
+	mov	x0, x12
+	blr	x11
+
+	/* We are done, pass the return value through x0.  */
+	mov	x8, #SYS_ify(exit)
+	svc	0x0
+	cfi_endproc
+	.size thread_start, .-thread_start
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/aarch64/sysdep.h b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
index f1853e012f..42bb22f5e6 100644
--- a/sysdeps/unix/sysv/linux/aarch64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
@@ -164,6 +164,8 @@
 # define HAVE_CLOCK_GETTIME64_VSYSCALL	"__kernel_clock_gettime"
 # define HAVE_GETTIMEOFDAY_VSYSCALL	"__kernel_gettimeofday"
 
+# define HAVE_CLONE3_WRAPPER		1
+
 # undef INTERNAL_SYSCALL_RAW
 # define INTERNAL_SYSCALL_RAW(name, nr, args...)		\
   ({ long _sys_result;						\
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 5/9] s390x: Add the clone3 wrapper
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (3 preceding siblings ...)
  2022-09-30 19:26 ` [PATCH v2 4/9] aarch64: " Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 6/9] riscv: " Adhemerval Zanella
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
 int (*__func) (void *__arg), void *__arg);

And x86_64 semantics to return EINVAL if either cl_args or func
is NULL.  The stack is 16-byte aligned prior executing func.

Checked on s390x-linux-gnu.
---
 sysdeps/unix/sysv/linux/s390/s390-64/clone3.S | 84 +++++++++++++++++++
 sysdeps/unix/sysv/linux/s390/sysdep.h         |  1 +
 2 files changed, 85 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/s390/s390-64/clone3.S

diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/clone3.S b/sysdeps/unix/sysv/linux/s390/s390-64/clone3.S
new file mode 100644
index 0000000000..d338acfc9c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/clone3.S
@@ -0,0 +1,84 @@
+/* The clone3 syscall wrapper.  Linux/s390x version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#define _ERRNO_H        1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   r2: cl_args
+   r3: size
+   r4: func
+   r5: arg  */
+
+        .text
+ENTRY(__clone3)
+	stg	%r6, 48(%r15)
+
+	/* Sanity check args.  */
+	ltgr	%r2, %r2
+	je	error
+	ltgr	%r6, %r4
+	je	error
+
+	/* Do the system call, the kernel expects:
+	   r1: system call number
+	   r2: cl_args
+	   r3: size  */
+	lghi	%r1, SYS_ify(clone3)
+	svc	0
+	ltgr	%r2,%r2			/* check return code */
+	jz	thread_start
+	lg	%r6, 48(%r15)
+	jgm	SYSCALL_ERROR_LABEL
+	br	%r14
+error:
+	lghi	%r2,-EINVAL
+	jg	SYSCALL_ERROR_LABEL
+PSEUDO_END (__clone3)
+
+	.align 4
+	.type thread_start, %function
+thread_start:
+	cfi_startproc
+	/* Mark r14 as undefined in order to stop unwinding here.  */
+	cfi_undefined (r14)
+
+	/* Align stack.  */
+	nill    %r15, 0xfff0
+
+	/* func is in gpr 6, arg in gpr 5.  */
+	lgr	%r2, %r5
+	aghi	%r15, -160
+	xc	0(8,%r15),0(%r15)
+	basr	%r14, %r6
+
+	DO_CALL (exit, 1)
+	cfi_endproc
+	.size thread_start, .-thread_start
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/s390/sysdep.h b/sysdeps/unix/sysv/linux/s390/sysdep.h
index 930d7efe03..366c797ef1 100644
--- a/sysdeps/unix/sysv/linux/s390/sysdep.h
+++ b/sysdeps/unix/sysv/linux/s390/sysdep.h
@@ -72,6 +72,7 @@
 #ifdef __s390x__
 #define HAVE_CLOCK_GETRES64_VSYSCALL	"__kernel_clock_getres"
 #define HAVE_CLOCK_GETTIME64_VSYSCALL	"__kernel_clock_gettime"
+#define HAVE_CLONE3_WRAPPER		1
 #else
 #define HAVE_CLOCK_GETRES_VSYSCALL	"__kernel_clock_getres"
 #define HAVE_CLOCK_GETTIME_VSYSCALL	"__kernel_clock_gettime"
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 6/9] riscv: Add the clone3 wrapper
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (4 preceding siblings ...)
  2022-09-30 19:26 ` [PATCH v2 5/9] s390x: " Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 7/9] arm: " Adhemerval Zanella
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
 int (*__func) (void *__arg), void *__arg);

And x86_64 semantics to return EINVAL if either cl_args or func
is NULL.  The stack is 16-byte aligned prior executing func.

Checked on riscv64-linux-gnu-rv64imafdc-lp64d.
---
 sysdeps/unix/sysv/linux/riscv/clone3.S | 83 ++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h |  1 +
 2 files changed, 84 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/riscv/clone3.S

diff --git a/sysdeps/unix/sysv/linux/riscv/clone3.S b/sysdeps/unix/sysv/linux/riscv/clone3.S
new file mode 100644
index 0000000000..1e6e807aa0
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/clone3.S
@@ -0,0 +1,83 @@
+/* The clone3 syscall wrapper.  Linux/RISC-V version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <asm/errno.h>
+#include <sys/asm.h>
+#include <sysdep.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   a0: cl_args
+   a1: size
+   a2: func
+   a3: arg  */
+
+        .text
+ENTRY(__clone3)
+	/* Sanity check args.  */
+	beqz	a0, L(invalid)	/* No NULL cl_args pointer.  */
+	beqz	a2, L(invalid)  /* No NULL function pointer.  */
+
+	/* Do the system call, the kernel expects:
+	   a7: system call number
+	   a0: cl_args
+	   a1: size  */
+	li	a7, __NR_clone3
+	scall
+
+	bltz	a0, L(error)
+	beqz	a0, L(thread_start)
+
+	ret
+
+L(invalid):
+	li	a0, -EINVAL
+L(error):
+	tail	__syscall_error
+END (__clone3)
+
+ENTRY(__thread_start_clone3)
+L(thread_start):
+	/* Terminate call stack by noting ra is undefined.  Use a dummy
+	   .cfi_label to force starting the FDE.  */
+	.cfi_label .Ldummy
+	cfi_undefined (ra)
+
+	/* Align stack to a 128-bit boundary as per RISC-V ABI.  */
+	andi		sp, sp, ALMASK
+
+	/* Restore the arg for user's function and call the user's
+	   function.  */
+	mv		a1, a2	/* Function pointer.  */
+	mv		a0, a3	/* Argument pointer.  */
+	jalr		a1
+
+	/* Call exit with the function's return value.  */
+	li		a7, __NR_exit
+	scall
+END(__thread_start_clone3)
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/riscv/sysdep.h b/sysdeps/unix/sysv/linux/riscv/sysdep.h
index 9b03b10567..57523dc7b7 100644
--- a/sysdeps/unix/sysv/linux/riscv/sysdep.h
+++ b/sysdeps/unix/sysv/linux/riscv/sysdep.h
@@ -150,6 +150,7 @@
 
 /* RV32 does not support the gettime VDSO syscalls.  */
 # endif
+# define HAVE_CLONE3_WRAPPER		1
 
 /* List of system calls which are supported as vsyscalls (for RV32 and
    RV64).  */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 7/9] arm: Add the clone3 wrapper
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (5 preceding siblings ...)
  2022-09-30 19:26 ` [PATCH v2 6/9] riscv: " Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 8/9] mips: " Adhemerval Zanella
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
		    int (*__func) (void *__arg), void *__arg);

And x86_64 semantics to return EINVAL if either cl_args or func
is NULL.  The stack is 16-byte aligned prior executing func.

Checked on arm-linux-gnueabihf.
---
 sysdeps/unix/sysv/linux/arm/clone3.S | 80 ++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/arm/sysdep.h |  1 +
 2 files changed, 81 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S

diff --git a/sysdeps/unix/sysv/linux/arm/clone3.S b/sysdeps/unix/sysv/linux/arm/clone3.S
new file mode 100644
index 0000000000..f6b274cf51
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arm/clone3.S
@@ -0,0 +1,80 @@
+/* The clone3 syscall wrapper.  Linux/arm version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#define _ERRNO_H	1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   r0: cl_args
+   r1: size
+   r2: func
+   r3: arg  */
+
+        .text
+ENTRY(__clone3)
+	/* Sanity check args.  */
+	cmp	r0, #0
+	ite	ne
+	cmpne	r1, #0
+	moveq	r0, #-EINVAL
+	beq	PLTJMP(syscall_error)
+
+	/* Do the syscall, the kernel expects:
+	   r7: system call number:
+	   r0: cl_args
+	   r1: size  */
+	ldr     r7, =SYS_ify(clone3)
+	swi	0x0
+	cfi_endproc
+
+	cmp	r0, #0
+	beq	1f
+
+	blt	PLTJMP(C_SYMBOL_NAME(__syscall_error))
+	RETINSTR(, lr)
+
+	cfi_startproc
+PSEUDO_END (__clone3)
+
+1:
+	.fnstart
+	.cantunwind
+	/* Align sp.  */
+	and	sp, sp, #-8
+
+	mov	r0, r3
+	mov	ip, r2
+	BLX (ip)
+
+	/* And we are done, passing the return value through r0.  */
+	ldr	r7, =SYS_ify(exit)
+	swi	0x0
+
+	.fnend
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/arm/sysdep.h b/sysdeps/unix/sysv/linux/arm/sysdep.h
index 1f270b961e..fcc86189d6 100644
--- a/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -362,6 +362,7 @@ __local_syscall_error:						\
 #define HAVE_CLOCK_GETTIME_VSYSCALL	"__vdso_clock_gettime"
 #define HAVE_CLOCK_GETTIME64_VSYSCALL	"__vdso_clock_gettime64"
 #define HAVE_GETTIMEOFDAY_VSYSCALL	"__vdso_gettimeofday"
+#define HAVE_CLONE3_WRAPPER		1
 
 #define LOAD_ARGS_0()
 #define ASM_ARGS_0
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 8/9] mips: Add the clone3 wrapper
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (6 preceding siblings ...)
  2022-09-30 19:26 ` [PATCH v2 7/9] arm: " Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2022-09-30 19:26 ` [PATCH v2 9/9] Linux: optimize clone3 internal usage Adhemerval Zanella
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

It follows the internal signature:

extern int clone3 (struct clone_args *__cl_args, size_t __size,
                   int (*__func) (void *__arg), void *__arg);

And x86_64 semantics to return EINVAL if either cl_args or func
is NULL.  The stack is 16-byte aligned prior executing func.

Checked on mips64el-linux-gnueabihf and mipsel-linux-gnu.
---
 sysdeps/unix/sysv/linux/mips/clone3.S | 147 ++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/mips/sysdep.h |   2 +
 2 files changed, 149 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S

diff --git a/sysdeps/unix/sysv/linux/mips/clone3.S b/sysdeps/unix/sysv/linux/mips/clone3.S
new file mode 100644
index 0000000000..bd493e344b
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/clone3.S
@@ -0,0 +1,147 @@
+/* The clone3 syscall wrapper.  Linux/mips version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/asm.h>
+#include <sysdep.h>
+#define _ERRNO_H        1
+#include <bits/errno.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+               int (*func)(void *arg), void *arg);
+
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   a0/$4: cl_args
+   a1/$5: size
+   a2/$6: func
+   a3/$7: arg  */
+
+	.text
+	.set		nomips16
+#if _MIPS_SIM == _ABIO32
+# define EXTRA_LOCALS 1
+#else
+# define EXTRA_LOCALS 0
+#endif
+#define FRAMESZ ((NARGSAVE*SZREG)+ALSZ)&ALMASK
+GPOFF= FRAMESZ-(1*SZREG)
+NESTED(__clone3, SZREG, sp)
+#ifdef __PIC__
+	SETUP_GP
+#endif
+#if FRAMESZ
+	PTR_SUBU sp, FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+#endif
+	SETUP_GP64_STACK (GPOFF, __clone)
+#ifdef __PIC__
+	SAVE_GP (GPOFF)
+#endif
+#ifdef PROF
+	.set	noat
+	move	$1,ra
+	jal	_mcount
+	.set	at
+#endif
+
+	/* Sanity check args.  */
+	li	v0, EINVAL
+	beqz	a0, L(error)	/* No NULL cl_args pointer.  */
+	beqz	a2, L(error)	/* No NULL function pointer.  */
+
+	move	$8, a3		/* a3 is set to 0/1 for syscall success/error
+				   while a4/$8 is returned unmodified.  */
+
+	/* Do the system call, the kernel expects:
+	   v0: system call number
+	   a0: cl_args
+	   a1: size  */
+	li		v0, __NR_clone3
+	cfi_endproc
+	syscall
+
+	bnez		a3, L(error)
+	beqz		v0, L(thread_start_clone3)
+
+	/* Successful return from the parent */
+	cfi_startproc
+#if FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+#endif
+	SETUP_GP64_STACK_CFI (GPOFF)
+	cfi_remember_state
+	RESTORE_GP64_STACK
+#if FRAMESZ
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+#endif
+	ret
+
+L(error):
+	cfi_restore_state
+#ifdef __PIC__
+	PTR_LA		t9,__syscall_error
+	RESTORE_GP64_STACK
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	jr		t9
+#else
+	RESTORE_GP64_STACK
+	PTR_ADDU	sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	j		__syscall_error
+#endif
+END (__clone3)
+
+/* Load up the arguments to the function.  Put this block of code in
+   its own function so that we can terminate the stack trace with our
+   debug info.  */
+
+ENTRY(__thread_start_clone3)
+L(thread_start_clone3):
+	cfi_undefined ($31)
+	/* cp is already loaded.  */
+	SAVE_GP (GPOFF)
+	/* The stackframe has been created on entry of clone3.  */
+
+	/* Align stack to 8/16 bytes per the ABI.  */
+#if _MIPS_SIM == _ABIO32
+	li		t1, -8
+#else
+	li		t1, -16
+#endif
+	and		sp, sp, t1
+
+	/* Restore the arg for user's function.  */
+	move		t9, a2		/* Function pointer.  */
+	move		a0, $8		/* Argument pointer.  */
+
+	/* Call the user's function.  */
+	jal		t9
+
+	move		a0, v0
+	li		v0, __NR_exit
+	syscall
+END(__thread_start_clone3)
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/mips/sysdep.h b/sysdeps/unix/sysv/linux/mips/sysdep.h
index 2515dadc26..3f464925e7 100644
--- a/sysdeps/unix/sysv/linux/mips/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/sysdep.h
@@ -28,3 +28,5 @@
 #endif
 #define HAVE_GETTIMEOFDAY_VSYSCALL      "__vdso_gettimeofday"
 #define HAVE_CLOCK_GETRES_VSYSCALL      "__vdso_clock_getres"
+
+#define HAVE_CLONE3_WRAPPER		1
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 9/9] Linux: optimize clone3 internal usage
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (7 preceding siblings ...)
  2022-09-30 19:26 ` [PATCH v2 8/9] mips: " Adhemerval Zanella
@ 2022-09-30 19:26 ` Adhemerval Zanella
  2023-01-11 21:12   ` Carlos O'Donell
  2022-10-27 16:48 ` [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella Netto
  2023-01-11 21:11 ` Carlos O'Donell
  10 siblings, 1 reply; 28+ messages in thread
From: Adhemerval Zanella @ 2022-09-30 19:26 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

Now that clone3 is used on more architectures, add an optimization
to avoid calling when glibc detects that it is no supported by the
kernel.  It also adds __ASSUME_CLONE3, which allows skip this
optimization and issue clone3 syscall directly.

It does not handle the the small window between 5.3 and 5.5 for
posix_spawn (CLONE_CLEAR_SIGHAND was added in 5.5).

Checked on x86_64-linux-gnu.
---
 include/clone_internal.h                  |  5 +++++
 sysdeps/unix/sysv/linux/clone-internal.c  | 24 ++++++++++++++++++++++-
 sysdeps/unix/sysv/linux/kernel-features.h |  9 +++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/clone_internal.h b/include/clone_internal.h
index 320640e64b..385f3c5589 100644
--- a/include/clone_internal.h
+++ b/include/clone_internal.h
@@ -7,6 +7,11 @@ extern __typeof (clone3) __clone3;
    -1 with ENOSYS, fall back to clone or clone2.  */
 extern int __clone_internal (struct clone_args *__cl_args,
 			     int (*__func) (void *__arg), void *__arg);
+/* clone3 wrapper with a sticky check to avoid re-issue the syscall if
+   it fails with ENOSYS.  */
+extern int __clone3_internal (struct clone_args *cl_args,
+			      int (*func) (void *args), void *arg)
+     attribute_hidden;
 /* The fallback code which calls clone/clone2 based on clone3 arguments.  */
 extern int __clone_internal_fallback (struct clone_args *__cl_args,
 				      int (*__func) (void *__arg),
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index 7bc991e033..39d76733db 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -76,6 +76,28 @@ __clone_internal_fallback (struct clone_args *cl_args,
   return ret;
 }
 
+int
+__clone3_internal (struct clone_args *cl_args, int (*func) (void *args),
+		   void *arg)
+{
+#ifdef HAVE_CLONE3_WRAPPER
+# if __ASSUME_CLONE3
+  return __clone3 (cl_args, sizeof (*cl_args), func, arg);
+# else
+  static int clone3_supported = 1;
+  if (atomic_load_relaxed (&clone3_supported) == 1)
+    {
+      int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
+      if (ret != -1 || errno != ENOSYS)
+	return ret;
+
+      atomic_store_relaxed (&clone3_supported, 0);
+    }
+# endif
+#endif
+  __set_errno (ENOSYS);
+  return -1;
+}
 
 int
 __clone_internal (struct clone_args *cl_args,
@@ -83,7 +105,7 @@ __clone_internal (struct clone_args *cl_args,
 {
 #ifdef HAVE_CLONE3_WRAPPER
   int saved_errno = errno;
-  int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
+  int ret = __clone3_internal (cl_args, func, arg);
   if (ret != -1 || errno != ENOSYS)
     return ret;
 
diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index 74adc3956b..4ecd08a98f 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -236,4 +236,13 @@
 # define __ASSUME_FUTEX_LOCK_PI2 0
 #endif
 
+/* The clone3 system call was introduced across on most architectures in
+   Linux 5.3.  Not all ports implements it, so it should be used along
+   HAVE_CLONE3_WRAPPER define.  */
+#if __LINUX_KERNEL_VERSION >= 0x050300
+# define __ASSUME_CLONE3 1
+#else
+# define __ASSUME_CLONE3 0
+#endif
+
 #endif /* kernel-features.h */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (8 preceding siblings ...)
  2022-09-30 19:26 ` [PATCH v2 9/9] Linux: optimize clone3 internal usage Adhemerval Zanella
@ 2022-10-27 16:48 ` Adhemerval Zanella Netto
  2023-01-11 21:11 ` Carlos O'Donell
  10 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-10-27 16:48 UTC (permalink / raw)
  To: libc-alpha, Christian Brauner

Ping.

On 30/09/22 16:26, Adhemerval Zanella wrote:
> Currently posix_spawn has to issue at least NSIG sigaction syscalls
> to obtain current signal disposition and more if they there are not
> either SIG_IGN or SIG_DFL.  The new clone3 CLONE_CLEAR_SIGHAND flags
> introduced with Linux 5.5 resets all signal handlers of the child not
> set to SIG_IGN to SIG_DFL, thus allowing posix_spawn to skip this
> preparation phase.
> 
> The exception is the signals defined by posix_spawnattr_setsigdefault
> when POSIX_SPAWN_SETSIGDEF is set (since they can be SIG_IGN).  In
> this case posix_spawn helper process needs to issue a sigction to 
> set the signal disposition to SIG_DFL.
> 
> The patchset also adds clone3 implementation for aarch64, powerpc64,
> s390x, riscv, arm, and mips.
> 
> Changes from v1:
>   * Adeed arm and mips clone3 implementation.
> 
> Adhemerval Zanella (9):
>   linux: Do not reset signal handler in posix_spawn if it is already
>     SIG_DFL
>   linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn
>   powerpc64le: Add the clone3 wrapper
>   aarch64: Add the clone3 wrapper
>   s390x: Add the clone3 wrapper
>   riscv: Add the clone3 wrapper
>   arm: Add the clone3 wrapper
>   mips: Add the clone3 wrapper
>   Linux: optimize clone3 internal usage
> 
>  include/clone_internal.h                      |  10 +
>  posix/Makefile                                |   3 +-
>  posix/tst-spawn7.c                            | 179 ++++++++++++++++++
>  sysdeps/unix/sysv/linux/aarch64/clone3.S      |  90 +++++++++
>  sysdeps/unix/sysv/linux/aarch64/sysdep.h      |   2 +
>  sysdeps/unix/sysv/linux/arm/clone3.S          |  80 ++++++++
>  sysdeps/unix/sysv/linux/arm/sysdep.h          |   1 +
>  sysdeps/unix/sysv/linux/clone-internal.c      |  59 ++++--
>  sysdeps/unix/sysv/linux/clone3.h              |   6 +
>  sysdeps/unix/sysv/linux/kernel-features.h     |   9 +
>  sysdeps/unix/sysv/linux/mips/clone3.S         | 147 ++++++++++++++
>  sysdeps/unix/sysv/linux/mips/sysdep.h         |   2 +
>  .../sysv/linux/powerpc/powerpc64/clone3.S     | 145 ++++++++++++++
>  sysdeps/unix/sysv/linux/powerpc/sysdep.h      |   1 +
>  sysdeps/unix/sysv/linux/riscv/clone3.S        |  83 ++++++++
>  sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
>  sysdeps/unix/sysv/linux/s390/s390-64/clone3.S |  84 ++++++++
>  sysdeps/unix/sysv/linux/s390/sysdep.h         |   1 +
>  sysdeps/unix/sysv/linux/spawni.c              |  33 +++-
>  19 files changed, 910 insertions(+), 26 deletions(-)
>  create mode 100644 posix/tst-spawn7.c
>  create mode 100644 sysdeps/unix/sysv/linux/aarch64/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/s390/s390-64/clone3.S
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-09-30 19:26 ` [PATCH v2 4/9] aarch64: " Adhemerval Zanella
@ 2022-11-02 12:12   ` Szabolcs Nagy
  2022-11-03 13:15     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 28+ messages in thread
From: Szabolcs Nagy @ 2022-11-02 12:12 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Christian Brauner

The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
> It follow the internal signature:
> 
>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>  int (*__func) (void *__arg), void *__arg);
> 
> And x86_64 semantics to return EINVAL if either cl_args or func
> is NULL.  The stack is 16-byte aligned prior executing func.

"x86_64 semantics" sounds wrong: maybe this should be documented?
i'd expect 0 cl_args/func to be UB like in most posix apis.

and aligning sp in the child fails if signals are allowed there
(pthreads does not allow signals now, direct callers might).
i dont know if that's a concert (or if unaligned stack is
something we should fix up in clone3).

> 
> Checked on aarch64-linux-gnu.
> ---
>  sysdeps/unix/sysv/linux/aarch64/clone3.S | 90 ++++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/aarch64/sysdep.h |  2 +
>  2 files changed, 92 insertions(+)
>  create mode 100644 sysdeps/unix/sysv/linux/aarch64/clone3.S
> 
> diff --git a/sysdeps/unix/sysv/linux/aarch64/clone3.S b/sysdeps/unix/sysv/linux/aarch64/clone3.S
> new file mode 100644
> index 0000000000..dba93430eb
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/aarch64/clone3.S
> @@ -0,0 +1,90 @@
> +/* The clone3 syscall wrapper.  Linux/aarch64 version.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#define _ERRNO_H        1
> +#include <bits/errno.h>
> +
> +/* The userland implementation is:
> +   int clone3 (struct clone_args *cl_args, size_t size,
> +               int (*func)(void *arg), void *arg);
> +
> +   the kernel entry is:
> +   int clone3 (struct clone_args *cl_args, size_t size);
> +
> +   The parameters are passed in registers from userland:
> +   x0: cl_args
> +   x1: size
> +   x2: func
> +   x3: arg  */
> +
> +        .text
> +ENTRY(__clone3)
> +	PTR_ARG (0)
> +	PTR_ARG (1)
> +	PTR_ARG (3)
> +	PTR_ARG (4)
> +	/* Save args for the child.  */
> +	mov	x10, x0		/* cl_args  */
> +	mov	x11, x2		/* func	 */
> +	mov	x12, x3		/* args  */
> +
> +	/* Sanity check args.  */
> +	mov	x0, #-EINVAL
> +	cbz	x10, .Lsyscall_error	/* No NULL cl_args pointer.  */
> +	cbz	x2, .Lsyscall_error	/* No NULL function pointer.  */
> +
> +	/* Do the system call, the kernel expects:
> +	   x8: system call number
> +	   x0: cl_args
> +	   x1: size  */
> +	mov	x0, x10
> +	mov	x8, #SYS_ify(clone3)
> +	svc	0x0
> +
> +	cmp	x0, #0
> +	beq	thread_start
> +	blt	.Lsyscall_error
> +	RET
> +PSEUDO_END (__clone3)
> +
> +	.align 4
> +	.type thread_start, %function
> +thread_start:
> +	cfi_startproc
> +	cfi_undefined (x30)
> +	mov	x29, 0
> +
> +	/* Align sp.  */
> +	mov	x0, sp
> +	and	x0, x0, -16
> +	mov	sp, x0
> +
> +	/* Pick the function arg and execute.  */
> +	mov	x0, x12
> +	blr	x11
> +
> +	/* We are done, pass the return value through x0.  */
> +	mov	x8, #SYS_ify(exit)
> +	svc	0x0
> +	cfi_endproc
> +	.size thread_start, .-thread_start
> +
> +libc_hidden_def (__clone3)
> +weak_alias (__clone3, clone3)
> diff --git a/sysdeps/unix/sysv/linux/aarch64/sysdep.h b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
> index f1853e012f..42bb22f5e6 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/sysdep.h
> +++ b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
> @@ -164,6 +164,8 @@
>  # define HAVE_CLOCK_GETTIME64_VSYSCALL	"__kernel_clock_gettime"
>  # define HAVE_GETTIMEOFDAY_VSYSCALL	"__kernel_gettimeofday"
>  
> +# define HAVE_CLONE3_WRAPPER		1
> +
>  # undef INTERNAL_SYSCALL_RAW
>  # define INTERNAL_SYSCALL_RAW(name, nr, args...)		\
>    ({ long _sys_result;						\
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-02 12:12   ` Szabolcs Nagy
@ 2022-11-03 13:15     ` Adhemerval Zanella Netto
  2022-11-03 14:01       ` Szabolcs Nagy
  0 siblings, 1 reply; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-11-03 13:15 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Christian Brauner



On 02/11/22 09:12, Szabolcs Nagy wrote:
> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
>> It follow the internal signature:
>>
>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>>  int (*__func) (void *__arg), void *__arg);
>>
>> And x86_64 semantics to return EINVAL if either cl_args or func
>> is NULL.  The stack is 16-byte aligned prior executing func.
> 
> "x86_64 semantics" sounds wrong: maybe this should be documented?
> i'd expect 0 cl_args/func to be UB like in most posix apis.

Right, I think it is worth to document the function semantic
properly at least on its internal header (include/clone_internal.h).
H.J also added a new clone3.h headers, which is not currently installed
that I am inclined to just remove it from now.  We might reinstate 
if/when we decide to provide the clone3 as an ABI.

And returning EINVAL for 0 cl_args/func aligns with our exported clone
interface, where EINVAL is also returned for 0 function argument.

> 
> and aligning sp in the child fails if signals are allowed there
> (pthreads does not allow signals now, direct callers might).
> i dont know if that's a concert (or if unaligned stack is
> something we should fix up in clone3).

It was overlooked on initial x86_64 clone3 implementation as well.  I
think it better to just return EINVAL for unaligned stacks and avoid
to change the stack pointer in the created thread.

> 
>>
>> Checked on aarch64-linux-gnu.
>> ---
>>  sysdeps/unix/sysv/linux/aarch64/clone3.S | 90 ++++++++++++++++++++++++
>>  sysdeps/unix/sysv/linux/aarch64/sysdep.h |  2 +
>>  2 files changed, 92 insertions(+)
>>  create mode 100644 sysdeps/unix/sysv/linux/aarch64/clone3.S
>>
>> diff --git a/sysdeps/unix/sysv/linux/aarch64/clone3.S b/sysdeps/unix/sysv/linux/aarch64/clone3.S
>> new file mode 100644
>> index 0000000000..dba93430eb
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/aarch64/clone3.S
>> @@ -0,0 +1,90 @@
>> +/* The clone3 syscall wrapper.  Linux/aarch64 version.
>> +   Copyright (C) 2022 Free Software Foundation, Inc.
>> +
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <sysdep.h>
>> +#define _ERRNO_H        1
>> +#include <bits/errno.h>
>> +
>> +/* The userland implementation is:
>> +   int clone3 (struct clone_args *cl_args, size_t size,
>> +               int (*func)(void *arg), void *arg);
>> +
>> +   the kernel entry is:
>> +   int clone3 (struct clone_args *cl_args, size_t size);
>> +
>> +   The parameters are passed in registers from userland:
>> +   x0: cl_args
>> +   x1: size
>> +   x2: func
>> +   x3: arg  */
>> +
>> +        .text
>> +ENTRY(__clone3)
>> +	PTR_ARG (0)
>> +	PTR_ARG (1)
>> +	PTR_ARG (3)
>> +	PTR_ARG (4)
>> +	/* Save args for the child.  */
>> +	mov	x10, x0		/* cl_args  */
>> +	mov	x11, x2		/* func	 */
>> +	mov	x12, x3		/* args  */
>> +
>> +	/* Sanity check args.  */
>> +	mov	x0, #-EINVAL
>> +	cbz	x10, .Lsyscall_error	/* No NULL cl_args pointer.  */
>> +	cbz	x2, .Lsyscall_error	/* No NULL function pointer.  */
>> +
>> +	/* Do the system call, the kernel expects:
>> +	   x8: system call number
>> +	   x0: cl_args
>> +	   x1: size  */
>> +	mov	x0, x10
>> +	mov	x8, #SYS_ify(clone3)
>> +	svc	0x0
>> +
>> +	cmp	x0, #0
>> +	beq	thread_start
>> +	blt	.Lsyscall_error
>> +	RET
>> +PSEUDO_END (__clone3)
>> +
>> +	.align 4
>> +	.type thread_start, %function
>> +thread_start:
>> +	cfi_startproc
>> +	cfi_undefined (x30)
>> +	mov	x29, 0
>> +
>> +	/* Align sp.  */
>> +	mov	x0, sp
>> +	and	x0, x0, -16
>> +	mov	sp, x0
>> +
>> +	/* Pick the function arg and execute.  */
>> +	mov	x0, x12
>> +	blr	x11
>> +
>> +	/* We are done, pass the return value through x0.  */
>> +	mov	x8, #SYS_ify(exit)
>> +	svc	0x0
>> +	cfi_endproc
>> +	.size thread_start, .-thread_start
>> +
>> +libc_hidden_def (__clone3)
>> +weak_alias (__clone3, clone3)
>> diff --git a/sysdeps/unix/sysv/linux/aarch64/sysdep.h b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
>> index f1853e012f..42bb22f5e6 100644
>> --- a/sysdeps/unix/sysv/linux/aarch64/sysdep.h
>> +++ b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
>> @@ -164,6 +164,8 @@
>>  # define HAVE_CLOCK_GETTIME64_VSYSCALL	"__kernel_clock_gettime"
>>  # define HAVE_GETTIMEOFDAY_VSYSCALL	"__kernel_gettimeofday"
>>  
>> +# define HAVE_CLONE3_WRAPPER		1
>> +
>>  # undef INTERNAL_SYSCALL_RAW
>>  # define INTERNAL_SYSCALL_RAW(name, nr, args...)		\
>>    ({ long _sys_result;						\
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 13:15     ` Adhemerval Zanella Netto
@ 2022-11-03 14:01       ` Szabolcs Nagy
  2022-11-03 16:22         ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 28+ messages in thread
From: Szabolcs Nagy @ 2022-11-03 14:01 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha, Christian Brauner

The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
> On 02/11/22 09:12, Szabolcs Nagy wrote:
> > The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
> >> It follow the internal signature:
> >>
> >>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
> >>  int (*__func) (void *__arg), void *__arg);
> >>
> >> And x86_64 semantics to return EINVAL if either cl_args or func
> >> is NULL.  The stack is 16-byte aligned prior executing func.
> > 
> > "x86_64 semantics" sounds wrong: maybe this should be documented?
> > i'd expect 0 cl_args/func to be UB like in most posix apis.
> 
> Right, I think it is worth to document the function semantic
> properly at least on its internal header (include/clone_internal.h).
> H.J also added a new clone3.h headers, which is not currently installed
> that I am inclined to just remove it from now.  We might reinstate 
> if/when we decide to provide the clone3 as an ABI.
> 
> And returning EINVAL for 0 cl_args/func aligns with our exported clone
> interface, where EINVAL is also returned for 0 function argument.

ok.

> > 
> > and aligning sp in the child fails if signals are allowed there
> > (pthreads does not allow signals now, direct callers might).
> > i dont know if that's a concert (or if unaligned stack is
> > something we should fix up in clone3).
> 
> It was overlooked on initial x86_64 clone3 implementation as well.  I
> think it better to just return EINVAL for unaligned stacks and avoid
> to change the stack pointer in the created thread.

long time ago linux did that on aarch64, but it was removed:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a

i think in clone3 the kernel should have aligned (it knows
the bounds now), doing it in the userspace wrapper is weird
(should we adjust the stack size?). and not doing it at all
makes clone3 hard to use portably (user has to know target
specific pcs requirements).

not sure what's the best way forward.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 14:01       ` Szabolcs Nagy
@ 2022-11-03 16:22         ` Adhemerval Zanella Netto
  2022-11-03 16:31           ` Szabolcs Nagy
  0 siblings, 1 reply; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-11-03 16:22 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Christian Brauner, H.J. Lu



On 03/11/22 11:01, Szabolcs Nagy wrote:
> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
>> On 02/11/22 09:12, Szabolcs Nagy wrote:
>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
>>>> It follow the internal signature:
>>>>
>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>>>>  int (*__func) (void *__arg), void *__arg);
>>>>
>>>> And x86_64 semantics to return EINVAL if either cl_args or func
>>>> is NULL.  The stack is 16-byte aligned prior executing func.
>>>
>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
>>
>> Right, I think it is worth to document the function semantic
>> properly at least on its internal header (include/clone_internal.h).
>> H.J also added a new clone3.h headers, which is not currently installed
>> that I am inclined to just remove it from now.  We might reinstate 
>> if/when we decide to provide the clone3 as an ABI.
>>
>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
>> interface, where EINVAL is also returned for 0 function argument.
> 
> ok.
> 
>>>
>>> and aligning sp in the child fails if signals are allowed there
>>> (pthreads does not allow signals now, direct callers might).
>>> i dont know if that's a concert (or if unaligned stack is
>>> something we should fix up in clone3).
>>
>> It was overlooked on initial x86_64 clone3 implementation as well.  I
>> think it better to just return EINVAL for unaligned stacks and avoid
>> to change the stack pointer in the created thread.
> 
> long time ago linux did that on aarch64, but it was removed:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
> 
> i think in clone3 the kernel should have aligned (it knows
> the bounds now), doing it in the userspace wrapper is weird
> (should we adjust the stack size?). and not doing it at all
> makes clone3 hard to use portably (user has to know target
> specific pcs requirements).
> 
> not sure what's the best way forward.

I think the stack size won't matter much here, at least not from
kernel point of view (the resulting stack size will most likely
be page aligned anyway).  But I think this kernel commit makes a good
point that silently adjusting the stack in userland is not the
correct approach, I think H.J has done to make it consistent with
glibc clone implementation which does it.

IMHO the best approach would to just remove the stack alignment,
since it incurs the signal handling issue.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 16:22         ` Adhemerval Zanella Netto
@ 2022-11-03 16:31           ` Szabolcs Nagy
  2022-11-03 16:39             ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 28+ messages in thread
From: Szabolcs Nagy @ 2022-11-03 16:31 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha, Christian Brauner, H.J. Lu

The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
> 
> 
> On 03/11/22 11:01, Szabolcs Nagy wrote:
> > The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
> >> On 02/11/22 09:12, Szabolcs Nagy wrote:
> >>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
> >>>> It follow the internal signature:
> >>>>
> >>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
> >>>>  int (*__func) (void *__arg), void *__arg);
> >>>>
> >>>> And x86_64 semantics to return EINVAL if either cl_args or func
> >>>> is NULL.  The stack is 16-byte aligned prior executing func.
> >>>
> >>> "x86_64 semantics" sounds wrong: maybe this should be documented?
> >>> i'd expect 0 cl_args/func to be UB like in most posix apis.
> >>
> >> Right, I think it is worth to document the function semantic
> >> properly at least on its internal header (include/clone_internal.h).
> >> H.J also added a new clone3.h headers, which is not currently installed
> >> that I am inclined to just remove it from now.  We might reinstate 
> >> if/when we decide to provide the clone3 as an ABI.
> >>
> >> And returning EINVAL for 0 cl_args/func aligns with our exported clone
> >> interface, where EINVAL is also returned for 0 function argument.
> > 
> > ok.
> > 
> >>>
> >>> and aligning sp in the child fails if signals are allowed there
> >>> (pthreads does not allow signals now, direct callers might).
> >>> i dont know if that's a concert (or if unaligned stack is
> >>> something we should fix up in clone3).
> >>
> >> It was overlooked on initial x86_64 clone3 implementation as well.  I
> >> think it better to just return EINVAL for unaligned stacks and avoid
> >> to change the stack pointer in the created thread.
> > 
> > long time ago linux did that on aarch64, but it was removed:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
> > 
> > i think in clone3 the kernel should have aligned (it knows
> > the bounds now), doing it in the userspace wrapper is weird
> > (should we adjust the stack size?). and not doing it at all
> > makes clone3 hard to use portably (user has to know target
> > specific pcs requirements).
> > 
> > not sure what's the best way forward.
> 
> I think the stack size won't matter much here, at least not from
> kernel point of view (the resulting stack size will most likely
> be page aligned anyway).  But I think this kernel commit makes a good
> point that silently adjusting the stack in userland is not the
> correct approach, I think H.J has done to make it consistent with
> glibc clone implementation which does it.
> 
> IMHO the best approach would to just remove the stack alignment,
> since it incurs the signal handling issue.

current generic clone callers dont align the stack and
e.g. unaligned pthread custom stack should work.

so we have to do arch specific stack alignment somewhere,
maybe in pthread_create?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 16:31           ` Szabolcs Nagy
@ 2022-11-03 16:39             ` Adhemerval Zanella Netto
  2022-11-03 16:52               ` Szabolcs Nagy
  0 siblings, 1 reply; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-11-03 16:39 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Christian Brauner, H.J. Lu



On 03/11/22 13:31, Szabolcs Nagy wrote:
> The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
>>
>>
>> On 03/11/22 11:01, Szabolcs Nagy wrote:
>>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
>>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
>>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
>>>>>> It follow the internal signature:
>>>>>>
>>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>>>>>>  int (*__func) (void *__arg), void *__arg);
>>>>>>
>>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
>>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
>>>>>
>>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
>>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
>>>>
>>>> Right, I think it is worth to document the function semantic
>>>> properly at least on its internal header (include/clone_internal.h).
>>>> H.J also added a new clone3.h headers, which is not currently installed
>>>> that I am inclined to just remove it from now.  We might reinstate 
>>>> if/when we decide to provide the clone3 as an ABI.
>>>>
>>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
>>>> interface, where EINVAL is also returned for 0 function argument.
>>>
>>> ok.
>>>
>>>>>
>>>>> and aligning sp in the child fails if signals are allowed there
>>>>> (pthreads does not allow signals now, direct callers might).
>>>>> i dont know if that's a concert (or if unaligned stack is
>>>>> something we should fix up in clone3).
>>>>
>>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
>>>> think it better to just return EINVAL for unaligned stacks and avoid
>>>> to change the stack pointer in the created thread.
>>>
>>> long time ago linux did that on aarch64, but it was removed:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
>>>
>>> i think in clone3 the kernel should have aligned (it knows
>>> the bounds now), doing it in the userspace wrapper is weird
>>> (should we adjust the stack size?). and not doing it at all
>>> makes clone3 hard to use portably (user has to know target
>>> specific pcs requirements).
>>>
>>> not sure what's the best way forward.
>>
>> I think the stack size won't matter much here, at least not from
>> kernel point of view (the resulting stack size will most likely
>> be page aligned anyway).  But I think this kernel commit makes a good
>> point that silently adjusting the stack in userland is not the
>> correct approach, I think H.J has done to make it consistent with
>> glibc clone implementation which does it.
>>
>> IMHO the best approach would to just remove the stack alignment,
>> since it incurs the signal handling issue.
> 
> current generic clone callers dont align the stack and
> e.g. unaligned pthread custom stack should work.
> 
> so we have to do arch specific stack alignment somewhere,
> maybe in pthread_create?

I am thinking on __clone_internal, where if an unaligned stack is
used it creates a new clone_args struct with adjust arguments.  It
can adjust the struct in place (not sure which is better).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 16:39             ` Adhemerval Zanella Netto
@ 2022-11-03 16:52               ` Szabolcs Nagy
  2022-11-03 16:55                 ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 28+ messages in thread
From: Szabolcs Nagy @ 2022-11-03 16:52 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha, Christian Brauner, H.J. Lu

The 11/03/2022 13:39, Adhemerval Zanella Netto wrote:
> 
> 
> On 03/11/22 13:31, Szabolcs Nagy wrote:
> > The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
> >>
> >>
> >> On 03/11/22 11:01, Szabolcs Nagy wrote:
> >>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
> >>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
> >>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
> >>>>>> It follow the internal signature:
> >>>>>>
> >>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
> >>>>>>  int (*__func) (void *__arg), void *__arg);
> >>>>>>
> >>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
> >>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
> >>>>>
> >>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
> >>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
> >>>>
> >>>> Right, I think it is worth to document the function semantic
> >>>> properly at least on its internal header (include/clone_internal.h).
> >>>> H.J also added a new clone3.h headers, which is not currently installed
> >>>> that I am inclined to just remove it from now.  We might reinstate 
> >>>> if/when we decide to provide the clone3 as an ABI.
> >>>>
> >>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
> >>>> interface, where EINVAL is also returned for 0 function argument.
> >>>
> >>> ok.
> >>>
> >>>>>
> >>>>> and aligning sp in the child fails if signals are allowed there
> >>>>> (pthreads does not allow signals now, direct callers might).
> >>>>> i dont know if that's a concert (or if unaligned stack is
> >>>>> something we should fix up in clone3).
> >>>>
> >>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
> >>>> think it better to just return EINVAL for unaligned stacks and avoid
> >>>> to change the stack pointer in the created thread.
> >>>
> >>> long time ago linux did that on aarch64, but it was removed:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
> >>>
> >>> i think in clone3 the kernel should have aligned (it knows
> >>> the bounds now), doing it in the userspace wrapper is weird
> >>> (should we adjust the stack size?). and not doing it at all
> >>> makes clone3 hard to use portably (user has to know target
> >>> specific pcs requirements).
> >>>
> >>> not sure what's the best way forward.
> >>
> >> I think the stack size won't matter much here, at least not from
> >> kernel point of view (the resulting stack size will most likely
> >> be page aligned anyway).  But I think this kernel commit makes a good
> >> point that silently adjusting the stack in userland is not the
> >> correct approach, I think H.J has done to make it consistent with
> >> glibc clone implementation which does it.
> >>
> >> IMHO the best approach would to just remove the stack alignment,
> >> since it incurs the signal handling issue.
> > 
> > current generic clone callers dont align the stack and
> > e.g. unaligned pthread custom stack should work.
> > 
> > so we have to do arch specific stack alignment somewhere,
> > maybe in pthread_create?
> 
> I am thinking on __clone_internal, where if an unaligned stack is
> used it creates a new clone_args struct with adjust arguments.  It
> can adjust the struct in place (not sure which is better).

if the api is not exposed, then i think the arg can be modified
in place. (if clone3 api is exposed to users then we should not
modify user structs unless the clone3 api contract explicitly
allows this.)

either aligning in pthread_create or __clone_internal works for me,
the target specific clone3 syscall should not in case that gets
exposed to users.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 16:52               ` Szabolcs Nagy
@ 2022-11-03 16:55                 ` Adhemerval Zanella Netto
  2022-11-03 20:55                   ` H.J. Lu
  2022-11-03 21:22                   ` Adhemerval Zanella Netto
  0 siblings, 2 replies; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-11-03 16:55 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Christian Brauner, H.J. Lu



On 03/11/22 13:52, Szabolcs Nagy wrote:
> The 11/03/2022 13:39, Adhemerval Zanella Netto wrote:
>>
>>
>> On 03/11/22 13:31, Szabolcs Nagy wrote:
>>> The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
>>>>
>>>>
>>>> On 03/11/22 11:01, Szabolcs Nagy wrote:
>>>>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
>>>>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
>>>>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
>>>>>>>> It follow the internal signature:
>>>>>>>>
>>>>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>>>>>>>>  int (*__func) (void *__arg), void *__arg);
>>>>>>>>
>>>>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
>>>>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
>>>>>>>
>>>>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
>>>>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
>>>>>>
>>>>>> Right, I think it is worth to document the function semantic
>>>>>> properly at least on its internal header (include/clone_internal.h).
>>>>>> H.J also added a new clone3.h headers, which is not currently installed
>>>>>> that I am inclined to just remove it from now.  We might reinstate 
>>>>>> if/when we decide to provide the clone3 as an ABI.
>>>>>>
>>>>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
>>>>>> interface, where EINVAL is also returned for 0 function argument.
>>>>>
>>>>> ok.
>>>>>
>>>>>>>
>>>>>>> and aligning sp in the child fails if signals are allowed there
>>>>>>> (pthreads does not allow signals now, direct callers might).
>>>>>>> i dont know if that's a concert (or if unaligned stack is
>>>>>>> something we should fix up in clone3).
>>>>>>
>>>>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
>>>>>> think it better to just return EINVAL for unaligned stacks and avoid
>>>>>> to change the stack pointer in the created thread.
>>>>>
>>>>> long time ago linux did that on aarch64, but it was removed:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
>>>>>
>>>>> i think in clone3 the kernel should have aligned (it knows
>>>>> the bounds now), doing it in the userspace wrapper is weird
>>>>> (should we adjust the stack size?). and not doing it at all
>>>>> makes clone3 hard to use portably (user has to know target
>>>>> specific pcs requirements).
>>>>>
>>>>> not sure what's the best way forward.
>>>>
>>>> I think the stack size won't matter much here, at least not from
>>>> kernel point of view (the resulting stack size will most likely
>>>> be page aligned anyway).  But I think this kernel commit makes a good
>>>> point that silently adjusting the stack in userland is not the
>>>> correct approach, I think H.J has done to make it consistent with
>>>> glibc clone implementation which does it.
>>>>
>>>> IMHO the best approach would to just remove the stack alignment,
>>>> since it incurs the signal handling issue.
>>>
>>> current generic clone callers dont align the stack and
>>> e.g. unaligned pthread custom stack should work.
>>>
>>> so we have to do arch specific stack alignment somewhere,
>>> maybe in pthread_create?
>>
>> I am thinking on __clone_internal, where if an unaligned stack is
>> used it creates a new clone_args struct with adjust arguments.  It
>> can adjust the struct in place (not sure which is better).
> 
> if the api is not exposed, then i think the arg can be modified
> in place. (if clone3 api is exposed to users then we should not
> modify user structs unless the clone3 api contract explicitly
> allows this.)
> 
> either aligning in pthread_create or __clone_internal works for me,
> the target specific clone3 syscall should not in case that gets
> exposed to users.
> 

The arg modification would be done only internally by __clone_internal,
if we ever export __clone3 it will not mess with stack alignment (my
idea is to remove it from x86_64 as well).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 16:55                 ` Adhemerval Zanella Netto
@ 2022-11-03 20:55                   ` H.J. Lu
  2022-11-03 21:28                     ` Adhemerval Zanella Netto
  2022-11-03 21:22                   ` Adhemerval Zanella Netto
  1 sibling, 1 reply; 28+ messages in thread
From: H.J. Lu @ 2022-11-03 20:55 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: Szabolcs Nagy, libc-alpha, Christian Brauner

On Thu, Nov 3, 2022 at 9:55 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 03/11/22 13:52, Szabolcs Nagy wrote:
> > The 11/03/2022 13:39, Adhemerval Zanella Netto wrote:
> >>
> >>
> >> On 03/11/22 13:31, Szabolcs Nagy wrote:
> >>> The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
> >>>>
> >>>>
> >>>> On 03/11/22 11:01, Szabolcs Nagy wrote:
> >>>>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
> >>>>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
> >>>>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
> >>>>>>>> It follow the internal signature:
> >>>>>>>>
> >>>>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
> >>>>>>>>  int (*__func) (void *__arg), void *__arg);
> >>>>>>>>
> >>>>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
> >>>>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
> >>>>>>>
> >>>>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
> >>>>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
> >>>>>>
> >>>>>> Right, I think it is worth to document the function semantic
> >>>>>> properly at least on its internal header (include/clone_internal.h).
> >>>>>> H.J also added a new clone3.h headers, which is not currently installed
> >>>>>> that I am inclined to just remove it from now.  We might reinstate
> >>>>>> if/when we decide to provide the clone3 as an ABI.
> >>>>>>
> >>>>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
> >>>>>> interface, where EINVAL is also returned for 0 function argument.
> >>>>>
> >>>>> ok.
> >>>>>
> >>>>>>>
> >>>>>>> and aligning sp in the child fails if signals are allowed there
> >>>>>>> (pthreads does not allow signals now, direct callers might).
> >>>>>>> i dont know if that's a concert (or if unaligned stack is
> >>>>>>> something we should fix up in clone3).
> >>>>>>
> >>>>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
> >>>>>> think it better to just return EINVAL for unaligned stacks and avoid
> >>>>>> to change the stack pointer in the created thread.
> >>>>>
> >>>>> long time ago linux did that on aarch64, but it was removed:
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
> >>>>>
> >>>>> i think in clone3 the kernel should have aligned (it knows
> >>>>> the bounds now), doing it in the userspace wrapper is weird
> >>>>> (should we adjust the stack size?). and not doing it at all
> >>>>> makes clone3 hard to use portably (user has to know target
> >>>>> specific pcs requirements).
> >>>>>
> >>>>> not sure what's the best way forward.
> >>>>
> >>>> I think the stack size won't matter much here, at least not from
> >>>> kernel point of view (the resulting stack size will most likely
> >>>> be page aligned anyway).  But I think this kernel commit makes a good
> >>>> point that silently adjusting the stack in userland is not the
> >>>> correct approach, I think H.J has done to make it consistent with
> >>>> glibc clone implementation which does it.
> >>>>
> >>>> IMHO the best approach would to just remove the stack alignment,
> >>>> since it incurs the signal handling issue.
> >>>
> >>> current generic clone callers dont align the stack and
> >>> e.g. unaligned pthread custom stack should work.
> >>>
> >>> so we have to do arch specific stack alignment somewhere,
> >>> maybe in pthread_create?
> >>
> >> I am thinking on __clone_internal, where if an unaligned stack is
> >> used it creates a new clone_args struct with adjust arguments.  It
> >> can adjust the struct in place (not sure which is better).
> >
> > if the api is not exposed, then i think the arg can be modified
> > in place. (if clone3 api is exposed to users then we should not
> > modify user structs unless the clone3 api contract explicitly
> > allows this.)
> >
> > either aligning in pthread_create or __clone_internal works for me,
> > the target specific clone3 syscall should not in case that gets
> > exposed to users.
> >
>
> The arg modification would be done only internally by __clone_internal,
> if we ever export __clone3 it will not mess with stack alignment (my
> idea is to remove it from x86_64 as well).

Is there a bug for the signal handling issue?


-- 
H.J.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 16:55                 ` Adhemerval Zanella Netto
  2022-11-03 20:55                   ` H.J. Lu
@ 2022-11-03 21:22                   ` Adhemerval Zanella Netto
  2022-11-03 21:58                     ` H.J. Lu
  1 sibling, 1 reply; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-11-03 21:22 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Christian Brauner, H.J. Lu



On 03/11/22 13:55, Adhemerval Zanella Netto wrote:
> 
> 
> On 03/11/22 13:52, Szabolcs Nagy wrote:
>> The 11/03/2022 13:39, Adhemerval Zanella Netto wrote:
>>>
>>>
>>> On 03/11/22 13:31, Szabolcs Nagy wrote:
>>>> The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
>>>>>
>>>>>
>>>>> On 03/11/22 11:01, Szabolcs Nagy wrote:
>>>>>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
>>>>>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
>>>>>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
>>>>>>>>> It follow the internal signature:
>>>>>>>>>
>>>>>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>>>>>>>>>  int (*__func) (void *__arg), void *__arg);
>>>>>>>>>
>>>>>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
>>>>>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
>>>>>>>>
>>>>>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
>>>>>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
>>>>>>>
>>>>>>> Right, I think it is worth to document the function semantic
>>>>>>> properly at least on its internal header (include/clone_internal.h).
>>>>>>> H.J also added a new clone3.h headers, which is not currently installed
>>>>>>> that I am inclined to just remove it from now.  We might reinstate 
>>>>>>> if/when we decide to provide the clone3 as an ABI.
>>>>>>>
>>>>>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
>>>>>>> interface, where EINVAL is also returned for 0 function argument.
>>>>>>
>>>>>> ok.
>>>>>>
>>>>>>>>
>>>>>>>> and aligning sp in the child fails if signals are allowed there
>>>>>>>> (pthreads does not allow signals now, direct callers might).
>>>>>>>> i dont know if that's a concert (or if unaligned stack is
>>>>>>>> something we should fix up in clone3).
>>>>>>>
>>>>>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
>>>>>>> think it better to just return EINVAL for unaligned stacks and avoid
>>>>>>> to change the stack pointer in the created thread.
>>>>>>
>>>>>> long time ago linux did that on aarch64, but it was removed:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
>>>>>>
>>>>>> i think in clone3 the kernel should have aligned (it knows
>>>>>> the bounds now), doing it in the userspace wrapper is weird
>>>>>> (should we adjust the stack size?). and not doing it at all
>>>>>> makes clone3 hard to use portably (user has to know target
>>>>>> specific pcs requirements).
>>>>>>
>>>>>> not sure what's the best way forward.
>>>>>
>>>>> I think the stack size won't matter much here, at least not from
>>>>> kernel point of view (the resulting stack size will most likely
>>>>> be page aligned anyway).  But I think this kernel commit makes a good
>>>>> point that silently adjusting the stack in userland is not the
>>>>> correct approach, I think H.J has done to make it consistent with
>>>>> glibc clone implementation which does it.
>>>>>
>>>>> IMHO the best approach would to just remove the stack alignment,
>>>>> since it incurs the signal handling issue.
>>>>
>>>> current generic clone callers dont align the stack and
>>>> e.g. unaligned pthread custom stack should work.
>>>>
>>>> so we have to do arch specific stack alignment somewhere,
>>>> maybe in pthread_create?
>>>
>>> I am thinking on __clone_internal, where if an unaligned stack is
>>> used it creates a new clone_args struct with adjust arguments.  It
>>> can adjust the struct in place (not sure which is better).
>>
>> if the api is not exposed, then i think the arg can be modified
>> in place. (if clone3 api is exposed to users then we should not
>> modify user structs unless the clone3 api contract explicitly
>> allows this.)
>>
>> either aligning in pthread_create or __clone_internal works for me,
>> the target specific clone3 syscall should not in case that gets
>> exposed to users.
>>
> 
> The arg modification would be done only internally by __clone_internal,
> if we ever export __clone3 it will not mess with stack alignment (my
> idea is to remove it from x86_64 as well).

All the internal usage of __clone_internal are done with all signal masked,
so aligning the stack is currently safe.  However, I still think moving out
the stack alignment of __clone3 is still a net gain: it remove an
implementation detail (block/unblock signals) and simplifies the arch-specific
code.

However it makes a possible libc wrap clunky, the caller will need to know
the ABI stack alignment prior to the call since kernel does not automatically
align the stack.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 20:55                   ` H.J. Lu
@ 2022-11-03 21:28                     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-11-03 21:28 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Szabolcs Nagy, libc-alpha, Christian Brauner



On 03/11/22 17:55, H.J. Lu wrote:
> On Thu, Nov 3, 2022 at 9:55 AM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 03/11/22 13:52, Szabolcs Nagy wrote:
>>> The 11/03/2022 13:39, Adhemerval Zanella Netto wrote:
>>>>
>>>>
>>>> On 03/11/22 13:31, Szabolcs Nagy wrote:
>>>>> The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
>>>>>>
>>>>>>
>>>>>> On 03/11/22 11:01, Szabolcs Nagy wrote:
>>>>>>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
>>>>>>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
>>>>>>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
>>>>>>>>>> It follow the internal signature:
>>>>>>>>>>
>>>>>>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>>>>>>>>>>  int (*__func) (void *__arg), void *__arg);
>>>>>>>>>>
>>>>>>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
>>>>>>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
>>>>>>>>>
>>>>>>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
>>>>>>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
>>>>>>>>
>>>>>>>> Right, I think it is worth to document the function semantic
>>>>>>>> properly at least on its internal header (include/clone_internal.h).
>>>>>>>> H.J also added a new clone3.h headers, which is not currently installed
>>>>>>>> that I am inclined to just remove it from now.  We might reinstate
>>>>>>>> if/when we decide to provide the clone3 as an ABI.
>>>>>>>>
>>>>>>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
>>>>>>>> interface, where EINVAL is also returned for 0 function argument.
>>>>>>>
>>>>>>> ok.
>>>>>>>
>>>>>>>>>
>>>>>>>>> and aligning sp in the child fails if signals are allowed there
>>>>>>>>> (pthreads does not allow signals now, direct callers might).
>>>>>>>>> i dont know if that's a concert (or if unaligned stack is
>>>>>>>>> something we should fix up in clone3).
>>>>>>>>
>>>>>>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
>>>>>>>> think it better to just return EINVAL for unaligned stacks and avoid
>>>>>>>> to change the stack pointer in the created thread.
>>>>>>>
>>>>>>> long time ago linux did that on aarch64, but it was removed:
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
>>>>>>>
>>>>>>> i think in clone3 the kernel should have aligned (it knows
>>>>>>> the bounds now), doing it in the userspace wrapper is weird
>>>>>>> (should we adjust the stack size?). and not doing it at all
>>>>>>> makes clone3 hard to use portably (user has to know target
>>>>>>> specific pcs requirements).
>>>>>>>
>>>>>>> not sure what's the best way forward.
>>>>>>
>>>>>> I think the stack size won't matter much here, at least not from
>>>>>> kernel point of view (the resulting stack size will most likely
>>>>>> be page aligned anyway).  But I think this kernel commit makes a good
>>>>>> point that silently adjusting the stack in userland is not the
>>>>>> correct approach, I think H.J has done to make it consistent with
>>>>>> glibc clone implementation which does it.
>>>>>>
>>>>>> IMHO the best approach would to just remove the stack alignment,
>>>>>> since it incurs the signal handling issue.
>>>>>
>>>>> current generic clone callers dont align the stack and
>>>>> e.g. unaligned pthread custom stack should work.
>>>>>
>>>>> so we have to do arch specific stack alignment somewhere,
>>>>> maybe in pthread_create?
>>>>
>>>> I am thinking on __clone_internal, where if an unaligned stack is
>>>> used it creates a new clone_args struct with adjust arguments.  It
>>>> can adjust the struct in place (not sure which is better).
>>>
>>> if the api is not exposed, then i think the arg can be modified
>>> in place. (if clone3 api is exposed to users then we should not
>>> modify user structs unless the clone3 api contract explicitly
>>> allows this.)
>>>
>>> either aligning in pthread_create or __clone_internal works for me,
>>> the target specific clone3 syscall should not in case that gets
>>> exposed to users.
>>>
>>
>> The arg modification would be done only internally by __clone_internal,
>> if we ever export __clone3 it will not mess with stack alignment (my
>> idea is to remove it from x86_64 as well).
> 
> Is there a bug for the signal handling issue?

No with current usage, on both pthread_create and posix_spawn glibc mask
all internal signals (including internal ones).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 21:22                   ` Adhemerval Zanella Netto
@ 2022-11-03 21:58                     ` H.J. Lu
  2022-11-04 12:32                       ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 28+ messages in thread
From: H.J. Lu @ 2022-11-03 21:58 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: Szabolcs Nagy, libc-alpha, Christian Brauner

On Thu, Nov 3, 2022 at 2:22 PM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 03/11/22 13:55, Adhemerval Zanella Netto wrote:
> >
> >
> > On 03/11/22 13:52, Szabolcs Nagy wrote:
> >> The 11/03/2022 13:39, Adhemerval Zanella Netto wrote:
> >>>
> >>>
> >>> On 03/11/22 13:31, Szabolcs Nagy wrote:
> >>>> The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
> >>>>>
> >>>>>
> >>>>> On 03/11/22 11:01, Szabolcs Nagy wrote:
> >>>>>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
> >>>>>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
> >>>>>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
> >>>>>>>>> It follow the internal signature:
> >>>>>>>>>
> >>>>>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
> >>>>>>>>>  int (*__func) (void *__arg), void *__arg);
> >>>>>>>>>
> >>>>>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
> >>>>>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
> >>>>>>>>
> >>>>>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
> >>>>>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
> >>>>>>>
> >>>>>>> Right, I think it is worth to document the function semantic
> >>>>>>> properly at least on its internal header (include/clone_internal.h).
> >>>>>>> H.J also added a new clone3.h headers, which is not currently installed
> >>>>>>> that I am inclined to just remove it from now.  We might reinstate
> >>>>>>> if/when we decide to provide the clone3 as an ABI.
> >>>>>>>
> >>>>>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
> >>>>>>> interface, where EINVAL is also returned for 0 function argument.
> >>>>>>
> >>>>>> ok.
> >>>>>>
> >>>>>>>>
> >>>>>>>> and aligning sp in the child fails if signals are allowed there
> >>>>>>>> (pthreads does not allow signals now, direct callers might).
> >>>>>>>> i dont know if that's a concert (or if unaligned stack is
> >>>>>>>> something we should fix up in clone3).
> >>>>>>>
> >>>>>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
> >>>>>>> think it better to just return EINVAL for unaligned stacks and avoid
> >>>>>>> to change the stack pointer in the created thread.
> >>>>>>
> >>>>>> long time ago linux did that on aarch64, but it was removed:
> >>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
> >>>>>>
> >>>>>> i think in clone3 the kernel should have aligned (it knows
> >>>>>> the bounds now), doing it in the userspace wrapper is weird
> >>>>>> (should we adjust the stack size?). and not doing it at all
> >>>>>> makes clone3 hard to use portably (user has to know target
> >>>>>> specific pcs requirements).
> >>>>>>
> >>>>>> not sure what's the best way forward.
> >>>>>
> >>>>> I think the stack size won't matter much here, at least not from
> >>>>> kernel point of view (the resulting stack size will most likely
> >>>>> be page aligned anyway).  But I think this kernel commit makes a good
> >>>>> point that silently adjusting the stack in userland is not the
> >>>>> correct approach, I think H.J has done to make it consistent with
> >>>>> glibc clone implementation which does it.
> >>>>>
> >>>>> IMHO the best approach would to just remove the stack alignment,
> >>>>> since it incurs the signal handling issue.
> >>>>
> >>>> current generic clone callers dont align the stack and
> >>>> e.g. unaligned pthread custom stack should work.
> >>>>
> >>>> so we have to do arch specific stack alignment somewhere,
> >>>> maybe in pthread_create?
> >>>
> >>> I am thinking on __clone_internal, where if an unaligned stack is
> >>> used it creates a new clone_args struct with adjust arguments.  It
> >>> can adjust the struct in place (not sure which is better).
> >>
> >> if the api is not exposed, then i think the arg can be modified
> >> in place. (if clone3 api is exposed to users then we should not
> >> modify user structs unless the clone3 api contract explicitly
> >> allows this.)
> >>
> >> either aligning in pthread_create or __clone_internal works for me,
> >> the target specific clone3 syscall should not in case that gets
> >> exposed to users.
> >>
> >
> > The arg modification would be done only internally by __clone_internal,
> > if we ever export __clone3 it will not mess with stack alignment (my
> > idea is to remove it from x86_64 as well).
>
> All the internal usage of __clone_internal are done with all signal masked,
> so aligning the stack is currently safe.  However, I still think moving out
> the stack alignment of __clone3 is still a net gain: it remove an
> implementation detail (block/unblock signals) and simplifies the arch-specific
> code.
>
> However it makes a possible libc wrap clunky, the caller will need to know
> the ABI stack alignment prior to the call since kernel does not automatically
> align the stack.

For the internal clone3, we can drop stack alignment adjustment.  The
internal users are responsible for correct stack alignment.  If there is
a public clone3 wrapper, it should adjust stack alignment.

-- 
H.J.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] aarch64: Add the clone3 wrapper
  2022-11-03 21:58                     ` H.J. Lu
@ 2022-11-04 12:32                       ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 28+ messages in thread
From: Adhemerval Zanella Netto @ 2022-11-04 12:32 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Szabolcs Nagy, libc-alpha, Christian Brauner



On 03/11/22 18:58, H.J. Lu wrote:
> On Thu, Nov 3, 2022 at 2:22 PM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 03/11/22 13:55, Adhemerval Zanella Netto wrote:
>>>
>>>
>>> On 03/11/22 13:52, Szabolcs Nagy wrote:
>>>> The 11/03/2022 13:39, Adhemerval Zanella Netto wrote:
>>>>>
>>>>>
>>>>> On 03/11/22 13:31, Szabolcs Nagy wrote:
>>>>>> The 11/03/2022 13:22, Adhemerval Zanella Netto wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/11/22 11:01, Szabolcs Nagy wrote:
>>>>>>>> The 11/03/2022 10:15, Adhemerval Zanella Netto wrote:
>>>>>>>>> On 02/11/22 09:12, Szabolcs Nagy wrote:
>>>>>>>>>> The 09/30/2022 16:26, Adhemerval Zanella via Libc-alpha wrote:
>>>>>>>>>>> It follow the internal signature:
>>>>>>>>>>>
>>>>>>>>>>>   extern int clone3 (struct clone_args *__cl_args, size_t __size,
>>>>>>>>>>>  int (*__func) (void *__arg), void *__arg);
>>>>>>>>>>>
>>>>>>>>>>> And x86_64 semantics to return EINVAL if either cl_args or func
>>>>>>>>>>> is NULL.  The stack is 16-byte aligned prior executing func.
>>>>>>>>>>
>>>>>>>>>> "x86_64 semantics" sounds wrong: maybe this should be documented?
>>>>>>>>>> i'd expect 0 cl_args/func to be UB like in most posix apis.
>>>>>>>>>
>>>>>>>>> Right, I think it is worth to document the function semantic
>>>>>>>>> properly at least on its internal header (include/clone_internal.h).
>>>>>>>>> H.J also added a new clone3.h headers, which is not currently installed
>>>>>>>>> that I am inclined to just remove it from now.  We might reinstate
>>>>>>>>> if/when we decide to provide the clone3 as an ABI.
>>>>>>>>>
>>>>>>>>> And returning EINVAL for 0 cl_args/func aligns with our exported clone
>>>>>>>>> interface, where EINVAL is also returned for 0 function argument.
>>>>>>>>
>>>>>>>> ok.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> and aligning sp in the child fails if signals are allowed there
>>>>>>>>>> (pthreads does not allow signals now, direct callers might).
>>>>>>>>>> i dont know if that's a concert (or if unaligned stack is
>>>>>>>>>> something we should fix up in clone3).
>>>>>>>>>
>>>>>>>>> It was overlooked on initial x86_64 clone3 implementation as well.  I
>>>>>>>>> think it better to just return EINVAL for unaligned stacks and avoid
>>>>>>>>> to change the stack pointer in the created thread.
>>>>>>>>
>>>>>>>> long time ago linux did that on aarch64, but it was removed:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6d9a52543338603e25e71e0e4942f05dae0dd8a
>>>>>>>>
>>>>>>>> i think in clone3 the kernel should have aligned (it knows
>>>>>>>> the bounds now), doing it in the userspace wrapper is weird
>>>>>>>> (should we adjust the stack size?). and not doing it at all
>>>>>>>> makes clone3 hard to use portably (user has to know target
>>>>>>>> specific pcs requirements).
>>>>>>>>
>>>>>>>> not sure what's the best way forward.
>>>>>>>
>>>>>>> I think the stack size won't matter much here, at least not from
>>>>>>> kernel point of view (the resulting stack size will most likely
>>>>>>> be page aligned anyway).  But I think this kernel commit makes a good
>>>>>>> point that silently adjusting the stack in userland is not the
>>>>>>> correct approach, I think H.J has done to make it consistent with
>>>>>>> glibc clone implementation which does it.
>>>>>>>
>>>>>>> IMHO the best approach would to just remove the stack alignment,
>>>>>>> since it incurs the signal handling issue.
>>>>>>
>>>>>> current generic clone callers dont align the stack and
>>>>>> e.g. unaligned pthread custom stack should work.
>>>>>>
>>>>>> so we have to do arch specific stack alignment somewhere,
>>>>>> maybe in pthread_create?
>>>>>
>>>>> I am thinking on __clone_internal, where if an unaligned stack is
>>>>> used it creates a new clone_args struct with adjust arguments.  It
>>>>> can adjust the struct in place (not sure which is better).
>>>>
>>>> if the api is not exposed, then i think the arg can be modified
>>>> in place. (if clone3 api is exposed to users then we should not
>>>> modify user structs unless the clone3 api contract explicitly
>>>> allows this.)
>>>>
>>>> either aligning in pthread_create or __clone_internal works for me,
>>>> the target specific clone3 syscall should not in case that gets
>>>> exposed to users.
>>>>
>>>
>>> The arg modification would be done only internally by __clone_internal,
>>> if we ever export __clone3 it will not mess with stack alignment (my
>>> idea is to remove it from x86_64 as well).
>>
>> All the internal usage of __clone_internal are done with all signal masked,
>> so aligning the stack is currently safe.  However, I still think moving out
>> the stack alignment of __clone3 is still a net gain: it remove an
>> implementation detail (block/unblock signals) and simplifies the arch-specific
>> code.
>>
>> However it makes a possible libc wrap clunky, the caller will need to know
>> the ABI stack alignment prior to the call since kernel does not automatically
>> align the stack.
> 
> For the internal clone3, we can drop stack alignment adjustment.  The
> internal users are responsible for correct stack alignment.  If there is
> a public clone3 wrapper, it should adjust stack alignment.
>

Ok, I will a documentation that __clone3 does not align the stack.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/9] linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL
  2022-09-30 19:26 ` [PATCH v2 1/9] linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL Adhemerval Zanella
@ 2023-01-11 21:06   ` Carlos O'Donell
  0 siblings, 0 replies; 28+ messages in thread
From: Carlos O'Donell @ 2023-01-11 21:06 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Christian Brauner

On 9/30/22 15:26, Adhemerval Zanella via Libc-alpha wrote:
> There is no need to issue another sigaction is the disposition is

Please post v3 with corrected commit message.

s/is the/if the/g

> already SIG_DFL.
> 
> Checked on x86_64-linux-gnu.
> ---
>  sysdeps/unix/sysv/linux/spawni.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index ee843a2247..65ee03c804 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c
> @@ -129,7 +129,7 @@ __spawni_child (void *arguments)
>  	  else
>  	    {
>  	      __libc_sigaction (sig, 0, &sa);
> -	      if (sa.sa_handler == SIG_IGN)
> +	      if (sa.sa_handler == SIG_IGN || sa.sa_handler == SIG_DFL)

OK. Agreed, since we're going to set SIG_DFL and then call sigaction.

>  		continue;
>  	      sa.sa_handler = SIG_DFL;
>  	    }

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/9] linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn
  2022-09-30 19:26 ` [PATCH v2 2/9] linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn Adhemerval Zanella
@ 2023-01-11 21:06   ` Carlos O'Donell
  0 siblings, 0 replies; 28+ messages in thread
From: Carlos O'Donell @ 2023-01-11 21:06 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Christian Brauner

On 9/30/22 15:26, Adhemerval Zanella via Libc-alpha wrote:
> The clone3 flag resets all signal handlers of the child not set to
> SIG_IGN to SIG_DFL.  It allows to skip most of the sigaction calls
> to setup child signal handling, where previously a posix_spawn
> has to issue 2 times NSIG sigaction calls (one to obtain the current

Please post v3.

s/has/had/g

> disposition and another to set either SIG_DFL or SIG_IGN).
> 
> The expection is POSIX_SPAWN_SETSIGDEF the child still setup the
> signal for the case the disposition is SIG_IGN.

Suggest:

With POSIX_SPAWN_SETSIGDEF the child will setup the signal for the case where
the disposition is SIG_IGN.

> 
> It also need to handle the fallback where clone3 is not available,
> to set the fallback in child.  This is done by splitting of
> __clone_internal_fallback from __clone_internal.

Suggest:

The code must handle the fallback where clone3 is not available. This is done
by splitting __clone_internal_fallback from __clone_internal.

> 
> Checked on x86_64-linux-gnu.
> ---
>  include/clone_internal.h                 |   5 +
>  posix/Makefile                           |   3 +-
>  posix/tst-spawn7.c                       | 179 +++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/clone-internal.c |  37 +++--
>  sysdeps/unix/sysv/linux/clone3.h         |   6 +
>  sysdeps/unix/sysv/linux/spawni.c         |  31 ++--
>  6 files changed, 236 insertions(+), 25 deletions(-)
>  create mode 100644 posix/tst-spawn7.c
> 
> diff --git a/include/clone_internal.h b/include/clone_internal.h
> index 4b23ef33ce..320640e64b 100644
> --- a/include/clone_internal.h
> +++ b/include/clone_internal.h
> @@ -7,6 +7,11 @@ extern __typeof (clone3) __clone3;
>     -1 with ENOSYS, fall back to clone or clone2.  */
>  extern int __clone_internal (struct clone_args *__cl_args,
>  			     int (*__func) (void *__arg), void *__arg);
> +/* The fallback code which calls clone/clone2 based on clone3 arguments.  */
> +extern int __clone_internal_fallback (struct clone_args *__cl_args,
> +				      int (*__func) (void *__arg),
> +				      void *__arg)

OK.

> +     attribute_hidden;
>  
>  #ifndef _ISOMAC
>  libc_hidden_proto (__clone3)
> diff --git a/posix/Makefile b/posix/Makefile
> index d1df7c27cb..7887eb1c8b 100644
> --- a/posix/Makefile
> +++ b/posix/Makefile
> @@ -109,7 +109,7 @@ tests		:= test-errno tstgetopt testfnm runtests runptests \
>  		   tst-glob-tilde test-ssize-max tst-spawn4 bug-regex37 \
>  		   bug-regex38 tst-regcomp-truncated tst-spawn-chdir \
>  		   tst-wordexp-nocmd tst-execveat tst-spawn5 \
> -		   tst-sched_getaffinity tst-spawn6
> +		   tst-sched_getaffinity tst-spawn6 tst-spawn7

OK. New test.

>  
>  # Test for the glob symbol version that was replaced in glibc 2.27.
>  ifeq ($(have-GLIBC_2.26)$(build-shared),yesyes)
> @@ -291,6 +291,7 @@ tst-spawn-ARGS = -- $(host-test-program-cmd)
>  tst-spawn-static-ARGS = $(tst-spawn-ARGS)
>  tst-spawn5-ARGS = -- $(host-test-program-cmd)
>  tst-spawn6-ARGS = -- $(host-test-program-cmd)
> +tst-spawn7-ARGS = -- $(host-test-program-cmd)

OK.

>  tst-dir-ARGS = `pwd` `cd $(common-objdir)/$(subdir); pwd` `cd $(common-objdir); pwd` $(objpfx)tst-dir
>  tst-chmod-ARGS = $(objdir)
>  tst-vfork3-ARGS = --test-dir=$(objpfx)
> diff --git a/posix/tst-spawn7.c b/posix/tst-spawn7.c
> new file mode 100644
> index 0000000000..7cb7347c58
> --- /dev/null
> +++ b/posix/tst-spawn7.c
> @@ -0,0 +1,179 @@
> +/* Tests for posix_spawn signal handling.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <assert.h>
> +#include <getopt.h>
> +#include <spawn.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <support/check.h>
> +#include <support/xsignal.h>
> +#include <support/xunistd.h>
> +#include <unistd.h>
> +
> +/* Nonzero if the program gets called via `exec'.  */
> +#define CMDLINE_OPTIONS \
> +  { "restart", no_argument, &restart, 1 },
> +static int restart;
> +
> +/* Hold the four initial argument used to respawn the process, plus the extra
> +   '--direct', '--restart', the check type ('SIG_IGN' or 'SIG_DFL'), and a
> +   final NULL.  */
> +static char *spargs[8];
> +static int check_type_argc;
> +
> +/* Called on process re-execution.  */
> +_Noreturn static void
> +handle_restart (int argc, char *argv[])
> +{
> +  assert (argc == 1);
> +
> +  if (strcmp (argv[0], "SIG_DFL") == 0)
> +    {
> +      for (int i = 1; i < NSIG; i++)
> +	{
> +	  struct sigaction sa;
> +	  int r = sigaction (i, NULL, &sa);
> +	  /* Skip internal signals (such as SIGCANCEL).  */
> +	  if (r == -1)
> +	    continue;
> +	  TEST_VERIFY_EXIT (sa.sa_handler == SIG_DFL);
> +	}
> +      exit (EXIT_SUCCESS);
> +    }
> +  else if (strcmp (argv[0], "SIG_IGN") == 0)
> +    {
> +      for (int i = 1; i < NSIG; i++)
> +	{
> +	  struct sigaction sa;
> +	  int r = sigaction (i, NULL, &sa);
> +	  /* Skip internal signals (such as SIGCANCEL).  */
> +	  if (r == -1)
> +	    continue;
> +	  if (i == SIGUSR1 || i == SIGUSR2)
> +	    TEST_VERIFY_EXIT (sa.sa_handler == SIG_IGN);
> +	  else
> +	    TEST_VERIFY_EXIT (sa.sa_handler == SIG_DFL);
> +	}
> +      exit (EXIT_SUCCESS);
> +    }
> +
> +  exit (EXIT_FAILURE);
> +}
> +
> +static void
> +spawn_signal_test (const char *type, const posix_spawnattr_t *attr)
> +{
> +  spargs[check_type_argc] = (char*) type;
> +
> +  pid_t pid;
> +  int status;
> +
> +  TEST_COMPARE (posix_spawn (&pid, spargs[0], NULL, attr, spargs, environ), 0);
> +  TEST_COMPARE (xwaitpid (pid, &status, 0), pid);
> +  TEST_VERIFY (WIFEXITED (status));
> +  TEST_VERIFY (!WIFSIGNALED (status));
> +  TEST_COMPARE (WEXITSTATUS (status), 0);
> +}
> +
> +static void
> +dummy_sa_handler (int)
> +{
> +}
> +
> +static void
> +do_test_signals (void)
> +{
> +  {
> +    /* Check if all signals handler are set to SIG_DFL on spawned process.  */
> +    spawn_signal_test ("SIG_DFL", NULL);
> +  }
> +
> +  {
> +    /* Same as before, but set SIGUSR1 and SIGUSR2 to a handler different than
> +       SIG_IGN or SIG_DFL.  */
> +    struct sigaction sa = { 0 };
> +    sa.sa_handler = dummy_sa_handler;
> +    xsigaction (SIGUSR1, &sa, NULL);
> +    xsigaction (SIGUSR2, &sa, NULL);
> +    spawn_signal_test ("SIG_DFL", NULL);
> +  }
> +
> +  {
> +    /* Check if SIG_IGN is keep as is.  */
> +    struct sigaction sa = { 0 };
> +    sa.sa_handler = SIG_IGN;
> +    xsigaction (SIGUSR1, &sa, NULL);
> +    xsigaction (SIGUSR2, &sa, NULL);
> +    spawn_signal_test ("SIG_IGN", NULL);
> +  }
> +
> +  {
> +    /* Check if SIG_IGN handlers are set to SIG_DFL.  */
> +    posix_spawnattr_t attr;
> +    posix_spawnattr_init (&attr);
> +    sigset_t mask;
> +    sigemptyset (&mask);
> +    sigaddset (&mask, SIGUSR1);
> +    sigaddset (&mask, SIGUSR2);
> +    posix_spawnattr_setsigdefault (&attr, &mask);
> +    posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSIGDEF);
> +    spawn_signal_test ("SIG_DFL", &attr);
> +    posix_spawnattr_destroy (&attr);
> +  }
> +}
> +
> +static int
> +do_test (int argc, char *argv[])
> +{
> +  /* We must have either:
> +
> +     - one or four parameters if called initially:
> +       + argv[1]: path for ld.so        optional
> +       + argv[2]: "--library-path"      optional
> +       + argv[3]: the library path      optional
> +       + argv[4]: the application name
> +
> +     - six parameters left if called through re-execution:
> +       + argv[1]: the application name
> +       + argv[2]: check SIG_IGN/SIG_DFL.
> +
> +     * When built with --enable-hardcoded-path-in-tests or issued without
> +       using the loader directly.  */
> +
> +  if (restart)
> +    handle_restart (argc - 1, &argv[1]);
> +
> +  TEST_VERIFY_EXIT (argc == 2 || argc == 5);
> +
> +  int i;
> +  for (i = 0; i < argc - 1; i++)
> +    spargs[i] = argv[i + 1];
> +  spargs[i++] = (char *) "--direct";
> +  spargs[i++] = (char *) "--restart";
> +  check_type_argc = i++;
> +  spargs[i] = NULL;
> +
> +
> +  do_test_signals ();
> +
> +  return 0;
> +}

OK.

> +
> +#define TEST_FUNCTION_ARGV do_test
> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> index a71effcbd3..7bc991e033 100644
> --- a/sysdeps/unix/sysv/linux/clone-internal.c
> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> @@ -44,27 +44,15 @@ _Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
>  		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
>  
>  int
> -__clone_internal (struct clone_args *cl_args,
> -		  int (*func) (void *arg), void *arg)
> +__clone_internal_fallback (struct clone_args *cl_args,
> +			   int (*func) (void *arg), void *arg)
>  {
> -  int ret;
> -#ifdef HAVE_CLONE3_WRAPPER
> -  /* Try clone3 first.  */
> -  int saved_errno = errno;
> -  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> -  if (ret != -1 || errno != ENOSYS)
> -    return ret;
> -
> -  /* NB: Restore errno since errno may be checked against non-zero
> -     return value.  */
> -  __set_errno (saved_errno);
> -#endif
> -
>    /* Map clone3 arguments to clone arguments.  NB: No need to check
>       invalid clone3 specific bits in flags nor exit_signal since this
>       is an internal function.  */
>    int flags = cl_args->flags | cl_args->exit_signal;
>    void *stack = cast_to_pointer (cl_args->stack);
> +  int ret;
>  
>  #ifdef __ia64__
>    ret = __clone2 (func, stack, cl_args->stack_size,
> @@ -88,4 +76,23 @@ __clone_internal (struct clone_args *cl_args,
>    return ret;
>  }
>  
> +
> +int
> +__clone_internal (struct clone_args *cl_args,
> +		  int (*func) (void *arg), void *arg)
> +{
> +#ifdef HAVE_CLONE3_WRAPPER
> +  int saved_errno = errno;
> +  int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> +  if (ret != -1 || errno != ENOSYS)
> +    return ret;
> +
> +  /* NB: Restore errno since errno may be checked against non-zero
> +     return value.  */
> +  __set_errno (saved_errno);
> +#endif
> +
> +  return __clone_internal_fallback (cl_args, func, arg);
> +}

OK.

> +
>  libc_hidden_def (__clone_internal)
> diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
> index 889014a6a9..a150363f89 100644
> --- a/sysdeps/unix/sysv/linux/clone3.h
> +++ b/sysdeps/unix/sysv/linux/clone3.h
> @@ -25,6 +25,12 @@
>  
>  __BEGIN_DECLS
>  
> +/* Flags for the clone3 syscall.  */
> +#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and
> +					      reset to SIG_DFL.  */
> +#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given
> +					    the right permissions.  */

OK.

> +
>  /* The unsigned 64-bit and 8-byte aligned integer type.  */
>  typedef __U64_TYPE __aligned_uint64_t __attribute__ ((__aligned__ (8)));
>  
> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index 65ee03c804..ebc3c3e521 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c
> @@ -66,6 +66,7 @@ struct posix_spawn_args
>    ptrdiff_t argc;
>    char *const *envp;
>    int xflags;
> +  bool use_clone3;

OK.

>    int err;
>  };
>  
> @@ -104,12 +105,11 @@ __spawni_child (void *arguments)
>    const posix_spawnattr_t *restrict attr = args->attr;
>    const posix_spawn_file_actions_t *file_actions = args->fa;
>  
> -  /* The child must ensure that no signal handler are enabled because it shared
> -     memory with parent, so the signal disposition must be either SIG_DFL or
> -     SIG_IGN.  It does by iterating over all signals and although it could
> -     possibly be more optimized (by tracking which signal potentially have a
> -     signal handler), it might requires system specific solutions (since the
> -     sigset_t data type can be very different on different architectures).  */
> +  /* The child must ensure that no signal handler is enabled because it
> +     shares memory with parent, so all signal dispositions must be either
> +     SIG_DFL or SIG_IGN.  If clone3/CLONE_CLEAR_SIGHAND is used, there is
> +     only need to set the defined signals POSIX_SPAWN_SETSIGDEF to SIG_DFL;

s/only/only the/g

> +     otherwise, the code iterates over all signals.  */
>    struct sigaction sa;
>    memset (&sa, '\0', sizeof (sa));
>  
> @@ -122,7 +122,7 @@ __spawni_child (void *arguments)
>  	{
>  	  sa.sa_handler = SIG_DFL;
>  	}
> -      else if (__sigismember (&hset, sig))
> +      else if (!args->use_clone3 && __sigismember (&hset, sig))

OK.

>  	{
>  	  if (is_internal_signal (sig))
>  	    sa.sa_handler = SIG_IGN;
> @@ -382,12 +382,25 @@ __spawnix (pid_t * pid, const char *file,
>       for instance).  */
>    struct clone_args clone_args =
>      {
> -      .flags = CLONE_VM | CLONE_VFORK,
> +      /* Unsupported flags like CLONE_CLEAR_SIGHAND will be cleared up by
> +	 __clone_internal_fallback.  */
> +      .flags = CLONE_CLEAR_SIGHAND | CLONE_VM | CLONE_VFORK,

OK.

>        .exit_signal = SIGCHLD,
>        .stack = (uintptr_t) stack,
>        .stack_size = stack_size,
>      };
> -  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
> +#ifdef HAVE_CLONE3_WRAPPER
> +  args.use_clone3 = true;
> +  new_pid = __clone3 (&clone_args, sizeof (clone_args), __spawni_child,
> +		      &args);
> +  /* clone3 was added on 5.3 and CLONE_CLEAR_SIGHAND on 5.5.  */

s/on/in/g, s/on/in/g

> +  if (new_pid == -1 && (errno == ENOSYS || errno == EINVAL))
> +#endif
> +    {
> +      args.use_clone3 = false;
> +      new_pid = __clone_internal_fallback (&clone_args, __spawni_child,
> +					   &args);

OK, and we use the fallback.

> +    }
>  
>    /* It needs to collect the case where the auxiliary process was created
>       but failed to execute the file (due either any preparation step or

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3
  2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
                   ` (9 preceding siblings ...)
  2022-10-27 16:48 ` [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella Netto
@ 2023-01-11 21:11 ` Carlos O'Donell
  10 siblings, 0 replies; 28+ messages in thread
From: Carlos O'Donell @ 2023-01-11 21:11 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Christian Brauner

On 9/30/22 15:26, Adhemerval Zanella via Libc-alpha wrote:
> Currently posix_spawn has to issue at least NSIG sigaction syscalls
> to obtain current signal disposition and more if they there are not
> either SIG_IGN or SIG_DFL.  The new clone3 CLONE_CLEAR_SIGHAND flags
> introduced with Linux 5.5 resets all signal handlers of the child not
> set to SIG_IGN to SIG_DFL, thus allowing posix_spawn to skip this
> preparation phase.
> 
> The exception is the signals defined by posix_spawnattr_setsigdefault
> when POSIX_SPAWN_SETSIGDEF is set (since they can be SIG_IGN).  In
> this case posix_spawn helper process needs to issue a sigction to 
> set the signal disposition to SIG_DFL.
> 
> The patchset also adds clone3 implementation for aarch64, powerpc64,
> s390x, riscv, arm, and mips.
> 

I would like to trim this down for glibc 2.37.

I think we can get the default implementation in place and enable this for x86_64.

Everything else that adds custom assembly should be something we commit once 2.38
opens and so we can have time to review the generated code and make sure we don't
break anything.

I'd like to see a v3 posted with just the 3 patches required to turn this on:
1/9 linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL
2/9 linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn
9/9 Linux: optimize clone3 internal usage

> Changes from v1:
>   * Adeed arm and mips clone3 implementation.
> 
> Adhemerval Zanella (9):
>   linux: Do not reset signal handler in posix_spawn if it is already
>     SIG_DFL
>   linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn
>   powerpc64le: Add the clone3 wrapper
>   aarch64: Add the clone3 wrapper
>   s390x: Add the clone3 wrapper
>   riscv: Add the clone3 wrapper
>   arm: Add the clone3 wrapper
>   mips: Add the clone3 wrapper
>   Linux: optimize clone3 internal usage
> 
>  include/clone_internal.h                      |  10 +
>  posix/Makefile                                |   3 +-
>  posix/tst-spawn7.c                            | 179 ++++++++++++++++++
>  sysdeps/unix/sysv/linux/aarch64/clone3.S      |  90 +++++++++
>  sysdeps/unix/sysv/linux/aarch64/sysdep.h      |   2 +
>  sysdeps/unix/sysv/linux/arm/clone3.S          |  80 ++++++++
>  sysdeps/unix/sysv/linux/arm/sysdep.h          |   1 +
>  sysdeps/unix/sysv/linux/clone-internal.c      |  59 ++++--
>  sysdeps/unix/sysv/linux/clone3.h              |   6 +
>  sysdeps/unix/sysv/linux/kernel-features.h     |   9 +
>  sysdeps/unix/sysv/linux/mips/clone3.S         | 147 ++++++++++++++
>  sysdeps/unix/sysv/linux/mips/sysdep.h         |   2 +
>  .../sysv/linux/powerpc/powerpc64/clone3.S     | 145 ++++++++++++++
>  sysdeps/unix/sysv/linux/powerpc/sysdep.h      |   1 +
>  sysdeps/unix/sysv/linux/riscv/clone3.S        |  83 ++++++++
>  sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
>  sysdeps/unix/sysv/linux/s390/s390-64/clone3.S |  84 ++++++++
>  sysdeps/unix/sysv/linux/s390/sysdep.h         |   1 +
>  sysdeps/unix/sysv/linux/spawni.c              |  33 +++-
>  19 files changed, 910 insertions(+), 26 deletions(-)
>  create mode 100644 posix/tst-spawn7.c
>  create mode 100644 sysdeps/unix/sysv/linux/aarch64/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/clone3.S
>  create mode 100644 sysdeps/unix/sysv/linux/s390/s390-64/clone3.S
> 

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 9/9] Linux: optimize clone3 internal usage
  2022-09-30 19:26 ` [PATCH v2 9/9] Linux: optimize clone3 internal usage Adhemerval Zanella
@ 2023-01-11 21:12   ` Carlos O'Donell
  0 siblings, 0 replies; 28+ messages in thread
From: Carlos O'Donell @ 2023-01-11 21:12 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Christian Brauner

On 9/30/22 15:26, Adhemerval Zanella via Libc-alpha wrote:
> Now that clone3 is used on more architectures, add an optimization
> to avoid calling when glibc detects that it is no supported by the
> kernel.  It also adds __ASSUME_CLONE3, which allows skip this
> optimization and issue clone3 syscall directly.

Please post v3 with changes.

Suggest:

Add an optimization to avoid calling clone3 when glibc detects that there is
no kernel support.  It also adds __ASSUME_CLONE3, which allows skipping this
optimization and issuing the clone3 syscall directly.

> 
> It does not handle the the small window between 5.3 and 5.5 for
> posix_spawn (CLONE_CLEAR_SIGHAND was added in 5.5).
> 
> Checked on x86_64-linux-gnu.
> ---
>  include/clone_internal.h                  |  5 +++++
>  sysdeps/unix/sysv/linux/clone-internal.c  | 24 ++++++++++++++++++++++-
>  sysdeps/unix/sysv/linux/kernel-features.h |  9 +++++++++
>  3 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/include/clone_internal.h b/include/clone_internal.h
> index 320640e64b..385f3c5589 100644
> --- a/include/clone_internal.h
> +++ b/include/clone_internal.h
> @@ -7,6 +7,11 @@ extern __typeof (clone3) __clone3;
>     -1 with ENOSYS, fall back to clone or clone2.  */
>  extern int __clone_internal (struct clone_args *__cl_args,
>  			     int (*__func) (void *__arg), void *__arg);
> +/* clone3 wrapper with a sticky check to avoid re-issue the syscall if
> +   it fails with ENOSYS.  */

s/re-issue/re-issuing/g

> +extern int __clone3_internal (struct clone_args *cl_args,
> +			      int (*func) (void *args), void *arg)
> +     attribute_hidden;
>  /* The fallback code which calls clone/clone2 based on clone3 arguments.  */
>  extern int __clone_internal_fallback (struct clone_args *__cl_args,
>  				      int (*__func) (void *__arg),
> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> index 7bc991e033..39d76733db 100644
> --- a/sysdeps/unix/sysv/linux/clone-internal.c
> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> @@ -76,6 +76,28 @@ __clone_internal_fallback (struct clone_args *cl_args,
>    return ret;
>  }
>  
> +int
> +__clone3_internal (struct clone_args *cl_args, int (*func) (void *args),
> +		   void *arg)
> +{
> +#ifdef HAVE_CLONE3_WRAPPER
> +# if __ASSUME_CLONE3
> +  return __clone3 (cl_args, sizeof (*cl_args), func, arg);
> +# else
> +  static int clone3_supported = 1;
> +  if (atomic_load_relaxed (&clone3_supported) == 1)
> +    {
> +      int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> +      if (ret != -1 || errno != ENOSYS)
> +	return ret;
> +
> +      atomic_store_relaxed (&clone3_supported, 0);
> +    }
> +# endif
> +#endif
> +  __set_errno (ENOSYS);
> +  return -1;
> +}
>  
>  int
>  __clone_internal (struct clone_args *cl_args,
> @@ -83,7 +105,7 @@ __clone_internal (struct clone_args *cl_args,
>  {
>  #ifdef HAVE_CLONE3_WRAPPER
>    int saved_errno = errno;
> -  int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> +  int ret = __clone3_internal (cl_args, func, arg);
>    if (ret != -1 || errno != ENOSYS)
>      return ret;
>  
> diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
> index 74adc3956b..4ecd08a98f 100644
> --- a/sysdeps/unix/sysv/linux/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/kernel-features.h
> @@ -236,4 +236,13 @@
>  # define __ASSUME_FUTEX_LOCK_PI2 0
>  #endif
>  
> +/* The clone3 system call was introduced across on most architectures in
> +   Linux 5.3.  Not all ports implements it, so it should be used along
> +   HAVE_CLONE3_WRAPPER define.  */
> +#if __LINUX_KERNEL_VERSION >= 0x050300
> +# define __ASSUME_CLONE3 1
> +#else
> +# define __ASSUME_CLONE3 0
> +#endif
> +
>  #endif /* kernel-features.h */

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-01-11 21:15 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-30 19:26 [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella
2022-09-30 19:26 ` [PATCH v2 1/9] linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL Adhemerval Zanella
2023-01-11 21:06   ` Carlos O'Donell
2022-09-30 19:26 ` [PATCH v2 2/9] linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn Adhemerval Zanella
2023-01-11 21:06   ` Carlos O'Donell
2022-09-30 19:26 ` [PATCH v2 3/9] powerpc64le: Add the clone3 wrapper Adhemerval Zanella
2022-09-30 19:26 ` [PATCH v2 4/9] aarch64: " Adhemerval Zanella
2022-11-02 12:12   ` Szabolcs Nagy
2022-11-03 13:15     ` Adhemerval Zanella Netto
2022-11-03 14:01       ` Szabolcs Nagy
2022-11-03 16:22         ` Adhemerval Zanella Netto
2022-11-03 16:31           ` Szabolcs Nagy
2022-11-03 16:39             ` Adhemerval Zanella Netto
2022-11-03 16:52               ` Szabolcs Nagy
2022-11-03 16:55                 ` Adhemerval Zanella Netto
2022-11-03 20:55                   ` H.J. Lu
2022-11-03 21:28                     ` Adhemerval Zanella Netto
2022-11-03 21:22                   ` Adhemerval Zanella Netto
2022-11-03 21:58                     ` H.J. Lu
2022-11-04 12:32                       ` Adhemerval Zanella Netto
2022-09-30 19:26 ` [PATCH v2 5/9] s390x: " Adhemerval Zanella
2022-09-30 19:26 ` [PATCH v2 6/9] riscv: " Adhemerval Zanella
2022-09-30 19:26 ` [PATCH v2 7/9] arm: " Adhemerval Zanella
2022-09-30 19:26 ` [PATCH v2 8/9] mips: " Adhemerval Zanella
2022-09-30 19:26 ` [PATCH v2 9/9] Linux: optimize clone3 internal usage Adhemerval Zanella
2023-01-11 21:12   ` Carlos O'Donell
2022-10-27 16:48 ` [PATCH v2 0/9] Optimize posix_spawn signal setup with clone3 Adhemerval Zanella Netto
2023-01-11 21:11 ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).