public inbox for libc-alpha@sourceware.org
* [PATCH v8 0/3] Add an internal wrapper for clone, clone2 and clone3
@ 2021-06-01 14:55 H.J. Lu
  2021-06-01 14:55 ` [PATCH v8 1/3] " H.J. Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: H.J. Lu @ 2021-06-01 14:55 UTC
  To: libc-alpha; +Cc: Florian Weimer, Noah Goldstein, Adhemerval Zanella

The clone3 system call provides a superset of the functionality of clone
and clone2.  It also provides a number of API improvements, including
the ability to specify the size of the child's stack area, which the
kernel can use to compute the shadow stack size when allocating the
shadow stack.  Add:

extern int __clone_internal (struct clone_args *__cl_args,
			     int (*__func) (void *__arg), void *__arg);

to provide an abstract interface for clone, clone2 and clone3.

1. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
2. Consolidate clone vs clone2 differences into a single file.
3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
4. Use only __clone_internal to clone a thread.  Since the stack size
argument for create_thread is now unconditional, always pass stack size
to create_thread.
5. Enable the public clone3 wrapper in the future after it has been
added to all targets.
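
For illustration only, here is a minimal sketch of how a clone3-style
stack description (lowest address plus size) might be mapped onto what
clone and clone2 expect.  The names demo_clone_args, demo_legacy_args
and demo_map_to_legacy are hypothetical and are not part of the patch;
the actual mapping is the one in clone-internal.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct clone_args.  */
struct demo_clone_args
{
  uint64_t flags;
  uint64_t exit_signal;
  uint64_t stack;       /* Lowest address of the stack.  */
  uint64_t stack_size;  /* Size of the stack in bytes.  */
};

/* Hypothetical holder for the arguments of the older calls.  */
struct demo_legacy_args
{
  uint64_t flags;       /* Flags with the exit signal folded in.  */
  uint64_t stack;       /* Stack address handed to the older call.  */
};

/* clone takes the initial stack pointer, so on a stack that grows
   down the size is added to the base.  clone2 takes the base and the
   size separately, so the base is passed through unchanged.  */
static struct demo_legacy_args
demo_map_to_legacy (const struct demo_clone_args *args,
                    int stack_grows_down, int have_clone2)
{
  assert (args != NULL);

  struct demo_legacy_args out;
  out.flags = args->flags | args->exit_signal;
  out.stack = args->stack;
  if (stack_grows_down && !have_clone2)
    out.stack += args->stack_size;
  return out;
}
```

Keeping the base and the size together until the last moment is what
lets callers such as create_thread stay free of per-architecture
conditionals.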

NB: A sandbox that rejects clone3 should return ENOSYS, so that the
fallback to clone or clone2 is taken:

https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5
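
A rough sketch of the ENOSYS fallback pattern follows.  It is not the
patch's code: demo_call_with_fallback and the three stub functions are
hypothetical names used only to show why the error code matters.  Only
a -1 return with errno set to ENOSYS triggers the older path; any other
failure is reported to the caller as is.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a process-creation call.  */
typedef int (*demo_call_fn) (void *ctx);

/* Try PREFERRED first.  Fall back to LEGACY only when PREFERRED fails
   with ENOSYS, restoring errno before the fallback so that a stale
   ENOSYS is not left behind for the caller.  */
static int
demo_call_with_fallback (demo_call_fn preferred, demo_call_fn legacy,
                         void *ctx)
{
  assert (preferred != NULL && legacy != NULL);

  int saved_errno = errno;
  int ret = preferred (ctx);
  if (ret != -1 || errno != ENOSYS)
    return ret;

  errno = saved_errno;
  return legacy (ctx);
}

/* Stub: behaves like a kernel without the newer system call.  */
static int
demo_unsupported (void *ctx)
{
  (void) ctx;
  errno = ENOSYS;
  return -1;
}

/* Stub: behaves like a filter that rejects the call with EPERM.  */
static int
demo_denied (void *ctx)
{
  (void) ctx;
  errno = EPERM;
  return -1;
}

/* Stub: the older call succeeds and returns a made-up child ID.  */
static int
demo_legacy (void *ctx)
{
  (void) ctx;
  return 42;
}
```

With these stubs, demo_unsupported leads to the legacy call, while
demo_denied returns -1 with EPERM and never reaches it.  That second
case is the behavior a sandbox should avoid for clone3.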

H.J. Lu (3):
  Add an internal wrapper for clone, clone2 and clone3
  x86-64: Add the clone3 wrapper
  Add static tests for __clone_internal

 include/clone_internal.h                      |  16 ++
 nptl/allocatestack.c                          |  59 +-------
 nptl/pthread_create.c                         |  38 +++--
 sysdeps/unix/sysv/linux/Makefile              |  11 +-
 sysdeps/unix/sysv/linux/clone-internal.c      |  91 ++++++++++++
 sysdeps/unix/sysv/linux/clone3.c              |   1 +
 sysdeps/unix/sysv/linux/clone3.h              |  60 ++++++++
 sysdeps/unix/sysv/linux/spawni.c              |  26 ++--
 .../sysv/linux/tst-align-clone-internal.c     |  87 +++++++++++
 sysdeps/unix/sysv/linux/tst-clone2-internal.c | 137 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-clone3-internal.c |  99 +++++++++++++
 .../unix/sysv/linux/tst-getpid1-internal.c    | 133 +++++++++++++++++
 .../sysv/linux/tst-misalign-clone-internal.c  |  86 +++++++++++
 sysdeps/unix/sysv/linux/x86_64/clone3.S       |  92 ++++++++++++
 sysdeps/unix/sysv/linux/x86_64/sysdep.h       |   2 +
 15 files changed, 850 insertions(+), 88 deletions(-)
 create mode 100644 include/clone_internal.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S

-- 
2.31.1



* [PATCH v8 1/3] Add an internal wrapper for clone, clone2 and clone3
  2021-06-01 14:55 [PATCH v8 0/3] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
@ 2021-06-01 14:55 ` H.J. Lu
  2021-06-04 12:20   ` H.J. Lu
  2021-07-13 18:54   ` Adhemerval Zanella
  2021-06-01 14:55 ` [PATCH v8 2/3] x86-64: Add the clone3 wrapper H.J. Lu
  2021-06-01 14:55 ` [PATCH v8 3/3] Add static tests for __clone_internal H.J. Lu
  2 siblings, 2 replies; 16+ messages in thread
From: H.J. Lu @ 2021-06-01 14:55 UTC
  To: libc-alpha; +Cc: Florian Weimer, Noah Goldstein, Adhemerval Zanella

The clone3 system call provides a superset of the functionality of clone
and clone2.  It also provides a number of API improvements, including
the ability to specify the size of the child's stack area, which the
kernel can use to compute the shadow stack size when allocating the
shadow stack.  Add:

extern int __clone_internal (struct clone_args *__cl_args,
			     int (*__func) (void *__arg), void *__arg);

to provide an abstract interface for clone, clone2 and clone3.

1. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
2. Consolidate clone vs clone2 differences into a single file.
3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
4. Use only __clone_internal to clone a thread.  Since the stack size
argument for create_thread is now unconditional, always pass stack size
to create_thread.
5. Enable the public clone3 wrapper in the future after it has been
added to all targets.

NB: A sandbox that rejects clone3 should return ENOSYS, so that the
fallback to clone or clone2 is taken:

https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5
---
 include/clone_internal.h                 | 16 +++++
 nptl/allocatestack.c                     | 59 ++-------------
 nptl/pthread_create.c                    | 38 +++++-----
 sysdeps/unix/sysv/linux/Makefile         |  2 +-
 sysdeps/unix/sysv/linux/clone-internal.c | 91 ++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/clone3.c         |  1 +
 sysdeps/unix/sysv/linux/clone3.h         | 60 ++++++++++++++++
 sysdeps/unix/sysv/linux/spawni.c         | 26 +++----
 8 files changed, 205 insertions(+), 88 deletions(-)
 create mode 100644 include/clone_internal.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.h

diff --git a/include/clone_internal.h b/include/clone_internal.h
new file mode 100644
index 0000000000..4b23ef33ce
--- /dev/null
+++ b/include/clone_internal.h
@@ -0,0 +1,16 @@
+#ifndef _CLONE3_H
+#include_next <clone3.h>
+
+extern __typeof (clone3) __clone3;
+
+/* The internal wrapper of clone/clone2 and clone3.  If __clone3 returns
+   -1 with ENOSYS, fall back to clone or clone2.  */
+extern int __clone_internal (struct clone_args *__cl_args,
+			     int (*__func) (void *__arg), void *__arg);
+
+#ifndef _ISOMAC
+libc_hidden_proto (__clone3)
+libc_hidden_proto (__clone_internal)
+#endif
+
+#endif
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index dc81a2ca73..eebf9c2c3c 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -33,47 +33,6 @@
 #include <kernel-features.h>
 #include <nptl-stack.h>
 
-#ifndef NEED_SEPARATE_REGISTER_STACK
-
-/* Most architectures have exactly one stack pointer.  Some have more.  */
-# define STACK_VARIABLES void *stackaddr = NULL
-
-/* How to pass the values to the 'create_thread' function.  */
-# define STACK_VARIABLES_ARGS stackaddr
-
-/* How to declare function which gets there parameters.  */
-# define STACK_VARIABLES_PARMS void *stackaddr
-
-/* How to declare allocate_stack.  */
-# define ALLOCATE_STACK_PARMS void **stack
-
-/* This is how the function is called.  We do it this way to allow
-   other variants of the function to have more parameters.  */
-# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr)
-
-#else
-
-/* We need two stacks.  The kernel will place them but we have to tell
-   the kernel about the size of the reserved address space.  */
-# define STACK_VARIABLES void *stackaddr = NULL; size_t stacksize = 0
-
-/* How to pass the values to the 'create_thread' function.  */
-# define STACK_VARIABLES_ARGS stackaddr, stacksize
-
-/* How to declare function which gets there parameters.  */
-# define STACK_VARIABLES_PARMS void *stackaddr, size_t stacksize
-
-/* How to declare allocate_stack.  */
-# define ALLOCATE_STACK_PARMS void **stack, size_t *stacksize
-
-/* This is how the function is called.  We do it this way to allow
-   other variants of the function to have more parameters.  */
-# define ALLOCATE_STACK(attr, pd) \
-  allocate_stack (attr, pd, &stackaddr, &stacksize)
-
-#endif
-
-
 /* Default alignment of stack.  */
 #ifndef STACK_ALIGN
 # define STACK_ALIGN __alignof__ (long double)
@@ -249,7 +208,7 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
    PDP must be non-NULL.  */
 static int
 allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
-		ALLOCATE_STACK_PARMS)
+		void **stack, size_t *stacksize)
 {
   struct pthread *pd;
   size_t size;
@@ -600,25 +559,17 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
   /* We place the thread descriptor at the end of the stack.  */
   *pdp = pd;
 
-#if _STACK_GROWS_DOWN
   void *stacktop;
 
-# if TLS_TCB_AT_TP
+#if TLS_TCB_AT_TP
   /* The stack begins before the TCB and the static TLS block.  */
   stacktop = ((char *) (pd + 1) - tls_static_size_for_stack);
-# elif TLS_DTV_AT_TP
+#elif TLS_DTV_AT_TP
   stacktop = (char *) (pd - 1);
-# endif
+#endif
 
-# ifdef NEED_SEPARATE_REGISTER_STACK
+  *stacksize = stacktop - pd->stackblock;
   *stack = pd->stackblock;
-  *stacksize = stacktop - *stack;
-# else
-  *stack = stacktop;
-# endif
-#else
-  *stack = pd->stackblock;
-#endif
 
   return 0;
 }
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 2d2535b07d..9e3b8f325c 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -37,6 +37,7 @@
 #include "libioP.h"
 #include <sys/single_threaded.h>
 #include <version.h>
+#include <clone_internal.h>
 
 #include <shlib-compat.h>
 
@@ -246,8 +247,8 @@ late_init (void)
 static int _Noreturn start_thread (void *arg);
 
 static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
-			  bool *stopped_start, STACK_VARIABLES_PARMS,
-			  bool *thread_ran)
+			  bool *stopped_start, void *stackaddr,
+			  size_t stacksize, bool *thread_ran)
 {
   /* Determine whether the newly created threads has to be started
      stopped since we have to set the scheduling parameters or set the
@@ -299,14 +300,18 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
 
   TLS_DEFINE_INIT_TP (tp, pd);
 
-#ifdef __NR_clone2
-# define ARCH_CLONE __clone2
-#else
-# define ARCH_CLONE __clone
-#endif
-  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
-				    clone_flags, pd, &pd->tid, tp, &pd->tid)
-			== -1))
+  struct clone_args args =
+    {
+      .flags = clone_flags,
+      .pidfd = (uintptr_t) &pd->tid,
+      .parent_tid = (uintptr_t) &pd->tid,
+      .child_tid = (uintptr_t) &pd->tid,
+      .stack = (uintptr_t) stackaddr,
+      .stack_size = stacksize,
+      .tls = (uintptr_t) tp,
+    };
+  int ret = __clone_internal (&args, &start_thread, pd);
+  if (__glibc_unlikely (ret == -1))
     return errno;
 
   /* It's started now, so if we fail below, we'll have to cancel it
@@ -603,7 +608,8 @@ int
 __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 		      void *(*start_routine) (void *), void *arg)
 {
-  STACK_VARIABLES;
+  void *stackaddr = NULL;
+  size_t stacksize = 0;
 
   /* Avoid a data race in the multi-threaded case, and call the
      deferred initialization only once.  */
@@ -627,7 +633,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
     }
 
   struct pthread *pd = NULL;
-  int err = ALLOCATE_STACK (iattr, &pd);
+  int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
   int retval = 0;
 
   if (__glibc_unlikely (err != 0))
@@ -772,8 +778,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 
       /* We always create the thread stopped at startup so we can
 	 notify the debugger.  */
-      retval = create_thread (pd, iattr, &stopped_start,
-			      STACK_VARIABLES_ARGS, &thread_ran);
+      retval = create_thread (pd, iattr, &stopped_start, stackaddr,
+			      stacksize, &thread_ran);
       if (retval == 0)
 	{
 	  /* We retain ownership of PD until (a) (see CONCURRENCY NOTES
@@ -804,8 +810,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	}
     }
   else
-    retval = create_thread (pd, iattr, &stopped_start,
-			    STACK_VARIABLES_ARGS, &thread_ran);
+    retval = create_thread (pd, iattr, &stopped_start, stackaddr,
+			    stacksize, &thread_ran);
 
   /* Return to the previous signal mask, after creating the new
      thread.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index bc14f20274..9469868bce 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -64,7 +64,7 @@ sysdep_routines += adjtimex clone umount umount2 readahead sysctl \
 		   time64-support pselect32 \
 		   xstat fxstat lxstat xstat64 fxstat64 lxstat64 \
 		   fxstatat fxstatat64 \
-		   xmknod xmknodat
+		   xmknod xmknodat clone3 clone-internal
 
 CFLAGS-gethostid.c = -fexceptions
 CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
new file mode 100644
index 0000000000..1e7a8f6b35
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -0,0 +1,91 @@
+/* The internal wrapper of clone and clone3.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <stddef.h>
+#include <errno.h>
+#include <sched.h>
+#include <clone_internal.h>
+#include <libc-pointer-arith.h>	/* For cast_to_pointer.  */
+#include <stackinfo.h>		/* For _STACK_GROWS_{UP,DOWN}.  */
+
+#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
+#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
+
+#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *)0)->MEMBER))
+#define offsetofend(TYPE, MEMBER) \
+  (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER))
+
+_Static_assert (__alignof (struct clone_args) == 8,
+		"__alignof (struct clone_args) != 8");
+_Static_assert (offsetofend (struct clone_args, tls) == CLONE_ARGS_SIZE_VER0,
+		"offsetofend (struct clone_args, tls) != CLONE_ARGS_SIZE_VER0");
+_Static_assert (offsetofend (struct clone_args, set_tid_size) == CLONE_ARGS_SIZE_VER1,
+		"offsetofend (struct clone_args, set_tid_size) != CLONE_ARGS_SIZE_VER1");
+_Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
+		"offsetofend (struct clone_args, cgroup) != CLONE_ARGS_SIZE_VER2");
+_Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
+		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
+
+int
+__clone_internal (struct clone_args *cl_args,
+		  int (*func) (void *arg), void *arg)
+{
+  int ret;
+#ifdef HAVE_CLONE3_WAPPER
+  /* Try clone3 first.  */
+  int saved_errno = errno;
+  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
+  if (ret != -1 || errno != ENOSYS)
+    return ret;
+
+  /* NB: Restore errno since errno may be checked against non-zero
+     return value.  */
+  __set_errno (saved_errno);
+#endif
+
+  /* Map clone3 arguments to clone arguments.  NB: No need to check
+     invalid clone3 specific bits in flags nor exit_signal since this
+     is an internal function.  */
+  int flags = cl_args->flags | cl_args->exit_signal;
+  void *stack = cast_to_pointer (cl_args->stack);
+
+#ifdef __ia64__
+  ret = __clone2 (func, stack, cl_args->stack_size,
+		  flags, arg,
+		  cast_to_pointer (cl_args->parent_tid),
+		  cast_to_pointer (cl_args->tls),
+		  cast_to_pointer (cl_args->child_tid));
+#else
+# if !_STACK_GROWS_DOWN && !_STACK_GROWS_UP
+#  error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
+# endif
+
+# if _STACK_GROWS_DOWN
+  stack += cl_args->stack_size;
+# endif
+  ret = __clone (func, stack, flags, arg,
+		 cast_to_pointer (cl_args->parent_tid),
+		 cast_to_pointer (cl_args->tls),
+		 cast_to_pointer (cl_args->child_tid));
+#endif
+  return ret;
+}
+
+libc_hidden_def (__clone_internal)
diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
new file mode 100644
index 0000000000..de963ef89d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone3.c
@@ -0,0 +1 @@
+/* An empty placeholder.  */
diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
new file mode 100644
index 0000000000..0488884d59
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone3.h
@@ -0,0 +1,60 @@
+/* The wrapper of clone3.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _CLONE3_H
+#define _CLONE3_H	1
+
+#include <features.h>
+#include <stdint.h>
+#include <stddef.h>
+
+__BEGIN_DECLS
+
+/* This struct should only be used in an argument to the clone3 system
+   call (along with its size argument).  It may be extended with new
+   fields in the future.  */
+
+struct clone_args
+{
+  uint64_t flags;	 /* Flags bit mask.  */
+  uint64_t pidfd;	 /* Where to store PID file descriptor
+			    (pid_t *).  */
+  uint64_t child_tid;	 /* Where to store child TID, in child's memory
+			    (pid_t *).  */
+  uint64_t parent_tid;	 /* Where to store child TID, in parent's memory
+			    (int *). */
+  uint64_t exit_signal;	 /* Signal to deliver to parent on child
+			    termination */
+  uint64_t stack;	 /* The lowest address of stack.  */
+  uint64_t stack_size;	 /* Size of stack.  */
+  uint64_t tls;		 /* Location of new TLS.  */
+  uint64_t set_tid;	 /* Pointer to a pid_t array
+			    (since Linux 5.5).  */
+  uint64_t set_tid_size; /* Number of elements in set_tid
+			    (since Linux 5.5). */
+  uint64_t cgroup;	 /* File descriptor for target cgroup
+			    of child (since Linux 5.7).  */
+} __attribute__ ((aligned (8)));
+
+/* The wrapper of clone3.  */
+extern int clone3 (struct clone_args *__cl_args, size_t __size,
+		   int (*__func) (void *__arg), void *__arg);
+
+__END_DECLS
+
+#endif /* clone3.h */
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index 501f8fbccd..fd29858cf5 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -31,6 +31,7 @@
 #include <dl-sysdep.h>
 #include <libc-pointer-arith.h>
 #include <ldsodefs.h>
+#include <clone_internal.h>
 #include "spawn_int.h"
 
 /* The Linux implementation of posix_spawn{p} uses the clone syscall directly
@@ -59,21 +60,6 @@
    normal program exit with the exit code 127.  */
 #define SPAWN_ERROR	127
 
-#ifdef __ia64__
-# define CLONE(__fn, __stackbase, __stacksize, __flags, __args) \
-  __clone2 (__fn, __stackbase, __stacksize, __flags, __args, 0, 0, 0)
-#else
-# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
-  __clone (__fn, __stack, __flags, __args)
-#endif
-
-/* Since ia64 wants the stackbase w/clone2, re-use the grows-up macro.  */
-#if _STACK_GROWS_UP || defined (__ia64__)
-# define STACK(__stack, __stack_size) (__stack)
-#elif _STACK_GROWS_DOWN
-# define STACK(__stack, __stack_size) (__stack + __stack_size)
-#endif
-
 
 struct posix_spawn_args
 {
@@ -378,8 +364,14 @@ __spawnix (pid_t * pid, const char *file,
      need for CLONE_SETTLS.  Although parent and child share the same TLS
      namespace, there will be no concurrent access for TLS variables (errno
      for instance).  */
-  new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
-		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
+  struct clone_args clone_args =
+    {
+      .flags = CLONE_VM | CLONE_VFORK,
+      .exit_signal = SIGCHLD,
+      .stack = (uintptr_t) stack,
+      .stack_size = stack_size,
+    };
+  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
 
   /* It needs to collect the case where the auxiliary process was created
      but failed to execute the file (due either any preparation step or
-- 
2.31.1



* [PATCH v8 2/3] x86-64: Add the clone3 wrapper
  2021-06-01 14:55 [PATCH v8 0/3] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
  2021-06-01 14:55 ` [PATCH v8 1/3] " H.J. Lu
@ 2021-06-01 14:55 ` H.J. Lu
  2021-07-13 19:12   ` Adhemerval Zanella
  2021-06-01 14:55 ` [PATCH v8 3/3] Add static tests for __clone_internal H.J. Lu
  2 siblings, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2021-06-01 14:55 UTC
  To: libc-alpha; +Cc: Florian Weimer, Noah Goldstein, Adhemerval Zanella

extern int clone3 (struct clone_args *__cl_args, size_t __size,
		   int (*__func) (void *__arg), void *__arg);
---
 sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
 2 files changed, 94 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S

diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
new file mode 100644
index 0000000000..71caaecc29
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
@@ -0,0 +1,92 @@
+/* The clone3 syscall wrapper.  Linux/x86-64 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* clone3() is even more special than fork() as it mucks with stacks
+   and invokes a function in the right context after it's all over.  */
+
+#include <sysdep.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, size_t size,
+	       int (*func)(void *arg), void *arg);
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   rdi: cl_args
+   rsi: size
+   rdx: func
+   rcx: arg
+
+   The kernel expects:
+   rax: system call number
+   rdi: cl_args
+   rsi: size  */
+
+        .text
+ENTRY (__clone3)
+	/* Sanity check arguments.  */
+	movl	$-EINVAL, %eax
+	test	%RDI_LP, %RDI_LP	/* No NULL cl_args pointer.  */
+	jz	SYSCALL_ERROR_LABEL
+	test	%RDX_LP, %RDX_LP	/* No NULL function pointer.  */
+	jz	SYSCALL_ERROR_LABEL
+
+	/* Save the arg pointer (RCX) in R8, which is preserved by the
+	   syscall; RCX itself is clobbered by the syscall instruction.  */
+	mov	%RCX_LP, %R8_LP
+
+	/* Do the system call.  */
+	movl	$SYS_ify(clone3), %eax
+
+	/* End FDE now, because in the child the unwind info will be
+	   wrong.  */
+	cfi_endproc
+	syscall
+
+	test	%RAX_LP, %RAX_LP
+	jl	SYSCALL_ERROR_LABEL
+	jz	L(thread_start)
+
+	ret
+
+L(thread_start):
+	cfi_startproc
+	/* Clearing frame pointer is insufficient, use CFI.  */
+	cfi_undefined (rip)
+	/* Clear the frame pointer.  The ABI suggests this be done, to mark
+	   the outermost frame obviously.  */
+	xorl	%ebp, %ebp
+
+	/* Align stack to 16 bytes per the x86-64 psABI.  */
+	and	$-16, %RSP_LP
+
+	/* Set up arguments for the function call.  */
+	mov	%R8_LP, %RDI_LP	/* Argument.  */
+	call	*%rdx		/* Call function.  */
+	/* Call exit with return value from function call. */
+	movq	%rax, %rdi
+	movl	$SYS_ify(exit), %eax
+	syscall
+	cfi_endproc
+
+	cfi_startproc
+PSEUDO_END (__clone3)
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
index dbad2c788a..f26ffc68ae 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
@@ -377,6 +377,8 @@
 # define HAVE_GETCPU_VSYSCALL		"__vdso_getcpu"
 # define HAVE_CLOCK_GETRES64_VSYSCALL   "__vdso_clock_getres"
 
+# define HAVE_CLONE3_WAPPER			1
+
 # define SINGLE_THREAD_BY_GLOBAL		1
 
 #endif	/* __ASSEMBLER__ */
-- 
2.31.1



* [PATCH v8 3/3] Add static tests for __clone_internal
  2021-06-01 14:55 [PATCH v8 0/3] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
  2021-06-01 14:55 ` [PATCH v8 1/3] " H.J. Lu
  2021-06-01 14:55 ` [PATCH v8 2/3] x86-64: Add the clone3 wrapper H.J. Lu
@ 2021-06-01 14:55 ` H.J. Lu
  2021-07-13 19:32   ` Adhemerval Zanella
  2 siblings, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2021-06-01 14:55 UTC
  To: libc-alpha; +Cc: Florian Weimer, Noah Goldstein, Adhemerval Zanella

---
 sysdeps/unix/sysv/linux/Makefile              |   9 ++
 .../sysv/linux/tst-align-clone-internal.c     |  87 +++++++++++
 sysdeps/unix/sysv/linux/tst-clone2-internal.c | 137 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-clone3-internal.c |  99 +++++++++++++
 .../unix/sysv/linux/tst-getpid1-internal.c    | 133 +++++++++++++++++
 .../sysv/linux/tst-misalign-clone-internal.c  |  86 +++++++++++
 6 files changed, 551 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c

diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 9469868bce..214b912921 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -118,6 +118,15 @@ endif
 
 tests-internal += tst-sigcontext-get_pc
 
+tests-clone-internal = \
+  tst-align-clone-internal \
+  tst-clone2-internal \
+  tst-clone3-internal \
+  tst-getpid1-internal \
+  tst-misalign-clone-internal
+tests-internal += $(tests-clone-internal)
+tests-static += $(tests-clone-internal)
+
 CFLAGS-tst-sigcontext-get_pc.c = -fasynchronous-unwind-tables
 
 # Generate the list of SYS_* macros for the system calls (__NR_*
diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
new file mode 100644
index 0000000000..6c3631f3db
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
@@ -0,0 +1,87 @@
+/* Verify that the clone child stack is properly aligned.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <tst-stack-align.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+
+static int
+f (void *arg)
+{
+  bool ok = true;
+
+  puts ("in f");
+
+  if (TEST_STACK_ALIGN ())
+    ok = false;
+
+  return ok ? 0 : 1;
+}
+
+static int
+do_test (void)
+{
+  bool ok = true;
+
+  puts ("in main");
+
+  if (TEST_STACK_ALIGN ())
+    ok = false;
+
+#ifdef __ia64__
+# define STACK_SIZE 256 * 1024
+#else
+# define STACK_SIZE 128 * 1024
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+  if (p == -1)
+    {
+      printf ("clone failed: %m\n");
+      return 1;
+    }
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      else
+	puts ("did not terminate correctly");
+      return 1;
+    }
+  if (WEXITSTATUS (e) != 0)
+    ok = false;
+
+  return ok ? 0 : 1;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
new file mode 100644
index 0000000000..b8917fe713
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
@@ -0,0 +1,137 @@
+/* Test if CLONE_VM does not change pthread pid/tid field (BZ #19957)
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <signal.h>
+#include <string.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+#include <support/check.h>
+
+static int sig;
+static int pipefd[2];
+
+static int
+f (void *a)
+{
+  close (pipefd[0]);
+
+  pid_t ppid = getppid ();
+  pid_t pid = getpid ();
+  pid_t tid = gettid ();
+
+  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
+    FAIL_EXIT1 ("write ppid failed\n");
+  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
+    FAIL_EXIT1 ("write pid failed\n");
+  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
+    FAIL_EXIT1 ("write tid failed\n");
+
+  return 0;
+}
+
+
+static int
+do_test (void)
+{
+  sig = SIGRTMIN;
+  sigset_t ss;
+  sigemptyset (&ss);
+  sigaddset (&ss, sig);
+  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
+    FAIL_EXIT1 ("sigprocmask failed: %m");
+
+  if (pipe2 (pipefd, O_CLOEXEC))
+    FAIL_EXIT1 ("pipe failed: %m");
+
+#ifdef __ia64__
+# define STACK_SIZE (256 * 1024)
+#else
+# define STACK_SIZE (128 * 1024)
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+
+  close (pipefd[1]);
+
+  if (p == -1)
+    FAIL_EXIT1 ("clone failed: %m");
+
+  pid_t ppid, pid, tid;
+  if (read (pipefd[0], &ppid, sizeof ppid) != sizeof ppid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read ppid failed: %m");
+    }
+  if (read (pipefd[0], &pid, sizeof pid) != sizeof pid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read pid failed: %m");
+    }
+  if (read (pipefd[0], &tid, sizeof tid) != sizeof tid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read tid failed: %m");
+    }
+
+  close (pipefd[0]);
+
+  int ret = 0;
+
+  pid_t own_pid = getpid ();
+  pid_t own_tid = syscall (__NR_gettid);
+
+  /* Some sanity checks for clone syscall: returned ppid should be current
+     pid and both returned tid/pid should be different from current one.  */
+  if ((ppid != own_pid) || (pid == own_pid) || (tid == own_tid))
+    FAIL_RET ("ppid=%i pid=%i tid=%i | own_pid=%i own_tid=%i",
+	      (int)ppid, (int)pid, (int)tid, (int)own_pid, (int)own_tid);
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      else
+	puts ("did not terminate correctly");
+      exit (EXIT_FAILURE);
+    }
+  if (WEXITSTATUS (e) != 0)
+    FAIL_EXIT1 ("exit code %d", WEXITSTATUS (e));
+
+  return ret;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
new file mode 100644
index 0000000000..2bdbc571e6
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
@@ -0,0 +1,99 @@
+/* Check if clone (CLONE_THREAD) does not call exit_group (BZ #21512)
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <sched.h>
+#include <signal.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/syscall.h>
+#include <sys/wait.h>
+#include <sys/types.h>
+#include <linux/futex.h>
+#include <support/check.h>
+#include <stdatomic.h>
+#include <clone_internal.h>
+
+/* Test that clone with CLONE_THREAD does not call exit_group.  The 'f'
+   function returns '1', which the cloned thread passes directly to the
+   'exit' syscall.  If _exit were used instead, exit_group would be
+   called and the whole thread group would finish with status '1'
+   (where '2' from the main thread is expected).  */
+
+static int
+f (void *a)
+{
+  return 1;
+}
+
+/* Futex wait for TID argument, similar to pthread_join internal
+   implementation.  */
+#define wait_tid(ctid_ptr, ctid_val)					\
+  do {									\
+    __typeof (*(ctid_ptr)) __tid;					\
+    /* We need acquire MO here so that we synchronize with the		\
+       kernel's store to 0 when the clone terminates.  */		\
+    while ((__tid = atomic_load_explicit (ctid_ptr,			\
+					  memory_order_acquire)) != 0)	\
+      futex_wait (ctid_ptr, ctid_val);					\
+  } while (0)
+
+static inline int
+futex_wait (int *futexp, int val)
+{
+#ifdef __NR_futex
+  return syscall (__NR_futex, futexp, FUTEX_WAIT, val);
+#else
+  return syscall (__NR_futex_time64, futexp, FUTEX_WAIT, val);
+#endif
+}
+
+static int
+do_test (void)
+{
+  char st[1024] __attribute__ ((aligned));
+  int clone_flags = CLONE_THREAD;
+  /* Minimum required flags to be used along with CLONE_THREAD.  */
+  clone_flags |= CLONE_VM | CLONE_SIGHAND;
+  /* We will use ctid in a futex call to wait for thread exit.  */
+  clone_flags |= CLONE_CHILD_CLEARTID;
+  /* Initialize with a known value.  ctid is set to zero by the kernel after the
+     cloned thread has exited.  */
+#define CTID_INIT_VAL 1
+  pid_t ctid = CTID_INIT_VAL;
+  pid_t tid;
+
+  struct clone_args clone_args =
+    {
+      .flags = clone_flags & ~CSIGNAL,
+      .exit_signal = clone_flags & CSIGNAL,
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+      .child_tid = (uintptr_t) &ctid,
+    };
+  tid = __clone_internal (&clone_args, f, NULL);
+  if (tid == -1)
+    FAIL_EXIT1 ("clone failed: %m");
+
+  wait_tid (&ctid, CTID_INIT_VAL);
+
+  return 2;
+}
+
+#define EXPECTED_STATUS 2
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-getpid1-internal.c b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
new file mode 100644
index 0000000000..ee69e52401
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
@@ -0,0 +1,133 @@
+/* Verify that the parent pid is unchanged by __clone_internal.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <signal.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+
+#ifndef TEST_CLONE_FLAGS
+#define TEST_CLONE_FLAGS 0
+#endif
+
+static int sig;
+
+static int
+f (void *a)
+{
+  puts ("in f");
+  union sigval sival;
+  sival.sival_int = getpid ();
+  printf ("pid = %d\n", sival.sival_int);
+  if (sigqueue (getppid (), sig, sival) != 0)
+    return 1;
+  return 0;
+}
+
+
+static int
+do_test (void)
+{
+  int mypid = getpid ();
+
+  sig = SIGRTMIN;
+  sigset_t ss;
+  sigemptyset (&ss);
+  sigaddset (&ss, sig);
+  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
+    {
+      printf ("sigprocmask failed: %m\n");
+      return 1;
+    }
+
+#ifdef __ia64__
+# define STACK_SIZE (256 * 1024)
+#else
+# define STACK_SIZE (128 * 1024)
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .flags = TEST_CLONE_FLAGS & ~CSIGNAL,
+      .exit_signal = TEST_CLONE_FLAGS & CSIGNAL,
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+  if (p == -1)
+    {
+      printf ("clone failed: %m\n");
+      return 1;
+    }
+  printf ("new thread: %d\n", (int) p);
+
+  siginfo_t si;
+  do
+    if (sigwaitinfo (&ss, &si) < 0)
+      {
+	printf ("sigwaitinfo failed: %m\n");
+	kill (p, SIGKILL);
+	return 1;
+      }
+  while (si.si_signo != sig || si.si_code != SI_QUEUE);
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      else
+	puts ("did not terminate correctly");
+      return 1;
+    }
+  if (WEXITSTATUS (e) != 0)
+    {
+      printf ("exit code %d\n", WEXITSTATUS (e));
+      return 1;
+    }
+
+  if (si.si_int != (int) p)
+    {
+      printf ("expected PID %d, got si_int %d\n", (int) p, si.si_int);
+      kill (p, SIGKILL);
+      return 1;
+    }
+
+  if (si.si_pid != p)
+    {
+      printf ("expected PID %d, got si_pid %d\n", (int) p, (int) si.si_pid);
+      kill (p, SIGKILL);
+      return 1;
+    }
+
+  if (getpid () != mypid)
+    {
+      puts ("my PID changed");
+      return 1;
+    }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
new file mode 100644
index 0000000000..6df5fd2cbc
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
@@ -0,0 +1,86 @@
+/* Verify that __clone_internal properly aligns the child stack.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <libc-pointer-arith.h>
+#include <tst-stack-align.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+#include <support/check.h>
+
+static int
+check_stack_alignment (void *arg)
+{
+  bool ok = true;
+
+  puts ("in f");
+
+  if (TEST_STACK_ALIGN ())
+    ok = false;
+
+  return ok ? 0 : 1;
+}
+
+static int
+do_test (void)
+{
+  puts ("in do_test");
+
+  if (TEST_STACK_ALIGN ())
+    FAIL_EXIT1 ("stack isn't aligned");
+
+#ifdef __ia64__
+# define STACK_SIZE (256 * 1024)
+#else
+# define STACK_SIZE (128 * 1024)
+#endif
+  char st[STACK_SIZE + 1];
+  /* NB: Align child stack to 1 byte.  */
+  char *stack = PTR_ALIGN_UP (&st[0], 2) + 1;
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) stack,
+      .stack_size = STACK_SIZE,
+    };
+  pid_t p = __clone_internal (&clone_args, check_stack_alignment, 0);
+
+  /* Clone must not fail.  */
+  TEST_VERIFY_EXIT (p != -1);
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      FAIL_EXIT1 ("process did not terminate correctly");
+    }
+
+  if (WEXITSTATUS (e) != 0)
+    FAIL_EXIT1 ("exit code %d", WEXITSTATUS (e));
+
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.31.1



* Re: [PATCH v8 1/3] Add an internal wrapper for clone, clone2 and clone3
  2021-06-01 14:55 ` [PATCH v8 1/3] " H.J. Lu
@ 2021-06-04 12:20   ` H.J. Lu
  2021-06-18 18:20     ` PING^1 " H.J. Lu
  2021-07-13 18:54   ` Adhemerval Zanella
  1 sibling, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2021-06-04 12:20 UTC (permalink / raw)
  To: GNU C Library; +Cc: Florian Weimer, Noah Goldstein, Adhemerval Zanella

On Tue, Jun 1, 2021 at 7:55 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> The clone3 system call provides a superset of the functionality of clone
> and clone2.  It also provides a number of API improvements, including
> the ability to specify the size of the child's stack area which can be
> used by kernel to compute the shadow stack size when allocating the
> shadow stack.  Add:
>
> extern int __clone_internal (struct clone_args *__cl_args,
>                              int (*__func) (void *__arg), void *__arg);
>
> to provide an abstract interface for clone, clone2 and clone3.
>
> 1. Simplify stack management for thread creation by passing both stack
> base and size to create_thread.
> 2. Consolidate clone vs clone2 differences into a single file.
> 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
> -1 with ENOSYS, fall back to clone or clone2.
> 4. Use only __clone_internal to clone a thread.  Since the stack size
> argument for create_thread is now unconditional, always pass stack size
> to create_thread.
> 5. Enable the public clone3 wrapper in the future after it has been
> added to all targets.
>
> NB: Sandbox should return ENOSYS on clone3 if it is rejected:
>
> https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5

The sandbox issue has been fixed by the following Chromium revision:

  https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b

commit 218438259dd795456f0a48f67cbe5b4e520db88b
Author: Matthew Denton <mpdenton@chromium.org>
Date: Thu Jun 03 20:06:13 2021

Linux sandbox: return ENOSYS for clone3

Because clone3 uses a pointer argument rather than a flags argument, we
cannot examine the contents with seccomp, which is essential to
preventing sandboxed processes from starting other processes. So, we
won't be able to support clone3 in Chromium. This CL modifies the
BPF policy to return ENOSYS for clone3 so glibc always uses the fallback
to clone.

Bug: 1213452
Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Commit-Queue: Matthew Denton <mpdenton@chromium.org>
Cr-Commit-Position: refs/heads/master@{#888980}

[modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc
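
For illustration only, the fallback that this sandbox change relies on
can be sketched outside glibc.  This is a minimal sketch, not the code
from the patch: spawn_child_prefer_clone3 and struct clone_args_v0 are
hypothetical names, the raw syscall is used instead of the __clone3
wrapper, and the fallback is plain fork rather than clone/clone2.  It
assumes Linux with SYS_clone3 available in <sys/syscall.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local mirror of the first published clone_args layout (64 bytes,
   CLONE_ARGS_SIZE_VER0 in the patch).  Hypothetical name; defined here
   so the sketch does not depend on <linux/sched.h>.  */
struct clone_args_v0
{
  uint64_t flags, pidfd, child_tid, parent_tid;
  uint64_t exit_signal, stack, stack_size, tls;
} __attribute__ ((aligned (8)));

static_assert (sizeof (struct clone_args_v0) == 64,
	       "clone_args_v0 must match the first published layout");

/* Hypothetical helper: create a fork-like child, preferring clone3.
   Returns 0 in the child, the child's PID in the parent, -1 on error.  */
static pid_t
spawn_child_prefer_clone3 (void)
{
#ifdef SYS_clone3
  struct clone_args_v0 args;
  memset (&args, 0, sizeof args);
  args.exit_signal = SIGCHLD;	/* Behave like fork for waitpid.  */

  int saved_errno = errno;
  long int ret = syscall (SYS_clone3, &args, sizeof args);
  if (ret != -1 || errno != ENOSYS)
    return (pid_t) ret;

  /* clone3 is missing or filtered with ENOSYS: restore errno and use
     the older interface, as the patch does for clone/clone2.  */
  errno = saved_errno;
#endif
  return fork ();
}
```

A caller would treat the result like fork: 0 in the child, the child's
PID in the parent.  Note that a filter returning an error other than
ENOSYS (EPERM, say) would not trigger the fallback in this sketch, which
is the reason the sandbox has to return ENOSYS.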

-- 
H.J.


* PING^1 [PATCH v8 1/3] Add an internal wrapper for clone, clone2 and clone3
  2021-06-04 12:20   ` H.J. Lu
@ 2021-06-18 18:20     ` H.J. Lu
  0 siblings, 0 replies; 16+ messages in thread
From: H.J. Lu @ 2021-06-18 18:20 UTC (permalink / raw)
  To: GNU C Library; +Cc: Florian Weimer, Noah Goldstein, Adhemerval Zanella

On Fri, Jun 4, 2021 at 5:20 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 7:55 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > The clone3 system call provides a superset of the functionality of clone
> > and clone2.  It also provides a number of API improvements, including
> > the ability to specify the size of the child's stack area which can be
> > used by kernel to compute the shadow stack size when allocating the
> > shadow stack.  Add:
> >
> > extern int __clone_internal (struct clone_args *__cl_args,
> >                              int (*__func) (void *__arg), void *__arg);
> >
> > to provide an abstract interface for clone, clone2 and clone3.
> >
> > 1. Simplify stack management for thread creation by passing both stack
> > base and size to create_thread.
> > 2. Consolidate clone vs clone2 differences into a single file.
> > 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
> > -1 with ENOSYS, fall back to clone or clone2.
> > 4. Use only __clone_internal to clone a thread.  Since the stack size
> > argument for create_thread is now unconditional, always pass stack size
> > to create_thread.
> > 5. Enable the public clone3 wrapper in the future after it has been
> > added to all targets.
> >
> > NB: Sandbox should return ENOSYS on clone3 if it is rejected:
> >
> > https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5
>
> The sandbox issue has been fixed by
>
> The following revision refers to this bug:
>   https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b
>
> commit 218438259dd795456f0a48f67cbe5b4e520db88b
> Author: Matthew Denton <mpdenton@chromium.org>
> Date: Thu Jun 03 20:06:13 2021
>
> Linux sandbox: return ENOSYS for clone3
>
> Because clone3 uses a pointer argument rather than a flags argument, we
> cannot examine the contents with seccomp, which is essential to
> preventing sandboxed processes from starting other processes. So, we
> won't be able to support clone3 in Chromium. This CL modifies the
> BPF policy to return ENOSYS for clone3 so glibc always uses the fallback
> to clone.
>
> Bug: 1213452
> Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
> Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
> Reviewed-by: Robert Sesek <rsesek@chromium.org>
> Commit-Queue: Matthew Denton <mpdenton@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#888980}
>
> [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc
>
> --
> H.J.

PING.


-- 
H.J.


* Re: [PATCH v8 1/3] Add an internal wrapper for clone, clone2 and clone3
  2021-06-01 14:55 ` [PATCH v8 1/3] " H.J. Lu
  2021-06-04 12:20   ` H.J. Lu
@ 2021-07-13 18:54   ` Adhemerval Zanella
  2021-07-13 19:06     ` Adhemerval Zanella
  2021-07-13 19:49     ` [PATCH v9] " H.J. Lu
  1 sibling, 2 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-07-13 18:54 UTC (permalink / raw)
  To: H.J. Lu, libc-alpha; +Cc: Florian Weimer, Noah Goldstein



On 01/06/2021 11:55, H.J. Lu wrote:
> The clone3 system call provides a superset of the functionality of clone
> and clone2.  It also provides a number of API improvements, including
> the ability to specify the size of the child's stack area which can be
> used by kernel to compute the shadow stack size when allocating the
> shadow stack.  Add:
> 
> extern int __clone_internal (struct clone_args *__cl_args,
> 			     int (*__func) (void *__arg), void *__arg);
> 
> to provide an abstract interface for clone, clone2 and clone3.
> 
> 1. Simplify stack management for thread creation by passing both stack
> base and size to create_thread.
> 2. Consolidate clone vs clone2 differences into a single file.
> 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
> -1 with ENOSYS, fall back to clone or clone2.
> 4. Use only __clone_internal to clone a thread.  Since the stack size
> argument for create_thread is now unconditional, always pass stack size
> to create_thread.
> 5. Enable the public clone3 wrapper in the future after it has been
> added to all targets.
> 
> NB: Sandbox should return ENOSYS on clone3 if it is rejected:
> 
> https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5

LGTM with just a suggestion below.  Chromium has also fixed it, so
although the sandbox won't be able to fully handle clone3, at least
it won't brick a 2.34 glibc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  include/clone_internal.h                 | 16 +++++
>  nptl/allocatestack.c                     | 59 ++-------------
>  nptl/pthread_create.c                    | 38 +++++-----
>  sysdeps/unix/sysv/linux/Makefile         |  2 +-
>  sysdeps/unix/sysv/linux/clone-internal.c | 91 ++++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/clone3.c         |  1 +
>  sysdeps/unix/sysv/linux/clone3.h         | 60 ++++++++++++++++
>  sysdeps/unix/sysv/linux/spawni.c         | 26 +++----
>  8 files changed, 205 insertions(+), 88 deletions(-)
>  create mode 100644 include/clone_internal.h
>  create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/clone3.c
>  create mode 100644 sysdeps/unix/sysv/linux/clone3.h
> 
> diff --git a/include/clone_internal.h b/include/clone_internal.h
> new file mode 100644
> index 0000000000..4b23ef33ce
> --- /dev/null
> +++ b/include/clone_internal.h
> @@ -0,0 +1,16 @@
> +#ifndef _CLONE3_H
> +#include_next <clone3.h>
> +
> +extern __typeof (clone3) __clone3;
> +
> +/* The internal wrapper of clone/clone2 and clone3.  If __clone3 returns
> +   -1 with ENOSYS, fall back to clone or clone2.  */
> +extern int __clone_internal (struct clone_args *__cl_args,
> +			     int (*__func) (void *__arg), void *__arg);
> +
> +#ifndef _ISOMAC
> +libc_hidden_proto (__clone3)
> +libc_hidden_proto (__clone_internal)
> +#endif
> +
> +#endif

Ok.

> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index dc81a2ca73..eebf9c2c3c 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -33,47 +33,6 @@
>  #include <kernel-features.h>
>  #include <nptl-stack.h>
>  
> -#ifndef NEED_SEPARATE_REGISTER_STACK
> -
> -/* Most architectures have exactly one stack pointer.  Some have more.  */
> -# define STACK_VARIABLES void *stackaddr = NULL
> -
> -/* How to pass the values to the 'create_thread' function.  */
> -# define STACK_VARIABLES_ARGS stackaddr
> -
> -/* How to declare function which gets there parameters.  */
> -# define STACK_VARIABLES_PARMS void *stackaddr
> -
> -/* How to declare allocate_stack.  */
> -# define ALLOCATE_STACK_PARMS void **stack
> -
> -/* This is how the function is called.  We do it this way to allow
> -   other variants of the function to have more parameters.  */
> -# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr)
> -
> -#else
> -
> -/* We need two stacks.  The kernel will place them but we have to tell
> -   the kernel about the size of the reserved address space.  */
> -# define STACK_VARIABLES void *stackaddr = NULL; size_t stacksize = 0
> -
> -/* How to pass the values to the 'create_thread' function.  */
> -# define STACK_VARIABLES_ARGS stackaddr, stacksize
> -
> -/* How to declare function which gets there parameters.  */
> -# define STACK_VARIABLES_PARMS void *stackaddr, size_t stacksize
> -
> -/* How to declare allocate_stack.  */
> -# define ALLOCATE_STACK_PARMS void **stack, size_t *stacksize
> -
> -/* This is how the function is called.  We do it this way to allow
> -   other variants of the function to have more parameters.  */
> -# define ALLOCATE_STACK(attr, pd) \
> -  allocate_stack (attr, pd, &stackaddr, &stacksize)
> -
> -#endif
> -
> -
>  /* Default alignment of stack.  */
>  #ifndef STACK_ALIGN
>  # define STACK_ALIGN __alignof__ (long double)

Ok.

> @@ -249,7 +208,7 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
>     PDP must be non-NULL.  */
>  static int
>  allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
> -		ALLOCATE_STACK_PARMS)
> +		void **stack, size_t *stacksize)
>  {
>    struct pthread *pd;
>    size_t size;
> @@ -600,25 +559,17 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>    /* We place the thread descriptor at the end of the stack.  */
>    *pdp = pd;
>  
> -#if _STACK_GROWS_DOWN
>    void *stacktop;
>  
> -# if TLS_TCB_AT_TP
> +#if TLS_TCB_AT_TP
>    /* The stack begins before the TCB and the static TLS block.  */
>    stacktop = ((char *) (pd + 1) - tls_static_size_for_stack);
> -# elif TLS_DTV_AT_TP
> +#elif TLS_DTV_AT_TP
>    stacktop = (char *) (pd - 1);
> -# endif
> +#endif
>  
> -# ifdef NEED_SEPARATE_REGISTER_STACK
> +  *stacksize = stacktop - pd->stackblock;
>    *stack = pd->stackblock;
> -  *stacksize = stacktop - *stack;
> -# else
> -  *stack = stacktop;
> -# endif
> -#else
> -  *stack = pd->stackblock;
> -#endif
>  
>    return 0;
>  }

Ok.

> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 2d2535b07d..9e3b8f325c 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -37,6 +37,7 @@
>  #include "libioP.h"
>  #include <sys/single_threaded.h>
>  #include <version.h>
> +#include <clone_internal.h>
>  
>  #include <shlib-compat.h>
>  
> @@ -246,8 +247,8 @@ late_init (void)
>  static int _Noreturn start_thread (void *arg);
>  
>  static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
> -			  bool *stopped_start, STACK_VARIABLES_PARMS,
> -			  bool *thread_ran)
> +			  bool *stopped_start, void *stackaddr,
> +			  size_t stacksize, bool *thread_ran)
>  {
>    /* Determine whether the newly created threads has to be started
>       stopped since we have to set the scheduling parameters or set the
> @@ -299,14 +300,18 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
>  
>    TLS_DEFINE_INIT_TP (tp, pd);
>  
> -#ifdef __NR_clone2
> -# define ARCH_CLONE __clone2
> -#else
> -# define ARCH_CLONE __clone
> -#endif
> -  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
> -				    clone_flags, pd, &pd->tid, tp, &pd->tid)
> -			== -1))
> +  struct clone_args args =
> +    {
> +      .flags = clone_flags,
> +      .pidfd = (uintptr_t) &pd->tid,
> +      .parent_tid = (uintptr_t) &pd->tid,
> +      .child_tid = (uintptr_t) &pd->tid,
> +      .stack = (uintptr_t) stackaddr,
> +      .stack_size = stacksize,
> +      .tls = (uintptr_t) tp,
> +    };
> +  int ret = __clone_internal (&args, &start_thread, pd);
> +  if (__glibc_unlikely (ret == -1))
>      return errno;
>  
>    /* It's started now, so if we fail below, we'll have to cancel it

Ok.

> @@ -603,7 +608,8 @@ int
>  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  		      void *(*start_routine) (void *), void *arg)
>  {
> -  STACK_VARIABLES;
> +  void *stackaddr = NULL;
> +  size_t stacksize = 0;
>  
>    /* Avoid a data race in the multi-threaded case, and call the
>       deferred initialization only once.  */
> @@ -627,7 +633,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>      }
>  
>    struct pthread *pd = NULL;
> -  int err = ALLOCATE_STACK (iattr, &pd);
> +  int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
>    int retval = 0;
>  
>    if (__glibc_unlikely (err != 0))
> @@ -772,8 +778,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  
>        /* We always create the thread stopped at startup so we can
>  	 notify the debugger.  */
> -      retval = create_thread (pd, iattr, &stopped_start,
> -			      STACK_VARIABLES_ARGS, &thread_ran);
> +      retval = create_thread (pd, iattr, &stopped_start, stackaddr,
> +			      stacksize, &thread_ran);
>        if (retval == 0)
>  	{
>  	  /* We retain ownership of PD until (a) (see CONCURRENCY NOTES
> @@ -804,8 +810,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  	}
>      }
>    else
> -    retval = create_thread (pd, iattr, &stopped_start,
> -			    STACK_VARIABLES_ARGS, &thread_ran);
> +    retval = create_thread (pd, iattr, &stopped_start, stackaddr,
> +			    stacksize, &thread_ran);
>  
>    /* Return to the previous signal mask, after creating the new
>       thread.  */

Ok.

> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index bc14f20274..9469868bce 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -64,7 +64,7 @@ sysdep_routines += adjtimex clone umount umount2 readahead sysctl \
>  		   time64-support pselect32 \
>  		   xstat fxstat lxstat xstat64 fxstat64 lxstat64 \
>  		   fxstatat fxstatat64 \
> -		   xmknod xmknodat
> +		   xmknod xmknodat clone3 clone-internal
>  
>  CFLAGS-gethostid.c = -fexceptions
>  CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables

Ok.

> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> new file mode 100644
> index 0000000000..1e7a8f6b35
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> @@ -0,0 +1,91 @@
> +/* The internal wrapper of clone and clone3.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include <stddef.h>
> +#include <errno.h>
> +#include <sched.h>
> +#include <clone_internal.h>
> +#include <libc-pointer-arith.h>	/* For cast_to_pointer.  */
> +#include <stackinfo.h>		/* For _STACK_GROWS_{UP,DOWN}.  */
> +
> +#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> +#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
> +#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
> +
> +#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *)0)->MEMBER))
> +#define offsetofend(TYPE, MEMBER) \
> +  (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER))
> +
> +_Static_assert (__alignof (struct clone_args) == 8,
> +		"__alignof (struct clone_args) != 8");
> +_Static_assert (offsetofend (struct clone_args, tls) == CLONE_ARGS_SIZE_VER0,
> +		"offsetofend (struct clone_args, tls) != CLONE_ARGS_SIZE_VER0");
> +_Static_assert (offsetofend (struct clone_args, set_tid_size) == CLONE_ARGS_SIZE_VER1,
> +		"offsetofend (struct clone_args, set_tid_size) != CLONE_ARGS_SIZE_VER1");
> +_Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
> +		"offsetofend (struct clone_args, cgroup) != CLONE_ARGS_SIZE_VER2");
> +_Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
> +		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
> +
> +int
> +__clone_internal (struct clone_args *cl_args,
> +		  int (*func) (void *arg), void *arg)
> +{
> +  int ret;
> +#ifdef HAVE_CLONE3_WAPPER
> +  /* Try clone3 first.  */
> +  int saved_errno = errno;
> +  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> +  if (ret != -1 || errno != ENOSYS)
> +    return ret;
> +
> +  /* NB: Restore errno since errno may be checked against non-zero
> +     return value.  */
> +  __set_errno (saved_errno);
> +#endif
> +
> +  /* Map clone3 arguments to clone arguments.  NB: No need to check
> +     invalid clone3 specific bits in flags nor exit_signal since this
> +     is an internal function.  */
> +  int flags = cl_args->flags | cl_args->exit_signal;
> +  void *stack = cast_to_pointer (cl_args->stack);
> +
> +#ifdef __ia64__
> +  ret = __clone2 (func, stack, cl_args->stack_size,
> +		  flags, arg,
> +		  cast_to_pointer (cl_args->parent_tid),
> +		  cast_to_pointer (cl_args->tls),
> +		  cast_to_pointer (cl_args->child_tid));
> +#else
> +# if !_STACK_GROWS_DOWN && !_STACK_GROWS_UP
> +#  error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
> +# endif
> +
> +# if _STACK_GROWS_DOWN
> +  stack += cl_args->stack_size;
> +# endif
> +  ret = __clone (func, stack, flags, arg,
> +		 cast_to_pointer (cl_args->parent_tid),
> +		 cast_to_pointer (cl_args->tls),
> +		 cast_to_pointer (cl_args->child_tid));
> +#endif
> +  return ret;
> +}
> +
> +libc_hidden_def (__clone_internal)

Ok.
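
As a reading aid for the argument mapping in the fallback path, here is
a small standalone sketch.  The helper names (split_clone_flags,
merge_clone_flags, clone_stack_pointer) are hypothetical and are not
part of the patch; they only illustrate how the tests split a clone
flags word into .flags and .exit_signal, how the fallback ORs them back
together, and how a clone3-style stack base plus size becomes the stack
pointer that clone expects.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>	/* CSIGNAL, CLONE_VM.  */
#include <signal.h>	/* SIGCHLD.  */
#include <stddef.h>
#include <stdint.h>

static_assert (CSIGNAL == 0xff, "exit signal lives in the low byte");

/* Hypothetical container for the two clone3 fields involved.  */
struct split_flags
{
  uint64_t flags;
  uint64_t exit_signal;
};

/* Split a clone-style flags word the way the tests fill clone_args.  */
static struct split_flags
split_clone_flags (int clone_flags)
{
  struct split_flags s =
    {
      .flags = clone_flags & ~CSIGNAL,
      .exit_signal = clone_flags & CSIGNAL,
    };
  return s;
}

/* Recombine them the way the clone/clone2 fallback does.  */
static int
merge_clone_flags (struct split_flags s)
{
  return (int) (s.flags | s.exit_signal);
}

/* clone3 takes the lowest stack address plus a size; clone takes the
   initial stack pointer, which is the top when the stack grows down.  */
static void *
clone_stack_pointer (void *base, size_t size, int grows_down)
{
  return grows_down ? (char *) base + size : base;
}
```

Splitting CLONE_VM | SIGCHLD and merging the result gives back the
original flags word, and with a downward-growing stack the pointer
handed to clone is base + size.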

> diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
> new file mode 100644
> index 0000000000..de963ef89d
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone3.c
> @@ -0,0 +1 @@
> +/* An empty placeholder.  */

Ok.

> diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
> new file mode 100644
> index 0000000000..0488884d59
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone3.h
> @@ -0,0 +1,60 @@
> +/* The wrapper of clone3.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _CLONE3_H
> +#define _CLONE3_H	1
> +
> +#include <features.h>
> +#include <stdint.h>
> +#include <stddef.h>
> +
> +__BEGIN_DECLS
> +
> +/* This struct should only be used in an argument to the clone3 system
> +   call (along with its size argument).  It may be extended with new
> +   fields in the future.  */
> +
> +struct clone_args
> +{
> +  uint64_t flags;	 /* Flags bit mask.  */
> +  uint64_t pidfd;	 /* Where to store PID file descriptor
> +			    (pid_t *).  */
> +  uint64_t child_tid;	 /* Where to store child TID, in child's memory
> +			    (pid_t *).  */
> +  uint64_t parent_tid;	 /* Where to store child TID, in parent's memory
> +			    (int *). */
> +  uint64_t exit_signal;	 /* Signal to deliver to parent on child
> +			    termination */
> +  uint64_t stack;	 /* The lowest address of stack.  */
> +  uint64_t stack_size;	 /* Size of stack.  */
> +  uint64_t tls;		 /* Location of new TLS.  */
> +  uint64_t set_tid;	 /* Pointer to a pid_t array
> +			    (since Linux 5.5).  */
> +  uint64_t set_tid_size; /* Number of elements in set_tid
> +			    (since Linux 5.5). */
> +  uint64_t cgroup;	 /* File descriptor for target cgroup
> +			    of child (since Linux 5.7).  */
> +} __attribute__ ((aligned (8)));
> +

The kernel defines the alignment on each member (__aligned_u64) instead
of aligning the struct itself.  This should be OK as long as all members
have the same type, but I am not sure whether the kernel might later
extend the struct using different internal types.  Maybe we should mimic
the kernel in this regard.
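
To illustrate the concern, here is a small sketch comparing the two alignment styles.  All struct and function names are invented for this example, and the "ext" structs are a hypothetical future layout, not anything the kernel has defined.  It assumes GCC or Clang attribute syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-member alignment, in the style of the kernel's __aligned_u64.  */
typedef uint64_t aligned_u64 __attribute__ ((aligned (8)));

/* With only 64-bit members, both styles give the same layout.  */
struct member_aligned
{
  aligned_u64 flags;
  aligned_u64 stack;
  aligned_u64 stack_size;
};

struct struct_aligned
{
  uint64_t flags;
  uint64_t stack;
  uint64_t stack_size;
} __attribute__ ((aligned (8)));

_Static_assert (sizeof (struct member_aligned)
		== sizeof (struct struct_aligned), "size mismatch");
_Static_assert (offsetof (struct member_aligned, stack_size)
		== offsetof (struct struct_aligned, stack_size),
		"offset mismatch");

/* Hypothetical extension that mixes in a narrower member.  */
struct ext_member_aligned
{
  uint32_t small;
  aligned_u64 wide;
};

struct ext_struct_aligned
{
  uint32_t small;
  uint64_t wide;
} __attribute__ ((aligned (8)));

/* Always 8: the member itself demands 8-byte alignment.  */
static size_t
ext_member_offset (void)
{
  return offsetof (struct ext_member_aligned, wide);
}

/* 8 where uint64_t is naturally 8-byte aligned, but may be 4 on ABIs
   that align 64-bit integers to 4 bytes inside structs (i386).  */
static size_t
ext_struct_offset (void)
{
  return offsetof (struct ext_struct_aligned, wide);
}
```

As long as every member is a uint64_t the two definitions cannot diverge, which is what the static assertions check.  The difference would only show up if a narrower type were ever mixed in.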

> +/* The wrapper of clone3.  */
> +extern int clone3 (struct clone_args *__cl_args, size_t __size,
> +		   int (*__func) (void *__arg), void *__arg);
> +
> +__END_DECLS
> +
> +#endif /* clone3.h */

Ok.

> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index 501f8fbccd..fd29858cf5 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c
> @@ -31,6 +31,7 @@
>  #include <dl-sysdep.h>
>  #include <libc-pointer-arith.h>
>  #include <ldsodefs.h>
> +#include <clone_internal.h>
>  #include "spawn_int.h"
>  
>  /* The Linux implementation of posix_spawn{p} uses the clone syscall directly
> @@ -59,21 +60,6 @@
>     normal program exit with the exit code 127.  */
>  #define SPAWN_ERROR	127
>  
> -#ifdef __ia64__
> -# define CLONE(__fn, __stackbase, __stacksize, __flags, __args) \
> -  __clone2 (__fn, __stackbase, __stacksize, __flags, __args, 0, 0, 0)
> -#else
> -# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
> -  __clone (__fn, __stack, __flags, __args)
> -#endif
> -
> -/* Since ia64 wants the stackbase w/clone2, re-use the grows-up macro.  */
> -#if _STACK_GROWS_UP || defined (__ia64__)
> -# define STACK(__stack, __stack_size) (__stack)
> -#elif _STACK_GROWS_DOWN
> -# define STACK(__stack, __stack_size) (__stack + __stack_size)
> -#endif
> -
>  

Ok.
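
The removed STACK macro and the new .stack/.stack_size fields differ only in who does the pointer arithmetic.  A minimal sketch of that conversion follows; legacy_stack_arg is a made-up helper for illustration, and the grows_down parameter stands in for the _STACK_GROWS_DOWN configuration macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the lowest address and size of a stack
   area (the clone3 convention), return the stack argument that a
   legacy clone-style call expects.  On a grows-down target that is
   the address one past the end of the area; on a grows-up target
   (and for ia64 clone2, which takes base and size) it is the base
   itself.  */
static void *
legacy_stack_arg (void *base, size_t size, int grows_down)
{
  return grows_down ? (void *) ((char *) base + size) : base;
}
```

Callers such as __spawnix can then pass the base and size unchanged and leave this adjustment to the internal wrapper.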

>  struct posix_spawn_args
>  {
> @@ -378,8 +364,14 @@ __spawnix (pid_t * pid, const char *file,
>       need for CLONE_SETTLS.  Although parent and child share the same TLS
>       namespace, there will be no concurrent access for TLS variables (errno
>       for instance).  */
> -  new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
> -		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
> +  struct clone_args clone_args =
> +    {
> +      .flags = CLONE_VM | CLONE_VFORK,
> +      .exit_signal = SIGCHLD,
> +      .stack = (uintptr_t) stack,
> +      .stack_size = stack_size,
> +    };
> +  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
>  
>    /* It needs to collect the case where the auxiliary process was created
>       but failed to execute the file (due either any preparation step or
> 

Ok.


* Re: [PATCH v8 1/3] Add an internal wrapper for clone, clone2 and clone3
  2021-07-13 18:54   ` Adhemerval Zanella
@ 2021-07-13 19:06     ` Adhemerval Zanella
  2021-07-13 19:49     ` [PATCH v9] " H.J. Lu
  1 sibling, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-07-13 19:06 UTC (permalink / raw)
  To: H.J. Lu, libc-alpha; +Cc: Florian Weimer, Noah Goldstein



On 13/07/2021 15:54, Adhemerval Zanella wrote:
> 
> 
> On 01/06/2021 11:55, H.J. Lu wrote:
>> The clone3 system call provides a superset of the functionality of clone
>> and clone2.  It also provides a number of API improvements, including
>> the ability to specify the size of the child's stack area which can be
>> used by kernel to compute the shadow stack size when allocating the
>> shadow stack.  Add:
>>
>> extern int __clone_internal (struct clone_args *__cl_args,
>> 			     int (*__func) (void *__arg), void *__arg);
>>
>> to provide an abstract interface for clone, clone2 and clone3.
>>
>> 1. Simplify stack management for thread creation by passing both stack
>> base and size to create_thread.
>> 2. Consolidate clone vs clone2 differences into a single file.
>> 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
>> -1 with ENOSYS, fall back to clone or clone2.
>> 4. Use only __clone_internal to clone a thread.  Since the stack size
>> argument for create_thread is now unconditional, always pass stack size
>> to create_thread.
>> 5. Enable the public clone3 wrapper in the future after it has been
>> added to all targets.
>>
>> NB: Sandbox should return ENOSYS on clone3 if it is rejected:
>>
>> https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5
> 
> LGTM with just a suggestion below.  Chromium has also fixed it,
> so although it won't be able to fully handle clone3, at least
> it won't brick a 2.34 glibc.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

I forgot to add that I think it would be useful to reference, in the
commit log, the kernel version and commit that actually added the
clone3 syscall.


* Re: [PATCH v8 2/3] x86-64: Add the clone3 wrapper
  2021-06-01 14:55 ` [PATCH v8 2/3] x86-64: Add the clone3 wrapper H.J. Lu
@ 2021-07-13 19:12   ` Adhemerval Zanella
  0 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-07-13 19:12 UTC (permalink / raw)
  To: H.J. Lu, libc-alpha; +Cc: Florian Weimer, Noah Goldstein

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

On 01/06/2021 11:55, H.J. Lu wrote:
> extern int clone3 (struct clone_args *__cl_args, size_t __size,
> 		   int (*__func) (void *__arg), void *__arg);
> ---
>  sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
>  2 files changed, 94 insertions(+)
>  create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S
> 
> diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> new file mode 100644
> index 0000000000..71caaecc29
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> @@ -0,0 +1,92 @@
> +/* The clone3 syscall wrapper.  Linux/x86-64 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* clone3() is even more special than fork() as it mucks with stacks
> +   and invokes a function in the right context after it's all over.  */
> +
> +#include <sysdep.h>
> +
> +/* The userland implementation is:
> +   int clone3 (struct clone_args *cl_args, size_t size,
> +	       int (*func)(void *arg), void *arg);
> +   the kernel entry is:
> +   int clone3 (struct clone_args *cl_args, size_t size);
> +
> +   The parameters are passed in registers from userland:
> +   rdi: cl_args
> +   rsi: size
> +   rdx: func
> +   rcx: arg
> +
> +   The kernel expects:
> +   rax: system call number
> +   rdi: cl_args
> +   rsi: size  */
> +
> +        .text
> +ENTRY (__clone3)
> +	/* Sanity check arguments.  */
> +	movl	$-EINVAL, %eax
> +	test	%RDI_LP, %RDI_LP	/* No NULL cl_args pointer.  */
> +	jz	SYSCALL_ERROR_LABEL
> +	test	%RDX_LP, %RDX_LP	/* No NULL function pointer.  */
> +	jz	SYSCALL_ERROR_LABEL
> +
> +	/* Save the function argument (arg) in R8, which is preserved by
> +	   the syscall.  */
> +	mov	%RCX_LP, %R8_LP
> +
> +	/* Do the system call.  */
> +	movl	$SYS_ify(clone3), %eax
> +
> +	/* End FDE now, because in the child the unwind info will be
> +	   wrong.  */
> +	cfi_endproc
> +	syscall
> +
> +	test	%RAX_LP, %RAX_LP
> +	jl	SYSCALL_ERROR_LABEL
> +	jz	L(thread_start)
> +
> +	ret
> +
> +L(thread_start):
> +	cfi_startproc
> +	/* Clearing frame pointer is insufficient, use CFI.  */
> +	cfi_undefined (rip)
> +	/* Clear the frame pointer.  The ABI suggests this be done, to mark
> +	   the outermost frame obviously.  */
> +	xorl	%ebp, %ebp
> +
> +	/* Align stack to 16 bytes per the x86-64 psABI.  */
> +	and	$-16, %RSP_LP
> +
> +	/* Set up arguments for the function call.  */
> +	mov	%R8_LP, %RDI_LP	/* Argument.  */
> +	call	*%rdx		/* Call function.  */
> +	/* Call exit with return value from function call. */
> +	movq	%rax, %rdi
> +	movl	$SYS_ify(exit), %eax
> +	syscall
> +	cfi_endproc
> +
> +	cfi_startproc
> +PSEUDO_END (__clone3)
> +
> +libc_hidden_def (__clone3)
> +weak_alias (__clone3, clone3)
> diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> index dbad2c788a..f26ffc68ae 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> +++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> @@ -377,6 +377,8 @@
>  # define HAVE_GETCPU_VSYSCALL		"__vdso_getcpu"
>  # define HAVE_CLOCK_GETRES64_VSYSCALL   "__vdso_clock_getres"
>  
> +# define HAVE_CLONE3_WAPPER			1
> +
>  # define SINGLE_THREAD_BY_GLOBAL		1
>  
>  #endif	/* __ASSEMBLER__ */
> 

Ok.


* Re: [PATCH v8 3/3] Add static tests for __clone_internal
  2021-06-01 14:55 ` [PATCH v8 3/3] Add static tests for __clone_internal H.J. Lu
@ 2021-07-13 19:32   ` Adhemerval Zanella
  2021-07-13 21:12     ` [PATCH v9] " H.J. Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Adhemerval Zanella @ 2021-07-13 19:32 UTC (permalink / raw)
  To: H.J. Lu, libc-alpha; +Cc: Florian Weimer, Noah Goldstein

These are quite similar to the non '-internal' tests.  Would it be better
to include the existing tests and reimplement only the bits that differ
(the calls to __clone_internal()) instead of replicating all of them?

On 01/06/2021 11:55, H.J. Lu wrote:
> ---
>  sysdeps/unix/sysv/linux/Makefile              |   9 ++
>  .../sysv/linux/tst-align-clone-internal.c     |  87 +++++++++++
>  sysdeps/unix/sysv/linux/tst-clone2-internal.c | 137 ++++++++++++++++++
>  sysdeps/unix/sysv/linux/tst-clone3-internal.c |  99 +++++++++++++
>  .../unix/sysv/linux/tst-getpid1-internal.c    | 133 +++++++++++++++++
>  .../sysv/linux/tst-misalign-clone-internal.c  |  86 +++++++++++
>  6 files changed, 551 insertions(+)
>  create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> 
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 9469868bce..214b912921 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -118,6 +118,15 @@ endif
>  
>  tests-internal += tst-sigcontext-get_pc
>  
> +tests-clone-internal = \
> +  tst-align-clone-internal \
> +  tst-clone2-internal \
> +  tst-clone3-internal \
> +  tst-getpid1-internal \
> +  tst-misalign-clone-internal
> +tests-internal += $(tests-clone-internal)
> +tests-static += $(tests-clone-internal)
> +
>  CFLAGS-tst-sigcontext-get_pc.c = -fasynchronous-unwind-tables
>  
>  # Generate the list of SYS_* macros for the system calls (__NR_*

Ok.

> diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> new file mode 100644
> index 0000000000..6c3631f3db
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> @@ -0,0 +1,87 @@
> +/* Verify that the clone child stack is properly aligned.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <tst-stack-align.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +
> +static int
> +f (void *arg)
> +{
> +  bool ok = true;
> +
> +  puts ("in f");
> +
> +  if (TEST_STACK_ALIGN ())
> +    ok = false;
> +
> +  return ok ? 0 : 1;
> +}

Maybe:

  static int
  f (void *arg)
  {
    return TEST_STACK_ALIGN () ? 1 : 0;
  }

> +
> +static int
> +do_test (void)
> +{
> +  bool ok = true;
> +
> +  puts ("in main");
> +
> +  if (TEST_STACK_ALIGN ())
> +    ok = false;
> +

Maybe

  ok = !TEST_STACK_ALIGN ();

But I think this does not really add much, so I think it would be better
to use:

  if (TEST_STACK_ALIGN ())
    FAIL_EXIT1 ("stack alignment failed");
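
For readers unfamiliar with the check, the sketch below shows the kind of test involved.  It assumes, as the quoted patch code implies, that the check yields nonzero when the stack is misaligned.  stack_misaligned is a made-up function for illustration; the real macro lives in tst-stack-align.h and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Return 0 if a local double and a local long double both sit at
   addresses that satisfy their alignment requirement, 1 otherwise.
   The locals are volatile so that they are really placed on the
   stack.  */
static int
stack_misaligned (void)
{
  volatile double d = 12.0;
  volatile long double ld = 15.0;
  int ret = 0;

  if (((uintptr_t) &d & (_Alignof (double) - 1)) != 0)
    ret = 1;
  if (((uintptr_t) &ld & (_Alignof (long double) - 1)) != 0)
    ret = 1;

  return ret;
}
```

Because the result is nonzero on failure, a caller reports an error when the call is true, not when it is false.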

> +#ifdef __ia64__
> +# define STACK_SIZE 256 * 1024
> +#else
> +# define STACK_SIZE 128 * 1024
> +#endif
> +  char st[STACK_SIZE] __attribute__ ((aligned));
> +  struct clone_args clone_args =
> +    {
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +    };
> +  pid_t p = __clone_internal (&clone_args, f, 0);
> +  if (p == -1)
> +    {
> +      printf("clone failed: %m\n");
> +      return 1;
> +    }

Use TEST_VERIFY here:

  TEST_VERIFY (p != -1);

> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  if (!WIFEXITED (e))
> +    {
> +      if (WIFSIGNALED (e))
> +	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> +      else
> +	puts ("did not terminate correctly");
> +      return 1;
> +    }
> +  if (WEXITSTATUS (e) != 0)
> +    ok = false;

    TEST_VERIFY (WIFEXITED (e));
    TEST_COMPARE (WEXITSTATUS (e), 0);
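
The two checks belong together because WEXITSTATUS is only meaningful when WIFEXITED is true.  A standalone sketch of that decoding order follows.  It uses plain fork and waitpid instead of __clone_internal and the support/ test harness, and child_exit_status is a made-up helper name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with CODE.  Return the child's exit status,
   or -1 if the child could not be created, could not be waited for,
   or did not terminate normally.  */
static int
child_exit_status (int code)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (code);

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;

  /* Check for normal termination before extracting the exit code.  */
  if (!WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

In the tests under review the same order applies: first verify that the child exited normally, then compare its exit status against 0.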

> +
> +  return ok ? 0 : 1;
> +}
> +
> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> new file mode 100644
> index 0000000000..b8917fe713
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> @@ -0,0 +1,137 @@
> +/* Test if CLONE_VM does not change pthread pid/tid field (BZ #19957)
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <signal.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <stddef.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <errno.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <sys/syscall.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +#include <support/check.h>
> +
> +static int sig;
> +static int pipefd[2];
> +
> +static int
> +f (void *a)
> +{
> +  close (pipefd[0]);
> +
> +  pid_t ppid = getppid ();
> +  pid_t pid = getpid ();
> +  pid_t tid = gettid ();
> +
> +  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
> +    FAIL_EXIT1 ("write ppid failed\n");
> +  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
> +    FAIL_EXIT1 ("write pid failed\n");
> +  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
> +    FAIL_EXIT1 ("write tid failed\n");
> +
> +  return 0;
> +}
> +
> +
> +static int
> +do_test (void)
> +{
> +  sig = SIGRTMIN;
> +  sigset_t ss;
> +  sigemptyset (&ss);
> +  sigaddset (&ss, sig);
> +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> +    FAIL_EXIT1 ("sigprocmask failed: %m");
> +
> +  if (pipe2 (pipefd, O_CLOEXEC))
> +    FAIL_EXIT1 ("pipe failed: %m");
> +
> +#ifdef __ia64__
> +# define STACK_SIZE 256 * 1024
> +#else
> +# define STACK_SIZE 128 * 1024
> +#endif
> +  char st[STACK_SIZE] __attribute__ ((aligned));
> +  struct clone_args clone_args =
> +    {
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +    };
> +  pid_t p = __clone_internal (&clone_args, f, 0);
> +
> +  close (pipefd[1]);
> +
> +  if (p == -1)
> +    FAIL_EXIT1("clone failed: %m");
> +
> +  pid_t ppid, pid, tid;
> +  if (read (pipefd[0], &ppid, sizeof pid) != sizeof pid)
> +    {
> +      kill (p, SIGKILL);
> +      FAIL_EXIT1 ("read ppid failed: %m");
> +    }
> +  if (read (pipefd[0], &pid, sizeof pid) != sizeof pid)
> +    {
> +      kill (p, SIGKILL);
> +      FAIL_EXIT1 ("read pid failed: %m");
> +    }
> +  if (read (pipefd[0], &tid, sizeof tid) != sizeof tid)
> +    {
> +      kill (p, SIGKILL);
> +      FAIL_EXIT1 ("read tid failed: %m");
> +    }
> +
> +  close (pipefd[0]);
> +
> +  int ret = 0;
> +
> +  pid_t own_pid = getpid ();
> +  pid_t own_tid = syscall (__NR_gettid);
> +
> +  /* Some sanity checks for clone syscall: returned ppid should be current
> +     pid and both returned tid/pid should be different from current one.  */
> +  if ((ppid != own_pid) || (pid == own_pid) || (tid == own_tid))
> +    FAIL_RET ("ppid=%i pid=%i tid=%i | own_pid=%i own_tid=%i",
> +	      (int)ppid, (int)pid, (int)tid, (int)own_pid, (int)own_tid);
> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  if (!WIFEXITED (e))
> +    {
> +      if (WIFSIGNALED (e))
> +	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> +      else
> +	puts ("did not terminate correctly");
> +      exit (EXIT_FAILURE);
> +    }
> +  if (WEXITSTATUS (e) != 0)
> +    FAIL_EXIT1 ("exit code %d", WEXITSTATUS (e));

    TEST_VERIFY (WIFEXITED (e));
    TEST_COMPARE (WEXITSTATUS (e), 0);

> +
> +  return ret;
> +}
> +
> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> new file mode 100644
> index 0000000000..2bdbc571e6
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> @@ -0,0 +1,99 @@
> +/* Check if clone (CLONE_THREAD) does not call exit_group (BZ #21512)
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <string.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <sys/syscall.h>
> +#include <sys/wait.h>
> +#include <sys/types.h>
> +#include <linux/futex.h>
> +#include <support/check.h>
> +#include <stdatomic.h>
> +#include <clone_internal.h>
> +
> +/* Test if clone call with CLONE_THREAD does not call exit_group.  The 'f'
> +   function returns '1', which will be used by clone thread to call the
> +   'exit' syscall directly.  If _exit is used instead, exit_group will be
> +   used and thus the thread group will finish with return value of '1'
> +   (where '2' from main thread is expected.).  */
> +
> +static int
> +f (void *a)
> +{
> +  return 1;
> +}
> +
> +/* Futex wait for TID argument, similar to pthread_join internal
> +   implementation.  */
> +#define wait_tid(ctid_ptr, ctid_val)					\
> +  do {									\
> +    __typeof (*(ctid_ptr)) __tid;					\
> +    /* We need acquire MO here so that we synchronize with the		\
> +       kernel's store to 0 when the clone terminates.  */		\
> +    while ((__tid = atomic_load_explicit (ctid_ptr,			\
> +					  memory_order_acquire)) != 0)	\
> +      futex_wait (ctid_ptr, ctid_val);					\
> +  } while (0)
> +
> +static inline int
> +futex_wait (int *futexp, int val)
> +{
> +#ifdef __NR_futex
> +  return syscall (__NR_futex, futexp, FUTEX_WAIT, val);
> +#else
> +  return syscall (__NR_futex_time64, futexp, FUTEX_WAIT, val);
> +#endif
> +}
> +
> +static int
> +do_test (void)
> +{
> +  char st[1024] __attribute__ ((aligned));
> +  int clone_flags = CLONE_THREAD;
> +  /* Minimum required flags to be used along with CLONE_THREAD.  */
> +  clone_flags |= CLONE_VM | CLONE_SIGHAND;
> +  /* We will use ctid to call on futex to wait for thread exit.  */
> +  clone_flags |= CLONE_CHILD_CLEARTID;
> +  /* Initialize with a known value.  ctid is set to zero by the kernel after the
> +     cloned thread has exited.  */
> +#define CTID_INIT_VAL 1
> +  pid_t ctid = CTID_INIT_VAL;
> +  pid_t tid;
> +
> +  struct clone_args clone_args =
> +    {
> +      .flags = clone_flags & ~CSIGNAL,
> +      .exit_signal = clone_flags & CSIGNAL,
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +      .child_tid = (uintptr_t) &ctid,
> +    };
> +  tid = __clone_internal (&clone_args, f, NULL);
> +  if (tid == -1)
> +    FAIL_EXIT1 ("clone failed: %m");
> +
> +  wait_tid (&ctid, CTID_INIT_VAL);
> +
> +  return 2;
> +}
> +
> +#define EXPECTED_STATUS 2
> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/tst-getpid1-internal.c b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
> new file mode 100644
> index 0000000000..ee69e52401
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
> @@ -0,0 +1,133 @@
> +/* Verify that the parent pid is unchanged by __clone_internal.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <signal.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +
> +#ifndef TEST_CLONE_FLAGS
> +#define TEST_CLONE_FLAGS 0
> +#endif
> +
> +static int sig;
> +
> +static int
> +f (void *a)
> +{
> +  puts ("in f");
> +  union sigval sival;
> +  sival.sival_int = getpid ();
> +  printf ("pid = %d\n", sival.sival_int);
> +  if (sigqueue (getppid (), sig, sival) != 0)
> +    return 1;
> +  return 0;
> +}
> +
> +
> +static int
> +do_test (void)
> +{
> +  int mypid = getpid ();
> +
> +  sig = SIGRTMIN;
> +  sigset_t ss;
> +  sigemptyset (&ss);
> +  sigaddset (&ss, sig);
> +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> +    {
> +      printf ("sigprocmask failed: %m\n");
> +      return 1;
> +    }
> +
> +#ifdef __ia64__
> +# define STACK_SIZE 256 * 1024
> +#else
> +# define STACK_SIZE 128 * 1024
> +#endif
> +  char st[STACK_SIZE] __attribute__ ((aligned));
> +  struct clone_args clone_args =
> +    {
> +      .flags = TEST_CLONE_FLAGS & ~CSIGNAL,
> +      .exit_signal = TEST_CLONE_FLAGS & CSIGNAL,
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +    };
> +  pid_t p = __clone_internal (&clone_args, f, 0);
> +  if (p == -1)
> +    {
> +      printf("clone failed: %m\n");
> +      return 1;
> +    }
> +  printf ("new thread: %d\n", (int) p);
> +
> +  siginfo_t si;
> +  do
> +    if (sigwaitinfo (&ss, &si) < 0)
> +      {
> +	printf("sigwaitinfo failed: %m\n");
> +	kill (p, SIGKILL);
> +	return 1;
> +      }
> +  while  (si.si_signo != sig || si.si_code != SI_QUEUE);
> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  if (!WIFEXITED (e))
> +    {
> +      if (WIFSIGNALED (e))
> +	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> +      else
> +	puts ("did not terminate correctly");
> +      return 1;
> +    }
> +  if (WEXITSTATUS (e) != 0)
> +    {
> +      printf ("exit code %d\n", WEXITSTATUS (e));
> +      return 1;
> +    }
> +
> +  if (si.si_int != (int) p)
> +    {
> +      printf ("expected PID %d, got si_int %d\n", (int) p, si.si_int);
> +      kill (p, SIGKILL);
> +      return 1;
> +    }
> +
> +  if (si.si_pid != p)
> +    {
> +      printf ("expected PID %d, got si_pid %d\n", (int) p, (int) si.si_pid);
> +      kill (p, SIGKILL);
> +      return 1;
> +    }
> +
> +  if (getpid () != mypid)
> +    {
> +      puts ("my PID changed");
> +      return 1;
> +    }
> +
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.

> diff --git a/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> new file mode 100644
> index 0000000000..6df5fd2cbc
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> @@ -0,0 +1,86 @@
> +/* Verify that __clone_internal properly aligns the child stack.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <libc-pointer-arith.h>
> +#include <tst-stack-align.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +#include <support/check.h>
> +
> +static int
> +check_stack_alignment (void *arg)
> +{
> +  bool ok = true;
> +
> +  puts ("in f");
> +
> +  if (TEST_STACK_ALIGN ())
> +    ok = false;
> +
> +  return ok ? 0 : 1;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  puts ("in do_test");
> +
> +  if (TEST_STACK_ALIGN ())
> +    FAIL_EXIT1 ("stack isn't aligned\n");
> +
> +#ifdef __ia64__
> +# define STACK_SIZE (256 * 1024)
> +#else
> +# define STACK_SIZE (128 * 1024)
> +#endif
> +  char st[STACK_SIZE + 1];
> +  /* NB: Align child stack to 1 byte.  */
> +  char *stack = PTR_ALIGN_UP (&st[0], 2) + 1;
> +  struct clone_args clone_args =
> +    {
> +      .stack = (uintptr_t) stack,
> +      .stack_size = STACK_SIZE,
> +    };
> +  pid_t p = __clone_internal (&clone_args, check_stack_alignment, 0);
> +
> +  /* Clone must not fail.  */
> +  TEST_VERIFY_EXIT (p != -1);
> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  if (!WIFEXITED (e))
> +    {
> +      if (WIFSIGNALED (e))
> +	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> +     FAIL_EXIT1 ("process did not terminate correctly");
> +    }
> +
> +  if (WEXITSTATUS (e) != 0)
> +    FAIL_EXIT1 ("exit code %d", WEXITSTATUS (e));
> +
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
> 


* [PATCH v9] Add an internal wrapper for clone, clone2 and clone3
  2021-07-13 18:54   ` Adhemerval Zanella
  2021-07-13 19:06     ` Adhemerval Zanella
@ 2021-07-13 19:49     ` H.J. Lu
  2021-07-14 13:17       ` Adhemerval Zanella
  1 sibling, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2021-07-13 19:49 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GNU C Library, Florian Weimer, Noah Goldstein

[-- Attachment #1: Type: text/plain, Size: 20747 bytes --]

On Tue, Jul 13, 2021 at 11:54 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 01/06/2021 11:55, H.J. Lu wrote:
> > The clone3 system call provides a superset of the functionality of clone
> > and clone2.  It also provides a number of API improvements, including
> > the ability to specify the size of the child's stack area which can be
> > used by kernel to compute the shadow stack size when allocating the
> > shadow stack.  Add:
> >
> > extern int __clone_internal (struct clone_args *__cl_args,
> >                            int (*__func) (void *__arg), void *__arg);
> >
> > to provide an abstract interface for clone, clone2 and clone3.
> >
> > 1. Simplify stack management for thread creation by passing both stack
> > base and size to create_thread.
> > 2. Consolidate clone vs clone2 differences into a single file.
> > 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
> > -1 with ENOSYS, fall back to clone or clone2.
> > 4. Use only __clone_internal to clone a thread.  Since the stack size
> > argument for create_thread is now unconditional, always pass stack size
> > to create_thread.
> > 5. Enable the public clone3 wrapper in the future after it has been
> > added to all targets.
> >
> > NB: Sandbox should return ENOSYS on clone3 if it is rejected:
> >
> > https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5
>
> LGTM with just one suggestion below.  Chromium has also fixed it,
> so although it won't be able to fully handle clone3, at least
> it won't brick glibc 2.34.
>
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
>
> > ---
> >  include/clone_internal.h                 | 16 +++++
> >  nptl/allocatestack.c                     | 59 ++-------------
> >  nptl/pthread_create.c                    | 38 +++++-----
> >  sysdeps/unix/sysv/linux/Makefile         |  2 +-
> >  sysdeps/unix/sysv/linux/clone-internal.c | 91 ++++++++++++++++++++++++
> >  sysdeps/unix/sysv/linux/clone3.c         |  1 +
> >  sysdeps/unix/sysv/linux/clone3.h         | 60 ++++++++++++++++
> >  sysdeps/unix/sysv/linux/spawni.c         | 26 +++----
> >  8 files changed, 205 insertions(+), 88 deletions(-)
> >  create mode 100644 include/clone_internal.h
> >  create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
> >  create mode 100644 sysdeps/unix/sysv/linux/clone3.c
> >  create mode 100644 sysdeps/unix/sysv/linux/clone3.h
> >
> > diff --git a/include/clone_internal.h b/include/clone_internal.h
> > new file mode 100644
> > index 0000000000..4b23ef33ce
> > --- /dev/null
> > +++ b/include/clone_internal.h
> > @@ -0,0 +1,16 @@
> > +#ifndef _CLONE3_H
> > +#include_next <clone3.h>
> > +
> > +extern __typeof (clone3) __clone3;
> > +
> > +/* The internal wrapper of clone/clone2 and clone3.  If __clone3 returns
> > +   -1 with ENOSYS, fall back to clone or clone2.  */
> > +extern int __clone_internal (struct clone_args *__cl_args,
> > +                          int (*__func) (void *__arg), void *__arg);
> > +
> > +#ifndef _ISOMAC
> > +libc_hidden_proto (__clone3)
> > +libc_hidden_proto (__clone_internal)
> > +#endif
> > +
> > +#endif
>
> Ok.
>
> > diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> > index dc81a2ca73..eebf9c2c3c 100644
> > --- a/nptl/allocatestack.c
> > +++ b/nptl/allocatestack.c
> > @@ -33,47 +33,6 @@
> >  #include <kernel-features.h>
> >  #include <nptl-stack.h>
> >
> > -#ifndef NEED_SEPARATE_REGISTER_STACK
> > -
> > -/* Most architectures have exactly one stack pointer.  Some have more.  */
> > -# define STACK_VARIABLES void *stackaddr = NULL
> > -
> > -/* How to pass the values to the 'create_thread' function.  */
> > -# define STACK_VARIABLES_ARGS stackaddr
> > -
> > -/* How to declare function which gets there parameters.  */
> > -# define STACK_VARIABLES_PARMS void *stackaddr
> > -
> > -/* How to declare allocate_stack.  */
> > -# define ALLOCATE_STACK_PARMS void **stack
> > -
> > -/* This is how the function is called.  We do it this way to allow
> > -   other variants of the function to have more parameters.  */
> > -# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr)
> > -
> > -#else
> > -
> > -/* We need two stacks.  The kernel will place them but we have to tell
> > -   the kernel about the size of the reserved address space.  */
> > -# define STACK_VARIABLES void *stackaddr = NULL; size_t stacksize = 0
> > -
> > -/* How to pass the values to the 'create_thread' function.  */
> > -# define STACK_VARIABLES_ARGS stackaddr, stacksize
> > -
> > -/* How to declare function which gets there parameters.  */
> > -# define STACK_VARIABLES_PARMS void *stackaddr, size_t stacksize
> > -
> > -/* How to declare allocate_stack.  */
> > -# define ALLOCATE_STACK_PARMS void **stack, size_t *stacksize
> > -
> > -/* This is how the function is called.  We do it this way to allow
> > -   other variants of the function to have more parameters.  */
> > -# define ALLOCATE_STACK(attr, pd) \
> > -  allocate_stack (attr, pd, &stackaddr, &stacksize)
> > -
> > -#endif
> > -
> > -
> >  /* Default alignment of stack.  */
> >  #ifndef STACK_ALIGN
> >  # define STACK_ALIGN __alignof__ (long double)
>
> Ok.
>
> > @@ -249,7 +208,7 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
> >     PDP must be non-NULL.  */
> >  static int
> >  allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
> > -             ALLOCATE_STACK_PARMS)
> > +             void **stack, size_t *stacksize)
> >  {
> >    struct pthread *pd;
> >    size_t size;
> > @@ -600,25 +559,17 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
> >    /* We place the thread descriptor at the end of the stack.  */
> >    *pdp = pd;
> >
> > -#if _STACK_GROWS_DOWN
> >    void *stacktop;
> >
> > -# if TLS_TCB_AT_TP
> > +#if TLS_TCB_AT_TP
> >    /* The stack begins before the TCB and the static TLS block.  */
> >    stacktop = ((char *) (pd + 1) - tls_static_size_for_stack);
> > -# elif TLS_DTV_AT_TP
> > +#elif TLS_DTV_AT_TP
> >    stacktop = (char *) (pd - 1);
> > -# endif
> > +#endif
> >
> > -# ifdef NEED_SEPARATE_REGISTER_STACK
> > +  *stacksize = stacktop - pd->stackblock;
> >    *stack = pd->stackblock;
> > -  *stacksize = stacktop - *stack;
> > -# else
> > -  *stack = stacktop;
> > -# endif
> > -#else
> > -  *stack = pd->stackblock;
> > -#endif
> >
> >    return 0;
> >  }
>
> Ok.
>
> > diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> > index 2d2535b07d..9e3b8f325c 100644
> > --- a/nptl/pthread_create.c
> > +++ b/nptl/pthread_create.c
> > @@ -37,6 +37,7 @@
> >  #include "libioP.h"
> >  #include <sys/single_threaded.h>
> >  #include <version.h>
> > +#include <clone_internal.h>
> >
> >  #include <shlib-compat.h>
> >
> > @@ -246,8 +247,8 @@ late_init (void)
> >  static int _Noreturn start_thread (void *arg);
> >
> >  static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
> > -                       bool *stopped_start, STACK_VARIABLES_PARMS,
> > -                       bool *thread_ran)
> > +                       bool *stopped_start, void *stackaddr,
> > +                       size_t stacksize, bool *thread_ran)
> >  {
> >    /* Determine whether the newly created threads has to be started
> >       stopped since we have to set the scheduling parameters or set the
> > @@ -299,14 +300,18 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
> >
> >    TLS_DEFINE_INIT_TP (tp, pd);
> >
> > -#ifdef __NR_clone2
> > -# define ARCH_CLONE __clone2
> > -#else
> > -# define ARCH_CLONE __clone
> > -#endif
> > -  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
> > -                                 clone_flags, pd, &pd->tid, tp, &pd->tid)
> > -                     == -1))
> > +  struct clone_args args =
> > +    {
> > +      .flags = clone_flags,
> > +      .pidfd = (uintptr_t) &pd->tid,
> > +      .parent_tid = (uintptr_t) &pd->tid,
> > +      .child_tid = (uintptr_t) &pd->tid,
> > +      .stack = (uintptr_t) stackaddr,
> > +      .stack_size = stacksize,
> > +      .tls = (uintptr_t) tp,
> > +    };
> > +  int ret = __clone_internal (&args, &start_thread, pd);
> > +  if (__glibc_unlikely (ret == -1))
> >      return errno;
> >
> >    /* It's started now, so if we fail below, we'll have to cancel it
>
> Ok.
>
> > @@ -603,7 +608,8 @@ int
> >  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> >                     void *(*start_routine) (void *), void *arg)
> >  {
> > -  STACK_VARIABLES;
> > +  void *stackaddr = NULL;
> > +  size_t stacksize = 0;
> >
> >    /* Avoid a data race in the multi-threaded case, and call the
> >       deferred initialization only once.  */
> > @@ -627,7 +633,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> >      }
> >
> >    struct pthread *pd = NULL;
> > -  int err = ALLOCATE_STACK (iattr, &pd);
> > +  int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
> >    int retval = 0;
> >
> >    if (__glibc_unlikely (err != 0))
> > @@ -772,8 +778,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> >
> >        /* We always create the thread stopped at startup so we can
> >        notify the debugger.  */
> > -      retval = create_thread (pd, iattr, &stopped_start,
> > -                           STACK_VARIABLES_ARGS, &thread_ran);
> > +      retval = create_thread (pd, iattr, &stopped_start, stackaddr,
> > +                           stacksize, &thread_ran);
> >        if (retval == 0)
> >       {
> >         /* We retain ownership of PD until (a) (see CONCURRENCY NOTES
> > @@ -804,8 +810,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> >       }
> >      }
> >    else
> > -    retval = create_thread (pd, iattr, &stopped_start,
> > -                         STACK_VARIABLES_ARGS, &thread_ran);
> > +    retval = create_thread (pd, iattr, &stopped_start, stackaddr,
> > +                         stacksize, &thread_ran);
> >
> >    /* Return to the previous signal mask, after creating the new
> >       thread.  */
>
> Ok.
>
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index bc14f20274..9469868bce 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -64,7 +64,7 @@ sysdep_routines += adjtimex clone umount umount2 readahead sysctl \
> >                  time64-support pselect32 \
> >                  xstat fxstat lxstat xstat64 fxstat64 lxstat64 \
> >                  fxstatat fxstatat64 \
> > -                xmknod xmknodat
> > +                xmknod xmknodat clone3 clone-internal
> >
> >  CFLAGS-gethostid.c = -fexceptions
> >  CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
>
> Ok.
>
> > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> > new file mode 100644
> > index 0000000000..1e7a8f6b35
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> > @@ -0,0 +1,91 @@
> > +/* The internal wrapper of clone and clone3.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library.  If not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include <stddef.h>
> > +#include <errno.h>
> > +#include <sched.h>
> > +#include <clone_internal.h>
> > +#include <libc-pointer-arith.h>      /* For cast_to_pointer.  */
> > +#include <stackinfo.h>               /* For _STACK_GROWS_{UP,DOWN}.  */
> > +
> > +#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> > +#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
> > +#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
> > +
> > +#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *)0)->MEMBER))
> > +#define offsetofend(TYPE, MEMBER) \
> > +  (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER))
> > +
> > +_Static_assert (__alignof (struct clone_args) == 8,
> > +             "__alignof (struct clone_args) != 8");
> > +_Static_assert (offsetofend (struct clone_args, tls) == CLONE_ARGS_SIZE_VER0,
> > +             "offsetofend (struct clone_args, tls) != CLONE_ARGS_SIZE_VER0");
> > +_Static_assert (offsetofend (struct clone_args, set_tid_size) == CLONE_ARGS_SIZE_VER1,
> > +             "offsetofend (struct clone_args, set_tid_size) != CLONE_ARGS_SIZE_VER1");
> > +_Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
> > +             "offsetofend (struct clone_args, cgroup) != CLONE_ARGS_SIZE_VER2");
> > +_Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
> > +             "sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
> > +
> > +int
> > +__clone_internal (struct clone_args *cl_args,
> > +               int (*func) (void *arg), void *arg)
> > +{
> > +  int ret;
> > +#ifdef HAVE_CLONE3_WAPPER
> > +  /* Try clone3 first.  */
> > +  int saved_errno = errno;
> > +  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> > +  if (ret != -1 || errno != ENOSYS)
> > +    return ret;
> > +
> > +  /* NB: Restore errno since errno may be checked against non-zero
> > +     return value.  */
> > +  __set_errno (saved_errno);
> > +#endif
> > +
> > +  /* Map clone3 arguments to clone arguments.  NB: No need to check
> > +     invalid clone3 specific bits in flags nor exit_signal since this
> > +     is an internal function.  */
> > +  int flags = cl_args->flags | cl_args->exit_signal;
> > +  void *stack = cast_to_pointer (cl_args->stack);
> > +
> > +#ifdef __ia64__
> > +  ret = __clone2 (func, stack, cl_args->stack_size,
> > +               flags, arg,
> > +               cast_to_pointer (cl_args->parent_tid),
> > +               cast_to_pointer (cl_args->tls),
> > +               cast_to_pointer (cl_args->child_tid));
> > +#else
> > +# if !_STACK_GROWS_DOWN && !_STACK_GROWS_UP
> > +#  error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
> > +# endif
> > +
> > +# if _STACK_GROWS_DOWN
> > +  stack += cl_args->stack_size;
> > +# endif
> > +  ret = __clone (func, stack, flags, arg,
> > +              cast_to_pointer (cl_args->parent_tid),
> > +              cast_to_pointer (cl_args->tls),
> > +              cast_to_pointer (cl_args->child_tid));
> > +#endif
> > +  return ret;
> > +}
> > +
> > +libc_hidden_def (__clone_internal)
>
> Ok.
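
For readers following the fallback path above, here is a minimal sketch of
how the clone3-style arguments map onto the legacy clone arguments.  It is
not the code from the patch: the struct, the function and the DEMO_*
constants are hypothetical stand-ins, and the stack direction is passed as
a run-time flag here instead of being selected with _STACK_GROWS_DOWN.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only (they mirror CLONE_VM, CLONE_VFORK and
   SIGCHLD on Linux/x86); hypothetical names.  */
#define DEMO_CLONE_VM    0x00000100
#define DEMO_CLONE_VFORK 0x00004000
#define DEMO_SIGCHLD     17

/* Hypothetical subset of struct clone_args.  */
struct demo_clone_args
{
  uint64_t flags;
  uint64_t exit_signal;
  uint64_t stack;       /* Lowest address of the stack.  */
  uint64_t stack_size;
};

/* What a legacy clone call would receive.  */
struct demo_legacy_args
{
  int flags;
  void *stack;
};

static struct demo_legacy_args
demo_map_to_legacy (const struct demo_clone_args *a, int stack_grows_down)
{
  struct demo_legacy_args out;
  /* Legacy clone carries the exit signal in the low bits of flags.  */
  out.flags = (int) (a->flags | a->exit_signal);
  /* clone3 takes the stack base; legacy clone wants the initial stack
     pointer, which is the top when the stack grows down.  */
  char *base = (char *) (uintptr_t) a->stack;
  out.stack = stack_grows_down ? base + a->stack_size : base;
  return out;
}
```

With .flags = DEMO_CLONE_VM | DEMO_CLONE_VFORK and .exit_signal =
DEMO_SIGCHLD, the legacy flags come out as 0x4111, which matches what the
old spawni.c call passed in a single argument.
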
>
> > diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
> > new file mode 100644
> > index 0000000000..de963ef89d
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone3.c
> > @@ -0,0 +1 @@
> > +/* An empty placeholder.  */
>
> Ok.
>
> > diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
> > new file mode 100644
> > index 0000000000..0488884d59
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone3.h
> > @@ -0,0 +1,60 @@
> > +/* The wrapper of clone3.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library.  If not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#ifndef _CLONE3_H
> > +#define _CLONE3_H    1
> > +
> > +#include <features.h>
> > +#include <stdint.h>
> > +#include <stddef.h>
> > +
> > +__BEGIN_DECLS
> > +
> > +/* This struct should only be used in an argument to the clone3 system
> > +   call (along with its size argument).  It may be extended with new
> > +   fields in the future.  */
> > +
> > +struct clone_args
> > +{
> > +  uint64_t flags;     /* Flags bit mask.  */
> > +  uint64_t pidfd;     /* Where to store PID file descriptor
> > +                         (pid_t *).  */
> > +  uint64_t child_tid;         /* Where to store child TID, in child's memory
> > +                         (pid_t *).  */
> > +  uint64_t parent_tid;        /* Where to store child TID, in parent's memory
> > +                         (int *). */
> > +  uint64_t exit_signal;       /* Signal to deliver to parent on child
> > +                         termination */
> > +  uint64_t stack;     /* The lowest address of stack.  */
> > +  uint64_t stack_size;        /* Size of stack.  */
> > +  uint64_t tls;               /* Location of new TLS.  */
> > +  uint64_t set_tid;   /* Pointer to a pid_t array
> > +                         (since Linux 5.5).  */
> > +  uint64_t set_tid_size; /* Number of elements in set_tid
> > +                         (since Linux 5.5). */
> > +  uint64_t cgroup;    /* File descriptor for target cgroup
> > +                         of child (since Linux 5.7).  */
> > +} __attribute__ ((aligned (8)));
> > +
>
> The kernel defines the alignment for each member (__aligned_u64) instead
> of aligning the struct itself.  It should be ok as long as all members are
> the same type, but I am not sure whether the kernel will decide to extend
> it using different internal types.  Maybe we should mimic the kernel in
> this regard.

I added __aligned_uint64_t to sysdeps/unix/sysv/linux/clone3.h.
We can improve it if it becomes an installed header file.
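
To illustrate the difference being discussed, here is a small sketch that
assumes GNU C attributes.  The typedef and struct names are hypothetical
and are not the definitions in clone3.h; the mixed-width struct is used
only to make the effect visible.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedef in the spirit of the kernel's __aligned_u64:
   a 64-bit integer that is 8-byte aligned on every ABI.  */
typedef uint64_t demo_aligned_u64 __attribute__ ((aligned (8)));

/* Per-member alignment: the offset of B does not depend on the ABI's
   natural alignment of uint64_t, even after a narrower member.  */
struct demo_member_aligned
{
  uint32_t a;
  demo_aligned_u64 b;
};

/* Struct-level alignment only: the offset of B follows the ABI's
   natural alignment of uint64_t inside a struct (8 on x86-64, but
   4 on i386).  */
struct demo_struct_aligned
{
  uint32_t a;
  uint64_t b;
} __attribute__ ((aligned (8)));

_Static_assert (alignof (struct demo_member_aligned) == 8,
		"member-aligned struct must be 8-byte aligned");
_Static_assert (alignof (struct demo_struct_aligned) == 8,
		"struct-aligned struct must be 8-byte aligned");
_Static_assert (offsetof (struct demo_member_aligned, b) == 8,
		"aligned member must start at offset 8");
```

As long as every member is a plain 64-bit integer, as in the current
struct clone_args, both approaches give the same layout.
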

> > +/* The wrapper of clone3.  */
> > +extern int clone3 (struct clone_args *__cl_args, size_t __size,
> > +                int (*__func) (void *__arg), void *__arg);
> > +
> > +__END_DECLS
> > +
> > +#endif /* clone3.h */
>
> Ok.
>
> > diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> > index 501f8fbccd..fd29858cf5 100644
> > --- a/sysdeps/unix/sysv/linux/spawni.c
> > +++ b/sysdeps/unix/sysv/linux/spawni.c
> > @@ -31,6 +31,7 @@
> >  #include <dl-sysdep.h>
> >  #include <libc-pointer-arith.h>
> >  #include <ldsodefs.h>
> > +#include <clone_internal.h>
> >  #include "spawn_int.h"
> >
> >  /* The Linux implementation of posix_spawn{p} uses the clone syscall directly
> > @@ -59,21 +60,6 @@
> >     normal program exit with the exit code 127.  */
> >  #define SPAWN_ERROR  127
> >
> > -#ifdef __ia64__
> > -# define CLONE(__fn, __stackbase, __stacksize, __flags, __args) \
> > -  __clone2 (__fn, __stackbase, __stacksize, __flags, __args, 0, 0, 0)
> > -#else
> > -# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
> > -  __clone (__fn, __stack, __flags, __args)
> > -#endif
> > -
> > -/* Since ia64 wants the stackbase w/clone2, re-use the grows-up macro.  */
> > -#if _STACK_GROWS_UP || defined (__ia64__)
> > -# define STACK(__stack, __stack_size) (__stack)
> > -#elif _STACK_GROWS_DOWN
> > -# define STACK(__stack, __stack_size) (__stack + __stack_size)
> > -#endif
> > -
> >
>
> Ok.
>
> >  struct posix_spawn_args
> >  {
> > @@ -378,8 +364,14 @@ __spawnix (pid_t * pid, const char *file,
> >       need for CLONE_SETTLS.  Although parent and child share the same TLS
> >       namespace, there will be no concurrent access for TLS variables (errno
> >       for instance).  */
> > -  new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
> > -                CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
> > +  struct clone_args clone_args =
> > +    {
> > +      .flags = CLONE_VM | CLONE_VFORK,
> > +      .exit_signal = SIGCHLD,
> > +      .stack = (uintptr_t) stack,
> > +      .stack_size = stack_size,
> > +    };
> > +  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
> >
> >    /* It needs to collect the case where the auxiliary process was created
> >       but failed to execute the file (due either any preparation step or
> >
>
> Ok.

Here is the v9 patch.
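
As a rough illustration of the fallback rule the patch depends on (fall
back only on -1 with ENOSYS, and restore errno before doing so), here is
a self-contained sketch.  The function-pointer interface and all demo_*
names are hypothetical; the real code calls __clone3 and then __clone or
__clone2 directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*demo_call_fn) (void *arg);

/* Try PREFERRED first.  Fall back to LEGACY only if PREFERRED fails
   with ENOSYS; any other result is returned unchanged.  */
static int
demo_call_with_fallback (demo_call_fn preferred, demo_call_fn legacy,
			 void *arg)
{
  int saved_errno = errno;
  int ret = preferred (arg);
  if (ret != -1 || errno != ENOSYS)
    return ret;

  /* Restore errno so that the failed probe is not visible to callers
     that inspect errno after a successful fallback.  */
  errno = saved_errno;
  return legacy (arg);
}

/* Stand-in for a kernel or sandbox without the new system call.  */
static int
demo_unsupported (void *arg)
{
  (void) arg;
  errno = ENOSYS;
  return -1;
}

/* Stand-in for a sandbox that rejects the call with another error.  */
static int
demo_denied (void *arg)
{
  (void) arg;
  errno = EPERM;
  return -1;
}

/* Stand-in for the legacy path succeeding.  */
static int
demo_legacy_ok (void *arg)
{
  (void) arg;
  return 42;
}
```

With demo_denied as the preferred call, the wrapper returns -1 with
EPERM and never reaches the legacy path, which is why a sandbox has to
report ENOSYS for the fallback to work.
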

Thanks.

-- 
H.J.

[-- Attachment #2: v9-0001-Add-an-internal-wrapper-for-clone-clone2-and-clon.patch --]
[-- Type: text/x-patch, Size: 18857 bytes --]

From ead31bdf83b0a73273399b17885f3c81c5ad3b83 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 13 Feb 2021 11:47:46 -0800
Subject: [PATCH v9] Add an internal wrapper for clone, clone2 and clone3

The clone3 system call (since Linux 5.3) provides a superset of the
functionality of clone and clone2.  It also provides a number of API
improvements, including the ability to specify the size of the child's
stack area which can be used by kernel to compute the shadow stack size
when allocating the shadow stack.  Add:

extern int __clone_internal (struct clone_args *__cl_args,
			     int (*__func) (void *__arg), void *__arg);

to provide an abstract interface for clone, clone2 and clone3.

1. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
2. Consolidate clone vs clone2 differences into a single file.
3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
4. Use only __clone_internal to clone a thread.  Since the stack size
argument for create_thread is now unconditional, always pass stack size
to create_thread.
5. Enable the public clone3 wrapper in the future after it has been
added to all targets.

NB: Sandbox will return ENOSYS on clone3 in both Chromium:

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b

commit 218438259dd795456f0a48f67cbe5b4e520db88b
Author: Matthew Denton <mpdenton@chromium.org>
Date: Thu Jun 03 20:06:13 2021

Linux sandbox: return ENOSYS for clone3

Because clone3 uses a pointer argument rather than a flags argument, we
cannot examine the contents with seccomp, which is essential to
preventing sandboxed processes from starting other processes. So, we
won't be able to support clone3 in Chromium. This CL modifies the
BPF policy to return ENOSYS for clone3 so glibc always uses the fallback
to clone.

Bug: 1213452
Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Commit-Queue: Matthew Denton <mpdenton@chromium.org>
Cr-Commit-Position: refs/heads/master@{#888980}

[modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc

and Firefox:

https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76
---
 include/clone_internal.h                 | 16 +++++
 nptl/allocatestack.c                     | 59 ++-------------
 nptl/pthread_create.c                    | 38 +++++-----
 sysdeps/unix/sysv/linux/Makefile         |  3 +-
 sysdeps/unix/sysv/linux/clone-internal.c | 91 ++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/clone3.c         |  1 +
 sysdeps/unix/sysv/linux/clone3.h         | 67 +++++++++++++++++
 sysdeps/unix/sysv/linux/spawni.c         | 26 +++----
 8 files changed, 213 insertions(+), 88 deletions(-)
 create mode 100644 include/clone_internal.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.h

diff --git a/include/clone_internal.h b/include/clone_internal.h
new file mode 100644
index 0000000000..4b23ef33ce
--- /dev/null
+++ b/include/clone_internal.h
@@ -0,0 +1,16 @@
+#ifndef _CLONE3_H
+#include_next <clone3.h>
+
+extern __typeof (clone3) __clone3;
+
+/* The internal wrapper of clone/clone2 and clone3.  If __clone3 returns
+   -1 with ENOSYS, fall back to clone or clone2.  */
+extern int __clone_internal (struct clone_args *__cl_args,
+			     int (*__func) (void *__arg), void *__arg);
+
+#ifndef _ISOMAC
+libc_hidden_proto (__clone3)
+libc_hidden_proto (__clone_internal)
+#endif
+
+#endif
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 9be6c42894..cfe37a3443 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -33,47 +33,6 @@
 #include <kernel-features.h>
 #include <nptl-stack.h>
 
-#ifndef NEED_SEPARATE_REGISTER_STACK
-
-/* Most architectures have exactly one stack pointer.  Some have more.  */
-# define STACK_VARIABLES void *stackaddr = NULL
-
-/* How to pass the values to the 'create_thread' function.  */
-# define STACK_VARIABLES_ARGS stackaddr
-
-/* How to declare function which gets there parameters.  */
-# define STACK_VARIABLES_PARMS void *stackaddr
-
-/* How to declare allocate_stack.  */
-# define ALLOCATE_STACK_PARMS void **stack
-
-/* This is how the function is called.  We do it this way to allow
-   other variants of the function to have more parameters.  */
-# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr)
-
-#else
-
-/* We need two stacks.  The kernel will place them but we have to tell
-   the kernel about the size of the reserved address space.  */
-# define STACK_VARIABLES void *stackaddr = NULL; size_t stacksize = 0
-
-/* How to pass the values to the 'create_thread' function.  */
-# define STACK_VARIABLES_ARGS stackaddr, stacksize
-
-/* How to declare function which gets there parameters.  */
-# define STACK_VARIABLES_PARMS void *stackaddr, size_t stacksize
-
-/* How to declare allocate_stack.  */
-# define ALLOCATE_STACK_PARMS void **stack, size_t *stacksize
-
-/* This is how the function is called.  We do it this way to allow
-   other variants of the function to have more parameters.  */
-# define ALLOCATE_STACK(attr, pd) \
-  allocate_stack (attr, pd, &stackaddr, &stacksize)
-
-#endif
-
-
 /* Default alignment of stack.  */
 #ifndef STACK_ALIGN
 # define STACK_ALIGN __alignof__ (long double)
@@ -252,7 +211,7 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
    PDP must be non-NULL.  */
 static int
 allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
-		ALLOCATE_STACK_PARMS)
+		void **stack, size_t *stacksize)
 {
   struct pthread *pd;
   size_t size;
@@ -603,25 +562,17 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
   /* We place the thread descriptor at the end of the stack.  */
   *pdp = pd;
 
-#if _STACK_GROWS_DOWN
   void *stacktop;
 
-# if TLS_TCB_AT_TP
+#if TLS_TCB_AT_TP
   /* The stack begins before the TCB and the static TLS block.  */
   stacktop = ((char *) (pd + 1) - tls_static_size_for_stack);
-# elif TLS_DTV_AT_TP
+#elif TLS_DTV_AT_TP
   stacktop = (char *) (pd - 1);
-# endif
+#endif
 
-# ifdef NEED_SEPARATE_REGISTER_STACK
+  *stacksize = stacktop - pd->stackblock;
   *stack = pd->stackblock;
-  *stacksize = stacktop - *stack;
-# else
-  *stack = stacktop;
-# endif
-#else
-  *stack = pd->stackblock;
-#endif
 
   return 0;
 }
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 440adc2a6f..d8ec299cb1 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -36,6 +36,7 @@
 #include "libioP.h"
 #include <sys/single_threaded.h>
 #include <version.h>
+#include <clone_internal.h>
 
 #include <shlib-compat.h>
 
@@ -227,8 +228,8 @@ late_init (void)
 static int _Noreturn start_thread (void *arg);
 
 static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
-			  bool *stopped_start, STACK_VARIABLES_PARMS,
-			  bool *thread_ran)
+			  bool *stopped_start, void *stackaddr,
+			  size_t stacksize, bool *thread_ran)
 {
   /* Determine whether the newly created threads has to be started
      stopped since we have to set the scheduling parameters or set the
@@ -280,14 +281,18 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
 
   TLS_DEFINE_INIT_TP (tp, pd);
 
-#ifdef __NR_clone2
-# define ARCH_CLONE __clone2
-#else
-# define ARCH_CLONE __clone
-#endif
-  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
-				    clone_flags, pd, &pd->tid, tp, &pd->tid)
-			== -1))
+  struct clone_args args =
+    {
+      .flags = clone_flags,
+      .pidfd = (uintptr_t) &pd->tid,
+      .parent_tid = (uintptr_t) &pd->tid,
+      .child_tid = (uintptr_t) &pd->tid,
+      .stack = (uintptr_t) stackaddr,
+      .stack_size = stacksize,
+      .tls = (uintptr_t) tp,
+    };
+  int ret = __clone_internal (&args, &start_thread, pd);
+  if (__glibc_unlikely (ret == -1))
     return errno;
 
   /* It's started now, so if we fail below, we'll have to let it clean itself
@@ -576,7 +581,8 @@ int
 __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 		      void *(*start_routine) (void *), void *arg)
 {
-  STACK_VARIABLES;
+  void *stackaddr = NULL;
+  size_t stacksize = 0;
 
   /* Avoid a data race in the multi-threaded case, and call the
      deferred initialization only once.  */
@@ -600,7 +606,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
     }
 
   struct pthread *pd = NULL;
-  int err = ALLOCATE_STACK (iattr, &pd);
+  int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
   int retval = 0;
 
   if (__glibc_unlikely (err != 0))
@@ -744,8 +750,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 
       /* We always create the thread stopped at startup so we can
 	 notify the debugger.  */
-      retval = create_thread (pd, iattr, &stopped_start,
-			      STACK_VARIABLES_ARGS, &thread_ran);
+      retval = create_thread (pd, iattr, &stopped_start, stackaddr,
+			      stacksize, &thread_ran);
       if (retval == 0)
 	{
 	  /* We retain ownership of PD until (a) (see CONCURRENCY NOTES
@@ -776,8 +782,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	}
     }
   else
-    retval = create_thread (pd, iattr, &stopped_start,
-			    STACK_VARIABLES_ARGS, &thread_ran);
+    retval = create_thread (pd, iattr, &stopped_start, stackaddr,
+			    stacksize, &thread_ran);
 
   /* Return to the previous signal mask, after creating the new
      thread.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index feb8fd4ce1..ed0c0d27f4 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -65,7 +65,8 @@ sysdep_routines += adjtimex clone umount umount2 readahead sysctl \
 		   xstat fxstat lxstat xstat64 fxstat64 lxstat64 \
 		   fxstatat fxstatat64 \
 		   xmknod xmknodat convert_scm_timestamps \
-		   closefrom_fallback
+		   closefrom_fallback \
+		   clone3 clone-internal
 
 CFLAGS-gethostid.c = -fexceptions
 CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
new file mode 100644
index 0000000000..1e7a8f6b35
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -0,0 +1,91 @@
+/* The internal wrapper of clone and clone3.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <stddef.h>
+#include <errno.h>
+#include <sched.h>
+#include <clone_internal.h>
+#include <libc-pointer-arith.h>	/* For cast_to_pointer.  */
+#include <stackinfo.h>		/* For _STACK_GROWS_{UP,DOWN}.  */
+
+#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
+#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
+
+#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *)0)->MEMBER))
+#define offsetofend(TYPE, MEMBER) \
+  (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER))
+
+_Static_assert (__alignof (struct clone_args) == 8,
+		"__alignof (struct clone_args) != 8");
+_Static_assert (offsetofend (struct clone_args, tls) == CLONE_ARGS_SIZE_VER0,
+		"offsetofend (struct clone_args, tls) != CLONE_ARGS_SIZE_VER0");
+_Static_assert (offsetofend (struct clone_args, set_tid_size) == CLONE_ARGS_SIZE_VER1,
+		"offsetofend (struct clone_args, set_tid_size) != CLONE_ARGS_SIZE_VER1");
+_Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
+		"offsetofend (struct clone_args, cgroup) != CLONE_ARGS_SIZE_VER2");
+_Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
+		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
+
+int
+__clone_internal (struct clone_args *cl_args,
+		  int (*func) (void *arg), void *arg)
+{
+  int ret;
+#ifdef HAVE_CLONE3_WAPPER
+  /* Try clone3 first.  */
+  int saved_errno = errno;
+  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
+  if (ret != -1 || errno != ENOSYS)
+    return ret;
+
+  /* NB: Restore errno since errno may be checked against non-zero
+     return value.  */
+  __set_errno (saved_errno);
+#endif
+
+  /* Map clone3 arguments to clone arguments.  NB: No need to check
+     invalid clone3 specific bits in flags nor exit_signal since this
+     is an internal function.  */
+  int flags = cl_args->flags | cl_args->exit_signal;
+  void *stack = cast_to_pointer (cl_args->stack);
+
+#ifdef __ia64__
+  ret = __clone2 (func, stack, cl_args->stack_size,
+		  flags, arg,
+		  cast_to_pointer (cl_args->parent_tid),
+		  cast_to_pointer (cl_args->tls),
+		  cast_to_pointer (cl_args->child_tid));
+#else
+# if !_STACK_GROWS_DOWN && !_STACK_GROWS_UP
+#  error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
+# endif
+
+# if _STACK_GROWS_DOWN
+  stack += cl_args->stack_size;
+# endif
+  ret = __clone (func, stack, flags, arg,
+		 cast_to_pointer (cl_args->parent_tid),
+		 cast_to_pointer (cl_args->tls),
+		 cast_to_pointer (cl_args->child_tid));
+#endif
+  return ret;
+}
+
+libc_hidden_def (__clone_internal)
diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
new file mode 100644
index 0000000000..de963ef89d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone3.c
@@ -0,0 +1 @@
+/* An empty placeholder.  */
diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
new file mode 100644
index 0000000000..1e35ff6422
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone3.h
@@ -0,0 +1,67 @@
+/* The wrapper of clone3.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _CLONE3_H
+#define _CLONE3_H	1
+
+#include <features.h>
+#include <stddef.h>
+#include <bits/types.h>
+
+__BEGIN_DECLS
+
+/* The unsigned 64-bit and 8-byte aligned integer type.  */
+typedef __U64_TYPE __aligned_uint64_t __attribute__ ((__aligned__ (8)));
+
+/* This struct should only be used in an argument to the clone3 system
+   call (along with its size argument).  It may be extended with new
+   fields in the future.  */
+
+struct clone_args
+{
+  /* Flags bit mask.  */
+  __aligned_uint64_t flags;
+  /* Where to store PID file descriptor (pid_t *).  */
+  __aligned_uint64_t pidfd;
+  /* Where to store child TID, in child's memory (pid_t *).  */
+  __aligned_uint64_t child_tid;
+  /* Where to store child TID, in parent's memory (int *). */
+  __aligned_uint64_t parent_tid;
+  /* Signal to deliver to parent on child termination */
+  __aligned_uint64_t exit_signal;
+  /* The lowest address of stack.  */
+  __aligned_uint64_t stack;
+  /* Size of stack.  */
+  __aligned_uint64_t stack_size;
+  /* Location of new TLS.  */
+  __aligned_uint64_t tls;
+  /* Pointer to a pid_t array (since Linux 5.5).  */
+  __aligned_uint64_t set_tid;
+  /* Number of elements in set_tid (since Linux 5.5). */
+  __aligned_uint64_t set_tid_size;
+  /* File descriptor for target cgroup of child (since Linux 5.7).  */
+  __aligned_uint64_t cgroup;
+};
+
+/* The wrapper of clone3.  */
+extern int clone3 (struct clone_args *__cl_args, size_t __size,
+		   int (*__func) (void *__arg), void *__arg);
+
+__END_DECLS
+
+#endif /* clone3.h */
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index f7e7353a05..6b0bade4d4 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -26,6 +26,7 @@
 #include <spawn_int.h>
 #include <sysdep.h>
 #include <sys/resource.h>
+#include <clone_internal.h>
 
 /* The Linux implementation of posix_spawn{p} uses the clone syscall directly
    with CLONE_VM and CLONE_VFORK flags and an allocated stack.  The new stack
@@ -53,21 +54,6 @@
    normal program exit with the exit code 127.  */
 #define SPAWN_ERROR	127
 
-#ifdef __ia64__
-# define CLONE(__fn, __stackbase, __stacksize, __flags, __args) \
-  __clone2 (__fn, __stackbase, __stacksize, __flags, __args, 0, 0, 0)
-#else
-# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
-  __clone (__fn, __stack, __flags, __args)
-#endif
-
-/* Since ia64 wants the stackbase w/clone2, re-use the grows-up macro.  */
-#if _STACK_GROWS_UP || defined (__ia64__)
-# define STACK(__stack, __stack_size) (__stack)
-#elif _STACK_GROWS_DOWN
-# define STACK(__stack, __stack_size) (__stack + __stack_size)
-#endif
-
 
 struct posix_spawn_args
 {
@@ -382,8 +368,14 @@ __spawnix (pid_t * pid, const char *file,
      need for CLONE_SETTLS.  Although parent and child share the same TLS
      namespace, there will be no concurrent access for TLS variables (errno
      for instance).  */
-  new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
-		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
+  struct clone_args clone_args =
+    {
+      .flags = CLONE_VM | CLONE_VFORK,
+      .exit_signal = SIGCHLD,
+      .stack = (uintptr_t) stack,
+      .stack_size = stack_size,
+    };
+  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
 
   /* It needs to collect the case where the auxiliary process was created
      but failed to execute the file (due either any preparation step or
-- 
2.31.1
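
The _Static_assert block in clone-internal.c above pins the three
published sizes of struct clone_args (64, 80 and 88 bytes) to the end
offsets of the tls, set_tid_size and cgroup fields.  A minimal
standalone sketch of the same check follows.  The demo_* and DEMO_*
names are hypothetical and not part of the patch; the struct is a local
mirror of the layout shown in clone3.h.

```c
#include <assert.h>   /* static_assert (C11).  */
#include <stddef.h>   /* offsetof.  */
#include <stdint.h>

/* Hypothetical mirror of the layout in the patch: every field is an
   unsigned 64-bit integer forced to 8-byte alignment.  */
typedef uint64_t demo_u64 __attribute__ ((__aligned__ (8)));

struct demo_clone_args
{
  demo_u64 flags;
  demo_u64 pidfd;
  demo_u64 child_tid;
  demo_u64 parent_tid;
  demo_u64 exit_signal;
  demo_u64 stack;
  demo_u64 stack_size;
  demo_u64 tls;           /* Last field of the 64-byte layout.  */
  demo_u64 set_tid;
  demo_u64 set_tid_size;  /* Last field of the 80-byte layout.  */
  demo_u64 cgroup;        /* Last field of the 88-byte layout.  */
};

/* Offset of the first byte past MEMBER inside TYPE.  */
#define DEMO_OFFSETOFEND(TYPE, MEMBER) \
  (offsetof (TYPE, MEMBER) + sizeof (((TYPE *) 0)->MEMBER))

static_assert (DEMO_OFFSETOFEND (struct demo_clone_args, tls) == 64,
               "first published layout must end after tls");
static_assert (DEMO_OFFSETOFEND (struct demo_clone_args, set_tid_size) == 80,
               "second published layout must end after set_tid_size");
static_assert (DEMO_OFFSETOFEND (struct demo_clone_args, cgroup) == 88,
               "third published layout must end after cgroup");
static_assert (sizeof (struct demo_clone_args) == 88,
               "no padding expected after the last field");
```

Checking the end offset of the last field of each layout, rather than
only sizeof, catches a field that is inserted in the middle as well as
one appended at the end.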



* [PATCH v9] Add static tests for __clone_internal
  2021-07-13 19:32   ` Adhemerval Zanella
@ 2021-07-13 21:12     ` H.J. Lu
  2021-07-14 13:18       ` Adhemerval Zanella
  0 siblings, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2021-07-13 21:12 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GNU C Library, Florian Weimer, Noah Goldstein

On Tue, Jul 13, 2021 at 12:33 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
> They are quite similar to the non '-internal' tests.  Would it be better to
> include those and reimplement only the bits that differ (the calls to
> __clone_internal ()) instead of replicating all the tests?

I'd like to test __clone_internal separately and leave the existing clone
tests alone.

> On 01/06/2021 11:55, H.J. Lu wrote:
> > ---
> >  sysdeps/unix/sysv/linux/Makefile              |   9 ++
> >  .../sysv/linux/tst-align-clone-internal.c     |  87 +++++++++++
> >  sysdeps/unix/sysv/linux/tst-clone2-internal.c | 137 ++++++++++++++++++
> >  sysdeps/unix/sysv/linux/tst-clone3-internal.c |  99 +++++++++++++
> >  .../unix/sysv/linux/tst-getpid1-internal.c    | 133 +++++++++++++++++
> >  .../sysv/linux/tst-misalign-clone-internal.c  |  86 +++++++++++
> >  6 files changed, 551 insertions(+)
> >  create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> >  create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
> >  create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
> >  create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c
> >  create mode 100644 sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> >
> > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> > index 9469868bce..214b912921 100644
> > --- a/sysdeps/unix/sysv/linux/Makefile
> > +++ b/sysdeps/unix/sysv/linux/Makefile
> > @@ -118,6 +118,15 @@ endif
> >
> >  tests-internal += tst-sigcontext-get_pc
> >
> > +tests-clone-internal = \
> > +  tst-align-clone-internal \
> > +  tst-clone2-internal \
> > +  tst-clone3-internal \
> > +  tst-getpid1-internal \
> > +  tst-misalign-clone-internal
> > +tests-internal += $(tests-clone-internal)
> > +tests-static += $(tests-clone-internal)
> > +
> >  CFLAGS-tst-sigcontext-get_pc.c = -fasynchronous-unwind-tables
> >
> >  # Generate the list of SYS_* macros for the system calls (__NR_*
>
> Ok.
>
> > diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> > new file mode 100644
> > index 0000000000..6c3631f3db
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> > @@ -0,0 +1,87 @@
> > +/* Verify that the clone child stack is properly aligned.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sched.h>
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <sys/wait.h>
> > +#include <unistd.h>
> > +#include <tst-stack-align.h>
> > +#include <clone_internal.h>
> > +#include <support/xunistd.h>
> > +
> > +static int
> > +f (void *arg)
> > +{
> > +  bool ok = true;
> > +
> > +  puts ("in f");
> > +
> > +  if (TEST_STACK_ALIGN ())
> > +    ok = false;
> > +
> > +  return ok ? 0 : 1;
> > +}
>
> Maybe:
>
>   static int
>   f (void *arg)
>   {
>     return TEST_STACK_ALIGN () ? 1 : 0;
>   }

Fixed.

> > +
> > +static int
> > +do_test (void)
> > +{
> > +  bool ok = true;
> > +
> > +  puts ("in main");
> > +
> > +  if (TEST_STACK_ALIGN ())
> > +    ok = false;
> > +
>
> Maybe
>
>   ok = !TEST_STACK_ALIGN ();
>
> But I think this does not really add much, so I think it would be better
> to:
>
>   if (TEST_STACK_ALIGN ())
>     FAIL_EXIT1 ("stack alignment failed");

Fixed.

> > +#ifdef __ia64__
> > +# define STACK_SIZE 256 * 1024
> > +#else
> > +# define STACK_SIZE 128 * 1024
> > +#endif
> > +  char st[STACK_SIZE] __attribute__ ((aligned));
> > +  struct clone_args clone_args =
> > +    {
> > +      .stack = (uintptr_t) st,
> > +      .stack_size = sizeof (st),
> > +    };
> > +  pid_t p = __clone_internal (&clone_args, f, 0);
> > +  if (p == -1)
> > +    {
> > +      printf("clone failed: %m\n");
> > +      return 1;
> > +    }
>
> Use TEST_VERIFY here:
>
>   TEST_VERIFY (p != -1);

Fixed.

> > +
> > +  int e;
> > +  xwaitpid (p, &e, __WCLONE);
> > +  if (!WIFEXITED (e))
> > +    {
> > +      if (WIFSIGNALED (e))
> > +     printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> > +      else
> > +     puts ("did not terminate correctly");
> > +      return 1;
> > +    }
> > +  if (WEXITSTATUS (e) != 0)
> > +    ok = false;
>
>     TEST_VERIFY (WIFEXITED (status));
>     TEST_COMPARE (WEXITSTATUS (status), 0);

Fixed.
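
The status checks suggested above first confirm that the child exited
normally and only then compare its exit code.  A minimal sketch of that
order of operations, using plain POSIX calls instead of the glibc test
support macros; demo_child_exit_status is a hypothetical helper, and it
uses fork rather than __clone_internal so that it builds outside the
glibc tree.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with CODE, wait for it
   and return the decoded exit code.  Return -1 if the child could not
   be created, could not be waited for, or did not exit normally.  */
static int
demo_child_exit_status (int code)
{
  /* Only the low 8 bits of an exit code reach the parent.  */
  assert (code >= 0 && code <= 255);

  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (code);

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;

  /* WEXITSTATUS is only meaningful when WIFEXITED is true.  */
  if (!WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```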

> > +
> > +  return ok ? 0 : 1;
> > +}
> > +
> > +#include <support/test-driver.c>
> > diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> > new file mode 100644
> > index 0000000000..b8917fe713
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> > @@ -0,0 +1,137 @@
> > +/* Test if CLONE_VM does not change pthread pid/tid field (BZ #19957)
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sched.h>
> > +#include <signal.h>
> > +#include <string.h>
> > +#include <stdio.h>
> > +#include <fcntl.h>
> > +#include <unistd.h>
> > +#include <stddef.h>
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +#include <errno.h>
> > +#include <sys/types.h>
> > +#include <sys/wait.h>
> > +#include <sys/syscall.h>
> > +#include <clone_internal.h>
> > +#include <support/xunistd.h>
> > +#include <support/check.h>
> > +
> > +static int sig;
> > +static int pipefd[2];
> > +
> > +static int
> > +f (void *a)
> > +{
> > +  close (pipefd[0]);
> > +
> > +  pid_t ppid = getppid ();
> > +  pid_t pid = getpid ();
> > +  pid_t tid = gettid ();
> > +
> > +  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
> > +    FAIL_EXIT1 ("write ppid failed\n");
> > +  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
> > +    FAIL_EXIT1 ("write pid failed\n");
> > +  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
> > +    FAIL_EXIT1 ("write tid failed\n");
> > +
> > +  return 0;
> > +}
> > +
> > +
> > +static int
> > +do_test (void)
> > +{
> > +  sig = SIGRTMIN;
> > +  sigset_t ss;
> > +  sigemptyset (&ss);
> > +  sigaddset (&ss, sig);
> > +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> > +    FAIL_EXIT1 ("sigprocmask failed: %m");
> > +
> > +  if (pipe2 (pipefd, O_CLOEXEC))
> > +    FAIL_EXIT1 ("pipe failed: %m");
> > +
> > +#ifdef __ia64__
> > +# define STACK_SIZE 256 * 1024
> > +#else
> > +# define STACK_SIZE 128 * 1024
> > +#endif
> > +  char st[STACK_SIZE] __attribute__ ((aligned));
> > +  struct clone_args clone_args =
> > +    {
> > +      .stack = (uintptr_t) st,
> > +      .stack_size = sizeof (st),
> > +    };
> > +  pid_t p = __clone_internal (&clone_args, f, 0);
> > +
> > +  close (pipefd[1]);
> > +
> > +  if (p == -1)
> > +    FAIL_EXIT1("clone failed: %m");
> > +
> > +  pid_t ppid, pid, tid;
> > +  if (read (pipefd[0], &ppid, sizeof ppid) != sizeof ppid)
> > +    {
> > +      kill (p, SIGKILL);
> > +      FAIL_EXIT1 ("read ppid failed: %m");
> > +    }
> > +  if (read (pipefd[0], &pid, sizeof pid) != sizeof pid)
> > +    {
> > +      kill (p, SIGKILL);
> > +      FAIL_EXIT1 ("read pid failed: %m");
> > +    }
> > +  if (read (pipefd[0], &tid, sizeof tid) != sizeof tid)
> > +    {
> > +      kill (p, SIGKILL);
> > +      FAIL_EXIT1 ("read tid failed: %m");
> > +    }
> > +
> > +  close (pipefd[0]);
> > +
> > +  int ret = 0;
> > +
> > +  pid_t own_pid = getpid ();
> > +  pid_t own_tid = syscall (__NR_gettid);
> > +
> > +  /* Some sanity checks for clone syscall: returned ppid should be current
> > +     pid and both returned tid/pid should be different from current one.  */
> > +  if ((ppid != own_pid) || (pid == own_pid) || (tid == own_tid))
> > +    FAIL_RET ("ppid=%i pid=%i tid=%i | own_pid=%i own_tid=%i",
> > +           (int)ppid, (int)pid, (int)tid, (int)own_pid, (int)own_tid);
> > +
> > +  int e;
> > +  xwaitpid (p, &e, __WCLONE);
> > +  if (!WIFEXITED (e))
> > +    {
> > +      if (WIFSIGNALED (e))
> > +     printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> > +      else
> > +     puts ("did not terminate correctly");
> > +      exit (EXIT_FAILURE);
> > +    }
> > +  if (WEXITSTATUS (e) != 0)
> > +    FAIL_EXIT1 ("exit code %d", WEXITSTATUS (e));
>
>     TEST_VERIFY (WIFEXITED (status));
>     TEST_COMPARE (WEXITSTATUS (status), 0);

Fixed.

> > +
> > +  return ret;
> > +}
> > +
> > +#include <support/test-driver.c>
> > diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> > new file mode 100644
> > index 0000000000..2bdbc571e6
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> > @@ -0,0 +1,99 @@
> > +/* Check if clone (CLONE_THREAD) does not call exit_group (BZ #21512)
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <string.h>
> > +#include <sched.h>
> > +#include <signal.h>
> > +#include <unistd.h>
> > +#include <errno.h>
> > +#include <sys/syscall.h>
> > +#include <sys/wait.h>
> > +#include <sys/types.h>
> > +#include <linux/futex.h>
> > +#include <support/check.h>
> > +#include <stdatomic.h>
> > +#include <clone_internal.h>
> > +
> > +/* Test if clone call with CLONE_THREAD does not call exit_group.  The 'f'
> > +   function returns '1', which will be used by clone thread to call the
> > +   'exit' syscall directly.  If _exit is used instead, exit_group will be
> > +   used and thus the thread group will finish with return value of '1'
> > +   (where '2' from main thread is expected).  */
> > +
> > +static int
> > +f (void *a)
> > +{
> > +  return 1;
> > +}
> > +
> > +/* Futex wait for TID argument, similar to pthread_join internal
> > +   implementation.  */
> > +#define wait_tid(ctid_ptr, ctid_val)                                 \
> > +  do {                                                                       \
> > +    __typeof (*(ctid_ptr)) __tid;                                    \
> > +    /* We need acquire MO here so that we synchronize with the               \
> > +       kernel's store to 0 when the clone terminates.  */            \
> > +    while ((__tid = atomic_load_explicit (ctid_ptr,                  \
> > +                                       memory_order_acquire)) != 0)  \
> > +      futex_wait (ctid_ptr, ctid_val);                                       \
> > +  } while (0)
> > +
> > +static inline int
> > +futex_wait (int *futexp, int val)
> > +{
> > +#ifdef __NR_futex
> > +  return syscall (__NR_futex, futexp, FUTEX_WAIT, val);
> > +#else
> > +  return syscall (__NR_futex_time64, futexp, FUTEX_WAIT, val);
> > +#endif
> > +}
> > +
> > +static int
> > +do_test (void)
> > +{
> > +  char st[1024] __attribute__ ((aligned));
> > +  int clone_flags = CLONE_THREAD;
> > +  /* Minimum required flags to be used along with CLONE_THREAD.  */
> > +  clone_flags |= CLONE_VM | CLONE_SIGHAND;
> > +  /* We will use ctid to call on futex to wait for thread exit.  */
> > +  clone_flags |= CLONE_CHILD_CLEARTID;
> > +  /* Initialize with a known value.  ctid is set to zero by the kernel after the
> > +     cloned thread has exited.  */
> > +#define CTID_INIT_VAL 1
> > +  pid_t ctid = CTID_INIT_VAL;
> > +  pid_t tid;
> > +
> > +  struct clone_args clone_args =
> > +    {
> > +      .flags = clone_flags & ~CSIGNAL,
> > +      .exit_signal = clone_flags & CSIGNAL,
> > +      .stack = (uintptr_t) st,
> > +      .stack_size = sizeof (st),
> > +      .child_tid = (uintptr_t) &ctid,
> > +    };
> > +  tid = __clone_internal (&clone_args, f, NULL);
> > +  if (tid == -1)
> > +    FAIL_EXIT1 ("clone failed: %m");
> > +
> > +  wait_tid (&ctid, CTID_INIT_VAL);
> > +
> > +  return 2;
> > +}
> > +
> > +#define EXPECTED_STATUS 2
> > +#include <support/test-driver.c>
> > diff --git a/sysdeps/unix/sysv/linux/tst-getpid1-internal.c b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
> > new file mode 100644
> > index 0000000000..ee69e52401
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
> > @@ -0,0 +1,133 @@
> > +/* Verify that the parent pid is unchanged by __clone_internal.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sched.h>
> > +#include <signal.h>
> > +#include <string.h>
> > +#include <stdio.h>
> > +#include <unistd.h>
> > +#include <sys/types.h>
> > +#include <sys/wait.h>
> > +#include <clone_internal.h>
> > +#include <support/xunistd.h>
> > +
> > +#ifndef TEST_CLONE_FLAGS
> > +#define TEST_CLONE_FLAGS 0
> > +#endif
> > +
> > +static int sig;
> > +
> > +static int
> > +f (void *a)
> > +{
> > +  puts ("in f");
> > +  union sigval sival;
> > +  sival.sival_int = getpid ();
> > +  printf ("pid = %d\n", sival.sival_int);
> > +  if (sigqueue (getppid (), sig, sival) != 0)
> > +    return 1;
> > +  return 0;
> > +}
> > +
> > +
> > +static int
> > +do_test (void)
> > +{
> > +  int mypid = getpid ();
> > +
> > +  sig = SIGRTMIN;
> > +  sigset_t ss;
> > +  sigemptyset (&ss);
> > +  sigaddset (&ss, sig);
> > +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> > +    {
> > +      printf ("sigprocmask failed: %m\n");
> > +      return 1;
> > +    }
> > +
> > +#ifdef __ia64__
> > +# define STACK_SIZE 256 * 1024
> > +#else
> > +# define STACK_SIZE 128 * 1024
> > +#endif
> > +  char st[STACK_SIZE] __attribute__ ((aligned));
> > +  struct clone_args clone_args =
> > +    {
> > +      .flags = TEST_CLONE_FLAGS & ~CSIGNAL,
> > +      .exit_signal = TEST_CLONE_FLAGS & CSIGNAL,
> > +      .stack = (uintptr_t) st,
> > +      .stack_size = sizeof (st),
> > +    };
> > +  pid_t p = __clone_internal (&clone_args, f, 0);
> > +  if (p == -1)
> > +    {
> > +      printf("clone failed: %m\n");
> > +      return 1;
> > +    }
> > +  printf ("new thread: %d\n", (int) p);
> > +
> > +  siginfo_t si;
> > +  do
> > +    if (sigwaitinfo (&ss, &si) < 0)
> > +      {
> > +     printf("sigwaitinfo failed: %m\n");
> > +     kill (p, SIGKILL);
> > +     return 1;
> > +      }
> > +  while  (si.si_signo != sig || si.si_code != SI_QUEUE);
> > +
> > +  int e;
> > +  xwaitpid (p, &e, __WCLONE);
> > +  if (!WIFEXITED (e))
> > +    {
> > +      if (WIFSIGNALED (e))
> > +     printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> > +      else
> > +     puts ("did not terminate correctly");
> > +      return 1;
> > +    }
> > +  if (WEXITSTATUS (e) != 0)
> > +    {
> > +      printf ("exit code %d\n", WEXITSTATUS (e));
> > +      return 1;
> > +    }
> > +
> > +  if (si.si_int != (int) p)
> > +    {
> > +      printf ("expected PID %d, got si_int %d\n", (int) p, si.si_int);
> > +      kill (p, SIGKILL);
> > +      return 1;
> > +    }
> > +
> > +  if (si.si_pid != p)
> > +    {
> > +      printf ("expected PID %d, got si_pid %d\n", (int) p, (int) si.si_pid);
> > +      kill (p, SIGKILL);
> > +      return 1;
> > +    }
> > +
> > +  if (getpid () != mypid)
> > +    {
> > +      puts ("my PID changed");
> > +      return 1;
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +#include <support/test-driver.c>
>
> Ok.
>
> > diff --git a/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> > new file mode 100644
> > index 0000000000..6df5fd2cbc
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> > @@ -0,0 +1,86 @@
> > +/* Verify that __clone_internal properly aligns the child stack.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sched.h>
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <sys/wait.h>
> > +#include <unistd.h>
> > +#include <libc-pointer-arith.h>
> > +#include <tst-stack-align.h>
> > +#include <clone_internal.h>
> > +#include <support/xunistd.h>
> > +#include <support/check.h>
> > +
> > +static int
> > +check_stack_alignment (void *arg)
> > +{
> > +  bool ok = true;
> > +
> > +  puts ("in f");
> > +
> > +  if (TEST_STACK_ALIGN ())
> > +    ok = false;
> > +
> > +  return ok ? 0 : 1;
> > +}

I made a similar change here.

> > +static int
> > +do_test (void)
> > +{
> > +  puts ("in do_test");
> > +
> > +  if (TEST_STACK_ALIGN ())
> > +    FAIL_EXIT1 ("stack isn't aligned\n");
> > +
> > +#ifdef __ia64__
> > +# define STACK_SIZE (256 * 1024)
> > +#else
> > +# define STACK_SIZE (128 * 1024)
> > +#endif
> > +  char st[STACK_SIZE + 1];
> > +  /* NB: Align child stack to 1 byte.  */
> > +  char *stack = PTR_ALIGN_UP (&st[0], 2) + 1;
> > +  struct clone_args clone_args =
> > +    {
> > +      .stack = (uintptr_t) stack,
> > +      .stack_size = STACK_SIZE,
> > +    };
> > +  pid_t p = __clone_internal (&clone_args, check_stack_alignment, 0);
> > +
> > +  /* Clone must not fail.  */
> > +  TEST_VERIFY_EXIT (p != -1);
> > +
> > +  int e;
> > +  xwaitpid (p, &e, __WCLONE);
> > +  if (!WIFEXITED (e))
> > +    {
> > +      if (WIFSIGNALED (e))
> > +     printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> > +     FAIL_EXIT1 ("process did not terminate correctly");
> > +    }
> > +
> > +  if (WEXITSTATUS (e) != 0)
> > +    FAIL_EXIT1 ("exit code %d", WEXITSTATUS (e));

Likewise.
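
The misalignment test quoted above builds an odd stack address on
purpose with PTR_ALIGN_UP (&st[0], 2) + 1, and the checks treat a
nonzero TEST_STACK_ALIGN () result as failure.  A minimal sketch of the
same two ideas in portable C; demo_align_up and demo_misaligned are
hypothetical stand-ins, not the glibc macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if P is not aligned to ALIGN bytes and 0 if it is, so that a
   nonzero result means failure.  ALIGN must be a power of two.  */
static int
demo_misaligned (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) != 0;
}

/* Round P up to the next multiple of ALIGN (a power of two).  */
static char *
demo_align_up (char *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  uintptr_t v = ((uintptr_t) p + align - 1) & ~(uintptr_t) (align - 1);
  return (char *) v;
}
```

Aligning up to 2 and then adding 1 always yields an odd address, which
is why the test allocates one extra byte for its stack buffer.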

> > +  return 0;
> > +}
> > +
> > +#include <support/test-driver.c>
> >

Here is the v9 patch.  OK for master?

Thanks.

-- 
H.J.


* Re: [PATCH v9] Add an internal wrapper for clone, clone2 and clone3
  2021-07-13 19:49     ` [PATCH v9] " H.J. Lu
@ 2021-07-14 13:17       ` Adhemerval Zanella
  0 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-07-14 13:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library, Florian Weimer, Noah Goldstein



On 13/07/2021 16:49, H.J. Lu wrote:
> Here is the v9 patch.
> 
> Thanks.
> 


> From ead31bdf83b0a73273399b17885f3c81c5ad3b83 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.tools@gmail.com>
> Date: Sat, 13 Feb 2021 11:47:46 -0800
> Subject: [PATCH v9] Add an internal wrapper for clone, clone2 and clone3
> 
> The clone3 system call (since Linux 5.3) provides a superset of the
> functionality of clone and clone2.  It also provides a number of API
> improvements, including the ability to specify the size of the child's
> stack area which can be used by kernel to compute the shadow stack size
> when allocating the shadow stack.  Add:
> 
> extern int __clone_internal (struct clone_args *__cl_args,
> 			     int (*__func) (void *__arg), void *__arg);
> 
> to provide an abstract interface for clone, clone2 and clone3.
> 
> 1. Simplify stack management for thread creation by passing both stack
> base and size to create_thread.
> 2. Consolidate clone vs clone2 differences into a single file.
> 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
> -1 with ENOSYS, fall back to clone or clone2.
> 4. Use only __clone_internal to clone a thread.  Since the stack size
> argument for create_thread is now unconditional, always pass stack size
> to create_thread.
> 5. Enable the public clone3 wrapper in the future after it has been
> added to all targets.
> 
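
Items 1 and 2 above come down to a small argument translation: the
clone3-style block keeps the exit signal apart from the flags and
describes the stack by lowest address plus size, while the legacy call
takes one combined flags word and, where the stack grows down, the top
of the stack.  A minimal sketch under those assumptions; struct
demo_args, DEMO_CSIGNAL and the demo_* functions are hypothetical, and
DEMO_CSIGNAL only mirrors the value of CSIGNAL from <sched.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the clone3 argument block.  */
struct demo_args
{
  uint64_t flags;
  uint64_t exit_signal;
  uint64_t stack;       /* Lowest address of the stack area.  */
  uint64_t stack_size;
};

/* The low byte of the legacy flags word carries the exit signal.  */
#define DEMO_CSIGNAL 0xffu

/* Split a legacy flags word into the separate clone3-style fields.  */
static struct demo_args
demo_from_legacy (unsigned int legacy_flags, void *stack, size_t stack_size)
{
  assert (stack != NULL);
  struct demo_args args =
    {
      .flags = legacy_flags & ~DEMO_CSIGNAL,
      .exit_signal = legacy_flags & DEMO_CSIGNAL,
      .stack = (uintptr_t) stack,
      .stack_size = stack_size,
    };
  return args;
}

/* Recombine flags and exit signal for the legacy call.  */
static unsigned int
demo_legacy_flags (const struct demo_args *args)
{
  return (unsigned int) (args->flags | args->exit_signal);
}

/* Stack pointer to hand to the legacy call: the top of the area when
   the stack grows down, the lowest address otherwise.  */
static void *
demo_legacy_stack (const struct demo_args *args, int stack_grows_down)
{
  uintptr_t sp = (uintptr_t) args->stack;
  if (stack_grows_down)
    sp += (uintptr_t) args->stack_size;
  return (void *) sp;
}
```

Keeping the base-plus-size form as the single internal representation
means callers no longer need their own per-architecture stack macros.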
> NB: Sandbox will return ENOSYS on clone3 in both Chromium:
> 
> The following revision refers to this bug:
>   https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b
> 
> commit 218438259dd795456f0a48f67cbe5b4e520db88b
> Author: Matthew Denton <mpdenton@chromium.org>
> Date: Thu Jun 03 20:06:13 2021
> 
> Linux sandbox: return ENOSYS for clone3
> 
> Because clone3 uses a pointer argument rather than a flags argument, we
> cannot examine the contents with seccomp, which is essential to
> preventing sandboxed processes from starting other processes. So, we
> won't be able to support clone3 in Chromium. This CL modifies the
> BPF policy to return ENOSYS for clone3 so glibc always uses the fallback
> to clone.
> 
> Bug: 1213452
> Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
> Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
> Reviewed-by: Robert Sesek <rsesek@chromium.org>
> Commit-Queue: Matthew Denton <mpdenton@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#888980}
> 
> [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc
> 
> and Firefox:
> 
> https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76
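
The sandbox changes above matter because the fallback triggers only on
ENOSYS; any other error from the preferred call is returned to the
caller unchanged.  A minimal sketch of that control flow with stub
backends in place of the real system calls.  All demo_* names are
hypothetical, and the stubs only simulate the return values and errno
settings.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*demo_backend) (void);

/* Stub: preferred call rejected as "not implemented".  */
static int
demo_new_enosys (void)
{
  errno = ENOSYS;
  return -1;
}

/* Stub: preferred call rejected with a different error.  */
static int
demo_new_eperm (void)
{
  errno = EPERM;
  return -1;
}

/* Stub: preferred call succeeds.  */
static int
demo_new_ok (void)
{
  return 100;
}

/* Stub: legacy call succeeds.  */
static int
demo_legacy_ok (void)
{
  return 200;
}

/* Try PREFERRED first and use LEGACY only if PREFERRED fails with
   ENOSYS.  errno is restored before the fallback so the caller never
   sees the intermediate ENOSYS.  */
static int
demo_call_with_fallback (demo_backend preferred, demo_backend legacy)
{
  assert (preferred != NULL && legacy != NULL);

  int saved_errno = errno;
  int ret = preferred ();
  if (ret != -1 || errno != ENOSYS)
    return ret;

  errno = saved_errno;
  return legacy ();
}
```

With a filter that answers EPERM instead of ENOSYS, this logic reports
the failure directly and never reaches the legacy path, which is the
case the ENOSYS policy avoids.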
> ---
>  include/clone_internal.h                 | 16 +++++
>  nptl/allocatestack.c                     | 59 ++-------------
>  nptl/pthread_create.c                    | 38 +++++-----
>  sysdeps/unix/sysv/linux/Makefile         |  3 +-
>  sysdeps/unix/sysv/linux/clone-internal.c | 91 ++++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/clone3.c         |  1 +
>  sysdeps/unix/sysv/linux/clone3.h         | 67 +++++++++++++++++
>  sysdeps/unix/sysv/linux/spawni.c         | 26 +++----
>  8 files changed, 213 insertions(+), 88 deletions(-)
>  create mode 100644 include/clone_internal.h
>  create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/clone3.c
>  create mode 100644 sysdeps/unix/sysv/linux/clone3.h
> 
> diff --git a/include/clone_internal.h b/include/clone_internal.h
> new file mode 100644
> index 0000000000..4b23ef33ce
> --- /dev/null
> +++ b/include/clone_internal.h
> @@ -0,0 +1,16 @@
> +#ifndef _CLONE3_H
> +#include_next <clone3.h>
> +
> +extern __typeof (clone3) __clone3;
> +
> +/* The internal wrapper of clone/clone2 and clone3.  If __clone3 returns
> +   -1 with ENOSYS, fall back to clone or clone2.  */
> +extern int __clone_internal (struct clone_args *__cl_args,
> +			     int (*__func) (void *__arg), void *__arg);
> +
> +#ifndef _ISOMAC
> +libc_hidden_proto (__clone3)
> +libc_hidden_proto (__clone_internal)
> +#endif
> +
> +#endif
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 9be6c42894..cfe37a3443 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -33,47 +33,6 @@
>  #include <kernel-features.h>
>  #include <nptl-stack.h>
>  
> -#ifndef NEED_SEPARATE_REGISTER_STACK
> -
> -/* Most architectures have exactly one stack pointer.  Some have more.  */
> -# define STACK_VARIABLES void *stackaddr = NULL
> -
> -/* How to pass the values to the 'create_thread' function.  */
> -# define STACK_VARIABLES_ARGS stackaddr
> -
> -/* How to declare function which gets there parameters.  */
> -# define STACK_VARIABLES_PARMS void *stackaddr
> -
> -/* How to declare allocate_stack.  */
> -# define ALLOCATE_STACK_PARMS void **stack
> -
> -/* This is how the function is called.  We do it this way to allow
> -   other variants of the function to have more parameters.  */
> -# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr)
> -
> -#else
> -
> -/* We need two stacks.  The kernel will place them but we have to tell
> -   the kernel about the size of the reserved address space.  */
> -# define STACK_VARIABLES void *stackaddr = NULL; size_t stacksize = 0
> -
> -/* How to pass the values to the 'create_thread' function.  */
> -# define STACK_VARIABLES_ARGS stackaddr, stacksize
> -
> -/* How to declare function which gets there parameters.  */
> -# define STACK_VARIABLES_PARMS void *stackaddr, size_t stacksize
> -
> -/* How to declare allocate_stack.  */
> -# define ALLOCATE_STACK_PARMS void **stack, size_t *stacksize
> -
> -/* This is how the function is called.  We do it this way to allow
> -   other variants of the function to have more parameters.  */
> -# define ALLOCATE_STACK(attr, pd) \
> -  allocate_stack (attr, pd, &stackaddr, &stacksize)
> -
> -#endif
> -
> -
>  /* Default alignment of stack.  */
>  #ifndef STACK_ALIGN
>  # define STACK_ALIGN __alignof__ (long double)
> @@ -252,7 +211,7 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
>     PDP must be non-NULL.  */
>  static int
>  allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
> -		ALLOCATE_STACK_PARMS)
> +		void **stack, size_t *stacksize)
>  {
>    struct pthread *pd;
>    size_t size;
> @@ -603,25 +562,17 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>    /* We place the thread descriptor at the end of the stack.  */
>    *pdp = pd;
>  
> -#if _STACK_GROWS_DOWN
>    void *stacktop;
>  
> -# if TLS_TCB_AT_TP
> +#if TLS_TCB_AT_TP
>    /* The stack begins before the TCB and the static TLS block.  */
>    stacktop = ((char *) (pd + 1) - tls_static_size_for_stack);
> -# elif TLS_DTV_AT_TP
> +#elif TLS_DTV_AT_TP
>    stacktop = (char *) (pd - 1);
> -# endif
> +#endif
>  
> -# ifdef NEED_SEPARATE_REGISTER_STACK
> +  *stacksize = stacktop - pd->stackblock;
>    *stack = pd->stackblock;
> -  *stacksize = stacktop - *stack;
> -# else
> -  *stack = stacktop;
> -# endif
> -#else
> -  *stack = pd->stackblock;
> -#endif
>  
>    return 0;
>  }
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 440adc2a6f..d8ec299cb1 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -36,6 +36,7 @@
>  #include "libioP.h"
>  #include <sys/single_threaded.h>
>  #include <version.h>
> +#include <clone_internal.h>
>  
>  #include <shlib-compat.h>
>  
> @@ -227,8 +228,8 @@ late_init (void)
>  static int _Noreturn start_thread (void *arg);
>  
>  static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
> -			  bool *stopped_start, STACK_VARIABLES_PARMS,
> -			  bool *thread_ran)
> +			  bool *stopped_start, void *stackaddr,
> +			  size_t stacksize, bool *thread_ran)
>  {
>    /* Determine whether the newly created threads has to be started
>       stopped since we have to set the scheduling parameters or set the
> @@ -280,14 +281,18 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
>  
>    TLS_DEFINE_INIT_TP (tp, pd);
>  
> -#ifdef __NR_clone2
> -# define ARCH_CLONE __clone2
> -#else
> -# define ARCH_CLONE __clone
> -#endif
> -  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
> -				    clone_flags, pd, &pd->tid, tp, &pd->tid)
> -			== -1))
> +  struct clone_args args =
> +    {
> +      .flags = clone_flags,
> +      .pidfd = (uintptr_t) &pd->tid,
> +      .parent_tid = (uintptr_t) &pd->tid,
> +      .child_tid = (uintptr_t) &pd->tid,
> +      .stack = (uintptr_t) stackaddr,
> +      .stack_size = stacksize,
> +      .tls = (uintptr_t) tp,
> +    };
> +  int ret = __clone_internal (&args, &start_thread, pd);
> +  if (__glibc_unlikely (ret == -1))
>      return errno;
>  
>    /* It's started now, so if we fail below, we'll have to let it clean itself
> @@ -576,7 +581,8 @@ int
>  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  		      void *(*start_routine) (void *), void *arg)
>  {
> -  STACK_VARIABLES;
> +  void *stackaddr = NULL;
> +  size_t stacksize = 0;
>  
>    /* Avoid a data race in the multi-threaded case, and call the
>       deferred initialization only once.  */
> @@ -600,7 +606,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>      }
>  
>    struct pthread *pd = NULL;
> -  int err = ALLOCATE_STACK (iattr, &pd);
> +  int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
>    int retval = 0;
>  
>    if (__glibc_unlikely (err != 0))
> @@ -744,8 +750,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  
>        /* We always create the thread stopped at startup so we can
>  	 notify the debugger.  */
> -      retval = create_thread (pd, iattr, &stopped_start,
> -			      STACK_VARIABLES_ARGS, &thread_ran);
> +      retval = create_thread (pd, iattr, &stopped_start, stackaddr,
> +			      stacksize, &thread_ran);
>        if (retval == 0)
>  	{
>  	  /* We retain ownership of PD until (a) (see CONCURRENCY NOTES
> @@ -776,8 +782,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  	}
>      }
>    else
> -    retval = create_thread (pd, iattr, &stopped_start,
> -			    STACK_VARIABLES_ARGS, &thread_ran);
> +    retval = create_thread (pd, iattr, &stopped_start, stackaddr,
> +			    stacksize, &thread_ran);
>  
>    /* Return to the previous signal mask, after creating the new
>       thread.  */
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index feb8fd4ce1..ed0c0d27f4 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -65,7 +65,8 @@ sysdep_routines += adjtimex clone umount umount2 readahead sysctl \
>  		   xstat fxstat lxstat xstat64 fxstat64 lxstat64 \
>  		   fxstatat fxstatat64 \
>  		   xmknod xmknodat convert_scm_timestamps \
> -		   closefrom_fallback
> +		   closefrom_fallback \
> +		   clone3 clone-internal
>  
>  CFLAGS-gethostid.c = -fexceptions
>  CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> new file mode 100644
> index 0000000000..1e7a8f6b35
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> @@ -0,0 +1,91 @@
> +/* The internal wrapper of clone and clone3.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include <stddef.h>
> +#include <errno.h>
> +#include <sched.h>
> +#include <clone_internal.h>
> +#include <libc-pointer-arith.h>	/* For cast_to_pointer.  */
> +#include <stackinfo.h>		/* For _STACK_GROWS_{UP,DOWN}.  */
> +
> +#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> +#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
> +#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
> +
> +#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *)0)->MEMBER))
> +#define offsetofend(TYPE, MEMBER) \
> +  (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER))
> +
> +_Static_assert (__alignof (struct clone_args) == 8,
> +		"__alignof (struct clone_args) != 8");
> +_Static_assert (offsetofend (struct clone_args, tls) == CLONE_ARGS_SIZE_VER0,
> +		"offsetofend (struct clone_args, tls) != CLONE_ARGS_SIZE_VER0");
> +_Static_assert (offsetofend (struct clone_args, set_tid_size) == CLONE_ARGS_SIZE_VER1,
> +		"offsetofend (struct clone_args, set_tid_size) != CLONE_ARGS_SIZE_VER1");
> +_Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
> +		"offsetofend (struct clone_args, cgroup) != CLONE_ARGS_SIZE_VER2");
> +_Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2,
> +		"sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2");
> +
> +int
> +__clone_internal (struct clone_args *cl_args,
> +		  int (*func) (void *arg), void *arg)
> +{
> +  int ret;
> +#ifdef HAVE_CLONE3_WAPPER
> +  /* Try clone3 first.  */
> +  int saved_errno = errno;
> +  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> +  if (ret != -1 || errno != ENOSYS)
> +    return ret;
> +
> +  /* NB: Restore errno since errno may be checked against non-zero
> +     return value.  */
> +  __set_errno (saved_errno);
> +#endif
> +
> +  /* Map clone3 arguments to clone arguments.  NB: No need to check
> +     invalid clone3 specific bits in flags nor exit_signal since this
> +     is an internal function.  */
> +  int flags = cl_args->flags | cl_args->exit_signal;
> +  void *stack = cast_to_pointer (cl_args->stack);
> +
> +#ifdef __ia64__
> +  ret = __clone2 (func, stack, cl_args->stack_size,
> +		  flags, arg,
> +		  cast_to_pointer (cl_args->parent_tid),
> +		  cast_to_pointer (cl_args->tls),
> +		  cast_to_pointer (cl_args->child_tid));
> +#else
> +# if !_STACK_GROWS_DOWN && !_STACK_GROWS_UP
> +#  error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
> +# endif
> +
> +# if _STACK_GROWS_DOWN
> +  stack += cl_args->stack_size;
> +# endif
> +  ret = __clone (func, stack, flags, arg,
> +		 cast_to_pointer (cl_args->parent_tid),
> +		 cast_to_pointer (cl_args->tls),
> +		 cast_to_pointer (cl_args->child_tid));
> +#endif
> +  return ret;
> +}
> +
> +libc_hidden_def (__clone_internal)
> diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
> new file mode 100644
> index 0000000000..de963ef89d
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone3.c
> @@ -0,0 +1 @@
> +/* An empty placeholder.  */
> diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
> new file mode 100644
> index 0000000000..1e35ff6422
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone3.h
> @@ -0,0 +1,67 @@
> +/* The wrapper of clone3.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _CLONE3_H
> +#define _CLONE3_H	1
> +
> +#include <features.h>
> +#include <stddef.h>
> +#include <bits/types.h>
> +
> +__BEGIN_DECLS
> +
> +/* The unsigned 64-bit and 8-byte aligned integer type.  */
> +typedef __U64_TYPE __aligned_uint64_t __attribute__ ((__aligned__ (8)));
> +
> +/* This struct should only be used in an argument to the clone3 system
> +   call (along with its size argument).  It may be extended with new
> +   fields in the future.  */
> +
> +struct clone_args
> +{
> +  /* Flags bit mask.  */
> +  __aligned_uint64_t flags;
> +  /* Where to store PID file descriptor (pid_t *).  */
> +  __aligned_uint64_t pidfd;
> +  /* Where to store child TID, in child's memory (pid_t *).  */
> +  __aligned_uint64_t child_tid;
> +  /* Where to store child TID, in parent's memory (int *). */
> +  __aligned_uint64_t parent_tid;
> +  /* Signal to deliver to parent on child termination */
> +  __aligned_uint64_t exit_signal;
> +  /* The lowest address of stack.  */
> +  __aligned_uint64_t stack;
> +  /* Size of stack.  */
> +  __aligned_uint64_t stack_size;
> +  /* Location of new TLS.  */
> +  __aligned_uint64_t tls;
> +  /* Pointer to a pid_t array (since Linux 5.5).  */
> +  __aligned_uint64_t set_tid;
> +  /* Number of elements in set_tid (since Linux 5.5). */
> +  __aligned_uint64_t set_tid_size;
> +  /* File descriptor for target cgroup of child (since Linux 5.7).  */
> +  __aligned_uint64_t cgroup;
> +};
> +
> +/* The wrapper of clone3.  */
> +extern int clone3 (struct clone_args *__cl_args, size_t __size,
> +		   int (*__func) (void *__arg), void *__arg);
> +
> +__END_DECLS
> +
> +#endif /* clone3.h */
> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index f7e7353a05..6b0bade4d4 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c
> @@ -26,6 +26,7 @@
>  #include <spawn_int.h>
>  #include <sysdep.h>
>  #include <sys/resource.h>
> +#include <clone_internal.h>
>  
>  /* The Linux implementation of posix_spawn{p} uses the clone syscall directly
>     with CLONE_VM and CLONE_VFORK flags and an allocated stack.  The new stack
> @@ -53,21 +54,6 @@
>     normal program exit with the exit code 127.  */
>  #define SPAWN_ERROR	127
>  
> -#ifdef __ia64__
> -# define CLONE(__fn, __stackbase, __stacksize, __flags, __args) \
> -  __clone2 (__fn, __stackbase, __stacksize, __flags, __args, 0, 0, 0)
> -#else
> -# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
> -  __clone (__fn, __stack, __flags, __args)
> -#endif
> -
> -/* Since ia64 wants the stackbase w/clone2, re-use the grows-up macro.  */
> -#if _STACK_GROWS_UP || defined (__ia64__)
> -# define STACK(__stack, __stack_size) (__stack)
> -#elif _STACK_GROWS_DOWN
> -# define STACK(__stack, __stack_size) (__stack + __stack_size)
> -#endif
> -
>  
>  struct posix_spawn_args
>  {
> @@ -382,8 +368,14 @@ __spawnix (pid_t * pid, const char *file,
>       need for CLONE_SETTLS.  Although parent and child share the same TLS
>       namespace, there will be no concurrent access for TLS variables (errno
>       for instance).  */
> -  new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
> -		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
> +  struct clone_args clone_args =
> +    {
> +      .flags = CLONE_VM | CLONE_VFORK,
> +      .exit_signal = SIGCHLD,
> +      .stack = (uintptr_t) stack,
> +      .stack_size = stack_size,
> +    };
> +  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
>  
>    /* It needs to collect the case where the auxiliary process was created
>       but failed to execute the file (due either any preparation step or
> -- 
> 2.31.1

This version looks fine, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
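
For readers following the thread, the fallback described in item 3 of the
commit message and the stack handling in clone-internal.c can be sketched
as a standalone program fragment.  This is only an illustration under
stated assumptions: demo_args, demo_clone3, demo_clone and
demo_clone_internal are hypothetical stand-ins invented here, not glibc
interfaces, and the stubs never enter the kernel.  The sketch assumes a
downward-growing stack and a clone3 that always fails with ENOSYS, as it
would under a sandbox that rejects it.

```c
/* Sketch only: stand-ins for the real wrappers, so no thread is created.  */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down argument block (not glibc's clone_args).  */
struct demo_args
{
  uint64_t stack;       /* Lowest address of the stack area.  */
  uint64_t stack_size;  /* Size of the stack area in bytes.  */
};

/* Stand-in for __clone3 on a kernel or sandbox without clone3:
   always fails and sets errno to ENOSYS.  */
static int
demo_clone3 (const struct demo_args *args)
{
  (void) args;
  errno = ENOSYS;
  return -1;
}

/* Stand-in for __clone: records the stack pointer it was handed.  */
static uintptr_t demo_seen_sp;

static int
demo_clone (uintptr_t sp)
{
  demo_seen_sp = sp;
  return 1234;  /* Pretend child TID.  */
}

/* Try the new interface first.  On ENOSYS, restore errno and fall back,
   converting (base, size) into the top-of-stack pointer that a
   downward-growing stack expects.  Any other result is returned as is.  */
static int
demo_clone_internal (const struct demo_args *args)
{
  int saved_errno = errno;
  int ret = demo_clone3 (args);
  if (ret != -1 || errno != ENOSYS)
    return ret;

  errno = saved_errno;
  return demo_clone ((uintptr_t) (args->stack + args->stack_size));
}
```

The point of saving and restoring errno is that a caller which only
inspects errno after a successful return does not see a stale ENOSYS
left behind by the failed first attempt.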


* Re: [PATCH v9] Add static tests for __clone_internal
  2021-07-13 21:12     ` [PATCH v9] " H.J. Lu
@ 2021-07-14 13:18       ` Adhemerval Zanella
  2021-07-14 13:32         ` H.J. Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Adhemerval Zanella @ 2021-07-14 13:18 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library, Florian Weimer, Noah Goldstein



On 13/07/2021 18:12, H.J. Lu wrote:
> 
> Here is the v9 patch.  OK for master?
> 
> Thanks.
> 

I think you forgot to attach the patch.


* Re: [PATCH v9] Add static tests for __clone_internal
  2021-07-14 13:18       ` Adhemerval Zanella
@ 2021-07-14 13:32         ` H.J. Lu
  2021-07-14 13:42           ` Adhemerval Zanella
  0 siblings, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2021-07-14 13:32 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GNU C Library, Florian Weimer, Noah Goldstein

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

On Wed, Jul 14, 2021 at 6:18 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 13/07/2021 18:12, H.J. Lu wrote:
> >
> > Here is the v9 patch.  OK for master?
> >
> > Thanks.
> >
>
> I think you forgot to attach the patch.

Oops.  Here is the patch.

-- 
H.J.

[-- Attachment #2: v9-0001-Add-static-tests-for-__clone_internal.patch --]
[-- Type: text/x-patch, Size: 16758 bytes --]

From 1aa6b571479b0a2fb54e8455f45875b4f6b4b2d8 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 14 May 2021 15:23:46 -0700
Subject: [PATCH v9] Add static tests for __clone_internal

---
 sysdeps/unix/sysv/linux/Makefile              |   9 ++
 .../sysv/linux/tst-align-clone-internal.c     |  68 +++++++++
 sysdeps/unix/sysv/linux/tst-clone2-internal.c | 126 +++++++++++++++++
 sysdeps/unix/sysv/linux/tst-clone3-internal.c |  99 +++++++++++++
 .../unix/sysv/linux/tst-getpid1-internal.c    | 133 ++++++++++++++++++
 .../sysv/linux/tst-misalign-clone-internal.c  |  74 ++++++++++
 6 files changed, 509 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c

diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index ed0c0d27f4..cceb16be05 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -139,6 +139,15 @@ tests-time64 += \
   tst-sigtimedwait-time64 \
   tst-timerfd-time64 \
 
+tests-clone-internal = \
+  tst-align-clone-internal \
+  tst-clone2-internal \
+  tst-clone3-internal \
+  tst-getpid1-internal \
+  tst-misalign-clone-internal
+tests-internal += $(tests-clone-internal)
+tests-static += $(tests-clone-internal)
+
 CFLAGS-tst-sigcontext-get_pc.c = -fasynchronous-unwind-tables
 
 # Generate the list of SYS_* macros for the system calls (__NR_*
diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
new file mode 100644
index 0000000000..f3f5b46378
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
@@ -0,0 +1,68 @@
+/* Verify that the clone child stack is properly aligned.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <tst-stack-align.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+#include <support/check.h>
+
+static int
+f (void *arg)
+{
+  puts ("in f");
+
+  return TEST_STACK_ALIGN () ? 1 : 0;
+}
+
+static int
+do_test (void)
+{
+  puts ("in main");
+
+  if (TEST_STACK_ALIGN ())
+    FAIL_EXIT1 ("stack alignment failed");
+
+#ifdef __ia64__
+# define STACK_SIZE 256 * 1024
+#else
+# define STACK_SIZE 128 * 1024
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+  TEST_VERIFY (p != -1);
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  TEST_VERIFY (WIFEXITED (e));
+  TEST_COMPARE (WEXITSTATUS (e), 0);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
new file mode 100644
index 0000000000..fd3a55158c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
@@ -0,0 +1,126 @@
+/* Test if CLONE_VM does not change pthread pid/tid field (BZ #19957)
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <signal.h>
+#include <string.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+#include <support/check.h>
+
+static int sig;
+static int pipefd[2];
+
+static int
+f (void *a)
+{
+  close (pipefd[0]);
+
+  pid_t ppid = getppid ();
+  pid_t pid = getpid ();
+  pid_t tid = gettid ();
+
+  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
+    FAIL_EXIT1 ("write ppid failed\n");
+  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
+    FAIL_EXIT1 ("write pid failed\n");
+  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
+    FAIL_EXIT1 ("write tid failed\n");
+
+  return 0;
+}
+
+
+static int
+do_test (void)
+{
+  sig = SIGRTMIN;
+  sigset_t ss;
+  sigemptyset (&ss);
+  sigaddset (&ss, sig);
+  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
+    FAIL_EXIT1 ("sigprocmask failed: %m");
+
+  if (pipe2 (pipefd, O_CLOEXEC))
+    FAIL_EXIT1 ("pipe failed: %m");
+
+#ifdef __ia64__
+# define STACK_SIZE 256 * 1024
+#else
+# define STACK_SIZE 128 * 1024
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+
+  close (pipefd[1]);
+
+  if (p == -1)
+    FAIL_EXIT1("clone failed: %m");
+
+  pid_t ppid, pid, tid;
+  if (read (pipefd[0], &ppid, sizeof ppid) != sizeof ppid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read ppid failed: %m");
+    }
+  if (read (pipefd[0], &pid, sizeof pid) != sizeof pid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read pid failed: %m");
+    }
+  if (read (pipefd[0], &tid, sizeof tid) != sizeof tid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read tid failed: %m");
+    }
+
+  close (pipefd[0]);
+
+  pid_t own_pid = getpid ();
+  pid_t own_tid = syscall (__NR_gettid);
+
+  /* Some sanity checks for clone syscall: returned ppid should be current
+     pid and both returned tid/pid should be different from current one.  */
+  if ((ppid != own_pid) || (pid == own_pid) || (tid == own_tid))
+    FAIL_RET ("ppid=%i pid=%i tid=%i | own_pid=%i own_tid=%i",
+	      (int)ppid, (int)pid, (int)tid, (int)own_pid, (int)own_tid);
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  TEST_VERIFY (WIFEXITED (e));
+  TEST_COMPARE (WEXITSTATUS (e), 0);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
new file mode 100644
index 0000000000..2bdbc571e6
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
@@ -0,0 +1,99 @@
+/* Check if clone (CLONE_THREAD) does not call exit_group (BZ #21512)
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <sched.h>
+#include <signal.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/syscall.h>
+#include <sys/wait.h>
+#include <sys/types.h>
+#include <linux/futex.h>
+#include <support/check.h>
+#include <stdatomic.h>
+#include <clone_internal.h>
+
+/* Test if clone call with CLONE_THREAD does not call exit_group.  The 'f'
+   function returns '1', which will be used by clone thread to call the
+   'exit' syscall directly.  If _exit is used instead, exit_group will be
+   used and thus the thread group will finish with return value of '1'
+   (where '2' from main thread is expected).  */
+
+static int
+f (void *a)
+{
+  return 1;
+}
+
+/* Futex wait for TID argument, similar to pthread_join internal
+   implementation.  */
+#define wait_tid(ctid_ptr, ctid_val)					\
+  do {									\
+    __typeof (*(ctid_ptr)) __tid;					\
+    /* We need acquire MO here so that we synchronize with the		\
+       kernel's store to 0 when the clone terminates.  */		\
+    while ((__tid = atomic_load_explicit (ctid_ptr,			\
+					  memory_order_acquire)) != 0)	\
+      futex_wait (ctid_ptr, ctid_val);					\
+  } while (0)
+
+static inline int
+futex_wait (int *futexp, int val)
+{
+#ifdef __NR_futex
+  return syscall (__NR_futex, futexp, FUTEX_WAIT, val);
+#else
+  return syscall (__NR_futex_time64, futexp, FUTEX_WAIT, val);
+#endif
+}
+
+static int
+do_test (void)
+{
+  char st[1024] __attribute__ ((aligned));
+  int clone_flags = CLONE_THREAD;
+  /* Minimum required flags to be used along with CLONE_THREAD.  */
+  clone_flags |= CLONE_VM | CLONE_SIGHAND;
+  /* We will use ctid in a futex call to wait for thread exit.  */
+  clone_flags |= CLONE_CHILD_CLEARTID;
+  /* Initialize with a known value.  ctid is set to zero by the kernel after the
+     cloned thread has exited.  */
+#define CTID_INIT_VAL 1
+  pid_t ctid = CTID_INIT_VAL;
+  pid_t tid;
+
+  struct clone_args clone_args =
+    {
+      .flags = clone_flags & ~CSIGNAL,
+      .exit_signal = clone_flags & CSIGNAL,
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+      .child_tid = (uintptr_t) &ctid,
+    };
+  tid = __clone_internal (&clone_args, f, NULL);
+  if (tid == -1)
+    FAIL_EXIT1 ("clone failed: %m");
+
+  wait_tid (&ctid, CTID_INIT_VAL);
+
+  return 2;
+}
+
+#define EXPECTED_STATUS 2
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-getpid1-internal.c b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
new file mode 100644
index 0000000000..ee69e52401
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
@@ -0,0 +1,133 @@
+/* Verify that the parent pid is unchanged by __clone_internal.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <signal.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+
+#ifndef TEST_CLONE_FLAGS
+#define TEST_CLONE_FLAGS 0
+#endif
+
+static int sig;
+
+static int
+f (void *a)
+{
+  puts ("in f");
+  union sigval sival;
+  sival.sival_int = getpid ();
+  printf ("pid = %d\n", sival.sival_int);
+  if (sigqueue (getppid (), sig, sival) != 0)
+    return 1;
+  return 0;
+}
+
+
+static int
+do_test (void)
+{
+  int mypid = getpid ();
+
+  sig = SIGRTMIN;
+  sigset_t ss;
+  sigemptyset (&ss);
+  sigaddset (&ss, sig);
+  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
+    {
+      printf ("sigprocmask failed: %m\n");
+      return 1;
+    }
+
+#ifdef __ia64__
+# define STACK_SIZE 256 * 1024
+#else
+# define STACK_SIZE 128 * 1024
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .flags = TEST_CLONE_FLAGS & ~CSIGNAL,
+      .exit_signal = TEST_CLONE_FLAGS & CSIGNAL,
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+  if (p == -1)
+    {
+      printf("clone failed: %m\n");
+      return 1;
+    }
+  printf ("new thread: %d\n", (int) p);
+
+  siginfo_t si;
+  do
+    if (sigwaitinfo (&ss, &si) < 0)
+      {
+	printf ("sigwaitinfo failed: %m\n");
+	kill (p, SIGKILL);
+	return 1;
+      }
+  while (si.si_signo != sig || si.si_code != SI_QUEUE);
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      else
+	puts ("did not terminate correctly");
+      return 1;
+    }
+  if (WEXITSTATUS (e) != 0)
+    {
+      printf ("exit code %d\n", WEXITSTATUS (e));
+      return 1;
+    }
+
+  if (si.si_int != (int) p)
+    {
+      printf ("expected PID %d, got si_int %d\n", (int) p, si.si_int);
+      kill (p, SIGKILL);
+      return 1;
+    }
+
+  if (si.si_pid != p)
+    {
+      printf ("expected PID %d, got si_pid %d\n", (int) p, (int) si.si_pid);
+      kill (p, SIGKILL);
+      return 1;
+    }
+
+  if (getpid () != mypid)
+    {
+      puts ("my PID changed");
+      return 1;
+    }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
new file mode 100644
index 0000000000..5e34160238
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
@@ -0,0 +1,74 @@
+/* Verify that __clone_internal properly aligns the child stack.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <libc-pointer-arith.h>
+#include <tst-stack-align.h>
+#include <clone_internal.h>
+#include <support/xunistd.h>
+#include <support/check.h>
+
+static int
+check_stack_alignment (void *arg)
+{
+  puts ("in f");
+
+  return TEST_STACK_ALIGN () ? 1 : 0;
+}
+
+static int
+do_test (void)
+{
+  puts ("in do_test");
+
+  if (TEST_STACK_ALIGN ())
+    FAIL_EXIT1 ("stack isn't aligned\n");
+
+#ifdef __ia64__
+# define STACK_SIZE (256 * 1024)
+#else
+# define STACK_SIZE (128 * 1024)
+#endif
+  char st[STACK_SIZE + 1];
+  /* NB: Misalign the child stack so that it is only 1-byte aligned.  */
+  char *stack = PTR_ALIGN_UP (&st[0], 2) + 1;
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) stack,
+      .stack_size = STACK_SIZE,
+    };
+  pid_t p = __clone_internal (&clone_args, check_stack_alignment, 0);
+
+  /* Clone must not fail.  */
+  TEST_VERIFY_EXIT (p != -1);
+
+  int e;
+  xwaitpid (p, &e, __WCLONE);
+  TEST_VERIFY (WIFEXITED (e));
+  TEST_COMPARE (WEXITSTATUS (e), 0);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.31.1



* Re: [PATCH v9] Add static tests for __clone_internal
  2021-07-14 13:32         ` H.J. Lu
@ 2021-07-14 13:42           ` Adhemerval Zanella
  0 siblings, 0 replies; 16+ messages in thread
From: Adhemerval Zanella @ 2021-07-14 13:42 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library, Florian Weimer, Noah Goldstein



On 14/07/2021 10:32, H.J. Lu wrote:
> On Wed, Jul 14, 2021 at 6:18 AM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 13/07/2021 18:12, H.J. Lu wrote:
>>>
>>> Here is the v9 patch.  OK for master?
>>>
>>> Thanks.
>>>
>>
>> I think you forgot to attach the patch.
> 
> Oops.  Here is the patch.
> 

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> From 1aa6b571479b0a2fb54e8455f45875b4f6b4b2d8 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.tools@gmail.com>
> Date: Fri, 14 May 2021 15:23:46 -0700
> Subject: [PATCH v9] Add static tests for __clone_internal
> 
> ---
>  sysdeps/unix/sysv/linux/Makefile              |   9 ++
>  .../sysv/linux/tst-align-clone-internal.c     |  68 +++++++++
>  sysdeps/unix/sysv/linux/tst-clone2-internal.c | 126 +++++++++++++++++
>  sysdeps/unix/sysv/linux/tst-clone3-internal.c |  99 +++++++++++++
>  .../unix/sysv/linux/tst-getpid1-internal.c    | 133 ++++++++++++++++++
>  .../sysv/linux/tst-misalign-clone-internal.c  |  74 ++++++++++
>  6 files changed, 509 insertions(+)
>  create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> 
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index ed0c0d27f4..cceb16be05 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -139,6 +139,15 @@ tests-time64 += \
>    tst-sigtimedwait-time64 \
>    tst-timerfd-time64 \
>  
> +tests-clone-internal = \
> +  tst-align-clone-internal \
> +  tst-clone2-internal \
> +  tst-clone3-internal \
> +  tst-getpid1-internal \
> +  tst-misalign-clone-internal
> +tests-internal += $(tests-clone-internal)
> +tests-static += $(tests-clone-internal)
> +
>  CFLAGS-tst-sigcontext-get_pc.c = -fasynchronous-unwind-tables
>  
>  # Generate the list of SYS_* macros for the system calls (__NR_*,

Ok.

> diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> new file mode 100644
> index 0000000000..f3f5b46378
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> @@ -0,0 +1,68 @@
> +/* Verify that the clone child stack is properly aligned.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <tst-stack-align.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +#include <support/check.h>
> +
> +static int
> +f (void *arg)
> +{
> +  puts ("in f");
> +
> +  return TEST_STACK_ALIGN () ? 1 : 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  puts ("in main");
> +
> +  if (TEST_STACK_ALIGN ())
> +    FAIL_EXIT1 ("stack alignment failed");
> +
> +#ifdef __ia64__
> +# define STACK_SIZE (256 * 1024)
> +#else
> +# define STACK_SIZE (128 * 1024)
> +#endif
> +  char st[STACK_SIZE] __attribute__ ((aligned));
> +  struct clone_args clone_args =
> +    {
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +    };
> +  pid_t p = __clone_internal (&clone_args, f, 0);
> +  TEST_VERIFY (p != -1);
> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  TEST_VERIFY (WIFEXITED (e));
> +  TEST_COMPARE (WEXITSTATUS (e), 0);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.

> diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> new file mode 100644
> index 0000000000..fd3a55158c
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> @@ -0,0 +1,126 @@
> +/* Test if CLONE_VM does not change pthread pid/tid field (BZ #19957)
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <signal.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <stddef.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <errno.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <sys/syscall.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +#include <support/check.h>
> +
> +static int sig;
> +static int pipefd[2];
> +
> +static int
> +f (void *a)
> +{
> +  close (pipefd[0]);
> +
> +  pid_t ppid = getppid ();
> +  pid_t pid = getpid ();
> +  pid_t tid = gettid ();
> +
> +  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
> +    FAIL_EXIT1 ("write ppid failed\n");
> +  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
> +    FAIL_EXIT1 ("write pid failed\n");
> +  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
> +    FAIL_EXIT1 ("write tid failed\n");
> +
> +  return 0;
> +}
> +
> +
> +static int
> +do_test (void)
> +{
> +  sig = SIGRTMIN;
> +  sigset_t ss;
> +  sigemptyset (&ss);
> +  sigaddset (&ss, sig);
> +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> +    FAIL_EXIT1 ("sigprocmask failed: %m");
> +
> +  if (pipe2 (pipefd, O_CLOEXEC))
> +    FAIL_EXIT1 ("pipe failed: %m");
> +
> +#ifdef __ia64__
> +# define STACK_SIZE (256 * 1024)
> +#else
> +# define STACK_SIZE (128 * 1024)
> +#endif
> +  char st[STACK_SIZE] __attribute__ ((aligned));
> +  struct clone_args clone_args =
> +    {
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +    };
> +  pid_t p = __clone_internal (&clone_args, f, 0);
> +
> +  close (pipefd[1]);
> +
> +  if (p == -1)
> +    FAIL_EXIT1 ("clone failed: %m");
> +
> +  pid_t ppid, pid, tid;
> +  if (read (pipefd[0], &ppid, sizeof ppid) != sizeof ppid)
> +    {
> +      kill (p, SIGKILL);
> +      FAIL_EXIT1 ("read ppid failed: %m");
> +    }
> +  if (read (pipefd[0], &pid, sizeof pid) != sizeof pid)
> +    {
> +      kill (p, SIGKILL);
> +      FAIL_EXIT1 ("read pid failed: %m");
> +    }
> +  if (read (pipefd[0], &tid, sizeof tid) != sizeof tid)
> +    {
> +      kill (p, SIGKILL);
> +      FAIL_EXIT1 ("read tid failed: %m");
> +    }
> +
> +  close (pipefd[0]);
> +
> +  pid_t own_pid = getpid ();
> +  pid_t own_tid = syscall (__NR_gettid);
> +
> +  /* Some sanity checks for clone syscall: returned ppid should be current
> +     pid and both returned tid/pid should be different from current one.  */
> +  if ((ppid != own_pid) || (pid == own_pid) || (tid == own_tid))
> +    FAIL_RET ("ppid=%i pid=%i tid=%i | own_pid=%i own_tid=%i",
> +	      (int)ppid, (int)pid, (int)tid, (int)own_pid, (int)own_tid);
> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  TEST_VERIFY (WIFEXITED (e));
> +  TEST_COMPARE (WEXITSTATUS (e), 0);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.
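
For readers following the pipe-based check in tst-clone2-internal.c, here
is a minimal standalone sketch of the same idea: the child reports the
IDs it observes through a pipe and the parent compares them with its own
view.  It is only an illustration under stated assumptions.  It uses the
public fork () instead of __clone_internal (which is not available
outside glibc), and check_child_ids is a hypothetical helper name that
does not appear in the patch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of the patch).  Returns 0 when the IDs
   reported by the child match the parent's view, 1 when they do not,
   and -1 on a system call failure.  */
static int
check_child_ids (void)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t parent = getpid ();
  pid_t child = fork ();	/* Stand-in for __clone_internal.  */
  if (child == -1)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (child == 0)
    {
      /* Child: report its parent PID and its own PID, then leave
	 without running the parent's atexit handlers.  */
      close (fds[0]);
      pid_t ids[2] = { getppid (), getpid () };
      ssize_t n = write (fds[1], ids, sizeof ids);
      _exit (n == (ssize_t) sizeof ids ? 0 : 1);
    }

  /* Parent: close the write end first so that read cannot block
     forever if the child dies before writing.  */
  close (fds[1]);
  pid_t ids[2];
  ssize_t n = read (fds[0], ids, sizeof ids);
  close (fds[0]);

  int status;
  if (waitpid (child, &status, 0) != child)
    return -1;
  if (n != (ssize_t) sizeof ids
      || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
    return -1;

  /* The reported parent must be us, and the child must be a new PID.  */
  return (ids[0] == parent && ids[1] == child && ids[1] != parent) ? 0 : 1;
}
```

As in the test, the parent closes the write end before reading, so a
child that exits early produces a short read rather than a hang.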

> diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> new file mode 100644
> index 0000000000..2bdbc571e6
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> @@ -0,0 +1,99 @@
> +/* Check if clone (CLONE_THREAD) does not call exit_group (BZ #21512)
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <string.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <sys/syscall.h>
> +#include <sys/wait.h>
> +#include <sys/types.h>
> +#include <linux/futex.h>
> +#include <support/check.h>
> +#include <stdatomic.h>
> +#include <clone_internal.h>
> +
> +/* Test that a clone call with CLONE_THREAD does not call exit_group.  The
> +   'f' function returns '1', which the cloned thread passes directly to
> +   the 'exit' syscall.  If _exit were used instead, exit_group would be
> +   called and the whole thread group would finish with status '1'
> +   (whereas '2' from the main thread is expected).  */
> +
> +static int
> +f (void *a)
> +{
> +  return 1;
> +}
> +
> +/* Futex wait for TID argument, similar to pthread_join internal
> +   implementation.  */
> +#define wait_tid(ctid_ptr, ctid_val)					\
> +  do {									\
> +    __typeof (*(ctid_ptr)) __tid;					\
> +    /* We need acquire MO here so that we synchronize with the		\
> +       kernel's store to 0 when the clone terminates.  */		\
> +    while ((__tid = atomic_load_explicit (ctid_ptr,			\
> +					  memory_order_acquire)) != 0)	\
> +      futex_wait (ctid_ptr, ctid_val);					\
> +  } while (0)
> +
> +static inline int
> +futex_wait (int *futexp, int val)
> +{
> +#ifdef __NR_futex
> +  return syscall (__NR_futex, futexp, FUTEX_WAIT, val);
> +#else
> +  return syscall (__NR_futex_time64, futexp, FUTEX_WAIT, val);
> +#endif
> +}
> +
> +static int
> +do_test (void)
> +{
> +  char st[1024] __attribute__ ((aligned));
> +  int clone_flags = CLONE_THREAD;
> +  /* Minimum required flags to be used along with CLONE_THREAD.  */
> +  clone_flags |= CLONE_VM | CLONE_SIGHAND;
> +  /* We will use ctid to call futex to wait for thread exit.  */
> +  clone_flags |= CLONE_CHILD_CLEARTID;
> +  /* Initialize with a known value.  ctid is set to zero by the kernel after the
> +     cloned thread has exited.  */
> +#define CTID_INIT_VAL 1
> +  pid_t ctid = CTID_INIT_VAL;
> +  pid_t tid;
> +
> +  struct clone_args clone_args =
> +    {
> +      .flags = clone_flags & ~CSIGNAL,
> +      .exit_signal = clone_flags & CSIGNAL,
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +      .child_tid = (uintptr_t) &ctid,
> +    };
> +  tid = __clone_internal (&clone_args, f, NULL);
> +  if (tid == -1)
> +    FAIL_EXIT1 ("clone failed: %m");
> +
> +  wait_tid (&ctid, CTID_INIT_VAL);
> +
> +  return 2;
> +}
> +
> +#define EXPECTED_STATUS 2
> +#include <support/test-driver.c>

Ok.

> diff --git a/sysdeps/unix/sysv/linux/tst-getpid1-internal.c b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
> new file mode 100644
> index 0000000000..ee69e52401
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
> @@ -0,0 +1,133 @@
> +/* Verify that the parent pid is unchanged by __clone_internal.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <signal.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +
> +#ifndef TEST_CLONE_FLAGS
> +#define TEST_CLONE_FLAGS 0
> +#endif
> +
> +static int sig;
> +
> +static int
> +f (void *a)
> +{
> +  puts ("in f");
> +  union sigval sival;
> +  sival.sival_int = getpid ();
> +  printf ("pid = %d\n", sival.sival_int);
> +  if (sigqueue (getppid (), sig, sival) != 0)
> +    return 1;
> +  return 0;
> +}
> +
> +
> +static int
> +do_test (void)
> +{
> +  int mypid = getpid ();
> +
> +  sig = SIGRTMIN;
> +  sigset_t ss;
> +  sigemptyset (&ss);
> +  sigaddset (&ss, sig);
> +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> +    {
> +      printf ("sigprocmask failed: %m\n");
> +      return 1;
> +    }
> +
> +#ifdef __ia64__
> +# define STACK_SIZE (256 * 1024)
> +#else
> +# define STACK_SIZE (128 * 1024)
> +#endif
> +  char st[STACK_SIZE] __attribute__ ((aligned));
> +  struct clone_args clone_args =
> +    {
> +      .flags = TEST_CLONE_FLAGS & ~CSIGNAL,
> +      .exit_signal = TEST_CLONE_FLAGS & CSIGNAL,
> +      .stack = (uintptr_t) st,
> +      .stack_size = sizeof (st),
> +    };
> +  pid_t p = __clone_internal (&clone_args, f, 0);
> +  if (p == -1)
> +    {
> +      printf ("clone failed: %m\n");
> +      return 1;
> +    }
> +  printf ("new thread: %d\n", (int) p);
> +
> +  siginfo_t si;
> +  do
> +    if (sigwaitinfo (&ss, &si) < 0)
> +      {
> +	printf ("sigwaitinfo failed: %m\n");
> +	kill (p, SIGKILL);
> +	return 1;
> +      }
> +  while (si.si_signo != sig || si.si_code != SI_QUEUE);
> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  if (!WIFEXITED (e))
> +    {
> +      if (WIFSIGNALED (e))
> +	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
> +      else
> +	puts ("did not terminate correctly");
> +      return 1;
> +    }
> +  if (WEXITSTATUS (e) != 0)
> +    {
> +      printf ("exit code %d\n", WEXITSTATUS (e));
> +      return 1;
> +    }
> +
> +  if (si.si_int != (int) p)
> +    {
> +      printf ("expected PID %d, got si_int %d\n", (int) p, si.si_int);
> +      kill (p, SIGKILL);
> +      return 1;
> +    }
> +
> +  if (si.si_pid != p)
> +    {
> +      printf ("expected PID %d, got si_pid %d\n", (int) p, (int) si.si_pid);
> +      kill (p, SIGKILL);
> +      return 1;
> +    }
> +
> +  if (getpid () != mypid)
> +    {
> +      puts ("my PID changed");
> +      return 1;
> +    }
> +
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.
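
The tests fill struct clone_args with "flags & ~CSIGNAL" and
"flags & CSIGNAL".  The sketch below shows that split in isolation: the
legacy clone flags word carries the exit signal in its low byte, while
clone_args keeps the two apart.  The struct split_flags type and the
split_clone_flags function are hypothetical names used only for this
illustration.  The fallback values for CSIGNAL and CLONE_VM are the
Linux values and are only used when <sched.h> does not expose them
(that is, when _GNU_SOURCE is not defined).

```c
#include <sched.h>
#include <signal.h>
#include <stdint.h>

/* Fallbacks with the Linux values, in case <sched.h> hides them.  */
#ifndef CSIGNAL
# define CSIGNAL 0x000000ff
#endif
#ifndef CLONE_VM
# define CLONE_VM 0x00000100
#endif

/* Hypothetical container mirroring the two clone_args fields that the
   tests initialize from one legacy flags word.  */
struct split_flags
{
  uint64_t flags;
  uint64_t exit_signal;
};

/* Split a legacy clone flags word: the low byte (CSIGNAL mask) is the
   exit signal, everything else stays in flags.  */
static struct split_flags
split_clone_flags (uint64_t legacy)
{
  struct split_flags s =
    {
      .flags = legacy & ~(uint64_t) CSIGNAL,
      .exit_signal = legacy & CSIGNAL,
    };
  return s;
}
```

With a legacy word of CLONE_VM | SIGCHLD this yields flags == CLONE_VM
and exit_signal == SIGCHLD.  With TEST_CLONE_FLAGS left at its default
of 0, both fields are 0.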

> diff --git a/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> new file mode 100644
> index 0000000000..5e34160238
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-misalign-clone-internal.c
> @@ -0,0 +1,74 @@
> +/* Verify that __clone_internal properly aligns the child stack.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <libc-pointer-arith.h>
> +#include <tst-stack-align.h>
> +#include <clone_internal.h>
> +#include <support/xunistd.h>
> +#include <support/check.h>
> +
> +static int
> +check_stack_alignment (void *arg)
> +{
> +  puts ("in f");
> +
> +  return TEST_STACK_ALIGN () ? 1 : 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  puts ("in do_test");
> +
> +  if (TEST_STACK_ALIGN ())
> +    FAIL_EXIT1 ("stack isn't aligned\n");
> +
> +#ifdef __ia64__
> +# define STACK_SIZE (256 * 1024)
> +#else
> +# define STACK_SIZE (128 * 1024)
> +#endif
> +  char st[STACK_SIZE + 1];
> +  /* NB: Misalign the child stack so that it is only 1-byte aligned.  */
> +  char *stack = PTR_ALIGN_UP (&st[0], 2) + 1;
> +  struct clone_args clone_args =
> +    {
> +      .stack = (uintptr_t) stack,
> +      .stack_size = STACK_SIZE,
> +    };
> +  pid_t p = __clone_internal (&clone_args, check_stack_alignment, 0);
> +
> +  /* Clone must not fail.  */
> +  TEST_VERIFY_EXIT (p != -1);
> +
> +  int e;
> +  xwaitpid (p, &e, __WCLONE);
> +  TEST_VERIFY (WIFEXITED (e));
> +  TEST_COMPARE (WEXITSTATUS (e), 0);
> +
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
> -- 
> 2.31.1

Ok.
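
The pointer arithmetic in tst-misalign-clone-internal.c can be checked
in isolation.  The sketch below reproduces "PTR_ALIGN_UP (p, 2) + 1" with
plain integer helpers and shows one way a wrapper could turn such a
buffer into an aligned region.  All names here (align_up, align_down,
misalign, usable_size) are hypothetical, and the 16-byte realignment
policy in usable_size is an assumption made for illustration, not a
statement of what __clone_internal does on any given target.

```c
#include <stddef.h>
#include <stdint.h>

/* Round ADDR up to a multiple of ALIGN, which must be a power of two.  */
static uintptr_t
align_up (uintptr_t addr, uintptr_t align)
{
  return (addr + align - 1) & ~(align - 1);
}

/* Round ADDR down to a multiple of ALIGN, which must be a power of two.  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t align)
{
  return addr & ~(align - 1);
}

/* Same arithmetic as the test: round up to an even address, then add
   one.  The result is always odd, hence only 1-byte aligned.  */
static uintptr_t
misalign (uintptr_t base)
{
  return align_up (base, 2) + 1;
}

/* Assumed policy: shrink [STACK, STACK + SIZE) to the largest region
   whose bounds are both multiples of ALIGN, and return its size.  */
static size_t
usable_size (uintptr_t stack, size_t size, uintptr_t align)
{
  uintptr_t top = align_down (stack + size, align);
  uintptr_t bottom = align_up (stack, align);
  return top > bottom ? (size_t) (top - bottom) : 0;
}
```

For example, misalign (0x1000) gives 0x1001, and a 128 KiB region that
starts at 0x1001 keeps 0x1fff0 usable bytes after realigning both ends
to 16 bytes.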


end of thread, other threads:[~2021-07-14 13:42 UTC | newest]

Thread overview: 16+ messages
2021-06-01 14:55 [PATCH v8 0/3] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
2021-06-01 14:55 ` [PATCH v8 1/3] " H.J. Lu
2021-06-04 12:20   ` H.J. Lu
2021-06-18 18:20     ` PING^1 " H.J. Lu
2021-07-13 18:54   ` Adhemerval Zanella
2021-07-13 19:06     ` Adhemerval Zanella
2021-07-13 19:49     ` [PATCH v9] " H.J. Lu
2021-07-14 13:17       ` Adhemerval Zanella
2021-06-01 14:55 ` [PATCH v8 2/3] x86-64: Add the clone3 wrapper H.J. Lu
2021-07-13 19:12   ` Adhemerval Zanella
2021-06-01 14:55 ` [PATCH v8 3/3] Add static tests for __clone_internal H.J. Lu
2021-07-13 19:32   ` Adhemerval Zanella
2021-07-13 21:12     ` [PATCH v9] " H.J. Lu
2021-07-14 13:18       ` Adhemerval Zanella
2021-07-14 13:32         ` H.J. Lu
2021-07-14 13:42           ` Adhemerval Zanella
