public inbox for libc-alpha@sourceware.org
* [PATCH v5 0/5] Add an internal wrapper for clone, clone2 and clone3
@ 2021-05-15 12:34 H.J. Lu
  2021-05-15 12:34 ` [PATCH v5 1/5] " H.J. Lu
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-15 12:34 UTC
  To: libc-alpha; +Cc: Florian Weimer, Adhemerval Zanella

The clone3 system call provides a superset of the functionality of clone
and clone2.  It also provides a number of API improvements, including
the ability to specify the size of the child's stack area, which the
kernel can use to compute the shadow stack size when allocating the
shadow stack.  Add:

extern int __clone_internal (struct clone_args *__cl_args,
			     int (*__func) (void *__arg), void *__arg);

to provide an abstract interface for clone, clone2 and clone3.

1. Add cast_to_pointer to cast an integer to a void * pointer.
2. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
3. Consolidate clone vs clone2 differences into a single file.
4. Use only __clone_internal to clone a thread.
5. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
6. Enable the public clone3 wrapper in the future after it has been
added to all targets.

Tested with build-many-glibcs.py.
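
The fallback in item 5 can be sketched in isolation.  This is only an
illustration under stated assumptions, not the code in the patch:
spawn_backend, backend_unsupported, backend_invalid, backend_legacy and
spawn_with_fallback are hypothetical names, and the backends are stubs
that create no process.  The point is the control flow: only a -1
return with ENOSYS reaches the legacy path, and errno is restored
before the fallback runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend type.  It stands in for a clone3-style or a
   clone-style entry point and returns -1 with errno set on failure.  */
typedef int (*spawn_backend) (void *request);

/* Stub for a wrapper whose system call the kernel does not provide.  */
static int
backend_unsupported (void *request)
{
  (void) request;
  errno = ENOSYS;
  return -1;
}

/* Stub for a wrapper that rejects its arguments.  */
static int
backend_invalid (void *request)
{
  (void) request;
  errno = EINVAL;
  return -1;
}

/* Stub for the legacy path.  It pretends that child 42 was created.  */
static int
backend_legacy (void *request)
{
  (void) request;
  return 42;
}

/* Try PREFERRED first.  Only a failure with ENOSYS triggers FALLBACK;
   every other result, success or error, is returned unchanged.  */
static int
spawn_with_fallback (spawn_backend preferred, spawn_backend fallback,
		     void *request)
{
  assert (preferred != NULL && fallback != NULL);

  int saved_errno = errno;
  int ret = preferred (request);
  if (ret != -1 || errno != ENOSYS)
    return ret;

  /* Restore errno so that the caller does not see a stale ENOSYS.  */
  errno = saved_errno;
  return fallback (request);
}
```

With these stubs, spawn_with_fallback (backend_unsupported,
backend_legacy, NULL) takes the legacy path, while a backend_invalid
failure is reported to the caller without any retry.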

H.J. Lu (5):
  Add an internal wrapper for clone, clone2 and clone3
  nptl: Always pass stack size to create_thread
  GLIBC_PRIVATE: Export __clone_internal
  x86-64: Add the clone3 wrapper
  Add tests for __clone_internal

 include/clone_internal.h                      |  14 ++
 include/libc-pointer-arith.h                  |   3 +
 nptl/allocatestack.c                          |  59 +-------
 nptl/createthread.c                           |   3 +-
 nptl/pthread_create.c                         |  17 ++-
 sysdeps/unix/sysv/linux/Makefile              |  11 +-
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/clone-internal.c      |  97 ++++++++++++
 sysdeps/unix/sysv/linux/clone-offsets.sym     |   5 +
 sysdeps/unix/sysv/linux/clone3.c              |   1 +
 sysdeps/unix/sysv/linux/clone3.h              |  55 +++++++
 sysdeps/unix/sysv/linux/createthread.c        |  25 +--
 sysdeps/unix/sysv/linux/spawni.c              |  26 ++--
 .../sysv/linux/tst-align-clone-internal.c     |  91 +++++++++++
 sysdeps/unix/sysv/linux/tst-clone-internal.c  |  51 +++++++
 sysdeps/unix/sysv/linux/tst-clone2-internal.c | 142 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-clone3-internal.c | 101 +++++++++++++
 .../unix/sysv/linux/tst-getpid1-internal.c    | 137 +++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/clone3.S       |  92 ++++++++++++
 sysdeps/unix/sysv/linux/x86_64/sysdep.h       |   2 +
 20 files changed, 841 insertions(+), 92 deletions(-)
 create mode 100644 include/clone_internal.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/clone-offsets.sym
 create mode 100644 sysdeps/unix/sysv/linux/clone3.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S

-- 
2.31.1



* [PATCH v5 1/5] Add an internal wrapper for clone, clone2 and clone3
  2021-05-15 12:34 [PATCH v5 0/5] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
@ 2021-05-15 12:34 ` H.J. Lu
  2021-05-20 14:46   ` Florian Weimer
  2021-05-15 12:34 ` [PATCH v5 2/5] nptl: Always pass stack size to create_thread H.J. Lu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: H.J. Lu @ 2021-05-15 12:34 UTC
  To: libc-alpha; +Cc: Florian Weimer, Adhemerval Zanella

The clone3 system call provides a superset of the functionality of clone
and clone2.  It also provides a number of API improvements, including
the ability to specify the size of the child's stack area, which the
kernel can use to compute the shadow stack size when allocating the
shadow stack.  Add:

extern int __clone_internal (struct clone_args *__cl_args,
			     int (*__func) (void *__arg), void *__arg);

to provide an abstract interface for clone, clone2 and clone3.

1. Add cast_to_pointer to cast an integer to a void * pointer.
2. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
3. Consolidate clone vs clone2 differences into a single file.
4. Use only __clone_internal to clone a thread.
5. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
6. Enable the public clone3 wrapper in the future after it has been
added to all targets.

Tested with build-many-glibcs.py.
---
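The struct clone_args added by this patch grows in versioned prefixes,
and the kernel learns which prefix the caller uses from the size
argument.  A minimal standalone sketch of the layout checks follows.
It assumes the same eleven 64-bit fields as the struct in the diff;
clone_args_sketch, OFFSETOFEND and clone_args_size_for_version are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the field order used by struct clone_args in the diff.  */
struct clone_args_sketch
{
  uint64_t flags;
  uint64_t pidfd;
  uint64_t child_tid;
  uint64_t parent_tid;
  uint64_t exit_signal;
  uint64_t stack;
  uint64_t stack_size;
  uint64_t tls;			/* Last field of the first layout.  */
  uint64_t set_tid;
  uint64_t set_tid_size;	/* Last field of the second layout.  */
  uint64_t cgroup;		/* Last field of the third layout.  */
};

/* Offset of the first byte after MEMBER.  */
#define OFFSETOFEND(type, member) \
  (offsetof (type, member) + sizeof (((type *) 0)->member))

/* Return the structure size that corresponds to layout VERSION,
   where 0, 1 and 2 name the first, second and third layout.  */
static size_t
clone_args_size_for_version (int version)
{
  assert (version >= 0 && version <= 2);

  switch (version)
    {
    case 0:
      return OFFSETOFEND (struct clone_args_sketch, tls);
    case 1:
      return OFFSETOFEND (struct clone_args_sketch, set_tid_size);
    default:
      return OFFSETOFEND (struct clone_args_sketch, cgroup);
    }
}
```

Since every field is a uint64_t, the three sizes are 8 times the number
of fields in each layout, which matches the 64, 80 and 88 byte
constants asserted in clone-internal.c.
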
 include/clone_internal.h                  | 14 ++++
 include/libc-pointer-arith.h              |  3 +
 sysdeps/unix/sysv/linux/Makefile          |  4 +-
 sysdeps/unix/sysv/linux/clone-internal.c  | 97 +++++++++++++++++++++++
 sysdeps/unix/sysv/linux/clone-offsets.sym |  5 ++
 sysdeps/unix/sysv/linux/clone3.c          |  1 +
 sysdeps/unix/sysv/linux/clone3.h          | 55 +++++++++++++
 sysdeps/unix/sysv/linux/createthread.c    | 25 +++---
 sysdeps/unix/sysv/linux/spawni.c          | 26 +++---
 9 files changed, 202 insertions(+), 28 deletions(-)
 create mode 100644 include/clone_internal.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/clone-offsets.sym
 create mode 100644 sysdeps/unix/sysv/linux/clone3.c
 create mode 100644 sysdeps/unix/sysv/linux/clone3.h

diff --git a/include/clone_internal.h b/include/clone_internal.h
new file mode 100644
index 0000000000..124f7ba169
--- /dev/null
+++ b/include/clone_internal.h
@@ -0,0 +1,14 @@
+#ifndef _CLONE3_H
+#include_next <clone3.h>
+
+extern __typeof (clone3) __clone3;
+
+/* The internal wrapper of clone and clone3.  */
+extern __typeof (clone3) __clone_internal;
+
+#ifndef _ISOMAC
+libc_hidden_proto (__clone3)
+libc_hidden_proto (__clone_internal)
+#endif
+
+#endif
diff --git a/include/libc-pointer-arith.h b/include/libc-pointer-arith.h
index 72e722c5aa..04ba537617 100644
--- a/include/libc-pointer-arith.h
+++ b/include/libc-pointer-arith.h
@@ -37,6 +37,9 @@
 /* Cast an integer or a pointer VAL to integer with proper type.  */
 # define cast_to_integer(val) ((__integer_if_pointer_type (val)) (val))
 
+/* Cast an integer VAL to void * pointer.  */
+# define cast_to_pointer(val) ((void *) (uintptr_t) (val))
+
 /* Align a value by rounding down to closest size.
    e.g. Using size of 4096, we get this behavior:
 	{4095, 4096, 4097} = {0, 4096, 4096}.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index fb155cf856..7c1f32b84d 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -54,6 +54,8 @@ CFLAGS-malloc.c += -DMORECORE_CLEARS=2
 endif
 
 ifeq ($(subdir),misc)
+gen-as-const-headers += clone-offsets.sym
+
 sysdep_routines += adjtimex clone umount umount2 readahead sysctl \
 		   setfsuid setfsgid epoll_pwait signalfd \
 		   eventfd eventfd_read eventfd_write prlimit \
@@ -64,7 +66,7 @@ sysdep_routines += adjtimex clone umount umount2 readahead sysctl \
 		   time64-support pselect32 \
 		   xstat fxstat lxstat xstat64 fxstat64 lxstat64 \
 		   fxstatat fxstatat64 \
-		   xmknod xmknodat
+		   xmknod xmknodat clone3 clone-internal
 
 CFLAGS-gethostid.c = -fexceptions
 CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
new file mode 100644
index 0000000000..c357b0ac14
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -0,0 +1,97 @@
+/* The internal wrapper of clone and clone3.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <stddef.h>
+#include <errno.h>
+#include <sched.h>
+#include <clone_internal.h>
+#include <libc-pointer-arith.h>	/* For cast_to_pointer.  */
+#include <stackinfo.h>		/* For _STACK_GROWS_{UP,DOWN}.  */
+
+#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
+#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
+
+#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
+#define offsetofend(TYPE, MEMBER) \
+  (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
+
+_Static_assert (__alignof(struct clone_args) == 8,
+		"__alignof(struct clone_args) != 8");
+_Static_assert (offsetofend(struct clone_args, tls) == CLONE_ARGS_SIZE_VER0,
+		"offsetofend(struct clone_args, tls) != CLONE_ARGS_SIZE_VER0");
+_Static_assert (offsetofend(struct clone_args, set_tid_size) == CLONE_ARGS_SIZE_VER1,
+		"offsetofend(struct clone_args, set_tid_size) != CLONE_ARGS_SIZE_VER1");
+_Static_assert (offsetofend(struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2,
+		"offsetofend(struct clone_args, cgroup) != CLONE_ARGS_SIZE_VER2");
+_Static_assert (sizeof(struct clone_args) == CLONE_ARGS_SIZE_VER2,
+		"sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER2");
+
+int
+__clone_internal (struct clone_args *cl_args,
+		  int (*func) (void *arg), void *arg)
+{
+  int ret;
+#ifdef HAVE_CLONE3_WAPPER
+  /* Try clone3 first.  */
+  int saved_errno = errno;
+  ret = __clone3 (cl_args, func, arg);
+  if (ret != -1 || errno != ENOSYS)
+    return ret;
+
+  /* NB: Restore errno since errno may be checked against non-zero
+     return value.  */
+  __set_errno (saved_errno);
+#else
+  /* Check invalid arguments.  */
+  if (cl_args == NULL || func == NULL)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+#endif
+
+  /* Map clone3 arguments to clone arguments.  NB: No need to check
+     invalid clone3 specific bits since this is an internal function.  */
+  int flags = cl_args->flags | cl_args->exit_signal;
+  void *stack = cast_to_pointer (cl_args->stack);
+
+#ifdef __ia64__
+  ret = __clone2 (func, stack, cl_args->stack_size,
+		  flags, arg,
+		  cast_to_pointer (cl_args->parent_tid),
+		  cast_to_pointer (cl_args->tls),
+		  cast_to_pointer (cl_args->child_tid));
+#else
+# if !_STACK_GROWS_DOWN && !_STACK_GROWS_UP
+#  error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
+# endif
+
+# if _STACK_GROWS_DOWN
+  stack += cl_args->stack_size;
+# endif
+  ret = __clone (func, stack, flags, arg,
+		 cast_to_pointer (cl_args->parent_tid),
+		 cast_to_pointer (cl_args->tls),
+		 cast_to_pointer (cl_args->child_tid));
+#endif
+  return ret;
+}
+
+libc_hidden_def (__clone_internal)
diff --git a/sysdeps/unix/sysv/linux/clone-offsets.sym b/sysdeps/unix/sysv/linux/clone-offsets.sym
new file mode 100644
index 0000000000..d767e49fc8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone-offsets.sym
@@ -0,0 +1,5 @@
+#include <clone3.h>
+
+--
+
+CLONE_ARGS_SIZE			sizeof (struct clone_args)
diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
new file mode 100644
index 0000000000..de963ef89d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone3.c
@@ -0,0 +1 @@
+/* An empty placeholder.  */
diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
new file mode 100644
index 0000000000..a222948d55
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/clone3.h
@@ -0,0 +1,55 @@
+/* The wrapper of clone3.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _CLONE3_H
+#define _CLONE3_H	1
+
+#include <features.h>
+#include <stdint.h>
+
+__BEGIN_DECLS
+
+struct clone_args
+{
+  uint64_t flags;	 /* Flags bit mask.  */
+  uint64_t pidfd;	 /* Where to store PID file descriptor
+			    (pid_t *).  */
+  uint64_t child_tid;	 /* Where to store child TID, in child's memory
+			    (pid_t *).  */
+  uint64_t parent_tid;	 /* Where to store child TID, in parent's memory
+			    (int *).  */
+  uint64_t exit_signal;	 /* Signal to deliver to parent on child
+			    termination.  */
+  uint64_t stack;	 /* The lowest address of stack.  */
+  uint64_t stack_size;	 /* Size of stack.  */
+  uint64_t tls;		 /* Location of new TLS.  */
+  uint64_t set_tid;	 /* Pointer to a pid_t array
+			    (since Linux 5.5).  */
+  uint64_t set_tid_size; /* Number of elements in set_tid
+			    (since Linux 5.5).  */
+  uint64_t cgroup;	 /* File descriptor for target cgroup
+			    of child (since Linux 5.7).  */
+} __attribute__ ((aligned (8)));
+
+/* The wrapper of clone3.  */
+extern int clone3 (struct clone_args *__cl_args,
+		   int (*__func) (void *__arg), void *__arg);
+
+__END_DECLS
+
+#endif /* clone3.h */
diff --git a/sysdeps/unix/sysv/linux/createthread.c b/sysdeps/unix/sysv/linux/createthread.c
index bc3409b326..406c73ba00 100644
--- a/sysdeps/unix/sysv/linux/createthread.c
+++ b/sysdeps/unix/sysv/linux/createthread.c
@@ -25,15 +25,10 @@
 #include <ldsodefs.h>
 #include <tls.h>
 #include <stdint.h>
+#include <clone_internal.h>
 
 #include <arch-fork.h>
 
-#ifdef __NR_clone2
-# define ARCH_CLONE __clone2
-#else
-# define ARCH_CLONE __clone
-#endif
-
 /* See the comments in pthread_create.c for the requirements for these
    two macros and the create_thread function.  */
 
@@ -47,7 +42,8 @@ static int start_thread (void *arg) __attribute__ ((noreturn));
 
 static int
 create_thread (struct pthread *pd, const struct pthread_attr *attr,
-	       bool *stopped_start, STACK_VARIABLES_PARMS, bool *thread_ran)
+	       bool *stopped_start, void *stackaddr, size_t stacksize,
+	       bool *thread_ran)
 {
   /* Determine whether the newly created threads has to be started
      stopped since we have to set the scheduling parameters or set the
@@ -100,9 +96,18 @@ create_thread (struct pthread *pd, const struct pthread_attr *attr,
 
   TLS_DEFINE_INIT_TP (tp, pd);
 
-  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
-				    clone_flags, pd, &pd->tid, tp, &pd->tid)
-			== -1))
+  struct clone_args args =
+    {
+      .flags = clone_flags,
+      .pidfd = (uintptr_t) &pd->tid,
+      .parent_tid = (uintptr_t) &pd->tid,
+      .child_tid = (uintptr_t) &pd->tid,
+      .stack = (uintptr_t) stackaddr,
+      .stack_size = stacksize,
+      .tls = (uintptr_t) tp,
+    };
+  int ret = __clone_internal (&args, &start_thread, pd);
+  if (__glibc_unlikely (ret == -1))
     return errno;
 
   /* It's started now, so if we fail below, we'll have to cancel it
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index 501f8fbccd..fd29858cf5 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -31,6 +31,7 @@
 #include <dl-sysdep.h>
 #include <libc-pointer-arith.h>
 #include <ldsodefs.h>
+#include <clone_internal.h>
 #include "spawn_int.h"
 
 /* The Linux implementation of posix_spawn{p} uses the clone syscall directly
@@ -59,21 +60,6 @@
    normal program exit with the exit code 127.  */
 #define SPAWN_ERROR	127
 
-#ifdef __ia64__
-# define CLONE(__fn, __stackbase, __stacksize, __flags, __args) \
-  __clone2 (__fn, __stackbase, __stacksize, __flags, __args, 0, 0, 0)
-#else
-# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
-  __clone (__fn, __stack, __flags, __args)
-#endif
-
-/* Since ia64 wants the stackbase w/clone2, re-use the grows-up macro.  */
-#if _STACK_GROWS_UP || defined (__ia64__)
-# define STACK(__stack, __stack_size) (__stack)
-#elif _STACK_GROWS_DOWN
-# define STACK(__stack, __stack_size) (__stack + __stack_size)
-#endif
-
 
 struct posix_spawn_args
 {
@@ -378,8 +364,14 @@ __spawnix (pid_t * pid, const char *file,
      need for CLONE_SETTLS.  Although parent and child share the same TLS
      namespace, there will be no concurrent access for TLS variables (errno
      for instance).  */
-  new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
-		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
+  struct clone_args clone_args =
+    {
+      .flags = CLONE_VM | CLONE_VFORK,
+      .exit_signal = SIGCHLD,
+      .stack = (uintptr_t) stack,
+      .stack_size = stack_size,
+    };
+  new_pid = __clone_internal (&clone_args, __spawni_child, &args);
 
   /* It needs to collect the case where the auxiliary process was created
      but failed to execute the file (due either any preparation step or
-- 
2.31.1



* [PATCH v5 2/5] nptl: Always pass stack size to create_thread
  2021-05-15 12:34 [PATCH v5 0/5] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
  2021-05-15 12:34 ` [PATCH v5 1/5] " H.J. Lu
@ 2021-05-15 12:34 ` H.J. Lu
  2021-05-20 14:26   ` Florian Weimer
  2021-05-15 12:34 ` [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal H.J. Lu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: H.J. Lu @ 2021-05-15 12:34 UTC
  To: libc-alpha; +Cc: Florian Weimer, Adhemerval Zanella

Since the stack size argument for create_thread is now unconditional,
always pass stack size to create_thread.
---
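With this change allocate_stack reports the lowest address of the stack
block and its size, and the conversion to the pointer that clone
expects is left to __clone_internal.  That conversion can be sketched
as below.  This is an illustration only: enum stack_direction and
initial_stack_pointer are hypothetical names, and the direction is a
run-time parameter here, whereas glibc selects it at build time through
_STACK_GROWS_DOWN and _STACK_GROWS_UP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run-time stand-in for the build-time stack direction.  */
enum stack_direction
{
  SKETCH_STACK_GROWS_DOWN,
  SKETCH_STACK_GROWS_UP
};

/* Given the lowest address BASE of a stack block and its SIZE in
   bytes, return the address at which a new thread starts using it.  */
static void *
initial_stack_pointer (void *base, size_t size, enum stack_direction dir)
{
  assert (base != NULL && size > 0);

  if (dir == SKETCH_STACK_GROWS_DOWN)
    /* A downward-growing stack starts at the end of the block.  */
    return (char *) base + size;

  /* An upward-growing stack starts at the lowest address.  */
  return base;
}
```

Keeping the base and size together means that one description of the
stack serves clone3 and clone2, which take both values, as well as
clone, which takes only the computed pointer.
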
 nptl/allocatestack.c  | 59 ++++---------------------------------------
 nptl/createthread.c   |  3 ++-
 nptl/pthread_create.c | 17 +++++++------
 3 files changed, 16 insertions(+), 63 deletions(-)

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index c0a5c4d96d..1a9ba5a52a 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -33,47 +33,6 @@
 #include <kernel-features.h>
 #include <nptl-stack.h>
 
-#ifndef NEED_SEPARATE_REGISTER_STACK
-
-/* Most architectures have exactly one stack pointer.  Some have more.  */
-# define STACK_VARIABLES void *stackaddr = NULL
-
-/* How to pass the values to the 'create_thread' function.  */
-# define STACK_VARIABLES_ARGS stackaddr
-
-/* How to declare function which gets there parameters.  */
-# define STACK_VARIABLES_PARMS void *stackaddr
-
-/* How to declare allocate_stack.  */
-# define ALLOCATE_STACK_PARMS void **stack
-
-/* This is how the function is called.  We do it this way to allow
-   other variants of the function to have more parameters.  */
-# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr)
-
-#else
-
-/* We need two stacks.  The kernel will place them but we have to tell
-   the kernel about the size of the reserved address space.  */
-# define STACK_VARIABLES void *stackaddr = NULL; size_t stacksize = 0
-
-/* How to pass the values to the 'create_thread' function.  */
-# define STACK_VARIABLES_ARGS stackaddr, stacksize
-
-/* How to declare function which gets there parameters.  */
-# define STACK_VARIABLES_PARMS void *stackaddr, size_t stacksize
-
-/* How to declare allocate_stack.  */
-# define ALLOCATE_STACK_PARMS void **stack, size_t *stacksize
-
-/* This is how the function is called.  We do it this way to allow
-   other variants of the function to have more parameters.  */
-# define ALLOCATE_STACK(attr, pd) \
-  allocate_stack (attr, pd, &stackaddr, &stacksize)
-
-#endif
-
-
 /* Default alignment of stack.  */
 #ifndef STACK_ALIGN
 # define STACK_ALIGN __alignof__ (long double)
@@ -249,7 +208,7 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
    PDP must be non-NULL.  */
 static int
 allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
-		ALLOCATE_STACK_PARMS)
+		void **stack, size_t *stacksize)
 {
   struct pthread *pd;
   size_t size;
@@ -597,25 +556,17 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
   /* We place the thread descriptor at the end of the stack.  */
   *pdp = pd;
 
-#if _STACK_GROWS_DOWN
   void *stacktop;
 
-# if TLS_TCB_AT_TP
+#if TLS_TCB_AT_TP
   /* The stack begins before the TCB and the static TLS block.  */
   stacktop = ((char *) (pd + 1) - __static_tls_size);
-# elif TLS_DTV_AT_TP
+#elif TLS_DTV_AT_TP
   stacktop = (char *) (pd - 1);
-# endif
+#endif
 
-# ifdef NEED_SEPARATE_REGISTER_STACK
+  *stacksize = stacktop - pd->stackblock;
   *stack = pd->stackblock;
-  *stacksize = stacktop - *stack;
-# else
-  *stack = stacktop;
-# endif
-#else
-  *stack = pd->stackblock;
-#endif
 
   return 0;
 }
diff --git a/nptl/createthread.c b/nptl/createthread.c
index 46943b33fe..2ac83111ec 100644
--- a/nptl/createthread.c
+++ b/nptl/createthread.c
@@ -25,7 +25,8 @@
 
 static int
 create_thread (struct pthread *pd, const struct pthread_attr *attr,
-	       bool *stopped_start, STACK_VARIABLES_PARMS, bool *thread_ran)
+	       bool *stopped_start, void *stackaddr, size_t stacksize,
+	       bool *thread_ran)
 {
   /* If the implementation needs to do some tweaks to the thread after
      it has been created at the OS level, it can set STOPPED_START here.  */
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 770656453d..cacd1285aa 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -202,8 +202,8 @@ static struct rtld_global *__nptl_rtld_global __attribute_used__
    be set to true iff the thread actually started up and then got
    canceled before calling user code (*PD->start_routine).  */
 static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
-			  bool *stopped_start, STACK_VARIABLES_PARMS,
-			  bool *thread_ran);
+			  bool *stopped_start, void *stackaddr,
+			  size_t stacksize, bool *thread_ran);
 
 #include <createthread.c>
 
@@ -457,7 +457,8 @@ int
 __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 		      void *(*start_routine) (void *), void *arg)
 {
-  STACK_VARIABLES;
+  void *stackaddr = NULL;
+  size_t stacksize = 0;
 
   /* Avoid a data race in the multi-threaded case.  */
   if (__libc_single_threaded)
@@ -477,7 +478,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
     }
 
   struct pthread *pd = NULL;
-  int err = ALLOCATE_STACK (iattr, &pd);
+  int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
   int retval = 0;
 
   if (__glibc_unlikely (err != 0))
@@ -622,8 +623,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 
       /* We always create the thread stopped at startup so we can
 	 notify the debugger.  */
-      retval = create_thread (pd, iattr, &stopped_start,
-			      STACK_VARIABLES_ARGS, &thread_ran);
+      retval = create_thread (pd, iattr, &stopped_start, stackaddr,
+			      stacksize, &thread_ran);
       if (retval == 0)
 	{
 	  /* We retain ownership of PD until (a) (see CONCURRENCY NOTES
@@ -654,8 +655,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	}
     }
   else
-    retval = create_thread (pd, iattr, &stopped_start,
-			    STACK_VARIABLES_ARGS, &thread_ran);
+    retval = create_thread (pd, iattr, &stopped_start, stackaddr,
+			    stacksize, &thread_ran);
 
   /* Return to the previous signal mask, after creating the new
      thread.  */
-- 
2.31.1



* [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal
  2021-05-15 12:34 [PATCH v5 0/5] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
  2021-05-15 12:34 ` [PATCH v5 1/5] " H.J. Lu
  2021-05-15 12:34 ` [PATCH v5 2/5] nptl: Always pass stack size to create_thread H.J. Lu
@ 2021-05-15 12:34 ` H.J. Lu
  2021-05-17 13:54   ` Andreas Schwab
  2021-05-20 14:24   ` Florian Weimer
  2021-05-15 12:34 ` [PATCH v5 4/5] x86-64: Add the clone3 wrapper H.J. Lu
  2021-05-15 12:34 ` [PATCH v5 5/5] Add tests for __clone_internal H.J. Lu
  4 siblings, 2 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-15 12:34 UTC
  To: libc-alpha; +Cc: Florian Weimer, Adhemerval Zanella

Export __clone_internal for libpthread.so and __clone_internal tests.
---
 sysdeps/unix/sysv/linux/Versions | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 220bb2dffe..299d4fef9c 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -179,6 +179,7 @@ libc {
     __sigtimedwait;
     # functions used by nscd
     __netlink_assert_response;
+    __clone_internal;
   }
 }
 
-- 
2.31.1



* [PATCH v5 4/5] x86-64: Add the clone3 wrapper
  2021-05-15 12:34 [PATCH v5 0/5] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
                   ` (2 preceding siblings ...)
  2021-05-15 12:34 ` [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal H.J. Lu
@ 2021-05-15 12:34 ` H.J. Lu
  2021-05-20 14:53   ` Florian Weimer
  2021-05-20 18:35   ` Noah Goldstein
  2021-05-15 12:34 ` [PATCH v5 5/5] Add tests for __clone_internal H.J. Lu
  4 siblings, 2 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-15 12:34 UTC
  To: libc-alpha; +Cc: Florian Weimer, Adhemerval Zanella

extern int clone3 (struct clone_args *__cl_args,
		   int (*__func) (void *__arg), void *__arg);
---
 sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
 2 files changed, 94 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S

diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
new file mode 100644
index 0000000000..f7d4036a6a
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
@@ -0,0 +1,92 @@
+/* The clone3 syscall wrapper.  Linux/x86-64 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* clone3() is even more special than fork() as it mucks with stacks
+   and invokes a function in the right context after it's all over.  */
+
+#include <sysdep.h>
+#include <clone-offsets.h>
+
+/* The userland implementation is:
+   int clone3 (struct clone_args *cl_args, int (*func)(void *arg),
+	       void *arg);
+   the kernel entry is:
+   int clone3 (struct clone_args *cl_args, size_t size);
+
+   The parameters are passed in registers from userland:
+   rdi: cl_args
+   rsi: func
+   rdx: arg
+
+   The kernel expects:
+   rax: system call number
+   rdi: cl_args
+   rsi: size  */
+
+        .text
+ENTRY (__clone3)
+	/* Sanity check arguments.  */
+	movq	$-EINVAL, %rax
+	testq	%rdi, %rdi		/* No NULL cl_args pointer.  */
+	jz	SYSCALL_ERROR_LABEL
+	testq	%rsi, %rsi		/* No NULL function pointer.  */
+	jz	SYSCALL_ERROR_LABEL
+
+	/* Save the function pointer in R8 which is preserved by the
+	   syscall.  */
+	movq	%rsi, %r8
+
+	/* Put sizeof (struct clone_args) in ESI.  */
+	movl	$CLONE_ARGS_SIZE , %esi
+
+	/* Do the system call.  */
+	movl	$SYS_ify(clone3), %eax
+
+	/* End FDE now, because in the child the unwind info will be
+	   wrong.  */
+	cfi_endproc
+	syscall
+
+	test	%RAX_LP, %RAX_LP
+	jl	SYSCALL_ERROR_LABEL
+	jz	L(thread_start)
+
+	ret
+
+L(thread_start):
+	cfi_startproc
+	/* Clearing frame pointer is insufficient, use CFI.  */
+	cfi_undefined (rip)
+	/* Clear the frame pointer.  The ABI suggests this be done, to mark
+	   the outermost frame obviously.  */
+	xorl	%ebp, %ebp
+
+	/* Set up arguments for the function call.  */
+	movq	%rdx, %rdi	/* Argument.  */
+	call	*%r8		/* Call function.  */
+	/* Call exit with return value from function call. */
+	movq	%rax, %rdi
+	movl	$SYS_ify(exit), %eax
+	syscall
+	cfi_endproc
+
+	cfi_startproc
+PSEUDO_END (__clone3)
+
+libc_hidden_def (__clone3)
+weak_alias (__clone3, clone3)
diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
index dbad2c788a..f26ffc68ae 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
@@ -377,6 +377,8 @@
 # define HAVE_GETCPU_VSYSCALL		"__vdso_getcpu"
 # define HAVE_CLOCK_GETRES64_VSYSCALL   "__vdso_clock_getres"
 
+# define HAVE_CLONE3_WAPPER			1
+
 # define SINGLE_THREAD_BY_GLOBAL		1
 
 #endif	/* __ASSEMBLER__ */
-- 
2.31.1



* [PATCH v5 5/5] Add tests for __clone_internal
  2021-05-15 12:34 [PATCH v5 0/5] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
                   ` (3 preceding siblings ...)
  2021-05-15 12:34 ` [PATCH v5 4/5] x86-64: Add the clone3 wrapper H.J. Lu
@ 2021-05-15 12:34 ` H.J. Lu
  2021-05-20 15:08   ` Florian Weimer
  4 siblings, 1 reply; 19+ messages in thread
From: H.J. Lu @ 2021-05-15 12:34 UTC
  To: libc-alpha; +Cc: Florian Weimer, Adhemerval Zanella

These tests should be removed if __clone_internal is no longer exported.
---
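The tests describe the child's stack by storing a buffer address in the
64-bit stack field of struct clone_args, and __clone_internal turns the
integer back into a pointer with cast_to_pointer.  A minimal sketch of
that round trip follows.  It is an illustration only:
cast_to_pointer_sketch copies the macro body from patch 1/5, while
struct stack_request, describe_stack and stack_base are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same expansion as the cast_to_pointer macro added in patch 1/5.  */
#define cast_to_pointer_sketch(val) ((void *) (uintptr_t) (val))

/* Hypothetical reduction of struct clone_args to its stack fields.  */
struct stack_request
{
  uint64_t stack;	/* Lowest address of the stack block.  */
  uint64_t stack_size;	/* Size of the stack block in bytes.  */
};

/* Record BASE and SIZE the way a caller fills struct clone_args.  */
static struct stack_request
describe_stack (void *base, size_t size)
{
  assert (base != NULL);

  struct stack_request req =
    {
      .stack = (uintptr_t) base,
      .stack_size = size,
    };
  return req;
}

/* Recover the pointer from the integer field.  */
static void *
stack_base (const struct stack_request *req)
{
  return cast_to_pointer_sketch (req->stack);
}
```

Casting through uintptr_t in both directions keeps the conversion
well defined on targets where pointers are narrower than 64 bits.
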
 sysdeps/unix/sysv/linux/Makefile              |   7 +-
 .../sysv/linux/tst-align-clone-internal.c     |  91 +++++++++++
 sysdeps/unix/sysv/linux/tst-clone-internal.c  |  51 +++++++
 sysdeps/unix/sysv/linux/tst-clone2-internal.c | 142 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-clone3-internal.c | 101 +++++++++++++
 .../unix/sysv/linux/tst-getpid1-internal.c    | 137 +++++++++++++++++
 6 files changed, 528 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/unix/sysv/linux/tst-align-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone2-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-clone3-internal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-getpid1-internal.c

diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 7c1f32b84d..08a7ec7928 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -118,7 +118,12 @@ ifeq ($(have-GLIBC_2.27)$(build-shared),yesyes)
 tests += tst-ofdlocks-compat
 endif
 
-tests-internal += tst-sigcontext-get_pc
+tests-internal += tst-sigcontext-get_pc \
+  tst-align-clone-internal \
+  tst-clone-internal \
+  tst-clone2-internal \
+  tst-clone3-internal \
+  tst-getpid1-internal
 
 CFLAGS-tst-sigcontext-get_pc.c = -fasynchronous-unwind-tables
 
diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
new file mode 100644
index 0000000000..eccc39e255
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
@@ -0,0 +1,91 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <tst-stack-align.h>
+#include <stackinfo.h>
+#include <clone_internal.h>
+
+static int
+f (void *arg)
+{
+  bool ok = true;
+
+  puts ("in f");
+
+  if (TEST_STACK_ALIGN ())
+    ok = false;
+
+  return ok ? 0 : 1;
+}
+
+static int
+do_test (void)
+{
+  bool ok = true;
+
+  puts ("in main");
+
+  if (TEST_STACK_ALIGN ())
+    ok = false;
+
+#ifdef __ia64__
+# define STACK_SIZE 256 * 1024
+#else
+# define STACK_SIZE 128 * 1024
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+  if (p == -1)
+    {
+      printf("clone failed: %m\n");
+      return 1;
+    }
+
+  int e;
+  if (waitpid (p, &e, __WCLONE) != p)
+    {
+      puts ("waitpid failed");
+      kill (p, SIGKILL);
+      return 1;
+    }
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      else
+	puts ("did not terminate correctly");
+      return 1;
+    }
+  if (WEXITSTATUS (e) != 0)
+    ok = false;
+
+  return ok ? 0 : 1;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-clone-internal.c b/sysdeps/unix/sysv/linux/tst-clone-internal.c
new file mode 100644
index 0000000000..587d519bf2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-clone-internal.c
@@ -0,0 +1,51 @@
+/* Test for proper error/errno handling in clone.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* BZ #2386 */
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sched.h>
+#include <clone_internal.h>
+
+int child_fn(void *arg)
+{
+  puts ("FAIL: in child_fn(); should not be here");
+  exit(1);
+}
+
+static int
+do_test (void)
+{
+  int result;
+
+  result = __clone_internal (NULL, child_fn, NULL);
+
+  if (errno != EINVAL || result != -1)
+    {
+      printf ("FAIL: clone()=%d (wanted -1) errno=%d (wanted %d)\n",
+              result, errno, EINVAL);
+      return 1;
+    }
+
+  puts ("All OK");
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
new file mode 100644
index 0000000000..dd8f32c24b
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
@@ -0,0 +1,142 @@
+/* Test if CLONE_VM does not change pthread pid/tid field (BZ #19957)
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <signal.h>
+#include <string.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <stackinfo.h>  /* For _STACK_GROWS_{UP,DOWN}.  */
+#include <clone_internal.h>
+
+#include <support/check.h>
+
+static int sig;
+static int pipefd[2];
+
+static int
+f (void *a)
+{
+  close (pipefd[0]);
+
+  pid_t ppid = getppid ();
+  pid_t pid = getpid ();
+  pid_t tid = syscall (__NR_gettid);
+
+  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
+    FAIL_EXIT1 ("write ppid failed\n");
+  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
+    FAIL_EXIT1 ("write pid failed\n");
+  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
+    FAIL_EXIT1 ("write tid failed\n");
+
+  return 0;
+}
+
+
+static int
+do_test (void)
+{
+  sig = SIGRTMIN;
+  sigset_t ss;
+  sigemptyset (&ss);
+  sigaddset (&ss, sig);
+  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
+    FAIL_EXIT1 ("sigprocmask failed: %m");
+
+  if (pipe2 (pipefd, O_CLOEXEC))
+    FAIL_EXIT1 ("pipe failed: %m");
+
+#ifdef __ia64__
+# define STACK_SIZE 256 * 1024
+#else
+# define STACK_SIZE 128 * 1024
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+
+  close (pipefd[1]);
+
+  if (p == -1)
+    FAIL_EXIT1("clone failed: %m");
+
+  pid_t ppid, pid, tid;
+  if (read (pipefd[0], &ppid, sizeof pid) != sizeof pid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read ppid failed: %m");
+    }
+  if (read (pipefd[0], &pid, sizeof pid) != sizeof pid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read pid failed: %m");
+    }
+  if (read (pipefd[0], &tid, sizeof tid) != sizeof tid)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("read tid failed: %m");
+    }
+
+  close (pipefd[0]);
+
+  int ret = 0;
+
+  pid_t own_pid = getpid ();
+  pid_t own_tid = syscall (__NR_gettid);
+
+  /* Some sanity checks for clone syscall: returned ppid should be current
+     pid and both returned tid/pid should be different from current one.  */
+  if ((ppid != own_pid) || (pid == own_pid) || (tid == own_tid))
+    FAIL_RET ("ppid=%i pid=%i tid=%i | own_pid=%i own_tid=%i",
+	      (int)ppid, (int)pid, (int)tid, (int)own_pid, (int)own_tid);
+
+  int e;
+  if (waitpid (p, &e, __WCLONE) != p)
+    {
+      kill (p, SIGKILL);
+      FAIL_EXIT1 ("waitpid failed");
+    }
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      else
+	puts ("did not terminate correctly");
+      exit (EXIT_FAILURE);
+    }
+  if (WEXITSTATUS (e) != 0)
+    FAIL_EXIT1 ("exit code %d", WEXITSTATUS (e));
+
+  return ret;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
new file mode 100644
index 0000000000..61863e1504
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
@@ -0,0 +1,101 @@
+/* Check if clone (CLONE_THREAD) does not call exit_group (BZ #21512)
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <sched.h>
+#include <signal.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/syscall.h>
+#include <sys/wait.h>
+#include <sys/types.h>
+#include <linux/futex.h>
+
+#include <stackinfo.h>  /* For _STACK_GROWS_{UP,DOWN}.  */
+#include <support/check.h>
+#include <stdatomic.h>
+#include <clone_internal.h>
+
+/* Test if clone call with CLONE_THREAD does not call exit_group.  The 'f'
+   function returns '1', which will be used by clone thread to call the
+   'exit' syscall directly.  If _exit is used instead, exit_group will be
+   used and thus the thread group will finish with return value of '1'
+   (where '2' from main thread is expected.  */
+
+static int
+f (void *a)
+{
+  return 1;
+}
+
+/* Futex wait for TID argument, similar to pthread_join internal
+   implementation.  */
+#define wait_tid(ctid_ptr, ctid_val)					\
+  do {									\
+    __typeof (*(ctid_ptr)) __tid;					\
+    /* We need acquire MO here so that we synchronize with the		\
+       kernel's store to 0 when the clone terminates.  */		\
+    while ((__tid = atomic_load_explicit (ctid_ptr,			\
+					  memory_order_acquire)) != 0)	\
+      futex_wait (ctid_ptr, ctid_val);					\
+  } while (0)
+
+static inline int
+futex_wait (int *futexp, int val)
+{
+#ifdef __NR_futex
+  return syscall (__NR_futex, futexp, FUTEX_WAIT, val);
+#else
+  return syscall (__NR_futex_time64, futexp, FUTEX_WAIT, val);
+#endif
+}
+
+static int
+do_test (void)
+{
+  char st[1024] __attribute__ ((aligned));
+  int clone_flags = CLONE_THREAD;
+  /* Minimum required flags to be used along with CLONE_THREAD.  */
+  clone_flags |= CLONE_VM | CLONE_SIGHAND;
+  /* We will use ctid to call futex to wait for thread exit.  */
+  clone_flags |= CLONE_CHILD_CLEARTID;
+  /* Initialize with a known value.  ctid is set to zero by the kernel after the
+     cloned thread has exited.  */
+#define CTID_INIT_VAL 1
+  pid_t ctid = CTID_INIT_VAL;
+  pid_t tid;
+
+  struct clone_args clone_args =
+    {
+      .flags = clone_flags & ~CSIGNAL,
+      .exit_signal = clone_flags & CSIGNAL,
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+      .child_tid = (uintptr_t) &ctid,
+    };
+  tid = __clone_internal (&clone_args, f, NULL);
+  if (tid == -1)
+    FAIL_EXIT1 ("clone failed: %m");
+
+  wait_tid (&ctid, CTID_INIT_VAL);
+
+  return 2;
+}
+
+#define EXPECTED_STATUS 2
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-getpid1-internal.c b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
new file mode 100644
index 0000000000..1d1109b188
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-getpid1-internal.c
@@ -0,0 +1,137 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sched.h>
+#include <signal.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stackinfo.h>
+#include <clone_internal.h>
+
+#ifndef TEST_CLONE_FLAGS
+#define TEST_CLONE_FLAGS 0
+#endif
+
+static int sig;
+
+static int
+f (void *a)
+{
+  puts ("in f");
+  union sigval sival;
+  sival.sival_int = getpid ();
+  printf ("pid = %d\n", sival.sival_int);
+  if (sigqueue (getppid (), sig, sival) != 0)
+    return 1;
+  return 0;
+}
+
+
+static int
+do_test (void)
+{
+  int mypid = getpid ();
+
+  sig = SIGRTMIN;
+  sigset_t ss;
+  sigemptyset (&ss);
+  sigaddset (&ss, sig);
+  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
+    {
+      printf ("sigprocmask failed: %m\n");
+      return 1;
+    }
+
+#ifdef __ia64__
+# define STACK_SIZE 256 * 1024
+#else
+# define STACK_SIZE 128 * 1024
+#endif
+  char st[STACK_SIZE] __attribute__ ((aligned));
+  struct clone_args clone_args =
+    {
+      .flags = TEST_CLONE_FLAGS & ~CSIGNAL,
+      .exit_signal = TEST_CLONE_FLAGS & CSIGNAL,
+      .stack = (uintptr_t) st,
+      .stack_size = sizeof (st),
+    };
+  pid_t p = __clone_internal (&clone_args, f, 0);
+  if (p == -1)
+    {
+      printf("clone failed: %m\n");
+      return 1;
+    }
+  printf ("new thread: %d\n", (int) p);
+
+  siginfo_t si;
+  do
+    if (sigwaitinfo (&ss, &si) < 0)
+      {
+	printf("sigwaitinfo failed: %m\n");
+	kill (p, SIGKILL);
+	return 1;
+      }
+  while  (si.si_signo != sig || si.si_code != SI_QUEUE);
+
+  int e;
+  if (waitpid (p, &e, __WCLONE) != p)
+    {
+      puts ("waitpid failed");
+      kill (p, SIGKILL);
+      return 1;
+    }
+  if (!WIFEXITED (e))
+    {
+      if (WIFSIGNALED (e))
+	printf ("died from signal %s\n", strsignal (WTERMSIG (e)));
+      else
+	puts ("did not terminate correctly");
+      return 1;
+    }
+  if (WEXITSTATUS (e) != 0)
+    {
+      printf ("exit code %d\n", WEXITSTATUS (e));
+      return 1;
+    }
+
+  if (si.si_int != (int) p)
+    {
+      printf ("expected PID %d, got si_int %d\n", (int) p, si.si_int);
+      kill (p, SIGKILL);
+      return 1;
+    }
+
+  if (si.si_pid != p)
+    {
+      printf ("expected PID %d, got si_pid %d\n", (int) p, (int) si.si_pid);
+      kill (p, SIGKILL);
+      return 1;
+    }
+
+  if (getpid () != mypid)
+    {
+      puts ("my PID changed");
+      return 1;
+    }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.31.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal
  2021-05-15 12:34 ` [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal H.J. Lu
@ 2021-05-17 13:54   ` Andreas Schwab
  2021-05-20 14:24   ` Florian Weimer
  1 sibling, 0 replies; 19+ messages in thread
From: Andreas Schwab @ 2021-05-17 13:54 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha; +Cc: H.J. Lu, Florian Weimer

On May 15 2021, H.J. Lu via Libc-alpha wrote:

> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index 220bb2dffe..299d4fef9c 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -179,6 +179,7 @@ libc {
>      __sigtimedwait;
>      # functions used by nscd
>      __netlink_assert_response;
> +    __clone_internal;

The comment doesn't fit here.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal
  2021-05-15 12:34 ` [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal H.J. Lu
  2021-05-17 13:54   ` Andreas Schwab
@ 2021-05-20 14:24   ` Florian Weimer
  2021-05-22  1:55     ` H.J. Lu
  1 sibling, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2021-05-20 14:24 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha, Adhemerval Zanella

* H. J. Lu:

> Export __clone_internal for libpthread.so and __clone_internal tests.
> ---
>  sysdeps/unix/sysv/linux/Versions | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index 220bb2dffe..299d4fef9c 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -179,6 +179,7 @@ libc {
>      __sigtimedwait;
>      # functions used by nscd
>      __netlink_assert_response;
> +    __clone_internal;
>    }
>  }

I think this won't be necessary after the libpthread move.

We can test the function directly by linking statically.  We already do
this in a few other cases.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 2/5] nptl: Always pass stack size to create_thread
  2021-05-15 12:34 ` [PATCH v5 2/5] nptl: Always pass stack size to create_thread H.J. Lu
@ 2021-05-20 14:26   ` Florian Weimer
  0 siblings, 0 replies; 19+ messages in thread
From: Florian Weimer @ 2021-05-20 14:26 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha, Adhemerval Zanella

* H. J. Lu:

> Since the stack size argument for create_thread is now unconditional,
> always pass stack size to create_thread.

Nice cleanup, thanks.  Looks okay to me.

Florian


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/5] Add an internal wrapper for clone, clone2 and clone3
  2021-05-15 12:34 ` [PATCH v5 1/5] " H.J. Lu
@ 2021-05-20 14:46   ` Florian Weimer
  2021-05-22  1:14     ` H.J. Lu
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2021-05-20 14:46 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha, Adhemerval Zanella

* H. J. Lu:

> diff --git a/include/clone_internal.h b/include/clone_internal.h
> new file mode 100644
> index 0000000000..124f7ba169
> --- /dev/null
> +++ b/include/clone_internal.h
> @@ -0,0 +1,14 @@
> +#ifndef _CLONE3_H
> +#include_next <clone3.h>
> +
> +extern __typeof (clone3) __clone3;
> +
> +/* The internal wrapper of clone and clone3.  */
> +extern __typeof (clone3) __clone_internal;

Maybe mention fallback explicitly?

> diff --git a/include/libc-pointer-arith.h b/include/libc-pointer-arith.h
> index 72e722c5aa..04ba537617 100644
> --- a/include/libc-pointer-arith.h
> +++ b/include/libc-pointer-arith.h
> @@ -37,6 +37,9 @@
>  /* Cast an integer or a pointer VAL to integer with proper type.  */
>  # define cast_to_integer(val) ((__integer_if_pointer_type (val)) (val))
>  
> +/* Cast an integer VAL to void * pointer.  */
> +# define cast_to_pointer(val) ((void *) (uintptr_t) (val))
> +
>  /* Align a value by rounding down to closest size.
>     e.g. Using size of 4096, we get this behavior:
>  	{4095, 4096, 4097} = {0, 4096, 4096}.  */

As a regular backporter, I'd like to see this in a separate commit if
possible.

> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> new file mode 100644
> index 0000000000..c357b0ac14
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone-internal.c

> +#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
> +#define offsetofend(TYPE, MEMBER) \
> +  (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

Missing space after sizeof/offsetof/sizeof_field.  And __alignof below.
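
For illustration, here are the two helper macros with a space before
each opening parenthesis, plus one possible use of offsetofend: checking
whether a caller-supplied size covers a given member.  The struct
demo_args and the function demo_has_stack_size are hypothetical
stand-ins, not the definitions from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *) 0)->MEMBER))
#define offsetofend(TYPE, MEMBER) \
  (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER))

/* Hypothetical argument block, not the real struct clone_args.  */
struct demo_args
{
  uint64_t flags;
  uint64_t stack;
  uint64_t stack_size;
};

/* Return nonzero if a block of SIZE bytes is large enough to contain
   the stack_size member.  */
static int
demo_has_stack_size (size_t size)
{
  return size >= offsetofend (struct demo_args, stack_size);
}
```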

> +int
> +__clone_internal (struct clone_args *cl_args,
> +		  int (*func) (void *arg), void *arg)
> +{
> +  int ret;
> +#ifdef HAVE_CLONE3_WAPPER
> +  /* Try clone3 first.  */
> +  int saved_errno = errno;
> +  ret = __clone3 (cl_args, func, arg);
> +  if (ret != -1 || errno != ENOSYS)
> +    return ret;

*sigh* This will cause breakage in containers again.  Like faccessat2.

I think this is technically the right thing to do.
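
A minimal sketch of the fallback rule in the quoted hunk, assuming
hypothetical stand-in functions instead of the real __clone3 and clone.
It only shows which errors trigger the fallback: ENOSYS does, while an
EPERM from a sandbox is passed straight back to the caller, which is the
container problem mentioned above.

```c
#include <errno.h>

/* Hypothetical signature shared by the newer and the older call.  */
typedef int (*demo_call) (int arg);

/* Stand-in for a newer call on a kernel that lacks it.  */
static int
demo_newer_enosys (int arg)
{
  (void) arg;
  errno = ENOSYS;
  return -1;
}

/* Stand-in for a newer call rejected by a sandbox with EPERM.  */
static int
demo_newer_eperm (int arg)
{
  (void) arg;
  errno = EPERM;
  return -1;
}

/* Stand-in for the older call, which always succeeds here.  */
static int
demo_older (int arg)
{
  return arg + 100;
}

/* Try NEWER first and fall back to OLDER only if NEWER fails with
   ENOSYS.  Any other result, including other errors, is returned
   unchanged.  */
static int
demo_call_with_fallback (demo_call newer, demo_call older, int arg)
{
  int saved_errno = errno;
  int ret = newer (arg);
  if (ret != -1 || errno != ENOSYS)
    return ret;
  /* Restore errno so the failed probe is invisible to the caller.  */
  errno = saved_errno;
  return older (arg);
}
```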

> +  /* NB: Restore errno since errno may be checked against non-zero
> +     return value.  */
> +  __set_errno (saved_errno);
> +#else
> +    /* Check invalid arguments.  */
> +  if (cl_args == NULL || func == NULL)
> +    {
> +      __set_errno (EINVAL);
> +      return -1;
> +    }
> +#endif
> +
> +  /* Map clone3 arguments to clone arguments.  NB: No need to check
> +     invalid clone3 specific bits since this is an internal function.  */

This comment contradicts the check above under the #else.

Maybe the public clone3 wrapper should not have emulation.  This would
push the EPERM problem to callers.  (But it doesn't solve EPERM from
pthread_create of course.)

> diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
> new file mode 100644
> index 0000000000..de963ef89d
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone3.c
> @@ -0,0 +1 @@
> +/* An empty placeholder.  */
> diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
> new file mode 100644
> index 0000000000..a222948d55
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone3.h

> +struct clone_args
> +{
> +  uint64_t flags;	 /* Flags bit mask.  */
> +  uint64_t pidfd;	 /* Where to store PID file descriptor
> +			    (pid_t *).  */
> +  uint64_t child_tid;	 /* Where to store child TID, in child's memory
> +			    (pid_t *).  */
> +  uint64_t parent_tid;	 /* Where to store child TID, in parent's memory
> +			    (int *). */
> +  uint64_t exit_signal;	 /* Signal to deliver to parent on child
> +			    termination */
> +  uint64_t stack;	 /* The lowest address of stack.  */
> +  uint64_t stack_size;	 /* Size of stack.  */
> +  uint64_t tls;		 /* Location of new TLS.  */
> +  uint64_t set_tid;	 /* Pointer to a pid_t array
> +			    (since Linux 5.5).  */
> +  uint64_t set_tid_size; /* Number of elements in set_tid
> +			    (since Linux 5.5). */
> +  uint64_t cgroup;	 /* File descriptor for target cgroup
> +			    of child (since Linux 5.7).  */
> +} __attribute__ ((aligned (8)));

Usually, this kind of use of an ABI-changing attribute would not be
okay, but there is an expectation that the struct will be extended with
new fields in the future, so it is acceptable here.

I know that this is not an installed header yet.  But would you please
add a comment to the end of the struct that new fields will be added in
the future, and that this struct should only be used in an argument to
clone3 (along with its size arguments) and not in a way that defines
some external ABI?
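
To illustrate why such a struct should only travel together with its
size, here is a sketch of a consumer that accepts an older, shorter
layout.  Everything in it (demo_args_v1, demo_args_v2, demo_read_args)
is hypothetical, and it does not reproduce the kernel's exact rules for
clone3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical first version of an extensible argument block.  */
struct demo_args_v1
{
  uint64_t flags;
  uint64_t stack;
};

/* Hypothetical second version: a new member appended at the end.  */
struct demo_args_v2
{
  uint64_t flags;
  uint64_t stack;
  uint64_t stack_size;
};

/* Copy a caller-supplied block of SIZE bytes into the newest layout.
   Members the caller did not know about read as zero.  Return -1 if
   the block is larger than anything this consumer understands.  */
static int
demo_read_args (struct demo_args_v2 *out, const void *in, size_t size)
{
  if (size > sizeof (*out))
    return -1;
  memset (out, 0, sizeof (*out));
  memcpy (out, in, size);
  return 0;
}
```

A caller that embeds the struct in its own ABI loses this flexibility,
because the size it was compiled with is then fixed elsewhere.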

> +/* The wrapper of clone3.  */
> +extern int clone3 (struct clone_args *__cl_args,
> +		   int (*__func) (void *__arg), void *__arg);

Sorry, the public clone3 system call wrapper will have to retain its
size argument.  I didn't realize things were moving in that direction.
I think __clone_internal should still avoid the size argument, but
__clone3 should have it (to align with public clone3).

Rest looks okay to me, thanks.

Florian


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/5] x86-64: Add the clone3 wrapper
  2021-05-15 12:34 ` [PATCH v5 4/5] x86-64: Add the clone3 wrapper H.J. Lu
@ 2021-05-20 14:53   ` Florian Weimer
  2021-05-22  1:38     ` H.J. Lu
  2021-05-20 18:35   ` Noah Goldstein
  1 sibling, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2021-05-20 14:53 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha, Adhemerval Zanella

* H. J. Lu:

> extern int clone3 (struct clone_args *__cl_args,
> 		   int (*__func) (void *__arg), void *__arg);
> ---
>  sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
>  2 files changed, 94 insertions(+)
>  create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S
>
> diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> new file mode 100644
> index 0000000000..f7d4036a6a

> +        .text
> +ENTRY (__clone3)
> +	/* Sanity check arguments.  */
> +	movq	$-EINVAL, %rax
> +	testq	%rdi, %rdi		/* No NULL cl_args pointer.  */
> +	jz	SYSCALL_ERROR_LABEL
> +	testq	%rsi, %rsi		/* No NULL function pointer.  */
> +	jz	SYSCALL_ERROR_LABEL

I think some of these register accesses aren't x32-compatible.  Isn't the upper
half undefined?

> +	/* Save the function pointer in R8 which is preserved by the
> +	   syscall.  */
> +	movq	%rsi, %r8
> +
> +	/* Put sizeof (struct clone_args) in ESI.  */
> +	movl	$CLONE_ARGS_SIZE , %esi

If this is in preparation of the public wrapper, this should actually be
an argument.  Sorry, I didn't realize this was the direction.

> +L(thread_start):
> +	cfi_startproc
> +	/* Clearing frame pointer is insufficient, use CFI.  */
> +	cfi_undefined (rip)
> +	/* Clear the frame pointer.  The ABI suggests this be done, to mark
> +	   the outermost frame obviously.  */
> +	xorl	%ebp, %ebp
> +
> +	/* Set up arguments for the function call.  */
> +	movq	%rdx, %rdi	/* Argument.  */
> +	call	*%r8		/* Call function.  */
> +	/* Call exit with return value from function call. */
> +	movq	%rax, %rdi
> +	movl	$SYS_ify(exit), %eax
> +	syscall
> +	cfi_endproc
> +
> +	cfi_startproc
> +PSEUDO_END (__clone3)

If this is a public wrapper, should it round up %rsp to 16 bytes
at the point of the caller, to follow the x86-64 calling convention?
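
As a sketch of the arithmetic involved, assuming a downward-growing
stack described by a base address and a size: the initial stack pointer
is the end of the region, and clearing the low four bits aligns it to
16 bytes.  The name demo_aligned_stack_top is hypothetical, and whether
the wrapper itself should do this is the open question above.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the highest 16-byte-aligned address that does not exceed the
   end of the stack region [BASE, BASE + SIZE).  */
static uintptr_t
demo_aligned_stack_top (uintptr_t base, size_t size)
{
  return (base + size) & ~(uintptr_t) 15;
}
```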

Thanks,
Florian


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] Add tests for __clone_internal
  2021-05-15 12:34 ` [PATCH v5 5/5] Add tests for __clone_internal H.J. Lu
@ 2021-05-20 15:08   ` Florian Weimer
  2021-05-22  1:54     ` H.J. Lu
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2021-05-20 15:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha, Adhemerval Zanella

* H. J. Lu:

> diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> new file mode 100644
> index 0000000000..eccc39e255
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c

> +  int e;
> +  if (waitpid (p, &e, __WCLONE) != p)
> +    {
> +      puts ("waitpid failed");
> +      kill (p, SIGKILL);
> +      return 1;
> +    }

This could use xwaitpid.  The same comment applies to other tests.
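
A rough sketch of what such an error-checking helper buys the caller.
demo_xwaitpid is a hypothetical stand-in written for this example, not
the implementation in support/; it terminates the process when waitpid
fails, so the call site needs no error branch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Like waitpid, but terminate the process if the call fails.  */
static pid_t
demo_xwaitpid (pid_t pid, int *status, int flags)
{
  pid_t ret = waitpid (pid, status, flags);
  if (ret < 0)
    {
      perror ("waitpid");
      exit (EXIT_FAILURE);
    }
  return ret;
}

/* Fork a child that exits with CODE and return the exit status seen
   by the parent, or -1 if fork failed or the child did not exit
   normally.  */
static int
demo_child_exit_status (int code)
{
  pid_t p = fork ();
  if (p < 0)
    return -1;
  if (p == 0)
    _exit (code);

  int status;
  demo_xwaitpid (p, &status, 0);
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```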

> diff --git a/sysdeps/unix/sysv/linux/tst-clone-internal.c b/sysdeps/unix/sysv/linux/tst-clone-internal.c
> new file mode 100644
> index 0000000000..587d519bf2
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-clone-internal.c

> +static int
> +do_test (void)
> +{
> +  int result;
> +
> +  result = __clone_internal (NULL, child_fn, NULL);
> +
> +  if (errno != EINVAL || result != -1)
> +    {
> +      printf ("FAIL: clone()=%d (wanted -1) errno=%d (wanted %d)\n",
> +              result, errno, EINVAL);
> +      return 1;
> +    }
> +
> +  puts ("All OK");
> +  return 0;
> +}

I think this test is invalid for the internal function (see the comment
about not checking arguments).

> diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> new file mode 100644
> index 0000000000..dd8f32c24b
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> @@ -0,0 +1,142 @@

> +
> +static int
> +f (void *a)
> +{
> +  close (pipefd[0]);
> +
> +  pid_t ppid = getppid ();
> +  pid_t pid = getpid ();
> +  pid_t tid = syscall (__NR_gettid);
> +
> +  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
> +    FAIL_EXIT1 ("write ppid failed\n");
> +  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
> +    FAIL_EXIT1 ("write pid failed\n");
> +  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
> +    FAIL_EXIT1 ("write tid failed\n");
> +
> +  return 0;
> +}

You could use support_shared_allocate for the parent/child communication
instead of a pipe.  The MAP_SHARED mapping overrides the lack of
CLONE_VM.
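
A sketch of the idea using plain mmap and fork rather than
support_shared_allocate and __clone_internal, so that it stands on its
own.  The names demo_ids and demo_shared_report are hypothetical.  The
child writes its IDs into a MAP_SHARED mapping and the parent reads them
back after the child has exited, with no pipe involved.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical report filled in by the child.  */
struct demo_ids
{
  pid_t ppid;
  pid_t pid;
};

/* Return 0 if the child's report matches what the parent expects,
   -1 otherwise.  */
static int
demo_shared_report (void)
{
  struct demo_ids *ids = mmap (NULL, sizeof (*ids),
			       PROT_READ | PROT_WRITE,
			       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (ids == MAP_FAILED)
    return -1;

  pid_t p = fork ();
  if (p < 0)
    {
      munmap (ids, sizeof (*ids));
      return -1;
    }
  if (p == 0)
    {
      /* The mapping is shared, so these stores are visible to the
	 parent even though the address space is otherwise private.  */
      ids->ppid = getppid ();
      ids->pid = getpid ();
      _exit (0);
    }

  int status;
  int ok = (waitpid (p, &status, 0) == p
	    && WIFEXITED (status)
	    && WEXITSTATUS (status) == 0
	    && ids->ppid == getpid ()
	    && ids->pid == p);
  munmap (ids, sizeof (*ids));
  return ok ? 0 : -1;
}
```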

> +static int
> +do_test (void)
> +{
> +  sig = SIGRTMIN;
> +  sigset_t ss;
> +  sigemptyset (&ss);
> +  sigaddset (&ss, sig);
> +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> +    FAIL_EXIT1 ("sigprocmask failed: %m");

You could use xpthread_sigmask.  (Applies to tst-getpid1-internal.c as
well.)


> +  pid_t own_pid = getpid ();
> +  pid_t own_tid = syscall (__NR_gettid);

We have gettid nowadays.

> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> new file mode 100644
> index 0000000000..61863e1504
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> +#include <stackinfo.h>  /* For _STACK_GROWS_{UP,DOWN}.  */

No longer needed! 8-)

> +#include <support/check.h>
> +#include <stdatomic.h>
> +#include <clone_internal.h>
> +
> +/* Test if clone call with CLONE_THREAD does not call exit_group.  The 'f'
> +   function returns '1', which will be used by clone thread to call the
> +   'exit' syscall directly.  If _exit is used instead, exit_group will be
> +   used and thus the thread group will finish with return value of '1'
> +   (where '2' from main thread is expected.  */

Missing ).

The rest looks okay as far as I can tell.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/5] x86-64: Add the clone3 wrapper
  2021-05-15 12:34 ` [PATCH v5 4/5] x86-64: Add the clone3 wrapper H.J. Lu
  2021-05-20 14:53   ` Florian Weimer
@ 2021-05-20 18:35   ` Noah Goldstein
  2021-05-20 18:39     ` Noah Goldstein
  1 sibling, 1 reply; 19+ messages in thread
From: Noah Goldstein @ 2021-05-20 18:35 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library, Florian Weimer

On Sat, May 15, 2021 at 9:23 AM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> extern int clone3 (struct clone_args *__cl_args,
>                    int (*__func) (void *__arg), void *__arg);
> ---
>  sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
>  2 files changed, 94 insertions(+)
>  create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S
>
> diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> new file mode 100644
> index 0000000000..f7d4036a6a
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> @@ -0,0 +1,92 @@
> +/* The clone3 syscall wrapper.  Linux/x86-64 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* clone3() is even more special than fork() as it mucks with stacks
> +   and invokes a function in the right context after its all over.  */
> +
> +#include <sysdep.h>
> +#include <clone-offsets.h>
> +
> +/* The userland implementation is:
> +   int clone3 (struct clone_args *cl_args, int (*func)(void *arg),
> +              void *arg);
> +   the kernel entry is:
> +   int clone3 (struct clone_args *cl_args, size_t size);
> +
> +   The parameters are passed in registers from userland:
> +   rdi: cl_args
> +   rsi: func
> +   rdx: arg
> +
> +   The kernel expects:
> +   rax: system call number
> +   rdi: cl_args
> +   rsi: size  */
> +
> +        .text
> +ENTRY (__clone3)
> +       /* Sanity check arguments.  */
> +       movq    $-EINVAL, %rax

Can this be movl?

> +       testq   %rdi, %rdi              /* No NULL cl_args pointer.  */
> +       jz      SYSCALL_ERROR_LABEL
> +       testq   %rsi, %rsi              /* No NULL function pointer.  */
> +       jz      SYSCALL_ERROR_LABEL
> +
> +       /* Save the function pointer in R8 which is preserved by the
> +          syscall.  */
> +       movq    %rsi, %r8
> +
> +       /* Put sizeof (struct clone_args) in ESI.  */
> +       movl    $CLONE_ARGS_SIZE , %esi
> +
> +       /* Do the system call.  */
> +       movl    $SYS_ify(clone3), %eax
> +
> +       /* End FDE now, because in the child the unwind info will be
> +          wrong.  */
> +       cfi_endproc
> +       syscall
> +
> +       test    %RAX_LP, %RAX_LP
> +       jl      SYSCALL_ERROR_LABEL
> +       jz      L(thread_start)
> +

Is the expectation to go to L(thread_start)?  If so, I
think jnz L(ret) with a fallthrough to L(thread_start)
is probably better.

> +       ret
> +
> +L(thread_start):
> +       cfi_startproc
> +       /* Clearing frame pointer is insufficient, use CFI.  */
> +       cfi_undefined (rip)
> +       /* Clear the frame pointer.  The ABI suggests this be done, to mark
> +          the outermost frame obviously.  */
> +       xorl    %ebp, %ebp
> +
> +       /* Set up arguments for the function call.  */
> +       movq    %rdx, %rdi      /* Argument.  */
> +       call    *%r8            /* Call function.  */
> +       /* Call exit with return value from function call. */
> +       movq    %rax, %rdi
> +       movl    $SYS_ify(exit), %eax
> +       syscall
> +       cfi_endproc
> +
> +       cfi_startproc
> +PSEUDO_END (__clone3)
> +
> +libc_hidden_def (__clone3)
> +weak_alias (__clone3, clone3)
> diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> index dbad2c788a..f26ffc68ae 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> +++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> @@ -377,6 +377,8 @@
>  # define HAVE_GETCPU_VSYSCALL          "__vdso_getcpu"
>  # define HAVE_CLOCK_GETRES64_VSYSCALL   "__vdso_clock_getres"
>
> +# define HAVE_CLONE3_WAPPER                    1
> +
>  # define SINGLE_THREAD_BY_GLOBAL               1
>
>  #endif /* __ASSEMBLER__ */
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/5] x86-64: Add the clone3 wrapper
  2021-05-20 18:35   ` Noah Goldstein
@ 2021-05-20 18:39     ` Noah Goldstein
  2021-05-22  1:52       ` H.J. Lu
  0 siblings, 1 reply; 19+ messages in thread
From: Noah Goldstein @ 2021-05-20 18:39 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library, Florian Weimer

On Thu, May 20, 2021 at 2:35 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sat, May 15, 2021 at 9:23 AM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > extern int clone3 (struct clone_args *__cl_args,
> >                    int (*__func) (void *__arg), void *__arg);
> > ---
> >  sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
> >  sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
> >  2 files changed, 94 insertions(+)
> >  create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S
> >
> > diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> > new file mode 100644
> > index 0000000000..f7d4036a6a
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> > @@ -0,0 +1,92 @@
> > +/* The clone3 syscall wrapper.  Linux/x86-64 version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +/* clone3() is even more special than fork() as it mucks with stacks
> > +   and invokes a function in the right context after its all over.  */
> > +
> > +#include <sysdep.h>
> > +#include <clone-offsets.h>
> > +
> > +/* The userland implementation is:
> > +   int clone3 (struct clone_args *cl_args, int (*func)(void *arg),
> > +              void *arg);
> > +   the kernel entry is:
> > +   int clone3 (struct clone_args *cl_args, size_t size);
> > +
> > +   The parameters are passed in registers from userland:
> > +   rdi: cl_args
> > +   rsi: func
> > +   rdx: arg
> > +
> > +   The kernel expects:
> > +   rax: system call number
> > +   rdi: cl_args
> > +   rsi: size  */
> > +
> > +        .text
> > +ENTRY (__clone3)
> > +       /* Sanity check arguments.  */
> > +       movq    $-EINVAL, %rax
>
> Can this be movl?
>
> > +       testq   %rdi, %rdi              /* No NULL cl_args pointer.  */
> > +       jz      SYSCALL_ERROR_LABEL
> > +       testq   %rsi, %rsi              /* No NULL function pointer.  */
> > +       jz      SYSCALL_ERROR_LABEL
> > +
> > +       /* Save the function pointer in R8 which is preserved by the
> > +          syscall.  */
> > +       movq    %rsi, %r8
> > +
> > +       /* Put sizeof (struct clone_args) in ESI.  */
> > +       movl    $CLONE_ARGS_SIZE , %esi
> > +
> > +       /* Do the system call.  */
> > +       movl    $SYS_ify(clone3), %eax
> > +
> > +       /* End FDE now, because in the child the unwind info will be
> > +          wrong.  */
> > +       cfi_endproc
> > +       syscall
> > +
> > +       test    %RAX_LP, %RAX_LP
> > +       jl      SYSCALL_ERROR_LABEL
> > +       jz      L(thread_start)
> > +
>
> Is the expectation to go to L(thread_start)?  If so, I
> think jnz L(ret) with a fallthrough to L(thread_start)
> is probably better.

Or better, take the error-check branch off
the critical path with jnz L(error_or_ret), then do the jl
in L(error_or_ret).

>
> > +       ret
> > +
> > +L(thread_start):
> > +       cfi_startproc
> > +       /* Clearing frame pointer is insufficient, use CFI.  */
> > +       cfi_undefined (rip)
> > +       /* Clear the frame pointer.  The ABI suggests this be done, to mark
> > +          the outermost frame obviously.  */
> > +       xorl    %ebp, %ebp
> > +
> > +       /* Set up arguments for the function call.  */
> > +       movq    %rdx, %rdi      /* Argument.  */
> > +       call    *%r8            /* Call function.  */
> > +       /* Call exit with return value from function call. */
> > +       movq    %rax, %rdi
> > +       movl    $SYS_ify(exit), %eax
> > +       syscall
> > +       cfi_endproc
> > +
> > +       cfi_startproc
> > +PSEUDO_END (__clone3)
> > +
> > +libc_hidden_def (__clone3)
> > +weak_alias (__clone3, clone3)
> > diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> > index dbad2c788a..f26ffc68ae 100644
> > --- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> > +++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> > @@ -377,6 +377,8 @@
> >  # define HAVE_GETCPU_VSYSCALL          "__vdso_getcpu"
> >  # define HAVE_CLOCK_GETRES64_VSYSCALL   "__vdso_clock_getres"
> >
> > +# define HAVE_CLONE3_WAPPER                    1
> > +
> >  # define SINGLE_THREAD_BY_GLOBAL               1
> >
> >  #endif /* __ASSEMBLER__ */
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/5] Add an internal wrapper for clone, clone2 and clone3
  2021-05-20 14:46   ` Florian Weimer
@ 2021-05-22  1:14     ` H.J. Lu
  0 siblings, 0 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-22  1:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library, Adhemerval Zanella

On Thu, May 20, 2021 at 7:46 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > diff --git a/include/clone_internal.h b/include/clone_internal.h
> > new file mode 100644
> > index 0000000000..124f7ba169
> > --- /dev/null
> > +++ b/include/clone_internal.h
> > @@ -0,0 +1,14 @@
> > +#ifndef _CLONE3_H
> > +#include_next <clone3.h>
> > +
> > +extern __typeof (clone3) __clone3;
> > +
> > +/* The internal wrapper of clone and clone3.  */
> > +extern __typeof (clone3) __clone_internal;
>
> Maybe mention fallback explicitly?

Done.

> > diff --git a/include/libc-pointer-arith.h b/include/libc-pointer-arith.h
> > index 72e722c5aa..04ba537617 100644
> > --- a/include/libc-pointer-arith.h
> > +++ b/include/libc-pointer-arith.h
> > @@ -37,6 +37,9 @@
> >  /* Cast an integer or a pointer VAL to integer with proper type.  */
> >  # define cast_to_integer(val) ((__integer_if_pointer_type (val)) (val))
> >
> > +/* Cast an integer VAL to void * pointer.  */
> > +# define cast_to_pointer(val) ((void *) (uintptr_t) (val))
> > +
> >  /* Align a value by rounding down to closest size.
> >     e.g. Using size of 4096, we get this behavior:
> >       {4095, 4096, 4097} = {0, 4096, 4096}.  */
>
> As a regular backporter, I'd like to see this in a separate commit if
> possible.

Done.
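
For reference, a minimal sketch of the round trip cast_to_pointer is
meant for: a pointer stored in a 64-bit field (as the clone_args members
are) and recovered later.  The macro is the one from the hunk above; the
two helper names are hypothetical and only illustrate the usage.

```c
#include <stdint.h>

/* Same definition as in the quoted hunk.  */
#define cast_to_pointer(val) ((void *) (uintptr_t) (val))

/* Hypothetical helper: store a pointer in a 64-bit field, as the
   clone_args members do.  */
static uint64_t
pointer_to_u64 (void *p)
{
  return (uint64_t) (uintptr_t) p;
}

/* Hypothetical helper: recover the pointer from the 64-bit field.  */
static void *
u64_to_pointer (uint64_t v)
{
  return cast_to_pointer (v);
}
```

Going through uintptr_t first avoids an integer-to-pointer size mismatch
warning on targets where pointers are narrower than 64 bits.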

> > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> > new file mode 100644
> > index 0000000000..c357b0ac14
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone-internal.c
>
> > +#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
> > +#define offsetofend(TYPE, MEMBER) \
> > +  (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
>
> Missing space after sizeof/offsetof/sizeof_field.  And __alignof below.

Fixed.
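
A sketch of the two macros with GNU-style spacing, plus a small usage
example.  The struct demo_args and the helper demo_prefix_size are
made-up names for illustration, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* GNU style: a space between the macro or operator name and "(".  */
#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *) 0)->MEMBER))
#define offsetofend(TYPE, MEMBER) \
  (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER))

/* Hypothetical two-field struct standing in for a growing clone_args.  */
struct demo_args
{
  uint64_t flags;
  uint64_t stack;
};

/* Size of the struct prefix that ends with the "flags" member.  */
static size_t
demo_prefix_size (void)
{
  return offsetofend (struct demo_args, flags);
}
```

offsetofend gives the size of the struct up to and including a member,
which is the natural way to name "the layout as of field X" when a
struct grows over time.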

> > +int
> > +__clone_internal (struct clone_args *cl_args,
> > +               int (*func) (void *arg), void *arg)
> > +{
> > +  int ret;
> > +#ifdef HAVE_CLONE3_WAPPER
> > +  /* Try clone3 first.  */
> > +  int saved_errno = errno;
> > +  ret = __clone3 (cl_args, func, arg);
> > +  if (ret != -1 || errno != ENOSYS)
> > +    return ret;
>
> *sigh* This will cause breakage in containers again.  Like faccessat2.
>
> I think this is technically the right thing to do.
>
> > +  /* NB: Restore errno since errno may be checked against non-zero
> > +     return value.  */
> > +  __set_errno (saved_errno);
> > +#else
> > +    /* Check invalid arguments.  */
> > +  if (cl_args == NULL || func == NULL)
> > +    {
> > +      __set_errno (EINVAL);
> > +      return -1;
> > +    }

This block is removed.

> > +#endif
> > +
> > +  /* Map clone3 arguments to clone arguments.  NB: No need to check
> > +     invalid clone3 specific bits since this is an internal function.  */
>
> This comment contradicts the check above under the #else.

Fixed.

> Maybe the public clone3 wrapper should not have emulation.  This would
> push the EPERM problem to callers.  (But it doesn't solve EPERM from
> pthread_create of course.)

I don't think the public clone3 wrapper should have emulation.
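
To make the fallback semantics concrete, here is a minimal sketch of the
pattern outside glibc.  It is not the patch code: spawn_fn,
spawn_with_fallback and the stub functions are hypothetical names, and
plain errno assignment stands in for __set_errno.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a clone-like entry point.  */
typedef int (*spawn_fn) (void *args);

/* Try PRIMARY first.  If it fails with ENOSYS, restore the caller's
   errno and retry with FALLBACK; any other result is returned as is.  */
static int
spawn_with_fallback (spawn_fn primary, spawn_fn fallback, void *args)
{
  int saved_errno = errno;
  int ret = primary (args);
  if (ret != -1 || errno != ENOSYS)
    return ret;
  errno = saved_errno;
  return fallback (args);
}

/* Stub that behaves like a kernel without the new system call.  */
static int
stub_enosys (void *args)
{
  (void) args;
  errno = ENOSYS;
  return -1;
}

/* Stub that fails for a reason the fallback must not hide.  */
static int
stub_eperm (void *args)
{
  (void) args;
  errno = EPERM;
  return -1;
}

/* Stub that succeeds and reports a made-up child ID.  */
static int
stub_ok (void *args)
{
  (void) args;
  return 42;
}
```

Note that only ENOSYS triggers the fallback.  A sandbox that answers
EPERM for an unknown system call is passed straight through to the
caller, which is the container breakage mentioned above.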

> > diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c
> > new file mode 100644
> > index 0000000000..de963ef89d
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone3.c
> > @@ -0,0 +1 @@
> > +/* An empty placeholder.  */
> > diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h
> > new file mode 100644
> > index 0000000000..a222948d55
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone3.h
>
> > +struct clone_args
> > +{
> > +  uint64_t flags;     /* Flags bit mask.  */
> > +  uint64_t pidfd;     /* Where to store PID file descriptor
> > +                         (pid_t *).  */
> > +  uint64_t child_tid;         /* Where to store child TID, in child's memory
> > +                         (pid_t *).  */
> > +  uint64_t parent_tid;        /* Where to store child TID, in parent's memory
> > +                         (int *). */
> > +  uint64_t exit_signal;       /* Signal to deliver to parent on child
> > +                         termination */
> > +  uint64_t stack;     /* The lowest address of stack.  */
> > +  uint64_t stack_size;        /* Size of stack.  */
> > +  uint64_t tls;               /* Location of new TLS.  */
> > +  uint64_t set_tid;   /* Pointer to a pid_t array
> > +                         (since Linux 5.5).  */
> > +  uint64_t set_tid_size; /* Number of elements in set_tid
> > +                         (since Linux 5.5). */
> > +  uint64_t cgroup;    /* File descriptor for target cgroup
> > +                         of child (since Linux 5.7).  */
> > +} __attribute__ ((aligned (8)));
>
> Usually, this kind of use of an ABI-changing attribute would not be
> okay, but there is an expectation that the struct will be extended with
> new fields in the future, so it seems acceptable here.
>
> I know that this is not an installed header yet.  But would you please
> add a comment to the end of the struct that new fields will be added in
> the future, and that this struct should only be used in an argument to
> clone3 (along with its size arguments) and not in a way that defines
> some external ABI?

I added:

/* This struct should only be used in an argument to the clone3 system
   call (along with its size argument).  It may be extended with new
   fields in the future.  */

struct clone_args
{
  ...

> > +/* The wrapper of clone3.  */
> > +extern int clone3 (struct clone_args *__cl_args,
> > +                int (*__func) (void *__arg), void *__arg);
>
> Sorry, the public clone3 system call wrapper will have to retain its
> size argument.  I didn't realize things were moving in that direction.
> I think __clone_internal should still avoid the size argument, but
> __clone3 should have it (to align with public clone3).

I added the size argument to the clone3 wrapper.
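
As background on why the size argument matters, the sketch below shows
how a receiver can accept both older and newer layouts of an extensible
struct when the caller passes the size it was compiled with.  It is
loosely modelled on the usual extensible-struct convention and is not
the kernel's code; args_v0, args_v1 and copy_sized_args are hypothetical
names.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" and "new" layouts of an extensible argument struct.  */
struct args_v0 { uint64_t flags; };
struct args_v1 { uint64_t flags; uint64_t extra; };

/* Copy SIZE bytes of caller data into *DST, which uses the newest layout
   the receiver knows.  Shorter input is zero-extended; longer input is
   accepted only if the unknown tail is all zero.  Returns 0 or an errno
   value.  */
static int
copy_sized_args (struct args_v1 *dst, const void *src, size_t size)
{
  const unsigned char *bytes = src;
  size_t known = sizeof (*dst);

  /* Reject non-zero data in fields this receiver does not understand.  */
  for (size_t i = known; i < size; i++)
    if (bytes[i] != 0)
      return E2BIG;

  memset (dst, 0, known);
  memcpy (dst, src, size < known ? size : known);
  return 0;
}
```

With a fixed sizeof baked into the wrapper, a caller built against a
newer struct definition could not pass the extra fields through, so the
size has to come from the caller.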

> Rest looks okay to me, thanks.
>
> Florian
>

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/5] x86-64: Add the clone3 wrapper
  2021-05-20 14:53   ` Florian Weimer
@ 2021-05-22  1:38     ` H.J. Lu
  0 siblings, 0 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-22  1:38 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library, Adhemerval Zanella

On Thu, May 20, 2021 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > extern int clone3 (struct clone_args *__cl_args,
> >                  int (*__func) (void *__arg), void *__arg);
> > ---
> >  sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
> >  sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
> >  2 files changed, 94 insertions(+)
> >  create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S
> >
> > diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> > new file mode 100644
> > index 0000000000..f7d4036a6a
>
> > +        .text
> > +ENTRY (__clone3)
> > +     /* Sanity check arguments.  */
> > +     movq    $-EINVAL, %rax
> > +     testq   %rdi, %rdi              /* No NULL cl_args pointer.  */
> > +     jz      SYSCALL_ERROR_LABEL
> > +     testq   %rsi, %rsi              /* No NULL function pointer.  */
> > +     jz      SYSCALL_ERROR_LABEL
>
> I think some of these register aren't x32-compatible.  Isn't the upper
> half undefined?

All pointers passed in registers are zero-extended to 64 bits.
I changed it to use REG_LP macros to avoid the REX prefix.

> > +     /* Save the function pointer in R8 which is preserved by the
> > +        syscall.  */
> > +     movq    %rsi, %r8
> > +
> > +     /* Put sizeof (struct clone_args) in ESI.  */
> > +     movl    $CLONE_ARGS_SIZE , %esi
>
> If this is in preparation of the public wrapper, this should actually be
> an argument.  Sorry didn't realize this was the direction.

Fixed.

> > +L(thread_start):
> > +     cfi_startproc
> > +     /* Clearing frame pointer is insufficient, use CFI.  */
> > +     cfi_undefined (rip)
> > +     /* Clear the frame pointer.  The ABI suggests this be done, to mark
> > +        the outermost frame obviously.  */
> > +     xorl    %ebp, %ebp
> > +
> > +     /* Set up arguments for the function call.  */
> > +     movq    %rdx, %rdi      /* Argument.  */
> > +     call    *%r8            /* Call function.  */
> > +     /* Call exit with return value from function call. */
> > +     movq    %rax, %rdi
> > +     movl    $SYS_ify(exit), %eax
> > +     syscall
> > +     cfi_endproc
> > +
> > +     cfi_startproc
> > +PSEUDO_END (__clone3)
>
> If this is a public wrapper, should it round up %rsp to 16 bytes

Fixed.

> at the point of the caller, to follow the x86-64 calling convention?
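
The alignment requirement can be sketched in C as follows.  This is only
an illustration of the arithmetic, assuming a stack that grows down from
stack_base + stack_size; the helper names are hypothetical, and in the
assembly wrapper the same effect would typically be a single and with
-16 on the stack pointer.

```c
#include <stdint.h>

/* Round ADDR down to a multiple of ALIGN, which must be a power of two.  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t align)
{
  return addr & ~(align - 1);
}

/* Hypothetical helper: initial stack pointer for a stack that grows down
   from STACK_BASE + STACK_SIZE, aligned to 16 bytes.  */
static uintptr_t
initial_stack_pointer (uintptr_t stack_base, uintptr_t stack_size)
{
  return align_down (stack_base + stack_size, 16);
}
```

Rounding down (not up) keeps the aligned pointer inside the area the
caller supplied.
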
>
> Thanks,
> Florian
>

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/5] x86-64: Add the clone3 wrapper
  2021-05-20 18:39     ` Noah Goldstein
@ 2021-05-22  1:52       ` H.J. Lu
  0 siblings, 0 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-22  1:52 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: GNU C Library, Florian Weimer

On Thu, May 20, 2021 at 11:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, May 20, 2021 at 2:35 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Sat, May 15, 2021 at 9:23 AM H.J. Lu via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> > >
> > > extern int clone3 (struct clone_args *__cl_args,
> > >                    int (*__func) (void *__arg), void *__arg);
> > > ---
> > >  sysdeps/unix/sysv/linux/x86_64/clone3.S | 92 +++++++++++++++++++++++++
> > >  sysdeps/unix/sysv/linux/x86_64/sysdep.h |  2 +
> > >  2 files changed, 94 insertions(+)
> > >  create mode 100644 sysdeps/unix/sysv/linux/x86_64/clone3.S
> > >
> > > diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> > > new file mode 100644
> > > index 0000000000..f7d4036a6a
> > > --- /dev/null
> > > +++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> > > @@ -0,0 +1,92 @@
> > > +/* The clone3 syscall wrapper.  Linux/x86-64 version.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +/* clone3() is even more special than fork() as it mucks with stacks
> > > +   and invokes a function in the right context after its all over.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include <clone-offsets.h>
> > > +
> > > +/* The userland implementation is:
> > > +   int clone3 (struct clone_args *cl_args, int (*func)(void *arg),
> > > +              void *arg);
> > > +   the kernel entry is:
> > > +   int clone3 (struct clone_args *cl_args, size_t size);
> > > +
> > > +   The parameters are passed in registers from userland:
> > > +   rdi: cl_args
> > > +   rsi: func
> > > +   rdx: arg
> > > +
> > > +   The kernel expects:
> > > +   rax: system call number
> > > +   rdi: cl_args
> > > +   rsi: size  */
> > > +
> > > +        .text
> > > +ENTRY (__clone3)
> > > +       /* Sanity check arguments.  */
> > > +       movq    $-EINVAL, %rax
> >
> > Can this be movl?

Yes.  Fixed.
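
A small C model of why the 32-bit move is enough.  It assumes the error
path only negates EAX to obtain the errno value; that is an assumption
about the handler, stated here for illustration.  Both function names
are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Model "movl $-EINVAL, %eax": a 32-bit write zero-extends into the
   full 64-bit register.  */
static uint64_t
rax_after_movl (int32_t imm)
{
  return (uint64_t) (uint32_t) imm;
}

/* Model an error path that negates only the low 32 bits (EAX) to get
   the errno value.  This is an assumption about the handler.  */
static int
errno_from_eax (uint64_t rax)
{
  return (int) -(int32_t) (uint32_t) rax;
}
```

The 64-bit register value is positive after the movl, so the shorter
encoding is only safe because nothing on that path looks at the upper
half.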

> > > +       testq   %rdi, %rdi              /* No NULL cl_args pointer.  */
> > > +       jz      SYSCALL_ERROR_LABEL
> > > +       testq   %rsi, %rsi              /* No NULL function pointer.  */
> > > +       jz      SYSCALL_ERROR_LABEL
> > > +
> > > +       /* Save the function pointer in R8 which is preserved by the
> > > +          syscall.  */
> > > +       movq    %rsi, %r8
> > > +
> > > +       /* Put sizeof (struct clone_args) in ESI.  */
> > > +       movl    $CLONE_ARGS_SIZE , %esi
> > > +
> > > +       /* Do the system call.  */
> > > +       movl    $SYS_ify(clone3), %eax
> > > +
> > > +       /* End FDE now, because in the child the unwind info will be
> > > +          wrong.  */
> > > +       cfi_endproc
> > > +       syscall
> > > +
> > > +       test    %RAX_LP, %RAX_LP
> > > +       jl      SYSCALL_ERROR_LABEL
> > > +       jz      L(thread_start)
> > > +
> >
> > Is the expectation to go to L(thread_start)?  If so, I
> > think jnz L(ret) with a fallthrough to L(thread_start)
> > is probably better.
>
> Or better, take the error-check branch off
> the critical path with jnz L(error_or_ret), then do the jl
> in L(error_or_ret).

I don't think the clone wrapper is on the critical path.
Since the same code is executed by both the child and the
parent, I check the error return first.
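
In C terms, the order of the checks after the syscall looks like the
sketch below.  The enum and function names are illustrative only.

```c
/* Outcome of a clone-style system call, decided from the raw return
   value.  The names are illustrative only.  */
enum clone_outcome
{
  CLONE_ERROR,   /* Negative: -errno from the kernel.  */
  CLONE_CHILD,   /* Zero: running in the new thread or process.  */
  CLONE_PARENT   /* Positive: child ID returned to the caller.  */
};

/* Check the error case first, then the child, mirroring the order of
   the jl and jz branches in the quoted assembly.  */
static enum clone_outcome
classify_clone_result (long ret)
{
  if (ret < 0)
    return CLONE_ERROR;
  if (ret == 0)
    return CLONE_CHILD;
  return CLONE_PARENT;
}
```

Both the parent and the child run through these checks once per clone,
so reordering them changes at most one predicted branch per call.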

> >
> > > +       ret
> > > +
> > > +L(thread_start):
> > > +       cfi_startproc
> > > +       /* Clearing frame pointer is insufficient, use CFI.  */
> > > +       cfi_undefined (rip)
> > > +       /* Clear the frame pointer.  The ABI suggests this be done, to mark
> > > +          the outermost frame obviously.  */
> > > +       xorl    %ebp, %ebp
> > > +
> > > +       /* Set up arguments for the function call.  */
> > > +       movq    %rdx, %rdi      /* Argument.  */
> > > +       call    *%r8            /* Call function.  */
> > > +       /* Call exit with return value from function call. */
> > > +       movq    %rax, %rdi
> > > +       movl    $SYS_ify(exit), %eax
> > > +       syscall
> > > +       cfi_endproc
> > > +
> > > +       cfi_startproc
> > > +PSEUDO_END (__clone3)
> > > +
> > > +libc_hidden_def (__clone3)
> > > +weak_alias (__clone3, clone3)
> > > diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> > > index dbad2c788a..f26ffc68ae 100644
> > > --- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> > > +++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
> > > @@ -377,6 +377,8 @@
> > >  # define HAVE_GETCPU_VSYSCALL          "__vdso_getcpu"
> > >  # define HAVE_CLOCK_GETRES64_VSYSCALL   "__vdso_clock_getres"
> > >
> > > +# define HAVE_CLONE3_WAPPER                    1
> > > +
> > >  # define SINGLE_THREAD_BY_GLOBAL               1
> > >
> > >  #endif /* __ASSEMBLER__ */
> > > --
> > > 2.31.1
> > >

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] Add tests for __clone_internal
  2021-05-20 15:08   ` Florian Weimer
@ 2021-05-22  1:54     ` H.J. Lu
  0 siblings, 0 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-22  1:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library, Adhemerval Zanella

On Thu, May 20, 2021 at 8:08 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > diff --git a/sysdeps/unix/sysv/linux/tst-align-clone-internal.c b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
> > new file mode 100644
> > index 0000000000..eccc39e255
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-align-clone-internal.c
>
> > +  int e;
> > +  if (waitpid (p, &e, __WCLONE) != p)
> > +    {
> > +      puts ("waitpid failed");
> > +      kill (p, SIGKILL);
> > +      return 1;
> > +    }
>
> This could use xwaitpid.  The same comment applies to other tests.

Fixed.
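
For readers without the support library at hand, a rough standalone
sketch of what a checked waitpid buys the test.  checked_waitpid and
run_child_and_wait are hypothetical names, and fork stands in for the
clone call used by the real tests.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a checked waitpid: terminate the test on
   failure instead of making every caller test the result.  */
static pid_t
checked_waitpid (pid_t pid, int *status, int flags)
{
  pid_t ret = waitpid (pid, status, flags);
  if (ret < 0)
    {
      perror ("waitpid");
      exit (1);
    }
  return ret;
}

/* Fork a child that exits with CODE and return the exit status the
   parent observes, or -1 if fork fails or the child dies abnormally.  */
static int
run_child_and_wait (int code)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);

  int status;
  checked_waitpid (pid, &status, 0);
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The call site shrinks to one line and the failure message is produced in
one place, which is the point of the review comment.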

> > diff --git a/sysdeps/unix/sysv/linux/tst-clone-internal.c b/sysdeps/unix/sysv/linux/tst-clone-internal.c
> > new file mode 100644
> > index 0000000000..587d519bf2
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-clone-internal.c
>
> > +static int
> > +do_test (void)
> > +{
> > +  int result;
> > +
> > +  result = __clone_internal (NULL, child_fn, NULL);
> > +
> > +  if (errno != EINVAL || result != -1)
> > +    {
> > +      printf ("FAIL: clone()=%d (wanted -1) errno=%d (wanted %d)\n",
> > +              result, errno, EINVAL);
> > +      return 1;
> > +    }
> > +
> > +  puts ("All OK");
> > +  return 0;
> > +}
>
> I think this test is invalid for the internal function (the comment
> about not checking arguments).

Removed.

> > diff --git a/sysdeps/unix/sysv/linux/tst-clone2-internal.c b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> > new file mode 100644
> > index 0000000000..dd8f32c24b
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-clone2-internal.c
> > @@ -0,0 +1,142 @@
>
> > +
> > +static int
> > +f (void *a)
> > +{
> > +  close (pipefd[0]);
> > +
> > +  pid_t ppid = getppid ();
> > +  pid_t pid = getpid ();
> > +  pid_t tid = syscall (__NR_gettid);
> > +
> > +  if (write (pipefd[1], &ppid, sizeof ppid) != sizeof (ppid))
> > +    FAIL_EXIT1 ("write ppid failed\n");
> > +  if (write (pipefd[1], &pid, sizeof pid) != sizeof (pid))
> > +    FAIL_EXIT1 ("write pid failed\n");
> > +  if (write (pipefd[1], &tid, sizeof tid) != sizeof (tid))
> > +    FAIL_EXIT1 ("write tid failed\n");
> > +
> > +  return 0;
> > +}
>
> You could use support_shared_allocate for the parent/child communication
> instead of a pipe.  The MAP_SHARED mapping overrides the lack of
> CLONE_VM.
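
A standalone sketch of the shared-mapping idea, assuming plain mmap with
MAP_SHARED | MAP_ANONYMOUS in place of support_shared_allocate and fork
in place of clone without CLONE_VM.  The struct and function names are
hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Values the child reports back to the parent.  */
struct child_report
{
  pid_t ppid;
  pid_t pid;
};

/* Fork a child that records its IDs in a MAP_SHARED mapping.  Returns 0
   and fills *OUT on success, -1 on failure.  fork stands in for clone
   without CLONE_VM.  */
static int
collect_child_report (struct child_report *out)
{
  struct child_report *shared
    = mmap (NULL, sizeof (*shared), PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (shared == MAP_FAILED)
    return -1;

  int result = -1;
  pid_t pid = fork ();
  if (pid == 0)
    {
      /* Child: writes land in the page shared with the parent.  */
      shared->ppid = getppid ();
      shared->pid = getpid ();
      _exit (0);
    }
  if (pid > 0)
    {
      int status;
      if (waitpid (pid, &status, 0) == pid
          && WIFEXITED (status) && WEXITSTATUS (status) == 0)
        {
          *out = *shared;
          result = out->pid == pid ? 0 : -1;
        }
    }
  munmap (shared, sizeof (*shared));
  return result;
}
```

Compared with the pipe, there are no partial writes to check and the
parent reads the values only after the child has exited.
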
>
> > +static int
> > +do_test (void)
> > +{
> > +  sig = SIGRTMIN;
> > +  sigset_t ss;
> > +  sigemptyset (&ss);
> > +  sigaddset (&ss, sig);
> > +  if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
> > +    FAIL_EXIT1 ("sigprocmask failed: %m");
>
> You could use xpthread_sigmask.  (Applies to tst-getpid1-internal.c as
> well.)

Since this test is copied from the previous test and doesn't use
threads, I'd like to keep it this way.

>
> > +  pid_t own_pid = getpid ();
> > +  pid_t own_tid = syscall (__NR_gettid);
>
> We have gettid nowadays.

Fixed.

> > +#include <support/test-driver.c>
> > diff --git a/sysdeps/unix/sysv/linux/tst-clone3-internal.c b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> > new file mode 100644
> > index 0000000000..61863e1504
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/tst-clone3-internal.c
> > +#include <stackinfo.h>  /* For _STACK_GROWS_{UP,DOWN}.  */
>
> No longer needed! 8-)

Fixed.

> > +#include <support/check.h>
> > +#include <stdatomic.h>
> > +#include <clone_internal.h>
> > +
> > +/* Test if clone call with CLONE_THREAD does not call exit_group.  The 'f'
> > +   function returns '1', which will be used by clone thread to call the
> > +   'exit' syscall directly.  If _exit is used instead, exit_group will be
> > +   used and thus the thread group will finish with return value of '1'
> > +   (where '2' from main thread is expected.  */
>
> Missing ).

Fixed.

> The rest looks okay as far as I can tell.
>
> Thanks,
> Florian
>

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal
  2021-05-20 14:24   ` Florian Weimer
@ 2021-05-22  1:55     ` H.J. Lu
  0 siblings, 0 replies; 19+ messages in thread
From: H.J. Lu @ 2021-05-22  1:55 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library, Adhemerval Zanella

On Thu, May 20, 2021 at 7:24 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > Export __clone_internal for libpthread.so and __clone_internal tests.
> > ---
> >  sysdeps/unix/sysv/linux/Versions | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> > index 220bb2dffe..299d4fef9c 100644
> > --- a/sysdeps/unix/sysv/linux/Versions
> > +++ b/sysdeps/unix/sysv/linux/Versions
> > @@ -179,6 +179,7 @@ libc {
> >      __sigtimedwait;
> >      # functions used by nscd
> >      __netlink_assert_response;
> > +    __clone_internal;
> >    }
> >  }
>
> I think this won't be necessary after the libpthread move.

This patch has been dropped.

> We can test the function directly by linking statically.  We already do
> this in a few other cases.

That is in the v6 patch.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-05-22  1:56 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-15 12:34 [PATCH v5 0/5] Add an internal wrapper for clone, clone2 and clone3 H.J. Lu
2021-05-15 12:34 ` [PATCH v5 1/5] " H.J. Lu
2021-05-20 14:46   ` Florian Weimer
2021-05-22  1:14     ` H.J. Lu
2021-05-15 12:34 ` [PATCH v5 2/5] nptl: Always pass stack size to create_thread H.J. Lu
2021-05-20 14:26   ` Florian Weimer
2021-05-15 12:34 ` [PATCH v5 3/5] GLIBC_PRIVATE: Export __clone_internal H.J. Lu
2021-05-17 13:54   ` Andreas Schwab
2021-05-20 14:24   ` Florian Weimer
2021-05-22  1:55     ` H.J. Lu
2021-05-15 12:34 ` [PATCH v5 4/5] x86-64: Add the clone3 wrapper H.J. Lu
2021-05-20 14:53   ` Florian Weimer
2021-05-22  1:38     ` H.J. Lu
2021-05-20 18:35   ` Noah Goldstein
2021-05-20 18:39     ` Noah Goldstein
2021-05-22  1:52       ` H.J. Lu
2021-05-15 12:34 ` [PATCH v5 5/5] Add tests for __clone_internal H.J. Lu
2021-05-20 15:08   ` Florian Weimer
2021-05-22  1:54     ` H.J. Lu
