From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by sourceware.org (Postfix) with ESMTPS id 1DF81383F423 for ; Tue, 13 Jul 2021 18:54:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1DF81383F423 Received: by mail-pf1-x435.google.com with SMTP id j199so20481450pfd.7 for ; Tue, 13 Jul 2021 11:54:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=9SXPRXs3Fw7A86sAmlkefxN8zRSULuY0sRBfXepU/Fg=; b=BxB124u2oVuGfuLB+qlgaffTW3Kq3sh1oB6pVHNl2GNIJfWRw8fHlNm389M0/FZb1H htSLFhpEpk6yFhRywadGw/DLib+1vFruoG2aG5cHahTTSvLLDWRF7O8rk1RPnqeJQdyp sEMzsUfzD23Z3olGMQc0Ps7/AqUTFpJxsH45vNPUD3sCTJqLkS3uiXwa8QsgacyyuyGp V8Mg6JzPDZvkT7g9vxxU7+DKIRW707ntuswvGT5U3FG+/NXQgvBj7yok+qhSd6CL2pdT KAdX0CVTHSzOlr9bJmn8UBGuJu3K86Sm58MFdDy3yi69j5AXAPujSa4R7M06auuSc6Z/ ig+Q== X-Gm-Message-State: AOAM532LJKGon4/OYN+6ZVQadVF0+dn8WIf1b6XSHDWfthCgix4YwzWz J+AqF+shQQLuhYuGhssrSiyHXg== X-Google-Smtp-Source: ABdhPJzQOF6Afn0VK9oq7cxowZyKwXD2MYllDRTau9Oq4+k8Sgr8uecZnvlEKaqWLRbPWsZsW3cSfw== X-Received: by 2002:aa7:9e5c:0:b029:32b:4e2a:e549 with SMTP id z28-20020aa79e5c0000b029032b4e2ae549mr6116262pfq.68.1626202459060; Tue, 13 Jul 2021 11:54:19 -0700 (PDT) Received: from [192.168.1.108] ([177.194.59.218]) by smtp.gmail.com with ESMTPSA id g2sm8659032pfv.91.2021.07.13.11.54.17 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 13 Jul 2021 11:54:18 -0700 (PDT) Subject: Re: [PATCH v8 1/3] Add an internal wrapper for clone, clone2 and clone3 To: "H.J. Lu" , libc-alpha@sourceware.org Cc: Florian Weimer , Noah Goldstein References: <20210601145516.3553627-1-hjl.tools@gmail.com> <20210601145516.3553627-2-hjl.tools@gmail.com> From: Adhemerval Zanella Message-ID: <19675287-dc96-6aed-392d-b9aab1504c49@linaro.org> Date: Tue, 13 Jul 2021 15:54:15 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210601145516.3553627-2-hjl.tools@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Jul 2021 18:54:24 -0000 On 01/06/2021 11:55, H.J. Lu wrote: > The clone3 system call provides a superset of the functionality of clone > and clone2. It also provides a number of API improvements, including > the ability to specify the size of the child's stack area which can be > used by kernel to compute the shadow stack size when allocating the > shadow stack. Add: > > extern int __clone_internal (struct clone_args *__cl_args, > int (*__func) (void *__arg), void *__arg); > > to provide an abstract interface for clone, clone2 and clone3. > > 1. Simplify stack management for thread creation by passing both stack > base and size to create_thread. > 2. Consolidate clone vs clone2 differences into a single file. > 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns > -1 with ENOSYS, fall back to clone or clone2. > 4. Use only __clone_internal to clone a thread. Since the stack size > argument for create_thread is now unconditional, always pass stack size > to create_thread. > 5. Enable the public clone3 wrapper in the future after it has been > added to all targets. > > NB: Sandbox should return ENOSYS on clone3 if it is rejected: > > https://bugs.chromium.org/p/chromium/issues/detail?id=1213452#c5 LGTM with just an suggestion below. Also chromium also has fixed it, so although it wouldn't be able to fully handled clone3, at least it won't brick a 2.34 glibc. Reviewed-by: Adhemerval Zanella > --- > include/clone_internal.h | 16 +++++ > nptl/allocatestack.c | 59 ++------------- > nptl/pthread_create.c | 38 +++++----- > sysdeps/unix/sysv/linux/Makefile | 2 +- > sysdeps/unix/sysv/linux/clone-internal.c | 91 ++++++++++++++++++++++++ > sysdeps/unix/sysv/linux/clone3.c | 1 + > sysdeps/unix/sysv/linux/clone3.h | 60 ++++++++++++++++ > sysdeps/unix/sysv/linux/spawni.c | 26 +++---- > 8 files changed, 205 insertions(+), 88 deletions(-) > create mode 100644 include/clone_internal.h > create mode 100644 sysdeps/unix/sysv/linux/clone-internal.c > create mode 100644 sysdeps/unix/sysv/linux/clone3.c > create mode 100644 sysdeps/unix/sysv/linux/clone3.h > > diff --git a/include/clone_internal.h b/include/clone_internal.h > new file mode 100644 > index 0000000000..4b23ef33ce > --- /dev/null > +++ b/include/clone_internal.h > @@ -0,0 +1,16 @@ > +#ifndef _CLONE3_H > +#include_next > + > +extern __typeof (clone3) __clone3; > + > +/* The internal wrapper of clone/clone2 and clone3. If __clone3 returns > + -1 with ENOSYS, fall back to clone or clone2. */ > +extern int __clone_internal (struct clone_args *__cl_args, > + int (*__func) (void *__arg), void *__arg); > + > +#ifndef _ISOMAC > +libc_hidden_proto (__clone3) > +libc_hidden_proto (__clone_internal) > +#endif > + > +#endif Ok. > diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c > index dc81a2ca73..eebf9c2c3c 100644 > --- a/nptl/allocatestack.c > +++ b/nptl/allocatestack.c > @@ -33,47 +33,6 @@ > #include > #include > > -#ifndef NEED_SEPARATE_REGISTER_STACK > - > -/* Most architectures have exactly one stack pointer. Some have more. */ > -# define STACK_VARIABLES void *stackaddr = NULL > - > -/* How to pass the values to the 'create_thread' function. */ > -# define STACK_VARIABLES_ARGS stackaddr > - > -/* How to declare function which gets there parameters. */ > -# define STACK_VARIABLES_PARMS void *stackaddr > - > -/* How to declare allocate_stack. */ > -# define ALLOCATE_STACK_PARMS void **stack > - > -/* This is how the function is called. We do it this way to allow > - other variants of the function to have more parameters. */ > -# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr) > - > -#else > - > -/* We need two stacks. The kernel will place them but we have to tell > - the kernel about the size of the reserved address space. */ > -# define STACK_VARIABLES void *stackaddr = NULL; size_t stacksize = 0 > - > -/* How to pass the values to the 'create_thread' function. */ > -# define STACK_VARIABLES_ARGS stackaddr, stacksize > - > -/* How to declare function which gets there parameters. */ > -# define STACK_VARIABLES_PARMS void *stackaddr, size_t stacksize > - > -/* How to declare allocate_stack. */ > -# define ALLOCATE_STACK_PARMS void **stack, size_t *stacksize > - > -/* This is how the function is called. We do it this way to allow > - other variants of the function to have more parameters. */ > -# define ALLOCATE_STACK(attr, pd) \ > - allocate_stack (attr, pd, &stackaddr, &stacksize) > - > -#endif > - > - > /* Default alignment of stack. */ > #ifndef STACK_ALIGN > # define STACK_ALIGN __alignof__ (long double) Ok. > @@ -249,7 +208,7 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize) > PDP must be non-NULL. */ > static int > allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, > - ALLOCATE_STACK_PARMS) > + void **stack, size_t *stacksize) > { > struct pthread *pd; > size_t size; > @@ -600,25 +559,17 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, > /* We place the thread descriptor at the end of the stack. */ > *pdp = pd; > > -#if _STACK_GROWS_DOWN > void *stacktop; > > -# if TLS_TCB_AT_TP > +#if TLS_TCB_AT_TP > /* The stack begins before the TCB and the static TLS block. */ > stacktop = ((char *) (pd + 1) - tls_static_size_for_stack); > -# elif TLS_DTV_AT_TP > +#elif TLS_DTV_AT_TP > stacktop = (char *) (pd - 1); > -# endif > +#endif > > -# ifdef NEED_SEPARATE_REGISTER_STACK > + *stacksize = stacktop - pd->stackblock; > *stack = pd->stackblock; > - *stacksize = stacktop - *stack; > -# else > - *stack = stacktop; > -# endif > -#else > - *stack = pd->stackblock; > -#endif > > return 0; > } Ok. > diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c > index 2d2535b07d..9e3b8f325c 100644 > --- a/nptl/pthread_create.c > +++ b/nptl/pthread_create.c > @@ -37,6 +37,7 @@ > #include "libioP.h" > #include > #include > +#include > > #include > > @@ -246,8 +247,8 @@ late_init (void) > static int _Noreturn start_thread (void *arg); > > static int create_thread (struct pthread *pd, const struct pthread_attr *attr, > - bool *stopped_start, STACK_VARIABLES_PARMS, > - bool *thread_ran) > + bool *stopped_start, void *stackaddr, > + size_t stacksize, bool *thread_ran) > { > /* Determine whether the newly created threads has to be started > stopped since we have to set the scheduling parameters or set the > @@ -299,14 +300,18 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr, > > TLS_DEFINE_INIT_TP (tp, pd); > > -#ifdef __NR_clone2 > -# define ARCH_CLONE __clone2 > -#else > -# define ARCH_CLONE __clone > -#endif > - if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS, > - clone_flags, pd, &pd->tid, tp, &pd->tid) > - == -1)) > + struct clone_args args = > + { > + .flags = clone_flags, > + .pidfd = (uintptr_t) &pd->tid, > + .parent_tid = (uintptr_t) &pd->tid, > + .child_tid = (uintptr_t) &pd->tid, > + .stack = (uintptr_t) stackaddr, > + .stack_size = stacksize, > + .tls = (uintptr_t) tp, > + }; > + int ret = __clone_internal (&args, &start_thread, pd); > + if (__glibc_unlikely (ret == -1)) > return errno; > > /* It's started now, so if we fail below, we'll have to cancel it Ok. > @@ -603,7 +608,8 @@ int > __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, > void *(*start_routine) (void *), void *arg) > { > - STACK_VARIABLES; > + void *stackaddr = NULL; > + size_t stacksize = 0; > > /* Avoid a data race in the multi-threaded case, and call the > deferred initialization only once. */ > @@ -627,7 +633,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, > } > > struct pthread *pd = NULL; > - int err = ALLOCATE_STACK (iattr, &pd); > + int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize); > int retval = 0; > > if (__glibc_unlikely (err != 0)) > @@ -772,8 +778,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, > > /* We always create the thread stopped at startup so we can > notify the debugger. */ > - retval = create_thread (pd, iattr, &stopped_start, > - STACK_VARIABLES_ARGS, &thread_ran); > + retval = create_thread (pd, iattr, &stopped_start, stackaddr, > + stacksize, &thread_ran); > if (retval == 0) > { > /* We retain ownership of PD until (a) (see CONCURRENCY NOTES > @@ -804,8 +810,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, > } > } > else > - retval = create_thread (pd, iattr, &stopped_start, > - STACK_VARIABLES_ARGS, &thread_ran); > + retval = create_thread (pd, iattr, &stopped_start, stackaddr, > + stacksize, &thread_ran); > > /* Return to the previous signal mask, after creating the new > thread. */ Ok. > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile > index bc14f20274..9469868bce 100644 > --- a/sysdeps/unix/sysv/linux/Makefile > +++ b/sysdeps/unix/sysv/linux/Makefile > @@ -64,7 +64,7 @@ sysdep_routines += adjtimex clone umount umount2 readahead sysctl \ > time64-support pselect32 \ > xstat fxstat lxstat xstat64 fxstat64 lxstat64 \ > fxstatat fxstatat64 \ > - xmknod xmknodat > + xmknod xmknodat clone3 clone-internal > > CFLAGS-gethostid.c = -fexceptions > CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables Ok. > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c > new file mode 100644 > index 0000000000..1e7a8f6b35 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/clone-internal.c > @@ -0,0 +1,91 @@ > +/* The internal wrapper of clone and clone3. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library. If not, see > + . */ > + > +#include > +#include > +#include > +#include > +#include > +#include /* For cast_to_pointer. */ > +#include /* For _STACK_GROWS_{UP,DOWN}. */ > + > +#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */ > +#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */ > +#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */ > + > +#define sizeof_field(TYPE, MEMBER) sizeof ((((TYPE *)0)->MEMBER)) > +#define offsetofend(TYPE, MEMBER) \ > + (offsetof (TYPE, MEMBER) + sizeof_field (TYPE, MEMBER)) > + > +_Static_assert (__alignof (struct clone_args) == 8, > + "__alignof (struct clone_args) != 8"); > +_Static_assert (offsetofend (struct clone_args, tls) == CLONE_ARGS_SIZE_VER0, > + "offsetofend (struct clone_args, tls) != CLONE_ARGS_SIZE_VER0"); > +_Static_assert (offsetofend (struct clone_args, set_tid_size) == CLONE_ARGS_SIZE_VER1, > + "offsetofend (struct clone_args, set_tid_size) != CLONE_ARGS_SIZE_VER1"); > +_Static_assert (offsetofend (struct clone_args, cgroup) == CLONE_ARGS_SIZE_VER2, > + "offsetofend (struct clone_args, cgroup) != CLONE_ARGS_SIZE_VER2"); > +_Static_assert (sizeof (struct clone_args) == CLONE_ARGS_SIZE_VER2, > + "sizeof (struct clone_args) != CLONE_ARGS_SIZE_VER2"); > + > +int > +__clone_internal (struct clone_args *cl_args, > + int (*func) (void *arg), void *arg) > +{ > + int ret; > +#ifdef HAVE_CLONE3_WAPPER > + /* Try clone3 first. */ > + int saved_errno = errno; > + ret = __clone3 (cl_args, sizeof (*cl_args), func, arg); > + if (ret != -1 || errno != ENOSYS) > + return ret; > + > + /* NB: Restore errno since errno may be checked against non-zero > + return value. */ > + __set_errno (saved_errno); > +#endif > + > + /* Map clone3 arguments to clone arguments. NB: No need to check > + invalid clone3 specific bits in flags nor exit_signal since this > + is an internal function. */ > + int flags = cl_args->flags | cl_args->exit_signal; > + void *stack = cast_to_pointer (cl_args->stack); > + > +#ifdef __ia64__ > + ret = __clone2 (func, stack, cl_args->stack_size, > + flags, arg, > + cast_to_pointer (cl_args->parent_tid), > + cast_to_pointer (cl_args->tls), > + cast_to_pointer (cl_args->child_tid)); > +#else > +# if !_STACK_GROWS_DOWN && !_STACK_GROWS_UP > +# error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP" > +# endif > + > +# if _STACK_GROWS_DOWN > + stack += cl_args->stack_size; > +# endif > + ret = __clone (func, stack, flags, arg, > + cast_to_pointer (cl_args->parent_tid), > + cast_to_pointer (cl_args->tls), > + cast_to_pointer (cl_args->child_tid)); > +#endif > + return ret; > +} > + > +libc_hidden_def (__clone_internal) Ok. > diff --git a/sysdeps/unix/sysv/linux/clone3.c b/sysdeps/unix/sysv/linux/clone3.c > new file mode 100644 > index 0000000000..de963ef89d > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/clone3.c > @@ -0,0 +1 @@ > +/* An empty placeholder. */ Ok. > diff --git a/sysdeps/unix/sysv/linux/clone3.h b/sysdeps/unix/sysv/linux/clone3.h > new file mode 100644 > index 0000000000..0488884d59 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/clone3.h > @@ -0,0 +1,60 @@ > +/* The wrapper of clone3. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library. If not, see > + . */ > + > +#ifndef _CLONE3_H > +#define _CLONE3_H 1 > + > +#include > +#include > +#include > + > +__BEGIN_DECLS > + > +/* This struct should only be used in an argument to the clone3 system > + call (along with its size argument). It may be extended with new > + fields in the future. */ > + > +struct clone_args > +{ > + uint64_t flags; /* Flags bit mask. */ > + uint64_t pidfd; /* Where to store PID file descriptor > + (pid_t *). */ > + uint64_t child_tid; /* Where to store child TID, in child's memory > + (pid_t *). */ > + uint64_t parent_tid; /* Where to store child TID, in parent's memory > + (int *). */ > + uint64_t exit_signal; /* Signal to deliver to parent on child > + termination */ > + uint64_t stack; /* The lowest address of stack. */ > + uint64_t stack_size; /* Size of stack. */ > + uint64_t tls; /* Location of new TLS. */ > + uint64_t set_tid; /* Pointer to a pid_t array > + (since Linux 5.5). */ > + uint64_t set_tid_size; /* Number of elements in set_tid > + (since Linux 5.5). */ > + uint64_t cgroup; /* File descriptor for target cgroup > + of child (since Linux 5.7). */ > +} __attribute__ ((aligned (8))); > + The kernel defined the alignment for each member (__aligned_u64) instead of aligning the struct itself. It should ok as lon all member are the same type, but I am not sure if kernel decide to extend using different internal types. Maybe we should mimic kernel in this regard. > +/* The wrapper of clone3. */ > +extern int clone3 (struct clone_args *__cl_args, size_t __size, > + int (*__func) (void *__arg), void *__arg); > + > +__END_DECLS > + > +#endif /* clone3.h */ Ok. > diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c > index 501f8fbccd..fd29858cf5 100644 > --- a/sysdeps/unix/sysv/linux/spawni.c > +++ b/sysdeps/unix/sysv/linux/spawni.c > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > #include "spawn_int.h" > > /* The Linux implementation of posix_spawn{p} uses the clone syscall directly > @@ -59,21 +60,6 @@ > normal program exit with the exit code 127. */ > #define SPAWN_ERROR 127 > > -#ifdef __ia64__ > -# define CLONE(__fn, __stackbase, __stacksize, __flags, __args) \ > - __clone2 (__fn, __stackbase, __stacksize, __flags, __args, 0, 0, 0) > -#else > -# define CLONE(__fn, __stack, __stacksize, __flags, __args) \ > - __clone (__fn, __stack, __flags, __args) > -#endif > - > -/* Since ia64 wants the stackbase w/clone2, re-use the grows-up macro. */ > -#if _STACK_GROWS_UP || defined (__ia64__) > -# define STACK(__stack, __stack_size) (__stack) > -#elif _STACK_GROWS_DOWN > -# define STACK(__stack, __stack_size) (__stack + __stack_size) > -#endif > - > Ok. > struct posix_spawn_args > { > @@ -378,8 +364,14 @@ __spawnix (pid_t * pid, const char *file, > need for CLONE_SETTLS. Although parent and child share the same TLS > namespace, there will be no concurrent access for TLS variables (errno > for instance). */ > - new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size, > - CLONE_VM | CLONE_VFORK | SIGCHLD, &args); > + struct clone_args clone_args = > + { > + .flags = CLONE_VM | CLONE_VFORK, > + .exit_signal = SIGCHLD, > + .stack = (uintptr_t) stack, > + .stack_size = stack_size, > + }; > + new_pid = __clone_internal (&clone_args, __spawni_child, &args); > > /* It needs to collect the case where the auxiliary process was created > but failed to execute the file (due either any preparation step or > Ok.