From: Florian Weimer <fweimer@redhat.com>
To: "H.J. Lu via Libc-alpha" <libc-alpha@sourceware.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>,
"H.J. Lu" <hjl.tools@gmail.com>
Subject: Re: [PATCH v3] Add an internal wrapper for clone, clone2 and clone3
Date: Mon, 10 May 2021 10:17:38 +0200 [thread overview]
Message-ID: <87y2cnf1el.fsf@oldenburg.str.redhat.com> (raw)
In-Reply-To: <CAMe9rOrYKmHQJ8rTvqXP0UHVzzFPm0Nr6b8cByRCN5GzBTyyXw@mail.gmail.com> (H. J. Lu via Libc-alpha's message of "Fri, 5 Mar 2021 08:59:39 -0800")
* H. J. Lu via Libc-alpha:
> diff --git a/include/sched.h b/include/sched.h
> index b0bf971c93..28dd99a571 100644
> --- a/include/sched.h
> +++ b/include/sched.h
> @@ -32,5 +32,8 @@ libc_hidden_proto (__clone2)
> and Hurd doesn't have it. */
> extern int __getcpu (unsigned int *, unsigned int *);
> libc_hidden_proto (__getcpu)
> +
> +# include <clone-internal.h>
> +libc_hidden_proto (__clone_internal)
> #endif
> #endif
I think the libc_hidden_proto should be in <clone3.h> below, and
including the header here shouldn't be necessary.
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 149b999603..73c7f33a00 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -34,47 +34,6 @@
> #include <stack-aliasing.h>
>
>
> -#ifndef NEED_SEPARATE_REGISTER_STACK
This conflicts with my libpthread removal work.
Maybe it is possible to do these changes in a separate refactoring?
> diff --git a/sysdeps/generic/clone-internal.h b/sysdeps/generic/clone-internal.h
> new file mode 100644
> index 0000000000..1c1f997ca5
> --- /dev/null
> +++ b/sysdeps/generic/clone-internal.h
This should be in sysdeps/unix/sysv/linux because it is Linux-specific.
I would call it <clone3.h> (still as an internal-only header of course).
> @@ -0,0 +1,49 @@
> +/* The internal wrapper of clone and clone3.
> + Copyright (C) 2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library. If not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#include <stdint.h>
> +#include <stddef.h>
> +
> +struct clone_args
> +{
> + uint64_t flags; /* Flags bit mask. */
> + uint64_t pidfd; /* Where to store PID file descriptor
> + (pid_t *). */
> + uint64_t child_tid; /* Where to store child TID, in child's memory
> + (pid_t *). */
> + uint64_t parent_tid; /* Where to store child TID, in parent's memory
> + (int *). */
> + uint64_t exit_signal; /* Signal to deliver to parent on child
> + termination */
> + uint64_t stack; /* The lowest address of stack. */
> + uint64_t stack_size; /* Size of stack. */
> + uint64_t tls; /* Location of new TLS. */
> + uint64_t set_tid; /* Pointer to a pid_t array
> + (since Linux 5.5). */
> + uint64_t set_tid_size; /* Number of elements in set_tid
> + (since Linux 5.5). */
> + uint64_t cgroup; /* File descriptor for target cgroup
> + of child (since Linux 5.7). */
> +};
I think this has to be surrounded by #ifdef CLONE_ARGS_SIZE_VER0 (after
including <linux/sched.h>). This assumes we actually can include
<linux/sched.h> in the glibc build, which I haven't checked. But I
think it is worth a try.
If we can't use <linux/sched.h>, it would be safer to rename that
struct, in case kernel headers begin to define it.
> +extern int __clone3 (struct clone_args *cl_args, size_t size,
> + int (*__fn) (void *__arg), void *__arg)
> + __attribute__ ((visibility ("hidden")));
Please use libc_hidden_proto (without the explicit visibility change).
The internal __clone3 call should not have size argument because the
size of struct clone_args is known during the build. Soon we won't need
GLIBC_PRIVATE, so there is no risk of this backwards-incompatible
interface leaking out to userspace.
Likewise for __clone_internal.
> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> new file mode 100644
> index 0000000000..fd9929b6df
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> @@ -0,0 +1,62 @@
> +#include <errno.h>
> +#include <sched.h>
> +#include <libc-pointer-arith.h> /* For cast_to_pointer. */
> +#include <stackinfo.h> /* For _STACK_GROWS_{UP,DOWN}. */
> +
> +int
> +__clone_internal (struct clone_args *cl_args, size_t size,
> + int (*fn)(void *arg), void *arg)
> +{
> + /* Try clone3 first. */
> + int ret = __clone3 (cl_args, size, fn, arg);
> + if (ret == -1 && errno == ENOSYS)
> + {
> + /* NB: Must clear errno since errno may be checked against non-zero
> + return value. */
> + __set_errno (0);
Setting errno to zero doesn't look right. This should probably restore
the original errno value.
> + /* Map clone3 arguments to clone arguments. */
> + int flags = cl_args->flags | cl_args->exit_signal;
> + void *stack = cast_to_pointer (cl_args->stack);
I think you should check here if there are flags set outside of 0xff
because the kernel won't.
> +libc_hidden_def (__clone_internal)
> diff --git a/sysdeps/unix/sysv/linux/clone-offsets.sym b/sysdeps/unix/sysv/linux/clone-offsets.sym
> new file mode 100644
> index 0000000000..38d447bcf3
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/clone-offsets.sym
> @@ -0,0 +1,9 @@
> +#include <stddef.h>
> +#include <sched.h>
> +
> +--
> +
> +#define clone_args_offset(member) offsetof (struct clone_args, member)
> +
> +CLONE_ARGS_STACK_OFFSET clone_args_offset(stack)
> +CLONE_ARGS_STACK_SIZE_OFFSET clone_args_offset(stack_size)
Missing spaces before '('.
> diff --git a/sysdeps/unix/sysv/linux/tst-align-clone.c b/sysdeps/unix/sysv/linux/tst-align-clone.c
> index 6ace61bac3..5f00b8ee4a 100644
> --- a/sysdeps/unix/sysv/linux/tst-align-clone.c
> +++ b/sysdeps/unix/sysv/linux/tst-align-clone.c
> @@ -23,6 +23,7 @@
> #include <sys/wait.h>
> #include <unistd.h>
> #include <tst-stack-align.h>
> +#include <clone-internal.h>
> #include <stackinfo.h>
>
> static int
> @@ -49,21 +50,17 @@ do_test (void)
> ok = false;
>
> #ifdef __ia64__
> - extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base,
> - size_t __child_stack_size, int __flags,
> - void *__arg, ...);
> - char st[256 * 1024];
> - pid_t p = __clone2 (f, st, sizeof (st), 0, 0);
> +# define STACK_SIZE 256 * 1024
> #else
> - char st[128 * 1024] __attribute__ ((aligned));
> -# if _STACK_GROWS_DOWN
> - pid_t p = clone (f, st + sizeof (st), 0, 0);
> -# elif _STACK_GROWS_UP
> - pid_t p = clone (f, st, 0, 0);
> -# else
> -# error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
> -# endif
> +# define STACK_SIZE 128 * 1024
> #endif
> + char st[STACK_SIZE] __attribute__ ((aligned));
> + struct clone_args clone_args =
> + {
> + .stack = (uintptr_t) st,
> + .stack_size = sizeof (st),
> + };
> + pid_t p = __clone_internal (&clone_args, sizeof (clone_args), f, 0);
> if (p == -1)
> {
> printf("clone failed: %m\n");
Please do not remvoe coverage of the clone/__clone2 interface from the
test. Either test both here or add a new test.
The same comment applies to the other tests as well.
> diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> new file mode 100644
> index 0000000000..4d99b4b881
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> @@ -0,0 +1,105 @@
> +/* The clone3 syscall wrapper. Linux/x86-64 version.
> + Copyright (C) 2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +/* clone3() is even more special than fork() as it mucks with stacks
> + and invokes a function in the right context after its all over. */
> +
> +#include <sysdep.h>
> +#include <clone-offsets.h>
> +
> +/* The userland implementation is:
> + int clone3 (struct clone_args *cl_args, size_t size,
> + int (*fn)(void *arg), void *arg);
> + the kernel entry is:
> + int clone3 (struct clone_args *cl_args, size_t size);
> +
> + The parameters are passed in registers from userland:
> + rdi: cl_args
> + rsi: size
> + rdx: fn
> + rcx: arg
> +
> + The kernel expects:
> + rax: system call number
> + rdi: cl_args
> + rsi: size */
> +
> + .text
> +ENTRY (__clone3)
> + /* Sanity check arguments. */
> + movq $-EINVAL, %rax
> + testq %rdi, %rdi /* No NULL cl_args pointer. */
> + jz SYSCALL_ERROR_LABEL
> + testq %rdx, %rdx /* No NULL function pointer. */
> + jz SYSCALL_ERROR_LABEL
I don't think we need the error checking for the internal usage.
> + /* Load the new stack size into R9. */
> + movq CLONE_ARGS_STACK_SIZE_OFFSET(%rdi), %r9
> +
> + /* Load the new stack bottom into R8. */
> + movq CLONE_ARGS_STACK_OFFSET(%rdi), %r8
> +
> + /* Make room for 2 8-byte slots on the new stack. */
> + subq $16, %r9
> +
> + /* Insert the argument onto the new stack. */
> + movq %rcx, 8(%r8, %r9)
> +
> + /* Save the function pointer. It will be popped off in the
> + child. */
> + movq %rdx, (%r8, %r9)
> +
> + /* Update the stack size field passed to clone3. */
> + movq %r9, CLONE_ARGS_STACK_SIZE_OFFSET(%rdi)
Why is this necessary? This is rather surprising. I think you should
avoid modification here. Can you pass the data to thread_start in
registers instead?
Thanks,
Florian
next prev parent reply other threads:[~2021-05-10 8:17 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-14 22:45 [PATCH] " H.J. Lu
2021-02-15 10:44 ` Florian Weimer
2021-02-15 14:17 ` H.J. Lu
2021-02-15 14:27 ` Florian Weimer
2021-02-15 14:44 ` H.J. Lu
2021-02-15 14:39 ` Adhemerval Zanella
2021-03-04 18:40 ` [PATCH v2] " H.J. Lu
2021-03-04 18:50 ` H.J. Lu
2021-03-05 8:32 ` Christian Brauner
2021-03-05 16:59 ` [PATCH v3] " H.J. Lu
2021-03-18 23:18 ` PING^1 " H.J. Lu
2021-04-19 12:56 ` H.J. Lu
2021-05-10 8:17 ` Florian Weimer [this message]
2021-05-10 23:56 ` H.J. Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y2cnf1el.fsf@oldenburg.str.redhat.com \
--to=fweimer@redhat.com \
--cc=christian.brauner@ubuntu.com \
--cc=hjl.tools@gmail.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).