public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: "H.J. Lu via Libc-alpha" <libc-alpha@sourceware.org>,
	 Christian Brauner <christian.brauner@ubuntu.com>
Subject: Re: [PATCH v3] Add an internal wrapper for clone, clone2 and clone3
Date: Mon, 10 May 2021 16:56:21 -0700	[thread overview]
Message-ID: <CAMe9rOpYgDhJ1EM3otfTkcOuZmfQWpxqy3Ny33GM=TDo-NhibA@mail.gmail.com> (raw)
In-Reply-To: <87y2cnf1el.fsf@oldenburg.str.redhat.com>

On Mon, May 10, 2021 at 1:17 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > diff --git a/include/sched.h b/include/sched.h
> > index b0bf971c93..28dd99a571 100644
> > --- a/include/sched.h
> > +++ b/include/sched.h
> > @@ -32,5 +32,8 @@ libc_hidden_proto (__clone2)
> >     and Hurd doesn't have it.  */
> >  extern int __getcpu (unsigned int *, unsigned int *);
> >  libc_hidden_proto (__getcpu)
> > +
> > +# include <clone-internal.h>
> > +libc_hidden_proto (__clone_internal)
> >  #endif
> >  #endif
>
> I think the libc_hidden_proto should be in <clone3.h> below, and
> including the header here shouldn't be necessary.

Fixed.

> > diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> > index 149b999603..73c7f33a00 100644
> > --- a/nptl/allocatestack.c
> > +++ b/nptl/allocatestack.c
> > @@ -34,47 +34,6 @@
> >  #include <stack-aliasing.h>
> >
> >
> > -#ifndef NEED_SEPARATE_REGISTER_STACK
>
> This conflicts with my libpthread removal work.
>
> Maybe it is possible to do these changes in a separate refactoring?

Fixed.

> > diff --git a/sysdeps/generic/clone-internal.h b/sysdeps/generic/clone-internal.h
> > new file mode 100644
> > index 0000000000..1c1f997ca5
> > --- /dev/null
> > +++ b/sysdeps/generic/clone-internal.h
>
> This should be in sysdeps/unix/sysv/linux because it is Linux-specific.
>
> I would call it <clone3.h> (still as an internal-only header of course).

Fixed.

> > @@ -0,0 +1,49 @@
> > +/* The internal wrapper of clone and clone3.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library.  If not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <stdint.h>
> > +#include <stddef.h>
> > +
> > +struct clone_args
> > +{
> > +  uint64_t flags;     /* Flags bit mask.  */
> > +  uint64_t pidfd;     /* Where to store PID file descriptor
> > +                         (pid_t *).  */
> > +  uint64_t child_tid;         /* Where to store child TID, in child's memory
> > +                         (pid_t *).  */
> > +  uint64_t parent_tid;        /* Where to store child TID, in parent's memory
> > +                         (int *). */
> > +  uint64_t exit_signal;       /* Signal to deliver to parent on child
> > +                         termination */
> > +  uint64_t stack;     /* The lowest address of stack.  */
> > +  uint64_t stack_size;        /* Size of stack.  */
> > +  uint64_t tls;               /* Location of new TLS.  */
> > +  uint64_t set_tid;   /* Pointer to a pid_t array
> > +                         (since Linux 5.5).  */
> > +  uint64_t set_tid_size; /* Number of elements in set_tid
> > +                         (since Linux 5.5). */
> > +  uint64_t cgroup;    /* File descriptor for target cgroup
> > +                         of child (since Linux 5.7).  */
> > +};
>
> I think this has to be surrounded by #ifdef CLONE_ARGS_SIZE_VER0 (after
> including <linux/sched.h>).  This assumes we actually can include
> <linux/sched.h> in the glibc build, which I haven't checked.  But I
> think it is worth a try.
>
> If we can't use <linux/sched.h>, it would be safer to rename that
> struct, in case kernel headers begin to define it.

Fixed.

> > +extern int __clone3 (struct clone_args *cl_args, size_t size,
> > +                  int (*__fn) (void *__arg), void *__arg)
> > +  __attribute__ ((visibility ("hidden")));
>
> Please use libc_hidden_proto (without the explicit visibility change).

Fixed.

> The internal __clone3 call should not have size argument because the
> size of struct clone_args is known during the build.  Soon we won't need
> GLIBC_PRIVATE, so there is no risk of this backwards-incompatible
> interface leaking out to userspace.

Fixed.  __clone3 is only for internal use at the moment.  But we can export
it later if needed.

> Likewise for __clone_internal.

Fixed.  __clone_internal is kept in GLIBC_PRIVATE for glibc clone
tests.

> > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> > new file mode 100644
> > index 0000000000..fd9929b6df
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> > @@ -0,0 +1,62 @@
>
> > +#include <errno.h>
> > +#include <sched.h>
> > +#include <libc-pointer-arith.h>      /* For cast_to_pointer.  */
> > +#include <stackinfo.h>               /* For _STACK_GROWS_{UP,DOWN}.  */
> > +
> > +int
> > +__clone_internal (struct clone_args *cl_args, size_t size,
> > +               int (*fn)(void *arg), void *arg)
> > +{
> > +  /* Try clone3 first.  */
> > +  int ret = __clone3 (cl_args, size, fn, arg);
> > +  if (ret == -1 && errno == ENOSYS)
> > +    {
> > +      /* NB: Must clear errno since errno may be checked against non-zero
> > +         return value.  */
> > +      __set_errno (0);
>
> Setting errno to zero doesn't look right.  This should probably restore
> the original errno value.

Fixed.

> > +      /* Map clone3 arguments to clone arguments.  */
> > +      int flags = cl_args->flags | cl_args->exit_signal;
> > +      void *stack = cast_to_pointer (cl_args->stack);
>
> I think you should check here if there are flags set outside of 0xff
> because the kernel won't.

Fixed.

> > +libc_hidden_def (__clone_internal)
> > diff --git a/sysdeps/unix/sysv/linux/clone-offsets.sym b/sysdeps/unix/sysv/linux/clone-offsets.sym
> > new file mode 100644
> > index 0000000000..38d447bcf3
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/clone-offsets.sym
> > @@ -0,0 +1,9 @@
> > +#include <stddef.h>
> > +#include <sched.h>
> > +
> > +--
> > +
> > +#define clone_args_offset(member) offsetof (struct clone_args, member)
> > +
> > +CLONE_ARGS_STACK_OFFSET              clone_args_offset(stack)
> > +CLONE_ARGS_STACK_SIZE_OFFSET clone_args_offset(stack_size)
>
> Missing spaces before '('.

Fixed.

> > diff --git a/sysdeps/unix/sysv/linux/tst-align-clone.c b/sysdeps/unix/sysv/linux/tst-align-clone.c
> > index 6ace61bac3..5f00b8ee4a 100644
> > --- a/sysdeps/unix/sysv/linux/tst-align-clone.c
> > +++ b/sysdeps/unix/sysv/linux/tst-align-clone.c
> > @@ -23,6 +23,7 @@
> >  #include <sys/wait.h>
> >  #include <unistd.h>
> >  #include <tst-stack-align.h>
> > +#include <clone-internal.h>
> >  #include <stackinfo.h>
> >
> >  static int
> > @@ -49,21 +50,17 @@ do_test (void)
> >      ok = false;
> >
> >  #ifdef __ia64__
> > -  extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base,
> > -                    size_t __child_stack_size, int __flags,
> > -                    void *__arg, ...);
> > -  char st[256 * 1024];
> > -  pid_t p = __clone2 (f, st, sizeof (st), 0, 0);
> > +# define STACK_SIZE 256 * 1024
> >  #else
> > -  char st[128 * 1024] __attribute__ ((aligned));
> > -# if _STACK_GROWS_DOWN
> > -  pid_t p = clone (f, st + sizeof (st), 0, 0);
> > -# elif _STACK_GROWS_UP
> > -  pid_t p = clone (f, st, 0, 0);
> > -# else
> > -#  error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
> > -# endif
> > +# define STACK_SIZE 128 * 1024
> >  #endif
> > +  char st[STACK_SIZE] __attribute__ ((aligned));
> > +  struct clone_args clone_args =
> > +    {
> > +      .stack = (uintptr_t) st,
> > +      .stack_size = sizeof (st),
> > +    };
> > +  pid_t p = __clone_internal (&clone_args, sizeof (clone_args), f, 0);
> >    if (p == -1)
> >      {
> >        printf("clone failed: %m\n");
>
> Please do not remvoe coverage of the clone/__clone2 interface from the
> test.  Either test both here or add a new test.
>
> The same comment applies to the other tests as well.

Fixed.

>
> > diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> > new file mode 100644
> > index 0000000000..4d99b4b881
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S
> > @@ -0,0 +1,105 @@
> > +/* The clone3 syscall wrapper.  Linux/x86-64 version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +/* clone3() is even more special than fork() as it mucks with stacks
> > +   and invokes a function in the right context after its all over.  */
> > +
> > +#include <sysdep.h>
> > +#include <clone-offsets.h>
> > +
> > +/* The userland implementation is:
> > +   int clone3 (struct clone_args *cl_args, size_t size,
> > +            int (*fn)(void *arg), void *arg);
> > +   the kernel entry is:
> > +   int clone3 (struct clone_args *cl_args, size_t size);
> > +
> > +   The parameters are passed in registers from userland:
> > +   rdi: cl_args
> > +   rsi: size
> > +   rdx: fn
> > +   rcx: arg
> > +
> > +   The kernel expects:
> > +   rax: system call number
> > +   rdi: cl_args
> > +   rsi: size  */
> > +
> > +        .text
> > +ENTRY (__clone3)
> > +     /* Sanity check arguments.  */
> > +     movq    $-EINVAL, %rax
> > +     testq   %rdi, %rdi              /* No NULL cl_args pointer.  */
> > +     jz      SYSCALL_ERROR_LABEL
> > +     testq   %rdx, %rdx              /* No NULL function pointer.  */
> > +     jz      SYSCALL_ERROR_LABEL
>
> I don't think we need the error checking for the internal usage.

The clone3 wrapper can be for external use, just like clone.

> > +     /* Load the new stack size into R9.  */
> > +     movq    CLONE_ARGS_STACK_SIZE_OFFSET(%rdi), %r9
> > +
> > +     /* Load the new stack bottom into R8.  */
> > +     movq    CLONE_ARGS_STACK_OFFSET(%rdi), %r8
> > +
> > +     /* Make room for 2 8-byte slots on the new stack.  */
> > +     subq    $16, %r9
> > +
> > +     /* Insert the argument onto the new stack.  */
> > +     movq    %rcx, 8(%r8, %r9)
> > +
> > +     /* Save the function pointer.  It will be popped off in the
> > +        child.  */
> > +     movq    %rdx, (%r8, %r9)
> > +
> > +     /* Update the stack size field passed to clone3.  */
> > +     movq    %r9, CLONE_ARGS_STACK_SIZE_OFFSET(%rdi)
>
> Why is this necessary?  This is rather surprising.  I think you should
> avoid modification here.  Can you pass the data to thread_start in
> registers instead?
>

Fixed.

I will submit the v4 patch.


-- 
H.J.

      reply	other threads:[~2021-05-10 23:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-14 22:45 [PATCH] " H.J. Lu
2021-02-15 10:44 ` Florian Weimer
2021-02-15 14:17   ` H.J. Lu
2021-02-15 14:27     ` Florian Weimer
2021-02-15 14:44       ` H.J. Lu
2021-02-15 14:39   ` Adhemerval Zanella
2021-03-04 18:40     ` [PATCH v2] " H.J. Lu
2021-03-04 18:50       ` H.J. Lu
2021-03-05  8:32         ` Christian Brauner
2021-03-05 16:59           ` [PATCH v3] " H.J. Lu
2021-03-18 23:18             ` PING^1 " H.J. Lu
2021-04-19 12:56             ` H.J. Lu
2021-05-10  8:17             ` Florian Weimer
2021-05-10 23:56               ` H.J. Lu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMe9rOpYgDhJ1EM3otfTkcOuZmfQWpxqy3Ny33GM=TDo-NhibA@mail.gmail.com' \
    --to=hjl.tools@gmail.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).