public inbox for glibc-bugs@sourceware.org
* [Bug libc/29115] New: vfork()-based posix_spawn() has more failure modes than fork()-based one
@ 2022-05-02 12:08 izbyshev at ispras dot ru
  2022-05-02 12:09 ` [Bug libc/29115] " izbyshev at ispras dot ru
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: izbyshev at ispras dot ru @ 2022-05-02 12:08 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

            Bug ID: 29115
           Summary: vfork()-based posix_spawn() has more failure modes
                    than fork()-based one
           Product: glibc
           Version: 2.35
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: izbyshev at ispras dot ru
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Modern vfork()-based posix_spawn() can be used as an efficient alternative to
fork()/exec() to avoid performance and overcommit issues. A common expectation
is that whenever the posix_spawn() feature set is sufficient for the
application's needs in tweaking child attributes, it can be used instead of
fork()/exec().

However, it turns out that vfork() can have failure modes that fork() doesn't
have. One such case is due to Linux not allowing processes in different time
namespaces to share an address space.

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <spawn.h>
#include <unistd.h>

int main(int argc, char *argv[], char *envp[]) {
  if (getenv("TEST_FORK")) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 127;
    }
    if (pid == 0) {
        execve(argv[1], argv + 1, envp);
        perror("execve");
        return 127;
    }
  } else {
      int err = posix_spawn(0, argv[1], 0, 0, argv + 1, envp);
      if (err) {
        printf("posix_spawn: %s\n", strerror(err));
        return 127;
      }
  }
  printf("OK\n");
  return 0;
}

$ gcc test.c

$ unshare -UrT ./a.out /bin/true
posix_spawn: Operation not permitted

(The actual clone() error is EINVAL, but it's reported incorrectly due to bug
29109).

$ TEST_FORK=1 unshare -UrT ./a.out /bin/true
OK

I'm not aware of other failure modes, but more might appear in the future.

Does this qualify as a glibc bug? Should glibc's posix_spawn() implementation,
for example, retry with fork() on vfork() failure (which would require a
redesign of error reporting from the child process because it currently relies
on address space sharing)?

Or are applications expected to deal with that somehow? In this case, what
is the recommended way to do that, given that it's not possible to reliably
detect "retriable" posix_spawn() failures?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: izbyshev at ispras dot ru @ 2022-05-02 12:09 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

Alexey Izbyshev <izbyshev at ispras dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adhemerval.zanella at linaro dot org,
                   |                            |fweimer at redhat dot com

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: adhemerval.zanella at linaro dot org @ 2022-05-02 16:17 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #1 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Alexey Izbyshev from comment #0)
> Modern vfork()-based posix_spawn() can be used as an efficient alternative
> to fork()/exec() to avoid performance and overcommit issues. A common
> expectation is that whenever posix_spawn() feature set is sufficient for
> application needs of tweaking the child attributes, it can be used instead
> of fork()/exec().
> 
> However, it turns out that vfork() can have failure modes that fork()
> doesn't have. One such case is due to Linux not allowing processes in
> different time namespaces to share address space.
> 
> $ cat test.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <spawn.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[], char *envp[]) {
>   if (getenv("TEST_FORK")) {
>     pid_t pid = fork();
>     if (pid < 0) {
>         perror("fork");
>         return 127;
>     }
>     if (pid == 0) {
>         execve(argv[1], argv + 1, envp);
>         perror("execve");
>         return 127;
>     }
>   } else {
>       int err = posix_spawn(0, argv[1], 0, 0, argv + 1, envp);
>       if (err) {
>         printf("posix_spawn: %s\n", strerror(err));
>         return 127;
>       }
>   }
>   printf("OK\n");
>   return 0;
> }
> 
> $ gcc test.c
> 
> $ unshare -UrT ./a.out /bin/true
> posix_spawn: Operation not permitted
> 
> (The actual clone() error is EINVAL, but it's reported incorrectly due to
> bug 29109).
> 
> $ TEST_FORK=1 unshare -UrT ./a.out /bin/true
> OK
> 
> I'm not aware of other failure modes, but more might appear in the future.
> 
> Does this qualify as a glibc bug? Should glibc's posix_spawn()
> implementation, for example, retry with fork() on vfork() failure (which
> would require a redesign of error reporting from the child process because
> it currently relies on address space sharing)?
> 
> Or are applications expected to deal with that somehow? In this case,
> what is the recommended way to do that, given that it's not possible to
> reliably detect "retriable" posix_spawn() failures?

It is really annoying that the kernel does not allow clone (CLONE_VM |
CLONE_VFORK) with a time namespace; however, I am not sure of the implications
of allowing it (nor whether it is feasible in the current kernel architecture).

In any case, adding a fork+exec fallback seems feasible. The only annoying case
is whether glibc should distinguish a transient clone failure (for instance,
due to some resource exhaustion) from namespace filtering. We can always retry
in case of clone failure; it should really be an exception, and retrying will
most likely succeed in both cases.

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: adhemerval.zanella at linaro dot org @ 2022-05-02 16:26 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #2 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
Another issue is that a fork+exec fallback would require additional resources
to communicate a possible error code from the helper process while running
the prepare phase (as covered by tst-spawn3.c).

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: izbyshev at ispras dot ru @ 2022-05-02 16:55 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #3 from Alexey Izbyshev <izbyshev at ispras dot ru> ---
> It is really annoying that kernel does not allow clone (CLONE_VM |
> CLONE_VFORK) with time namespace, however I am not sure of the implications
> of allowing it (neither if this is feasible on current kernel architecture).
> 
I suspect this restriction is due to the conflict of the shared address space
and the need to provide different VDSOs (for clock_gettime()) for processes in
separate time namespaces, but I haven't looked closely.

> In any case, adding fork+exec fallback seems feasible, the only annoying
> case is if glibc should detect a clone transient failure (for instance due
> some resource exhaustion) from a namespace filtering. We can always retry in
> case of clone failure, it should be really an exception and retrying will
> most likely succeed in both cases.

I think it would be great if glibc provided such a fallback. I agree that
retrying once with fork() in case of *any* clone(CLONE_VM | CLONE_VFORK)
failure shouldn't hurt, but it should probably also be OK to skip retry on
ENOMEM and (paradoxically) EAGAIN because the caller has to deal with them in
any case.
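
As a rough illustration of that policy, the sketch below decides whether a
fork()-based retry is worthwhile for a given clone() error. The helper name
should_retry_with_fork is hypothetical and not part of glibc, and it assumes
the caller can see the raw errno from clone(CLONE_VM | CLONE_VFORK), which is
only true inside the posix_spawn() implementation.

```c
#include <errno.h>

/* Hypothetical helper: returns nonzero if a failed
   clone(CLONE_VM | CLONE_VFORK) should be retried once with fork().
   ENOMEM and EAGAIN are skipped because fork() can fail the same way
   and the caller has to handle them in any case. */
int should_retry_with_fork(int clone_errno)
{
  if (clone_errno == 0)
    return 0;               /* clone succeeded, nothing to retry */
  if (clone_errno == ENOMEM || clone_errno == EAGAIN)
    return 0;               /* resource exhaustion: report to the caller */
  return 1;                 /* e.g. EINVAL from the time namespace check */
}
```

Applications cannot apply the same test to the value returned by
posix_spawn(), since that value may also come from a failed execve() or file
action in the child.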

> Another issue is with fork+exec fallback it would require additional
> resources to communicate the possible error code from the helper process
> while running the prepare phase (as covered by tst-spawn3.c).

Yes, I'm aware that glibc currently relies on address space sharing to pass the
error code, so adding an alternative error reporting would constitute most of
the fix.

One benefit of the alternative error reporting is that it would also work
correctly in environments where vfork() system call acts as fork() (i.e.
doesn't provide address space sharing), such as qemu-user. So if it's in place,
glibc could add some knob to always enable it for users that need good
posix_spawn() error reporting in such environments.
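
One common way to report errors without a shared address space is a
close-on-exec pipe. The sketch below shows only that general technique; it is
not the glibc implementation or the proposed patch, and the function name
fork_exec_report is made up for illustration. The pipe is the extra resource
that such a scheme has to allocate.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork()+execve() with the child's execve() error
   reported to the parent through an O_CLOEXEC pipe.  Returns 0 on success
   or an errno value on failure. */
int fork_exec_report(pid_t *out, const char *path,
                     char *const argv[], char *const envp[])
{
  int fds[2];
  if (pipe2(fds, O_CLOEXEC) < 0)
    return errno;                     /* the additional failure path */

  pid_t pid = fork();
  if (pid < 0) {
    int e = errno;
    close(fds[0]);
    close(fds[1]);
    return e;
  }
  if (pid == 0) {
    close(fds[0]);
    execve(path, argv, envp);
    int e = errno;                    /* only reached if execve() failed */
    ssize_t w = write(fds[1], &e, sizeof e);
    (void) w;
    _exit(127);
  }

  close(fds[1]);
  int child_err = 0;
  ssize_t n;
  do
    n = read(fds[0], &child_err, sizeof child_err);
  while (n < 0 && errno == EINTR);
  close(fds[0]);

  if (n == (ssize_t) sizeof child_err) {
    waitpid(pid, NULL, 0);            /* reap the failed child */
    return child_err;
  }
  /* End of file: the write end was closed by a successful execve(). */
  if (out)
    *out = pid;
  return 0;
}
```

A call such as fork_exec_report(&pid, argv[1], argv + 1, envp) would slot into
the test program from comment #0 in place of posix_spawn().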

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: adhemerval.zanella at linaro dot org @ 2022-05-02 17:17 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #4 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Alexey Izbyshev from comment #3)
> > It is really annoying that kernel does not allow clone (CLONE_VM |
> > CLONE_VFORK) with time namespace, however I am not sure of the
> > implications of allowing it (neither if this is feasible on current
> > kernel architecture).
> > 
> I suspect this restriction is due to the conflict of the shared address
> space and the need to provide different VDSOs (for clock_gettime()) for
> processes in separate time namespaces, but I haven't looked closely.
> 
> > In any case, adding fork+exec fallback seems feasible, the only annoying
> > case is if glibc should detect a clone transient failure (for instance
> > due some resource exhaustion) from a namespace filtering. We can always
> > retry in case of clone failure, it should be really an exception and
> > retrying will most likely succeed in both cases.
> 
> I think it would be great if glibc provided such a fallback. I agree that
> retrying once with fork() in case of *any* clone(CLONE_VM | CLONE_VFORK)
> failure shouldn't hurt, but it should probably also be OK to skip retry on
> ENOMEM and (paradoxically) EAGAIN because the caller has to deal with them
> in any case.

It makes sense indeed.

> 
> > Another issue is with fork+exec fallback it would require additional
> > resources to communicate the possible error code from the helper process
> > while running the prepare phase (as covered by tst-spawn3.c).
> 
> Yes, I'm aware that glibc currently relies on address space sharing to pass
> the error code, so adding an alternative error reporting would constitute
> most of the fix.
> 
> One benefit of the alternative error reporting is that it would also work
> correctly in environments where vfork() system call acts as fork() (i.e.
> doesn't provide address space sharing), such as qemu-user. So if it's in
> place, glibc could add some knob to always enable it for users that need
> good posix_spawn() error reporting in such environments.

We discussed this before, and we moved to using shared memory to report errors
precisely to avoid requiring another resource allocation (which adds even more
failure paths).  I am not very fond of restoring it.
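
For comparison, the shared-memory approach can be sketched as follows. This is
a simplified illustration and not the glibc code: the names spawn_args,
child_fn and spawn_shared_err are hypothetical, the stack size is an arbitrary
assumption, and a real implementation needs more care (signal masking, file
actions, attributes). It relies on CLONE_VM making the child's store visible
to the parent and on CLONE_VFORK suspending the parent until the child execs
or exits.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block shared between parent and child. */
struct spawn_args {
  const char *path;
  char *const *argv;
  char *const *envp;
  int err;                /* written by the child, read by the parent */
};

static int child_fn(void *p)
{
  struct spawn_args *a = p;
  execve(a->path, a->argv, a->envp);
  a->err = errno;         /* visible to the parent because of CLONE_VM */
  _exit(127);
}

/* Returns 0 on success or an errno value; no pipe is needed. */
int spawn_shared_err(pid_t *out, const char *path,
                     char *const argv[], char *const envp[])
{
  enum { STACK_SIZE = 64 * 1024 };    /* assumed size for this sketch */
  char *stack = malloc(STACK_SIZE);
  if (stack == NULL)
    return ENOMEM;

  struct spawn_args a = { path, argv, envp, 0 };
  /* The stack grows down on most architectures, so pass its top. */
  pid_t pid = clone(child_fn, stack + STACK_SIZE,
                    CLONE_VM | CLONE_VFORK | SIGCHLD, &a);
  int err;
  if (pid < 0) {
    err = errno;          /* e.g. EINVAL under the time namespace check */
  } else {
    /* CLONE_VFORK: the child has already exec'ed or exited here. */
    err = a.err;
    if (err != 0)
      waitpid(pid, NULL, 0);
    else if (out != NULL)
      *out = pid;
  }
  free(stack);
  return err;
}
```

The only allocation besides the child itself is the stack, which is why this
scheme has fewer failure paths than a pipe-based one.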

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: adhemerval.zanella at linaro dot org @ 2022-05-02 18:04 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org   |adhemerval.zanella at linaro dot org

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: carlos at redhat dot com @ 2022-05-02 20:38 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #5 from Carlos O'Donell <carlos at redhat dot com> ---
Either the kernel supports vfork or it doesn't.

A time namespace or a seccomp filter are all the same problem, and we should
return the error to userspace.

Adding code which will only be exercised in the event that a time namespace is
in use is going to result in increased long-term maintenance costs.

It also results in surprising behaviour when the developer runs under a time
namespace, e.g. more memory usage, different code paths taken, etc.

Rather than add long-term maintenance and surprise developers, my suggestion is
to fail the posix_spawn.

Users can take this up with the kernel to add support for vfork in time
namespaces, or with their sandboxing technology provider.

There might be exceptional cases where we need to add fallbacks, but I can't
see that this is one of those cases. For example clone3 to clone2 fallback is
sensible.

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: fweimer at redhat dot com @ 2022-05-02 20:43 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #6 from Florian Weimer <fweimer at redhat dot com> ---
CLONE_NEWTIME is as specified today fundamentally incompatible with real vfork
and the vDSO. It just does not work. Entering the new namespace requires a new
vDSO data mapping, and that conflicts with vfork using the same address space.

CLONE_NEWTIME should have been specified to apply only after execve (which
remaps the vDSO anyway), but it's what we've got.
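
To make the condition concrete: after unshare(CLONE_NEWTIME) the caller stays
in its old time namespace and only its future children enter the new one, so
the two /proc/self/ns links differ. The sketch below checks for that state. It
assumes a kernel that exposes /proc/self/ns/time and
/proc/self/ns/time_for_children (Linux 5.6 or later), and the function name is
hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if children of this process would be
   placed in a different time namespace than the caller, 0 if not, and
   -1 if the links cannot be read (e.g. no time namespace support). */
int time_ns_differs_for_children(void)
{
  char self[64], kids[64];
  ssize_t a = readlink("/proc/self/ns/time", self, sizeof self - 1);
  ssize_t b = readlink("/proc/self/ns/time_for_children", kids,
                       sizeof kids - 1);
  if (a < 0 || b < 0)
    return -1;
  self[a] = '\0';         /* readlink() does not NUL-terminate */
  kids[b] = '\0';
  return strcmp(self, kids) != 0;
}
```

Under `unshare -UrT` this would be expected to return 1 in the launched
program, which is the situation in which the vfork()-style clone was rejected.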

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: izbyshev at ispras dot ru @ 2022-05-02 20:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #7 from Alexey Izbyshev <izbyshev at ispras dot ru> ---
(In reply to Carlos O'Donell from comment #5)
> Either the kernel supports vfork or it doesn't.
> 
> A time namespace, or a seccomp filter are all the same problems, and we
> should return the error to userspace.
> 
> Adding code which will only be exercised in the event that a time namespace
> is in use is going to result in increased long-term maintenance costs.
> 
> It also results in unexpected surprise behaviour when the developer runs
> under a time namespace e.g. more memory usage, different code paths taken
> etc.
> 
> Rather than add long-term maintenance and surprise developers my suggestion
> is to fail the posix_spawn.
> 
posix_spawn() failing and fork()/exec() not failing is also a surprise for
developers. Note that if users are expected to deal with this posix_spawn()
failure, all language frameworks/libraries providing high-level process
creation APIs will have to implement knobs to opt out of posix_spawn(). It's
not clear to me that this is better than a potential performance problem due
to fork() when time namespaces are used.

We also don't know what other vfork() failure modes that fork() doesn't have
may appear in the future. A fallback would cover them.

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: carlos at redhat dot com @ 2022-05-02 21:02 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #8 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Florian Weimer from comment #6)
> CLONE_NEWTIME is as specified today fundamentally incompatible with real
> vfork and the vDSO. It just does not work. Entering the new namespace
> requires a new vDSO data mapping, and that conflicts with vfork using the
> same address space.

The kernel already has per-cpu data in the vDSO.

The vDSO doesn't follow any concept of a single address space for the process.

The vDSO is not a part of POSIX and so doesn't have to follow any vfork
semantic requirements.

What prevents the kernel from making a new vDSO data mapping?

> CLONE_NEWTIME should have been specified to apply only after execve (which
> remaps the vDSO anyway), but it's what we've got.

It can remain that way, we just have to remap the vDSO at vfork time?

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: fweimer at redhat dot com @ 2022-05-02 21:06 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #9 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Carlos O'Donell from comment #8)
> (In reply to Florian Weimer from comment #6)
> > CLONE_NEWTIME is as specified today fundamentally incompatible with real
> > vfork and the vDSO. It just does not work. Entering the new namespace
> > requires a new vDSO data mapping, and that conflicts with vfork using the
> > same address space.
> 
> The kernel already has per-cpu data in the vDSO.

Uh, since when? I thought that Linux didn't do per-CPU page tables.

> The vDSO doesn't follow any concept of a single address space for the
> process.
> 
> The vDSO is not a part of POSIX and so doesn't have to follow any vfork
> semantic requirements.
> 
> What prevents the kernel from making a new vDSO data mapping?

It requires creating a new VM for the vfork process, while preserving existing
shared VM semantics in other regards. That seems difficult?

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: carlos at redhat dot com @ 2022-05-02 21:15 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #10 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Alexey Izbyshev from comment #7)
> (In reply to Carlos O'Donell from comment #5)
> > Either the kernel supports vfork or it doesn't.
> > 
> > A time namespace, or a seccomp filter are all the same problems, and we
> > should return the error to userspace.
> > 
> > Adding code which will only be exercised in the event that a time namespace
> > is in use is going to result in increased long-term maintenance costs.
> > 
> > It also results in unexpected surprise behaviour when the developer runs
> > under a time namespace e.g. more memory usage, different code paths taken
> > etc.
> > 
> > Rather than add long-term maintenance and surprise developers my suggestion
> > is to fail the posix_spawn.
> > 
> posix_spawn() failing and fork()/exec() not failing is also a surprise for
> developers. Note that if users are expected to deal with this posix_spawn()
> failure, all language frameworks/libraries providing high level process
> creation APIs will have to implement knobs to opt-out from posix_spawn().
> It's not clear to me that it's better than a potential performance problem
> due to fork() when time namespaces are used.
> 
> We also don't know what other vfork() failure modes that fork() doesn't have
> may appear in the future. A fallback would cover them.

That is a slippery slope fallacy. Those other failure modes haven't materialized
and so they do not matter to the conversation at hand. When we have other
failure modes, and fork() can also fail badly as it consumes more memory,
maybe triggering OOM, we have other problems.

Performance and expected semantics are an important part of an interface.
Library and application authors would not only have to change posix_spawn() as
a choice but also system() which may use vfork(), and maybe even clone (if used
with the right flags).

All of this makes me suspect that blocking vfork is the wrong semantic. It
needs to be enabled in the kernel otherwise the CRIU use case is *not met*.

We can't add CLONE_NEWTIME and yet require all of userspace to move away from
vfork/clone which is the fastest and least-memory intensive way to clone a
process.

This change adds significant code to the implementation. Please involve the
CRIU developers and see if this can't be solved in the kernel first. I haven't
seen any justification that there are blockers to this in the kernel.

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: carlos at redhat dot com @ 2022-05-02 21:24 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #11 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Florian Weimer from comment #9)
> (In reply to Carlos O'Donell from comment #8)
> > (In reply to Florian Weimer from comment #6)
> > > CLONE_NEWTIME is as specified today fundamentally incompatible with real
> > > vfork and the vDSO. It just does not work. Entering the new namespace
> > > requires a new vDSO data mapping, and that conflicts with vfork using the
> > > same address space.
> > 
> > The kernel already has per-cpu data in the vDSO.
> 
> Uh, since when? I thought that Linux didn't do per-CPU page tables.

So, this is a stretch, but on x86 you use the GDT to get the per-CPU data.

Is this not what we could call per-cpu data in a distinct address space?

> > The vDSO doesn't follow any concept of a single address space for the
> > process.
> > 
> > The vDSO is not a part of POSIX and so doesn't have to follow any vfork
> > semantic requirements.
> > 
> > What prevents the kernel from making a new vDSO data mapping?
> 
> It requires creating a new VM for the vfork process, while preserving
> existing shared VM semantics in other regards. That seems difficult?

I don't know until a kernel developer tells me this is difficult :-)

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: adhemerval.zanella at linaro dot org @ 2022-05-02 21:51 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #12 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Carlos O'Donell from comment #10)
> All of this makes me suspect that blocking vfork is the wrong semantic. It
> needs to be enabled in the kernel otherwise the CRIU use case is *not met*.
> 

We need the CLONE_VFORK semantic as a QoI. Otherwise, it would require
synchronizing with a pipe or similar facility and thus require additional
resources (which might fail in some constrained environments).

> We can't add CLONE_NEWTIME and yet require all of userspace to move away
> from vfork/clone which is the fastest and least-memory intensive way to
> clone a process.
> 
> This change adds significant code to the implementation. Please involve the
> CRIU developers and see if this can't be solved in the kernel first. I
> haven't seen any justification that there are blockers to this in the kernel.

I tend to agree it adds maintenance burden, but since some kernels already ship
time namespace support, not having a fallback (or kernel developers deciding
not to fix it) will make posix_spawn an unappealing API.

* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
From: adhemerval.zanella at linaro dot org @ 2022-08-08 14:08 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #13 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
It has now been fixed in the Linux kernel
(133e2d3e81de5d9706cab2dd1d52d231c27382e5). Based on the previous discussion,
where it was noted that fixing only glibc does not help direct users of clone
(CLONE_VM | CLONE_VFORK), I will drop my fix [1].  We will need to ask for a
backport of the kernel fix to support this usage.

[1] https://patchwork.sourceware.org/project/glibc/list/?series=9797
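
For readers unfamiliar with the direct clone usage mentioned above, a minimal
sketch follows. The names spawn_with_clone, child_fn and struct exec_args are
made up for this example, the 64 KiB stack size is an arbitrary assumption,
and signal handling is omitted. On a kernel without the fix, the clone() call
itself is expected to fail for a caller that has unshared its time namespace,
and no glibc-side change can help such code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* For clone() and the CLONE_* flags.  */
#endif
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block shared between parent and child.  */
struct exec_args
{
  const char *path;
  char *const *argv;
  char *const *envp;
  int err;                      /* Set by the child if execve() fails.  */
};

static int
child_fn (void *p)
{
  struct exec_args *a = p;
  execve (a->path, a->argv, a->envp);
  /* CLONE_VM: this store is visible to the parent once it resumes.  */
  a->err = errno;
  _exit (127);
}

/* Returns 0 on success (child pid stored in *pid_out), otherwise an errno
   value from clone() or from execve() in the child.  */
static int
spawn_with_clone (pid_t *pid_out, const char *path,
                  char *const argv[], char *const envp[])
{
  enum { STACK_SIZE = 64 * 1024 };
  char *stack = malloc (STACK_SIZE);
  if (stack == NULL)
    return ENOMEM;

  struct exec_args a = { path, argv, envp, 0 };
  /* Assumes a downward-growing stack, hence the pointer to its top.  */
  pid_t pid = clone (child_fn, stack + STACK_SIZE,
                     CLONE_VM | CLONE_VFORK | SIGCHLD, &a);
  int err = pid < 0 ? errno : a.err;
  /* CLONE_VFORK: the child has already exec'd or exited at this point.  */
  free (stack);

  if (pid < 0)
    return err;
  if (err != 0)
    {
      waitpid (pid, NULL, 0);   /* Reap the child whose exec failed.  */
      return err;
    }
  if (pid_out != NULL)
    *pid_out = pid;
  return 0;
}
```

Because the parent stays suspended until the child execs or exits, the exec
error can be passed back through shared memory without any pipe.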


* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
  2022-05-02 12:08 [Bug libc/29115] New: vfork()-based posix_spawn() has more failure modes than fork()-based one izbyshev at ispras dot ru
                   ` (14 preceding siblings ...)
  2022-08-08 14:08 ` adhemerval.zanella at linaro dot org
@ 2022-08-08 14:13 ` fweimer at redhat dot com
  2022-08-08 14:15 ` fweimer at redhat dot com
  2022-08-08 15:37 ` izbyshev at ispras dot ru
  17 siblings, 0 replies; 19+ messages in thread
From: fweimer at redhat dot com @ 2022-08-08 14:13 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.redhat.com
                   |                            |/show_bug.cgi?id=2116442


* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
  2022-05-02 12:08 [Bug libc/29115] New: vfork()-based posix_spawn() has more failure modes than fork()-based one izbyshev at ispras dot ru
                   ` (15 preceding siblings ...)
  2022-08-08 14:13 ` fweimer at redhat dot com
@ 2022-08-08 14:15 ` fweimer at redhat dot com
  2022-08-08 15:37 ` izbyshev at ispras dot ru
  17 siblings, 0 replies; 19+ messages in thread
From: fweimer at redhat dot com @ 2022-08-08 14:15 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.redhat.com
                   |                            |/show_bug.cgi?id=2116444


* [Bug libc/29115] vfork()-based posix_spawn() has more failure modes than fork()-based one
  2022-05-02 12:08 [Bug libc/29115] New: vfork()-based posix_spawn() has more failure modes than fork()-based one izbyshev at ispras dot ru
                   ` (16 preceding siblings ...)
  2022-08-08 14:15 ` fweimer at redhat dot com
@ 2022-08-08 15:37 ` izbyshev at ispras dot ru
  17 siblings, 0 replies; 19+ messages in thread
From: izbyshev at ispras dot ru @ 2022-08-08 15:37 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29115

--- Comment #14 from Alexey Izbyshev <izbyshev at ispras dot ru> ---
Great to see it fixed in the kernel, thanks to everybody involved!


end of thread, other threads:[~2022-08-08 15:37 UTC | newest]

Thread overview: 19+ messages
2022-05-02 12:08 [Bug libc/29115] New: vfork()-based posix_spawn() has more failure modes than fork()-based one izbyshev at ispras dot ru
2022-05-02 12:09 ` [Bug libc/29115] " izbyshev at ispras dot ru
2022-05-02 16:17 ` adhemerval.zanella at linaro dot org
2022-05-02 16:26 ` adhemerval.zanella at linaro dot org
2022-05-02 16:55 ` izbyshev at ispras dot ru
2022-05-02 17:17 ` adhemerval.zanella at linaro dot org
2022-05-02 18:04 ` adhemerval.zanella at linaro dot org
2022-05-02 20:38 ` carlos at redhat dot com
2022-05-02 20:43 ` fweimer at redhat dot com
2022-05-02 20:56 ` izbyshev at ispras dot ru
2022-05-02 21:02 ` carlos at redhat dot com
2022-05-02 21:06 ` fweimer at redhat dot com
2022-05-02 21:15 ` carlos at redhat dot com
2022-05-02 21:24 ` carlos at redhat dot com
2022-05-02 21:51 ` adhemerval.zanella at linaro dot org
2022-08-08 14:08 ` adhemerval.zanella at linaro dot org
2022-08-08 14:13 ` fweimer at redhat dot com
2022-08-08 14:15 ` fweimer at redhat dot com
2022-08-08 15:37 ` izbyshev at ispras dot ru
