public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* system and popen fail in case of big application
@ 2018-09-09 22:16 Sergey Melnikov
  2018-09-09 22:44 ` Zack Weinberg
  0 siblings, 1 reply; 26+ messages in thread
From: Sergey Melnikov @ 2018-09-09 22:16 UTC (permalink / raw)
  To: libc-alpha

Hi all,

I'm developing JNI code for a Java application. My code works fine in
case of small heap size. Nevertheless, if I set heap size to something
reasonably big (like 20Gb on my dev PC with 32Gb of RAM), 'popen' and
'system' glibc calls don't work with ENOMEM errno.

It looks like I'm not the first person who has faced this issue since
there is quite the same question on stackoverflow
(https://stackoverflow.com/questions/46574798/enomem-from-popen-for-system-while-there-is-enough-memory?rq=1).

So, my question is if does it worth to rewrite existing popen/system
calls with posix_spawn as StackOverflow recommends? If so, I may
implement and contribute this to glibc.

--Sergey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-09 22:16 system and popen fail in case of big application Sergey Melnikov
@ 2018-09-09 22:44 ` Zack Weinberg
  2018-09-09 22:47   ` Zack Weinberg
  2018-09-11 20:16   ` Adhemerval Zanella
  0 siblings, 2 replies; 26+ messages in thread
From: Zack Weinberg @ 2018-09-09 22:44 UTC (permalink / raw)
  To: melnikov.sergey.v; +Cc: GNU C Library

On Sun, Sep 9, 2018 at 6:16 PM Sergey Melnikov
<melnikov.sergey.v@gmail.com> wrote:
>
> Hi all,
>
> I'm developing JNI code for a Java application. My code works fine in
> case of small heap size. Nevertheless, if I set heap size to something
> reasonably big (like 20Gb on my dev PC with 32Gb of RAM), 'popen' and
> 'system' glibc calls don't work with ENOMEM errno.
>
> So, my question is if does it worth to rewrite existing popen/system
> calls with posix_spawn as StackOverflow recommends? If so, I may
> implement and contribute this to glibc.

What would be worthwhile is to change popen and system so that they
internally use *vfork*, on systems where vfork is not the same as
fork. posix_spawn is just an elaborate and inconvenient wrapper around
vfork, and I don't think we should be using it internally or
encouraging its use by others.  Also, unlike vfork, it does not
guarantee to avoid the problem you are experiencing.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-09 22:44 ` Zack Weinberg
@ 2018-09-09 22:47   ` Zack Weinberg
  2018-09-09 23:27     ` Sergey Melnikov
  2018-09-11 20:16   ` Adhemerval Zanella
  1 sibling, 1 reply; 26+ messages in thread
From: Zack Weinberg @ 2018-09-09 22:47 UTC (permalink / raw)
  To: melnikov.sergey.v; +Cc: GNU C Library

On Sun, Sep 9, 2018 at 6:44 PM Zack Weinberg <zackw@panix.com> wrote:
> On Sun, Sep 9, 2018 at 6:16 PM Sergey Melnikov
> <melnikov.sergey.v@gmail.com> wrote:
> > If so, I may implement and contribute this to glibc.
>
> What would be worthwhile is to change popen and system so that they
> internally use *vfork*, on systems where vfork is not the same as fork.

I meant to say explicitly that if you write patches that do this, we
will definitely consider them for inclusion.  However, you're going to
need to file a copyright assignment with the FSF.  This can be slow,
so you should get the process started immediately.  See
https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-09 22:47   ` Zack Weinberg
@ 2018-09-09 23:27     ` Sergey Melnikov
  2018-09-09 23:49       ` Zack Weinberg
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Sergey Melnikov @ 2018-09-09 23:27 UTC (permalink / raw)
  To: zackw; +Cc: libc-alpha

Mark,

Thank you for your guidance. I'll write man for vfork and prepare a patch.

Regarding FSF. I had signed FSF during my work at Intel (I worked on
gcc for android x86). Should I sign this one more time as an
individual contributor (I don't work at Intel anymore)?

--Sergey
On Mon, Sep 10, 2018 at 1:47 AM Zack Weinberg <zackw@panix.com> wrote:
>
> On Sun, Sep 9, 2018 at 6:44 PM Zack Weinberg <zackw@panix.com> wrote:
> > On Sun, Sep 9, 2018 at 6:16 PM Sergey Melnikov
> > <melnikov.sergey.v@gmail.com> wrote:
> > > If so, I may implement and contribute this to glibc.
> >
> > What would be worthwhile is to change popen and system so that they
> > internally use *vfork*, on systems where vfork is not the same as fork.
>
> I meant to say explicitly that if you write patches that do this, we
> will definitely consider them for inclusion.  However, you're going to
> need to file a copyright assignment with the FSF.  This can be slow,
> so you should get the process started immediately.  See
> https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
> .
>
> zw



-- 

--Sergey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-09 23:27     ` Sergey Melnikov
@ 2018-09-09 23:49       ` Zack Weinberg
  2018-09-10  2:50         ` Martin Buchholz
  2018-09-10 15:38       ` Joseph Myers
       [not found]       ` <CA+kOe0_1k_PbJ-pjHznP4AmTJUgziAdT+4vCcCRSb7GGdvbv7Q@mail.gmail.com>
  2 siblings, 1 reply; 26+ messages in thread
From: Zack Weinberg @ 2018-09-09 23:49 UTC (permalink / raw)
  To: melnikov.sergey.v; +Cc: GNU C Library, Carlos O'Donell, Joseph Myers

On Sun, Sep 9, 2018 at 7:27 PM Sergey Melnikov
<melnikov.sergey.v@gmail.com> wrote:
> Regarding FSF. I had signed FSF during my work at Intel (I worked on
> gcc for android x86). Should I sign this one more time as an
> individual contributor (I don't work at Intel anymore)?

It depends on exactly what you signed during your time at Intel.  I'm
cc:ing two people who have access to the FSF's records and can tell
you what, if anything, you should do next.

Incidentally, this is an old-fashioned mailing list, please reply
below the text you are responding to, and trim out all the quoted text
except what you are responding to (see
http://howto-pages.org/posting_style/).  Yes, I know Gmail makes this
difficult.  That's Gmail's bug, and we need you to work around it.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-09 23:49       ` Zack Weinberg
@ 2018-09-10  2:50         ` Martin Buchholz
  0 siblings, 0 replies; 26+ messages in thread
From: Martin Buchholz @ 2018-09-10  2:50 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Sergey Melnikov, GNU C Library, Carlos O'Donell, Joseph Myers

See discussion in the source file
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/java.base/unix/native/libjava/ProcessImpl_md.c

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-09 23:27     ` Sergey Melnikov
  2018-09-09 23:49       ` Zack Weinberg
@ 2018-09-10 15:38       ` Joseph Myers
  2018-09-10 15:42         ` Zack Weinberg
       [not found]       ` <CA+kOe0_1k_PbJ-pjHznP4AmTJUgziAdT+4vCcCRSb7GGdvbv7Q@mail.gmail.com>
  2 siblings, 1 reply; 26+ messages in thread
From: Joseph Myers @ 2018-09-10 15:38 UTC (permalink / raw)
  To: Sergey Melnikov; +Cc: zackw, libc-alpha

On Mon, 10 Sep 2018, Sergey Melnikov wrote:

> Mark,
> 
> Thank you for your guidance. I'll write man for vfork and prepare a patch.
> 
> Regarding FSF. I had signed FSF during my work at Intel (I worked on
> gcc for android x86). Should I sign this one more time as an
> individual contributor (I don't work at Intel anymore)?

There's a corporate assignment from Intel for "GCC GLIBC GDB BINUTILS 
DEJAGNU GRUB" (2015-3-20; "Assigns Past and Future Changes (modified; see 
assignments)").  I don't see a personal assignment for you recorded.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-10 15:38       ` Joseph Myers
@ 2018-09-10 15:42         ` Zack Weinberg
  2018-09-10 15:43           ` Joseph Myers
  0 siblings, 1 reply; 26+ messages in thread
From: Zack Weinberg @ 2018-09-10 15:42 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Sergey Melnikov, GNU C Library

On Mon, Sep 10, 2018 at 11:38 AM Joseph Myers <joseph@codesourcery.com> wrote:
> On Mon, 10 Sep 2018, Sergey Melnikov wrote:
> > Regarding FSF. I had signed FSF during my work at Intel (I worked on
> > gcc for android x86). Should I sign this one more time as an
> > individual contributor (I don't work at Intel anymore)?
>
> There's a corporate assignment from Intel for "GCC GLIBC GDB BINUTILS
> DEJAGNU GRUB" (2015-3-20; "Assigns Past and Future Changes (modified; see
> assignments)").  I don't see a personal assignment for you recorded.

OK, since Sergey no longer works at Intel, I believe that means he
needs to file new paperwork?

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-10 15:42         ` Zack Weinberg
@ 2018-09-10 15:43           ` Joseph Myers
  0 siblings, 0 replies; 26+ messages in thread
From: Joseph Myers @ 2018-09-10 15:43 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Sergey Melnikov, GNU C Library

On Mon, 10 Sep 2018, Zack Weinberg wrote:

> On Mon, Sep 10, 2018 at 11:38 AM Joseph Myers <joseph@codesourcery.com> wrote:
> > On Mon, 10 Sep 2018, Sergey Melnikov wrote:
> > > Regarding FSF. I had signed FSF during my work at Intel (I worked on
> > > gcc for android x86). Should I sign this one more time as an
> > > individual contributor (I don't work at Intel anymore)?
> >
> > There's a corporate assignment from Intel for "GCC GLIBC GDB BINUTILS
> > DEJAGNU GRUB" (2015-3-20; "Assigns Past and Future Changes (modified; see
> > assignments)").  I don't see a personal assignment for you recorded.
> 
> OK, since Sergey no longer works at Intel, I believe that means he
> needs to file new paperwork?

Yes.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
       [not found]       ` <CA+kOe0_1k_PbJ-pjHznP4AmTJUgziAdT+4vCcCRSb7GGdvbv7Q@mail.gmail.com>
@ 2018-09-10 15:59         ` Zack Weinberg
  2018-09-10 16:55           ` Martin Buchholz
  0 siblings, 1 reply; 26+ messages in thread
From: Zack Weinberg @ 2018-09-10 15:59 UTC (permalink / raw)
  To: martinrb; +Cc: GNU C Library

On Sun, Sep 9, 2018 at 10:47 PM Martin Buchholz <martinrb@google.com> wrote:
>
> See discussion in the source file
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/java.base/unix/native/libjava/ProcessImpl_md.c

Thanks, that's useful information on the bigger picture.

FYI, the test program in PR #10311 runs to completion without error
under glibc 2.27, when compiled on x86_64-linux with either -m32 or
-m64.  I think this probably reflects our backing away from caching
the result of getpid() a few versions ago.  I don't want to say that
this has been _fixed_, but if you gave us a list of everything you
would actually want to do on the child side of a CLONE_VM without
CLONE_THREAD, I think we could now have a more constructive
conversation about whether that's something we could commit to keeping
working.

It's interesting that that cites not being able to reliably close all
inherited file descriptors as a reason not to use posix_spawn.  I have
thought for years that closefrom() should be universal, and I have yet
to hear an argument against it that wasn't, in my opinion, 100% bogus.
Perhaps we should push for it to be adopted in Linux.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-10 15:59         ` Zack Weinberg
@ 2018-09-10 16:55           ` Martin Buchholz
  0 siblings, 0 replies; 26+ messages in thread
From: Martin Buchholz @ 2018-09-10 16:55 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: GNU C Library

I updated the bug with a new comment.
https://sourceware.org/bugzilla/show_bug.cgi?id=10311#c10

I fumbled the bugzilla interface and added the same comment twice, but
don't see any way to delete one of them.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-09 22:44 ` Zack Weinberg
  2018-09-09 22:47   ` Zack Weinberg
@ 2018-09-11 20:16   ` Adhemerval Zanella
  2018-09-12 16:30     ` Zack Weinberg
  1 sibling, 1 reply; 26+ messages in thread
From: Adhemerval Zanella @ 2018-09-11 20:16 UTC (permalink / raw)
  To: libc-alpha



On 09/09/2018 19:44, Zack Weinberg wrote:
> On Sun, Sep 9, 2018 at 6:16 PM Sergey Melnikov
> <melnikov.sergey.v@gmail.com> wrote:
>>
>> Hi all,
>>
>> I'm developing JNI code for a Java application. My code works fine in
>> case of small heap size. Nevertheless, if I set heap size to something
>> reasonably big (like 20Gb on my dev PC with 32Gb of RAM), 'popen' and
>> 'system' glibc calls don't work with ENOMEM errno.
>>
>> So, my question is if does it worth to rewrite existing popen/system
>> calls with posix_spawn as StackOverflow recommends? If so, I may
>> implement and contribute this to glibc.
> 
> What would be worthwhile is to change popen and system so that they
> internally use *vfork*, on systems where vfork is not the same as
> fork. posix_spawn is just an elaborate and inconvenient wrapper around
> vfork, and I don't think we should be using it internally or
> encouraging its use by others.  Also, unlike vfork, it does not
> guarantee to avoid the problem you are experiencing.

We changed posix_spawn to exactly avoid call vfork due its inherent usage
issue with signal handlers (BZ#14750), cancellation state being shared 
with the parent (BZ#10354), and the vfork limitation of possible parent 
clobber due stack spilling (which would require some specific compiler
support to mark the call).  

I can't really see why you think not using posix_spawn, which addresses
all this issues, would be better to either user a problematic interface
(vfork) or replicate the logic all over arch-specific popen implementation
(since clone is Linux specific and it would require us to maintain a
generic implementation which uses fork and another one which uses
CLONE_VFORK with all the required machinery).

In fact think using posix_spawn should require a minimum extra memory
for each call (at least 32kb due BZ#21253 and I still think this change
is quite pessimist due current somewhat broken -fstack-check support).

So I would suggest otherwise: to use implement popen using posix_spawn
as generic code instead of current fork and avoid using the broken
vfork interface (which has been removed from POSIX 2008).


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-11 20:16   ` Adhemerval Zanella
@ 2018-09-12 16:30     ` Zack Weinberg
  2018-09-12 19:46       ` Martin Buchholz
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Zack Weinberg @ 2018-09-12 16:30 UTC (permalink / raw)
  To: adhemerval.zanella; +Cc: libc-alpha

On Tue, Sep 11, 2018 at 4:16 PM >
Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> On 09/09/2018 19:44, Zack Weinberg wrote:
> > On Sun, Sep 9, 2018 at 6:16 PM Sergey Melnikov
> > <melnikov.sergey.v@gmail.com> wrote:
> >> So, my question is if does it worth to rewrite existing popen/system
> >> calls with posix_spawn as StackOverflow recommends? If so, I may
> >> implement and contribute this to glibc.
> >
> > What would be worthwhile is to change popen and system so that they
> > internally use *vfork*, on systems where vfork is not the same as
> > fork. posix_spawn is just an elaborate and inconvenient wrapper around
> > vfork, and I don't think we should be using it internally or
> > encouraging its use by others.  Also, unlike vfork, it does not
> > guarantee to avoid the problem you are experiencing.
>
> We changed posix_spawn to exactly avoid call vfork due its inherent usage
> issue with signal handlers (BZ#14750), cancellation state being shared
> with the parent (BZ#10354), and the vfork limitation of possible parent
> clobber due stack spilling (which would require some specific compiler
> support to mark the call).

When I first read this, I thought you were saying you changed
posix_spawn to call plain old fork, which would have meant it would no
longer solve Sergey's problem.  After reading over BZ#14750, I see
that posix_spawn does still use the vfork primitive operation on
Linux, it just does it in a lower-level manner that allows you to
avoid the bugs you list (clone() with CLONE_VM|CLONE_VFORK and a
temporary stack, plus blocking internal signals, which application
code can't do for itself because we don't let it).

But I don't see any reason we couldn't expose your bug fixes to all
callers of vfork.  We might need to change its calling convention
(hypothetically, pid_t nvfork(void (*proc)(void *closure))) but we
could do that as a GNU extension.  That would fix #14750 and #10354
for _everyone_, and changing all existing internal subprocess creation
to use the hypothetical nvfork would be less work than changing it to
use posix_spawn, and give a cleaner result.

> I can't really see why you think not using posix_spawn, which addresses
> all this issues, would be better to either user a problematic interface
> (vfork)

I don't want to encourage people to use posix_spawn because
posix_spawn is a badly designed API.  It's difficult to use correctly.
It's *tedious* to use correctly, which means people won't bother.  It
can't do everything you might want to do on the child side (witness
the discussion of adding an extension to let it do chdir()).  Its
behavior in case of errors is underspecified.  And it might be
implemented in terms of fork, which means it doesn't guarantee to
solve Sergey's problem.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-12 16:30     ` Zack Weinberg
@ 2018-09-12 19:46       ` Martin Buchholz
  2018-09-13  1:27         ` Adhemerval Zanella
  2018-09-12 23:08       ` Adhemerval Zanella
  2018-09-13 12:11       ` Szabolcs Nagy
  2 siblings, 1 reply; 26+ messages in thread
From: Martin Buchholz @ 2018-09-12 19:46 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: adhemerval.zanella, GNU C Library

On Wed, Sep 12, 2018 at 9:29 AM, Zack Weinberg <zackw@panix.com> wrote:
> I don't want to encourage people to use posix_spawn because
> posix_spawn is a badly designed API.  It's difficult to use correctly.
> It's *tedious* to use correctly, which means people won't bother.  It
> can't do everything you might want to do on the child side (witness
> the discussion of adding an extension to let it do chdir()).  Its
> behavior in case of errors is underspecified.  And it might be
> implemented in terms of fork, which means it doesn't guarantee to
> solve Sergey's problem.

I seem to be in agreement with Zack's eloquent post.  From java's
point of view, we need to avoid the momentary doubling of memory
caused by fork()ing a huge process, and we need to be able to specify
a working directory for the child.  One strategy is to use posix_spawn
to start a small helper process which then in turn execs the real
child after necessary set up (including chdir).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-12 16:30     ` Zack Weinberg
  2018-09-12 19:46       ` Martin Buchholz
@ 2018-09-12 23:08       ` Adhemerval Zanella
  2018-09-13 12:11       ` Szabolcs Nagy
  2 siblings, 0 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2018-09-12 23:08 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha



On 12/09/2018 13:29, Zack Weinberg wrote:
> On Tue, Sep 11, 2018 at 4:16 PM >
> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>> On 09/09/2018 19:44, Zack Weinberg wrote:
>>> On Sun, Sep 9, 2018 at 6:16 PM Sergey Melnikov
>>> <melnikov.sergey.v@gmail.com> wrote:
>>>> So, my question is if does it worth to rewrite existing popen/system
>>>> calls with posix_spawn as StackOverflow recommends? If so, I may
>>>> implement and contribute this to glibc.
>>>
>>> What would be worthwhile is to change popen and system so that they
>>> internally use *vfork*, on systems where vfork is not the same as
>>> fork. posix_spawn is just an elaborate and inconvenient wrapper around
>>> vfork, and I don't think we should be using it internally or
>>> encouraging its use by others.  Also, unlike vfork, it does not
>>> guarantee to avoid the problem you are experiencing.
>>
>> We changed posix_spawn to exactly avoid call vfork due its inherent usage
>> issue with signal handlers (BZ#14750), cancellation state being shared
>> with the parent (BZ#10354), and the vfork limitation of possible parent
>> clobber due stack spilling (which would require some specific compiler
>> support to mark the call).
> 
> When I first read this, I thought you were saying you changed
> posix_spawn to call plain old fork, which would have meant it would no
> longer solve Sergey's problem.  After reading over BZ#14750, I see
> that posix_spawn does still use the vfork primitive operation on
> Linux, it just does it in a lower-level manner that allows you to
> avoid the bugs you list (clone() with CLONE_VM|CLONE_VFORK and a
> temporary stack, plus blocking internal signals, which application
> code can't do for itself because we don't let it).

And my understanding it is better than the alternatives in a lot of
situations of either use of the tricky and broken vfork or the old 
fork plus exec* with its possible inherent memory issues.

> 
> But I don't see any reason we couldn't expose your bug fixes to all
> callers of vfork.  We might need to change its calling convention
> (hypothetically, pid_t nvfork(void (*proc)(void *closure))) but we
> could do that as a GNU extension.  That would fix #14750 and #10354
> for _everyone_, and changing all existing internal subprocess creation
> to use the hypothetical nvfork would be less work than changing it to
> use posix_spawn, and give a cleaner result.

I think we are discussing two slight different points here: one is to
base both popen and system internally on posix_spawn to take its advantage 
over aforementioned alternatives, and another one is to provide a vfork
like alternative as a GNU extension.

For former, by using posix_spawn internally we won't need to add *another* 
internal API which potential substantial work to accommodate some internal 
arch (for clone, stack, etc. on Linux) and system (as for hurd which does 
not use it) variants.  I still think using posix_spawn internally would 
yield simpler implementation (and other libcs, musl for instance, already 
does it).

For latter I also still think vfork-like interface are just broken without
the extra care: the vfork-like interface would require to run on its own
stack decoupled from parent or some guarantee from compiler it will create 
such stack for spill, etc. after the function call. There are other issues 
where calling functions which might alter the shared state between child and
parent are problematic (such as malloc).  In any case, such interface really 
abuses some assumptions about stack and process management, which I see it 
should be hiding behind more concise interfaces. 

> 
>> I can't really see why you think not using posix_spawn, which addresses
>> all this issues, would be better to either user a problematic interface
>> (vfork)
> 
> I don't want to encourage people to use posix_spawn because
> posix_spawn is a badly designed API.  It's difficult to use correctly.
> It's *tedious* to use correctly, which means people won't bother.  It
> can't do everything you might want to do on the child side (witness
> the discussion of adding an extension to let it do chdir()).  Its
> behavior in case of errors is underspecified.  And it might be
> implemented in terms of fork, which means it doesn't guarantee to
> solve Sergey's problem.
> 

I do agree current interface definition is limited and do not cover all
current user cases, but I do see that it does provide some improvements
over other alternatives in a lot of situations (gnome3 for instance
has started to use posix_spawn because our optimization [1]). 

Some points you raised are more a question of aesthetics than the 
interface itself, but at least posix_spawn it is designed so it can be 
extended (such as POSIX_SPAWN_SETSID recently).  I do think it would be 
better to work towards it than try to come up with some GNU specific 
interface based on a broken interface.

Right now I see two potentially extensions: one is the chdir and which
would be better its interface and another one is how to reliable close all
files.  For former I am still catching up with current discussion, but
for second I see that without help either from application or from
kernel there is not a reliable *and* race-free to close all file descriptor
with hurting process launching scalability. *BSD implementation of 
closeall, for instance, is just a dummy for over all possible FD and it
does not handle race conditions (and it is just worse than not provide
such interface IHMO).

[1] https://github.com/GNOME/glib/commit/61f54591acdfe69315cef6d1aa6d3bf1ff763082

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-12 19:46       ` Martin Buchholz
@ 2018-09-13  1:27         ` Adhemerval Zanella
  2018-09-13  6:31           ` Andreas Schwab
  2018-09-18  1:18           ` Martin Buchholz
  0 siblings, 2 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2018-09-13  1:27 UTC (permalink / raw)
  To: Martin Buchholz, Zack Weinberg; +Cc: GNU C Library



On 12/09/2018 16:45, Martin Buchholz wrote:
> On Wed, Sep 12, 2018 at 9:29 AM, Zack Weinberg <zackw@panix.com> wrote:
>> I don't want to encourage people to use posix_spawn because
>> posix_spawn is a badly designed API.  It's difficult to use correctly.
>> It's *tedious* to use correctly, which means people won't bother.  It
>> can't do everything you might want to do on the child side (witness
>> the discussion of adding an extension to let it do chdir()).  Its
>> behavior in case of errors is underspecified.  And it might be
>> implemented in terms of fork, which means it doesn't guarantee to
>> solve Sergey's problem.
> 
> I seem to be in agreement with Zack's eloquent post.  From java's
> point of view, we need to avoid the momentary doubling of memory
> caused by fork()ing a huge process, and we need to be able to specify
> a working directory for the child.  One strategy is to use posix_spawn
> to start a small helper process which then in turn execs the real
> child after necessary set up (including chdir).
> 

The code you linked [1] contains some outdated information regarding
vfork and posix_spawn:

  - vfork would be supported on Linux not because its a good interface,
    but rather than Linux policy is to not broke userspace in any case.
    And it would be supported only on architectures that already provide
    it, new ports do not automatically export vfork and they are not 
    encouraged to do so. Currently ARC, alpha, s390, microblame, x86, sH,
    powerpc, hexagen, parisc, m68k, arm, and aarch64 do so (and aarch64 
    only if compat mode is enabled, a 64-bit kernel only won't have vfork
    syscall). For remaining arches, glibc will just use a clone plus
    required flags (CLONE_VM | CLONE_VFORK | SIGCHLD).

  - Almost sure the clone issues described is due the fact of old tid
    caching due CLONE_VM and this semantic has been changed in recent
    glibc version.  clone now should not touch any shared internal 
    structures, the only constraint is it still requires a stack argument.

Also, on vfork oath I am not very confident openjdk implementation works
around correctly on the issue that posix_spawn has fixed recently:

  - inherent usage issue with signal handlers (BZ#14750): I don't see
    any signal handling in neither Java_java_lang_ProcessImpl_forkAndExec,
    startChild, vforkChild, or childProcess (maybe it is handle by the
    Java_java_lang_ProcessImpl_forkAndExec caller?).

  - cancellation state being shared with the parent (BZ#10354): at least
    two cancellation entrypoints are called on childProcess (write and
    close), which in a vfork state is undefined behaviour if cancellation
    in pending and act upon it.

  - vfork limitation of possible parent clobber due stack spilling:
    although the call stack does have some care of assuming a shared
    memory environment, it does not have any treatment of possible
    compiler spilling the parent's stack (glibc's posix_spawn uses a
    mmaped stack for this issue).

  - Another issue is the case where the helper process itself
    is killed somehow (there were some discussion about it on BZ#22273).

  - The closeDescriptors is inherent racy in the way it is implemented
    for vfork case: either by iterating over proc/%d/fd or _SC_OPEN_MAX
    you can't guarantee that another thread won't open another file. 
    Worse, taking locks in the vfork helper case can't be quite
    troublesome for the aforementioned case where helper process is
    killed before calling either _exit or _execv*.

You may have not crossed any of this issue due the fact pthread cancellation
is not widely used, some compiler luck with code generation, and with runtime
not issuing misguided signals.  In any case I would say it would be better to
use the posix_spawn strategy by issuing a helper process for Linux as well.

And what I would really like is to see what prevents openjdk and other programs
to use posix_spawn directly. It seems now that you need an extension to change 
the working directory and to close the inherited file descriptor. We are 
discussing the former and I think it would be feasible to raise it to Austin 
group as an extension.

For former I see there is no easy way to provide a similar closefrom
function without either being racy (as *BSD and openjdk implementations)
or to remove some scalability by adding serialization (for open* syscall
to avoid create new FD while closeall is closing them). A better alternative
would to request for some kernel helper, but I am not convinced that
current way of explicit open all file descriptors with O_CLOEXEC is not
the better option. Unfortunately it does not help run external code
(through shared libraries) that still call open in default mode.

[1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/java.base/unix/native/libjava/ProcessImpl_md.c

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-13  1:27         ` Adhemerval Zanella
@ 2018-09-13  6:31           ` Andreas Schwab
  2018-09-13 12:30             ` Adhemerval Zanella
  2018-09-18  1:18           ` Martin Buchholz
  1 sibling, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2018-09-13  6:31 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Martin Buchholz, Zack Weinberg, GNU C Library

On Sep 12 2018, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

>   - vfork would be supported on Linux not because its a good interface,
>     but rather than Linux policy is to not broke userspace in any case.
>     And it would be supported only on architectures that already provide
>     it, new ports do not automatically export vfork and they are not 
>     encouraged to do so. Currently ARC, alpha, s390, microblame, x86, sH,
>     powerpc, hexagen, parisc, m68k, arm, and aarch64 do so (and aarch64 
>     only if compat mode is enabled, a 64-bit kernel only won't have vfork
>     syscall). For remaining arches, glibc will just use a clone plus
>     required flags (CLONE_VM | CLONE_VFORK | SIGCHLD).

That doesn't make sense.  CLONE_VFORK *is* vfork in all matters, and
every architecture supports it.  It's nothing different from the fact
that there is no open syscall any more, because openat does the same
(and more).

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-12 16:30     ` Zack Weinberg
  2018-09-12 19:46       ` Martin Buchholz
  2018-09-12 23:08       ` Adhemerval Zanella
@ 2018-09-13 12:11       ` Szabolcs Nagy
  2018-09-14 18:42         ` Zack Weinberg
  2 siblings, 1 reply; 26+ messages in thread
From: Szabolcs Nagy @ 2018-09-13 12:11 UTC (permalink / raw)
  To: Zack Weinberg, adhemerval.zanella; +Cc: nd, libc-alpha

On 12/09/18 17:29, Zack Weinberg wrote:
> I don't want to encourage people to use posix_spawn because
> posix_spawn is a badly designed API.  It's difficult to use correctly.
> It's *tedious* to use correctly, which means people won't bother.  It
> can't do everything you might want to do on the child side (witness
> the discussion of adding an extension to let it do chdir()).  Its
> behavior in case of errors is underspecified.  And it might be

the misdesigned api is vfork: it is not just difficult to use
correctly but impossible: it breaks c language semantics
(gcc fixes this up to some extent when it sees the actual vfork
symbol in the code, but there is no guarantee in c that the
compiler can see that, and even then the shared stack between
parent and child has various issues as Adhemerval described)

so stop telling ppl to use vfork.

fork has the problem of large commit charge so it can fail with oom,
it's also not implementable efficiently on various systems (no-mmu,
windows, etc) can easily cause leaks of sensitive data into
child processes (hard to provide security boundary between parent
and child during the critical operations before exec)

posix_spawn is ugly but at least was designed with the c language
in mind and currently it's the only api that avoids the problems
listed above, so until there is a better api, this is what should
be recommended for process creation (e.g. gnu make would benefit
from it, its vfork usage is a pile of UB)

> implemented in terms of fork, which means it doesn't guarantee to
> solve Sergey's problem.

vfork can be implemented in terms of fork too, memcpy can be
implemented using syscalls, these are QoI issues and not
good technical reasons to avoid a particular api. (the entire
point of posix_spawn is that it does not depend on the va
space copy semantics of fork, the way glibc ignored this for
a long time was just a glibc bug)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-13  6:31           ` Andreas Schwab
@ 2018-09-13 12:30             ` Adhemerval Zanella
  0 siblings, 0 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2018-09-13 12:30 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Martin Buchholz, Zack Weinberg, GNU C Library



On 13/09/2018 03:31, Andreas Schwab wrote:
> On Sep 12 2018, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>>   - vfork would be supported on Linux not because its a good interface,
>>     but rather than Linux policy is to not broke userspace in any case.
>>     And it would be supported only on architectures that already provide
>>     it, new ports do not automatically export vfork and they are not 
>>     encouraged to do so. Currently ARC, alpha, s390, microblame, x86, sH,
>>     powerpc, hexagen, parisc, m68k, arm, and aarch64 do so (and aarch64 
>>     only if compat mode is enabled, a 64-bit kernel only won't have vfork
>>     syscall). For remaining arches, glibc will just use a clone plus
>>     required flags (CLONE_VM | CLONE_VFORK | SIGCHLD).
> 
> That doesn't make sense.  CLONE_VFORK *is* vfork in all matters, and
> every architecture supports it.  It's nothing different from the fact
> that there is no open syscall any more, because openat does the same
> (and more).
> 

What I meant is the *wire up* syscall which is not automatically provided
anymore. Sure you can implement vfork with CLONE_VFORK and it is what glibc
does and will continue to do so unless we decide to remove it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-13 12:11       ` Szabolcs Nagy
@ 2018-09-14 18:42         ` Zack Weinberg
  2018-09-17 10:32           ` Szabolcs Nagy
  0 siblings, 1 reply; 26+ messages in thread
From: Zack Weinberg @ 2018-09-14 18:42 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Adhemerval Zanella, nd, GNU C Library

On Thu, Sep 13, 2018 at 8:11 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 12/09/18 17:29, Zack Weinberg wrote:
> > I don't want to encourage people to use posix_spawn because
> > posix_spawn is a badly designed API.  It's difficult to use correctly.
> > It's *tedious* to use correctly, which means people won't bother.  It
> > can't do everything you might want to do on the child side (witness
> > the discussion of adding an extension to let it do chdir()).  Its
> > behavior in case of errors is underspecified.  And it might be
>
> the misdesigned api is vfork: it is not just difficult to use
> correctly but impossible: it breaks c language semantics
> (gcc fixes this up to some extent when it sees the actual vfork
> symbol in the code, but there is no guarantee in c that the
> compiler can see that, and even then the shared stack between
> parent and child has various issues as Adhemerval described)
[etc]

The way I see it is that that is all stuff that could be fixed.  I
looked at what Adhemerval did in s/u/s/linux/spawni.c and it appears
that generalizing it to `pid_t nvfork(void (*)(void *))` would solve
_all_ of the problems you list and even be safe to expose to
applications.  (Name provisional.)

On the other hand, in my opinion posix_spawn is an irretrievably
flawed design, because of its underspecification, because of its
license to lose error details, and above all, because it _doesn't_
allow you to perform arbitrary computation on the child side, which
means we will forever be playing catch-up with the list of "actions".
I want to see it deprecated and removed from POSIX, and that means
coming up with a genuinely better way to do what it does.  nvfork()
seems like a step in the right direction.

(I've actually had an even better plan in mind for years now,
involving a new "embryo" process state, but someone would have to pay
me to argue with Linus, or else take over that part of the project
themselves.)

> vfork can be implemented in terms of fork too, memcpy can be
> implemented using syscalls, these are QoI issues

I don't see it that way.  The kind of underspecification that both
vfork and posix_spawn suffer is the same kind of underspecification
that renders it unwise to use signal(), and continuing the analogy,
the virtue of sigaction() is that it _does_ specify all of the
behavioral details like whether or not signal delivery will cause
blocking system calls to fail and set errno to EINTR.

If we were going to add the hypothetical nvfork(), I would write into
its documentation that it is _guaranteed_ to have vfork-like behavior
(that is, it calls the supplied procedure in the memory space of the
parent process; writes through the closure parameter are visible to
the caller, as are any changes to errno; control does not return to
the caller until the procedure finishes) and it will fail and set
errno to ENOSYS if the OS doesn't make that possible.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-14 18:42         ` Zack Weinberg
@ 2018-09-17 10:32           ` Szabolcs Nagy
  2018-09-17 18:27             ` Adhemerval Zanella
  0 siblings, 1 reply; 26+ messages in thread
From: Szabolcs Nagy @ 2018-09-17 10:32 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: nd, Adhemerval Zanella, GNU C Library

On 14/09/18 19:42, Zack Weinberg wrote:
> On Thu, Sep 13, 2018 at 8:11 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> the misdesigned api is vfork: it is not just difficult to use
>> correctly but impossible: it breaks c language semantics
>> (gcc fixes this up to some extent when it sees the actual vfork
>> symbol in the code, but there is no guarantee in c that the
>> compiler can see that, and even then the shared stack between
>> parent and child has various issues as Adhemerval described)
> [etc]
> 
> The way I see it is that that is all stuff that could be fixed.  I
> looked at what Adhemerval did in s/u/s/linux/spawni.c and it appears
> that generalizing it to `pid_t nvfork(void (*)(void *))` would solve
> _all_ of the problems you list and even be safe to expose to
> applications.  (Name provisional.)
...
> If we were going to add the hypothetical nvfork(), I would write into
> its documentation that it is _guaranteed_ to have vfork-like behavior
> (that is, it calls the supplied procedure in the memory space of the
> parent process; writes through the closure parameter are visible to
> the caller, as are any changes to errno; control does not return to
> the caller until the procedure finishes) and it will fail and set
> errno to ENOSYS if the OS doesn't make that possible.

i think a new api is fine, but it needs to be proposed
as standard api (or discussed with other implementors
as a glibc only api would be suboptimal).

and it needs to be specified what the callback can do.
e.g. in multi-threaded process only as-safe operations
can be allowed.

and if the vforked child vforks another child which kills
its parent then the original calling thread continues
while the grand child still executes in the same address
space, so either vfork after vfork should be disallowed
or the vforked child needs to run on a separate stack
(if it has its own stack then the caller does not even
need to stop).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-17 10:32           ` Szabolcs Nagy
@ 2018-09-17 18:27             ` Adhemerval Zanella
  0 siblings, 0 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2018-09-17 18:27 UTC (permalink / raw)
  To: Szabolcs Nagy, Zack Weinberg; +Cc: nd, GNU C Library



On 17/09/2018 03:32, Szabolcs Nagy wrote:
> On 14/09/18 19:42, Zack Weinberg wrote:
>> On Thu, Sep 13, 2018 at 8:11 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>> the misdesigned api is vfork: it is not just difficult to use
>>> correctly but impossible: it breaks c language semantics
>>> (gcc fixes this up to some extent when it sees the actual vfork
>>> symbol in the code, but there is no guarantee in c that the
>>> compiler can see that, and even then the shared stack between
>>> parent and child has various issues as Adhemerval described)
>> [etc]
>>
>> The way I see it is that that is all stuff that could be fixed.  I
>> looked at what Adhemerval did in s/u/s/linux/spawni.c and it appears
>> that generalizing it to `pid_t nvfork(void (*)(void *))` would solve
>> _all_ of the problems you list and even be safe to expose to
>> applications.  (Name provisional.)
> ...
>> If we were going to add the hypothetical nvfork(), I would write into
>> its documentation that it is _guaranteed_ to have vfork-like behavior
>> (that is, it calls the supplied procedure in the memory space of the
>> parent process; writes through the closure parameter are visible to
>> the caller, as are any changes to errno; control does not return to
>> the caller until the procedure finishes) and it will fail and set
>> errno to ENOSYS if the OS doesn't make that possible.
> 
> i think a new api is fine, but it needs to be proposed
> as standard api (or discussed with other implementors
> as a glibc only api would be suboptimal).
> 
> and it needs to be specified what the callback can do.
> e.g. in multi-threaded process only as-safe operations
> can be allowed.
> 
> and if the vforked child vforks another child which kills
> its parent then the original calling thread continues
> while the grand child still executes in the same address
> space, so either vfork after vfork should be disallowed
> or the vforked child needs to run on a separate stack
> (if it has its own stack then the caller does not even
> need to stop).

I am not very found of callback mechanism exactly because of your
example, where we need to either specify the possible multiple
interactions by limiting the kind of computation the callback is
allowed to do or just define such invalid computation as unsupported.

For instance, Florian has brought up the discussion about whether or
not support C++ exceptions inside signal handlers and a possible
issue might apply here: should we support C++ code in callback (since
it might trigger non as-safe code with unwinder on a exception)?
Another example is should we handle thread cancellation on callback,
since some async-safe functions might block (fsync or select for 
instance)?

These are just examples of kind of complexity it brings where you
allow unbounded computation on callbacks (even when you specify which
kind of semantics you intend to allow).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-13  1:27         ` Adhemerval Zanella
  2018-09-13  6:31           ` Andreas Schwab
@ 2018-09-18  1:18           ` Martin Buchholz
  2018-09-18  1:59             ` Adhemerval Zanella
  1 sibling, 1 reply; 26+ messages in thread
From: Martin Buchholz @ 2018-09-18  1:18 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Zack Weinberg, GNU C Library

Thanks for taking such a close look!

On Wed, Sep 12, 2018 at 6:27 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:

> The code you linked [1] contains some outdated information regarding
> vfork and posix_spawn:
>
>   - vfork would be supported on Linux not because its a good interface,
>     but rather than Linux policy is to not broke userspace in any case.
>     And it would be supported only on architectures that already provide
>     it, new ports do not automatically export vfork and they are not
>     encouraged to do so. Currently ARC, alpha, s390, microblame, x86, sH,
>     powerpc, hexagen, parisc, m68k, arm, and aarch64 do so (and aarch64
>     only if compat mode is enabled, a 64-bit kernel only won't have vfork
>     syscall). For remaining arches, glibc will just use a clone plus
>     required flags (CLONE_VM | CLONE_VFORK | SIGCHLD).
>
>   - Almost sure the clone issues described is due the fact of old tid
>     caching due CLONE_VM and this semantic has been changed in recent
>     glibc version.  clone now should not touch any shared internal
>     structures, the only constraint is it still requires a stack argument.

Can you then update the bug from WONTFIX ?
https://www.sourceware.org/bugzilla/show_bug.cgi?id=10311

> Also, on vfork oath I am not very confident openjdk implementation works
> around correctly on the issue that posix_spawn has fixed recently:
>
>   - inherent usage issue with signal handlers (BZ#14750): I don't see
>     any signal handling in neither Java_java_lang_ProcessImpl_forkAndExec,
>     startChild, vforkChild, or childProcess (maybe it is handle by the
>     Java_java_lang_ProcessImpl_forkAndExec caller?).

We've never worried about signals arriving between vfork and exec, and
no one has ever reported a problem.

>   - The closeDescriptors is inherent racy in the way it is implemented
>     for vfork case: either by iterating over proc/%d/fd or _SC_OPEN_MAX
>     you can't guarantee that another thread won't open another file.
>     Worse, taking locks in the vfork helper case can't be quite
>     troublesome for the aforementioned case where helper process is
>     killed before calling either _exit or _execv*.

On vfork, parent and child must have independent file descriptor
tables; else how could the child close all file descriptors without
affecting the parent?

> You may have not crossed any of this issue due the fact pthread cancellation
> is not widely used, some compiler luck with code generation, and with runtime
> not issuing misguided signals.  In any case I would say it would be better to
> use the posix_spawn strategy by issuing a helper process for Linux as well.
>
> And what I would really like is to see what prevents openjdk and other programs
> to use posix_spawn directly. It seems now that you need an extension to change
> the working directory and to close the inherited file descriptor. We are
> discussing the former and I think it would be feasible to raise it to Austin
> group as an extension.

Right.  Java needs to change the process environment and working
directory, and fiddle with file descriptors.
I do worry that an interface like posix_spawn will never be quite
complete enough to allow us to spawn the child we want in one step,
but there will remain some reason why we need to spawn a small helper
program first.

> For former I see there is no easy way to provide a similar closefrom
> function without either being racy (as *BSD and openjdk implementations)
> or to remove some scalability by adding serialization (for open* syscall
> to avoid create new FD while closeall is closing them). A better alternative
> would to request for some kernel helper, but I am not convinced that
> current way of explicit open all file descriptors with O_CLOEXEC is not
> the better option. Unfortunately it does not help run external code
> (through shared libraries) that still call open in default mode.

It seems like traditional methods to create file descriptors will
never set the CLOEXEC flag by default, and most programmers will not
be aware of the problem, so those of us implementing subprocess
creation cannot ever rely on all the file descriptors having the flag
set.

After vfork, there should be no race with closefrom (CLONE_FILES is not used).

> [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/java.base/unix/native/libjava/ProcessImpl_md.c

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-18  1:18           ` Martin Buchholz
@ 2018-09-18  1:59             ` Adhemerval Zanella
  2018-09-18  2:18               ` Martin Buchholz
  2018-09-18  4:31               ` Rich Felker
  0 siblings, 2 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2018-09-18  1:59 UTC (permalink / raw)
  To: Martin Buchholz; +Cc: Zack Weinberg, GNU C Library



On 17/09/2018 18:18, Martin Buchholz wrote:
> Thanks for taking such a close look!
> 
> On Wed, Sep 12, 2018 at 6:27 PM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> 
>> The code you linked [1] contains some outdated information regarding
>> vfork and posix_spawn:
>>
>>   - vfork would be supported on Linux not because its a good interface,
>>     but rather than Linux policy is to not broke userspace in any case.
>>     And it would be supported only on architectures that already provide
>>     it, new ports do not automatically export vfork and they are not
>>     encouraged to do so. Currently ARC, alpha, s390, microblame, x86, sH,
>>     powerpc, hexagen, parisc, m68k, arm, and aarch64 do so (and aarch64
>>     only if compat mode is enabled, a 64-bit kernel only won't have vfork
>>     syscall). For remaining arches, glibc will just use a clone plus
>>     required flags (CLONE_VM | CLONE_VFORK | SIGCHLD).
>>
>>   - Almost sure the clone issues described is due the fact of old tid
>>     caching due CLONE_VM and this semantic has been changed in recent
>>     glibc version.  clone now should not touch any shared internal
>>     structures, the only constraint is it still requires a stack argument.
> 
> Can you then update the bug from WONTFIX ?
> https://www.sourceware.org/bugzilla/show_bug.cgi?id=10311

I will take a look if we can close this bug, now that clones does not
change any glibc internal state the example should work.  However from
previous discussions, the consensus is the glibc state after the 
direct clone call is undefined and glibc can't guarantee all its
functionalities will work safely.

> 
>> Also, on vfork oath I am not very confident openjdk implementation works
>> around correctly on the issue that posix_spawn has fixed recently:
>>
>>   - inherent usage issue with signal handlers (BZ#14750): I don't see
>>     any signal handling in neither Java_java_lang_ProcessImpl_forkAndExec,
>>     startChild, vforkChild, or childProcess (maybe it is handle by the
>>     Java_java_lang_ProcessImpl_forkAndExec caller?).
> 
> We've never worried about signals arriving between vfork and exec, and
> no one has ever reported a problem.

Not sure if this a problem for the JVM perpective, but from a libc 
standpoint this is still an issue.

> 
>>   - The closeDescriptors is inherent racy in the way it is implemented
>>     for vfork case: either by iterating over proc/%d/fd or _SC_OPEN_MAX
>>     you can't guarantee that another thread won't open another file.
>>     Worse, taking locks in the vfork helper case can't be quite
>>     troublesome for the aforementioned case where helper process is
>>     killed before calling either _exit or _execv*.
> 
> On vfork, parent and child must have independent file descriptor
> tables; else how could the child close all file descriptors without
> affecting the parent?

My mistake here, I was thinking without considering vfork indeed does not
use CLONE_FILES.

> 
>> You may have not crossed any of this issue due the fact pthread cancellation
>> is not widely used, some compiler luck with code generation, and with runtime
>> not issuing misguided signals.  In any case I would say it would be better to
>> use the posix_spawn strategy by issuing a helper process for Linux as well.
>>
>> And what I would really like is to see what prevents openjdk and other programs
>> to use posix_spawn directly. It seems now that you need an extension to change
>> the working directory and to close the inherited file descriptor. We are
>> discussing the former and I think it would be feasible to raise it to Austin
>> group as an extension.
> 
> Right.  Java needs to change the process environment and working
> directory, and fiddle with file descriptors.
> I do worry that an interface like posix_spawn will never be quite
> complete enough to allow us to spawn the child we want in one step,
> but there will remain some reason why we need to spawn a small helper
> program first.
> 
>> For former I see there is no easy way to provide a similar closefrom
>> function without either being racy (as *BSD and openjdk implementations)
>> or to remove some scalability by adding serialization (for open* syscall
>> to avoid create new FD while closeall is closing them). A better alternative
>> would to request for some kernel helper, but I am not convinced that
>> current way of explicit open all file descriptors with O_CLOEXEC is not
>> the better option. Unfortunately it does not help run external code
>> (through shared libraries) that still call open in default mode.
> 
> It seems like traditional methods to create file descriptors will
> never set the CLOEXEC flag by default, and most programmers will not
> be aware of the problem, so those of us implementing subprocess
> creation cannot ever rely on all the file descriptors having the flag
> set.

But from a QoI standpoint O_CLOEXEC is still the better option. And f we 
aim to add such extension on posix_spawn, I will expect we also push for 
some kernel support. There are reports where trying to interacting over 
all possible FD takes a lot of time [1].

> 
> After vfork, there should be no race with closefrom (CLONE_FILES is not used).
> 
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/java.base/unix/native/libjava/ProcessImpl_md.c

[1] https://bugzilla.redhat.com/show_bug.cgi?id=837033

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-18  1:59             ` Adhemerval Zanella
@ 2018-09-18  2:18               ` Martin Buchholz
  2018-09-18  4:31               ` Rich Felker
  1 sibling, 0 replies; 26+ messages in thread
From: Martin Buchholz @ 2018-09-18  2:18 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Zack Weinberg, GNU C Library

On Mon, Sep 17, 2018 at 6:58 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:

> But from a QoI standpoint O_CLOEXEC is still the better option. And f we
> aim to add such extension on posix_spawn, I will expect we also push for
> some kernel support.

Yes.

Most programmers can set the CLOEXEC flag on most descriptors most of
the time, but we can never reach "all".

> There are reports where trying to interacting over
> all possible FD takes a lot of time [1].

See bottom of
https://markmail.org/message/6ede5ocuow5aba6i
in concurrent discussion

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: system and popen fail in case of big application
  2018-09-18  1:59             ` Adhemerval Zanella
  2018-09-18  2:18               ` Martin Buchholz
@ 2018-09-18  4:31               ` Rich Felker
  1 sibling, 0 replies; 26+ messages in thread
From: Rich Felker @ 2018-09-18  4:31 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Martin Buchholz, Zack Weinberg, GNU C Library

On Mon, Sep 17, 2018 at 06:58:56PM -0700, Adhemerval Zanella wrote:
> >> For former I see there is no easy way to provide a similar closefrom
> >> function without either being racy (as *BSD and openjdk implementations)
> >> or to remove some scalability by adding serialization (for open* syscall
> >> to avoid create new FD while closeall is closing them). A better alternative
> >> would to request for some kernel helper, but I am not convinced that
> >> current way of explicit open all file descriptors with O_CLOEXEC is not
> >> the better option. Unfortunately it does not help run external code
> >> (through shared libraries) that still call open in default mode.
> > 
> > It seems like traditional methods to create file descriptors will
> > never set the CLOEXEC flag by default, and most programmers will not
> > be aware of the problem, so those of us implementing subprocess
> > creation cannot ever rely on all the file descriptors having the flag
> > set.
> 
> But from a QoI standpoint O_CLOEXEC is still the better option. And f we 
> aim to add such extension on posix_spawn, I will expect we also push for 
> some kernel support. There are reports where trying to interacting over 
> all possible FD takes a lot of time [1].
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=837033

While closeall is an awful operation that should not be done (fix
stuff to use O_CLOEXEC), looping over the whole fd range and making a
close syscall for each probably-already-closed one is the most awful
possible way to implement it. You can make it fast by calling poll
with 1000 or more fds at a time in the pollfd set, checking revents
for POLLNVAL to determine which ones are open, and only close those.

Rich

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-09-18  4:31 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-09 22:16 system and popen fail in case of big application Sergey Melnikov
2018-09-09 22:44 ` Zack Weinberg
2018-09-09 22:47   ` Zack Weinberg
2018-09-09 23:27     ` Sergey Melnikov
2018-09-09 23:49       ` Zack Weinberg
2018-09-10  2:50         ` Martin Buchholz
2018-09-10 15:38       ` Joseph Myers
2018-09-10 15:42         ` Zack Weinberg
2018-09-10 15:43           ` Joseph Myers
     [not found]       ` <CA+kOe0_1k_PbJ-pjHznP4AmTJUgziAdT+4vCcCRSb7GGdvbv7Q@mail.gmail.com>
2018-09-10 15:59         ` Zack Weinberg
2018-09-10 16:55           ` Martin Buchholz
2018-09-11 20:16   ` Adhemerval Zanella
2018-09-12 16:30     ` Zack Weinberg
2018-09-12 19:46       ` Martin Buchholz
2018-09-13  1:27         ` Adhemerval Zanella
2018-09-13  6:31           ` Andreas Schwab
2018-09-13 12:30             ` Adhemerval Zanella
2018-09-18  1:18           ` Martin Buchholz
2018-09-18  1:59             ` Adhemerval Zanella
2018-09-18  2:18               ` Martin Buchholz
2018-09-18  4:31               ` Rich Felker
2018-09-12 23:08       ` Adhemerval Zanella
2018-09-13 12:11       ` Szabolcs Nagy
2018-09-14 18:42         ` Zack Weinberg
2018-09-17 10:32           ` Szabolcs Nagy
2018-09-17 18:27             ` Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).