[RFC] Toward Shareable POSIX Signals

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [RFC] Toward Shareable POSIX Signals
@ 2018-03-08 17:53 Daniel Colascione
  2018-03-08 20:09 ` Florian Weimer
  2018-03-11 18:07 ` Zack Weinberg
  0 siblings, 2 replies; 28+ messages in thread
From: Daniel Colascione @ 2018-03-08 17:53 UTC (permalink / raw)
  To: libc-alpha

Hi libc-alpha@

I've written up a proposal for improving the application signal APIs,
written below. Might there be any interest in prototyping this work
in glibc?

Thanks in advance for taking a look!

Problem
=======

The Unix signal API comes down to us from a time when the process was
the natural unit of code division. In that world, making the handler
for a signal a process-wide property presented few real problems. Now
that itâ€™s become common for multiple independent components to share a
process and for these components to each want to perform some action
in response to the receipt of a signal, the old Unix process-wide
signal handler approach has begun to break down. This document
proposes a scalable and backwards-compatible facility for allowing
these components to peacefully coexist.

Why do we use signals? Why might we want to share them?
-------------------------------------------------------

Itâ€™s useful to think about _why_ components might want to use signals
and why multiple independently-developed components sharing a process
might each want to use the same signal. Iâ€™ve describe a few use cases
for signal handling below and explain why different components might
want to share a signal.

HIGH-LEVEL LANGUAGE RUNTIME OPTIMIZATION: many runtimes for high-level
languages (e.g., ART) rely on signals to catch out-of-bound memory
accesses (e.g., to null pointers) and transform these accesses into
well-defined high-level language exceptions (e.g.,
NullPointerException). By relying on signals to detect these accesses
instead of inserting explicit access checks before every dereference,
runtime authors can greatly improve code performance in the common
case where code never dereferences invalid pointers. A runtime using
this technique typically checks, upon getting a SIGSEGV, whether the
code that triggered the SIGSEGV is â€œownedâ€ by the runtime. If so, the
runtime arranges for a high-level exception to be raised. If not, the
runtime delegates responsibility for handling the signal to some other
component (e.g., a crash reporting engine). If multiple such runtimes
exist in a single process, itâ€™s reasonable to expect each to respect
its own languageâ€™s exceptions and not interfere with other runtimes.

CHILD PROCESS EXIT NOTIFICATION: itâ€™s surprisingly difficult to wait
for subprocess exit in the traditional Unix way: itâ€™s legal to wait on
a process only once, and the wait*-family primitives are not
multiplexed with other kinds of waits (like poll(2)) and so require a
dedicated wait for each thread. Compounding the problem, itâ€™s not
possible to wait for a list of specific child processes: the APIs
support waiting for all children, waiting for children of a specific
process group, or waiting for one specific child.

Some components work around the inability to wait on a list of
processes by using a blanket waitpid(-1, ..) to look for exits from
any of their children, ignoring the exits of any children they donâ€™t
recognize. If two such components exist in a single process, then they
will each receive notifications for the exits of each otherâ€™s
children, ignore them, and (from the perspective of each runtime)
never receive child exit notifications.

A more scalable approach is to rely on SIGCHLD for child exit
notifications. Under this model, components wait for SIGCHLD (sharing
the signal using the mechanism that this document proposes), and upon
getting it, call waitpid(CHILD, &status, WNOHANG) for each CHILD that
component owns. (Or just the child whose PID matches the one in the
siginfo_t). Each component receives a notification for every child it
cares about without interfering with the operation of any other
component in the process.  Proposed non-portable mechanisms like
CLONE_FD would also allow for scalable subprocess waiting, but this
document proposes a portable, universal solution to the
multiple-child-wait problem that falls out of a general-purpose signal
sharing facility.

HANDLING MMAP IO ERRORS VIA SIGBUS: the kernel reports IO errors on
mmaped files by sending threads with failing IO SIGBUS. Itâ€™s perfectly
reasonable for multiple components in a single process to want to each
deal with IO errors in memory-mapped files they own.

SIGIO: multiple components might reasonably went to perform
asynchronous IO and receive completion notifications. While itâ€™s
possible to use real time signals instead of SIGIO, there are only so
many of them and a coordination problem still exists with respect to
access to the signals.

SUSPENSION: SIGTSTP is useful in a variety of contexts.

MEMORY TRICKS: itâ€™s reasonable to use SIGSEGV to perform tricks like
user-space page faulting of compressed files, access checks, and so
on. There ought to be no reason in principle that multiple such
components couldnâ€™t exist in the same process.

INSTRUCTION EMULATION: trapping SIGILL is a legal way to provide a
fallback for instructions that might not exist on a particular
architecture. Again, multiple such components, each supporting
different instructions, might exist, and these components
should cooperate.

EXTERNAL SIGNALING: itâ€™s traditional for processes to reload
configuration files, metadata, and so on upon the receipt of
SIGHUP. Why shouldnâ€™t multiple components in a process each listen for
this notification? The same argues applies for triggering cleanups
upon receiving a SIGTERM or SIGINT.

CRASH REPORTING: itâ€™s useful to allow for multiple crash reporting
components in a single process so as to report different information
to different users. For example, in Android, an application may want
to install something like breakpad to report crashes in an
application-specific way, but also trigger the systemâ€™s debugd handler
to print informative messages in the system log.

Previous Work
=============

Manual Signal Chaining
----------------------

APPROACH: The most straightforward way for two components to share a
signal handler is for one componentâ€™s signal handler to delegate to
another. That is, component A loads first and installs its own signal
handler. Component B loads consequently, retrieves the signal handler
set by component A, and then installs its own. When Bâ€™s signal handler
runs, it performs whatever logic is necessary, then calls component
Aâ€™s handler.

PROBLEMS:

     1. Error-prone implementation: the POSIX signal API has evolved
     over many years and includes features that chaining implementers
     may not honor, like SA_RESETHAND, SA_NODEFER and signal-specific
     masks set via sigaction. Unless the chaining library goes to great
     pains to be exactly correct, signaled â€œchainedâ€ from some other
     handler will likely run in an unexpected environment and
     may malfunction.

     2 Default unsafe: unless components go out of their way to chain
     to existing handlers, they will clobber handlers already
     installed, and naive testing may not reveal this problem until a
     program tries to combine components in ways that their authors did
     not expect.

     3. Component unloading: any â€œchainedâ€ approach interacts poorly
     with component unloading. If component A above were unloaded,
     component B would have no way of knowing about it, and B, in its
     signal handler, would attempt to call into unloaded code and
     explore many kinds of undefined behavior. Uninstallation of signal
     handlers also breaks.

     4. Signal ordering: component Bâ€™s handler will always run before
     component Aâ€™s handler. If component B is some kind of
     general-purpose catch-all handler (e.g., for a crash reporting
     component), and component A wants to handle just one kind of
     signal and recover, weâ€™ll do the (probably-heavyweight,
     maybe-fatal) work for B before even getting started with A, with
     consequences ranging from behavioral weirdness to instacrashing.

sigchainlib
-----------

APPROACH: sigchainlib is part of ART; itâ€™s essentially an elaboration
of the manual signal chaining approach. Instead of relying on each
component author to manually chain signals, it uses ELF symbol
interposition to provide to other components in an ART-using process
alternative implementations of the POSIX signal functions.

PROBLEMS: sigchainlib is an improvement over asking the rest of the
ecosystem to manually defer to ARTâ€™s own signal handler, but it still
suffers from problems #1 and #3, and to some extent #4. (Whether #4
applies depends on whether you believe itâ€™s legitimate to want a
signal handler to run before ARTâ€™s.) Additionally, sigchainlib
requires dynamic linking to operate properly, and the scheme fails to
operate correctly in a statically-linked process. Itâ€™s also not
possible to correctly intercept all signal handler requests: for
example, if a component using sigchainlib is loaded into a process
that has privately stashed a pointer to the sigaction function.

Windows Vectored Exception Handlers
-----------------------------------

Windows isnâ€™t a POSIX system and doesnâ€™t have signals per se, but it
does have a similar concept of a global unhandled â€œexceptionâ€ (e.g.,
SIGSEGV-equivalent) handler. Vectored Exception Handlers allow
multiple components to cooperate in handling these exceptions and
operate very similarly to the mechanism that this document proposes.

Proposed New Standard API
=========================

Summary
-------

We can fix the signal-sharing problem by providing a new API
explicitly designed for cooperative handling of signals and layering
it â€œon topâ€ of the traditional signal handling functions, first giving
shared signal handlers a chance to run, and then automatically falling
back to the traditional model.

Why not a library?
------------------

Coordination. While itâ€™s possible to provide the interface below via a
user library, having it in the base system solves the problem of
having multiple independent components coordinate their signal sharing
mechanisms, and it allows the system to ensure that these shared
signal handlers interact properly with the legacy signal API. If
everyone used a signal multiplexing library, weâ€™d just have to
coordinate the signal multiplexing libraries.

Why not signalfd?
-----------------

Signalfd is definitely useful, but itâ€™s non-standard and doesnâ€™t
support all the use cases of conventional signal handlers: e.g.,
support synchronous signals like SIGTSTP and SIGSEGV, non-local
control flow in a signal handler, and so on. In addition, even for the
cases for which signalfd is a viable solution, the interface with
signal-handling code is completely different, complicating
porting. Itâ€™s relatively easy to port legacy sigaction-based signal
handlers to the shared handler model.

Definition
----------

enum signal_disposition {
   SIGNAL_CONTINUE_SEARCH = 0,
   SIGNAL_CONTINUE_EXECUTION = 1,
};

typedef <opaque> signal_registration;
/* Declare opaquely */ INVALID_SIGNAL_REGISTRATION;

signal_registration signal_register(
   int signum,
   sigset_t mask,
   int flags,
   enum signal_disposition (*shared_handler)(
     int signo,
     siginfo_t *info,
     struct ucontext *context));

void signal_unregister(
   signal_registration registration);

Semantics
---------

signal_register registers a shared handler for a specific signal and
returns an opaque cookie that can later be used to unregister that
specific handler. signum, mask, and flags are as for sigaction(2),
except that SA_SIGINFO, SA_ONSTACK, and SA_RESTART are
implied. signal_unregister unregisters a handler registered with
signal_register. (signal_register fails by returning
INVALID_SIGNAL_REGISTRATION and setting errno.)

(Itâ€™s okay for two functions can be async-signal-unsafe, I think.)

An additional flag, SA_LOW_PRIORITY, has the following effect: all
handlers registered without SA_LOW_PRIORITY run before handlers
registered with SA_LOW_PRIORITY. SA_LOW_PRIORITY allows a component
like breakpad to express that it wants to run after other handlers
even if installed later. The handler function works like a
sigaction(2) SA_SIGINFO handler except for its return value,
described below.

When a signal arrives, instead of running the handler registered with
sigaction(2), we run the shared signal handler functions registered
with signal_register for that signal. Each of these shared signal
handlers returns either SIGNAL_CONTINUE_SEARCH or
SIGNAL_CONTINUE_EXECUTION. If a shared handler returns
SIGNAL_CONTINUE_EXECUTION, the system terminates signal processing and
resumes whatever it was doing before receiving the signal. (Itâ€™s the
moral equivalent of returning normally from a legacy signal handler.)
If a shared handler returns SIGNAL_CONTINUE_SEARCH, the system another
registered shared handler for that signal. If all shared handlers for
a signal return SIGNAL_CONTINUE_SEARCH, the legacy POSIX signal
handling rules apply.

Execution order: the system runs all non-SA_LOW_PRIORITY handlers in
order of installation, then all SA_LOW_PRIORITY handlers in order
of installation.

Non-Local Control Flow
----------------------

Itâ€™s occasionally useful to longjmp out of a signal handler. Itâ€™s
reasonable to want to return non-locally from a shared signal handler
too --- that is, to resume program execution after
SIGNAL_CONTINUE_EXECUTION in a different state from the state the
program had when we entered the shared signal handler. Since the
signal system probably wants to maintain some kind of state to track
its progress through its shared signal handler list, a plain longjmp
out of a shared signal handler will likely leave the system in an
unspecified state.

One way to allow non-local returns is to provide a longjmp wrapper;
shared signal handlers can call this function to reset the systemâ€™s
internal state before jumping. A second option is to allow handlers to
mutate the context argument describing program state. By modifying
this structure and affecting the state of the program we â€œreturnâ€
into, a shared signal handler can achieve the effect of a longjmp
without actually returning non-locally.

That is, the intent is to ban longjmp out of shared signal handlers
and achieve the effect of longjmp by having shared signal handlers
mutate the ucontext structure they receive, then returning with
SIGNAL_CONTINUE_EXECUTION.

Interaction with signal handler inheritance
-------------------------------------------

Upon exec(), the system â€œsquashesâ€ all process signal handlers into
either SIG_IGN (if the pre-exec process had configured SIG_IGN as the
signal/sigaction handler for a signal) or SIG_DFL (in any other
case). The intent of this proposal is to preserve this behavior and
make the installation of shared signal handlers irrelevant for
purposes of process inheritance. That is, if a process sets a signal
handler to SIG_IGN *and* uses signal_register to install a handler, a
post-exec process starts with no shared signal handlers and the legacy
sigaction handler set to SIG_IGN.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-08 17:53 [RFC] Toward Shareable POSIX Signals Daniel Colascione
@ 2018-03-08 20:09 ` Florian Weimer
  2018-03-08 20:22   ` dancol
  2018-03-11 18:07 ` Zack Weinberg
  1 sibling, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2018-03-08 20:09 UTC (permalink / raw)
  To: Daniel Colascione, libc-alpha

On 03/08/2018 06:52 PM, Daniel Colascione wrote:
> Windows Vectored Exception Handlers
> -----------------------------------
> 
> Windows isnâ€™t a POSIX system and doesnâ€™t have signals per se, but it
> does have a similar concept of a global unhandled â€œexceptionâ€ (e.g.,
> SIGSEGV-equivalent) handler. Vectored Exception Handlers allow
> multiple components to cooperate in handling these exceptions and
> operate very similarly to the mechanism that this document proposes.

For many of the things you listed (particularly the synchronously 
delivered signals), Structured Exception Handling (SEH) would actually 
be the proper model (with a table-driven implementation).  It would 
allow to install handlers for small regions of code, which helps with 
modularity, and the handlers would be effectively thread-local.

For asynchronously delivered signals (such as subprocess termination), 
the signal mechanism may not be entirely appropriate anyway.  For those, 
standardizing on a single event loop looks like the right solution, and 
glib has largely taken over there.  Any other effort would simply 
undermine that, and not increase consolidation.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-08 20:09 ` Florian Weimer
@ 2018-03-08 20:22   ` dancol
  2018-03-08 21:21     ` Ondřej Bílka
  2018-03-09  9:19     ` Florian Weimer
  0 siblings, 2 replies; 28+ messages in thread
From: dancol @ 2018-03-08 20:22 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Daniel Colascione, libc-alpha

> On 03/08/2018 06:52 PM, Daniel Colascione wrote:
>> Windows Vectored Exception Handlers
>> -----------------------------------
>>
>> Windows isnÃ¢Â€Â™t a POSIX system and doesnÃ¢Â€Â™t have signals per se, but it
>> does have a similar concept of a global unhandled Ã¢Â€ÂœexceptionÃ¢Â€Â (e.g.,
>> SIGSEGV-equivalent) handler. Vectored Exception Handlers allow
>> multiple components to cooperate in handling these exceptions and
>> operate very similarly to the mechanism that this document proposes.
>
> For many of the things you listed (particularly the synchronously
> delivered signals), Structured Exception Handling (SEH) would actually
> be the proper model (with a table-driven implementation).  It would
> allow to install handlers for small regions of code, which helps with
> modularity, and the handlers would be effectively thread-local.

Not the case. SEH works only when you know about all the call sites that
might generate an exception. Sometimes, you want generic process-wide
handling keyed on memory address.

We don't have SEH, however, and there's no realistic prospect of getting
it. It would be realistic to extend the signals API to support more use
cases.

> For asynchronously delivered signals (such as subprocess termination),
> the signal mechanism may not be entirely appropriate anyway.

It beats wait. Which part of my proposed mechanism would operate improperly?

> For those,
> standardizing on a single event loop looks like the right solution, and
> glib has largely taken over there.

The libevent and Qt people might disagree. I don't think standardizing on
a single event loop is realistic considering that various event loop
libraries have been around for many years and not achieved any kind of
fixation.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-08 20:22   ` dancol
@ 2018-03-08 21:21     ` Ondřej Bílka
  2018-03-08 21:50       ` dancol
  2018-03-09  9:19     ` Florian Weimer
  1 sibling, 1 reply; 28+ messages in thread
From: Ondřej Bílka @ 2018-03-08 21:21 UTC (permalink / raw)
  To: dancol; +Cc: Florian Weimer, libc-alpha

On Thu, Mar 08, 2018 at 12:22:05PM -0800, dancol@dancol.org wrote:
> > For asynchronously delivered signals (such as subprocess termination),
> > the signal mechanism may not be entirely appropriate anyway.
> 
> It beats wait. Which part of my proposed mechanism would operate improperly?
>
Basic problem is that if you combine signals, threads and locks you get
a big mess. 

It is hard to write handler doing something complex, you
couldn't take any lock because thread you interrupted could have that
lock. Introducing signal leads to unexpected race conditions(for example
what happens when interrupt is interrupted?).

> > For those,
> > standardizing on a single event loop looks like the right solution, and
> > glib has largely taken over there.
> 
> The libevent and Qt people might disagree. I don't think standardizing on
> a single event loop is realistic considering that various event loop
> libraries have been around for many years and not achieved any kind of
> fixation.
> 

Original answer to use event loop, doesn't matter which one. 

Async signals should be handled as separate thread which has event loop
to serially handle arrived signals. That would remove most difficulties
of signal handlers.

Reason why this isn't default is performance and putting this to another
event loop is compromise.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-08 21:21     ` Ondřej Bílka
@ 2018-03-08 21:50       ` dancol
  2018-03-09  8:17         ` Ondřej Bílka
  0 siblings, 1 reply; 28+ messages in thread
From: dancol @ 2018-03-08 21:50 UTC (permalink / raw)
  To: "OndÅ™ej BÃlka"
  Cc: dancol, Florian Weimer, libc-alpha

> On Thu, Mar 08, 2018 at 12:22:05PM -0800, dancol@dancol.org wrote:
>> > For asynchronously delivered signals (such as subprocess termination),
>> > the signal mechanism may not be entirely appropriate anyway.
>>
>> It beats wait. Which part of my proposed mechanism would operate
>> improperly?
>>
> Basic problem is that if you combine signals, threads and locks you get
> a big mess.
>
> It is hard to write handler doing something complex, you
> couldn't take any lock because thread you interrupted could have that
> lock. Introducing signal leads to unexpected race conditions(for example
> what happens when interrupt is interrupted?).

Writing a robust handler requires some understanding of the subtleties
involved, but it's in no way impossible or even particularly difficult.
It's the moral equivalent of writing the top half of an interrupt handler.
People do that all the time.

Besides: people are _already_ using signals for this purpose. Practically
every high-performance language runtime already hooks various signal
handlers, and for good reason. There's no reason it should be hard for
these systems to coexist in the same process, and something like glib does
nothing to help sharing.

I'm proposing making existing use cases more portable and more robust.

>> > For those,
>> > standardizing on a single event loop looks like the right solution,
>> and
>> > glib has largely taken over there.
>>
>> The libevent and Qt people might disagree. I don't think standardizing
>> on
>> a single event loop is realistic considering that various event loop
>> libraries have been around for many years and not achieved any kind of
>> fixation.
>>
>
> Original answer to use event loop, doesn't matter which one.

Nobody will agree on a single event loop library. There is, for example,
_zero_ chance that popular mobile operating systems will adopt glib as the
primary event dispatching mechanism. In any case, an event loop doesn't
help with the synchronous signal sharing use cases --- the ones, for
example, a SIGSEGV-using high-performance runtime might require. (See the
first of my use cases on my original post.)

> Async signals should be handled as separate thread which has event loop
> to serially handle arrived signals. That would remove most difficulties
> of signal handlers.
>
> Reason why this isn't default is performance and putting this to another
> event loop is compromise.

It's not performance. It's that certain events, particularly those related
to memory errors, _need_ to be addressed immediately and synchronously.
Even if you were to use something like the Mach ports mechanism and send a
message instead of pushing a stack frame, you'd still have to
synchronously block a faulting thread, which could be anywhere, thus
giving you the same atomicity constraints.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-08 21:50       ` dancol
@ 2018-03-09  8:17         ` Ondřej Bílka
  2018-03-09 10:51           ` Daniel Colascione
  0 siblings, 1 reply; 28+ messages in thread
From: Ondřej Bílka @ 2018-03-09  8:17 UTC (permalink / raw)
  To: dancol; +Cc: Florian Weimer, libc-alpha

On Thu, Mar 08, 2018 at 01:50:17PM -0800, dancol@dancol.org wrote:
> > On Thu, Mar 08, 2018 at 12:22:05PM -0800, dancol@dancol.org wrote:
> >> > For asynchronously delivered signals (such as subprocess termination),
> >> > the signal mechanism may not be entirely appropriate anyway.
> >>
> >> It beats wait. Which part of my proposed mechanism would operate
> >> improperly?
> >>
> > Basic problem is that if you combine signals, threads and locks you get
> > a big mess.
> >
> > It is hard to write handler doing something complex, you
> > couldn't take any lock because thread you interrupted could have that
> > lock. Introducing signal leads to unexpected race conditions(for example
> > what happens when interrupt is interrupted?).
> 
> Writing a robust handler requires some understanding of the subtleties
> involved, but it's in no way impossible or even particularly difficult.
> It's the moral equivalent of writing the top half of an interrupt handler.
> People do that all the time.
> 
> Besides: people are _already_ using signals for this purpose. Practically
> every high-performance language runtime already hooks various signal
> handlers, and for good reason. There's no reason it should be hard for
> these systems to coexist in the same process, and something like glib does
> nothing to help sharing.
> 
> I'm proposing making existing use cases more portable and more robust.
>
Reality is that people don't bother and use first code that appears to
work. 

So aim is to make these as safe as possible by default.
 
> >> > For those,
> >> > standardizing on a single event loop looks like the right solution,
> >> and
> >> > glib has largely taken over there.
> >>
> >> The libevent and Qt people might disagree. I don't think standardizing
> >> on
> >> a single event loop is realistic considering that various event loop
> >> libraries have been around for many years and not achieved any kind of
> >> fixation.
> >>
> >
> > Original answer to use event loop, doesn't matter which one.
> 
> Nobody will agree on a single event loop library. There is, for example,
> _zero_ chance that popular mobile operating systems will adopt glib as the
> primary event dispatching mechanism. In any case, an event loop doesn't
> help with the synchronous signal sharing use cases --- the ones, for
> example, a SIGSEGV-using high-performance runtime might require. (See the
> first of my use cases on my original post.)
> 
Florian and I were talking about async signals and that they should be
handled in single event loop to serialize them. That there may be other
event loops is irrelevant. 


> > Async signals should be handled as separate thread which has event loop
> > to serially handle arrived signals. That would remove most difficulties
> > of signal handlers.
> >
> > Reason why this isn't default is performance and putting this to another
> > event loop is compromise.
> 
> 
> It's not performance. It's that certain events, particularly those related
> to memory errors, _need_ to be addressed immediately and synchronously.
> Even if you were to use something like the Mach ports mechanism and send a
> message instead of pushing a stack frame, you'd still have to
> synchronously block a faulting thread, which could be anywhere, thus
> giving you the same atomicity constraints.
>
This starts with

> > Async signals 

So sync part isn't relevant. Beside that you could do sync signals using
different thread if you don't care about overhead. Kernel could create
lock for offending thread/process, signal handler would run in different thread
which would unlock to resume offending thread when condition was handled. 

This isn't done because creating thread contexts just to handle signals
in single thread application is too expensive.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-08 20:22   ` dancol
  2018-03-08 21:21     ` Ondřej Bílka
@ 2018-03-09  9:19     ` Florian Weimer
  2018-03-09 10:43       ` Daniel Colascione
  1 sibling, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2018-03-09  9:19 UTC (permalink / raw)
  To: dancol; +Cc: libc-alpha

On 03/08/2018 09:22 PM, dancol@dancol.org wrote:
>> On 03/08/2018 06:52 PM, Daniel Colascione wrote:
>>> Windows Vectored Exception Handlers
>>> -----------------------------------
>>>
>>> Windows isnÃ¢â‚¬â„¢t a POSIX system and doesnÃ¢â‚¬â„¢t have signals per se, but it
>>> does have a similar concept of a global unhandled Ã¢â‚¬Å“exceptionÃ¢â‚¬Â (e.g.,
>>> SIGSEGV-equivalent) handler. Vectored Exception Handlers allow
>>> multiple components to cooperate in handling these exceptions and
>>> operate very similarly to the mechanism that this document proposes.
>>
>> For many of the things you listed (particularly the synchronously
>> delivered signals), Structured Exception Handling (SEH) would actually
>> be the proper model (with a table-driven implementation).  It would
>> allow to install handlers for small regions of code, which helps with
>> modularity, and the handlers would be effectively thread-local.
> 
> Not the case. SEH works only when you know about all the call sites that
> might generate an exception. Sometimes, you want generic process-wide
> handling keyed on memory address.

There is userfaultfd for that.  However, I find it a bit scary to paper 
over segmentation faults for unknown call sites.  This seems to be a bit 
of a fringe application, also considering that page faults keep getting 
more and more expensive.

> We don't have SEH, however, and there's no realistic prospect of getting
> it. It would be realistic to extend the signals API to support more use
> cases.

At present, we don't have those stacked signal handlers, either.  I'm 
just saying that there is a better model for synchronous signals.

>> For asynchronously delivered signals (such as subprocess termination),
>> the signal mechanism may not be entirely appropriate anyway.
> 
> It beats wait. Which part of my proposed mechanism would operate improperly?

It just doesn't scale at all.  For each subprocess termination, you have 
to iterate through about half of the registered signal handlers until 
you hit one that happens to know about the PID that was terminated. 
Same for the other signals.

>> For those,
>> standardizing on a single event loop looks like the right solution, and
>> glib has largely taken over there.
> 
> The libevent and Qt people might disagree.

Last time I checked, Qt used the glib event loop.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09  9:19     ` Florian Weimer
@ 2018-03-09 10:43       ` Daniel Colascione
  2018-03-09 16:41         ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Colascione @ 2018-03-09 10:43 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On 03/09/2018 01:19 AM, Florian Weimer wrote:
> On 03/08/2018 09:22 PM, dancol@dancol.org wrote:
>>> On 03/08/2018 06:52 PM, Daniel Colascione wrote:
>>>> Windows Vectored Exception Handlers
>>>> -----------------------------------
>>>>
>>>> Windows isnÃ¢â‚¬â„¢t a POSIX system and doesnÃ¢â‚¬â„¢t have signals per se, 
>>>> but it
>>>> does have a similar concept of a global unhandled Ã¢â‚¬Å“exceptionÃ¢â‚¬Â 
>>>> (e.g.,
>>>> SIGSEGV-equivalent) handler. Vectored Exception Handlers allow
>>>> multiple components to cooperate in handling these exceptions and
>>>> operate very similarly to the mechanism that this document proposes.
>>>
>>> For many of the things you listed (particularly the synchronously
>>> delivered signals), Structured Exception Handling (SEH) would actually
>>> be the proper model (with a table-driven implementation).Â  It would
>>> allow to install handlers for small regions of code, which helps with
>>> modularity, and the handlers would be effectively thread-local.
>>
>> Not the case. SEH works only when you know about all the call sites that
>> might generate an exception. Sometimes, you want generic process-wide
>> handling keyed on memory address.
> 
> There is userfaultfd for that.  

userfaultfd is an optional kernel feature, not a standard interface, and 
it's never realistically never going to be adopted on non-Linux systems.

> However, I find it a bit scary to paper 
> over segmentation faults for unknown call sites.Â  This seems to be a bit 
> of a fringe application, also considering that page faults keep getting 
> more and more expensive.

People use signals for lots of things today. They mostly work fine. I'm 
proposing a mechanism to make signals *less* "scary", not *more*. 
Besides, it's not libc's job to make value judgments about which 
techniques application developers should use. At this low level, 
libraries should provide capabilities, not opinions.

>> We don't have SEH, however, and there's no realistic prospect of getting
>> it. It would be realistic to extend the signals API to support more use
>> cases.
> 
> At present, we don't have those stacked signal handlers, either.Â  I'm 
> just saying that there is a better model for synchronous signals.

The two models work together. But since we don't have SEH and will 
realistically never have it (especially if this "signals are scary" 
attitude persists), sharing the existing signal handler mechanism is better.

>>> For asynchronously delivered signals (such as subprocess termination),
>>> the signal mechanism may not be entirely appropriate anyway.
>>
>> It beats wait. Which part of my proposed mechanism would operate 
>> improperly?
> 
> It just doesn't scale at all.Â  For each subprocess termination, you have 
> to iterate through about half of the registered signal handlers until 
> you hit one that happens to know about the PID that was terminated. Same 
> for the other signals.

It's linear in the number of components asking for notification, not in 
the number of processes awaited. si_pid in siginfo makes identifying a 
particular child fast.

Also, I don't see any realistic alternatives to the wait family of APIs 
being proposed either. (And as I explain below, "just use glib" is 
completely unacceptable as a response to a fundamental defect in the 
design of wait*(2).)

In any case, focusing on this one child-monitoring use case misses the 
point. My original message lists many different example use cases for 
shared signals, all of which we could address with a simple API. It 
would take decades for standard alternatives for each of these use cases 
to become available universally.

>>> For those,
>>> standardizing on a single event loop looks like the right solution, and
>>> glib has largely taken over there.
>>
>> The libevent and Qt people might disagree.
> 
> Last time I checked, Qt used the glib event loop.

It's configurable, so you can't rely on glib integration being present.

In any case, there is zero chance that something glib sees universal 
adoption, particularly outside the Gnome part of the desktop Linux 
world. Can you imagine macOS applications having a glib main event loop? 
Android? Nginx? A sane signals API would make it possible for a single 
component written against a single API to happily coexist in any of 
these environments.

Seeing "just use glib" as response to an attempt to innovate in core 
interfaces is extremely disappointing. We could fix long-standing 
problems and make software more reliable, but instead we're talking 
about a pipe dream of some single event loop library being universally 
adopted.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09  8:17         ` Ondřej Bílka
@ 2018-03-09 10:51           ` Daniel Colascione
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Colascione @ 2018-03-09 10:51 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: Florian Weimer, libc-alpha

On 03/09/2018 12:17 AM, OndÅ™ej BÃlka wrote:
> On Thu, Mar 08, 2018 at 01:50:17PM -0800, dancol@dancol.org wrote:
>>> On Thu, Mar 08, 2018 at 12:22:05PM -0800, dancol@dancol.org wrote:
>>>>> For asynchronously delivered signals (such as subprocess termination),
>>>>> the signal mechanism may not be entirely appropriate anyway.
>>>>
>>>> It beats wait. Which part of my proposed mechanism would operate
>>>> improperly?
>>>>
>>> Basic problem is that if you combine signals, threads and locks you get
>>> a big mess.
>>>
>>> It is hard to write handler doing something complex, you
>>> couldn't take any lock because thread you interrupted could have that
>>> lock. Introducing signal leads to unexpected race conditions(for example
>>> what happens when interrupt is interrupted?).
>>
>> Writing a robust handler requires some understanding of the subtleties
>> involved, but it's in no way impossible or even particularly difficult.
>> It's the moral equivalent of writing the top half of an interrupt handler.
>> People do that all the time.
>>
>> Besides: people are _already_ using signals for this purpose. Practically
>> every high-performance language runtime already hooks various signal
>> handlers, and for good reason. There's no reason it should be hard for
>> these systems to coexist in the same process, and something like glib does
>> nothing to help sharing.
>>
>> I'm proposing making existing use cases more portable and more robust.
>>
> Reality is that people don't bother and use first code that appears to
> work.

If you follow that line of thought to its logical conclusion, you end up 
with banning C. Low-level libraries help let developers express their 
design, not hold their hand because some techniques are thought to be 
too dangerous in solutions pasted from Stack Overflow.

In any case, blocking an API like this won't stop people using signals. 
It'll just make the signal hacking they do more dangerous.

>>>>> For those,
>>>>> standardizing on a single event loop looks like the right solution,
>>>> and
>>>>> glib has largely taken over there.
>>>>
>>>> The libevent and Qt people might disagree. I don't think standardizing
>>>> on
>>>> a single event loop is realistic considering that various event loop
>>>> libraries have been around for many years and not achieved any kind of
>>>> fixation.
>>>>
>>>
>>> Original answer to use event loop, doesn't matter which one.
>>
>> Nobody will agree on a single event loop library. There is, for example,
>> _zero_ chance that popular mobile operating systems will adopt glib as the
>> primary event dispatching mechanism. In any case, an event loop doesn't
>> help with the synchronous signal sharing use cases --- the ones, for
>> example, a SIGSEGV-using high-performance runtime might require. (See the
>> first of my use cases on my original post.)
>>
> Florian and I were talking about async signals and that they should be
> handled in single event loop to serialize them. That there may be other
> event loops is irrelevant.

If there are multiple event loop libraries, they need to arbitrate 
access to signals somehow.

>>> Async signals should be handled as separate thread which has event loop
>>> to serially handle arrived signals. That would remove most difficulties
>>> of signal handlers.
>>>
>>> Reason why this isn't default is performance and putting this to another
>>> event loop is compromise.
>>
>>
>> It's not performance. It's that certain events, particularly those related
>> to memory errors, _need_ to be addressed immediately and synchronously.
>> Even if you were to use something like the Mach ports mechanism and send a
>> message instead of pushing a stack frame, you'd still have to
>> synchronously block a faulting thread, which could be anywhere, thus
>> giving you the same atomicity constraints.
>>
> This starts with
> 
>>> Async signals
> 
> So sync part isn't relevant.

Sure it is. With one API, you can cover both kinds of signal.

> Beside that you could do sync signals using
> different thread if you don't care about overhead. Kernel could create
> lock for offending thread/process, signal handler would run in different thread
> which would unlock to resume offending thread when condition was handled.

It doesn't help. A thread causing a synchronous signal can hold 
arbitrary locks while blocked waiting for some other thread to service 
that signal, so that handler code would still need to be 
async-signal-safe, just like a signal handler today. All you've done is 
tweak the precise location of the signal stack and made it harder for 
the handler to inspect any relevant thread state.

Sure, you could handle the signal out-of-process somehow, but at that 
point, you've just reinvented ptrace.

> This isn't done because creating thread contexts just to handle signals
> in single thread application is too expensive.

It's also because the thread mechanism you're suggesting wouldn't 
actually deliver any benefits.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 10:43       ` Daniel Colascione
@ 2018-03-09 16:41         ` Rich Felker
  2018-03-09 16:58           ` Florian Weimer
                             ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Rich Felker @ 2018-03-09 16:41 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: Florian Weimer, libc-alpha

On Fri, Mar 09, 2018 at 02:43:06AM -0800, Daniel Colascione wrote:
> People use signals for lots of things today. They mostly work fine.
> I'm proposing a mechanism to make signals *less* "scary", not
> *more*. Besides, it's not libc's job to make value judgments about
> which techniques application developers should use. At this low
> level, libraries should provide capabilities, not opinions.

I think we need to weigh the benefits of making signals less
scary/unsafe/hideous versus the benefits of leaving them so. Yes,
people use signals today. Most of the uses are utterly unsafe and
utterly wrong. Most of them are not even justified; they're for lack
of knowing better or just cargo-culting from something they saw done
elsewhere. Do new interfaces fix existing incorrect usage and
discourage it in the future?

> >>>For asynchronously delivered signals (such as subprocess termination),
> >>>the signal mechanism may not be entirely appropriate anyway.
> >>
> >>It beats wait. Which part of my proposed mechanism would operate
> >>improperly?
> >
> >It just doesn't scale at all.Â  For each subprocess termination,
> >you have to iterate through about half of the registered signal
> >handlers until you hit one that happens to know about the PID that
> >was terminated. Same for the other signals.
> 
> It's linear in the number of components asking for notification, not
> in the number of processes awaited. si_pid in siginfo makes
> identifying a particular child fast.
> 
> Also, I don't see any realistic alternatives to the wait family of
> APIs being proposed either. (And as I explain below, "just use glib"
> is completely unacceptable as a response to a fundamental defect in
> the design of wait*(2).)

"Just use glib" is of course fundamentally unacceptable. But the
obvious solution is "just use threads" and I don't see why that's not
acceptable. The cost of a thread is miniscule compared to the cost of
a child process, and threads performing synchronous waitpid can
convert the result into whatever type of notification (poll wakeup,
cond var, synchronous handling, etc.) you like. This is clearly the
best approach for any application that's not creating at least
tens/hundreds of child processes per second; when people refuse to use
it in such a situation, it's because of irrational aversion to threads
and nothing else.

For loads where the cost of child process creation and termination is
the dominant factor, I'll grant that the added cost of a thread
lifecycle might not be acceptable. But there are much better
approaches like just forcing each child to inherit a pipe, and polling
those pipes to determine when the child exited, that are very light
(roughly equivalent to BSD forkfd) and much cleaner than using
signals.

> In any case, focusing on this one child-monitoring use case misses
> the point. My original message lists many different example use
> cases for shared signals, all of which we could address with a
> simple API. It would take decades for standard alternatives for each
> of these use cases to become available universally.

I don't think it misses the point when the point is to determine
whether the legitimate uses of signals establish a compelling need for
new interfaces. Each possible use needs to be evaluated one by one.

> Seeing "just use glib" as response to an attempt to innovate in core
> interfaces is extremely disappointing.

Again I agree re: "just use glib", but when it comes to core system
interfaces, the principle that "a maintainer's job is to say no"
applies more than ever. Perhaps a nicer and more precise way of saying
it is that a maintainer's job is to press for justification of need by
exhaustively searching for alternatives before turning to creation of
new interfaces as a solution.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 16:41         ` Rich Felker
@ 2018-03-09 16:58           ` Florian Weimer
  2018-03-09 17:14             ` Rich Felker
  2018-03-09 19:28           ` Daniel Colascione
  2018-03-09 19:30           ` Zack Weinberg
  2 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2018-03-09 16:58 UTC (permalink / raw)
  To: Rich Felker, Daniel Colascione; +Cc: libc-alpha

On 03/09/2018 05:41 PM, Rich Felker wrote:
> "Just use glib" is of course fundamentally unacceptable. But the
> obvious solution is "just use threads" and I don't see why that's not
> acceptable. The cost of a thread is miniscule compared to the cost of
> a child process, and threads performing synchronous waitpid can
> convert the result into whatever type of notification (poll wakeup,
> cond var, synchronous handling, etc.) you like. This is clearly the
> best approach for any application that's not creating at least
> tens/hundreds of child processes per second; when people refuse to use
> it in such a situation, it's because of irrational aversion to threads
> and nothing else.

But this only works for asynchronous signals.  It's reasonable for an 
application to want to catch synchronous signals (SIGBUS when dealing 
with file mappings, SIGFPE for arithmetic), and there is currently no 
thread-safe or library-safe way at all to do that.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 16:58           ` Florian Weimer
@ 2018-03-09 17:14             ` Rich Felker
  2018-03-09 17:36               ` Paul Eggert
  2018-03-09 19:34               ` Daniel Colascione
  0 siblings, 2 replies; 28+ messages in thread
From: Rich Felker @ 2018-03-09 17:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Daniel Colascione, libc-alpha

On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
> On 03/09/2018 05:41 PM, Rich Felker wrote:
> >"Just use glib" is of course fundamentally unacceptable. But the
> >obvious solution is "just use threads" and I don't see why that's not
> >acceptable. The cost of a thread is miniscule compared to the cost of
> >a child process, and threads performing synchronous waitpid can
> >convert the result into whatever type of notification (poll wakeup,
> >cond var, synchronous handling, etc.) you like. This is clearly the
> >best approach for any application that's not creating at least
> >tens/hundreds of child processes per second; when people refuse to use
> >it in such a situation, it's because of irrational aversion to threads
> >and nothing else.
> 
> But this only works for asynchronous signals.  It's reasonable for
> an application to want to catch synchronous signals (SIGBUS when
> dealing with file mappings, SIGFPE for arithmetic), and there is
> currently no thread-safe or library-safe way at all to do that.

Yes, as I noted each use case needs to be considered separately to
determine if there's some other better/more-portable/whatnot way it
could be done already. The above applies only to SIGCHLD.

FWIW I'm rather skeptical of many of the usage cases for synchronous
signals (most are dangerous papering-over of UB for dubious
performance reasons; never-taken "test reg,reg;jz" takes essentially 0
cycles on a modern uarch) but SIGBUS makes it hard to use mmap safely
to begin with. So there's still a lot of material to consider here.

FYI Daniel proposed the ideas to me first before posting on libc-alpha
and I suggested bringing a proposal here. I'm rather split between
finding the proposal nice and finding signals irredeemably awful.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 17:14             ` Rich Felker
@ 2018-03-09 17:36               ` Paul Eggert
  2018-03-09 19:34               ` Daniel Colascione
  1 sibling, 0 replies; 28+ messages in thread
From: Paul Eggert @ 2018-03-09 17:36 UTC (permalink / raw)
  To: Rich Felker, Florian Weimer; +Cc: Daniel Colascione, libc-alpha

On 03/09/2018 09:14 AM, Rich Felker wrote:
> I'm rather split between
> finding the proposal nice and finding signals irredeemably awful.

Why not both?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 16:41         ` Rich Felker
  2018-03-09 16:58           ` Florian Weimer
@ 2018-03-09 19:28           ` Daniel Colascione
  2018-03-09 19:30           ` Zack Weinberg
  2 siblings, 0 replies; 28+ messages in thread
From: Daniel Colascione @ 2018-03-09 19:28 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, libc-alpha

On 03/09/2018 08:41 AM, Rich Felker wrote:
> On Fri, Mar 09, 2018 at 02:43:06AM -0800, Daniel Colascione wrote:
>> People use signals for lots of things today. They mostly work fine.
>> I'm proposing a mechanism to make signals *less* "scary", not
>> *more*. Besides, it's not libc's job to make value judgments about
>> which techniques application developers should use. At this low
>> level, libraries should provide capabilities, not opinions.
> 
> I think we need to weigh the benefits of making signals less
> scary/unsafe/hideous versus the benefits of leaving them so. Yes,
> people use signals today. Most of the uses are utterly unsafe and
> utterly wrong. Most of them are not even justified; they're for lack
> of knowing better or just cargo-culting from something they saw done
> elsewhere. Do new interfaces fix existing incorrect usage and
> discourage it in the future?

I see little evidence that people use signals unnecessarily. If someone 
is determined to use signals, the existing interfaces are adequate for 
causing chaos. They are inadequate for the purpose of letting experts do 
the right thing.

There is an argument for programming environments having safeguards, but 
libc is far too fundamental and far too low-level to serve a role as a 
guarantor of safety. libc needs to let people who know what they're 
doing do their work.

>>>>> For asynchronously delivered signals (such as subprocess termination),
>>>>> the signal mechanism may not be entirely appropriate anyway.
>>>>
>>>> It beats wait. Which part of my proposed mechanism would operate
>>>> improperly?
>>>
>>> It just doesn't scale at all.Â  For each subprocess termination,
>>> you have to iterate through about half of the registered signal
>>> handlers until you hit one that happens to know about the PID that
>>> was terminated. Same for the other signals.
>>
>> It's linear in the number of components asking for notification, not
>> in the number of processes awaited. si_pid in siginfo makes
>> identifying a particular child fast.
>>
>> Also, I don't see any realistic alternatives to the wait family of
>> APIs being proposed either. (And as I explain below, "just use glib"
>> is completely unacceptable as a response to a fundamental defect in
>> the design of wait*(2).)
> 
> "Just use glib" is of course fundamentally unacceptable. But the
> obvious solution is "just use threads" and I don't see why that's not
> acceptable. The cost of a thread is miniscule compared to the cost of
> a child process, and threads performing synchronous waitpid can
> convert the result into whatever type of notification (poll wakeup,
> cond var, synchronous handling, etc.) you like. This is clearly the
> best approach for any application that's not creating at least
> tens/hundreds of child processes per second; when people refuse to use
> it in such a situation, it's because of irrational aversion to threads
> and nothing else.

Suppose you're the JVM. You have no idea how many subprocesses someone 
might create. Some users might have a lot of children, and an internal 
thread per child starts to look very expensive. "Wait", you might 
exclaim in surprise. "I can use wait(-1) and have the kernel just _tell_ 
me which process exits!" So you use wait(-1) and your system appears to 
work fine.

Now imagine two runtime environments think this way and you want to use 
them in the same process. (It's more likely that you might think.) The 
wait(-1) scheme fails the "What if two libraries did this?" test and a 
thread-per-process scheme doesn't scale.

SIGCHLD solves the whole problem. It provides a way for the kernel to 
tell any interested observer about the death of a process without racing 
with _other_ interested observers who want to learn about the states of 
_their_ processes. (You have to scan, unfortunately, because multiple 
pending SIGCHLDs can be collapsed into one.)

Is it ideal? No. I'd much rather have some kind of real waitable 
process-handle FD. But it works.

In any case, SIGCHLD is separable from the rest of my proposal. We could 
in principle agree that SIGCHLD is a bad idea and that we still need a 
better interface for signals generally.

> For loads where the cost of child process creation and termination is
> the dominant factor, I'll grant that the added cost of a thread
> lifecycle might not be acceptable. But there are much better
> approaches like just forcing each child to inherit a pipe, and polling
> those pipes to determine when the child exited, that are very light
> (roughly equivalent to BSD forkfd) and much cleaner than using
> signals.

The "death pipe" trick isn't general-purpose. A child can itself 
propagate that FD to other children, leading to a false-negative result, 
since the FD's lifetime can exceed that of the process it's intended to 
monitor.

I very much want something like forkfd. The process waiting APIs being 
awful is a longstanding problem with POSIX systems. It's also impossible 
to write a race-free pkill! Most people just rely on PID reuse not being 
fast enough to cause real problems. This kind of thinking is terrifying.

Ideally, you'd be able to open *any* process and get a file descriptor 
"handle" for it. Then, you'd be able to perform all process management 
operations (waiting, sending a signal, reading times, etc.) via the 
handle, either directly (by passing the handle FD to some API) or 
indirectly (by relying on the handle reserving the PID of the process to 
which it refers and preventing reuse). The "direct" approach is better, 
since it lets the process FD act as a credential.

>> In any case, focusing on this one child-monitoring use case misses
>> the point. My original message lists many different example use
>> cases for shared signals, all of which we could address with a
>> simple API. It would take decades for standard alternatives for each
>> of these use cases to become available universally.
> 
> I don't think it misses the point when the point is to determine
> whether the legitimate uses of signals establish a compelling need for
> new interfaces. Each possible use needs to be evaluated one by one.

Even if you ignored my other signal use cases, the first one I listed, 
high performance runtime optimization via SIGSEGV, would be sufficient 
grounds for a new interface for arbitrating access.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 16:41         ` Rich Felker
  2018-03-09 16:58           ` Florian Weimer
  2018-03-09 19:28           ` Daniel Colascione
@ 2018-03-09 19:30           ` Zack Weinberg
  2018-03-09 20:06             ` Daniel Colascione
  2018-03-09 20:25             ` Rich Felker
  2 siblings, 2 replies; 28+ messages in thread
From: Zack Weinberg @ 2018-03-09 19:30 UTC (permalink / raw)
  To: Rich Felker; +Cc: Daniel Colascione, Florian Weimer, GNU C Library

On Fri, Mar 9, 2018 at 11:41 AM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 09, 2018 at 02:43:06AM -0800, Daniel Colascione wrote:
>> People use signals for lots of things today. They mostly work fine.
>> I'm proposing a mechanism to make signals *less* "scary", not
>> *more*. Besides, it's not libc's job to make value judgments about
>> which techniques application developers should use. At this low
>> level, libraries should provide capabilities, not opinions.
>
> I think we need to weigh the benefits of making signals less
> scary/unsafe/hideous versus the benefits of leaving them so. Yes,
> people use signals today. Most of the uses are utterly unsafe and
> utterly wrong. Most of them are not even justified; they're for lack
> of knowing better or just cargo-culting from something they saw done
> elsewhere. Do new interfaces fix existing incorrect usage and
> discourage it in the future?

This is a good question to ask.

I tend to think that the basic mechanism of signals -- interrupting
normal user-space execution and transferring control to a handler
function -- is irretrievably flawed; not even as a design, but as a
_concept_.  This should not be a thing that ever happens to user
space.  As such, I appreciate Daniel's having taken the time to
canvass existing use cases for signals that are poorly served, or not
served at all, by any alternative, but I'm not a fan of any of his
proposed solutions.  I would like to work toward an end-state of
being able to remove <signal.h> from ISO C and POSIX.

[Daniel:]
>> Also, I don't see any realistic alternatives to the wait family of
>> APIs being proposed either. (And as I explain below, "just use glib"
>> is completely unacceptable as a response to a fundamental defect in
>> the design of wait*(2).)

pdfork / forkfd is obviously the correct replacement for SIGCHLD, and
it's infuriating that it still hasn't gotten traction.  pdfork is
still (documented as) incomplete as of FreeBSD 11, and the CLONE_FD
patches for Linux don't seem to have gone anywhere since 2015.  I'd
want to see a few refinements, notably an fcntl or ioctl that's
equivalent to WUNTRACED (that is, it controls whether select() will
wake you up when the process _stops_; without that, shells can't
really use pdfork) but if you (Daniel) were to pick up the pdfork ball
and get it accepted on the other *BSDs and on Linux, you'd have made
significant progress in this area.

> "Just use glib" is of course fundamentally unacceptable. But the
> obvious solution is "just use threads" and I don't see why that's not
> acceptable. The cost of a thread is miniscule compared to the cost of
> a child process, and threads performing synchronous waitpid can
> convert the result into whatever type of notification (poll wakeup,
> cond var, synchronous handling, etc.) you like.

The main problem I see with this idea is, a thread waiting for _any_
process can steal the event from a thread waiting for a specific
process; this makes it nonviable for any situation where you don't
control all of the code in the parent process.  In particular, this is
not a viable solution for libraries that want to run helper programs.
It is unclear to me whether pdfork-as-implemented (in FreeBSD 11) has
this problem; the manpage says that a pdfork'ed child does not fire
SIGCHLD, but it doesn't say whether an ordinary waitpid for the
process ID, or wait-for-any, can compete with pdwait4 on the fd
(perhaps because "pdwait4 has not yet been implemented", sigh...)

...
> On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
>> But [threads] only works for asynchronous signals.  It's reasonable
>> for an application to want to catch synchronous signals (SIGBUS
>> when dealing with file mappings, SIGFPE for arithmetic), and there
>> is currently no thread-safe or library-safe way at all to do that.
>
> Yes, as I noted each use case needs to be considered separately to
> determine if there's some other better/more-portable/whatnot way it
> could be done already. The above applies only to SIGCHLD.
>
> FWIW I'm rather skeptical of many of the usage cases for synchronous
> signals (most are dangerous papering-over of UB for dubious
> performance reasons; never-taken "test reg,reg;jz" takes essentially 0
> cycles on a modern uarch) but SIGBUS makes it hard to use mmap safely
> to begin with. So there's still a lot of material to consider here.

If I remember correctly, GCJ tried to use signal handlers to generate
NullPointerExceptions not for speed reasons, but for code-size and
exception-precision reasons.  But it was never 100% reliable and it
might have been better to go with "test reg,reg;jz" + lean harder on
proving pointers couldn't be null.

That's the only case I'm personally familiar with where a serious
application tried to _recover from_ synchronous signals.  I've also
dug into Breakpad a little, but that is a debugger at heart, and it
would be well-served by a mechanism where the kernel would
automatically spawn a ptrace supervisor instead of delivering a fatal
signal.  (This would also allow us to kick core dump generation out of
the kernel.)

> FYI Daniel proposed the ideas to me first before posting on libc-alpha
> and I suggested bringing a proposal here. I'm rather split between
> finding the proposal nice and finding signals irredeemably awful.

Yeah, I feel we collectively could have done a better job of not
leaping at Daniel's throat just for bringing this up.

zw

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 17:14             ` Rich Felker
  2018-03-09 17:36               ` Paul Eggert
@ 2018-03-09 19:34               ` Daniel Colascione
  1 sibling, 0 replies; 28+ messages in thread
From: Daniel Colascione @ 2018-03-09 19:34 UTC (permalink / raw)
  To: Rich Felker, Florian Weimer; +Cc: libc-alpha

On 03/09/2018 09:14 AM, Rich Felker wrote:
> On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
>> On 03/09/2018 05:41 PM, Rich Felker wrote:
>>> "Just use glib" is of course fundamentally unacceptable. But the
>>> obvious solution is "just use threads" and I don't see why that's not
>>> acceptable. The cost of a thread is miniscule compared to the cost of
>>> a child process, and threads performing synchronous waitpid can
>>> convert the result into whatever type of notification (poll wakeup,
>>> cond var, synchronous handling, etc.) you like. This is clearly the
>>> best approach for any application that's not creating at least
>>> tens/hundreds of child processes per second; when people refuse to use
>>> it in such a situation, it's because of irrational aversion to threads
>>> and nothing else.
>>
>> But this only works for asynchronous signals.  It's reasonable for
>> an application to want to catch synchronous signals (SIGBUS when
>> dealing with file mappings, SIGFPE for arithmetic), and there is
>> currently no thread-safe or library-safe way at all to do that.
> 
> Yes, as I noted each use case needs to be considered separately to
> determine if there's some other better/more-portable/whatnot way it
> could be done already. The above applies only to SIGCHLD.
> 
> FWIW I'm rather skeptical of many of the usage cases for synchronous
> signals 

> (most are dangerous 

They work fine. Billions of people every day use devices with a runtime 
that safely uses a SIGSEGV optimizations. It's hard to make the case 
that this technique doesn't actually work or is somehow dangerous.

> papering-over

It's not "papering-over". It's intended design: don't make the common 
case pay for the uncommon case.

> of UB 

It's not UB at the level we're talking about. Of course just longjmping 
from random code that segfaults is likely a bad idea. (I'm looking at 
you, Emacs stack recovery code.) But if you know what you're doing and 
can constrain the situations in which you do non-local control flow in 
response to a SIGSEGV, you can create something that's perfectly safe 
and efficient.

> for dubious
s/dubious/important/

> performance reasons; never-taken "test reg,reg;jz" takes essentially 0
> cycles on a modern uarch) 

It isn't.

> but SIGBUS makes it hard to use mmap safely
> to begin with. So there's still a lot of material to consider here.

Right. It's reasonable to want to transform SIGBUS into some kind of 
friendly high-level error, and to do that, you need a SIGBUS handler. 
It's reasonable for multiple such systems in a single process to want to 
convert their respective SIGBUS errors into friendly errors, so we need 
some way to share the SIGBUS signal.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 19:30           ` Zack Weinberg
@ 2018-03-09 20:06             ` Daniel Colascione
  2018-03-09 20:25             ` Rich Felker
  1 sibling, 0 replies; 28+ messages in thread
From: Daniel Colascione @ 2018-03-09 20:06 UTC (permalink / raw)
  To: Zack Weinberg, Rich Felker; +Cc: Florian Weimer, GNU C Library

On 03/09/2018 11:30 AM, Zack Weinberg wrote:
> On Fri, Mar 9, 2018 at 11:41 AM, Rich Felker <dalias@libc.org> wrote:
>> On Fri, Mar 09, 2018 at 02:43:06AM -0800, Daniel Colascione wrote:
>>> People use signals for lots of things today. They mostly work fine.
>>> I'm proposing a mechanism to make signals *less* "scary", not
>>> *more*. Besides, it's not libc's job to make value judgments about
>>> which techniques application developers should use. At this low
>>> level, libraries should provide capabilities, not opinions.
>>
>> I think we need to weigh the benefits of making signals less
>> scary/unsafe/hideous versus the benefits of leaving them so. Yes,
>> people use signals today. Most of the uses are utterly unsafe and
>> utterly wrong. Most of them are not even justified; they're for lack
>> of knowing better or just cargo-culting from something they saw done
>> elsewhere. Do new interfaces fix existing incorrect usage and
>> discourage it in the future?
> 
> This is a good question to ask.
> 
> I tend to think that the basic mechanism of signals -- interrupting
> normal user-space execution and transferring control to a handler
> function -- is irretrievably flawed; not even as a design, but as a
> _concept_.  This should not be a thing that ever happens to user
> space.  As such, I appreciate Daniel's having taken the time to
> canvass existing use cases for signals that are poorly served, or not
> served at all, by any alternative, but I'm not a fan of any of his
> proposed solutions.  I would like to work toward an end-state of
> being able to remove <signal.h> from ISO C and POSIX.

Removing signals will never happen. 30 years from now, we're going to 
have SIGSEGV and SIGINT; we'll probably have them in 300 years too.

First, breaking backward compatibility for the sake of removing an ugly 
concept is unjustified. Rewriting the world doesn't work.

Secondly, I don't think the "concept" of signals is flawed at all. 
They're asynchronous events. They're analogous to hardware interrupts. 
The reason we have systems with interrupts is that in the real world, 
things go wrong, and you have to allow for responding to these 
circumstances in some way. It's a fact of life and a consequence of the 
structure of the universe: that's why some kind of asynchronous 
exception mechanism ends up being included in *every* sufficiently 
complex computing environment, even ones not POSIX-derived.

I'm not proposing that we ignore the need to also come up with 
signal-free interfaces for important functionality. All things being 
equal, I'd prefer not to use signals. But I don't think we'll be able to 
ever actually get away from them, and since we're going to be stuck with 
signals regardless, we might as well make them work better.

> [Daniel:]
>>> Also, I don't see any realistic alternatives to the wait family of
>>> APIs being proposed either. (And as I explain below, "just use glib"
>>> is completely unacceptable as a response to a fundamental defect in
>>> the design of wait*(2).)
> 
> pdfork / forkfd is obviously the correct replacement for SIGCHLD, and
> it's infuriating that it still hasn't gotten traction.

I recall a discussion I read about with Linus a few years ago in which 
he suggested that a process-handle FD might be some kind of attack 
vector because it would let people fill the PID table. That argument 
makes no sense to me: in the worst case, you could count process handles 
against the process ulimit. But I don't think most people should bother 
with that level of restriction.

Windows has conventional process handles and gets along fine.

>> On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
>>> But [threads] only works for asynchronous signals.  It's reasonable
>>> for an application to want to catch synchronous signals (SIGBUS
>>> when dealing with file mappings, SIGFPE for arithmetic), and there
>>> is currently no thread-safe or library-safe way at all to do that.
>>
>> Yes, as I noted each use case needs to be considered separately to
>> determine if there's some other better/more-portable/whatnot way it
>> could be done already. The above applies only to SIGCHLD.
>>
>> FWIW I'm rather skeptical of many of the usage cases for synchronous
>> signals (most are dangerous papering-over of UB for dubious
>> performance reasons; never-taken "test reg,reg;jz" takes essentially 0
>> cycles on a modern uarch) but SIGBUS makes it hard to use mmap safely
>> to begin with. So there's still a lot of material to consider here.
> 
> If I remember correctly, GCJ tried to use signal handlers to generate
> NullPointerExceptions not for speed reasons, but for code-size and
> exception-precision reasons.  But it was never 100% reliable and it
> might have been better to go with "test reg,reg;jz" + lean harder on
> proving pointers couldn't be null.

Android's ART runtime, Mono, Microsoft's CLR, and the JVM's HotSpot all 
recover from SIGSEGV and other synchronous signals. It's a good technique.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 19:30           ` Zack Weinberg
  2018-03-09 20:06             ` Daniel Colascione
@ 2018-03-09 20:25             ` Rich Felker
  2018-03-09 20:54               ` Daniel Colascione
                                 ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Rich Felker @ 2018-03-09 20:25 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Daniel Colascione, Florian Weimer, GNU C Library

On Fri, Mar 09, 2018 at 02:30:51PM -0500, Zack Weinberg wrote:
> > "Just use glib" is of course fundamentally unacceptable. But the
> > obvious solution is "just use threads" and I don't see why that's not
> > acceptable. The cost of a thread is miniscule compared to the cost of
> > a child process, and threads performing synchronous waitpid can
> > convert the result into whatever type of notification (poll wakeup,
> > cond var, synchronous handling, etc.) you like.
> 
> The main problem I see with this idea is, a thread waiting for _any_
> process can steal the event from a thread waiting for a specific
> process; this makes it nonviable for any situation where you don't

I never proposed using a thread that calls wait or waidpid with a
negative argument, rather one thread per child. As long as there is no
rogue thread in the program doing wait-any, the thread-per-child
approach lets you emulate pdfork pretty well; programs written around
this model can use pdfork as a drop-in replacement and eliminate the
cost of the thread.

> > On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
> >> But [threads] only works for asynchronous signals.  It's reasonable
> >> for an application to want to catch synchronous signals (SIGBUS
> >> when dealing with file mappings, SIGFPE for arithmetic), and there
> >> is currently no thread-safe or library-safe way at all to do that.
> >
> > Yes, as I noted each use case needs to be considered separately to
> > determine if there's some other better/more-portable/whatnot way it
> > could be done already. The above applies only to SIGCHLD.
> >
> > FWIW I'm rather skeptical of many of the usage cases for synchronous
> > signals (most are dangerous papering-over of UB for dubious
> > performance reasons; never-taken "test reg,reg;jz" takes essentially 0
> > cycles on a modern uarch) but SIGBUS makes it hard to use mmap safely
> > to begin with. So there's still a lot of material to consider here.
> 
> If I remember correctly, GCJ tried to use signal handlers to generate
> NullPointerExceptions not for speed reasons, but for code-size and
> exception-precision reasons.  But it was never 100% reliable and it
> might have been better to go with "test reg,reg;jz" + lean harder on
> proving pointers couldn't be null.

This is my view. Null checks/proofs should be maximally hoisted and
explicitly emitted in the output rather than relying on traps.

> That's the only case I'm personally familiar with where a serious
> application tried to _recover from_ synchronous signals.  I've also
> dug into Breakpad a little, but that is a debugger at heart, and it
> would be well-served by a mechanism where the kernel would
> automatically spawn a ptrace supervisor instead of delivering a fatal
> signal.  (This would also allow us to kick core dump generation out of
> the kernel.)

This is a very bad idea. Introspective crash logging/reporting is a
huge gift to attackers. If an attacker has compromised a process in a
manner to cause it to segfault, they almost surely have enough control
over the process state to force the handler to perform code execution
for them. There have been real-world CVEs along these lines.

> > FYI Daniel proposed the ideas to me first before posting on libc-alpha
> > and I suggested bringing a proposal here. I'm rather split between
> > finding the proposal nice and finding signals irredeemably awful.
> 
> Yeah, I feel we collectively could have done a better job of not
> leaping at Daniel's throat just for bringing this up.

Absolutely. This needs to improve.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 20:25             ` Rich Felker
@ 2018-03-09 20:54               ` Daniel Colascione
  2018-03-09 21:10                 ` Rich Felker
  2018-03-09 21:05               ` Zack Weinberg
  2018-03-10  7:56               ` Florian Weimer
  2 siblings, 1 reply; 28+ messages in thread
From: Daniel Colascione @ 2018-03-09 20:54 UTC (permalink / raw)
  To: Rich Felker, Zack Weinberg; +Cc: Florian Weimer, GNU C Library

On 03/09/2018 12:25 PM, Rich Felker wrote:
> On Fri, Mar 09, 2018 at 02:30:51PM -0500, Zack Weinberg wrote:
>>> "Just use glib" is of course fundamentally unacceptable. But the
>>> obvious solution is "just use threads" and I don't see why that's not
>>> acceptable. The cost of a thread is miniscule compared to the cost of
>>> a child process, and threads performing synchronous waitpid can
>>> convert the result into whatever type of notification (poll wakeup,
>>> cond var, synchronous handling, etc.) you like.
>>
>> The main problem I see with this idea is, a thread waiting for _any_
>> process can steal the event from a thread waiting for a specific
>> process; this makes it nonviable for any situation where you don't
> 
> I never proposed using a thread that calls wait or waidpid with a
> negative argument, rather one thread per child. 

Understood.

> As long as there is no
> rogue thread in the program doing wait-any, the thread-per-child
> approach lets you emulate pdfork pretty well; programs written around
> this model can use pdfork as a drop-in replacement and eliminate the
> cost of the thread.

My contention is that a thread per child process is infeasible from a 
resource POV and that major subsystem authors will never adopt this 
approach.

Let's be realistic here: lots of systems behave badly due to the 
inadequacies of the wait API. If a better alternative doesn't appear, 
these systems are going to continue behaving badly.

>>> On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
>>>> But [threads] only works for asynchronous signals.  It's reasonable
>>>> for an application to want to catch synchronous signals (SIGBUS
>>>> when dealing with file mappings, SIGFPE for arithmetic), and there
>>>> is currently no thread-safe or library-safe way at all to do that.
>>>
>>> Yes, as I noted each use case needs to be considered separately to
>>> determine if there's some other better/more-portable/whatnot way it
>>> could be done already. The above applies only to SIGCHLD.
>>>
>>> FWIW I'm rather skeptical of many of the usage cases for synchronous
>>> signals (most are dangerous papering-over of UB for dubious
>>> performance reasons; never-taken "test reg,reg;jz" takes essentially 0
>>> cycles on a modern uarch) but SIGBUS makes it hard to use mmap safely
>>> to begin with. So there's still a lot of material to consider here.
>>
>> If I remember correctly, GCJ tried to use signal handlers to generate
>> NullPointerExceptions not for speed reasons, but for code-size and
>> exception-precision reasons.  But it was never 100% reliable and it
>> might have been better to go with "test reg,reg;jz" + lean harder on
>> proving pointers couldn't be null.
> 
> This is my view. Null checks/proofs should be maximally hoisted and
> explicitly emitted in the output rather than relying on traps.

Every major managed code runtime team disagrees with you.

It's not productive for low-level infrastructure maintainers to claim 
that a universal practice is somehow illegitimate. This attitude is not 
going to convince people doing the supposedly illegitimate thing to stop 
doing it, but it will block progress that leads to improvement of the 
system as a whole.

>> That's the only case I'm personally familiar with where a serious
>> application tried to _recover from_ synchronous signals.  I've also
>> dug into Breakpad a little, but that is a debugger at heart, and it
>> would be well-served by a mechanism where the kernel would
>> automatically spawn a ptrace supervisor instead of delivering a fatal
>> signal.  (This would also allow us to kick core dump generation out of
>> the kernel.)
> 
> This is a very bad idea. Introspective crash logging/reporting is a
> huge gift to attackers. If an attacker has compromised a process in a
> manner to cause it to segfault, they almost surely have enough control
> over the process state to force the handler to perform code execution
> for them. There have been real-world CVEs along these lines.

I've hacked on crash reporters for a while now. Reporting a crash in a 
damaged process environment is undesirable, but unavoidable in some 
cases. For example, on iOS, fork(2) doesn't work. At all. Consequently, 
breakpad there needs to do its best with the state it has.

Calling fork(2) in a SIGSEGV handler and immediately execing a crash 
reporting process is generally safe enough. It's hard for things to go 
wrong enough that this mechanism doesn't work. That fresh crash 
reporting process can ptrace its parent and collect what it wants.

While some kernel help in spawning this process wouldn't hurt, I don't 
think it's particularly necessary. (And I think the existing Linux 
core_pipe approach is adequate.)

We _do_ need user-space dump collection though. The logic for deciding 
what information we include in a crash report is too complex to hoist to 
the kernel, where it'll seldom get updates. The kernel's job should be 
limited to hooking up a crashing process and a crash-reporting process; 
I'd get rid of kernel-written core dumps entirely if I had my way.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 20:25             ` Rich Felker
  2018-03-09 20:54               ` Daniel Colascione
@ 2018-03-09 21:05               ` Zack Weinberg
  2018-03-10  7:56               ` Florian Weimer
  2 siblings, 0 replies; 28+ messages in thread
From: Zack Weinberg @ 2018-03-09 21:05 UTC (permalink / raw)
  To: Rich Felker; +Cc: Daniel Colascione, Florian Weimer, GNU C Library

On Fri, Mar 9, 2018 at 3:25 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 09, 2018 at 02:30:51PM -0500, Zack Weinberg wrote:
>> > "Just use glib" is of course fundamentally unacceptable. But the
>> > obvious solution is "just use threads" and I don't see why that's not
>> > acceptable. The cost of a thread is miniscule compared to the cost of
>> > a child process, and threads performing synchronous waitpid can
>> > convert the result into whatever type of notification (poll wakeup,
>> > cond var, synchronous handling, etc.) you like.
>>
>> The main problem I see with this idea is, a thread waiting for _any_
>> process can steal the event from a thread waiting for a specific
>> process; this makes it nonviable for any situation where you don't
>
> I never proposed using a thread that calls wait or waidpid with a
> negative argument, rather one thread per child. As long as there is no
> rogue thread in the program doing wait-any,

My point here is that a library author cannot assume there is no such
"rogue thread".

>> I've also
>> dug into Breakpad a little, but that is a debugger at heart, and it
>> would be well-served by a mechanism where the kernel would
>> automatically spawn a ptrace supervisor instead of delivering a fatal
>> signal.  (This would also allow us to kick core dump generation out of
>> the kernel.)
>
> This is a very bad idea. Introspective crash logging/reporting is a
> huge gift to attackers. If an attacker has compromised a process in a
> manner to cause it to segfault, they almost surely have enough control
> over the process state to force the handler to perform code execution
> for them. There have been real-world CVEs along these lines.

That is a problem for Breakpad _as it is today_, but it should not be
a problem for a hypothetical out-of-process ptrace monitor that the
kernel spawns after freezing the process that was about to take a
fatal sync signal.

You can almost fake this today by starting up the ptrace monitor with
the monitored child; the main issue I'm aware of is that the monitor
will get woken up every time the child takes any signal at all, which
makes signal handling even costlier than it already is, and figuring
out whether the child was about to be _killed_ by the signal is a
PITA.

zw

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 20:54               ` Daniel Colascione
@ 2018-03-09 21:10                 ` Rich Felker
  2018-03-09 21:27                   ` dancol
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2018-03-09 21:10 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: Zack Weinberg, Florian Weimer, GNU C Library

On Fri, Mar 09, 2018 at 12:54:33PM -0800, Daniel Colascione wrote:
> On 03/09/2018 12:25 PM, Rich Felker wrote:
> >On Fri, Mar 09, 2018 at 02:30:51PM -0500, Zack Weinberg wrote:
> >>>"Just use glib" is of course fundamentally unacceptable. But the
> >>>obvious solution is "just use threads" and I don't see why that's not
> >>>acceptable. The cost of a thread is miniscule compared to the cost of
> >>>a child process, and threads performing synchronous waitpid can
> >>>convert the result into whatever type of notification (poll wakeup,
> >>>cond var, synchronous handling, etc.) you like.
> >>
> >>The main problem I see with this idea is, a thread waiting for _any_
> >>process can steal the event from a thread waiting for a specific
> >>process; this makes it nonviable for any situation where you don't
> >
> >I never proposed using a thread that calls wait or waidpid with a
> >negative argument, rather one thread per child.
> 
> Understood.
> 
> >As long as there is no
> >rogue thread in the program doing wait-any, the thread-per-child
> >approach lets you emulate pdfork pretty well; programs written around
> >this model can use pdfork as a drop-in replacement and eliminate the
> >cost of the thread.
> 
> My contention is that a thread per child process is infeasible from
> a resource POV and that major subsystem authors will never adopt
> this approach.

This may be the current reality but my contention is that it's based
on myths. A thread that will do nothing but waitpid can be created
with a 1-page stack, no guard page, and all signals blocked. It
consumes 4k of memory, 4k plus some epsilonish amount of kernel
memory, one number from the pid/tid space (enlarge if needed), and a
few microseconds to start/exit. Compare with exec'ing a child which
takes hundreds of microseconds (fork-only is much less than with exec,
but still much more than a thread, and fork-only should be considered
deprecated for most purposes for lots of other good reasons). We're
really talking about something like a 1-5% increase in cost here,
probably on the lower end.

> >>>On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
> >>>>But [threads] only works for asynchronous signals.  It's reasonable
> >>>>for an application to want to catch synchronous signals (SIGBUS
> >>>>when dealing with file mappings, SIGFPE for arithmetic), and there
> >>>>is currently no thread-safe or library-safe way at all to do that.
> >>>
> >>>Yes, as I noted each use case needs to be considered separately to
> >>>determine if there's some other better/more-portable/whatnot way it
> >>>could be done already. The above applies only to SIGCHLD.
> >>>
> >>>FWIW I'm rather skeptical of many of the usage cases for synchronous
> >>>signals (most are dangerous papering-over of UB for dubious
> >>>performance reasons; never-taken "test reg,reg;jz" takes essentially 0
> >>>cycles on a modern uarch) but SIGBUS makes it hard to use mmap safely
> >>>to begin with. So there's still a lot of material to consider here.
> >>
> >>If I remember correctly, GCJ tried to use signal handlers to generate
> >>NullPointerExceptions not for speed reasons, but for code-size and
> >>exception-precision reasons.  But it was never 100% reliable and it
> >>might have been better to go with "test reg,reg;jz" + lean harder on
> >>proving pointers couldn't be null.
> >
> >This is my view. Null checks/proofs should be maximally hoisted and
> >explicitly emitted in the output rather than relying on traps.
> 
> Every major managed code runtime team disagrees with you.
> 
> It's not productive for low-level infrastructure maintainers to
> claim that a universal practice is somehow illegitimate. This
> attitude is not going to convince people doing the supposedly
> illegitimate thing to stop doing it, but it will block progress that
> leads to improvement of the system as a whole.

There are a lot of widespread programming practices that have little
of no legitimacy, and it is productive for parties who have some
leverage to change them to try to use that leverage. Ideally this
should not be unilateral (based on a single person's or single
implementor's position) but reflect widely agreed upon principles.

> >>That's the only case I'm personally familiar with where a serious
> >>application tried to _recover from_ synchronous signals.  I've also
> >>dug into Breakpad a little, but that is a debugger at heart, and it
> >>would be well-served by a mechanism where the kernel would
> >>automatically spawn a ptrace supervisor instead of delivering a fatal
> >>signal.  (This would also allow us to kick core dump generation out of
> >>the kernel.)
> >
> >This is a very bad idea. Introspective crash logging/reporting is a
> >huge gift to attackers. If an attacker has compromised a process in a
> >manner to cause it to segfault, they almost surely have enough control
> >over the process state to force the handler to perform code execution
> >for them. There have been real-world CVEs along these lines.
> 
> I've hacked on crash reporters for a while now. Reporting a crash in
> a damaged process environment is undesirable, but unavoidable in
> some cases. For example, on iOS, fork(2) doesn't work. At all.
> Consequently, breakpad there needs to do its best with the state it
> has.
> 
> Calling fork(2) in a SIGSEGV handler and immediately execing a crash
> reporting process is generally safe enough. It's hard for things to
> go wrong enough that this mechanism doesn't work. That fresh crash
> reporting process can ptrace its parent and collect what it wants.

On i386, the vdso syscall pointer is stored at the beginning of the
TCB, which is just above the thread's TLS, which is just above the
thread's stack with no guard pages in between. Your syscall to fork
could very well turn into a jump to the attacker's payload, not to
mention all the other stuff done in addition to the fork.

> While some kernel help in spawning this process wouldn't hurt, I
> don't think it's particularly necessary. (And I think the existing
> Linux core_pipe approach is adequate.)

It's the only way to make it remotely secure. You cannot safely do
anything from a compromised context. Any further logging/reporting
work has to take place in a known-uncompromised context and has to
account for any data structures extracted from the crashing process
possibly being tainted/malicious.

> We _do_ need user-space dump collection though. The logic for
> deciding what information we include in a crash report is too
> complex to hoist to the kernel, where it'll seldom get updates. The
> kernel's job should be limited to hooking up a crashing process and
> a crash-reporting process; I'd get rid of kernel-written core dumps
> entirely if I had my way.

This is plausible if you sandbox the collection utility such that it
does not have access to do harmful things locally and does not have
channels for exfiltration.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 21:10                 ` Rich Felker
@ 2018-03-09 21:27                   ` dancol
  0 siblings, 0 replies; 28+ messages in thread
From: dancol @ 2018-03-09 21:27 UTC (permalink / raw)
  To: Rich Felker
  Cc: Daniel Colascione, Zack Weinberg, Florian Weimer, GNU C Library

> On Fri, Mar 09, 2018 at 12:54:33PM -0800, Daniel Colascione wrote:
>> On 03/09/2018 12:25 PM, Rich Felker wrote:
>> >On Fri, Mar 09, 2018 at 02:30:51PM -0500, Zack Weinberg wrote:
>> >>>"Just use glib" is of course fundamentally unacceptable. But the
>> >>>obvious solution is "just use threads" and I don't see why that's not
>> >>>acceptable. The cost of a thread is miniscule compared to the cost of
>> >>>a child process, and threads performing synchronous waitpid can
>> >>>convert the result into whatever type of notification (poll wakeup,
>> >>>cond var, synchronous handling, etc.) you like.
>> >>
>> >>The main problem I see with this idea is, a thread waiting for _any_
>> >>process can steal the event from a thread waiting for a specific
>> >>process; this makes it nonviable for any situation where you don't
>> >
>> >I never proposed using a thread that calls wait or waidpid with a
>> >negative argument, rather one thread per child.
>>
>> Understood.
>>
>> >As long as there is no
>> >rogue thread in the program doing wait-any, the thread-per-child
>> >approach lets you emulate pdfork pretty well; programs written around
>> >this model can use pdfork as a drop-in replacement and eliminate the
>> >cost of the thread.
>>
>> My contention is that a thread per child process is infeasible from
>> a resource POV and that major subsystem authors will never adopt
>> this approach.
>
> This may be the current reality but my contention is that it's based
> on myths. A thread that will do nothing but waitpid can be created
> with a 1-page stack, no guard page, and all signals blocked. It
> consumes 4k of memory, 4k plus some epsilonish amount of kernel
> memory, one number from the pid/tid space (enlarge if needed), and a
> few microseconds to start/exit. Compare with exec'ing a child which
> takes hundreds of microseconds (fork-only is much less than with exec,
> but still much more than a thread, and fork-only should be considered
> deprecated for most purposes for lots of other good reasons). We're
> really talking about something like a 1-5% increase in cost here,
> probably on the lower end.

Myth or not, the idea that threads are expensive has a powerful hold on
people. Especially since thread creation _and teardown_ still contend on
mmap_sem in Linux, and unnecessary vm-map modifications are still highly
undesired.

Besides, everyone using threads for process waiting does nothing to help
*existing* software that uses broad wait operations, especially in the
case where we have a library that wants an internal helper child process
and that wants to work in arbitrary processes.

It's the logic that justifies O_CLOEXEC.

Besides, the kind of thread-based process waiting you're talking about is
pretty complex to implement, and wait is relatively simple and more
efficient. It's wait that people will default into using.

>> >>>On Fri, Mar 09, 2018 at 05:58:51PM +0100, Florian Weimer wrote:
>> >>>>But [threads] only works for asynchronous signals.  It's reasonable
>> >>>>for an application to want to catch synchronous signals (SIGBUS
>> >>>>when dealing with file mappings, SIGFPE for arithmetic), and there
>> >>>>is currently no thread-safe or library-safe way at all to do that.
>> >>>
>> >>>Yes, as I noted each use case needs to be considered separately to
>> >>>determine if there's some other better/more-portable/whatnot way it
>> >>>could be done already. The above applies only to SIGCHLD.
>> >>>
>> >>>FWIW I'm rather skeptical of many of the usage cases for synchronous
>> >>>signals (most are dangerous papering-over of UB for dubious
>> >>>performance reasons; never-taken "test reg,reg;jz" takes essentially
>> 0
>> >>>cycles on a modern uarch) but SIGBUS makes it hard to use mmap safely
>> >>>to begin with. So there's still a lot of material to consider here.
>> >>
>> >>If I remember correctly, GCJ tried to use signal handlers to generate
>> >>NullPointerExceptions not for speed reasons, but for code-size and
>> >>exception-precision reasons.  But it was never 100% reliable and it
>> >>might have been better to go with "test reg,reg;jz" + lean harder on
>> >>proving pointers couldn't be null.
>> >
>> >This is my view. Null checks/proofs should be maximally hoisted and
>> >explicitly emitted in the output rather than relying on traps.
>>
>> Every major managed code runtime team disagrees with you.
>>
>> It's not productive for low-level infrastructure maintainers to
>> claim that a universal practice is somehow illegitimate. This
>> attitude is not going to convince people doing the supposedly
>> illegitimate thing to stop doing it, but it will block progress that
>> leads to improvement of the system as a whole.
>
> There are a lot of widespread programming practices that have little
> of no legitimacy, and it is productive for parties who have some
> leverage to change them to try to use that leverage. Ideally this
> should not be unilateral (based on a single person's or single
> implementor's position) but reflect widely agreed upon principles.

First, relying on traps for optimization isn't an illegitimate technique.
There is _nothing_ wrong with it from a conceptual perspective. Legitimacy
comes from broad adoption. Traps in runtimes do work. They've worked well
and they've worked for a long time. They do improve performance. Why
should anyone stop using them?

Second, there's using leverage and there's tilting as windmills. The
objection to trapping is largely aesthetic; when performance bumps up
against aesthetics, performance has to win. There's no chance that major
managed-code runtimes leave performance on the table because some people
think trapping is ugly.

>> >>That's the only case I'm personally familiar with where a serious
>> >>application tried to _recover from_ synchronous signals.  I've also
>> >>dug into Breakpad a little, but that is a debugger at heart, and it
>> >>would be well-served by a mechanism where the kernel would
>> >>automatically spawn a ptrace supervisor instead of delivering a fatal
>> >>signal.  (This would also allow us to kick core dump generation out of
>> >>the kernel.)
>> >
>> >This is a very bad idea. Introspective crash logging/reporting is a
>> >huge gift to attackers. If an attacker has compromised a process in a
>> >manner to cause it to segfault, they almost surely have enough control
>> >over the process state to force the handler to perform code execution
>> >for them. There have been real-world CVEs along these lines.
>>
>> I've hacked on crash reporters for a while now. Reporting a crash in
>> a damaged process environment is undesirable, but unavoidable in
>> some cases. For example, on iOS, fork(2) doesn't work. At all.
>> Consequently, breakpad there needs to do its best with the state it
>> has.
>>
>> Calling fork(2) in a SIGSEGV handler and immediately execing a crash
>> reporting process is generally safe enough. It's hard for things to
>> go wrong enough that this mechanism doesn't work. That fresh crash
>> reporting process can ptrace its parent and collect what it wants.
>
> On i386, the vdso syscall pointer is stored at the beginning of the
> TCB, which is just above the thread's TLS, which is just above the
> thread's stack with no guard pages in between. Your syscall to fork
> could very well turn into a jump to the attacker's payload, not to
> mention all the other stuff done in addition to the fork.

Right. That's why you issue the system call _directly_ using something
like https://chromium.googlesource.com/linux-syscall-support/

>> While some kernel help in spawning this process wouldn't hurt, I
>> don't think it's particularly necessary. (And I think the existing
>> Linux core_pipe approach is adequate.)
>
> It's the only way to make it remotely secure. You cannot safely do
> anything from a compromised context. Any further logging/reporting
> work has to take place in a known-uncompromised context and has to
> account for any data structures extracted from the crashing process
> possibly being tainted/malicious.

Right. After a process has crashed, its entire address space is untrusted
input.

>
>> We _do_ need user-space dump collection though. The logic for
>> deciding what information we include in a crash report is too
>> complex to hoist to the kernel, where it'll seldom get updates. The
>> kernel's job should be limited to hooking up a crashing process and
>> a crash-reporting process; I'd get rid of kernel-written core dumps
>> entirely if I had my way.
>
> This is plausible if you sandbox the collection utility such that it
> does not have access to do harmful things locally and does not have
> channels for exfiltration.

Agreed. I've implemented such sandboxed collection systems.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-09 20:25             ` Rich Felker
  2018-03-09 20:54               ` Daniel Colascione
  2018-03-09 21:05               ` Zack Weinberg
@ 2018-03-10  7:56               ` Florian Weimer
  2018-03-10  8:41                 ` dancol
  2 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2018-03-10  7:56 UTC (permalink / raw)
  To: Rich Felker, Zack Weinberg; +Cc: Daniel Colascione, GNU C Library

On 03/09/2018 09:25 PM, Rich Felker wrote:
> This is a very bad idea. Introspective crash logging/reporting is a
> huge gift to attackers. If an attacker has compromised a process in a
> manner to cause it to segfault, they almost surely have enough control
> over the process state to force the handler to perform code execution
> for them. There have been real-world CVEs along these lines.

More importantly, in-process crash handlers also destroy evidence *why* 
the crash happened, or inhibit the crash altogether because they run 
into some sort of deadlock due to the corrupt state of the process.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-10  7:56               ` Florian Weimer
@ 2018-03-10  8:41                 ` dancol
  0 siblings, 0 replies; 28+ messages in thread
From: dancol @ 2018-03-10  8:41 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Rich Felker, Zack Weinberg, Daniel Colascione, GNU C Library

> On 03/09/2018 09:25 PM, Rich Felker wrote:
>> This is a very bad idea. Introspective crash logging/reporting is a
>> huge gift to attackers. If an attacker has compromised a process in a
>> manner to cause it to segfault, they almost surely have enough control
>> over the process state to force the handler to perform code execution
>> for them. There have been real-world CVEs along these lines.
>
> More importantly, in-process crash handlers also destroy evidence *why*
> the crash happened, or inhibit the crash altogether because they run
> into some sort of deadlock due to the corrupt state of the process.

The problems you mention are possible in theory. _In practice_, they
seldom occur. A carefully-programmed in-process crash reporter --- one
that's async-signal-safe (a requirement which helps avoid deadlocks) and
that issues system calls directly --- can almost always produce a crash
dump.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-08 17:53 [RFC] Toward Shareable POSIX Signals Daniel Colascione
  2018-03-08 20:09 ` Florian Weimer
@ 2018-03-11 18:07 ` Zack Weinberg
  2018-03-11 18:56   ` Daniel Colascione
  1 sibling, 1 reply; 28+ messages in thread
From: Zack Weinberg @ 2018-03-11 18:07 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: GNU C Library

On Thu, Mar 8, 2018 at 12:52 PM, Daniel Colascione <dancol@dancol.org> wrote:
> I've written up a proposal for improving the application signal APIs,
> written below. Might there be any interest in prototyping this work
> in glibc?

I want to say first of all that I think you have identified a real
problem and I appreciate your having taken the time to write up a
proposed solution.  However, along with most of the other posters in
this thread, I don't like the proposed solution -- and not just
because I don't like signals (although, indeed, I do not like signals)
but because I think the basic mechanism you suggest, chained handlers,
is inherently unreliable and will cause more problems than it solves.
We have had nothing but bad luck with mechanisms that rely on several
user space components, not all maintained by the same people,
cooperating in access to a shared resource.  Adding signal_register()
to a universe that already has signal() also introduces a nasty
compatibility problem: suppose library A uses the new API to register
a handler for SIGINT (for example), but library B, or the application,
calls signal(SIGINT, SIG_IGN), or sighold(SIGINT): what do you do?

I also think you haven't gone deep enough into the root cause of the
problem you're trying to solve.  You set out to make it possible to
have more than one signal handler per process for each signal, but
_why_ is that an undesirable limitation?  In most cases, it's because
_signals are too coarse_.  When you get a SIGCHLD or a SIGIO or a
SIGSEGV, you don't know which of many possible child processes / file
descriptors / memory addresses is relevant.  If we had a mechanism for
dispatching _specific_ events in these categories directly to the code
that cared about them, then we wouldn't need to have SIGwhatever
handlers in the first place, and we also wouldn't need to worry about
buggy or malevolent handlers eating events that were not for them.
_That_ should be your goal.

With that in mind, let's run down the list of signals with their uses:

CHLD, PIPE, POLL/IO, URG, RTMIN through RTMAX -- These all represent
I/O events.  In most cases it is already possible to receive a
notification tied to the specific file descriptor that's relevant,
instead.  The biggest gap I know about is that child processes are not
represented by file descriptors, and this would be solved by adopting
pdfork() (with some improvements).  You also mentioned async I/O; I am
not up to date on the exact state of async I/O, but I do see a gap in
the sigevent(7) manpage: there ought to be a new alternative,
SIGEV_FD, in which the kernel writes a siginfo_t structure to the
specified file descriptor when I/O completes.  (In fact, if this
existed, I believe SIGEV_SIGNAL and SIGEV_THREAD could be implemented
entirely in user space, on top of it.)

HUP, INT, QUIT, TERM, TSTP, TTIN, TTOU, WINCH, USR1, USR2, XCPU, PWR,
ALRM, VTALRM, PROF -- Often is right to conceptualize these as I/O
events as well, and many of them can already be turned into normal I/O
(e.g. by putting the tty in raw mode, or by using timer_create instead
of alarm), and for those that can't it should be made possible.  But
another valid way to look at them is that they represent _broadcast_
notifications that are already as fine-grained as they can be.  So,
for these, I could be persuaded to support a multi-handler approach --
but one in which all of the registered handlers are always called, no
matter what.  I would need to hear a compelling answer to the
coordination problem I mentioned above, though (what do you do if
there are registered handlers and then someone else uses the legacy
API to ignore the signal?)

ILL, ABRT, FPE, SEGV, BUS, SYS, TRAP, IOT, EMT, STKFLT -- Synchronous
signals arising from processor faults deserve a specialized mechanism
all their own.  The notion I currently like, at the kernel level, is
just-in-time instantiation of a ptrace monitor, because that avoids
the problem of recovering from memory corruption from within the
corrupted address space.  At the C-library level, there are several
plausible strategies for deciding whose responsibility a processor
fault is: special ELF sections that label regions of code with
handlers (like the except_table in the Linux kernel); dynamically
registered annotations on memory regions; SEH; etc.  But notice that
all of those can be built on top of "instantiate a ptrace monitor
instead of delivering a fatal signal."  Someone would need to do
something about how hard it is to write ptrace monitors, but that is
technically a separate issue.

0, STOP, CONT, KILL -- These aren't really signals at all, they are
process-control system calls.  In a from-scratch design we would have
IsProcessRunning(), SuspendProcess(), ResumeProcess(), and
TerminateProcess() primitives instead.  I don't see any real need to
mess with them.

zw

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-11 18:07 ` Zack Weinberg
@ 2018-03-11 18:56   ` Daniel Colascione
  2018-03-12 15:17     ` Zack Weinberg
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Colascione @ 2018-03-11 18:56 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: GNU C Library

On 03/11/2018 11:07 AM, Zack Weinberg wrote:
> On Thu, Mar 8, 2018 at 12:52 PM, Daniel Colascione <dancol@dancol.org> wrote:
>> I've written up a proposal for improving the application signal APIs,
>> written below. Might there be any interest in prototyping this work
>> in glibc?
> 
> I want to say first of all that I think you have identified a real
> problem and I appreciate your having taken the time to write up a
> proposed solution.  

Thanks for taking a look.

> However, along with most of the other posters in
> this thread, I don't like the proposed solution -- and not just
> because I don't like signals (although, indeed, I do not like signals)
> but because I think the basic mechanism you suggest, chained handlers,
> is inherently unreliable and will cause more problems than it solves.

Why? You're ignoring the present reality that people _already_ use 
chained signal handlers. They're not going to stop. When libc 
maintainers reject widespread use cases as illegitimate, all they're 
doing is forestalling any sort of improvement.

A C runtime needs to consider realistic proposals to address real 
problems of real software --- not hold out instead for some idealized 
alternative family of APIs that will never materialize.

> We have had nothing but bad luck with mechanisms that rely on several
> user space components, not all maintained by the same people,
> cooperating in access to a shared resource. 

Resource arbitration is hard. It's even harder with the hacks (like 
ART's libsigchain) that people are forced to use today because libc 
authors don't consider signals legitimate somehow. Here, I propose an 
API that ensures that the right thing happens as long as everyone 
follows the rules, and that's far better than the lawless waste that 
exists today.

> Adding signal_register()
> to a universe that already has signal() also introduces a nasty
> compatibility problem: suppose library A uses the new API to register
> a handler for SIGINT (for example), but library B, or the application,
> calls signal(SIGINT, SIG_IGN), or sighold(SIGINT): what do you do?

My proposal specifically addresses this subject. Signal mask behavior is 
unchanged. Legacy signal hander installation behavior is unchanged. 
signal and sigaction affect the legacy signal handler slot, even when 
called with SIG_IGN.

> I also think you haven't gone deep enough into the root cause of the
> problem you're trying to solve.  You set out to make it possible to
> have more than one signal handler per process for each signal, but
> _why_ is that an undesirable limitation?  In most cases, it's because
> _signals are too coarse_.  When you get a SIGCHLD or a SIGIO or a
> SIGSEGV, you don't know which of many possible child processes / file
> descriptors / memory addresses is relevant. 

This claim is technically incorrect. The siginfo structure passed to the 
sigaction handler (and, in my proposal, to registered handlers) provides 
the necessary specificity. There are issues with merging asynchronous 
non-queued signals, but no such issues for synchronous signals like 
SIGSEGV, which cannot be queued.

> If we had a mechanism for
> dispatching _specific_ events in these categories directly to the code
> that cared about them, then we wouldn't need to have SIGwhatever
> handlers in the first place, and we also wouldn't need to worry about
> buggy or malevolent handlers eating events that were not for them.
> _That_ should be your goal.

You still need some way to register handlers of interest for specific 
events. You still need some kind of catch-all mechanism in case no 
specific handler is applicable. The overall shape of the API starts to 
resemble the one I proposed.

> With that in mind, let's run down the list of signals with their uses:
> 
> CHLD, PIPE, POLL/IO, URG, RTMIN through RTMAX -- These all represent
> I/O events.  In most cases it is already possible to receive a
> notification tied to the specific file descriptor that's relevant,
> instead.  The biggest gap I know about is that child processes are not
> represented by file descriptors, and this would be solved by adopting
> pdfork() (with some improvements).  

I would prefer process handle file descriptors. Linux upstream has 
specifically rejected process handle file descriptors on several 
occasions. I see no realistic path to solving this problem at that level.

> HUP, INT, QUIT, TERM, TSTP, TTIN, TTOU, WINCH, USR1, USR2, XCPU, PWR,
> ALRM, VTALRM, PROF -- Often is right to conceptualize these as I/O
> events as well, and many of them can already be turned into normal I/O
> (e.g. by putting the tty in raw mode, or by using timer_create instead
> of alarm), and for those that can't it should be made possible.  But
> another valid way to look at them is that they represent _broadcast_
> notifications that are already as fine-grained as they can be. 

I would consider SIGCHLD such a broadcast as well. "One of your child 
processes has died" is a perfectly bit of news to provide to the process 
as a whole.

ALRM, VTALRM, PROF --- as a completely separate matter, additional 
arbitration for coordinating timer deadlines would be useful.

> So,
> for these, I could be persuaded to support a multi-handler approach --
> but one in which all of the registered handlers are always called, no
> matter what.

Thanks for being receptive in this area. I understand your motivation 
for ensuring all such handlers are called for these broadcast signals. I 
think API uniformity matters more than ensuring that all handlers are 
called, especially since I'm certain that we need multi-handler support 
for synchronous signals as well as asynchronous ones, and synchronous 
signals need to be cancelable.

> I would need to hear a compelling answer to the
> coordination problem I mentioned above, though (what do you do if
> there are registered handlers and then someone else uses the legacy
> API to ignore the signal?)

Then no legacy handler is called, but the registered handler is. Any 
execved child inherits the SIG_IGN entry in the legacy slot, just like 
today, and none of the registered handlers.

> ILL, ABRT, FPE, SEGV, BUS, SYS, TRAP, IOT, EMT, STKFLT -- Synchronous
> signals arising from processor faults deserve a specialized mechanism
> all their own.  The notion I currently like, at the kernel level, is
> just-in-time instantiation of a ptrace monitor

That approach doesn't solve the arbitration issue and would make 
performance significantly worse than present. Not every instance of one 
of these synchronous signals is a crash. Spawning a process to handle 
them is far too expensive (and unreliable!) for something like a Java 
runtime's null pointer checks.

>, because that avoids
> the problem of recovering from memory corruption from within the
> corrupted address space.

Not every instance of these signals results in corrupted process state. 
In certain contexts, continuing to execute after receiving these signals 
is perfectly safe.

> At the C-library level, there are several
> plausible strategies for deciding whose responsibility a processor
> fault is: special ELF sections that label regions of code with
> handlers (like the except_table in the Linux kernel); dynamically
> registered annotations on memory regions; SEH; etc. 

While I would approve of adding SEH, I don't think it's a realistic 
option at the moment.

An except_table approach might solve part of the problem, but you'd need 
to provide a dynamic registration facility for the sake of JIT systems. 
I also don't think that keying handler _purely_ on program counter value 
is sufficient --- one might want to handle faults to a particular memory 
region or instruction type independent of precise code identity --- and 
for these use cases, a PC-keyed lookup table is inadequate. Besides, you 
still need a registered to be able to defer to the global process signal 
handler in case a SIGSEGV it receives really does represent a crash.

Also, think of how a table lookup would work at a mechanical level. The 
kernel would still push a SIGSEGV frame onto some stack and transfer 
control flow to the handler. Whether the handler is a set of chained 
user handlers as I propose or a libc-internal table lookup, you have the 
issues with asynchronous safety and memory corruption. My approach is 
just as safe and provides much greater flexibility.

You could teach the kernel to do the table lookup, but that's a much 
bigger task.

userfaultfd isn't adequate because it doesn't work on unmapped memory 
ranges, because it doesn't provide the values of registers in the 
faulting thread, and because there's no hope of making it a portable 
interface (because it's too powerful for limited systems).

> But notice that
> all of those can be built on top of "instantiate a ptrace monitor
> instead of delivering a fatal signal."  Someone would need to do
> something about how hard it is to write ptrace monitors, but that is
> technically a separate issue.
ptrace is far too slow. Besides, automatic ptrace-monitor creation 
suffers from the problem of unreliability in case we can't spawn a 
process (which can happen for any number of reasons) and conflicts with 
other processes ptracing the parent.

SIGSEGV, by contrast, with sigaltstack, is reliable and does not 
interfere with debugging (except to the extent that the debugger needs 
to be configured to ignore SIGSEGV).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-11 18:56   ` Daniel Colascione
@ 2018-03-12 15:17     ` Zack Weinberg
  2018-03-12 19:47       ` Daniel Colascione
  0 siblings, 1 reply; 28+ messages in thread
From: Zack Weinberg @ 2018-03-12 15:17 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: GNU C Library

On Sun, Mar 11, 2018 at 2:56 PM, Daniel Colascione <dancol@dancol.org> wrote:
> On 03/11/2018 11:07 AM, Zack Weinberg wrote:
>
>> However, along with most of the other posters in
>> this thread, I don't like the proposed solution -- and not just
>> because I don't like signals (although, indeed, I do not like signals)
>> but because I think the basic mechanism you suggest, chained handlers,
>> is inherently unreliable and will cause more problems than it solves.
>
> Why? You're ignoring the present reality that people _already_ use chained
> signal handlers. They're not going to stop. When libc maintainers reject
> widespread use cases as illegitimate, all they're doing is forestalling any
> sort of improvement.

The C library has to be extremely conservative about adding new APIs,
because we are, to first order, stuck with anything we add _forever_.
In particular, we will _not_ accept "people are doing X now" as a
valid argument for codifying X as part of the C library, especially
not when we can come up with an alternative with fewer problems.  Yes,
the alternative might mean that the people doing X now have to do
something else instead.  But switching from libsigchain (for instance)
to libsigchain-codified-in-the-C-library is _also_ a code change for
the people doing X now.  It might be a _smaller_ code change than what
they would have to do to adopt the alternative, but we don't care
about that.  We care, instead, about whether the alternative makes it
easier in the long run to write reliable code.

> A C runtime needs to consider realistic proposals to address real
> problems of real software --- not hold out instead for some
> idealized alternative family of APIs that will never materialize.

This is unfair.  Some of the alternatives we have suggested already
exist, and others are proposals at least as concrete as yours is.

> I propose an API that ensures
> that the right thing happens as long as everyone follows the rules, and
> that's far better than the lawless waste that exists today.

Speaking only for myself here, "the right thing happens as long as
everyone follows the rules" is _not good enough_ for an API codified
as part of the C library.  It needs to be "the right thing happens for
everyone who follows the rules, _even if_ other code in the same
process is breaking the rules in ways that we reasonably anticipate
will happen."

For instance: Chained handlers for SIGCHLD are not good enough,
because we reasonably anticipate that some handlers will -- not out of
malice, just out of lack of foresight -- swallow notifications that
were properly intended for other handlers.  pdfork, on the other hand,
_is_ good enough, because the holder of a process handle is the only
code to receive a notification for that process, regardless of what
other code waiting for unrelated processes might be doing.

>> Adding signal_register()
>> to a universe that already has signal() also introduces a nasty
>> compatibility problem: suppose library A uses the new API to register
>> a handler for SIGINT (for example), but library B, or the application,
>> calls signal(SIGINT, SIG_IGN), or sighold(SIGINT): what do you do?
>
> My proposal specifically addresses this subject. Signal mask behavior is
> unchanged. Legacy signal hander installation behavior is unchanged. signal
> and sigaction affect the legacy signal handler slot, even when called with
> SIG_IGN.

My apologies; I missed the paragraph of your proposal that discusses
this.

I will drop this objection, since I'm not interested in hammering out
the details of an API that I think is a bad idea regardless of its
details.  There could be problems with the legacy interaction you
describe, but that could be said of _any_ legacy interaction.

>> I also think you haven't gone deep enough into the root cause of the
>> problem you're trying to solve.  You set out to make it possible to
>> have more than one signal handler per process for each signal, but
>> _why_ is that an undesirable limitation?  In most cases, it's because
>> _signals are too coarse_.  When you get a SIGCHLD or a SIGIO or a
>> SIGSEGV, you don't know which of many possible child processes / file
>> descriptors / memory addresses is relevant.
>
> This claim is technically incorrect. The siginfo structure passed to
> the sigaction handler (and, in my proposal, to registered handlers)
> provides the necessary specificity.

You misunderstand me.  The problem is not that the handler(s) don't
have enough information to figure out whether the specific event is
relevant to them; the problem is that the specific event is not
delivered directly to the specific handler that cares about it, and
nobody else.

This is why I'm sort-of OK with chained handlers for events that
really are broadcast in nature, such as SIGPWR and SIGTERM.  However,
having thought about it some more, I don't want it to be expressed in
the API as chaining, because chaining implies an order, and that's a
problem in itself.  I want it to be expressed as _independent_
handlers, and by "handlers" I mean "file descriptors" to the maximum
extent possible, e.g. open("/dev/power_failure_notify", O_RDONLY)
gives you a file descriptor that will become readable at the same time
SIGPWR is fired.

> You still need some kind of catch-all mechanism in case no specific
> handler is applicable.

NO WE DON'T.

Catch-alls are bad, OK.  They suffer intrinsically from the same
problem you are trying to solve -- "what if two pieces of code want
to be the catch-all?"

I don't even like your SA_LOW_PRIORITY, because, again, what if two
pieces of code want to be the last to receive the notification?  You
can't honor both requests, so you mustn't even offer the possibility
in the first place.

Instead, what I ideally want is for us to decompose all coarse events
until there is one and only one handler for each specific event, and
then figure out some way to map all of the specific events into file
descriptor notifications that can be fielded via select() or epoll().
If an event is legitimately a broadcast event, like SIGPWR, then we
make it possible for there to be multiple _independent_ -- not
chained; no ordering -- listeners.

> I would prefer process handle file descriptors. Linux upstream has
> specifically rejected process handle file descriptors on several
> occasions.

This is news to me; could you please dig up pointers to specific
objections by people with veto authority?  I thought it had just been
neglected.

...
> ALRM, VTALRM, PROF --- as a completely separate matter, additional
> arbitration for coordinating timer deadlines would be useful.

Yeah.  I haven't had to do anything complicated with timers in C
myself, so I'm not sure what would be ideal as a C API, but
timer_create seems more like the Right Thing than setitimer does.

There is an additional headache in that SIGALRM or SIGVTALRM + a
non-SA_RESTART handler are still sometimes the only way to impose a
timeout on a blocking system call.  Abstractly, all such system calls
need to grow extended versions that take timeouts, but that's a large
and mostly independent project, and there's still an issue with
blocking system calls happening inside a library you don't control.

> I understand your motivation for ensuring all such handlers are
> called for these broadcast signals. I think API uniformity matters
> more than ensuring that all handlers are called, especially since
> I'm certain that we need multi-handler support for synchronous
> signals as well as asynchronous ones, and synchronous signals need
> to be cancelable.

To me it's exactly the other way around: if we can't ensure that all
handlers are called, then the design problem has not yet been solved;
if the API needs to be non-uniform in order to fit the design
requirements, then so be it.

I don't understand what you mean by "synchronous signals need to be
cancelable."

>> ILL, ABRT, FPE, SEGV, BUS, SYS, TRAP, IOT, EMT, STKFLT -- Synchronous
>> signals arising from processor faults deserve a specialized mechanism
>> all their own.  The notion I currently like, at the kernel level, is
>> just-in-time instantiation of a ptrace monitor
>
> That approach doesn't solve the arbitration issue and would make
> performance significantly worse than present. Not every instance of
> one of these synchronous signals is a crash. Spawning a process to
> handle them is far too expensive (and unreliable!) for something
> like a Java runtime's null pointer checks.

As discussed elsethread, I currently agree with Rich Felker that
Java's null pointer checks are better implemented with explicit tests
emitted by the JIT; not by taking a fault and then fixing up
afterward.  Same for persistent object stores and incremental GC; use
compiler-generated write barriers, not page faults.

I _could_ be convinced otherwise, but what it would take is a
head-to-head performance comparison between a JIT that relies on page
faults and a JIT that relies on explicit tests and implements
state-of-the-art elimination of unnecessary tests, all else held
equal, on a real application.

In the absence of that comparison, for synchronous faults I'm really
only interested in making crash recovery more reliable.  That needs to
happen from outside the corrupted address space, and it's OK if it
takes a slow path.

You're right that "instantiate a ptrace monitor just-in-time" still
has an arbitration problem _at the kernel level_.  I imagine the
arbitration - via exception tables or whatever - happening _inside_
the monitor.  The C library might provide a "shell" ptrace monitor
that could be extended with application-specific modules.

Note also that we wouldn't spawn a fresh instance of the monitor for
every fault.  Once it's running, it would stay running and remain
attached to the process.  If the process was already being ptraced by
a full debugger, the monitor would not be involved.  (This gets tricky
when you want to debug the monitor, but not worse than when you want
to debug a debugger.)

> While I would approve of adding SEH, I don't think it's a realistic option
> at the moment.

Agreed that it is too much of a coordination challenge to add SEH; also,
since it relies on dynamic information on the stack, it's not safe in
the face of adversarial memory corruption.

> An except_table approach might solve part of the problem, but you'd need to
> provide a dynamic registration facility for the sake of JIT systems.

Yeah.  But you don't want the JIT-generated code to be able to access
the registrar.  Here, perhaps the right thing is for the JIT to invoke
its sandboxed untrusted-code subprocess already under its own ptrace
monitoring.

> Also, think of how a table lookup would work at a mechanical level. The
> kernel would still push a SIGSEGV frame onto some stack and transfer control
> flow to the handler.

No. The kernel would wake up the monitor process sleeping in ptrace
(or perhaps select() on the process handle) or instantiate one if it
doesn't already exist.

"For code using the new API, we NEVER need to interrupt normal control
flow and push a signal frame" is also on my list of constraints that
must be satisfied for the design to be complete and acceptable.

zw

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Toward Shareable POSIX Signals
  2018-03-12 15:17     ` Zack Weinberg
@ 2018-03-12 19:47       ` Daniel Colascione
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Colascione @ 2018-03-12 19:47 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: GNU C Library

On 03/12/2018 08:17 AM, Zack Weinberg wrote:
> On Sun, Mar 11, 2018 at 2:56 PM, Daniel Colascione <dancol@dancol.org> wrote:
>> On 03/11/2018 11:07 AM, Zack Weinberg wrote:
>>
>>> However, along with most of the other posters in
>>> this thread, I don't like the proposed solution -- and not just
>>> because I don't like signals (although, indeed, I do not like signals)
>>> but because I think the basic mechanism you suggest, chained handlers,
>>> is inherently unreliable and will cause more problems than it solves.
>>
>> Why? You're ignoring the present reality that people _already_ use chained
>> signal handlers. They're not going to stop. When libc maintainers reject
>> widespread use cases as illegitimate, all they're doing is forestalling any
>> sort of improvement.
> 
> The C library has to be extremely conservative about adding new APIs,
> because we are, to first order, stuck with anything we add _forever_.

> In particular, we will _not_ accept "people are doing X now" as a
> valid argument for codifying X as part of the C library,

It's very curious to see an argument that suggests what users actually 
do shouldn't matter.

"People are ruining the lawn", the groundskeeper explained to the 
administrator. "Here's a proposal for some concrete walkways. Should be 
done by next week."

"No", the administrator replied. "We wouldn't want to encourage walking. 
It's dangerous." He paused a moment, then continued wistfully. "Flying 
is much better and safer. How many birds with broken ankles do you see?"

"...", the groundskeeper exclaimed internally.

> especially
> not when we can come up with an alternative with fewer problems.  Yes,
> the alternative might mean that the people doing X now have to do
> something else instead.  But switching from libsigchain (for instance)
> to libsigchain-codified-in-the-C-library is _also_ a code change for
> the people doing X now.  It might be a _smaller_ code change than what
> they would have to do to adopt the alternative, but we don't care
> about that.  We care, instead, about whether the alternative makes it
> easier in the long run to write reliable code.

You're missing the point. There is no universe where people stop using 
traps for null pointer checks. You are not going to convince people who 
care about performance to waste cycles doing in software what hardware 
will provide for free. Nobody will ever use a ptrace monitor to 
implement mandatory parts of a language specification either, especially 
not while sigaction still exists. POSIX already gave the world 
sigaction. That's _never_ going away.

There are two possible future worlds:

1) People continue to use awful sigaction-based hacks to exercise the 
signal mechanism, then screw it up and hurt users.

2) libc provides a mechanism that's at least as good as sigaction.

There is no world where people spawn an entire process (which can be 
killed, SIGSTOPed, ptraced, OOM-killed, places in a background cgroup, 
or damaged in myriad other ways) and talk some complicated protocol with 
it when all they want to do is load a library, run some code, unload the 
library, and move on. That is a perfectly reasonable thing to want to 
do, and it shouldn't require any interaction with the system process 
table, raising RLIMIT_NPROC, or reaping children.

Windows has provided a suitable API similar to the one I propose for 20 
years without catastrophe. All the concerns you've raised about memory 
corruption also exist on that system, yet programs continue to run 
reliably. POSIX should be at least as good as Windows, don't you think?

Yes, yes, I know, existing practice isn't evidence in favor of any 
particular proposal. It is though.

>> A C runtime needs to consider realistic proposals to address real
>> problems of real software --- not hold out instead for some
>> idealized alternative family of APIs that will never materialize.
> 
> This is unfair.  Some of the alternatives we have suggested already
> exist, and others are proposals at least as concrete as yours is.

Not for arbitrating access to shared signals, unless I missed something. 
The closest thing is userfaultfd, and I explained why it's not really 
suitable. It could be *made* suitable, but it'd be a massive effort, and 
I doubt other systems would adopt it. I want an API that I could 
conceivably working on Darwin and Cygwin and QNX too.

>> I propose an API that ensures
>> that the right thing happens as long as everyone follows the rules, and
>> that's far better than the lawless waste that exists today.
> 
> Speaking only for myself here, "the right thing happens as long as
> everyone follows the rules" is _not good enough_ for an API codified
> as part of the C library. 

Practically the entire existing interface surface of libc doesn't meet 
this bar. printf can corrupt memory with %n. People can get confused and 
misuse malloc and free --- and in fact, do all the time. longjmp 
introduces all sorts of odd side effects.

Practically anything you do in this language at this level more 
complicated than sysconf(3) is going to be unsafe if you don't follow 
the rules, and that's a good thing! A sharp razor cuts well.

In the right hands, dangerous tools are incredibly useful. I'm not 
proposing an API for people who don't know what they're doing. Sure, 
there's an argument that C shouldn't exist. But I don't expect to 
encounter this argument when trying to improve the C standard library.

> It needs to be "the right thing happens for
> everyone who follows the rules, _even if_ other code in the same
> process is breaking the rules in ways that we reasonably anticipate
> will happen."

At this level, we're talking about individual bytes. There is no 
absolute safety possible within a single process, nor should there be. 
Safety is the job of higher level components to provide, and it's the 
job of lower-level, slower-moving components to provide a suitable 
foundation for these components. In the area of signal arbitration, this 
foundation is lacking.

> For instance: Chained handlers for SIGCHLD are not good enough,
> because we reasonably anticipate that some handlers will -- not out of
> malice, just out of lack of foresight -- swallow notifications that
> were properly intended for other handlers.

What if someone were to call _exit from one of these handlers? What if 
someone accidentally infloops? Lots of things can do wrong.

> pdfork, on the other hand,
> _is_ good enough, because the holder of a process handle is the only
> code to receive a notification for that process, regardless of what
> other code waiting for unrelated processes might be doing.

What if some other component close(2)s the pdfork file descriptor? After 
all, it's not unheard of for people to erroneously retry close(2) and 
cause collateral damage. I don't see how file descriptors get a pass 
from "safe even if people don't follow the rules" criterion.

>>> I also think you haven't gone deep enough into the root cause of the
>>> problem you're trying to solve.  You set out to make it possible to
>>> have more than one signal handler per process for each signal, but
>>> _why_ is that an undesirable limitation?  In most cases, it's because
>>> _signals are too coarse_.  When you get a SIGCHLD or a SIGIO or a
>>> SIGSEGV, you don't know which of many possible child processes / file
>>> descriptors / memory addresses is relevant.
>>
>> This claim is technically incorrect. The siginfo structure passed to
>> the sigaction handler (and, in my proposal, to registered handlers)
>> provides the necessary specificity.
> 
> You misunderstand me.  The problem is not that the handler(s) don't
> have enough information to figure out whether the specific event is
> relevant to them; the problem is that the specific event is not
> delivered directly to the specific handler that cares about it, and
> nobody else.

The most flexible and succinct way to determine which handler should 
exclusively claim a particular fault is to ask each handler in turn. 
Requiring table registration would be both brittle and inefficient, 
since the tables would likely duplicate code-lookup structures that 
language environments already provide.

> This is why I'm sort-of OK with chained handlers for events that
> really are broadcast in nature, such as SIGPWR and SIGTERM.  However,
> having thought about it some more, I don't want it to be expressed in
> the API as chaining, because chaining implies an order, and that's a
> problem in itself.  I want it to be expressed as _independent_
> handlers, and by "handlers" I mean "file descriptors" to the maximum
> extent possible, e.g. open("/dev/power_failure_notify", O_RDONLY)
> gives you a file descriptor that will become readable at the same time
> SIGPWR is fired.
> 
>> You still need some kind of catch-all mechanism in case no specific
>> handler is applicable.
> 
> NO WE DON'T.

YES WE DO.

In-process crash reporting is another one of those things that might be 
ugly, but that never going away. Besides, what if one of your specific 
handlers *can't* handle a particular fault? Should it just infloop until 
the process dies? Raise a different signal?

There must be a way for a handler to throw its hands up in the air and 
say, "I don't know what to do with this signal. It's not mine. Do 
whatever would happen if I weren't here at all.", because this situation 
_will_ arise, and this approach is the least disruptive option.

If my runtime SIGSEGVs trying to dereference NULL, I know what to do. If 
I catch it trying to dereference 0xDEADBEEF, I probably want to treat it 
like a crash. I want to see the process die with SIGSEGV, not see it 
_exit(1) because the except_table pointer for that PC didn't know what 
else to do.

> Catch-alls are bad, OK.  They suffer intrinsically from the same
> problem you are trying to solve -- "what if two pieces of code want
> to be the catch-all?"

They chain to each other. One eventually invokes the SIG_DFL handler and 
the process dies.

> I don't even like your SA_LOW_PRIORITY, because, again, what if two
> pieces of code want to be the last to receive the notification?  You
> can't honor both requests, so you mustn't even offer the possibility
> in the first place.

I can accept the argument that for both SA_LOW_PRIORITY and for 
asynchronous signals, all handlers get called no matter what. We still 
need a way for synchronous signals to exclusively claim the right to 
handle a particular deliver of a particular signal to a particular 
thread, because that's necessary for correctness. I can budge on other 
signals.

> Instead, what I ideally want is for us to decompose all coarse events
> until there is one and only one handler for each specific event, and
> then figure out some way to map all of the specific events into file
> descriptor notifications that can be fielded via select() or epoll().
> If an event is legitimately a broadcast event, like SIGPWR, then we
> make it possible for there to be multiple _independent_ -- not
> chained; no ordering -- listeners.
> 
>> I would prefer process handle file descriptors. Linux upstream has
>> specifically rejected process handle file descriptors on several
>> occasions.
> 
> This is news to me; could you please dig up pointers to specific
> objections by people with veto authority?  I thought it had just been
> neglected.

I can't find the message now. I hope I'm not just imagining it. But I 
distinctly recall reading Linus (I think it was Linus) arguing that a 
file descriptor that would keep a zombie process alive was a terrible, 
bad, no-good thing because it would allow anyone to consume all the 
process table entries on the system.

To be clear, my first preferred option is a facility that represents 
processes at file descriptors --- preferably _arbitrary_ processes, not 
just direct children as with pdfork. The argument I'm recalling (and 
perhaps mis-recollecting) is, IMHO, bogus.

> ...
>> ALRM, VTALRM, PROF --- as a completely separate matter, additional
>> arbitration for coordinating timer deadlines would be useful.
> 
> Yeah.  I haven't had to do anything complicated with timers in C
> myself, so I'm not sure what would be ideal as a C API, but
> timer_create seems more like the Right Thing than setitimer does.
> 
> There is an additional headache in that SIGALRM or SIGVTALRM + a
> non-SA_RESTART handler are still sometimes the only way to impose a
> timeout on a blocking system call.  Abstractly, all such system calls
> need to grow extended versions that take timeouts

Do they? It's more elegant for timeout and logic to be orthogonal than 
for each system call to gain a parameter for any bit behavior you might 
want to associate with that call. There's parsimony in having one way to 
arrange a timeout for any system call. Something like SO_RCVTIMEO seems 
better, IMHO.

> but that's a large
> and mostly independent project, and there's still an issue with
> blocking system calls happening inside a library you don't control.

SIGALRM isn't a great way to impose a constraint on a library you don't 
control anyway. All you can do is make a call fail with EINTR. The 
library you don't control will probably retry on EINTR. You could 
longjmp out of your SIGALRM handler, but that'll probably break some 
invariants in that library you don't control.

>> I understand your motivation for ensuring all such handlers are
>> called for these broadcast signals. I think API uniformity matters
>> more than ensuring that all handlers are called, especially since
>> I'm certain that we need multi-handler support for synchronous
>> signals as well as asynchronous ones, and synchronous signals need
>> to be cancelable.
> 
> To me it's exactly the other way around: if we can't ensure that all
> handlers are called, then the design problem has not yet been solved;
> if the API needs to be non-uniform in order to fit the design
> requirements, then so be it.

I would accept an API that imposed a run-all requirement on most signals 
and that limited exclusive signal claims to synchronously-delivered 
trapping signals.

I agree with part of your earlier message that the signal mechanism 
basically conflates three very distinct APIs: 1) process management 
system calls, 2) process-wide notifications, and 3) poor man's SEH. It's 
#3 that most concerns me.

> I don't understand what you mean by "synchronous signals need to be
> cancelable."

A handler for a synchronous signal needs to be able to exclusively claim 
a particular signal and prevent both other handlers and any catch-all 
handlers (they _will_ exist) from running and misinterpreting that signal.

>>> ILL, ABRT, FPE, SEGV, BUS, SYS, TRAP, IOT, EMT, STKFLT -- Synchronous
>>> signals arising from processor faults deserve a specialized mechanism
>>> all their own.  The notion I currently like, at the kernel level, is
>>> just-in-time instantiation of a ptrace monitor
>>
>> That approach doesn't solve the arbitration issue and would make
>> performance significantly worse than present. Not every instance of
>> one of these synchronous signals is a crash. Spawning a process to
>> handle them is far too expensive (and unreliable!) for something
>> like a Java runtime's null pointer checks.
> 
> As discussed elsethread, I currently agree with Rich Felker that
> Java's null pointer checks are better implemented with explicit tests
> emitted by the JIT; not by taking a fault and then fixing up
> afterward.  Same for persistent object stores and incremental GC; use
> compiler-generated write barriers, not page faults.
All major runtime authors disagree with you. Just look at the code. I 
spoke to some colleagues last week, and they indicate that it's a win on 
code size too.

As I mentioned above, you don't have to accept the technical merit of 
trapping. All that's necessary to see the necessity of my proposal is 
understanding that people will use sigaction if no alternative presents 
itself and that sigaction is not library-safe.

> I _could_ be convinced otherwise, but what it would take is a
> head-to-head performance comparison between a JIT that relies on page
> faults and a JIT that relies on explicit tests and implements
> state-of-the-art elimination of unnecessary tests, all else held
> equal, on a real application.

That's an impossible bar. I could insert artificial NULL checks in ART, 
but there's no way I could convince you I'd done all I could do to 
eliminate extraneous checks. No matter how many checks I eliminated, as 
long as I demonstrated an adverse impact on speed and code size, you 
could claim that there was some algorithmic fruit left unpicked. I'm not 
doing that. The universal use of traps by high-performance managed code 
runtime authors should be evidence enough.

> In the absence of that comparison, for synchronous faults I'm really
> only interested in making crash recovery more reliable. 

That is a deeply disappointing stance.

> That needs to
> happen from outside the corrupted address space, and it's OK if it
> takes a slow path.
> 
> You're right that "instantiate a ptrace monitor just-in-time" still
> has an arbitration problem _at the kernel level_. 

You haven't also addressed the reliability issue. Reliability is 
essential when we're talking about code implementing a mandatory aspect 
of a language specification.

> I imagine the
> arbitration - via exception tables or whatever - happening _inside_
> the monitor.  The C library might provide a "shell" ptrace monitor
> that could be extended with application-specific modules.

Think of all the problems we have with NSS and PAM. Now add page faults. 
Is that a good world?

> Note also that we wouldn't spawn a fresh instance of the monitor for
> every fault.  Once it's running, it would stay running and remain
> attached to the process.  If the process was already being ptraced by
> a full debugger, the monitor would not be involved.

So the debugger, for correctness, would _also_ have to implement the 
trap-arbitration protocol? Would strace? rr? That's an unreasonable 
demand, especially when coupled with zero engineering benefit that such 
a complicated mechanism would bring.

> (This gets tricky
> when you want to debug the monitor, but not worse than when you want
> to debug a debugger.)

Debugging a debugger is no harder than debugging other programs, in my 
experience.

>> While I would approve of adding SEH, I don't think it's a realistic option
>> at the moment.
> 
> Agreed that it is too much of a coordination challenge to add SEH; also,
> since it relies on dynamic information on the stack, it's not safe in
> the face of adversarial memory corruption.

Nothing is safe in the face of memory corruption. Demanding perfect 
safety in an unsafe world is tantamount to blocking all progress.

>> An except_table approach might solve part of the problem, but you'd need to
>> provide a dynamic registration facility for the sake of JIT systems.
> 
> Yeah.  But you don't want the JIT-generated code to be able to access
> the registrar.  Here, perhaps the right thing is for the JIT to invoke
> its sandboxed untrusted-code subprocess already under its own ptrace
> monitoring.

So now we're talking about _multiple_ fragile external processes. That's 
unacceptable.

>> Also, think of how a table lookup would work at a mechanical level. The
>> kernel would still push a SIGSEGV frame onto some stack and transfer control
>> flow to the handler.
> 
> No. The kernel would wake up the monitor process sleeping in ptrace
> (or perhaps select() on the process handle) or instantiate one if it
> doesn't already exist.

It can't guarantee successful instantiation.

> "For code using the new API, we NEVER need to interrupt normal control
> flow and push a signal frame" is also on my list of constraints that
> must be satisfied for the design to be complete and acceptable.

That is an utterly unrealistic stance. You _have_ to interrupt control 
flow. That's the whole point. And you have to be able to respond to that 
interruption from _inside_ the process whose thread is being interrupted 
for reasons I've already discussed. At this point, you have 
async-signal-safety concerns whether or not the precise mechanism is 
frame-pushing or message delivery, so what's the point of bothering with 
the charade of a sending message?

The ability to handle synchronous signals in-process is one of the 
constraints that any acceptable and complete system must have. 
Fortunately, we have sigaction, for which this thread has taught me to 
be increasingly thankful.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-03-12 19:47 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-08 17:53 [RFC] Toward Shareable POSIX Signals Daniel Colascione
2018-03-08 20:09 ` Florian Weimer
2018-03-08 20:22   ` dancol
2018-03-08 21:21     ` Ondřej Bílka
2018-03-08 21:50       ` dancol
2018-03-09  8:17         ` Ondřej Bílka
2018-03-09 10:51           ` Daniel Colascione
2018-03-09  9:19     ` Florian Weimer
2018-03-09 10:43       ` Daniel Colascione
2018-03-09 16:41         ` Rich Felker
2018-03-09 16:58           ` Florian Weimer
2018-03-09 17:14             ` Rich Felker
2018-03-09 17:36               ` Paul Eggert
2018-03-09 19:34               ` Daniel Colascione
2018-03-09 19:28           ` Daniel Colascione
2018-03-09 19:30           ` Zack Weinberg
2018-03-09 20:06             ` Daniel Colascione
2018-03-09 20:25             ` Rich Felker
2018-03-09 20:54               ` Daniel Colascione
2018-03-09 21:10                 ` Rich Felker
2018-03-09 21:27                   ` dancol
2018-03-09 21:05               ` Zack Weinberg
2018-03-10  7:56               ` Florian Weimer
2018-03-10  8:41                 ` dancol
2018-03-11 18:07 ` Zack Weinberg
2018-03-11 18:56   ` Daniel Colascione
2018-03-12 15:17     ` Zack Weinberg
2018-03-12 19:47       ` Daniel Colascione

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).