Futex error handling

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* Futex error handling
@ 2014-09-16 15:36 Torvald Riegel
  2014-09-16 16:56 ` Rich Felker
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Torvald Riegel @ 2014-09-16 15:36 UTC (permalink / raw)
  To: GLIBC Devel; +Cc: Darren Hart

We got complains from the kernel side that glibc wouldn't react properly
to futex errors being returned.  Thus, I'm looking at what we'd need to
actually improve.  I'm using this here as a documentation for futex
error codes: https://lkml.org/lkml/2014/5/15/356

Generally, we have three categories of faults (ie, the cause for an
error/failure):
* Bug in glibc ("BL")
* Bug in the client program ("BP")
* Failures that are neither a bug in glibc nor the program ("F")

Also, there are cases where it's not a "real" failure, but just
something that is expected behavior that needs to be handled ("NF").

I'm not aware of a general policy about whether glibc should abort or
assert (ie, abort only with assertion checks enabled) when the fault is
in the BL or BP categories.  I'd say we don't, because there's no way to
handle it anyway, and other things will likely go wrong; but I don't
have a strong opinion.  Thoughts?

For every futex op, here's a list of how I'd categorize the possible
error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
BL)):

FUTEX_WAIT:
* EFAULT is either BL or BP.  Nothing we can do.  Should have failed
earlier when we accessed the futex variable.
* EINVAL (alignment and timeout normalization) is BL/BP.
* EWOULDBLOCK, ETIMEDOUT are NF.

FUTEX_WAKE, FUTEX_WAKE_OP:
* EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
case.  This is due to how futexes work when combined with certain rules
for destruction of the underlying synchronization data structure; see my
description of the mutex destruction issue (but this can happen with
other data structures such as semaphores or cond vars too):
https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
* EINVAL (futex alignment) is BL/BP.
* EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
NF.  The latter is caused by the mutex destruction issue, only that a
pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
location but one which has been reused for a PI futex.  Thus, we must
not abort or assert in this case.

FUTEX_REQUEUE:
* Like FUTEX_WAKE, except that it's not safe to use concurrently with
possible destruction / reuse of the futex memory (because requeueing to
a futex that's unrelated to the new futex located in reused memory is
bad).

FUTEX_REQUEUE_CMP:
* Like FUTEX_REQUEUE.  EAGAIN is NF.

FUTEX_WAKE_OP:
* Haven't looked at this yet.  Only used in condvars, and might not be
necessary for a condvar that's not based on a condvar-internal lock.

FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET:
* Like FUTEX_WAIT / FUTEX_WAKE.  The additional EINVAL is BL.

FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI:
* EFAULT is BL/BP.
* ENOMEM is F.  We need to handle this.
* EINVAL, EPERM, ESRCH are BL/BP.
* EAGAIN and ETIMEDOUT are NF.
* EDEADLOCK is BP (or BL).
* EOWNERDIED is F.

FUTEX_UNLOCK_PI:
(* I guess this can return EFAULT too, which is BL/BP.)
* EINVAL and EPERM are BL/BP.  I don't think there's a mutex destruction
issue with PI locks because the kernel takes care of both resetting the
value of the futex var and waking up threads; it should do so in a way
that won't access reused memory.  I guess we should check that though...

FUTEX_WAIT_REQUEUE_PI:
* EFAULT and EINVAL are BL/BP.
* EWOULDBLOCK and ETIMEDOUT are NF.
* EOWNERDIED is F.

FUTEX_CMP_REQUEUE_PI is like FUTEX_CMP_REQUEUE except:
* ENOMEM is F.
* EPERM and ESRCH are BL/BP.
* EDEADLOCK is BP (or BL).

I think the next steps to improve this should be:
1) Getting consensus on how we want to handle BL and BP in general.
2) Applying the outcome of that to the list above and getting consensus
on the result.
3) For each case of F, find the best way to report it to the caller
(e.g., error code from the pthreads function, abort, ...).
4) Change each use of the futexes accordingly, one at a time.

I've asked Michael Kerrisk for the state of the futex error docs, but
haven't gotten a reply yet.  (Last time I checked, the new input from
the email I referred to above wasn't part of the futex docs yet.)

Thoughts?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-16 15:36 Futex error handling Torvald Riegel
@ 2014-09-16 16:56 ` Rich Felker
  2014-09-16 18:13   ` Torvald Riegel
  2014-09-17 19:41 ` Roland McGrath
  2014-10-20 16:15 ` Futex error handling Torvald Riegel
  2 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2014-09-16 16:56 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GLIBC Devel, Darren Hart

On Tue, Sep 16, 2014 at 05:36:25PM +0200, Torvald Riegel wrote:
> We got complains from the kernel side that glibc wouldn't react properly
> to futex errors being returned.

Are these complaints public anywhere?

> Thus, I'm looking at what we'd need to
> actually improve.  I'm using this here as a documentation for futex
> error codes: https://lkml.org/lkml/2014/5/15/356
> 
> Generally, we have three categories of faults (ie, the cause for an
> error/failure):
> * Bug in glibc ("BL")
> * Bug in the client program ("BP")
> * Failures that are neither a bug in glibc nor the program ("F")
> 
> Also, there are cases where it's not a "real" failure, but just
> something that is expected behavior that needs to be handled ("NF").
> 
> I'm not aware of a general policy about whether glibc should abort or
> assert (ie, abort only with assertion checks enabled) when the fault is
> in the BL or BP categories.  I'd say we don't, because there's no way to
> handle it anyway, and other things will likely go wrong; but I don't
> have a strong opinion.  Thoughts?
> 
> For every futex op, here's a list of how I'd categorize the possible
> error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
> BL)):
> 
> FUTEX_WAIT:
> * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
> earlier when we accessed the futex variable.
> * EINVAL (alignment and timeout normalization) is BL/BP.
> * EWOULDBLOCK, ETIMEDOUT are NF.

I would distingish multiple versions of "BP" for EINVAL here. You seem
to have mixed "program has invoked undefined behavior" (e.g. invalid
synchronization object) with "program has provided an erroneous
argument which the implementation is required to report" (e.g. invalid
timespec contents). If you don't want to add a new class, the latter
technically could just be considered NF; it's fully equivalent to NF
in terms of how it has to be handled.

> FUTEX_WAKE, FUTEX_WAKE_OP:
> * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> case.  This is due to how futexes work when combined with certain rules
> for destruction of the underlying synchronization data structure; see my
> description of the mutex destruction issue (but this can happen with
> other data structures such as semaphores or cond vars too):
> https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html

Note that it's possible to use FUTEX_WAKE_OP in such a way that EFAULT
is reserved for BL/BP (and not NF). I don't see any point in
having/using FUTEX_WAKE_OP except for this purpose, but maybe I'm
missing something.

> * EINVAL (futex alignment) is BL/BP.
> * EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
> NF.  The latter is caused by the mutex destruction issue, only that a
> pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
> location but one which has been reused for a PI futex.  Thus, we must
> not abort or assert in this case.

Nice catch. I wasn't aware of this one (the PI causing EINVAL).

> FUTEX_REQUEUE:
> * Like FUTEX_WAKE, except that it's not safe to use concurrently with
> possible destruction / reuse of the futex memory (because requeueing to
> a futex that's unrelated to the new futex located in reused memory is
> bad).

I think all cases where FUTEX_REQUEUE remotely makes sense as an
operation inherently preclude destruction of the target (for example,
the mutex that cond var waiters are going to try to relock can't be
deleted until they've all relocked it and returned).

> I think the next steps to improve this should be:
> 1) Getting consensus on how we want to handle BL and BP in general.
> 2) Applying the outcome of that to the list above and getting consensus
> on the result.
> 3) For each case of F, find the best way to report it to the caller
> (e.g., error code from the pthreads function, abort, ...).
> 4) Change each use of the futexes accordingly, one at a time.

Sounds reasonable to me.

> I've asked Michael Kerrisk for the state of the futex error docs, but
> haven't gotten a reply yet.  (Last time I checked, the new input from
> the email I referred to above wasn't part of the futex docs yet.)

Yes, documentation on futex behavior would be very nice to have...

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-16 16:56 ` Rich Felker
@ 2014-09-16 18:13   ` Torvald Riegel
  2014-09-16 18:55     ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-09-16 18:13 UTC (permalink / raw)
  To: Rich Felker; +Cc: GLIBC Devel, Darren Hart

On Tue, 2014-09-16 at 12:56 -0400, Rich Felker wrote:
> On Tue, Sep 16, 2014 at 05:36:25PM +0200, Torvald Riegel wrote:
> > We got complains from the kernel side that glibc wouldn't react properly
> > to futex errors being returned.
> 
> Are these complaints public anywhere?

I don't think so.  Anyway, this point isn't really relevant to the
discussion -- just my personal motivation, kind of :)

> > Thus, I'm looking at what we'd need to
> > actually improve.  I'm using this here as a documentation for futex
> > error codes: https://lkml.org/lkml/2014/5/15/356
> > 
> > Generally, we have three categories of faults (ie, the cause for an
> > error/failure):
> > * Bug in glibc ("BL")
> > * Bug in the client program ("BP")
> > * Failures that are neither a bug in glibc nor the program ("F")
> > 
> > Also, there are cases where it's not a "real" failure, but just
> > something that is expected behavior that needs to be handled ("NF").
> > 
> > I'm not aware of a general policy about whether glibc should abort or
> > assert (ie, abort only with assertion checks enabled) when the fault is
> > in the BL or BP categories.  I'd say we don't, because there's no way to
> > handle it anyway, and other things will likely go wrong; but I don't
> > have a strong opinion.  Thoughts?
> > 
> > For every futex op, here's a list of how I'd categorize the possible
> > error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
> > BL)):
> > 
> > FUTEX_WAIT:
> > * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
> > earlier when we accessed the futex variable.
> > * EINVAL (alignment and timeout normalization) is BL/BP.
> > * EWOULDBLOCK, ETIMEDOUT are NF.
> 
> I would distingish multiple versions of "BP" for EINVAL here. You seem
> to have mixed "program has invoked undefined behavior" (e.g. invalid
> synchronization object) with "program has provided an erroneous
> argument which the implementation is required to report" (e.g. invalid
> timespec contents). If you don't want to add a new class, the latter
> technically could just be considered NF; it's fully equivalent to NF
> in terms of how it has to be handled.

Good point on timespec.

> > FUTEX_WAKE, FUTEX_WAKE_OP:
> > * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> > case.  This is due to how futexes work when combined with certain rules
> > for destruction of the underlying synchronization data structure; see my
> > description of the mutex destruction issue (but this can happen with
> > other data structures such as semaphores or cond vars too):
> > https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> 
> Note that it's possible to use FUTEX_WAKE_OP in such a way that EFAULT
> is reserved for BL/BP (and not NF). I don't see any point in
> having/using FUTEX_WAKE_OP except for this purpose, but maybe I'm
> missing something.

I agree that I was a bit sloppy in the categorization.  You're right
that depending on how it's used, EFAULT can be just BL/BP.  This applies
to both FUTEX_WAKE and FUTEX_WAKE_OP, I think; the latter has just a
finite number of bits, so you can't avoid an ABA issue entirely.  You
can use the latter like FUTEX_UNLOCK_PI though, to try to avoid the
mutex destruction issue.

So, to summarize, my categories kind of assume a "typical" use of those
operations in glibc.  What I was trying to point out is that we can't
abort in the generic futex syscall code when we see EFAULT, because
that's wrong for typical uses of FUTEX_WAKE.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-16 18:13   ` Torvald Riegel
@ 2014-09-16 18:55     ` Rich Felker
  2014-09-18 12:27       ` Torvald Riegel
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2014-09-16 18:55 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GLIBC Devel, Darren Hart

On Tue, Sep 16, 2014 at 08:12:38PM +0200, Torvald Riegel wrote:
> > > FUTEX_WAKE, FUTEX_WAKE_OP:
> > > * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> > > case.  This is due to how futexes work when combined with certain rules
> > > for destruction of the underlying synchronization data structure; see my
> > > description of the mutex destruction issue (but this can happen with
> > > other data structures such as semaphores or cond vars too):
> > > https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> > 
> > Note that it's possible to use FUTEX_WAKE_OP in such a way that EFAULT
> > is reserved for BL/BP (and not NF). I don't see any point in
> > having/using FUTEX_WAKE_OP except for this purpose, but maybe I'm
> > missing something.
> 
> I agree that I was a bit sloppy in the categorization.  You're right
> that depending on how it's used, EFAULT can be just BL/BP.  This applies
> to both FUTEX_WAKE and FUTEX_WAKE_OP, I think; the latter has just a
> finite number of bits, so you can't avoid an ABA issue entirely.  You

I'm not sure what ABA issue you have in mind. The EFAULT case with
FUTEX_WAKE, and which I claim FUTEX_WAKE_OP avoids, is when the atomic
operation on the futex int that's associated with the wake allows
another thread to synchronize and determine that it may legally
destroy the object before the actual wake is sent. FUTEX_WAKE_OP can
fully avoid this by performing the atomic operation after looking up
and locking the futex hash bucket, so that there's no further access
after the atomic and thus no opportunity for fault.

> So, to summarize, my categories kind of assume a "typical" use of those
> operations in glibc.  What I was trying to point out is that we can't
> abort in the generic futex syscall code when we see EFAULT, because
> that's wrong for typical uses of FUTEX_WAKE.

Yes, I agree with this. I'm not clear yet on whether it would be an
advantage to use FUTEX_WAKE_OP to avoid this, but I think it's
plausible that it might be.

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-16 18:55     ` Rich Felker
@ 2014-09-18 12:27       ` Torvald Riegel
  2014-09-18 17:40         ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-09-18 12:27 UTC (permalink / raw)
  To: Rich Felker; +Cc: GLIBC Devel, Darren Hart

On Tue, 2014-09-16 at 14:54 -0400, Rich Felker wrote:
> On Tue, Sep 16, 2014 at 08:12:38PM +0200, Torvald Riegel wrote:
> > > > FUTEX_WAKE, FUTEX_WAKE_OP:
> > > > * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> > > > case.  This is due to how futexes work when combined with certain rules
> > > > for destruction of the underlying synchronization data structure; see my
> > > > description of the mutex destruction issue (but this can happen with
> > > > other data structures such as semaphores or cond vars too):
> > > > https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> > > 
> > > Note that it's possible to use FUTEX_WAKE_OP in such a way that EFAULT
> > > is reserved for BL/BP (and not NF). I don't see any point in
> > > having/using FUTEX_WAKE_OP except for this purpose, but maybe I'm
> > > missing something.
> > 
> > I agree that I was a bit sloppy in the categorization.  You're right
> > that depending on how it's used, EFAULT can be just BL/BP.  This applies
> > to both FUTEX_WAKE and FUTEX_WAKE_OP, I think; the latter has just a
> > finite number of bits, so you can't avoid an ABA issue entirely.  You
> 
> I'm not sure what ABA issue you have in mind.

Sorry, I must have been thinking about the _BITSET variants when I wrote
this.  Not sure why... :)

> The EFAULT case with
> FUTEX_WAKE, and which I claim FUTEX_WAKE_OP avoids, is when the atomic
> operation on the futex int that's associated with the wake allows
> another thread to synchronize and determine that it may legally
> destroy the object before the actual wake is sent. FUTEX_WAKE_OP can
> fully avoid this by performing the atomic operation after looking up
> and locking the futex hash bucket, so that there's no further access
> after the atomic and thus no opportunity for fault.

Agreed; that like what UNLOCK_PI does.  However, and that's something
I've only thought about recently, it would be good to know which
guarantees the kernel gives in this case; in particular, what happens
(and which error code results) if there is destruction and potential
unmapping etc. of the futex variable concurrently with WAKE_OP or
UNLOCK_PI being in flight.

> > So, to summarize, my categories kind of assume a "typical" use of those
> > operations in glibc.  What I was trying to point out is that we can't
> > abort in the generic futex syscall code when we see EFAULT, because
> > that's wrong for typical uses of FUTEX_WAKE.
> 
> Yes, I agree with this. I'm not clear yet on whether it would be an
> advantage to use FUTEX_WAKE_OP to avoid this, but I think it's
> plausible that it might be.

It should work to avoid the issue, but then wake-up latency will be
higher (because the kernel has to do it).  Not a good trade-off to make,
I think.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-18 12:27       ` Torvald Riegel
@ 2014-09-18 17:40         ` Rich Felker
  2014-09-18 17:58           ` Torvald Riegel
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2014-09-18 17:40 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GLIBC Devel, Darren Hart

On Thu, Sep 18, 2014 at 02:26:35PM +0200, Torvald Riegel wrote:
> > The EFAULT case with
> > FUTEX_WAKE, and which I claim FUTEX_WAKE_OP avoids, is when the atomic
> > operation on the futex int that's associated with the wake allows
> > another thread to synchronize and determine that it may legally
> > destroy the object before the actual wake is sent. FUTEX_WAKE_OP can
> > fully avoid this by performing the atomic operation after looking up
> > and locking the futex hash bucket, so that there's no further access
> > after the atomic and thus no opportunity for fault.
> 
> Agreed; that like what UNLOCK_PI does.  However, and that's something
> I've only thought about recently, it would be good to know which
> guarantees the kernel gives in this case; in particular, what happens
> (and which error code results) if there is destruction and potential
> unmapping etc. of the futex variable concurrently with WAKE_OP or
> UNLOCK_PI being in flight.

I've RTFS'd and my understanding is that no such problems are
possible. The futex hashing (note: there are two futex address
arguments and both are hashed, even if they're equal; this should be
optimized on the kernel side to make FUTEX_WAKE_OP practical) and
locking of the resulting hash buckets happens before the atomic
operation is performed. After the atomic operation, the bucket is
walked and matching waiters are woken.

In theory it's possible that, as soon as the atomic operation is
performed, the backing (file/anon/whatever) is destroyed and its
underlying id (e.g. inode number) is reused, so that the backing
identified for the original futex address has been reused by this
time. However, it's not a problem because a new waiter can't arrive
while the hash bucket is still locked -- so such a new waiter can't be
woken.

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-18 17:40         ` Rich Felker
@ 2014-09-18 17:58           ` Torvald Riegel
  2014-09-18 20:41             ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-09-18 17:58 UTC (permalink / raw)
  To: Rich Felker; +Cc: GLIBC Devel, Darren Hart, Michael Kerrisk

On Thu, 2014-09-18 at 13:39 -0400, Rich Felker wrote:
> On Thu, Sep 18, 2014 at 02:26:35PM +0200, Torvald Riegel wrote:
> > > The EFAULT case with
> > > FUTEX_WAKE, and which I claim FUTEX_WAKE_OP avoids, is when the atomic
> > > operation on the futex int that's associated with the wake allows
> > > another thread to synchronize and determine that it may legally
> > > destroy the object before the actual wake is sent. FUTEX_WAKE_OP can
> > > fully avoid this by performing the atomic operation after looking up
> > > and locking the futex hash bucket, so that there's no further access
> > > after the atomic and thus no opportunity for fault.
> > 
> > Agreed; that like what UNLOCK_PI does.  However, and that's something
> > I've only thought about recently, it would be good to know which
> > guarantees the kernel gives in this case; in particular, what happens
> > (and which error code results) if there is destruction and potential
> > unmapping etc. of the futex variable concurrently with WAKE_OP or
> > UNLOCK_PI being in flight.
> 
> I've RTFS'd and my understanding is that no such problems are
> possible. The futex hashing (note: there are two futex address
> arguments and both are hashed, even if they're equal; this should be
> optimized on the kernel side to make FUTEX_WAKE_OP practical) and
> locking of the resulting hash buckets happens before the atomic
> operation is performed. After the atomic operation, the bucket is
> walked and matching waiters are woken.
> 
> In theory it's possible that, as soon as the atomic operation is
> performed, the backing (file/anon/whatever) is destroyed and its
> underlying id (e.g. inode number) is reused, so that the backing
> identified for the original futex address has been reused by this
> time. However, it's not a problem because a new waiter can't arrive
> while the hash bucket is still locked -- so such a new waiter can't be
> woken.

I don't disagree with what you said (but I haven't looked at all the
source).  The point I was trying to make was that the current
implementation is not a guarantee -- if we rely on this then we should
request the kernel to document that this is allowed usage; that's the
only way to make sure this keeps working.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-18 17:58           ` Torvald Riegel
@ 2014-09-18 20:41             ` Rich Felker
  2014-10-30  1:55               ` Darren Hart
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2014-09-18 20:41 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GLIBC Devel, Darren Hart, Michael Kerrisk

On Thu, Sep 18, 2014 at 07:57:45PM +0200, Torvald Riegel wrote:
> On Thu, 2014-09-18 at 13:39 -0400, Rich Felker wrote:
> > On Thu, Sep 18, 2014 at 02:26:35PM +0200, Torvald Riegel wrote:
> > > > The EFAULT case with
> > > > FUTEX_WAKE, and which I claim FUTEX_WAKE_OP avoids, is when the atomic
> > > > operation on the futex int that's associated with the wake allows
> > > > another thread to synchronize and determine that it may legally
> > > > destroy the object before the actual wake is sent. FUTEX_WAKE_OP can
> > > > fully avoid this by performing the atomic operation after looking up
> > > > and locking the futex hash bucket, so that there's no further access
> > > > after the atomic and thus no opportunity for fault.
> > > 
> > > Agreed; that like what UNLOCK_PI does.  However, and that's something
> > > I've only thought about recently, it would be good to know which
> > > guarantees the kernel gives in this case; in particular, what happens
> > > (and which error code results) if there is destruction and potential
> > > unmapping etc. of the futex variable concurrently with WAKE_OP or
> > > UNLOCK_PI being in flight.
> > 
> > I've RTFS'd and my understanding is that no such problems are
> > possible. The futex hashing (note: there are two futex address
> > arguments and both are hashed, even if they're equal; this should be
> > optimized on the kernel side to make FUTEX_WAKE_OP practical) and
> > locking of the resulting hash buckets happens before the atomic
> > operation is performed. After the atomic operation, the bucket is
> > walked and matching waiters are woken.
> > 
> > In theory it's possible that, as soon as the atomic operation is
> > performed, the backing (file/anon/whatever) is destroyed and its
> > underlying id (e.g. inode number) is reused, so that the backing
> > identified for the original futex address has been reused by this
> > time. However, it's not a problem because a new waiter can't arrive
> > while the hash bucket is still locked -- so such a new waiter can't be
> > woken.
> 
> I don't disagree with what you said (but I haven't looked at all the
> source).  The point I was trying to make was that the current
> implementation is not a guarantee -- if we rely on this then we should
> request the kernel to document that this is allowed usage; that's the
> only way to make sure this keeps working.

Oh, I agree entirely on this. The lack of documentation of the futex
interfaces makes it rather unsafe to rely on _anything_ about it right
now, except for the vague principle that the kernel isn't supposed to
"break userspace" with incompatible changes.

I don't think it will be a problem ggettingthis behavior for
FUTEX_WAKE_OP documented though. It seems like it's part of the
intended semantics.

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-18 20:41             ` Rich Felker
@ 2014-10-30  1:55               ` Darren Hart
  2014-10-30  6:51                 ` Carlos O'Donell
  0 siblings, 1 reply; 35+ messages in thread
From: Darren Hart @ 2014-10-30  1:55 UTC (permalink / raw)
  To: Rich Felker; +Cc: Torvald Riegel, GLIBC Devel, Michael Kerrisk

On Thu, Sep 18, 2014 at 04:40:55PM -0400, Rich Felker wrote:
> On Thu, Sep 18, 2014 at 07:57:45PM +0200, Torvald Riegel wrote:
> > On Thu, 2014-09-18 at 13:39 -0400, Rich Felker wrote:
> > > On Thu, Sep 18, 2014 at 02:26:35PM +0200, Torvald Riegel wrote:
> > > > > The EFAULT case with
> > > > > FUTEX_WAKE, and which I claim FUTEX_WAKE_OP avoids, is when the atomic
> > > > > operation on the futex int that's associated with the wake allows
> > > > > another thread to synchronize and determine that it may legally
> > > > > destroy the object before the actual wake is sent. FUTEX_WAKE_OP can
> > > > > fully avoid this by performing the atomic operation after looking up
> > > > > and locking the futex hash bucket, so that there's no further access
> > > > > after the atomic and thus no opportunity for fault.
> > > > 
> > > > Agreed; that like what UNLOCK_PI does.  However, and that's something
> > > > I've only thought about recently, it would be good to know which
> > > > guarantees the kernel gives in this case; in particular, what happens
> > > > (and which error code results) if there is destruction and potential
> > > > unmapping etc. of the futex variable concurrently with WAKE_OP or
> > > > UNLOCK_PI being in flight.
> > > 
> > > I've RTFS'd and my understanding is that no such problems are
> > > possible. The futex hashing (note: there are two futex address
> > > arguments and both are hashed, even if they're equal; this should be
> > > optimized on the kernel side to make FUTEX_WAKE_OP practical) and
> > > locking of the resulting hash buckets happens before the atomic
> > > operation is performed. After the atomic operation, the bucket is
> > > walked and matching waiters are woken.
> > > 
> > > In theory it's possible that, as soon as the atomic operation is
> > > performed, the backing (file/anon/whatever) is destroyed and its
> > > underlying id (e.g. inode number) is reused, so that the backing
> > > identified for the original futex address has been reused by this
> > > time. However, it's not a problem because a new waiter can't arrive
> > > while the hash bucket is still locked -- so such a new waiter can't be
> > > woken.
> > 
> > I don't disagree with what you said (but I haven't looked at all the
> > source).  The point I was trying to make was that the current
> > implementation is not a guarantee -- if we rely on this then we should
> > request the kernel to document that this is allowed usage; that's the
> > only way to make sure this keeps working.
> 
> Oh, I agree entirely on this. The lack of documentation of the futex
> interfaces makes it rather unsafe to rely on _anything_ about it right

Where do you want to see this documented? Thomas recently added a lot of state
documentation to futex.c, and he and I both responded to Michael Kerrisk's
request for documented error codes. Are we refering to the man page? If so, I
believe that is in the works already.
-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-10-30  1:55               ` Darren Hart
@ 2014-10-30  6:51                 ` Carlos O'Donell
  0 siblings, 0 replies; 35+ messages in thread
From: Carlos O'Donell @ 2014-10-30  6:51 UTC (permalink / raw)
  To: Darren Hart, Rich Felker; +Cc: Torvald Riegel, GLIBC Devel, Michael Kerrisk

On 10/29/2014 09:55 PM, Darren Hart wrote:
> On Thu, Sep 18, 2014 at 04:40:55PM -0400, Rich Felker wrote:
>> Oh, I agree entirely on this. The lack of documentation of the futex
>> interfaces makes it rather unsafe to rely on _anything_ about it right
> 
> Where do you want to see this documented? Thomas recently added a lot of state
> documentation to futex.c, and he and I both responded to Michael Kerrisk's
> request for documented error codes. Are we refering to the man page? If so, I
> believe that is in the works already.

Yes, it is in the works, but it isn't done yet. Torvald Riegel was following
up on some of this.

The linux kernel man pages maintained by Michael Kerrisk is the canonical
place we look to for kernel API documentation for linux-specific syscalls.

If we can get everything well documented there, then we are off to the
races. We want it documented there also so we can agree upon what error
codes mean what.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-16 15:36 Futex error handling Torvald Riegel
  2014-09-16 16:56 ` Rich Felker
@ 2014-09-17 19:41 ` Roland McGrath
  2014-09-17 19:46   ` Torvald Riegel
  2014-10-20 16:15 ` Futex error handling Torvald Riegel
  2 siblings, 1 reply; 35+ messages in thread
From: Roland McGrath @ 2014-09-17 19:41 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GLIBC Devel, Darren Hart, Michael Kerrisk

> We got complains from the kernel side that glibc wouldn't react properly
> to futex errors being returned.  Thus, I'm looking at what we'd need to
> actually improve.  I'm using this here as a documentation for futex
> error codes: https://lkml.org/lkml/2014/5/15/356

Can somebody work with Michael et al to ensure that the futex(2) man page
becomes reliable documentation for this?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-17 19:41 ` Roland McGrath
@ 2014-09-17 19:46   ` Torvald Riegel
  2014-09-17 19:59     ` Roland McGrath
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-09-17 19:46 UTC (permalink / raw)
  To: Roland McGrath; +Cc: GLIBC Devel, Darren Hart, Michael Kerrisk

On Wed, 2014-09-17 at 12:41 -0700, Roland McGrath wrote:
> > We got complains from the kernel side that glibc wouldn't react properly
> > to futex errors being returned.  Thus, I'm looking at what we'd need to
> > actually improve.  I'm using this here as a documentation for futex
> > error codes: https://lkml.org/lkml/2014/5/15/356
> 
> Can somebody work with Michael et al to ensure that the futex(2) man page
> becomes reliable documentation for this?

I've asked Michael about the current status and offered my help in case
we also want to document more high-level properties of futex, like how
to use it properly.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-17 19:46   ` Torvald Riegel
@ 2014-09-17 19:59     ` Roland McGrath
  2014-09-17 23:17       ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Roland McGrath @ 2014-09-17 19:59 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GLIBC Devel, Darren Hart, Michael Kerrisk

> I've asked Michael about the current status and offered my help in case
> we also want to document more high-level properties of futex, like how
> to use it properly.

Thanks for taking this on.  The futex(7) man page seems like the right
place for that, and it's rather thin now.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-17 19:59     ` Roland McGrath
@ 2014-09-17 23:17       ` Rich Felker
  2014-10-23 19:16         ` Add futex wrapper to glibc? Carlos O'Donell
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2014-09-17 23:17 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Torvald Riegel, GLIBC Devel, Darren Hart, Michael Kerrisk

On Wed, Sep 17, 2014 at 12:59:18PM -0700, Roland McGrath wrote:
> > I've asked Michael about the current status and offered my help in case
> > we also want to document more high-level properties of futex, like how
> > to use it properly.
> 
> Thanks for taking this on.  The futex(7) man page seems like the right
> place for that, and it's rather thin now.

I would really like to expose futex(2) as a public interface and see
it documented there.. :-) I know that's something of a separate topic,
but it seems like a good time to discuss since the documentation is
going to be worked on, and since C11 atomics make having futex()
available to applications A LOT more desirable.

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Add futex wrapper to glibc?
  2014-09-17 23:17       ` Rich Felker
@ 2014-10-23 19:16         ` Carlos O'Donell
  2014-10-23 21:07           ` Joseph S. Myers
                             ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Carlos O'Donell @ 2014-10-23 19:16 UTC (permalink / raw)
  To: Rich Felker, Roland McGrath
  Cc: Torvald Riegel, GLIBC Devel, Darren Hart, Michael Kerrisk

On 09/17/2014 07:17 PM, Rich Felker wrote:
> On Wed, Sep 17, 2014 at 12:59:18PM -0700, Roland McGrath wrote:
>>> I've asked Michael about the current status and offered my help in case
>>> we also want to document more high-level properties of futex, like how
>>> to use it properly.
>>
>> Thanks for taking this on.  The futex(7) man page seems like the right
>> place for that, and it's rather thin now.
> 
> I would really like to expose futex(2) as a public interface and see
> it documented there.. :-) I know that's something of a separate topic,
> but it seems like a good time to discuss since the documentation is
> going to be worked on, and since C11 atomics make having futex()
> available to applications A LOT more desirable.

We've had this discussion before, but I can't find the reference in
the libc-alpha archives.

The largest problem right now is lack of documentation and that becomes
lack of commitment from the kernel to maintain the API/ABI, and that
translates into reluctance for glibc to create a wrapper.

We are IMO at the stage where futex is stable, few things are changing,
and with documentation in place, I would consider adding a futex wrapper.
However, per other discussions on errors, I'd actually add a non-trivial
C wrapper to enforce that the kernel does not return error codes that
are unexpected.

Roland expressed some opinions in the past. I think, because I can't find
the reference, they were along the lines that: the value of all of these
things being in glibc is suspect.

I'm OK with distinct symbol sets between supported OSs and I understand
that for GNU/Linux that glibc *is* part C library and part Linux library.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-23 19:16         ` Add futex wrapper to glibc? Carlos O'Donell
@ 2014-10-23 21:07           ` Joseph S. Myers
  2014-10-24  2:00             ` Carlos O'Donell
                               ` (2 more replies)
  2014-10-23 21:12           ` Rich Felker
  2014-10-30  1:59           ` Darren Hart
  2 siblings, 3 replies; 35+ messages in thread
From: Joseph S. Myers @ 2014-10-23 21:07 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Rich Felker, Roland McGrath, Torvald Riegel, GLIBC Devel,
	Darren Hart, Michael Kerrisk

On Thu, 23 Oct 2014, Carlos O'Donell wrote:

> > I would really like to expose futex(2) as a public interface and see
> > it documented there.. :-) I know that's something of a separate topic,
> > but it seems like a good time to discuss since the documentation is
> > going to be worked on, and since C11 atomics make having futex()
> > available to applications A LOT more desirable.
> 
> We've had this discussion before, but I can't find the reference in
> the libc-alpha archives.

See bug 9712 (and comment 4 therein where I pointed to past discussions of 
the general question of which syscall wrappers to add).

> The largest problem right now is lack of documentation and that becomes
> lack of commitment from the kernel to maintain the API/ABI, and that
> translates into reluctance for glibc to create a wrapper.

That's not a plausible reason.  The presumption is that if any interface 
provided by the kernel could plausibly be used by userspace, it must stay 
stable.

We need to reach consensus on when to add syscall wrappers to glibc (and I 
suppose on how to determine whether a given syscall wrapper should be a 
cancellation point).  Possible considerations include:

* If a syscall is obsoleted by another syscall (or otherwise considered 
obsolete), there is no need to add a wrapper to glibc.

* If a syscall cannot meaningfully be used behind glibc's back, or is not 
useful in the glibc context except for in the ways in which it is used by 
glibc, but can only be used directly by glibc, there is no need to add a 
wrapper to glibc (this probably applies to set_thread_area, for example).

* If there's a glibc function that's not quite a direct wrapper of the 
syscall but provides all the functionality of it that can usefully be used 
in a program using glibc, there is no need to add a wrapper to glibc.

* Otherwise, my inclination would be to default to adding wrappers (both 
for syscalls not used in glibc, and for cases such as futex where the 
syscall is used in glibc but can usefully be used directly as well) unless 
there is a clear reason not to.  This includes for architecture-specific 
syscalls.

* If a wrapper is provided, there should also be a header with 
corresponding declarations (that either provides any relevant constants / 
structures or includes appropriate kernel neaders providing them).

* There may be cases where a function is meaningful not just with Linux 
kernels supporting the syscall, and should be provided with some kind of 
fallback for older kernels, and possibly made part of the glibc API for 
systems with other kernels as well.

* If we do get consensus on principles, we still need someone to go 
through current syscall lists and work out a more detailed proposal for 
which syscalls to add to glibc, then actually add them once there is 
consensus on those sets.

> We are IMO at the stage where futex is stable, few things are changing,
> and with documentation in place, I would consider adding a futex wrapper.
> However, per other discussions on errors, I'd actually add a non-trivial
> C wrapper to enforce that the kernel does not return error codes that
> are unexpected.

I don't think that makes sense for futex.  I think such checks only make 
sense where glibc is making use of futex within its own code - there it 
may sometimes be useful to sanity-check results match glibc's 
expectations.  I also think it's desirable for any such syscall wrapper to 
be usable with futex operations that may be added to the kernel in future 
(i.e. someone should only need new kernel headers to build code using new 
operations, not a new glibc binary that knows how to check things for 
those new operations).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-23 21:07           ` Joseph S. Myers
@ 2014-10-24  2:00             ` Carlos O'Donell
  2014-10-26  7:19             ` Mike Frysinger
  2014-10-30  2:03             ` Darren Hart
  2 siblings, 0 replies; 35+ messages in thread
From: Carlos O'Donell @ 2014-10-24  2:00 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Rich Felker, Roland McGrath, Torvald Riegel, GLIBC Devel,
	Darren Hart, Michael Kerrisk

On 10/23/2014 05:06 PM, Joseph S. Myers wrote:
>> We are IMO at the stage where futex is stable, few things are changing,
>> and with documentation in place, I would consider adding a futex wrapper.
>> However, per other discussions on errors, I'd actually add a non-trivial
>> C wrapper to enforce that the kernel does not return error codes that
>> are unexpected.
> 
> I don't think that makes sense for futex.  I think such checks only make 
> sense where glibc is making use of futex within its own code - there it 
> may sometimes be useful to sanity-check results match glibc's 
> expectations.  I also think it's desirable for any such syscall wrapper to 
> be usable with futex operations that may be added to the kernel in future 
> (i.e. someone should only need new kernel headers to build code using new 
> operations, not a new glibc binary that knows how to check things for 
> those new operations).

I concede.

My worry is the mess with sched_[sg]etaffinity where the kernel syscall
and userspace API are different.

Your point " * If there's a glibc function that's not quite a direct 
wrapper of the syscall but provides all the functionality of it that
can usefully be used in a program using glibc, there is no need to 
add a wrapper to glibc." covers that though.

I've added your notes to the wiki:

https://sourceware.org/glibc/wiki/Consensus#WIP:_Kernel_syscalls_wrappers

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-23 21:07           ` Joseph S. Myers
  2014-10-24  2:00             ` Carlos O'Donell
@ 2014-10-26  7:19             ` Mike Frysinger
  2014-10-27 17:52               ` Joseph S. Myers
                                 ` (2 more replies)
  2014-10-30  2:03             ` Darren Hart
  2 siblings, 3 replies; 35+ messages in thread
From: Mike Frysinger @ 2014-10-26  7:19 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Carlos O'Donell, Rich Felker, Roland McGrath, Torvald Riegel,
	GLIBC Devel, Darren Hart, Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 2662 bytes --]

On 23 Oct 2014 21:06, Joseph S. Myers wrote:
> On Thu, 23 Oct 2014, Carlos O'Donell wrote:
> > The largest problem right now is lack of documentation and that becomes
> > lack of commitment from the kernel to maintain the API/ABI, and that
> > translates into reluctance for glibc to create a wrapper.
> 
> That's not a plausible reason.  The presumption is that if any interface 
> provided by the kernel could plausibly be used by userspace, it must stay 
> stable.

sure, but Carlos is speaking in this specific case wrt futex, and it def suffers 
from lack of documentation everywhere.  which provides quite a good amount of 
wiggle room for the kernel peeps :).

> * Otherwise, my inclination would be to default to adding wrappers (both 
> for syscalls not used in glibc, and for cases such as futex where the 
> syscall is used in glibc but can usefully be used directly as well) unless 
> there is a clear reason not to.  This includes for architecture-specific 
> syscalls.

i disagree here.  if the func is not going to be generally useful, i don't think 
glibc should pick it up.  especially for arch-specific syscalls.  i think the 
kernel itself is much better suited for providing a thin wrapper layer for 
userspace as they're directly in control of the ABI (especially cross-arch) and 
the API (via the UAPI headers), they can provide a full API for all syscalls 
(unlike glibc per the caveats listed above), and it means there's a central 
source point for everyone to use, not just glibc.

if the syscall is generally useful to people writing userland code, then i don't 
see a problem with incorporating it in some fashion into the official GNU API.

> * If a wrapper is provided, there should also be a header with 
> corresponding declarations (that either provides any relevant constants / 
> structures or includes appropriate kernel neaders providing them).

absolutely

> * There may be cases where a function is meaningful not just with Linux 
> kernels supporting the syscall, and should be provided with some kind of 
> fallback for older kernels, and possibly made part of the glibc API for 
> systems with other kernels as well.

i think any syscall we think about adopting, this should be given serious 
consideration.  ideally glibc's API should be stable across platforms.  if the 
Linux syscall is so great as to warrant us adding a wrapper, then it stands to 
reason that equivalent functionality should be desirable on other platforms.  it 
doesn't mean we need to wait for them to implement the kernel side, but we 
shouldn't be baking in pure Linuxisms from the start.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-26  7:19             ` Mike Frysinger
@ 2014-10-27 17:52               ` Joseph S. Myers
  2014-10-27 20:28               ` Roland McGrath
  2014-10-30  2:07               ` Darren Hart
  2 siblings, 0 replies; 35+ messages in thread
From: Joseph S. Myers @ 2014-10-27 17:52 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Carlos O'Donell, Rich Felker, Roland McGrath, Torvald Riegel,
	GLIBC Devel, Darren Hart, Michael Kerrisk

On Sun, 26 Oct 2014, Mike Frysinger wrote:

> > * Otherwise, my inclination would be to default to adding wrappers (both 
> > for syscalls not used in glibc, and for cases such as futex where the 
> > syscall is used in glibc but can usefully be used directly as well) unless 
> > there is a clear reason not to.  This includes for architecture-specific 
> > syscalls.
> 
> i disagree here.  if the func is not going to be generally useful, i 
> don't think glibc should pick it up.  especially for arch-specific 
> syscalls.  i think the kernel itself is much better suited for providing 
> a thin wrapper layer for userspace as they're directly in control of the 
> ABI (especially cross-arch) and the API (via the UAPI headers), they can 
> provide a full API for all syscalls (unlike glibc per the caveats listed 
> above), and it means there's a central source point for everyone to use, 
> not just glibc.

I don't think the kernel is well-suited for this; various issues around 
building objects for all the different ABIs, including ABI differences 
that are only relevant at the userspace level and not the kernel level, 
don't arise for kernel code (and only to a limited extent for vDSOs) - and 
then you get into things such as providing unwind / debug info for syscall 
stubs.  If the syscall is to be a cancellation point (and it was suggested 
the futex wrapper should be, for certain arguments) then the syscall 
implementation closely ties into libc internals (and if it sets errno, 
that means the object code for the implementation depends on libc, even if 
it would be possible to write C source code for it that does not).

> if the syscall is generally useful to people writing userland code, then 
> i don't see a problem with incorporating it in some fashion into the 
> official GNU API.

Sometimes "access this Linux kernel interface" is generally useful (e.g. 
gettid, futex - and plenty of existing syscall wrappers).

> i think any syscall we think about adopting, this should be given 
> serious consideration.  ideally glibc's API should be stable across 
> platforms.  if the Linux syscall is so great as to warrant us adding a 
> wrapper, then it stands to reason that equivalent functionality should 
> be desirable on other platforms.  it doesn't mean we need to wait for 
> them to implement the kernel side, but we shouldn't be baking in pure 
> Linuxisms from the start. -mike

I don't think we actually need to decide at the time of adding a syscall 
wrapper whether it would be useful on other platforms - if we decide that 
later, we can always add the ENOSYS stub for Hurd use, or fallbacks for 
older Linux kernel versions, at that point.  (If we wish to use a function 
internally in glibc in OS-independent code, that would be a good 
indication that it should be made an OS-independent interface.)

Nor do I think we can reliably predict whether an interface is suitable to 
be OS-independent from whether it is generally useful with the Linux 
kernel - it may be necessary to see how several different OSes approach an 
issue before a good conclusion can be reached on what a portable 
OS-independent interface should look like.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-26  7:19             ` Mike Frysinger
  2014-10-27 17:52               ` Joseph S. Myers
@ 2014-10-27 20:28               ` Roland McGrath
  2014-10-30  2:07               ` Darren Hart
  2 siblings, 0 replies; 35+ messages in thread
From: Roland McGrath @ 2014-10-27 20:28 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Joseph S. Myers, Carlos O'Donell, Rich Felker,
	Torvald Riegel, GLIBC Devel, Darren Hart, Michael Kerrisk

I generally concur with Mike's points.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-26  7:19             ` Mike Frysinger
  2014-10-27 17:52               ` Joseph S. Myers
  2014-10-27 20:28               ` Roland McGrath
@ 2014-10-30  2:07               ` Darren Hart
  2014-10-30  9:32                 ` Torvald Riegel
  2 siblings, 1 reply; 35+ messages in thread
From: Darren Hart @ 2014-10-30  2:07 UTC (permalink / raw)
  To: Joseph S. Myers, Carlos O'Donell, Rich Felker,
	Roland McGrath, Torvald Riegel, GLIBC Devel, Michael Kerrisk

On Sun, Oct 26, 2014 at 03:19:22AM -0400, Mike Frysinger wrote:
> On 23 Oct 2014 21:06, Joseph S. Myers wrote:
> > On Thu, 23 Oct 2014, Carlos O'Donell wrote:
> > > The largest problem right now is lack of documentation and that becomes
> > > lack of commitment from the kernel to maintain the API/ABI, and that
> > > translates into reluctance for glibc to create a wrapper.
> > 
> > That's not a plausible reason.  The presumption is that if any interface 
> > provided by the kernel could plausibly be used by userspace, it must stay 
> > stable.
> 
> sure, but Carlos is speaking in this specific case wrt futex, and it def suffers 
> from lack of documentation everywhere.  which provides quite a good amount of 
> wiggle room for the kernel peeps :).

This kernel peep, and I think I can speak for tglx as well, is not looking for
wiggle room here. We've provide Michael Kerrisk with the return values and error
conditions. I for one treat this as binding.

> 
> > * Otherwise, my inclination would be to default to adding wrappers (both 
> > for syscalls not used in glibc, and for cases such as futex where the 
> > syscall is used in glibc but can usefully be used directly as well) unless 
> > there is a clear reason not to.  This includes for architecture-specific 
> > syscalls.
> 
> i disagree here.  if the func is not going to be generally useful, i don't think 
> glibc should pick it up.  especially for arch-specific syscalls.  i think the 
> kernel itself is much better suited for providing a thin wrapper layer for 
> userspace as they're directly in control of the ABI (especially cross-arch) and 
> the API (via the UAPI headers), they can provide a full API for all syscalls 
> (unlike glibc per the caveats listed above), and it means there's a central 
> source point for everyone to use, not just glibc.
> 
> if the syscall is generally useful to people writing userland code, then i don't 
> see a problem with incorporating it in some fashion into the official GNU API.
> 

Agreed, and we have at least a couple examples of futex usage outside of glibc.

(futextest - OK, that one's self-serving, librcu, and I've heard of others)


-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-30  2:07               ` Darren Hart
@ 2014-10-30  9:32                 ` Torvald Riegel
  0 siblings, 0 replies; 35+ messages in thread
From: Torvald Riegel @ 2014-10-30  9:32 UTC (permalink / raw)
  To: Darren Hart
  Cc: Joseph S. Myers, Carlos O'Donell, Rich Felker,
	Roland McGrath, GLIBC Devel, Michael Kerrisk

On Wed, 2014-10-29 at 19:07 -0700, Darren Hart wrote:
> On Sun, Oct 26, 2014 at 03:19:22AM -0400, Mike Frysinger wrote:
> > On 23 Oct 2014 21:06, Joseph S. Myers wrote:
> > > On Thu, 23 Oct 2014, Carlos O'Donell wrote:
> > > * Otherwise, my inclination would be to default to adding wrappers (both 
> > > for syscalls not used in glibc, and for cases such as futex where the 
> > > syscall is used in glibc but can usefully be used directly as well) unless 
> > > there is a clear reason not to.  This includes for architecture-specific 
> > > syscalls.
> > 
> > i disagree here.  if the func is not going to be generally useful, i don't think 
> > glibc should pick it up.  especially for arch-specific syscalls.  i think the 
> > kernel itself is much better suited for providing a thin wrapper layer for 
> > userspace as they're directly in control of the ABI (especially cross-arch) and 
> > the API (via the UAPI headers), they can provide a full API for all syscalls 
> > (unlike glibc per the caveats listed above), and it means there's a central 
> > source point for everyone to use, not just glibc.
> > 
> > if the syscall is generally useful to people writing userland code, then i don't 
> > see a problem with incorporating it in some fashion into the official GNU API.
> > 
> 
> Agreed, and we have at least a couple examples of futex usage outside of glibc.
> 
> (futextest - OK, that one's self-serving, librcu, and I've heard of others)

Yes, there are and will be others, for example GCC's libitm, or C++'s
std::synchronic<T> if it gets accepted.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-23 21:07           ` Joseph S. Myers
  2014-10-24  2:00             ` Carlos O'Donell
  2014-10-26  7:19             ` Mike Frysinger
@ 2014-10-30  2:03             ` Darren Hart
  2 siblings, 0 replies; 35+ messages in thread
From: Darren Hart @ 2014-10-30  2:03 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Carlos O'Donell, Rich Felker, Roland McGrath, Torvald Riegel,
	GLIBC Devel, Michael Kerrisk

On Thu, Oct 23, 2014 at 09:06:17PM +0000, Joseph S. Myers wrote:
> On Thu, 23 Oct 2014, Carlos O'Donell wrote:
> 
> > We are IMO at the stage where futex is stable, few things are changing,
> > and with documentation in place, I would consider adding a futex wrapper.
> > However, per other discussions on errors, I'd actually add a non-trivial
> > C wrapper to enforce that the kernel does not return error codes that
> > are unexpected.
> 
> I don't think that makes sense for futex.  I think such checks only make 
> sense where glibc is making use of futex within its own code - there it 
> may sometimes be useful to sanity-check results match glibc's 
> expectations.  I also think it's desirable for any such syscall wrapper to 
> be usable with futex operations that may be added to the kernel in future 
> (i.e. someone should only need new kernel headers to build code using new 
> operations, not a new glibc binary that knows how to check things for 
> those new operations).

Agreed, let the caller deal with the return values. When used directly, glibc
should not be acting on any of the return codes as they are needed by the
implementation (which is outside of glibc).

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-23 19:16         ` Add futex wrapper to glibc? Carlos O'Donell
  2014-10-23 21:07           ` Joseph S. Myers
@ 2014-10-23 21:12           ` Rich Felker
  2014-10-30  1:59           ` Darren Hart
  2 siblings, 0 replies; 35+ messages in thread
From: Rich Felker @ 2014-10-23 21:12 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Roland McGrath, Torvald Riegel, GLIBC Devel, Darren Hart,
	Michael Kerrisk

On Thu, Oct 23, 2014 at 03:16:07PM -0400, Carlos O'Donell wrote:
> On 09/17/2014 07:17 PM, Rich Felker wrote:
> > On Wed, Sep 17, 2014 at 12:59:18PM -0700, Roland McGrath wrote:
> >>> I've asked Michael about the current status and offered my help in case
> >>> we also want to document more high-level properties of futex, like how
> >>> to use it properly.
> >>
> >> Thanks for taking this on.  The futex(7) man page seems like the right
> >> place for that, and it's rather thin now.
> > 
> > I would really like to expose futex(2) as a public interface and see
> > it documented there.. :-) I know that's something of a separate topic,
> > but it seems like a good time to discuss since the documentation is
> > going to be worked on, and since C11 atomics make having futex()
> > available to applications A LOT more desirable.
> 
> We've had this discussion before, but I can't find the reference in
> the libc-alpha archives.

Perhaps on the bug tracker. IIRC there's at least one duplicate bug
report for it too, and the discussions are probably buried in one of
them.

> The largest problem right now is lack of documentation and that becomes
> lack of commitment from the kernel to maintain the API/ABI, and that
> translates into reluctance for glibc to create a wrapper.

The kernel's commitment to maintain API/ABI is simply a consequence of
all the pthread primitives depending on the behavior. This is at least
as strong as documentation, but I agree documentation would be nice
too, especially for less-known aspects that glibc is not depending on
right now.

> We are IMO at the stage where futex is stable, few things are changing,
> and with documentation in place, I would consider adding a futex wrapper.
> However, per other discussions on errors, I'd actually add a non-trivial
> C wrapper to enforce that the kernel does not return error codes that
> are unexpected.

Is there a reason for this? I don't see it being done on any other
Linux-specific functions glibc provides syscall wrappers for.

> Roland expressed some opinions in the past. I think, because I can't find
> the reference, they were along the lines that: the value of all of these
> things being in glibc is suspect.

I don't think it's any more suspect than epoll, eventfd, fanotify,
sendfile, etc. syscall wrappers being in glibc. These are all Linux
APIs that are useful to applications. Recommending these to be handled
in third-party libraries is not a good idea. Not only does it
perpetuate the use of syscall() (which has ugly variadic calling
issues and issues with argument type on x32 and possibly other future
32-on-64 type ABIs), but it precludes applications from being able to
use them with cancellation. One of the bug reports requesting futex()
is actually about having it as a cancellation point. IMO it would make
sense for just the wait forms to be cancellation points, much like how
fcntl is a cancellation point only for blocking lock operations.

> I'm OK with distinct symbol sets between supported OSs and I understand
> that for GNU/Linux that glibc *is* part C library and part Linux library.

Yes. BTW I think it's fundamentally possible to do non-process-shared
futex in userspace if you do want to support it on non-Linux targets.
You just do the equivalent hash buckets/waiter lists wrapped with a
mutex and cond var. Not to say this is a good idea, but it's possible.

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-23 19:16         ` Add futex wrapper to glibc? Carlos O'Donell
  2014-10-23 21:07           ` Joseph S. Myers
  2014-10-23 21:12           ` Rich Felker
@ 2014-10-30  1:59           ` Darren Hart
  2014-10-30  2:44             ` Rich Felker
  2014-10-30  2:55             ` Carlos O'Donell
  2 siblings, 2 replies; 35+ messages in thread
From: Darren Hart @ 2014-10-30  1:59 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Rich Felker, Roland McGrath, Torvald Riegel, GLIBC Devel,
	Michael Kerrisk

On Thu, Oct 23, 2014 at 03:16:07PM -0400, Carlos O'Donell wrote:
> On 09/17/2014 07:17 PM, Rich Felker wrote:
> > On Wed, Sep 17, 2014 at 12:59:18PM -0700, Roland McGrath wrote:
> >>> I've asked Michael about the current status and offered my help in case
> >>> we also want to document more high-level properties of futex, like how
> >>> to use it properly.
> >>
> >> Thanks for taking this on.  The futex(7) man page seems like the right
> >> place for that, and it's rather thin now.
> > 
> > I would really like to expose futex(2) as a public interface and see
> > it documented there.. :-) I know that's something of a separate topic,
> > but it seems like a good time to discuss since the documentation is
> > going to be worked on, and since C11 atomics make having futex()
> > available to applications A LOT more desirable.
> 
> We've had this discussion before, but I can't find the reference in
> the libc-alpha archives.
> 
> The largest problem right now is lack of documentation and that becomes
> lack of commitment from the kernel to maintain the API/ABI, and that
> translates into reluctance for glibc to create a wrapper.

I would like to see this as well so I (and others, like Mathieu Desnoyers) don't
have to do our own syscall wrapper in futextests and librcu, for example.

> 
> We are IMO at the stage where futex is stable, few things are changing,
> and with documentation in place, I would consider adding a futex wrapper.

Yes, at least for the defined OP codes. New OPs may be added of course, but that
isn't a concern for supporting what exists today, and doesn't break
compatibility.

I wonder though... can we not wrap FUTEX_REQUEUE? It's fundamentally broken.
FUTEX_CMP_REQUEUE should *always* be used instead. The glibc wrapper is one way
to encourage developers to do the right thing (don't expose the bad op in the
header).

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-30  1:59           ` Darren Hart
@ 2014-10-30  2:44             ` Rich Felker
  2014-10-30  2:55             ` Carlos O'Donell
  1 sibling, 0 replies; 35+ messages in thread
From: Rich Felker @ 2014-10-30  2:44 UTC (permalink / raw)
  To: Darren Hart
  Cc: Carlos O'Donell, Roland McGrath, Torvald Riegel, GLIBC Devel,
	Michael Kerrisk

On Wed, Oct 29, 2014 at 06:59:15PM -0700, Darren Hart wrote:
> > We are IMO at the stage where futex is stable, few things are changing,
> > and with documentation in place, I would consider adding a futex wrapper.
> 
> Yes, at least for the defined OP codes. New OPs may be added of course, but that
> isn't a concern for supporting what exists today, and doesn't break
> compatibility.
> 
> I wonder though... can we not wrap FUTEX_REQUEUE? It's fundamentally broken.
> FUTEX_CMP_REQUEUE should *always* be used instead. The glibc wrapper is one way
> to encourage developers to do the right thing (don't expose the bad op in the
> header).

You're mistaken here. There are plenty of valid ways to use
FUTEX_REQUEUE - for example if the calling thread is requeuing the
target(s) to a lock that the calling thread owns. Just because it
doesn't meet the needs of the way glibc was using it internally
doesn't mean it's useless for other applications.

In any case, I don't think there's a proposal to intercept/modify the
commands to futex, just to pass them through (and possibly do a
cancellable syscall for some of them).

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-30  1:59           ` Darren Hart
  2014-10-30  2:44             ` Rich Felker
@ 2014-10-30  2:55             ` Carlos O'Donell
  2014-10-30  3:26               ` Rich Felker
  2014-10-30 17:09               ` Joseph S. Myers
  1 sibling, 2 replies; 35+ messages in thread
From: Carlos O'Donell @ 2014-10-30  2:55 UTC (permalink / raw)
  To: Darren Hart
  Cc: Rich Felker, Roland McGrath, Torvald Riegel, GLIBC Devel,
	Michael Kerrisk

On 10/29/2014 09:59 PM, Darren Hart wrote:
> I wonder though... can we not wrap FUTEX_REQUEUE? It's fundamentally broken.
> FUTEX_CMP_REQUEUE should *always* be used instead. The glibc wrapper is one way
> to encourage developers to do the right thing (don't expose the bad op in the
> header).

We can do whatever you want, but you should document it as broken in the
linux kernel man pages if it is not already. Then when we add the wrapper
we'll make sure it's not there.

The WIP consensus is that deprecated kernel syscalls or features will
not be part of wrappers.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-30  2:55             ` Carlos O'Donell
@ 2014-10-30  3:26               ` Rich Felker
  2014-10-30 17:09               ` Joseph S. Myers
  1 sibling, 0 replies; 35+ messages in thread
From: Rich Felker @ 2014-10-30  3:26 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Darren Hart, Roland McGrath, Torvald Riegel, GLIBC Devel,
	Michael Kerrisk

On Wed, Oct 29, 2014 at 10:54:19PM -0400, Carlos O'Donell wrote:
> On 10/29/2014 09:59 PM, Darren Hart wrote:
> > I wonder though... can we not wrap FUTEX_REQUEUE? It's fundamentally broken.
> > FUTEX_CMP_REQUEUE should *always* be used instead. The glibc wrapper is one way
> > to encourage developers to do the right thing (don't expose the bad op in the
> > header).
> 
> We can do whatever you want, but you should document it as broken in the
> linux kernel man pages if it is not already. Then when we add the wrapper
> we'll make sure it's not there.
> 
> The WIP consensus is that deprecated kernel syscalls or features will
> not be part of wrappers.

As I mentioned, it's not broken; it's fine to use as long as there's
no possible race on the condition for whether the target thread(s)
should wake up or wait on the futex, and there are actually situations
where it's preferable (yields better performance by avoiding spurious
EAGAIN) than the CMP version. I can give examples if anyone's
interested.

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Add futex wrapper to glibc?
  2014-10-30  2:55             ` Carlos O'Donell
  2014-10-30  3:26               ` Rich Felker
@ 2014-10-30 17:09               ` Joseph S. Myers
  1 sibling, 0 replies; 35+ messages in thread
From: Joseph S. Myers @ 2014-10-30 17:09 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Darren Hart, Rich Felker, Roland McGrath, Torvald Riegel,
	GLIBC Devel, Michael Kerrisk

On Wed, 29 Oct 2014, Carlos O'Donell wrote:

> The WIP consensus is that deprecated kernel syscalls or features will
> not be part of wrappers.

Deprecated *syscalls* will not have wrappers.  If a particular syscall is 
to have a wrapper, I don't think there's any consensus for glibc to try to 
filter out deprecated uses and only pass through non-deprecated ones (and 
wrappers should definitely allow through unknown flags, futex operations 
etc., rather than requiring glibc to have a full understanding of valid 
arguments to the syscall).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-09-16 15:36 Futex error handling Torvald Riegel
  2014-09-16 16:56 ` Rich Felker
  2014-09-17 19:41 ` Roland McGrath
@ 2014-10-20 16:15 ` Torvald Riegel
  2014-10-21 22:34   ` Roland McGrath
  2014-10-23 19:35   ` Carlos O'Donell
  2 siblings, 2 replies; 35+ messages in thread
From: Torvald Riegel @ 2014-10-20 16:15 UTC (permalink / raw)
  To: GLIBC Devel; +Cc: Darren Hart

On Tue, 2014-09-16 at 17:36 +0200, Torvald Riegel wrote:
> We got complains from the kernel side that glibc wouldn't react properly
> to futex errors being returned.  Thus, I'm looking at what we'd need to
> actually improve.  I'm using this here as a documentation for futex
> error codes: https://lkml.org/lkml/2014/5/15/356
> 
> Generally, we have three categories of faults (ie, the cause for an
> error/failure):
> * Bug in glibc ("BL")
> * Bug in the client program ("BP")
> * Failures that are neither a bug in glibc nor the program ("F")
> 
> Also, there are cases where it's not a "real" failure, but just
> something that is expected behavior that needs to be handled ("NF").
> 
> I'm not aware of a general policy about whether glibc should abort or
> assert (ie, abort only with assertion checks enabled) when the fault is
> in the BL or BP categories.  I'd say we don't, because there's no way to
> handle it anyway, and other things will likely go wrong; but I don't
> have a strong opinion.  Thoughts?
> 
> For every futex op, here's a list of how I'd categorize the possible
> error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
> BL)):
> 
> FUTEX_WAIT:
> * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
> earlier when we accessed the futex variable.
> * EINVAL (alignment and timeout normalization) is BL/BP.
> * EWOULDBLOCK, ETIMEDOUT are NF.
> 
> FUTEX_WAKE, FUTEX_WAKE_OP:
> * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> case.  This is due to how futexes work when combined with certain rules
> for destruction of the underlying synchronization data structure; see my
> description of the mutex destruction issue (but this can happen with
> other data structures such as semaphores or cond vars too):
> https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> * EINVAL (futex alignment) is BL/BP.
> * EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
> NF.  The latter is caused by the mutex destruction issue, only that a
> pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
> location but one which has been reused for a PI futex.  Thus, we must
> not abort or assert in this case.
> 
> FUTEX_REQUEUE:
> * Like FUTEX_WAKE, except that it's not safe to use concurrently with
> possible destruction / reuse of the futex memory (because requeueing to
> a futex that's unrelated to the new futex located in reused memory is
> bad).
> 
> FUTEX_REQUEUE_CMP:
> * Like FUTEX_REQUEUE.  EAGAIN is NF.
> 
> FUTEX_WAKE_OP:
> * Haven't looked at this yet.  Only used in condvars, and might not be
> necessary for a condvar that's not based on a condvar-internal lock.
> 
> FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET:
> * Like FUTEX_WAIT / FUTEX_WAKE.  The additional EINVAL is BL.
> 
> FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI:
> * EFAULT is BL/BP.
> * ENOMEM is F.  We need to handle this.
> * EINVAL, EPERM, ESRCH are BL/BP.
> * EAGAIN and ETIMEDOUT are NF.
> * EDEADLOCK is BP (or BL).
> * EOWNERDIED is F.
> 
> FUTEX_UNLOCK_PI:
> (* I guess this can return EFAULT too, which is BL/BP.)
> * EINVAL and EPERM are BL/BP.  I don't think there's a mutex destruction
> issue with PI locks because the kernel takes care of both resetting the
> value of the futex var and waking up threads; it should do so in a way
> that won't access reused memory.  I guess we should check that though...
> 
> FUTEX_WAIT_REQUEUE_PI:
> * EFAULT and EINVAL are BL/BP.
> * EWOULDBLOCK and ETIMEDOUT are NF.
> * EOWNERDIED is F.
> 
> FUTEX_CMP_REQUEUE_PI is like FUTEX_CMP_REQUEUE except:
> * ENOMEM is F.
> * EPERM and ESRCH are BL/BP.
> * EDEADLOCK is BP (or BL).
> 
> 
> I think the next steps to improve this should be:
> 1) Getting consensus on how we want to handle BL and BP in general.

Ping.  I think that's a fairly generic issue (in the sense of likely
being a question for other things than just mutexes too), so I'd like
some input on the direction.

Carlos, Joseph, Roland:  Do you have any comments?

> 2) Applying the outcome of that to the list above and getting consensus
> on the result.
> 3) For each case of F, find the best way to report it to the caller
> (e.g., error code from the pthreads function, abort, ...).
> 4) Change each use of the futexes accordingly, one at a time.
> 
> I've asked Michael Kerrisk for the state of the futex error docs, but
> haven't gotten a reply yet.  (Last time I checked, the new input from
> the email I referred to above wasn't part of the futex docs yet.)

Michael has replied, and it's on his list of things to do.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-10-20 16:15 ` Futex error handling Torvald Riegel
@ 2014-10-21 22:34   ` Roland McGrath
  2014-10-23  2:37     ` Carlos O'Donell
  2014-10-23 19:35   ` Carlos O'Donell
  1 sibling, 1 reply; 35+ messages in thread
From: Roland McGrath @ 2014-10-21 22:34 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GLIBC Devel, Darren Hart

The high-level comment is that we have always favored having actual bugs
cause quick and complete failure.

If it's a bug in libc, then it should fail early and catastrophically so
that we find out about the bug as soon as possible.  That trades off
against any runtime cost of detecting the case.  If it's cheap to detect,
then detect it.  If it's not so cheap, then don't pay the cost because we
don't expect that we'll have the bug.  Using assert is a middle ground for
things that have enough cost that we don't just leave them in all the time,
but little enough that there's still any question about it.  I don't know
what the distribution of NDEBUG use is like across distributions.  If every
distribution builds production libc with NDEBUG then in practice assert
will not catch any real-world problems and it shouldn't really count as
runtime detection, because only people developing libc will ever see it.

If it's user code invoking undefined behavior, then it should fail early
and catastrophically so that developers don't get the false impression that
their code is OK when it happens not to break the use cases they test
adequately.  (Said another way, so that we avoid giving developers an
excuse to complain when a future implementation change "breaks" their
programs that were always broken, but theretofore ignorably so.)  That too
trades off against any runtime cost of detecting the case.  I'd say the
allowance for cost of detection is marginally higher than in the first
case, because we expect user bugs to be more common that libc bugs.  But
it's still not much, since correct programs performing better is more
important to us than buggy programs being easier to debug.

Those are generic principles.  There's another kind of case I don't think
you mentioned, that is especially apropros for the futex operations.  That
is unexpected results from the kernel.  That could of course just be a libc
bug that causes its expectations to be wrong.  But it could also be a
kernel bug, or a new compatibility problem (e.g. some system call starts
returning new error codes in a new kernel version that weren't possible
when the libc code was written, built, and tested).  For those I'm not sure
there is any general rule that will really help.  It might just require
careful consideration case by case for what is the wisest form of
future-proofing.  Sometimes, propagating whatever error the kernel gave
back to the user is clearly the best thing to do.  But there might also be
situations where an unexpected result means that libc has become confused
about what state the kernel left things in, and crashing would be better.
And finally, there might well be instances of kernel bugs that we could
adequately recognize and work around.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-10-21 22:34   ` Roland McGrath
@ 2014-10-23  2:37     ` Carlos O'Donell
  2014-10-23 17:48       ` Roland McGrath
  0 siblings, 1 reply; 35+ messages in thread
From: Carlos O'Donell @ 2014-10-23  2:37 UTC (permalink / raw)
  To: Roland McGrath, Torvald Riegel; +Cc: GLIBC Devel, Darren Hart

On 10/21/2014 06:34 PM, Roland McGrath wrote:
> The high-level comment is that we have always favored having actual bugs
> cause quick and complete failure.

Agreed.
 
> If it's a bug in libc, then it should fail early and catastrophically so
> that we find out about the bug as soon as possible.  That trades off
> against any runtime cost of detecting the case.  If it's cheap to detect,
> then detect it.  If it's not so cheap, then don't pay the cost because we
> don't expect that we'll have the bug.  Using assert is a middle ground for
> things that have enough cost that we don't just leave them in all the time,
> but little enough that there's still any question about it.  I don't know
> what the distribution of NDEBUG use is like across distributions.  If every
> distribution builds production libc with NDEBUG then in practice assert
> will not catch any real-world problems and it shouldn't really count as
> runtime detection, because only people developing libc will ever see it.

RHEL builds with -DNDEBUG. I expect every distribution does the same.	

> If it's user code invoking undefined behavior, then it should fail early
> and catastrophically so that developers don't get the false impression that
> their code is OK when it happens not to break the use cases they test
> adequately.  (Said another way, so that we avoid giving developers an
> excuse to complain when a future implementation change "breaks" their
> programs that were always broken, but theretofore ignorably so.)  That too
> trades off against any runtime cost of detecting the case.  I'd say the
> allowance for cost of detection is marginally higher than in the first
> case, because we expect user bugs to be more common that libc bugs.  But
> it's still not much, since correct programs performing better is more
> important to us than buggy programs being easier to debug.

Agreed. Apparently the kernel detection of user bugs is very low cost,
the kernel has a broader view of all the locks than glibc does. Thus error
returns from glibc should IMO become immediate catastrophic failures if
the error indicates undefined behaviour. If the situation is in any way
recoverable, we must return the error code to the user.

> Those are generic principles.  There's another kind of case I don't think
> you mentioned, that is especially apropros for the futex operations.  That
> is unexpected results from the kernel.  That could of course just be a libc
> bug that causes its expectations to be wrong.  But it could also be a
> kernel bug, or a new compatibility problem (e.g. some system call starts
> returning new error codes in a new kernel version that weren't possible
> when the libc code was written, built, and tested).  For those I'm not sure
> there is any general rule that will really help.  It might just require
> careful consideration case by case for what is the wisest form of
> future-proofing.  Sometimes, propagating whatever error the kernel gave
> back to the user is clearly the best thing to do.  But there might also be
> situations where an unexpected result means that libc has become confused
> about what state the kernel left things in, and crashing would be better.
> And finally, there might well be instances of kernel bugs that we could
> adequately recognize and work around.

I would prefer that unexpected error codes cause glibc to crash. This
immediately alerts kernel and glibc developers of a mismatch in their
expectations before this ever gets out of experimental distributions.

That's just my preference.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-10-23  2:37     ` Carlos O'Donell
@ 2014-10-23 17:48       ` Roland McGrath
  0 siblings, 0 replies; 35+ messages in thread
From: Roland McGrath @ 2014-10-23 17:48 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Torvald Riegel, GLIBC Devel, Darren Hart

> I would prefer that unexpected error codes cause glibc to crash. This
> immediately alerts kernel and glibc developers of a mismatch in their
> expectations before this ever gets out of experimental distributions.

That's my inclination as well.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-10-20 16:15 ` Futex error handling Torvald Riegel
  2014-10-21 22:34   ` Roland McGrath
@ 2014-10-23 19:35   ` Carlos O'Donell
  2014-10-23 20:47     ` Rich Felker
  1 sibling, 1 reply; 35+ messages in thread
From: Carlos O'Donell @ 2014-10-23 19:35 UTC (permalink / raw)
  To: Torvald Riegel, GLIBC Devel; +Cc: Darren Hart

On 10/20/2014 12:14 PM, Torvald Riegel wrote:
> On Tue, 2014-09-16 at 17:36 +0200, Torvald Riegel wrote:
>> We got complains from the kernel side that glibc wouldn't react properly
>> to futex errors being returned.  Thus, I'm looking at what we'd need to
>> actually improve.  I'm using this here as a documentation for futex
>> error codes: https://lkml.org/lkml/2014/5/15/356
>>
>> Generally, we have three categories of faults (ie, the cause for an
>> error/failure):
>> * Bug in glibc ("BL")
>> * Bug in the client program ("BP")
>> * Failures that are neither a bug in glibc nor the program ("F")
>>
>> Also, there are cases where it's not a "real" failure, but just
>> something that is expected behavior that needs to be handled ("NF").
>>
>> I'm not aware of a general policy about whether glibc should abort or
>> assert (ie, abort only with assertion checks enabled) when the fault is
>> in the BL or BP categories.  I'd say we don't, because there's no way to
>> handle it anyway, and other things will likely go wrong; but I don't
>> have a strong opinion.  Thoughts?
>>
>> For every futex op, here's a list of how I'd categorize the possible
>> error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
>> BL)):
>>
>> FUTEX_WAIT:
>> * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
>> earlier when we accessed the futex variable.

Abort.

>> * EINVAL (alignment and timeout normalization) is BL/BP.

Abort.

>> * EWOULDBLOCK, ETIMEDOUT are NF.

Don't know.

WARNING: Aborting on EWOULDBLOCK is dangerous since we might just
wish to deadlock the single thread, so a developer can attach with
the debugger to inspect thread state while other threads keep running.

>> FUTEX_WAKE, FUTEX_WAKE_OP:
>> * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
>> case.  This is due to how futexes work when combined with certain rules
>> for destruction of the underlying synchronization data structure; see my
>> description of the mutex destruction issue (but this can happen with
>> other data structures such as semaphores or cond vars too):
>> https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html

Don't know.

>> * EINVAL (futex alignment) is BL/BP.

Abort.

>> * EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
>> NF.  The latter is caused by the mutex destruction issue, only that a
>> pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
>> location but one which has been reused for a PI futex.  Thus, we must
>> not abort or assert in this case.

Don't know.

>>
>> FUTEX_REQUEUE:
>> * Like FUTEX_WAKE, except that it's not safe to use concurrently with
>> possible destruction / reuse of the futex memory (because requeueing to
>> a futex that's unrelated to the new futex located in reused memory is
>> bad).

Are there any error codes for FUTEX_REQUEUE or are you saying it's
identical to FUTEX_WAKE?

>> FUTEX_REQUEUE_CMP:
>> * Like FUTEX_REQUEUE.  EAGAIN is NF.

Is EGAIN the only one to consider?

Yes, EAGAIN needs some kind of special handling, but I don't know what.

>> FUTEX_WAKE_OP:
>> * Haven't looked at this yet.  Only used in condvars, and might not be
>> necessary for a condvar that's not based on a condvar-internal lock.
>>
>> FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET:
>> * Like FUTEX_WAIT / FUTEX_WAKE.  The additional EINVAL is BL.

Abort.

>> FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI:
>> * EFAULT is BL/BP.

Abort.

>> * ENOMEM is F.  We need to handle this.

Yes, pass it along.

>> * EINVAL, EPERM, ESRCH are BL/BP.

Abort.

>> * EAGAIN and ETIMEDOUT are NF.

Don't know.

>> * EDEADLOCK is BP (or BL).

See WARNING above.

>> * EOWNERDIED is F.

Don't know.

>> FUTEX_UNLOCK_PI:
>> (* I guess this can return EFAULT too, which is BL/BP.)
>> * EINVAL and EPERM are BL/BP.  I don't think there's a mutex destruction
>> issue with PI locks because the kernel takes care of both resetting the
>> value of the futex var and waking up threads; it should do so in a way
>> that won't access reused memory.  I guess we should check that though...

Abort.

>> FUTEX_WAIT_REQUEUE_PI:
>> * EFAULT and EINVAL are BL/BP.

Abort.

>> * EWOULDBLOCK and ETIMEDOUT are NF.

Don't know.

>> * EOWNERDIED is F.

Don't know. How did we get here?

>> FUTEX_CMP_REQUEUE_PI is like FUTEX_CMP_REQUEUE except:
>> * ENOMEM is F.

Pass it along.

>> * EPERM and ESRCH are BL/BP.

Abort.

>> * EDEADLOCK is BP (or BL).

See WARNING above.

>>
>>
>> I think the next steps to improve this should be:
>> 1) Getting consensus on how we want to handle BL and BP in general.
> 
> Ping.  I think that's a fairly generic issue (in the sense of likely
> being a question for other things than just mutexes too), so I'd like
> some input on the direction.
> 
> Carlos, Joseph, Roland:  Do you have any comments?

Roland already responded, but I'll reiterate with wiki updates:

(1) Bugs in the GNU C library should fail early and visibly, balanced by
runtime detection cost.

(2) Bugs in the user program should fail early and visibly, balanced by
runtime detection cost in the GNU C library.

(3) Unknown error codes from the kernel should cause immediate failure.

https://sourceware.org/glibc/wiki/Style_and_Conventions#Error_Handling

>> 2) Applying the outcome of that to the list above and getting consensus
>> on the result.
>> 3) For each case of F, find the best way to report it to the caller
>> (e.g., error code from the pthreads function, abort, ...).
>> 4) Change each use of the futexes accordingly, one at a time.
>>
>> I've asked Michael Kerrisk for the state of the futex error docs, but
>> haven't gotten a reply yet.  (Last time I checked, the new input from
>> the email I referred to above wasn't part of the futex docs yet.)
> 
> Michael has replied, and it's on his list of things to do.

Hopefully that helps, maybe it doesn't, I answered the easy ones, and
perhaps the first step is to fix the easy ones first :-)

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Futex error handling
  2014-10-23 19:35   ` Carlos O'Donell
@ 2014-10-23 20:47     ` Rich Felker
  0 siblings, 0 replies; 35+ messages in thread
From: Rich Felker @ 2014-10-23 20:47 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Torvald Riegel, GLIBC Devel, Darren Hart

On Thu, Oct 23, 2014 at 03:34:46PM -0400, Carlos O'Donell wrote:
> On 10/20/2014 12:14 PM, Torvald Riegel wrote:
> > On Tue, 2014-09-16 at 17:36 +0200, Torvald Riegel wrote:
> >> We got complains from the kernel side that glibc wouldn't react properly
> >> to futex errors being returned.  Thus, I'm looking at what we'd need to
> >> actually improve.  I'm using this here as a documentation for futex
> >> error codes: https://lkml.org/lkml/2014/5/15/356
> >>
> >> Generally, we have three categories of faults (ie, the cause for an
> >> error/failure):
> >> * Bug in glibc ("BL")
> >> * Bug in the client program ("BP")
> >> * Failures that are neither a bug in glibc nor the program ("F")
> >>
> >> Also, there are cases where it's not a "real" failure, but just
> >> something that is expected behavior that needs to be handled ("NF").
> >>
> >> I'm not aware of a general policy about whether glibc should abort or
> >> assert (ie, abort only with assertion checks enabled) when the fault is
> >> in the BL or BP categories.  I'd say we don't, because there's no way to
> >> handle it anyway, and other things will likely go wrong; but I don't
> >> have a strong opinion.  Thoughts?
> >>
> >> For every futex op, here's a list of how I'd categorize the possible
> >> error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
> >> BL)):
> >>
> >> FUTEX_WAIT:
> >> * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
> >> earlier when we accessed the futex variable.
> 
> Abort.

Yes.

> >> * EINVAL (alignment and timeout normalization) is BL/BP.
> 
> Abort.

For alignment, yes. For timeout normalization, no, this is a specified
error that must be passed back to the application. Of course if glibc
has already checked it in userspace, then further EINVAL from the
kernel should not be possible and aborting is in principle valid.

> >> * EWOULDBLOCK, ETIMEDOUT are NF.
> 
> Don't know.
> 
> WARNING: Aborting on EWOULDBLOCK is dangerous since we might just
> wish to deadlock the single thread, so a developer can attach with
> the debugger to inspect thread state while other threads keep running.

NF is never a case you would abort on. It means a condition that
occurs regularly in a valid program.

> >> FUTEX_WAKE, FUTEX_WAKE_OP:
> >> * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> >> case.  This is due to how futexes work when combined with certain rules
> >> for destruction of the underlying synchronization data structure; see my
> >> description of the mutex destruction issue (but this can happen with
> >> other data structures such as semaphores or cond vars too):
> >> https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> 
> Don't know.

This is complex, but unless glibc is going to take the effort (and
likely, cost) to prevent this mostly-harmless race, this case for
FUTEX_WAKE must be considered as NF. For FUTEX_WAKE_OP, EFAULT is
BL/BP if the fault is on the addr OP is applied to.

> >> * EINVAL (futex alignment) is BL/BP.
> 
> Abort.

Yes.

> >> * EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
> >> NF.  The latter is caused by the mutex destruction issue, only that a
> >> pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
> >> location but one which has been reused for a PI futex.  Thus, we must
> >> not abort or assert in this case.
> 
> Don't know.

I don't know much of anything about PI implementation details, so same
here -- don't know.

> >> FUTEX_REQUEUE:
> >> * Like FUTEX_WAKE, except that it's not safe to use concurrently with
> >> possible destruction / reuse of the futex memory (because requeueing to
> >> a futex that's unrelated to the new futex located in reused memory is
> >> bad).
> 
> Are there any error codes for FUTEX_REQUEUE or are you saying it's
> identical to FUTEX_WAKE?

I think some of the NF cases with FUTEX_WAKE are BL/BP for FUTEX_REQUEUE.

> >> FUTEX_REQUEUE_CMP:
> >> * Like FUTEX_REQUEUE.  EAGAIN is NF.
> 
> Is EGAIN the only one to consider?
> 
> Yes, EAGAIN needs some kind of special handling, but I don't know what.

It's the only additional case beyond FUTEX_REQUEUE.

> >> FUTEX_WAKE_OP:
> >> * Haven't looked at this yet.  Only used in condvars, and might not be
> >> necessary for a condvar that's not based on a condvar-internal lock.
> >>
> >> FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET:
> >> * Like FUTEX_WAIT / FUTEX_WAKE.  The additional EINVAL is BL.
> 
> Abort.
> 
> >> FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI:
> >> * EFAULT is BL/BP.
> 
> Abort.

I think these are right.

> >> * ENOMEM is F.  We need to handle this.
> 
> Yes, pass it along.

Passing it along is probably not desirable; applications are not going
to be prepared to deal with ENOMEM from pthread_mutex_lock. But I'm
not sure what would be preferable.

> >> * EINVAL, EPERM, ESRCH are BL/BP.
> 
> Abort.

Are you sure ESRCH can't happen as a race with robust mutex owner
death?

> >> * EAGAIN and ETIMEDOUT are NF.
> 
> Don't know.

NF is NF.

> >> * EDEADLOCK is BP (or BL).
> 
> See WARNING above.

I don't see how it's related.

> >> * EOWNERDIED is F.
> 
> Don't know.

I would think it's NF..?

> >> FUTEX_UNLOCK_PI:
> >> (* I guess this can return EFAULT too, which is BL/BP.)
> >> * EINVAL and EPERM are BL/BP.  I don't think there's a mutex destruction
> >> issue with PI locks because the kernel takes care of both resetting the
> >> value of the futex var and waking up threads; it should do so in a way
> >> that won't access reused memory.  I guess we should check that though...
> 
> Abort.

I think this is correct but it should be checked.

> >> FUTEX_WAIT_REQUEUE_PI:
> >> * EFAULT and EINVAL are BL/BP.
> 
> Abort.
> 
> >> * EWOULDBLOCK and ETIMEDOUT are NF.
> 
> Don't know.

I think these look right.

> >> * EOWNERDIED is F.
> 
> Don't know. How did we get here?

Normal robust mutex usage.

> >> FUTEX_CMP_REQUEUE_PI is like FUTEX_CMP_REQUEUE except:
> >> * ENOMEM is F.
> 
> Pass it along.

How would this be passed along? Requeue is used internally in cond
vars. If requeue fails, it needs to be replaced with a simple wake
where the target thread will be responsible for attempting to wait on
the mutex (and possibly failing there, in which case it takes
responsibility for what to do).

> >> * EPERM and ESRCH are BL/BP.
> 
> Abort.

Not sure. Probably right.

> >> * EDEADLOCK is BP (or BL).
> 
> See WARNING above.

Again, unclear on this.

> >> I think the next steps to improve this should be:
> >> 1) Getting consensus on how we want to handle BL and BP in general.
> > 
> > Ping.  I think that's a fairly generic issue (in the sense of likely
> > being a question for other things than just mutexes too), so I'd like
> > some input on the direction.
> > 
> > Carlos, Joseph, Roland:  Do you have any comments?
> 
> Roland already responded, but I'll reiterate with wiki updates:
> 
> (1) Bugs in the GNU C library should fail early and visibly, balanced by
> runtime detection cost.
> 
> (2) Bugs in the user program should fail early and visibly, balanced by
> runtime detection cost in the GNU C library.
> 
> (3) Unknown error codes from the kernel should cause immediate failure.
> 
> https://sourceware.org/glibc/wiki/Style_and_Conventions#Error_Handling

I agree with these principles.

> >> 2) Applying the outcome of that to the list above and getting consensus
> >> on the result.
> >> 3) For each case of F, find the best way to report it to the caller
> >> (e.g., error code from the pthreads function, abort, ...).
> >> 4) Change each use of the futexes accordingly, one at a time.
> >>
> >> I've asked Michael Kerrisk for the state of the futex error docs, but
> >> haven't gotten a reply yet.  (Last time I checked, the new input from
> >> the email I referred to above wasn't part of the futex docs yet.)
> > 
> > Michael has replied, and it's on his list of things to do.
> 
> Hopefully that helps, maybe it doesn't, I answered the easy ones, and
> perhaps the first step is to fix the easy ones first :-)

:)

Rich

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-10-30 17:09 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-16 15:36 Futex error handling Torvald Riegel
2014-09-16 16:56 ` Rich Felker
2014-09-16 18:13   ` Torvald Riegel
2014-09-16 18:55     ` Rich Felker
2014-09-18 12:27       ` Torvald Riegel
2014-09-18 17:40         ` Rich Felker
2014-09-18 17:58           ` Torvald Riegel
2014-09-18 20:41             ` Rich Felker
2014-10-30  1:55               ` Darren Hart
2014-10-30  6:51                 ` Carlos O'Donell
2014-09-17 19:41 ` Roland McGrath
2014-09-17 19:46   ` Torvald Riegel
2014-09-17 19:59     ` Roland McGrath
2014-09-17 23:17       ` Rich Felker
2014-10-23 19:16         ` Add futex wrapper to glibc? Carlos O'Donell
2014-10-23 21:07           ` Joseph S. Myers
2014-10-24  2:00             ` Carlos O'Donell
2014-10-26  7:19             ` Mike Frysinger
2014-10-27 17:52               ` Joseph S. Myers
2014-10-27 20:28               ` Roland McGrath
2014-10-30  2:07               ` Darren Hart
2014-10-30  9:32                 ` Torvald Riegel
2014-10-30  2:03             ` Darren Hart
2014-10-23 21:12           ` Rich Felker
2014-10-30  1:59           ` Darren Hart
2014-10-30  2:44             ` Rich Felker
2014-10-30  2:55             ` Carlos O'Donell
2014-10-30  3:26               ` Rich Felker
2014-10-30 17:09               ` Joseph S. Myers
2014-10-20 16:15 ` Futex error handling Torvald Riegel
2014-10-21 22:34   ` Roland McGrath
2014-10-23  2:37     ` Carlos O'Donell
2014-10-23 17:48       ` Roland McGrath
2014-10-23 19:35   ` Carlos O'Donell
2014-10-23 20:47     ` Rich Felker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).