[Bug nptl/13065] New: Race condition in pthread barriers

public inbox for glibc-bugs@sourceware.org

* [Bug nptl/13065] New: Race condition in pthread barriers
From: bugdal at aerifal dot cx @ 2011-08-07 18:30 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13065

           Summary: Race condition in pthread barriers
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: bugdal@aerifal.cx

The glibc/NPTL implementation of pthread barriers has a race condition, whereby
threads exiting the barrier may access memory belonging to the barrier after
one or more of the pthread_barrier_wait calls has returned. At this point, per
POSIX, the barrier is supposed to be in "the state it had as a result of the
most recent pthread_barrier_init() function that referenced it." In particular,
it's valid to call pthread_barrier_destroy on the barrier then re-initialize it
with a new value, or to free/unmap the memory it's located in. The latter
operation would especially make sense for a process-shared barrier where the
caller is done using the barrier but other processes may continue to use it.

See the attachment for a proof-of-concept that causes NPTL's
pthread_barrier_wait to crash. This usage is not quite "correct" (the barrier
should be destroyed before unmapping the memory, and only one thread should
destroy it) but these issues could easily be fixed by throwing in a mutex. I've
just made the code as simple as possible to demonstrate the bug.

Michael Burr proposed a solution
(http://stackoverflow.com/questions/5886614/how-can-barriers-be-destroyable-as-soon-as-pthread-barrier-wait-returns/5902671#5902671)
to this problem, which we have successfully incorporated into musl:

http://git.etalabs.net/cgi-bin/gitweb.cgi?p=musl;a=commitdiff;h=f16a3089be33a75ef8e75b2dd5ec3095996bbb87;hp=202911435b56fe007ca62fc6e573fa3ea238d337

However it only works for non-process-shared barriers, as it requires all
waiters to be able to access the first waiter's address space. I am not aware
of any fix for process-shared barriers that does not involve allocating shared
resources at pthread_barrier_wait time, which could of course fail and leave
the caller with no way to recover... I suspect fixing this robustly may require
adding a FUTEX_BARRIER operation to the kernel that does not return success
until "val" processes have all called FUTEX_BARRIER on the same futex address.
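
A rough sketch of the non-process-shared approach, assuming plain mutexes and
condition variables rather than futexes. This is not the code from the musl
commit; struct sbarrier, struct instance, sbarrier_wait and sbarrier_demo are
hypothetical names. The per-round state lives on the first arriver's stack, so
a released waiter never touches the barrier object again. It also shows why
this cannot work across processes: every waiter dereferences a pointer into
the first waiter's stack. The sketch further assumes that a mutex may be
destroyed as soon as its last user has unlocked it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-round state. It lives on the first arriver's stack, not in the barrier. */
struct instance {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned arrived;   /* guarded by the barrier's lock */
    unsigned departed;  /* guarded by lock: non-owners done with this instance */
    int released;       /* guarded by lock: set by the last arriver */
};

struct sbarrier {
    pthread_mutex_t lock;
    unsigned limit;         /* waiters per round, at least 1 */
    struct instance *inst;  /* NULL between rounds */
};

/* Returns 1 in the last arriver of each round and 0 in every other waiter. */
int sbarrier_wait(struct sbarrier *b)
{
    struct instance local, *in;
    int owner = 0, last;

    pthread_mutex_lock(&b->lock);
    unsigned limit = b->limit;
    assert(limit >= 1);
    in = b->inst;
    if (!in) {
        /* First arriver: publish an instance stored on this stack. */
        pthread_mutex_init(&local.lock, NULL);
        pthread_cond_init(&local.cond, NULL);
        local.arrived = local.departed = 0;
        local.released = 0;
        b->inst = in = &local;
        owner = 1;
    }
    last = (++in->arrived == limit);
    if (last)
        b->inst = NULL;              /* the barrier is reset for reuse */
    pthread_mutex_unlock(&b->lock);  /* this call's final access to *b */

    pthread_mutex_lock(&in->lock);
    if (last) {
        in->released = 1;
        pthread_cond_broadcast(&in->cond);
    }
    while (!in->released)
        pthread_cond_wait(&in->cond, &in->lock);
    if (owner) {
        /* The stack slot must outlive every other waiter's use of it. */
        while (in->departed != limit - 1)
            pthread_cond_wait(&in->cond, &in->lock);
        pthread_mutex_unlock(&in->lock);
        pthread_cond_destroy(&in->cond);
        pthread_mutex_destroy(&in->lock);
    } else {
        in->departed++;
        pthread_cond_broadcast(&in->cond);
        pthread_mutex_unlock(&in->lock);
    }
    return last;
}

/* Demo worker: the last arriver frees the barrier as soon as wait returns. */
static void *demo_worker(void *arg)
{
    struct sbarrier *b = arg;
    if (!sbarrier_wait(b))
        return NULL;
    pthread_mutex_destroy(&b->lock);
    free(b);
    return (void *)(intptr_t)1;
}

/* Runs one round with n threads; returns how many saw the value 1, or -1. */
int sbarrier_demo(unsigned n)
{
    pthread_t t[16];
    struct sbarrier *b = malloc(sizeof *b);
    int ones = 0;
    if (!b || n == 0 || n > 16) { free(b); return -1; }
    b->limit = n;
    b->inst = NULL;
    pthread_mutex_init(&b->lock, NULL);
    for (unsigned i = 0; i < n; i++)
        if (pthread_create(&t[i], NULL, demo_worker, b) != 0)
            abort();  /* a missing waiter would hang the round */
    for (unsigned i = 0; i < n; i++) {
        void *ret;
        pthread_join(t[i], &ret);
        ones += (ret != NULL);
    }
    return ones;
}
```

The cost of this design is that the first arriver cannot return until every
other waiter has left the instance, since its stack frame holds the state.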

Note that, unfortunately, process-shared barriers are the area where this bug
has the greatest chance of hitting real-world applications, since a process is
likely to unmap the barrier soon after it's done using it.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


* [Bug nptl/13065] Race condition in pthread barriers
From: bugdal at aerifal dot cx @ 2011-09-25 16:14 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13065

--- Comment #1 from Rich Felker <bugdal at aerifal dot cx> 2011-09-25 16:13:44 UTC ---
Created attachment 5944
  --> http://sourceware.org/bugzilla/attachment.cgi?id=5944
simple demonstration of the bug

This is the attachment that was supposed to be included in the original bug
report.


* [Bug nptl/13065] Race condition in pthread barriers
From: bugdal at aerifal dot cx @ 2012-04-29  3:04 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13065

Rich Felker <bugdal at aerifal dot cx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|drepper.fsp at gmail dot    |unassigned at sourceware
                   |com                         |dot org


* [Bug nptl/13065] Race condition in pthread barriers
From: triegel at redhat dot com @ 2013-12-20 18:10 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=13065

Torvald Riegel <triegel at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |triegel at redhat dot com

--- Comment #2 from Torvald Riegel <triegel at redhat dot com> ---
This is conceptually related to Bug 13690, whose resolution depends on the
outcome of a POSIX request for clarification.  The same kind of wording that
needs to be clarified for that bug is not present in the barrier specification,
but it's essentially the same question of when POSIX synchronization objects
can be safely destroyed.  Therefore, I think it's good to wait for a result of
the clarification request.

Furthermore, destruction is somewhat more complex due to the standard leaving
it unspecified whether a thread that processes a signal will keep other waiters
blocked.  That is, if at least one of the other threads blocked on the barrier
may process signals, then one cannot destroy a barrier after returning from a
call to pthread_barrier_wait().


* [Bug nptl/13065] Race condition in pthread barriers
From: bugdal at aerifal dot cx @ 2013-12-20 18:39 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=13065

--- Comment #3 from Rich Felker <bugdal at aerifal dot cx> ---
I don't think signals make it any more complicated. Implementation-wise, there
are two possibilities:

1. A waiter stuck in a signal handler blocks other waiters until it returns. In
this case, no waiter returns while the one waiter is still in the signal
handler, and there's no special issue to deal with.

2. A waiter stuck in a signal handler allows other waiters to proceed once the
last waiter has arrived. The ONLY way to implement this is to have some
resource identifying the barrier instance (keep in mind: as soon as any waiter
returns from the wait, the barrier is ready for reuse as a new instance) whose
lifetime persists until the signal handler returns. In order to avoid requiring
dynamic resource allocation for each barrier instance (which could fail,
rendering barriers unsafe for any actual synchronization usage) the resource
must essentially have its storage associated with the threads involved in the
barrier instance (e.g. on their stacks, TLS, kernel task structures, etc.). In
such an implementation, the thread stuck in the signal handler needs to be
finished working with the barrier resource itself (since it could be reused for
a new barrier instance, or destroyed) and must perform its waiting based on the
instance resource associated with the waiting threads.

In case it's not clear, what I'm arguing is not in regards to what the standard
says about barriers and self-synchronized destruction. My argument is that, in
either case, there's no additional barrier-specific difficulty to supporting
self-synchronized destruction. The other requirements of making the barrier
implementation correct already put you in a good position for supporting
self-synchronized destruction where it's no more difficult than for mutexes or
semaphores.

BTW, since a thread's status as being a waiter on a barrier is not a testable
condition (i.e. there's no way to measure whether it's waiting on the barrier
versus suspended awaiting scheduling just prior to waiting on the barrier) the
standard has no choice but to allow the option where signal handlers block
forward progress of other waiters. Allowing the other option, however, does
create the possibility of observable behavior; if an implementation takes this
option, you may observe other threads exiting the barrier wait while the signal
handler is still running.


* [Bug nptl/13065] Race condition in pthread barriers
From: fweimer at redhat dot com @ 2014-06-27 12:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=13065

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
              Flags|                            |security-
