[Bug nptl/14485] New: File corruption race condition in robust mutex unlocking

public inbox for glibc-bugs@sourceware.org

* [Bug nptl/14485] New: File corruption race condition in robust mutex unlocking
From: bugdal at aerifal dot cx @ 2012-08-17 18:52 UTC
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14485

             Bug #: 14485
           Summary: File corruption race condition in robust mutex
                    unlocking
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: unassigned@sourceware.org
        ReportedBy: bugdal@aerifal.cx
                CC: drepper.fsp@gmail.com
    Classification: Unclassified

The general procedure for unlocking a robust mutex is:

1. Put the mutex address in the "pending" slot of the thread's robust mutex
list.
2. Remove the mutex from the thread's linked list of locked robust mutexes.
3. Low level unlock (clear the futex and possibly wake waiters).
4. Clear the "pending" slot in the thread's robust mutex list.
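The four steps above can be modeled in a single thread to show where the window sits. This is a minimal sketch. The types and functions (`model_mutex`, `model_robust_head`, `model_robust_unlock`, `record_window`) are hypothetical and only mimic the shape of the per-thread robust list. They are not glibc's actual code or data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one thread's robust-mutex bookkeeping.
   It follows the shape of the robust list (held mutexes plus a "pending"
   slot) but is NOT glibc's or the kernel's real layout. */
struct model_mutex {
    uint32_t futex;              /* 0 = unlocked, otherwise the owner's tid */
    struct model_mutex *next;    /* link in the owner's list of held mutexes */
};

struct model_robust_head {
    struct model_mutex *list;    /* robust mutexes currently held */
    struct model_mutex *pending; /* mutex being locked or unlocked, if any */
};

/* Optional callback run between steps 3 and 4, i.e. inside the window. */
typedef void (*window_hook)(struct model_robust_head *, struct model_mutex *);

static void model_robust_unlock(struct model_robust_head *h,
                                struct model_mutex *m, window_hook hook)
{
    assert(m->futex != 0);       /* caller must currently own m */

    /* Step 1: record m in the pending slot. */
    h->pending = m;

    /* Step 2: unlink m from the list of held robust mutexes. */
    for (struct model_mutex **pp = &h->list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == m) {
            *pp = m->next;
            break;
        }
    }
    m->next = NULL;

    /* Step 3: low-level unlock. Real code uses an atomic exchange and may
       issue FUTEX_WAKE; a plain store suffices in this single-threaded model. */
    m->futex = 0;

    /* Window: m is unlocked, yet h->pending still holds its address. */
    if (hook != NULL)
        hook(h, m);

    /* Step 4: clear the pending slot. */
    h->pending = NULL;
}

/* Example hook that records what the window looks like from outside. */
static struct model_mutex *seen_pending;
static uint32_t seen_futex;

static void record_window(struct model_robust_head *h, struct model_mutex *m)
{
    seen_pending = h->pending;
    seen_futex = m->futex;
}
```

Suppose a held mutex `m` is on the list. Calling `model_robust_unlock(&h, &m, record_window)` leaves `seen_pending == &m` and `seen_futex == 0`. Inside the window the mutex is already free for another thread to take, yet the pending slot still names its address.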

Suppose between steps 3 and 4, another thread in the same process obtains the
mutex in such a way that it is necessarily the last user of the mutex, then
unlocks, destroys, and frees it. It then calls mmap with MAP_SHARED on a file,
device, or shared memory segment, which happens to be assigned the same address
the robust mutex had, and the file contents at the offset where the futex was
located happen to contain the tid of the first thread that was in between steps
3 and 4 above. Now, suppose the process is immediately killed. The kernel then
sets bit 30 (owner died) at this offset in the mapped file, wrongly trusting
that the pending field in the robust list header still points to a valid robust
mutex.

As far as I can tell, the ONLY solution to this problem is to introduce a
global (within the process) lock on mmap and munmap, and to hold it between
steps 2 and 4 of the robust mutex unlock procedure. The same lock can also be
used to fix bug #13064. To minimize cost, this lock should be a rwlock where
mmap and munmap count as "read" operations (so they don't block one another)
and only the dangerous robust mutex unlock and barrier operations count as
"write" operations.

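The proposed lock can be sketched roughly as follows. The sketch assumes the implementation routes its own mmap and munmap calls through wrappers. `map_lock`, `guarded_mmap`, `guarded_munmap` and `guarded_robust_unlock` are hypothetical names. The futex word stands in for the whole mutex, and step 2 and the futex wake appear only as comments.

```c
#define _GNU_SOURCE              /* MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical process-wide lock: mmap/munmap take it shared ("read") so
   they never block one another; the unlock window takes it exclusive. */
static pthread_rwlock_t map_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *guarded_mmap(void *addr, size_t len, int prot, int flags,
                          int fd, off_t off)
{
    int rc = pthread_rwlock_rdlock(&map_lock);
    assert(rc == 0);
    (void)rc;
    void *p = mmap(addr, len, prot, flags, fd, off);
    pthread_rwlock_unlock(&map_lock);
    return p;
}

static int guarded_munmap(void *addr, size_t len)
{
    int rc = pthread_rwlock_rdlock(&map_lock);
    assert(rc == 0);
    (void)rc;
    int r = munmap(addr, len);
    pthread_rwlock_unlock(&map_lock);
    return r;
}

/* Hypothetical robust unlock: the write lock spans steps 2-4, so no thread
   can unmap and remap the mutex's address while the pending slot still
   points at it. */
static void guarded_robust_unlock(void **pending_slot, uint32_t *futex_word)
{
    *pending_slot = futex_word;            /* step 1 */
    int rc = pthread_rwlock_wrlock(&map_lock);
    assert(rc == 0);
    (void)rc;
    /* step 2: unlink from the thread's robust list (omitted here) */
    *futex_word = 0;                       /* step 3 (real code: atomic + wake) */
    *pending_slot = NULL;                  /* step 4 */
    pthread_rwlock_unlock(&map_lock);
}
```

mmap and munmap take the lock only in shared mode, so they never serialize against each other. Only the short unlock window is exclusive. Mappings made outside the wrappers are not ordered against that window.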
-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: bugdal at aerifal dot cx @ 2012-08-17 22:34 UTC
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14485

--- Comment #1 from Rich Felker <bugdal at aerifal dot cx> 2012-08-17 22:34:24 UTC ---
It seems this bug has been known (but not reported as a bug) since 2010 or
earlier:

http://lists.freebsd.org/pipermail/svn-src-user/2010-November/003668.html

Keep in mind that the thread I'm linking raises some other complaints about NPTL's
robust mutexes that are orthogonal to this bug report, such as the fact that
you can maliciously mess up other processes that map the same mutex you have
access to. These other complaints are perhaps QoI issues, but not major bugs;
an application has no basis to assume it's safe to let untrusted processes map
its synchronization objects.

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: fweimer at redhat dot com @ 2014-06-17 18:35 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: fweimer at redhat dot com @ 2014-06-25 10:47 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

--- Comment #2 from Florian Weimer <fweimer at redhat dot com> ---
What causes the corruption? Can you really unmap a page which is in use in a
futex system call? Do we have a test case?

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: bugdal at aerifal dot cx @ 2014-06-25 15:47 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

--- Comment #3 from Rich Felker <bugdal at aerifal dot cx> ---
The corruption is performed by the kernel when it walks the robust list. The
basic situation is the same as in PR #13690, except that here there's actually
a potential write to the memory rather than just a read.

The sequence of events leading to corruption goes like this:

1. Thread A unlocks the process-shared, robust mutex and is preempted after the
mutex is removed from the robust list and atomically unlocked, but before it's
removed from the list_op_pending field of the robust list header.

2. Thread B locks the mutex, and, knowing by program logic that it's the last
user of the mutex, unlocks and unmaps it, allocates/maps something else that
gets assigned the same address as the shared mutex mapping, and then exits.

3. The kernel destroys the process, which involves walking each thread's robust
list and processing each thread's list_op_pending field of the robust list
header. Since thread A has a list_op_pending pointing at the address previously
occupied by the mutex, the kernel obliviously "unlocks the mutex" by writing a
0 to the address and futex-waking it. However, the kernel has instead
overwritten part of whatever mapping thread B created. If this is private
memory it (probably) doesn't matter since the process is ending anyway (but are
there race conditions where this can be seen?). If this is shared memory or a
shared file mapping, however, the kernel corrupts it.

I suspect the race is difficult to hit since thread A has to get preempted at
exactly the wrong time AND thread B has to do a fair amount of work without
thread A getting scheduled again. So I'm not sure how much luck we'd have
getting a test case.
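The real race is hard to hit, but its data flow can be replayed deterministically in one thread. This is a sketch of what a test would need to observe, not a reproduction of the kernel's behaviour. `model_kernel_exit` is a hypothetical stand-in that applies the owner-died rule described in the original report. `MAP_FIXED` forces the address reuse that the real race leaves to chance, and a `tmpfile()` plays the role of the shared file.

```c
#define _GNU_SOURCE              /* MAP_ANONYMOUS, MAP_FIXED */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Futex word bits, values as in <linux/futex.h>. */
#define MODEL_FUTEX_WAITERS    0x80000000u
#define MODEL_FUTEX_OWNER_DIED 0x40000000u
#define MODEL_FUTEX_TID_MASK   0x3fffffffu

/* Hypothetical stand-in for the kernel processing list_op_pending at exit. */
static void model_kernel_exit(uint32_t *pending, uint32_t dying_tid)
{
    if (pending != NULL && (*pending & MODEL_FUTEX_TID_MASK) == dying_tid)
        *pending = (*pending & MODEL_FUTEX_WAITERS) | MODEL_FUTEX_OWNER_DIED;
}

/* Replays steps 1-3 in order in ONE thread. Returns 0 and stores the file's
   first 32-bit word after step 3, or -1 on any setup failure. */
static int replay_interleaving(uint32_t tid_a, uint32_t tid_b, uint32_t *out)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0)
        return -1;
    size_t len = (size_t)pg;

    /* The file thread B maps later; its first word happens to be A's tid. */
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0 ||
        pwrite(fd, &tid_a, sizeof tid_a, 0) != (ssize_t)sizeof tid_a) {
        fclose(f);
        return -1;
    }

    /* Shared mapping holding the robust mutex's futex word. */
    uint32_t *mtx = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mtx == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    /* Step 1: A sets pending, unlinks, unlocks, and is preempted. */
    uint32_t *a_pending = mtx;
    *mtx = 0;

    /* Step 2: B locks and unlocks as the last user, unmaps, then maps the
       file at the same address (forced here with MAP_FIXED). */
    *mtx = tid_b;
    *mtx = 0;
    munmap(mtx, len);
    void *reused = mmap(mtx, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_FIXED, fd, 0);
    if (reused == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    /* Step 3: the process dies; the "kernel" trusts A's stale pending slot. */
    model_kernel_exit(a_pending, tid_a);

    /* Read the file contents back through the descriptor, not the mapping. */
    uint32_t word = 0;
    ssize_t n = pread(fd, &word, sizeof word, 0);
    munmap(reused, len);
    fclose(f);
    if (n != (ssize_t)sizeof word)
        return -1;
    *out = word;
    return 0;
}
```

`replay_interleaving(4242, 5151, &w)` returns 0 with `w == 0x40000000`. The word in thread B's file originally held thread A's tid, and it has been rewritten through thread A's stale pending pointer.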

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: mail at nh2 dot me @ 2015-02-09  0:28 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

mail at nh2 dot me changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mail at nh2 dot me

--- Comment #4 from mail at nh2 dot me ---
@maintainers, do you acknowledge this as a bug?

I'd like to use robust mutexes in a shared memory setup, but I'm worried about
this case happening.

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: carlos at redhat dot com @ 2015-02-09 20:41 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #5 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Rich Felker from comment #3)
> 1. Thread A unlocks the process-shared, robust mutex and is preempted after
> the mutex is removed from the robust list and atomically unlocked, but
> before it's removed from the list_op_pending field of the robust list header.
> 
> 2. Thread B locks the mutex, and, knowing by program logic that it's the
> last user of the mutex, unlocks and unmaps it, allocates/maps something else
> that gets assigned the same address as the shared mutex mapping, and then
> exits.

Isn't this undefined behaviour? You have not specified how you established a
happens-after relationship between the destruction of the mutex by Thread B and
the last use by Thread A. In the description you give, it would seem to me that
Thread A is still not done, and that the "program logic" in Thread B is
destroying an in-use mutex, which results in undefined behaviour in Thread A.
Thread B fails to establish a happens-after relationship with Thread A's use of
the mutex. If Thread B truly establishes a happens-after relationship with the
unlock from Thread A, is there a problem? I don't think there is.

Did I get something wrong Rich?

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: carlos at redhat dot com @ 2015-02-09 21:13 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

--- Comment #6 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Carlos O'Donell from comment #5)

OK, I see what's wrong.

This issue is about self-synchronizing vs. not-self-synchronizing.

http://austingroupbugs.net/view.php?id=811

Given that 811 has been accepted, I withdraw my complaint.

Your example is valid, and we do have a problem.
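The self-synchronizing pattern under discussion can be sketched as a reference-counted object whose own robust mutex protects the count. The names are hypothetical. The thread that drops the last reference knows from program logic alone that nobody else will use the mutex. It therefore destroys and frees the mutex right after unlocking, even though other threads' `pthread_mutex_unlock` calls may not have fully returned. The sketch uses heap memory, so it shows only the usage pattern. The file corruption also requires the memory to be a shared mapping that another `mmap` later replaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical reference-counted object guarded by its own robust mutex. */
struct shared_obj {
    pthread_mutex_t lock;
    int refs;                     /* protected by lock */
};

static int destroyed_count;       /* demo bookkeeping, read after joins */

static struct shared_obj *obj_create(int refs)
{
    struct shared_obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    pthread_mutexattr_t a;
    pthread_mutexattr_init(&a);
    pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(&o->lock, &a);
    pthread_mutexattr_destroy(&a);
    if (rc != 0) {
        free(o);
        return NULL;
    }
    o->refs = refs;
    return o;
}

/* Drop one reference. Whoever sees the count reach zero destroys and frees
   the object immediately after its own unlock: the self-synchronizing case. */
static void obj_release(struct shared_obj *o)
{
    int rc = pthread_mutex_lock(&o->lock);
    assert(rc == 0 || rc == EOWNERDEAD);
    if (rc == EOWNERDEAD)         /* a holder died: mark state consistent */
        pthread_mutex_consistent(&o->lock);
    int last = (--o->refs == 0);
    pthread_mutex_unlock(&o->lock);
    if (last) {
        pthread_mutex_destroy(&o->lock);
        free(o);
        destroyed_count++;
    }
}

static void *worker(void *arg)
{
    obj_release(arg);
    return NULL;
}

/* Start n threads that each drop one reference; returns how many times the
   object was destroyed, or -1 on setup failure. */
static int run_release_demo(int n)
{
    pthread_t t[64];
    if (n < 1 || n > 64)
        return -1;
    struct shared_obj *o = obj_create(n);
    if (o == NULL)
        return -1;
    destroyed_count = 0;
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&t[i], NULL, worker, o);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return destroyed_count;
}
```

`run_release_demo(8)` should return 1, because exactly one thread observes the count reaching zero and performs the destruction.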

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: bugdal at aerifal dot cx @ 2015-02-09 22:51 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

--- Comment #7 from Rich Felker <bugdal at aerifal dot cx> ---
Carlos, there's actually a still-open related Austin Group issue, number 864:

http://austingroupbugs.net/view.php?id=864

If there's a desire from the glibc side that implementations not be required to
handle the case of self-synchronized unmapping, please have someone make that
case. I'd be interested in hearing some arguments on both sides, as I haven't
really made up my own opinion on which way it should be resolved.

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: bugdal at aerifal dot cx @ 2015-02-10  0:18 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

--- Comment #8 from Rich Felker <bugdal at aerifal dot cx> ---
In reply to comment 4, this issue can be avoided by applications in at least
two ways:

1. Use a separate mapping of the shared synchronization object for each
user/thread that might want to unmap it.

2. Use a separate synchronization object local to the process to synchronize
unmapping of the shared mutex.
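Approach 1 can be sketched with two `MAP_SHARED` views of the same file page. The helper names are hypothetical, and `tmpfile()` stands in for whatever file or shared memory object the application actually uses. Each user locks, unlocks and unmaps only through its own view. Unmapping one view therefore cannot invalidate an address that another thread's pending slot still refers to.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the file page holding the shared mutex; each user gets its own view,
   i.e. its own address for the same mutex. */
static pthread_mutex_t *map_view(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (pthread_mutex_t *)p;
}

static int init_robust_pshared(pthread_mutex_t *m)
{
    pthread_mutexattr_t a;
    pthread_mutexattr_init(&a);
    pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &a);
    pthread_mutexattr_destroy(&a);
    return rc;
}

/* Two views of one file-backed mutex; retiring view 2 leaves view 1 mapped.
   Returns 0 on success. */
static int two_views_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    pthread_mutex_t *v1 = NULL, *v2 = NULL;
    int rc = -1;

    if (ftruncate(fd, (off_t)len) != 0)
        goto out;
    v1 = map_view(fd, len);
    v2 = map_view(fd, len);
    if (v1 == NULL || v2 == NULL || init_robust_pshared(v1) != 0)
        goto out;
    assert((void *)v1 != (void *)v2);   /* distinct addresses, same mutex */

    /* "User 2" locks and unlocks through its own view, then unmaps only it. */
    if (pthread_mutex_lock(v2) != 0)
        goto out;
    pthread_mutex_unlock(v2);
    munmap(v2, len);
    v2 = NULL;

    /* "User 1" keeps working through its still-valid view. */
    rc = pthread_mutex_lock(v1);
    if (rc == 0)
        pthread_mutex_unlock(v1);
    pthread_mutex_destroy(v1);
out:
    if (v1 != NULL)
        munmap(v1, len);
    if (v2 != NULL)
        munmap(v2, len);
    fclose(f);
    return rc;
}
```

`two_views_demo()` returns 0 when the mutex is still usable through view 1 after view 2 has been unmapped.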

Since the only way you'd have multiple threads in the same process accessing
the shared synchronization object is by storing the pointer to the (mapping
containing the) shared mutex in some process-local object that's shared between
threads, it seems natural that you would already be synchronizing access to
this memory with another mutex (or other synchronization object) stored with
the pointer. So approach 2 seems like it's always practical, probably doesn't
involve any new synchronization, and likely makes it unnecessary/useless to
support self-synchronized unmapping. On the other hand, it may not actually be
any harder to support self-synchronized unmapping than to support
self-synchronized destruction+unmapping, which almost certainly needs to be
supported.
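Approach 2 can be sketched as a process-local handle that stores the pointer to the shared mapping together with an ordinary private mutex. All names are hypothetical. An anonymous `MAP_SHARED` mapping, which is shared with `fork()` children, stands in for the real shared object. The local guard is held across both the lock and the unlock of the robust mutex. Every `pthread_mutex_unlock` on it, including the step that clears the pending slot, has therefore returned before another thread in the process can unmap it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical layout of the shared mapping. */
struct shared_region {
    pthread_mutex_t mtx;          /* robust, process-shared */
    long counter;                 /* protected by mtx */
};

/* Process-local handle: 'guard' is an ordinary private mutex stored next to
   the pointer to the shared mapping. */
struct shared_handle {
    pthread_mutex_t guard;
    struct shared_region *region; /* NULL once unmapped */
};

static int handle_open(struct shared_handle *h)
{
    void *p = mmap(NULL, sizeof(struct shared_region), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    struct shared_region *r = p;
    pthread_mutexattr_t a;
    pthread_mutexattr_init(&a);
    pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(&r->mtx, &a);
    pthread_mutexattr_destroy(&a);
    if (rc != 0) {
        munmap(p, sizeof *r);
        return -1;
    }
    r->counter = 0;
    pthread_mutex_init(&h->guard, NULL);
    h->region = r;
    return 0;
}

/* Use the shared object with the local guard held across lock AND unlock. */
static int handle_increment(struct shared_handle *h)
{
    int ret = -1;
    pthread_mutex_lock(&h->guard);
    if (h->region != NULL) {
        int rc = pthread_mutex_lock(&h->region->mtx);
        if (rc == EOWNERDEAD)     /* previous owner died: recover */
            rc = pthread_mutex_consistent(&h->region->mtx);
        if (rc == 0) {
            h->region->counter++;
            pthread_mutex_unlock(&h->region->mtx);
            ret = 0;
        }
    }
    pthread_mutex_unlock(&h->guard);
    return ret;
}

/* Unmap under the same local guard. */
static void handle_close(struct shared_handle *h)
{
    pthread_mutex_lock(&h->guard);
    if (h->region != NULL) {
        munmap(h->region, sizeof *h->region);
        h->region = NULL;
    }
    pthread_mutex_unlock(&h->guard);
}
```

After `handle_close`, `h.region` is `NULL`, and further calls to `handle_increment` return -1 instead of touching unmapped memory. The cost is that threads in one process serialize on the guard, which, as argued above, such code typically does already.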

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: triegel at redhat dot com @ 2015-02-10 21:57 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Torvald Riegel <triegel at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |triegel at redhat dot com
           Assignee|unassigned at sourceware dot org   |triegel at redhat dot com

--- Comment #9 from Torvald Riegel <triegel at redhat dot com> ---
I agree that there is an issue if we claim that a robust mutex can be destroyed
as soon as the thread that wants to destroy it can acquire it and there is no
other thread or process trying to acquire it anymore.  I don't think that
whether we consider destruction or unmap without destruction makes a
significant difference, except regarding performance of potential solutions.

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: bugdal at aerifal dot cx @ 2015-02-10 22:17 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

--- Comment #10 from Rich Felker <bugdal at aerifal dot cx> ---
Torvald, the distinction between unmap and destroy+unmap may be significant in
that the costly synchronization could be tucked away in pthread_mutex_destroy
to deal with the latter case but not the former. So I think this realistically
comes into any performance-based argument of what the standard should mandate.

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: mail at nh2 dot me @ 2015-08-09 12:29 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

--- Comment #11 from mail at nh2 dot me ---
Could somebody summarise, for someone not familiar with the glibc internals,
what the status of this bug is and in which cases it is safe to use a robust
mutex in a shared memory setup? Thanks!

* [Bug nptl/14485] File corruption race condition in robust mutex unlocking
From: fweimer at redhat dot com @ 2021-10-21 15:42 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|triegel at redhat dot com          |
