segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE_FOR
@ 2004-08-30  3:01 Alan Modra
  2004-08-31  8:25 ` Jakub Jelinek
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Modra @ 2004-08-30  3:01 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 2581 bytes --]

Executive summary:
A call to nptl pthread_exit can cause a segfault if a function in a
dlclose'd shared lib is somewhere in the pthread_exit call chain.
Testcase attached that demonstrates the problem on powerpc64-linux.

My first reaction on seeing the bug report that complained about this
problem was "Well, don't do that!", but after properly investigating
I'm reasonably convinced this is a real bug.  At least, a target that
doesn't use MD_FROB_UPDATE_CONTEXT or MD_FALLBACK_FRAME_STATE_FOR will
quite happily work with the testcase.

Details:
nptl pthread_exit uses the .eh_frame stack unwinder to find cleanup
handlers.  The stack unwinding process uses the .eh_frame information
for a given function, with a known register state, to deduce the
register state at its caller.  The process is repeated until a handler
is found, or until we hit a function without .eh_frame information.
This last point is the simple reason that targets without the MD_*
macros mentioned above work on the testcase;  On shared lib dlclose,
__deregister_frame_info is called which tells the unwinder not to
consider .eh_frame for that shared lib (you'd get a segfault if you
tried to access it).

The problem with the two MD_* macros is that they look at .text.  In the
testcase, the backtrace from __pthread_exit looks like:

__pthread_exit (libpthread)
terminate      (main)
doSomething    (libmylib)
myThread       (main)
start_thread   (libpthread)
__clone	       (libc)

The segfault caused by MD_FROB_UPDATE_CONTEXT occurs when the unwinder
is processing .eh_frame info for "terminate".  MD_FROB_UPDATE_CONTEXT
wants to look at .text for "doSomething", which has already been
upmapped.  Of course, the frobbing that MD_FROB_UPDATE_CONTEXT does for
"terminate" isn't needed;  The unwinder won't need the register state
for "doSomething", since it won't find any .eh_frame for the unloaded
library.  This suggests a possible solution for MD_FROB_UPDATE_CONTEXT
might be to change the point at which the frobbing is done.  An
alternate simple hack is to have MD_FROB_UPDATE_CONTEXT check via
_Unwind_Find_FDE that .eh_frame info is available for the pc that it
examines.

However, MD_FALLBACK_FRAME_STATE_FOR, whose purpose is to unwind through
signal frames, is called for funtions that don't have .eh_frame info.
It too wants to examine .text.

My solution to this problem will be to install a segv handler before the
unwinder does any work, and annotate instructions that might segfault in
the MD_* macros.  Better ideas appreciated..

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

[-- Attachment #2: thread_unload.tar.gz --]
[-- Type: application/x-gunzip, Size: 843 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE_FOR
  2004-08-30  3:01 segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE_FOR Alan Modra
@ 2004-08-31  8:25 ` Jakub Jelinek
  2004-08-31  8:31   ` Alan Modra
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Jelinek @ 2004-08-31  8:25 UTC (permalink / raw)
  To: gcc; +Cc: drepper

On Mon, Aug 30, 2004 at 12:04:44PM +0930, Alan Modra wrote:
> Executive summary:
> A call to nptl pthread_exit can cause a segfault if a function in a
> dlclose'd shared lib is somewhere in the pthread_exit call chain.
> Testcase attached that demonstrates the problem on powerpc64-linux.
> 
> My first reaction on seeing the bug report that complained about this
> problem was "Well, don't do that!", but after properly investigating
> I'm reasonably convinced this is a real bug.  At least, a target that
> doesn't use MD_FROB_UPDATE_CONTEXT or MD_FALLBACK_FRAME_STATE_FOR will
> quite happily work with the testcase.

I think this is definitely a don't do that situation.
As pthread_exit must be able to run all cleanups in the backtrace
which called it, all functions in the backtrace simply must be available
at the time pthread_exit is called.
IMHO working around application bugs is not necessary.

	Jakub

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE_FOR
  2004-08-31  8:25 ` Jakub Jelinek
@ 2004-08-31  8:31   ` Alan Modra
  2004-08-31 18:37     ` Dave Korn
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Modra @ 2004-08-31  8:31 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc, drepper

On Tue, Aug 31, 2004 at 05:44:30AM +0200, Jakub Jelinek wrote:
> On Mon, Aug 30, 2004 at 12:04:44PM +0930, Alan Modra wrote:
> > Executive summary:
> > A call to nptl pthread_exit can cause a segfault if a function in a
> > dlclose'd shared lib is somewhere in the pthread_exit call chain.
> > Testcase attached that demonstrates the problem on powerpc64-linux.
> > 
> > My first reaction on seeing the bug report that complained about this
> > problem was "Well, don't do that!", but after properly investigating
> > I'm reasonably convinced this is a real bug.  At least, a target that
> > doesn't use MD_FROB_UPDATE_CONTEXT or MD_FALLBACK_FRAME_STATE_FOR will
> > quite happily work with the testcase.
> 
> I think this is definitely a don't do that situation.
> As pthread_exit must be able to run all cleanups in the backtrace
> which called it, all functions in the backtrace simply must be available
> at the time pthread_exit is called.
> IMHO working around application bugs is not necessary.

Of course, the real application was a little more complicated than the
testcase.  Also, the original bug reporter gave two work-arounds, use of
pthread_join in the master, or setjmp/longjmp in the slaves to collapse
the stack before calling pthread_exit.  ie. plenty of evidence that he
wasn't an incompetent needing help with his app.  I'll quote some things
from the bug report (IBM LTC bugzilla #10704 for those who can access
it):

- begin quote
In our application, this situation can occur during shared library shutdown. 
The library asks each of its threads to shutdown. The threads respond by 
calling a framework function which notifies the waiting thread and then calls 
pthread_exit(). As soon as all slave threads have notified the master, the 
master unloads their shared library. This can happen between the notification 
and the call the pthread_exit(), leading to a situation where pthread_exit() 
is called on a stack which contains frames from unloaded functions.

This pattern is sufficiently common that Win32 actually provides an API for 
this case: FreeLibraryAndExitThread. Implementing an equivalent API for NPTL 
would be difficult, given the current implementation of pthread_exit().
- end quote

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE_FOR
  2004-08-31  8:31   ` Alan Modra
@ 2004-08-31 18:37     ` Dave Korn
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Korn @ 2004-08-31 18:37 UTC (permalink / raw)
  To: 'Alan Modra', 'Jakub Jelinek'; +Cc: gcc, drepper

> -----Original Message-----
> From: gcc-owner On Behalf Of Alan Modra
> Sent: 31 August 2004 08:38

> The library asks each of its threads to shutdown. The threads 
> respond by 
> calling a framework function which notifies the waiting 
> thread and then calls 
> pthread_exit(). As soon as all slave threads have notified 
> the master, the 
> master unloads their shared library. This can happen between 
> the notification 
> and the call the pthread_exit(), leading to a situation where 
> pthread_exit() 
> is called on a stack which contains frames from unloaded functions.

  As far as I can see, this is a completely ordinary bog-standard race
condition design fault in the application, which is attempting to do
something completely invalid.  This is a textbook example of how not to do
what they're trying to do.

  There is a time window between the slave thread sending the notification
and it actually exiting and being in a terminated state; therefore it is
utterly incorrect for the master to try and deduce that the thread is in a
terminated state from the fact that the notification has been sent.

  It's hard to say for sure without full design details, but from what's
written above I can't see any protection either against the case where the
very final thread sends a notification, the scheduler interrupts and
schedules the master, the master receives the notification and unloads the
shared library, and then the slave thread resumes with the same instruction
pointer / context that it had before the scheduler interrupted, and tries
executing in what is now presumably completely invalid and unmapped memory
space.

> This pattern is sufficiently common that Win32 actually 
> provides an API for 
> this case: FreeLibraryAndExitThread. Implementing an 
> equivalent API for NPTL 
> would be difficult, given the current implementation of 
> pthread_exit().

  To me this reads as an admission that they know they've got a race
condition, and their observation that a different OS provides a mechanism
that potentially can be used by programmers in the design of their
applications to avoid a similar race condition[*] does not in any way mean
that their code is valid, merely that other people have noticed and gone to
some trouble to fix the problem that they wish to merely neglect.

  Crossing-your-fingers-and-hoping with race conditions never works.  They
_always_ bite you, and surprisingly often they do so sooner rather than
later.  The only fix for race conditions is to use correct rather than
incorrect application design.

  Why doesn't the application use pthread_join if it wants to know when a
thread has terminated?  That is the *correct* way to detect the condition
that the application is currently falsely deducing from the notification
signal.  The application could even retain its current design and only call
pthread_join on the very final slave thread, and only do so *after* having
received the exit notification anyway, and then nothing would be likely to
go wrong.

    cheers, 
      DaveK

[*] but not the same one: the situation would only be the same, and
FreeLibraryAndExitThread would only be a suitable solution, if it was the
last _library_ thread was responsible for doing the unload, or if the master
thread was itself the last thread running in the library (which amount to
more-or-less the same thing, really)
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-08-31 18:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-30  3:01 segfaults in MD_FROB_UPDATE_CONTEXT and MD_FALLBACK_FRAME_STATE_FOR Alan Modra
2004-08-31  8:25 ` Jakub Jelinek
2004-08-31  8:31   ` Alan Modra
2004-08-31 18:37     ` Dave Korn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).