public inbox for glibc-bugs@sourceware.org
From: "tom dot honermann at oracle dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sources.redhat.com
Subject: [Bug libc/4737] fork is not async-signal-safe
Date: Wed, 05 Nov 2008 09:57:00 -0000
Message-ID: <20081105095557.25884.qmail@sourceware.org>
In-Reply-To: <20070704013541.4737.nmiell@comcast.net>


------- Additional Comments From tom dot honermann at oracle dot com  2008-11-05 09:55 -------
Oracle/PeopleSoft is also running into this bug.  Oracle engineers (though not
me) are currently assigned to resolving this issue.  I am hoping to facilitate
discussion about what an acceptable solution to this problem should look like.

I've been studying the glibc-2.3.4 source code (I know, old, but this is the
version that we will ultimately have to create patches for).  I suspect (but
have not verified) that the underlying issue is still present in the latest CVS
source code (implied by the fact that this bug report is still open).  My
priority is to get this corrected for Linux/x86_64.

The stack trace for the hang I've been seeing looks like:
#0  0x00000034cc0d9128 in __lll_mutex_lock_wait () from /lib64/libc.so.6
#1  0x00000034cc07262c in _L_lock_57 () from /lib64/libc.so.6
#2  0x00000034cc06bfa3 in ptmalloc_lock_all () from /lib64/libc.so.6
#3  0x00000034cc09461a in fork () from /lib64/libc.so.6 
<Oracle/PeopleSoft signal handler stack frames>
#9  <signal handler called> 
#10 0x00000034cc030015 in raise () from /lib64/libc.so.6
#11 0x00000034cc031980 in abort () from /lib64/libc.so.6
#12 0x00000034cc0674db in __libc_message () from /lib64/libc.so.6
#13 0x00000034cc06e8a0 in _int_free () from /lib64/libc.so.6
#14 0x00000034cc071fbc in free () from /lib64/libc.so.6 

The root cause for the signal generated in this case was heap corruption (glibc
detected the corruption and aborted the process).  The invoked signal handler is
simply trying to fork/exec a program to gather diagnostics we need to help us
find the cause of the heap corruption.
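
For readers unfamiliar with the pattern, here is a minimal sketch of a handler
that forks and execs a diagnostics program.  It is not the Oracle/PeopleSoft
handler; the names 'diag_path', 'diag_handler' and 'run_diag_on_signal' are
hypothetical, and the sketch only works reliably when the interrupted code does
not hold a malloc arena lock, which is exactly the condition this bug is about.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical path of the diagnostics program; set before the handler runs. */
static const char *diag_path;
/* Exit status of the diagnostics child, or -1 if it could not be collected. */
static volatile sig_atomic_t diag_status = -1;

static void diag_handler(int signo)
{
    int saved_errno = errno;
    pid_t pid;

    (void)signo;
    pid = fork();                    /* the call that hangs in the trace above */
    if (pid == 0) {
        execl(diag_path, diag_path, (char *)NULL);
        _exit(127);                  /* only reached if exec failed */
    }
    if (pid > 0) {
        int status;
        pid_t r;
        do {
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);
        if (r == pid && WIFEXITED(status))
            diag_status = WEXITSTATUS(status);
    }
    errno = saved_errno;
}

/* Install the handler for signo, raise the signal in the calling thread,
   and return the exit status of the diagnostics child (-1 on failure). */
int run_diag_on_signal(const char *path, int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = diag_handler;
    sigemptyset(&sa.sa_mask);
    diag_path = path;
    diag_status = -1;
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;
    raise(signo);                    /* returns after the handler has run */
    return diag_status;
}
```

Calling run_diag_on_signal() with a path that cannot be executed yields 127,
the child's fallback exit code; with a real program it yields that program's
exit status.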

The Linux/x86_64 glibc build currently uses "normal" mutexes to lock the heap
arenas.  They are initialized by the 'mutex_init' calls in 'ptmalloc_init'
(see malloc/arena.c).  A "normal" mutex deadlocks if the thread that owns it
attempts to re-acquire it, which is what happens in the trace above: free()
holds an arena mutex when the signal arrives, and fork() then tries to lock
every arena from within the handler.
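
The self-deadlock can be observed without hanging a process by using an
error-checking mutex in place of a normal one.  The sketch below uses plain
POSIX threads calls rather than glibc's internal 'mutex_init' wrappers, and
the function name is made up for illustration.  Where a normal mutex would
block forever on the second lock, the error-checking type reports the problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Lock an error-checking mutex twice from the same thread and return the
   result of the second lock attempt.  With PTHREAD_MUTEX_NORMAL the second
   pthread_mutex_lock() call would never return. */
int second_lock_on_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);          /* first lock: the interrupted heap call */
    rc = pthread_mutex_lock(&m);     /* second lock: the handler's fork */

    pthread_mutex_unlock(&m);        /* release the first (only) lock */
    pthread_mutex_destroy(&m);
    return rc;                       /* EDEADLK for an error-checking mutex */
}
```

Build with -pthread on systems where the threads library is separate from libc.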

The simplest solution seems to me to be converting these to recursive mutexes.
A recursive mutex would allow a thread that already holds one of the arena
mutexes to handle a signal, call fork from within that signal handler, call
ptmalloc_lock_all, and still acquire all of the arena mutexes.  The thread
would then continue while the data structures of the previously locked arena
are not in a stable state (since the original heap function invocation was
interrupted by the signal), but this should be OK: heap functions are not
async-signal-safe and therefore may not be (reliably) called from within a
signal handler anyway.  Since the relevant thread in both the parent and child
processes is still executing within the context of a signal handler, neither
thread may touch the arena data structures.
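
A rough model of the proposal, under stated assumptions: 'struct arena',
'N_ARENAS' and the 'arenas_*' functions below are hypothetical stand-ins for
glibc's arena list and ptmalloc_lock_all, written with public POSIX threads
calls.  The sketch only shows that a thread already holding one recursive
arena mutex can still lock every arena; it does not model fork or the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define N_ARENAS 3                   /* hypothetical arena count */

struct arena {                       /* hypothetical stand-in for a heap arena */
    pthread_mutex_t mutex;
};

static struct arena arenas[N_ARENAS];

/* Initialize every arena mutex as recursive. */
static int arenas_init(void)
{
    pthread_mutexattr_t attr;
    int i, rc;

    if ((rc = pthread_mutexattr_init(&attr)) != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    for (i = 0; rc == 0 && i < N_ARENAS; i++)
        rc = pthread_mutex_init(&arenas[i].mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Release every arena mutex once. */
static void arenas_unlock_all(void)
{
    int i;

    for (i = 0; i < N_ARENAS; i++)
        pthread_mutex_unlock(&arenas[i].mutex);
}

/* Lock every arena, rolling back on failure. */
static int arenas_lock_all(void)
{
    int i, rc;

    for (i = 0; i < N_ARENAS; i++) {
        rc = pthread_mutex_lock(&arenas[i].mutex);
        if (rc != 0) {
            while (i-- > 0)
                pthread_mutex_unlock(&arenas[i].mutex);
            return rc;
        }
    }
    return 0;
}

/* Hold one arena (as an interrupted free() would), then lock them all
   (as fork would from the signal handler).  Returns 0 on success. */
int lock_all_while_holding_one(void)
{
    int i, rc;

    if ((rc = arenas_init()) != 0)
        return rc;
    pthread_mutex_lock(&arenas[1].mutex);
    rc = arenas_lock_all();
    if (rc == 0)
        arenas_unlock_all();
    pthread_mutex_unlock(&arenas[1].mutex);
    for (i = 0; i < N_ARENAS; i++)
        pthread_mutex_destroy(&arenas[i].mutex);
    return rc;
}
```

With recursive mutexes lock_all_while_holding_one() returns 0; with normal
mutexes the same sequence would block in arenas_lock_all().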

The downsides to this approach are performance overhead and the potential for
defects to go unnoticed during development (since unintentional attempts to
recursively lock a mutex would no longer lead to deadlocks).

This approach would also require changes to 'ptmalloc_unlock_all2', which
currently re-initializes the arena mutexes in the child process rather than
unlocking them.  When the signal handler returns in the child, the interrupted
heap function will attempt to unlock the arena mutex it held before the signal.
If that mutex has been re-initialized in the meantime, the unlock call could
result in undesirable behavior.

Eagerly awaiting comments and criticisms...

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=4737
