[Bug nptl/17705] New: nptl_db: stale thread create/death events if debugger detaches

public inbox for glibc-bugs@sourceware.org

From: palves at redhat dot com @ 2014-12-12 17:28 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17705

            Bug ID: 17705
           Summary: nptl_db: stale thread create/death events if debugger
                    detaches
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
          Assignee: unassigned at sourceware dot org
          Reporter: palves at redhat dot com
                CC: drepper.fsp at gmail dot com

I wrote a GDB test that attaches to a program that is constantly/quickly
spawning short-lived threads.  The test makes GDB attach, let threads hit a
breakpoint, detach, and then reattach; rinse and repeat.

Sometimes, the test fails with a surprising libthread_db error:

 (gdb) continue
 Continuing.
 Cannot get thread event message: debugger service failed
 (gdb)

Investigation showed that the test exposes a libthread_db issue.

If GDB detaches just after a thread has decided that it needs to report an
event to the debugger (thread creation or death), but before the event is
actually queued (in __nptl_last_event) and the event function
(__nptl_create_event or __nptl_death_event) is called, the debugger won't be
around to consume the event, and the thread is left dangling in the
__nptl_last_event event queue/list.

__pthread_create_2_1():
...
  /* Start the thread.  */
  if (__glibc_unlikely (report_thread_creation (pd)))
    {
...
      retval = create_thread (pd, iattr, true, STACK_VARIABLES_ARGS,
                              &thread_ran);
      if (retval == 0)
        {
...
          pd->eventbuf.eventnum = TD_CREATE;
          pd->eventbuf.eventdata = pd;

          /* Enqueue the descriptor.  */
          do
            pd->nextevent = __nptl_last_event;
          while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
                                                       pd, pd->nextevent)
                                                     != 0);

          /* Now call the function which signals the event.  */
          __nptl_create_event ();
...

That is, the problem arises if the debugger detaches after the
report_thread_creation check and before the __nptl_create_event call.
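
To make the window concrete, here is a small single-threaded model of it.  It
is only a sketch under assumptions: struct thread_desc, last_event,
debugger_attached, the EV_* values and all function names are hypothetical
stand-ins, not glibc's definitions, and the "debugger" is simplified to
draining the whole list whenever the event function is called while it is
attached.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event numbers; EV_NONE plays the role of TD_EVENT_NONE.  */
enum { EV_NONE = 0, EV_CREATE = 1 };

/* Hypothetical stand-in for the per-thread descriptor.  */
struct thread_desc
{
  int eventnum;
  struct thread_desc *nextevent;   /* Link in the pending-event list.  */
};

/* Stand-in for __nptl_last_event: head of the pending-event list.  */
static _Atomic (struct thread_desc *) last_event = NULL;

/* Stand-in for "a debugger is attached and has events enabled".  */
static bool debugger_attached = true;

/* Push PD with a compare-and-swap loop, like the enqueue step quoted above.
   On failure the CAS reloads HEAD, so the link is refreshed before retry.  */
static void
enqueue_event (struct thread_desc *pd, int eventnum)
{
  pd->eventnum = eventnum;
  struct thread_desc *head = atomic_load (&last_event);
  do
    pd->nextevent = head;
  while (!atomic_compare_exchange_weak (&last_event, &head, pd));
}

/* Stand-in for the event function.  Simplification: an attached debugger
   drains the list; with no debugger, nothing touches it.  */
static void
signal_event (void)
{
  if (debugger_attached)
    atomic_store (&last_event, NULL);
}

/* Model of thread creation where the debugger detaches inside the window.
   Returns true if PD is left queued with nobody around to consume it.  */
static bool
create_with_detach_in_window (struct thread_desc *pd)
{
  if (!debugger_attached)          /* The report_thread_creation check.  */
    return false;

  debugger_attached = false;       /* Debugger detaches right here.  */

  enqueue_event (pd, EV_CREATE);
  signal_event ();
  return atomic_load (&last_event) == pd;
}
```

Calling create_with_detach_in_window on a fresh descriptor returns true in
this model: the descriptor stays at the head of the list although no debugger
is left to consume it.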

Later, when the thread dies, if it has a glibc-managed stack and that stack is
reused, its event buffer is cleared, but __nptl_last_event (or a thread in the
chain that starts at __nptl_last_event) still holds a stale pointer to it.

So if another GDB reattaches, then when any thread pushes another event, the
new GDB fetches the events out of libthread_db with td_ta_event_getmsg.  Now
td_ta_event_getmsg finds a stale pointer to the reused thread stack in the
event list, with no event recorded, and fails with TD_DBERR:

td_err_e
td_ta_event_getmsg (const td_thragent_t *ta_arg, td_event_msg_t *msg)
{
...
  /* If the structure is on the list there better be an event recorded.  */
  if ((int) (uintptr_t) eventnum == TD_EVENT_NONE)
    return TD_DBERR;
...

And thus GDB's "debugger service failed" error message.
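
The failure sequence can be reproduced in a toy model.  Again this is a sketch
under assumptions, not libthread_db code: the types, the EV_* and ERR_*
values, and the function names are hypothetical, and reuse_descriptor only
models the "event buffer is cleared" part of stack reuse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event numbers and error codes for this model only.  */
enum { EV_NONE = 0, EV_CREATE = 1 };
enum { ERR_OK = 0, ERR_NOMSG = 1, ERR_DBERR = 2 };

/* Hypothetical stand-in for the per-thread descriptor.  */
struct thread_desc
{
  int eventnum;
  struct thread_desc *nextevent;
};

/* Stand-in for __nptl_last_event.  */
static struct thread_desc *last_event;

/* Models reuse of a glibc-managed stack: the event buffer is cleared, but
   nothing unlinks the descriptor from the pending-event list.  */
static void
reuse_descriptor (struct thread_desc *pd)
{
  pd->eventnum = EV_NONE;
}

/* Models the consistency check quoted above: a descriptor that is on the
   list but has no event recorded is reported as an error.  */
static int
get_event_msg (int *eventnum_out)
{
  struct thread_desc *pd = last_event;

  if (pd == NULL)
    return ERR_NOMSG;              /* Nothing pending.  */
  if (pd->eventnum == EV_NONE)
    return ERR_DBERR;              /* Stale entry: on the list, no event.  */

  /* Normal case: hand out the event and unlink the descriptor.  */
  *eventnum_out = pd->eventnum;
  last_event = pd->nextevent;
  pd->eventnum = EV_NONE;
  pd->nextevent = NULL;
  return ERR_OK;
}
```

With a descriptor left on the list by a detached debugger and then passed
through reuse_descriptor, the next get_event_msg call in this model returns
ERR_DBERR instead of an event.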

If the thread had been allocated on a user-provided stack, then the failure
modes will be even more "interesting", possibly even corrupting the inferior,
as that TD_EVENT_NONE check (and a similar one in td_thr_event_getmsg) might
well be fooled, since it reads through a dangling pointer.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug nptl/17705] nptl_db: stale thread create/death events if debugger detaches
From: palves at redhat dot com @ 2014-12-12 17:44 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17705

--- Comment #1 from Pedro Alves <palves at redhat dot com> ---
Created attachment 8010
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8010&action=edit
WIP fix

WIP fix.  The main idea is that whenever a __nptl_*_event event function is
called, the debugger is expected to consume the event (see
nptl_db/td_ta_event_getmsg.c).  If the event wasn't consumed, then the
debugger must either be gone or have misbehaved.  The death event is the
easiest to reason about -- clearly we shouldn't hang on to the event forever,
as the thread is about to be wiped out.  So right after calling the event
function, we remove the thread from the event queue.  The complication is that
a new debugger may manage to reattach just while we're doing that, and the code
must be written in a way that works without any locking/synchronization between
the debugger and the inferior.
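
For illustration only, the "remove the thread from the event queue" step could
look roughly like the sketch below.  This is not the attached patch: the types
and names are hypothetical, and the sketch is deliberately single-threaded, so
it ignores exactly the complication described above (a debugger reattaching
and reading the list concurrently).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event numbers for this model only.  */
enum { EV_NONE = 0, EV_CREATE = 1, EV_DEATH = 2 };

/* Hypothetical stand-in for the per-thread descriptor.  */
struct thread_desc
{
  int eventnum;
  struct thread_desc *nextevent;
};

/* Unlink PD from the list starting at *HEAD if it is still queued, and
   clear its event buffer.  Returns true if PD was found, meaning nobody
   consumed its event.  Not safe against concurrent readers or writers.  */
static bool
dequeue_unconsumed_event (struct thread_desc **head, struct thread_desc *pd)
{
  for (struct thread_desc **link = head; *link != NULL;
       link = &(*link)->nextevent)
    if (*link == pd)
      {
        *link = pd->nextevent;     /* Bypass PD in the chain.  */
        pd->nextevent = NULL;
        pd->eventnum = EV_NONE;
        return true;
      }
  return false;
}
```

The intended call site in this sketch would be right after the event function
returns; a return value of true would mean the event was never consumed.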


* [Bug nptl/17705] nptl_db: stale thread create/death events if debugger detaches
From: palves at redhat dot com @ 2014-12-12 17:50 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17705

--- Comment #2 from Pedro Alves <palves at redhat dot com> ---
I'm not going to be working on this fix further, at least for now.

GDB doesn't really need to use libthread_db's thread creation/destruction
events nowadays when the kernel supports PTRACE_EVENT_CLONE, which it has for
a long while now.  I'll work on that instead.
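
As a rough sketch of the kernel-side alternative (not GDB's actual code), a
tracer on Linux can recognize a clone event from the wait status alone.  The
helper name is_clone_event is hypothetical; the status encoding follows the
ptrace(2) man page.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Return true if STATUS, as filled in by waitpid, reports a
   PTRACE_EVENT_CLONE stop of the tracee.  */
static bool
is_clone_event (int status)
{
  return WIFSTOPPED (status)
         && (status >> 8) == (SIGTRAP | (PTRACE_EVENT_CLONE << 8));
}
```

The tracer would first enable these stops with ptrace (PTRACE_SETOPTIONS, pid,
0, PTRACE_O_TRACECLONE), and after a matching stop it can fetch the new
thread's ID with PTRACE_GETEVENTMSG.  No event list inside the inferior is
involved, so there is nothing left to go stale when the debugger detaches.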
