[Bug server/16168] New: Signal heavy execution + repeated breakpoint locks up gdbserver

public inbox for gdb-prs@sourceware.org

* [Bug server/16168] New: Signal heavy execution + repeated breakpoint locks up gdbserver
From: saugustine at google dot com @ 2013-11-13 20:28 UTC
  To: gdb-prs

http://sourceware.org/bugzilla/show_bug.cgi?id=16168

            Bug ID: 16168
           Summary: Signal heavy execution + repeated breakpoint locks up
                    gdbserver
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: server
          Assignee: unassigned at sourceware dot org
          Reporter: saugustine at google dot com

Created attachment 7276
  --> http://sourceware.org/bugzilla/attachment.cgi?id=7276&action=edit
files to reproduce.

The attached tar file includes a source file, a bash script, and a gdb script
that together expose a bug in gdbserver's signal handling.

You can simply run sh doit.sh to reproduce the problem.

gdbserver attaches to a multi-threaded application that is also receiving
SIGPROF signals.

The program repeatedly hits a breakpoint in some of the threads, and continues.

At some point, the SIGPROF will trigger a situation where a thread has a
pending signal, so gdbserver elects not to restart all threads.

In the included gdbserver.log file, this is the line: "Not resuming, all-stop
and found an LWP with pending status."

The only thread ever restarted is the pending one. Eventually, this thread runs
out of work and the system locks up.

There is a race involved, so you may need to run it a couple of times.
Sometimes it happens very early, and these are the easiest logs to study.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug server/16168] Signal heavy execution + repeated breakpoint locks up gdbserver
From: saugustine at google dot com @ 2013-11-14  1:06 UTC
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=16168

Sterling Augustine <saugustine at google dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |saugustine at google dot com

--- Comment #1 from Sterling Augustine <saugustine at google dot com> ---
Created attachment 7279
  --> https://sourceware.org/bugzilla/attachment.cgi?id=7279&action=edit
More elaborate test case

This newly uploaded file is a test case for the patch proposed at:

https://sourceware.org/ml/gdb-patches/2013-11/msg00361.html


* [Bug server/16168] Signal heavy execution + repeated breakpoint locks up gdbserver
From: dje at google dot com @ 2013-12-04 19:36 UTC
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=16168

dje at google dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dje at google dot com

--- Comment #2 from dje at google dot com ---
What happens here is this:

1) This is all-stop, SIGPROF is active, and a thread hits a breakpoint.
gdbserver stops all threads, and while stopping all threads one thread gets a
SIGPROF.

2) gdb then advances the breakpointed thread past the breakpoint and then
resumes all threads.

3) gdbserver gets the resume request and looks for a thread with a pending
signal, finds it (the SIGPROF'd thread), and leaves all threads stopped knowing
linux_wait_for_event will find the thread with status_pending_p (there could be
more than one of course).

4) gdbserver then enters wait processing for all threads, linux_wait_for_thread
finds the SIGPROF'd thread which linux_wait_1 forwards on to the inferior, and
goes back to waiting for all threads.

5) At this point only the SIGPROF'd thread is running and linux_wait_1 is
waiting for an event worthy of reporting back to gdb. gdbserver sees the
SIGSTOP that was sent earlier to stop all threads, knows it no longer cares
about it, resumes the thread, and goes back to waiting for all threads. The
thread continues to receive SIGPROF signals, which are continually forwarded
on, and eventually the thread exits.

6) At this point gdbserver is hung waiting for an event from some thread, but
no threads are running.
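
The sequence in steps 1-6 can be sketched as a small toy model. This is not
gdbserver code: the Lwp class, resume_all, wait_for_event, and the "work_left"
counter are hypothetical names invented for illustration, and the model assumes
that a thread which runs out of work simply stops producing events.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lwp:
    """Hypothetical stand-in for one thread (LWP) tracked by the stub."""
    name: str
    work_left: int                        # work units before the thread finishes
    running: bool = False                 # False means stopped by the stub
    pending_signal: Optional[str] = None  # e.g. "SIGPROF" caught while stopping


def resume_all(lwps: List[Lwp]) -> bool:
    """Step 3: in all-stop, resume nothing if any LWP has a pending status."""
    if any(lwp.pending_signal for lwp in lwps):
        return False  # "Not resuming, all-stop and found an LWP with pending status."
    for lwp in lwps:
        lwp.running = lwp.work_left > 0
    return True


def wait_for_event(lwps: List[Lwp], max_steps: int = 100) -> str:
    """Steps 4-6: forward uninteresting signals, then wait on whatever runs."""
    for _ in range(max_steps):
        for lwp in lwps:
            if lwp.pending_signal is not None:
                # The signal is passed to the inferior; only this LWP restarts.
                lwp.pending_signal = None
                lwp.running = True
        runners = [lwp for lwp in lwps if lwp.running]
        if not runners:
            # Nothing runs, so no further event can arrive. It is a hang only
            # if some stopped thread still had work to do.
            starved = [lwp.name for lwp in lwps if lwp.work_left > 0]
            return "hang" if starved else "finished"
        for lwp in runners:
            lwp.work_left -= 1
            if lwp.work_left <= 0:
                lwp.running = False  # thread ran out of work
    return "still running"


if __name__ == "__main__":
    threads = [
        Lwp("breakpoint-thread", work_left=5),
        Lwp("sigprof-thread", work_left=3, pending_signal="SIGPROF"),
        Lwp("other-thread", work_left=5),
    ]
    resume_all(threads)             # declines to resume: one LWP is pending
    print(wait_for_event(threads))  # only the SIGPROF'd thread ever ran
```

In this model the two threads without a pending signal are never restarted, so
once the SIGPROF'd thread finishes there is nothing left to generate an event.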

From a high-level perspective, if we want to keep the "any_pending" processing,
a signal gdb doesn't care about is different from a signal gdb does care about,
and the "any_pending" processing that gdbserver does applies only to the
latter, not the former.  E.g., if there are 10 threads to be resumed, 1 of
which is a "normal" resume after a SIGSTOP, and 9 have different signals all
marked as "nostop noprint pass", then that is no different from having the same
10 threads all marked for "normal" resumption: resume them all in the way
appropriate for each thread.
Thus, from a high-level perspective, it would be nice to distinguish signals
this way.  Whether that's actually easy/possible in the implementation ... have
to see.
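
The distinction proposed above can be illustrated with a toy model of the
resume decision. This is a sketch under assumptions, not gdbserver code and not
the patch linked in comment #1, whose approach is not described in this thread.
The Lwp class, the "delivered" field, and the pass_silently set (standing in
for signals marked "nostop noprint pass") are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Lwp:
    """Hypothetical stand-in for one thread (LWP) tracked by the stub."""
    name: str
    running: bool = False
    pending_signal: Optional[str] = None  # signal caught while stopping
    delivered: Optional[str] = None       # signal handed over on resume


def resume_all(lwps: List[Lwp], pass_silently: Set[str]) -> bool:
    """Resume every LWP unless one holds a pending signal gdb cares about."""

    def reportable(lwp: Lwp) -> bool:
        # Only signals outside the pass_silently set count as "pending".
        return (lwp.pending_signal is not None
                and lwp.pending_signal not in pass_silently)

    if any(reportable(lwp) for lwp in lwps):
        return False  # leave everything stopped; the event goes to gdb first
    for lwp in lwps:
        # Each thread is resumed in the way appropriate for it: with its
        # silently passed signal if it has one, or with no signal otherwise.
        lwp.delivered, lwp.pending_signal = lwp.pending_signal, None
        lwp.running = True
    return True


if __name__ == "__main__":
    quiet = {"SIGPROF"}
    threads = [Lwp("t0"), Lwp("t1", pending_signal="SIGPROF")]
    print(resume_all(threads, quiet), [t.running for t in threads])
```

With this rule, a thread holding only a silently passed signal no longer keeps
the other threads stopped, while a signal gdb wants to see still blocks the
resume.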


* [Bug server/16168] Signal heavy execution + repeated breakpoint locks up gdbserver
From: dje at google dot com @ 2013-12-04 19:37 UTC
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=16168

--- Comment #3 from dje at google dot com ---
(In reply to dje from comment #2)
> What happens here is this:

For completeness' sake, that's from analyzing the hang using thread-test-2 in
the attached testcase.


* [Bug server/16168] Signal heavy execution + repeated breakpoint locks up gdbserver
From: eclipsehivernale at sfr dot fr @ 2014-11-23 15:05 UTC
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=16168

eclipsehivernale at sfr dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eclipsehivernale at sfr dot fr


* [Bug server/16168] Signal heavy execution + repeated breakpoint locks up gdbserver
From: eclipsehivernale at sfr dot fr @ 2014-11-23 15:33 UTC
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=16168

--- Comment #4 from eclipsehivernale at sfr dot fr ---
I am a software developer working on a multi-threaded application (about 10
threads). Recently we decided to use tcmalloc instead of the glibc malloc.
It is an open-source malloc from Google optimized for multi-threaded allocation.

Since this change, it has been impossible to use gdbserver.
The tcmalloc library manages SIGPROF automatically.
After a few "next" operations, gdbserver hangs, waiting for a pending event
from a thread that has received a SIGPROF signal, exactly as you describe in
your comment.

It is still possible to use gdb directly on the remote target, but this is a
waste of time.
I also once saw gdb hang in a native configuration, but I can't tell for sure
whether it is the same issue, as I just killed it and tried again.

I tested the patch you posted:
https://sourceware.org/ml/gdb-patches/2013-11/msg00361.html and it seems to
work fine on 7.8.50.20141107.

There are other freezes/hangs reported in the Bugzilla database that may be
linked to this issue, since it can appear with any running operation (next,
step, break...) and every gdb version so far is affected.

I think more and more people will face this issue (tcmalloc + multi-threaded
application without control over SIGPROF), and I would like to push for a fix
to be integrated in the next version of gdb.

Anyway, thanks a lot for the investigation and the suggested fix.
If no action is taken to fix gdb, then I guess I will use your fix locally
forever.
