* [PATCH 2/8] watch_thread_num.exp and targets with fairer event reporting
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
@ 2014-12-26 20:31 ` Pedro Alves
2014-12-26 20:31 ` [PATCH 1/8] gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB Pedro Alves
` (7 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:31 UTC (permalink / raw)
To: gdb-patches
This patch fixes the watch_thread_num.exp test to work when the target
is better at making event handling be fair among threads.
I wrote patches that make GDB native and GDBserver event handling
fairer between threads. That is, if threads A and B both
simultaneously trigger some debug event, GDB will pick either A or B
at random, rather than always handling the event of A first. There's
code for that in the Linux backends (gdb and gdbserver) already, but
it can be improved, and only works in all-stop mode.
With those fixes in place, I found that the watch_thread_num.exp would
often time out. The problem is that the test only works _because_
event handling isn't as fair as intended. With the fairness fixes,
the test falls victim of PR10116 (gdb drops watchpoints on
multi-threaded apps) quite often.
To expand on the PR10116 reference, consider that stop events are
serialized to GDB core, through target_wait. Say a thread-specific
watchpoint as set on thread A. When the "right" thread and some other
"wrong" thread both trigger a watchpoint simultaneously, the target
may report the "wrong" thread's hit to GDB first (thread B). When
handling that event, GDB notices the watchpoint is for another thread,
and so shouldn't cause a user-visible stop. On resume, GDB saves the
now current value of the watched expression. Afterwards, the "right"
thread (thread A) reports its watchpoint trigger. But the watched
value hasn't changed since GDB last saved it, and so GDB doesn't
report the watchpoint hit to the user.
The way the test is written, the watchpoint is associated with the
first thread that happens to report an event. It happens that GDB is
processing events much more often for one of the threads, which
usually will be that same first thread.
Hacking the test with "set debug infrun 1", we see exactly that:
$ grep "infrun.*\[Thread.*," testsuite/gdb.log | sort | uniq -c | sort -nr
70 infrun: 8798 [Thread 8798],
37 infrun: 8798 [Thread 8802],
36 infrun: 8798 [Thread 8804],
36 infrun: 8798 [Thread 8803],
35 infrun: 8798 [Thread 8805],
34 infrun: 8798 [Thread 8806],
The first column shows the number of times the target reported an
event for that thread, from:
infrun: target_wait (-1, status) =
infrun: 8798 [Thread 8798],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
This masks out the PR10116 issue.
However, if the target is better at giving equal priority to all
threads, the PR10116 issue happens often, so it may take quite a while
for the right thread to be the first to report its watchpoint event
just after the memory being watched really changed, resulting in test
time outs.
Here's the number of events handled for each thread on a gdbserver run
with the event fairness patches:
$ grep "infrun.*\[Thread.*," gdb.log | sort | uniq -c
2961 infrun: 13591 [Thread 13591],
2956 infrun: 13591 [Thread 13595],
2941 infrun: 13591 [Thread 13596],
2932 infrun: 13591 [Thread 13597],
2905 infrun: 13591 [Thread 13598],
2891 infrun: 13591 [Thread 13599],
Note how the number of events is much higher. The test routinely
takes over 10 seconds to finish on my machine rather than under a
second as with unpatched gdbserver, when it succeeds, but often it'll
fail with timeouts too.
So to make the test robust, this patch switches the tests to using
"awatch" instead of "watch", as access watchpoints don't care about
the watchpoint's "old value". With this, the test always finishes
quickly, and we can even bump the number of threads concurrently
writting to the shared variable, to have better assurance we're really
testing the case of the "wrong" thread triggering a watchpoint.
Here's the number of events I see for each thread on a run on my
machine, with a gdbserver patched with the event fairness series:
$ grep "infrun.*\[Thread.*," testsuite/gdb.log | sort | uniq -c
5 infrun: 5298 [Thread 5302],
4 infrun: 5298 [Thread 5303],
4 infrun: 5298 [Thread 5304],
4 infrun: 5298 [Thread 5305],
4 infrun: 5298 [Thread 5306],
4 infrun: 5298 [Thread 5307],
4 infrun: 5298 [Thread 5308],
4 infrun: 5298 [Thread 5309],
4 infrun: 5298 [Thread 5310],
4 infrun: 5298 [Thread 5311],
4 infrun: 5298 [Thread 5312],
4 infrun: 5298 [Thread 5313],
4 infrun: 5298 [Thread 5314],
4 infrun: 5298 [Thread 5315],
4 infrun: 5298 [Thread 5316],
gdb/testsuite/
2014-12-26 Pedro Alves <palves@redhat.com>
* gdb.base/annota1.exp (thread_test): Use srcfile and binfile from
the global scope. Set a breakpoint after all threads are started
rather than stepping over two source lines. Expect the prompt.
* gdb.base/watch_thread_num.c (threads_started_barrier): New
global.
(NUM): Now 15.
(main): Use threads_started_barrier to wait for all threads to
start. Main thread no longer calls thread_function. Exit after
180 seconds.
(loop): New function.
(thread_function): Wait on threads_started_barrier barrier. Call
'loop' at each iteration.
* gdb.base/watch_thread_num.exp: Continue to breakpoint after all
threads have started, instead of hardcoding number of "next"
steps. Use an access watchpoint instead of a write watchpoint.
---
gdb/testsuite/gdb.base/annota1.exp | 9 ++++---
gdb/testsuite/gdb.base/watch_thread_num.c | 21 +++++++++++++--
gdb/testsuite/gdb.base/watch_thread_num.exp | 40 ++++++++++++++++++++++-------
3 files changed, 56 insertions(+), 14 deletions(-)
diff --git a/gdb/testsuite/gdb.base/annota1.exp b/gdb/testsuite/gdb.base/annota1.exp
index 98d3b60..03e5bfe 100644
--- a/gdb/testsuite/gdb.base/annota1.exp
+++ b/gdb/testsuite/gdb.base/annota1.exp
@@ -440,7 +440,7 @@ if { [remote_file host exists core] } {
}
proc thread_test {} {
- global subdir srcdir testfile
+ global subdir srcdir testfile srcfile binfile
global gdb_prompt old_gdb_prompt
set srcfile watch_thread_num.c
set binfile [standard_output_file ${testfile}-watch_thread_num]
@@ -457,6 +457,9 @@ proc thread_test {} {
return
}
+ set linenum [gdb_get_line_number "all threads started"]
+ gdb_breakpoint "$linenum"
+
set gdb_prompt \
"\r\n\032\032pre-prompt\r\n$gdb_prompt \r\n\032\032prompt\r\n"
@@ -465,8 +468,8 @@ proc thread_test {} {
}
}
- gdb_test_multiple "next 2" "new thread" {
- -re ".*\032\032new-thread" {
+ gdb_test_multiple "continue" "new thread" {
+ -re "\032\032new-thread.*\r\n$gdb_prompt$" {
pass "new thread"
}
}
diff --git a/gdb/testsuite/gdb.base/watch_thread_num.c b/gdb/testsuite/gdb.base/watch_thread_num.c
index 96ac843..668f868 100644
--- a/gdb/testsuite/gdb.base/watch_thread_num.c
+++ b/gdb/testsuite/gdb.base/watch_thread_num.c
@@ -27,7 +27,11 @@
void *thread_function (void *arg); /* Pointer to function executed by each thread */
-#define NUM 5
+static pthread_barrier_t threads_started_barrier;
+
+#define NUM 15
+
+static int num_threads = NUM;
static unsigned int shared_var = 1;
@@ -37,6 +41,8 @@ int main () {
void *thread_result;
long i;
+ pthread_barrier_init (&threads_started_barrier, NULL, NUM + 1);
+
for (i = 0; i < NUM; i++)
{
res = pthread_create (&threads[i],
@@ -45,18 +51,29 @@ int main () {
(void *) i);
}
- thread_result = thread_function ((void *) i);
+ pthread_barrier_wait (&threads_started_barrier);
+
+ sleep (180); /* all threads started */
exit (EXIT_SUCCESS);
}
+void
+loop (void)
+{
+}
+
void *thread_function (void *arg) {
int my_number = (long) arg;
+
+ pthread_barrier_wait (&threads_started_barrier);
+
/* Don't run forever. Run just short of it :) */
while (shared_var > 0)
{
shared_var++;
usleep (1); /* Loop increment. */
+ loop ();
}
pthread_exit (NULL);
diff --git a/gdb/testsuite/gdb.base/watch_thread_num.exp b/gdb/testsuite/gdb.base/watch_thread_num.exp
index 0e69ae8..fdb3841 100644
--- a/gdb/testsuite/gdb.base/watch_thread_num.exp
+++ b/gdb/testsuite/gdb.base/watch_thread_num.exp
@@ -47,12 +47,14 @@ if { ![runto main] } then {
gdb_test "watch shared_var thread 0" "Unknown thread 0\." "Watchpoint on invalid thread"
gdb_test "watch shared_var thread" "A syntax error in expression, near `thread'\." "Invalid watch syntax"
-gdb_test "Next 5" ".*"
+set bpexitline [gdb_get_line_number "all threads started"]
+gdb_breakpoint "$bpexitline"
+gdb_continue_to_breakpoint "all threads started"
-gdb_test "break thread_function" "Breakpoint \[0-9\].*" \
- "Set breakpoint at thread_function"
+gdb_test "break loop" "Breakpoint \[0-9\].*" \
+ "Set breakpoint at loop"
-gdb_test "continue" ".*Breakpoint 2.*" "Stopped in thread_function"
+gdb_test "continue" ".*Breakpoint .*loop.*" "Stopped in loop"
gdb_test_multiple "thread" "Thread command" {
-re ".*Current thread is (\[0-9\]*).*$gdb_prompt $" {
@@ -62,15 +64,35 @@ gdb_test_multiple "thread" "Thread command" {
set thread_num "$expect_out(1,string)"
-gdb_test_no_output "disable 2" "Disable breakpoint 2"
-gdb_test "watch shared_var thread $thread_num" "Hardware watchpoint 3: shared_var" "Watchpoint on shared variable"
-gdb_test "info breakpoint 3" "stop only in thread $thread_num"
+delete_breakpoints
-for {set i 1} {$i <= 10} {incr i 1} {
+# We use an access watchpoint rather than a write watchpoint, because
+# GDB can drop the latter when multiple threads trigger events
+# simultaneously, on targets with continuable watchpoints, such as
+# x86. See PR breakpoints/10116.
+
+gdb_test "awatch shared_var thread $thread_num" \
+ "Hardware access \\(read/write\\) watchpoint .*: shared_var.*" \
+ "Watchpoint on shared variable"
+
+gdb_test "info breakpoint \$bpnum" \
+ "stop only in thread $thread_num" \
+ "info breakpoint shows watchpoint is thread-specific"
+
+for {set i 1} {$i <= 5} {incr i} {
set watchpoint "Watchpoint triggered iteration $i"
set check "Check thread that triggered iteration $i"
- gdb_test "continue" "Hardware watchpoint 3: shared_var.*" $watchpoint
+ set test $watchpoint
+ gdb_test_multiple "continue" $test {
+ -re "infrun:" {
+ # Avoid timeouts when debugging GDB.
+ exp_continue
+ }
+ -re "Hardware access \\(read/write\\) watchpoint .*: shared_var.*$gdb_prompt $" {
+ pass $test
+ }
+ }
gdb_test "thread" ".*Current thread is $thread_num .*" $check
}
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/8] gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
2014-12-26 20:31 ` [PATCH 2/8] watch_thread_num.exp and targets with fairer event reporting Pedro Alves
@ 2014-12-26 20:31 ` Pedro Alves
2015-01-06 6:20 ` Yao Qi
2014-12-26 20:31 ` [PATCH 3/8] cleanup and speed up (software_)breakpoint_inserted_here_p Pedro Alves
` (6 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:31 UTC (permalink / raw)
To: gdb-patches
These three test all spawn a few threads and then send a SIGSTOP to
their parent GDB in order to pause it while the new threads set things
up for the test. With a GDB patch that changes the inferior thread's
scheduling a bit, I sometimes see:
FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)
...
FAIL: gdb.threads/watchthreads-reorder.exp: reorder1: continue a (timeout)
...
FAIL: gdb.threads/ia64-sigill.exp: continue (timeout)
...
The issue is that the test program stops GDB before it had a chance of
processing the new thread's clone event:
(gdb) PASS: gdb.threads/siginfo-threads.exp: get pid
continue
Continuing.
Stopping GDB PID 21541.
Waiting till the threads initialize their TIDs.
FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)
On Linux (at least), new threads start stopped, and the debugger must
resume them. The fix is to make the test program wait for the new
threads to be running before stopping GDB.
gdb/testsuite/
2014-12-26 Pedro Alves <palves@redhat.com>
* gdb.threads/ia64-sigill.c (threads_started_barrier): New global.
(thread_func): Wait on barrier.
(main): Wait for all threads to start before stopping GDB.
* gdb.threads/siginfo-threads.c (threads_started_barrier): New
global.
(thread1_func, thread2_func): Wait on barrier.
(main): Wait for all threads to start before stopping GDB.
* gdb.threads/watchthreads-reorder.c (threads_started_barrier):
New global.
(thread1_func, thread2_func): Wait on barrier.
(main): Wait for all threads to start before stopping GDB.
---
gdb/testsuite/gdb.threads/ia64-sigill.c | 11 +++++++++++
gdb/testsuite/gdb.threads/siginfo-threads.c | 13 +++++++++++++
gdb/testsuite/gdb.threads/watchthreads-reorder.c | 13 +++++++++++++
3 files changed, 37 insertions(+)
diff --git a/gdb/testsuite/gdb.threads/ia64-sigill.c b/gdb/testsuite/gdb.threads/ia64-sigill.c
index 4833e47..a685fb0 100644
--- a/gdb/testsuite/gdb.threads/ia64-sigill.c
+++ b/gdb/testsuite/gdb.threads/ia64-sigill.c
@@ -44,6 +44,8 @@ static pthread_mutex_t thread2_tid_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_
static pthread_mutex_t terminate_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
+static pthread_barrier_t threads_started_barrier;
+
/* Do not use alarm as it would create a ptrace event which would hang up us if
we are being traced by GDB which we stopped ourselves. */
@@ -78,6 +80,8 @@ thread_func (void *threadno_voidp)
int threadno = (intptr_t) threadno_voidp;
int i;
+ pthread_barrier_wait (&threads_started_barrier);
+
switch (threadno)
{
case 1:
@@ -272,6 +276,8 @@ main (int argc, char **argv)
timed_mutex_lock (&terminate_mutex);
+ pthread_barrier_init (&threads_started_barrier, NULL, 3);
+
i = pthread_create (&thread1, NULL, thread_func, (void *) (intptr_t) 1);
assert (i == 0);
@@ -298,6 +304,11 @@ main (int argc, char **argv)
atexit (cleanup);
+ /* Wait until all threads are seen running. On Linux (at least),
+ new threads start stopped, and the debugger must resume them.
+ Need to wait for that before stopping GDB. */
+ pthread_barrier_wait (&threads_started_barrier);
+
printf ("Stopping GDB PID %lu.\n", (unsigned long) tracer);
if (tracer)
diff --git a/gdb/testsuite/gdb.threads/siginfo-threads.c b/gdb/testsuite/gdb.threads/siginfo-threads.c
index 7aa1ea2..b86df54 100644
--- a/gdb/testsuite/gdb.threads/siginfo-threads.c
+++ b/gdb/testsuite/gdb.threads/siginfo-threads.c
@@ -54,6 +54,8 @@ static int thread2_sigusr2_hit;
static pthread_mutex_t terminate_mutex
= PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
+static pthread_barrier_t threads_started_barrier;
+
/* Do not use alarm as it would create a ptrace event which would hang
us up if we are being traced by GDB, which we stopped
ourselves. */
@@ -128,6 +130,8 @@ thread1_func (void *unused)
{
int i;
+ pthread_barrier_wait (&threads_started_barrier);
+
timed_mutex_lock (&thread1_tid_mutex);
/* THREAD1_TID_MUTEX must be already locked to avoid a race. */
@@ -163,6 +167,8 @@ thread2_func (void *unused)
{
int i;
+ pthread_barrier_wait (&threads_started_barrier);
+
timed_mutex_lock (&thread2_tid_mutex);
/* THREAD2_TID_MUTEX must be already locked to avoid a race. */
@@ -354,6 +360,8 @@ main (int argc, char **argv)
assert_perror (errno);
assert (i == 0);
+ pthread_barrier_init (&threads_started_barrier, NULL, 3);
+
i = pthread_create (&thread1, NULL, thread1_func, NULL);
assert (i == 0);
@@ -380,6 +388,11 @@ main (int argc, char **argv)
atexit (cleanup);
+ /* Wait until all threads are seen running. On Linux (at least),
+ new threads start stopped, and the debugger must resume them.
+ Need to wait for that before stopping GDB. */
+ pthread_barrier_wait (&threads_started_barrier);
+
printf ("Stopping GDB PID %lu.\n", (unsigned long) tracer);
if (tracer)
diff --git a/gdb/testsuite/gdb.threads/watchthreads-reorder.c b/gdb/testsuite/gdb.threads/watchthreads-reorder.c
index 315104a..05b200c 100644
--- a/gdb/testsuite/gdb.threads/watchthreads-reorder.c
+++ b/gdb/testsuite/gdb.threads/watchthreads-reorder.c
@@ -44,6 +44,8 @@ static pthread_mutex_t thread2_tid_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_
static pthread_mutex_t terminate_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
+static pthread_barrier_t threads_started_barrier;
+
/* These variables must have lower in-memory addresses than thread1_rwatch and
thread2_rwatch so that they take their watchpoint slots. */
@@ -87,6 +89,8 @@ thread1_func (void *unused)
int i;
volatile int rwatch_store;
+ pthread_barrier_wait (&threads_started_barrier);
+
timed_mutex_lock (&thread1_tid_mutex);
/* THREAD1_TID_MUTEX must be already locked to avoid race. */
@@ -113,6 +117,8 @@ thread2_func (void *unused)
int i;
volatile int rwatch_store;
+ pthread_barrier_wait (&threads_started_barrier);
+
timed_mutex_lock (&thread2_tid_mutex);
/* THREAD2_TID_MUTEX must be already locked to avoid race. */
@@ -279,6 +285,8 @@ main (int argc, char **argv)
timed_mutex_lock (&terminate_mutex);
+ pthread_barrier_init (&threads_started_barrier, NULL, 3);
+
i = pthread_create (&thread1, NULL, thread1_func, NULL);
assert (i == 0);
@@ -305,6 +313,11 @@ main (int argc, char **argv)
atexit (cleanup);
+ /* Wait until all threads are seen running. On Linux (at least),
+ new threads start stopped, and the debugger must resume them.
+ Need to wait for that before stopping GDB. */
+ pthread_barrier_wait (&threads_started_barrier);
+
printf ("Stopping GDB PID %lu.\n", (unsigned long) tracer);
if (tracer)
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/8] gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB
2014-12-26 20:31 ` [PATCH 1/8] gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB Pedro Alves
@ 2015-01-06 6:20 ` Yao Qi
0 siblings, 0 replies; 23+ messages in thread
From: Yao Qi @ 2015-01-06 6:20 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb-patches
Pedro Alves <palves@redhat.com> writes:
> The issue is that the test program stops GDB before it had a chance of
> processing the new thread's clone event:
>
> (gdb) PASS: gdb.threads/siginfo-threads.exp: get pid
> continue
> Continuing.
> Stopping GDB PID 21541.
> Waiting till the threads initialize their TIDs.
> FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)
>
> On Linux (at least), new threads start stopped, and the debugger must
> resume them. The fix is to make the test program wait for the new
> threads to be running before stopping GDB.
Looks there is a deadlock. main thread sends SIGSTOP to gdb and calls
pthread_cond_wait to wait for being notified by child thread. Child
thread is stopped (should be resumed by gdb) and can't notify main
thread. The patch looks right to me.
--
Yao (齐尧)
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/8] cleanup and speed up (software_)breakpoint_inserted_here_p
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
2014-12-26 20:31 ` [PATCH 2/8] watch_thread_num.exp and targets with fairer event reporting Pedro Alves
2014-12-26 20:31 ` [PATCH 1/8] gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB Pedro Alves
@ 2014-12-26 20:31 ` Pedro Alves
2014-12-26 20:32 ` [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too Pedro Alves
` (5 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:31 UTC (permalink / raw)
To: gdb-patches
Factor out common code, and use the more efficient
ALL_BP_LOCATIONS_AT_ADDR.
gdb/
2014-12-26 Pedro Alves <palves@redhat.com>
* breakpoint.c (bp_location_inserted_here_p): New function,
factored out from ...
(breakpoint_inserted_here_p): ... here. Use
ALL_BP_LOCATIONS_AT_ADDR.
(software_breakpoint_inserted_here_p): Use
bp_location_inserted_here_p and ALL_BP_LOCATIONS_AT_ADDR.
---
gdb/breakpoint.c | 60 +++++++++++++++++++++++++++++++++-----------------------
1 file changed, 35 insertions(+), 25 deletions(-)
diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 8ccaf43..99b9190 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -4226,29 +4226,45 @@ moribund_breakpoint_here_p (struct address_space *aspace, CORE_ADDR pc)
return 0;
}
+/* Returns non-zero iff BL is inserted at PC, in address space
+ ASPACE. */
+
+static int
+bp_location_inserted_here_p (struct bp_location *bl,
+ struct address_space *aspace, CORE_ADDR pc)
+{
+ if (bl->inserted
+ && breakpoint_address_match (bl->pspace->aspace, bl->address,
+ aspace, pc))
+ {
+ if (overlay_debugging
+ && section_is_overlay (bl->section)
+ && !section_is_mapped (bl->section))
+ return 0; /* unmapped overlay -- can't be a match */
+ else
+ return 1;
+ }
+ return 0;
+}
+
/* Returns non-zero iff there's a breakpoint inserted at PC. */
int
breakpoint_inserted_here_p (struct address_space *aspace, CORE_ADDR pc)
{
- struct bp_location *bl, **blp_tmp;
+ struct bp_location **blp, **blp_tmp = NULL;
+ struct bp_location *bl;
- ALL_BP_LOCATIONS (bl, blp_tmp)
+ ALL_BP_LOCATIONS_AT_ADDR (blp, blp_tmp, pc)
{
+ struct bp_location *bl = *blp;
+
if (bl->loc_type != bp_loc_software_breakpoint
&& bl->loc_type != bp_loc_hardware_breakpoint)
continue;
- if (bl->inserted
- && breakpoint_location_address_match (bl, aspace, pc))
- {
- if (overlay_debugging
- && section_is_overlay (bl->section)
- && !section_is_mapped (bl->section))
- continue; /* unmapped overlay -- can't be a match */
- else
- return 1;
- }
+ if (bp_location_inserted_here_p (bl, aspace, pc))
+ return 1;
}
return 0;
}
@@ -4260,24 +4276,18 @@ int
software_breakpoint_inserted_here_p (struct address_space *aspace,
CORE_ADDR pc)
{
- struct bp_location *bl, **blp_tmp;
+ struct bp_location **blp, **blp_tmp = NULL;
+ struct bp_location *bl;
- ALL_BP_LOCATIONS (bl, blp_tmp)
+ ALL_BP_LOCATIONS_AT_ADDR (blp, blp_tmp, pc)
{
+ struct bp_location *bl = *blp;
+
if (bl->loc_type != bp_loc_software_breakpoint)
continue;
- if (bl->inserted
- && breakpoint_address_match (bl->pspace->aspace, bl->address,
- aspace, pc))
- {
- if (overlay_debugging
- && section_is_overlay (bl->section)
- && !section_is_mapped (bl->section))
- continue; /* unmapped overlay -- can't be a match */
- else
- return 1;
- }
+ if (bp_location_inserted_here_p (bl, aspace, pc))
+ return 1;
}
return 0;
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
` (2 preceding siblings ...)
2014-12-26 20:31 ` [PATCH 3/8] cleanup and speed up (software_)breakpoint_inserted_here_p Pedro Alves
@ 2014-12-26 20:32 ` Pedro Alves
2015-01-07 7:06 ` Yao Qi
2014-12-26 20:32 ` [PATCH 5/8] linux-nat.c: always mark execing LWP as resumed Pedro Alves
` (4 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:32 UTC (permalink / raw)
To: gdb-patches
Running the testsuite with a series that reimplements user-visible
all-stop behavior on top of a target running in non-stop mode revealed
problems related to event starvation avoidance.
For example, I see
gdb.threads/signal-while-stepping-over-bp-other-thread.exp failing.
What happens is that GDB core never gets to see the signal event. It
ends up processing the events for the same threads over an over,
because Linux's waitpid(-1, ...) returns that first task in the task
list that has an event, starving threads on the tail of the task list.
So I wrote a non-stop mode test originally inspired by
signal-while-stepping-over-bp-other-thread.exp, to stress this
independently of all-stop on top of non-stop. Fixing it required the
changes described below. The test will be added in a following
commit.
1) linux-nat.c has code in place that picks an event LWP at random out
of all that have had events. This is because on the kernel side,
"waitpid(-1, ...)" just walks the task list linearly looking for the
first that had an event. But, this code is currently only used in
all-stop mode. So with a multi-threaded program that has multiple
events triggering debug events in parallel, GDB ends up starving some
threads.
To make the event randomization work in non-stop mode too, the patch
makes us pull out all the already pending events on the kernel side,
with waitpid, before deciding which LWP to report to the core.
There's some code in linux_wait that takes care of leaving events
pending if they were for LWPs the caller is not interested in. The
patch moves that to linux_nat_filter_event, so that we only have one
place that leaves events pending. With that in place, conceptually,
the flow is simpler and more normalized:
#1 - walk the LWP list looking for an LWP with a pending event to report.
#2 - if no pending event, pull events out of the kernel, and store
them in the LWP structures as pending.
#3- goto #1.
2) Then, currently the event randomization code only considers SIGTRAP
(or trap-like) events. That means that if e.g., have have multiple
threads stepping in parallel that hit a breakpoint that needs stepping
over, and one gets a signal, the signal may end up never getting
processed, because GDB will always be giving priority to the SIGTRAPs.
The patch fixes this by making the randomization code consider all
kinds of pending events.
3) If multiple threads hit a breakpoint, we report one of those, and
"cancel" the others. Cancelling means decrementing the PC, and
discarding the event. If the next time the LWP is resumed the
breakpoint is still installed, the LWP should hit it again, and we'll
report the hit then. The problem I found is that this delays threads
from advancing too much, with the kernel potentially ending up
scheduling the same threads over and over, and others not advancing.
So the patch switches away from cancelling the breakpoints, and
instead remembering that the LWP had stopped for a breakpoint. If on
resume the breakpoint is still installed, we report it. If it's no
longer installed, we discard the pending event then. This is actually
how GDBserver used to handle this before d50171e4 (Teach linux
gdbserver to step-over-breakpoints), but with the difference that back
then we'd delay adjusting the PC until resuming, which made it so that
"info threads" could wrongly see threads with unadjusted PCs.
gdb/
2014-12-26 Pedro Alves <palves@redhat.com>
* breakpoint.c (hardware_breakpoint_inserted_here_p): New
function.
* breakpoint.h (hardware_breakpoint_inserted_here_p): New
declaration.
* linux-nat.c (linux_nat_status_is_event): Move higher up in file.
(linux_resume_one_lwp): Store the thread's PC. Adjust to clear
stop_reason.
(check_stopped_by_watchpoint): New function.
(save_sigtrap): Reimplement.
(linux_nat_stopped_by_watchpoint): Adjust.
(stop_wait_callback): Only call save_sigtrap after storing the
pending status.
(status_callback): If the thread had been stopped for a breakpoint
that has since been removed, discard the event and resume the LWP.
(select_event_lwp_callback): Use lwp_status_pending_p instead of
linux_nat_lp_status_is_event.
(cancel_breakpoint): Rename to ...
(check_stopped_by_breakpoint): ... this. Record whether the LWP
stopped for a software breakpoint or hardware breakpoint.
(select_event_lwp): Only give preference to the stepping LWP in
all-stop mode. Adjust comments.
(stop_and_resume_callback): Remove references to new_pending_p.
(linux_nat_filter_event): Likewise. Leave exit events of the
leader thread pending here. Handle signal short circuiting here.
Only call save_sigtrap after storing the pending waitstatus.
(linux_nat_wait_1): Remove 'retry' label. Remove references to
new_pending. Don't handle leaving events the caller is not
interested in pending here, nor handle signal short-circuiting
here. Also give equal priority to all LWPs that have had events
in non-stop mode. If reporting a software breakpoint event,
unadjust the LWP's PC.
* linux-nat.h (enum lwp_stop_reason): New.
(struct lwp_info) <stop_pc>: New field.
(struct lwp_info) <stopped_by_watchpoint>: Delete field.
(struct lwp_info) <stop_reason>: New field.
* x86-linux-nat.c (x86_linux_prepare_to_resume): Adjust.
---
gdb/breakpoint.c | 23 +++
gdb/breakpoint.h | 5 +
gdb/linux-nat.c | 572 +++++++++++++++++++++++++++-------------------------
gdb/linux-nat.h | 31 ++-
gdb/x86-linux-nat.c | 2 +-
5 files changed, 349 insertions(+), 284 deletions(-)
diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 99b9190..59fb2be 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -4293,6 +4293,29 @@ software_breakpoint_inserted_here_p (struct address_space *aspace,
return 0;
}
+/* See breakpoint.h. */
+
+int
+hardware_breakpoint_inserted_here_p (struct address_space *aspace,
+ CORE_ADDR pc)
+{
+ struct bp_location **blp, **blp_tmp = NULL;
+ struct bp_location *bl;
+
+ ALL_BP_LOCATIONS_AT_ADDR (blp, blp_tmp, pc)
+ {
+ struct bp_location *bl = *blp;
+
+ if (bl->loc_type != bp_loc_hardware_breakpoint)
+ continue;
+
+ if (bp_location_inserted_here_p (bl, aspace, pc))
+ return 1;
+ }
+
+ return 0;
+}
+
int
hardware_watchpoint_inserted_in_range (struct address_space *aspace,
CORE_ADDR addr, ULONGEST len)
diff --git a/gdb/breakpoint.h b/gdb/breakpoint.h
index c383cd4..59887d5 100644
--- a/gdb/breakpoint.h
+++ b/gdb/breakpoint.h
@@ -1130,6 +1130,11 @@ extern int regular_breakpoint_inserted_here_p (struct address_space *,
extern int software_breakpoint_inserted_here_p (struct address_space *,
CORE_ADDR);
+/* Return non-zero iff there is a hardware breakpoint inserted at
+ PC. */
+extern int hardware_breakpoint_inserted_here_p (struct address_space *,
+ CORE_ADDR);
+
/* Check whether any location of BP is inserted at PC. */
extern int breakpoint_has_location_inserted_here (struct breakpoint *bp,
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index c50a59f..a3d4c32 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -279,6 +279,10 @@ static struct lwp_info *find_lwp_pid (ptid_t ptid);
static int lwp_status_pending_p (struct lwp_info *lp);
+static int check_stopped_by_breakpoint (struct lwp_info *lp);
+static int sigtrap_is_event (int status);
+static int (*linux_nat_status_is_event) (int status) = sigtrap_is_event;
+
\f
/* Trivial list manipulation functions to keep track of a list of
new stopped processes. */
@@ -1441,12 +1445,25 @@ linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
ptid_t ptid;
lp->step = step;
+
+ /* stop_pc doubles as the PC the LWP had when it was last resumed.
+ We only presently need that if the LWP is stepped though (to
+ handle the case of stepping a breakpoint instruction). */
+ if (step)
+ {
+ struct regcache *regcache = get_thread_regcache (lp->ptid);
+
+ lp->stop_pc = regcache_read_pc (regcache);
+ }
+ else
+ lp->stop_pc = 0;
+
if (linux_nat_prepare_to_resume != NULL)
linux_nat_prepare_to_resume (lp);
/* Convert to something the lower layer understands. */
ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
linux_ops->to_resume (linux_ops, ptid, step, signo);
- lp->stopped_by_watchpoint = 0;
+ lp->stop_reason = LWP_STOPPED_BY_NO_REASON;
lp->stopped = 0;
registers_changed_ptid (lp->ptid);
}
@@ -2288,24 +2305,21 @@ maybe_clear_ignore_sigint (struct lwp_info *lp)
soon as we see LP stop with a SIGTRAP. If GDB changes the debug
registers meanwhile, we have the cached data we can rely on. */
-static void
-save_sigtrap (struct lwp_info *lp)
+static int
+check_stopped_by_watchpoint (struct lwp_info *lp)
{
struct cleanup *old_chain;
if (linux_ops->to_stopped_by_watchpoint == NULL)
- {
- lp->stopped_by_watchpoint = 0;
- return;
- }
+ return 0;
old_chain = save_inferior_ptid ();
inferior_ptid = lp->ptid;
- lp->stopped_by_watchpoint = linux_ops->to_stopped_by_watchpoint (linux_ops);
-
- if (lp->stopped_by_watchpoint)
+ if (linux_ops->to_stopped_by_watchpoint (linux_ops))
{
+ lp->stop_reason = LWP_STOPPED_BY_WATCHPOINT;
+
if (linux_ops->to_stopped_data_address != NULL)
lp->stopped_data_address_p =
linux_ops->to_stopped_data_address (¤t_target,
@@ -2315,9 +2329,27 @@ save_sigtrap (struct lwp_info *lp)
}
do_cleanups (old_chain);
+
+ return lp->stop_reason == LWP_STOPPED_BY_WATCHPOINT;
+}
+
+/* Called when the LWP stopped for a trap that could be explained by a
+ watchpoint or a breakpoint. */
+
+static void
+save_sigtrap (struct lwp_info *lp)
+{
+ gdb_assert (lp->stop_reason == LWP_STOPPED_BY_NO_REASON);
+ gdb_assert (lp->status != 0);
+
+ if (check_stopped_by_watchpoint (lp))
+ return;
+
+ if (linux_nat_status_is_event (lp->status))
+ check_stopped_by_breakpoint (lp);
}
-/* See save_sigtrap. */
+/* Returns true if the LWP had stopped for a watchpoint. */
static int
linux_nat_stopped_by_watchpoint (struct target_ops *ops)
@@ -2326,7 +2358,7 @@ linux_nat_stopped_by_watchpoint (struct target_ops *ops)
gdb_assert (lp != NULL);
- return lp->stopped_by_watchpoint;
+ return lp->stop_reason == LWP_STOPPED_BY_WATCHPOINT;
}
static int
@@ -2351,8 +2383,6 @@ sigtrap_is_event (int status)
/* SIGTRAP-like events recognizer. */
-static int (*linux_nat_status_is_event) (int status) = sigtrap_is_event;
-
/* Check for SIGTRAP-like events in LP. */
static int
@@ -2422,8 +2452,6 @@ stop_wait_callback (struct lwp_info *lp, void *data)
{
/* The thread was stopped with a signal other than SIGSTOP. */
- save_sigtrap (lp);
-
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"SWC: Pending event %s in %s\n",
@@ -2433,6 +2461,7 @@ stop_wait_callback (struct lwp_info *lp, void *data)
/* Save the sigtrap event. */
lp->status = status;
gdb_assert (lp->signalled);
+ save_sigtrap (lp);
}
else
{
@@ -2453,7 +2482,9 @@ stop_wait_callback (struct lwp_info *lp, void *data)
return 0;
}
-/* Return non-zero if LP has a wait status pending. */
+/* Return non-zero if LP has a wait status pending. Discard the
+ pending event and resume the LWP if the event that originally
+ caused the stop became uninteresting. */
static int
status_callback (struct lwp_info *lp, void *data)
@@ -2463,6 +2494,53 @@ status_callback (struct lwp_info *lp, void *data)
if (!lp->resumed)
return 0;
+ if (lp->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT
+ || lp->stop_reason == LWP_STOPPED_BY_HW_BREAKPOINT)
+ {
+ struct regcache *regcache = get_thread_regcache (lp->ptid);
+ struct gdbarch *gdbarch = get_regcache_arch (regcache);
+ CORE_ADDR pc;
+ int discard = 0;
+
+ gdb_assert (lp->status != 0);
+
+ pc = regcache_read_pc (regcache);
+
+ if (pc != lp->stop_pc)
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: PC of %s changed. was=%s, now=%s\n",
+ target_pid_to_str (lp->ptid),
+ paddress (target_gdbarch (), lp->stop_pc),
+ paddress (target_gdbarch (), pc));
+ discard = 1;
+ }
+ else if (!breakpoint_inserted_here_p (get_regcache_aspace (regcache), pc))
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: previous breakpoint of %s, at %s gone\n",
+ target_pid_to_str (lp->ptid),
+ paddress (target_gdbarch (), lp->stop_pc));
+
+ discard = 1;
+ }
+
+ if (discard)
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: pending event of %s cancelled.\n",
+ target_pid_to_str (lp->ptid));
+
+ lp->status = 0;
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
+ return 0;
+ }
+ return 1;
+ }
+
return lwp_status_pending_p (lp);
}
@@ -2523,16 +2601,19 @@ select_event_lwp_callback (struct lwp_info *lp, void *data)
gdb_assert (selector != NULL);
- /* Select only resumed LWPs that have a SIGTRAP event pending. */
- if (lp->resumed && linux_nat_lp_status_is_event (lp))
+ /* Select only resumed LWPs that have an event pending. */
+ if (lp->resumed && lwp_status_pending_p (lp))
if ((*selector)-- == 0)
return 1;
return 0;
}
+/* Called when the LWP got a signal/trap that could be explained by a
+ software or hardware breakpoint. */
+
static int
-cancel_breakpoint (struct lwp_info *lp)
+check_stopped_by_breakpoint (struct lwp_info *lp)
{
/* Arrange for a breakpoint to be hit again later. We don't keep
the SIGTRAP status and don't forward the SIGTRAP signal to the
@@ -2546,48 +2627,42 @@ cancel_breakpoint (struct lwp_info *lp)
struct regcache *regcache = get_thread_regcache (lp->ptid);
struct gdbarch *gdbarch = get_regcache_arch (regcache);
CORE_ADDR pc;
+ CORE_ADDR sw_bp_pc;
+
+ pc = regcache_read_pc (regcache);
+ sw_bp_pc = pc - target_decr_pc_after_break (gdbarch);
- pc = regcache_read_pc (regcache) - target_decr_pc_after_break (gdbarch);
- if (breakpoint_inserted_here_p (get_regcache_aspace (regcache), pc))
+ if ((!lp->step || lp->stop_pc == sw_bp_pc)
+ && software_breakpoint_inserted_here_p (get_regcache_aspace (regcache),
+ sw_bp_pc))
{
+ /* The LWP was either continued, or stepped a software
+ breakpoint instruction. */
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
- "CB: Push back breakpoint for %s\n",
+ "CB: Push back software breakpoint for %s\n",
target_pid_to_str (lp->ptid));
/* Back up the PC if necessary. */
- if (target_decr_pc_after_break (gdbarch))
- regcache_write_pc (regcache, pc);
+ if (pc != sw_bp_pc)
+ regcache_write_pc (regcache, sw_bp_pc);
+ lp->stop_pc = sw_bp_pc;
+ lp->stop_reason = LWP_STOPPED_BY_SW_BREAKPOINT;
return 1;
}
- return 0;
-}
-
-static int
-cancel_breakpoints_callback (struct lwp_info *lp, void *data)
-{
- struct lwp_info *event_lp = data;
- /* Leave the LWP that has been elected to receive a SIGTRAP alone. */
- if (lp == event_lp)
- return 0;
-
- /* If a LWP other than the LWP that we're reporting an event for has
- hit a GDB breakpoint (as opposed to some random trap signal),
- then just arrange for it to hit it again later. We don't keep
- the SIGTRAP status and don't forward the SIGTRAP signal to the
- LWP. We will handle the current event, eventually we will resume
- all LWPs, and this one will get its breakpoint trap again.
-
- If we do not do this, then we run the risk that the user will
- delete or disable the breakpoint, but the LWP will have already
- tripped on it. */
+ if (hardware_breakpoint_inserted_here_p (get_regcache_aspace (regcache), pc))
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "CB: Push back hardware breakpoint for %s\n",
+ target_pid_to_str (lp->ptid));
- if (linux_nat_lp_status_is_event (lp)
- && cancel_breakpoint (lp))
- /* Throw away the SIGTRAP. */
- lp->status = 0;
+ lp->stop_pc = pc;
+ lp->stop_reason = LWP_STOPPED_BY_HW_BREAKPOINT;
+ return 1;
+ }
return 0;
}
@@ -2599,36 +2674,48 @@ select_event_lwp (ptid_t filter, struct lwp_info **orig_lp, int *status)
{
int num_events = 0;
int random_selector;
- struct lwp_info *event_lp;
+ struct lwp_info *event_lp = NULL;
/* Record the wait status for the original LWP. */
(*orig_lp)->status = *status;
- /* Give preference to any LWP that is being single-stepped. */
- event_lp = iterate_over_lwps (filter,
- select_singlestep_lwp_callback, NULL);
- if (event_lp != NULL)
+ /* In all-stop, give preference to the LWP that is being
+ single-stepped. There will be at most one, and it will be the
+ LWP that the core is most interested in. If we didn't do this,
+ then we'd have to handle pending step SIGTRAPs somehow in case
+ the core later continues the previously-stepped thread, as
+ otherwise we'd report the pending SIGTRAP then, and the core, not
+ having stepped the thread, wouldn't understand what the trap was
+ for, and therefore would report it to the user as a random
+ signal. */
+ if (!non_stop)
{
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "SEL: Select single-step %s\n",
- target_pid_to_str (event_lp->ptid));
+ event_lp = iterate_over_lwps (filter,
+ select_singlestep_lwp_callback, NULL);
+ if (event_lp != NULL)
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SEL: Select single-step %s\n",
+ target_pid_to_str (event_lp->ptid));
+ }
}
- else
+
+ if (event_lp == NULL)
{
- /* No single-stepping LWP. Select one at random, out of those
- which have had SIGTRAP events. */
+ /* Pick one at random, out of those which have had events. */
- /* First see how many SIGTRAP events we have. */
+ /* First see how many events we have. */
iterate_over_lwps (filter, count_events_callback, &num_events);
- /* Now randomly pick a LWP out of those that have had a SIGTRAP. */
+ /* Now randomly pick a LWP out of those that have had
+ events. */
random_selector = (int)
((num_events * (double) rand ()) / (RAND_MAX + 1.0));
if (debug_linux_nat && num_events > 1)
fprintf_unfiltered (gdb_stdlog,
- "SEL: Found %d SIGTRAP events, selecting #%d\n",
+ "SEL: Found %d events, selecting #%d\n",
num_events, random_selector);
event_lp = iterate_over_lwps (filter,
@@ -2662,8 +2749,6 @@ resumed_callback (struct lwp_info *lp, void *data)
static int
stop_and_resume_callback (struct lwp_info *lp, void *data)
{
- int *new_pending_p = data;
-
if (!lp->stopped)
{
ptid_t ptid = lp->ptid;
@@ -2704,8 +2789,6 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
"SARC: not re-resuming LWP %ld "
"(has pending)\n",
ptid_get_lwp (lp->ptid));
- if (new_pending_p)
- *new_pending_p = 1;
}
}
}
@@ -2713,18 +2796,14 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
}
/* Check if we should go on and pass this event to common code.
- Return the affected lwp if we are, or NULL otherwise. If we stop
- all lwps temporarily, we may end up with new pending events in some
- other lwp. In that case set *NEW_PENDING_P to true. */
+ Return the affected lwp if we are, or NULL otherwise. */
static struct lwp_info *
-linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
+linux_nat_filter_event (int lwpid, int status)
{
struct lwp_info *lp;
int event = linux_ptrace_get_extended_event (status);
- *new_pending_p = 0;
-
lp = find_lwp_pid (pid_to_ptid (lwpid));
/* Check for stop events reported by a process we didn't already
@@ -2796,42 +2875,62 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
return NULL;
}
- if (linux_nat_status_is_event (status))
- save_sigtrap (lp);
-
/* Check if the thread has exited. */
- if ((WIFEXITED (status) || WIFSIGNALED (status))
- && num_lwps (ptid_get_pid (lp->ptid)) > 1)
+ if (WIFEXITED (status) || WIFSIGNALED (status))
{
- /* If this is the main thread, we must stop all threads and verify
- if they are still alive. This is because in the nptl thread model
- on Linux 2.4, there is no signal issued for exiting LWPs
- other than the main thread. We only get the main thread exit
- signal once all child threads have already exited. If we
- stop all the threads and use the stop_wait_callback to check
- if they have exited we can determine whether this signal
- should be ignored or whether it means the end of the debugged
- application, regardless of which threading model is being
- used. */
- if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
+ if (num_lwps (ptid_get_pid (lp->ptid)) > 1)
{
- iterate_over_lwps (pid_to_ptid (ptid_get_pid (lp->ptid)),
- stop_and_resume_callback, new_pending_p);
+ /* If this is the main thread, we must stop all threads and
+ verify if they are still alive. This is because in the
+ nptl thread model on Linux 2.4, there is no signal issued
+ for exiting LWPs other than the main thread. We only get
+ the main thread exit signal once all child threads have
+ already exited. If we stop all the threads and use the
+ stop_wait_callback to check if they have exited we can
+ determine whether this signal should be ignored or
+ whether it means the end of the debugged application,
+ regardless of which threading model is being used. */
+ if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
+ {
+ iterate_over_lwps (pid_to_ptid (ptid_get_pid (lp->ptid)),
+ stop_and_resume_callback, NULL);
+ }
+
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "LLW: %s exited.\n",
+ target_pid_to_str (lp->ptid));
+
+ if (num_lwps (ptid_get_pid (lp->ptid)) > 1)
+ {
+ /* If there is at least one more LWP, then the exit signal
+ was not the end of the debugged application and should be
+ ignored. */
+ exit_lwp (lp);
+ return NULL;
+ }
}
+ gdb_assert (lp->resumed);
+
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
- "LLW: %s exited.\n",
- target_pid_to_str (lp->ptid));
+ "Process %ld exited\n",
+ ptid_get_lwp (lp->ptid));
- if (num_lwps (ptid_get_pid (lp->ptid)) > 1)
- {
- /* If there is at least one more LWP, then the exit signal
- was not the end of the debugged application and should be
- ignored. */
- exit_lwp (lp);
- return NULL;
- }
+ /* This was the last lwp in the process. Since events are
+ serialized to GDB core, we may not be able report this one
+ right now, but GDB core and the other target layers will want
+ to be notified about the exit code/signal, leave the status
+ pending for the next time we're able to report it. */
+
+ /* Dead LWP's aren't expected to reported a pending sigstop. */
+ lp->signalled = 0;
+
+ /* Store the pending event in the waitstatus, because
+ W_EXITCODE(0,0) == 0. */
+ store_waitstatus (&lp->waitstatus, status);
+ return lp;
}
/* Check if the current LWP has previously exited. In the nptl
@@ -2913,9 +3012,59 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
return NULL;
}
+ /* Don't report signals that GDB isn't interested in, such as
+ signals that are neither printed nor stopped upon. Stopping all
+ threads can be a bit time-consuming so if we want decent
+ performance with heavily multi-threaded programs, especially when
+ they're using a high frequency timer, we'd better avoid it if we
+ can. */
+ if (WIFSTOPPED (status))
+ {
+ enum gdb_signal signo = gdb_signal_from_host (WSTOPSIG (status));
+
+ if (!non_stop)
+ {
+ /* Only do the below in all-stop, as we currently use SIGSTOP
+ to implement target_stop (see linux_nat_stop) in
+ non-stop. */
+ if (signo == GDB_SIGNAL_INT && signal_pass_state (signo) == 0)
+ {
+ /* If ^C/BREAK is typed at the tty/console, SIGINT gets
+ forwarded to the entire process group, that is, all LWPs
+ will receive it - unless they're using CLONE_THREAD to
+ share signals. Since we only want to report it once, we
+ mark it as ignored for all LWPs except this one. */
+ iterate_over_lwps (pid_to_ptid (ptid_get_pid (lp->ptid)),
+ set_ignore_sigint, NULL);
+ lp->ignore_sigint = 0;
+ }
+ else
+ maybe_clear_ignore_sigint (lp);
+ }
+
+ /* When using hardware single-step, we need to report every signal.
+ Otherwise, signals in pass_mask may be short-circuited. */
+ if (!lp->step
+ && WSTOPSIG (status) && sigismember (&pass_mask, WSTOPSIG (status)))
+ {
+ linux_resume_one_lwp (lp, lp->step, signo);
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "LLW: %s %s, %s (preempt 'handle')\n",
+ lp->step ?
+ "PTRACE_SINGLESTEP" : "PTRACE_CONT",
+ target_pid_to_str (lp->ptid),
+ (signo != GDB_SIGNAL_0
+ ? strsignal (gdb_signal_to_host (signo))
+ : "0"));
+ return NULL;
+ }
+ }
+
/* An interesting event. */
gdb_assert (lp);
lp->status = status;
+ save_sigtrap (lp);
return lp;
}
@@ -3014,9 +3163,6 @@ linux_nat_wait_1 (struct target_ops *ops,
/* Make sure SIGCHLD is blocked until the sigsuspend below. */
block_child_signals (&prev_mask);
-retry:
- status = 0;
-
/* First check if there is a LWP with a wait status pending. */
lp = iterate_over_lwps (ptid, status_callback, NULL);
if (lp != NULL)
@@ -3034,7 +3180,9 @@ retry:
set_sigint_trap ();
}
- /* But if we don't find a pending event, we'll have to wait. */
+ /* But if we don't find a pending event, we'll have to wait. Always
+ pull all events out of the kernel. We'll randomly select an
+ event LWP out of all that have events, to prevent starvation. */
while (lp == NULL)
{
@@ -3065,10 +3213,6 @@ retry:
if (lwpid > 0)
{
- /* If this is true, then we paused LWPs momentarily, and may
- now have pending events to handle. */
- int new_pending;
-
if (debug_linux_nat)
{
fprintf_unfiltered (gdb_stdlog,
@@ -3076,101 +3220,18 @@ retry:
(long) lwpid, status_to_str (status));
}
- lp = linux_nat_filter_event (lwpid, status, &new_pending);
-
- /* STATUS is now no longer valid, use LP->STATUS instead. */
- status = 0;
-
- if (lp && !ptid_match (lp->ptid, ptid))
- {
- gdb_assert (lp->resumed);
-
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "LWP %ld got an event %06x, "
- "leaving pending.\n",
- ptid_get_lwp (lp->ptid), lp->status);
-
- if (WIFSTOPPED (lp->status))
- {
- if (WSTOPSIG (lp->status) != SIGSTOP)
- {
- /* Cancel breakpoint hits. The breakpoint may
- be removed before we fetch events from this
- process to report to the core. It is best
- not to assume the moribund breakpoints
- heuristic always handles these cases --- it
- could be too many events go through to the
- core before this one is handled. All-stop
- always cancels breakpoint hits in all
- threads. */
- if (non_stop
- && linux_nat_lp_status_is_event (lp)
- && cancel_breakpoint (lp))
- {
- /* Throw away the SIGTRAP. */
- lp->status = 0;
-
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "LLW: LWP %ld hit a "
- "breakpoint while "
- "waiting for another "
- "process; "
- "cancelled it\n",
- ptid_get_lwp (lp->ptid));
- }
- }
- else
- lp->signalled = 0;
- }
- else if (WIFEXITED (lp->status) || WIFSIGNALED (lp->status))
- {
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "Process %ld exited while stopping "
- "LWPs\n",
- ptid_get_lwp (lp->ptid));
-
- /* This was the last lwp in the process. Since
- events are serialized to GDB core, and we can't
- report this one right now, but GDB core and the
- other target layers will want to be notified
- about the exit code/signal, leave the status
- pending for the next time we're able to report
- it. */
-
- /* Dead LWP's aren't expected to reported a pending
- sigstop. */
- lp->signalled = 0;
-
- /* Store the pending event in the waitstatus as
- well, because W_EXITCODE(0,0) == 0. */
- store_waitstatus (&lp->waitstatus, lp->status);
- }
-
- /* Keep looking. */
- lp = NULL;
- }
-
- if (new_pending)
- {
- /* Some LWP now has a pending event. Go all the way
- back to check it. */
- goto retry;
- }
-
- if (lp)
- {
- /* We got an event to report to the core. */
- break;
- }
-
+ linux_nat_filter_event (lwpid, status);
/* Retry until nothing comes out of waitpid. A single
SIGCHLD can indicate more than one child stopped. */
continue;
}
+ /* Now that we've pulled all events out of the kernel, check if
+ there's any LWP with a status to report to the core. */
+ lp = iterate_over_lwps (ptid, status_callback, NULL);
+ if (lp != NULL)
+ break;
+
/* Check for zombie thread group leaders. Those can't be reaped
until all other threads in the thread group are. */
check_zombie_leaders ();
@@ -3220,68 +3281,6 @@ retry:
status = lp->status;
lp->status = 0;
- /* Don't report signals that GDB isn't interested in, such as
- signals that are neither printed nor stopped upon. Stopping all
- threads can be a bit time-consuming so if we want decent
- performance with heavily multi-threaded programs, especially when
- they're using a high frequency timer, we'd better avoid it if we
- can. */
-
- if (WIFSTOPPED (status))
- {
- enum gdb_signal signo = gdb_signal_from_host (WSTOPSIG (status));
-
- /* When using hardware single-step, we need to report every signal.
- Otherwise, signals in pass_mask may be short-circuited. */
- if (!lp->step
- && WSTOPSIG (status) && sigismember (&pass_mask, WSTOPSIG (status)))
- {
- /* FIMXE: kettenis/2001-06-06: Should we resume all threads
- here? It is not clear we should. GDB may not expect
- other threads to run. On the other hand, not resuming
- newly attached threads may cause an unwanted delay in
- getting them running. */
- linux_resume_one_lwp (lp, lp->step, signo);
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "LLW: %s %s, %s (preempt 'handle')\n",
- lp->step ?
- "PTRACE_SINGLESTEP" : "PTRACE_CONT",
- target_pid_to_str (lp->ptid),
- (signo != GDB_SIGNAL_0
- ? strsignal (gdb_signal_to_host (signo))
- : "0"));
- goto retry;
- }
-
- if (!non_stop)
- {
- /* Only do the below in all-stop, as we currently use SIGINT
- to implement target_stop (see linux_nat_stop) in
- non-stop. */
- if (signo == GDB_SIGNAL_INT && signal_pass_state (signo) == 0)
- {
- /* If ^C/BREAK is typed at the tty/console, SIGINT gets
- forwarded to the entire process group, that is, all LWPs
- will receive it - unless they're using CLONE_THREAD to
- share signals. Since we only want to report it once, we
- mark it as ignored for all LWPs except this one. */
- iterate_over_lwps (pid_to_ptid (ptid_get_pid (ptid)),
- set_ignore_sigint, NULL);
- lp->ignore_sigint = 0;
- }
- else
- maybe_clear_ignore_sigint (lp);
- }
- }
-
- /* This LWP is stopped now. */
- lp->stopped = 1;
-
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
- status_to_str (status), target_pid_to_str (lp->ptid));
-
if (!non_stop)
{
/* Now stop all other LWP's ... */
@@ -3290,33 +3289,46 @@ retry:
/* ... and wait until all of them have reported back that
they're no longer running. */
iterate_over_lwps (minus_one_ptid, stop_wait_callback, NULL);
+ }
+
+ /* If we're not waiting for a specific LWP, choose an event LWP from
+ among those that have had events. Giving equal priority to all
+ LWPs that have had events helps prevent starvation. */
+ if (ptid_equal (ptid, minus_one_ptid) || ptid_is_pid (ptid))
+ select_event_lwp (ptid, &lp, &status);
+
+ gdb_assert (lp != NULL);
+
+ /* Now that we've selected our final event LWP, un-adjust its PC if
+ it was a software breakpoint. */
+ if (lp->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT)
+ {
+ struct regcache *regcache = get_thread_regcache (lp->ptid);
+ struct gdbarch *gdbarch = get_regcache_arch (regcache);
+ int decr_pc = target_decr_pc_after_break (gdbarch);
- /* If we're not waiting for a specific LWP, choose an event LWP
- from among those that have had events. Giving equal priority
- to all LWPs that have had events helps prevent
- starvation. */
- if (ptid_equal (ptid, minus_one_ptid) || ptid_is_pid (ptid))
- select_event_lwp (ptid, &lp, &status);
+ if (decr_pc != 0)
+ {
+ CORE_ADDR pc;
- /* Now that we've selected our final event LWP, cancel any
- breakpoints in other LWPs that have hit a GDB breakpoint.
- See the comment in cancel_breakpoints_callback to find out
- why. */
- iterate_over_lwps (minus_one_ptid, cancel_breakpoints_callback, lp);
+ pc = regcache_read_pc (regcache);
+ regcache_write_pc (regcache, pc + decr_pc);
+ }
+ }
- /* We'll need this to determine whether to report a SIGSTOP as
- TARGET_WAITKIND_0. Need to take a copy because
- resume_clear_callback clears it. */
- last_resume_kind = lp->last_resume_kind;
+ /* We'll need this to determine whether to report a SIGSTOP as
+ GDB_SIGNAL_0. Need to take a copy because resume_clear_callback
+ clears it. */
+ last_resume_kind = lp->last_resume_kind;
+ if (!non_stop)
+ {
/* In all-stop, from the core's perspective, all LWPs are now
stopped until a new resume action is sent over. */
iterate_over_lwps (minus_one_ptid, resume_clear_callback, NULL);
}
else
{
- /* See above. */
- last_resume_kind = lp->last_resume_kind;
resume_clear_callback (lp, NULL);
}
diff --git a/gdb/linux-nat.h b/gdb/linux-nat.h
index 0aa8377..3e3d806 100644
--- a/gdb/linux-nat.h
+++ b/gdb/linux-nat.h
@@ -23,6 +23,24 @@
struct arch_lwp_info;
+/* Reasons an LWP last stopped. */
+
+enum lwp_stop_reason
+{
+ /* Either not stopped, or stopped for a reason that doesn't require
+ special tracking. */
+ LWP_STOPPED_BY_NO_REASON,
+
+ /* Stopped by a software breakpoint. */
+ LWP_STOPPED_BY_SW_BREAKPOINT,
+
+ /* Stopped by a hardware breakpoint. */
+ LWP_STOPPED_BY_HW_BREAKPOINT,
+
+ /* Stopped by a watchpoint. */
+ LWP_STOPPED_BY_WATCHPOINT
+};
+
/* Structure describing an LWP. This is public only for the purposes
of ALL_LWPS; target-specific code should generally not access it
directly. */
@@ -59,12 +77,19 @@ struct lwp_info
/* If non-zero, a pending wait status. */
int status;
+ /* When 'stopped' is set, this is where the lwp last stopped, with
+ decr_pc_after_break already accounted for. If the LWP is
+ running, and stepping, this is the address at which the lwp was
+ resumed (that is, it's the previous stop PC). If the LWP is
+ running and not stepping, this is 0. */
+ CORE_ADDR stop_pc;
+
/* Non-zero if we were stepping this LWP. */
int step;
- /* STOPPED_BY_WATCHPOINT is non-zero if this LWP stopped with a data
- watchpoint trap. */
- int stopped_by_watchpoint;
+ /* The reason the LWP last stopped, if we need to track it
+ (breakpoint, watchpoint, etc.) */
+ enum lwp_stop_reason stop_reason;
/* On architectures where it is possible to know the data address of
a triggered watchpoint, STOPPED_DATA_ADDRESS_P is non-zero, and
diff --git a/gdb/x86-linux-nat.c b/gdb/x86-linux-nat.c
index b2141eb..752800a 100644
--- a/gdb/x86-linux-nat.c
+++ b/gdb/x86-linux-nat.c
@@ -214,7 +214,7 @@ x86_linux_prepare_to_resume (struct lwp_info *lwp)
lwp->arch_private->debug_registers_changed = 0;
}
- if (clear_status || lwp->stopped_by_watchpoint)
+ if (clear_status || lwp->stop_reason == LWP_STOPPED_BY_WATCHPOINT)
x86_linux_dr_set (lwp->ptid, DR_STATUS, 0);
}
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too
2014-12-26 20:32 ` [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too Pedro Alves
@ 2015-01-07 7:06 ` Yao Qi
2015-01-07 13:22 ` Pedro Alves
0 siblings, 1 reply; 23+ messages in thread
From: Yao Qi @ 2015-01-07 7:06 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb-patches
Pedro Alves <palves@redhat.com> writes:
> 3) If multiple threads hit a breakpoint, we report one of those, and
> "cancel" the others. Cancelling means decrementing the PC, and
> discarding the event. If the next time the LWP is resumed the
> breakpoint is still installed, the LWP should hit it again, and we'll
> report the hit then. The problem I found is that this delays threads
> from advancing too much, with the kernel potentially ending up
> scheduling the same threads over and over, and others not advancing.
Sorry, I don't fully understand what is the problem. Can you elaborate?
Is it something like GDB gets an event of thread foo, and events from
other threads aren't interesting. GDB cancels them, but when GDB
resumes them, kernel always schedules thread foo.
> So the patch switches away from cancelling the breakpoints, and
> instead remembering that the LWP had stopped for a breakpoint. If on
> resume the breakpoint is still installed, we report it. If it's no
> longer installed, we discard the pending event then. This is actually
Does "breakpoint is installed" indicate "target is still running"? then,
we report the event. Otherwise, "target is stopped", we discard the
pending event. Is that correct?
> @@ -2453,7 +2482,9 @@ stop_wait_callback (struct lwp_info *lp, void *data)
> return 0;
> }
>
> -/* Return non-zero if LP has a wait status pending. */
> +/* Return non-zero if LP has a wait status pending. Discard the
> + pending event and resume the LWP if the event that originally
> + caused the stop became uninteresting. */
>
> static int
> status_callback (struct lwp_info *lp, void *data)
Probably, we should rename status_callback to something more meaningful.
> @@ -2523,16 +2601,19 @@ select_event_lwp_callback (struct lwp_info *lp, void *data)
>
> gdb_assert (selector != NULL);
>
> - /* Select only resumed LWPs that have a SIGTRAP event pending. */
> - if (lp->resumed && linux_nat_lp_status_is_event (lp))
> + /* Select only resumed LWPs that have an event pending. */
> + if (lp->resumed && lwp_status_pending_p (lp))
> if ((*selector)-- == 0)
> return 1;
Do we need to do the same change in count_events_callback? IMO, the
condition in both count_events_callback and select_event_lwp_callback
should be consistent.
--
Yao (齐尧)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too
2015-01-07 7:06 ` Yao Qi
@ 2015-01-07 13:22 ` Pedro Alves
2015-01-07 14:08 ` Yao Qi
0 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2015-01-07 13:22 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb-patches
On 01/07/2015 07:06 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
>
>> 3) If multiple threads hit a breakpoint, we report one of those, and
>> "cancel" the others. Cancelling means decrementing the PC, and
>> discarding the event. If the next time the LWP is resumed the
>> breakpoint is still installed, the LWP should hit it again, and we'll
>> report the hit then. The problem I found is that this delays threads
>> from advancing too much, with the kernel potentially ending up
>> scheduling the same threads over and over, and others not advancing.
>
> Sorry, I don't fully understand what is the problem. Can you elaborate?
> Is it something like GDB gets an event of thread foo, and events from
> other threads aren't interesting. GDB cancels them, but when GDB
> resumes them, kernel always schedules thread foo.
Yeah. With all-stop-on-top-of-non-stop I saw several different
code paths/algorithms leading to some threads being starved
(resulting in occasional test time outs).
But it's visible with pure non-stop too. I think the simplest was around
something like this. Say you have 10 threads, A through J, all stepping the same
code in parallel. There's a breakpoint in that code. In the test added by
the series it's a step-resume; with software single-step it's even worse. This is
easier to trigger the more threads you have, naturally. With just the
the event-randomization-on-non-stop-too part, still canceling breakpoint
hits instead of leaving them pending, I saw things like:
#1 - Threads A and B hit breakpoint.
#2 - linux_nat_wait decides to report the event for thread A, and cancel
the other.
#3 - GDB core decides the breakpoint doesn't cause a stop, and displaced steps it.
#4 - GDB re-sets thread B stepping (resume_stopped_resumed_lwps).
#5 - Meanwhile threads D and E hit the breakpoint. linux_nat_wait decides to report
the event for thread D, and cancel the other.
#6 - GDB core decides the breakpoint doesn't cause a stop, and displaced steps it.
#7 - GDB re-sets thread E stepping (resume_stopped_resumed_lwps).
#8 - Threads F, B hit breakpoint. linux_nat_wait decides to report
the event for thread F, and cancel the other, even though it's
the second time B is hitting the breakpoint already.
etc, etc. In all that, the event for B keeps getting delayed.
After the patch, the event for B in #1 is handled before
the events for D and E are.
>
>> So the patch switches away from cancelling the breakpoints, and
>> instead remembering that the LWP had stopped for a breakpoint. If on
>> resume the breakpoint is still installed, we report it. If it's no
>> longer installed, we discard the pending event then. This is actually
>
> Does "breakpoint is installed" indicate "target is still running"? then,
> we report the event. Otherwise, "target is stopped", we discard the
> pending event. Is that correct?
No. It really means "if the breakpoint is installed on the target".
It's this change here:
@@ -2486,6 +2517,53 @@ status_callback (struct lwp_info *lp, void *data)
if (!lp->resumed)
return 0;
+ if (lp->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT
+ || lp->stop_reason == LWP_STOPPED_BY_HW_BREAKPOINT)
+ {
+ struct regcache *regcache = get_thread_regcache (lp->ptid);
+ struct gdbarch *gdbarch = get_regcache_arch (regcache);
+ CORE_ADDR pc;
+ int discard = 0;
+
+ gdb_assert (lp->status != 0);
+
+ pc = regcache_read_pc (regcache);
+
+ if (pc != lp->stop_pc)
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: PC of %s changed. was=%s, now=%s\n",
+ target_pid_to_str (lp->ptid),
+ paddress (target_gdbarch (), lp->stop_pc),
+ paddress (target_gdbarch (), pc));
+ discard = 1;
+ }
+ else if (!breakpoint_inserted_here_p (get_regcache_aspace (regcache), pc))
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: previous breakpoint of %s, at %s gone\n",
+ target_pid_to_str (lp->ptid),
+ paddress (target_gdbarch (), lp->stop_pc));
+
+ discard = 1;
+ }
That is, the LWP had previously stopped for a breakpoint, and we
left that event pending. But if the breakpoint is now gone,
GDB core isn't interested in it anymore, so we just discard
that event.
>
>> @@ -2453,7 +2482,9 @@ stop_wait_callback (struct lwp_info *lp, void *data)
>> return 0;
>> }
>>
>> -/* Return non-zero if LP has a wait status pending. */
>> +/* Return non-zero if LP has a wait status pending. Discard the
>> + pending event and resume the LWP if the event that originally
>> + caused the stop became uninteresting. */
>>
>> static int
>> status_callback (struct lwp_info *lp, void *data)
>
> Probably, we should rename status_callback to something more meaningful.
Possibly, though I feel that that would be a bit orthogonal to this
patch, since the meaning of the function is still the same.
>
>> @@ -2523,16 +2601,19 @@ select_event_lwp_callback (struct lwp_info *lp, void *data)
>>
>> gdb_assert (selector != NULL);
>>
>> - /* Select only resumed LWPs that have a SIGTRAP event pending. */
>> - if (lp->resumed && linux_nat_lp_status_is_event (lp))
>> + /* Select only resumed LWPs that have an event pending. */
>> + if (lp->resumed && lwp_status_pending_p (lp))
>> if ((*selector)-- == 0)
>> return 1;
>
> Do we need to do the same change in count_events_callback? IMO, the
> condition in both count_events_callback and select_event_lwp_callback
> should be consistent.
Eh! I could bet I had done that. I think I even pointed out
the same thing in a recent review. :-)
Here's the updated patch.
----
From 2d1e9538fde6000ac9dd4ac980a57b2951d140a9 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 7 Jan 2015 12:48:32 +0000
Subject: [PATCH] linux-nat.c: better starvation avoidance, handle non-stop
mode too
Running the testsuite with a series that reimplements user-visible
all-stop behavior on top of a target running in non-stop mode revealed
problems related to event starvation avoidance.
For example, I see
gdb.threads/signal-while-stepping-over-bp-other-thread.exp failing.
What happens is that GDB core never gets to see the signal event. It
ends up processing the events for the same threads over an over,
because Linux's waitpid(-1, ...) returns that first task in the task
list that has an event, starving threads on the tail of the task list.
So I wrote a non-stop mode test originally inspired by
signal-while-stepping-over-bp-other-thread.exp, to stress this
independently of all-stop on top of non-stop. Fixing it required the
changes described below. The test will be added in a following
commit.
1) linux-nat.c has code in place that picks an event LWP at random out
of all that have had events. This is because on the kernel side,
"waitpid(-1, ...)" just walks the task list linearly looking for the
first that had an event. But, this code is currently only used in
all-stop mode. So with a multi-threaded program that has multiple
events triggering debug events in parallel, GDB ends up starving some
threads.
To make the event randomization work in non-stop mode too, the patch
makes us pull out all the already pending events on the kernel side,
with waitpid, before deciding which LWP to report to the core.
There's some code in linux_wait that takes care of leaving events
pending if they were for LWPs the caller is not interested in. The
patch moves that to linux_nat_filter_event, so that we only have one
place that leaves events pending. With that in place, conceptually,
the flow is simpler and more normalized:
#1 - walk the LWP list looking for an LWP with a pending event to report.
#2 - if no pending event, pull events out of the kernel, and store
them in the LWP structures as pending.
#3- goto #1.
2) Then, currently the event randomization code only considers SIGTRAP
(or trap-like) events. That means that if e.g., have have multiple
threads stepping in parallel that hit a breakpoint that needs stepping
over, and one gets a signal, the signal may end up never getting
processed, because GDB will always be giving priority to the SIGTRAPs.
The patch fixes this by making the randomization code consider all
kinds of pending events.
3) If multiple threads hit a breakpoint, we report one of those, and
"cancel" the others. Cancelling means decrementing the PC, and
discarding the event. If the next time the LWP is resumed the
breakpoint is still installed, the LWP should hit it again, and we'll
report the hit then. The problem I found is that this delays threads
from advancing too much, with the kernel potentially ending up
scheduling the same threads over and over, and others not advancing.
So the patch switches away from cancelling the breakpoints, and
instead remembering that the LWP had stopped for a breakpoint. If on
resume the breakpoint is still installed, we report it. If it's no
longer installed, we discard the pending event then. This is actually
how GDBserver used to handle this before d50171e4 (Teach linux
gdbserver to step-over-breakpoints), but with the difference that back
then we'd delay adjusting the PC until resuming, which made it so that
"info threads" could wrongly see threads with unadjusted PCs.
gdb/
2015-01-07 Pedro Alves <palves@redhat.com>
* breakpoint.c (hardware_breakpoint_inserted_here_p): New
function.
* breakpoint.h (hardware_breakpoint_inserted_here_p): New
declaration.
* linux-nat.c (linux_nat_status_is_event): Move higher up in file.
(linux_resume_one_lwp): Store the thread's PC. Adjust to clear
stop_reason.
(check_stopped_by_watchpoint): New function.
(save_sigtrap): Reimplement.
(linux_nat_stopped_by_watchpoint): Adjust.
(linux_nat_lp_status_is_event): Delete.
(stop_wait_callback): Only call save_sigtrap after storing the
pending status.
(status_callback): If the thread had been stopped for a breakpoint
that has since been removed, discard the event and resume the LWP.
(count_events_callback, select_event_lwp_callback): Use
lwp_status_pending_p instead of linux_nat_lp_status_is_event.
(cancel_breakpoint): Rename to ...
(check_stopped_by_breakpoint): ... this. Record whether the LWP
stopped for a software breakpoint or hardware breakpoint.
(select_event_lwp): Only give preference to the stepping LWP in
all-stop mode. Adjust comments.
(stop_and_resume_callback): Remove references to new_pending_p.
(linux_nat_filter_event): Likewise. Leave exit events of the
leader thread pending here. Handle signal short circuiting here.
Only call save_sigtrap after storing the pending waitstatus.
(linux_nat_wait_1): Remove 'retry' label. Remove references to
new_pending. Don't handle leaving events the caller is not
interested in pending here, nor handle signal short-circuiting
here. Also give equal priority to all LWPs that have had events
in non-stop mode. If reporting a software breakpoint event,
unadjust the LWP's PC.
* linux-nat.h (enum lwp_stop_reason): New.
(struct lwp_info) <stop_pc>: New field.
(struct lwp_info) <stopped_by_watchpoint>: Delete field.
(struct lwp_info) <stop_reason>: New field.
* x86-linux-nat.c (x86_linux_prepare_to_resume): Adjust.
---
gdb/breakpoint.c | 23 ++
gdb/breakpoint.h | 5 +
gdb/linux-nat.c | 592 ++++++++++++++++++++++++++--------------------------
gdb/linux-nat.h | 31 ++-
gdb/x86-linux-nat.c | 2 +-
5 files changed, 351 insertions(+), 302 deletions(-)
diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 99b9190..59fb2be 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -4293,6 +4293,29 @@ software_breakpoint_inserted_here_p (struct address_space *aspace,
return 0;
}
+/* See breakpoint.h. */
+
+int
+hardware_breakpoint_inserted_here_p (struct address_space *aspace,
+ CORE_ADDR pc)
+{
+ struct bp_location **blp, **blp_tmp = NULL;
+ struct bp_location *bl;
+
+ ALL_BP_LOCATIONS_AT_ADDR (blp, blp_tmp, pc)
+ {
+ struct bp_location *bl = *blp;
+
+ if (bl->loc_type != bp_loc_hardware_breakpoint)
+ continue;
+
+ if (bp_location_inserted_here_p (bl, aspace, pc))
+ return 1;
+ }
+
+ return 0;
+}
+
int
hardware_watchpoint_inserted_in_range (struct address_space *aspace,
CORE_ADDR addr, ULONGEST len)
diff --git a/gdb/breakpoint.h b/gdb/breakpoint.h
index c383cd4..59887d5 100644
--- a/gdb/breakpoint.h
+++ b/gdb/breakpoint.h
@@ -1130,6 +1130,11 @@ extern int regular_breakpoint_inserted_here_p (struct address_space *,
extern int software_breakpoint_inserted_here_p (struct address_space *,
CORE_ADDR);
+/* Return non-zero iff there is a hardware breakpoint inserted at
+ PC. */
+extern int hardware_breakpoint_inserted_here_p (struct address_space *,
+ CORE_ADDR);
+
/* Check whether any location of BP is inserted at PC. */
extern int breakpoint_has_location_inserted_here (struct breakpoint *bp,
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index b558648..9ce6e8a 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -279,6 +279,10 @@ static struct lwp_info *find_lwp_pid (ptid_t ptid);
static int lwp_status_pending_p (struct lwp_info *lp);
+static int check_stopped_by_breakpoint (struct lwp_info *lp);
+static int sigtrap_is_event (int status);
+static int (*linux_nat_status_is_event) (int status) = sigtrap_is_event;
+
\f
/* Trivial list manipulation functions to keep track of a list of
new stopped processes. */
@@ -1464,12 +1468,25 @@ linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
ptid_t ptid;
lp->step = step;
+
+ /* stop_pc doubles as the PC the LWP had when it was last resumed.
+ We only presently need that if the LWP is stepped though (to
+ handle the case of stepping a breakpoint instruction). */
+ if (step)
+ {
+ struct regcache *regcache = get_thread_regcache (lp->ptid);
+
+ lp->stop_pc = regcache_read_pc (regcache);
+ }
+ else
+ lp->stop_pc = 0;
+
if (linux_nat_prepare_to_resume != NULL)
linux_nat_prepare_to_resume (lp);
/* Convert to something the lower layer understands. */
ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
linux_ops->to_resume (linux_ops, ptid, step, signo);
- lp->stopped_by_watchpoint = 0;
+ lp->stop_reason = LWP_STOPPED_BY_NO_REASON;
lp->stopped = 0;
registers_changed_ptid (lp->ptid);
}
@@ -2311,24 +2328,21 @@ maybe_clear_ignore_sigint (struct lwp_info *lp)
soon as we see LP stop with a SIGTRAP. If GDB changes the debug
registers meanwhile, we have the cached data we can rely on. */
-static void
-save_sigtrap (struct lwp_info *lp)
+static int
+check_stopped_by_watchpoint (struct lwp_info *lp)
{
struct cleanup *old_chain;
if (linux_ops->to_stopped_by_watchpoint == NULL)
- {
- lp->stopped_by_watchpoint = 0;
- return;
- }
+ return 0;
old_chain = save_inferior_ptid ();
inferior_ptid = lp->ptid;
- lp->stopped_by_watchpoint = linux_ops->to_stopped_by_watchpoint (linux_ops);
-
- if (lp->stopped_by_watchpoint)
+ if (linux_ops->to_stopped_by_watchpoint (linux_ops))
{
+ lp->stop_reason = LWP_STOPPED_BY_WATCHPOINT;
+
if (linux_ops->to_stopped_data_address != NULL)
lp->stopped_data_address_p =
linux_ops->to_stopped_data_address (¤t_target,
@@ -2338,9 +2352,27 @@ save_sigtrap (struct lwp_info *lp)
}
do_cleanups (old_chain);
+
+ return lp->stop_reason == LWP_STOPPED_BY_WATCHPOINT;
}
-/* See save_sigtrap. */
+/* Called when the LWP stopped for a trap that could be explained by a
+ watchpoint or a breakpoint. */
+
+static void
+save_sigtrap (struct lwp_info *lp)
+{
+ gdb_assert (lp->stop_reason == LWP_STOPPED_BY_NO_REASON);
+ gdb_assert (lp->status != 0);
+
+ if (check_stopped_by_watchpoint (lp))
+ return;
+
+ if (linux_nat_status_is_event (lp->status))
+ check_stopped_by_breakpoint (lp);
+}
+
+/* Returns true if the LWP had stopped for a watchpoint. */
static int
linux_nat_stopped_by_watchpoint (struct target_ops *ops)
@@ -2349,7 +2381,7 @@ linux_nat_stopped_by_watchpoint (struct target_ops *ops)
gdb_assert (lp != NULL);
- return lp->stopped_by_watchpoint;
+ return lp->stop_reason == LWP_STOPPED_BY_WATCHPOINT;
}
static int
@@ -2372,24 +2404,6 @@ sigtrap_is_event (int status)
return WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP;
}
-/* SIGTRAP-like events recognizer. */
-
-static int (*linux_nat_status_is_event) (int status) = sigtrap_is_event;
-
-/* Check for SIGTRAP-like events in LP. */
-
-static int
-linux_nat_lp_status_is_event (struct lwp_info *lp)
-{
- /* We check for lp->waitstatus in addition to lp->status, because we can
- have pending process exits recorded in lp->status
- and W_EXITCODE(0,0) == 0. We should probably have an additional
- lp->status_p flag. */
-
- return (lp->waitstatus.kind == TARGET_WAITKIND_IGNORE
- && linux_nat_status_is_event (lp->status));
-}
-
/* Set alternative SIGTRAP-like events recognizer. If
breakpoint_inserted_here_p there then gdbarch_decr_pc_after_break will be
applied. */
@@ -2445,8 +2459,6 @@ stop_wait_callback (struct lwp_info *lp, void *data)
{
/* The thread was stopped with a signal other than SIGSTOP. */
- save_sigtrap (lp);
-
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"SWC: Pending event %s in %s\n",
@@ -2456,6 +2468,7 @@ stop_wait_callback (struct lwp_info *lp, void *data)
/* Save the sigtrap event. */
lp->status = status;
gdb_assert (lp->signalled);
+ save_sigtrap (lp);
}
else
{
@@ -2476,7 +2489,9 @@ stop_wait_callback (struct lwp_info *lp, void *data)
return 0;
}
-/* Return non-zero if LP has a wait status pending. */
+/* Return non-zero if LP has a wait status pending. Discard the
+ pending event and resume the LWP if the event that originally
+ caused the stop became uninteresting. */
static int
status_callback (struct lwp_info *lp, void *data)
@@ -2486,6 +2501,53 @@ status_callback (struct lwp_info *lp, void *data)
if (!lp->resumed)
return 0;
+ if (lp->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT
+ || lp->stop_reason == LWP_STOPPED_BY_HW_BREAKPOINT)
+ {
+ struct regcache *regcache = get_thread_regcache (lp->ptid);
+ struct gdbarch *gdbarch = get_regcache_arch (regcache);
+ CORE_ADDR pc;
+ int discard = 0;
+
+ gdb_assert (lp->status != 0);
+
+ pc = regcache_read_pc (regcache);
+
+ if (pc != lp->stop_pc)
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: PC of %s changed. was=%s, now=%s\n",
+ target_pid_to_str (lp->ptid),
+ paddress (target_gdbarch (), lp->stop_pc),
+ paddress (target_gdbarch (), pc));
+ discard = 1;
+ }
+ else if (!breakpoint_inserted_here_p (get_regcache_aspace (regcache), pc))
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: previous breakpoint of %s, at %s gone\n",
+ target_pid_to_str (lp->ptid),
+ paddress (target_gdbarch (), lp->stop_pc));
+
+ discard = 1;
+ }
+
+ if (discard)
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SC: pending event of %s cancelled.\n",
+ target_pid_to_str (lp->ptid));
+
+ lp->status = 0;
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
+ return 0;
+ }
+ return 1;
+ }
+
return lwp_status_pending_p (lp);
}
@@ -2507,8 +2569,8 @@ count_events_callback (struct lwp_info *lp, void *data)
gdb_assert (count != NULL);
- /* Count only resumed LWPs that have a SIGTRAP event pending. */
- if (lp->resumed && linux_nat_lp_status_is_event (lp))
+ /* Select only resumed LWPs that have an event pending. */
+ if (lp->resumed && lwp_status_pending_p (lp))
(*count)++;
return 0;
@@ -2546,16 +2608,19 @@ select_event_lwp_callback (struct lwp_info *lp, void *data)
gdb_assert (selector != NULL);
- /* Select only resumed LWPs that have a SIGTRAP event pending. */
- if (lp->resumed && linux_nat_lp_status_is_event (lp))
+ /* Select only resumed LWPs that have an event pending. */
+ if (lp->resumed && lwp_status_pending_p (lp))
if ((*selector)-- == 0)
return 1;
return 0;
}
+/* Called when the LWP got a signal/trap that could be explained by a
+ software or hardware breakpoint. */
+
static int
-cancel_breakpoint (struct lwp_info *lp)
+check_stopped_by_breakpoint (struct lwp_info *lp)
{
/* Arrange for a breakpoint to be hit again later. We don't keep
the SIGTRAP status and don't forward the SIGTRAP signal to the
@@ -2569,48 +2634,42 @@ cancel_breakpoint (struct lwp_info *lp)
struct regcache *regcache = get_thread_regcache (lp->ptid);
struct gdbarch *gdbarch = get_regcache_arch (regcache);
CORE_ADDR pc;
+ CORE_ADDR sw_bp_pc;
- pc = regcache_read_pc (regcache) - target_decr_pc_after_break (gdbarch);
- if (breakpoint_inserted_here_p (get_regcache_aspace (regcache), pc))
+ pc = regcache_read_pc (regcache);
+ sw_bp_pc = pc - target_decr_pc_after_break (gdbarch);
+
+ if ((!lp->step || lp->stop_pc == sw_bp_pc)
+ && software_breakpoint_inserted_here_p (get_regcache_aspace (regcache),
+ sw_bp_pc))
{
+ /* The LWP was either continued, or stepped a software
+ breakpoint instruction. */
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
- "CB: Push back breakpoint for %s\n",
+ "CB: Push back software breakpoint for %s\n",
target_pid_to_str (lp->ptid));
/* Back up the PC if necessary. */
- if (target_decr_pc_after_break (gdbarch))
- regcache_write_pc (regcache, pc);
+ if (pc != sw_bp_pc)
+ regcache_write_pc (regcache, sw_bp_pc);
+ lp->stop_pc = sw_bp_pc;
+ lp->stop_reason = LWP_STOPPED_BY_SW_BREAKPOINT;
return 1;
}
- return 0;
-}
-
-static int
-cancel_breakpoints_callback (struct lwp_info *lp, void *data)
-{
- struct lwp_info *event_lp = data;
- /* Leave the LWP that has been elected to receive a SIGTRAP alone. */
- if (lp == event_lp)
- return 0;
-
- /* If a LWP other than the LWP that we're reporting an event for has
- hit a GDB breakpoint (as opposed to some random trap signal),
- then just arrange for it to hit it again later. We don't keep
- the SIGTRAP status and don't forward the SIGTRAP signal to the
- LWP. We will handle the current event, eventually we will resume
- all LWPs, and this one will get its breakpoint trap again.
-
- If we do not do this, then we run the risk that the user will
- delete or disable the breakpoint, but the LWP will have already
- tripped on it. */
+ if (hardware_breakpoint_inserted_here_p (get_regcache_aspace (regcache), pc))
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "CB: Push back hardware breakpoint for %s\n",
+ target_pid_to_str (lp->ptid));
- if (linux_nat_lp_status_is_event (lp)
- && cancel_breakpoint (lp))
- /* Throw away the SIGTRAP. */
- lp->status = 0;
+ lp->stop_pc = pc;
+ lp->stop_reason = LWP_STOPPED_BY_HW_BREAKPOINT;
+ return 1;
+ }
return 0;
}
@@ -2622,36 +2681,48 @@ select_event_lwp (ptid_t filter, struct lwp_info **orig_lp, int *status)
{
int num_events = 0;
int random_selector;
- struct lwp_info *event_lp;
+ struct lwp_info *event_lp = NULL;
/* Record the wait status for the original LWP. */
(*orig_lp)->status = *status;
- /* Give preference to any LWP that is being single-stepped. */
- event_lp = iterate_over_lwps (filter,
- select_singlestep_lwp_callback, NULL);
- if (event_lp != NULL)
+ /* In all-stop, give preference to the LWP that is being
+ single-stepped. There will be at most one, and it will be the
+ LWP that the core is most interested in. If we didn't do this,
+ then we'd have to handle pending step SIGTRAPs somehow in case
+ the core later continues the previously-stepped thread, as
+ otherwise we'd report the pending SIGTRAP then, and the core, not
+ having stepped the thread, wouldn't understand what the trap was
+ for, and therefore would report it to the user as a random
+ signal. */
+ if (!non_stop)
{
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "SEL: Select single-step %s\n",
- target_pid_to_str (event_lp->ptid));
+ event_lp = iterate_over_lwps (filter,
+ select_singlestep_lwp_callback, NULL);
+ if (event_lp != NULL)
+ {
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "SEL: Select single-step %s\n",
+ target_pid_to_str (event_lp->ptid));
+ }
}
- else
+
+ if (event_lp == NULL)
{
- /* No single-stepping LWP. Select one at random, out of those
- which have had SIGTRAP events. */
+ /* Pick one at random, out of those which have had events. */
- /* First see how many SIGTRAP events we have. */
+ /* First see how many events we have. */
iterate_over_lwps (filter, count_events_callback, &num_events);
- /* Now randomly pick a LWP out of those that have had a SIGTRAP. */
+ /* Now randomly pick a LWP out of those that have had
+ events. */
random_selector = (int)
((num_events * (double) rand ()) / (RAND_MAX + 1.0));
if (debug_linux_nat && num_events > 1)
fprintf_unfiltered (gdb_stdlog,
- "SEL: Found %d SIGTRAP events, selecting #%d\n",
+ "SEL: Found %d events, selecting #%d\n",
num_events, random_selector);
event_lp = iterate_over_lwps (filter,
@@ -2685,8 +2756,6 @@ resumed_callback (struct lwp_info *lp, void *data)
static int
stop_and_resume_callback (struct lwp_info *lp, void *data)
{
- int *new_pending_p = data;
-
if (!lp->stopped)
{
ptid_t ptid = lp->ptid;
@@ -2727,8 +2796,6 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
"SARC: not re-resuming LWP %ld "
"(has pending)\n",
ptid_get_lwp (lp->ptid));
- if (new_pending_p)
- *new_pending_p = 1;
}
}
}
@@ -2736,18 +2803,14 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
}
/* Check if we should go on and pass this event to common code.
- Return the affected lwp if we are, or NULL otherwise. If we stop
- all lwps temporarily, we may end up with new pending events in some
- other lwp. In that case set *NEW_PENDING_P to true. */
+ Return the affected lwp if we are, or NULL otherwise. */
static struct lwp_info *
-linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
+linux_nat_filter_event (int lwpid, int status)
{
struct lwp_info *lp;
int event = linux_ptrace_get_extended_event (status);
- *new_pending_p = 0;
-
lp = find_lwp_pid (pid_to_ptid (lwpid));
/* Check for stop events reported by a process we didn't already
@@ -2823,42 +2886,62 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
return NULL;
}
- if (linux_nat_status_is_event (status))
- save_sigtrap (lp);
-
/* Check if the thread has exited. */
- if ((WIFEXITED (status) || WIFSIGNALED (status))
- && num_lwps (ptid_get_pid (lp->ptid)) > 1)
+ if (WIFEXITED (status) || WIFSIGNALED (status))
{
- /* If this is the main thread, we must stop all threads and verify
- if they are still alive. This is because in the nptl thread model
- on Linux 2.4, there is no signal issued for exiting LWPs
- other than the main thread. We only get the main thread exit
- signal once all child threads have already exited. If we
- stop all the threads and use the stop_wait_callback to check
- if they have exited we can determine whether this signal
- should be ignored or whether it means the end of the debugged
- application, regardless of which threading model is being
- used. */
- if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
+ if (num_lwps (ptid_get_pid (lp->ptid)) > 1)
{
- iterate_over_lwps (pid_to_ptid (ptid_get_pid (lp->ptid)),
- stop_and_resume_callback, new_pending_p);
+ /* If this is the main thread, we must stop all threads and
+ verify if they are still alive. This is because in the
+ nptl thread model on Linux 2.4, there is no signal issued
+ for exiting LWPs other than the main thread. We only get
+ the main thread exit signal once all child threads have
+ already exited. If we stop all the threads and use the
+ stop_wait_callback to check if they have exited we can
+ determine whether this signal should be ignored or
+ whether it means the end of the debugged application,
+ regardless of which threading model is being used. */
+ if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
+ {
+ iterate_over_lwps (pid_to_ptid (ptid_get_pid (lp->ptid)),
+ stop_and_resume_callback, NULL);
+ }
+
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "LLW: %s exited.\n",
+ target_pid_to_str (lp->ptid));
+
+ if (num_lwps (ptid_get_pid (lp->ptid)) > 1)
+ {
+ /* If there is at least one more LWP, then the exit signal
+ was not the end of the debugged application and should be
+ ignored. */
+ exit_lwp (lp);
+ return NULL;
+ }
}
+ gdb_assert (lp->resumed);
+
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
- "LLW: %s exited.\n",
- target_pid_to_str (lp->ptid));
+ "Process %ld exited\n",
+ ptid_get_lwp (lp->ptid));
- if (num_lwps (ptid_get_pid (lp->ptid)) > 1)
- {
- /* If there is at least one more LWP, then the exit signal
- was not the end of the debugged application and should be
- ignored. */
- exit_lwp (lp);
- return NULL;
- }
+ /* This was the last lwp in the process. Since events are
+ serialized to GDB core, we may not be able report this one
+ right now, but GDB core and the other target layers will want
+ to be notified about the exit code/signal, leave the status
+ pending for the next time we're able to report it. */
+
+ /* Dead LWP's aren't expected to reported a pending sigstop. */
+ lp->signalled = 0;
+
+ /* Store the pending event in the waitstatus, because
+ W_EXITCODE(0,0) == 0. */
+ store_waitstatus (&lp->waitstatus, status);
+ return lp;
}
/* Check if the current LWP has previously exited. In the nptl
@@ -2940,9 +3023,59 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
return NULL;
}
+ /* Don't report signals that GDB isn't interested in, such as
+ signals that are neither printed nor stopped upon. Stopping all
+ threads can be a bit time-consuming so if we want decent
+ performance with heavily multi-threaded programs, especially when
+ they're using a high frequency timer, we'd better avoid it if we
+ can. */
+ if (WIFSTOPPED (status))
+ {
+ enum gdb_signal signo = gdb_signal_from_host (WSTOPSIG (status));
+
+ if (!non_stop)
+ {
+ /* Only do the below in all-stop, as we currently use SIGSTOP
+ to implement target_stop (see linux_nat_stop) in
+ non-stop. */
+ if (signo == GDB_SIGNAL_INT && signal_pass_state (signo) == 0)
+ {
+ /* If ^C/BREAK is typed at the tty/console, SIGINT gets
+ forwarded to the entire process group, that is, all LWPs
+ will receive it - unless they're using CLONE_THREAD to
+ share signals. Since we only want to report it once, we
+ mark it as ignored for all LWPs except this one. */
+ iterate_over_lwps (pid_to_ptid (ptid_get_pid (lp->ptid)),
+ set_ignore_sigint, NULL);
+ lp->ignore_sigint = 0;
+ }
+ else
+ maybe_clear_ignore_sigint (lp);
+ }
+
+ /* When using hardware single-step, we need to report every signal.
+ Otherwise, signals in pass_mask may be short-circuited. */
+ if (!lp->step
+ && WSTOPSIG (status) && sigismember (&pass_mask, WSTOPSIG (status)))
+ {
+ linux_resume_one_lwp (lp, lp->step, signo);
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "LLW: %s %s, %s (preempt 'handle')\n",
+ lp->step ?
+ "PTRACE_SINGLESTEP" : "PTRACE_CONT",
+ target_pid_to_str (lp->ptid),
+ (signo != GDB_SIGNAL_0
+ ? strsignal (gdb_signal_to_host (signo))
+ : "0"));
+ return NULL;
+ }
+ }
+
/* An interesting event. */
gdb_assert (lp);
lp->status = status;
+ save_sigtrap (lp);
return lp;
}
@@ -3041,9 +3174,6 @@ linux_nat_wait_1 (struct target_ops *ops,
/* Make sure SIGCHLD is blocked until the sigsuspend below. */
block_child_signals (&prev_mask);
-retry:
- status = 0;
-
/* First check if there is a LWP with a wait status pending. */
lp = iterate_over_lwps (ptid, status_callback, NULL);
if (lp != NULL)
@@ -3061,7 +3191,9 @@ retry:
set_sigint_trap ();
}
- /* But if we don't find a pending event, we'll have to wait. */
+ /* But if we don't find a pending event, we'll have to wait. Always
+ pull all events out of the kernel. We'll randomly select an
+ event LWP out of all that have events, to prevent starvation. */
while (lp == NULL)
{
@@ -3092,10 +3224,6 @@ retry:
if (lwpid > 0)
{
- /* If this is true, then we paused LWPs momentarily, and may
- now have pending events to handle. */
- int new_pending;
-
if (debug_linux_nat)
{
fprintf_unfiltered (gdb_stdlog,
@@ -3103,101 +3231,18 @@ retry:
(long) lwpid, status_to_str (status));
}
- lp = linux_nat_filter_event (lwpid, status, &new_pending);
-
- /* STATUS is now no longer valid, use LP->STATUS instead. */
- status = 0;
-
- if (lp && !ptid_match (lp->ptid, ptid))
- {
- gdb_assert (lp->resumed);
-
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "LWP %ld got an event %06x, "
- "leaving pending.\n",
- ptid_get_lwp (lp->ptid), lp->status);
-
- if (WIFSTOPPED (lp->status))
- {
- if (WSTOPSIG (lp->status) != SIGSTOP)
- {
- /* Cancel breakpoint hits. The breakpoint may
- be removed before we fetch events from this
- process to report to the core. It is best
- not to assume the moribund breakpoints
- heuristic always handles these cases --- it
- could be too many events go through to the
- core before this one is handled. All-stop
- always cancels breakpoint hits in all
- threads. */
- if (non_stop
- && linux_nat_lp_status_is_event (lp)
- && cancel_breakpoint (lp))
- {
- /* Throw away the SIGTRAP. */
- lp->status = 0;
-
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "LLW: LWP %ld hit a "
- "breakpoint while "
- "waiting for another "
- "process; "
- "cancelled it\n",
- ptid_get_lwp (lp->ptid));
- }
- }
- else
- lp->signalled = 0;
- }
- else if (WIFEXITED (lp->status) || WIFSIGNALED (lp->status))
- {
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "Process %ld exited while stopping "
- "LWPs\n",
- ptid_get_lwp (lp->ptid));
-
- /* This was the last lwp in the process. Since
- events are serialized to GDB core, and we can't
- report this one right now, but GDB core and the
- other target layers will want to be notified
- about the exit code/signal, leave the status
- pending for the next time we're able to report
- it. */
-
- /* Dead LWP's aren't expected to reported a pending
- sigstop. */
- lp->signalled = 0;
-
- /* Store the pending event in the waitstatus as
- well, because W_EXITCODE(0,0) == 0. */
- store_waitstatus (&lp->waitstatus, lp->status);
- }
-
- /* Keep looking. */
- lp = NULL;
- }
-
- if (new_pending)
- {
- /* Some LWP now has a pending event. Go all the way
- back to check it. */
- goto retry;
- }
-
- if (lp)
- {
- /* We got an event to report to the core. */
- break;
- }
-
+ linux_nat_filter_event (lwpid, status);
/* Retry until nothing comes out of waitpid. A single
SIGCHLD can indicate more than one child stopped. */
continue;
}
+ /* Now that we've pulled all events out of the kernel, check if
+ there's any LWP with a status to report to the core. */
+ lp = iterate_over_lwps (ptid, status_callback, NULL);
+ if (lp != NULL)
+ break;
+
/* Check for zombie thread group leaders. Those can't be reaped
until all other threads in the thread group are. */
check_zombie_leaders ();
@@ -3247,68 +3292,6 @@ retry:
status = lp->status;
lp->status = 0;
- /* Don't report signals that GDB isn't interested in, such as
- signals that are neither printed nor stopped upon. Stopping all
- threads can be a bit time-consuming so if we want decent
- performance with heavily multi-threaded programs, especially when
- they're using a high frequency timer, we'd better avoid it if we
- can. */
-
- if (WIFSTOPPED (status))
- {
- enum gdb_signal signo = gdb_signal_from_host (WSTOPSIG (status));
-
- /* When using hardware single-step, we need to report every signal.
- Otherwise, signals in pass_mask may be short-circuited. */
- if (!lp->step
- && WSTOPSIG (status) && sigismember (&pass_mask, WSTOPSIG (status)))
- {
- /* FIMXE: kettenis/2001-06-06: Should we resume all threads
- here? It is not clear we should. GDB may not expect
- other threads to run. On the other hand, not resuming
- newly attached threads may cause an unwanted delay in
- getting them running. */
- linux_resume_one_lwp (lp, lp->step, signo);
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog,
- "LLW: %s %s, %s (preempt 'handle')\n",
- lp->step ?
- "PTRACE_SINGLESTEP" : "PTRACE_CONT",
- target_pid_to_str (lp->ptid),
- (signo != GDB_SIGNAL_0
- ? strsignal (gdb_signal_to_host (signo))
- : "0"));
- goto retry;
- }
-
- if (!non_stop)
- {
- /* Only do the below in all-stop, as we currently use SIGINT
- to implement target_stop (see linux_nat_stop) in
- non-stop. */
- if (signo == GDB_SIGNAL_INT && signal_pass_state (signo) == 0)
- {
- /* If ^C/BREAK is typed at the tty/console, SIGINT gets
- forwarded to the entire process group, that is, all LWPs
- will receive it - unless they're using CLONE_THREAD to
- share signals. Since we only want to report it once, we
- mark it as ignored for all LWPs except this one. */
- iterate_over_lwps (pid_to_ptid (ptid_get_pid (ptid)),
- set_ignore_sigint, NULL);
- lp->ignore_sigint = 0;
- }
- else
- maybe_clear_ignore_sigint (lp);
- }
- }
-
- /* This LWP is stopped now. */
- lp->stopped = 1;
-
- if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
- status_to_str (status), target_pid_to_str (lp->ptid));
-
if (!non_stop)
{
/* Now stop all other LWP's ... */
@@ -3317,33 +3300,46 @@ retry:
/* ... and wait until all of them have reported back that
they're no longer running. */
iterate_over_lwps (minus_one_ptid, stop_wait_callback, NULL);
+ }
- /* If we're not waiting for a specific LWP, choose an event LWP
- from among those that have had events. Giving equal priority
- to all LWPs that have had events helps prevent
- starvation. */
- if (ptid_equal (ptid, minus_one_ptid) || ptid_is_pid (ptid))
- select_event_lwp (ptid, &lp, &status);
+ /* If we're not waiting for a specific LWP, choose an event LWP from
+ among those that have had events. Giving equal priority to all
+ LWPs that have had events helps prevent starvation. */
+ if (ptid_equal (ptid, minus_one_ptid) || ptid_is_pid (ptid))
+ select_event_lwp (ptid, &lp, &status);
- /* Now that we've selected our final event LWP, cancel any
- breakpoints in other LWPs that have hit a GDB breakpoint.
- See the comment in cancel_breakpoints_callback to find out
- why. */
- iterate_over_lwps (minus_one_ptid, cancel_breakpoints_callback, lp);
+ gdb_assert (lp != NULL);
- /* We'll need this to determine whether to report a SIGSTOP as
- TARGET_WAITKIND_0. Need to take a copy because
- resume_clear_callback clears it. */
- last_resume_kind = lp->last_resume_kind;
+ /* Now that we've selected our final event LWP, un-adjust its PC if
+ it was a software breakpoint. */
+ if (lp->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT)
+ {
+ struct regcache *regcache = get_thread_regcache (lp->ptid);
+ struct gdbarch *gdbarch = get_regcache_arch (regcache);
+ int decr_pc = target_decr_pc_after_break (gdbarch);
+ if (decr_pc != 0)
+ {
+ CORE_ADDR pc;
+
+ pc = regcache_read_pc (regcache);
+ regcache_write_pc (regcache, pc + decr_pc);
+ }
+ }
+
+ /* We'll need this to determine whether to report a SIGSTOP as
+ GDB_SIGNAL_0. Need to take a copy because resume_clear_callback
+ clears it. */
+ last_resume_kind = lp->last_resume_kind;
+
+ if (!non_stop)
+ {
/* In all-stop, from the core's perspective, all LWPs are now
stopped until a new resume action is sent over. */
iterate_over_lwps (minus_one_ptid, resume_clear_callback, NULL);
}
else
{
- /* See above. */
- last_resume_kind = lp->last_resume_kind;
resume_clear_callback (lp, NULL);
}
diff --git a/gdb/linux-nat.h b/gdb/linux-nat.h
index 0aa8377..3e3d806 100644
--- a/gdb/linux-nat.h
+++ b/gdb/linux-nat.h
@@ -23,6 +23,24 @@
struct arch_lwp_info;
+/* Reasons an LWP last stopped. */
+
+enum lwp_stop_reason
+{
+ /* Either not stopped, or stopped for a reason that doesn't require
+ special tracking. */
+ LWP_STOPPED_BY_NO_REASON,
+
+ /* Stopped by a software breakpoint. */
+ LWP_STOPPED_BY_SW_BREAKPOINT,
+
+ /* Stopped by a hardware breakpoint. */
+ LWP_STOPPED_BY_HW_BREAKPOINT,
+
+ /* Stopped by a watchpoint. */
+ LWP_STOPPED_BY_WATCHPOINT
+};
+
/* Structure describing an LWP. This is public only for the purposes
of ALL_LWPS; target-specific code should generally not access it
directly. */
@@ -59,12 +77,19 @@ struct lwp_info
/* If non-zero, a pending wait status. */
int status;
+ /* When 'stopped' is set, this is where the lwp last stopped, with
+ decr_pc_after_break already accounted for. If the LWP is
+ running, and stepping, this is the address at which the lwp was
+ resumed (that is, it's the previous stop PC). If the LWP is
+ running and not stepping, this is 0. */
+ CORE_ADDR stop_pc;
+
/* Non-zero if we were stepping this LWP. */
int step;
- /* STOPPED_BY_WATCHPOINT is non-zero if this LWP stopped with a data
- watchpoint trap. */
- int stopped_by_watchpoint;
+ /* The reason the LWP last stopped, if we need to track it
+ (breakpoint, watchpoint, etc.) */
+ enum lwp_stop_reason stop_reason;
/* On architectures where it is possible to know the data address of
a triggered watchpoint, STOPPED_DATA_ADDRESS_P is non-zero, and
diff --git a/gdb/x86-linux-nat.c b/gdb/x86-linux-nat.c
index b2141eb..752800a 100644
--- a/gdb/x86-linux-nat.c
+++ b/gdb/x86-linux-nat.c
@@ -214,7 +214,7 @@ x86_linux_prepare_to_resume (struct lwp_info *lwp)
lwp->arch_private->debug_registers_changed = 0;
}
- if (clear_status || lwp->stopped_by_watchpoint)
+ if (clear_status || lwp->stop_reason == LWP_STOPPED_BY_WATCHPOINT)
x86_linux_dr_set (lwp->ptid, DR_STATUS, 0);
}
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too
2015-01-07 13:22 ` Pedro Alves
@ 2015-01-07 14:08 ` Yao Qi
2015-01-07 14:36 ` Pedro Alves
0 siblings, 1 reply; 23+ messages in thread
From: Yao Qi @ 2015-01-07 14:08 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb-patches
Pedro Alves <palves@redhat.com> writes:
> That is, the LWP had previously stopped for a breakpoint, and we
> left that event pending. But if the breakpoint is now gone,
> GDB core isn't interested in it anymore, so we just discard
> that event.
The breakpoint is gone because it is removed by user, for example,
it that correct?
> Here's the updated patch.
It looks good to me.
--
Yao (齐尧)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too
2015-01-07 14:08 ` Yao Qi
@ 2015-01-07 14:36 ` Pedro Alves
0 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2015-01-07 14:36 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb-patches
On 01/07/2015 02:08 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
>
>> That is, the LWP had previously stopped for a breakpoint, and we
>> left that event pending. But if the breakpoint is now gone,
>> GDB core isn't interested in it anymore, so we just discard
>> that event.
>
> The breakpoint is gone because it is removed by user, for example,
> it that correct?
Yeah, removed for any reason, really. Another example would be because it
was a step-resume breakpoint for another thread and that other thread
no longer needs it, so GDB meanwhile removed it.
>
>> Here's the updated patch.
>
> It looks good to me.
Thank you for the review.
Pedro Alves
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 5/8] linux-nat.c: always mark execing LWP as resumed
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
` (3 preceding siblings ...)
2014-12-26 20:32 ` [PATCH 6/8] linux-nat.c: better starvation avoidance, handle non-stop mode too Pedro Alves
@ 2014-12-26 20:32 ` Pedro Alves
2014-12-26 20:32 ` [PATCH 8/8] add non-stop test that stresses thread starvation issues Pedro Alves
` (3 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:32 UTC (permalink / raw)
To: gdb-patches
A subsequent patch will make the Linux backend's target_wait method
pull all events out of the kernel (with waitpid) and store them as
pending status in the LWP structure if no pending status was already
available. Then, the backend goes over the pending statuses and pick
one to report to the core.
With that, the existing thread-execl.exp test exposes a bug, like:
(gdb) set scheduler-locking on
(gdb) PASS: gdb.threads/thread-execl.exp: schedlock on: set scheduler-locking on
next
FAIL: gdb.threads/thread-execl.exp: schedlock on: get to main in new image (timeout)
Recall that when the non-leader thread execs, all threads in the
process die, the execing thread changes its pid to the tgid, and then
waitpid returns an exec event to the tgid. If GDB didn't resume the
leader LWP, then GDB sees an event for an LWP that was supposedly
stopped, and thus not marked as resumed. Because the code that picks
a pending event to report to the core ignores not-resumed LWPs:
/* Return non-zero if LP has a wait status pending. */
static int
status_callback (struct lwp_info *lp, void *data)
{
/* Only report a pending wait status if we pretend that this has
indeed been resumed. */
if (!lp->resumed)
return 0;
the event ends up pending forever, thus the timeout.
gdb/
2014-12-26 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_handle_extended_wait) <PTRACE_EVENT_EXEC>:
Set the LWP's 'resumed' flag.
---
gdb/linux-nat.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 8346c55..c50a59f 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2007,6 +2007,10 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
ourstatus->value.execd_pathname
= xstrdup (linux_child_pid_to_exec_file (NULL, pid));
+ /* The thread that execed must have been resumed, but, when a
+ thread execs, it changes its tid to the tgid, and the old
+ tgid thread might have not been resumed. */
+ lp->resumed = 1;
return 0;
}
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 8/8] add non-stop test that stresses thread starvation issues
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
` (4 preceding siblings ...)
2014-12-26 20:32 ` [PATCH 5/8] linux-nat.c: always mark execing LWP as resumed Pedro Alves
@ 2014-12-26 20:32 ` Pedro Alves
2015-04-02 14:53 ` Yao Qi
2014-12-26 20:32 ` [PATCH 7/8] [gdbserver] linux-low.c: better starvation avoidance, handle non-stop mode too Pedro Alves
` (2 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:32 UTC (permalink / raw)
To: gdb-patches
This commit adds a non-stop mode test originally inspired by
signal-while-stepping-over-bp-other-thread.exp, that exposes the
thread starvation issues fixed by the previous patches. It sets a set
of threads stepping in parallel, and has one of them get a signal.
Without the previous fixes, this would fail with timeouts.
gdb/testsuite/
2014-12-26 Pedro Alves <palves@redhat.com>
* gdb.threads/non-stop-fair-events.c: New file.
* gdb.threads/non-stop-fair-events.exp: New file.
---
gdb/testsuite/gdb.threads/non-stop-fair-events.c | 84 +++++++++++
gdb/testsuite/gdb.threads/non-stop-fair-events.exp | 161 +++++++++++++++++++++
2 files changed, 245 insertions(+)
create mode 100644 gdb/testsuite/gdb.threads/non-stop-fair-events.c
create mode 100644 gdb/testsuite/gdb.threads/non-stop-fair-events.exp
diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.c b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
new file mode 100644
index 0000000..421c766
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
@@ -0,0 +1,84 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+ Copyright 2014 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+
+#define NUM_THREADS 10
+const int num_threads = NUM_THREADS;
+
+pthread_t child_thread[NUM_THREADS];
+volatile pthread_t signal_thread;
+volatile int got_sig;
+
+void
+handler (int signo)
+{
+ got_sig = 1;
+}
+
+void
+loop_broke (void)
+{
+}
+
+#define INF_LOOP \
+ do \
+ { \
+ while (!got_sig) \
+ ; \
+ } \
+ while (0)
+
+void *
+child_function (void *arg)
+{
+ pthread_t self = pthread_self ();
+
+ while (1)
+ {
+ INF_LOOP; /* set thread breakpoint here */
+ loop_broke ();
+ }
+}
+
+int
+main (void)
+{
+ int res;
+ int i;
+
+ alarm (60);
+
+ signal (SIGUSR1, handler);
+
+ for (i = 0; i < NUM_THREADS; i++)
+ {
+ res = pthread_create (&child_thread[i], NULL, child_function, NULL);
+ }
+
+ while (1)
+ {
+ pthread_kill (signal_thread, SIGUSR1); /* set kill breakpoint here */
+ INF_LOOP;
+ loop_broke ();
+ }
+
+ exit(EXIT_SUCCESS);
+}
diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
new file mode 100644
index 0000000..944a2d3
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
@@ -0,0 +1,161 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Test that GDB in non-stop mode gives roughly equal priority to
+# events of all threads.
+
+standard_testfile
+set executable ${testfile}
+
+if [target_info exists gdb,nosignals] {
+ verbose "Skipping ${testfile}.exp because of nosignals."
+ return -1
+}
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+ return -1
+}
+
+gdb_test_no_output "set non-stop on"
+
+if ![runto_main] {
+ return -1
+}
+
+# We want "handle print", to make sure the target backend reports the
+# signal to the run control core.
+gdb_test "handle SIGUSR1 print nostop pass" ""
+
+# Get current value of VAR from the inferior. TEST is used as test
+# message.
+
+proc get_value {var test} {
+ global expect_out
+ global gdb_prompt
+ global decimal
+
+ set value -1
+ gdb_test_multiple "print $var" "$test" {
+ -re ".*= ($decimal).*\r\n$gdb_prompt $" {
+ set value $expect_out(1,string)
+ pass "$test"
+ }
+ }
+ return ${value}
+}
+
+set NUM_THREADS [get_value "num_threads" "get num_threads"]
+
+# Account for the main thread.
+incr NUM_THREADS
+
+# Run threads to their start positions. This prepares for a new test
+# sequence.
+
+proc restart {} {
+ global gdb_prompt
+ global NUM_THREADS
+
+ delete_breakpoints
+
+ gdb_test "print got_sig = 0" " = 0"
+
+ gdb_breakpoint [gdb_get_line_number "set thread breakpoint here"]
+ gdb_breakpoint [gdb_get_line_number "set kill breakpoint here"]
+
+ set test "continue -a&"
+ gdb_test_multiple $test $test {
+ -re "Continuing.\r\n$gdb_prompt " {
+ pass $test
+ }
+ }
+
+ for {set i 1} { $i <= $NUM_THREADS } { incr i } {
+ set test "thread $i restarted"
+ gdb_test_multiple "" $test {
+ -re "breakpoint here" {
+ # The prompt was already matched in the "continue &"
+ # test above. We're now consuming asynchronous output
+ # that comes after the prompt.
+ pass $test
+ }
+ }
+ }
+
+ delete_breakpoints
+}
+
+# The test proper. SIGNAL_THREAD is the thread that has been elected
+# to receive the SIGUSR1 signal.
+
+proc test {signal_thread} {
+ global gdb_prompt
+ global NUM_THREADS
+
+ with_test_prefix "signal_thread=$signal_thread" {
+ restart
+
+ # Set all threads stepping the infinite loop line in parallel.
+ for {set i 2} { $i <= $NUM_THREADS } { incr i } {
+ gdb_test "thread $i" \
+ "child_function.*set thread breakpoint here.*" \
+ "switch to thread $i to step it"
+
+ if {$i == $signal_thread} {
+ gdb_test "print signal_thread = self" " = .*"
+ }
+
+ gdb_test "step&" "" "set $i thread stepping"
+ }
+
+ gdb_test "thread 1" "Switching to .*" \
+ "switch to the main thread to queue signal"
+
+ # Let the main thread queue the signal.
+ gdb_breakpoint "loop_broke"
+ set test "continue &"
+ gdb_test_multiple $test $test {
+ -re "Continuing.\r\n$gdb_prompt " {
+ pass $test
+ }
+ }
+
+ # Wait for all threads to finish their steps, and for the main
+ # thread to hit the breakpoint.
+ for {set i 1} { $i <= $NUM_THREADS } { incr i } {
+ set test "thread $i broke out of loop"
+ gdb_test_multiple "" $test {
+ -re "loop_broke" {
+ # The prompt was already matched in the "continue
+ # &" test above. We're now consuming asynchronous
+ # output that comes after the prompt.
+ pass $test
+ }
+ }
+ }
+
+ # It's helpful to have this in the log if the test ever
+ # happens to fail.
+ gdb_test "info threads"
+ }
+}
+
+# The kernel/debug API may always walk its thread list looking for the
+# first with an event, resulting in giving priority to e.g. the thread
+# with lowest kernel thread ID. So test once with the signal pending
+# in each thread, except the main thread.
+for {set i 2} { $i <= $NUM_THREADS } { incr i } {
+ test $i
+}
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 8/8] add non-stop test that stresses thread starvation issues
2014-12-26 20:32 ` [PATCH 8/8] add non-stop test that stresses thread starvation issues Pedro Alves
@ 2015-04-02 14:53 ` Yao Qi
2015-04-06 11:26 ` Pedro Alves
0 siblings, 1 reply; 23+ messages in thread
From: Yao Qi @ 2015-04-02 14:53 UTC (permalink / raw)
To: Pedro Alves, gdb-patches
Hi, Pedro,
On 26/12/14 20:31, Pedro Alves wrote:
> +int
> +main (void)
> +{
> + int res;
> + int i;
> +
> + alarm (60);
Is there any reason to call alarm here? It causes some fails on
arm-linux, that is, the board is slow, and alarm is triggered.
Then thread is disappeared and current_thread is set to NULL.
GDB/GDBserver doesn't know about that. When the inferior memory is
accessed, current_thread is dereferenced, and GDBserver is crashed.
How about removing it?
--
Yao (é½å°§)
From: Yao Qi <yao.qi@linaro.org>
Date: Thu, 2 Apr 2015 15:38:49 +0100
Subject: [PATCH] Remove unnecessary call to alarm
It is unnecessary to call alarm, and this causes some fails when
tests are running on some arm board.
gdb/testsuite:
2015-04-02 Yao Qi <yao.qi@linaro.org>
* gdb.threads/non-stop-fair-events.c (main): Don't call
alarm.
diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.c
b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
index 6a9d31d..8ce8fe8 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.c
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
@@ -64,8 +64,6 @@ main (void)
int res;
int i;
- alarm (60);
-
signal (SIGUSR1, handler);
for (i = 0; i < NUM_THREADS; i++)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 8/8] add non-stop test that stresses thread starvation issues
2015-04-02 14:53 ` Yao Qi
@ 2015-04-06 11:26 ` Pedro Alves
2015-04-07 10:10 ` Yao Qi
0 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2015-04-06 11:26 UTC (permalink / raw)
To: Yao Qi, gdb-patches
On 04/02/2015 03:53 PM, Yao Qi wrote:
> Hi, Pedro,
>
> On 26/12/14 20:31, Pedro Alves wrote:
>> +int
>> +main (void)
>> +{
>> + int res;
>> + int i;
>> +
>> + alarm (60);
>
> Is there any reason to call alarm here?
Yes, the test runs forever otherwise. See:
https://sourceware.org/gdb/wiki/GDBTestcaseCookbook#Don.27t_write_tests_that_run_forever
How about fixing this like attach-many-short-lived-threads.exp was
fixed here:
https://sourceware.org/ml/gdb-patches/2015-02/msg00152.html
?
> It causes some fails on
> arm-linux, that is, the board is slow, and alarm is triggered.
> Then thread is disappeared and current_thread is set to NULL.
> GDB/GDBserver doesn't know about that. When the inferior memory is
> accessed, current_thread is dereferenced, and GDBserver is crashed.
Sounds like we should have a test that explicitly covers that.
GDBserver shouldn't crash.
--
Pedro Alves
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 8/8] add non-stop test that stresses thread starvation issues
2015-04-06 11:26 ` Pedro Alves
@ 2015-04-07 10:10 ` Yao Qi
2015-04-07 10:22 ` Pedro Alves
0 siblings, 1 reply; 23+ messages in thread
From: Yao Qi @ 2015-04-07 10:10 UTC (permalink / raw)
To: Pedro Alves; +Cc: Yao Qi, gdb-patches
Pedro Alves <palves@redhat.com> writes:
> Yes, the test runs forever otherwise. See:
>
> https://sourceware.org/gdb/wiki/GDBTestcaseCookbook#Don.27t_write_tests_that_run_forever
>
> How about fixing this like attach-many-short-lived-threads.exp was
> fixed here:
>
> https://sourceware.org/ml/gdb-patches/2015-02/msg00152.html
>
> ?
>
OK, how about the patch below?
>> It causes some fails on
>> arm-linux, that is, the board is slow, and alarm is triggered.
>> Then thread is disappeared and current_thread is set to NULL.
>> GDB/GDBserver doesn't know about that. When the inferior memory is
>> accessed, current_thread is dereferenced, and GDBserver is crashed.
>
> Sounds like we should have a test that explicitly covers that.
> GDBserver shouldn't crash.
Yes, let me think about a test case to cover that.
--
Yao (齐尧)
From: Yao Qi <yao.qi@linaro.org>
Subject: [PATCH] Properly set alarm value in gdb.threads/non-stop-fair-events.exp
Nowadays, the alarm value is 60, and alarm is generated on some slow
boards. This patch is to pass DejaGNU timeout value to the program,
and move the alarm call before going to infinite loop. If any thread
has activities, the alarm is reset.
gdb/testsuite:
2015-04-07 Yao Qi <yao.qi@linaro.org>
* gdb.threads/non-stop-fair-events.c (SECONDS): New macro.
(child_function): Call alarm.
(main): Move call to alarm into the loop.
* gdb.threads/non-stop-fair-events.exp: Build program with
-DTIMEOUT=$timeout.
diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.c b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
index 6a9d31d..f82c366 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.c
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
@@ -22,6 +22,9 @@
#define NUM_THREADS 10
const int num_threads = NUM_THREADS;
+/* Allow for as much timeout as DejaGnu wants, plus a bit of
+ slack. */
+#define SECONDS (TIMEOUT + 20)
pthread_t child_thread[NUM_THREADS];
volatile pthread_t signal_thread;
@@ -53,6 +56,8 @@ child_function (void *arg)
while (1)
{
+ /* Reset the timer before going to INF_LOOP. */
+ alarm (SECONDS);
INF_LOOP; /* set thread breakpoint here */
loop_broke ();
}
@@ -64,8 +69,6 @@ main (void)
int res;
int i;
- alarm (60);
-
signal (SIGUSR1, handler);
for (i = 0; i < NUM_THREADS; i++)
@@ -76,6 +79,8 @@ main (void)
while (1)
{
pthread_kill (signal_thread, SIGUSR1); /* set kill breakpoint here */
+ /* Reset the timer before going to INF_LOOP. */
+ alarm (SECONDS);
INF_LOOP;
loop_broke ();
}
diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
index 9b7b9ca..e2d3f7d 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
@@ -24,7 +24,8 @@ if [target_info exists gdb,nosignals] {
return -1
}
-if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+set options { "additional_flags=-DTIMEOUT=$timeout" debug pthreads }
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile $options] == -1} {
return -1
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 7/8] [gdbserver] linux-low.c: better starvation avoidance, handle non-stop mode too
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
` (5 preceding siblings ...)
2014-12-26 20:32 ` [PATCH 8/8] add non-stop test that stresses thread starvation issues Pedro Alves
@ 2014-12-26 20:32 ` Pedro Alves
2014-12-26 20:32 ` [PATCH 4/8] linux-nat.c: clean up pending status checking and resuming LWPs Pedro Alves
2015-01-09 15:07 ` [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
8 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:32 UTC (permalink / raw)
To: gdb-patches
This patch applies the same starvation avoidance improvements of the
previous patch to the Linux gdbserver side.
Without this, the test added by the following commit
(gdb.threads/non-stop-fair-events.exp) always fails with time outs.
gdb/gdbserver/
2014-12-26 Pedro Alves <palves@redhat.com>
* linux-low.c (step_over_bkpt): Move higher up in the file.
(handle_extended_wait): Don't store the stop_pc here.
(get_stop_pc): Adjust comments and rename to ...
(check_stopped_by_breakpoint): ... this. Record whether the LWP
stopped for a software breakpoint or hardware breakpoint.
(thread_still_has_status_pending_p): New function.
(status_pending_p_callback): Use
thread_still_has_status_pending_p. If the event is no longer
interesting, resume the LWP.
(handle_tracepoints): Add assert.
(maybe_move_out_of_jump_pad): Remove cancel_breakpoints call.
(wstatus_maybe_breakpoint): New function.
(cancel_breakpoint): Delete function.
(check_stopped_by_watchpoint): New function, factored out from
linux_low_filter_event.
(lp_status_maybe_breakpoint): Delete function.
(linux_low_filter_event): Remove filter_ptid argument.
Leave thread group exits pending here. Store the LWP's stop PC.
Always leave events pending.
(linux_wait_for_event_filtered): Pull all events out of the
kernel, and leave them all pending.
(count_events_callback, select_event_lwp_callback): Consider all
events.
(cancel_breakpoints_callback, linux_cancel_breakpoints): Delete.
(select_event_lwp): Only give preference to the stepping LWP in
all-stop mode. Adjust comments.
(ignore_event): New function.
(linux_wait_1): Delete 'retry' label. Use ignore_event. Remove
references to cancel_breakpoints. Adjust to renames. Also give
equal priority to all LWPs that have had events in non-stop mode.
If reporting a software breakpoint event, unadjust the LWP's PC.
(linux_wait): If linux_wait_1 returned an ignored event, retry.
(stuck_in_jump_pad_callback, move_out_of_jump_pad_callback):
Adjust.
(linux_resume_one_lwp): Store the LWP's PC. Adjust.
(resume_status_pending_p): Use thread_still_has_status_pending_p.
(linux_stopped_by_watchpoint): Adjust.
(linux_target_ops): Remove reference to linux_cancel_breakpoints.
* linux-low.h (enum lwp_stop_reason): New.
(struct lwp_info) <stop_pc>: Adjust comment.
<stopped_by_watchpoint>: Delete field.
<stop_reason>: New field.
* linux-x86-low.c (x86_linux_prepare_to_resume): Adjust.
* mem-break.c (software_breakpoint_inserted_here)
(hardware_breakpoint_inserted_here): New function.
* mem-break.h (software_breakpoint_inserted_here)
(hardware_breakpoint_inserted_here): Declare.
* target.h (struct target_ops) <cancel_breakpoints>: Remove field.
(cancel_breakpoints): Delete.
* tracepoint.c (clear_installed_tracepoints, stop_tracing)
(upload_fast_traceframes): Remove references to
cancel_breakpoints.
---
gdb/gdbserver/linux-low.c | 707 +++++++++++++++++++++++-------------------
gdb/gdbserver/linux-low.h | 29 +-
gdb/gdbserver/linux-x86-low.c | 2 +-
gdb/gdbserver/mem-break.c | 34 ++
gdb/gdbserver/mem-break.h | 10 +
gdb/gdbserver/target.h | 10 -
gdb/gdbserver/tracepoint.c | 5 -
7 files changed, 451 insertions(+), 346 deletions(-)
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 65f72a2..609f534 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -218,9 +218,12 @@ static int linux_stopped_by_watchpoint (void);
static void mark_lwp_dead (struct lwp_info *lwp, int wstat);
static void proceed_all_lwps (void);
static int finish_step_over (struct lwp_info *lwp);
-static CORE_ADDR get_stop_pc (struct lwp_info *lwp);
static int kill_lwp (unsigned long lwpid, int signo);
+/* When the event-loop is doing a step-over, this points at the thread
+ being stepped. */
+ptid_t step_over_bkpt;
+
/* True if the low target can hardware single-step. Such targets
don't need a BREAKPOINT_REINSERT_ADDR callback. */
@@ -363,6 +366,8 @@ linux_add_process (int pid, int attached)
return proc;
}
+static CORE_ADDR get_pc (struct lwp_info *lwp);
+
/* Handle a GNU/Linux extended wait response. If we see a clone
event, we need to add the new LWP to our list (and not report the
trap to higher layers). */
@@ -423,9 +428,7 @@ handle_extended_wait (struct lwp_info *event_child, int wstat)
If we do get another signal, be sure not to lose it. */
if (WSTOPSIG (status) == SIGSTOP)
{
- if (stopping_threads != NOT_STOPPING_THREADS)
- new_lwp->stop_pc = get_stop_pc (new_lwp);
- else
+ if (stopping_threads == NOT_STOPPING_THREADS)
linux_resume_one_lwp (new_lwp, 0, 0, NULL);
}
else
@@ -434,7 +437,6 @@ handle_extended_wait (struct lwp_info *event_child, int wstat)
if (stopping_threads != NOT_STOPPING_THREADS)
{
- new_lwp->stop_pc = get_stop_pc (new_lwp);
new_lwp->status_pending_p = 1;
new_lwp->status_pending = status;
}
@@ -481,44 +483,96 @@ get_pc (struct lwp_info *lwp)
The SIGTRAP could mean several things.
On i386, where decr_pc_after_break is non-zero:
- If we were single-stepping this process using PTRACE_SINGLESTEP,
- we will get only the one SIGTRAP (even if the instruction we
- stepped over was a breakpoint). The value of $eip will be the
- next instruction.
+
+ If we were single-stepping this process using PTRACE_SINGLESTEP, we
+ will get only the one SIGTRAP. The value of $eip will be the next
+ instruction. If the instruction we stepped over was a breakpoint,
+ we need to decrement the PC.
+
If we continue the process using PTRACE_CONT, we will get a
SIGTRAP when we hit a breakpoint. The value of $eip will be
the instruction after the breakpoint (i.e. needs to be
decremented). If we report the SIGTRAP to GDB, we must also
- report the undecremented PC. If we cancel the SIGTRAP, we
+ report the undecremented PC. If the breakpoint is removed, we
must resume at the decremented PC.
- (Presumably, not yet tested) On a non-decr_pc_after_break machine
- with hardware or kernel single-step:
- If we single-step over a breakpoint instruction, our PC will
- point at the following instruction. If we continue and hit a
- breakpoint instruction, our PC will point at the breakpoint
+ On a non-decr_pc_after_break machine with hardware or kernel
+ single-step:
+
+ If we either single-step a breakpoint instruction, or continue and
+ hit a breakpoint instruction, our PC will point at the breakpoint
instruction. */
-static CORE_ADDR
-get_stop_pc (struct lwp_info *lwp)
+static int
+check_stopped_by_breakpoint (struct lwp_info *lwp)
{
- CORE_ADDR stop_pc;
+ CORE_ADDR pc;
+ CORE_ADDR sw_breakpoint_pc;
+ struct thread_info *saved_thread;
if (the_low_target.get_pc == NULL)
return 0;
- stop_pc = get_pc (lwp);
+ pc = get_pc (lwp);
+ sw_breakpoint_pc = pc - the_low_target.decr_pc_after_break;
- if (WSTOPSIG (lwp->last_status) == SIGTRAP
- && !lwp->stepping
- && !lwp->stopped_by_watchpoint
- && !linux_is_extended_waitstatus (lwp->last_status))
- stop_pc -= the_low_target.decr_pc_after_break;
+ /* breakpoint_at reads from the current thread. */
+ saved_thread = current_thread;
+ current_thread = get_lwp_thread (lwp);
- if (debug_threads)
- debug_printf ("stop pc is 0x%lx\n", (long) stop_pc);
+ /* We may have just stepped a breakpoint instruction. E.g., in
+ non-stop mode, GDB first tells the thread A to step a range, and
+ then the user inserts a breakpoint inside the range. In that
+ case, we need to report the breakpoint PC. But, when we're
+ trying to step past one of our own breakpoints, that happens to
+ have been placed on top of a permanent breakpoint instruction, we
+ shouldn't adjust the PC, otherwise the program would keep
+ trapping the permanent breakpoint forever. */
+ if ((!lwp->stepping
+ || (!ptid_equal (ptid_of (current_thread), step_over_bkpt)
+ && lwp->stop_pc == sw_breakpoint_pc))
+ && (*the_low_target.breakpoint_at) (sw_breakpoint_pc))
+ {
+ if (debug_threads)
+ {
+ struct thread_info *thr = get_lwp_thread (lwp);
+
+ debug_printf ("CSBB: %s stopped by software breakpoint\n",
+ target_pid_to_str (ptid_of (thr)));
+ }
+
+ /* Back up the PC if necessary. */
+ if (pc != sw_breakpoint_pc)
+ {
+ struct regcache *regcache
+ = get_thread_regcache (current_thread, 1);
+ (*the_low_target.set_pc) (regcache, sw_breakpoint_pc);
+ }
+
+ lwp->stop_pc = sw_breakpoint_pc;
+ lwp->stop_reason = LWP_STOPPED_BY_SW_BREAKPOINT;
+ current_thread = saved_thread;
+ return 1;
+ }
+
+ if (hardware_breakpoint_inserted_here (pc))
+ {
+ if (debug_threads)
+ {
+ struct thread_info *thr = get_lwp_thread (lwp);
+
+ debug_printf ("CSBB: %s stopped by hardware breakpoint\n",
+ target_pid_to_str (ptid_of (thr)));
+ }
- return stop_pc;
+ lwp->stop_pc = pc;
+ lwp->stop_reason = LWP_STOPPED_BY_HW_BREAKPOINT;
+ current_thread = saved_thread;
+ return 1;
+ }
+
+ current_thread = saved_thread;
+ return 0;
}
static struct lwp_info *
@@ -1235,12 +1289,83 @@ linux_thread_alive (ptid_t ptid)
return 0;
}
+/* Return 1 if this lwp still has an interesting status pending. If
+ not (e.g., it had stopped for a breakpoint that is gone), return
+ false. */
+
+static int
+thread_still_has_status_pending_p (struct thread_info *thread)
+{
+ struct lwp_info *lp = get_thread_lwp (thread);
+
+ if (!lp->status_pending_p)
+ return 0;
+
+ /* If we got a `vCont;t', but we haven't reported a stop yet, do
+ report any status pending the LWP may have. */
+ if (thread->last_resume_kind == resume_stop
+ && thread->last_status.kind != TARGET_WAITKIND_IGNORE)
+ return 0;
+
+ if (thread->last_resume_kind != resume_stop
+ && (lp->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT
+ || lp->stop_reason == LWP_STOPPED_BY_HW_BREAKPOINT))
+ {
+ struct thread_info *saved_thread;
+ CORE_ADDR pc;
+ int discard = 0;
+
+ gdb_assert (lp->last_status != 0);
+
+ pc = get_pc (lp);
+
+ saved_thread = current_thread;
+ current_thread = thread;
+
+ if (pc != lp->stop_pc)
+ {
+ if (debug_threads)
+ debug_printf ("PC of %ld changed\n",
+ lwpid_of (thread));
+ discard = 1;
+ }
+ else if (lp->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT
+ && !(*the_low_target.breakpoint_at) (pc))
+ {
+ if (debug_threads)
+ debug_printf ("previous SW breakpoint of %ld gone\n",
+ lwpid_of (thread));
+ discard = 1;
+ }
+ else if (lp->stop_reason == LWP_STOPPED_BY_HW_BREAKPOINT
+ && !hardware_breakpoint_inserted_here (pc))
+ {
+ if (debug_threads)
+ debug_printf ("previous HW breakpoint of %ld gone\n",
+ lwpid_of (thread));
+ discard = 1;
+ }
+
+ current_thread = saved_thread;
+
+ if (discard)
+ {
+ if (debug_threads)
+ debug_printf ("discarding pending breakpoint status\n");
+ lp->status_pending_p = 0;
+ return 0;
+ }
+ }
+
+ return 1;
+}
+
/* Return 1 if this lwp has an interesting status pending. */
static int
status_pending_p_callback (struct inferior_list_entry *entry, void *arg)
{
struct thread_info *thread = (struct thread_info *) entry;
- struct lwp_info *lwp = get_thread_lwp (thread);
+ struct lwp_info *lp = get_thread_lwp (thread);
ptid_t ptid = * (ptid_t *) arg;
/* Check if we're only interested in events from a specific process
@@ -1249,13 +1374,14 @@ status_pending_p_callback (struct inferior_list_entry *entry, void *arg)
&& ptid_get_pid (ptid) != ptid_get_pid (thread->entry.id))
return 0;
- /* If we got a `vCont;t', but we haven't reported a stop yet, do
- report any status pending the LWP may have. */
- if (thread->last_resume_kind == resume_stop
- && thread->last_status.kind != TARGET_WAITKIND_IGNORE)
- return 0;
+ if (lp->status_pending_p
+ && !thread_still_has_status_pending_p (thread))
+ {
+ linux_resume_one_lwp (lp, lp->stepping, GDB_SIGNAL_0, NULL);
+ return 0;
+ }
- return lwp->status_pending_p;
+ return lp->status_pending_p;
}
static int
@@ -1401,6 +1527,8 @@ handle_tracepoints (struct lwp_info *lwp)
struct thread_info *tinfo = get_lwp_thread (lwp);
int tpoint_related_event = 0;
+ gdb_assert (lwp->suspended == 0);
+
/* If this tracepoint hit causes a tracing stop, we'll immediately
uninsert tracepoints. To do this, we temporarily pause all
threads, unpatch away, and then unpause threads. We need to make
@@ -1566,7 +1694,6 @@ maybe_move_out_of_jump_pad (struct lwp_info *lwp, int *wstat)
"stopping all threads momentarily.\n");
stop_all_lwps (1, lwp);
- cancel_breakpoints ();
delete_breakpoint (lwp->exit_jump_pad_bkpt);
lwp->exit_jump_pad_bkpt = NULL;
@@ -1692,65 +1819,59 @@ dequeue_one_deferred_signal (struct lwp_info *lwp, int *wstat)
return 0;
}
-/* Arrange for a breakpoint to be hit again later. We don't keep the
- SIGTRAP status and don't forward the SIGTRAP signal to the LWP. We
- will handle the current event, eventually we will resume this LWP,
- and this breakpoint will trap again. */
+/* Return true if the event in LP may be caused by breakpoint. */
static int
-cancel_breakpoint (struct lwp_info *lwp)
+wstatus_maybe_breakpoint (int wstatus)
{
- struct thread_info *saved_thread;
+ return (WIFSTOPPED (wstatus)
+ && (WSTOPSIG (wstatus) == SIGTRAP
+ /* SIGILL and SIGSEGV are also treated as traps in case a
+ breakpoint is inserted at the current PC. */
+ || WSTOPSIG (wstatus) == SIGILL
+ || WSTOPSIG (wstatus) == SIGSEGV));
+}
- /* There's nothing to do if we don't support breakpoints. */
- if (!supports_breakpoints ())
- return 0;
+/* Fetch the possibly triggered data watchpoint info and store it in
+ CHILD.
- /* breakpoint_at reads from current inferior. */
- saved_thread = current_thread;
- current_thread = get_lwp_thread (lwp);
+ On some archs, like x86, that use debug registers to set
+ watchpoints, it's possible that the way to know which watched
+ address trapped, is to check the register that is used to select
+ which address to watch. Problem is, between setting the watchpoint
+ and reading back which data address trapped, the user may change
+ the set of watchpoints, and, as a consequence, GDB changes the
+ debug registers in the inferior. To avoid reading back a stale
+ stopped-data-address when that happens, we cache in LP the fact
+ that a watchpoint trapped, and the corresponding data address, as
+ soon as we see CHILD stop with a SIGTRAP. If GDB changes the debug
+ registers meanwhile, we have the cached data we can rely on. */
- if ((*the_low_target.breakpoint_at) (lwp->stop_pc))
+static int
+check_stopped_by_watchpoint (struct lwp_info *child)
+{
+ if (the_low_target.stopped_by_watchpoint != NULL)
{
- if (debug_threads)
- debug_printf ("CB: Push back breakpoint for %s\n",
- target_pid_to_str (ptid_of (current_thread)));
+ struct thread_info *saved_thread;
- /* Back up the PC if necessary. */
- if (the_low_target.decr_pc_after_break)
+ saved_thread = current_thread;
+ current_thread = get_lwp_thread (child);
+
+ if (the_low_target.stopped_by_watchpoint ())
{
- struct regcache *regcache
- = get_thread_regcache (current_thread, 1);
- (*the_low_target.set_pc) (regcache, lwp->stop_pc);
+ child->stop_reason = LWP_STOPPED_BY_WATCHPOINT;
+
+ if (the_low_target.stopped_data_address != NULL)
+ child->stopped_data_address
+ = the_low_target.stopped_data_address ();
+ else
+ child->stopped_data_address = 0;
}
current_thread = saved_thread;
- return 1;
- }
- else
- {
- if (debug_threads)
- debug_printf ("CB: No breakpoint found at %s for [%s]\n",
- paddress (lwp->stop_pc),
- target_pid_to_str (ptid_of (current_thread)));
}
- current_thread = saved_thread;
- return 0;
-}
-
-/* Return true if the event in LP may be caused by breakpoint. */
-
-static int
-lp_status_maybe_breakpoint (struct lwp_info *lp)
-{
- return (lp->status_pending_p
- && WIFSTOPPED (lp->status_pending)
- && (WSTOPSIG (lp->status_pending) == SIGTRAP
- /* SIGILL and SIGSEGV are also treated as traps in case a
- breakpoint is inserted at the current PC. */
- || WSTOPSIG (lp->status_pending) == SIGILL
- || WSTOPSIG (lp->status_pending) == SIGSEGV));
+ return child->stop_reason == LWP_STOPPED_BY_WATCHPOINT;
}
/* Do low-level handling of the event, and check if we should go on
@@ -1758,10 +1879,11 @@ lp_status_maybe_breakpoint (struct lwp_info *lp)
NULL otherwise. */
static struct lwp_info *
-linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
+linux_low_filter_event (int lwpid, int wstat)
{
struct lwp_info *child;
struct thread_info *thread;
+ int have_stop_pc = 0;
child = find_lwp_pid (pid_to_ptid (lwpid));
@@ -1783,6 +1905,35 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
child->last_status = wstat;
+ /* Check if the thread has exited. */
+ if ((WIFEXITED (wstat) || WIFSIGNALED (wstat)))
+ {
+ if (debug_threads)
+ debug_printf ("LLFE: %d exited.\n", lwpid);
+ if (num_lwps (pid_of (thread)) > 1)
+ {
+
+ /* If there is at least one more LWP, then the exit signal was
+ not the end of the debugged application and should be
+ ignored. */
+ delete_lwp (child);
+ return NULL;
+ }
+ else
+ {
+ /* This was the last lwp in the process. Since events are
+ serialized to GDB core, and we can't report this one
+ right now, but GDB core and the other target layers will
+ want to be notified about the exit code/signal, leave the
+ status pending for the next time we're able to report
+ it. */
+ mark_lwp_dead (child, wstat);
+ return child;
+ }
+ }
+
+ gdb_assert (WIFSTOPPED (wstat));
+
if (WIFSTOPPED (wstat))
{
struct process_info *proc;
@@ -1807,75 +1958,6 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
}
}
- /* Store the STOP_PC, with adjustment applied. This depends on the
- architecture being defined already (so that CHILD has a valid
- regcache), and on LAST_STATUS being set (to check for SIGTRAP or
- not). */
- if (WIFSTOPPED (wstat))
- {
- if (debug_threads
- && the_low_target.get_pc != NULL)
- {
- struct thread_info *saved_thread;
- struct regcache *regcache;
- CORE_ADDR pc;
-
- saved_thread = current_thread;
- current_thread = thread;
- regcache = get_thread_regcache (current_thread, 1);
- pc = (*the_low_target.get_pc) (regcache);
- debug_printf ("linux_low_filter_event: pc is 0x%lx\n", (long) pc);
- current_thread = saved_thread;
- }
-
- child->stop_pc = get_stop_pc (child);
- }
-
- /* Fetch the possibly triggered data watchpoint info and store it in
- CHILD.
-
- On some archs, like x86, that use debug registers to set
- watchpoints, it's possible that the way to know which watched
- address trapped, is to check the register that is used to select
- which address to watch. Problem is, between setting the
- watchpoint and reading back which data address trapped, the user
- may change the set of watchpoints, and, as a consequence, GDB
- changes the debug registers in the inferior. To avoid reading
- back a stale stopped-data-address when that happens, we cache in
- LP the fact that a watchpoint trapped, and the corresponding data
- address, as soon as we see CHILD stop with a SIGTRAP. If GDB
- changes the debug registers meanwhile, we have the cached data we
- can rely on. */
-
- if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGTRAP)
- {
- if (the_low_target.stopped_by_watchpoint == NULL)
- {
- child->stopped_by_watchpoint = 0;
- }
- else
- {
- struct thread_info *saved_thread;
-
- saved_thread = current_thread;
- current_thread = thread;
-
- child->stopped_by_watchpoint
- = the_low_target.stopped_by_watchpoint ();
-
- if (child->stopped_by_watchpoint)
- {
- if (the_low_target.stopped_data_address != NULL)
- child->stopped_data_address
- = the_low_target.stopped_data_address ();
- else
- child->stopped_data_address = 0;
- }
-
- current_thread = saved_thread;
- }
- }
-
if (WIFSTOPPED (wstat) && child->must_set_ptrace_flags)
{
struct process_info *proc = find_process_pid (pid_of (thread));
@@ -1884,13 +1966,28 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
child->must_set_ptrace_flags = 0;
}
+ /* Be careful to not overwrite stop_pc until
+ check_stopped_by_breakpoint is called. */
if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGTRAP
&& linux_is_extended_waitstatus (wstat))
{
+ child->stop_pc = get_pc (child);
handle_extended_wait (child, wstat);
return NULL;
}
+ if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGTRAP
+ && check_stopped_by_watchpoint (child))
+ ;
+ else if (WIFSTOPPED (wstat) && wstatus_maybe_breakpoint (wstat))
+ {
+ if (check_stopped_by_breakpoint (child))
+ have_stop_pc = 1;
+ }
+
+ if (!have_stop_pc)
+ child->stop_pc = get_pc (child);
+
if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGSTOP
&& child->stop_expected)
{
@@ -1906,7 +2003,7 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
else if (stopping_threads != NOT_STOPPING_THREADS)
{
/* Stopping threads. We don't want this SIGSTOP to end up
- pending in the FILTER_PTID handling below. */
+ pending. */
return NULL;
}
else
@@ -1917,79 +2014,11 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat)
}
}
- /* Check if the thread has exited. */
- if ((WIFEXITED (wstat) || WIFSIGNALED (wstat))
- && num_lwps (pid_of (thread)) > 1)
- {
- if (debug_threads)
- debug_printf ("LLW: %d exited.\n", lwpid);
-
- /* If there is at least one more LWP, then the exit signal
- was not the end of the debugged application and should be
- ignored. */
- delete_lwp (child);
- return NULL;
- }
-
- if (!ptid_match (ptid_of (thread), filter_ptid))
- {
- if (debug_threads)
- debug_printf ("LWP %d got an event %06x, leaving pending.\n",
- lwpid, wstat);
-
- if (WIFSTOPPED (wstat))
- {
- child->status_pending_p = 1;
- child->status_pending = wstat;
-
- if (WSTOPSIG (wstat) != SIGSTOP)
- {
- /* Cancel breakpoint hits. The breakpoint may be
- removed before we fetch events from this process to
- report to the core. It is best not to assume the
- moribund breakpoints heuristic always handles these
- cases --- it could be too many events go through to
- the core before this one is handled. All-stop always
- cancels breakpoint hits in all threads. */
- if (non_stop
- && lp_status_maybe_breakpoint (child)
- && cancel_breakpoint (child))
- {
- /* Throw away the SIGTRAP. */
- child->status_pending_p = 0;
-
- if (debug_threads)
- debug_printf ("LLW: LWP %d hit a breakpoint while"
- " waiting for another process;"
- " cancelled it\n", lwpid);
- }
- }
- }
- else if (WIFEXITED (wstat) || WIFSIGNALED (wstat))
- {
- if (debug_threads)
- debug_printf ("LLWE: process %d exited while fetching "
- "event from another LWP\n", lwpid);
-
- /* This was the last lwp in the process. Since events are
- serialized to GDB core, and we can't report this one
- right now, but GDB core and the other target layers will
- want to be notified about the exit code/signal, leave the
- status pending for the next time we're able to report
- it. */
- mark_lwp_dead (child, wstat);
- }
-
- return NULL;
- }
-
+ child->status_pending_p = 1;
+ child->status_pending = wstat;
return child;
}
-/* When the event-loop is doing a step-over, this points at the thread
- being stepped. */
-ptid_t step_over_bkpt;
-
/* Wait for an event from child(ren) WAIT_PTID, and return any that
match FILTER_PTID (leaving others pending). The PTIDs can be:
minus_one_ptid, to specify any child; a pid PTID, specifying all
@@ -2079,6 +2108,9 @@ linux_wait_for_event_filtered (ptid_t wait_ptid, ptid_t filter_ptid,
sigfillset (&block_mask);
sigprocmask (SIG_BLOCK, &block_mask, &prev_mask);
+ /* Always pull all events out of the kernel. We'll randomly select
+ an event LWP out of all that have events, to prevent
+ starvation. */
while (event_child == NULL)
{
pid_t ret = 0;
@@ -2111,20 +2143,28 @@ linux_wait_for_event_filtered (ptid_t wait_ptid, ptid_t filter_ptid,
(long) ret, status_to_str (*wstatp));
}
- event_child = linux_low_filter_event (filter_ptid,
- ret, *wstatp);
- if (event_child != NULL)
- {
- /* We got an event to report to the core. */
- event_thread = get_lwp_thread (event_child);
- break;
- }
-
+ /* Filter all events. IOW, leave all events pending. We'll
+ randomly select an event LWP out of all that have events
+ below. */
+ linux_low_filter_event (ret, *wstatp);
/* Retry until nothing comes out of waitpid. A single
SIGCHLD can indicate more than one child stopped. */
continue;
}
+ /* Now that we've pulled all events out of the kernel, check if
+ there's any LWP with a status to report to the core. */
+ event_thread = (struct thread_info *)
+ find_inferior (&all_threads, status_pending_p_callback, &filter_ptid);
+ if (event_thread != NULL)
+ {
+ event_child = get_thread_lwp (event_thread);
+ *wstatp = event_child->status_pending;
+ event_child->status_pending_p = 0;
+ event_child->status_pending = 0;
+ break;
+ }
+
/* Check for zombie thread group leaders. Those can't be reaped
until all other threads in the thread group are. */
check_zombie_leaders ();
@@ -2204,17 +2244,14 @@ static int
count_events_callback (struct inferior_list_entry *entry, void *data)
{
struct thread_info *thread = (struct thread_info *) entry;
- struct lwp_info *lp = get_thread_lwp (thread);
int *count = data;
gdb_assert (count != NULL);
- /* Count only resumed LWPs that have a SIGTRAP event pending that
- should be reported to GDB. */
+ /* Count only resumed LWPs that have an event pending. */
if (thread->last_status.kind == TARGET_WAITKIND_IGNORE
&& thread->last_resume_kind != resume_stop
- && lp_status_maybe_breakpoint (lp)
- && !breakpoint_inserted_here (lp->stop_pc))
+ && thread->status_pending_p)
(*count)++;
return 0;
@@ -2243,62 +2280,20 @@ static int
select_event_lwp_callback (struct inferior_list_entry *entry, void *data)
{
struct thread_info *thread = (struct thread_info *) entry;
- struct lwp_info *lp = get_thread_lwp (thread);
int *selector = data;
gdb_assert (selector != NULL);
- /* Select only resumed LWPs that have a SIGTRAP event pending. */
+ /* Select only resumed LWPs that have an event pending. */
if (thread->last_resume_kind != resume_stop
&& thread->last_status.kind == TARGET_WAITKIND_IGNORE
- && lp_status_maybe_breakpoint (lp)
- && !breakpoint_inserted_here (lp->stop_pc))
+ && thread->status_pending_p)
if ((*selector)-- == 0)
return 1;
return 0;
}
-static int
-cancel_breakpoints_callback (struct inferior_list_entry *entry, void *data)
-{
- struct thread_info *thread = (struct thread_info *) entry;
- struct lwp_info *lp = get_thread_lwp (thread);
- struct lwp_info *event_lp = data;
-
- /* Leave the LWP that has been elected to receive a SIGTRAP alone. */
- if (lp == event_lp)
- return 0;
-
- /* If a LWP other than the LWP that we're reporting an event for has
- hit a GDB breakpoint (as opposed to some random trap signal),
- then just arrange for it to hit it again later. We don't keep
- the SIGTRAP status and don't forward the SIGTRAP signal to the
- LWP. We will handle the current event, eventually we will resume
- all LWPs, and this one will get its breakpoint trap again.
-
- If we do not do this, then we run the risk that the user will
- delete or disable the breakpoint, but the LWP will have already
- tripped on it. */
-
- if (thread->last_resume_kind != resume_stop
- && thread->last_status.kind == TARGET_WAITKIND_IGNORE
- && lp_status_maybe_breakpoint (lp)
- && !lp->stepping
- && !lp->stopped_by_watchpoint
- && cancel_breakpoint (lp))
- /* Throw away the SIGTRAP. */
- lp->status_pending_p = 0;
-
- return 0;
-}
-
-static void
-linux_cancel_breakpoints (void)
-{
- find_inferior (&all_threads, cancel_breakpoints_callback, NULL);
-}
-
/* Select one LWP out of those that have events pending. */
static void
@@ -2306,20 +2301,30 @@ select_event_lwp (struct lwp_info **orig_lp)
{
int num_events = 0;
int random_selector;
- struct thread_info *event_thread;
-
- /* Give preference to any LWP that is being single-stepped. */
- event_thread
- = (struct thread_info *) find_inferior (&all_threads,
- select_singlestep_lwp_callback,
- NULL);
- if (event_thread != NULL)
+ struct thread_info *event_thread = NULL;
+
+ /* In all-stop, give preference to the LWP that is being
+ single-stepped. There will be at most one, and it's the LWP that
+ the core is most interested in. If we didn't do this, then we'd
+ have to handle pending step SIGTRAPs somehow in case the core
+ later continues the previously-stepped thread, otherwise we'd
+ report the pending SIGTRAP, and the core, not having stepped the
+ thread, wouldn't understand what the trap was for, and therefore
+ would report it to the user as a random signal. */
+ if (!non_stop)
{
- if (debug_threads)
- debug_printf ("SEL: Select single-step %s\n",
- target_pid_to_str (ptid_of (event_thread)));
+ event_thread
+ = (struct thread_info *) find_inferior (&all_threads,
+ select_singlestep_lwp_callback,
+ NULL);
+ if (event_thread != NULL)
+ {
+ if (debug_threads)
+ debug_printf ("SEL: Select single-step %s\n",
+ target_pid_to_str (ptid_of (event_thread)));
+ }
}
- else
+ if (event_thread == NULL)
{
/* No single-stepping LWP. Select one at random, out of those
which have had SIGTRAP events. */
@@ -2486,6 +2491,23 @@ linux_stabilize_threads (void)
}
}
+static void async_file_mark (void);
+
+/* Convenience function that is called when the kernel reports an
+ event that is not passed out to GDB. */
+
+static ptid_t
+ignore_event (struct target_waitstatus *ourstatus)
+{
+ /* If we got an event, there may still be others, as a single
+ SIGCHLD can indicate more than one child stopped. This forces
+ another target_wait call. */
+ async_file_mark ();
+
+ ourstatus->kind = TARGET_WAITKIND_IGNORE;
+ return null_ptid;
+}
+
/* Wait for process, returns status. */
static ptid_t
@@ -2514,7 +2536,6 @@ linux_wait_1 (ptid_t ptid,
if (target_options & TARGET_WNOHANG)
options |= WNOHANG;
-retry:
bp_explains_trap = 0;
trace_event = 0;
in_step_range = 0;
@@ -2679,7 +2700,8 @@ retry:
WSTOPSIG (w), lwpid_of (current_thread));
linux_resume_one_lwp (event_child, 0, 0, NULL);
- goto retry;
+
+ return ignore_event (ourstatus);
}
}
@@ -2713,7 +2735,6 @@ retry:
care of while the breakpoint is still
inserted. */
stop_all_lwps (1, event_child);
- cancel_breakpoints ();
delete_breakpoint (event_child->exit_jump_pad_bkpt);
event_child->exit_jump_pad_bkpt = NULL;
@@ -2797,7 +2818,7 @@ retry:
info_p = NULL;
linux_resume_one_lwp (event_child, event_child->stepping,
WSTOPSIG (w), info_p);
- goto retry;
+ return ignore_event (ourstatus);
}
/* Note that all addresses are always "out of the step range" when
@@ -2816,7 +2837,7 @@ retry:
report_to_gdb = (!maybe_internal_trap
|| (current_thread->last_resume_kind == resume_step
&& !in_step_range)
- || event_child->stopped_by_watchpoint
+ || event_child->stop_reason == LWP_STOPPED_BY_WATCHPOINT
|| (!step_over_finished && !in_step_range
&& !bp_explains_trap && !trace_event)
|| (gdb_breakpoint_here (event_child->stop_pc)
@@ -2868,7 +2889,7 @@ retry:
unsuspend_all_lwps (event_child);
proceed_all_lwps ();
- goto retry;
+ return ignore_event (ourstatus);
}
if (debug_threads)
@@ -2880,9 +2901,9 @@ retry:
else if (!lwp_in_step_range (event_child))
debug_printf ("Out of step range, reporting event.\n");
}
- if (event_child->stopped_by_watchpoint)
+ if (event_child->stop_reason == LWP_STOPPED_BY_WATCHPOINT)
debug_printf ("Stopped by watchpoint.\n");
- if (gdb_breakpoint_here (event_child->stop_pc))
+ else if (gdb_breakpoint_here (event_child->stop_pc))
debug_printf ("Stopped by GDB breakpoint.\n");
if (debug_threads)
debug_printf ("Hit a non-gdbserver trap event.\n");
@@ -2890,10 +2911,11 @@ retry:
/* Alright, we're going to report a stop. */
- if (!non_stop && !stabilizing_threads)
+ if (!stabilizing_threads)
{
/* In all-stop, stop all threads. */
- stop_all_lwps (0, NULL);
+ if (!non_stop)
+ stop_all_lwps (0, NULL);
/* If we're not waiting for a specific LWP, choose an event LWP
from among those that have had events. Giving equal priority
@@ -2913,23 +2935,33 @@ retry:
w = event_child->status_pending;
}
- /* Now that we've selected our final event LWP, cancel any
- breakpoints in other LWPs that have hit a GDB breakpoint.
- See the comment in cancel_breakpoints_callback to find out
- why. */
- find_inferior (&all_threads, cancel_breakpoints_callback, event_child);
-
- /* If we were going a step-over, all other threads but the stepping one
- had been paused in start_step_over, with their suspend counts
- incremented. We don't want to do a full unstop/unpause, because we're
- in all-stop mode (so we want threads stopped), but we still need to
- unsuspend the other threads, to decrement their `suspended' count
- back. */
if (step_over_finished)
- unsuspend_all_lwps (event_child);
+ {
+ if (!non_stop)
+ {
+ /* If we were doing a step-over, all other threads but
+ the stepping one had been paused in start_step_over,
+ with their suspend counts incremented. We don't want
+ to do a full unstop/unpause, because we're in
+ all-stop mode (so we want threads stopped), but we
+ still need to unsuspend the other threads, to
+ decrement their `suspended' count back. */
+ unsuspend_all_lwps (event_child);
+ }
+ else
+ {
+ /* If we just finished a step-over, then all threads had
+ been momentarily paused. In all-stop, that's fine,
+ we want threads stopped by now anyway. In non-stop,
+ we need to re-resume threads that GDB wanted to be
+ running. */
+ unstop_all_lwps (1, event_child);
+ }
+ }
/* Stabilize threads (move out of jump pads). */
- stabilize_threads ();
+ if (!non_stop)
+ stabilize_threads ();
}
else
{
@@ -2943,6 +2975,20 @@ retry:
ourstatus->kind = TARGET_WAITKIND_STOPPED;
+ /* Now that we've selected our final event LWP, un-adjust its PC if
+ it was a software breakpoint. */
+ if (event_child->stop_reason == LWP_STOPPED_BY_SW_BREAKPOINT)
+ {
+ int decr_pc = the_low_target.decr_pc_after_break;
+
+ if (decr_pc != 0)
+ {
+ struct regcache *regcache
+ = get_thread_regcache (current_thread, 1);
+ (*the_low_target.set_pc) (regcache, event_child->stop_pc + decr_pc);
+ }
+ }
+
if (current_thread->last_resume_kind == resume_stop
&& WSTOPSIG (w) == SIGSTOP)
{
@@ -3014,7 +3060,13 @@ linux_wait (ptid_t ptid,
if (target_is_async_p ())
async_file_flush ();
- event_ptid = linux_wait_1 (ptid, ourstatus, target_options);
+ do
+ {
+ event_ptid = linux_wait_1 (ptid, ourstatus, target_options);
+ }
+ while ((target_options & TARGET_WNOHANG) == 0
+ && ptid_equal (event_ptid, null_ptid)
+ && ourstatus->kind == TARGET_WAITKIND_IGNORE);
/* If at least one stop was reported, there may be more. A single
SIGCHLD can signal more than one child stop. */
@@ -3202,7 +3254,7 @@ stuck_in_jump_pad_callback (struct inferior_list_entry *entry, void *data)
return (supports_fast_tracepoints ()
&& agent_loaded_p ()
&& (gdb_breakpoint_here (lwp->stop_pc)
- || lwp->stopped_by_watchpoint
+ || lwp->stop_reason == LWP_STOPPED_BY_WATCHPOINT
|| thread->last_resume_kind == resume_step)
&& linux_fast_tracepoint_collecting (lwp, NULL));
}
@@ -3221,7 +3273,7 @@ move_out_of_jump_pad_callback (struct inferior_list_entry *entry)
/* Allow debugging the jump pad, gdb_collect, etc. */
if (!gdb_breakpoint_here (lwp->stop_pc)
- && !lwp->stopped_by_watchpoint
+ && lwp->stop_reason != LWP_STOPPED_BY_WATCHPOINT
&& thread->last_resume_kind != resume_step
&& maybe_move_out_of_jump_pad (lwp, wstat))
{
@@ -3445,11 +3497,17 @@ linux_resume_one_lwp (struct lwp_info *lwp,
step = 1;
}
- if (debug_threads && the_low_target.get_pc != NULL)
+ if (the_low_target.get_pc != NULL)
{
struct regcache *regcache = get_thread_regcache (current_thread, 1);
- CORE_ADDR pc = (*the_low_target.get_pc) (regcache);
- debug_printf (" resuming from pc 0x%lx\n", (long) pc);
+
+ lwp->stop_pc = (*the_low_target.get_pc) (regcache);
+
+ if (debug_threads)
+ {
+ debug_printf (" %s from pc 0x%lx\n", step ? "step" : "continue",
+ (long) lwp->stop_pc);
+ }
}
/* If we have pending signals, consume one unless we are trying to
@@ -3480,7 +3538,7 @@ linux_resume_one_lwp (struct lwp_info *lwp,
regcache_invalidate_thread (thread);
errno = 0;
lwp->stopped = 0;
- lwp->stopped_by_watchpoint = 0;
+ lwp->stop_reason = LWP_STOPPED_BY_NO_REASON;
lwp->stepping = step;
ptrace (step ? PTRACE_SINGLESTEP : PTRACE_CONT, lwpid_of (thread),
(PTRACE_TYPE_ARG3) 0,
@@ -3601,7 +3659,7 @@ resume_status_pending_p (struct inferior_list_entry *entry, void *flag_p)
if (lwp->resume == NULL)
return 0;
- if (lwp->status_pending_p)
+ if (thread_still_has_status_pending_p (thread))
* (int *) flag_p = 1;
return 0;
@@ -4886,7 +4944,7 @@ linux_stopped_by_watchpoint (void)
{
struct lwp_info *lwp = get_thread_lwp (current_thread);
- return lwp->stopped_by_watchpoint;
+ return lwp->stop_reason == LWP_STOPPED_BY_WATCHPOINT;
}
static CORE_ADDR
@@ -6055,7 +6113,6 @@ static struct target_ops linux_target_ops = {
NULL,
linux_pause_all,
linux_unpause_all,
- linux_cancel_breakpoints,
linux_stabilize_threads,
linux_install_fast_tracepoint_jump_pad,
linux_emit_ops,
diff --git a/gdb/gdbserver/linux-low.h b/gdb/gdbserver/linux-low.h
index 4820929..b2f0b91c 100644
--- a/gdb/gdbserver/linux-low.h
+++ b/gdb/gdbserver/linux-low.h
@@ -230,6 +230,24 @@ extern struct linux_target_ops the_low_target;
#define get_thread_lwp(thr) ((struct lwp_info *) (inferior_target_data (thr)))
#define get_lwp_thread(lwp) ((lwp)->thread)
+/* Reasons an LWP last stopped. */
+
+enum lwp_stop_reason
+{
+ /* Either not stopped, or stopped for a reason that doesn't require
+ special tracking. */
+ LWP_STOPPED_BY_NO_REASON,
+
+ /* Stopped by a software breakpoint. */
+ LWP_STOPPED_BY_SW_BREAKPOINT,
+
+ /* Stopped by a hardware breakpoint. */
+ LWP_STOPPED_BY_HW_BREAKPOINT,
+
+ /* Stopped by a watchpoint. */
+ LWP_STOPPED_BY_WATCHPOINT
+};
+
/* This struct is recorded in the target_data field of struct thread_info.
On linux ``all_threads'' is keyed by the LWP ID, which we use as the
@@ -269,8 +287,9 @@ struct lwp_info
/* When stopped is set, the last wait status recorded for this lwp. */
int last_status;
- /* When stopped is set, this is where the lwp stopped, with
- decr_pc_after_break already accounted for. */
+ /* When stopped is set, this is where the lwp last stopped, with
+ decr_pc_after_break already accounted for. If the LWP is
+ running, this is the address at which the lwp was resumed. */
CORE_ADDR stop_pc;
/* If this flag is set, STATUS_PENDING is a waitstatus that has not yet
@@ -278,9 +297,9 @@ struct lwp_info
int status_pending_p;
int status_pending;
- /* STOPPED_BY_WATCHPOINT is non-zero if this LWP stopped with a data
- watchpoint trap. */
- int stopped_by_watchpoint;
+ /* The reason the LWP last stopped, if we need to track it
+ (breakpoint, watchpoint, etc.) */
+ enum lwp_stop_reason stop_reason;
/* On architectures where it is possible to know the data address of
a triggered watchpoint, STOPPED_DATA_ADDRESS is non-zero, and
diff --git a/gdb/gdbserver/linux-x86-low.c b/gdb/gdbserver/linux-x86-low.c
index 838e7c9..885c1d6 100644
--- a/gdb/gdbserver/linux-x86-low.c
+++ b/gdb/gdbserver/linux-x86-low.c
@@ -805,7 +805,7 @@ x86_linux_prepare_to_resume (struct lwp_info *lwp)
lwp->arch_private->debug_registers_changed = 0;
}
- if (clear_status || lwp->stopped_by_watchpoint)
+ if (clear_status || lwp->stop_reason == LWP_STOPPED_BY_WATCHPOINT)
x86_linux_dr_set (ptid, DR_STATUS, 0);
}
\f
diff --git a/gdb/gdbserver/mem-break.c b/gdb/gdbserver/mem-break.c
index d7481c4..5f39bff 100644
--- a/gdb/gdbserver/mem-break.c
+++ b/gdb/gdbserver/mem-break.c
@@ -1610,6 +1610,40 @@ breakpoint_inserted_here (CORE_ADDR addr)
return 0;
}
+/* See mem-break.h. */
+
+int
+software_breakpoint_inserted_here (CORE_ADDR addr)
+{
+ struct process_info *proc = current_process ();
+ struct raw_breakpoint *bp;
+
+ for (bp = proc->raw_breakpoints; bp != NULL; bp = bp->next)
+ if (bp->raw_type == raw_bkpt_type_sw
+ && bp->pc == addr
+ && bp->inserted)
+ return 1;
+
+ return 0;
+}
+
+/* See mem-break.h. */
+
+int
+hardware_breakpoint_inserted_here (CORE_ADDR addr)
+{
+ struct process_info *proc = current_process ();
+ struct raw_breakpoint *bp;
+
+ for (bp = proc->raw_breakpoints; bp != NULL; bp = bp->next)
+ if (bp->raw_type == raw_bkpt_type_hw
+ && bp->pc == addr
+ && bp->inserted)
+ return 1;
+
+ return 0;
+}
+
static int
validate_inserted_breakpoint (struct raw_breakpoint *bp)
{
diff --git a/gdb/gdbserver/mem-break.h b/gdb/gdbserver/mem-break.h
index c84c688..5f52d93 100644
--- a/gdb/gdbserver/mem-break.h
+++ b/gdb/gdbserver/mem-break.h
@@ -90,6 +90,16 @@ int breakpoint_here (CORE_ADDR addr);
int breakpoint_inserted_here (CORE_ADDR addr);
+/* Returns TRUE if there's any inserted software breakpoint at
+ ADDR. */
+
+int software_breakpoint_inserted_here (CORE_ADDR addr);
+
+/* Returns TRUE if there's any inserted hardware (code) breakpoint at
+ ADDR. */
+
+int hardware_breakpoint_inserted_here (CORE_ADDR addr);
+
/* Clear all breakpoint conditions and commands associated with a
breakpoint. */
diff --git a/gdb/gdbserver/target.h b/gdb/gdbserver/target.h
index 5e29b7f..d8c5c54 100644
--- a/gdb/gdbserver/target.h
+++ b/gdb/gdbserver/target.h
@@ -304,9 +304,6 @@ struct target_ops
the pause call. */
void (*unpause_all) (int unfreeze);
- /* Cancel all pending breakpoints hits in all threads. */
- void (*cancel_breakpoints) (void);
-
/* Stabilize all threads. That is, force them out of jump pads. */
void (*stabilize_threads) (void);
@@ -453,13 +450,6 @@ int kill_inferior (int);
(*the_target->unpause_all) (unfreeze); \
} while (0)
-#define cancel_breakpoints() \
- do \
- { \
- if (the_target->cancel_breakpoints) \
- (*the_target->cancel_breakpoints) (); \
- } while (0)
-
#define stabilize_threads() \
do \
{ \
diff --git a/gdb/gdbserver/tracepoint.c b/gdb/gdbserver/tracepoint.c
index b6ab53f..1b2eba7 100644
--- a/gdb/gdbserver/tracepoint.c
+++ b/gdb/gdbserver/tracepoint.c
@@ -2443,7 +2443,6 @@ clear_installed_tracepoints (void)
struct tracepoint *prev_stpoint;
pause_all (1);
- cancel_breakpoints ();
prev_stpoint = NULL;
@@ -3433,9 +3432,6 @@ stop_tracing (void)
We can't now, since we may be getting here due to the inferior
agent calling us. */
pause_all (1);
- /* Since we're removing breakpoints, cancel breakpoint hits,
- possibly related to the breakpoints we're about to delete. */
- cancel_breakpoints ();
/* Stop logging. Tracepoints can still be hit, but they will not be
recorded. */
@@ -6538,7 +6534,6 @@ upload_fast_traceframes (void)
trace_debug ("Done uploading traceframes [%d]\n", curr_tbctrl_idx);
pause_all (1);
- cancel_breakpoints ();
delete_breakpoint (about_to_request_buffer_space_bkpt);
about_to_request_buffer_space_bkpt = NULL;
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 4/8] linux-nat.c: clean up pending status checking and resuming LWPs
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
` (6 preceding siblings ...)
2014-12-26 20:32 ` [PATCH 7/8] [gdbserver] linux-low.c: better starvation avoidance, handle non-stop mode too Pedro Alves
@ 2014-12-26 20:32 ` Pedro Alves
2015-01-06 8:12 ` Yao Qi
2015-01-09 15:07 ` [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
8 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2014-12-26 20:32 UTC (permalink / raw)
To: gdb-patches
Whenever we resume an LWP, we must clear a few flags and flush the
LWPs register cache. Instead of open coding that in many places, use
an helper function.
Likewise, we have two fields in the LWP structure where a pending
status may be recorded. Add a helper predicate that checks both and
use it throughout instead of open coding the checks.
gdb/
2014-12-26 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_resume_one_lwp): New function.
(resume_lwp): Use lwp_status_pending_p and linux_resume_one_lwp.
(linux_nat_resume): Use lwp_status_pending_p and
linux_resume_one_lwp.
(linux_handle_syscall_trap): Use linux_resume_one_lwp.
(linux_handle_extended_wait): Use linux_resume_one_lwp.
(status_callback, running_callback): Use lwp_status_pending_p.
(lwp_status_pending_p): New function.
(stop_and_resume_callback): Use lwp_status_pending_p.
(linux_nat_filter_event): Use linux_resume_one_lwp.
(linux_nat_wait_1): Always use status_callback to look for an LWP
with a pending status. Use linux_resume_one_lwp.
(resume_stopped_resumed_lwps): Use lwp_status_pending_p and
linux_resume_one_lwp.
---
gdb/linux-nat.c | 170 +++++++++++++++++---------------------------------------
1 file changed, 50 insertions(+), 120 deletions(-)
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 845d566..8346c55 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -277,6 +277,8 @@ static void purge_lwp_list (int pid);
static void delete_lwp (ptid_t ptid);
static struct lwp_info *find_lwp_pid (ptid_t ptid);
+static int lwp_status_pending_p (struct lwp_info *lp);
+
\f
/* Trivial list manipulation functions to keep track of a list of
new stopped processes. */
@@ -1430,6 +1432,25 @@ linux_nat_detach (struct target_ops *ops, const char *args, int from_tty)
linux_ops->to_detach (ops, args, from_tty);
}
+/* Resume execution of the inferior process. If STEP is nonzero,
+ single-step it. If SIGNAL is nonzero, give it that signal. */
+
+static void
+linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
+{
+ ptid_t ptid;
+
+ lp->step = step;
+ if (linux_nat_prepare_to_resume != NULL)
+ linux_nat_prepare_to_resume (lp);
+ /* Convert to something the lower layer understands. */
+ ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
+ linux_ops->to_resume (linux_ops, ptid, step, signo);
+ lp->stopped_by_watchpoint = 0;
+ lp->stopped = 0;
+ registers_changed_ptid (lp->ptid);
+}
+
/* Resume LP. */
static void
@@ -1446,8 +1467,7 @@ resume_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
"RC: Not resuming %s (vfork parent)\n",
target_pid_to_str (lp->ptid));
}
- else if (lp->status == 0
- && lp->waitstatus.kind == TARGET_WAITKIND_IGNORE)
+ else if (!lwp_status_pending_p (lp))
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
@@ -1458,14 +1478,7 @@ resume_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
: "0"),
step ? "step" : "resume");
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- step, signo);
- lp->stopped = 0;
- lp->step = step;
- lp->stopped_by_watchpoint = 0;
+ linux_resume_one_lwp (lp, step, signo);
}
else
{
@@ -1559,7 +1572,6 @@ linux_nat_resume (struct target_ops *ops,
gdb_assert (lp != NULL);
/* Remember if we're stepping. */
- lp->step = step;
lp->last_resume_kind = step ? resume_step : resume_continue;
/* If we have a pending wait status for this thread, there is no
@@ -1589,7 +1601,7 @@ linux_nat_resume (struct target_ops *ops,
}
}
- if (lp->status || lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
+ if (lwp_status_pending_p (lp))
{
/* FIXME: What should we do if we are supposed to continue
this thread with a signal? */
@@ -1612,14 +1624,7 @@ linux_nat_resume (struct target_ops *ops,
if (resume_many)
iterate_over_lwps (ptid, linux_nat_resume_callback, lp);
- /* Convert to something the lower layer understands. */
- ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
-
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, ptid, step, signo);
- lp->stopped_by_watchpoint = 0;
- lp->stopped = 0;
+ linux_resume_one_lwp (lp, step, signo);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
@@ -1783,14 +1788,7 @@ linux_handle_syscall_trap (struct lwp_info *lp, int stopping)
subsequent syscall. Simply resume using the inf-ptrace layer,
which knows when to use PT_SYSCALL or PT_CONTINUE. */
- /* Note that gdbarch_get_syscall_number may access registers, hence
- fill a regcache. */
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
- lp->stopped = 0;
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
return 1;
}
@@ -1984,23 +1982,14 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
fprintf_unfiltered (gdb_stdlog,
"LHEW: resuming new LWP %ld\n",
ptid_get_lwp (new_lp->ptid));
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (new_lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (new_pid),
- 0, GDB_SIGNAL_0);
- new_lp->stopped = 0;
+ linux_resume_one_lwp (new_lp, 0, GDB_SIGNAL_0);
}
}
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LHEW: resuming parent LWP %d\n", pid);
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- 0, GDB_SIGNAL_0);
- lp->stopped = 0;
+ linux_resume_one_lwp (lp, 0, GDB_SIGNAL_0);
return 1;
}
@@ -2470,20 +2459,7 @@ status_callback (struct lwp_info *lp, void *data)
if (!lp->resumed)
return 0;
- if (lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
- {
- /* A ptrace event, like PTRACE_FORK|VFORK|EXEC, syscall event,
- or a pending process exit. Note that `W_EXITCODE(0,0) ==
- 0', so a clean process exit can not be stored pending in
- lp->status, it is indistinguishable from
- no-pending-status. */
- return 1;
- }
-
- if (lp->status != 0)
- return 1;
-
- return 0;
+ return lwp_status_pending_p (lp);
}
/* Return non-zero if LP isn't stopped. */
@@ -2492,9 +2468,7 @@ static int
running_callback (struct lwp_info *lp, void *data)
{
return (!lp->stopped
- || ((lp->status != 0
- || lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
- && lp->resumed));
+ || (lwp_status_pending_p (lp) && lp->resumed));
}
/* Count the LWP's that have had events. */
@@ -2525,6 +2499,17 @@ select_singlestep_lwp_callback (struct lwp_info *lp, void *data)
return 0;
}
+/* Returns true if LP has a status pending. */
+
+static int
+lwp_status_pending_p (struct lwp_info *lp)
+{
+ /* We check for lp->waitstatus in addition to lp->status, because we
+ can have pending process exits recorded in lp->status and
+ W_EXITCODE(0,0) happens to be 0. */
+ return lp->status != 0 || lp->waitstatus.kind != TARGET_WAITKIND_IGNORE;
+}
+
/* Select the Nth LWP that has had a SIGTRAP event. */
static int
@@ -2688,7 +2673,7 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
if (lp != NULL)
{
if (lp->last_resume_kind == resume_stop
- && lp->status == 0)
+ && !lwp_status_pending_p (lp))
{
/* The core wanted the LWP to stop. Even if it stopped
cleanly (with SIGSTOP), leave the event pending. */
@@ -2700,7 +2685,7 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
lp->status = W_STOPCODE (SIGSTOP);
}
- if (lp->status == 0)
+ if (!lwp_status_pending_p (lp))
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
@@ -2883,13 +2868,7 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
{
/* This is a delayed SIGSTOP. */
- registers_changed ();
-
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LLW: %s %s, 0, 0 (discard SIGSTOP)\n",
@@ -2897,7 +2876,6 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
"PTRACE_SINGLESTEP" : "PTRACE_CONT",
target_pid_to_str (lp->ptid));
- lp->stopped = 0;
gdb_assert (lp->resumed);
/* Discard the event. */
@@ -2918,19 +2896,13 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
/* This is a delayed SIGINT. */
lp->ignore_sigint = 0;
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LLW: %s %s, 0, 0 (discard SIGINT)\n",
lp->step ?
"PTRACE_SINGLESTEP" : "PTRACE_CONT",
target_pid_to_str (lp->ptid));
-
- lp->stopped = 0;
gdb_assert (lp->resumed);
/* Discard the event. */
@@ -3039,46 +3011,17 @@ linux_nat_wait_1 (struct target_ops *ops,
block_child_signals (&prev_mask);
retry:
- lp = NULL;
status = 0;
/* First check if there is a LWP with a wait status pending. */
- if (ptid_equal (ptid, minus_one_ptid) || ptid_is_pid (ptid))
- {
- /* Any LWP in the PTID group that's been resumed will do. */
- lp = iterate_over_lwps (ptid, status_callback, NULL);
- if (lp)
- {
- if (debug_linux_nat && lp->status)
- fprintf_unfiltered (gdb_stdlog,
- "LLW: Using pending wait status %s for %s.\n",
- status_to_str (lp->status),
- target_pid_to_str (lp->ptid));
- }
- }
- else if (ptid_lwp_p (ptid))
+ lp = iterate_over_lwps (ptid, status_callback, NULL);
+ if (lp != NULL)
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
- "LLW: Waiting for specific LWP %s.\n",
- target_pid_to_str (ptid));
-
- /* We have a specific LWP to check. */
- lp = find_lwp_pid (ptid);
- gdb_assert (lp);
-
- if (debug_linux_nat && lp->status)
- fprintf_unfiltered (gdb_stdlog,
"LLW: Using pending wait status %s for %s.\n",
status_to_str (lp->status),
target_pid_to_str (lp->ptid));
-
- /* We check for lp->waitstatus in addition to lp->status,
- because we can have pending process exits recorded in
- lp->status and W_EXITCODE(0,0) == 0. We should probably have
- an additional lp->status_p flag. */
- if (lp->status == 0 && lp->waitstatus.kind == TARGET_WAITKIND_IGNORE)
- lp = NULL;
}
if (!target_can_async_p ())
@@ -3294,12 +3237,7 @@ retry:
other threads to run. On the other hand, not resuming
newly attached threads may cause an unwanted delay in
getting them running. */
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, signo);
+ linux_resume_one_lwp (lp, lp->step, signo);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LLW: %s %s, %s (preempt 'handle')\n",
@@ -3309,7 +3247,6 @@ retry:
(signo != GDB_SIGNAL_0
? strsignal (gdb_signal_to_host (signo))
: "0"));
- lp->stopped = 0;
goto retry;
}
@@ -3429,8 +3366,7 @@ resume_stopped_resumed_lwps (struct lwp_info *lp, void *data)
if (lp->stopped
&& lp->resumed
- && lp->status == 0
- && lp->waitstatus.kind == TARGET_WAITKIND_IGNORE)
+ && !lwp_status_pending_p (lp))
{
struct regcache *regcache = get_thread_regcache (lp->ptid);
struct gdbarch *gdbarch = get_regcache_arch (regcache);
@@ -3453,13 +3389,7 @@ resume_stopped_resumed_lwps (struct lwp_info *lp, void *data)
paddress (gdbarch, pc),
lp->step);
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
- lp->stopped = 0;
- lp->stopped_by_watchpoint = 0;
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
}
return 0;
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/8] linux-nat.c: clean up pending status checking and resuming LWPs
2014-12-26 20:32 ` [PATCH 4/8] linux-nat.c: clean up pending status checking and resuming LWPs Pedro Alves
@ 2015-01-06 8:12 ` Yao Qi
2015-01-07 13:22 ` Pedro Alves
0 siblings, 1 reply; 23+ messages in thread
From: Yao Qi @ 2015-01-06 8:12 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb-patches
Pedro Alves <palves@redhat.com> writes:
> Whenever we resume an LWP, we must clear a few flags and flush the
> LWPs register cache. Instead of open coding that in many places, use
> an helper function.
After this patch, the behavior is changed to flush the LWP's register
cache rather than register cache of all LWPs (registers_changed ->
registers_changed_ptid (lp->ptid)), right? Although I believe the
change is correct, we should mention it in the commit log.
--
Yao (齐尧)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/8] linux-nat.c: clean up pending status checking and resuming LWPs
2015-01-06 8:12 ` Yao Qi
@ 2015-01-07 13:22 ` Pedro Alves
2015-01-07 14:10 ` Yao Qi
0 siblings, 1 reply; 23+ messages in thread
From: Pedro Alves @ 2015-01-07 13:22 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb-patches
On 01/06/2015 08:12 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
>
>> Whenever we resume an LWP, we must clear a few flags and flush the
>> LWPs register cache. Instead of open coding that in many places, use
>> an helper function.
>
> After this patch, the behavior is changed to flush the LWP's register
> cache rather than register cache of all LWPs (registers_changed ->
> registers_changed_ptid (lp->ptid)), right? Although I believe the
> change is correct, we should mention it in the commit log.
Good point. Here's the updated version. Nothing else changed.
-----
From 645d6471aad217309c750a28d9ab08468acaee38 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Subject: [PATCH] linux-nat.c: clean up pending status checking and resuming
LWPs
Whenever we resume an LWP, we must clear a few flags and flush the
LWP's register cache. We actually currently flush the register cache
of all LWPs, but that's unnecessary. This patch makes us flush the
register cache of only the LWP that is resumed. Instead of open
coding all that in many places, we use a helper function.
Likewise, we have two fields in the LWP structure where a pending
status may be recorded. Add a helper predicate that checks both and
use it throughout instead of open coding the checks.
gdb/
2015-01-07 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_resume_one_lwp): New function.
(resume_lwp): Use lwp_status_pending_p and linux_resume_one_lwp.
(linux_nat_resume): Use lwp_status_pending_p and
linux_resume_one_lwp.
(linux_handle_syscall_trap): Use linux_resume_one_lwp.
(linux_handle_extended_wait): Use linux_resume_one_lwp.
(status_callback, running_callback): Use lwp_status_pending_p.
(lwp_status_pending_p): New function.
(stop_and_resume_callback): Use lwp_status_pending_p.
(linux_nat_filter_event): Use linux_resume_one_lwp.
(linux_nat_wait_1): Always use status_callback to look for an LWP
with a pending status. Use linux_resume_one_lwp.
(resume_stopped_resumed_lwps): Use lwp_status_pending_p and
linux_resume_one_lwp.
---
gdb/linux-nat.c | 170 +++++++++++++++++---------------------------------------
1 file changed, 50 insertions(+), 120 deletions(-)
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 5e60145..de9aec5 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -277,6 +277,8 @@ static void purge_lwp_list (int pid);
static void delete_lwp (ptid_t ptid);
static struct lwp_info *find_lwp_pid (ptid_t ptid);
+static int lwp_status_pending_p (struct lwp_info *lp);
+
\f
/* Trivial list manipulation functions to keep track of a list of
new stopped processes. */
@@ -1453,6 +1455,25 @@ linux_nat_detach (struct target_ops *ops, const char *args, int from_tty)
linux_ops->to_detach (ops, args, from_tty);
}
+/* Resume execution of the inferior process. If STEP is nonzero,
+ single-step it. If SIGNAL is nonzero, give it that signal. */
+
+static void
+linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
+{
+ ptid_t ptid;
+
+ lp->step = step;
+ if (linux_nat_prepare_to_resume != NULL)
+ linux_nat_prepare_to_resume (lp);
+ /* Convert to something the lower layer understands. */
+ ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
+ linux_ops->to_resume (linux_ops, ptid, step, signo);
+ lp->stopped_by_watchpoint = 0;
+ lp->stopped = 0;
+ registers_changed_ptid (lp->ptid);
+}
+
/* Resume LP. */
static void
@@ -1469,8 +1490,7 @@ resume_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
"RC: Not resuming %s (vfork parent)\n",
target_pid_to_str (lp->ptid));
}
- else if (lp->status == 0
- && lp->waitstatus.kind == TARGET_WAITKIND_IGNORE)
+ else if (!lwp_status_pending_p (lp))
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
@@ -1481,14 +1501,7 @@ resume_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
: "0"),
step ? "step" : "resume");
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- step, signo);
- lp->stopped = 0;
- lp->step = step;
- lp->stopped_by_watchpoint = 0;
+ linux_resume_one_lwp (lp, step, signo);
}
else
{
@@ -1582,7 +1595,6 @@ linux_nat_resume (struct target_ops *ops,
gdb_assert (lp != NULL);
/* Remember if we're stepping. */
- lp->step = step;
lp->last_resume_kind = step ? resume_step : resume_continue;
/* If we have a pending wait status for this thread, there is no
@@ -1612,7 +1624,7 @@ linux_nat_resume (struct target_ops *ops,
}
}
- if (lp->status || lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
+ if (lwp_status_pending_p (lp))
{
/* FIXME: What should we do if we are supposed to continue
this thread with a signal? */
@@ -1635,14 +1647,7 @@ linux_nat_resume (struct target_ops *ops,
if (resume_many)
iterate_over_lwps (ptid, linux_nat_resume_callback, lp);
- /* Convert to something the lower layer understands. */
- ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
-
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, ptid, step, signo);
- lp->stopped_by_watchpoint = 0;
- lp->stopped = 0;
+ linux_resume_one_lwp (lp, step, signo);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
@@ -1806,14 +1811,7 @@ linux_handle_syscall_trap (struct lwp_info *lp, int stopping)
subsequent syscall. Simply resume using the inf-ptrace layer,
which knows when to use PT_SYSCALL or PT_CONTINUE. */
- /* Note that gdbarch_get_syscall_number may access registers, hence
- fill a regcache. */
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
- lp->stopped = 0;
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
return 1;
}
@@ -2007,23 +2005,14 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
fprintf_unfiltered (gdb_stdlog,
"LHEW: resuming new LWP %ld\n",
ptid_get_lwp (new_lp->ptid));
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (new_lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (new_pid),
- 0, GDB_SIGNAL_0);
- new_lp->stopped = 0;
+ linux_resume_one_lwp (new_lp, 0, GDB_SIGNAL_0);
}
}
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LHEW: resuming parent LWP %d\n", pid);
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- 0, GDB_SIGNAL_0);
- lp->stopped = 0;
+ linux_resume_one_lwp (lp, 0, GDB_SIGNAL_0);
return 1;
}
@@ -2493,20 +2482,7 @@ status_callback (struct lwp_info *lp, void *data)
if (!lp->resumed)
return 0;
- if (lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
- {
- /* A ptrace event, like PTRACE_FORK|VFORK|EXEC, syscall event,
- or a pending process exit. Note that `W_EXITCODE(0,0) ==
- 0', so a clean process exit can not be stored pending in
- lp->status, it is indistinguishable from
- no-pending-status. */
- return 1;
- }
-
- if (lp->status != 0)
- return 1;
-
- return 0;
+ return lwp_status_pending_p (lp);
}
/* Return non-zero if LP isn't stopped. */
@@ -2515,9 +2491,7 @@ static int
running_callback (struct lwp_info *lp, void *data)
{
return (!lp->stopped
- || ((lp->status != 0
- || lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
- && lp->resumed));
+ || (lwp_status_pending_p (lp) && lp->resumed));
}
/* Count the LWP's that have had events. */
@@ -2548,6 +2522,17 @@ select_singlestep_lwp_callback (struct lwp_info *lp, void *data)
return 0;
}
+/* Returns true if LP has a status pending. */
+
+static int
+lwp_status_pending_p (struct lwp_info *lp)
+{
+ /* We check for lp->waitstatus in addition to lp->status, because we
+ can have pending process exits recorded in lp->status and
+ W_EXITCODE(0,0) happens to be 0. */
+ return lp->status != 0 || lp->waitstatus.kind != TARGET_WAITKIND_IGNORE;
+}
+
/* Select the Nth LWP that has had a SIGTRAP event. */
static int
@@ -2711,7 +2696,7 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
if (lp != NULL)
{
if (lp->last_resume_kind == resume_stop
- && lp->status == 0)
+ && !lwp_status_pending_p (lp))
{
/* The core wanted the LWP to stop. Even if it stopped
cleanly (with SIGSTOP), leave the event pending. */
@@ -2723,7 +2708,7 @@ stop_and_resume_callback (struct lwp_info *lp, void *data)
lp->status = W_STOPCODE (SIGSTOP);
}
- if (lp->status == 0)
+ if (!lwp_status_pending_p (lp))
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
@@ -2910,13 +2895,7 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
{
/* This is a delayed SIGSTOP. */
- registers_changed ();
-
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LLW: %s %s, 0, 0 (discard SIGSTOP)\n",
@@ -2924,7 +2903,6 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
"PTRACE_SINGLESTEP" : "PTRACE_CONT",
target_pid_to_str (lp->ptid));
- lp->stopped = 0;
gdb_assert (lp->resumed);
/* Discard the event. */
@@ -2945,19 +2923,13 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
/* This is a delayed SIGINT. */
lp->ignore_sigint = 0;
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LLW: %s %s, 0, 0 (discard SIGINT)\n",
lp->step ?
"PTRACE_SINGLESTEP" : "PTRACE_CONT",
target_pid_to_str (lp->ptid));
-
- lp->stopped = 0;
gdb_assert (lp->resumed);
/* Discard the event. */
@@ -3066,46 +3038,17 @@ linux_nat_wait_1 (struct target_ops *ops,
block_child_signals (&prev_mask);
retry:
- lp = NULL;
status = 0;
/* First check if there is a LWP with a wait status pending. */
- if (ptid_equal (ptid, minus_one_ptid) || ptid_is_pid (ptid))
- {
- /* Any LWP in the PTID group that's been resumed will do. */
- lp = iterate_over_lwps (ptid, status_callback, NULL);
- if (lp)
- {
- if (debug_linux_nat && lp->status)
- fprintf_unfiltered (gdb_stdlog,
- "LLW: Using pending wait status %s for %s.\n",
- status_to_str (lp->status),
- target_pid_to_str (lp->ptid));
- }
- }
- else if (ptid_lwp_p (ptid))
+ lp = iterate_over_lwps (ptid, status_callback, NULL);
+ if (lp != NULL)
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
- "LLW: Waiting for specific LWP %s.\n",
- target_pid_to_str (ptid));
-
- /* We have a specific LWP to check. */
- lp = find_lwp_pid (ptid);
- gdb_assert (lp);
-
- if (debug_linux_nat && lp->status)
- fprintf_unfiltered (gdb_stdlog,
"LLW: Using pending wait status %s for %s.\n",
status_to_str (lp->status),
target_pid_to_str (lp->ptid));
-
- /* We check for lp->waitstatus in addition to lp->status,
- because we can have pending process exits recorded in
- lp->status and W_EXITCODE(0,0) == 0. We should probably have
- an additional lp->status_p flag. */
- if (lp->status == 0 && lp->waitstatus.kind == TARGET_WAITKIND_IGNORE)
- lp = NULL;
}
if (!target_can_async_p ())
@@ -3321,12 +3264,7 @@ retry:
other threads to run. On the other hand, not resuming
newly attached threads may cause an unwanted delay in
getting them running. */
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops,
- pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, signo);
+ linux_resume_one_lwp (lp, lp->step, signo);
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LLW: %s %s, %s (preempt 'handle')\n",
@@ -3336,7 +3274,6 @@ retry:
(signo != GDB_SIGNAL_0
? strsignal (gdb_signal_to_host (signo))
: "0"));
- lp->stopped = 0;
goto retry;
}
@@ -3456,8 +3393,7 @@ resume_stopped_resumed_lwps (struct lwp_info *lp, void *data)
if (lp->stopped
&& lp->resumed
- && lp->status == 0
- && lp->waitstatus.kind == TARGET_WAITKIND_IGNORE)
+ && !lwp_status_pending_p (lp))
{
struct regcache *regcache = get_thread_regcache (lp->ptid);
struct gdbarch *gdbarch = get_regcache_arch (regcache);
@@ -3480,13 +3416,7 @@ resume_stopped_resumed_lwps (struct lwp_info *lp, void *data)
paddress (gdbarch, pc),
lp->step);
- registers_changed ();
- if (linux_nat_prepare_to_resume != NULL)
- linux_nat_prepare_to_resume (lp);
- linux_ops->to_resume (linux_ops, pid_to_ptid (ptid_get_lwp (lp->ptid)),
- lp->step, GDB_SIGNAL_0);
- lp->stopped = 0;
- lp->stopped_by_watchpoint = 0;
+ linux_resume_one_lwp (lp, lp->step, GDB_SIGNAL_0);
}
return 0;
--
1.9.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/8] linux-nat.c: clean up pending status checking and resuming LWPs
2015-01-07 13:22 ` Pedro Alves
@ 2015-01-07 14:10 ` Yao Qi
0 siblings, 0 replies; 23+ messages in thread
From: Yao Qi @ 2015-01-07 14:10 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb-patches
Pedro Alves <palves@redhat.com> writes:
> Whenever we resume an LWP, we must clear a few flags and flush the
> LWP's register cache. We actually currently flush the register cache
> of all LWPs, but that's unnecessary. This patch makes us flush the
> register cache of only the LWP that is resumed. Instead of open
This looks clear to me. Thanks.
--
Yao (齐尧)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/8] Linux: starvation avoidance in non-stop mode
2014-12-26 20:31 [PATCH 0/8] Linux: starvation avoidance in non-stop mode Pedro Alves
` (7 preceding siblings ...)
2014-12-26 20:32 ` [PATCH 4/8] linux-nat.c: clean up pending status checking and resuming LWPs Pedro Alves
@ 2015-01-09 15:07 ` Pedro Alves
8 siblings, 0 replies; 23+ messages in thread
From: Pedro Alves @ 2015-01-09 15:07 UTC (permalink / raw)
To: gdb-patches
On 12/26/2014 08:31 PM, Pedro Alves wrote:
> I've been working on reimplementing all-stop behavior against a target
> backend working in non-stop mode. Running the testsuite against that
> shows a few regressions related to thread starvation. The Linux
> backends (gdb and gdbserver) already have logic in place to avoid
> that, but it's only currently used when the backend is in all-stop
> mode. This series fixes that work in non-stop too, and further
> improves it.
>
> As a result, the all-stop and non-stop code paths in the backends are
> further merged. Also the native and gdbserver backends end up a
> little bit more similar. Both good things on their own.
>
> Tested on x86_64 Fedora 20, native and gdbserver.
I've pushed this in now.
Thanks,
Pedro Alves
^ permalink raw reply [flat|nested] 23+ messages in thread