[PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread

public inbox for gdb-patches@sourceware.org

* [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
@ 2014-01-21 22:17 Sterling Augustine
  2014-01-22 14:04 ` Pedro Alves
  0 siblings, 1 reply; 8+ messages in thread
From: Sterling Augustine @ 2014-01-21 22:17 UTC (permalink / raw)
  To: gdb-patches, Pedro Alves

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

I am having trouble reducing a large and heavily multi-threaded
program (with many SIGPROFs) to a publishable test case, but I have
been intermittently hitting the following assertion in gdb's "resume"
function:

gdb_assert (pc_in_thread_step_range (pc, tp));

The PC is in a function called inside the step range, and therefore
you would expect process_event_stop_test's checks for this (starting
with the comment, "We stepped out of the stepping range") to catch it
and fix it up.

But the problem is that the thread out of the step range is not the
currently stepped thread. Process_event_stop_test calls
switch_back_to_stepped_thread, which, in turn, calls resume, bypassing
the extra logic in process_event_stop_test to fix up the step range,
and leading to the assertion error.
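
As a rough sketch of the invariant behind that assertion (a simplified
model, not GDB's code; `struct step_state` and both function names are
hypothetical, and the guard mirrors the condition the patch logs in
stop_wait_callback):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread stepping state.  The field
   names echo the ones in the patch, but this is not GDB's struct.  */
struct step_state
{
  uint64_t step_range_start;	/* first address of the stepped line */
  uint64_t step_range_end;	/* one past the last address (exclusive) */
  bool may_range_step;		/* thread was allowed to range step */
};

/* Half-open range test: start <= pc < end.  */
bool
pc_in_step_range (uint64_t pc, const struct step_state *st)
{
  assert (st->step_range_start <= st->step_range_end);
  return pc >= st->step_range_start && pc < st->step_range_end;
}

/* True if resuming this thread with a range step would break the
   invariant: range stepping is enabled but PC is outside the range,
   for example because the thread is stopped inside a callee.  */
bool
range_step_would_assert (uint64_t pc, const struct step_state *st)
{
  return st->may_range_step && !pc_in_step_range (pc, st);
}
```

With a made-up range of [0x1000, 0x1020), a PC of 0x2000 inside a
called function fails the range test even though the step has not
finished.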

But I don't see any reason to assume that only the current thread
would need the additional cleanup found in process_event_stop_test. In
fact, switch_back_to_stepped_thread has a special case (hidden in
currently_stepping_or_nexting_callback), to _prevent_ it from
restarting the current thread, presumably so that thread can get
the additional cleanup.

The enclosed patch does two things:

1. Adds a large amount of tracing, which helped me diagnose the problem.
2. Changes switch_back_to_stepped_thread to still switch back to a
stepped thread, but to avoid restarting it, allowing the additional
checks in process_event_stop_test to work their magic.

The GDB testsuite still passes (in particular, no new failures in
testsuite/gdb.threads), but I know there are a lot of subtleties here.

This is the best fix I can come up with, but are there any other
suggestions on how to approach the problem? (I'll do a ChangeLog entry
after taking comments.)

Thanks,

Sterling

[-- Attachment #2: switch-thread.patch --]
[-- Type: text/x-patch, Size: 9907 bytes --]

diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index 2d50f41..c042e0d 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -32,6 +32,7 @@
 #include "gdbcore.h"
 #include "target.h"
 #include "language.h"
+//#include "symfile.h"
 #include "objfiles.h"
 #include "completer.h"
 #include "ui-out.h"
@@ -1049,6 +1050,10 @@ step_once (int skip_subroutines, int single_inst, int count, int thread)
 				 &tp->control.step_range_start,
 				 &tp->control.step_range_end);
 
+          if (debug_infrun)
+            fprintf_unfiltered (gdb_stdlog,
+				"infrun: %s may range step\n",
+                                target_pid_to_str (tp->ptid));
 	  tp->control.may_range_step = 1;
 
 	  /* If we have no line info, switch to stepi mode.  */
@@ -1056,6 +1061,10 @@ step_once (int skip_subroutines, int single_inst, int count, int thread)
 	    {
 	      tp->control.step_range_start = tp->control.step_range_end = 1;
 	      tp->control.may_range_step = 0;
+              if (debug_infrun)
+                fprintf_unfiltered (gdb_stdlog,
+                                    "infrun: %s may not range step\n",
+                                    target_pid_to_str (tp->ptid));
 	    }
 	  else if (tp->control.step_range_end == 0)
 	    {
@@ -1346,6 +1355,10 @@ until_next_command (int from_tty)
       tp->control.step_range_end = sal.end;
     }
   tp->control.may_range_step = 1;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may range step\n",
+                        target_pid_to_str (tp->ptid));
 
   tp->control.step_over_calls = STEP_OVER_ALL;
 
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 311bf9c..e6f51c9 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1315,6 +1315,10 @@ displaced_step_prepare (ptid_t ptid)
      the scratch buffer lands within the stepping range (e.g., a
      jump/branch).  */
   tp->control.may_range_step = 0;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may not range step\n",
+                        target_pid_to_str (tp->ptid));
 
   /* We have to displaced step one thread at a time, as we only have
      access to a single scratch space per inferior.  */
@@ -1775,7 +1779,14 @@ a command like `return' or `jump' to continue execution."));
   /* If we have a breakpoint to step over, make sure to do a single
      step only.  Same if we have software watchpoints.  */
   if (tp->control.trap_expected || bpstat_should_step ())
-    tp->control.may_range_step = 0;
+    {
+      tp->control.may_range_step = 0;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may not range step\n",
+                            target_pid_to_str (tp->ptid));
+    }
+
 
   /* If enabled, step over breakpoints by executing a copy of the
      instruction at a different address.
@@ -1945,6 +1956,14 @@ a command like `return' or `jump' to continue execution."));
 	     operation, like stepping the thread out of the dynamic
 	     linker or the displaced stepping scratch pad.  We
 	     shouldn't have allowed a range step then.  */
+	  if (debug_infrun)
+	    fprintf_unfiltered (gdb_stdlog,
+				"infrun: %s step range = [%lx-%lx] pc = %lx\n",
+                                target_pid_to_str (tp->ptid),
+                                tp->control.step_range_start,
+                                tp->control.step_range_end,
+                                pc);
+
 	  gdb_assert (pc_in_thread_step_range (pc, tp));
 	}
 
@@ -1990,6 +2009,10 @@ clear_proceed_status_thread (struct thread_info *tp)
   tp->control.step_range_start = 0;
   tp->control.step_range_end = 0;
   tp->control.may_range_step = 0;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may not range step\n",
+                        target_pid_to_str (tp->ptid));
   tp->control.step_frame_id = null_frame_id;
   tp->control.step_stack_frame_id = null_frame_id;
   tp->control.step_over_calls = STEP_OVER_UNDEBUGGABLE;
@@ -3254,6 +3277,10 @@ handle_inferior_event (struct execution_control_state *ecs)
       /* Disable range stepping.  If the next step request could use a
 	 range, this will be end up re-enabled then.  */
       ecs->event_thread->control.may_range_step = 0;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may not range step\n",
+                            target_pid_to_str (ecs->ptid));
     }
 
   /* Dependent on valid ECS->EVENT_THREAD.  */
@@ -4644,7 +4671,9 @@ process_event_stop_test (struct execution_control_state *ecs)
 
     case BPSTAT_WHAT_HP_STEP_RESUME:
       if (debug_infrun)
-	fprintf_unfiltered (gdb_stdlog, "infrun: BPSTAT_WHAT_HP_STEP_RESUME\n");
+	fprintf_unfiltered (gdb_stdlog, "infrun: BPSTAT_WHAT_HP_STEP_RESUME "
+                            "step_after_step_resume_breakpoint = %d\n",
+                            ecs->event_thread->step_after_step_resume_breakpoint);
 
       delete_step_resume_breakpoint (ecs->event_thread);
       if (ecs->event_thread->step_after_step_resume_breakpoint)
@@ -4727,6 +4756,10 @@ process_event_stop_test (struct execution_control_state *ecs)
 	 necessary (e.g., if we're stepping over a breakpoint or we
 	 have software watchpoints).  */
       ecs->event_thread->control.may_range_step = 1;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may range step\n",
+                            target_pid_to_str (ecs->ptid));
 
       /* When stepping backward, stop at beginning of line range
 	 (unless it's the function entry point, in which case
@@ -5245,6 +5278,10 @@ process_event_stop_test (struct execution_control_state *ecs)
   ecs->event_thread->control.step_range_start = stop_pc_sal.pc;
   ecs->event_thread->control.step_range_end = stop_pc_sal.end;
   ecs->event_thread->control.may_range_step = 1;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may range step\n",
+                        target_pid_to_str (ecs->ptid));
   set_step_info (frame, stop_pc_sal);
 
   if (debug_infrun)
@@ -5274,10 +5311,7 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
 	  if ((ecs->event_thread->control.trap_expected
 	       && ecs->event_thread->suspend.stop_signal != GDB_SIGNAL_TRAP)
 	      || ecs->event_thread->stepping_over_breakpoint)
-	    {
-	      keep_going (ecs);
-	      return 1;
-	    }
+            return 0;
 
 	  /* If the stepping thread exited, then don't try to switch
 	     back and resume it, which could fail in several different
@@ -5307,24 +5341,18 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
 				    "stepped thread, it has vanished\n");
 
 	      delete_thread (tp->ptid);
-	      keep_going (ecs);
-	      return 1;
+	      return 0;
 	    }
 
-	  /* Otherwise, we no longer expect a trap in the current thread.
-	     Clear the trap_expected flag before switching back -- this is
-	     what keep_going would do as well, if we called it.  */
-	  ecs->event_thread->control.trap_expected = 0;
-
 	  if (debug_infrun)
 	    fprintf_unfiltered (gdb_stdlog,
-				"infrun: switching back to stepped thread\n");
+				"infrun: switching back to stepped thread %s\n",
+                                target_pid_to_str (tp->ptid));
 
 	  ecs->event_thread = tp;
 	  ecs->ptid = tp->ptid;
 	  context_switch (ecs->ptid);
-	  keep_going (ecs);
-	  return 1;
+          return 0;
 	}
     }
   return 0;
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 2371ad4..1c6bd45 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -46,6 +46,7 @@
 #include "gregset.h"		/* for gregset */
 #include "gdbcore.h"		/* for get_exec_file */
 #include <ctype.h>		/* for isdigit */
+#include "gdbthread.h"		/* for struct thread_info etc.  */
 #include <sys/stat.h>		/* for struct stat */
 #include <fcntl.h>		/* for O_RDONLY */
 #include "inf-loop.h"
@@ -64,6 +65,7 @@
 #include "agent.h"
 #include "tracepoint.h"
 #include "exceptions.h"
+#include "linux-ptrace.h"
 #include "buffer.h"
 #include "target-descriptions.h"
 #include "filestuff.h"
@@ -2601,6 +2603,9 @@ stop_wait_callback (struct lwp_info *lp, void *data)
 
       if (WSTOPSIG (status) != SIGSTOP)
 	{
+          struct thread_info *tp = find_thread_ptid (lp->ptid);
+          CORE_ADDR pc = regcache_read_pc (get_thread_regcache (lp->ptid));
+
 	  /* The thread was stopped with a signal other than SIGSTOP.  */
 
 	  save_sigtrap (lp);
@@ -2611,6 +2616,18 @@ stop_wait_callback (struct lwp_info *lp, void *data)
 				status_to_str ((int) status),
 				target_pid_to_str (lp->ptid));
 
+          if (debug_linux_nat
+              && tp->control.may_range_step
+              && !pc_in_thread_step_range (pc, tp))
+            {
+              fprintf_unfiltered (gdb_stdlog,
+                                  "SWC: %s %lx out of step range [%lx-%lx]\n",
+                                  target_pid_to_str (tp->ptid),
+                                  pc,
+                                  tp->control.step_range_start,
+                                  tp->control.step_range_end);
+            }
+
 	  /* Save the sigtrap event.  */
 	  lp->status = status;
 	  gdb_assert (!lp->stopped);
@@ -2686,7 +2703,13 @@ count_events_callback (struct lwp_info *lp, void *data)
 
   /* Count only resumed LWPs that have a SIGTRAP event pending.  */
   if (lp->resumed && linux_nat_lp_status_is_event (lp))
-    (*count)++;
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "CEC: LWP %ld has an event\n",
+			    ptid_get_lwp (lp->ptid));
+      (*count)++;
+    }
 
   return 0;
 }
@@ -3212,6 +3235,9 @@ linux_nat_wait_1 (struct target_ops *ops,
   block_child_signals (&prev_mask);
 
 retry:
+  if (debug_linux_nat)
+    fprintf_unfiltered (gdb_stdlog, "LLW: retry\n");
+
   lp = NULL;
   status = 0;
 


* Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  2014-01-21 22:17 [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread Sterling Augustine
@ 2014-01-22 14:04 ` Pedro Alves
  2014-01-23 17:55   ` Sterling Augustine
  0 siblings, 1 reply; 8+ messages in thread
From: Pedro Alves @ 2014-01-22 14:04 UTC (permalink / raw)
  To: Sterling Augustine; +Cc: gdb-patches

On 01/21/2014 10:17 PM, Sterling Augustine wrote:

> But the problem is that the thread out of the step range is not the
> currently stepped thread. 

Then I'm quite confused.  Each thread has its own step range.  How come
the thread that is not stepped has a step range at all then?

> Process_event_stop_test calls
> switch_back_to_stepped_thread, which, in turn, calls resume, bypassing
> the extra logic in process_event_stop_test to fix up the step range,
> and leading to the assertion error.

The issue seems to me, as previously discussed, not really about missing
the "fix up" of the step range, but rather that we overstep the thread
by mistake.  Running the thread through process_event_stop_test makes us
detect that the step finished (before we ever get to fix up the step
range).  That is, we switch back to the stepping thread,
and re-step it, ignoring the possibility that that thread might have
already moved past the step range, but not have had a chance to report
that trap to the core yet (because events are serialized).

The thing missing is a testcase clearly showing that that's indeed
the issue in question.  I spent a few days trying to write one from
scratch a while ago, but failed, because linux-nat.c always gives
preference to reporting the stepping/SIGTRAP thread if there are multiple
simultaneous events, and it seemed like another signal needs to be
involved to trigger this.
Perhaps we could confirm all this already in a log produced by
your extra debug outputs run against your big app?

> But I don't see any reason to assume that only the current thread
> would need the additional cleanup found in process_event_stop_test. In
> fact, switch_back_to_stepped_thread has a special case (hidden in
> currently_stepping_or_nexting_callback), to _prevent_ it from
> restarting the current thread, presumably so that thread can get
> the additional cleanup.
> 
> The enclosed patch does two things:
> 
> 1. Adds a large amount of tracing, which helped me diagnose the problem.
> 2. Changes switch_back_to_stepped_thread to still switch back to a
> stepped thread, but to avoid restarting it, allowing the additional
> checks in process_event_stop_test to work their magic.

I'm not convinced the first two branches in switch_back_to_stepped_thread
should be changed at all.  So without those that reduces to exactly the
patch I had shown you originally:

 https://github.com/palves/gdb/commit/b6b55ba610f8db5d89ec7405c93013a10d9a1c20

Does that alone fix things for you?

In that branch, I then later rewrote that fix differently:

 https://github.com/palves/gdb/commit/1d56ddf439b6f7e7fa9759cf1f8e02106eea6af5

The idea of that "better fix" was to handle the case mentioned in
this comment:

+       There might be some cases where this loses signal
+       information, if a signal has arrived at exactly the same
+       time that the PC changed, but this is the best we can do
+       with the information available.

by setting a breakpoint at the current PC, and re-resuming the thread.
That means that if there was indeed some other signal/event pending,
we'd collect it first.  But that's unfinished, and breaks hardware-step
targets again in the process, for it only handles software-step targets.
The thing preventing moving this forward is a testcase (or a log showing
clearly that the problem is what I say above it is, which should show
the steps needed to construct a testcase).

-- 
Pedro Alves


* Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  2014-01-22 14:04 ` Pedro Alves
@ 2014-01-23 17:55   ` Sterling Augustine
  2014-01-23 18:53     ` Pedro Alves
  0 siblings, 1 reply; 8+ messages in thread
From: Sterling Augustine @ 2014-01-23 17:55 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Wed, Jan 22, 2014 at 6:03 AM, Pedro Alves <palves@redhat.com> wrote:
> On 01/21/2014 10:17 PM, Sterling Augustine wrote:
>
>> But the problem is that the thread out of the step range is not the
>> currently stepped thread.
>
> Then I'm quite confused.  Each thread has its own step range.  How come
> the thread that is not stepped has a step range at all then?

It is stepped. We are switching back to it after a complicated set of
events, see below.

>> Process_event_stop_test calls
>> switch_back_to_stepped_thread, which, in turn, calls resume, bypassing
>> the extra logic in process_event_stop_test to fix up the step range,
>> and leading to the assertion error.
>
> The issue seems to me, as previously discussed, not really about missing
> the "fix up" of the step range, but rather that we overstep the thread
> by mistake.

That is incorrect. The thread's stepping range looks something like
this, in source code:

x = 1; foo(); x = 2;

With the step range equivalent to the single line.

But the stepped-thread gets stopped in foo--that's what all the extra
logic in process_event_stop_test does to fix up the step range.

So the thread is not past the step range at all and will hit it
eventually, but it is outside it. There is logic in
process_event_stop_test to handle this exact case.
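
The distinction being drawn here can be sketched as a three-way
classification of a stop.  This is only an illustration: the integer
`frame_depth` is a hypothetical simplification of frame comparison, and
none of these names come from GDB.

```c
#include <assert.h>
#include <stdint.h>

enum stop_kind
{
  STOP_IN_RANGE,	/* still inside the line being stepped */
  STOP_IN_CALLEE,	/* inside a function called from the range (foo) */
  STOP_LEFT_RANGE	/* same or outer frame, outside the range */
};

/* Hypothetical description of an in-progress "step" request.  */
struct step_request
{
  uint64_t range_start;
  uint64_t range_end;	/* exclusive */
  int frame_depth;	/* call depth when the step began (assumption) */
};

enum stop_kind
classify_stop (const struct step_request *req, uint64_t pc, int frame_depth)
{
  assert (req->range_start <= req->range_end);

  /* Check depth first: a deeper frame means the thread is in a callee,
     even if a recursive call happens to put PC inside the range.  */
  if (frame_depth > req->frame_depth)
    return STOP_IN_CALLEE;
  if (pc >= req->range_start && pc < req->range_end)
    return STOP_IN_RANGE;
  return STOP_LEFT_RANGE;
}
```

Under this model, a thread stopped in foo is STOP_IN_CALLEE: outside the
range, yet not finished with the step.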

> Running the thread through process_event_stop_test makes us
> detect that the step finished (before we ever get to fix up the step
> range).

The step didn't finish. The thread stopped deeper in the stack.

> The thing missing is a testcase clearly showing that that's indeed
> the issue in question.  I spent a few days trying to write one from
> scratch a while ago, but failed, because linux-nat.c always gives
> preference to reporting the stepping/SIGTRAP thread if there are multiple
> simultaneous events, and it seemed like another signal needs to be
> involved to trigger this.

Even if I could release this app (which I can't), it is several
gigabytes big, and it takes a while to hit the case--it is obviously a
race that is extremely uncommon.

I have spent a solid week trying to write a case to hit this
independently, but I can't.

> Perhaps we could confirm all this already in a log produced by
> your extra debug outputs run against your big app?

I have attached a severely trimmed log to the bug here:

https://sourceware.org/bugzilla/show_bug.cgi?id=16292

The relevant lines in the log as far as I can tell are:

157292: LWP 28899 takes a trace/breakpoint trap, and is inside its step range.
157293: LWP 29437 has a breakpoint pushed back.
157307: LWP 28899 is resumed and prepared to step (still inside step range).
159355: GDB switches contexts from LWP 28899 to LWP 29437.
159808: LWP 29437 takes a trace/breakpoint trap, and GDB sends SIGSTOP
to all other threads.
161427: LWP 28899 takes a "Profiling timer expired" signal instead of a
SIGSTOP, outside of its step range.
161466: switch_back_to_stepped_thread decides to restart LWP 28899.
161488: assertion failure.

> I'm not convinced the first two branches in switch_back_to_stepped_thread
> should be changed at all.  So without those that reduces to exactly the
> patch I had shown you originally:
>
>  https://github.com/palves/gdb/commit/b6b55ba610f8db5d89ec7405c93013a10d9a1c20
>
> Does that alone fix things for you?

Yes, by itself it does work, but I thought you had rejected that for
some reason. The second patch you made breaks things by bypassing the
step-range fix up in process_event_stop_test.

> In that branch, I then later rewrote that fix differently:
>
>  https://github.com/palves/gdb/commit/1d56ddf439b6f7e7fa9759cf1f8e02106eea6af5
>
> The idea of that "better fix" was to handle the case mentioned in
> this comment:
>
> +       There might be some cases where this loses signal
> +       information, if a signal has arrived at exactly the same
> +       time that the PC changed, but this is the best we can do
> +       with the information available.
>
> by setting a breakpoint at the current PC, and re-resuming the thread.
> That means that if there was indeed some other signal/event pending,
> we'd collect it first.  But that's unfinished, and breaks hardware-step
> targets again in the process, for it only handles software-step targets.
> The thing preventing moving this forward is a testcase (or a log showing
> clearly that the problem is what I say above it is, which should show
> the steps needed to construct a testcase).

This doesn't seem to address the case at hand. In any case, the second
patch does not fix it.

I would be fine with your first patch though.


* Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  2014-01-23 17:55   ` Sterling Augustine
@ 2014-01-23 18:53     ` Pedro Alves
  2014-01-24 12:44       ` Pedro Alves
  2014-02-07 19:52       ` Pedro Alves
  0 siblings, 2 replies; 8+ messages in thread
From: Pedro Alves @ 2014-01-23 18:53 UTC (permalink / raw)
  To: Sterling Augustine; +Cc: gdb-patches

On 01/22/2014 05:41 PM, Sterling Augustine wrote:
> On Wed, Jan 22, 2014 at 6:03 AM, Pedro Alves <palves@redhat.com> wrote:

>> The issue seems to me, as previously discussed, not really about missing
>> the "fix up" of the step range, but rather that we overstep the thread
>> by mistake.
> 
> That is incorrect. The thread's stepping range looks something like
> this, in source code:
> 
> x = 1; foo(); x = 2;
> 
> With the step range equivalent to the single line.
> 
> But the stepped-thread gets stopped in foo--that's what all the extra
> logic in process_event_stop_test does to fix up the step range.

OK, we're saying similar things.  The thing is that the thread (call
it A) doesn't report an event saying it's stopped in foo.  It's
quiesced while handling an event for another thread (B), and we
noticed that thread B had moved.  If we just blindly keep_going it,
we might overstep -- that's what I'm saying.

> 
> So the thread is not past the step range at all 

A matter of definition I guess.  If the thread is in "foo",
then it is out of the step range, and that needs care.

> and will hit it eventually, but it is outside it. There is logic in
> process_event_stop_test to handle this exact case.

Right.

> 
>> Running the thread through process_event_stop_test makes us
>> detect that the step finished (before we ever get to fix up the step
>> range).

I understand that.

> 
> The step didn't finish. The thread stopped deeper in the stack.
> 
>> The thing missing is a testcase clearly showing that that's indeed
>> the issue in question.  I spent a few days trying to write one from
>> scratch a while ago, but failed, because linux-nat.c always gives
>> preference to reporting the stepping/SIGTRAP thread if there are multiple
>> simultaneous events, and it seemed like another signal needs to be
>> involved to trigger this.
> 
> Even if I could release this app (which I can't), it is several
> gigabytes big, and it takes a while to hit the case--it is obviously a
> race that is extremely uncommon.
> 
> I have spent a solid week trying to write a case to hit this
> independently, but I can't.
> 
>> Perhaps we could confirm all this already in a log produced by
>> your extra debug outputs run against your big app?
> 
> I have attached a severely trimmed log to the bug here:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=16292

Thank you, that helps a lot.

> The relevant lines in the log as far as I can tell are:
> 
> 157292: LWP 28899 takes a trace/breakpoint trap, and is inside its step range.
> 157293: LWP 29437 has a breakpoint pushed back.
> 157307: LWP 28899 is resumed and prepared to step (still inside step range).
> 159355: GDB switches contexts from LWP 28899 to LWP 29437.
> 159808: LWP 29437 takes a trace/breakpoint trap, and GDB sends SIGSTOP
> to all other threads.
> 161427: LWP 28899 takes a "Profiling timer expired" signal instead of a
> SIGSTOP, outside of its step range.
> 161466: switch_back_to_stepped_thread decides to restart LWP 28899.
> 161488: assertion failure.

We should start looking earlier even.  I do believe this case eventually
needs to be handled (though I'd prefer the form of my other patch, even
though obviously it isn't complete).

>> I'm not convinced the first two branches in switch_back_to_stepped_thread
>> should be changed at all.  So without those that reduces to exactly the
>> patch I had shown you originally:
>>
>>  https://github.com/palves/gdb/commit/b6b55ba610f8db5d89ec7405c93013a10d9a1c20
>>
>> Does that alone fix things for you?
> 
> Yes, by itself it does work, but I thought you had rejected that for
> some reason. 

I "rejected" it in the sense that that loses signals as mentioned in the
second patch.

> The second patch you made breaks things by bypassing the
> step-range fix up in process_event_stop_test.

Of course, for hardware step targets, it's essentially a reversion
of the first patch.  It's unfinished!  It reverted the first patch,
and then fixed things only for software step targets.  It needs
finishing for hardware step targets...

> 
>> In that branch, I then later rewrote that fix differently:
>>
>>  https://github.com/palves/gdb/commit/1d56ddf439b6f7e7fa9759cf1f8e02106eea6af5
>>
>> The idea of that "better fix" was to handle the case mentioned in
>> this comment:
>>
>> +       There might be some cases where this loses signal
>> +       information, if a signal has arrived at exactly the same
>> +       time that the PC changed, but this is the best we can do
>> +       with the information available.
>>
>> by setting a breakpoint at the current PC, and re-resuming the thread.
>> That means that if there was indeed some other signal/event pending,
>> we'd collect it first.  But that's unfinished, and breaks hardware-step
>> targets again in the process, for it only handles software-step targets.
>> The thing preventing moving this forward is a testcase (or a log showing
>> clearly that the problem is what I say above it is, which should show
>> the steps needed to construct a testcase).
> 
> This doesn't seem to address the case at hand. In any case, the second
> patch does not fix it.

I don't know how to say it more clearly -- it's unfinished.  :-)
Making the hardware step path do something like the software step
path should fix it.  At least, that's my hope.

> I would be fine with your first patch though.

Let's step back a bit first.  Trimming your log attached to
the PR a bit, I noticed something related going wrong _before_
the assertion in question.

So we're stepping 28899:

infrun: target_wait (-1, status) =
infrun:   28899 [Thread 0x7ffff7ff1b80 (LWP 28899)],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: Thread 0x7ffff7ff1b80 (LWP 28899) may not range step
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x8a9878
infrun: stepping inside range [0x8a9769-0x8a9e9d]
infrun: Thread 0x7ffff7ff1b80 (LWP 28899) may range step
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0x7ffff7ff1b80 (LWP 28899)] at 0x8a9878
infrun: Thread 0x7ffff7ff1b80 (LWP 28899) step range = [8a9769-8a9e9d] pc = 8a9878
LLR: Preparing to step process 28899, 0, inferior_ptid Thread 0x7ffff7ff1b80 (LWP 28899)
RC: Resuming sibling Thread 0x7fff60fa0700 (LWP 29437), 0, resume
...

Above we've stepped it, in the range.  But eventually, some other
thread hits a breakpoint that needs stepping over:

infrun: Switching context from Thread 0x7ffff7ff1b80 (LWP 28899) to Thread 0x7fff60fa0700 (LWP 29437)
infrun: BPSTAT_WHAT_SINGLE
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 0x7fff60fa0700 (LWP 29437)] at 0x7ffff757cd70
infrun: Thread 0x7fff60fa0700 (LWP 29437) may not range step
LLR: Preparing to step Thread 0x7fff60fa0700 (LWP 29437), 0, inferior_ptid Thread 0x7fff60fa0700 (LWP 29437)
LLR: PTRACE_SINGLESTEP process 29437, 0 (resume event thread)
infrun: prepare_to_wait
linux_nat_wait: [process -1], []
LLW: enter
LLW: retry
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
sigchld
LNW: waitpid(-1, ...) returned 29437, ERRNO-OK
LLW: waitpid 29437 received Profiling timer expired (stopped)
LLTA: KILL(SIG0) Thread 0x7fff60fa0700 (LWP 29437) (OK)
LLW: Candidate event Profiling timer expired (stopped) in Thread 0x7fff60fa0700 (LWP 29437).
SEL: Select single-step Thread 0x7fff60fa0700 (LWP 29437)
LLW: exit

... and then while trying to step over the breakpoint, the thread
reports a signal.

infrun: target_wait (-1, status) =
infrun:   28899 [Thread 0x7fff60fa0700 (LWP 29437)],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_PROF
infrun: Thread 0x7fff60fa0700 (LWP 29437) may not range step
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x7ffff757cd70
infrun: random signal (GDB_SIGNAL_PROF)
infrun: signal arrived while stepping over breakpoint
infrun: inserting step-resume breakpoint at 0x7ffff757cd70

So we set a step-resume breakpoint at the PC, and continue (all
threads), to move past the signal handler.  But, 

infrun: resume (step=0, signal=GDB_SIGNAL_PROF), trap_expected=0, current thread [Thread 0x7fff60fa0700 (LWP 29437)] at 0x7ffff757cd70
LLR: Preparing to resume process 28899, Profiling timer expired, inferior_ptid Thread 0x7fff60fa0700 (LWP 29437)
RC: Not resuming sibling Thread 0x7fff60fa0700 (LWP 29437) (not stopped)
...
RC: Resuming sibling Thread 0x7ffff7ff1b80 (LWP 28899), 0, resume
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

... that means we CONTINUE thread 28899, the one stepping in the
range, instead of stepping it!  No wonder it later appeared
outside the stepping range.
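
The effect seen in that log can be modelled with a small sketch.  It
assumes, purely for illustration, that a wildcard resume applies the
step request only to the event thread and plainly continues every
sibling; `struct thread_model` and both functions are hypothetical and
are not linux-nat.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread model; not a GDB structure.  */
struct thread_model
{
  long lwp;
  bool stepping_in_range;	/* thread owns an active step range */
  bool resumed_with_step;	/* how the last resume restarted it */
};

/* Model of a wildcard resume: only the event thread honours STEP,
   every sibling is plainly continued (assumption of this sketch).  */
void
wildcard_resume (struct thread_model *threads, size_t n,
		 size_t event_idx, bool step)
{
  assert (event_idx < n);
  for (size_t i = 0; i < n; i++)
    threads[i].resumed_with_step = (i == event_idx) ? step : false;
}

/* Count threads that were mid-step but got continued instead; these
   are the ones that can later show up outside their step range.  */
size_t
count_lost_steppers (const struct thread_model *threads, size_t n)
{
  size_t lost = 0;

  for (size_t i = 0; i < n; i++)
    if (threads[i].stepping_in_range && !threads[i].resumed_with_step)
      lost++;
  return lost;
}
```

Plugging in the two LWPs from the log (28899 stepping in its range,
29437 as the event thread resumed with step=0) yields one lost stepper,
which matches the "Resuming sibling ... (LWP 28899), 0, resume" line.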

When I saw this, I had a deja vu...

https://sourceware.org/ml/gdb-patches/2011-05/msg00443.html

(I never pushed that patch.)

And lo, the test in that patch triggers your assertion too...

I never pushed that patch because I wasn't really sure I
liked it.  Running the signal handler with scheduler locking
on is "dangerous" in that we could cause a deadlock in the inferior
(some other thread might be holding some form of lock the
handler takes too).

Since that patch was written, things have evolved enough
in infrun.c that I think this simpler patch suffices.
There's a spot in resume that uses
step_after_step_resume_breakpoint too (for software
single-step targets), that I haven't really given much
thought to yet.  It might well need something there too.

-----
From c6c4e2960d3f8c797ea96909c423d6d53934e1c6 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 23 Jan 2014 18:23:42 +0000
Subject: [PATCH] Make sure we don't resume the stepped thread by accident.

Say:

<stopped at a breakpoint in thread 2>
(gdb) thread 3
(gdb) step

The above triggers the prepare_to_proceed/deferred_step_ptid process,
which switches back to thread 2, to step over its breakpoint before
getting back to thread 3 and "step" it.

If while stepping over the breakpoint in thread 2, a signal arrives,
and it is set to pass/nostop, we'll set a step-resume breakpoint at
the supposed signal-handler resume address, and call keep_going.  The
problem is that we were supposedly stepping thread 3, and that
keep_going delivers a signal to thread 2, and due to scheduler-locking
off, resumes everything else, _including_ thread 3, the thread we want
stepping.  This means that we lose control of thread 3 until the next
event, when we stop everything.  The end result for the user, is that
GDB lost control of the "step".

Here's the current infrun debug output of the above, with the testcase
in the patch below:

infrun: clear_proceed_status_thread (Thread 0x2aaaab8f5700 (LWP 11663))
infrun: clear_proceed_status_thread (Thread 0x2aaaab6f4700 (LWP 11662))
infrun: clear_proceed_status_thread (Thread 0x2aaaab4f2b20 (LWP 11659))
infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
infrun: prepare_to_proceed (step=1), switched to [Thread 0x2aaaab6f4700 (LWP 11662)]
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f
infrun: wait_for_inferior ()
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGUSR1
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40098f
infrun: random signal 30

Program received signal SIGUSR1, User defined signal 1.
infrun: signal arrived while stepping over breakpoint
infrun: inserting step-resume breakpoint at 0x40098f
infrun: resume (step=0, signal=30), trap_expected=0, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f

^^^ this is a wildcard resume.

infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40098f
infrun: BPSTAT_WHAT_STEP_RESUME
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f

^^^ step-resume hit, meaning the handler returned, so we go back to stepping thread 3.


infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED

infrun: stop_pc = 0x40088b
infrun: switching back to stepped thread
infrun: Switching context from Thread 0x2aaaab6f4700 (LWP 11662) to Thread 0x2aaaab8f5700 (LWP 11663)
infrun: resume (step=1, signal=0), trap_expected=0, current thread [Thread 0x2aaaab8f5700 (LWP 11663)] at 0x400938
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab8f5700 (LWP 11663)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40093a
infrun: keep going
infrun: resume (step=1, signal=0), trap_expected=0, current thread [Thread 0x2aaaab8f5700 (LWP 11663)] at 0x40093a
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab8f5700 (LWP 11663)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40091e
infrun: stepped to a different line
infrun: stop_stepping
[Switching to Thread 0x2aaaab8f5700 (LWP 11663)]
69            (*myp) ++; /* set breakpoint child_two here */

^^^ we stopped at the wrong line.  We still stepped a bit because the
test is running in a loop, and when we got back to stepping thread 3,
it happened to be in the stepping range.  (The loop increments a
counter, and the test makes sure it increments exactly once.  Without
the fix, the counter increments a bunch, since the user-stepped thread
runs free without GDB noticing.)

The fix is to switch to the stepping thread before continuing for the
step-resume breakpoint.

2014-01-23  Pedro Alves  <palves@redhat.com>

	gdb/
	* infrun.c (handle_signal_stop) <signal arrives while
	stepping over a breakpoint>: Switch back to the stepping thread.

2014-01-23  Pedro Alves  <pedro@codesourcery.com>
	    Pedro Alves  <palves@redhat.com>

	gdb/testsuite/
	* gdb.threads/step-after-sr-lock.c: New.
	* gdb.threads/step-after-sr-lock.exp: New.
---
 gdb/infrun.c                                     |   6 +-
 gdb/testsuite/gdb.threads/step-after-sr-lock.c   | 145 +++++++++++++++++++++++
 gdb/testsuite/gdb.threads/step-after-sr-lock.exp | 120 +++++++++++++++++++
 3 files changed, 270 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.threads/step-after-sr-lock.c
 create mode 100644 gdb/testsuite/gdb.threads/step-after-sr-lock.exp

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 71d9615..99ef13b 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4380,7 +4380,11 @@ handle_signal_stop (struct execution_control_state *ecs)
 	  ecs->event_thread->step_after_step_resume_breakpoint = 1;
 	  /* Reset trap_expected to ensure breakpoints are re-inserted.  */
 	  ecs->event_thread->control.trap_expected = 0;
-	  keep_going (ecs);
+
+	  /* If we were nexting/stepping some other thread, switch to
+	     it, so that we don't continue it, losing control.  */
+	  if (!switch_back_to_stepped_thread (ecs))
+	    keep_going (ecs);
 	  return;
 	}
 
diff --git a/gdb/testsuite/gdb.threads/step-after-sr-lock.c b/gdb/testsuite/gdb.threads/step-after-sr-lock.c
new file mode 100644
index 0000000..fa0f6f1
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/step-after-sr-lock.c
@@ -0,0 +1,145 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2009, 2010, 2011 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+
+unsigned int args[2];
+
+pid_t pid;
+pthread_barrier_t barrier;
+pthread_t child_thread_2, child_thread_3;
+
+void
+handler (int signo)
+{
+  /* so that thread 3 is sure to run, in case the bug is present.  */
+  usleep (10);
+}
+
+void
+callme (void)
+{
+}
+
+void
+block_signals (void)
+{
+  sigset_t mask;
+
+  sigfillset (&mask);
+  sigprocmask (SIG_BLOCK, &mask, NULL);
+}
+
+void
+unblock_signals (void)
+{
+  sigset_t mask;
+
+  sigfillset (&mask);
+  sigprocmask (SIG_UNBLOCK, &mask, NULL);
+}
+
+void *
+child_function_3 (void *arg)
+{
+  int my_number =  (long) arg;
+  volatile int *myp = (int *) &args[my_number];
+
+  pthread_barrier_wait (&barrier);
+
+  while (*myp > 0)
+    {
+      (*myp) ++; /* set breakpoint child_two here */
+      callme ();
+    }
+
+  pthread_exit (NULL);
+}
+
+void *
+child_function_2 (void *arg)
+{
+  int my_number =  (long) arg;
+  volatile int *myp = (int *) &args[my_number];
+
+  unblock_signals ();
+
+  pthread_barrier_wait (&barrier);
+
+  while (*myp > 0)
+    {
+      (*myp) ++;
+      callme (); /* set breakpoint child_one here */
+    }
+
+  *myp = 1;
+  while (*myp > 0)
+    {
+      (*myp) ++;
+      callme ();
+    }
+
+  pthread_exit (NULL);
+}
+
+
+int
+main ()
+{
+  int res;
+  long i;
+
+  /* Block signals in all threads but one, so that we're sure which
+     thread gets the signal we send from the command line.  */
+  block_signals ();
+
+  signal (SIGUSR1, handler);
+
+  /* Call these early so that PLTs for these are resolved soon,
+     instead of in the threads.  RTLD_NOW should work as well.  */
+  usleep (0);
+  pthread_barrier_init (&barrier, NULL, 1);
+  pthread_barrier_wait (&barrier);
+
+  pthread_barrier_init (&barrier, NULL, 2);
+
+  /* The test uses this global to know where to send the signal
+     to.  */
+  pid = getpid ();
+
+  i = 0;
+  args[i] = 1;
+  res = pthread_create (&child_thread_2,
+			NULL, child_function_2, (void *) i);
+  pthread_barrier_wait (&barrier);
+  callme (); /* set wait-thread-2 breakpoint here */
+
+  i = 1;
+  args[i] = 1;
+  res = pthread_create (&child_thread_3,
+			NULL, child_function_3, (void *) i);
+  pthread_barrier_wait (&barrier);
+  callme (); /* set wait-thread-3 breakpoint here */
+
+  pthread_join (child_thread_2, NULL);
+  pthread_join (child_thread_3, NULL);
+
+  exit(EXIT_SUCCESS);
+}
diff --git a/gdb/testsuite/gdb.threads/step-after-sr-lock.exp b/gdb/testsuite/gdb.threads/step-after-sr-lock.exp
new file mode 100644
index 0000000..7fea49e
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/step-after-sr-lock.exp
@@ -0,0 +1,120 @@
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that GDB doesn't inadvertently resume the stepped thread when a
+# signal arrives while stepping over the breakpoint that last caused a
+# stop, when the thread that hit that breakpoint is not the stepped
+# thread.
+
+standard_testfile
+set executable ${testfile}
+
+if [target_info exists gdb,nosignals] {
+    verbose "Skipping ${testfile}.exp because of nosignals."
+    return -1
+}
+
+# Test uses host "kill".
+if { [is_remote target] } {
+    return -1
+}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" \
+	 executable [list debug "incdir=${objdir}"]] != "" } {
+    return -1
+}
+
+proc get_value {var test} {
+    global expect_out
+    global gdb_prompt
+    global decimal
+
+    set value -1
+    gdb_test_multiple "print $var" "$test" {
+	-re ".*= ($decimal).*\r\n$gdb_prompt $" {
+	    set value $expect_out(1,string)
+	    pass "$test"
+        }
+    }
+    return ${value}
+}
+
+# Start with a fresh gdb.
+
+clean_restart $executable
+
+if ![runto_main] {
+    return -1
+}
+
+gdb_breakpoint [gdb_get_line_number "set wait-thread-2 breakpoint here"]
+gdb_continue_to_breakpoint "run to wait-thread-2 breakpoint"
+gdb_test "info threads" "" "info threads with thread 2"
+
+gdb_breakpoint [gdb_get_line_number "set wait-thread-3 breakpoint here"]
+gdb_continue_to_breakpoint "run to breakpoint"
+gdb_test "info threads" "" "info threads with thread 3"
+
+set testpid [get_value "pid" "get pid of inferior"]
+
+gdb_test "set scheduler-locking on"
+
+gdb_breakpoint [gdb_get_line_number "set breakpoint child_two here"]
+gdb_breakpoint [gdb_get_line_number "set breakpoint child_one here"]
+
+gdb_test "thread 3" "" "switch to thread 3 to run to its breakpoint"
+gdb_continue_to_breakpoint "run to breakpoint in thread 3"
+
+gdb_test "thread 2" "" "switch to thread 2 to run to its breakpoint"
+gdb_continue_to_breakpoint "run to breakpoint in thread 2"
+
+delete_breakpoints
+
+gdb_test "b *\$pc" "" "set breakpoint to be stepped over"
+# Make sure the first loop breaks without hitting the breakpoint
+# again.
+gdb_test "p *myp = 0" " = 0" "force loop break in thread 2"
+
+# We want "print" to make sure the target reports the signal to the
+# core.
+gdb_test "handle SIGUSR1 print nostop pass" "" ""
+
+# Queue a signal in thread 2.
+remote_exec host "kill -SIGUSR1 ${testpid}"
+
+gdb_test "thread 3" "" "switch to thread 3 for stepping"
+set my_number [get_value "my_number" "get my_number"]
+set cnt_before [get_value "args\[$my_number\]" "get count before step"]
+gdb_test "set scheduler-locking off"
+
+# Make sure we're exercising the paths we want to.
+gdb_test "set debug infrun 1"
+
+gdb_test \
+    "step" \
+    ".*prepare_to_proceed \\(step=1\\), switched to.*signal arrived while stepping over breakpoint.*switching back to stepped thread.*stepped to a different line.*callme.*" \
+    "step"
+
+set cnt_after [get_value "args\[$my_number\]" "get count after step"]
+
+# Test that GDB doesn't inadvertently resume the stepped thread when a
+# signal arrives while stepping over a breakpoint in another thread.
+
+set test "stepped thread under control"
+if { $cnt_before + 1 == $cnt_after } {
+    pass $test
+} else {
+    fail $test
+}
-- 
1.7.11.7



* Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  2014-01-23 18:53     ` Pedro Alves
@ 2014-01-24 12:44       ` Pedro Alves
  2014-02-07 19:52       ` Pedro Alves
  1 sibling, 0 replies; 8+ messages in thread
From: Pedro Alves @ 2014-01-24 12:44 UTC (permalink / raw)
  To: Sterling Augustine; +Cc: gdb-patches

On 01/23/2014 06:53 PM, Pedro Alves wrote:

> I "rejected" it in the sense that that loses signals as mentioned in the
> second patch.
> 
>> The second patch you made breaks things by bypassing the
>> step-range fix up in process_event_stop_test.
> 
> Of course, for hardware step targets, it's essentially a reversion
> of the first patch.  It's unfinished!  It reverted the first patch,
> and then fixed things only for software step targets.  It needs
> finishing for hardware step targets...

So I thought about this some more, and I think I now
recall/realize why I reverted the hardware step path to just do
keep_going like before.  It's actually the same reason I failed to
create a reproducer that oversteps before.  Assuming a well behaved
target backend (and no bugs in the core infrun code), it shouldn't
really be possible to overstep here.  Consider, e.g., a thread
T1 stopped at xxx0000, below.

 [ADDR ]
 xxx0000  <<< T1 stopped here
 xxx0001
 xxx0002
 xxx0003

GDB tells this thread to single-step.  Another thread (T2)
hits some other event (say SIGPROF).  The backend starts
stopping all threads in order to report that event to the
core, and while doing that, either T1 manages to finish
the step, or the SIGPROF event was reported so fast,
that T1 didn't really get a chance to be scheduled by the
kernel, and so it reports a SIGSTOP (or some other async
signal).  If T1 does finish the step and reports a SIGTRAP, then
the target backend prefers reporting that event first over
the SIGPROF in T2 -- see linux-nat.c:select_event_lwp's
"Give preference to any LWP that is being single-stepped.".
When that path is taken, SIGPROF is left pending to report
later pre-emptively on the next resume attempt -- see
linux_nat_resume's "LLR: Short circuiting for status".
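
To make the preference rule above concrete, here is a toy model of it.
This is only a sketch under stated assumptions: the toy_* types and
function are hypothetical names invented for illustration, not the
actual linux-nat.c code, and the fallback choice (first LWP with a
pending event) is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical and are not
   GDB's linux-nat.c data structures.  */
enum toy_event { TOY_NONE, TOY_SIGTRAP, TOY_SIGPROF };

struct toy_lwp
{
  int lwpid;
  int stepping;            /* Non-zero if a single-step was requested.  */
  enum toy_event pending;  /* Event collected but not yet reported.  */
};

/* Pick the LWP whose event is reported to the core first.  An LWP that
   was single-stepping and has a SIGTRAP pending wins; otherwise the
   first LWP with any pending event is chosen (a simplification).  The
   events of the LWPs that are not chosen are left pending.  Returns
   NULL if nothing is pending.  */
static struct toy_lwp *
toy_select_event_lwp (struct toy_lwp *lwps, size_t n)
{
  struct toy_lwp *fallback = NULL;
  size_t i;

  assert (lwps != NULL || n == 0);

  for (i = 0; i < n; i++)
    {
      if (lwps[i].pending == TOY_NONE)
        continue;
      if (lwps[i].stepping && lwps[i].pending == TOY_SIGTRAP)
        return &lwps[i];
      if (fallback == NULL)
        fallback = &lwps[i];
    }

  return fallback;
}
```

With T2 holding a pending SIGPROF and T1 holding the SIGTRAP of a
finished step, the model picks T1, and T2's SIGPROF stays pending
until T1's event has been consumed.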

Even if GDB didn't give preference to the stepping thread's
event, say, it reported the SIGPROF event for T2 first, leaving the
SIGTRAP for T1 pending, that'd be okay in terms of overstepping
concerns, because then what we have would be, before:

 xxx0000  <<< T1 stopped here
 xxx0001
 xxx0002
 xxx0003

after single-step request, T2 hits SIGPROF, and T1
stops at xxx0001 with SIGTRAP:

 xxx0000
 xxx0001  <<< T1 stopped here w/SIGTRAP
 xxx0002
 xxx0003

and then GDB reports T2's SIGPROF to the core, leaving T1's
SIGTRAP pending.  Assuming SIGPROF is set to
"handle SIGPROF nostop ...", infrun's switch_back_to_stepped_thread
would switch back to T1, and tell it to single-step.  This is where
I was blurry and worrying about overstepping, because T1 is _already_
at xxx0001 but the core wasn't told about it.  So I was mistakenly worried
that that new step request would make the thread step to xxx0002.  But,
it's not really an issue, because T1 still has the SIGTRAP for the previous
finished step at xxx0001 pending, and won't really be resumed (again,
that's linux_nat_resume's "LLR: Short circuiting for status").

Now, the reason the backend prefers reporting the stepping thread's
SIGTRAPs first, is because if T1 has a SIGTRAP pending for a previous
step request, and now GDB issues a continue on T1 instead of a step,
and T1 reports the SIGTRAP, the core will be confused over that SIGTRAP
(because from the core's perspective, the thread wasn't told to step),
and report it to the user as a spurious SIGTRAP.  An alternative way
to handle this would be for the backend to keep track of the fact
that the pending SIGTRAP was for a step request, and seeing a
continue request, discard the pending SIGTRAP and indeed continue
the thread free.  (gdbserver's linux-low.c used to do this at
some point, and then I switched it to do what gdb's linux-nat.c
does, IIRC.)

-- 
Pedro Alves


* Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  2014-01-23 18:53     ` Pedro Alves
  2014-01-24 12:44       ` Pedro Alves
@ 2014-02-07 19:52       ` Pedro Alves
  2014-02-07 20:12         ` Pedro Alves
  2014-02-26 17:12         ` Pedro Alves
  1 sibling, 2 replies; 8+ messages in thread
From: Pedro Alves @ 2014-02-07 19:52 UTC (permalink / raw)
  To: gdb-patches; +Cc: Sterling Augustine

On 01/23/2014 06:53 PM, Pedro Alves wrote:

> There's a spot in resume that uses
> step_after_step_resume_breakpoint too (for software
> single-step targets), that I haven't really given much
> thought yet.  It might well need something there too.

So I've stared at that piece of code, and I'm not seeing
anything that would need to change.

I've adjusted the copyright dates in the test and pushed
it in, as below.

Although the bug must be pretty old, the triggered assertion
is new in 7.7.  So it might be good to have it in 7.7.

-------
From d137e6dc798cdf3b3b17fe47322ce61450870e22 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Fri, 7 Feb 2014 19:11:25 +0000
Subject: [PATCH] Make sure we don't resume the stepped thread by accident.

Say:

<stopped at a breakpoint in thread 2>
(gdb) thread 3
(gdb) step

The above triggers the prepare_to_proceed/deferred_step_ptid process,
which switches back to thread 2, to step over its breakpoint before
getting back to thread 3 and "step" it.

If while stepping over the breakpoint in thread 2, a signal arrives,
and it is set to pass/nostop, we'll set a step-resume breakpoint at
the supposed signal-handler resume address, and call keep_going.  The
problem is that we were supposedly stepping thread 3, and that
keep_going delivers a signal to thread 2, and due to scheduler-locking
off, resumes everything else, _including_ thread 3, the thread we want
stepping.  This means that we lose control of thread 3 until the next
event, when we stop everything.  The end result for the user, is that
GDB lost control of the "step".

Here's the current infrun debug output of the above, with the testcase
in the patch below:

infrun: clear_proceed_status_thread (Thread 0x2aaaab8f5700 (LWP 11663))
infrun: clear_proceed_status_thread (Thread 0x2aaaab6f4700 (LWP 11662))
infrun: clear_proceed_status_thread (Thread 0x2aaaab4f2b20 (LWP 11659))
infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
infrun: prepare_to_proceed (step=1), switched to [Thread 0x2aaaab6f4700 (LWP 11662)]
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f
infrun: wait_for_inferior ()
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGUSR1
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40098f
infrun: random signal 30

Program received signal SIGUSR1, User defined signal 1.
infrun: signal arrived while stepping over breakpoint
infrun: inserting step-resume breakpoint at 0x40098f
infrun: resume (step=0, signal=30), trap_expected=0, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f

^^^ this is a wildcard resume.

infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40098f
infrun: BPSTAT_WHAT_STEP_RESUME
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f

^^^ step-resume hit, meaning the handler returned, so we go back to stepping thread 3.

infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED

infrun: stop_pc = 0x40088b
infrun: switching back to stepped thread
infrun: Switching context from Thread 0x2aaaab6f4700 (LWP 11662) to Thread 0x2aaaab8f5700 (LWP 11663)
infrun: resume (step=1, signal=0), trap_expected=0, current thread [Thread 0x2aaaab8f5700 (LWP 11663)] at 0x400938
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab8f5700 (LWP 11663)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40093a
infrun: keep going
infrun: resume (step=1, signal=0), trap_expected=0, current thread [Thread 0x2aaaab8f5700 (LWP 11663)] at 0x40093a
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab8f5700 (LWP 11663)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40091e
infrun: stepped to a different line
infrun: stop_stepping
[Switching to Thread 0x2aaaab8f5700 (LWP 11663)]
69            (*myp) ++; /* set breakpoint child_two here */

^^^ we stopped at the wrong line.  We still stepped a bit because the
test is running in a loop, and when we got back to stepping thread 3,
it happened to be in the stepping range.  (The loop increments a
counter, and the test makes sure it increments exactly once.  Without
the fix, the counter increments a bunch, since the user-stepped thread
runs free without GDB noticing.)

The fix is to switch to the stepping thread before continuing for the
step-resume breakpoint.

gdb/
2014-02-07  Pedro Alves  <palves@redhat.com>

	* infrun.c (handle_signal_stop) <signal arrives while stepping
	over a breakpoint>: Switch back to the stepping thread.

gdb/testsuite/
2014-02-07  Pedro Alves  <pedro@codesourcery.com>
	    Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-after-sr-lock.c: New file.
	* gdb.threads/step-after-sr-lock.exp: New file.
---
 gdb/ChangeLog                                    |   5 +
 gdb/infrun.c                                     |   6 +-
 gdb/testsuite/ChangeLog                          |   6 +
 gdb/testsuite/gdb.threads/step-after-sr-lock.c   | 145 +++++++++++++++++++++++
 gdb/testsuite/gdb.threads/step-after-sr-lock.exp | 120 +++++++++++++++++++
 5 files changed, 281 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.threads/step-after-sr-lock.c
 create mode 100644 gdb/testsuite/gdb.threads/step-after-sr-lock.exp

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 6bcc205..99ed610 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-07  Pedro Alves  <palves@redhat.com>
+
+	* infrun.c (handle_signal_stop) <signal arrives while stepping
+	over a breakpoint>: Switch back to the stepping thread.
+
 2014-02-07  Yao Qi  <yao@codesourcery.com>
 
 	* target.c (target_xfer_partial): Return zero if LEN is zero.
diff --git a/gdb/infrun.c b/gdb/infrun.c
index c0df124..5d60a90 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4384,7 +4384,11 @@ handle_signal_stop (struct execution_control_state *ecs)
 	  ecs->event_thread->step_after_step_resume_breakpoint = 1;
 	  /* Reset trap_expected to ensure breakpoints are re-inserted.  */
 	  ecs->event_thread->control.trap_expected = 0;
-	  keep_going (ecs);
+
+	  /* If we were nexting/stepping some other thread, switch to
+	     it, so that we don't continue it, losing control.  */
+	  if (!switch_back_to_stepped_thread (ecs))
+	    keep_going (ecs);
 	  return;
 	}
 
diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index 89f879b..7c1fd10 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2014-02-07  Pedro Alves  <pedro@codesourcery.com>
+	    Pedro Alves  <palves@redhat.com>
+
+	* gdb.threads/step-after-sr-lock.c: New file.
+	* gdb.threads/step-after-sr-lock.exp: New file.
+
 2014-02-07  Pedro Alves  <palves@redhat.com>
 
 	* gdb.threads/stepi-random-signal.exp: Set SIGCHLD to print.
diff --git a/gdb/testsuite/gdb.threads/step-after-sr-lock.c b/gdb/testsuite/gdb.threads/step-after-sr-lock.c
new file mode 100644
index 0000000..a4634f2
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/step-after-sr-lock.c
@@ -0,0 +1,145 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2009-2014 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+
+unsigned int args[2];
+
+pid_t pid;
+pthread_barrier_t barrier;
+pthread_t child_thread_2, child_thread_3;
+
+void
+handler (int signo)
+{
+  /* so that thread 3 is sure to run, in case the bug is present.  */
+  usleep (10);
+}
+
+void
+callme (void)
+{
+}
+
+void
+block_signals (void)
+{
+  sigset_t mask;
+
+  sigfillset (&mask);
+  sigprocmask (SIG_BLOCK, &mask, NULL);
+}
+
+void
+unblock_signals (void)
+{
+  sigset_t mask;
+
+  sigfillset (&mask);
+  sigprocmask (SIG_UNBLOCK, &mask, NULL);
+}
+
+void *
+child_function_3 (void *arg)
+{
+  int my_number =  (long) arg;
+  volatile int *myp = (int *) &args[my_number];
+
+  pthread_barrier_wait (&barrier);
+
+  while (*myp > 0)
+    {
+      (*myp) ++; /* set breakpoint child_two here */
+      callme ();
+    }
+
+  pthread_exit (NULL);
+}
+
+void *
+child_function_2 (void *arg)
+{
+  int my_number =  (long) arg;
+  volatile int *myp = (int *) &args[my_number];
+
+  unblock_signals ();
+
+  pthread_barrier_wait (&barrier);
+
+  while (*myp > 0)
+    {
+      (*myp) ++;
+      callme (); /* set breakpoint child_one here */
+    }
+
+  *myp = 1;
+  while (*myp > 0)
+    {
+      (*myp) ++;
+      callme ();
+    }
+
+  pthread_exit (NULL);
+}
+
+
+int
+main ()
+{
+  int res;
+  long i;
+
+  /* Block signals in all threads but one, so that we're sure which
+     thread gets the signal we send from the command line.  */
+  block_signals ();
+
+  signal (SIGUSR1, handler);
+
+  /* Call these early so that PLTs for these are resolved soon,
+     instead of in the threads.  RTLD_NOW should work as well.  */
+  usleep (0);
+  pthread_barrier_init (&barrier, NULL, 1);
+  pthread_barrier_wait (&barrier);
+
+  pthread_barrier_init (&barrier, NULL, 2);
+
+  /* The test uses this global to know where to send the signal
+     to.  */
+  pid = getpid ();
+
+  i = 0;
+  args[i] = 1;
+  res = pthread_create (&child_thread_2,
+			NULL, child_function_2, (void *) i);
+  pthread_barrier_wait (&barrier);
+  callme (); /* set wait-thread-2 breakpoint here */
+
+  i = 1;
+  args[i] = 1;
+  res = pthread_create (&child_thread_3,
+			NULL, child_function_3, (void *) i);
+  pthread_barrier_wait (&barrier);
+  callme (); /* set wait-thread-3 breakpoint here */
+
+  pthread_join (child_thread_2, NULL);
+  pthread_join (child_thread_3, NULL);
+
+  exit(EXIT_SUCCESS);
+}
diff --git a/gdb/testsuite/gdb.threads/step-after-sr-lock.exp b/gdb/testsuite/gdb.threads/step-after-sr-lock.exp
new file mode 100644
index 0000000..6b93d9c
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/step-after-sr-lock.exp
@@ -0,0 +1,120 @@
+# Copyright (C) 2011-2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that GDB doesn't inadvertently resume the stepped thread when a
+# signal arrives while stepping over the breakpoint that last caused a
+# stop, when the thread that hit that breakpoint is not the stepped
+# thread.
+
+standard_testfile
+set executable ${testfile}
+
+if [target_info exists gdb,nosignals] {
+    verbose "Skipping ${testfile}.exp because of nosignals."
+    return -1
+}
+
+# Test uses host "kill".
+if { [is_remote target] } {
+    return -1
+}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" \
+	 executable [list debug "incdir=${objdir}"]] != "" } {
+    return -1
+}
+
+proc get_value {var test} {
+    global expect_out
+    global gdb_prompt
+    global decimal
+
+    set value -1
+    gdb_test_multiple "print $var" "$test" {
+	-re ".*= ($decimal).*\r\n$gdb_prompt $" {
+	    set value $expect_out(1,string)
+	    pass "$test"
+        }
+    }
+    return ${value}
+}
+
+# Start with a fresh gdb.
+
+clean_restart $executable
+
+if ![runto_main] {
+    return -1
+}
+
+gdb_breakpoint [gdb_get_line_number "set wait-thread-2 breakpoint here"]
+gdb_continue_to_breakpoint "run to wait-thread-2 breakpoint"
+gdb_test "info threads" "" "info threads with thread 2"
+
+gdb_breakpoint [gdb_get_line_number "set wait-thread-3 breakpoint here"]
+gdb_continue_to_breakpoint "run to breakpoint"
+gdb_test "info threads" "" "info threads with thread 3"
+
+set testpid [get_value "pid" "get pid of inferior"]
+
+gdb_test "set scheduler-locking on"
+
+gdb_breakpoint [gdb_get_line_number "set breakpoint child_two here"]
+gdb_breakpoint [gdb_get_line_number "set breakpoint child_one here"]
+
+gdb_test "thread 3" "" "switch to thread 3 to run to its breakpoint"
+gdb_continue_to_breakpoint "run to breakpoint in thread 3"
+
+gdb_test "thread 2" "" "switch to thread 2 to run to its breakpoint"
+gdb_continue_to_breakpoint "run to breakpoint in thread 2"
+
+delete_breakpoints
+
+gdb_test "b *\$pc" "" "set breakpoint to be stepped over"
+# Make sure the first loop breaks without hitting the breakpoint
+# again.
+gdb_test "p *myp = 0" " = 0" "force loop break in thread 2"
+
+# We want "print" to make sure the target reports the signal to the
+# core.
+gdb_test "handle SIGUSR1 print nostop pass" "" ""
+
+# Queue a signal in thread 2.
+remote_exec host "kill -SIGUSR1 ${testpid}"
+
+gdb_test "thread 3" "" "switch to thread 3 for stepping"
+set my_number [get_value "my_number" "get my_number"]
+set cnt_before [get_value "args\[$my_number\]" "get count before step"]
+gdb_test "set scheduler-locking off"
+
+# Make sure we're exercising the paths we want to.
+gdb_test "set debug infrun 1"
+
+gdb_test \
+    "step" \
+    ".*prepare_to_proceed \\(step=1\\), switched to.*signal arrived while stepping over breakpoint.*switching back to stepped thread.*stepped to a different line.*callme.*" \
+    "step"
+
+set cnt_after [get_value "args\[$my_number\]" "get count after step"]
+
+# Test that GDB doesn't inadvertently resume the stepped thread when a
+# signal arrives while stepping over a breakpoint in another thread.
+
+set test "stepped thread under control"
+if { $cnt_before + 1 == $cnt_after } {
+    pass $test
+} else {
+    fail $test
+}
-- 
1.7.11.7



* Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  2014-02-07 19:52       ` Pedro Alves
@ 2014-02-07 20:12         ` Pedro Alves
  2014-02-26 17:12         ` Pedro Alves
  1 sibling, 0 replies; 8+ messages in thread
From: Pedro Alves @ 2014-02-07 20:12 UTC (permalink / raw)
  To: Sterling Augustine; +Cc: gdb-patches

On 02/07/2014 07:52 PM, Pedro Alves wrote:
> On 01/23/2014 06:53 PM, Pedro Alves wrote:
> 
>> > There's a spot in resume that uses
>> > step_after_step_resume_breakpoint too (for software
>> > single-step targets), that I haven't really given much
>> > thought yet.  It might well need something there too.
> So I've stared at that piece of code, and I'm not seeing
> anything that would need to change.

I forgot to mention, but for the archives, I also
tested the patch against a software single-step target
(or rather my series that makes x86-64 use that).
The new test fails before the patch there too, and
passes after.

-- 
Pedro Alves


* Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
  2014-02-07 19:52       ` Pedro Alves
  2014-02-07 20:12         ` Pedro Alves
@ 2014-02-26 17:12         ` Pedro Alves
  1 sibling, 0 replies; 8+ messages in thread
From: Pedro Alves @ 2014-02-26 17:12 UTC (permalink / raw)
  To: gdb-patches; +Cc: Sterling Augustine

On 02/07/2014 07:52 PM, Pedro Alves wrote:
> Although the bug must be pretty old, the triggered assertion
> is new in 7.7.  So it might be good to have it in 7.7.

I've now pushed this to the 7.7 branch.

----- 
[PATCH] Make sure we don't resume the stepped thread by accident.

Say:

<stopped at a breakpoint in thread 2>
(gdb) thread 3
(gdb) step

The above triggers the prepare_to_proceed/deferred_step_ptid process,
which switches back to thread 2, to step over its breakpoint before
getting back to thread 3 and "step" it.

If while stepping over the breakpoint in thread 2, a signal arrives,
and it is set to pass/nostop, we'll set a step-resume breakpoint at
the supposed signal-handler resume address, and call keep_going.  The
problem is that we were supposedly stepping thread 3, and that
keep_going delivers a signal to thread 2, and due to scheduler-locking
off, resumes everything else, _including_ thread 3, the thread we want
stepping.  This means that we lose control of thread 3 until the next
event, when we stop everything.  The end result for the user is that
GDB loses control of the "step".
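
To make the failure mode concrete, here is a toy model of which
threads a resume request sets running.  This is not GDB code: the
names toy_thread and toy_resume, and the wildcard flag, are made up
for illustration.  The only thing taken from the description above is
that, with scheduler-locking off, resuming thread 2 to deliver the
signal also sets thread 3 running, even though thread 3 is the thread
the user asked to step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and do
   not correspond to GDB's internal data structures.  */

struct toy_thread
{
  int num;              /* GDB-style thread number.  */
  bool user_stepping;   /* The user issued "step" on this thread.  */
  bool running_free;    /* Set running by the last resume request.  */
};

/* Resume EVENT_THREAD.  If WILDCARD is true (modelling
   scheduler-locking off), every other thread is set running as well.
   Return the number of user-stepped threads, other than EVENT_THREAD,
   that were set running, i.e. the threads the toy debugger lost
   control of.  */

int
toy_resume (struct toy_thread *threads, size_t n, int event_thread,
            bool wildcard)
{
  int lost = 0;

  for (size_t i = 0; i < n; i++)
    {
      bool resumed = (threads[i].num == event_thread) || wildcard;

      threads[i].running_free = resumed;
      if (resumed && threads[i].user_stepping
          && threads[i].num != event_thread)
        lost++;
    }
  return lost;
}
```

With three threads where only thread 3 has user_stepping set, calling
toy_resume for event thread 2 with wildcard true reports one lost
thread; with wildcard false it reports none.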

Here's the current infrun debug output of the above, with the testcase
in the patch below:

infrun: clear_proceed_status_thread (Thread 0x2aaaab8f5700 (LWP 11663))
infrun: clear_proceed_status_thread (Thread 0x2aaaab6f4700 (LWP 11662))
infrun: clear_proceed_status_thread (Thread 0x2aaaab4f2b20 (LWP 11659))
infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
infrun: prepare_to_proceed (step=1), switched to [Thread 0x2aaaab6f4700 (LWP 11662)]
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f
infrun: wait_for_inferior ()
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGUSR1
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40098f
infrun: random signal 30

Program received signal SIGUSR1, User defined signal 1.
infrun: signal arrived while stepping over breakpoint
infrun: inserting step-resume breakpoint at 0x40098f
infrun: resume (step=0, signal=30), trap_expected=0, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f

^^^ this is a wildcard resume.

infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40098f
infrun: BPSTAT_WHAT_STEP_RESUME
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x2aaaab6f4700 (LWP 11662)] at 0x40098f

^^^ step-resume hit, meaning the handler returned, so we go back to stepping thread 3.

infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab6f4700 (LWP 11662)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED

infrun: stop_pc = 0x40088b
infrun: switching back to stepped thread
infrun: Switching context from Thread 0x2aaaab6f4700 (LWP 11662) to Thread 0x2aaaab8f5700 (LWP 11663)
infrun: resume (step=1, signal=0), trap_expected=0, current thread [Thread 0x2aaaab8f5700 (LWP 11663)] at 0x400938
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab8f5700 (LWP 11663)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40093a
infrun: keep going
infrun: resume (step=1, signal=0), trap_expected=0, current thread [Thread 0x2aaaab8f5700 (LWP 11663)] at 0x40093a
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   11659 [Thread 0x2aaaab8f5700 (LWP 11663)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x40091e
infrun: stepped to a different line
infrun: stop_stepping
[Switching to Thread 0x2aaaab8f5700 (LWP 11663)]
69            (*myp) ++; /* set breakpoint child_two here */

^^^ we stopped at the wrong line.  We still stepped a bit because the
test is running in a loop, and when we got back to stepping thread 3,
it happened to be in the stepping range.  (The loop increments a
counter, and the test makes sure it increments exactly once.  Without
the fix, the counter increments a bunch, since the user-stepped thread
runs free without GDB noticing.)

The fix is to switch to the stepping thread before continuing for the
step-resume breakpoint.
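
The shape of that fix can be sketched with a toy model.  The code
below is not GDB's implementation: toy_ecs,
toy_switch_back_to_stepped_thread and
toy_handle_signal_while_stepping_over_bp are hypothetical stand-ins
that only mirror the control flow of the change.  That is, first try
to hand control back to the thread the user is stepping, and fall
back to a plain keep-going only when there is no such thread.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not GDB internals.  */

enum toy_action
{
  TOY_KEEP_GOING,          /* Resume and wait for the next event.  */
  TOY_STEP_OTHER_THREAD    /* Switched to the user-stepped thread.  */
};

struct toy_ecs
{
  int event_thread;    /* Thread that reported the signal.  */
  int stepped_thread;  /* Thread the user is stepping, or 0 if none.  */
  int current_thread;  /* Thread the toy debugger is focused on.  */
};

/* Return true and refocus on the stepped thread if the user is
   stepping a thread other than the one that reported the event.  */

bool
toy_switch_back_to_stepped_thread (struct toy_ecs *ecs)
{
  if (ecs->stepped_thread != 0
      && ecs->stepped_thread != ecs->event_thread)
    {
      ecs->current_thread = ecs->stepped_thread;
      return true;
    }
  return false;
}

/* Model of the patched code path: only keep going with the event
   thread when there is no other stepped thread to switch back to.  */

enum toy_action
toy_handle_signal_while_stepping_over_bp (struct toy_ecs *ecs)
{
  if (!toy_switch_back_to_stepped_thread (ecs))
    return TOY_KEEP_GOING;
  return TOY_STEP_OTHER_THREAD;
}
```

In the scenario from this message (event in thread 2 while the user
is stepping thread 3), the toy handler switches focus to thread 3
instead of keeping going; when no other thread is being stepped, it
keeps going as before.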

gdb/
2014-02-26  Pedro Alves  <palves@redhat.com>

	PR breakpoints/16292
	* infrun.c (handle_signal_stop) <signal arrives while stepping
	over a breakpoint>: Switch back to the stepping thread.

gdb/testsuite/
2014-02-26  Pedro Alves  <pedro@codesourcery.com>
	    Pedro Alves  <palves@redhat.com>

	PR breakpoints/16292
	* gdb.threads/signal-while-stepping-over-bp-other-thread.c: New
	file.
	* gdb.threads/signal-while-stepping-over-bp-other-thread.exp: New
	file.
---
 gdb/ChangeLog                                      |   6 +
 gdb/infrun.c                                       |   6 +-
 gdb/testsuite/ChangeLog                            |   9 ++
 .../signal-while-stepping-over-bp-other-thread.c   | 145 +++++++++++++++++++++
 .../signal-while-stepping-over-bp-other-thread.exp | 120 +++++++++++++++++
 5 files changed, 285 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.c
 create mode 100644 gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.exp

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 0625695..6518d22 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,9 @@
+2014-02-26  Pedro Alves  <palves@redhat.com>
+
+	PR breakpoints/16292
+	* infrun.c (handle_signal_stop) <signal arrives while stepping
+	over a breakpoint>: Switch back to the stepping thread.
+
 2014-02-25  Jan Kratochvil  <jan.kratochvil@redhat.com>
 
 	PR gdb/16626
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 51540b3..05639c6 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4379,7 +4379,11 @@ handle_signal_stop (struct execution_control_state *ecs)
 	  ecs->event_thread->step_after_step_resume_breakpoint = 1;
 	  /* Reset trap_expected to ensure breakpoints are re-inserted.  */
 	  ecs->event_thread->control.trap_expected = 0;
-	  keep_going (ecs);
+
+	  /* If we were nexting/stepping some other thread, switch to
+	     it, so that we don't continue it, losing control.  */
+	  if (!switch_back_to_stepped_thread (ecs))
+	    keep_going (ecs);
 	  return;
 	}
 
diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index bc317d7..8168ec6 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2014-02-26  Pedro Alves  <pedro@codesourcery.com>
+	    Pedro Alves  <palves@redhat.com>
+
+	PR breakpoints/16292
+	* gdb.threads/signal-while-stepping-over-bp-other-thread.c: New
+	file.
+	* gdb.threads/signal-while-stepping-over-bp-other-thread.exp: New
+	file.
+
 2014-02-25  Jan Kratochvil  <jan.kratochvil@redhat.com>
 
 	PR gdb/16626
diff --git a/gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.c b/gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.c
new file mode 100644
index 0000000..a4634f2
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.c
@@ -0,0 +1,145 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2009-2014 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+
+unsigned int args[2];
+
+pid_t pid;
+pthread_barrier_t barrier;
+pthread_t child_thread_2, child_thread_3;
+
+void
+handler (int signo)
+{
+  /* so that thread 3 is sure to run, in case the bug is present.  */
+  usleep (10);
+}
+
+void
+callme (void)
+{
+}
+
+void
+block_signals (void)
+{
+  sigset_t mask;
+
+  sigfillset (&mask);
+  sigprocmask (SIG_BLOCK, &mask, NULL);
+}
+
+void
+unblock_signals (void)
+{
+  sigset_t mask;
+
+  sigfillset (&mask);
+  sigprocmask (SIG_UNBLOCK, &mask, NULL);
+}
+
+void *
+child_function_3 (void *arg)
+{
+  int my_number =  (long) arg;
+  volatile int *myp = (int *) &args[my_number];
+
+  pthread_barrier_wait (&barrier);
+
+  while (*myp > 0)
+    {
+      (*myp) ++; /* set breakpoint child_two here */
+      callme ();
+    }
+
+  pthread_exit (NULL);
+}
+
+void *
+child_function_2 (void *arg)
+{
+  int my_number =  (long) arg;
+  volatile int *myp = (int *) &args[my_number];
+
+  unblock_signals ();
+
+  pthread_barrier_wait (&barrier);
+
+  while (*myp > 0)
+    {
+      (*myp) ++;
+      callme (); /* set breakpoint child_one here */
+    }
+
+  *myp = 1;
+  while (*myp > 0)
+    {
+      (*myp) ++;
+      callme ();
+    }
+
+  pthread_exit (NULL);
+}
+
+
+int
+main ()
+{
+  int res;
+  long i;
+
+  /* Block signals in all threads but one, so that we're sure which
+     thread gets the signal we send from the command line.  */
+  block_signals ();
+
+  signal (SIGUSR1, handler);
+
+  /* Call these early so that PLTs for these are resolved soon,
+     instead of in the threads.  RTLD_NOW should work as well.  */
+  usleep (0);
+  pthread_barrier_init (&barrier, NULL, 1);
+  pthread_barrier_wait (&barrier);
+
+  pthread_barrier_init (&barrier, NULL, 2);
+
+  /* The test uses this global to know where to send the signal
+     to.  */
+  pid = getpid ();
+
+  i = 0;
+  args[i] = 1;
+  res = pthread_create (&child_thread_2,
+			NULL, child_function_2, (void *) i);
+  pthread_barrier_wait (&barrier);
+  callme (); /* set wait-thread-2 breakpoint here */
+
+  i = 1;
+  args[i] = 1;
+  res = pthread_create (&child_thread_3,
+			NULL, child_function_3, (void *) i);
+  pthread_barrier_wait (&barrier);
+  callme (); /* set wait-thread-3 breakpoint here */
+
+  pthread_join (child_thread_2, NULL);
+  pthread_join (child_thread_3, NULL);
+
+  exit(EXIT_SUCCESS);
+}
diff --git a/gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.exp b/gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.exp
new file mode 100644
index 0000000..6b93d9c
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/signal-while-stepping-over-bp-other-thread.exp
@@ -0,0 +1,120 @@
+# Copyright (C) 2011-2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that GDB doesn't inadvertently resume the stepped thread when a
+# signal arrives while stepping over the breakpoint that last caused a
+# stop, when the thread that hit that breakpoint is not the stepped
+# thread.
+
+standard_testfile
+set executable ${testfile}
+
+if [target_info exists gdb,nosignals] {
+    verbose "Skipping ${testfile}.exp because of nosignals."
+    return -1
+}
+
+# Test uses host "kill".
+if { [is_remote target] } {
+    return -1
+}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" \
+	 executable [list debug "incdir=${objdir}"]] != "" } {
+    return -1
+}
+
+proc get_value {var test} {
+    global expect_out
+    global gdb_prompt
+    global decimal
+
+    set value -1
+    gdb_test_multiple "print $var" "$test" {
+	-re ".*= ($decimal).*\r\n$gdb_prompt $" {
+	    set value $expect_out(1,string)
+	    pass "$test"
+        }
+    }
+    return ${value}
+}
+
+# Start with a fresh gdb.
+
+clean_restart $executable
+
+if ![runto_main] {
+    return -1
+}
+
+gdb_breakpoint [gdb_get_line_number "set wait-thread-2 breakpoint here"]
+gdb_continue_to_breakpoint "run to wait-thread-2 breakpoint"
+gdb_test "info threads" "" "info threads with thread 2"
+
+gdb_breakpoint [gdb_get_line_number "set wait-thread-3 breakpoint here"]
+gdb_continue_to_breakpoint "run to breakpoint"
+gdb_test "info threads" "" "info threads with thread 3"
+
+set testpid [get_value "pid" "get pid of inferior"]
+
+gdb_test "set scheduler-locking on"
+
+gdb_breakpoint [gdb_get_line_number "set breakpoint child_two here"]
+gdb_breakpoint [gdb_get_line_number "set breakpoint child_one here"]
+
+gdb_test "thread 3" "" "switch to thread 3 to run to its breakpoint"
+gdb_continue_to_breakpoint "run to breakpoint in thread 3"
+
+gdb_test "thread 2" "" "switch to thread 2 to run to its breakpoint"
+gdb_continue_to_breakpoint "run to breakpoint in thread 2"
+
+delete_breakpoints
+
+gdb_test "b *\$pc" "" "set breakpoint to be stepped over"
+# Make sure the first loop breaks without hitting the breakpoint
+# again.
+gdb_test "p *myp = 0" " = 0" "force loop break in thread 2"
+
+# We want "print" to make sure the target reports the signal to the
+# core.
+gdb_test "handle SIGUSR1 print nostop pass" "" ""
+
+# Queue a signal in thread 2.
+remote_exec host "kill -SIGUSR1 ${testpid}"
+
+gdb_test "thread 3" "" "switch to thread 3 for stepping"
+set my_number [get_value "my_number" "get my_number"]
+set cnt_before [get_value "args\[$my_number\]" "get count before step"]
+gdb_test "set scheduler-locking off"
+
+# Make sure we're exercising the paths we want to.
+gdb_test "set debug infrun 1"
+
+gdb_test \
+    "step" \
+    ".*prepare_to_proceed \\(step=1\\), switched to.*signal arrived while stepping over breakpoint.*switching back to stepped thread.*stepped to a different line.*callme.*" \
+    "step"
+
+set cnt_after [get_value "args\[$my_number\]" "get count after step"]
+
+# Test that GDB doesn't inadvertently resume the stepped thread when a
+# signal arrives while stepping over a breakpoint in another thread.
+
+set test "stepped thread under control"
+if { $cnt_before + 1 == $cnt_after } {
+    pass $test
+} else {
+    fail $test
+}
-- 
1.7.11.7



end of thread, other threads:[~2014-02-26 17:12 UTC | newest]

Thread overview: 8+ messages
2014-01-21 22:17 [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread Sterling Augustine
2014-01-22 14:04 ` Pedro Alves
2014-01-23 17:55   ` Sterling Augustine
2014-01-23 18:53     ` Pedro Alves
2014-01-24 12:44       ` Pedro Alves
2014-02-07 19:52       ` Pedro Alves
2014-02-07 20:12         ` Pedro Alves
2014-02-26 17:12         ` Pedro Alves
