public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 00/12] Fix detach + displaced-step regression + N bugs more
@ 2021-01-13  1:15 Pedro Alves
  2021-01-13  1:15 ` [PATCH 01/12] Fix attaching in non-stop mode Pedro Alves
                   ` (11 more replies)
  0 siblings, 12 replies; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

I noticed that the "detach" command while the inferior is running
recently regressed -- the inferior crashes with SIGTRAP if it happens
to be detached just while it is doing a displaced step.  The fix
looked very simple, it's just a reversed condition, so I fixed it, and
wrote a testcase.  That's then the trouble started -- the testcase
exposed a TON of other issues, of the "stare at infrun and gdbserver
logs for days" kind...  I would set the testcase running in a loop,
and then once in a while, puuft, a new racy bug would trigger.

This fixes all the bugs I ran into.  I won't be surprised if the
testcase still manages to expose some more racy problems.  But I did
leave it running for hours at time on my machine more than once, like
this:

$ (set -e; while true; do \
    make check TESTS="gdb.threads/detach-step-over.exp" RUNTESTFLAGS="--target_board=native-extended-gdbserver"; \
    make check TESTS="gdb.threads/detach-step-over.exp"; \
    done)

and nothing showed up.

For your convenience, I've pushed this to the
users/palves/detach-step-over branch.

Pedro Alves (12):
  Fix attaching in non-stop mode
  Fix "target extended-remote" + "maint set target-non-stop" + "attach"
  Testcase for attaching in non-stop mode
  Fix a couple vStopped pending ack bugs
  gdbserver: spurious SIGTRAP w/ detach while step-over in progress
  Factor out after-stop event handling code from stop_all_threads
  prepare_for_detach: don't release scoped_restore at the end
  prepare_for_detach and ongoing displaced stepping
  detach and breakpoint removal
  detach with in-line step over in progress
  detach in all-stop with threads running
  Testcase for detaching while stepping over breakpoint

 gdb/infcmd.c                                  |  13 +
 gdb/infrun.c                                  | 640 ++++++++++--------
 gdb/infrun.h                                  |   4 +
 gdb/linux-nat.c                               |   5 +
 gdb/remote.c                                  |  39 +-
 gdb/target.c                                  |   9 -
 gdb/testsuite/gdb.threads/attach-non-stop.c   |  58 ++
 gdb/testsuite/gdb.threads/attach-non-stop.exp | 149 ++++
 gdb/testsuite/gdb.threads/detach-step-over.c  | 112 +++
 .../gdb.threads/detach-step-over.exp          | 290 ++++++++
 gdbserver/linux-low.cc                        |  29 +-
 gdbserver/server.cc                           |   9 +
 12 files changed, 1072 insertions(+), 285 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/attach-non-stop.c
 create mode 100644 gdb/testsuite/gdb.threads/attach-non-stop.exp
 create mode 100644 gdb/testsuite/gdb.threads/detach-step-over.c
 create mode 100644 gdb/testsuite/gdb.threads/detach-step-over.exp


base-commit: bfc7d04afbeb56a3dc3caa71322a71fbb084d5dd
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/12] Fix attaching in non-stop mode
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  3:11   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 02/12] Fix "target extended-remote" + "maint set target-non-stop" + "attach" Pedro Alves
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

Attaching in non-stop mode currently misbehaves, like so:

 (gdb) attach 1244450
 Attaching to process 1244450
 [New LWP 1244453]
 [New LWP 1244454]
 [New LWP 1244455]
 [New LWP 1244456]
 [New LWP 1244457]
 [New LWP 1244458]
 [New LWP 1244459]
 [New LWP 1244461]
 [New LWP 1244462]
 [New LWP 1244463]
 No unwaited-for children left.

At this point, GDB's stopped/running thread state is out of sync with
the inferior:

(gdb) info threads
  Id   Target Id                     Frame
* 1    LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()
  2    LWP 1244453 "attach-non-stop" (running)
  3    LWP 1244454 "attach-non-stop" (running)
  4    LWP 1244455 "attach-non-stop" (running)
  5    LWP 1244456 "attach-non-stop" (running)
  6    LWP 1244457 "attach-non-stop" (running)
  7    LWP 1244458 "attach-non-stop" (running)
  8    LWP 1244459 "attach-non-stop" (running)
  9    LWP 1244461 "attach-non-stop" (running)
  10   LWP 1244462 "attach-non-stop" (running)
  11   LWP 1244463 "attach-non-stop" (running)
(gdb)
(gdb) interrupt -a
(gdb)
*nothing*

The problem is that attaching installs an inferior continuation,
called when the target reports the initial attach stop, here, in
inf-loop.c:inferior_event_handler:

      /* Do all continuations associated with the whole inferior (not
	 a particular thread).  */
      if (inferior_ptid != null_ptid)
	do_all_inferior_continuations (0);

However, currently in non-stop mode, inferior_ptid is still null_ptid
when we get here.

If you try to do "set debug infrun 1" to debug the problem, however,
then the attach completes correctly, with GDB reporting a stop for
each thread.

The bug is that we're missing a switch_to_thread/context_switch call
when handling the initial stop, here:

  if (stop_soon == STOP_QUIETLY_NO_SIGSTOP
      && (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))
    {
      stop_print_frame = true;
      stop_waiting (ecs);
      ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
      return;
    }

Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that does
call context_switch.

And the reason "set debug infrun 1" "fixes" it, is that the debug path
has a switch_to_thread call.

This patch fixes it by moving the main context_switch call earlier.

A testcase exercising this will be added in a following patch.

gdb/ChangeLog:

	* infrun.c (handle_signal_stop): Move main context_switch call
	earlier, before STOP_QUIETLY_NO_SIGSTOP.
---
 gdb/infrun.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 45bedf89641..b54baf70994 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5735,13 +5735,23 @@ handle_signal_stop (struct execution_control_state *ecs)
   ecs->event_thread->suspend.stop_pc
     = regcache_read_pc (get_thread_regcache (ecs->event_thread));
 
+  /* See if something interesting happened to the non-current thread.
+     If so, then switch to that thread.  */
+  if (ecs->ptid != inferior_ptid)
+    {
+      infrun_debug_printf ("context switch");
+
+      context_switch (ecs);
+
+      if (deprecated_context_hook)
+	deprecated_context_hook (ecs->event_thread->global_num);
+    }
+
   if (debug_infrun)
     {
       struct regcache *regcache = get_thread_regcache (ecs->event_thread);
       struct gdbarch *reg_gdbarch = regcache->arch ();
 
-      switch_to_thread (ecs->event_thread);
-
       infrun_debug_printf ("stop_pc=%s",
 			   paddress (reg_gdbarch,
 				     ecs->event_thread->suspend.stop_pc));
@@ -5764,7 +5774,6 @@ handle_signal_stop (struct execution_control_state *ecs)
   stop_soon = get_inferior_stop_soon (ecs);
   if (stop_soon == STOP_QUIETLY || stop_soon == STOP_QUIETLY_REMOTE)
     {
-      context_switch (ecs);
       infrun_debug_printf ("quietly stopped");
       stop_print_frame = true;
       stop_waiting (ecs);
@@ -5802,18 +5811,6 @@ handle_signal_stop (struct execution_control_state *ecs)
       return;
     }
 
-  /* See if something interesting happened to the non-current thread.  If
-     so, then switch to that thread.  */
-  if (ecs->ptid != inferior_ptid)
-    {
-      infrun_debug_printf ("context switch");
-
-      context_switch (ecs);
-
-      if (deprecated_context_hook)
-	deprecated_context_hook (ecs->event_thread->global_num);
-    }
-
   /* At this point, get hold of the now-current thread's frame.  */
   frame = get_current_frame ();
   gdbarch = get_frame_arch (frame);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 02/12] Fix "target extended-remote" + "maint set target-non-stop" + "attach"
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
  2021-01-13  1:15 ` [PATCH 01/12] Fix attaching in non-stop mode Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  5:01   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 03/12] Testcase for attaching in non-stop mode Pedro Alves
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

With "target extended-remote" + "maint set target-non-stop", attaching
hangs like so:

 (gdb) attach 1244450
 Attaching to process 1244450
 [New Thread 1244450.1244450]
 [New Thread 1244450.1244453]
 [New Thread 1244450.1244454]
 [New Thread 1244450.1244455]
 [New Thread 1244450.1244456]
 [New Thread 1244450.1244457]
 [New Thread 1244450.1244458]
 [New Thread 1244450.1244459]
 [New Thread 1244450.1244461]
 [New Thread 1244450.1244462]
 [New Thread 1244450.1244463]
 * hang *

Attaching to the hung GDB shows that GDB is busy in an infinite loop
in stop_all_threads:

 (top-gdb) bt
 #0  stop_all_threads () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:4755
 #1  0x000055555597b424 in stop_waiting (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:7738
 #2  0x0000555555976fba in handle_signal_stop (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5868
 #3  0x0000555555975f6a in handle_inferior_event (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5527
 #4  0x0000555555971da4 in fetch_inferior_event () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:3910
 #5  0x00005555559540b2 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/pedro/gdb/binutils-gdb/src/gdb/inf-loop.c:42
 #6  0x000055555597e825 in infrun_async_inferior_event_handler (data=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:9162
 #7  0x0000555555687d1d in check_async_event_handlers () at /home/pedro/gdb/binutils-gdb/src/gdb/async-event.c:328
 #8  0x0000555555e48284 in gdb_do_one_event () at /home/pedro/gdb/binutils-gdb/src/gdbsupport/event-loop.cc:216
 #9  0x00005555559e7512 in start_event_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:347
 #10 0x00005555559e765d in captured_command_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:407
 #11 0x00005555559e8f80 in captured_main (data=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1239
 #12 0x00005555559e8ff2 in gdb_main (args=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1254
 #13 0x0000555555627c86 in main (argc=12, argv=0x7fffffffdc88) at /home/pedro/gdb/binutils-gdb/src/gdb/gdb.c:32

The problem is that the remote sends stops for all the threads:

 Packet received: l/home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/attach-non-stop/attach-non-stop
 Sending packet: $vStopped#55...Packet received: T0006:f06e25edec7f0000;07:f06e25edec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd2f;core:15;
 Sending packet: $vStopped#55...Packet received: T0006:f0dea5f0ec7f0000;07:f0dea5f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd27;core:4;
 Sending packet: $vStopped#55...Packet received: T0006:f0ee25f1ec7f0000;07:f0ee25f1ec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd26;core:5;
 Sending packet: $vStopped#55...Packet received: T0006:f0bea5efec7f0000;07:f0bea5efec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd29;core:1;
 Sending packet: $vStopped#55...Packet received: T0006:f0ce25f0ec7f0000;07:f0ce25f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd28;core:a;
 Sending packet: $vStopped#55...Packet received: T0006:f07ea5edec7f0000;07:f07ea5edec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2e;core:f;
 Sending packet: $vStopped#55...Packet received: T0006:f0ae25efec7f0000;07:f0ae25efec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd2a;core:6;
 Sending packet: $vStopped#55...Packet received: T0006:0000000000000000;07:c0e8a381fe7f0000;10:bf43b4f1ec7f0000;thread:p12fd22.12fd22;core:2;
 Sending packet: $vStopped#55...Packet received: T0006:f0fea5f1ec7f0000;07:f0fea5f1ec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd25;core:8;
 Sending packet: $vStopped#55...Packet received: T0006:f09ea5eeec7f0000;07:f09ea5eeec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2b;core:b;
 Sending packet: $vStopped#55...Packet received: OK

But then wait_one never consumes them, always hitting this path:

 4473          if (nfds == 0)
 4474            {
 4475              /* No waitable targets left.  All must be stopped.  */
 4476              return {NULL, minus_one_ptid, {TARGET_WAITKIND_NO_RESUMED}};
 4477            }

Resulting in GDB constanly calling target_stop to stop threads, but
the remote target never reporting back the stops to infrun.

That TARGET_WAITKIND_NO_RESUMED path shown above is always taken
because here, in wait_one too, just above:

 4428          for (inferior *inf : all_inferiors ())
 4429            {
 4430              process_stratum_target *target = inf->process_target ();
 4431              if (target == NULL
 4432                  || !target->is_async_p ()
                           ^^^^^^^^^^^^^^^^^^^^^
 4433                  || !target->threads_executing)
 4434                continue;

... the remote target is not async.

And in turn that happened because extended_remote_target::attach
misses enabling async in the target-non-stop path.

A testcase exercising this will be added in a following patch.

gdb/ChangeLog:

	* remote.c (extended_remote_target::attach): Set target async in
	the target-non-stop path too.
---
 gdb/remote.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 6dacc24307e..2f15c4c1f32 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -5968,7 +5968,12 @@ extended_remote_target::attach (const char *args, int from_tty)
 	}
     }
   else
-    gdb_assert (wait_status == NULL);
+    {
+      gdb_assert (wait_status == NULL);
+
+      gdb_assert (target_can_async_p ());
+      target_async (1);
+    }
 }
 
 /* Implementation of the to_post_attach method.  */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 03/12] Testcase for attaching in non-stop mode
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
  2021-01-13  1:15 ` [PATCH 01/12] Fix attaching in non-stop mode Pedro Alves
  2021-01-13  1:15 ` [PATCH 02/12] Fix "target extended-remote" + "maint set target-non-stop" + "attach" Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  5:09   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 04/12] Fix a couple vStopped pending ack bugs Pedro Alves
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

This adds a testcase exercising attaching to a multi-threaded process,
in all combinations of:

  - set non-stop on/off
  - maint target non-stop off/on
  - "attach" vs "attach &"

This exercises the bugs fixed in the two previous patches.

gdb/testsuite/ChangeLog:

	* gdb.threads/attach-non-stop.c: New file.
	* gdb.threads/attach-non-stop.exp: New file.
---
 gdb/testsuite/gdb.threads/attach-non-stop.c   |  58 +++++++
 gdb/testsuite/gdb.threads/attach-non-stop.exp | 149 ++++++++++++++++++
 2 files changed, 207 insertions(+)
 create mode 100644 gdb/testsuite/gdb.threads/attach-non-stop.c
 create mode 100644 gdb/testsuite/gdb.threads/attach-non-stop.exp

diff --git a/gdb/testsuite/gdb.threads/attach-non-stop.c b/gdb/testsuite/gdb.threads/attach-non-stop.c
new file mode 100644
index 00000000000..23a8b258c18
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/attach-non-stop.c
@@ -0,0 +1,58 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <unistd.h>
+
+/* Number of threads we'll create.  */
+static const int n_threads = 10;
+
+/* Entry point for threads.  Loops forever.  */
+
+void *
+thread_func (void *arg)
+{
+  while (1)
+    sleep (1);
+
+  return NULL;
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+
+  alarm (30);
+
+  /* Spawn the test threads.  */
+  for (i = 0; i < n_threads; ++i)
+    {
+      pthread_t child;
+      int rc;
+
+      rc = pthread_create (&child, NULL, thread_func, NULL);
+      assert (rc == 0);
+    }
+
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/attach-non-stop.exp b/gdb/testsuite/gdb.threads/attach-non-stop.exp
new file mode 100644
index 00000000000..96b95b764d7
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/attach-non-stop.exp
@@ -0,0 +1,149 @@
+# Copyright 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test attaching to a multi-threaded process, in all combinations of:
+#
+#  - set non-stop on/off
+#  - maint target non-stop off/on
+#  - "attach" vs "attach &"
+
+if {![can_spawn_for_attach]} {
+    return 0
+}
+
+standard_testfile
+
+# The test proper.  See description above.
+
+proc test {target_non_stop non_stop cmd} {
+    global binfile srcfile
+    global gdb_prompt
+    global decimal
+    global GDBFLAGS
+
+    # Number of threads started by the program.
+    set n_threads 10
+
+    save_vars { GDBFLAGS } {
+	append GDBFLAGS " -ex \"maint set target-non-stop $target_non_stop\""
+	append GDBFLAGS " -ex \"set non-stop $non_stop\""
+	clean_restart $binfile
+    }
+
+    set test_spawn_id [spawn_wait_for_attach $binfile]
+    set testpid [spawn_id_get_pid $test_spawn_id]
+
+    set attached 0
+    set test "attach"
+    set any "\[^\r\n\]*"
+
+    if {$cmd == "attach"} {
+	gdb_test_multiple "attach $testpid" $test {
+	    -re "Attaching to program:${any}process $testpid\r\n.*$gdb_prompt " {
+		pass $test
+		set attached 1
+	    }
+	}
+
+	if {!$attached} {
+	    kill_wait_spawned_process $test_spawn_id
+	    return
+	}
+
+	if {$non_stop} {
+	    # In non-stop, we will see one stop per thread after
+	    # the prompt.
+	    set stops 0
+	    set test "seen all stops"
+	    for {set thread 1} { $thread <= $n_threads } { incr thread } {
+		gdb_test_multiple "" $test {
+		    -re "Thread $::decimal ${any} stopped" {
+			incr stops
+		    }
+		}
+	    }
+
+	    # This we haven't seen all stops, then the
+	    # gdb_test_multiple in the loop above will have
+	    # already issued a FAIL.
+	    if {$stops == $n_threads} {
+		pass $test
+	    }
+	}
+
+	gdb_test_multiple "info threads" "" {
+	    -re "\\(running\\).*$gdb_prompt $" {
+		fail $gdb_test_name
+	    }
+	    -re "$gdb_prompt $" {
+		pass $gdb_test_name
+	    }
+	}
+    } else {
+	gdb_test_multiple "attach $testpid &" $test {
+	    -re "Attaching to program:${any}process $testpid\r\n.*$gdb_prompt " {
+		pass $test
+		set attached 1
+	    }
+	}
+
+	if {!$attached} {
+	    kill_wait_spawned_process $test_spawn_id
+	    return
+	}
+
+	set running_count 0
+	gdb_test_multiple "info threads" "all threads running" {
+	    -re "\\(running\\)" {
+		incr running_count
+		exp_continue
+	    }
+	    -re "Cannot execute this command while the target is running.*$gdb_prompt $" {
+		# Testing against a remote server that doesn't do
+		# non-stop mode.  Explicitly interrupt.  This doesn't
+		# test the same code paths in GDB, but it's still
+		# something.
+		gdb_test_multiple "interrupt" "" {
+		    -re "$gdb_prompt " {
+			gdb_test_multiple "" $gdb_test_name {
+			    -re "received signal SIGINT, Interrupt" {
+				pass $gdb_test_name
+			    }
+			}
+		    }
+		}
+	    }
+	    -re "$gdb_prompt $" {
+		gdb_assert {$running_count == ($n_threads + 1)} $gdb_test_name
+	    }
+	}
+    }
+
+    gdb_test "detach" "Detaching from.*"
+
+    kill_wait_spawned_process $test_spawn_id
+}
+
+if {[build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+    return -1
+}
+
+foreach_with_prefix target-non-stop {"off" "on"} {
+    foreach_with_prefix non-stop {"off" "on"} {
+	foreach_with_prefix cmd {"attach" "attach&"} {
+	    test ${target-non-stop} ${non-stop} ${cmd}
+	}
+    }
+}
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 04/12] Fix a couple vStopped pending ack bugs
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (2 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 03/12] Testcase for attaching in non-stop mode Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  5:29   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress Pedro Alves
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

A following patch will add a testcase that has two processes with
threads stepping over a breakpoint continuously, and then detaches
from one of the processes while threads are running.  The other
process continues stepping over its breakpoint.  And then the testcase
sends a SIGUSR1, expecting that GDB reports it.  That would sometimes
hang against gdbserver, due to the bugs fixed here.  Both bugs are
related, in that they're about remote protocol asynchronous Stop
notifications.  There's a bug in GDB, and another in GDBserver.

The GDB bug:

- when we detach from a process, the remote target discards any
  pending RSP notification related to that process, including the
  in-flight, yet-unacked notification.  Discarding the in-flight
  notification is the problem.  Until the in-flight notification is
  acked with a vStopped packet, the server won't send another %Stop
  notification.  As a result, the debug session gets messed up.  In
  the new testcase's case, GDB would hang inside stop_all_threads,
  waiting for a stop for one of the process'es threads, which never
  arrived -- its stop reply was permanently stuck in the stop reply
  queue, waiting for a vStopped packet that never arrived.

The GDBserver bug:

  GDBserver has the opposite bug.  It also discards notifications for
  the process being detached.  If that discards the head of the
  notification queue, when gdb sends an ack, it ends up acking the
  _next_ notification.  Meaning, gdb loses one notification.  In the
  testcase, this results in a similar hang in stop_all_threads.

So we have two very similar bugs in GDB and GDBserver, both resulting
in a similar symptom.  That's why I'm fixing them both at the same
time.

gdb/ChangeLog:

	* remote.c (remote_notif_stop_ack): Don't error out on
	TARGET_WAITKIND_IGNORE; instead, just ignore the notification.
	(remote_target::discard_pending_stop_replies): Don't delete
	in-flight notification; instead, clear its contents.

gdbserver/ChangeLog:

	* server.cc (discard_queued_stop_replies): Don't ever discard the
	notification at the head of the list.
---
 gdb/remote.c        | 22 +++++++++++++---------
 gdbserver/server.cc |  9 +++++++++
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 2f15c4c1f32..7199e0ac422 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -6935,13 +6935,11 @@ remote_notif_stop_ack (remote_target *remote,
   /* acknowledge */
   putpkt (remote, self->ack_command);
 
-  if (stop_reply->ws.kind == TARGET_WAITKIND_IGNORE)
-    {
-      /* We got an unknown stop reply.  */
-      error (_("Unknown stop reply"));
-    }
-
-  remote->push_stop_reply (stop_reply);
+  /* Kind can be TARGET_WAITKIND_IGNORE if we have meanwhile discarded
+     the notification.  It was left in the queue because we need to
+     acknowledge it and pull the rest of the notifications out.  */
+  if (stop_reply->ws.kind != TARGET_WAITKIND_IGNORE)
+    remote->push_stop_reply (stop_reply);
 }
 
 static int
@@ -7110,8 +7108,14 @@ remote_target::discard_pending_stop_replies (struct inferior *inf)
   /* Discard the in-flight notification.  */
   if (reply != NULL && reply->ptid.pid () == inf->pid)
     {
-      delete reply;
-      rns->pending_event[notif_client_stop.id] = NULL;
+      /* Leave the notification pending, since the server expects that
+	 we acknowledge it with vStopped.  But clear its contents, so
+	 that later on when we acknowledge it, we also discard it.  */
+      reply->ws.kind = TARGET_WAITKIND_IGNORE;
+
+      if (remote_debug)
+	fprintf_unfiltered (gdb_stdlog,
+			    "discarded in-flight notification\n");
     }
 
   /* Discard the stop replies we have already pulled with
diff --git a/gdbserver/server.cc b/gdbserver/server.cc
index 77e89fe6ed0..a5497e93cee 100644
--- a/gdbserver/server.cc
+++ b/gdbserver/server.cc
@@ -203,6 +203,15 @@ discard_queued_stop_replies (ptid_t ptid)
       next = iter;
       ++next;
 
+      if (iter == notif_stop.queue.begin ())
+	{
+	  /* The head of the list contains the notification that was
+	     already sent to GDB.  So we can't remove it, otherwise
+	     when GDB sends the vStopped, it would ack the _next_
+	     notification, which hadn't been sent yet!  */
+	  continue;
+	}
+
       if (remove_all_on_match_ptid (*iter, ptid))
 	{
 	  delete *iter;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (3 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 04/12] Fix a couple vStopped pending ack bugs Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  6:00   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 06/12] Factor out after-stop event handling code from stop_all_threads Pedro Alves
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

A following patch will add a new testcase that has two processes, each
with a number of threads constantly tripping a breakpoint and stepping
over it, because the breakpoint has a condition that evals false.
Then GDB detaches from one of the processes, while both processes are
running.  And then the testcase sends a SIGUSR1 to the other process.

When run against gdbserver, that would occasionaly fail like this:

 (gdb) PASS: gdb.threads/detach-step-over.exp: iter 1: detach
 Executing on target: kill -SIGUSR1 208303    (timeout = 300)
 spawn -ignore SIGHUP kill -SIGUSR1 208303

 Thread 2.5 "detach-step-ove" received signal SIGTRAP, Trace/breakpoint trap.
 [Switching to Thread 208303.208305]
 0x000055555555522a in thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54
 54            counter++; /* Set breakpoint here.  */

Note that it's gdbserver itself that steps over the breakpoint.

The gdbserver logs reveal what happened:

 - GDB manages to detach while a step over is in progress.  That reaches
   linux_process_target::complete_ongoing_step_over(), which does:

      /* Passing NULL_PTID as filter indicates we want all events to
	 be left pending.  Eventually this returns when there are no
	 unwaited-for children left.  */
      ret = wait_for_event_filtered (minus_one_ptid, null_ptid, &wstat,
				     __WALL);

   As the comment say, this leaves all events pending, _including_ the
   just finished step SIGTRAP.  We never discard that SIGTRAP.  So
   GDBserver reports the SIGTRAP to GDB.  GDB can't explain the
   SIGTRAP, so it reports it to the user.

The gdbserver log looks like this.  The LWP of interest is 208305:

 Need step over [LWP 208305]? yes, found breakpoint at 0x555555555227
 proceed_all_lwps: found thread 208305 needing a step-over
 Starting step-over on LWP 208305.  Stopping all threads

208305 starts a step-over.

 >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
 stop_all_lwps (stop-and-suspend, except=LWP 208303.208305)
 Sending sigstop to lwp 208303
 Sending sigstop to lwp 207755
 wait_for_sigstop: pulling events
 LWFE: waitpid(-1, ...) returned 207755, ERRNO-OK
 LLW: waitpid 207755 received Stopped (signal) (stopped)
 pc is 0x7f7e045593bf
 Expected stop.
 LLW: SIGSTOP caught for LWP 207755.207755 while stopping threads.
 LWFE: waitpid(-1, ...) returned 208303, ERRNO-OK
 LLW: waitpid 208303 received Stopped (signal) (stopped)
 pc is 0x7ffff7e743bf
 Expected stop.
 LLW: SIGSTOP caught for LWP 208303.208303 while stopping threads.
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 LLW: exit (no unwaited-for LWP)
 stop_all_lwps done, setting stopping_threads back to !stopping
 <<<< exiting void linux_process_target::stop_all_lwps(int, lwp_info*)
 Done stopping all threads for step-over.
 pc is 0x555555555227
 Writing 8b to 0x555555555227 in process 208305
 Could not findsigchld_handler
  fast tracepoint jump at 0x555555555227 in list (uninserting).
   pending reinsert at 0x555555555227
   step from pc 0x555555555227
 Resuming lwp 208305 (step, signal 0, stop expected)
 <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
 handling possible serial event
 getpkt ("D;32b8b");  [no ack sent]

The detach request arrives.

 sigchld_handler
 Tracing is already off, ignoring
 detach: step over in progress, finish it first

gdbserver realizes a step over for 208305 was in progress, let's it
finish.

 LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
 LLW: waitpid 208305 received Stopped (signal) (stopped)
 pc is 0x555555555227
 Expected stop.
 LLW: step LWP 208303.208305, 0, 0 (discard delayed SIGSTOP)
   pending reinsert at 0x555555555227
   step from pc 0x555555555227
 Resuming lwp 208305 (step, signal 0, stop not expected)
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 sigsuspend'ing
 LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
 LLW: waitpid 208305 received Trace/breakpoint trap (stopped)
 pc is 0x55555555522a
 CSBB: LWP 208303.208305 stopped by trace
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 LLW: exit (no unwaited-for LWP)
 Finished step over.

The step-over for 208305 finishes.

 Writing cc to 0x555555555227 in process 208305
 Could not find fast tracepoint jump at 0x555555555227 in list (reinserting).
 >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
 stop_all_lwps (stop, except=none)
 wait_for_sigstop: pulling events

The detach proceeds (snipped).

...

 proceed_one_lwp: lwp 208305
    LWP 208305 has pending status, leaving stopped

Later on, 208305 has a pending status (the step SIGTRAP from the
step-over), so GDBserver starts the process of reporting it.

...

 wait_1 ret = LWP 208303.208305, 1, 5
 <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)

...

and eventually GDB receives the stop notification (T05 == SIGTRAP):

 getpkt ("vStopped");  [no ack sent]
 sigchld_handler
 vStopped: acking 3
 Writing resume reply for LWP 208303.208305:1
 putpkt ("$T0506:f0ee58f7ff7f0* ;07:f0ee58f7ff7f0* ;10:2a525*"550* ;thread:p32daf.32db1;core:c;#37"); [noack mode]

From the GDB side, we see:

 [infrun] fetch_inferior_event: enter
   [infrun] fetch_inferior_event: fetch_inferior_event enter
   [infrun] do_target_wait: Found 2 inferiors, starting at #1
   [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
   [infrun] print_target_wait_results:   208303.208305.0 [Thread 208303.208305],
   [infrun] print_target_wait_results:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
   [infrun] handle_inferior_event: status->kind = stopped, signal = GDB_SIGNAL_TRAP
   [infrun] start_step_over: enter
     [infrun] start_step_over: stealing global queue of threads to step, length = 6
     [infrun] operator(): putting back 6 threads to step in global queue
   [infrun] start_step_over: exit
   [infrun] handle_signal_stop: context switch
   [infrun] context_switch: Switching context from process 0 to Thread 208303.208305
   [infrun] handle_signal_stop: stop_pc=0x55555555522a
   [infrun] handle_signal_stop: random signal (GDB_SIGNAL_TRAP)
   [infrun] stop_waiting: stop_waiting
   [infrun] stop_all_threads: starting

The fix is to discard the step SIGTRAP, unless GDB wanted the thread
to step.

gdbserver/ChangeLog:

	* linux-low.cc (linux_process_target::complete_ongoing_step_over):
	Discard step SIGTRAP, unless GDB wanted the thread to step.
---
 gdbserver/linux-low.cc | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index 4b43d171d2d..5c696c275dd 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -4695,7 +4695,34 @@ linux_process_target::complete_ongoing_step_over ()
 
       lwp = find_lwp_pid (step_over_bkpt);
       if (lwp != NULL)
-	finish_step_over (lwp);
+	{
+	  finish_step_over (lwp);
+
+	  /* If we got our step SIGTRAP, don't leave it pending,
+	     otherwise we would report it to GDB as a spurious
+	     SIGTRAP.  */
+	  gdb_assert (lwp->status_pending_p);
+	  if (WIFSTOPPED (lwp->status_pending)
+	      && WSTOPSIG (lwp->status_pending) == SIGTRAP)
+	    {
+	      thread_info *thread = get_lwp_thread (lwp);
+	      if (thread->last_resume_kind != resume_step)
+		{
+		  if (debug_threads)
+		    debug_printf ("detach: discard step-over SIGTRAP\n");
+
+		  lwp->status_pending_p = 0;
+		  lwp->status_pending = 0;
+		  resume_one_lwp (lwp, lwp->stepping, 0, NULL);
+		}
+	      else
+		{
+		  if (debug_threads)
+		    debug_printf ("detach: resume_step, "
+				  "not discarding step-over SIGTRAP\n");
+		}
+	    }
+	}
       step_over_bkpt = null_ptid;
       unsuspend_all_lwps (lwp);
     }
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 06/12] Factor out after-stop event handling code from stop_all_threads
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (4 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  6:06   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 07/12] prepare_for_detach: don't release scoped_restore at the end Pedro Alves
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

This moves the code handling an event out of wait_one to a separate
function, to be used in another context in a following patch.

gdb/ChangeLog:

	* infrun.c (handle_one): New function, factored out from ...
	(stop_all_threads): ... here.
---
 gdb/infrun.c | 288 +++++++++++++++++++++++++++------------------------
 1 file changed, 150 insertions(+), 138 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index b54baf70994..b62e74d3dab 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4560,6 +4560,154 @@ mark_non_executing_threads (process_stratum_target *target,
   set_resumed (target, mark_ptid, false);
 }
 
+/* Handle one event after stopping threads.  If the eventing thread
+   reports back any interesting event, we leave it pending.  If the
+   eventing thread was in the middle of a displaced step, we
+   cancel/finish it.  */
+
+static bool
+handle_one (const wait_one_event &event)
+{
+  infrun_debug_printf
+    ("%s %s", target_waitstatus_to_string (&event.ws).c_str (),
+     target_pid_to_str (event.ptid).c_str ());
+
+  if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED)
+    {
+      /* All resumed threads exited.  */
+      return true;
+    }
+  else if (event.ws.kind == TARGET_WAITKIND_THREAD_EXITED
+	   || event.ws.kind == TARGET_WAITKIND_EXITED
+	   || event.ws.kind == TARGET_WAITKIND_SIGNALLED)
+    {
+      /* One thread/process exited/signalled.  */
+
+      thread_info *t = nullptr;
+
+      /* The target may have reported just a pid.  If so, try
+	 the first non-exited thread.  */
+      if (event.ptid.is_pid ())
+	{
+	  int pid  = event.ptid.pid ();
+	  inferior *inf = find_inferior_pid (event.target, pid);
+	  for (thread_info *tp : inf->non_exited_threads ())
+	    {
+	      t = tp;
+	      break;
+	    }
+
+	  /* If there is no available thread, the event would
+	     have to be appended to a per-inferior event list,
+	     which does not exist (and if it did, we'd have
+	     to adjust run control command to be able to
+	     resume such an inferior).  We assert here instead
+	     of going into an infinite loop.  */
+	  gdb_assert (t != nullptr);
+
+	  infrun_debug_printf
+	    ("using %s", target_pid_to_str (t->ptid).c_str ());
+	}
+      else
+	{
+	  t = find_thread_ptid (event.target, event.ptid);
+	  /* Check if this is the first time we see this thread.
+	     Don't bother adding if it individually exited.  */
+	  if (t == nullptr
+	      && event.ws.kind != TARGET_WAITKIND_THREAD_EXITED)
+	    t = add_thread (event.target, event.ptid);
+	}
+
+      if (t != nullptr)
+	{
+	  /* Set the threads as non-executing to avoid
+	     another stop attempt on them.  */
+	  switch_to_thread_no_regs (t);
+	  mark_non_executing_threads (event.target, event.ptid,
+				      event.ws);
+	  save_waitstatus (t, &event.ws);
+	  t->stop_requested = false;
+	}
+    }
+  else
+    {
+      thread_info *t = find_thread_ptid (event.target, event.ptid);
+      if (t == NULL)
+	t = add_thread (event.target, event.ptid);
+
+      t->stop_requested = 0;
+      t->executing = 0;
+      t->resumed = false;
+      t->control.may_range_step = 0;
+
+      /* This may be the first time we see the inferior report
+	 a stop.  */
+      inferior *inf = find_inferior_ptid (event.target, event.ptid);
+      if (inf->needs_setup)
+	{
+	  switch_to_thread_no_regs (t);
+	  setup_inferior (0);
+	}
+
+      if (event.ws.kind == TARGET_WAITKIND_STOPPED
+	  && event.ws.value.sig == GDB_SIGNAL_0)
+	{
+	  /* We caught the event that we intended to catch, so
+	     there's no event pending.  */
+	  t->suspend.waitstatus.kind = TARGET_WAITKIND_IGNORE;
+	  t->suspend.waitstatus_pending_p = 0;
+
+	  if (displaced_step_finish (t, GDB_SIGNAL_0)
+	      == DISPLACED_STEP_FINISH_STATUS_NOT_EXECUTED)
+	    {
+	      /* Add it back to the step-over queue.  */
+	      infrun_debug_printf
+		("displaced-step of %s canceled",
+		 target_pid_to_str (t->ptid).c_str ());
+
+	      t->control.trap_expected = 0;
+	      global_thread_step_over_chain_enqueue (t);
+	    }
+	}
+      else
+	{
+	  enum gdb_signal sig;
+	  struct regcache *regcache;
+
+	  infrun_debug_printf
+	    ("target_wait %s, saving status for %d.%ld.%ld",
+	     target_waitstatus_to_string (&event.ws).c_str (),
+	     t->ptid.pid (), t->ptid.lwp (), t->ptid.tid ());
+
+	  /* Record for later.  */
+	  save_waitstatus (t, &event.ws);
+
+	  sig = (event.ws.kind == TARGET_WAITKIND_STOPPED
+		 ? event.ws.value.sig : GDB_SIGNAL_0);
+
+	  if (displaced_step_finish (t, sig)
+	      == DISPLACED_STEP_FINISH_STATUS_NOT_EXECUTED)
+	    {
+	      /* Add it back to the step-over queue.  */
+	      t->control.trap_expected = 0;
+	      global_thread_step_over_chain_enqueue (t);
+	    }
+
+	  regcache = get_thread_regcache (t);
+	  t->suspend.stop_pc = regcache_read_pc (regcache);
+
+	  infrun_debug_printf ("saved stop_pc=%s for %s "
+			       "(currently_stepping=%d)",
+			       paddress (target_gdbarch (),
+					 t->suspend.stop_pc),
+			       target_pid_to_str (t->ptid).c_str (),
+			       currently_stepping (t));
+	}
+    }
+
+  return false;
+}
+
 /* See infrun.h.  */
 
 void
@@ -4673,144 +4821,8 @@ stop_all_threads (void)
 	  for (int i = 0; i < waits_needed; i++)
 	    {
 	      wait_one_event event = wait_one ();
-
-	      infrun_debug_printf
-		("%s %s", target_waitstatus_to_string (&event.ws).c_str (),
-		 target_pid_to_str (event.ptid).c_str ());
-
-	      if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED)
-		{
-		  /* All resumed threads exited.  */
-		  break;
-		}
-	      else if (event.ws.kind == TARGET_WAITKIND_THREAD_EXITED
-		       || event.ws.kind == TARGET_WAITKIND_EXITED
-		       || event.ws.kind == TARGET_WAITKIND_SIGNALLED)
-		{
-		  /* One thread/process exited/signalled.  */
-
-		  thread_info *t = nullptr;
-
-		  /* The target may have reported just a pid.  If so, try
-		     the first non-exited thread.  */
-		  if (event.ptid.is_pid ())
-		    {
-		      int pid  = event.ptid.pid ();
-		      inferior *inf = find_inferior_pid (event.target, pid);
-		      for (thread_info *tp : inf->non_exited_threads ())
-			{
-			  t = tp;
-			  break;
-			}
-
-		      /* If there is no available thread, the event would
-			 have to be appended to a per-inferior event list,
-			 which does not exist (and if it did, we'd have
-			 to adjust run control command to be able to
-			 resume such an inferior).  We assert here instead
-			 of going into an infinite loop.  */
-		      gdb_assert (t != nullptr);
-
-		      infrun_debug_printf
-			("using %s", target_pid_to_str (t->ptid).c_str ());
-		    }
-		  else
-		    {
-		      t = find_thread_ptid (event.target, event.ptid);
-		      /* Check if this is the first time we see this thread.
-			 Don't bother adding if it individually exited.  */
-		      if (t == nullptr
-			  && event.ws.kind != TARGET_WAITKIND_THREAD_EXITED)
-			t = add_thread (event.target, event.ptid);
-		    }
-
-		  if (t != nullptr)
-		    {
-		      /* Set the threads as non-executing to avoid
-			 another stop attempt on them.  */
-		      switch_to_thread_no_regs (t);
-		      mark_non_executing_threads (event.target, event.ptid,
-						  event.ws);
-		      save_waitstatus (t, &event.ws);
-		      t->stop_requested = false;
-		    }
-		}
-	      else
-		{
-		  thread_info *t = find_thread_ptid (event.target, event.ptid);
-		  if (t == NULL)
-		    t = add_thread (event.target, event.ptid);
-
-		  t->stop_requested = 0;
-		  t->executing = 0;
-		  t->resumed = false;
-		  t->control.may_range_step = 0;
-
-		  /* This may be the first time we see the inferior report
-		     a stop.  */
-		  inferior *inf = find_inferior_ptid (event.target, event.ptid);
-		  if (inf->needs_setup)
-		    {
-		      switch_to_thread_no_regs (t);
-		      setup_inferior (0);
-		    }
-
-		  if (event.ws.kind == TARGET_WAITKIND_STOPPED
-		      && event.ws.value.sig == GDB_SIGNAL_0)
-		    {
-		      /* We caught the event that we intended to catch, so
-			 there's no event pending.  */
-		      t->suspend.waitstatus.kind = TARGET_WAITKIND_IGNORE;
-		      t->suspend.waitstatus_pending_p = 0;
-
-		      if (displaced_step_finish (t, GDB_SIGNAL_0)
-			  == DISPLACED_STEP_FINISH_STATUS_NOT_EXECUTED)
-			{
-			  /* Add it back to the step-over queue.  */
-			  infrun_debug_printf
-			    ("displaced-step of %s canceled: adding back to "
-			     "the step-over queue",
-			      target_pid_to_str (t->ptid).c_str ());
-
-			  t->control.trap_expected = 0;
-			  global_thread_step_over_chain_enqueue (t);
-			}
-		    }
-		  else
-		    {
-		      enum gdb_signal sig;
-		      struct regcache *regcache;
-
-		      infrun_debug_printf
-			("target_wait %s, saving status for %d.%ld.%ld",
-			 target_waitstatus_to_string (&event.ws).c_str (),
-			 t->ptid.pid (), t->ptid.lwp (), t->ptid.tid ());
-
-		      /* Record for later.  */
-		      save_waitstatus (t, &event.ws);
-
-		      sig = (event.ws.kind == TARGET_WAITKIND_STOPPED
-			     ? event.ws.value.sig : GDB_SIGNAL_0);
-
-		      if (displaced_step_finish (t, sig)
-			  == DISPLACED_STEP_FINISH_STATUS_NOT_EXECUTED)
-			{
-			  /* Add it back to the step-over queue.  */
-			  t->control.trap_expected = 0;
-			  global_thread_step_over_chain_enqueue (t);
-			}
-
-		      regcache = get_thread_regcache (t);
-		      t->suspend.stop_pc = regcache_read_pc (regcache);
-
-		      infrun_debug_printf ("saved stop_pc=%s for %s "
-					   "(currently_stepping=%d)",
-					   paddress (target_gdbarch (),
-						     t->suspend.stop_pc),
-					   target_pid_to_str (t->ptid).c_str (),
-					   currently_stepping (t));
-		    }
-		}
+	      if (handle_one (event))
+		break;
 	    }
 	}
     }
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 07/12] prepare_for_detach: don't release scoped_restore at the end
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (5 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 06/12] Factor out after-stop event handling code from stop_all_threads Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  6:08   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 08/12] prepare_for_detach and ongoing displaced stepping Pedro Alves
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

After detaching from a process, the inf->detaching flag is
inadvertently left set to true.  If you afterwards reuse the same
inferior to start a new process, GDB will mishave...

The problem is that prepare_for_detach discards the scoped_restore at
the end, while the intention is for the flag to be set only for the
duration of prepare_for_detach.

This was already a bug in the original commit that added
prepare_for_detach, commit 24291992dac3 ("PR gdb/11321"), by yours
truly.  Back then, we still used cleanups, and the function called
discard_cleanups instead of do_cleanups, by mistake.

gdb/ChangeLog:

	* infrun.c (prepare_for_detach): Don't release scoped_restore at
	the end.
---
 gdb/infrun.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index b62e74d3dab..fc7ba745737 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -3612,8 +3612,6 @@ prepare_for_detach (void)
 	  error (_("Program exited while detaching"));
 	}
     }
-
-  restore_detaching.release ();
 }
 
 /* Wait for control to return from inferior to debugger.
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 08/12] prepare_for_detach and ongoing displaced stepping
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (6 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 07/12] prepare_for_detach: don't release scoped_restore at the end Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  6:23   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 09/12] detach and breakpoint removal Pedro Alves
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

I noticed that "detach" while a program was running sometimes resulted
in the process crashing.  I tracked it down to this change to
prepare_for_detach in commit 187b041e ("gdb: move displaced stepping
logic to gdbarch, allow starting concurrent displaced steps"):

    /* Is any thread of this process displaced stepping?  If not,
       there's nothing else to do.  */
 -  if (displaced->step_thread == nullptr)
 +  if (displaced_step_in_progress (inf))
      return;

The problem above is that the condition was inadvertently flipped.  It
should have been:

   if (!displaced_step_in_progress (inf))

So I fixed it, and wrote a testcase to exercise it.  The testcase has
a number of threads constantly stepping over a breakpoint, and then
GDB detaches the process, while threads are running and stepping over
the breakpoint.  And then I was surprised that my testcase would hang
-- GDB would get stuck in an infinite loop in prepare_for_detach,
here:

  while (displaced_step_in_progress (inf))
    {
      ...

What is going on is that since we now have two displaced stepping
buffers, as one displaced step finishes, GDB starts another, and
there's another one already in progress, and on and on, so the
displaced_step_in_progress condition never turns false.  This happens
because we go via the whole handle_inferior_event, which tries to
start new step overs when one finishes.  And also because while we
remove breakpoints from the target before prepare_for_detach is
called, handle_inferior_event ends up calling insert_breakpoints via
e.g. keep_going.

Thinking through all this, I came to the conclusion that going through
the whole handle_inferior_event isn't ideal.  A _lot_ is done by that
function, e.g., some thread may get a signal which is passed to the
inferior, and gdb decides to try to get over the signal handler, which
reinstalls breakpoints.  Or some process may exit.  We can end up
reporting these events via normal_stop while detaching, maybe end up
running some breakpoint commands, or maybe even something runs an
inferior function call.  Etc.  All this after the user has already
declared they don't want to debug the process anymore, by asking to
detach.

I came to the conclusion that it's better to do the minimal amount of
work possible, in a more controlled fashion, without going through
handle_inferior_event.  So in the new approach implemented by this
patch, if there are threads of the inferior that we're detaching in
the middle of a displaced step, stop them, and cancel the displaced
step.  This is basically what stop_all_threads already does, via
wait_one and (the now factored out) handle_one, so I'm reusing those.

gdb/ChangeLog:

	* infrun.c (struct wait_one_event): Move higher up.
	(prepare_for_detach): Abort in-progress displaced steps instead of
	letting them complete.
	(handle_one): If the inferior is detaching, don't add the thread
	back to the global step-over chain.
	(restart_threads): Don't restart threads if detaching.
	(handle_signal_stop): Remove inferior::detaching reference.
---
 gdb/infrun.c | 123 +++++++++++++++++++++++++++------------------------
 1 file changed, 65 insertions(+), 58 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index fc7ba745737..12e1564c090 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -3551,6 +3551,22 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
   return false;
 }
 
+/* An event reported by wait_one.  */
+
+struct wait_one_event
+{
+  /* The target the event came out of.  */
+  process_stratum_target *target;
+
+  /* The PTID the event was for.  */
+  ptid_t ptid;
+
+  /* The waitstatus.  */
+  target_waitstatus ws;
+};
+
+static bool handle_one (const wait_one_event &event);
+
 /* Prepare and stabilize the inferior for detaching it.  E.g.,
    detaching while a thread is displaced stepping is a recipe for
    crashing it, as nothing would readjust the PC out of the scratch
@@ -3561,56 +3577,60 @@ prepare_for_detach (void)
 {
   struct inferior *inf = current_inferior ();
   ptid_t pid_ptid = ptid_t (inf->pid);
-
-  /* Is any thread of this process displaced stepping?  If not,
-     there's nothing else to do.  */
-  if (displaced_step_in_progress (inf))
-    return;
-
-  infrun_debug_printf ("displaced-stepping in-process while detaching");
+  scoped_restore_current_thread restore_thread;
 
   scoped_restore restore_detaching = make_scoped_restore (&inf->detaching, true);
 
-  while (displaced_step_in_progress (inf))
+  /* Remove all threads of INF from the global step-over chain.  We
+     want to stop any ongoing step-over, not start any new one.  */
+  thread_info *next;
+  for (thread_info *tp = global_thread_step_over_chain_head;
+       tp != nullptr;
+       tp = next)
     {
-      struct execution_control_state ecss;
-      struct execution_control_state *ecs;
-
-      ecs = &ecss;
-      memset (ecs, 0, sizeof (*ecs));
+      next = global_thread_step_over_chain_next (tp);
+      if (tp->inf == inf)
+	global_thread_step_over_chain_remove (tp);
+    }
 
-      overlay_cache_invalid = 1;
-      /* Flush target cache before starting to handle each event.
-	 Target was running and cache could be stale.  This is just a
-	 heuristic.  Running threads may modify target memory, but we
-	 don't get any event.  */
-      target_dcache_invalidate ();
+  if (displaced_step_in_progress (inf))
+    {
+      infrun_debug_printf ("displaced-stepping in-process while detaching");
 
-      do_target_wait (pid_ptid, ecs, 0);
+      /* Stop threads currently displaced stepping, aborting it.  */
 
-      if (debug_infrun)
-	print_target_wait_results (pid_ptid, ecs->ptid, &ecs->ws);
+      for (thread_info *thr : inf->non_exited_threads ())
+	{
+	  if (thr->displaced_step_state.in_progress ())
+	    {
+	      if (thr->executing)
+		{
+		  if (!thr->stop_requested)
+		    {
+		      target_stop (thr->ptid);
+		      thr->stop_requested = true;
+		    }
+		}
+	      else
+		thr->resumed = false;
+	    }
+	}
 
-      /* If an error happens while handling the event, propagate GDB's
-	 knowledge of the executing state to the frontend/user running
-	 state.  */
-      scoped_finish_thread_state finish_state (inf->process_target (),
-					       minus_one_ptid);
+      while (displaced_step_in_progress (inf))
+	{
+	  wait_one_event event;
 
-      /* Now figure out what to do with the result of the result.  */
-      handle_inferior_event (ecs);
+	  event.target = inf->process_target ();
+	  event.ptid = do_target_wait_1 (inf, pid_ptid, &event.ws, 0);
 
-      /* No error, don't finish the state yet.  */
-      finish_state.release ();
+	  if (debug_infrun)
+	    print_target_wait_results (pid_ptid, event.ptid, &event.ws);
 
-      /* Breakpoints and watchpoints are not installed on the target
-	 at this point, and signals are passed directly to the
-	 inferior, so this must mean the process is gone.  */
-      if (!ecs->wait_some_more)
-	{
-	  restore_detaching.release ();
-	  error (_("Program exited while detaching"));
+	  handle_one (event);
 	}
+
+      /* It's OK to leave some of the threads of INF stopped, since
+	 they'll be detached shortly.  */
     }
 }
 
@@ -4364,20 +4384,6 @@ poll_one_curr_target (struct target_waitstatus *ws)
   return event_ptid;
 }
 
-/* An event reported by wait_one.  */
-
-struct wait_one_event
-{
-  /* The target the event came out of.  */
-  process_stratum_target *target;
-
-  /* The PTID the event was for.  */
-  ptid_t ptid;
-
-  /* The waitstatus.  */
-  target_waitstatus ws;
-};
-
 /* Wait for one event out of any target.  */
 
 static wait_one_event
@@ -4561,7 +4567,8 @@ mark_non_executing_threads (process_stratum_target *target,
 /* Handle one event after stopping threads.  If the eventing thread
    reports back any interesting event, we leave it pending.  If the
    eventing thread was in the middle of a displaced step, we
-   cancel/finish it.  */
+   cancel/finish it, and unless the thread's inferior is being
+   detached, put the thread back in the step-over chain.  */
 
 static bool
 handle_one (const wait_one_event &event)
@@ -4664,7 +4671,8 @@ handle_one (const wait_one_event &event)
 		 target_pid_to_str (t->ptid).c_str ());
 
 	      t->control.trap_expected = 0;
-	      global_thread_step_over_chain_enqueue (t);
+	      if (!t->inf->detaching)
+		global_thread_step_over_chain_enqueue (t);
 	    }
 	}
       else
@@ -4688,7 +4696,8 @@ handle_one (const wait_one_event &event)
 	    {
 	      /* Add it back to the step-over queue.  */
 	      t->control.trap_expected = 0;
-	      global_thread_step_over_chain_enqueue (t);
+	      if (!t->inf->detaching)
+		global_thread_step_over_chain_enqueue (t);
 	    }
 
 	  regcache = get_thread_regcache (t);
@@ -6119,7 +6128,6 @@ handle_signal_stop (struct execution_control_state *ecs)
   if (random_signal)
     {
       /* Signal not for debugging purposes.  */
-      struct inferior *inf = find_inferior_ptid (ecs->target, ecs->ptid);
       enum gdb_signal stop_signal = ecs->event_thread->suspend.stop_signal;
 
       infrun_debug_printf ("random signal (%s)",
@@ -6132,8 +6140,7 @@ handle_signal_stop (struct execution_control_state *ecs)
 	 to remain stopped.  */
       if (stop_soon != NO_STOP_QUIETLY
 	  || ecs->event_thread->stop_requested
-	  || (!inf->detaching
-	      && signal_stop_state (ecs->event_thread->suspend.stop_signal)))
+	  || signal_stop_state (ecs->event_thread->suspend.stop_signal))
 	{
 	  stop_waiting (ecs);
 	  return;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 09/12] detach and breakpoint removal
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (7 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 08/12] prepare_for_detach and ongoing displaced stepping Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  6:32   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 10/12] detach with in-line step over in progress Pedro Alves
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process.  That testcase sometimes fails with the inferior crashing
with SIGTRAP after the detach because of the bug fixed by this patch,
when tested with the native target.

The problem is that target_detach removes breakpoints from the target
immediately, and that does not work with the native GNU/Linux target
(and probably no other native target) currently.  The test wouldn't
fail with this issue when testing against gdbserver, because gdbserver
does allow accessing memory while the current thread is running, by
transparently pausing all threads temporarily, without GDB noticing.
Implementing that in gdbserver was a lot of work, so I'm not looking
forward right now to do the same in the native target.  Instead, I
came up with a simpler solution -- push the breakpoints removal down
to the targets.  The Linux target conveniently already pauses all
threads before detaching them, since PTRACE_DETACH only works with
stopped threads, so we move removing breakpoints to after that.  Only
the remote and GNU/Linux targets support support async execution, so
no other target should really need this.

gdb/ChangeLog:

	* linux-nat.c (linux_nat_target::detach): Remove breakpoints
	here...
	* remote.c (remote_target::remote_detach_1): ... and here ...
	* target.c (target_detach): ... instead of here.
---
 gdb/linux-nat.c |  5 +++++
 gdb/remote.c    | 10 ++++++++++
 gdb/target.c    |  9 ---------
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index dc524cf10dc..e46fa1ad6ff 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1456,6 +1456,11 @@ linux_nat_target::detach (inferior *inf, int from_tty)
      they're no longer running.  */
   iterate_over_lwps (ptid_t (pid), stop_wait_callback);
 
+  /* We can now safely remove breakpoints.  We don't this in earlier
+     in common code because this target doesn't currently support
+     writing memory while the inferior is running.  */
+  remove_breakpoints_inf (current_inferior ());
+
   iterate_over_lwps (ptid_t (pid), detach_callback);
 
   /* Only the initial process should be left right now.  */
diff --git a/gdb/remote.c b/gdb/remote.c
index 7199e0ac422..423cd4a84c9 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -5735,6 +5735,16 @@ remote_target::remote_detach_1 (inferior *inf, int from_tty)
 
   target_announce_detach (from_tty);
 
+  if (!gdbarch_has_global_breakpoints (target_gdbarch ()))
+    {
+      /* If we're in breakpoints-always-inserted mode, or the inferior
+	 is running, we have to remove breakpoints before detaching.
+	 We don't do this in common code instead because not all
+	 targets support removing breakpoints while the target is
+	 running.  The remote target / gdbserver does, though.  */
+      remove_breakpoints_inf (current_inferior ());
+    }
+
   /* Tell the remote target to detach.  */
   remote_detach_pid (pid);
 
diff --git a/gdb/target.c b/gdb/target.c
index 3a03a0ad530..9a8473d40e1 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -1949,15 +1949,6 @@ target_detach (inferior *inf, int from_tty)
      assertion.  */
   gdb_assert (inf == current_inferior ());
 
-  if (gdbarch_has_global_breakpoints (target_gdbarch ()))
-    /* Don't remove global breakpoints here.  They're removed on
-       disconnection from the target.  */
-    ;
-  else
-    /* If we're in breakpoints-always-inserted mode, have to remove
-       breakpoints before detaching.  */
-    remove_breakpoints_inf (current_inferior ());
-
   prepare_for_detach ();
 
   /* Hold a strong reference because detaching may unpush the
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 10/12] detach with in-line step over in progress
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (8 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 09/12] detach and breakpoint removal Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  6:39   ` Simon Marchi
  2021-01-13  1:15 ` [PATCH 11/12] detach in all-stop with threads running Pedro Alves
  2021-01-13  1:15 ` [PATCH 12/12] Testcase for detaching while stepping over breakpoint Pedro Alves
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process.  That testcase exercises both "set displaced-stepping
on/off".  Testing with "set displaced-stepping off" reveals that GDB
does not handle the case of the user typing "detach" just while some
thread is in the middle of an in-line step over.  If that thread
belongs to the inferior that is being detached, then the step-over
never finishes, and threads of other inferiors are never re-resumed.
This fixes it.

gdb/ChangeLog:

	* infrun.c (struct step_over_info): Initialize fields.
	(prepare_for_detach): Handle ongoing in-line step over.
---
 gdb/infrun.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 12e1564c090..4a17410fcd6 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1261,15 +1261,15 @@ struct step_over_info
      and address of the instruction the breakpoint is set at.  We'll
      skip inserting all breakpoints here.  Valid iff ASPACE is
      non-NULL.  */
-  const address_space *aspace;
-  CORE_ADDR address;
+  const address_space *aspace = nullptr;
+  CORE_ADDR address = 0;
 
   /* The instruction being stepped over triggers a nonsteppable
      watchpoint.  If true, we'll skip inserting watchpoints.  */
-  int nonsteppable_watchpoint_p;
+  int nonsteppable_watchpoint_p = 0;
 
   /* The thread's global number.  */
-  int thread;
+  int thread = -1;
 };
 
 /* The step-over info of the location that is being stepped over.
@@ -3566,6 +3566,7 @@ struct wait_one_event
 };
 
 static bool handle_one (const wait_one_event &event);
+static void restart_threads (struct thread_info *event_thread);
 
 /* Prepare and stabilize the inferior for detaching it.  E.g.,
    detaching while a thread is displaced stepping is a recipe for
@@ -3593,6 +3594,35 @@ prepare_for_detach (void)
 	global_thread_step_over_chain_remove (tp);
     }
 
+  /* If we were already in the middle of an inline step-over, and the
+     thread stepping belongs to the inferior we're detaching, we need
+     to restart the threads of other inferiors.  */
+  if (step_over_info.thread != -1)
+    {
+      infrun_debug_printf ("inline step-over in-process while detaching");
+
+      thread_info *thr = find_thread_global_id (step_over_info.thread);
+      if (thr->inf == inf)
+	{
+	  /* Since we removed threads of INF from the step-over chain,
+	     we know this won't start a step-over for INF.  */
+	  clear_step_over_info ();
+
+	  if (target_is_non_stop_p ())
+	    {
+	      /* Start a new step-over in another thread if there's
+		 one that needs it.  */
+	      start_step_over ();
+
+	      /* Restart all other threads (except the
+		 previously-stepping thread, since that one is still
+		 running).  */
+	      if (!step_over_info_valid_p ())
+		restart_threads (thr);
+	    }
+	}
+    }
+
   if (displaced_step_in_progress (inf))
     {
       infrun_debug_printf ("displaced-stepping in-process while detaching");
@@ -5526,6 +5556,13 @@ restart_threads (struct thread_info *event_thread)
 
   for (thread_info *tp : all_non_exited_threads ())
     {
+      if (tp->inf->detaching)
+	{
+	  infrun_debug_printf ("restart threads: [%s] inferior detaching",
+			       target_pid_to_str (tp->ptid).c_str ());
+	  continue;
+	}
+
       switch_to_thread_no_regs (tp);
 
       if (tp == event_thread)
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 11/12] detach in all-stop with threads running
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (9 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 10/12] detach with in-line step over in progress Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  6:55   ` Simon Marchi
  2021-07-12 15:36   ` Jonah Graham
  2021-01-13  1:15 ` [PATCH 12/12] Testcase for detaching while stepping over breakpoint Pedro Alves
  11 siblings, 2 replies; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process, while threads are running.  If we have more then one inferior
running, and we detach from just one of the inferiors, we expect that
the remaining inferior continues running.  However, in all-stop, if
GDB needs to pause the target for the detach, nothing is re-resuming
the other inferiors after the detach.  "info threads" shows the
threads as running, but they really aren't.  This fixes it.

gdb/ChangeLog:

	* infcmd.c (detach_command): Hold strong reference to target, and
	if all-stop on entry, restart threads on exit.
	* infrun.c (switch_back_to_stepped_thread): Factor out bits to ...
	(restart_stepped_thread): ... this new function.  Also handle
	trap_expected.
	(restart_after_all_stop_detach): New function.
	* infrun.h (restart_after_all_stop_detach): Declare.
---
 gdb/infcmd.c |  13 ++++
 gdb/infrun.c | 163 +++++++++++++++++++++++++++++++++++----------------
 gdb/infrun.h |   4 ++
 3 files changed, 128 insertions(+), 52 deletions(-)

diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index 6f0ed952de6..ebaf57592ef 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -2750,6 +2750,16 @@ detach_command (const char *args, int from_tty)
 
   disconnect_tracing ();
 
+  /* Hold a strong reference to the target while (maybe)
+     detaching the parent.  Otherwise detaching could close the
+     target.  */
+  auto target_ref
+    = target_ops_ref::new_reference (current_inferior ()->process_target ());
+
+  /* Save this before detaching, since detaching may unpush the
+     process_stratum target.  */
+  bool was_non_stop_p = target_is_non_stop_p ();
+
   target_detach (current_inferior (), from_tty);
 
   /* The current inferior process was just detached successfully.  Get
@@ -2766,6 +2776,9 @@ detach_command (const char *args, int from_tty)
 
   if (deprecated_detach_hook)
     deprecated_detach_hook ();
+
+  if (!was_non_stop_p)
+    restart_after_all_stop_detach (as_process_stratum_target (target_ref.get ()));
 }
 
 /* Disconnect from the current target without resuming it (leaving it
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 4a17410fcd6..4f1a16d2c60 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -7098,6 +7098,9 @@ process_event_stop_test (struct execution_control_state *ecs)
   keep_going (ecs);
 }
 
+static bool restart_stepped_thread (process_stratum_target *resume_target,
+				    ptid_t resume_ptid);
+
 /* In all-stop mode, if we're currently stepping but have stopped in
    some other thread, we may need to switch back to the stepped
    thread.  Returns true we set the inferior running, false if we left
@@ -7108,8 +7111,6 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
 {
   if (!target_is_non_stop_p ())
     {
-      struct thread_info *stepping_thread;
-
       /* If any thread is blocked on some internal breakpoint, and we
 	 simply need to step over that breakpoint to get it going
 	 again, do that first.  */
@@ -7172,78 +7173,136 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
       if (!signal_program[ecs->event_thread->suspend.stop_signal])
 	ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
 
-      /* Do all pending step-overs before actually proceeding with
-	 step/next/etc.  */
-      if (start_step_over ())
+      if (restart_stepped_thread (ecs->target, ecs->ptid))
 	{
 	  prepare_to_wait (ecs);
 	  return true;
 	}
 
-      /* Look for the stepping/nexting thread.  */
-      stepping_thread = NULL;
+      switch_to_thread (ecs->event_thread);
+    }
 
-      for (thread_info *tp : all_non_exited_threads ())
-	{
-	  switch_to_thread_no_regs (tp);
+  return false;
+}
 
-	  /* Ignore threads of processes the caller is not
-	     resuming.  */
-	  if (!sched_multi
-	      && (tp->inf->process_target () != ecs->target
-		  || tp->inf->pid != ecs->ptid.pid ()))
-	    continue;
+/* Look for the thread that was stepping, and resume it.
+   RESUME_TARGET / RESUME_PTID indicate the set of threads the caller
+   is resuming.  Return true if a thread was started, false
+   otherwise.  */
 
-	  /* When stepping over a breakpoint, we lock all threads
-	     except the one that needs to move past the breakpoint.
-	     If a non-event thread has this set, the "incomplete
-	     step-over" check above should have caught it earlier.  */
-	  if (tp->control.trap_expected)
-	    {
-	      internal_error (__FILE__, __LINE__,
-			      "[%s] has inconsistent state: "
-			      "trap_expected=%d\n",
-			      target_pid_to_str (tp->ptid).c_str (),
-			      tp->control.trap_expected);
-	    }
+static bool
+restart_stepped_thread (process_stratum_target *resume_target,
+			ptid_t resume_ptid)
+{
+  /* Do all pending step-overs before actually proceeding with
+     step/next/etc.  */
+  if (start_step_over ())
+    return true;
 
-	  /* Did we find the stepping thread?  */
-	  if (tp->control.step_range_end)
-	    {
-	      /* Yep.  There should only one though.  */
-	      gdb_assert (stepping_thread == NULL);
+  for (thread_info *tp : all_threads_safe ())
+    {
+      if (tp->state == THREAD_EXITED)
+	continue;
+
+      if (tp->suspend.waitstatus_pending_p)
+	continue;
 
-	      /* The event thread is handled at the top, before we
-		 enter this loop.  */
-	      gdb_assert (tp != ecs->event_thread);
+      /* Ignore threads of processes the caller is not
+	 resuming.  */
+      if (!sched_multi
+	  && (tp->inf->process_target () != resume_target
+	      || tp->inf->pid != resume_ptid.pid ()))
+	continue;
 
-	      /* If some thread other than the event thread is
-		 stepping, then scheduler locking can't be in effect,
-		 otherwise we wouldn't have resumed the current event
-		 thread in the first place.  */
-	      gdb_assert (!schedlock_applies (tp));
+      if (tp->control.trap_expected)
+	{
+	  infrun_debug_printf ("switching back to stepped thread (step-over)");
 
-	      stepping_thread = tp;
-	    }
+	  if (keep_going_stepped_thread (tp))
+	    return true;
 	}
+    }
+
+  for (thread_info *tp : all_threads_safe ())
+    {
+      if (tp->state == THREAD_EXITED)
+	continue;
+
+      if (tp->suspend.waitstatus_pending_p)
+	continue;
 
-      if (stepping_thread != NULL)
+      /* Ignore threads of processes the caller is not
+	 resuming.  */
+      if (!sched_multi
+	  && (tp->inf->process_target () != resume_target
+	      || tp->inf->pid != resume_ptid.pid ()))
+	continue;
+
+      /* Did we find the stepping thread?  */
+      if (tp->control.step_range_end)
 	{
-	  infrun_debug_printf ("switching back to stepped thread");
+	  infrun_debug_printf ("switching back to stepped thread (stepping)");
 
-	  if (keep_going_stepped_thread (stepping_thread))
-	    {
-	      prepare_to_wait (ecs);
-	      return true;
-	    }
+	  if (keep_going_stepped_thread (tp))
+	    return true;
 	}
-
-      switch_to_thread (ecs->event_thread);
     }
 
   return false;
 }
 
+/* See infrun.h.  */
+
+void
+restart_after_all_stop_detach (process_stratum_target *proc_target)
+{
+  /* Note we don't check target_is_non_stop_p() here, because the
+     current inferior may no longer have a process_stratum target
+     pushed, as we just detached.  */
+
+  /* See if we have a THREAD_RUNNING thread that need to be
+     re-resumed.  If we have any thread that is already executing,
+     then we don't need to resume the target -- it is already been
+     resumed.  With the remote target (in all-stop), it's even
+     impossible to issue another resumption if the target is already
+     resumed, until the target reports a stop.  */
+  for (thread_info *thr : all_threads (proc_target))
+    {
+      if (thr->state != THREAD_RUNNING)
+	continue;
+
+      /* If we have any thread that is already executing, then we
+	 don't need to resume the target -- it is already been
+	 resumed.  */
+      if (thr->executing)
+	return;
+
+      /* If we have a pending event to process, skip resuming the
+	 target and go straight to processing it.  */
+      if (thr->resumed && thr->suspend.waitstatus_pending_p)
+	return;
+    }
+
+  /* Alright, we need to re-resume the target.  If a thread was
+     stepping, we need to restart it stepping.  */
+  if (restart_stepped_thread (proc_target, minus_one_ptid))
+    return;
+
+  /* Otherwise, find the first THREAD_RUNNING thread and resume
+     it.  */
+  for (thread_info *thr : all_threads (proc_target))
+    {
+      if (thr->state != THREAD_RUNNING)
+	continue;
+
+      execution_control_state ecs;
+      reset_ecs (&ecs, thr);
+      switch_to_thread (thr);
+      keep_going (&ecs);
+      return;
+    }
+}
+
 /* Set a previously stepped thread back to stepping.  Returns true on
    success, false if the resume is not possible (e.g., the thread
    vanished).  */
diff --git a/gdb/infrun.h b/gdb/infrun.h
index 7160b60f136..e643c84dd80 100644
--- a/gdb/infrun.h
+++ b/gdb/infrun.h
@@ -269,4 +269,8 @@ extern void all_uis_check_sync_execution_done (void);
    started or re-started).  */
 extern void all_uis_on_sync_execution_starting (void);
 
+/* In all-stop, restart the target if it had to be stopped to
+   detach.  */
+extern void restart_after_all_stop_detach (process_stratum_target *proc_target);
+
 #endif /* INFRUN_H */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 12/12] Testcase for detaching while stepping over breakpoint
  2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
                   ` (10 preceding siblings ...)
  2021-01-13  1:15 ` [PATCH 11/12] detach in all-stop with threads running Pedro Alves
@ 2021-01-13  1:15 ` Pedro Alves
  2021-01-13  7:05   ` Simon Marchi
  11 siblings, 1 reply; 36+ messages in thread
From: Pedro Alves @ 2021-01-13  1:15 UTC (permalink / raw)
  To: gdb-patches

This adds a testcase that exercises detaching while GDB is stepping
over a breakpoint, in all combinations of:

  - maint target non-stop off/on
  - set non-stop on/off
  - displaced stepping on/off

This exercises the bugs fixed in the previous 8 patches.

gdb/testsuite/ChangeLog:

	* gdb.threads/detach-step-over.c: New file.
	* gdb.threads/detach-step-over.exp: New file.
---
 gdb/testsuite/gdb.threads/detach-step-over.c  | 112 +++++++
 .../gdb.threads/detach-step-over.exp          | 290 ++++++++++++++++++
 2 files changed, 402 insertions(+)
 create mode 100644 gdb/testsuite/gdb.threads/detach-step-over.c
 create mode 100644 gdb/testsuite/gdb.threads/detach-step-over.exp

diff --git a/gdb/testsuite/gdb.threads/detach-step-over.c b/gdb/testsuite/gdb.threads/detach-step-over.c
new file mode 100644
index 00000000000..13db9f60b04
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/detach-step-over.c
@@ -0,0 +1,112 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <signal.h>
+
+/* Number of threads we'll create.  */
+int n_threads = 10;
+
+int mypid;
+
+static void
+setup_done (void)
+{
+}
+
+/* Entry point for threads.  Loops forever.  */
+
+void *
+thread_func (void *arg)
+{
+  /* Avoid setting the breakpoint at an instruction that wouldn't
+     require a fixup phase, like a branch/jump.  In such a case, even
+     if GDB manages to detach the inferior with an incomplete
+     displaced step, GDB inferior may still not crash.  A breakpoint
+     at a line that increments a variable is good bet that we end up
+     setting a breakpoint at an instruction that will require a fixup
+     phase to move the PC from the scratch pad to the instruction
+     after the breakpoint.  */
+  volatile unsigned counter = 0;
+
+  while (1)
+    {
+      counter++; /* Set breakpoint here.  */
+      counter++;
+      counter++;
+    }
+
+  return NULL;
+}
+
+/* Allow for as much timeout as DejaGnu wants, plus a bit of
+   slack.  */
+#define SECONDS (TIMEOUT + 20)
+
+/* We'll exit after this many seconds.  */
+unsigned int seconds_left = SECONDS;
+
+/* GDB sets this whenever it's about to start a new detach/attach
+   sequence.  We react by resetting the seconds-left counter.  */
+volatile int again = 0;
+
+int
+main (int argc, char **argv)
+{
+  int i;
+
+  signal (SIGUSR1, SIG_IGN);
+
+  mypid = getpid ();
+  setup_done ();
+
+  if (argc > 1)
+    n_threads = atoi (argv[1]);
+
+  /* Spawn the test threads.  */
+  for (i = 0; i < n_threads; ++i)
+    {
+      pthread_t child;
+      int rc;
+
+      rc = pthread_create (&child, NULL, thread_func, NULL);
+      assert (rc == 0);
+    }
+
+  /* Exit after a while if GDB is gone/crashes.  But wait long enough
+     for one attach/detach sequence done by the .exp file.  */
+  while (--seconds_left > 0)
+    {
+      sleep (1);
+
+      if (again)
+	{
+	  /* GDB should be reattaching soon.  Restart the timer.  */
+	  again = 0;
+	  seconds_left = SECONDS;
+	}
+    }
+
+  printf ("timeout, exiting\n");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/detach-step-over.exp b/gdb/testsuite/gdb.threads/detach-step-over.exp
new file mode 100644
index 00000000000..3298c5b61a3
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/detach-step-over.exp
@@ -0,0 +1,290 @@
+# Copyright 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test detaching from a process that is running and has threads
+# constantly hitting a breakpoint and stepping over it, in all
+# combinations of:
+#
+#  - maint target non-stop off/on
+#  - set non-stop on/off
+#  - displaced stepping on/off
+#
+# This stresses the edge cases of detaching while a displaced step or
+# an in-line step over are in progress.
+#
+# A fail mode is that the inferior process dies after being detached.
+# This can happen because e.g.:
+#
+# - GDB leaves a breakpoint installed behind, or
+#
+# - GDB leaves a thread running in the displaced step scratch buffer.
+#   With no debugger around to run the finish step, the thread runs
+#   off of the scratch buffer, with undefined results.
+#
+# To exercise this, the testcase reattaches to the process shortly
+# after detaching, ensuring the process is still alive and well.
+#
+# In addition, since GDB may pause threads of all processes for
+# stepping over a breakpoint, it needs to re-resume all threads if it
+# detaches from the process that was just stepping over the
+# breakpoint.  To ensure that, the testcase actually runs a second
+# process at the same time as the one that is used to test detaching.
+# After the first process is detached, the testcase sends a SIGUSR1 to
+# the second process.  If threads failed to be resumed, then the
+# SIGUSR1 is never reported to the user, resulting in timeout.  The
+# threads of this second process will also be constantly stepping over
+# a breakpoint, which has helped with exposing further corner case
+# bugs.
+
+if {![can_spawn_for_attach]} {
+    return 0
+}
+
+standard_testfile
+
+set bp_lineno [gdb_get_line_number "Set breakpoint here"]
+
+# The test proper.  See description above.
+proc test {condition_eval target_non_stop non_stop displaced} {
+    global binfile srcfile
+    global gdb_prompt
+    global decimal
+    global bp_lineno
+    global GDBFLAGS
+
+    # Number of threads started by the program.
+    set n_threads 10
+
+    save_vars { GDBFLAGS } {
+	append GDBFLAGS " -ex \"maint set target-non-stop $target_non_stop\""
+	append GDBFLAGS " -ex \"set non-stop $non_stop\""
+	append GDBFLAGS " -ex \"set displaced $displaced\""
+	append GDBFLAGS " -ex \"set schedule-multiple on\""
+	clean_restart $binfile
+    }
+
+    set test_spawn_id [spawn_wait_for_attach $binfile]
+    set testpid [spawn_id_get_pid $test_spawn_id]
+
+    set any "\[^\r\n\]*"
+
+    gdb_test "add-inferior" "Added inferior 2.*"
+    gdb_test "inferior 2" "Switching to .*"
+
+    gdb_load $binfile
+    if ![runto setup_done] then {
+	fail "can't run to setup_done"
+	kill_wait_spawned_process $test_spawn_id
+	return
+    }
+
+    gdb_test_no_output "set breakpoint condition-evaluation $condition_eval"
+
+    # Get the PID of the test process.
+    set pid_inf2 ""
+    gdb_test_multiple "p mypid" "get pid of inferior 2" {
+	-re " = ($decimal)\r\n$gdb_prompt $" {
+	    set pid_inf2 $expect_out(1,string)
+	    pass $gdb_test_name
+	}
+    }
+
+    set attempts 3
+    for {set attempt 1} { $attempt <= $attempts } { incr attempt } {
+	with_test_prefix "iter $attempt" {
+	    gdb_test "inferior 1" "Switching to .*"
+
+	    set attached 0
+	    set eperm 0
+	    set test "attach"
+	    gdb_test_multiple "attach $testpid" $test {
+		-re "new threads in iteration" {
+		    # Seen when "set debug libthread_db" is on.
+		    exp_continue
+		}
+		-re "is a zombie - the process has already terminated.*$gdb_prompt " {
+		    fail $gdb_test_name
+		}
+		-re "Unable to attach: .*$gdb_prompt " {
+		    fail $gdb_test_name
+		}
+		-re "Attaching to program.*process $testpid.*$gdb_prompt " {
+		    pass $test
+		    set attached 1
+		}
+	    }
+
+	    if {!$attached} {
+		kill_wait_spawned_process $test_spawn_id
+		return
+	    }
+
+	    if {$non_stop} {
+		# In non-stop, we will see one stop per thread after
+		# the prompt.
+		set stops 0
+		set tid_re "$::decimal\.$::decimal"
+		set test "seen all stops"
+		for {set thread 1} { $thread <= $n_threads } { incr thread } {
+		    if {[gdb_test_multiple "" $test {
+			-re "Thread ${tid_re} ${any} stopped" {
+			    incr stops
+			}
+		    }] != 0} {
+			break
+		    }
+		}
+
+		# This we haven't seen all stops, then the
+		# gdb_test_multiple in the loop above will have
+		# already issued a FAIL.
+		if {$stops != $n_threads} {
+		    kill_wait_spawned_process $test_spawn_id
+		    return
+		}
+		pass $test
+	    }
+
+	    # Set threads stepping over a breakpoint continuously.
+	    gdb_test "break $srcfile:$bp_lineno if 0" "Breakpoint.*" \
+		"break LOC if 0"
+
+	    if {$attempt < $attempts} {
+		# Kick the time out timer for another round.
+		gdb_test "print again = 1" " = 1" "reset timer in the inferior"
+		# Show the time we had left in the logs, in case
+		# something goes wrong.
+		gdb_test "print seconds_left" " = .*"
+	    }
+
+	    if {$non_stop} {
+		set cont_cmd "continue -a &"
+	    } else {
+		set cont_cmd "continue &"
+	    }
+
+	    set cont_cmd_re [string_to_regexp $cont_cmd]
+	    gdb_test_multiple $cont_cmd "" {
+		-re "^$cont_cmd_re\r\nContinuing\.\r\n$gdb_prompt " {
+		    pass $gdb_test_name
+		}
+	    }
+
+	    # Wait a bit, to give time for the threads to hit the
+	    # breakpoint.
+	    sleep 1
+
+	    set running_count 0
+	    set interrupted 0
+	    gdb_test_multiple "info threads" "all threads running" {
+		-re "\\(running\\)" {
+		    incr running_count
+		    exp_continue
+		}
+		-re "Cannot execute this command while the target is running.*$gdb_prompt $" {
+		    # Testing against a remote server that doesn't do
+		    # non-stop mode.  Explicitly interrupt.  This
+		    # doesn't test the same code paths in GDB, but
+		    # it's still something.
+		    set interrupted 1
+		    gdb_test_multiple "interrupt" "" {
+			-re "$gdb_prompt " {
+			    gdb_test_multiple "" $gdb_test_name {
+				-re "received signal SIGINT, Interrupt" {
+				    pass $gdb_test_name
+				}
+			    }
+			}
+		    }
+		}
+		-re "$gdb_prompt $" {
+		    gdb_assert {$running_count == ($n_threads + 1) * 2} $gdb_test_name
+		}
+	    }
+
+	    gdb_test "detach" "Detaching from.*"
+
+	    if {!$interrupted} {
+		# Now test whether inferior 2's thread were really left
+		# running.  Currently an inline step-over stops all
+		# threads of all processes.  If detach aborts such a step
+		# over, then threads of other inferiors should be
+		# re-resumed.  Test for that by sending a signal to
+		# inferior 2.
+		remote_exec target "kill -SIGUSR1 ${pid_inf2}"
+
+		gdb_test_multiple "" "stop with SIGUSR1" {
+		    -re "received signal SIGUSR1" {
+			pass $gdb_test_name
+		    }
+		}
+	    }
+
+	    delete_breakpoints
+	}
+    }
+    kill_wait_spawned_process $test_spawn_id
+}
+
+# The test program exits after a while, in case GDB crashes.  Make it
+# wait at least as long as we may wait before declaring a time out
+# failure.
+set options { "additional_flags=-DTIMEOUT=$timeout" debug pthreads }
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile $options] == -1} {
+    return -1
+}
+
+if ![runto_main] {
+    return -1
+}
+
+# Probe support for "set breakpoint condition-evaluation target".
+# This setting influences who steps over the breakpoint, the (remote)
+# target (e.g. gdbserver) or gdb, thus exposing issues on either the
+# target or gdb.
+set supports_condition_eval_target 1
+set cmd "set breakpoint condition-evaluation target"
+gdb_test_multiple $cmd "probe condition-evaluation target support" {
+    -re "warning: Target does not support breakpoint condition evaluation.\r\nUsing host evaluation mode instead.\r\n$gdb_prompt $" {
+	# Target doesn't support breakpoint condition evaluation on
+	# its side.
+	set supports_condition_eval_target 0
+	pass $gdb_test_name
+    }
+    -re "^$cmd\r\n$gdb_prompt $" {
+	pass $gdb_test_name
+    }
+}
+
+foreach_with_prefix breakpoint-condition-evaluation {"host" "target"} {
+    if {!$supports_condition_eval_target && ${breakpoint-condition-evaluation} == "target"} {
+	continue
+    }
+
+    foreach_with_prefix target-non-stop {"off" "on"} {
+	foreach_with_prefix non-stop {"off" "on"} {
+	    if {${non-stop} && !${target-non-stop}} {
+		# "set non-stop" overrides "maint set
+		# target-non-stop", no use testing this combination.
+		continue
+	    }
+
+	    foreach_with_prefix displaced {"off" "auto"} {
+		test ${breakpoint-condition-evaluation} ${target-non-stop} ${non-stop} ${displaced}
+	    }
+	}
+    }
+}
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/12] Fix attaching in non-stop mode
  2021-01-13  1:15 ` [PATCH 01/12] Fix attaching in non-stop mode Pedro Alves
@ 2021-01-13  3:11   ` Simon Marchi
  2021-02-03  1:21     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  3:11 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches



On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> Attaching in non-stop mode currently misbehaves, like so:
> 
>  (gdb) attach 1244450
>  Attaching to process 1244450
>  [New LWP 1244453]
>  [New LWP 1244454]
>  [New LWP 1244455]
>  [New LWP 1244456]
>  [New LWP 1244457]
>  [New LWP 1244458]
>  [New LWP 1244459]
>  [New LWP 1244461]
>  [New LWP 1244462]
>  [New LWP 1244463]
>  No unwaited-for children left.
> 
> At this point, GDB's stopped/running thread state is out of sync with
> the inferior:
> 
> (gdb) info threads
>   Id   Target Id                     Frame
> * 1    LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()
>   2    LWP 1244453 "attach-non-stop" (running)
>   3    LWP 1244454 "attach-non-stop" (running)
>   4    LWP 1244455 "attach-non-stop" (running)
>   5    LWP 1244456 "attach-non-stop" (running)
>   6    LWP 1244457 "attach-non-stop" (running)
>   7    LWP 1244458 "attach-non-stop" (running)
>   8    LWP 1244459 "attach-non-stop" (running)
>   9    LWP 1244461 "attach-non-stop" (running)
>   10   LWP 1244462 "attach-non-stop" (running)
>   11   LWP 1244463 "attach-non-stop" (running)
> (gdb)
> (gdb) interrupt -a
> (gdb)
> *nothing*
> 
> The problem is that attaching installs an inferior continuation,
> called when the target reports the initial attach stop, here, in
> inf-loop.c:inferior_event_handler:
> 
>       /* Do all continuations associated with the whole inferior (not
> 	 a particular thread).  */
>       if (inferior_ptid != null_ptid)
> 	do_all_inferior_continuations (0);
> 
> However, currently in non-stop mode, inferior_ptid is still null_ptid
> when we get here.
> 
> If you try to do "set debug infrun 1" to debug the problem, however,
> then the attach completes correctly, with GDB reporting a stop for
> each thread.
> 
> The bug is that we're missing a switch_to_thread/context_switch call
> when handling the initial stop, here:
> 
>   if (stop_soon == STOP_QUIETLY_NO_SIGSTOP
>       && (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP
> 	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP
> 	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))
>     {
>       stop_print_frame = true;
>       stop_waiting (ecs);
>       ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
>       return;
>     }
> 
> Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that does
> call context_switch.
> 
> And the reason "set debug infrun 1" "fixes" it, is that the debug path
> has a switch_to_thread call.
> 
> This patch fixes it by moving the main context_switch call earlier.
> 
> A testcase exercising this will be added in a following patch.
> 
> gdb/ChangeLog:
> 
> 	* infrun.c (handle_signal_stop): Move main context_switch call
> 	earlier, before STOP_QUIETLY_NO_SIGSTOP.

This looks like this bug, if you want to annotate the ChangeLog entry:

https://sourceware.org/bugzilla/show_bug.cgi?id=27055

> ---
>  gdb/infrun.c | 27 ++++++++++++---------------
>  1 file changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 45bedf89641..b54baf70994 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -5735,13 +5735,23 @@ handle_signal_stop (struct execution_control_state *ecs)
>    ecs->event_thread->suspend.stop_pc
>      = regcache_read_pc (get_thread_regcache (ecs->event_thread));
>  
> +  /* See if something interesting happened to the non-current thread.
> +     If so, then switch to that thread.  */
> +  if (ecs->ptid != inferior_ptid)
> +    {
> +      infrun_debug_printf ("context switch");
> +
> +      context_switch (ecs);
> +
> +      if (deprecated_context_hook)
> +	deprecated_context_hook (ecs->event_thread->global_num);
> +    }

This condition (ecs->ptid != inferior_ptid) is half the condition that is
already in context_switch.  Could we just call context_switch unconditionally
here?  The way I understand it is that context_switch makes sure the event
thread is the current one, so it's idempotent.  Each condition we can remove
from this code makes it slightly easier to understand.

I checked deprecated_context_hook, it's only used in insight.  And all it does
is set an int:

  https://sourceware.org/git/?p=insight.git;a=blob;f=gdbtk/generic/gdbtk-hooks.c;h=0aaf404995a3e08aca1e2921d455afa7c8a438ea;hb=HEAD#l783

So it won't care if we call it when the current thread hasn't actually changed,
it's idempotent.

For fun I checked a but further, that variable is then used to drive the
"gdb_context_id" tcl variable.  When I grep in insight, I find no use of that
variable.  Is it maybe a variable that people could use in tcl scripting?  If
not, it just seems that all this could be removed...

Anyway, that LGTM in any case.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/12] Fix "target extended-remote" + "maint set target-non-stop" + "attach"
  2021-01-13  1:15 ` [PATCH 02/12] Fix "target extended-remote" + "maint set target-non-stop" + "attach" Pedro Alves
@ 2021-01-13  5:01   ` Simon Marchi
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  5:01 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> With "target extended-remote" + "maint set target-non-stop", attaching
> hangs like so:
> 
>  (gdb) attach 1244450
>  Attaching to process 1244450
>  [New Thread 1244450.1244450]
>  [New Thread 1244450.1244453]
>  [New Thread 1244450.1244454]
>  [New Thread 1244450.1244455]
>  [New Thread 1244450.1244456]
>  [New Thread 1244450.1244457]
>  [New Thread 1244450.1244458]
>  [New Thread 1244450.1244459]
>  [New Thread 1244450.1244461]
>  [New Thread 1244450.1244462]
>  [New Thread 1244450.1244463]
>  * hang *
> 
> Attaching to the hung GDB shows that GDB is busy in an infinite loop
> in stop_all_threads:
> 
>  (top-gdb) bt
>  #0  stop_all_threads () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:4755
>  #1  0x000055555597b424 in stop_waiting (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:7738
>  #2  0x0000555555976fba in handle_signal_stop (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5868
>  #3  0x0000555555975f6a in handle_inferior_event (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5527
>  #4  0x0000555555971da4 in fetch_inferior_event () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:3910
>  #5  0x00005555559540b2 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/pedro/gdb/binutils-gdb/src/gdb/inf-loop.c:42
>  #6  0x000055555597e825 in infrun_async_inferior_event_handler (data=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:9162
>  #7  0x0000555555687d1d in check_async_event_handlers () at /home/pedro/gdb/binutils-gdb/src/gdb/async-event.c:328
>  #8  0x0000555555e48284 in gdb_do_one_event () at /home/pedro/gdb/binutils-gdb/src/gdbsupport/event-loop.cc:216
>  #9  0x00005555559e7512 in start_event_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:347
>  #10 0x00005555559e765d in captured_command_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:407
>  #11 0x00005555559e8f80 in captured_main (data=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1239
>  #12 0x00005555559e8ff2 in gdb_main (args=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1254
>  #13 0x0000555555627c86 in main (argc=12, argv=0x7fffffffdc88) at /home/pedro/gdb/binutils-gdb/src/gdb/gdb.c:32
> 
> The problem is that the remote sends stops for all the threads:
> 
>  Packet received: l/home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/attach-non-stop/attach-non-stop
>  Sending packet: $vStopped#55...Packet received: T0006:f06e25edec7f0000;07:f06e25edec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd2f;core:15;
>  Sending packet: $vStopped#55...Packet received: T0006:f0dea5f0ec7f0000;07:f0dea5f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd27;core:4;
>  Sending packet: $vStopped#55...Packet received: T0006:f0ee25f1ec7f0000;07:f0ee25f1ec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd26;core:5;
>  Sending packet: $vStopped#55...Packet received: T0006:f0bea5efec7f0000;07:f0bea5efec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd29;core:1;
>  Sending packet: $vStopped#55...Packet received: T0006:f0ce25f0ec7f0000;07:f0ce25f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd28;core:a;
>  Sending packet: $vStopped#55...Packet received: T0006:f07ea5edec7f0000;07:f07ea5edec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2e;core:f;
>  Sending packet: $vStopped#55...Packet received: T0006:f0ae25efec7f0000;07:f0ae25efec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd2a;core:6;
>  Sending packet: $vStopped#55...Packet received: T0006:0000000000000000;07:c0e8a381fe7f0000;10:bf43b4f1ec7f0000;thread:p12fd22.12fd22;core:2;
>  Sending packet: $vStopped#55...Packet received: T0006:f0fea5f1ec7f0000;07:f0fea5f1ec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd25;core:8;
>  Sending packet: $vStopped#55...Packet received: T0006:f09ea5eeec7f0000;07:f09ea5eeec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2b;core:b;
>  Sending packet: $vStopped#55...Packet received: OK
> 
> But then wait_one never consumes them, always hitting this path:
> 
>  4473          if (nfds == 0)
>  4474            {
>  4475              /* No waitable targets left.  All must be stopped.  */
>  4476              return {NULL, minus_one_ptid, {TARGET_WAITKIND_NO_RESUMED}};
>  4477            }
> 
> Resulting in GDB constanly calling target_stop to stop threads, but
> the remote target never reporting back the stops to infrun.
> 
> That TARGET_WAITKIND_NO_RESUMED path shown above is always taken
> because here, in wait_one too, just above:
> 
>  4428          for (inferior *inf : all_inferiors ())
>  4429            {
>  4430              process_stratum_target *target = inf->process_target ();
>  4431              if (target == NULL
>  4432                  || !target->is_async_p ()
>                            ^^^^^^^^^^^^^^^^^^^^^
>  4433                  || !target->threads_executing)
>  4434                continue;
> 
> ... the remote target is not async.
> 
> And in turn that happened because extended_remote_target::attach
> misses enabling async in the target-non-stop path.
> 
> A testcase exercising this will be added in a following patch.
> 
> gdb/ChangeLog:
> 
> 	* remote.c (extended_remote_target::attach): Set target async in
> 	the target-non-stop path too.
> ---
>  gdb/remote.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 6dacc24307e..2f15c4c1f32 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -5968,7 +5968,12 @@ extended_remote_target::attach (const char *args, int from_tty)
>  	}
>      }
>    else
> -    gdb_assert (wait_status == NULL);
> +    {
> +      gdb_assert (wait_status == NULL);
> +
> +      gdb_assert (target_can_async_p ());
> +      target_async (1);
> +    }
>  }
>  
>  /* Implementation of the to_post_attach method.  */
> 

LGTM.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/12] Testcase for attaching in non-stop mode
  2021-01-13  1:15 ` [PATCH 03/12] Testcase for attaching in non-stop mode Pedro Alves
@ 2021-01-13  5:09   ` Simon Marchi
  2021-02-03  1:23     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  5:09 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> diff --git a/gdb/testsuite/gdb.threads/attach-non-stop.exp b/gdb/testsuite/gdb.threads/attach-non-stop.exp
> new file mode 100644
> index 00000000000..96b95b764d7
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/attach-non-stop.exp
> @@ -0,0 +1,149 @@
> +# Copyright 2021 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test attaching to a multi-threaded process, in all combinations of:
> +#
> +#  - set non-stop on/off
> +#  - maint target non-stop off/on
> +#  - "attach" vs "attach &"
> +
> +if {![can_spawn_for_attach]} {
> +    return 0
> +}
> +
> +standard_testfile
> +
> +# The test proper.  See description above.
> +
> +proc test {target_non_stop non_stop cmd} {
> +    global binfile srcfile
> +    global gdb_prompt
> +    global decimal
> +    global GDBFLAGS
> +
> +    # Number of threads started by the program.
> +    set n_threads 10
> +
> +    save_vars { GDBFLAGS } {
> +	append GDBFLAGS " -ex \"maint set target-non-stop $target_non_stop\""
> +	append GDBFLAGS " -ex \"set non-stop $non_stop\""
> +	clean_restart $binfile
> +    }
> +
> +    set test_spawn_id [spawn_wait_for_attach $binfile]
> +    set testpid [spawn_id_get_pid $test_spawn_id]
> +
> +    set attached 0
> +    set test "attach"
> +    set any "\[^\r\n\]*"
> +
> +    if {$cmd == "attach"} {
> +	gdb_test_multiple "attach $testpid" $test {
> +	    -re "Attaching to program:${any}process $testpid\r\n.*$gdb_prompt " {
> +		pass $test
> +		set attached 1
> +	    }
> +	}
> +
> +	if {!$attached} {
> +	    kill_wait_spawned_process $test_spawn_id
> +	    return
> +	}
> +
> +	if {$non_stop} {
> +	    # In non-stop, we will see one stop per thread after
> +	    # the prompt.
> +	    set stops 0
> +	    set test "seen all stops"
> +	    for {set thread 1} { $thread <= $n_threads } { incr thread } {
> +		gdb_test_multiple "" $test {
> +		    -re "Thread $::decimal ${any} stopped" {
> +			incr stops
> +		    }
> +		}
> +	    }
> +
> +	    # This we haven't seen all stops, then the

This sentence is weird.

Otherwise, the patch LGTM.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/12] Fix a couple vStopped pending ack bugs
  2021-01-13  1:15 ` [PATCH 04/12] Fix a couple vStopped pending ack bugs Pedro Alves
@ 2021-01-13  5:29   ` Simon Marchi
  2021-02-03  1:25     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  5:29 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> A following patch will add a testcase that has two processes with
> threads stepping over a breakpoint continuously, and then detaches
> from one of the processes while threads are running.  The other
> process continues stepping over its breakpoint.  And then the testcase
> sends a SIGUSR1, expecting that GDB reports it.  That would sometimes
> hang against gdbserver, due to the bugs fixed here.  Both bugs are
> related, in that they're about remote protocol asynchronous Stop
> notifications.  There's a bug in GDB, and another in GDBserver.
> 
> The GDB bug:
> 
> - when we detach from a process, the remote target discards any
>   pending RSP notification related to that process, including the
>   in-flight, yet-unacked notification.  Discarding the in-flight
>   notification is the problem.  Until the in-flight notification is
>   acked with a vStopped packet, the server won't send another %Stop
>   notification.  As a result, the debug session gets messed up.  In
>   the new testcase's case, GDB would hang inside stop_all_threads,
>   waiting for a stop for one of the process'es threads, which never
>   arrived -- its stop reply was permanently stuck in the stop reply
>   queue, waiting for a vStopped packet that never arrived.

Just to be sure I understand does that sum it up?

1. GDBserver sends stop notification about thread X, the remote target
   receives it and stores it
2. At the same time, GDB detaches thread X's inferior
3. The remote target discards the received stop notification
4. GDBserver waits forever for the ack


> The GDBserver bug:
> 
>   GDBserver has the opposite bug.  It also discards notifications for
>   the process being detached.  If that discards the head of the
>   notification queue, when gdb sends an ack, it ends up acking the
>   _next_ notification.  Meaning, gdb loses one notification.  In the
>   testcase, this results in a similar hang in stop_all_threads.
> 
> So we have two very similar bugs in GDB and GDBserver, both resulting
> in a similar symptom.  That's why I'm fixing them both at the same
> time.
> 
> gdb/ChangeLog:
> 
> 	* remote.c (remote_notif_stop_ack): Don't error out on
> 	TARGET_WAITKIND_IGNORE; instead, just ignore the notification.
> 	(remote_target::discard_pending_stop_replies): Don't delete
> 	in-flight notification; instead, clear its contents.
> 
> gdbserver/ChangeLog:
> 
> 	* server.cc (discard_queued_stop_replies): Don't ever discard the
> 	notification at the head of the list.
> ---
>  gdb/remote.c        | 22 +++++++++++++---------
>  gdbserver/server.cc |  9 +++++++++
>  2 files changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 2f15c4c1f32..7199e0ac422 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -6935,13 +6935,11 @@ remote_notif_stop_ack (remote_target *remote,
>    /* acknowledge */
>    putpkt (remote, self->ack_command);
>  
> -  if (stop_reply->ws.kind == TARGET_WAITKIND_IGNORE)
> -    {
> -      /* We got an unknown stop reply.  */
> -      error (_("Unknown stop reply"));
> -    }
> -
> -  remote->push_stop_reply (stop_reply);
> +  /* Kind can be TARGET_WAITKIND_IGNORE if we have meanwhile discarded
> +     the notification.  It was left in the queue because we need to
> +     acknowledge it and pull the rest of the notifications out.  */
> +  if (stop_reply->ws.kind != TARGET_WAITKIND_IGNORE)
> +    remote->push_stop_reply (stop_reply);
>  }
>  
>  static int
> @@ -7110,8 +7108,14 @@ remote_target::discard_pending_stop_replies (struct inferior *inf)
>    /* Discard the in-flight notification.  */
>    if (reply != NULL && reply->ptid.pid () == inf->pid)
>      {
> -      delete reply;
> -      rns->pending_event[notif_client_stop.id] = NULL;
> +      /* Leave the notification pending, since the server expects that
> +	 we acknowledge it with vStopped.  But clear its contents, so
> +	 that later on when we acknowledge it, we also discard it.  */
> +      reply->ws.kind = TARGET_WAITKIND_IGNORE;
> +
> +      if (remote_debug)
> +	fprintf_unfiltered (gdb_stdlog,
> +			    "discarded in-flight notification\n");
>      }
>  
>    /* Discard the stop replies we have already pulled with
> diff --git a/gdbserver/server.cc b/gdbserver/server.cc
> index 77e89fe6ed0..a5497e93cee 100644
> --- a/gdbserver/server.cc
> +++ b/gdbserver/server.cc
> @@ -203,6 +203,15 @@ discard_queued_stop_replies (ptid_t ptid)
>        next = iter;
>        ++next;
>  
> +      if (iter == notif_stop.queue.begin ())
> +	{
> +	  /* The head of the list contains the notification that was
> +	     already sent to GDB.  So we can't remove it, otherwise
> +	     when GDB sends the vStopped, it would ack the _next_
> +	     notification, which hadn't been sent yet!  */
> +	  continue;
> +	}
> +

Since the first item in the list has a special status (it is the in
flight one), perhaps having a separate field in notif_server for it
would make things clearer.

I mean having an additional

  notif_event *in_flight;

field in there.  When sending a notification, it gets dequeu-ed from
the queue and put there.

Either way, this LGTM.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress
  2021-01-13  1:15 ` [PATCH 05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress Pedro Alves
@ 2021-01-13  6:00   ` Simon Marchi
  2021-02-03  1:26     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  6:00 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> A following patch will add a new testcase that has two processes, each
> with a number of threads constantly tripping a breakpoint and stepping
> over it, because the breakpoint has a condition that evals false.
> Then GDB detaches from one of the processes, while both processes are
> running.  And then the testcase sends a SIGUSR1 to the other process.
> 
> When run against gdbserver, that would occasionaly fail like this:
> 
>  (gdb) PASS: gdb.threads/detach-step-over.exp: iter 1: detach
>  Executing on target: kill -SIGUSR1 208303    (timeout = 300)
>  spawn -ignore SIGHUP kill -SIGUSR1 208303
> 
>  Thread 2.5 "detach-step-ove" received signal SIGTRAP, Trace/breakpoint trap.
>  [Switching to Thread 208303.208305]
>  0x000055555555522a in thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54
>  54            counter++; /* Set breakpoint here.  */
> 
> Note that it's gdbserver itself that steps over the breakpoint.
> 
> The gdbserver logs reveal what happened:
> 
>  - GDB manages to detach while a step over is in progress.  That reaches
>    linux_process_target::complete_ongoing_step_over(), which does:
> 
>       /* Passing NULL_PTID as filter indicates we want all events to
> 	 be left pending.  Eventually this returns when there are no
> 	 unwaited-for children left.  */
>       ret = wait_for_event_filtered (minus_one_ptid, null_ptid, &wstat,
> 				     __WALL);
> 
>    As the comment say, this leaves all events pending, _including_ the
>    just finished step SIGTRAP.  We never discard that SIGTRAP.  So
>    GDBserver reports the SIGTRAP to GDB.  GDB can't explain the
>    SIGTRAP, so it reports it to the user.
> 
> The gdbserver log looks like this.  The LWP of interest is 208305:
> 
>  Need step over [LWP 208305]? yes, found breakpoint at 0x555555555227
>  proceed_all_lwps: found thread 208305 needing a step-over
>  Starting step-over on LWP 208305.  Stopping all threads
> 
> 208305 starts a step-over.
> 
>  >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
>  stop_all_lwps (stop-and-suspend, except=LWP 208303.208305)
>  Sending sigstop to lwp 208303
>  Sending sigstop to lwp 207755
>  wait_for_sigstop: pulling events
>  LWFE: waitpid(-1, ...) returned 207755, ERRNO-OK
>  LLW: waitpid 207755 received Stopped (signal) (stopped)
>  pc is 0x7f7e045593bf
>  Expected stop.
>  LLW: SIGSTOP caught for LWP 207755.207755 while stopping threads.
>  LWFE: waitpid(-1, ...) returned 208303, ERRNO-OK
>  LLW: waitpid 208303 received Stopped (signal) (stopped)
>  pc is 0x7ffff7e743bf
>  Expected stop.
>  LLW: SIGSTOP caught for LWP 208303.208303 while stopping threads.
>  LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
>  leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
>  leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
>  LLW: exit (no unwaited-for LWP)
>  stop_all_lwps done, setting stopping_threads back to !stopping
>  <<<< exiting void linux_process_target::stop_all_lwps(int, lwp_info*)
>  Done stopping all threads for step-over.
>  pc is 0x555555555227
>  Writing 8b to 0x555555555227 in process 208305
>  Could not findsigchld_handler
>   fast tracepoint jump at 0x555555555227 in list (uninserting).
>    pending reinsert at 0x555555555227
>    step from pc 0x555555555227
>  Resuming lwp 208305 (step, signal 0, stop expected)
>  <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
>  handling possible serial event
>  getpkt ("D;32b8b");  [no ack sent]
> 
> The detach request arrives.
> 
>  sigchld_handler
>  Tracing is already off, ignoring
>  detach: step over in progress, finish it first
> 
> gdbserver realizes a step over for 208305 was in progress, let's it
> finish.
> 
>  LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
>  LLW: waitpid 208305 received Stopped (signal) (stopped)
>  pc is 0x555555555227
>  Expected stop.
>  LLW: step LWP 208303.208305, 0, 0 (discard delayed SIGSTOP)
>    pending reinsert at 0x555555555227
>    step from pc 0x555555555227
>  Resuming lwp 208305 (step, signal 0, stop not expected)
>  LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
>  leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
>  leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
>  sigsuspend'ing
>  LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
>  LLW: waitpid 208305 received Trace/breakpoint trap (stopped)
>  pc is 0x55555555522a
>  CSBB: LWP 208303.208305 stopped by trace
>  LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
>  leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
>  leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
>  LLW: exit (no unwaited-for LWP)
>  Finished step over.
> 
> The step-over for 208305 finishes.
> 
>  Writing cc to 0x555555555227 in process 208305
>  Could not find fast tracepoint jump at 0x555555555227 in list (reinserting).
>  >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
>  stop_all_lwps (stop, except=none)
>  wait_for_sigstop: pulling events
> 
> The detach proceeds (snipped).
> 
> ...
> 
>  proceed_one_lwp: lwp 208305
>     LWP 208305 has pending status, leaving stopped
> 
> Later on, 208305 has a pending status (the step SIGTRAP from the
> step-over), so GDBserver starts the process of reporting it.
> 
> ...
> 
>  wait_1 ret = LWP 208303.208305, 1, 5
>  <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
> 
> ...
> 
> and eventually GDB receives the stop notification (T05 == SIGTRAP):
> 
>  getpkt ("vStopped");  [no ack sent]
>  sigchld_handler
>  vStopped: acking 3
>  Writing resume reply for LWP 208303.208305:1
>  putpkt ("$T0506:f0ee58f7ff7f0* ;07:f0ee58f7ff7f0* ;10:2a525*"550* ;thread:p32daf.32db1;core:c;#37"); [noack mode]
> 
> From the GDB side, we see:
> 
>  [infrun] fetch_inferior_event: enter
>    [infrun] fetch_inferior_event: fetch_inferior_event enter
>    [infrun] do_target_wait: Found 2 inferiors, starting at #1
>    [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>    [infrun] print_target_wait_results:   208303.208305.0 [Thread 208303.208305],
>    [infrun] print_target_wait_results:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
>    [infrun] handle_inferior_event: status->kind = stopped, signal = GDB_SIGNAL_TRAP
>    [infrun] start_step_over: enter
>      [infrun] start_step_over: stealing global queue of threads to step, length = 6
>      [infrun] operator(): putting back 6 threads to step in global queue
>    [infrun] start_step_over: exit
>    [infrun] handle_signal_stop: context switch
>    [infrun] context_switch: Switching context from process 0 to Thread 208303.208305
>    [infrun] handle_signal_stop: stop_pc=0x55555555522a
>    [infrun] handle_signal_stop: random signal (GDB_SIGNAL_TRAP)
>    [infrun] stop_waiting: stop_waiting
>    [infrun] stop_all_threads: starting
> 
> The fix is to discard the step SIGTRAP, unless GDB wanted the thread
> to step.

Maybe the only thing that wasn't clear from the start of the
explanation is that GDBserver is doing a step-over for process A
when a detach request for process B arrives.  And that generates
a spurious SIGTRAP report for process A.  It wasn't clear which
process was doing doing what.  But otherwise, make sense.

The fix LGTM, although I'm not much used to the GDBserver internals.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/12] Factor out after-stop event handling code from stop_all_threads
  2021-01-13  1:15 ` [PATCH 06/12] Factor out after-stop event handling code from stop_all_threads Pedro Alves
@ 2021-01-13  6:06   ` Simon Marchi
  2021-02-03  1:26     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  6:06 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> This moves the code handling an event out of wait_one to a separate
> function, to be used in another context in a following patch.
> 
> gdb/ChangeLog:
> 
> 	* infrun.c (handle_one): New function, factored out from ...
> 	(stop_all_threads): ... here.

I think this is a good cleanup on its own, that function was starting to
get crowded.  Since that removes a few indentation levels to the code, it
would be nice to un-wrap it a little bit, that would help readability.

Could you document the meaning of the return value of handle_one in its
top comment?

Otherwise, that LGTM.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 07/12] prepare_for_detach: don't release scoped_restore at the end
  2021-01-13  1:15 ` [PATCH 07/12] prepare_for_detach: don't release scoped_restore at the end Pedro Alves
@ 2021-01-13  6:08   ` Simon Marchi
  2021-02-03  1:27     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  6:08 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches



On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> After detaching from a process, the inf->detaching flag is
> inadvertently left set to true.  If you afterwards reuse the same
> inferior to start a new process, GDB will mishave...
> 
> The problem is that prepare_for_detach discards the scoped_restore at
> the end, while the intention is for the flag to be set only for the
> duration of prepare_for_detach.
> 
> This was already a bug in the original commit that added
> prepare_for_detach, commit 24291992dac3 ("PR gdb/11321"), by yours
> truly.  Back then, we still used cleanups, and the function called
> discard_cleanups instead of do_cleanups, by mistake.
> 
> gdb/ChangeLog:
> 
> 	* infrun.c (prepare_for_detach): Don't release scoped_restore at
> 	the end.
> ---
>  gdb/infrun.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index b62e74d3dab..fc7ba745737 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -3612,8 +3612,6 @@ prepare_for_detach (void)
>  	  error (_("Program exited while detaching"));
>  	}
>      }
> -
> -  restore_detaching.release ();

There's another "restore_detaching.release ()" a few lines above, that
one should stay?

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 08/12] prepare_for_detach and ongoing displaced stepping
  2021-01-13  1:15 ` [PATCH 08/12] prepare_for_detach and ongoing displaced stepping Pedro Alves
@ 2021-01-13  6:23   ` Simon Marchi
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  6:23 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> I noticed that "detach" while a program was running sometimes resulted
> in the process crashing.  I tracked it down to this change to
> prepare_for_detach in commit 187b041e ("gdb: move displaced stepping
> logic to gdbarch, allow starting concurrent displaced steps"):
> 
>     /* Is any thread of this process displaced stepping?  If not,
>        there's nothing else to do.  */
>  -  if (displaced->step_thread == nullptr)
>  +  if (displaced_step_in_progress (inf))
>       return;
> 
> The problem above is that the condition was inadvertently flipped.  It
> should have been:
> 
>    if (!displaced_step_in_progress (inf))
> 
> So I fixed it, and wrote a testcase to exercise it.  The testcase has
> a number of threads constantly stepping over a breakpoint, and then
> GDB detaches the process, while threads are running and stepping over
> the breakpoint.  And then I was surprised that my testcase would hang
> -- GDB would get stuck in an infinite loop in prepare_for_detach,
> here:
> 
>   while (displaced_step_in_progress (inf))
>     {
>       ...
> 
> What is going on is that since we now have two displaced stepping
> buffers, as one displaced step finishes, GDB starts another, and
> there's another one already in progress, and on and on, so the
> displaced_step_in_progress condition never turns false.  This happens
> because we go via the whole handle_inferior_event, which tries to
> start new step overs when one finishes.  And also because while we
> remove breakpoints from the target before prepare_for_detach is
> called, handle_inferior_event ends up calling insert_breakpoints via
> e.g. keep_going.
> 
> Thinking through all this, I came to the conclusion that going through
> the whole handle_inferior_event isn't ideal.  A _lot_ is done by that
> function, e.g., some thread may get a signal which is passed to the
> inferior, and gdb decides to try to get over the signal handler, which
> reinstalls breakpoints.  Or some process may exit.  We can end up
> reporting these events via normal_stop while detaching, maybe end up
> running some breakpoint commands, or maybe even something runs an
> inferior function call.  Etc.  All this after the user has already
> declared they don't want to debug the process anymore, by asking to
> detach.
> 
> I came to the conclusion that it's better to do the minimal amount of
> work possible, in a more controlled fashion, without going through
> handle_inferior_event.  So in the new approach implemented by this
> patch, if there are threads of the inferior that we're detaching in
> the middle of a displaced step, stop them, and cancel the displaced
> step.  This is basically what stop_all_threads already does, via
> wait_one and (the now factored out) handle_one, so I'm reusing those.
> 
> gdb/ChangeLog:
> 
> 	* infrun.c (struct wait_one_event): Move higher up.
> 	(prepare_for_detach): Abort in-progress displaced steps instead of
> 	letting them complete.
> 	(handle_one): If the inferior is detaching, don't add the thread
> 	back to the global step-over chain.
> 	(restart_threads): Don't restart threads if detaching.
> 	(handle_signal_stop): Remove inferior::detaching reference.
> ---
>  gdb/infrun.c | 123 +++++++++++++++++++++++++++------------------------
>  1 file changed, 65 insertions(+), 58 deletions(-)
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index fc7ba745737..12e1564c090 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -3551,6 +3551,22 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
>    return false;
>  }
>  
> +/* An event reported by wait_one.  */
> +
> +struct wait_one_event
> +{
> +  /* The target the event came out of.  */
> +  process_stratum_target *target;
> +
> +  /* The PTID the event was for.  */
> +  ptid_t ptid;
> +
> +  /* The waitstatus.  */
> +  target_waitstatus ws;
> +};
> +
> +static bool handle_one (const wait_one_event &event);
> +
>  /* Prepare and stabilize the inferior for detaching it.  E.g.,
>     detaching while a thread is displaced stepping is a recipe for
>     crashing it, as nothing would readjust the PC out of the scratch
> @@ -3561,56 +3577,60 @@ prepare_for_detach (void)
>  {
>    struct inferior *inf = current_inferior ();
>    ptid_t pid_ptid = ptid_t (inf->pid);
> -
> -  /* Is any thread of this process displaced stepping?  If not,
> -     there's nothing else to do.  */
> -  if (displaced_step_in_progress (inf))
> -    return;
> -
> -  infrun_debug_printf ("displaced-stepping in-process while detaching");
> +  scoped_restore_current_thread restore_thread;
>  
>    scoped_restore restore_detaching = make_scoped_restore (&inf->detaching, true);
>  
> -  while (displaced_step_in_progress (inf))
> +  /* Remove all threads of INF from the global step-over chain.  We
> +     want to stop any ongoing step-over, not start any new one.  */
> +  thread_info *next;
> +  for (thread_info *tp = global_thread_step_over_chain_head;
> +       tp != nullptr;
> +       tp = next)
>      {
> -      struct execution_control_state ecss;
> -      struct execution_control_state *ecs;
> -
> -      ecs = &ecss;
> -      memset (ecs, 0, sizeof (*ecs));
> +      next = global_thread_step_over_chain_next (tp);
> +      if (tp->inf == inf)
> +	global_thread_step_over_chain_remove (tp);
> +    }
>  
> -      overlay_cache_invalid = 1;
> -      /* Flush target cache before starting to handle each event.
> -	 Target was running and cache could be stale.  This is just a
> -	 heuristic.  Running threads may modify target memory, but we
> -	 don't get any event.  */
> -      target_dcache_invalidate ();
> +  if (displaced_step_in_progress (inf))
> +    {
> +      infrun_debug_printf ("displaced-stepping in-process while detaching");
>  
> -      do_target_wait (pid_ptid, ecs, 0);
> +      /* Stop threads currently displaced stepping, aborting it.  */
>  
> -      if (debug_infrun)
> -	print_target_wait_results (pid_ptid, ecs->ptid, &ecs->ws);
> +      for (thread_info *thr : inf->non_exited_threads ())
> +	{
> +	  if (thr->displaced_step_state.in_progress ())
> +	    {
> +	      if (thr->executing)
> +		{
> +		  if (!thr->stop_requested)
> +		    {
> +		      target_stop (thr->ptid);
> +		      thr->stop_requested = true;
> +		    }
> +		}
> +	      else
> +		thr->resumed = false;
> +	    }
> +	}
>  
> -      /* If an error happens while handling the event, propagate GDB's
> -	 knowledge of the executing state to the frontend/user running
> -	 state.  */
> -      scoped_finish_thread_state finish_state (inf->process_target (),
> -					       minus_one_ptid);
> +      while (displaced_step_in_progress (inf))
> +	{
> +	  wait_one_event event;
>  
> -      /* Now figure out what to do with the result of the result.  */
> -      handle_inferior_event (ecs);
> +	  event.target = inf->process_target ();
> +	  event.ptid = do_target_wait_1 (inf, pid_ptid, &event.ws, 0);
>  
> -      /* No error, don't finish the state yet.  */
> -      finish_state.release ();
> +	  if (debug_infrun)
> +	    print_target_wait_results (pid_ptid, event.ptid, &event.ws);
>  
> -      /* Breakpoints and watchpoints are not installed on the target
> -	 at this point, and signals are passed directly to the
> -	 inferior, so this must mean the process is gone.  */
> -      if (!ecs->wait_some_more)
> -	{
> -	  restore_detaching.release ();
> -	  error (_("Program exited while detaching"));
> +	  handle_one (event);
>  	}
> +
> +      /* It's OK to leave some of the threads of INF stopped, since
> +	 they'll be detached shortly.  */
>      }
>  }

Thanks, that all LGTM.  I thought that maybe we could add a last gdb_assert
to check that there is no thread of the inferior in the global step over
chain at the end, to make sure that while handling the events we haven't
enqueued one.  But that's not so important.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/12] detach and breakpoint removal
  2021-01-13  1:15 ` [PATCH 09/12] detach and breakpoint removal Pedro Alves
@ 2021-01-13  6:32   ` Simon Marchi
  2021-02-03  1:28     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  6:32 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches



On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> A following patch will add a testcase that has a number of threads
> constantly stepping over a breakpoint, and then has GDB detach the
> process.  That testcase sometimes fails with the inferior crashing
> with SIGTRAP after the detach because of the bug fixed by this patch,
> when tested with the native target.
> 
> The problem is that target_detach removes breakpoints from the target
> immediately, and that does not work with the native GNU/Linux target
> (and probably no other native target) currently.  The test wouldn't
> fail with this issue when testing against gdbserver, because gdbserver
> does allow accessing memory while the current thread is running, by
> transparently pausing all threads temporarily, without GDB noticing.
> Implementing that in gdbserver was a lot of work, so I'm not looking
> forward right now to do the same in the native target.  Instead, I
> came up with a simpler solution -- push the breakpoints removal down
> to the targets.  The Linux target conveniently already pauses all
> threads before detaching them, since PTRACE_DETACH only works with
> stopped threads, so we move removing breakpoints to after that.  Only
> the remote and GNU/Linux targets support support async execution, so
> no other target should really need this.

I think that makes sense.

I initially thought that GDB could always stop all threads, remove
breakpoints, and then ask the target to detach.  But then that's
unfair for the targets that can remove the breakpoints while all
threads are running.  These targets would pay a price (a window of
time where threads are paused when detaching) they don't need to pay.
So leaving the task of removing breakpoints to the target makes sense.

I think it would be good to mention this in the doc of
target_ops::detach (which doesn't exist yet), that the target is
responsible for removing the breakpoints from the inferior.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/12] detach with in-line step over in progress
  2021-01-13  1:15 ` [PATCH 10/12] detach with in-line step over in progress Pedro Alves
@ 2021-01-13  6:39   ` Simon Marchi
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  6:39 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches



On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> A following patch will add a testcase that has a number of threads
> constantly stepping over a breakpoint, and then has GDB detach the
> process.  That testcase exercises both "set displaced-stepping
> on/off".  Testing with "set displaced-stepping off" reveals that GDB
> does not handle the case of the user typing "detach" just while some
> thread is in the middle of an in-line step over.  If that thread
> belongs to the inferior that is being detached, then the step-over
> never finishes, and threads of other inferiors are never re-resumed.
> This fixes it.

LGTM, thanks.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/12] detach in all-stop with threads running
  2021-01-13  1:15 ` [PATCH 11/12] detach in all-stop with threads running Pedro Alves
@ 2021-01-13  6:55   ` Simon Marchi
  2021-02-03  1:31     ` Pedro Alves
  2021-07-12 15:36   ` Jonah Graham
  1 sibling, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  6:55 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches



On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> A following patch will add a testcase that has a number of threads
> constantly stepping over a breakpoint, and then has GDB detach the
> process, while threads are running.  If we have more then one inferior

"more than"

> running, and we detach from just one of the inferiors, we expect that
> the remaining inferior continues running.  However, in all-stop, if
> GDB needs to pause the target for the detach, nothing is re-resuming
> the other inferiors after the detach.  "info threads" shows the
> threads as running, but they really aren't.  This fixes it.
> 
> gdb/ChangeLog:
> 
> 	* infcmd.c (detach_command): Hold strong reference to target, and
> 	if all-stop on entry, restart threads on exit.
> 	* infrun.c (switch_back_to_stepped_thread): Factor out bits to ...
> 	(restart_stepped_thread): ... this new function.  Also handle
> 	trap_expected.
> 	(restart_after_all_stop_detach): New function.
> 	* infrun.h (restart_after_all_stop_detach): Declare.
> ---
>  gdb/infcmd.c |  13 ++++
>  gdb/infrun.c | 163 +++++++++++++++++++++++++++++++++++----------------
>  gdb/infrun.h |   4 ++
>  3 files changed, 128 insertions(+), 52 deletions(-)
> 
> diff --git a/gdb/infcmd.c b/gdb/infcmd.c
> index 6f0ed952de6..ebaf57592ef 100644
> --- a/gdb/infcmd.c
> +++ b/gdb/infcmd.c
> @@ -2750,6 +2750,16 @@ detach_command (const char *args, int from_tty)
>  
>    disconnect_tracing ();
>  
> +  /* Hold a strong reference to the target while (maybe)
> +     detaching the parent.  Otherwise detaching could close the
> +     target.  */
> +  auto target_ref
> +    = target_ops_ref::new_reference (current_inferior ()->process_target ());
> +
> +  /* Save this before detaching, since detaching may unpush the
> +     process_stratum target.  */
> +  bool was_non_stop_p = target_is_non_stop_p ();
> +
>    target_detach (current_inferior (), from_tty);
>  
>    /* The current inferior process was just detached successfully.  Get
> @@ -2766,6 +2776,9 @@ detach_command (const char *args, int from_tty)
>  
>    if (deprecated_detach_hook)
>      deprecated_detach_hook ();
> +
> +  if (!was_non_stop_p)
> +    restart_after_all_stop_detach (as_process_stratum_target (target_ref.get ()));
>  }
>  
>  /* Disconnect from the current target without resuming it (leaving it
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 4a17410fcd6..4f1a16d2c60 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -7098,6 +7098,9 @@ process_event_stop_test (struct execution_control_state *ecs)
>    keep_going (ecs);
>  }
>  
> +static bool restart_stepped_thread (process_stratum_target *resume_target,
> +				    ptid_t resume_ptid);
> +
>  /* In all-stop mode, if we're currently stepping but have stopped in
>     some other thread, we may need to switch back to the stepped
>     thread.  Returns true we set the inferior running, false if we left
> @@ -7108,8 +7111,6 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
>  {
>    if (!target_is_non_stop_p ())
>      {
> -      struct thread_info *stepping_thread;
> -
>        /* If any thread is blocked on some internal breakpoint, and we
>  	 simply need to step over that breakpoint to get it going
>  	 again, do that first.  */
> @@ -7172,78 +7173,136 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
>        if (!signal_program[ecs->event_thread->suspend.stop_signal])
>  	ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
>  
> -      /* Do all pending step-overs before actually proceeding with
> -	 step/next/etc.  */
> -      if (start_step_over ())
> +      if (restart_stepped_thread (ecs->target, ecs->ptid))
>  	{
>  	  prepare_to_wait (ecs);
>  	  return true;
>  	}
>  
> -      /* Look for the stepping/nexting thread.  */
> -      stepping_thread = NULL;
> +      switch_to_thread (ecs->event_thread);
> +    }
>  
> -      for (thread_info *tp : all_non_exited_threads ())
> -	{
> -	  switch_to_thread_no_regs (tp);
> +  return false;
> +}
>  
> -	  /* Ignore threads of processes the caller is not
> -	     resuming.  */
> -	  if (!sched_multi
> -	      && (tp->inf->process_target () != ecs->target
> -		  || tp->inf->pid != ecs->ptid.pid ()))
> -	    continue;
> +/* Look for the thread that was stepping, and resume it.
> +   RESUME_TARGET / RESUME_PTID indicate the set of threads the caller
> +   is resuming.  Return true if a thread was started, false
> +   otherwise.  */
>  
> -	  /* When stepping over a breakpoint, we lock all threads
> -	     except the one that needs to move past the breakpoint.
> -	     If a non-event thread has this set, the "incomplete
> -	     step-over" check above should have caught it earlier.  */
> -	  if (tp->control.trap_expected)
> -	    {
> -	      internal_error (__FILE__, __LINE__,
> -			      "[%s] has inconsistent state: "
> -			      "trap_expected=%d\n",
> -			      target_pid_to_str (tp->ptid).c_str (),
> -			      tp->control.trap_expected);
> -	    }
> +static bool
> +restart_stepped_thread (process_stratum_target *resume_target,
> +			ptid_t resume_ptid)
> +{
> +  /* Do all pending step-overs before actually proceeding with
> +     step/next/etc.  */
> +  if (start_step_over ())
> +    return true;
>  
> -	  /* Did we find the stepping thread?  */
> -	  if (tp->control.step_range_end)
> -	    {
> -	      /* Yep.  There should only one though.  */
> -	      gdb_assert (stepping_thread == NULL);
> +  for (thread_info *tp : all_threads_safe ())
> +    {
> +      if (tp->state == THREAD_EXITED)
> +	continue;
> +
> +      if (tp->suspend.waitstatus_pending_p)
> +	continue;
>  
> -	      /* The event thread is handled at the top, before we
> -		 enter this loop.  */
> -	      gdb_assert (tp != ecs->event_thread);
> +      /* Ignore threads of processes the caller is not
> +	 resuming.  */
> +      if (!sched_multi
> +	  && (tp->inf->process_target () != resume_target
> +	      || tp->inf->pid != resume_ptid.pid ()))
> +	continue;
>  
> -	      /* If some thread other than the event thread is
> -		 stepping, then scheduler locking can't be in effect,
> -		 otherwise we wouldn't have resumed the current event
> -		 thread in the first place.  */
> -	      gdb_assert (!schedlock_applies (tp));
> +      if (tp->control.trap_expected)
> +	{
> +	  infrun_debug_printf ("switching back to stepped thread (step-over)");
>  
> -	      stepping_thread = tp;
> -	    }
> +	  if (keep_going_stepped_thread (tp))
> +	    return true;
>  	}
> +    }
> +
> +  for (thread_info *tp : all_threads_safe ())
> +    {
> +      if (tp->state == THREAD_EXITED)
> +	continue;
> +
> +      if (tp->suspend.waitstatus_pending_p)
> +	continue;
>  
> -      if (stepping_thread != NULL)
> +      /* Ignore threads of processes the caller is not
> +	 resuming.  */
> +      if (!sched_multi
> +	  && (tp->inf->process_target () != resume_target
> +	      || tp->inf->pid != resume_ptid.pid ()))
> +	continue;
> +
> +      /* Did we find the stepping thread?  */
> +      if (tp->control.step_range_end)
>  	{
> -	  infrun_debug_printf ("switching back to stepped thread");
> +	  infrun_debug_printf ("switching back to stepped thread (stepping)");
>  
> -	  if (keep_going_stepped_thread (stepping_thread))
> -	    {
> -	      prepare_to_wait (ecs);
> -	      return true;
> -	    }
> +	  if (keep_going_stepped_thread (tp))
> +	    return true;
>  	}
> -
> -      switch_to_thread (ecs->event_thread);
>      }
>  
>    return false;
>  }
>  
> +/* See infrun.h.  */
> +
> +void
> +restart_after_all_stop_detach (process_stratum_target *proc_target)
> +{
> +  /* Note we don't check target_is_non_stop_p() here, because the
> +     current inferior may no longer have a process_stratum target
> +     pushed, as we just detached.  */
> +
> +  /* See if we have a THREAD_RUNNING thread that need to be
> +     re-resumed.  If we have any thread that is already executing,
> +     then we don't need to resume the target -- it is already been
> +     resumed.  With the remote target (in all-stop), it's even
> +     impossible to issue another resumption if the target is already
> +     resumed, until the target reports a stop.  */
> +  for (thread_info *thr : all_threads (proc_target))
> +    {
> +      if (thr->state != THREAD_RUNNING)
> +	continue;
> +
> +      /* If we have any thread that is already executing, then we
> +	 don't need to resume the target -- it is already been
> +	 resumed.  */
> +      if (thr->executing)
> +	return;
> +
> +      /* If we have a pending event to process, skip resuming the
> +	 target and go straight to processing it.  */
> +      if (thr->resumed && thr->suspend.waitstatus_pending_p)
> +	return;
> +    }

Hmm instead of that loop, could you do an early check in the function
instead, like

  if (proc_target->threads_executing)
    return;

? When a thread gets target-resumed, the thr->executing flag gets set
at the same time as the proc_target->threads_executing flag.  And when
resume_1-ing a thread with a pending status,
proc_target->threads_executing also gets set.  So maybe (just maybe)
that whole loop could be replaced with that check.

But otherwise, that LGTM.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 12/12] Testcase for detaching while stepping over breakpoint
  2021-01-13  1:15 ` [PATCH 12/12] Testcase for detaching while stepping over breakpoint Pedro Alves
@ 2021-01-13  7:05   ` Simon Marchi
  2021-02-03  1:33     ` Pedro Alves
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Marchi @ 2021-01-13  7:05 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> +# The test proper.  See description above.
> +proc test {condition_eval target_non_stop non_stop displaced} {
> +    global binfile srcfile
> +    global gdb_prompt
> +    global decimal
> +    global bp_lineno
> +    global GDBFLAGS
> +
> +    # Number of threads started by the program.
> +    set n_threads 10
> +
> +    save_vars { GDBFLAGS } {
> +	append GDBFLAGS " -ex \"maint set target-non-stop $target_non_stop\""
> +	append GDBFLAGS " -ex \"set non-stop $non_stop\""
> +	append GDBFLAGS " -ex \"set displaced $displaced\""
> +	append GDBFLAGS " -ex \"set schedule-multiple on\""
> +	clean_restart $binfile
> +    }
> +
> +    set test_spawn_id [spawn_wait_for_attach $binfile]
> +    set testpid [spawn_id_get_pid $test_spawn_id]
> +
> +    set any "\[^\r\n\]*"
> +
> +    gdb_test "add-inferior" "Added inferior 2.*"
> +    gdb_test "inferior 2" "Switching to .*"
> +
> +    gdb_load $binfile
> +    if ![runto setup_done] then {
> +	fail "can't run to setup_done"
> +	kill_wait_spawned_process $test_spawn_id
> +	return
> +    }
> +
> +    gdb_test_no_output "set breakpoint condition-evaluation $condition_eval"
> +
> +    # Get the PID of the test process.
> +    set pid_inf2 ""
> +    gdb_test_multiple "p mypid" "get pid of inferior 2" {
> +	-re " = ($decimal)\r\n$gdb_prompt $" {
> +	    set pid_inf2 $expect_out(1,string)
> +	    pass $gdb_test_name
> +	}
> +    }
> +
> +    set attempts 3
> +    for {set attempt 1} { $attempt <= $attempts } { incr attempt } {
> +	with_test_prefix "iter $attempt" {
> +	    gdb_test "inferior 1" "Switching to .*"
> +
> +	    set attached 0
> +	    set eperm 0
> +	    set test "attach"
> +	    gdb_test_multiple "attach $testpid" $test {
> +		-re "new threads in iteration" {
> +		    # Seen when "set debug libthread_db" is on.
> +		    exp_continue
> +		}
> +		-re "is a zombie - the process has already terminated.*$gdb_prompt " {
> +		    fail $gdb_test_name
> +		}
> +		-re "Unable to attach: .*$gdb_prompt " {
> +		    fail $gdb_test_name
> +		}
> +		-re "Attaching to program.*process $testpid.*$gdb_prompt " {
> +		    pass $test
> +		    set attached 1
> +		}
> +	    }
> +
> +	    if {!$attached} {
> +		kill_wait_spawned_process $test_spawn_id
> +		return
> +	    }
> +
> +	    if {$non_stop} {
> +		# In non-stop, we will see one stop per thread after
> +		# the prompt.
> +		set stops 0
> +		set tid_re "$::decimal\.$::decimal"
> +		set test "seen all stops"
> +		for {set thread 1} { $thread <= $n_threads } { incr thread } {
> +		    if {[gdb_test_multiple "" $test {
> +			-re "Thread ${tid_re} ${any} stopped" {
> +			    incr stops
> +			}
> +		    }] != 0} {
> +			break
> +		    }
> +		}
> +
> +		# This we haven't seen all stops, then the

This sentence misses something, like in the other test.

I am running it in a loop here and it's passing so far :).  Really, congrats
and thanks for this series, that looked like some hardcore debugging.

Simon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/12] Fix attaching in non-stop mode
  2021-01-13  3:11   ` Simon Marchi
@ 2021-02-03  1:21     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:21 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 03:11, Simon Marchi wrote:

> This condition (ecs->ptid != inferior_ptid) is half the condition that is
> already in context_switch.  Could we just call context_switch unconditionally
> here?  The way I understand it is that context_switch makes sure the event
> thread is the current one, so it's idempotent.  Each condition we can remove
> from this code makes it slightly easier to understand.
> 
> I checked deprecated_context_hook, it's only used in insight.  And all it does
> is set an int:
> 
>   https://sourceware.org/git/?p=insight.git;a=blob;f=gdbtk/generic/gdbtk-hooks.c;h=0aaf404995a3e08aca1e2921d455afa7c8a438ea;hb=HEAD#l783
> 
> So it won't care if we call it when the current thread hasn't actually changed,
> it's idempotent.
> 
> For fun I checked a but further, that variable is then used to drive the
> "gdb_context_id" tcl variable.  When I grep in insight, I find no use of that
> variable.  Is it maybe a variable that people could use in tcl scripting?  If
> not, it just seems that all this could be removed...

Alright, I was thinking of the patch in terms of just moving the code around,
but I don't mind doing that at the same time.  The patch as merged is below.

> 
> Anyway, that LGTM in any case.

From 2ab76a181f3db93f051aaae66d65ff2733884d96 Mon Sep 17 00:00:00 2001
From: Pedro Alves <pedro@palves.net>
Date: Wed, 23 Dec 2020 00:34:54 +0000
Subject: [PATCH 01/12] Fix attaching in non-stop mode (PR gdb/27055)

Attaching in non-stop mode currently misbehaves, like so:

 (gdb) attach 1244450
 Attaching to process 1244450
 [New LWP 1244453]
 [New LWP 1244454]
 [New LWP 1244455]
 [New LWP 1244456]
 [New LWP 1244457]
 [New LWP 1244458]
 [New LWP 1244459]
 [New LWP 1244461]
 [New LWP 1244462]
 [New LWP 1244463]
 No unwaited-for children left.

At this point, GDB's stopped/running thread state is out of sync with
the inferior:

(gdb) info threads
  Id   Target Id                     Frame
* 1    LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()
  2    LWP 1244453 "attach-non-stop" (running)
  3    LWP 1244454 "attach-non-stop" (running)
  4    LWP 1244455 "attach-non-stop" (running)
  5    LWP 1244456 "attach-non-stop" (running)
  6    LWP 1244457 "attach-non-stop" (running)
  7    LWP 1244458 "attach-non-stop" (running)
  8    LWP 1244459 "attach-non-stop" (running)
  9    LWP 1244461 "attach-non-stop" (running)
  10   LWP 1244462 "attach-non-stop" (running)
  11   LWP 1244463 "attach-non-stop" (running)
(gdb)
(gdb) interrupt -a
(gdb)
*nothing*

The problem is that attaching installs an inferior continuation,
called when the target reports the initial attach stop, here, in
inf-loop.c:inferior_event_handler:

      /* Do all continuations associated with the whole inferior (not
	 a particular thread).  */
      if (inferior_ptid != null_ptid)
	do_all_inferior_continuations (0);

However, currently in non-stop mode, inferior_ptid is still null_ptid
when we get here.

If you try to do "set debug infrun 1" to debug the problem, however,
then the attach completes correctly, with GDB reporting a stop for
each thread.

The bug is that we're missing a switch_to_thread/context_switch call
when handling the initial stop, here:

  if (stop_soon == STOP_QUIETLY_NO_SIGSTOP
      && (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))
    {
      stop_print_frame = true;
      stop_waiting (ecs);
      ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
      return;
    }

Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that does
call context_switch.

And the reason "set debug infrun 1" "fixes" it, is that the debug path
has a switch_to_thread call.

This patch fixes it by moving the main context_switch call earlier.
It also removes the:

   if (ecs->ptid != inferior_ptid)

check at the same time because:

 #1 - that is half of what context_switch already does

 #2 - deprecated_context_hook is only used in Insight, and all it does
      is set an int.  It won't care if we call it when the current
      thread hasn't actually changed.

A testcase exercising this will be added in a following patch.

gdb/ChangeLog:

	PR gdb/27055
	* infrun.c (handle_signal_stop): Move main context_switch call
	earlier, before STOP_QUIETLY_NO_SIGSTOP.
---
 gdb/ChangeLog |  6 ++++++
 gdb/infrun.c  | 20 +++++---------------
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 209834a15fc..9b3b64dcf93 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,9 @@
+2021-02-03  Pedro Alves  <pedro@palves.net>
+
+	PR gdb/27055
+	* infrun.c (handle_signal_stop): Move main context_switch call
+	earlier, before STOP_QUIETLY_NO_SIGSTOP.
+
 2021-02-02  Lancelot SIX  <lsix@lancelotsix.com>
 
 	* NEWS (Changed commands): Add entry for the behavior change of
diff --git a/gdb/infrun.c b/gdb/infrun.c
index e070eff33d7..405b907856a 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5735,13 +5735,16 @@ handle_signal_stop (struct execution_control_state *ecs)
   ecs->event_thread->suspend.stop_pc
     = regcache_read_pc (get_thread_regcache (ecs->event_thread));
 
+  context_switch (ecs);
+
+  if (deprecated_context_hook)
+    deprecated_context_hook (ecs->event_thread->global_num);
+
   if (debug_infrun)
     {
       struct regcache *regcache = get_thread_regcache (ecs->event_thread);
       struct gdbarch *reg_gdbarch = regcache->arch ();
 
-      switch_to_thread (ecs->event_thread);
-
       infrun_debug_printf ("stop_pc=%s",
 			   paddress (reg_gdbarch,
 				     ecs->event_thread->suspend.stop_pc));
@@ -5764,7 +5767,6 @@ handle_signal_stop (struct execution_control_state *ecs)
   stop_soon = get_inferior_stop_soon (ecs);
   if (stop_soon == STOP_QUIETLY || stop_soon == STOP_QUIETLY_REMOTE)
     {
-      context_switch (ecs);
       infrun_debug_printf ("quietly stopped");
       stop_print_frame = true;
       stop_waiting (ecs);
@@ -5802,18 +5804,6 @@ handle_signal_stop (struct execution_control_state *ecs)
       return;
     }
 
-  /* See if something interesting happened to the non-current thread.  If
-     so, then switch to that thread.  */
-  if (ecs->ptid != inferior_ptid)
-    {
-      infrun_debug_printf ("context switch");
-
-      context_switch (ecs);
-
-      if (deprecated_context_hook)
-	deprecated_context_hook (ecs->event_thread->global_num);
-    }
-
   /* At this point, get hold of the now-current thread's frame.  */
   frame = get_current_frame ();
   gdbarch = get_frame_arch (frame);

base-commit: 0e33957abf8878a16283bee68dbd3899c2bcba09
-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/12] Testcase for attaching in non-stop mode
  2021-01-13  5:09   ` Simon Marchi
@ 2021-02-03  1:23     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:23 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 05:09, Simon Marchi wrote:
> On 2021-01-12 8:15 p.m., Pedro Alves wrote:


>> +
>> +	if {$non_stop} {
>> +	    # In non-stop, we will see one stop per thread after
>> +	    # the prompt.
>> +	    set stops 0
>> +	    set test "seen all stops"
>> +	    for {set thread 1} { $thread <= $n_threads } { incr thread } {
>> +		gdb_test_multiple "" $test {
>> +		    -re "Thread $::decimal ${any} stopped" {
>> +			incr stops
>> +		    }
>> +		}
>> +	    }
>> +
>> +	    # This we haven't seen all stops, then the
> 
> This sentence is weird.

Ah, it was meant to be s/This/If/, like so:

	    # If we haven't seen all stops, then the gdb_test_multiple
	    # in the loop above will have already issued a FAIL.

> 
> Otherwise, the patch LGTM.

Thanks, pushed with the fix above.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/12] Fix a couple vStopped pending ack bugs
  2021-01-13  5:29   ` Simon Marchi
@ 2021-02-03  1:25     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:25 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 05:29, Simon Marchi wrote:
> On 2021-01-12 8:15 p.m., Pedro Alves wrote:

> Just to be sure I understand does that sum it up?
> 
> 1. GDBserver sends stop notification about thread X, the remote target
>    receives it and stores it
> 2. At the same time, GDB detaches thread X's inferior
> 3. The remote target discards the received stop notification
> 4. GDBserver waits forever for the ack

Yup.  Thanks for that summary.  I've borrowed it and put it in the commit log.

> Since the first item in the list has a special status (it is the in
> flight one), perhaps having a separate field in notif_server for it
> would make things clearer.
> 
> I mean having an additional
> 
>   notif_event *in_flight;
> 
> field in there.  When sending a notification, it gets dequeu-ed from
> the queue and put there.

It's likely yes, though I'd prefer that was done separately, as it has
risk of messing up something else.

> 
> Either way, this LGTM.

Thanks.  I've merged it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress
  2021-01-13  6:00   ` Simon Marchi
@ 2021-02-03  1:26     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:26 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 06:00, Simon Marchi wrote:

> Maybe the only thing that wasn't clear from the start of the
> explanation is that GDBserver is doing a step-over for process A
> when a detach request for process B arrives.  And that generates
> a spurious SIGTRAP report for process A.  It wasn't clear which
> process was doing doing what.  But otherwise, make sense.

I see.  I've added that info to the commit log then, and merged it.

> 
> The fix LGTM, although I'm not much used to the GDBserver internals.
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/12] Factor out after-stop event handling code from stop_all_threads
  2021-01-13  6:06   ` Simon Marchi
@ 2021-02-03  1:26     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:26 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 06:06, Simon Marchi wrote:
> On 2021-01-12 8:15 p.m., Pedro Alves wrote:
>> This moves the code handling an event out of wait_one to a separate
>> function, to be used in another context in a following patch.
>>
>> gdb/ChangeLog:
>>
>> 	* infrun.c (handle_one): New function, factored out from ...
>> 	(stop_all_threads): ... here.
> 
> I think this is a good cleanup on its own, that function was starting to
> get crowded.  Since that removes a few indentation levels to the code, it
> would be nice to un-wrap it a little bit, that would help readability.
> 
> Could you document the meaning of the return value of handle_one in its
> top comment?

Here's what I looks like now:

+/* Handle one event after stopping threads.  If the eventing thread
+   reports back any interesting event, we leave it pending.  If the
+   eventing thread was in the middle of a displaced step, we
+   cancel/finish it.  Returns true if there are no resumed threads
+   left in the target (thus there's no point in waiting further),
+   false otherwise.  */
+
+static bool
+handle_one (const wait_one_event &event)


> 
> Otherwise, that LGTM.

Thanks.  I've merged it with the above change.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 07/12] prepare_for_detach: don't release scoped_restore at the end
  2021-01-13  6:08   ` Simon Marchi
@ 2021-02-03  1:27     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:27 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 06:08, Simon Marchi wrote:
> 
> 
> On 2021-01-12 8:15 p.m., Pedro Alves wrote:
>> After detaching from a process, the inf->detaching flag is
>> inadvertently left set to true.  If you afterwards reuse the same
>> inferior to start a new process, GDB will mishave...
>>
>> The problem is that prepare_for_detach discards the scoped_restore at
>> the end, while the intention is for the flag to be set only for the
>> duration of prepare_for_detach.
>>
>> This was already a bug in the original commit that added
>> prepare_for_detach, commit 24291992dac3 ("PR gdb/11321"), by yours
>> truly.  Back then, we still used cleanups, and the function called
>> discard_cleanups instead of do_cleanups, by mistake.
>>
>> gdb/ChangeLog:
>>
>> 	* infrun.c (prepare_for_detach): Don't release scoped_restore at
>> 	the end.
>> ---
>>  gdb/infrun.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/gdb/infrun.c b/gdb/infrun.c
>> index b62e74d3dab..fc7ba745737 100644
>> --- a/gdb/infrun.c
>> +++ b/gdb/infrun.c
>> @@ -3612,8 +3612,6 @@ prepare_for_detach (void)
>>  	  error (_("Program exited while detaching"));
>>  	}
>>      }
>> -
>> -  restore_detaching.release ();
> 
> There's another "restore_detaching.release ()" a few lines above, that
> one should stay?

Whooops, no, that should be gone too.  The next patch removes it and
I didn't notice that bit should have been pulled into this patch.

Thanks.  I've merged this with that change.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/12] detach and breakpoint removal
  2021-01-13  6:32   ` Simon Marchi
@ 2021-02-03  1:28     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:28 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 06:32, Simon Marchi wrote:
> 
> 
> On 2021-01-12 8:15 p.m., Pedro Alves wrote:
>> A following patch will add a testcase that has a number of threads
>> constantly stepping over a breakpoint, and then has GDB detach the
>> process.  That testcase sometimes fails with the inferior crashing
>> with SIGTRAP after the detach because of the bug fixed by this patch,
>> when tested with the native target.
>>
>> The problem is that target_detach removes breakpoints from the target
>> immediately, and that does not work with the native GNU/Linux target
>> (and probably no other native target) currently.  The test wouldn't
>> fail with this issue when testing against gdbserver, because gdbserver
>> does allow accessing memory while the current thread is running, by
>> transparently pausing all threads temporarily, without GDB noticing.
>> Implementing that in gdbserver was a lot of work, so I'm not looking
>> forward right now to do the same in the native target.  Instead, I
>> came up with a simpler solution -- push the breakpoints removal down
>> to the targets.  The Linux target conveniently already pauses all
>> threads before detaching them, since PTRACE_DETACH only works with
>> stopped threads, so we move removing breakpoints to after that.  Only
>> the remote and GNU/Linux targets support support async execution, so
>> no other target should really need this.
> 
> I think that makes sense.
> 
> I initially thought that GDB could always stop all threads, remove
> breakpoints, and then ask the target to detach.  But then that's
> unfair for the targets that can remove the breakpoints while all
> threads are running.  These targets would pay a price (a window of
> time where threads are paused when detaching) they don't need to pay.
> So leaving the task of removing breakpoints to the target makes sense.

Yeah, that was my thinking too.  Always stopping everything would be
simpler, but with a potential penalty.

> 
> I think it would be good to mention this in the doc of
> target_ops::detach (which doesn't exist yet), that the target is
> responsible for removing the breakpoints from the inferior.
> 

I've merged the patch with this added comment:

+
+    /* Detaches from the inferior.  Note that on targets that support
+       async execution (i.e., targets where it is possible to detach
+       from programs with threads running), the target is responsible
+       for removing breakpoints from the program before the actual
+       detach, otherwise the program dies when it hits one.  */
     virtual void detach (inferior *, int)
       TARGET_DEFAULT_IGNORE ();
+



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/12] detach in all-stop with threads running
  2021-01-13  6:55   ` Simon Marchi
@ 2021-02-03  1:31     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:31 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 06:55, Simon Marchi wrote:
>
> On 2021-01-12 8:15 p.m., Pedro Alves wrote:
>>  
>> +/* See infrun.h.  */
>> +
>> +void
>> +restart_after_all_stop_detach (process_stratum_target *proc_target)
>> +{
>> +  /* Note we don't check target_is_non_stop_p() here, because the
>> +     current inferior may no longer have a process_stratum target
>> +     pushed, as we just detached.  */
>> +
>> +  /* See if we have a THREAD_RUNNING thread that need to be
>> +     re-resumed.  If we have any thread that is already executing,
>> +     then we don't need to resume the target -- it is already been
>> +     resumed.  With the remote target (in all-stop), it's even
>> +     impossible to issue another resumption if the target is already
>> +     resumed, until the target reports a stop.  */
>> +  for (thread_info *thr : all_threads (proc_target))
>> +    {
>> +      if (thr->state != THREAD_RUNNING)
>> +	continue;
>> +
>> +      /* If we have any thread that is already executing, then we
>> +	 don't need to resume the target -- it is already been
>> +	 resumed.  */
>> +      if (thr->executing)
>> +	return;
>> +
>> +      /* If we have a pending event to process, skip resuming the
>> +	 target and go straight to processing it.  */
>> +      if (thr->resumed && thr->suspend.waitstatus_pending_p)
>> +	return;
>> +    }
> 
> Hmm instead of that loop, could you do an early check in the function
> instead, like
> 
>   if (proc_target->threads_executing)
>     return;
> 
> ? When a thread gets target-resumed, the thr->executing flag gets set
> at the same time as the proc_target->threads_executing flag.  And when
> resume_1-ing a thread with a pending status,
> proc_target->threads_executing also gets set.  So maybe (just maybe)
> that whole loop could be replaced with that check.

Unfortunately not, because while we set proc_target->threads_executing
in those conditions, we don't clear it when the conditions reverse.

I applied this patch locally:

diff --git c/gdb/infrun.c w/gdb/infrun.c
index f2abfe4a7c2..424de103597 100644
--- c/gdb/infrun.c
+++ w/gdb/infrun.c
@@ -7287,13 +7287,20 @@ restart_after_all_stop_detach (process_stratum_target *proc_target)
         don't need to resume the target -- it is already been
         resumed.  */
       if (thr->executing)
-       return;
+       {
+         gdb_assert (proc_target->threads_executing);
+         return;
+       }
 
       /* If we have a pending event to process, skip resuming the
         target and go straight to processing it.  */
       if (thr->resumed && thr->suspend.waitstatus_pending_p)
-       return;
+       {
+         gdb_assert (proc_target->threads_executing);
+         return;
+       }
     }
+  gdb_assert (!proc_target->threads_executing);
 
   /* Alright, we need to re-resume the target.  If a thread was
      stepping, we need to restart it stepping.  */

and re-run the test, and it hit:

detach
Detaching from program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/detach-step-over/detach-step-over, process 2908032
[Inferior 1 (process 2908032) detached]
/home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:7303: internal-error: void restart_after_all_stop_detach(process_stratum_target*): Assertion `!proc_target->threads_executing
' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/detach-step-over.exp: breakpoint-condition-evaluation=host: target-non-stop=off: non-stop=off: displaced=off: iter 2: detach (GDB internal error)
Resyncing due to internal error.
n

And if I just try your suggestion without the asserts, I get:

 Running /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.exp ...
 FAIL: gdb.threads/detach-step-over.exp: breakpoint-condition-evaluation=host: target-non-stop=off: non-stop=off: displaced=off: iter 2: stop with SIGUSR1 (timeout)
 FAIL: gdb.threads/detach-step-over.exp: breakpoint-condition-evaluation=host: target-non-stop=off: non-stop=off: displaced=off: iter 3: all threads running

which is what I expect to see if we fail to re-resume the target.

> 
> But otherwise, that LGTM.

Thanks.  I've merged it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 12/12] Testcase for detaching while stepping over breakpoint
  2021-01-13  7:05   ` Simon Marchi
@ 2021-02-03  1:33     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-02-03  1:33 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 13/01/21 07:05, Simon Marchi wrote:
>> +
>> +		# This we haven't seen all stops, then the
> This sentence misses something, like in the other test.
> 

Yeah, it's s/This/If/ again.  Thanks again for noticing this.


> I am running it in a loop here and it's passing so far :).  Really, congrats
> and thanks for this series, that looked like some hardcore debugging.

Thanks.  Yeah, it started out by looking like a simple typo fix, but then
I kept going down the rabbit hole.  It was maddening at times.  What I thought
would be a few hours fix, turned out to be weeks long N fixes.  I kind of
dread the testcase hitting more racy issues.

One thing that really helped was a local crude GDB hack that I made
to redirect gdb_stdlog to a file, so that I could run the whole
testcase in a loop with "set debug infrun 1" etc.

I've merged the whole series now.  Thanks so much for the (impressively!) speedy review.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/12] detach in all-stop with threads running
  2021-01-13  1:15 ` [PATCH 11/12] detach in all-stop with threads running Pedro Alves
  2021-01-13  6:55   ` Simon Marchi
@ 2021-07-12 15:36   ` Jonah Graham
  2021-07-12 19:36     ` Pedro Alves
  1 sibling, 1 reply; 36+ messages in thread
From: Jonah Graham @ 2021-07-12 15:36 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Tue, 12 Jan 2021 at 20:16, Pedro Alves <pedro@palves.net> wrote:

> A following patch will add a testcase that has a number of threads
> constantly stepping over a breakpoint, and then has GDB detach the
> process, while threads are running.  If we have more then one inferior
> running, and we detach from just one of the inferiors, we expect that
> the remaining inferior continues running.  However, in all-stop, if
> GDB needs to pause the target for the detach, nothing is re-resuming
> the other inferiors after the detach.  "info threads" shows the
> threads as running, but they really aren't.  This fixes it.
>
> gdb/ChangeLog:
>
>         * infcmd.c (detach_command): Hold strong reference to target, and
>         if all-stop on entry, restart threads on exit.
>         * infrun.c (switch_back_to_stepped_thread): Factor out bits to ...
>         (restart_stepped_thread): ... this new function.  Also handle
>         trap_expected.
>         (restart_after_all_stop_detach): New function.
>         * infrun.h (restart_after_all_stop_detach): Declare.
>

Hi Pedro,

This patch appears to cause a regression that I am seeing core dumps when
doing a detach. Full details in
https://sourceware.org/bugzilla/show_bug.cgi?id=28080

I would be happy to test any patches and run them against the Eclipse CDT
testsuite. Please CC me on the patch.

Thanks,
Jonah

>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/12] detach in all-stop with threads running
  2021-07-12 15:36   ` Jonah Graham
@ 2021-07-12 19:36     ` Pedro Alves
  0 siblings, 0 replies; 36+ messages in thread
From: Pedro Alves @ 2021-07-12 19:36 UTC (permalink / raw)
  To: Jonah Graham; +Cc: gdb-patches

On 2021-07-12 4:36 p.m., Jonah Graham wrote:
> On Tue, 12 Jan 2021 at 20:16, Pedro Alves <pedro@palves.net <mailto:pedro@palves.net>> wrote:
> 
>     A following patch will add a testcase that has a number of threads
>     constantly stepping over a breakpoint, and then has GDB detach the
>     process, while threads are running.  If we have more then one inferior
>     running, and we detach from just one of the inferiors, we expect that
>     the remaining inferior continues running.  However, in all-stop, if
>     GDB needs to pause the target for the detach, nothing is re-resuming
>     the other inferiors after the detach.  "info threads" shows the
>     threads as running, but they really aren't.  This fixes it.
> 
>     gdb/ChangeLog:
> 
>             * infcmd.c (detach_command): Hold strong reference to target, and
>             if all-stop on entry, restart threads on exit.
>             * infrun.c (switch_back_to_stepped_thread): Factor out bits to ...
>             (restart_stepped_thread): ... this new function.  Also handle
>             trap_expected.
>             (restart_after_all_stop_detach): New function.
>             * infrun.h (restart_after_all_stop_detach): Declare.
> 
> 
> Hi Pedro,
> 
> This patch appears to cause a regression that I am seeing core dumps when doing a detach. Full details in https://sourceware.org/bugzilla/show_bug.cgi?id=28080 <https://sourceware.org/bugzilla/show_bug.cgi?id=28080>
> 
> I would be happy to test any patches and run them against the Eclipse CDT testsuite. Please CC me on the patch.
> 

Thanks for spotting this.  I've written and fix, and sent it here:
  https://sourceware.org/pipermail/gdb-patches/2021-July/180847.html

You're in CC.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2021-07-12 19:36 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-13  1:15 [PATCH 00/12] Fix detach + displaced-step regression + N bugs more Pedro Alves
2021-01-13  1:15 ` [PATCH 01/12] Fix attaching in non-stop mode Pedro Alves
2021-01-13  3:11   ` Simon Marchi
2021-02-03  1:21     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 02/12] Fix "target extended-remote" + "maint set target-non-stop" + "attach" Pedro Alves
2021-01-13  5:01   ` Simon Marchi
2021-01-13  1:15 ` [PATCH 03/12] Testcase for attaching in non-stop mode Pedro Alves
2021-01-13  5:09   ` Simon Marchi
2021-02-03  1:23     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 04/12] Fix a couple vStopped pending ack bugs Pedro Alves
2021-01-13  5:29   ` Simon Marchi
2021-02-03  1:25     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 05/12] gdbserver: spurious SIGTRAP w/ detach while step-over in progress Pedro Alves
2021-01-13  6:00   ` Simon Marchi
2021-02-03  1:26     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 06/12] Factor out after-stop event handling code from stop_all_threads Pedro Alves
2021-01-13  6:06   ` Simon Marchi
2021-02-03  1:26     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 07/12] prepare_for_detach: don't release scoped_restore at the end Pedro Alves
2021-01-13  6:08   ` Simon Marchi
2021-02-03  1:27     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 08/12] prepare_for_detach and ongoing displaced stepping Pedro Alves
2021-01-13  6:23   ` Simon Marchi
2021-01-13  1:15 ` [PATCH 09/12] detach and breakpoint removal Pedro Alves
2021-01-13  6:32   ` Simon Marchi
2021-02-03  1:28     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 10/12] detach with in-line step over in progress Pedro Alves
2021-01-13  6:39   ` Simon Marchi
2021-01-13  1:15 ` [PATCH 11/12] detach in all-stop with threads running Pedro Alves
2021-01-13  6:55   ` Simon Marchi
2021-02-03  1:31     ` Pedro Alves
2021-07-12 15:36   ` Jonah Graham
2021-07-12 19:36     ` Pedro Alves
2021-01-13  1:15 ` [PATCH 12/12] Testcase for detaching while stepping over breakpoint Pedro Alves
2021-01-13  7:05   ` Simon Marchi
2021-02-03  1:33     ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).