[PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU
@ 2023-12-14 20:22 Pedro Alves
  2023-12-14 20:22 ` [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements Pedro Alves
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

This series is the result of making step-over-thread-exit work
properly with the AMD GPU target.

It includes some improvements to
gdb.threads/step-over-thread-exit.exp, a few core fixes, and then AMD
GPU target fixes.

Finally, the last two patches include tests that we are carrying
downstream, but that unfortunately can't work with upstream GDB yet,
because upstream doesn't understand the DWARF extensions that we are
working hard to get into DWARF 6, so upstream is missing proper
unwinding and accessing variables.  I include them in the series so
reading the patches makes more sense, but I don't plan on pushing
them.  Unless we are OK with adding them upstream with some early
return, effectivelly making them nops.

I sent the first two patches as a separate series last month, and
Simon & Lancelot have meanwhile reviewed this whole series internally
at AMD, which resulted in some further improvements in those first
patches (as well as in the others).

Pedro Alves (8):
  gdb.threads/step-over-thread-exit.exp improvements
  Ensure selected thread after thread exit stop
  displaced_step_finish: Don't fetch the regcache of exited threads
  Step over thread exit, always delete the thread non-silently
  Fix thread target ID of exited waves
  Fix handling of vanishing threads that were stepping/stopping
  Add tests for s_endpgm handling
  Add tests for handling of vanishing threads that were
    stepping/stopping

 gdb/amd-dbgapi-target.c                       | 313 ++++++++++++++----
 gdb/infrun.c                                  |  31 +-
 gdb/observable.c                              |   1 +
 gdb/observable.h                              |   5 +
 gdb/regcache.c                                |   2 +
 .../gdb.rocm/continue-over-kernel-exit.cpp    |  66 ++++
 .../gdb.rocm/continue-over-kernel-exit.exp    | 165 +++++++++
 .../gdb.rocm/step-over-kernel-exit.cpp        |  48 +++
 .../gdb.rocm/step-over-kernel-exit.exp        | 108 ++++++
 .../gdb.threads/step-over-thread-exit.c       |  16 +-
 .../gdb.threads/step-over-thread-exit.exp     | 127 +++++--
 gdb/thread.c                                  |   2 +
 12 files changed, 792 insertions(+), 92 deletions(-)
 create mode 100644 gdb/testsuite/gdb.rocm/continue-over-kernel-exit.cpp
 create mode 100644 gdb/testsuite/gdb.rocm/continue-over-kernel-exit.exp
 create mode 100644 gdb/testsuite/gdb.rocm/step-over-kernel-exit.cpp
 create mode 100644 gdb/testsuite/gdb.rocm/step-over-kernel-exit.exp

base-commit: 1d2f86b6b74e6caae77951353a4c353ce9816374
-- 
2.43.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-14 20:22 ` [PATCH 2/8] Ensure selected thread after thread exit stop Pedro Alves
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

This commit makes the following improvements to
gdb.threads/step-over-thread-exit.exp:

- Add a third axis to stepping over the breakpoint with displaced vs
  inline stepping -- also test with no breakpoint at all.

- Check that when GDB reports "Command aborted, thread exited.", the
  selected thread is the thread that exited.  This is always true
  currently on GNU/Linux by coincidence, but a similar testcase on AMD
  GPU exposed a problem here.  Better make the testcase catch any
  potential regression.

- Fixes a race that Simon ran into with GDBserver testing.

    (gdb) next
    [New Thread 2143071.2143438]

    Thread 3 "step-over-threa" hit Breakpoint 2, 0x000055555555524e in my_exit_syscall () at .../testsuite/lib/my-syscalls.S:74
    74      SYSCALL (my_exit, __NR_exit)
    (gdb) FAIL: gdb.threads/step-over-thread-exit.exp: displaced-stepping=auto: non-stop=on: target-non-stop=on: schedlock=off: cmd=next: ns_stop_all=0: command aborts when thread exits

  I was not able to reproduce it, but I believe that what happens is
  the following:

  Once we continue, the thread 2 exits, and the main thread thus
  unblocks from its pthread_join, and spawns a new thread.  That new
  thread may hit the breakpoint at my_exit_syscall very quickly.  GDB
  could then see/process that breakpoint event before the thread exit
  event for the thread we care about, which would result in the
  failure seen above.

  The fix here is to not loop and start a new thread at all in the
  scenario where the race can happen.  We only need to loop and spawn
  new threads when testing with "cmd=continue" and schedlock off, in
  which case GDB doesn't abort the command when the thread exits.

Change-Id: I90c95c32f00630a3f682b1541c23aff52451f9b6
---
 .../gdb.threads/step-over-thread-exit.c       |  16 ++-
 .../gdb.threads/step-over-thread-exit.exp     | 127 +++++++++++++++---
 2 files changed, 119 insertions(+), 24 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/step-over-thread-exit.c b/gdb/testsuite/gdb.threads/step-over-thread-exit.c
index 878e5924c5c..218f003b205 100644
--- a/gdb/testsuite/gdb.threads/step-over-thread-exit.c
+++ b/gdb/testsuite/gdb.threads/step-over-thread-exit.c
@@ -18,6 +18,7 @@
 #include <pthread.h>
 #include <assert.h>
 #include <stdlib.h>
+#include <unistd.h>
 #include "../lib/my-syscalls.h"
 
 static void *
@@ -30,13 +31,19 @@ thread_func (void *arg)
   abort ();
 }
 
+/* Number of threads we'll create.  */
+int n_threads = 100;
+
 int
-main (void)
+main (int argc, char **argv)
 {
   int i;
 
-  /* Spawn and join a thread, 100 times.  */
-  for (i = 0; i < 100; i++)
+  if (argc > 1)
+    n_threads = atoi (argv[1]);
+
+  /* Spawn and join a thread, N_THREADS times.  */
+  for (i = 0; i < n_threads; i++)
     {
       pthread_t thread;
       int ret;
@@ -48,5 +55,8 @@ main (void)
       assert (ret == 0);
     }
 
+  /* Some time to make sure that GDB processes the thread exit event
+     before the whole-process exit.  */
+  sleep (3);
   return 0;
 }
diff --git a/gdb/testsuite/gdb.threads/step-over-thread-exit.exp b/gdb/testsuite/gdb.threads/step-over-thread-exit.exp
index 615bd838763..32f64ce1a3e 100644
--- a/gdb/testsuite/gdb.threads/step-over-thread-exit.exp
+++ b/gdb/testsuite/gdb.threads/step-over-thread-exit.exp
@@ -25,11 +25,29 @@ if { [build_executable "failed to prepare" $testfile \
     return
 }
 
-# Each argument is a different testing axis, most of them obvious.
+# Test stepping/continuing at an exit syscall instruction.
+#
+# Each argument is a different testing axis.
+#
+# STEP_OVER_MODE can be one of:
+#
+#   - none: don't put a breakpoint on the exit syscall instruction.
+#
+#   - inline: put a breakpoint on the exit syscall instruction, and
+#     use in-line stepping to step over it (disable
+#     displaced-stepping).
+#
+#   - displaced: same, but use displaced stepping.
+#
+# SCHEDLOCK can be "on" or "off".
+#
+# CMD is the GDB command to run when at the exit syscall instruction.
+#
 # NS_STOP_ALL is only used if testing "set non-stop on", and indicates
 # whether to have GDB explicitly stop all threads before continuing to
 # thread exit.
-proc test {displaced-stepping non-stop target-non-stop schedlock cmd ns_stop_all} {
+#
+proc test {step_over_mode non-stop target-non-stop schedlock cmd ns_stop_all} {
     if {${non-stop} == "off" && $ns_stop_all} {
 	error "invalid arguments"
     }
@@ -40,23 +58,29 @@ proc test {displaced-stepping non-stop target-non-stop schedlock cmd ns_stop_all
 	clean_restart $::binfile
     }
 
-    gdb_test_no_output "set displaced-stepping ${displaced-stepping}"
-
-    if { ![runto_main] } {
-	return
+    if { $step_over_mode == "none" } {
+	# Nothing to do.
+    } elseif { $step_over_mode == "inline" } {
+	gdb_test_no_output "set displaced-stepping off"
+    } elseif { $step_over_mode == "displaced" } {
+	gdb_test_no_output "set displaced-stepping on"
+    } else {
+	error "Invalid step_over_mode value: $step_over_mode"
     }
 
-    gdb_breakpoint "my_exit_syscall"
-
     if {$schedlock
 	|| (${non-stop} == "on" && $ns_stop_all)} {
-	gdb_test "continue" \
-	    "Thread 2 .*hit Breakpoint $::decimal.* my_exit_syscall .*" \
-	    "continue until syscall"
+
+	gdb_test_no_output "set args 1"
+
+	if { ![runto my_exit_syscall] } {
+	    return
+	}
 
 	if {${non-stop} == "on"} {
 	    # The test only spawns one thread at a time, so this just
-	    # stops the main thread.
+	    # stops the main thread.  IOW, we only need to wait for
+	    # one stop.
 	    gdb_test_multiple "interrupt -a" "" {
 		-re "$::gdb_prompt " {
 		    gdb_test_multiple "" $gdb_test_name {
@@ -66,12 +90,19 @@ proc test {displaced-stepping non-stop target-non-stop schedlock cmd ns_stop_all
 		    }
 		}
 	    }
-	}
 
-	gdb_test "thread 2" "Switching to thread 2 .*"
+	    gdb_test "thread 2" "Switching to thread 2 .*"
+	}
 
 	gdb_test_no_output "set scheduler-locking ${schedlock}"
 
+	# If testing a step-over is requested, leave the breakpoint at
+	# the current instruction to force a step-over; otherwise,
+	# remove it.
+	if { $step_over_mode == "none" } {
+	    delete_breakpoints
+	}
+
 	if {$cmd == "continue"} {
 	    gdb_test "continue" \
 		"No unwaited-for children left." \
@@ -84,9 +115,50 @@ proc test {displaced-stepping non-stop target-non-stop schedlock cmd ns_stop_all
 	    }
 	}
     } else {
+	# Schedlock is off here.
+	#
+	# With "continue" and no scheduler-locking, GDB doesn't stop
+	# with "Command aborted, thread exited." when the thread
+	# exits, it just lets the inferior continue running freely.
+	# So we test that we can move past the thread exit, and that
+	# other threads can be freely scheduled.  We do that by
+	# spawning another thread as soon as the first exit.  We test
+	# that a number of times.  This should also exercise GDB's
+	# handling of inline or displaced step-overs, that GDB handles
+	# the related resource accounting correctly when the stepping
+	# thread exits, etc.
+	#
+	# With "continue" and $step_over_mode == "none" however, after
+	# the first my_exit_syscall breakpoint hit, we will remove the
+	# breakpoint, so no other thread would ever hit it again.  So
+	# might as well just test one thread.
+	#
+	# With step/next, GDB aborts the execution command with
+	# "Command aborted, thread exited." when the stepping thread
+	# exits.  If we let the main spawn another thread as soon as
+	# the first exits, it would be possible for that new thread to
+	# hit the exit syscall insn breakpoint quickly enough that it
+	# would be reported to be user before the first thread exit
+	# would be, which would confuse testing.  To avoid that, we
+	# only spawn one thread, too.
+	#
+	if {$cmd != "continue" || $step_over_mode == "none"} {
+	    set n_threads 1
+	} else {
+	    set n_threads 100
+	}
+
+	gdb_test_no_output "set args $n_threads"
+
+	if { ![runto_main] } {
+	    return
+	}
+
+	gdb_breakpoint "my_exit_syscall"
+
 	gdb_test_no_output "set scheduler-locking ${schedlock}"
 
-	if {$cmd != "continue"} {
+	if {$cmd != "continue" || $step_over_mode == "none"} {
 	    set thread "<unknown>"
 	    gdb_test_multiple "continue" "" {
 		-re -wrap "Thread ($::decimal) .*hit Breakpoint $::decimal.* my_exit_syscall .*" {
@@ -98,10 +170,23 @@ proc test {displaced-stepping non-stop target-non-stop schedlock cmd ns_stop_all
 		    "switch to event thread"
 	    }
 
-	    gdb_test_multiple $cmd "command aborts when thread exits" {
-		-re "Command aborted, thread exited\\.\r\n$::gdb_prompt " {
-		    pass $gdb_test_name
+	    # If testing a step-over is requested, leave the breakpoint at
+	    # the current instruction to force a step-over; otherwise,
+	    # remove it.
+	    if { $step_over_mode == "none" } {
+		delete_breakpoints
+	    }
+
+	    if {$cmd == "continue"} {
+		gdb_continue_to_end "continue to end" "continue" 1
+	    } else {
+		gdb_test_multiple $cmd "command aborts when thread exits" {
+		    -re "Command aborted, thread exited\\.\r\n$::gdb_prompt " {
+			pass $gdb_test_name
+		    }
 		}
+		gdb_test "p \$_thread == $thread" "= 1" \
+		    "selected thread didn't change"
 	    }
 	} else {
 	    for { set i 0 } { $i < 100 } { incr i } {
@@ -130,7 +215,7 @@ proc test {displaced-stepping non-stop target-non-stop schedlock cmd ns_stop_all
     }
 }
 
-foreach_with_prefix displaced-stepping {off auto} {
+foreach_with_prefix step_over_mode {none inline displaced} {
     foreach_with_prefix non-stop {off on} {
 	foreach_with_prefix target-non-stop {off on} {
 	    if {${non-stop} == "on" && ${target-non-stop} == "off"} {
@@ -142,11 +227,11 @@ foreach_with_prefix displaced-stepping {off auto} {
 		foreach_with_prefix cmd {"next" "continue"} {
 		    if {${non-stop} == "on"} {
 			foreach_with_prefix ns_stop_all {0 1} {
-			    test ${displaced-stepping} ${non-stop} ${target-non-stop} \
+			    test ${step_over_mode} ${non-stop} ${target-non-stop} \
 				${schedlock} ${cmd} ${ns_stop_all}
 			}
 		    } else {
-			test ${displaced-stepping} ${non-stop} ${target-non-stop} ${schedlock} ${cmd} 0
+			test ${step_over_mode} ${non-stop} ${target-non-stop} ${schedlock} ${cmd} 0
 		    }
 		}
 	    }
-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/8] Ensure selected thread after thread exit stop
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
  2023-12-14 20:22 ` [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-14 20:22 ` [PATCH 3/8] displaced_step_finish: Don't fetch the regcache of exited threads Pedro Alves
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

While making step over thread exit work properly on AMDGPU, I noticed
that if there's a breakpoint on top of the exit syscall, and,
displaced stepping is off, then when GDB reports "Command aborted,
thread exited.", GDB also switches focus to a random thread, instead
of leaving the exited thread as selected:

 (gdb) thread
 [Current thread is 6, lane 0 (AMDGPU Lane 1:4:1:1/0 (0,0,0)[0,0,0])]
 (gdb) si
 Command aborted, thread exited.
 (gdb) thread
 [Current thread is 5 (Thread 0x7ffff626f640 (LWP 3248392))]
 (gdb)

The previous patch extended gdb.threads/step-over-thread-exit.exp to
exercise this on GNU/Linux (on the CPU side), and there, after that
"si", we always end up with the exiting thread as selected even
without this fix, but that's just a concidence, there's a code path
that happens to select the exiting thread for an unrelated reason.

This commit add the explict switch, fixing the latent problem for
GNU/Linux, and the actual problem on AMDGPU.  I wrote a gdb.rocm/
testcase for this, but it can't be upstreamed yet, until more pieces
of the DWARF machinery are upstream as well.

Change-Id: I6ff57a79514ac0142bba35c749fe83d53d9e4e51
---
 gdb/infrun.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 45c1b4a79bb..6dc0a2bb9a5 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5895,7 +5895,14 @@ handle_thread_exited (execution_control_state *ecs)

   if (abort_cmd)
     {
+      /* We're stopping for the thread exit event.  Switch to the
+	 event thread again, as finish_step_over may have switched
+	 threads.  */
+      switch_to_thread (ecs->event_thread);
+
+      /* Emit [Thread ... exited] notification.  */
       delete_thread (ecs->event_thread);
+
       ecs->event_thread = nullptr;
       return false;
     }
-- 
2.43.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/8] displaced_step_finish: Don't fetch the regcache of exited threads
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
  2023-12-14 20:22 ` [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements Pedro Alves
  2023-12-14 20:22 ` [PATCH 2/8] Ensure selected thread after thread exit stop Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-14 20:22 ` [PATCH 4/8] Step over thread exit, always delete the thread non-silently Pedro Alves
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

displaced_step_finish can be called with event_status.kind ==
TARGET_WAITKIND_THREAD_EXITED, and in that case it is not possible to
get at the already-exited thread's registers.

This patch moves the get_thread_regcache calls to branches that
actually need it, where we know the thread is still alive.

It also adds an assertion to get_thread_regcache, to help catching
these broken cases sooner.

Change-Id: I63b5eacb3e02a538fc5087c270d8025adfda88c3
---
 gdb/infrun.c   | 19 ++++++++++++-------
 gdb/regcache.c |  2 ++
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 6dc0a2bb9a5..76693a30611 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2021,8 +2021,6 @@ displaced_step_finish (thread_info *event_thread,
 		       const target_waitstatus &event_status)
 {
   /* Check whether the parent is displaced stepping.  */
-  struct regcache *regcache = get_thread_regcache (event_thread);
-  struct gdbarch *gdbarch = regcache->arch ();
   inferior *parent_inf = event_thread->inf;
 
   /* If this was a fork/vfork/clone, this event indicates that the
@@ -2040,10 +2038,15 @@ displaced_step_finish (thread_info *event_thread,
      gdbarch_displaced_step_restore_all_in_ptid.  This is not enforced
      during gdbarch validation to support architectures which support
      displaced stepping but not forks.  */
-  if (event_status.kind () == TARGET_WAITKIND_FORKED
-      && gdbarch_supports_displaced_stepping (gdbarch))
-    gdbarch_displaced_step_restore_all_in_ptid
-      (gdbarch, parent_inf, event_status.child_ptid ());
+  if (event_status.kind () == TARGET_WAITKIND_FORKED)
+    {
+      struct regcache *parent_regcache = get_thread_regcache (event_thread);
+      struct gdbarch *gdbarch = parent_regcache->arch ();
+
+      if (gdbarch_supports_displaced_stepping (gdbarch))
+	gdbarch_displaced_step_restore_all_in_ptid
+	  (gdbarch, parent_inf, event_status.child_ptid ());
+    }
 
   displaced_step_thread_state *displaced = &event_thread->displaced_step_state;
 
@@ -2082,11 +2085,13 @@ displaced_step_finish (thread_info *event_thread,
 	 child hasn't been added to the inferior list yet at this
 	 point.  */
 
+      struct regcache *parent_regcache = get_thread_regcache (event_thread);
+      struct gdbarch *gdbarch = parent_regcache->arch ();
       struct regcache *child_regcache
 	= get_thread_arch_regcache (parent_inf, event_status.child_ptid (),
 				    gdbarch);
       /* Read PC value of parent.  */
-      CORE_ADDR parent_pc = regcache_read_pc (regcache);
+      CORE_ADDR parent_pc = regcache_read_pc (parent_regcache);
 
       displaced_debug_printf ("write child pc from %s to %s",
 			      paddress (gdbarch,
diff --git a/gdb/regcache.c b/gdb/regcache.c
index e46a0b58f50..f9bf1ecbe12 100644
--- a/gdb/regcache.c
+++ b/gdb/regcache.c
@@ -411,6 +411,8 @@ get_thread_regcache (process_stratum_target *target, ptid_t ptid)
 struct regcache *
 get_thread_regcache (thread_info *thread)
 {
+  gdb_assert (thread->state != THREAD_EXITED);
+
   return get_thread_regcache (thread->inf->process_target (),
 			      thread->ptid);
 }
-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/8] Step over thread exit, always delete the thread non-silently
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
                   ` (2 preceding siblings ...)
  2023-12-14 20:22 ` [PATCH 3/8] displaced_step_finish: Don't fetch the regcache of exited threads Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-14 20:22 ` [PATCH 5/8] Fix thread target ID of exited waves Pedro Alves
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

With AMD GPU debugging, I noticed that when stepping over a breakpoint
placed on top of the s_endpgm instruction inline (displaced=off), GDB
would behave differently -- it wouldn't print the wave exit.  E.g:

With displaced stepping, or no breakpoint at all:

 stepi
 [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
 Command aborted, thread exited.
 (gdb)

With inline stepping:

 stepi
 Command aborted, thread exited.
 (gdb)

In the cases we see the "exited" notification, handle_thread_exit is
what first called delete_thread on the exiting thread, which is
non-silent.

With inline stepping, however, handle_thread_exit ends up in
update_thread_list (via restart_threads) before any delete_thread
call.  Thus, amd_dbgapi_target::update_thread_list notices that the
wave is gone and deletes it with delete_thread_silent.

This commit fixes it, by making handle_thread_exited call
set_thread_exited (with the default silent=false) early, which emits
the user-visible notification.

Change-Id: I22ab3145e18d07c99dace45576307b9f9d5d966f
---
 gdb/infrun.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 76693a30611..1d863896c40 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5885,6 +5885,13 @@ handle_thread_exited (execution_control_state *ecs)
      update the thread list and delete the event thread.  */
   bool abort_cmd = (ecs->event_thread->thread_fsm () != nullptr);

+  /* Mark the thread exited right now, because finish_step_over may
+     update the thread list and that may delete the thread silently
+     (depending on target), while we always want to emit the "[Thread
+     ... exited]" notification.  Don't actually delete the thread yet,
+     because we need to pass its pointer down to finish_step_over.  */
+  set_thread_exited (ecs->event_thread);
+
   /* Maybe the thread was doing a step-over, if so release
      resources and start any further pending step-overs.

@@ -5904,10 +5911,6 @@ handle_thread_exited (execution_control_state *ecs)
 	 event thread again, as finish_step_over may have switched
 	 threads.  */
       switch_to_thread (ecs->event_thread);
-
-      /* Emit [Thread ... exited] notification.  */
-      delete_thread (ecs->event_thread);
-
       ecs->event_thread = nullptr;
       return false;
     }
-- 
2.43.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/8] Fix thread target ID of exited waves
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
                   ` (3 preceding siblings ...)
  2023-12-14 20:22 ` [PATCH 4/8] Step over thread exit, always delete the thread non-silently Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-15 10:51   ` Lancelot SIX
  2023-12-14 20:22 ` [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Pedro Alves
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

Currently, if you step over kernel exit, you see:

 stepi
 [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
 Command aborted, thread exited.
 (gdb)

Those '?' are because the thread/wave is already gone by the time GDB
prints the "exited" notification, we can't ask dbgapi for any info
about the wave anymore.

This commit fixes it by caching the wave's coordinates as soon as GDB
sees the wave for the first time, and making
amd_dbgapi_target::pid_to_str use the cached info.

At first I thought of clearing the wave_info object from a
thread_exited observer.  However, that is too soon, resulting in this:

 (gdb) si
 [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
 Command aborted, thread exited.
 (gdb) thread
 [Current thread is 6 (AMDGPU Wave ?:?:?:0 (?,?,?)/?) (exited)]

We need instead to clear the wave info when the thread is ultimately
deleted, so we get:

 (gdb) si
 [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
 Command aborted, thread exited.
 (gdb) thread
 [Current thread is 6 (AMDGPU Wave 1:4:1:1 (0,0,0)/0) (exited)]

And for that, we need a new thread_deleted observable.

Change-Id: I6c3e22541f051e1205f75eb657b04dc15e547580
---
 gdb/amd-dbgapi-target.c | 168 +++++++++++++++++++++++++++++++---------
 gdb/observable.c        |   1 +
 gdb/observable.h        |   5 ++
 gdb/thread.c            |   2 +
 4 files changed, 138 insertions(+), 38 deletions(-)

diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
index 18c0543c40e..86102b7fb03 100644
--- a/gdb/amd-dbgapi-target.c
+++ b/gdb/amd-dbgapi-target.c
@@ -109,6 +109,28 @@ get_amd_dbgapi_target_inferior_created_observer_token ()
   return amd_dbgapi_target_inferior_created_observer_token;
 }
 
+/* A type holding coordinate, etc. info for a given wave.  We cache
+   this because we need this information after a wave exits.  */
+
+struct wave_info
+{
+  /* The wave.  Set by the ctor.  */
+  amd_dbgapi_wave_id_t wave_id;
+
+  /* All these fields are initialized here to a value that is printed
+     as "?".  */
+  amd_dbgapi_dispatch_id_t dispatch_id {};
+  amd_dbgapi_queue_id_t queue_id {};
+  amd_dbgapi_agent_id_t agent_id {};
+  uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX};
+  uint32_t wave_in_group = UINT32_MAX;
+
+  explicit wave_info (amd_dbgapi_wave_id_t wave_id);
+
+  /* Return the target ID string for the wave this wave_info is
+     for.  */
+  std::string to_string () const;
+};
 
 /* Big enough to hold the size of the largest register in bytes.  */
 #define AMDGPU_MAX_REGISTER_SIZE 256
@@ -160,6 +182,16 @@ struct amd_dbgapi_inferior_info
 
   /* List of pending events the amd-dbgapi target retrieved from the dbgapi.  */
   std::list<std::pair<ptid_t, target_waitstatus>> wave_events;
+
+  /* Map of wave ID to wave_info.  We cache wave_info objects because
+     we need to access the info after the wave is gone, in the thread
+     exit nofication.  E.g.:
+	[AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
+
+     wave_info objects are added when we first see the wave, and
+     removed from a thread_deleted observer.  */
+  std::unordered_map<decltype (amd_dbgapi_wave_id_t::handle), wave_info>
+    wave_info_map;
 };
 
 static amd_dbgapi_event_id_t process_event_queue
@@ -256,56 +288,70 @@ static const registry<inferior>::key<amd_dbgapi_inferior_info>
 
 static async_event_handler *amd_dbgapi_async_event_handler = nullptr;
 
-/* Return the target id string for a given wave.  */
-
-static std::string
-wave_target_id_string (amd_dbgapi_wave_id_t wave_id)
+std::string
+wave_info::to_string () const
 {
-  amd_dbgapi_dispatch_id_t dispatch_id;
-  amd_dbgapi_queue_id_t queue_id;
-  amd_dbgapi_agent_id_t agent_id;
-  uint32_t group_ids[3], wave_in_group;
   std::string str = "AMDGPU Wave";
 
-  amd_dbgapi_status_t status
-    = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
-				sizeof (agent_id), &agent_id);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (agent_id.handle != 0
 	  ? string_printf (" %ld", agent_id.handle)
 	  : " ?");
 
-  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
-				     sizeof (queue_id), &queue_id);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (queue_id.handle != 0
 	  ? string_printf (":%ld", queue_id.handle)
 	  : ":?");
 
-  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
-				     sizeof (dispatch_id), &dispatch_id);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (dispatch_id.handle != 0
 	  ? string_printf (":%ld", dispatch_id.handle)
 	  : ":?");
 
   str += string_printf (":%ld", wave_id.handle);
 
-  status = amd_dbgapi_wave_get_info (wave_id,
-				     AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
-				     sizeof (group_ids), &group_ids);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (group_ids[0] != UINT32_MAX
 	  ? string_printf (" (%d,%d,%d)", group_ids[0], group_ids[1],
 			   group_ids[2])
 	  : " (?,?,?)");
 
-  status = amd_dbgapi_wave_get_info
-    (wave_id, AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
-     sizeof (wave_in_group), &wave_in_group);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (wave_in_group != UINT32_MAX
 	  ? string_printf ("/%d", wave_in_group)
 	  : "/?");
 
   return str;
 }
 
+wave_info::wave_info (amd_dbgapi_wave_id_t wave_id)
+  : wave_id (wave_id)
+{
+}
+
+/* Read in wave_info for WAVE_ID.  */
+
+static wave_info
+get_wave_info (amd_dbgapi_wave_id_t wave_id)
+{
+  wave_info res (wave_id);
+
+  /* Any field that fails to be read is left with its in-class
+     initialized value, which is printed as "?".  */
+
+  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
+			    sizeof (res.agent_id), &res.agent_id);
+  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
+			    sizeof (res.queue_id), &res.queue_id);
+  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
+			    sizeof (res.dispatch_id), &res.dispatch_id);
+
+  amd_dbgapi_wave_get_info (wave_id,
+			    AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
+			    sizeof (res.group_ids), &res.group_ids);
+
+  amd_dbgapi_wave_get_info (wave_id,
+			    AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
+			    sizeof (res.wave_in_group), &res.wave_in_group);
+
+  return res;
+}
+
 /* Clear our async event handler.  */
 
 static void
@@ -510,7 +556,21 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid)
   if (!ptid_is_gpu (ptid))
     return beneath ()->pid_to_str (ptid);
 
-  return wave_target_id_string (get_amd_dbgapi_wave_id (ptid));
+  process_stratum_target *proc_target = current_inferior ()->process_target ();
+  inferior *inf = find_inferior_pid (proc_target, ptid.pid ());
+  gdb_assert (inf != nullptr);
+  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
+
+  auto wave_id = get_amd_dbgapi_wave_id (ptid);
+
+  auto it = info->wave_info_map.find (wave_id.handle);
+  if (it != info->wave_info_map.end ())
+    return it->second.to_string ();
+
+  /* A wave we don't know about.  Shouldn't usually happen, but
+     asserting and bringing down the session is a bit too harsh.  Just
+     print all unknown info as "?"s.  */
+  return wave_info (wave_id).to_string ();
 }
 
 const char *
@@ -929,6 +989,46 @@ make_gpu_ptid (ptid_t::pid_type pid, amd_dbgapi_wave_id_t wave_id)
  return ptid_t (pid, 1, wave_id.handle);
 }
 
+/* When a thread is deleted, remove its wave_info from the inferior's
+   wave_info map.  */
+
+static void
+amd_dbgapi_thread_deleted (thread_info *tp)
+{
+  if (tp->inf->target_at (arch_stratum) == &the_amd_dbgapi_target
+      && ptid_is_gpu (tp->ptid))
+    {
+      amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (tp->inf);
+      auto wave_id = get_amd_dbgapi_wave_id (tp->ptid);
+      auto it = info->wave_info_map.find (wave_id.handle);
+      gdb_assert (it != info->wave_info_map.end ());
+      info->wave_info_map.erase (it);
+    }
+}
+
+/* Register WAVE_PTID as a new thread in INF's thread list, and record
+   its wave_info in the inferior's wave_info map.  */
+
+static thread_info *
+add_gpu_thread (inferior *inf, ptid_t wave_ptid)
+{
+  process_stratum_target *proc_target = inf->process_target ();
+  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
+
+  auto wave_id = get_amd_dbgapi_wave_id (wave_ptid);
+
+  if (!info->wave_info_map.try_emplace (wave_id.handle,
+					get_wave_info (wave_id)).second)
+    internal_error ("wave ID %ld already in map", wave_id.handle);
+
+  /* Create new GPU threads silently to avoid spamming the terminal
+     with thousands of "[New Thread ...]" messages.  */
+  thread_info *thread = add_thread_silent (proc_target, wave_ptid);
+  set_running (proc_target, wave_ptid, true);
+  set_executing (proc_target, wave_ptid, true);
+  return thread;
+}
+
 /* Process an event that was just pulled out of the amd-dbgapi library.  */
 
 static void
@@ -1015,13 +1115,7 @@ process_one_event (amd_dbgapi_event_id_t event_id,
 
 	    thread_info *thread = proc_target->find_thread (event_ptid);
 	    if (thread == nullptr)
-	      {
-		/* Silently create new GPU threads to avoid spamming the
-		   terminal with thousands of "[New Thread ...]" messages.  */
-		thread = add_thread_silent (proc_target, event_ptid);
-		set_running (proc_target, event_ptid, true);
-		set_executing (proc_target, event_ptid, true);
-	      }
+	      thread = add_gpu_thread (inf, event_ptid);
 
 	    /* If the wave is stopped because of a software breakpoint, the
 	       program counter needs to be adjusted so that it points to the
@@ -1686,10 +1780,7 @@ amd_dbgapi_target::update_thread_list ()
 	{
 	  ptid_t wave_ptid
 	    = make_gpu_ptid (inf->pid, amd_dbgapi_wave_id_t {tid});
-
-	  add_thread_silent (inf->process_target (), wave_ptid);
-	  set_running (inf->process_target (), wave_ptid, true);
-	  set_executing (inf->process_target (), wave_ptid, true);
+	  add_gpu_thread (inf, wave_ptid);
 	}
     }
 
@@ -2115,6 +2206,7 @@ _initialize_amd_dbgapi_target ()
   gdb::observers::inferior_forked.attach (amd_dbgapi_inferior_forked, "amd-dbgapi");
   gdb::observers::inferior_exit.attach (amd_dbgapi_inferior_exited, "amd-dbgapi");
   gdb::observers::inferior_pre_detach.attach (amd_dbgapi_inferior_pre_detach, "amd-dbgapi");
+  gdb::observers::thread_deleted.attach (amd_dbgapi_thread_deleted, "amd-dbgapi");
 
   add_basic_prefix_cmd ("amdgpu", no_class,
 			_("Generic command for setting amdgpu flags."),
diff --git a/gdb/observable.c b/gdb/observable.c
index f2e65b11604..29675f3abf3 100644
--- a/gdb/observable.c
+++ b/gdb/observable.c
@@ -46,6 +46,7 @@ DEFINE_OBSERVABLE (all_objfiles_removed);
 DEFINE_OBSERVABLE (free_objfile);
 DEFINE_OBSERVABLE (new_thread);
 DEFINE_OBSERVABLE (thread_exit);
+DEFINE_OBSERVABLE (thread_deleted);
 DEFINE_OBSERVABLE (thread_stop_requested);
 DEFINE_OBSERVABLE (target_resumed);
 DEFINE_OBSERVABLE (about_to_proceed);
diff --git a/gdb/observable.h b/gdb/observable.h
index 32ef65435cc..91a2c871524 100644
--- a/gdb/observable.h
+++ b/gdb/observable.h
@@ -126,6 +126,11 @@ extern observable<thread_info */* t */,
 		  std::optional<ULONGEST> /* exit_code */,
 		  bool /* silent */> thread_exit;
 
+/* The thread specified by T has been deleted, with delete_thread.
+   This is called just before the thread_info object is destroyed with
+   operator delete.  */
+extern observable<thread_info */* t */> thread_deleted;
+
 /* An explicit stop request was issued to PTID.  If PTID equals
    minus_one_ptid, the request applied to all threads.  If
    ptid_is_pid(PTID) returns true, the request applied to all
diff --git a/gdb/thread.c b/gdb/thread.c
index 85bdbaa6cd8..bd3fe85f3b9 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -527,6 +527,8 @@ delete_thread_1 (thread_info *thr, std::optional<ULONGEST> exit_code,
   auto it = thr->inf->thread_list.iterator_to (*thr);
   thr->inf->thread_list.erase (it);
 
+  gdb::observers::thread_deleted.notify (thr);
+
   delete thr;
 }
 
-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
                   ` (4 preceding siblings ...)
  2023-12-14 20:22 ` [PATCH 5/8] Fix thread target ID of exited waves Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-15 10:51   ` Lancelot SIX
  2023-12-14 20:22 ` [PATCH 7/8] Add tests for s_endpgm handling Pedro Alves
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

Downstream, AMD is carrying a testcase
(gdb.rocm/continue-over-kernel-exit.exp) that exposes a couple issues
with the amd-dbgapi target's handling of exited threads.  The test
can't be added upstream yet, unfortunately, due to dependency on DWARF
extensions that can't be upstreamed yet.  However, it can be found on
the mailing list on the same series as this patch.

The test spawns a kernel with a number of waves.  The waves do nothing
but exit.  There is a breakpoint on the s_endpgm instruction.  Once
that breakpoint is hit, the test issues a "continue" command.  We
should see one breakpoint hit per wave, and then the whole program
exiting.  We do see that, however we also see this:

 [New AMDGPU Wave ?:?:?:1 (?,?,?)/?]
 [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
 *repeat for other waves*
 ...
 [Thread 0x7ffff626f640 (LWP 3048491) exited]
 [Thread 0x7fffeb7ff640 (LWP 3048488) exited]
 [Inferior 1 (process 3048475) exited normally]

That "New AMDGPU Wave" output comes from infrun.c itself adding the
thread to the GDB thread list, because it got an event for a thread
not on the thread list yet.  The output shows "?"s instead of proper
coordinates, because the event was a TARGET_WAITKIND_THREAD_EXITED,
i.e., the wave was already gone when infrun.c added the thread to the
thread list.

That shouldn't ever happen for the amd-dbgapi target, threads should
only ever be added by the backend.

Note "New AMDGPU Wave ?:?:?:1" is for wave 1.  What happened was that
wave 1 terminated previously, and a previous call to
amd_dbgapi_target::update_thread_list() noticed the wave had vanished
and removed it from the GDB thread list.  However, because the wave
was stepping when it terminated (due to the displaced step over the
s_endpgm) instruction, it is guaranteed that the amd-dbgapi library
queues a WAVE_COMMAND_TERMINATED event for the exit.

When we process that WAVE_COMMAND_TERMINATED event, in
amd-dbgapi-target.c:process_one_event, we return it to the core as a
TARGET_WAITKIND_THREAD_EXITED event:

 static void
 process_one_event (amd_dbgapi_event_id_t event_id,
		    amd_dbgapi_event_kind_t event_kind)
 {
 ...
	 if (status == AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID
	     && event_kind == AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED)
	   ws.set_thread_exited (0);
 ...
 }

Recall the wave is already gone from the GDB thread list.  So when GDB
sees that TARGET_WAITKIND_THREAD_EXITED event for a thread it doesn't
know about, it adds the thread to the thread list, resulting in that:

 [New AMDGPU Wave ?:?:?:1 (?,?,?)/?]

and then, because it was a TARGET_WAITKIND_THREAD_EXITED event, GDB
marks the thread exited right afterwards:

 [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]

The fix is to make amd_dbgapi_target::update_thread_list() _not_
delete vanishing waves iff they were stepping or in progress of being
stopped.  These two cases are the ones dbgapi guarantees will result
in a WAVE_COMMAND_TERMINATED event if the wave terminates:

  /**
   * A command for a wave was not able to complete because the wave has
   * terminated.
   *
   * Commands that can result in this event are ::amd_dbgapi_wave_stop and
   * ::amd_dbgapi_wave_resume in single step mode.  Since the wave terminated
   * before stopping, this event will be reported instead of
   * ::AMD_DBGAPI_EVENT_KIND_WAVE_STOP.
   *
   * The wave that terminated is available by the ::AMD_DBGAPI_EVENT_INFO_WAVE
   * query.  However, the wave will be invalid since it has already terminated.
   * It is the client's responsibility to know what command was being performed
   * and was unable to complete due to the wave terminating.
   */
  AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED = 2,

As the comment says, it's GDB's responsability to know whether the
wave was stepping or being stopped.  Since we now have a wave_info map
with one entry for each wave, that seems like the place to store that
information.  However, I still decided to put all the coordinate
information in its own structure.  I.e., basically renamed the
existing wave_info to wave_coordinates, and then added a new wave_info
structure that holds the new state, plus a wave_coordinates object.
This seemer cleaner as there are places where we only need to
instantiate a wave_coordinates object.

There's an extra twist.  The testcase also exercises stopping at a new
kernel right after the first kernel fully exits.  In that scenario, we
were hitting this assertion after the first kernel fully exits and the
hit of the breakpoint at the second kernel is handled:

 [amd-dbgapi] process_event_queue: Pulled event from dbgapi: event_id.handle = 26, event_kind = WAVE_STOP
 [amd-dbgapi-lib] suspending queue_3, queue_2, queue_1 (refresh wave list)
 ../../src/gdb/amd-dbgapi-target.c:1625: internal-error: amd_dbgapi_thread_deleted: Assertion `it != info->wave_info_map.end ()' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.

This is the exact same problem as above, just a different
manifestation.  In this scenario, we end up in update_thread_list
successfully deleting the exited thread (because it was no longer the
current thread) that was incorrectly added by infrun.c.  Because it
was added by infrun.c and not by amd-dbgapi-target.c:add_gpu_thread,
it doesn't have an entry in the wave_info map, so
amd_dbgapi_thread_deleted trips on this assertion:

      gdb_assert (it != info->wave_info_map.end ());

here:

  ...
  -> stop_all_threads
   -> update_thread_list
    -> target_update_thread_list
     -> amd_dbgapi_target::update_thread_list
      -> thread_db_target::update_thread_list
       -> linux_nat_target::update_thread_list
	-> delete_exited_threads
	 -> delete_thread
	  -> delete_thread_1
	   -> gdb::observers::observable<thread_info*>::notify
	    -> amd_dbgapi_thread_deleted
	     -> internal_error_loc

The testcase thus tries both running to exit after the first kernel
exits, and running to a breakpoint in a second kernel after the first
kernel exits.

Change-Id: I43a66f060c35aad1fe0d9ff022ce2afd0537f028
---
 gdb/amd-dbgapi-target.c | 197 ++++++++++++++++++++++++++++++----------
 1 file changed, 149 insertions(+), 48 deletions(-)

diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
index 86102b7fb03..06f9e8c5f9c 100644
--- a/gdb/amd-dbgapi-target.c
+++ b/gdb/amd-dbgapi-target.c
@@ -109,10 +109,9 @@ get_amd_dbgapi_target_inferior_created_observer_token ()
   return amd_dbgapi_target_inferior_created_observer_token;
 }
 
-/* A type holding coordinate, etc. info for a given wave.  We cache
-   this because we need this information after a wave exits.  */
+/* A type holding coordinates, etc. info for a given wave.  */
 
-struct wave_info
+struct wave_coordinates
 {
   /* The wave.  Set by the ctor.  */
   amd_dbgapi_wave_id_t wave_id;
@@ -125,11 +124,44 @@ struct wave_info
   uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX};
   uint32_t wave_in_group = UINT32_MAX;
 
-  explicit wave_info (amd_dbgapi_wave_id_t wave_id);
+  explicit wave_coordinates (amd_dbgapi_wave_id_t wave_id)
+    : wave_id (wave_id)
+  {}
 
-  /* Return the target ID string for the wave this wave_info is
+  /* Return the target ID string for the wave this wave_coordinates is
      for.  */
   std::string to_string () const;
+
+  /* Pull out coordinates info from the amd-dbgapi library.  */
+  void fetch ();
+};
+
+/* A type holding info about a given wave.  */
+
+struct wave_info
+{
+  /* We cache the coordinates info because we need it after a wave
+     exits.  The wave's ID is here.  */
+  wave_coordinates coords;
+
+  /* The last resume_mode passed to amd_dbgapi_wave_resume for this
+     wave.  We track this because we are guaranteed to see a
+     WAVE_COMMAND_TERMINATED event if a stepping wave terminates, and
+     we need to know to not delete such a wave until we process that
+     event.  */
+  amd_dbgapi_resume_mode_t last_resume_mode = AMD_DBGAPI_RESUME_MODE_NORMAL;
+
+  /* Whether we've called amd_dbgapi_wave_stop for this wave and are
+     waiting for its stop event.  Similarly, we track this because
+     we're guaranteed to get a WAVE_COMMAND_TERMINATED event if the
+     wave terminates while being stopped.  */
+  bool stopping = false;
+
+  explicit wave_info (amd_dbgapi_wave_id_t wave_id)
+    : coords (wave_id)
+  {
+    coords.fetch ();
+  }
 };
 
 /* Big enough to hold the size of the largest register in bytes.  */
@@ -275,6 +307,19 @@ static struct amd_dbgapi_target the_amd_dbgapi_target;
 static const registry<inferior>::key<amd_dbgapi_inferior_info>
   amd_dbgapi_inferior_data;
 
+/* Fetch the amd_dbgapi_inferior_info data for the given inferior.  */
+
+static struct amd_dbgapi_inferior_info *
+get_amd_dbgapi_inferior_info (struct inferior *inferior)
+{
+  amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (inferior);
+
+  if (info == nullptr)
+    info = amd_dbgapi_inferior_data.emplace (inferior, inferior);
+
+  return info;
+}
+
 /* The async event handler registered with the event loop, indicating that we
    might have events to report to the core and that we'd like our wait method
    to be called.
@@ -289,7 +334,7 @@ static const registry<inferior>::key<amd_dbgapi_inferior_info>
 static async_event_handler *amd_dbgapi_async_event_handler = nullptr;
 
 std::string
-wave_info::to_string () const
+wave_coordinates::to_string () const
 {
   std::string str = "AMDGPU Wave";
 
@@ -319,37 +364,41 @@ wave_info::to_string () const
   return str;
 }
 
-wave_info::wave_info (amd_dbgapi_wave_id_t wave_id)
-  : wave_id (wave_id)
-{
-}
-
-/* Read in wave_info for WAVE_ID.  */
-
-static wave_info
-get_wave_info (amd_dbgapi_wave_id_t wave_id)
+void
+wave_coordinates::fetch ()
 {
-  wave_info res (wave_id);
-
   /* Any field that fails to be read is left with its in-class
      initialized value, which is printed as "?".  */
 
   amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
-			    sizeof (res.agent_id), &res.agent_id);
+			    sizeof (agent_id), &agent_id);
   amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
-			    sizeof (res.queue_id), &res.queue_id);
+			    sizeof (queue_id), &queue_id);
   amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
-			    sizeof (res.dispatch_id), &res.dispatch_id);
+			    sizeof (dispatch_id), &dispatch_id);
 
   amd_dbgapi_wave_get_info (wave_id,
 			    AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
-			    sizeof (res.group_ids), &res.group_ids);
+			    sizeof (group_ids), &group_ids);
 
   amd_dbgapi_wave_get_info (wave_id,
 			    AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
-			    sizeof (res.wave_in_group), &res.wave_in_group);
+			    sizeof (wave_in_group), &wave_in_group);
+}
+
+/* Get the wave_info object for TP, from the wave_info map.  It is
+   assumed that the wave is in the map.  */
+
+static wave_info &
+get_thread_wave_info (thread_info *tp)
+{
+  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (tp->inf);
+  amd_dbgapi_wave_id_t wave_id = get_amd_dbgapi_wave_id (tp->ptid);
+
+  auto it = info->wave_info_map.find (wave_id.handle);
+  gdb_assert (it != info->wave_info_map.end ());
 
-  return res;
+  return it->second;
 }
 
 /* Clear our async event handler.  */
@@ -370,19 +419,6 @@ async_event_handler_mark ()
   mark_async_event_handler (amd_dbgapi_async_event_handler);
 }
 
-/* Fetch the amd_dbgapi_inferior_info data for the given inferior.  */
-
-static struct amd_dbgapi_inferior_info *
-get_amd_dbgapi_inferior_info (struct inferior *inferior)
-{
-  amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (inferior);
-
-  if (info == nullptr)
-    info = amd_dbgapi_inferior_data.emplace (inferior, inferior);
-
-  return info;
-}
-
 /* Set forward progress requirement to REQUIRE for all processes of PROC_TARGET
    matching PTID.  */
 
@@ -565,12 +601,12 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid)
 
   auto it = info->wave_info_map.find (wave_id.handle);
   if (it != info->wave_info_map.end ())
-    return it->second.to_string ();
+    return it->second.coords.to_string ();
 
   /* A wave we don't know about.  Shouldn't usually happen, but
      asserting and bringing down the session is a bit too harsh.  Just
      print all unknown info as "?"s.  */
-  return wave_info (wave_id).to_string ();
+  return wave_coordinates (wave_id).to_string ();
 }
 
 const char *
@@ -694,16 +730,24 @@ amd_dbgapi_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
 
       amd_dbgapi_wave_id_t wave_id = get_amd_dbgapi_wave_id (thread->ptid);
       amd_dbgapi_status_t status;
+
+      wave_info &wi = get_thread_wave_info (thread);
+      amd_dbgapi_resume_mode_t &resume_mode = wi.last_resume_mode;
+      amd_dbgapi_exceptions_t wave_exception;
       if (thread->ptid == inferior_ptid)
-	status = amd_dbgapi_wave_resume (wave_id,
-					 (step
-					  ? AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
-					  : AMD_DBGAPI_RESUME_MODE_NORMAL),
-					 exception);
+	{
+	  resume_mode = (step
+			 ? AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
+			 : AMD_DBGAPI_RESUME_MODE_NORMAL);
+	  wave_exception = exception;
+	}
       else
-	status = amd_dbgapi_wave_resume (wave_id, AMD_DBGAPI_RESUME_MODE_NORMAL,
-					 AMD_DBGAPI_EXCEPTION_NONE);
+	{
+	  resume_mode = AMD_DBGAPI_RESUME_MODE_NORMAL;
+	  wave_exception = AMD_DBGAPI_EXCEPTION_NONE;
+	}
 
+      status = amd_dbgapi_wave_resume (wave_id, resume_mode, wave_exception);
       if (status != AMD_DBGAPI_STATUS_SUCCESS
 	  /* Ignore the error that wave is no longer valid as that could
 	     indicate that the process has exited.  GDB treats resuming a
@@ -711,6 +755,8 @@ amd_dbgapi_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
 	  && status != AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID)
 	error (_("wave_resume for wave_%ld failed (%s)"), wave_id.handle,
 	       get_status_string (status));
+
+      wi.stopping = false;
     }
 }
 
@@ -725,6 +771,15 @@ amd_dbgapi_target::commit_resumed ()
   require_forward_progress (minus_one_ptid, proc_target, true);
 }
 
+/* Return a string version of RESUME_MODE, for debug log purposes.  */
+static const char *
+resume_mode_to_string (amd_dbgapi_resume_mode_t resume_mode)
+{
+  return (resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
+	  ? "step"
+	  : "normal");
+}
+
 void
 amd_dbgapi_target::stop (ptid_t ptid)
 {
@@ -758,7 +813,11 @@ amd_dbgapi_target::stop (ptid_t ptid)
 
 	  status = amd_dbgapi_wave_stop (wave_id);
 	  if (status == AMD_DBGAPI_STATUS_SUCCESS)
-	    return;
+	    {
+	      wave_info &wi = get_thread_wave_info (thread);
+	      wi.stopping = true;
+	      return;
+	    }
 
 	  if (status != AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID)
 	    error (_("wave_stop for wave_%ld failed (%s)"), wave_id.handle,
@@ -772,6 +831,23 @@ amd_dbgapi_target::stop (ptid_t ptid)
 	 could have terminated since the last time the wave list was
 	 refreshed.  */
 
+      wave_info &wi = get_thread_wave_info (thread);
+      wi.stopping = true;
+
+      amd_dbgapi_debug_printf ("got AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID "
+			       "for wave_%ld, last_resume_mode=%s, "
+			       "report_thread_events=%d",
+			       wave_id.handle,
+			       resume_mode_to_string (wi.last_resume_mode),
+			       m_report_thread_events);
+
+      /* If the wave was stepping when it terminated, then it is
+	 guaranteed that we will see a WAVE_COMMAND_TERMINATED event
+	 for it.  Don't report a thread exit event or delete the
+	 thread yet, until we see such event.  */
+      if (wi.last_resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP)
+	return;
+
       if (m_report_thread_events)
 	{
 	  get_amd_dbgapi_inferior_info (thread->inf)->wave_events.emplace_back
@@ -1018,7 +1094,7 @@ add_gpu_thread (inferior *inf, ptid_t wave_ptid)
   auto wave_id = get_amd_dbgapi_wave_id (wave_ptid);
 
   if (!info->wave_info_map.try_emplace (wave_id.handle,
-					get_wave_info (wave_id)).second)
+					wave_info (wave_id)).second)
     internal_error ("wave ID %ld already in map", wave_id.handle);
 
   /* Create new GPU threads silently to avoid spamming the terminal
@@ -1770,7 +1846,32 @@ amd_dbgapi_target::update_thread_list ()
 	    auto it = threads.find (tp->ptid.tid ());
 
 	    if (it == threads.end ())
-	      delete_thread_silent (tp);
+	      {
+		auto wave_id = get_amd_dbgapi_wave_id (tp->ptid);
+		wave_info &wi = get_thread_wave_info (tp);
+
+		/* Waves that were stepping or in progress of being
+		   stopped are guaranteed to report a
+		   WAVE_COMMAND_TERMINATED event if they terminate.
+		   Don't delete such threads until we see the
+		   event.  */
+		if (wi.last_resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
+		    || wi.stopping)
+		  {
+		    amd_dbgapi_debug_printf
+		      ("wave_%ld disappeared, keeping it"
+		       " (last_resume_mode=%s, stopping=%d)",
+		       wave_id.handle,
+		       resume_mode_to_string (wi.last_resume_mode),
+		       wi.stopping);
+		  }
+		else
+		  {
+		    amd_dbgapi_debug_printf ("wave_%ld disappeared, deleting it",
+					     wave_id.handle);
+		    delete_thread_silent (tp);
+		  }
+	      }
 	    else
 	      threads.erase (it);
 	  }
-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 7/8] Add tests for s_endpgm handling
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
                   ` (5 preceding siblings ...)
  2023-12-14 20:22 ` [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-14 20:22 ` [PATCH 8/8] Add tests for handling of vanishing threads that were stepping/stopping Pedro Alves
  2023-12-20 21:24 ` [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
  8 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches; +Cc: Laurent Morichetti, Simon Marchi

[Not for commit.  This won't work with current upstream, unfortunately.]

Check that a wave can halt at an s_endpgm instruction by
single-stepping or displaced stepping the instruction preceding the
s_endpgm.

Check that a wave can single-step s_endpgm, and that it can step over
a breakpoint placed at an s_endpgm by displaced-stepping the
instruction.  Test all three cases:

 - no step-over (stepping without a breakpoint installed)
 - in-line step-over
 - displaced step-over

Check the same with "set scheduler-locking on".

Check that GDB always prints the exited wave's ID when aborting a
command due to thread exit, and that it prints a valid ID with no "?"
in it.

This is named gdb.rocm/step-over-kernel-exit.cpp and not
gdb.rocm/s_endpgm.cpp because we will most probably want to extend
this to test s_sendmsg deallow vgprs before s_endpgm as well.

Co-Authored-By: Laurent Morichetti <laurent.morichetti@amd.com>
Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I6db617ac009383698e1c66744d68e70b1d1ca90f
---
 .../gdb.rocm/step-over-kernel-exit.cpp        |  48 ++++++++
 .../gdb.rocm/step-over-kernel-exit.exp        | 108 ++++++++++++++++++
 2 files changed, 156 insertions(+)
 create mode 100644 gdb/testsuite/gdb.rocm/step-over-kernel-exit.cpp
 create mode 100644 gdb/testsuite/gdb.rocm/step-over-kernel-exit.exp

diff --git a/gdb/testsuite/gdb.rocm/step-over-kernel-exit.cpp b/gdb/testsuite/gdb.rocm/step-over-kernel-exit.cpp
new file mode 100644
index 00000000000..61f1b431df1
--- /dev/null
+++ b/gdb/testsuite/gdb.rocm/step-over-kernel-exit.cpp
@@ -0,0 +1,48 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+   Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <stdio.h>
+#include <hip/hip_runtime.h>
+
+#define CHECK(cmd)							\
+  do									\
+    {									\
+      hipError_t error = cmd;						\
+      if (error != hipSuccess)						\
+	{								\
+	  fprintf (stderr, "error: '%s'(%d) at %s:%d\n",		\
+		   hipGetErrorString (error), error,			\
+		   __FILE__, __LINE__);					\
+	  exit (EXIT_FAILURE);						\
+	}								\
+    } while (0)
+
+__global__ void
+kernel ()
+{
+  asm ("before_s_endpgm_insn: s_nop 0");
+  asm ("s_endpgm_insn: s_endpgm"); /* set breakpoint here */
+}
+
+int
+main (int argc, char **argv)
+{
+  kernel<<<1, 1>>> ();
+  CHECK (hipDeviceSynchronize ());
+}
diff --git a/gdb/testsuite/gdb.rocm/step-over-kernel-exit.exp b/gdb/testsuite/gdb.rocm/step-over-kernel-exit.exp
new file mode 100644
index 00000000000..484298ffa3e
--- /dev/null
+++ b/gdb/testsuite/gdb.rocm/step-over-kernel-exit.exp
@@ -0,0 +1,108 @@
+# Copyright (C) 2023 Free Software Foundation, Inc.
+# Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
+
+# This file is part of GDB.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test stopping at and single-stepping and displaced-stepping an
+# s_endpgm instruction.
+
+load_lib rocm.exp
+
+standard_testfile .cpp
+
+require allow_hipcc_tests
+
+if { [build_executable "failed to prepare" \
+	  $testfile $srcfile {debug hip}] == -1 } {
+    return -1
+}
+
+# Test stepping over an s_endpgm instruction.
+#
+# STEP_OVER_MODE can be one of:
+#
+#   - none: don't put a breakpoint on the s_endpgm instruction.
+
+#   - inline: put a breakpoint on the s_endpgm instruction, and use
+#     in-line stepping to step over it (disable displaced-stepping).
+
+#   - displaced: same, but use displaced stepping.
+#
+# SCHEDLOCK can be "on" or "off".
+
+proc do_test { step_over_mode schedlock } {
+    with_rocm_gpu_lock {
+	clean_restart $::binfile
+
+	if { $step_over_mode == "none" } {
+	    # Nothing to do.
+	} elseif { $step_over_mode == "inline" } {
+	    gdb_test_no_output "set displaced-stepping off"
+	} elseif { $step_over_mode == "displaced" } {
+	    gdb_test_no_output "set displaced-stepping on"
+	} else {
+	    error "Invalid step_over_mode value: $step_over_mode"
+	}
+
+	if ![runto_main] {
+	    fail "can't run to main"
+	    return -1
+	}
+
+	# Put a breakpoint on the instruction before s_endpgm,
+	# continue to it.
+	gdb_breakpoint "before_s_endpgm_insn" allow-pending
+	gdb_continue_to_breakpoint "before_s_endpgm_insn"
+
+	gdb_test_no_output "set scheduler-locking $schedlock"
+
+	gdb_test "stepi" \
+	    "\"s_endpgm_insn: s_endpgm\".*" \
+	    "single-step instruction before s_endpgm"
+
+	gdb_test "x/i \$pc" \
+	    "$::hex <\[^\r\n\]*>:\[ \t\]+s_endpgm.*" \
+	    "stopped at s_endpgm"
+
+	# If testing a step-over is requested, place a breakpoint at
+	# the current instruction to force a step-over.
+	if { $step_over_mode != "none" } {
+	    gdb_test "break s_endpgm_insn" "Breakpoint $::decimal at $::hex.*"
+	}
+
+	set d $::decimal
+	set wave_target_id_re "AMDGPU Wave $d:$d:$d:1 \\(0,0,0\\)/0"
+
+	set selected_thread_before \
+	    [get_integer_valueof "\$_thread" 0 "get selected thread before"]
+
+	gdb_test "stepi" \
+	    "\r\n\[$wave_target_id_re exited\]\r\nCommand aborted, thread exited\\." \
+	    "single-step s_endpgm"
+
+	# Check that the selected thread didn't change, and that GDB
+	# manages to print the exited wave's target ID properly.
+	gdb_test "thread" \
+	    "\r\n\[Current thread is $selected_thread_before \\($wave_target_id_re\\) \\(exited\\)\]" \
+	    "exited wave target id"
+    }
+}
+
+foreach_with_prefix step_over_mode {none inline displaced} {
+    foreach_with_prefix schedlock {off on} {
+	do_test $step_over_mode $schedlock
+    }
+}
-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 8/8] Add tests for handling of vanishing threads that were stepping/stopping
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
                   ` (6 preceding siblings ...)
  2023-12-14 20:22 ` [PATCH 7/8] Add tests for s_endpgm handling Pedro Alves
@ 2023-12-14 20:22 ` Pedro Alves
  2023-12-20 21:24 ` [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
  8 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-14 20:22 UTC (permalink / raw)
  To: gdb-patches

Not for commit.  This won't work with current upstream, unfortunately.

Change-Id: I43a66f060c35aad1fe0d9ff022ce2afd0537f028
---
 .../gdb.rocm/continue-over-kernel-exit.cpp    |  66 +++++++
 .../gdb.rocm/continue-over-kernel-exit.exp    | 165 ++++++++++++++++++
 2 files changed, 231 insertions(+)
 create mode 100644 gdb/testsuite/gdb.rocm/continue-over-kernel-exit.cpp
 create mode 100644 gdb/testsuite/gdb.rocm/continue-over-kernel-exit.exp

diff --git a/gdb/testsuite/gdb.rocm/continue-over-kernel-exit.cpp b/gdb/testsuite/gdb.rocm/continue-over-kernel-exit.cpp
new file mode 100644
index 00000000000..bad7064f30b
--- /dev/null
+++ b/gdb/testsuite/gdb.rocm/continue-over-kernel-exit.cpp
@@ -0,0 +1,66 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+   Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <hip/hip_runtime.h>
+#include <stdio.h>
+
+#define CHECK(cmd)						\
+  do								\
+    {								\
+      hipError_t error = cmd;					\
+      if (error != hipSuccess)					\
+	{							\
+	  fprintf (stderr, "error: '%s'(%d) at %s:%d\n",	\
+		   hipGetErrorString (error), error,		\
+		   __FILE__, __LINE__);				\
+	  exit (EXIT_FAILURE);					\
+	}							\
+    } while (0)
+
+__global__ void
+kern ()
+{
+  asm ("s_endpgm_insn: s_endpgm");
+}
+
+__global__ void
+second_kernel ()
+{
+}
+
+int
+main ()
+{
+  /* Use 1-thread blocks to easily control number of waves.  */
+  size_t blocksize = 1;
+  size_t gridsize = 10;
+
+  kern<<<gridsize, blocksize>>> ();
+
+  /* Stopping at this second kernel after the first kernel completely
+     finishes makes GDB refresh its thread list while the
+     amd-dbgapi-target is still active, which triggers different code
+     paths in GDB that lead to deleting exited threads.  We test both
+     stopping here, and not stopping here.  */
+  second_kernel<<<1, 1>>> ();
+
+  CHECK (hipDeviceSynchronize ());
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.rocm/continue-over-kernel-exit.exp b/gdb/testsuite/gdb.rocm/continue-over-kernel-exit.exp
new file mode 100644
index 00000000000..ed98ab4697b
--- /dev/null
+++ b/gdb/testsuite/gdb.rocm/continue-over-kernel-exit.exp
@@ -0,0 +1,165 @@
+# Copyright (C) 2023 Free Software Foundation, Inc.
+# Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
+
+# This file is part of GDB.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test stopping at and continuing from a s_endpgm instruction, with
+# and without stepping over a breakpoint on top of it.
+
+load_lib rocm.exp
+
+standard_testfile .cpp
+
+require allow_hipcc_tests
+
+if { [build_executable "failed to prepare" \
+	  $testfile $srcfile {debug hip}] == -1 } {
+    return -1
+}
+
+# Test continuing from an s_endpgm instruction with multiple waves.
+#
+# STEP_OVER_MODE can be one of:
+#
+#   - none: don't put a breakpoint on the s_endpgm instruction.
+#
+#   - inline: put a breakpoint on the s_endpgm instruction, and use
+#     in-line stepping to step over it (disable displaced-stepping).
+#
+#   - displaced: same, but use displaced stepping.
+#
+# FINISH_HOW can be one of:
+#
+#   - second_kernel: stop at a breakpoint at "second_kernel" after the
+#     first kernel finishes.
+#
+#   - normal_exit: let the inferior exit normally after the first
+#     kernel finishes, without stopping at second_kernel.
+#
+proc do_test { step_over_mode finish_how } {
+    with_rocm_gpu_lock {
+	clean_restart $::binfile
+
+	if { $step_over_mode == "none" } {
+	    # Nothing to do.
+	} elseif { $step_over_mode == "inline" } {
+	    gdb_test_no_output "set displaced-stepping off"
+	} elseif { $step_over_mode == "displaced" } {
+	    gdb_test_no_output "set displaced-stepping on"
+	} else {
+	    error "Invalid step_over_mode value: $step_over_mode"
+	}
+
+	if ![runto_main] {
+	    fail "can't run to main"
+	    return -1
+	}
+
+	# Put a breakpoint on the s_endpgm instruction, and continue
+	# to it.  If testing a step-over is requested, leave the
+	# breakpoint inserted after the initial hit to force a
+	# step-over.
+	if { $step_over_mode != "none" } {
+	    gdb_breakpoint "s_endpgm_insn" allow-pending
+	    set bp_hits 10
+	} else {
+	    gdb_breakpoint "s_endpgm_insn" allow-pending temporary
+	    set bp_hits 0
+	}
+	gdb_continue_to_breakpoint "s_endpgm_insn"
+
+	gdb_test "x/i \$pc" \
+	    "$::hex <\[^\r\n\]*>:\[ \t\]+s_endpgm.*" \
+	    "stopped at s_endpgm"
+
+	if {$finish_how == "second_kernel"} {
+	    gdb_breakpoint "second_kernel"
+	}
+
+	for {set i 1} {$i < $bp_hits} {incr i} {
+	    with_test_prefix "iter $i" {
+		gdb_test_multiple "continue" "continue to s_endpgm" {
+		    -re -wrap "Continuing\\.\r\n.*hit Breakpoint $::decimal, .* kern .*\"s_endpgm_insn: .*" {
+			pass $gdb_test_name
+		    }
+		}
+	    }
+	}
+
+	# GDB used to mishandle wave exits resulting in
+	# WAVE_COMMAND_TERMINATED events being left in the
+	# amd-dbgapi-target's event queue _after_ the wave had already
+	# been removed from the GDB thread list.  That in turn would
+	# result in seeing already-dead waves re-added to GDB's thread
+	# list, and then immediately deleted, with these user
+	# notifications:
+	#
+	#  [New AMDGPU Wave ?:?:?:1 (?,?,?)/?]
+	#  [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
+	#
+	# The test below fails if we see any "?" in the wave's target
+	# id.  It is written this way instead of a tighter match to
+	# increase the chances of the problem being caught if the GDB
+	# output ever changes.
+	#
+	# The bad thread additions mentioned above were done without
+	# adding the amd-dbgapi-target-specific wave info to the
+	# amd-dbgapi-target data structures, resulting in GDB
+	# assertion failures in the amd-dbgapi-target if GDB handled
+	# any other stop event after the first kernel finishes.  We
+	# exercise that with FINISH_HOW=second_kernel.
+
+	set bad_coords 0
+	set exited_normally 0
+	set second_kernel_breakpoint 0
+	gdb_test_multiple "continue" "last continue" -lbl {
+	    -re "AMDGPU Wave (\[^\r\n\]*)(?=\r\n)" {
+		set wave_coords $expect_out(1,string)
+		if {[string first "?" $wave_coords] != -1} {
+		    incr bad_coords
+		}
+		exp_continue
+	    }
+	    -re "Inferior 1 \\(process $::decimal\\) exited normally" {
+		incr exited_normally
+		exp_continue
+	    }
+	    -re "hit Breakpoint $::decimal, with lane 0, second_kernel " {
+		incr second_kernel_breakpoint
+		exp_continue
+	    }
+	    -re -wrap "" {
+		if {$bad_coords > 0} {
+		    fail "$gdb_test_name (bad coords)"
+		} elseif {[expr ($second_kernel_breakpoint + $exited_normally) > 1]} {
+		    fail "$gdb_test_name (bad finish)"
+		} elseif {$finish_how == "second_kernel" && $second_kernel_breakpoint != 1} {
+		    fail "$gdb_test_name (no second_kernel breakpoint)"
+		} elseif {$finish_how == "normal_exit" && $exited_normally != 1} {
+		    fail "$gdb_test_name (not normal exit)"
+		} else {
+		    pass $gdb_test_name
+		}
+	    }
+	}
+    }
+}
+
+foreach_with_prefix step_over_mode {none inline displaced} {
+    foreach_with_prefix finish_how {second_kernel normal_exit} {
+	do_test $step_over_mode $finish_how
+    }
+}
-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/8] Fix thread target ID of exited waves
  2023-12-14 20:22 ` [PATCH 5/8] Fix thread target ID of exited waves Pedro Alves
@ 2023-12-15 10:51   ` Lancelot SIX
  0 siblings, 0 replies; 12+ messages in thread
From: Lancelot SIX @ 2023-12-15 10:51 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Hi Pedro,

Thanks for doing this.

I have minor suggestions below, but I am happy if you prefer what you
have here.

On Thu, Dec 14, 2023 at 08:22:35PM +0000, Pedro Alves wrote:
> Currently, if you step over kernel exit, you see:
> 
>  stepi
>  [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
>  Command aborted, thread exited.
>  (gdb)
> 
> Those '?' are because the thread/wave is already gone by the time GDB
> prints the "exited" notification, we can't ask dbgapi for any info
> about the wave anymore.
> 
> This commit fixes it by caching the wave's coordinates as soon as GDB
> sees the wave for the first time, and making
> amd_dbgapi_target::pid_to_str use the cached info.
> 
> At first I thought of clearing the wave_info object from a
> thread_exited observer.  However, that is too soon, resulting in this:
> 
>  (gdb) si
>  [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
>  Command aborted, thread exited.
>  (gdb) thread
>  [Current thread is 6 (AMDGPU Wave ?:?:?:0 (?,?,?)/?) (exited)]
> 
> We need instead to clear the wave info when the thread is ultimately
> deleted, so we get:
> 
>  (gdb) si
>  [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
>  Command aborted, thread exited.
>  (gdb) thread
>  [Current thread is 6 (AMDGPU Wave 1:4:1:1 (0,0,0)/0) (exited)]
> 
> And for that, we need a new thread_deleted observable.
> 
> Change-Id: I6c3e22541f051e1205f75eb657b04dc15e547580
> ---
>  gdb/amd-dbgapi-target.c | 168 +++++++++++++++++++++++++++++++---------
>  gdb/observable.c        |   1 +
>  gdb/observable.h        |   5 ++
>  gdb/thread.c            |   2 +
>  4 files changed, 138 insertions(+), 38 deletions(-)
> 
> diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
> index 18c0543c40e..86102b7fb03 100644
> --- a/gdb/amd-dbgapi-target.c
> +++ b/gdb/amd-dbgapi-target.c
> @@ -109,6 +109,28 @@ get_amd_dbgapi_target_inferior_created_observer_token ()
>    return amd_dbgapi_target_inferior_created_observer_token;
>  }
>  
> +/* A type holding coordinate, etc. info for a given wave.  We cache
> +   this because we need this information after a wave exits.  */
> +
> +struct wave_info
> +{
> +  /* The wave.  Set by the ctor.  */
> +  amd_dbgapi_wave_id_t wave_id;
> +
> +  /* All these fields are initialized here to a value that is printed
> +     as "?".  */
> +  amd_dbgapi_dispatch_id_t dispatch_id {};
> +  amd_dbgapi_queue_id_t queue_id {};
> +  amd_dbgapi_agent_id_t agent_id {};

We could be more explicit here and use the semantically equivalent
notation:

  amd_dbgapi_dispatch_id_t dispatch_id = AMD_DBGAPI_DISPATCH_NONE;
  amd_dbgapi_queue_id_t queue_id = AMD_DBGAPI_QUEUE_NONE;
  amd_dbgapi_agent_id_t agent_id = AMD_DBGAPI_AGENT_NONE;


> +  uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX};
> +  uint32_t wave_in_group = UINT32_MAX;
> +
> +  explicit wave_info (amd_dbgapi_wave_id_t wave_id);
> +
> +  /* Return the target ID string for the wave this wave_info is
> +     for.  */
> +  std::string to_string () const;
> +};
>  
>  /* Big enough to hold the size of the largest register in bytes.  */
>  #define AMDGPU_MAX_REGISTER_SIZE 256
> @@ -160,6 +182,16 @@ struct amd_dbgapi_inferior_info
>  
>    /* List of pending events the amd-dbgapi target retrieved from the dbgapi.  */
>    std::list<std::pair<ptid_t, target_waitstatus>> wave_events;
> +
> +  /* Map of wave ID to wave_info.  We cache wave_info objects because
> +     we need to access the info after the wave is gone, in the thread
> +     exit nofication.  E.g.:
> +	[AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
> +
> +     wave_info objects are added when we first see the wave, and
> +     removed from a thread_deleted observer.  */
> +  std::unordered_map<decltype (amd_dbgapi_wave_id_t::handle), wave_info>
> +    wave_info_map;
>  };
>  
>  static amd_dbgapi_event_id_t process_event_queue
> @@ -256,56 +288,70 @@ static const registry<inferior>::key<amd_dbgapi_inferior_info>
>  
>  static async_event_handler *amd_dbgapi_async_event_handler = nullptr;
>  
> -/* Return the target id string for a given wave.  */
> -
> -static std::string
> -wave_target_id_string (amd_dbgapi_wave_id_t wave_id)
> +std::string
> +wave_info::to_string () const
>  {
> -  amd_dbgapi_dispatch_id_t dispatch_id;
> -  amd_dbgapi_queue_id_t queue_id;
> -  amd_dbgapi_agent_id_t agent_id;
> -  uint32_t group_ids[3], wave_in_group;
>    std::string str = "AMDGPU Wave";
>  
> -  amd_dbgapi_status_t status
> -    = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
> -				sizeof (agent_id), &agent_id);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (agent_id.handle != 0

We could use `agent_id != AMD_DBGAPI_AGENT_NONE` here.

>  	  ? string_printf (" %ld", agent_id.handle)
>  	  : " ?");
>  
> -  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
> -				     sizeof (queue_id), &queue_id);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (queue_id.handle != 0

Similarly, `queue_id != AMD_DBGAPI_QUEUE_NONE`.

>  	  ? string_printf (":%ld", queue_id.handle)
>  	  : ":?");
>  
> -  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
> -				     sizeof (dispatch_id), &dispatch_id);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (dispatch_id.handle != 0

Similarly, `dispatch_id != AMD_DBGAPI_DISPATCH_NONE`.

>  	  ? string_printf (":%ld", dispatch_id.handle)
>  	  : ":?");
>  
>    str += string_printf (":%ld", wave_id.handle);
>  
> -  status = amd_dbgapi_wave_get_info (wave_id,
> -				     AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
> -				     sizeof (group_ids), &group_ids);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (group_ids[0] != UINT32_MAX
>  	  ? string_printf (" (%d,%d,%d)", group_ids[0], group_ids[1],
>  			   group_ids[2])
>  	  : " (?,?,?)");
>  
> -  status = amd_dbgapi_wave_get_info
> -    (wave_id, AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
> -     sizeof (wave_in_group), &wave_in_group);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (wave_in_group != UINT32_MAX
>  	  ? string_printf ("/%d", wave_in_group)
>  	  : "/?");
>  
>    return str;
>  }
>  
> +wave_info::wave_info (amd_dbgapi_wave_id_t wave_id)
> +  : wave_id (wave_id)
> +{
> +}
> +
> +/* Read in wave_info for WAVE_ID.  */
> +
> +static wave_info
> +get_wave_info (amd_dbgapi_wave_id_t wave_id)
> +{
> +  wave_info res (wave_id);
> +
> +  /* Any field that fails to be read is left with its in-class
> +     initialized value, which is printed as "?".  */
> +
> +  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
> +			    sizeof (res.agent_id), &res.agent_id);
> +  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
> +			    sizeof (res.queue_id), &res.queue_id);
> +  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
> +			    sizeof (res.dispatch_id), &res.dispatch_id);
> +
> +  amd_dbgapi_wave_get_info (wave_id,
> +			    AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
> +			    sizeof (res.group_ids), &res.group_ids);
> +
> +  amd_dbgapi_wave_get_info (wave_id,
> +			    AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
> +			    sizeof (res.wave_in_group), &res.wave_in_group);
> +
> +  return res;
> +}
> +
>  /* Clear our async event handler.  */
>  
>  static void
> @@ -510,7 +556,21 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid)
>    if (!ptid_is_gpu (ptid))
>      return beneath ()->pid_to_str (ptid);
>  
> -  return wave_target_id_string (get_amd_dbgapi_wave_id (ptid));
> +  process_stratum_target *proc_target = current_inferior ()->process_target ();
> +  inferior *inf = find_inferior_pid (proc_target, ptid.pid ());
> +  gdb_assert (inf != nullptr);
> +  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
> +
> +  auto wave_id = get_amd_dbgapi_wave_id (ptid);
> +
> +  auto it = info->wave_info_map.find (wave_id.handle);
> +  if (it != info->wave_info_map.end ())
> +    return it->second.to_string ();
> +
> +  /* A wave we don't know about.  Shouldn't usually happen, but
> +     asserting and bringing down the session is a bit too harsh.  Just
> +     print all unknown info as "?"s.  */
> +  return wave_info (wave_id).to_string ();
>  }
>  
>  const char *
> @@ -929,6 +989,46 @@ make_gpu_ptid (ptid_t::pid_type pid, amd_dbgapi_wave_id_t wave_id)
>   return ptid_t (pid, 1, wave_id.handle);
>  }
>  
> +/* When a thread is deleted, remove its wave_info from the inferior's
> +   wave_info map.  */
> +
> +static void
> +amd_dbgapi_thread_deleted (thread_info *tp)
> +{
> +  if (tp->inf->target_at (arch_stratum) == &the_amd_dbgapi_target
> +      && ptid_is_gpu (tp->ptid))
> +    {
> +      amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (tp->inf);
> +      auto wave_id = get_amd_dbgapi_wave_id (tp->ptid);
> +      auto it = info->wave_info_map.find (wave_id.handle);
> +      gdb_assert (it != info->wave_info_map.end ());
> +      info->wave_info_map.erase (it);
> +    }
> +}
> +
> +/* Register WAVE_PTID as a new thread in INF's thread list, and record
> +   its wave_info in the inferior's wave_info map.  */
> +
> +static thread_info *
> +add_gpu_thread (inferior *inf, ptid_t wave_ptid)
> +{
> +  process_stratum_target *proc_target = inf->process_target ();
> +  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
> +
> +  auto wave_id = get_amd_dbgapi_wave_id (wave_ptid);
> +
> +  if (!info->wave_info_map.try_emplace (wave_id.handle,
> +					get_wave_info (wave_id)).second)
> +    internal_error ("wave ID %ld already in map", wave_id.handle);
> +
> +  /* Create new GPU threads silently to avoid spamming the terminal
> +     with thousands of "[New Thread ...]" messages.  */
> +  thread_info *thread = add_thread_silent (proc_target, wave_ptid);
> +  set_running (proc_target, wave_ptid, true);
> +  set_executing (proc_target, wave_ptid, true);
> +  return thread;
> +}
> +
>  /* Process an event that was just pulled out of the amd-dbgapi library.  */
>  
>  static void
> @@ -1015,13 +1115,7 @@ process_one_event (amd_dbgapi_event_id_t event_id,
>  
>  	    thread_info *thread = proc_target->find_thread (event_ptid);
>  	    if (thread == nullptr)
> -	      {
> -		/* Silently create new GPU threads to avoid spamming the
> -		   terminal with thousands of "[New Thread ...]" messages.  */
> -		thread = add_thread_silent (proc_target, event_ptid);
> -		set_running (proc_target, event_ptid, true);
> -		set_executing (proc_target, event_ptid, true);
> -	      }
> +	      thread = add_gpu_thread (inf, event_ptid);
>  
>  	    /* If the wave is stopped because of a software breakpoint, the
>  	       program counter needs to be adjusted so that it points to the
> @@ -1686,10 +1780,7 @@ amd_dbgapi_target::update_thread_list ()
>  	{
>  	  ptid_t wave_ptid
>  	    = make_gpu_ptid (inf->pid, amd_dbgapi_wave_id_t {tid});
> -
> -	  add_thread_silent (inf->process_target (), wave_ptid);
> -	  set_running (inf->process_target (), wave_ptid, true);
> -	  set_executing (inf->process_target (), wave_ptid, true);
> +	  add_gpu_thread (inf, wave_ptid);
>  	}
>      }
>  
> @@ -2115,6 +2206,7 @@ _initialize_amd_dbgapi_target ()
>    gdb::observers::inferior_forked.attach (amd_dbgapi_inferior_forked, "amd-dbgapi");
>    gdb::observers::inferior_exit.attach (amd_dbgapi_inferior_exited, "amd-dbgapi");
>    gdb::observers::inferior_pre_detach.attach (amd_dbgapi_inferior_pre_detach, "amd-dbgapi");
> +  gdb::observers::thread_deleted.attach (amd_dbgapi_thread_deleted, "amd-dbgapi");
>  
>    add_basic_prefix_cmd ("amdgpu", no_class,
>  			_("Generic command for setting amdgpu flags."),
> diff --git a/gdb/observable.c b/gdb/observable.c
> index f2e65b11604..29675f3abf3 100644
> --- a/gdb/observable.c
> +++ b/gdb/observable.c
> @@ -46,6 +46,7 @@ DEFINE_OBSERVABLE (all_objfiles_removed);
>  DEFINE_OBSERVABLE (free_objfile);
>  DEFINE_OBSERVABLE (new_thread);
>  DEFINE_OBSERVABLE (thread_exit);
> +DEFINE_OBSERVABLE (thread_deleted);
>  DEFINE_OBSERVABLE (thread_stop_requested);
>  DEFINE_OBSERVABLE (target_resumed);
>  DEFINE_OBSERVABLE (about_to_proceed);
> diff --git a/gdb/observable.h b/gdb/observable.h
> index 32ef65435cc..91a2c871524 100644
> --- a/gdb/observable.h
> +++ b/gdb/observable.h
> @@ -126,6 +126,11 @@ extern observable<thread_info */* t */,
>  		  std::optional<ULONGEST> /* exit_code */,
>  		  bool /* silent */> thread_exit;
>  
> +/* The thread specified by T has been deleted, with delete_thread.
> +   This is called just before the thread_info object is destroyed with
> +   operator delete.  */
> +extern observable<thread_info */* t */> thread_deleted;
> +
>  /* An explicit stop request was issued to PTID.  If PTID equals
>     minus_one_ptid, the request applied to all threads.  If
>     ptid_is_pid(PTID) returns true, the request applied to all
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 85bdbaa6cd8..bd3fe85f3b9 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -527,6 +527,8 @@ delete_thread_1 (thread_info *thr, std::optional<ULONGEST> exit_code,
>    auto it = thr->inf->thread_list.iterator_to (*thr);
>    thr->inf->thread_list.erase (it);
>  
> +  gdb::observers::thread_deleted.notify (thr);
> +
>    delete thr;
>  }
>  
> 
> -- 
> 2.43.0
> 

I am happy whether you do the suggested changes or not. Either way

Approved-By: Lancelot Six <lancelot.six@amd.com> (amdgpu)

Best,
Lancelot.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping
  2023-12-14 20:22 ` [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Pedro Alves
@ 2023-12-15 10:51   ` Lancelot SIX
  0 siblings, 0 replies; 12+ messages in thread
From: Lancelot SIX @ 2023-12-15 10:51 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Hi Pedro,

I have a minor comment below.

On Thu, Dec 14, 2023 at 08:22:36PM +0000, Pedro Alves wrote:
> Downstream, AMD is carrying a testcase
> (gdb.rocm/continue-over-kernel-exit.exp) that exposes a couple issues
> with the amd-dbgapi target's handling of exited threads.  The test
> can't be added upstream yet, unfortunately, due to dependency on DWARF
> extensions that can't be upstreamed yet.  However, it can be found on
> the mailing list on the same series as this patch.
> 
> The test spawns a kernel with a number of waves.  The waves do nothing
> but exit.  There is a breakpoint on the s_endpgm instruction.  Once
> that breakpoint is hit, the test issues a "continue" command.  We
> should see one breakpoint hit per wave, and then the whole program
> exiting.  We do see that, however we also see this:
> 
>  [New AMDGPU Wave ?:?:?:1 (?,?,?)/?]
>  [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
>  *repeat for other waves*
>  ...
>  [Thread 0x7ffff626f640 (LWP 3048491) exited]
>  [Thread 0x7fffeb7ff640 (LWP 3048488) exited]
>  [Inferior 1 (process 3048475) exited normally]
> 
> That "New AMDGPU Wave" output comes from infrun.c itself adding the
> thread to the GDB thread list, because it got an event for a thread
> not on the thread list yet.  The output shows "?"s instead of proper
> coordinates, because the event was a TARGET_WAITKIND_THREAD_EXITED,
> i.e., the wave was already gone when infrun.c added the thread to the
> thread list.
> 
> That shouldn't ever happen for the amd-dbgapi target, threads should
> only ever be added by the backend.
> 
> Note "New AMDGPU Wave ?:?:?:1" is for wave 1.  What happened was that
> wave 1 terminated previously, and a previous call to
> amd_dbgapi_target::update_thread_list() noticed the wave had vanished
> and removed it from the GDB thread list.  However, because the wave
> was stepping when it terminated (due to the displaced step over the
> s_endpgm) instruction, it is guaranteed that the amd-dbgapi library
> queues a WAVE_COMMAND_TERMINATED event for the exit.
> 
> When we process that WAVE_COMMAND_TERMINATED event, in
> amd-dbgapi-target.c:process_one_event, we return it to the core as a
> TARGET_WAITKIND_THREAD_EXITED event:
> 
>  static void
>  process_one_event (amd_dbgapi_event_id_t event_id,
> 		    amd_dbgapi_event_kind_t event_kind)
>  {
>  ...
> 	 if (status == AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID
> 	     && event_kind == AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED)
> 	   ws.set_thread_exited (0);
>  ...
>  }
> 
> Recall the wave is already gone from the GDB thread list.  So when GDB
> sees that TARGET_WAITKIND_THREAD_EXITED event for a thread it doesn't
> know about, it adds the thread to the thread list, resulting in that:
> 
>  [New AMDGPU Wave ?:?:?:1 (?,?,?)/?]
> 
> and then, because it was a TARGET_WAITKIND_THREAD_EXITED event, GDB
> marks the thread exited right afterwards:
> 
>  [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
> 
> The fix is to make amd_dbgapi_target::update_thread_list() _not_
> delete vanishing waves iff they were stepping or in progress of being
> stopped.  These two cases are the ones dbgapi guarantees will result
> in a WAVE_COMMAND_TERMINATED event if the wave terminates:
> 
>   /**
>    * A command for a wave was not able to complete because the wave has
>    * terminated.
>    *
>    * Commands that can result in this event are ::amd_dbgapi_wave_stop and
>    * ::amd_dbgapi_wave_resume in single step mode.  Since the wave terminated
>    * before stopping, this event will be reported instead of
>    * ::AMD_DBGAPI_EVENT_KIND_WAVE_STOP.
>    *
>    * The wave that terminated is available by the ::AMD_DBGAPI_EVENT_INFO_WAVE
>    * query.  However, the wave will be invalid since it has already terminated.
>    * It is the client's responsibility to know what command was being performed
>    * and was unable to complete due to the wave terminating.
>    */
>   AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED = 2,
> 
> As the comment says, it's GDB's responsability to know whether the
> wave was stepping or being stopped.  Since we now have a wave_info map
> with one entry for each wave, that seems like the place to store that
> information.  However, I still decided to put all the coordinate
> information in its own structure.  I.e., basically renamed the
> existing wave_info to wave_coordinates, and then added a new wave_info
> structure that holds the new state, plus a wave_coordinates object.
> This seemer cleaner as there are places where we only need to

s/seemer/seemed/ maybe?

> instantiate a wave_coordinates object.
> 
> There's an extra twist.  The testcase also exercises stopping at a new
> kernel right after the first kernel fully exits.  In that scenario, we
> were hitting this assertion after the first kernel fully exits and the
> hit of the breakpoint at the second kernel is handled:
> 
>  [amd-dbgapi] process_event_queue: Pulled event from dbgapi: event_id.handle = 26, event_kind = WAVE_STOP
>  [amd-dbgapi-lib] suspending queue_3, queue_2, queue_1 (refresh wave list)
>  ../../src/gdb/amd-dbgapi-target.c:1625: internal-error: amd_dbgapi_thread_deleted: Assertion `it != info->wave_info_map.end ()' failed.
>  A problem internal to GDB has been detected,
>  further debugging may prove unreliable.
> 
> This is the exact same problem as above, just a different
> manifestation.  In this scenario, we end up in update_thread_list
> successfully deleting the exited thread (because it was no longer the
> current thread) that was incorrectly added by infrun.c.  Because it
> was added by infrun.c and not by amd-dbgapi-target.c:add_gpu_thread,
> it doesn't have an entry in the wave_info map, so
> amd_dbgapi_thread_deleted trips on this assertion:
> 
>       gdb_assert (it != info->wave_info_map.end ());
> 
> here:
> 
>   ...
>   -> stop_all_threads
>    -> update_thread_list
>     -> target_update_thread_list
>      -> amd_dbgapi_target::update_thread_list
>       -> thread_db_target::update_thread_list
>        -> linux_nat_target::update_thread_list
> 	-> delete_exited_threads
> 	 -> delete_thread
> 	  -> delete_thread_1
> 	   -> gdb::observers::observable<thread_info*>::notify
> 	    -> amd_dbgapi_thread_deleted
> 	     -> internal_error_loc
> 
> The testcase thus tries both running to exit after the first kernel
> exits, and running to a breakpoint in a second kernel after the first
> kernel exits.
> 
> Change-Id: I43a66f060c35aad1fe0d9ff022ce2afd0537f028
> ---
>  gdb/amd-dbgapi-target.c | 197 ++++++++++++++++++++++++++++++----------
>  1 file changed, 149 insertions(+), 48 deletions(-)
> 
> diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
> index 86102b7fb03..06f9e8c5f9c 100644
> --- a/gdb/amd-dbgapi-target.c
> +++ b/gdb/amd-dbgapi-target.c
> @@ -109,10 +109,9 @@ get_amd_dbgapi_target_inferior_created_observer_token ()
>    return amd_dbgapi_target_inferior_created_observer_token;
>  }
>  
> -/* A type holding coordinate, etc. info for a given wave.  We cache
> -   this because we need this information after a wave exits.  */
> +/* A type holding coordinates, etc. info for a given wave.  */
>  
> -struct wave_info
> +struct wave_coordinates
>  {
>    /* The wave.  Set by the ctor.  */
>    amd_dbgapi_wave_id_t wave_id;
> @@ -125,11 +124,44 @@ struct wave_info
>    uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX};
>    uint32_t wave_in_group = UINT32_MAX;
>  
> -  explicit wave_info (amd_dbgapi_wave_id_t wave_id);
> +  explicit wave_coordinates (amd_dbgapi_wave_id_t wave_id)
> +    : wave_id (wave_id)
> +  {}
>  
> -  /* Return the target ID string for the wave this wave_info is
> +  /* Return the target ID string for the wave this wave_coordinates is
>       for.  */
>    std::string to_string () const;
> +
> +  /* Pull out coordinates info from the amd-dbgapi library.  */
> +  void fetch ();
> +};
> +
> +/* A type holding info about a given wave.  */
> +
> +struct wave_info
> +{
> +  /* We cache the coordinates info because we need it after a wave
> +     exits.  The wave's ID is here.  */
> +  wave_coordinates coords;
> +
> +  /* The last resume_mode passed to amd_dbgapi_wave_resume for this
> +     wave.  We track this because we are guaranteed to see a
> +     WAVE_COMMAND_TERMINATED event if a stepping wave terminates, and
> +     we need to know to not delete such a wave until we process that
> +     event.  */
> +  amd_dbgapi_resume_mode_t last_resume_mode = AMD_DBGAPI_RESUME_MODE_NORMAL;
> +
> +  /* Whether we've called amd_dbgapi_wave_stop for this wave and are
> +     waiting for its stop event.  Similarly, we track this because
> +     we're guaranteed to get a WAVE_COMMAND_TERMINATED event if the
> +     wave terminates while being stopped.  */
> +  bool stopping = false;
> +
> +  explicit wave_info (amd_dbgapi_wave_id_t wave_id)
> +    : coords (wave_id)
> +  {
> +    coords.fetch ();
> +  }
>  };
>  
>  /* Big enough to hold the size of the largest register in bytes.  */
> @@ -275,6 +307,19 @@ static struct amd_dbgapi_target the_amd_dbgapi_target;
>  static const registry<inferior>::key<amd_dbgapi_inferior_info>
>    amd_dbgapi_inferior_data;
>  
> +/* Fetch the amd_dbgapi_inferior_info data for the given inferior.  */
> +
> +static struct amd_dbgapi_inferior_info *
> +get_amd_dbgapi_inferior_info (struct inferior *inferior)
> +{
> +  amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (inferior);
> +
> +  if (info == nullptr)
> +    info = amd_dbgapi_inferior_data.emplace (inferior, inferior);
> +
> +  return info;
> +}
> +
>  /* The async event handler registered with the event loop, indicating that we
>     might have events to report to the core and that we'd like our wait method
>     to be called.
> @@ -289,7 +334,7 @@ static const registry<inferior>::key<amd_dbgapi_inferior_info>
>  static async_event_handler *amd_dbgapi_async_event_handler = nullptr;
>  
>  std::string
> -wave_info::to_string () const
> +wave_coordinates::to_string () const
>  {
>    std::string str = "AMDGPU Wave";
>  
> @@ -319,37 +364,41 @@ wave_info::to_string () const
>    return str;
>  }
>  
> -wave_info::wave_info (amd_dbgapi_wave_id_t wave_id)
> -  : wave_id (wave_id)
> -{
> -}
> -
> -/* Read in wave_info for WAVE_ID.  */
> -
> -static wave_info
> -get_wave_info (amd_dbgapi_wave_id_t wave_id)
> +void
> +wave_coordinates::fetch ()
>  {
> -  wave_info res (wave_id);
> -
>    /* Any field that fails to be read is left with its in-class
>       initialized value, which is printed as "?".  */
>  
>    amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
> -			    sizeof (res.agent_id), &res.agent_id);
> +			    sizeof (agent_id), &agent_id);
>    amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
> -			    sizeof (res.queue_id), &res.queue_id);
> +			    sizeof (queue_id), &queue_id);
>    amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
> -			    sizeof (res.dispatch_id), &res.dispatch_id);
> +			    sizeof (dispatch_id), &dispatch_id);
>  
>    amd_dbgapi_wave_get_info (wave_id,
>  			    AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
> -			    sizeof (res.group_ids), &res.group_ids);
> +			    sizeof (group_ids), &group_ids);
>  
>    amd_dbgapi_wave_get_info (wave_id,
>  			    AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
> -			    sizeof (res.wave_in_group), &res.wave_in_group);
> +			    sizeof (wave_in_group), &wave_in_group);
> +}
> +
> +/* Get the wave_info object for TP, from the wave_info map.  It is
> +   assumed that the wave is in the map.  */
> +
> +static wave_info &
> +get_thread_wave_info (thread_info *tp)
> +{
> +  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (tp->inf);
> +  amd_dbgapi_wave_id_t wave_id = get_amd_dbgapi_wave_id (tp->ptid);
> +
> +  auto it = info->wave_info_map.find (wave_id.handle);
> +  gdb_assert (it != info->wave_info_map.end ());
>  
> -  return res;
> +  return it->second;
>  }
>  
>  /* Clear our async event handler.  */
> @@ -370,19 +419,6 @@ async_event_handler_mark ()
>    mark_async_event_handler (amd_dbgapi_async_event_handler);
>  }
>  
> -/* Fetch the amd_dbgapi_inferior_info data for the given inferior.  */
> -
> -static struct amd_dbgapi_inferior_info *
> -get_amd_dbgapi_inferior_info (struct inferior *inferior)
> -{
> -  amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (inferior);
> -
> -  if (info == nullptr)
> -    info = amd_dbgapi_inferior_data.emplace (inferior, inferior);
> -
> -  return info;
> -}
> -
>  /* Set forward progress requirement to REQUIRE for all processes of PROC_TARGET
>     matching PTID.  */
>  
> @@ -565,12 +601,12 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid)
>  
>    auto it = info->wave_info_map.find (wave_id.handle);
>    if (it != info->wave_info_map.end ())
> -    return it->second.to_string ();
> +    return it->second.coords.to_string ();
>  
>    /* A wave we don't know about.  Shouldn't usually happen, but
>       asserting and bringing down the session is a bit too harsh.  Just
>       print all unknown info as "?"s.  */
> -  return wave_info (wave_id).to_string ();
> +  return wave_coordinates (wave_id).to_string ();
>  }
>  
>  const char *
> @@ -694,16 +730,24 @@ amd_dbgapi_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
>  
>        amd_dbgapi_wave_id_t wave_id = get_amd_dbgapi_wave_id (thread->ptid);
>        amd_dbgapi_status_t status;
> +
> +      wave_info &wi = get_thread_wave_info (thread);
> +      amd_dbgapi_resume_mode_t &resume_mode = wi.last_resume_mode;
> +      amd_dbgapi_exceptions_t wave_exception;
>        if (thread->ptid == inferior_ptid)
> -	status = amd_dbgapi_wave_resume (wave_id,
> -					 (step
> -					  ? AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
> -					  : AMD_DBGAPI_RESUME_MODE_NORMAL),
> -					 exception);
> +	{
> +	  resume_mode = (step
> +			 ? AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
> +			 : AMD_DBGAPI_RESUME_MODE_NORMAL);
> +	  wave_exception = exception;
> +	}
>        else
> -	status = amd_dbgapi_wave_resume (wave_id, AMD_DBGAPI_RESUME_MODE_NORMAL,
> -					 AMD_DBGAPI_EXCEPTION_NONE);
> +	{
> +	  resume_mode = AMD_DBGAPI_RESUME_MODE_NORMAL;
> +	  wave_exception = AMD_DBGAPI_EXCEPTION_NONE;
> +	}
>  
> +      status = amd_dbgapi_wave_resume (wave_id, resume_mode, wave_exception);
>        if (status != AMD_DBGAPI_STATUS_SUCCESS
>  	  /* Ignore the error that wave is no longer valid as that could
>  	     indicate that the process has exited.  GDB treats resuming a
> @@ -711,6 +755,8 @@ amd_dbgapi_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
>  	  && status != AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID)
>  	error (_("wave_resume for wave_%ld failed (%s)"), wave_id.handle,
>  	       get_status_string (status));
> +
> +      wi.stopping = false;
>      }
>  }
>  
> @@ -725,6 +771,15 @@ amd_dbgapi_target::commit_resumed ()
>    require_forward_progress (minus_one_ptid, proc_target, true);
>  }
>  
> +/* Return a string version of RESUME_MODE, for debug log purposes.  */
> +static const char *
> +resume_mode_to_string (amd_dbgapi_resume_mode_t resume_mode)
> +{
> +  return (resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
> +	  ? "step"
> +	  : "normal");

Pedantically, amd_dbgapi_resume_mode_t is an enum and could in theory
get more enumerators.  I don't expect any, but the following construct
would make the compiler complain if that ever happens:

   static const char *
   resume_mode_to_string (amd_dbgapi_resume_mode_t resume_mode)
   {
     switch (resume_mode)
       {
       case AMD_DBGAPI_RESUME_MODE_NORMAL:
         return "normal";
       case AMD_DBGAPI_RESUME_MODE_SINGLE_STEP:
         return "step";
       }
     gdb_assert_not_reached ("invalid amd_dbgapi_resume_mode_t");
   }

> +}
> +
>  void
>  amd_dbgapi_target::stop (ptid_t ptid)
>  {
> @@ -758,7 +813,11 @@ amd_dbgapi_target::stop (ptid_t ptid)
>  
>  	  status = amd_dbgapi_wave_stop (wave_id);
>  	  if (status == AMD_DBGAPI_STATUS_SUCCESS)
> -	    return;
> +	    {
> +	      wave_info &wi = get_thread_wave_info (thread);
> +	      wi.stopping = true;
> +	      return;
> +	    }
>  
>  	  if (status != AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID)
>  	    error (_("wave_stop for wave_%ld failed (%s)"), wave_id.handle,
> @@ -772,6 +831,23 @@ amd_dbgapi_target::stop (ptid_t ptid)
>  	 could have terminated since the last time the wave list was
>  	 refreshed.  */
>  
> +      wave_info &wi = get_thread_wave_info (thread);
> +      wi.stopping = true;
> +
> +      amd_dbgapi_debug_printf ("got AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID "
> +			       "for wave_%ld, last_resume_mode=%s, "
> +			       "report_thread_events=%d",
> +			       wave_id.handle,
> +			       resume_mode_to_string (wi.last_resume_mode),
> +			       m_report_thread_events);
> +
> +      /* If the wave was stepping when it terminated, then it is
> +	 guaranteed that we will see a WAVE_COMMAND_TERMINATED event
> +	 for it.  Don't report a thread exit event or delete the
> +	 thread yet, until we see such event.  */
> +      if (wi.last_resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP)
> +	return;
> +
>        if (m_report_thread_events)
>  	{
>  	  get_amd_dbgapi_inferior_info (thread->inf)->wave_events.emplace_back
> @@ -1018,7 +1094,7 @@ add_gpu_thread (inferior *inf, ptid_t wave_ptid)
>    auto wave_id = get_amd_dbgapi_wave_id (wave_ptid);
>  
>    if (!info->wave_info_map.try_emplace (wave_id.handle,
> -					get_wave_info (wave_id)).second)
> +					wave_info (wave_id)).second)
>      internal_error ("wave ID %ld already in map", wave_id.handle);
>  
>    /* Create new GPU threads silently to avoid spamming the terminal
> @@ -1770,7 +1846,32 @@ amd_dbgapi_target::update_thread_list ()
>  	    auto it = threads.find (tp->ptid.tid ());
>  
>  	    if (it == threads.end ())
> -	      delete_thread_silent (tp);
> +	      {
> +		auto wave_id = get_amd_dbgapi_wave_id (tp->ptid);
> +		wave_info &wi = get_thread_wave_info (tp);
> +
> +		/* Waves that were stepping or in progress of being
> +		   stopped are guaranteed to report a
> +		   WAVE_COMMAND_TERMINATED event if they terminate.
> +		   Don't delete such threads until we see the
> +		   event.  */
> +		if (wi.last_resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP
> +		    || wi.stopping)
> +		  {
> +		    amd_dbgapi_debug_printf
> +		      ("wave_%ld disappeared, keeping it"
> +		       " (last_resume_mode=%s, stopping=%d)",
> +		       wave_id.handle,
> +		       resume_mode_to_string (wi.last_resume_mode),
> +		       wi.stopping);
> +		  }
> +		else
> +		  {
> +		    amd_dbgapi_debug_printf ("wave_%ld disappeared, deleting it",
> +					     wave_id.handle);
> +		    delete_thread_silent (tp);
> +		  }
> +	      }
>  	    else
>  	      threads.erase (it);
>  	  }
> 
> -- 
> 2.43.0
> 

I happy with the patch as it is if you prefer to not change
resume_mode_to_string.  Either way

Approved-By: Lancelot Six <laneclot.six@amd.com> (amdgpu)

Best,
Lancelot.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU
  2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
                   ` (7 preceding siblings ...)
  2023-12-14 20:22 ` [PATCH 8/8] Add tests for handling of vanishing threads that were stepping/stopping Pedro Alves
@ 2023-12-20 21:24 ` Pedro Alves
  8 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2023-12-20 21:24 UTC (permalink / raw)
  To: gdb-patches

Hi!

On 2023-12-14 20:22, Pedro Alves wrote:
> This series is the result of making step-over-thread-exit work
> properly with the AMD GPU target.
> 
> It includes some improvements to
> gdb.threads/step-over-thread-exit.exp, a few core fixes, and then AMD
> GPU target fixes.
> 
> Finally, the last two patches include tests that we are carrying
> downstream, but that unfortunately can't work with upstream GDB yet,
> because upstream doesn't understand the DWARF extensions that we are
> working hard to get into DWARF 6, so upstream is missing proper
> unwinding and accessing variables.  I include them in the series so
> reading the patches makes more sense, but I don't plan on pushing
> them.  Unless we are OK with adding them upstream with some early
> return, effectivelly making them nops.
> 
> I sent the first two patches as a separate series last month, and
> Simon & Lancelot have meanwhile reviewed this whole series internally
> at AMD, which resulted in some further improvements in those first
> patches (as well as in the others).

I've addressed Lancelot's comments (thanks! all adjusted accordingly.), added
corresponding approved-by tags, and pushed this series in, except the tests,
as mentioned above.

Pedro Alves

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-12-20 21:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
2023-12-14 20:22 ` [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements Pedro Alves
2023-12-14 20:22 ` [PATCH 2/8] Ensure selected thread after thread exit stop Pedro Alves
2023-12-14 20:22 ` [PATCH 3/8] displaced_step_finish: Don't fetch the regcache of exited threads Pedro Alves
2023-12-14 20:22 ` [PATCH 4/8] Step over thread exit, always delete the thread non-silently Pedro Alves
2023-12-14 20:22 ` [PATCH 5/8] Fix thread target ID of exited waves Pedro Alves
2023-12-15 10:51   ` Lancelot SIX
2023-12-14 20:22 ` [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-15 10:51   ` Lancelot SIX
2023-12-14 20:22 ` [PATCH 7/8] Add tests for s_endpgm handling Pedro Alves
2023-12-14 20:22 ` [PATCH 8/8] Add tests for handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-20 21:24 ` [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).