public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Pedro Alves <pedro@palves.net>
To: gdb-patches@sourceware.org
Subject: [PATCH 5/8] Fix thread target ID of exited waves
Date: Thu, 14 Dec 2023 20:22:35 +0000	[thread overview]
Message-ID: <20231214202238.1065676-6-pedro@palves.net> (raw)
In-Reply-To: <20231214202238.1065676-1-pedro@palves.net>

Currently, if you step over kernel exit, you see:

 stepi
 [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
 Command aborted, thread exited.
 (gdb)

Those '?' are because the thread/wave is already gone by the time GDB
prints the "exited" notification, we can't ask dbgapi for any info
about the wave anymore.

This commit fixes it by caching the wave's coordinates as soon as GDB
sees the wave for the first time, and making
amd_dbgapi_target::pid_to_str use the cached info.

At first I thought of clearing the wave_info object from a
thread_exited observer.  However, that is too soon, resulting in this:

 (gdb) si
 [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
 Command aborted, thread exited.
 (gdb) thread
 [Current thread is 6 (AMDGPU Wave ?:?:?:0 (?,?,?)/?) (exited)]

We need instead to clear the wave info when the thread is ultimately
deleted, so we get:

 (gdb) si
 [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
 Command aborted, thread exited.
 (gdb) thread
 [Current thread is 6 (AMDGPU Wave 1:4:1:1 (0,0,0)/0) (exited)]

And for that, we need a new thread_deleted observable.

Change-Id: I6c3e22541f051e1205f75eb657b04dc15e547580
---
 gdb/amd-dbgapi-target.c | 168 +++++++++++++++++++++++++++++++---------
 gdb/observable.c        |   1 +
 gdb/observable.h        |   5 ++
 gdb/thread.c            |   2 +
 4 files changed, 138 insertions(+), 38 deletions(-)

diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
index 18c0543c40e..86102b7fb03 100644
--- a/gdb/amd-dbgapi-target.c
+++ b/gdb/amd-dbgapi-target.c
@@ -109,6 +109,28 @@ get_amd_dbgapi_target_inferior_created_observer_token ()
   return amd_dbgapi_target_inferior_created_observer_token;
 }
 
+/* A type holding coordinate, etc. info for a given wave.  We cache
+   this because we need this information after a wave exits.  */
+
+struct wave_info
+{
+  /* The wave.  Set by the ctor.  */
+  amd_dbgapi_wave_id_t wave_id;
+
+  /* All these fields are initialized here to a value that is printed
+     as "?".  */
+  amd_dbgapi_dispatch_id_t dispatch_id {};
+  amd_dbgapi_queue_id_t queue_id {};
+  amd_dbgapi_agent_id_t agent_id {};
+  uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX};
+  uint32_t wave_in_group = UINT32_MAX;
+
+  explicit wave_info (amd_dbgapi_wave_id_t wave_id);
+
+  /* Return the target ID string for the wave this wave_info is
+     for.  */
+  std::string to_string () const;
+};
 
 /* Big enough to hold the size of the largest register in bytes.  */
 #define AMDGPU_MAX_REGISTER_SIZE 256
@@ -160,6 +182,16 @@ struct amd_dbgapi_inferior_info
 
   /* List of pending events the amd-dbgapi target retrieved from the dbgapi.  */
   std::list<std::pair<ptid_t, target_waitstatus>> wave_events;
+
+  /* Map of wave ID to wave_info.  We cache wave_info objects because
+     we need to access the info after the wave is gone, in the thread
+     exit nofication.  E.g.:
+	[AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
+
+     wave_info objects are added when we first see the wave, and
+     removed from a thread_deleted observer.  */
+  std::unordered_map<decltype (amd_dbgapi_wave_id_t::handle), wave_info>
+    wave_info_map;
 };
 
 static amd_dbgapi_event_id_t process_event_queue
@@ -256,56 +288,70 @@ static const registry<inferior>::key<amd_dbgapi_inferior_info>
 
 static async_event_handler *amd_dbgapi_async_event_handler = nullptr;
 
-/* Return the target id string for a given wave.  */
-
-static std::string
-wave_target_id_string (amd_dbgapi_wave_id_t wave_id)
+std::string
+wave_info::to_string () const
 {
-  amd_dbgapi_dispatch_id_t dispatch_id;
-  amd_dbgapi_queue_id_t queue_id;
-  amd_dbgapi_agent_id_t agent_id;
-  uint32_t group_ids[3], wave_in_group;
   std::string str = "AMDGPU Wave";
 
-  amd_dbgapi_status_t status
-    = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
-				sizeof (agent_id), &agent_id);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (agent_id.handle != 0
 	  ? string_printf (" %ld", agent_id.handle)
 	  : " ?");
 
-  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
-				     sizeof (queue_id), &queue_id);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (queue_id.handle != 0
 	  ? string_printf (":%ld", queue_id.handle)
 	  : ":?");
 
-  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
-				     sizeof (dispatch_id), &dispatch_id);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (dispatch_id.handle != 0
 	  ? string_printf (":%ld", dispatch_id.handle)
 	  : ":?");
 
   str += string_printf (":%ld", wave_id.handle);
 
-  status = amd_dbgapi_wave_get_info (wave_id,
-				     AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
-				     sizeof (group_ids), &group_ids);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (group_ids[0] != UINT32_MAX
 	  ? string_printf (" (%d,%d,%d)", group_ids[0], group_ids[1],
 			   group_ids[2])
 	  : " (?,?,?)");
 
-  status = amd_dbgapi_wave_get_info
-    (wave_id, AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
-     sizeof (wave_in_group), &wave_in_group);
-  str += (status == AMD_DBGAPI_STATUS_SUCCESS
+  str += (wave_in_group != UINT32_MAX
 	  ? string_printf ("/%d", wave_in_group)
 	  : "/?");
 
   return str;
 }
 
+wave_info::wave_info (amd_dbgapi_wave_id_t wave_id)
+  : wave_id (wave_id)
+{
+}
+
+/* Read in wave_info for WAVE_ID.  */
+
+static wave_info
+get_wave_info (amd_dbgapi_wave_id_t wave_id)
+{
+  wave_info res (wave_id);
+
+  /* Any field that fails to be read is left with its in-class
+     initialized value, which is printed as "?".  */
+
+  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
+			    sizeof (res.agent_id), &res.agent_id);
+  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
+			    sizeof (res.queue_id), &res.queue_id);
+  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
+			    sizeof (res.dispatch_id), &res.dispatch_id);
+
+  amd_dbgapi_wave_get_info (wave_id,
+			    AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
+			    sizeof (res.group_ids), &res.group_ids);
+
+  amd_dbgapi_wave_get_info (wave_id,
+			    AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
+			    sizeof (res.wave_in_group), &res.wave_in_group);
+
+  return res;
+}
+
 /* Clear our async event handler.  */
 
 static void
@@ -510,7 +556,21 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid)
   if (!ptid_is_gpu (ptid))
     return beneath ()->pid_to_str (ptid);
 
-  return wave_target_id_string (get_amd_dbgapi_wave_id (ptid));
+  process_stratum_target *proc_target = current_inferior ()->process_target ();
+  inferior *inf = find_inferior_pid (proc_target, ptid.pid ());
+  gdb_assert (inf != nullptr);
+  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
+
+  auto wave_id = get_amd_dbgapi_wave_id (ptid);
+
+  auto it = info->wave_info_map.find (wave_id.handle);
+  if (it != info->wave_info_map.end ())
+    return it->second.to_string ();
+
+  /* A wave we don't know about.  Shouldn't usually happen, but
+     asserting and bringing down the session is a bit too harsh.  Just
+     print all unknown info as "?"s.  */
+  return wave_info (wave_id).to_string ();
 }
 
 const char *
@@ -929,6 +989,46 @@ make_gpu_ptid (ptid_t::pid_type pid, amd_dbgapi_wave_id_t wave_id)
  return ptid_t (pid, 1, wave_id.handle);
 }
 
+/* When a thread is deleted, remove its wave_info from the inferior's
+   wave_info map.  */
+
+static void
+amd_dbgapi_thread_deleted (thread_info *tp)
+{
+  if (tp->inf->target_at (arch_stratum) == &the_amd_dbgapi_target
+      && ptid_is_gpu (tp->ptid))
+    {
+      amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (tp->inf);
+      auto wave_id = get_amd_dbgapi_wave_id (tp->ptid);
+      auto it = info->wave_info_map.find (wave_id.handle);
+      gdb_assert (it != info->wave_info_map.end ());
+      info->wave_info_map.erase (it);
+    }
+}
+
+/* Register WAVE_PTID as a new thread in INF's thread list, and record
+   its wave_info in the inferior's wave_info map.  */
+
+static thread_info *
+add_gpu_thread (inferior *inf, ptid_t wave_ptid)
+{
+  process_stratum_target *proc_target = inf->process_target ();
+  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
+
+  auto wave_id = get_amd_dbgapi_wave_id (wave_ptid);
+
+  if (!info->wave_info_map.try_emplace (wave_id.handle,
+					get_wave_info (wave_id)).second)
+    internal_error ("wave ID %ld already in map", wave_id.handle);
+
+  /* Create new GPU threads silently to avoid spamming the terminal
+     with thousands of "[New Thread ...]" messages.  */
+  thread_info *thread = add_thread_silent (proc_target, wave_ptid);
+  set_running (proc_target, wave_ptid, true);
+  set_executing (proc_target, wave_ptid, true);
+  return thread;
+}
+
 /* Process an event that was just pulled out of the amd-dbgapi library.  */
 
 static void
@@ -1015,13 +1115,7 @@ process_one_event (amd_dbgapi_event_id_t event_id,
 
 	    thread_info *thread = proc_target->find_thread (event_ptid);
 	    if (thread == nullptr)
-	      {
-		/* Silently create new GPU threads to avoid spamming the
-		   terminal with thousands of "[New Thread ...]" messages.  */
-		thread = add_thread_silent (proc_target, event_ptid);
-		set_running (proc_target, event_ptid, true);
-		set_executing (proc_target, event_ptid, true);
-	      }
+	      thread = add_gpu_thread (inf, event_ptid);
 
 	    /* If the wave is stopped because of a software breakpoint, the
 	       program counter needs to be adjusted so that it points to the
@@ -1686,10 +1780,7 @@ amd_dbgapi_target::update_thread_list ()
 	{
 	  ptid_t wave_ptid
 	    = make_gpu_ptid (inf->pid, amd_dbgapi_wave_id_t {tid});
-
-	  add_thread_silent (inf->process_target (), wave_ptid);
-	  set_running (inf->process_target (), wave_ptid, true);
-	  set_executing (inf->process_target (), wave_ptid, true);
+	  add_gpu_thread (inf, wave_ptid);
 	}
     }
 
@@ -2115,6 +2206,7 @@ _initialize_amd_dbgapi_target ()
   gdb::observers::inferior_forked.attach (amd_dbgapi_inferior_forked, "amd-dbgapi");
   gdb::observers::inferior_exit.attach (amd_dbgapi_inferior_exited, "amd-dbgapi");
   gdb::observers::inferior_pre_detach.attach (amd_dbgapi_inferior_pre_detach, "amd-dbgapi");
+  gdb::observers::thread_deleted.attach (amd_dbgapi_thread_deleted, "amd-dbgapi");
 
   add_basic_prefix_cmd ("amdgpu", no_class,
 			_("Generic command for setting amdgpu flags."),
diff --git a/gdb/observable.c b/gdb/observable.c
index f2e65b11604..29675f3abf3 100644
--- a/gdb/observable.c
+++ b/gdb/observable.c
@@ -46,6 +46,7 @@ DEFINE_OBSERVABLE (all_objfiles_removed);
 DEFINE_OBSERVABLE (free_objfile);
 DEFINE_OBSERVABLE (new_thread);
 DEFINE_OBSERVABLE (thread_exit);
+DEFINE_OBSERVABLE (thread_deleted);
 DEFINE_OBSERVABLE (thread_stop_requested);
 DEFINE_OBSERVABLE (target_resumed);
 DEFINE_OBSERVABLE (about_to_proceed);
diff --git a/gdb/observable.h b/gdb/observable.h
index 32ef65435cc..91a2c871524 100644
--- a/gdb/observable.h
+++ b/gdb/observable.h
@@ -126,6 +126,11 @@ extern observable<thread_info */* t */,
 		  std::optional<ULONGEST> /* exit_code */,
 		  bool /* silent */> thread_exit;
 
+/* The thread specified by T has been deleted, with delete_thread.
+   This is called just before the thread_info object is destroyed with
+   operator delete.  */
+extern observable<thread_info */* t */> thread_deleted;
+
 /* An explicit stop request was issued to PTID.  If PTID equals
    minus_one_ptid, the request applied to all threads.  If
    ptid_is_pid(PTID) returns true, the request applied to all
diff --git a/gdb/thread.c b/gdb/thread.c
index 85bdbaa6cd8..bd3fe85f3b9 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -527,6 +527,8 @@ delete_thread_1 (thread_info *thr, std::optional<ULONGEST> exit_code,
   auto it = thr->inf->thread_list.iterator_to (*thr);
   thr->inf->thread_list.erase (it);
 
+  gdb::observers::thread_deleted.notify (thr);
+
   delete thr;
 }
 
-- 
2.43.0


  parent reply	other threads:[~2023-12-14 20:22 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
2023-12-14 20:22 ` [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements Pedro Alves
2023-12-14 20:22 ` [PATCH 2/8] Ensure selected thread after thread exit stop Pedro Alves
2023-12-14 20:22 ` [PATCH 3/8] displaced_step_finish: Don't fetch the regcache of exited threads Pedro Alves
2023-12-14 20:22 ` [PATCH 4/8] Step over thread exit, always delete the thread non-silently Pedro Alves
2023-12-14 20:22 ` Pedro Alves [this message]
2023-12-15 10:51   ` [PATCH 5/8] Fix thread target ID of exited waves Lancelot SIX
2023-12-14 20:22 ` [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-15 10:51   ` Lancelot SIX
2023-12-14 20:22 ` [PATCH 7/8] Add tests for s_endpgm handling Pedro Alves
2023-12-14 20:22 ` [PATCH 8/8] Add tests for handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-20 21:24 ` [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231214202238.1065676-6-pedro@palves.net \
    --to=pedro@palves.net \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).