Re: [PATCH 5/8] Fix thread target ID of exited waves

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Lancelot SIX <lancelot.six@amd.com>
To: Pedro Alves <pedro@palves.net>
Cc: <gdb-patches@sourceware.org>
Subject: Re: [PATCH 5/8] Fix thread target ID of exited waves
Date: Fri, 15 Dec 2023 10:51:05 +0000	[thread overview]
Message-ID: <20231215105105.bqqo5jxy3hpxyqqj@khazad-dum> (raw)
In-Reply-To: <20231214202238.1065676-6-pedro@palves.net>

Hi Pedro,

Thanks for doing this.

I have minor suggestions below, but I am happy if you prefer what you
have here.

On Thu, Dec 14, 2023 at 08:22:35PM +0000, Pedro Alves wrote:
> Currently, if you step over kernel exit, you see:
> 
>  stepi
>  [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
>  Command aborted, thread exited.
>  (gdb)
> 
> Those '?' are because the thread/wave is already gone by the time GDB
> prints the "exited" notification, we can't ask dbgapi for any info
> about the wave anymore.
> 
> This commit fixes it by caching the wave's coordinates as soon as GDB
> sees the wave for the first time, and making
> amd_dbgapi_target::pid_to_str use the cached info.
> 
> At first I thought of clearing the wave_info object from a
> thread_exited observer.  However, that is too soon, resulting in this:
> 
>  (gdb) si
>  [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
>  Command aborted, thread exited.
>  (gdb) thread
>  [Current thread is 6 (AMDGPU Wave ?:?:?:0 (?,?,?)/?) (exited)]
> 
> We need instead to clear the wave info when the thread is ultimately
> deleted, so we get:
> 
>  (gdb) si
>  [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
>  Command aborted, thread exited.
>  (gdb) thread
>  [Current thread is 6 (AMDGPU Wave 1:4:1:1 (0,0,0)/0) (exited)]
> 
> And for that, we need a new thread_deleted observable.
> 
> Change-Id: I6c3e22541f051e1205f75eb657b04dc15e547580
> ---
>  gdb/amd-dbgapi-target.c | 168 +++++++++++++++++++++++++++++++---------
>  gdb/observable.c        |   1 +
>  gdb/observable.h        |   5 ++
>  gdb/thread.c            |   2 +
>  4 files changed, 138 insertions(+), 38 deletions(-)
> 
> diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
> index 18c0543c40e..86102b7fb03 100644
> --- a/gdb/amd-dbgapi-target.c
> +++ b/gdb/amd-dbgapi-target.c
> @@ -109,6 +109,28 @@ get_amd_dbgapi_target_inferior_created_observer_token ()
>    return amd_dbgapi_target_inferior_created_observer_token;
>  }
>  
> +/* A type holding coordinate, etc. info for a given wave.  We cache
> +   this because we need this information after a wave exits.  */
> +
> +struct wave_info
> +{
> +  /* The wave.  Set by the ctor.  */
> +  amd_dbgapi_wave_id_t wave_id;
> +
> +  /* All these fields are initialized here to a value that is printed
> +     as "?".  */
> +  amd_dbgapi_dispatch_id_t dispatch_id {};
> +  amd_dbgapi_queue_id_t queue_id {};
> +  amd_dbgapi_agent_id_t agent_id {};

We could be more explicit here and use the semantically equivalent
notation:

  amd_dbgapi_dispatch_id_t dispatch_id = AMD_DBGAPI_DISPATCH_NONE;
  amd_dbgapi_queue_id_t queue_id = AMD_DBGAPI_QUEUE_NONE;
  amd_dbgapi_agent_id_t agent_id = AMD_DBGAPI_AGENT_NONE;


> +  uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX};
> +  uint32_t wave_in_group = UINT32_MAX;
> +
> +  explicit wave_info (amd_dbgapi_wave_id_t wave_id);
> +
> +  /* Return the target ID string for the wave this wave_info is
> +     for.  */
> +  std::string to_string () const;
> +};
>  
>  /* Big enough to hold the size of the largest register in bytes.  */
>  #define AMDGPU_MAX_REGISTER_SIZE 256
> @@ -160,6 +182,16 @@ struct amd_dbgapi_inferior_info
>  
>    /* List of pending events the amd-dbgapi target retrieved from the dbgapi.  */
>    std::list<std::pair<ptid_t, target_waitstatus>> wave_events;
> +
> +  /* Map of wave ID to wave_info.  We cache wave_info objects because
> +     we need to access the info after the wave is gone, in the thread
> +     exit nofication.  E.g.:
> +	[AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
> +
> +     wave_info objects are added when we first see the wave, and
> +     removed from a thread_deleted observer.  */
> +  std::unordered_map<decltype (amd_dbgapi_wave_id_t::handle), wave_info>
> +    wave_info_map;
>  };
>  
>  static amd_dbgapi_event_id_t process_event_queue
> @@ -256,56 +288,70 @@ static const registry<inferior>::key<amd_dbgapi_inferior_info>
>  
>  static async_event_handler *amd_dbgapi_async_event_handler = nullptr;
>  
> -/* Return the target id string for a given wave.  */
> -
> -static std::string
> -wave_target_id_string (amd_dbgapi_wave_id_t wave_id)
> +std::string
> +wave_info::to_string () const
>  {
> -  amd_dbgapi_dispatch_id_t dispatch_id;
> -  amd_dbgapi_queue_id_t queue_id;
> -  amd_dbgapi_agent_id_t agent_id;
> -  uint32_t group_ids[3], wave_in_group;
>    std::string str = "AMDGPU Wave";
>  
> -  amd_dbgapi_status_t status
> -    = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
> -				sizeof (agent_id), &agent_id);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (agent_id.handle != 0

We could use `agent_id != AMD_DBGAPI_AGENT_NONE` here.

>  	  ? string_printf (" %ld", agent_id.handle)
>  	  : " ?");
>  
> -  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
> -				     sizeof (queue_id), &queue_id);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (queue_id.handle != 0

Similarly, `queue_id != AMD_DBGAPI_QUEUE_NONE`.

>  	  ? string_printf (":%ld", queue_id.handle)
>  	  : ":?");
>  
> -  status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
> -				     sizeof (dispatch_id), &dispatch_id);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (dispatch_id.handle != 0

Similarly, `dispatch_id != AMD_DBGAPI_DISPATCH_NONE`.

>  	  ? string_printf (":%ld", dispatch_id.handle)
>  	  : ":?");
>  
>    str += string_printf (":%ld", wave_id.handle);
>  
> -  status = amd_dbgapi_wave_get_info (wave_id,
> -				     AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
> -				     sizeof (group_ids), &group_ids);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (group_ids[0] != UINT32_MAX
>  	  ? string_printf (" (%d,%d,%d)", group_ids[0], group_ids[1],
>  			   group_ids[2])
>  	  : " (?,?,?)");
>  
> -  status = amd_dbgapi_wave_get_info
> -    (wave_id, AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
> -     sizeof (wave_in_group), &wave_in_group);
> -  str += (status == AMD_DBGAPI_STATUS_SUCCESS
> +  str += (wave_in_group != UINT32_MAX
>  	  ? string_printf ("/%d", wave_in_group)
>  	  : "/?");
>  
>    return str;
>  }
>  
> +wave_info::wave_info (amd_dbgapi_wave_id_t wave_id)
> +  : wave_id (wave_id)
> +{
> +}
> +
> +/* Read in wave_info for WAVE_ID.  */
> +
> +static wave_info
> +get_wave_info (amd_dbgapi_wave_id_t wave_id)
> +{
> +  wave_info res (wave_id);
> +
> +  /* Any field that fails to be read is left with its in-class
> +     initialized value, which is printed as "?".  */
> +
> +  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
> +			    sizeof (res.agent_id), &res.agent_id);
> +  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
> +			    sizeof (res.queue_id), &res.queue_id);
> +  amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
> +			    sizeof (res.dispatch_id), &res.dispatch_id);
> +
> +  amd_dbgapi_wave_get_info (wave_id,
> +			    AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
> +			    sizeof (res.group_ids), &res.group_ids);
> +
> +  amd_dbgapi_wave_get_info (wave_id,
> +			    AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
> +			    sizeof (res.wave_in_group), &res.wave_in_group);
> +
> +  return res;
> +}
> +
>  /* Clear our async event handler.  */
>  
>  static void
> @@ -510,7 +556,21 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid)
>    if (!ptid_is_gpu (ptid))
>      return beneath ()->pid_to_str (ptid);
>  
> -  return wave_target_id_string (get_amd_dbgapi_wave_id (ptid));
> +  process_stratum_target *proc_target = current_inferior ()->process_target ();
> +  inferior *inf = find_inferior_pid (proc_target, ptid.pid ());
> +  gdb_assert (inf != nullptr);
> +  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
> +
> +  auto wave_id = get_amd_dbgapi_wave_id (ptid);
> +
> +  auto it = info->wave_info_map.find (wave_id.handle);
> +  if (it != info->wave_info_map.end ())
> +    return it->second.to_string ();
> +
> +  /* A wave we don't know about.  Shouldn't usually happen, but
> +     asserting and bringing down the session is a bit too harsh.  Just
> +     print all unknown info as "?"s.  */
> +  return wave_info (wave_id).to_string ();
>  }
>  
>  const char *
> @@ -929,6 +989,46 @@ make_gpu_ptid (ptid_t::pid_type pid, amd_dbgapi_wave_id_t wave_id)
>   return ptid_t (pid, 1, wave_id.handle);
>  }
>  
> +/* When a thread is deleted, remove its wave_info from the inferior's
> +   wave_info map.  */
> +
> +static void
> +amd_dbgapi_thread_deleted (thread_info *tp)
> +{
> +  if (tp->inf->target_at (arch_stratum) == &the_amd_dbgapi_target
> +      && ptid_is_gpu (tp->ptid))
> +    {
> +      amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (tp->inf);
> +      auto wave_id = get_amd_dbgapi_wave_id (tp->ptid);
> +      auto it = info->wave_info_map.find (wave_id.handle);
> +      gdb_assert (it != info->wave_info_map.end ());
> +      info->wave_info_map.erase (it);
> +    }
> +}
> +
> +/* Register WAVE_PTID as a new thread in INF's thread list, and record
> +   its wave_info in the inferior's wave_info map.  */
> +
> +static thread_info *
> +add_gpu_thread (inferior *inf, ptid_t wave_ptid)
> +{
> +  process_stratum_target *proc_target = inf->process_target ();
> +  amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
> +
> +  auto wave_id = get_amd_dbgapi_wave_id (wave_ptid);
> +
> +  if (!info->wave_info_map.try_emplace (wave_id.handle,
> +					get_wave_info (wave_id)).second)
> +    internal_error ("wave ID %ld already in map", wave_id.handle);
> +
> +  /* Create new GPU threads silently to avoid spamming the terminal
> +     with thousands of "[New Thread ...]" messages.  */
> +  thread_info *thread = add_thread_silent (proc_target, wave_ptid);
> +  set_running (proc_target, wave_ptid, true);
> +  set_executing (proc_target, wave_ptid, true);
> +  return thread;
> +}
> +
>  /* Process an event that was just pulled out of the amd-dbgapi library.  */
>  
>  static void
> @@ -1015,13 +1115,7 @@ process_one_event (amd_dbgapi_event_id_t event_id,
>  
>  	    thread_info *thread = proc_target->find_thread (event_ptid);
>  	    if (thread == nullptr)
> -	      {
> -		/* Silently create new GPU threads to avoid spamming the
> -		   terminal with thousands of "[New Thread ...]" messages.  */
> -		thread = add_thread_silent (proc_target, event_ptid);
> -		set_running (proc_target, event_ptid, true);
> -		set_executing (proc_target, event_ptid, true);
> -	      }
> +	      thread = add_gpu_thread (inf, event_ptid);
>  
>  	    /* If the wave is stopped because of a software breakpoint, the
>  	       program counter needs to be adjusted so that it points to the
> @@ -1686,10 +1780,7 @@ amd_dbgapi_target::update_thread_list ()
>  	{
>  	  ptid_t wave_ptid
>  	    = make_gpu_ptid (inf->pid, amd_dbgapi_wave_id_t {tid});
> -
> -	  add_thread_silent (inf->process_target (), wave_ptid);
> -	  set_running (inf->process_target (), wave_ptid, true);
> -	  set_executing (inf->process_target (), wave_ptid, true);
> +	  add_gpu_thread (inf, wave_ptid);
>  	}
>      }
>  
> @@ -2115,6 +2206,7 @@ _initialize_amd_dbgapi_target ()
>    gdb::observers::inferior_forked.attach (amd_dbgapi_inferior_forked, "amd-dbgapi");
>    gdb::observers::inferior_exit.attach (amd_dbgapi_inferior_exited, "amd-dbgapi");
>    gdb::observers::inferior_pre_detach.attach (amd_dbgapi_inferior_pre_detach, "amd-dbgapi");
> +  gdb::observers::thread_deleted.attach (amd_dbgapi_thread_deleted, "amd-dbgapi");
>  
>    add_basic_prefix_cmd ("amdgpu", no_class,
>  			_("Generic command for setting amdgpu flags."),
> diff --git a/gdb/observable.c b/gdb/observable.c
> index f2e65b11604..29675f3abf3 100644
> --- a/gdb/observable.c
> +++ b/gdb/observable.c
> @@ -46,6 +46,7 @@ DEFINE_OBSERVABLE (all_objfiles_removed);
>  DEFINE_OBSERVABLE (free_objfile);
>  DEFINE_OBSERVABLE (new_thread);
>  DEFINE_OBSERVABLE (thread_exit);
> +DEFINE_OBSERVABLE (thread_deleted);
>  DEFINE_OBSERVABLE (thread_stop_requested);
>  DEFINE_OBSERVABLE (target_resumed);
>  DEFINE_OBSERVABLE (about_to_proceed);
> diff --git a/gdb/observable.h b/gdb/observable.h
> index 32ef65435cc..91a2c871524 100644
> --- a/gdb/observable.h
> +++ b/gdb/observable.h
> @@ -126,6 +126,11 @@ extern observable<thread_info */* t */,
>  		  std::optional<ULONGEST> /* exit_code */,
>  		  bool /* silent */> thread_exit;
>  
> +/* The thread specified by T has been deleted, with delete_thread.
> +   This is called just before the thread_info object is destroyed with
> +   operator delete.  */
> +extern observable<thread_info */* t */> thread_deleted;
> +
>  /* An explicit stop request was issued to PTID.  If PTID equals
>     minus_one_ptid, the request applied to all threads.  If
>     ptid_is_pid(PTID) returns true, the request applied to all
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 85bdbaa6cd8..bd3fe85f3b9 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -527,6 +527,8 @@ delete_thread_1 (thread_info *thr, std::optional<ULONGEST> exit_code,
>    auto it = thr->inf->thread_list.iterator_to (*thr);
>    thr->inf->thread_list.erase (it);
>  
> +  gdb::observers::thread_deleted.notify (thr);
> +
>    delete thr;
>  }
>  
> 
> -- 
> 2.43.0
> 

I am happy whether you do the suggested changes or not. Either way

Approved-By: Lancelot Six <lancelot.six@amd.com> (amdgpu)

Best,
Lancelot.

next prev parent reply	other threads:[~2023-12-15 10:51 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
2023-12-14 20:22 ` [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements Pedro Alves
2023-12-14 20:22 ` [PATCH 2/8] Ensure selected thread after thread exit stop Pedro Alves
2023-12-14 20:22 ` [PATCH 3/8] displaced_step_finish: Don't fetch the regcache of exited threads Pedro Alves
2023-12-14 20:22 ` [PATCH 4/8] Step over thread exit, always delete the thread non-silently Pedro Alves
2023-12-14 20:22 ` [PATCH 5/8] Fix thread target ID of exited waves Pedro Alves
2023-12-15 10:51   ` Lancelot SIX [this message]
2023-12-14 20:22 ` [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-15 10:51   ` Lancelot SIX
2023-12-14 20:22 ` [PATCH 7/8] Add tests for s_endpgm handling Pedro Alves
2023-12-14 20:22 ` [PATCH 8/8] Add tests for handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-20 21:24 ` [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231215105105.bqqo5jxy3hpxyqqj@khazad-dum \
    --to=lancelot.six@amd.com \
    --cc=gdb-patches@sourceware.org \
    --cc=pedro@palves.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).