From: Lancelot SIX <lancelot.six@amd.com>
To: Pedro Alves <pedro@palves.net>
Cc: <gdb-patches@sourceware.org>
Subject: Re: [PATCH 5/8] Fix thread target ID of exited waves
Date: Fri, 15 Dec 2023 10:51:05 +0000 [thread overview]
Message-ID: <20231215105105.bqqo5jxy3hpxyqqj@khazad-dum> (raw)
In-Reply-To: <20231214202238.1065676-6-pedro@palves.net>
Hi Pedro,
Thanks for doing this.
I have minor suggestions below, but I am happy if you prefer what you
have here.
On Thu, Dec 14, 2023 at 08:22:35PM +0000, Pedro Alves wrote:
> Currently, if you step over kernel exit, you see:
>
> stepi
> [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]
> Command aborted, thread exited.
> (gdb)
>
> Those '?' are because the thread/wave is already gone by the time GDB
> prints the "exited" notification, we can't ask dbgapi for any info
> about the wave anymore.
>
> This commit fixes it by caching the wave's coordinates as soon as GDB
> sees the wave for the first time, and making
> amd_dbgapi_target::pid_to_str use the cached info.
>
> At first I thought of clearing the wave_info object from a
> thread_exited observer. However, that is too soon, resulting in this:
>
> (gdb) si
> [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
> Command aborted, thread exited.
> (gdb) thread
> [Current thread is 6 (AMDGPU Wave ?:?:?:0 (?,?,?)/?) (exited)]
>
> We need instead to clear the wave info when the thread is ultimately
> deleted, so we get:
>
> (gdb) si
> [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
> Command aborted, thread exited.
> (gdb) thread
> [Current thread is 6 (AMDGPU Wave 1:4:1:1 (0,0,0)/0) (exited)]
>
> And for that, we need a new thread_deleted observable.
>
> Change-Id: I6c3e22541f051e1205f75eb657b04dc15e547580
> ---
> gdb/amd-dbgapi-target.c | 168 +++++++++++++++++++++++++++++++---------
> gdb/observable.c | 1 +
> gdb/observable.h | 5 ++
> gdb/thread.c | 2 +
> 4 files changed, 138 insertions(+), 38 deletions(-)
>
> diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
> index 18c0543c40e..86102b7fb03 100644
> --- a/gdb/amd-dbgapi-target.c
> +++ b/gdb/amd-dbgapi-target.c
> @@ -109,6 +109,28 @@ get_amd_dbgapi_target_inferior_created_observer_token ()
> return amd_dbgapi_target_inferior_created_observer_token;
> }
>
> +/* A type holding coordinate, etc. info for a given wave. We cache
> + this because we need this information after a wave exits. */
> +
> +struct wave_info
> +{
> + /* The wave. Set by the ctor. */
> + amd_dbgapi_wave_id_t wave_id;
> +
> + /* All these fields are initialized here to a value that is printed
> + as "?". */
> + amd_dbgapi_dispatch_id_t dispatch_id {};
> + amd_dbgapi_queue_id_t queue_id {};
> + amd_dbgapi_agent_id_t agent_id {};
We could be more explicit here and use the semantically equivalent
notation:
amd_dbgapi_dispatch_id_t dispatch_id = AMD_DBGAPI_DISPATCH_NONE;
amd_dbgapi_queue_id_t queue_id = AMD_DBGAPI_QUEUE_NONE;
amd_dbgapi_agent_id_t agent_id = AMD_DBGAPI_AGENT_NONE;
> + uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX};
> + uint32_t wave_in_group = UINT32_MAX;
> +
> + explicit wave_info (amd_dbgapi_wave_id_t wave_id);
> +
> + /* Return the target ID string for the wave this wave_info is
> + for. */
> + std::string to_string () const;
> +};
>
> /* Big enough to hold the size of the largest register in bytes. */
> #define AMDGPU_MAX_REGISTER_SIZE 256
> @@ -160,6 +182,16 @@ struct amd_dbgapi_inferior_info
>
> /* List of pending events the amd-dbgapi target retrieved from the dbgapi. */
> std::list<std::pair<ptid_t, target_waitstatus>> wave_events;
> +
> + /* Map of wave ID to wave_info. We cache wave_info objects because
> + we need to access the info after the wave is gone, in the thread
> + exit nofication. E.g.:
> + [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
> +
> + wave_info objects are added when we first see the wave, and
> + removed from a thread_deleted observer. */
> + std::unordered_map<decltype (amd_dbgapi_wave_id_t::handle), wave_info>
> + wave_info_map;
> };
>
> static amd_dbgapi_event_id_t process_event_queue
> @@ -256,56 +288,70 @@ static const registry<inferior>::key<amd_dbgapi_inferior_info>
>
> static async_event_handler *amd_dbgapi_async_event_handler = nullptr;
>
> -/* Return the target id string for a given wave. */
> -
> -static std::string
> -wave_target_id_string (amd_dbgapi_wave_id_t wave_id)
> +std::string
> +wave_info::to_string () const
> {
> - amd_dbgapi_dispatch_id_t dispatch_id;
> - amd_dbgapi_queue_id_t queue_id;
> - amd_dbgapi_agent_id_t agent_id;
> - uint32_t group_ids[3], wave_in_group;
> std::string str = "AMDGPU Wave";
>
> - amd_dbgapi_status_t status
> - = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
> - sizeof (agent_id), &agent_id);
> - str += (status == AMD_DBGAPI_STATUS_SUCCESS
> + str += (agent_id.handle != 0
We could use `agent_id != AMD_DBGAPI_AGENT_NONE` here.
> ? string_printf (" %ld", agent_id.handle)
> : " ?");
>
> - status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
> - sizeof (queue_id), &queue_id);
> - str += (status == AMD_DBGAPI_STATUS_SUCCESS
> + str += (queue_id.handle != 0
Similarly, `queue_id != AMD_DBGAPI_QUEUE_NONE`.
> ? string_printf (":%ld", queue_id.handle)
> : ":?");
>
> - status = amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
> - sizeof (dispatch_id), &dispatch_id);
> - str += (status == AMD_DBGAPI_STATUS_SUCCESS
> + str += (dispatch_id.handle != 0
Similarly, `dispatch_id != AMD_DBGAPI_DISPATCH_NONE`.
> ? string_printf (":%ld", dispatch_id.handle)
> : ":?");
>
> str += string_printf (":%ld", wave_id.handle);
>
> - status = amd_dbgapi_wave_get_info (wave_id,
> - AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
> - sizeof (group_ids), &group_ids);
> - str += (status == AMD_DBGAPI_STATUS_SUCCESS
> + str += (group_ids[0] != UINT32_MAX
> ? string_printf (" (%d,%d,%d)", group_ids[0], group_ids[1],
> group_ids[2])
> : " (?,?,?)");
>
> - status = amd_dbgapi_wave_get_info
> - (wave_id, AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
> - sizeof (wave_in_group), &wave_in_group);
> - str += (status == AMD_DBGAPI_STATUS_SUCCESS
> + str += (wave_in_group != UINT32_MAX
> ? string_printf ("/%d", wave_in_group)
> : "/?");
>
> return str;
> }
>
> +wave_info::wave_info (amd_dbgapi_wave_id_t wave_id)
> + : wave_id (wave_id)
> +{
> +}
> +
> +/* Read in wave_info for WAVE_ID. */
> +
> +static wave_info
> +get_wave_info (amd_dbgapi_wave_id_t wave_id)
> +{
> + wave_info res (wave_id);
> +
> + /* Any field that fails to be read is left with its in-class
> + initialized value, which is printed as "?". */
> +
> + amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT,
> + sizeof (res.agent_id), &res.agent_id);
> + amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE,
> + sizeof (res.queue_id), &res.queue_id);
> + amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH,
> + sizeof (res.dispatch_id), &res.dispatch_id);
> +
> + amd_dbgapi_wave_get_info (wave_id,
> + AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD,
> + sizeof (res.group_ids), &res.group_ids);
> +
> + amd_dbgapi_wave_get_info (wave_id,
> + AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP,
> + sizeof (res.wave_in_group), &res.wave_in_group);
> +
> + return res;
> +}
> +
> /* Clear our async event handler. */
>
> static void
> @@ -510,7 +556,21 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid)
> if (!ptid_is_gpu (ptid))
> return beneath ()->pid_to_str (ptid);
>
> - return wave_target_id_string (get_amd_dbgapi_wave_id (ptid));
> + process_stratum_target *proc_target = current_inferior ()->process_target ();
> + inferior *inf = find_inferior_pid (proc_target, ptid.pid ());
> + gdb_assert (inf != nullptr);
> + amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
> +
> + auto wave_id = get_amd_dbgapi_wave_id (ptid);
> +
> + auto it = info->wave_info_map.find (wave_id.handle);
> + if (it != info->wave_info_map.end ())
> + return it->second.to_string ();
> +
> + /* A wave we don't know about. Shouldn't usually happen, but
> + asserting and bringing down the session is a bit too harsh. Just
> + print all unknown info as "?"s. */
> + return wave_info (wave_id).to_string ();
> }
>
> const char *
> @@ -929,6 +989,46 @@ make_gpu_ptid (ptid_t::pid_type pid, amd_dbgapi_wave_id_t wave_id)
> return ptid_t (pid, 1, wave_id.handle);
> }
>
> +/* When a thread is deleted, remove its wave_info from the inferior's
> + wave_info map. */
> +
> +static void
> +amd_dbgapi_thread_deleted (thread_info *tp)
> +{
> + if (tp->inf->target_at (arch_stratum) == &the_amd_dbgapi_target
> + && ptid_is_gpu (tp->ptid))
> + {
> + amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (tp->inf);
> + auto wave_id = get_amd_dbgapi_wave_id (tp->ptid);
> + auto it = info->wave_info_map.find (wave_id.handle);
> + gdb_assert (it != info->wave_info_map.end ());
> + info->wave_info_map.erase (it);
> + }
> +}
> +
> +/* Register WAVE_PTID as a new thread in INF's thread list, and record
> + its wave_info in the inferior's wave_info map. */
> +
> +static thread_info *
> +add_gpu_thread (inferior *inf, ptid_t wave_ptid)
> +{
> + process_stratum_target *proc_target = inf->process_target ();
> + amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (inf);
> +
> + auto wave_id = get_amd_dbgapi_wave_id (wave_ptid);
> +
> + if (!info->wave_info_map.try_emplace (wave_id.handle,
> + get_wave_info (wave_id)).second)
> + internal_error ("wave ID %ld already in map", wave_id.handle);
> +
> + /* Create new GPU threads silently to avoid spamming the terminal
> + with thousands of "[New Thread ...]" messages. */
> + thread_info *thread = add_thread_silent (proc_target, wave_ptid);
> + set_running (proc_target, wave_ptid, true);
> + set_executing (proc_target, wave_ptid, true);
> + return thread;
> +}
> +
> /* Process an event that was just pulled out of the amd-dbgapi library. */
>
> static void
> @@ -1015,13 +1115,7 @@ process_one_event (amd_dbgapi_event_id_t event_id,
>
> thread_info *thread = proc_target->find_thread (event_ptid);
> if (thread == nullptr)
> - {
> - /* Silently create new GPU threads to avoid spamming the
> - terminal with thousands of "[New Thread ...]" messages. */
> - thread = add_thread_silent (proc_target, event_ptid);
> - set_running (proc_target, event_ptid, true);
> - set_executing (proc_target, event_ptid, true);
> - }
> + thread = add_gpu_thread (inf, event_ptid);
>
> /* If the wave is stopped because of a software breakpoint, the
> program counter needs to be adjusted so that it points to the
> @@ -1686,10 +1780,7 @@ amd_dbgapi_target::update_thread_list ()
> {
> ptid_t wave_ptid
> = make_gpu_ptid (inf->pid, amd_dbgapi_wave_id_t {tid});
> -
> - add_thread_silent (inf->process_target (), wave_ptid);
> - set_running (inf->process_target (), wave_ptid, true);
> - set_executing (inf->process_target (), wave_ptid, true);
> + add_gpu_thread (inf, wave_ptid);
> }
> }
>
> @@ -2115,6 +2206,7 @@ _initialize_amd_dbgapi_target ()
> gdb::observers::inferior_forked.attach (amd_dbgapi_inferior_forked, "amd-dbgapi");
> gdb::observers::inferior_exit.attach (amd_dbgapi_inferior_exited, "amd-dbgapi");
> gdb::observers::inferior_pre_detach.attach (amd_dbgapi_inferior_pre_detach, "amd-dbgapi");
> + gdb::observers::thread_deleted.attach (amd_dbgapi_thread_deleted, "amd-dbgapi");
>
> add_basic_prefix_cmd ("amdgpu", no_class,
> _("Generic command for setting amdgpu flags."),
> diff --git a/gdb/observable.c b/gdb/observable.c
> index f2e65b11604..29675f3abf3 100644
> --- a/gdb/observable.c
> +++ b/gdb/observable.c
> @@ -46,6 +46,7 @@ DEFINE_OBSERVABLE (all_objfiles_removed);
> DEFINE_OBSERVABLE (free_objfile);
> DEFINE_OBSERVABLE (new_thread);
> DEFINE_OBSERVABLE (thread_exit);
> +DEFINE_OBSERVABLE (thread_deleted);
> DEFINE_OBSERVABLE (thread_stop_requested);
> DEFINE_OBSERVABLE (target_resumed);
> DEFINE_OBSERVABLE (about_to_proceed);
> diff --git a/gdb/observable.h b/gdb/observable.h
> index 32ef65435cc..91a2c871524 100644
> --- a/gdb/observable.h
> +++ b/gdb/observable.h
> @@ -126,6 +126,11 @@ extern observable<thread_info */* t */,
> std::optional<ULONGEST> /* exit_code */,
> bool /* silent */> thread_exit;
>
> +/* The thread specified by T has been deleted, with delete_thread.
> + This is called just before the thread_info object is destroyed with
> + operator delete. */
> +extern observable<thread_info */* t */> thread_deleted;
> +
> /* An explicit stop request was issued to PTID. If PTID equals
> minus_one_ptid, the request applied to all threads. If
> ptid_is_pid(PTID) returns true, the request applied to all
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 85bdbaa6cd8..bd3fe85f3b9 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -527,6 +527,8 @@ delete_thread_1 (thread_info *thr, std::optional<ULONGEST> exit_code,
> auto it = thr->inf->thread_list.iterator_to (*thr);
> thr->inf->thread_list.erase (it);
>
> + gdb::observers::thread_deleted.notify (thr);
> +
> delete thr;
> }
>
>
> --
> 2.43.0
>
I am happy whether you do the suggested changes or not. Either way
Approved-By: Lancelot Six <lancelot.six@amd.com> (amdgpu)
Best,
Lancelot.
next prev parent reply other threads:[~2023-12-15 10:51 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-14 20:22 [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
2023-12-14 20:22 ` [PATCH 1/8] gdb.threads/step-over-thread-exit.exp improvements Pedro Alves
2023-12-14 20:22 ` [PATCH 2/8] Ensure selected thread after thread exit stop Pedro Alves
2023-12-14 20:22 ` [PATCH 3/8] displaced_step_finish: Don't fetch the regcache of exited threads Pedro Alves
2023-12-14 20:22 ` [PATCH 4/8] Step over thread exit, always delete the thread non-silently Pedro Alves
2023-12-14 20:22 ` [PATCH 5/8] Fix thread target ID of exited waves Pedro Alves
2023-12-15 10:51 ` Lancelot SIX [this message]
2023-12-14 20:22 ` [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-15 10:51 ` Lancelot SIX
2023-12-14 20:22 ` [PATCH 7/8] Add tests for s_endpgm handling Pedro Alves
2023-12-14 20:22 ` [PATCH 8/8] Add tests for handling of vanishing threads that were stepping/stopping Pedro Alves
2023-12-20 21:24 ` [PATCH 0/8] Step over thread exit improvements/fixes + AMD GPU Pedro Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231215105105.bqqo5jxy3hpxyqqj@khazad-dum \
--to=lancelot.six@amd.com \
--cc=gdb-patches@sourceware.org \
--cc=pedro@palves.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).