From: Pedro Alves <pedro@palves.net>
To: Lancelot Six <lancelot.six@amd.com>, gdb-patches@sourceware.org
Cc: lsix@lancelotsix.com
Subject: Re: [PATCH 2/2] gdb/amdgpu: Fix debugging multiple inferiors using the ROCm runtime
Date: Fri, 28 Jul 2023 19:01:08 +0100 [thread overview]
Message-ID: <0bffd0c1-1aeb-a401-2509-c96905ba2e28@palves.net> (raw)
In-Reply-To: <20230630145755.6500-3-lancelot.six@amd.com>
Hi Lancelot,
This LGTM with some nits below addressed. No need for another round
of review. Just post the updated patch, and merge it.
With that,
Approved-By: Pedro Alves <pedro@palves.net>
On to the nits...
On 2023-06-30 15:57, Lancelot Six via Gdb-patches wrote:
> - The driver creates the runtime activation for inferior 2 and writes to
> the associated file descriptor.
> - GDB has inferior 1 selected and calls target_wait for some reason.
> - This prompts amd_dbgapi_target::wait to be called. The method pulls
> all events from the driver, including the runtime activation event for
> inferior 2, leading to the insertion failure.
insertion -> assertion.
>
> The fix for this problem is simple. To avoid such problem, we need to
> make sure that amd_dbgapi_target::wait only pulls events for the current
> inferior from the driver. This is what this patch implements.
>
> This patch also includes a testcase which could fail before this patch.
>
> This patch has been tested on a system with multiple GPUs which had more
> chances to reproduce the original bug. It has also been tested on top
> of the downstream ROCgdb port which has more AMDGPU related tests. The
> testcase have been tested with `make check check-read1 check-readmore`.
"have been" -> "has been"
> ---
> gdb/amd-dbgapi-target.c | 6 +-
> gdb/testsuite/gdb.rocm/multi-inferior-gpu.cpp | 111 ++++++++++++++++++
> gdb/testsuite/gdb.rocm/multi-inferior-gpu.exp | 86 ++++++++++++++
> 3 files changed, 201 insertions(+), 2 deletions(-)
> create mode 100644 gdb/testsuite/gdb.rocm/multi-inferior-gpu.cpp
> create mode 100644 gdb/testsuite/gdb.rocm/multi-inferior-gpu.exp
>
> diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c
> index 5565cf907fa..371f0683754 100644
> --- a/gdb/amd-dbgapi-target.c
> +++ b/gdb/amd-dbgapi-target.c
> @@ -1255,8 +1255,10 @@ amd_dbgapi_target::wait (ptid_t ptid, struct target_waitstatus *ws,
> std::tie (event_ptid, gpu_waitstatus) = consume_one_event (ptid.pid ());
> if (event_ptid == minus_one_ptid)
> {
> - /* Drain the events from the amd_dbgapi and preserve the ordering. */
> - process_event_queue ();
> + /* Drain the events for the current inferior from the amd_dbgapi and
> + preserve the ordering. */
> + auto info = get_amd_dbgapi_inferior_info (current_inferior ());
> + process_event_queue (info->process_id, AMD_DBGAPI_EVENT_KIND_NONE);
I think the process_event_queue's process_id parameter could stop having
a default argument.
>
> std::tie (event_ptid, gpu_waitstatus) = consume_one_event (ptid.pid ());
> if (event_ptid == minus_one_ptid)
> diff --git a/gdb/testsuite/gdb.rocm/multi-inferior-gpu.cpp b/gdb/testsuite/gdb.rocm/multi-inferior-gpu.cpp
> new file mode 100644
> index 00000000000..828dc0cf7d4
> --- /dev/null
> +++ b/gdb/testsuite/gdb.rocm/multi-inferior-gpu.cpp
> @@ -0,0 +1,111 @@
...
> + if (pid == 0)
> + {
> + /* Exec to be fore the child to re-initialize the ROCm runtime. */
I can't parse the
"Exec to be fore the child"
comment. I think you mean:
"Exec the child"
?
> + if (execl (argv[0], argv[0], n) == -1)
> + {
> + perror ("Failed to exec");
> + return -1;
> + }
> + }
> + }
> +
> + /* Wait for all children. */
> + int ws;
> + pid_t ret;
> + do
> + ret = waitpid (-1, &ws, 0);
> + while (!(ret == -1 && errno == ECHILD));
At <https://www.gnu.org/prep/standards/standards.html>, we have:
"Format do-while statements like this:
do
{
a = foo (a);
}
while (a > 0);
"
IMO, this is more readable, and lets you keep the
variables in the scope:
while (1)
{
int ws;
pid_t ret = waitpid (-1, &ws, 0);
if (ret == -1 && errno == ECHILD)
break;
}
> +
> + /* Last break here. */
> + return 0;
> +}
> +
> +static int
> +child (int argc, char **argv)
> +{
> + int dev_number;
> + if (sscanf (argv[1], "%d", &dev_number) != 1)
> + {
> + fprintf (stderr, "Invalid argument \"%s\"\n", argv[1]);
> + return -1;
> + }
> +
> + CHECK (hipSetDevice (dev_number));
> + kern<<<1, 1>>> ();
> + hipDeviceSynchronize ();
> + return 0;
> +}
> +
> +/* When called with no argument, identify how many AMDGPU devices are
> + available on the system and spawn one worker process per GPU. If a
> + command-line argument is provided, it is the index of the GPU to use. */
> +
> +int
> +main (int argc, char **argv)
> +{
> + if (argc <= 1)
> + return parent (argc, argv);
> + else
> + return child (argc, argv);
> +}
> diff --git a/gdb/testsuite/gdb.rocm/multi-inferior-gpu.exp b/gdb/testsuite/gdb.rocm/multi-inferior-gpu.exp
> new file mode 100644
> index 00000000000..3e8934645e6
> --- /dev/null
> +++ b/gdb/testsuite/gdb.rocm/multi-inferior-gpu.exp
> @@ -0,0 +1,86 @@
> +# Copyright 2023 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
> +
> +# This test checks that GDB can debug multiple inferior which uses all
> +# the ROCm runtime.
# This test checks that GDB can debug multiple inferiors that all use
# the ROCm runtime.
> +
> +load_lib rocm.exp
> +
> +standard_testfile .cpp
> +
> +require allow_hipcc_tests
> +require hip_devices_support_debug_multi_process
> +
> +if {[build_executable "failed to prepare" $testfile $srcfile {debug hip}]} {
> + return
> +}
> +
> +proc do_test {} {
> + clean_restart $::binfile
> + gdb_test_no_output "set non-stop on"
> + gdb_test_no_output "set detach-on-fork off"
> + gdb_test_no_output "set follow-fork parent"
> +
> + with_rocm_gpu_lock {
> + gdb_breakpoint [gdb_get_line_number "Break here"]
> + gdb_breakpoint kern allow-pending
> + gdb_breakpoint [gdb_get_line_number "Last break here"]
> +
> + # Run intil we reach the first breakpoint where we can figure
"Run intil" -> "Run until".
> + # out how many children will be spawned.
> + gdb_test "run" "hit Breakpoint.*"
> +
> + set num_childs [get_integer_valueof "num_devices" 0]
num_childs -> num_children ?
> + set bp_to_see $num_childs
> + set stopped_threads [list]
> +
> + gdb_test_multiple "continue -a" "continue to gpu breakpoints" {
> + -re "Thread ($::decimal\.$::decimal)\[^\r\n\]* hit Breakpoint\[^\r\n\]*, kern \(\)\[^\r\n\]*\r\n" {
> + lappend stopped_threads $expect_out(1,string)
> + incr bp_to_see -1
> + if {$bp_to_see != 0} {
> + exp_continue
> + } else {
> + pass $gdb_test_name
> + }
> + }
> + -re "^\[^\r\n\]*\r\n" {
> + exp_continue
> + }
> + }
Since this is non-stop, this "continue -a" will cause the first
stop to print the prompt, and other stops to not print it. The
"-re" cases above don't explicitly handle the prompt, which seems
brittle to me.
"continue -a&" instead of ""continue -a" immediately prints the
prompt. It would be better IMO to explicitly consume the prompt
with that. Like (untested):
gdb_test_multiple "continue -a &" "continue to gpu breakpoints" {
-re "Continuing\.\r\n$gdb_prompt " {
pass $gdb_test_name
}
}
gdb_test_multiple "" "wait for gpu stops {
-re "Thread ($::decimal\.$::decimal)\[^\r\n\]* hit Breakpoint\[^\r\n\]*, kern \(\)\[^\r\n\]*\r\n" {
lappend stopped_threads $expect_out(1,string)
incr bp_to_see -1
if {$bp_to_see != 0} {
exp_continue
} else {
pass $gdb_test_name
}
}
}
> +
> + # Continue all the children processes until they exit.
Maybe say:
# Continue all the GPU kernels until all the children processes exit.
If I am not mistaken, the children processes on the CPU side are already
running at this point, only the GPU kernels were stopped.
> + foreach thread $stopped_threads {
I would rename "stopped_threads" -> stopped_gdb_threads.
That's it. Thanks for the fix!
Pedro Alves
> + set infnumber [lindex [split $thread .] 0]
> + gdb_test "thread $thread" "Switching to thread.*"
> + gdb_test_multiple "continue $thread" "" {
> + -re "\\\[Inferior $infnumber \[^\n\r\]* exited normally\\]\r\n$::gdb_prompt " {
> + pass $gdb_test_name
> + }
> + }
> + }
> +
> + gdb_test_multiple "" "reach breakpoint in main" {
> + -re "hit Breakpoint.*parent" {
> + pass $gdb_test_name
> + }
> + }
> + # Select main inferior
> + gdb_test "inferior 1" "Switching to inferior 1.*"
> + gdb_continue_to_end "" "continue -a" 1
> + }
> +}
> +
> +do_test
next prev parent reply other threads:[~2023-07-28 18:01 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-30 14:57 [PATCH 0/2] Fix debugging multi " Lancelot Six
2023-06-30 14:57 ` [PATCH 1/2] gdb/testsuite/rocm: Add the hip_devices_support_debug_multi_process proc Lancelot Six
2023-07-28 18:00 ` Pedro Alves
2023-06-30 14:57 ` [PATCH 2/2] gdb/amdgpu: Fix debugging multiple inferiors using the ROCm runtime Lancelot Six
2023-07-28 18:01 ` Pedro Alves [this message]
2023-07-28 18:04 ` Pedro Alves
2023-07-28 16:47 ` [PING] [PATCH 0/2] Fix debugging multi " Lancelot SIX
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0bffd0c1-1aeb-a401-2509-c96905ba2e28@palves.net \
--to=pedro@palves.net \
--cc=gdb-patches@sourceware.org \
--cc=lancelot.six@amd.com \
--cc=lsix@lancelotsix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).