Re: [PATCH 12/12] gdb: initial support for ROCm platform (AMDGPU) debugging

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Eli Zaretskii <eliz@gnu.org>
To: Simon Marchi <simark@simark.ca>
Cc: simon.marchi@efficios.com, gdb-patches@sourceware.org,
	zoran.zaric@amd.com, laurent.morichetti@amd.com,
	Tony.Tye@amd.com, lancelot.six@amd.com, pedro@palves.net
Subject: Re: [PATCH 12/12] gdb: initial support for ROCm platform (AMDGPU) debugging
Date: Wed, 07 Dec 2022 15:29:35 +0200	[thread overview]
Message-ID: <83wn734a0w.fsf@gnu.org> (raw)
In-Reply-To: <e3bcf411-ce93-c391-2a09-5eb4876f5475@simark.ca> (message from Simon Marchi on Tue, 6 Dec 2022 21:17:25 -0500)

> Date: Tue, 6 Dec 2022 21:17:25 -0500
> Cc: gdb-patches@sourceware.org, zoran.zaric@amd.com,
>  laurent.morichetti@amd.com, Tony.Tye@amd.com, lancelot.six@amd.com,
>  pedro@palves.net
> From: Simon Marchi <simark@simark.ca>
> 
> >> +@node set scheduler-locking
> > 
> > This @node is without any @chapter/@section, and does not appear in any
> > @menu.  That doesn't look right; did you actually succeed in building the
> > manual with these changes?
> Yes, it builds.  It is used as a destination for a @pxref{set
> scheduler-locking} later in the patch, if I remove it doesn't build:
> 
>     gdb.texinfo:26469: @pxref reference to nonexistent node `set scheduler-locking'
> 
> But I see the problem, in the HTML it introduces a new page starting at
> that point, we don't want that.  Perhaps we should use @anchor instead?

You could use @anchor if all you want is a place to direct an @xref.

> >> +The following @acronym{AMD GPU} architectures are supported:
> >> +
> >> +@table @emph
> >> +
> >> +@item @samp{gfx900}
> >> +AMD Vega 10 devices, displayed as @samp{vega10} by @value{GDBN}.
> >> +
> >> +@item @samp{gfx906}
> >> +AMD Vega 7nm devices, displayed as @samp{vega20} by @value{GDBN}.
> > 
> > Do we really need this long list of architectures in the GDB manual?  It
> > sounds like an ad for AMD...
> 
> We found it useful, as people often ask which devices / models the
> debugger supports.  AMD produces a lot of GPU models.  A subset of that
> can run ROCm programs.  And a subset of that support debugging.  I think
> it's useful to tell users which devices GDB is expected to work with.
> 
> And it would not be a very good ad either, as most of these devices are
> far from the latest and greatest :).

I understand, but this is a _GDB_ manual, not a manual for debugging
AMD GPU programs.  We need to draw the line at some point.  Why cannot
these details be in some README somewhere, or on the Wiki?

> > This and other @smallexamples in the patch have too long lines; please break
> > them into two or more, to avoid problems with the printed format of the
> > manual.
> 
> What is the maximum line length for this?

I think 70 or 72.

> I'll try, but it's a bit difficult when quoting actual GDB output.  For
> instance, how would you do this one?
> 
> @smallexample
> (@value{GDBP}) info sharedlibrary
> >From                To                  Syms Read   Shared Object Library
> 0x00007fd120664ac0  0x00007fd120682790  Yes (*)     /lib64/ld-linux-x86-64.so.2

Use shorter addresses and directory names: they are immaterial for
your purposes here.

> >> +@subsubsection @acronym{AMD GPU} Wavefronts
> > 
> > I think a @cindex about wavefronts would be useful here.
> 
> I just add a `@cindex Wavefronts` under the line quoted above?

Yes, but "wavefronts", lower-case.  In general, index entries should
not use upper-case unless really necessary, because the sorting order
of mixed-case text depends on the locale and the underlying C library.

> >> +@item file_path
> >> +The file's path specified as a URI encoded UTF-8 string.  In URI
> >> +encoding, every character that is not:
> >> +
> >> +@itemize
> >> +@item In the @samp{a-z}, @samp{A-Z}, @samp{0-9} ranges
> >> +@item @samp{/}, @samp{_}, @samp{.}, @samp{~} or @samp{-}
> >> +@end itemize
> >> +
> >> +is encoded as two uppercase hexadecimal digits proceeded by @samp{%}.
> > 
> > You want @noindent before the last line.
> 
> I can add it, but I don't see the difference (at least in the HTML
> and PDF outputs).

It's unreliable to rely on this to produce un-indented lines.
Depending on the global settings such as @paragraphindent you can get
something you don't want.

> > Finally: maybe it's just me, but isn't this documentation way too detailed?
> > It weighs in at 800 lines, and includes many details that seem to be more
> > related to AMD, GPU, and HIP than to GDB.  Would it be reasonable to make
> > this section shorter by omitting too low-level and unrelated details?
> 
> I went over the page, and while I agree it's very thorough and detailed,
> my impression is that it's all information that is one way or another
> related to how GDB interacts with ROCm / HIP / AMD GPUs.  So, all
> information that could be useful to someone with good knowledge of ROCm
> / HIP / AMD GPUs, if they wanted to use GDB to debug their program.  For
> instance, the description of when GDB reports a SIGABRT is useful, as
> the mapping between target debug events and Unix signals in GDB is kind
> of arbitrary.

But you also describe in detail what each signal means, for example,
which is either redundant or belongs to the documentation of GPU
programming.  E.g., we don't explain in the manual what kind of signal
is SIGBUS or SIGFPE in GP CPUs, so why should we have this spelled out
for GPU programs?

> If you can point out specific parts that you think are not relevant, we
> can discuss them specifically.

For example, this sounds like a description of the GPU, not of GDB
features:

> +@acronym{AMD GPU} supports the following @var{reggroup} values for the
> +@samp{info registers @var{reggroup} @dots{}} command:
> +
> +@itemize @bullet
> +
> +@item
> +general
> +
> +@item
> +vector
> +
> +@item
> +scalar
> +
> +@item
> +system
> +
> +@end itemize
> +
> +The number of scalar and vector registers is configured when a
> +wavefront is created.  Only allocated registers are displayed.

Or why do we need this in our manual:

> +The code object path for @acronym{AMD GPU} code objects is shown as a
> +@acronym{URI, Universal Location Identifier} with a syntax defined by
> +the following BNF syntax:
> +
> +@smallexample
> +code_object_uri ::== file_uri | memory_uri
> +file_uri        ::== "file://" file_path [ range_specifier ]
> +memory_uri      ::== "memory://" process_id range_specifier
> +range_specifier ::== [ "#" | "?" ] "offset=" number "&" "size=" number
> +file_path       ::== URI_ENCODED_OS_FILE_PATH
> +process_id      ::== DECIMAL_NUMBER
> +number          ::== HEX_NUMBER | DECIMAL_NUMBER | OCTAL_NUMBER
> +@end smallexample

(followed by a longish legend of what each atom means above).

And this paragraph seems to describe the GPU, not what GDB does:

> +If any of these signals are delivered to the wavefront, it will cause
> +the wavefront to enter the halt state and cause the @acronym{AMD ROCm}
> +runtime to put the associated queue into the queue error state.  All
> +wavefronts associated with a queue that is in the queue error state
> +are inhibited from executing further instructions even if they are not
> +in the halt state.  In addition, when the @acronym{AMD ROCm} runtime
> +puts a queue into the queue error state it may invoke an application
> +registered callback that could either abort the application or delete
> +the queue which will delete any wavefronts associated with the queue.

There's also a lot of stuff only very remotely related to GDB, which
basically reads like a large number of tips and tricks for someone who
needs this mode.  For example:

> +@item
> +By default, for some architectures, the @acronym{AMD GPU} device
> +driver causes all @acronym{AMD GPU} wavefronts created when
> +@value{GDBN} is not attached to be unable to report the dispatch
> +associated with the wavefront, or the wavefront's work-group
> +position.  The @samp{info threads} command will display this
> +missing information with a @samp{?}.
> +
> +For example,
> +
> +@smallexample
> +(gdb) info threads
> +  Id   Target Id                                       Frame
> +* 1    Thread 0x7ffff6987840 (LWP 62056) "bit_extract" 0x00007ffff6da489b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
> +  2    Thread 0x7ffff6986700 (LWP 62064) "bit_extract" 0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78
> +  3    Thread 0x7ffff5f7f700 (LWP 62066) "bit_extract" 0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78
> +  4    Thread 0x7ffff597f700 (LWP 62067) "bit_extract" 0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78
> +  5    AMDGPU Wave 1:2:?:1 (?,?,?)/? "bit_extract"     bit_extract_kernel (C_d=<optimized out>, A_d=<optimized out>, N=<optimized out>) at bit_extract.cpp:41
> +@end smallexample
> +
> +This does not affect wavefronts created while @value{GDBN} is attached
> +which are always capable of reporting this information.
> +
> +If the @env{HSA_ENABLE_DEBUG} environment variable is set to @samp{1}
> +when the @acronym{AMD ROCm} runtime is initialized, then this
> +information will be available for all architectures even for wavefronts
> +created when @value{GDBN} was not attached.  Setting this environment
> +variable may very marginally increase wavefront launch latency for some
> +architectures for very short lived wavefronts.
> +
> +@item
> +If an @acronym{AMD GPU} wavefront has the @code{DX10_CLAMP} bit set in
> +the @code{MODE} register, enabled arithmetic exceptions will not be
> +reported as @code{SIGFPE} signals.  This happens if the
> +@code{DX10_CLAMP} kernel descriptor field is enabled.

The last paragraph in particular reads like something from the GPU
programming manual.  And even if this kind of info is useful and
should be in the GDB manual, why does it need to be so wordy, with so
many detailed examples?

I understand the urge to document all the potentially useful stuff
about this mode of GDB, but the result looks disproportionally long
and full of low-level information only tangentially related to GDB.

That said, if you feel strongly about the need to include all this,
and I'm the only one who raises the brow, feel free to install this, I
won't object anymore.

Thanks.

next prev parent reply	other threads:[~2022-12-07 13:30 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-06 13:57 [PATCH 00/12] Initial " Simon Marchi
2022-12-06 13:57 ` [PATCH 01/12] gdb: add supports_arch_info callback to gdbarch_register Simon Marchi
2022-12-06 16:45   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 02/12] gdb: make install_breakpoint return a non-owning reference Simon Marchi
2022-12-06 16:46   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 03/12] gdbsupport: add type definitions for pid, lwp and tid Simon Marchi
2022-12-06 16:36   ` Andrew Burgess
2022-12-07  2:55     ` Simon Marchi
2022-12-06 13:57 ` [PATCH 04/12] gdb: add inferior_pre_detach observable Simon Marchi
2022-12-06 16:39   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 05/12] gdb: make gdbarch_alloc take ownership of the tdep Simon Marchi
2022-12-06 17:06   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 06/12] gdb: add gdbarch_up Simon Marchi
2022-12-06 17:07   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 07/12] gdbsupport: move libxxhash configure check to gdbsupport Simon Marchi
2022-12-06 17:19   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 08/12] gdbsupport: move fast_hash to gdbsupport/common-utils.h Simon Marchi
2022-12-06 17:19   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 09/12] gdbsupport: add gdb::string_view_hash Simon Marchi
2022-12-06 17:19   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 10/12] gdb/solib-svr4: don't disable probes interface if probe not found Simon Marchi
2022-12-06 13:57 ` [PATCH 11/12] gdb: make gdb_printing_disassembler::stream public Simon Marchi
2022-12-06 17:38   ` Andrew Burgess
2022-12-06 13:57 ` [PATCH 12/12] gdb: initial support for ROCm platform (AMDGPU) debugging Simon Marchi
2022-12-06 15:00   ` Eli Zaretskii
2022-12-06 15:10     ` Simon Marchi
2022-12-06 15:42   ` Eli Zaretskii
2022-12-07  2:17     ` Simon Marchi
2022-12-07 13:29       ` Eli Zaretskii [this message]
2022-12-16 17:37         ` Simon Marchi
2023-01-05 19:41 ` [PATCH 00/12] Initial " Simon Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83wn734a0w.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=Tony.Tye@amd.com \
    --cc=gdb-patches@sourceware.org \
    --cc=lancelot.six@amd.com \
    --cc=laurent.morichetti@amd.com \
    --cc=pedro@palves.net \
    --cc=simark@simark.ca \
    --cc=simon.marchi@efficios.com \
    --cc=zoran.zaric@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).