From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id BA6B438369F2 for ; Wed, 7 Dec 2022 13:30:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA6B438369F2 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gnu.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gnu.org Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2uUr-0002lL-NS; Wed, 07 Dec 2022 08:30:01 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=5aiaDh8TPeCZFHplhaAbwCzIt42Ril4T6dOZP9fNpzw=; b=DMCAj4qSTMj4 /2yRn2YqGVWBP3hlSxyYmYjbueDArh5Vfl1QA1nJCDsFYHyCpqzNLKrMCVAMC6BX0B96jNK3riGDa NIzEOR3kwBI/BrhrKFRkQbZlAfDEgLC4oRPv8S0Sdmitw2uXqdSPRuGKEkXfc0DA8A2OCH/T43Ame sIagX7aO8I8BSi5wkWyjtr2NWCPeeVJ7PaWJ5KNs+M0GRHBKuB1nL3Pb4pPxAiBljy5fJMNfgsGxM ETB81YLoeGcRwd2L33iZXj7X1wDt1g8W2VrSkuUpwbC2vXAkgyi1eoN1c9xiyjmidwdaH2dbMGu8b JAHvrkAJ+AtLxV06xu47PQ==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2uUe-0000ev-Jk; Wed, 07 Dec 2022 08:30:01 -0500 Date: Wed, 07 Dec 2022 15:29:35 +0200 Message-Id: <83wn734a0w.fsf@gnu.org> From: Eli Zaretskii To: Simon Marchi Cc: simon.marchi@efficios.com, gdb-patches@sourceware.org, zoran.zaric@amd.com, laurent.morichetti@amd.com, Tony.Tye@amd.com, lancelot.six@amd.com, pedro@palves.net In-Reply-To: (message from Simon Marchi on Tue, 6 Dec 2022 21:17:25 -0500) Subject: Re: [PATCH 12/12] gdb: initial support for ROCm platform (AMDGPU) debugging References: <20221206135729.3937767-1-simon.marchi@efficios.com> <20221206135729.3937767-13-simon.marchi@efficios.com> <835yeo7d3i.fsf@gnu.org> X-Spam-Status: No, score=1.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_BARRACUDACENTRAL,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Level: * X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > Date: Tue, 6 Dec 2022 21:17:25 -0500 > Cc: gdb-patches@sourceware.org, zoran.zaric@amd.com, > laurent.morichetti@amd.com, Tony.Tye@amd.com, lancelot.six@amd.com, > pedro@palves.net > From: Simon Marchi > > >> +@node set scheduler-locking > > > > This @node is without any @chapter/@section, and does not appear in any > > @menu. That doesn't look right; did you actually succeed in building the > > manual with these changes? > Yes, it builds. It is used as a destination for a @pxref{set > scheduler-locking} later in the patch, if I remove it doesn't build: > > gdb.texinfo:26469: @pxref reference to nonexistent node `set scheduler-locking' > > But I see the problem, in the HTML it introduces a new page starting at > that point, we don't want that. Perhaps we should use @anchor instead? You could use @anchor if all you want is a place to direct an @xref. > >> +The following @acronym{AMD GPU} architectures are supported: > >> + > >> +@table @emph > >> + > >> +@item @samp{gfx900} > >> +AMD Vega 10 devices, displayed as @samp{vega10} by @value{GDBN}. > >> + > >> +@item @samp{gfx906} > >> +AMD Vega 7nm devices, displayed as @samp{vega20} by @value{GDBN}. > > > > Do we really need this long list of architectures in the GDB manual? It > > sounds like an ad for AMD... > > We found it useful, as people often ask which devices / models the > debugger supports. AMD produces a lot of GPU models. A subset of that > can run ROCm programs. And a subset of that support debugging. I think > it's useful to tell users which devices GDB is expected to work with. > > And it would not be a very good ad either, as most of these devices are > far from the latest and greatest :). I understand, but this is a _GDB_ manual, not a manual for debugging AMD GPU programs. We need to draw the line at some point. Why cannot these details be in some README somewhere, or on the Wiki? > > This and other @smallexamples in the patch have too long lines; please break > > them into two or more, to avoid problems with the printed format of the > > manual. > > What is the maximum line length for this? I think 70 or 72. > I'll try, but it's a bit difficult when quoting actual GDB output. For > instance, how would you do this one? > > @smallexample > (@value{GDBP}) info sharedlibrary > >From To Syms Read Shared Object Library > 0x00007fd120664ac0 0x00007fd120682790 Yes (*) /lib64/ld-linux-x86-64.so.2 Use shorter addresses and directory names: they are immaterial for your purposes here. > >> +@subsubsection @acronym{AMD GPU} Wavefronts > > > > I think a @cindex about wavefronts would be useful here. > > I just add a `@cindex Wavefronts` under the line quoted above? Yes, but "wavefronts", lower-case. In general, index entries should not use upper-case unless really necessary, because the sorting order of mixed-case text depends on the locale and the underlying C library. > >> +@item file_path > >> +The file's path specified as a URI encoded UTF-8 string. In URI > >> +encoding, every character that is not: > >> + > >> +@itemize > >> +@item In the @samp{a-z}, @samp{A-Z}, @samp{0-9} ranges > >> +@item @samp{/}, @samp{_}, @samp{.}, @samp{~} or @samp{-} > >> +@end itemize > >> + > >> +is encoded as two uppercase hexadecimal digits proceeded by @samp{%}. > > > > You want @noindent before the last line. > > I can add it, but I don't see the difference (at least in the HTML > and PDF outputs). It's unreliable to rely on this to produce un-indented lines. Depending on the global settings such as @paragraphindent you can get something you don't want. > > Finally: maybe it's just me, but isn't this documentation way too detailed? > > It weighs in at 800 lines, and includes many details that seem to be more > > related to AMD, GPU, and HIP than to GDB. Would it be reasonable to make > > this section shorter by omitting too low-level and unrelated details? > > I went over the page, and while I agree it's very thorough and detailed, > my impression is that it's all information that is one way or another > related to how GDB interacts with ROCm / HIP / AMD GPUs. So, all > information that could be useful to someone with good knowledge of ROCm > / HIP / AMD GPUs, if they wanted to use GDB to debug their program. For > instance, the description of when GDB reports a SIGABRT is useful, as > the mapping between target debug events and Unix signals in GDB is kind > of arbitrary. But you also describe in detail what each signal means, for example, which is either redundant or belongs to the documentation of GPU programming. E.g., we don't explain in the manual what kind of signal is SIGBUS or SIGFPE in GP CPUs, so why should we have this spelled out for GPU programs? > If you can point out specific parts that you think are not relevant, we > can discuss them specifically. For example, this sounds like a description of the GPU, not of GDB features: > +@acronym{AMD GPU} supports the following @var{reggroup} values for the > +@samp{info registers @var{reggroup} @dots{}} command: > + > +@itemize @bullet > + > +@item > +general > + > +@item > +vector > + > +@item > +scalar > + > +@item > +system > + > +@end itemize > + > +The number of scalar and vector registers is configured when a > +wavefront is created. Only allocated registers are displayed. Or why do we need this in our manual: > +The code object path for @acronym{AMD GPU} code objects is shown as a > +@acronym{URI, Universal Location Identifier} with a syntax defined by > +the following BNF syntax: > + > +@smallexample > +code_object_uri ::== file_uri | memory_uri > +file_uri ::== "file://" file_path [ range_specifier ] > +memory_uri ::== "memory://" process_id range_specifier > +range_specifier ::== [ "#" | "?" ] "offset=" number "&" "size=" number > +file_path ::== URI_ENCODED_OS_FILE_PATH > +process_id ::== DECIMAL_NUMBER > +number ::== HEX_NUMBER | DECIMAL_NUMBER | OCTAL_NUMBER > +@end smallexample (followed by a longish legend of what each atom means above). And this paragraph seems to describe the GPU, not what GDB does: > +If any of these signals are delivered to the wavefront, it will cause > +the wavefront to enter the halt state and cause the @acronym{AMD ROCm} > +runtime to put the associated queue into the queue error state. All > +wavefronts associated with a queue that is in the queue error state > +are inhibited from executing further instructions even if they are not > +in the halt state. In addition, when the @acronym{AMD ROCm} runtime > +puts a queue into the queue error state it may invoke an application > +registered callback that could either abort the application or delete > +the queue which will delete any wavefronts associated with the queue. There's also a lot of stuff only very remotely related to GDB, which basically reads like a large number of tips and tricks for someone who needs this mode. For example: > +@item > +By default, for some architectures, the @acronym{AMD GPU} device > +driver causes all @acronym{AMD GPU} wavefronts created when > +@value{GDBN} is not attached to be unable to report the dispatch > +associated with the wavefront, or the wavefront's work-group > +position. The @samp{info threads} command will display this > +missing information with a @samp{?}. > + > +For example, > + > +@smallexample > +(gdb) info threads > + Id Target Id Frame > +* 1 Thread 0x7ffff6987840 (LWP 62056) "bit_extract" 0x00007ffff6da489b in sched_yield () at ../sysdeps/unix/syscall-template.S:78 > + 2 Thread 0x7ffff6986700 (LWP 62064) "bit_extract" 0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78 > + 3 Thread 0x7ffff5f7f700 (LWP 62066) "bit_extract" 0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78 > + 4 Thread 0x7ffff597f700 (LWP 62067) "bit_extract" 0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78 > + 5 AMDGPU Wave 1:2:?:1 (?,?,?)/? "bit_extract" bit_extract_kernel (C_d=, A_d=, N=) at bit_extract.cpp:41 > +@end smallexample > + > +This does not affect wavefronts created while @value{GDBN} is attached > +which are always capable of reporting this information. > + > +If the @env{HSA_ENABLE_DEBUG} environment variable is set to @samp{1} > +when the @acronym{AMD ROCm} runtime is initialized, then this > +information will be available for all architectures even for wavefronts > +created when @value{GDBN} was not attached. Setting this environment > +variable may very marginally increase wavefront launch latency for some > +architectures for very short lived wavefronts. > + > +@item > +If an @acronym{AMD GPU} wavefront has the @code{DX10_CLAMP} bit set in > +the @code{MODE} register, enabled arithmetic exceptions will not be > +reported as @code{SIGFPE} signals. This happens if the > +@code{DX10_CLAMP} kernel descriptor field is enabled. The last paragraph in particular reads like something from the GPU programming manual. And even if this kind of info is useful and should be in the GDB manual, why does it need to be so wordy, with so many detailed examples? I understand the urge to document all the potentially useful stuff about this mode of GDB, but the result looks disproportionally long and full of low-level information only tangentially related to GDB. That said, if you feel strongly about the need to include all this, and I'm the only one who raises the brow, feel free to install this, I won't object anymore. Thanks.