From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gnu.wildebeest.org (wildebeest.demon.nl [212.238.236.112]) by sourceware.org (Postfix) with ESMTPS id B38973858405 for ; Fri, 10 Sep 2021 17:11:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B38973858405 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=klomp.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=klomp.org Received: from tarox.wildebeest.org (83-87-18-245.cable.dynamic.v4.ziggo.nl [83.87.18.245]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by gnu.wildebeest.org (Postfix) with ESMTPSA id E80A030016B8; Fri, 10 Sep 2021 19:11:36 +0200 (CEST) Received: by tarox.wildebeest.org (Postfix, from userid 1000) id 50F4C413CE02; Fri, 10 Sep 2021 19:11:36 +0200 (CEST) Message-ID: Subject: Re: [PATCH] read inlining info in an NVIDIA extended line map (was: Extension ...) From: Mark Wielaard To: John Mellor-Crummey , elfutils-devel@sourceware.org Cc: Xiaozhu Meng Date: Fri, 10 Sep 2021 19:11:36 +0200 In-Reply-To: <37B899F3-FD5F-4F7B-9F1B-BE63CF84F554@rice.edu> References: <7C166312-4876-444F-9B85-C7E30C8F4959@rice.edu> <37B899F3-FD5F-4F7B-9F1B-BE63CF84F554@rice.edu> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Evolution 3.28.5 (3.28.5-10.el7) Mime-Version: 1.0 X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: elfutils-devel@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Elfutils-devel mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Sep 2021 17:11:40 -0000 Hi John, On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils- devel wrote: > My previous patch submission seems to have been overlooked as > buildbot issues consumed several days this week. However, discussion > in the mailing list now seems to have moved on beyond my submission > and I would like my patch considered. Here, I echo my previous > submission, except I improved my submission by including the prefix > [PATCH] in the subject and I included the patch in the message body > rather than as an attachment. Sorry for the buildbot noise, that was indeed a little painful. But a buildbot that randomly fails is not much fun, so it took highest priority to get it back to green. Your submission is really nice, having extensive documentation, all features implemented, a testcase. Well done. There are however some concerns outside your control. It is somewhat disappointing NVIDIA didn't document this themselves, or tried to standardize this. You seems to have a good grasp of what was intended though. We have to be careful not to extend the api in a way that makes a better standard/implementation impossible. And the way we implemented Dwarf_Lines isn't ideal, so this extension shows we should maybe change it to be more efficient/compact. But maybe we can do that after adding the extension, we should however have a plan. > Regarding the DWARF proposal by Cary Coutant for two-level linemaps: > I now believe that NVIDIA=E2=80=99s implementation is consistent with tha= t > proposal although NVIDIA uses a DWARF extended opcode for inlined > calls whereas Cary=E2=80=99s proposal uses DW_LNS_inlined_call (a standar= d > opcode), NVIDIA=E2=80=99s implementation uses DW_LNE_inlined_call (an > extended opcode). The naming is one of the concerns. It would be better to use a name like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to show they are vendor extensions and don't clash with possible future standard opcode names. That it mimics the two-level linemaps proposal is a good thing. But lets make sure that the new accessor functions, dwarf_linecontext and dwarf_linefunctionname, are generic enough that they can hopefully be reused when two-level linemaps or a similar proposal becomes a standard. > A note about the source code for the test case reading the extended > linemap entries using libdw: this code was copied from another test > that used dwarf_next_lines and extended with code that reads the new > context and functionname fields of a line table entry. Thanks for the test case, it makes clear how the new functionality can be used. How was the test binary, testfile_nvidia_linemap, generated? That should probably be documented inside the testcase. I won't be able to review all the code right now, but here are some high-level comments, so you know what I am thinking. On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey > > wrote: > >=20 > > As of CUDA 11.2, NVIDIA added extensions to the line map section > > of CUDA binaries to represent inlined functions. These extensions > > include > >=20 > > - two new fields in a line table row to represent inline=20 > > information: context, and functionname, We didn't design the Dwarf_Line_s very well/compact. We already have the op_index, isa and discriminator fields which are almost never used. This adds two more. I wonder if we can split the struct up so that the extra fields are only used when actually used. Maybe this can be done with a flag in Dwarf_Lines_s indicating whether the array contains only the basic line attributes or also some extended values. Of course this makes dwarf_getsrclines even more complex because it then has to expand the line state struct whenever it first sees an extended attribute. But I really like to see if we can use less memory here. If we agree on some way to make that possible we can implement it afterwards. > > - two new DWARF extended opcodes: DW_LNE_inlined_call,=20 > > DW_LNE_set_function_name, Like I said above I think these names should contain the "vendor" name DW_LNE_NVIDIA_... > > - an additional word in the line table header that indicates=20 > > the offset in the .debug_str function where the function=20 > > names for this line table begin, and This is the only part I think is somewhat tricky. I believe you implement it cleverly by checking the header_length. And your implementation should work, but it is also the part that no other tool understands, which means any such binary cannot really be handled by any other tool. And I don't really understand how this works when linking objects with such a offset into .debug_str. Normally the .debug_str section contains strings that can be merged, but that would disrupt the order, so I don't fully understand how the function_name indexes are kept correct. This would have been nicer as a DWARF5 extension, where there is already a .debug_str_offsets section (but also confusing because there is also a special .debug_line_str section where these strings should probably point at). > > - two new functions in the libdw API: dwarf_linecontext and=20 > > dwarf_linefunctionname, which return the new line table fields. I think it would be nice if we could make these functions return a Dwarf_Line * and a const char * to make them future proof in case a standard extension encodes these differently. Cheers, Mark