From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-0010f301.pphosted.com (mx0b-0010f301.pphosted.com [148.163.153.244]) by sourceware.org (Postfix) with ESMTPS id B6EFA3858036 for ; Thu, 4 Nov 2021 21:42:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B6EFA3858036 Received: from pps.filterd (m0102858.ppops.net [127.0.0.1]) by mx0b-0010f301.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 1A4J6efh021115 for ; Thu, 4 Nov 2021 16:42:01 -0500 Received: from mx2.mail.rice.edu (smtp1.mail.rice.edu [128.42.199.100]) by mx0b-0010f301.pphosted.com (PPS) with ESMTPS id 3c4aeks4hf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 04 Nov 2021 16:42:01 -0500 Received: from mx2.mail.rice.edu (localhost [127.0.0.1]) by mx2.mail.rice.edu (Postfix) with ESMTP id 9670E41568E for ; Thu, 4 Nov 2021 16:42:00 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mx2.mail.rice.edu (Postfix) with ESMTP id 943E0415D4A; Thu, 4 Nov 2021 16:42:00 -0500 (CDT) X-Virus-Scanned: by amavis-2.12.1 at mx2.mail.rice.edu, auth channel Received: from mx2.mail.rice.edu ([127.0.0.1]) by localhost (mx2.mail.rice.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id lBs6xIW3-d4S; Thu, 4 Nov 2021 16:41:59 -0500 (CDT) Received: from jmc16.dyndns.rice.edu (jmc16.dyndns.rice.edu [168.5.18.183]) (using TLSv1.2 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: johnmc@rice.edu) by mx2.mail.rice.edu (Postfix) with ESMTPSA id 306AA209C21; Thu, 4 Nov 2021 16:41:59 -0500 (CDT) From: John Mellor-Crummey Message-Id: Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: [PATCH] (v2) read inlining info in an NVIDIA extended line map Date: Thu, 4 Nov 2021 16:41:58 -0500 In-Reply-To: <413A462A-D548-452E-9323-5193339C288A@rice.edu> Cc: John Mellor-Crummey , Xiaozhu Meng , Jonathon Anderson To: elfutils-devel@sourceware.org References: <7C166312-4876-444F-9B85-C7E30C8F4959@rice.edu> <37B899F3-FD5F-4F7B-9F1B-BE63CF84F554@rice.edu> <413A462A-D548-452E-9323-5193339C288A@rice.edu> X-Mailer: Apple Mail (2.3608.120.23.2.4) X-Proofpoint-ORIG-GUID: z0tuib4nY5ZzwvZy04oU1yWjuszfk0f3 X-Proofpoint-GUID: z0tuib4nY5ZzwvZy04oU1yWjuszfk0f3 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.0.607.475 definitions=2021-11-04_07,2021-11-03_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 priorityscore=1501 mlxscore=0 bulkscore=0 lowpriorityscore=0 spamscore=0 suspectscore=0 clxscore=1011 malwarescore=0 adultscore=0 mlxlogscore=999 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2110150000 definitions=main-2111040083 X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: elfutils-devel@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Elfutils-devel mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Nov 2021 21:42:08 -0000 [We would really like this patch in the forthcoming release] Attached is a new version of the patch for reading inlining information = encoded in an enhanced line map format used in NVIDIA GPU binaries for = CUDA 11.2+. This is an updated version of a patch first submitted on Sept. 5. A copy = of the original submission email is quoted below this note.=20 Here I describe just the improvements to that patch that address = Mark=E2=80=99s concerns: (1) all of the code for handling NVIDIA DWARF extensions is always = available; there is no special configuration switch needed. (2) all changes are bracketed by comments that mark them NVIDIA = extensions (3) the DWARF extended opcodes have been renamed with names that include = NVIDIA in them (4) the two new API functions to surface the new information have been = improved to separate the interface result from the internal = representation (at Mark=E2=80=99s request) (4a) the API for extracting the name of an inlined function in a = DWARF line now returns a const char * instead of a string table index (4b) the API for extracting an inline =E2=80=9Ccontext=E2=80=9D = now returns a pointer to a DWARF line where the code is inlined rather = than returning an unsigned int (an index into the line table that one = could use to compute the pointer) (5) there are test cases for readelf and libdw that use a binary = generated by NVIDIA=E2=80=99s compiler. the test cases include = information about how the binary was generated -- John Mellor-Crummey Professor Dept of Computer Science Rice University email: johnmc@rice.edu phone: 713-348-5179 Description of the first version of the patch:=20 > On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey > wrote: >=20 > As of CUDA 11.2, NVIDIA added extensions to the line map section > of CUDA binaries to represent inlined functions. These extensions > include >=20 > - two new fields in a line table row to represent inline=20 > information: context, and functionname, >=20 > - two new DWARF extended opcodes: DW_LNE_inlined_call,=20 > DW_LNE_set_function_name, >=20 > - an additional word in the line table header that indicates=20 > the offset in the .debug_str function where the function=20 > names for this line table begin, and >=20 > - two new functions in the libdw API: dwarf_linecontext and=20 > dwarf_linefunctionname, which return the new line table fields. >=20 > A line table row for an inlined function contains a non-zero > "context" value. The =E2=80=9Ccontext=E2=80=9D field indicates the = index of the > line table row that serves as the call site for an inlined > context. >=20 > The "functionname" field in a line table row is only meaningful > if the "context" field of the row is non-zero. A meaningful > "functionname" field contains an index into the .debug_str > section relative to the base offset established in the line table > header; the position in the .debug_str section indicates the name > of the inlined function. >=20 > These extensions resemble the proposed DWARF extensions > (http://dwarfstd.org/ShowIssue.php?issue=3D140906.1 = ) by Cary > Coutant, but are not identical. >=20 > This patch adds integrates support for handling NVIDIA's extended > line maps into elfutil's libdw library and the readelf command > line utility. >=20 > Since this support is a non-standard extension to DWARF, all code > that implements the extensions is implemented between markers =20 > /* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */ and=20 > /* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */. >=20 > The definition below >=20 > #define NVIDIA_LINEMAP_INLINING_EXTENSIONS 1 >=20 > is added to elfutils/version.h, which enables a client of elfutils=20 > to test whether the NVIDIA line map extensions are present.=20 >=20 > Note: The support for NVIDIA extended line maps adds two integer > fields (context and functionname) to struct Dwarf_Line_s, which > makes the structure about 30% larger. >=20 > The patch includes a binary testfile_nvidia_linemap.bz2 that > contains an NVIDIA extended linemap along with two tests that > read the line map. >=20 > - A test script run-nvidia-extended-linemap-readelf.sh=20 > checks the output of readelf on the new test binary to=20 > validate its dump of the line op codes containing context=20 > and functionname entries. >=20 > - A test program tests/nvidia_extended_linemap_libdw.c reads=20 > the extended line map with dwarf_next_lines and dumps the=20 > context and functionname fields of the line map where they=20 > are relevant, i.e. the value of context is non-zero. A test=20 > script run-nvidia-extended-linemap-libdw.sh runs this test=20 > and validates its output. >=20 > A patch with the new functionality described above is attached. Discussion about the first version of the patch: > On Sep 15, 2021, at 1:25 PM, John Mellor-Crummey = wrote: >=20 > Mark, >=20 > Thanks for your feedback. I am working on a new version of the patch = that changes > the interface for dwarf_linecontext and dwarf_linefunctionname to = return a line pointer and a character pointer so it will be future = proof. >=20 > See my other comments below, e.g. about an idea for reworking the = Dwarf_line_s data structure. > -- > John Mellor-Crummey Professor > Dept of Computer Science Rice University > email: johnmc@rice.edu phone: = 713-348-5179 >=20 >> On Sep 10, 2021, at 12:11 PM, Mark Wielaard > wrote: >>=20 >> Hi John, >>=20 >> On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils- >> devel wrote: >>> My previous patch submission seems to have been overlooked as >>> buildbot issues consumed several days this week. However, discussion >>> in the mailing list now seems to have moved on beyond my submission >>> and I would like my patch considered. Here, I echo my previous >>> submission, except I improved my submission by including the prefix >>> [PATCH] in the subject and I included the patch in the message body >>> rather than as an attachment. >>=20 >> Sorry for the buildbot noise, that was indeed a little painful. But a >> buildbot that randomly fails is not much fun, so it took highest >> priority to get it back to green. >>=20 >> Your submission is really nice, having extensive documentation, all >> features implemented, a testcase. Well done. >>=20 >> There are however some concerns outside your control. It is somewhat >> disappointing NVIDIA didn't document this themselves, or tried to >> standardize this. You seems to have a good grasp of what was intended >> though. We have to be careful not to extend the api in a way that = makes >> a better standard/implementation impossible. And the way we = implemented >> Dwarf_Lines isn't ideal, so this extension shows we should maybe = change >> it to be more efficient/compact. But maybe we can do that after = adding >> the extension, we should however have a plan. >>=20 >>> Regarding the DWARF proposal by Cary Coutant for two-level linemaps: >>> I now believe that NVIDIA=E2=80=99s implementation is consistent = with that >>> proposal although NVIDIA uses a DWARF extended opcode for inlined >>> calls whereas Cary=E2=80=99s proposal uses DW_LNS_inlined_call (a = standard >>> opcode), NVIDIA=E2=80=99s implementation uses DW_LNE_inlined_call = (an >>> extended opcode). >>=20 >> The naming is one of the concerns. It would be better to use a name >> like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name = to >> show they are vendor extensions and don't clash with possible future >> standard opcode names. >=20 > I renamed the two new DWARF extended opcodes as you suggested. >=20 >> That it mimics the two-level linemaps proposal is a good thing. But >> lets make sure that the new accessor functions, dwarf_linecontext and >> dwarf_linefunctionname, are generic enough that they can hopefully be >> reused when two-level linemaps or a similar proposal becomes a >> standard. >=20 >=20 >>> A note about the source code for the test case reading the extended >>> linemap entries using libdw: this code was copied from another test >>> that used dwarf_next_lines and extended with code that reads the new >>> context and functionname fields of a line table entry. >>=20 >> Thanks for the test case, it makes clear how the new functionality = can >> be used. How was the test binary, testfile_nvidia_linemap, generated? >> That should probably be documented inside the testcase. >=20 > I documented how the NVIDIA binary used in the two test cases was = created > by adding comments to the two test cases. >=20 >> I won't be able to review all the code right now, but here are some >> high-level comments, so you know what I am thinking. >>=20 >> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey > >>>> wrote: >>>>=20 >>>> As of CUDA 11.2, NVIDIA added extensions to the line map section >>>> of CUDA binaries to represent inlined functions. These extensions >>>> include >>>>=20 >>>> - two new fields in a line table row to represent inline=20 >>>> information: context, and functionname, >>=20 >> We didn't design the Dwarf_Line_s very well/compact. We already have >> the op_index, isa and discriminator fields which are almost never = used. >> This adds two more. I wonder if we can split the struct up so that = the >> extra fields are only used when actually used. >=20 >=20 >=20 >> Maybe this can be done with a flag in Dwarf_Lines_s indicating = whether >> the array contains only the basic line attributes or also some = extended >> values. Of course this makes dwarf_getsrclines even more complex >> because it then has to expand the line state struct whenever it first >> sees an extended attribute. But I really like to see if we can use = less >> memory here. If we agree on some way to make that possible we can >> implement it afterwards. >=20 > Here are my thoughts about that.=20 >=20 > I think the type for Dwarf_Lines should be opaque rather than simply > a pointer to a Dwarf_Line_s record. The dwarf_onesrcline or routine > requires constant time random access to line records for efficiency. > Here is how I think that could be preserved. >=20 > Let the Dwarf_Line_s structure be a union type: a small record with a = bit or two that indicates the record kind, e.g. > enum dwarf_line_type { dwarf_line_compact, dwarf_line_nvidia_inline, = dwarf_line_extended2, =E2=80=A6}.=20 > All the fields for the compact record would be in one structure in the = union. Other structures in the union > would just contain a pointer to an out-of-band extended record that is = stored elsewhere. The storage could be managed as follows: > Have a buffer to store the lines. A sequence of compact Dwarf_Line_s = records begin at the front of the buffer. Any time an extended record is = needed, allocate it at the end of the buffer before the previous = extended record. An extended record will have both a This could = accommodate extended records of various lengths. Allocations in the = buffer would manage the fact that compact line records are allocated at = the front and the smaller number of larger extended line records are = allocated at the back. When the the next allocation would cause the = cursors to cross in the middle, the buffer is full and another buffer = is needed. The Dwarf_Lines type could have a pointer to a next = Dwarf_Lines structure. If there is concern that finding a line would not = really be constant time if there was a huge number of lines for a file = line lookup could require following an unbounded chain of pointers to a = next buffer, then the buffers could be managed in a splay tree or a skip = list, which would give O(log n) time lookup of a line buffer followed by = a constant time indexing into the buffer. >=20 >>=20 >>>> - two new DWARF extended opcodes: DW_LNE_inlined_call,=20 >>>> DW_LNE_set_function_name, >>=20 >> Like I said above I think these names should contain the "vendor" = name >> DW_LNE_NVIDIA_... >=20 > Done. >=20 >> ... >=20 >>=20 >>>> - two new functions in the libdw API: dwarf_linecontext and=20 >>>> dwarf_linefunctionname, which return the new line table fields. >>=20 >> I think it would be nice if we could make these functions return a >> Dwarf_Line * and a const char * to make them future proof in case a = standard extension encodes these differently.