From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-0010f301.pphosted.com (mx0a-0010f301.pphosted.com [148.163.149.254]) by sourceware.org (Postfix) with ESMTPS id 1CCED3857C6B for ; Wed, 15 Sep 2021 18:25:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1CCED3857C6B Received: from pps.filterd (m0102856.ppops.net [127.0.0.1]) by mx0b-0010f301.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 18F9QdaU029270; Wed, 15 Sep 2021 13:25:25 -0500 Received: from mx3.mail.rice.edu (smtp2.mail.rice.edu [128.42.201.101]) by mx0b-0010f301.pphosted.com with ESMTP id 3b35gh8yjx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 15 Sep 2021 13:25:25 -0500 Received: from mx3.mail.rice.edu (localhost [127.0.0.1]) by mx3.mail.rice.edu (Postfix) with ESMTP id 80CA142EC6A; Wed, 15 Sep 2021 13:25:24 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mx3.mail.rice.edu (Postfix) with ESMTP id 7FC4D42EC62; Wed, 15 Sep 2021 13:25:24 -0500 (CDT) X-Virus-Scanned: by amavis-2.12.1 at mx3.mail.rice.edu, auth channel Received: from mx3.mail.rice.edu ([127.0.0.1]) by localhost (mx3.mail.rice.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id Dol-DvUrAAx7; Wed, 15 Sep 2021 13:25:23 -0500 (CDT) Received: from [192.168.50.202] (c-98-200-175-18.hsd1.tx.comcast.net [98.200.175.18]) (using TLSv1.2 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: johnmc@rice.edu) by mx3.mail.rice.edu (Postfix) with ESMTPSA id 2DDFD216C7B; Wed, 15 Sep 2021 13:25:23 -0500 (CDT) From: John Mellor-Crummey Message-Id: <413A462A-D548-452E-9323-5193339C288A@rice.edu> Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: [PATCH] (revised) read inlining info in an NVIDIA extended line map (was: Extension ...) Date: Wed, 15 Sep 2021 13:25:22 -0500 In-Reply-To: Cc: John Mellor-Crummey , elfutils-devel@sourceware.org, Xiaozhu Meng To: Mark Wielaard References: <7C166312-4876-444F-9B85-C7E30C8F4959@rice.edu> <37B899F3-FD5F-4F7B-9F1B-BE63CF84F554@rice.edu> X-Mailer: Apple Mail (2.3608.120.23.2.4) X-Proofpoint-GUID: FcoaaSRwL2VhtDfbdGlqjBhkburlMRKU X-Proofpoint-ORIG-GUID: FcoaaSRwL2VhtDfbdGlqjBhkburlMRKU X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.391,FMLib:17.0.607.475 definitions=2021-09-15_05,2021-09-15_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1011 bulkscore=0 lowpriorityscore=0 phishscore=0 spamscore=0 priorityscore=1501 mlxscore=0 impostorscore=0 malwarescore=0 mlxlogscore=999 adultscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109030001 definitions=main-2109150106 X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: elfutils-devel@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Elfutils-devel mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2021 18:25:38 -0000 Mark, Thanks for your feedback. I am working on a new version of the patch = that changes the interface for dwarf_linecontext and dwarf_linefunctionname to return = a line pointer and a character pointer so it will be future proof. See my other comments below, e.g. about an idea for reworking the = Dwarf_line_s data structure. -- John Mellor-Crummey Professor Dept of Computer Science Rice University email: johnmc@rice.edu phone: 713-348-5179 > On Sep 10, 2021, at 12:11 PM, Mark Wielaard wrote: >=20 > Hi John, >=20 > On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils- > devel wrote: >> My previous patch submission seems to have been overlooked as >> buildbot issues consumed several days this week. However, discussion >> in the mailing list now seems to have moved on beyond my submission >> and I would like my patch considered. Here, I echo my previous >> submission, except I improved my submission by including the prefix >> [PATCH] in the subject and I included the patch in the message body >> rather than as an attachment. >=20 > Sorry for the buildbot noise, that was indeed a little painful. But a > buildbot that randomly fails is not much fun, so it took highest > priority to get it back to green. >=20 > Your submission is really nice, having extensive documentation, all > features implemented, a testcase. Well done. >=20 > There are however some concerns outside your control. It is somewhat > disappointing NVIDIA didn't document this themselves, or tried to > standardize this. You seems to have a good grasp of what was intended > though. We have to be careful not to extend the api in a way that = makes > a better standard/implementation impossible. And the way we = implemented > Dwarf_Lines isn't ideal, so this extension shows we should maybe = change > it to be more efficient/compact. But maybe we can do that after adding > the extension, we should however have a plan. >=20 >> Regarding the DWARF proposal by Cary Coutant for two-level linemaps: >> I now believe that NVIDIA=E2=80=99s implementation is consistent with = that >> proposal although NVIDIA uses a DWARF extended opcode for inlined >> calls whereas Cary=E2=80=99s proposal uses DW_LNS_inlined_call (a = standard >> opcode), NVIDIA=E2=80=99s implementation uses DW_LNE_inlined_call (an >> extended opcode). >=20 > The naming is one of the concerns. It would be better to use a name > like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to > show they are vendor extensions and don't clash with possible future > standard opcode names. I renamed the two new DWARF extended opcodes as you suggested. > That it mimics the two-level linemaps proposal is a good thing. But > lets make sure that the new accessor functions, dwarf_linecontext and > dwarf_linefunctionname, are generic enough that they can hopefully be > reused when two-level linemaps or a similar proposal becomes a > standard. >> A note about the source code for the test case reading the extended >> linemap entries using libdw: this code was copied from another test >> that used dwarf_next_lines and extended with code that reads the new >> context and functionname fields of a line table entry. >=20 > Thanks for the test case, it makes clear how the new functionality can > be used. How was the test binary, testfile_nvidia_linemap, generated? > That should probably be documented inside the testcase. I documented how the NVIDIA binary used in the two test cases was = created by adding comments to the two test cases. > I won't be able to review all the code right now, but here are some > high-level comments, so you know what I am thinking. >=20 > On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey >>> wrote: >>>=20 >>> As of CUDA 11.2, NVIDIA added extensions to the line map section >>> of CUDA binaries to represent inlined functions. These extensions >>> include >>>=20 >>> - two new fields in a line table row to represent inline=20 >>> information: context, and functionname, >=20 > We didn't design the Dwarf_Line_s very well/compact. We already have > the op_index, isa and discriminator fields which are almost never = used. > This adds two more. I wonder if we can split the struct up so that the > extra fields are only used when actually used. > Maybe this can be done with a flag in Dwarf_Lines_s indicating whether > the array contains only the basic line attributes or also some = extended > values. Of course this makes dwarf_getsrclines even more complex > because it then has to expand the line state struct whenever it first > sees an extended attribute. But I really like to see if we can use = less > memory here. If we agree on some way to make that possible we can > implement it afterwards. Here are my thoughts about that.=20 I think the type for Dwarf_Lines should be opaque rather than simply a pointer to a Dwarf_Line_s record. The dwarf_onesrcline or routine requires constant time random access to line records for efficiency. Here is how I think that could be preserved. Let the Dwarf_Line_s structure be a union type: a small record with a = bit or two that indicates the record kind, e.g. enum dwarf_line_type { dwarf_line_compact, dwarf_line_nvidia_inline, = dwarf_line_extended2, =E2=80=A6}.=20 All the fields for the compact record would be in one structure in the = union. Other structures in the union would just contain a pointer to an out-of-band extended record that is = stored elsewhere. The storage could be managed as follows: Have a buffer to store the lines. A sequence of compact Dwarf_Line_s = records begin at the front of the buffer. Any time an extended record is = needed, allocate it at the end of the buffer before the previous = extended record. An extended record will have both a This could = accommodate extended records of various lengths. Allocations in the = buffer would manage the fact that compact line records are allocated at = the front and the smaller number of larger extended line records are = allocated at the back. When the the next allocation would cause the = cursors to cross in the middle, the buffer is full and another buffer = is needed. The Dwarf_Lines type could have a pointer to a next = Dwarf_Lines structure. If there is concern that finding a line would not = really be constant time if there was a huge number of lines for a file = line lookup could require following an unbounded chain of pointers to a = next buffer, then the buffers could be managed in a splay tree or a skip = list, which would give O(log n) time lookup of a line buffer followed by = a constant time indexing into the buffer. >=20 >>> - two new DWARF extended opcodes: DW_LNE_inlined_call,=20 >>> DW_LNE_set_function_name, >=20 > Like I said above I think these names should contain the "vendor" name > DW_LNE_NVIDIA_... Done. >=20 >>> - an additional word in the line table header that indicates=20 >>> the offset in the .debug_str function where the function=20 >>> names for this line table begin, and >=20 > This is the only part I think is somewhat tricky. I believe you > implement it cleverly by checking the header_length. That strategy came from NVIDIA=E2=80=99s implementation in cuda-gdb . = They check when there is a gap between the end of the standard header and the header length. > And your > implementation should work, but it is also the part that no other tool > understands, which means any such binary cannot really be handled by > any other tool. And I don't really understand how this works when > linking objects with such a offset into .debug_str. Normally the > .debug_str section contains strings that can be merged, but that would > disrupt the order, so I don't fully understand how the function_name > indexes are kept correct. This would have been nicer as a DWARF5 > extension, where there is already a .debug_str_offsets section (but > also confusing because there is also a special .debug_line_str section > where these strings should probably point at). >=20 >>> - two new functions in the libdw API: dwarf_linecontext and=20 >>> dwarf_linefunctionname, which return the new line table fields. >=20 > I think it would be nice if we could make these functions return a > Dwarf_Line * and a const char * to make them future proof in case a > standard extension encodes these differently. I will work on a patch that does just this as soon as I have time.