Message-ID: <4A7FE28D.4050901@redhat.com>
Date: Mon, 10 Aug 2009 09:04:00 -0000
From: Dodji Seketeli
To: GDB/Archer list
Subject: [RFC] Proposal for a new DWARF name index section

Hello,

Tom Tromey and I have been thinking about ways to help GDB achieve faster name lookups by loading as much debug information as possible lazily.

The DWARF 3 and 4 specifications address that need by means of the .debug_pubnames and .debug_pubtypes sections. However, we believe that the content of these sections falls short in several ways.

The attached (simple) proposal is an attempt to address those issues.

Comments?

Thanks.

-- 
Dodji Seketeli
Red Hat

Content-Disposition: attachment; filename="debug-gnu-index.txt"

I) Introduction

.debug_pubnames is an ELF section that contains the names of objects and functions. Each name is associated with the debug information of the corresponding object or function.
That debug information is located in full in the .debug_info section. As such, the .debug_pubnames section is an index whose main purpose is to allow the retrieval of the debug information of objects and functions without having to scan large numbers of object files. Likewise, the .debug_pubtypes section acts as an index for type names.

II) Problem

In practice, for performance reasons, there are cases where we need to know the kind of entity a given name relates to, without having to actually load the debug information associated with that name.

E.g., the qualified name x::y::z could represent either an object or a function. If it represents an object and the user (wrongly) types "break x::y::z" in her debugger (note that she cannot break on x::y::z because that name does not designate a function), the debugger ought to issue an error message without even having to load the debug information associated with the looked-up symbol. Just looking at the index should be enough.

DWARF3 (and DWARF4) have several deficiencies in their support for indexing. Some of these are design problems that cannot be fixed given the current format:

* There is no way to know whether a name references an enumerator, an object or a function. This makes it hard for debuggers to implement lazy debug information loading schemes.

* Only public names are indexed. However, historically GDB has allowed users to inspect and break on private objects as well, without specifying a scope.

* It is unclear from the standard whether enumerators should be listed in .debug_pubnames.

* The .debug_pubtypes section does not encode whether a name is a typedef or a struct, union, or enum tag.

* Compilers are not required to emit index entries for inlined functions which have no concrete out-of-line instance. This implies that a command like "break function", if it is to work for an inlined function, must read all the .debug_info sections even if it turns out that no such function exists anywhere.
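
To make the "break x::y::z" example concrete, here is a minimal sketch of the check a debugger could perform if the index recorded the kind of entity behind each name. It is only an illustration: the in-memory INDEX table and the check_breakpoint function are hypothetical and are not part of GDB or of any existing section format. Only the DW_TAG_* values come from the DWARF specification.

```python
# DWARF tag values, as defined by the DWARF specification.
DW_TAG_enumerator = 0x28
DW_TAG_subprogram = 0x2E
DW_TAG_variable = 0x34

# Hypothetical in-memory index: qualified name -> kinds of the entities
# defined under that name. A name may have several definitions.
INDEX = {
    "x::y::z": [DW_TAG_variable],
    "x::y::f": [DW_TAG_subprogram],
}


def check_breakpoint(name, index=INDEX):
    """Decide from the index alone whether 'break NAME' can succeed.

    Returns an error message, or None if NAME designates a function and
    its debug information is worth loading.
    """
    kinds = index.get(name)
    if kinds is None:
        return f'No symbol "{name}" in the index.'
    if DW_TAG_subprogram not in kinds:
        return f'"{name}" is not a function.'
    return None  # only now would the debugger read .debug_info
```

With the current .debug_pubnames format the second test cannot be written, because the index carries no kind information at all.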
III) Proposal: An extended index section.

A possible way to address the issue at hand is to create a new GNU-specific section called .debug_gnu_index. It would have a format similar to that of the existing pubnames section and thus would be a table containing sets of variable-length entries describing the names of global objects, enumerators, and functions whose definitions are represented by debugging information entries owned by a single compilation unit.

III.1) Format of gnu_index section

Each set begins with a header containing four values that are identical to the values contained in the pubnames set header. I have modified the pubnames format specification from section 6.1.1 of DWARF3 as follows:

1. unit_length (initial length)
   The length of the entries for that set, not including the length field itself.

2. version (uhalf)
   A version number. This number is specific to the name look-up table and is independent of the DWARF version number.

3. debug_info_offset (section offset)
   The offset from the beginning of the .debug_info section of the compilation unit header referenced by the set.

4. debug_info_length (section length)
   The size in bytes of the contents of the .debug_info section generated to represent that compilation unit.

This header is followed by a variable number of offset/name/kind triplets. Each triplet consists of the offset from the beginning of the compilation unit (corresponding to the current set) to the debugging information entry for the given object, followed by a null-terminated character string representing the name of the object as given by the DW_AT_name attribute of the referenced debugging entry. Each set of names is terminated by an offset field containing zero (and no following string).

The last element of the triplet is the kind of entity designated by the triplet's name element. This kind element is encoded as a DWARF tag, as specified in figure 18 of section 7.5.4.
E.g., for a name designating a function, the kind element would be DW_TAG_subprogram. For an enumerator, the kind element would be DW_TAG_enumerator. For a variable, the kind element would be DW_TAG_variable, etc.

A name may appear multiple times in the index if it has multiple definitions. (This can be used to specify the points at which an inlined function appears.)

We don't presently see the need for the section to encode whether a given object is public or private.

The .debug_gnu_index section must either be complete or not exist. A compiler must emit all "global" names, according to rules appropriate to each CU's language, into this section. E.g., for C this would mean type tags, typedefs, enum constants, global variables, and functions. All instances of inlined functions must be mentioned, if such instances are mentioned in the .debug_info section.

IV) Conclusion

This small extension allows interested debuggers to speed up debug information loading by implementing lazy loading schemes, without breaking existing debuggers that rely on the existing .debug_pubnames section format.

On the other hand, it increases the size of debug information, as .debug_pubnames and .debug_gnu_index become somewhat redundant. We propose that GCC simply stop emitting .debug_pubnames and .debug_pubtypes, as experience has shown that they are not very useful. (In fact, on Linux GCC did not even generate .debug_pubtypes until 2009, and no one ever complained.)

We believe the .debug_gnu_index format cannot be made a compatible extension of the .debug_pubnames format, due to the deficiencies cited above. However, the problem might be fixable in DWARF5 by bumping the relevant version numbers and defining a new format for these sections.
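
As an illustration of the set layout described in III.1, the following is a rough sketch of a reader for one set of .debug_gnu_index. Several details are assumptions, because the proposal does not fix them: the data is taken to be little-endian and in the 32-bit DWARF format (4-byte lengths and offsets), the kind is taken to be stored as an unsigned LEB128 value right after the name, and the zero terminator is taken to have no kind after it. The names parse_set and read_uleb128 are hypothetical.

```python
import struct

HEADER = "<IHII"  # unit_length, version, debug_info_offset, debug_info_length


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_set(data, pos=0):
    """Parse one set starting at POS.

    Assumptions (not stated in the proposal): little-endian, 32-bit DWARF,
    kind encoded as ULEB128 after the name, no kind after the terminator.
    Returns (header, entries, position of the next set).
    """
    unit_length, version, info_offset, info_length = struct.unpack_from(
        HEADER, data, pos)
    end = pos + 4 + unit_length  # unit_length excludes its own 4 bytes
    pos += struct.calcsize(HEADER)
    entries = []
    while pos < end:
        (die_offset,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if die_offset == 0:  # a zero offset terminates the set
            break
        nul = data.index(b"\0", pos)  # name is null-terminated
        name = data[pos:nul].decode("utf-8")
        kind, pos = read_uleb128(data, nul + 1)
        entries.append((die_offset, name, kind))
    header = {
        "unit_length": unit_length,
        "version": version,
        "debug_info_offset": info_offset,
        "debug_info_length": info_length,
    }
    return header, entries, end
```

A reader built this way never touches .debug_info: each entry already says where the DIE lives within its compilation unit and what kind of entity it is, so loading can be deferred until a name is actually used.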