From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-1539-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 29257 invoked by alias); 10 Aug 2009 09:04:27 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 28412 invoked by uid 22791); 10 Aug 2009 09:04:23 -0000
X-SWARE-Spam-Status: No, hits=-2.2 required=5.0
	tests=AWL,BAYES_00,SPF_HELO_PASS,SPF_PASS
X-Spam-Check-By: sourceware.org
Message-ID: <4A7FE28D.4050901@redhat.com>
Date: Mon, 10 Aug 2009 09:04:00 -0000
From: Dodji Seketeli <dodji@redhat.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11 Thunderbird/3.0b2
MIME-Version: 1.0
To: GDB/Archer list <archer@sourceware.org>
Subject: [RFC] Proposal for a new DWARF name index section
Content-Type: multipart/mixed;
 boundary="------------050509040509020703080604"
X-SW-Source: 2009-q3/txt/msg00105.txt.bz2

This is a multi-part message in MIME format.
--------------050509040509020703080604
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-length: 494

Hello,

Tom Tromey and I have been thinking about ways to help GDB achieve
faster name lookups by loading as much debug information as possible
in a lazy manner.

The DWARF 3 and 4 specifications address that need by means of the
.debug_pubnames and .debug_pubtypes sections. However, we believe that the
content of these sections falls short in several ways.

The attached (simple) proposal is an attempt to address those issues.

Comments?

Thanks.

-- 
Dodji Seketeli
Red Hat

--------------050509040509020703080604
Content-Type: text/plain;
 name="debug-gnu-index.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="debug-gnu-index.txt"
Content-length: 5925

I) Introduction

.debug_pubnames is an ELF section that contains the names of objects and
functions. Each name is associated with the debug information of the
corresponding object or function. That debug information is located in full
in the .debug_info section.

As such, the .debug_pubnames section is an index whose main purpose is to
allow the retrieval of debug information for objects and functions without
having to scan large numbers of object files.

Likewise, the .debug_pubtypes section acts as an index for type names.

II) Problem


In practice, for performance reasons, there are cases where we need to know
the kind of entity a given name relates to, without having to actually load
the debug information associated with that name.

E.g., the qualified name x::y::z could represent either an object or a
function. Suppose it represents an object and the user (wrongly) types
"break x::y::z" into her debugger. She cannot break on x::y::z because that
name does not designate a function, so the debugger ought to issue an error
message without even having to load the debug information associated with the
looked-up symbol. Just looking at the index should be enough.
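
As a rough illustration of the check described above, the following Python
sketch assumes a hypothetical in-memory index that maps each qualified name
to the set of DWARF tags recorded for it. The names `index` and
`can_break_on` are made up for this example and are not part of GDB; only the
tag values come from the DWARF specification.

```python
# DWARF tag values as defined by the DWARF specification.
DW_TAG_subprogram = 0x2e
DW_TAG_variable = 0x34

# Hypothetical index contents: qualified name -> kinds recorded for it.
index = {
    "x::y::z": {DW_TAG_variable},
    "x::y::f": {DW_TAG_subprogram},
}


def can_break_on(name):
    """Return True if the index alone shows that `name` is a function."""
    return DW_TAG_subprogram in index.get(name, set())


print(can_break_on("x::y::z"))  # False: an object, so report an error
print(can_break_on("x::y::f"))  # True: a function, so load its debug info
```

With such a table, "break x::y::z" can be rejected without touching
.debug_info at all.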

DWARF3 (and DWARF4) have several deficiencies in their support for
indexing.  Some of these are design problems that cannot be fixed
given the current format:

* There is no way to know whether a name references an enumerator,
  an object, or a function. This makes it hard for debuggers to
  implement lazy debug information loading schemes.

* Only public names are indexed.  However, historically GDB has
  allowed users to inspect and break on private objects as well,
  without specifying a scope.

* It is unclear from the standard whether enumerators should be listed
  in .debug_pubnames.

* The .debug_pubtypes section does not encode whether a name is a
  typedef or a struct, union, or enum tag.

* Compilers are not required to emit index entries for inlined
  functions which have no concrete out-of-line instance.  This implies
  that a command like "break function", if it is to work for an
  inlined function, must read all the .debug_info sections even if it
  turns out that no such function exists anywhere.

III) Proposal: An extended index section.


A possible way to address the issue at hand is to create a new GNU-specific
section called .debug_gnu_index. It would have a format similar to that of
the existing pubnames section and thus would be a table that contains sets
of variable-length entries describing the names of global objects,
enumerators, and functions whose definitions are represented by debugging
information entries owned by a single compilation unit.

III.1) Format of gnu_index section


Each set begins with a header containing four values that are identical
to the values contained in the pubnames set header. I have modified
the pubnames format specification from section 6.1.1 of DWARF3 as follows:

1. unit_length (initial length)
  The length of the entries for that set, not including the length field itself.

2. version (uhalf)
  A version number. This number is specific to the name look-up table and is
  independent of the DWARF version number.

3. debug_info_offset (section offset)
  The offset from the beginning of the .debug_info section of the compilation unit header
  referenced by the set.

4. debug_info_length (section length)
  The size in bytes of the contents of the .debug_info section generated to represent that
  compilation unit.
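
To make the header layout concrete, here is a minimal Python sketch of a
decoder for one set header. It assumes 32-bit DWARF and little-endian data,
so the initial length, section offset and section length each take 4 bytes
and the uhalf takes 2. The names `SetHeader` and `parse_set_header` are
hypothetical and chosen only for this example.

```python
import struct
from collections import namedtuple

SetHeader = namedtuple(
    "SetHeader",
    "unit_length version debug_info_offset debug_info_length")

# Assumption: 32-bit DWARF, little-endian, no padding between fields.
# Layout: 4-byte unit_length, 2-byte version, 4-byte debug_info_offset,
# 4-byte debug_info_length.
HEADER_FORMAT = "<IHII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 14 bytes


def parse_set_header(data, pos=0):
    """Decode one set header at `pos`; return (header, next_pos)."""
    fields = struct.unpack_from(HEADER_FORMAT, data, pos)
    return SetHeader(*fields), pos + HEADER_SIZE


def set_end(pos, header):
    """Offset just past the set that starts at `pos`.

    unit_length does not count the 4-byte length field itself.
    """
    return pos + 4 + header.unit_length
```

A reader would call parse_set_header at the start of each set and use
set_end to find where the next set begins.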

This header is followed by a variable number of offset/name/kind triplets.
The first element of each triplet is the section offset, from the beginning
of the compilation unit corresponding to the current set, of the debugging
information entry for the given object. The second element is a
null-terminated character string representing the name of the object as
given by the DW_AT_name attribute of the referenced debugging entry. The
last element is the kind of entity designated by the name element. This kind
element is encoded as a DWARF tag, as specified in figure 18 of section
7.5.4. E.g., for a name designating a function, the kind element would be
DW_TAG_subprogram. For an enumerator, the kind element would be
DW_TAG_enumerator. For a variable, the kind element would be
DW_TAG_variable, etc. Each set of names is terminated by an offset field
containing zero (and no following string).
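
The entry layout could be decoded along the following lines. This is only a
sketch under stated assumptions: 32-bit little-endian offsets, and a kind
element stored as an unsigned LEB128 value. The proposal does not fix the
width of the kind element, so that encoding is a guess made for the sake of
the example, and the function names are hypothetical.

```python
import struct

# DWARF tag values as defined by the DWARF specification.
DW_TAG_enumerator = 0x28
DW_TAG_subprogram = 0x2e
DW_TAG_variable = 0x34


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7f) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_entries(data, pos=0):
    """Yield (offset, name, kind) triplets up to the zero-offset terminator."""
    while True:
        (offset,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if offset == 0:
            return  # terminator: nothing follows the zero offset
        end = data.index(b"\x00", pos)
        name = data[pos:end].decode("utf-8")
        pos = end + 1
        # Assumption: the kind is a ULEB128-encoded DWARF tag.
        kind, pos = read_uleb128(data, pos)
        yield offset, name, kind


# Hand-built sample data: two entries followed by the terminator.
raw = (struct.pack("<I", 0x2d) + b"main\x00" + bytes([DW_TAG_subprogram])
       + struct.pack("<I", 0x51) + b"counter\x00" + bytes([DW_TAG_variable])
       + struct.pack("<I", 0))
entries = list(parse_entries(raw))
```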

A name may appear multiple times in the index, if it has multiple
definitions.  (This can be used to specify the points at which an
inlined function appears.)
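
Because a name may occur more than once, a consumer would most likely keep a
list of entries per name rather than a single one. The Python sketch below
shows one way to do that; `build_lookup` is a hypothetical helper, and the
sample offsets and the choice of DW_TAG_subprogram for both entries are
invented for illustration.

```python
from collections import defaultdict

DW_TAG_subprogram = 0x2e  # value from the DWARF specification


def build_lookup(entries):
    """Group (offset, name, kind) triplets by name, keeping every entry."""
    table = defaultdict(list)
    for offset, name, kind in entries:
        table[name].append((offset, kind))
    return dict(table)


# Invented sample: the same name listed twice at different offsets.
sample = [
    (0x40, "helper", DW_TAG_subprogram),
    (0x9c, "helper", DW_TAG_subprogram),
    (0xd0, "main", DW_TAG_subprogram),
]
lookup = build_lookup(sample)
```

Looking up "helper" then yields both offsets, so a debugger can visit every
definition of the name.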

We don't presently see the need for the section to encode whether a
given object is public or private.

The debug_gnu_index section must either be complete, or not exist.  A
compiler must emit all "global" names, according to rules appropriate
to each CU's language, into this section.  E.g., for C this would mean
type tags, typedefs, enum constants, global variables, and functions.
All instances of inlined functions must be mentioned, if such
instances are mentioned in the .debug_info section.

IV) Conclusion

This small extension allows interested debuggers to speed up debug information
loading by implementing lazy loading schemes without breaking existing
debuggers which rely on the existing .debug_pubnames section format.

On the other hand, it increases the size of debug information, as
.debug_pubnames and .debug_gnu_index become somewhat redundant.  We
propose that GCC simply stop emitting .debug_pubnames and
.debug_pubtypes, as experience has shown that they are not very
useful.  (In fact, on Linux GCC did not even generate .debug_pubtypes
until 2009, and no one ever complained.)

We believe the .debug_gnu_index format cannot be defined as a compatible
extension of the .debug_pubnames format, due to the deficiencies
cited above.  However, the problem might be fixable in DWARF5 by
bumping the relevant version numbers and defining a new format for
these sections.

--------------050509040509020703080604--