[RFC] Proposal for a new DWARF name index section

public inbox for archer@sourceware.org

* [RFC] Proposal for a new DWARF name index section
@ 2009-08-10  9:04 Dodji Seketeli
  2009-08-10 14:38 ` Jan Kratochvil
  2009-08-20 17:31 ` Dodji Seketeli
  0 siblings, 2 replies; 31+ messages in thread
From: Dodji Seketeli @ 2009-08-10  9:04 UTC (permalink / raw)
  To: GDB/Archer list

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

Hello,

Tom Tromey and I have been thinking about possible ways to help GDB achieve
faster name lookups by loading as much debug information as possible in a
lazy manner.

The DWARF 3 and 4 specifications address that need by means of the
.debug_pubnames and .debug_pubtypes sections. However, we believe that the
content of these sections falls short in several ways.

The attached (simple) proposal is an attempt to address those issues.

Comments ?

Thanks.

-- 
Dodji Seketeli
Red Hat

[-- Attachment #2: debug-gnu-index.txt --]
[-- Type: text/plain, Size: 5925 bytes --]

I) Introduction

.debug_pubnames is an ELF section that contains the names of objects and
functions. Each name is associated with the debug information of the
corresponding object or function. That debug information is located in full
in the .debug_info section.

As such, the .debug_pubnames section is an index whose main purpose is to
allow the retrieval of debug information for objects and functions without
having to scan a large number of object files.

Likewise, the .debug_pubtypes section acts as an index for type names.

II) Problem

In practice, for performance reasons, there are cases where we need to know
the kind of entity a given name relates to, without having to actually load
the debug information associated with that name.

E.g., the qualified name x::y::z could represent either an object or a
function. Suppose it represents an object and the user (wrongly) types
"break x::y::z" in her debugger. She cannot break in x::y::z because that
name does not designate a function, so the debugger ought to issue an error
message without even having to load the debug information associated with
the looked-up symbol. Just looking at the index should be enough.
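
As an illustration only (this is not part of the proposal), the check
described above could be sketched as follows. The names Kind, NameIndex and
check_break are hypothetical, and the index is assumed to have already been
loaded into a simple in-memory map from qualified name to entity kind.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical entity kinds; the proposal itself encodes them as DWARF tags.
enum class Kind { Function, Variable, Enumerator, Type };

// Hypothetical in-memory form of the index: qualified name -> kind.
// A multimap is used because a name may be listed more than once.
using NameIndex = std::multimap<std::string, Kind>;

enum class BreakCheck { Ok, NotAFunction, UnknownName };

// Decide whether "break NAME" can possibly succeed by consulting only the
// index, without loading any debug information.
BreakCheck check_break(const NameIndex &index, const std::string &name)
{
  auto range = index.equal_range(name);
  if (range.first == range.second)
    return BreakCheck::UnknownName;
  for (auto it = range.first; it != range.second; ++it)
    if (it->second == Kind::Function)
      return BreakCheck::Ok;
  // The name exists, but only as an object, enumerator or type.
  return BreakCheck::NotAFunction;
}
```

With such an index, "break x::y::z" on an object would be rejected as
NotAFunction after a single map lookup.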

DWARF3 (and DWARF4) have several deficiencies in their support for
indexing.  Some of these are design problems that cannot be fixed
given the current format:

* There is no way to know whether a name refers to an enumerator,
  an object or a function. This makes it hard for debuggers to
  implement lazy debug information loading schemes.

* Only public names are indexed.  However, historically GDB has
  allowed users to inspect and break on private objects as well,
  without specifying a scope.

* It is unclear from the standard whether enumerators should be listed
  in .debug_pubnames.

* The .debug_pubtypes section does not encode whether a name is a
  typedef or a struct, union, or enum tag.

* Compilers are not required to emit index entries for inlined
  functions which have no concrete out-of-line instance.  This implies
  that a command like "break function", if it is to work for an
  inlined function, must read all the .debug_info sections even if it
  turns out that no such function exists anywhere.
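
To make the last point concrete, here is a small hypothetical example (the
function names are invented). Whether the compiler really inlines the call
depends on the optimization level, so this is only a sketch of the situation
being described.

```cpp
#include <cassert>

// Small enough that an optimizing compiler will usually inline every call.
// Because it is static, the compiler may then emit no out-of-line copy.
static inline int twice(int x)
{
  return 2 * x;
}

// The only caller of twice(). If the call is inlined, twice() shows up in
// the debug information only as an inlined instance inside scale(), and it
// need not have any .debug_pubnames entry.
int scale(int x)
{
  return twice(x) + 1;
}
```

In that situation "break twice" can only be resolved by finding the inlined
instances, which is what forces a scan of .debug_info today.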

III) Proposal: An extended index section.

A possible way to address the issue at hand is to create a new GNU-specific
section called .debug_gnu_index. It would have a format similar to that of
the existing pubnames section and thus would be a table that contains sets
of variable-length entries describing the names of global objects,
enumerators, and functions, whose definitions are represented by debugging
information entries owned by a single compilation unit.

III.1) Format of gnu_index section

Each set begins with a header containing four values that are identical
to the values contained in the pubnames set header. I have adapted
the pubnames format specification from section 6.1.1 of DWARF3 as follows:

1. unit_length (initial length)
  The length of the entries for that set, not including the length field itself.

2. version (uhalf)
  A version number. This number is specific to the name look-up table and is
  independent of the DWARF version number.

3. debug_info_offset (section offset)
  The offset from the beginning of the .debug_info section of the compilation unit header
  referenced by the set.

4. debug_info_length (section length)
  The size in bytes of the contents of the .debug_info section generated to represent that
  compilation unit.
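
As a rough sketch of how a consumer might read this header, the following
assumes the 32-bit DWARF format (4-byte initial length and section offsets)
and little-endian data. SetHeader, read_le and parse_set_header are
hypothetical names, not part of the proposal or of any existing tool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one set header (32-bit DWARF assumed).
struct SetHeader {
  std::uint32_t unit_length;        // length of the set, excluding this field
  std::uint16_t version;            // version of the index format
  std::uint32_t debug_info_offset;  // offset of the CU header in .debug_info
  std::uint32_t debug_info_length;  // size of that CU's .debug_info content
};

// Read an n-byte (n <= 4) little-endian unsigned integer starting at pos.
static std::uint32_t read_le(const std::vector<std::uint8_t> &buf,
                             std::size_t pos, std::size_t n)
{
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < n; ++i)
    value |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
  return value;
}

// Size in bytes of the header under the assumptions above: 4 + 2 + 4 + 4.
const std::size_t kHeaderSize = 14;

// Parse the header at pos. Returns false if the buffer is too short or if
// the initial length uses the 64-bit escape or a reserved value, which this
// sketch does not handle.
bool parse_set_header(const std::vector<std::uint8_t> &buf, std::size_t pos,
                      SetHeader &out)
{
  if (pos > buf.size() || buf.size() - pos < kHeaderSize)
    return false;
  out.unit_length = read_le(buf, pos, 4);
  if (out.unit_length >= 0xfffffff0u)
    return false;
  out.version = static_cast<std::uint16_t>(read_le(buf, pos + 4, 2));
  out.debug_info_offset = read_le(buf, pos + 6, 4);
  out.debug_info_length = read_le(buf, pos + 10, 4);
  return true;
}
```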

This header is followed by a variable number of offset/name/kind triplets.
The first element of each triplet is the section offset from the beginning
of the compilation unit (corresponding to the current set) to the debugging
information entry for the given object. The second element is a
null-terminated character string representing the name of the object as
given by the DW_AT_name attribute of the referenced debugging entry. The
last element is the kind of entity designated by the name. This kind
element is encoded as a DWARF tag, as specified in figure 18 of section
7.5.4. E.g., for a name designating a function, the kind element would be
DW_TAG_subprogram. For an enumerator, it would be DW_TAG_enumerator. For a
variable, it would be DW_TAG_variable, etc. Each set of names is terminated
by an offset field containing zero (and no following name or kind).
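
The entry layout could be decoded along the following lines. This is only a
sketch under stated assumptions: 4-byte little-endian offsets (32-bit DWARF),
and the kind stored as a ULEB128 value right after the name. The proposal
does not fix the size of the kind element, so the ULEB128 choice is an
assumption, and IndexEntry and parse_entries are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct IndexEntry {
  std::uint32_t die_offset;  // offset of the DIE from the start of its CU
  std::string name;          // DW_AT_name of that DIE
  std::uint64_t kind;        // DWARF tag, e.g. 0x2e for DW_TAG_subprogram
};

// Parse the triplets of one set, starting just after the set header.
// Stops at the zero offset terminator; returns false on truncated input.
bool parse_entries(const std::vector<std::uint8_t> &buf, std::size_t pos,
                   std::vector<IndexEntry> &out)
{
  for (;;) {
    if (pos > buf.size() || buf.size() - pos < 4)
      return false;
    std::uint32_t offset = 0;
    for (int i = 0; i < 4; ++i)
      offset |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
    pos += 4;
    if (offset == 0)
      return true;  // terminator: nothing follows it in this set

    IndexEntry entry;
    entry.die_offset = offset;
    while (pos < buf.size() && buf[pos] != 0)
      entry.name.push_back(static_cast<char>(buf[pos++]));
    if (pos == buf.size())
      return false;  // name is missing its terminating NUL
    ++pos;           // skip the NUL

    // Decode the kind as ULEB128 (assumed encoding, see above).
    entry.kind = 0;
    unsigned shift = 0;
    for (;;) {
      if (pos == buf.size() || shift >= 64)
        return false;
      std::uint8_t byte = buf[pos++];
      entry.kind |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        break;
    }
    out.push_back(entry);
  }
}
```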

A name may appear multiple times in the index, if it has multiple
definitions.  (This can be used to specify the points at which an
inlined function appears.)
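
For instance, a consumer could keep the entries in a multimap so that one
lookup yields every definition of a name. This is a hypothetical sketch;
Location, MultiIndex and all_definitions are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Location {
  std::uint32_t cu_offset;   // debug_info_offset of the owning set
  std::uint32_t die_offset;  // offset of the DIE within that CU
};

// Hypothetical in-memory index in which one name may map to many entries.
using MultiIndex = std::multimap<std::string, Location>;

// Collect every indexed definition of name, e.g. all the places where an
// inlined function appears, so that a breakpoint can be set at each one.
std::vector<Location> all_definitions(const MultiIndex &index,
                                      const std::string &name)
{
  std::vector<Location> result;
  auto range = index.equal_range(name);
  for (auto it = range.first; it != range.second; ++it)
    result.push_back(it->second);
  return result;
}
```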

We don't presently see the need for the section to encode whether a
given object is public or private.

The .debug_gnu_index section must either be complete, or not exist.  A
compiler must emit all "global" names, according to rules appropriate
to each CU's language, into this section.  E.g., for C this would mean
type tags, typedefs, enum constants, global variables, and functions.
All instances of inlined functions must be mentioned, if such
instances are mentioned in the .debug_info section.

IV) Conclusion

This small extension allows interested debuggers to speed up debug information
loading by implementing lazy loading schemes without breaking existing
debuggers which rely on the existing .debug_pubnames section format.

On the other hand, it increases the size of debug information, as
.debug_pubnames and .debug_gnu_index become somewhat redundant.  We
propose that GCC simply stop emitting .debug_pubnames and
.debug_pubtypes, as experience has shown that they are not very
useful.  (In fact, on Linux GCC did not even generate .debug_pubtypes
until 2009, and no one ever complained.)

We believe the .debug_gnu_index format cannot be reworked into a
backward-compatible addition to the .debug_pubnames format, due to the
deficiencies cited above.  However, the problem might be fixable in DWARF5
by bumping the relevant version numbers and defining a new format for
these sections.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-10  9:04 [RFC] Proposal for a new DWARF name index section Dodji Seketeli
@ 2009-08-10 14:38 ` Jan Kratochvil
  2009-08-10 17:36   ` Tom Tromey
  2009-08-20 17:31 ` Dodji Seketeli
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kratochvil @ 2009-08-10 14:38 UTC (permalink / raw)
  To: Dodji Seketeli; +Cc: GDB/Archer list

On Mon, 10 Aug 2009 11:04:13 +0200, Dodji Seketeli wrote:
> * Only public names are indexed.  However, historically GDB has
>   allowed users to inspect and break on private objects as well,
>   without specifying a scope.

I think this requirement should be discussed more.

Still, I think the goal is that expression evaluation in the debugger should
match expression evaluation in the compiler.

In practice the results are tricky due to the static symbols resolution:
$ echo 'main(){}' >libm.c;cc -o libm libm.c -lm -g;gdb -q -ex start -ex 'p b' ./libm
...
$1 = {i = {0, 1068498944}, d = 0.0625}
$ nm /lib64/libm.so.6 |grep ' b$'
0000003bb1e4bad8 r b


There could be an option to support a backward-compatible "slow" mode and to
use the (possibly DWARF) indexes only for the new compiler-compliant mode.


I am also including a C++ overloading example at the bottom, but it may be
off-topic.


> * It is unclear from the standard whether enumerators should be listed
>   in .debug_pubnames.
+
> * There is no way to know whether if a name references an enumerator,
>   and object or a function. This makes it hard for debuggers to
>   implement lazy debug information loading schemes.

A fix through http://dwarfstd.org/Issues.php looks appropriate in each case.


> * There is no way to know whether if a name references an enumerator,
>   and object or a function. This makes it hard for debuggers to
>   implement lazy debug information loading schemes.
+
> * The .debug_pubtypes section does not encode whether a name is a
>   typedef or a struct, union, or enum tag.

Are there any serious consequences?  An occasional needless read of a CU to
find out the type should not be a real performance hit.


> * Compilers are not required to emit index entries for inlined
>   functions which have no concrete out-of-line instance.  This implies
>   that a command like "break function", if it is to work for an
>   inlined function, must read all the .debug_info sections even if it
>   turns out that no such function exists anywhere.

Both
	http://dwarf.freestandards.org/Dwarf3.pdf
	http://www.dwarfstd.org/doc/DWARF4-draft3-090522.pdf
say
	C++ member functions with a definition in the class declaration are
	definitions in every compilation unit containing the class
	declaration, but if there is no concrete out-of-line instance there is
	no need to have a .debug_pubnames entry for the member function.

"no need to have" (and you say "are not required") so GCC is free to emit such
index entries.  Excessive index entries hopefully should not break debuggers.

GDB could check DW_AT_producer against known GCC versions to skip the slow
reading of '.debug_info's and rely just on '.debug_pubnames' - to find out all
the inlined instances of a specified function.
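
A producer check of that kind might look roughly like the sketch below. It
assumes GCC producer strings of the form "GNU C 4.4.1 20090725 (Red Hat
4.4.1-2)", i.e. "GNU", a language name, then a dotted version. The function
name and the version threshold passed to it are hypothetical; the thread does
not name any specific GCC version.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Return true if producer looks like a GCC DW_AT_producer string whose
// version is at least major.minor.
bool producer_is_gcc_at_least(const std::string &producer, int major, int minor)
{
  if (producer.compare(0, 4, "GNU ") != 0)
    return false;
  std::size_t pos = 4;
  // Walk the space-separated tokens; the first one that starts with a digit
  // is taken to be the version (language names such as "C++" do not).
  while (pos < producer.size()) {
    std::size_t end = producer.find(' ', pos);
    if (end == std::string::npos)
      end = producer.size();
    if (end > pos && std::isdigit(static_cast<unsigned char>(producer[pos]))) {
      int found_major = 0, found_minor = 0;
      std::size_t i = pos;
      while (i < end && std::isdigit(static_cast<unsigned char>(producer[i])))
        found_major = found_major * 10 + (producer[i++] - '0');
      if (i < end && producer[i] == '.') {
        ++i;
        while (i < end && std::isdigit(static_cast<unsigned char>(producer[i])))
          found_minor = found_minor * 10 + (producer[i++] - '0');
      }
      if (found_major != major)
        return found_major > major;
      return found_minor >= minor;
    }
    pos = end + 1;
  }
  return false;  // no version token found
}
```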



Regards,
Jan


C++ example:

(set -ex; g++ -o main main.C other.C -Wall -ggdb2; ./main)
gdb ./main
(gdb) start
(gdb) p &c
$1 = (C *) 0x7fffffffd3df
(gdb) call c.main()
main-main
(gdb) call C::main()
main-main
(gdb) call c.other()
Couldn't find method C::other
(gdb) call C::other()
other-other
	# Isn't it wrong here?  C::other() comes from a different CU.
(gdb) 

C++ classes with the same name should probably always be fully equivalent, as
all the methods are global ("W" nm symbol class) by default.  Thus my example
is probably an invalid C++ program.  Automatically limiting the C++ scope to
the current CU only would make it impossible to call methods that have
instances in other CUs and are just accidentally missing from the current CU
due to being unused.


==> main.C <==
#include <stdio.h>

class C
  {
  public:
    static void m (int x) { puts ("main-int"); }
//    static void m (long x) { puts ("main-long"); }
    static void main () { puts ("main-main"); }
  };

extern void other ();

int
main ()
{
  C c;

  c.m (1);
  c.m (1L);
  c.main ();

  other ();

  return 0;
}

==> other.C <==
#include <stdio.h>

class C
  {
  public:
//    static void m (int x) { puts ("other-int"); }
    static void m (long x) { puts ("other-long"); }
    static void other () { puts ("other-other"); }
  };

void
other ()
{
  C c;

  c.m (1);
  c.m (1L);
  c.other ();
}


* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-10 14:38 ` Jan Kratochvil
@ 2009-08-10 17:36   ` Tom Tromey
  2009-08-10 18:21     ` Jan Kratochvil
  0 siblings, 1 reply; 31+ messages in thread
From: Tom Tromey @ 2009-08-10 17:36 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Dodji Seketeli, GDB/Archer list

>>>>> "Jan" == Jan Kratochvil <jan.kratochvil@redhat.com> writes:

Dodji> * Only public names are indexed.  However, historically GDB has
Dodji> allowed users to inspect and break on private objects as well,
Dodji> without specifying a scope.

Jan> I think this requirement should be discussed more.

Jan> Still I find the goal is that the expression evaluation in debugger
Jan> should match the expression evaluation in compiler.

I agree with this principle, but the reason to include this information
in the index has to do with setting breakpoints, not with expression
evaluation.

I don't think breakpoint setting should necessarily follow language
rules.

It is not uncommon for a program to have a uniquely-named static
function.  It seems friendly to users to let them type "break func" in
any context.  And, it seems like this operation should not cause gdb to
go off and read all the debuginfo.

Anyway, that is my logic.  Which part of this do you disagree with?
Or, am I missing something else?

[...]
Jan> A fixup by http://dwarfstd.org/Issues.php looks as appropriate in
Jan> each case.

Yeah, for the minor things.  However...

Dodji> * Compilers are not required to emit index entries for inlined
Dodji> functions which have no concrete out-of-line instance.  This implies
Dodji> that a command like "break function", if it is to work for an
Dodji> inlined function, must read all the .debug_info sections even if it
Dodji> turns out that no such function exists anywhere.

... this is not minor.

And, if we agree about the private names problem, then that is not minor
either :)

Jan> 	C++ member functions with a definition in the class declaration are
Jan> 	definitions in every compilation unit containing the class
Jan> 	declaration, but if there is no concrete out-of-line instance there is
Jan> 	no need to have a .debug_pubnames entry for the member function.

Jan> "no need to have" (and you say "are not required") so GCC is free
Jan> to emit such index entries.  Excessive index entries hopefully
Jan> should not break debuggers.

Jan> GDB could check DW_AT_producer against known GCC versions to skip
Jan> the slow reading of '.debug_info's and rely just on
Jan> '.debug_pubnames' - to find out all the inlined instances of a
Jan> specified function.

That's gross, though.

There does not seem to be a big downside to introducing a new section
that does exactly what we want.  It is automatically backward
compatible.  It is (I believe) not difficult to implement.  And,
finally, we can make it reliable by fiat.

Tom


* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-10 17:36   ` Tom Tromey
@ 2009-08-10 18:21     ` Jan Kratochvil
  2009-08-11  7:55       ` Dodji Seketeli
  2009-08-11 22:29       ` Tom Tromey
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kratochvil @ 2009-08-10 18:21 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Dodji Seketeli, GDB/Archer list

On Mon, 10 Aug 2009 19:36:14 +0200, Tom Tromey wrote:
> but the reason to include this information in the index has to do with
> setting breakpoints, not with expression evaluation.
> 
> I don't think breakpoint setting should necessarily follow language
> rules.

OK, thanks for the clarification, forgot etc.

Still when thinking about it:
* I do not find symbol reading very slow myself (working _on_ the small GDB).
* People complaining it is slow usually use IDEs which use rather file:line
  based breakpoints, don't they?  (As it was discussed on RH IRC today.)
  = Assuming the C++ people do not put breakpoints on static out-of-scope
    functions by name.

For the latter case I agree a fix is needed but an index of static names will
not help with it.


> It is not uncommon for a program to have a uniquely-named static
> function.  It seems friendly to users to let them type "break func" in
> any context.

(One needs to think about same-name functions both static and global in
different files but sure it is unrelated to the new index.)


> Anyway, that is my logic.  Which part of this do you disagree with?
> Or, am I missing something else?

We have concluded the currently missing information is for:
* static functions (are they really needed for the file:line IDE usecases?)
* inlined functions which have no concrete out-of-line instance
  (the same file:line IDE usecase question)

IMO not for:
* static non-function symbols are deprecated (backward GDB compatibility only)


> There does not seem to be a big downside to introducing a new section
> that does exactly what we want.  It is automatically backward
> compatible.  It is (I believe) not difficult to implement.  And,
> finally, we can make it reliable by fiat.

While it is an improvement, with the existing .debug_pubnames,
.debug_pubtypes and .debug_aranges one can already:

* Look up everything in the current CU, which is fully read in from .debug_info.
* Always look up global symbols from other CUs through the DWARF indexes.
* Fall back to the full read-in only for:
  * static functions out of the language (compiler) scope
  * inlined functions which have no concrete out-of-line instance
  * references to a non-existing symbol
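
The lookup order above can be sketched as follows. This is a hypothetical
model, not GDB code: the three name sets stand in for the already-read
current CU, the existing DWARF indexes, and the result of a full .debug_info
scan, and all identifiers are invented.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class LookupSource { CurrentCU, DwarfIndex, FullReadIn, NotFound };

struct SymbolSources {
  std::set<std::string> current_cu;  // names in the CU that is fully read in
  std::set<std::string> index;       // global names listed in the indexes
  std::set<std::string> everything;  // all names; known only after a full scan
};

// Try the cheap sources first and fall back to the full read-in last.
// did_full_read reports whether the expensive step was needed.
LookupSource lookup(const SymbolSources &s, const std::string &name,
                    bool &did_full_read)
{
  did_full_read = false;
  if (s.current_cu.count(name))
    return LookupSource::CurrentCU;
  if (s.index.count(name))
    return LookupSource::DwarfIndex;
  did_full_read = true;  // static, inlined-only or non-existing names end up here
  if (s.everything.count(name))
    return LookupSource::FullReadIn;
  return LookupSource::NotFound;
}
```

Note that in this model a name that does not exist anywhere still sets
did_full_read, which corresponds to the last fallback case in the list.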

archer-tromey-delayed-symfile could probably be improved further by properly
following the indexes.  While my earlier patch did fix a regression, it also
hurt performance; it could probably be patched better:
	[delayed-symfile] [commit] Fix a regression on forgotten delayed read of a type info.
	http://sourceware.org/ml/archer/2009-q1/msg00232.html


In summary, GDB could already give acceptable performance in the common use
cases (with proper, not yet existing patches) based just on the existing
DWARF indexes, couldn't it?  I did not think so before this mail thread.



Thanks,
Jan


* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-10 18:21     ` Jan Kratochvil
@ 2009-08-11  7:55       ` Dodji Seketeli
  2009-08-11 17:45         ` Jan Kratochvil
  2009-08-11 22:29       ` Tom Tromey
  1 sibling, 1 reply; 31+ messages in thread
From: Dodji Seketeli @ 2009-08-11  7:55 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Tom Tromey, GDB/Archer list

On 10/08/2009 20:21, Jan Kratochvil wrote:

> OK, thanks for the clarification, forgot etc.
> 
> Still when thinking about it:
> * I do not find the symbols reading much slow myself (working _on_ small GDB).

I agree this is hard to assess precisely. In my experience, debugging large
C++ applications made of lots of dynamic libraries (like Mozilla or any
WebKit-based app) triggers lots of disk access. How much of that is due to
debug info reading ? I don't know. What is the weight of the time penalty
induced by disk access ? I don't know. I think trying to get accurate data
to answer those questions is costly. Any taker ? :)

My hope is that the cost of trying to come up with precise data is not
_much_ less than actually trying to do the lazy reading stuff and see what
we gain. After all, compiler optimization junkies use that strategy all the
time :)  If you, experienced GDB folks, unanimously think trying this is
not worth it, then OK :)
FWIW, I think implementing this new section stuff is not really complex on
the gcc side. I guess the GDB side of things might be trickier ?

> * People complaining it is slow usually use IDEs which use rather file:line
>   based breakpoints, don't they?  (As it was discussed on RH IRC today.)
>   = Assuming the C++ people do not put breakpoints on static out-of-scope
>     functions by name.

I'd say it really depends on the user. If I am used to the code base I am
debugging, I will tend to set quite some breakpoints by name, because
opening $file, then clicking on the right line takes more time than doing
ctrl-b (assuming that's the shortcut to set a breakpoint) and typing the
name of the known function I want to break in. The debugger opens the file
and scrolls down to where the breakpoint is set. Much faster. Even better
if the debugger can provide me with _fast_ name completion when typing the
function name.
How cool would it be if GDB didn't stand between me and the joy of
snappiness when I take that road of speed ? :-)

Oh, and let's not forget the command line user base :)

> For the latter case I agree a fix is needed but an index of static names will
> not help with it.

True.

> 
>> Anyway, that is my logic.  Which part of this do you disagree with?
>> Or, am I missing something else?
> 
> We have concluded the currently missing information is for:
> * static functions (are they really needed for the file:line IDE usecases?)

I think they aren't needed for that exact use case. But as I said earlier,
I think there are other use cases that should be faster, that are useful for
regular debugger users, and that are unfortunately not as fast as they
ought to be today. And we can address those, can't we ?

> * inlined functions which have no concrete out-of-line instance
>   (the same file:line IDE usecase question)

[...]

> IMO not for:
> * static non-function symbols are deprecated (backward GDB compatibility only)

Sorry, I am not sure I fully understand this. Do global variables and
enumerator constants fall into this "deprecated" category ?

> 
>> There does not seem to be a big downside to introducing a new section
>> that does exactly what we want.  It is automatically backward
>> compatible.  It is (I believe) not difficult to implement.  And,
>> finally, we can make it reliable by fiat.
> 
> While it is an improvement with existing .debug_pubnames, .debug_pubtypes and
> .debug_aranges one can:
> 
> * Lookup everything in current CU which can is fully read-in from .debug_info.
> * Always lookup global symbols from other CUs through the DWARF indexes.
> * Fallback to the full read-in only for:
>   * static functions in out of the language (compiler) scope
>   * inlined functions which have no concrete out-of-line instance
>   * reference to a non-existing symbol
> 
> archer-tromey-delayed-symfile could be probably more improved by properly
> following the indexes.  While I did fix a regression I broke a performance by
> my patch before, it could be probably patched better:
> 	[delayed-symfile] [commit] Fix a regression on forgotten delayed read of a type info.
> 	http://sourceware.org/ml/archer/2009-q1/msg00232.html
> 
> 
> As a summary GDB could already give (with proper non-existing patches) in the
> common usecases acceptable performance even based just on the existing DWARF
> indexes, couldn't it?  I did not think so before this mail thread.

From what I have seen, I'd say, of course things can be improved with the
existing sections. I am not arguing against that.

What I see is that:

1/ There are "basic" usage cases that you won't be able to speed up, e.g.
imagine there is a global variable named 'foobar'. The user wants to break
in a function at some point and types "break foobar". I think the debugger
ought to know if there is a visible function named foobar in which it could
set the breakpoint. If not, it should gracefully display an error to the
user (possibly proposing the name of another function, close to foobar,
into which to break ?) without having to hit the disk to scan possibly
zillions of objects.

2/ To reach a point where we could implement those usage cases in all
serenity, I am not sure building on top of the current infrastructure (e.g.
extending the current .debug_pubnames and .debug_pubtypes) in a backward
compatible way is possible.

3/ We are lucky that no one seems to be using .debug_pubnames and
.debug_pubtypes today.

So based on 2/ and 3/ maybe it would be worth it to just throw out
.debug_pubnames and .debug_pubtypes and think about something more "solid"
that we can build on ?

Thanks for reading so far.

-- 
Dodji Seketeli
Red Hat


* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-11  7:55       ` Dodji Seketeli
@ 2009-08-11 17:45         ` Jan Kratochvil
  2009-08-11 22:43           ` Tom Tromey
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kratochvil @ 2009-08-11 17:45 UTC (permalink / raw)
  To: Dodji Seketeli; +Cc: Tom Tromey, GDB/Archer list

On Tue, 11 Aug 2009 09:54:49 +0200, Dodji Seketeli wrote:
> On 10/08/2009 20:21, Jan Kratochvil wrote:
> > Still when thinking about it:
> > * I do not find the symbols reading much slow myself (working _on_ small GDB).
> 
> I agree this is hard to assess precisely. In my experience, debugging large
> c++ applications made of lots of dynamic libraries (like mozilla or any
> webkit based app) triggers lots of disk access. How much of that is due to
> debug info reading ?

Most of the time (like >95%, did not measure now), you can try:
  --readnever        Do not read symbol files.


> FWIW, I think implementing this new section stuff is not really complex on
> the gcc side.
...
> I'd say it really depends on the user. If I am used to the code base I am
> debugging, I will tend to set quite some breakpoints by name, because
> opening $file, then clicking on the right line takes more time than doing
> ctrl-b (assuming that's the shortcut to set a breakpoint) and typing the
> name of the known function I want to break in. The debugger opens the file
> and scrolls down to where the breakpoint is set. Much faster. Even better
> if the debugger can provide me with _fast_ name completion when typing the
> function name.

OK, so static out-of-scope breakpoints by name make sense for large apps.


> > We have concluded the currently missing information is for:
> > * static functions (are they really needed for the file:line IDE usecases?)
> 
> I think they aren't needed for that exact use case. But as I said earlier,
> I think there are other use cases that should be faster, are useful for
> regular debugger users, and that are unfortunately not as faster as they
> ought to be today. And we can address those, can't we ?

Yes, GDB can address the non-static use cases with the DWARF indexes already.


> > IMO not for:
> > * static non-function symbols are deprecated (backward GDB compatibility only)
> 
> Sorry, I am not sure to fully understand this.

Due to this existing GDB behavior:
On Mon, 10 Aug 2009 16:38:04 +0200, Jan Kratochvil wrote:
# In practice the results are tricky due to the static symbols resolution:
# $ echo 'main(){}' >libm.c;cc -o libm libm.c -lm -g;gdb -q -ex start -ex 'p b' ./libm
# ...
# $1 = {i = {0, 1068498944}, d = 0.0625}
# $ nm /lib64/libm.so.6 |grep ' b$'
# 0000003bb1e4bad8 r b


> Do global variables and
> enumerator constants fall into this "deprecated" category ?

Global variables are a part of .debug_pubnames and they are not "deprecated"
as they are globally visible in the C language.

Enumerator constants are not globally visible, they create no ELF symbols,
they need to be #included for each CU, and they are like static functions;
therefore they should not be a part of .debug_pubnames.  As you wrote:
On Mon, 10 Aug 2009 11:04:13 +0200, Dodji Seketeli wrote:
# * It is unclear from the standard whether enumerators should be listed
#   in .debug_pubnames.
I think the DWARF spec is right in intentionally not listing them in
.debug_pubnames/.debug_pubtypes.


> 1/ There are "basic" usage cases that you won't be able to speedup, e.g.
> imagine there is a global variable named 'foobar'. The user wants to break
> in a function at some point and types "break foobar". I think the debugger
> ought to know if there is a visible function named foobar in which it could
> set the breakpoint. If not, it should gracefully display an error to the
> user (possibly proposing the name of another function, close to foobar,
> into which to break ?) without having to hit the disk to scan possibly
> zillions of objects.

`foobar' will be found in local CU and then in .debug_pubnames.  But if the
out-of-scope static function names should be reachable then we need some new
index, yes.


> So based on 2/ and 3/ maybe it can be worth it to just throw out
> .debug_pubname and .debug_pubtypes and think about something more "solid"
> that we can build on ?

Yes, the new index would be useful for:
  * static functions out of the language (compiler) scope
  * any inlined functions (so that no '.debug_line's need to be read for
    putting a breakpoint-by-name).

(dropping the IMO-"deprecated" out-of-scope static data symbols lookups)


I hope these mails were useful for both sides; they were at least for me.


Thanks,
Jan


* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-10 18:21     ` Jan Kratochvil
  2009-08-11  7:55       ` Dodji Seketeli
@ 2009-08-11 22:29       ` Tom Tromey
  1 sibling, 0 replies; 31+ messages in thread
From: Tom Tromey @ 2009-08-11 22:29 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Dodji Seketeli, GDB/Archer list

>>>>> "Jan" == Jan Kratochvil <jan.kratochvil@redhat.com> writes:

Jan> * I do not find the symbols reading much slow myself (working _on_
Jan> small GDB).

Jan> * People complaining it is slow usually use IDEs which use rather file:line
Jan>   based breakpoints, don't they?  (As it was discussed on RH IRC today.)
Jan>   = Assuming the C++ people do not put breakpoints on static out-of-scope
Jan>     functions by name.

Thanks for bringing this up.  I think there are a few different use
cases to consider.

My reason for adding "break function" to the list is just that it is such a
common CLI operation.

This current proposal is a way to fix the index problem, which is one
necessary step.  It is not the only necessary step, though -- we must
also change gdb to take advantage of the index.  This probably means
some kind of surgery on partial symbol tables (ideally I'd like to get
rid of them, but we'll see).

Jan> We have concluded the currently missing information is for:
Jan> * static functions (are they really needed for the file:line IDE usecases?)
Jan> * inlined functions which have no concrete out-of-line instance
Jan>   (the same file:line IDE usecase question)

Jan> IMO not for:
Jan> * static non-function symbols are deprecated (backward GDB
Jan> compatibility only)

Ok.  This is where I disagree, for reasons I won't repeat :)

Jan> * Fallback to the full read-in only for:
Jan>   * static functions in out of the language (compiler) scope
Jan>   * inlined functions which have no concrete out-of-line instance
Jan>   * reference to a non-existing symbol

This is another thing I don't like.  It means that a typo in a "break"
command will cause gdb to pause while it scans a lot of debuginfo.  This
also means that any attempt to set a pending breakpoint will require a
full scan.

Jan> As a summary GDB could already give (with proper non-existing
Jan> patches) in the common usecases acceptable performance even based
Jan> just on the existing DWARF indexes, couldn't it?  I did not think
Jan> so before this mail thread.

It could do better than it does today, but still not as good as we could
do with a few extensions.  The extensions are cheap on the gcc side
(already done IIUC) and because there is no gdb patch yet, equally cheap
there.

Tom


* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-11 17:45         ` Jan Kratochvil
@ 2009-08-11 22:43           ` Tom Tromey
  2009-08-12 19:20             ` Jan Kratochvil
  0 siblings, 1 reply; 31+ messages in thread
From: Tom Tromey @ 2009-08-11 22:43 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Dodji Seketeli, GDB/Archer list

[ time spent reading ]
Jan> Most of the time (like >95%, did not measure now), you can try:

Yeah, it is the #1 time user and #1 space user :-)

I think whatever problems we have here will be exaggerated by
multi-exec, too.

Dodji> Do global variables and enumerator constants fall into this
Dodji> "deprecated" category ?

Jan> Global variables are a part of .debug_pubnames and they are not
Jan> "deprecated" as they are globally visible in the C language.

Jan> enumerator constants are not globally visible, they create no ELF
Jan> symbols, they need to be #included for each CU, they are like
Jan> static functions, therefore they should not be a part of
Jan> .debug_pubnames.

Types also are not globally visible and create no ELF symbols.  So, I
think you need an additional argument about why enum constants ought to
be treated differently.

Also, I will occasionally start gdb just to "print/d CONSTANT" to see
what its value is.  So this would be another user-visible change -- we'd
require a full debuginfo scan on any expression.  (Though it occurs to
me that perhaps this is happening due to some "static scope" set at main
or something ... I did not check.  If so that would eliminate this
objection to this particular bit.)

Jan> I hope these mails were useful for both sides, at least for me.

Me too.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-11 22:43           ` Tom Tromey
@ 2009-08-12 19:20             ` Jan Kratochvil
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Kratochvil @ 2009-08-12 19:20 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Dodji Seketeli, GDB/Archer list

On Wed, 12 Aug 2009 00:42:46 +0200, Tom Tromey wrote:
> Jan> enumerator constants are not globally visible, they create no ELF
> Jan> symbols, they need to be #included for each CU, they are like
> Jan> static functions, therefore they should not be a part of
> Jan> .debug_pubnames.
> 
> Types also are not globally visible and create no ELF symbols.  So, I
> think you need an additional argument about why enum constants ought to
> be treated differently.

OK, I find .debug_pubtypes to be a strong argument that the DWARF committee
probably intended debuggers to have a wider lookup scope than the language itself.

> Also, I will occasionally start gdb just to "print/d CONSTANT" to see
> what its value is.  So this would be another user-visible change -- we'd
> require a full debuginfo scan on any expression.

Before, I expected that the debugger should use the same lookup scope as the
language does:

$ gdb -q ./file
(gdb) print/d CONSTANT
No symbol "CONSTANT" in current context.
(gdb) list main
[...]
(gdb) print/d CONSTANT
$1 = 42

> (Though it occurs to me that perhaps this is happening due to some "static
> scope" set at main or something ... I did not check.  If so that would
> eliminate this objection to this particular bit.)

IMO (did not verify) it is because there is no default scope for main.  But if
the symbol is not found in the current lexical block and current CU, the symbol
is looked for in all the other CUs and objfiles.  If there were two
differing definitions of CONSTANT, GDB would pick a random one (which is a bug).

On Wed, 12 Aug 2009 00:29:10 +0200, Tom Tromey wrote:
> The extensions are cheap on the gcc side (already done IIUC) and because
> there is no gdb patch yet, equally cheap there.

OK.

I am now convinced the lookup for static symbols is probably OK.  Software
projects already try to have unique symbol names even if they are static.

And clashing names for different C++ classes or enums are considered more
a bug than anything encouraged.

Thanks,
Jan

* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-10  9:04 [RFC] Proposal for a new DWARF name index section Dodji Seketeli
  2009-08-10 14:38 ` Jan Kratochvil
@ 2009-08-20 17:31 ` Dodji Seketeli
  2009-11-17 23:46   ` Cary Coutant
  1 sibling, 1 reply; 31+ messages in thread
From: Dodji Seketeli @ 2009-08-20 17:31 UTC (permalink / raw)
  To: GDB/Archer list

On 10/08/2009 11:04, Dodji Seketeli wrote:

> The DWARF 3 and 4 specifications address that need by mean of the
> .debug_pubnames and .debug_pubtypes sections. However, we believe that the
> content of these sections falls short in several ways.
> 
> The attached (simple) proposal is an attempt to address those issues.

As a follow up to this proposal, I have opened an enhancement request in
the gcc bugzilla at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41130 .

A first patch implementing the generation of the .debug_gnu_index section is
attached there. It is functionally complete (modulo possible bugs) but it
lacks tests for now. I will update it with DejaGnu tests soon.

I have also put the wording of the proposal in the wiki at
http://gcc.gnu.org/wiki/DebugGNUIndexSection .

Thanks.

-- 
Dodji Seketeli
Red Hat

* Re: [RFC] Proposal for a new DWARF name index section
  2009-08-20 17:31 ` Dodji Seketeli
@ 2009-11-17 23:46   ` Cary Coutant
  2009-11-20 17:25     ` Tom Tromey
  2009-12-11 23:56     ` Tom Tromey
  0 siblings, 2 replies; 31+ messages in thread
From: Cary Coutant @ 2009-11-17 23:46 UTC (permalink / raw)
  To: Dodji Seketeli; +Cc: GDB/Archer list

> As a follow up to this proposal, I have opened an enhancement request in
> the gcc bugzilla at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41130 .

Sorry for jumping in late to this discussion, and please forgive me if
I missed something important, but I was just talking about
pubnames/pubtypes the other day with some other members of the DWARF
committee, and your wiki page came up.

It seems to me that the .debug_pubnames/pubtypes/aranges sections were
created specifically so that the debugger could have fast access
without having to pre-read all the debug info, so the idea of creating
a new section rather than fix the one we already have is questionable.
The deficiencies you've listed in this thread (and on the wiki) are:

- No enumerators.
- No entries for inline instances with no out-of-line instance.
- No private names.
- No information about whether a name is a function or variable,
typedef or struct or union or enum.

I don't see why the first three can't be fixed simply by changing gcc
to emit those names into .debug_pubnames. Forget the "pub" part of the
section name -- the point of that index is so that the debugger can
find every compilation unit that provides info on a name that can be
typed unqualified into the debugger's UI. I don't think there's
anything in the DWARF spec that *prohibits* the compiler from doing
this; if you're looking for language that requires the compiler to do
that, I think you'll get the standard "DWARF is descriptive, not
prescriptive" answer (the spec says what the DWARF means, not what a
producer or consumer *must* do). The list of things for which the
compiler uses this section should be a QOI issue between gcc and gdb
or whatever other compiler/debugger pair you're interested in.

(My first reaction was that a .debug_privnames section was perhaps a
reasonable thing to add to DWARF-5, but then I started wondering what
the difference really was between pubnames and privnames. As far as
the debugger experience goes, I couldn't think of anything, so I'd
prefer to think of "pubnames" as a list of names that a debugger user
would want to type without qualification, regardless of the compiler's
definition of the word "public". If a wording change in the DWARF spec
would help there, I wouldn't have a problem with that.)

As for the fourth item, the index already points directly to the DIE
that defines the name. It should be almost trivial to go look up the
tag of that DIE without actually triggering a full symbol table read,
which should tell you exactly what you need to know. The pubnames
index is already huge; the use cases you've mentioned don't seem to
justify the extra redundancy.
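
A minimal sketch of the tag lookup described above, in Python. It relies
only on the standard DWARF encoding (a DIE begins with a ULEB128
abbreviation code, and the CU's abbreviation table maps that code to a
tag). The `abbrevs` dictionary is assumed to have been parsed from
.debug_abbrev already, and the helper names are illustrative, not taken
from any debugger.

```python
# Tag values from the DWARF specification.
DW_TAG_structure_type = 0x13
DW_TAG_subprogram = 0x2E
DW_TAG_variable = 0x34


def decode_uleb128(data, offset):
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def die_tag(debug_info, die_offset, abbrevs):
    """Return the tag of the DIE at die_offset.

    abbrevs maps abbreviation code -> tag for the CU containing the DIE
    (assumed to be parsed from .debug_abbrev beforehand).
    """
    code, _ = decode_uleb128(debug_info, die_offset)
    return abbrevs[code]
```

Only the first few bytes of the DIE are decoded, but the abbreviation
table for the enclosing CU has to be available as well.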

Now how will gdb know whether or not the pubnames index actually has
all of this extra info? A suggestion to have gdb look at the producer
string was shot down as ugly, but compare that to the alternative of
using a non-standard index. You could instead tie it to the switch to
DWARF-4 and just check the section version number in the compilation
unit header (the version number for .debug_pubnames isn't scheduled to
change with DWARF-4, unfortunately). Another alternative is just to
have gdb use the pubnames index if it's present, and any name that
isn't in the index simply won't be found without qualification.

-cary

* Re: [RFC] Proposal for a new DWARF name index section
  2009-11-17 23:46   ` Cary Coutant
@ 2009-11-20 17:25     ` Tom Tromey
  2009-11-22  4:39       ` Daniel Jacobowitz
  2009-12-01 19:14       ` Tom Tromey
  2009-12-11 23:56     ` Tom Tromey
  1 sibling, 2 replies; 31+ messages in thread
From: Tom Tromey @ 2009-11-20 17:25 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Dodji Seketeli, GDB/Archer list

>>>>> "Cary" == Cary Coutant <ccoutant@google.com> writes:

>> As a follow up to this proposal, I have opened an enhancement request in
>> the gcc bugzilla at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41130 .

Cary> Sorry for jumping in late to this discussion

It is no problem at all, thanks for replying.

Cary> It seems to me that the .debug_pubnames/pubtypes/aranges sections were
Cary> created specifically so that the debugger could have fast access
Cary> without having to pre-read all the debug info, so the idea of creating
Cary> a new section rather than fix the one we already have is questionable.

Yes, I agree it would be ideal not to do this.

Cary> I don't see why the first three can't be fixed simply by changing gcc
Cary> to emit those names into .debug_pubnames. Forget the "pub" part of the
Cary> section name -- the point of that index is so that the debugger can
Cary> find every compilation unit that provides info on a name that can be
Cary> typed unqualified into the debugger's UI. I don't think there's
Cary> anything in the DWARF spec that *prohibits* the compiler from doing
Cary> this

True, though I think it is clearly implied.

Cary> As for the fourth item, the index already points directly to the DIE
Cary> that defines the name. It should be almost trivial to go lookup the
Cary> tag of that DIE without actually triggering a full symbol table read,
Cary> which should tell you exactly what you need to know. The pubnames
Cary> index is already huge; the use cases you've mentioned don't seem to
Cary> justify the extra redundancy.

I think this is probably the most important problem; the rest I think
are fixable by following your advice :-)

GDB does actually use this tag info during lookups, before deciding
whether or not to fully read a CU.  (And I think it is reasonable to
assume that other debuginfo consumers would want to do so as well.)

I agree we could read the DIE and look at the tag.  However, that means
disk access to read the DIE, and disk access to read in the abbrev
table.  That seems very expensive for what is supposed to be a quick
index lookup.
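
For comparison, a rough sketch of a lookup where the kind is stored in
the index entry itself, so that neither the DIE nor the abbrev table is
touched.  The entry layout (name, kind, CU offset) and all names below
are hypothetical; this is not the .debug_gnu_index format.

```python
from collections import namedtuple

# Hypothetical entry: a name, a one-word kind, and the offset of the CU
# that defines it.  The real section layout is not assumed here.
IndexEntry = namedtuple("IndexEntry", "name kind cu_offset")


def cus_to_read(index, name, wanted_kinds):
    """Return the CU offsets worth expanding for name, using only the index."""
    return sorted({entry.cu_offset for entry in index
                   if entry.name == name and entry.kind in wanted_kinds})


index = [
    IndexEntry("x::y::z", "variable", 0x0),
    IndexEntry("x::y::z", "function", 0x4A0),
    IndexEntry("main", "function", 0x4A0),
]
```

With such an index, "break x::y::z" would only consider the CU at 0x4a0,
and a mistyped name would yield an empty list without reading any
debuginfo.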

Cary> Now how will gdb know whether or not the pubnames index actually has
Cary> all of this extra info? A suggestion to have gdb look at the producer
Cary> string was shot down as ugly, but compare that to the alternative of
Cary> using a non-standard index.

The reason I think a non-standard index is better in this case is that
its mere presence implies the DWARF producer is attempting to emit what
we want to see.

I think parsing the producer info is a problem for two reasons.

First, it is a pain to keep a list of the known-good GCC versions.  We
can't just say "4.5 or better is good", because distro compilers
back-port patches to older versions, etc.

Second, reading the producer info means reading a DIE, which we'd rather
avoid.

I'm not sure about tying it to DWARF-4, I'll have to think about that.
That requires less reading (no abbrevs), so it might not be as bad; but
it does still mean reading some data per CU from disk -- and the
extension index does not need that.

Cary> Another alternative is just to have gdb use the pubnames index if
Cary> it's present, and any name that isn't in the index simply won't be
Cary> found without qualification.

I think this won't work because GCC historically has emitted bad
pubnames info.

I've found that we do sometimes need to read a DIE to extract the line
table, because GDB has a few searches that map over file names.  I'm not
sure what to do about this yet.

I think I should probably implement your proposal and try to measure the
difference.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-11-20 17:25     ` Tom Tromey
@ 2009-11-22  4:39       ` Daniel Jacobowitz
  2009-11-23 19:51         ` Tom Tromey
  2009-12-01 19:14       ` Tom Tromey
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel Jacobowitz @ 2009-11-22  4:39 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

On Fri, Nov 20, 2009 at 10:24:38AM -0700, Tom Tromey wrote:
> I agree we could read the DIE and look at the tag.  However, that means
> disk access to read the DIE, and disk access to read in the abbrev
> table.  That seems very expensive for what is supposed to be a quick
> index lookup.

If you had a sufficiently smart consumer that it didn't need to keep
all of .debug_info in memory all the time, then this would have some
measurable impact.  But GDB isn't that consumer.  If you've got the
.debug_info section read in or mapped anyway (one-time operation),
then checking the DIE tag is not too bad.  It will be a cache miss, of
course.

If you don't read this data off disk when reading the pubnames, you'll
have to do it the first time one of them is referenced, anyway.  This
is separate from parsing all the DIEs (psymtabs), which is much more
work.

Someone suggested on gdb-patches that GDB could generate and cache the
pubnames table.  It follows that a separate packaging tool could do so
also.  Something to consider... during separate debug file generation,
for instance.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: [RFC] Proposal for a new DWARF name index section
  2009-11-22  4:39       ` Daniel Jacobowitz
@ 2009-11-23 19:51         ` Tom Tromey
  0 siblings, 0 replies; 31+ messages in thread
From: Tom Tromey @ 2009-11-23 19:51 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Tom> I agree we could read the DIE and look at the tag.  However, that means
Tom> disk access to read the DIE, and disk access to read in the abbrev
Tom> table.  That seems very expensive for what is supposed to be a quick
Tom> index lookup.

Daniel> If you had a sufficiently smart consumer that it didn't need to keep
Daniel> all of .debug_info in memory all the time, then this would have some
Daniel> measurable impact.

We now mmap .debug_info, and I've been assuming that this would not
actually touch the disk until we tried to read the data.  I didn't try
to verify this yet, though.
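
As an illustration of that assumption, here is a small Python sketch
that maps a file and hands back a view of one section.  The section
offset and size are assumed to come from the ELF section headers, and
`map_section` is a made-up helper name.  Whether creating the mapping
really avoids disk reads depends on the kernel; the sketch only shows
that no section bytes are copied until the view is indexed.

```python
import mmap


def map_section(path, offset, size):
    """Map a file read-only and return (mapping, view) for one section.

    Creating the mapping copies no section data; pages are normally
    faulted in by the kernel only when the view is actually read.
    """
    with open(path, "rb") as f:
        # mmap duplicates the descriptor, so closing f afterwards is fine.
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapping, memoryview(mapping)[offset:offset + size]
```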

Also, it looks possible for us to defer even mapping .debug_info and
other sections until some symbols are actually needed.  (At least, this
is true in the situation where the index describes every CU.  If we are
missing an entry then we do more work.)

Daniel> If you don't read this data off disk when reading the pubnames, you'll
Daniel> have to do it the first time one of them is referenced, anyway.  This
Daniel> is separate from parsing all the DIEs (psymtabs), which is much more
Daniel> work.

I think the question is whether searches requiring the tag occur often
enough that they would begin to eat into the performance gains.
Unfortunately this question isn't separable from the question of how
expensive it is to decide whether pubnames is actually usable.

I suppose this can be answered experimentally, if we assume some use
cases.

Daniel> Someone suggested on gdb-patches that GDB could generate and cache the
Daniel> pubnames table.  It follows that a separate packaging tool could do so
Daniel> also.  Something to consider... during separate debug file generation,
Daniel> for instance.

Yeah, or at least extend dwarflint to verify the output, assuming it
doesn't already.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-11-20 17:25     ` Tom Tromey
  2009-11-22  4:39       ` Daniel Jacobowitz
@ 2009-12-01 19:14       ` Tom Tromey
  2009-12-02  5:17         ` Daniel Jacobowitz
  2009-12-02 16:11         ` Dodji Seketeli
  1 sibling, 2 replies; 31+ messages in thread
From: Tom Tromey @ 2009-12-01 19:14 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Dodji Seketeli, GDB/Archer list

>>> As a follow up to this proposal, I have opened an enhancement request in
>>> the gcc bugzilla at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41130 .

Cary> It seems to me that the .debug_pubnames/pubtypes/aranges sections were
Cary> created specifically so that the debugger could have fast access
Cary> without having to pre-read all the debug info, so the idea of creating
Cary> a new section rather than fix the one we already have is questionable.

Tom> Yes, I agree it would be ideal not to do this.

I still haven't implemented pubtypes/pubnames reading, but I did run
across a couple of other problems not solved by those.  I'd like to get
people's reactions.

To reiterate a little, my goal here is maximal performance.  Ideally,
the initial read would be very fast -- faster than the user can perceive.
This is probably not achievable, but we can get close.

The biggest fixable performance problem in the current reader is
actually computing the hash codes for the strings from the
.debug_gnu_index section.  So, I've been thinking about putting the hash
code directly into the section.
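
A sketch of what storing the hash code would buy, in Python.  The hash
function (a djb2-style hash truncated to 32 bits) and the (hash, name)
entry layout are assumptions made for illustration; nothing here is
specified for .debug_gnu_index.

```python
def name_hash(name):
    """djb2-style string hash, truncated to 32 bits (illustrative choice)."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def produce(names):
    """Producer side: emit (hash, name) pairs, hashing each name once."""
    return [(name_hash(n), n) for n in names]


def load(entries, nbuckets=64):
    """Consumer side: fill buckets from the stored hashes.

    No string is hashed while loading the index.
    """
    buckets = [[] for _ in range(nbuckets)]
    for h, name in entries:
        buckets[h % nbuckets].append((h, name))
    return buckets


def lookup(buckets, name):
    """Hash only the query, then compare hashes before comparing strings."""
    h = name_hash(name)
    return any(eh == h and en == name
               for eh, en in buckets[h % len(buckets)])
```

The cost is that producer and consumer must agree on the hash function,
which then becomes part of the format.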

The other problem I've noticed is name canonicalization.  This past
year, we changed gdb to canonicalize names in its symbol tables, and to
canonicalize user input before doing lookups.  This lets gdb return the
right answer even when the order of modifiers varies.  This change
slowed down DWARF reading, and it occurred to me that it would also
substantially slow down index reading.  So, I would also like to change
the .debug_gnu_index spec to specify how names are to be canonicalized.

The hash code idea seems a little weird to me, but the name
canonicalization problem seems important to solve.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-01 19:14       ` Tom Tromey
@ 2009-12-02  5:17         ` Daniel Jacobowitz
  2009-12-02 17:07           ` Tom Tromey
  2009-12-02 16:11         ` Dodji Seketeli
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel Jacobowitz @ 2009-12-02  5:17 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

On Tue, Dec 01, 2009 at 12:13:47PM -0700, Tom Tromey wrote:
> The hash code idea seems a little weird to me, but the name
> canonicalization problem seems important to solve.

FWIW, I think this is a good argument for handling this as a GDB
caching extension rather than a DWARF extension.  I can't imagine any
way to stably standardize this.  GDB is designed to be, if not
completely robust, at least flexible w.r.t. future changes in the
demangler's canonicalized output.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-01 19:14       ` Tom Tromey
  2009-12-02  5:17         ` Daniel Jacobowitz
@ 2009-12-02 16:11         ` Dodji Seketeli
  2009-12-02 17:29           ` Tom Tromey
  1 sibling, 1 reply; 31+ messages in thread
From: Dodji Seketeli @ 2009-12-02 16:11 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, GDB/Archer list

On Tue, Dec 01, 2009 at 12:13:47PM (-0700), Tom Tromey wrote:
> The biggest fixable performance problem in the current reader is
> actually computing the hash codes for the strings from the
> .debug_gnu_index section.  So, I've been thinking about putting the hash
> code directly into the section.

FWIW, from a G++ pov, I think I'd be tempted to say "let's try to
implement this and see how much GDB gains" so that we can have data to make
an educated choice. So I'd be on the "yes, let's try" side of the story
here.

> 
> The other problem I've noticed is name canonicalization.  This past
> year, we changed gdb to canonicalize names in its symbol tables, and to
> canonicalize user input before doing lookups.  This lets gdb return the
> right answer even when the order of modifiers varies.  This change
> slowed down DWARF reading, and it occurred to me that it would also
> substantially slow down index reading.  So, I would also like to change
> the .debug_gnu_index spec to specify how names are to be canonicalized.

Just to be sure I understand: how is saying _how_ the strings are to be
canonicalized going to significantly speed up GDB's processing?
I would have thought that the killer gain would come from the producer
directly emitting what the consumer expects. I guess I am missing
something.

        Dodji

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-02  5:17         ` Daniel Jacobowitz
@ 2009-12-02 17:07           ` Tom Tromey
  2009-12-02 17:35             ` Daniel Jacobowitz
  0 siblings, 1 reply; 31+ messages in thread
From: Tom Tromey @ 2009-12-02 17:07 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Tom> The hash code idea seems a little weird to me, but the name
Tom> canonicalization problem seems important to solve.

Daniel> FWIW, I think this is a good argument for handling this as a GDB
Daniel> caching extension rather than a DWARF extension.  I can't imagine any
Daniel> way to stably standardize this.  GDB is designed to be, if not
Daniel> completely robust, at least flexible w.r.t. future changes in the
Daniel> demangler's canonicalized output.

I think it would still be ok.  We would specify the canonical form as
part of the extension, and it would be up to compilers, and gdb, to
correctly implement it.  AFAICT it can be specified entirely in terms of
the source language of the CU.

In terms of our own demangler, if we wanted to change its output -- say,
we make some tweak to make it prettier somehow -- we could add a "DWARF"
demangle style that would preserve compatibility.

We don't need to ever display the contents of this index to users.  It
is purely an internal detail.  gdb (and presumably any debugger) needs
to canonicalize the table entries, so why not have the compiler generate
things already canonicalized, to save a step?

If we need to change the canonicalization for some other reason, say
because C++ adds new features, we can bump the version number in
.debug_gnu_index and specify a new canonical form.
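
A sketch of the consumer side of such versioning, in Python.  The
header layout (a little-endian 2-byte version at the start of the
section) and the set of supported versions are hypothetical.

```python
import struct

SUPPORTED_VERSIONS = {1}   # hypothetical: versions this consumer understands


def index_usable(section):
    """Return True if the index header carries a version we understand.

    An unknown version means an unknown canonical form, so the caller
    would ignore the index and fall back to scanning the debug info.
    """
    if len(section) < 2:
        return False
    (version,) = struct.unpack_from("<H", section, 0)
    return version in SUPPORTED_VERSIONS
```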

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-02 16:11         ` Dodji Seketeli
@ 2009-12-02 17:29           ` Tom Tromey
  0 siblings, 0 replies; 31+ messages in thread
From: Tom Tromey @ 2009-12-02 17:29 UTC (permalink / raw)
  To: Dodji Seketeli; +Cc: Cary Coutant, GDB/Archer list

>>>>> "Dodji" == Dodji Seketeli <dodji@redhat.com> writes:

Tom> The other problem I've noticed is name canonicalization.  This past
Tom> year, we changed gdb to canonicalize names in its symbol tables, and to
Tom> canonicalize user input before doing lookups.  This lets gdb return the
Tom> right answer even when the order of modifiers varies.  This change
Tom> slowed down DWARF reading, and it occurred to me that it would also
Tom> substantially slow down index reading.  So, I would also like to change
Tom> the .debug_gnu_index spec to specify how names are to be canonicalized.

Dodji> Just to be sure I understand. How saying _how_ the strings are to be
Dodji> canonicalized is going to speed up significantly GDB's processing?
Dodji> I would have have thought that the killer gain would come from the
Dodji> producing directly what the consumer expects. I guess I am missing
Dodji> something.

The producer and consumer need to agree on the format.  The best way to
do that is define what the format actually is.

GDB must canonicalize names in order to do lookups in a sane way.  E.g.,
see: http://sourceware.org/bugzilla/show_bug.cgi?id=8617.  So, GDB
applies the same canonicalization to the names coming from DWARF, and to
the names the user enters.  Then it can do simple string comparisons to
find names.  But, GDB does not actually care about most details of this
canonicalization.
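
A toy illustration of that scheme in Python.  The canonical form used
here (collapse whitespace, sort the words) is a stand-in that only
makes sense for simple multi-word names such as "long unsigned int"; it
is not GDB's canonicalizer.

```python
def canonicalize(name):
    """Toy canonical form: collapse whitespace and sort the words.

    Only meant for simple multi-word names such as "long unsigned int".
    """
    return " ".join(sorted(name.split()))


def build_table(names):
    """Canonicalize every name coming from the debug info once."""
    return {canonicalize(n): n for n in names}


def lookup(table, user_input):
    """Canonicalize the user's spelling, then do a plain string lookup."""
    return table.get(canonicalize(user_input))
```

The lookup only requires that both sides use the same function; which
canonical form is chosen does not matter to it.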

As a practical matter I would assume that we would define the canonical
form to conveniently align with our already-existing code.

Maybe this is all really wrong-headed, though, and it would be better to
change gdb to have a structured symbol table.  I haven't really
considered this approach much.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-02 17:07           ` Tom Tromey
@ 2009-12-02 17:35             ` Daniel Jacobowitz
  2009-12-02 19:23               ` Tom Tromey
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Jacobowitz @ 2009-12-02 17:35 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

On Wed, Dec 02, 2009 at 10:07:41AM -0700, Tom Tromey wrote:
> I think it would still be ok.  We would specify the canonical form as
> part of the extension, and it would be up to compilers, and gdb, to
> correctly implement it.  AFAICT it can be specified entirely in terms of
> the source language of the CU.

Well, as long as I'm not the one who has to convert the C++ demangler
into a specification, I guess it's not totally crazy.  I still think
this makes more sense as an internal cache, though, because it's so
tied to the implementation of both the compiler and debugger.  And
the canonicalization isn't cheap, and doesn't match GCC's internal
representation of the names; this would slow down GCC to speed up
debugging.

You could even speed up full DIE reading this way, by canonicalizing
strings there also, and leaving an attribute or other marking in the
cached copy saying this string was simple.

Not to mention it's easier to prototype :-)

-- 
Daniel Jacobowitz
CodeSourcery

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-02 17:35             ` Daniel Jacobowitz
@ 2009-12-02 19:23               ` Tom Tromey
  2009-12-02 19:39                 ` Daniel Jacobowitz
  0 siblings, 1 reply; 31+ messages in thread
From: Tom Tromey @ 2009-12-02 19:23 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Daniel> Well, as long as I'm not the one who has to convert the C++ demangler
Daniel> into a specification, I guess it's not totally crazy.  I still think
Daniel> this makes more sense as an internal cache, though, because it's so
Daniel> tied to the implementation of both the compiler and debugger.  And
Daniel> the canonicalization isn't cheap, and doesn't match GCC's internal
Daniel> representation of the names; this would slow down GCC to speed up
Daniel> debugging.

I do think that slowing down the compiler to speed up the debugger would
be the wrong tradeoff.  I was hoping we could get this for free in the
compiler, or nearly so, but now I unfortunately see that I was confused
on that point :-(.  We could pick a representation close to what gcc
already emits -- but then that overly constrains gcc in the future.

Caching is interesting but it comes with other problems.  We have to
manage the cache somehow.  And, the cache would not be useful when an
object changes.  So, I'd prefer a direct approach, if one can be made to
work.

The index is still an improvement even if we have to do canonicalization
when reading.

Some initial numbers:

                           warm cache    cold cache
 without canonicalization:  ~0.5 sec       5 sec
gdb does canonicalization:  ~1.7 sec       6 sec
             gdb cvs head:  ~2.4 sec      10 sec

There's a fair amount of noise in the warm cache numbers, I'd say +/-
0.2 sec.

This is timing "gdb -batch" on a smallish (80 KLOC) C++ program.  I
still haven't tried my big tests, I'm still working on setting those up.
I also still haven't tried the pubnames/pubtypes index.

I guess canonicalization is not terrible -- I definitely notice it when
the cache is warm, but with a cold cache it doesn't matter.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-02 19:23               ` Tom Tromey
@ 2009-12-02 19:39                 ` Daniel Jacobowitz
  2009-12-03  1:46                   ` Paul Pluzhnikov
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Jacobowitz @ 2009-12-02 19:39 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

On Wed, Dec 02, 2009 at 12:23:25PM -0700, Tom Tromey wrote:
> I do think that slowing down the compiler to speed up the debugger would
> be the wrong tradeoff.  I was hoping we could get this for free in the
> compiler, or nearly so, but now I unfortunately see that I was confused
> on that point :-(.  We could pick a representation close to what gcc
> already emits -- but then that overly constrains gcc in the future.

That's not an option, anyway; the stuff GCC emits is too vague
in some cases (he says unsubstantiatedly).

> Caching is interesting but it comes with other problems.  We have to
> manage the cache somehow.  And, the cache would not be useful when an
> object changes.  So, I'd prefer a direct approach, if one can be made to
> work.

Well, inherent in the cache approach (IMO) is a system-provided cache;
for installed libraries, the cache data could be added to a debuginfo
file.  Of course, that assumes GDB's format stays "relatively stable"
across GDB updates.

>                            warm cache    cold cache
>  without canonicalization:  ~0.5 sec       5 sec
> gdb does canonicalization:  ~1.7 sec       6 sec
>              gdb cvs head:  ~2.4 sec      10 sec

Not bad!

-- 
Daniel Jacobowitz
CodeSourcery

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-02 19:39                 ` Daniel Jacobowitz
@ 2009-12-03  1:46                   ` Paul Pluzhnikov
  2009-12-04 23:13                     ` Tom Tromey
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Pluzhnikov @ 2009-12-03  1:46 UTC (permalink / raw)
  To: Daniel Jacobowitz
  Cc: Tom Tromey, Cary Coutant, Dodji Seketeli, GDB/Archer list

On Wed, Dec 2, 2009 at 11:38 AM, Daniel Jacobowitz <drow@false.org> wrote:

> Well, inherent in the cache approach (IMO) is a system-provided cache;
> for installed libraries, the cache data could be added to a debuginfo
> file.  Of course, that assumes GDB's format stays "relatively stable"
> across GDB updates.

FWIW, I've used the following approach on a previous product X:

- When a new binary is detected, a copy of X is invoked to parse all
  the needed debug info into internal form and write it to a cache file.
- Once the copy exits, the cache file is directly mmap()ed by X.
- Cache files older than 1 week, and cache files prepared from
  binaries which no longer exist in their original location are
  pruned to keep cache size down.

The cache file contains version of X, so when a new version of X
is shipped, the cache is automatically rebuilt.

It also contains path/timestamp/inode/size for the target binary,
so if e.g. one of the shared libs has been rebuilt since last run,
only that one shared library must be re-processed.

This trades startup speed against disk space, and disk is usually
very cheap now.

One of our typical usage scenarios is a tiny executable linked with
1000+ C++ shared libraries. Simply re-running the test a second time
in a row in GDB takes 1+ minutes, as GDB discards and re-reads the
debug info for each solib (it used to take 6+ minutes before my dwarf
mmap changes).

The major CPU consumers in my tests are now:

CPU: AMD64 processors, speed 2200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
43092     8.2847  read_partial_die
38243     7.3525  strcmp_iw_ordered
36744     7.0643  read_attribute_value
28887     5.5537  cpname_parse
28849     5.5464  d_print_comp
27731     5.3315  htab_hash_string
21975     4.2248  cp_canonicalize_string
20736     3.9866  load_partial_dies
18098     3.4795  cpname_lex
15649     3.0086  lookup_minimal_symbol
15156     2.9138  msymbol_hash_iw
14185     2.7272  htab_find_slot_with_hash

I am guessing that a GDB cache of pre-canonicalized strings would
save a *lot* of CPU under this scenario, and there is no reason
you can't put other indices into the cache as well.  Nor is there any
need for a stable cache file format -- a newer version of GDB will
simply rebuild what it needs on demand.

-- 
Paul Pluzhnikov

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-03  1:46                   ` Paul Pluzhnikov
@ 2009-12-04 23:13                     ` Tom Tromey
  2009-12-06  3:41                       ` Tom Tromey
  0 siblings, 1 reply; 31+ messages in thread
From: Tom Tromey @ 2009-12-04 23:13 UTC (permalink / raw)
  To: Paul Pluzhnikov
  Cc: Daniel Jacobowitz, Cary Coutant, Dodji Seketeli, GDB/Archer list

>>>>> "Paul" == Paul Pluzhnikov <ppluzhnikov@google.com> writes:

Paul> FWIW, I've used the following approach on a previous product X:
Paul> - As new binary is detected, a copy of X is invoked to parse all
Paul>   the needed debug info into internal form and written to a cache file.
Paul> - Once the copy exits, the cache file is directly mmap()ed by X.
Paul> - Cache files older than 1 week, and cache files prepared from
Paul>   binaries which no longer exist in their original location are
Paul>   pruned to keep cache size down.

Thanks.

FWIW, gdb used to have a caching scheme like that.  It has been a long
time, so I don't remember the details... I know that Jan had a
reimplementation of it last year, but found that it wasn't a real
performance win.  I don't recall why.

If we can get acceptable performance without a cache, then I think that
would be preferable.  One trouble with caching is that it is still slow
the first time.

So far, it is clear that we can improve performance.  It is less clear
whether we can improve it enough, but I'm working on finding out.

Paul> One of our typical usage scenarios is a tiny executable linked with
Paul> 1000+ C++ shared libraries. Simply re-running the test a second time
Paul> in a row in GDB takes 1+ minutes, as GDB discards and re-reads the
Paul> debug info for each solib (it used to take 6+ minutes before my dwarf
Paul> mmap changes).

It seems to me that we could be a bit smarter about objfile lifetimes.
I think this will probably be important for good performance in the
multi-inferior case.

Consider the classic "make check" example.  If we aggressively discard
objfiles as we do now, in this case we will be reading and throwing away
the debuginfo for gcc/cc1/etc for every object built by make... ugh.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-04 23:13                     ` Tom Tromey
@ 2009-12-06  3:41                       ` Tom Tromey
  2009-12-07 21:32                         ` Tom Tromey
  0 siblings, 1 reply; 31+ messages in thread
From: Tom Tromey @ 2009-12-06  3:41 UTC (permalink / raw)
  To: Paul Pluzhnikov
  Cc: Daniel Jacobowitz, Cary Coutant, Dodji Seketeli, GDB/Archer list

>>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:

Tom> If we can get acceptable performance without a cache, then I think that
Tom> would be preferable.  One trouble with caching is that it is still slow
Tom> the first time.

Tom> So far, it is clear that we can improve performance.  It is less clear
Tom> whether we can improve it enough, but I'm working on finding out.

It turns out that we can make it start very fast by assuming that if we
see .debug_gnu_index, then we are going to use both it and
.debug_aranges.  I changed the code to make this assumption, and to
lazily map all debug sections.  With this in place the results are
ridiculously great, like:

    0.15user 0.02system 0:00.21elapsed
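
The lazy mapping can be illustrated with a small Python sketch.  This is
not the GDB change itself: LazySection and its loader callable are
made-up names, and a real implementation would mmap the section rather
than call a Python function.

```python
class LazySection:
    """Defer reading a section's bytes until something asks for them."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # hypothetical callable returning the bytes
        self._data = None
        self.loads = 0         # how many times the loader actually ran

    @property
    def data(self):
        # The first access pays the cost; later accesses reuse the result.
        if self._data is None:
            self._data = self._loader()
            self.loads += 1
        return self._data
```

Startup then costs only the construction of the wrappers; a section that
is never consulted is never read.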

I also found out that testing this code with "gdb -batch" introduces a
bit of confusion into the results, because that will call
find_main_filename, which will require some symbol table information.
I've taken to disabling this code when looking at timings.

Also, I made another funny hack tonight.  I changed gdb to read
.debug_aranges and .debug_gnu_index in a background thread.  This was
pretty easy to do; it really just needed a couple of global __thread
variables.  (I didn't attempt reading full symbols in a separate thread,
because that seems a lot more involved.)

What this means is that the user will still see very fast startup times,
but also will typically have less waiting when he runs a command.  The
problem with this "fast" patch is that we are really just deferring some
work until it is needed.  But, when it is needed we still have to spend
the time to actually read the index.  Threading lets us hide a bit of
that work.
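
Roughly, the idea looks like the following Python sketch.  It only
illustrates the shape of the approach, not the patch: BackgroundIndex
and the read_index callable are hypothetical names, and the real code
is C using __thread variables.

```python
import threading


class BackgroundIndex:
    """Start reading an index immediately; block only when it is needed."""

    def __init__(self, read_index):
        self._result = None
        self._error = None
        self._thread = threading.Thread(
            target=self._run, args=(read_index,), daemon=True)
        self._thread.start()

    def _run(self, read_index):
        try:
            self._result = read_index()
        except Exception as exc:
            # Keep the failure so the consumer sees it on get().
            self._error = exc

    def get(self):
        # Wait for the reader thread if it has not finished yet.
        self._thread.join()
        if self._error is not None:
            raise self._error
        return self._result
```

If the user takes a moment before typing the first command, get()
returns without waiting; otherwise it blocks for whatever work remains.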

I would guess that threading will be met with revulsion :-).  But, it
seems very practical in this case.

I'll clean up this patch a bit and push it to a new branch in archer
this week.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-06  3:41                       ` Tom Tromey
@ 2009-12-07 21:32                         ` Tom Tromey
  0 siblings, 0 replies; 31+ messages in thread
From: Tom Tromey @ 2009-12-07 21:32 UTC (permalink / raw)
  To: Paul Pluzhnikov
  Cc: Daniel Jacobowitz, Cary Coutant, Dodji Seketeli, GDB/Archer list

>>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:

Tom> Also, I made another funny hack tonight.  I changed gdb to read
Tom> .debug_aranges and .debug_gnu_index in a background thread.  This was
Tom> pretty easy to do; it really just needed a couple of global __thread
Tom> variables.  (I didn't attempt reading full symbols in a separate thread,
Tom> because that seems a lot more involved.)

BTW, what I mean by this is that when using the indices, sometimes GDB
has to read full symbols for a given CU.  This happens if we don't see
any index entry, or any aranges entry, for that CU.  In this situation
we can't tell whether that entry was somehow stripped or not created, or
just empty.
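
As a sketch of that rule in Python (the function name and the use of CU
offsets as identifiers are assumptions made for illustration):

```python
def cus_needing_full_read(all_cus, index_cus, aranges_cus):
    """Return the CUs that lack an index entry or an aranges entry.

    Such a CU might have been stripped, never indexed, or simply be
    empty; the reader cannot tell which, so it reads full symbols.
    """
    index_cus = set(index_cus)
    aranges_cus = set(aranges_cus)
    return [cu for cu in all_cus
            if cu not in index_cus or cu not in aranges_cus]
```

For example, with CUs at offsets 0x0, 0x40 and 0x80, index entries for
0x0 and 0x40, and aranges entries for 0x0 and 0x80, both 0x40 and 0x80
fall back to a full read.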

For aranges this is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42288
I updated the .debug_gnu_index PR with a similar request.

This sort of reading has to be done in the main thread.  There were just
too many problems with putting this into a separate thread -- not just
all the global variables, but this would also imply that users of the
symtabs would need to take a lock.

Tom> I'll clean up this patch a bit and push it to a new branch in archer
Tom> this week.

It is now on archer-tromey-threaded-dwarf.

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-11-17 23:46   ` Cary Coutant
  2009-11-20 17:25     ` Tom Tromey
@ 2009-12-11 23:56     ` Tom Tromey
  2009-12-12  0:06       ` Daniel Jacobowitz
                         ` (3 more replies)
  1 sibling, 4 replies; 31+ messages in thread
From: Tom Tromey @ 2009-12-11 23:56 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Dodji Seketeli, GDB/Archer list

>>>>> "Cary" == Cary Coutant <ccoutant@google.com> writes:

Cary> Now how will gdb know whether or not the pubnames index actually has
Cary> all of this extra info? A suggestion to have gdb look at the producer
Cary> string was shot down as ugly, but compare that to the alternative of
Cary> using a non-standard index. You could instead tie it to the switch to
Cary> DWARF-4 and just check the section version number in the compilation
Cary> unit header (the version number for .debug_pubnames isn't scheduled to
Cary> change with DWARF-4, unfortunately). Another alternative is just to
Cary> have gdb use the pubnames index if it's present, and any name that
Cary> isn't in the index simply won't be found without qualification.

It occurred to me today that we could define a GNU-local attribute that
GCC (and whatever DWARF-rewriting tool we come up with to generate the
indices post-facto) can put into the CU DIE.

What do you think of that?

Tom

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-11 23:56     ` Tom Tromey
@ 2009-12-12  0:06       ` Daniel Jacobowitz
  2009-12-12  0:13       ` Cary Coutant
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Daniel Jacobowitz @ 2009-12-12  0:06 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, Dodji Seketeli, GDB/Archer list

On Fri, Dec 11, 2009 at 04:56:02PM -0700, Tom Tromey wrote:
> It occurred to me today that we could define a GNU-local attribute that
> GCC (and whatever DWARF-rewriting tool we come up with to generate the
> indices post-facto) can put into the CU DIE.
> 
> What do you think of that?

Sounds better than producer-grubbing to me.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-11 23:56     ` Tom Tromey
  2009-12-12  0:06       ` Daniel Jacobowitz
@ 2009-12-12  0:13       ` Cary Coutant
  2009-12-13  3:48       ` Dodji Seketeli
  2009-12-14 15:32       ` Dodji Seketeli
  3 siblings, 0 replies; 31+ messages in thread
From: Cary Coutant @ 2009-12-12  0:13 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Dodji Seketeli, GDB/Archer list

> It occurred to me today that we could define a GNU-local attribute that
> GCC (and whatever DWARF-rewriting tool we come up with to generate the
> indices post-facto) can put into the CU DIE.
>
> What do you think of that?

Sounds fine to me.

-cary

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-11 23:56     ` Tom Tromey
  2009-12-12  0:06       ` Daniel Jacobowitz
  2009-12-12  0:13       ` Cary Coutant
@ 2009-12-13  3:48       ` Dodji Seketeli
  2009-12-14 15:32       ` Dodji Seketeli
  3 siblings, 0 replies; 31+ messages in thread
From: Dodji Seketeli @ 2009-12-13  3:48 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, GDB/Archer list

On Fri, Dec 11, 2009 at 04:56:02PM -0700, Tom Tromey wrote:
> It occurred to me today that we could define a GNU-local attribute that
> GCC (and whatever DWARF-rewriting tool we come up with to generate the
> indices post-facto) can put into the CU DIE.
> 
> What do you think of that?

Fine by me.

        Dodji

* Re: [RFC] Proposal for a new DWARF name index section
  2009-12-11 23:56     ` Tom Tromey
                         ` (2 preceding siblings ...)
  2009-12-13  3:48       ` Dodji Seketeli
@ 2009-12-14 15:32       ` Dodji Seketeli
  3 siblings, 0 replies; 31+ messages in thread
From: Dodji Seketeli @ 2009-12-14 15:32 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Cary Coutant, GDB/Archer list

On Fri, Dec 11, 2009 at 04:56:02PM -0700, Tom Tromey wrote:
> It occurred to me today that we could define a GNU-local attribute that
> GCC (and whatever DWARF-rewriting tool we come up with to generate the
> indices post-facto) can put into the CU DIE.
> 
> What do you think of that?

It looks fine to me.

        Dodji

Thread overview: 31+ messages
2009-08-10  9:04 [RFC] Proposal for a new DWARF name index section Dodji Seketeli
2009-08-10 14:38 ` Jan Kratochvil
2009-08-10 17:36   ` Tom Tromey
2009-08-10 18:21     ` Jan Kratochvil
2009-08-11  7:55       ` Dodji Seketeli
2009-08-11 17:45         ` Jan Kratochvil
2009-08-11 22:43           ` Tom Tromey
2009-08-12 19:20             ` Jan Kratochvil
2009-08-11 22:29       ` Tom Tromey
2009-08-20 17:31 ` Dodji Seketeli
2009-11-17 23:46   ` Cary Coutant
2009-11-20 17:25     ` Tom Tromey
2009-11-22  4:39       ` Daniel Jacobowitz
2009-11-23 19:51         ` Tom Tromey
2009-12-01 19:14       ` Tom Tromey
2009-12-02  5:17         ` Daniel Jacobowitz
2009-12-02 17:07           ` Tom Tromey
2009-12-02 17:35             ` Daniel Jacobowitz
2009-12-02 19:23               ` Tom Tromey
2009-12-02 19:39                 ` Daniel Jacobowitz
2009-12-03  1:46                   ` Paul Pluzhnikov
2009-12-04 23:13                     ` Tom Tromey
2009-12-06  3:41                       ` Tom Tromey
2009-12-07 21:32                         ` Tom Tromey
2009-12-02 16:11         ` Dodji Seketeli
2009-12-02 17:29           ` Tom Tromey
2009-12-11 23:56     ` Tom Tromey
2009-12-12  0:06       ` Daniel Jacobowitz
2009-12-12  0:13       ` Cary Coutant
2009-12-13  3:48       ` Dodji Seketeli
2009-12-14 15:32       ` Dodji Seketeli
