dwarf name canonicalization

public inbox for archer@sourceware.org

* dwarf name canonicalization
From: Tom Tromey @ 2010-02-23 20:52 UTC
  To: Project Archer

Keith, do you happen to know offhand what things in gdb rely on
canonicalizing C++ names in the dwarf reader?

I was discussing this canonicalization with a user who prefers to remain
anonymous.  His experience is that this canonicalization greatly slows
down gdb startup (he said 15%), and in his experience isn't needed for
his use case, which is running gdb as part of an IDE.

I'm wondering whether it would make sense to somehow disable this, maybe
via some special mode for IDEs to use.  I thought maybe you'd know what
would break...

My understanding is that the typical IDE use cases are much more
restricted than what CLI users do.  E.g., in the IDE case, most
breakpoints are set by "file:line" and expression evaluation is not as
important.

Tom

* Re: dwarf name canonicalization
From: Keith Seitz @ 2010-02-23 22:07 UTC
  To: tromey; +Cc: Project Archer

On 02/23/2010 12:52 PM, Tom Tromey wrote:
> Keith, do you happen to know offhand what things in gdb rely on
> canonicalizing C++ names in the dwarf reader?

I have a pretty good idea. :-)

> I was discussing this canonicalization with a user who prefers to remain
> anonymous.  His experience is that this canonicalization greatly slows
> down gdb startup (he said 15%), and in his experience isn't needed for
> his use case, which is running gdb as part of an IDE.

[OT: I would love a test case. I *pleaded* for specific test cases.]

Anonymous obviously has evidence that his IDE can work around the 
problems of generic input which we must deal with in the console. I see 
no reason why anonymous shouldn't submit a patch to disable 
canonicalization (and related). He'll probably want to also disable 
dwarf2_physname and bring back DW_AT_MIPS_linkage_name (assuming that 
doesn't disappear altogether).

> I'm wondering whether it would make sense to somehow disable this, maybe
> via some special mode for IDEs to use.  I thought maybe you'd know what
> would break...

Unless the IDE provided a console that accepted generic input (like 
"normal" gdb), I don't think that much would break, if anything. IDEs 
really rather rely on linespecs for the most part, no? As long as you're 
not sending input to gdb that looks like a function name, you should be 
safe. But I cannot guarantee. I have no first-hand experience with IDEs 
(in many years).
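Keith's rule of thumb, that an IDE is safe as long as it never sends gdb anything that looks like a function name, could be enforced on the IDE side. Here is a minimal C++ sketch that assumes the IDE only needs to recognise plain `FILE:LINE` locations. The helper `is_file_line_linespec` is hypothetical and far narrower than gdb's real linespec syntax:

```cpp
#include <cassert>   // for the usage checks shown at the bottom
#include <cctype>
#include <string>

// Hypothetical helper: true only for locations of the form
// "<file>:<decimal line>", e.g. "lib1.cpp:133". Anything else, such as
// "ns::inner::Foo1::sum", may need symbol lookup and hence canonical names.
bool is_file_line_linespec(const std::string &loc)
{
    // Split at the last ':' so the line number is what follows it.
    const std::string::size_type colon = loc.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == loc.size())
        return false;

    // A "::" here is the C++ scope operator, not a file/line separator.
    if (loc[colon - 1] == ':')
        return false;

    // Everything after the separator must be a decimal line number.
    for (std::string::size_type i = colon + 1; i < loc.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(loc[i])))
            return false;
    return true;
}

// Usage (e.g. inside main):
//   assert(is_file_line_linespec("lib1.cpp:133"));
//   assert(!is_file_line_linespec("ns::inner::Foo1::sum"));
```

Anything this check rejects would go through gdb's symbol lookup, which is where canonicalized names matter.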

I would much rather address (fix?) the speed problem first. The idea of 
multiple paths through the code for the "same" task would seem a high 
bit rot risk.

Keith

* Re: dwarf name canonicalization
From: André Pönitz @ 2010-02-25 11:16 UTC
  To: archer

[-- Attachment #1: Type: Text/Plain, Size: 4236 bytes --]

On Tuesday 23 February 2010 23:06:50 Keith Seitz wrote:
> [OT: I would love a test case. I *pleaded* for specific test cases.]

Yes, I remember that. Sorry. It was still on my TODO list; I just had not
found the time to create something that is easily reproducible without
too many external dependencies.

Looks like it is time to act now.

All my "real" use cases would require Qt which is probably not acceptable 
here, so let me have a shot at a contrived example that I'd consider
structurally not too far off from reality, a ~1000 function "project", 
structured like this:

----------------------- lib1.h --------------------
#ifndef LIB1_H
#define LIB1_H

#include <string>
#include <vector>

#include <map>

namespace ns {
namespace inner {

struct Foo1
{
    int foo0(std::map<std::string, std::vector<std::string> > &map,
             const std::string &index, const std::string &x);
    // [...]
    int foo25(std::map<std::string, std::vector<std::string> > &map,
              const std::string &index, const std::string &x);
    int sum();
};
[...]

----------------------- lib1.cpp --------------------
[...]
int Foo1::foo25(std::map<std::string, std::vector<std::string> > &map,
                const std::string &index, const std::string &x)
{
    return map[index].size() < x.size();
}

int Foo1::sum()
{
    int t = 0;
    std::map<std::string, std::vector<std::string> > m;
    m["key 0"].push_back("value 0");
    t += foo0(m, "key 1", "xxx");
    // [...]
    return t;
}

----------------------- main.cpp --------------------
#include "lib1.h"
[...]

using namespace ns::inner;

int main()
{
    int s = 0;
    s += Foo0().sum();
    s += Foo1().sum();
    // [...]
    return s;
}

I'll attach a Perl script that generates the code. Don't look at the actual code
too closely; it really does not matter. A quick test also indicates that neither
the number of files nor the number of functions makes a difference to the time ratio.
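The attached createit.pl is not reproduced in this archive. As a rough stand-in, here is a minimal C++ sketch that emits one header in the shape of lib1.h above. The function `make_header` and its parameters are hypothetical, and the sketch generates neither the .cpp bodies nor main.cpp:

```cpp
#include <cassert>   // for the usage check shown at the bottom
#include <sstream>
#include <string>

// Hypothetical stand-in for createit.pl: emits a header like lib1.h, with
// struct Foo<lib> holding member functions foo0 .. foo<methods-1> (all with
// the same map/string signature) plus sum(), inside ns::inner.
std::string make_header(int lib, int methods)
{
    const std::string guard = "LIB" + std::to_string(lib) + "_H";
    const std::string params =
        "(std::map<std::string, std::vector<std::string> > &map,\n"
        "             const std::string &index, const std::string &x);\n";

    std::ostringstream out;
    out << "#ifndef " << guard << "\n#define " << guard << "\n\n"
        << "#include <map>\n#include <string>\n#include <vector>\n\n"
        << "namespace ns {\nnamespace inner {\n\n"
        << "struct Foo" << lib << "\n{\n";
    for (int i = 0; i < methods; ++i)
        out << "    int foo" << i << params;
    out << "    int sum();\n};\n\n"
        << "} // namespace inner\n} // namespace ns\n\n"
        << "#endif // " << guard << "\n";
    return out.str();
}

// Usage: assert(make_header(1, 26).find("int foo25(") != std::string::npos);
```

For example, `make_header(1, 26)` yields a `Foo1` with `foo0` through `foo25` plus `sum()`, matching the excerpt. Calling it for several library numbers approximates the multi-file layout.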

With 7.0.90, gdb spends 15.48% of its instructions in dwarf2_canonicalize_name
and the functions it calls; with 7.0.1 it is only 0.04%.

Total instruction count is 429,137,527 vs 516,590,964.
Both versions of gdb are compiled with -O2 -g using gcc 4.4.1.
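The message does not say which total belongs to which release. Assuming the larger one is 7.0.90 (the release that spends 15.48% of its instructions in canonicalization), the figures can be related with a small C++ sketch:

```cpp
#include <cassert>   // for the usage checks shown at the bottom
#include <cstdint>

// Totals quoted above. Assumption: the larger count is gdb 7.0.90.
constexpr std::uint64_t kInsnsOld = 429137527; // assumed gdb 7.0.1
constexpr std::uint64_t kInsnsNew = 516590964; // assumed gdb 7.0.90
constexpr double kCanonShareNew = 0.1548;      // dwarf2_canonicalize_name + callees

static_assert(kInsnsNew > kInsnsOld, "assumed ordering of the two totals");

// Relative growth of the total instruction count (0.20 means +20%).
double total_growth()
{
    return static_cast<double>(kInsnsNew) / static_cast<double>(kInsnsOld) - 1.0;
}

// Absolute number of 7.0.90 instructions attributed to canonicalization.
double canon_instructions()
{
    return kCanonShareNew * static_cast<double>(kInsnsNew);
}

// Usage: assert(total_growth() > 0.20 && total_growth() < 0.21);
```

Under that assumption, 7.0.90 executes about 20% more instructions overall, and about 80 million of them go to canonicalization.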

I do understand that an instruction count need not mean much, but it is
fairly reproducible, and in this case it does correlate with wall-clock
times.

Note that the number will get _much_ worse when it comes to "modern"
C++ like code using template expressions or even 

> [...] Unless the IDE provided a console that accepted generic input (like 
> "normal" gdb), I don't think that much would break, if anything. IDEs 
> really rather rely on linespecs for the most part, no? As long as you're 
> not sending input to gdb that looks like a function name, you should be 
> safe. But I cannot guarantee. I have no first-hand experience with IDEs 
> (in many years).

From my point of view it is a safe assumption that most if not all IDE users
would prefer a 15% startup time gain over improved parsing of function
names, especially since they are very unlikely ever to use it anyway.

However, it looks like this does not even have to be an either-or. If
canonicalization were made optional using, say, some 'maint set'
switch, a user could make his own choice, and an IDE could even apply
some "cleverness", like switching canonicalization off at startup
and reloading with canonicalization as soon as the user triggers an operation
that needs it. Or it could even retrieve a list of uncanonicalized
symbols and match user input against that before bothering gdb with it.
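A small C++17 sketch can illustrate the idea of deferring canonicalization until something needs it. Everything here is hypothetical. `toy_canonicalize` is a toy whitespace normalizer, not gdb's dwarf2_canonicalize_name, which parses full C++ names. `LazySymbolTable` is not a gdb data structure. The sketch shows two ideas from the paragraph above: matching user input against raw names first, and canonicalizing a stored name only the first time a canonical lookup reaches it.

```cpp
#include <cassert>   // for the usage checks shown at the bottom
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// True for characters that can appear inside a C++ identifier.
static bool is_ident_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Toy canonicalizer (NOT gdb's algorithm): drops whitespace except a single
// space between two identifier characters ("int const") and writes every
// comma as ", ". It ignores almost all of the C++ grammar.
std::string toy_canonicalize(const std::string &name)
{
    std::string out;
    bool pending_space = false;
    for (char c : name) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            pending_space = true;
            continue;
        }
        if (pending_space && !out.empty() && is_ident_char(out.back())
            && is_ident_char(c))
            out += ' ';
        pending_space = false;
        out += c;
        if (c == ',')
            out += ' ';
    }
    return out;
}

// Names are stored as read from the debug info; each one is canonicalized
// at most once, and only when a canonical lookup first needs it.
class LazySymbolTable
{
public:
    void add(std::string raw) { entries_.push_back(Entry{std::move(raw), std::nullopt}); }

    // Cheap path: exact match against raw, uncanonicalized names.
    bool contains_raw(const std::string &query) const
    {
        for (const Entry &e : entries_)
            if (e.raw == query)
                return true;
        return false;
    }

    // Slow path: compare canonical forms, computing and caching them lazily.
    bool contains(const std::string &query)
    {
        const std::string want = toy_canonicalize(query);
        for (Entry &e : entries_) {
            if (!e.canonical) {
                e.canonical = toy_canonicalize(e.raw);
                ++canonicalized_;
            }
            if (*e.canonical == want)
                return true;
        }
        return false;
    }

    // How many stored names have been canonicalized so far.
    std::size_t canonicalized_count() const { return canonicalized_; }

private:
    struct Entry
    {
        std::string raw;
        std::optional<std::string> canonical;
    };
    std::vector<Entry> entries_;
    std::size_t canonicalized_ = 0;
};

// Usage: assert(toy_canonicalize("foo(int const &)") == "foo(int const&)");
```

As long as lookups use only the raw path (or no name lookups happen at all, as in a "file:line"-only IDE session), `canonicalized_count()` stays at zero and no canonicalization cost is paid.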

> I would much rather address (fix?) the speed problem first. The idea of 
> multiple paths through the code for the "same" task would seem a high 
> bit rot risk.

I am not sure this will solve the problem. Even if you were able to speed up
canonicalization by, say, 30%, it would still add roughly 10% to startup time,
unconditionally, whether or not the result is ever needed. And 10% is
highly visible when the total time is in the "several dozen seconds" range.
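The rough 10% can be checked against the measurement above. Assume that canonicalization accounts for a fraction f = 0.1548 of startup instructions and that a speedup cuts only that part by 30%. The remaining cost is then:

```latex
\text{share of old total} = 0.7\,f = 0.7 \times 0.1548 \approx 0.108
\qquad
\text{share of new total} = \frac{0.7\,f}{1 - 0.3\,f}
  = \frac{0.10836}{0.95356} \approx 0.114
```

Either way the remaining cost is around 11%, in line with the rough estimate.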

Andre'

[-- Attachment #2: createit.pl --]
[-- Type: application/x-perl, Size: 2478 bytes --]

* Re: dwarf name canonicalization
From: Tom Tromey @ 2010-02-25 19:38 UTC
  To: André Pönitz; +Cc: archer

>>>>> "André" == André Pönitz <andre.poenitz@nokia.com> writes:

André> All my "real" use cases would require Qt which is probably not
André> acceptable here

I wanted to reply to this quickly, so I haven't actually read the rest
of your note yet :)

Anything in Fedora is fine as a test case for this kind of thing.  It is
simple for us to install the needed packages and debuginfo.

Tom
