Re: dwarf name canonicalization

From: "André Pönitz" <andre.poenitz@nokia.com>
To: archer@sourceware.org
Subject: Re: dwarf name canonicalization
Date: Thu, 25 Feb 2010 11:16:00 -0000
Message-ID: <201002251216.16565.andre.poenitz@nokia.com>
In-Reply-To: <4B84517A.4020307@redhat.com>

[-- Attachment #1: Type: Text/Plain, Size: 4236 bytes --]

On Tuesday 23 February 2010 23:06:50 Keith Seitz wrote:
> [OT: I would love a test case. I *pleaded* for specific test cases.]

Yes, I remember that. Sorry. I still had it on my TODO list, but had not
found the time to create something that is easily reproducible without
too many external dependencies.

Looks like it is time to act now.

All my "real" use cases would require Qt, which is probably not acceptable
here, so let me have a shot at a contrived example that I'd consider
structurally not too far off from reality: a ~1000 function "project",
structured like this:

----------------------- lib1.h --------------------
#ifndef LIB1_H
#define LIB1_H

#include <string>
#include <vector>

#include <map>

namespace ns {
namespace inner {

struct Foo1
{
    int foo0(std::map<std::string, std::vector<std::string> > &map,
             const std::string &index, const std::string &x);
    // [...]
    int foo25(std::map<std::string, std::vector<std::string> > &map,
              const std::string &index, const std::string &x);
    int sum();
};
[...]

----------------------- lib1.cpp --------------------
[...]
int Foo1::foo25(std::map<std::string, std::vector<std::string> > &map,
 const std::string &index, const std::string &x)
{
        return map[index].size() < x.size();
}

int Foo1::sum()
{
        int t = 0;
        std::map<std::string, std::vector<std::string> > m;
        m["key 0"].push_back("value 0");
        t += foo0(m, "key 1", "xxx");
        [...]
        return t;
}

----------------------- main.cpp --------------------
#include "lib1.h"
[...]

using namespace ns::inner;

int main()
{
        int s = 0;
        s += Foo0().sum();
        s += Foo1().sum();
        [...]
        return s;
}

I'll attach a perl script generating the code. Don't look at the actual code
too closely, it really does not matter. A quick test also indicates that neither
the number of files nor the number of functions makes a difference for the
time ratio.
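
For readers without the attachment, the shape of the generated headers can
be sketched roughly as follows. This is not the attached perl script, only
an illustration in C++; make_header and its parameters are made-up names,
and the closing namespace and include-guard lines are my guess at what the
elided parts of lib1.h contain.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of createit.pl): returns the text of
// lib<lib>.h declaring foo0 .. foo<functions-1> plus sum(), following
// the layout of the lib1.h excerpt above.
std::string make_header(int lib, int functions)
{
    std::ostringstream out;
    out << "#ifndef LIB" << lib << "_H\n"
        << "#define LIB" << lib << "_H\n\n"
        << "#include <map>\n#include <string>\n#include <vector>\n\n"
        << "namespace ns {\nnamespace inner {\n\n"
        << "struct Foo" << lib << "\n{\n";
    for (int i = 0; i < functions; ++i) {
        // Each member takes the same long, template-heavy parameter list,
        // which is what makes the symbol names expensive to process.
        out << "    int foo" << i
            << "(std::map<std::string, std::vector<std::string> > &map,\n"
            << "         const std::string &index, const std::string &x);\n";
    }
    out << "    int sum();\n};\n\n"
        << "} // namespace inner\n} // namespace ns\n\n"
        << "#endif // LIB" << lib << "_H\n";
    return out.str();
}
```

Calling make_header(1, 26) would produce a header declaring foo0 through
foo25, matching the excerpt; the matching .cpp files and main.cpp would be
generated along the same lines.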

With 7.0.90, gdb spends 15.48% of its instructions in dwarf2_canonicalize_name
and functions called from there; with 7.0.1 it is only 0.04%.

Total instruction count is 429,137,527 vs 516,590,964.
Both versions of gdb are compiled with -O2 -g  using gcc 4.4.1.

I certainly do understand that instruction count does not need to mean
much, but it is fairly reproducible, and in this case it does indeed correlate
with wall clock times.

Note that the number will get _much_ worse when it comes to "modern"
C++ like code using template expressions or even 

> [...] Unless the IDE provided a console that accepted generic input (like 
> "normal" gdb), I don't think that much would break, if anything. IDEs 
> really rather rely on linespecs for the most part, no? As long as you're 
> not sending input to gdb that looks like a function name, you should be 
> safe. But I cannot guarantee. I have no first-hand experience with IDEs 
> (in many years).

From my point of view it is a safe assumption that most if not all IDE users
would prefer a 15% startup time gain over improved parsing of function
names, especially since they are very unlikely ever to use the latter anyway.

However, it looks like it does not even have to be an either-or here. If
canonicalization were made optional using, say, some 'maint set'
switch, a user could make his own choice, and an IDE could even apply
some "cleverness" like switching canonicalization off in the beginning
and reloading with canonicalization as soon as the user triggers an operation
that needs it. Or it could even retrieve a list of uncanonicalized
symbols and match user input against that before bothering gdb with it.
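
To illustrate the "pay only when needed" idea, here is a minimal sketch of a
symbol table that stores names as read and canonicalizes them only on the
first lookup. None of this is gdb code: LazySymbolTable and toy_canonicalize
are hypothetical names, and the toy canonicalizer merely drops a space in
front of '>' so that "> >" and ">>" compare equal.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a real canonicalizer (assumption, far simpler than
// what gdb does): removes any space directly followed by '>'.
std::string toy_canonicalize(const std::string &name)
{
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const bool space_before_closer =
            name[i] == ' ' && i + 1 < name.size() && name[i + 1] == '>';
        if (!space_before_closer)
            out += name[i];
    }
    return out;
}

class LazySymbolTable
{
public:
    // Record a symbol as read; nothing is canonicalized at load time
    // unless the index has already been built by an earlier lookup.
    void add(const std::string &raw_name)
    {
        raw_names_.push_back(raw_name);
        if (indexed_)
            index_symbol(raw_names_.size() - 1);
    }

    // The first lookup builds the canonical index; later lookups reuse it.
    // On success, stores the name as originally read in *raw_name.
    bool lookup(const std::string &user_input, std::string *raw_name)
    {
        if (!indexed_) {
            for (std::size_t i = 0; i < raw_names_.size(); ++i)
                index_symbol(i);
            indexed_ = true;
        }
        const auto it = by_canonical_.find(toy_canonicalize(user_input));
        if (it == by_canonical_.end())
            return false;
        *raw_name = raw_names_[it->second];
        return true;
    }

    // Number of stored symbols canonicalized so far (for illustration).
    std::size_t symbols_canonicalized() const { return symbols_canonicalized_; }

private:
    void index_symbol(std::size_t i)
    {
        by_canonical_[toy_canonicalize(raw_names_[i])] = i;
        ++symbols_canonicalized_;
    }

    std::vector<std::string> raw_names_;
    std::unordered_map<std::string, std::size_t> by_canonical_;
    bool indexed_ = false;
    std::size_t symbols_canonicalized_ = 0;
};
```

A session that never looks up a function name by text never pays for
canonicalization at all; one that does pays the full cost once, at the
first lookup rather than at startup.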

> I would much rather address (fix?) the speed problem first. The idea of 
> multiple paths through the code for the "same" task would seem a high 
> bit rot risk.

I am not sure this will solve the problem. Even if you were able to speed up
canonicalization by, say, 30%, it would still impact startup times by 10%,
unconditionally, no matter whether the result is ever needed. And 10% is
highly visible when the total time is in the "several dozen seconds" range.
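
The 10% figure is just the measured share scaled by the hypothetical
speedup. As a back-of-the-envelope sketch (remaining_share is a made-up
name; 15.48% is the instruction-count share quoted above and 30% is the
assumed speedup, and the result is relative to the original total):

```cpp
// Share of the original total still spent in canonicalization after
// canonicalization itself has been made `speedup` (0..1) cheaper.
double remaining_share(double share, double speedup)
{
    return share * (1.0 - speedup);
}

// remaining_share(0.1548, 0.30) is 0.1548 * 0.70, i.e. roughly 0.108,
// so a bit over 10% of the original cost remains.
```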

Andre'

[-- Attachment #2: createit.pl --]
[-- Type: application/x-perl, Size: 2478 bytes --]
