From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-1926-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 5980 invoked by alias); 25 Feb 2010 11:16:35 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 5950 invoked by uid 22791); 25 Feb 2010 11:16:33 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0
	tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
From: =?iso-8859-1?q?Andr=E9_P=F6nitz?= <andre.poenitz@nokia.com>
To: archer@sourceware.org
Subject: Re: dwarf name canonicalization
Date: Thu, 25 Feb 2010 11:16:00 -0000
User-Agent: KMail/1.12.2 (Linux/2.6.31-19-generic; KDE/4.3.2; i686; ; )
References: <m3zl2zy8sa.fsf@fleche.redhat.com> <4B84517A.4020307@redhat.com>
In-Reply-To: <4B84517A.4020307@redhat.com>
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_AwlhL6kUXR3dTiG"
Message-Id: <201002251216.16565.andre.poenitz@nokia.com>
X-Nokia-AV: Clean
X-SW-Source: 2010-q1/txt/msg00086.txt.bz2

--Boundary-00=_AwlhL6kUXR3dTiG
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-length: 4181

On Tuesday 23 February 2010 23:06:50 Keith Seitz wrote:
> [OT: I would love a test case. I *pleaded* for specific test cases.]

Yes, I remember that. Sorry. I still had it on my TODO list, but had not
found the time to create something that is easily reproducible without
too many external dependencies.

Looks like it is time to act now.

All my "real" use cases would require Qt, which is probably not acceptable
here, so let me have a shot at a contrived example that I'd consider
structurally not too far off from reality: a ~1000 function "project",
structured like this:

----------------------- lib1.h --------------------
#ifndef LIB1_H
#define LIB1_H

#include <string>
#include <vector>

#include <map>

namespace ns {
namespace inner {

struct Foo1
{
    int foo0(std::map<std::string, std::vector<std::string> > &map,
         const std::string &index, const std::string &x);
    // [...]
    int foo25(std::map<std::string, std::vector<std::string> > &map,
         const std::string &index, const std::string &x);
    int sum();
};
[...]

----------------------- lib1.cpp --------------------
[...]
int Foo1::foo25(std::map<std::string, std::vector<std::string> > &map,
 const std::string &index, const std::string &x)
{
        return map[index].size() < x.size();
}

int Foo1::sum()
{
        int t = 0;
        std::map<std::string, std::vector<std::string> > m;
        m["key 0"].push_back("value 0");
        t += foo0(m, "key 1", "xxx");
        [...]
        return t;
}

----------------------- main.cpp --------------------
#include "lib1.h"
[...]

using namespace ns::inner;

int main()
{
        int s = 0;
        s += Foo0().sum();
        s += Foo1().sum();
        [...]
        return s;
}


I'll attach a Perl script that generates the code. Don't look at the actual
code too closely; it really does not matter. A quick test also indicates
that neither the number of files nor the number of functions makes a
difference for the time ratio.

With 7.0.90 gdb spends 15.48% of its instructions in dwarf2_canonicalize_name
and functions called from there; with 7.0.1 it is only 0.04%.

Total instruction count is 429,137,527 vs 516,590,964.
Both versions of gdb are compiled with -O2 -g  using gcc 4.4.1.

I certainly do understand that instruction count does not need to mean
much, but it is fairly reproducible, and in this case it does correlate
with wall clock times.

Note that the number will get _much_ worse when it comes to "modern"
C++ like code using template expressions or even

> [...] Unless the IDE provided a console that accepted generic input (like
> "normal" gdb), I don't think that much would break, if anything. IDEs
> really rather rely on linespecs for the most part, no? As long as you're
> not sending input to gdb that looks like a function name, you should be
> safe. But I cannot guarantee. I have no first-hand experience with IDEs
> (in many years).

In my view, it is a safe assumption that most if not all IDE users would
prefer a 15% startup time gain over improved parsing of function names,
especially since they are very unlikely to ever use it anyway.

However, it looks like it does not even have to be an either-or here. If
canonicalization were made optional using, say, some 'maint set' switch,
users could make their own choice, and an IDE could even apply some
"cleverness", like switching canonicalization off in the beginning and
reloading with canonicalization as soon as the user triggers an operation
that needs it. Or maybe even retrieve a list of uncanonicalized symbols
and match user input against that before bothering gdb with it.
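
As a rough illustration of that last idea, here is a minimal C++ sketch of
how a frontend might match user input against a list of raw symbol names.
Everything in it is hypothetical: the names stripWhitespace and
RawSymbolIndex are made up for this example, and dropping whitespace is only
a crude stand-in for real canonicalization (it would not, for instance,
equate "unsigned long" with "long unsigned int").

```cpp
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: remove all whitespace, so that spellings such as
// "vector<string> >" and "vector<string>>" compare equal.
// This is NOT what gdb's canonicalization does; it is a crude approximation.
std::string stripWhitespace(const std::string &name)
{
    std::string out;
    for (unsigned char c : name)
        if (!std::isspace(c))
            out += static_cast<char>(c);
    return out;
}

// Hypothetical index mapping the whitespace-free spelling of each symbol
// to the raw (uncanonicalized) name it was built from.
class RawSymbolIndex
{
public:
    explicit RawSymbolIndex(const std::vector<std::string> &rawNames)
    {
        for (const std::string &raw : rawNames)
            m_index.emplace(stripWhitespace(raw), raw);
    }

    // Returns the raw spelling matching the user's input,
    // or an empty string if nothing matches.
    std::string lookup(const std::string &userInput) const
    {
        auto it = m_index.find(stripWhitespace(userInput));
        return it == m_index.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<std::string, std::string> m_index;
};
```

With such an index, input like "ns::inner::Foo1::sum ()" would resolve to
the raw spelling "ns::inner::Foo1::sum()" on the frontend side, and only
input that fails to match would need the canonicalizing path.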

> I would much rather address (fix?) the speed problem first. The idea of
> multiple paths through the code for the "same" task would seem a high
> bit rot risk.

I am not sure this will solve the problem. Even if you were able to speed up
canonicalization by, say, 30%, it would still impact startup times by 10%,
unconditionally, no matter whether the result is ever needed. And 10% is
highly visible when the total time is in the "several dozen seconds" range.
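
The arithmetic behind that estimate can be sketched as follows, assuming (as
a simplification) that the instruction share translates directly into
startup time. The function names are made up for this example.

```cpp
// Share of the original total that canonicalization still costs after its
// own cost has been reduced by `speedup` (both given as fractions, 0..1).
double residualShare(double share, double speedup)
{
    return share * (1.0 - speedup);
}

// Difference between two instruction totals, relative to the larger one.
double relativeDifference(double larger, double smaller)
{
    return (larger - smaller) / larger;
}
```

residualShare(0.1548, 0.30) comes out at roughly 0.108, which is the "10%"
mentioned above. relativeDifference(516590964.0, 429137527.0) comes out at
roughly 0.169, in the same ballpark as the measured 15.48% share.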

Andre'

--Boundary-00=_AwlhL6kUXR3dTiG
Content-Type: application/x-perl;
  name="createit.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="createit.pl"
Content-length: 2478

#!/usr/bin/perl -w

$N = 40; # files
$M = 25; # functions per file

$S = "std::string";
$H = "std::map<std::string, std::vector<std::string> >";

foreach $i (0..$N) {
    open LIBH, '>', "lib$i.h";
    print LIBH "#ifndef LIB$i"."_H\n";
    print LIBH "#define LIB$i"."_H\n\n";
    print LIBH "#include <string>\n";
    print LIBH "#include <vector>\n\n";
    print LIBH "#include <map>\n\n";
    print LIBH "namespace ns {\n";
    print LIBH "namespace inner {\n\n";
    print LIBH "struct Foo$i\n{\n";
    print LIBH "\tint foo$_($H &map, const $S &index, const $S &x);\n" foreach 0..$M;
    print LIBH "\tint sum();\n";
    print LIBH "};\n\n}\n}\n";
    print LIBH "#endif\n";
    close LIBH;

    open LIBC, '>', "lib$i.cpp";
    print LIBC "#include \"lib$i.h\"\n\n";
    print LIBC "namespace ns {\n";
    print LIBC "namespace inner {\n\n";
    print LIBC "int Foo$i"."::foo$_"."($H &map,\n const $S &index, const $S &x)\n"
        ."{\n\treturn map[index].size() < x.size();\n}\n\n" foreach 0..$M;
    print LIBC "int Foo$i"."::sum()\n{\n";
    print LIBC "\tint t = 0;\n";
    print LIBC "\t$H m;\n";
    print LIBC "\tm[\"key 0\"].push_back(\"value 0\");\n";
    print LIBC "\tt += foo$_"."(m, \"key 1\", \"xxx\");\n"  foreach 0..$M;
    print LIBC "\treturn t;\n}\n";
    print LIBC "\n\n}\n}\n";
    close LIBC;
}

open MAIN, '>', 'main.cpp';
print MAIN "#include \"lib$_.h\"\n" foreach 0..$N;
print MAIN "\nusing namespace ns::inner;\n";
print MAIN "\nint main()\n{\tint s = 0;\n";
print MAIN "\ts += Foo$_"."().sum();\n" foreach 0..$N;
print MAIN "\treturn s;\n}\n";
close MAIN;

foreach $i (0..$N) {
    print "Compiling lib$i.cpp...\n";
    system "g++ -c -g -o lib$i.o lib$i.cpp";
    system "g++ -shared lib$i.o -o lib$i.so";
}

print "Compiling main.cpp...\n";
system "g++ -c -g -o main.o main.cpp";

print "Linking..\n";
system "g++ -o main main.o *.so";

print "Running gdb under valgrind...\n";
system "LD_LIBRARY_PATH=`pwd` valgrind --tool=callgrind "
        . " ~/debugger/gdb-7.0.90/gdb/gdb"
        . " -ex 'set confirm off' -ex 'b main' -ex run -ex quit ./main"
        . " > out-7.0.90.txt";
# Second run: gdb 7.0.1 (path assumed to mirror the 7.0.90 layout above).
system "LD_LIBRARY_PATH=`pwd` valgrind --tool=callgrind "
        . " ~/debugger/gdb-7.0.1/gdb/gdb"
        . " -ex 'set confirm off' -ex 'b main' -ex run -ex quit ./main"
        . " > out-7.0.1.txt";


# Results:
# With 7.0.90:  15.48% in dwarf2_canonicalize_name and lower
# With 7.0.1:    0.04% in dwarf2_canonicalize_name and lower
#
# Total instruction count is 429,137,527 vs 516,590,964 insns





--Boundary-00=_AwlhL6kUXR3dTiG--