From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (qmail 5980 invoked by alias); 25 Feb 2010 11:16:35 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: 
Precedence: bulk
List-Post: 
List-Help: 
List-Subscribe: 
List-Id: 
Received: (qmail 5950 invoked by uid 22791); 25 Feb 2010 11:16:33 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0 tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
From: André Pönitz
To: archer@sourceware.org
Subject: Re: dwarf name canonicalization
Date: Thu, 25 Feb 2010 11:16:00 -0000
User-Agent: KMail/1.12.2 (Linux/2.6.31-19-generic; KDE/4.3.2; i686; ; )
References: <4B84517A.4020307@redhat.com>
In-Reply-To: <4B84517A.4020307@redhat.com>
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="Boundary-00=_AwlhL6kUXR3dTiG"
Message-Id: <201002251216.16565.andre.poenitz@nokia.com>
X-Nokia-AV: Clean
X-SW-Source: 2010-q1/txt/msg00086.txt.bz2

--Boundary-00=_AwlhL6kUXR3dTiG
Content-Type: Text/Plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-length: 4181

On Tuesday 23 February 2010 23:06:50 Keith Seitz wrote:
> [OT: I would love a test case. I *pleaded* for specific test cases.]

Yes, I remember that. Sorry. It was still on my TODO list; I just had not
found the time to create something that is easily reproducible without too
many external dependencies.

Looks like it is time to act now.

All my "real" use cases would require Qt, which is probably not acceptable
here, so let me have a shot at a contrived example that I'd consider
structurally not too far off from reality: a ~1000 function "project",
structured like this:

----------------------- lib1.h --------------------
#ifndef LIB1_H
#define LIB1_H

#include 
#include 

#include 

namespace ns {
namespace inner {

struct Foo1
{
    int foo0(std::map > &map,
        const std::string &index, const std::string &x);
    // [...]
    int foo25(std::map > &map,
        const std::string &index, const std::string &x);
    int sum();
};

[...]

----------------------- lib1.cpp --------------------
[...]

int Foo1::foo25(std::map > &map,
    const std::string &index, const std::string &x)
{
    return map[index].size() < x.size();
}

int Foo1::sum()
{
    int t = 0;
    std::map > m;
    m["key 0"].push_back("value 0");
    t += foo0(m, "key 1", "xxx");
    [...]
    return t;
}

----------------------- main.cpp --------------------
#include "lib1.h"
[...]

using namespace ns::inner;

int main()
{
    int s = 0;
    s += Foo0().sum();
    s += Foo1().sum();
    [...]
    return s;
}

I am attaching a Perl script that generates the code. Don't look at the
actual code too closely; it really does not matter. A quick test also
indicates that neither the number of files nor the number of functions makes
a difference to the time ratio.

With 7.0.90, gdb spends 15.48% of its instructions in dwarf2_canonicalize_name
and the functions called from there; with 7.0.1 it is only 0.04%.
Total instruction count is 429,137,527 vs 516,590,964. Both versions of gdb
are compiled with -O2 -g using gcc 4.4.1.

I do understand that instruction counts do not necessarily mean much, but
they are fairly reproducible, and in this case they do correlate with
wall-clock times. Note that the numbers will get _much_ worse when it comes
to "modern" C++ code using template expressions or even

> [...] Unless the IDE provided a console that accepted generic input (like
> "normal" gdb), I don't think that much would break, if anything. IDEs
> really rather rely on linespecs for the most part, no? As long as you're
> not sending input to gdb that looks like a function name, you should be
> safe. But I cannot guarantee. I have no first-hand experience with IDEs
> (in many years).
From my point of view it is a safe assumption that most, if not all, IDE
users would prefer a 15% startup time gain over improved parsing of function
names, especially since they are very unlikely to ever use it anyway.

However, it looks like it does not even have to be an either-or here. If
canonicalization were made optional, say via some 'maint set' switch, users
could make their own choice, and an IDE could even apply some "cleverness",
like switching canonicalization off at startup and reloading with
canonicalization as soon as the user triggers an operation that needs it.
Or maybe even retrieve a list of uncanonicalized symbols and match user
input against that before bothering gdb with it.

> I would much rather address (fix?) the speed problem first. The idea of
> multiple paths through the code for the "same" task would seem a high
> bit rot risk.

I am not sure this will solve the problem. Even if you managed to speed up
canonicalization by, say, 30%, it would still cost roughly 10% of startup
time (15.48% x 0.7 = 10.8%), unconditionally, whether or not the result is
ever needed. And 10% is highly visible when the total time is in the
"several dozen seconds" range.
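To make the "only pay for canonicalization when it is actually needed" idea concrete, here is a minimal C++ sketch. It is not gdb code and assumes nothing about gdb's internals: `LazySymbolIndex` and `toyCanonicalize` are hypothetical names, and the toy canonicalizer only normalizes whitespace, whereas the real dwarf2_canonicalize_name does much more. Exact matches against the raw names cost nothing extra; the whole table is canonicalized once, on the first query that the raw match cannot resolve.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Toy stand-in for an expensive canonicalizer (hypothetical; NOT gdb's
// dwarf2_canonicalize_name). It drops all whitespace except a single
// space between two identifier characters, so "f(int , char const *)"
// becomes "f(int,char const*)".
std::string toyCanonicalize(const std::string &name) {
  auto isIdent = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };
  std::string out;
  bool pendingSpace = false;
  for (char c : name) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      pendingSpace = true;  // remember the gap, decide when the next char arrives
      continue;
    }
    if (pendingSpace && !out.empty() && isIdent(out.back()) && isIdent(c))
      out += ' ';           // keep "char const", drop "int ," -> "int,"
    pendingSpace = false;
    out += c;
  }
  return out;
}

// Hypothetical symbol index that stores raw (uncanonicalized) names and
// only pays for canonicalization when a raw lookup fails.
class LazySymbolIndex {
public:
  void add(std::string raw) {
    raw_.insert(raw);
    if (built_)  // once the canonical set exists, keep it in sync
      canonical_.insert(canonicalize(raw));
  }

  bool lookup(const std::string &query) {
    if (raw_.count(query))  // cheap path: exact match on raw names
      return true;
    if (!built_) {          // first "hard" query: canonicalize everything once
      for (const std::string &raw : raw_)
        canonical_.insert(canonicalize(raw));
      built_ = true;
    }
    return canonical_.count(canonicalize(query)) != 0;
  }

  // Number of canonicalizer invocations so far (to show the cost is deferred).
  std::size_t canonicalizeCalls() const { return calls_; }

private:
  std::string canonicalize(const std::string &s) {
    ++calls_;
    return toyCanonicalize(s);
  }

  std::unordered_set<std::string> raw_;
  std::unordered_set<std::string> canonical_;
  bool built_ = false;
  std::size_t calls_ = 0;
};
```

For example, after `add("ns::inner::Foo1::foo0(int, char const*)")`, looking up that exact string never calls the canonicalizer, while `lookup("ns::inner::Foo1::foo0(int,char const *)")` triggers the one-time canonicalization pass and then succeeds. A session that only ever uses linespecs or exact names would never pay for it.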
Andre'

--Boundary-00=_AwlhL6kUXR3dTiG
Content-Type: application/x-perl; name="createit.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="createit.pl"
Content-length: 2478

#!/usr/bin/perl -w

$N = 40;  # files
$M = 25;  # functions per file

$S = "std::string";
$H = "std::map >";

foreach $i (0..$N) {
    open LIBH, '>', "lib$i.h";
    print LIBH "#ifndef LIB$i"."_H\n";
    print LIBH "#define LIB$i"."_H\n\n";
    print LIBH "#include \n";
    print LIBH "#include \n\n";
    print LIBH "#include \n\n";
    print LIBH "namespace ns {\n";
    print LIBH "namespace inner {\n\n";
    print LIBH "struct Foo$i\n{\n";
    print LIBH "\tint foo$_($H &map, const $S &index, const $S &x);\n"
        foreach 0..$M;
    print LIBH "\tint sum();\n";
    print LIBH "};\n\n}\n}\n";
    print LIBH "#endif\n";
    close LIBH;

    open LIBC, '>', "lib$i.cpp";
    print LIBC "#include \"lib$i.h\"\n\n";
    print LIBC "namespace ns {\n";
    print LIBC "namespace inner {\n\n";
    print LIBC "int Foo$i"."::foo$_"."($H &map,\n const $S &index, const $S &x)\n"
        . "{\n\treturn map[index].size() < x.size();\n}\n\n"
        foreach 0..$M;
    print LIBC "int Foo$i"."::sum()\n{\n";
    print LIBC "\tint t = 0;\n";
    print LIBC "\t$H m;\n";
    print LIBC "\tm[\"key 0\"].push_back(\"value 0\");\n";
    print LIBC "\tt += foo$_"."(m, \"key 1\", \"xxx\");\n" foreach 0..$M;
    print LIBC "\treturn t;\n}\n";
    print LIBC "\n\n}\n}\n";
    close LIBC;
}

open MAIN, '>', 'main.cpp';
print MAIN "#include \"lib$_.h\"\n" foreach 0..$N;
print MAIN "\nusing namespace ns::inner;\n";
print MAIN "\nint main()\n{\tint s = 0;\n";
print MAIN "\ts += Foo$_"."().sum();\n" foreach 0..$N;
print MAIN "\treturn s;\n}\n";
close MAIN;

foreach $i (0..$N) {
    print "Compiling lib$i.cpp...\n";
    # -fPIC is needed for shared objects on x86_64 (harmless on i686)
    system "g++ -c -g -fPIC -o lib$i.o lib$i.cpp";
    system "g++ -shared lib$i.o -o lib$i.so";
}

print "Compiling main.cpp...\n";
system "g++ -c -g -o main.o main.cpp";
print "Linking...\n";
system "g++ -o main main.o *.so";

# Profile both gdb versions under callgrind, loading the freshly built libs
print "Running valgrind...\n";
system "LD_LIBRARY_PATH=`pwd` valgrind --tool=callgrind "
    . " ~/debugger/gdb-7.0.90/gdb/gdb"
    . " -ex 'set confirm off' -ex 'b main' -ex run -ex quit ./main"
    . " > out-7.0.90.txt";
system "LD_LIBRARY_PATH=`pwd` valgrind --tool=callgrind "
    . " ~/debugger/gdb-7.0.1/gdb/gdb"
    . " -ex 'set confirm off' -ex 'b main' -ex run -ex quit ./main"
    . " > out-7.0.1.txt";

# With 7.0.90: 15.48% in dwarf2_canonicalize_name and lower
# With 7.0.1:   0.04% --- " ---
# Total instruction count is 429,137,527 vs 516,590,964 insns

--Boundary-00=_AwlhL6kUXR3dTiG--