From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16451 invoked by alias); 18 May 2012 18:51:51 -0000
Received: (qmail 16442 invoked by uid 22791); 18 May 2012 18:51:49 -0000
X-SWARE-Spam-Status: No, hits=-5.8 required=5.0
 tests=AWL,BAYES_00,KAM_STOCKGEN,KHOP_RCVD_UNTRUST,RCVD_IN_DNSWL_HI,RCVD_IN_HOSTKARMA_W,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)
 by sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
 Fri, 18 May 2012 18:51:32 +0000
Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q4IIpVNA018036
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Fri, 18 May 2012 14:51:31 -0400
Received: from barimba (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1])
 by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q4IIpTlC027723
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO);
 Fri, 18 May 2012 14:51:30 -0400
From: Tom Tromey
To: John Gilmore
Cc: Jan Kratochvil, Pedro Alves, gdb@sourceware.org
Subject: Lazy CU expansion (Was: Will therefore GDB utilize C++ or not?)
References: <20120330161403.GA17891@host2.jankratochvil.net>
 <87aa2rjkb8.fsf@fleche.redhat.com>
 <4F832D5B.9030308@redhat.com>
 <20120409190519.GA524@host2.jankratochvil.net>
 <4F833D29.4050102@redhat.com>
 <20120416065456.GA30097@host2.jankratochvil.net>
 <4F8ECB72.70708@redhat.com>
 <20120418151553.GA16768@host2.jankratochvil.net>
 <4F8EDD7B.2010602@redhat.com>
 <20120418155354.GA17912@host2.jankratochvil.net>
 <201204181748.q3IHm1cF002815@new.toad.com>
 <87pqb4q2on.fsf@fleche.redhat.com>
 <201204182309.q3IN9FcF019607@new.toad.com>
Date: Fri, 18 May 2012 18:51:00 -0000
In-Reply-To: <201204182309.q3IN9FcF019607@new.toad.com> (John Gilmore's
 message of "Wed, 18 Apr 2012 16:09:15 -0700")
Message-ID: <87fwaxgw5q.fsf_-_@fleche.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.95 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-owner@sourceware.org
X-SW-Source: 2012-05/txt/msg00089.txt.bz2

>>>>> "John" == John Gilmore writes:

Some comments on your comments about lazy CU expansion.

John> The whole design of partial_symbols was that they're only needed when
John> the real symbols haven't been read in. This is well documented. In
John> fact the partial_symtab for a file can be (or used to be able to be)
John> thrown away when the real symtab is created, and many symbol-readers
John> never bothered to create partial_symbols.

I don't think it's been possible to discard a partial symtab for many
years now. It doesn't seem very worthwhile to do it, given that the
bulk of the memory is in the psymbols themselves, and these can't ever
be deleted. But, maybe it would be worth trying.

John> Partial symtabs were only a
John> speed optimization to avoid parsing Stabs debugging info when host
John> machines ran at 20 megahertz. You could probably get rid of them
John> entirely nowadays.

This seems unlikely to me; but due to memory use, not CPU. Partial
symbols take a lot less memory than full symbols, partly because they
are smaller, but more importantly because they can be put into the
bcache, and this is quite effective in practice.

John> The GDB Internals manual (which I originated when I discovered that
John> there was no internals documentation) makes it clear that there are
John> only a few ways to look up a symbol. Has that nice clean bit of
John> modular C programming has been retained over the last decade?

No, the symbol tables are a total mess, and the internals manual is out
of date.

John> So how is this idea of pointing to psymbols going to save any
John> memory?

'struct symbol' starts with a 'general_symbol_info', and also includes
'domain' and 'aclass' fields -- all of which are duplicated in the
partial symbol. So, pointing to the partial symbol will save at least
sizeof(general_symbol_info) - sizeof(void*) bytes per symbol. On x86-64
that is 32 bytes. Maybe it could save more memory with more packing.

More importantly, this sort of thing would allow instantiation of a full
symtab without re-parsing the DWARF. Re-parsing is slow, and also
mostly pointless, as most symbols in a given CU are never used.

John> And if you're going to have to allocate all the memory for the
John> struct symbol, then why not populate it with the real information
John> for the symbol, instead of just a psymbol pointer?

Reading the remaining information is slow and uses memory, but the
results are often not used. So it would be preferable to fill in the
details on demand. Just skipping function bodies alone saves ~30% of
the CU expansion time.

John> It's much simpler to read all the symbols in a symbol file, in
John> order, and once you're doing that anyway, you might as well save
John> them all.

Yes, it is simpler. This is what is done now. I think it doesn't scale
very well...
Jan has dug up some C++ libraries where there is one enormous CU
which sucks up a lot of time if you happen to have to expand it.

Tom> Full symbols are already reasonably C++y, what with the ops vector.

John> It looks to me like the "ops vector" in symbols in gdb-7.4 is pretty
John> minimal, only applying to a tiny number of symbol categories (and the
John> comments in findvar.c -- from 2004 -- report that DWARF2 symbols screw
John> up the ops vector anyway). Large parts of GDB touch symbols; is the
John> idea that all of these will be rewritten to indirect through an
John> ops-table (either explicitly in C, or implicitly in C++) without ever
John> accessing fields (like SYMBOL_CLASS(sym)) directly? Do you think this
John> will make GDB faster and smaller? I don't.

I doubt it would be smaller. History indicates this is of zero
importance.

It would probably be faster.

At least for lazy CU expansion, the changes are of the form:

    #define SYMBOL_TYPE(sym) \
      ((sym)->type ? (sym)->type : compute_symbol_type (sym))

... or moral equivalent.

Rewriting is not necessary, you can redefine the macros. But, rewriting
the uses would be better if we were moving to C++. This is easy though.

John> (There's a comment in symtab.c from 2003 that says address classes and
John> ops vectors should be merged. But clearly nobody has felt like doing
John> that work in the last 9 years -- probably because so many places in the
John> code would need to be touched.

I'm not sure I trust that comment. I find that in general, comments in
GDB relating to future maintenance issues are often questionable.

Tom
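
To make the macro redefinition described in the message concrete, here
is a minimal, self-contained sketch of a SYMBOL_TYPE accessor that
fills in a symbol's type on first use. Everything except the macro
itself is an assumption made for illustration: the two structs are
heavily simplified stand-ins rather than GDB's real definitions, the
body of compute_symbol_type is invented (the message only names the
function), and the expansions counter exists only to make the lazy
behaviour observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GDB's types.  */
struct type
{
  const char *name;
};

struct symbol
{
  const char *name;
  struct type *type;	/* NULL until the type is first requested.  */
};

/* Counts how often the slow path runs; for demonstration only.  */
int expansions;

/* Stand-in for the expensive step (in GDB this would mean reading
   the debug info for this one symbol).  The result is cached in SYM
   so the work is done at most once per symbol.  */
struct type *
compute_symbol_type (struct symbol *sym)
{
  static struct type int_type = { "int" };

  assert (sym != NULL);
  expansions++;
  sym->type = &int_type;
  return sym->type;
}

/* Same shape as the macro in the message: callers keep writing
   SYMBOL_TYPE (sym) and cannot tell whether the type was already
   cached or had to be computed.  */
#define SYMBOL_TYPE(sym) \
  ((sym)->type ? (sym)->type : compute_symbol_type (sym))
```

With this definition, a caller that evaluates SYMBOL_TYPE on the same
symbol twice triggers compute_symbol_type only on the first access; the
second access returns the cached pointer. Existing call sites need no
changes, which is the point made above about redefining the macros
rather than rewriting their uses.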