From: Tom Tromey <tromey@redhat.com>
To: John Gilmore <gnu@toad.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>,        Pedro Alves <palves@redhat.com>, gdb@sourceware.org
Subject: Lazy CU expansion (Was: Will therefore GDB utilize C++ or not?)
References: <20120330161403.GA17891@host2.jankratochvil.net>	<87aa2rjkb8.fsf@fleche.redhat.com> <4F832D5B.9030308@redhat.com>	<20120409190519.GA524@host2.jankratochvil.net>	<4F833D29.4050102@redhat.com>	<20120416065456.GA30097@host2.jankratochvil.net>	<4F8ECB72.70708@redhat.com>	<20120418151553.GA16768@host2.jankratochvil.net>	<4F8EDD7B.2010602@redhat.com>	<20120418155354.GA17912@host2.jankratochvil.net>	<201204181748.q3IHm1cF002815@new.toad.com>	<87pqb4q2on.fsf@fleche.redhat.com>	<201204182309.q3IN9FcF019607@new.toad.com>
Date: Fri, 18 May 2012 18:51:00 -0000
In-Reply-To: <201204182309.q3IN9FcF019607@new.toad.com> (John Gilmore's	message of "Wed, 18 Apr 2012 16:09:15 -0700")
Message-ID: <87fwaxgw5q.fsf_-_@fleche.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.95 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2012-05/txt/msg00089.txt.bz2

>>>>> "John" == John Gilmore <gnu@toad.com> writes:

Some comments on your comments about lazy CU expansion.

John> The whole design of partial_symbols was that they're only needed when
John> the real symbols haven't been read in.  This is well documented.  In
John> fact the partial_symtab for a file can be (or used to be able to be)
John> thrown away when the real symtab is created, and many symbol-readers
John> never bothered to create partial_symbols.

I don't think it's been possible to discard a partial symtab for many
years now.

It doesn't seem very worthwhile to do, given that the bulk of the
memory is in the psymbols themselves, and these can never be deleted.
But maybe it would be worth trying.

John> Partial symtabs were only a
John> speed optimization to avoid parsing Stabs debugging info when host
John> machines ran at 20 megahertz.  You could probably get rid of them
John> entirely nowadays.

This seems unlikely to me, though because of memory use rather than CPU.

Partial symbols take a lot less memory than full symbols, partly because
they are smaller, but more importantly because they can be put into the
bcache (GDB's deduplicating cache, which stores each distinct object
only once), and this is quite effective in practice.

John> The GDB Internals manual (which I originated when I discovered that
John> there was no internals documentation) makes it clear that there are
John> only a few ways to look up a symbol.  Has that nice clean bit of
John> modular C programming been retained over the last decade?

No, the symbol tables are a total mess, and the internals manual is out
of date.

John> So how is this idea of pointing to psymbols going to save any
John> memory?

'struct symbol' starts with a 'general_symbol_info', and also includes
'domain' and 'aclass' fields -- all of which are duplicated in the
partial symbol.

So pointing to the partial symbol would save at least
sizeof(general_symbol_info) - sizeof(void*) bytes per symbol.  On x86-64
that is 32 bytes.  Tighter packing might save even more.

More importantly, this sort of thing would allow instantiation of a full
symtab without re-parsing the DWARF.  Re-parsing is slow, and also
mostly pointless, as most symbols in a given CU are never used.

John> And if you're going to have to allocate all the memory for the
John> struct symbol, then why not populate it with the real information
John> for the symbol, instead of just a psymbol pointer?

Reading the remaining information is slow and uses memory, but the
results are often not used.  So it would be preferable to fill in the
details on demand.

Skipping function bodies alone saves ~30% of the CU expansion time.

John> It's much simpler to read all the symbols in a symbol file, in
John> order, and once you're doing that anyway, you might as well save
John> them all.

Yes, it is simpler, and it is what is done now.  I don't think it scales
very well, though.  Jan has dug up some C++ libraries with one enormous
CU that sucks up a lot of time if you happen to have to expand it.

Tom> Full symbols are already reasonably C++y, what with the ops vector.

John> It looks to me like the "ops vector" in symbols in gdb-7.4 is pretty
John> minimal, only applying to a tiny number of symbol categories (and the
John> comments in findvar.c -- from 2004 -- report that DWARF2 symbols screw
John> up the ops vector anyway).  Large parts of GDB touch symbols; is the
John> idea that all of these will be rewritten to indirect through an
John> ops-table (either explicitly in C, or implicitly in C++) without ever
John> accessing fields (like SYMBOL_CLASS(sym)) directly?  Do you think this
John> will make GDB faster and smaller?  I don't.

I doubt it would be smaller.  History indicates this is of zero
importance.

It would probably be faster.  At least for lazy CU expansion, the
changes are of the form:

#define SYMBOL_TYPE(sym) \
  ((sym)->type ? (sym)->type : compute_symbol_type (sym))

... or the moral equivalent.

Rewriting is not necessary; you can just redefine the macros.  But
rewriting the uses would be better if we were moving to C++.  This is
easy, though.

John> (There's a comment in symtab.c from 2003 that says address classes and
John> ops vectors should be merged.  But clearly nobody has felt like doing
John> that work in the last 9 years -- probably because so many places in the
John> code would need to be touched.)

I'm not sure I trust that comment.  I find that in general, comments in
GDB relating to future maintenance issues are often questionable.

Tom