From: John Gilmore
To: Tom Tromey
cc: John Gilmore, Jan Kratochvil, Pedro Alves, gdb@sourceware.org
Subject: Re: Will therefore GDB utilize C++ or not?
Date: Wed, 18 Apr 2012 23:10:00 -0000
Message-Id: <201204182309.q3IN9FcF019607@new.toad.com>
In-reply-to: <87pqb4q2on.fsf@fleche.redhat.com>

> I didn't reply to some of your earlier notes since they were just
> clearly bogus -- straw men, weird characterizations and generalizations
> bordering on the insulting, etc -- but a couple people asked me
> privately about these.  So I guess I'll respond to this one after all.

I had no desire to insult anyone, and I apologize if you were offended.
I wish you'd retract the terms you used for my ideas (bogus, straw men,
weird, etc.).  All of my messages on this topic have been serious.  I do
realize that I'm a decade out of doing regular daily programming on GDB,
so I try to preface my characterizations with things like "If this is
still true in the GDB design, then ...".

> The thrust of the argument that I posted, which you have basically
> ignored in all its particulars, is (1) gdb is already quite close to
> C++ in practice, albeit a badly broken dialect, and (2) the constructs
> gdb implements are in fact difficult to use correctly in practice.
> The evidence for #2 is all in the gdb-patches archives, mostly along
> the lines of bug fixes for various purely avoidable problems.

Re (1), I could equally say that C is already quite close to C++, since
GDB is written in C and you say GDB is already quite close to C++.  But
that is not an argument for converting C to C++ (or GDB to C++); it's
just an observation.  Large modular programs use modularity techniques,
and some of those techniques are also used by C++.  Again, a nice
observation, but what does it have to do with converting GDB to C++?
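To be concrete about the kind of technique I mean: an ops-vector in C is
just a structure of function pointers that callers indirect through.
Here's a minimal sketch of the pattern; all the names and signatures are
invented for illustration (loosely in the shape of GDB's target vector),
not the real interfaces:

    /* An ops-vector: one struct of function pointers per "class" of
       target.  Hypothetical names; not GDB's actual interfaces.  */
    struct target_ops
    {
      const char *name;
      int (*attach) (int pid);
      int (*detach) (void);
      int (*read_memory) (unsigned long addr, void *buf, int len);
    };

    /* One concrete backend fills in the vector with its own functions.  */
    static int ptrace_attach (int pid) { /* ... */ return 0; }
    static int ptrace_detach (void) { /* ... */ return 0; }
    static int ptrace_read_memory (unsigned long addr, void *buf, int len)
    { /* ... */ return 0; }

    static struct target_ops ptrace_ops =
      { "ptrace", ptrace_attach, ptrace_detach, ptrace_read_memory };

    /* The rest of the program calls through the vector and never names
       a concrete backend -- the C equivalent of a virtual call.  */
    static int
    target_read_memory (struct target_ops *ops, unsigned long addr,
                        void *buf, int len)
    {
      return ops->read_memory (addr, buf, len);
    }

C++ spells the same indirection as a base class with virtual functions;
the discipline it imposes is identical either way, which is exactly why
"GDB already does this" is an observation rather than an argument for
conversion.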
The crux of your argument is (2): that GDB is hard to maintain.  And the
key differences in our opinions on that are these:

* Just because it's written in C++ doesn't make it maintainable.  My
experience so far is that large C programs tend to be more maintainable
than large C++ programs.  You may dismiss this as an insulting
generalization, but again, I can't comment on the actual maintainability
of the actual C++ version of GDB, because it doesn't exist.  All I have
is experience to guide me.  And below is my experience with the
maintainability of the C GDB.

* I introduced the first ops-vector into GDB (and co-designed and built
BFD, which made the second ops-vector).  But I was receiving many
patches on bug-gdb long before we had any ops-vectors and long before we
had any exceptions, and it was the rare patch that actually worked.  My
own early patches to GDB produced a variety of bugs, before I understood
it.  And touching anything in wait_for_inferior was guar-an-teeeeeed to
produce obscure bugs.  See, GDB was a large, complex program even in
1991.  And the average guy who makes a patch to a large program makes it
solve his particular problem, but usually breaks six other things in the
process.

Those patches that people submitted were great guides to intuiting "what
the user's actual problem is", which was often unclear from the bug
report.  But the GDB maintainer (in that case me) had to apply their
knowledge of the structure of GDB in order to *fix* the problem in a
correct way, once they understood what the problem was.  Most such
posted patches I would just read, then debug the real problem and patch
it my own way.  With a few people who seemed to have more insight, I
would point out the problems in their patch and ask them to revise it to
solve those problems.  Or I'd point out that, e.g., they'd patched the
code but not the documentation, and would they please also do that.  By
training them up that way, many of those people (like Fred Fish)
ultimately became valuable GDB contributors and were even hired as
valuable Cygnus employees.

But you can't afford to do that for more than about 10% of the patches.
The guy who wrote the patch has solved his own problem and usually
doesn't care to rewrite his patch, doesn't have the insight to fix it
up, or doesn't have the programming chops to do it well.  And it takes
more work to train somebody than to just do the work yourself, so you
have to ration your own time.  You just have to resolutely NOT APPLY the
other 90% of the patches, using them only as an informal guide to where
problems might be lurking.

So if there are tons of contributed patches in gdb-patches for "purely
avoidable problems" that are in the shipping version of GDB, then they
probably result from prior acts of the GDB maintainers, who
inappropriately applied patches from people who don't understand the
structure of GDB.  I would strongly doubt the integrity of those tons of
subsequent contributed patches too, unless they came from a small set of
people who have already made a bunch of GDB patches; they are again more
likely to tear up the guts of GDB and create four other problems than
they are to fix an existing problem.  Again I'm speaking in
generalities, but these generalities are grounded in years of being the
GDB maintainer and reading all the patches posted to bug-gdb and
gdb-patches.

In summary, I don't think the fact that the average guy can't patch GDB
is an artifact of its being written in C.  I think it's an artifact of
its being a large, complex program.  I don't think a rewrite, into any
language, is going to fix that.

> There are some threads on gdb-patches recently about lazy CU expansion
Also, I'm not sure what "lazy CU expansion" is.  I did find a bug (PR
12708, at http://sourceware.org/bugzilla/show_bug.cgi?id=12708) that
mentions "CU expansion", but it never says what that means.  Clearly
it's something to do with tab-expansion of mangled C++ types.  It seems
to have been a psymtab bug, since --readnow fixed it.  Aha!  The test
case for that bug mentions "Compilation Unit" expansion.  Of course --
it must be obvious (to somebody; web search engines did not turn up any
useful references).  This, it seems to me, means turning a psymtab into
a symtab.  Is that what we're talking about?

> Namely, we'd like to change symbol so that it
> can optionally point to the corresponding psymbol, but that means
> wackiness in the accessors; a problem easily and cleanly solved by
> subclassing and virtual methods.  Note that we already have one
> subclass of symbols, and there's some indication we'll need more
> regardless.

By "psymbol" I presume you mean partial_symbol.  And I'm working from
your email message (in which you read between the lines of some
gdb-patches threads); I haven't looked at the patches themselves, since
you weren't specific about which ones you're talking about.

The whole design of partial_symbols was that they're only needed when
the real symbols haven't been read in.  This is well documented.  In
fact, the partial_symtab for a file can be (or used to be able to be)
thrown away when the real symtab is created, and many symbol-readers
never bothered to create partial_symbols at all.  Partial symtabs were
only a speed optimization, to avoid parsing Stabs debugging info back
when host machines ran at 20 megahertz.  You could probably get rid of
them entirely nowadays.

Now, you could conceivably keep the partial_symtab (and hope that it
pages out, since you'll never access it), and at some later point throw
away the entire real symtab.  Then subsequent accesses to this range of
addresses or symbols would again start accessing the partial_symtab.
This would let you reclaim some memory, but would require a reread of
these symbols the next time execution stopped inside this file (or
someone accessed a partial_symbol from it).  (You'd also have to do a
sweep of all other symbol pointers, e.g. for saved $37 values, to make
sure that none of them point into the symtab you're trying to throw
away.)

But I think the idea in these patches is something much more granular
than that, like throwing away 95% of the symbols in a file while
retaining 5%?  Or only reading in 5%, and then going back to reread
other symbols into the same symtab later?  Is it really faster and
cheaper to throw away and reread symbols than it is to let the operating
system's virtual memory page them in and out?  The beauty of virtual
memory is that it does caching without requiring any programming
changes.  (Yes, you can optimize virtual-memory-based programs by
avoiding access patterns that run all over the memory space.  That was
part of the idea of keeping the symbols in obstacks, where they're all
close to each other in memory.  If you want to throw 95% of them away,
what kind of memory allocator do you plan to use to preserve the
locality of reference of the remaining 5%?)

The GDB Internals manual (which I originated when I discovered that
there was no internals documentation) makes it clear that there are only
a few ways to look up a symbol.  Has that nice clean bit of modular C
programming been retained over the last decade?  If so, then you could
change the implementation of symbol tables in just those few places, so
that the lookup code would handle symbols that pointed to
partial_symbols.  And as long as the full symbol got created before
those few lookup functions returned -- before some other bit of code
could go poking around in the guts of the symbol -- the rest of GDB
wouldn't need to change.  (That modularity is how psymtabs work, too;
most of GDB never needs to care about them, and none of it needs to
depend upon their existence.)
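To make that concrete, here is the shape of the change I'm describing --
every type, field, and function name below is invented for illustration,
not taken from the actual sources:

    #include <stddef.h>

    /* Hypothetical sketch: psymtab expansion confined to the lookup
       boundary.  All names here are invented, not GDB's real ones.  */
    struct symbol;
    struct symtab;

    struct partial_symtab
    {
      struct symtab *expanded;  /* NULL until the full symtab is read.  */
      /* ... file name, text address range, the partial symbols ...  */
    };

    /* Invented stand-ins for the real symbol readers.  */
    extern struct symtab *read_full_symtab (struct partial_symtab *ps);
    extern struct symbol *search_symtab (struct symtab *st,
                                         const char *name);

    /* One of the "few ways to look up a symbol".  Expansion happens
       here, before we return, so callers only ever see full symbols
       and never learn that psymtabs exist.  */
    struct symbol *
    lookup_symbol_in_psymtab (struct partial_symtab *ps, const char *name)
    {
      if (ps->expanded == NULL)
        ps->expanded = read_full_symtab (ps);
      return search_symtab (ps->expanded, name);
    }

As long as every lookup path funnels through entry points like that one,
no code downstream of a lookup can ever observe a symbol that hasn't
been fully read in.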
But if you're going to have a struct symbol that other things can point
to, but which internally contains only a psymbol pointer, you'll still
need to allocate the space for the whole struct symbol, because when
someone touches it and you need to read the real symbol in, you can't
change the address of the struct symbol.  (If you did change its
address, you'd have to find all the things that point to it and revise
those pointers to point to the new one.)  So how is this idea of
pointing to psymbols going to save any memory?  And if you're going to
have to allocate all the memory for the struct symbol anyway, why not
populate it with the real information for the symbol, instead of just a
psymbol pointer?

Or else are you talking about reading in only a few symbols from a
symbol file when a lookup indicates that you need to expand a psymtab
into a symtab?  You could try that, but it's probably a hive of bugs,
because you'd have to recurse to find all the symbols that the
symbol-of-interest needs.  It's much simpler to read all the symbols in
a symbol file in order, and once you're doing that anyway, you might as
well save them all.

(You could try a super-cheap optimization and merely stop reading and
creating symbols once you've defined the particular symbol you are
looking for.  This wouldn't require symbols-that-point-to-psymbols; it
would instead require lookups to look in both the psymtab and the
symtab, even when there's a symtab.  That would also involve removing
duplicates, now that we'd look at both, or some fancy footwork to look
only in the PART of the psymtab that hadn't already been read into a
symtab.  Subsequent efforts to expand the psymtab to find some different
symbol would resume reading from where the previous
symbol-reading-and-creation left off.  This would probably be a much
easier change than making a symtab entry able to point to a psymtab
entry.  A sketch of what I mean appears just below.)
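Again, everything here is invented for illustration -- the cursor is
just whatever state the debug-info reader would need to pick up where it
left off:

    #include <stddef.h>

    /* Hypothetical sketch of "stop after the symbol you wanted, resume
       later".  All names are invented.  */
    struct symbol;

    struct partial_symtab
    {
      long read_cursor;   /* where symbol reading last stopped */
      int fully_read;     /* nonzero once the whole unit is read */
      /* ... */
    };

    /* Invented stand-in: read and intern the next symbol from the
       debug info, advancing *cursor; returns NULL at end of unit.  */
    extern struct symbol *read_next_symbol (struct partial_symtab *ps,
                                            long *cursor);
    extern int symbol_matches (struct symbol *sym, const char *name);

    struct symbol *
    expand_psymtab_until (struct partial_symtab *ps, const char *name)
    {
      struct symbol *sym;

      while ((sym = read_next_symbol (ps, &ps->read_cursor)) != NULL)
        if (symbol_matches (sym, name))
          return sym;        /* Stop here; a later miss resumes above.  */

      ps->fully_read = 1;    /* Ran off the end; the symtab is complete.  */
      return NULL;
    }

The fancy footwork lives in the callers: a lookup must first search the
symbols already read, and only fall back on resuming the read after a
miss, so nothing gets defined twice.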
> Full symbols are already reasonably C++y, what with the ops vector.

It looks to me like the "ops vector" in symbols in gdb-7.4 is pretty
minimal, applying to only a tiny number of symbol categories (and the
comments in findvar.c -- from 2004 -- report that DWARF2 symbols screw
up the ops vector anyway).  Large parts of GDB touch symbols; is the
idea that all of them will be rewritten to indirect through an ops-table
(either explicitly in C, or implicitly in C++) without ever accessing
fields (like SYMBOL_CLASS (sym)) directly?  Do you think this will make
GDB faster and smaller?  I don't.

(There's a comment in symtab.c from 2003 that says address classes and
ops vectors should be merged.  But clearly nobody has felt like doing
that work in the last 9 years -- probably because so many places in the
code would need to be touched.  You can't use C++ to overload those
symbols without touching all the same places that nobody wanted to touch
for a decade.)

In conclusion, why does making this change require (or encourage) C++?
The idea seems to require changing a ton of working code, whether it's
ultimately done in C or in C++, which makes it an unappetizing change in
any language.  It would take a powerful heap of benefit to make it worth
doing that.

    John