From: John Gilmore
To: Tom Tromey
cc: John Gilmore, Jan Kratochvil, Pedro Alves, gdb@sourceware.org
Subject: Re: Will therefore GDB utilize C++ or not?
Date: Wed, 18 Apr 2012 23:10:00 -0000
Message-Id: <201204182309.q3IN9FcF019607@new.toad.com>
In-reply-to: <87pqb4q2on.fsf@fleche.redhat.com>

> I didn't reply to some of your earlier notes since they were just
> clearly bogus -- straw men, weird characterizations and generalizations
> bordering on the insulting, etc -- but a couple people asked me
> privately about these.  So I guess I'll respond to this one after all.

I had no desire to insult anyone, and I apologize if you were offended.
I wish you'd retract the terms you used for my ideas (bogus, straw men,
weird, etc.).  All of my messages on this topic have been serious.  I do
realize that I'm a decade out of doing regular daily programming on GDB,
so I try to preface my characterizations with things like "If this is
still true in the GDB design, then ...".

> The thrust of the argument that I posted, which you have basically
> ignored in all its particulars, is (1) gdb is already quite close to
> C++ in practice, albeit a badly broken dialect, and (2) the constructs
> gdb implements are in fact difficult to use correctly in practice.
> The evidence for #2 is all in the gdb-patches archives, mostly along
> the lines of bug fixes for various purely avoidable problems.

Re (1), I could equally say that C is already quite close to C++, since
GDB is written in C and you say GDB is already quite close to C++.  But
that is not an argument for converting C to C++ (or GDB to C++); it's
just an observation.  Large modular programs use modularity techniques,
and some of those techniques are also used by C++.  Again, a nice
observation, but what does it have to do with converting GDB to C++?
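To be concrete about the kind of technique I mean: an ops-vector in C is
just a structure of function pointers that callers indirect through.
Here's a minimal sketch of the pattern; all the names and signatures are
invented for illustration (loosely in the shape of GDB's target vector),
not the real interfaces:

    /* An ops-vector: one struct of function pointers per "class" of
       target.  Hypothetical names; not GDB's actual interfaces.  */
    struct target_ops
    {
      const char *name;
      int (*attach) (int pid);
      int (*detach) (void);
      int (*read_memory) (unsigned long addr, void *buf, int len);
    };

    /* One concrete backend fills in the vector with its own functions.  */
    static int ptrace_attach (int pid) { /* ... */ return 0; }
    static int ptrace_detach (void) { /* ... */ return 0; }
    static int ptrace_read_memory (unsigned long addr, void *buf, int len)
    { /* ... */ return 0; }

    static struct target_ops ptrace_ops =
      { "ptrace", ptrace_attach, ptrace_detach, ptrace_read_memory };

    /* The rest of the program calls through the vector and never names
       a concrete backend -- the C equivalent of a virtual call.  */
    static int
    target_read_memory (struct target_ops *ops, unsigned long addr,
                        void *buf, int len)
    {
      return ops->read_memory (addr, buf, len);
    }

C++ spells the same indirection as a base class with virtual functions;
the discipline it imposes is identical either way, which is exactly why
"GDB already does this" is an observation rather than an argument for
conversion.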
The crux of your argument is (2): that GDB is hard to maintain.  And the
key differences in our opinions on that are these:

* Just because it's written in C++ doesn't make it maintainable.  My
experience so far is that large C programs tend to be more maintainable
than large C++ programs.  You may dismiss this as an insulting
generalization, but again, I can't comment on the actual maintainability
of the actual C++ version of GDB, because it doesn't exist.  All I have
is experience to guide me.  And below is my experience with the
maintainability of the C GDB.

* I introduced the first ops-vector into GDB (and co-designed and built
BFD, which made the second ops-vector).  But I was receiving many
patches on bug-gdb long before we had any ops-vectors and long before we
had any exceptions, and it was the rare patch that actually worked.  My
own early patches to GDB produced a variety of bugs, before I understood
it.  And touching anything in wait_for_inferior was guar-an-teeeeeed to
produce obscure bugs.  See, GDB was a large, complex program even in
1991.  And the average guy who makes a patch to a large program makes it
solve his particular problem, but usually breaks six other things in the
process.

Those patches that people submitted were great guides to intuiting "what
the user's actual problem is", which was often unclear from the bug
report.  But the GDB maintainer (in that case me) had to apply their
knowledge of the structure of GDB in order to *fix* the problem in a
correct way, once they understood what the problem was.  Most such
posted patches I would just read, then debug the real problem and patch
it my own way.  With a few people who seemed to have more insight, I
would point out the problems in their patch and ask them to revise it to
solve those problems.  Or I'd point out that, e.g., they'd patched the
code but not the documentation, and would they please also do that.  By
training them up that way, many of those people (like Fred Fish)
ultimately became valuable GDB contributors and were even hired as
valuable Cygnus employees.

But you can't afford to do that for more than about 10% of the patches.
The guy who wrote the patch has solved his own problem and usually
doesn't care to rewrite his patch, doesn't have the insight to fix it
up, or doesn't have the programming chops to do it well.  And it takes
more work to train somebody than to just do the work yourself, so you
have to ration your own time.  You just have to resolutely NOT APPLY the
other 90% of the patches, using them only as an informal guide to where
problems might be lurking.

So if there are tons of contributed patches in gdb-patches for "purely
avoidable problems" that are in the shipping version of GDB, then they
probably result from prior acts of the GDB maintainers, who
inappropriately applied patches from people who don't understand the
structure of GDB.  I would strongly doubt the integrity of those tons of
subsequent contributed patches too, unless they came from a small set of
people who have already made a bunch of GDB patches; they are again more
likely to tear up the guts of GDB and create four other problems than
they are to fix an existing problem.  Again I'm speaking in
generalities, but these generalities are grounded in years of being the
GDB maintainer and reading all the patches posted to bug-gdb and
gdb-patches.

In summary, I don't think the fact that the average guy can't patch GDB
is an artifact of its being written in C.  I think it's an artifact of
its being a large, complex program.  I don't think a rewrite, into any
language, is going to fix that.

> There are some threads on gdb-patches recently about lazy CU expansion
Also, I'm not sure what "lazy CU expansion" is.  I did find a bug (PR
12708, at http://sourceware.org/bugzilla/show_bug.cgi?id=12708) that
mentions "CU expansion", but it never says what that means.  Clearly
it's something to do with tab-expansion of mangled C++ types.  It seems
to have been a psymtab bug, since --readnow fixed it.  Aha!  The test
case for that bug mentions "Compilation Unit" expansion.  Of course --
it must be obvious (to somebody; web search engines did not turn up any
useful references).  This, it seems to me, means turning a psymtab into
a symtab.  Is that what we're talking about?

> Namely, we'd like to change symbol so that it
> can optionally point to the corresponding psymbol, but that means
> wackiness in the accessors; a problem easily and cleanly solved by
> subclassing and virtual methods.  Note that we already have one
> subclass of symbols, and there's some indication we'll need more
> regardless.

By "psymbol" I presume you mean partial_symbol.  And I'm working from
your email message (in which you read between the lines of some
gdb-patches threads); I haven't looked at the patches themselves, since
you weren't specific about which ones you're talking about.

The whole design of partial_symbols was that they're only needed when
the real symbols haven't been read in.  This is well documented.  In
fact, the partial_symtab for a file can be (or used to be able to be)
thrown away when the real symtab is created, and many symbol-readers
never bothered to create partial_symbols at all.  Partial symtabs were
only a speed optimization, to avoid parsing Stabs debugging info back
when host machines ran at 20 megahertz.  You could probably get rid of
them entirely nowadays.

Now, you could conceivably keep the partial_symtab (and hope that it
pages out, since you'll never access it), and at some later point throw
away the entire real symtab.  Then subsequent accesses to this range of
addresses or symbols would again start accessing the partial_symtab.
This would let you reclaim some memory, but would require a reread of
these symbols the next time execution stopped inside this file (or
someone accessed a partial_symbol from it).  (You'd also have to do a
sweep of all other symbol pointers, e.g. for saved $37 values, to make
sure that none of them point into the symtab you're trying to throw
away.)

But I think the idea in these patches is something much more granular
than that, like throwing away 95% of the symbols in a file while
retaining 5%?  Or only reading in 5%, and then going back to reread
other symbols into the same symtab later?  Is it really faster and
cheaper to throw away and reread symbols than it is to let the operating
system's virtual memory page them in and out?  The beauty of virtual
memory is that it does caching without requiring any programming
changes.  (Yes, you can optimize virtual-memory-based programs by
avoiding access patterns that run all over the memory space.  That was
part of the idea of keeping the symbols in obstacks, where they're all
close to each other in memory.  If you want to throw 95% of them away,
what kind of memory allocator do you plan to use to preserve the
locality of reference of the remaining 5%?)

The GDB Internals manual (which I originated when I discovered that
there was no internals documentation) makes it clear that there are only
a few ways to look up a symbol.  Has that nice clean bit of modular C
programming been retained over the last decade?  If so, then you could
change the implementation of symbol tables in just those few places, so
that the lookup code would handle symbols that pointed to
partial_symbols.  And as long as the full symbol got created before
those few lookup functions returned -- before some other bit of code
could go poking around in the guts of the symbol -- the rest of GDB
wouldn't need to change.  (That modularity is how psymtabs work, too;
most of GDB never needs to care about them, and none of it needs to
depend upon their existence.)
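To make that concrete, here is the shape of the change I'm describing --
every type, field, and function name below is invented for illustration,
not taken from the actual sources:

    #include <stddef.h>

    /* Hypothetical sketch: psymtab expansion confined to the lookup
       boundary.  All names here are invented, not GDB's real ones.  */
    struct symbol;
    struct symtab;

    struct partial_symtab
    {
      struct symtab *expanded;  /* NULL until the full symtab is read.  */
      /* ... file name, text address range, the partial symbols ...  */
    };

    /* Invented stand-ins for the real symbol readers.  */
    extern struct symtab *read_full_symtab (struct partial_symtab *ps);
    extern struct symbol *search_symtab (struct symtab *st,
                                         const char *name);

    /* One of the "few ways to look up a symbol".  Expansion happens
       here, before we return, so callers only ever see full symbols
       and never learn that psymtabs exist.  */
    struct symbol *
    lookup_symbol_in_psymtab (struct partial_symtab *ps, const char *name)
    {
      if (ps->expanded == NULL)
        ps->expanded = read_full_symtab (ps);
      return search_symtab (ps->expanded, name);
    }

As long as every lookup path funnels through entry points like that one,
no code downstream of a lookup can ever observe a symbol that hasn't
been fully read in.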
But if you're going to have a struct symbol that other things can point
to, but which internally contains only a psymbol pointer, you'll still
need to allocate the space for the whole struct symbol, because when
someone touches it and you need to read the real symbol in, you can't
change the address of the struct symbol.  (If you did change its
address, you'd have to find all the things that point to it and revise
those pointers to point to the new one.)  So how is this idea of
pointing to psymbols going to save any memory?  And if you're going to
have to allocate all the memory for the struct symbol anyway, why not
populate it with the real information for the symbol, instead of just a
psymbol pointer?

Or else are you talking about reading in only a few symbols from a
symbol file when a lookup indicates that you need to expand a psymtab
into a symtab?  You could try that, but it's probably a hive of bugs,
because you'd have to recurse to find all the symbols that the
symbol-of-interest needs.  It's much simpler to read all the symbols in
a symbol file in order, and once you're doing that anyway, you might as
well save them all.

(You could try a super-cheap optimization and merely stop reading and
creating symbols once you've defined the particular symbol you are
looking for.  This wouldn't require symbols-that-point-to-psymbols; it
would instead require lookups to look in both the psymtab and the
symtab, even when there's a symtab.  That would also involve removing
duplicates, now that we'd look at both, or some fancy footwork to look
only in the PART of the psymtab that hadn't already been read into a
symtab.  Subsequent efforts to expand the psymtab to find some different
symbol would resume reading from where the previous
symbol-reading-and-creation left off.  This would probably be a much
easier change than making a symtab entry able to point to a psymtab
entry.  A sketch of what I mean appears just below.)
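Again, everything here is invented for illustration -- the cursor is
just whatever state the debug-info reader would need to pick up where it
left off:

    #include <stddef.h>

    /* Hypothetical sketch of "stop after the symbol you wanted, resume
       later".  All names are invented.  */
    struct symbol;

    struct partial_symtab
    {
      long read_cursor;   /* where symbol reading last stopped */
      int fully_read;     /* nonzero once the whole unit is read */
      /* ... */
    };

    /* Invented stand-in: read and intern the next symbol from the
       debug info, advancing *cursor; returns NULL at end of unit.  */
    extern struct symbol *read_next_symbol (struct partial_symtab *ps,
                                            long *cursor);
    extern int symbol_matches (struct symbol *sym, const char *name);

    struct symbol *
    expand_psymtab_until (struct partial_symtab *ps, const char *name)
    {
      struct symbol *sym;

      while ((sym = read_next_symbol (ps, &ps->read_cursor)) != NULL)
        if (symbol_matches (sym, name))
          return sym;        /* Stop here; a later miss resumes above.  */

      ps->fully_read = 1;    /* Ran off the end; the symtab is complete.  */
      return NULL;
    }

The fancy footwork lives in the callers: a lookup must first search the
symbols already read, and only fall back on resuming the read after a
miss, so nothing gets defined twice.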
> Full symbols are already reasonably C++y, what with the ops vector.

It looks to me like the "ops vector" in symbols in gdb-7.4 is pretty
minimal, applying to only a tiny number of symbol categories (and the
comments in findvar.c -- from 2004 -- report that DWARF2 symbols screw
up the ops vector anyway).  Large parts of GDB touch symbols; is the
idea that all of them will be rewritten to indirect through an ops-table
(either explicitly in C, or implicitly in C++) without ever accessing
fields (like SYMBOL_CLASS (sym)) directly?  Do you think this will make
GDB faster and smaller?  I don't.

(There's a comment in symtab.c from 2003 that says address classes and
ops vectors should be merged.  But clearly nobody has felt like doing
that work in the last 9 years -- probably because so many places in the
code would need to be touched.  You can't use C++ to overload those
symbols without touching all the same places that nobody wanted to touch
for a decade.)

In conclusion, why does making this change require (or encourage) C++?
The idea seems to require changing a ton of working code, whether it's
ultimately done in C or in C++, which makes it an unappetizing change in
any language.  It would take a powerful heap of benefit to make it worth
doing that.

    John