To: gdb <gdb@sources.redhat.com>
Subject: dwarves, hierarchies, and cross-references
From: David Carlton <carlton@bactrian.org>
Date: Fri, 09 May 2003 23:14:00 -0000
Message-ID: <m3ptmryaph.fsf@papaya.bactrian.org>

Right now the DWARF 2 symbol reader basically proceeds in a
hierarchical fashion: it starts at the DW_TAG_compile_unit entry, then
reads its children, and while reading those children reads their
children, and so forth.  This is good for building up fully qualified
names: e.g. if I have code like

namespace N {
  class C {
    void foo() {} 
  };
}

then it's easy to remember that, when generating the info for 'foo',
we're within a context called 'N::C', so we should really call it
'N::C::foo'.
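To make this concrete, here's a minimal sketch of that kind of traversal, written in C++ to match the example above. The `Die` struct, `collect_names`, and `read_compile_unit` are made up for illustration. The real reader works on raw .debug_info data and builds GDB symbols, not strings. The point is just that carrying the enclosing scope down the recursion makes each qualified name available as soon as the DIE is reached.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified DIE: real DIEs carry a tag and many
// attributes; this one keeps only its DW_AT_name and its children.
struct Die {
  std::string name;
  std::vector<Die> children;
};

// Depth-first walk that passes the enclosing scope down the tree, so each
// DIE's fully qualified name is known at the moment we visit it.
void collect_names(const Die &die, const std::string &scope,
                   std::vector<std::string> &out) {
  assert(!die.name.empty());  // anonymous DIEs are out of scope for this sketch
  std::string qualified = scope.empty() ? die.name : scope + "::" + die.name;
  out.push_back(qualified);
  for (const Die &child : die.children)
    collect_names(child, qualified, out);
}

// Start at the DW_TAG_compile_unit's children with an empty scope.
std::vector<std::string> read_compile_unit(const Die &cu) {
  std::vector<std::string> names;
  for (const Die &child : cu.children)
    collect_names(child, "", names);
  return names;
}
```

Fed the N / C / foo tree, this produces "N", "N::C", and "N::C::foo", in that order.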

But that's not the whole story: sometimes one DIE refers to another
DIE somewhere else in the hierarchy.  Typically (always?), these other
DIEs are used to provide type info.  So an example might be:

namespace N {
  class C {
  public:
    class E {};
  };

  class D : public C::E {};
}

Here, N::D has a DW_TAG_inheritance entry that references N::C::E's
DIE.  Now, if I've already traversed N::C::E before traversing N::D, I
probably already know everything about N::C::E.  But if the compiler
happens to emit the info for N::D before the info for N::C (and hence
N::C::E), things get hairier: the reader wants to find info about this
class called E, and it's hard to see how it will know that the class
is really N::C::E (as opposed to E, N::D::E, N::E, or something
else).

My branch gets this wrong: in situations like the above, it frequently
thinks that D has a base class called N::D::E.  Fortunately, the reader
later generates a correctly named version of the class's debug info,
so this isn't the end of the world, but it's still unfortunate,
because the wrong name lingers in places.
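Here's a toy model of that failure. Everything in it (the flattened `Die` struct keyed by offset, `NaiveReader`, the `base` field standing in for a DW_TAG_inheritance child) is hypothetical and much simpler than GDB's real DWARF 2 reader. It only assumes that a type first reached through a cross-reference gets named using the scope of the DIE that referred to it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened compile unit: each DIE is keyed by its offset and
// lists its children by offset. `base` stands in for a DW_TAG_inheritance
// child pointing at another DIE (-1 means no base class).
struct Die {
  std::string name;
  std::vector<int> children;
  int base = -1;
};

struct NaiveReader {
  std::map<int, Die> dies;                     // offset -> DIE
  std::map<int, std::string> type_names;       // offset -> name we gave its type
  std::map<std::string, std::string> base_of;  // class -> recorded base name

  static std::string qualify(const std::string &scope, const std::string &name) {
    assert(!name.empty());  // anonymous DIEs are out of scope for this sketch
    return scope.empty() ? name : scope + "::" + name;
  }

  // Reading a type through a cross-reference. If we haven't seen it yet, we
  // name it with `scope`, the scope of the *referring* DIE. This is the bug.
  std::string read_type(int offset, const std::string &scope) {
    auto it = type_names.find(offset);
    if (it != type_names.end())
      return it->second;
    std::string name = qualify(scope, dies.at(offset).name);
    type_names[offset] = name;
    return name;
  }

  // Hierarchical read: the name built from the enclosing scopes is correct.
  void read_die(int offset, const std::string &scope) {
    const Die &die = dies.at(offset);
    std::string name = qualify(scope, die.name);
    type_names[offset] = name;
    if (die.base != -1)
      base_of[name] = read_type(die.base, name);
    for (int child : die.children)
      read_die(child, name);
  }
};
```

With D emitted ahead of C, D's base is recorded as N::D::E. The later hierarchical visit to E fixes `type_names`, but `base_of` keeps the stale name, which is the "wrong name lingers" symptom.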

Any suggestions?  Here are some options that I've considered:

* Once we notice that we put E in the wrong context, update everybody
  who has been misled by this.  This seems complicated and potentially
  fragile to me: exactly what data would we have to maintain to make
  this work?

* When parsing E via a cross-reference, figure out its context, so we
  can name it correctly.  This seems like a plausible idea to me; I'm
  only worried that it might be a little inefficient at times.

* Break up the symbol reading into a two-stage process: first, go
  through the hierarchy of DIEs far enough to initialize their type
  fields with a bare-bones type, containing enough info for future
  cross-references to be able to use it.  (What exactly needs to be
  filled in?  Certainly the name field; does anything else need to be
  filled in?)  Then go through the hierarchy a second time, filling in
  everything completely.
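Here's a rough sketch of the two-pass idea. It assumes, and this is exactly the open question, that the name is the only field the first pass needs. `Die`, `Type`, and `TwoPassReader` are hypothetical. For simplicity every DIE, namespaces included, gets a `Type`, which a real reader wouldn't do.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened DIE: children by offset, `base` standing in for a
// DW_TAG_inheritance reference (-1 means no base class).
struct Die {
  std::string name;
  std::vector<int> children;
  int base = -1;
};

// Hypothetical stripped-down type. Pass one sets only `name`.
struct Type {
  std::string name;
  Type *base = nullptr;
  bool complete = false;
};

class TwoPassReader {
 public:
  explicit TwoPassReader(std::map<int, Die> dies) : dies_(std::move(dies)) {}

  void read(int root) {
    create_stubs(root, "");
    complete_types(root);
  }

  const Type &type_at(int offset) const { return *types_.at(offset); }

 private:
  // Pass 1: walk the hierarchy, giving every DIE a bare-bones type whose
  // only filled-in field is its fully qualified name.
  void create_stubs(int offset, const std::string &scope) {
    const Die &die = dies_.at(offset);
    assert(!die.name.empty());  // anonymous DIEs are out of scope here
    auto type = std::make_unique<Type>();
    type->name = scope.empty() ? die.name : scope + "::" + die.name;
    std::string qualified = type->name;
    types_[offset] = std::move(type);
    for (int child : die.children)
      create_stubs(child, qualified);
  }

  // Pass 2: fill in the rest. Cross-references land on stubs that already
  // carry the right name, whatever order the DIEs were emitted in.
  void complete_types(int offset) {
    const Die &die = dies_.at(offset);
    Type &type = *types_.at(offset);
    if (die.base != -1)
      type.base = types_.at(die.base).get();
    type.complete = true;
    for (int child : die.children)
      complete_types(child);
  }

  std::map<int, Die> dies_;
  std::map<int, std::unique_ptr<Type>> types_;  // unique_ptr keeps Type* stable
};
```

In the N / C / D / E example with D emitted first, pass two links D's base to the stub already named N::C::E. The cost is one extra walk over the tree, which is the duplication worry.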

I think I like the third option the best.  But I'm worried that it
won't be clear what information has to be filled in on the first pass,
and that it also won't be clear exactly how far the first pass will
have to descend the tree; also, it could lead to lots of code
duplication.  (We already look at the tree once for psymtabs and once
for symtabs; breaking the latter up into two passes would make that
even worse.)  If the third option doesn't work, I think the second
option should work: setting names properly (which is what the patches
on my branch do) is probably the only place where we really depend on
traversing the hierarchy in order, in which case tackling the name
issue head-on is a sensible approach.  (Hmm.  Maybe I like the second
approach the best.)
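And a sketch of the second option: name a cross-referenced DIE from its own ancestors rather than from wherever the reader currently is. This assumes the reader records each DIE's parent offset as it scans (a hypothetical `parent` field; DWARF 2 DIEs don't carry a parent attribute, so the reader would have to remember it). `ScopeResolver` is likewise made up. The memo table is one possible answer to the efficiency worry, since shared ancestors only get walked once.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical DIE record: its name plus the offset of the enclosing DIE
// (-1 for DIEs directly under DW_TAG_compile_unit).
struct Die {
  std::string name;
  int parent = -1;
};

class ScopeResolver {
 public:
  explicit ScopeResolver(std::map<int, Die> dies) : dies_(std::move(dies)) {}

  // Fully qualified name of the DIE at `offset`, computed from the DIE's own
  // ancestors, independent of the order in which the reader visits DIEs.
  std::string qualified_name(int offset) {
    auto cached = cache_.find(offset);
    if (cached != cache_.end())
      return cached->second;
    const Die &die = dies_.at(offset);
    assert(!die.name.empty());  // anonymous DIEs are out of scope here
    std::string name = die.parent == -1
                           ? die.name
                           : qualified_name(die.parent) + "::" + die.name;
    cache_[offset] = name;  // memoize so shared ancestors are resolved once
    return name;
  }

 private:
  std::map<int, Die> dies_;
  std::map<int, std::string> cache_;
};
```

When D's inheritance entry points at E, asking for E's name this way yields N::C::E even though the reader is in the middle of D.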

Comments?  Suggestions?  Is this explanation of the problem clear at
all?

David Carlton
carlton@bactrian.org