From: Roland McGrath
To: Sami Wagiaalla
Cc: Jan Kratochvil , archer@sourceware.org, Keith Seitz
Subject: Re: Cross-CU C++ DIE references vs. mangling
In-Reply-To: Sami Wagiaalla's message of Monday, 12 April 2010 14:46:48 -0400 <1271098008.2901.211.camel@localhost>
References: <20100310191833.GA2816@host0.dyn.jankratochvil.net> <20100310193207.GA6147@host0.dyn.jankratochvil.net> <20100311060305.B177A7D5E@magilla.sf.frob.com> <1271098008.2901.211.camel@localhost>
Message-Id: <20100427085011.9ADC0CF70@magilla.sf.frob.com>
Date: Tue, 27 Apr 2010 08:50:00 -0000

Sorry for the delay.  I'm now trying to hose down the various hornets'
nests I stirred up in DWARFland.

> So after a few (really, many) reads of this email I think I can
> summarize the issues and solutions discussed there. I just wanted to
> make sure I have a proper understanding of the issue before filing a
> gcc feature request. So, is this a correct summary:

Ok.  I don't think I stated an actual conclusion, just tried to air all
the nuances needing consideration.  Perhaps appropriate conclusions were
implied by the confluence of nuances, but I did not quite assert any.

> The goal is to help gdb find the proper location for variables where
> declarations and definitions are separated over CU's or so's.

Yes, that's the problem that we started discussing.
In my ramblings, I extended it to consider finding the proper code
address (or function descriptor, as appropriate) for functions too (the
cases with complexity analogous to the examples we've discussed with
variables being C++ methods and namespace-qualified functions).

> - It requires a search of all other CU's/so' to locate the definition.
>   Which is inefficient

"It requires" sounds like this is the only option today.  That's not so.
I think you might be conflating two different things.

One option is to search all other CUs (in all objects) to locate a DIE
that is both a defining declaration and matches the original declaration
DIE of interest.  That is inefficient because it's an exhaustive-search
kind of method.  It's incomplete for all cases where the definition you
are looking for does not have a DWARF CU you can find (due to stripping,
or due to lack of -g at compilation, etc.).

Another option today is to glean an ELF symbol name by one method or
another, and then look for that.  This has two components: coming up
with the symbol name, and searching for it.  The symbol search portion
is presumed to be more efficient than searching through DWARF, though
its largest-granularity scaling problem is the same one of searching
across all the ELF objects.

> but also inaccurate since
>
> - The scope of the declaration can be different from that of the
>   definition (e.g. class members).

That issue per se does not render the grovel-all-CUs method inaccurate,
just more complex than you might think at first blush.  In each CU, you
have to notice each matching non-defining declaration (which does indeed
have the DIE topology matching your original declaration), and check for
defining declarations whose DW_AT_specification points to the match.
This is just a detail of how it is both complex and costly to check all
CUs for a DIE that's an appropriate defining declaration.
> If DW_AT_MIPS_linkage_name is
> available it can be used to resolve this, however

The "glean an ELF symbol name" portion can be done in two ways.  One is
DW_AT_MIPS_linkage_name, which is trivially simple to code for in the
debugger and trivially cheap to extract.  The other is to apply the
language/ABI-specific symbol-name mangling algorithm to the DIE topology
of your original declaration DIE of interest.

If DW_AT_MIPS_linkage_name is available, it supplies the same answer(*)
that you get via mangling based on DIE topology.  It's not that it
"resolves inaccuracy"; it's just that a very simple and cheap procedure
(looking at the attribute) yields the same answer that the much more
complex procedure should yield.

(*) Conversely, I had the impression from Keith that GCC (at least in
the past, and maybe still today) sometimes emitted the wrong mangled
name for DW_AT_MIPS_linkage_name.  That sort of boggles the mind, but it
apparently is an issue of potential concern weighing against using
DW_AT_MIPS_linkage_name.

> - if the definition is in a stripped DSO there is indeed a definition
>   (ELF) but nowhere is there a DW_AT_location pointing to it. Also,

That is true, but it is not a "however" about using
DW_AT_MIPS_linkage_name.  Nor is it a "however" about NOT using
DW_AT_MIPS_linkage_name.  Rather, it is a "however" about using CU
grovelling to find a definition rather than gleaning an ELF symbol name.
If you rely on CU grovelling, you of course only grovel the DWARF CUs
that you have, which might not include the definition.

> - it is possible to have two names defined in two separate so's with
>   the same linkage name. eg:

Yes.  I gave the concrete example for this situation, but I consider it
part of the same point that you can also have two symbols with the same
name inside one object.  For that to happen, either one or both will be
local or hidden symbols, or the two global symbols will be in different
symbol version sets.
To be fair, this could be considered an entirely orthogonal issue.  It
applies here no differently than it does to very simple non-mangled
symbol names (e.g. from C).  If at any point you glean a symbol name and
then look it up directly by name, you can have multiple ambiguous
matches.  However, if you glean a specific ELF symbol--not the name, but
a particular symbol index in a particular ELF symbol table--then you can
disambiguate (with potentially very complex effort, but in theory you
have enough information).

In the vast majority of cases where you have a DWARF CU with a
non-defining declaration to start from, you should be able to glean the
particular ELF symbol in that object to use.  In the object containing
the non-defining declaration itself there will almost always be only one
ELF symbol by that name.  If it's local or has non-default visibility,
you can use it right there--it's the defining symbol you're looking for.
If it's global, then it has a symbol version association that you can
use to drive your ELF symbol search unambiguously.

> Proposed solution:
>
> Teach the compiler to generate a DW_AT_location for a non defining
> declaration that is applicable in that die's scope. That location
> expression would be parallel to the assembly generated for the symbol

I only sort of proposed this, and it's not a complete solution.

> The following part I don't quite understand:
>
> > We could certainly teach GCC to do this.
> > It would then be telling us more pieces of direct truth about the
> > code.  Would that not be the best thing ever?
> > Well, almost.
> >
> > First, what about a defining declaration in a PIC CU? [...]
>
> Why is there a need for a second artificial location-describing die?
> As I understand it declarationhood is specified by the die's nesting
> in the die hierarchy not its DW_AT_location. In other words, what is
> missing in the current way gcc specifies locations for defining
> declarations?
Declarationhood per se (or perhaps we should say "non-definingness") is
specified by the presence of DW_AT_declaration, not by DIE topology.
The issue is that in PIC code, what's a defining declaration in the
source might actually be acting as a non-defining declaration at
runtime.

Every defining declaration serves two purposes.

The first is to describe the declaration.  Just like a non-defining
declaration, this wants to tell the debugger what using this particular
name in this particular context (i.e. containing DIE) means in the
source program.  That's the frame of mind you want when doing things
like expression evaluation in a given context.  This corresponds to how
assembly code is generated in that context to find the address of the
entity described (data address or target PC/function descriptor).

The second purpose is to describe the definition.  This wants to tell
the debugger what piece of memory this definition is providing.  That's
the frame of mind you want when resolving someone else's non-defining
declaration, or when trying to examine initializer values before a
program is running.  This corresponds to how assembly code is generated
to create the definition and (perhaps) initialize it.

In code generated today, for all defining declarations we only have a
description of the definition.  In non-PIC code, that suffices to
describe the declaration, since it's always resolved to that selfsame
definition.  In PIC code, that declaration is resolved like non-defining
declarations and may or may not wind up matching this same definition.

Thus there is the idea for PIC code generation to emit both a
declaration DIE and a definition DIE for each defining declaration.
When looking to evaluate the named variable in that context, you'd use
the one.  When looking for a definition, you'd use the other.

> This summary does not include the part starting with "Before dynamic
> linker startup" to the end of the email.
> Mainly because I am assuming
> that the main use case is after dynamic linker startup.

Well, I have a few problems with assuming that.

Firstly, it's just not the way to do business.  If we're going to
contemplate changing or refining the contract between compiler and
debugger in subtle ways, we don't do it lightly, and we don't consider
just one use case and go and change things purely to satisfy that
purpose.  We need to thoroughly consider what is most correct for each
case we know of, and understand what methods do or don't achieve that.
We may very well decide to trade off better support for some cases
against less perfect support for cases deemed less common or important.
But we'll do that explicitly, after understanding what we're giving up
and what we're getting.

Secondly, is it really the main use case?  Well, maybe it is for
variables.  That is all you actually asked about, but I insist on
answering about what you need to know, not just what you asked.  For
variables, what it doesn't cover is printing initial values (which
ordinarily works today) and setting watchpoints.

The other half of the problem is functions (including methods).  I'll
grant that they are not quite as central in debugger expression
evaluation as variables, but they're important there.  Moreover, they
are key to a use case every bit as important to the debugging experience
as expression evaluation: setting breakpoints.

Finally, after the dynamic linker details, and where I ranted a little
about the functions, near the end is where I came closest to drawing an
actual conclusion.  That putative conclusion is more or less that all
the preceding new ideas are sufficiently incomplete in their own ways
that no such proposals really warrant pursuit, and we might as well
admit we are stuck with mangled symbols and faking the ELF dance as best
we can.
If that's the conclusion, then the only proposals are either to have a
reliable linkage_name attribute and rely on it, or to drop it as useless
and expect to construct a mangled name from DIE topology.

I still wouldn't say I have come to that conclusion quite yet.  I
described (almost) everything I understand about the possibilities and
constraints.  I was hoping for some other folks to gain that
understanding and share their opinions about how it all fits together.


Thanks,
Roland