From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-1932-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 25422 invoked by alias); 11 Mar 2010 06:03:22 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 25400 invoked by uid 22791); 11 Mar 2010 06:03:18 -0000
X-SWARE-Spam-Status: No, hits=-4.7 required=5.0
	tests=AWL,BAYES_50,RCVD_IN_DNSWL_HI,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Roland McGrath <roland@redhat.com>
To: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: archer@sourceware.org, Sami Wagiaalla <swagiaal@redhat.com>,
        Keith Seitz <keiths@redhat.com>
Subject: Re: Cross-CU C++ DIE references vs. mangling
In-Reply-To: Jan Kratochvil's message of  Wednesday, 10 March 2010 20:32:07 +0100 <20100310193207.GA6147@host0.dyn.jankratochvil.net>
References: <20100310191833.GA2816@host0.dyn.jankratochvil.net>
	<20100310193207.GA6147@host0.dyn.jankratochvil.net>
Message-Id: <20100311060305.B177A7D5E@magilla.sf.frob.com>
Date: Thu, 11 Mar 2010 06:03:00 -0000
X-SW-Source: 2010-q1/txt/msg00092.txt.bz2

> In this case if it see DW_AT_external + DW_AT_declaration it can also global
> "S::i" defining DIE so DW_AT_MIPS_linkage_name is probably really not needed.

I'm not quite sure I'm following you.  I think you meant "... it can also
see the global ..." here.

So, yes, it can look through all other CUs looking for a defining
declaration with a DW_AT_location.  It matches another CU's version
of the same thing by looking at the path of <namespace name="foo">
levels to a <variable name="bar"/>.

You already know this, but I'll mention another wrinkle about that.  For
C++ namespaces, I think all definitions do indeed have to be inside a
DW_TAG_namespace scope with matching name.  So there you are looking for
exactly the sequence of nested <namespace name=> that matches.  But the
general case also includes class members (both methods and static member
variables).  For these, the defining declaration can be lexically outside
the declaration scope, either at top-level (child of CU) or inside matching
DW_TAG_namespace scopes but outside the matching DW_TAG_class_type (et al).
To find that case, you need to remember each DIE that matches the
scope/name path but has only DW_AT_declaration.  Then look at each DIE with
an appropriate tag for a DW_AT_specification pointing to a matched DIE.
(Here "appropriate" means DW_TAG_variable for DW_TAG_variable,
DW_TAG_variable for DW_TAG_member, and DW_TAG_subprogram for
DW_TAG_subprogram.)
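
A minimal sketch of that search, assuming a toy in-memory DIE model.  The
Die class, its attribute keys, and find_definitions are made up for
illustration; they are not libdw or any real DWARF reader API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy stand-in for a DWARF DIE (hypothetical, not a real library type).
@dataclass(eq=False)
class Die:
    tag: str                                   # "namespace", "member", ...
    name: Optional[str] = None
    attrs: dict = field(default_factory=dict)  # e.g. {"declaration": True}
    children: list = field(default_factory=list)

# Which defining tag may complete which declaring tag.
DEFINING_TAG = {"variable": "variable", "member": "variable",
                "subprogram": "subprogram"}

def walk(die):
    """Yield a DIE and all of its descendants."""
    yield die
    for child in die.children:
        yield from walk(child)

def match_path(cu, path):
    """Return every DIE in `cu` reached by following the names in `path`."""
    found = [cu]
    for name in path:
        found = [c for d in found for c in d.children if c.name == name]
    return found

def find_definitions(cus, path):
    """Look through all CUs for DIEs that define the entity at `path`."""
    decls, defs = [], []
    for cu in cus:
        for die in match_path(cu, path):
            if "location" in die.attrs:
                defs.append(die)       # defined in place, inside its scope
            elif die.attrs.get("declaration"):
                decls.append(die)      # remember declaration-only matches
    for cu in cus:
        for die in walk(cu):
            spec = die.attrs.get("specification")
            if (spec is not None and "location" in die.attrs
                    and any(spec is d for d in decls)
                    and DEFINING_TAG.get(spec.tag) == die.tag):
                defs.append(die)       # definition lexically outside scope
    return defs

# S::i declared in two CUs, defined at top level of the second one.
decl_a = Die("member", "i", {"declaration": True, "external": True})
decl_b = Die("member", "i", {"declaration": True, "external": True})
defn = Die("variable", None, {"specification": decl_b, "location": 0x601040})
cu_a = Die("compile_unit", "a.cc",
           children=[Die("class_type", "S", children=[decl_a])])
cu_b = Die("compile_unit", "b.cc",
           children=[Die("class_type", "S", children=[decl_b]), defn])
found = find_definitions([cu_a, cu_b], ["S", "i"])
```

If `found` comes back empty, that is the stripped-DSO situation: the
declarations matched, but no DIE anywhere carries a location for them.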

However, you might well never find one.  Consider when the definition is in
a stripped DSO.  (For simplicity, let's say your reference is in another
DSO, so it's PIC code.  I'll mention that wrinkle a little later.)  (It
could even just be in a single stripped (or -g0 compiled) object in the
referring object, given obtuse build situations.)  Then there is indeed a
definition to find, but nowhere a DW_AT_location pointing to it.  In that
case, you really have no recourse but to discern the ELF symbol name and
look that up.  If you don't have (or don't trust) DW_AT_MIPS_linkage_name,
then you need to construct the name correctly with the mangling algorithm
applied to your DWARF-derived understanding of the declaration scope in C++
terms.
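
For the simplest inputs that construction is short.  The sketch below
assumes the Itanium C++ ABI and handles only a variable whose scope is a
chain of plainly named namespaces or classes; the function name is
hypothetical, and templates, functions, anonymous namespaces,
substitutions and the std abbreviation are all left out.

```python
def mangle_scoped_variable(path):
    """Itanium C++ ABI symbol name for a variable at scope `path`.

    Sketch only: every element of `path` must be a plain identifier.
    """
    if not path:
        raise ValueError("empty path")
    if len(path) == 1:
        return path[0]          # global-scope variables are not mangled
    if path[0] == "std":
        raise NotImplementedError("std:: uses the St abbreviation")
    # <nested-name> is N, then <length><identifier> per level, then E.
    return "_ZN" + "".join(f"{len(p)}{p}" for p in path) + "E"

print(mangle_scoped_variable(["internal", "i"]))   # _ZN8internal1iE
print(mangle_scoped_variable(["S", "i"]))          # _ZN1S1iE
```

The real algorithm is considerably hairier than this, which is the whole
appeal of having the compiler just state the name.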

Another way to think about the subject is that there are two fundamental
ways to go about answering the question, "What does DIE 0x123 refer to?"
One is holistic and source-centric.  The other is "local" and code-centric.

The first is what we've talked about above.  We take the DWARF to describe
the semantic structure of the code in source-language terms.  We then look
at all the CUs together and take them to form a single coherent whole as if
we were a human reading all the bare source code.  From that perspective,
we can say, "S::i means S::i.  There's an S::i reference over here, and an
S::i definition over there, and there we are."  Unfortunately, that only
works, at best, as well as a human reading the (preprocessed) source
code--without reading the build rules, arcane linking circumstances, etc.

The second is what is embodied in theory by using DW_AT_MIPS_linkage_name
all the time.  We take the DWARF in a single (logical) CU to tell us
exactly and only about what the compiler producing that one CU thought
about the code and/or data actually emitted in that same translation unit
(i.e. .o file from $compiler -c, pedantically distinct from a DWARF CU).
It's local in that you confine your consideration to what the compiler
knew, when it knew it (before linking).  Whereas the first approach is
source-centric in that you use DWARF to imagine what the source looked like
and then apply source-language rules to understand it, this is code-centric
in that you use DWARF to imagine what the assembly looked like and apply
ELF linking rules to understand what it means at run time.

The hairy issues of the latter approach are orthogonal to the question of
DW_AT_MIPS_linkage_name itself.  The crux of that approach is that you are
using the DWARF to surmise what a reference in assembly looks like.  The
sole purpose of DW_AT_MIPS_linkage_name is to make it trivial to surmise
that because the compiler is just telling you, and it should well know.
But if the compiler lies, it's useless.  For all the issues I'm raising
today, it's entirely equivalent if you get the right answer by using the
DWARF to feed the mangling algorithm as the way to surmise what the
assembly must have been.  It's just not equivalently trivial to do. :-)

I started with a mention of the case where you can't possibly use a
holistic source-centric approach: ya just ain't got all the source.
That reveals the crux of the inadequacy of that approach, but it's
only the tip of the iceberg.  Not all CUs are created equal (some
are born without the benefit of -g), and not all CUs weather the
slings and arrows equally (some are mercilessly stripped bare).  But
beyond that, still others have equal faculties to their brethren and
appear equal yet are separate and different, not the same at all.
In those cases you can have all the imagined source code you think
you need, but get the wrong answers because that's not really
everything you need to know.

If you remember a page ago, I made an aside about everything being
PIC simplifying things.  (Go figure!  It sure never simplifies
understanding the assembly code!  But it really does simplify the
situation here.)  Consider a case where there is a non-PIC reference:

	$ gcc -g -xc <(echo 'int foo = 23;') -shared -o foo.so
	$ gcc -g -xc <(echo 'extern int foo;main(){return foo;}') -c -o bar.o
	$ gcc -o bar bar.o foo.so
	$ eu-readelf -sr -winfo bar foo.so

Now, if you look at all those CUs there is exactly one "foo" that
has a DW_AT_location.  That's in foo.so, and it gives the address of
foo.so's definition of "foo" in its .data section.  This address is
also foo.so's st_value for "foo" in its .dynsym and .symtab.  (If
foo.so is stripped, you may have only .dynsym.)

But that's the wrong address.  The real address is the space
allocated for "foo" in the main executable (bar).  The CU doesn't
say that.  It just says DW_AT_external, DW_AT_declaration.  That's
the truth the compiler knew, when it was not yet even possible to
know whether "foo" would have a real definition in another CU in the
same executable, or would instead have an implicit definition
created by the linker.

That's in fact what we have here.  The resolution of "foo" from the
bar.o CU is an address in bar's own .bss, for a word that never
existed in bar.o at all.  The linker adds the word, an R_*_COPY reloc
to it, and a "foo" symbol with that st_value.  (This tells the
dynamic linker to look up foo.so's "foo" at startup time and copy
its initializer (23) to bar's copy.)

But it's worse than that!  That's not only what "foo" means to the
code in the executable, it's also what "foo" actually means to the
code in foo.so itself, the code in the very CU that purports to (and
does!) provide the defining declaration.  So even though it gave you
a DW_AT_location, you cannot stop there and take that as the
explanation.  The compiler didn't lie, but it told you something
quite subtle.  It gave DW_AT_location, but also DW_AT_external.
This means exactly, "I defined it in the assembly, but the linker is
also involved."  (You can read in, "... and good luck with that,
sucker!")  Since the linker is involved, this means you have to
figure out what the assembly looked like (i.e. correct mangling or
whatever), and then figure out what the linker did with that, and
then figure out what that meant to the dynamic linker.
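
A toy simulation of the bar/foo.so example may make the copy-reloc
behaviour concrete.  The addresses, the dictionary standing in for
memory, and the resolve helper are all invented for illustration; this
is a sketch of the mechanism, not of any real dynamic linker's code.

```python
# Toy process image: address -> value.  Addresses are made up.
memory = {}

FOO_SO_DATA = 0x7F0000201000   # foo.so's own "foo" (what its DWARF says)
BAR_BSS = 0x000000601040       # word the linker reserved in bar's .bss

# As loaded from the files: foo.so's .data holds the initializer,
# bar's .bss starts out zero.
memory[FOO_SO_DATA] = 23
memory[BAR_BSS] = 0

# Global lookup scope: the executable first, then the DSOs it needs.
lookup_scope = [
    ("bar", {"foo": BAR_BSS}),
    ("foo.so", {"foo": FOO_SO_DATA}),
]

def resolve(name, skip=None):
    """Return the first definition of `name`, optionally skipping one object."""
    for obj, symtab in lookup_scope:
        if obj != skip and name in symtab:
            return symtab[name]
    raise KeyError(name)

# Startup: the copy reloc in bar copies the initializer from the
# definition found while skipping the executable itself.
memory[BAR_BSS] = memory[resolve("foo", skip="bar")]

# Every reference to "foo", including foo.so's own, binds to bar's copy.
runtime_foo = resolve("foo")
memory[runtime_foo] += 1       # the program modifies foo
```

After this runs, the word at foo.so's DW_AT_location still holds 23,
while the live value sits in bar's .bss.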

This is related to two other cases, with symbol visibility, and with
symbol versioning.  These are ways that are similar in how you have
to look at the compiler's perspective through the lens of the
linkers that followed, and in that there are two things that from
CUs alone look like they go together.  In the PIC case, they really
do go together in the abstract view--there is just a purely
mechanical wrinkle that could mislead you.  In these cases, you can
have things that the DWARF identifies just the same in multiple
CUs, but that actually aren't the same in the slightest!

Consider:

	$ g++ -g -c -fPIC -o foo1.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { int i; };')
	$ g++ -g -c -fPIC -o foo2.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { extern int i; }; int foo () { return internal::i; }')
	$ gcc -g -shared -o foo.so foo1.o foo2.o
	$ g++ -g -c -fPIC -o bar1.o -xc++ <(echo 'namespace internal { int i; };')
	$ g++ -g -c -fPIC -o bar2.o -xc++ <(echo 'namespace internal { extern int i; }; int bar () { return internal::i; }')
	$ gcc -g -shared -o bar.so bar1.o bar2.o
	$ eu-readelf -sr -winfo foo.so bar.so

Now imagine a program linking in both foo.so and bar.so.  There are
two different things that are both separate but equal and both truly
internal::i and both truly _ZN8internal1iE.  By any method, there is
no one answer to, "What is internal::i?"  The only answers are
context-specific.

This example is the simplest case.  Here you could very well take
the holistic, source-centric view and decide that means "holistic
within the same linked object".  When asking in a foo.so context,
you take the foo1.o and foo2.o CUs together.  When asking in a
bar.so context, you take the bar1.o and bar2.o CUs together.

But this really is only the simplest case.  Now imagine that bar1.o
and bar2.o are linked into two separate DSOs.  Those two DSOs are
meant to work together, so one refers to the other's namespace.  But
foo.so has nothing to do with them.  You can't tell that from the
source, and you can't tell it from segregating the source along DSO
boundaries.  (Well, maybe you can, since the literal source has the
visibility attribute to see, and perhaps the DWARF could represent
that to you.  But I could as well have made the example do the
symbol-hiding with a linker version script, so the compiler might
never have known.)  There is no CU in bar2.so that defines it, so
you have to look in some other DSO.  How can you tell whether you
wanted to look in foo.so or in bar1.so?  If the answer is to consult
any ELF details about the DSOs' interrelationships (and that's the
only answer I can think of), then we've come full circle to getting
code-centric at the large granularity while we claimed to be
holistic at the small granularity.

If instead you think locally, and code-centrically, you can get it
all right.  The bar2.o CU's declaration says DW_AT_external, so you
know its symbol name.  In bar2.so this symbol is undefined, so you
have to emulate the dynamic linker's lookup to resolve it.  The
foo.so symbol by that name is defined, but it's STB_LOCAL and
STV_HIDDEN, so it can't be the one.  The bar1.so definition is
STB_GLOBAL and STV_DEFAULT, so it could be the one.  Luckily, it's
the only such candidate in this example.  When there are other
candidates, you have to be sure you are following the dynamic
linker's algorithm for which comes first and wins.  All the corners
of that are a big wad of hair, but it is concretely knowable anyway.
(Mostly.  Mostly.)
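
A simplified sketch of that lookup, assuming a flat load order where the
first eligible definition wins.  The Sym tuple and the object names are
illustrative only; symbol versions, DT_SYMBOLIC, dlopen scopes and weak
symbol subtleties are deliberately ignored.

```python
from collections import namedtuple

# Toy symbol table entry (hypothetical, not a real ELF reader type).
Sym = namedtuple("Sym", "name binding visibility defined")

def can_satisfy(sym):
    """Could this symbol satisfy a reference from another object?"""
    return (sym.defined
            and sym.binding in ("GLOBAL", "WEAK")
            and sym.visibility in ("DEFAULT", "PROTECTED"))

def lookup(name, load_order, symtabs):
    """Return the first object in load order that can supply `name`."""
    for obj in load_order:
        for sym in symtabs[obj]:
            if sym.name == name and can_satisfy(sym):
                return obj
    return None

NAME = "_ZN8internal1iE"
symtabs = {
    "main": [],
    "foo.so": [Sym(NAME, "LOCAL", "HIDDEN", True)],
    "bar1.so": [Sym(NAME, "GLOBAL", "DEFAULT", True)],
    "bar2.so": [Sym(NAME, "GLOBAL", "DEFAULT", False)],  # undefined ref
}
load_order = ["main", "foo.so", "bar1.so", "bar2.so"]
winner = lookup(NAME, load_order, symtabs)
```

Here foo.so comes earlier in the order but is skipped, so the reference
from bar2.so lands on bar1.so.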

The case with symbol versions is much the same.  The details of what
ELF symbol magic you have to grok to resolve to the right symbol in
the right object are all that differ.

Of course, it can always be worse than that.  Even when you do
everything right, you still have to bridge the gap between the symbol
name DWARF has made you decide to look for and an actual ELF
symbol.  If you do enough fancy linker machinations, there can be
more than one symbol by the same name in a single linked object.
(They'll have different binding, or different symbol versions, or be
in different section groups.)  It could very well not be possible to
figure out which one the assembly code corresponding to a given CU
wound up actually referring to when all linked.  (There are ways to
do things that just plain lose information.)  But you have to really
go out of your way to wind up with that.  If a build process
involves ld -r stages with funny options and/or linker scripts, be
afraid--but even then the extreme weirdness is unlikely to come up.

Not that I'm averse to ending on a note of despair, but I'll toss in
a twist of contrarian wistfulness on the finish.  It could all have
been so different, man, it could have been...beautiful.

There is an obvious case that the truly local and truly code-centric
thing would be for the compiler to just tell you directly in the
DWARF the actual truth about what the code does.  Not with a symbol
name, but with code, or its DWARF equivalent, which is a location.
This is what Jan imagined in that IRC conversation.  The DWARF spec
explicitly allows this: a non-defining declaration (i.e. one with
DW_AT_declaration) may have a DW_AT_location that applies only in
the context of that DIE's scope and supersedes the defining
declaration's DW_AT_location.  Personally, I love this as a theory.

But traditional practice (at least in GCC, and perhaps everywhere)
has always been to just leave it off and let DW_AT_declaration,
DW_AT_external imply the need to follow ELF-driven runtime logic.

What it would mean is that the compiler emits two pieces of assembly
code together in the translation unit: the compiled code that
actually accesses a variable, and the DW_AT_location expression
describing that access.  In theory, though the instruction sets
differ (one is a real machine code fragment and one is a DWARF
expression stack machine program), this is assembling two versions
of code that does the same thing, and uses the same symbols and
relocations to do it.  This is what Jan was imagining on IRC.  But I
was thinking about it in the wrong way.  I took Jan's "with a
relocation record" comment too literally, as meaning final objects
with relocations left for DWARF (akin to dynamic text relocations).
I didn't think of the cool way to consider the PIC case, or else I
blathered on about it hours or days later in a part Jan didn't quote.
(In fact, probably I just thought about it idly an hour or two later
and then never discussed the thought.  Sigh.)

The key is that you can have the same(ish) relocs using the same
symbols in the code and DWARF as assembled.  Then whatever happens
in linking stages later should be the same, as it's all the same ELF
symbol references in both the .text and .debug_info relocations.  If
what you do is describe in the DWARF location expression exactly the
real access that is used in the emitted code, then for a PIC
translation unit that's a PIC access in the DWARF stack machine
program too.  That is resolved fully at link time (ld -shared) and
does not yield a relocation record for the final DWARF, just as it
doesn't yield a text relocation for each site of access in the code.

For non-PIC code, the actual code looks like:

	movl	_ZN8internal1iE(%rip), %eax

and the DWARF bit could look like:

	.byte DW_OP_addr
	.quad _ZN8internal1iE

These use different relocation types, but they mean the same thing
in the context of how the code works.  x86-64 globals are always
PC-relative just because that's the efficient instruction, but it
means the "absolute" address of the symbol.  So the access uses a
R_X86_64_PC32 and the DWARF uses R_X86_64_64.  Since both relocs
point to the same ELF symbol, you know they will "travel together".
These get resolved at link time to absolute addresses, et voila.
The DWARF location accurately describes what the code really does.

In a PIC access, what the final code will actually do is not really
related to anything about ELF symbols.  It's just memory indirection.
The PIC code is:

	movq	_ZN8internal1iE@GOTPCREL(%rip), %rax
	movl	(%rax), %eax

This generates R_X86_64_GOTPCREL.  At link time (ld -shared), this
relocation goes away and it doesn't refer to the location where the
_ZN8internal1iE symbol says.  Instead, it's resolved to a .got slot
created by the linker.  When this code runs, all that's happening is
loading the pointer from that memory.  So, the DWARF location can
describe that too:

	.byte DW_OP_addr
	.quad _ZN8internal1iE@GOT
	.byte DW_OP_deref

This generates R_X86_64_GOT64.  At link time, this too goes away and
becomes the "absolute" address of the .got slot.  So it accurately
describes just what the code does to access internal::i.  (In this
case, "absolute" earns its scare quotes, because it really means
relative to the load bias of the containing DSO at run time, just like
all other addresses in DWARF, and in ELF symbol values, in a DSO.)
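
To see that the two expressions really do land on the same memory, here
is a tiny evaluator for just these two operations.  It assumes 8-byte
little-endian addresses; the load bias, the addresses and the read_u64
callback are invented for the example.  Only the opcode values (0x03 for
DW_OP_addr, 0x06 for DW_OP_deref) come from the DWARF spec.

```python
import struct

DW_OP_addr, DW_OP_deref = 0x03, 0x06   # opcode values from the DWARF spec

def eval_location(expr, read_u64, load_bias=0):
    """Evaluate a location expression made of DW_OP_addr and DW_OP_deref."""
    stack, pos = [], 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_addr:
            (addr,) = struct.unpack_from("<Q", expr, pos)
            pos += 8
            stack.append(addr + load_bias)     # file address -> run time
        elif op == DW_OP_deref:
            stack.append(read_u64(stack.pop()))  # load a pointer from memory
        else:
            raise NotImplementedError(hex(op))
    return stack[-1]

BIAS = 0x7F0000000000      # made-up load bias of the DSO
GOT_SLOT = 0x200FE0        # made-up link-time address of the .got slot
VAR = 0x201028             # made-up link-time address of internal::i

# After startup the dynamic linker has stored the variable's run-time
# address in the .got slot.
memory = {BIAS + GOT_SLOT: BIAS + VAR}

direct = bytes([DW_OP_addr]) + struct.pack("<Q", VAR)
indirect = (bytes([DW_OP_addr]) + struct.pack("<Q", GOT_SLOT)
            + bytes([DW_OP_deref]))

direct_addr = eval_location(direct, memory.__getitem__, BIAS)
indirect_addr = eval_location(indirect, memory.__getitem__, BIAS)
```

In this setup the DSO binds to its own definition, so both come out
equal.  Had the slot been filled with some other object's address, only
the indirect form would follow it.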


We could certainly teach GCC to do this.
It would then be telling us more pieces of direct truth about the code.
Would that not be the best thing ever?
Well, almost.

First, what about a defining declaration in a PIC CU?  

In the abstract, a defining declaration can be considered as talking
about two different things.  One is its declarationhood, wherein it
says that the containing scope has this name visible.  For that
purpose, it could reasonably be expected to be like a non-defining
declaration: say how code in this scope accesses the variable--the
truth about what's in the assembly code for any accesses in that CU.
But the other thing is its definitionhood, wherein it says what data
address contains the data cell and thus (optionally) implies what
object file position holds the initializer image--another truth about
what's in the assembly code for the definition in this CU.

In non-PIC code, these two truths match.  Both use direct address
constants (as relocated at link time).  But in PIC code, the truth
about the definition is an address constant, while the truth about the
access is an indirection through .got.  (If you have PIC code that
uses __attribute__((visibility("hidden"))) then it's direct access,
though PC-relative, and thus "non-PIC" ("absolute") for DWARF
purposes, so both truths match as in truly non-PIC code.)

Personally, I would be all for having it both ways.  In a CU where a
defining declaration is actually used by PIC accesses, then you could
generate a second non-defining declaration (even for C).  Give it
DW_AT_artificial, DW_AT_declaration, DW_AT_specification pointing to
the defining declaration (in lieu of DW_AT_name, DW_AT_type, et al),
and then DW_AT_location with the PIC style using indirection.

With that, you could know that if you got a DW_AT_location from any
DIE with DW_AT_declaration then you're done and have the real truth
for accesses.  If we presume no CUs from pre-apocalyptic compilers now
that we are in these here end times, then we are finally free from
ever having to rely on discerning the right ELF symbol from a name we
surmised from DWARF (be it via DW_AT_MIPS_linkage_name or mangling).

Phew!  In other words, excepting the small matter of manifest reality,
we don't even need to think about ELF stuff (except for the occasional
load bias for a DW_OP_addr)--the access locations are always given in
DWARF expressions using pure memory access (either direct or indirect).
Well, almost.

Before dynamic linker startup, what's the truth about what memory
location a given name in a given scope refers to when PIC indirection
is involved?  The real truth is that code in that scope doesn't run
yet!  That question doesn't get answered until the dynamic linker has
done startup.  But you might want to know.  Like what if you tried
"print internal::i" (or "print foo", from the first PIC C example).
If you've done something like "info line func" (I think) then you've
given it a context (at least a CU) to imagine what you're asking
about, so it can get to the right DIE for the right "internal::i" or
"foo".  That should print the initialized value, even though there is
no memory to read it out of, only the ELF files.  If that is a
non-defining declaration with a PIC-indirect DW_AT_location, then it
says to load the .got slot, but that slot is not initialized yet.
Likewise, if that is a non-defining declaration for non-PIC code
linked against a DSO definition, then it has an absolute-address
DW_AT_location--which is the real truth about the memory to use,
e.g. where to put a watchpoint--but that memory is not initialized yet.

For those cases you can look for dynamic relocs applying to the memory
address from DW_AT_location.  There will be a R_*_GLOB_DAT reloc or an
R_*_COPY reloc, respectively.  That reloc points to an ELF symbol.  It
points to exactly the right symbol, no name-matching to do and
possibly be wrong.  That is, it reduces even the "extreme weirdness"
cases to the level of hair of merely every actual case you would ever
have to contend with in the real world.  Secure in the knowledge that
this is exactly the ELF symbol that the dynamic linker will be using
to drive its resolution of this reloc, you just have to consider the
binding, visibility, and version set of this symbol and correctly
emulate the dynamic linker to find which exact ELF symbol in which
file will be supplying the definition.  In the PIC-indirect case,
that is the symbol whose adjusted address is the actual memory.  In that
case, of course, you then have to check that address for a copy reloc
and possibly turn it into the direct-address-with-copy case.  There,
"the definition" is the symbol whose address (as represented in the
file by following its phdrs or shdrs) holds the initializer.
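
The first step of that, going from an address to the reloc and its
symbol, can be sketched as below.  The Reloc tuple and symbol_behind are
hypothetical helpers; the two relocation type numbers are the x86-64
psABI values.  For the PIC-indirect case the address passed in is the
.got slot (the DW_OP_addr operand before the deref).

```python
from collections import namedtuple

R_X86_64_COPY, R_X86_64_GLOB_DAT = 5, 6   # x86-64 psABI reloc numbers

# Toy dynamic relocation record (hypothetical, not a real ELF reader type).
Reloc = namedtuple("Reloc", "offset type symbol")

def symbol_behind(location, dyn_relocs):
    """Classify the access at `location` and name the ELF symbol behind it."""
    for r in dyn_relocs:
        if r.offset == location:
            if r.type == R_X86_64_GLOB_DAT:
                return ("got-indirect", r.symbol)
            if r.type == R_X86_64_COPY:
                return ("copy", r.symbol)
    return ("direct", None)   # no dynamic reloc: the address is the answer

# Made-up relocs: a .got slot in a DSO and a copy-relocated word in an
# executable.
relocs = [
    Reloc(0x200FE0, R_X86_64_GLOB_DAT, "_ZN8internal1iE"),
    Reloc(0x601040, R_X86_64_COPY, "foo"),
]
got_case = symbol_behind(0x200FE0, relocs)
copy_case = symbol_behind(0x601040, relocs)
```

What comes back is only the symbol the dynamic linker will start from;
resolving it to the defining object is still the emulation job described
above.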

So, if either you want to do "offline" work of any kind, or you want
to cope with DWARF from compilers that have actually existed so far, then you
still have the little matter of emulating the dynamic linker.

I've omitted a whole little tirade (yes, omitted! it could be longer,
I tell you!) about how literal the "truth" about how PIC accesses work
really should be (x86-64 and its PC-relativity is the simplifying example!).

I'll merely allude to most of the tirade about non-defining
declarations for code.  That is, functions, including methods.  Well,
ok, a wee bit of tirade.  If you are trying to do correct expression
evaluation, or just work out which "foo" I meant in "break foo", in a
particular context, you have all the same issues for functions and
methods.  A non-defining declaration is a DW_TAG_subprogram with
DW_AT_external and DW_AT_declaration.  If the context yields a DIE
that has DW_AT_declaration, you have to discern a mangled symbol and
look it up.  It has no DW_AT_location like a data object would.  But
it could have a DW_AT_entry_pc.  The DWARF spec does not mention this
case explicitly for use with non-defining declarations as it does for
DW_AT_location.  But I read it as implicitly permitted, and naturally
meaning something analogous: the truth about where "foo()" calls made
in this scope would jump to.  The really real truth for PIC cases is
that it's a PLT entry, and what all that means is pretty much the
whole rest of that tirade.

At the end of the day, more truth in the DWARF can only really save
you from some pathological weirdness that isn't going to show up
anyway.  Just the run-of-the-mill weirdness means you really need to
turn all the DW_AT_external declarations (defining ones too, given
PIC!) into a known ELF symbol in the referring file and attempt to
resolve that to the right true definition address by ELF rules.

Not to mention that at best we assume that our imagined
post-apocalyptic compiler surely can emit the same ELF symbol in
assembly for DW_AT_location that it just did in assembly code for an
actual use.  But, that's exactly the same symbol name string it should
emit in the assembly for DW_AT_MIPS_linkage_name, just with "" instead
of @GOT.  So, if it can get DW_AT_MIPS_linkage_name wrong...


Ok, so by "wistfulness", I meant fantasy, disillusionment, bitterness,
and resignation, and by "the finish", I meant "another four pages".
What were you expecting?


Thanks,
Roland