From: Roland McGrath <roland@redhat.com>
To: Frysk Hackers <frysk@sourceware.org>
Subject: Re: dl symbol search path; Was: Corefile -arch 32 test failures with breakpoint and stacktrace tests
Date: Tue, 16 Oct 2007 02:39:00 -0000
Message-ID: <20071016023848.95A054D0389@magilla.localdomain>
In-Reply-To: Andrew Cagney's message of Tuesday, 9 October 2007 13:28:52 -0400 <470BBA54.9060701@redhat.com>
Sorry, there is indeed no handy function to help you with this.
I'm sure we will have one eventually. But doing it really right
does involve really severe amounts of hair, all of it in the 90% of
the work it takes to get the last 10% of cases right. The best we
can do for now is to stub it out with the trivial naive search, but
inside an interface that asks the question the right way.
The goal is to resolve the ELF symbol reference that would have been
done by referring to a particular name in a particular source context.
When reduced to ELF terms, these components describe a reference:
1. symbol name
2. symbol version binding
3. Dwfl_Module making the reference
4. purpose of the reference
To take each of these in detail:
1. This is the name string used in the ELF symbol table, obviously.
Resolving what a name used in a particular context in the source
would actually yield as an ELF name is in fact quite difficult for
all the complex cases, though 90% of cases are trivial. I won't go
into all the gory details of those problems now. It suffices to
observe the context necessary to do the best possible job: DWARF
scope DIE of the desired context, if available (either CU or inner);
language mode if not implicit from CU; module the reference is being
made from.
2. The symbol version is approximately a second and third part to the
symbol name. It is usually determined (bound) only at link time,
not by the compiler nor in assembly source code; DWARF is never
much help figuring it out. It's only an issue when there are two
definitions with the same name and different symbol versions. This
is likely for e.g. printf and some libstdc++ name-mangled symbols,
but not very common in Joe Blow's DSO.
Given a module to consider as the context making the reference, when
there is an existing external reference in that module to a given
symbol name, that pretty much makes it easy. (It's possible to refer
to two different symbols with the same name and different versions
inside one module. But it's so inconvenient that even DSOs like libc
that define multiple symbols by the same name don't use the versioned
mechanism to refer to their own symbols between CUs, they just use
private aliases with different names.) If there was no reference to
this symbol linked into the context module, then you can't in all
cases get an unambiguously right answer, only a guess. And nothing
more granular than the module itself is available as context to
inform that guess.
3. Every reference is in the context of some module or other, even if not
in the context of a known CU/scope. If the context of the user request
seems to be "global, I don't care exactly what I mean", then either
this means the main program module, or it means accepting that, in cases
of ambiguity, the result is a map from each context to its own result.
4. The purposes for reference that we must distinguish are:
a. like a PLT reloc, a jump target
b. like a COPY reloc, an initializer value
c. like all others, an object address
For "func(args)", and probably for a breakpoint on func, it's (a).
For "&func", it's (c). In module A using PIC that calls func in module B,
(a) means func's real definition in B; (c) means func's PLT symbol in A.
For a non-PIC object (i.e. the main program) that does "extern int foo;"
and links to a DSO that does "int foo = 1;", (b) is foo's definition in
the DSO, where the initializer lives; (c) is foo's symbol in the main
program, where the COPY reloc puts the live variable. When the
reference is to "the variable foo", to see its live value in the
program, change it, or place a watchpoint, it's (c). When the reference
is to print the static initializer value, offline or before the program
has finished dynamic loading at startup/dlopen, then it's (b).
The (a) vs (c) and (b) vs (c) distinctions often apply to a module's
own references to its own symbols, though I used less confusing examples
above. So purpose is relevant to every lookup.
At a high level, we can describe the purpose distinctions as "for actual
code address", "for static initializer data", and "for object address".
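To make the three purposes concrete, here is a rough sketch in C of how
a front end might pick one from the kind of thing the user asked for.
Every name in it (ref_purpose, user_request, purpose_for) is made up
for illustration; none of this is an existing libdwfl interface.

```c
/* Why a symbol is being looked up.  Illustrative names only.  */
enum ref_purpose
{
  REF_CODE_ADDRESS,		/* (a) like a PLT reloc: a jump target */
  REF_STATIC_INITIALIZER,	/* (b) like a COPY reloc: initializer data */
  REF_OBJECT_ADDRESS		/* (c) like all others: an object address */
};

/* Hypothetical kinds of user request that end in a symbol lookup.  */
enum user_request
{
  REQ_CALL,			/* func(args) */
  REQ_BREAKPOINT,		/* break on func */
  REQ_ADDRESS_OF,		/* &func */
  REQ_LIVE_VALUE,		/* read or change the variable */
  REQ_WATCHPOINT,		/* watch the variable */
  REQ_PRINT_INITIALIZER		/* static initializer, offline or pre-load */
};

/* Map a user request to the purpose of the symbol reference.
   Anything not singled out is treated as (c), "like all others".  */
enum ref_purpose
purpose_for (enum user_request what)
{
  switch (what)
    {
    case REQ_CALL:
    case REQ_BREAKPOINT:	/* "probably", per the text above */
      return REF_CODE_ADDRESS;
    case REQ_PRINT_INITIALIZER:
      return REF_STATIC_INITIALIZER;
    case REQ_ADDRESS_OF:
    case REQ_LIVE_VALUE:
    case REQ_WATCHPOINT:
    default:
      return REF_OBJECT_ADDRESS;
    }
}
```

The point of keeping the purpose as an explicit value is that the same
name can then be handed to the same lookup with a different purpose and
legitimately come back as a different symbol.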
So, gleaned from that, the lookup function should have at hand:
* symbol name
* DWARF scope DIE of the desired context, if available
* language mode, if no scope/CU context
* referring module, if available
* purpose of reference
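As a sketch only, those inputs could be bundled into one request
structure along these lines. The type and field names are invented for
this example. The two opaque struct declarations merely stand in for
what would presumably be a DWARF DIE and a Dwfl_Module in real code, so
that the fragment compiles without elfutils.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins so the sketch compiles on its own; nothing here
   calls into elfutils.  */
struct scope_die;
struct ref_module;

enum ref_purpose
{
  REF_CODE_ADDRESS,		/* like a PLT reloc */
  REF_STATIC_INITIALIZER,	/* like a COPY reloc */
  REF_OBJECT_ADDRESS		/* like all others */
};

enum ref_language { REF_LANG_UNKNOWN, REF_LANG_C, REF_LANG_CXX };

/* Everything the lookup should have at hand.  */
struct sym_lookup_request
{
  const char *name;			/* symbol name */
  const struct scope_die *scope;	/* scope DIE, or NULL */
  enum ref_language language;		/* consulted only if scope is NULL */
  const struct ref_module *referrer;	/* referring module, or NULL */
  enum ref_purpose purpose;		/* purpose of reference */
};

/* Set up the "global, I don't care exactly what I mean" request:
   no scope, no language hint, no referring module.  */
void
sym_lookup_request_init (struct sym_lookup_request *req,
			 const char *name, enum ref_purpose purpose)
{
  assert (req != NULL && name != NULL);
  req->name = name;
  req->scope = NULL;
  req->language = REF_LANG_UNKNOWN;
  req->referrer = NULL;
  req->purpose = purpose;
}
```

A caller with better context would fill in scope and referrer after the
init call; a naive implementation is free to ignore them at first, as
long as the interface already carries them.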
Furthermore, it should be able to return ambiguous results. That is,
the result of the lookup is either "this one is it", or a list of
candidates each annotated with "this would be it if you were to give
foobar as the referring module" and/or "this would be it if this
reference by module foobar were resolved to symbol version V in module
M". It's fine for now if the only plan for the ambiguous cases is an
exception "couldn't figure it out" or even "blindly pick the first one".
But one should contemplate how the description of the ambiguity and the
options for resolving it might percolate up to the user.
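For illustration, a result type that can carry the ambiguity, together
with the trivial naive search behind it, might look roughly like the
following. The symbol table is modelled as a plain array of
(name, version, module) strings, and all of the names (sym_def,
lookup_result, naive_lookup, MAX_CANDIDATES) are hypothetical. A real
version would walk the modules' ELF symbol tables instead.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CANDIDATES 8	/* arbitrary cap for this sketch */

/* One definition, annotated with what would make it the answer:
   "symbol version V in module M".  */
struct sym_def
{
  const char *name;
  const char *version;		/* NULL if unversioned */
  const char *module;
};

enum lookup_status { LOOKUP_NOT_FOUND, LOOKUP_UNIQUE, LOOKUP_AMBIGUOUS };

struct lookup_result
{
  enum lookup_status status;
  size_t ncandidates;		/* entries used in candidates[] */
  const struct sym_def *candidates[MAX_CANDIDATES];
};

/* Naive search: collect every definition whose name matches, and
   report whether that was zero, one, or several.  Matches beyond
   MAX_CANDIDATES are silently dropped in this sketch.  */
struct lookup_result
naive_lookup (const struct sym_def *defs, size_t ndefs, const char *name)
{
  struct lookup_result r = { LOOKUP_NOT_FOUND, 0, { NULL } };

  for (size_t i = 0; i < ndefs; ++i)
    if (strcmp (defs[i].name, name) == 0
	&& r.ncandidates < MAX_CANDIDATES)
      r.candidates[r.ncandidates++] = &defs[i];

  if (r.ncandidates == 1)
    r.status = LOOKUP_UNIQUE;
  else if (r.ncandidates > 1)
    r.status = LOOKUP_AMBIGUOUS;
  return r;
}
```

A caller that only wants "blindly pick the first one" takes
candidates[0] whenever ncandidates is nonzero; one that wants to show
the choices to the user has the module and version of each candidate
available to describe them.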
Thanks,
Roland