From: Roland McGrath <roland@redhat.com>
To: Frysk Hackers <frysk@sourceware.org>
Subject: Re: dl symbol search path; Was: Corefile -arch 32 test failures with breakpoint and stacktrace tests
Date: Tue, 16 Oct 2007 02:39:00 -0000
Message-ID: <20071016023848.95A054D0389@magilla.localdomain>
In-Reply-To: Andrew Cagney's message of  Tuesday, 9 October 2007 13:28:52 -0400 <470BBA54.9060701@redhat.com>

Sorry, there is indeed no handy function to help you with this.
I'm sure we will have one eventually.  But doing it really right
takes a severe amount of hair, nearly all of it in the 90% of the
effort that buys the last 10% of cases.  The best we can do for now
is to stub it out with the trivial naive search, but behind an
interface that asks the question the right way.

The goal is to resolve the ELF symbol reference that would result from
using a particular name in a particular source context.  Reduced to
ELF terms, a reference is described by these components:

1. symbol name
2. symbol version binding
3. Dwfl_Module making the reference
4. purpose of the reference

To take each of these in detail:

1. This is the name string used in the ELF symbol table, obviously.
   Resolving what a name used in a particular context in the source
   would actually yield as an ELF name is in fact quite difficult for
   all the complex cases, though 90% of cases are trivial.  I won't go
   into all the gory details of those problems now.  It suffices to
   observe the context necessary to do the best possible job: DWARF
   scope DIE of the desired context, if available (either CU or inner);
   language mode if not implicit from CU; module the reference is being
   made from.

2. The symbol version is approximately a second and third part to the
   symbol name.  It is usually determined (bound) only at link time,
   not by the compiler nor in assembly source code; DWARF is never
   much help figuring it out.  It's only an issue when there are two
   definitions with the same name and different symbol versions.  This
   is likely for e.g. printf and some libstdc++ name-mangled symbols,
   but not very common in Joe Blow's DSO.

   Given a module to consider as the context making the reference, if
   that module already contains an external reference to the given
   symbol name, the version bound to that reference pretty much settles
   it.  (It's possible to refer to two different symbols with the same
   name and different versions inside one module.  But it's so
   inconvenient that even DSOs like libc that define multiple symbols by
   the same name don't use the versioning mechanism to refer to their
   own symbols between CUs; they just use private aliases with different
   names.)  If no reference to this symbol was linked into the context
   module, then you can't always get an unambiguously right answer, and
   you have to guess.  But no context more granular than the module
   would help with that guess.

3. Every reference is in the context of some module or other, even if not
   in the context of a known CU/scope.  If the context of the user request
   seems to be "global, I don't care exactly what I mean", then this means
   either the main program module, or accepting that in ambiguous cases the
   result is a map from each possible context to its own result.

4. The purposes for reference that we must distinguish are:
   a. like a PLT reloc, a jump target
   b. like a COPY reloc, an initializer value
   c. like all others, an object address
   
   For "func(args)", and probably for a breakpoint on func, it's (a).
   For "&func", it's (c).  In module A using PIC that calls func in module B,
   (a) means func's real definition in B; (c) means func's PLT symbol in A.

   For a non-PIC object (i.e. the main program) that does "extern int foo;"
   and links to a DSO that does "int foo = 1;", then (b) is foo's symbol in
   the main program; (c) is foo's definition in the DSO.  When the
   reference is to "the variable foo", to see its live value in the
   program, change it, place a watchpoint, it's (c).  When the reference is
   to print the static initializer value, offline or before the program has
   finished dynamic loading at startup/dlopen, then it's (b).

   The (a) vs (c) and (b) vs (c) distinctions often apply to a module's
   own references to its own symbols, though I used less confusing examples
   above.  So purpose is relevant to every lookup.

   At a high level, we can describe the purpose distinctions as "for actual
   code address", "for static initializer data", and "for object address".
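
As a rough illustration of the purpose distinctions in item 4, here is a
minimal Python sketch.  The names Purpose and purpose_for, and the request
kinds, are hypothetical and made up for this example; none of this is an
existing frysk or libdwfl interface.

```python
from enum import Enum


class Purpose(Enum):
    """Why a symbol is being looked up (hypothetical names)."""
    CODE_ADDRESS = "a"    # like a PLT reloc: a jump target
    INITIALIZER = "b"     # like a COPY reloc: static initializer data
    OBJECT_ADDRESS = "c"  # like all other relocs: an object address


# Hypothetical kinds of user request, mapped as item 4 describes them.
_PURPOSE_BY_REQUEST = {
    "call": Purpose.CODE_ADDRESS,               # func(args)
    "breakpoint": Purpose.CODE_ADDRESS,         # "probably", per the text
    "address_of": Purpose.OBJECT_ADDRESS,       # &func
    "live_value": Purpose.OBJECT_ADDRESS,       # read, change, or watch a variable
    "static_initializer": Purpose.INITIALIZER,  # offline, or before loading is done
}


def purpose_for(request_kind: str) -> Purpose:
    """Return the lookup purpose implied by a kind of user request."""
    return _PURPOSE_BY_REQUEST[request_kind]
```

The point of the sketch is only that the purpose is decided by what the
user asked for, before any symbol table is consulted.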

So, gleaned from that, the lookup function should have at hand:

* symbol name
* DWARF scope DIE of the desired context, if available
* language mode, if no scope/CU context
* referring module, if available
* purpose of reference
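
The inputs listed above could be bundled into a single request object.
This is a minimal Python sketch under assumptions: LookupRequest and its
field names are hypothetical, the scope DIE is an opaque placeholder, and
the referring module is a plain string standing in for a Dwfl_Module.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Purpose(Enum):
    CODE_ADDRESS = "a"    # jump target
    INITIALIZER = "b"     # static initializer data
    OBJECT_ADDRESS = "c"  # object address


@dataclass(frozen=True)
class LookupRequest:
    """Everything the lookup should have at hand (hypothetical type)."""
    name: str                                # symbol name
    purpose: Purpose                         # purpose of the reference
    scope_die: Optional[object] = None       # DWARF scope DIE, if available
    language: Optional[str] = None           # language mode, if no scope/CU context
    referring_module: Optional[str] = None   # stand-in for a Dwfl_Module, if available


# A "global, I don't care exactly what I mean" request names no context.
request = LookupRequest(name="func", purpose=Purpose.CODE_ADDRESS)
```

Only the name and the purpose are mandatory here; every piece of context
is optional, which matches the "if available" wording in the list.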

Furthermore, it should be able to return ambiguous results.  That is,
the result of the lookup is either "this one is it", or a list of
candidates each annotated with "this would be it if you were to give
foobar as the referring module" and/or "this would be it if this
reference by module foobar were resolved to symbol version V in module
M".  It's fine for now if the only plan for the ambiguous cases is an
exception "couldn't figure it out" or even "blindly pick the first one".
But one should contemplate how the description of the ambiguity and the
options for resolving it might percolate up to the user.
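
To make the "stub it out for now" plan concrete, here is a minimal Python
sketch of a naive search that still reports ambiguity.  Everything in it
is an assumption for illustration: Candidate, AmbiguousSymbol and
naive_lookup are hypothetical names, modules are plain strings, and the
symbol tables are an in-memory dict rather than anything read from ELF.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """One possible answer: symbol version V defined in module M."""
    defining_module: str
    version: Optional[str]


class AmbiguousSymbol(Exception):
    """The "couldn't figure it out" case; carries the candidate list."""

    def __init__(self, name: str, candidates: List[Candidate]):
        super().__init__(f"couldn't figure out which '{name}' is meant")
        self.candidates = candidates


def naive_lookup(
    name: str,
    definitions: Dict[str, List[Tuple[str, Optional[str]]]],
    referring_module: Optional[str] = None,
    pick_first: bool = False,
) -> Candidate:
    """Search every module's (symbol name, version) list for name.

    referring_module is accepted so callers ask the question the right
    way, but this naive stub ignores it.
    """
    candidates = [
        Candidate(module, version)
        for module, symbols in definitions.items()
        for symbol, version in symbols
        if symbol == name
    ]
    if not candidates:
        raise KeyError(name)
    if len(candidates) == 1 or pick_first:
        return candidates[0]  # "this one is it", or blindly the first one
    raise AmbiguousSymbol(name, candidates)
```

A caller that catches AmbiguousSymbol has the candidate list in hand, so
the choice can be passed up to the user instead of being made silently.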


Thanks,
Roland
