From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <systemtap-return-6927-listarch-systemtap=sources.redhat.com@sourceware.org>
Received: (qmail 27505 invoked by alias); 14 Aug 2007 04:59:58 -0000
Received: (qmail 27193 invoked by uid 22791); 14 Aug 2007 04:59:55 -0000
X-Spam-Status: No, hits=0.2 required=5.0 	tests=AWL,BAYES_50,DK_POLICY_SIGNSOME,FORGED_RCVD_HELO,SPF_HELO_PASS,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mx2.redhat.com (HELO mx2.redhat.com) (66.187.237.31)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 14 Aug 2007 04:59:48 +0000
Received: from gateway.sf.frob.com (c-67-160-211-197.hsd1.ca.comcast.net [67.160.211.197]) 	by mx2.redhat.com (8.13.1/8.13.1) with ESMTP id l7E4xUmH004846; 	Tue, 14 Aug 2007 00:59:40 -0400
Received: from magilla.localdomain (magilla.sf.frob.com [198.49.250.228]) 	by gateway.sf.frob.com (Postfix) with ESMTP 	id 9F293357B; Mon, 13 Aug 2007 21:08:16 -0700 (PDT)
Received: by magilla.localdomain (Postfix, from userid 5281) 	id 6539F4D057D; Mon, 13 Aug 2007 21:07:46 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Roland McGrath <roland@redhat.com>
To: fche@redhat.com (Frank Ch. Eigler)
Cc: systemtap@sources.redhat.com
Subject: Re: elfutils offline mode for user-space?
In-Reply-To: Frank Ch. Eigler's message of  , 23 July 2007 13:50:20 -0400 <y0m4pjv2en7.fsf@ton.toronto.redhat.com>
X-Shopping-List:     (1) Chemical meritorious crosswords    (2) Hobnobbing deviant departure losers    (3) Delinquent climates
Message-Id: <20070814040746.6539F4D057D@magilla.localdomain>
Date: Tue, 14 Aug 2007 15:51:00 -0000
X-RedHat-Blacklist-Warning: Relay 67.160.211.197 is blacklisted by a RBL system
X-RedHat-Spam-Score: 3.702 ***
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <systemtap.sourceware.org>
List-Subscribe: <mailto:systemtap-subscribe@sourceware.org>
List-Post: <mailto:systemtap@sourceware.org>
List-Help: <mailto:systemtap-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2007-q3/txt/msg00333.txt.bz2

> roland wrote:
> > [...]  I see this as just another element of the general "figure out
> > which files to look at" problem.  There just isn't any clear and
> > satisfying means to glean it statically.
> 
> It is likely sufficient to identify all the *candidates* statically.

That's the hard part.  I think "external means", i.e. kludges, are where
you should start.  You can run ldd.  For each packaging system (rpm et al)
you can do some sort of packaging-specific guesswork about what the thing
might dlopen.  (It's a little bleak.)  So, just start prototyping with the
kludges and see what seems to fly.  
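
As a rough sketch of the ldd kludge (in Python, purely for illustration),
something like the following would collect the statically known candidates.
The function names are made up for this example, and the parsing assumes
the usual glibc ldd output layout ("name => /path (0xaddr)").  Note that
ldd may execute code from the file it inspects, so this is only suitable
for trusted binaries.

```python
import subprocess


def parse_ldd_output(text):
    """Return (name, path) pairs for libraries that ldd resolved to a file."""
    candidates = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            name = name.strip()
            # Drop the trailing "(0x...)" load address, if any.
            path = rest.split("(")[0].strip()
        else:
            # Lines without "=>" name the object directly, e.g. the
            # dynamic linker given by absolute path.
            path = line.split("(")[0].strip()
            name = path.rsplit("/", 1)[-1]
        # Skip "not found" entries and pseudo-objects that have no file
        # behind them (the vDSO is listed without a path).
        if path.startswith("/"):
            candidates.append((name, path))
    return candidates


def ldd_candidates(executable):
    """Run ldd on a file and parse what it prints (trusted binaries only)."""
    out = subprocess.run(["ldd", executable], capture_output=True,
                         text=True, check=True).stdout
    return parse_ldd_output(out)
```

This says nothing about what the program might dlopen; that part still
needs the packaging-specific guesswork.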

> > This gets back to the whole question of what the plan is for the
> > model of specifying user-space probe locations.  It's not just about
> > the right list of loaded files.
> 
> The basics would be identifying executables or libraries, and naming
> functions/statements defined therein, as before.  A reference to an
> unlisted shared library could be made probe-able later.

I don't follow what "reference to ... unlisted" means here.

> > I can spin you a whole rant about DWARF names and symbol versioning.
> > But all this belongs on the public mailing lists. [...]
> 
> Fire away.

As you mentioned, there are two basic pieces of specification: module, and
probe location within that module.

The way kernel probes are specified is in essence always from the point of
view of the kernel source.  For modules that is fully sufficient.  Aside
from s/-/_/g, there is only one way that the modules are referred to both
in the kernel source and by users.  For probe locations, it means that
direct probes are purely the province of tapset writers and other kernel
developers.  An average script writer refers only to the probe points
designed by hand and provided in a tapset.  That is fine for the kernel.
Application programmers have no expectation of describing their interests
in the kernel directly in detailed language terms they use in their own
programming.  Even for system calls, there is some understanding that there
is not a one-to-one correspondence between the syscall-like functions in
whatever language binding they use (libc included) and the actual kernel
crossing names and argument encodings.  (The only expectation is to have a
complete syscall tapset that presents terms that are sensible in the abstract.)

For DSOs, the line between "library" (or "system") and "application" (or
"user") is much more blurry.  But the naive support that builds minimally
from what works now for kernel probes yields the same result.  That is, to
specify a probe in a DSO one takes the point of view of the source code
that was built into that DSO.  For any library not part of the user's own
code, including headers as well as DSOs, one must either wade blindly into
that, or rely solely on a tapset provided to go with that DSO.  Maybe that
is OK, or maybe it is not.

In any one user process, there can be as many different source-level points
of view as compilation units making up the modules in the process, and then
there are the ABI points of view.

DWARF data describes the source-level point of view in each compilation
unit.  The name of a function in DWARF data is the name used in the source
to define the function's body.  With aliases, this may not be the same name
by which it's called even elsewhere in the same compilation unit and source
file.  With aliases and symbol versions, it often may not be the same name
used from other modules.  

If a tapset writer asks for a probe on foobar in libfoo.so in the libfoo.so
tapset, he expects the "foobar" used as a source name in building libfoo.so.
This is just like the current kernel case.  If an application writer asks
for a probe on foobar in libfoo.so, he may expect the "foobar" used as a
source name in his application.  He may expect the signature that name had
for use in his application source.  He may expect its parameters to have the
names used in the prototype declaration in the header file included by the
application, or those names s/^_*//, or want to refer to them positionally
given that known signature.

The DWARF data of an application (or any module referring to symbols defined
in another) can include declaration records for external symbols.  These
give the details of the type signature and the source location of the
declaration in a header file.  That is enough to know what the script writer
means to refer to.  With some luck, you can map that into the ABI perspective.
However, I think it's normal for the compiler to elide all those
declarations from the DWARF data when they are externals.

The ABI perspective is driven only by ELF symbols (in the dynamic symbol
table).  For our purposes, a symbol is either undefined (a key with no
value) or defined (a key and a value).  The key is (soname, setname,
symbolname), or just (,, symbolname) for unversioned symbols.  The relevant
part of the value for us is the address.

In the general case, I don't think there is a reliable way to map the
source-level reference a DWARF declaration record describes to the right
undefined symbol.  However, without unusual efforts, one module will not
normally contain two undefined symbols with the same name (but different
version bindings).  So a first approximation is to take the DWARF name
(with appropriate mangling) as the ELF symbol name and find the
undefined symbol with that name.  You may be SOL for things like the
asm("name") decl magic used for "open" to produce "open64" if you are
using -D_FILE_OFFSET_BITS=64.  
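
To make that first approximation concrete, here is a minimal Python sketch.
SymbolKey and find_undefined are hypothetical names invented for this
example, the list of undefined keys is assumed to have been read from the
module's dynamic symbol table by some other means, and the default
"mangling" is the identity (fine for C, not for C++).

```python
from typing import NamedTuple, Optional


class SymbolKey(NamedTuple):
    soname: Optional[str]    # None for unversioned symbols
    setname: Optional[str]   # version set name, None if unversioned
    symbolname: str


def find_undefined(dwarf_name, undefined_keys, mangle=lambda name: name):
    """Match the (mangled) DWARF name against a module's undefined symbols.

    Returns the single matching key, or None if nothing matches.  Raises
    LookupError in the unusual case where several version bindings share
    the same symbol name.
    """
    wanted = mangle(dwarf_name)
    matches = [k for k in undefined_keys if k.symbolname == wanted]
    if not matches:
        return None
    if len(matches) > 1:
        raise LookupError(f"{wanted!r} is ambiguous: {matches}")
    return matches[0]
```

With undefined symbols "read" and "open64" bound to some libc version set,
find_undefined("read", ...) returns the key for "read", while
find_undefined("open", ...) returns None.  That second result is exactly
the asm("name") case: the name-only match has no way to know that the
source-level "open" became "open64".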

Say somehow you got the right symbol key, i.e. the ABI perspective.  You
can find the module(s) with the right soname, or any modules that define
the right unversioned symbol name, and now you have the defined symbol.
All that really tells you is an address.  To correlate this with DWARF in
the defining module, you just have to look up the right CU and look for
functions whose entry_pc matches that address.  Now you've gotten back
around to the source-level perspective of the writer of that function,
all the way from the application source perspective.  You get to
reconcile the two.  If you do know what the application thought the
signature was, you want to sanity-check that for compatibility with the
type of the defined function you found in the DWARF data.  If you had
application source-perspective parameter names, you've probably already
converted those to positional by now.  You can resolve those from DWARF.
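
The walk from symbol key back to the defining function can be sketched as
follows (Python again).  Everything here is an assumption made for
illustration: modules are plain dicts with made-up fields "soname",
"defined" (mapping (setname, symbolname) to an address) and "functions"
(DWARF function records reduced to name, entry_pc, and parameter lists).
A real implementation would get these from the ELF and DWARF readers.

```python
def resolve(key, modules):
    """Map a symbol key to (soname, function) pairs in defining modules.

    key is (soname, setname, symbolname); soname and setname are None
    for an unversioned symbol, in which case any module may define it.
    """
    soname, setname, symbolname = key
    results = []
    for mod in modules:
        if soname is not None and mod["soname"] != soname:
            continue
        addr = mod["defined"].get((setname, symbolname))
        if addr is None:
            continue
        # The symbol only gives an address; correlate it with DWARF by
        # looking for functions whose entry_pc is that address.
        for fn in mod["functions"]:
            if fn["entry_pc"] == addr:
                results.append((mod["soname"], fn))
    return results


def signature_compatible(expected_types, fn):
    """Crude sanity check: same parameter types, in the same order."""
    return list(expected_types) == list(fn["param_types"])


def param_name(fn, position):
    """Resolve a positional parameter to the defining source's name."""
    return fn["param_names"][position]
```

If libfoo.so.1 exports foobar at the address of a function its own source
calls __foobar_impl, resolve() hands back the __foobar_impl record, and
param_name(fn, 0) gives whatever the library's author called the first
parameter, regardless of the name in the application's header.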


So there are many ways to attack all this.  The way I've just described
things is as if doing some very fancy nuanced kind of probe specification
that specifies a resolution context and a target from that perspective.
I'm not suggesting something like that in particular.  

Another approach would be to rely on tapset writers to supply explicit
probes for the functions in their DSOs.  But, give them some help.
For example, automatic probe aliases for all the ABI keys of a
source-level function you name in a probe definition.  This still
relies on some idea of probe resolution context, but in a much simpler
way.  Basically, the context of a user script (or another tapset
script, for that matter) is specified simply as the ABI context of a
given module (i.e., default "the executable" for user scripts).  What
this entails is each other soname referenced, and for each of those
the ABI version (ancestorless symbol version set name) bound to (try
ldd -v).  All that context does is choose among the several sets of
probe aliases each tapset defines.
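
A toy version of that idea, in Python, might look like this.  The names
alias_sets and aliases_for_context are hypothetical, the "defined" mapping
stands in for a DSO's dynamic symbol table, the context dict stands in for
what one would glean from ldd -v, and version set inheritance is ignored
entirely.

```python
def alias_sets(entry_pc, defined):
    """Group the ABI names of one source-level function by version set.

    defined maps (setname, symbolname) to an address; every key whose
    address equals the function's entry_pc is an alias for it.
    """
    sets = {}
    for (setname, symbolname), addr in defined.items():
        if addr == entry_pc:
            sets.setdefault(setname, []).append(symbolname)
    return sets


def aliases_for_context(context, soname, sets):
    """Pick the alias set matching the version the context is bound to.

    context maps each referenced soname to the version set name bound
    to; a soname missing from the context selects the unversioned set.
    """
    return sets.get(context.get(soname), [])
```

For a library whose new implementation is exported as both foobar and
foobar2 in FOO_2.0, while the old foobar stays at a different address in
FOO_1.0, a script resolved in a FOO_2.0 context gets both aliases for the
new function, and a script in a FOO_1.0 context gets none for it.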

Hmm, maybe you could do the same thing implicitly for user function
probes to find those aliases and it's basically the same as a simple
form of the context-qualified probe specifiers.  I'm just thinking out loud.

I'll leave the inlines part of the rant for next time.  
And then there's PLT probes.  It's all probably ... tractable.
But, you know, be afraid.


Thanks,
Roland