public inbox for systemtap@sourceware.org
* Re: [Ksummit-2008-discuss] DTrace
       [not found]               ` <20080630112913.GA18817@lst.de>
@ 2008-06-30 19:27                 ` Frank Ch. Eigler
  2008-07-01  1:21                   ` Jim Keniston
       [not found]                   ` <20080706123414.GA9265@lst.de>
  0 siblings, 2 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-06-30 19:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: ksummit-2008-discuss, systemtap, utrace-devel

Hi -

On Mon, Jun 30, 2008 at 01:29:13PM +0200, Christoph Hellwig wrote:

> [...]  This might be getting a little offtopic for the kernel summit
> discuss list, but let's start anyway, we can move this to a better
> suited list, although I can't think of one except for linux-kernel.

systemtap@sources.redhat.com
utrace-devel@redhat.com


> I'm not sure if that's the current design, but I can't find any
> evidence in the code that it allows running handlers in process
> context, all that's available is a kernel callback.  [...]

For systemtap's purposes, that is sufficient.  Our probes are meant to
run non-intrusively (they do not mess with user thread scheduling,
their VM state, strictly limited time & space consumption), so
actually injecting equivalent snippets of code into userspace for
execution there does not seem to buy anything.  Plus, like dtrace, we
want scripts to be able to intermix probes (=> share data) amongst
kernel and multiple user-space threads, and this seems most naturally
done by running the probes themselves in kernel space.


> [...] What we really need is a userspace interface so that it
> actually can be used by things like frysk or an implementation of the
> userspace dtrace hooks.

That will get built as other tools require it.  Systemtap per se will
likely not.


> [...] For complex traces doing this in userspace is for sure a better idea.

Can you elaborate upon this more complex scenario?


- FChE

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:27                 ` [Ksummit-2008-discuss] DTrace Frank Ch. Eigler
@ 2008-07-01  1:21                   ` Jim Keniston
       [not found]                   ` <20080706123414.GA9265@lst.de>
  1 sibling, 0 replies; 52+ messages in thread
From: Jim Keniston @ 2008-07-01  1:21 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Christoph Hellwig, ksummit-2008-discuss, systemtap, utrace-devel

On Mon, 2008-06-30 at 13:12 -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> On Mon, Jun 30, 2008 at 01:29:13PM +0200, Christoph Hellwig wrote:
> 
> > [...]  This might be getting a little offtopic for the kernel summit
> > discuss list, but let's start anyway, we can move this to a better
> > suited list, although I can't think of one except for linux-kernel.
> 
> systemtap@sources.redhat.com
> utrace-devel@redhat.com
> 
> 
> > I'm not sure if that's the current design, but I can't find any
> > evidence in the code that it allows running handlers in process
> > context, all that's available is a kernel callback.  [...]

To clarify, it's a kernel callback that runs in the context of the
probed thread -- like other utrace-based callbacks.  And like other
utrace-based callbacks, a uprobes handler can block for stuff like
copy_to/from_user()... although I believe systemtap will support only
non-blocking handlers for now.

> 
> For systemtap's purposes, that is sufficient.  Our probes are meant to
> run non-intrusively (they do not mess with user thread scheduling,
> their VM state, strictly limited time & space consumption), so
> actually injecting equivalent snippets of code into userspace for
> execution there does not seem to buy anything.  Plus, like dtrace, we
> want scripts to be able to intermix probes (=> share data) amongst
> kernel and multiple user-space threads, and this seems most naturally
> done by running the probes themselves in kernel space.

Yes.

> 
> 
> > [...] What we really need is a userspace interface so that it
> > actually can be used by things like frysk or an implementation of the
> > userspace dtrace hooks.

Userspace dtrace hooks could be probed using systemtap-generated
uprobes, whether or not the hooks all funnel into the same user-space
handler function.

> 
> That will get built as other tools require it.  Systemtap per se will
> likely not.

Two years back, we explored the possibility of systemtap translating a
script into an ad hoc tracer app that used ptrace.  The idea was that
that would suffice in cases where the user doesn't care to see what's
going on in the kernel.  My experience was that ptrace wasn't up to the
task.  Perhaps when we nail down the right utrace-based,
ptrace-replacement system call interface (utracer II, or whatever -- see
the current discussion on the utrace-devel list), we should revisit
that option.  It would make systemtap accessible to the ordinary
application programmer, without him/her needing root or stapdev to bless
his/her script.

Stuff that's in uprobes (e.g., kprobes-style single-stepping out of
line, to allow real-time tracing of multithreaded apps) can be made
available to the new syscall API and/or utrace.

> 
> 
> > [...] For complex traces doing this in userspace is for sure a better idea.
> 
> Can you elaborate upon this more complex scenario?
> 
> 
> - FChE

Jim Keniston

* Re: [Ksummit-2008-discuss] DTrace
       [not found]                   ` <20080706123414.GA9265@lst.de>
@ 2008-07-06 15:47                     ` Frank Ch. Eigler
  2008-07-06 16:36                       ` Evgeniy Polyakov
  0 siblings, 1 reply; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-06 15:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: ksummit-2008-discuss, systemtap

Hi -


On Sun, Jul 06, 2008 at 02:34:14PM +0200, Christoph Hellwig wrote:

> > For systemtap's purposes, that is sufficient.  Our probes are meant to
> > run non-intrusively (they do not mess with user thread scheduling,

> But that's not what matters.

Really?  Exactly what kinds/degrees of intrusiveness do you believe
would be acceptable, and still be dtrace-level useful, and how did you
come up with that list?

> We don't add kernel interface for out of tree modules.

That is a specific example of an attitude that I hope will be
reexamined if y'all want to support dtrace-level introspection.


> And thinking about it - having to compile out of tree kernel modules
> on the fly to trace user space processes is just braindead.

I gladly grant "counterintuitive", especially if one's intuition is
limited to probing just one's own pet user-space process.  It is a
different matter when one needs to seamlessly probe a mixture of
kernel activities, daemons, and user processes.


> > > [...] For complex traces doing this in userspace is for sure a better idea.
> > 
> > Can you elaborate upon this more complex scenario?
> 
> For complex traces you basically want a ptrace without the signal mess.
> See the utrace list for some design ideas.

I'm well aware of the utrace list traffic, and that describes a
low-level debugger interface API.  You're not describing a "complex ..
trace ...  scenario" -- i.e., the purpose that you imagine
ptrace-via-utrace is *the* appropriate solution for.


- FChE

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 15:47                     ` Frank Ch. Eigler
@ 2008-07-06 16:36                       ` Evgeniy Polyakov
  2008-07-06 18:05                         ` Frank Ch. Eigler
  0 siblings, 1 reply; 52+ messages in thread
From: Evgeniy Polyakov @ 2008-07-06 16:36 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Christoph Hellwig, ksummit-2008-discuss, systemtap

Hi Frank.

On Sun, Jul 06, 2008 at 11:46:12AM -0400, Frank Ch. Eigler (fche@redhat.com) wrote:
> > We don't add kernel interface for out of tree modules.
> 
> That is a specific example of an attitude that I hope will be
> reexamined if y'all want to support dtrace-level introspection.

I believe the only answer you will get is 'no'. Both for dtrace-like
stuff and ability to add unmaintained interfaces into the kernel.

Out-of-tree stuff can appear and disappear, change its internal
structures, API, ABI and degree of fevered consciousness delirium,
but a kernel interface has to be supported for a long, long time after it
was added, so there is no way to add an interface without its users being
in the kernel.

> > And thinking about it - having to compile out of tree kernel modules
> > on the fly to trace user space processes is just braindead.
> 
> I gladly grant "counterintuitive", especially if one's intuition is
> limited to probing just one's own pet user-space process.  It is a
> different matter when one needs to seamlessly probe a mixture of
> kernel activities, daemons, and user processes.
 
Out of curiosity, how in the hell administrator or any other non kernel
hacker person needs to have ability to tap into userspace process via
kernel module?

-- 
	Evgeniy Polyakov

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 16:36                       ` Evgeniy Polyakov
@ 2008-07-06 18:05                         ` Frank Ch. Eigler
  2008-07-06 18:24                           ` Evgeniy Polyakov
  0 siblings, 1 reply; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-06 18:05 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Christoph Hellwig, ksummit-2008-discuss, systemtap

Evgeniy Polyakov <johnpol@2ka.mipt.ru> writes:

>> > We don't add kernel interface for out of tree modules.
>> 
>> That is a specific example of an attitude that I hope will be
>> reexamined if y'all want to support dtrace-level introspection.
>
> I believe the only answer you will get is 'no'. Both for dtrace-like
> stuff and ability to add unmaintained interfaces into the kernel.
>
> Out-of-tree stuff can appear and disappear, change its internal
> structures, API, ABI [...]

Yes, well, it turns out that the number of systemtap-specific kernel
interfaces we have requested is ... precisely ... zero.

We have on occasion asked that some established module interfaces
simply not be *unexported*, but almost all of those requests have been
turned down, requiring us to kludge.  We have helped promote a
systemtap-neutral instrumentation mechanism (markers), along with a
project with a near-decade history of stable instrumentation (ltt/ng),
and one can see the progress (?) that this has made even since Karim's
"emperor is naked" note two (!) years ago.


>> > And thinking about it - having to compile out of tree kernel modules
>> > on the fly to trace user space processes is just braindead.
>> 
>> I gladly grant "counterintuitive", especially if one's intuition is
>> limited to probing just one's own pet user-space process.  It is a
>> different matter when one needs to seamlessly probe a mixture of
>> kernel activities, daemons, and user processes.
>  
> Out of curiosity, how in the hell administrator or any other non kernel
> hacker person needs to have ability to tap into userspace process via
> kernel module?

Think about how a non-intrusive system-wide probing must work, if it
is desirable not to interfere with e.g. thread scheduling or VM state.
Specifically, if we don't want to context-switch from threads (thereby
interfering with contention effects we may want to measure), nor page
data in/out just to satisfy probe data (thereby generating more I/O
and associated distortions).  It seems only kernel-side code can do
all of that.  Do you have a better suggestion?


- FChE

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 18:05                         ` Frank Ch. Eigler
@ 2008-07-06 18:24                           ` Evgeniy Polyakov
  2008-07-06 21:46                             ` Frank Ch. Eigler
  0 siblings, 1 reply; 52+ messages in thread
From: Evgeniy Polyakov @ 2008-07-06 18:24 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Christoph Hellwig, ksummit-2008-discuss, systemtap

On Sun, Jul 06, 2008 at 02:00:03PM -0400, Frank Ch. Eigler (fche@redhat.com) wrote:
> >> That is a specific example of an attitude that I hope will be
> >> reexamined if y'all want to support dtrace-level introspection.
> >
> > I believe the only answer you will get is 'no'. Both for dtrace-like
> > stuff and ability to add unmaintained interfaces into the kernel.
> >
> > Out-of-tree stuff can appear and disappear, change its internal
> > structures, API, ABI [...]
> 
> Yes, well, it turns out that the number of systemtap-specific kernel
> interfaces we have requested is ... precisely ... zero.

Well, in the mail you replied to there was objection to add new
interfaces.

> We have on occasion asked that some established module interfaces
> simply not be *unexported*, but almost all of those requests have been
> turned down, requiring us to kludge.  We have helped promote a
> systemtap-neutral instrumentation mechanism (markers), along with a
> project with a near-decade history of stable instrumentation (ltt/ng),
> and one can see the progress (?) that this has made even since Karim's
> "emperor is naked" note two (!) years ago.

Unexporting some things allows to change them in order to fix some bugs,
create better abstraction, introduce new feature... Having all calls
forever does not provide any gain to the kernel, instead project can be
pushed into the kernel, so anyone would win.

> >> > And thinking about it - having to compile out of tree kernel modules
> >> > on the fly to trace user space processes is just braindead.
> >> 
> >> I gladly grant "counterintuitive", especially if one's intuition is
> >> limited to probing just one's own pet user-space process.  It is a
> >> different matter when one needs to seamlessly probe a mixture of
> >> kernel activities, daemons, and user processes.
> >  
> > Out of curiosity, how in the hell administrator or any other non kernel
> > hacker person needs to have ability to tap into userspace process via
> > kernel module?
> 
> Think about how a non-intrusive system-wide probing must work, if it
> is desirable not to interfere with e.g. thread scheduling or VM state.
> Specifically, if we don't want to context-switch from threads (thereby
> interfering with contention effects we may want to measure), nor page
> data in/out just to satisfy probe data (thereby generating more I/O
> and associated distortions).  It seems only kernel-side code can do
> all of that.  Do you have a better suggestion?

Hmmm... Utrace suddenly stopped to work?
Even ptrace will work in described cases, if requested data is
accessible from userspace. Not sure about VM states, but kernel hides
this data on purpose, if it does need to be viewed from userspace, you
can extend statistics.

And is it really much simpler to use dtrace scripts (btw, does
systemtap has the same complexity of script writing?) for that?

-- 
	Evgeniy Polyakov

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 18:24                           ` Evgeniy Polyakov
@ 2008-07-06 21:46                             ` Frank Ch. Eigler
  2008-07-06 22:47                               ` Karen Shaeffer
  2008-07-07  5:59                               ` Evgeniy Polyakov
  0 siblings, 2 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-06 21:46 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Christoph Hellwig, systemtap

Hi -

On Sun, Jul 06, 2008 at 10:23:08PM +0400, Evgeniy Polyakov wrote:

> > Yes, well, it turns out that the number of systemtap-specific kernel
> > interfaces we have requested is ... precisely ... zero.
> 
> Well, in the mail you replied to there was objection to add new
> interfaces.

That just illustrates the imperfect information floating around.


> > We have on occasion asked that some established module interfaces
> > simply not be *unexported*, but almost all of those requests have been
> > turned down, requiring us to kludge.  [...]
> 
> Unexporting some things allows to change them in order to fix some bugs,
> create better abstraction, introduce new feature... Having all calls
> forever does not provide any gain to the kernel, instead project can be
> pushed into the kernel, so anyone would win.

OK, so you think systemtap should go into the kernel ...


> > Think about how a non-intrusive system-wide probing must work,
> > [...]  It seems only kernel-side code can do all of that.  Do you
> > have a better suggestion?

> Hmmm... Utrace suddenly stopped to work?

(I assume you know that utrace is a kernel-side API.  You may realize
that we are using it (via another layered module uprobes) to place
probes into user-space programs.)


> Even ptrace will work in described cases, if requested data is
> accessible from userspace. [...]

... but now systemtap should stay out in userspace?  I don't understand.


> And is it really much simpler to use dtrace scripts [...] for that?

Simpler than what?  A userspace debugger that messes with thread state?


> (btw, does systemtap has the same complexity of script writing?

If you point out an example of what you consider a complex dtrace
script, I can try to answer that.


- FChE

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 21:46                             ` Frank Ch. Eigler
@ 2008-07-06 22:47                               ` Karen Shaeffer
  2008-07-06 23:15                                 ` Frank Ch. Eigler
  2008-07-07  5:59                               ` Evgeniy Polyakov
  1 sibling, 1 reply; 52+ messages in thread
From: Karen Shaeffer @ 2008-07-06 22:47 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Sun, Jul 06, 2008 at 05:39:20PM -0400, Frank Ch. Eigler wrote:
> 
> (I assume you know that utrace is a kernel-side API.  You may realize
> is that we are using it (via another layered module uprobes) to place
> probes into user-space programs.)

Hi Frank,
I was wondering. I am going to try to work with systemtap with
a kernel.org 2.6.25.x kernel. My understanding is that neither
utrace nor uprobes is in there. I did clone systemtap git, but
it doesn't look like there are any kernel patches in there.

Roland's utrace kernel patches stop at 2.6.24. Please advise
me?

Thanks,
Karen
-- 
 Karen Shaeffer
 Neuralscape, Palo Alto, Ca. 94306
 shaeffer@neuralscape.com  http://www.neuralscape.com

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 22:47                               ` Karen Shaeffer
@ 2008-07-06 23:15                                 ` Frank Ch. Eigler
  0 siblings, 0 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-06 23:15 UTC (permalink / raw)
  To: Karen Shaeffer; +Cc: systemtap

Hi -

On Sun, Jul 06, 2008 at 03:46:31PM -0700, Karen Shaeffer wrote:
> > (I assume you know that utrace is a kernel-side API.  You may realize
> > is that we are using it (via another layered module uprobes) to place
> > probes into user-space programs.)
> 
> I was wondering. I am going to try to work with systemtap with
> a kernel.org 2.6.25.x kernel. My understanding is that neither
> utrace nor uprobes is in there. I did clone systemtap git, but
> it doesn't look like there are any kernel patches in there.

The upstream-this-later parts are in .../runtime/uprobes in the form
of a self-contained module, not as a kernel patch.

> Roland's utrace kernel patches stop at 2.6.24. Please advise me?

See roland's utrace git tree
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git


- FChE

* Re: [Ksummit-2008-discuss] DTrace
       [not found]                 ` <1215273663.3439.34.camel@localhost.localdomain>
@ 2008-07-06 23:33                   ` Frank Ch. Eigler
  2008-07-07 14:35                     ` James Bottomley
  2008-07-07 15:02                     ` James Bottomley
  0 siblings, 2 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-06 23:33 UTC (permalink / raw)
  To: James Bottomley; +Cc: Karen Shaeffer, Roland Dreier, systemtap

Hi, James -


> [...]  The problem is that SystemTap hasn't really benefited from
> community based innovation largely because it doesn't have much of a
> community.  The bigger picture problem Red Hat didn't see when they
> accepted the cash was that this project wouldn't generate a
> community just from the usual publish the code and they will come
> philosophy.  [...]

Red Hat didn't help build systemtap on "contract" if that's what you
think.  It has not been the sole player either.  Many of the same
customers who really want to use the tool have been partners,
contributing considerable ongoing engineering talent to help it along.


> [...]  The rising challenge is to find a way of bringing open source
> methods to bear on this class of problem.  What SystemTap seems to
> have demonstrated nicely is that simply paying a third party to
> solve your problem doesn't really work largely because the tight
> feedback loop between the producer and the consumer that drives open
> source innovation is broken. [...]

This analysis really doesn't seem to fit well to our actual situation.
We've been soliciting/collecting community contributions from the
beginning.  We've been active on LKML since 2005.  Any accusation that
we're not displaying "open source innovation" needs better evidence
than that some LKML seniors have not been very supportive / satisfied
users.


- FChE

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 21:46                             ` Frank Ch. Eigler
  2008-07-06 22:47                               ` Karen Shaeffer
@ 2008-07-07  5:59                               ` Evgeniy Polyakov
  2008-07-07 11:19                                 ` Frank Ch. Eigler
  1 sibling, 1 reply; 52+ messages in thread
From: Evgeniy Polyakov @ 2008-07-07  5:59 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Christoph Hellwig, systemtap

Hi Frank.

On Sun, Jul 06, 2008 at 05:39:20PM -0400, Frank Ch. Eigler (fche@redhat.com) wrote:
> > Unexporting some things allows to change them in order to fix some bugs,
> > create better abstraction, introduce new feature... Having all calls
> > forever does not provide any gain to the kernel, instead project can be
> > pushed into the kernel, so anyone would win.
> 
> OK, so you think systemtap should go into the kernel ...

Getting history of systemtap, I'm not sure it will be easily possible :)
But it is good project for overall Linux advertisement.
Personally I will not use it, since it is simpler (and very likely has
smaller performance overhead) to put various kinds of counters all over
needed places to get required statistics for hot paths like VM and put
debug prints in the slow parts.

> > Hmmm... Utrace suddenly stopped to work?
> 
> (I assume you know that utrace is a kernel-side API.  You may realize
> is that we are using it (via another layered module uprobes) to place
> probes into user-space programs.)

And adds additional layer.
 
> > Even ptrace will work in described cases, if requested data is
> > accessible from userspace. [...]
> 
> ... but now systemtap should stay out in userspace?  I don't understand.

You asked how to get needed information. It can be accessed via ptrace,
so there is no need to have additional kernel module to get it.

> > And is it really much simpler to use dtrace scripts [...] for that?
> 
> Simpler than what?  A userspace debugger that messes with thread state?

Yes, the ptrace approach is a bit messy; still, there is utrace, and
now an additional systemtap layer.

> > (btw, does systemtap has the same complexity of script writing?
> 
> If you point out an example of what you consider a complex dtrace
> script, I can try to answer that.

Things like  syscall::*read:entry, syscall::*write:entry
this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
I.e. 'this' and 'self' pointer

-- 
	Evgeniy Polyakov

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-07  5:59                               ` Evgeniy Polyakov
@ 2008-07-07 11:19                                 ` Frank Ch. Eigler
  0 siblings, 0 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-07 11:19 UTC (permalink / raw)
  To: systemtap


Evgeniy Polyakov <johnpol@2ka.mipt.ru> writes:

>> OK, so you think systemtap should go into the kernel ...
>
> Getting history of systemtap, I'm not sure it will be easily possible :)
> But it is good project for overall Linux advertisement.

> Personally I will not use it, since it is simpler (and very likely
> has smaller performance overhead) to put various kinds of counters
> all over needed places to get required statistics for hot paths
> like VM and put debug prints in the slow parts.

You may be right for your specific use scenario, or you may be
mistaken.  That is a separate matter from systemtap implementation.


>> [...]
>> ... but now systemtap should stay out in userspace?  I don't understand.
>
> You asked how to get needed information. It can be accessed via ptrace,
> so there is no need to have additional kernel module to get it.

You would need to delve deeper into the problem domain then to
understand why dynamic probing is useful to people, and why
nonintrusive user-space probing is useful to people.  Opinions about
"how" to implement something less are not that useful if one didn't
understand/agree-with the "why" (requirements) and "what" (design)
in the first place.


>> > (btw, does systemtap has the same complexity of script writing?
>> 
>> If you point out an example of what you consider a complex dtrace
>> script, I can try to answer that.
>
> Things like  syscall::*read:entry, syscall::*write:entry
> this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
> I.e. 'this' and 'self' pointer

"this" in dtrace is shorthand for a thread-specific named data field.
In systemtap, one might use an explicit associative-array expression
"filep[tid()]" to save such values between probes.

FWIW, this script does not strike me as that complex - I've seen much
"worse", both there and in systemtap.  Nor does it strike me as
unnecessarily complex, considering how you'd have to do the same thing
in C code or kprobes or user-space audit or whatever.  YMMV.
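
To illustrate the "filep[tid()]" idiom mentioned above, here is a
minimal sketch in Python rather than in systemtap script.  It only
models the bookkeeping: a global table keyed by thread id, filled by an
entry handler and consumed by a return handler.  The names on_entry,
on_return, traced_call and run_demo are hypothetical,
threading.get_ident() stands in for tid(), and the fake return value
fd * 10 is made up for the demo; none of this is systemtap API.

```python
import threading

# Stand-in for a global associative array such as filep[tid()].
filep = {}
# Handlers run on several threads at once, so guard the shared table.
filep_lock = threading.Lock()


def on_entry(fd):
    """Model of an entry probe: remember a value for the current thread."""
    with filep_lock:
        filep[threading.get_ident()] = fd


def on_return(retval):
    """Model of a return probe: pair the saved value with the result."""
    with filep_lock:
        # pop() also deletes the entry, so the table does not grow forever.
        fd = filep.pop(threading.get_ident(), None)
    if fd is None:
        return None  # no matching entry was seen on this thread
    return (fd, retval)


def traced_call(fd, results):
    """Hypothetical 'syscall': fire the entry handler, then the return handler."""
    on_entry(fd)
    results.append(on_return(retval=fd * 10))  # fake return value for the demo


def run_demo(fds):
    """Run one traced call per thread and collect (saved fd, retval) pairs."""
    results = []
    threads = [threading.Thread(target=traced_call, args=(fd, results))
               for fd in fds]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_demo([3, 4, 5]))
```

The point of keying by thread id is that a value saved in one probe is
found again by a later probe on the same thread, and no thread ever
sees another thread's saved value.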


- FChE

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 23:33                   ` Frank Ch. Eigler
@ 2008-07-07 14:35                     ` James Bottomley
  2008-07-07 15:02                     ` James Bottomley
  1 sibling, 0 replies; 52+ messages in thread
From: James Bottomley @ 2008-07-07 14:35 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Karen Shaeffer, Roland Dreier, systemtap

On Sun, 2008-07-06 at 19:31 -0400, Frank Ch. Eigler wrote:
> Hi, James -
> 
> 
> > [...]  The problem is that SystemTap hasn't really benefited from
> > community based innovation largely because it doesn't have much of a
> > community.  The bigger picture problem Red Hat didn't see when they
> > accepted the cash was that this project wouldn't generate a
> > community just from the usual publish the code and they will come
> > philosophy.  [...]
> 
> Red Hat didn't help build systemtap on "contract" if that's what you
> think.  It has not been the sole player either.  Many of the same
> customers who really want to use the tool have been partners,
> contributing considerable ongoing engineering talent to help it along.

My observation has been that companies who recognise the need for this
functionality for their customers (IBM, Intel for example) have
contributed engineering resources.  Customers who want it (at least
those I've talked to) seem only to have contributed cash.

> > [...]  The rising challenge is to find a way of bringing open source
> > methods to bear on this class of problem.  What SystemTap seems to
> > have demonstrated nicely is that simply paying a third party to
> > solve your problem doesn't really work largely because the tight
> > feedback loop between the producer and the consumer that drives open
> > source innovation is broken. [...]
> 
> This analysis really doesn't seem to fit well to our actual situation.
> We've been soliciting/collecting community contributions from the
> beginning.  We've been active on LKML since 2005.  Any accusation that
> we're not displaying "open source innovation" needs better evidence
> than that some LKML seniors have not been very supportive / satisfied
> users.

I actually covered this in a piece you failed to quote:

        The bigger picture problem Red Hat didn't see when they accepted
        the cash was that this project wouldn't generate a community
        just from the usual publish the code and they will come
        philosophy.
        
Your quote above pretty much bears this out ... especially as you seem
to be blaming others for the actual lack of community formation.

If it helps, I also didn't blame you or Red Hat for this.  Open Source
communities (at least from the distro point of view) have been largely
self assembling to date, driven by overriding pressures like need and
interest.

SystemTap is different: its notional community seems to come as some
assembly required, so what I was trying to analyse was why that was (and
possibly think about how it could be assembled).  One of the ironies of
this situation is that the distros are quite a bit further behind the
curve on this one than a lot of other companies who've all had
experience of launching an open source project (or simply publishing the
code) and seeing it die and then trying to work out how to do better.

James


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-06 23:33                   ` Frank Ch. Eigler
  2008-07-07 14:35                     ` James Bottomley
@ 2008-07-07 15:02                     ` James Bottomley
  1 sibling, 0 replies; 52+ messages in thread
From: James Bottomley @ 2008-07-07 15:02 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Karen Shaeffer, Roland Dreier, systemtap, ksummit-2008-discuss

Don't drop the ks discuss list, since very few people who're
participating will be subscribed to that.

On Sun, 2008-07-06 at 19:31 -0400, Frank Ch. Eigler wrote:
> Hi, James -
> 
> 
> > [...]  The problem is that SystemTap hasn't really benefited from
> > community based innovation largely because it doesn't have much of a
> > community.  The bigger picture problem Red Hat didn't see when they
> > accepted the cash was that this project wouldn't generate a
> > community just from the usual publish the code and they will come
> > philosophy.  [...]
> 
> Red Hat didn't help build systemtap on "contract" if that's what you
> think.  It has not been the sole player either.  Many of the same
> customers who really want to use the tool have been partners,
> contributing considerable ongoing engineering talent to help it along.

My observation has been that companies who recognise the need for this
functionality for their customers (IBM, Intel for example) have
contributed engineering resources.  Customers who want it (at least
those I've talked to) seem only to have contributed cash.

> > [...]  The rising challenge is to find a way of bringing open source
> > methods to bear on this class of problem.  What SystemTap seems to
> > have demonstrated nicely is that simply paying a third party to
> > solve your problem doesn't really work largely because the tight
> > feedback loop between the producer and the consumer that drives open
> > source innovation is broken. [...]
> 
> This analysis really doesn't seem to fit well to our actual situation.
> We've been soliciting/collecting community contributions from the
> beginning.  We've been active on LKML since 2005.  Any accusation that
> we're not displaying "open source innovation" needs better evidence
> than that some LKML seniors have not been very supportive / satisfied
> users.

I actually covered this in a piece you failed to quote:

        The bigger picture problem Red Hat didn't see when they accepted
        the cash was that this project wouldn't generate a community
        just from the usual publish the code and they will come
        philosophy.
        
Your quote above pretty much bears this out ... especially as you seem
to be blaming others for the actual lack of community formation.

If it helps, I also didn't blame you or Red Hat for this.  Open Source
communities (at least from the distro point of view) have been largely
self assembling to date, driven by overriding pressures like need and
interest.

SystemTap is different: its notional community seems to come as some
assembly required, so what I was trying to analyse was why that was (and
possibly think about how it could be assembled).  One of the ironies of
this situation is that the distros are quite a bit further behind the
curve on this one than a lot of other companies who've all had
experience of launching an open source project (or simply publishing the
code) and seeing it die and then trying to work out how to do better.

James


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:59 ` James Bottomley
  2008-06-30 23:52   ` Masami Hiramatsu
@ 2008-07-08 23:32   ` Eric W. Biederman
  1 sibling, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2008-07-08 23:32 UTC (permalink / raw)
  To: James Bottomley; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> I've also found it very easy to crash the system under probe if you use
> the wrong build tree for the running kernel (not a problem, I know that
> enterprise customers run into, but a common one for kernel developers).
> Since we have a kernel build version that increments with every build,
> it would be useful to sanity check the one systemtap pulled out of the
> debug with the one in the running kernel.

Interesting.  I wonder how much the DTrace in kernel interpreter
protects them from that.
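
A rough sketch of the sanity check suggested in the quoted paragraph,
assuming the debug vmlinux still embeds the kernel's "Linux version ..."
banner and that /proc/version reports the same banner for the running
kernel.  The function names and the example paths are illustrative only;
nothing here is an existing systemtap feature.

```shell
#!/bin/sh
# Print the first "Linux version ..." banner embedded in a file.
# -a treats a binary vmlinux as text, -o prints only the matched part.
kernel_banner() {
    grep -a -o 'Linux version [[:print:]]*' "$1" | head -n 1
}

# Succeed only if both files carry the same banner, i.e. the release,
# builder, compiler and the "#N" build counter all agree.
same_kernel_build() {
    debug_banner=$(kernel_banner "$1")
    running_banner=$(kernel_banner "$2")
    [ -n "$debug_banner" ] && [ "$debug_banner" = "$running_banner" ]
}

# Example (paths are assumptions):
#   same_kernel_build /usr/lib/debug/vmlinux /proc/version ||
#       echo "debuginfo does not match the running kernel" >&2
```

Because the banner includes the per-build counter, a check like this
should notice a vmlinux left over from an earlier build of the same tree.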

>> * dtrace "just works"
>> 
>>   Yeah, so I hear, but think about how different their target
>>   environment is.  Their kernel hardly changes (several fixed APIs,
>>   ABIs): this has huge implications.  Their kernel was willing to
>>   insert probes (~ markers), a bunch of build system changes (debug
>>   info subset transcribing).  Here in linux land, we suffer
>>   multifaceted tensions and it is hard to go toward a goal without
>>   obstructions (well-meaning as they may be).
>
> The goal has to be well articulated and agreed to.  Open source is rapid
> at progressing towards common goals ... it's when the goals aren't
> common that progress gets bogged down.

In addition to a well articulated goal, a feature-poor implementation
that works and gives people lots of itches to scratch helps.

What is the goal with SystemTap?

The goal with Dtrace (and I am badly paraphrasing) is to allow a system
level look into what is happening.

There was a talk at OLS in 2006 entitled "Why user space sucks,"
which pointed out all of the silly things that were happening during
boot from a system level perspective.  How do we make that kind of
tracing easy with SystemTap?

Those are the interesting questions, and in general they shouldn't
need complicated tapsets and other prebuilt knowledge.  It is the
global view of how things are connected that looks most interesting.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-07 14:36                         ` Christoph Hellwig
@ 2008-07-07 17:44                           ` K.Prasad
  0 siblings, 0 replies; 52+ messages in thread
From: K.Prasad @ 2008-07-07 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Theodore Tso, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sun, Jul 06, 2008 at 02:36:47PM +0200, Christoph Hellwig wrote:
> On Sat, Jul 05, 2008 at 11:34:32PM +0530, K.Prasad wrote:
> > > You can write probes in plain C, in fact I do this all the time.  what's
> > > missing is a nice and easy to use channel to get the traces to userspace
> > > and interpret them, and helper for poking at kernel data structures.
> > >
> > As you might be aware the "trace" interface which is part of the -mm
> > tree was meant to satisfy such needs.
> 
> From the interface POV it's a step in the right direction.  But to make
> adhoc kprobe tracing viable for anyone but hardcore kernel hackers we
> basically want a /debug/trace always available so that the traces just
> need to relay_printk() on it.

The relay_printk() creates a directory structure that contains control
and data files under the 'debugfs' mounted path, while the user is kept
oblivious of the internals that are required to set it up. The user
simply does a relay_printk(relay_structure, "my desired string") and the
output is sent to the data file(s).

> We also want some helpers to encode
> complex structures.  See the current printk on steroids discussion on
> lkml which would be pretty helpful for it.

On a related note, one of the improvements envisaged for the
"relay_printk()" interface was the ability to have user-defined
callback functions, which would be invoked before the "trace"
infrastructure actually accesses the data for output.
Typical uses could range from defining callbacks for
obtaining/releasing locks on each protected variable, without the user
having to bother about it, to automatically prefixing/appending a
user-defined string to the data through the callback function
every time before it is output.

Such callback functions may also be used to encode/decode data into/from
complex data structures before they are sent to user-space without the
need to explicitly program them before every *printk(). Say something
like - also print i_ino when printing i_count etc. Not sure if the
example represents your case of a need for "helpers to encode complex
structures".

Thanks,
K.Prasad

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 18:05                       ` K.Prasad
@ 2008-07-07 14:36                         ` Christoph Hellwig
  2008-07-07 17:44                           ` K.Prasad
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2008-07-07 14:36 UTC (permalink / raw)
  To: K.Prasad
  Cc: Christoph Hellwig, Peter Zijlstra, Theodore Tso,
	ksummit-2008-discuss, Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 11:34:32PM +0530, K.Prasad wrote:
> > You can write probes in plain C, in fact I do this all the time.  what's
> > missing is a nice and easy to use channel to get the traces to userspace
> > and interpret them, and helper for poking at kernel data structures.
> >
> As you might be aware the "trace" interface which is part of the -mm
> tree was meant to satisfy such needs.

From the interface POV it's a step in the right direction.  But to make
adhoc kprobe tracing viable for anyone but hardcore kernel hackers we
basically want a /debug/trace always available so that the traces just
need to relay_printk() on it.  We also want some helpers to encode
complex structures.  See the current printk on steroids discussion on
lkml which would be pretty helpful for it.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 13:50                       ` James Bottomley
@ 2008-07-05 18:08                         ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2008-07-05 18:08 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Peter Zijlstra, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 08:49:49AM -0500, James Bottomley wrote:
> To be fair, you can simply "just write" entry (jprobes) and return
> probes (kretprobes).  For the entry probes, if you want access to the
> function arguments you need to know the deep magic of the calling
> conventions of your platform (pretty easy on x86, though).

For jprobes your probe just has the same signature as the probed
function, and for kretprobes you use the regs_return_value()
arch-provided helper.  It's really quite easy and covers 90% of what
I need as a kernel developer.  Well, except for the cases where gcc
gets too smart and inlines enormous callchains, but at least in XFS
we've just declared everything noinline..

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 12:12                       ` Frank Ch. Eigler
@ 2008-07-05 18:08                         ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2008-07-05 18:08 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Christoph Hellwig, Peter Zijlstra, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 08:10:39AM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> On Sat, Jul 05, 2008 at 12:05:36PM +0200, Christoph Hellwig wrote:
> > > Also, it would be really great if you could write probes in regular C,
> > > some pseudo C language just messes up my mind.
> > 
> > You can write probes in plain C, in fact I do this all the time.  what's
> > missing is a nice and easy to use channel to get the traces to userspace
> > and interpret them, and helper for poking at kernel data structures.
> 
> Perhaps something like this:

But that means I need to pull in all the systemtap crap, which is exactly
what I want to avoid.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:12                       ` Frank Ch. Eigler
  2008-07-05 13:50                       ` James Bottomley
@ 2008-07-05 18:05                       ` K.Prasad
  2008-07-07 14:36                         ` Christoph Hellwig
  2 siblings, 1 reply; 52+ messages in thread
From: K.Prasad @ 2008-07-05 18:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Theodore Tso, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 12:05:36PM +0200, Christoph Hellwig wrote:
> On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> > Also, it would be really great if you could write probes in regular C,
> > some pseudo C language just messes up my mind.
> 
> You can write probes in plain C, in fact I do this all the time.  what's
> missing is a nice and easy to use channel to get the traces to userspace
> and interpret them, and helper for poking at kernel data structures.
>
As you might be aware the "trace" interface which is part of the -mm
tree was meant to satisfy such needs.

Moreover, an enhancement over the "trace" interface, in the form of
relay_printk() [for string output] and relay_dump() [for binary output],
was meant to give the user a printk()-like interface with the
advantages of relay (such as per-cpu buffers).  It is still waiting for
acceptance and use-cases (refer to http://lkml.org/lkml/2008/5/28/212).  A
typical example illustrating the brevity of the code when using the
relay_printk() interface can be found in the
samples/trace/fork_new_trace.c file in the patch.

Thanks,
K.Prasad
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:12                       ` Frank Ch. Eigler
@ 2008-07-05 13:50                       ` James Bottomley
  2008-07-05 18:08                         ` Christoph Hellwig
  2008-07-05 18:05                       ` K.Prasad
  2 siblings, 1 reply; 52+ messages in thread
From: James Bottomley @ 2008-07-05 13:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, ksummit-2008-discuss, Roland McGrath, systemtap

On Sat, 2008-07-05 at 12:05 +0200, Christoph Hellwig wrote:
> On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> > Also, it would be really great if you could write probes in regular C,
> > some pseudo C language just messes up my mind.
> 
> You can write probes in plain C, in fact I do this all the time.  what's
> missing is a nice and easy to use channel to get the traces to userspace
> and interpret them, and helper for poking at kernel data structures.

To be fair, you can simply "just write" entry (jprobes) and return
probes (kretprobes).  For the entry probes, if you want access to the
function arguments you need to know the deep magic of the calling
conventions of your platform (pretty easy on x86, though).  However,
what you can't just write are the arbitrary kprobes in file x line y
because you need to know all the nasty details of dwarf to have a clue
what the absolute address is and where all the local variables you're
trying to look at are.  Now ... is using systemtap to do this easier
than printk?  For me, yes, since a recompile reboot sequence takes quite
a while, perhaps for someone with a faster machine ...

For getting information back, as you know, systemtap uses relayfs.  It's
not the most friendly or efficient thing in the world, so I'm happy to
have it wrappered by systemtap ...

James


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05  9:46                   ` Peter Zijlstra
  2008-07-05 10:07                     ` Christoph Hellwig
@ 2008-07-05 12:34                     ` Theodore Tso
  1 sibling, 0 replies; 52+ messages in thread
From: Theodore Tso @ 2008-07-05 12:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> On Tue, 2008-07-01 at 19:13 -0400, Theodore Tso wrote:
> 
> .. snip Ted talking about how using Systemtap is a pain ..
> 
> Well said. Until the point where using Systemtap is less work than
> recompile and boot I'm not remotely interested in even looking at it.

To be fair, Systemtap has gotten a lot better.  About a week ago or so
the latest version in its git tree actually passes the "download,
configure, make" test on Ubuntu Hardy.  Apparently previous to that
you needed to know about the magic --enable-staticdw configure option,
but its build system has been made much smarter now, which is good.

Kernel compiles with symbols *do* take longer, since DWARF is just
such a bloated method of storing debugging information, and so the
files get much, much bigger.  However, if you are worried about your
root partition filling up due to bloated modules taking up a huge
amount of space, you can do this: "make INSTALL_MOD_STRIP=1 modules_install;
make INSTALL_MOD_PATH=/usr/lib/debug modules_install".
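
Spelled out as a small wrapper, the two-pass install above might look
like the following.  This is only a sketch: the function name and the
MAKE override are made up for illustration, and /usr/lib/debug is simply
the destination used in the text.

```shell
#!/bin/sh
# Run from the top of a built kernel tree.
# Pass 1: install stripped modules into the usual /lib/modules location.
# Pass 2: install a second, unstripped copy under a separate prefix.
# MAKE may be overridden, e.g. MAKE="make -C /path/to/objdir".
install_split_modules() {
    debug_root=${1:-/usr/lib/debug}
    ${MAKE:-make} INSTALL_MOD_STRIP=1 modules_install &&
    ${MAKE:-make} INSTALL_MOD_PATH="$debug_root" modules_install
}
```

With no argument, the unstripped copies should end up under
/usr/lib/debug/lib/modules/<release>, since INSTALL_MOD_PATH is a prefix
in front of the normal module directory.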

I have this partially automated with some patches to make-kpkg, on my
Ubuntu system.  (Available on request).  The real issue is that there
is a bunch of basic usability issues that aren't well documented, or
better yet, automated.  As a result, people hit these things, get
frustrated, and then go back to printk.  They're not going to file
bugs in some bugzilla; they're not going to switch their distribution
to Fedora; they're just going to move on to something more productive.

The really weird thing is that this might make sense if Systemtap were
primarily targeted at system administrators, and not kernel
developers.  If this were so, making a bunch of installation issues
distro-specific, and only worrying about distro-supplied kernels and
not about kernel developer workflow might make sense.  But given that
a common excuse I've heard for the paucity of tapsets is "well, we
need the kernel developer's expertise", and the fact that Systemtap
has various UI decisions that really seem to make it much more
optimized for developers (there are various things that in Dtrace can
be done with a single command line, but which in Systemtap require
much longer scripts, which is bad for system administrators who find
it much easier to use DTrace as a result), this just seems to be a
very puzzling gap.

> Also, it would be really great if you could write probes in regular C,
> some pseudo C language just messes up my mind.

You can.  Just surround the code with %{ and %} symbols.  In fact, you
have to do this a lot in tapsets.

							- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 10:07                     ` Christoph Hellwig
@ 2008-07-05 12:12                       ` Frank Ch. Eigler
  2008-07-05 18:08                         ` Christoph Hellwig
  2008-07-05 13:50                       ` James Bottomley
  2008-07-05 18:05                       ` K.Prasad
  2 siblings, 1 reply; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-05 12:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, ksummit-2008-discuss, Roland McGrath, systemtap

Hi -

On Sat, Jul 05, 2008 at 12:05:36PM +0200, Christoph Hellwig wrote:
> > Also, it would be really great if you could write probes in regular C,
> > some pseudo C language just messes up my mind.
> 
> You can write probes in plain C, in fact I do this all the time.  what's
> missing is a nice and easy to use channel to get the traces to userspace
> and interpret them, and helper for poking at kernel data structures.

Perhaps something like this:


probe kernel.function("foobar") // or kernel.statement("*@dir/file.c:222")
{
    probe_me_harder ($var1, $ptr->field)
}

%{
#include "linux/something.h"
%}
function probe_me_harder (v1, f1) 
%{
   struct something *s = (struct something*) THIS->v1;
   struct something_else *w = (struct something_else*) THIS->f1;
   call_some_safe_kernel_function__be_careful_out_there (s, w);
   _stp_printf ("I did something fun with %p and %p\n", s, w);
%}


# stap -g probe.stp
I did something fun with 0xdick and 0xjane.
^C


- FChE

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05  9:46                   ` Peter Zijlstra
@ 2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:12                       ` Frank Ch. Eigler
                                         ` (2 more replies)
  2008-07-05 12:34                     ` Theodore Tso
  1 sibling, 3 replies; 52+ messages in thread
From: Christoph Hellwig @ 2008-07-05 10:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Theodore Tso, ksummit-2008-discuss, Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> Also, it would be really great if you could write probes in regular C,
> some pseudo C language just messes up my mind.

You can write probes in plain C, in fact I do this all the time.  what's
missing is a nice and easy to use channel to get the traces to userspace
and interpret them, and helper for poking at kernel data structures.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
                                     ` (2 preceding siblings ...)
  2008-07-02 20:08                   ` Joel Becker
@ 2008-07-05  9:46                   ` Peter Zijlstra
  2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:34                     ` Theodore Tso
  3 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2008-07-05  9:46 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, 2008-07-01 at 19:13 -0400, Theodore Tso wrote:

.. snip Ted talking about how using Systemtap is a pain ..

Well said. Until the point where using Systemtap is less work than
recompile and boot I'm not remotely interested in even looking at it.

Also, it would be really great if you could write probes in regular C,
some pseudo C language just messes up my mind.



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 21:30                       ` Theodore Tso
@ 2008-07-02 21:46                         ` J. Bruce Fields
  0 siblings, 0 replies; 52+ messages in thread
From: J. Bruce Fields @ 2008-07-02 21:46 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Wed, Jul 02, 2008 at 05:29:28PM -0400, Theodore Tso wrote:
> On Wed, Jul 02, 2008 at 04:16:37PM -0400, J. Bruce Fields wrote:
> > 
> > That would be nice.  But I'm afraid I normally don't even have access to
> > the kernel tree on the machine I'm installing to--I usually build a
> > monolithic kernel and then scp it to the test machines.  Is there hope
> > for me?
> > 
> 
> Well, that's why I build kernels packaged up in .deb format; it's much
> easier to scp a .deb file over to a test machine and install it.  The
> take-home here is that "make deb-pkg" and "make rpm-pkg" should
> ideally create bare-bones linux-image, linux-headers, and
> linux-debuginfo packages.  Maybe no distribution will ever use it, but
> it will be invaluable for kernel developers.

Oh, sure, that's a good idea.

One other minor consideration: I do sometimes find myself building on my
laptop while away on a slow network, and wanting to install the result
on a test machine back at the office.  So it's nice if the resulting
debs or rpms aren't too much larger than they need to be.

Though scp'ing a kernel in that situation is already pretty slow, so
I've mostly gotten used to doing a git push to a machine in the same
room as the test machines, then building there.

--b.

> The good news is that a member of the Debian kernel team
> contacted me off-line and mentioned that he plans to work with Sam to
> send a patch series to get "make deb-pkg" working again, and in the
> long term with proper debuginfo support.  Maybe someone will be
> inspired to enhance "rpm-pkg" to do the right thing.
> 
> 	    	    	      	    	      - Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-02 20:41                       ` Frank Ch. Eigler
  2008-07-02 21:19                       ` H. Peter Anvin
@ 2008-07-02 21:30                       ` Theodore Tso
  2008-07-02 21:46                         ` J. Bruce Fields
  2 siblings, 1 reply; 52+ messages in thread
From: Theodore Tso @ 2008-07-02 21:30 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Wed, Jul 02, 2008 at 04:16:37PM -0400, J. Bruce Fields wrote:
> 
> That would be nice.  But I'm afraid I normally don't even have access to
> the kernel tree on the machine I'm installing to--I usually build a
> monolithic kernel and then scp it to the test machines.  Is there hope
> for me?
> 

Well, that's why I build kernels packaged up in .deb format; it's much
easier to scp a .deb file over to a test machine and install it.  The
take-home here is that "make deb-pkg" and "make rpm-pkg" should
ideally create bare-bones linux-image, linux-headers, and
linux-debuginfo packages.  Maybe no distribution will ever use it, but
it will be invaluable for kernel developers.

The good news is that a member of the Debian kernel team
contacted me off-line and mentioned that he plans to work with Sam to
send a patch series to get "make deb-pkg" working again, and in the
long term with proper debuginfo support.  Maybe someone will be
inspired to enhance "rpm-pkg" to do the right thing.

	    	    	      	    	      - Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-02 20:41                       ` Frank Ch. Eigler
@ 2008-07-02 21:19                       ` H. Peter Anvin
  2008-07-02 21:30                       ` Theodore Tso
  2 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2008-07-02 21:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Theodore Tso, Roland McGrath, ksummit-2008-discuss, systemtap

J. Bruce Fields wrote:
> 
> That would be nice.  But I'm afraid I normally don't even have access to
> the kernel tree on the machine I'm installing to--I usually build a
> monolithic kernel and then scp it to the test machines.  Is there hope
> for me?
> 

I usually don't even do the scp bit.  There is always NFS, I guess.

	-hpa

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:17                     ` J. Bruce Fields
@ 2008-07-02 20:41                       ` Frank Ch. Eigler
  2008-07-02 21:19                       ` H. Peter Anvin
  2008-07-02 21:30                       ` Theodore Tso
  2 siblings, 0 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-02 20:41 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Theodore Tso, Roland McGrath, ksummit-2008-discuss, systemtap

>> [...]
>> 	Me too.  I want to be able to say "make install; make
>> tap_install" in my kernel objdir.  "install" does what it always has
>> done - no change.  "tap_install" (or whatever) drops things in eg
>> /lib/modules/<version>/debug such that systemtap Just Works.

OK, we'll try to work out something like that soon.


"J. Bruce Fields" <bfields@fieldses.org> writes:

> That would be nice.  But I'm afraid I normally don't even have
> access to the kernel tree on the machine I'm installing to--I
> usually build a monolithic kernel and then scp it to the test
> machines.  Is there hope for me?

You can cross-compile systemtap scripts today, if that kernel at least
is built with CONFIG_MODULES etc.  On your development host:

   % stap -p4 SCRIPT
   % scp RESULT.ko target-machine:
   % ssh root@target-machine staprun RESULT.ko

or something very close to that.  The Fedora/RHEL packages separate a
"systemtap-runtime" piece consisting of one or two small binaries that
need to go onto the target machine.  We're working on a target-side
driven network client/server widget to fully automate this.

- FChE

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 12:13                 ` Theodore Tso
@ 2008-07-02 20:27                   ` Sam Ravnborg
  0 siblings, 0 replies; 52+ messages in thread
From: Sam Ravnborg @ 2008-07-02 20:27 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 08:12:43AM -0400, Theodore Tso wrote:
> On Tue, Jul 01, 2008 at 01:05:17PM +0200, Sam Ravnborg wrote:
> > On Tue, Jul 01, 2008 at 06:15:07AM -0400, Theodore Tso wrote:
> > > 
> > > I've pulled apart RHEL's rpm macro magic before, and it's not a
> > > pleasant wading through all of the files; maybe we can teach the
> > > native kernel build infrastructure how to create debuginfo files so
> > > that each distribution doesn't have to re-invent the wheel from
> > > scratch, but rather can reuse common infrastructure in Kbuild....
> > 
> > What is needed to create debuginfo files?
> > Seems like a simple thing to integrate in kbuild
> > if this is per file or per module.
> 
> Well, the simplest/stupidest thing we can do is simply have an alternate
> target which installs the modules in
> 
>        $(INSTALL_MOD_PATH)/usr/lib/debug/$(KERNELRELEASE)
> 
> ... while ignoring the INSTALL_MOD_STRIP option.  You may recall that
> that I submitted the patch to add INSTALL_MOD_STRIP (commit
> ac031f26e); this was from an earlier attempt of mine to use
> kdump/systemtap.
> 
> RHEL's rpm macro magic does some additional objcopy's which I'll have
> to try to tease out to strip out the text segments and only leave the
> debug information in debuginfo files, which helps slim them down a
> little.  
> 
> > 
> > If it is for the kernel as a whole things gets a bit more complex.
> > 
> 
> It would be nice to do this for the base kernel as well (a vmlinux
> with strip --strip-debug applied takes only 6 megs in /boot on my
> system, but a vmlinux with full debugging information takes 66 megs;
> so moving the unstripped vmlinux out of /boot to /usr/lib/debug would
> be quite helpful for people who created their /boot partition not
> allowing for the rather dramatic increase in size needed for kernels
> built with debugging information.)

It all seems quite simple to do with a bit of careful testing.
I can give it a try when I'm properly installed in my new house.
So let's hope someone jumps in and does it.

	Sam

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:08                   ` Joel Becker
@ 2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-02 20:41                       ` Frank Ch. Eigler
                                         ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: J. Bruce Fields @ 2008-07-02 20:17 UTC (permalink / raw)
  To: Theodore Tso, Roland McGrath, ksummit-2008-discuss, systemtap

On Wed, Jul 02, 2008 at 01:06:51PM -0700, Joel Becker wrote:
> On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:
> > And remember, for the average kernel developer, the question is
> > whether using SystemTap is easier than inserting a bunch of printk's
> 
> 	I'll throw in a datapoint here.  I recently had to track a
> problem down on a distro kernel, and rebuilding distro kernels takes a
> lot of time.  So I decided to try SystemTap.  Once I'd discovered the
> magic location of the distro's debuginfo package, systemtap was *WAY*
> faster than printk+recompile.  I mean, we're talking 30 second turnaround
> between "Oh, I'd like to print this other value" and actually printing
> it.  In the core kernel, not a module.  No reboot, no nothin.  This is a
> huge win.
> 	But I'll never replicate that for my normal work at this rate.
> I'm usually floating multiple hand-built mainline kernels with new
> work.  Just like Ted describes.
> 
> > repository or weekly automated snapshots.)  So actually, being able to
> > install stripped modules and vmlinux into /boot and /linux, and then
> > being able to put the unstripped binaries somewhere else, without
> > having to use the !@#@! complicated RPM macros by Fedora/RHEL is
> > actually **very** important to me.
> 
> 	Me too.  I want to be able to say "make install; make
> tap_install" in my kernel objdir.  "install" does what it always has
> done - no change.  "tap_install" (or whatever) drops things in eg
> /lib/modules/<version>/debug such that systemtap Just Works.

That would be nice.  But I'm afraid I normally don't even have access to
the kernel tree on the machine I'm installing to--I usually build a
monolithic kernel and then scp it to the test machines.  Is there hope
for me?

--b.

> It can
> error if systemtap isn't installed or is too old.  But I shouldn't have
> to build a distro package of my kernel, or even understand the
> mechanism for building 'debuginfo' bits (even if I do).
> 
> Joel
> 
> -- 
> 
> "I am working for the time when unqualified blacks, browns, and
>  women join the unqualified men in running our government."
> 	- Sissy Farenthold
> 
> Joel Becker
> Principal Software Developer
> Oracle
> E-mail: joel.becker@oracle.com
> Phone: (650) 506-8127
> _______________________________________________
> Ksummit-2008-discuss mailing list
> Ksummit-2008-discuss@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/ksummit-2008-discuss

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
  2008-07-02  2:23                   ` Frank Ch. Eigler
  2008-07-02 19:27                   ` Frank Ch. Eigler
@ 2008-07-02 20:08                   ` Joel Becker
  2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-05  9:46                   ` Peter Zijlstra
  3 siblings, 1 reply; 52+ messages in thread
From: Joel Becker @ 2008-07-02 20:08 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:
> And remember, for the average kernel developer, the question is
> whether using SystemTap is easier than inserting a bunch of printk's

	I'll throw in a datapoint here.  I recently had to track a
problem down on a distro kernel, and rebuilding distro kernels takes a
lot of time.  So I decided to try SystemTap.  Once I'd discovered the
magic location of the distro's debuginfo package, systemtap was *WAY*
faster than printk+recompile.  I mean, we're talking 30 second turnaround
between "Oh, I'd like to print this other value" and actually printing
it.  In the core kernel, not a module.  No reboot, no nothin.  This is a
huge win.
	But I'll never replicate that for my normal work at this rate.
I'm usually floating multiple hand-built mainline kernels with new
work.  Just like Ted describes.

> repository or weekly automated snapshots.)  So actually, being able to
> install stripped modules and vmlinux into /boot and /linux, and then
> being able to put the unstripped binaries somewhere else, without
> having to use the !@#@! complicated RPM macros by Fedora/RHEL is
> actually **very** important to me.

	Me too.  I want to be able to say "make install; make
tap_install" in my kernel objdir.  "install" does what it always has
done - no change.  "tap_install" (or whatever) drops things in eg
/lib/modules/<version>/debug such that systemtap Just Works.  It can
error if systemtap isn't installed or is too old.  But I shouldn't have
to build a distro package of my kernel, or even understand the
mechanism for building 'debuginfo' bits (even if I do).

Joel

-- 

"I am working for the time when unqualified blacks, browns, and
 women join the unqualified men in running our government."
	- Sissy Farenthold

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
  2008-07-02  2:23                   ` Frank Ch. Eigler
@ 2008-07-02 19:27                   ` Frank Ch. Eigler
  2008-07-02 20:08                   ` Joel Becker
  2008-07-05  9:46                   ` Peter Zijlstra
  3 siblings, 0 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-02 19:27 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

Hi -


On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:

> [...]  And one of the major flaws of the Linux's RAS tools is that
> the LKML development community doesn't use them; and to the extent
> that tapsets would be written more quickly if they are easy for
> kernel developers who aren't depending on distro packaging and
> distro building of systemtap.  [...]

Please excuse my return to this point, but it meshes with something
else:

> probe kernel.function ("vfs_write"),
>       kernel.function ("vfs_read")
> {
>   dev_nr = $file->f_dentry->d_inode->i_sb->s_dev
>   inode_nr = $file->f_dentry->d_inode->i_ino
> 
>   if (dev_nr == ($1 << 20 | $2) # major/minor device
>       && inode_nr == $3)
>     printf ("%s(%d) %s 0x%x/%u\n",
>       execname(), pid(), probefunc(), dev_nr, inode_nr)
> }

So, one way a kernel developer could help write a tapset piece for us
is to encapsulate this into a tapset script fragment:

probe vfs.read = kernel.function ("vfs_read")
  {
    dev_nr = $...expression
    inode_nr = $...expression
  }

Then this definition would be shipped with the kernel or systemtap,
tested in one or the other build system for currency.  (Not by
coincidence, something much like that is already in our tapset, just
lacks those two values.)

Then the end user just does

   probe vfs.read { if (dev_nr != MKDEV(2,3)) printf ("whatever you want to print") }
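
The `($1 << 20 | $2)` test in the quoted script and the `MKDEV(2,3)`
call above both rely on the kernel-internal device number layout: the
major number sits above a 20-bit minor number.  A minimal userspace
sketch of that packing follows.  The `sketch_*` names are hypothetical
stand-ins for illustration, not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-internal dev_t layout: 12-bit major above a 20-bit minor. */
#define SKETCH_MINORBITS 20u
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1u)

/* Pack a major/minor pair into one 32-bit device number. */
static uint32_t sketch_mkdev(uint32_t major, uint32_t minor)
{
    assert(major < (1u << 12));        /* major must fit in 12 bits */
    assert(minor <= SKETCH_MINORMASK); /* minor must fit in 20 bits */
    return (major << SKETCH_MINORBITS) | minor;
}

/* Recover the major number from a packed device number. */
static uint32_t sketch_major(uint32_t dev)
{
    return dev >> SKETCH_MINORBITS;
}

/* Recover the minor number from a packed device number. */
static uint32_t sketch_minor(uint32_t dev)
{
    return dev & SKETCH_MINORMASK;
}
```

With this layout, major 8 and minor 1 pack to 0x800001, which is the
value a script would compare against `s_dev`.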


****  or  ****


Kernel maintainers could add a marker or two right into their C code:

vfs_read() 
{
    /* ... */
    trace_mark (vfs_read, "dev %u inode %u whatever %s",
                          expression1, expression2, whatever);
    /* ... */
}

And that's it.  It's compiled-in, and checked as a part of your
routine builds.  Then the systemtap-side interpretation code is trivial,
and anyone can write it.  And it doesn't require debugging data.

   probe vfs.read = kernel.mark("vfs_read") { dev_nr = $arg1; inode_nr = $arg2 }
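
The marker idea can be illustrated outside the kernel.  What follows is
a minimal userspace sketch under stated assumptions, not the kernel's
trace_mark implementation: every `sketch_*` name is hypothetical, and
the attach mechanism is reduced to a single function pointer.  It shows
a named call site that carries a format string describing its arguments
and does nothing until a probe is attached.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical probe signature: gets the marker's name, format and arguments. */
typedef void (*sketch_probe_fn)(const char *name, const char *fmt, va_list ap);

struct sketch_marker {
    const char *name;       /* e.g. "vfs_read" */
    const char *fmt;        /* documents the argument types */
    sketch_probe_fn probe;  /* NULL while nothing is attached */
};

static unsigned sketch_hits, sketch_last_dev, sketch_last_inode;

/* Example probe: reads the two unsigned arguments named by "dev %u inode %u". */
static void sketch_record(const char *name, const char *fmt, va_list ap)
{
    (void)name;
    (void)fmt;
    sketch_last_dev = va_arg(ap, unsigned);
    sketch_last_inode = va_arg(ap, unsigned);
    sketch_hits++;
}

/* Marker call site: returns immediately unless a probe is attached. */
static void sketch_mark(struct sketch_marker *m, ...)
{
    va_list ap;

    assert(m != NULL);
    if (m->probe == NULL)
        return;
    va_start(ap, m);
    m->probe(m->name, m->fmt, ap);
    va_end(ap);
}

static struct sketch_marker sketch_vfs_read = {
    "vfs_read", "dev %u inode %u", NULL
};
```

In this sketch, calling `sketch_mark(&sketch_vfs_read, dev, inode)`
costs one pointer test while detached; setting
`sketch_vfs_read.probe = sketch_record` makes later calls deliver the
two values, much as `$arg1` and `$arg2` are read in the probe alias
above.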


If people could get over the funny look of the markers (since
performance effects have been shown to be negligible), they could make
a significant contribution to this problem, with just a few lines of C
code.


- FChE

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
@ 2008-07-02  2:23                   ` Frank Ch. Eigler
  2008-07-02 19:27                   ` Frank Ch. Eigler
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-07-02  2:23 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

Hi -

On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:

> On Tue, Jul 01, 2008 at 01:06:32PM -0700, Roland McGrath wrote:
> > Like I said, the essential command is eu-strip -f.  It is simple to use.
> > 
> > For one's own local hacking purposes, there is no real reason to bother
> > with strip-to-file complexities.  You can just copy the unstripped files
> > before stripping them.  [...]

> Well, actually, it *does* matter, at least to me.  [...]  I might
> have five, six, seven, eight or more kernels installed.  And on a
> number of my systems, the amount of space on the partitions where
> /boot and /lib live can't take the space demands of compiling the
> kernel and modules with -g.

You simply misunderstood Roland's suggestion: that you save the
unstripped copies of vmlinux etc. someplace - anyplace - for
systemtap's use, and that you strip (as normal) the pieces that go
into /boot.  No one is asking you to enlarge your boot partition.


> [...] And one of the major flaws of the Linux's RAS tools is that
> the LKML development community doesn't use them; and to the extent
> that tapsets would be written more quickly if they are easy for
> kernel developers [...]

Point taken (and applies broadly to all the other RAS tools).


> In the past two years, I've on average tried Systemtap every 9
> months or so, and each time, I'd hit a different annoying roadblock,
> and then I was so busy I would move on to a more productive way of
> solving my problems. [...]

Hearing about your problems at the time could well have steered us
toward focusing on their solution.

There has been a bit of a vicious circle in play: apparent lack of
interest from the LKML community drives focus toward customer-y
problem areas, which then apparently disappoints (members of) the LKML
community into more disinterest.  Let's break this.


- FChE

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 20:06               ` Roland McGrath
@ 2008-07-01 23:13                 ` Theodore Tso
  2008-07-02  2:23                   ` Frank Ch. Eigler
                                     ` (3 more replies)
  0 siblings, 4 replies; 52+ messages in thread
From: Theodore Tso @ 2008-07-01 23:13 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 01:06:32PM -0700, Roland McGrath wrote:
> Like I said, the essential command is eu-strip -f.  It is simple to use.
> 
> For one's own local hacking purposes, there is no real reason to bother
> with strip-to-file complexities.  You can just copy the unstripped files
> before stripping them.  The effect is the same (or easier for you, with
> most of the tools), and the extra 6M (stripped) where you have disk space
> for the 60M (debuginfo) is never an issue (i.e. if it's 66M unstripped).

Well, actually, it *does* matter, at least to me.  Sometimes when I am
trying to track down a problem, either using git-bisect or evaluating
multiple patches, I might have five, six, seven, eight or more kernels
installed.  And on a number of my systems, the amount of space on the
partitions where /boot and /lib live can't take the space demands of
compiling the kernel and modules with -g.

And remember, for the average kernel developer, the question is
whether using SystemTap is easier than inserting a bunch of printk's
and recompiling.  And one of the major flaws of the Linux's RAS tools
is that the LKML development community doesn't use them; and to the
extent that tapsets would be written more quickly if they are easy for
kernel developers who aren't depending on distro packaging and distro
building of systemtap.  (Especially if systemtap is so fast moving
that people shouldn't depend on stable releases but rather the git
repository or weekly automated snapshots.)  So actually, being able to
install stripped modules and vmlinux into /boot and /linux, and then
being able to put the unstripped binaries somewhere else, without
having to use the !@#@! complicated RPM macros by Fedora/RHEL is
actually **very** important to me.

I don't know how many people considered that a showstopper; but James
mentioned on another thread that figuring out the magic, undocumented
--enable-staticdw flag hit him as well.  Yes, I know that's been fixed
as of last Friday in the git repository, but again, it's these little
things that cause people to throw up their hands in frustration and
say, "Eh!  I'll just use printk's and recompilations instead; it's
easier."

In the past two years, I've on average tried Systemtap every 9 months
or so, and each time, I'd hit a different annoying roadblock, and then
I was so busy I would move on to a more productive way of solving my
problems.  And I've asked various different Systemtap developers and
architects (mostly inside IBM), and I'd get the same answers that
Ulrich spouted just recently on this list.  "Tapsets?  Yeah, we're
depending on kernel subsystem experts to write them; we don't know how
to get inside the internals of the various subsystems."  "Building it?
Stable releases?  That's a distro problem; just use what your distro
uses."  "Ooooh, sorry, that's an ancient version of Systemtap, blame
your distro provider for doing a sucky job."  And my reaction each
time was, "OK, back to printk debugging; and if you want me to write
tapsets for you, you're in fantasy land."

So I think this issue is very much a potential topic for the kernel
summit, namely --- why is it that so few kernel developers are using
RAS tools like Systemtap, and what can be done to improve this
situation?  Or if the Systemtap team doesn't need any help, and can
write all of these tapsets without kernel developers' participation,
or maybe assume that System administrators can write Systemtap scripts
that do things like:

probe kernel.function ("vfs_write"),
      kernel.function ("vfs_read")
{
  dev_nr = $file->f_dentry->d_inode->i_sb->s_dev
  inode_nr = $file->f_dentry->d_inode->i_ino

  if (dev_nr == ($1 << 20 | $2) # major/minor device
      && inode_nr == $3)
    printf ("%s(%d) %s 0x%x/%u\n",
      execname(), pid(), probefunc(), dev_nr, inode_nr)
}

and still be a credible competition to the audience served by DTrace,
hey, knock yourself out.  But I think there may be a connection
between problems which Systemtap developers seem to continually assert
a Somebody Else's Problem field around, and the lack of uptake by the
LKML community.  Maybe.  Just a guess on my part.

						- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 10:15             ` Theodore Tso
  2008-07-01 11:04               ` Sam Ravnborg
@ 2008-07-01 20:06               ` Roland McGrath
  2008-07-01 23:13                 ` Theodore Tso
  1 sibling, 1 reply; 52+ messages in thread
From: Roland McGrath @ 2008-07-01 20:06 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

Like I said, the essential command is eu-strip -f.  It is simple to use.

For one's own local hacking purposes, there is no real reason to bother
with strip-to-file complexities.  You can just copy the unstripped files
before stripping them.  The effect is the same (or easier for you, with
most of the tools), and the extra 6M (stripped) where you have disk space
for the 60M (debuginfo) is never an issue (i.e. if it's 66M unstripped).

In the Fedora packaging, an ELF vmlinux file in /boot is treated the same
as the .ko files (and all installed binaries for any package) and gets the
strip-to-file treatment.  It works the same on ELF executables (be they
kernels or otherwise), DSOs, and .ko's.  There is a special case in the
kernel packaging when what's in /boot is not in ELF format (i.e. bzImage
format and such)--the strip-to-file convention requires having the stripped
ELF file intact and on hand too.  When there won't be any plain ELF vmlinux
in /boot, we just copy the unstripped vmlinux into /usr/src/debug.

I honestly don't think it's ever going to be useful to any distro build to
have kernel makefiles do .debug file splitting.  For purposes of separate
debuginfo, the kernel really isn't a very special package.  The distro
packaging magic needs to do its debuginfo diddling, strip-to-file, and
related cataloguing magic for all packages anyway.  All the packagers have
to do for each individual package is get it to compile with -g and not
strip the binaries it installs.  The packaging hooey takes care of the
rest, and having a package's "make install" try to "do it for you" would
just break everything.  Future distro magic will evolve with newer tools to
pack the .debug file data in different, better ways, etc.  It just is not
going to help packagers to have any version of such logic built into the
kernel build process.

That said, knock yourself out.  I'm glad to answer questions about the
tools.  But we have gone pretty darn far afield from this thread's topic
now.  This does not seem like the logical place to pursue those technical
details of the toolchain.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 11:04               ` Sam Ravnborg
@ 2008-07-01 12:13                 ` Theodore Tso
  2008-07-02 20:27                   ` Sam Ravnborg
  0 siblings, 1 reply; 52+ messages in thread
From: Theodore Tso @ 2008-07-01 12:13 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 01:05:17PM +0200, Sam Ravnborg wrote:
> On Tue, Jul 01, 2008 at 06:15:07AM -0400, Theodore Tso wrote:
> > 
> > I've pulled apart RHEL's rpm macro magic before, and it's not a
> > pleasant wading through all of the files; maybe we can teach the
> > native kernel build infrastructure how to create debuginfo files so
> > that each distribution doesn't have to re-invent the wheel from
> > scratch, but rather can reuse common infrastructure in Kbuild....
> 
> What is needed to create debuginfo files?
> Seems like a simple thing to integrate in kbuild
> if this is per file or per module.

Well, the simplest/stupidest thing we can do is simply have an alternate
target which installs the modules in

       $(INSTALL_MOD_PATH)/usr/lib/debug/$(KERNELRELEASE)

... while ignoring the INSTALL_MOD_STRIP option.  You may recall that
I submitted the patch to add INSTALL_MOD_STRIP (commit
ac031f26e); this was from an earlier attempt of mine to use
kdump/systemtap.

RHEL's rpm macro magic does some additional objcopy's which I'll have
to try to tease out to strip out the text segments and only leave the
debug information in debuginfo files, which helps slim them down a
little.  

> 
> If it is for the kernel as a whole things get a bit more complex.
> 

It would be nice to do this for the base kernel as well (a vmlinux
with strip --strip-debug applied takes only 6 megs in /boot on my
system, but a vmlinux with full debugging information takes 66 megs;
so moving the unstripped vmlinux out of /boot to /usr/lib/debug would
be quite helpful for people who created their /boot partition not
allowing for the rather dramatic increase in size needed for kernels
built with debugging information.)
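
The space argument is simple arithmetic.  A rough sketch using the
6 MB and 66 MB vmlinux figures quoted above; the count of eight
installed kernels is an assumed example, modules are ignored, and the
`sketch_*` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Total /boot space, in MB, for `kernels` copies of a vmlinux of the given size. */
static uint64_t sketch_boot_usage_mb(uint64_t kernels, uint64_t vmlinux_mb)
{
    /* Guard against multiplication overflow. */
    assert(kernels == 0 || vmlinux_mb <= UINT64_MAX / kernels);
    return kernels * vmlinux_mb;
}

/* MB kept out of /boot when the unstripped files live elsewhere. */
static uint64_t sketch_boot_savings_mb(uint64_t kernels,
                                       uint64_t unstripped_mb,
                                       uint64_t stripped_mb)
{
    assert(unstripped_mb >= stripped_mb);
    return sketch_boot_usage_mb(kernels, unstripped_mb - stripped_mb);
}
```

For eight kernels that is 528 MB of unstripped vmlinux files against
48 MB stripped, so 480 MB stays out of /boot.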

						- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 10:15             ` Theodore Tso
@ 2008-07-01 11:04               ` Sam Ravnborg
  2008-07-01 12:13                 ` Theodore Tso
  2008-07-01 20:06               ` Roland McGrath
  1 sibling, 1 reply; 52+ messages in thread
From: Sam Ravnborg @ 2008-07-01 11:04 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 06:15:07AM -0400, Theodore Tso wrote:
> 
> I've pulled apart RHEL's rpm macro magic before, and it's not a
> pleasant wading through all of the files; maybe we can teach the
> native kernel build infrastructure how to create debuginfo files so
> that each distribution doesn't have to re-invent the wheel from
> scratch, but rather can reuse common infrastructure in Kbuild....

What is needed to create debuginfo files?
Seems like a simple thing to integrate in kbuild
if this is per file or per module.

If it is for the kernel as a whole things get a bit more complex.

	Sam

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01  7:08           ` Roland McGrath
@ 2008-07-01 10:15             ` Theodore Tso
  2008-07-01 11:04               ` Sam Ravnborg
  2008-07-01 20:06               ` Roland McGrath
  0 siblings, 2 replies; 52+ messages in thread
From: Theodore Tso @ 2008-07-01 10:15 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 12:07:46AM -0700, Roland McGrath wrote:
> In rpm-based distros, this is done automagically in rpmbuild and driven by
> magic macros and shell scripts.  I had the impression Debian also did
> parallel -debuginfo packages of the same sort, so I presume some scripts
> using either objcopy/strip or eu-strip are buried in that build magic too.

Debian doesn't have -debuginfo packages, hence my request to get a
pointer at the magic shell script to do the separation.  To the extent
that Systemtap will be used by more people (and hence grow its tapset
collection more quickly) it would be useful if more distributions
could figure out how to deal with the -debuginfo information in a more
sane fashion (where quadrupling or so the space needed in /boot for
each kernel is often not practical :-).

I've pulled apart RHEL's rpm macro magic before, and it's not a
pleasant wading through all of the files; maybe we can teach the
native kernel build infrastructure how to create debuginfo files so
that each distribution doesn't have to re-invent the wheel from
scratch, but rather can reuse common infrastructure in Kbuild....

							- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01  2:42         ` Theodore Tso
@ 2008-07-01  7:08           ` Roland McGrath
  2008-07-01 10:15             ` Theodore Tso
  0 siblings, 1 reply; 52+ messages in thread
From: Roland McGrath @ 2008-07-01  7:08 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

> Do you have a pointer to whatever program is used to generate the
> Fedora/RHEL-style separated .ko.debug files?

It's eu-strip -f (elfutils) or a combination of binutils tools with several
special options (I think it's two objcopy's and a strip or something).

In rpm-based distros, this is done automagically in rpmbuild and driven by
magic macros and shell scripts.  I had the impression Debian also did
parallel -debuginfo packages of the same sort, so I presume some scripts
using either objcopy/strip or eu-strip are buried in that build magic too.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:40 ` Theodore Tso
  2008-06-30 20:00   ` Frank Ch. Eigler
@ 2008-07-01  5:29   ` Ananth N Mavinakayanahalli
  1 sibling, 0 replies; 52+ messages in thread
From: Ananth N Mavinakayanahalli @ 2008-07-01  5:29 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Mon, Jun 30, 2008 at 02:19:59PM -0400, Theodore Tso wrote:
> On Sun, Jun 29, 2008 at 09:04:23PM -0400, Frank Ch. Eigler wrote:
> > * tapsets
> > 
> >   Theodore is mistaken that we are deflecting the job of tapset (probe
> >   macro; abstracting architecture and kernel version-change -
> >   $foo->bar->baz, function names) authorship.  We have asked for help,
> >   and have received a little, but the group has in fact authored a
> >   growing collection of this stuff.
> 
> Well I've heard the line that it's up to the kernel subsystem experts
> to write tapsets from Ulrich Drepper (on the ksummit-2008-discuss
> list) and from Ananth N Mavinakayanahalli (private communication) so I
> think it's fair to say that at least some people associated with
> Systemtap have been placing the blame for the lack of tapsets on the
> kernel developers.

I wouldn't call that 'blame'. What I was trying to say simply was that
kernel subsystem experts are best suited to identify the location and
types of data one could extract from their subsystem. I believe that is
also what Ulrich was trying to say.

Apologies if you felt I was blaming anybody.

Ananth

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 22:10       ` Frank Ch. Eigler
@ 2008-07-01  2:42         ` Theodore Tso
  2008-07-01  7:08           ` Roland McGrath
  0 siblings, 1 reply; 52+ messages in thread
From: Theodore Tso @ 2008-07-01  2:42 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Mon, Jun 30, 2008 at 04:42:19PM -0400, Frank Ch. Eigler wrote:
> > [...]  What about stripping out the text segment of the object
> > files, so you aren't storing the information twice on disk, or
> > compressing the debuginfo files so they take up less room on disk?
> 
> This is roughly what the Fedora/RHEL-style separated .ko.debug files
> do, though I don't know if they are that complete.  (They'd need a
> copy of the symbol tables, and probably other stuff.)

Do you have a pointer to whatever program is used to generate the
Fedora/RHEL-style separated .ko.debug files?

Thanks, regards,

						- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:59 ` James Bottomley
@ 2008-06-30 23:52   ` Masami Hiramatsu
  2008-07-08 23:32   ` Eric W. Biederman
  1 sibling, 0 replies; 52+ messages in thread
From: Masami Hiramatsu @ 2008-06-30 23:52 UTC (permalink / raw)
  To: James Bottomley; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

Hi,

James Bottomley wrote:
> On Sun, 2008-06-29 at 21:04 -0400, Frank Ch. Eigler wrote:
>> * kprobes, markers
>>
>>   Performance of kprobes-based probes is about 1 us per hit overhead.
>>   Markers are on the order of tens of nanoseconds, which makes a huge
>>   difference for frequently-hit probes.  We'd be happy to interface to
>>   other event sources like ftrace or whatever, as long as they provide
>>   suitable kernel-module-accessible APIs.
> 
> There were two specific latencies of concern to the financial trading
> house type end user: One was the latency from execution to run.  This is
> caused mostly by the module build and insertion.  I really can't see
> this getting better except by divorcing systemtap from having to use the
> whole of the kernel build infrastructure.  To do that, we need to begin
> putting a lot of the C fragments that make up that infrastructure into
> the kernel to lessen the load.  It would actually be nice finally to get
> to the point where you simply link the probe routines with a special
> module stub (built as part of the kernel) and insert it.

I agree, compiling the systemtap runtime code into an independent module (or
object file) could reduce build time.
(However, I think it depends on what script you write.  If you probe all
of the sys_* functions, the function search time becomes long.)

> The other is the probe execution latency.  Since the institutions are
> tracing transactions on the order of milliseconds, microsecond latencies
> in the probes do give them cause for concern (it only takes a few probe
> points to add up to a significant perturbation).

Markers have another benefit: they let you probe irq handlers.
Since kprobes use exceptions and are not recursive, they can't probe
irq-related functions.  Markers can, because they don't use
any exceptions.
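
To put the quoted latency figures side by side, here is a small
arithmetic sketch.  The 1 us kprobe cost and the 1 ms transaction time
come from the quoted text; the 50 ns marker cost and the ten hits per
transaction are assumed example values, and the `sketch_*` names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Latency added to one transaction, in nanoseconds. */
static uint64_t sketch_probe_overhead_ns(uint64_t hits_per_txn,
                                         uint64_t cost_per_hit_ns)
{
    return hits_per_txn * cost_per_hit_ns;
}

/* The same overhead as parts per million of the transaction time. */
static uint64_t sketch_overhead_ppm(uint64_t overhead_ns, uint64_t txn_ns)
{
    assert(txn_ns > 0);
    return overhead_ns * 1000000u / txn_ns;
}
```

Ten kprobe hits at 1000 ns each add 10000 ns to a 1 ms transaction,
i.e. 10000 ppm or 1%; ten marker hits at the assumed 50 ns add 500 ns,
i.e. 500 ppm or 0.05%.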

[...]
>> * integrating systemtap runtime into kernel
>>
>>   We did some analysis about how much of the runtime code contains
>>   novel & relevant code to the kernel.  We came up with a fraction
>>   like 20% (IIRC; still searching for a link to the thread).  Some of
>>   the code is indeed in need of some cleanup love.  
>>
>>   Some of it has been necessary to work around kernel disruptions
>>   (e.g., unexporting stuff like kallsyms_lookup).  The parts that are
>>   deeply kernel-version-sensitive (and would thus benefit from your
>>   maintenance) are quite small.  We're still open to trying to pursue
>>   copying/upstreaming some of this code into the kernel.
> 
> Actually, this one is an example of a wrong approach.  What you're
> effectively doing is trying to implement an ABI for staprun in these
> files (as well as various helpers for the modules).  The work around for
> kallsyms_lookup is pretty horrible as well ... especially as the kernel
> has its own address to symbol string converter.
> 
> This is a lot of what needs to be cleaned up and simplified.  The
> interface between systemtap and the kernel is essentially a private ABI
> and we should treat it as such, so all the helpers for the modules and
> the necessary implementers of the ABI should be in kernel ... there
> shouldn't be any (if done right) carried around as C fragments with
> kernel version ifdefs ...

And also, some of them should be isolated from the kernel itself.
For example, systemtap can not call do_gettimeofday() because it
is not recursive. So, now, systemtap has its own time.c.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
@ 2008-06-30 23:02         ` David Miller
  0 siblings, 0 replies; 52+ messages in thread
From: David Miller @ 2008-06-30 23:02 UTC (permalink / raw)
  To: acme; +Cc: tytso, fche, ksummit-2008-discuss, systemtap

From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Mon, 30 Jun 2008 17:22:33 -0300

> Em Mon, Jun 30, 2008 at 04:10:31PM -0400, Theodore Tso escreveu:
> > On Mon, Jun 30, 2008 at 03:25:33PM -0400, Frank Ch. Eigler wrote:
> > Stupid question --- has anyone thought about writing tools to strip
> > out specific debug information not needed by Systemtap?  For example,
> > I assume systemtap doesn't need the line number information, since you
> > can't set probes on arbitrary line numbers (and even if you could,
> > such tapsets would be so brittle that it wouldn't be funny); so would
> > the debuginfo files be smaller if that information were stripped out?
> > I understand that this would make the files less useful for
> > kdump/crash, but for systemtap only users, it might be quite useful.
> > What about stripping out the text segment of the object files, so you
> > aren't storing the information twice on disk, or compressing the
> > debuginfo files so they take up less room on disk?
> 
> Yes, it's called CTF, Compact C Type Format, in DTrace land:
> 
> http://opensolaris.org/os/project/ppc-dev/task_map/ctf/
> 
> DaveM wrote a CTF loader that I included in my dwarves package, so that
> we can pretty-print and use all the other features in pahole on files
> with CTF sections, such as the Open Solaris kernel and the userland
> binaries, that all ship with CTF embedded, dispensing the usage of
> -debuginfo packages, all AFAIK.

One thing you lose with CTF is the stack unwind tables,
and I don't know if systemtap needs that or not.

If someone can state what the absolute minimum requirement
is for systemtap to be able to analyze a binary properly,
we can figure out if CTF provides it.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:19     ` Theodore Tso
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
  2008-06-30 21:13       ` James Bottomley
@ 2008-06-30 22:10       ` Frank Ch. Eigler
  2008-07-01  2:42         ` Theodore Tso
  2 siblings, 1 reply; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-06-30 22:10 UTC (permalink / raw)
  To: Theodore Tso; +Cc: ksummit-2008-discuss, systemtap

Hi -

On Mon, Jun 30, 2008 at 04:10:31PM -0400, Theodore Tso wrote:
> [...]
> > They shouldn't have to repackage it at all - just leave it in the
> > build tree.
> 
> The problem is that I am often juggling multiple kernel builds, and so
> I don't want to keep the full build tree around.  So I just want to
> extract out the specific files needed by Systemtap [...]

OK, we'll have to think about how to support that well.

> Stupid question --- has anyone thought about writing tools to strip
> out specific debug information not needed by Systemtap?

Yes, but nothing so simple/workable as to have been done already.

> For example, I assume systemtap doesn't need the line number
> information, since you can't set probes on arbitrary line numbers

Actually, we can - and now with wildcards too if you want
source-line-by-line tracing.  See the top of the NEWS file.

> (and even if you could, such tapsets would be so brittle that it
> wouldn't be funny); [...]

Yes, this is not a good fit for tapsets, but is handy for exploring
one's known version of code.  Also, we can now use relative line
numbers (line #10 within this function), which might be stable enough
for some tapset use.  (This is all very recent stuff, beware.)

> [...]  What about stripping out the text segment of the object
> files, so you aren't storing the information twice on disk, or
> compressing the debuginfo files so they take up less room on disk?

This is roughly what the Fedora/RHEL-style separated .ko.debug files
do, though I don't know if they are that complete.  (They'd need a
copy of the symbol tables, and probably other stuff.)


- FChE

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:19     ` Theodore Tso
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
@ 2008-06-30 21:13       ` James Bottomley
  2008-06-30 22:10       ` Frank Ch. Eigler
  2 siblings, 0 replies; 52+ messages in thread
From: James Bottomley @ 2008-06-30 21:13 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Mon, 2008-06-30 at 16:10 -0400, Theodore Tso wrote:
> Stupid question --- has anyone thought about writing tools to strip
> out specific debug information not needed by Systemtap?  For example,
> I assume systemtap doesn't need the line number information, since you
> can't set probes on arbitrary line numbers (and even if you could,
> such tapsets would be so brittle that it wouldn't be funny); so would
> the debuginfo files be smaller if that information were stripped out?
> I understand that this would make the files less useful for
> kdump/crash, but for systemtap only users, it might be quite useful.
> What about stripping out the text segment of the object files, so you
> aren't storing the information twice on disk, or compressing the
> debuginfo files so they take up less room on disk?

Actually, you can ... and I know it's brittle, but I do use this feature
a lot (there's no other way to get at local variables currently than by
specifying a line number through the statement interface).  I believe
the point of the markers project is to add pieces to the kernel that
identify useful (and invariant) internal points in the routines where
you can get at the local variables without having to specify line
numbers.

The debug information is bulky because dwarf is so damn wasteful.
Practically every kernel module will contain the dwarf definition of
certain central structures.  When a debugger works on an executable, it
first of all builds up an in-house view of all the dwarf statements,
combining all of the duplicate symbols.  If we could find a way of doing
that for the kernel and then spitting it out as a single file, it would
be far smaller than the debuginfo. Assuming we don't want a monolith,
but actual reduced files (so that modules can be added) we immediately
run across the other annoying thing with dwarf: it has a mechanism to
stub out definitions (DW_AT_declaration) but no way of providing input
about where the real definition is (you now have to search the entire
tree to find it).

Unfortunately, you really have to do these types of reduction tricks, and
strip really just won't do them usefully.
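
As a rough illustration of the reduction described above, here is a
minimal Python sketch.  It is not real DWARF handling: the module
names, type names and field strings are made-up placeholders, a value
of None stands in for a declaration-only stub (DW_AT_declaration), and
it assumes that two types with the same name are identical, which a
real tool would have to verify structurally.

```python
def merge_type_definitions(modules):
    """Combine per-module type tables into one shared table.

    modules maps a module name to {type_name: definition}, where a
    definition of None stands for a declaration-only stub.
    Returns (shared_table, entries_before, entries_after).
    """
    shared = {}
    entries_before = 0
    for types in modules.values():
        for name, definition in types.items():
            entries_before += 1
            # Keep the first full definition; a stub never overwrites one.
            if shared.get(name) is None:
                shared[name] = definition
    return shared, entries_before, len(shared)


# Hypothetical input: every module repeats the central structures.
modules = {
    "scsi_mod": {"struct list_head": "next,prev",
                 "struct scsi_cmnd": "device,cmnd"},
    "sd_mod":   {"struct list_head": "next,prev",
                 "struct scsi_cmnd": None},   # stub only
    "ext3":     {"struct list_head": "next,prev"},
}
shared, before, after = merge_type_definitions(modules)
print(before, "entries reduced to", after)
```

Note that the stub in the hypothetical "sd_mod" table is only resolved
because every module is scanned; nothing in the stub itself says where
the full definition lives, which is the search problem mentioned above.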

James


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:19     ` Theodore Tso
@ 2008-06-30 21:12       ` Arnaldo Carvalho de Melo
  2008-06-30 23:02         ` David Miller
  2008-06-30 21:13       ` James Bottomley
  2008-06-30 22:10       ` Frank Ch. Eigler
  2 siblings, 1 reply; 52+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-06-30 21:12 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frank Ch. Eigler, David S. Miller, ksummit-2008-discuss, systemtap

Em Mon, Jun 30, 2008 at 04:10:31PM -0400, Theodore Tso escreveu:
> On Mon, Jun 30, 2008 at 03:25:33PM -0400, Frank Ch. Eigler wrote:
> > > The problem is that kernel developers are often juggling multiple
> > > kernels, so kernel developers need to learn how to package up this
> > > bulky data as well.
> > 
> > They shouldn't have to repackage it at all - just leave it in the
> > build tree.
> 
> The problem is that I am often juggling multiple kernel builds, and so
> I don't want to keep the full build tree around.  So I just want to
> extract out the specific files needed by Systemtap, especially because
> they are so bulky.  So normally I actually do create specific packages
> for the kernels I use (so I can give them to others or put them on my
> server machines if they prove to be stable), and I want to be able to
> package up the debuginfo files as well --- and only exactly the
> debuginfo files which are needed to make systemtap work.
> 
> Stupid question --- has anyone thought about writing tools to strip
> out specific debug information not needed by Systemtap?  For example,
> I assume systemtap doesn't need the line number information, since you
> can't set probes on arbitrary line numbers (and even if you could,
> such tapsets would be so brittle that it wouldn't be funny); so would
> the debuginfo files be smaller if that information were stripped out?
> I understand that this would make the files less useful for
> kdump/crash, but for systemtap only users, it might be quite useful.
> What about stripping out the text segment of the object files, so you
> aren't storing the information twice on disk, or compressing the
> debuginfo files so they take up less room on disk?

Yes, it's called CTF, Compact C Type Format, in DTrace land:

http://opensolaris.org/os/project/ppc-dev/task_map/ctf/

DaveM wrote a CTF loader that I included in my dwarves package, so that
we can pretty-print and use all the other pahole features on files
with CTF sections, such as the OpenSolaris kernel and userland
binaries, which all ship with CTF embedded and so (AFAIK) dispense with
the need for -debuginfo packages.

In my TODO I have "encode CTF from DWARF and make it a part of the
kernel building process" together with "publish results about the savings
obtained", i.e. how much would be added to the kernel image if we always
shipped the (by then compact) debugging information with the kernel.

I hope to get back to working on this RSN.

- Arnaldo

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:00   ` Frank Ch. Eigler
@ 2008-06-30 20:19     ` Theodore Tso
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
                         ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Theodore Tso @ 2008-06-30 20:19 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Mon, Jun 30, 2008 at 03:25:33PM -0400, Frank Ch. Eigler wrote:
> > The problem is that kernel developers are often juggling multiple
> > kernels, so kernel developers need to learn how to package up this
> > bulky data as well.
> 
> They shouldn't have to repackage it at all - just leave it in the
> build tree.

The problem is that I am often juggling multiple kernel builds, and so
I don't want to keep the full build tree around.  So I just want to
extract out the specific files needed by Systemtap, especially because
they are so bulky.  So normally I actually do create specific packages
for the kernels I use (so I can give them to others or put them on my
server machines if they prove to be stable), and I want to be able to
package up the debuginfo files as well --- and only exactly the
debuginfo files which are needed to make systemtap work.

Stupid question --- has anyone thought about writing tools to strip
out specific debug information not needed by Systemtap?  For example,
I assume systemtap doesn't need the line number information, since you
can't set probes on arbitrary line numbers (and even if you could,
such tapsets would be so brittle that it wouldn't be funny); so would
the debuginfo files be smaller if that information were stripped out?
I understand that this would make the files less useful for
kdump/crash, but for systemtap only users, it might be quite useful.
What about stripping out the text segment of the object files, so you
aren't storing the information twice on disk, or compressing the
debuginfo files so they take up less room on disk?

> > [...] since the Wiki is filled with assertions (echoed by Ulrich in
> > the recent ksummit-discuss thread) about how Systemtap is a fast
> > moving project, and why it's absolutely necessary to grab the latest
> > bleeding edge sources from the git tree.
> 
> That's been generally true - but that does not apply to elfutils.
> Some of us run with rather old elfutils just fine.

Hmm, well it doesn't work with the version of elfutils shipped with
the latest (8.04) Ubuntu.  <Checking to get the exact message
configure blew up with...>  Ah, now it does.  The wiki didn't say
anything about needing --enable-staticdw, and I see with a recent
commit from last Friday you don't even need to specify
--enable-staticdw any more, it DTRT automatically.  Nice!  Thanks for
fixing this!

							- Ted

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:40 ` Theodore Tso
@ 2008-06-30 20:00   ` Frank Ch. Eigler
  2008-06-30 20:19     ` Theodore Tso
  2008-07-01  5:29   ` Ananth N Mavinakayanahalli
  1 sibling, 1 reply; 52+ messages in thread
From: Frank Ch. Eigler @ 2008-06-30 20:00 UTC (permalink / raw)
  To: Theodore Tso; +Cc: ksummit-2008-discuss, systemtap

Hi -

On Mon, Jun 30, 2008 at 02:19:59PM -0400, Theodore Tso wrote:
> [...]
> >   Theodore is mistaken that we are deflecting the job of tapset (probe
> >   macro; abstracting architecture and kernel version-change -
> >   $foo->bar->baz, function names) authorship.  We have asked for help,
> >   and have received a little, but the group has in fact authored a
> >   growing collection of this stuff.
> 
> Well I've heard the line that it's up to the kernel subsystem experts
> to write tapsets from Ulrich Drepper (on the ksummit-2008-discuss
> list) and from Ananth N Mavinakayanahalli (private communication) so I
> think it's fair to say that at least some people associated with
> Systemtap have been placing the blame for the lack of tapsets on the
> kernel developers.

We wouldn't talk about blame.


> As far as the growing collection of this stuff?  Where is it?  Do you
> mean in the tapsets directory of the systemtap sources in the git
> repository?  

Yes.

> Is there any documentation or example usage scenarios for these
> tapsets?

Yes, documentation - where it exists - is in man pages (stapprobes, ...);
sample usage is in the example scripts, wiki, or the test suite itself.


> > * debuginfo
> > 
> >   Yes, it's very helpful & necessary if one wants to place probes at
> >   just about any statement and extract just about any data value.
> >   It's the same prerequisite that crash or kgdb would have, since we
> >   operate at a similar level of object/source code visibility.  Other
> >   distros are learning to package this admittedly bulky data up, so
> >   it'll be a matter of a largish download for distro users. Kernel
> >   developers will of course have the data generated locally already.
> 
> The problem is that kernel developers are often juggling multiple
> kernels, so kernel developers need to learn how to package up this
> bulky data as well.

They shouldn't have to repackage it at all - just leave it in the
build tree.

> It would be useful if
> http://sourceware.org/systemtap/wiki/SystemTapWithSelfBuiltKernel
> was a bit more explicit about exactly what SystemTap expects to find
> in SYSTEMTAP_DEBUGINFO_PATH.  [...]

That's a good point.  I'll make sure that the recipe for self-built
kernels is more complete.


> > * systemtap building
> > 
> >   The only thing unusual with building the thing is the use of the
> >   elfutils library to parse elf/dwarf data; links to that are provided
> >   and one can link to a private copy if the system lacks it.

> So how do you link to a private copy?  There's nothing in the wiki
> that describes this.  [...]  It would be nice if the Systemtap
> libraries had some provision where you could either point to a
> source directory where the patched elfutils libraries had been
> placed, and have them automatically used for static linking,

That's exactly what the "--with-elfutils=DIRECTORY" systemtap autoconf
option does.

> [...] since the Wiki is filled with assertions (echoed by Ulrich in
> the recent ksummit-discuss thread) about how Systemtap is a fast
> moving project, and why it's absolutely necessary to grab the latest
> bleeding edge sources from the git tree.

That's been generally true - but that does not apply to elfutils.
Some of us run with rather old elfutils just fine.

> I'm willing to send patches for these sorts of usability issues if
> it's likely such patches would be accepted...

We would welcome any help with this stuff.

> > * systemtap releases
> > 
> >   True, we've been spotty with formal releases, though they are
> >   archived and available, and we're moving to a more regular release
> >   schedule very shortly.  The weekly snapshots have been good (except
> >   a recent unfortunate regression that hits 2.6.25 kernels
> >   particularly badly - that's holding up the new release plans).
> 
> Does the regression hit 2.6.26-rc8 kernels?  (i.e., should I not
> bother trying Systemtap until this gets cleared up, lest I waste hours
> and hours again getting frustrated?)

Early data suggests it's better under 2.6.26, so I recommend trying it
just once (don't spend hours).  If it fails, then please wait until
the 0.7 release -- or just try the older 0.6.2, which will almost
certainly work fine for you.

- FChE

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 13:57 DTrace Frank Ch. Eigler
  2008-06-30 19:00 ` [Ksummit-2008-discuss] DTrace Grant Grundler
  2008-06-30 19:40 ` Theodore Tso
@ 2008-06-30 19:59 ` James Bottomley
  2008-06-30 23:52   ` Masami Hiramatsu
  2008-07-08 23:32   ` Eric W. Biederman
  2 siblings, 2 replies; 52+ messages in thread
From: James Bottomley @ 2008-06-30 19:59 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Sun, 2008-06-29 at 21:04 -0400, Frank Ch. Eigler wrote:
> Please forgive me for "crashing" the discussion party here.  I would
> like to clarify some systemtap-related issues that people have raised.
> (I'm one of its developers.)  I'll just list individual points,
> roughly in order they were raised.  For a fuller treatment of any of
> the topics, please involve our public <systemtap@sources.redhat.com>
> mailing list.

It's not a private party ... hence the "discuss" part of the list
naming ...

> * postgres, other dtrace-probe-instrumented userspace programs
> 
>   We aim to piggyback on these efforts by reusing the dtrace
>   instrumentation calls embedded into postgres etc., if at all
>   possible.
> 
> * "klunky and prone to break in unexpected ways"
> 
>   There's a germ of truth there, but OTOH the case James ran into
>   involved complications beyond normal symbolic debugging too
>   (possibly having to search separately compiled modules for
>   definitions of opaque struct-pointer types).  We're working on it;
>   our bug/feature list is in public bugzilla.

Well, let me give you another example, because it tripped me up for
days:  Return probes give access to the entry variables in the state the
routine was entered (not on return).  I ran into it because I was trying
to look at what a routine had done to the scsi command structure which
was passed as an input.
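
The behaviour described above can be mimicked in plain Python.  This is
only an analogy, not systemtap code: the return_probe decorator, the
complete_command routine and the "status" field are all made up for
illustration.  The handler runs at return time but is handed a copy of
the arguments taken at entry, so it never sees what the routine did to
the structure.

```python
import copy
import functools


def return_probe(handler):
    """Call handler(entry_args, result) after the wrapped function returns.

    entry_args is a deep copy taken at function entry, mimicking the
    entry-time snapshot semantics described above (an analogy only).
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            entry_args = copy.deepcopy(args)   # snapshot taken at entry
            result = func(*args)
            handler(entry_args, result)        # handler sees the snapshot
            return result
        return wrapper
    return decorate


seen = []


@return_probe(lambda entry_args, result: seen.append(entry_args[0]["status"]))
def complete_command(cmd):
    # Hypothetical routine that modifies the structure it was passed.
    cmd["status"] = "done"
    return 0


cmd = {"status": "pending"}
complete_command(cmd)
print("probe saw:", seen[0], "/ actual state on return:", cmd["status"])
```

To observe the state on return in this analogy, the handler would have
to be given the live object rather than the entry-time copy.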

I've also found it very easy to crash the system under probe if you use
the wrong build tree for the running kernel (not a problem, I know, that
enterprise customers run into, but a common one for kernel developers).
Since we have a kernel build version that increments with every build,
it would be useful to sanity check the one systemtap pulled out of the
debuginfo against the one in the running kernel.
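
A minimal sketch of such a check, in Python, might look like the
following.  It is not how systemtap does (or would do) it: the helper
names are hypothetical, and how the version string gets extracted from
the debuginfo is left to the caller.  It only assumes the usual
"#N ..." form of the kernel version string, where N is the build
counter.

```python
import re


def build_number(version_string):
    """Extract the '#N' build counter from a version string, or None."""
    match = re.match(r"#(\d+)\b", version_string.strip())
    return int(match.group(1)) if match else None


def same_build(running_version, debuginfo_version):
    """True only if both strings carry a build counter and match exactly."""
    return (build_number(running_version) is not None
            and running_version.strip() == debuginfo_version.strip())


# Hypothetical strings: the build tree has been rebuilt once since the
# running kernel was installed.
running = "#42 SMP Mon Jun 30 12:00:00 EDT 2008"
from_debuginfo = "#43 SMP Mon Jun 30 15:30:00 EDT 2008"
if not same_build(running, from_debuginfo):
    print("refusing to probe: build", build_number(running),
          "is running but debuginfo is from build",
          build_number(from_debuginfo))
```

On Linux the string for the running kernel is available from Python as
os.uname().version.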

> * "unhappy week with dwarf"
> 
>   Guilty as charged. :-)
> 
> * kprobes, markers
> 
>   Performance of kprobes-based probes is about 1 us per hit overhead.
>   Markers are on the order of tens of nanoseconds, which makes a huge
>   difference for frequently-hit probes.  We'd be happy to interface to
>   other event sources like ftrace or whatever, as long as they provide
>   suitable kernel-module-accessible APIs.

There were two specific latencies of concern to the financial trading
house type of end user: one was the latency from invoking a script to
its probes actually running.  This is
caused mostly by the module build and insertion.  I really can't see
this getting better except by divorcing systemtap from having to use the
whole of the kernel build infrastructure.  To do that, we need to begin
putting a lot of the C fragments that make up that infrastructure into
the kernel to lessen the load.  It would actually be nice finally to get
to the point where you simply link the probe routines with a special
module stub (built as part of the kernel) and insert it.

The other is the probe execution latency.  Since the institutions are
tracing transactions on the order of milliseconds, microsecond latencies
in the probes do give them cause for concern (it only takes a few probe
points to add up to a significant perturbation).
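
To put rough numbers on that, here is a back-of-the-envelope
calculation in Python.  The 1 us kprobes figure is the one quoted
above; the 1 ms transaction length, the 20 ns marker cost (standing in
for "tens of nanoseconds") and the hit counts are assumed values chosen
purely for illustration.

```python
def perturbation(hits, per_hit_ns, transaction_ns):
    """Fraction of a transaction's duration spent in probe overhead."""
    return hits * per_hit_ns / transaction_ns


TRANSACTION_NS = 1_000_000   # 1 ms transaction (assumed)
KPROBE_NS = 1_000            # ~1 us per hit, the figure quoted above
MARKER_NS = 20               # "tens of nanoseconds"; 20 ns is assumed

for hits in (1, 10, 50):
    k = perturbation(hits, KPROBE_NS, TRANSACTION_NS)
    m = perturbation(hits, MARKER_NS, TRANSACTION_NS)
    print(f"{hits:3d} hits: kprobes {k:.2%}, markers {m:.4%}")
```

Under these assumptions, fifty kprobe hits inside a single transaction
cost about 5% of its duration, while the same number of marker hits
stays around a tenth of a percent.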

> * user-space probing
> 
>   We're finally getting very close on this.  Yes, it'd use the IBM
>   uprobes prototype above the Red Hat utrace work as a lower layer,
>   which we hope gets upstream as soon as possible.  It will behave
>   analogously to dtrace: executing probes in kernel space.  If it can
>   be made safe (and we think it can), it's a huge performance win over
>   trying to do it in userspace (with some gang of debugging processes
>   or whatever).
> 
> * oprofile
> 
>   It's a fine special-purpose tool.  We hope to hook into the same
>   sorts of underlying hardware performance counters to enable the same
>   profiling capability in systemtap - except well integrated with the
>   rest of the probing events / scripts.  perfmon2 upstream would be
>   very helpful.
> 
> * dtrace "just works"
> 
>   Yeah, so I hear, but think about how different their target
>   environment is.  Their kernel hardly changes (several fixed APIs,
>   ABIs): this has huge implications.  Their kernel was willing to
>   insert probes (~ markers), a bunch of build system changes (debug
>   info subset transcribing).  Here in linux land, we suffer
>   multifaceted tensions and it is hard to go toward a goal without
>   obstructions (well-meaning as they may be).

The goal has to be well articulated and agreed to.  Open source is rapid
at progressing towards common goals ... it's when the goals aren't
common that progress gets bogged down.

>   A bunch of third-party scripts are often conflated with "dtrace",
>   which is just a matter of growing the user community enough, and
>   giving them a good tool to build on top of.  A growing set of
>   runnable end-user scripts is already packaged with systemtap,
>   intended for use by nonexperts, more help (e.g. concise problem
>   statements about what you'd like to measure/see) would be welcome.
> 
> * integrating systemtap runtime into kernel
> 
>   We did some analysis about how much of the runtime code contains
>   novel & relevant code to the kernel.  We came up with a fraction
>   like 20% (IIRC; still searching for a link to the thread).  Some of
>   the code is indeed in need of some cleanup love.  
> 
>   Some of it has been necessary to work around kernel disruptions
>   (e.g., unexporting stuff like kallsyms_lookup).  The parts that are
>   deeply kernel-version-sensitive (and would thus benefit from your
>   maintenance) are quite small.  We're still open to trying to pursue
>   copying/upstreaming some of this code into the kernel.

Actually, this one is an example of a wrong approach.  What you're
effectively doing is trying to implement an ABI for staprun in these
files (as well as various helpers for the modules).  The workaround for
kallsyms_lookup is pretty horrible as well ... especially as the kernel
has its own address to symbol string converter.

This is a lot of what needs to be cleaned up and simplified.  The
interface between systemtap and the kernel is essentially a private ABI
and we should treat it as such, so all the helpers for the modules and
the necessary implementers of the ABI should be in kernel ... there
shouldn't be any (if done right) carried around as C fragments with
kernel version ifdefs ...

> * tapsets
> 
>   Theodore is mistaken that we are deflecting the job of tapset (probe
>   macro; abstracting architecture and kernel version-change -
>   $foo->bar->baz, function names) authorship.  We have asked for help,
>   and have received a little, but the group has in fact authored a
>   growing collection of this stuff.
> 
>   We would welcome having tapsets be included with the kernel and
>   cared for by you guys.
> 
> * debuginfo
> 
>   Yes, it's very helpful & necessary if one wants to place probes at
>   just about any statement and extract just about any data value.
>   It's the same prerequisite that crash or kgdb would have, since we
>   operate at a similar level of object/source code visibility.  Other
>   distros are learning to package this admittedly bulky data up, so
>   it'll be a matter of a largish download for distro users. Kernel
>   developers will of course have the data generated locally already.
> 
>   We've recently gained the ability to work on symbol table level data
>   only.  It's a compromise technology: it shrinks the installation
>   footprint but we get only function-entry probes; we lose data
>   typing; can only get at ABI-dictated positional integral arguments.
> 
> * systemtap building
> 
>   The only thing unusual with building the thing is the use of the
>   elfutils library to parse elf/dwarf data; links to that are provided
>   and one can link to a private copy if the system lacks it.

That's true, just: I've done it but it's not exactly easy.  The
necessity of this undocumented --enable-staticdw flag stalled my
attempts to build it for a while.

> * systemtap releases
> 
>   True, we've been spotty with formal releases, though they are
>   archived and available, and we're moving to a more regular release
>   schedule very shortly.  The weekly snapshots have been good (except
>   a recent unfortunate regression that hits 2.6.25 kernels
>   particularly badly - that's holding up the new release plans).
> 
> 
> Thanks for reading; sorry about the length.

James


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 13:57 DTrace Frank Ch. Eigler
  2008-06-30 19:00 ` [Ksummit-2008-discuss] DTrace Grant Grundler
@ 2008-06-30 19:40 ` Theodore Tso
  2008-06-30 20:00   ` Frank Ch. Eigler
  2008-07-01  5:29   ` Ananth N Mavinakayanahalli
  2008-06-30 19:59 ` James Bottomley
  2 siblings, 2 replies; 52+ messages in thread
From: Theodore Tso @ 2008-06-30 19:40 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Sun, Jun 29, 2008 at 09:04:23PM -0400, Frank Ch. Eigler wrote:
> * tapsets
> 
>   Theodore is mistaken that we are deflecting the job of tapset (probe
>   macro; abstracting architecture and kernel version-change -
>   $foo->bar->baz, function names) authorship.  We have asked for help,
>   and have received a little, but the group has in fact authored a
>   growing collection of this stuff.

Well I've heard the line that it's up to the kernel subsystem experts
to write tapsets from Ulrich Drepper (on the ksummit-2008-discuss
list) and from Ananth N Mavinakayanahalli (private communication) so I
think it's fair to say that at least some people associated with
Systemtap have been placing the blame for the lack of tapsets on the
kernel developers.

As far as the growing collection of this stuff?  Where is it?  Do you
mean in the tapsets directory of the systemtap sources in the git
repository?  Is there any documentation or example usage scenarios for
these tapsets?

> * debuginfo
> 
>   Yes, it's very helpful & necessary if one wants to place probes at
>   just about any statement and extract just about any data value.
>   It's the same prerequisite that crash or kgdb would have, since we
>   operate at a similar level of object/source code visibility.  Other
>   distros are learning to package this admittedly bulky data up, so
>   it'll be a matter of a largish download for distro users. Kernel
>   developers will of course have the data generated locally already.

The problem is that kernel developers are often juggling multiple
kernels, so kernel developers need to learn how to package up this
bulky data as well.  And it's not obvious what needs to be done with,
for example, the module files, especially if they've been stripped so
they will fit into the /boot partition.  It would be useful if
http://sourceware.org/systemtap/wiki/SystemTapWithSelfBuiltKernel 
was a bit more explicit about exactly what SystemTap expects to find
in SYSTEMTAP_DEBUGINFO_PATH.  I'm sure it's blindingly obvious to a Systemtap
developer, but it isn't for someone who is just getting started with
Systemtap, and runs into one brick wall after another.

> * systemtap building
> 
>   The only thing unusual with building the thing is the use of the
>   elfutils library to parse elf/dwarf data; links to that are provided
>   and one can link to a private copy if the system lacks it.

So how do you link to a private copy?  There's nothing in the wiki
that describes this.  It seems to imply that you have to install the
elfutils globally, and I've been hesitant to do this lest it break
things that aren't expecting the latest bleeding edge library.  (I
have no idea whether the elfutils library developers worry about ABI
compatibility for applications that dynamically link with the
system-provided elfutils library.)

It would be nice if the Systemtap libraries had some provision where
you could either point to a source directory where the patched
elfutils libraries had been placed, and have them automatically used for
static linking, or if you could download the patched elfutils library
into some directory in the Systemtap sources, and if present, the
build system would automatically use them.  This sort of minor thing
makes life much simpler for people who are trying to pull down the
latest Systemtap tree, especially since the Wiki is filled with
assertions (echoed by Ulrich in the recent ksummit-discuss thread)
about how Systemtap is a fast moving project, and why it's absolutely
necessary to grab the latest bleeding edge sources from the git tree.

I'm willing to send patches for these sorts of usability issues if it's
likely such patches would be accepted...

> * systemtap releases
> 
>   True, we've been spotty with formal releases, though they are
>   archived and available, and we're moving to a more regular release
>   schedule very shortly.  The weekly snapshots have been good (except
>   a recent unfortunate regression that hits 2.6.25 kernels
>   particularly badly - that's holding up the new release plans).

Does the regression hit 2.6.26-rc8 kernels?  (i.e., should I not
bother trying Systemtap until this gets cleared up, lest I waste hours
and hours again getting frustrated?)

						- Ted

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 13:57 DTrace Frank Ch. Eigler
@ 2008-06-30 19:00 ` Grant Grundler
  2008-06-30 19:40 ` Theodore Tso
  2008-06-30 19:59 ` James Bottomley
  2 siblings, 0 replies; 52+ messages in thread
From: Grant Grundler @ 2008-06-30 19:00 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Sun, Jun 29, 2008 at 6:04 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Please forgive me for "crashing" the discussion party here.

Frank,
Excellent reply! Thanks!

Just one observation here:

> * dtrace "just works"
...
> A growing set of
>  runnable end-user scripts is already packaged with systemtap,
>  intended for use by nonexperts, more help (e.g. concise problem
>  statements about what you'd like to measure/see) would be welcome.

Would it be possible to look at prepackaged Dtrace "scripts"
and generate something comparable for systemtap?

I know the kernels are very different but the underlying functionality
(manage resources: CPU, memory, disk, network, etc) is basically
the same. In x86-64 case, it's the same HW doing essentially
the same things.

thanks,
grant

end of thread, other threads:[~2008-07-08 23:32 UTC | newest]

Thread overview: 52+ messages
     [not found] <20080627150424.GB14894@parisc-linux.org>
     [not found] ` <1214580213.3394.40.camel@localhost.localdomain>
     [not found]   ` <20080627155018.GC14894@parisc-linux.org>
     [not found]     ` <1214583502.7698.15.camel@weaponx>
     [not found]       ` <20080627163056.GB1416@lst.de>
     [not found]         ` <20080628072605.GA505@in.ibm.com>
     [not found]           ` <20080629084002.GA24131@lst.de>
     [not found]             ` <20080630051034.GA4970@in.ibm.com>
     [not found]               ` <20080630112913.GA18817@lst.de>
2008-06-30 19:27                 ` [Ksummit-2008-discuss] DTrace Frank Ch. Eigler
2008-07-01  1:21                   ` Jim Keniston
     [not found]                   ` <20080706123414.GA9265@lst.de>
2008-07-06 15:47                     ` Frank Ch. Eigler
2008-07-06 16:36                       ` Evgeniy Polyakov
2008-07-06 18:05                         ` Frank Ch. Eigler
2008-07-06 18:24                           ` Evgeniy Polyakov
2008-07-06 21:46                             ` Frank Ch. Eigler
2008-07-06 22:47                               ` Karen Shaeffer
2008-07-06 23:15                                 ` Frank Ch. Eigler
2008-07-07  5:59                               ` Evgeniy Polyakov
2008-07-07 11:19                                 ` Frank Ch. Eigler
     [not found]   ` <20080627182754.GB7549@mit.edu>
     [not found]     ` <1214597135.3394.82.camel@localhost.localdomain>
     [not found]       ` <aday74qlh08.fsf@cisco.com>
     [not found]         ` <4865B111.2040307@redhat.com>
     [not found]           ` <adavdzujh2u.fsf@cisco.com>
     [not found]             ` <20080704200055.GA11232@synapse.neuralscape.com>
     [not found]               ` <20080704224424.GA12454@synapse.neuralscape.com>
     [not found]                 ` <1215273663.3439.34.camel@localhost.localdomain>
2008-07-06 23:33                   ` Frank Ch. Eigler
2008-07-07 14:35                     ` James Bottomley
2008-07-07 15:02                     ` James Bottomley
2008-06-30 13:57 DTrace Frank Ch. Eigler
2008-06-30 19:00 ` [Ksummit-2008-discuss] DTrace Grant Grundler
2008-06-30 19:40 ` Theodore Tso
2008-06-30 20:00   ` Frank Ch. Eigler
2008-06-30 20:19     ` Theodore Tso
2008-06-30 21:12       ` Arnaldo Carvalho de Melo
2008-06-30 23:02         ` David Miller
2008-06-30 21:13       ` James Bottomley
2008-06-30 22:10       ` Frank Ch. Eigler
2008-07-01  2:42         ` Theodore Tso
2008-07-01  7:08           ` Roland McGrath
2008-07-01 10:15             ` Theodore Tso
2008-07-01 11:04               ` Sam Ravnborg
2008-07-01 12:13                 ` Theodore Tso
2008-07-02 20:27                   ` Sam Ravnborg
2008-07-01 20:06               ` Roland McGrath
2008-07-01 23:13                 ` Theodore Tso
2008-07-02  2:23                   ` Frank Ch. Eigler
2008-07-02 19:27                   ` Frank Ch. Eigler
2008-07-02 20:08                   ` Joel Becker
2008-07-02 20:17                     ` J. Bruce Fields
2008-07-02 20:41                       ` Frank Ch. Eigler
2008-07-02 21:19                       ` H. Peter Anvin
2008-07-02 21:30                       ` Theodore Tso
2008-07-02 21:46                         ` J. Bruce Fields
2008-07-05  9:46                   ` Peter Zijlstra
2008-07-05 10:07                     ` Christoph Hellwig
2008-07-05 12:12                       ` Frank Ch. Eigler
2008-07-05 18:08                         ` Christoph Hellwig
2008-07-05 13:50                       ` James Bottomley
2008-07-05 18:08                         ` Christoph Hellwig
2008-07-05 18:05                       ` K.Prasad
2008-07-07 14:36                         ` Christoph Hellwig
2008-07-07 17:44                           ` K.Prasad
2008-07-05 12:34                     ` Theodore Tso
2008-07-01  5:29   ` Ananth N Mavinakayanahalli
2008-06-30 19:59 ` James Bottomley
2008-06-30 23:52   ` Masami Hiramatsu
2008-07-08 23:32   ` Eric W. Biederman
