public inbox for systemtap@sourceware.org
* nits to please a kernel hacker
@ 2008-09-29 23:02 Roland McGrath
  2008-09-30 11:28 ` Srikar Dronamraju
  2008-09-30 13:55 ` Theodore Tso
  0 siblings, 2 replies; 8+ messages in thread
From: Roland McGrath @ 2008-09-29 23:02 UTC
  To: systemtap

Hi guys.  So today I have my random kernel hacker hat on.  I have a
thing to debug in my branch of the upstream kernel du jour, and kgdb
still doesn't work all the way, so I thought I'd try systemtap.
Here's how it went.  After this email, I've done my thinking about
systemtap for the day and will get on with debugging my thing; next
time I have a thing to debug, I'll think about systemtap again and
try the same things from zero again to see if systemtap has achieved
utility in helping me do such work.  So if the follow-up to any of
these is to ask me to do anything (including think about bugzilla),
other than "here read this, it answers your question" or "here, try
this again, it might be fixed now", then you lose.

I did a fresh git pull and then configure;make in a separate build dir.
That much any hacker does before even thinking about how you actually use
the thing.  (It went fine.)  Naturally, I'm going to use it out of the
build dir and not install it at all.  I just built it.

I'd used stap before, so I knew it had to be able to find some installed
files.  Nothing immediately obvious told me the list of these to set.
I looked in Makefile and surmised from the 'make check' usage that it's:

	SYSTEMTAP_RUNTIME=/my/builds/stap/runtime \
	SYSTEMTAP_TAPSET=/my/builds/stap/tapset \
	/my/builds/stap/stap [options...]
[sic...more to come]

This should be said up front in README, or at least HACKING.  Even from the
stap(1) man page I have to tease these out from mentions in the FILES
section.  It should have an ENVIRONMENT section that pops right out for
easy reference.

In the glibc build, we write a tiny script called testrun.sh in the build
directory that execs tests with all the magic path options set correctly
for that build directory.  Maybe there could be a test-stap.sh or some such
file written by configure/make.  It could have build/src paths wired in,
and/or do something clever with $0.  Just so after configure;make, one can do:

	/my/builds/stap/test-stap.sh -e 'probe me.baby() { real=$good }'
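A test-stap.sh along those lines could be as small as the sketch below. It is hypothetical, not something systemtap's configure actually writes: STAP_BUILDDIR and STAP_SRCDIR stand in for the paths configure would wire in, the build dir defaults to wherever the script lives (the $0 trick), and the source dir defaults to the build dir, which is only right for an in-tree build. SYSTEMTAP_RUNTIME and SYSTEMTAP_TAPSET point into the source tree, because the 'make check' lines are $srcdir references.

```shell
#!/bin/sh
# test-stap.sh: hypothetical wrapper sketched from the idea above; not
# shipped by systemtap. configure would substitute real paths; here they
# can be overridden from the environment.

# Build dir: wherever this script lives (the "$0" trick).
STAP_BUILDDIR=${STAP_BUILDDIR:-$(cd "$(dirname -- "$0")" && pwd)}
# Source dir: assumed equal to the build dir unless told otherwise
# (true only for an in-tree build; configure would bake in $srcdir).
STAP_SRCDIR=${STAP_SRCDIR:-$STAP_BUILDDIR}

test_stap() {
    # runtime/ and tapset/ come from the *source* tree; only the stap
    # binary comes from the build tree.
    SYSTEMTAP_RUNTIME="$STAP_SRCDIR/runtime" \
    SYSTEMTAP_TAPSET="$STAP_SRCDIR/tapset" \
        "$STAP_BUILDDIR/stap" "$@"
}

# A generated script would end with:  test_stap "$@"
```

This only covers passes 1 through 4; pass 5 still execs the installed staprun, as the rest of this message finds out the hard way.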

Ok, so now I have stap and I think I know how to run it.  Naturally, I am
using my own hand-built kernel and I don't properly install it anywhere.
(I don't use any modules, and only care about finding debuginfo for vmlinux
itself.)  So I use (env vars etc from above):

	.../stap -r /my/builds/linux-foobar -e 'probe kernel.function("foo") {}'

This actually works fine as far as finding debuginfo, because the libdwfl
calls take a string starting with / as an absolute directory to use in
place of /lib/modules/something.  Here it is in reality:

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r ~/build/linux-utrace -e 'probe kernel.function("tracehook_report_clone") {}'
	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
	Module directory /lib/modules//home/roland/build/linux-utrace/build check failed: No such file or directory
	Make sure kernel devel is installed.
	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.
	-bash-3.2$ 
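The "Module directory" line shows the build-tree lookup pasting the -r argument straight after /lib/modules/. A lookup that treated a leading / the way the libdwfl calls do would look more like the sketch below. kernel_build_dir is a hypothetical name for illustration, not systemtap's code, and it assumes the build tree is expected at <modules dir>/build in both cases.

```shell
# Hypothetical sketch: map a -r argument to the kernel build directory,
# treating a leading "/" as a directory used in place of
# /lib/modules/RELEASE, as described above for the libdwfl calls.
kernel_build_dir() {
    case $1 in
        /*) moddir=$1 ;;               # absolute path: use it directly
        *)  moddir=/lib/modules/$1 ;;  # release name: the usual location
    esac
    printf '%s\n' "$moddir/build"
}

kernel_build_dir foobar                           # /lib/modules/foobar/build
kernel_build_dir /home/roland/build/linux-utrace  # /home/roland/build/linux-utrace/build
```

Even with consistent handling, a bare kernel build tree typically has no build entry of its own, which is what the second symlink below works around.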

Ok, first peeve.  If you tell me one more time to use "more" -v options,
I'm going to beat you with a rusty tire-iron.  HOW MANY MORE????  The
message should say exactly the option that will give information about
the kind of failure that just happened.

When there is one, that is.  For this particular failure, from -v all
the way up to -vvvvvv give me no more information on what the critical
failure was.  If the failure was in pass 4, then there should be a way
to say "make pass 4 very verbose" without making passes 1-3 spew 100s of
lines of verbosity I have to skip over looking for "Pass 4".  Being sure
I had all the info I could get from the phase that is actually failing
would have made me give up quicker and realize that:

	Module directory /lib/modules//home/roland/build/linux-utrace/build check failed: No such file or directory
	Make sure kernel devel is installed.

was in fact the fatal message and meant it had nothing else to explain or
show going wrong in detail.  Since this message stands out less than a
"WARNING:" that I already knew I didn't care about, it was far from
immediately obvious that this was "FATAL ERROR" and it's still only me
surmising now that this means "I didn't even try anything else in pass 4,
there are no more error messages to see."

So now I see that -r /... does not mean the same thing to two different
parts of systemtap--one worked, and now the other is looking for
/lib/modules//... just like you expect from code that doesn't grok /... as
an absolute directory name.  So that's a stupid bug, but I know from stupid
bugs.  So seeing that, I do:

	ln -s /home/roland/build/linux-utrace /lib/modules/foobar

And try again.

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
	Module directory /lib/modules/foobar/build check failed: No such file or directory
	Make sure kernel devel is installed.
	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.
	-bash-3.2$ 

Oh right, I know from stupid code.

	ln -s . /home/roland/build/linux-utrace/build

And try again.

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.

Now I am raising my rusty tire-iron.

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -v -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
	Warning: changing last pass to 4 since cross-compiling
	Pass 1: parsed user script and 0 library script(s) in 0usr/0sys/0real ms.
	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 300usr/110sys/422real ms.
	Pass 3: translated to C into "/tmp/staprRLw6c/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c" in 530usr/1250sys/1779real ms.
	Pass 4: compiled C into "stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.ko" in 670usr/1060sys/1771real ms.
	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.

Now I am beating you with my rusty tire-iron.

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -vv -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
	Warning: changing last pass to 4 since cross-compiling
	SystemTap translator/driver (version 0.7.1/0.133 git branch master, commit 338e2309)
	Copyright (C) 2005-2008 Red Hat, Inc. and others
	This is free software; see the source for copying conditions.
	Session arch: x86_64 release: foobar
	Created temporary directory "/tmp/stapHqdXIq"
	Pass 1: parsed user script and 0 library script(s) in 0usr/0sys/0real ms.
	probe tracehook_report_clone@/home/roland/redhat/linux/utrace/include/linux/tracehook.h:301 kernel reloc=.dynamic section=.text pc=0xffffffff80240f96
	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/90sys/424real ms.
	probe_1 locks nothing
	dump_unwindsyms kernel index=0 base=0xffffffff80200000
	Pass 3: translated to C into "/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c" in 520usr/1290sys/1804real ms.
	Running make -C "/lib/modules/foobar/build" M="/tmp/stapHqdXIq" modules >/dev/null
	/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:41:21: error: runtime.h: No such file or directory
	/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:42:18: error: regs.c: No such file or directory
	/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:43:19: error: stack.c: No such file or directory
	[...]

Luckily for your cranium, I can put down the tire-iron after only two -v
increments.  Imagine the violence and bloodshed if this were a problem
explained only at -vvvvvvvvv.

At this point I decide those rumors about systemtap's runtime not being
compatible with current upstream kernels are true, and prepare to give up
on the exercise for the day.  A random glance makes me realize that in
transcribing the 'make check' command lines into my actual situation, I
made the obvious mistake.  (They are $srcdir references, not build dir
references.)  Phew.  A less attentive kernel hacker who was no less error
prone to begin with than I am would probably have given up and then
reinforced the rumor about the runtime.  All because there isn't an obvious
explanation you can't miss about how to correctly run stap out of a fresh
build directory.

So, operator error due to inadequate automation discovered and corrected.
We can resume our story (rumors are false and the current stap runtime is
fine with the current upstream Linus tree).

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
	/home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
	-bash-3.2$ 

Um, what?

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -v -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
	Warning: changing last pass to 4 since cross-compiling
	Pass 1: parsed user script and 45 library script(s) in 320usr/50sys/391real ms.
	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 340usr/110sys/442real ms.
	/home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
	Pass 3: using cached /home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.c
	Pass 4: using cached /home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
	-bash-3.2$ 

After a few extra seconds, because my eyes are drawn to the "WARNING:" that I
already know to ignore, I notice:

	Warning: changing last pass to 4 since cross-compiling

I'm cross-compiling?  Since when, buddy?  Oh, you mean magically considered
"cross-compiling" just because I used -r to tell you where to find the
kernel build.  So when the usage says:

   -r RELEASE cross-compile to kernel RELEASE, instead of 2.6.27-rc7.utrace-00170-gbe4ac41

even though 2.6.27-rc7.utrace-00170-gbe4ac41 is my `uname -r`, i.e. my
running kernel, it means that just giving -r changes the fundamental
control flow of what the command means.  It's not just giving the directory
to find the kernel stuff in, like -R does.  I never would have guessed
that.  (It's a little hard for me to tell, but I suspect it's only because
I do secretly know lots of stap internals that the .ko file name output was
not just a complete mystery, and that it even occurred to me to try -v next
when there was apparently no error message mentioning it.)

Ok, I know from cockamamie.  I'm not going to try to figure out what stap
isn't doing, but I can guess how to fool it.

	mv /lib/modules/foobar /lib/modules/`uname -r`

(That's my symlink from before.)
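The two symlinks plus the rename boil down to one setup step, sketched here as a shell function. link_kernel_tree is a hypothetical helper, not part of systemtap. It takes the modules root as an optional third argument (default /lib/modules) so it can be tried in a scratch directory, and it uses ln -sfn so that re-running it replaces the links instead of creating new ones inside the linked directory.

```shell
# Hypothetical helper: make a hand-built kernel tree look installed enough
# for stap, i.e. MODROOT/RELEASE/build resolves to the tree itself.
#   $1  kernel build tree (absolute path)
#   $2  release name, e.g. "$(uname -r)"
#   $3  modules root (optional, default /lib/modules)
link_kernel_tree() {
    tree=$1
    release=$2
    modroot=${3:-/lib/modules}
    # MODROOT/RELEASE -> tree  (-n: replace an existing link, don't descend)
    ln -sfn "$tree" "$modroot/$release" || return
    # tree/build -> .  (relative target, so it resolves to the tree itself)
    ln -sfn . "$tree/build"
}

# As root, for the running kernel:
#   link_kernel_tree /home/roland/build/linux-utrace "$(uname -r)"
```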

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -e 'probe kernel.function("tracehook_report_clone") {}' 
	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
	sh: /usr/local/bin/staprun: No such file or directory
	Pass 5: run failed.  Try again with more '-v' (verbose) options.
	-bash-3.2$ 

Crikey!  Is there another variable for finding staprun?  I looked at the
docs, and even grepped the source for "getenv" and I sure don't see one!
I think I tracked it down in buildrun.cxx:run_pass and it's frickin'
hard-wired to the configure-chosen $(bindir)/staprun at compile time.
Give me a break!
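Since stap run out of a build dir still execs the installed helpers, a quick preflight check can save a round of -v guessing. The sketch below is hypothetical: it assumes the layout seen in this message's transcripts, $prefix/bin/staprun and $prefix/libexec/systemtap/stapio with the default prefix /usr/local, and simply lists whichever of them is missing.

```shell
# Hypothetical preflight check: list the configure-time helper paths that
# stap (and then staprun) will exec but that do not exist yet.
#   $1  install prefix (optional, default /usr/local)
missing_stap_helpers() {
    prefix=${1:-/usr/local}
    status=0
    for helper in "$prefix/bin/staprun" "$prefix/libexec/systemtap/stapio"; do
        if [ ! -x "$helper" ]; then
            printf 'missing: %s\n' "$helper"
            status=1
        fi
    done
    return "$status"
}

# Usage:  missing_stap_helpers /usr/local || echo "pass 5 will fail"
```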

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -v -e 'probe kernel.function("tracehook_report_clone") {}' 
	Pass 1: parsed user script and 45 library script(s) in 340usr/40sys/393real ms.
	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/110sys/441real ms.
	Pass 3: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.c
	Pass 4: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	Pass 5: starting run.
	sh: /usr/local/bin/staprun: No such file or directory
	Pass 5: run completed in 0usr/0sys/9real ms.
	Pass 5: run failed.  Try again with more '-v' (verbose) options.
	-bash-3.2$ 

Warming up the tire-iron...

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -vv -e 'probe kernel.function("tracehook_report_clone") {}' 
	SystemTap translator/driver (version 0.7.1/0.133 git branch master, commit 338e2309)
	Copyright (C) 2005-2008 Red Hat, Inc. and others
	This is free software; see the source for copying conditions.
	Session arch: x86_64 release: 2.6.27-rc7.utrace-00170-gbe4ac41
	Created temporary directory "/tmp/stapXpCOU6"
	Searched '/home/roland/redhat/systemtap/tapset/x86_64/*.stp', found 2
	Searched '/home/roland/redhat/systemtap/tapset/*.stp', found 43
	Pass 1: parsed user script and 45 library script(s) in 330usr/50sys/450real ms.
	probe tracehook_report_clone@/home/roland/redhat/linux/utrace/include/linux/tracehook.h:301 kernel reloc=.dynamic section=.text pc=0xffffffff80240f16
	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/870sys/2202real ms.
	Pass 3: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.c
	Pass 4: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	Pass 5: starting run.
	Running /usr/local/bin/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
	sh: /usr/local/bin/staprun: No such file or directory
	Pass 5: run completed in 0usr/10sys/11real ms.
	Pass 5: run failed.  Try again with more '-v' (verbose) options.
	Running rm -rf /tmp/stapXpCOU6
	-bash-3.2$ 

Ok, there it is, finally.  Two -v's just to see the command line it tried to run?
Really?  Even though you deserved it for that smarmy damn error message,
I'm feeling a little bad about the tire-iron.  So, I'm working with ya!

	-bash-3.2$ $HOME/build/systemtap/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
	ERROR: The effective user ID of staprun must be set to the root user.
	  Check permissions on staprun and ensure it is a setuid root program.

Ok, so stap was actually using sudo to run staprun but didn't say the
actual command it ran in the -vv "Running ..." message.  Still letting the
tire-iron cool down, we can do this.

	-bash-3.2$ sudo $HOME/build/systemtap/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
	ERROR: Error opening '/tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko': No such file or directory
	-bash-3.2$

What?  Oh, no more temps.  Hmm.  Hey, maybe if I screw myself the other way
this will add up to goodness!

	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -r `uname -r` -e 'probe kernel.function("tracehook_report_clone") {}' 
	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
	/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	-bash-3.2$ sudo $HOME/build/systemtap/staprun -v /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	/usr/local/libexec/systemtap/stapio: No such file or directory
	-bash-3.2$ 

Crispy Fried Jeebus.  Man, I could have had such a nice lunch in the time
it took to get this far.  Ok, we've been through this drill.  Almost there!
Stay on target!  Damn, let's skip a step and go straight to three -v's!

	bash-3.2$ sudo $HOME/build/systemtap/staprun -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	staprun:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	staprun:main:249 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
	staprun:init_staprun:207 init_staprun
	staprun:insert_module:47 inserting module
	staprun:insert_module:66 module options: _stp_bufsize=0
	Error inserting module '/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko': File exists

Wait, what?  Does that mean what I think it means?

	-bash-3.2$ /sbin/lsmod
	Module                  Size  Used by
	stap_32273b197bad7e53129d892d463a65cd_351   706120  0 
	-bash-3.2$ 

What a low-rent cheeseball freshman maneuver.  Too bad I don't drink, or
some kernel hackers would get an earful of fresh rumors about the mental
acuity of stap hackers at the bar tonight!  Ok, staprun, let me get a baby
wipe there, I'm feeling parental.

	-bash-3.2$ sudo /sbin/rmmod stap_32273b197bad7e53129d892d463a65cd_351
	-bash-3.2$ sudo $HOME/build/systemtap/staprun -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	staprun:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	staprun:main:249 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
	staprun:init_staprun:207 init_staprun
	staprun:insert_module:47 inserting module
	staprun:insert_module:66 module options: _stp_bufsize=0
	staprun:init_ctl_channel:30 Opening /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/.cmd
	execing: /usr/local/libexec/systemtap/stapio -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko 
	/usr/local/libexec/systemtap/stapio: No such file or directory
	-bash-3.2$ 

Hey, this time the failure to clean up the module after an error is handy;
we can pretend it's a feature!

	-bash-3.2$ sudo $HOME/build/systemtap/stapio -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	stapio:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
	stapio:main:37 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
	stapio:init_stapio:224 init_stapio
	stapio:init_ctl_channel:30 Opening /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/.cmd
	stapio:stp_main_loop:313 in main loop
	stapio:stp_main_loop:320 nb=4
	stapio:init_relayfs:124 initializing relayfs
	stapio:init_relayfs:148 attempting to open /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/trace0
	stapio:init_relayfs:148 attempting to open /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/trace1
	stapio:init_relayfs:154 ncpus=1, bulkmode = 0
	stapio:init_relayfs:204 starting threads
	stapio:stp_main_loop:320 nb=12
	stapio:stp_main_loop:360 probe_start() returned 0

Holy crap!  I think it's working!

	[hit C-c]
	^Cstapio:signal_thread:36 sigproc 2 (Interrupt)
	stapio:stp_main_loop:320 nb=4
	stapio:stp_main_loop:353 got STP_EXIT
	stapio:cleanup_and_exit:271 detach=0
	stapio:close_relayfs:221 closing
	stapio:reader_thread:108 exiting thread 0
	stapio:close_relayfs:240 done
	stapio:cleanup_and_exit:284 closing control channel
	stapio:cleanup_and_exit:290 removing stap_32273b197bad7e53129d892d463a65cd_351
	stap_32273b197bad7e53129d892d463a65cd_351: No such file or directory

Huh?  What does that mean?

	-bash-3.2$ /sbin/lsmod
	Module                  Size  Used by
	stap_32273b197bad7e53129d892d463a65cd_351   706120  0 
	-bash-3.2$ 

Oh, I guess that's not actually what it meant (who the hell knows).
I guess that rmmod is what staprun would have done afterward if it had
been able to run stapio.  So this is normal.  I think it really worked.
Alrighty, then.
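Twice above, a stap_* module was left loaded after a run, and the first time that caused the "File exists" error when staprun tried to insert it again. Spotting leftovers can be scripted; the sketch below reads /proc/modules, whose first field is the module name. leftover_stap_modules is a hypothetical helper, not a systemtap tool.

```shell
# Hypothetical helper: print the names of loaded stap_* modules.
#   $1  modules list (optional, default /proc/modules; first field = name)
leftover_stap_modules() {
    awk '$1 ~ /^stap_/ { print $1 }' "${1:-/proc/modules}"
}

# Usage (as root), removing any leftovers before retrying:
#   for m in $(leftover_stap_modules); do /sbin/rmmod "$m"; done
```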

I wonder if I want to repeat all that with a script that actually does
something.  Nah, I'm starving and in the meantime I rebuilt a kernel
where kgdb works a little more.  I think after lunch I'll try that instead.


Thanks,
Roland


* Re: nits to please a kernel hacker
  2008-09-29 23:02 nits to please a kernel hacker Roland McGrath
@ 2008-09-30 11:28 ` Srikar Dronamraju
  2008-09-30 13:55 ` Theodore Tso
  1 sibling, 0 replies; 8+ messages in thread
From: Srikar Dronamraju @ 2008-09-30 11:28 UTC
  To: Roland McGrath; +Cc: systemtap

* Roland McGrath <roland@redhat.com> [2008-09-29 16:01:07]:

Hi 

Some of these problems were already discussed over here:
http://sourceware.org/ml/systemtap/2008-q1/msg00378.html

--
Thanks and Regards
Srikar

> Hi guys.  So today I have my random kernel hacker hat on.  I have a
> thing to debug in my branch of the upstream kernel du jour, and kgdb
> still doesn't all the way work, so I thought I'd try systemtap.
> Here's how it went.  After this email, I've done my thinking about
> systemtap for the day and will get on with debugging my thing; next
> time I have a thing to debug, I'll think about systemtap again and
> try the same things from zero again to see if systemtap has achieved
> utility in helping me do such work.  So if the follow-up to any of
> these is to ask me to do anything (including think about bugzilla),
> other than "here read this, it answers your question" or "here, try
> this again, it might be fixed now", then you lose.
> 
> I did a fresh git pull and then configure;make in a separate build dir.
> That much any hacker does before even thinking about how you actually use
> the thing.  (It went fine.)  Naturally, I'm going to use it out of the
> build dir and not install it at all.  I just built it.
> 
> I'd used stap before, so I knew it had to be able to find some installed
> files.  Nothing immediately obvious told me the list of these to set.
> I looked in Makefile and surmised from the 'make check' usage that it's:
> 
> 	SYSTEMTAP_RUNTIME=/my/builds/stap/runtime \
> 	SYSTEMTAP_TAPSET=/my/builds/stap/tapset \
> 	/my/builds/stap/stap [options...]
> [sic...more to come]
> 
> This should be said up front in README, or at least HACKING.  Even from the
> stap(1) man page I have to tease these out from mentions in the FILES
> section.  It should have an ENVIRONMENT section that pops right out for
> easy reference.
> 
> In the glibc build, we write a tiny script called testrun.sh in the build
> directory that execs tests with all the magic path options set correctly
> for that build directory.  Maybe there could be a test-stap.sh or some such
> file written by configure/make.  It could have build/src paths wired in,
> and/or do something clever with $0.  Just so after configure;make, one can do:
> 
> 	/my/builds/stap/test-stap.sh -e 'probe me.baby() { real=$good }'
> 
> Ok, so now I have stap and I think I know how to run it.  Naturally, I am
> using my own hand-built kernel and I don't properly install it anywhere.
> (I don't use any modules, and only care about finding debuginfo for vmlinux
> itself.)  So I use (env vars etc from above):
> 
> 	.../stap -r /my/builds/linux-foobar -e 'probe kernel.function("foo") {}'
> 
> This actually works fine as far as finding debuginfo, because the libdwfl
> calls take a string starting with / as an absolute directory to use in
> place of /lib/modules/something.  Here it is in reality:
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r ~/build/linux-utrace -e 'probe kernel.function("tracehook_report_clone") {}'
> 	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
> 	Module directory /lib/modules//home/roland/build/linux-utrace/build check failed: No such file or directory
> 	Make sure kernel devel is installed.
> 	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.
> 	-bash-3.2$ 
> 
> Ok, first peeve.  If you tell me one more time to use "more" -v options,
> I'm going to beat you with a rusty tire-iron.  HOW MANY MORE????  The
> message should say exactly the option that will give information about
> the kind of failure that just happened.
> 
> When there is one, that is.  For this particular failure, from -v all
> the way up to -vvvvvv give me no more information on what the critical
> failure was.  If the failure was in pass 4, then there should be a way
> to say "make pass 4 very verbose" without making passes 1-3 spew 100s of
> lines of verbosity I have to skip over looking for "Pass 4".  Being sure
> I had all the info I could get from the phase that is actually failing
> would have made me give up quicker and realize that:
> 
> 	Module directory /lib/modules//home/roland/build/linux-utrace/build check failed: No such file or directory
> 	Make sure kernel devel is installed.
> 
> was in fact the fatal message and meant it had nothing else to explain or
> show going wrong in detail.  Since this message stands out less than a
> "WARNING:" that I already knew I didn't care about, it was far from
> immediately obvious that this was "FATAL ERROR" and it's still only me
> surmising now that this means "I didn't even try anything else in pass 4,
> there are no more error messages to see."
> 
> So I now I see that -r /... does not mean the same thing to two different
> parts of systemtap--one worked, and now the other is looking for
> /lib/modules//... just like you expect from code that doesn't grok /... as
> an absolute directory name.  So that's a stupid bug, but I know from stupid
> bugs.  So seeing that, I do:
> 
> 	ln -s /home/roland/build/linux-utrace /lib/modules/foobar
> 
> And try again.
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
> 	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
> 	Module directory /lib/modules/foobar/build check failed: No such file or directory
> 	Make sure kernel devel is installed.
> 	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.
> 	-bash-3.2$ 
> 
> Oh right, I know from stupid code.
> 
> 	ln -s . /home/roland/build/linux-utrace/build
> 
> And try again.
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
> 	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
> 	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.
> 
> Now I am raising my rusty tire-iron.
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -v -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
> 	Warning: changing last pass to 4 since cross-compiling
> 	Pass 1: parsed user script and 0 library script(s) in 0usr/0sys/0real ms.
> 	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
> 	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 300usr/110sys/422real ms.
> 	Pass 3: translated to C into "/tmp/staprRLw6c/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c" in 530usr/1250sys/1779real ms.
> 	Pass 4: compiled C into "stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.ko" in 670usr/1060sys/1771real ms.
> 	Pass 4: compilation failed.  Try again with more '-v' (verbose) options.
> 
> Now I am beating you with my rusty tire-iron.
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/build/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/build/systemtap/tapset $HOME/build/systemtap/stap -vv -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
> 	Warning: changing last pass to 4 since cross-compiling
> 	SystemTap translator/driver (version 0.7.1/0.133 git branch master, commit 338e2309)
> 	Copyright (C) 2005-2008 Red Hat, Inc. and others
> 	This is free software; see the source for copying conditions.
> 	Session arch: x86_64 release: foobar
> 	Created temporary directory "/tmp/stapHqdXIq"
> 	Pass 1: parsed user script and 0 library script(s) in 0usr/0sys/0real ms.
> 	probe tracehook_report_clone@/home/roland/redhat/linux/utrace/include/linux/tracehook.h:301 kernel reloc=.dynamic section=.text pc=0xffffffff80240f96
> 	WARNING: side-effect-free probe 'probe_1': keyword at <input>:1:1
> 	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/90sys/424real ms.
> 	probe_1 locks nothing
> 	dump_unwindsyms kernel index=0 base=0xffffffff80200000
> 	Pass 3: translated to C into "/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c" in 520usr/1290sys/1804real ms.
> 	Running make -C "/lib/modules/foobar/build" M="/tmp/stapHqdXIq" modules >/dev/null
> 	/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:41:21: error: runtime.h: No such file or directory
> 	/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:42:18: error: regs.c: No such file or directory
> 	/tmp/stapHqdXIq/stap_e7e4b9a28ebf0874a811632eaa2cf03d_324.c:43:19: error: stack.c: No such file or directory
> 	[...]
> 
> Luckily for your cranium, I can put down the tire-iron after only two -v
> increments.  Imagine the violence and bloodshed if this were a problem
> explained only at -vvvvvvvvv.
> 
> At this point I decide those rumors about systemtap's runtime not being
> compatible with current upstream kernels are true, and prepare to give up
> on the exercise for the day.  A random glance makes me realize that in
> transcribing the 'make check' command lines into my actual situation, I
> made the obvious mistake.  (They are $srcdir references, not build dir
> references.)  Phew.  A less attentive kernel hacker who was no less error
> prone to begin with than I am would probably have given up and then
> reinforced the rumor about the runtime.  All because there isn't an obvious
> explanation you can't miss about how to correctly run stap out of a fresh
> build directory.
> 
> So, operator error due to inadequate automation discovered and corrected.
> We can resume our story (rumors are false and the current stap runtime is
> fine with the current upstream Linus tree).
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
> 	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
> 	/home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
> 	-bash-3.2$ 
> 
> Um, what?
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -v -r foobar -e 'probe kernel.function("tracehook_report_clone") {}'
> 	Warning: changing last pass to 4 since cross-compiling
> 	Pass 1: parsed user script and 45 library script(s) in 320usr/50sys/391real ms.
> 	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
> 	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 340usr/110sys/442real ms.
> 	/home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
> 	Pass 3: using cached /home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.c
> 	Pass 4: using cached /home/roland/.systemtap/cache/88/stap_8820c6519888bb04e4fc2e61ef2f1f11_325.ko
> 	-bash-3.2$ 
> 
> After a few extra seconds lost because my eyes are drawn to the "WARNING:"
> that I already know to ignore, I notice:
> 
> 	Warning: changing last pass to 4 since cross-compiling
> 
> I'm cross-compiling?  Since when, buddy?  Oh, you mean magically considered
> "cross-compiling" just because I used -r to tell you where to find the
> kernel build.  So when the usage says:
> 
>    -r RELEASE cross-compile to kernel RELEASE, instead of 2.6.27-rc7.utrace-00170-gbe4ac41
> 
> even though 2.6.27-rc7.utrace-00170-gbe4ac41 is my `uname -r`, i.e. my
> running kernel, it means that just giving -r changes the fundamental
> control flow of what the command means.  It's not just giving the directory
> to find the kernel stuff in, like -R does.  I never would have guessed
> that.  (It's a little hard for me to tell, but I suspect it's only because
> I do secretly know lots of stap internals that the .ko file name output was
> not just a complete mystery, and that it even occurred to me when there was
> apparently no error message mentioning it to try -v next.)
> 
> Ok, I know from cockamamy.  I'm not going to try to figure out what stap
> isn't doing, but I can guess how to fool it.
> 
> 	mv /lib/modules/foobar /lib/modules/`uname -r`
> 
> (That's my symlink from before.)
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -e 'probe kernel.function("tracehook_report_clone") {}' 
> 	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
> 	sh: /usr/local/bin/staprun: No such file or directory
> 	Pass 5: run failed.  Try again with more '-v' (verbose) options.
> 	-bash-3.2$ 
> 
> Crikey!  Is there another variable for finding staprun?  I looked at the
> docs, and even grepped the source for "getenv" and I sure don't see one!
> I think I tracked it down in buildrun.cxx:run_pass and it's frickin'
> hard-wired to the configure-chosen $(bindir)/staprun at compile time.
> Give me a break!
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -v -e 'probe kernel.function("tracehook_report_clone") {}' 
> 	Pass 1: parsed user script and 45 library script(s) in 340usr/40sys/393real ms.
> 	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
> 	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/110sys/441real ms.
> 	Pass 3: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.c
> 	Pass 4: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	Pass 5: starting run.
> 	sh: /usr/local/bin/staprun: No such file or directory
> 	Pass 5: run completed in 0usr/0sys/9real ms.
> 	Pass 5: run failed.  Try again with more '-v' (verbose) options.
> 	-bash-3.2$ 
> 
> Warming up the tire-iron...
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -vv -e 'probe kernel.function("tracehook_report_clone") {}' 
> 	SystemTap translator/driver (version 0.7.1/0.133 git branch master, commit 338e2309)
> 	Copyright (C) 2005-2008 Red Hat, Inc. and others
> 	This is free software; see the source for copying conditions.
> 	Session arch: x86_64 release: 2.6.27-rc7.utrace-00170-gbe4ac41
> 	Created temporary directory "/tmp/stapXpCOU6"
> 	Searched '/home/roland/redhat/systemtap/tapset/x86_64/*.stp', found 2
> 	Searched '/home/roland/redhat/systemtap/tapset/*.stp', found 43
> 	Pass 1: parsed user script and 45 library script(s) in 330usr/50sys/450real ms.
> 	probe tracehook_report_clone@/home/roland/redhat/linux/utrace/include/linux/tracehook.h:301 kernel reloc=.dynamic section=.text pc=0xffffffff80240f16
> 	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
> 	Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) in 330usr/870sys/2202real ms.
> 	Pass 3: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.c
> 	Pass 4: using cached /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	Pass 5: starting run.
> 	Running /usr/local/bin/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	sh: /usr/local/bin/staprun: No such file or directory
> 	Pass 5: run completed in 0usr/10sys/11real ms.
> 	Pass 5: run failed.  Try again with more '-v' (verbose) options.
> 	Running rm -rf /tmp/stapXpCOU6
> 	-bash-3.2$ 
> 
> Ok, there it is finally.  Two -v's just to see the command line it tried to run?
> Really?  Even though you deserved it for that smarmy damn error message,
> I'm feeling a little bad about the tire-iron.  So, I'm working with ya!
> 
> 	-bash-3.2$ $HOME/build/systemtap/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	ERROR: The effective user ID of staprun must be set to the root user.
> 	  Check permissions on staprun and ensure it is a setuid root program.
> 
> Ok, so stap was actually using sudo to run staprun but didn't say the
> actual command it ran in the -vv "Running ..." message.  Still letting the
> tire-iron cool down, we can do this.
> 
> 	-bash-3.2$ sudo $HOME/build/systemtap/staprun -v /tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	ERROR: Error opening '/tmp/stapXpCOU6/stap_32273b197bad7e53129d892d463a65cd_351.ko': No such file or directory
> 	-bash-3.2$
> 
> What?  Oh, no more temps.  Hmm.  Hey, maybe if I screw myself the other way
> this will add up to goodness!
> 
> 	-bash-3.2$ SYSTEMTAP_RUNTIME=$HOME/redhat/systemtap/runtime SYSTEMTAP_TAPSET=$HOME/redhat/systemtap/tapset $HOME/build/systemtap/stap -r `uname -r` -e 'probe kernel.function("tracehook_report_clone") {}' 
> 	WARNING: side-effect-free probe 'probe_1383': keyword at <input>:1:1
> 	/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	-bash-3.2$ sudo $HOME/build/systemtap/staprun -v /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	/usr/local/libexec/systemtap/stapio: No such file or directory
> 	-bash-3.2$ 
> 
> Crispy Fried Jeebus.  Man, I could have had such a nice lunch in the time
> it took to get this far.  Ok, we've been through this drill.  Almost there!
> Stay on target!  Damn, let's skip a step and go straight to three -v's!
> 
> 	bash-3.2$ sudo $HOME/build/systemtap/staprun -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	staprun:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	staprun:main:249 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
> 	staprun:init_staprun:207 init_staprun
> 	staprun:insert_module:47 inserting module
> 	staprun:insert_module:66 module options: _stp_bufsize=0
> 	Error inserting module '/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko': File exists
> 
> Wait, what?  Does that mean what I think it means?
> 
> 	-bash-3.2$ /sbin/lsmod
> 	Module                  Size  Used by
> 	stap_32273b197bad7e53129d892d463a65cd_351   706120  0 
> 	-bash-3.2$ 
> 
> What a low-rent cheeseball freshman maneuver.  Too bad I don't drink, or
> some kernel hackers would get an earful of fresh rumors about the mental
> acuity of stap hackers at the bar tonight!  Ok, staprun, let me get a baby
> wipe there, I'm feeling parental.
> 
> 	-bash-3.2$ sudo /sbin/rmmod stap_32273b197bad7e53129d892d463a65cd_351
> 	-bash-3.2$ sudo $HOME/build/systemtap/staprun -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	staprun:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	staprun:main:249 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
> 	staprun:init_staprun:207 init_staprun
> 	staprun:insert_module:47 inserting module
> 	staprun:insert_module:66 module options: _stp_bufsize=0
> 	staprun:init_ctl_channel:30 Opening /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/.cmd
> 	execing: /usr/local/libexec/systemtap/stapio -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko 
> 	/usr/local/libexec/systemtap/stapio: No such file or directory
> 	-bash-3.2$ 
> 
> Hey, this time the failure to clean up the module after a failed run is
> handy; we can pretend it's a feature!
> 
> 	-bash-3.2$ sudo $HOME/build/systemtap/stapio -vvv /home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	stapio:parse_modpath:170 inpath=/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko
> 	stapio:main:37 modpath="/home/roland/.systemtap/cache/32/stap_32273b197bad7e53129d892d463a65cd_351.ko", modname="stap_32273b197bad7e53129d892d463a65cd_351"
> 	stapio:init_stapio:224 init_stapio
> 	stapio:init_ctl_channel:30 Opening /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/.cmd
> 	stapio:stp_main_loop:313 in main loop
> 	stapio:stp_main_loop:320 nb=4
> 	stapio:init_relayfs:124 initializing relayfs
> 	stapio:init_relayfs:148 attempting to open /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/trace0
> 	stapio:init_relayfs:148 attempting to open /sys/kernel/debug/systemtap/stap_32273b197bad7e53129d892d463a65cd_351/trace1
> 	stapio:init_relayfs:154 ncpus=1, bulkmode = 0
> 	stapio:init_relayfs:204 starting threads
> 	stapio:stp_main_loop:320 nb=12
> 	stapio:stp_main_loop:360 probe_start() returned 0
> 
> Holy crap!  I think it's working!
> 
> 	[hit C-c]
> 	^Cstapio:signal_thread:36 sigproc 2 (Interrupt)
> 	stapio:stp_main_loop:320 nb=4
> 	stapio:stp_main_loop:353 got STP_EXIT
> 	stapio:cleanup_and_exit:271 detach=0
> 	stapio:close_relayfs:221 closing
> 	stapio:reader_thread:108 exiting thread 0
> 	stapio:close_relayfs:240 done
> 	stapio:cleanup_and_exit:284 closing control channel
> 	stapio:cleanup_and_exit:290 removing stap_32273b197bad7e53129d892d463a65cd_351
> 	stap_32273b197bad7e53129d892d463a65cd_351: No such file or directory
> 
> Huh?  What does that mean?
> 
> 	-bash-3.2$ /sbin/lsmod
> 	Module                  Size  Used by
> 	stap_32273b197bad7e53129d892d463a65cd_351   706120  0 
> 	-bash-3.2$ 
> 
> Oh, I guess that's not actually what it meant (who the hell knows).
> I guess that rmmod is what staprun would have done after if it had
> been able to run stapio.  So this is normal.  I think it really worked.
> Alrighty, then.
> 
> I wonder if I want to repeat all that with a script that actually does
> something.  Nah, I'm starving and in the meantime I rebuilt a kernel
> where kgdb works a little more.  I think after lunch I'll try that instead.
> 
> 
> Thanks,
> Roland


* Re: nits to please a kernel hacker
  2008-09-29 23:02 nits to please a kernel hacker Roland McGrath
  2008-09-30 11:28 ` Srikar Dronamraju
@ 2008-09-30 13:55 ` Theodore Tso
  2008-09-30 14:11   ` Frank Ch. Eigler
  1 sibling, 1 reply; 8+ messages in thread
From: Theodore Tso @ 2008-09-30 13:55 UTC (permalink / raw)
  To: Roland McGrath; +Cc: systemtap

On Mon, Sep 29, 2008 at 04:01:07PM -0700, Roland McGrath wrote:
> 	sh: /usr/local/bin/staprun: No such file or directory
> 	Pass 5: run completed in 0usr/10sys/11real ms.
> 	Pass 5: run failed.  Try again with more '-v' (verbose) options.
> 	Running rm -rf /tmp/stapXpCOU6
> 	-bash-3.2$ 
> 
> Ok, there it is finally.  Two -v's just to see the command line it tried to run?
> Really?  Even though you deserved it for that smarmy damn error message,
> I'm feeling a little bad about the tire-iron.  So, I'm working with ya!

Note that the "run stap, 'try again with more -v', rinse, repeat" loop is
really annoying not just to kernel developers, but also to system
administrators.  (Especially ones that were dragged kicking and
screaming to Linux and are longing for DTrace.)

Stupid question: is there a reason that messages from stderr are
filtered out unless a sufficiently high -v is given?  I can see
wanting to filter out warning messages, but are there normally lots of
error messages sent out to stderr that need to be filtered?

Here's a potentially stupid suggestion --- why not pipe stdout and
stderr into a perl script which is responsible for doing the filtering?
With a perl script it might be possible to do more intelligent
filtering based on regexps, and then translate some very
unintelligible error messages from the compiler, linker, or whatever
into more user-friendly error messages.
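A rough sketch of such a filter, written as a POSIX shell function rather than Perl to match the shell used elsewhere in this thread. The function name stap_explain and the hint wording are hypothetical; the matched strings are error lines that actually appear in Roland's transcript. Each input line passes through unchanged, and a plain-English hint is appended when a known pattern matches.

```sh
#!/bin/sh
# Hypothetical stderr filter: echo every line, then add a hint for known errors.
stap_explain() {
    # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
        case $line in
            *"runtime.h: No such file or directory"*)
                echo "hint: systemtap runtime not found; SYSTEMTAP_RUNTIME must point at the source tree's runtime/ directory" ;;
            *"staprun: No such file or directory"*|*"stapio: No such file or directory"*)
                echo "hint: staprun/stapio not found at the path chosen at configure time; run 'make install' (with a suitable --prefix) first" ;;
            *"Error inserting module"*"File exists"*)
                echo "hint: a module with this name is already loaded (see lsmod); remove it with rmmod and retry" ;;
        esac
    done
}
```

It would be used as `stap -v -e '...' 2>&1 | stap_explain`. Keeping the patterns in a flat case table is the point of Ted's suggestion: adding a translation needs no C++.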

						- Ted


* Re: nits to please a kernel hacker
  2008-09-30 13:55 ` Theodore Tso
@ 2008-09-30 14:11   ` Frank Ch. Eigler
  2008-09-30 16:24     ` Theodore Tso
  0 siblings, 1 reply; 8+ messages in thread
From: Frank Ch. Eigler @ 2008-09-30 14:11 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, systemtap

Theodore Tso <tytso@mit.edu> writes:

> [...]
>> Ok, there it is finally.  Two -v's just to see the command line it tried to run?
>> Really?  Even though you deserved it for that smarmy damn error message,
>> I'm feeling a little bad about the tire-iron.  So, I'm working with ya!
>
> [...]
> Stupid question: is there a reason that messages from stderr are
> filtered out unless a sufficiently high -v is given?  I can see
> wanting to filter out warning messages, but are there normally lots of
> error messages sent out to stderr that need to be filtered?

That's a good point; actual error messages should not be filtered by
the verbosity logic.  Rather, verbosity values should produce more
informative text about what preceded the error.
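A minimal sketch of that split, assuming a shell-level driver with a numeric VERBOSE level standing in for repeated -v flags (VERBOSE, log_verbose and run_logged are hypothetical names, not stap internals): the "Running ..." trace stays behind -vv, as in the transcript above, but a failing command is always named, whatever the verbosity.

```sh
#!/bin/sh
# Hypothetical sketch: VERBOSE plays the role of the number of -v flags given.
VERBOSE=${VERBOSE:-0}

# log_verbose LEVEL MESSAGE...: print MESSAGE to stderr only at verbosity >= LEVEL.
log_verbose() {
    level=$1
    shift
    if [ "$VERBOSE" -ge "$level" ]; then
        printf '%s\n' "$*" >&2
    fi
}

# run_logged CMD [ARGS...]: trace the command at -vv, but always report a failure.
run_logged() {
    log_verbose 2 "Running $*"
    "$@" || {
        status=$?
        # Errors are never gated on verbosity: name the failing command.
        printf 'ERROR: command failed (exit %d): %s\n' "$status" "$*" >&2
        return "$status"
    }
}
```

With this shape, a missing staprun would be reported together with the exact command line at the default verbosity, and extra -v flags would only add context about what led up to it.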

(By the way, many of Roland's frustrations could've been avoided
had he done it the README way and run a "make install".)


- FChE


* Re: nits to please a kernel hacker
  2008-09-30 14:11   ` Frank Ch. Eigler
@ 2008-09-30 16:24     ` Theodore Tso
  2008-09-30 17:09       ` Frank Ch. Eigler
  0 siblings, 1 reply; 8+ messages in thread
From: Theodore Tso @ 2008-09-30 16:24 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Roland McGrath, systemtap

On Tue, Sep 30, 2008 at 10:09:33AM -0400, Frank Ch. Eigler wrote:
> 
> That's a good point; actual error messages should not be filtered by
> the verbosity logic.  Rather, verbosity values should produce more
> informative text about what preceded the error.
> 

I really think the verbosity approach is not the right one; in the
long run, if the goal is for non-source-code-reading sysadmins to use
this tool, systemtap really needs to consider some kind of error
message translation scheme, so that the weird error messages that get
emitted (currently only at a sufficiently high verbosity level) because
the optimizer has made the DWARF information useless get translated
into something which a non-tools, non-systemtap expert can actually
understand.  Getting rid of the verbosity level for error messages is
just the first step; it would still be a good idea to trap common
error messages and translate them into English, such as ("attempt to
set a tracepoint where the C compiler's optimizer has optimized out
the code", or "attempt to compile the systemtap runtime failed;
probably a failed systemtap installation?").

> (By the way, many of Roland's frustrations could've been avoided
> had he done it the README way and run a "make install".)

Maybe it would be better if the README had some explicit words ---
"don't even bother trying to run it out of the build tree; systemtap
has to be run in an installed configuration"?

Yes, some people might be annoyed about needing to install it in
/usr/local.  (Although if you provide and document a "make uninstall",
that can satisfy a lot of the complaints).  But it seems pretty clear
that it is ***so*** annoying to run systemtap out of the build tree
that it might not be worth it.  Sure, you can provide shell scripts
that set all of the magic environment scripts, and that might not be a
bad thing to do.  But the problem is that it makes Systemtap see very
clunky.   

Another approach might be to add logic which tries to detect if stap
was run in the build tree, and then automatically sets all of the
defaults to find the runtime, et al., out of the build tree.  The
question is whether the additional complexity is worth it.
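One way such detection could look, sketched under the assumption that the build directory holds an automake-generated Makefile, which records the source directory in a "srcdir = ..." line. The function name stap_detect_srcdir is hypothetical; it prints the source directory when runtime/ and tapset/ are found there and fails otherwise, so an installed stap would simply fall back to its normal defaults.

```sh
#!/bin/sh
# Hypothetical sketch: stap_detect_srcdir BUILDDIR
# Prints the source dir if BUILDDIR looks like an uninstalled build tree.
stap_detect_srcdir() {
    builddir=$1
    [ -f "$builddir/Makefile" ] || return 1
    # Automake-generated Makefiles carry a line like "srcdir = ../src".
    srcdir=$(sed -n 's/^srcdir *= *//p' "$builddir/Makefile" | head -n 1)
    [ -n "$srcdir" ] || return 1
    case $srcdir in
        /*) ;;                          # absolute path: use as-is
        *) srcdir=$builddir/$srcdir ;;  # relative paths are relative to the build dir
    esac
    # Only claim success if the runtime and tapset directories really exist.
    [ -d "$srcdir/runtime" ] && [ -d "$srcdir/tapset" ] || return 1
    printf '%s\n' "$srcdir"
}
```

A front end could call this on the directory containing its own binary and, on success, default SYSTEMTAP_RUNTIME and SYSTEMTAP_TAPSET to the runtime/ and tapset/ subdirectories it found; the check itself is small, which bears on the complexity question.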

Regards,

						- Ted


* Re: nits to please a kernel hacker
  2008-09-30 16:24     ` Theodore Tso
@ 2008-09-30 17:09       ` Frank Ch. Eigler
  2008-09-30 17:33         ` Theodore Tso
  0 siblings, 1 reply; 8+ messages in thread
From: Frank Ch. Eigler @ 2008-09-30 17:09 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, systemtap

Hi -

On Tue, Sep 30, 2008 at 12:23:41PM -0400, Theodore Tso wrote:
> [...]
> systemtap really needs to consider using some kind of error
> message translation scheme so that the weird error messages [...]
> [get] translated into something which a non-tools, non-systemtap 
> expert can actually understand.  [...]

That's an error message quality issue, and point taken.


> > (By the way, many of Roland's frustrations could've been avoided
> > had he done it the README way and run a "make install".)
> 
> Maybe it would be better if the README had some explicit words ---
> "don't even bother trying to run it out of the build tree; systemtap
> has to be run in an installed configuration"?
> 
> Yes, some people might be annoyed about needing to install it in
> /usr/local.  [...]

/usr/local is simply the FSF-mandated autoconf default.  Just like any
other such tool, one can and generally should designate an arbitrary
installation directory with --prefix=DIR.

- FChE


* Re: nits to please a kernel hacker
  2008-09-30 17:09       ` Frank Ch. Eigler
@ 2008-09-30 17:33         ` Theodore Tso
  2008-09-30 18:57           ` Frank Ch. Eigler
  0 siblings, 1 reply; 8+ messages in thread
From: Theodore Tso @ 2008-09-30 17:33 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Roland McGrath, systemtap

On Tue, Sep 30, 2008 at 01:07:34PM -0400, Frank Ch. Eigler wrote:
> 
> That's an error message quality issue, and point taken.
> 

The reason why I keep pushing the perl filtering scheme is that it
might expand the number of people who can help out with error message
quality.  Not everyone is comfortable hacking C++....  (or wants to
have anything to do with it).  So you might have more people being
able to help out with that particular usability issue if it is
implemented in a more approachable language.  Just a thought.

> > Yes, some people might be annoyed about needing to install it in
> > /usr/local.  [...]
> 
> /usr/local is simply the FSF-mandated autoconf default.  Just like any
> other such tool, one can and generally should designate an arbitrary
> installation directory with --prefix=DIR.

I suspect most users will probably install it in either /usr/local or
/usr/bin, and hence either not specify --prefix, or use --prefix=/usr,
since otherwise they would need to set up extra directories to put in
their PATH.

Doing something like --prefix=/opt/systemtap is more convenient from
the point of view of uninstalling a package, but most people I know
don't bother to do that.

						- Ted


* Re: nits to please a kernel hacker
  2008-09-30 17:33         ` Theodore Tso
@ 2008-09-30 18:57           ` Frank Ch. Eigler
  0 siblings, 0 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2008-09-30 18:57 UTC (permalink / raw)
  To: Theodore Tso; +Cc: systemtap

Hi -

On Tue, Sep 30, 2008 at 01:33:03PM -0400, Theodore Tso wrote:
> [...]
> The reason why I keep pushing the perl filtering scheme is that it
> might expand the number of people who can help out with error message
> quality.  Not everyone is comfortable hacking C++....  [...]

Heh.  At some point we'll have gettext-internationalized messages, so
perhaps we can have c++-phobes submit english-to-english
translations. :-)


> > /usr/local is simply the FSF-mandated autoconf default.  Just like any
> > other such tool, one can and generally should designate an arbitrary
> > installation directory with --prefix=DIR.
> 
> I suspect most users will probably install it in either /usr/local or
> /usr/bin [...]

I added a blurb to README to provide more guidance in this matter.


- FChE


end of thread

Thread overview: 8+ messages
2008-09-29 23:02 nits to please a kernel hacker Roland McGrath
2008-09-30 11:28 ` Srikar Dronamraju
2008-09-30 13:55 ` Theodore Tso
2008-09-30 14:11   ` Frank Ch. Eigler
2008-09-30 16:24     ` Theodore Tso
2008-09-30 17:09       ` Frank Ch. Eigler
2008-09-30 17:33         ` Theodore Tso
2008-09-30 18:57           ` Frank Ch. Eigler
