public inbox for systemtap@sourceware.org
* Troubles with debug info, using systemtap on debian.
@ 2009-11-10  1:18 James Y Knight
  2009-11-10  9:35 ` Eugeniy Meshcheryakov
  0 siblings, 1 reply; 8+ messages in thread
From: James Y Knight @ 2009-11-10  1:18 UTC (permalink / raw)
  To: systemtap

I built my own 2.6.31 kernel with:
make-kpkg --initrd --revision 1 --append-to-version -jknight-1-amd64 kernel_image kernel_headers kernel_debug

I have kernel-package version 12.025.

And I installed all 3 debs that it created.

Therefore, I had directories on my filesystem that look like this:
Original compile directory: /usr/src/linux-source-2.6.31
Kernel mods installed in: /lib/modules/2.6.31-jknight-1-amd64/
Debug data installed in: /usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/

/lib/modules/2.6.31-jknight-1-amd64/build got created as a symlink  
to: /usr/src/linux-source-2.6.31


Systemtap was working fine, for symbols in vmlinux, but segfaulted  
when trying to probe modules. E.g., the simplest script segfaulted in  
the translator.

probe module("autofs4").function("autofs4_fill_super") {}


Failed with this backtrace:

#0  0x00002b0e7568e34f in memmove () from /lib/libc.so.6
#1  0x00002b0e749cdf7c in elf64_xlatetof (dest=0x7fff515977d0, src=0x7fff51597800,
     encode=<value optimized out>) at elf32_xlatetof.c:118
#2  0x00002b0e747aeb0e in relocate (offset=49, addend=0x7fff51597940, rtype=<value optimized out>,
     symndx=11) at relocate.c:436
#3  0x00002b0e747af238 in relocate_section (ehdr=<value optimized out>, shstrndx=<value optimized out>,
     reloc_symtab=<value optimized out>, scn=0x291d020, shdr=0x7fff515979d0, tscn=0x291cf68,
     debugscn=false, partial=true) at relocate.c:501
#4  0x00002b0e747af741 in __libdwfl_relocate_section (mod=0x2908f60, relocated=0x291cbb0,
     relocscn=0x291d020, tscn=0x291cf68, partial=<value optimized out>) at relocate.c:632
#5  0x00002b0e747b04a6 in dwfl_module_address_section (mod=0x2908f60, address=<value optimized out>,
     bias=0x7fff51597ed8) at derelocate.c:399
#6  0x000000000046d2f5 in dump_unwindsyms (m=0x2908f60, userdata=<value optimized out>,
     name=<value optimized out>, base=65536, arg=0x7fff51598330) at translate.cxx:4730
#7  0x00002b0e747b1677 in dwfl_getmodules (dwfl=0x28cb170, callback=0x46c560 <dump_unwindsyms>,
     arg=0x7fff51598330, offset=2) at dwfl_getmodules.c:103
#8  0x0000000000469f66 in emit_symbol_data (s=@0x7fff515990f0) at translate.cxx:4970
#9  0x000000000046c041 in translate_pass (s=@0x7fff515990f0) at translate.cxx:5273
#10 0x000000000041062f in main (argc=2, argv=0x7fff5159aeb8) at main.cxx:1231

Adding --ignore-vmlinux --ignore-dwarf didn't cause the crash to go  
away.

Eventually, I figured out that it was finding debug data from a  
strange location:

/lib/modules/2.6.31-jknight-1-amd64/build/debian/linux-image-2.6.31-jknight-1-amd64-dbg/usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/autofs4.ko

(I found that via, at that backtrace, "f 6; print *m").
Okay, I thought, that's odd. Let me just remove the "build" symlink,
so that hopefully it finds the debug data from the installed
kernel-debug package. Well, that failed, because the files there are
apparently expected to be called *.ko.debug, but I had a file called:
/usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/autofs4.ko
instead. So, I symlinked it to be called autofs4.ko.debug.
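
The renaming step just described can be scripted. What follows is a
minimal Python sketch, assuming a Debian-style debug tree in which the
debug files carry the plain module name. The function name
link_ko_debug is hypothetical, and, as the rest of this message shows,
making the files findable this way did not make the crash go away.

```python
import os


def link_ko_debug(debug_root):
    """Create a NAME.ko.debug symlink next to every NAME.ko under debug_root.

    Returns the sorted list of symlinks created. Entries that already
    exist under the .ko.debug name are left untouched.
    """
    created = []
    for dirpath, _dirnames, filenames in os.walk(debug_root):
        for name in filenames:
            if not name.endswith(".ko"):
                continue
            link = os.path.join(dirpath, name + ".debug")
            if os.path.lexists(link):
                continue
            # Relative target, so the link stays valid if the tree is moved.
            os.symlink(name, link)
            created.append(link)
    return sorted(created)
```

On the system described here this would be called as
link_ko_debug("/usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64"),
which normally requires root.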

Note that autofs4.ko there is the same file (same md5sum) as the one  
it found and crashed with above in /lib/modules/../build.

And, it still crashed. But, now, in a different place!!!

#0  0x00002b4884c2f34f in memmove () from /lib/libc.so.6
#1  0x00002b4883f6ef7c in elf64_xlatetof (dest=0x7fff49c3cff0, src=0x7fff49c3d020,
     encode=<value optimized out>) at elf32_xlatetof.c:118
#2  0x00002b4883d4fb0e in relocate (offset=47, addend=0x7fff49c3d160, rtype=<value optimized out>,
     symndx=179) at relocate.c:436
#3  0x00002b4883d50238 in relocate_section (ehdr=<value optimized out>, shstrndx=<value optimized out>,
     reloc_symtab=<value optimized out>, scn=0x2a160b0, shdr=0x7fff49c3d1f0, tscn=0x2a15ff8,
     debugscn=false, partial=true) at relocate.c:501
#4  0x00002b4883d50898 in __libdwfl_relocate (mod=0x2a511f0, debugfile=0x2a15db0,
     debug=<value optimized out>) at relocate.c:609
#5  0x00002b4883d539e8 in dwfl_module_getelf (mod=0x2a511f0, loadbase=0x7fff49c3d6e0)
     at dwfl_module_getelf.c:76
#6  0x000000000046cf79 in dump_unwindsyms (m=0x2a511f0, userdata=<value optimized out>,
     name=0x2b488f606b8f "autofs4_direct_root_inode_operations", base=65536, arg=0x7fff49c3db40)
     at translate.cxx:4475
#7  0x00002b4883d52677 in dwfl_getmodules (dwfl=0x19b9440, callback=0x46c560 <dump_unwindsyms>,
     arg=0x7fff49c3db40, offset=2) at dwfl_getmodules.c:103
#8  0x0000000000469f66 in emit_symbol_data (s=@0x7fff49c3e900) at translate.cxx:4970
#9  0x000000000046c041 in translate_pass (s=@0x7fff49c3e900) at translate.cxx:5273
#10 0x000000000041062f in main (argc=2, argv=0x7fff49c406c8) at main.cxx:1231

Eventually, after a bit of flailing, I decided to put the build symlink
back, but remove all the temporary packaging build directories:
rm -rf /usr/src/linux-source-2.6.31/debian/linux*

Now, stap found the debuginfo in:
/lib/modules/2.6.31-jknight-1-amd64/build/fs/autofs4/autofs4.ko

That is the file actually generated by the kernel build process,  
unmangled by debian packaging scripts. And, then it worked! Without  
segfaulting, hooray!


So, some questions, at the end of all this:
1) Surely --ignore-dwarf --ignore-vmlinux should've caused systemtap  
to not use libelf to find and parse the dwarf debug info?

2) Why did stap find the debug data at such a strange path in
/lib/modules/.../build/debian/.... Does it do something like traverse every
file, recursively, under the modules directory until it finds one it
likes? That's quite...odd. I noticed that even if I renamed "build" to
"build.foo", it *STILL* looked in there.

3) The debian kernel's debuginfo does "objcopy --only-keep-debug"...
That seems like it shouldn't cause systemtap to blow up, but
it does. I guess that's a known bug?

4) Why does it blow up *differently* depending on whether it found the  
file in /usr/lib/debug or /lib/modules?

5) Whose bug is it that systemtap doesn't look for
/usr/lib/debug/.../autofs4.ko, but only autofs4.ko.debug?
Apparently this is a difference between debian and Fedora. Fedora  
systems append .debug, Debian systems do not. My guess: debian should  
be patching their copy of elfutils to not append ".debug"? But maybe  
that's an upstream bug, and it should try both by default (or  
something). I dunno.
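
To make the naming difference in question 5 concrete, here is a small
Python sketch of the two conventions as they are described in this
message. It is not elfutils's actual lookup code; the function name
debug_candidates and its debug_root parameter are made up for
illustration.

```python
import posixpath


def debug_candidates(module_path, debug_root="/usr/lib/debug"):
    """Return the separate-debuginfo path for an installed module
    under each of the two naming conventions discussed above."""
    base = posixpath.join(debug_root, module_path.lstrip("/"))
    return {
        "fedora": base + ".debug",  # .../autofs4.ko.debug
        "debian": base,             # .../autofs4.ko, same name as the module
    }
```

For /lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/autofs4.ko
the "debian" entry is the file the kernel-debug package installed, and
the "fedora" entry is the name that was apparently being looked for.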

Someone else discovered the ".debug" issue in another program:
http://www.visophyte.org/rev_control/patches/chronicle-recorder/debian-usr-lib-debug-support.patch
And here's the debian reference about how to install debuginfo:
http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-dbg


I guess all these except the first are probably bugs in elfutils, not  
systemtap, so perhaps I should be reporting it there instead. But  
despite what you might think, I actually have no clue about any of  
this crap: any clue you might infer from the above has all been gained  
by random flailing over the course of the last couple hours. So I
figure it's safer to report here first, and redirect if requested. :)

James


* Re: Troubles with debug info, using systemtap on debian.
  2009-11-10  1:18 Troubles with debug info, using systemtap on debian James Y Knight
@ 2009-11-10  9:35 ` Eugeniy Meshcheryakov
  2009-11-10 10:19   ` Eugeniy Meshcheryakov
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Eugeniy Meshcheryakov @ 2009-11-10  9:35 UTC (permalink / raw)
  To: James Y Knight; +Cc: systemtap


Hello,

On 9 November 2009 at 20:18 -0500, James Y Knight wrote:
> Systemtap was working fine, for symbols in vmlinux, but segfaulted
> when trying to probe modules. E.g., the simplest script segfaulted
> in the translator.
> 
> probe module("autofs4").function("autofs4_fill_super") {}
I have never seen systemtap segfault. I do not have the autofs4 module, but I
tried with the snd module and it works. What version of
systemtap/libelf1/libdw1 do you use?

> debug package. Well, that failed, because the files there are
> apparently expected to be called: *.ko.debug, but I had a file
> called:
> /usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/
> autofs4.ko
> instead. So, I symlinked it to be called autofs4.ko.debug.
My bad, I rarely clean the build tree... Still I do not understand why
this .debug is needed... I'm going to ask kernel-package or elfutils
maintainers to change this...

> 3) The debian kernel's debuginfo does "objcopy --only-keep-
> debug"...That seems like it shouldn't cause systemtap to blow up,
> but it does. I guess that's a known bug?
No it is not. At least not for me.

> 4) Why does it blow up *differently* depending on whether it found
> the file in /usr/lib/debug or /lib/modules?
> 
> 5) Whose bug is it that systemtap doesn't look for
> /usr/lib/debug/.../autofs4.ko, but only autofs4.ko.debug?
> Apparently this is a difference between debian and Fedora. Fedora
> systems append .debug, Debian systems do not. My guess: debian
> should be patching their copy of elfutils to not append ".debug"?
> But maybe that's an upstream bug, and it should try both by default
> (or something).
> I dunno.
Me too.

> 
> Someone else discovered the ".debug" issue in another program:
> http://www.visophyte.org/rev_control/patches/chronicle-recorder/debian-usr-lib-debug-support.patch
> And here's the debian reference about how to install debuginfo:
> http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-dbg
If it is really Fedora's policy to append .debug to file names under
/usr/lib/debug (I did not know about that), then I guess Debian elfutils
should be modified to not append .debug to kernel module names to
comply with Debian policy. Or hopefully it can be done upstream.

> 
> 
> I guess all these except the first are probably bugs in elfutils,
> not systemtap, so perhaps I should be reporting it there instead.
> But despite what you might think, I actually have no clue about any
> of this crap: any clue you might infer from the above has all been
> gained by random flailing over the course of the last couple hours.
> So I figure it's safer to report here, first and redirect if
> requested. :)
> 
> James


* Re: Troubles with debug info, using systemtap on debian.
  2009-11-10  9:35 ` Eugeniy Meshcheryakov
@ 2009-11-10 10:19   ` Eugeniy Meshcheryakov
  2009-11-10 18:07   ` Frank Ch. Eigler
  2009-11-10 18:38   ` James Y Knight
  2 siblings, 0 replies; 8+ messages in thread
From: Eugeniy Meshcheryakov @ 2009-11-10 10:19 UTC (permalink / raw)
  To: James Y Knight; +Cc: systemtap


On 10 November 2009 at 10:32 +0100, Eugeniy Meshcheryakov wrote:
> > debug package. Well, that failed, because the files there are
> > apparently expected to be called: *.ko.debug, but I had a file
> > called:
> > /usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/
> > autofs4.ko
> > instead. So, I symlinked it to be called autofs4.ko.debug.
> My bad, I rarely clean the build tree... Still I do not understand why
> this .debug is needed... I'm going to ask kernel-package or elfutils
> maintainers to change this...
Bug report is here http://bugs.debian.org/555549



* Re: Troubles with debug info, using systemtap on debian.
  2009-11-10  9:35 ` Eugeniy Meshcheryakov
  2009-11-10 10:19   ` Eugeniy Meshcheryakov
@ 2009-11-10 18:07   ` Frank Ch. Eigler
  2009-11-10 18:24     ` James Y Knight
  2009-11-10 18:38   ` James Y Knight
  2 siblings, 1 reply; 8+ messages in thread
From: Frank Ch. Eigler @ 2009-11-10 18:07 UTC (permalink / raw)
  To: Eugeniy Meshcheryakov; +Cc: James Y Knight, systemtap

Eugeniy Meshcheryakov <eugen@debian.org> writes:

> [...]
>> 3) The debian kernel's debuginfo does "objcopy --only-keep-
>> debug"...That seems like it shouldn't cause systemtap to blow up,
>> but it does. I guess that's a known bug? [...]
>> 5) Whose bug is it that systemtap doesn't look for
>> /usr/lib/debug/.../autofs4.ko, but only autofs4.ko.debug?
>> [...]

The issue here is that elfutils looks for .ko.debug files, if the
original .ko was stripped of debug data.  The fedora naming convention
communicates the fact that the separated .ko.debug files are not
.ko's, in that they lack executable .text/.data/etc. payload.

Some distributions don't strip the debug data the same way as fedora,
but instead preserve the original unstripped binaries under
/usr/lib/debug or similar.  In this case, since the original files are
complete, it makes sense not to rename them "anything.debug", but OTOH
then elfutils must break the tie between that copy and an identically
named stripped one.

I believe Roland is aware of the issue, but hasn't indicated
how/whether he plans to handle all the permutations.  I'm sure he'd
welcome concrete suggestions/patches.

- FChE


* Re: Troubles with debug info, using systemtap on debian.
  2009-11-10 18:07   ` Frank Ch. Eigler
@ 2009-11-10 18:24     ` James Y Knight
  0 siblings, 0 replies; 8+ messages in thread
From: James Y Knight @ 2009-11-10 18:24 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Eugeniy Meshcheryakov, systemtap


On Nov 10, 2009, at 1:07 PM, Frank Ch. Eigler wrote:

> Eugeniy Meshcheryakov <eugen@debian.org> writes:
>
>> [...]
>>> 3) The debian kernel's debuginfo does "objcopy --only-keep-
>>> debug"...That seems like it shouldn't cause systemtap to blow up,
>>> but it does. I guess that's a known bug? [...]
>>> 5) Whose bug is it that systemtap doesn't look for
>>> /usr/lib/debug/.../autofs4.ko, but only autofs4.ko.debug?
>>> [...]
>
> The issue here is that elfutils looks for .ko.debug files, if the
> original .ko was stripped of debug data.  The fedora naming convention
> communicates the fact that the separated .ko.debug files are not
> .ko's, in that they lack executable .text/.data/etc. payload.
>
> Some distributions don't strip the debug data the same way as fedora,
> but instead preserve the original unstripped binaries under
> /usr/lib/debug or similar.  In this case, since the original files are
> complete, it makes sense not to rename them "anything.debug", but OTOH
> then elfutils must break the tie between that copy and an identically
> named stripped one.

Debian preserves only the debug info in /usr/lib/debug, with
objcopy --only-keep-debug. So, the files are generally not complete. But, they
are named "anything", not "anything.debug".

James


* Re: Troubles with debug info, using systemtap on debian.
  2009-11-10  9:35 ` Eugeniy Meshcheryakov
  2009-11-10 10:19   ` Eugeniy Meshcheryakov
  2009-11-10 18:07   ` Frank Ch. Eigler
@ 2009-11-10 18:38   ` James Y Knight
  2009-11-11  9:41     ` Eugeniy Meshcheryakov
  2 siblings, 1 reply; 8+ messages in thread
From: James Y Knight @ 2009-11-10 18:38 UTC (permalink / raw)
  To: Eugeniy Meshcheryakov; +Cc: systemtap

On Nov 10, 2009, at 4:32 AM, Eugeniy Meshcheryakov wrote:
>> probe module("autofs4").function("autofs4_fill_super") {}
> I never saw systemtap segfaulting. I do not have autofs4 module, but i
> tried with snd module and it works. What version of
> systemtap/libelf1/libdw1 do you use?

After seeing the segfaults, I compiled both systemtap and elfutils  
from latest git head, as of yesterday. Same segfaults. Before that, I  
was using:
systemtap 1.0-2
libelf/etc 0.143-1

The segfaults only occur if the installed module is stripped of debug  
data (which it usually is), and if the debug data itself is stripped  
of code. That is the case with the files debian's kernel-package  
installs into /usr/lib/debug/*.ko, but I'm led to believe is not the  
case with the files that Fedora installs into /usr/lib/debug/*.ko.debug.

Unless something *further* strange is going on in my environment,  
anyone should be able to reproduce by:
1) removing the symlink to your kernel build dir
(rm /lib/modules/$VERS/build): renaming is not enough, systemtap still finds it!
2) ensuring that the debug info in /usr/lib/debug is stripped of code  
with objcopy --only-keep-debug.
3) ensuring the kernel modules in /lib/modules/$VERS are stripped of  
debug info.
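
Steps 2 and 3 can be sketched as follows. This is a hedged Python
illustration that only builds the command lines; it assumes binutils'
objcopy is on the PATH, and split_commands is a hypothetical helper
name. --only-keep-debug and --strip-debug are the standard objcopy
options for these two operations.

```python
def split_commands(unstripped_ko, installed_ko, debug_ko):
    """Build argv lists that split one unstripped module into a
    debug-only file and a debug-stripped installed module."""
    return [
        # Step 2: the debuginfo file keeps only the debug sections, no code.
        ["objcopy", "--only-keep-debug", unstripped_ko, debug_ko],
        # Step 3: the installed module keeps the code but drops debug info.
        ["objcopy", "--strip-debug", unstripped_ko, installed_ko],
    ]
```

Each returned list can be passed to subprocess.run(argv, check=True).
The paths are left to the caller, since where the debug file has to
live and what it has to be called is exactly what this thread is about.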

I'd certainly be interested if people can't reproduce this, so I can  
look further to try to figure out what else might be a triggering  
factor...

>> 3) The debian kernel's debuginfo does "objcopy --only-keep-
>> debug"...That seems like it shouldn't cause systemtap to blow up,
>> but it does. I guess that's a known bug?
> No it is not. At least not for me.

When I was discussing this on IRC "przemoc86" mentioned that debuginfo  
stripped of code might not be currently supported. In any case, a  
segfault seems poor, whether or not it's supposed to be supported. :)

James


* Re: Troubles with debug info, using systemtap on debian.
  2009-11-10 18:38   ` James Y Knight
@ 2009-11-11  9:41     ` Eugeniy Meshcheryakov
  2009-11-11  9:50       ` Eugeniy Meshcheryakov
  0 siblings, 1 reply; 8+ messages in thread
From: Eugeniy Meshcheryakov @ 2009-11-11  9:41 UTC (permalink / raw)
  To: James Y Knight; +Cc: systemtap


Hello,

On 10 November 2009 at 13:38 -0500, James Y Knight wrote:
> Unless something *further* strange is going on in my environment,
> anyone should be able to reproduce by:
> 1) removing the symlink to your kernel build dir (rm /lib/modules/
> $VERS/build): renaming is not enough, systemtap still finds it!
> 2) ensuring that the debug info in /usr/lib/debug is stripped of
> code with objcopy --only-keep-debug.
> 3) ensuring the kernel modules in /lib/modules/$VERS are stripped of
> debug info.
This does not work at all. With DWARF probes, stap does not find the debug
info (and does not segfault). With dwarfless probes, stap still requires
the build directory (I guess to build the module?) and does not segfault.

I tried to probe module("snd").function("snd_open") and
kprobe.module("snd").function("snd_open").

Did you try to remove ~/.systemtap?


* Re: Troubles with debug info, using systemtap on debian.
  2009-11-11  9:41     ` Eugeniy Meshcheryakov
@ 2009-11-11  9:50       ` Eugeniy Meshcheryakov
  0 siblings, 0 replies; 8+ messages in thread
From: Eugeniy Meshcheryakov @ 2009-11-11  9:50 UTC (permalink / raw)
  To: James Y Knight; +Cc: systemtap


On 11 November 2009 at 10:41 +0100, Eugeniy Meshcheryakov wrote:
> Hello,
> 
> On 10 November 2009 at 13:38 -0500, James Y Knight wrote:
> > Unless something *further* strange is going on in my environment,
> > anyone should be able to reproduce by:
> > 1) removing the symlink to your kernel build dir (rm /lib/modules/
> > $VERS/build): renaming is not enough, systemtap still finds it!
> > 2) ensuring that the debug info in /usr/lib/debug is stripped of
> > code with objcopy --only-keep-debug.
> > 3) ensuring the kernel modules in /lib/modules/$VERS are stripped of
> > debug info.
> This does not work at all. With DWARF probes stap does not find debug
> info (and does not segfault). With dwarfless probes stap still requires
> build directory (I guess to build module?) and does not segfault.
> 
I tried to symlink snd.ko to snd.ko.debug and can reproduce the bug now.

