public inbox for systemtap@sourceware.org
* Linux AMI 2011.09.1.x86_64-ebs issue: kernel aki-825ea7eb build-id munging
@ 2011-10-27 20:56 Frank Ch. Eigler
  2011-10-27 21:25 ` Gafton, Cristian
  2011-10-27 22:56 ` Mark Wielaard
  0 siblings, 2 replies; 3+ messages in thread
From: Frank Ch. Eigler @ 2011-10-27 20:56 UTC (permalink / raw)
  To: gafton; +Cc: systemtap

Hi again, Cristian -

I'm back for some more EC2 systemtap testing.  You kindly fixed one packaging
problem with kernel-debuginfo back in August, if you recall.  It turns out we
have a new problem; this one related to build-ids.  systemtap uses ELF
build-id notes in order to verify version matching between the running kernel
and one whose ELF/DWARF files it's reading.  On the current default AMI
kernel (2.6.35.14-95.38.amzn1.x86_64), there is a mismatch.

One can see this by hex-dumping /sys/kernel/notes on a running instance,
and contrasting it with 
% readelf -x .notes /usr/lib/debug/lib/modules/`uname -r`/vmlinux
from the corresponding debuginfo.  The last bunch of bytes are supposed
to be identical.
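
For readers who would rather compare the ids programmatically than eyeball two hex dumps, here is a minimal sketch. It assumes the standard ELF note layout (three 32-bit words for name size, descriptor size, and type, followed by the name and descriptor, each padded to 4 bytes) and little-endian byte order as on x86_64. The helper names iter_notes and find_build_id are made up for this illustration and are not part of systemtap or elfutils.

```python
import struct
from typing import Iterator, Optional, Tuple

NT_GNU_BUILD_ID = 3  # note type used for GNU build-id notes


def iter_notes(buf: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (name, type, desc) for each ELF note in a little-endian buffer."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, off)
        off += 12
        name = buf[off:off + namesz].rstrip(b"\0").decode("ascii", "replace")
        off += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = buf[off:off + descsz]
        off += (descsz + 3) & ~3  # descriptor is padded the same way
        yield name, ntype, desc


def find_build_id(buf: bytes) -> Optional[str]:
    """Return the GNU build-id as a hex string, or None if no such note exists."""
    for name, ntype, desc in iter_notes(buf):
        if name == "GNU" and ntype == NT_GNU_BUILD_ID:
            return desc.hex()
    return None
```

On a running instance, one would feed it the raw contents of /sys/kernel/notes (opened in binary mode), and separately the raw bytes of the .notes section extracted from the debuginfo vmlinux; the two returned strings should then be equal.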

The build-id is getting corrupted at some point during the packaging process.
This precludes systemtap operation:

sudo stap -e 'probe kernel.function("sys_open"){}' -tv
Pass 1: parsed user script and 76 library script(s) using 96240virt/21920res/2788shr kb, in 130usr/20sys/151real ms.
Pass 2: analyzed script: 1 probe(s), 0 function(s), 0 embed(s), 0 global(s) using 196460virt/86980res/51868shr kb, in 270usr/140sys/414real ms.
Pass 3: translated to C into "/tmp/stapVRU8p1/stap_5789e459df56e64ee93d1b2d5fe74936_758.c" using 196460virt/87868res/52756shr kb, in 280usr/10sys/294real ms.
Pass 4: compiled C into "stap_5789e459df56e64ee93d1b2d5fe74936_758.ko" in 4380usr/1590sys/6490real ms.
Pass 5: starting run.
ERROR: Build-id mismatch: "kernel" vs. "vmlinux" byte 0 (0x7c vs 0x01) address 0xffffffff813218f4 rc 0
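
As an illustration of what the error line reports (the index of the first differing byte and the two byte values), here is a small sketch. It is not systemtap's actual runtime check, which is implemented in C inside the generated module; first_mismatch is a hypothetical helper written only for this example.

```python
from typing import Optional, Tuple


def first_mismatch(a: bytes, b: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (index, a_byte, b_byte) for the first difference, or None if equal."""
    if len(a) != len(b):
        raise ValueError("build-ids have different lengths")
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None
```

Given two ids that differ in their very first byte, as in the message above (0x7c vs 0x01), the function returns index 0 together with those two values.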

I seem to recall a kernel makefile (or perhaps elfutils) problem that
resulted in a problem like this before.  IIRC, it was some sort of problem
during the vmlinux debuginfo stripping stage.  Unfortunately, I can't find
a link to the fix of the actual problem.

cc:'ing our team to see if someone's memories can be jogged.


- FChE


* RE: Linux AMI 2011.09.1.x86_64-ebs issue: kernel aki-825ea7eb build-id munging
  2011-10-27 20:56 Linux AMI 2011.09.1.x86_64-ebs issue: kernel aki-825ea7eb build-id munging Frank Ch. Eigler
@ 2011-10-27 21:25 ` Gafton, Cristian
  2011-10-27 22:56 ` Mark Wielaard
  1 sibling, 0 replies; 3+ messages in thread
From: Gafton, Cristian @ 2011-10-27 21:25 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Thanks for the report, Frank. I will dig into it over the weekend and see if I can figure out where the build-ids are getting clobbered. If any additional information or memories come back to you, please let me know. Sorry that we still haven't gotten this fully right.

Cristian



* Re: Linux AMI 2011.09.1.x86_64-ebs issue: kernel aki-825ea7eb build-id munging
  2011-10-27 20:56 Linux AMI 2011.09.1.x86_64-ebs issue: kernel aki-825ea7eb build-id munging Frank Ch. Eigler
  2011-10-27 21:25 ` Gafton, Cristian
@ 2011-10-27 22:56 ` Mark Wielaard
  1 sibling, 0 replies; 3+ messages in thread
From: Mark Wielaard @ 2011-10-27 22:56 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: gafton, systemtap

On Thu, 2011-10-27 at 16:55 -0400, Frank Ch. Eigler wrote:
> On the current default AMI
> kernel (2.6.35.14-95.38.amzn1.x86_64), there is a mismatch.
> 
> One can see this by hex-dumping /sys/kernel/notes on a running instance,
> and contrasting it with 
> % readelf -x .notes /usr/lib/debug/lib/modules/`uname -r`/vmlinux
> from the corresponding debuginfo.  The last bunch of bytes are supposed
> to be identical.
> [...]
> I seem to recall a kernel makefile (or perhaps elfutils) problem that
> resulted in a problem like this before.  IIRC, it was some sort of problem
> during the vmlinux debuginfo stripping stage.  Unfortunately, I can't find
> a link to the fix of the actual problem.
> 
> cc:'ing our team to see if someone's memories can be jogged.

It doesn't immediately ring a bell [*].

Just to be sure: does /sys/kernel/notes match
readelf -x .notes /boot/vmlinuz-`uname -r` ?

Is there anything else about /boot/vmlinuz-`uname -r`
vs /usr/lib/debug/lib/modules/`uname -r`/vmlinux that might indicate a
mismatch? Or does everything look fine if you just hack out the stap
build-id safety-check?

Cheers,

Mark

[*] There was https://bugzilla.redhat.com/show_bug.cgi?id=590947
    "debugedit vs modsign changes build ID", but that should only
    impact kernel modules, not the vmlinuz image itself.

