public inbox for systemtap@sourceware.org
* [Bug translator/16676] New: Inconsistently-biased addresses for ET_EXEC
@ 2014-03-08  0:12 jistone at redhat dot com
  2014-03-08  0:35 ` [Bug translator/16676] " jistone at redhat dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: jistone at redhat dot com @ 2014-03-08  0:12 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=16676

            Bug ID: 16676
           Summary: Inconsistently-biased addresses for ET_EXEC
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: translator
          Assignee: systemtap at sourceware dot org
          Reporter: jistone at redhat dot com

The build-id address we report for ET_EXEC binaries is biased inconsistently,
and the trigger seems to be missing debuginfo.  Prelink is another common
culprit in bias issues, so I ran "prelink -u" on these binaries first to rule
it out.


For my first point of comparison, consider ET_DYN /usr/bin/stap, whether or not
I have systemtap-debuginfo installed:

  $ stap -e 'probe process.plt {next}' -c /usr/bin/stap \
    -p3 -vv --poison-cache |& grep build-id
  Found build-id in /usr/bin/stap, length 20, start at 0x284

For relocatable binaries, it's reasonable that we'd get a plain file offset.


Now ET_EXEC /usr/local/bin/stap, with debuginfo baked in:

  $ stap -e 'probe process.plt {next}' -c /usr/local/bin/stap \
    -p3 -vv --poison-cache |& grep build-id
  Found build-id in /usr/local/bin/stap, length 20, start at 0x400284

It's not relocatable, and now we have an absolute address, ok.


Now ET_EXEC /usr/bin/ls without coreutils-debuginfo:

  $ stap -e 'probe process.plt {next}' -c /usr/bin/ls \
    -p3 -vv --poison-cache |& grep build-id
  Found build-id in /usr/bin/ls, length 20, start at 0x284

So that's inconsistent -- not relocatable, but it's a file offset.


Now ET_EXEC /usr/bin/ls *with* coreutils-debuginfo:

  $ stap -e 'probe process.plt {next}' -c /usr/bin/ls \
    -p3 -vv --poison-cache |& grep build-id
  Found build-id in /usr/bin/ls, length 20, start at 0x400284

We got the absolute address back!


That "Found build-id" line comes from dump_build_id() in translate.cxx.  In a
debugger I can see that dwfl_module_build_id() returns 0x400284 either way, but
when debuginfo is missing, the following dwfl_module_relocate_address() call
strips the absolute bias.
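
To make the arithmetic concrete, here is a minimal sketch in C.  It is a toy
model, not libdwfl code: the names toy_type and toy_relocate are hypothetical,
and the load base of 0x400000 is an assumption taken from the mod->low_addr
value reported in comment #2.  It only illustrates how the same build-id
address can come out absolute or relative depending on how the module is
classified.

```c
#include <stdint.h>

/* Toy stand-ins for the two module kinds discussed in this thread. */
enum toy_type { TOY_ET_EXEC, TOY_ET_DYN };

/* Hypothetical helper modelling address relocation:
   a module treated as ET_DYN gets addresses relative to its load base,
   while a module treated as ET_EXEC keeps absolute addresses. */
static uint64_t toy_relocate(enum toy_type type, uint64_t low_addr,
                             uint64_t addr)
{
    if (type == TOY_ET_DYN)
        return addr - low_addr;   /* relative to module start */
    return addr;                  /* already absolute, left alone */
}
```

With these assumptions, toy_relocate(TOY_ET_EXEC, 0x400000, 0x400284) stays
0x400284, while toy_relocate(TOY_ET_DYN, 0x400000, 0x400284) yields 0x284,
matching the two values seen for /usr/bin/ls above.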

-- 
You are receiving this mail because:
You are the assignee for the bug.


* [Bug translator/16676] Inconsistently-biased addresses for ET_EXEC
  2014-03-08  0:12 [Bug translator/16676] New: Inconsistently-biased addresses for ET_EXEC jistone at redhat dot com
@ 2014-03-08  0:35 ` jistone at redhat dot com
  2014-03-08  1:12 ` jistone at redhat dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: jistone at redhat dot com @ 2014-03-08  0:35 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=16676

--- Comment #1 from Josh Stone <jistone at redhat dot com> ---
Here's another example, probing function("_start") because that will resolve
from the symbol table either way.  You can see this with "main" too, but it
will be resolving from debuginfo when available, so it's a very different path.

With coreutils-debuginfo:

$ stap -e 'probe process.function("_start") {next}' -c /usr/bin/ls -p2
# probes
process("/usr/bin/ls").function("_start") /* pc=.absolute+0x4e3c */ /* <- process("/usr/bin/ls").function("_start") */

Without coreutils-debuginfo:

$ stap -e 'probe process.function("_start") {next}' -c /usr/bin/ls -p2
# probes
process("/usr/bin/ls").function("_start") /* pc=.dynamic+0x4e3c */ /* <- process("/usr/bin/ls").function("_start") */


This explains why we aren't bitten by the buildid more often.  For
inode-uprobes, we always ultimately use a file-offset "address", but
.absolute/.dynamic affects how we get task_finder callbacks.  For .absolute, we
use a process callback and fake a 0 "relocation", so having an absolute
build-id address from there works fine.  For .dynamic, we use an mmap callback
where we know the relocation, so having a relative build-id address also works.

But process.plt is always giving me .absolute, which fails if the build-id
address was relative.  Unless it happens to follow a function probe, then it
will become .dynamic too. :/  So maybe process.plt just needs to trigger
something in dwfl to make it always follow suit?
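
The pairing described above can be sketched in C as follows.  This is a toy
model only, not systemtap runtime code: toy_reloc and toy_runtime_addr are
hypothetical names, and the mapping base of 0x400000 used in the note below is
an assumption.  It shows why an absolute address needs a zero relocation and a
relative address needs the real mapping base.

```c
#include <stdint.h>

/* How a recorded address is interpreted at run time (toy model). */
enum toy_reloc { TOY_ABSOLUTE, TOY_DYNAMIC };

/* Hypothetical helper: the address the runtime would end up checking.
   .absolute: process callback, relocation faked as 0.
   .dynamic:  mmap callback, relocation is the real mapping base. */
static uint64_t toy_runtime_addr(enum toy_reloc kind, uint64_t mmap_base,
                                 uint64_t recorded)
{
    uint64_t relocation = (kind == TOY_ABSOLUTE) ? 0 : mmap_base;
    return relocation + recorded;
}
```

In this model, .absolute with 0x400284 and .dynamic with 0x284 both resolve to
0x400284.  The mixed case, .absolute with the relative 0x284, resolves to
0x284, which is presumably why the build-id check fails for process.plt.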


(Honestly, I'd rather get rid of the ".absolute" concept, and convert
everything to ".dynamic" with relative addresses, but that may be more
invasive.)


* [Bug translator/16676] Inconsistently-biased addresses for ET_EXEC
  2014-03-08  0:12 [Bug translator/16676] New: Inconsistently-biased addresses for ET_EXEC jistone at redhat dot com
  2014-03-08  0:35 ` [Bug translator/16676] " jistone at redhat dot com
@ 2014-03-08  1:12 ` jistone at redhat dot com
  2014-03-11 23:30 ` mjw at redhat dot com
  2014-03-12 18:47 ` jistone at redhat dot com
  3 siblings, 0 replies; 5+ messages in thread
From: jistone at redhat dot com @ 2014-03-08  1:12 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=16676

--- Comment #2 from Josh Stone <jistone at redhat dot com> ---
Consider:

$ ./stap -e 'probe process.plt("strstr"),process.function("_start") {next}' \
    -c /usr/bin/ls --poison-cache -p2
# probes
process("/usr/bin/ls").statement(0x402c00) /* pc=.absolute+0x2c00 */ /* <- process("/usr/bin/ls").plt("strstr").statement(0x402c00) */
process("/usr/bin/ls").function("_start") /* pc=.dynamic+0x4e3c */ /* <- process("/usr/bin/ls").plt("strstr"),process("/usr/bin/ls").function("_start") */

In one run, we changed our mind from .absolute to .dynamic!?!

We make this decision in dwflpp::relocate_address, which looks at
dwfl_module_relocations().  That function will return 0 if mod->e_type is
ET_EXEC, or 1 if mod->e_type is ET_DYN.  And sure enough, the e_type is
changing in the middle of this run.  A hardware watchpoint tells me where:

libdwfl/dwfl_module_getdwarf.c 
 134│   mod->e_type = ehdr->e_type;
 135│
 136│   /* Relocatable Linux kernels are ET_EXEC but act like ET_DYN.  */
 137│   if (mod->e_type == ET_EXEC && file->vaddr != mod->low_addr)
 138├>    mod->e_type = ET_DYN;

(gdb) bt
#0  0x000000370481dc4b in open_elf (file=file@entry=0x2185be0, mod=<optimized out>, mod=<optimized out>) at dwfl_module_getdwarf.c:138
#1  0x000000370481e4b1 in find_aux_sym (aux_strshndx=<synthetic pointer>, aux_xndxscn=<synthetic pointer>, aux_symscn=<synthetic pointer>, mod=0x2185b60) at dwfl_module_getdwarf.c:907
#2  find_symtab (mod=mod@entry=0x2185b60) at dwfl_module_getdwarf.c:1022
#3  0x000000370481ee8e in dwfl_module_getsymtab (mod=0x2185b60) at dwfl_module_getdwarf.c:1259
#4  0x00000000004e4c24 in symbol_table::get_from_elf (this=0x2188fb0) at ../tapsets.cxx:7806

So when it opened the aux minisymtab (.gnu_debugdata), this triggered a kernel
heuristic that really should not apply to this case.  FWIW, file->vaddr =
0x400020, and mod->low_addr = 0x400000.
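
The flip can be reproduced with a small stand-alone model of the heuristic.
This is a sketch, not the elfutils source: toy_file, toy_module and
toy_open_elf are hypothetical names, the TOY_ET_* constants mirror the ET_EXEC
and ET_DYN values from elf.h, and the main file's vaddr of 0x400000 in the note
below is an assumption (only the aux file's 0x400020 and the module's 0x400000
are reported here).

```c
#include <stdint.h>

#define TOY_ET_EXEC 2   /* same value as ET_EXEC in elf.h */
#define TOY_ET_DYN  3   /* same value as ET_DYN in elf.h */

struct toy_file   { uint16_t e_type; uint64_t vaddr; };
struct toy_module { uint16_t e_type; uint64_t low_addr; };

/* Model of the pre-fix behavior: every opened file, main or not,
   overwrites the module's e_type and re-runs the kernel heuristic. */
static void toy_open_elf(struct toy_module *mod, const struct toy_file *file)
{
    mod->e_type = file->e_type;

    /* "Relocatable kernel" check: ET_EXEC whose vaddr differs from
       the module's load address is treated like ET_DYN. */
    if (mod->e_type == TOY_ET_EXEC && file->vaddr != mod->low_addr)
        mod->e_type = TOY_ET_DYN;
}
```

Calling toy_open_elf() first with a main file at vaddr 0x400000 leaves the
module as ET_EXEC; calling it again with an aux file at vaddr 0x400020 turns
the same module into ET_DYN, which is the mid-run change seen above.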


* [Bug translator/16676] Inconsistently-biased addresses for ET_EXEC
  2014-03-08  0:12 [Bug translator/16676] New: Inconsistently-biased addresses for ET_EXEC jistone at redhat dot com
  2014-03-08  0:35 ` [Bug translator/16676] " jistone at redhat dot com
  2014-03-08  1:12 ` jistone at redhat dot com
@ 2014-03-11 23:30 ` mjw at redhat dot com
  2014-03-12 18:47 ` jistone at redhat dot com
  3 siblings, 0 replies; 5+ messages in thread
From: mjw at redhat dot com @ 2014-03-11 23:30 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=16676

Mark Wielaard <mjw at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mjw at redhat dot com

--- Comment #3 from Mark Wielaard <mjw at redhat dot com> ---
Should be fixed by elfutils commit 65cefbd0793c0f9e90a326d7bebf0a47c93294ad
Author: Josh Stone <jistone@redhat.com>
Date:   Tue Mar 11 10:19:28 2014 -0700

    libdwfl: dwfl_module_getdwarf.c (open_elf) only (re)set mod->e_type once.

    As noted in https://sourceware.org/bugzilla/show_bug.cgi?id=16676#c2 for
    systemtap, the heuristic used by open_elf to set the kernel Dwfl_Module
    type to ET_DYN, even if the underlying ELF file e_type was set to
    ET_EXEC, could trigger erroneously for non-kernel/non-main (debug or
    aux) files.  Make sure we only set the e_type of the module once when
    processing the main file (when the phdrs can be trusted).
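
The idea in that commit message can be illustrated with a simplified model.
This is not the actual elfutils patch: toy_file, toy_module,
toy_open_elf_fixed and the is_main flag are all hypothetical, and the address
values in the note below are assumptions chosen for illustration.  It only
sketches "set the module e_type once, from the main file".

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ET_EXEC 2   /* same value as ET_EXEC in elf.h */
#define TOY_ET_DYN  3   /* same value as ET_DYN in elf.h */

struct toy_file   { uint16_t e_type; uint64_t vaddr; bool is_main; };
struct toy_module { uint16_t e_type; uint64_t low_addr; };

/* Model of the fixed behavior: only the main file may set the module
   e_type, so debug or aux files can no longer trip the heuristic. */
static void toy_open_elf_fixed(struct toy_module *mod,
                               const struct toy_file *file)
{
    if (!file->is_main)
        return;   /* debug/aux file: leave the module type untouched */

    mod->e_type = file->e_type;

    /* The kernel heuristic still applies, but only to the main file. */
    if (mod->e_type == TOY_ET_EXEC && file->vaddr != mod->low_addr)
        mod->e_type = TOY_ET_DYN;
}
```

In this model an ET_EXEC module at 0x400000 stays ET_EXEC after an aux file at
vaddr 0x400020 is opened, while a main file whose vaddr differs from the
module's low_addr (the relocatable-kernel case) is still treated as ET_DYN.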


* [Bug translator/16676] Inconsistently-biased addresses for ET_EXEC
  2014-03-08  0:12 [Bug translator/16676] New: Inconsistently-biased addresses for ET_EXEC jistone at redhat dot com
                   ` (2 preceding siblings ...)
  2014-03-11 23:30 ` mjw at redhat dot com
@ 2014-03-12 18:47 ` jistone at redhat dot com
  3 siblings, 0 replies; 5+ messages in thread
From: jistone at redhat dot com @ 2014-03-12 18:47 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=16676

Josh Stone <jistone at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Josh Stone <jistone at redhat dot com> ---
I confirmed on elfutils-0.158-2.fc21, ET_EXEC stays ".absolute" in all cases.
