public inbox for elfutils@sourceware.org
* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 17:06 Josh Stone
  0 siblings, 0 replies; 25+ messages in thread
From: Josh Stone @ 2015-08-20 17:06 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 397 bytes --]

On 08/20/2015 08:32 AM, Mark Wielaard wrote:
> See Dwarf 4 2.17 Code Addresses and Ranges. In particular elfutils takes
> advantage of:
> "If an entity has no associated machine code, none of these attributes
> are specified."

(A -> B) does not give you (B -> A)!

That is, the spec does *not* say "If none of these attributes are
specified, the entity has no associated machine code."


* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-25 12:23 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-25 12:23 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]

Hi Ben,

On Tue, 2015-08-25 at 12:59 +0200, Ben Gamari wrote:
> Right, we have a number of Darwin users and I still haven't figured out
> how we might support them. That being said I personally have little
> interest in putting time into proprietary platforms so I'm not terribly
> concerned.

Right. Our first goal is to nicely support GNU/Linux platforms. If there
is interest then we aren't against supporting other platforms. But
besides some small kfreebsd patches we haven't seen much interest.
Patches welcome.

> On this note, would you be willing to accept a patch adding
> dwfl_attach_local() functionality? My x86-64 implementation appears to
> work, although it could probably use a second set of eyes. I may also be
> able to provide ARM and i386 implementations.

Yes, that sounds like it might be interesting to other users too.
See https://git.fedorahosted.org/cgit/elfutils.git/plain/CONTRIBUTING
Just post proposed patches to the list and we'll take it from there.

Thanks,

Mark


* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-25 10:59 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-25 10:59 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 7161 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Sat, Aug 22, 2015 at 12:18:46PM +0200, Ben Gamari wrote:
>> 
>> I actually ask because while we don't produce or use the .debug_ranges
>> section, they sneak into our executables from the C runtime system
>> objects. However, whatever is emitting it seems to be doing a
>> questionable job,
>> 
>>     $ objdump --dwarf-check -e inplace/lib/bin/ghc-stage2 -x > /dev/null
>>     objdump: Warning: There is an overlap [0x780 - 0x750] in .debug_ranges section.
>
> I wonder why binutils objdump is really warning about that.
> In this case it looks like the pc ranges might indeed overlap, but that is
> somewhat expected. I don't think it is actually wrong. e.g. this one seems
> to be two nested lexical blocks, the outer block [750] obviously will
> overlap with the pc ranges of the inner block [780]:
>
>  [c1bdf9]            lexical_block
>                      ranges               (sec_offset) range list [   750]
>  [c1bdfe]              variable
>                        abstract_origin      (ref4) [c1bad1]
>                        location             (sec_offset) location list [  7269]
>                        [...]
>  [c1be46]              lexical_block
>                        ranges               (sec_offset) range list [   780]
>  [c1be4b]                variable
>                          abstract_origin      (ref4) [c1bb2e]
>                          location             (exprloc) 
>                           [   0] fbreg -68
>
Ahh, interesting. Right, I've been ignoring this for now but perhaps
I'll ask the binutils folks for some clarification just to make sure.
>
>> You'll be pleased to know that libdw has worked out quite well. For
>> instance, with my patch [1] GHC can produce a backtrace like,
>> 
...
>> 
>> There's still a fair amount of work left to integrate this fully, but at
>> least the tricky DWARF work is done.
>
> Very nice!
> Any idea why the libdw.so/dwfl_ functions don't have any info?
> Were they simply built without debuginfo?
>
Right, I believe in this case I was running against my distribution's
libdw, for which I have no debuginfo.

>> [1] https://phabricator.haskell.org/D1156
>
> Some quick answers to some of the questions there (I didn't read the full
> bug report or the patch, so please let me know if you have any specific
> questions):
>
Great! Thank you very much for taking the time to do this.

> - portability of elfutils.
>   It is ported across a lot of arches on GNU/Linux.
>   i386, x86_64, ppc, ppc64, ppc64le, s390x, arm and aarch64 are at least
>   regularly tested (should be zero fail at release time) and there are
>   other ports in the backends both in tree [alpha, ia64, sparc, tilegx]
>   and some not yet merged out of tree [mips, m68k, hppa].
>   In theory it should also work on other ELF/DWARF based systems like
>   *BSD, Debian has some limited success with kfreebsd, and Solaris. But
>   there are some tricky dependencies of some of the dwfl functions on the
>   /proc file system and ptrace, not all of them have clean backend/ebl
>   functions. Darwin/MacOS is a bit harder since it doesn't use ELF and
>   libdw currently depends on the DWARF container being ELF (and I have
>   no idea what the ptrace/proc story is on Darwin). Windows is probably
>   pretty hard given that it doesn't natively support ELF, DWARF or
>   ptrace/proc.
>
Right, we have a number of Darwin users and I still haven't figured out
how we might support them. That being said I personally have little
interest in putting time into proprietary platforms so I'm not terribly
concerned.

On this note, would you be willing to accept a patch adding
dwfl_attach_local() functionality? My x86-64 implementation appears to
work, although it could probably use a second set of eyes. I may also be
able to provide ARM and i386 implementations.

> - As I said before between libunwind/elfutils and libbacktrace I actually
>   would have expected libbacktrace to be the easiest for you to use
>   since it is actually designed for in-process unwinding. libunwind
>   tries to do both in- and out-of-process unwinding, which I think is a
>   little confusing, and has much less other functionality than elfutils
>   with respect to model process memory, libraries, ELF, DWARF and symbol
>   inspection. And elfutils really only tries to support out-of-process
>   unwinding (but you happily managed to make it do in-process anyway, so
>   maybe our design isn't so bad). Now that you got your DWARF/CFI correct
>   I would give libbacktrace another go.
>
This would be interesting to try. I've been trying to keep the design
open to supporting multiple unwinding backends, so it should be easy to
dust off my libbacktrace code and try it out.

> - To mark an end of stack you should set the CFI rule for the return
>   register to undefined. See 6.4.4 Call Frame Calling Address.
>   On x86_64 the return register is often just equal to rip and so
>   using .cfi_undefined rip (in gas assembler) would do the trick.
>   In general you can find fun and wonderful CFI describing interesting
>   register unwinding tricks in glibc internals (try start.S, clone.S and
>   __longjmp.S).
>
Right, this is definitely a useful hint. That being said, Haskell
code is typically called from C code. Ideally we'd be able to resume
unwinding the C stack, not simply terminate unwinding. I think this
should be possible but first I need to work out the RTS entry/exit
convention.

> - Yes, perf can use elfutils to do unwinding. It does this "after the fact".
>   It has an initial registers handler and memory read handler like you
>   probably made for the in-process Dwfl_Thread_Callbacks. But they use
>   the dumped register and partial stack dump they made during runtime
>   to do the actual unwinding. So this only works if the CFI in your
>   binary is complete and it is (mostly) expressed through the contents
>   of the initial register dump and the stack values (which is almost always
>   the case).
>
> - We could in theory try to clean up my hack to not need .debug_aranges
>   if we really want to. But I hope we don't now that you have it :)
>
This shouldn't be necessary. Thanks for the offer though!

> - Why have pc ranges both in the CU and in .debug_aranges (pointing to
>   the CUs)? Because they technically describe different things. The
>   ranges given in the CU are the covered program scope entries (code).
>   While .debug_aranges give the ranges of code and data object addresses
>   described by the CU (although I believe in practice it really is the
>   same and even .debug_aranges only has the code ranges). Secondly it
>   is really mildly more efficient since .debug_aranges is small and
>   compact and doesn't refer to other data sections (the CUs are all
>   spread out in the .debug_info section and can potentially point
>   into the .debug_ranges section when the CU uses DW_AT_low_pc plus
>   DW_AT_ranges to describe more complex pc ranges).
>
Ahhh, I see. Thanks!

Cheers,

- Ben



* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-23 21:57 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-23 21:57 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 6872 bytes --]

On Sat, Aug 22, 2015 at 12:18:46PM +0200, Ben Gamari wrote:
> Mark Wielaard <mjw@redhat.com> writes:
> 
> > On Fri, Aug 21, 2015 at 06:25:38PM +0200, Ben Gamari wrote:
> >> Does elfutils need .debug_ranges as well?
> >
> > Only if the DWARF producer created DIEs with DW_AT_ranges attributes.
> > You seem to never do that. If your range is just one simple one you can
> > just use a DW_AT_low_pc/high_pc pair and .debug_ranges will never be
> > used.
> >
> I actually ask because while we don't produce or use the .debug_ranges
> section, they sneak into our executables from the C runtime system
> objects. However, whatever is emitting it seems to be doing a
> questionable job,
> 
>     $ objdump --dwarf-check -e inplace/lib/bin/ghc-stage2 -x > /dev/null
>     objdump: Warning: There is an overlap [0x780 - 0x750] in .debug_ranges section.

I wonder why binutils objdump is really warning about that.
In this case it looks like the pc ranges might indeed overlap, but that is
somewhat expected. I don't think it is actually wrong. e.g. this one seems
to be two nested lexical blocks, the outer block [750] obviously will
overlap with the pc ranges of the inner block [780]:

 [c1bdf9]            lexical_block
                     ranges               (sec_offset) range list [   750]
 [c1bdfe]              variable
                       abstract_origin      (ref4) [c1bad1]
                       location             (sec_offset) location list [  7269]
                       [...]
 [c1be46]              lexical_block
                       ranges               (sec_offset) range list [   780]
 [c1be4b]                variable
                         abstract_origin      (ref4) [c1bb2e]
                         location             (exprloc) 
                          [   0] fbreg -68

> You'll be pleased to know that libdw has worked out quite well. For
> instance, with my patch [1] GHC can produce a backtrace like,
> 
>           0x6ff81f    set_initial_registers (rts/Libdw.c:323.0)
>     0x7fd852eaba68    dwfl_thread_getframes ((null):0.0)
>     0x7fd852eab4bf    (null) ((null):0.0)
>     0x7fd852eab7f7    dwfl_getthreads ((null):0.0)
>     0x7fd852eabde3    dwfl_getthread_frames ((null):0.0)
>           0x6ffbfc    libdw_get_backtrace (rts/Libdw.c:295.0)
>           0x6f187e    backtrace_handler (rts/posix/Signals.c:540.0)
>     0x7fd852b0617f    (null) ((null):0.0)
>           0x407b58    s91t_info (nofib/shootout/n-body/Main.hs:66.27)
>           0x407cf8    r8YB_info (nofib/shootout/n-body/Main.hs:83.5)
>           0x408408    s99O_info (nofib/shootout/n-body/Main.hs:28.19)
>           0x409e80    Main_main1_info (nofib/shootout/n-body/Main.hs:27.26)
>           0x6f41c0    stg_catch_frame_info (rts/Exception.cmm:370.1)
>           0x6f29a8    stg_stop_thread_info (rts/StgStartup.cmm:42.1)
> 
> There's still a fair amount of work left to integrate this fully, but at
> least the tricky DWARF work is done.

Very nice!
Any idea why the libdw.so/dwfl_ functions don't have any info?
Were they simply built without debuginfo?

> [1] https://phabricator.haskell.org/D1156

Some quick answers to some of the questions there (I didn't read the full
bug report or the patch, so please let me know if you have any specific
questions):

- portability of elfutils.
  It is ported across a lot of arches on GNU/Linux.
  i386, x86_64, ppc, ppc64, ppc64le, s390x, arm and aarch64 are at least
  regularly tested (should be zero fail at release time) and there are
  other ports in the backends both in tree [alpha, ia64, sparc, tilegx]
  and some not yet merged out of tree [mips, m68k, hppa].
  In theory it should also work on other ELF/DWARF based systems like
  *BSD, Debian has some limited success with kfreebsd, and Solaris. But
  there are some tricky dependencies of some of the dwfl functions on the
  /proc file system and ptrace, not all of them have clean backend/ebl
  functions. Darwin/MacOS is a bit harder since it doesn't use ELF and
  libdw currently depends on the DWARF container being ELF (and I have
  no idea what the ptrace/proc story is on Darwin). Windows is probably
  pretty hard given that it doesn't natively support ELF, DWARF or
  ptrace/proc.

- As I said before between libunwind/elfutils and libbacktrace I actually
  would have expected libbacktrace to be the easiest for you to use
  since it is actually designed for in-process unwinding. libunwind
  tries to do both in- and out-of-process unwinding, which I think is a
  little confusing, and has much less other functionality than elfutils
  with respect to model process memory, libraries, ELF, DWARF and symbol
  inspection. And elfutils really only tries to support out-of-process
  unwinding (but you happily managed to make it do in-process anyway, so
  maybe our design isn't so bad). Now that you got your DWARF/CFI correct
  I would give libbacktrace another go.

- To mark an end of stack you should set the CFI rule for the return
  register to undefined. See 6.4.4 Call Frame Calling Address.
  On x86_64 the return register is often just equal to rip and so
  using .cfi_undefined rip (in gas assembler) would do the trick.
  In general you can find fun and wonderful CFI describing interesting
  register unwinding tricks in glibc internals (try start.S, clone.S and
  __longjmp.S).

- Yes, perf can use elfutils to do unwinding. It does this "after the fact".
  It has an initial registers handler and memory read handler like you
  probably made for the in-process Dwfl_Thread_Callbacks. But they use
  the dumped register and partial stack dump they made during runtime
  to do the actual unwinding. So this only works if the CFI in your
  binary is complete and it is (mostly) expressed through the contents
  of the initial register dump and the stack values (which is almost always
  the case).

- We could in theory try to clean up my hack to not need .debug_aranges
  if we really want to. But I hope we don't now that you have it :)

- Why have pc ranges both in the CU and in .debug_aranges (pointing to
  the CUs)? Because they technically describe different things. The
  ranges given in the CU are the covered program scope entries (code).
  While .debug_aranges give the ranges of code and data object addresses
  described by the CU (although I believe in practice it really is the
  same and even .debug_aranges only has the code ranges). Secondly it
  is really mildly more efficient since .debug_aranges is small and
  compact and doesn't refer to other data sections (the CUs are all
  spread out in the .debug_info section and can potentially point
  into the .debug_ranges section when the CU uses DW_AT_low_pc plus
  DW_AT_ranges to describe more complex pc ranges).
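
To illustrate the end-of-stack point from the list above, here is a minimal
sketch in plain C, not libdw code. It assumes a toy model in which each
function has one CFI entry and return addresses sit in a flat array; the
names fde_model, ra_rule and count_frames are made up for this example. A
real unwinder would also look up "pc - 1" for caller frames, which is
skipped here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified rule for the return-address register (hypothetical model). */
enum ra_rule { RA_UNDEFINED, RA_SAVED };

/* One CFI entry covering the half-open pc range [begin, end). */
struct fde_model { uint64_t begin, end; enum ra_rule ra; };

/* Walk frames starting at PC.  SAVED_RAS models the return addresses the
   CFI would recover, innermost first.  Returns the number of frames seen. */
size_t
count_frames (const struct fde_model *fdes, size_t nfdes, uint64_t pc,
              const uint64_t *saved_ras, size_t nsaved)
{
  assert (fdes != NULL || nfdes == 0);
  size_t frames = 0, next = 0;
  for (;;)
    {
      const struct fde_model *fde = NULL;
      for (size_t i = 0; i < nfdes; i++)
        if (pc >= fdes[i].begin && pc < fdes[i].end)
          {
            fde = &fdes[i];
            break;
          }
      if (fde == NULL)
        return frames;          /* no CFI for this pc: cannot continue */
      frames++;
      if (fde->ra == RA_UNDEFINED)
        return frames;          /* rule is undefined: unwinding is complete */
      if (next == nsaved)
        return frames;          /* model ran out of saved return addresses */
      pc = saved_ras[next++];
    }
}
```

Without the undefined rule on the outermost function, the walk keeps going
into whatever value happens to sit in the return-address slot, which is how
bogus extra frames show up at the bottom of a backtrace.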

Cheers,

Mark


* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-22 10:18 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-22 10:18 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 4058 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Fri, Aug 21, 2015 at 06:25:38PM +0200, Ben Gamari wrote:
>> Does elfutils need .debug_ranges as well?
>
> Only if the DWARF producer created DIEs with DW_AT_ranges attributes.
> You seem to never do that. If your range is just one simple one you can
> just use a DW_AT_low_pc/high_pc pair and .debug_ranges will never be
> used.
>
I actually ask because while we don't produce or use the .debug_ranges
section, they sneak into our executables from the C runtime system
objects. However, whatever is emitting it seems to be doing a
questionable job,

    $ objdump --dwarf-check -e inplace/lib/bin/ghc-stage2 -x > /dev/null
    objdump: Warning: There is an overlap [0x780 - 0x750] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x6000 - 0x5fc0] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x6860 - 0x6820] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x68f0 - 0x68b0] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x6980 - 0x6940] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x6a10 - 0x69d0] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x6aa0 - 0x6a60] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x6b30 - 0x6af0] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x6bb0 - 0x6b70] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x8b40 - 0x8b10] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x8bb0 - 0x8b80] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x8c20 - 0x8bf0] in .debug_ranges section.
    objdump: Warning: There is an overlap [0x8cd0 - 0x8ca0] in .debug_ranges section.
    objdump: Warning: There is an overlap [0xa5a0 - 0xa570] in .debug_ranges section.
    objdump: Warning: There is an overlap [0xa610 - 0xa5e0] in .debug_ranges section.
    objdump: Warning: There is an overlap [0xa680 - 0xa650] in .debug_ranges section.
    objdump: Warning: There is an overlap [0xa730 - 0xa700] in .debug_ranges section.
    objdump: Warning: There is an overlap [0xba90 - 0xba60] in .debug_ranges section.
    objdump: Warning: There is an overlap [0xd430 - 0xd400] in .debug_ranges section.
    objdump: Warning: There is an overlap [0xdc90 - 0xdc60] in .debug_ranges section.

I have read that this might simply be due to DWARF 4 produced by GCC,
although this is essentially hearsay.

> Also see Appendix B -- Debug Section Relationships in
> http://dwarfstd.org/doc/DWARF4.pdf for a picture of which and how
> different debug sections might refer to each other.
>
Thanks!

You'll be pleased to know that libdw has worked out quite well. For
instance, with my patch [1] GHC can produce a backtrace like,

                  0x6ff81f    set_initial_registers (rts/Libdw.c:323.0)
            0x7fd852eaba68    dwfl_thread_getframes ((null):0.0)
            0x7fd852eab4bf    (null) ((null):0.0)
            0x7fd852eab7f7    dwfl_getthreads ((null):0.0)
            0x7fd852eabde3    dwfl_getthread_frames ((null):0.0)
                  0x6ffbfc    libdw_get_backtrace (rts/Libdw.c:295.0)
                  0x6f187e    backtrace_handler (rts/posix/Signals.c:540.0)
            0x7fd852b0617f    (null) ((null):0.0)
                  0x407b58    s91t_info (nofib/shootout/n-body/Main.hs:66.27)
                  0x407cf8    r8YB_info (nofib/shootout/n-body/Main.hs:83.5)
                  0x408408    s99O_info (nofib/shootout/n-body/Main.hs:28.19)
                  0x409e80    Main_main1_info (nofib/shootout/n-body/Main.hs:27.26)
                  0x6f41c0    stg_catch_frame_info (rts/Exception.cmm:370.1)
                  0x6f29a8    stg_stop_thread_info (rts/StgStartup.cmm:42.1)

There's still a fair amount of work left to integrate this fully, but at
least the tricky DWARF work is done.

Thanks again for your help,

- Ben


[1] https://phabricator.haskell.org/D1156


* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-21 22:53 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-21 22:53 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On Fri, Aug 21, 2015 at 06:25:38PM +0200, Ben Gamari wrote:
> Mark Wielaard <mjw@redhat.com> writes:
> 
> > And this is the problem. Sorry. I should have realized earlier.
> > We use the .debug_aranges to get a quick index of the CUs and which
> > address ranges they cover. In the case that there is no .debug_aranges
> > we could do a full scan of all CUs. But that is somewhat inefficient,
> > since no .debug_aranges could also mean that there really are no
> > CUs with address scope DIEs (however that is probably unlikely). But
> > if there is a .debug_aranges then we do assume it is complete. I am
> > thinking whether we should still scan all CUs anyway if we are
> > looking for an address that is really inside a module. But I think
> > that would quickly become very inefficient.
> >
> Does elfutils need .debug_ranges as well?

Only if the DWARF producer created DIEs with DW_AT_ranges attributes.
You seem to never do that. If your range is just one simple one you can
just use a DW_AT_low_pc/high_pc pair and .debug_ranges will never be
used.

Also see Appendix B -- Debug Section Relationships in
http://dwarfstd.org/doc/DWARF4.pdf for a picture of which and how
different debug sections might refer to each other.
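
As a rough illustration of the two ways a DIE can describe its code
addresses, the sketch below checks whether an address is covered, using
either a single low_pc/high_pc pair or a list standing in for
.debug_ranges. The types pc_range and die_model are hypothetical and much
simpler than libdw's own structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [begin, end). */
struct pc_range { uint64_t begin, end; };

/* Hypothetical, simplified view of a DIE's address attributes. */
struct die_model
{
  bool has_low_high;              /* DW_AT_low_pc/DW_AT_high_pc present */
  struct pc_range low_high;       /* valid only if has_low_high is true */
  const struct pc_range *ranges;  /* models a DW_AT_ranges list, or NULL */
  size_t nranges;
};

/* Returns true if ADDR falls inside the code described by DIE. */
bool
die_covers (const struct die_model *die, uint64_t addr)
{
  assert (die != NULL);
  if (die->ranges != NULL)
    {
      /* DW_AT_ranges present: this is the only case that needs the
         range list (the .debug_ranges section in a real file).  */
      for (size_t i = 0; i < die->nranges; i++)
        if (addr >= die->ranges[i].begin && addr < die->ranges[i].end)
          return true;
      return false;
    }
  if (die->has_low_high)
    return addr >= die->low_high.begin && addr < die->low_high.end;
  return false;                   /* no address attributes at all */
}
```

A producer whose DIEs all take the has_low_high path never causes the
range list to be consulted, which matches the answer above.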

Cheers,

Mark


* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-21 22:41 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-21 22:41 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]

On Fri, Aug 21, 2015 at 04:41:20PM +0200, Ben Gamari wrote:
> Mark Wielaard <mjw@redhat.com> writes:
> 
> > On Fri, Aug 21, 2015 at 09:35:14AM +0200, Ben Gamari wrote:
> >
> >> I have noticed that `/opt/exp/elfutils-root/bin/readelf -e
> >> ghc/stage2/build/Main.o --debug-dump=aranges` returns nothing for
> >> GHC-produced objects whereas it does not for objects produced by GCC.
> >
> > And this is the problem. Sorry. I should have realized earlier.
> > We use the .debug_aranges to get a quick index of the CUs and which
> > address ranges they cover. In the case that there is no .debug_aranges
> > we could do a full scan of all CUs. But that is somewhat inefficient,
> > since no .debug_aranges could also mean that there really are no
> > CUs with address scope DIEs (however that is probably unlikely). But
> > if there is a .debug_aranges then we do assume it is complete. I am
> > thinking whether we should still scan all CUs anyway if we are
> > looking for an address that is really inside a module. But I think
> > that would quickly become very inefficient.
> >
> Brilliant! I'll try implementing these.

That would be best I think.

Meanwhile, to show it should really work, the attached horrible hack
makes it work with the current setup (if all CUs do have the pc ranges
attached).

Maybe we can turn this horrible hack into something that is less horrible
to make eu-addr2line and friends work better with missing/partial
.debug_aranges. But then we should really properly cache the results/CUs
we scanned. And make sure that we don't unnecessarily scan all CUs if
the .debug_aranges are complete (sadly we cannot really tell just from
the .debug_aranges section itself).

> I know this is probably a "patches accepted" sort of task, but it would
> be great if libdw documented precisely what it expects from user objects
> in order to behave as expected. Even better would be optional warnings
> when the library doesn't find a DWARF annotation that it expects. As
> someone relatively new to DWARF, it is rather difficult to get a
> high-level view of what the significant differences are.

Yeah. Unfortunately DWARF is not very strict. A lot is left as a "quality
of implementation" issue. In practice that means we assume that the
minimum quality is what GCC outputs. And it isn't till some other DWARF
producer comes around that we even realize that is what we assumed was
the "quality" we needed.

Cheers,

Mark

[-- Attachment #2: horrible_getsrc_hack.patch --]
[-- Type: text/plain, Size: 2065 bytes --]

diff --git a/libdwfl/dwfl_module_getsrc.c b/libdwfl/dwfl_module_getsrc.c
index f7e340b..579abad 100644
--- a/libdwfl/dwfl_module_getsrc.c
+++ b/libdwfl/dwfl_module_getsrc.c
@@ -1,5 +1,5 @@
 /* Find source location for PC address in module.
-   Copyright (C) 2005, 2008, 2014 Red Hat, Inc.
+   Copyright (C) 2005, 2008, 2014, 2015 Red Hat, Inc.
    This file is part of elfutils.
 
    This file is free software; you can redistribute it and/or modify
@@ -29,6 +29,19 @@
 #include "libdwflP.h"
 #include "../libdw/libdwP.h"
 
+static bool
+in_cu_range (Dwarf_Die *cudie, Dwarf_Addr addr)
+{
+  ptrdiff_t off = 0;
+  Dwarf_Addr base, begin, end;
+  while ((off = dwarf_ranges (cudie, off, &base, &begin, &end)) > 0)
+    {
+      if (addr >= begin && addr < end)
+	return true;
+    }
+  return false;
+}
+
 Dwfl_Line *
 dwfl_module_getsrc (Dwfl_Module *mod, Dwarf_Addr addr)
 {
@@ -38,6 +51,26 @@ dwfl_module_getsrc (Dwfl_Module *mod, Dwarf_Addr addr)
 
   struct dwfl_cu *cu;
   Dwfl_Error error = __libdwfl_addrcu (mod, addr, &cu);
+  /* Horrible hack, if we do this then we should at least cache results.
+     There are two cases here, the aranges indicated no CU with the
+     requested address or the aranges do, but it isn't actually in the
+     actual CU because there was a gap that got optimized away for the
+     quick search and the real CU lies in that gap.  */
+  if ((error == DWFL_E_ADDR_OUTOFRANGE
+       || (error == DWFL_E_NOERROR
+	   && ! in_cu_range (&cu->die, addr)))
+      && addr >= mod->low_addr && addr < mod->high_addr)
+    {
+      /* Assume the user knows there is some CU that should cover
+	 this address, even though there is no aranges for it.  */
+      cu = NULL;
+      while ((error = __libdwfl_nextcu (mod, cu, &cu)) == DWFL_E_NOERROR
+	     && cu != NULL)
+	if (in_cu_range (&cu->die, addr))
+	  break;
+      if (error == DWFL_E_NOERROR && cu == NULL)
+	error = DWFL_E_ADDR_OUTOFRANGE;
+    }
   if (likely (error == DWFL_E_NOERROR))
     error = __libdwfl_cu_getsrclines (cu);
   if (likely (error == DWFL_E_NOERROR))
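
The control flow of the hack above can be sketched outside libdwfl as
follows. This is only a model under stated assumptions: cu_model,
arange_model, module_model and find_cu are invented names, each CU has a
single range, and the index is searched linearly rather than with the
sorted lookup a real implementation would use. Caching of scan results,
which the mail says would be needed, is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cu_model { uint64_t low_pc, high_pc; };            /* [low_pc, high_pc) */
struct arange_model { uint64_t begin, end; size_t cu; };  /* index entry -> CU */

struct module_model
{
  uint64_t low_addr, high_addr;         /* module bounds, half-open */
  const struct cu_model *cus;
  size_t ncus;
  const struct arange_model *aranges;   /* may be empty or incomplete */
  size_t naranges;
};

static bool
cu_covers (const struct cu_model *cu, uint64_t addr)
{
  return addr >= cu->low_pc && addr < cu->high_pc;
}

/* Returns the index of the CU covering ADDR, or -1 if there is none. */
long
find_cu (const struct module_model *mod, uint64_t addr)
{
  /* Fast path: trust the aranges index, but verify against the CU.  */
  for (size_t i = 0; i < mod->naranges; i++)
    {
      const struct arange_model *ar = &mod->aranges[i];
      assert (ar->cu < mod->ncus);
      if (addr >= ar->begin && addr < ar->end
          && cu_covers (&mod->cus[ar->cu], addr))
        return (long) ar->cu;
    }

  /* Slow path: only worth trying for addresses inside the module.  */
  if (addr < mod->low_addr || addr >= mod->high_addr)
    return -1;
  for (size_t i = 0; i < mod->ncus; i++)
    if (cu_covers (&mod->cus[i], addr))
      return (long) i;
  return -1;
}
```

The slow path is what makes lookups work for objects with missing or
partial .debug_aranges, at the cost of touching every CU on a miss.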


* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-21 16:25 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-21 16:25 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 789 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> And this is the problem. Sorry. I should have realized earlier.
> We use the .debug_aranges to get a quick index of the CUs and which
> address ranges they cover. In the case that there is no .debug_aranges
> we could do a full scan of all CUs. But that is somewhat inefficient,
> since no .debug_aranges could also mean that there really are no
> CUs with address scope DIEs (however that is probably unlikely). But
> if there is a .debug_aranges then we do assume it is complete. I am
> thinking whether we should still scan all CUs anyway if we are
> looking for an address that is really inside a module. But I think
> that would quickly become very inefficient.
>
Does elfutils need .debug_ranges as well?

Cheers,

- Ben



* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-21 14:41 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-21 14:41 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Fri, Aug 21, 2015 at 09:35:14AM +0200, Ben Gamari wrote:
>
>> I have noticed that `/opt/exp/elfutils-root/bin/readelf -e
>> ghc/stage2/build/Main.o --debug-dump=aranges` returns nothing for
>> GHC-produced objects whereas it does not for objects produced by GCC.
>
> And this is the problem. Sorry. I should have realized earlier.
> We use the .debug_aranges to get a quick index of the CUs and which
> address ranges they cover. In the case that there is no .debug_aranges
> we could do a full scan of all CUs. But that is somewhat inefficient,
> since no .debug_aranges could also mean that there really are no
> CUs with address scope DIEs (however that is probably unlikely). But
> if there is a .debug_aranges then we do assume it is complete. I am
> thinking whether we should still scan all CUs anyway if we are
> looking for an address that is really inside a module. But I think
> that would quickly become very inefficient.
>
Brilliant! I'll try implementing these.

I know this is probably a "patches accepted" sort of task, but it would
be great if libdw documented precisely what it expects from user objects
in order to behave as expected. Even better would be optional warnings
when the library doesn't find a DWARF annotation that it expects. As
someone relatively new to DWARF, it is rather difficult to get a
high-level view of what the significant differences are.

Thanks for all of your help so far!

- Ben


* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-21 14:26 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-21 14:26 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 2522 bytes --]

On Fri, Aug 21, 2015 at 09:35:14AM +0200, Ben Gamari wrote:
> So I have added {low,high}_pc attributes to the compilation units yet
> elfutils' addr2line is still not giving me line information,
> 
>     $ addr2line -e inplace/lib/bin/ghc-stage2 0x427150
>     /opt/exp/ghc/ghc//ghc/Main.hs:79
>     $ /opt/exp/elfutils-root/bin/addr2line -e inplace/lib/bin/ghc-stage2 0x427150
>     ??:0
> 
>     $ addr2line -e ghc/stage2/build/Main.o 0x1be9
>     /opt/exp/ghc/ghc//ghc/Main.hs:626
>     $ /opt/exp/elfutils-root/bin/addr2line -e ghc/stage2/build/Main.o 0x1be9
>     ??:0
> 
> The objects in question can be found here,
> 
>   * http://home.smart-cactus.org/~ben/Main.o
>   * http://home.smart-cactus.org/~ben/ghc-stage2
> 
> I've compared the output from readelf and the differences that I've
> noticed really don't seem like they should be significant,
> 
>  * GHC uses addresses for high_pc, whereas GCC appears to use a
>    data8, meaning relative to low_pc. That being said, it appears that
>    elfutils should handle this.

Yes, it should. Using an offset for high_pc is just an optimization.
It might be worth emitting it as an offset since that is smaller and
it removes a relocation that the linker will have to resolve.
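
To make the two encodings concrete, here is a minimal sketch of how a
consumer could compute the end address either way. In DWARF 4 a
DW_AT_high_pc of class address is the end address itself, while one of
class constant is an offset from DW_AT_low_pc. The enum and function
names are made up for illustration; this is not the libdw code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag for the attribute class of DW_AT_high_pc.  */
enum high_pc_class { HIGH_PC_ADDRESS, HIGH_PC_CONSTANT };

/* Return the (exclusive) end address of a range starting at LOW_PC.
   For class address VALUE already is the end address; for class
   constant VALUE is an offset that must be added to LOW_PC.  */
uint64_t
high_pc_end (uint64_t low_pc, enum high_pc_class cls, uint64_t value)
{
  if (cls == HIGH_PC_ADDRESS)
    {
      /* An end address below the start would be malformed.  */
      assert (value >= low_pc);
      return value;
    }
  return low_pc + value;
}
```

Both forms describe the same range; the constant form just avoids a
second relocated address in the object file.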

>  * I use strings instead of strps.

That should also be fine, but using strps is often much more efficient
if you have identical strings (or strings that are the end of another
string).
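
As a rough illustration of why strps help, the sketch below looks up a
string in a .debug_str-like table and also accepts a match against the
tail of an existing entry. The function name and the flat table layout
are assumptions for the example, not how any particular producer does
it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search TABLE (LEN bytes of consecutive NUL-terminated strings) for S.
   Return the offset at which S can be referenced, either as a whole
   entry or as the tail of a longer entry, or -1 if it is not present.  */
long
find_shared (const char *table, size_t len, const char *s)
{
  /* The table must end with a terminating NUL.  */
  assert (len == 0 || table[len - 1] == '\0');
  size_t slen = strlen (s);
  for (size_t off = 0; off < len; )
    {
      size_t n = strlen (table + off);
      /* S matches if it equals the last SLEN bytes of this entry.  */
      if (slen <= n && strcmp (table + off + (n - slen), s) == 0)
        return (long) (off + (n - slen));
      off += n + 1;
    }
  return -1;
}
```

With a table holding "Main.hs" and "Util.hs", a reference to "hs" can
point into the middle of the first entry instead of storing the bytes
again.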

> 
>  * I don't provide `decl_file`, `decl_line`, `type` or `prototyped`
>    attributes on subprograms

Those are helpful, but shouldn't matter for just finding the lines.

> I have noticed that `/opt/exp/elfutils-root/bin/readelf -e
> ghc/stage2/build/Main.o --debug-dump=aranges` returns nothing for
> GHC-produced objects whereas it does print entries for objects produced by GCC.

And this is the problem. Sorry. I should have realized earlier.
We use the .debug_aranges to get a quick index of the CUs and which
address ranges they cover. In the case that there is no .debug_aranges
we could do a full scan of all CUs. But that is somewhat inefficient,
since no .debug_aranges could also mean that there really are no
CUs with address scope DIEs (however that is probably unlikely). But
if there is a .debug_aranges then we do assume it is complete. I am
thinking whether we should still scan all CUs anyway if we are
looking for an address that is really inside a module. But I think
that would quickly become very inefficient.
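
The lookup policy described above can be sketched as follows. This is a
simplified model with made-up struct and function names, not the actual
libdw/libdwfl code: if an index exists it is trusted as complete, and a
full CU scan only happens when there is no index at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one .debug_aranges entry: an address range
   and the index of the CU that covers it.  */
struct arange { uint64_t start; uint64_t length; int cu; };

/* Hypothetical model of a CU with one contiguous range;
   high_pc is exclusive.  */
struct cu { uint64_t low_pc; uint64_t high_pc; };

/* Return the index of the CU covering ADDR, or -1 if none is found.  */
int
find_cu (const struct arange *aranges, size_t naranges,
         const struct cu *cus, size_t ncus, uint64_t addr)
{
  assert (aranges != NULL || naranges == 0);
  if (naranges > 0)
    {
      /* An index exists: assume it is complete, never scan the CUs.  */
      for (size_t i = 0; i < naranges; i++)
        if (addr >= aranges[i].start
            && addr - aranges[i].start < aranges[i].length)
          return aranges[i].cu;
      return -1;
    }
  /* No index at all: fall back to the slow scan over every CU.  */
  for (size_t i = 0; i < ncus; i++)
    if (addr >= cus[i].low_pc && addr < cus[i].high_pc)
      return (int) i;
  return -1;
}
```

Note what this means for a linked binary that mixes objects: if only
some of the CUs contributed aranges, an address in one of the other CUs
is reported as not found even though the CU itself has a valid range.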

Cheers,

Mark


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-21  7:35 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-21  7:35 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 2585 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Thu, 2015-08-20 at 17:02 +0200, Ben Gamari wrote:
>> Fair enough. However, I was basing my original statement not on the
>> output of elfutils' readelf but instead on libdwfl data structures
>> dumped in gdb.
>> 
>> As you may have realized by now, the problem begins in
>> dwfl_module_getsrc, which returns DWFL_E_ADDR_OUTOFRANGE.
>> If you look at the Dwfl_Module that gets passed to dwfl_module_getsrc,
>> you'll find that mod->ncu == 1 (which is confirmed by traversing the
> >> mod->cu list). This struck me as rather odd.
>
> mod->ncu is the number of CUs that have been interned because they have
> been used. But you are correct that the Dwfl isn't seeing/using many
> CUs. The reason for that is that dwfl only caches the CUs which have
> (program scope) addresses associated with them. There seem to be only
> 64 CUs that have a DW_AT_low_pc associated with the DW_TAG_compile_unit
> (the same number as listed in .debug_aranges). All those seem to be
> generated by GCC for C files. None of the compile_units generated by ghc
> seem to have address ranges associated with their CUs. Because of that
> libdwfl assumed that there are no program scopes inside those CUs (only
> types) and so doesn't use them.
>
So I have added {low,high}_pc attributes to the compilation units yet
elfutils' addr2line is still not giving me line information,

    $ addr2line -e inplace/lib/bin/ghc-stage2 0x427150
    /opt/exp/ghc/ghc//ghc/Main.hs:79
    $ /opt/exp/elfutils-root/bin/addr2line -e inplace/lib/bin/ghc-stage2 0x427150
    ??:0

    $ addr2line -e ghc/stage2/build/Main.o 0x1be9
    /opt/exp/ghc/ghc//ghc/Main.hs:626
    $ /opt/exp/elfutils-root/bin/addr2line -e ghc/stage2/build/Main.o 0x1be9
    ??:0

The objects in question can be found here,

  * http://home.smart-cactus.org/~ben/Main.o
  * http://home.smart-cactus.org/~ben/ghc-stage2

I've compared the output from readelf and the differences that I've
noticed really don't seem like they should be significant,

 * GHC uses addresses for high_pc, whereas GCC appears to use a
   data8, meaning relative to low_pc. That being said, it appears that
   elfutils should handle this.

 * I use strings instead of strps.

 * I don't provide `decl_file`, `decl_line`, `type` or `prototyped`
   attributes on subprograms

I have noticed that `/opt/exp/elfutils-root/bin/readelf -e
ghc/stage2/build/Main.o --debug-dump=aranges` returns nothing for
GHC-produced objects whereas it does print entries for objects produced by GCC.

Cheers,

- Ben

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 17:39 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-20 17:39 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1353 bytes --]

On Thu, 2015-08-20 at 19:32 +0200, Mark Wielaard wrote:
> On Thu, 2015-08-20 at 10:06 -0700, Josh Stone wrote:
> > On 08/20/2015 08:32 AM, Mark Wielaard wrote:
> > > See Dwarf 4 2.17 Code Addresses and Ranges. In particular elfutils takes
> > > advantage of:
> > > "If an entity has no associated machine code, none of these attributes
> > > are specified."
> > 
> > (A -> B) does not give you (B -> A)!
> > 
> > That is, the spec does *not* say "If none of these attributes are
> > specified, the entity has no associated machine code."
> 
> Maybe the spec needs clarification that is what is actually meant. Or
> you can see it as a quality of implementation issue. But it is what we
> need and what we rely on. Otherwise you have no way to know whether or
> not to scan a whole CU or DIE subtree for program scope DIEs. And always
> scanning each and every subtree just in case some program scope DIE is
> hiding deep down is just silly.

I see it is in the non-normative text of 6.1 Accelerated Access:

        To find the debugging information associated with a subroutine,
        given an address, a debugger can use the low and high pc
        attributes of the compilation unit entries to quickly narrow
        down the search

So, yes, it is non-normative, but it still makes a lot of sense :)

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 17:32 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-20 17:32 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

On Thu, 2015-08-20 at 10:06 -0700, Josh Stone wrote:
> On 08/20/2015 08:32 AM, Mark Wielaard wrote:
> > See Dwarf 4 2.17 Code Addresses and Ranges. In particular elfutils takes
> > advantage of:
> > "If an entity has no associated machine code, none of these attributes
> > are specified."
> 
> (A -> B) does not give you (B -> A)!
> 
> That is, the spec does *not* say "If none of these attributes are
> specified, the entity has no associated machine code."

Maybe the spec needs clarification that is what is actually meant. Or
you can see it as a quality of implementation issue. But it is what we
need and what we rely on. Otherwise you have no way to know whether or
not to scan a whole CU or DIE subtree for program scope DIEs. And always
scanning each and every subtree just in case some program scope DIE is
hiding deep down is just silly.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 15:32 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-20 15:32 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 2093 bytes --]

On Thu, 2015-08-20 at 17:02 +0200, Ben Gamari wrote:
> Fair enough. However, I was basing my original statement not on the
> output of elfutils' readelf but instead on libdwfl data structures
> dumped in gdb.
> 
> As you may have realized by now, the problem begins in
> dwfl_module_getsrc, which returns DWFL_E_ADDR_OUTOFRANGE.
> If you look at the Dwfl_Module that gets passed to dwfl_module_getsrc,
> you'll find that mod->ncu == 1 (which is confirmed by traversing the
> mod->cu list). This struck me as rather odd.

mod->ncu is the number of CUs that have been interned because they have
been used. But you are correct that the Dwfl isn't seeing/using many
CUs. The reason for that is that dwfl only caches the CUs which have
(program scope) addresses associated with them. There seem to be only
64 CUs that have a DW_AT_low_pc associated with the DW_TAG_compile_unit
(the same number as listed in .debug_aranges). All those seem to be
generated by GCC for C files. None of the compile_units generated by ghc
seem to have address ranges associated with their CUs. Because of that
libdwfl assumed that there are no program scopes inside those CUs (only
types) and so doesn't use them.

I am pondering if we can use the fact that they do have a
DW_AT_stmt_list to keep/cache them anyway. But I would suggest that ghc
outputs the program scope address ranges that the CU covers (either a
DW_AT_low_pc plus DW_AT_high_pc for a contiguous address range or a
DW_AT_low_pc plus DW_AT_ranges for non-contiguous address ranges). There
are other places where things probably go wrong if they aren't there.
elfutils tries to be efficient and not read a whole DIE tree unless it
thinks it needs it. One way to do that is to only read the children of a
program scope DIE if it covers the address we are looking for.
See Dwarf 4 2.17 Code Addresses and Ranges. In particular elfutils takes
advantage of:
"If an entity has no associated machine code, none of these attributes
are specified."
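
A minimal sketch of that pruning idea, using a made-up DIE model rather
than the real libdw types: a subtree is only descended into when its
root carries a pc range that covers the address. It assumes sibling
ranges do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified DIE: an optional [low_pc, high_pc) range
   plus child and sibling links.  */
struct die
{
  int has_pc;
  uint64_t low_pc, high_pc;
  const struct die *child, *sibling;
};

/* Return the innermost DIE in the sibling list starting at DIE whose
   range covers ADDR, or NULL.  Subtrees whose root has no pc
   attributes, or a range that misses ADDR, are never read.  */
const struct die *
find_innermost (const struct die *die, uint64_t addr)
{
  for (; die != NULL; die = die->sibling)
    {
      assert (!die->has_pc || die->low_pc <= die->high_pc);
      if (!die->has_pc || addr < die->low_pc || addr >= die->high_pc)
        continue;               /* Prune this whole subtree.  */
      const struct die *inner = find_innermost (die->child, addr);
      return inner != NULL ? inner : die;
    }
  return NULL;
}
```

The flip side is visible in the sketch: a DIE with a perfectly good
range is never found if its parent carries no pc attributes, which is
why the producer needs to emit them on the CU as well.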

Cheers,

Mark

P.S. Is the DW_AT_language code you use for Haskell standardized?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 15:02 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-20 15:02 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1699 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Thu, 2015-08-20 at 16:13 +0200, Ben Gamari wrote:
>> Mark Wielaard <mjw@redhat.com> writes:
>> > Could you post the binary somewhere?
>> > Then I will take a look at what is going wrong.
>> >
>> It can be found here (sorry for the size),
>> 
>>     http://home.smart-cactus.org/~ben/ghc-stage2
>
> Thanks.
>
>> With binutils I find,
>> 
>>     $ addr2line -e inplace/lib/bin/ghc-stage2 -f 0x12011b8
>>     ghc_Util_maybeReadFuzzzzy_info
>>     /opt/exp/ghc/ghc//compiler/utils/Util.hs:955
>> 
> >> If I'm interpreting gdb correctly, elfutils seems to think there is
>> only one CU, which sounds like it could be the issue. By my count there
>> should be over 700,
>> 
>>     $ readelf -e  inplace/lib/bin/ghc-stage2 -w | grep -i  "File Name Table (" | wc -l
>>     742
>
> Still debugging, but the debug_info and line_info seem to be parsed
> correctly. The output is just slightly different between eu-readelf and
> binutils readelf:
>
> $ eu-readelf --debug-dump=info -N ./ghc-stage2 | grep "Compilation unit at offset" | wc --lines
> 742
>
> $ eu-readelf --debug-dump=decodedline -N ./ghc-stage2 | grep ^\ CU\  | wc --lines
> 742
>
Fair enough. However, I was basing my original statement not on the
output of elfutils' readelf but instead on libdwfl data structures
dumped in gdb.

As you may have realized by now, the problem begins in
dwfl_module_getsrc, which returns DWFL_E_ADDR_OUTOFRANGE.
If you look at the Dwfl_Module that gets passed to dwfl_module_getsrc,
you'll find that mod->ncu == 1 (which is confirmed by traversing the
mod->cu list). This struck me as rather odd.

Cheers,

- Ben

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 14:46 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-20 14:46 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1149 bytes --]

On Thu, 2015-08-20 at 16:13 +0200, Ben Gamari wrote:
> Mark Wielaard <mjw@redhat.com> writes:
> > Could you post the binary somewhere?
> > Then I will take a look at what is going wrong.
> >
> It can be found here (sorry for the size),
> 
>     http://home.smart-cactus.org/~ben/ghc-stage2

Thanks.

> With binutils I find,
> 
>     $ addr2line -e inplace/lib/bin/ghc-stage2 -f 0x12011b8
>     ghc_Util_maybeReadFuzzzzy_info
>     /opt/exp/ghc/ghc//compiler/utils/Util.hs:955
> 
> If I'm interpreting gdb correctly, elfutils seems to think there is
> only one CU, which sounds like it could be the issue. By my count there
> should be over 700,
> 
>     $ readelf -e  inplace/lib/bin/ghc-stage2 -w | grep -i  "File Name Table (" | wc -l
>     742

Still debugging, but the debug_info and line_info seem to be parsed
correctly. The output is just slightly different between eu-readelf and
binutils readelf:

$ eu-readelf --debug-dump=info -N ./ghc-stage2 | grep "Compilation unit at offset" | wc --lines
742

$ eu-readelf --debug-dump=decodedline -N ./ghc-stage2 | grep ^\ CU\  | wc --lines
742

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 14:28 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-20 14:28 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Thu, 2015-08-20 at 13:09 +0200, Ben Gamari wrote:
>> It turns out that libbacktrace only uses DWARF line information, not
> >> the .debug_frame unwinding information.
>
> It might indeed be that libbacktrace only handles .eh_frame.
> If you already generate .debug_frame it should be easy to
> generate .eh_frame information. The formats are almost the same with a
> few small encoding differences (also .eh_frame can have a .eh_frame_hdr
> index which makes address lookup and unwinding much more efficient).
> http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
> http://www.airs.com/blog/archives/460
>
> Alternatively it might not be too hard to make libbacktrace
> use .debug_frame: if it is already loading the .debug_line info, then
> making it also load .debug_frame and interpret it mostly like .eh_frame
> should be feasible. (elfutils libdw cie.c/fde.c use CFI_IS_EH to
> distinguish the two, if you want to see some of the practical
> differences.)
>
> Again, I am not trying to push you towards another library, but simply
> pointing out different options.
>
Thanks!

The other consideration that is pushing me towards elfutils is the fact
that we may very well end up needing the flexibility that it offers. For
instance, I suspect we can unwind the GHC stack more efficiently (and
perhaps more easily) than the DWARF unwinder by implementing the
unwinding explicitly in the runtime system (in fact, this is already
implemented).

In order for this to be possible, however, we need the ability to look
up symbol and line information from arbitrary addresses. Unfortunately
both libbacktrace and libunwind lack this and other low-level
interfaces.

They are both very convenient (and, for this reason, tempting) but so
far my attempts at using them have consistently ended in my needing
something which their interfaces don't provide.

Cheers,

- Ben

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 14:15 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-20 14:15 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]

On Thu, 2015-08-20 at 13:09 +0200, Ben Gamari wrote:
> It turns out that libbacktrace only uses DWARF line information, not
> the .debug_frame unwinding information.

It might indeed be that libbacktrace only handles .eh_frame.
If you already generate .debug_frame it should be easy to
generate .eh_frame information. The formats are almost the same with a
few small encoding differences (also .eh_frame can have a .eh_frame_hdr
index which makes address lookup and unwinding much more efficient).
http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
http://www.airs.com/blog/archives/460

Alternatively it might not be too hard to make libbacktrace
use .debug_frame: if it is already loading the .debug_line info, then
making it also load .debug_frame and interpret it mostly like .eh_frame
should be feasible. (elfutils libdw cie.c/fde.c use CFI_IS_EH to
distinguish the two, if you want to see some of the practical
differences.)
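
Two of those encoding differences can be sketched in a few lines. The
helper names are made up and only the 32-bit format is covered: in
.debug_frame a CIE is marked by an id of 0xffffffff and an FDE's CIE
pointer is an offset from the start of the section, while in .eh_frame
a CIE has id 0 and the FDE's CIE pointer is a distance counted
backwards from the pointer field itself.

```c
#include <assert.h>
#include <stdint.h>

/* Is the entry whose id field holds ID a CIE?  IS_EH selects the
   .eh_frame convention, otherwise .debug_frame (32-bit format).  */
int
is_cie (uint32_t id, int is_eh)
{
  return id == (is_eh ? 0 : UINT32_MAX);
}

/* For an FDE, return the section offset of its CIE.  FIELD_OFFSET is
   the section offset of the CIE pointer field and ID its contents.  */
uint64_t
cie_offset (uint64_t field_offset, uint32_t id, int is_eh)
{
  /* A backwards distance cannot reach before the section start.  */
  assert (!is_eh || id <= field_offset);
  return is_eh ? field_offset - id : id;
}
```

Pointer encodings and augmentation data differ as well, so this is only
the first thing a shared parser has to switch on.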

Again, I am not trying to push you towards another library, but simply
pointing out different options.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 14:13 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-20 14:13 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1743 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Thu, 2015-08-20 at 14:47 +0200, Ben Gamari wrote:
>> Using elfutils 0.163, built by me with all dependencies available,
>> 
>>     $ /opt/exp/elfutils-root/bin/addr2line --version
>>     addr2line (elfutils) 0.163
>>     Copyright (C) 2012 Red Hat, Inc.
>>     This is free software; see the source for copying conditions.  There is NO
>>     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>     Written by Ulrich Drepper.
>> 
>>     $ /opt/exp/elfutils-root/bin/addr2line -e inplace/lib/bin/ghc-stage2 -f 0x25a6898
>>     base_TextziParserCombinatorsziReadP_zdfAlternativePzuzdczlzbzg_info
>>     ??:0
>> 
>> What am I doing wrong here?
>
> Probably just a bug/missing feature in elfutils. If the binary is
> statically linked it might be that we are not able to find all
> information we need. There have been other reports about that.
> https://bugzilla.redhat.com/show_bug.cgi?id=1053583
> Given that most binaries are not static it hasn't been a problem in
> practice.
>
> Could you post the binary somewhere?
> Then I will take a look at what is going wrong.
>
It can be found here (sorry for the size),

    http://home.smart-cactus.org/~ben/ghc-stage2

With binutils I find,

    $ addr2line -e inplace/lib/bin/ghc-stage2 -f 0x12011b8
    ghc_Util_maybeReadFuzzzzy_info
    /opt/exp/ghc/ghc//compiler/utils/Util.hs:955

If I'm interpreting gdb correctly, elfutils seems to think there is
only one CU, which sounds like it could be the issue. By my count there
should be over 700,

    $ readelf -e  inplace/lib/bin/ghc-stage2 -w | grep -i  "File Name Table (" | wc -l
    742

Thanks a ton!

Cheers,

- Ben


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 14:04 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-20 14:04 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]

On Thu, 2015-08-20 at 14:47 +0200, Ben Gamari wrote:
> Using elfutils 0.163, built by me with all dependencies available,
> 
>     $ /opt/exp/elfutils-root/bin/addr2line --version
>     addr2line (elfutils) 0.163
>     Copyright (C) 2012 Red Hat, Inc.
>     This is free software; see the source for copying conditions.  There is NO
>     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>     Written by Ulrich Drepper.
> 
>     $ /opt/exp/elfutils-root/bin/addr2line -e inplace/lib/bin/ghc-stage2 -f 0x25a6898
>     base_TextziParserCombinatorsziReadP_zdfAlternativePzuzdczlzbzg_info
>     ??:0
> 
> What am I doing wrong here?

Probably just a bug/missing feature in elfutils. If the binary is
statically linked it might be that we are not able to find all
information we need. There have been other reports about that.
https://bugzilla.redhat.com/show_bug.cgi?id=1053583
Given that most binaries are not static it hasn't been a problem in
practice.

Could you post the binary somewhere?
Then I will take a look at what is going wrong.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 12:47 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-20 12:47 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1969 bytes --]

Ben Gamari <ben@smart-cactus.org> writes:

> Ben Gamari <ben@smart-cactus.org> writes:
>
snip
>
> Moreover, I have also implemented unwinding of the Haskell evaluation
> stack (the stack used by Haskell code). While the unwinding and symbol
> lookup work, I'm not getting any line information out of libdw. I'm
> quite certain that my object (namely a statically linked executable) has
> proper line information as addr2line can extract meaningful positions
> for the addresses found in my stacktraces.
>
Strangely enough, it actually may be an elfutils issue. I just realized
that I was using addr2line from my system's binutils, not elfutils.
Indeed if I use elfutils's addr2line I also find that there is no line
information.

Using binutils 2.25-11 (Debian Sid),

    $ /usr/bin/addr2line --version
    GNU addr2line (GNU Binutils for Debian) 2.25
    Copyright (C) 2014 Free Software Foundation, Inc.
    This program is free software; you may redistribute it under the terms of
    the GNU General Public License version 3 or (at your option) any later version.
    This program has absolutely no warranty.

    $ /usr/bin/addr2line -e inplace/lib/bin/ghc-stage2 -f 0x25a6898
    base_TextziParserCombinatorsziReadP_zdfAlternativePzuzdczlzbzg_info
    /opt/exp/ghc/ghc//libraries/base/Text/ParserCombinators/ReadP.hs:128

Using elfutils 0.163, built by me with all dependencies available,

    $ /opt/exp/elfutils-root/bin/addr2line --version
    addr2line (elfutils) 0.163
    Copyright (C) 2012 Red Hat, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Ulrich Drepper.

    $ /opt/exp/elfutils-root/bin/addr2line -e inplace/lib/bin/ghc-stage2 -f 0x25a6898
    base_TextziParserCombinatorsziReadP_zdfAlternativePzuzdczlzbzg_info
    ??:0

What am I doing wrong here?

Cheers,

- Ben

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-20 11:09 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-20 11:09 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 3031 bytes --]

Ben Gamari <ben@smart-cactus.org> writes:

> Mark Wielaard <mjw@redhat.com> writes:
>
>> On Wed, 2015-08-19 at 14:11 +0200, Ben Gamari wrote:
>>> I am adding full DWARF unwinding-based backtrace support to the Glasgow
>>> Haskell Compiler's (GHC) runtime system. GHC now emits DWARF information
>>> in the object files it produces [1] and we would like to be able to
>>> record full backtraces on both runtime system events and the user's
>>> request.
>>
>> Nice. The [1] doesn't point anywhere though.
>>
> Doh.
>
> This was supposed to point to Peter Wortmann's thesis [1]. As you might
> imagine, generating useful line number information for a
> lazily evaluated language without compromising optimization
> opportunities is not a trivial task. Peter deserves quite some credit
> for getting this working.
>
>>> The goal here is to expose a function to the runtime system and user
>>> programs allowing them to get a snapshot of the calling thread's current
>>> call-stack.
>>
>> I don't want to push you away from using elfutils, but have you looked
>> at gcc's libbacktrace? That seems to be made precisely for your use case
>> (it is how gccgo provides call-stacks to the go runtime).
>> https://gcc.gnu.org/git/?p=gcc.git;a=tree;f=libbacktrace;hb=HEAD
>>
> This looks great although it's not immediately clear to me whether/how
> it handles symbols from shared objects (which unfortunately GHC now
> produces and links against by default).
>
It turns out that libbacktrace only uses DWARF line information, not
the .debug_frame unwinding information. This means that we would have
needed to implement our own unwinder for the Haskell stack (separate
from the C stack). Unfortunately libbacktrace doesn't provide the
necessary tools to make this feasible. I also ended up looking at
libunwind, which seems to have similar goals to libbacktrace and also
suffers from the same insufficient interfaces.

So, I've come back to libdw. At this point I have basic stacktrace
output for the C stack using libdw's unwinder. My set_initial_registers
implementation can be found here [1]. It seems to work reasonably well
although I would appreciate a second pair of eyes to ensure I didn't
miss something.

Moreover, I have also implemented unwinding of the Haskell evaluation
stack (the stack used by Haskell code). While the unwinding and symbol
lookup work, I'm not getting any line information out of libdw. I'm
quite certain that my object (namely a statically linked executable) has
proper line information as addr2line can extract meaningful positions
for the addresses found in my stacktraces.

My stacktrace implementation is quite simple [2] and I suspect I'm
merely missing something obvious. Can anyone spot what I've missed?

Thanks,

- Ben


[1] https://github.com/ghc/ghc/compare/142a673...bgamari:libdw#diff-c6aeaf799bec63621050f2515ca3ba9eR207
[2] https://github.com/ghc/ghc/compare/142a673...bgamari:libdw#diff-c6aeaf799bec63621050f2515ca3ba9eR241


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-19 14:25 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-19 14:25 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 3249 bytes --]

Mark Wielaard <mjw@redhat.com> writes:

> On Wed, 2015-08-19 at 14:11 +0200, Ben Gamari wrote:
>> I am adding full DWARF unwinding-based backtrace support to the Glasgow
>> Haskell Compiler's (GHC) runtime system. GHC now emits DWARF information
>> in the object files it produces [1] and we would like to be able to
>> record full backtraces on both runtime system events and the user's
>> request.
>
> Nice. The [1] doesn't point anywhere though.
>
Doh.

This was supposed to point to Peter Wortmann's thesis [1]. As you might
imagine, generating useful line number information for a
lazily evaluated language without compromising optimization
opportunities is not a trivial task. Peter deserves quite some credit
for getting this working.

>> The goal here is to expose a function to the runtime system and user
>> programs allowing them to get a snapshot of the calling thread's current
>> call-stack.
>
> I don't want to push you away from using elfutils, but have you looked
> at gcc's libbacktrace? That seems to be made precisely for your use case
> (it is how gccgo provides call-stacks to the go runtime).
> https://gcc.gnu.org/git/?p=gcc.git;a=tree;f=libbacktrace;hb=HEAD
>
This looks great although it's not immediately clear to me whether/how
it handles symbols from shared objects (which unfortunately GHC now
produces and links against by default).

>>  It seems that elfutils' current dwfl interfaces only allow
>> for extraction of frames from Core dumps (with dwfl_core_file_attach)
>> and ptrace'd processes (with dwfl_linux_proc_attach).
>> 
>> In principle, however, I can't see any reason why frames couldn't be
>> extracted from the current thread. Besides taking care to preserve the
> >> registers at the outset, it doesn't even seem that this should be
>> terribly difficult. Am I missing something? Does the interface for this
>> exist and I have simply overlooked it? Does it not yet exist but is
>> merely waiting for someone to implement it?
>
> In general libdwfl functions are used mostly to introspect other
> processes. But theoretically you could also use them on "self". It is
> not a usecase that has come up yet though (or maybe people do and I just
> didn't hear about it). I do think it should be possible and maybe it
> already kind of works if you use dwfl_linux_proc_report and
> dwfl_linux_proc_attach with getpid () as argument. Although I wouldn't be
> surprised if it would deadlock trying to ptrace itself.
>
Right, it looks like it will use ptrace and I can't imagine that going
well.

> So the best thing would be to write specific Dwfl_Thread_Callbacks
> functions for "self". The only tricky part would be the
> set_initial_registers callback. Which should be possible with some arch
> specific assembler tricks.
>
I started down this road and am still pondering how to handle
set_initial_registers. It is definitely a rather delicate operation.
I'll give libbacktrace a shot and let you know if I end up returning to
libdw. I would be happy to contribute a patch upstream if I do end up
getting local backtraces working with libdw.

Thanks for your prompt feedback!

- Ben


[1] http://etheses.whiterose.ac.uk/8321/1/thesis.pdf

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Using dwfl to enumerate frames of current thread
@ 2015-08-19 13:10 Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2015-08-19 13:10 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 2174 bytes --]

On Wed, 2015-08-19 at 14:11 +0200, Ben Gamari wrote:
> I am adding full DWARF unwinding-based backtrace support to the Glasgow
> Haskell Compiler's (GHC) runtime system. GHC now emits DWARF information
> in the object files it produces [1] and we would like to be able to
> record full backtraces on both runtime system events and the user's
> request.

Nice. The [1] doesn't point anywhere though.

> The goal here is to expose a function to the runtime system and user
> programs allowing them to get a snapshot of the calling thread's current
> call-stack.

I don't want to push you away from using elfutils, but have you looked
at gcc's libbacktrace? That seems to be made precisely for your use case
(it is how gccgo provides call-stacks to the go runtime).
https://gcc.gnu.org/git/?p=gcc.git;a=tree;f=libbacktrace;hb=HEAD

>  It seems that elfutils' current dwfl interfaces only allow
> for extraction of frames from Core dumps (with dwfl_core_file_attach)
> and ptrace'd processes (with dwfl_linux_proc_attach).
> 
> In principle, however, I can't see any reason why frames couldn't be
> extracted from the current thread. Besides taking care to preserve the
> registers at the outset, it doesn't even seem that this should be
> terribly difficult. Am I missing something? Does the interface for this
> exist and I have simply overlooked it? Does it not yet exist but is
> merely waiting for someone to implement it?

In general libdwfl functions are used mostly to introspect other
processes. But theoretically you could also use them on "self". It is
not a usecase that has come up yet though (or maybe people do and I just
didn't hear about it). I do think it should be possible and maybe it
already kind of works if you use dwfl_linux_proc_report and
dwfl_linux_proc_attach with getpid () as argument. Although I wouldn't be
surprised if it would deadlock trying to ptrace itself.

So the best thing would be to write specific Dwfl_Thread_Callbacks
functions for "self". The only tricky part would be the
set_initial_registers callback, which should be possible with some
arch-specific assembler tricks.
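
To make the "self" idea a bit more concrete, here is a rough sketch of
the two arch-sensitive pieces such callbacks would need: grabbing a
starting register set for the calling thread, and reading memory from
our own address space. Everything named here (struct self_regs,
capture_self_regs, self_memory_read) is hypothetical and not part of
elfutils. It assumes GCC or Clang; only x86-64 gets a consistent
pc/sp/fp triple, other targets fall back to compiler builtins and a
cruder approximation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the values a set_initial_registers
   callback would report to the unwinder. Not an elfutils type. */
struct self_regs {
    uintptr_t pc;  /* program counter to start unwinding from */
    uintptr_t sp;  /* stack pointer at that point */
    uintptr_t fp;  /* frame pointer at that point */
};

/* Snapshot registers of the calling thread. noinline keeps this a real
   frame, so unwinding starts inside capture_self_regs itself. */
__attribute__((noinline))
void capture_self_regs(struct self_regs *out)
{
    assert(out != NULL);

    /* Using this builtin makes the compiler set up a frame pointer
       for this function, even under -fomit-frame-pointer. */
    out->fp = (uintptr_t) __builtin_frame_address(0);

#if defined(__x86_64__)
    /* The arch-specific trick: read %rip and %rsp directly. */
    __asm__ volatile ("leaq 0(%%rip), %0" : "=r" (out->pc));
    __asm__ volatile ("movq %%rsp, %0" : "=r" (out->sp));
#else
    /* Assumption: crude fallback. pc is the caller's return address
       and sp is approximated by fp, so the set is not consistent. */
    out->pc = (uintptr_t) __builtin_return_address(0);
    out->sp = out->fp;
#endif
}

/* In-process stand-in for a memory_read callback: the inspected
   address space is our own, so a read is a plain copy. There is no
   protection against unmapped addresses here. */
bool self_memory_read(uintptr_t addr, uint64_t *result)
{
    if (addr == 0 || result == NULL)
        return false;
    memcpy(result, (const void *) addr, sizeof *result);
    return true;
}
```

In a real backend the captured values would be handed over inside the
set_initial_registers callback (I believe via
dwfl_thread_state_registers and dwfl_thread_state_register_pc), along
with the callee-saved registers the CFI needs, which this sketch
leaves out.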

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Using dwfl to enumerate frames of current thread
@ 2015-08-19 12:11 Ben Gamari
  0 siblings, 0 replies; 25+ messages in thread
From: Ben Gamari @ 2015-08-19 12:11 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]

I am adding full DWARF unwinding-based backtrace support to the Glasgow
Haskell Compiler's (GHC) runtime system. GHC now emits DWARF information
in the object files it produces [1] and we would like to be able to
record full backtraces on both runtime system events and the user's
request.

The goal here is to expose a function to the runtime system and user
programs allowing them to get a snapshot of the calling thread's current
call-stack. It seems that elfutils' current dwfl interfaces only allow
for extraction of frames from core dumps (with dwfl_core_file_attach)
and ptrace'd processes (with dwfl_linux_proc_attach).

In principle, however, I can't see any reason why frames couldn't be
extracted from the current thread. Besides taking care to preserve the
registers at the outset, it doesn't even seem that this should be
terribly difficult. Am I missing something? Does the interface for this
exist and I have simply overlooked it? Does it not yet exist but is
merely waiting for someone to implement it?

Thanks,

- Ben


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-08-25 12:23 UTC | newest]

Thread overview: 25+ messages
2015-08-20 17:06 Using dwfl to enumerate frames of current thread Josh Stone
  -- strict thread matches above, loose matches on Subject: below --
2015-08-25 12:23 Mark Wielaard
2015-08-25 10:59 Ben Gamari
2015-08-23 21:57 Mark Wielaard
2015-08-22 10:18 Ben Gamari
2015-08-21 22:53 Mark Wielaard
2015-08-21 22:41 Mark Wielaard
2015-08-21 16:25 Ben Gamari
2015-08-21 14:41 Ben Gamari
2015-08-21 14:26 Mark Wielaard
2015-08-21  7:35 Ben Gamari
2015-08-20 17:39 Mark Wielaard
2015-08-20 17:32 Mark Wielaard
2015-08-20 15:32 Mark Wielaard
2015-08-20 15:02 Ben Gamari
2015-08-20 14:46 Mark Wielaard
2015-08-20 14:28 Ben Gamari
2015-08-20 14:15 Mark Wielaard
2015-08-20 14:13 Ben Gamari
2015-08-20 14:04 Mark Wielaard
2015-08-20 12:47 Ben Gamari
2015-08-20 11:09 Ben Gamari
2015-08-19 14:25 Ben Gamari
2015-08-19 13:10 Mark Wielaard
2015-08-19 12:11 Ben Gamari
