public inbox for elfutils@sourceware.org
* location list
@ 2020-06-02 14:18 Sasha Da Rocha Pinheiro
  2020-06-02 17:19 ` Mark Wielaard
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Da Rocha Pinheiro @ 2020-06-02 14:18 UTC (permalink / raw)
  To: elfutils-devel, Mark Wielaard

Hi all,
I am trying to parse a location list given as a sec_offset.
How do I get this offset value that points to .debug_loc so I can call dwarf_getlocations()?
Should I pass this offset as the second parameter of this call? 

Sasha


* Re: location list
  2020-06-02 14:18 location list Sasha Da Rocha Pinheiro
@ 2020-06-02 17:19 ` Mark Wielaard
  2020-06-02 18:12   ` Sasha Da Rocha Pinheiro
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2020-06-02 17:19 UTC (permalink / raw)
  To: Sasha Da Rocha Pinheiro, elfutils-devel

Hi,

On Tue, 2020-06-02 at 14:18 +0000, Sasha Da Rocha Pinheiro wrote:
> I am trying to parse a location list given as an sec_offset. 
> How do I get this offset value that points to .debug_loc so I can
> call dwarf_getlocations()?
> Should I pass this offset as the second parameter of this call? 

Normally an offset isn't enough information to resolve a DIE attribute
reference. So if at all possible you should try to use a
Dwarf_Attribute you got from some DIE.

It isn't really supported, but you could try creating a "fake"
attribute that carries all information. e.g.

Dwarf_Attribute loc_attr;
loc_attr.code = DW_AT_location;
loc_attr.form = DW_FORM_data4; /* Assuming 32bit DWARF.  */
loc_attr.valp = &offset; /* Your offset should be a 32bit type.  */
loc_attr.cu = cu;

dwarf_getlocations (&loc_attr, offset, ...);

Note that the CU needs to be version 3 or lower for the above to work.
If the CU is > 3 then the only correct form to use is
DW_FORM_sec_offset, and your valp should point to the offset stored as
a fixed-size value (4 bytes for 32bit DWARF, 8 bytes for 64bit DWARF).

But in general I would not recommend this approach. It isn't really
supported. And some code might do sanity checks on the valp pointer and
decide it looks bogus and just error out. Also you have to have a valid
Dwarf_CU pointer around because you cannot create a valid fake CU
easily.

So try to keep a reference (or copy) around of the Dwarf_Attribute from
which you got this offset and use that for your dwarf_getlocations
call.

Cheers,

Mark


* Re: location list
  2020-06-02 17:19 ` Mark Wielaard
@ 2020-06-02 18:12   ` Sasha Da Rocha Pinheiro
  2020-06-06  0:30     ` Sasha Da Rocha Pinheiro
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Da Rocha Pinheiro @ 2020-06-02 18:12 UTC (permalink / raw)
  To: Mark Wielaard, elfutils-devel

Well, I have been trying to use the Dwarf_Attribute.
In the snippet below, locationAttribute is obtained with dwarf_attr(&e, DW_AT_location, &locationAttribute);
where e is the DW_TAG_variable DIE.

        Dwarf_Op * exprs = NULL;
        size_t exprlen = 0;
        std::vector<LocDesc> locDescs;
        ptrdiff_t offset = 0;
        Dwarf_Addr basep, start, end;
        do {
            offset = dwarf_getlocations(&locationAttribute, offset, &basep,
                    &start, &end, &exprs, &exprlen);
            if(offset==-1) return false;
            if(offset==0) break;
            LocDesc ld;
            ld.ld_lopc = start;
            ld.ld_hipc = end;
            ld.dwarfOp = exprs;
            ld.opLen = exprlen;
            locDescs.push_back(ld);
        }while(offset > 0);

But what happens here is that I always get the very first entry in .debug_loc, whereas for this variable the location list (sec_offset) is clearly at [    4a] of that section.
Maybe I am using the offset or the basep incorrectly?


Sasha





* Re: location list
  2020-06-02 18:12   ` Sasha Da Rocha Pinheiro
@ 2020-06-06  0:30     ` Sasha Da Rocha Pinheiro
  2020-06-06 14:05       ` Mark Wielaard
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Da Rocha Pinheiro @ 2020-06-06  0:30 UTC (permalink / raw)
  To: Mark Wielaard, elfutils-devel

As you can see, the following variables have distinct locations:
 [    81]      variable             abbrev: 5
               name                 (string) "a"
               decl_file            (data1) sasha.c (1)
               decl_line            (data1) 12
               type                 (ref4) [    cd]
               location             (sec_offset) location list [     0]
 [    9f]        variable             abbrev: 5
                 name                 (string) "g"
                 decl_file            (data1) sasha.c (1)
                 decl_line            (data1) 15
                 type                 (ref4) [    cd]
                 location             (sec_offset) location list [    4a]
[    bd]          variable             abbrev: 5
                   name                 (string) "z"
                   decl_file            (data1) sasha.c (1)
                   decl_line            (data1) 16
                   type                 (ref4) [    cd]
                   location             (sec_offset) location list [    6e]

But when I use the code I sent before to list the three variables, I always get:

[main01.cpp:73] - Variable and location found (a), size(1).
[main01.cpp:78] - interval: (0x0,0x5) 
[main01.cpp:78] - interval: (0x5,0xa) 
[main01.cpp:78] - interval: (0x16,0x24) 
[main01.cpp:73] - Variable and location found (g), size(1).
[main01.cpp:78] - interval: (0x0,0x5) 
[main01.cpp:78] - interval: (0x5,0xa) 
[main01.cpp:78] - interval: (0x16,0x24) 
[main01.cpp:73] - Variable and location found (z), size(1).
[main01.cpp:78] - interval: (0x0,0x5) 
[main01.cpp:78] - interval: (0x5,0xa) 
[main01.cpp:78] - interval: (0x16,0x24) 


No matter the locationAttribute, the code always gets the first location descriptors in .debug_loc:
 
DWARF section [ 7] '.debug_loc' at offset 0x1c6:

 CU [     b] base: .text+000000000000000000 <main>
 [     0] range 0, 5
          .text+000000000000000000 <main>..
          .text+0x0000000000000004 <main+0x4>
           [ 0] lit0
           [ 1] stack_value
          range 5, a
          .text+0x0000000000000005 <main+0x5>..
          .text+0x0000000000000009 <main+0x9>
           [ 0] reg1
          range 16, 24
          .text+0x0000000000000016 <main+0x16>..
          .text+0x0000000000000023 <main+0x23>
           [ 0] reg1
 [    4a] range 0, 5
          .text+000000000000000000 <main>..
          .text+0x0000000000000004 <main+0x4>
           [ 0] lit0
           [ 1] stack_value
 [    6e] range 5, a
          .text+0x0000000000000005 <main+0x5>..
          .text+0x0000000000000009 <main+0x9>
           [ 0] lit0
           [ 1] stack_value
          range a, e
          .text+0x000000000000000a <main+0xa>..
          .text+0x000000000000000d <main+0xd>
           [ 0] const4u 65537
           [ 5] breg0 0
           [ 7] minus
           [ 8] stack_value



Sasha


* Re: location list
  2020-06-06  0:30     ` Sasha Da Rocha Pinheiro
@ 2020-06-06 14:05       ` Mark Wielaard
  2020-06-09 16:38         ` Sasha Da Rocha Pinheiro
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2020-06-06 14:05 UTC (permalink / raw)
  To: Sasha Da Rocha Pinheiro, elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 3861 bytes --]

Hi Sasha,

On Sat, 2020-06-06 at 00:30 +0000, Sasha Da Rocha Pinheiro wrote:
> As you can see the following variables have distinct locations:
>  [    81]      variable             abbrev: 5
>                name                 (string) "a"
>                decl_file            (data1) sasha.c (1)
>                decl_line            (data1) 12
>                type                 (ref4) [    cd]
>                location             (sec_offset) location list
> [     0]
>  [    9f]        variable             abbrev: 5
>                  name                 (string) "g"
>                  decl_file            (data1) sasha.c (1)
>                  decl_line            (data1) 15
>                  type                 (ref4) [    cd]
>                  location             (sec_offset) location list
> [    4a]
> [    bd]          variable             abbrev: 5
>                    name                 (string) "z"
>                    decl_file            (data1) sasha.c (1)
>                    decl_line            (data1) 16
>                    type                 (ref4) [    cd]
>                    location             (sec_offset) location list
> [    6e]
> 
> But when I use the code I sent before to list the three variables, I
> always get:
> 
> [main01.cpp:73] - Variable and location found (a), size(1).
> [main01.cpp:78] - interval: (0x0,0x5) 
> [main01.cpp:78] - interval: (0x5,0xa) 
> [main01.cpp:78] - interval: (0x16,0x24) 
> [main01.cpp:73] - Variable and location found (g), size(1).
> [main01.cpp:78] - interval: (0x0,0x5) 
> [main01.cpp:78] - interval: (0x5,0xa) 
> [main01.cpp:78] - interval: (0x16,0x24) 
> [main01.cpp:73] - Variable and location found (z), size(1).
> [main01.cpp:78] - interval: (0x0,0x5) 
> [main01.cpp:78] - interval: (0x5,0xa) 
> [main01.cpp:78] - interval: (0x16,0x24) 
> 
> 
> No matter the locationAttribute the code always get the first
> location descriptors in .debug_loc: 
>  
> DWARF section [ 7] '.debug_loc' at offset 0x1c6:
> 
>  CU [     b] base: .text+000000000000000000 <main>
>  [     0] range 0, 5
>           .text+000000000000000000 <main>..
>           .text+0x0000000000000004 <main+0x4>
>            [ 0] lit0
>            [ 1] stack_value
>           range 5, a
>           .text+0x0000000000000005 <main+0x5>..
>           .text+0x0000000000000009 <main+0x9>
>            [ 0] reg1
>           range 16, 24
>           .text+0x0000000000000016 <main+0x16>..
>           .text+0x0000000000000023 <main+0x23>
>            [ 0] reg1
>  [    4a] range 0, 5
>           .text+000000000000000000 <main>..
>           .text+0x0000000000000004 <main+0x4>
>            [ 0] lit0
>            [ 1] stack_value
>  [    6e] range 5, a
>           .text+0x0000000000000005 <main+0x5>..
>           .text+0x0000000000000009 <main+0x9>
>            [ 0] lit0
>            [ 1] stack_value
>           range a, e
>           .text+0x000000000000000a <main+0xa>..
>           .text+0x000000000000000d <main+0xd>
>            [ 0] const4u 65537
>            [ 5] breg0 0
>            [ 7] minus
>            [ 8] stack_value

I think I see what is happening. The fact that <main> is at
.text+000000000000000000 suggests that this is actually an ET_REL file
(an object file that has not been linked). The libdw dwarf_xxx calls
don't apply relocations, but eu-readelf does. So while eu-readelf shows
some offsets as their relocated values, your program, which only uses
dwarf_xxx calls, does not see them. Specifically, the DW_AT_location
list attributes will all point to zero, which explains why every
location list seems to be the same.
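
As a rough illustration of why the unrelocated attribute values can all read as zero: with RELA-style relocations (as used on x86-64) the field in .debug_info is typically left as zero in the object file, and the real section offset is carried in the relocation's addend. The sketch below is not libdw or libdwfl code. The struct and function names are made up for illustration, and it only models a 32-bit "symbol value + addend" relocation on little-endian data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record: patch the 4 bytes at
   'offset' in the target section with sym_value + addend.  */
struct simple_rela
{
  size_t offset;       /* Where in the section to patch.  */
  uint32_t sym_value;  /* Value of the referenced symbol (0 here,
                          standing in for the start of .debug_loc).  */
  int32_t addend;      /* Constant stored in the relocation itself.  */
};

/* Read a 4-byte little-endian value.  */
uint32_t
read_u32_le (const unsigned char *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Apply 'n' relocations to 'sec' (of 'len' bytes).  Returns the number
   applied; records that do not fit in the section are skipped.  */
size_t
apply_relas (unsigned char *sec, size_t len,
             const struct simple_rela *relas, size_t n)
{
  assert (sec != NULL && relas != NULL);
  size_t applied = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (relas[i].offset > len || len - relas[i].offset < 4)
        continue;
      uint32_t v = relas[i].sym_value + (uint32_t) relas[i].addend;
      unsigned char *p = sec + relas[i].offset;
      p[0] = v & 0xff;
      p[1] = (v >> 8) & 0xff;
      p[2] = (v >> 16) & 0xff;
      p[3] = (v >> 24) & 0xff;
      applied++;
    }
  return applied;
}
```

Before apply_relas runs, every patched field reads back as 0; afterwards the fields hold distinct offsets such as 0x4a and 0x6e, which mirrors the difference between the plain dwarf_* view and the eu-readelf output.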

We don't have a public function to just apply all relocations to an
object file, but opening the file through dwfl_begin () will do it.

Something like the attached.

Hope that helps,

Mark

[-- Attachment #2: dwfl_dwarf.c --]
[-- Type: text/x-csrc, Size: 2172 bytes --]

/* Print all locations in the whole DIE tree of a single file using
   dwfl to handle ET_REL files (which need the .debug sections to be
   relocated) and to automatically get separate debuginfo.

   gcc -Wall -Wextra -g -O2 -o dwfl_dwarf dwfl_dwarf.c -ldw
*/

/* We want the sane basename function. */
#define _GNU_SOURCE
#include <string.h>
#include <stdbool.h>
#include <stdio.h>
#include <inttypes.h>

#include <dwarf.h>
#include <elfutils/libdw.h>
#include <elfutils/libdwfl.h>

void
handle_die (Dwarf_Die *die)
{
  do
    {
      Dwarf_Attribute attr;
      if ((dwarf_attr (die, DW_AT_location, &attr) != NULL))
	{
	  printf ("[%" PRIx64 "]", dwarf_dieoffset (die));
	  ptrdiff_t off = 0;
	  Dwarf_Addr base, start, end;
	  do
	    {
	      Dwarf_Op *expr;
	      size_t exprlen;
	      off = dwarf_getlocations(&attr, off, &base, &start, &end,
				       &expr, &exprlen);
	      if (off > 0)
		printf ("(%" PRIx64 ",%" PRIx64 ")[%zu] ",
			start, end, exprlen);
	    }
	  while (off > 0);
	  printf ("\n");
	}

      Dwarf_Die child;
      if (dwarf_child (die, &child) == 0)
	handle_die (&child);
    }
  while (dwarf_siblingof (die, die) == 0);
}

static const Dwfl_Callbacks dwfl_callbacks =
  {
    .find_debuginfo = dwfl_standard_find_debuginfo,
    .section_address = dwfl_offline_section_address,
    .find_elf = dwfl_build_id_find_elf,
  };

int main (int argc, char **argv)
{
  if (argc == 2)
    {
      const char *file = argv[1];
      const char *base = basename (file);
      /* Create a one elf module file Dwfl. */
      Dwfl *dwfl = dwfl_begin (&dwfl_callbacks);
      dwfl_report_begin (dwfl);
      Dwfl_Module *mod = dwfl_report_elf (dwfl, base, file, -1, 0, true);
      dwfl_report_end (dwfl, NULL, NULL);

      Dwarf_Addr bias;
      Dwarf *dbg = dwfl_module_getdwarf(mod, &bias);
      if (dbg != NULL)
	{
	   /* Should be zero with one module. */
	  printf ("bias: %" PRIx64 "\n", bias);
	  Dwarf_CU *cu = NULL;
	  Dwarf_Half version;
	  Dwarf_Die cudie, subdie;
	  uint8_t unit_type;
	  while (dwarf_get_units (dbg, cu, &cu, &version, &unit_type,
				  &cudie, &subdie) == 0)
	    handle_die (&cudie);
	}
      dwfl_end (dwfl);
    }
}


* Re: location list
  2020-06-06 14:05       ` Mark Wielaard
@ 2020-06-09 16:38         ` Sasha Da Rocha Pinheiro
  2020-06-10 11:33           ` Mark Wielaard
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Da Rocha Pinheiro @ 2020-06-09 16:38 UTC (permalink / raw)
  To: Mark Wielaard, elfutils-devel

Hi Mark,

first of all, thanks for giving me a direction here.

I am now trying to design the changes that need to be made in Dyninst.
So far we have only used the dwarf_* functions in libdw.
What I understood is that libdw is divided into subsets of functions: dwarf_*, dwfl_* and dwelf_*.
I didn't find any documentation about this, or about the purpose of these subsets of functions. (What does the "fl" in dwfl stand for?)
But my understanding is that I can't use data structures from one subset with another.
That alone will require some design work to change the way we parse DWARF info in Dyninst.
Currently the lifetime of a dwarf handle lasts through one execution, because we parse dwarf data when the user needs it.

Can you point me to more documentation, or schedule a call, so I can get a clearer view of this?

Regards,
Sasha




* Re: location list
  2020-06-09 16:38         ` Sasha Da Rocha Pinheiro
@ 2020-06-10 11:33           ` Mark Wielaard
  2020-06-23 16:34             ` Sasha Da Rocha Pinheiro
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2020-06-10 11:33 UTC (permalink / raw)
  To: Sasha Da Rocha Pinheiro, elfutils-devel

Hi Sasha,

On Tue, 2020-06-09 at 16:38 +0000, Sasha Da Rocha Pinheiro via
Elfutils-devel wrote:
> I am now trying to design the changes needed to be done in Dyninst.
> So far we have only used the functions dwarf_* under libdw.
> What I understood is that libdw is kinda divided in subsets of functions,
>  dwarf_*, dwfl_* and dwelf_*.
> I didn't find any documentation about it, or the purpose of these subset of functions.
> (Whats fl in dwfl for?)
> But my understanding is that I can't use data structures from one on the other one.
> That alone will need some design to modify the way we parse dwarf info into Dyninst.
> Currently the lifetime of a dwarf handle lasts through one execution,
> because we parse dwarf data when the user needs it.

So elfutils contains 4 libraries. libelf, which is a semi-standardized
"unix" library to read and manipulate ELF files. libdw, which adds
reading of DWARF data, linux process and kernel mappings, and various
elf/dwarf utility functions. libasm, which provides an assembler and
disassembler interface, but which isn't really finished/recommended at
the moment (it only provides a partial x86 assembler/disassembler and a
bpf disassembler). And libdebuginfod, which provides a way to fetch
remotely stored executables, debuginfo and sources based on build-ids
(from a debuginfod server).

There used to be non-public, internal, "libebl" backend libraries, for
each elfutils supported architecture (libebl_aarch64.so,
libebl_riscv.so, etc.) which were loaded dynamically to save a bit of
memory in case the backend/arch wasn't used. But with 0.178 the
libraries are built into libdw.so directly and no longer dynamically
loaded. libebl was never intended to be used directly.

[lib]ebl stands for ELF Backend Library. [lib]dw is short for DWARF.
[lib]dwfl then can be read as DWARF Frontend library functions. And
[lib]dwelf are the DWARF and ELF utility functions.

The main data structure of libelf is the Elf handle, which can be used
to go through an ELF file by its section headers (Shdrs) or program
headers (Phdrs). The main data structure that the libdw dwarf_*
functions work on is the Dwarf handle, which is associated with one Elf
handle. The main data structure of the libdwfl dwfl_* functions is the
Dwfl handle. A Dwfl represents the memory layout of a program (or
kernel) with its libraries (or kernel modules). Each Dwfl_Module
represents a piece of executable code mapped at a certain memory range.
The Dwfl uses build-ids to associate/create Elf images and Dwarf
handles associated with each Dwfl_Module (it can optionally use
libdebuginfod to download/cache any it doesn't have yet). Since kernel
modules are ET_REL files (non-relocated object files), libdwfl also
resolves any relocations between .debug sections (this is the property
we abused in the example code I gave you, where we construct a Dwfl
from a single ET_REL object file). Given a Dwfl_Module you can get the
associated Elf or Dwarf with dwfl_module_getelf or
dwfl_module_getdwarf. You will note that those functions also provide a
Dwarf_Addr bias, which might be non-zero if the address range where the
Dwfl_Module is mapped is different (at an offset) from the addresses
found in the Elf image or Dwarf data.
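
The role of the bias can be sketched with a toy model. The types and functions below are invented for illustration and are not the libdwfl data structures: each module records the range it is mapped at and the bias between that mapping and the addresses stored in its ELF/DWARF data, so a runtime address is first matched to a module and then has the bias subtracted before it is compared with addresses from the DWARF data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a mapped module (not the real Dwfl_Module).  */
struct toy_module
{
  const char *name;
  uint64_t low;    /* First mapped address.  */
  uint64_t high;   /* One past the last mapped address.  */
  uint64_t bias;   /* Mapped address minus address in the file's DWARF.  */
};

/* Return the module whose mapped range contains 'addr', or NULL.  */
const struct toy_module *
toy_addrmodule (const struct toy_module *mods, size_t n, uint64_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= mods[i].low && addr < mods[i].high)
      return &mods[i];
  return NULL;
}

/* Convert a runtime address to the address used in the DWARF data.  */
uint64_t
toy_dwarf_addr (const struct toy_module *mod, uint64_t runtime_addr)
{
  assert (mod != NULL);
  return runtime_addr - mod->bias;
}
```

With a single ET_REL or non-relocated file reported at address 0, as in the earlier example program, the bias is expected to be zero and the two address spaces coincide. In libdwfl itself the module lookup is done by functions such as dwfl_addrmodule.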

You would use the libdwfl functions if you want to represent a whole
program as it would be mapped into memory (or the kernel and its
modules). It is convenient if you got a process map
(dwfl_linux_proc_report) or core file (dwfl_core_file_report). The
libdwfl functions would automatically associate an Elf image and find
the Dwarf data for you.

It is even nice to use for "single file" programs, as we did in the
example, because it does the automatic lookup of the Dwarf handle, and
because, if the file is an ET_REL object, you get the relocation
between .debug sections for free.

It might make sense to provide utility functions (in libdwelf) to do
both functions separately from setting up a Dwfl. You can already do
most of the lookups by hand using dwelf_elf_gnu_debuglink,
dwelf_dwarf_gnu_debugaltlink, dwelf_elf_gnu_build_id, plus the
libdebuginfod calls. But one generic helper function might be
convenient. The only other way to do the relocation resolving at the
moment is through eu-strip --reloc-debug-sections-only (but this is a
permanent, non-reversible operation on the file, which will make it
unsuitable for linking with other object files).

Hope that gives a bit of an overview.

Cheers,

Mark


* Re: location list
  2020-06-10 11:33           ` Mark Wielaard
@ 2020-06-23 16:34             ` Sasha Da Rocha Pinheiro
  2020-06-25 14:13               ` Mark Wielaard
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Da Rocha Pinheiro @ 2020-06-23 16:34 UTC (permalink / raw)
  To: Mark Wielaard, elfutils-devel

Hi Mark,

this was very useful. Thanks.

Since we are now using not only executables and .so files but ".o" files too, I'm trying to decide whether I can use the same functions for all of them, like the code you pointed out for dealing with ".o" files. Would that work for EXEC, SHARED, and RELOC?

The idea is to avoid having two code paths for parsing modules and DIEs: as you pointed out, ".o" files need some relocation to be performed and therefore need dwfl_*, while for executables and .so files we only use dwarf_* functions.
In light of that, do you foresee bigger changes or issues we should worry about if we use only dwfl_* to open all the ELF files with dwarf data, and drop the way we used to open them? Our code base has used only the dwarf_* functions for a long time, so this would be a big change.
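
For reference, the three cases named here correspond to the e_type field of the ELF header: ET_REL (1) for ".o" files, ET_EXEC (2) for executables and ET_DYN (3) for shared objects. Below is a minimal sketch of reading that field from the first bytes of a file without libelf. The function names and the TOY_ constants are made up for illustration; with libelf one would normally inspect the e_type member of the header returned by gelf_getehdr instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_ET_REL = 1, TOY_ET_EXEC = 2, TOY_ET_DYN = 3 };

/* Return e_type from an ELF header image, or -1 if 'hdr' is too short,
   lacks the ELF magic, or has an unknown data encoding.  */
int
elf_file_type (const unsigned char *hdr, size_t len)
{
  assert (hdr != NULL);
  if (len < 18)
    return -1;
  if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
    return -1;
  /* e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
     e_type is the 2-byte field right after the 16-byte e_ident.  */
  if (hdr[5] == 1)
    return hdr[16] | (hdr[17] << 8);
  if (hdr[5] == 2)
    return (hdr[16] << 8) | hdr[17];
  return -1;
}

/* Following the explanation earlier in the thread, only unlinked
   objects (ET_REL) need their .debug sections relocated.  */
int
needs_debug_relocation (const unsigned char *hdr, size_t len)
{
  return elf_file_type (hdr, len) == TOY_ET_REL;
}
```

A check like this only tells the cases apart; it does not by itself answer whether one dwfl_*-based code path is the right choice for all three.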

Sasha


From: Mark Wielaard <mark@klomp.org>
Sent: Wednesday, June 10, 2020 6:33 AM
To: Sasha Da Rocha Pinheiro <darochapinhe@wisc.edu>; elfutils-devel@sourceware.org <elfutils-devel@sourceware.org>
Subject: Re: location list 
 
Hi Sasha,

On Tue, 2020-06-09 at 16:38 +0000, Sasha Da Rocha Pinheiro via
Elfutils-devel wrote:
> I am now trying to design the changes needed to be done in Dyninst.
> So far we have only used the functions dwarf_* under libdw.
> What I understood is that libdw is kinda divided in subsets of functions,
>  dwarf_*, dwfl_* and dwelf_*.
> I didn't find any documentation about it, or the purpose of these subset of functions.
> (Whats fl in dwfl for?)
> But my understanding is that I can't use data structures from one on the other one.
> That alone will need some design to modify the way we parse dwarf info into Dyninst.
> Currently the lifetime of a dwarf handle lasts through one execution,
> because we parse dwarf data when the user needs it.

So elfutils contains 4 libraries. libelf, which is a semi-standardize
"unix" library to read and manipulate ELF files. libdw, which adds
reading of DWARF data, linux process and kernel mappings, and various
elf/dwarf utility functions. libasm, which provides a assembler and
disassembler interface, but which isn't really finished/recommended at
the moment (it only provides a partial x86 assembler/disassembler and a
bpf disassembler). And libdebuginfod, which provides a way to fetch
remotely stored executables, debuginfo and sources based on build-ids
(from a debuginfod server).

There used to be non-public, internal, "libebl" backend libraries, for
each elfutils supported architecture (libebl_aarch64.so,
libebl_riscv.so, etc.) which were loaded dynamically to safe a bit of
memory in case the backend/arch wasn't used. But with 0.178 the
libraries are build into libdw.so directly and no longer dynamically
loaded. libebl was never intended to be used directly.

[lib]ebl stands for ELF Backend Library. [lib]dw is short for DWARF.
[lib]dwfl then can be read as DWARF Frontend library functions. And
[lib]dwelf are the DWARF and ELF utility functions.

The main data structure of libelf is the Elf handle which can be used
to go through an ELF through sections (Shdrs) or program (Phdrs)
headers. The main data structure that the libdw dwarf_* functions work
on is the Dwarf handle, which is associated with one Elf handle. The
main data structure of the libdwfl dwfl_* functions is the Dwfl handle.
A Dwfl represents a program (or kernel) with library (or kernel
modules) memory lay out. Each Dwfl_Module represents a piece of
executable code mapped at a certain memory range. The Dwfl uses
buildids to associate/create Elf images and Dwarf handles associated
with each Dwfl_Module (it can optionally use libdebuginfod to
download/cache any it doesn't have yet). Since kernel modules are
ET_REL files (non-relocated object files), libdwfl also resolves any
relocations between .debug sections (this is the property we abused in
the example code I gave you, where we construct a Dwfl from a single
ET_REL object file). Given a Dwfl_Module you can get the associated Elf
or Dwarf with dwfl_module_getelf or dwfl_module_getdwarf. You will note
that those functions also provide a Dwarf_Addr bias which might be non-
zero if the address range where the Dwfl_Module is mapped is different
(at an offset) from the addresses found in the Elf image or Dwarf data.
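
To make the bias concrete, here is a minimal sketch of the arithmetic,
assuming the bias is simply added to an address read from the Elf or
Dwarf data to get the address inside the Dwfl (and subtracted to go
back). The addr_t typedef and both helper names are made up for this
illustration so that it compiles without the elfutils headers; in real
code you would use Dwarf_Addr.

```c
#include <stdint.h>

/* Hypothetical stand-in for libdw's Dwarf_Addr (an unsigned 64-bit
   address), so the sketch does not need the elfutils headers.  */
typedef uint64_t addr_t;

/* Address as recorded in the Elf/Dwarf data -> address inside the Dwfl.  */
static inline addr_t
file_to_dwfl_addr (addr_t file_addr, addr_t bias)
{
  return file_addr + bias;
}

/* Address inside the Dwfl -> address to use for lookups in the
   Elf/Dwarf data itself.  */
static inline addr_t
dwfl_to_file_addr (addr_t dwfl_addr, addr_t bias)
{
  return dwfl_addr - bias;
}
```

With a zero bias both helpers return their input unchanged, which is
the case where the module is laid out at the addresses found in the
file.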

You would use the libdwfl functions if you want to represent a whole
program as it would be mapped into memory (or the kernel and its
modules). It is convenient if you have a process map
(dwfl_linux_proc_report) or core file (dwfl_core_file_report). The
libdwfl functions would automatically associate an Elf image and find
the Dwarf data for you.

It is even nice to use for "single file" programs, like we did in the
example, because it does the automatic lookup of the Dwarf handle, and
because, if the file is an ET_REL object, you get the relocation
between .debug sections for free.

It might make sense to provide utility functions (in libdwelf) to do
both functions separately from setting up a Dwfl. You can already do
most of the lookups by hand using dwelf_elf_gnu_debuglink,
dwelf_dwarf_gnu_debugaltlink, dwelf_elf_gnu_build_id, plus the
libdebuginfod calls. But one generic helper function might be
convenient. The only other way to do the relocation resolving at the
moment is through eu-strip --reloc-debug-sections-only (but this is a
permanent, non-reversible operation on the file, which will make it
unsuitable for linking with other object files).

Hope that gives a bit of an overview.

Cheers,

Mark


* Re: location list
  2020-06-23 16:34             ` Sasha Da Rocha Pinheiro
@ 2020-06-25 14:13               ` Mark Wielaard
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Wielaard @ 2020-06-25 14:13 UTC (permalink / raw)
  To: Sasha Da Rocha Pinheiro, elfutils-devel

Hi Sasha,

On Tue, 2020-06-23 at 16:34 +0000, Sasha Da Rocha Pinheiro wrote:
> Since we are now using not only executables and .so, but ".o" files
> too, I'm trying to decide if I can use the same functions to all of
> them, like the code you pointed out to deal with ".o". Would that
> work for EXEC, SHARED, and RELOC?

Yes, it would work. The relocation resolving logic only triggers for .o
(ET_REL) files. But everything else works as expected also for ET_EXEC
and ET_DYN files.
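
If you want to know up front which case a file falls into, the ELF
header's e_type field tells you. Below is a small self-contained
sketch that reads it from the first bytes of a file without using
libelf. The names elf_file_type, needs_debug_relocation and the
TYPE_* constants are made up for this illustration; the numeric
values are those the ELF specification gives for ET_REL, ET_EXEC and
ET_DYN.

```c
#include <stddef.h>
#include <string.h>

/* e_type values from the ELF specification (ET_REL, ET_EXEC, ET_DYN).  */
enum { TYPE_REL = 1, TYPE_EXEC = 2, TYPE_DYN = 3 };

/* Returns the e_type of the ELF header in HDR, or -1 if HDR is too
   short or does not look like an ELF header.  e_type is the 16-bit
   field right after the 16-byte e_ident, for both ELF32 and ELF64.  */
static int
elf_file_type (const unsigned char *hdr, size_t len)
{
  if (len < 18 || memcmp (hdr, "\177ELF", 4) != 0)
    return -1;

  /* hdr[5] is EI_DATA: 1 means little endian, 2 means big endian.  */
  if (hdr[5] == 1)
    return hdr[16] | (hdr[17] << 8);
  if (hdr[5] == 2)
    return (hdr[16] << 8) | hdr[17];
  return -1;
}

/* Nonzero only for ET_REL (.o) files, the one case where relocations
   between .debug sections still have to be resolved.  */
static int
needs_debug_relocation (const unsigned char *hdr, size_t len)
{
  return elf_file_type (hdr, len) == TYPE_REL;
}
```

With libelf already in use you would not parse the bytes by hand; the
same value is available as the e_type member of the header returned by
gelf_getehdr.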

> The idea is not to have two codes to parse modules and DIEs, two ways
> because as you pointed out ".o" files need some relocation to be
> performed, therefore using dwfl_*. Meanwhile for executables and .so
> we only use dwarf_* functions.
> In face of that, do you foresee bigger changes or things we should
> worry that we would have in case we use only dwfl_* to open all the
> ELF files with dwarf data, and drop the way we used to open them?
> Because our code base for a long time has only used the dwarf_*
> functions, this would be a big change.

The real "value" from the Dwfl interface comes from it trying to lay out
objects as if dynamically loaded. So you can mimic a process even if it
isn't loaded (or a kernel plus modules). This is why some functions
return a "bias" indicating the difference between the Dwfl_Module
"assigned" addresses and any addresses you might read directly from the
Elf or Dwarf. But you can of course ignore that functionality and just
treat each object file independently.

Besides resolving those relocations for ET_REL files, Dwfl also
provides various (default/standard) callbacks to find/associate
separate debuginfo to an Elf file. See Dwfl_Callbacks and the "Standard
callbacks" in the libdwfl.h file. If you do use it, it might
override/change some search paths for where to get the Dwarf data/file
from. Again, you don't have to use this functionality if you don't like
it. (Dwfl also works when you provide it the Dwarf data files directly.)

Just look at what you need/want.

Cheers,

Mark


end of thread, other threads:[~2020-06-25 14:13 UTC | newest]

Thread overview: 9+ messages
2020-06-02 14:18 location list Sasha Da Rocha Pinheiro
2020-06-02 17:19 ` Mark Wielaard
2020-06-02 18:12   ` Sasha Da Rocha Pinheiro
2020-06-06  0:30     ` Sasha Da Rocha Pinheiro
2020-06-06 14:05       ` Mark Wielaard
2020-06-09 16:38         ` Sasha Da Rocha Pinheiro
2020-06-10 11:33           ` Mark Wielaard
2020-06-23 16:34             ` Sasha Da Rocha Pinheiro
2020-06-25 14:13               ` Mark Wielaard
