public inbox for elfutils@sourceware.org
* Handling pgoff in perf elf mmap/mmap2 elf info
@ 2018-09-19 12:12 Christoph Sterz
  2018-09-19 12:24 ` Ulf Hermann
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Sterz @ 2018-09-19 12:12 UTC (permalink / raw)
  To: elfutils-devel

Hi,

I work on Hotspot [1], an open-source Linux perf aggregator and visualizer.
For this we use perfparser [2], which in turn uses libdw for unwinding.

Recently, we have found more and more perf trace files that use the
'pgoff' field [3].
This happens especially on newer distros (Arch, openSUSE Tumbleweed).

We suspect perf to offset its recording-addresses of mmapped
dsos/executables starting with a specific section, such that they denote
their pointers with this pg_offset parameter. (e.g. skipping a library's
header and setting pgoff to the headersize). Although we are not 100%
sure about this information.

The function I am using here is:
extern Dwfl_Module *dwfl_report_elf (Dwfl *dwfl, const char *name,
                     const char *file_name, int fd,
                     GElf_Addr base, bool add_p_vaddr);

The specific call I am making is:

 Dwfl_Module *ret = dwfl_report_elf(
                m_dwfl, info.originalFileName.constData(),
                info.localFile.absoluteFilePath().toLocal8Bit().constData(), -1,
                info.addr,
                false);

and I am wondering how to include the pgoff here.

Simply subtracting it from info.addr results in a lot of "address range
overlaps an existing module" errors, where I guess I subtracted too
much. I know pgoff is in bytes.
I also tried adding the offset, which gives overlap errors as well.

Ignoring the offset results in errors where perfparser fails to find ELF
for instruction pointer addresses.

I would be happy to hear if anyone has experience unwinding with these
offsets.
Maybe there is a different function I should use for reporting the ELF.
Maybe someone has even unwound/parsed perf data before.

Thanks,

Christoph


[1] https://github.com/KDAB/hotspot
[2] http://code.qt.io/cgit/qt-creator/perfparser.git/
[3] sparse info at
http://man7.org/linux/man-pages/man2/perf_event_open.2.html

-- 
Christoph Sterz | christoph.sterz@kdab.com | Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts




* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-09-19 12:12 Handling pgoff in perf elf mmap/mmap2 elf info Christoph Sterz
@ 2018-09-19 12:24 ` Ulf Hermann
  2018-09-21 13:07   ` Mark Wielaard
  0 siblings, 1 reply; 17+ messages in thread
From: Ulf Hermann @ 2018-09-19 12:24 UTC (permalink / raw)
  To: elfutils-devel

> We suspect perf to offset its recording-addresses of mmapped
> dsos/executables starting with a specific section, such that they denote
> their pointers with this pg_offset parameter. (e.g. skipping a library's
> header and setting pgoff to the headersize). Although we are not 100%
> sure about this information.

According to my understanding, the pgoff is not perf's invention.
Rather, the library loader for the target application does not mmap()
the full ELF file, but only the parts it's interested in. Those partial
mappings are then reported through perf. We then try to recreate the
memory mapping with perfparser, but run into problems because
dwfl_report_elf() doesn't let us do partial mappings. You can only map
complete files with that function. There is probably some way to
manually map the relevant sections using other functions in libdw and
libelf, but I haven't figured out how to do this yet. If there is a
simple trick I'm missing, I'd be happy to hear about it.

And, yes, a function that works like dwfl_report_elf, but takes a pgoff
and length as additional parameters, is sorely missing from the API.

best regards,
Ulf


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-09-19 12:24 ` Ulf Hermann
@ 2018-09-21 13:07   ` Mark Wielaard
  2018-09-21 13:35     ` Ulf Hermann
  2018-09-26 14:38     ` Milian Wolff
  0 siblings, 2 replies; 17+ messages in thread
From: Mark Wielaard @ 2018-09-21 13:07 UTC (permalink / raw)
  To: Ulf Hermann, elfutils-devel; +Cc: Christoph Sterz

On Wed, 2018-09-19 at 14:24 +0200, Ulf Hermann wrote:
> > We suspect perf to offset its recording-addresses of mmapped
> > dsos/executables starting with a specific section, such that they
> > denote
> > their pointers with this pg_offset parameter. (e.g. skipping a
> > library's
> > header and setting pgoff to the headersize). Although we are not
> > 100%
> > sure about this information.
> 
> According to my understanding, the pgoff is not perf's invention.
> Rather, the library loader for the target application does not mmap()
> the full ELF file, but only the parts it's interested in. Those
> partial mappings are then reported through perf.

OK, so pgoff is like the offset argument of the mmap call?
Is it just recording all user space mmap events that have PROT_EXEC in
their prot argument? What about if the mapping was later changed with
mprotect? Or does PERF_RECORD_MMAP only map to some internal kernel
mmap action?

>  We then try to recreate the memory mapping with perfparser, but run
> into problems because dwfl_report_elf() doesn't let us do partial
> mappings. You can only map complete files with that function. There
> probably is some way to manually map the relevant sections using
> other functions in libdw and libelf, but I haven't figured out how to
> do this, yet. If there is a simple trick I'm missing, I'd be happy to
> hear about it. 
> 
> And, yes, a function that works like dwfl_report_elf, but takes a
> pgoff and length as additional parameters is sorely missing from the
> API.

dwfl_report_elf indeed does assume the whole ELF file is mapped in
according to the PHDRs in the file and the given base address. But what
you are actually seeing (I think, depending on the answers on the
questions above) is the dynamic loader mapping in the file in pieces.
And so you would like an interface where you can report the module
piece wise while it is being mapped in. So what would be most
convenient would be some kind of dwfl_report_elf_mmap function that you
can use to either get a new Dwfl_Module or to extend an existing one.

I have to think how that interacts with mprotect and munmap.

Cheers,

Mark


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-09-21 13:07   ` Mark Wielaard
@ 2018-09-21 13:35     ` Ulf Hermann
  2018-09-26 14:38     ` Milian Wolff
  1 sibling, 0 replies; 17+ messages in thread
From: Ulf Hermann @ 2018-09-21 13:35 UTC (permalink / raw)
  To: Mark Wielaard, elfutils-devel; +Cc: Christoph Sterz

> OK, so pgoff is like the offset argument of the mmap call?

As far as I understand it, yes.

> Is it just recording all user space mmap events that have PROT_EXEC in
> their prot argument? 

It just records all mmap events, also the ones without PROT_EXEC.

We check in perfparser if the file in question is supposed to be an elf 
file and consider everything that looks like one for later reporting to 
dwfl. We then report modules on demand, though. So, if no sample ever 
touches a module, the module doesn't get reported. Mmaps without 
PROT_EXEC shouldn't show up in any samples, except if the trace data is 
bad in some other way. But then, if we reported to dwfl a module that 
isn't actually linked in the target application but still mapped to a 
place in memory, that wouldn't disturb the unwinding for other modules. 
Would it?

We can probably determine the initial PROT_EXEC state for some mmaps, 
though. There are two possible variants of mmap events in the perf 
trace, one of which has a "prot" field. However, I don't know if we can 
rely on that being present in traces from the systems in question.
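
To make the two variants concrete, here is a minimal sketch of how a
parser might carry the optional "prot" information. The struct, enum and
function names are hypothetical and not taken from perfparser. It assumes
that only PERF_RECORD_MMAP2 events carry "prot", and the value only
describes the protection at mmap time.

```cpp
#include <cstdint>

// Hypothetical in-memory form of a parsed perf mmap event.
struct ParsedMmap {
    std::uint64_t addr;   // start address of the mapping
    std::uint64_t len;    // length of the mapping in bytes
    std::uint64_t pgoff;  // file offset of the mapping in bytes
    bool hasProt;         // true only for PERF_RECORD_MMAP2 events
    std::uint32_t prot;   // PROT_* bits, meaningful only if hasProt is set
};

// Value of PROT_EXEC on Linux, repeated here to keep the sketch standalone.
constexpr std::uint32_t kProtExec = 0x4;

enum class Executable { Yes, No, Unknown };

// Reports whether the mapping was executable when it was created.
// Events without a prot field cannot answer the question.
Executable initiallyExecutable(const ParsedMmap &m)
{
    if (!m.hasProt)
        return Executable::Unknown;
    return (m.prot & kProtExec) ? Executable::Yes : Executable::No;
}
```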

> What about if the mapping was later changed with mprotect? 

We don't get separate events for mprotect, so the "prot" field is 
probably worthless.

> Or does PERF_RECORD_MMAP only map to some internal kernel mmap action?

I don't know exactly where the mmap events are generated, but it has to 
be somewhere in the kernel. That is how perf_event_open operates, and, 
by extension, how perf gets its data. We might get some synthesized mmap 
events generated from the current memory map of an application when we 
attach while it's already running. I don't quite know how that works.

> dwfl_report_elf indeed does assume the whole ELF file is mapped in
> according to the PHDRs in the file and the given base address. But what
> you are actually seeing (I think, depending on the answers on the
> questions above) is the dynamic loader mapping in the file in pieces.

Yes, that is my interpretation of the (limited) data I have. Maybe 
Christoph can add something here.

> And so you would like an interface where you can report the module
> piece wise while it is being mapped in. So what would be most
> convenient would be some kind of dwfl_report_elf_mmap function that you
> can use to either get a new Dwfl_Module or to extend an existing one.

That sounds about right.

> I have to think how that interacts with mprotect and munmap.

We don't get any extra events for mprotect and munmap. If we detect an 
address space conflict between different modules, we just throw out the 
dwfl state and restart the reporting. (I know, there is a potential for 
optimization here: We could use the callback to dwfl_report_end and only 
throw out the conflicting modules.)
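
The "only throw out the conflicting modules" idea can be sketched without
libdwfl as plain range bookkeeping. ReportedModule and
reportEvictingConflicts are hypothetical names; real code would
presumably make the same decision inside the callback passed to
dwfl_report_end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bookkeeping entry for one module reported to dwfl.
struct ReportedModule {
    std::string path;
    std::uint64_t start;  // inclusive
    std::uint64_t end;    // exclusive
};

// Two half-open ranges overlap if each begins before the other ends.
bool overlaps(const ReportedModule &a, const ReportedModule &b)
{
    return a.start < b.end && b.start < a.end;
}

// Drops every recorded module that conflicts with the incoming one, then
// records the incoming module. Returns the number of evicted modules.
std::size_t reportEvictingConflicts(std::vector<ReportedModule> &modules,
                                    const ReportedModule &incoming)
{
    assert(incoming.start < incoming.end);
    std::size_t evicted = 0;
    for (auto it = modules.begin(); it != modules.end();) {
        if (overlaps(*it, incoming)) {
            it = modules.erase(it);
            ++evicted;
        } else {
            ++it;
        }
    }
    modules.push_back(incoming);
    return evicted;
}
```

Modules that do not overlap the new range stay registered, so only the
conflicting part of the address space has to be reported again.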

cheers,
Ulf


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-09-21 13:07   ` Mark Wielaard
  2018-09-21 13:35     ` Ulf Hermann
@ 2018-09-26 14:38     ` Milian Wolff
  2018-10-09 20:33       ` Milian Wolff
  1 sibling, 1 reply; 17+ messages in thread
From: Milian Wolff @ 2018-09-26 14:38 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard, Ulf Hermann, Christoph Sterz

On Friday, September 21, 2018 3:07:29 PM CEST Mark Wielaard wrote:
> On Wed, 2018-09-19 at 14:24 +0200, Ulf Hermann wrote:
> > > We suspect perf to offset its recording-addresses of mmapped
> > > dsos/executables starting with a specific section, such that they
> > > denote
> > > their pointers with this pg_offset parameter. (e.g. skipping a
> > > library's
> > > header and setting pgoff to the headersize). Although we are not
> > > 100%
> > > sure about this information.
> > 
> > According to my understanding, the pgoff is not perf's invention.
> > Rather, the library loader for the target application does not mmap()
> > the full ELF file, but only the parts it's interested in. Those
> > partial mappings are then reported through perf.
> 
> OK, so pgoff is like the offset argument of the mmap call?
> Is it just recording all user space mmap events that have PROT_EXEC in
> their prot argument? What about if the mapping was later changed with
> mprotect? Or does PERF_RECORD_MMAP only map to some internal kernel
> mmap action?
> 
> >  We then try to recreate the memory mapping with perfparser, but run
> > 
> > into problems because dwfl_report_elf() doesn't let us do partial
> > mappings. You can only map complete files with that function. There
> > probably is some way to manually map the relevant sections using
> > other functions in libdw and libelf, but I haven't figured out how to
> > do this, yet. If there is a simple trick I'm missing, I'd be happy to
> > hear about it. 
> > 
> > And, yes, a function that works like dwfl_report_elf, but takes a
> > pgoff and length as additional parameters is sorely missing from the
> > API.
> 
> dwfl_report_elf indeed does assume the whole ELF file is mapped in
> according to the PHDRs in the file and the given base address. But what
> you are actually seeing (I think, depending on the answers on the
> questions above) is the dynamic loader mapping in the file in pieces.
> And so you would like an interface where you can report the module
> piece wise while it is being mapped in. So what would be most
> convenient would be some kind of dwfl_report_elf_mmap function that you
> can use to either get a new Dwfl_Module or to extend an existing one.
> 
> I have to think how that interacts with mprotect and munmap.

Hey Mark,

I can only second what Christoph and Ulf said so far. I want to add though 
that this limitation essentially makes elfutils unusable in perf. 
I.e., perf can be built with either libunwind or elfutils for unwinding 
callstacks. Using the former works like a charm. The latter runs into the same 
problems as perfparser / hotspot. To reproduce, use a modern distribution 
with recent userland and kernel, then do something like this:

~~~~~
$ cat test.cpp
#include <cmath>
#include <complex>
#include <iostream>
#include <random>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
$ g++ -O2 -g test.cpp -o test
$ perf record --call-graph dwarf ./test
$ perf report --stdio -vv
...
overlapping maps:
 55a1cdd87000-55a1cdd89000 1000 /ssd/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/test
 55a1cdd85000-55a1cdd89000 0 /ssd/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/test
 55a1cdd85000-55a1cdd87000 0 /ssd/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/test
overlapping maps:
 7fdda4690000-7fdda46af000 2000 /usr/lib/ld-2.28.so
 7fdda468e000-7fdda46ba000 0 /usr/lib/ld-2.28.so
 7fdda468e000-7fdda4690000 0 /usr/lib/ld-2.28.so
 7fdda46af000-7fdda46ba000 0 /usr/lib/ld-2.28.so
...
57.14%     0.00%  test     [unknown]         [.] 0xe775b17d50ae8cff
            |
            ---0xe775b17d50ae8cff
~~~~~

To build perf against elfutils, do:

~~~~~
git clone --branch=perf/core git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
cd linux/tools/perf
make NO_LIBUNWIND=1
~~~~~

The perf binary in that folder can then be used as a drop-in replacement.

Since I consider this issue a serious blocker, I would like to see it fixed 
sooner rather than later. Would it maybe be possible for you to create a proof 
of concept for the new proposed dwfl_report_elf_mmap? I can then try to take 
it from there to fill in the missing bits and pieces and to make it actually 
work for our purposes.

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de



* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-09-26 14:38     ` Milian Wolff
@ 2018-10-09 20:33       ` Milian Wolff
  2018-10-11 17:02         ` Ulf Hermann
  0 siblings, 1 reply; 17+ messages in thread
From: Milian Wolff @ 2018-10-09 20:33 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard, Ulf Hermann, Christoph Sterz

On Mittwoch, 26. September 2018 16:38:43 CEST Milian Wolff wrote:
> On Friday, September 21, 2018 3:07:29 PM CEST Mark Wielaard wrote:
> > On Wed, 2018-09-19 at 14:24 +0200, Ulf Hermann wrote:
> > > > We suspect perf to offset its recording-addresses of mmapped
> > > > dsos/executables starting with a specific section, such that they
> > > > denote
> > > > their pointers with this pg_offset parameter. (e.g. skipping a
> > > > library's
> > > > header and setting pgoff to the headersize). Although we are not
> > > > 100%
> > > > sure about this information.
> > > 
> > > According to my understanding, the pgoff is not perf's invention.
> > > Rather, the library loader for the target application does not mmap()
> > > the full ELF file, but only the parts it's interested in. Those
> > > partial mappings are then reported through perf.
> > 
> > OK, so pgoff is like the offset argument of the mmap call?
> > Is it just recording all user space mmap events that have PROT_EXEC in
> > their prot argument? What about if the mapping was later changed with
> > mprotect? Or does PERF_RECORD_MMAP only map to some internal kernel
> > mmap action?
> > 
> > >  We then try to recreate the memory mapping with perfparser, but run
> > > 
> > > into problems because dwfl_report_elf() doesn't let us do partial
> > > mappings. You can only map complete files with that function. There
> > > probably is some way to manually map the relevant sections using
> > > other functions in libdw and libelf, but I haven't figured out how to
> > > do this, yet. If there is a simple trick I'm missing, I'd be happy to
> > > hear about it.
> > > 
> > > And, yes, a function that works like dwfl_report_elf, but takes a
> > > pgoff and length as additional parameters is sorely missing from the
> > > API.
> > 
> > dwfl_report_elf indeed does assume the whole ELF file is mapped in
> > according to the PHDRs in the file and the given base address. But what
> > you are actually seeing (I think, depending on the answers on the
> > questions above) is the dynamic loader mapping in the file in pieces.
> > And so you would like an interface where you can report the module
> > piece wise while it is being mapped in. So what would be most
> > convenient would be some kind of dwfl_report_elf_mmap function that you
> > can use to either get a new Dwfl_Module or to extend an existing one.
> > 
> > I have to think how that interacts with mprotect and munmap.
> 
> Hey Mark,
> 
> I can only second what Christoph and Ulf said so far. I want to add though
> that this limitation essentially makes elfutils unusable for usage in perf.
> I.e., perf can be built with either libunwind or elfutils for unwinding
> callstacks. Using the former works like a charm. The latter runs into the
> same problems as perfparser / hotspot. To reproduce, use a modern
> distribution with recent userland and kernel, then do something like this:
> 
> ~~~~~
> $ cat test.cpp
> #include <cmath>
> #include <complex>
> #include <iostream>
> #include <random>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> $ g++ -O2 -g test.cpp -o test
> $ perf record --call-graph dwarf ./test
> $ perf report --stdio -vv
> ...
> overlapping maps:
>  55a1cdd87000-55a1cdd89000 1000 /ssd/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/test
>  55a1cdd85000-55a1cdd89000 0 /ssd/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/test
>  55a1cdd85000-55a1cdd87000 0 /ssd/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/test
> overlapping maps:
>  7fdda4690000-7fdda46af000 2000 /usr/lib/ld-2.28.so
>  7fdda468e000-7fdda46ba000 0 /usr/lib/ld-2.28.so
>  7fdda468e000-7fdda4690000 0 /usr/lib/ld-2.28.so
>  7fdda46af000-7fdda46ba000 0 /usr/lib/ld-2.28.so
> ...
> 57.14%     0.00%  test     [unknown]         [.] 0xe775b17d50ae8cff
> 
>             ---0xe775b17d50ae8cff
> ~~~~~
> 
> To build perf against elfutils, do:
> 
> ~~~~~
> git clone --branch=perf/core git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
> cd linux/tools/perf
> make NO_LIBUNWIND=1
> ~~~~~
> 
> The perf binary in the last folder can then be used as a drop-in
> replacement.
> 
> Since I consider this issue a serious blocker, I would like to see it fixed
> sooner rather than later. Would it maybe be possible for you to create a
> proof of concept for the new proposed dwfl_report_elf_mmap? I can then try
> to take it from there to fill in the missing bits and pieces and to make it
> actually work for our purposes.

Hey Mark,

any news on this? Today, I spent some time reading through dwfl_report_elf.c 
and it's far from trivial for me to go from there to a dwfl_report_elf_mmap or 
similar.

If you are unable to work on a POC, can you guide us a bit please? Here are 
some questions from my side:

- The address mapping seems to be handled via dwfl_report_module, so do we 
want to call this once per mmap we encounter?

- If so, then maybe what's actually missing is some API to allow *setting* an 
Elf for a Dwfl_Module, i.e. such that dwfl_report_module becomes useful for 
public consumption? This would basically boil down to two functions: One to 
open an elf file as is done currently in dwfl_report_elf/__libdw_open_file. 
Then another function to assign the opened Elf to the Dwfl_Module, cf. the 
tail of __libdwfl_report_elf below the call to __libdwfl_elf_address_range.

- But then, in such a per-mmap module, what values would one set for the 
following properties?

	m->main.vaddr = vaddr;
	m->main.address_sync = address_sync;
	m->main_bias = bias;

Do we set the same values everywhere, or do these values then depend on which 
mmap section we are looking at? Generally, I still don't have a good enough 
knowledge of the Elf API nor elfutils to really know what I'm talking about 
here...

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-09 20:33       ` Milian Wolff
@ 2018-10-11 17:02         ` Ulf Hermann
  2018-10-11 17:37           ` Mark Wielaard
  0 siblings, 1 reply; 17+ messages in thread
From: Ulf Hermann @ 2018-10-11 17:02 UTC (permalink / raw)
  To: Milian Wolff, elfutils-devel; +Cc: Mark Wielaard, Christoph Sterz

Hi Milian,

is there any pattern in how the loader maps the ELF sections into 
memory? What sections does it actually map and which of those do we need 
for unwinding?

I hope that only one of those MMAPs per ELF is actually meaningful and 
we can simply add that one's pgoff as an extra member to Dwfl_Module and 
use it whenever we poke the underlying file.

br,
Ulf


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-11 17:02         ` Ulf Hermann
@ 2018-10-11 17:37           ` Mark Wielaard
  2018-10-11 18:14             ` Milian Wolff
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Wielaard @ 2018-10-11 17:37 UTC (permalink / raw)
  To: Ulf Hermann; +Cc: Milian Wolff, elfutils-devel, Christoph Sterz

Hi,

My apologies for not having looked deeper at this.
It is a bit tricky and I just didn't have enough time to
really sit down and think it all through yet.

On Thu, Oct 11, 2018 at 05:02:18PM +0000, Ulf Hermann wrote:
> is there any pattern in how the loader maps the ELF sections into 
> memory? What sections does it actually map and which of those do we need 
> for unwinding?

Yes, it would be helpful to have some examples of mmap events plus
the associated segment header (eu-readelf -l) of the ELF file.

Note that the kernel and dynamic loader will use the (PT_LOAD) segments,
not the sections, to map things into memory. Each segment might contain
multiple sections.

libdwfl then tries to associate the correct sections (and address bias)
with how the ELF file was mapped into memory.

> I hope that only one of those MMAPs per ELF is actually meaningful and 
> we can simply add that one's pgoff as an extra member to Dwfl_Module and 
> use it whenever we poke the underlying file.

One "trick" might be to just subtract the pgoff from the load address.
And so report as if the ELF file was being mapped from the start. This
isn't really correct, but it might be interesting to see if that makes
libdwfl able to just associate the whole ELF file with the correct
address map.
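
A minimal sketch of that trick, using the ld-2.28.so mappings from the
perf output earlier in this thread. MmapEvent and baseForWholeFile are
hypothetical names, not part of libdwfl; the result is what would be
passed as the base argument of dwfl_report_elf.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record holding the relevant fields of one perf mmap event.
struct MmapEvent {
    std::uint64_t addr;   // start address of the mapping
    std::uint64_t len;    // length of the mapping in bytes
    std::uint64_t pgoff;  // file offset of the mapping in bytes
};

// Address at which file offset 0 would sit if the whole file had been
// mapped contiguously, starting at its beginning.
std::uint64_t baseForWholeFile(const MmapEvent &event)
{
    assert(event.addr >= event.pgoff);
    return event.addr - event.pgoff;
}
```

For the mapping 7fdda4690000-7fdda46af000 with pgoff 2000 this gives
0x7fdda468e000, which is the start of the pgoff 0 mappings of the same
file. Whether that holds for every segment of every file is exactly the
open question.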

Cheers,

Mark


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-11 17:37           ` Mark Wielaard
@ 2018-10-11 18:14             ` Milian Wolff
  2018-10-15 20:39               ` Milian Wolff
  2018-10-15 20:48               ` Milian Wolff
  0 siblings, 2 replies; 17+ messages in thread
From: Milian Wolff @ 2018-10-11 18:14 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Ulf Hermann, elfutils-devel, Christoph Sterz

On Donnerstag, 11. Oktober 2018 19:37:07 CEST Mark Wielaard wrote:
> Hi,
> 
> My apologies for not having looked deeper at this.
> It is a bit tricky and I just didn't have enough time to
> really sit down and think it all through yet.
> 
> On Thu, Oct 11, 2018 at 05:02:18PM +0000, Ulf Hermann wrote:
> > is there any pattern in how the loader maps the ELF sections into
> > memory? What sections does it actually map and which of those do we need
> > for unwinding?
> 
> Yes, it would be helpful to have some examples of mmap events plus
> the associated segment header (eu-readelf -l) of the ELF file.
> 
> Note that the kernel and dynamic loader will use the (PT_LOAD) segments,
> not the sections, to map things into memory. Each segment might contain
> multiple sections.
> 
> libdwfl then tries to associate the correct sections (and address bias)
> with how the ELF file was mapped into memory.
> 
> > I hope that only one of those MMAPs per ELF is actually meaningful and
> > we can simply add that one's pgoff as an extra member to Dwfl_Module and
> > use it whenever we poke the underlying file.
> 
> One "trick" might be to just subtract the pgoff from the load address.
> And so report as if the ELF file was being mapped from the start. This
> isn't really correct, but it might be interesting to see if that makes
> libdwfl able to just associate the whole ELF file with the correct
> address map.

I'll try to come up with some minimal code examples we can use to test all of 
this. But from what I remember, neither of the above suggestions will be 
sufficient as we can still run into overlapping module errors from elfutils 
when we always load everything. I.e. I believe we've seen mappings that 
eventually become partially obsoleted by a future mmap event. At that point, 
we somehow need to be able to only map parts of a file, not all of it. So just 
subtracting or honoring pgoff is not enough, I believe we also need to be able 
to explicitly say how much of a file to map.

But to make this discussion easier to follow for others, I'll create some 
standalone cpp code that takes a `perf script --show-mmap-events  | grep 
PERF_RECORD_MMAP` input file and then runs this through elfutils API to 
reproduce the issues we are facing.

I'll get back to you all once this is done.

Cheers

-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-11 18:14             ` Milian Wolff
@ 2018-10-15 20:39               ` Milian Wolff
  2018-10-15 21:05                 ` Mark Wielaard
                                   ` (2 more replies)
  2018-10-15 20:48               ` Milian Wolff
  1 sibling, 3 replies; 17+ messages in thread
From: Milian Wolff @ 2018-10-15 20:39 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard, Ulf Hermann, Christoph Sterz

On Donnerstag, 11. Oktober 2018 20:14:43 CEST Milian Wolff wrote:
> On Donnerstag, 11. Oktober 2018 19:37:07 CEST Mark Wielaard wrote:
> > Hi,
> > 
> > My apologies for not having looked deeper at this.
> > It is a bit tricky and I just didn't have enough time to
> > really sit down and think it all through yet.
> > 
> > On Thu, Oct 11, 2018 at 05:02:18PM +0000, Ulf Hermann wrote:
> > > is there any pattern in how the loader maps the ELF sections into
> > > memory? What sections does it actually map and which of those do we need
> > > for unwinding?
> > 
> > Yes, it would be helpful to have some examples of mmap events plus
> > the associated segment header (eu-readelf -l) of the ELF file.
> > 
> > Note that the kernel and dynamic loader will use the (PT_LOAD) segments,
> > not the sections, to map things into memory. Each segment might contain
> > multiple sections.
> > 
> > libdwfl then tries to associate the correct sections (and address bias)
> > with how the ELF file was mapped into memory.
> > 
> > > I hope that only one of those MMAPs per ELF is actually meaningful and
> > > we can simply add that one's pgoff as an extra member to Dwfl_Module and
> > > use it whenever we poke the underlying file.
> > 
> > One "trick" might be to just subtract the pgoff from the load address.
> > And so report as if the ELF file was being mapped from the start. This
> > isn't really correct, but it might be interesting to see if that makes
> > libdwfl able to just associate the whole ELF file with the correct
> > address map.
> 
> I'll try to come up with some minimal code examples we can use to test all
> of this. But from what I remember, neither of the above suggestions will be
> sufficient as we can still run into overlapping module errors from elfutils
> when we always load everything. I.e. I believe we've seen mappings that
> eventually become partially obsoleted by a future mmap event. At that
> point, we somehow need to be able to only map parts of a file, not all of
> it. So just subtracting or honoring pgoff is not enough, I believe we also
> need to be able to explicitly say how much of a file to map.
> 
> But to make this discussion easier to follow for others, I'll create some
> standalone cpp code that takes a `perf script --show-mmap-events  | grep
> PERF_RECORD_MMAP` input file and then runs this through elfutils API to
> reproduce the issues we are facing.
> 
> I'll get back to you all once this is done.

Hey all,

here's one example of mmap events recorded by perf:

0x7fac5ec0b000 to 0x7fac5ed9a000, len = 0x18f000, offset =        0  r--p  /usr/lib/libstdc++.so.6.0.25
0x7fac5ec94000 to 0x7fac5ed8a000, len =  0xf6000, offset =  0x89000  ---p  /usr/lib/libstdc++.so.6.0.25
0x7fac5ec94000 to 0x7fac5ed4c000, len =  0xb8000, offset =  0x89000  r-xp  /usr/lib/libstdc++.so.6.0.25
0x7fac5ed4c000 to 0x7fac5ed89000, len =  0x3d000, offset = 0x141000  r--p  /usr/lib/libstdc++.so.6.0.25
0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25

this is noteworthy in multiple ways:

- the first mapping we receive is for pgoff = 0 for the full file size aligned 
to the page boundary
- the first mapping isn't executable yet
- the last mappings have a huge offset which actually lies beyond the 
initially mmaped region?!

And to make things worse, when we report the file at address 0x7fac5ec0b000 
via dwfl, we get:

reported module /usr/lib/libstdc++.so.6.0.25
        expected: 0x7fac5ec0b000 to 0x7fac5ed9a000 (0x18f000)
        actual:   0x7fac5ec0b000 to 0x7fac5ed99640 (0x18e640)

So now dwfl won't ever be able to map any addresses into this module when they 
come after 0x7fac5ed99640, but the mmap events above seem to indicate that 
this could be possible?
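
One way to sanity-check the events above is to compute, for each mapping,
the address that file offset 0 would have (start minus offset) and the end
of the mapped file range (offset plus len). The following is only a
sketch; Mapping, impliedBase, fileEnd and printSummary are hypothetical
names, and the table simply repeats the five events listed above.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical record for one line of the mmap event listing.
struct Mapping {
    std::uint64_t start;   // first mapped address
    std::uint64_t len;     // length of the mapping in bytes
    std::uint64_t offset;  // file offset (pgoff) in bytes
};

// Address that file offset 0 would have if this mapping were extended
// backwards to the start of the file.
std::uint64_t impliedBase(const Mapping &m) { return m.start - m.offset; }

// First file offset past the part of the file covered by this mapping.
std::uint64_t fileEnd(const Mapping &m) { return m.offset + m.len; }

// The five libstdc++.so.6.0.25 events, in the order they were listed.
const Mapping kEvents[] = {
    {0x7fac5ec0b000, 0x18f000, 0x0},
    {0x7fac5ec94000, 0xf6000, 0x89000},
    {0x7fac5ec94000, 0xb8000, 0x89000},
    {0x7fac5ed4c000, 0x3d000, 0x141000},
    {0x7fac5ed8a000, 0xd000, 0x17e000},
};

// Prints the implied base and the covered file range of every event.
void printSummary()
{
    for (const Mapping &m : kEvents) {
        std::printf("base %#llx, file range [%#llx, %#llx)\n",
                    static_cast<unsigned long long>(impliedBase(m)),
                    static_cast<unsigned long long>(m.offset),
                    static_cast<unsigned long long>(fileEnd(m)));
    }
}
```

With these numbers the first four events imply the same base,
0x7fac5ec0b000, while the rw-p event implies 0x7fac5ec0c000, one page
higher. Every covered file range ends at or below 0x18f000, so all
offsets stay within the length of the first mapping.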

I'll now upload my code to enable you all to play around with this yourself.

Bye
-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-11 18:14             ` Milian Wolff
  2018-10-15 20:39               ` Milian Wolff
@ 2018-10-15 20:48               ` Milian Wolff
  1 sibling, 0 replies; 17+ messages in thread
From: Milian Wolff @ 2018-10-15 20:48 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard, Ulf Hermann, Christoph Sterz

On Thursday, 11 October 2018 20:14:43 CEST Milian Wolff wrote:
> On Thursday, 11 October 2018 19:37:07 CEST Mark Wielaard wrote:
> > Hi,
> > 
> > My apologies for not having looked deeper at this.
> > It is a bit tricky and I just didn't have enough time to
> > really sit down and think it all through yet.
> > 
> > On Thu, Oct 11, 2018 at 05:02:18PM +0000, Ulf Hermann wrote:
> > > is there any pattern in how the loader maps the ELF sections into
> > > memory? What sections does it actually map and which of those do we need
> > > for unwinding?
> > 
> > Yes, it would be helpful to have some examples of mmap events plus
> > the associated segment header (eu-readelf -l) of the ELF file.
> > 
> > Note that the kernel and dynamic loader will use the (PT_LOAD) segments,
> > not the sections, to map things into memory. Each segment might contain
> > multiple sections.
> > 
> > libdwfl then tries to associate the correct sections (and address bias)
> > with how the ELF file was mapped into memory.
> > 
> > > I hope that only one of those MMAPs per ELF is actually meaningful and
> > > we can simply add that one's pgoff as an extra member to Dwfl_Module and
> > > use it whenever we poke the underlying file.
> > 
> > One "trick" might be to just subtract the pgoff from the load address.
> > And so report as if the ELF file was being mapped from the start. This
> > isn't really correct, but it might be interesting to see if that makes
> > libdwfl able to just associate the whole ELF file with the correct
> > address map.
> 
> I'll try to come up with some minimal code examples we can use to test all
> of this. But from what I remember, neither of the above suggestions will be
> sufficient as we can still run into overlapping module errors from elfutils
> when we always load everything. I.e. I believe we've seen mappings that
> eventually become partially obsoleted by a future mmap event. At that
> point, we somehow need to be able to only map parts of a file, not all of
> it. So just subtracting or honoring pgoff is not enough, I believe we also
> need to be able to explicitly say how much of a file to map.
> 
> But to make this discussion easier to follow for others, I'll create some
> standalone cpp code that takes a `perf script --show-mmap-events  | grep
> PERF_RECORD_MMAP` input file and then runs this through elfutils API to
> reproduce the issues we are facing.
> 
> I'll get back to you all once this is done.

I've pushed a preliminary proof of concept for a reproducer:

https://github.com/milianw/perf_mmaps_to_elfutils

Note that it doesn't really exhibit any dwfl errors as-is. We would also need 
to feed it all the instruction pointer addresses that perf encounters and then 
try to find the matching module via libdwfl. This isn't easily done, and I 
hope the current output already exemplifies some of the issues with the 
current libdwfl API.

Thanks

-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-15 20:39               ` Milian Wolff
@ 2018-10-15 21:05                 ` Mark Wielaard
  2018-10-15 21:06                   ` Milian Wolff
  2018-10-18  7:41                 ` Ulf Hermann
  2018-10-20 10:49                 ` Milian Wolff
  2 siblings, 1 reply; 17+ messages in thread
From: Mark Wielaard @ 2018-10-15 21:05 UTC (permalink / raw)
  To: Milian Wolff, elfutils-devel; +Cc: Ulf Hermann, Christoph Sterz

Hi Milian,

On Mon, 2018-10-15 at 22:38 +0200, Milian Wolff wrote:
> here's one example of mmap events recorded by perf:
> 
> 0x7fac5ec0b000 to 0x7fac5ed9a000, len = 0x18f000, offset =        0  r--p  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ec94000 to 0x7fac5ed8a000, len =  0xf6000, offset =  0x89000  ---p  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ec94000 to 0x7fac5ed4c000, len =  0xb8000, offset =  0x89000  r-xp  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ed4c000 to 0x7fac5ed89000, len =  0x3d000, offset = 0x141000  r--p  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25

Could you also post the matching phdr output for the file?
eu-readelf -l /usr/lib/libstdc++.so.6.0.25 should show it.
That way we can see how the PT_LOAD segments map to the mmap events.

Thanks,

Mark

* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-15 21:05                 ` Mark Wielaard
@ 2018-10-15 21:06                   ` Milian Wolff
  2018-10-17 14:52                     ` Milian Wolff
  0 siblings, 1 reply; 17+ messages in thread
From: Milian Wolff @ 2018-10-15 21:06 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: elfutils-devel, Ulf Hermann, Christoph Sterz

On Monday, 15 October 2018 23:04:52 CEST Mark Wielaard wrote:
> Hi Milian,
> 
> On Mon, 2018-10-15 at 22:38 +0200, Milian Wolff wrote:
> > here's one example of mmap events recorded by perf:
> > 
> > 0x7fac5ec0b000 to 0x7fac5ed9a000, len = 0x18f000, offset =        0  r--p  /usr/lib/libstdc++.so.6.0.25
> > 0x7fac5ec94000 to 0x7fac5ed8a000, len =  0xf6000, offset =  0x89000  ---p  /usr/lib/libstdc++.so.6.0.25
> > 0x7fac5ec94000 to 0x7fac5ed4c000, len =  0xb8000, offset =  0x89000  r-xp  /usr/lib/libstdc++.so.6.0.25
> > 0x7fac5ed4c000 to 0x7fac5ed89000, len =  0x3d000, offset = 0x141000  r--p  /usr/lib/libstdc++.so.6.0.25
> > 0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25
> 
> Could you also post the matching phdr output for the file?
> eu-readelf -l /usr/lib/libstdc++.so.6.0.25 should show it.
> That way we can see how the PT_LOAD segments map to the mmap events.

Sure:

$ eu-readelf -l /usr/lib/libstdc++.so.6.0.25
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x088fa8 0x088fa8 R   0x1000
  LOAD           0x089000 0x0000000000089000 0x0000000000089000 0x0b7ae1 0x0b7ae1 R E 0x1000
  LOAD           0x141000 0x0000000000141000 0x0000000000141000 0x03cfe0 0x03cfe0 R   0x1000
  LOAD           0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b8b8 0x00ed60 RW  0x1000
  DYNAMIC        0x1873a8 0x00000000001883a8 0x00000000001883a8 0x0001e0 0x0001e0 RW  0x8
  NOTE           0x0002a8 0x00000000000002a8 0x00000000000002a8 0x000024 0x000024 R   0x4
  NOTE           0x17dfc0 0x000000000017dfc0 0x000000000017dfc0 0x000020 0x000020 R   0x8
  TLS            0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x000000 0x000020 R   0x8
  GNU_EH_FRAME   0x149558 0x0000000000149558 0x0000000000149558 0x007f04 0x007f04 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b720 0x00b720 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00      [RO: .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rela.dyn]
   01      [RO: .init .text .fini]
   02      [RO: .rodata .eh_frame_hdr .eh_frame .gcc_except_table .note.gnu.property]
   03      [RELRO: .tbss .init_array .fini_array .data.rel.ro .dynamic .got] .got.plt .data .bss
   04      [RELRO: .dynamic]
   05      [RO: .note.gnu.build-id]
   06      [RO: .note.gnu.property]
   07      [RELRO: .tbss]
   08      [RO: .eh_frame_hdr]
   09
   10      [RELRO: .tbss .init_array .fini_array .data.rel.ro .dynamic .got]

-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-15 21:06                   ` Milian Wolff
@ 2018-10-17 14:52                     ` Milian Wolff
  2018-10-17 22:26                       ` Mark Wielaard
  0 siblings, 1 reply; 17+ messages in thread
From: Milian Wolff @ 2018-10-17 14:52 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard, Ulf Hermann, Christoph Sterz

On Monday, 15 October 2018 23:06:07 CEST Milian Wolff wrote:
> On Monday, 15 October 2018 23:04:52 CEST Mark Wielaard wrote:
> > Hi Milian,
> > 
> > On Mon, 2018-10-15 at 22:38 +0200, Milian Wolff wrote:
> > > here's one example of mmap events recorded by perf:
> > > 
> > > 0x7fac5ec0b000 to 0x7fac5ed9a000, len = 0x18f000, offset =        0  r--p  /usr/lib/libstdc++.so.6.0.25
> > > 0x7fac5ec94000 to 0x7fac5ed8a000, len =  0xf6000, offset =  0x89000  ---p  /usr/lib/libstdc++.so.6.0.25
> > > 0x7fac5ec94000 to 0x7fac5ed4c000, len =  0xb8000, offset =  0x89000  r-xp  /usr/lib/libstdc++.so.6.0.25
> > > 0x7fac5ed4c000 to 0x7fac5ed89000, len =  0x3d000, offset = 0x141000  r--p  /usr/lib/libstdc++.so.6.0.25
> > > 0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25
> > 
> > Could you also post the matching phdr output for the file?
> > eu-readelf -l /usr/lib/libstdc++.so.6.0.25 should show it.
> > That way we can see how the PT_LOAD segments map to the mmap events.
> 
> Sure:
> 
> $ eu-readelf -l /usr/lib/libstdc++.so.6.0.25
> Program Headers:
>   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>   LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x088fa8 0x088fa8 R   0x1000
>   LOAD           0x089000 0x0000000000089000 0x0000000000089000 0x0b7ae1 0x0b7ae1 R E 0x1000
>   LOAD           0x141000 0x0000000000141000 0x0000000000141000 0x03cfe0 0x03cfe0 R   0x1000
>   LOAD           0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b8b8 0x00ed60 RW  0x1000
>   DYNAMIC        0x1873a8 0x00000000001883a8 0x00000000001883a8 0x0001e0 0x0001e0 RW  0x8
>   NOTE           0x0002a8 0x00000000000002a8 0x00000000000002a8 0x000024 0x000024 R   0x4
>   NOTE           0x17dfc0 0x000000000017dfc0 0x000000000017dfc0 0x000020 0x000020 R   0x8
>   TLS            0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x000000 0x000020 R   0x8
>   GNU_EH_FRAME   0x149558 0x0000000000149558 0x0000000000149558 0x007f04 0x007f04 R   0x4
>   GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
>   GNU_RELRO      0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b720 0x00b720 R   0x1
> 
>  Section to Segment mapping:
>   Segment Sections...
>    00      [RO: .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rela.dyn]
>    01      [RO: .init .text .fini]
>    02      [RO: .rodata .eh_frame_hdr .eh_frame .gcc_except_table .note.gnu.property]
>    03      [RELRO: .tbss .init_array .fini_array .data.rel.ro .dynamic .got] .got.plt .data .bss
>    04      [RELRO: .dynamic]
>    05      [RO: .note.gnu.build-id]
>    06      [RO: .note.gnu.property]
>    07      [RELRO: .tbss]
>    08      [RO: .eh_frame_hdr]
>    09
>    10      [RELRO: .tbss .init_array .fini_array .data.rel.ro .dynamic .got]

So, Mark - any chance you could have a look at the above and give us your 
feedback?

When I compare the actual mmap events with the LOAD segments, there are some 
similarities, but also some discrepancies. Note how the mmap sizes always 
differ from the FileSiz header value. And the offsets also sometimes mismatch, 
e.g. for the last segment / mmap event we get 0x17f8e0 in the header, but 
0x17e000 in the mmap event...:

LOAD           0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b8b8 0x00ed60 RW  0x1000

0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25

I'm pretty confused here!

-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-17 14:52                     ` Milian Wolff
@ 2018-10-17 22:26                       ` Mark Wielaard
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Wielaard @ 2018-10-17 22:26 UTC (permalink / raw)
  To: Milian Wolff; +Cc: elfutils-devel, Ulf Hermann, Christoph Sterz

Hi Milian,

On Wed, Oct 17, 2018 at 04:52:42PM +0200, Milian Wolff wrote:
> On Monday, 15 October 2018 23:06:07 CEST Milian Wolff wrote:
> > On Monday, 15 October 2018 23:04:52 CEST Mark Wielaard wrote:
> > > On Mon, 2018-10-15 at 22:38 +0200, Milian Wolff wrote:
> > > > here's one example of mmap events recorded by perf:
> > > > 
> > > > 0x7fac5ec0b000 to 0x7fac5ed9a000, len = 0x18f000, offset =        0  r--p  /usr/lib/libstdc++.so.6.0.25
> > > > 0x7fac5ec94000 to 0x7fac5ed8a000, len =  0xf6000, offset =  0x89000  ---p  /usr/lib/libstdc++.so.6.0.25
> > > > 0x7fac5ec94000 to 0x7fac5ed4c000, len =  0xb8000, offset =  0x89000  r-xp  /usr/lib/libstdc++.so.6.0.25
> > > > 0x7fac5ed4c000 to 0x7fac5ed89000, len =  0x3d000, offset = 0x141000  r--p  /usr/lib/libstdc++.so.6.0.25
> > > > 0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25
> > > 
> > > Could you also post the matching phdr output for the file?
> > > eu-readelf -l /usr/lib/libstdc++.so.6.0.25 should show it.
> > > That way we can see how the PT_LOAD segments map to the mmap events.
> > 
> > Sure:
> > 
> > $ eu-readelf -l /usr/lib/libstdc++.so.6.0.25
> > Program Headers:
> >   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
> >   LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x088fa8 0x088fa8 R   0x1000
> >   LOAD           0x089000 0x0000000000089000 0x0000000000089000 0x0b7ae1 0x0b7ae1 R E 0x1000
> >   LOAD           0x141000 0x0000000000141000 0x0000000000141000 0x03cfe0 0x03cfe0 R   0x1000
> >   LOAD           0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b8b8 0x00ed60 RW  0x1000
> >   DYNAMIC        0x1873a8 0x00000000001883a8 0x00000000001883a8 0x0001e0 0x0001e0 RW  0x8
> >   NOTE           0x0002a8 0x00000000000002a8 0x00000000000002a8 0x000024 0x000024 R   0x4
> >   NOTE           0x17dfc0 0x000000000017dfc0 0x000000000017dfc0 0x000020 0x000020 R   0x8
> >   TLS            0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x000000 0x000020 R   0x8
> >   GNU_EH_FRAME   0x149558 0x0000000000149558 0x0000000000149558 0x007f04 0x007f04 R   0x4
> >   GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
> >   GNU_RELRO      0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b720 0x00b720 R   0x1
> > 
> >  Section to Segment mapping:
> >   Segment Sections...
> >    00      [RO: .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rela.dyn]
> >    01      [RO: .init .text .fini]
> >    02      [RO: .rodata .eh_frame_hdr .eh_frame .gcc_except_table .note.gnu.property]
> >    03      [RELRO: .tbss .init_array .fini_array .data.rel.ro .dynamic .got] .got.plt .data .bss
> >    04      [RELRO: .dynamic]
> >    05      [RO: .note.gnu.build-id]
> >    06      [RO: .note.gnu.property]
> >    07      [RELRO: .tbss]
> >    08      [RO: .eh_frame_hdr]
> >    09
> >    10      [RELRO: .tbss .init_array .fini_array .data.rel.ro .dynamic .got]
> 
> So, Mark - any chance you could have a look at the above and give us your 
> feedback?

Sorry, I haven't yet looked at this deeply, but here are some quick comments.
The mmap events do seem to correspond to the PT_LOAD segments; at least
the offsets do. Why the second one is mmapped twice I don't know. The
difference in length for the last 3 seems to be that the mmap lengths are
aligned up to the page size (0x1000, 4K).

> When I compare the actual mmap events with the LOAD segments, there are some 
> similarities, but also some discrepancies. Note how the mmap sizes always 
> differ from the FileSiz header value. And the offsets also sometimes mismatch, 
> e.g. for the last segment / mmap event we get 0x17f8e0 in the header, but 
> 0x17e000 in the mmap event...:
> 
> LOAD           0x17e8e0 0x000000000017f8e0 0x000000000017f8e0 0x00b8b8 0x00ed60 RW  0x1000
> 
> 0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25
> 
> I'm pretty confused here!

I think the differences can be explained by the fact that mmap will use
aligned offsets and length.
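
To illustrate the alignment, here is a minimal sketch. It is not elfutils or
kernel code: `FileRange` and `expected_mmap` are hypothetical helpers, and the
4K page size is an assumption. It rounds the file range of a PT_LOAD header
outwards to page boundaries, which should give the offset and length of the
corresponding mmap event:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of elfutils or perf.
struct FileRange {
    std::uint64_t offset; // page-aligned file offset (the pgoff of the mmap event)
    std::uint64_t length; // page-aligned length of the file-backed mapping
};

// Round the file range [p_offset, p_offset + p_filesz) of a PT_LOAD segment
// outwards to page boundaries, the way an mmap of that range has to be aligned.
inline FileRange expected_mmap(std::uint64_t p_offset, std::uint64_t p_filesz,
                               std::uint64_t page_size = 0x1000)
{
    // The masking below only works for power-of-two page sizes.
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    const std::uint64_t mask = page_size - 1;
    const std::uint64_t begin = p_offset & ~mask;                   // align down
    const std::uint64_t end = (p_offset + p_filesz + mask) & ~mask; // align up
    return FileRange{begin, end - begin};
}
```

With the values from the eu-readelf output, `expected_mmap(0x17e8e0, 0x00b8b8)`
gives offset 0x17e000 and length 0xd000, which is what the last mmap event
shows.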

In theory libdwfl just needs to see one mmap event and should then be
able to use the PT_LOAD program headers to see how the whole file is
mmapped into memory. Maybe something goes wrong there. And reporting
multiple events for the same file might confuse things.
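
A rough sketch of that idea follows. It is an assumption about the general
approach, not libdwfl's actual code, and `LoadSegment`, `AddressSpan` and
`module_span` are made-up names. Given the base address of the pgoff = 0
mapping, the module's address range follows from the lowest page-aligned
p_vaddr and the highest p_vaddr + p_memsz among the PT_LOAD headers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not libdwfl's data structures.
struct LoadSegment {
    std::uint64_t vaddr; // p_vaddr of a PT_LOAD header
    std::uint64_t memsz; // p_memsz of a PT_LOAD header
};

struct AddressSpan {
    std::uint64_t start;
    std::uint64_t end;
};

// Derive the address range a shared object occupies from its load base and
// its PT_LOAD headers alone (page size of 4K assumed by default).
inline AddressSpan module_span(std::uint64_t base,
                               const std::vector<LoadSegment>& loads,
                               std::uint64_t page_size = 0x1000)
{
    assert(!loads.empty());
    std::uint64_t lo = UINT64_MAX;
    std::uint64_t hi = 0;
    for (const LoadSegment& seg : loads) {
        lo = std::min(lo, seg.vaddr & ~(page_size - 1)); // lowest aligned start
        hi = std::max(hi, seg.vaddr + seg.memsz);        // highest unaligned end
    }
    return AddressSpan{base + lo, base + hi};
}
```

For the four LOAD headers above and base 0x7fac5ec0b000 this yields
0x7fac5ec0b000 to 0x7fac5ed99640 (0x18e640), the same as the "actual" range
in the dwfl report earlier in this thread.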

Cheers,

Mark

* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-15 20:39               ` Milian Wolff
  2018-10-15 21:05                 ` Mark Wielaard
@ 2018-10-18  7:41                 ` Ulf Hermann
  2018-10-20 10:49                 ` Milian Wolff
  2 siblings, 0 replies; 17+ messages in thread
From: Ulf Hermann @ 2018-10-18  7:41 UTC (permalink / raw)
  To: Milian Wolff, elfutils-devel; +Cc: Mark Wielaard, Christoph Sterz

Consider:

> 0x7fac5ec0b000 to 0x7fac5ed9a000, len = 0x18f000, offset =        0  r--p  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ec94000 to 0x7fac5ed8a000, len =  0xf6000, offset =  0x89000  ---p  /usr/lib/libstdc++.so.6.0.25

0x7fac5ec94000 - 0x89000 = 0x7fac5ec0b000

This is just taking away the 'r' bit from part of the first mapping. We 
can ignore it in perf and perfparser.

> 0x7fac5ec94000 to 0x7fac5ed4c000, len =  0xb8000, offset =  0x89000  r-xp  /usr/lib/libstdc++.so.6.0.25

Same thing, but adding the 'r' and 'x' bits.

> 0x7fac5ed4c000 to 0x7fac5ed89000, len =  0x3d000, offset = 0x141000  r--p  /usr/lib/libstdc++.so.6.0.25

0x7fac5ed4c000 - 0x141000 = 0x7fac5ec0b000

This is re-adding the 'r' bit for a different range, again without 
changing the contents.

> 0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25

0x7fac5ed8a000 - 0x17e000 = 0x7fac5ec0c000

Strange, what is this? Note the 'rw', though: it probably doesn't 
contain any executable code. In effect, perfparser and perf should then 
truncate the original mapping and never report this one to libdw, just 
like we do in perfparser now (for different reasons).
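
One possible explanation for the odd 0x7fac5ec0c000, going by the eu-readelf 
output posted earlier in this thread: the last LOAD segment has file offset 
0x17e8e0 but VirtAddr 0x17f8e0, so start - pgoff is off by 0x1000 for that 
mapping. A small sketch of both calculations (the function names are made up 
for illustration, and a 4K page size is assumed):

```cpp
#include <cassert>
#include <cstdint>

// "Virtual" start of the file: where file offset 0 would land if the whole
// file were mapped contiguously, i.e. start - pgoff.
inline std::uint64_t virtual_file_start(std::uint64_t start, std::uint64_t pgoff)
{
    assert(start >= pgoff); // guard against unsigned underflow
    return start - pgoff;
}

// Load base derived from a PT_LOAD header instead: mapping start minus the
// page-aligned p_vaddr of the segment that is mapped there.
inline std::uint64_t load_base(std::uint64_t start, std::uint64_t p_vaddr,
                               std::uint64_t page_size = 0x1000)
{
    const std::uint64_t aligned_vaddr = p_vaddr & ~(page_size - 1);
    assert(start >= aligned_vaddr);
    return start - aligned_vaddr;
}
```

For the rw-p mapping, `virtual_file_start(0x7fac5ed8a000, 0x17e000)` gives 
0x7fac5ec0c000, while `load_base(0x7fac5ed8a000, 0x17f8e0)` gives 
0x7fac5ec0b000, the same base as all the other mappings.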

So, what about the following algorithm, which can be done entirely 
outside of elfutils:

If we get an mmap with a pgoff, check if the same file is already mapped 
at the "virtual" start address of the file (start - pgoff), and if it 
is, ignore the mmap. That should deal with the first 3 overlaps here. 
The last one is really mapping a different part of the file, but as the 
mapping is not executable and read-write we should really never get a 
sample from it.

We might actually check the permission bits before reporting things to 
dwfl so that a few broken samples don't destroy the memory map. However, 
that partially contradicts the above. We'd need to OR up all the 
permissions from different mmaps covering the same file at the same base 
address to get an approximation for it, or we need a different data 
structure.
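
A minimal sketch of the first part, just to make the idea concrete. 
`MmapEvent` and `MmapFilter` are hypothetical names, not perfparser code; the 
permission check is left out, and keying by file name alone (without the pid) 
is a simplification:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical representation of one perf mmap event.
struct MmapEvent {
    std::uint64_t start;
    std::uint64_t len;
    std::uint64_t pgoff;
    std::string file;
};

class MmapFilter {
public:
    // Returns true if the event should be reported to libdw, false if the
    // same file is already known at the same "virtual" start address.
    bool should_report(const MmapEvent& ev)
    {
        assert(ev.start >= ev.pgoff); // guard against unsigned underflow
        const auto key = std::make_pair(ev.file, ev.start - ev.pgoff);
        if (ev.pgoff != 0 && m_seen.count(key) != 0)
            return false; // covered by an earlier mapping, ignore it
        m_seen.insert(key);
        return true;
    }

private:
    // (file name, start - pgoff) pairs seen so far
    std::set<std::pair<std::string, std::uint64_t>> m_seen;
};
```

Fed with the five libstdc++ events in order, this reports the first one, 
ignores the next three, and lets the last one (rw-p, virtual start 
0x7fac5ec0c000) through, so that one would still need the permission check 
or some other handling.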

best,
Ulf

* Re: Handling pgoff in perf elf mmap/mmap2 elf info
  2018-10-15 20:39               ` Milian Wolff
  2018-10-15 21:05                 ` Mark Wielaard
  2018-10-18  7:41                 ` Ulf Hermann
@ 2018-10-20 10:49                 ` Milian Wolff
  2 siblings, 0 replies; 17+ messages in thread
From: Milian Wolff @ 2018-10-20 10:49 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard, Ulf Hermann, Christoph Sterz

On Monday, 15 October 2018 22:38:53 CEST Milian Wolff wrote:
> On Thursday, 11 October 2018 20:14:43 CEST Milian Wolff wrote:
> > On Thursday, 11 October 2018 19:37:07 CEST Mark Wielaard wrote:
> > > Hi,
> > > 
> > > My apologies for not having looked deeper at this.
> > > It is a bit tricky and I just didn't have enough time to
> > > really sit down and think it all through yet.
> > > 
> > > On Thu, Oct 11, 2018 at 05:02:18PM +0000, Ulf Hermann wrote:
> > > > is there any pattern in how the loader maps the ELF sections into
> > > > memory? What sections does it actually map and which of those do we
> > > > need for unwinding?
> > > 
> > > Yes, it would be helpful to have some examples of mmap events plus
> > > the associated segment header (eu-readelf -l) of the ELF file.
> > > 
> > > Note that the kernel and dynamic loader will use the (PT_LOAD) segments,
> > > not the sections, to map things into memory. Each segment might contain
> > > multiple sections.
> > > 
> > > libdwfl then tries to associate the correct sections (and address bias)
> > > with how the ELF file was mapped into memory.
> > > 
> > > > I hope that only one of those MMAPs per ELF is actually meaningful and
> > > > we can simply add that one's pgoff as an extra member to Dwfl_Module
> > > > and use it whenever we poke the underlying file.
> > > 
> > > One "trick" might be to just subtract the pgoff from the load address.
> > > And so report as if the ELF file was being mapped from the start. This
> > > isn't really correct, but it might be interesting to see if that makes
> > > libdwfl able to just associate the whole ELF file with the correct
> > > address map.
> > 
> > I'll try to come up with some minimal code examples we can use to test all
> > of this. But from what I remember, neither of the above suggestions will
> > be sufficient as we can still run into overlapping module errors from
> > elfutils when we always load everything. I.e. I believe we've seen mappings
> > that eventually become partially obsoleted by a future mmap event. At that
> > point, we somehow need to be able to only map parts of a file, not all of
> > it. So just subtracting or honoring pgoff is not enough, I believe we also
> > need to be able to explicitly say how much of a file to map.
> > 
> > But to make this discussion easier to follow for others, I'll create some
> > standalone cpp code that takes a `perf script --show-mmap-events  | grep
> > PERF_RECORD_MMAP` input file and then runs this through elfutils API to
> > reproduce the issues we are facing.
> > 
> > I'll get back to you all once this is done.
> 
> Hey all,
> 
> here's one example of mmap events recorded by perf:
> 
> 0x7fac5ec0b000 to 0x7fac5ed9a000, len = 0x18f000, offset =        0  r--p  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ec94000 to 0x7fac5ed8a000, len =  0xf6000, offset =  0x89000  ---p  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ec94000 to 0x7fac5ed4c000, len =  0xb8000, offset =  0x89000  r-xp  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ed4c000 to 0x7fac5ed89000, len =  0x3d000, offset = 0x141000  r--p  /usr/lib/libstdc++.so.6.0.25
> 0x7fac5ed8a000 to 0x7fac5ed97000, len =   0xd000, offset = 0x17e000  rw-p  /usr/lib/libstdc++.so.6.0.25

Spending more time on this issue, I've come up with a seemingly viable 
approach to work around the libdwfl API limitations. Most notably, one must not 
take the raw mmap events and try to report them. Instead, we now associate the 
"base mmap", i.e. the first one with pgoff = 0, with the following mmaps for 
this file. Then, when we'd otherwise try to report one of the following mmaps, 
we look up the base mmap addr and use that in our interaction with libdwfl.

Now I'm not seeing any "overlapping address" errors anymore, and unwinding 
seems to work fine again for perf files with mmap events like in the above.
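
In code, the idea looks roughly like the following sketch. This is not the 
actual perfparser change: `BaseMmapTracker` is a made-up name, keying by file 
name only (ignoring the pid) is a simplification, and falling back to the raw 
start address when no base mmap is known is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper: remembers the "base mmap" (pgoff == 0) per file and
// hands out its address for every later mmap of the same file.
class BaseMmapTracker {
public:
    // Returns the address to use when talking to libdwfl about this event.
    std::uint64_t address_to_report(const std::string& file,
                                    std::uint64_t start, std::uint64_t pgoff)
    {
        if (pgoff == 0) {
            m_base[file] = start; // a new base mmap for this file
            return start;
        }
        const auto it = m_base.find(file);
        if (it == m_base.end())
            return start; // assumption: no base mmap seen, use the raw start
        // Sanity check under the assumption that the later mmaps of a file
        // always lie at or above its base mmap.
        assert(it->second <= start);
        return it->second;
    }

private:
    std::unordered_map<std::string, std::uint64_t> m_base;
};
```

For the libstdc++ events above, every call after the first one returns 
0x7fac5ec0b000, so libdwfl only ever hears about a single base address for 
the file.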

There are still quite a few broken backtraces, but so far I can't say what the 
reason for this is.

So, from my POV, I'm fine - I have a viable workaround. But from the POV of 
future users of libdwfl, I still believe it would be useful to have more 
control over what gets reported and where. I.e. instead of having libdwfl 
analyze the PT_LOAD segments, offer an API that would allow us to feed the 
mmapped regions directly with pgoff etc.

Thanks for the input everyone

-- 
Milian Wolff
mail@milianw.de
http://milianw.de


end of thread, other threads:[~2018-10-20 10:49 UTC | newest]

Thread overview: 17+ messages
2018-09-19 12:12 Handling pgoff in perf elf mmap/mmap2 elf info Christoph Sterz
2018-09-19 12:24 ` Ulf Hermann
2018-09-21 13:07   ` Mark Wielaard
2018-09-21 13:35     ` Ulf Hermann
2018-09-26 14:38     ` Milian Wolff
2018-10-09 20:33       ` Milian Wolff
2018-10-11 17:02         ` Ulf Hermann
2018-10-11 17:37           ` Mark Wielaard
2018-10-11 18:14             ` Milian Wolff
2018-10-15 20:39               ` Milian Wolff
2018-10-15 21:05                 ` Mark Wielaard
2018-10-15 21:06                   ` Milian Wolff
2018-10-17 14:52                     ` Milian Wolff
2018-10-17 22:26                       ` Mark Wielaard
2018-10-18  7:41                 ` Ulf Hermann
2018-10-20 10:49                 ` Milian Wolff
2018-10-15 20:48               ` Milian Wolff
