public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
* How to associate Elf with Dwfl_Module returned by dwfl_report_module
@ 2018-03-17 13:15 Milian Wolff
  2018-03-20 22:05 ` Mark Wielaard
  0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2018-03-17 13:15 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

Hey there,

a recurring issue in the libdwfl integration of perf and perfparser are 
supposedly overlapping modules. The perf data file contains the exact mappings 
for all files corresponding to the actual mmap events that occurred during 
runtime, e.g.:

```
$ perf script --show-mmap-events | grep MMAP | grep stdc
heaptrack_print 13962 87163.483450: PERF_RECORD_MMAP2 13962/13962: 
[0x7fd0aea84000(0x387000) @ 0 08:03 413039 3864781083]: r-xp /usr/lib/libstdc+
+.so.6.0.24
heaptrack_print 13962 87163.483454: PERF_RECORD_MMAP2 13962/13962: 
[0x7fd0aebfc000(0x1ff000) @ 0x178000 08:03 413039 3864781083]: ---p /usr/lib/
libstdc++.so.6.0.24
heaptrack_print 13962 87163.483458: PERF_RECORD_MMAP2 13962/13962: 
[0x7fd0aedfb000(0xd000) @ 0x177000 08:03 413039 3864781083]: rw-p /usr/lib/
libstdc++.so.6.0.24
heaptrack_print 13962 87163.484860: PERF_RECORD_MMAP2 13962/13962: 
[0x7fd0aedfb000(0xc000) @ 0x177000 08:03 413039 3864781083]: r--p /usr/lib/
libstdc++.so.6.0.24
```
So far, both perf and perfparser are using dwfl_report_elf to report the file. 
But that API is deducing the mapping addresses internally, which may or may 
not be what we saw at runtime. I suspect that this is the reason for some 
issues we are seeing, such as supposedly overlapping modules. 

Looking at the Dwfl API, I cannot figure out how to feed the mapping directly. 
There's dwfl_report_module, but how would I associate an Elf* and int fd with 
it, as done by dwfl_report_elf?

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-17 13:15 How to associate Elf with Dwfl_Module returned by dwfl_report_module Milian Wolff
@ 2018-03-20 22:05 ` Mark Wielaard
  2018-03-21 13:01   ` Milian Wolff
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Wielaard @ 2018-03-20 22:05 UTC (permalink / raw)
  To: Milian Wolff; +Cc: elfutils-devel

Hi Milian,

On Sat, Mar 17, 2018 at 02:14:48PM +0100, Milian Wolff wrote:
> a recurring issue in the libdwfl integration of perf and perfparser are 
> supposedly overlapping modules. The perf data file contains the exact mappings 
> for all files corresponding to the actual mmap events that occurred during 
> runtime, e.g.:
> 
> ```
> $ perf script --show-mmap-events | grep MMAP | grep stdc
> heaptrack_print 13962 87163.483450: PERF_RECORD_MMAP2 13962/13962: 
> [0x7fd0aea84000(0x387000) @ 0 08:03 413039 3864781083]: r-xp /usr/lib/libstdc+
> +.so.6.0.24
> heaptrack_print 13962 87163.483454: PERF_RECORD_MMAP2 13962/13962: 
> [0x7fd0aebfc000(0x1ff000) @ 0x178000 08:03 413039 3864781083]: ---p /usr/lib/
> libstdc++.so.6.0.24
> heaptrack_print 13962 87163.483458: PERF_RECORD_MMAP2 13962/13962: 
> [0x7fd0aedfb000(0xd000) @ 0x177000 08:03 413039 3864781083]: rw-p /usr/lib/
> libstdc++.so.6.0.24
> heaptrack_print 13962 87163.484860: PERF_RECORD_MMAP2 13962/13962: 
> [0x7fd0aedfb000(0xc000) @ 0x177000 08:03 413039 3864781083]: r--p /usr/lib/
> libstdc++.so.6.0.24
> ```
> So far, both perf and perfparser are using dwfl_report_elf to report the file. 
> But that API is deducing the mapping addresses internally, which may or may 
> not be what we saw at runtime. I suspect that this is the reason for some 
> issues we are seeing, such as supposedly overlapping modules. 

How exactly are you calling dwfl_report_elf?
Specifically are you using false for the add_p_vaddr argument?

And could you provide some example where the reported address is
wrong/different from the start address of the Dwfl_Module?

> Looking at the Dwfl API, I cannot figure out how to feed the mapping directly. 
> There's dwfl_report_module, but how would I associate an Elf* and int fd with 
> it, as done by dwfl_report_elf?

When using dwfl_report_module the find_elf callback will be called that
you registered with dwfl_begin. That callback is called "lazily" the
first time dwfl_module_getelf is called. The callback can then set the
Elf*. But that does mean you have to keep track yourself (or immediately
call dwfl_module_getelf).

I would like to understand better what is really going wrong with
dwfl_report_elf before diving into using dwfl_module_report.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-20 22:05 ` Mark Wielaard
@ 2018-03-21 13:01   ` Milian Wolff
  2018-03-21 14:23     ` Milian Wolff
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Milian Wolff @ 2018-03-21 13:01 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard

[-- Attachment #1: Type: text/plain, Size: 4512 bytes --]

On Dienstag, 20. März 2018 23:05:49 CET Mark Wielaard wrote:
> Hi Milian,

Hey Mark :)

> On Sat, Mar 17, 2018 at 02:14:48PM +0100, Milian Wolff wrote:
> > a recurring issue in the libdwfl integration of perf and perfparser are
> > supposedly overlapping modules. The perf data file contains the exact
> > mappings for all files corresponding to the actual mmap events that
> > occurred during runtime, e.g.:
> > 
> > ```
> > $ perf script --show-mmap-events | grep MMAP | grep stdc
> > heaptrack_print 13962 87163.483450: PERF_RECORD_MMAP2 13962/13962:
> > [0x7fd0aea84000(0x387000) @ 0 08:03 413039 3864781083]: r-xp
> > /usr/lib/libstdc+ +.so.6.0.24
> > heaptrack_print 13962 87163.483454: PERF_RECORD_MMAP2 13962/13962:
> > [0x7fd0aebfc000(0x1ff000) @ 0x178000 08:03 413039 3864781083]: ---p
> > /usr/lib/ libstdc++.so.6.0.24
> > heaptrack_print 13962 87163.483458: PERF_RECORD_MMAP2 13962/13962:
> > [0x7fd0aedfb000(0xd000) @ 0x177000 08:03 413039 3864781083]: rw-p
> > /usr/lib/
> > libstdc++.so.6.0.24
> > heaptrack_print 13962 87163.484860: PERF_RECORD_MMAP2 13962/13962:
> > [0x7fd0aedfb000(0xc000) @ 0x177000 08:03 413039 3864781083]: r--p
> > /usr/lib/
> > libstdc++.so.6.0.24
> > ```
> > So far, both perf and perfparser are using dwfl_report_elf to report the
> > file. But that API is deducing the mapping addresses internally, which
> > may or may not be what we saw at runtime. I suspect that this is the
> > reason for some issues we are seeing, such as supposedly overlapping
> > modules.
> 
> How exactly are you calling dwfl_report_elf?

Here's the code for the perf tools:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/
perf/util/unwind-libdw.c?h=perf/core#n52

Here's the code for the perfparser:

http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/
perfsymboltable.cpp#n479

Let's concentrate on perf for now, but perfparser has similar logic:

We parse the mmap events in the perf.data file and store that information. 
Note that the perf.data file does not contain events for munmap calls. Then 
while unwinding the callstack of a perf sample, we lookup the most recent mmap 
event for every given instruction pointer address, and ensure that the 
corresponding ELF was registered with libdw.

> Specifically are you using false for the add_p_vaddr argument?

Yes, we are.

> And could you provide some example where the reported address is
> wrong/different from the start address of the Dwfl_Module?

I don't think it's the start address that is wrong, rather it's the end 
address. But it's hard for me to come up with a small selfcontained example at 
this stage. I am regularly seeing broken backtraces for samples where I have 
the gut feeling that missing reported ELFs are to blame. But we report 
everything, except for scenarios where the mmap events seemingly overlap. This 
overlapping is, as far as I can see, actually a side effect of remapping 
taking place in the dynamic linker (i.e. a single dlopen/dynamic linked 
library can yield multiple mmap events). One way or another, we end up with a 
situation where we cannot report an ELF to dwfl due to two issues:

a) either ELF tells us we are overlapping some module and just stops which is 
bad, since we would actually much prefer the newly reported ELF to take 
precedence

b) we find an mmap event that with a non-zero pgoff, and have no clue how to 
call dwfl_report_elf and just give up.

In both cases, I was hopeing for dwfl_report_module to help since it seemingly 
allows me to exactly recreate the mapping that was traced originally.

> > Looking at the Dwfl API, I cannot figure out how to feed the mapping
> > directly. There's dwfl_report_module, but how would I associate an Elf*
> > and int fd with it, as done by dwfl_report_elf?
> 
> When using dwfl_report_module the find_elf callback will be called that
> you registered with dwfl_begin. That callback is called "lazily" the
> first time dwfl_module_getelf is called. The callback can then set the
> Elf*. But that does mean you have to keep track yourself (or immediately
> call dwfl_module_getelf).

Ah, thanks!

> I would like to understand better what is really going wrong with
> dwfl_report_elf before diving into using dwfl_module_report.

See above, I would very much value your input. I'm still far from having fully 
grasped this situation.

Thanks

-- 
Milian Wolff
mail@milianw.de
http://milianw.de

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-21 13:01   ` Milian Wolff
@ 2018-03-21 14:23     ` Milian Wolff
  2018-03-21 14:35       ` Ulf Hermann
  2018-03-21 21:31       ` Mark Wielaard
  2018-03-21 21:21     ` Mark Wielaard
  2018-03-22  9:11     ` Ulf Hermann
  2 siblings, 2 replies; 11+ messages in thread
From: Milian Wolff @ 2018-03-21 14:23 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard

[-- Attachment #1: Type: text/plain, Size: 5482 bytes --]

On Mittwoch, 21. März 2018 14:01:41 CET Milian Wolff wrote:
> On Dienstag, 20. März 2018 23:05:49 CET Mark Wielaard wrote:
> > Hi Milian,
> 
> Hey Mark :)
> 
> > On Sat, Mar 17, 2018 at 02:14:48PM +0100, Milian Wolff wrote:
> > > a recurring issue in the libdwfl integration of perf and perfparser are
> > > supposedly overlapping modules. The perf data file contains the exact
> > > mappings for all files corresponding to the actual mmap events that
> > > occurred during runtime, e.g.:
> > > 
> > > ```
> > > $ perf script --show-mmap-events | grep MMAP | grep stdc
> > > heaptrack_print 13962 87163.483450: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aea84000(0x387000) @ 0 08:03 413039 3864781083]: r-xp
> > > /usr/lib/libstdc+ +.so.6.0.24
> > > heaptrack_print 13962 87163.483454: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aebfc000(0x1ff000) @ 0x178000 08:03 413039 3864781083]: ---p
> > > /usr/lib/ libstdc++.so.6.0.24
> > > heaptrack_print 13962 87163.483458: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aedfb000(0xd000) @ 0x177000 08:03 413039 3864781083]: rw-p
> > > /usr/lib/
> > > libstdc++.so.6.0.24
> > > heaptrack_print 13962 87163.484860: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aedfb000(0xc000) @ 0x177000 08:03 413039 3864781083]: r--p
> > > /usr/lib/
> > > libstdc++.so.6.0.24
> > > ```
> > > So far, both perf and perfparser are using dwfl_report_elf to report the
> > > file. But that API is deducing the mapping addresses internally, which
> > > may or may not be what we saw at runtime. I suspect that this is the
> > > reason for some issues we are seeing, such as supposedly overlapping
> > > modules.
> > 
> > How exactly are you calling dwfl_report_elf?
> 
> Here's the code for the perf tools:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/
> perf/util/unwind-libdw.c?h=perf/core#n52
> 
> Here's the code for the perfparser:
> 
> http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/
> perfsymboltable.cpp#n479
> 
> Let's concentrate on perf for now, but perfparser has similar logic:
> 
> We parse the mmap events in the perf.data file and store that information.
> Note that the perf.data file does not contain events for munmap calls. Then
> while unwinding the callstack of a perf sample, we lookup the most recent
> mmap event for every given instruction pointer address, and ensure that the
> corresponding ELF was registered with libdw.
> 
> > Specifically are you using false for the add_p_vaddr argument?
> 
> Yes, we are.
> 
> > And could you provide some example where the reported address is
> > wrong/different from the start address of the Dwfl_Module?
> 
> I don't think it's the start address that is wrong, rather it's the end
> address. But it's hard for me to come up with a small selfcontained example
> at this stage. I am regularly seeing broken backtraces for samples where I
> have the gut feeling that missing reported ELFs are to blame. But we report
> everything, except for scenarios where the mmap events seemingly overlap.
> This overlapping is, as far as I can see, actually a side effect of
> remapping taking place in the dynamic linker (i.e. a single dlopen/dynamic
> linked library can yield multiple mmap events). One way or another, we end
> up with a situation where we cannot report an ELF to dwfl due to two
> issues:
> 
> a) either ELF tells us we are overlapping some module and just stops which
> is bad, since we would actually much prefer the newly reported ELF to take
> precedence
> 
> b) we find an mmap event that with a non-zero pgoff, and have no clue how to
> call dwfl_report_elf and just give up.
> 
> In both cases, I was hopeing for dwfl_report_module to help since it
> seemingly allows me to exactly recreate the mapping that was traced
> originally.

Here's one way to investigate where perf and dwfl disagree on the file 
mappings:

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 7bdd239c795c..739c68b73772 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -48,11 +48,17 @@ static int __report_module(struct addr_location *al, u64 
ip,
 			mod = 0;
 	}
 
-	if (!mod)
+	if (!mod) {
+		Dwarf_Addr s, e;
 		mod = dwfl_report_elf(ui->dwfl, dso->short_name,
 				      (dso->symsrc_filename ? dso->symsrc_filename : dso-
>long_name), -1, al->map->start,
 				      false);
 
+		dwfl_module_info(mod, NULL, &s, &e, NULL, NULL, NULL, NULL);
+		if (al->map->start != s || al->map->end != e)
+			pr_warning("MEH: %s | mmap: %zx %zx | dwfl: %zx %zx\n", dso-
>short_name, al->map->start, al->map->end, s, e);
+	}
+
 	return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
 }
 
On one of my perf.data files with many broken backtraces this gives me:

MEH: heaptrack_print | mmap: 56166e9d4000 56166ea39000 | dwfl: 56166e9d4000 
56166ea38880
MEH: ld-2.26.so | mmap: 7fd0afc6c000 7fd0afe93000 | dwfl: 7fd0afc6c000 
7fd0afe920f8
MEH: libc-2.26.so | mmap: 7fd0ae16a000 7fd0ae521000 | dwfl: 7fd0ae16a000 
7fd0ae5208f0
MEH: libstdc++.so.6.0.24 | mmap: 7fd0aea84000 7fd0aee0b000 | dwfl: 
7fd0aea84000 7fd0aee0a640
MEH: libzstd.so.1.3.3 | mmap: 7fd0aee0b000 7fd0af087000 | dwfl: 7fd0aee0b000 
7fd0af086030

Interestingly, here the mmap events observed by perf are actually always 
*larger* than what dwfl sees...
-- 
Milian Wolff
mail@milianw.de
http://milianw.de

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-21 14:23     ` Milian Wolff
@ 2018-03-21 14:35       ` Ulf Hermann
  2018-03-21 21:31       ` Mark Wielaard
  1 sibling, 0 replies; 11+ messages in thread
From: Ulf Hermann @ 2018-03-21 14:35 UTC (permalink / raw)
  To: elfutils-devel, milian.wolff

> MEH: heaptrack_print | mmap: 56166e9d4000 56166ea39000 | dwfl: 56166e9d4000 
> 56166ea38880
> MEH: ld-2.26.so | mmap: 7fd0afc6c000 7fd0afe93000 | dwfl: 7fd0afc6c000 
> 7fd0afe920f8
> MEH: libc-2.26.so | mmap: 7fd0ae16a000 7fd0ae521000 | dwfl: 7fd0ae16a000 
> 7fd0ae5208f0
> MEH: libstdc++.so.6.0.24 | mmap: 7fd0aea84000 7fd0aee0b000 | dwfl: 
> 7fd0aea84000 7fd0aee0a640
> MEH: libzstd.so.1.3.3 | mmap: 7fd0aee0b000 7fd0af087000 | dwfl: 7fd0aee0b000 
> 7fd0af086030
> 
> Interestingly, here the mmap events observed by perf are actually always 
> *larger* than what dwfl sees...

Do those differences map to some specific ELF sections? That is, does elfutils assume some sections are mmap'd even though they aren't or vice versa? As elfutils always assumes pgoff == 0 it should be fairly simple to see which sections those are (right?).

Ulf

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-21 13:01   ` Milian Wolff
  2018-03-21 14:23     ` Milian Wolff
@ 2018-03-21 21:21     ` Mark Wielaard
  2018-03-22 12:29       ` Milian Wolff
  2018-03-22  9:11     ` Ulf Hermann
  2 siblings, 1 reply; 11+ messages in thread
From: Mark Wielaard @ 2018-03-21 21:21 UTC (permalink / raw)
  To: Milian Wolff; +Cc: elfutils-devel

Hi Milian,

On Wed, Mar 21, 2018 at 02:01:41PM +0100, Milian Wolff wrote:
> Here's the code for the perf tools:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/
> perf/util/unwind-libdw.c?h=perf/core#n52
> 
> Here's the code for the perfparser:
> 
> http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/
> perfsymboltable.cpp#n479
> 
> Let's concentrate on perf for now, but perfparser has similar logic:
> 
> We parse the mmap events in the perf.data file and store that information. 
> Note that the perf.data file does not contain events for munmap calls. Then 
> while unwinding the callstack of a perf sample, we lookup the most recent mmap 
> event for every given instruction pointer address, and ensure that the 
> corresponding ELF was registered with libdw.

So, modules are never deregistered?
In that case, that might explain the issue.
But I see there is a check if there is already something at the address.
The interface to "remove" a module might not be immediately clear.
The idea is that if modules need to be remove you'll call
dwfl_report_begin, possibly dwfl_report_elf for any new module and then
dwfl_report_end has a callback that gets all old modules and decides
whether to re-report them, or they'll get removed. You might want to
experiment with doing that and not re-report any module that overlaps
with the new module. (See the libdwfl.h documentation for a hopefully
clearer description.)

> > Specifically are you using false for the add_p_vaddr argument?
> 
> Yes, we are.
> 
> > And could you provide some example where the reported address is
> > wrong/different from the start address of the Dwfl_Module?
> 
> I don't think it's the start address that is wrong, rather it's the end 
> address. But it's hard for me to come up with a small selfcontained example at 
> this stage. I am regularly seeing broken backtraces for samples where I have 
> the gut feeling that missing reported ELFs are to blame. But we report 
> everything, except for scenarios where the mmap events seemingly overlap. This 
> overlapping is, as far as I can see, actually a side effect of remapping 
> taking place in the dynamic linker (i.e. a single dlopen/dynamic linked 
> library can yield multiple mmap events). One way or another, we end up with a 
> situation where we cannot report an ELF to dwfl due to two issues:
> 
> a) either ELF tells us we are overlapping some module and just stops which is 
> bad, since we would actually much prefer the newly reported ELF to take 
> precedence
> 
> b) we find an mmap event that with a non-zero pgoff, and have no clue how to 
> call dwfl_report_elf and just give up.
> 
> In both cases, I was hopeing for dwfl_report_module to help since it seemingly 
> allows me to exactly recreate the mapping that was traced originally.

If you could add some logging and post that plus the eu-readelf -l
output of the ELF file, that might help track down what is really going
on.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-21 14:23     ` Milian Wolff
  2018-03-21 14:35       ` Ulf Hermann
@ 2018-03-21 21:31       ` Mark Wielaard
  2018-03-22 12:30         ` Milian Wolff
  1 sibling, 1 reply; 11+ messages in thread
From: Mark Wielaard @ 2018-03-21 21:31 UTC (permalink / raw)
  To: Milian Wolff; +Cc: elfutils-devel

On Wed, Mar 21, 2018 at 03:23:51PM +0100, Milian Wolff wrote:
> Here's one way to investigate where perf and dwfl disagree on the file 
> mappings:
> 
> diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
> index 7bdd239c795c..739c68b73772 100644
> --- a/tools/perf/util/unwind-libdw.c
> +++ b/tools/perf/util/unwind-libdw.c
> @@ -48,11 +48,17 @@ static int __report_module(struct addr_location *al, u64 
> ip,
>  			mod = 0;
>  	}
>  
> -	if (!mod)
> +	if (!mod) {
> +		Dwarf_Addr s, e;
>  		mod = dwfl_report_elf(ui->dwfl, dso->short_name,
>  				      (dso->symsrc_filename ? dso->symsrc_filename : dso-
> >long_name), -1, al->map->start,
>  				      false);
>  
> +		dwfl_module_info(mod, NULL, &s, &e, NULL, NULL, NULL, NULL);
> +		if (al->map->start != s || al->map->end != e)
> +			pr_warning("MEH: %s | mmap: %zx %zx | dwfl: %zx %zx\n", dso-
> >short_name, al->map->start, al->map->end, s, e);
> +	}
> +
>  	return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
>  }
>  
> On one of my perf.data files with many broken backtraces this gives me:
> 
> MEH: heaptrack_print | mmap: 56166e9d4000 56166ea39000 | dwfl: 56166e9d4000 
> 56166ea38880
> MEH: ld-2.26.so | mmap: 7fd0afc6c000 7fd0afe93000 | dwfl: 7fd0afc6c000 
> 7fd0afe920f8
> MEH: libc-2.26.so | mmap: 7fd0ae16a000 7fd0ae521000 | dwfl: 7fd0ae16a000 
> 7fd0ae5208f0
> MEH: libstdc++.so.6.0.24 | mmap: 7fd0aea84000 7fd0aee0b000 | dwfl: 
> 7fd0aea84000 7fd0aee0a640
> MEH: libzstd.so.1.3.3 | mmap: 7fd0aee0b000 7fd0af087000 | dwfl: 7fd0aee0b000 
> 7fd0af086030
> 
> Interestingly, here the mmap events observed by perf are actually always 
> *larger* than what dwfl sees...

I think that is simply the end address being extended to the next page.
Where the page size seems to be 4K (0x1000). I assume if you look at
the actual LOAD segment sizes of the files (eu-readelf -l) they match
with what dwfl reports, and if it extends to the nearest 4k page it
matches whatp mmap reports.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-21 13:01   ` Milian Wolff
  2018-03-21 14:23     ` Milian Wolff
  2018-03-21 21:21     ` Mark Wielaard
@ 2018-03-22  9:11     ` Ulf Hermann
  2018-03-22 12:33       ` Milian Wolff
  2 siblings, 1 reply; 11+ messages in thread
From: Ulf Hermann @ 2018-03-22  9:11 UTC (permalink / raw)
  To: elfutils-devel

Hi Milian,

> I am regularly seeing broken backtraces for samples where I have 
> the gut feeling that missing reported ELFs are to blame. But we report 
> everything, except for scenarios where the mmap events seemingly overlap.

Actually, at least for perfparser that's not quite true. When perfparser encounters an overlap error, it will throw out the entire set of mappings and restart reporting, with the addresses from the current sample (see PerfSymbolTable::reportElf() and PerfSymbolTable::clearCache()). If that still gives you overlapping ranges, it means perf has not sent all the mmap events and therefore we're reporting the wrong ELF for some address in your sample. That wrong ELF may be larger than the one we actually want and therefore it can overlap some other ELF an address in your sample points to.

I've seen that happen. Make sure to keep your sample rate low enough to prevent perf from dropping anything.

I realize we could optimize the reporting a bit, with the dwfl_report_end callback Mark mentioned, but if you have addresses into two overlapping ELFs in one sample, that's fundamentally impossible to unwind.

Ulf

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-21 21:21     ` Mark Wielaard
@ 2018-03-22 12:29       ` Milian Wolff
  0 siblings, 0 replies; 11+ messages in thread
From: Milian Wolff @ 2018-03-22 12:29 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard

[-- Attachment #1: Type: text/plain, Size: 3707 bytes --]

On Mittwoch, 21. März 2018 22:21:13 CET Mark Wielaard wrote:
> Hi Milian,
> 
> On Wed, Mar 21, 2018 at 02:01:41PM +0100, Milian Wolff wrote:
> > Here's the code for the perf tools:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/
> > perf/util/unwind-libdw.c?h=perf/core#n52
> > 
> > Here's the code for the perfparser:
> > 
> > http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/
> > perfsymboltable.cpp#n479
> > 
> > Let's concentrate on perf for now, but perfparser has similar logic:
> > 
> > We parse the mmap events in the perf.data file and store that information.
> > Note that the perf.data file does not contain events for munmap calls.
> > Then
> > while unwinding the callstack of a perf sample, we lookup the most recent
> > mmap event for every given instruction pointer address, and ensure that
> > the corresponding ELF was registered with libdw.
> 
> So, modules are never deregistered?
> In that case, that might explain the issue.

No, they are deregistered - that is not the issue. Perf actually starts with a 
clean dwfl on every sample and registers whatever modules are relevant for the 
given sample. perfparser tries to be a bit smarter and caches more, but also 
has code to deregister if something goes amiss.

> But I see there is a check if there is already something at the address.
> The interface to "remove" a module might not be immediately clear.
> The idea is that if modules need to be remove you'll call
> dwfl_report_begin, possibly dwfl_report_elf for any new module and then
> dwfl_report_end has a callback that gets all old modules and decides
> whether to re-report them, or they'll get removed. You might want to
> experiment with doing that and not re-report any module that overlaps
> with the new module. (See the libdwfl.h documentation for a hopefully
> clearer description.)
> 
> > > Specifically are you using false for the add_p_vaddr argument?
> > 
> > Yes, we are.
> > 
> > > And could you provide some example where the reported address is
> > > wrong/different from the start address of the Dwfl_Module?
> > 
> > I don't think it's the start address that is wrong, rather it's the end
> > address. But it's hard for me to come up with a small selfcontained
> > example at this stage. I am regularly seeing broken backtraces for
> > samples where I have the gut feeling that missing reported ELFs are to
> > blame. But we report everything, except for scenarios where the mmap
> > events seemingly overlap. This overlapping is, as far as I can see,
> > actually a side effect of remapping taking place in the dynamic linker
> > (i.e. a single dlopen/dynamic linked library can yield multiple mmap
> > events). One way or another, we end up with a situation where we cannot
> > report an ELF to dwfl due to two issues:
> > 
> > a) either ELF tells us we are overlapping some module and just stops which
> > is bad, since we would actually much prefer the newly reported ELF to
> > take precedence
> > 
> > b) we find an mmap event that with a non-zero pgoff, and have no clue how
> > to call dwfl_report_elf and just give up.
> > 
> > In both cases, I was hopeing for dwfl_report_module to help since it
> > seemingly allows me to exactly recreate the mapping that was traced
> > originally.
> If you could add some logging and post that plus the eu-readelf -l
> output of the ELF file, that might help track down what is really going
> on.

Yes, I will try to find the time to write a more elaborate reproducer for this 
issue, to better figure out what is going on here.

Bye

-- 
Milian Wolff
mail@milianw.de
http://milianw.de

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-21 21:31       ` Mark Wielaard
@ 2018-03-22 12:30         ` Milian Wolff
  0 siblings, 0 replies; 11+ messages in thread
From: Milian Wolff @ 2018-03-22 12:30 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Mark Wielaard

[-- Attachment #1: Type: text/plain, Size: 2579 bytes --]

On Mittwoch, 21. März 2018 22:31:07 CET Mark Wielaard wrote:
> On Wed, Mar 21, 2018 at 03:23:51PM +0100, Milian Wolff wrote:
> > Here's one way to investigate where perf and dwfl disagree on the file
> > mappings:
> > 
> > diff --git a/tools/perf/util/unwind-libdw.c
> > b/tools/perf/util/unwind-libdw.c index 7bdd239c795c..739c68b73772 100644
> > --- a/tools/perf/util/unwind-libdw.c
> > +++ b/tools/perf/util/unwind-libdw.c
> > @@ -48,11 +48,17 @@ static int __report_module(struct addr_location *al,
> > u64 ip,
> > 
> >  			mod = 0;
> >  	
> >  	}
> > 
> > -	if (!mod)
> > +	if (!mod) {
> > +		Dwarf_Addr s, e;
> > 
> >  		mod = dwfl_report_elf(ui->dwfl, dso->short_name,
> >  		
> >  				      (dso->symsrc_filename ? dso->symsrc_filename : dso-
> > >
> > >long_name), -1, al->map->start,
> > >
> >  				      false);
> > 
> > +		dwfl_module_info(mod, NULL, &s, &e, NULL, NULL, NULL, NULL);
> > +		if (al->map->start != s || al->map->end != e)
> > +			pr_warning("MEH: %s | mmap: %zx %zx | dwfl: %zx %zx\n", dso-
> > 
> > >short_name, al->map->start, al->map->end, s, e);
> > 
> > +	}
> > +
> > 
> >  	return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
> >  
> >  }
> > 
> > On one of my perf.data files with many broken backtraces this gives me:
> > 
> > MEH: heaptrack_print | mmap: 56166e9d4000 56166ea39000 | dwfl:
> > 56166e9d4000
> > 56166ea38880
> > MEH: ld-2.26.so | mmap: 7fd0afc6c000 7fd0afe93000 | dwfl: 7fd0afc6c000
> > 7fd0afe920f8
> > MEH: libc-2.26.so | mmap: 7fd0ae16a000 7fd0ae521000 | dwfl: 7fd0ae16a000
> > 7fd0ae5208f0
> > MEH: libstdc++.so.6.0.24 | mmap: 7fd0aea84000 7fd0aee0b000 | dwfl:
> > 7fd0aea84000 7fd0aee0a640
> > MEH: libzstd.so.1.3.3 | mmap: 7fd0aee0b000 7fd0af087000 | dwfl:
> > 7fd0aee0b000 7fd0af086030
> > 
> > Interestingly, here the mmap events observed by perf are actually always
> > *larger* than what dwfl sees...
> 
> I think that is simply the end address being extended to the next page.
> Where the page size seems to be 4K (0x1000). I assume if you look at
> the actual LOAD segment sizes of the files (eu-readelf -l) they match
> with what dwfl reports, and if it extends to the nearest 4k page it
> matches whatp mmap reports.

Yes, I agree - the above does not explain the issues I am seeing. It is most 
probably just noise and can be ignored.

I'll try to dig deeper to figure out what is going on and then come back here. 
For now, it seems like using dwfl_report_elf is fine.

Cheers

-- 
Milian Wolff
mail@milianw.de
http://milianw.de

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module
  2018-03-22  9:11     ` Ulf Hermann
@ 2018-03-22 12:33       ` Milian Wolff
  0 siblings, 0 replies; 11+ messages in thread
From: Milian Wolff @ 2018-03-22 12:33 UTC (permalink / raw)
  To: elfutils-devel; +Cc: Ulf Hermann

[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]

On Donnerstag, 22. März 2018 10:11:29 CET Ulf Hermann wrote:
> Hi Milian,
> 
> > I am regularly seeing broken backtraces for samples where I have
> > the gut feeling that missing reported ELFs are to blame. But we report
> > everything, except for scenarios where the mmap events seemingly overlap.
> 
> Actually, at least for perfparser that's not quite true. When perfparser
> encounters an overlap error, it will throw out the entire set of mappings
> and restart reporting, with the addresses from the current sample (see
> PerfSymbolTable::reportElf() and PerfSymbolTable::clearCache()).

Yes, I know :) I was more talking about the libdwfl integraiton in perf there.

> If that
> still gives you overlapping ranges, it means perf has not sent all the mmap
> events and therefore we're reporting the wrong ELF for some address in your
> sample. That wrong ELF may be larger than the one we actually want and
> therefore it can overlap some other ELF an address in your sample points
> to.
> 
> I've seen that happen. Make sure to keep your sample rate low enough to
> prevent perf from dropping anything.
> 
> I realize we could optimize the reporting a bit, with the dwfl_report_end
> callback Mark mentioned, but if you have addresses into two overlapping
> ELFs in one sample, that's fundamentally impossible to unwind.

In the concrete data file I have at hand, no chunks got lost, so I don't think 
that mmap events could have been lost?

Cheers

-- 
Milian Wolff
mail@milianw.de
http://milianw.de

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-03-22 12:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-17 13:15 How to associate Elf with Dwfl_Module returned by dwfl_report_module Milian Wolff
2018-03-20 22:05 ` Mark Wielaard
2018-03-21 13:01   ` Milian Wolff
2018-03-21 14:23     ` Milian Wolff
2018-03-21 14:35       ` Ulf Hermann
2018-03-21 21:31       ` Mark Wielaard
2018-03-22 12:30         ` Milian Wolff
2018-03-21 21:21     ` Mark Wielaard
2018-03-22 12:29       ` Milian Wolff
2018-03-22  9:11     ` Ulf Hermann
2018-03-22 12:33       ` Milian Wolff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).