On Mittwoch, 21. März 2018 14:01:41 CET Milian Wolff wrote: > On Dienstag, 20. März 2018 23:05:49 CET Mark Wielaard wrote: > > Hi Milian, > > Hey Mark :) > > > On Sat, Mar 17, 2018 at 02:14:48PM +0100, Milian Wolff wrote: > > > a recurring issue in the libdwfl integration of perf and perfparser are > > > supposedly overlapping modules. The perf data file contains the exact > > > mappings for all files corresponding to the actual mmap events that > > > occurred during runtime, e.g.: > > > > > > ``` > > > $ perf script --show-mmap-events | grep MMAP | grep stdc > > > heaptrack_print 13962 87163.483450: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aea84000(0x387000) @ 0 08:03 413039 3864781083]: r-xp > > > /usr/lib/libstdc+ +.so.6.0.24 > > > heaptrack_print 13962 87163.483454: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aebfc000(0x1ff000) @ 0x178000 08:03 413039 3864781083]: ---p > > > /usr/lib/ libstdc++.so.6.0.24 > > > heaptrack_print 13962 87163.483458: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aedfb000(0xd000) @ 0x177000 08:03 413039 3864781083]: rw-p > > > /usr/lib/ > > > libstdc++.so.6.0.24 > > > heaptrack_print 13962 87163.484860: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aedfb000(0xc000) @ 0x177000 08:03 413039 3864781083]: r--p > > > /usr/lib/ > > > libstdc++.so.6.0.24 > > > ``` > > > So far, both perf and perfparser are using dwfl_report_elf to report the > > > file. But that API is deducing the mapping addresses internally, which > > > may or may not be what we saw at runtime. I suspect that this is the > > > reason for some issues we are seeing, such as supposedly overlapping > > > modules. > > > > How exactly are you calling dwfl_report_elf? > > Here's the code for the perf tools: > > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/ > perf/util/unwind-libdw.c?h=perf/core#n52 > > Here's the code for the perfparser: > > http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/ > perfsymboltable.cpp#n479 > > Let's concentrate on perf for now, but perfparser has similar logic: > > We parse the mmap events in the perf.data file and store that information. > Note that the perf.data file does not contain events for munmap calls. Then > while unwinding the callstack of a perf sample, we lookup the most recent > mmap event for every given instruction pointer address, and ensure that the > corresponding ELF was registered with libdw. > > > Specifically are you using false for the add_p_vaddr argument? > > Yes, we are. > > > And could you provide some example where the reported address is > > wrong/different from the start address of the Dwfl_Module? > > I don't think it's the start address that is wrong, rather it's the end > address. But it's hard for me to come up with a small selfcontained example > at this stage. I am regularly seeing broken backtraces for samples where I > have the gut feeling that missing reported ELFs are to blame. But we report > everything, except for scenarios where the mmap events seemingly overlap. > This overlapping is, as far as I can see, actually a side effect of > remapping taking place in the dynamic linker (i.e. a single dlopen/dynamic > linked library can yield multiple mmap events). One way or another, we end > up with a situation where we cannot report an ELF to dwfl due to two > issues: > > a) either ELF tells us we are overlapping some module and just stops which > is bad, since we would actually much prefer the newly reported ELF to take > precedence > > b) we find an mmap event that with a non-zero pgoff, and have no clue how to > call dwfl_report_elf and just give up. > > In both cases, I was hopeing for dwfl_report_module to help since it > seemingly allows me to exactly recreate the mapping that was traced > originally. Here's one way to investigate where perf and dwfl disagree on the file mappings: diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c index 7bdd239c795c..739c68b73772 100644 --- a/tools/perf/util/unwind-libdw.c +++ b/tools/perf/util/unwind-libdw.c @@ -48,11 +48,17 @@ static int __report_module(struct addr_location *al, u64 ip, mod = 0; } - if (!mod) + if (!mod) { + Dwarf_Addr s, e; mod = dwfl_report_elf(ui->dwfl, dso->short_name, (dso->symsrc_filename ? dso->symsrc_filename : dso- >long_name), -1, al->map->start, false); + dwfl_module_info(mod, NULL, &s, &e, NULL, NULL, NULL, NULL); + if (al->map->start != s || al->map->end != e) + pr_warning("MEH: %s | mmap: %zx %zx | dwfl: %zx %zx\n", dso- >short_name, al->map->start, al->map->end, s, e); + } + return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1; } On one of my perf.data files with many broken backtraces this gives me: MEH: heaptrack_print | mmap: 56166e9d4000 56166ea39000 | dwfl: 56166e9d4000 56166ea38880 MEH: ld-2.26.so | mmap: 7fd0afc6c000 7fd0afe93000 | dwfl: 7fd0afc6c000 7fd0afe920f8 MEH: libc-2.26.so | mmap: 7fd0ae16a000 7fd0ae521000 | dwfl: 7fd0ae16a000 7fd0ae5208f0 MEH: libstdc++.so.6.0.24 | mmap: 7fd0aea84000 7fd0aee0b000 | dwfl: 7fd0aea84000 7fd0aee0a640 MEH: libzstd.so.1.3.3 | mmap: 7fd0aee0b000 7fd0af087000 | dwfl: 7fd0aee0b000 7fd0af086030 Interestingly, here the mmap events observed by perf are actually always *larger* than what dwfl sees... -- Milian Wolff mail@milianw.de http://milianw.de