On Wednesday, 21 March 2018 22:21:13 CET Mark Wielaard wrote:
> Hi Milian,
>
> On Wed, Mar 21, 2018 at 02:01:41PM +0100, Milian Wolff wrote:
> > Here's the code for the perf tools:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/unwind-libdw.c?h=perf/core#n52
> >
> > Here's the code for the perfparser:
> >
> > http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/perfsymboltable.cpp#n479
> >
> > Let's concentrate on perf for now, but perfparser has similar logic:
> >
> > We parse the mmap events in the perf.data file and store that information.
> > Note that the perf.data file does not contain events for munmap calls.
> > Then, while unwinding the callstack of a perf sample, we look up the most
> > recent mmap event for every given instruction pointer address, and ensure
> > that the corresponding ELF was registered with libdw.
>
> So, modules are never deregistered?
> In that case, that might explain the issue.

No, they are deregistered - that is not the issue. Perf actually starts with a
clean dwfl on every sample and registers whatever modules are relevant for the
given sample. perfparser tries to be a bit smarter and caches more, but also
has code to deregister if something goes amiss.

> But I see there is a check if there is already something at the address.
> The interface to "remove" a module might not be immediately clear.
> The idea is that if modules need to be removed you'll call
> dwfl_report_begin, possibly dwfl_report_elf for any new module and then
> dwfl_report_end has a callback that gets all old modules and decides
> whether to re-report them, or they'll get removed. You might want to
> experiment with doing that and not re-report any module that overlaps
> with the new module. (See the libdwfl.h documentation for a hopefully
> clearer description.)
>
> > > Specifically are you using false for the add_p_vaddr argument?
> >
> > Yes, we are.
> >
> > > And could you provide some example where the reported address is
> > > wrong/different from the start address of the Dwfl_Module?
> >
> > I don't think it's the start address that is wrong, rather it's the end
> > address. But it's hard for me to come up with a small self-contained
> > example at this stage. I am regularly seeing broken backtraces for
> > samples where I have the gut feeling that missing reported ELFs are to
> > blame. But we report everything, except for scenarios where the mmap
> > events seemingly overlap. This overlapping is, as far as I can see,
> > actually a side effect of remapping taking place in the dynamic linker
> > (i.e. a single dlopen/dynamic linked library can yield multiple mmap
> > events). One way or another, we end up with a situation where we cannot
> > report an ELF to dwfl due to two issues:
> >
> > a) either ELF tells us we are overlapping some module and just stops which
> > is bad, since we would actually much prefer the newly reported ELF to
> > take precedence
> >
> > b) we find an mmap event with a non-zero pgoff, and have no clue how
> > to call dwfl_report_elf and just give up.
> >
> > In both cases, I was hoping for dwfl_report_module to help since it
> > seemingly allows me to exactly recreate the mapping that was traced
> > originally.
>
> If you could add some logging and post that plus the eu-readelf -l
> output of the ELF file, that might help track down what is really going
> on.

Yes, I will try to find the time to write a more elaborate reproducer for this
issue, to better figure out what is going on here.

Bye
-- 
Milian Wolff
mail@milianw.de
http://milianw.de
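
P.S. To make the "most recent mmap event wins" lookup from the quoted text
concrete, here is a minimal sketch in plain C. It is not the code from perf or
perfparser: struct mmap_event and find_latest_mapping are hypothetical names,
the record is heavily simplified, and it assumes the events are stored oldest
first in the order they appear in perf.data. It does not call libdw at all; it
only models which mapping would be picked for a given instruction pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record of one mmap event from perf.data. */
struct mmap_event {
    uint64_t start;   /* first mapped address */
    uint64_t len;     /* length of the mapping in bytes */
    uint64_t pgoff;   /* offset into the backing file */
    const char *path; /* backing file name */
};

/*
 * Return the most recent event whose range [start, start + len) contains ip,
 * or NULL if no event covers it. The array must be ordered oldest first, so
 * we scan backwards and let a later (re)mapping shadow an earlier one.
 */
static const struct mmap_event *
find_latest_mapping(const struct mmap_event *events, size_t count, uint64_t ip)
{
    assert(events != NULL || count == 0);
    for (size_t i = count; i-- > 0;) {
        const struct mmap_event *ev = &events[i];
        /* Written as a subtraction so start + len cannot overflow. */
        if (ip >= ev->start && ip - ev->start < ev->len)
            return ev;
    }
    return NULL;
}
```

With two events for the same library, say one covering 0x1000-0x4000 with
pgoff 0 and a later one covering 0x2000-0x3000 with pgoff 0x1000, an address
in the later range resolves to the second event, while addresses outside it
still resolve to the first. That is the overlapping situation from a) and b)
above, where the newer mapping should take precedence.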