* dwfl_module_addrdie fails for binaries built with clang++
From: Milian Wolff @ 2017-05-04 16:47 UTC
To: elfutils-devel; +Cc: Mark Wielaard

Hey all,

I noticed that elfutils fails to handle clang binaries when we want to find a
DIE for a certain address. I.e. dwfl_module_addrdie returns nullptr, and
eu-addr2line fails to resolve inlined frames.

To reproduce this:

~~~~~~~~~~~
$ cat test.cpp
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += uniform(engine);
    }
    cout << "random sum: " << s << '\n';
    return 0;
}
$ clang++ -O2 -g test.cpp -o test --std=c++11
$ objdump -Sl test | grep random.h -A2
<snip>
  400a10:       48 89 f0                mov    %rsi,%rax
--
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.h:1818
operator()(_UniformRandomNumberGenerator& __urng, const param_type& __p)
$ addr2line -e test -i 400a10
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.h:143
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.h:151
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.h:332
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.tcc:3332
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.h:183
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.h:1818
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/6.3.1/../../../../include/c++/6.3.1/bits/random.h:1809
/tmp/test.cpp:13
$ eu-addr2line -e test -i 400a10
??:0
~~~~~~~~~~~~~~~~~~~~~

This also affects us in our perfparser. Not being able to find a cudie means
not finding inlined frames nor file/line mappings, which is quite a setback.

I have noticed that backward-cpp contains a (partial) workaround for this:

https://github.com/bombela/backward-cpp/blob/master/backward.hpp#L1216

Is this the right approach, and is it also what the non-eu addr2line does? If
so, can that be added upstream too, such that dwfl_module_addrdie can be
relied on?

I've seen it on clang 3.6, 4 and 5. Neither passing -g3 nor -gdwarf-aranges
helps.

Thanks
--
Milian Wolff
mail@milianw.de
http://milianw.de

* Re: dwfl_module_addrdie fails for binaries built with clang++
From: Mark Wielaard @ 2017-05-05 16:25 UTC
To: Milian Wolff; +Cc: elfutils-devel

Hi Milian,

On Thu, 2017-05-04 at 18:05 +0200, Milian Wolff wrote:
> I noticed that elfutils fails to handle clang binaries when we want to find a
> DIE for a certain address. I.e. dwfl_module_addrdie returns nullptr, and
> eu-addr2line fails to resolve inlined frames.
>
> To reproduce this:
>[...]
>
> This also affects us in our perfparser. Not being able to find a cudie means
> not finding inlined frames nor file/line mappings, which is quite a setback.
>
> I have noticed that backward-cpp contains a (partial) workaround for this:
>
> https://github.com/bombela/backward-cpp/blob/master/backward.hpp#L1216

O urgh how utterly broken (not backward-cpp, but the bogus DWARF clang
generates). As that comment says:

    // Sadly clang does not generate the section .debug_aranges, thus
    // dwfl_module_addrdie will fail early. Clang doesn't either set
    // the lowpc/highpc/range info for every compilation unit.
    //
    // So in order to save the world:
    // for every compilation unit, we will iterate over every single
    // DIEs. Normally functions should have a lowpc/highpc/range, which
    // we will use to infer the compilation unit.

    // note that this is probably badly inefficient.

And indeed having to scan through every CU to find a matching function
DIE is badly inefficient :{

> Is this the right approach and also what the non-eu addr2line does? If so, can
> that be added upstream too, such that dwfl_module_addrdie can be relied on?
>
> I've seen it on clang 3.6, 4 and 5. Neither passing -g3 nor -gdwarf-aranges
> helps.

Thanks for reporting this. I think this might be the same issue seen
here: https://sourceware.org/bugzilla/show_bug.cgi?id=21247
... or at least it seems related. The function/address not found in that
case also comes from a CU generated by clang. It does have a lowpc and
ranges, but the lowpc looks bogus (zero) and the ranges don't seem to
cover the function in question. So it seems even worse than your example
where there are no lowpc/ranges. We cannot even trust them if they are
there. Sigh.

I have to think about how to handle this. We clearly need something that
just ignores the lowpc/highpc/ranges on CUs and parses every CU until the
function/address DIE is found, to know which CU and line_table to use.
But that is so inefficient that I don't want to do that by default.

Cheers,

Mark

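The fallback discussed above (skip the CU-level lowpc/highpc/ranges and look at every function DIE instead) can be sketched as a small standalone model. This is only an illustration under stated assumptions: `AddrRange`, `Function`, `CompileUnit`, `findCu` and the `allowSlowScan` switch are hypothetical names invented for this sketch, and nothing here calls libdw or reflects how elfutils or backward-cpp are actually implemented. It also only covers the case where the CU-level ranges are missing or fail to cover the address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for DWARF data; these are NOT libdw types.
struct AddrRange {
    std::uint64_t low;   // inclusive start address
    std::uint64_t high;  // exclusive end address
    bool contains(std::uint64_t pc) const { return low <= pc && pc < high; }
};

struct Function {
    AddrRange range;     // lowpc/highpc of one function DIE
};

struct CompileUnit {
    std::vector<AddrRange> cuRanges;  // CU-level ranges; may be empty or bogus
    std::vector<Function> functions;  // function DIEs inside this CU
};

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Fast path: trust the CU-level ranges, as a consumer normally would.
std::size_t findCuByCuRanges(const std::vector<CompileUnit>& cus, std::uint64_t pc)
{
    for (std::size_t i = 0; i < cus.size(); ++i)
        for (const AddrRange& r : cus[i].cuRanges)
            if (r.contains(pc))
                return i;
    return kNotFound;
}

// Slow path: ignore CU-level ranges and inspect every function of every CU.
std::size_t findCuByFunctionScan(const std::vector<CompileUnit>& cus, std::uint64_t pc)
{
    for (std::size_t i = 0; i < cus.size(); ++i)
        for (const Function& f : cus[i].functions)
            if (f.range.contains(pc))
                return i;
    return kNotFound;
}

// Opt-in lookup: the expensive scan runs only when the caller asks for it.
std::size_t findCu(const std::vector<CompileUnit>& cus, std::uint64_t pc, bool allowSlowScan)
{
    std::size_t i = findCuByCuRanges(cus, pc);
    if (i == kNotFound && allowSlowScan)
        i = findCuByFunctionScan(cus, pc);
    assert(i == kNotFound || i < cus.size());
    return i;
}
```

Keeping the scan behind an explicit switch mirrors the concern in the message above: an address that matches nothing stays cheap unless the caller knowingly accepts the cost of visiting every CU.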
* Re: dwfl_module_addrdie fails for binaries built with clang++
From: Milian Wolff @ 2017-05-08 10:42 UTC
To: Mark Wielaard; +Cc: elfutils-devel

On Friday, 5 May 2017 15:06:48 CEST Mark Wielaard wrote:
> Hi Milian,
>
> On Thu, 2017-05-04 at 18:05 +0200, Milian Wolff wrote:
> > I noticed that elfutils fails to handle clang binaries when we want to
> > find a DIE for a certain address. I.e. dwfl_module_addrdie returns
> > nullptr, and eu-addr2line fails to resolve inlined frames.
> >
> > To reproduce this:
> >[...]
> >
> > This also affects us in our perfparser. Not being able to find a cudie
> > means not finding inlined frames nor file/line mappings, which is quite a
> > setback.
> >
> > I have noticed that backward-cpp contains a (partial) workaround for
> > this:
> >
> > https://github.com/bombela/backward-cpp/blob/master/backward.hpp#L1216
>
> O urgh how utterly broken (not backward-cpp, but the bogus DWARF clang
> generates). As that comment says:
>
>     // Sadly clang does not generate the section .debug_aranges, thus
>     // dwfl_module_addrdie will fail early. Clang doesn't either set
>     // the lowpc/highpc/range info for every compilation unit.
>     //
>     // So in order to save the world:
>     // for every compilation unit, we will iterate over every single
>     // DIEs. Normally functions should have a lowpc/highpc/range, which
>     // we will use to infer the compilation unit.
>
>     // note that this is probably badly inefficient.
>
> And indeed having to scan through every CU to find a matching function
> DIE is badly inefficient :{

But this code comment is relatively old. Are we sure it's really still the
case?

> > Is this the right approach and also what the non-eu addr2line does? If so,
> > can that be added upstream too, such that dwfl_module_addrdie can be
> > relied on?
> >
> > I've seen it on clang 3.6, 4 and 5. Neither passing -g3 nor
> > -gdwarf-aranges helps.
>
> Thanks for reporting this. I think this might be the same issue seen
> here: https://sourceware.org/bugzilla/show_bug.cgi?id=21247
> ... or at least it seems related. The function/address not found in that
> case also comes from a CU generated by clang. It does have a lowpc and
> ranges, but the lowpc looks bogus (zero) and the ranges don't seem to
> cover the function in question. So it seems even worse than your example
> where there are no lowpc/ranges. We cannot even trust them if they are
> there. Sigh.

So the situation is different from the comment in backward-cpp...

> I have to think about how to handle this. We clearly need something that
> just ignores the lowpc/highpc/ranges on CUs and parses every CU till the
> function/address DIE is found to know which CU and line_table to use.
> But that is so inefficient that I don't want to do that by default.

So, if this is really that bad - what are the binutils doing - does anyone
know?

Also, if it's really against all your expectations, shouldn't we report
this upstream at clang and ask for input there? I can't believe they knowingly
break their generated code in such a way. Rather, I believe it's either done
unknowingly, or there is some alternative way to interpret the data that we
are not aware of.

Cheers
--
Milian Wolff
mail@milianw.de
http://milianw.de

* Re: dwfl_module_addrdie fails for binaries built with clang++
From: Mark Wielaard @ 2017-05-09 12:40 UTC
To: Milian Wolff; +Cc: elfutils-devel

On Sat, 2017-05-06 at 13:30 +0200, Milian Wolff wrote:
> On Friday, 5 May 2017 15:06:48 CEST Mark Wielaard wrote:
> > On Thu, 2017-05-04 at 18:05 +0200, Milian Wolff wrote:
> > > I noticed that elfutils fails to handle clang binaries when we want to
> > > find a DIE for a certain address. I.e. dwfl_module_addrdie returns
> > > nullptr, and eu-addr2line fails to resolve inlined frames.
> > >
> > > To reproduce this:
> > >[...]
> > >
> > > This also affects us in our perfparser. Not being able to find a cudie
> > > means not finding inlined frames nor file/line mappings, which is quite a
> > > setback.
> > >
> > > I have noticed that backward-cpp contains a (partial) workaround for
> > > this:
> > >
> > > https://github.com/bombela/backward-cpp/blob/master/backward.hpp#L1216
> >
> > O urgh how utterly broken (not backward-cpp, but the bogus DWARF clang
> > generates). As that comment says:
> >
> >     // Sadly clang does not generate the section .debug_aranges, thus
> >     // dwfl_module_addrdie will fail early. Clang doesn't either set
> >     // the lowpc/highpc/range info for every compilation unit.
> >     //
> >     // So in order to save the world:
> >     // for every compilation unit, we will iterate over every single
> >     // DIEs. Normally functions should have a lowpc/highpc/range, which
> >     // we will use to infer the compilation unit.
> >
> >     // note that this is probably badly inefficient.
> >
> > And indeed having to scan through every CU to find a matching function
> > DIE is badly inefficient :{
>
> But this code comment is relatively old. Are we sure it's really still the
> case?

If you were able to replicate it then yes.

> > > Is this the right approach and also what the non-eu addr2line does? If so,
> > > can that be added upstream too, such that dwfl_module_addrdie can be
> > > relied on?
> > >
> > > I've seen it on clang 3.6, 4 and 5. Neither passing -g3 nor
> > > -gdwarf-aranges helps.
> >
> > Thanks for reporting this. I think this might be the same issue seen
> > here: https://sourceware.org/bugzilla/show_bug.cgi?id=21247
> > ... or at least it seems related. The function/address not found in that
> > case also comes from a CU generated by clang. It does have a lowpc and
> > ranges, but the lowpc looks bogus (zero) and the ranges don't seem to
> > cover the function in question. So it seems even worse than your example
> > where there are no lowpc/ranges. We cannot even trust them if they are
> > there. Sigh.
>
> So the situation is different from the comment in backward-cpp...

Only in how the lowpc/ranges were broken. The core issue is that we
cannot rely on the lowpc/ranges (and aranges) being correct for a CU. We
assume the DWARF producer doesn't really feed us garbage, but apparently
clang does :{

> > I have to think about how to handle this. We clearly need something that
> > just ignores the lowpc/highpc/ranges on CUs and parses every CU till the
> > function/address DIE is found to know which CU and line_table to use.
> > But that is so inefficient that I don't want to do that by default.
>
> So, if this is really that bad - what are the binutils doing - does anyone
> know?

They scan every CU just in case. Which is terrible for performance.
Just compare binutils addr2line vs elfutils eu-addr2line on a large
binary, e.g. on my local machine (best of 3):

$ time eu-addr2line -e /usr/lib64/firefox/libxul.so 0x0157a892
/usr/src/debug/firefox-52.1.0/firefox-52.1.0esr/objdir/dom/bindings/ScrollViewChangeEventBinding.cpp:541

real    0m0.067s
user    0m0.050s
sys     0m0.017s

$ time addr2line -e /usr/lib64/firefox/libxul.so 0x0157a892
/usr/src/debug/firefox-52.1.0/firefox-52.1.0esr/objdir/dom/bindings/ScrollViewChangeEventBinding.cpp:541

real    0m25.984s
user    0m20.847s
sys     0m4.193s

So we definitely don't want to do what binutils does by default.

Note that the worst case is an address that doesn't match against any
function (e.g. what you might get if an unwind goes wrong). Currently
that is the cheapest case (not covered by any CU, so done). But if we
cannot rely on which addresses are covered by which CU then we have to
scan all of them just to make sure there really isn't a subroutine
description in there that does cover the address.

I want to avoid doing that "just in case", and do it only if we (or the
user) know the DWARF might come from a bad producer. So I am pondering
whether we should add something like -b, --bad, as a command line argument
for things like eu-addr2line and eu-stack, to indicate that we need some
workarounds for bad DWARF. That would then call something like
dwarf_force_aranges (), which would set up an aranges table created by
explicitly scanning all CUs.

> Also, if it's really against all your expectations, shouldn't we report
> this upstream at clang and ask for input there? I can't believe they knowingly
> break their generated code in such a way. Rather, I believe it's either done
> unknowingly, or there is some alternative way to interpret the data that we
> are not aware of?

I think they are aware the DWARF they produce is broken. A quick search
finds lots of bug reports about it. The following two specifically seem
relevant for the above case:
https://bugs.llvm.org/show_bug.cgi?id=13351
https://bugs.llvm.org/show_bug.cgi?id=30569

Cheers,

Mark

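The idea of an aranges table "created by explicit scanning of all CUs" can be illustrated with a standalone model. This is a rough sketch under stated assumptions, not an implementation of the proposed dwarf_force_aranges (), which is only floated as an idea in this thread: `FuncRange`, `CompileUnit`, `ArangeEntry`, `buildForcedAranges` and `lookupCu` are hypothetical names, no libdw calls are made, and function ranges are assumed not to overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for DWARF data; these are NOT libdw types.
struct FuncRange {
    std::uint64_t low;   // inclusive start address of a function
    std::uint64_t high;  // exclusive end address of a function
};

struct CompileUnit {
    std::vector<FuncRange> functions;  // function DIEs found in this CU
};

struct ArangeEntry {
    std::uint64_t low;
    std::uint64_t high;
    std::size_t cuIndex;  // which CU the range belongs to
};

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Pay the full scan once: visit every function of every CU, skip empty
// ranges, and keep the result sorted by start address.
std::vector<ArangeEntry> buildForcedAranges(const std::vector<CompileUnit>& cus)
{
    std::vector<ArangeEntry> table;
    for (std::size_t i = 0; i < cus.size(); ++i)
        for (const FuncRange& f : cus[i].functions)
            if (f.low < f.high)
                table.push_back(ArangeEntry{f.low, f.high, i});
    std::sort(table.begin(), table.end(),
              [](const ArangeEntry& a, const ArangeEntry& b) { return a.low < b.low; });
    return table;
}

// Binary search in the prebuilt table; assumes non-overlapping ranges.
std::size_t lookupCu(const std::vector<ArangeEntry>& table, std::uint64_t pc)
{
    assert(std::is_sorted(table.begin(), table.end(),
           [](const ArangeEntry& a, const ArangeEntry& b) { return a.low < b.low; }));
    // First entry whose start address is greater than pc.
    auto it = std::upper_bound(table.begin(), table.end(), pc,
        [](std::uint64_t value, const ArangeEntry& e) { return value < e.low; });
    if (it == table.begin())
        return kNotFound;
    --it;  // last entry starting at or before pc
    return pc < it->high ? it->cuIndex : kNotFound;
}
```

In this model the scan of all CUs happens once when the table is built. After that, an address that matches no function is rejected by a binary search, so the "unwind went wrong" case is cheap again for every lookup that follows.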