On Mon, 2017-04-03 at 23:15 +0200, Mark Wielaard wrote: > On Mon, Apr 03, 2017 at 11:02:53AM +0200, Ulf Hermann wrote: > > > Ping? Any progress on merging this functionality upstream? > > > It can make quite a difference in unwinding. > > > > The patches have also been in perfparser releases for over a year now. I > > would like to see them upstream. > > Yes, sorry. The patches looked fine. Then I thought I should try > adding similar support to another arch to double check. And then > I got distracted by something else. I'll get back to is asap and > make sure they get in before the next release end of the month. OK. Sorry for the delay. In general I like these patches. I even made a i386 fp unwind backend function to play with the idea a bit more. See attached. Does that look sane? I do agree with Jan that frame pointer unwinding is notoriously untrustworthy. Even with some sanity checks it is hard to know whether you are doing a correct unwind. gdb gets away with it through pretty advanced frame sniffers, which take a lot of low-level compiler generation knowledge to get correct (and even then...). You only restore the pc, stack and frame pointer registers (maybe incorrectly). So afterwards, even if you got valid CFI data you might be stuck. Ideally we provide the user with a flag to check how trustworthy a frame is and/or an option to drop inexact unwinding completely. But that would be an extension for another time. I do think that in the context of dwfl_thread_getframes () it is appropriate. Currently all you can get from a Dwfl_Frame is the pc anyway. I do however have a problem with the testcases. I approved the --allow-unknown option for the backtrace fp core tests (commit f9971cb) but I now believe that was a mistake. It makes it really hard to see whether the test actually tests anything. In fact for my i386 case the testcase even succeeded when disabling the whole frame pointer unwinder fallback... I believe the issue with the missing symbol names is not caused by the frame pointer unwinding not matching real function addresses, because they should! But because of the way the testcase binaries are generated. What you do is the following: # gcc -static -O2 -fno-omit-frame-pointer -I ../ -I ../lib/ -D_GNU_SOURCE # -pthread -o backtrace.x86_64.fp.exec backtrace-child.c # # Then strip the .eh_frame and .eh_frame_hdr sections from the binary: # # strip -R .eh_frame backtrace.x86_64.fp.exec # strip -R .eh_frame_hdr backtrace.x86_64.fp.exec # # The core is generated by calling the binary with --gencore But the .eh_frame sections are allocated sections. Which means that by removing them strip will alter the address layout of the program. Which causes the symbol table addresses to not match up anymore. I think they should be generated with -fno-asynchronous-unwind-tables to avoid them being generated in the first place (in the main binary functions, but keep those from the static library functions pulled in): # gcc -static -O2 -fno-omit-frame-pointer -fno-asynchronous-unwind-tables \ # -D_GNU_SOURCE -pthread -o tests/backtrace.i386.fp.exec -I. -Ilib \ # tests/backtrace-child.c That way the difference between having a frame pointer unwinder or not is much clearer: $ eu-stack --exec ./backtrace.i386.fp.exec --core ./backtrace.i386.fp.core PID 12044 - core TID 12045: #0 0x008a7416 __kernel_vsyscall #1 0x08051ab9 raise #2 0x080485c1 sigusr2 eu-stack: dwfl_thread_getframes tid 12045 at 0x80485c0 in [exe]: No DWARF information found TID 12044: #0 0x008a7416 __kernel_vsyscall #1 0x0804ae30 pthread_join #2 0x0804847c main eu-stack: dwfl_thread_getframes tid 12044 at 0x804847b in [exe]: No DWARF information found $ LD_LIBRARY_PATH=backends:libelf:libdw src/stack --exec ./backtrace.i386.fp.exec --core ./backtrace.i386.fp.core PID 12044 - core TID 12045: #0 0x008a7416 __kernel_vsyscall #1 0x08051ab9 raise #2 0x080485c1 sigusr2 #3 0x08048699 stdarg #4 0x08048702 backtracegen #5 0x0804871b start #6 0x0804a7cf start_thread #7 0x080746fe __clone TID 12044: #0 0x008a7416 __kernel_vsyscall #1 0x0804ae30 pthread_join #2 0x0804847c main #3 0x08053188 __libc_start_main #4 0x080481e1 _start src/stack: dwfl_thread_getframes tid 12044 at 0x80481e0 in [exe]: No DWARF information found Then if we check for one of the function names in the middle of the frame pointer only unwound frames, like backtracegen, this would give us a much better test of whether things worked correctly. Could you try regenerating your test binaries with -fno-asynchronous-unwind-tables? Then I'll try to adapt the testsuite to check for the one of the frame pointer only unwound function names. Thanks, Mark