On Friday, 8 April 2022 22:05:27 CEST Frank Ch. Eigler wrote:
> Hi -
>
> > another debuginfod related question, but unrelated to the other thread
> > I started earlier today. In a work branch I have ported my heaptrack
> > profiler over to elfutils. I have then run the analyzer that uses
> > elfutils (and thus debuginfod internally via dwfl) on a recorded data
> > file to have it download all the debug info files it can find.
>
> Nice.
>
> > These negative lookups are not cached. Meaning rerunning the same
> > process using dwfl and debuginfod on the same data would always incur
> > a significant slowdown, as we would again and again try to look for
> > something that's not there. The lookups take roughly ~200ms for me to
> > realize the data is not on the server.
>
> That's not correct, as of elfutils 0.184 (commit 5f72c51a7e5c0),
> with some more recent tweaks in (commit 7d64173fb11c6).

I am running elfutils 0.186 here and I definitely saw this happening
(that's where the output from my previous mail came from, after all). I
have now wiped my local debuginfod cache and tried to replicate it, but
magically the repeated lookups are gone. This is very mysterious...

Nevertheless, I still see a significant overhead from having
DEBUGINFOD_URLS set, even after a first run that should have populated
the cache. I.e. check this out:

```
$ export DEBUGINFOD_URLS=https://debuginfod.archlinux.org/
$ perf stat -r 5 ./heaptrack_interpret
...
       5,993.67 msec task-clock                #    0.696 CPUs utilized            ( +-  0.22% )
            301      context-switches          #   50.567 /sec                     ( +-  7.47% )
              3      cpu-migrations            #    0.504 /sec                     ( +- 19.44% )
        145,574      page-faults               #   24.456 K/sec                    ( +-  0.01% )
 25,026,712,955      cycles                    #    4.204 GHz                      ( +-  0.14% )  (83.27%)
    357,786,453      stalled-cycles-frontend   #    1.43% frontend cycles idle     ( +-  1.77% )  (83.39%)
  4,851,145,553      stalled-cycles-backend    #   19.44% backend cycles idle      ( +-  0.99% )  (83.35%)
 64,893,827,473      instructions              #    2.60  insn per cycle
                                               #    0.07  stalled cycles per insn  ( +-  0.05% )  (83.29%)
 14,234,732,114      branches                  #    2.391 G/sec                    ( +-  0.04% )  (83.30%)
    111,326,964      branch-misses             #    0.78% of all branches          ( +-  0.17% )  (83.44%)

         8.6057 +- 0.0667 seconds time elapsed  ( +-  0.77% )
```

versus:

```
$ unset DEBUGINFOD_URLS
$ perf stat -r 5 ./heaptrack_interpret
...
       2,505.36 msec task-clock                #    1.009 CPUs utilized            ( +-  0.48% )
            542      context-switches          #  218.486 /sec                     ( +- 17.87% )
              1      cpu-migrations            #    0.403 /sec                     ( +- 24.49% )
         50,712      page-faults               #   20.443 K/sec                    ( +-  0.02% )
 10,821,713,676      cycles                    #    4.362 GHz                      ( +-  0.15% )  (83.24%)
    129,975,157      stalled-cycles-frontend   #    1.20% frontend cycles idle     ( +-  4.86% )  (83.37%)
  1,719,256,978      stalled-cycles-backend    #   15.93% backend cycles idle      ( +-  1.33% )  (83.38%)
 28,947,652,758      instructions              #    2.68  insn per cycle
                                               #    0.06  stalled cycles per insn  ( +-  0.03% )  (83.40%)
  6,118,310,875      branches                  #    2.466 G/sec                    ( +-  0.03% )  (83.38%)
     46,907,166      branch-misses             #    0.77% of all branches          ( +-  0.16% )  (83.30%)

         2.4833 +- 0.0124 seconds time elapsed  ( +-  0.50% )
```

There is clearly still a significant slowdown imposed by having debuginfod
enabled, even after the first run. Is this expected? My assumption was:
only the first run would be potentially slow, and then everything is
cached locally.

In the off-CPU flame graphs I clearly see this call chain when
DEBUGINFOD_URLS is set:

```
dwfl_standard_find_debuginfo > debuginfod_query_server > curl_multi_wait
```

This accounts for ~2.5s of off-CPU time in my case. If you are saying
that negative lookups are cached, then what is this?
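To take my own dwfl integration out of the equation, I plan to probe the
cache behavior with the raw debuginfod client API directly. Here is a
minimal sketch of that (the build ID below is a made-up placeholder for
one the server does not have; built with `gcc sketch.c -ldebuginfod`). If
negative results are cached as you say, the second lookup should come
back from the local cache almost instantly instead of taking the ~200ms
round trip:

```
/* sketch.c - time two consecutive debuginfod lookups for the same
   (unavailable) build ID; the second should hit the negative cache.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <elfutils/debuginfod.h>

static double now_ms (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main (void)
{
  /* Made-up build ID, standing in for one that is not on the server.  */
  static const unsigned char build_id[20] =
    { 0xde, 0xad, 0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef, 0xde, 0xad,
      0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef };

  debuginfod_client *client = debuginfod_begin ();
  if (client == NULL)
    return 1;

  for (int i = 0; i < 2; i++)
    {
      char *path = NULL;
      double t0 = now_ms ();
      /* Returns a file descriptor on success, -errno on failure.  */
      int fd = debuginfod_find_debuginfo (client, build_id,
                                          sizeof build_id, &path);
      printf ("lookup %d: %s (%.1f ms)\n", i,
              fd >= 0 ? path : strerror (-fd), now_ms () - t0);
      if (fd >= 0)
        {
          close (fd);
          free (path);
        }
    }

  debuginfod_end (client);
  return 0;
}
```

With that in place I can hopefully tell whether the remaining overhead
comes from the lookups themselves or from how dwfl drives them.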
Why would a second run of the same dwfl-using application with the same
input data keep querying the server?

Thanks

--
Milian Wolff
mail@milianw.de
http://milianw.de