Hey all,

another debuginfod related question, but unrelated to the other thread I
started earlier today.

In a work branch I have ported my heaptrack profiler over to elfutils. I then
ran the analyzer that uses elfutils (and thus debuginfod internally via dwfl)
on a recorded data file to have it download all the debug info files it can
find. On subsequent runs, nothing should be downloaded anymore. But I notice
that there is still a huge slowdown incurred when `DEBUGINFOD_URLS` is set:

With DEBUGINFOD_URLS set:
```
perf stat ./heaptrack_interpret ...
          5,765.36 msec task-clock          #    0.707 CPUs utilized
...
       8.154873275 seconds time elapsed

       5.477158000 seconds user
       0.271662000 seconds sys
```

Without DEBUGINFOD_URLS set:
```
perf stat ./heaptrack_interpret ...
          2,460.09 msec task-clock          #    0.999 CPUs utilized
...
       2.461661784 seconds time elapsed

       2.334429000 seconds user
       0.119698000 seconds sys
```

Note the significantly increased wall time in the first case. The second case
(without debuginfod) has essentially zero off-CPU time and significantly
reduced on-CPU time too.

Looking at the off-CPU time flame graph and enabling verbose debuginfod
output leads me to the conclusion that the slowdown is due to some libraries
simply not having any debug information uploaded. I.e. I see dozens of
reports such as:

```
debuginfod_find_debuginfo 2e245c2bf12f95fd8ab79b3a4be99524677cbd70
server urls "https://debuginfod.archlinux.org/"
checking build-id
checking cache dir /home/milian/.debuginfod_client_cache
using timeout 90
init server 0 https://debuginfod.archlinux.org/buildid
url 0 https://debuginfod.archlinux.org/buildid/2e245c2bf12f95fd8ab79b3a4be99524677cbd70/debuginfo
query 1 urls in parallel
Downloading |
server response HTTP response code said error
url 0 The requested URL returned error: 404 not found
No such file or directory (err=-2)
```

These negative lookups are not cached. That means rerunning the same process
using dwfl and debuginfod on the same data will always incur a significant
slowdown, as we try again and again to look for something that's not there.
Each lookup takes roughly 200ms for me before the client realizes the data is
not on the server.

What's worse, I'm seeing multiple lookups for the same build-id *within the
same process*. I.e.:

```
export DEBUGINFOD_VERBOSE=1
./heaptrack_interpret ... |& egrep "^url 0 https" | sort | uniq -c | sort
...
      6 url 0 https://debuginfod.archlinux.org/buildid/7f4b16b4b407cbae2d7118d6f99610e29a18a56a/debuginfo
      8 url 0 https://debuginfod.archlinux.org/buildid/c09c6f50f6bcec73c64a0b4be77eadb8f7202410/debuginfo
     14 url 0 https://debuginfod.archlinux.org/buildid/85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo
```

Here, we are paying roughly `14 * 0.2s = 2.8s` just for a single library.

Can we find a generic way to improve this situation? I would personally even
be OK with caching the 404 error case locally for some time (say, one hour or
one day or ...). Then we would pay this cost at most once per library instead
of multiple times, and rerunning the analysis a second time would become much
faster again. I have put a rough sketch of what I mean below my signature.

Was there a deliberate decision against caching negative server-side lookups?

Thanks

--
Milian Wolff
mail@milianw.de
http://milianw.de
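
P.S. To make the caching idea a bit more concrete, here is an untested sketch
of what I mean. Everything in it is made up for illustration: the marker file
name, the flat layout inside the client cache directory, and the one-hour TTL
are not how debuginfod actually works, they are only assumptions to show the
shape of the idea.

```c
/* Untested sketch of a client-side negative cache, NOT actual debuginfod
 * code.  Assumptions (all invented for illustration): one marker file per
 * build-id directly inside the cache directory, and a fixed one-hour TTL. */
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

#define NEGATIVE_CACHE_TTL (60 * 60)  /* one hour; could become an env knob */

/* Path of the hypothetical "we got a 404 for this build-id" marker file. */
static void marker_path (char *buf, size_t len,
                         const char *cache_dir, const char *build_id_hex)
{
  snprintf (buf, len, "%s/%s.negative", cache_dir, build_id_hex);
}

/* Return true if we recently got a 404 for this build-id and can skip the
   network round trip entirely. */
static bool negative_cache_hit (const char *cache_dir, const char *build_id_hex)
{
  char path[PATH_MAX];
  struct stat st;
  marker_path (path, sizeof path, cache_dir, build_id_hex);
  if (stat (path, &st) != 0)
    return false;                     /* no marker: ask the server */
  return time (NULL) - st.st_mtime < NEGATIVE_CACHE_TTL;
}

/* Record a 404 so that later lookups (and later runs) stay offline until the
   TTL expires.  The empty file's mtime acts as the timestamp. */
static void negative_cache_store (const char *cache_dir, const char *build_id_hex)
{
  char path[PATH_MAX];
  marker_path (path, sizeof path, cache_dir, build_id_hex);
  FILE *f = fopen (path, "w");
  if (f != NULL)
    fclose (f);
}
```

On a rerun within the TTL, negative_cache_hit would return true and the
client could report "not found" immediately instead of asking the server
again.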
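
P.P.S. The repeated lookups for the same build-id within one process could
additionally be avoided by purely in-memory memoization, independent of any
on-disk cache. Again just a sketch with invented names, and I don't know yet
whether the right place for this would be the debuginfod client, dwfl, or my
own code:

```c
/* Untested sketch: remember build-ids that already failed in this process,
 * so each missing library costs at most one network round trip.  A real
 * implementation would use a proper hash table instead of a fixed array
 * with linear search. */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FAILED 1024

static char *failed_ids[MAX_FAILED];
static size_t failed_count;

static bool already_failed (const char *build_id_hex)
{
  for (size_t i = 0; i < failed_count; ++i)
    if (strcmp (failed_ids[i], build_id_hex) == 0)
      return true;
  return false;
}

static void remember_failure (const char *build_id_hex)
{
  if (failed_count < MAX_FAILED)
    failed_ids[failed_count++] = strdup (build_id_hex);
}

/* Hypothetical call site, wrapping whatever currently triggers the download:
 *
 *   if (already_failed (id_hex))
 *     return -1;                        // skip the ~200ms round trip
 *   int fd = do_lookup (id_hex, ...);   // the actual debuginfod query
 *   if (fd < 0)
 *     remember_failure (id_hex);
 *   return fd;
 */
```

That would turn the `14 * 0.2s` above into `1 * 0.2s` per missing library.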