On Freitag, 8. April 2022 23:34:06 CEST Aaron Merey wrote: > Hi Milian, > > On Fri, Apr 8, 2022 at 5:08 PM Milian Wolff wrote: > > I can reproduce it now suddenly with debuginfod-find too: > > > > ``` > > $ > > debuginfod-find debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc > > debuginfod_find_debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc > > server urls "https://debuginfod.archlinux.org/" > > checking build-id > > checking cache dir /home/milian/.cache/debuginfod_client > > using timeout 90 > > init server 0 https://debuginfod.archlinux.org/buildid > > url 0 https://debuginfod.archlinux.org/buildid/ > > 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo > > query 1 urls in parallel > > server response HTTP response code said error > > url 0 The requested URL returned error: 404 > > not found No such file or directory (err=-2) > > Server query failed: No such file or directory > > ``` > > > > I do see an empty `/home/milian/.cache/debuginfod_client/ > > 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo` file. But the server > > is > > still queried (i.e. rerunning the above command always produces the same > > > output for me). The lookup is costly too at ~70ms overall: > I am not able to reproduce this on Fedora 35 using commit 8db849976f070. > > The first time I run debuginfod-find I get similar output to you: > ``` > debuginfod_find_debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc > server urls "https://debuginfod.archlinux.org/" > checking build-id > checking cache dir /home/amerey/.cache/debuginfod_client > using timeout 90 > init server 0 https://debuginfod.archlinux.org/buildid > url 0 > https://debuginfod.archlinux.org/buildid/85766e9d8458b16e9c7ce6e07c712c02b8 > 471dbc/debuginfo query 1 urls in parallel > server response HTTP response code said error > url 0 The requested URL returned error: 404 > not found No such file or directory (err=-2) > Server query failed: No such file or directory > ``` > > But when I rerun debuginfod-find I get output indicating that the cache was > checked: > ``` > debuginfod_find_debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc > server urls "https://debuginfod.archlinux.org/" > checking build-id > checking cache dir /home/amerey/.cache/debuginfod_client > not found No such file or directory (err=-2) > Server query failed: No such file or directory > ``` > > No calls to curl_multi_wait occur. Is there a "cache_miss_s" file in the top > level of your debuginfod cache? It should contain the number of seconds > that must elapse after a failed attempt before another query would be > attempted (default is 600 seconds). Yes that file exists and it contains 600. Running through strace, I see that it actually removes the empty file first. This looks totally bogus: ``` strace -e file -f debuginfod-find debuginfo 85766e9d8458b16e9c7ce6e07c712c02b8471dbc |& grep "cache/debuginfod_client" checking cache dir /home/milian/.cache/debuginfod_client newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client", {st_mode=S_IFDIR|0755, st_size=12288, ...}, 0) = 0 newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/ cache_clean_interval_s", {st_mode=S_IFREG|0644, st_size=5, ...}, 0) = 0 newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/ cache_clean_interval_s", {st_mode=S_IFREG|0644, st_size=5, ...}, 0) = 0 openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/ cache_clean_interval_s", O_RDONLY) = 5 newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo", {st_mode=S_IFREG|000, st_size=0, ...}, 0) = 0 newfstatat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/cache_miss_s", {st_mode=S_IFREG|0644, st_size=3, ...}, 0) = 0 openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/cache_miss_s", O_RDONLY) = 5 unlink("/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo") = 0 openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo", O_RDONLY) = -1 ENOENT (No such file or directory) mkdir("/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc", 0700) = -1 EEXIST (File exists) openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo.KdSnXl", O_RDWR|O_CREAT| O_EXCL, 0600) = 5 openat(AT_FDCWD, "/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo", O_RDONLY|O_CREAT|O_EXCL, 000) = 7 unlink("/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo.KdSnXl") = 0 rmdir("/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc") = -1 ENOTEMPTY (Directory not empty) ``` Note how there's an unlink first followed by an openat of the file that just got removed? Then a mkdir for a folder that still exists. Then a tmpfile for the download. Then follows the creation of the empty file. And then it tries to rmdir the whole folder, which fails, but that still sounds wrong too, no? Backtraces for the unlink: ``` unlink("/home/milian/.cache/debuginfod_client/ 85766e9d8458b16e9c7ce6e07c712c02b8471dbc/debuginfo") = 0 > /usr/lib/libc.so.6(unlink+0xb) [0x103aab] > /usr/lib/libdebuginfod-0.186.so(debuginfod_add_http_header+0x1821) [0x3de1] > /usr/bin/debuginfod-find() [0x122f] > /usr/lib/libc.so.6(__libc_start_call_main+0x80) [0x2d310] > /usr/lib/libc.so.6(__libc_start_main+0x81) [0x2d3c1] > /usr/bin/debuginfod-find() [0x1495] ``` That is: ``` eu-addr2line -fe /usr/lib/libdebuginfod-0.186.so -a 0x3de1 0x0000000000003de1 debuginfod_query_server /usr/src/debug/elfutils-0.186/debuginfod/debuginfod-client.c:791:9 ``` Which in turn points at the code that does cache cleanup in `debuginfod_query_server`. I now used `rr` to record such a bogus run and I clearly see that `(time(NULL) - st.st_mtime <= cache_miss)` is false and it goes into the unlink case. I'll try to debug this further now - I definitely do not wait 600s inbetween these runs here... -- Milian Wolff mail@milianw.de http://milianw.de