On Friday, 8 April 2022 22:05:27 CEST Frank Ch. Eigler wrote:
> Hi -
>
> > another debuginfod related question, but unrelated to the other thread
> > I started earlier today. In a work branch I have ported my heaptrack
> > profiler over to elfutils. I have then run the analyzer that uses
> > elfutils (and thus debuginfod internally via dwfl) on a recorded data
> > file to have it download all the debug info files it can find.
>
> Nice.
>
> > These negative lookups are not cached. Meaning rerunning the same
> > process using dwfl and debuginfod on the same data would always incur
> > a significant slowdown, as we would again and again try to look for
> > something that's not there. The lookups take roughly ~200ms for me to
> > realize the data is not on the server.
>
> That's not correct, as of elfutils 0.184 (commit 5f72c51a7e5c0),
> with some more recent tweaks in (commit 7d64173fb11c6).

I am running elfutils 0.186 here and I definitely saw this happening
(that's where the output from my previous mail came from, after all). I
have now wiped my local debuginfod cache and tried to replicate it, but
magically the repeated lookups are gone. This is very mysterious...

Nevertheless, I still see a significant overhead from having
DEBUGINFOD_URLS set, even after a first run that should have populated
the cache. I.e. check this out:

```
$ export DEBUGINFOD_URLS=https://debuginfod.archlinux.org/
$ perf stat -r 5 ./heaptrack_interpret
...
       5,993.67 msec task-clock                #    0.696 CPUs utilized            ( +-  0.22% )
            301      context-switches          #   50.567 /sec                     ( +-  7.47% )
              3      cpu-migrations            #    0.504 /sec                     ( +- 19.44% )
        145,574      page-faults               #   24.456 K/sec                    ( +-  0.01% )
 25,026,712,955      cycles                    #    4.204 GHz                      ( +-  0.14% )  (83.27%)
    357,786,453      stalled-cycles-frontend   #    1.43% frontend cycles idle     ( +-  1.77% )  (83.39%)
  4,851,145,553      stalled-cycles-backend    #   19.44% backend cycles idle      ( +-  0.99% )  (83.35%)
 64,893,827,473      instructions              #    2.60  insn per cycle
                                               #    0.07  stalled cycles per insn  ( +-  0.05% )  (83.29%)
 14,234,732,114      branches                  #    2.391 G/sec                    ( +-  0.04% )  (83.30%)
    111,326,964      branch-misses             #    0.78% of all branches          ( +-  0.17% )  (83.44%)

         8.6057 +- 0.0667 seconds time elapsed  ( +-  0.77% )
```

versus:

```
$ unset DEBUGINFOD_URLS
$ perf stat -r 5 ./heaptrack_interpret
...
       2,505.36 msec task-clock                #    1.009 CPUs utilized            ( +-  0.48% )
            542      context-switches          #  218.486 /sec                     ( +- 17.87% )
              1      cpu-migrations            #    0.403 /sec                     ( +- 24.49% )
         50,712      page-faults               #   20.443 K/sec                    ( +-  0.02% )
 10,821,713,676      cycles                    #    4.362 GHz                      ( +-  0.15% )  (83.24%)
    129,975,157      stalled-cycles-frontend   #    1.20% frontend cycles idle     ( +-  4.86% )  (83.37%)
  1,719,256,978      stalled-cycles-backend    #   15.93% backend cycles idle      ( +-  1.33% )  (83.38%)
 28,947,652,758      instructions              #    2.68  insn per cycle
                                               #    0.06  stalled cycles per insn  ( +-  0.03% )  (83.40%)
  6,118,310,875      branches                  #    2.466 G/sec                    ( +-  0.03% )  (83.38%)
     46,907,166      branch-misses             #    0.77% of all branches          ( +-  0.16% )  (83.30%)

         2.4833 +- 0.0124 seconds time elapsed  ( +-  0.50% )
```

There is clearly still a significant slowdown imposed by having debuginfod
enabled, even after the first run. Is this expected? My assumption was:
only the first run would be potentially slow, and then everything is
cached locally.

In the off-CPU flame graphs I clearly see this call chain when
DEBUGINFOD_URLS is set:

```
dwfl_standard_find_debuginfo > debuginfod_query_server > curl_multi_wait
```

This accounts for ~2.5s of off-CPU time in my case. If you are saying
that negative lookups are cached, then what is this?
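To take my own dwfl integration out of the equation, I plan to probe the
cache behavior with the raw debuginfod client API directly. Here is a
minimal sketch of that (the build ID below is a made-up placeholder for
one the server does not have; built with `gcc sketch.c -ldebuginfod`). If
negative results are cached as you say, the second lookup should come
back from the local cache almost instantly instead of taking the ~200ms
round trip:

```
/* sketch.c - time two consecutive debuginfod lookups for the same
   (unavailable) build ID; the second should hit the negative cache.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <elfutils/debuginfod.h>

static double now_ms (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main (void)
{
  /* Made-up build ID, standing in for one that is not on the server.  */
  static const unsigned char build_id[20] =
    { 0xde, 0xad, 0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef, 0xde, 0xad,
      0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef };

  debuginfod_client *client = debuginfod_begin ();
  if (client == NULL)
    return 1;

  for (int i = 0; i < 2; i++)
    {
      char *path = NULL;
      double t0 = now_ms ();
      /* Returns a file descriptor on success, -errno on failure.  */
      int fd = debuginfod_find_debuginfo (client, build_id,
                                          sizeof build_id, &path);
      printf ("lookup %d: %s (%.1f ms)\n", i,
              fd >= 0 ? path : strerror (-fd), now_ms () - t0);
      if (fd >= 0)
        {
          close (fd);
          free (path);
        }
    }

  debuginfod_end (client);
  return 0;
}
```

With that in place I can hopefully tell whether the remaining overhead
comes from the lookups themselves or from how dwfl drives them.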
Why would a second run of the same dwfl-using application with the same
input data keep querying the server?

Thanks

--
Milian Wolff
mail@milianw.de
http://milianw.de