From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gnu.wildebeest.org (gnu.wildebeest.org [45.83.234.184]) by sourceware.org (Postfix) with ESMTPS id 19BE23858C5E for ; Wed, 5 Jul 2023 13:45:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 19BE23858C5E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=klomp.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=klomp.org Received: from r6.localdomain (82-217-174-174.cable.dynamic.v4.ziggo.nl [82.217.174.174]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by gnu.wildebeest.org (Postfix) with ESMTPSA id 146EB304F826; Wed, 5 Jul 2023 15:45:22 +0200 (CEST) Received: by r6.localdomain (Postfix, from userid 1000) id 915F134007A; Wed, 5 Jul 2023 15:45:21 +0200 (CEST) Message-ID: <0401eff5c94cfb2f92343de21746cacbd5936c3e.camel@klomp.org> Subject: Re: PR29472: debuginfod metadata query From: Mark Wielaard To: "Frank Ch. Eigler" , elfutils-devel@sourceware.org Date: Wed, 05 Jul 2023 15:45:21 +0200 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.48.3 (3.48.3-1.fc38) MIME-Version: 1.0 X-Spam-Status: No, score=-3033.5 required=5.0 tests=BAYES_00,GIT_PATCH_0,JMQ_SPF_NEUTRAL,KAM_BADIPHTTP,KAM_DMARC_STATUS,KAM_SHORT,RCVD_IN_BARRACUDACENTRAL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Frank, Skipping debuginfod.cxx itself for now. Review of everything else left. On Wed, 2023-04-12 at 16:31 -0400, Frank Ch. Eigler via Elfutils-devel wrote: > + * debuginfod.h.in (debuginfod_find_metadata): New decl. > + * libdebuginfod.map: Export it under ELFUTILS_0.190. > [...] >=20 > diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in > index 4a256ba9af1f..3efa877a3489 100644 > --- a/debuginfod/debuginfod.h.in > +++ b/debuginfod/debuginfod.h.in > @@ -62,9 +62,9 @@ debuginfod_client *debuginfod_begin (void); > it is a binary blob of given length. > =20 > If successful, return a file descriptor to the target, otherwise > - return a posix error code. If successful, set *path to a > - strdup'd copy of the name of the same file in the cache. > - Caller must free() it later. */ > + return a negative POSIX error code. If successful, set *path to a > + strdup'd copy of the name of the same file in the cache. Caller > + must free() it later. */ >=20 OK, reformat only. > int debuginfod_find_debuginfo (debuginfod_client *client, > const unsigned char *build_id, > @@ -88,6 +88,18 @@ int debuginfod_find_section (debuginfod_client *client= , > const char *section, > char **path); > =20 > +/* Query the urls contained in $DEBUGINFOD_URLS for metadata > + with given query key/value. > + =20 > + If successful, return a file descriptor to the JSON document > + describing matches, otherwise return a negative POSIX error code. If > + successful, set *path to a strdup'd copy of the name of the same > + file in the cache. Caller must free() it later. */ > +int debuginfod_find_metadata (debuginfod_client *client, > + const char *key, > + const char* value, > + char **path); >=20 Can we have char *value? Also a bit more description about keys, values and the resulting json might be good here. > typedef int (*debuginfod_progressfn_t)(debuginfod_client *c, long a, lon= g b); > void debuginfod_set_progressfn(debuginfod_client *c, > debuginfod_progressfn_t fn); > diff --git a/debuginfod/libdebuginfod.map b/debuginfod/libdebuginfod.map > index 6334373f01b0..355a89fd0397 100644 > --- a/debuginfod/libdebuginfod.map > +++ b/debuginfod/libdebuginfod.map > @@ -22,3 +22,6 @@ ELFUTILS_0.188 { > debuginfod_get_headers; > debuginfod_find_section; > } ELFUTILS_0.183; > +ELFUTILS_0.190 { > + debuginfod_find_metadata; > +} ELFUTILS_0.188; OK. > diff --git a/doc/ChangeLog b/doc/ChangeLog > index 7f2d6ff4fd31..4ed30eb2cef9 100644 > --- a/doc/ChangeLog > +++ b/doc/ChangeLog > @@ -1,3 +1,10 @@ > +2023-04-12 Ryan Golberg , Frank Ch. Eigler > + > + PR29472: debuginfod metadata query > + * debuginfod-find.1: Document metadata query. > + * debuginfod-client-config.7: Document metadata cache control setting. > + * debuginfod.8: Document new option and webapi. > + > 2023-02-14 Mark Wielaard > =20 > * debuginfod.8: Add .TP before -g. > diff --git a/doc/debuginfod-client-config.7 b/doc/debuginfod-client-confi= g.7 > index 53d82806d395..bc98d1e6b35a 100644 > --- a/doc/debuginfod-client-config.7 > +++ b/doc/debuginfod-client-config.7 > @@ -134,3 +134,11 @@ are short-circuited (returning an immediate failure = instead of sending > a new query to servers). This accelerates queries that probably would > still fail. The default is 600, 10 minutes. 0 means "forget > immediately". > + > +.TP > +.B metadata_retention_s > +This control file sets how long to remember the results of a metadata > +query. New queries for the same artifacts within this time window are > +short-circuited (repeating the same results). This accelerates > +queries that probably would probably have the same results. The > +default is 86400, 1 day. 0 means "do not retain". OK. > diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1 > index 7d577babeb89..6652539cf3dd 100644 > --- a/doc/debuginfod-find.1 > +++ b/doc/debuginfod-find.1 > @@ -29,6 +29,8 @@ debuginfod-find \- request debuginfo-related data > .B debuginfod-find [\fIOPTION\fP]... source \fIBUILDID\fP \fI/FILENAME\f= P > .br > .B debuginfod-find [\fIOPTION\fP]... source \fIPATH\fP \fI/FILENAME\fP > +.br > +.B debuginfod-find [\fIOPTION\fP]... metadata \fIKEY\fP \fIVALUE\fP >=20 Note that the program itself will only have this if HAVE_JSON_C (which I think is slightly odd). > .SH DESCRIPTION > \fBdebuginfod-find\fP queries one or more \fBdebuginfod\fP servers for > @@ -119,6 +121,36 @@ l l. > \../bar/foo.c AT_comp_dir=3D/zoo/ source BUILDID /zoo//../bar/foo.c > .TE > =20 > +.SS metadata \fIKEY\fP \fIVALUE\fP > + > +All designated debuginfod servers are queried for metadata about files > +in their index. Different search keys may be supported by different > +servers. > + > +.TS > +l l l . > +KEY VALUE DESCRIPTION > + > +\fBfile\fP path exact match \fIpath\fP, including in archives > +\fBglob\fP pattern glob match \fIpattern\fP, including in archives > +.TE Something like this would be good to also have in debuginfod --help output. Maybe also show examples of the json output? > + > +The results of the search are output to \fBstdout\fP as a JSON array > +of objects, supplying metadata about each match. This doesn't seem to match the program code which just provides a reference to the cached metadata results file. > This metadata report > +may be cached. It may be incomplete and may contain duplicates. For > +each match, the result is a JSON object with these fields. Additional > +fields may be present. > + > +.TS > +l l l . > +NAME TYPE DESCRIPTION > + > +\fBbuildid\fP string hexadecimal buildid associated with the file > +\fBtype\fP string one of \fBdebuginfo\fP or \fBexecutable\fP > +\fBfile\fP string matched file name, outside or inside the archive > +\fBarchive\fP string archive containing matched file name, if any > +.TE > + > .SH "OPTIONS" It would be good to have an explanation of why the type cannot be source and why the query always goes over both type of files (could the user indicate they are only interested in one or the other?) > .TP > diff --git a/doc/debuginfod.8 b/doc/debuginfod.8 > index 07cb01aeed10..1070d290a77a 100644 > --- a/doc/debuginfod.8 > +++ b/doc/debuginfod.8 > @@ -133,6 +133,14 @@ scanner/groomer server and multiple passive ones, th= ereby sharing > service load. Archive pattern options must still be given, so > debuginfod can recognize file name extensions for unpacking. > =20 > +.TP > +.B "\-\-metadata\-maxtime=3DSECONDS" > +Impose a limit on the runtime of metadata webapi queries. These > +queries, especially broad "glob" wildcards, can take a large amount of > +time and produce large results. Public-facing servers may need to > +throttle them. The default limit is 5 seconds. Set 0 to disable this > +limit. OK. Why was a runtime of 5 seconds picked? Would it also make sense to limit the size of the result? =20 > +.SS /metadata?key=3D\fIKEY\fP&value=3D\fIVALUE\fP > + > +This endpoint triggers a search of the files in the index plus any > +upstream federated servers, based on given key and value. If > +successful, the result is a application/json textual array, listing > +metadata for the matched files. See \fIdebuginfod-find(1)\fP for > +documentation of the common key/value search parameters, and the > +resulting data schema. OK. Although I wouldn't mind if it at least said which key types this implementation supports. > .SH DATA MANAGEMENT > =20 > debuginfod stores its index in an sqlite database in a densely packed > diff --git a/tests/ChangeLog b/tests/ChangeLog > index eb3e1118fa00..4ef2248b5914 100644 > --- a/tests/ChangeLog > +++ b/tests/ChangeLog > @@ -1,3 +1,9 @@ > +2023-04-12 Ryan Golberg , Frank Ch. Eigler > + > + PR29472: debuginfod metadata query > + * run-debuginfod-find-metadata.sh: New test. > + * Makefile.am: Run it, dist it. > + > 2023-02-10 Mark Wielaard > =20 > * varlocs.c (print_expr): Handle DW_OP_GNU_uninit. > diff --git a/tests/Makefile.am b/tests/Makefile.am > index 7e32f1170c1b..df2190f07fe8 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -261,7 +261,8 @@ TESTS +=3D run-debuginfod-dlopen.sh \ > run-debuginfod-response-headers.sh \ > run-debuginfod-extraction-passive.sh \ > run-debuginfod-webapi-concurrency.sh \ > - run-debuginfod-section.sh > + run-debuginfod-section.sh \ > + run-debuginfod-find-metadata.sh > endif > if !OLD_LIBMICROHTTPD > # Will crash on too old libmicrohttpd > @@ -578,6 +579,7 @@ EXTRA_DIST =3D run-arextract.sh run-arsymtest.sh run-= ar.sh \ > run-debuginfod-extraction-passive.sh \ > run-debuginfod-webapi-concurrency.sh \ > run-debuginfod-section.sh \ > + run-debuginfod-find-metadata.sh \ > debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \ > debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \ > debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \ OK. > diff --git a/tests/run-debuginfod-find-metadata.sh b/tests/run-debuginfod= -find-metadata.sh > new file mode 100755 > index 000000000000..c378bcdd5f2e > --- /dev/null > +++ b/tests/run-debuginfod-find-metadata.sh > @@ -0,0 +1,110 @@ > +#!/usr/bin/env bash > +# > +# Copyright (C) 2022 Red Hat, Inc. > +# This file is part of elfutils. > +# > +# This file is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# elfutils is distributed in the hope that it will be useful, but > +# WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . > + > +. $srcdir/debuginfod-subr.sh > + > +# for test case debugging, uncomment: > +set -x > +# unset VALGRIND_CMD > + > +type curl 2>/dev/null || { echo "need curl"; exit 77; } > +type jq 2>/dev/null || { echo "need jq"; exit 77; } > + > +pkg-config json-c libcurl || { echo "one or more libraries are missing (= libjson-c, libcurl)"; exit 77; } Don't forget to add jq and libjson-c-devel to the BuildRequires in config/elfutils.spec.in > +env LD_LIBRARY_PATH=3D$ldpath DEBUGINFOD_URLS=3D ${VALGRIND_CMD} ${abs_b= uilddir}/../debuginfod/debuginfod-find --help 2>&1 | grep metadata || { ech= o "compiled without metadata support"; exit 77; } That shouldn't happen if the above check fails, you get an early skip already. > +DB=3D${PWD}/.debuginfod_tmp.sqlite > +export DEBUGINFOD_CACHE_PATH=3D${PWD}/.client_cache > +tempfiles $DB ${DB}_2 $DB-wal ${DB}_2-wal $DB-shm ${DB}_2-shm Why are the -wal and -shm files necessary here? Or why aren't they tempfiles in the other debuginfod test files? > +# This variable is essential and ensures no time-race for claiming ports= occurs > +# set base to a unique multiple of 100 not used in any other 'run-debugi= nfod-*' test > +base=3D13100 > +get_ports > +mkdir R D > +cp -rvp ${abs_srcdir}/debuginfod-rpms/rhel7 R > +cp -rvp ${abs_srcdir}/debuginfod-debs/*deb D > + > +env LD_LIBRARY_PATH=3D$ldpath DEBUGINFOD_URLS=3D ${VALGRIND_CMD} ${abs_b= uilddir}/../debuginfod/debuginfod $VERBOSE -R \ > + -d $DB -p $PORT1 -t0 -g0 R > vlog$PORT1 2>&1 & > +PID1=3D$! > +tempfiles vlog$PORT1 > +errfiles vlog$PORT1 > + > +wait_ready $PORT1 'ready' 1 > +wait_ready $PORT1 'thread_work_total{role=3D"traverse"}' 1 > +wait_ready $PORT1 'thread_work_pending{role=3D"scan"}' 0 > +wait_ready $PORT1 'thread_busy{role=3D"scan"}' 0 > + > +env LD_LIBRARY_PATH=3D$ldpath DEBUGINFOD_URLS=3D"http://127.0.0.1:$PORT1= https://bad/url.web" ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debugin= fod $VERBOSE -U \ > + -d ${DB}_2 -p $PORT2 -t0 -g0 D > vlog$PORT2 2>&1 & > +PID2=3D$! > +tempfiles vlog$PORT2 > +errfiles vlog$PORT2 > + > +wait_ready $PORT2 'ready' 1 > +wait_ready $PORT2 'thread_work_total{role=3D"traverse"}' 1 > +wait_ready $PORT2 'thread_work_pending{role=3D"scan"}' 0 > +wait_ready $PORT2 'thread_busy{role=3D"scan"}' 0 > + > +# have clients contact the new server > +export DEBUGINFOD_URLS=3Dhttp://127.0.0.1:$PORT2 > + > +tempfiles json.txt > +# Check that we find correct number of files, both via local and federat= ed links > +RESULTF=3D`env LD_LIBRARY_PATH=3D$ldpath ${abs_builddir}/../debuginfod/d= ebuginfod-find metadata glob "/u?r/bin/*"` > +cat $RESULTF > +N_FOUND=3D`cat $RESULTF | jq '. | length'` > +test $N_FOUND -eq 1 > +RESULTF=3D`env LD_LIBRARY_PATH=3D$ldpath ${abs_builddir}/../debuginfod/d= ebuginfod-find metadata glob "/usr/lo?al/bin/*"` > +cat $RESULTF > +N_FOUND=3D`cat $RESULTF | jq '. | length'` > +test $N_FOUND -eq 2 OK, I think I understand these tests. But they do fail locally for me. + RESULTF=3D'/home/mark/build/elfutils-obj/tests/test- 2315372/.client_cache/metada ta/key=3Dglob&value=3D%2Fu%3Fr%2Fbin%2F%2A' + cat '/home/mark/build/elfutils-obj/tests/test- 2315372/.client_cache/metadata/key=3Dglob&value=3D%2Fu%3Fr%2Fbin%2F%2A' [ ]++ cat '/home/mark/build/elfutils-obj/tests/test- 2315372/.client_cache/metadata/key=3Dglob&value=3D%2Fu%3Fr%2Fbin%2F%2A' ++ jq '. | length' + N_FOUND=3D0 + test 0 -eq 1 ++ err [...] [Wed Jul 5 13:35:16 2023] (2315460/2315473): sqlite3 error: prepare select d1.executable_p, d1.debuginfo_p, 0 as source_p, b1.hex, f1.name as file, a1.name as archive from buildids10_r_de d1, buildids10_files f1, buildids10_buildids b1, buildids10_files a1 where f1.id =3D d1.content and a1.id =3D d1.file and d1.buildid =3D b1.id and f1.name glob ? union all=20 select d2.executable_p, d2.debuginfo_p, 0, b2.hex, f2.name, NULL from buildids10_f_de d2, buildids10_files f2, buildids10_buildids b2 where f2.id =3D d2.file and d2.buildid =3D b2.id and f2.name glob ? : SQL logic error [Wed Jul 5 13:35:16 2023] (2315460/2315473): 127.0.0.1:42354 UA:elfutils/0.189,Linux/x86_64,fedora/38 XFF: GET /metadata?key=3Dglob&value=3D/u?r/bin/* 503 525 0+0ms (Running against the users/fche/try-pr29472f branch) > +# Query via the webapi as well > +curl http://127.0.0.1:$PORT2'/metadata?key=3Dglob&value=3D/usr/bin/*hi*' > +test `curl -s http://127.0.0.1:$PORT2'/metadata?key=3Dglob&value=3D/usr/= bin/*hi*' | jq '.[0].buildid =3D=3D "f17a29b5a25bd4960531d82aa6b07c8abe84fa= 66"'` =3D 'true' > +test `curl -s http://127.0.0.1:$PORT2'/metadata?key=3Dglob&value=3D/usr/= bin/*hi*' | jq '.[0].file =3D=3D "/usr/bin/hithere"'` =3D 'true' > +test `curl -s http://127.0.0.1:$PORT2'/metadata?key=3Dglob&value=3D/usr/= bin/*hi*' | jq '.[0].archive | test(".*hithere.*deb")'` =3D 'true' > + > +# An empty array is returned on server error or if the file DNE > +RESULTF=3D`env LD_LIBRARY_PATH=3D$ldpath ${abs_builddir}/../debuginfod/d= ebuginfod-find metadata file "/this/isnt/there"` > +cat $RESULTF > +test `cat $RESULTF | jq ". =3D=3D [ ]" ` =3D 'true' > + > +kill $PID1 > +kill $PID2 > +wait $PID1 > +wait $PID2 > +PID1=3D0 > +PID2=3D0 > + > +# check it's still in cache > +RESULTF=3D`env LD_LIBRARY_PATH=3D$ldpath ${abs_builddir}/../debuginfod/d= ebuginfod-find metadata file "/usr/bin/hithere"` > +cat $RESULTF > +test `cat $RESULTF | jq ". =3D=3D [ ]" ` =3D 'true' > + > +# invalidate cache, retry previously successful query to now-dead server= s > +echo 0 > $DEBUGINFOD_CACHE_PATH/metadata_retention_s > +RESULTF=3D`env LD_LIBRARY_PATH=3D$ldpath ${abs_builddir}/../debuginfod/d= ebuginfod-find metadata glob "/u?r/bin/*"` > +cat $RESULTF > +test `cat $RESULTF | jq ". =3D=3D [ ]" ` =3D 'true' > + > +exit 0 These tests seem OK, but I find it somewhat confusing that not found and error getting any result are represented the same (as an empty array). Cheers, Mark