From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gnu.wildebeest.org (gnu.wildebeest.org [45.83.234.184]) by sourceware.org (Postfix) with ESMTPS id 9E7343876896 for ; Tue, 4 Jul 2023 17:00:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9E7343876896 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=klomp.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=klomp.org Received: from r6.localdomain (82-217-174-174.cable.dynamic.v4.ziggo.nl [82.217.174.174]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by gnu.wildebeest.org (Postfix) with ESMTPSA id CFCFF3000D0D; Tue, 4 Jul 2023 19:00:30 +0200 (CEST) Received: by r6.localdomain (Postfix, from userid 1000) id 69F623402E2; Tue, 4 Jul 2023 19:00:30 +0200 (CEST) Message-ID: <5ea66dc460594f77d480747d3d095045a7a7c566.camel@klomp.org> Subject: Re: PR29472: debuginfod metadata query From: Mark Wielaard To: "Frank Ch. Eigler" , elfutils-devel@sourceware.org Date: Tue, 04 Jul 2023 19:00:30 +0200 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.48.3 (3.48.3-1.fc38) MIME-Version: 1.0 X-Spam-Status: No, score=-3034.5 required=5.0 tests=BAYES_00,GIT_PATCH_0,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,RCVD_IN_BARRACUDACENTRAL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Frank, On Wed, 2023-04-12 at 16:31 -0400, Frank Ch. Eigler via Elfutils-devel wrote: > At long last, for your review, an updated & merged patch for $SUBJECT. > "git diff -s" against git/master follows, code also on > users/fche/try-pr29472e. Apologies for the delay in review. This is mainly because I am a json noob, and don't fully understand what interface guarantees we are/can make using it. > Author: Frank Ch. Eigler > Date: Tue Apr 11 23:35:25 2023 -0400 >=20 > PR29472: debuginfod: add metadata query webapi, C api, client > =20 > This patch extends the debuginfod API with a "metadata query" > operation. It allows clients to request an enumeration of file names > known to debuginfod servers, returning a JSON response including the > matching buildids. This lets clients later download debuginfo for a > range of versions of the same named binaries, in case they need to to > prospective work (like systemtap-based live-patching). It also lets > server operators implement prefetch triggering operations for popular > but slow debuginfo slivers like kernel vdso.debug files on fedora. > =20 > Implementation requires a modern enough json-c library. The configure check looks for json_object_from_fd. So that is json-c- 0.13 or higher (which has been out for 5 year). They seem to use symbol versioning (but also bump SO_NAME from time to time?). > % debuginfod-find metadata file /bin/ls > % debuginfod-find metadata glob "/usr/local/bin/c*" Would be nice to have some example output here. If only to explain what the used can expect. It gives the following for me: /home/mark/.cache/debuginfod_client/metadata/key=3Dfile&value=3D%2Fbin%2Fls /home/mark/.cache/debuginfod_client/metadata/key=3Dglob&value=3D%2Fusr%2Flo cal%2Fbin%2Fc%2A Probably because I don't have a DEBUGINFOD_URLS configured that understands the new requests, but it is a little confusing imho. > Documentation and testing are included. > =20 > Signed-off-by: Ryan Goldberg > Signed-off-by: Frank Ch. Eigler >=20 > diff --git a/ChangeLog b/ChangeLog > index 10c23002279e..c35a19dd51c4 100644 > --- a/ChangeLog > +++ b/ChangeLog > @@ -1,3 +1,9 @@ > +2023-04-12 Ryan Golberg , Frank Ch. Eigler > + > + PR29472: debuginfod metadata query > + * NEWS: Mention this. > + * configure.ac: Look for json-c library. > + > 2023-03-03 Mark Wielaard > =20 > * NEWS: Add ELFCOMPRESS_ZSTD support for libelf and elfcompress. > diff --git a/NEWS b/NEWS > index 16e37eca5731..1bf38a69f64c 100644 > --- a/NEWS > +++ b/NEWS > @@ -1,3 +1,7 @@ > +Version 0.190 > + > +debuginfod: New API for metadata queries: file name -> buildid. Clearly this calls for the new "code name" to be "meta" :) > Version 0.189 "Don't deflate!" > =20 > configure: eu-nm, eu-addr2line and eu-stack can provide demangled symbol= s > diff --git a/configure.ac b/configure.ac > index 4b67c84425fa..b319a0119180 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -665,6 +665,12 @@ case "$ac_cv_search__obstack_free" in > esac > AC_SUBST([obstack_LIBS]) > =20 > +dnl formerly checked for json_object_array_add, but debuginfod needs a n= ewer function > +AC_CHECK_LIB(json-c, json_object_from_fd, [ > + AC_DEFINE([HAVE_JSON_C], [1], [Define if json-c is on the machine]) > + AC_SUBST(jsonc_LIBS, '-ljson-c') > +]) > + > dnl The directories with content. So this is an optional feature in all configurations? Maybe also include it in the summary at the end under OTHER FEATURES? > dnl Documentation. > diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog > index f13c28d5c6f7..ac3fd6aa862f 100644 > --- a/debuginfod/ChangeLog > +++ b/debuginfod/ChangeLog > @@ -1,3 +1,26 @@ > +2023-04-12 Ryan Golberg , Frank Ch. Eigler > + > + PR29472: debuginfod metadata query > + * Makefile.am: Add json-c usage. > + * debuginfod-client.c (debuginfod_find_metadata): New function. > + (handle_data): New fields to hold metadata being received. > + (debuginfod_clean_cache): Clean metadata too. > + (header_callback): Simplify to realloc only. > + (metadata_callback): New function. > + (init_server_urls, init_handle, perform_queries, make_cache_path): > + New refactored functions. > + (debuginfod_query_server_by_buildid): Renamed, refactored. Update > + callers. Still going through these refactors. If you have a high level description why/what was refactored that would be helpful. > + * debuginfod-find.c (main): Handle metadata queries. > + * debuginfod.cxx (DEBUGINFOD_SQLITE_DDL): Add an index or two. > + (metadata_maxtime_s, parse_opt): New parameter for load control. > + (add_client_federation_headers): New refactored function. > + (handle_metadata): New function. > + (handler_cb): Call it for /metadata URL. Trace it. > + (groom): Tweak sqlite_ps object lifetimes. > + * debuginfod.h.in (debuginfod_find_metadata): New decl. > + * libdebuginfod.map: Export it under ELFUTILS_0.190. > + > 2023-03-30 Jan Alexander Steffens (heftig) > =20 > * debuginfod-client.c (update_atime): New function. > diff --git a/debuginfod/Makefile.am b/debuginfod/Makefile.am > index 125be97bbfcc..73ffe0e56468 100644 > --- a/debuginfod/Makefile.am > +++ b/debuginfod/Makefile.am > @@ -70,7 +70,7 @@ bin_PROGRAMS +=3D debuginfod-find > endif > =20 > debuginfod_SOURCES =3D debuginfod.cxx > -debuginfod_LDADD =3D $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp= _LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS= ) -lpthread -ldl > +debuginfod_LDADD =3D $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp= _LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS= ) $(jsonc_LIBS) $(libcurl_LIBS) -lpthread -ldl > =20 > debuginfod_find_SOURCES =3D debuginfod-find.c > debuginfod_find_LDADD =3D $(libdw) $(libelf) $(libeu) $(libdebuginfod) $= (argp_LDADD) $(fts_LIBS) > @@ -97,7 +97,7 @@ libdebuginfod_so_LIBS =3D libdebuginfod_pic.a > if DUMMY_LIBDEBUGINFOD > libdebuginfod_so_LDLIBS =3D > else > -libdebuginfod_so_LDLIBS =3D -lpthread $(libcurl_LIBS) $(fts_LIBS) $(libe= lf) > +libdebuginfod_so_LDLIBS =3D -lpthread $(libcurl_LIBS) $(fts_LIBS) $(libe= lf) $(jsonc_LIBS) > endif > $(LIBDEBUGINFOD_SONAME): $(srcdir)/libdebuginfod.map $(libdebuginfod_so_= LIBS) > $(AM_V_CCLD)$(LINK) $(dso_LDFLAGS) -o $@ \ So we are only linking to jsonc for libdebuginfod.so to check the text is valid json, but the user still has to parse the json themselves. So the promise is that the user gets valid json, but we don't make any other promises. Since libdebuginfod already pulls in libcurl and its dependencies, it doesn't really matter much. But imho it is not really necessary to pull json-c in for the client library. > diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-clien= t.c > index 5bb7952416e6..cbe448519320 100644 > --- a/debuginfod/debuginfod-client.c > +++ b/debuginfod/debuginfod-client.c > @@ -1,5 +1,5 @@ > /* Retrieve ELF / DWARF / source files from the debuginfod. > - Copyright (C) 2019-2021 Red Hat, Inc. > + Copyright (C) 2019-2023 Red Hat, Inc. > Copyright (C) 2021, 2022 Mark J. Wielaard > This file is part of elfutils. > =20 > @@ -60,6 +60,8 @@ int debuginfod_find_source (debuginfod_client *c, const= unsigned char *b, > int debuginfod_find_section (debuginfod_client *c, const unsigned char *= b, > int s, const char *scn, char **p) > { return -ENOSYS; } > +int debuginfod_find_metadata (debuginfod_client *c, > + const char *k, const char *v, char **p) { = return -ENOSYS; } > void debuginfod_set_progressfn(debuginfod_client *c, > debuginfod_progressfn_t fn) { } > void debuginfod_set_verbose_fd(debuginfod_client *c, int fd) { } > @@ -114,6 +116,10 @@ void debuginfod_end (debuginfod_client *c) { } > =20 > #include > =20 > +#ifdef HAVE_JSON_C > + #include > +#endif > + > static pthread_once_t init_control =3D PTHREAD_ONCE_INIT; > =20 > static void > @@ -174,6 +180,13 @@ static const char *cache_miss_filename =3D "cache_mi= ss_s"; > static const char *cache_max_unused_age_filename =3D "max_unused_age_s"; > static const long cache_default_max_unused_age_s =3D 604800; /* 1 week *= / > =20 > +#ifdef HAVE_JSON_C > +/* The metadata_retention_default_s file within the debuginfod cache > + specifies how long metadata query results should be cached. */ > +static const long metadata_retention_default_s =3D 86400; /* 1 day */ > +static const char *metadata_retention_filename =3D "metadata_retention_s= "; > +#endif > + > /* Location of the cache of files downloaded from debuginfods. > The default parent directory is $HOME, or '/' if $HOME doesn't exist.= */ > static const char *cache_default_name =3D ".debuginfod_client_cache"; > @@ -215,6 +228,9 @@ struct handle_data > /* Response http headers for this client handle, sent from the server = */ > char *response_data; > size_t response_data_size; > + /* Response metadata values for this client handle, sent from the serv= er */ > + char *metadata; > + size_t metadata_size; > }; OK, we are always collecting metadata even if we don't have json-c. > static size_t > @@ -333,7 +349,8 @@ debuginfod_clean_cache(debuginfod_client *c, > return -errno; > =20 > regex_t re; > - const char * pattern =3D ".*/[a-f0-9]+(/debuginfo|/executable|/source.= *|)$"; /* include dirs */ > + const char * pattern =3D ".*/(metadata.*|[a-f0-9]+(/debuginfo|/executa= ble|/source.*|))$"; /* include dirs */ > + /* NB: also includes .../section/ subdirs */ > if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) !=3D 0) > return -ENOMEM; >=20 I don't understand the NB: comment. Sorry. > @@ -569,18 +586,9 @@ header_callback (char * buffer, size_t size, size_t = numitems, void * userdata) > } > /* Temporary buffer for realloc */ > char *temp =3D NULL; > - if (data->response_data =3D=3D NULL) > - { > - temp =3D malloc(numitems); > - if (temp =3D=3D NULL) > - return 0; > - } > - else > - { > temp =3D realloc(data->response_data, data->response_data_size + numit= ems); > if (temp =3D=3D NULL) > return 0; > - } > =20 > memcpy(temp + data->response_data_size, buffer, numitems-1); > data->response_data =3D temp; OK, rely on realloc (NULL, size) doing malloc (size). > @@ -590,6 +598,341 @@ header_callback (char * buffer, size_t size, size_t= numitems, void * userdata) > return numitems; > } > =20 > + > +#ifdef HAVE_JSON_C > +static size_t > +metadata_callback (char * buffer, size_t size, size_t numitems, void * u= serdata) > +{ > + if (size !=3D 1) > + return 0; > + /* Temporary buffer for realloc */ > + char *temp =3D NULL; > + struct handle_data *data =3D (struct handle_data *) userdata; > + temp =3D realloc(data->metadata, data->metadata_size + numitems + 1); > + if (temp =3D=3D NULL) > + return 0; > + =20 > + memcpy(temp + data->metadata_size, buffer, numitems); > + data->metadata =3D temp; > + data->metadata_size +=3D numitems; > + data->metadata[data->metadata_size] =3D '\0'; > + return numitems; > +} > +#endif Why is this guarded by HAVE_JSON_C? > + > +/* This function takes a copy of DEBUGINFOD_URLS, server_urls, and seper= ates it into an > + * array of urls to query. The url_subdir is either 'buildid' or 'metada= ta', corresponding > + * to the query type. Returns 0 on success and -Posix error on faliure. > + */ > +int > +init_server_urls(char* url_subdir, char *server_urls, char ***server_url= _list, int *num_urls, int vfd) > +{ > + /* Initialize the memory to zero */ > + char *strtok_saveptr; > + char *server_url =3D strtok_r(server_urls, url_delim, &strtok_saveptr)= ; > + /* Count number of URLs. */ > + int n =3D 0; > + assert(0 =3D=3D strcmp(url_subdir, "buildid") || 0 =3D=3D strcmp(url_s= ubdir, "metadata")); > + > + /* PR 27983: If the url is already set to be used use, skip it */ > + while (server_url !=3D NULL) > + { > + int r; > + char *tmp_url; > + if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] =3D= =3D '/') > + r =3D asprintf(&tmp_url, "%s%s", server_url, url_subdir); > + else > + r =3D asprintf(&tmp_url, "%s/%s", server_url, url_subdir); > + > + if (r =3D=3D -1) > + { > + return -ENOMEM; > + } > + int url_index; > + for (url_index =3D 0; url_index < n; ++url_index) > + { > + if(strcmp(tmp_url, (*server_url_list)[url_index]) =3D=3D 0) > + { > + url_index =3D -1; > + break; > + } > + } > + if (url_index =3D=3D -1) > + { > + if (vfd >=3D 0) > + dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url); > + free(tmp_url); > + } > + else > + { > + n++; > + char ** realloc_ptr; > + realloc_ptr =3D reallocarray(*server_url_list, n, > + sizeof(char*)); > + if (realloc_ptr =3D=3D NULL) > + { > + free (tmp_url); > + return -ENOMEM; > + } > + *server_url_list =3D realloc_ptr; > + (*server_url_list)[n-1] =3D tmp_url; > + } > + server_url =3D strtok_r(NULL, url_delim, &strtok_saveptr); > + } > + *num_urls =3D n; > + return 0; > +} OK, this looks lifted from debuginfod_query_server updated to deal with either /buildid/ or /metadata/. > +/* Some boilerplate for checking curl_easy_setopt. */ > +#define curl_easy_setopt_ck(H,O,P) do { \ > + CURLcode curl_res =3D curl_easy_setopt (H,O,P); \ > + if (curl_res !=3D CURLE_OK) \ > + { \ > + if (vfd >=3D 0) \ > + dprintf (vfd, \ > + "Bad curl_easy_setopt: %s\n", \ > + curl_easy_strerror(curl_res)); \ > + return -EINVAL; \ > + } \ > + } while (0) > + OK, also lifted from debuginfod_query_server. And changed to return EINVAL directly without goto out2. So no cleanup needed whenever this is used? Sorry, I realize this is only 10% of the change. And I didn't really get to the "real" changes yet. But I have to step out for a bit. To be continued :) Note that I don't believe there is anything really objectonable here. Just pondering what interface we are really guaranteeing and what it means for json-c being totally optional (it changes the public interface of debuginfod-find for example). Cheers, Mark