public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
From: Mark Wielaard <mark@klomp.org>
To: "Frank Ch. Eigler" <fche@elastic.org>, elfutils-devel@sourceware.org
Subject: Re: PR29472: debuginfod metadata query
Date: Tue, 04 Jul 2023 19:00:30 +0200	[thread overview]
Message-ID: <5ea66dc460594f77d480747d3d095045a7a7c566.camel@klomp.org> (raw)
In-Reply-To: <ZDcVNeakb2ze5oZ8@elastic.org>

Hi Frank,

On Wed, 2023-04-12 at 16:31 -0400, Frank Ch. Eigler via Elfutils-devel
wrote:
> At long last, for your review, an updated & merged patch for $SUBJECT.
> "git diff -s" against git/master follows, code also on
> users/fche/try-pr29472e.

Apologies for the delay in review. This is mainly because I am a json
noob, and don't fully understand what interface guarantees we are/can
make using it.

> Author: Frank Ch. Eigler <fche@redhat.com>
> Date:   Tue Apr 11 23:35:25 2023 -0400
> 
>     PR29472: debuginfod: add metadata query webapi, C api, client
>     
>     This patch extends the debuginfod API with a "metadata query"
>     operation.  It allows clients to request an enumeration of file names
>     known to debuginfod servers, returning a JSON response including the
>     matching buildids.  This lets clients later download debuginfo for a
>     range of versions of the same named binaries, in case they need to to
>     prospective work (like systemtap-based live-patching).  It also lets
>     server operators implement prefetch triggering operations for popular
>     but slow debuginfo slivers like kernel vdso.debug files on fedora.
>     
>     Implementation requires a modern enough json-c library.

The configure check looks for json_object_from_fd. So that is json-c-
0.13 or higher (which has been out for 5 year). They seem to use symbol
versioning (but also bump SO_NAME from time to time?).

>     % debuginfod-find metadata file /bin/ls
>     % debuginfod-find metadata glob "/usr/local/bin/c*"

Would be nice to have some example output here. If only to explain what
the used can expect. It gives the following for me:

/home/mark/.cache/debuginfod_client/metadata/key=file&value=%2Fbin%2Fls
/home/mark/.cache/debuginfod_client/metadata/key=glob&value=%2Fusr%2Flo
cal%2Fbin%2Fc%2A

Probably because I don't have a DEBUGINFOD_URLS configured that
understands the new requests, but it is a little confusing imho.

>     Documentation and testing are included.
>     
>     Signed-off-by: Ryan Goldberg <rgoldber@redhat.com>
>     Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
> 
> diff --git a/ChangeLog b/ChangeLog
> index 10c23002279e..c35a19dd51c4 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,9 @@
> +2023-04-12  Ryan Golberg <rgoldberg@redhat.com>, Frank Ch. Eigler <fche@redhat.com>
> +
> +	PR29472: debuginfod metadata query
> +	* NEWS: Mention this.
> +	* configure.ac: Look for json-c library.
> +
>  2023-03-03  Mark Wielaard  <mark@klomp.org>
>  
>  	* NEWS: Add ELFCOMPRESS_ZSTD support for libelf and elfcompress.
> diff --git a/NEWS b/NEWS
> index 16e37eca5731..1bf38a69f64c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,3 +1,7 @@
> +Version 0.190
> +
> +debuginfod: New API for metadata queries: file name -> buildid.

Clearly this calls for the new "code name" to be "meta" :)

>  Version 0.189 "Don't deflate!"
>  
>  configure: eu-nm, eu-addr2line and eu-stack can provide demangled symbols
> diff --git a/configure.ac b/configure.ac
> index 4b67c84425fa..b319a0119180 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -665,6 +665,12 @@ case "$ac_cv_search__obstack_free" in
>  esac
>  AC_SUBST([obstack_LIBS])
>  
> +dnl formerly checked for json_object_array_add, but debuginfod needs a newer function
> +AC_CHECK_LIB(json-c, json_object_from_fd, [
> +  AC_DEFINE([HAVE_JSON_C], [1], [Define if json-c is on the machine])
> +  AC_SUBST(jsonc_LIBS, '-ljson-c')
> +])
> +
>  dnl The directories with content.

So this is an optional feature in all configurations?
Maybe also include it in the summary at the end under OTHER FEATURES?

>  dnl Documentation.
> diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
> index f13c28d5c6f7..ac3fd6aa862f 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -1,3 +1,26 @@
> +2023-04-12  Ryan Golberg <rgoldberg@redhat.com>, Frank Ch. Eigler <fche@redhat.com>
> +
> +	PR29472: debuginfod metadata query
> +	* Makefile.am: Add json-c usage.
> +	* debuginfod-client.c (debuginfod_find_metadata): New function.
> +	(handle_data): New fields to hold metadata being received.
> +	(debuginfod_clean_cache): Clean metadata too.
> +	(header_callback): Simplify to realloc only.
> +	(metadata_callback): New function.
> +	(init_server_urls, init_handle, perform_queries, make_cache_path):
> +	New refactored functions.
> +	(debuginfod_query_server_by_buildid): Renamed, refactored.  Update
> +	callers.

Still going through these refactors. If you have a high level
description why/what was refactored that would be helpful.

> +	* debuginfod-find.c (main): Handle metadata queries.
> +	* debuginfod.cxx (DEBUGINFOD_SQLITE_DDL): Add an index or two.
> +	(metadata_maxtime_s, parse_opt): New parameter for load control.
> +	(add_client_federation_headers): New refactored function.
> +	(handle_metadata): New function.
> +	(handler_cb): Call it for /metadata URL.  Trace it.
> +	(groom): Tweak sqlite_ps object lifetimes.
> +	* debuginfod.h.in (debuginfod_find_metadata): New decl.
> +	* libdebuginfod.map: Export it under ELFUTILS_0.190.
> +
>  2023-03-30  Jan Alexander Steffens (heftig) <heftig@archlinux.org>
>  
>  	* debuginfod-client.c (update_atime): New function.
> diff --git a/debuginfod/Makefile.am b/debuginfod/Makefile.am
> index 125be97bbfcc..73ffe0e56468 100644
> --- a/debuginfod/Makefile.am
> +++ b/debuginfod/Makefile.am
> @@ -70,7 +70,7 @@ bin_PROGRAMS += debuginfod-find
>  endif
>  
>  debuginfod_SOURCES = debuginfod.cxx
> -debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) -lpthread -ldl
> +debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) -lpthread -ldl
>  
>  debuginfod_find_SOURCES = debuginfod-find.c
>  debuginfod_find_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS)
> @@ -97,7 +97,7 @@ libdebuginfod_so_LIBS = libdebuginfod_pic.a
>  if DUMMY_LIBDEBUGINFOD
>  libdebuginfod_so_LDLIBS =
>  else
> -libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS) $(libelf)
> +libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS) $(libelf) $(jsonc_LIBS)
>  endif
>  $(LIBDEBUGINFOD_SONAME): $(srcdir)/libdebuginfod.map $(libdebuginfod_so_LIBS)
>  	$(AM_V_CCLD)$(LINK) $(dso_LDFLAGS) -o $@ \

So we are only linking to jsonc for libdebuginfod.so to check the text
is valid json, but the user still has to parse the json themselves. So
the promise is that the user gets valid json, but we don't make any
other promises. Since libdebuginfod already pulls in libcurl and its
dependencies, it doesn't really matter much. But imho it is not really
necessary to pull json-c in for the client library.

> diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
> index 5bb7952416e6..cbe448519320 100644
> --- a/debuginfod/debuginfod-client.c
> +++ b/debuginfod/debuginfod-client.c
> @@ -1,5 +1,5 @@
>  /* Retrieve ELF / DWARF / source files from the debuginfod.
> -   Copyright (C) 2019-2021 Red Hat, Inc.
> +   Copyright (C) 2019-2023 Red Hat, Inc.
>     Copyright (C) 2021, 2022 Mark J. Wielaard <mark@klomp.org>
>     This file is part of elfutils.
>  
> @@ -60,6 +60,8 @@ int debuginfod_find_source (debuginfod_client *c, const unsigned char *b,
>  int debuginfod_find_section (debuginfod_client *c, const unsigned char *b,
>  			     int s, const char *scn, char **p)
>  			      { return -ENOSYS; }
> +int debuginfod_find_metadata (debuginfod_client *c,
> +                              const char *k, const char *v, char **p) { return -ENOSYS; }
>  void debuginfod_set_progressfn(debuginfod_client *c,
>  			       debuginfod_progressfn_t fn) { }
>  void debuginfod_set_verbose_fd(debuginfod_client *c, int fd) { }
> @@ -114,6 +116,10 @@ void debuginfod_end (debuginfod_client *c) { }
>  
>  #include <pthread.h>
>  
> +#ifdef HAVE_JSON_C
> +  #include <json-c/json.h>
> +#endif
> +
>  static pthread_once_t init_control = PTHREAD_ONCE_INIT;
>  
>  static void
> @@ -174,6 +180,13 @@ static const char *cache_miss_filename = "cache_miss_s";
>  static const char *cache_max_unused_age_filename = "max_unused_age_s";
>  static const long cache_default_max_unused_age_s = 604800; /* 1 week */
>  
> +#ifdef HAVE_JSON_C
> +/* The metadata_retention_default_s file within the debuginfod cache
> +   specifies how long metadata query results should be cached. */
> +static const long metadata_retention_default_s = 86400; /* 1 day */
> +static const char *metadata_retention_filename = "metadata_retention_s";
> +#endif
> +
>  /* Location of the cache of files downloaded from debuginfods.
>     The default parent directory is $HOME, or '/' if $HOME doesn't exist.  */
>  static const char *cache_default_name = ".debuginfod_client_cache";
> @@ -215,6 +228,9 @@ struct handle_data
>    /* Response http headers for this client handle, sent from the server */
>    char *response_data;
>    size_t response_data_size;
> +  /* Response metadata values for this client handle, sent from the server */
> +  char *metadata;
> +  size_t metadata_size;
>  };

OK, we are always collecting metadata even if we don't have json-c.

>  static size_t
> @@ -333,7 +349,8 @@ debuginfod_clean_cache(debuginfod_client *c,
>      return -errno;
>  
>    regex_t re;
> -  const char * pattern = ".*/[a-f0-9]+(/debuginfo|/executable|/source.*|)$"; /* include dirs */
> +  const char * pattern = ".*/(metadata.*|[a-f0-9]+(/debuginfo|/executable|/source.*|))$"; /* include dirs */
> +  /* NB: also includes .../section/ subdirs */
>    if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
>      return -ENOMEM;
> 
I don't understand the NB: comment. Sorry.

> @@ -569,18 +586,9 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
>    }
>    /* Temporary buffer for realloc */
>    char *temp = NULL;
> -  if (data->response_data == NULL)
> -    {
> -      temp = malloc(numitems);
> -      if (temp == NULL)
> -        return 0;
> -    }
> -  else
> -    {
>    temp = realloc(data->response_data, data->response_data_size + numitems);
>    if (temp == NULL)
>      return 0;
> -    }
>  
>    memcpy(temp + data->response_data_size, buffer, numitems-1);
>    data->response_data = temp;

OK, rely on realloc (NULL, size) doing malloc (size).

> @@ -590,6 +598,341 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
>    return numitems;
>  }
>  
> +
> +#ifdef HAVE_JSON_C
> +static size_t
> +metadata_callback (char * buffer, size_t size, size_t numitems, void * userdata)
> +{
> +  if (size != 1)
> +    return 0;
> +  /* Temporary buffer for realloc */
> +  char *temp = NULL;
> +  struct handle_data *data = (struct handle_data *) userdata;
> +  temp = realloc(data->metadata, data->metadata_size + numitems + 1);
> +  if (temp == NULL)
> +    return 0;
> +  
> +  memcpy(temp + data->metadata_size, buffer, numitems);
> +  data->metadata = temp;
> +  data->metadata_size += numitems;
> +  data->metadata[data->metadata_size] = '\0';
> +  return numitems;
> +}
> +#endif

Why is this guarded by HAVE_JSON_C?

> +
> +/* This function takes a copy of DEBUGINFOD_URLS, server_urls, and seperates it into an
> + * array of urls to query. The url_subdir is either 'buildid' or 'metadata', corresponding
> + * to the query type. Returns 0 on success and -Posix error on faliure.
> + */
> +int
> +init_server_urls(char* url_subdir, char *server_urls, char ***server_url_list, int *num_urls, int vfd)
> +{
> +  /* Initialize the memory to zero */
> +  char *strtok_saveptr;
> +  char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
> +  /* Count number of URLs.  */
> +  int n = 0;
> +  assert(0 == strcmp(url_subdir, "buildid") || 0 == strcmp(url_subdir, "metadata"));
> +
> +  /* PR 27983: If the url is already set to be used use, skip it */
> +  while (server_url != NULL)
> +  {
> +    int r;
> +    char *tmp_url;
> +    if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
> +      r = asprintf(&tmp_url, "%s%s", server_url, url_subdir);
> +    else
> +      r = asprintf(&tmp_url, "%s/%s", server_url, url_subdir);
> +
> +    if (r == -1)
> +      {
> +        return -ENOMEM;
> +      }
> +    int url_index;
> +    for (url_index = 0; url_index < n; ++url_index)
> +      {
> +        if(strcmp(tmp_url, (*server_url_list)[url_index]) == 0)
> +          {
> +            url_index = -1;
> +            break;
> +          }
> +      }
> +    if (url_index == -1)
> +      {
> +        if (vfd >= 0)
> +          dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
> +        free(tmp_url);
> +      }
> +    else
> +      {
> +        n++;
> +        char ** realloc_ptr;
> +        realloc_ptr = reallocarray(*server_url_list, n,
> +                                        sizeof(char*));
> +        if (realloc_ptr == NULL)
> +          {
> +            free (tmp_url);
> +            return -ENOMEM;
> +          }
> +        *server_url_list = realloc_ptr;
> +        (*server_url_list)[n-1] = tmp_url;
> +      }
> +    server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
> +  }
> +  *num_urls = n;
> +  return 0;
> +}

OK, this looks lifted from debuginfod_query_server updated to deal with
either /buildid/ or /metadata/.

> +/* Some boilerplate for checking curl_easy_setopt.  */
> +#define curl_easy_setopt_ck(H,O,P) do {			\
> +      CURLcode curl_res = curl_easy_setopt (H,O,P);	\
> +      if (curl_res != CURLE_OK)				\
> +	    {						\
> +	      if (vfd >= 0)					\
> +	        dprintf (vfd,				\
> +            "Bad curl_easy_setopt: %s\n",	\
> +		      curl_easy_strerror(curl_res));	\
> +	      return -EINVAL;					\
> +	    }						\
> +      } while (0)
> +

OK, also lifted from debuginfod_query_server. And changed to return
EINVAL directly without goto out2. So no cleanup needed whenever this
is used?


Sorry, I realize this is only 10% of the change. And I didn't really
get to the "real" changes yet. But I have to step out for a bit. To be
continued :)

Note that I don't believe there is anything really objectonable here.
Just pondering what interface we are really guaranteeing and what it
means for json-c being totally optional (it changes the public
interface of debuginfod-find for example).

Cheers,

Mark

  reply	other threads:[~2023-07-04 17:00 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-12 20:31 Frank Ch. Eigler
2023-07-04 17:00 ` Mark Wielaard [this message]
2023-07-04 23:21 ` Mark Wielaard
2023-07-05 12:34 ` Mark Wielaard
2023-07-05 13:45 ` Mark Wielaard
2023-07-07 13:59 ` Mark Wielaard
2023-07-07 14:26 ` Mark Wielaard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ea66dc460594f77d480747d3d095045a7a7c566.camel@klomp.org \
    --to=mark@klomp.org \
    --cc=elfutils-devel@sourceware.org \
    --cc=fche@elastic.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).