public inbox for elfutils@sourceware.org
* [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata
@ 2022-08-11 14:43 rgoldber at redhat dot com
  2022-08-12  9:05 ` [Bug debuginfod/29472] " mliska at suse dot cz
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: rgoldber at redhat dot com @ 2022-08-11 14:43 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

            Bug ID: 29472
           Summary: Support querying the debuginfod-server for metadata
           Product: elfutils
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: debuginfod
          Assignee: unassigned at sourceware dot org
          Reporter: rgoldber at redhat dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

It would be useful to be able to query a debuginfod-server for metadata
matching certain conditions. This would make it possible to see which
executable files are available without attempting to download them, or to find
files larger than a certain size so that large required files can be
prefetched and cached before the user needs them.

The proposed format is as follows:
debuginfod-find [OPTION]... metadata CONDITION QUERYFMT

Query all of the metadata matching the given CONDITION and return it as
described by QUERYFMT.

The supported debuginfod-fields are:
    BUILDID, FILENAME, MODIFIED_TIME, SIZE, TYPE, SOURCE_TYPE, SOURCE
and should be referenced using {...}, noting that field names are case
insensitive.

CONDITION is a boolean expression composed of debuginfod-fields, constants
(strings and integers), and the operators `( ) = != < > <= >= && ||`,
following the standard precedence order for conditional expressions.

QUERYFMT is a modified version of the standard printf(3) formatting.
The format is made up of static strings (which may include standard C character
escapes for newlines, tabs, and other special characters, but not \0) and
printf(3) type formatters. In place of the type specifier in the format string,
use the debuginfod-field you wish to query.
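
To make the placeholder syntax concrete, here is a minimal C sketch of how a
QUERYFMT string could be expanded for a single metadata record. It is an
illustration only and not code from any patch: struct md_record, md_field and
md_format are hypothetical names, only four of the seven fields are modelled,
only the \n and \t escapes are handled, and printf(3) flags or widths in front
of a field are not supported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical record; the member names follow the field list above. */
struct md_record
{
  const char *buildid;
  const char *filename;
  const char *type;
  const char *source_type;
};

/* Look up a field by name, ignoring case.  Returns NULL if unknown. */
static const char *
md_field (const struct md_record *r, const char *name, size_t len)
{
  static const char *const names[] =
    { "BUILDID", "FILENAME", "TYPE", "SOURCE_TYPE" };
  const char *const values[] =
    { r->buildid, r->filename, r->type, r->source_type };
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    if (strlen (names[i]) == len && strncasecmp (names[i], name, len) == 0)
      return values[i];
  return NULL;
}

/* Expand FMT into OUT (capacity CAP bytes, including the terminator).
   Returns 0 on success, -1 on an unknown field, an unterminated '{',
   or insufficient space.  */
int
md_format (const struct md_record *r, const char *fmt, char *out, size_t cap)
{
  assert (r != NULL && fmt != NULL && out != NULL);
  if (cap == 0)
    return -1;
  size_t n = 0;
  while (*fmt != '\0')
    {
      const char *piece;
      size_t piece_len;
      char esc;
      if (*fmt == '{')
        {
          /* {FIELD}: substitute the field's value. */
          const char *close = strchr (fmt, '}');
          if (close == NULL)
            return -1;
          piece = md_field (r, fmt + 1, (size_t) (close - fmt - 1));
          if (piece == NULL)
            return -1;
          piece_len = strlen (piece);
          fmt = close + 1;
        }
      else if (fmt[0] == '\\' && (fmt[1] == 'n' || fmt[1] == 't'))
        {
          /* Two-character escape: emit the real newline or tab. */
          esc = (fmt[1] == 'n') ? '\n' : '\t';
          piece = &esc;
          piece_len = 1;
          fmt += 2;
        }
      else
        {
          /* Static text is copied through unchanged. */
          piece = fmt;
          piece_len = 1;
          fmt += 1;
        }
      if (n + piece_len >= cap)
        return -1;
      memcpy (out + n, piece, piece_len);
      n += piece_len;
    }
  out[n] = '\0';
  return 0;
}
```

Field lookup is case insensitive, so "{filename}" and "{FILENAME}" expand
identically, matching the note on field names above. An unknown field is
reported as an error rather than copied through, which is a design choice of
this sketch and not something the proposal specifies.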

For example:

debuginfod-find metadata "{FILENAME}=/usr/bin/grep && {TYPE}=E" \
    "{FILENAME}({BUILDID})\t{SOURCE_TYPE}\n"
will query all executables with the name /usr/bin/grep and might return the
following:

/usr/bin/grep(90e7d8894b94f47ad17722ff8658f833f329b035)    R
/usr/bin/grep(e81e4e6e322030178260ae4f6055f781cd4997e1)    F

debuginfod-find metadata "{FILENAME}=/bin/bash" "{FILENAME}-{TYPE}\n"
might return:
/bin/bash-E
/bin/bash-S
/bin/bash-D

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug debuginfod/29472] Support querying the debuginfod-server for metadata
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
@ 2022-08-12  9:05 ` mliska at suse dot cz
  2022-08-12  9:05 ` mliska at suse dot cz
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: mliska at suse dot cz @ 2022-08-12  9:05 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

Martin Liska <mliska at suse dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mliska at suse dot cz

--- Comment #1 from Martin Liska <mliska at suse dot cz> ---
I think it's very similar to PR 28284.


* [Bug debuginfod/29472] Support querying the debuginfod-server for metadata
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
  2022-08-12  9:05 ` [Bug debuginfod/29472] " mliska at suse dot cz
@ 2022-08-12  9:05 ` mliska at suse dot cz
  2022-08-22 18:24 ` rgoldber at redhat dot com
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: mliska at suse dot cz @ 2022-08-12  9:05 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

Martin Liska <mliska at suse dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://sourceware.org/bugz
                   |                            |illa/show_bug.cgi?id=28284


* [Bug debuginfod/29472] Support querying the debuginfod-server for metadata
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
  2022-08-12  9:05 ` [Bug debuginfod/29472] " mliska at suse dot cz
  2022-08-12  9:05 ` mliska at suse dot cz
@ 2022-08-22 18:24 ` rgoldber at redhat dot com
  2022-08-31 14:52 ` rgoldber at redhat dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: rgoldber at redhat dot com @ 2022-08-22 18:24 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

--- Comment #2 from Ryan Goldberg <rgoldber at redhat dot com> ---
Created attachment 14289
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14289&action=edit
Submit A Patch for 29472

Here is a patch that is simpler than the above proposal. It allows querying
by source path and returns a JSON array of metadata (where the metadata for
each file is its own JSON object).
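
Because the reply is a single JSON document, a client has to collect the whole
response body before parsing it. The sketch below shows one way to do that in
C: append each received chunk to a growing buffer and keep it NUL-terminated.
The names struct body_buf and body_append are hypothetical; the approach is
similar in spirit to the metadata_callback in the patch posted later in this
thread, but this is not that code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for a response body. */
struct body_buf
{
  char *data;   /* NUL-terminated once anything has been appended */
  size_t size;  /* bytes stored, not counting the terminator */
};

/* Append LEN bytes of CHUNK to B.  Returns LEN on success and 0 on
   allocation failure; returning fewer bytes than were offered is how a
   libcurl write callback reports an error.  */
size_t
body_append (struct body_buf *b, const char *chunk, size_t len)
{
  assert (b != NULL && (chunk != NULL || len == 0));
  /* realloc(NULL, n) behaves like malloc(n), so the first call needs
     no special case.  The extra byte is for the terminator.  */
  char *grown = realloc (b->data, b->size + len + 1);
  if (grown == NULL)
    return 0;   /* b->data is untouched and still owned by the caller */
  if (len > 0)
    memcpy (grown + b->size, chunk, len);
  b->data = grown;
  b->size += len;
  b->data[b->size] = '\0';
  return len;
}
```

Starting from a zero-initialized struct body_buf, repeated calls leave data
pointing at one contiguous C string that can be handed to a JSON parser; the
caller frees data when done.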

The patch was tested using the elfutils try-buildbots on branch
users/rgoldber/try-metadata_query

This patch is needed because currently, even with work like PR 28284, a
user cannot check which files are accessible through debuginfod; at most they
can query by buildid and see whether they get a result. This feature will be
used in systemtap PR 27410.
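
For context, a by-path query needs its own endpoint next to the existing
buildid one. The patch posted later in this thread builds the per-server URL
by appending either "buildid" or "metadata" to each DEBUGINFOD_URLS entry,
adding a '/' only when the entry does not already end in one. A minimal C
sketch of that joining rule follows; join_query_url is a hypothetical helper
name, and the sketch uses snprintf where the patch uses asprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Join BASE and SUBDIR, inserting a '/' unless BASE already ends in
   one.  Returns a malloc'd string the caller must free, or NULL on
   allocation failure.  */
char *
join_query_url (const char *base, const char *subdir)
{
  assert (base != NULL && subdir != NULL);
  size_t blen = strlen (base);
  /* Same test as the patch: more than one character and a trailing '/'. */
  int has_slash = blen > 1 && base[blen - 1] == '/';
  size_t total = blen + (has_slash ? 0 : 1) + strlen (subdir) + 1;
  char *url = malloc (total);
  if (url == NULL)
    return NULL;
  snprintf (url, total, "%s%s%s", base, has_slash ? "" : "/", subdir);
  return url;
}
```

With this rule a server entry written with or without a trailing slash yields
the same query URL, which is what lets duplicate entries be detected by a
plain string comparison afterwards.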


* [Bug debuginfod/29472] Support querying the debuginfod-server for metadata
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
                   ` (2 preceding siblings ...)
  2022-08-22 18:24 ` rgoldber at redhat dot com
@ 2022-08-31 14:52 ` rgoldber at redhat dot com
  2022-09-02 17:25 ` rgoldber at redhat dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: rgoldber at redhat dot com @ 2022-08-31 14:52 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

--- Comment #3 from Ryan Goldberg <rgoldber at redhat dot com> ---
Created attachment 14306
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14306&action=edit
Submit A Patch for 29472, bug fix

A minor bug fix to be applied on top of the previous patch.


* [Bug debuginfod/29472] Support querying the debuginfod-server for metadata
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
                   ` (3 preceding siblings ...)
  2022-08-31 14:52 ` rgoldber at redhat dot com
@ 2022-09-02 17:25 ` rgoldber at redhat dot com
  2022-11-01 14:23 ` PATCH: Bug debuginfod/29472 followup Frank Ch. Eigler
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: rgoldber at redhat dot com @ 2022-09-02 17:25 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

Ryan Goldberg <rgoldber at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #14289|0                           |1
        is obsolete|                            |
  Attachment #14306|0                           |1
        is obsolete|                            |

--- Comment #4 from Ryan Goldberg <rgoldber at redhat dot com> ---
Created attachment 14310
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14310&action=edit
Submit A Patch for 29472, revised

I've revised the previous patch to include the suggestions from elfutils-devel
[debuginfod metadata patch review]


* PATCH: Bug debuginfod/29472 followup
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
                   ` (4 preceding siblings ...)
  2022-09-02 17:25 ` rgoldber at redhat dot com
@ 2022-11-01 14:23 ` Frank Ch. Eigler
  2022-11-01 22:20   ` Mark Wielaard
  2023-02-28 22:21 ` [Bug debuginfod/29472] Support querying the debuginfod-server for metadata mark at klomp dot org
  2024-06-03 15:27 ` fche at redhat dot com
  7 siblings, 1 reply; 12+ messages in thread
From: Frank Ch. Eigler @ 2022-11-01 14:23 UTC (permalink / raw)
  To: elfutils-devel

Hi -

On the users/fche/try-pr29472 branch, I pushed a followup to Ryan's
PR29472 draft from several weeks ago.  It's missing some ChangeLog
entries but appears otherwise complete.  It's structured as Ryan's
original patch plus my followup that changes things around, so as to
preserve both contributions in the history.  I paste the overall diff
here.

There will be some minor merge conflicts between this and amerey's
section-extraction extensions that are also aiming for this release.
I'll be glad to deconflict whichever way.


diff --git a/ChangeLog b/ChangeLog
index 7bbb2c0fe97e..efce07161abe 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2022-10-06  Ryan Goldberg <rgoldber@redhat.com>
+
+	* configure.ac (HAVE_JSON_C): Defined iff libjson-c
+	is found, and debuginfod metadata querying is thus enabled.
+
 2022-10-20  Mark Wielaard  <mark@klomp.org>
 
 	* Makefile.am (rpm): Remove --sign.
diff --git a/configure.ac b/configure.ac
index 1084b4695e2c..6077d52a7daf 100644
--- a/configure.ac
+++ b/configure.ac
@@ -600,6 +600,11 @@ case "$ac_cv_search__obstack_free" in
 esac
 AC_SUBST([obstack_LIBS])
 
+AC_CHECK_LIB(json-c, json_object_array_add, [
+  AC_DEFINE([HAVE_JSON_C], [1], [Define if json-c is on the machine])
+  AC_SUBST(jsonc_LIBS, '-ljson-c')
+])
+
 dnl The directories with content.
 
 dnl Documentation.
diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 1df903fe4ac2..79f827d95217 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,27 @@
+2022-10-06  Ryan Goldberg <rgoldber@redhat.com>
+
+	* Makefile.am (debuginfod_LDADD): Add jsonc_LIBS.
+	(libdebuginfod_so_LDLIBS): Likewise.
+	* debuginfod-find.c (main): Add command line interface for
+	metadata query by path.
+	* debuginfod.h.in: Added debuginfod_find_metadata.
+	* debuginfod.cxx (add_client_federation_headers): New function
+	created from existing code to remove code duplication.
+	(handle_buildid_match): Calls new add_client_federation_headers
+	function.
+	(handle_metadata): New function which queries local DB and
+	upstream for metadata.
+	(handler_cb): New accepted url type, /metadata.
+	* debuginfod-client.c (struct handle_data): New fields: metadata,
+	metadata_size, to store incoming metadata.
+	(metadata_callback): New function called by curl upon receiving
+	metadata.
+	(init_server_urls, init_handle, perform_queries) : New functions created from
+	existing code within debuginfod_query_server to reduce code duplication.
+	(debuginfod_query_server_by_buildid): debuginfod_query_server renamed, and above
+	functions used in place of identical previously inline code.
+	(debuginfod_find_metadata): New function.
+
 2022-10-18  Daniel Thornburgh <dthorn@google.com>
 
   * debuginfod-client.c (debuginfod_query_server): Add DEBUGINFOD_HEADERS_FILE
diff --git a/debuginfod/Makefile.am b/debuginfod/Makefile.am
index 435cb8a6839c..3d6bc26ecc4a 100644
--- a/debuginfod/Makefile.am
+++ b/debuginfod/Makefile.am
@@ -70,7 +70,7 @@ bin_PROGRAMS += debuginfod-find
 endif
 
 debuginfod_SOURCES = debuginfod.cxx
-debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) -lpthread -ldl
+debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) -lpthread -ldl
 
 debuginfod_find_SOURCES = debuginfod-find.c
 debuginfod_find_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS)
@@ -97,7 +97,7 @@ libdebuginfod_so_LIBS = libdebuginfod_pic.a
 if DUMMY_LIBDEBUGINFOD
 libdebuginfod_so_LDLIBS =
 else
-libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS)
+libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS) $(jsonc_LIBS)
 endif
 $(LIBDEBUGINFOD_SONAME): $(srcdir)/libdebuginfod.map $(libdebuginfod_so_LIBS)
 	$(AM_V_CCLD)$(LINK) $(dso_LDFLAGS) -o $@ \
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 716cb7695617..4a75eef2303a 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -56,6 +56,8 @@ int debuginfod_find_executable (debuginfod_client *c, const unsigned char *b,
                                 int s, char **p) { return -ENOSYS; }
 int debuginfod_find_source (debuginfod_client *c, const unsigned char *b,
                             int s, const char *f, char **p)  { return -ENOSYS; }
+int debuginfod_find_metadata (debuginfod_client *c,
+                              const char *k, const char* v, char** m) { return -ENOSYS; }
 void debuginfod_set_progressfn(debuginfod_client *c,
 			       debuginfod_progressfn_t fn) { }
 void debuginfod_set_verbose_fd(debuginfod_client *c, int fd) { }
@@ -103,6 +105,10 @@ void debuginfod_end (debuginfod_client *c) { }
 
 #include <pthread.h>
 
+#ifdef HAVE_JSON_C
+  #include <json-c/json.h>
+#endif
+
 static pthread_once_t init_control = PTHREAD_ONCE_INIT;
 
 static void
@@ -201,6 +207,9 @@ struct handle_data
   /* Response http headers for this client handle, sent from the server */
   char *response_data;
   size_t response_data_size;
+  /* Response metadata values for this client handle, sent from the server */
+  char *metadata;
+  size_t metadata_size;
 };
 
 static size_t
@@ -555,18 +564,9 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
   }
   /* Temporary buffer for realloc */
   char *temp = NULL;
-  if (data->response_data == NULL)
-    {
-      temp = malloc(numitems);
-      if (temp == NULL)
-        return 0;
-    }
-  else
-    {
-      temp = realloc(data->response_data, data->response_data_size + numitems);
-      if (temp == NULL)
-        return 0;
-    }
+  temp = realloc(data->response_data, data->response_data_size + numitems);
+  if (temp == NULL)
+    return 0;
 
   memcpy(temp + data->response_data_size, buffer, numitems-1);
   data->response_data = temp;
@@ -576,13 +576,345 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
   return numitems;
 }
 
+#ifdef HAVE_JSON_C
+static size_t
+metadata_callback (char * buffer, size_t size, size_t numitems, void * userdata)
+{
+  if (size != 1)
+    return 0;
+  /* Temporary buffer for realloc */
+  char *temp = NULL;
+  struct handle_data *data = (struct handle_data *) userdata;
+  temp = realloc(data->metadata, data->metadata_size + numitems + 1);
+  if (temp == NULL)
+    return 0;
+
+  memcpy(temp + data->metadata_size, buffer, numitems);
+  data->metadata = temp;
+  data->metadata_size += numitems;
+  data->metadata[data->metadata_size] = '\0';
+  return numitems;
+}
+#endif
+
+
+/* This function takes a copy of DEBUGINFOD_URLS, server_urls, and separates it into an
+ * array of urls to query. The url_subdir is either 'buildid' or 'metadata', corresponding
+ * to the query type. Returns 0 on success and -Posix error on failure.
+ */
+int
+init_server_urls(char* url_subdir, char *server_urls, char ***server_url_list, int *num_urls, int vfd)
+{
+  /* Initialize the memory to zero */
+  char *strtok_saveptr;
+  char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
+  /* Count number of URLs.  */
+  int n = 0;
+  assert(0 == strcmp(url_subdir, "buildid") || 0 == strcmp(url_subdir, "metadata"));
+
+  /* PR 27983: If the url is already set to be used, skip it */
+  while (server_url != NULL)
+  {
+    int r;
+    char *tmp_url;
+    if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
+      r = asprintf(&tmp_url, "%s%s", server_url, url_subdir);
+    else
+      r = asprintf(&tmp_url, "%s/%s", server_url, url_subdir);
+
+    if (r == -1)
+      {
+        return -ENOMEM;
+      }
+    int url_index;
+    for (url_index = 0; url_index < n; ++url_index)
+      {
+        if(strcmp(tmp_url, (*server_url_list)[url_index]) == 0)
+          {
+            url_index = -1;
+            break;
+          }
+      }
+    if (url_index == -1)
+      {
+        if (vfd >= 0)
+          dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
+        free(tmp_url);
+      }
+    else
+      {
+        n++;
+        char ** realloc_ptr;
+        realloc_ptr = reallocarray(*server_url_list, n,
+                                        sizeof(char*));
+        if (realloc_ptr == NULL)
+          {
+            free (tmp_url);
+            return -ENOMEM;
+          }
+        *server_url_list = realloc_ptr;
+        (*server_url_list)[n-1] = tmp_url;
+      }
+    server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
+  }
+  *num_urls = n;
+  return 0;
+}
+
+/* Some boilerplate for checking curl_easy_setopt.  */
+#define curl_easy_setopt_ck(H,O,P) do {			\
+      CURLcode curl_res = curl_easy_setopt (H,O,P);	\
+      if (curl_res != CURLE_OK)				\
+	    {						\
+	      if (vfd >= 0)					\
+	        dprintf (vfd,				\
+            "Bad curl_easy_setopt: %s\n",	\
+		      curl_easy_strerror(curl_res));	\
+	      return -EINVAL;					\
+	    }						\
+      } while (0)
+
+
+/*
+ * This function initializes a CURL handle. It takes optional callbacks for the write
+ * function and the header function, which if defined will use userdata of type struct handle_data*.
+ * Specifically the data[i] within an array of struct handle_data's.
+ * Returns 0 on success and -Posix error on failure.
+ */
+int
+init_handle(debuginfod_client *client,
+  size_t (*w_callback)(char *buffer,size_t size,size_t nitems,void *userdata),
+  size_t (*h_callback)(char *buffer,size_t size,size_t nitems,void *userdata),
+  struct handle_data *data, int i, long timeout,
+  int vfd)
+{
+  data->handle = curl_easy_init();
+  if (data->handle == NULL)
+  {
+    return -ENETUNREACH;
+  }
+
+  if (vfd >= 0)
+    dprintf (vfd, "url %d %s\n", i, data->url);
+
+  /* Only allow http:// + https:// + file:// so we aren't being
+    redirected to some unsupported protocol.  */
+  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS,
+    (CURLPROTO_HTTP | CURLPROTO_HTTPS | CURLPROTO_FILE));
+  curl_easy_setopt_ck(data->handle, CURLOPT_URL, data->url);
+  if (vfd >= 0)
+    curl_easy_setopt_ck(data->handle, CURLOPT_ERRORBUFFER,
+      data->errbuf);
+  if(w_callback) {
+    curl_easy_setopt_ck(data->handle,
+      CURLOPT_WRITEFUNCTION, w_callback);
+    curl_easy_setopt_ck(data->handle, CURLOPT_WRITEDATA, data);
+  }
+  if (timeout > 0)
+  {
+    /* Make sure there is at least some progress,
+      try to get at least 100K per timeout seconds.  */
+    curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_TIME,
+            timeout);
+    curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_LIMIT,
+            100 * 1024L);
+  }
+  data->response_data = NULL;
+  data->response_data_size = 0;
+  curl_easy_setopt_ck(data->handle, CURLOPT_FILETIME, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_FOLLOWLOCATION, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_FAILONERROR, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_NOSIGNAL, (long) 1);
+  if(h_callback){
+    curl_easy_setopt_ck(data->handle,
+      CURLOPT_HEADERFUNCTION, h_callback);
+    curl_easy_setopt_ck(data->handle, CURLOPT_HEADERDATA, data);
+  }
+  #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
+  curl_easy_setopt_ck(data->handle, CURLOPT_PATH_AS_IS, (long) 1);
+  #else
+  /* On old curl; no big deal, canonicalization here is almost the
+      same, except perhaps for ? # type decorations at the tail. */
+  #endif
+  curl_easy_setopt_ck(data->handle, CURLOPT_AUTOREFERER, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_ACCEPT_ENCODING, "");
+  curl_easy_setopt_ck(data->handle, CURLOPT_HTTPHEADER, client->headers);
+
+  return 0;
+}
+
+
+/*
+ * This function busy-waits on one or more curl queries to complete. This can
+ * be controlled via only_one, which, if true, will find the first winner and exit
+ * once found. If positive, maxtime and maxsize dictate the maximum allowed wait times
+ * and download sizes respectively. Returns 0 on success and -Posix error on failure.
+ */
+int
+perform_queries(CURLM *curlm, CURL **target_handle, struct handle_data *data, debuginfod_client *c,
+  int num_urls, long maxtime, long maxsize, bool only_one, int vfd)
+{
+  int still_running = -1;
+  long loops = 0;
+  int committed_to = -1;
+  bool verbose_reported = false;
+  struct timespec start_time, cur_time;
+  if (c->winning_headers != NULL)
+    {
+      free (c->winning_headers);
+      c->winning_headers = NULL;
+    }
+  if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
+  {
+    return -errno;
+  }
+  long delta = 0;
+  do
+  {
+    /* Check to see how long querying is taking. */
+    if (maxtime > 0)
+    {
+      if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur_time) == -1)
+      {
+        return -errno;
+      }
+      delta = cur_time.tv_sec - start_time.tv_sec;
+      if ( delta >  maxtime)
+      {
+        dprintf(vfd, "Timeout with max time=%lds and transfer time=%lds\n", maxtime, delta );
+        return -ETIME;
+      }
+    }
+    /* Wait 1 second, the minimum DEBUGINFOD_TIMEOUT.  */
+    curl_multi_wait(curlm, NULL, 0, 1000, NULL);
+    CURLMcode curlm_res = curl_multi_perform(curlm, &still_running);
+
+    if(only_one){
+      /* If the target file has been found, abort the other queries.  */
+      if (target_handle && *target_handle != NULL)
+      {
+        for (int i = 0; i < num_urls; i++)
+          if (data[i].handle != *target_handle)
+            curl_multi_remove_handle(curlm, data[i].handle);
+          else
+          {
+            committed_to = i;
+            if (c->winning_headers == NULL)
+            {
+              c->winning_headers = data[committed_to].response_data;
+              if (vfd >= 0 && c->winning_headers != NULL)
+                dprintf(vfd, "\n%s", c->winning_headers);
+              data[committed_to].response_data = NULL;
+              data[committed_to].response_data_size = 0;
+            }
+          }
+      }
+
+      if (vfd >= 0 && !verbose_reported && committed_to >= 0)
+      {
+        bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
+        dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
+          committed_to);
+        if (pnl)
+          c->default_progressfn_printed_p = 0;
+        verbose_reported = true;
+      }
+    }
+
+    if (curlm_res != CURLM_OK)
+    {
+      switch (curlm_res)
+      {
+      case CURLM_CALL_MULTI_PERFORM: continue;
+      case CURLM_OUT_OF_MEMORY: return -ENOMEM;
+      default: return -ENETUNREACH;
+      }
+    }
+
+    long dl_size = 0;
+    if(only_one && target_handle){ // Only bother with progress functions if we're retrieving exactly 1 file
+      if (*target_handle && (c->progressfn || maxsize > 0))
+      {
+        /* Get size of file being downloaded. NB: If going through
+            deflate-compressing proxies, this number is likely to be
+            unavailable, so -1 may show. */
+        CURLcode curl_res;
+#ifdef CURLINFO_CONTENT_LENGTH_DOWNLOAD_T
+        curl_off_t cl;
+        curl_res = curl_easy_getinfo(*target_handle,
+                                      CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
+                                      &cl);
+        if (curl_res == CURLE_OK && cl >= 0)
+          dl_size = (cl > LONG_MAX ? LONG_MAX : (long)cl);
+#else
+        double cl;
+        curl_res = curl_easy_getinfo(*target_handle,
+                                      CURLINFO_CONTENT_LENGTH_DOWNLOAD,
+                                      &cl);
+        if (curl_res == CURLE_OK)
+          dl_size = (cl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)cl);
+#endif
+        /* If Content-Length is -1, try to get the size from
+            X-Debuginfod-Size */
+        if (dl_size == -1 && c->winning_headers != NULL)
+        {
+          long xdl;
+          char *hdr = strcasestr(c->winning_headers, "x-debuginfod-size");
+
+          if (hdr != NULL
+              && sscanf(hdr, "x-debuginfod-size: %ld", &xdl) == 1)
+            dl_size = xdl;
+        }
+      }
+
+      if (c->progressfn) /* inform/check progress callback */
+      {
+        loops ++;
+        long pa = loops; /* default param for progress callback */
+        if (*target_handle) /* we've committed to a server; report its download progress */
+        {
+          CURLcode curl_res;
+#ifdef CURLINFO_SIZE_DOWNLOAD_T
+          curl_off_t dl;
+          curl_res = curl_easy_getinfo(*target_handle,
+                                        CURLINFO_SIZE_DOWNLOAD_T,
+                                        &dl);
+          if (curl_res == 0 && dl >= 0)
+            pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
+#else
+          double dl;
+          curl_res = curl_easy_getinfo(*target_handle,
+                                        CURLINFO_SIZE_DOWNLOAD,
+                                        &dl);
+          if (curl_res == 0)
+            pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
+#endif
+
+        }
+
+        if ((*c->progressfn) (c, pa, dl_size))
+          break;
+      }
+    }
+    /* Check to see if we are downloading something which exceeds maxsize, if set.*/
+    if (target_handle && *target_handle && dl_size > maxsize && maxsize > 0)
+    {
+      if (vfd >=0)
+        dprintf(vfd, "Content-Length too large.\n");
+      return -EFBIG;
+    }
+  } while (still_running);
+  return 0;
+}
+
+
 /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
    with the specified build-id, type (debuginfo, executable or source)
    and filename. filename may be NULL. If found, return a file
    descriptor for the target, otherwise return an error code.
 */
 static int
-debuginfod_query_server (debuginfod_client *c,
+debuginfod_query_server_by_buildid (debuginfod_client *c,
 			 const unsigned char *build_id,
                          int build_id_len,
                          const char *type,
@@ -601,7 +933,7 @@ debuginfod_query_server (debuginfod_client *c,
   char suffix[PATH_MAX + 1]; /* +1 for zero terminator.  */
   char build_id_bytes[MAX_BUILD_ID_BYTES * 2 + 1];
   int vfd = c->verbose_fd;
-  int rc;
+  int rc, r;
 
   if (vfd >= 0)
     {
@@ -915,60 +1247,14 @@ debuginfod_query_server (debuginfod_client *c,
       goto out0;
     }
 
-  /* Initialize the memory to zero */
-  char *strtok_saveptr;
   char **server_url_list = NULL;
-  char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
-  /* Count number of URLs.  */
-  int num_urls = 0;
-
-  while (server_url != NULL)
-    {
-      /* PR 27983: If the url is already set to be used use, skip it */
-      char *slashbuildid;
-      if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
-        slashbuildid = "buildid";
-      else
-        slashbuildid = "/buildid";
-
-      char *tmp_url;
-      if (asprintf(&tmp_url, "%s%s", server_url, slashbuildid) == -1)
-        {
-          rc = -ENOMEM;
-          goto out1;
-        }
-      int url_index;
-      for (url_index = 0; url_index < num_urls; ++url_index)
-        {
-          if(strcmp(tmp_url, server_url_list[url_index]) == 0)
-            {
-              url_index = -1;
-              break;
-            }
-        }
-      if (url_index == -1)
-        {
-          if (vfd >= 0)
-            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
-          free(tmp_url);
-        }
-      else
-        {
-          num_urls++;
-          char ** realloc_ptr;
-          realloc_ptr = reallocarray(server_url_list, num_urls,
-                                         sizeof(char*));
-          if (realloc_ptr == NULL)
-            {
-              free (tmp_url);
-              rc = -ENOMEM;
-              goto out1;
-            }
-          server_url_list = realloc_ptr;
-          server_url_list[num_urls-1] = tmp_url;
-        }
-      server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
-    }
+  char *server_url;
+  int num_urls;
+  r = init_server_urls("buildid", server_urls, &server_url_list, &num_urls, vfd);
+  if(0 != r){
+    rc = r;
+    goto out1;
+  }
 
   int retry_limit = default_retry_limit;
   const char* retry_limit_envvar = getenv(DEBUGINFOD_RETRY_LIMIT_ENV_VAR);
@@ -1038,13 +1324,6 @@ debuginfod_query_server (debuginfod_client *c,
 
       data[i].fd = fd;
       data[i].target_handle = &target_handle;
-      data[i].handle = curl_easy_init();
-      if (data[i].handle == NULL)
-        {
-          if (filename) curl_free (escaped_string);
-          rc = -ENETUNREACH;
-          goto out2;
-        }
       data[i].client = c;
 
       if (filename) /* must start with / */
@@ -1055,220 +1334,29 @@ debuginfod_query_server (debuginfod_client *c,
         }
       else
         snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
-      if (vfd >= 0)
-	dprintf (vfd, "url %d %s\n", i, data[i].url);
-
-      /* Some boilerplate for checking curl_easy_setopt.  */
-#define curl_easy_setopt_ck(H,O,P) do {			\
-      CURLcode curl_res = curl_easy_setopt (H,O,P);	\
-      if (curl_res != CURLE_OK)				\
-	{						\
-	  if (vfd >= 0)					\
-	    dprintf (vfd,				\
-	             "Bad curl_easy_setopt: %s\n",	\
-		     curl_easy_strerror(curl_res));	\
-	  rc = -EINVAL;					\
-	  goto out2;					\
-	}						\
-      } while (0)
 
-      /* Only allow http:// + https:// + file:// so we aren't being
-	 redirected to some unsupported protocol.  */
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_PROTOCOLS,
-			  (CURLPROTO_HTTP | CURLPROTO_HTTPS | CURLPROTO_FILE));
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_URL, data[i].url);
-      if (vfd >= 0)
-	curl_easy_setopt_ck(data[i].handle, CURLOPT_ERRORBUFFER,
-			    data[i].errbuf);
-      curl_easy_setopt_ck(data[i].handle,
-			  CURLOPT_WRITEFUNCTION,
-			  debuginfod_write_callback);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_WRITEDATA, (void*)&data[i]);
-      if (timeout > 0)
-	{
-	  /* Make sure there is at least some progress,
-	     try to get at least 100K per timeout seconds.  */
-	  curl_easy_setopt_ck (data[i].handle, CURLOPT_LOW_SPEED_TIME,
-			       timeout);
-	  curl_easy_setopt_ck (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
-			       100 * 1024L);
-	}
-      data[i].response_data = NULL;
-      data[i].response_data_size = 0;
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_FILETIME, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_HEADERFUNCTION,
-			  header_callback);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_HEADERDATA,
-			  (void *) &(data[i]));
-#if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
-#else
-      /* On old curl; no big deal, canonicalization here is almost the
-         same, except perhaps for ? # type decorations at the tail. */
-#endif
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_AUTOREFERER, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_ACCEPT_ENCODING, "");
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_HTTPHEADER, c->headers);
+      r = init_handle(c, debuginfod_write_callback, header_callback, &data[i], i, timeout, vfd);
+      if(0 != r){
+        rc = r;
+        if(filename) curl_free (escaped_string);
+        goto out2;
+      }
 
       curl_multi_add_handle(curlm, data[i].handle);
     }
 
   if (filename) curl_free(escaped_string);
+
   /* Query servers in parallel.  */
   if (vfd >= 0)
     dprintf (vfd, "query %d urls in parallel\n", num_urls);
-  int still_running;
-  long loops = 0;
-  int committed_to = -1;
-  bool verbose_reported = false;
-  struct timespec start_time, cur_time;
 
-  free (c->winning_headers);
-  c->winning_headers = NULL;
-  if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
+  r = perform_queries(curlm, &target_handle,data,c, num_urls, maxtime, maxsize, true,  vfd);
+  if (0 != r)
     {
-      rc = -errno;
+      rc = r;
       goto out2;
     }
-  long delta = 0;
-  do
-    {
-      /* Check to see how long querying is taking. */
-      if (maxtime > 0)
-        {
-          if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur_time) == -1)
-            {
-              rc = -errno;
-              goto out2;
-            }
-          delta = cur_time.tv_sec - start_time.tv_sec;
-          if ( delta >  maxtime)
-            {
-              dprintf(vfd, "Timeout with max time=%lds and transfer time=%lds\n", maxtime, delta );
-              rc = -ETIME;
-              goto out2;
-            }
-        }
-      /* Wait 1 second, the minimum DEBUGINFOD_TIMEOUT.  */
-      curl_multi_wait(curlm, NULL, 0, 1000, NULL);
-      CURLMcode curlm_res = curl_multi_perform(curlm, &still_running);
-
-      /* If the target file has been found, abort the other queries.  */
-      if (target_handle != NULL)
-	{
-	  for (int i = 0; i < num_urls; i++)
-	    if (data[i].handle != target_handle)
-	      curl_multi_remove_handle(curlm, data[i].handle);
-	    else
-              {
-	        committed_to = i;
-                if (c->winning_headers == NULL)
-                  {
-                    c->winning_headers = data[committed_to].response_data;
-                    data[committed_to].response_data = NULL;
-                    data[committed_to].response_data_size = 0;
-                  }
-
-              }
-	}
-
-      if (vfd >= 0 && !verbose_reported && committed_to >= 0)
-	{
-	  bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
-	  dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
-		   committed_to);
-	  if (pnl)
-	    c->default_progressfn_printed_p = 0;
-	  verbose_reported = true;
-	}
-
-      if (curlm_res != CURLM_OK)
-        {
-          switch (curlm_res)
-            {
-            case CURLM_CALL_MULTI_PERFORM: continue;
-            case CURLM_OUT_OF_MEMORY: rc = -ENOMEM; break;
-            default: rc = -ENETUNREACH; break;
-            }
-          goto out2;
-        }
-
-      long dl_size = 0;
-      if (target_handle && (c->progressfn || maxsize > 0))
-        {
-          /* Get size of file being downloaded. NB: If going through
-             deflate-compressing proxies, this number is likely to be
-             unavailable, so -1 may show. */
-          CURLcode curl_res;
-#ifdef CURLINFO_CONTENT_LENGTH_DOWNLOAD_T
-          curl_off_t cl;
-          curl_res = curl_easy_getinfo(target_handle,
-                                       CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
-                                       &cl);
-          if (curl_res == CURLE_OK && cl >= 0)
-            dl_size = (cl > LONG_MAX ? LONG_MAX : (long)cl);
-#else
-          double cl;
-          curl_res = curl_easy_getinfo(target_handle,
-                                       CURLINFO_CONTENT_LENGTH_DOWNLOAD,
-                                       &cl);
-          if (curl_res == CURLE_OK)
-            dl_size = (cl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)cl);
-#endif
-          /* If Content-Length is -1, try to get the size from
-             X-Debuginfod-Size */
-          if (dl_size == -1 && c->winning_headers != NULL)
-            {
-              long xdl;
-              char *hdr = strcasestr(c->winning_headers, "x-debuginfod-size");
-
-              if (hdr != NULL
-                  && sscanf(hdr, "x-debuginfod-size: %ld", &xdl) == 1)
-                dl_size = xdl;
-            }
-        }
-
-      if (c->progressfn) /* inform/check progress callback */
-        {
-          loops ++;
-          long pa = loops; /* default param for progress callback */
-          if (target_handle) /* we've committed to a server; report its download progress */
-            {
-              CURLcode curl_res;
-#ifdef CURLINFO_SIZE_DOWNLOAD_T
-              curl_off_t dl;
-              curl_res = curl_easy_getinfo(target_handle,
-                                           CURLINFO_SIZE_DOWNLOAD_T,
-                                           &dl);
-              if (curl_res == 0 && dl >= 0)
-                pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
-#else
-              double dl;
-              curl_res = curl_easy_getinfo(target_handle,
-                                           CURLINFO_SIZE_DOWNLOAD,
-                                           &dl);
-              if (curl_res == 0)
-                pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
-#endif
-
-            }
-
-          if ((*c->progressfn) (c, pa, dl_size))
-            break;
-        }
-
-      /* Check to see if we are downloading something which exceeds maxsize, if set.*/
-      if (target_handle && dl_size > maxsize && maxsize > 0)
-        {
-          if (vfd >=0)
-            dprintf(vfd, "Content-Length too large.\n");
-          rc = -EFBIG;
-          goto out2;
-        }
-    } while (still_running);
 
   /* Check whether a query was successful. If so, assign its handle
      to verified_handle.  */
@@ -1625,7 +1713,7 @@ debuginfod_find_debuginfo (debuginfod_client *client,
 			   const unsigned char *build_id, int build_id_len,
                            char **path)
 {
-  return debuginfod_query_server(client, build_id, build_id_len,
+  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
                                  "debuginfo", NULL, path);
 }
 
@@ -1636,7 +1724,7 @@ debuginfod_find_executable(debuginfod_client *client,
 			   const unsigned char *build_id, int build_id_len,
                            char **path)
 {
-  return debuginfod_query_server(client, build_id, build_id_len,
+  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
                                  "executable", NULL, path);
 }
 
@@ -1645,11 +1733,222 @@ int debuginfod_find_source(debuginfod_client *client,
 			   const unsigned char *build_id, int build_id_len,
                            const char *filename, char **path)
 {
-  return debuginfod_query_server(client, build_id, build_id_len,
+  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
                                  "source", filename, path);
 }
 
 
+int debuginfod_find_metadata (debuginfod_client *client,
+                              const char* key, const char* value, char** metadata)
+{
+  (void) client;
+  (void) key;
+  (void) value;
+  
+  if (NULL == metadata) return -EINVAL;
+#ifdef HAVE_JSON_C
+  char *server_urls;
+  char *urls_envvar;
+  json_object *json_metadata = json_object_new_array();
+  int rc = 0, r;
+  int vfd = client->verbose_fd;
+
+  if(NULL == json_metadata){
+    rc = -ENOMEM;
+    goto out;
+  }
+
+  if(NULL == value || NULL == key){
+    rc = -EINVAL;
+    goto out;
+  }
+
+  if (vfd >= 0)
+    dprintf (vfd, "debuginfod_find_metadata %s %s\n", key, value);
+
+  /* Without a queryable URL, we can stop here.  */
+  urls_envvar = getenv(DEBUGINFOD_URLS_ENV_VAR);
+  if (vfd >= 0)
+    dprintf (vfd, "server urls \"%s\"\n",
+      urls_envvar != NULL ? urls_envvar : "");
+  if (urls_envvar == NULL || urls_envvar[0] == '\0')
+  {
+    rc = -ENOSYS;
+    goto out;
+  }
+
+  /* Clear the client of previous urls.  */
+  free (client->url);
+  client->url = NULL;
+
+  long maxtime = 0;
+  const char *maxtime_envvar;
+  maxtime_envvar = getenv(DEBUGINFOD_MAXTIME_ENV_VAR);
+  if (maxtime_envvar != NULL)
+    maxtime = atol (maxtime_envvar);
+  if (maxtime && vfd >= 0)
+    dprintf(vfd, "using max time %lds\n", maxtime);
+
+  long timeout = default_timeout;
+  const char* timeout_envvar = getenv(DEBUGINFOD_TIMEOUT_ENV_VAR);
+  if (timeout_envvar != NULL)
+    timeout = atoi (timeout_envvar);
+  if (vfd >= 0)
+    dprintf (vfd, "using timeout %ld\n", timeout);
+
+  add_default_headers(client);
+
+  /* make a copy of the envvar so it can be safely modified.  */
+  server_urls = strdup(urls_envvar);
+  if (server_urls == NULL)
+  {
+    rc = -ENOMEM;
+    goto out;
+  }
+  /* thereafter, goto out1 on error.  */
+
+  char **server_url_list = NULL;
+  char *server_url;
+  int num_urls;
+  r = init_server_urls("metadata", server_urls, &server_url_list, &num_urls, vfd);
+  if(0 != r){
+    rc = r;
+    goto out1;
+  }
+
+  CURLM *curlm = client->server_mhandle;
+  assert (curlm != NULL);
+
+  CURL *target_handle = NULL;
+  struct handle_data *data = malloc(sizeof(struct handle_data) * num_urls);
+  if (data == NULL)
+  {
+    rc = -ENOMEM;
+    goto out1;
+  }
+
+  /* thereafter, goto out2 on error.  */
+
+
+  /* Initialize handle_data  */
+  for (int i = 0; i < num_urls; i++)
+  {
+    if ((server_url = server_url_list[i]) == NULL)
+      break;
+    if (vfd >= 0)
+      dprintf (vfd, "init server %d %s\n", i, server_url);
+
+    data[i].errbuf[0] = '\0';
+    data[i].target_handle = &target_handle;
+    data[i].client = client;
+    data[i].metadata = NULL;
+    data[i].metadata_size = 0;
+
+    // libcurl > 7.62ish has curl_url_set()/etc. to construct these things more properly.
+    // curl_easy_escape() is older
+    CURL *c = curl_easy_init();
+    if (c) {
+      char *key_escaped = curl_easy_escape(c, key, 0);
+      char *value_escaped = curl_easy_escape(c, value, 0);
+      snprintf(data[i].url, PATH_MAX, "%s?key=%s&value=%s", server_url,
+               // fallback to unescaped values in unlikely case of error
+               key_escaped ?: key, value_escaped ?: value);
+      curl_free(value_escaped);
+      curl_free(key_escaped);
+      curl_easy_cleanup(c);
+    }
+    
+    r = init_handle(client, metadata_callback, header_callback, &data[i], i, timeout, vfd);
+    if(0 != r){
+      rc = r;
+      goto out2;
+    }
+    curl_multi_add_handle(curlm, data[i].handle);
+  }
+
+  /* Query servers */
+  if (vfd >= 0)
+      dprintf (vfd, "Starting %d queries\n",num_urls);
+  r = perform_queries(curlm, NULL, data, client, num_urls, maxtime, 0, false, vfd);
+  if(0 != r){
+    rc = r;
+    goto out2;
+  }
+
+  /* NOTE: We don't check the return codes of the curl messages since
+     a metadata query failing silently is just fine. We want to know what's
+     available from the servers that can be reached without issues.
+     With verbose output enabled, failures are reported below via vfd.  */
+
+  /* Build the new json array from all the upstream data,
+     cleaning up along the way.
+  */
+  for (int i = 0; i < num_urls; i++)
+  {
+    curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
+    if(NULL == data[i].metadata)
+    {
+      if (vfd >= 0)
+        dprintf (vfd, "Query to %s failed with error message:\n\t\"%s\"\n",
+          data[i].url, data[i].errbuf);
+      continue;
+    }
+    json_object *upstream_metadata = json_tokener_parse(data[i].metadata);
+    if(NULL == upstream_metadata) continue;
+    // Combine the upstream metadata into the json array
+    for (int j = 0, n = json_object_array_length(upstream_metadata); j < n; j++) {
+        json_object *entry = json_object_array_get_idx(upstream_metadata, j);
+        json_object_get(entry); // increment reference count
+        json_object_array_add(json_metadata, entry);
+    }
+    json_object_put(upstream_metadata);
+
+    curl_easy_cleanup (data[i].handle);
+    free (data[i].response_data);
+    free (data[i].metadata);
+  }
+
+  *metadata = strdup(json_object_to_json_string_ext(json_metadata, JSON_C_TO_STRING_PRETTY));
+
+  free (data);
+  goto out1;
+
+/* error exits */
+out2:
+  /* remove all handles from multi */
+  for (int i = 0; i < num_urls; i++)
+  {
+    if (data[i].handle != NULL)
+    {
+      curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
+      curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
+      free (data[i].metadata);
+    }
+  }
+  free(data);
+
+out1:
+  for (int i = 0; i < num_urls; ++i)
+    free(server_url_list[i]);
+  free(server_url_list);
+  free (server_urls);
+
+/* general purpose exit */
+out:
+  json_object_put(json_metadata);
+  /* Reset sent headers */
+  curl_slist_free_all (client->headers);
+  client->headers = NULL;
+  client->user_agent_set_p = 0;
+
+  return rc;
+
+#else /* ! HAVE_JSON_C */
+  return -ENOSYS;
+#endif
+}
+
 /* Add an outgoing HTTP header.  */
 int debuginfod_add_http_header (debuginfod_client *client, const char* header)
 {
diff --git a/debuginfod/debuginfod-find.c b/debuginfod/debuginfod-find.c
index 778fb09b0890..4136c5679c23 100644
--- a/debuginfod/debuginfod-find.c
+++ b/debuginfod/debuginfod-find.c
@@ -31,6 +31,9 @@
 #include <gelf.h>
 #include <libdwelf.h>
 
+#ifdef HAVE_JSON_C
+  #include <json-c/json.h>
+#endif
 
 /* Name and version of program.  */
 ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
@@ -48,7 +51,11 @@ static const char args_doc[] = N_("debuginfo BUILDID\n"
                                   "executable BUILDID\n"
                                   "executable PATH\n"
                                   "source BUILDID /FILENAME\n"
-                                  "source PATH /FILENAME\n");
+                                  "source PATH /FILENAME\n"
+#ifdef HAVE_JSON_C                                  
+                                  "metadata KEY VALUE"
+#endif
+                                  );
 
 
 /* Definitions of arguments for argp functions.  */
@@ -140,6 +147,30 @@ main(int argc, char** argv)
       return 1;
     }
 
+#ifdef HAVE_JSON_C
+  if(strcmp(argv[remaining], "metadata") == 0){
+      if (remaining+2 == argc)
+      {
+        fprintf(stderr, "Require KEY and VALUE for \"metadata\"\n");
+        return 1;
+      }
+
+      char* metadata;
+      int rc = debuginfod_find_metadata (client, argv[remaining+1], argv[remaining+2],
+                                         &metadata);
+
+      if (rc < 0)
+      {
+        fprintf(stderr, "Server query failed: %s\n", strerror(-rc));
+        return 1;
+      }
+      // Output the metadata to stdout
+      printf("%s\n", metadata);
+      free(metadata);
+      return 0;
+  }
+#endif
+
   /* If we were passed an ELF file name in the BUILDID slot, look in there. */
   unsigned char* build_id = (unsigned char*) argv[remaining+1];
   int build_id_len = 0; /* assume text */
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 9dc4836bbe12..105c39087cfc 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -115,6 +115,9 @@ using namespace std;
 #define tid() pthread_self()
 #endif
 
+#ifdef HAVE_JSON_C
+  #include <json-c/json.h>
+#endif
 
 inline bool
 string_endswith(const string& haystack, const string& needle)
@@ -173,7 +176,7 @@ static const char DEBUGINFOD_SQLITE_DDL[] =
   "        foreign key (buildid) references " BUILDIDS "_buildids(id) on update cascade on delete cascade,\n"
   "        primary key (buildid, file, mtime)\n"
   "        ) " WITHOUT_ROWID ";\n"
-  // Index for faster delete by file identifier
+  // Index for faster delete by file identifier and metadata searches
   "create index if not exists " BUILDIDS "_f_de_idx on " BUILDIDS "_f_de (file, mtime);\n"
   "create table if not exists " BUILDIDS "_f_s (\n"
   "        buildid integer not null,\n"
@@ -199,6 +202,8 @@ static const char DEBUGINFOD_SQLITE_DDL[] =
   "        ) " WITHOUT_ROWID ";\n"
   // Index for faster delete by archive file identifier
   "create index if not exists " BUILDIDS "_r_de_idx on " BUILDIDS "_r_de (file, mtime);\n"
+  // Index for metadata searches
+  "create index if not exists " BUILDIDS "_r_de_idx2 on " BUILDIDS "_r_de (content);\n"  
   "create table if not exists " BUILDIDS "_r_sref (\n" // outgoing dwarf sourcefile references from rpm
   "        buildid integer not null,\n"
   "        artifactsrc integer not null,\n"
@@ -386,6 +391,9 @@ static const struct argp_option options[] =
    { "passive", ARGP_KEY_PASSIVE, NULL, 0, "Do not scan or groom, read-only database.", 0 },
 #define ARGP_KEY_DISABLE_SOURCE_SCAN 0x1009
    { "disable-source-scan", ARGP_KEY_DISABLE_SOURCE_SCAN, NULL, 0, "Do not scan dwarf source info.", 0 },
+#define ARGP_KEY_METADATA_MAXTIME 0x100A
+   { "metadata-maxtime", ARGP_KEY_METADATA_MAXTIME, "SECONDS", 0,
+     "Number of seconds to limit metadata query run time, 0=unlimited.", 0 },
    { NULL, 0, NULL, 0, NULL, 0 },
   };
 
@@ -438,6 +446,8 @@ static unsigned forwarded_ttl_limit = 8;
 static bool scan_source_info = true;
 static string tmpdir;
 static bool passive_p = false;
+static unsigned metadata_maxtime_s = 5;
+
 
 static void set_metric(const string& key, double value);
 // static void inc_metric(const string& key);
@@ -639,6 +649,9 @@ parse_opt (int key, char *arg,
     case ARGP_KEY_DISABLE_SOURCE_SCAN:
       scan_source_info = false;
       break;
+    case ARGP_KEY_METADATA_MAXTIME:
+      metadata_maxtime_s = (unsigned) atoi(arg);
+      break;
       // case 'h': argp_state_help (state, stderr, ARGP_HELP_LONG|ARGP_HELP_EXIT_OK);
     default: return ARGP_ERR_UNKNOWN;
     }
@@ -1824,6 +1837,58 @@ handle_buildid_r_match (bool internal_req_p,
   return r;
 }
 
+void
+add_client_federation_headers(debuginfod_client *client, MHD_Connection* conn){
+  // Transcribe incoming User-Agent:
+  string ua = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "User-Agent") ?: "";
+  string ua_complete = string("User-Agent: ") + ua;
+  debuginfod_add_http_header (client, ua_complete.c_str());
+
+  // Compute larger XFF:, for avoiding info loss during
+  // federation, and for future cyclicity detection.
+  string xff = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "X-Forwarded-For") ?: "";
+  if (xff != "")
+    xff += string(", "); // comma separated list
+
+  unsigned int xff_count = 0;
+  for (auto&& i : xff){
+    if (i == ',') xff_count++;
+  }
+
+  // if X-Forwarded-For: exceeds N hops,
+  // do not delegate a local lookup miss to upstream debuginfods.
+  if (xff_count >= forwarded_ttl_limit)
+    throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwarded-ttl-limit reached \
+and will not query the upstream servers");
+
+  // Compute the client's numeric IP address only - so can't merge with conninfo()
+  const union MHD_ConnectionInfo *u = MHD_get_connection_info (conn,
+                                                                MHD_CONNECTION_INFO_CLIENT_ADDRESS);
+  struct sockaddr *so = u ? u->client_addr : 0;
+  char hostname[256] = ""; // RFC1035
+  if (so && so->sa_family == AF_INET) {
+    (void) getnameinfo (so, sizeof (struct sockaddr_in), hostname, sizeof (hostname), NULL, 0,
+                        NI_NUMERICHOST);
+  } else if (so && so->sa_family == AF_INET6) {
+    struct sockaddr_in6* addr6 = (struct sockaddr_in6*) so;
+    if (IN6_IS_ADDR_V4MAPPED(&addr6->sin6_addr)) {
+      struct sockaddr_in addr4;
+      memset (&addr4, 0, sizeof(addr4));
+      addr4.sin_family = AF_INET;
+      addr4.sin_port = addr6->sin6_port;
+      memcpy (&addr4.sin_addr.s_addr, addr6->sin6_addr.s6_addr+12, sizeof(addr4.sin_addr.s_addr));
+      (void) getnameinfo ((struct sockaddr*) &addr4, sizeof (addr4),
+                          hostname, sizeof (hostname), NULL, 0,
+                          NI_NUMERICHOST);
+    } else {
+      (void) getnameinfo (so, sizeof (struct sockaddr_in6), hostname, sizeof (hostname), NULL, 0,
+                          NI_NUMERICHOST);
+    }
+  }
+
+  string xff_complete = string("X-Forwarded-For: ")+xff+string(hostname);
+  debuginfod_add_http_header (client, xff_complete.c_str());
+}
 
 static struct MHD_Response*
 handle_buildid_match (bool internal_req_p,
@@ -2010,57 +2075,7 @@ handle_buildid (MHD_Connection* conn,
       debuginfod_set_progressfn (client, & debuginfod_find_progress);
 
       if (conn)
-        {
-          // Transcribe incoming User-Agent:
-          string ua = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "User-Agent") ?: "";
-          string ua_complete = string("User-Agent: ") + ua;
-          debuginfod_add_http_header (client, ua_complete.c_str());
-
-          // Compute larger XFF:, for avoiding info loss during
-          // federation, and for future cyclicity detection.
-          string xff = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "X-Forwarded-For") ?: "";
-          if (xff != "")
-            xff += string(", "); // comma separated list
-
-          unsigned int xff_count = 0;
-          for (auto&& i : xff){
-            if (i == ',') xff_count++;
-          }
-
-          // if X-Forwarded-For: exceeds N hops,
-          // do not delegate a local lookup miss to upstream debuginfods.
-          if (xff_count >= forwarded_ttl_limit)
-            throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwared-ttl-limit reached \
-and will not query the upstream servers");
-
-          // Compute the client's numeric IP address only - so can't merge with conninfo()
-          const union MHD_ConnectionInfo *u = MHD_get_connection_info (conn,
-                                                                       MHD_CONNECTION_INFO_CLIENT_ADDRESS);
-          struct sockaddr *so = u ? u->client_addr : 0;
-          char hostname[256] = ""; // RFC1035
-          if (so && so->sa_family == AF_INET) {
-            (void) getnameinfo (so, sizeof (struct sockaddr_in), hostname, sizeof (hostname), NULL, 0,
-                                NI_NUMERICHOST);
-          } else if (so && so->sa_family == AF_INET6) {
-            struct sockaddr_in6* addr6 = (struct sockaddr_in6*) so;
-            if (IN6_IS_ADDR_V4MAPPED(&addr6->sin6_addr)) {
-              struct sockaddr_in addr4;
-              memset (&addr4, 0, sizeof(addr4));
-              addr4.sin_family = AF_INET;
-              addr4.sin_port = addr6->sin6_port;
-              memcpy (&addr4.sin_addr.s_addr, addr6->sin6_addr.s6_addr+12, sizeof(addr4.sin_addr.s_addr));
-              (void) getnameinfo ((struct sockaddr*) &addr4, sizeof (addr4),
-                                  hostname, sizeof (hostname), NULL, 0,
-                                  NI_NUMERICHOST);
-            } else {
-              (void) getnameinfo (so, sizeof (struct sockaddr_in6), hostname, sizeof (hostname), NULL, 0,
-                                  NI_NUMERICHOST);
-            }
-          }
-          
-          string xff_complete = string("X-Forwarded-For: ")+xff+string(hostname);
-          debuginfod_add_http_header (client, xff_complete.c_str());
-        }
+        add_client_federation_headers(client, conn);
 
       if (artifacttype == "debuginfo")
 	fd = debuginfod_find_debuginfo (client,
@@ -2272,6 +2287,140 @@ handle_metrics (off_t* size)
   return r;
 }
 
+
+#ifdef HAVE_JSON_C
+static struct MHD_Response*
+handle_metadata (MHD_Connection* conn,
+                 string key, string value, off_t* size)
+{
+  MHD_Response* r;
+  sqlite3 *thisdb = dbq;
+
+  // Query locally for matching e, d and s files
+
+  string op;
+  if (key == "glob")
+    op = "glob";
+  else if (key == "file")
+    op = "=";
+  else
+    throw reportable_exception("/metadata webapi error, unsupported key");
+
+  string sql = string(
+                      // explicitly query r_de and f_de once here, rather than via query_d and
+                      // query_e separately, because they scan the same tables, so we'd double the work
+                      "select d1.executable_p, d1.debuginfo_p, 0 as source_p, b1.hex, f1.name as file "
+                      "from " BUILDIDS "_r_de d1, " BUILDIDS "_files f1, " BUILDIDS "_buildids b1 "
+                      "where f1.id = d1.content and d1.buildid = b1.id and f1.name " + op + " ? "
+                      "union all \n"
+                      "select d2.executable_p, d2.debuginfo_p, 0, b2.hex, f2.name "
+                      "from " BUILDIDS "_f_de d2, " BUILDIDS "_files f2, " BUILDIDS "_buildids b2 "
+                      "where f2.id = d2.file and d2.buildid = b2.id and f2.name " + op + " ? "
+                      "union all \n"
+                      // delegate to query_s for this one
+                      "select 0, 0, 1, q.buildid, q.artifactsrc "
+                      "from " BUILDIDS "_query_s as q "
+                      "where q.artifactsrc " + op + " ? ");
+                      
+  sqlite_ps *pp = new sqlite_ps (thisdb, "mhd-query-meta-glob", sql);
+  pp->reset();
+  pp->bind(1, value);
+  pp->bind(2, value);
+  pp->bind(3, value);
+  unique_ptr<sqlite_ps> ps_closer(pp); // release pp if exception or return
+
+  json_object *metadata = json_object_new_array();
+  if (!metadata)
+    throw libc_exception(ENOMEM, "json allocation");
+  
+  // consume all the rows
+  struct timespec ts_start;
+  clock_gettime (CLOCK_MONOTONIC, &ts_start);
+  
+  int rc;
+  while (SQLITE_DONE != (rc = pp->step()))
+    {
+      // break out of loop if we have searched too long
+      struct timespec ts_end;
+      clock_gettime (CLOCK_MONOTONIC, &ts_end);
+      double deltas = (ts_end.tv_sec - ts_start.tv_sec) + (ts_end.tv_nsec - ts_start.tv_nsec)/1.e9;
+      if (metadata_maxtime_s > 0 && deltas > metadata_maxtime_s)
+        break; // NB: no particular signal is given to the client about incompleteness
+      
+      if (rc != SQLITE_ROW) throw sqlite_exception(rc, "step");
+
+      int m_executable_p = sqlite3_column_int (*pp, 0);
+      int m_debuginfo_p  = sqlite3_column_int (*pp, 1);
+      int m_source_p     = sqlite3_column_int (*pp, 2);
+      string m_buildid   = (const char*) sqlite3_column_text (*pp, 3) ?: ""; // should always be non-null
+      string m_file      = (const char*) sqlite3_column_text (*pp, 4) ?: "";
+
+      auto add_metadata = [metadata, m_buildid, m_file](const string& type) {
+        json_object* entry = json_object_new_object();
+        if (NULL == entry) throw libc_exception (ENOMEM, "cannot allocate json");
+        defer_dtor<json_object*,int> entry_d(entry, json_object_put);
+        
+        auto add_entry_metadata = [entry](const char* k, string v) {
+          json_object* s;
+          if(v != "") {
+            s = json_object_new_string(v.c_str());
+            if (NULL == s) throw libc_exception (ENOMEM, "cannot allocate json");
+            json_object_object_add(entry, k, s);
+          }
+        };
+        
+        add_entry_metadata("type", type.c_str());
+        add_entry_metadata("buildid", m_buildid);
+        add_entry_metadata("file", m_file);
+        json_object_array_add(metadata, json_object_get(entry)); // Increase ref count to switch its ownership
+      };
+
+      if (m_executable_p) add_metadata("executable");
+      if (m_debuginfo_p) add_metadata("debuginfo");      
+      if (m_source_p) add_metadata("source");              
+
+    }
+  pp->reset();
+
+  // Query upstream as well
+  debuginfod_client *client = debuginfod_pool_begin();
+  if (metadata && client != NULL)
+  {
+    add_client_federation_headers(client, conn);
+
+    char * upstream_metadata;
+    if (0 == debuginfod_find_metadata(client, key.c_str(), value.c_str(), &upstream_metadata)) {
+      json_object *upstream_metadata_json = json_tokener_parse(upstream_metadata);
+      if(NULL != upstream_metadata_json)
+        {
+          for (int i = 0, n = json_object_array_length(upstream_metadata_json); i < n; i++) {
+            json_object *entry = json_object_array_get_idx(upstream_metadata_json, i);
+            json_object_get(entry); // increment reference count
+            json_object_array_add(metadata, entry);
+          }
+          json_object_put(upstream_metadata_json);
+        }
+      free(upstream_metadata);
+    }
+    debuginfod_pool_end (client);
+  }
+
+  const char* metadata_str = (metadata != NULL) ?
+    json_object_to_json_string(metadata) : "[ ]" ;
+  if (! metadata_str)
+    throw libc_exception (ENOMEM, "cannot allocate json");
+  r = MHD_create_response_from_buffer (strlen(metadata_str),
+                                       (void*) metadata_str,
+                                       MHD_RESPMEM_MUST_COPY);
+  *size = strlen(metadata_str);
+  json_object_put(metadata);
+  if (r)
+    add_mhd_response_header(r, "Content-Type", "application/json");
+  return r;
+}
+#endif
+
+
 static struct MHD_Response*
 handle_root (off_t* size)
 {
@@ -2406,6 +2555,20 @@ handler_cb (void * /*cls*/,
           inc_metric("http_requests_total", "type", artifacttype);
           r = handle_metrics(& http_size);
         }
+#ifdef HAVE_JSON_C
+      else if (url1 == "/metadata")
+        {
+          tmp_inc_metric m ("thread_busy", "role", "http-metadata");
+          const char* key = MHD_lookup_connection_value(connection, MHD_GET_ARGUMENT_KIND, "key");
+          const char* value = MHD_lookup_connection_value(connection, MHD_GET_ARGUMENT_KIND, "value");
+          if (NULL == value || NULL == key)
+            throw reportable_exception("/metadata webapi error, need key and value");
+
+          artifacttype = "metadata";
+          inc_metric("http_requests_total", "type", artifacttype);
+          r = handle_metadata(connection, key, value, &http_size);
+        }
+#endif
       else if (url1 == "/")
         {
           artifacttype = "/";
@@ -3693,12 +3856,13 @@ void groom()
   if (interrupted) return;
 
   // NB: "vacuum" is too heavy for even daily runs: it rewrites the entire db, so is done as maxigroom -G
-  sqlite_ps g1 (db, "incremental vacuum", "pragma incremental_vacuum");
-  g1.reset().step_ok_done();
-  sqlite_ps g2 (db, "optimize", "pragma optimize");
-  g2.reset().step_ok_done();
-  sqlite_ps g3 (db, "wal checkpoint", "pragma wal_checkpoint=truncate");
-  g3.reset().step_ok_done();
+  { sqlite_ps g (db, "incremental vacuum", "pragma incremental_vacuum"); g.reset().step_ok_done(); }
+  // https://www.sqlite.org/lang_analyze.html#approx
+  { sqlite_ps g (db, "analyze setup", "pragma analysis_limit = 1000;\n"); g.reset().step_ok_done(); }
+  { sqlite_ps g (db, "analyze", "analyze"); g.reset().step_ok_done(); }
+  { sqlite_ps g (db, "analyze reload", "analyze sqlite_schema"); g.reset().step_ok_done(); } 
+  { sqlite_ps g (db, "optimize", "pragma optimize"); g.reset().step_ok_done(); }
+  { sqlite_ps g (db, "wal checkpoint", "pragma wal_checkpoint=truncate"); g.reset().step_ok_done(); }
 
   database_stats_report();
 
diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
index 7d8e4972b185..4aa38abb5731 100644
--- a/debuginfod/debuginfod.h.in
+++ b/debuginfod/debuginfod.h.in
@@ -79,6 +79,16 @@ int debuginfod_find_source (debuginfod_client *client,
                             const char *filename,
                             char **path);
 
+/*  Query the urls contained in $DEBUGINFOD_URLS for metadata
+    with given query key/value.
+
+   If successful, return 0, otherwise return a posix error code.
+   If successful, set *metadata to a malloc'd json array
+   with each entry being a json object of metadata for 1 file.
+   Caller must free() it later.  metadata MUST be non-NULL.  */
+int debuginfod_find_metadata (debuginfod_client *client,
+                              const char *key, const char* value, char** metadata);
+
 typedef int (*debuginfod_progressfn_t)(debuginfod_client *c, long a, long b);
 void debuginfod_set_progressfn(debuginfod_client *c,
 			       debuginfod_progressfn_t fn);
diff --git a/debuginfod/libdebuginfod.map b/debuginfod/libdebuginfod.map
index 93964167836f..6e4fe4b5bcba 100644
--- a/debuginfod/libdebuginfod.map
+++ b/debuginfod/libdebuginfod.map
@@ -20,4 +20,5 @@ ELFUTILS_0.183 {
 } ELFUTILS_0.179;
 ELFUTILS_0.188 {
   debuginfod_get_headers;
+  debuginfod_find_metadata;
 } ELFUTILS_0.183;
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 269ed06e567e..7f852824cdc9 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,10 @@
+2022-10-06  Ryan Goldberg <rgoldber@redhat.com>
+
+	* debuginfod-find.1: Document metadata query commandline API.
+	* debuginfod_find_debuginfo.3: Document metadata query C API.
+	* debuginfod_find_metadata.3: New file.
+	* Makefile.am (notrans_dist_*_man3): Add it.
+
 2022-10-28  Arsen Arsenović  <arsen@aarsen.me>
 
 	* readelf.1: Document the --syms alias.
diff --git a/doc/Makefile.am b/doc/Makefile.am
index db2506fd3d49..64ffdaa2c3b5 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -38,6 +38,7 @@ notrans_dist_man3_MANS += debuginfod_end.3
 notrans_dist_man3_MANS += debuginfod_find_debuginfo.3
 notrans_dist_man3_MANS += debuginfod_find_executable.3
 notrans_dist_man3_MANS += debuginfod_find_source.3
+notrans_dist_man3_MANS += debuginfod_find_metadata.3
 notrans_dist_man3_MANS += debuginfod_get_user_data.3
 notrans_dist_man3_MANS += debuginfod_get_url.3
 notrans_dist_man3_MANS += debuginfod_set_progressfn.3
diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1
index 957ec7e716f9..c44fbc763650 100644
--- a/doc/debuginfod-find.1
+++ b/doc/debuginfod-find.1
@@ -29,6 +29,8 @@ debuginfod-find \- request debuginfo-related data
 .B debuginfod-find [\fIOPTION\fP]... source \fIBUILDID\fP \fI/FILENAME\fP
 .br
 .B debuginfod-find [\fIOPTION\fP]... source \fIPATH\fP \fI/FILENAME\fP
+.br
+.B debuginfod-find [\fIOPTION\fP]... metadata \fIKEY\fP \fIVALUE\fP
 
 .SH DESCRIPTION
 \fBdebuginfod-find\fP queries one or more \fBdebuginfod\fP servers for
@@ -106,6 +108,35 @@ l l.
 \../bar/foo.c AT_comp_dir=/zoo/	source BUILDID /zoo//../bar/foo.c
 .TE
 
+.SS metadata \fIKEY\fP \fIVALUE\fP
+
+All designated debuginfod servers are queried for metadata about files
+in their index.  Different search keys may be supported by different
+servers.
+
+.TS
+l l l .
+KEY	VALUE	DESCRIPTION
+
+\fBfile\fP	path	match exact \fIpath\fP, including in archives
+\fBglob\fP	pattern	sqlite glob match \fIpattern\fP, including in archives
+.TE
+
+The results of the search are output to \fBstdout\fP as a JSON array
+of objects, supplying metadata about each match.  This metadata report
+may or may not be cached.  It may be incomplete and may contain
+duplicates.  For each match, the result is a JSON object with these
+fields.  Additional fields may be present.
+
+.TS
+l l l .
+NAME	TYPE	DESCRIPTION
+
+\fBbuildid\fP	string	hexadecimal buildid associated with the file
+\fBtype\fP	string	one of \fBdebuginfo\fP or \fBexecutable\fP or \fBsource\fP
+\fBfile\fP	string	matched file name, outside or inside the archive
+.TE
+
 .SH "OPTIONS"
 
 .TP
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
index 7c1dc3dd6a68..9f529d0a4042 100644
--- a/doc/debuginfod.8
+++ b/doc/debuginfod.8
@@ -133,6 +133,14 @@ scanner/groomer server and multiple passive ones, thereby sharing
 service load.  Archive pattern options must still be given, so
 debuginfod can recognize file name extensions for unpacking.
 
+.TP
+.B "\-\-metadata\-maxtime=SECONDS"
+Impose a limit on the runtime of metadata webapi queries.  These
+queries, especially broad "glob" wildcards, can take a large amount of
+time and produce large results.  Public-facing servers may need to
+throttle them.  The default limit is 5 seconds.  Set 0 to disable this
+limit.
+
 .TP
 .B "\-D SQL" "\-\-ddl=SQL"
 Execute given sqlite statement after the database is opened and
@@ -371,6 +379,16 @@ The exact set of metrics and their meanings may change in future
 versions.  Caution: configuration information (path names, versions)
 may be disclosed.
 
+.SS /metadata?key=\fIKEY\fP&value=\fIVALUE\fP
+
+This endpoint triggers a search of the files in the index plus any
+upstream federated servers, based on given key and value.  If
+successful, the result is an application/json textual array, listing
+metadata for the matched files.  See \fIdebuginfod-find(1)\fP for
+documentation of the common key/value search parameters, and the
+resulting data schema.
+
+
 .SH DATA MANAGEMENT
 
 debuginfod stores its index in an sqlite database in a densely packed
diff --git a/doc/debuginfod_find_debuginfo.3 b/doc/debuginfod_find_debuginfo.3
index 3dd832400ec6..f131813ecefc 100644
--- a/doc/debuginfod_find_debuginfo.3
+++ b/doc/debuginfod_find_debuginfo.3
@@ -43,6 +43,10 @@ LOOKUP FUNCTIONS
 .BI "                           int " build_id_len ","
 .BI "                           const char *" filename ","
 .BI "                           char ** " path ");"
+.BI "int debuginfod_find_metadata(debuginfod_client *" client ","
+.BI "                           const char *" key ","
+.BI "                           const char *" value ","
+.BI "                           char** " metadata ");"
 
 OPTIONAL FUNCTIONS
 
@@ -109,6 +113,14 @@ A \fBclient\fP handle should be used from only one thread at a time.
 A handle may be reused for a series of lookups, which can improve
 performance due to retention of connections and caches.
 
+.BR debuginfod_find_metadata (),
+likewise queries all debuginfod server URLs contained in
+.BR $DEBUGINFOD_URLS
+but instead retrieves metadata.  The query search mode is specified
+in the \fIkey\fP parameter, and its parameter \fIvalue\fP.  See
+\fIdebuginfod-find(1)\fP for more information on the available
+options for query key/value.
+
 .SH "RETURN VALUE"
 
 \fBdebuginfod_begin\fP returns the \fBdebuginfod_client\fP handle to
@@ -120,6 +132,13 @@ to the client cache and a file descriptor to that file is returned.
 The caller needs to \fBclose\fP() this descriptor.  Otherwise, a
 negative error code is returned.
 
+The one exception is \fBdebuginfod_find_metadata\fP, which likewise
+returns negative error codes, but on success returns 0 and sets
+\fI*metadata\fP to a string-form JSON array of the found matching
+metadata.  This should be freed by the caller.  See
+\fIdebuginfod-find(1)\fP for more information on the metadata being
+returned.
+
 .SH "OPTIONAL FUNCTIONS"
 
 A small number of optional functions are available to tune or query
diff --git a/doc/debuginfod_find_metadata.3 b/doc/debuginfod_find_metadata.3
new file mode 100644
index 000000000000..16279936e2ea
--- /dev/null
+++ b/doc/debuginfod_find_metadata.3
@@ -0,0 +1 @@
+.so man3/debuginfod_find_debuginfo.3
diff --git a/tests/ChangeLog b/tests/ChangeLog
index a240a70506b1..79aae9208319 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-06  Ryan Goldberg <rgoldber@redhat.com>
+
+	* run-debuginfod-find-metadata.sh: New test.
+	* Makefile.am (TESTS): Add run-debuginfod-find-metadata.sh.
+	(EXTRA_DIST): Likewise.
+
 2022-09-20  Yonggang Luo  <luoyonggang@gmail.com>
 
 	* Makefile.am (EXTRA_DIST): Remove debuginfod-rpms/hello2.spec.
diff --git a/tests/Makefile.am b/tests/Makefile.am
index ced4a8266236..aaa5d35a769c 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -247,7 +247,8 @@ TESTS += run-debuginfod-dlopen.sh \
          run-debuginfod-x-forwarded-for.sh \
          run-debuginfod-response-headers.sh \
          run-debuginfod-extraction-passive.sh \
-	 run-debuginfod-webapi-concurrency.sh
+	 run-debuginfod-webapi-concurrency.sh \
+	 run-debuginfod-find-metadata.sh
 endif
 if !OLD_LIBMICROHTTPD
 # Will crash on too old libmicrohttpd
@@ -554,6 +555,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
 	     run-debuginfod-response-headers.sh \
              run-debuginfod-extraction-passive.sh \
              run-debuginfod-webapi-concurrency.sh \
+			 run-debuginfod-find-metadata.sh \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
 	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
diff --git a/tests/run-debuginfod-find-metadata.sh b/tests/run-debuginfod-find-metadata.sh
new file mode 100755
index 000000000000..2e1999f56d91
--- /dev/null
+++ b/tests/run-debuginfod-find-metadata.sh
@@ -0,0 +1,89 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2022 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/debuginfod-subr.sh
+
+# for test case debugging, uncomment:
+set -x
+unset VALGRIND_CMD
+
+type curl 2>/dev/null || { echo "need curl"; exit 77; }
+type jq 2>/dev/null || { echo "need jq"; exit 77; }
+
+pkg-config json-c libcurl || { echo "one or more libraries are missing (libjson-c, libcurl)"; exit 77; }
+
+DB=${PWD}/.debuginfod_tmp.sqlite
+export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache
+tempfiles $DB ${DB}_2
+
+# This variable is essential and ensures no time-race for claiming ports occurs
+# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
+base=13100
+get_ports
+mkdir R D
+cp -rvp ${abs_srcdir}/debuginfod-rpms/rhel7 R
+cp -rvp ${abs_srcdir}/debuginfod-debs/*deb D
+
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -R \
+    -d $DB -p $PORT1 -t0 -g0 R > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+
+wait_ready $PORT1 'ready' 1
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS="http://127.0.0.1:$PORT1 https://bad/url.web" ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -U \
+    -d ${DB}_2 -p $PORT2 -t0 -g0 D > vlog$PORT2 2>&1 &
+PID2=$!
+tempfiles vlog$PORT2
+errfiles vlog$PORT2
+
+wait_ready $PORT2 'ready' 1
+wait_ready $PORT2 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT2 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT2 'thread_busy{role="scan"}' 0
+
+# have clients contact the new server
+export DEBUGINFOD_URLS=http://127.0.0.1:$PORT2
+
+tempfiles json.txt
+# Check that we find 11 files (which means that both the local and upstream servers correctly reply to the query)
+N_FOUND=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod-find metadata glob "/?sr*" | jq '. | length'`
+test $N_FOUND -eq 11
+
+# Query via the webapi as well
+EXPECTED='[ { "type": "executable", "buildid": "f17a29b5a25bd4960531d82aa6b07c8abe84fa66", "file": "/usr/bin/hithere"} ]'
+curl http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*'
+test `curl http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*' | jq ". == $EXPECTED" ` = 'true'
+
+# An empty array is returned on server error or if the file does not exist
+test `env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod-find metadata file "/this/isnt/there" | jq ". == [ ]" ` = 'true'
+
+kill $PID1
+kill $PID2
+wait $PID1
+wait $PID2
+PID1=0
+PID2=0
+
+test `env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod-find metadata file "/usr/bin/hithere" | jq ". == [ ]" ` = 'true'
+
+exit 0



* Re: PATCH: Bug debuginfod/29472 followup
  2022-11-01 14:23 ` PATCH: Bug debuginfod/29472 followup Frank Ch. Eigler
@ 2022-11-01 22:20   ` Mark Wielaard
  2022-11-10 17:12     ` Frank Ch. Eigler
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Wielaard @ 2022-11-01 22:20 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

Hi Frank,

On Tue, Nov 01, 2022 at 10:23:06AM -0400, Frank Ch. Eigler via Elfutils-devel wrote:
> On the users/fche/try-pr29472 branch, I pushed a followup to Ryan's
> PR29472 draft from a bunch of weeks ago.  It's missing some
> ChangeLog's but appears otherwise complete.  It's structured as Ryan's
> original patch plus my followup that changes things around, so as to
> preserve both contributions in the history.  I paste the overall diff
> here.
> 
> There will be some minor merge conflicts between this and amerey's
> section-extraction extensions that are also aiming for this release.
> I'll be glad to deconflict whichever way.

The section extraction extension was just pushed. But the testcase
fails on some systems, which needs investigation.

This is a fairly big patch which introduces even more new
functionality, are you sure you want to target the 0.188 release of
tomorrow?

Are you sure the interface is correct? Is the sqlite "glob" pattern
standardized? Can it be provided if the underlying server database
isn't sqlite?

I haven't read the whole diff yet. There are several refactorings
which would be nice to see a separate patch.

Why does debuginfod-client.c use json-c? Can't the server send the
json object as a normal char string? Why does the string from the
server need to be interpreted as a json object and then turned into a
string again?

Cheers,

Mark


* Re: PATCH: Bug debuginfod/29472 followup
  2022-11-01 22:20   ` Mark Wielaard
@ 2022-11-10 17:12     ` Frank Ch. Eigler
  2023-03-01 23:32       ` Mark Wielaard
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Ch. Eigler @ 2022-11-10 17:12 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: elfutils-devel

Hi -

> Are you sure the interface is correct? 

Hard to be sure, so it's left generalized.  The APIs would be
unchanged if future search strategies are added (or subtracted);
they'd affect the choice of acceptable KEY strings.

We know we want glob patterns over executable file names.  I've seen
cases where an exact match query produces a different sqlite query
plan from the glob one, but not sure how much performance difference
that implies.  Searching for source files by glob/match is removed
from today's version because it doesn't run fast enough (without a new
large index).

> Is the sqlite "glob" pattern standardized? 

Yes.

> Can it be provided if the underlying server database isn't sqlite?

Yeah, in the unlikely case that we undertake this someday.  glob
expressions can be translated to regular expressions, which are
themselves supported in postgres & mysql.


> I haven't read the whole diff yet. There are several refactorings
> which would be nice to see a separate patch.

One such part that occurs to me is the debuginfod_query_server() ->
init_handles() / perform_queries() subdivision.  Are there others?


> Why does debuginfod-client.c use json-c? Can't the server send the
> json object as a normal char string? Why does the string from the
> server need to be interpreted as a json object and then turned into a
> string again?

Use of the library allows robust processing (checking & merging) of
incoming json data from multiple upstream servers.  Luckily, json-c is
a small & self-contained library.

- FChE



* [Bug debuginfod/29472] Support querying the debuginfod-server for metadata
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
                   ` (5 preceding siblings ...)
  2022-11-01 14:23 ` PATCH: Bug debuginfod/29472 followup Frank Ch. Eigler
@ 2023-02-28 22:21 ` mark at klomp dot org
  2024-06-03 15:27 ` fche at redhat dot com
  7 siblings, 0 replies; 12+ messages in thread
From: mark at klomp dot org @ 2023-02-28 22:21 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark at klomp dot org

--- Comment #5 from Mark Wielaard <mark at klomp dot org> ---
Followup patch:
https://patchwork.sourceware.org/project/elfutils/patch/20221101142306.GL16441@redhat.com/

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* Re: PATCH: Bug debuginfod/29472 followup
  2022-11-10 17:12     ` Frank Ch. Eigler
@ 2023-03-01 23:32       ` Mark Wielaard
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Wielaard @ 2023-03-01 23:32 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

Hi Frank,

The patch as posted (users/fche/try-pr29472) doesn't apply anymore,
have you tried to rebase it to the current git master branch?

On Thu, Nov 10, 2022 at 12:12:17PM -0500, Frank Ch. Eigler via Elfutils-devel wrote:
> > Are you sure the interface is correct? 
> 
> Hard to be sure, so it's left generalized.  The APIs would be
> unchanged if future search strategies are added (or subtracted);
> they'd affect the choice of acceptable KEY strings.

OK, so the interfaces added are:

New debuginfod-client interface:

/*  Query the urls contained in $DEBUGINFOD_URLS for metadata
    with given query key/value.

   If successful, return 0, otherwise return a posix error code.
   If successful, set *metadata to a malloc'd json array
   with each entry being a json object of metadata for 1 file.
   Caller must free() it later.  metadata MUST be non-NULL.  */
int debuginfod_find_metadata (debuginfod_client *client,
                              const char *key, const char* value,
                              char** metadata);

Might need an explanation of which characters are allowed in key and
value. In particular is '=' allowed? Any other special chars? What
encoding is used (utf8)?

New server query: metadata?key=KEY&value=VALUE

Same as above, plus maybe explicitly describe the URL encoding?

New debuginfod-find command arguments:

debuginfod-find [OPTION]... metadata KEY VALUE 

Where:

KEY	VALUE	DESCRIPTION

file	path	match exact path, including in archives
glob	pattern	sqlite glob match pattern, including in archives

And as output a json array with objects that might contain the
following:

NAME	TYPE	DESCRIPTION

buildid	string	hexadecimal buildid associated with the file
type	string	one of debuginfo or executable or source
file	string	matched file name, outside or inside the archive

Having the associated buildid with every result object might be a lot,
especially when listing source files, is there a representation where
the buildid is only listed once for a group of results?

Is file here the full path?

> We know we want glob patterns over executable file names.  I've seen
> cases where an exact match query produces a different sqlite query
> plan from the glob one, but not sure how much performance difference
> that implies.  Searching for source files by glob/match is removed
> from today's version because it doesn't run fast enough (without a new
> large index).

Aha, that wasn't really clear from the above description.
Maybe the key name "glob" is a bit generic then?
 
> > Is the sqlite "glob" pattern standardized? 
> 
> Yes.

Could you add a reference or description?

I believe globs include the ? (single char) and * (zero or many)
wildcards.  Does it include (negative!) [..] ranges?

> > Why does debuginfod-client.c use json-c? Can't the server send the
> > json object as a normal char string? Why does the string from the
> > server need to be interpreted as a json object and then turned into a
> > string again?
> 
> Use of the library allows robust processing (checking & merging) of
> incoming json data from multiple upstream servers.  Luckily, json-c is
> a small & self-contained library.

aha, I see, the metadata query is different from other queries. It
"merges" the replies from all servers instead of picking the first one
that gives an answer.

Does it really merge the results? It looks like it just adds all
elements to the array whether or not they are there.

Should the result also contain the server URL from which the entry
came?

Cheers,

Mark


* [Bug debuginfod/29472] Support querying the debuginfod-server for metadata
  2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
                   ` (6 preceding siblings ...)
  2023-02-28 22:21 ` [Bug debuginfod/29472] Support querying the debuginfod-server for metadata mark at klomp dot org
@ 2024-06-03 15:27 ` fche at redhat dot com
  7 siblings, 0 replies; 12+ messages in thread
From: fche at redhat dot com @ 2024-06-03 15:27 UTC (permalink / raw)
  To: elfutils-devel

https://sourceware.org/bugzilla/show_bug.cgi?id=29472

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
                 CC|                            |fche at redhat dot com

--- Comment #6 from Frank Ch. Eigler <fche at redhat dot com> ---
merged as:

commit d47d93b1049ecfad2f9bb9db30dc630c3d6131ca (HEAD -> master, origin/main)
gpg: Signature made Mon 03 Jun 2024 11:22:56 AM EDT
gpg:                using RSA key 4DD136490411C0A42B28844F258B6EFA0F209D24
gpg: Good signature from "Frank Ch. Eigler <fche@elastic.org>" [ultimate]
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Oct 31 17:40:01 2022 -0400

    PR29472: debuginfod: add metadata query webapi, C api, client

    This patch extends the debuginfod API with a "metadata query"
    operation.  It allows clients to request an enumeration of file names
    known to debuginfod servers, returning a JSON response including the
    matching buildids.  This lets clients later download debuginfo for a
    range of versions of the same named binaries, in case they need to do
    prospective work (like systemtap-based live-patching).  It also lets
    server operators implement prefetch triggering operations for popular
    but slow debuginfo slivers like kernel vdso.debug files on fedora.

    Implementation requires a modern enough json-c library, namely 0.11,
    which dates from 2014.  Without that, debuginfod client/server bits
    will refuse to build.

    % debuginfod-find metadata file /bin/ls
    % debuginfod-find metadata glob "/usr/local/bin/c*"

    Refactored several functions in debuginfod-client.c, because the
    metadata search logic is different for multiple servers (merge all
    responses instead of first responder wins).

    Documentation and testing are included.

    Signed-off-by: Ryan Goldberg <rgoldber@redhat.com>
    Signed-off-by: Frank Ch. Eigler <fche@redhat.com>

-- 
You are receiving this mail because:
You are on the CC list for the bug.


end of thread, other threads:[~2024-06-03 15:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-11 14:43 [Bug debuginfod/29472] New: Support querying the debuginfod-server for metadata rgoldber at redhat dot com
2022-08-12  9:05 ` [Bug debuginfod/29472] " mliska at suse dot cz
2022-08-12  9:05 ` mliska at suse dot cz
2022-08-22 18:24 ` rgoldber at redhat dot com
2022-08-31 14:52 ` rgoldber at redhat dot com
2022-09-02 17:25 ` rgoldber at redhat dot com
2022-11-01 14:23 ` PATCH: Bug debuginfod/29472 followup Frank Ch. Eigler
2022-11-01 22:20   ` Mark Wielaard
2022-11-10 17:12     ` Frank Ch. Eigler
2023-03-01 23:32       ` Mark Wielaard
2023-02-28 22:21 ` [Bug debuginfod/29472] Support querying the debuginfod-server for metadata mark at klomp dot org
2024-06-03 15:27 ` fche at redhat dot com
