public inbox for elfutils@sourceware.org
From: Mark Wielaard <mark@klomp.org>
To: "Frank Ch. Eigler" <fche@elastic.org>
Cc: elfutils-devel@sourceware.org
Subject: Re: PR29472: debuginfod metadata query
Date: Wed, 5 Jul 2023 01:21:29 +0200	[thread overview]
Message-ID: <20230704232129.GD11693@gnu.wildebeest.org> (raw)
In-Reply-To: <ZDcVNeakb2ze5oZ8@elastic.org>

Hi Frank,

On Wed, Apr 12, 2023 at 04:31:49PM -0400, Frank Ch. Eigler via Elfutils-devel wrote:

> index f13c28d5c6f7..ac3fd6aa862f 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -1,3 +1,26 @@
> +2023-04-12  Ryan Golberg <rgoldberg@redhat.com>, Frank Ch. Eigler <fche@redhat.com>
> +
> +	PR29472: debuginfod metadata query
> +	* Makefile.am: Add json-c usage.
> +	* debuginfod-client.c (debuginfod_find_metadata): New function.
> +	(handle_data): New fields to hold metadata being received.
> +	(debuginfod_clean_cache): Clean metadata too.
> +	(header_callback): Simplify to realloc only.
> +	(metadata_callback): New function.
> +	(init_server_urls, init_handle, perform_queries, make_cache_path):
> +	New refactored functions.
> +	(debuginfod_query_server_by_buildid): Renamed, refactored.  Update
> +	callers.
> +	* debuginfod-find.c (main): Handle metadata queries.
> +	* debuginfod.cxx (DEBUGINFOD_SQLITE_DDL): Add an index or two.
> +	(metadata_maxtime_s, parse_opt): New parameter for load control.
> +	(add_client_federation_headers): New refactored function.
> +	(handle_metadata): New function.
> +	(handler_cb): Call it for /metadata URL.  Trace it.
> +	(groom): Tweak sqlite_ps object lifetimes.
> +	* debuginfod.h.in (debuginfod_find_metadata): New decl.
> +	* libdebuginfod.map: Export it under ELFUTILS_0.190.
> +
> [...]
> +/*
> + * Initialize a CURL handle.  Takes optional callbacks for the write and
> + * header functions which, if defined, use userdata of type struct
> + * handle_data* - specifically data[i] within an array of struct handle_data.
> + * Returns 0 on success or a negative POSIX error code on failure.
> + */
> +int
> +init_handle(debuginfod_client *client,
> +  size_t (*w_callback)(char *buffer,size_t size,size_t nitems,void *userdata),
> +  size_t (*h_callback)(char *buffer,size_t size,size_t nitems,void *userdata),
> +  struct handle_data *data, int i, long timeout,
> +  int vfd)
> +{
> +  data->handle = curl_easy_init();
> +  if (data->handle == NULL)
> +  {
> +    return -ENETUNREACH;
> +  }
> +
> +  if (vfd >= 0)
> +    dprintf (vfd, "url %d %s\n", i, data->url);
> +
> +  /* Only allow http:// + https:// + file:// so we aren't being
> +    redirected to some unsupported protocol.  */
> +#if CURL_AT_LEAST_VERSION(7, 85, 0)
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS_STR, "http,https,file");
> +#else
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS,
> +    (CURLPROTO_HTTP | CURLPROTO_HTTPS | CURLPROTO_FILE));
> +#endif
> +  curl_easy_setopt_ck(data->handle, CURLOPT_URL, data->url);
> +  if (vfd >= 0)
> +    curl_easy_setopt_ck(data->handle, CURLOPT_ERRORBUFFER,
> +      data->errbuf);
> +  if(w_callback) {
> +    curl_easy_setopt_ck(data->handle,
> +      CURLOPT_WRITEFUNCTION, w_callback);
> +    curl_easy_setopt_ck(data->handle, CURLOPT_WRITEDATA, data);
> +  }
> +  if (timeout > 0)
> +  {
> +    /* Make sure there is at least some progress,
> +      try to get at least 100K per timeout seconds.  */
> +    curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_TIME,
> +            timeout);
> +    curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_LIMIT,
> +            100 * 1024L);
> +  }
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FILETIME, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FOLLOWLOCATION, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FAILONERROR, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_NOSIGNAL, (long) 1);
> +  if(h_callback){
> +    curl_easy_setopt_ck(data->handle,
> +      CURLOPT_HEADERFUNCTION, h_callback);
> +    curl_easy_setopt_ck(data->handle, CURLOPT_HEADERDATA, data);
> +  }
> +  #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PATH_AS_IS, (long) 1);
> +  #else
> +  /* On old curl; no big deal, canonicalization here is almost the
> +      same, except perhaps for ? # type decorations at the tail. */
> +  #endif
> +  curl_easy_setopt_ck(data->handle, CURLOPT_AUTOREFERER, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_ACCEPT_ENCODING, "");
> +  curl_easy_setopt_ck(data->handle, CURLOPT_HTTPHEADER, client->headers);
> +
> +  return 0;
> +}

OK, extracted from debuginfod_query_server with parameterized header and
writer callback functions. It also explains why curl_easy_setopt_ck can
just return now: cleanup is done in the caller on failure.
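
For what it's worth, the convention reduces to something like this
stdlib-only sketch (my names, not the patch's): the init helper returns a
negative errno and never cleans up, while the caller unwinds every slot
that was initialized before the failure.

```c
#include <errno.h>
#include <stdlib.h>

/* Illustrative sketch of the "helper returns -errno, caller owns
   cleanup" pattern the refactor relies on. */
struct slot { void *res; };

static int init_slot(struct slot *s, int fail)
{
  if (fail)
    return -ENETUNREACH;        /* nothing was allocated, nothing to undo */
  s->res = malloc(16);
  return s->res != NULL ? 0 : -ENOMEM;
}

static int init_all(struct slot *slots, int n, int fail_at)
{
  int i, rc = 0;
  for (i = 0; i < n; i++)
    if ((rc = init_slot(&slots[i], i == fail_at)) < 0)
      goto out;                 /* like the out2: label in the patch */
  return 0;
 out:
  while (--i >= 0)              /* unwind only the slots that succeeded */
    free(slots[i].res);
  return rc;
}
```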

> +/*
> + * Busy-wait on one or more curl queries to complete.  This can be
> + * controlled via only_one which, if true, exits as soon as the first
> + * winner is found.  If positive, maxtime and maxsize dictate the maximum
> + * allowed wait time and download size respectively.
> + * Returns 0 on success or a negative POSIX error code on failure.
> + */
> +int
> +perform_queries(CURLM *curlm, CURL **target_handle, struct handle_data *data, debuginfod_client *c,
> +  int num_urls, long maxtime, long maxsize, bool only_one, int vfd)
> +{
> +  int still_running = -1;
> +  long loops = 0;
> +  int committed_to = -1;
> +  bool verbose_reported = false;
> +  struct timespec start_time, cur_time;
> +  if (c->winning_headers != NULL)
> +    {
> +      free (c->winning_headers);
> +      c->winning_headers = NULL;
> +    }
> +  if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
> +  {
> +    return errno;
> +  }
> +  long delta = 0;
> +  do
> +  {
> +    /* Check to see how long querying is taking. */
> +    if (maxtime > 0)
> +    {
> +      if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur_time) == -1)
> +      {
> +        return errno;
> +      }
> +      delta = cur_time.tv_sec - start_time.tv_sec;
> +      if ( delta >  maxtime)
> +      {
> +        dprintf(vfd, "Timeout with max time=%lds and transfer time=%lds\n", maxtime, delta );
> +        return -ETIME;
> +      }
> +    }
> +    /* Wait 1 second, the minimum DEBUGINFOD_TIMEOUT.  */
> +    curl_multi_wait(curlm, NULL, 0, 1000, NULL);
> +    CURLMcode curlm_res = curl_multi_perform(curlm, &still_running);
> +
> +    if(only_one){
> +      /* If the target file has been found, abort the other queries.  */
> +      if (target_handle && *target_handle != NULL)
> +      {
> +        for (int i = 0; i < num_urls; i++)
> +          if (data[i].handle != *target_handle)
> +            curl_multi_remove_handle(curlm, data[i].handle);
> +          else
> +          {
> +            committed_to = i;
> +            if (c->winning_headers == NULL)
> +            {
> +              c->winning_headers = data[committed_to].response_data;
> +              if (vfd >= 0 && c->winning_headers != NULL)
> +                dprintf(vfd, "\n%s", c->winning_headers);
> +              data[committed_to].response_data = NULL;
> +              data[committed_to].response_data_size = 0;
> +            }
> +          }
> +      }
> +
> +      if (vfd >= 0 && !verbose_reported && committed_to >= 0)
> +      {
> +        bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
> +        dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
> +          committed_to);
> +        if (pnl)
> +          c->default_progressfn_printed_p = 0;
> +        verbose_reported = true;
> +      }
> +    }
> +
> +    if (curlm_res != CURLM_OK)
> +    {
> +      switch (curlm_res)
> +      {
> +      case CURLM_CALL_MULTI_PERFORM: continue;
> +      case CURLM_OUT_OF_MEMORY: return -ENOMEM;
> +      default: return -ENETUNREACH;
> +      }
> +    }
> +
> +    long dl_size = -1;
> +    if(only_one && target_handle){ // Only bother with progress functions if we're retrieving exactly 1 file
> +      if (*target_handle && (c->progressfn || maxsize > 0))
> +      {
> +        /* Get size of file being downloaded. NB: If going through
> +           deflate-compressing proxies, this number is likely to be
> +           unavailable, so -1 may show. */
> +        CURLcode curl_res;
> +#if CURL_AT_LEAST_VERSION(7, 55, 0)
> +        curl_off_t cl;
> +        curl_res = curl_easy_getinfo(*target_handle,
> +                                      CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
> +                                      &cl);
> +        if (curl_res == CURLE_OK && cl >= 0)
> +          dl_size = (cl > LONG_MAX ? LONG_MAX : (long)cl);
> +#else
> +        double cl;
> +        curl_res = curl_easy_getinfo(*target_handle,
> +                                      CURLINFO_CONTENT_LENGTH_DOWNLOAD,
> +                                      &cl);
> +        if (curl_res == CURLE_OK && cl >= 0)
> +          dl_size = (cl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)cl);
> +#endif
> +        /* If Content-Length is -1, try to get the size from
> +            X-Debuginfod-Size */
> +        if (dl_size == -1 && c->winning_headers != NULL)
> +        {
> +          long xdl;
> +          char *hdr = strcasestr(c->winning_headers, "x-debuginfod-size");
> +          size_t off = strlen("x-debuginfod-size:");
> +
> +          if (hdr != NULL && sscanf(hdr + off, "%ld", &xdl) == 1)
> +            dl_size = xdl;
> +        }
> +      }
> +
> +      if (c->progressfn) /* inform/check progress callback */
> +      {
> +        loops ++;
> +        long pa = loops; /* default param for progress callback */
> +        if (*target_handle) /* we've committed to a server; report its download progress */
> +        {
> +          CURLcode curl_res;
> +#if CURL_AT_LEAST_VERSION(7, 55, 0)
> +          curl_off_t dl;
> +          curl_res = curl_easy_getinfo(*target_handle,
> +                                        CURLINFO_SIZE_DOWNLOAD_T,
> +                                        &dl);
> +          if (curl_res == 0 && dl >= 0)
> +            pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
> +#else
> +          double dl;
> +          curl_res = curl_easy_getinfo(*target_handle,
> +                                        CURLINFO_SIZE_DOWNLOAD,
> +                                        &dl);
> +          if (curl_res == 0)
> +            pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
> +#endif
> +
> +        }
> +
> +        if ((*c->progressfn) (c, pa, dl_size == -1 ? 0 : dl_size))
> +          break;
> +      }
> +    }
> +    /* Check to see if we are downloading something which exceeds maxsize, if set.*/
> +    if (target_handle && *target_handle && dl_size > maxsize && maxsize > 0)
> +    {
> +      if (vfd >=0)
> +        dprintf(vfd, "Content-Length too large.\n");
> +      return -EFBIG;
> +    }
> +  } while (still_running);
> +  return 0;
> +}

I find the indentation really confusing; it isn't really GNU style,
which makes it hard to see the blocks (and they are pretty large).

OK, another part of debuginfod_query_server extracted. Now with a new
only_one boolean.  I still don't have a good grasp of the metadata
queries to fully understand why you might want replies from multiple
servers.

It is somewhat unfortunate that if only_one is false no progress is
reported at all. It would be nice to keep calling progressfn if any
target made progress.

Is target_handle ever set when only_one is false? If it isn't set then
maxsize is never checked.

In a lot of cases the user will have just one DEBUGINFOD_URLS
server. It might be nice if things then just worked as before.

The multi-server options confuse me a little :)

> +
> +/* Helper function to create client cache directory.
> +   $XDG_CACHE_HOME takes priority over $HOME/.cache.
> +   $DEBUGINFOD_CACHE_PATH takes priority over $HOME/.cache and $XDG_CACHE_HOME.
> +
> +   Return resulting path name or NULL on error.  Caller must free resulting string.
> + */
> +static char *
> +make_cache_path(void)
> +{
> +  char* cache_path = NULL;
> +  int rc = 0;
> +  /* Determine location of the cache. The path specified by the debuginfod
> +     cache environment variable takes priority.  */
> +  char *cache_var = getenv(DEBUGINFOD_CACHE_PATH_ENV_VAR);
> +  if (cache_var != NULL && strlen (cache_var) > 0)
> +    xalloc_str (cache_path, "%s", cache_var);
> +  else
> +    {
> +      /* If a cache already exists in $HOME ('/' if $HOME isn't set), then use
> +         that. Otherwise use the XDG cache directory naming format.  */
> +      xalloc_str (cache_path, "%s/%s", getenv ("HOME") ?: "/", cache_default_name);
> +
> +      struct stat st;
> +      if (stat (cache_path, &st) < 0)
> +        {
> +          char cachedir[PATH_MAX];
> +          char *xdg = getenv ("XDG_CACHE_HOME");
> +
> +          if (xdg != NULL && strlen (xdg) > 0)
> +            snprintf (cachedir, PATH_MAX, "%s", xdg);
> +          else
> +            snprintf (cachedir, PATH_MAX, "%s/.cache", getenv ("HOME") ?: "/");
> +
> +          /* Create XDG cache directory if it doesn't exist.  */
> +          if (stat (cachedir, &st) == 0)
> +            {
> +              if (! S_ISDIR (st.st_mode))
> +                {
> +                  rc = -EEXIST;
> +                  goto out1;
> +                }
> +            }
> +          else
> +            {
> +              rc = mkdir (cachedir, 0700);
> +
> +              /* Also check for EEXIST and S_ISDIR in case another client just
> +                 happened to create the cache.  */
> +              if (rc < 0
> +                  && (errno != EEXIST
> +                      || stat (cachedir, &st) != 0
> +                      || ! S_ISDIR (st.st_mode)))
> +                {
> +                  rc = -errno;
> +                  goto out1;
> +                }
> +            }
> +
> +          free (cache_path);
> +          xalloc_str (cache_path, "%s/%s", cachedir, cache_xdg_name);
> +        }
> +    }
> +
> +  goto out;
> +  
> + out1:
> +  (void) rc;
> +  free (cache_path);
> +  cache_path = NULL;
> +
> + out:
> +  if (cache_path != NULL)
> +    (void) mkdir (cache_path, 0700);
> +  return cache_path;
> +}


OK, another extraction from debuginfod_query_server. No changes I can
see, except for the error path goto and mkdir at the end. Shouldn't
that also check for errors (that aren't EEXIST)?
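
Something like this EEXIST-tolerant check, mirroring what the function
already does for the XDG directory, is what I had in mind (just a
sketch, my name for it):

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Sketch of an error-checked final mkdir: tolerate EEXIST (another
   client may have just created the cache) as long as the existing path
   really is a directory, and report anything else as a negative errno. */
static int mkdir_checked(const char *path)
{
  if (mkdir(path, 0700) == 0)
    return 0;
  if (errno == EEXIST)
    {
      struct stat st;
      if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
        return 0;
      return -EEXIST;
    }
  return -errno;
}
```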

>  /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
>     with the specified build-id and type (debuginfo, executable, source or
>     section).  If type is source, then type_arg should be a filename.  If
> @@ -849,7 +1271,7 @@ cache_find_section (const char *scn_name, const char *target_cache_dir,
>     for the target if successful, otherwise return an error code.
>  */
>  static int
> -debuginfod_query_server (debuginfod_client *c,
> +debuginfod_query_server_by_buildid (debuginfod_client *c,
>  			 const unsigned char *build_id,
>                           int build_id_len,
>                           const char *type,
> @@ -870,7 +1292,7 @@ debuginfod_query_server (debuginfod_client *c,
>    char suffix[PATH_MAX + 1]; /* +1 for zero terminator.  */
>    char build_id_bytes[MAX_BUILD_ID_BYTES * 2 + 1];
>    int vfd = c->verbose_fd;
> -  int rc;
> +  int rc, r;
>  
>    c->progressfn_cancel = false;
>  
> @@ -995,70 +1417,22 @@ debuginfod_query_server (debuginfod_client *c,
>      dprintf (vfd, "suffix %s\n", suffix);
>  
>    /* set paths needed to perform the query
> -
> -     example format
> +     example format:
>       cache_path:        $HOME/.cache
>       target_cache_dir:  $HOME/.cache/0123abcd
>       target_cache_path: $HOME/.cache/0123abcd/debuginfo
>       target_cache_path: $HOME/.cache/0123abcd/source#PATH#TO#SOURCE ?
> -
> -     $XDG_CACHE_HOME takes priority over $HOME/.cache.
> -     $DEBUGINFOD_CACHE_PATH takes priority over $HOME/.cache and $XDG_CACHE_HOME.
>    */
>  
> -  /* Determine location of the cache. The path specified by the debuginfod
> -     cache environment variable takes priority.  */
> -  char *cache_var = getenv(DEBUGINFOD_CACHE_PATH_ENV_VAR);
> -  if (cache_var != NULL && strlen (cache_var) > 0)
> -    xalloc_str (cache_path, "%s", cache_var);
> -  else
> +  cache_path = make_cache_path();
> +  if (!cache_path)
>      {
> -      /* If a cache already exists in $HOME ('/' if $HOME isn't set), then use
> -         that. Otherwise use the XDG cache directory naming format.  */
> -      xalloc_str (cache_path, "%s/%s", getenv ("HOME") ?: "/", cache_default_name);
> -
> -      struct stat st;
> -      if (stat (cache_path, &st) < 0)
> -        {
> -          char cachedir[PATH_MAX];
> -          char *xdg = getenv ("XDG_CACHE_HOME");
> -
> -          if (xdg != NULL && strlen (xdg) > 0)
> -            snprintf (cachedir, PATH_MAX, "%s", xdg);
> -          else
> -            snprintf (cachedir, PATH_MAX, "%s/.cache", getenv ("HOME") ?: "/");
> -
> -          /* Create XDG cache directory if it doesn't exist.  */
> -          if (stat (cachedir, &st) == 0)
> -            {
> -              if (! S_ISDIR (st.st_mode))
> -                {
> -                  rc = -EEXIST;
> -                  goto out;
> -                }
> -            }
> -          else
> -            {
> -              rc = mkdir (cachedir, 0700);
> -
> -              /* Also check for EEXIST and S_ISDIR in case another client just
> -                 happened to create the cache.  */
> -              if (rc < 0
> -                  && (errno != EEXIST
> -                      || stat (cachedir, &st) != 0
> -                      || ! S_ISDIR (st.st_mode)))
> -                {
> -                  rc = -errno;
> +      rc = -ENOMEM;
>        goto out;
>      }
> -            }
> -
> -          free (cache_path);
> -          xalloc_str (cache_path, "%s/%s", cachedir, cache_xdg_name);
> -        }
> -    }
> -
>    xalloc_str (target_cache_dir, "%s/%s", cache_path, build_id_bytes);
> +  (void) mkdir (target_cache_dir, 0700); // failures with this mkdir would be caught later too
> +

If the same reason holds for make_cache_path then maybe add a similar
comment there.

>    if (section != NULL)
>      xalloc_str (target_cache_path, "%s/%s-%s", target_cache_dir, type, suffix);
>    else
> @@ -1193,60 +1567,14 @@ debuginfod_query_server (debuginfod_client *c,
>        goto out0;
>      }
>  
> -  /* Initialize the memory to zero */
> -  char *strtok_saveptr;
>    char **server_url_list = NULL;
> -  char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
> -  /* Count number of URLs.  */
> -  int num_urls = 0;
> -
> -  while (server_url != NULL)
> -    {
> -      /* PR 27983: If the url is already set to be used use, skip it */
> -      char *slashbuildid;
> -      if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
> -        slashbuildid = "buildid";
> -      else
> -        slashbuildid = "/buildid";
> -
> -      char *tmp_url;
> -      if (asprintf(&tmp_url, "%s%s", server_url, slashbuildid) == -1)
> -        {
> -          rc = -ENOMEM;
> -          goto out1;
> -        }
> -      int url_index;
> -      for (url_index = 0; url_index < num_urls; ++url_index)
> -        {
> -          if(strcmp(tmp_url, server_url_list[url_index]) == 0)
> -            {
> -              url_index = -1;
> -              break;
> -            }
> -        }
> -      if (url_index == -1)
> -        {
> -          if (vfd >= 0)
> -            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
> -          free(tmp_url);
> -        }
> -      else
> -        {
> -          num_urls++;
> -          char ** realloc_ptr;
> -          realloc_ptr = reallocarray(server_url_list, num_urls,
> -                                         sizeof(char*));
> -          if (realloc_ptr == NULL)
> -            {
> -              free (tmp_url);
> -              rc = -ENOMEM;
> +  char *server_url;
> +  int num_urls;
> +  r = init_server_urls("buildid", server_urls, &server_url_list, &num_urls, vfd);
> +  if(0 != r){
> +    rc = r;
>      goto out1;
>    }
> -          server_url_list = realloc_ptr;
> -          server_url_list[num_urls-1] = tmp_url;
> -        }
> -      server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
> -    }
>  
>    int retry_limit = default_retry_limit;
>    const char* retry_limit_envvar = getenv(DEBUGINFOD_RETRY_LIMIT_ENV_VAR);
> @@ -1318,13 +1646,6 @@ debuginfod_query_server (debuginfod_client *c,
>  
>        data[i].fd = fd;
>        data[i].target_handle = &target_handle;
> -      data[i].handle = curl_easy_init();
> -      if (data[i].handle == NULL)
> -        {
> -          if (filename) curl_free (escaped_string);
> -          rc = -ENETUNREACH;
> -          goto out2;
> -        }
>        data[i].client = c;
>  
>        if (filename) /* must start with / */
> @@ -1338,226 +1659,29 @@ debuginfod_query_server (debuginfod_client *c,
>  		 build_id_bytes, type, section);
>        else
>          snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
> [...lots of removed lines...]
>  
> +      r = init_handle(c, debuginfod_write_callback, header_callback, &data[i], i, timeout, vfd);
> +      if(0 != r){
> +        rc = r;
> +        if(filename) curl_free (escaped_string);
> +        goto out2;
>        }

OK, error handling seems similar.

> -
> -          if ((*c->progressfn) (c, pa, dl_size == -1 ? 0 : dl_size))
> -	    {
> -	      c->progressfn_cancel = true;
> -              break;
> -	    }
> +
> +      curl_multi_add_handle(curlm, data[i].handle);
>      }
>  
> -      /* Check to see if we are downloading something which exceeds maxsize, if set.*/
> -      if (target_handle && dl_size > maxsize && maxsize > 0)
> -        {
> +  if (filename) curl_free(escaped_string);
> +
> +  /* Query servers in parallel.  */
>    if (vfd >= 0)
> -            dprintf(vfd, "Content-Length too large.\n");
> -          rc = -EFBIG;
> +    dprintf (vfd, "query %d urls in parallel\n", num_urls);
> +
> +  r = perform_queries(curlm, &target_handle,data,c, num_urls, maxtime, maxsize, true,  vfd);
> +  if (0 != r)
> +    {
> +      rc = r;
>        goto out2;
>      }
> -    } while (still_running);
>  
>    /* Check whether a query was successful. If so, assign its handle
>       to verified_handle.  */
> @@ -1709,6 +1833,7 @@ debuginfod_query_server (debuginfod_client *c,
>                curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
>                curl_easy_cleanup (data[i].handle);
>                free(data[i].response_data);
> +              data[i].response_data = NULL;
>              }
>              free(c->winning_headers);
>              c->winning_headers = NULL;

OK, rewritten with the new extracted helper functions.

> @@ -1918,7 +2043,7 @@ debuginfod_find_debuginfo (debuginfod_client *client,
>  			   const unsigned char *build_id, int build_id_len,
>                             char **path)
>  {
> -  return debuginfod_query_server(client, build_id, build_id_len,
> +  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
>                                   "debuginfo", NULL, path);
>  }
>  
> @@ -1929,7 +2054,7 @@ debuginfod_find_executable(debuginfod_client *client,
>  			   const unsigned char *build_id, int build_id_len,
>                             char **path)
>  {
> -  return debuginfod_query_server(client, build_id, build_id_len,
> +  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
>                                   "executable", NULL, path);
>  }
>  
> @@ -1938,7 +2063,7 @@ int debuginfod_find_source(debuginfod_client *client,
>  			   const unsigned char *build_id, int build_id_len,
>                             const char *filename, char **path)
>  {
> -  return debuginfod_query_server(client, build_id, build_id_len,
> +  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
>                                   "source", filename, path);
>  }
>  
> @@ -1947,7 +2072,7 @@ debuginfod_find_section (debuginfod_client *client,
>  			 const unsigned char *build_id, int build_id_len,
>  			 const char *section, char **path)
>  {
> -  int rc = debuginfod_query_server(client, build_id, build_id_len,
> +  int rc = debuginfod_query_server_by_buildid(client, build_id, build_id_len,
>                                                "section", section, path);
>    if (rc != -EINVAL)
>      return rc;
> @@ -1997,6 +2122,364 @@ debuginfod_find_section (debuginfod_client *client,
>    return rc;
>  }

OK, calling debuginfod_query_server_by_buildid instead of
debuginfod_query_server now.

> +
> +int debuginfod_find_metadata (debuginfod_client *client,
> +                              const char* key, const char* value, char **path)
> +{
> +  (void) client;
> +  (void) key;
> +  (void) value;
> +  (void) path;
> +  
> +#ifdef HAVE_JSON_C
> +  char *server_urls = NULL;
> +  char *urls_envvar = NULL;
> +  char *cache_path = NULL;
> +  char *target_cache_dir = NULL;
> +  char *target_cache_path = NULL;
> +  char *target_cache_tmppath = NULL;
> +  char *key_and_value = NULL;
> +  int rc = 0, r;
> +  int vfd = client->verbose_fd;
> +  struct handle_data *data = NULL;
> +  
> +  json_object *json_metadata = json_object_new_array();
> +  if(NULL == json_metadata) {
> +    rc = -ENOMEM;
> +    goto out;
> +  }
> +
> +  if(NULL == value || NULL == key){
> +    rc = -EINVAL;
> +    goto out;
> +  }
> +
> +  if (vfd >= 0)
> +    dprintf (vfd, "debuginfod_find_metadata %s %s\n", key, value);
> +
> +  /* Without query-able URL, we can stop here*/
> +  urls_envvar = getenv(DEBUGINFOD_URLS_ENV_VAR);
> +  if (vfd >= 0)
> +    dprintf (vfd, "server urls \"%s\"\n",
> +      urls_envvar != NULL ? urls_envvar : "");
> +  if (urls_envvar == NULL || urls_envvar[0] == '\0')
> +  {
> +    rc = -ENOSYS;
> +    goto out;
> +  }
> +
> +  /* set paths needed to perform the query
> +     example format:
> +     cache_path:        $HOME/.cache
> +     target_cache_dir:  $HOME/.cache/metadata
> +     target_cache_path: $HOME/.cache/metadata/key=ENCODED&value=ENCODED
> +     target_cache_path: $HOME/.cache/metadata/key=ENCODED&value=ENCODED.XXXXXX     
> +  */
> +
> +  // libcurl > 7.62ish has curl_url_set()/etc. to construct these things more properly.

How is it more proper than what we do below?
Should we just use curl_url_set if we have it?

> +  // curl_easy_escape() is older
> +  {
> +    CURL *c = curl_easy_init();
> +    if (!c) {
> +      rc = -ENOMEM;
> +      goto out;
> +    }
> +    char *key_escaped = curl_easy_escape(c, key, 0);
> +    char *value_escaped = curl_easy_escape(c, value, 0);
> +    
> +    // fallback to unescaped values in unlikely case of error
> +    xalloc_str (key_and_value, "key=%s&value=%s", key_escaped ?: key, value_escaped ?: value);
> +    curl_free(value_escaped);
> +    curl_free(key_escaped);
> +    curl_easy_cleanup(c);
> +  }
> +
> +  /* Check if we have a recent result already in the cache. */
> +  cache_path = make_cache_path();
> +  if (! cache_path)
> +    goto out;
> +  xalloc_str (target_cache_dir, "%s/metadata", cache_path);
> +  (void) mkdir (target_cache_dir, 0700);
> +  xalloc_str (target_cache_path, "%s/%s", target_cache_dir, key_and_value);
> +  xalloc_str (target_cache_tmppath, "%s/%s.XXXXXX", target_cache_dir, key_and_value);  

Extra whitespace at end of line.
Also this function is a mix of indentation styles.
Could it just use GNU style throughout the whole function?

> +
> +  int fd = open(target_cache_path, O_RDONLY);
> +  if (fd >= 0)
> +    {
> +      struct stat st;
> +      int metadata_retention = 0;
> +      time_t now = time(NULL);
> +      char *metadata_retention_path = NULL;
> +
> +      xalloc_str (metadata_retention_path, "%s/%s", cache_path, metadata_retention_filename);
> +      if (metadata_retention_path)
> +        {
> +          rc = debuginfod_config_cache(metadata_retention_path,
> +                                       metadata_retention_default_s, &st);
> +          free (metadata_retention_path);
> +          if (rc < 0)
> +            rc = 0;
> +        }
> +      else
> +        rc = 0;
> +      metadata_retention = rc;
> +
> +      if (fstat(fd, &st) != 0)
> +        {
> +          rc = -errno;
> +          close (fd);
> +          goto out;
> +        }
> +
> +      if (metadata_retention > 0 && (now - st.st_mtime <= metadata_retention))
> +        {
> +          if (client && client->verbose_fd >= 0)
> +            dprintf (client->verbose_fd, "cached metadata %s", key_and_value);
> +
> +          if (path != NULL)
> +            {
> +              *path = target_cache_path; // pass over the pointer
> +              target_cache_path = NULL; // prevent free() in our own cleanup
> +            }
> +
> +          /* Success!!!! */
> +          rc = fd;
> +          goto out;
> +        }
> +
> +      /* We don't have to clear the likely-expired cached object here
> +         by unlinking.  We will shortly make a new request and save
> +         results right on top.  Erasing here could trigger a TOCTOU
> +         race with another thread just finishing a query and passing
> +         its results back.
> +      */
> +      // (void) unlink (target_cache_path);

This seems to be a different choice from what is done in
debuginfod_query_server_by_buildid. Why?

> +      close (fd);
> +    }
> +
> +  /* No valid cached metadata found: time to make the queries. */
> +
> +  /* Clear the client of previous urls*/
> +  free (client->url);
> +  client->url = NULL;
> +
> +  long maxtime = 0;
> +  const char *maxtime_envvar;
> +  maxtime_envvar = getenv(DEBUGINFOD_MAXTIME_ENV_VAR);
> +  if (maxtime_envvar != NULL)
> +    maxtime = atol (maxtime_envvar);
> +  if (maxtime && vfd >= 0)
> +    dprintf(vfd, "using max time %lds\n", maxtime);
> +
> +  long timeout = default_timeout;
> +  const char* timeout_envvar = getenv(DEBUGINFOD_TIMEOUT_ENV_VAR);
> +  if (timeout_envvar != NULL)
> +    timeout = atoi (timeout_envvar);
> +  if (vfd >= 0)
> +    dprintf (vfd, "using timeout %ld\n", timeout);
> +
> +  add_default_headers(client);
> +
> +  /* make a copy of the envvar so it can be safely modified.  */
> +  server_urls = strdup(urls_envvar);
> +  if (server_urls == NULL)
> +  {
> +    rc = -ENOMEM;
> +    goto out;
> +  }
> +
> +  /* thereafter, goto out1 on error*/
> +
> +  char **server_url_list = NULL;
> +  char *server_url;
> +  int num_urls = 0;
> +  r = init_server_urls("metadata", server_urls, &server_url_list, &num_urls, vfd);
> +  if(0 != r){
> +    rc = r;
> +    goto out1;
> +  }
> +
> +  CURLM *curlm = client->server_mhandle;
> +  assert (curlm != NULL);
> +
> +  CURL *target_handle = NULL;
> +  data = malloc(sizeof(struct handle_data) * num_urls);
> +  if (data == NULL)
> +  {
> +    rc = -ENOMEM;
> +    goto out1;
> +  }
> +
> +  /* thereafter, goto out2 on error.  */
> +
> +  /* Initialize handle_data  */
> +  for (int i = 0; i < num_urls; i++)
> +  {
> +    if ((server_url = server_url_list[i]) == NULL)
> +      break;
> +    if (vfd >= 0)
> +      dprintf (vfd, "init server %d %s\n", i, server_url);
> +
> +    data[i].errbuf[0] = '\0';
> +    data[i].target_handle = &target_handle;
> +    data[i].client = client;
> +    data[i].metadata = NULL;
> +    data[i].metadata_size = 0;
> +    data[i].response_data = NULL;
> +    data[i].response_data_size = 0;
> +      
> +    snprintf(data[i].url, PATH_MAX, "%s?%s", server_url, key_and_value);
> +    
> +    r = init_handle(client, metadata_callback, header_callback, &data[i], i, timeout, vfd);
> +    if(0 != r){
> +      rc = r;
> +      goto out2;
> +    }
> +    curl_multi_add_handle(curlm, data[i].handle);
> +  }
> +
> +  /* Query servers */
> +  if (vfd >= 0)
> +      dprintf (vfd, "Starting %d queries\n",num_urls);
> +  r = perform_queries(curlm, NULL, data, client, num_urls, maxtime, 0, false, vfd);
> +  if (0 != r) {
> +    rc = r;
> +    goto out2;
> +  }
> +
> +  /* NOTE: We don't check the return codes of the curl messages, since
> +     a metadata query failing silently is just fine.  We want to know what is
> +     available from servers that can be reached without issues.  When running
> +     with additional verbosity, the failure will be noted on stderr.  */
> +
> +  /* Building the new json array from all the upstream data and
> +     cleanup while at it.
> +   */
> +  for (int i = 0; i < num_urls; i++)
> +  {
> +    curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
> +    curl_easy_cleanup (data[i].handle);
> +    free (data[i].response_data);
> +
> +    if (NULL == data[i].metadata)
> +    {
> +      if (vfd >= 0)
> +        dprintf (vfd, "Query to %s failed with error message:\n\t\"%s\"\n",
> +          data[i].url, data[i].errbuf);
> +      continue;
> +    }
> +
> +    json_object *upstream_metadata = json_tokener_parse(data[i].metadata);
> +    if(NULL == upstream_metadata) continue;
> +    // Combine the upstream metadata into the json array
> +    for (int j = 0, n = json_object_array_length(upstream_metadata); j < n; j++) {
> +        json_object *entry = json_object_array_get_idx(upstream_metadata, j);
> +        json_object_get(entry); // increment reference count
> +        json_object_array_add(json_metadata, entry);
> +    }
> +    json_object_put(upstream_metadata);
> +
> +    free (data[i].metadata);
> +  }

Do we really need libjson-c functions to combine the results?  Can't
we simply return multiple json arrays/objects if we get results from
multiple servers?

> +  /* Because of race with cache cleanup / rmdir, try to mkdir/mkstemp up to twice. */
> +  for (int i=0; i<2; i++) {
> +    /* (re)create target directory in cache */
> +    (void) mkdir(target_cache_dir, 0700); /* files will be 0400 later */
> +
> +    /* NB: write to a temporary file first, to avoid race condition of
> +       multiple clients checking the cache, while a partially-written or empty
> +       file is in there, being written from libcurl. */
> +    fd = mkstemp (target_cache_tmppath);
> +    if (fd >= 0) break;
> +  }
> +  if (fd < 0) /* Still failed after two iterations. */
> +    {
> +      rc = -errno;
> +      goto out1;
> +    }
> +
> +  /* Plop the complete json_metadata object into the cache. */
> +  const char* json_string = json_object_to_json_string_ext(json_metadata, JSON_C_TO_STRING_PRETTY);
> +  if (json_string == NULL)
> +    {
> +      rc = -ENOMEM;
> +      goto out1;
> +    }
> +  ssize_t res = write_retry (fd, json_string, strlen(json_string));
> +  (void) lseek(fd, 0, SEEK_SET); // rewind file so client can read it from the top
> +  
> +  /* NB: json_string is auto deleted when json_metadata object is nuked */
> +  if (res < 0 || (size_t) res != strlen(json_string))
> +    {
> +      rc = -EIO;
> +      goto out1;
> +    }
> +  /* PR27571: make cache files casually unwriteable; dirs are already 0700 */
> +  (void) fchmod(fd, 0400);
> +
> +  /* rename tmp->real */
> +  rc = rename (target_cache_tmppath, target_cache_path);
> +  if (rc < 0)
> +    {
> +      rc = -errno;
> +      goto out1;
> +      /* Perhaps we need not give up right away; could retry or something ... */
> +    }
> +  
> +  /* don't close fd - we're returning it */
> +  /* don't unlink the tmppath; it's already been renamed. */
> +  if (path != NULL)
> +   *path = strdup(target_cache_path);
> +
> +  rc = fd;
> +  goto out1;
> +
> +/* error exits */
> +out2:
> +  /* remove all handles from multi */
> +  for (int i = 0; i < num_urls; i++)
> +  {
> +    if (data[i].handle != NULL)
> +    {
> +      curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
> +      curl_easy_cleanup (data[i].handle);
> +      free (data[i].response_data);
> +      free (data[i].metadata);
> +    }
> +  }
> +
> +out1:
> +  free(data);
> +                              
> +  for (int i = 0; i < num_urls; ++i)
> +    free(server_url_list[i]);
> +  free(server_url_list);
> +
> +out:
> +  free (server_urls);
> +  json_object_put(json_metadata);
> +  /* Reset sent headers */
> +  curl_slist_free_all (client->headers);
> +  client->headers = NULL;
> +  client->user_agent_set_p = 0;
> +
> +  free (target_cache_dir);
> +  free (target_cache_path);
> +  free (target_cache_tmppath);
> +  free (key_and_value);
> +  free (cache_path);
> +    
> +  return rc;
> +
> +#else /* ! HAVE_JSON_C */
> +  return -ENOSYS;
> +#endif
> +}

OK. That is it for debuginfod-client.c.
Will go over the rest tomorrow.

Cheers,

Mark

Thread overview: 7+ messages
2023-04-12 20:31 Frank Ch. Eigler
2023-07-04 17:00 ` Mark Wielaard
2023-07-04 23:21 ` Mark Wielaard [this message]
2023-07-05 12:34 ` Mark Wielaard
2023-07-05 13:45 ` Mark Wielaard
2023-07-07 13:59 ` Mark Wielaard
2023-07-07 14:26 ` Mark Wielaard
