public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
* [Bug debuginfod/27277] Describe retrieved files when verbose
@ 2021-08-04 18:54 Noah Sanci
  2021-08-05 15:13 ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Noah Sanci @ 2021-08-04 18:54 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 262 bytes --]

Hello,

The attached patch dumps the http headers retrieved from the
debuginfod server which a file was downloaded from. Some custom
headers are now returned such as X-FILE, X-FILE-SIZE, and X-ARCHIVE to
give some context about the downloaded files.

Noah Sanci

[-- Attachment #2: 0001-debuginfod-PR27277-Describe-retrieved-files-when-ver.patch --]
[-- Type: text/x-patch, Size: 13772 bytes --]

From dcbe8672d6be30f92ad8baa2fa157b9b7a551b48 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Wed, 28 Jul 2021 14:46:05 -0400
Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when verbose

There appear to exist use cases that intend to simply check for the
existence of content in a debuginfod server, without actually
downloading it.  In HTTP land, the HEAD operation is the natural
expression of this.  We could support this in the webapi, and give
options to debuginfod-find and the client API to use it.
Instead of implementing a describe option, allow users, with enough
verbosity, to get a file description printed to the verbose output
stream upon retrieving a file. The HEAD operation is not supported in
this patch.

E.g output:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 2428240
Cache-Control: public
Last-Modified: Sat, 15 May 2021 20:49:51 GMT
X-FILE-SIZE: 656906321
X-FILE: /usr/lib/debug/lib/modules/5.11.21-200.fc33.x86_64/kernel/drivers/scsi/bnx2fc/bnx2fc.ko.debug
X-ARCHIVE: kernel-debuginfo-5.11.21-200.fc33.x86_64.rpm
Content-Type: application/octet-stream
Date: Tue, 03 Aug 2021 18:50:36 GMT

https://sourceware.org/bugzilla/show_bug.cgi?id=27277

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog           | 16 ++++++++++
 debuginfod/debuginfod-client.c | 58 ++++++++++++++++++++++++++++++++--
 debuginfod/debuginfod-find.c   |  3 ++
 debuginfod/debuginfod.cxx      | 11 +++++++
 debuginfod/debuginfod.h.in     |  3 ++
 debuginfod/libdebuginfod.map   |  3 ++
 doc/ChangeLog                  |  6 ++++
 doc/debuginfod-find.1          |  3 +-
 tests/ChangeLog                |  6 ++++
 tests/run-debuginfod-find.sh   | 19 ++++++++++-
 10 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 9e82d78d..4297b088 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,19 @@
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-client.c (debuginfod_client): New field winning_headers.
+	(handle_data): New field response_data.
+	(header_callback): Store received headers in response_data.
+	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
+	Save winning response_data.
+	(debuginfod_get_response_headers): Return the winning headers.
+	* debuginfod.h.in: Declare new API function.
+	* libdebuginfod.map: Export it.
+	* debuginfod-find.c: Call it in verbose mode.
+	* debuginfod.cxx (handle_buildid_f_match): Add X-FILE and X-FILE-SIZE
+	http headers to response.
+	(handle_buildid_r_match): Add X-FILE, X-FILE-SIZE and X-ARCHIVE headers.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 7d4b220f..5c2a646c 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -127,6 +127,7 @@ struct debuginfod_client
      timeout or other info gotten from environment variables, the
      handle data, etc. So those don't have to be reparsed and
      recreated on each request.  */
+  char * winning_headers;
 };
 
 /* The cache_clean_interval_s file within the debuginfod cache specifies
@@ -183,6 +184,8 @@ struct handle_data
      to the cache. Used to ensure that a file is not downloaded from
      multiple servers unnecessarily.  */
   CURL **target_handle;
+  /* Response http headers for this client handle, sent from the server */
+  char *response_data;
 };
 
 static size_t
@@ -499,6 +502,33 @@ default_progressfn (debuginfod_client *c, long a, long b)
 }
 
 
+static size_t
+header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
+{
+  if (size != 1)
+    return 0;
+  /* Temporary buffer for realloc */
+  char *temp = NULL;
+  size_t userlen = 0;
+  if (*(char**)userdata == NULL)
+    {
+      temp = malloc(numitems+1);
+      if (temp == NULL)
+        return 0;
+      memset(temp, '\0', numitems+1);
+    }
+  else
+    {
+      userlen = strlen(*(char**)userdata);
+      temp = realloc(*(char**)userdata, userlen + numitems + 1);
+      if (temp == NULL)
+       return 0;
+    }
+  strncat(temp, buffer, numitems);
+  *(char**)userdata = temp;
+  return numitems;
+}
+
 /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
    with the specified build-id, type (debuginfo, executable or source)
    and filename. filename may be NULL. If found, return a file
@@ -936,10 +966,13 @@ debuginfod_query_server (debuginfod_client *c,
 	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
 			    100 * 1024L);
 	}
+      data[i].response_data = NULL;
       curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i].response_data));
 #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
       curl_easy_setopt(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
 #else
@@ -1161,10 +1194,10 @@ debuginfod_query_server (debuginfod_client *c,
                 {
                   char *effective_url = NULL;
                   long resp_code = 500;
-                  CURLcode ok1 = curl_easy_getinfo (target_handle,
+                  CURLcode ok1 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_EFFECTIVE_URL,
 						    &effective_url);
-                  CURLcode ok2 = curl_easy_getinfo (target_handle,
+                  CURLcode ok2 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_RESPONSE_CODE,
 						    &resp_code);
                   if(ok1 == CURLE_OK && ok2 == CURLE_OK && effective_url)
@@ -1238,6 +1271,7 @@ debuginfod_query_server (debuginfod_client *c,
             {
               curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
               curl_easy_cleanup (data[i].handle);
+              free(data[i].response_data);
             }
 	    goto query_in_parallel;
 	}
@@ -1264,6 +1298,16 @@ debuginfod_query_server (debuginfod_client *c,
   tvs[0].tv_usec = tvs[1].tv_usec = 0;
   (void) futimes (fd, tvs);  /* best effort */
 
+  c->winning_headers = NULL;
+  for (int i = 0; i < num_urls; ++i)
+      if (data[i].handle == verified_handle && data[i].response_data != NULL)
+        {
+          c->winning_headers = data[i].response_data;
+          if (vfd >= 0)
+            dprintf(vfd, "\n%s", c->winning_headers);
+          data[i].response_data = NULL;
+        }
+
   /* PR27571: make cache files casually unwriteable; dirs are already 0700 */
   (void) fchmod(fd, 0400);
                 
@@ -1281,6 +1325,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   for (int i = 0; i < num_urls; ++i)
@@ -1304,6 +1349,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   unlink (target_cache_tmppath);
@@ -1415,6 +1461,7 @@ debuginfod_end (debuginfod_client *client)
 
   curl_multi_cleanup (client->server_mhandle);
   curl_slist_free_all (client->headers);
+  free (client->winning_headers);
   free (client->url);
   free (client);
 }
@@ -1483,6 +1530,13 @@ debuginfod_set_progressfn(debuginfod_client *client,
   client->progressfn = fn;
 }
 
+/* Return the http response headers that came from the winning server */
+const char *
+debuginfod_get_response_headers(debuginfod_client *c)
+{
+  return c->winning_headers;
+}
+
 void
 debuginfod_set_verbose_fd(debuginfod_client *client, int fd)
 {
diff --git a/debuginfod/debuginfod-find.c b/debuginfod/debuginfod-find.c
index 3e8ab203..1ed76ffc 100644
--- a/debuginfod/debuginfod-find.c
+++ b/debuginfod/debuginfod-find.c
@@ -218,6 +218,9 @@ main(int argc, char** argv)
       const char* url = debuginfod_get_url (client);
       if (url != NULL)
         fprintf(stderr, "Downloaded from %s\n", url);
+      const char *headers = debuginfod_get_response_headers (client);
+      if (headers != NULL)
+        fprintf(stderr, "Headers:\n%s\n", headers);
     }
 
   debuginfod_end (client);
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 4ddd9255..cc653a3a 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1063,6 +1063,8 @@ handle_buildid_f_match (bool internal_req_t,
   else
     {
       MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+      (void) MHD_add_response_header(r, "X-FILE-SIZE", std::to_string(s.st_size).c_str());
+      (void) MHD_add_response_header(r, "X-FILE", b_source0.c_str());
       add_mhd_last_modified (r, s.st_mtime);
       if (verbose > 1)
         obatched(clog) << "serving file " << b_source0 << endl;
@@ -1673,6 +1675,15 @@ handle_buildid_r_match (bool internal_req_p,
       else
         {
           MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+          size_t place = b_source0.find_last_of("/");
+
+          if (place == string::npos)
+            (void) MHD_add_response_header(r, "X-ARCHIVE", b_source0.c_str());
+          else
+            (void) MHD_add_response_header(r, "X-ARCHIVE", (char *) ((unsigned long)b_source0.c_str() + place + 1) );
+
+          (void) MHD_add_response_header(r, "X-FILE", b_source1.c_str());
+          (void) MHD_add_response_header(r, "X-FILE-SIZE", std::to_string(fs.st_size).c_str());
           add_mhd_last_modified (r, archive_entry_mtime(e));
           if (verbose > 1)
             obatched(clog) << "serving archive " << b_source0 << " file " << b_source1 << endl;
diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
index c358df4d..21a1bab8 100644
--- a/debuginfod/debuginfod.h.in
+++ b/debuginfod/debuginfod.h.in
@@ -93,6 +93,9 @@ void* debuginfod_get_user_data (debuginfod_client *client);
 /* Get the current or last active URL, if known.  */
 const char* debuginfod_get_url (debuginfod_client *client);
 
+/* Get last set of HTTP response headers, if known.  */
+const char* debuginfod_get_response_headers (debuginfod_client *client);
+  
 /* Add an outgoing HTTP request  "Header: Value".  Copies string.  */
 int debuginfod_add_http_header (debuginfod_client *client, const char* header);
 
diff --git a/debuginfod/libdebuginfod.map b/debuginfod/libdebuginfod.map
index 7d2f5882..1798ff63 100644
--- a/debuginfod/libdebuginfod.map
+++ b/debuginfod/libdebuginfod.map
@@ -18,3 +18,6 @@ ELFUTILS_0.179 {
 ELFUTILS_0.183 {
   debuginfod_set_verbose_fd;
 } ELFUTILS_0.179;
+ELFUTILS_0.186 {
+  debuginfod_get_response_headers;
+} ELFUTILS_0.183;
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 1822fc6b..855cc4a2 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,9 @@
+2021-08-04  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-find.1: Increasing verbosity describes the downloaded
+	file.
+
 2021-07-26  Noah Sanci <nsanci@redhat.com>
 
 	PR27982
diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1
index 482a8ae7..9010aa7b 100644
--- a/doc/debuginfod-find.1
+++ b/doc/debuginfod-find.1
@@ -110,7 +110,8 @@ l l.
 
 .TP
 .B "\-v"
-Increase verbosity, including printing frequent download-progress messages.
+Increase verbosity, including printing frequent download-progress messages
+and printing a brief description of the downloaded file.
 
 
 .SH "SECURITY"
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 34666609..3ca81418 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,9 @@
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* run-debuginfod-find.sh: Add a test to ensure that file descriptions
+	are accurate for files outside or within archives
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index 991d1dc5..710ac03f 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -526,7 +526,24 @@ wait_ready $PORT1 'fdcache_op_count{op="evict"}' $( grep -c 'evicted a=.*' vlog$
 wait_ready $PORT1 'fdcache_op_count{op="prefetch_enqueue"}' $( grep -c 'interned.*front=0' vlog$PORT1 )
 wait_ready $PORT1 'fdcache_op_count{op="prefetch_evict"}' $( grep -c 'evicted from prefetch a=.*front=0' vlog$PORT1 || true )
 ########################################################################
-
+## PR27277
+# Check to see if source file not located in an archive prints the file's description
+env DEBUGINFOD_URLS="http://127.0.0.1:$PORT1" LD_LIBRARY_PATH=$ldpath DEBUGINFOD_MASIZE=1\
+ ${abs_top_builddir}/debuginfod/debuginfod-find -v source $BUILDID2 ${PWD}/prog2.c > vlog-find$PORT1.1 2>&1
+
+tempfiles vlog-find$PORT1.1
+grep 'X-FILE: '$(realpath $PWD)'/prog2.c' vlog-find$PORT1.1
+grep 'X-FILE-SIZE: '$(wc -c ${PWD}'/prog2.c' | awk '{print $1}') vlog-find$PORT1.1
+
+# Check to see if an executable file located in an archive prints the file's description and archive
+env DEBUGINFOD_URLS="http://127.0.0.1:$PORT1" LD_LIBRARY_PATH=$ldpath \
+ ${abs_top_builddir}/debuginfod/debuginfod-find $VERBOSE executable c36708a78618d597dee15d0dc989f093ca5f9120 > vlog-find$PORT1.2 2>&1
+
+tempfiles vlog-find$PORT1.2
+grep 'X-FILE: ' vlog-find$PORT1.2
+grep 'X-FILE-SIZE: ' vlog-find$PORT1.2
+grep 'X-ARCHIVE: hello2-1.0-2.x86_64.rpm' vlog-find$PORT1.2
+########################################################################
 # Federation mode
 
 # find another unused port
-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-04 18:54 [Bug debuginfod/27277] Describe retrieved files when verbose Noah Sanci
@ 2021-08-05 15:13 ` Mark Wielaard
  2021-08-05 16:54   ` Frank Ch. Eigler
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2021-08-05 15:13 UTC (permalink / raw)
  To: Noah Sanci, elfutils-devel

Hi Noah,

On Wed, 2021-08-04 at 14:54 -0400, Noah Sanci via Elfutils-devel wrote:
> The attached patch dumps the http headers retrieved from the
> debuginfod server which a file was downloaded from. Some custom
> headers are now returned such as X-FILE, X-FILE-SIZE, and X-ARCHIVE
> to give some context about the downloaded files.

I like the verbose http header output, but wish it was done earlier
instead of after the download. Maybe when we commit to an url, if the
info is available then.

The new X-FILE and X-ARCHIVE headers also seem useful.
One question about X-FILE, if it doesn't come from an archive, does it
leak a file system path that might be "secret" on the server?

Why is X-FILE-SIZE != Content-Length ?

I am less enthusiastic about the new debuginfod_get_response_headers
interface. It seems not as useful since it only works if we haven't
already (negatively) cached the file and it is very free-form, do we
guarantee any headers are there? Could you provide a user story where
this is used?

Maybe this interface is more useful if it was done as a new active
query type (the HEAD query you mention in the commit message)?

Cheers,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-05 15:13 ` Mark Wielaard
@ 2021-08-05 16:54   ` Frank Ch. Eigler
  2021-08-06 10:04     ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Frank Ch. Eigler @ 2021-08-05 16:54 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Noah Sanci, elfutils-devel

Hi -

> I like the verbose http header output, but wish it was done earlier
> instead of after the download. Maybe when we commit to an url, if the
> info is available then.

What do you mean "done"?  Printed?

> The new X-FILE and X-ARCHIVE headers also seem useful.
> One question about X-FILE, if it doesn't come from an archive, does it
> leak a file system path that might be "secret" on the server?

Perhaps kind of sort of ... but since source files for such buildids
are resolved only against the host filesystem, those same names will
be there in all their glory in the DWARF.  My guess is that public
servers that care about such configuration privacy will be exclusively
archive based.

> Why is X-FILE-SIZE != Content-Length ?

Because Content-Length can be shorter due to compression
transfer-encoding.  It's the file size that governs local storage &
DEBUGINFOD_MAXSIZE interaction.

> I am less enthusiastic about the new debuginfod_get_response_headers
> interface. It seems not as useful since it only works if we haven't
> already (negatively) cached the file and it is very free-form, do we
> guarantee any headers are there? 

Naturally we can't guarantee any headers, because they are at the
pleasure of the server.

> Could you provide a user story where this is used?

Not really, beyond just printing the things for information purposes,
but not wanting the whole DEBUGINFOD_VERBOSE=1 firehose.  In the mid
term, it could help systemd-coredumpctl type tools map buildids to
actual distro artifact names, and enable paranoid
federation/buildtree/checking type measures (where we may want to
spot-check servers that buildids haven't been hijacked).  Vague but I
think there's something there.

> Maybe this interface is more useful if it was done as a new active
> query type (the HEAD query you mention in the commit message)?

That in turn would require THREE new API functions or a stateful
set_HEAD_mode_and_return_dev_null one and modifying the three main
lookup functions.

- FChE


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-05 16:54   ` Frank Ch. Eigler
@ 2021-08-06 10:04     ` Mark Wielaard
  2021-08-06 18:54       ` Frank Ch. Eigler
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2021-08-06 10:04 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Noah Sanci, elfutils-devel

Hi,

On Thu, 2021-08-05 at 12:54 -0400, Frank Ch. Eigler wrote:
> > I like the verbose http header output, but wish it was done earlier
> > instead of after the download. Maybe when we commit to an url, if
> > the
> > info is available then.
> 
> What do you mean "done"?  Printed?

Yes, since the actual download might take a bit, it is nice to see the
headers at the moment we commit to a server/download. aka here in the
source:

  if (vfd >= 0 && !verbose_reported && committed_to >= 0)
    {
      bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
      dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
               committed_to);
      if (pnl)
        c->default_progressfn_printed_p = 0;
      verbose_reported = true;
    }

Assuming we have the headers at that point of course.
Otherwise right after that when they become available (this is inside
the download loop).

> > The new X-FILE and X-ARCHIVE headers also seem useful.
> > One question about X-FILE, if it doesn't come from an archive, does
> > it
> > leak a file system path that might be "secret" on the server?
> 
> Perhaps kind of sort of ... but since source files for such buildids
> are resolved only against the host filesystem, those same names will
> be there in all their glory in the DWARF.  My guess is that public
> servers that care about such configuration privacy will be
> exclusively
> archive based.

Yeah, you are right, the DWARF debug files themselves already "expose"
those paths.

> > Why is X-FILE-SIZE != Content-Length ?
> 
> Because Content-Length can be shorter due to compression
> transfer-encoding.  It's the file size that governs local storage &
> DEBUGINFOD_MAXSIZE interaction.

Ah, of course, then it is indeed useful to have both headers.

> > I am less enthusiastic about the new
> > debuginfod_get_response_headers
> > interface. It seems not as useful since it only works if we haven't
> > already (negatively) cached the file and it is very free-form, do
> > we
> > guarantee any headers are there? 
> 
> Naturally we can't guarantee any headers, because they are at the
> pleasure of the server.
> 
> > Could you provide a user story where this is used?
> 
> Not really, beyond just printing the things for information purposes,
> but not wanting the whole DEBUGINFOD_VERBOSE=1 firehose.  In the mid
> term, it could help systemd-coredumpctl type tools map buildids to
> actual distro artifact names, and enable paranoid
> federation/buildtree/checking type measures (where we may want to
> spot-check servers that buildids haven't been hijacked).  Vague but I
> think there's something there.

Yes, I do think there is something there, but imho it is too vague and
fragile to be useful as is, especially since it depends on what is in
the cache.

> > Maybe this interface is more useful if it was done as a new active
> > query type (the HEAD query you mention in the commit message)?
> 
> That in turn would require THREE new API functions or a stateful
> set_HEAD_mode_and_return_dev_null one and modifying the three main
> lookup functions.

Yes, it definitely is more work.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-06 10:04     ` Mark Wielaard
@ 2021-08-06 18:54       ` Frank Ch. Eigler
  2021-08-09  9:25         ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Frank Ch. Eigler @ 2021-08-06 18:54 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Noah Sanci, elfutils-devel

Hi -

> Yes, since the actual download might take a bit, it is nice to see the
> headers at the moment we commit to a server/download. aka here in the
> source:
> 
>   if (vfd >= 0 && !verbose_reported && committed_to >= 0)
>     {
>       bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
>       dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
>                committed_to);
>       if (pnl)
>         c->default_progressfn_printed_p = 0;
>       verbose_reported = true;
>     }

One could print it there, but verbose printing is so voluminous and
unstructured that mingling http headers there can only be for a
masochist human's consumption.


> > > Why is X-FILE-SIZE != Content-Length ?
> > 
> > Because Content-Length can be shorter due to compression
> > transfer-encoding.  It's the file size that governs local storage &
> > DEBUGINFOD_MAXSIZE interaction.
> 
> Ah, of course, then it is indeed useful to have both headers.

(... which reminds me, we should document those new headers in the
webapi section of the debuginfod.8 man page ...)


> Yes, I do think there is something there, but imho it is too vague and
> fragile to be useful as is, especially since it depends on what is in
> the cache.

I believe you mean that these headers would only be available for
files fetched anew, not for queries previously cached.  Yeah.  A
person or tool can force a new fetch by using something like
DEBUGINFOD_CACHE_PATH=`mktemp -d`.


> > That in turn would require THREE new API functions or a stateful
> > set_HEAD_mode_and_return_dev_null one and modifying the three main
> > lookup functions.
> 
> Yes, it definitely is more work.

So, is that your suggestion?  We proceed with that sort of thing?


- FChE


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-06 18:54       ` Frank Ch. Eigler
@ 2021-08-09  9:25         ` Mark Wielaard
  2021-08-23 15:11           ` Noah Sanci
  2021-09-22 20:33           ` Frank Ch. Eigler
  0 siblings, 2 replies; 23+ messages in thread
From: Mark Wielaard @ 2021-08-09  9:25 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Noah Sanci, elfutils-devel

Hi Frank,

On Fri, 2021-08-06 at 14:54 -0400, Frank Ch. Eigler wrote:
> > Yes, since the actual download might take a bit, it is nice to see
> > the
> > headers at the moment we commit to a server/download. aka here in
> > the
> > source:
> > 
> >   if (vfd >= 0 && !verbose_reported && committed_to >= 0)
> >     {
> >       bool pnl = (c->default_progressfn_printed_p && vfd ==
> > STDERR_FILENO);
> >       dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
> >                committed_to);
> >       if (pnl)
> >         c->default_progressfn_printed_p = 0;
> >       verbose_reported = true;
> >     }
> 
> One could print it there, but verbose printing is so voluminous and
> unstructured that mingling http headers there can only be for a
> masochist human's consumption.

But the verbose output up to now is also chronological. I would expect
the headers to be shown before seeing verbose traces of other activity.

> > Yes, I do think there is something there, but imho it is too vague
> > and
> > fragile to be useful as is, especially since it depends on what is
> > in the cache.
> 
> I believe you mean that these headers would only be available for
> files fetched anew, not for queries previously cached.  Yeah.  A
> person or tool can force a new fetch by using something like
> DEBUGINFOD_CACHE_PATH=`mktemp -d`.

Which IMHO is not a great interface, especially if it forces a full new
download.

> That in turn would require THREE new API functions or a stateful
> > > set_HEAD_mode_and_return_dev_null one and modifying the three
> > > main
> > > lookup functions.
> > 
> > Yes, it definitely is more work.
> 
> So, is that your suggestion?  We proceed with that sort of thing?

Yes, separate the verbose printing of http headers (which I really do
like) from providing an interface to query what needs to be done to get
some file (is it in cache, can it be retieved from a remote server, how
big is it?) I don't think providing raw http headers is that interface.

Cheers,

Mark


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-09  9:25         ` Mark Wielaard
@ 2021-08-23 15:11           ` Noah Sanci
  2021-08-24  8:18             ` Mark Wielaard
  2021-09-22 20:33           ` Frank Ch. Eigler
  1 sibling, 1 reply; 23+ messages in thread
From: Noah Sanci @ 2021-08-23 15:11 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Frank Ch. Eigler, elfutils-devel

Hello,

I'm back to working on this patch, thanks for your patience and comments.

On Mon, Aug 9, 2021 at 5:27 AM Mark Wielaard <mark@klomp.org> wrote:
> Yes, separate the verbose printing of http headers (which I really do
> like) from providing an interface to query what needs to be done to get
> some file (is it in cache, can it be retieved from a remote server, how
> big is it?) I don't think providing raw http headers is that interface.

Should I separate the verbose printing of http headers and the new
interface into
different PRs?

-Noah Sanci


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-23 15:11           ` Noah Sanci
@ 2021-08-24  8:18             ` Mark Wielaard
  2021-08-27 18:38               ` Noah Sanci
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2021-08-24  8:18 UTC (permalink / raw)
  To: Noah Sanci; +Cc: Frank Ch. Eigler, elfutils-devel

Hi Noah,

On Mon, 2021-08-23 at 11:11 -0400, Noah Sanci wrote:
> I'm back to working on this patch, thanks for your patience and
> comments.
> 
> On Mon, Aug 9, 2021 at 5:27 AM Mark Wielaard <mark@klomp.org> wrote:
> > Yes, separate the verbose printing of http headers (which I really do
> > like) from providing an interface to query what needs to be done to get
> > some file (is it in cache, can it be retieved from a remote server, how
> > big is it?) I don't think providing raw http headers is that interface.
> 
> Should I separate the verbose printing of http headers and the new
> interface into
> different PRs?

Yes, separate bugs/patches for the two features would be ideal IMHO.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-24  8:18             ` Mark Wielaard
@ 2021-08-27 18:38               ` Noah Sanci
  2021-09-08 20:56                 ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Noah Sanci @ 2021-08-27 18:38 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Frank Ch. Eigler, elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 902 bytes --]

Hello,

Here is the patch which prints the http headers when verbose.

-Noah Sanci

On Tue, Aug 24, 2021 at 4:19 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi Noah,
>
> On Mon, 2021-08-23 at 11:11 -0400, Noah Sanci wrote:
> > I'm back to working on this patch, thanks for your patience and
> > comments.
> >
> > On Mon, Aug 9, 2021 at 5:27 AM Mark Wielaard <mark@klomp.org> wrote:
> > > Yes, separate the verbose printing of http headers (which I really do
> > > like) from providing an interface to query what needs to be done to get
> > > some file (is it in cache, can it be retieved from a remote server, how
> > > big is it?) I don't think providing raw http headers is that interface.
> >
> > Should I separate the verbose printing of http headers and the new
> > interface into
> > different PRs?
>
> Yes, separate bugs/patches for the two features would be ideal IMHO.
>
> Thanks,
>
> Mark
>

[-- Attachment #2: 0001-debuginfod-PR27277-Describe-retrieved-files-when-ver.patch --]
[-- Type: text/x-patch, Size: 20021 bytes --]

From ed7638571f188e346dd466c195b9ebda028d1c65 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Wed, 28 Jul 2021 14:46:05 -0400
Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when verbose

There appear to exist use cases that intend to simply check for the
existence of content in a debuginfod server, without actually
downloading it.  In HTTP land, the HEAD operation is the natural
expression of this.
Instead of implementing a HEAD/describe option, allow users, with enough
verbosity, to print the HTTP response headers upon retrieving a file.
E.g output:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 2428240
Cache-Control: public
Last-Modified: Sat, 15 May 2021 20:49:51 GMT
X-FILE: "file name"
X-FILE-SIZE: "byte length of file"
Content-Type: application/octet-stream
Date: Tue, 03 Aug 2021 18:50:36 GMT

https://sourceware.org/bugzilla/show_bug.cgi?id=27277

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog                     |  15 ++
 debuginfod/debuginfod-client.c           |  58 ++++++-
 debuginfod/debuginfod.cxx                |  11 ++
 doc/ChangeLog                            |   7 +
 doc/debuginfod-find.1                    |   3 +-
 doc/debuginfod.8                         |   7 +
 tests/ChangeLog                          |   7 +
 tests/Makefile.am                        |   3 +-
 tests/run-debuginfod-response-headers.sh | 194 +++++++++++++++++++++++
 9 files changed, 299 insertions(+), 6 deletions(-)
 create mode 100755 tests/run-debuginfod-response-headers.sh

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 395af94f..d911d5b5 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -16,6 +16,21 @@
 	* debuginfod.cxx (handler_cb): Fix after_you unique_set key
 	to the entire incoming URL.
 
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-client.c (struct debuginfod_client): New field
+	winning_headers.
+	(struct handle_data): New field response_data.
+	(header_callback): Store received headers in response_data.
+	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
+	Save winning response_data.
+	(debuginfod_end): free client winning headers.
+	* debuginfod.cxx (handle_buildid_f_match): remove X-FILE path. Add
+	X-FILE and X-FILE-SIZE headers.
+	(handle_buildid_r_match): remove X-FILE path. Add X-FILE, X-FILE-SIZE
+	headers, and X-ARCHIVE headers.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 7d4b220f..eb52dddf 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -127,6 +127,7 @@ struct debuginfod_client
      timeout or other info gotten from environment variables, the
      handle data, etc. So those don't have to be reparsed and
      recreated on each request.  */
+  char * winning_headers;
 };
 
 /* The cache_clean_interval_s file within the debuginfod cache specifies
@@ -183,6 +184,8 @@ struct handle_data
      to the cache. Used to ensure that a file is not downloaded from
      multiple servers unnecessarily.  */
   CURL **target_handle;
+  /* Response http headers for this client handle, sent from the server */
+  char *response_data;
 };
 
 static size_t
@@ -499,6 +502,33 @@ default_progressfn (debuginfod_client *c, long a, long b)
 }
 
 
+static size_t
+header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
+{
+  if (size != 1)
+    return 0;
+  /* Temporary buffer for realloc */
+  char *temp = NULL;
+  size_t userlen = 0;
+  if (*(char**)userdata == NULL)
+    {
+      temp = malloc(numitems+1);
+      if (temp == NULL)
+        return 0;
+      memset(temp, '\0', numitems+1);
+    }
+  else
+    {
+      userlen = strlen(*(char**)userdata);
+      temp = realloc(*(char**)userdata, userlen + numitems + 1);
+      if (temp == NULL)
+       return 0;
+    }
+  strncat(temp, buffer, numitems);
+  *(char**)userdata = temp;
+  return numitems;
+}
+
 /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
    with the specified build-id, type (debuginfo, executable or source)
    and filename. filename may be NULL. If found, return a file
@@ -936,10 +966,13 @@ debuginfod_query_server (debuginfod_client *c,
 	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
 			    100 * 1024L);
 	}
+      data[i].response_data = NULL;
       curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i].response_data));
 #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
       curl_easy_setopt(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
 #else
@@ -961,6 +994,7 @@ debuginfod_query_server (debuginfod_client *c,
   int committed_to = -1;
   bool verbose_reported = false;
   struct timespec start_time, cur_time;
+  c->winning_headers = NULL;
   if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
     {
       rc = errno;
@@ -995,8 +1029,18 @@ debuginfod_query_server (debuginfod_client *c,
 	    if (data[i].handle != target_handle)
 	      curl_multi_remove_handle(curlm, data[i].handle);
 	    else
-	      committed_to = i;
-	}
+              {
+	        committed_to = i;
+                if (c->winning_headers == NULL)
+                  {
+                    c->winning_headers = data[committed_to].response_data;
+                    if (vfd >= 0 && c->winning_headers != NULL)
+                      dprintf(vfd, "\n%s", c->winning_headers);
+                    data[committed_to].response_data = NULL;
+                  }
+
+              }
+        }
 
       if (vfd >= 0 && !verbose_reported && committed_to >= 0)
 	{
@@ -1161,10 +1205,10 @@ debuginfod_query_server (debuginfod_client *c,
                 {
                   char *effective_url = NULL;
                   long resp_code = 500;
-                  CURLcode ok1 = curl_easy_getinfo (target_handle,
+                  CURLcode ok1 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_EFFECTIVE_URL,
 						    &effective_url);
-                  CURLcode ok2 = curl_easy_getinfo (target_handle,
+                  CURLcode ok2 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_RESPONSE_CODE,
 						    &resp_code);
                   if(ok1 == CURLE_OK && ok2 == CURLE_OK && effective_url)
@@ -1238,7 +1282,10 @@ debuginfod_query_server (debuginfod_client *c,
             {
               curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
               curl_easy_cleanup (data[i].handle);
+              free(data[i].response_data);
             }
+            free(c->winning_headers);
+            c->winning_headers = NULL;
 	    goto query_in_parallel;
 	}
       else
@@ -1281,6 +1328,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   for (int i = 0; i < num_urls; ++i)
@@ -1304,6 +1352,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   unlink (target_cache_tmppath);
@@ -1415,6 +1464,7 @@ debuginfod_end (debuginfod_client *client)
 
   curl_multi_cleanup (client->server_mhandle);
   curl_slist_free_all (client->headers);
+  free (client->winning_headers);
   free (client->url);
   free (client);
 }
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 6e182a84..e9b7e76c 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1068,6 +1068,9 @@ handle_buildid_f_match (bool internal_req_t,
   else
     {
       MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+      std::string file = b_source0.substr(b_source0.find_last_of("/")+1, b_source0.length());
+      MHD_add_response_header (r, "X-FILE-SIZE", to_string(s.st_size).c_str() );
+      MHD_add_response_header (r, "X-FILE", file.c_str() );
       add_mhd_last_modified (r, s.st_mtime);
       if (verbose > 1)
         obatched(clog) << "serving file " << b_source0 << endl;
@@ -1537,6 +1540,9 @@ handle_buildid_r_match (bool internal_req_p,
       inc_metric ("http_responses_total","result","archive fdcache");
 
       MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+      MHD_add_response_header (r, "X-FILE-SIZE", to_string(fs.st_size).c_str());
+      MHD_add_response_header (r, "X-ARCHIVE", b_source0.c_str());
+      MHD_add_response_header (r, "X-FILE", b_source1.c_str());
       add_mhd_last_modified (r, fs.st_mtime);
       if (verbose > 1)
         obatched(clog) << "serving fdcache archive " << b_source0 << " file " << b_source1 << endl;
@@ -1678,6 +1684,11 @@ handle_buildid_r_match (bool internal_req_p,
       else
         {
           MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+          std::string file = b_source1.substr(b_source1.find_last_of("/")+1, b_source1.length());
+          MHD_add_response_header (r, "X-FILE-SIZE", to_string(fs.st_size).c_str());
+          MHD_add_response_header (r, "X-ARCHIVE", b_source0.c_str());
+          MHD_add_response_header (r, "X-FILE", file.c_str());
+
           add_mhd_last_modified (r, archive_entry_mtime(e));
           if (verbose > 1)
             obatched(clog) << "serving archive " << b_source0 << " file " << b_source1 << endl;
diff --git a/doc/ChangeLog b/doc/ChangeLog
index d5f34f0f..d5198404 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -14,6 +14,13 @@
 	* Makefile.am: Updated to include debuginfod-client-config.7
 	* man3, man7: Symlinks for source tree man page testing.
 
+2021-08-04  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-find.1: Increasing verbosity describes the downloaded
+	file.
+	* debuginfod.8: Describe X-FILE, X-FILE-SIZE, and X-ARCHIVE.
+
 2021-07-26  Noah Sanci <nsanci@redhat.com>
 
 	PR27982
diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1
index a61673f5..957ec7e7 100644
--- a/doc/debuginfod-find.1
+++ b/doc/debuginfod-find.1
@@ -110,7 +110,8 @@ l l.
 
 .TP
 .B "\-v"
-Increase verbosity, including printing frequent download-progress messages.
+Increase verbosity, including printing frequent download-progress messages
+and printing the http response headers from the server.
 
 
 .SH "SECURITY"
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
index 5b0d793c..e9c58fbb 100644
--- a/doc/debuginfod.8
+++ b/doc/debuginfod.8
@@ -256,6 +256,13 @@ Unknown buildid / request combinations result in HTTP error codes.
 This file service resemblance is intentional, so that an installation
 can take advantage of standard HTTP management infrastructure.
 
+Upon finding a file in an archive or simply in the database, some
+custom http headers are added to the response. For files in the
+database X-FILE and X-FILE-SIZE are added. X-FILE is simply the
+filename and X-FILE-SIZE is the size of the file. For files found
+in archives, in addition to X-FILE and X-FILE-SIZE, X-ARCHIVE is added.
+X-ARCHIVE is the name of the the file was found in.
+
 There are three requests.  In each case, the buildid is encoded as a
 lowercase hexadecimal string.  For example, for a program \fI/bin/ls\fP,
 look at the ELF note GNU_BUILD_ID:
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 29c48b97..1ed1e0c8 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -8,6 +8,13 @@
 	* backtrace.c (callback_verify): Check for pthread_kill as first
 	frame. Change asserts to fprintf plus abort.
 
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* run-debuginfod-response-headers.sh: Add a test to ensure that file descriptions
+	are accurate for files outside or within archives.
+	* Makefile.am: Add run-debuginfod-response-headers.sh to TESTS.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 429649f4..99fa61ba 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -211,7 +211,7 @@ if DEBUGINFOD
 check_PROGRAMS += debuginfod_build_id_find
 # With the dummy delegation doesn't work
 if !DUMMY_LIBDEBUGINFOD
-TESTS += run-debuginfod-find.sh
+TESTS += run-debuginfod-find.sh run-debuginfod-response-headers.sh
 endif
 endif
 
@@ -475,6 +475,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
 	     run-disasm-riscv64.sh \
 	     testfile-riscv64-dis1.o.bz2 testfile-riscv64-dis1.expect.bz2 \
              run-debuginfod-find.sh \
+             run-debuginfod-response-headers.sh \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
 	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
diff --git a/tests/run-debuginfod-response-headers.sh b/tests/run-debuginfod-response-headers.sh
new file mode 100755
index 00000000..682e7b9a
--- /dev/null
+++ b/tests/run-debuginfod-response-headers.sh
@@ -0,0 +1,194 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2019-2021 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/test-subr.sh  # includes set -e
+
+# for test case debugging, uncomment:
+set -x
+
+DB=${PWD}/.debuginfod_tmp.sqlite
+tempfiles $DB
+export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache
+
+while true; do
+    PORT1=`expr '(' $RANDOM % 1000 ')' + 9000`
+    ss -atn | fgrep ":$PORT1" || break
+done
+PID1=0
+PID2=0
+PID3=0
+PID4=0
+
+cleanup()
+{
+  if [ $PID1 -ne 0 ]; then kill $PID1; wait $PID1; fi
+  if [ $PID2 -ne 0 ]; then kill $PID2; wait $PID2; fi
+  if [ $PID3 -ne 0 ]; then kill $PID3; wait $PID3; fi
+  if [ $PID4 -ne 0 ]; then kill $PID4; wait $PID4; fi
+  rm -rf F R D L Z ${PWD}/foobar ${PWD}/mocktree ${PWD}/.client_cache* ${PWD}/tmp*
+  exit_cleanup
+}
+
+# clean up trash if we were aborted early
+trap cleanup 0 1 2 3 5 9 15
+
+errfiles_list=
+err() {
+    echo ERROR REPORTS
+    for ports in $PORT1 $PORT2 $PORT3
+    do
+        echo ERROR REPORT $port metrics
+        curl -s http://127.0.0.1:$port/metrics
+        echo
+    done
+    for x in $errfiles_list
+    do
+        echo ERROR REPORT "$x"
+        cat $x
+        echo
+    done
+    false # trigger set -e
+}
+trap err ERR
+
+errfiles() {
+    errfiles_list="$errfiles_list $*"
+}
+
+wait_ready()
+{
+  port=$1;
+  what=$2;
+  value=$3;
+  timeout=20;
+
+  echo "Wait $timeout seconds on $port for metric $what to change to $value"
+  while [ $timeout -gt 0 ]; do
+    mvalue="$(curl -s http://127.0.0.1:$port/metrics \
+              | grep "$what" | awk '{print $NF}')"
+    if [ -z "$mvalue" ]; then mvalue=0; fi
+      echo "metric $what: $mvalue"
+      if [ "$mvalue" -eq "$value" ]; then
+        break;
+    fi
+    sleep 0.5;
+    ((timeout--));
+  done;
+
+  if [ $timeout -eq 0 ]; then
+    echo "metric $what never changed to $value on port $port"
+    err
+  fi
+}
+
+# We want to run debuginfod in the background.  We also want to start
+# it with the same check/installcheck-sensitive LD_LIBRARY_PATH stuff
+# that the testrun alias sets.  But: we if we just use
+#    testrun .../debuginfod
+# it runs in a subshell, with different pid, so not helpful.
+#
+# So we gather the LD_LIBRARY_PATH with this cunning trick:
+ldpath=`testrun sh -c 'echo $LD_LIBRARY_PATH'`
+
+mkdir F R
+
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -F -R -d $DB -p $PORT1 -t0 -g0 -v R F > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+# Server must become ready
+wait_ready $PORT1 'ready' 1
+export DEBUGINFOD_URLS=http://127.0.0.1:$PORT1/   # or without trailing /
+# Check thread comm names
+ps -q $PID1 -e -L -o '%p %c %a' | grep groom
+ps -q $PID1 -e -L -o '%p %c %a' | grep scan
+ps -q $PID1 -e -L -o '%p %c %a' | grep traverse
+
+# We use -t0 and -g0 here to turn off time-based scanning & grooming.
+# For testing purposes, we just sic SIGUSR1 / SIGUSR2 at the process.
+
+########################################################################
+
+# Compile a simple program, strip its debuginfo and save the build-id.
+# Also move the debuginfo into another directory so that elfutils
+# cannot find it without debuginfod.
+echo "int main() { return 0; }" > ${PWD}/prog.c
+tempfiles prog.c
+# Create a subdirectory to confound source path names
+mkdir foobar
+gcc -Wl,--build-id -g -o prog ${PWD}/foobar///./../prog.c
+testrun ${abs_top_builddir}/src/strip -g -f prog.debug ${PWD}/prog
+BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
+          -a prog | grep 'Build ID' | cut -d ' ' -f 7`
+
+mv prog F
+mv prog.debug F
+kill -USR1 $PID1
+# Wait till both files are in the index.
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+
+cp -rvp ${abs_srcdir}/debuginfod-rpms R
+if [ "$zstd" = "false" ]; then  # nuke the zstd fedora 31 ones
+    rm -vrf R/debuginfod-rpms/fedora31
+fi
+
+kill -USR1 $PID1
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+# All rpms need to be in the index, except the dummy permission-000 one
+rpms=$(find R -name \*rpm | grep -v nothing | wc -l)
+wait_ready $PORT1 'scanned_files_total{source=".rpm archive"}' $rpms
+kill -USR1 $PID1  # two hits of SIGUSR1 may be needed to resolve .debug->dwz->srefs
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 3
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+
+########################################################################
+## PR27277
+# Make a simple request to the debuginfod server and check debuginfod-find's vlog to see if
+# the custom HTTP headers are received.
+rm -rf $DEBUGINFOD_CACHE_PATH
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable F/prog > vlog-find$PORT1.1 2>&1
+tempfiles vlog-find$PORT1.1
+grep 'Content-Length: ' vlog-find$PORT1.1
+grep 'Connection: ' vlog-find$PORT1.1
+grep 'Cache-Control: ' vlog-find$PORT1.1
+grep 'X-FILE: ' vlog-find$PORT1.1
+grep 'X-FILE-SIZE: ' vlog-find$PORT1.1
+
+# Check to see if an executable file located in an archive prints the file's description and archive
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable c36708a78618d597dee15d0dc989f093ca5f9120 > vlog-find$PORT1.2 2>&1
+tempfiles vlog-find$PORT1.2
+grep 'Content-Length: ' vlog-find$PORT1.2
+grep 'Connection: ' vlog-find$PORT1.2
+grep 'Cache-Control: ' vlog-find$PORT1.2
+grep 'X-FILE: ' vlog-find$PORT1.2
+grep 'X-FILE-SIZE: ' vlog-find$PORT1.2
+grep 'X-ARCHIVE: ' vlog-find$PORT1.2
+
+kill $PID1
+wait $PID1
+PID1=0
+exit 0
-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-27 18:38               ` Noah Sanci
@ 2021-09-08 20:56                 ` Mark Wielaard
  2021-09-10 18:22                   ` Noah Sanci
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2021-09-08 20:56 UTC (permalink / raw)
  To: Noah Sanci; +Cc: elfutils-devel

Hi Noah,

Looks like I reviewed a slightly older version earlier.  Could you
merge the reviewes/changes for both version in a new version of this
patch?

On Fri, Aug 27, 2021 at 02:38:27PM -0400, Noah Sanci via Elfutils-devel wrote:
> From ed7638571f188e346dd466c195b9ebda028d1c65 Mon Sep 17 00:00:00 2001
> From: Noah Sanci <nsanci@redhat.com>
> Date: Wed, 28 Jul 2021 14:46:05 -0400
> Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when verbose
> 
> There appear to exist use cases that intend to simply check for the
> existence of content in a debuginfod server, without actually
> downloading it.  In HTTP land, the HEAD operation is the natural
> expression of this.
> Instead of implementing a HEAD/describe option, allow users, with enough
> verbosity, to print the HTTP response headers upon retrieving a file.

Same comment as earlier, but now please also state the debuginfod
changes itself (the extra headers added).

> E.g output:
> 
> HTTP/1.1 200 OK
> Connection: Keep-Alive
> Content-Length: 2428240
> Cache-Control: public
> Last-Modified: Sat, 15 May 2021 20:49:51 GMT
> X-FILE: "file name"
> X-FILE-SIZE: "byte length of file"

I think an actual output example is better than including "place
holder text".

> +2021-08-02  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-client.c (struct debuginfod_client): New field
> +	winning_headers.
> +	(struct handle_data): New field response_data.
> +	(header_callback): Store received headers in response_data.
> +	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
> +	Save winning response_data.
> +	(debuginfod_end): free client winning headers.
> +	* debuginfod.cxx (handle_buildid_f_match): remove X-FILE path. Add
> +	X-FILE and X-FILE-SIZE headers.
> +	(handle_buildid_r_match): remove X-FILE path. Add X-FILE, X-FILE-SIZE
> +	headers, and X-ARCHIVE headers.

So the changes compared to the other one are just in the
debuginfod.cxx file.

> diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
> index 6e182a84..e9b7e76c 100644
> --- a/debuginfod/debuginfod.cxx
> +++ b/debuginfod/debuginfod.cxx
> @@ -1068,6 +1068,9 @@ handle_buildid_f_match (bool internal_req_t,
>    else
>      {
>        MHD_add_response_header (r, "Content-Type", "application/octet-stream");
> +      std::string file = b_source0.substr(b_source0.find_last_of("/")+1, b_source0.length());
> +      MHD_add_response_header (r, "X-FILE-SIZE", to_string(s.st_size).c_str() );
> +      MHD_add_response_header (r, "X-FILE", file.c_str() );
>        add_mhd_last_modified (r, s.st_mtime);
>        if (verbose > 1)
>          obatched(clog) << "serving file " << b_source0 << endl;
> @@ -1537,6 +1540,9 @@ handle_buildid_r_match (bool internal_req_p,
>        inc_metric ("http_responses_total","result","archive fdcache");
>  
>        MHD_add_response_header (r, "Content-Type", "application/octet-stream");
> +      MHD_add_response_header (r, "X-FILE-SIZE", to_string(fs.st_size).c_str());
> +      MHD_add_response_header (r, "X-ARCHIVE", b_source0.c_str());
> +      MHD_add_response_header (r, "X-FILE", b_source1.c_str());
>        add_mhd_last_modified (r, fs.st_mtime);
>        if (verbose > 1)
>          obatched(clog) << "serving fdcache archive " << b_source0 << " file " << b_source1 << endl;
> @@ -1678,6 +1684,11 @@ handle_buildid_r_match (bool internal_req_p,
>        else
>          {
>            MHD_add_response_header (r, "Content-Type", "application/octet-stream");
> +          std::string file = b_source1.substr(b_source1.find_last_of("/")+1, b_source1.length());
> +          MHD_add_response_header (r, "X-FILE-SIZE", to_string(fs.st_size).c_str());
> +          MHD_add_response_header (r, "X-ARCHIVE", b_source0.c_str());
> +          MHD_add_response_header (r, "X-FILE", file.c_str());
> +
>            add_mhd_last_modified (r, archive_entry_mtime(e));
>            if (verbose > 1)
>              obatched(clog) << "serving archive " << b_source0 << " file " << b_source1 << endl;

This looks good. But I wonder if, since these are headers specific to
debuginfod, they shouldn't be named X-DEBUGINFOD-... Just to make sure
they don't clash with any others a proxy might insert.

> +2021-08-04  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-find.1: Increasing verbosity describes the downloaded
> +	file.
> +	* debuginfod.8: Describe X-FILE, X-FILE-SIZE, and X-ARCHIVE.
> +

> diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
> index 5b0d793c..e9c58fbb 100644
> --- a/doc/debuginfod.8
> +++ b/doc/debuginfod.8
> @@ -256,6 +256,13 @@ Unknown buildid / request combinations result in HTTP error codes.
>  This file service resemblance is intentional, so that an installation
>  can take advantage of standard HTTP management infrastructure.
>  
> +Upon finding a file in an archive or simply in the database, some
> +custom http headers are added to the response. For files in the
> +database X-FILE and X-FILE-SIZE are added. X-FILE is simply the
> +filename and X-FILE-SIZE is the size of the file. For files found
> +in archives, in addition to X-FILE and X-FILE-SIZE, X-ARCHIVE is added.
> +X-ARCHIVE is the name of the the file was found in.

Probably should mention that the file name is unescaped (not url
encoded). The last sentence has too many "the". I think a word is
missing between the last two.

I have again not reviewed the testcase yet.

Cheers,

Mark


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-09-08 20:56                 ` Mark Wielaard
@ 2021-09-10 18:22                   ` Noah Sanci
  2021-09-12 19:08                     ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Noah Sanci @ 2021-09-10 18:22 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 4895 bytes --]

Hello,

The updated patch is attached.

On Wed, Sep 8, 2021 at 4:56 PM Mark Wielaard <mark@klomp.org> wrote:
> On Fri, Aug 27, 2021 at 02:38:27PM -0400, Noah Sanci via Elfutils-devel wrote:
> > From ed7638571f188e346dd466c195b9ebda028d1c65 Mon Sep 17 00:00:00 2001
> > From: Noah Sanci <nsanci@redhat.com>
> > Date: Wed, 28 Jul 2021 14:46:05 -0400
> > Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when verbose
> >
> > There appear to exist use cases that intend to simply check for the
> > existence of content in a debuginfod server, without actually
> > downloading it.  In HTTP land, the HEAD operation is the natural
> > expression of this.
> > Instead of implementing a HEAD/describe option, allow users, with enough
> > verbosity, to print the HTTP response headers upon retrieving a file.
>
> Same comment as earlier, but now please also state the debuginfod
> changes itself (the extra headers added).
Should be good.

> > E.g output:
> >
> > HTTP/1.1 200 OK
> > Connection: Keep-Alive
> > Content-Length: 2428240
> > Cache-Control: public
> > Last-Modified: Sat, 15 May 2021 20:49:51 GMT
> > X-FILE: "file name"
> > X-FILE-SIZE: "byte length of file"
>
> I think an actual output example is better than including "place
> holder text".
Fixed

> > diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
> > index 6e182a84..e9b7e76c 100644
> > --- a/debuginfod/debuginfod.cxx
> > +++ b/debuginfod/debuginfod.cxx
> > @@ -1068,6 +1068,9 @@ handle_buildid_f_match (bool internal_req_t,
> >    else
> >      {
> >        MHD_add_response_header (r, "Content-Type", "application/octet-stream");
> > +      std::string file = b_source0.substr(b_source0.find_last_of("/")+1, b_source0.length());
> > +      MHD_add_response_header (r, "X-FILE-SIZE", to_string(s.st_size).c_str() );
> > +      MHD_add_response_header (r, "X-FILE", file.c_str() );
> >        add_mhd_last_modified (r, s.st_mtime);
> >        if (verbose > 1)
> >          obatched(clog) << "serving file " << b_source0 << endl;
> > @@ -1537,6 +1540,9 @@ handle_buildid_r_match (bool internal_req_p,
> >        inc_metric ("http_responses_total","result","archive fdcache");
> >
> >        MHD_add_response_header (r, "Content-Type", "application/octet-stream");
> > +      MHD_add_response_header (r, "X-FILE-SIZE", to_string(fs.st_size).c_str());
> > +      MHD_add_response_header (r, "X-ARCHIVE", b_source0.c_str());
> > +      MHD_add_response_header (r, "X-FILE", b_source1.c_str());
> >        add_mhd_last_modified (r, fs.st_mtime);
> >        if (verbose > 1)
> >          obatched(clog) << "serving fdcache archive " << b_source0 << " file " << b_source1 << endl;
> > @@ -1678,6 +1684,11 @@ handle_buildid_r_match (bool internal_req_p,
> >        else
> >          {
> >            MHD_add_response_header (r, "Content-Type", "application/octet-stream");
> > +          std::string file = b_source1.substr(b_source1.find_last_of("/")+1, b_source1.length());
> > +          MHD_add_response_header (r, "X-FILE-SIZE", to_string(fs.st_size).c_str());
> > +          MHD_add_response_header (r, "X-ARCHIVE", b_source0.c_str());
> > +          MHD_add_response_header (r, "X-FILE", file.c_str());
> > +
> >            add_mhd_last_modified (r, archive_entry_mtime(e));
> >            if (verbose > 1)
> >              obatched(clog) << "serving archive " << b_source0 << " file " << b_source1 << endl;
>
> This looks good. But I wonder if, since these are headers specific to
> debuginfod, they shouldn't be named X-DEBUGINFOD-... Just to make sure
> they don't clash with any others a proxy might insert.
Fixed

> > +2021-08-04  Noah Sanci  <nsanci@redhat.com>
> > +
> > +     PR27277
> > +     * debuginfod-find.1: Increasing verbosity describes the downloaded
> > +     file.
> > +     * debuginfod.8: Describe X-FILE, X-FILE-SIZE, and X-ARCHIVE.
> > +
>
> > diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
> > index 5b0d793c..e9c58fbb 100644
> > --- a/doc/debuginfod.8
> > +++ b/doc/debuginfod.8
> > @@ -256,6 +256,13 @@ Unknown buildid / request combinations result in HTTP error codes.
> >  This file service resemblance is intentional, so that an installation
> >  can take advantage of standard HTTP management infrastructure.
> >
> > +Upon finding a file in an archive or simply in the database, some
> > +custom http headers are added to the response. For files in the
> > +database X-FILE and X-FILE-SIZE are added. X-FILE is simply the
> > +filename and X-FILE-SIZE is the size of the file. For files found
> > +in archives, in addition to X-FILE and X-FILE-SIZE, X-ARCHIVE is added.
> > +X-ARCHIVE is the name of the the file was found in.
>
> Probably should mention that the file name is unescaped (not url
> encoded).
Fixed
> The last sentence has too many "the". I think a word is
> missing between the last two.
Not sure if what I did is considered a fix

Thanks,
Noah Sanci

[-- Attachment #2: 0002-debuginfod-PR27277-Describe-retrieved-files-when-ver.patch --]
[-- Type: text/x-patch, Size: 21636 bytes --]

From 979f19eb4fd7a35ace4ddafed103922559b93120 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Wed, 28 Jul 2021 14:46:05 -0400
Subject: [PATCH 2/2] debuginfod: PR27277 - Describe retrieved files when
 verbose

Allow users, with enough verbosity, to print the HTTP response headers
upon retrieving a file. These files may include several custome http
response headers such as X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE, and
X-DEBUGINFOD-ARCHIVE. These headers are added from the daemon, in
debuginfod.cxx.
run-debuginfod-fd-prefetch-caches.sh was updated so that it doesn't
trip and fail as previously greping for a value that should yield zero
caused an error.
Previously, target_handle was used to gather CURLE_OK reponses. Under
some conditions, target_handle was NULL when we wanted it to point to
the handle. This could cause some failuers. instead msg->easy_handle
is used, which ensures the correct handle is used.
E.g output:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 4095072
Cache-Control: public
Last-Modified: Thu, 09 Sep 2021 19:06:40 GMT
X-FILE: debuginfod
X-FILE-SIZE: 4095072
Content-Type: application/octet-stream
Date: Fri, 10 Sep 2021 16:38:06 GMT

https://sourceware.org/bugzilla/show_bug.cgi?id=27277

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog                       |  16 +++
 debuginfod/debuginfod-client.c             |  58 +++++++++-
 debuginfod/debuginfod.cxx                  |  11 ++
 doc/ChangeLog                              |   8 ++
 doc/debuginfod-find.1                      |   3 +-
 doc/debuginfod.8                           |   9 ++
 tests/ChangeLog                            |  18 +++-
 tests/Makefile.am                          |   3 +-
 tests/run-debuginfod-fd-prefetch-caches.sh |   6 +-
 tests/run-debuginfod-response-headers.sh   | 118 +++++++++++++++++++++
 tests/test-subr.sh                         |  10 +-
 11 files changed, 241 insertions(+), 19 deletions(-)
 create mode 100755 tests/run-debuginfod-response-headers.sh

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 7e221f54..1ec83de9 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -26,6 +26,22 @@
 	* debuginfod.cxx (handler_cb): Fix after_you unique_set key
 	to the entire incoming URL.
 
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-client.c (struct debuginfod_client): New field
+	winning_headers.
+	(struct handle_data): New field response_data.
+	(header_callback): Store received headers in response_data.
+	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
+	Save winning response_data.
+	(debuginfod_end): free client winning headers.
+	* debuginfod.cxx (handle_buildid_f_match): remove X-DEBUGINFOD-FILE
+	path. Add X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE headers.
+	(handle_buildid_r_match): remove X-DEBUGINFOD-FILE path. Add
+	X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE
+	headers, and X-ARCHIVE headers.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index d41723ce..9c16e7a3 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -127,6 +127,7 @@ struct debuginfod_client
      timeout or other info gotten from environment variables, the
      handle data, etc. So those don't have to be reparsed and
      recreated on each request.  */
+  char * winning_headers;
 };
 
 /* The cache_clean_interval_s file within the debuginfod cache specifies
@@ -183,6 +184,8 @@ struct handle_data
      to the cache. Used to ensure that a file is not downloaded from
      multiple servers unnecessarily.  */
   CURL **target_handle;
+  /* Response http headers for this client handle, sent from the server */
+  char *response_data;
 };
 
 static size_t
@@ -499,6 +502,33 @@ default_progressfn (debuginfod_client *c, long a, long b)
 }
 
 
+static size_t
+header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
+{
+  if (size != 1)
+    return 0;
+  /* Temporary buffer for realloc */
+  char *temp = NULL;
+  size_t userlen = 0;
+  if (*(char**)userdata == NULL)
+    {
+      temp = malloc(numitems+1);
+      if (temp == NULL)
+        return 0;
+      memset(temp, '\0', numitems+1);
+    }
+  else
+    {
+      userlen = strlen(*(char**)userdata);
+      temp = realloc(*(char**)userdata, userlen + numitems + 1);
+      if (temp == NULL)
+       return 0;
+    }
+  strncat(temp, buffer, numitems);
+  *(char**)userdata = temp;
+  return numitems;
+}
+
 /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
    with the specified build-id, type (debuginfo, executable or source)
    and filename. filename may be NULL. If found, return a file
@@ -936,10 +966,13 @@ debuginfod_query_server (debuginfod_client *c,
 	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
 			    100 * 1024L);
 	}
+      data[i].response_data = NULL;
       curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i].response_data));
 #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
       curl_easy_setopt(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
 #else
@@ -961,6 +994,7 @@ debuginfod_query_server (debuginfod_client *c,
   int committed_to = -1;
   bool verbose_reported = false;
   struct timespec start_time, cur_time;
+  c->winning_headers = NULL;
   if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
     {
       rc = errno;
@@ -995,8 +1029,18 @@ debuginfod_query_server (debuginfod_client *c,
 	    if (data[i].handle != target_handle)
 	      curl_multi_remove_handle(curlm, data[i].handle);
 	    else
-	      committed_to = i;
-	}
+              {
+	        committed_to = i;
+                if (c->winning_headers == NULL)
+                  {
+                    c->winning_headers = data[committed_to].response_data;
+                    if (vfd >= 0 && c->winning_headers != NULL)
+                      dprintf(vfd, "\n%s", c->winning_headers);
+                    data[committed_to].response_data = NULL;
+                  }
+
+              }
+        }
 
       if (vfd >= 0 && !verbose_reported && committed_to >= 0)
 	{
@@ -1161,10 +1205,10 @@ debuginfod_query_server (debuginfod_client *c,
                 {
                   char *effective_url = NULL;
                   long resp_code = 500;
-                  CURLcode ok1 = curl_easy_getinfo (target_handle,
+                  CURLcode ok1 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_EFFECTIVE_URL,
 						    &effective_url);
-                  CURLcode ok2 = curl_easy_getinfo (target_handle,
+                  CURLcode ok2 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_RESPONSE_CODE,
 						    &resp_code);
                   if(ok1 == CURLE_OK && ok2 == CURLE_OK && effective_url)
@@ -1238,7 +1282,10 @@ debuginfod_query_server (debuginfod_client *c,
             {
               curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
               curl_easy_cleanup (data[i].handle);
+              free(data[i].response_data);
             }
+            free(c->winning_headers);
+            c->winning_headers = NULL;
 	    goto query_in_parallel;
 	}
       else
@@ -1281,6 +1328,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   for (int i = 0; i < num_urls; ++i)
@@ -1304,6 +1352,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   unlink (target_cache_tmppath);
@@ -1415,6 +1464,7 @@ debuginfod_end (debuginfod_client *client)
 
   curl_multi_cleanup (client->server_mhandle);
   curl_slist_free_all (client->headers);
+  free (client->winning_headers);
   free (client->url);
   free (client);
 }
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 3269f657..4f7a09ad 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1075,6 +1075,9 @@ handle_buildid_f_match (bool internal_req_t,
   else
     {
       MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+      std::string file = b_source0.substr(b_source0.find_last_of("/")+1, b_source0.length());
+      MHD_add_response_header (r, "X-DEBUGINFOD-SIZE", to_string(s.st_size).c_str() );
+      MHD_add_response_header (r, "X-DEBUGINFOD-FILE", file.c_str() );
       add_mhd_last_modified (r, s.st_mtime);
       if (verbose > 1)
         obatched(clog) << "serving file " << b_source0 << endl;
@@ -1544,6 +1547,9 @@ handle_buildid_r_match (bool internal_req_p,
       inc_metric ("http_responses_total","result","archive fdcache");
 
       MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+      MHD_add_response_header (r, "X-DEBUGINFOD-SIZE", to_string(fs.st_size).c_str());
+      MHD_add_response_header (r, "X-DEBUGINFOD-ARCHIVE", b_source0.c_str());
+      MHD_add_response_header (r, "X-DEBUGINFOD-FILE", b_source1.c_str());
       add_mhd_last_modified (r, fs.st_mtime);
       if (verbose > 1)
         obatched(clog) << "serving fdcache archive " << b_source0 << " file " << b_source1 << endl;
@@ -1685,6 +1691,11 @@ handle_buildid_r_match (bool internal_req_p,
       else
         {
           MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+          std::string file = b_source1.substr(b_source1.find_last_of("/")+1, b_source1.length());
+          MHD_add_response_header (r, "X-DEBUGINFOD-SIZE", to_string(fs.st_size).c_str());
+          MHD_add_response_header (r, "X-DEBUGINFOD-ARCHIVE", b_source0.c_str());
+          MHD_add_response_header (r, "X-DEBUGINFOD-FILE", file.c_str());
+
           add_mhd_last_modified (r, archive_entry_mtime(e));
           if (verbose > 1)
             obatched(clog) << "serving archive " << b_source0 << " file " << b_source1 << endl;
diff --git a/doc/ChangeLog b/doc/ChangeLog
index ada48383..db3a3584 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -18,6 +18,14 @@
 	* Makefile.am: Updated to include debuginfod-client-config.7
 	* man3, man7: Symlinks for source tree man page testing.
 
+2021-08-04  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-find.1: Increasing verbosity describes the downloaded
+	file.
+	* debuginfod.8: Describe X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE, and
+	X-DEBUGINFOD-ARCHIVE.
+
 2021-07-26  Noah Sanci <nsanci@redhat.com>
 
 	PR27982
diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1
index a61673f5..957ec7e7 100644
--- a/doc/debuginfod-find.1
+++ b/doc/debuginfod-find.1
@@ -110,7 +110,8 @@ l l.
 
 .TP
 .B "\-v"
-Increase verbosity, including printing frequent download-progress messages.
+Increase verbosity, including printing frequent download-progress messages
+and printing the http response headers from the server.
 
 
 .SH "SECURITY"
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
index f9a418d1..fde06bb8 100644
--- a/doc/debuginfod.8
+++ b/doc/debuginfod.8
@@ -258,6 +258,15 @@ Unknown buildid / request combinations result in HTTP error codes.
 This file service resemblance is intentional, so that an installation
 can take advantage of standard HTTP management infrastructure.
 
+Upon finding a file in an archive or simply in the database, some
+custom http headers are added to the response. For files in the
+database X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE are added.
+X-DEBUGINFOD-FILE is simply the unescaped filename and
+X-DEBUGINFOD-SIZE is the size of the file. For files found in archives,
+in addition to X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE,
+X-DEBUGINFOD-ARCHIVE is added.  X-DEBUGINFOD-ARCHIVE is the name of the
+archive the file was found in.
+
 There are three requests.  In each case, the buildid is encoded as a
 lowercase hexadecimal string.  For example, for a program \fI/bin/ls\fP,
 look at the ELF note GNU_BUILD_ID:
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 1abe5456..23aeec4a 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -106,11 +106,12 @@
 	run-debuginfod-query-retry.sh,
 	run-debuginfod-regex.sh,
 	run-debuginfod-sizetime.sh,
-	run-debuginfod-tmp-home.sh, and
-	run-debuginfod-writable.sh
-	run-debuginfod-x-forwarded-for.sh
-	* tests/Makefile.am: Added the above new files to the test suite
-	* tests/test-subr.sh: Added some general functions for above tests
+	run-debuginfod-tmp-home.sh,
+	run-debuginfod-writable.sh, and
+	run-debuginfod-x-forwarded-for.sh.
+	All of the above functions now use a 'base' variable to avoid races
+	* Makefile.am: Added the above new files to the test suite
+	* test-subr.sh: Added some general functions for above tests
 
 2021-08-28  Mark Wielaard  <mark@klomp.org>
 
@@ -160,6 +161,13 @@
 	* backtrace.c (callback_verify): Check for pthread_kill as first
 	frame. Change asserts to fprintf plus abort.
 
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* run-debuginfod-response-headers.sh: Add a test to ensure that file descriptions
+	are accurate for files outside or within archives.
+	* Makefile.am: Add run-debuginfod-response-headers.sh to TESTS.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/tests/Makefile.am b/tests/Makefile.am
index c586422e..ef097567 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -231,7 +231,8 @@ TESTS += run-debuginfod-dlopen.sh \
 	 run-debuginfod-federation-sqlite.sh \
 	 run-debuginfod-federation-link.sh \
 	 run-debuginfod-federation-metrics.sh \
-	 run-debuginfod-x-forwarded-for.sh
+	 run-debuginfod-x-forwarded-for.sh \
+         run-debuginfod-response-headers.sh
 endif
 endif
 
diff --git a/tests/run-debuginfod-fd-prefetch-caches.sh b/tests/run-debuginfod-fd-prefetch-caches.sh
index 61fee9e9..bee88c1e 100755
--- a/tests/run-debuginfod-fd-prefetch-caches.sh
+++ b/tests/run-debuginfod-fd-prefetch-caches.sh
@@ -51,9 +51,9 @@ grep 'prefetch fds ' vlog$PORT1 #$PREFETCH_FDS
 grep 'prefetch mbs ' vlog$PORT1 #$PREFETCH_MBS
 # search the vlog to find what metric counts should be and check the correct metrics
 # were incrimented
-wait_ready $PORT1 'fdcache_op_count{op="enqueue"}' $( grep -c 'interned.*front=1' vlog$PORT1 )
-wait_ready $PORT1 'fdcache_op_count{op="evict"}' $( grep -c 'evicted a=.*' vlog$PORT1 )
-wait_ready $PORT1 'fdcache_op_count{op="prefetch_enqueue"}' $( grep -c 'interned.*front=0' vlog$PORT1 )
+wait_ready $PORT1 'fdcache_op_count{op="enqueue"}' $( grep -c 'interned.*front=1' vlog$PORT1 || true)
+wait_ready $PORT1 'fdcache_op_count{op="evict"}' $( grep -c 'evicted a=.*' vlog$PORT1 || true )
+wait_ready $PORT1 'fdcache_op_count{op="prefetch_enqueue"}' $( grep -c 'interned.*front=0' vlog$PORT1 || true )
 wait_ready $PORT1 'fdcache_op_count{op="prefetch_evict"}' $( grep -c 'evicted from prefetch a=.*front=0' vlog$PORT1 || true )
 
 kill $PID1
diff --git a/tests/run-debuginfod-response-headers.sh b/tests/run-debuginfod-response-headers.sh
new file mode 100755
index 00000000..a458ca1b
--- /dev/null
+++ b/tests/run-debuginfod-response-headers.sh
@@ -0,0 +1,118 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2019-2021 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/test-subr.sh  # includes set -e
+
+# for test case debugging, uncomment:
+set -x
+
+DB=${PWD}/.debuginfod_tmp.sqlite
+tempfiles $DB
+export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache
+
+# This variable is essential and ensures no time-race for claiming ports occurs
+# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
+base=9500
+get_ports
+mkdir F R
+
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -F -R -d $DB -p $PORT1 -t0 -g0 -v R F > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+# Server must become ready
+wait_ready $PORT1 'ready' 1
+export DEBUGINFOD_URLS=http://127.0.0.1:$PORT1/   # or without trailing /
+# Check thread comm names
+ps -q $PID1 -e -L -o '%p %c %a' | grep groom
+ps -q $PID1 -e -L -o '%p %c %a' | grep scan
+ps -q $PID1 -e -L -o '%p %c %a' | grep traverse
+
+# We use -t0 and -g0 here to turn off time-based scanning & grooming.
+# For testing purposes, we just sic SIGUSR1 / SIGUSR2 at the process.
+
+########################################################################
+
+# Compile a simple program, strip its debuginfo and save the build-id.
+# Also move the debuginfo into another directory so that elfutils
+# cannot find it without debuginfod.
+echo "int main() { return 0; }" > ${PWD}/prog.c
+tempfiles prog.c
+# Create a subdirectory to confound source path names
+mkdir foobar
+gcc -Wl,--build-id -g -o prog ${PWD}/foobar///./../prog.c
+testrun ${abs_top_builddir}/src/strip -g -f prog.debug ${PWD}/prog
+BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
+          -a prog | grep 'Build ID' | cut -d ' ' -f 7`
+
+mv prog F
+mv prog.debug F
+kill -USR1 $PID1
+# Wait till both files are in the index.
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+
+cp -rvp ${abs_srcdir}/debuginfod-rpms R
+if [ "$zstd" = "false" ]; then  # nuke the zstd fedora 31 ones
+    rm -vrf R/debuginfod-rpms/fedora31
+fi
+
+kill -USR1 $PID1
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+# All rpms need to be in the index, except the dummy permission-000 one
+rpms=$(find R -name \*rpm | grep -v nothing | wc -l)
+wait_ready $PORT1 'scanned_files_total{source=".rpm archive"}' $rpms
+kill -USR1 $PID1  # two hits of SIGUSR1 may be needed to resolve .debug->dwz->srefs
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 3
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+
+########################################################################
+## PR27277
+# Make a simple request to the debuginfod server and check debuginfod-find's vlog to see if
+# the custom HTTP headers are received.
+rm -rf $DEBUGINFOD_CACHE_PATH
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable F/prog > vlog-find$PORT1.1 2>&1
+tempfiles vlog-find$PORT1.1
+grep 'Content-Length: ' vlog-find$PORT1.1
+grep 'Connection: ' vlog-find$PORT1.1
+grep 'Cache-Control: ' vlog-find$PORT1.1
+grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.1
+grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.1
+
+# Check to see if an executable file located in an archive prints the file's description and archive
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable c36708a78618d597dee15d0dc989f093ca5f9120 > vlog-find$PORT1.2 2>&1
+tempfiles vlog-find$PORT1.2
+grep 'Content-Length: ' vlog-find$PORT1.2
+grep 'Connection: ' vlog-find$PORT1.2
+grep 'Cache-Control: ' vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-ARCHIVE: ' vlog-find$PORT1.2
+
+kill $PID1
+wait $PID1
+PID1=0
+exit 0
diff --git a/tests/test-subr.sh b/tests/test-subr.sh
index 41e95e31..95dcbb26 100644
--- a/tests/test-subr.sh
+++ b/tests/test-subr.sh
@@ -43,9 +43,9 @@ remove_files=
 exit_cleanup()
 {
   rm -rf ${PWD}/.client_cache*
-  rm -f $remove_files; 
+  rm -f $remove_files;
   ls -a ${PWD}
-  cd ..; 
+  cd ..;
   rmdir $test_dir
 }
 trap exit_cleanup 0
@@ -228,7 +228,7 @@ cleanup()
 {
   if [ $PID1 -ne 0 ]; then kill $PID1; wait $PID1; fi
   if [ $PID2 -ne 0 ]; then kill $PID2; wait $PID2; fi
-  rm -rf ${PWD}/.client_cache F R D L Z ${PWD}/foobar ${PWD}/mocktree 
+  rm -rf ${PWD}/.client_cache F R D L Z ${PWD}/foobar ${PWD}/mocktree
   exit_cleanup
 }
 trap cleanup 0 1 2 3 5 9 15
@@ -316,12 +316,12 @@ archive_test() {
 
 get_ports() {
   while true; do
-    PORT1=`expr '(' $RANDOM % 1000 ')' + 9000`
+    PORT1=`expr '(' $RANDOM % 100 ')' + $base`
     ss -atn | fgrep ":$PORT1" || break
   done
 # Some tests will use two servers, so assign the second var
   while true; do
-    PORT2=`expr '(' $RANDOM % 1000 ')' + 9000`
+    PORT2=`expr '(' $RANDOM % 100 ')' + $base`
     ss -atn | fgrep ":$PORT2" && $PORT1 -ne $PORT2 || break
   done
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-09-10 18:22                   ` Noah Sanci
@ 2021-09-12 19:08                     ` Mark Wielaard
  2021-09-13 20:07                       ` Noah Sanci
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2021-09-12 19:08 UTC (permalink / raw)
  To: Noah Sanci; +Cc: elfutils-devel

Hi Noah,

On Fri, Sep 10, 2021 at 02:22:00PM -0400, Noah Sanci via Elfutils-devel wrote:
> From 979f19eb4fd7a35ace4ddafed103922559b93120 Mon Sep 17 00:00:00 2001
> From: Noah Sanci <nsanci@redhat.com>
> Date: Wed, 28 Jul 2021 14:46:05 -0400
> Subject: [PATCH 2/2] debuginfod: PR27277 - Describe retrieved files when
>  verbose
> 
> Allow users, with enough verbosity, to print the HTTP response headers
> upon retrieving a file. These files may include several custome http
> response headers such as X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE, and
> X-DEBUGINFOD-ARCHIVE. These headers are added from the daemon, in
> debuginfod.cxx.

> run-debuginfod-fd-prefetch-caches.sh was updated so that it doesn't
> trip and fail as previously greping for a value that should yield zero
> caused an error.

I think this part should be in this patch.

> Previously, target_handle was used to gather CURLE_OK reponses. Under
> some conditions, target_handle was NULL when we wanted it to point to
> the handle. This could cause some failuers. instead msg->easy_handle
> is used, which ensures the correct handle is used.

Thanks for including this explanation. What were the "some conditions"?

> +2021-08-02  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-client.c (struct debuginfod_client): New field
> +	winning_headers.
> +	(struct handle_data): New field response_data.
> +	(header_callback): Store received headers in response_data.
> +	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
> +	Save winning response_data.
> +	(debuginfod_end): free client winning headers.
> +	* debuginfod.cxx (handle_buildid_f_match): remove X-DEBUGINFOD-FILE
> +	path. Add X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE headers.
> +	(handle_buildid_r_match): remove X-DEBUGINFOD-FILE path. Add
> +	X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE
> +	headers, and X-ARCHIVE headers.
> +
>  2021-07-26  Noah Sanci  <nsanci@redhat.com>
>  
>  	PR27982
> diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
> index d41723ce..9c16e7a3 100644
> --- a/debuginfod/debuginfod-client.c
> +++ b/debuginfod/debuginfod-client.c
> @@ -127,6 +127,7 @@ struct debuginfod_client
>       timeout or other info gotten from environment variables, the
>       handle data, etc. So those don't have to be reparsed and
>       recreated on each request.  */
> +  char * winning_headers;
>  };
>  
>  /* The cache_clean_interval_s file within the debuginfod cache specifies
> @@ -183,6 +184,8 @@ struct handle_data
>       to the cache. Used to ensure that a file is not downloaded from
>       multiple servers unnecessarily.  */
>    CURL **target_handle;
> +  /* Response http headers for this client handle, sent from the server */
> +  char *response_data;
>  };

I think it will be convenient to also add a size_t response_data_size
field.  See below.

> +static size_t
> +header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
> +{
> +  if (size != 1)
> +    return 0;
> +  /* Temporary buffer for realloc */
> +  char *temp = NULL;
> +  size_t userlen = 0;
> +  if (*(char**)userdata == NULL)
> +    {
> +      temp = malloc(numitems+1);
> +      if (temp == NULL)
> +        return 0;
> +      memset(temp, '\0', numitems+1);
> +    }
> +  else
> +    {
> +      userlen = strlen(*(char**)userdata);
> +      temp = realloc(*(char**)userdata, userlen + numitems + 1);
> +      if (temp == NULL)
> +       return 0;
> +    }
> +  strncat(temp, buffer, numitems);
> +  *(char**)userdata = temp;
> +  return numitems;
> +}

I found https://curl.se/libcurl/c/CURLOPT_HEADERFUNCTION.html
Maybe you could add that as comment for future readers.

In the above userdata points to data[i].response_data. Which means the
length/size needs to recalculated each time. Also the data is added
with strncat which needs to walk the whole buffer again.

If the stuct handle_data had also a size field then most of the above
recalculations of the size are unnecessary and since we then know the
(old) end of response_data we can simply memcpy the new data to the
end (of course we need to make sure to add a zero terminator, but that
can be done with one byte wrote instead of doing a memset of the whole
buffer).

>  /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
>     with the specified build-id, type (debuginfo, executable or source)
>     and filename. filename may be NULL. If found, return a file
> @@ -936,10 +966,13 @@ debuginfod_query_server (debuginfod_client *c,
>  	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
>  			    100 * 1024L);
>  	}
> +      data[i].response_data = NULL;
>        curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
> +      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
> +      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i].response_data));

So if we wanted to do the suggestion of using a size field then it
should be initialized to zero here and the last call should be
&data[i].

>  #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
>        curl_easy_setopt(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
>  #else
> @@ -961,6 +994,7 @@ debuginfod_query_server (debuginfod_client *c,
>    int committed_to = -1;
>    bool verbose_reported = false;
>    struct timespec start_time, cur_time;
> +  c->winning_headers = NULL;
>    if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
>      {
>        rc = errno;
> @@ -995,8 +1029,18 @@ debuginfod_query_server (debuginfod_client *c,
>  	    if (data[i].handle != target_handle)
>  	      curl_multi_remove_handle(curlm, data[i].handle);
>  	    else
> -	      committed_to = i;
> -	}
> +              {
> +	        committed_to = i;
> +                if (c->winning_headers == NULL)

The indenting here is off because of the mixing of spaces and tabs.

> +2021-08-04  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-find.1: Increasing verbosity describes the downloaded
> +	file.
> +	* debuginfod.8: Describe X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE, and
> +	X-DEBUGINFOD-ARCHIVE.

Thanks for updating the docs. Looks good.

> diff --git a/tests/ChangeLog b/tests/ChangeLog
> index 1abe5456..23aeec4a 100644
> --- a/tests/ChangeLog
> +++ b/tests/ChangeLog
> @@ -106,11 +106,12 @@
>  	run-debuginfod-query-retry.sh,
>  	run-debuginfod-regex.sh,
>  	run-debuginfod-sizetime.sh,
> -	run-debuginfod-tmp-home.sh, and
> -	run-debuginfod-writable.sh
> -	run-debuginfod-x-forwarded-for.sh
> -	* tests/Makefile.am: Added the above new files to the test suite
> -	* tests/test-subr.sh: Added some general functions for above tests
> +	run-debuginfod-tmp-home.sh,
> +	run-debuginfod-writable.sh, and
> +	run-debuginfod-x-forwarded-for.sh.
> +	All of the above functions now use a 'base' variable to avoid races
> +	* Makefile.am: Added the above new files to the test suite
> +	* test-subr.sh: Added some general functions for above tests

These changes seem unrelated to this patch.

> +2021-08-02  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* run-debuginfod-response-headers.sh: Add a test to ensure that file descriptions
> +	are accurate for files outside or within archives.
> +	* Makefile.am: Add run-debuginfod-response-headers.sh to TESTS.

It also needs to be added to EXTRA_DIST.

> diff --git a/tests/run-debuginfod-fd-prefetch-caches.sh b/tests/run-debuginfod-fd-prefetch-caches.sh
> index 61fee9e9..bee88c1e 100755
> --- a/tests/run-debuginfod-fd-prefetch-caches.sh
> +++ b/tests/run-debuginfod-fd-prefetch-caches.sh
> @@ -51,9 +51,9 @@ grep 'prefetch fds ' vlog$PORT1 #$PREFETCH_FDS
>  grep 'prefetch mbs ' vlog$PORT1 #$PREFETCH_MBS
>  # search the vlog to find what metric counts should be and check the correct metrics
>  # were incrimented
> -wait_ready $PORT1 'fdcache_op_count{op="enqueue"}' $( grep -c 'interned.*front=1' vlog$PORT1 )
> -wait_ready $PORT1 'fdcache_op_count{op="evict"}' $( grep -c 'evicted a=.*' vlog$PORT1 )
> -wait_ready $PORT1 'fdcache_op_count{op="prefetch_enqueue"}' $( grep -c 'interned.*front=0' vlog$PORT1 )
> +wait_ready $PORT1 'fdcache_op_count{op="enqueue"}' $( grep -c 'interned.*front=1' vlog$PORT1 || true)
> +wait_ready $PORT1 'fdcache_op_count{op="evict"}' $( grep -c 'evicted a=.*' vlog$PORT1 || true )
> +wait_ready $PORT1 'fdcache_op_count{op="prefetch_enqueue"}' $( grep -c 'interned.*front=0' vlog$PORT1 || true )
>  wait_ready $PORT1 'fdcache_op_count{op="prefetch_evict"}' $( grep -c 'evicted from prefetch a=.*front=0' vlog$PORT1 || true )
>  
>  kill $PID1

This is an unrelated change.

> diff --git a/tests/run-debuginfod-response-headers.sh b/tests/run-debuginfod-response-headers.sh
> new file mode 100755
> index 00000000..a458ca1b
> --- /dev/null
> +++ b/tests/run-debuginfod-response-headers.sh
> @@ -0,0 +1,118 @@
> +#!/usr/bin/env bash
> +#
> +# Copyright (C) 2019-2021 Red Hat, Inc.
> +# This file is part of elfutils.
> +#
> +# This file is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# elfutils is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. $srcdir/test-subr.sh  # includes set -e
> +
> +# for test case debugging, uncomment:
> +set -x
> +
> +DB=${PWD}/.debuginfod_tmp.sqlite
> +tempfiles $DB
> +export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache
> +
> +# This variable is essential and ensures no time-race for claiming ports occurs
> +# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
> +base=9500
> +get_ports
> +mkdir F R
> +
> +env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -F -R -d $DB -p $PORT1 -t0 -g0 -v R F > vlog$PORT1 2>&1 &
> +PID1=$!
> +tempfiles vlog$PORT1
> +errfiles vlog$PORT1
> +# Server must become ready
> +wait_ready $PORT1 'ready' 1
> +export DEBUGINFOD_URLS=http://127.0.0.1:$PORT1/   # or without trailing /
> +# Check thread comm names
> +ps -q $PID1 -e -L -o '%p %c %a' | grep groom
> +ps -q $PID1 -e -L -o '%p %c %a' | grep scan
> +ps -q $PID1 -e -L -o '%p %c %a' | grep traverse
> +
> +# We use -t0 and -g0 here to turn off time-based scanning & grooming.
> +# For testing purposes, we just sic SIGUSR1 / SIGUSR2 at the process.
> +
> +########################################################################
> +
> +# Compile a simple program, strip its debuginfo and save the build-id.
> +# Also move the debuginfo into another directory so that elfutils
> +# cannot find it without debuginfod.
> +echo "int main() { return 0; }" > ${PWD}/prog.c
> +tempfiles prog.c
> +# Create a subdirectory to confound source path names
> +mkdir foobar
> +gcc -Wl,--build-id -g -o prog ${PWD}/foobar///./../prog.c
> +testrun ${abs_top_builddir}/src/strip -g -f prog.debug ${PWD}/prog
> +BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
> +          -a prog | grep 'Build ID' | cut -d ' ' -f 7`
> +
> +mv prog F
> +mv prog.debug F
> +kill -USR1 $PID1
> +# Wait till both files are in the index.
> +wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
> +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> +wait_ready $PORT1 'thread_busy{role="scan"}' 0
> +
> +cp -rvp ${abs_srcdir}/debuginfod-rpms R
> +if [ "$zstd" = "false" ]; then  # nuke the zstd fedora 31 ones
> +    rm -vrf R/debuginfod-rpms/fedora31
> +fi
> +
> +kill -USR1 $PID1
> +# Wait till both files are in the index and scan/index fully finished
> +wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
> +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> +wait_ready $PORT1 'thread_busy{role="scan"}' 0
> +# All rpms need to be in the index, except the dummy permission-000 one
> +rpms=$(find R -name \*rpm | grep -v nothing | wc -l)
> +wait_ready $PORT1 'scanned_files_total{source=".rpm archive"}' $rpms
> +kill -USR1 $PID1  # two hits of SIGUSR1 may be needed to resolve .debug->dwz->srefs
> +# Wait till both files are in the index and scan/index fully finished
> +wait_ready $PORT1 'thread_work_total{role="traverse"}' 3
> +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> +wait_ready $PORT1 'thread_busy{role="scan"}' 0

Is it really necessary to add all this if this is just a test to check
the new headers are sent?

> +########################################################################
> +## PR27277
> +# Make a simple request to the debuginfod server and check debuginfod-find's vlog to see if
> +# the custom HTTP headers are received.
> +rm -rf $DEBUGINFOD_CACHE_PATH
> +env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
> +    -vvv executable F/prog > vlog-find$PORT1.1 2>&1
> +tempfiles vlog-find$PORT1.1
> +grep 'Content-Length: ' vlog-find$PORT1.1
> +grep 'Connection: ' vlog-find$PORT1.1
> +grep 'Cache-Control: ' vlog-find$PORT1.1
> +grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.1
> +grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.1
> +
> +# Check to see if an executable file located in an archive prints the file's description and archive
> +env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
> +    -vvv executable c36708a78618d597dee15d0dc989f093ca5f9120 > vlog-find$PORT1.2 2>&1
> +tempfiles vlog-find$PORT1.2
> +grep 'Content-Length: ' vlog-find$PORT1.2
> +grep 'Connection: ' vlog-find$PORT1.2
> +grep 'Cache-Control: ' vlog-find$PORT1.2
> +grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.2
> +grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.2
> +grep 'X-DEBUGINFOD-ARCHIVE: ' vlog-find$PORT1.2
> +
> +kill $PID1
> +wait $PID1
> +PID1=0
> +exit 0


> diff --git a/tests/test-subr.sh b/tests/test-subr.sh
> index 41e95e31..95dcbb26 100644
> --- a/tests/test-subr.sh
> +++ b/tests/test-subr.sh
> @@ -43,9 +43,9 @@ remove_files=
>  exit_cleanup()
>  {
>    rm -rf ${PWD}/.client_cache*
> -  rm -f $remove_files; 
> +  rm -f $remove_files;
>    ls -a ${PWD}
> -  cd ..; 
> +  cd ..;
>    rmdir $test_dir
>  }
>  trap exit_cleanup 0
> @@ -228,7 +228,7 @@ cleanup()
>  {
>    if [ $PID1 -ne 0 ]; then kill $PID1; wait $PID1; fi
>    if [ $PID2 -ne 0 ]; then kill $PID2; wait $PID2; fi
> -  rm -rf ${PWD}/.client_cache F R D L Z ${PWD}/foobar ${PWD}/mocktree 
> +  rm -rf ${PWD}/.client_cache F R D L Z ${PWD}/foobar ${PWD}/mocktree
>    exit_cleanup
>  }
>  trap cleanup 0 1 2 3 5 9 15
> @@ -316,12 +316,12 @@ archive_test() {
>  
>  get_ports() {
>    while true; do
> -    PORT1=`expr '(' $RANDOM % 1000 ')' + 9000`
> +    PORT1=`expr '(' $RANDOM % 100 ')' + $base`
>      ss -atn | fgrep ":$PORT1" || break
>    done
>  # Some tests will use two servers, so assign the second var
>    while true; do
> -    PORT2=`expr '(' $RANDOM % 1000 ')' + 9000`
> +    PORT2=`expr '(' $RANDOM % 100 ')' + $base`
>      ss -atn | fgrep ":$PORT2" && $PORT1 -ne $PORT2 || break
>    done

These changes also seem unrelated.

Cheers,

Mark


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-09-12 19:08                     ` Mark Wielaard
@ 2021-09-13 20:07                       ` Noah Sanci
  2021-09-16 10:50                         ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Noah Sanci @ 2021-09-13 20:07 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 7746 bytes --]

Hello,

On Sun, Sep 12, 2021 at 3:08 PM Mark Wielaard <mark@klomp.org> wrote:
> > run-debuginfod-fd-prefetch-caches.sh was updated so that it doesn't
> > trip and fail as previously greping for a value that should yield zero
> > caused an error.
>
> I think this part should be in this patch.
Do you mean should or shouldn't? Removed for now.
> > Previously, target_handle was used to gather CURLE_OK reponses. Under
> > some conditions, target_handle was NULL when we wanted it to point to
> > the handle. This could cause some failuers. instead msg->easy_handle
> > is used, which ensures the correct handle is used.
>
> Thanks for including this explanation. What were the "some conditions"?
I removed this. For some time there was a some failures related to
target_handle being null, but msg->easy_handle being assigned.
My tests are passing like this, however

> I found https://curl.se/libcurl/c/CURLOPT_HEADERFUNCTION.html
> Maybe you could add that as comment for future readers.
Good idea, added.

> If the stuct handle_data had also a size field then most of the above
> recalculations of the size are unnecessary and since we then know the
> (old) end of response_data we can simply memcpy the new data to the
> end (of course we need to make sure to add a zero terminator, but that
> can be done with one byte wrote instead of doing a memset of the whole
> buffer).
Changed.

> >  #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
> >        curl_easy_setopt(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
> >  #else
> > @@ -961,6 +994,7 @@ debuginfod_query_server (debuginfod_client *c,
> >    int committed_to = -1;
> >    bool verbose_reported = false;
> >    struct timespec start_time, cur_time;
> > +  c->winning_headers = NULL;
> >    if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
> >      {
> >        rc = errno;
> > @@ -995,8 +1029,18 @@ debuginfod_query_server (debuginfod_client *c,
> >           if (data[i].handle != target_handle)
> >             curl_multi_remove_handle(curlm, data[i].handle);
> >           else
> > -           committed_to = i;
> > -     }
> > +              {
> > +             committed_to = i;
> > +                if (c->winning_headers == NULL)
>
> The indenting here is off because of the mixing of spaces and tabs.
Fixed.


> > diff --git a/tests/ChangeLog b/tests/ChangeLog
> > index 1abe5456..23aeec4a 100644
> > --- a/tests/ChangeLog
> > +++ b/tests/ChangeLog
> > @@ -106,11 +106,12 @@
> >       run-debuginfod-query-retry.sh,
> >       run-debuginfod-regex.sh,
> >       run-debuginfod-sizetime.sh,
> > -     run-debuginfod-tmp-home.sh, and
> > -     run-debuginfod-writable.sh
> > -     run-debuginfod-x-forwarded-for.sh
> > -     * tests/Makefile.am: Added the above new files to the test suite
> > -     * tests/test-subr.sh: Added some general functions for above tests
> > +     run-debuginfod-tmp-home.sh,
> > +     run-debuginfod-writable.sh, and
> > +     run-debuginfod-x-forwarded-for.sh.
> > +     All of the above functions now use a 'base' variable to avoid races
> > +     * Makefile.am: Added the above new files to the test suite
> > +     * test-subr.sh: Added some general functions for above tests
>
> These changes seem unrelated to this patch.
Restored their states.

> > +2021-08-02  Noah Sanci  <nsanci@redhat.com>
> > +
> > +     PR27277
> > +     * run-debuginfod-response-headers.sh: Add a test to ensure that file descriptions
> > +     are accurate for files outside or within archives.
> > +     * Makefile.am: Add run-debuginfod-response-headers.sh to TESTS.
>
> It also needs to be added to EXTRA_DIST.
Added.

> > diff --git a/tests/run-debuginfod-fd-prefetch-caches.sh b/tests/run-debuginfod-fd-prefetch-caches.sh
> > index 61fee9e9..bee88c1e 100755
> > --- a/tests/run-debuginfod-fd-prefetch-caches.sh
> > +++ b/tests/run-debuginfod-fd-prefetch-caches.sh
> > @@ -51,9 +51,9 @@ grep 'prefetch fds ' vlog$PORT1 #$PREFETCH_FDS
> >  grep 'prefetch mbs ' vlog$PORT1 #$PREFETCH_MBS
> >  # search the vlog to find what metric counts should be and check the correct metrics
> >  # were incrimented
> > -wait_ready $PORT1 'fdcache_op_count{op="enqueue"}' $( grep -c 'interned.*front=1' vlog$PORT1 )
> > -wait_ready $PORT1 'fdcache_op_count{op="evict"}' $( grep -c 'evicted a=.*' vlog$PORT1 )
> > -wait_ready $PORT1 'fdcache_op_count{op="prefetch_enqueue"}' $( grep -c 'interned.*front=0' vlog$PORT1 )
> > +wait_ready $PORT1 'fdcache_op_count{op="enqueue"}' $( grep -c 'interned.*front=1' vlog$PORT1 || true)
> > +wait_ready $PORT1 'fdcache_op_count{op="evict"}' $( grep -c 'evicted a=.*' vlog$PORT1 || true )
> > +wait_ready $PORT1 'fdcache_op_count{op="prefetch_enqueue"}' $( grep -c 'interned.*front=0' vlog$PORT1 || true )
> >  wait_ready $PORT1 'fdcache_op_count{op="prefetch_evict"}' $( grep -c 'evicted from prefetch a=.*front=0' vlog$PORT1 || true )
> >
> >  kill $PID1
>
> This is an unrelated change.
Restored.

> > diff --git a/tests/run-debuginfod-response-headers.sh b/tests/run-debuginfod-response-headers.sh
> > new file mode 100755
> > index 00000000..a458ca1b
> > --- /dev/null
> > +++ b/tests/run-debuginfod-response-headers.sh
> > @@ -0,0 +1,118 @@
> > +#!/usr/bin/env bash
> > +#
> > +# Copyright (C) 2019-2021 Red Hat, Inc.
> > +# This file is part of elfutils.
> > +#
> > +# This file is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# elfutils is distributed in the hope that it will be useful, but
> > +# WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
[...]
> > +# Wait till both files are in the index and scan/index fully finished
> > +wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
> > +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> > +wait_ready $PORT1 'thread_busy{role="scan"}' 0
> > +# All rpms need to be in the index, except the dummy permission-000 one
> > +rpms=$(find R -name \*rpm | grep -v nothing | wc -l)
> > +wait_ready $PORT1 'scanned_files_total{source=".rpm archive"}' $rpms
> > +kill -USR1 $PID1  # two hits of SIGUSR1 may be needed to resolve .debug->dwz->srefs
> > +# Wait till both files are in the index and scan/index fully finished
> > +wait_ready $PORT1 'thread_work_total{role="traverse"}' 3
> > +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> > +wait_ready $PORT1 'thread_busy{role="scan"}' 0
>
> Is it really necessary to add all this if this is just a test to check
> the new headers are sent?
A lot of the setup is to check that both the archive and regular file
headers are added. In the attached
path I removed as much as I felt reasonable. Please get back to me on
if it is enough.

> > diff --git a/tests/test-subr.sh b/tests/test-subr.sh
> > index 41e95e31..95dcbb26 100644
> > --- a/tests/test-subr.sh
> > +++ b/tests/test-subr.sh
> > @@ -43,9 +43,9 @@ remove_files=
> >  exit_cleanup()
> >  {
> >    rm -rf ${PWD}/.client_cache*
> > -  rm -f $remove_files;
> > +  rm -f $remove_files;
> >    ls -a ${PWD}
> > -  cd ..;
> > +  cd ..;
> >    rmdir $test_dir
> >  }
[...]
> >    while true; do
> > -    PORT2=`expr '(' $RANDOM % 1000 ')' + 9000`
> > +    PORT2=`expr '(' $RANDOM % 100 ')' + $base`
> >      ss -atn | fgrep ":$PORT2" && $PORT1 -ne $PORT2 || break
> >    done
>
> These changes also seem unrelated.
Fixed.

Thanks,

Noah Sanci

[-- Attachment #2: 0001-debuginfod-PR27277-Describe-retrieved-files-when-ver.patch --]
[-- Type: text/x-patch, Size: 17958 bytes --]

From e657353e59dd5df3e9e763625ba53fda96f9d354 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Tue, 10 Aug 2021 11:21:35 -0400
Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when verbose

Allow users, with enough verbosity, to print the HTTP response headers
upon retrieving a file. These files may include several custome http
response headers such as X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE, and
X-DEBUGINFOD-ARCHIVE. These headers are added from the daemon, in
debuginfod.cxx.
run-debuginfod-fd-prefetch-caches.sh was updated so that it doesn't
trip and fail as previously greping for a value that should yield zero
caused an error.

E.g output:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 4095072
Cache-Control: public
Last-Modified: Thu, 09 Sep 2021 19:06:40 GMT
X-FILE: debuginfod
X-FILE-SIZE: 4095072
Content-Type: application/octet-stream
Date: Fri, 10 Sep 2021 16:38:06 GMT

https://sourceware.org/bugzilla/show_bug.cgi?id=27277

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog                     | 16 ++++
 debuginfod/debuginfod-client.c           | 62 ++++++++++++++-
 debuginfod/debuginfod.cxx                | 11 +++
 doc/ChangeLog                            |  8 ++
 doc/debuginfod-find.1                    |  3 +-
 doc/debuginfod.8                         |  9 +++
 tests/ChangeLog                          |  6 ++
 tests/Makefile.am                        |  4 +-
 tests/run-debuginfod-response-headers.sh | 96 ++++++++++++++++++++++++
 9 files changed, 210 insertions(+), 5 deletions(-)
 create mode 100755 tests/run-debuginfod-response-headers.sh

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 1173f9cd..f1668e29 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -31,6 +31,22 @@
 	* debuginfod.cxx (handler_cb): Fix after_you unique_set key
 	to the entire incoming URL.
 
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-client.c (struct debuginfod_client): New field
+	winning_headers.
+	(struct handle_data): New field response_data, response_data_size.
+	(header_callback): Store received headers in response_data.
+	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
+	Save winning response_data.
+	(debuginfod_end): free client winning headers.
+	* debuginfod.cxx (handle_buildid_f_match): remove X-DEBUGINFOD-FILE
+	path. Add X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE headers.
+	(handle_buildid_r_match): remove X-DEBUGINFOD-FILE path. Add
+	X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE
+	headers, and X-ARCHIVE headers.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index d41723ce..08444c0c 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -127,6 +127,7 @@ struct debuginfod_client
      timeout or other info gotten from environment variables, the
      handle data, etc. So those don't have to be reparsed and
      recreated on each request.  */
+  char * winning_headers;
 };
 
 /* The cache_clean_interval_s file within the debuginfod cache specifies
@@ -183,6 +184,9 @@ struct handle_data
      to the cache. Used to ensure that a file is not downloaded from
      multiple servers unnecessarily.  */
   CURL **target_handle;
+  /* Response http headers for this client handle, sent from the server */
+  char *response_data;
+  size_t response_data_size;
 };
 
 static size_t
@@ -498,6 +502,37 @@ default_progressfn (debuginfod_client *c, long a, long b)
   return 0;
 }
 
+/* This is a callback function that receives http response headers in buffer for use
+ * in this program. https://curl.se/libcurl/c/CURLOPT_HEADERFUNCTION.html is the
+ * online documentation.
+ */
+static size_t
+header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
+{
+  if (size != 1)
+    return 0;
+  /* Temporary buffer for realloc */
+  char *temp = NULL;
+  struct handle_data *data = (struct handle_data *) userdata;
+  if (data->response_data == NULL)
+    {
+      temp = malloc(numitems+1);
+      if (temp == NULL)
+        return 0;
+    }
+  else
+    {
+      temp = realloc(data->response_data, data->response_data_size + numitems + 1);
+      if (temp == NULL)
+        return 0;
+    }
+
+  memcpy(temp + data->response_data_size, buffer, numitems);
+  data->response_data = temp;
+  data->response_data_size += numitems;
+  data->response_data[data->response_data_size] = '\0';
+  return numitems;
+}
 
 /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
    with the specified build-id, type (debuginfo, executable or source)
@@ -936,10 +971,14 @@ debuginfod_query_server (debuginfod_client *c,
 	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
 			    100 * 1024L);
 	}
+      data[i].response_data = NULL;
+      data[i].response_data_size = 0;
       curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i]));
 #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
       curl_easy_setopt(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
 #else
@@ -961,6 +1000,7 @@ debuginfod_query_server (debuginfod_client *c,
   int committed_to = -1;
   bool verbose_reported = false;
   struct timespec start_time, cur_time;
+  c->winning_headers = NULL;
   if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
     {
       rc = errno;
@@ -995,7 +1035,17 @@ debuginfod_query_server (debuginfod_client *c,
 	    if (data[i].handle != target_handle)
 	      curl_multi_remove_handle(curlm, data[i].handle);
 	    else
-	      committed_to = i;
+              {
+	        committed_to = i;
+                if (c->winning_headers == NULL)
+                  {
+                    c->winning_headers = data[committed_to].response_data;
+                    if (vfd >= 0 && c->winning_headers != NULL)
+                      dprintf(vfd, "\n%s", c->winning_headers);
+                    data[committed_to].response_data = NULL;
+                  }
+
+              }
 	}
 
       if (vfd >= 0 && !verbose_reported && committed_to >= 0)
@@ -1161,10 +1211,10 @@ debuginfod_query_server (debuginfod_client *c,
                 {
                   char *effective_url = NULL;
                   long resp_code = 500;
-                  CURLcode ok1 = curl_easy_getinfo (target_handle,
+                  CURLcode ok1 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_EFFECTIVE_URL,
 						    &effective_url);
-                  CURLcode ok2 = curl_easy_getinfo (target_handle,
+                  CURLcode ok2 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_RESPONSE_CODE,
 						    &resp_code);
                   if(ok1 == CURLE_OK && ok2 == CURLE_OK && effective_url)
@@ -1238,7 +1288,10 @@ debuginfod_query_server (debuginfod_client *c,
             {
               curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
               curl_easy_cleanup (data[i].handle);
+              free(data[i].response_data);
             }
+            free(c->winning_headers);
+            c->winning_headers = NULL;
 	    goto query_in_parallel;
 	}
       else
@@ -1281,6 +1334,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   for (int i = 0; i < num_urls; ++i)
@@ -1304,6 +1358,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   unlink (target_cache_tmppath);
@@ -1415,6 +1470,7 @@ debuginfod_end (debuginfod_client *client)
 
   curl_multi_cleanup (client->server_mhandle);
   curl_slist_free_all (client->headers);
+  free (client->winning_headers);
   free (client->url);
   free (client);
 }
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 6cc9f777..280c8b6a 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1075,6 +1075,9 @@ handle_buildid_f_match (bool internal_req_t,
   else
     {
       MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+      std::string file = b_source0.substr(b_source0.find_last_of("/")+1, b_source0.length());
+      MHD_add_response_header (r, "X-DEBUGINFOD-SIZE", to_string(s.st_size).c_str() );
+      MHD_add_response_header (r, "X-DEBUGINFOD-FILE", file.c_str() );
       add_mhd_last_modified (r, s.st_mtime);
       if (verbose > 1)
         obatched(clog) << "serving file " << b_source0 << endl;
@@ -1544,6 +1547,9 @@ handle_buildid_r_match (bool internal_req_p,
       inc_metric ("http_responses_total","result","archive fdcache");
 
       MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+      MHD_add_response_header (r, "X-DEBUGINFOD-SIZE", to_string(fs.st_size).c_str());
+      MHD_add_response_header (r, "X-DEBUGINFOD-ARCHIVE", b_source0.c_str());
+      MHD_add_response_header (r, "X-DEBUGINFOD-FILE", b_source1.c_str());
       add_mhd_last_modified (r, fs.st_mtime);
       if (verbose > 1)
         obatched(clog) << "serving fdcache archive " << b_source0 << " file " << b_source1 << endl;
@@ -1685,6 +1691,11 @@ handle_buildid_r_match (bool internal_req_p,
       else
         {
           MHD_add_response_header (r, "Content-Type", "application/octet-stream");
+          std::string file = b_source1.substr(b_source1.find_last_of("/")+1, b_source1.length());
+          MHD_add_response_header (r, "X-DEBUGINFOD-SIZE", to_string(fs.st_size).c_str());
+          MHD_add_response_header (r, "X-DEBUGINFOD-ARCHIVE", b_source0.c_str());
+          MHD_add_response_header (r, "X-DEBUGINFOD-FILE", file.c_str());
+
           add_mhd_last_modified (r, archive_entry_mtime(e));
           if (verbose > 1)
             obatched(clog) << "serving archive " << b_source0 << " file " << b_source1 << endl;
diff --git a/doc/ChangeLog b/doc/ChangeLog
index ada48383..db3a3584 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -18,6 +18,14 @@
 	* Makefile.am: Updated to include debuginfod-client-config.7
 	* man3, man7: Symlinks for source tree man page testing.
 
+2021-08-04  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-find.1: Increasing verbosity describes the downloaded
+	file.
+	* debuginfod.8: Describe X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE, and
+	X-DEBUGINFOD-ARCHIVE.
+
 2021-07-26  Noah Sanci <nsanci@redhat.com>
 
 	PR27982
diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1
index a61673f5..957ec7e7 100644
--- a/doc/debuginfod-find.1
+++ b/doc/debuginfod-find.1
@@ -110,7 +110,8 @@ l l.
 
 .TP
 .B "\-v"
-Increase verbosity, including printing frequent download-progress messages.
+Increase verbosity, including printing frequent download-progress messages
+and printing the http response headers from the server.
 
 
 .SH "SECURITY"
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
index f9a418d1..fde06bb8 100644
--- a/doc/debuginfod.8
+++ b/doc/debuginfod.8
@@ -258,6 +258,15 @@ Unknown buildid / request combinations result in HTTP error codes.
 This file service resemblance is intentional, so that an installation
 can take advantage of standard HTTP management infrastructure.
 
+Upon finding a file in an archive or simply in the database, some
+custom http headers are added to the response. For files in the
+database X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE are added.
+X-DEBUGINFOD-FILE is simply the unescaped filename and
+X-DEBUGINFOD-SIZE is the size of the file. For files found in archives,
+in addition to X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE,
+X-DEBUGINFOD-ARCHIVE is added.  X-DEBUGINFOD-ARCHIVE is the name of the
+archive the file was found in.
+
 There are three requests.  In each case, the buildid is encoded as a
 lowercase hexadecimal string.  For example, for a program \fI/bin/ls\fP,
 look at the ELF note GNU_BUILD_ID:
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 1154686a..28c08346 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -106,6 +106,12 @@
 
 	* debuginfod-subr.sh (EXTRA_DIST): Add debuginfod-subr.sh.
 
+2021-08-20  Noah Sanci  <nsanci@redhat.com>
+
+	* run-debuginfod-response-headers.sh: Ensures custom http response
+	headers are used and functional
+	* Makefile.am: Added the above new file to TESTS and EXTRA_DIST
+
 2021-08-28  Mark Wielaard  <mark@klomp.org>
 
 	* run-debuginfod-find.sh: Use ":memory:" for the
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 22942733..c98a5837 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -231,7 +231,8 @@ TESTS += run-debuginfod-dlopen.sh \
 	 run-debuginfod-federation-sqlite.sh \
 	 run-debuginfod-federation-link.sh \
 	 run-debuginfod-federation-metrics.sh \
-	 run-debuginfod-x-forwarded-for.sh
+	 run-debuginfod-x-forwarded-for.sh \
+         run-debuginfod-response-headers.sh
 endif
 endif
 
@@ -524,6 +525,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
 	     run-debuginfod-archive-groom.sh \
 	     run-debuginfod-archive-rename.sh \
              run-debuginfod-archive-test.sh \
+	     run-debuginfod-response-headers.sh \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
 	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
diff --git a/tests/run-debuginfod-response-headers.sh b/tests/run-debuginfod-response-headers.sh
new file mode 100755
index 00000000..bdb39b4d
--- /dev/null
+++ b/tests/run-debuginfod-response-headers.sh
@@ -0,0 +1,96 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2019-2021 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/debuginfod-subr.sh  # includes set -e
+
+# for test case debugging, uncomment:
+set -x
+
+DB=${PWD}/.debuginfod_tmp.sqlite
+tempfiles $DB
+export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache
+
+# This variable is essential and ensures no time-race for claiming ports occurs
+# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
+base=9500
+get_ports
+mkdir F R
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -F -R -d $DB -p $PORT1 -t0 -g0 -v R F > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+# Server must become ready
+wait_ready $PORT1 'ready' 1
+export DEBUGINFOD_URLS=http://127.0.0.1:$PORT1/   # or without trailing /
+########################################################################
+
+# Compile a simple program, strip its debuginfo and save the build-id.
+# Also move the debuginfo into another directory so that elfutils
+# cannot find it without debuginfod.
+echo "int main() { return 0; }" > ${PWD}/prog.c
+tempfiles prog.c
+# Create a subdirectory to confound source path names
+mkdir foobar
+gcc -Wl,--build-id -g -o prog ${PWD}/foobar///./../prog.c
+
+mv prog F
+
+cp -rvp ${abs_srcdir}/debuginfod-rpms R
+if [ "$zstd" = "false" ]; then  # nuke the zstd fedora 31 ones
+    rm -vrf R/debuginfod-rpms/fedora31
+fi
+
+kill -USR1 $PID1
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+# All rpms need to be in the index, except the dummy permission-000 one
+rpms=$(find R -name \*rpm | grep -v nothing | wc -l)
+wait_ready $PORT1 'scanned_files_total{source=".rpm archive"}' $rpms
+kill -USR1 $PID1  # two hits of SIGUSR1 may be needed to resolve .debug->dwz->srefs
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
+
+########################################################################
+## PR27277
+# Make a simple request to the debuginfod server and check debuginfod-find's vlog to see if
+# the custom HTTP headers are received.
+rm -rf $DEBUGINFOD_CACHE_PATH
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable F/prog > vlog-find$PORT1.1 2>&1
+tempfiles vlog-find$PORT1.1
+grep 'Content-Length: ' vlog-find$PORT1.1
+grep 'Connection: ' vlog-find$PORT1.1
+grep 'Cache-Control: ' vlog-find$PORT1.1
+grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.1
+grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.1
+
+# Check to see if an executable file located in an archive prints the file's description and archive
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable c36708a78618d597dee15d0dc989f093ca5f9120 > vlog-find$PORT1.2 2>&1
+tempfiles vlog-find$PORT1.2
+grep 'Content-Length: ' vlog-find$PORT1.2
+grep 'Connection: ' vlog-find$PORT1.2
+grep 'Cache-Control: ' vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-ARCHIVE: ' vlog-find$PORT1.2
+
+kill $PID1
+wait $PID1
+PID1=0
+exit 0
-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-09-13 20:07                       ` Noah Sanci
@ 2021-09-16 10:50                         ` Mark Wielaard
  0 siblings, 0 replies; 23+ messages in thread
From: Mark Wielaard @ 2021-09-16 10:50 UTC (permalink / raw)
  To: Noah Sanci; +Cc: elfutils-devel

Hi Noah,

On Mon, 2021-09-13 at 16:07 -0400, Noah Sanci via Elfutils-devel wrote:
> On Sun, Sep 12, 2021 at 3:08 PM Mark Wielaard <mark@klomp.org> wrote:
> > > run-debuginfod-fd-prefetch-caches.sh was updated so that it doesn't
> > > trip and fail as previously greping for a value that should yield zero
> > > caused an error.
> > 
> > I think this part should be in this patch.
> 
> Do you mean should or shouldn't? Removed for now.

That should should have been shouldn't.

> > > +# Wait till both files are in the index and scan/index fully
> > > finished
> > > +wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
> > > +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> > > +wait_ready $PORT1 'thread_busy{role="scan"}' 0
> > > +# All rpms need to be in the index, except the dummy permission-
> > > 000 one
> > > +rpms=$(find R -name \*rpm | grep -v nothing | wc -l)
> > > +wait_ready $PORT1 'scanned_files_total{source=".rpm archive"}'
> > > $rpms
> > > +kill -USR1 $PID1  # two hits of SIGUSR1 may be needed to resolve
> > > .debug->dwz->srefs
> > > +# Wait till both files are in the index and scan/index fully
> > > finished
> > > +wait_ready $PORT1 'thread_work_total{role="traverse"}' 3
> > > +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> > > +wait_ready $PORT1 'thread_busy{role="scan"}' 0
> > 
> > Is it really necessary to add all this if this is just a test to
> > check
> > the new headers are sent?
> 
> A lot of the setup is to check that both the archive and regular file
> headers are added. In the attached
> path I removed as much as I felt reasonable. Please get back to me on
> if it is enough.

Ah, yes, of course I had forgotten about the archive headers.

> Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when verbose
> 
> Allow users, with enough verbosity, to print the HTTP response headers
> upon retrieving a file. These files may include several custome http
> response headers such as X-DEBUGINFOD-FILE, X-DEBUGINFOD-SIZE, and
> X-DEBUGINFOD-ARCHIVE. These headers are added from the daemon, in
> debuginfod.cxx.

> run-debuginfod-fd-prefetch-caches.sh was updated so that it doesn't
> trip and fail as previously greping for a value that should yield zero
> caused an error.

^ This paragraph doesn't document a change in the patch.

> E.g output:
> 
> HTTP/1.1 200 OK
> Connection: Keep-Alive
> Content-Length: 4095072
> Cache-Control: public
> Last-Modified: Thu, 09 Sep 2021 19:06:40 GMT
> X-FILE: debuginfod
> X-FILE-SIZE: 4095072
> Content-Type: application/octet-stream
> Date: Fri, 10 Sep 2021 16:38:06 GMT
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=27277

But except for that one paragraph in the commit message that shouldn't
be there, this looks good. Please remove that paragraph from the commit
message (or replace it with one describing the new test added), rebase
it against the master branch and push it please.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-09  9:25         ` Mark Wielaard
  2021-08-23 15:11           ` Noah Sanci
@ 2021-09-22 20:33           ` Frank Ch. Eigler
  2021-09-29 14:55             ` Mark Wielaard
  1 sibling, 1 reply; 23+ messages in thread
From: Frank Ch. Eigler @ 2021-09-22 20:33 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Noah Sanci, elfutils-devel

Hi -

> > > > That in turn would require THREE new API functions or a
> > > > stateful set_HEAD_mode_and_return_dev_null one and modifying
> > > > the three main lookup functions.

> > > Yes, it definitely is more work.
> > 
> > So, is that your suggestion?  We proceed with that sort of thing?
> 
> Yes, separate the verbose printing of http headers (which I really do
> like)

(This is now done, more or less, but noting that it is not a
machine-consumable API.)

> from providing an interface to query what needs to be done to get
> some file (is it in cache, can it be retieved from a remote server, how
> big is it?) I don't think providing raw http headers is that interface.

Well, we have gone some way into this on PR28284, on various branches
including nsanci/pr28284-webapi.  It's not complete, yet the "raw http
headers" aspect is still there, because what headers are available is
unpredictable.  But now this is made even more wordy by forking the
_find_ functions into a _describe_ triplet and all the other leftover
work elsewhere.

IMHO it's not an improvement over a single function that returns
headers associated with the lookup.  Please let's discuss this again.

- FChE


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-09-22 20:33           ` Frank Ch. Eigler
@ 2021-09-29 14:55             ` Mark Wielaard
  2021-09-29 21:28               ` Frank Ch. Eigler
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2021-09-29 14:55 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

Hi Frank,

On Wed, 2021-09-22 at 16:33 -0400, Frank Ch. Eigler via Elfutils-devel
wrote:
> > from providing an interface to query what needs to be done to get
> > some file (is it in cache, can it be retieved from a remote server,
> > how
> > big is it?) I don't think providing raw http headers is that
> > interface.
> 
> Well, we have gone some way into this on PR28284, on various branches
> including nsanci/pr28284-webapi.  It's not complete, yet the "raw
> http
> headers" aspect is still there, because what headers are available is
> unpredictable.  But now this is made even more wordy by forking the
> _find_ functions into a _describe_ triplet and all the other leftover
> work elsewhere.
> 
> IMHO it's not an improvement over a single function that returns
> headers associated with the lookup.  Please let's discuss this again.

Looking at the nscanci/pr28284 branch I see the following proposed
interface for debuginfod-client:

/* The _describe_ variants of the query functions return a set of HTTP-like
   headers that would result from a new lookup.  No content is collected. 
   If successful, return zero, otherwise return a posix error code.  If
   successful, set *headers a strdup()'d copy of the headers.
   Caller must free() it later. */

int debuginfod_describe_debuginfo (debuginfod_client *client,
                                   const unsigned char *build_id,
                                   int build_id_len,
                                   char **headers);

int debuginfod_describe_executable (debuginfod_client *client,
                                    const unsigned char *build_id,
                                    int build_id_len,
                                    char **headers);

int debuginfod_describe_source (debuginfod_client *client,
                                const unsigned char *build_id,
                                int build_id_len,
                                const char *filename,
                                char **headers);

And if I understand your comments above correctly, you would rather see
a function like const char* debuginfod_get_url (debuginfod_client
*client); but for any headers. Would such a headers call be only be
accessible during debuginfod_progressfn_t callback or would it retain
the headers so they can be found after a debuginfod_find_* call (as
long as debuginfod_end hasn't been called on the client handle)?

Or do I misunderstand the various proposals?

Personally I don't really like an interface that relies on the program
having to parse somewhat arbitrary strings. I can see how it is useful
in verbose mode for the user to see the headers to know what is going
on (which we have now). But if the program needs to make any policy
decisions then what do we guarantee about the provided header strings?

What is the information the program really needs? I think that is a)
whether the requested file is already in cache, b) if not, whether it
can be retrieved from a server, and c) if so how big it is and/or how
many bytes need to be transferred from the server.

If the above seems reasonable information to provide to the program
then we can design a debuginfo-client interface based on that. Which
might or might not include the full headers to the program. But I might
be missing some important data or misinterpret the goal of the new
interface/information provided to the program.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-09-29 14:55             ` Mark Wielaard
@ 2021-09-29 21:28               ` Frank Ch. Eigler
  2021-10-05 14:28                 ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Frank Ch. Eigler @ 2021-09-29 21:28 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: elfutils-devel

Hi -

> [...]
> And if I understand your comments above correctly, you would rather see
> a function like const char* debuginfod_get_url (debuginfod_client
> *client); but for any headers. 

Correct.

> Would such a headers call be only be accessible during
> debuginfod_progressfn_t callback or would it retain the headers so
> they can be found after a debuginfod_find_* call (as long as
> debuginfod_end hasn't been called on the client handle)?

Like _url(), it'd would persist the value until the client is deleted.


> Personally I don't really like an interface that relies on the program
> having to parse somewhat arbitrary strings. I can see how it is useful
> in verbose mode for the user to see the headers to know what is going
> on (which we have now). 

The problem with what we have now, with $DEBUGINFOD_VERBOSE, is that
the amount of output is huge.  It is a debugging level trace.  It's
not consumable by non-expert users OR by software.

> But if the program needs to make any policy decisions then what do
> we guarantee about the provided header strings?

The simplest thing to do is simply to save whatever we fetched and
present it verbatim.  Parsing one-line "key: value" HTTP headers is
not that difficult.  We could -add- a few headers in order to provide
guarantees, but that's not necessary.  (We should catch up with
documenting the headers that debuginfod is known to send.)

But "programs making policy decisions" is not the only use case: what
about where a user would like to get a glance at that metadata, and
not all the other $DEBUGINFOD_VERBOSE firehose?  They could ALMOST do
it themselves via "% curl -I -i $URL", except $URL is synthesized and
competitively tie-broken between $DEBUGINFOD_URLS.


- FChE


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-09-29 21:28               ` Frank Ch. Eigler
@ 2021-10-05 14:28                 ` Mark Wielaard
  2022-07-14 15:32                   ` Noah Sanci
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2021-10-05 14:28 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: elfutils-devel

Hi Frank,

On Wed, 2021-09-29 at 17:28 -0400, Frank Ch. Eigler wrote:
> The problem with what we have now, with $DEBUGINFOD_VERBOSE, is that
> the amount of output is huge.  It is a debugging level trace.  It's
> not consumable by non-expert users OR by software.

OK, but that seem two separate issues. The first is making generic
verbose/debug output usable by non-expert users. The second is
providing a software interface to probe whether an artifact can be
retrieved from a remote server (and whether it already is available in
cache and if now how many resources it might take to get it).

Or do you imagine some kind of "diagnostics" interface that programs
use to provide non-experts users with the http headers to analyze?

> > But if the program needs to make any policy decisions then what do
> > we guarantee about the provided header strings?
> 
> The simplest thing to do is simply to save whatever we fetched and
> present it verbatim.  Parsing one-line "key: value" HTTP headers is
> not that difficult.  We could -add- a few headers in order to provide
> guarantees, but that's not necessary.  (We should catch up with
> documenting the headers that debuginfod is known to send.)

OK, so I would assume that is for the non-expert user case. When/where
would these headers be shown?

> But "programs making policy decisions" is not the only use case: what
> about where a user would like to get a glance at that metadata, and
> not all the other $DEBUGINFOD_VERBOSE firehose?  They could ALMOST do
> it themselves via "% curl -I -i $URL", except $URL is synthesized and
> competitively tie-broken between $DEBUGINFOD_URLS.

So, for the above you would want some debuginfod-find --info flag that
spits out just the http headers but doesn't retrieve the actual file?

What I had in mind for a software interface would not include the
actual headers, but just used them to provide the program with
information. e.g.

/* Get info about an debuginfod artifact. Used to know whether
   the target is already in local cache or whether it can be retrieved
   from one if the urls contained in $DEBUGINFOD_URLS.

   If build_id_len == 0, the build_id is supplied as a lowercase
   hexadecimal string; otherwise it is a binary blob of given length.

   If the requested resource is in cache, return a file descriptor
   which an be used as is. If the requested resource can be found
   through one of the DEBUGINFOD_URLS then -1 is returned and
   file_size and transfer_size are set to the number of bytes of
   the target file and the number if bytes that need to be transferred
   from the server (file_size is the uncompressed size, transfer_size
   might be the compressed size). Otherwise return -2 to indicate the
   requested artifact cannot be found.

   If the file wasn't in cache, but can be retrieved from a remote
   server, then debuginfod_get_url () will return where the target
   can be found. A successive debuginfod_find_* call will retieve
   the resource (unless a network error occurs).  */

int debuginfod_info_debuginfo (debuginfod_client *client,
                               const unsigned char *build_id,
                               int build_id_len,
                               size_t *size, size_t *transfer_size);

If we want to combine both use cases we could add an optional char
**headers argument?

Cheers,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-10-05 14:28                 ` Mark Wielaard
@ 2022-07-14 15:32                   ` Noah Sanci
  2022-08-04 13:12                     ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Noah Sanci @ 2022-07-14 15:32 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 235 bytes --]

Hello,

Please find the patch for pr28284 attached

    Debuginfod and debuginfod clients are now equipped to send
    and receive http headers prefixed with X-DEBUGINFOD and
    print them in verbose mode for more context

Noah Sanci

[-- Attachment #2: 0001-PR28284-Debuginfod-header-functionality-implemented.patch --]
[-- Type: text/x-patch, Size: 9606 bytes --]

From 2d4902ca53b80b5cd5689a1ba77e4465c33fea64 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Wed, 15 Jun 2022 10:07:29 -0400
Subject: [PATCH] PR28284 - Debuginfod header functionality implemented

---
 debuginfod/debuginfod-client.c           | 14 ++++++-
 debuginfod/debuginfod-find.c             |  3 ++
 debuginfod/debuginfod.cxx                | 18 +++++++++
 debuginfod/debuginfod.h.in               |  4 ++
 debuginfod/libdebuginfod.map             |  3 ++
 doc/debuginfod_find_debuginfo.3          | 13 +++++++
 doc/debuginfod_get_headers.3             |  2 +
 tests/run-debuginfod-response-headers.sh | 48 +++++++++++++++++++++---
 8 files changed, 97 insertions(+), 8 deletions(-)
 create mode 100644 doc/debuginfod_get_headers.3

diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index b7b65aff..a3565f57 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -501,6 +501,12 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
 {
   if (size != 1)
     return 0;
+  // X-DEBUGINFOD is 11 characters long.
+  // Some basic checks to ensure the headers received are of the expected format
+  if ( strncmp(buffer, "X-DEBUGINFOD", 11) || buffer[numitems-1] != '\n'
+       || (buffer == strstr(buffer, ":")) ){
+    return numitems;
+  }
   /* Temporary buffer for realloc */
   char *temp = NULL;
   struct handle_data *data = (struct handle_data *) userdata;
@@ -1111,8 +1117,6 @@ debuginfod_query_server (debuginfod_client *c,
                 if (c->winning_headers == NULL)
                   {
                     c->winning_headers = data[committed_to].response_data;
-                    if (vfd >= 0 && c->winning_headers != NULL)
-                      dprintf(vfd, "\n%s", c->winning_headers);
                     data[committed_to].response_data = NULL;
                     data[committed_to].response_data_size = 0;
                   }
@@ -1542,6 +1546,12 @@ debuginfod_get_url(debuginfod_client *client)
   return client->url;
 }
 
+const char *
+debuginfod_get_headers(debuginfod_client *client)
+{
+  return client->winning_headers;
+}
+
 void
 debuginfod_end (debuginfod_client *client)
 {
diff --git a/debuginfod/debuginfod-find.c b/debuginfod/debuginfod-find.c
index f60b5463..fb1f294c 100644
--- a/debuginfod/debuginfod-find.c
+++ b/debuginfod/debuginfod-find.c
@@ -215,6 +215,9 @@ main(int argc, char** argv)
 
   if (verbose)
     {
+      const char* headers = debuginfod_get_headers(client);
+      if (headers)
+        fprintf(stderr, "Headers:\n%s", headers);
       const char* url = debuginfod_get_url (client);
       if (url != NULL)
         fprintf(stderr, "Downloaded from %s\n", url);
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 51f4302b..d64c5965 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2085,6 +2085,24 @@ and will not query the upstream servers");
             {
               add_mhd_response_header (r, "Content-Type",
 				       "application/octet-stream");
+              const char * hdrs = debuginfod_get_headers(client);
+              string header_dup;
+              if (hdrs)
+                header_dup = string(hdrs);
+              size_t pos = 0;
+              // Clean winning headers to add all X-DEBUGINFOD lines to the package we'll send
+              while( (pos = header_dup.find("X-DEBUGINFOD")) != string::npos)
+                {
+                  // Focus on where X-DEBUGINFOD- begins
+                  header_dup = header_dup.substr(pos);
+                  size_t newline =  header_dup.find('\n');
+                  if (newline == string::npos)
+                    break;
+                  add_mhd_response_header(r, header_dup.substr(0,header_dup.find(':')).c_str(),
+                                             header_dup.substr(header_dup.find(':')).c_str());
+                  header_dup = header_dup.substr(newline);
+                }
+
               add_mhd_last_modified (r, s.st_mtime);
               if (verbose > 1)
                 obatched(clog) << "serving file from upstream debuginfod/cache" << endl;
diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
index c358df4d..6ae8b91c 100644
--- a/debuginfod/debuginfod.h.in
+++ b/debuginfod/debuginfod.h.in
@@ -93,6 +93,10 @@ void* debuginfod_get_user_data (debuginfod_client *client);
 /* Get the current or last active URL, if known.  */
 const char* debuginfod_get_url (debuginfod_client *client);
 
+/* Returns all headers sent to this client which were prefixed
+ * with X-DEBUGINFOD */
+const char* debuginfod_get_headers(debuginfod_client *client);
+
 /* Add an outgoing HTTP request  "Header: Value".  Copies string.  */
 int debuginfod_add_http_header (debuginfod_client *client, const char* header);
 
diff --git a/debuginfod/libdebuginfod.map b/debuginfod/libdebuginfod.map
index 7d2f5882..f95b5b9a 100644
--- a/debuginfod/libdebuginfod.map
+++ b/debuginfod/libdebuginfod.map
@@ -18,3 +18,6 @@ ELFUTILS_0.179 {
 ELFUTILS_0.183 {
   debuginfod_set_verbose_fd;
 } ELFUTILS_0.179;
+ELFUTILS_0.189 {
+  debuginfod_get_headers;
+} ELFUTILS_0.183;
diff --git a/doc/debuginfod_find_debuginfo.3 b/doc/debuginfod_find_debuginfo.3
index 30cef3c1..984fda12 100644
--- a/doc/debuginfod_find_debuginfo.3
+++ b/doc/debuginfod_find_debuginfo.3
@@ -58,6 +58,7 @@ OPTIONAL FUNCTIONS
 .BI "const char* debuginfod_get_url(debuginfod_client *" client ");"
 .BI "int debuginfod_add_http_header(debuginfod_client *" client ","
 .BI "                               const char* " header ");"
+.BI "const char* debuginfod_get_headers(debuginfod_client *" client ");"
 
 .SH DESCRIPTION
 
@@ -198,6 +199,18 @@ By default, the library adds a descriptive \fIUser-Agent:\fP
 header to outgoing requests.  If the client application adds
 a header with the same name, this default is suppressed.
 
+.BR \%debuginfod_get_headers ()
+may be called with a debuginfod client. This function will return the
+http response headers prefixed with
+.BR X-DEBUGINFOD
+received from the first handle to get a response from a debuginfod server.
+Note that all other http headers aren't stored in the libcurl header
+callback function since they aren't of as much interest. The caller should
+copy the returned string if it is needed beyond the release of the client object.
+The returned string may be NULL if no headers are prefixed with
+.BR X-DEBUGINFOD
+\.
+
 .SH "MACROS"
 
 .SS "DEBUGINFOD_SONAME"
diff --git a/doc/debuginfod_get_headers.3 b/doc/debuginfod_get_headers.3
new file mode 100644
index 00000000..1db55982
--- /dev/null
+++ b/doc/debuginfod_get_headers.3
@@ -0,0 +1,2 @@
+.so man3/debuginfod_find_debuginfo.3
+
diff --git a/tests/run-debuginfod-response-headers.sh b/tests/run-debuginfod-response-headers.sh
index 62c43887..e5698cc9 100755
--- a/tests/run-debuginfod-response-headers.sh
+++ b/tests/run-debuginfod-response-headers.sh
@@ -73,17 +73,21 @@ rm -rf $DEBUGINFOD_CACHE_PATH
 env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
     -vvv executable F/prog > vlog-find$PORT1.1 2>&1
 tempfiles vlog-find$PORT1.1
-grep 'Content-Length: ' vlog-find$PORT1.1
-grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.1
-grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.1
+errfiles vlog-find$PORT1.1
+cat vlog-find$PORT1.1
+grep 'Headers:' vlog-find$PORT1.1
+grep 'X-DEBUGINFOD-FILE: prog' vlog-find$PORT1.1
+grep 'X-DEBUGINFOD-SIZE: '     vlog-find$PORT1.1
 
 # Check to see if an executable file located in an archive prints the file's description and archive
 env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
     -vvv executable c36708a78618d597dee15d0dc989f093ca5f9120 > vlog-find$PORT1.2 2>&1
 tempfiles vlog-find$PORT1.2
-grep 'Content-Length: ' vlog-find$PORT1.2
-grep 'X-DEBUGINFOD-FILE: ' vlog-find$PORT1.2
-grep 'X-DEBUGINFOD-SIZE: ' vlog-find$PORT1.2
+errfiles vlog-find$PORT1.2
+cat vlog-find$PORT1.2
+grep 'Headers:'               vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-FILE: '    vlog-find$PORT1.2
+grep 'X-DEBUGINFOD-SIZE: '    vlog-find$PORT1.2
 grep 'X-DEBUGINFOD-ARCHIVE: ' vlog-find$PORT1.2
 
 # Check that X-DEBUGINFOD-SIZE matches the size of each file
@@ -94,6 +98,38 @@ do
     test $st_size -eq $x_debuginfod_size
 done
 
+rm -rf $DEBUGINFOD_CACHE_PATH
+BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
+          -a F/prog | grep 'Build ID' | cut -d ' ' -f 7`
+netcat_dir="buildid/$BUILDID/"
+mkdir -p ${PWD}/$netcat_dir
+cp F/prog ${PWD}/$netcat_dir/executable
+tempfiles F/prog
+
+# Netcat dies after answering the request
+nc -l -p $PORT2 -c 'echo -e "HTTP/1.1 200 OK\nX-DEBUGINFOD-SIZE: ba:d_size\nX-DEBUGINFOD-\rFILE:\=\+ \r213\n\n $(date)"' & < ${PWD}/$netcat_dir"executable" &
+# Wait until the netcat port is in use. Otherwise debuginfod-find can query
+# before netcat is ready.
+SECONDS=0
+nc_start=$SECONDS
+while [ ! $(lsof -i -P -n | grep LISTEN | grep "nc.*$PORT2")  ]
+do
+  # If it takes longer than 5 seconds for netcat to start up, then fail
+  duration=$(( SECONDS - nc_start ))
+  if [ $SECONDS -gt 5 ]
+  then
+    err
+  fi
+done
+
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT2 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable $BUILDID > vlog-find$PORT2 2>&1
+errfiles vlog-find$PORT2
+tempfiles vlog-find$PORT2
+cat vlog-find$PORT2 | grep "X-DEBUGINFOD-"
+rm -f "$netcat_dir"executable
+rmdir -p $netcat_dir
+
 kill $PID1
 wait $PID1
 PID1=0
-- 
2.36.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2022-07-14 15:32                   ` Noah Sanci
@ 2022-08-04 13:12                     ` Mark Wielaard
       [not found]                       ` <CAJXA7qg09YkxK-NRQ31Hem0+54Us=jYC5+1siPSbHangx=SCow@mail.gmail.com>
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Wielaard @ 2022-08-04 13:12 UTC (permalink / raw)
  To: Noah Sanci, elfutils-devel

Hi Noah,

On Thu, 2022-07-14 at 11:32 -0400, Noah Sanci via Elfutils-devel wrote:
> Please find the patch for pr28284 attached
> 
>     Debuginfod and debuginfod clients are now equipped to send
>     and receive http headers prefixed with X-DEBUGINFOD and
>     print them in verbose mode for more context

OK, so this patch does 3 things:

- Introduce a debuginfod_get_headers public api for libdebuginfod
  that provides any X-DEBUGINFOD prefixed headers for the last
  retrieved file if the http(s) protocol was used (but not for the file
  protocol nor when the file was found in the cache).

- It changes debuginfod-find to use this new api to show those headers
  in verbose mode.

- Changes the debuginfod server to use the api to add any
  X-DEBUGINFOD headers to the response if the request was
  delegated.

So the last two are obvious once we have the first. I am slightly
worried about the "inconsistency" in when these headers are available.
Should we maybe cache them? Or do they only really make sense during
download/in the progress callback?

There currently are 3 special headers:

- X-DEBUGINFOD-FILE the (base) file name of the original file
- X-DEBUGINFOD-SIZE the on-disk size of the file
- X-DEBUGINFOD-ARCHIVE the archive name the file came from

All three are helpful for debug/verbose output. The size is independent
of the Content-Encoding and so is useful for progress
callback/filtering. The archive could be helpful if we had a mechanism
for getting all content such an archive.

The description of the new debuginfod_get_headers api is:

+.BR \%debuginfod_get_headers ()
+may be called with a debuginfod client. This function will return the
+http response headers prefixed with
+.BR X-DEBUGINFOD
+received from the first handle to get a response from a debuginfod server.
+Note that all other http headers aren't stored in the libcurl header
+callback function since they aren't of as much interest. The caller should
+copy the returned string if it is needed beyond the release of the client object.
+The returned string may be NULL if no headers are prefixed with
+.BR X-DEBUGINFOD
+\.

Looking at the headers the headers are cleaned up when a new request is
made using the same client object (not just when the client object is
released). 

Also I am not getting any headers because they seem to be turned into
lower-case by curl. e.g. from debuginfod.fedoraproject.org:

< date: Thu, 04 Aug 2022 12:56:20 GMT
< content-type: application/octet-stream
< x-debuginfod-size: 4896240
< x-debuginfod-archive: /mnt/fedora_koji_prod/koji/packages/bash/5.1.16/2.fc36/x86_64/bash-debuginfo-5.1.16-2.fc36.x86_64.rpm
< x-debuginfod-file: /usr/lib/debug/usr/bin/bash-5.1.16-2.fc36.x86_64.debug
< last-modified: Wed, 19 Jan 2022 22:15:52 GMT

And according to the RFC "Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive." So at least in the code we always need to compare case-insensitive.

+  // X-DEBUGINFOD is 11 characters long.
+  // Some basic checks to ensure the headers received are of the expected format
+  if ( strncmp(buffer, "X-DEBUGINFOD", 11) || buffer[numitems-1] != '\n'
+       || (buffer == strstr(buffer, ":")) ){
+    return numitems;
+  }

and in debuginfod.cxx

+              // Clean winning headers to add all X-DEBUGINFOD lines to the pac
kage we'll send
+              while( (pos = header_dup.find("X-DEBUGINFOD")) != string::npos)
+                {
+                  // Focus on where X-DEBUGINFOD- begins

I do wonder if we should simply say that the debuginfod_get_headers is only valid during a progress function callback. But maybe that is too restrictive. If not there is a slight difference between calling debuginfod_get_headers during a progress function and afterwards if there was an error during download (then we seem to clean them). Which probably good, but should be documented.

Also I probably won't mind providing all header, at least during the progress function callback. But you might be right that they aren't as informative.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
       [not found]                       ` <CAJXA7qg09YkxK-NRQ31Hem0+54Us=jYC5+1siPSbHangx=SCow@mail.gmail.com>
@ 2022-08-08 14:35                         ` Mark Wielaard
  0 siblings, 0 replies; 23+ messages in thread
From: Mark Wielaard @ 2022-08-08 14:35 UTC (permalink / raw)
  To: Noah Sanci; +Cc: elfutils-devel

Hi Noah,

On Fri, Aug 05, 2022 at 03:21:48PM -0400, Noah Sanci wrote:
> On Thu, Aug 4, 2022 at 9:12 AM Mark Wielaard <mark@klomp.org> wrote:
> >
> > Hi Noah,
> >
> > On Thu, 2022-07-14 at 11:32 -0400, Noah Sanci via Elfutils-devel wrote:
> > > Please find the patch for pr28284 attached
> > >
> > >     Debuginfod and debuginfod clients are now equipped to send
> > >     and receive http headers prefixed with X-DEBUGINFOD and
> > >     print them in verbose mode for more context
> >
> > OK, so this patch does 3 things:
> >
> > - Introduce a debuginfod_get_headers public api for libdebuginfod
> >   that provides any X-DEBUGINFOD prefixed headers for the last
> >   retrieved file if the http(s) protocol was used (but not for the file
> >   protocol nor when the file was found in the cache).
> >
> > - It changes debuginfod-find to use this new api to show those headers
> >   in verbose mode.
> >
> > - Changes the debuginfod server to use the api to add any
> >   X-DEBUGINFOD headers to the response if the request was
> >   delegated.
> >
> > So the last two are obvious once we have the first. I am slightly
> > worried about the "inconsistency" in when these headers are available.
> > Should we maybe cache them? Or do they only really make sense during
> > download/in the progress callback?
>
> Not quite sure what you mean here, they're available from the time a server
> is chosen and the file downloaded until when debuginfod_end is called. Are
> you referring to caching outside the lifetime of the debuginfod_client
> or within its lifetime?

I am trying to figure out what the use case is. If the use case is
simply providing information during the download then having/promising
it is just available during the progress callback function might make
things easier.

There are two quirks making the winning headers available after the
download has concluded. If there was an error during download the
headers are erased. Likewise when the handle is reused.

This might be fine, but then has to be documented clearly.

And if there isn't really a usecase then to have the headers available
after the download has succeeded then it can also simply be documented
as only being valid during/from a progress function callback.

> > Looking at the headers the headers are cleaned up when a new request is
> > made using the same client object (not just when the client object is
> > released).
>
> If I understand what's being said, the implication is that there is
> potential for a single client to make multiple requests which would
> unintentionally overwrite previous winning_headers. If the concern
> is overwriting, winning_headers can be expanded into an array of all
> winning headers and a function created to search for headers based
> on header data such as X-DEBUGINFOD-FILE.

It isn't bad when that happens as long it is clear to the user. It
also means that it might be simpler to just guarantee access to the
headers during the progress function callback and not promise anything
about what the call returns after the (possible) download.

> > Also I am not getting any headers because they seem to be turned into
> > lower-case by curl. e.g. from debuginfod.fedoraproject.org:
> >
> > < date: Thu, 04 Aug 2022 12:56:20 GMT
> > < content-type: application/octet-stream
> > < x-debuginfod-size: 4896240
> > < x-debuginfod-archive: /mnt/fedora_koji_prod/koji/packages/bash/5.1.16/2.fc36/x86_64/bash-debuginfo-5.1.16-2.fc36.x86_64.rpm
> > < x-debuginfod-file: /usr/lib/debug/usr/bin/bash-5.1.16-2.fc36.x86_64.debug
> > < last-modified: Wed, 19 Jan 2022 22:15:52 GMT
> >
> > And according to the RFC "Each header field consists of a name
> > followed by a colon (":") and the field value. Field names are
> > case-insensitive." So at least in the code we always need to
> > compare case-insensitive.
>
> Both solutions implemented in the below patch will fix the issue
> assuming that the case is consistent (i.e the prefix isn't
> X-dEbUgInfod),

That is unlikely, but isn't really an invalid header.  I can imagine
someone using X-debuginfod or X-Debuginfod for example. So why not
simply use strncasecmp.

> however,
> for the debuginfod.cxx header cleaning the case of the original
> headers are forced to upper case and not preserved. I hope that is not
> an issue.

That shouldn't be a problem.

> @@ -1111,8 +1118,6 @@ debuginfod_query_server (debuginfod_client *c,
>                  if (c->winning_headers == NULL)
>                    {
>                      c->winning_headers = data[committed_to].response_data;
> -                    if (vfd >= 0 && c->winning_headers != NULL)
> -                      dprintf(vfd, "\n%s", c->winning_headers);
>                      data[committed_to].response_data = NULL;
>                      data[committed_to].response_data_size = 0;
>                    }

Note that previously this provided the full http headers in verbose
mode before/during the actualy download. While now debuginfod-find
--verbose will only provide the X-Debuginfod- headers after the
download.

I can see how only the X-debuginfod headers are useful after the call,
but during the progress callback function maybe the full headers are
useful to determine whether or not to abort the download early?

Cheers,

Mark


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Bug debuginfod/27277] Describe retrieved files when verbose
  2021-08-25 18:08 Noah Sanci
@ 2021-09-08 15:01 ` Mark Wielaard
  0 siblings, 0 replies; 23+ messages in thread
From: Mark Wielaard @ 2021-09-08 15:01 UTC (permalink / raw)
  To: Noah Sanci, elfutils-devel

Hi Noah,

On Wed, 2021-08-25 at 14:08 -0400, Noah Sanci via Elfutils-devel wrote:
> There appear to exist use cases that intend to simply check for the
> existence of content in a debuginfod server, without actually
> downloading it.  In HTTP land, the HEAD operation is the natural
> expression of this.
> Instead of implementing a HEAD/describe option, allow users, with
> enough
> verbosity, to print the HTTP response headers upon retrieving a file.
> E.g output:
> 
> HTTP/1.1 200 OK
> Connection: Keep-Alive
> Content-Length: 2428240
> Cache-Control: public
> Last-Modified: Sat, 15 May 2021 20:49:51 GMT
> Content-Type: application/octet-stream
> Date: Tue, 03 Aug 2021 18:50:36 GMT
> 
> Note: the new test is mostly compatible with the nsanci/test-fix
> commit. When nsanci/test-fix merges with master I can resubmit the
> patch with the test properly rebased and adjusted for maximum
> efficiency.

Please do resubmit the test now that we have separate debuginfod tests.

> The HEAD functionality will (likely) be put into a new PR for
> existing checking.

Thanks. That is https://sourceware.org/bugzilla/show_bug.cgi?id=28284

> From 6351258d707337b69563d4be8effbb30fc42f784 Mon Sep 17 00:00:00
> 2001
> From: Noah Sanci <nsanci@redhat.com>
> Date: Wed, 28 Jul 2021 14:46:05 -0400
> Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when
> verbose
> 
> There appear to exist use cases that intend to simply check for the
> existence of content in a debuginfod server, without actually
> downloading it.  In HTTP land, the HEAD operation is the natural
> expression of this.
> Instead of implementing a HEAD/describe option,

Maybe drop this first part, it describes what is not implemented in
this patch.

>  allow users, with enough
> verbosity, to print the HTTP response headers upon retrieving a file.
> E.g output:
> 
> HTTP/1.1 200 OK
> Connection: Keep-Alive
> Content-Length: 2428240
> Cache-Control: public
> Last-Modified: Sat, 15 May 2021 20:49:51 GMT
> Content-Type: application/octet-stream
> Date: Tue, 03 Aug 2021 18:50:36 GMT
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=27277
> 
> Signed-off-by: Noah Sanci <nsanci@redhat.com>
> [...]
> diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
> index 530f7dc7..cb9e50c7 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -4,6 +4,17 @@
>  	* debuginfod.cxx (handler_cb): Fix after_you unique_set key
>  	to the entire incoming URL.
>  
> +2021-08-02  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-client.c (struct debuginfod_client): New field 
> +	winning_headers.

It seems technically we don't need winning_headers at this point. But
it will be useful when you implement the other PR.

> +	(struct handle_data): New field response_data.
> +	(header_callback): Store received headers in response_data.
> +	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
> +	Save winning response_data.
> +	(debuginfod_end): free client winning headers.
> +
> [...]
> +static size_t
> +header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
> +{
> +  if (size != 1)
> +    return 0;
> +  /* Temporary buffer for realloc */
> +  char *temp = NULL;
> +  size_t userlen = 0;
> +  if (*(char**)userdata == NULL)
> +    {
> +      temp = malloc(numitems+1);
> +      if (temp == NULL)
> +        return 0;
> +      memset(temp, '\0', numitems+1);
> +    }
> +  else
> +    {
> +      userlen = strlen(*(char**)userdata);
> +      temp = realloc(*(char**)userdata, userlen + numitems + 1);
> +      if (temp == NULL)
> +       return 0;
> +    }
> +  strncat(temp, buffer, numitems);
> +  *(char**)userdata = temp;
> +  return numitems;
> +}

I am slightly confused about this function.
Is the idea that this is given (part of) the headers and allocates
memory for userdata to store the parts it wants/needs?

If so, do we need to filter the headers here for those that interest
us, or is that simply all headers?

Do we need to protect against using too much memory?
Currently we keep reallocing till we run out of memory, maybe set some
predetermined limit? (64K of headers should be enough for anybody...)

Why do you clear the temp memory the first time, but don't clear the
end of the realloc data? Is the clearing of the whole block necessary
given that the temp data is overwritten immediately with the buffer
contents?

>  /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
>     with the specified build-id, type (debuginfo, executable or source)
>     and filename. filename may be NULL. If found, return a file
> @@ -936,10 +966,13 @@ debuginfod_query_server (debuginfod_client *c,
>  	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
>  			    100 * 1024L);
>  	}
> +      data[i].response_data = NULL;
>        curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
> +      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
> +      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i].response_data));

Aha, ok, so this is how the headers are gotten and finally assigned to
response_data..

> @@ -1161,10 +1205,10 @@ debuginfod_query_server (debuginfod_client *c,
>                  {
>                    char *effective_url = NULL;
>                    long resp_code = 500;
> -                  CURLcode ok1 = curl_easy_getinfo (target_handle,
> +                  CURLcode ok1 = curl_easy_getinfo (msg->easy_handle,
>  						    CURLINFO_EFFECTIVE_URL,
>  						    &effective_url);
> -                  CURLcode ok2 = curl_easy_getinfo (target_handle,
> +                  CURLcode ok2 = curl_easy_getinfo (msg->easy_handle,
>  						    CURLINFO_RESPONSE_CODE,
>  						    &resp_code);
>                    if(ok1 == CURLE_OK && ok2 == CURLE_OK && effective_url)

Why this change?

> +2021-08-04  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-find.1: Increasing verbosity describes the downloaded
> +	file.


Thanks for updating the docs.

I haven't reviewed the test case yet.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug debuginfod/27277] Describe retrieved files when verbose
@ 2021-08-25 18:08 Noah Sanci
  2021-09-08 15:01 ` Mark Wielaard
  0 siblings, 1 reply; 23+ messages in thread
From: Noah Sanci @ 2021-08-25 18:08 UTC (permalink / raw)
  To: elfutils-devel

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

Hello,

There appear to exist use cases that intend to simply check for the
existence of content in a debuginfod server, without actually
downloading it.  In HTTP land, the HEAD operation is the natural
expression of this.
Instead of implementing a HEAD/describe option, allow users, with enough
verbosity, to print the HTTP response headers upon retrieving a file.
E.g output:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 2428240
Cache-Control: public
Last-Modified: Sat, 15 May 2021 20:49:51 GMT
Content-Type: application/octet-stream
Date: Tue, 03 Aug 2021 18:50:36 GMT

Note: the new test is mostly compatible with the nsanci/test-fix
commit. When nsanci/test-fix merges with master I can resubmit the
patch with the test properly rebased and adjusted for maximum
efficiency.

The HEAD functionality will (likely) be put into a new PR for existing checking.

-Noah Sanci

[-- Attachment #2: 0001-debuginfod-PR27277-Describe-retrieved-files-when-ver.patch --]
[-- Type: text/x-patch, Size: 16446 bytes --]

From 6351258d707337b69563d4be8effbb30fc42f784 Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Wed, 28 Jul 2021 14:46:05 -0400
Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when verbose

There appear to exist use cases that intend to simply check for the
existence of content in a debuginfod server, without actually
downloading it.  In HTTP land, the HEAD operation is the natural
expression of this.
Instead of implementing a HEAD/describe option, allow users, with enough
verbosity, to print the HTTP response headers upon retrieving a file.
E.g output:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 2428240
Cache-Control: public
Last-Modified: Sat, 15 May 2021 20:49:51 GMT
Content-Type: application/octet-stream
Date: Tue, 03 Aug 2021 18:50:36 GMT

https://sourceware.org/bugzilla/show_bug.cgi?id=27277

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog                     |  11 ++
 debuginfod/debuginfod-client.c           |  58 ++++++-
 doc/ChangeLog                            |   6 +
 doc/debuginfod-find.1                    |   3 +-
 tests/ChangeLog                          |   7 +
 tests/Makefile.am                        |   3 +-
 tests/run-debuginfod-response-headers.sh | 186 +++++++++++++++++++++++
 7 files changed, 268 insertions(+), 6 deletions(-)
 create mode 100755 tests/run-debuginfod-response-headers.sh

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 530f7dc7..cb9e50c7 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -4,6 +4,17 @@
 	* debuginfod.cxx (handler_cb): Fix after_you unique_set key
 	to the entire incoming URL.
 
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-client.c (struct debuginfod_client): New field 
+	winning_headers.
+	(struct handle_data): New field response_data.
+	(header_callback): Store received headers in response_data.
+	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
+	Save winning response_data.
+	(debuginfod_end): free client winning headers.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 7d4b220f..985a17c5 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -127,6 +127,7 @@ struct debuginfod_client
      timeout or other info gotten from environment variables, the
      handle data, etc. So those don't have to be reparsed and
      recreated on each request.  */
+  char * winning_headers;
 };
 
 /* The cache_clean_interval_s file within the debuginfod cache specifies
@@ -183,6 +184,8 @@ struct handle_data
      to the cache. Used to ensure that a file is not downloaded from
      multiple servers unnecessarily.  */
   CURL **target_handle;
+  /* Response http headers for this client handle, sent from the server */
+  char *response_data;
 };
 
 static size_t
@@ -499,6 +502,33 @@ default_progressfn (debuginfod_client *c, long a, long b)
 }
 
 
+static size_t
+header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
+{
+  if (size != 1)
+    return 0;
+  /* Temporary buffer for realloc */
+  char *temp = NULL;
+  size_t userlen = 0;
+  if (*(char**)userdata == NULL)
+    {
+      temp = malloc(numitems+1);
+      if (temp == NULL)
+        return 0;
+      memset(temp, '\0', numitems+1);
+    }
+  else
+    {
+      userlen = strlen(*(char**)userdata);
+      temp = realloc(*(char**)userdata, userlen + numitems + 1);
+      if (temp == NULL)
+       return 0;
+    }
+  strncat(temp, buffer, numitems);
+  *(char**)userdata = temp;
+  return numitems;
+}
+
 /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
    with the specified build-id, type (debuginfo, executable or source)
    and filename. filename may be NULL. If found, return a file
@@ -936,10 +966,13 @@ debuginfod_query_server (debuginfod_client *c,
 	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
 			    100 * 1024L);
 	}
+      data[i].response_data = NULL;
       curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
       curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
+      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i].response_data));
 #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
       curl_easy_setopt(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
 #else
@@ -961,6 +994,7 @@ debuginfod_query_server (debuginfod_client *c,
   int committed_to = -1;
   bool verbose_reported = false;
   struct timespec start_time, cur_time;
+  c->winning_headers = NULL;
   if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
     {
       rc = errno;
@@ -995,8 +1029,18 @@ debuginfod_query_server (debuginfod_client *c,
 	    if (data[i].handle != target_handle)
 	      curl_multi_remove_handle(curlm, data[i].handle);
 	    else
-	      committed_to = i;
-	}
+              {
+	        committed_to = i;
+                if (c->winning_headers == NULL)
+                  {
+                    c->winning_headers = data[committed_to].response_data;
+                    if (vfd >= 0)
+                      dprintf(vfd, "\n%s", c->winning_headers);
+                    data[committed_to].response_data = NULL;
+                  }
+
+              }
+        }
 
       if (vfd >= 0 && !verbose_reported && committed_to >= 0)
 	{
@@ -1161,10 +1205,10 @@ debuginfod_query_server (debuginfod_client *c,
                 {
                   char *effective_url = NULL;
                   long resp_code = 500;
-                  CURLcode ok1 = curl_easy_getinfo (target_handle,
+                  CURLcode ok1 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_EFFECTIVE_URL,
 						    &effective_url);
-                  CURLcode ok2 = curl_easy_getinfo (target_handle,
+                  CURLcode ok2 = curl_easy_getinfo (msg->easy_handle,
 						    CURLINFO_RESPONSE_CODE,
 						    &resp_code);
                   if(ok1 == CURLE_OK && ok2 == CURLE_OK && effective_url)
@@ -1238,7 +1282,10 @@ debuginfod_query_server (debuginfod_client *c,
             {
               curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
               curl_easy_cleanup (data[i].handle);
+              free(data[i].response_data);
             }
+            free(c->winning_headers);
+            c->winning_headers = NULL;
 	    goto query_in_parallel;
 	}
       else
@@ -1281,6 +1328,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   for (int i = 0; i < num_urls; ++i)
@@ -1304,6 +1352,7 @@ debuginfod_query_server (debuginfod_client *c,
     {
       curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
       curl_easy_cleanup (data[i].handle);
+      free (data[i].response_data);
     }
 
   unlink (target_cache_tmppath);
@@ -1415,6 +1464,7 @@ debuginfod_end (debuginfod_client *client)
 
   curl_multi_cleanup (client->server_mhandle);
   curl_slist_free_all (client->headers);
+  free (client->winning_headers);
   free (client->url);
   free (client);
 }
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 77d1d705..3f542e08 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -10,6 +10,12 @@
 	* Makefile.am: Updated to include debuginfod-client-config.7
 	* man3, man7: Symlinks for source tree man page testing.
 
+2021-08-04  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* debuginfod-find.1: Increasing verbosity describes the downloaded
+	file.
+
 2021-07-26  Noah Sanci <nsanci@redhat.com>
 
 	PR27982
diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1
index a61673f5..957ec7e7 100644
--- a/doc/debuginfod-find.1
+++ b/doc/debuginfod-find.1
@@ -110,7 +110,8 @@ l l.
 
 .TP
 .B "\-v"
-Increase verbosity, including printing frequent download-progress messages.
+Increase verbosity, including printing frequent download-progress messages
+and printing the http response headers from the server.
 
 
 .SH "SECURITY"
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 3bfd1ca2..f4aaf3ee 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -4,6 +4,13 @@
 	* backtrace.c (callback_verify): Check for pthread_kill as first
 	frame. Change asserts to fprintf plus abort.
 
+2021-08-02  Noah Sanci  <nsanci@redhat.com>
+
+	PR27277
+	* run-debuginfod-response-headers.sh: Add a test to ensure that file descriptions
+	are accurate for files outside or within archives.
+	* Makefile.am: Add run-debuginfod-response-headers.sh to TESTS.
+
 2021-07-26  Noah Sanci  <nsanci@redhat.com>
 
 	PR27982
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 429649f4..99fa61ba 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -211,7 +211,7 @@ if DEBUGINFOD
 check_PROGRAMS += debuginfod_build_id_find
 # With the dummy delegation doesn't work
 if !DUMMY_LIBDEBUGINFOD
-TESTS += run-debuginfod-find.sh
+TESTS += run-debuginfod-find.sh run-debuginfod-response-headers.sh
 endif
 endif
 
@@ -475,6 +475,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
 	     run-disasm-riscv64.sh \
 	     testfile-riscv64-dis1.o.bz2 testfile-riscv64-dis1.expect.bz2 \
              run-debuginfod-find.sh \
+             run-debuginfod-response-headers.sh \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
 	     debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
 	     debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
diff --git a/tests/run-debuginfod-response-headers.sh b/tests/run-debuginfod-response-headers.sh
new file mode 100755
index 00000000..be1345fc
--- /dev/null
+++ b/tests/run-debuginfod-response-headers.sh
@@ -0,0 +1,186 @@
+#!/usr/bin/env bash
+#
+# Copyright (C) 2019-2021 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/test-subr.sh  # includes set -e
+
+# for test case debugging, uncomment:
+set -x
+
+DB=${PWD}/.debuginfod_tmp.sqlite
+tempfiles $DB
+export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache
+
+while true; do
+    PORT1=`expr '(' $RANDOM % 1000 ')' + 9000`
+    ss -atn | fgrep ":$PORT1" || break
+done
+PID1=0
+PID2=0
+PID3=0
+PID4=0
+
+cleanup()
+{
+  if [ $PID1 -ne 0 ]; then kill $PID1; wait $PID1; fi
+  if [ $PID2 -ne 0 ]; then kill $PID2; wait $PID2; fi
+  if [ $PID3 -ne 0 ]; then kill $PID3; wait $PID3; fi
+  if [ $PID4 -ne 0 ]; then kill $PID4; wait $PID4; fi
+  rm -rf F R D L Z ${PWD}/foobar ${PWD}/mocktree ${PWD}/.client_cache* ${PWD}/tmp*
+  exit_cleanup
+}
+
+# clean up trash if we were aborted early
+trap cleanup 0 1 2 3 5 9 15
+
+errfiles_list=
+err() {
+    echo ERROR REPORTS
+    for ports in $PORT1 $PORT2 $PORT3
+    do
+        echo ERROR REPORT $port metrics
+        curl -s http://127.0.0.1:$port/metrics
+        echo
+    done
+    for x in $errfiles_list
+    do
+        echo ERROR REPORT "$x"
+        cat $x
+        echo
+    done
+    false # trigger set -e
+}
+trap err ERR
+
+errfiles() {
+    errfiles_list="$errfiles_list $*"
+}
+
+wait_ready()
+{
+  port=$1;
+  what=$2;
+  value=$3;
+  timeout=20;
+
+  echo "Wait $timeout seconds on $port for metric $what to change to $value"
+  while [ $timeout -gt 0 ]; do
+    mvalue="$(curl -s http://127.0.0.1:$port/metrics \
+              | grep "$what" | awk '{print $NF}')"
+    if [ -z "$mvalue" ]; then mvalue=0; fi
+      echo "metric $what: $mvalue"
+      if [ "$mvalue" -eq "$value" ]; then
+        break;
+    fi
+    sleep 0.5;
+    ((timeout--));
+  done;
+
+  if [ $timeout -eq 0 ]; then
+    echo "metric $what never changed to $value on port $port"
+    err
+  fi
+}
+
+# We want to run debuginfod in the background.  We also want to start
+# it with the same check/installcheck-sensitive LD_LIBRARY_PATH stuff
+# that the testrun alias sets.  But: we if we just use
+#    testrun .../debuginfod
+# it runs in a subshell, with different pid, so not helpful.
+#
+# So we gather the LD_LIBRARY_PATH with this cunning trick:
+ldpath=`testrun sh -c 'echo $LD_LIBRARY_PATH'`
+
+mkdir F R
+
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -F -R -d $DB -p $PORT1 -t0 -g0 -v R F > vlog$PORT1 2>&1 &
+PID1=$!
+tempfiles vlog$PORT1
+errfiles vlog$PORT1
+# Server must become ready
+wait_ready $PORT1 'ready' 1
+export DEBUGINFOD_URLS=http://127.0.0.1:$PORT1/   # or without trailing /
+# Check thread comm names
+ps -q $PID1 -e -L -o '%p %c %a' | grep groom
+ps -q $PID1 -e -L -o '%p %c %a' | grep scan
+ps -q $PID1 -e -L -o '%p %c %a' | grep traverse
+
+# We use -t0 and -g0 here to turn off time-based scanning & grooming.
+# For testing purposes, we just sic SIGUSR1 / SIGUSR2 at the process.
+
+########################################################################
+
+# Compile a simple program, strip its debuginfo and save the build-id.
+# Also move the debuginfo into another directory so that elfutils
+# cannot find it without debuginfod.
+echo "int main() { return 0; }" > ${PWD}/prog.c
+tempfiles prog.c
+# Create a subdirectory to confound source path names
+mkdir foobar
+gcc -Wl,--build-id -g -o prog ${PWD}/foobar///./../prog.c
+testrun ${abs_top_builddir}/src/strip -g -f prog.debug ${PWD}/prog
+BUILDID=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../src/readelf \
+          -a prog | grep 'Build ID' | cut -d ' ' -f 7`
+
+mv prog F
+mv prog.debug F
+kill -USR1 $PID1
+# Wait till both files are in the index.
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+
+cp -rvp ${abs_srcdir}/debuginfod-rpms R
+if [ "$zstd" = "false" ]; then  # nuke the zstd fedora 31 ones
+    rm -vrf R/debuginfod-rpms/fedora31
+fi
+
+kill -USR1 $PID1
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+# All rpms need to be in the index, except the dummy permission-000 one
+rpms=$(find R -name \*rpm | grep -v nothing | wc -l)
+wait_ready $PORT1 'scanned_files_total{source=".rpm archive"}' $rpms
+kill -USR1 $PID1  # two hits of SIGUSR1 may be needed to resolve .debug->dwz->srefs
+# Wait till both files are in the index and scan/index fully finished
+wait_ready $PORT1 'thread_work_total{role="traverse"}' 3
+wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+wait_ready $PORT1 'thread_busy{role="scan"}' 0
+
+########################################################################
+## PR27277
+# Make a simple request to the debuginfod server and check debuginfod-find's vlog to see if
+# the custom HTTP headers are received.
+rm -rf $DEBUGINFOD_CACHE_PATH
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable F/prog > vlog-find$PORT1.1 2>&1
+tempfiles vlog-find$PORT1.1
+cat vlog-find$PORT1.1
+#grep 'Content-Length: ' vlog-find$PORT1.1
+#grep 'Connection: ' vlog-find$PORT1.1
+#grep 'Cache-Control: ' vlog-find$PORT1.1
+
+# Check to see if an executable file located in an archive prints the file's description and archive
+env DEBUGINFOD_URLS="http://127.0.0.1:"$PORT1 LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find\
+    -vvv executable c36708a78618d597dee15d0dc989f093ca5f9120 > vlog-find$PORT1.2 2>&1
+tempfiles vlog-find$PORT1.2
+grep 'Content-Length: ' vlog-find$PORT1.2
+grep 'Connection: ' vlog-find$PORT1.2
+grep 'Cache-Control: ' vlog-find$PORT1.2
+
-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-08-08 14:35 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-04 18:54 [Bug debuginfod/27277] Describe retrieved files when verbose Noah Sanci
2021-08-05 15:13 ` Mark Wielaard
2021-08-05 16:54   ` Frank Ch. Eigler
2021-08-06 10:04     ` Mark Wielaard
2021-08-06 18:54       ` Frank Ch. Eigler
2021-08-09  9:25         ` Mark Wielaard
2021-08-23 15:11           ` Noah Sanci
2021-08-24  8:18             ` Mark Wielaard
2021-08-27 18:38               ` Noah Sanci
2021-09-08 20:56                 ` Mark Wielaard
2021-09-10 18:22                   ` Noah Sanci
2021-09-12 19:08                     ` Mark Wielaard
2021-09-13 20:07                       ` Noah Sanci
2021-09-16 10:50                         ` Mark Wielaard
2021-09-22 20:33           ` Frank Ch. Eigler
2021-09-29 14:55             ` Mark Wielaard
2021-09-29 21:28               ` Frank Ch. Eigler
2021-10-05 14:28                 ` Mark Wielaard
2022-07-14 15:32                   ` Noah Sanci
2022-08-04 13:12                     ` Mark Wielaard
     [not found]                       ` <CAJXA7qg09YkxK-NRQ31Hem0+54Us=jYC5+1siPSbHangx=SCow@mail.gmail.com>
2022-08-08 14:35                         ` Mark Wielaard
2021-08-25 18:08 Noah Sanci
2021-09-08 15:01 ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).