public inbox for elfutils@sourceware.org
From: Noah Sanci <nsanci@redhat.com>
To: Mark Wielaard <mark@klomp.org>, elfutils-devel@sourceware.org
Subject: Re: [Bug debuginfod/27983] ignore duplicate urls
Date: Mon, 19 Jul 2021 09:31:17 -0400	[thread overview]
Message-ID: <CAJXA7qiymkVMv7Pe+7J9r9ZFU5mi_5M=sjbhX1W4xon5eRbeEA@mail.gmail.com> (raw)
In-Reply-To: <YO8Sffae5F9TtdLm@wildebeest.org>

[-- Attachment #1: Type: text/plain, Size: 1103 bytes --]

Hello,


On Wed, Jul 14, 2021 at 12:36 PM Mark Wielaard <mark@klomp.org> wrote:

> You deduplicate the full URLs after they are fully constructed.  Would
> it make sense to do the deduplication on server_url, maybe even as
> part of the Count number of URLs code? That might make the code
> simpler. And you can change num_urls upfront.
Deduplication before fully building the URL would work well; however, I
was concerned about the slashbuildid situation. I would need to
normalize all urls to either have a trailing '/' or none, so that
'http://127.0.0.1:8000/' and 'http://127.0.0.1:8000' compare as equal.
This is possible, but I ultimately decided to wait until full
construction, since by that point those issues have already been handled.
I would be glad to make the change if you want.
>
> > +  num_urls = unduplicated_urls;
> > +  data = reallocarray( (void *) data, num_urls, sizeof(struct handle_data));
>
> Maybe this reallocarray is unnecessary. Yes, it might save a little
> bit of memory, but you do have to handle reallocarray failure.
Good to know. I removed it.

Thanks

Noah

[-- Attachment #2: 0001-debuginfod-PR27983-ignore-duplicate-urls.patch --]
[-- Type: text/x-patch, Size: 6339 bytes --]

From be4e07a8732dd688595b99f92ba275ef5060a34d Mon Sep 17 00:00:00 2001
From: Noah Sanci <nsanci@redhat.com>
Date: Fri, 9 Jul 2021 14:53:10 -0400
Subject: [PATCH] debuginfod: PR27983 - ignore duplicate urls

Gazing at server logs, one sees a minority of clients who appear to have
duplicate query traffic coming in: the same URL, milliseconds apart.
Chances are the user accidentally doubled her $DEBUGINFOD_URLS somehow,
and the client library is dutifully asking the servers TWICE. Bug #27863
reduces the pain on the servers' CPU, but the cost of duplicate network
traffic is still being paid.  We should avoid sending outright duplicate
concurrent requests.

https://sourceware.org/bugzilla/show_bug.cgi?id=27983

Signed-off-by: Noah Sanci <nsanci@redhat.com>
---
 debuginfod/ChangeLog           |  7 +++++
 debuginfod/debuginfod-client.c | 55 +++++++++++++++++++++++++---------
 tests/ChangeLog                |  5 ++++
 tests/run-debuginfod-find.sh   | 13 ++++++++
 4 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index d9d11737..24ccb8ef 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,10 @@
+2021-07-09  Noah Sanci  <nsanci@redhat.com>
+
+	* debuginfod-client.c (debuginfod_query_server): As full-length
+	urls are generated with standardized formats, ignore duplicates.
+	Also update the number of urls to the unduplicated number of
+	urls.
+
 2021-06-18  Mark Wielaard  <mark@klomp.org>
 
 	* debuginfod-client.c (debuginfod_begin): Don't use client if
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index f417b40a..a9447cae 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -795,6 +795,7 @@ debuginfod_query_server (debuginfod_client *c,
 
   char *strtok_saveptr;
   char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
+  int unduplicated_urls = 0;
 
   /* Initialize each handle.  */
   for (int i = 0; i < num_urls && server_url != NULL; i++)
@@ -802,34 +803,59 @@ debuginfod_query_server (debuginfod_client *c,
       if (vfd >= 0)
 	dprintf (vfd, "init server %d %s\n", i, server_url);
 
-      data[i].fd = fd;
-      data[i].target_handle = &target_handle;
-      data[i].handle = curl_easy_init();
-      if (data[i].handle == NULL)
-        {
-          rc = -ENETUNREACH;
-          goto out1;
-        }
-      data[i].client = c;
-
-      /* Build handle url. Tolerate both  http://foo:999  and
-         http://foo:999/  forms */
       char *slashbuildid;
       if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
         slashbuildid = "buildid";
       else
         slashbuildid = "/buildid";
 
+      char *tmp_url = calloc(PATH_MAX, sizeof(char));
       if (filename) /* must start with / */
-        snprintf(data[i].url, PATH_MAX, "%s%s/%s/%s%s", server_url,
+        snprintf(tmp_url, PATH_MAX, "%s%s/%s/%s%s", server_url,
                  slashbuildid, build_id_bytes, type, filename);
       else
-        snprintf(data[i].url, PATH_MAX, "%s%s/%s/%s", server_url,
+        snprintf(tmp_url, PATH_MAX, "%s%s/%s/%s", server_url,
                  slashbuildid, build_id_bytes, type);
 
+      /* PR 27983: If the url is already in use, skip it.  */
+      int url_index = -1;
+      for (url_index = 0; url_index < i; ++url_index)
+        {
+          if(strcmp(tmp_url, data[url_index].url) == 0)
+            {
+              url_index = -1;
+              break;
+            }
+        }
+      if (url_index == -1)
+        {
+          if (vfd >= 0)
+            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
+          free(tmp_url);
+          // Ensure that next iteration doesn't skip over an index mid-array
+          i--;
+          server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
+          continue;
+        }
+      else
+        {
+          unduplicated_urls++;
+          strncpy(data[i].url, tmp_url, PATH_MAX);
+          free (tmp_url);
+        }
+
       if (vfd >= 0)
 	dprintf (vfd, "url %d %s\n", i, data[i].url);
 
+      data[i].fd = fd;
+      data[i].target_handle = &target_handle;
+      data[i].handle = curl_easy_init();
+      if (data[i].handle == NULL)
+        {
+          rc = -ENETUNREACH;
+          goto out1;
+        }
+      data[i].client = c;
       curl_easy_setopt(data[i].handle, CURLOPT_URL, data[i].url);
       if (vfd >= 0)
 	curl_easy_setopt(data[i].handle, CURLOPT_ERRORBUFFER, data[i].errbuf);
@@ -863,6 +889,7 @@ debuginfod_query_server (debuginfod_client *c,
       curl_multi_add_handle(curlm, data[i].handle);
       server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
     }
+  num_urls = unduplicated_urls;
 
   /* Query servers in parallel.  */
   if (vfd >= 0)
diff --git a/tests/ChangeLog b/tests/ChangeLog
index b65cbeb7..5747d658 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@
+2021-07-09  Noah Sanci  <nsanci@redhat.com>
+
+	* run-debuginfod-find.sh: Add test to ensure duplicate urls are
+	not queried.
+
 2021-07-02  Mark Wielaard  <mark@klomp.org>
 
        * run-debuginfo-find.sh: unset VALGRIND_CMD before testing debuginfod
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index 74a5ceff..6faaf70b 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -359,6 +359,19 @@ rm -rf extracted
 
 wait_ready $PORT1 'found_sourcerefs_total{source=".rpm archive"}' $sourcefiles
 
+########################################################################
+# PR27983: ensure no duplicate urls are used when querying servers for files
+TMP_DEBUG_URLS=$DEBUGINFOD_URLS
+rm -rf $DEBUGINFOD_CACHE_PATH # clean it from previous tests
+export DEBUGINFOD_URLS="http://127.0.0.1:$PORT1 http://127.0.0.1:$PORT1 http://127.0.0.1:$PORT1 http:127.0.0.1:7999"
+env LD_LIBRARY_PATH=$ldpath ${abs_top_builddir}/debuginfod/debuginfod-find -v executable $BUILDID2 > vlog4 2>&1 || true
+tempfiles vlog4
+if [ $( grep -c 'duplicate url: http://127.0.0.1:'$PORT1'.*' vlog4 ) -ne 2 ]; then
+  echo "Duplicated servers undetected";
+  err
+fi
+export DEBUGINFOD_URLS=$TMP_DEBUG_URLS
+########################################################################
 # Run a bank of queries against the debuginfod-rpms / debuginfod-debs test cases
 
 archive_test() {
-- 
2.31.1


  reply	other threads:[~2021-07-19 13:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-09 19:12 Noah Sanci
2021-07-14 16:36 ` Mark Wielaard
2021-07-19 13:31   ` Noah Sanci [this message]
2021-07-20 14:42     ` Mark Wielaard
2021-07-22 16:25       ` Noah Sanci
2021-07-22 19:22         ` Noah Sanci
2021-07-23 18:34           ` Mark Wielaard
2021-07-26 16:29             ` Noah Sanci
2021-07-28 14:55               ` Mark Wielaard
2021-07-28 16:23                 ` Noah Sanci
2021-07-29 12:20                   ` Mark Wielaard
