Date: Tue, 20 Jul 2021 16:42:26 +0200
From: Mark Wielaard
To: Noah Sanci
Cc: elfutils-devel@sourceware.org
Subject: Re: [Bug debuginfod/27983] ignore duplicate urls

Hi Noah,

On Mon, Jul 19, 2021 at 09:31:17AM -0400, Noah Sanci wrote:
> On Wed, Jul 14, 2021 at 12:36 PM Mark Wielaard wrote:
> > You deduplicate the full URLs after they are fully constructed.
> > Would it make sense to do the deduplication on server_url, maybe
> > even as part of the Count number of URLs code? That might make the
> > code simpler. And you can change num_urls upfront.
> Deduplication before fully building the URL would work well, however
> I was concerned about the slashbuildid situation. I would need to
> alter all urls to either have a trailing '/' or no trailing '/' to
> ensure comparison between 'http://127.0.0.1:8000/' and
> 'http://127.0.0.1:8000' is considered equal. This is possible, but I
> ultimately decided to wait until full construction as those issues
> would have been handled. I would be glad to make the change if you
> want.

I see what you mean, we have:

  /* Build handle url. Tolerate both  http://foo:999  and
     http://foo:999/  forms */
  char *slashbuildid;
  if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
    slashbuildid = "buildid";
  else
    slashbuildid = "/buildid";

I think the code could become simpler if we did the deduplication of
the server_url list up-front. That also gives you the opportunity to
make sure all server_urls end with a slash. But if you think that is
too much work for this patch we can do it as a followup patch. You
already included a test, which makes checking such a refactoring
easier (sadly the run-debuginfod-find.sh test is somewhat fragile).
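To make that concrete, here is a completely untested sketch of what I
mean. The dedup_server_urls helper and the MAX_URLS bound are made up
for illustration (they aren't in the current code), and cleanup on
allocation failure is elided:

  #include <errno.h>
  #include <stdbool.h>
  #include <stdlib.h>
  #include <string.h>

  #define MAX_URLS 64

  /* Split server_urls on url_delim, normalize every url to end with
     '/', and drop exact duplicates.  Returns the number of entries
     stored in unique_urls[], or a negative errno value.  The caller
     owns (and eventually frees) each entry.  */
  static int
  dedup_server_urls (char *server_urls, const char *url_delim,
                     char **unique_urls)
  {
    int num_urls = 0;
    char *strtok_saveptr;
    char *server_url = strtok_r (server_urls, url_delim, &strtok_saveptr);

    while (server_url != NULL && num_urls < MAX_URLS)
      {
        size_t len = strlen (server_url);
        bool has_slash = len > 0 && server_url[len - 1] == '/';

        /* Store with a trailing slash, so http://127.0.0.1:8000 and
           http://127.0.0.1:8000/ compare equal.  */
        char *url = malloc (len + 2);
        if (url == NULL)
          return -ENOMEM; /* Cleanup of earlier entries elided.  */
        strcpy (url, server_url);
        if (! has_slash)
          strcat (url, "/");

        /* Drop exact duplicates of urls we already collected.  */
        bool dup = false;
        for (int i = 0; i < num_urls; i++)
          if (strcmp (url, unique_urls[i]) == 0)
            {
              dup = true;
              break;
            }

        if (dup)
          free (url);
        else
          unique_urls[num_urls++] = url;

        server_url = strtok_r (NULL, url_delim, &strtok_saveptr);
      }

    return num_urls;
  }

With every stored url known to end in '/', the slashbuildid selection
above can go away, and the snprintf calls can always use
"%sbuildid/...".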
> From be4e07a8732dd688595b99f92ba275ef5060a34d Mon Sep 17 00:00:00 2001
> From: Noah Sanci
> Date: Fri, 9 Jul 2021 14:53:10 -0400
> Subject: [PATCH] debuginfod: PR27983 - ignore duplicate urls
>
> Gazing at server logs, one sees a minority of clients who appear to
> have duplicate query traffic coming in: the same URL, milliseconds
> apart. Chances are the user accidentally doubled her $DEBUGINFOD_URLS
> somehow, and the client library is dutifully asking the servers
> TWICE. Bug #27863 reduces the pain on the servers' CPU, but dupe
> network traffic is still being paid. We should reject sending
> outright duplicate concurrent traffic.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=27983

> diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
> index f417b40a..a9447cae 100644
> --- a/debuginfod/debuginfod-client.c
> +++ b/debuginfod/debuginfod-client.c
> @@ -795,6 +795,7 @@ debuginfod_query_server (debuginfod_client *c,
>
>    char *strtok_saveptr;
>    char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
> +  int unduplicated_urls = 0;
>
>    /* Initialize each handle.  */
>    for (int i = 0; i < num_urls && server_url != NULL; i++)

Maybe this should be:

  /* Initialize each handle.  */
  int i = 0;
  while (server_url != NULL)

With an i++ at the end, after the data[i] handle has been completely
assigned. See below (*) for why.

> +      char *tmp_url = calloc(PATH_MAX, sizeof(char));

This can fail. So you'll have to handle tmp_url == NULL. Otherwise
snprintf will crash.

>        if (filename) /* must start with / */
> -        snprintf(data[i].url, PATH_MAX, "%s%s/%s/%s%s", server_url,
> +        snprintf(tmp_url, PATH_MAX, "%s%s/%s/%s%s", server_url,
>                   slashbuildid, build_id_bytes, type, filename);
>        else
> -        snprintf(data[i].url, PATH_MAX, "%s%s/%s/%s", server_url,
> +        snprintf(tmp_url, PATH_MAX, "%s%s/%s/%s", server_url,
>                   slashbuildid, build_id_bytes, type);
>
> +      /* PR 27983: If the url is already set to be used use, skip it */
> +      int url_index = -1;
> +      for (url_index = 0; url_index < i; ++url_index)

No need to initialize url_index twice. Just declare int url_index;
it will be assigned 0 on the next line.

> +        {
> +          if(strcmp(tmp_url, data[url_index].url) == 0)
> +            {
> +              url_index = -1;
> +              break;
> +            }
> +        }
> +      if (url_index == -1)
> +        {
> +          if (vfd >= 0)
> +            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
> +          free(tmp_url);
> +          // Ensure that next iteration doesn't skip over an index mid-array
> +          i--;
> +          server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
> +          continue;

(*) This i--; confused me a little. Which is why I suggest turning
that for loop into a while loop.

> +        }
> +      else
> +        {
> +          unduplicated_urls++;
> +          strncpy(data[i].url, tmp_url, PATH_MAX);

Both strings are at most PATH_MAX chars, and tmp_url is zero
terminated, so you can use strcpy.

> +          rc = -ENETUNREACH;
> +          goto out1;
> +        }
> +      data[i].client = c;
>        curl_easy_setopt(data[i].handle, CURLOPT_URL, data[i].url);
>        if (vfd >= 0)
>          curl_easy_setopt(data[i].handle, CURLOPT_ERRORBUFFER, data[i].errbuf);
> @@ -863,6 +889,7 @@ debuginfod_query_server (debuginfod_client *c,
>        curl_multi_add_handle(curlm, data[i].handle);
>        server_url = strtok_r(NULL, url_delim, &strtok_saveptr);

(*) So here an i++ (so you don't need the i-- above).

>      }
> +  num_urls = unduplicated_urls;
>
>    /* Query servers in parallel.  */
>    if (vfd >= 0)

Cheers,

Mark
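P.S. To be concrete about the (*) remarks, this (untested) skeleton
is the loop shape I had in mind. The duplicate flag is just a
stand-in for your url_index check, and the elided parts stay as in
your patch:

  /* Initialize each handle.  */
  int i = 0;
  while (server_url != NULL)
    {
      /* ... build tmp_url and compare it against data[0..i-1].url,
         setting duplicate ... */
      if (duplicate)
        {
          free (tmp_url);
          server_url = strtok_r (NULL, url_delim, &strtok_saveptr);
          continue;  /* i is untouched, so no i-- is needed.  */
        }

      /* ... assign data[i].url, data[i].client and set up the curl
         handle ... */

      server_url = strtok_r (NULL, url_delim, &strtok_saveptr);
      i++;  /* Only advance past a handle actually used.  */
    }
  num_urls = i;  /* i has counted the urls actually used.  */

I think that shape also means the separate unduplicated_urls counter
isn't needed, since i ends up holding the same count.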