From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 14 Jul 2021 18:36:13 +0200
From: Mark Wielaard
To: Noah Sanci
Cc: elfutils-devel@sourceware.org
Subject: Re: [Bug debuginfod/27983] ignore duplicate urls
List-Id: Elfutils-devel mailing list

Hi Noah,

On Fri, Jul 09, 2021 at 03:12:18PM -0400, Noah Sanci via Elfutils-devel wrote:
> From e37f49a0fd5f27907584b19336cd250d825acc98 Mon Sep 17 00:00:00 2001
> From: Noah Sanci
> Date: Fri, 9 Jul 2021 14:53:10 -0400
> Subject: [PATCH] debuginfod: PR27983 - ignore duplicate urls
>
> Gazing at server logs, one sees a minority of clients who appear to have
> duplicate query
traffic coming in: the same URL, milliseconds apart.
> Chances are the user accidentally doubled her $DEBUGINFOD_URLS somehow,
> and the client library is dutifully asking the servers TWICE. Bug #27863
> reduces the pain on the servers' CPU, but dupe network traffic is still
> being paid. We should reject sending outright duplicate concurrent
> traffic.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=27983
>
> Signed-off-by: Noah Sanci
> ---
>  debuginfod/ChangeLog           |  7 +++++
>  debuginfod/debuginfod-client.c | 56 +++++++++++++++++++++++++---------
>  tests/ChangeLog                |  5 +++
>  tests/run-debuginfod-find.sh   | 13 ++++++++
>  4 files changed, 67 insertions(+), 14 deletions(-)
>
> diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
> index d9d11737..24ccb8ef 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -1,3 +1,10 @@
> +2021-07-09  Noah Sanci
> +
> +	* debuginfod-client.c (debuginfod_query_server): As full-length
> +	urls are generated with standardized formats, ignore duplicates.
> +	Also update the number of urls to the unduplicated number of
> +	urls.

You deduplicate the full URLs after they are fully constructed. Would
it make sense to do the deduplication on server_url instead, maybe
even as part of the "Count number of URLs" code? That might make the
code simpler. And you could set num_urls upfront.

> +  num_urls = unduplicated_urls;
> +  data = reallocarray( (void *) data, num_urls, sizeof(struct handle_data));

Maybe this reallocarray is unnecessary. Yes, it might save a little
bit of memory, but then you also have to handle reallocarray failure.

Cheers,

Mark