From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gnu.wildebeest.org (gnu.wildebeest.org [45.83.234.184]) by sourceware.org (Postfix) with ESMTPS id 87A6E3858407 for ; Thu, 9 Dec 2021 16:47:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 87A6E3858407 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=klomp.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=klomp.org Received: from tarox.wildebeest.org (83-87-18-245.cable.dynamic.v4.ziggo.nl [83.87.18.245]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by gnu.wildebeest.org (Postfix) with ESMTPSA id E290C3000ED0; Thu, 9 Dec 2021 17:47:26 +0100 (CET) Received: by tarox.wildebeest.org (Postfix, from userid 1000) id A5F23413CC9A; Thu, 9 Dec 2021 17:47:25 +0100 (CET) Message-ID: Subject: Re: patch rfc: PR28661: debuginfod thread-pool From: Mark Wielaard To: "Frank Ch. Eigler" , elfutils-devel@sourceware.org Date: Thu, 09 Dec 2021 17:47:25 +0100 In-Reply-To: <20211209035541.GA21366@redhat.com> References: <20211209035541.GA21366@redhat.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Evolution 3.28.5 (3.28.5-10.el7) Mime-Version: 1.0 X-Spam-Status: No, score=-10.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: elfutils-devel@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Elfutils-devel mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Dec 2021 16:47:35 -0000 Hi Frank, On Wed, 2021-12-08 at 22:55 -0500, Frank Ch. Eigler via Elfutils-devel wrote: > While I think this patch itself is fine, and works around the > libmicrohttpd bug that motivated it, I don't know how to test it > seriously in the testsuite. (We can certainly try few -C options for > parsing & operability.) The error edge cases only appear to occur > under very high load and task limits such as those imposed by systemd > cgroups. We have a couple of tests that try to seek the limit that they can still safely run under (e.g. run-large-elf-file.sh), but they were probably mistakes to include in the normal regression test suite. It is tricky to find the appropriate limit on a machine without hitting it and causing the test machine to fall over (briefly). If you can use ulimit -u or ulimit -T in the run-test.sh script then please use that, but that probably requires launching sub-shells and you quickly end up in shell-hell. So I would recommend to simply add a testcase that just uses no-option,=20 -C and -C 256 (or take any arbitrary number, 1 might be an interesting corner case) and see that if you have 4 (8, 16, ...?) debuginfod-find (or curl metric) processes doing some parallel queries works as expected. > Author: Frank Ch. Eigler > Date: Wed Dec 8 22:41:47 2021 -0500 >=20 > PR28661: debuginfo connection thread pool support > =20 > Add an option -C, which activates libmicrohttpd's thread-pool mode fo= r > handling incoming http connections. Add libmicrohttpd error-logging > callback function so as to receive indication of its internal errors, > and relay counts to our metrics. Some of these internal errors tippe= d > us off to a microhttpd bug that thread pooling works around. Documen= t > in debuginfod.8 page. Hand-tested against "ulimit -u NNN" shells. > =20 > Signed-off-by: Frank Ch. Eigler >=20 > diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog > index 21d0721ef080..125e7d816f51 100644 > --- a/debuginfod/ChangeLog > +++ b/debuginfod/ChangeLog > @@ -1,3 +1,11 @@ > +2021-12-08 Frank Ch. Eigler > + > + * debuginfod.cxx (connection_pool): New global. > + (parse_opt): Parse & check -C option to set it. > + (error_cb): New callback for libmicrohttpd error counting. > + (main): Activate MHD_OPTION_THREAD_POOL_SIZE if appropriate. > + Activate error_cb. > + > 2021-12-01 Mark Wielaard > =20 > * debuginfod-client.c (debuginfod_query_server): Free tmp_url on > diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx > index 0d3f02978ee2..bb0c6cd65670 100644 > --- a/debuginfod/debuginfod.cxx > +++ b/debuginfod/debuginfod.cxx > @@ -352,7 +352,9 @@ static const struct argp_option options[] =3D > { "rescan-time", 't', "SECONDS", 0, "Number of seconds to wait betwee= n rescans, 0=3Ddisable.", 0 }, > { "groom-time", 'g', "SECONDS", 0, "Number of seconds to wait between= database grooming, 0=3Ddisable.", 0 }, > { "maxigroom", 'G', NULL, 0, "Run a complete database groom/shrink pa= ss at startup.", 0 }, > - { "concurrency", 'c', "NUM", 0, "Limit scanning thread concurrency to= NUM.", 0 }, > + { "concurrency", 'c', "NUM", 0, "Limit scanning thread concurrency to= NUM, default=3D#CPUs.", 0 }, > + { "connection-pool", 'C', "NUM", OPTION_ARG_OPTIONAL, > + "Use webapi connection pool with NUM threads, default=3Dunlim.", 0 = }, > { "include", 'I', "REGEX", 0, "Include files matching REGEX, default= =3Dall.", 0 }, > { "exclude", 'X', "REGEX", 0, "Exclude files matching REGEX, default= =3Dnone.", 0 }, > { "port", 'p', "NUM", 0, "HTTP port to listen on, default 8002.", 0 }= , > @@ -411,6 +413,7 @@ static unsigned rescan_s =3D 300; > static unsigned groom_s =3D 86400; > static bool maxigroom =3D false; > static unsigned concurrency =3D std::thread::hardware_concurrency() ?: 1= ; > +static int connection_pool =3D 0; > static set source_paths; > static bool scan_files =3D false; > static map scan_archives; > @@ -557,6 +560,16 @@ parse_opt (int key, char *arg, > concurrency =3D (unsigned) atoi(arg); > if (concurrency < 1) concurrency =3D 1; > break; > + case 'C': > + if (arg) > + { > + connection_pool =3D atoi(arg); > + if (connection_pool < 2) > + argp_failure(state, 1, EINVAL, "-C NUM minimum 2"); > + } > + else // arg not given > + connection_pool =3D std::thread::hardware_concurrency() * 2 ?: 2= ; > + break; Aha, forget about my -C 1 corner case above then :) > case 'I': > // NB: no problem with unconditional free here - an earlier failed= regcomp would exit program > if (passive_p) > @@ -3684,7 +3698,13 @@ sigusr2_handler (int /* sig */) > } > =20 > =20 > - > +static void // error logging callback from libmicrohttpd internals > +error_cb (void *arg, const char *fmt, va_list ap) > +{ > + (void) arg; > + inc_metric("error_count","libmicrohttpd",fmt); > + (void) vdprintf (STDERR_FILENO, fmt, ap); > +} > =20 > =20 > // A user-defined sqlite function, to score the sharedness of the > @@ -3829,7 +3849,7 @@ main (int argc, char *argv[]) > =20 > // Start httpd server threads. Separate pool for IPv4 and IPv6, in > // case the host only has one protocol stack. > - MHD_Daemon *d4 =3D MHD_start_daemon (MHD_USE_THREAD_PER_CONNECTION > + MHD_Daemon *d4 =3D MHD_start_daemon ((connection_pool ? 0 : MHD_USE_TH= READ_PER_CONNECTION) > #if MHD_VERSION >=3D 0x00095300 > | MHD_USE_INTERNAL_POLLING_THREAD > #else > @@ -3839,8 +3859,11 @@ main (int argc, char *argv[]) > http_port, > NULL, NULL, /* default accept polic= y */ > handler_cb, NULL, /* handler callba= ck */ > + MHD_OPTION_EXTERNAL_LOGGER, error_c= b, NULL, > + (connection_pool ? MHD_OPTION_THREA= D_POOL_SIZE : MHD_OPTION_END), > + (connection_pool ? (int)connection_= pool : MHD_OPTION_END), > MHD_OPTION_END); > - MHD_Daemon *d6 =3D MHD_start_daemon (MHD_USE_THREAD_PER_CONNECTION > + MHD_Daemon *d6 =3D MHD_start_daemon ((connection_pool ? 0 : MHD_USE_TH= READ_PER_CONNECTION) > #if MHD_VERSION >=3D 0x00095300 > | MHD_USE_INTERNAL_POLLING_THREAD > #else > @@ -3851,6 +3874,9 @@ main (int argc, char *argv[]) > http_port, > NULL, NULL, /* default accept polic= y */ > handler_cb, NULL, /* handler callba= ck */ > + MHD_OPTION_EXTERNAL_LOGGER, error_c= b, NULL, > + (connection_pool ? MHD_OPTION_THREA= D_POOL_SIZE : MHD_OPTION_END), > + (connection_pool ? (int)connection_= pool : MHD_OPTION_END), > MHD_OPTION_END); Urgh, the use of varargs in MHD_start_daemon makes my head hurt. But it does look correct. > if (d4 =3D=3D NULL && d6 =3D=3D NULL) // neither ipv4 nor ipv6? boo > @@ -3904,6 +3930,8 @@ main (int argc, char *argv[]) > =20 > if (! passive_p) > obatched(clog) << "search concurrency " << concurrency << endl; > + obatched(clog) << "webapi connection pool " << connection_pool > + << (connection_pool ? "" : " (unlimited)") << endl; > if (! passive_p) > obatched(clog) << "rescan time " << rescan_s << endl; > obatched(clog) << "fdcache fds " << fdcache_fds << endl; > diff --git a/doc/ChangeLog b/doc/ChangeLog > index 7a73fa107bf3..f25bcd2ee2b3 100644 > --- a/doc/ChangeLog > +++ b/doc/ChangeLog > @@ -1,3 +1,8 @@ > +2021-12-08 Frank Ch. Eigler > + > + PR28661 > + * debuginfod.8 (-C): Document new flag. > + > 2021-11-05 Frank Ch. Eigler > =20 > PR28430 > diff --git a/doc/debuginfod.8 b/doc/debuginfod.8 > index 1e56f6568322..ee8e4078e5b5 100644 > --- a/doc/debuginfod.8 > +++ b/doc/debuginfod.8 > @@ -205,6 +205,23 @@ This important for controlling CPU-intensive operati= ons like parsing > an ELF file and especially decompressing archives. The default is the > number of processors on the system; the minimum is 1. > =20 > +.TP > +.B "\-C" "\-C=3DNUM" "\-\-connection\-pool" "\-\-connection\-pool=3DNUM" > +Set the size of the pool of threads serving webapi queries. The > +following table summarizes the interpretaton of this option and its > +optional NUM parameter. > +.TS > +l l. > +no option clone new thread for every request, no fixed pool > +\-C use a fixed thread pool sized automatically > +\-C=3DNUM use a fixed thread pool sized NUM, minimum 2 > +.TE > + > +The first mode is useful for friendly bursty traffic. The second mode > +is a simple and safe configuration based on the number of processors. > +The third mode is suitable for tuned load-limiting configurations > +facing unruly traffic. Thanks, the documentation looks pretty clear. Cheers, Mark