From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gnu.wildebeest.org (gnu.wildebeest.org [45.83.234.184]) by sourceware.org (Postfix) with ESMTPS id E16343858D38 for ; Mon, 19 Jun 2023 15:08:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E16343858D38 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=klomp.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=klomp.org Received: from r6.localdomain (82-217-174-174.cable.dynamic.v4.ziggo.nl [82.217.174.174]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by gnu.wildebeest.org (Postfix) with ESMTPSA id A1B643023089; Mon, 19 Jun 2023 17:08:51 +0200 (CEST) Received: by r6.localdomain (Postfix, from userid 1000) id 1367E340311; Mon, 19 Jun 2023 17:08:51 +0200 (CEST) Message-ID: Subject: Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries. From: Mark Wielaard To: Romain Geissler , elfutils-devel@sourceware.org Date: Mon, 19 Jun 2023 17:08:50 +0200 In-Reply-To: <20230614133920.10-1-romain.geissler@amadeus.com> References: <20230614133920.10-1-romain.geissler@amadeus.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.48.3 (3.48.3-1.fc38) MIME-Version: 1.0 X-Spam-Status: No, score=-3028.2 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,RCVD_IN_BARRACUDACENTRAL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Romain, Just to let you know I am looking at this. But haven't made much progress in understanding it yet. Thanks so much for the reproducer. I have been able to see the (very slow) parsing of the core file with it. $ time ./mimic-systemd-coredump [...] real 3m35.965s user 0m0.722s sys 0m0.345s Note however that a lot of time is "missing". And in fact running it again is fast!?! $ time ./mimic-systemd-coredump real 0m0.327s user 0m0.272s sys 0m0.050s This is because of the kernel inode/dentry cache. If I do $ echo 2 | sudo tee /proc/sys/vm/drop_caches before running ./mimic-systemd-coredump it is always slow. I'll try to figure out what we do to make it so hard for the kernel to do these lookups. But that doesn't invalidate the other observation you made, that the dwfl_module_get_elf call always returns NULL. > My understanding of the will of systemd developers is that they hoped tha= t libdwfl would > return some "partial" Elf* reference when calling dwfl_module_getelf, fro= m the elf > headers found in the core for each and every shared library (the first pa= ge of the > PT_LOAD mappings that the kernel always dumps even when the mapping is fi= le backed). Right, that is a reasonable hope. And I don't actually know why it always fails in this case. > However it seems that behind the hood it doesn't (is it linked to core_fi= le_read_eagerly > which seems to always return false in this case ?), and instead it uses t= he > .find_elf =3D dwfl_build_id_find_elf callback which tries to find the fil= e by buildid > on the filesystem. For some unknown reason to me, calling dwfl_module_get= elf is very > slow (I wouldn't expect that looking on the filesytem by buildid is that = slow actually). Apparently we do it in some really slow way if the inodes/dentries aren't in kernel cache (and the files are not actually on disk). Which does bring up the question why systemd-coredump isn't running in the same mount space as the crashing program. Then it would simply find the files that the crashing program is using. Or it might install a .find_elf callback that (also) looks under /proc/pid/root/ ? > So, is this behavior of dwfl_module_getelf expected ? If yes, do you agre= e that we shall > advise systemd-coredump developer to invert their logic, to first try to = look for partial > elf header from the core's PT_LOAD section, then only fallback to the mor= e reliable > dwfl_module_getelf if it didn't work ? In practice, we have tried the fol= lowing patch > applied to systemd v253 and it seems ot "fix" the above mentionned case: I don't think dwfl_module_getelf should always return NULL in this case. Nor should it be this slow. But given that it does and given that it is slow that is certainly reasonable advise. > Some other side question: on the long run, wouldn't it make sense that el= futils tries to parse the > json package metadata section by itself, just like it does for the buildi= d, rather than implementing > this logic in systemd ? Maybe we could provide this functionality. You are right that we have no problem getting the build-ids with $ eu-unstrip --core=3D./the-core -n So providing some other "static data" might be possible with a simpler interface. Thanks for this extensive bug report and reproducer. I play some more with it to hopefully get you some real answers/fixes. Cheers, Mark