public inbox for elfutils@sourceware.org
From: Romain GEISSLER <romain.geissler@amadeus.com>
To: Mark Wielaard <mark@klomp.org>
Cc: "elfutils-devel@sourceware.org" <elfutils-devel@sourceware.org>,
	Francois RIGAULT <francois.rigault@amadeus.com>
Subject: Re: Performance issue with systemd-coredump and container process linking 2000 shared libraries.
Date: Tue, 20 Jun 2023 22:05:29 +0000
Message-ID: <754003F4-3708-4C9C-AA30-EB76DAECF059@amadeus.com>
In-Reply-To: <20230620213701.GN24233@gnu.wildebeest.org>

> On 20 June 2023 at 23:37, Mark Wielaard <mark@klomp.org> wrote:
> 
> Hi,
> 
> On Mon, Jun 19, 2023 at 05:08:50PM +0200, Mark Wielaard wrote:
> 
> So I made a mistake here, since I was testing on Fedora 38, which has
> DEBUGINFOD_URLS set. Without DEBUGINFOD_URLS set there is no big
> slowdown.
> 
> Do you have the DEBUGINFOD_URLS environment variable set?
> 
> The real sd-coredump will not have DEBUGINFOD_URLS set (I hope).
> 
> Thanks,
> 
> Mark

Hi,

Our real use case happens on an OpenShift 4.13 node, so the OS is Red Hat CoreOS 9 (which I assume shares a lot of its foundations with RHEL 9).

On our side, Francois also told me this afternoon that my reproducer posted here does not actually reproduce the same thing as the real systemd-coredump issue he witnessed live, and he noticed that with DEBUGINFOD_URLS unset or set to the empty string my reproducer no longer shows any problem. What he observed in the real case (using perf/gdb) was that a lot of time was apparently spent in elf_getdata_rawchunk, often in this kind of stack:

Samples: 65K of event 'cpu-clock:pppH', Event count (approx.): 16468500000                                                                                                                                 
Overhead  Command         Shared Object             Symbol                                                                                                                                                 
  98.24%  (sd-parse-elf)  libelf-0.188.so           [.] elf_getdata_rawchunk
   0.48%  (sd-parse-elf)  libelf-0.188.so           [.] 0x00000000000048a3
   0.27%  (sd-parse-elf)  libelf-0.188.so           [.] gelf_getphdr
   0.11%  (sd-parse-elf)  libc.so.6                 [.] _int_malloc
   0.10%  (sd-parse-elf)  libelf-0.188.so           [.] gelf_getnote
   0.06%  (sd-parse-elf)  libc.so.6                 [.] __libc_calloc
   0.05%  (sd-parse-elf)  [kernel.kallsyms]         [k] __softirqentry_text_start
   0.05%  (sd-parse-elf)  libc.so.6                 [.] _int_free


(gdb) bt
#0  0x00007f0ba8a88194 in elf_getdata_rawchunk () from target:/lib64/libelf.so.1
#1  0x00007f0ba98e5013 in module_callback.lto_priv () from target:/usr/lib64/systemd/libsystemd-shared-252.so
#2  0x00007f0ba8ae7291 in dwfl_getmodules () from target:/lib64/libdw.so.1
#3  0x00007f0ba98e6dc0 in parse_elf_object () from target:/usr/lib64/systemd/libsystemd-shared-252.so
#4  0x0000562c474f2d5e in submit_coredump ()
#5  0x0000562c474f57d1 in process_socket.constprop ()
#6  0x0000562c474efbf8 in main ()
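
For what it is worth, if I read the libelf sources correctly, elf_getdata_rawchunk() keeps every chunk it has ever returned for a given Elf handle in a linked list and walks that whole list on each call, so a per-module callback that reads data out of the core with it would become roughly quadratic once there are ~2000 modules. Below is a minimal sketch of that pattern (this is not the actual systemd module_callback; the "core" Elf handle passed via arg, the PT_LOAD lookup and the read size are illustrative assumptions only):

/* Sketch: dwfl_getmodules() invokes a callback once per module, and the
   callback reads bytes out of the core file with elf_getdata_rawchunk().  */
#include <libelf.h>
#include <gelf.h>
#include <elfutils/libdwfl.h>

static int
module_callback_sketch (Dwfl_Module *mod, void **userdata,
                        const char *name, Dwarf_Addr start, void *arg)
{
  (void) mod; (void) userdata; (void) name;
  Elf *core = arg;                      /* Elf handle of the core file.  */

  size_t phnum;
  if (elf_getphdrnum (core, &phnum) < 0)
    return DWARF_CB_ABORT;

  for (size_t i = 0; i < phnum; i++)
    {
      GElf_Phdr phdr_mem;
      GElf_Phdr *phdr = gelf_getphdr (core, (int) i, &phdr_mem);
      if (phdr == NULL || phdr->p_type != PT_LOAD)
        continue;
      if (start < phdr->p_vaddr || start >= phdr->p_vaddr + phdr->p_filesz)
        continue;

      /* Read the mapped ELF header of this module from the core.  Each
         call makes libelf walk the list of all previously returned
         chunks, so ~2000 modules mean ~2000 ever-longer list scans.  */
      Elf_Data *data =
        elf_getdata_rawchunk (core,
                              phdr->p_offset + (start - phdr->p_vaddr),
                              sizeof (GElf_Ehdr), ELF_T_BYTE);
      (void) data;
      break;
    }

  return DWARF_CB_OK;
}

Something like dwfl_getmodules (dwfl, module_callback_sketch, core, 0) would then hit that list scan once per shared library present in the core, which would match elf_getdata_rawchunk dominating the profile above. But this is only a guess on our side for now.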

My reproducer doesn't fully re-implement what systemd does (the parsing of the package metadata is clearly omitted), so I thought I had reproduced the same problem when apparently I hadn't; sorry for that. We will also have to double-check whether just using 2000 dummy libraries is enough, or whether a more complex binary like the one in our real case is also needed.

Tomorrow, on our side, we will play a bit with a local build of systemd-coredump and try to run it manually to better understand what is going wrong.


Note: when I wrote and tested my reproducer, I used a fedora:38 container, which doesn't have DEBUGINFOD_URLS set (this may differ from a real, non-containerized Fedora 38):

[root@7563ccfb7a39 /]# printenv|grep DEBUGINFOD_URLS
[root@7563ccfb7a39 /]# find /etc/profile.d/|grep debug
[root@7563ccfb7a39 /]# cat /etc/os-release
NAME="Fedora Linux"
VERSION="38 (Container Image)"

Cheers,
Romain

