From: Milian Wolff
To: "elfutils-devel@sourceware.org"
Subject: Can dwarf_getscopes{,_die} performance be improved?
Date: Sat, 13 Jun 2020 19:40:06 +0200
Message-ID: <2050295.fxRyV3M0rs@milian-workstation>

Hey all!

In perfparser we are running into a performance issue with elfutils when we try to resolve inline frames. We are following the procedure outlined by eu-addr2line, i.e.
for every IP we basically do:

```
Dwarf_Addr bias = 0;
Dwarf_Die *cudie = dwfl_module_addrdie(module, ip, &bias);

// Find the innermost function-like scope that contains the address.
Dwarf_Die *subroutine = nullptr;
Dwarf_Die *scopes = nullptr;
int nscopes = dwarf_getscopes(cudie, ip - bias, &scopes);
for (int i = 0; i < nscopes; ++i) {
    Dwarf_Die *scope = &scopes[i];
    const int tag = dwarf_tag(scope);
    if (tag == DW_TAG_subprogram || tag == DW_TAG_inlined_subroutine) {
        subroutine = scope;
        break;
    }
}

// Walk the scopes enclosing that DIE to collect the inlined frames.
Dwarf_Die *scopes_die = nullptr;
int nscopes_die = dwarf_getscopes_die(subroutine, &scopes_die);
for (int i = 0; i < nscopes_die; ++i) {
    Dwarf_Die *scope = &scopes_die[i];
    const int tag = dwarf_tag(scope);
    if (tag == DW_TAG_subprogram || tag == DW_TAG_inlined_subroutine) {
        // do stuff
    }
}

free(scopes_die);
free(scopes);
```

Profiling shows that calls to both dwarf_getscopes and dwarf_getscopes_die can take a really long time. I have seen cases (in libclang.so.11) where a single call can take up to ~50ms on a fast desktop machine (Ryzen 3900X CPU, fast SSD, 32GB of RAM).

Now while 50ms may not sound too problematic, we have to repeat these calls for every IP we encounter in a perf.data file. We already apply heavy caching, but even then we can easily encounter tens of thousands of individual addresses, which can add up to minutes or even hours of time required to process the data - only to get information on inlined frames.

I have learned that the DWARF format is mostly meant for efficient storage and that one cannot assume that it is efficient for such mass post-processing. This is the reason why I'm writing this email:

Does anyone have an idea how to post-process the DWARF data to optimize the lookup of inlined frames?

Or is there some room for optimizations / caching within elfutils to amortize the repeated DWARF hierarchy walking that happens when one calls dwarf_getscopes{,_die}? From what I understand, both calls will start a top-down search within the CU DIE via __libdw_visit_scopes.
Once a match is found, the DIE scope chain is reported. I guess many times (parts of the) DIE scope chain will be shared across different IPs, but I don't see any way to leverage this to speed up the processing task.

Thanks, any input would be welcome

--
Milian Wolff
mail@milianw.de
http://milianw.de
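
To make the "shared scope chain" idea above concrete, here is a rough sketch of one possible amortization: walk the DIE tree of a CU once, record every function-like scope with its address range and parent, and then answer each IP with a binary search plus a walk up the parent links. Everything in it is hypothetical and not elfutils API: `ScopeNode`, `ScopeIndex` and `die_offset` are stand-ins invented for illustration. It also assumes that each scope has a single contiguous range and that child ranges nest strictly inside their parent's range, which real DWARF (DW_AT_ranges, partial inlining) does not guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical record for one DW_TAG_subprogram / DW_TAG_inlined_subroutine
// scope, as a single pass over the CU's DIE tree could collect it.
struct ScopeNode {
    std::uint64_t low;         // first address covered (inclusive)
    std::uint64_t high;        // end address (exclusive)
    int parent;                // index of the enclosing scope, -1 for none
    std::uint64_t die_offset;  // stand-in for whatever identifies the DIE
};

// Hypothetical per-CU index: built once, queried for every IP.
class ScopeIndex {
public:
    explicit ScopeIndex(std::vector<ScopeNode> nodes)
        : nodes_(std::move(nodes)), order_(nodes_.size())
    {
        for (const ScopeNode &n : nodes_)
            assert(n.parent < static_cast<int>(nodes_.size()));

        // Sort indices by start address; for equal starts the wider
        // (outer) scope comes first, so the innermost one sorts last.
        std::iota(order_.begin(), order_.end(), 0);
        std::sort(order_.begin(), order_.end(), [this](int a, int b) {
            const ScopeNode &x = nodes_[a];
            const ScopeNode &y = nodes_[b];
            return x.low != y.low ? x.low < y.low : x.high > y.high;
        });
    }

    // Returns the scope chain for ip, innermost scope first.
    // An empty result means no recorded scope covers ip.
    std::vector<std::uint64_t> lookup(std::uint64_t ip) const
    {
        std::vector<std::uint64_t> chain;

        // Find the first scope that starts after ip; its predecessor is
        // the last scope starting at or before ip.
        auto it = std::upper_bound(
            order_.begin(), order_.end(), ip,
            [this](std::uint64_t addr, int idx) { return addr < nodes_[idx].low; });
        if (it == order_.begin())
            return chain;

        // Under the nesting assumption, the innermost scope containing ip
        // is that predecessor or one of its ancestors.
        int idx = *(it - 1);
        while (idx >= 0 && !(nodes_[idx].low <= ip && ip < nodes_[idx].high))
            idx = nodes_[idx].parent;

        // Report the remaining parent links as the shared scope chain.
        for (; idx >= 0; idx = nodes_[idx].parent)
            chain.push_back(nodes_[idx].die_offset);
        return chain;
    }

private:
    std::vector<ScopeNode> nodes_;
    std::vector<int> order_;
};
```

With an index like this, the cost of the tree walk would be paid once per CU instead of once per address. Whether that is practical depends on how often the nesting assumption holds in real debug information and on how much memory such an index may use for large CUs.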