From: Dodji Seketeli
To: libabigail@sourceware.org
Cc: gprocida@google.com
Subject: [PATCH 0/2, RFC] Speed up type DIEs canonicalization
Organization: Red Hat / France
Date: Tue, 06 Sep 2022 12:07:53 +0200
Message-ID: <871qsoke86.fsf@redhat.com>

Hello,

This patch set is a response to the problem entitled "abidw
performance regression on vmlinux", reported at
https://sourceware.org/bugzilla/show_bug.cgi?id=29464.

In that problem report, "abidw --noout vmlinux" would take more than
an hour.

Before that performance regression, type DIE canonicalization by the
DWARF reader was fast, basically because it was wrong. During
canonicalization, the reader was using a speed optimization called
"canonical type propagation". I argue that the optimization was being
done too eagerly, meaning it was being done even in cases where it
should not have been, often leading to wrong (but very fast) outcomes.

The patch that changed this behavior was:

    commit 7ecef6361799326b99129a479b43b138f0b237ae
    Author: Dodji Seketeli
    Date:   Mon Apr 4 15:35:48 2022 +0200

        Canonicalize DIEs w/o assuming ODR & handle typedefs transparently

That patch was mostly meant to affect only C++, Ada and Java binaries.

But then, the patch stopped the reader from doing canonical type
propagation on types that are not aggregates (pointers, typedefs,
etc). Doing type propagation on types that are not aggregates is
wrong because it can lead to cases where two "pointers to aggregate"
are considered equal when, in the end, the aggregates are different.
This can happen, for instance, on aggregates that are detected as
redundant. Please note that these terms (redundant, canonical type
propagation, etc) are defined in the introductory comments of the
first patch below.

Anyway, that patch restricted canonicalization and canonical type
propagation to aggregate types. It also restricted the amount of
canonical type propagation performed, because there are cases where
it should not be performed.

I believe that these changes led to many more type DIE comparisons
being performed, even for C programs. On libraries, where type trees
tend to be less deep, comparison time didn't explode. But on some
applications like the Linux Kernel, with deep type trees containing
lots of redundant types, the number of DIE comparisons exploded.
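
To give a feel for why deep type trees with redundant types hurt so much, here is a toy model, not libabigail code. ToyType, build, equal and count_comparisons are all hypothetical names invented for this illustration. A set of "already known equal" pairs stands in for "both types already have the same canonical type". The model only counts how many structural comparisons are needed with and without that shortcut.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Toy stand-in for a type DIE: an aggregate whose members refer to sub-types.
struct ToyType {
  std::vector<const ToyType*> members;
};

using TypePair = std::pair<const ToyType*, const ToyType*>;

// Build a chain of aggregates `depth` levels deep.  Each level has two members
// that refer to the same next-level type, i.e. a redundant sub-type.
const ToyType* build(int depth, std::vector<std::unique_ptr<ToyType>>& pool) {
  pool.emplace_back(new ToyType());
  ToyType* t = pool.back().get();
  if (depth > 0) {
    const ToyType* sub = build(depth - 1, pool);
    t->members = {sub, sub};
  }
  return t;
}

// Structural comparison.  When `known_equal` is non-null, pairs that already
// compared equal are answered without descending again; this plays the role
// of "both sides share a canonical type" in this toy model.
bool equal(const ToyType* a, const ToyType* b, std::uint64_t& comparisons,
           std::set<TypePair>* known_equal) {
  const TypePair key(a, b);
  if (known_equal && known_equal->count(key))
    return true;
  ++comparisons;  // one real structural comparison
  if (a->members.size() != b->members.size())
    return false;
  for (std::size_t i = 0; i < a->members.size(); ++i)
    if (!equal(a->members[i], b->members[i], comparisons, known_equal))
      return false;
  if (known_equal)
    known_equal->insert(key);
  return true;
}

// Compare two separately built trees and return the number of structural
// comparisons that were needed.
std::uint64_t count_comparisons(int depth, bool remember_equal_pairs) {
  std::vector<std::unique_ptr<ToyType>> pool;
  const ToyType* a = build(depth, pool);
  const ToyType* b = build(depth, pool);
  std::set<TypePair> known_equal;
  std::uint64_t comparisons = 0;
  const bool same = equal(a, b, comparisons,
                          remember_equal_pairs ? &known_equal : nullptr);
  assert(same);
  (void)same;
  return comparisons;
}
```

In this model, count_comparisons(20, false) performs 2097151 structural comparisons while count_comparisons(20, true) performs 21. The real reader is of course far more involved; the point is only that the cost grows exponentially with depth once the shortcut is unavailable.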

The first patch of the series below tracks all the types that are
subject to the canonical type propagation optimization and tries to
apply that optimization to as many types as possible. That makes
comparisons as fast as possible, as many types now have a canonical
type. But whenever it detects that the optimization should not have
taken place, it cancels it on all the types where it happened. This
is like what we already do when canonicalizing IR types. This brought
the comparison time of vmlinux below 5 minutes (from more than an
hour).

The second patch restricts the analysis of DIEs to the interfaces that
have exported symbols. Otherwise, several other kinds of interfaces
were being analyzed at the DIE level. It was only later that the IR
of the exported interfaces was "chosen" to be put into the final ABI
corpus. That led to potentially "too many" DIEs being analyzed in the
first place, leading to many comparisons in the case of huge
applications like vmlinux. I believe this brings the time to under a
minute or so when analyzing vmlinux.

That second patch thus introduces a new "--exported-interfaces-only"
option supported by the tools abidw, abidiff, abipkgdiff, kmidiff,
etc. That option triggers the new mode where only exported interfaces
are analyzed at the DIE level. Note that when looking at the Linux
Kernel, that option is enabled by default.

It also introduces a new "--allow-non-exported-interfaces" option for
those tools. Note that this option is enabled by default when looking
at any binary that is not the Linux Kernel.
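
The "propagate, track, and cancel if needed" idea of the first patch can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the code of the patch: Type, Canonicalizer and demo_rollback are hypothetical names, and the sketch cancels every propagation done during a failed comparison, which is cruder than what a real reader would want.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy type node; `canonical` is null until one is assigned.
struct Type {
  explicit Type(std::string n) : name(std::move(n)) {}
  std::string name;
  std::vector<Type*> members;
  Type* canonical = nullptr;
};

class Canonicalizer {
 public:
  // Try to give `l` and its sub-types the canonical types found under `r`.
  // Propagations are recorded; if the comparison fails they are all undone.
  bool canonicalize(Type* l, Type* r) {
    propagated_.clear();
    in_progress_.clear();
    if (compare(l, r))
      return true;                    // speculation confirmed, keep everything
    for (Type* t : propagated_)
      t->canonical = nullptr;         // cancel the propagation
    propagated_.clear();
    return false;
  }

 private:
  bool compare(Type* l, Type* r) {
    if (l->canonical)
      return l->canonical == r;       // fast path enabled by propagation
    const std::pair<Type*, Type*> key(l, r);
    if (in_progress_.count(key))
      return true;                    // recursive type: assume equal for now
    in_progress_.insert(key);
    bool eq = l->name == r->name && l->members.size() == r->members.size();
    for (std::size_t i = 0; eq && i < l->members.size(); ++i)
      eq = compare(l->members[i], r->members[i]);
    in_progress_.erase(key);
    if (eq) {
      l->canonical = r;               // propagate the canonical type
      propagated_.push_back(l);       // ... and remember that we did
    }
    return eq;
  }

  std::set<std::pair<Type*, Type*>> in_progress_;
  std::vector<Type*> propagated_;
};

// Usage sketch: A{B, int} vs canonical A{B, long}, where B refers back to A.
// B first compares equal (under the assumption that the two A's are equal),
// then the int/long mismatch shows that assumption was wrong.
// Returns true when B's tentative canonical type was rolled back.
bool demo_rollback() {
  Type a("A"), b("B"), i("int");
  Type ca("A"), cb("B"), cl("long");
  a.members = {&b, &i};
  b.members = {&a};
  ca.members = {&cb, &cl};
  cb.members = {&ca};
  Canonicalizer c;
  const bool equal = c.canonicalize(&a, &ca);
  return !equal && b.canonical == nullptr;
}
```

Without the cancellation step, B would keep a canonical type that it only earned because the enclosing comparison was assumed to succeed, which is the kind of wrong-but-fast outcome described above.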

Dodji Seketeli (2):
  dwarf-reader: Revamp the canonical type DIE propagation algorithm
  Allow restricting analyzed decls to exported symbols

 doc/manuals/Makefile.am | 3 +-
 doc/manuals/abidiff.rst | 52 +
 doc/manuals/abidw.rst | 82 +-
 doc/manuals/abipkgdiff.rst | 51 +
 doc/manuals/kmidiff.rst | 52 +-
 doc/manuals/tools-use-libabigail.txt | 16 +
 include/abg-ir.h | 9 +
 src/abg-dwarf-reader.cc | 1524 ++-
 src/abg-ir-priv.h | 11 +
 src/abg-ir.cc | 36 +
 .../data/test-annotate/test14-pr18893.so.abi | 52 +-
 .../data/test-annotate/test15-pr18892.so.abi | 11386 ++++++++--------
 .../data/test-annotate/test17-pr19027.so.abi | 1224 +-
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 11323 +++++++--------
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi | 7194 +++++-----
 ...x86_64--2.24.2-30.fc30.x86_64-report-0.txt | 2 +-
 .../test-read-dwarf/PR22122-libftdc.so.abi | 893 +-
 .../test-read-dwarf/test-libandroid.so.abi | 8 +
 .../test-read-dwarf/test14-pr18893.so.abi | 38 +-
 .../test-read-dwarf/test15-pr18892.so.abi | 10901 ++++++++-------
 .../test-read-dwarf/test16-pr18904.so.abi | 676 +-
 .../test-read-dwarf/test17-pr19027.so.abi | 1216 +-
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 11153 +++++++--------
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi | 7164 +++++-----
 .../test22-pr19097-libstdc++.so.6.0.17.so.abi | 32 +-
 .../test9-pr18818-clang.so.abi | 127 +-
 tools/abidiff.cc | 12 +
 tools/abidw.cc | 15 +
 tools/abipkgdiff.cc | 21 +
 tools/kmidiff.cc | 12 +
 30 files changed, 33868 insertions(+), 31417 deletions(-)
 create mode 100644 doc/manuals/tools-use-libabigail.txt

--
2.37.2

--
		Dodji