public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Dodji Seketeli <dodji@seketeli.org>
To: Giuliano Procida <gprocida@google.com>
Cc: Dodji Seketeli via Libabigail <libabigail@sourceware.org>
Subject: Re: [PATCH 2/2, RFC] Allow restricting analyzed decls to exported symbols
Date: Mon, 19 Sep 2022 11:34:51 +0200	[thread overview]
Message-ID: <86zgevd7wk.fsf@seketeli.org> (raw)
In-Reply-To: <CAGvU0Hn8CQkMy7HGB6Caq4JwaKa2KPB-cG+wyuVCghnCK-hFZg@mail.gmail.com> (Giuliano Procida's message of "Fri, 9 Sep 2022 14:03:49 +0100")

Hello,

Giuliano Procida <gprocida@google.com> a écrit:

> Hi Dodji.
>
> Sorry for the late reply. I was down with Covid for a while.

No problem.  I hope you are doing fine now.

[...]


> My understanding is that the intention here is to make the DWARF
> reader do less work (look at fewer type DIEs) than at present.

I'd rather say, to make it do less unnecessary work by default.

It first look at *interface* DIEs.  And then it walks the graph of node
to reach the type nodes that are connected.  If it first encounters a
type DIE, it ignores it.  I will reach that DIE only if it's connected
to an interface.

> We are actually hoping that we may be able to make the DWARF reader
> look at more type DIEs so that it is more likely to pick up full
> definitions of types instead of declarations.

If the definitions are not "used" by the interfaces, then by default,
why bother, as far as ABI is concerned (not necessarily APIs)?  But in
any case, if you really want to look at type definitions even those
that are not strictly connected to definitions, you still can.  Just
don't use the new option, I'd say.

Said otherwise I don't think this change will inherently look at less actually used
type definitions.  If it does, then it's a bug and it ought to be fixed.

>
> The rationale behind the change appears to be that DWARF processing is
> expensive, in particular for kernel ABIs. I would say "measure first".

I've profiledof the code and what I think I am seeing is that we are
just looking at too many DIEs that incurs a lot of comparison at
de-duplication (canonicalization) time.

But I believe the heuristics I am using to speed up comparison can be
improved.  They are just taking me time.  So I figured being able to
avoid a significant number of comparisons to begin with was a somewhat
lower hanging fruits, at least for me to be able to release 2.1.

I'll keep on looking at ways to ameliorate the DIE canonicalization even
more in the future, after 2.1.

> Here's roughly how I think about things:
>
> 1. building the IR is very cheap

Well, that's not what I see, when we have lots of DIEs that are built
'unnecessarily', and that would have to be compared to be de-duplicated.
The reason why I am de-duplicating some types (not all of DIE kinds) at
the DIE level is because I have seen over time that it significantly
drops the size and time of de-duplication at the IR level.  To see that
for yourself, it's quite easy, I believe, to disable DIE de-duplication
and run, says, "abidw --noout" on a sizeable binary.

> A kernel ABI may end up with 40k IR elements. The cost of allocating
> memory and calling constructors should be negligible. Any improvements
> to this end of things is pointless.

Again, you can just disable DIE canonicalization in the code and run
abidw --noout on vmlinux to see for yourself how things degrade in
practise.

> 2. reading DWARF information is fairly cheap
>
> We may have 100MB of DWARF but just reading the data (decoding
> attribute formats in particular) won't take that long.
>
> Reducing the number of DIEs examined at the top-level by a factor of 2
> will speed up this part by a factor of 2, but in the grand scheme of
> things that may not be very important.

Indeed.  But I guess the gain might depends on the kind of nodes that
got dropped.  If the nodes are for types that participate in quadratic
comparisons, then the gain might be higher.

But I agree that all in all, what I am seeing is indeed a linear gain in
general.

It might not be very important in the grand scheme, but in practise,
I'll take it :-)

I understand though, that I still need to keep working on this to find
ways to come up with better de-duplication schemes.  That would be
definitely for after 2.1 that I really want to see out now.  It's
overdue.

> 3. chasing references is a bit more expensive
>
> Cross-references in DWARF are pretty common and the lack of locality
> means that chasing cross-references is going to be a constant factor
> slower than iterating through the main DWARF tree.
>
> 4. deciding whether a DIE needs to be turned into IR is currently very expensive
>
> This is because it involves multiple look-ups and recursive comparison
> of DIEs which cannot be unconditionally memoised.

A.k.a de-duplication / canonicalization.

> Those are only my thoughts. Some profiling should give a more accurate picture.
>
> I was curious, so I did an analysis of the connectivity of a kernel
> ABI (using the STG IR, not libabigail's - there are minor
> differences). Here are some fun facts.
>
> The ABI has 34541 nodes.
> There are 25196 strongly-connected components.
> 25053 SCCs are just singleton nodes.
> The largest 3 SCCs have sizes: 4960, 784, 343.
> 1/7 of the ABI nodes are in one SCC!
>
> Completely idle speculation: Perhaps the really huge SCC contributes
> significantly to comparison cost.

I don't think we are saying different things.

Thanks for the comments.

[...]

Cheers,

-- 
		Dodji

  parent reply	other threads:[~2022-09-19  9:34 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-06 10:07 [PATCH 0/2, RFC] Speed up type DIEs canonicalization Dodji Seketeli
2022-09-06 10:10 ` [PATCH 1/2, RFC] dwarf-reader: Revamp the canonical type DIE propagation algorithm Dodji Seketeli
2022-09-06 10:11 ` [PATCH 2/2, RFC] Allow restricting analyzed decls to exported symbols Dodji Seketeli
2022-09-09 13:03   ` Giuliano Procida
2022-09-09 20:32     ` Ben Woodard
2022-09-10 10:56       ` Giuliano Procida
2022-09-19  9:34     ` Dodji Seketeli [this message]
2022-09-06 10:13 ` [PATCH 0/2, RFC] Speed up type DIEs canonicalization Dodji Seketeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86zgevd7wk.fsf@seketeli.org \
    --to=dodji@seketeli.org \
    --cc=gprocida@google.com \
    --cc=libabigail@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).