[PATCH 1/2, RFC] dwarf-reader: Revamp the canonical type DIE propagation algorithm

public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed

From: Dodji Seketeli <dodji@seketeli.org>
To: Dodji Seketeli via Libabigail <libabigail@sourceware.org>
Cc: gprocida@google.com, Dodji Seketeli <dodji@redhat.com>
Subject: [PATCH 1/2, RFC] dwarf-reader: Revamp the canonical type DIE propagation algorithm
Date: Tue, 06 Sep 2022 12:10:50 +0200	[thread overview]
Message-ID: <87wnagizit.fsf@seketeli.org> (raw)
In-Reply-To: <871qsoke86.fsf@redhat.com> (Dodji Seketeli via Libabigail's message of "Tue, 06 Sep 2022 12:07:53 +0200")

[-- Attachment #1: Type: text/plain, Size: 12385 bytes --]

Hello,

During the type DIEs comparison for canonicalization we need to
perform the optimization called "canonical type propagation".  This is
very similar to what we do for type canonicalization at the IR level.
It's described in-extenso in abg-ir.cc, in the comment that starts
with "@defgroup OnTheFlyCanonicalization".

Basically during canonicalization, suppose we end-up comparing a
sub-type pair {Sl, Sr}.  Suppose Sr does already have a canonical
type.  Suppose also, that {Sl, Sr} ends up comparing equal.  We can
thus deduce that Sl would have the same canonical type as Sr.  So we
'propagate' the canonical type of Sr onto Sl.  Sl is said to have been
canonical-type-propagated from Sr.

Now, I need to introduce the concept of redundant type.

Consider an aggregate type T and its sub-types ST0, ST1 and ST2 which
looks like this:

	T
	+-- ST0
	|
	+-- ST1
	|    +
	|    |
	|    +-- T
	|
	+-- ST2

Notice how the sub-type ST1 itself has a sub-type T.

Now, consider the type T', similar to T, which looks like:

	T'
	+-- ST0'
	|
	+-- ST1'
	|    +
	|    |
	|    +-- T'
	|
	+-- ST2'
	|
	+-- ST'3

Now consider how libabigail compares those two types T and T'
member-wise, aka structurally:

		T                 T'
		+-- ST0           +-- ST0'
		|                 |
		+-- ST1           +-- ST1' <--- Let's call this point #0.
		|    +            |    +
		|    |            |    |
		|    +-- T        |    +-- T'  <--- Let's call this point #1.
		|                 |                 Note that the sub-types of
		+-- ST2           +-- ST2'          T and T' are not
		|                                   represented at this point
		+-- ST'3                            but they do exist!
						    Representing them would lead
						    to a never-ending graph, right?

The properties of T are compared against the properties of T', but to
complete that comparison, the sub-type ST0 is compared against ST0',
etc.  A comparison stack is thus maintained.  Each member of the stack
is the pair of sub-types being compared.  That content changes as
different sub-types are compared. Let's consider the content of the
stack when we reach the point #0 in the graph above. That stack
content would look like:

	[{T,T'} | {ST0, ST0'}]

If we keep going and reach the point #1, the content of the stack
becomes:

	[{T,T'} |{ST0, ST0'} | {T, T'}]

At this point, we see that {T,T'} appears twice in the stack. If we
keep going and explore again the sub-types of {T,T'}, an infinite loop
appears.  The pair {T,T'} is said to be redundant.

To avoid those infinite loops, redundancy detection is done by
compare_dies in abg-dwarf-reader.cc.  When compare_dies detects at #1 that
{T,T'} was already present in the stack then it returns "true" early,
as if in #1, T compares equal to T'.

But then what does that mean for the value of comparing {ST0,ST0'}?
The sub-type {T,T'} in #1 compared equal, so compare_dies assumes that
{ST0,ST0'} compares equal as well, right?  That is what libabigail
used to assume before the commit

	commit 7ecef6361799326b99129a479b43b138f0b237ae
	Author: Dodji Seketeli <dodji@redhat.com>
	Date:   Mon Apr 4 15:35:48 2022 +0200

	    Canonicalize DIEs w/o assuming ODR & handle typedefs transparently

Well, it turns out we need to compare the entire tree of sub-types of
{T,T'} until we reach {ST2, ST2'}.  At that point, compare_dies sees
that ST2' has a sub-type ST3' where ST2 has none.  So {ST2,ST2'}
compares /different/.  So {T,T'} compares different.  So back to #0,
because {ST1,ST2'} has {T,T'} as sub-type (or said differently,
{ST1,ST2'} depends on the redundant pair {T,T'}) then {ST1,ST1'}
compares different.

So {ST1,ST1'} compares different even though it were initially thought to
compare equal because compare_dies had to get out early in #1,
considering that {T,T'} compared equal at that point.

Now, there are two ways to operate when we reach #1:

1/ Avoid performing canonical type propagation as soon as we detect
that {T,T'} is redundant.  That avoidance lasts until we finish
comparing {ST2,ST2'}, that is, until the complete comparison of
{T,T'}.  That means hat comparing every single sub-type is almost
assured to be done structurally, rather than canonically.

2/ Speculate that {T,T'} compare equal, unless proved otherwise.  If
{T,T'} proves to be equal, then things are well and fast.  If they
prove different, then {ST0,ST0'} must be edited afterwards to cancel
the canonical type propagation that might have taken place.  In other
words, sub-types that depends on redundant types pairs must be tracked
until the comparison of those redundant type pairs is done.  If a
redundant type pair compares different then all the sub-types that
depend on it must be walked to have their propagated canonical types
erase as if no canonical type propagation ever took place.

The first approach is the one I initially put in place with the patch
referred to above.  It proved to be super slow.  Analyzing the kernel
was taking literally hours.  Oops.

This patch implements the second approach.  It's more involved than
the first approach.  But it brings the time down to around 2 minutes
on a reasonably fast machine.  This is still slower than what I would
like to see, but it's way better than what had with the first
approach.

A subsequent patch will bring the analysis time for the kernel further
down.  But in the mean time, this one is really needed, I think.

So the patch introduces a new type named offset_pairs_stack_type to
track the pairs of type DIEs being compared.  Each DIE is now
identified by a new offset_type type.  That type  contains the
traditional DIE offset using the Dwarf_Off type from elfutils, but it
also contains the source of that DIE.  The offset_pairs_stack_type
also tracks redundant type pairs and their dependant type pairs.

compare_dies is modified to return a value that is an instance of the
new enum comparison_result.  The results can be
COMPARISON_RESULT_EQUAL when a type pair compares equal,
COMPARISON_RESULT_DIFFERENT when a type pair compares different,
COMPARISON_RESULT_CYCLE_DETECTED when a redundant type is detected,
leading to a comparison cycle, and COMPARISON_RESULT_UNKNOWN when the
outcome of the comparison cannot be known because we are comparing a
pair that depends on a redundant pair.

A new function return_comparison_result is introduced.  It's intended
to be invoked right before returning from compare_dies.  It looks at
the actual comparison result and depending on the state of the stack
of type pairs being compared, handles the book keeping of redundant
types and their dependant types.  It also handles when to propagate
canonical types if necessary and when to cancel the canonical types
that might have been propagated.

The ABG_RETURN macro has been adapted to invoke
return_comparison_result before returning out from compare_dies.

	* src/abg-ir-priv.h (enum comparison_result): Define new enum.
	* src/abg-dwarf-reader.cc (type_comparison_result_to_be_cached)
	(maybe_cache_type_comparison_result)
	(get_cached_type_comparison_result)
	(maybe_get_cached_type_comparison_result)
	(maybe_propagate_canonical_type, propagate_canonical_type)
	(return_comparison_result): Define new static functions.
	(has_offset_pair, insert_offset_pair, erase_offset_pair)
	(have_offset_pair_in_common): Remove static functions.
	(read_context::die_comparison_visits_): Remove data member.  The
	concept supported by this data member is now replaced by caching
	the results of comparing aggregate types, especially those that
	are not yet canonicalized.  This essentially prevents the same
	aggregate type pair to be compared again and again.
	(read_context::{die_comparison_results_, compare_count_,
	canonical_propagated_count_, cancelled_propagation_count_}): New
	data members.
	(read_context::initialize): Initialize the new data members
	compare_count_, canonical_propagated_count_,
	cancelled_propagation_count_ of integral type.
	(read_context::{erase_canonical_die_offset}): New member
	functions.
	(struct offset_pairs_stack_type): Define new type.
	(die_offset): Remove.
	(is_canon_type_to_be_propagated_tag): Add union types to the set
	of types for which canonical type propagation might occur.
	(is_type_die_to_be_canonicalized): Add function types and array
	types to the types to be canonicalized.
	(ABG_RETURN): Change this macro to consider
	COMPARISON_RESULT_DIFFERENT rather  than the "false" boolean.
	Also, it uses the new return_comparison_result function.
	(ABG_RETURN_FALSE): Likewise, use the new return_comparison_result
	function.
	(SET_RESULT_TO_FALSE): Make this return
	COMPARISON_RESULT_DIFFERENT.
	(SET_RESULT_TO, RETURN_IF_COMPARISON_CYCLE_DETECTED): Define new
	macros.
	(compare_dies): Make this return comparison_result rather than
	just a bool.  This is also the core of the overhaul of the
	canonical DIE propagation algorithm.  The algorithm is now similar
	to the one implemented in the equals function for class_or_union
	types in abg-ir.cc.  It's described in the comment that starts
	with '@defgroup OnTheFlyCanonicalization' in abg-ir.cc.  The type
	of the aggregates_being_compared parameter is now
	offset_pairs_stack_type in parameter.  No more need for the
	redundant_aggregates_being_compared parameter.  The new
	offset_type that also encapsulates the source of the offset is now
	used in lieu of the Dwarf_Off type.  Results of comparing
	aggregates being compared are now cached.  When comparing
	aggregates, use the RETURN_IF_COMPARISON_CYCLE_DETECTED to return
	early if a cycle is detected.  The invocation of the ABG_RETURN
	macro (especially the call to return_comparison_result) is where
	the book keeping for canonical types propagation takes place, so
	remove the explicit code that was handling that from the end of
	this function.
	(read_debug_info_into_corpus): Print statistics about the number
	of aggregates compared and canonical-type-propagated.
	* tests/data/test-annotate/test14-pr18893.so.abi: Adjust.
	* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
	* tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-diff-pkg/GtkAda-gl-2.24.2-29.fc29.x86_64--2.24.2-30.fc30.x86_64-report-0.txt:
	Likewise.
	* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
	* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
	* tests/data/test-read-dwarf/test14-pr18893.so.abi: Likewise.
	* tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise.
	* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
	* tests/data/test-read-dwarf/test17-pr19027.so.abi: Likewise.
	* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc                       |  1371 +-
 src/abg-ir-priv.h                             |     9 +
 .../data/test-annotate/test14-pr18893.so.abi  |    52 +-
 .../data/test-annotate/test15-pr18892.so.abi  | 11386 ++++++++--------
 .../data/test-annotate/test17-pr19027.so.abi  |  1224 +-
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 11323 +++++++--------
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi |  7194 +++++-----
 ...x86_64--2.24.2-30.fc30.x86_64-report-0.txt |     2 +-
 .../test-read-dwarf/PR22122-libftdc.so.abi    |   893 +-
 .../test-read-dwarf/test-libandroid.so.abi    |     8 +
 .../test-read-dwarf/test14-pr18893.so.abi     |    38 +-
 .../test-read-dwarf/test15-pr18892.so.abi     | 10901 ++++++++-------
 .../test-read-dwarf/test16-pr18904.so.abi     |   676 +-
 .../test-read-dwarf/test17-pr19027.so.abi     |  1216 +-
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 11153 +++++++--------
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi |  7164 +++++-----
 .../test22-pr19097-libstdc++.so.6.0.17.so.abi |    32 +-
 .../test9-pr18818-clang.so.abi                |   127 +-
 18 files changed, 33399 insertions(+), 31370 deletions(-)

The patch is too big for the list so I am attaching it gzipped.

[-- Attachment #2: 0001-dwarf-reader-Revamp-the-canonical-type-DIE-propagati.patch.gz --]
[-- Type: application/gzip, Size: 815036 bytes --]

[-- Attachment #3: Type: text/plain, Size: 13 bytes --]

-- 
		Dodji

next prev parent reply	other threads:[~2022-09-06 10:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-06 10:07 [PATCH 0/2, RFC] Speed up type DIEs canonicalization Dodji Seketeli
2022-09-06 10:10 ` Dodji Seketeli [this message]
2022-09-06 10:11 ` [PATCH 2/2, RFC] Allow restricting analyzed decls to exported symbols Dodji Seketeli
2022-09-09 13:03   ` Giuliano Procida
2022-09-09 20:32     ` Ben Woodard
2022-09-10 10:56       ` Giuliano Procida
2022-09-19  9:34     ` Dodji Seketeli
2022-09-06 10:13 ` [PATCH 0/2, RFC] Speed up type DIEs canonicalization Dodji Seketeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wnagizit.fsf@seketeli.org \
    --to=dodji@seketeli.org \
    --cc=dodji@redhat.com \
    --cc=gprocida@google.com \
    --cc=libabigail@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).