public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* [Bug default/28005] New: abidiff: compares the same types and declarations many times
@ 2021-06-22  8:17 gprocida at google dot com
  2021-06-22  8:26 ` [Bug default/28005] " gprocida at google dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: gprocida at google dot com @ 2021-06-22  8:17 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=28005

            Bug ID: 28005
           Summary: abidiff: compares the same types and declarations many
                    times
           Product: libabigail
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: default
          Assignee: dodji at redhat dot com
          Reporter: gprocida at google dot com
                CC: libabigail at sourceware dot org
  Target Milestone: ---

Created attachment 13510
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13510&action=edit
archive containing two XML files for comparison

The test case here was comparing two mostly vanilla Linux kernel ABI XML files.
The libabigail version was 1656f9dd7b09cfc5013b891b93fa6b4dc41a0b6b (predating
a known performance regression).

The ABI files are attached.

Profiling with callgrind and instrumenting the code reveals that the two
functions:

ir::types_have_similar_structure
ir::equals (particularly the var_decl overload)

are called thousands more times than there are pairs of types or declarations
to be compared. Around 30% of runtime is estimated to be down these function
calls.

ir::types_have_similar_structure was called over 1 million times each for pairs
such as int, int and void, void.

ir::equals was called more than 50 thousand times each for around 20 (member)
variables and more 30 thousand times each for at least 1000 different ones.

There is a big caveat: I didn't check to see whether each call had the same
context (state affecting the outcome of function call) at each call. In
particular, these could be indirectly recursive calls where the same variable
pair appears earlier in the call stack. From my experimentation with diff
algorithms, I know it's not safe to always cache such pending comparisons.

On the other hand, types_have_similar_structure cannot be recursing for
arguments like int, int, so it's an obvious candidate for memoisation. However,
I think the repeated calls are a symptom of a deeper problem. My vague
intuition is that the structural similarity check could be done with a single
pass or could be eliminated altogether by being more careful elsewhere.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug default/28005] abidiff: compares the same types and declarations many times
  2021-06-22  8:17 [Bug default/28005] New: abidiff: compares the same types and declarations many times gprocida at google dot com
@ 2021-06-22  8:26 ` gprocida at google dot com
  2021-06-22  8:51 ` gprocida at google dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: gprocida at google dot com @ 2021-06-22  8:26 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=28005

--- Comment #1 from gprocida at google dot com ---
FTR, valgrind invoked as:

valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --cache-sim=yes
--branch-sim=yes .../build/tools/abidiff 5.8.{7,13}.abi >/dev/null

This took some hours. I let it run overnight.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug default/28005] abidiff: compares the same types and declarations many times
  2021-06-22  8:17 [Bug default/28005] New: abidiff: compares the same types and declarations many times gprocida at google dot com
  2021-06-22  8:26 ` [Bug default/28005] " gprocida at google dot com
@ 2021-06-22  8:51 ` gprocida at google dot com
  2021-06-22  9:03 ` gprocida at google dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: gprocida at google dot com @ 2021-06-22  8:51 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=28005

--- Comment #2 from gprocida at google dot com ---
Actually, going up a bit in the profile...

harmless_harmful_filter::visit gets called 56 million times (not confirmed with
direct instrumentation, yet).

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug default/28005] abidiff: compares the same types and declarations many times
  2021-06-22  8:17 [Bug default/28005] New: abidiff: compares the same types and declarations many times gprocida at google dot com
  2021-06-22  8:26 ` [Bug default/28005] " gprocida at google dot com
  2021-06-22  8:51 ` gprocida at google dot com
@ 2021-06-22  9:03 ` gprocida at google dot com
  2022-01-21 14:08 ` gprocida at google dot com
  2022-01-21 14:09 ` gprocida at google dot com
  4 siblings, 0 replies; 6+ messages in thread
From: gprocida at google dot com @ 2021-06-22  9:03 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=28005

--- Comment #3 from gprocida at google dot com ---
7040 visit calls (with the same arguments) are called 6510 times (this is the
tip of the tail).

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug default/28005] abidiff: compares the same types and declarations many times
  2021-06-22  8:17 [Bug default/28005] New: abidiff: compares the same types and declarations many times gprocida at google dot com
                   ` (2 preceding siblings ...)
  2021-06-22  9:03 ` gprocida at google dot com
@ 2022-01-21 14:08 ` gprocida at google dot com
  2022-01-21 14:09 ` gprocida at google dot com
  4 siblings, 0 replies; 6+ messages in thread
From: gprocida at google dot com @ 2022-01-21 14:08 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=28005

--- Comment #4 from gprocida at google dot com ---
Hi Dodji.

Note: This issue remains lower priority for us than anything relating to abidw
XML output.

I originally reported findings against commit 1656f9dd. By the time we got to
commit 46b1ab08, things were significantly better. However, the following
commit 39ba8596 (found with git bisect) made run times even longer.

We noticed this with the ABI XML comparison I'm now attaching. Timings are in
user CPU seconds on my machine:

commit  5.8.{7,13}      A vs B
1656f9dd        165     170
46b1ab08        92      91
39ba8596        372     538

I haven't profiled again, but I can guess that there are going to be many
repeated comparisons as before.

Adding some caching of equality results may be possible, but my understanding
is that this needs to be done in a way that avoids caching incomplete (due to
infinite recursion avoidance) and possibly incorrect results.

Note that is it is always safe to cache results that say a difference was
found. In fact, with libabigail we see things go slower when they are more
differences, so some kind of memoisation of negative outcomes may really help,
but this may lose some change_kinds if recursion is stopped sooner than before.

It is also possible to write an ABI graph comparison algorithm so that each
pair of things is compared at most once - I outlined this a while ago - and the
ABIs here compare in about 1 second with that.

The RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED and related code in abg-ir.cc
looks like it's trying to do similar things, but I haven't studied it properly.

I do wish I could achieve comparable performance in libabigail without very
intrusive changes. At a minimum updating change_kind* k / change kind
propagation would need to be handled as part of this.

Lastly, libabigail also does canonicalisation, sharing some of the same
equality functionality. I've not attempted to try to extend my algorithm to
cover that, not even as a theoretical exercise.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug default/28005] abidiff: compares the same types and declarations many times
  2021-06-22  8:17 [Bug default/28005] New: abidiff: compares the same types and declarations many times gprocida at google dot com
                   ` (3 preceding siblings ...)
  2022-01-21 14:08 ` gprocida at google dot com
@ 2022-01-21 14:09 ` gprocida at google dot com
  4 siblings, 0 replies; 6+ messages in thread
From: gprocida at google dot com @ 2022-01-21 14:09 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=28005

--- Comment #5 from gprocida at google dot com ---
Created attachment 13920
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13920&action=edit
two XML files to compare

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-01-21 14:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-22  8:17 [Bug default/28005] New: abidiff: compares the same types and declarations many times gprocida at google dot com
2021-06-22  8:26 ` [Bug default/28005] " gprocida at google dot com
2021-06-22  8:51 ` gprocida at google dot com
2021-06-22  9:03 ` gprocida at google dot com
2022-01-21 14:08 ` gprocida at google dot com
2022-01-21 14:09 ` gprocida at google dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).