public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* Re: [Bug default/26646] unexpected declaration-only types
       [not found] ` <bug-26646-12616-6R4shWHsRy@http.sourceware.org/bugzilla/>
@ 2022-03-02 22:35   ` Giuliano Procida
  2022-03-03 11:43     ` Dodji Seketeli
  0 siblings, 1 reply; 27+ messages in thread
From: Giuliano Procida @ 2022-03-02 22:35 UTC (permalink / raw)
  To: dodji at seketeli dot org; +Cc: Giuliano Procida via Libabigail

(Not sure what's going on with the addresses, I've added back libabigail here.)

Hi.

On Wed, 2 Mar 2022 at 13:34, dodji at seketeli dot org
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #31 from dodji at seketeli dot org ---
> Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:
>
> [...]
>
>
> > This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
> >
> >       * src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> >       canonical type when aggregate redundancy is detected.
> >
>
> [...]
>
> gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
> écrit:
>
> [...]
>
> > I've thought a little bit more about this one.
> >
> > The current check is "local", recursion prevention at *this* DIE
> > prevents it from being canonicalised, but it could still depend on
> > child DIEs that are actually also parent DIEs and whose processing has
> > not yet been completed.
>
> You are right.  The current state is an improvement, but it's not
> "complete".  Some DIEs might still wrongly considered as being
> equivalent just because they depend on a "recursive DIE" that was
> detected as such.  The canonical DIE propagation might have happened
> during the window of time where the recursive DIE was comparing equal to
> its counterpart.
>
> This is addressed in the IR type canonicalization algorithm in
> abg-ir.cc.
> To learn more about this, look for
> /// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization
>
> and that comment.
>
> The implementation is scattered in the functions
> return_comparison_result, maybe_propagate_canonical_type and in the
> macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.
>
> More on this below ...
>
> > Would it be safer (more precise) to inhibit on-the-fly
> > canonicalisation whenever the set (stack) of aggregates being compared
> > is non-empty?
>
> Right, that is the obvious thing to do, as I thought.  But then the
> problem I encountered, when doing this for IR types, was that things
> were too slow.  Really really slow.  In the time window where the
> canonical type DIE is inhibited, the amount of (quadratic) structural
> DIE comparisons goes through the roof.  That is why I came up with the
> canonical type propagation in the first place.
>
> So, the other (harder) approach I've taken is to not disable the
> on-the-fly canonicalization when the stack of aggregates being compared
> is non-empty, but rather, keep track of the types which are resolved as
> being equivalent, but that depend on recursive aggregate that were
> detected as such.
>
> I call these types "dependent types".  Let's consider one such dependent
> type D.  D's canonical type is the result of canonical propagation that
> happened as the recursive type (that D depends on) was on the stack of
> aggregates being compared.  D is labelled as having a "non-confirmed"
> canonical type.  If the recursive type it depends on is later considered
> not being equivalent to its counterpart, then the non-confirmed
> canonical of D is going to be "cancelled" and then D is going to be
> considered as being non canonicalized.
>
> This is done for D and all the types that depends on it.
>
> By doing this, the number of non canonicalized types is reduced to its
> absolute minimum and so comparisons are reasonably fast, even for an
> applications like the Kernel or Gimp.  Libraries usually have smaller
> type graphs so we don't usually see the speed issue there.  Unless it's
> llvm.so or libwebkit.so ;-)
>
> So, I wasn't going to get into doing this for DIEs right away because it
> takes a lot of time doing careful coding/debugging/profiling cycles.
>
> But I definitely think we'll need to do this to have perfectly precise
> canonicalizer.  My point was to get this in as it's an improvement
> already and then work on the subsequent patch with a cooler head.
>
> Does that make any sense?
>

I think it makes some sense, but it would take me some time to read
through, digest and reason about this properly.

Instead... I'm going to advertise my comparison algorithm again (for
which I've already done all the hard thinking and testing). :-) I'm
not sure how directly applicable it is to canonicalisation, but there
is the *potential* to eliminate all redundant comparisons.

Problem statement: How to decide if two DIEs (let's call them nodes)
are equivalent, when nodes can depend on other nodes recursively and
with cycles, without comparing each pair of nodes more than once.

Nodes can have attributes (like kind (struct, union), size etc.) and
labelled edges (like member x of type y, pointer to type z etc.).
Equivalence means that node attributes are equivalent, nodes have the
same edge labels and following matching labelled edges finds
equivalent nodes.

The C and C++ type systems have various wrinkles, but the overall
structure of equivalence testing involves this kind of matching of
attributes and following edges to other nodes.

Equivalence of two nodes means we cannot find any difference, no
matter at what depth we decide to terminate recursion.

The algorithm has the following components.

1. a "known results" map from comparisons (node pairs) to equivalence
results (bool) - I think the equivalent thing in libabigail may be
on-the-fly canonicalisation itself, but this only stores true
equivalences for a subset of types and it may be better to cache all
outcomes; this needs to be part of reader_context
2. the path-based strongly-connected component algorithm factored into
2 helper functions and associated shared state (which should also be
part of the reader context) - these replace the
aggregates_being_compared data structure and its access functions
3. a comparison function that uses the data structures and helper
functions at the beginning and end with per-kind comparison logic
sandwiched in the middle (this is compare_dies)

If no comparison cycles are possible then a simple DFS works, however,
it's good to cache computed comparisons to avoid repeated work (in
case the DFS traverses a DAG rather than a tree). Such a comparison
function looks like this:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. [doesn't exist yet]
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present).
4. [doesn't exist yet]
5. Insert (comparison, result) into "known results".
6. Return result.

If cycles are possible, this DFS will go into an infinite loop (and
crash with a stack overflow). The comparison function needs to be
updated:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. Call the first SCC helper to see if the comparison is already in
progress (this is a bit like the aggregates-being-compared check).
2a. If the SCC helper says the comparison is progress, return a
tentative "true" (equals) result (this is very similar to what happens
for aggregates, but it can be done for all kinds of DIE).
2b. Otherwise the SCC helper has returned an index which *must* be
used before returning from the comparison function.
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present - though I did notice one
stray early return false which needs to be changed to result = false).
4. Pass the index to the second SCC helper function which returns a
set of comparisons.
4a. The empty set implies that comparisons are still in progress.
4b. A non-empty set means a strongly-connected component has been
found and all of its comparisons have completed.
5. Insert (comparison, result) into "known results" for each
comparison in the set.
6. Return result.

I'm not exactly sure what this means in terms of
on-the-fly-canonicalisation. I tried adapting the code at the end of
compare_dies that performs canonicalisation but something broke
somewhere. I also tried skipping on-the-fly canonicalisation
altogether but that didn't work either. So, just dropping in the SCC
helper and moving a small amount of code around doesn't quite work.
It's possible I need to fix how I got back from DIE offsets to DIEs.

I've noted similarities between the SCC approach and what you have in
place already. The SCC approach guarantees no repeated comparisons, so
it would be really great if it could be applied to the DWARF reader.

My SCC helper code is here:
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/scc.h

A concrete comparison algorithm using the SCC helper is here (it
actually builds ABI diffs in this case, but that extra logic can be
ignored):
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/stg.cc;drc=3b6b65e1a9190dab2eabb98701a716a895b8a7b1;l=366

I'll post a cleaned-up version of my attempt to integrate this code
tomorrow with notes about where the test suite blows up.

Cheers,
Giuliano.

> Thanks.
>
> --
> You are receiving this mail because:
> You reported the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Bug default/26646] unexpected declaration-only types
  2022-03-02 22:35   ` [Bug default/26646] unexpected declaration-only types Giuliano Procida
@ 2022-03-03 11:43     ` Dodji Seketeli
  0 siblings, 0 replies; 27+ messages in thread
From: Dodji Seketeli @ 2022-03-03 11:43 UTC (permalink / raw)
  To: Giuliano Procida via Libabigail
  Cc: dodji at seketeli dot org, Giuliano Procida

Giuliano Procida via Libabigail <libabigail@sourceware.org> a écrit:


>> Does that make any sense?
>>
>
> I think it makes some sense, but it would take me some time to read
> through, digest and reason about this properly.
>
> Instead... I'm going to advertise my comparison algorithm again (for
> which I've already done all the hard thinking and testing). :-) I'm
> not sure how directly applicable it is to canonicalisation, but there
> is the *potential* to eliminate all redundant comparisons.

OK, I think this would be whole project in itself, as far as I am
concerned.  Right now, I am interested in fixing this issue in a 'best
effort' mode, keeping most of the existing infrastructure for the sake
of consistency and limiting the unintended impacts, release a 2.1
tarball and then we can talk about this further if you like.

I am not ditching what you are saying.  Just that this place (patch
review and this bug report) doesn't seem like the best place for this
right now.

Rather, a new comparison algorithm altogether being a project in itself
(IMHO), it'd be better to keep this discussion under its own "feature
request bug report", I'd say.

I understand if you don't have time to review what I am proposing.  I'll
thus go ahead and look at your proposal a bit later.

Thanks.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (22 preceding siblings ...)
  2022-03-03 13:32 ` gprocida at google dot com
@ 2022-06-08 15:43 ` gprocida at google dot com
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-06-08 15:43 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #35 from gprocida at google dot com ---
I have bisected what I believe to be a (rare) exponential slowdown to this
commit:
https://sourceware.org/git/?p=libabigail.git;a=commit;h=bfd557040376d4e82e4e684f3745f1e55bafdd9b

The same pairs of DIEs get compared 100 000s of times (at least) and abidw does
not terminate in any reasonable amount of time.

This behaviour is consistent across build environments and compilers and I have
verified it on vanilla upstream libabigail at commit
ebf8f98f3dc7b6e4fe2af9c1ccbbfee55ace6c54.

The test case is 200MB compressed, unfortunately, much of which is vmlinux. The
bad behaviour kicks in when processing reaches a certain kernel module, as far
as I could tell using strace.

I'm looking into what I can do make the test case available to you. There may
be some complications here.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (21 preceding siblings ...)
  2022-03-03 11:43 ` dodji at seketeli dot org
@ 2022-03-03 13:32 ` gprocida at google dot com
  2022-06-08 15:43 ` gprocida at google dot com
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-03-03 13:32 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #34 from gprocida at google dot com ---
Hi.

compare_dies is almost exactly the right shape to be able to drop in the more
sophisticated cycle-handling logic (a.k.a. SCC finder) and I've posted an RFC
patch to show the small changes needed there and also to note that things don't
quite work, as mentioned already.

If had a better idea as to what compute_ and get_canonical_die_offset are
doing, I might be able to fix things myself, but I don't.

However, if you feel this is a separate project for another time, that's fair
enough.

In particular, I have no expectation that dropping in the SCC finder and
getting compare_dies working with it will do anything other than make the
aggregates-being-compared logic more precise and complete. I don't think it
will address this (declaration-only types) bug report in general.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (20 preceding siblings ...)
  2022-03-02 22:36 ` gprocida at google dot com
@ 2022-03-03 11:43 ` dodji at seketeli dot org
  2022-03-03 13:32 ` gprocida at google dot com
  2022-06-08 15:43 ` gprocida at google dot com
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at seketeli dot org @ 2022-03-03 11:43 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #33 from dodji at seketeli dot org ---
Giuliano Procida via Libabigail <libabigail@sourceware.org> a écrit:


>> Does that make any sense?
>>
>
> I think it makes some sense, but it would take me some time to read
> through, digest and reason about this properly.
>
> Instead... I'm going to advertise my comparison algorithm again (for
> which I've already done all the hard thinking and testing). :-) I'm
> not sure how directly applicable it is to canonicalisation, but there
> is the *potential* to eliminate all redundant comparisons.

OK, I think this would be whole project in itself, as far as I am
concerned.  Right now, I am interested in fixing this issue in a 'best
effort' mode, keeping most of the existing infrastructure for the sake
of consistency and limiting the unintended impacts, release a 2.1
tarball and then we can talk about this further if you like.

I am not ditching what you are saying.  Just that this place (patch
review and this bug report) doesn't seem like the best place for this
right now.

Rather, a new comparison algorithm altogether being a project in itself
(IMHO), it'd be better to keep this discussion under its own "feature
request bug report", I'd say.

I understand if you don't have time to review what I am proposing.  I'll
thus go ahead and look at your proposal a bit later.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (19 preceding siblings ...)
  2022-03-02 13:34 ` dodji at seketeli dot org
@ 2022-03-02 22:36 ` gprocida at google dot com
  2022-03-03 11:43 ` dodji at seketeli dot org
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-03-02 22:36 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #32 from gprocida at google dot com ---
(Not sure what's going on with the addresses, I've added back libabigail here.)

Hi.

On Wed, 2 Mar 2022 at 13:34, dodji at seketeli dot org
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #31 from dodji at seketeli dot org ---
> Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:
>
> [...]
>
>
> > This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
> >
> >       * src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> >       canonical type when aggregate redundancy is detected.
> >
>
> [...]
>
> gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
> écrit:
>
> [...]
>
> > I've thought a little bit more about this one.
> >
> > The current check is "local", recursion prevention at *this* DIE
> > prevents it from being canonicalised, but it could still depend on
> > child DIEs that are actually also parent DIEs and whose processing has
> > not yet been completed.
>
> You are right.  The current state is an improvement, but it's not
> "complete".  Some DIEs might still wrongly considered as being
> equivalent just because they depend on a "recursive DIE" that was
> detected as such.  The canonical DIE propagation might have happened
> during the window of time where the recursive DIE was comparing equal to
> its counterpart.
>
> This is addressed in the IR type canonicalization algorithm in
> abg-ir.cc.
> To learn more about this, look for
> /// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization
>
> and that comment.
>
> The implementation is scattered in the functions
> return_comparison_result, maybe_propagate_canonical_type and in the
> macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.
>
> More on this below ...
>
> > Would it be safer (more precise) to inhibit on-the-fly
> > canonicalisation whenever the set (stack) of aggregates being compared
> > is non-empty?
>
> Right, that is the obvious thing to do, as I thought.  But then the
> problem I encountered, when doing this for IR types, was that things
> were too slow.  Really really slow.  In the time window where the
> canonical type DIE is inhibited, the amount of (quadratic) structural
> DIE comparisons goes through the roof.  That is why I came up with the
> canonical type propagation in the first place.
>
> So, the other (harder) approach I've taken is to not disable the
> on-the-fly canonicalization when the stack of aggregates being compared
> is non-empty, but rather, keep track of the types which are resolved as
> being equivalent, but that depend on recursive aggregate that were
> detected as such.
>
> I call these types "dependent types".  Let's consider one such dependent
> type D.  D's canonical type is the result of canonical propagation that
> happened as the recursive type (that D depends on) was on the stack of
> aggregates being compared.  D is labelled as having a "non-confirmed"
> canonical type.  If the recursive type it depends on is later considered
> not being equivalent to its counterpart, then the non-confirmed
> canonical of D is going to be "cancelled" and then D is going to be
> considered as being non canonicalized.
>
> This is done for D and all the types that depends on it.
>
> By doing this, the number of non canonicalized types is reduced to its
> absolute minimum and so comparisons are reasonably fast, even for an
> applications like the Kernel or Gimp.  Libraries usually have smaller
> type graphs so we don't usually see the speed issue there.  Unless it's
> llvm.so or libwebkit.so ;-)
>
> So, I wasn't going to get into doing this for DIEs right away because it
> takes a lot of time doing careful coding/debugging/profiling cycles.
>
> But I definitely think we'll need to do this to have perfectly precise
> canonicalizer.  My point was to get this in as it's an improvement
> already and then work on the subsequent patch with a cooler head.
>
> Does that make any sense?
>

I think it makes some sense, but it would take me some time to read
through, digest and reason about this properly.

Instead... I'm going to advertise my comparison algorithm again (for
which I've already done all the hard thinking and testing). :-) I'm
not sure how directly applicable it is to canonicalisation, but there
is the *potential* to eliminate all redundant comparisons.

Problem statement: How to decide if two DIEs (let's call them nodes)
are equivalent, when nodes can depend on other nodes recursively and
with cycles, without comparing each pair of nodes more than once.

Nodes can have attributes (like kind (struct, union), size etc.) and
labelled edges (like member x of type y, pointer to type z etc.).
Equivalence means that node attributes are equivalent, nodes have the
same edge labels and following matching labelled edges finds
equivalent nodes.

The C and C++ type systems have various wrinkles, but the overall
structure of equivalence testing involves this kind of matching of
attributes and following edges to other nodes.

Equivalence of two nodes means we cannot find any difference, no
matter at what depth we decide to terminate recursion.

The algorithm has the following components.

1. a "known results" map from comparisons (node pairs) to equivalence
results (bool) - I think the equivalent thing in libabigail may be
on-the-fly canonicalisation itself, but this only stores true
equivalences for a subset of types and it may be better to cache all
outcomes; this needs to be part of reader_context
2. the path-based strongly-connected component algorithm factored into
2 helper functions and associated shared state (which should also be
part of the reader context) - these replace the
aggregates_being_compared data structure and its access functions
3. a comparison function that uses the data structures and helper
functions at the beginning and end with per-kind comparison logic
sandwiched in the middle (this is compare_dies)

If no comparison cycles are possible then a simple DFS works, however,
it's good to cache computed comparisons to avoid repeated work (in
case the DFS traverses a DAG rather than a tree). Such a comparison
function looks like this:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. [doesn't exist yet]
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present).
4. [doesn't exist yet]
5. Insert (comparison, result) into "known results".
6. Return result.

If cycles are possible, this DFS will go into an infinite loop (and
crash with a stack overflow). The comparison function needs to be
updated:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. Call the first SCC helper to see if the comparison is already in
progress (this is a bit like the aggregates-being-compared check).
2a. If the SCC helper says the comparison is progress, return a
tentative "true" (equals) result (this is very similar to what happens
for aggregates, but it can be done for all kinds of DIE).
2b. Otherwise the SCC helper has returned an index which *must* be
used before returning from the comparison function.
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present - though I did notice one
stray early return false which needs to be changed to result = false).
4. Pass the index to the second SCC helper function which returns a
set of comparisons.
4a. The empty set implies that comparisons are still in progress.
4b. A non-empty set means a strongly-connected component has been
found and all of its comparisons have completed.
5. Insert (comparison, result) into "known results" for each
comparison in the set.
6. Return result.

I'm not exactly sure what this means in terms of
on-the-fly-canonicalisation. I tried adapting the code at the end of
compare_dies that performs canonicalisation but something broke
somewhere. I also tried skipping on-the-fly canonicalisation
altogether but that didn't work either. So, just dropping in the SCC
helper and moving a small amount of code around doesn't quite work.
It's possible I need to fix how I got back from DIE offsets to DIEs.

I've noted similarities between the SCC approach and what you have in
place already. The SCC approach guarantees no repeated comparisons, so
it would be really great if it could be applied to the DWARF reader.

My SCC helper code is here:
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/scc.h

A concrete comparison algorithm using the SCC helper is here (it
actually builds ABI diffs in this case, but that extra logic can be
ignored):
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/stg.cc;drc=3b6b65e1a9190dab2eabb98701a716a895b8a7b1;l=366

I'll post a cleaned-up version of my attempt to integrate this code
tomorrow with notes about where the test suite blows up.

Cheers,
Giuliano.

> Thanks.
>
> --
> You are receiving this mail because:
> You reported the bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (18 preceding siblings ...)
  2022-03-01 14:40 ` gprocida at google dot com
@ 2022-03-02 13:34 ` dodji at seketeli dot org
  2022-03-02 22:36 ` gprocida at google dot com
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at seketeli dot org @ 2022-03-02 13:34 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #31 from dodji at seketeli dot org ---
Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:

[...]


> This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
>
> 	* src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> 	canonical type when aggregate redundancy is detected.
>

[...]

gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
écrit:

[...]

> I've thought a little bit more about this one.
>
> The current check is "local", recursion prevention at *this* DIE
> prevents it from being canonicalised, but it could still depend on
> child DIEs that are actually also parent DIEs and whose processing has
> not yet been completed.

You are right.  The current state is an improvement, but it's not
"complete".  Some DIEs might still wrongly considered as being
equivalent just because they depend on a "recursive DIE" that was
detected as such.  The canonical DIE propagation might have happened
during the window of time where the recursive DIE was comparing equal to
its counterpart.

This is addressed in the IR type canonicalization algorithm in
abg-ir.cc.
To learn more about this, look for
/// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization

and that comment.

The implementation is scattered in the functions
return_comparison_result, maybe_propagate_canonical_type and in the
macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.

More on this below ...

> Would it be safer (more precise) to inhibit on-the-fly
> canonicalisation whenever the set (stack) of aggregates being compared
> is non-empty?

Right, that is the obvious thing to do, as I thought.  But then the
problem I encountered, when doing this for IR types, was that things
were too slow.  Really really slow.  In the time window where the
canonical type DIE is inhibited, the amount of (quadratic) structural
DIE comparisons goes through the roof.  That is why I came up with the
canonical type propagation in the first place.

So, the other (harder) approach I've taken is to not disable the
on-the-fly canonicalization when the stack of aggregates being compared
is non-empty, but rather, keep track of the types which are resolved as
being equivalent, but that depend on recursive aggregate that were
detected as such.

I call these types "dependent types".  Let's consider one such dependent
type D.  D's canonical type is the result of canonical propagation that
happened as the recursive type (that D depends on) was on the stack of
aggregates being compared.  D is labelled as having a "non-confirmed"
canonical type.  If the recursive type it depends on is later considered
not being equivalent to its counterpart, then the non-confirmed
canonical of D is going to be "cancelled" and then D is going to be
considered as being non canonicalized.

This is done for D and all the types that depends on it.

By doing this, the number of non canonicalized types is reduced to its
absolute minimum and so comparisons are reasonably fast, even for an
applications like the Kernel or Gimp.  Libraries usually have smaller
type graphs so we don't usually see the speed issue there.  Unless it's
llvm.so or libwebkit.so ;-)

So, I wasn't going to get into doing this for DIEs right away because it
takes a lot of time doing careful coding/debugging/profiling cycles.

But I definitely think we'll need to do this to have perfectly precise
canonicalizer.  My point was to get this in as it's an improvement
already and then work on the subsequent patch with a cooler head.

Does that make any sense?

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Bug default/26646] unexpected declaration-only types
  2022-03-01 14:40 ` gprocida at google dot com
@ 2022-03-02 13:34   ` Dodji Seketeli
  0 siblings, 0 replies; 27+ messages in thread
From: Dodji Seketeli @ 2022-03-02 13:34 UTC (permalink / raw)
  To: gprocida at google dot com via Libabigail; +Cc: gprocida at google dot com

Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:

[...]


> This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
>
> 	* src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> 	canonical type when aggregate redundancy is detected.
>

[...]

gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
écrit:

[...]

> I've thought a little bit more about this one.
>
> The current check is "local", recursion prevention at *this* DIE
> prevents it from being canonicalised, but it could still depend on
> child DIEs that are actually also parent DIEs and whose processing has
> not yet been completed.

You are right.  The current state is an improvement, but it's not
"complete".  Some DIEs might still wrongly considered as being
equivalent just because they depend on a "recursive DIE" that was
detected as such.  The canonical DIE propagation might have happened
during the window of time where the recursive DIE was comparing equal to
its counterpart.

This is addressed in the IR type canonicalization algorithm in
abg-ir.cc.
To learn more about this, look for
/// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization

and that comment.

The implementation is scattered in the functions
return_comparison_result, maybe_propagate_canonical_type and in the
macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.

More on this below ...

> Would it be safer (more precise) to inhibit on-the-fly
> canonicalisation whenever the set (stack) of aggregates being compared
> is non-empty?

Right, that is the obvious thing to do, as I thought.  But then the
problem I encountered, when doing this for IR types, was that things
were too slow.  Really really slow.  In the time window where the
canonical type DIE is inhibited, the amount of (quadratic) structural
DIE comparisons goes through the roof.  That is why I came up with the
canonical type propagation in the first place.

So, the other (harder) approach I've taken is to not disable the
on-the-fly canonicalization when the stack of aggregates being compared
is non-empty, but rather, keep track of the types which are resolved as
being equivalent, but that depend on recursive aggregate that were
detected as such.

I call these types "dependent types".  Let's consider one such dependent
type D.  D's canonical type is the result of canonical propagation that
happened as the recursive type (that D depends on) was on the stack of
aggregates being compared.  D is labelled as having a "non-confirmed"
canonical type.  If the recursive type it depends on is later considered
not being equivalent to its counterpart, then the non-confirmed
canonical of D is going to be "cancelled" and then D is going to be
considered as being non canonicalized.

This is done for D and all the types that depends on it.

By doing this, the number of non canonicalized types is reduced to its
absolute minimum and so comparisons are reasonably fast, even for an
applications like the Kernel or Gimp.  Libraries usually have smaller
type graphs so we don't usually see the speed issue there.  Unless it's
llvm.so or libwebkit.so ;-)

So, I wasn't going to get into doing this for DIEs right away because it
takes a lot of time doing careful coding/debugging/profiling cycles.

But I definitely think we'll need to do this to have perfectly precise
canonicalizer.  My point was to get this in as it's an improvement
already and then work on the subsequent patch with a cooler head.

Does that make any sense?

Thanks.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (17 preceding siblings ...)
  2022-03-01 14:34 ` gprocida at google dot com
@ 2022-03-01 14:40 ` gprocida at google dot com
  2022-03-02 13:34   ` Dodji Seketeli
  2022-03-02 13:34 ` dodji at seketeli dot org
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 27+ messages in thread
From: gprocida at google dot com @ 2022-03-01 14:40 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #30 from gprocida at google dot com ---
Hi Dodji.

On Mon, 28 Feb 2022 at 10:01, dodji at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #28 from dodji at redhat dot com ---
> (In reply to gprocida from comment #26)
>
> > > If the regression disappears, I guess we can say that the current state is
> > > an improvement.  If so, I'll clean-up and post the patches on the list, if
> > > you agree.  Further investigation will continue after that.
> >
> > Agreed. Thanks!
>
> I have posted the cleaned-up patch for review at
> https://sourceware.org/pipermail/libabigail/2022q1/004179.html.
>

I've thought a little bit more about this one.

The current check is "local", recursion prevention at *this* DIE
prevents it from being canonicalised, but it could still depend on
child DIEs that are actually also parent DIEs and whose processing has
not yet been completed.

Would it be safer (more precise) to inhibit on-the-fly
canonicalisation whenever the set (stack) of aggregates being compared
is non-empty?

Giuliano.

> Thanks.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (16 preceding siblings ...)
  2022-02-28 10:01 ` dodji at redhat dot com
@ 2022-03-01 14:34 ` gprocida at google dot com
  2022-03-01 14:40 ` gprocida at google dot com
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-03-01 14:34 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #29 from gprocida at google dot com ---
Hi.

On Mon, 28 Feb 2022 at 09:59, dodji at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #27 from dodji at redhat dot com ---
> (In reply to gprocida from comment #20)
>
> [...]
>
> > In terms of the code...
> >
> > I think it would be better to store pairs of offsets though it would
> > probably be extremely hard to find an example where this would make any
> > difference.
>
> I have update the patch to do just that.
>
> I posted it for review at
> https://sourceware.org/pipermail/libabigail/2022q1/004178.html.
>

Two (overlapping) things:

1. I wouldn't bother with inserting / testing / erasing the reverse
pair o(p2, p1)

It might cost more than it ever saves. But it would take
instrumentation to see if it ever makes a difference.

If you make this change then s/unordered pair/ordered pair/ in the doc
comments.

2. There is a typo.

One of the reverse pairs is written as o(p2, 1).

Otherwise, it looks good to me.

Giuliano.

> If it works for you, I'll be glad to apply it.
>
> [...]
>
> Thanks.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (15 preceding siblings ...)
  2022-02-28  9:59 ` dodji at redhat dot com
@ 2022-02-28 10:01 ` dodji at redhat dot com
  2022-03-01 14:34 ` gprocida at google dot com
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-28 10:01 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #28 from dodji at redhat dot com ---
(In reply to gprocida from comment #26)

> > If the regression disappears, I guess we can say that the current state is
> > an improvement.  If so, I'll clean-up and post the patches on the list, if
> > you agree.  Further investigation will continue after that.
> 
> Agreed. Thanks!

I have posted the cleaned-up patch for review at
https://sourceware.org/pipermail/libabigail/2022q1/004179.html.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (14 preceding siblings ...)
  2022-02-24 16:05 ` gprocida at google dot com
@ 2022-02-28  9:59 ` dodji at redhat dot com
  2022-02-28 10:01 ` dodji at redhat dot com
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-28  9:59 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #27 from dodji at redhat dot com ---
(In reply to gprocida from comment #20)

[...]

> In terms of the code...
> 
> I think it would be better to store pairs of offsets though it would
> probably be extremely hard to find an example where this would make any
> difference.

I have update the patch to do just that.

I posted it for review at
https://sourceware.org/pipermail/libabigail/2022q1/004178.html.

If it works for you, I'll be glad to apply it.

[...]

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (13 preceding siblings ...)
  2022-02-24 15:54 ` dodji at redhat dot com
@ 2022-02-24 16:05 ` gprocida at google dot com
  2022-02-28  9:59 ` dodji at redhat dot com
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-24 16:05 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #26 from gprocida at google dot com ---
Hi.

(In reply to dodji from comment #25)
> (In reply to gprocida from comment #24)
> > * struct can_dev_rcv_lists is now declaration-only (w.r.t. to the original
> > baseline ABIs)
> > * struct prefix_info too (w.r.t. an intermediate code version)
> 
> OK, I haven't analyzed this, is it sure that it was supposed to be fully
> declared?

Good question. I have to assume there is some set of paths where we can get
from the symbol to the full definition, either by going down a path or by
jumping from a declaration in one path to a definition in another path.
However, I have not tried to code up this idea!

> > * there is a test case regression (nmap)
> 
> OK, I think I have the reason for this. It seems to have been an unfortunate
> patch that snuck into the tree.  I have removed it and re-push the branch. 
> You can fetch it again and test.  This one should disappear.

It has.

> If the regression disappears, I guess we can say that the current state is
> an improvement.  If so, I'll clean-up and post the patches on the list, if
> you agree.  Further investigation will continue after that.

Agreed. Thanks!

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (12 preceding siblings ...)
  2022-02-24 12:16 ` gprocida at google dot com
@ 2022-02-24 15:54 ` dodji at redhat dot com
  2022-02-24 16:05 ` gprocida at google dot com
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-24 15:54 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #25 from dodji at redhat dot com ---
(In reply to gprocida from comment #24)
> Hi.
> 
> I ran some quick tests - the libabigail test suite and on the two kernels.

Thanks!

> Good
> 
> * vmlinux3 and vmlinux4 get ABIs which are the same, including
> declaration-only / fully-defined status
> * both have more fully-defined types than the previous iteration (around 9)
> * a Linux test case looks better (some declaration-only duplicates vanish)

Good to know.

> 
> Not so good
> 
> * struct can_dev_rcv_lists is now declaration-only (w.r.t. to the original
> baseline ABIs)
> * struct prefix_info too (w.r.t. an intermediate code version)

OK, I haven't analyzed this, is it sure that it was supposed to be fully
declared?

> * there is a test case regression (nmap)

OK, I think I have the reason for this. It seems to have been an unfortunate
patch that snuck into the tree.  I have removed it and re-push the branch.  You
can fetch it again and test.  This one should disappear.

If the regression disappears, I guess we can say that the current state is an
improvement.  If so, I'll clean-up and post the patches on the list, if you
agree.  Further investigation will continue after that.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (11 preceding siblings ...)
  2022-02-24 11:09 ` dodji at redhat dot com
@ 2022-02-24 12:16 ` gprocida at google dot com
  2022-02-24 15:54 ` dodji at redhat dot com
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-24 12:16 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #24 from gprocida at google dot com ---
Hi.

I ran some quick tests - the libabigail test suite and on the two kernels.

Good

* vmlinux3 and vmlinux4 get ABIs which are the same, including declaration-only
/ fully-defined status
* both have more fully-defined types than the previous iteration (around 9)
* a Linux test case looks better (some declaration-only duplicates vanish)

Not so good

* struct can_dev_rcv_lists is now declaration-only (w.r.t. to the original
baseline ABIs)
* struct prefix_info too (w.r.t. an intermediate code version)
* there is a test case regression (nmap)

I'll do some more digging, but probably nothing directly with these two types
right now.

Someone here pointed out to me that Clang and GCC have "type homing" logic to
reduce the definitions that get emitted into DWARF to what's needed (leaving
some types declaration-only). The relevant flags (to disable this) are
-fstandalone-debug and -femit-class-debug-always, respectively.

You might decide that if either of these flags makes a difference to libabigail
output then there's a bug somewhere in the compiler, linker or libabigail. Or
it might be the case that disabling the logic exposes the types behind opaque
pointers in a way which is unwanted.

In any case, I'm going to see what sort of impact -fstandalone-debug has on
ABIs and kernel build time and size.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (10 preceding siblings ...)
  2022-02-11 13:00 ` gprocida at google dot com
@ 2022-02-24 11:09 ` dodji at redhat dot com
  2022-02-24 12:16 ` gprocida at google dot com
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-24 11:09 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #23 from dodji at redhat dot com ---
Thanks for the analysis, very useful as usual.

Would the content of this branch (PR26646) fix the issues you are seeing?

https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/PR26646

I still need to address your comment
https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c20, but I wanted to fix
the ip_mc_list decl-only-ness first.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (9 preceding siblings ...)
  2022-02-11 12:51 ` gprocida at google dot com
@ 2022-02-11 13:00 ` gprocida at google dot com
  2022-02-24 11:09 ` dodji at redhat dot com
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-11 13:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #22 from gprocida at google dot com ---
I missed out a couple of anonymous members.

9f34bf9 function udp4_hwcsum (in ksymtab)
9f34c0f parameter skb
9f1b29c pointer
9f1b2a1 struct sk_buff
9f1b2aa anonymous member
9f1b2b3 anonymous union
9f1b2b8 anonymous member
9f1b2c1 anonymous struct
9f1b2e0 anonymous member
9f1b2e9 anonymous union
9f1b2ee member dev
9f1a5ad pointer
9f1a5b2 struct net_device
9f1a9c2 member ip_ptr
9f2e287 pointer
9f2e28c struct in_device
9f2e2c5 member mc_list
9f2e47b pointer
9f2e480 struct ip_mc_list (fully defined)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (8 preceding siblings ...)
  2022-02-10 18:53 ` gprocida at google dot com
@ 2022-02-11 12:51 ` gprocida at google dot com
  2022-02-11 13:00 ` gprocida at google dot com
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-11 12:51 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #21 from gprocida at google dot com ---
For vmlinux4, I've found one (and only one) direct path to a full definition of
struct ip_mc_list:

9f34bf9 function udp4_hwcsum (in ksymtab)
9f34c0f parameter skb
9f1b29c pointer
9f1b2a1 struct sk_buff
9f1b2aa anonymous member
9f1b2b3 anonymous union
9f1b2c1 anonymous struct
9f1b2e9 anonymous union
9f1b2ee member dev
9f1a5ad pointer
9f1a5b2 struct net_device
9f1a9c2 member ip_ptr
9f2e287 pointer
9f2e28c struct in_device
9f2e2c5 member mc_list
9f2e47b pointer
9f2e480 struct ip_mc_list (fully defined)

The DIE offsets are as reported by readelf --debug-dump.

I hope this is useful.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (7 preceding siblings ...)
  2022-02-10 17:36 ` dodji at redhat dot com
@ 2022-02-10 18:53 ` gprocida at google dot com
  2022-02-11 12:51 ` gprocida at google dot com
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-10 18:53 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #20 from gprocida at google dot com ---
Hi.

I've tested the latest patch.

It increased the number of fully-defined types for vmlinux3! Unfortunately,
while it also added the same type definition for vmlinux4, our old friend
ip_mc_list became declaration-only. :-(

I still think it is a step in the right direction. Certainly the recursion
limit of 5 aggregates was arbitrary and small unit tests would be unlikely to
exercise the code.

In terms of the code...

I think it would be better to store pairs of offsets though it would probably
be extremely hard to find an example where this would make any difference.

It's safe to recurse a bit too much, so long as there is some termination
guarantee, but potentially unsafe to stop recursion too early which is
theoretically possible if you compare DIEs A and B and deeper in the recursion
want to compare C and B.

The comparison algorithm I put together tracks pairs of nodes compared as this
corresponds exactly to the design aim of avoiding repeated comparisons.

Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (6 preceding siblings ...)
  2022-02-10 17:33 ` dodji at redhat dot com
@ 2022-02-10 17:36 ` dodji at redhat dot com
  2022-02-10 18:53 ` gprocida at google dot com
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-10 17:36 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #19 from dodji at redhat dot com ---
(In reply to gprocida from comment #15)
> Hi Dodji.
> 
> I collected shortest paths from symbols to struct udp_table in the ABI for
> vmlinux4 and I've traced one by hand through the DWARF DIEs (readelf
> --debug-dump) for vmlinux3:
> 
> function symbol dev_get_by_name (in ksymtab) has a parameter of type struct
> net *                                                      
> struct net has member rtln of type struct sock *                            
> 
> struct sock has member sk_prot_creator of type struct proto *               
> 
> struct proto has member h of an anonymous union type                        
> 
> the anonymous union type has member udp_table of type struct udp_table *    
> 
> struct udp_table is fully defined
> 
> If you can see why this path is not being considered by the DWARF reader, I
> think the fix will address more than just this one instance.
> 
> Regards,
> Giuliano.

Thanks a lot for this analysis, it's super useful.

I came up with this patch
https://sourceware.org/bugzilla/attachment.cgi?id=13970.  Would it move the
needle a bit more?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (5 preceding siblings ...)
  2022-02-10 17:21 ` gprocida at google dot com
@ 2022-02-10 17:33 ` dodji at redhat dot com
  2022-02-10 17:36 ` dodji at redhat dot com
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-10 17:33 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dodji at redhat dot com

--- Comment #18 from dodji at redhat dot com ---
Created attachment 13970
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13970&action=edit
Second candidate patch

This patch is the second attempt at fixing the issue and should address
specifically the comment
https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c15

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (4 preceding siblings ...)
  2022-02-10 16:46 ` gprocida at google dot com
@ 2022-02-10 17:21 ` gprocida at google dot com
  2022-02-10 17:33 ` dodji at redhat dot com
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-10 17:21 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #17 from gprocida at google dot com ---
Actually, there are ~15 occurrences of struct proto DIEs being reached and I
haven't traced through all the code to work out why it never gets to a full
definition of struct udp_table.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (3 preceding siblings ...)
  2022-02-10 16:45 ` dodji at redhat dot com
@ 2022-02-10 16:46 ` gprocida at google dot com
  2022-02-10 17:21 ` gprocida at google dot com
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-10 16:46 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #16 from gprocida at google dot com ---
I've done a bit more debugging.

Tracing through DIE processing for both vmlinux{3,4} I see that in the former
case, the first and only DIE for struct udp_table that is reached with "called
from public decl" true is a declaration-only DIE.

So something is preventing the DWARF reader from seeing the (later) definition
reachable from dev_get_by_name.

Tracing through what happens with dev_get_by_name, I see that recursion stops
at struct net * which has already been loaded.

In turn that was loaded by function netif_set_real_num_tx_queues -> struct
net_device * -> struct net_device -> struct in_device * -> struct in_device ->
struct net *. This continues struct net -> struct sock * -> struct sock ->
struct proto * -> struct proto. Recursion seems to stop at struct proto, but
I'm not sure why, yet.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
                   ` (2 preceding siblings ...)
  2022-02-07 21:15 ` gprocida at google dot com
@ 2022-02-10 16:45 ` dodji at redhat dot com
  2022-02-10 16:46 ` gprocida at google dot com
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-10 16:45 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-02-10
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
  2022-02-07 17:10 ` [Bug default/26646] unexpected " dodji at redhat dot com
  2022-02-07 17:47 ` gprocida at google dot com
@ 2022-02-07 21:15 ` gprocida at google dot com
  2022-02-10 16:45 ` dodji at redhat dot com
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-07 21:15 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #15 from gprocida at google dot com ---
Hi Dodji.

I collected shortest paths from symbols to struct udp_table in the ABI for
vmlinux4 and I've traced one by hand through the DWARF DIEs (readelf
--debug-dump) for vmlinux3:

function symbol dev_get_by_name (in ksymtab) has a parameter of type struct net
*                                                      
struct net has member rtln of type struct sock *                                
struct sock has member sk_prot_creator of type struct proto *                   
struct proto has member h of an anonymous union type                            
the anonymous union type has member udp_table of type struct udp_table *        
struct udp_table is fully defined

If you can see why this path is not being considered by the DWARF reader, I
think the fix will address more than just this one instance.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
  2022-02-07 17:10 ` [Bug default/26646] unexpected " dodji at redhat dot com
@ 2022-02-07 17:47 ` gprocida at google dot com
  2022-02-07 21:15 ` gprocida at google dot com
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: gprocida at google dot com @ 2022-02-07 17:47 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #14 from gprocida at google dot com ---
The other type that is different between vmlinux3 and vmlinux4 is struct
udp_table.

I haven't yet traced a shortest path from a symbol to the full definition yet.
In theory this should be straightforward based on the XML for the second
kernel, but in practice I need to write some more code to automate the process.

I've found a ridiculously long path but it would be much too painful to verify
by hand.

Note that I suspect there could be up to around 100 types which could be
assigned full definitions.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
@ 2022-02-07 17:10 ` dodji at redhat dot com
  2022-02-07 17:47 ` gprocida at google dot com
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 27+ messages in thread
From: dodji at redhat dot com @ 2022-02-07 17:10 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|unexpectedly                |unexpected declaration-only
                   |declaration-only types      |types

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-06-08 15:43 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-26646-12616@http.sourceware.org/bugzilla/>
     [not found] ` <bug-26646-12616-6R4shWHsRy@http.sourceware.org/bugzilla/>
2022-03-02 22:35   ` [Bug default/26646] unexpected declaration-only types Giuliano Procida
2022-03-03 11:43     ` Dodji Seketeli
2020-09-22 11:35 [Bug default/26646] New: unexpectedly " gprocida+abigail at google dot com
2022-02-07 17:10 ` [Bug default/26646] unexpected " dodji at redhat dot com
2022-02-07 17:47 ` gprocida at google dot com
2022-02-07 21:15 ` gprocida at google dot com
2022-02-10 16:45 ` dodji at redhat dot com
2022-02-10 16:46 ` gprocida at google dot com
2022-02-10 17:21 ` gprocida at google dot com
2022-02-10 17:33 ` dodji at redhat dot com
2022-02-10 17:36 ` dodji at redhat dot com
2022-02-10 18:53 ` gprocida at google dot com
2022-02-11 12:51 ` gprocida at google dot com
2022-02-11 13:00 ` gprocida at google dot com
2022-02-24 11:09 ` dodji at redhat dot com
2022-02-24 12:16 ` gprocida at google dot com
2022-02-24 15:54 ` dodji at redhat dot com
2022-02-24 16:05 ` gprocida at google dot com
2022-02-28  9:59 ` dodji at redhat dot com
2022-02-28 10:01 ` dodji at redhat dot com
2022-03-01 14:34 ` gprocida at google dot com
2022-03-01 14:40 ` gprocida at google dot com
2022-03-02 13:34   ` Dodji Seketeli
2022-03-02 13:34 ` dodji at seketeli dot org
2022-03-02 22:36 ` gprocida at google dot com
2022-03-03 11:43 ` dodji at seketeli dot org
2022-03-03 13:32 ` gprocida at google dot com
2022-06-08 15:43 ` gprocida at google dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).