public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* [Bug default/26646] New: unexpectedly declaration-only types
@ 2020-09-22 11:35 gprocida+abigail at google dot com
  2020-09-22 12:15 ` [Bug default/26646] " dodji at redhat dot com
                   ` (36 more replies)
  0 siblings, 37 replies; 41+ messages in thread
From: gprocida+abigail at google dot com @ 2020-09-22 11:35 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

            Bug ID: 26646
           Summary: unexpectedly declaration-only types
           Product: libabigail
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: default
          Assignee: dodji at redhat dot com
          Reporter: gprocida+abigail at google dot com
                CC: libabigail at sourceware dot org
  Target Milestone: ---

We've noticed for some time that we get a mix of declarations and definitions
appearing in kernel ABI XML for the same types.

In some cases, only a declaration appears despite there being a full definition
in the DWARF. Often, this declaration has a "size" despite the size information
being associated with the full DWARF definition, not the declaration.

I've sent in a patch which illustrates this bug (by changing the order in which
definitions and declarations are canonicalised). If need be, I can send over a
vmlinux.

Patch: https://sourceware.org/pipermail/libabigail/2020q3/002679.html

The patch serves as workaround for around 160 kernel types which appear
incomplete in the ABI.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
@ 2020-09-22 12:15 ` dodji at redhat dot com
  2020-10-06 21:02 ` gprocida+abigail at google dot com
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2020-09-22 12:15 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #1 from dodji at redhat dot com ---
Hello

"gprocida+abigail at google dot com" <sourceware-bugzilla@sourceware.org>
writes:

> We've noticed for some time that we get a mix of declarations and definitions
> appearing in kernel ABI XML for the same types.
>
> In some cases, only a declaration appears despite there being a full definition
> in the DWARF. Often, this declaration has a "size" despite the size information
> being associated with the full DWARF definition, not the declaration.
>
> I've sent in a patch which illustrates this bug (by changing the order in which
> definitions and declarations are canonicalised). If need be, I can send over a
> vmlinux.

I would definitely want to get my hands on a vmlinux which exhibits the
issue, yes, please.

> Patch: https://sourceware.org/pipermail/libabigail/2020q3/002679.html

Thanks.

> The patch serves as workaround for around 160 kernel types which appear
> incomplete in the ABI.

Right, I'd prefer fixing the root cause.  So as I said above, the
vmlinux would be greatly appreciated.

Thanks!

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
  2020-09-22 12:15 ` [Bug default/26646] " dodji at redhat dot com
@ 2020-10-06 21:02 ` gprocida+abigail at google dot com
  2020-10-07  9:33 ` gprocida+abigail at google dot com
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida+abigail at google dot com @ 2020-10-06 21:02 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #2 from Giuliano Procida <gprocida+abigail at google dot com> ---
I've dropped an AOSP vmlinux here:
https://github.com/myxoid/libabigail/commit/51fd7ce8d9b896bf4736bb638266bda5c761ee8a.
Uncompressed it's about 0.25G.

The reordering change (the previous commit) eliminate 168 declaration-only
types from the ABI XML.

Sorry this took so long.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
  2020-09-22 12:15 ` [Bug default/26646] " dodji at redhat dot com
  2020-10-06 21:02 ` gprocida+abigail at google dot com
@ 2020-10-07  9:33 ` gprocida+abigail at google dot com
  2020-11-23 17:06 ` gprocida+abigail at google dot com
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida+abigail at google dot com @ 2020-10-07  9:33 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #3 from Giuliano Procida <gprocida+abigail at google dot com> ---
I'd recommend doing the following.

build/tools/abidw --no-comp-dir-path --no-show-locs --type-id-style hash
vmlinux mainline.xml.master

cherry-pick the patch
https://sourceware.org/pipermail/libabigail/2020q3/002679.html
and rebuild

build/tools/abidw --no-comp-dir-path --no-show-locs --type-id-style hash
vmlinux mainline.xml.ordered

Despite the abidw options given, it's a bit much to try to diff these by hand
as there are thousands of differences. Analysis shows 168 fewer
declaration-only (only) types in the latter XML.

*Assuming* both XML files are in some sense valid, abidiff can illustrate other
potential problems. It would make sense to open separate Bugzilla issues, if
confirmed.

Do 4 abidiffs (with / without --harmless, with / without --leaf-changes-only)
between mainline.xml.master and mainline.xml.ordered.

abidiff - shows lots of stuff being filtered out (hmmm)

abidiff --harmless - shows diffs like the following (bug?)

            type changed from:
              union {snd_info_entry_text text; const snd_info_entry_ops* ops;}
            to:
              union {snd_info_entry_text text; const snd_info_entry_ops* ops;}

abidiff --leaf-changes-only - shows some type changes filtered out

abidiff --leaf-changes-only --harmless - shows there's at least one bug in
reporting (perhaps due to the same anonymous union issue?)

Lastly, some of these abidiff runs take a long time (up to 7 minutes on a fast
machine), but that may be expected given the large XML differences.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (2 preceding siblings ...)
  2020-10-07  9:33 ` gprocida+abigail at google dot com
@ 2020-11-23 17:06 ` gprocida+abigail at google dot com
  2022-01-17 17:46 ` gprocida at google dot com
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida+abigail at google dot com @ 2020-11-23 17:06 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #4 from Giuliano Procida <gprocida+abigail at google dot com> ---
This should be revisited once the latest canonicalisation changes are in and
things have settled down.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (3 preceding siblings ...)
  2020-11-23 17:06 ` gprocida+abigail at google dot com
@ 2022-01-17 17:46 ` gprocida at google dot com
  2022-01-18  9:37 ` gprocida at google dot com
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-01-17 17:46 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

gprocida at google dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gprocida at google dot com

--- Comment #5 from gprocida at google dot com ---
A lot of changes and fixes have happened (including to the Clang compiler)
since this bug was originally opened.

I have a new test case which is very stable in terms of how abidw is built. I
get identical XML for each of two Android vmlinux objects, regardless of
whether built from upstream master (modulo disabling one assertion in the
symtab reader) or AOSP or the Google monorepo builds.

Between the two kernels, the ABIs differ only in that two types that were
declared in the first ABI appear fully defined in the second. For example:

<class-decl name='ip_mc_list' is-struct='yes' visibility='default'
is-declaration-only='yes' id='c2a59aaa'/>

vs

    <class-decl name='ip_mc_list' size-in-bits='1152' is-struct='yes'
visibility='default' id='c2a59aaa'>
      <data-member access='public' layout-offset-in-bits='0'>
...


Looking at the 2 DWARF dumps, locally at least, nothing has changed in relation
to the one type I checked. Both files have 1 declaration and 14 definitions of
the type.

Tracing through abidw execution, I see differing numbers of
add_or_update_class_type calls for that type.

In particular, there is only one such call for first vmlinux object and I'm
guessing it's for the only declaration-only DIE in the DWARF (as both appear
first in the DWARF dump and in the debug output).

I also noticed that the wrapper build_ir_node_from_die function hard codes as
true the is_declaration_only parameter passed to the other
build_ir_node_from_die function. 

If I interpose a check for the specific type name and flip this flag to false,
then I *do* see a fully-defined type in the output XML.

I think the test files and this information should be enough for a proper
investigation. Unfortunately, the files are rather large. If you only need to
work with the first kernel, size requirements are halved, but still too large
for Bugzilla.

$ ls -lh declarations.tar.bz2 vmlinux[34]{,.abidw_g3}
-rw-r----- 1 gprocida primarygroup 264M Jan 17 17:19 declarations.tar.bz2
-rw-r----- 1 gprocida primarygroup 471M Jan 17 14:14 vmlinux3
-rw-r----- 1 gprocida primarygroup 7.5M Jan 17 14:32 vmlinux3.abidw_g3
-rw-r----- 1 gprocida primarygroup 471M Jan 17 14:16 vmlinux4
-rw-r----- 1 gprocida primarygroup 7.5M Jan 17 14:32 vmlinux4.abidw_g3

I was testing with ip_mc_list. These are two differences in the ABIs:

$ stgdiff --abi --format small /tmp/vmlinux{3,4}.abidw_g3 -o /dev/stdout
type 'struct udp_table' changed
  was only declared, is now fully defined

type 'struct ip_mc_list' changed
  was only declared, is now fully defined

Let me know how I can get these files to you. Alternatively, if you think you
have a quick fix, I'd be happy to try it out.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (4 preceding siblings ...)
  2022-01-17 17:46 ` gprocida at google dot com
@ 2022-01-18  9:37 ` gprocida at google dot com
  2022-01-18 11:20 ` gprocida at google dot com
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-01-18  9:37 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #6 from gprocida at google dot com ---
A little more debugging...

It is the called_from_public_decl check that prevents the fully-defined type
being built. The code change between the two kernel versions seems to have
nothing at all to do with the the two types affected, so perhaps something is
going wrong in this area.

One way to override this and avoid losing definitions is --load-all-types.

This option expensive in processing time and in the size of XML file generated.
The size is less of a problem (as we have a tool to remove all types that
cannot be reached from a symbol or another reachable type) but the processing
time may be. We will look into using this option pending a resolution of the
problem.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (5 preceding siblings ...)
  2022-01-18  9:37 ` gprocida at google dot com
@ 2022-01-18 11:20 ` gprocida at google dot com
  2022-01-18 11:20 ` gprocida at google dot com
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-01-18 11:20 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #7 from gprocida at google dot com ---
Created attachment 13912
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13912&action=edit
XML extracted from attached kernel file, compressed

abidw --type-id-style hash --no-show-locs --no-comp-dir-path vmlinux3 >
vmlinux3.abidw_g3

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (6 preceding siblings ...)
  2022-01-18 11:20 ` gprocida at google dot com
@ 2022-01-18 11:20 ` gprocida at google dot com
  2022-01-18 11:32 ` gprocida at google dot com
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-01-18 11:20 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #8 from gprocida at google dot com ---
Kernel file:

https://ci.android.com/builds/submitted/8082661/kernel_abi_aarch64/latest -
vmlinux (470M) - this is what I've been calling vmlinux3

I don't think you actually need the other vmlinux and, as it was a presubmit
test artefact, I don't think it's public.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (7 preceding siblings ...)
  2022-01-18 11:20 ` gprocida at google dot com
@ 2022-01-18 11:32 ` gprocida at google dot com
  2022-01-24 15:07 ` dodji at redhat dot com
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-01-18 11:32 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #9 from gprocida at google dot com ---
FTR, with --load-all-types:

* vmlinux3 ends up with 102 fewer declaration-only types
* vmlinux4 ends up with 100 fewer declaration-only types
* overall they both end up with same defined and declaration-only types

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (8 preceding siblings ...)
  2022-01-18 11:32 ` gprocida at google dot com
@ 2022-01-24 15:07 ` dodji at redhat dot com
  2022-01-24 17:29 ` gprocida at google dot com
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-01-24 15:07 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #10 from dodji at redhat dot com ---
Hello,

Here is my analysis on what is going on.

Let's look at the DWARF output and the abixml of the vmlinux3 binary.

From the abixml, we see that the struct ip_mc_list is used from the struct
in_device type.  For instance, the ip_mc_list::mc_list data member is of type
"pointer to decl-only struct ip_mc_list".

Moreover, the only use of struct ip_mc_list is through a "pointer to decl-only
struct ip_mc_list" use.

Let's look at the DWARF to see if we see the same.

[5ebb608]    structure_type       abbrev: 25
             name                 (strp) "in_device"
             byte_size            (data2) 352
             decl_file            (data1) inetdevice.h (122)
             decl_line            (data1) 25

This is the definition DIE of the struct in_device type.

[...]
     [5ebb641]      member               abbrev: 4
                   name                 (strp) "mc_list"
                   type                 (ref4) [5ebb7f7]
                   decl_file            (data1) inetdevice.h (122)
                   decl_line            (data1) 31
                   data_member_location (data1) 24

This is the definition of the "in_device::mc_list" data member.

Its type is the DIE [5ebb7f7] ...

[...]

[5ebb7f7]    pointer_type         abbrev: 5
             type                 (ref4) [5ebb7fc]

Here we see that the DIE [5ebb7f7] is a pointer ...

[5ebb7fc]    structure_type       abbrev: 12
            name                 (strp) "ip_mc_list"
            declaration          (flag_present) yes

... to a struct ip_mc_list that is declaration-only.

Looking through the DWARF, there no public exported function that refers
directly or indirectly to the full definition of ip_mc_list, as defined by this
DIE, for instance:

 [9b97e73]    structure_type       abbrev: 3
             name                 (strp) "ip_mc_list"
             byte_size            (data1) 144
             decl_file            (data1) igmp.h (228)
             decl_line            (data1) 67
[...]

So, this is why libabigail considers that the /definition/ of the struct
ip_mc_list is not used by any function that is publicly defined.  So, that
struct definition is not part of the ABI.

What am I missing?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (9 preceding siblings ...)
  2022-01-24 15:07 ` dodji at redhat dot com
@ 2022-01-24 17:29 ` gprocida at google dot com
  2022-02-07 10:40 ` gprocida at google dot com
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-01-24 17:29 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #11 from gprocida at google dot com ---
Hi Dodji.

Thanks for digging into this. I don't really have any tools for working with
DWARF other than dwarfdump, grep etc.

For vmlinux3, I see 1 declaration

< 1><0x0000e14b>    DW_TAG_structure_type
                      DW_AT_name                  ip_mc_list
                      DW_AT_declaration           yes(1)

and 13 definitions starting with this one

< 1><0x00018293>    DW_TAG_structure_type
                      DW_AT_name                  ip_mc_list
                      DW_AT_byte_size             0x00000090

Tracing back all 13 definitions to potential functions and variables was beyond
what I could achieve.

If all 13 are indeed not reachable, that does explain what's going on.

Note that I see identical 1 + 13 DIEs for vmlinux4 but there we *do* get a
definition in the XML.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (10 preceding siblings ...)
  2022-01-24 17:29 ` gprocida at google dot com
@ 2022-02-07 10:40 ` gprocida at google dot com
  2022-02-07 11:19 ` dodji at redhat dot com
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-07 10:40 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #12 from gprocida at google dot com ---
I wrote a hacky readelf output to DIE graph script. For both kernels, I can
find chains of DIEs:

function dev_get_by_name returns type struct net_device *
struct net_device has a member ip_ptr of type struct in_device *
struct in_device has a member mc_list of type struct ip_mc_list

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpectedly declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (11 preceding siblings ...)
  2022-02-07 10:40 ` gprocida at google dot com
@ 2022-02-07 11:19 ` dodji at redhat dot com
  2022-02-07 17:10 ` [Bug default/26646] unexpected " dodji at redhat dot com
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-07 11:19 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #13 from dodji at redhat dot com ---
Indeed, looking at the combination of those two kernels I can see the problem
now, thanks!

I have submitted two patches to at
https://sourceware.org/pipermail/libabigail/2022q1/004145.html. Could you
please review/test them to see if they address the issue as you see it?

Thanks!

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (12 preceding siblings ...)
  2022-02-07 11:19 ` dodji at redhat dot com
@ 2022-02-07 17:10 ` dodji at redhat dot com
  2022-02-07 17:47 ` gprocida at google dot com
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-07 17:10 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|unexpectedly                |unexpected declaration-only
                   |declaration-only types      |types

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (13 preceding siblings ...)
  2022-02-07 17:10 ` [Bug default/26646] unexpected " dodji at redhat dot com
@ 2022-02-07 17:47 ` gprocida at google dot com
  2022-02-07 21:15 ` gprocida at google dot com
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-07 17:47 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #14 from gprocida at google dot com ---
The other type that is different between vmlinux3 and vmlinux4 is struct
udp_table.

I haven't yet traced a shortest path from a symbol to the full definition yet.
In theory this should be straightforward based on the XML for the second
kernel, but in practice I need to write some more code to automate the process.

I've found a ridiculously long path but it would be much too painful to verify
by hand.

Note that I suspect there could be up to around 100 types which could be
assigned full definitions.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (14 preceding siblings ...)
  2022-02-07 17:47 ` gprocida at google dot com
@ 2022-02-07 21:15 ` gprocida at google dot com
  2022-02-10 16:45 ` dodji at redhat dot com
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-07 21:15 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #15 from gprocida at google dot com ---
Hi Dodji.

I collected shortest paths from symbols to struct udp_table in the ABI for
vmlinux4 and I've traced one by hand through the DWARF DIEs (readelf
--debug-dump) for vmlinux3:

function symbol dev_get_by_name (in ksymtab) has a parameter of type struct net
*                                                      
struct net has member rtln of type struct sock *                                
struct sock has member sk_prot_creator of type struct proto *                   
struct proto has member h of an anonymous union type                            
the anonymous union type has member udp_table of type struct udp_table *        
struct udp_table is fully defined

If you can see why this path is not being considered by the DWARF reader, I
think the fix will address more than just this one instance.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (15 preceding siblings ...)
  2022-02-07 21:15 ` gprocida at google dot com
@ 2022-02-10 16:45 ` dodji at redhat dot com
  2022-02-10 16:46 ` gprocida at google dot com
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-10 16:45 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-02-10
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (16 preceding siblings ...)
  2022-02-10 16:45 ` dodji at redhat dot com
@ 2022-02-10 16:46 ` gprocida at google dot com
  2022-02-10 17:21 ` gprocida at google dot com
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-10 16:46 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #16 from gprocida at google dot com ---
I've done a bit more debugging.

Tracing through DIE processing for both vmlinux{3,4} I see that in the former
case, the first and only DIE for struct udp_table that is reached with "called
from public decl" true is a declaration-only DIE.

So something is preventing the DWARF reader from seeing the (later) definition
reachable from dev_get_by_name.

Tracing through what happens with dev_get_by_name, I see that recursion stops
at struct net * which has already been loaded.

In turn that was loaded by function netif_set_real_num_tx_queues -> struct
net_device * -> struct net_device -> struct in_device * -> struct in_device ->
struct net *. This continues struct net -> struct sock * -> struct sock ->
struct proto * -> struct proto. Recursion seems to stop at struct proto, but
I'm not sure why, yet.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (17 preceding siblings ...)
  2022-02-10 16:46 ` gprocida at google dot com
@ 2022-02-10 17:21 ` gprocida at google dot com
  2022-02-10 17:33 ` dodji at redhat dot com
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-10 17:21 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #17 from gprocida at google dot com ---
Actually, there are ~15 occurrences of struct proto DIEs being reached and I
haven't traced through all the code to work out why it never gets to a full
definition of struct udp_table.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (18 preceding siblings ...)
  2022-02-10 17:21 ` gprocida at google dot com
@ 2022-02-10 17:33 ` dodji at redhat dot com
  2022-02-10 17:36 ` dodji at redhat dot com
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-10 17:33 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dodji at redhat dot com

--- Comment #18 from dodji at redhat dot com ---
Created attachment 13970
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13970&action=edit
Second candidate patch

This patch is the second attempt at fixing the issue and should address
specifically the comment
https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c15

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (19 preceding siblings ...)
  2022-02-10 17:33 ` dodji at redhat dot com
@ 2022-02-10 17:36 ` dodji at redhat dot com
  2022-02-10 18:53 ` gprocida at google dot com
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-10 17:36 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #19 from dodji at redhat dot com ---
(In reply to gprocida from comment #15)
> Hi Dodji.
> 
> I collected shortest paths from symbols to struct udp_table in the ABI for
> vmlinux4 and I've traced one by hand through the DWARF DIEs (readelf
> --debug-dump) for vmlinux3:
> 
> function symbol dev_get_by_name (in ksymtab) has a parameter of type struct
> net *                                                      
> struct net has member rtln of type struct sock *                            
> 
> struct sock has member sk_prot_creator of type struct proto *               
> 
> struct proto has member h of an anonymous union type                        
> 
> the anonymous union type has member udp_table of type struct udp_table *    
> 
> struct udp_table is fully defined
> 
> If you can see why this path is not being considered by the DWARF reader, I
> think the fix will address more than just this one instance.
> 
> Regards,
> Giuliano.

Thanks a lot for this analysis, it's super useful.

I came up with this patch
https://sourceware.org/bugzilla/attachment.cgi?id=13970.  Would it move the
needle a bit more?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (20 preceding siblings ...)
  2022-02-10 17:36 ` dodji at redhat dot com
@ 2022-02-10 18:53 ` gprocida at google dot com
  2022-02-11 12:51 ` gprocida at google dot com
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-10 18:53 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #20 from gprocida at google dot com ---
Hi.

I've tested the latest patch.

It increased the number of fully-defined types for vmlinux3! Unfortunately,
while it also added the same type definition for vmlinux4, our old friend
ip_mc_list became declaration-only. :-(

I still think it is a step in the right direction. Certainly the recursion
limit of 5 aggregates was arbitrary and small unit tests would be unlikely to
exercise the code.

In terms of the code...

I think it would be better to store pairs of offsets though it would probably
be extremely hard to find an example where this would make any difference.

It's safe to recurse a bit too much, so long as there is some termination
guarantee, but potentially unsafe to stop recursion too early which is
theoretically possible if you compare DIEs A and B and deeper in the recursion
want to compare C and B.

The comparison algorithm I put together tracks pairs of nodes compared as this
corresponds exactly to the design aim of avoiding repeated comparisons.

Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (21 preceding siblings ...)
  2022-02-10 18:53 ` gprocida at google dot com
@ 2022-02-11 12:51 ` gprocida at google dot com
  2022-02-11 13:00 ` gprocida at google dot com
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-11 12:51 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #21 from gprocida at google dot com ---
For vmlinux4, I've found one (and only one) direct path to a full definition of
struct ip_mc_list:

9f34bf9 function udp4_hwcsum (in ksymtab)
9f34c0f parameter skb
9f1b29c pointer
9f1b2a1 struct sk_buff
9f1b2aa anonymous member
9f1b2b3 anonymous union
9f1b2c1 anonymous struct
9f1b2e9 anonymous union
9f1b2ee member dev
9f1a5ad pointer
9f1a5b2 struct net_device
9f1a9c2 member ip_ptr
9f2e287 pointer
9f2e28c struct in_device
9f2e2c5 member mc_list
9f2e47b pointer
9f2e480 struct ip_mc_list (fully defined)

The DIE offsets are as reported by readelf --debug-dump.

I hope this is useful.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (22 preceding siblings ...)
  2022-02-11 12:51 ` gprocida at google dot com
@ 2022-02-11 13:00 ` gprocida at google dot com
  2022-02-24 11:09 ` dodji at redhat dot com
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-11 13:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #22 from gprocida at google dot com ---
I missed out a couple of anonymous members.

9f34bf9 function udp4_hwcsum (in ksymtab)
9f34c0f parameter skb
9f1b29c pointer
9f1b2a1 struct sk_buff
9f1b2aa anonymous member
9f1b2b3 anonymous union
9f1b2b8 anonymous member
9f1b2c1 anonymous struct
9f1b2e0 anonymous member
9f1b2e9 anonymous union
9f1b2ee member dev
9f1a5ad pointer
9f1a5b2 struct net_device
9f1a9c2 member ip_ptr
9f2e287 pointer
9f2e28c struct in_device
9f2e2c5 member mc_list
9f2e47b pointer
9f2e480 struct ip_mc_list (fully defined)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (23 preceding siblings ...)
  2022-02-11 13:00 ` gprocida at google dot com
@ 2022-02-24 11:09 ` dodji at redhat dot com
  2022-02-24 12:16 ` gprocida at google dot com
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-24 11:09 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #23 from dodji at redhat dot com ---
Thanks for the analysis, very useful as usual.

Would the content of this branch (PR26646) fix the issues you are seeing?

https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/PR26646

I still need to address your comment
https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c20, but I wanted to fix
the ip_mc_list decl-only-ness first.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (24 preceding siblings ...)
  2022-02-24 11:09 ` dodji at redhat dot com
@ 2022-02-24 12:16 ` gprocida at google dot com
  2022-02-24 15:54 ` dodji at redhat dot com
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-24 12:16 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #24 from gprocida at google dot com ---
Hi.

I ran some quick tests - the libabigail test suite and on the two kernels.

Good

* vmlinux3 and vmlinux4 get ABIs which are the same, including declaration-only
/ fully-defined status
* both have more fully-defined types than the previous iteration (around 9)
* a Linux test case looks better (some declaration-only duplicates vanish)

Not so good

* struct can_dev_rcv_lists is now declaration-only (w.r.t. to the original
baseline ABIs)
* struct prefix_info too (w.r.t. an intermediate code version)
* there is a test case regression (nmap)

I'll do some more digging, but probably nothing directly with these two types
right now.

Someone here pointed out to me that Clang and GCC have "type homing" logic to
reduce the definitions that get emitted into DWARF to what's needed (leaving
some types declaration-only). The relevant flags (to disable this) are
-fstandalone-debug and -femit-class-debug-always, respectively.

You might decide that if either of these flags makes a difference to libabigail
output then there's a bug somewhere in the compiler, linker or libabigail. Or
it might be the case that disabling the logic exposes the types behind opaque
pointers in a way which is unwanted.

In any case, I'm going to see what sort of impact -fstandalone-debug has on
ABIs and kernel build time and size.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (25 preceding siblings ...)
  2022-02-24 12:16 ` gprocida at google dot com
@ 2022-02-24 15:54 ` dodji at redhat dot com
  2022-02-24 16:05 ` gprocida at google dot com
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-24 15:54 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #25 from dodji at redhat dot com ---
(In reply to gprocida from comment #24)
> Hi.
> 
> I ran some quick tests - the libabigail test suite and on the two kernels.

Thanks!

> Good
> 
> * vmlinux3 and vmlinux4 get ABIs which are the same, including
> declaration-only / fully-defined status
> * both have more fully-defined types than the previous iteration (around 9)
> * a Linux test case looks better (some declaration-only duplicates vanish)

Good to know.

> 
> Not so good
> 
> * struct can_dev_rcv_lists is now declaration-only (w.r.t. to the original
> baseline ABIs)
> * struct prefix_info too (w.r.t. an intermediate code version)

OK, I haven't analyzed this, is it sure that it was supposed to be fully
declared?

> * there is a test case regression (nmap)

OK, I think I have the reason for this. It seems to have been an unfortunate
patch that snuck into the tree.  I have removed it and re-push the branch.  You
can fetch it again and test.  This one should disappear.

If the regression disappears, I guess we can say that the current state is an
improvement.  If so, I'll clean-up and post the patches on the list, if you
agree.  Further investigation will continue after that.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (26 preceding siblings ...)
  2022-02-24 15:54 ` dodji at redhat dot com
@ 2022-02-24 16:05 ` gprocida at google dot com
  2022-02-28  9:59 ` dodji at redhat dot com
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-02-24 16:05 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #26 from gprocida at google dot com ---
Hi.

(In reply to dodji from comment #25)
> (In reply to gprocida from comment #24)
> > * struct can_dev_rcv_lists is now declaration-only (w.r.t. to the original
> > baseline ABIs)
> > * struct prefix_info too (w.r.t. an intermediate code version)
> 
> OK, I haven't analyzed this, is it sure that it was supposed to be fully
> declared?

Good question. I have to assume there is some set of paths where we can get
from the symbol to the full definition, either by going down a path or by
jumping from a declaration in one path to a definition in another path.
However, I have not tried to code up this idea!

> > * there is a test case regression (nmap)
> 
> OK, I think I have the reason for this. It seems to have been an unfortunate
> patch that snuck into the tree.  I have removed it and re-push the branch. 
> You can fetch it again and test.  This one should disappear.

It has.

> If the regression disappears, I guess we can say that the current state is
> an improvement.  If so, I'll clean-up and post the patches on the list, if
> you agree.  Further investigation will continue after that.

Agreed. Thanks!

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (27 preceding siblings ...)
  2022-02-24 16:05 ` gprocida at google dot com
@ 2022-02-28  9:59 ` dodji at redhat dot com
  2022-02-28 10:01 ` dodji at redhat dot com
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-28  9:59 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #27 from dodji at redhat dot com ---
(In reply to gprocida from comment #20)

[...]

> In terms of the code...
> 
> I think it would be better to store pairs of offsets though it would
> probably be extremely hard to find an example where this would make any
> difference.

I have update the patch to do just that.

I posted it for review at
https://sourceware.org/pipermail/libabigail/2022q1/004178.html.

If it works for you, I'll be glad to apply it.

[...]

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (28 preceding siblings ...)
  2022-02-28  9:59 ` dodji at redhat dot com
@ 2022-02-28 10:01 ` dodji at redhat dot com
  2022-03-01 14:34 ` gprocida at google dot com
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at redhat dot com @ 2022-02-28 10:01 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #28 from dodji at redhat dot com ---
(In reply to gprocida from comment #26)

> > If the regression disappears, I guess we can say that the current state is
> > an improvement.  If so, I'll clean-up and post the patches on the list, if
> > you agree.  Further investigation will continue after that.
> 
> Agreed. Thanks!

I have posted the cleaned-up patch for review at
https://sourceware.org/pipermail/libabigail/2022q1/004179.html.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (29 preceding siblings ...)
  2022-02-28 10:01 ` dodji at redhat dot com
@ 2022-03-01 14:34 ` gprocida at google dot com
  2022-03-01 14:40 ` gprocida at google dot com
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-03-01 14:34 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #29 from gprocida at google dot com ---
Hi.

On Mon, 28 Feb 2022 at 09:59, dodji at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #27 from dodji at redhat dot com ---
> (In reply to gprocida from comment #20)
>
> [...]
>
> > In terms of the code...
> >
> > I think it would be better to store pairs of offsets though it would
> > probably be extremely hard to find an example where this would make any
> > difference.
>
> I have update the patch to do just that.
>
> I posted it for review at
> https://sourceware.org/pipermail/libabigail/2022q1/004178.html.
>

Two (overlapping) things:

1. I wouldn't bother with inserting / testing / erasing the reverse
pair o(p2, p1)

It might cost more than it ever saves. But it would take
instrumentation to see if it ever makes a difference.

If you make this change then s/unordered pair/ordered pair/ in the doc
comments.

2. There is a typo.

One of the reverse pairs is written as o(p2, 1).

Otherwise, it looks good to me.

Giuliano.

> If it works for you, I'll be glad to apply it.
>
> [...]
>
> Thanks.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (30 preceding siblings ...)
  2022-03-01 14:34 ` gprocida at google dot com
@ 2022-03-01 14:40 ` gprocida at google dot com
  2022-03-02 13:34   ` Dodji Seketeli
  2022-03-02 13:34 ` dodji at seketeli dot org
                   ` (4 subsequent siblings)
  36 siblings, 1 reply; 41+ messages in thread
From: gprocida at google dot com @ 2022-03-01 14:40 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #30 from gprocida at google dot com ---
Hi Dodji.

On Mon, 28 Feb 2022 at 10:01, dodji at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #28 from dodji at redhat dot com ---
> (In reply to gprocida from comment #26)
>
> > > If the regression disappears, I guess we can say that the current state is
> > > an improvement.  If so, I'll clean-up and post the patches on the list, if
> > > you agree.  Further investigation will continue after that.
> >
> > Agreed. Thanks!
>
> I have posted the cleaned-up patch for review at
> https://sourceware.org/pipermail/libabigail/2022q1/004179.html.
>

I've thought a little bit more about this one.

The current check is "local", recursion prevention at *this* DIE
prevents it from being canonicalised, but it could still depend on
child DIEs that are actually also parent DIEs and whose processing has
not yet been completed.

Would it be safer (more precise) to inhibit on-the-fly
canonicalisation whenever the set (stack) of aggregates being compared
is non-empty?

Giuliano.

> Thanks.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Bug default/26646] unexpected declaration-only types
  2022-03-01 14:40 ` gprocida at google dot com
@ 2022-03-02 13:34   ` Dodji Seketeli
  0 siblings, 0 replies; 41+ messages in thread
From: Dodji Seketeli @ 2022-03-02 13:34 UTC (permalink / raw)
  To: gprocida at google dot com via Libabigail; +Cc: gprocida at google dot com

Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:

[...]


> This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
>
> 	* src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> 	canonical type when aggregate redundancy is detected.
>

[...]

gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
écrit:

[...]

> I've thought a little bit more about this one.
>
> The current check is "local", recursion prevention at *this* DIE
> prevents it from being canonicalised, but it could still depend on
> child DIEs that are actually also parent DIEs and whose processing has
> not yet been completed.

You are right.  The current state is an improvement, but it's not
"complete".  Some DIEs might still wrongly considered as being
equivalent just because they depend on a "recursive DIE" that was
detected as such.  The canonical DIE propagation might have happened
during the window of time where the recursive DIE was comparing equal to
its counterpart.

This is addressed in the IR type canonicalization algorithm in
abg-ir.cc.
To learn more about this, look for
/// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization

and that comment.

The implementation is scattered in the functions
return_comparison_result, maybe_propagate_canonical_type and in the
macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.

More on this below ...

> Would it be safer (more precise) to inhibit on-the-fly
> canonicalisation whenever the set (stack) of aggregates being compared
> is non-empty?

Right, that is the obvious thing to do, as I thought.  But then the
problem I encountered, when doing this for IR types, was that things
were too slow.  Really really slow.  In the time window where the
canonical type DIE is inhibited, the amount of (quadratic) structural
DIE comparisons goes through the roof.  That is why I came up with the
canonical type propagation in the first place.

So, the other (harder) approach I've taken is to not disable the
on-the-fly canonicalization when the stack of aggregates being compared
is non-empty, but rather, keep track of the types which are resolved as
being equivalent, but that depend on recursive aggregate that were
detected as such.

I call these types "dependent types".  Let's consider one such dependent
type D.  D's canonical type is the result of canonical propagation that
happened as the recursive type (that D depends on) was on the stack of
aggregates being compared.  D is labelled as having a "non-confirmed"
canonical type.  If the recursive type it depends on is later considered
not being equivalent to its counterpart, then the non-confirmed
canonical of D is going to be "cancelled" and then D is going to be
considered as being non canonicalized.

This is done for D and all the types that depends on it.

By doing this, the number of non canonicalized types is reduced to its
absolute minimum and so comparisons are reasonably fast, even for an
applications like the Kernel or Gimp.  Libraries usually have smaller
type graphs so we don't usually see the speed issue there.  Unless it's
llvm.so or libwebkit.so ;-)

So, I wasn't going to get into doing this for DIEs right away because it
takes a lot of time doing careful coding/debugging/profiling cycles.

But I definitely think we'll need to do this to have perfectly precise
canonicalizer.  My point was to get this in as it's an improvement
already and then work on the subsequent patch with a cooler head.

Does that make any sense?

Thanks.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (31 preceding siblings ...)
  2022-03-01 14:40 ` gprocida at google dot com
@ 2022-03-02 13:34 ` dodji at seketeli dot org
  2022-03-02 22:36 ` gprocida at google dot com
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at seketeli dot org @ 2022-03-02 13:34 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #31 from dodji at seketeli dot org ---
Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:

[...]


> This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
>
> 	* src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> 	canonical type when aggregate redundancy is detected.
>

[...]

gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
écrit:

[...]

> I've thought a little bit more about this one.
>
> The current check is "local", recursion prevention at *this* DIE
> prevents it from being canonicalised, but it could still depend on
> child DIEs that are actually also parent DIEs and whose processing has
> not yet been completed.

You are right.  The current state is an improvement, but it's not
"complete".  Some DIEs might still wrongly considered as being
equivalent just because they depend on a "recursive DIE" that was
detected as such.  The canonical DIE propagation might have happened
during the window of time where the recursive DIE was comparing equal to
its counterpart.

This is addressed in the IR type canonicalization algorithm in
abg-ir.cc.
To learn more about this, look for
/// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization

and that comment.

The implementation is scattered in the functions
return_comparison_result, maybe_propagate_canonical_type and in the
macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.

More on this below ...

> Would it be safer (more precise) to inhibit on-the-fly
> canonicalisation whenever the set (stack) of aggregates being compared
> is non-empty?

Right, that is the obvious thing to do, as I thought.  But then the
problem I encountered, when doing this for IR types, was that things
were too slow.  Really really slow.  In the time window where the
canonical type DIE is inhibited, the amount of (quadratic) structural
DIE comparisons goes through the roof.  That is why I came up with the
canonical type propagation in the first place.

So, the other (harder) approach I've taken is to not disable the
on-the-fly canonicalization when the stack of aggregates being compared
is non-empty, but rather, keep track of the types which are resolved as
being equivalent, but that depend on recursive aggregate that were
detected as such.

I call these types "dependent types".  Let's consider one such dependent
type D.  D's canonical type is the result of canonical propagation that
happened as the recursive type (that D depends on) was on the stack of
aggregates being compared.  D is labelled as having a "non-confirmed"
canonical type.  If the recursive type it depends on is later considered
not being equivalent to its counterpart, then the non-confirmed
canonical of D is going to be "cancelled" and then D is going to be
considered as being non canonicalized.

This is done for D and all the types that depends on it.

By doing this, the number of non canonicalized types is reduced to its
absolute minimum and so comparisons are reasonably fast, even for an
applications like the Kernel or Gimp.  Libraries usually have smaller
type graphs so we don't usually see the speed issue there.  Unless it's
llvm.so or libwebkit.so ;-)

So, I wasn't going to get into doing this for DIEs right away because it
takes a lot of time doing careful coding/debugging/profiling cycles.

But I definitely think we'll need to do this to have perfectly precise
canonicalizer.  My point was to get this in as it's an improvement
already and then work on the subsequent patch with a cooler head.

Does that make any sense?

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (32 preceding siblings ...)
  2022-03-02 13:34 ` dodji at seketeli dot org
@ 2022-03-02 22:36 ` gprocida at google dot com
  2022-03-03 11:43 ` dodji at seketeli dot org
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-03-02 22:36 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #32 from gprocida at google dot com ---
(Not sure what's going on with the addresses, I've added back libabigail here.)

Hi.

On Wed, 2 Mar 2022 at 13:34, dodji at seketeli dot org
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #31 from dodji at seketeli dot org ---
> Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:
>
> [...]
>
>
> > This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
> >
> >       * src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> >       canonical type when aggregate redundancy is detected.
> >
>
> [...]
>
> gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
> écrit:
>
> [...]
>
> > I've thought a little bit more about this one.
> >
> > The current check is "local", recursion prevention at *this* DIE
> > prevents it from being canonicalised, but it could still depend on
> > child DIEs that are actually also parent DIEs and whose processing has
> > not yet been completed.
>
> You are right.  The current state is an improvement, but it's not
> "complete".  Some DIEs might still wrongly considered as being
> equivalent just because they depend on a "recursive DIE" that was
> detected as such.  The canonical DIE propagation might have happened
> during the window of time where the recursive DIE was comparing equal to
> its counterpart.
>
> This is addressed in the IR type canonicalization algorithm in
> abg-ir.cc.
> To learn more about this, look for
> /// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization
>
> and that comment.
>
> The implementation is scattered in the functions
> return_comparison_result, maybe_propagate_canonical_type and in the
> macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.
>
> More on this below ...
>
> > Would it be safer (more precise) to inhibit on-the-fly
> > canonicalisation whenever the set (stack) of aggregates being compared
> > is non-empty?
>
> Right, that is the obvious thing to do, as I thought.  But then the
> problem I encountered, when doing this for IR types, was that things
> were too slow.  Really really slow.  In the time window where the
> canonical type DIE is inhibited, the amount of (quadratic) structural
> DIE comparisons goes through the roof.  That is why I came up with the
> canonical type propagation in the first place.
>
> So, the other (harder) approach I've taken is to not disable the
> on-the-fly canonicalization when the stack of aggregates being compared
> is non-empty, but rather, keep track of the types which are resolved as
> being equivalent, but that depend on recursive aggregate that were
> detected as such.
>
> I call these types "dependent types".  Let's consider one such dependent
> type D.  D's canonical type is the result of canonical propagation that
> happened as the recursive type (that D depends on) was on the stack of
> aggregates being compared.  D is labelled as having a "non-confirmed"
> canonical type.  If the recursive type it depends on is later considered
> not being equivalent to its counterpart, then the non-confirmed
> canonical of D is going to be "cancelled" and then D is going to be
> considered as being non canonicalized.
>
> This is done for D and all the types that depends on it.
>
> By doing this, the number of non canonicalized types is reduced to its
> absolute minimum and so comparisons are reasonably fast, even for an
> applications like the Kernel or Gimp.  Libraries usually have smaller
> type graphs so we don't usually see the speed issue there.  Unless it's
> llvm.so or libwebkit.so ;-)
>
> So, I wasn't going to get into doing this for DIEs right away because it
> takes a lot of time doing careful coding/debugging/profiling cycles.
>
> But I definitely think we'll need to do this to have perfectly precise
> canonicalizer.  My point was to get this in as it's an improvement
> already and then work on the subsequent patch with a cooler head.
>
> Does that make any sense?
>

I think it makes some sense, but it would take me some time to read
through, digest and reason about this properly.

Instead... I'm going to advertise my comparison algorithm again (for
which I've already done all the hard thinking and testing). :-) I'm
not sure how directly applicable it is to canonicalisation, but there
is the *potential* to eliminate all redundant comparisons.

Problem statement: How to decide if two DIEs (let's call them nodes)
are equivalent, when nodes can depend on other nodes recursively and
with cycles, without comparing each pair of nodes more than once.

Nodes can have attributes (like kind (struct, union), size etc.) and
labelled edges (like member x of type y, pointer to type z etc.).
Equivalence means that node attributes are equivalent, nodes have the
same edge labels and following matching labelled edges finds
equivalent nodes.

The C and C++ type systems have various wrinkles, but the overall
structure of equivalence testing involves this kind of matching of
attributes and following edges to other nodes.

Equivalence of two nodes means we cannot find any difference, no
matter at what depth we decide to terminate recursion.

The algorithm has the following components.

1. a "known results" map from comparisons (node pairs) to equivalence
results (bool) - I think the equivalent thing in libabigail may be
on-the-fly canonicalisation itself, but this only stores true
equivalences for a subset of types and it may be better to cache all
outcomes; this needs to be part of reader_context
2. the path-based strongly-connected component algorithm factored into
2 helper functions and associated shared state (which should also be
part of the reader context) - these replace the
aggregates_being_compared data structure and its access functions
3. a comparison function that uses the data structures and helper
functions at the beginning and end with per-kind comparison logic
sandwiched in the middle (this is compare_dies)

If no comparison cycles are possible then a simple DFS works, however,
it's good to cache computed comparisons to avoid repeated work (in
case the DFS traverses a DAG rather than a tree). Such a comparison
function looks like this:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. [doesn't exist yet]
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present).
4. [doesn't exist yet]
5. Insert (comparison, result) into "known results".
6. Return result.

If cycles are possible, this DFS will go into an infinite loop (and
crash with a stack overflow). The comparison function needs to be
updated:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. Call the first SCC helper to see if the comparison is already in
progress (this is a bit like the aggregates-being-compared check).
2a. If the SCC helper says the comparison is progress, return a
tentative "true" (equals) result (this is very similar to what happens
for aggregates, but it can be done for all kinds of DIE).
2b. Otherwise the SCC helper has returned an index which *must* be
used before returning from the comparison function.
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present - though I did notice one
stray early return false which needs to be changed to result = false).
4. Pass the index to the second SCC helper function which returns a
set of comparisons.
4a. The empty set implies that comparisons are still in progress.
4b. A non-empty set means a strongly-connected component has been
found and all of its comparisons have completed.
5. Insert (comparison, result) into "known results" for each
comparison in the set.
6. Return result.

I'm not exactly sure what this means in terms of
on-the-fly-canonicalisation. I tried adapting the code at the end of
compare_dies that performs canonicalisation but something broke
somewhere. I also tried skipping on-the-fly canonicalisation
altogether but that didn't work either. So, just dropping in the SCC
helper and moving a small amount of code around doesn't quite work.
It's possible I need to fix how I got back from DIE offsets to DIEs.

I've noted similarities between the SCC approach and what you have in
place already. The SCC approach guarantees no repeated comparisons, so
it would be really great if it could be applied to the DWARF reader.

My SCC helper code is here:
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/scc.h

A concrete comparison algorithm using the SCC helper is here (it
actually builds ABI diffs in this case, but that extra logic can be
ignored):
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/stg.cc;drc=3b6b65e1a9190dab2eabb98701a716a895b8a7b1;l=366

I'll post a cleaned-up version of my attempt to integrate this code
tomorrow with notes about where the test suite blows up.

Cheers,
Giuliano.

> Thanks.
>
> --
> You are receiving this mail because:
> You reported the bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (33 preceding siblings ...)
  2022-03-02 22:36 ` gprocida at google dot com
@ 2022-03-03 11:43 ` dodji at seketeli dot org
  2022-03-03 13:32 ` gprocida at google dot com
  2022-06-08 15:43 ` gprocida at google dot com
  36 siblings, 0 replies; 41+ messages in thread
From: dodji at seketeli dot org @ 2022-03-03 11:43 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #33 from dodji at seketeli dot org ---
Giuliano Procida via Libabigail <libabigail@sourceware.org> a écrit:


>> Does that make any sense?
>>
>
> I think it makes some sense, but it would take me some time to read
> through, digest and reason about this properly.
>
> Instead... I'm going to advertise my comparison algorithm again (for
> which I've already done all the hard thinking and testing). :-) I'm
> not sure how directly applicable it is to canonicalisation, but there
> is the *potential* to eliminate all redundant comparisons.

OK, I think this would be whole project in itself, as far as I am
concerned.  Right now, I am interested in fixing this issue in a 'best
effort' mode, keeping most of the existing infrastructure for the sake
of consistency and limiting the unintended impacts, release a 2.1
tarball and then we can talk about this further if you like.

I am not ditching what you are saying.  Just that this place (patch
review and this bug report) doesn't seem like the best place for this
right now.

Rather, a new comparison algorithm altogether being a project in itself
(IMHO), it'd be better to keep this discussion under its own "feature
request bug report", I'd say.

I understand if you don't have time to review what I am proposing.  I'll
thus go ahead and look at your proposal a bit later.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (34 preceding siblings ...)
  2022-03-03 11:43 ` dodji at seketeli dot org
@ 2022-03-03 13:32 ` gprocida at google dot com
  2022-06-08 15:43 ` gprocida at google dot com
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-03-03 13:32 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #34 from gprocida at google dot com ---
Hi.

compare_dies is almost exactly the right shape to be able to drop in the more
sophisticated cycle-handling logic (a.k.a. SCC finder) and I've posted an RFC
patch to show the small changes needed there and also to note that things don't
quite work, as mentioned already.

If had a better idea as to what compute_ and get_canonical_die_offset are
doing, I might be able to fix things myself, but I don't.

However, if you feel this is a separate project for another time, that's fair
enough.

In particular, I have no expectation that dropping in the SCC finder and
getting compare_dies working with it will do anything other than make the
aggregates-being-compared logic more precise and complete. I don't think it
will address this (declaration-only types) bug report in general.

Regards,
Giuliano.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug default/26646] unexpected declaration-only types
  2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
                   ` (35 preceding siblings ...)
  2022-03-03 13:32 ` gprocida at google dot com
@ 2022-06-08 15:43 ` gprocida at google dot com
  36 siblings, 0 replies; 41+ messages in thread
From: gprocida at google dot com @ 2022-06-08 15:43 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=26646

--- Comment #35 from gprocida at google dot com ---
I have bisected what I believe to be a (rare) exponential slowdown to this
commit:
https://sourceware.org/git/?p=libabigail.git;a=commit;h=bfd557040376d4e82e4e684f3745f1e55bafdd9b

The same pairs of DIEs get compared 100 000s of times (at least) and abidw does
not terminate in any reasonable amount of time.

This behaviour is consistent across build environments and compilers and I have
verified it on vanilla upstream libabigail at commit
ebf8f98f3dc7b6e4fe2af9c1ccbbfee55ace6c54.

The test case is 200MB compressed, unfortunately, much of which is vmlinux. The
bad behaviour kicks in when processing reaches a certain kernel module, as far
as I could tell using strace.

I'm looking into what I can do make the test case available to you. There may
be some complications here.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Bug default/26646] unexpected declaration-only types
  2022-03-02 22:35   ` Giuliano Procida
@ 2022-03-03 11:43     ` Dodji Seketeli
  0 siblings, 0 replies; 41+ messages in thread
From: Dodji Seketeli @ 2022-03-03 11:43 UTC (permalink / raw)
  To: Giuliano Procida via Libabigail
  Cc: dodji at seketeli dot org, Giuliano Procida

Giuliano Procida via Libabigail <libabigail@sourceware.org> a écrit:


>> Does that make any sense?
>>
>
> I think it makes some sense, but it would take me some time to read
> through, digest and reason about this properly.
>
> Instead... I'm going to advertise my comparison algorithm again (for
> which I've already done all the hard thinking and testing). :-) I'm
> not sure how directly applicable it is to canonicalisation, but there
> is the *potential* to eliminate all redundant comparisons.

OK, I think this would be whole project in itself, as far as I am
concerned.  Right now, I am interested in fixing this issue in a 'best
effort' mode, keeping most of the existing infrastructure for the sake
of consistency and limiting the unintended impacts, release a 2.1
tarball and then we can talk about this further if you like.

I am not ditching what you are saying.  Just that this place (patch
review and this bug report) doesn't seem like the best place for this
right now.

Rather, a new comparison algorithm altogether being a project in itself
(IMHO), it'd be better to keep this discussion under its own "feature
request bug report", I'd say.

I understand if you don't have time to review what I am proposing.  I'll
thus go ahead and look at your proposal a bit later.

Thanks.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Bug default/26646] unexpected declaration-only types
       [not found] ` <bug-26646-12616-6R4shWHsRy@http.sourceware.org/bugzilla/>
@ 2022-03-02 22:35   ` Giuliano Procida
  2022-03-03 11:43     ` Dodji Seketeli
  0 siblings, 1 reply; 41+ messages in thread
From: Giuliano Procida @ 2022-03-02 22:35 UTC (permalink / raw)
  To: dodji at seketeli dot org; +Cc: Giuliano Procida via Libabigail

(Not sure what's going on with the addresses, I've added back libabigail here.)

Hi.

On Wed, 2 Mar 2022 at 13:34, dodji at seketeli dot org
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646
>
> --- Comment #31 from dodji at seketeli dot org ---
> Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:
>
> [...]
>
>
> > This addresses https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c21.
> >
> >       * src/abg-dwarf-reader.cc (compare_dies): Do not propagate
> >       canonical type when aggregate redundancy is detected.
> >
>
> [...]
>
> gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
> écrit:
>
> [...]
>
> > I've thought a little bit more about this one.
> >
> > The current check is "local", recursion prevention at *this* DIE
> > prevents it from being canonicalised, but it could still depend on
> > child DIEs that are actually also parent DIEs and whose processing has
> > not yet been completed.
>
> You are right.  The current state is an improvement, but it's not
> "complete".  Some DIEs might still wrongly considered as being
> equivalent just because they depend on a "recursive DIE" that was
> detected as such.  The canonical DIE propagation might have happened
> during the window of time where the recursive DIE was comparing equal to
> its counterpart.
>
> This is addressed in the IR type canonicalization algorithm in
> abg-ir.cc.
> To learn more about this, look for
> /// @defgroup OnTheFlyCanonicalization On-the-fly Canonicalization
>
> and that comment.
>
> The implementation is scattered in the functions
> return_comparison_result, maybe_propagate_canonical_type and in the
> macro RETURN_TRUE_IF_COMPARISON_CYCLE_DETECTED.
>
> More on this below ...
>
> > Would it be safer (more precise) to inhibit on-the-fly
> > canonicalisation whenever the set (stack) of aggregates being compared
> > is non-empty?
>
> Right, that is the obvious thing to do, as I thought.  But then the
> problem I encountered, when doing this for IR types, was that things
> were too slow.  Really really slow.  In the time window where the
> canonical type DIE is inhibited, the amount of (quadratic) structural
> DIE comparisons goes through the roof.  That is why I came up with the
> canonical type propagation in the first place.
>
> So, the other (harder) approach I've taken is to not disable the
> on-the-fly canonicalization when the stack of aggregates being compared
> is non-empty, but rather, keep track of the types which are resolved as
> being equivalent, but that depend on recursive aggregate that were
> detected as such.
>
> I call these types "dependent types".  Let's consider one such dependent
> type D.  D's canonical type is the result of canonical propagation that
> happened as the recursive type (that D depends on) was on the stack of
> aggregates being compared.  D is labelled as having a "non-confirmed"
> canonical type.  If the recursive type it depends on is later considered
> not being equivalent to its counterpart, then the non-confirmed
> canonical of D is going to be "cancelled" and then D is going to be
> considered as being non canonicalized.
>
> This is done for D and all the types that depends on it.
>
> By doing this, the number of non canonicalized types is reduced to its
> absolute minimum and so comparisons are reasonably fast, even for an
> applications like the Kernel or Gimp.  Libraries usually have smaller
> type graphs so we don't usually see the speed issue there.  Unless it's
> llvm.so or libwebkit.so ;-)
>
> So, I wasn't going to get into doing this for DIEs right away because it
> takes a lot of time doing careful coding/debugging/profiling cycles.
>
> But I definitely think we'll need to do this to have perfectly precise
> canonicalizer.  My point was to get this in as it's an improvement
> already and then work on the subsequent patch with a cooler head.
>
> Does that make any sense?
>

I think it makes some sense, but it would take me some time to read
through, digest and reason about this properly.

Instead... I'm going to advertise my comparison algorithm again (for
which I've already done all the hard thinking and testing). :-) I'm
not sure how directly applicable it is to canonicalisation, but there
is the *potential* to eliminate all redundant comparisons.

Problem statement: How to decide if two DIEs (let's call them nodes)
are equivalent, when nodes can depend on other nodes recursively and
with cycles, without comparing each pair of nodes more than once.

Nodes can have attributes (like kind (struct, union), size etc.) and
labelled edges (like member x of type y, pointer to type z etc.).
Equivalence means that node attributes are equivalent, nodes have the
same edge labels and following matching labelled edges finds
equivalent nodes.

The C and C++ type systems have various wrinkles, but the overall
structure of equivalence testing involves this kind of matching of
attributes and following edges to other nodes.

Equivalence of two nodes means we cannot find any difference, no
matter at what depth we decide to terminate recursion.

The algorithm has the following components.

1. a "known results" map from comparisons (node pairs) to equivalence
results (bool) - I think the equivalent thing in libabigail may be
on-the-fly canonicalisation itself, but this only stores true
equivalences for a subset of types and it may be better to cache all
outcomes; this needs to be part of reader_context
2. the path-based strongly-connected component algorithm factored into
2 helper functions and associated shared state (which should also be
part of the reader context) - these replace the
aggregates_being_compared data structure and its access functions
3. a comparison function that uses the data structures and helper
functions at the beginning and end with per-kind comparison logic
sandwiched in the middle (this is compare_dies)

If no comparison cycles are possible then a simple DFS works, however,
it's good to cache computed comparisons to avoid repeated work (in
case the DFS traverses a DAG rather than a tree). Such a comparison
function looks like this:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. [doesn't exist yet]
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present).
4. [doesn't exist yet]
5. Insert (comparison, result) into "known results".
6. Return result.

If cycles are possible, this DFS will go into an infinite loop (and
crash with a stack overflow). The comparison function needs to be
updated:

1. Check for a known result, and if present return it (this is a bit
like a check for an existing canonical DIE but additionally handles
the case where the DIEs are already known to be non-equivalent).
2. Call the first SCC helper to see if the comparison is already in
progress (this is a bit like the aggregates-being-compared check).
2a. If the SCC helper says the comparison is progress, return a
tentative "true" (equals) result (this is very similar to what happens
for aggregates, but it can be done for all kinds of DIE).
2b. Otherwise the SCC helper has returned an index which *must* be
used before returning from the comparison function.
3. Now do the actual comparison work updating result to false if a
difference is found (exactly as at present - though I did notice one
stray early return false which needs to be changed to result = false).
4. Pass the index to the second SCC helper function which returns a
set of comparisons.
4a. The empty set implies that comparisons are still in progress.
4b. A non-empty set means a strongly-connected component has been
found and all of its comparisons have completed.
5. Insert (comparison, result) into "known results" for each
comparison in the set.
6. Return result.

I'm not exactly sure what this means in terms of
on-the-fly-canonicalisation. I tried adapting the code at the end of
compare_dies that performs canonicalisation but something broke
somewhere. I also tried skipping on-the-fly canonicalisation
altogether but that didn't work either. So, just dropping in the SCC
helper and moving a small amount of code around doesn't quite work.
It's possible I need to fix how I got back from DIE offsets to DIEs.

I've noted similarities between the SCC approach and what you have in
place already. The SCC approach guarantees no repeated comparisons, so
it would be really great if it could be applied to the DWARF reader.

My SCC helper code is here:
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/scc.h

A concrete comparison algorithm using the SCC helper is here (it
actually builds ABI diffs in this case, but that extra logic can be
ignored):
https://cs.android.com/android/kernel/superproject/+/build-tools:external/stg/stg.cc;drc=3b6b65e1a9190dab2eabb98701a716a895b8a7b1;l=366

I'll post a cleaned-up version of my attempt to integrate this code
tomorrow with notes about where the test suite blows up.

Cheers,
Giuliano.

> Thanks.
>
> --
> You are receiving this mail because:
> You reported the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2022-06-08 15:43 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22 11:35 [Bug default/26646] New: unexpectedly declaration-only types gprocida+abigail at google dot com
2020-09-22 12:15 ` [Bug default/26646] " dodji at redhat dot com
2020-10-06 21:02 ` gprocida+abigail at google dot com
2020-10-07  9:33 ` gprocida+abigail at google dot com
2020-11-23 17:06 ` gprocida+abigail at google dot com
2022-01-17 17:46 ` gprocida at google dot com
2022-01-18  9:37 ` gprocida at google dot com
2022-01-18 11:20 ` gprocida at google dot com
2022-01-18 11:20 ` gprocida at google dot com
2022-01-18 11:32 ` gprocida at google dot com
2022-01-24 15:07 ` dodji at redhat dot com
2022-01-24 17:29 ` gprocida at google dot com
2022-02-07 10:40 ` gprocida at google dot com
2022-02-07 11:19 ` dodji at redhat dot com
2022-02-07 17:10 ` [Bug default/26646] unexpected " dodji at redhat dot com
2022-02-07 17:47 ` gprocida at google dot com
2022-02-07 21:15 ` gprocida at google dot com
2022-02-10 16:45 ` dodji at redhat dot com
2022-02-10 16:46 ` gprocida at google dot com
2022-02-10 17:21 ` gprocida at google dot com
2022-02-10 17:33 ` dodji at redhat dot com
2022-02-10 17:36 ` dodji at redhat dot com
2022-02-10 18:53 ` gprocida at google dot com
2022-02-11 12:51 ` gprocida at google dot com
2022-02-11 13:00 ` gprocida at google dot com
2022-02-24 11:09 ` dodji at redhat dot com
2022-02-24 12:16 ` gprocida at google dot com
2022-02-24 15:54 ` dodji at redhat dot com
2022-02-24 16:05 ` gprocida at google dot com
2022-02-28  9:59 ` dodji at redhat dot com
2022-02-28 10:01 ` dodji at redhat dot com
2022-03-01 14:34 ` gprocida at google dot com
2022-03-01 14:40 ` gprocida at google dot com
2022-03-02 13:34   ` Dodji Seketeli
2022-03-02 13:34 ` dodji at seketeli dot org
2022-03-02 22:36 ` gprocida at google dot com
2022-03-03 11:43 ` dodji at seketeli dot org
2022-03-03 13:32 ` gprocida at google dot com
2022-06-08 15:43 ` gprocida at google dot com
     [not found] <bug-26646-12616@http.sourceware.org/bugzilla/>
     [not found] ` <bug-26646-12616-6R4shWHsRy@http.sourceware.org/bugzilla/>
2022-03-02 22:35   ` Giuliano Procida
2022-03-03 11:43     ` Dodji Seketeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).