public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Giuliano Procida <gprocida@google.com>
To: libabigail@sourceware.org
Cc: dodji@seketeli.org, kernel-team@android.com, gprocida@google.com,
	 maennich@google.com, woodard@redhat.com
Subject: [RFC PATCH 3/9] Add pass to prune unreachable parts of the ABI
Date: Thu, 25 Mar 2021 21:51:40 +0000	[thread overview]
Message-ID: <20210325215146.3597963-4-gprocida@google.com> (raw)
In-Reply-To: <20210325215146.3597963-1-gprocida@google.com>

In response to internal reports about very large ABIs for libraires
implemented in C++ but exposing only a C API, I wrote a script as a
means of investigating the issue. This has been adapted here.

The aim is to remove parts of the XML that are irrelevant to
abidiff. ELF symbols (and shared object dependencies) are taken to be
the graph roots for the purpose of reachability analysis.

On most XML files, this results in a modest saving (5-10%). However,
for the library that sparked the invesigaton, the resulting XML was
more than 2300 times smaller.

If this functionality is implemented in libagail tself, this pass will
become a no-op.

	* scripts/abitidy.pl (prune_unreachable): New function to
	determine the reachable declaration and type nodes in an XNL
	ABI and remove all unreachable ones.

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 scripts/abitidy.pl | 215 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 214 insertions(+), 1 deletion(-)

diff --git a/scripts/abitidy.pl b/scripts/abitidy.pl
index 1f74e267..c9f93ed8 100755
--- a/scripts/abitidy.pl
+++ b/scripts/abitidy.pl
@@ -140,23 +140,233 @@ sub drop_empty($node) {
   }
 }
 
+# Remove unreachable declarations and types.
+#
+# When making a graph from ABI XML, the following are the types of
+# "node" we care about. The "edges" are obtained from a few XML
+# attributes as well as via XML element containment.
+#
+#  ELF (exported) symbols
+#
+#   elf-symbol (has a name; the code here currently knows nothing
+#     about aliases)
+#
+#  Declarations (that mention a symbol)
+#
+#   These live in a scope. In C++ scopes can be nested and include
+#   namespaces and class types.
+#
+#  var-decl (also used for member variables)
+#    elf-symbol linked to symbol via mangled-name
+#    type-id links to a type
+#  function-decl (also used for member functions)
+#    elf-symbol linked to symbol via mangled-name
+#    parameter and return type-ids link to types
+#      (alas, not just a link to a function type)
+#
+#  Types
+#
+#   These occupy pretty much all the other elements, besides those
+#   that act as simple containers.
+sub prune_unreachable($dom) {
+  my %elf_symbols;
+  # Graph vertices (only needed for statistics).
+  my %vertices;
+  # Graph edges.
+  my %edges;
+
+  # Keep track of type / symbol nesting.
+  my @stack;
+
+  # Traverse the whole XML DOM.
+  my sub make_graph($node) {
+    # The XML attributes we care about.
+    my $name;
+    my $id;
+    my $type_id;
+    my $symbol;
+    my $naming_typedef_id;
+
+    # Not every node we encounter is an XML element.
+    if ($node->nodeType == XML_ELEMENT_NODE) {
+      $name = $node->getAttribute('name');
+      $id = $node->getAttribute('id');
+      $type_id = $node->getAttribute('type-id');
+      $symbol = $node->getAttribute('mangled-name');
+      $naming_typedef_id = $node->getAttribute('naming-typedef-id');
+      die if defined $id && defined $symbol;
+    }
+
+    if (defined $name && $node->getName() eq 'elf-symbol') {
+      $elf_symbols{$name} = undef;
+      # Early return is safe, but not necessary.
+      return;
+    }
+
+    if (defined $id) {
+      my $vertex = "type:$id";
+      # This element defines a type (but there may be more than one
+      # defining the same type - we cannot rely on uniqueness).
+      $vertices{$vertex} = undef;
+      if (defined $naming_typedef_id) {
+        # This is an odd one, there can be a backwards link from an
+        # anonymous type to the typedef that refers to it, so we need to
+        # pull in the typedef, even if nothing else refers to it.
+        $edges{$vertex}{"type:$naming_typedef_id"} = undef;
+      }
+      if (@stack) {
+        # Parent<->child dependencies; record dependencies both
+        # ways to avoid holes in XML types and declarations.
+        $edges{$stack[-1]}{$vertex} = undef;
+        $edges{$vertex}{$stack[-1]} = undef;
+      }
+      push @stack, $vertex;
+    }
+
+    if (defined $symbol) {
+      my $vertex = "symbol:$symbol";
+      # This element is a declaration linked to a symbol (whether or not
+      # exported).
+      $vertices{$vertex} = undef;
+      if (@stack) {
+        # Parent<->child dependencies; record dependencies both ways
+        # to avoid holes in XML types and declarations.
+        #
+        # Symbols exist outside of the type hierarchy, so choosing to
+        # make them depend on a containing type scope and vice versa
+        # is conservative and probably not necessary.
+        $edges{$stack[-1]}{$vertex} = undef;
+        $edges{$vertex}{$stack[-1]} = undef;
+      }
+      # The symbol depends on the types mentioned in this element, so
+      # record it.
+      push @stack, $vertex;
+      # In practice there will be at most one symbol on the stack; we
+      # could verify this here, but it wouldn't achieve anything.
+    }
+
+    if (defined $type_id) {
+      if (@stack) {
+        # The enclosing type or symbol refers to another type.
+        $edges{$stack[-1]}{"type:$type_id"} = undef;
+      }
+    }
+
+    for my $child ($node->childNodes()) {
+      __SUB__->($child);
+    }
+
+    if (defined $symbol) {
+      pop @stack;
+    }
+    if (defined $id) {
+      pop @stack;
+    }
+  }
+
+  # Build a graph.
+  make_graph($dom);
+  die if @stack;
+  #warn Dumper(\%elf_symbols, \%vertices, \%edges);
+
+  # DFS visited state. Would be nicer with a flat namespace of nodes.
+  my %seen;
+  my sub dfs($vertex) {
+    no warnings 'recursion';
+    return if exists $seen{$vertex};
+    $seen{$vertex} = undef;
+
+    my $tos = $edges{$vertex};
+    if (defined $tos) {
+      for my $to (keys %$tos) {
+        __SUB__->($to);
+      }
+    }
+  }
+
+  # Traverse the graph, starting from the exported symbols.
+  for my $symbol (keys %elf_symbols) {
+    my $vertex = "symbol:$symbol";
+    if (exists $vertices{$vertex}) {
+      dfs($vertex);
+    } else {
+      warn "no declaration found for ELF symbol $symbol\n";
+    }
+  }
+
+  #warn Dumper(\%seen);
+
+  # Useful counts.
+  my sub print_report() {
+    my $count_elf_symbols = scalar keys %elf_symbols;
+    my $count_vertices = scalar keys %vertices;
+    my $count_seen = scalar keys %seen;
+
+    warn qq{ELF = $count_elf_symbols
+vertices = $count_vertices
+seen = $count_seen
+};
+  }
+
+  #print_report();
+
+  # XPath selection is too slow as we end up enumerating lots of
+  # nested items whose preservation is entirely determined by their
+  # containing items. DFS with early stopping for the win.
+  my sub remove_unwanted($node) {
+    my $node_name = $node->getName();
+    my $name;
+    my $id;
+    my $symbol;
+
+    if ($node->nodeType == XML_ELEMENT_NODE) {
+      $name = $node->getAttribute('name');
+      $id = $node->getAttribute('id');
+      $symbol = $node->getAttribute('mangled-name');
+      die if defined $id && defined $symbol;
+    }
+
+    # Return if we know that this is a type or declaration to keep or
+    # drop in its entirety.
+    if (defined $id) {
+      remove_node($node) unless exists $seen{"type:$id"};
+      return;
+    }
+    if ($node_name eq 'var-decl' || $node_name eq 'function-decl') {
+      remove_node($node) unless defined $symbol && exists $seen{"symbol:$symbol"};
+      return;
+    }
+
+    # Otherwise, this is not a type, declaration or part thereof, so
+    # process child elements.
+    for my $child ($node->childNodes()) {
+      __SUB__->($child);
+    }
+  }
+
+  remove_unwanted($dom);
+}
+
 # Parse arguments.
 my $input_opt;
 my $output_opt;
 my $all_opt;
 my $drop_opt;
+my $prune_opt;
 GetOptions('i|input=s' => \$input_opt,
            'o|output=s' => \$output_opt,
            'a|all' => sub {
-             $drop_opt = 1
+             $drop_opt = $prune_opt = 1
            },
            'd|drop-empty!' => \$drop_opt,
+           'p|prune-unreachable!' => \$prune_opt,
   ) and !@ARGV or die("usage: $0",
                       map { (' ', $_) } (
                         '[-i|--input file]',
                         '[-o|--output file]',
                         '[-a|--all]',
                         '[-d|--[no-]drop-empty]',
+                        '[-p|--[no-]prune-unreachable]',
                       ), "\n");
 
 exit 0 unless defined $input_opt;
@@ -169,6 +379,9 @@ close $input;
 # This simplifies DOM analysis and manipulation.
 strip_text($dom);
 
+# Prune unreachable elements.
+prune_unreachable($dom) if $prune_opt;
+
 # Drop empty elements.
 drop_empty($dom) if $drop_opt;
 
-- 
2.31.0.291.g576ba9dcdaf-goog


  parent reply	other threads:[~2021-03-25 21:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-11 11:53 [RFC PATCH] Add an experimental ABI pruning utility Giuliano Procida
2021-03-11 20:39 ` Ben Woodard
2021-03-12 16:51   ` Giuliano Procida
2021-03-12 18:41     ` Ben Woodard
2021-03-15 11:08       ` Giuliano Procida
2021-03-12 16:59 ` [RFC PATCH v2] " Giuliano Procida
2021-03-16 16:55   ` [RFC PATCH v3] " Giuliano Procida
2021-03-25 21:51     ` [RFC PATCH 0/9] Utility to manipulate ABI XML Giuliano Procida
2021-03-25 21:51       ` [RFC PATCH 1/9] Add ABI tidying utility Giuliano Procida
2021-03-25 21:51       ` [RFC PATCH 2/9] Add pass to drop empty XML elements Giuliano Procida
2021-03-25 21:51       ` Giuliano Procida [this message]
2021-03-25 21:51       ` [RFC PATCH 4/9] Add pass to filter symbols Giuliano Procida
2021-03-25 21:51       ` [RFC PATCH 5/9] Add pass to normalise anonymous type names Giuliano Procida
2021-03-25 21:51       ` [RFC PATCH 6/9] Add pass to report duplicate type ids Giuliano Procida
2021-03-25 21:51       ` [RFC PATCH 7/9] Add pass to eliminate duplicate member-type fragments Giuliano Procida
2021-03-25 21:51       ` [RFC PATCH 8/9] Add pass to stabilise types and declarations Giuliano Procida
2021-03-25 21:51       ` [RFC PATCH 9/9] Add pass to resolve stray forward type declarations Giuliano Procida

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210325215146.3597963-4-gprocida@google.com \
    --to=gprocida@google.com \
    --cc=dodji@seketeli.org \
    --cc=kernel-team@android.com \
    --cc=libabigail@sourceware.org \
    --cc=maennich@google.com \
    --cc=woodard@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).