public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Matthias Maennich <maennich@google.com>
To: Giuliano Procida <gprocida@google.com>
Cc: libabigail@sourceware.org, dodji@seketeli.org, kernel-team@android.com
Subject: Re: [PATCH 1/2] abg-writer: Add support for stable hash type ids.
Date: Mon, 15 Jun 2020 21:29:51 +0200	[thread overview]
Message-ID: <20200615192951.GF120706@google.com> (raw)
In-Reply-To: <20200612095917.2146-2-gprocida@google.com>

On Fri, Jun 12, 2020 at 10:59:16AM +0100, Android Kernel Team wrote:
>The type ids currently emitted by the XML writer are simply type-id-1,
>type-id-2 etc. Additions or removals of types early in this sequence
>result in cascading changes to many other XML elements.
>
>This commit adds support for stable type ids in the form of hashes of
>libabigail's internal type names. On fairly rare occasions (typically
>involving unnamed types), the names of two distinct types can be the
>same. In any case, if there is a hash collision the XML writer will
>find the next unused id and so preserve uniqueness.
>
>Diffs between XML files produced using --type-id-style hash will be
>much smaller and easier to review.
>
>	* doc/manuals/abidw.rst: Replace stray documentation of
>	--named-type-ids with documention of new --type-id-style
>	option.
>	* include/abg-writer.h (type_id_style_kind): Add new enum.
>	(set_type_id_style): Add new write_context setter.
>	(set_common_options): Set type id style in write context.
>	* src/abg-writer.cc (stable_hash): Add new 32-bit FNV-1a hash
>	function. (write_context): Add m_type_id_style member to
>	record type style to use, defaulting to COUNTER. Add
>	m_used_type_id_hashes to record already used hashes.
>	(write_context::get_type_id_style): Add new getter.
>	(write_context::set_type_id_style): Add new setter.
>	(get_id_for_type): Add support for HASH style.
>	(set_type_id_style): Add new helper function.
>	* tools/abidw.cc (options): Add type_id_style member.
>	(display_usage): Add description of --type-id-style option.
>	(parse_command_line): Parse --type-id-style option.
>
>Signed-off-by: Giuliano Procida <gprocida@google.com>
>---
> doc/manuals/abidw.rst | 15 +++----
> include/abg-writer.h  | 10 +++++
> src/abg-writer.cc     | 95 +++++++++++++++++++++++++++++++++++++++----
> tools/abidw.cc        | 20 ++++++++-
> 4 files changed, 125 insertions(+), 15 deletions(-)
>
>diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
>index e1fce997..9472cfe8 100644
>--- a/doc/manuals/abidw.rst
>+++ b/doc/manuals/abidw.rst
>@@ -197,13 +197,14 @@ Options
>     1.8 will not set the default size and will interpret types without
>     a size-in-bits attribute as zero sized.
>
>-  * ``--named-type-ids``
>-
>-    Without this option ids used to reference types in the XML file
>-    use simple numbers.  With this option the ids used are derived
>-    from the type name to make it easier to see which type is
>-    referenced and make the XML file more stable in case new types are
>-    added (without this option that might mean all id numbers change).

Note: This (correctly) drops a hunk that was accidentally committed as
part of an earlier change. This feature is not available, but
documented. That is rare :-)

>+  * ``--type-id-style`` ( ``counter`` | ``hash`` )
>+
>+    This option controls how types are idenfied in the generated XML
>+    files.  The default ``counter`` style just numbers (with
>+    ``type-id-`` as prefix) the types in the order they are
>+    encountered.  The ``hash`` style uses a (stable, portable) hash of
>+    the type names that libabigail uses internally and is intended
>+    to make the XML files easier to diff.
>
>   * ``--check-alternate-debug-info-base-name`` <*elf-path*>
>
>diff --git a/include/abg-writer.h b/include/abg-writer.h
>index 8086d828..021a95b7 100644
>--- a/include/abg-writer.h
>+++ b/include/abg-writer.h
>@@ -38,6 +38,12 @@ namespace xml_writer
>
> using namespace abigail::ir;
>
>+enum type_id_style_kind
>+{
>+  COUNTER,

I would rather call it a sequence, but counter is fine for me as well.

>+  HASH
>+};
>+
> class write_context;
>
> /// A convenience typedef for a shared pointer to write_context.
>@@ -74,6 +80,9 @@ set_short_locs(write_context& ctxt, bool flag);
> void
> set_write_parameter_names(write_context& ctxt, bool flag);
>
>+void
>+set_type_id_style(write_context& ctxt, type_id_style_kind style);
>+
> /// A convenience generic function to set common options (usually used
> /// by Libabigail tools) from a generic options carrying-object, into
> /// a given @ref write_context.
>@@ -96,6 +105,7 @@ set_common_options(write_context& ctxt, const OPTS& opts)
>   set_write_parameter_names(ctxt, opts.write_parameter_names);
>   set_short_locs(ctxt, opts.short_locs);
>   set_write_default_sizes(ctxt, opts.default_sizes);
>+  set_type_id_style(ctxt, opts.type_id_style);
> }
>
> void
>diff --git a/src/abg-writer.cc b/src/abg-writer.cc
>index bafa3024..f0dbd096 100644
>--- a/src/abg-writer.cc
>+++ b/src/abg-writer.cc
>@@ -27,6 +27,7 @@
>
> #include "config.h"
> #include <assert.h>
>+#include <iomanip>
> #include <iostream>
> #include <fstream>
> #include <sstream>
>@@ -56,6 +57,39 @@ ABG_BEGIN_EXPORT_DECLARATIONS
> ABG_END_EXPORT_DECLARATIONS
> // </headers defining libabigail's API>
>
>+namespace
>+{
>+/// Compute a stable string hash.
>+///
>+/// std::hash has no portability or stability guarantees so is
>+/// unsuitable where reproducibility is a requirement.
>+///
>+/// This is the 32-bit FNV-1a algorithm. The algorithm, reference code
>+/// and constants are all unencumbered. It is fast and has reasonable
>+/// distribution properties.
>+///
>+/// https://en.wikipedia.org/wiki/Fowler-Noll-Vo_hash_function
>+///
>+/// @param str the string to hash.
>+///
>+/// @return an unsigned 32 bit hash value.

static

>+uint32_t
>+stable_hash(const std::string& str)
>+{
>+  const uint32_t prime = 0x01000193;
>+  const uint32_t offset_basis = 0x811c9dc5;
>+  uint32_t hash = offset_basis;
>+  for (std::string::const_iterator i = str.begin(); i != str.end(); ++i)
>+    {
>+      uint8_t byte = *i;
>+      hash = hash ^ byte;
>+      hash = hash * prime;
>+    }
>+  return hash;
>+}
>+
>+} // namespace
>+
> namespace abigail
> {
> using std::cerr;
>@@ -177,7 +211,9 @@ class write_context
>   bool					m_write_parameter_names;
>   bool					m_short_locs;
>   bool					m_write_default_sizes;
>+  type_id_style_kind			m_type_id_style;
>   mutable type_ptr_map			m_type_id_map;
>+  mutable std::unordered_set<uint32_t>	m_used_type_id_hashes;

Use abg_compat::unordered_set instead.

>   mutable type_ptr_set_type		m_emitted_type_set;
>   type_ptr_set_type			m_emitted_decl_only_set;
>   // A map of types that are referenced by emitted pointers,
>@@ -214,7 +250,8 @@ public:
>       m_write_elf_needed(true),
>       m_write_parameter_names(true),
>       m_short_locs(false),
>-      m_write_default_sizes(true)
>+      m_write_default_sizes(true),
>+      m_type_id_style(COUNTER)
>   {}
>
>   /// Getter of the environment we are operating from.
>@@ -374,6 +411,24 @@ public:
>   set_show_locs(bool f)
>   {m_show_locs = f;}
>
>+  /// Getter of the "type-id-style" option.
>+  ///
>+  /// This option controls the kind of type ids used in XML output.
>+  ///
>+  /// @return the value of the "type-id-style" option.
>+  type_id_style_kind
>+  get_type_id_style() const
>+  {return m_type_id_style;}
>+
>+  /// Setter of the "type-id-style" option.
>+  ///
>+  /// This option controls the kind of type ids used in XML output.
>+  ///
>+  /// @param style the new value of the "type-id-style" option.
>+  void
>+  set_type_id_style(type_id_style_kind style)
>+  {m_type_id_style = style;}
>+
>   /// Getter of the @ref id_manager.
>   ///
>   /// @return the @ref id_manager used by the current instance of @ref
>@@ -421,14 +476,29 @@ public:
>       c = const_cast<type_base*>(t);
>
>     type_ptr_map::const_iterator it = m_type_id_map.find(c);
>-    if (it == m_type_id_map.end())
>+    if (it != m_type_id_map.end())
>+      return it->second;
>+
>+    switch (m_type_id_style)
>       {
>-	interned_string id =
>-	  get_id_manager().get_id_with_prefix("type-id-");
>-	m_type_id_map[c] = id;
>-	return id;
>+      case COUNTER:
>+        {
>+          interned_string id = get_id_manager().get_id_with_prefix("type-id-");
>+          return m_type_id_map[c] = id;
>+        }
>+      case HASH:
>+        {
>+          interned_string pretty = c->get_cached_pretty_representation(true);
>+          size_t hash = stable_hash(*pretty.raw());
>+          while (!m_used_type_id_hashes.insert(hash).second)
>+            ++hash;
>+          std::ostringstream os;
>+          os << std::hex << std::setfill('0') << std::setw(8) << hash;
>+          return m_type_id_map[c] = c->get_environment()->intern(os.str());
>+        }
>       }
>-    return it->second;
>+    ABG_ASSERT_NOT_REACHED;
>+    return interned_string();
>   }
>
>   string
>@@ -2131,6 +2201,17 @@ void
> set_write_default_sizes(write_context& ctxt, bool flag)
> {ctxt.set_write_default_sizes(flag);}
>
>+/// Set the 'type-id-style' property.
>+///
>+/// This property controls the kind of type ids used in XML output.
>+///
>+/// @param ctxt the context to set this property on.
>+///
>+/// @param style the new value of the 'type-id-style' property.
>+void
>+set_type_id_style(write_context& ctxt, type_id_style_kind style)
>+{ctxt.set_type_id_style(style);}
>+
> /// Serialize the canonical types of a given scope.
> ///
> /// @param scope the scope to consider.
>diff --git a/tools/abidw.cc b/tools/abidw.cc
>index 9aec3112..7de31737 100644
>--- a/tools/abidw.cc
>+++ b/tools/abidw.cc
>@@ -70,7 +70,10 @@ using abigail::comparison::corpus_diff_sptr;
> using abigail::comparison::compute_diff;
> using abigail::comparison::diff_context_sptr;
> using abigail::comparison::diff_context;
>+using abigail::xml_writer::COUNTER;
>+using abigail::xml_writer::HASH;
> using abigail::xml_writer::create_write_context;
>+using abigail::xml_writer::type_id_style_kind;
> using abigail::xml_writer::write_context_sptr;
> using abigail::xml_writer::write_corpus;
> using abigail::xml_reader::read_corpus_from_native_xml_file;
>@@ -114,6 +117,7 @@ struct options
>   bool			do_log;
>   bool			drop_private_types;
>   bool			drop_undefined_syms;
>+  type_id_style_kind	type_id_style;
>
>   options()
>     : display_version(),
>@@ -136,7 +140,8 @@ struct options
>       annotate(),
>       do_log(),
>       drop_private_types(false),
>-      drop_undefined_syms(false)
>+      drop_undefined_syms(false),
>+      type_id_style(COUNTER)
>   {}
>
>   ~options()
>@@ -175,6 +180,7 @@ display_usage(const string& prog_name, ostream& out)
>     << "  --no-write-default-sizes  do not emit pointer size when it equals"
>     " the default address size of the translation unit\n"
>     << "  --no-parameter-names  do not show names of function parameters\n"
>+    << "  --type-id-style (counter|hash)  type-id style (counter(default): type-id-number; hash: hexdigits\n"
>     << "  --check-alternate-debug-info <elf-path>  check alternate debug info "
>     "of <elf-path>\n"
>     << "  --check-alternate-debug-info-base-name <elf-path>  check alternate "
>@@ -301,6 +307,18 @@ parse_command_line(int argc, char* argv[], options& opts)
> 	opts.default_sizes = false;
>       else if (!strcmp(argv[i], "--no-parameter-names"))
> 	opts.write_parameter_names = false;
>+      else if (!strcmp(argv[i], "--type-id-style"))
>+        {
>+          ++i;
>+          if (i >= argc)
>+            return false;
>+          if (!strcmp(argv[i], "counter"))
>+            opts.type_id_style = COUNTER;
>+          else if (!strcmp(argv[i], "hash"))
>+            opts.type_id_style = HASH;
>+          else
>+            return false;
>+        }
>       else if (!strcmp(argv[i], "--check-alternate-debug-info")
> 	       || !strcmp(argv[i], "--check-alternate-debug-info-base-name"))
> 	{
>-- 
>2.27.0.290.gba653c62da-goog
>
>-- 
>To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

  parent reply	other threads:[~2020-06-15 19:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-12  9:59 [PATCH 0/2] Stable " Giuliano Procida
2020-06-12  9:59 ` [PATCH 1/2] abg-writer: Add support for stable hash " Giuliano Procida
2020-06-15 17:18   ` Dodji Seketeli
2020-06-15 19:29   ` Matthias Maennich [this message]
2020-06-15 21:07   ` [PATCH v2 0/1] Stable " Giuliano Procida
2020-06-15 21:07     ` [PATCH v2 1/1] abg-writer: Add support for stable hash " Giuliano Procida
2020-06-16  8:21       ` Dodji Seketeli
2020-06-12  9:59 ` [PATCH 2/2] Add tests for abidw --type-id-style hash Giuliano Procida

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200615192951.GF120706@google.com \
    --to=maennich@google.com \
    --cc=dodji@seketeli.org \
    --cc=gprocida@google.com \
    --cc=kernel-team@android.com \
    --cc=libabigail@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).