From: Matthias Maennich <maennich@google.com>
To: Giuliano Procida <gprocida@google.com>
Cc: libabigail@sourceware.org, dodji@seketeli.org, kernel-team@android.com
Subject: Re: [PATCH 1/2] abg-writer: Add support for stable hash type ids.
Date: Mon, 15 Jun 2020 21:29:51 +0200 [thread overview]
Message-ID: <20200615192951.GF120706@google.com> (raw)
In-Reply-To: <20200612095917.2146-2-gprocida@google.com>
On Fri, Jun 12, 2020 at 10:59:16AM +0100, Android Kernel Team wrote:
>The type ids currently emitted by the XML writer are simply type-id-1,
>type-id-2 etc. Additions or removals of types early in this sequence
>result in cascading changes to many other XML elements.
>
>This commit adds support for stable type ids in the form of hashes of
>libabigail's internal type names. On fairly rare occasions (typically
>involving unnamed types), the names of two distinct types can be the
>same. In any case, if there is a hash collision the XML writer will
>find the next unused id and so preserve uniqueness.
>
>Diffs between XML files produced using --type-id-style hash will be
>much smaller and easier to review.
>
> * doc/manuals/abidw.rst: Replace stray documentation of
> --named-type-ids with documention of new --type-id-style
> option.
> * include/abg-writer.h (type_id_style_kind): Add new enum.
> (set_type_id_style): Add new write_context setter.
> (set_common_options): Set type id style in write context.
> * src/abg-writer.cc (stable_hash): Add new 32-bit FNV-1a hash
> function. (write_context): Add m_type_id_style member to
> record type style to use, defaulting to COUNTER. Add
> m_used_type_id_hashes to record already used hashes.
> (write_context::get_type_id_style): Add new getter.
> (write_context::set_type_id_style): Add new setter.
> (get_id_for_type): Add support for HASH style.
> (set_type_id_style): Add new helper function.
> * tools/abidw.cc (options): Add type_id_style member.
> (display_usage): Add description of --type-id-style option.
> (parse_command_line): Parse --type-id-style option.
>
>Signed-off-by: Giuliano Procida <gprocida@google.com>
>---
> doc/manuals/abidw.rst | 15 +++----
> include/abg-writer.h | 10 +++++
> src/abg-writer.cc | 95 +++++++++++++++++++++++++++++++++++++++----
> tools/abidw.cc | 20 ++++++++-
> 4 files changed, 125 insertions(+), 15 deletions(-)
>
>diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
>index e1fce997..9472cfe8 100644
>--- a/doc/manuals/abidw.rst
>+++ b/doc/manuals/abidw.rst
>@@ -197,13 +197,14 @@ Options
> 1.8 will not set the default size and will interpret types without
> a size-in-bits attribute as zero sized.
>
>- * ``--named-type-ids``
>-
>- Without this option ids used to reference types in the XML file
>- use simple numbers. With this option the ids used are derived
>- from the type name to make it easier to see which type is
>- referenced and make the XML file more stable in case new types are
>- added (without this option that might mean all id numbers change).
Note: This (correctly) drops a hunk that was accidentally committed as
part of an earlier change. This feature is not available, but
documented. That is rare :-)
>+ * ``--type-id-style`` ( ``counter`` | ``hash`` )
>+
>+ This option controls how types are idenfied in the generated XML
>+ files. The default ``counter`` style just numbers (with
>+ ``type-id-`` as prefix) the types in the order they are
>+ encountered. The ``hash`` style uses a (stable, portable) hash of
>+ the type names that libabigail uses internally and is intended
>+ to make the XML files easier to diff.
>
> * ``--check-alternate-debug-info-base-name`` <*elf-path*>
>
>diff --git a/include/abg-writer.h b/include/abg-writer.h
>index 8086d828..021a95b7 100644
>--- a/include/abg-writer.h
>+++ b/include/abg-writer.h
>@@ -38,6 +38,12 @@ namespace xml_writer
>
> using namespace abigail::ir;
>
>+enum type_id_style_kind
>+{
>+ COUNTER,
I would rather call it a sequence, but counter is fine for me as well.
>+ HASH
>+};
>+
> class write_context;
>
> /// A convenience typedef for a shared pointer to write_context.
>@@ -74,6 +80,9 @@ set_short_locs(write_context& ctxt, bool flag);
> void
> set_write_parameter_names(write_context& ctxt, bool flag);
>
>+void
>+set_type_id_style(write_context& ctxt, type_id_style_kind style);
>+
> /// A convenience generic function to set common options (usually used
> /// by Libabigail tools) from a generic options carrying-object, into
> /// a given @ref write_context.
>@@ -96,6 +105,7 @@ set_common_options(write_context& ctxt, const OPTS& opts)
> set_write_parameter_names(ctxt, opts.write_parameter_names);
> set_short_locs(ctxt, opts.short_locs);
> set_write_default_sizes(ctxt, opts.default_sizes);
>+ set_type_id_style(ctxt, opts.type_id_style);
> }
>
> void
>diff --git a/src/abg-writer.cc b/src/abg-writer.cc
>index bafa3024..f0dbd096 100644
>--- a/src/abg-writer.cc
>+++ b/src/abg-writer.cc
>@@ -27,6 +27,7 @@
>
> #include "config.h"
> #include <assert.h>
>+#include <iomanip>
> #include <iostream>
> #include <fstream>
> #include <sstream>
>@@ -56,6 +57,39 @@ ABG_BEGIN_EXPORT_DECLARATIONS
> ABG_END_EXPORT_DECLARATIONS
> // </headers defining libabigail's API>
>
>+namespace
>+{
>+/// Compute a stable string hash.
>+///
>+/// std::hash has no portability or stability guarantees so is
>+/// unsuitable where reproducibility is a requirement.
>+///
>+/// This is the 32-bit FNV-1a algorithm. The algorithm, reference code
>+/// and constants are all unencumbered. It is fast and has reasonable
>+/// distribution properties.
>+///
>+/// https://en.wikipedia.org/wiki/Fowler-Noll-Vo_hash_function
>+///
>+/// @param str the string to hash.
>+///
>+/// @return an unsigned 32 bit hash value.
static
>+uint32_t
>+stable_hash(const std::string& str)
>+{
>+ const uint32_t prime = 0x01000193;
>+ const uint32_t offset_basis = 0x811c9dc5;
>+ uint32_t hash = offset_basis;
>+ for (std::string::const_iterator i = str.begin(); i != str.end(); ++i)
>+ {
>+ uint8_t byte = *i;
>+ hash = hash ^ byte;
>+ hash = hash * prime;
>+ }
>+ return hash;
>+}
>+
>+} // namespace
>+
> namespace abigail
> {
> using std::cerr;
>@@ -177,7 +211,9 @@ class write_context
> bool m_write_parameter_names;
> bool m_short_locs;
> bool m_write_default_sizes;
>+ type_id_style_kind m_type_id_style;
> mutable type_ptr_map m_type_id_map;
>+ mutable std::unordered_set<uint32_t> m_used_type_id_hashes;
Use abg_compat::unordered_set instead.
> mutable type_ptr_set_type m_emitted_type_set;
> type_ptr_set_type m_emitted_decl_only_set;
> // A map of types that are referenced by emitted pointers,
>@@ -214,7 +250,8 @@ public:
> m_write_elf_needed(true),
> m_write_parameter_names(true),
> m_short_locs(false),
>- m_write_default_sizes(true)
>+ m_write_default_sizes(true),
>+ m_type_id_style(COUNTER)
> {}
>
> /// Getter of the environment we are operating from.
>@@ -374,6 +411,24 @@ public:
> set_show_locs(bool f)
> {m_show_locs = f;}
>
>+ /// Getter of the "type-id-style" option.
>+ ///
>+ /// This option controls the kind of type ids used in XML output.
>+ ///
>+ /// @return the value of the "type-id-style" option.
>+ type_id_style_kind
>+ get_type_id_style() const
>+ {return m_type_id_style;}
>+
>+ /// Setter of the "type-id-style" option.
>+ ///
>+ /// This option controls the kind of type ids used in XML output.
>+ ///
>+ /// @param style the new value of the "type-id-style" option.
>+ void
>+ set_type_id_style(type_id_style_kind style)
>+ {m_type_id_style = style;}
>+
> /// Getter of the @ref id_manager.
> ///
> /// @return the @ref id_manager used by the current instance of @ref
>@@ -421,14 +476,29 @@ public:
> c = const_cast<type_base*>(t);
>
> type_ptr_map::const_iterator it = m_type_id_map.find(c);
>- if (it == m_type_id_map.end())
>+ if (it != m_type_id_map.end())
>+ return it->second;
>+
>+ switch (m_type_id_style)
> {
>- interned_string id =
>- get_id_manager().get_id_with_prefix("type-id-");
>- m_type_id_map[c] = id;
>- return id;
>+ case COUNTER:
>+ {
>+ interned_string id = get_id_manager().get_id_with_prefix("type-id-");
>+ return m_type_id_map[c] = id;
>+ }
>+ case HASH:
>+ {
>+ interned_string pretty = c->get_cached_pretty_representation(true);
>+ size_t hash = stable_hash(*pretty.raw());
>+ while (!m_used_type_id_hashes.insert(hash).second)
>+ ++hash;
>+ std::ostringstream os;
>+ os << std::hex << std::setfill('0') << std::setw(8) << hash;
>+ return m_type_id_map[c] = c->get_environment()->intern(os.str());
>+ }
> }
>- return it->second;
>+ ABG_ASSERT_NOT_REACHED;
>+ return interned_string();
> }
>
> string
>@@ -2131,6 +2201,17 @@ void
> set_write_default_sizes(write_context& ctxt, bool flag)
> {ctxt.set_write_default_sizes(flag);}
>
>+/// Set the 'type-id-style' property.
>+///
>+/// This property controls the kind of type ids used in XML output.
>+///
>+/// @param ctxt the context to set this property on.
>+///
>+/// @param style the new value of the 'type-id-style' property.
>+void
>+set_type_id_style(write_context& ctxt, type_id_style_kind style)
>+{ctxt.set_type_id_style(style);}
>+
> /// Serialize the canonical types of a given scope.
> ///
> /// @param scope the scope to consider.
>diff --git a/tools/abidw.cc b/tools/abidw.cc
>index 9aec3112..7de31737 100644
>--- a/tools/abidw.cc
>+++ b/tools/abidw.cc
>@@ -70,7 +70,10 @@ using abigail::comparison::corpus_diff_sptr;
> using abigail::comparison::compute_diff;
> using abigail::comparison::diff_context_sptr;
> using abigail::comparison::diff_context;
>+using abigail::xml_writer::COUNTER;
>+using abigail::xml_writer::HASH;
> using abigail::xml_writer::create_write_context;
>+using abigail::xml_writer::type_id_style_kind;
> using abigail::xml_writer::write_context_sptr;
> using abigail::xml_writer::write_corpus;
> using abigail::xml_reader::read_corpus_from_native_xml_file;
>@@ -114,6 +117,7 @@ struct options
> bool do_log;
> bool drop_private_types;
> bool drop_undefined_syms;
>+ type_id_style_kind type_id_style;
>
> options()
> : display_version(),
>@@ -136,7 +140,8 @@ struct options
> annotate(),
> do_log(),
> drop_private_types(false),
>- drop_undefined_syms(false)
>+ drop_undefined_syms(false),
>+ type_id_style(COUNTER)
> {}
>
> ~options()
>@@ -175,6 +180,7 @@ display_usage(const string& prog_name, ostream& out)
> << " --no-write-default-sizes do not emit pointer size when it equals"
> " the default address size of the translation unit\n"
> << " --no-parameter-names do not show names of function parameters\n"
>+ << " --type-id-style (counter|hash) type-id style (counter(default): type-id-number; hash: hexdigits\n"
> << " --check-alternate-debug-info <elf-path> check alternate debug info "
> "of <elf-path>\n"
> << " --check-alternate-debug-info-base-name <elf-path> check alternate "
>@@ -301,6 +307,18 @@ parse_command_line(int argc, char* argv[], options& opts)
> opts.default_sizes = false;
> else if (!strcmp(argv[i], "--no-parameter-names"))
> opts.write_parameter_names = false;
>+ else if (!strcmp(argv[i], "--type-id-style"))
>+ {
>+ ++i;
>+ if (i >= argc)
>+ return false;
>+ if (!strcmp(argv[i], "counter"))
>+ opts.type_id_style = COUNTER;
>+ else if (!strcmp(argv[i], "hash"))
>+ opts.type_id_style = HASH;
>+ else
>+ return false;
>+ }
> else if (!strcmp(argv[i], "--check-alternate-debug-info")
> || !strcmp(argv[i], "--check-alternate-debug-info-base-name"))
> {
>--
>2.27.0.290.gba653c62da-goog
>
>--
>To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
next prev parent reply other threads:[~2020-06-15 19:29 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-12 9:59 [PATCH 0/2] Stable " Giuliano Procida
2020-06-12 9:59 ` [PATCH 1/2] abg-writer: Add support for stable hash " Giuliano Procida
2020-06-15 17:18 ` Dodji Seketeli
2020-06-15 19:29 ` Matthias Maennich [this message]
2020-06-15 21:07 ` [PATCH v2 0/1] Stable " Giuliano Procida
2020-06-15 21:07 ` [PATCH v2 1/1] abg-writer: Add support for stable hash " Giuliano Procida
2020-06-16 8:21 ` Dodji Seketeli
2020-06-12 9:59 ` [PATCH 2/2] Add tests for abidw --type-id-style hash Giuliano Procida
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200615192951.GF120706@google.com \
--to=maennich@google.com \
--cc=dodji@seketeli.org \
--cc=gprocida@google.com \
--cc=kernel-team@android.com \
--cc=libabigail@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).