public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Looking at equality and hashing
@ 2021-09-10 11:23 Giuliano Procida
  2021-09-10 11:23 ` [PATCH 1/4] abg-writer: get_id_for_type: refactor insertion and look-up Giuliano Procida
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Giuliano Procida @ 2021-09-10 11:23 UTC (permalink / raw)
  To: libabigail; +Cc: dodji, kernel-team, gprocida, maennich

These relate to PR28320.

The first two patches are essentially code movement to better isolate
where insertions and look-ups happen.

The third patch adds some optional consistency checks. These find one
issue in the test suite. Unfortunately, they do not pick up on issue
in PR28320 itself. Perhaps there is a canonical type pointer issue as
well.

The last patch seems like an omission. Please take a look.

Giuliano Procida (4):
  abg-writer: get_id_for_type: refactor insertion and look-up
  abg-writer: refactor recording of unemitted types
  abg-writer: allow debugging of equality / hash issues
  RFC: abg-writer: add a missing check for emitted declarations

 configure.ac      |  16 ++++
 src/abg-writer.cc | 184 +++++++++++++++++++++++++++++++---------------
 2 files changed, 139 insertions(+), 61 deletions(-)

-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] abg-writer: get_id_for_type: refactor insertion and look-up
  2021-09-10 11:23 [PATCH 0/4] Looking at equality and hashing Giuliano Procida
@ 2021-09-10 11:23 ` Giuliano Procida
  2021-09-10 11:23 ` [PATCH 2/4] abg-writer: refactor recording of unemitted types Giuliano Procida
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Giuliano Procida @ 2021-09-10 11:23 UTC (permalink / raw)
  To: libabigail; +Cc: dodji, kernel-team, gprocida, maennich

The insertion and look-up in get_id_for_type can be refactored. This
avoids an extra round of hashing and equality tests in the case of a
newly inserted type.

	* src/abg-writer.cc (get_id_for_type): Fold together the
	look-up and insertion logic by always attempting to insert a
	dummy value.

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 src/abg-writer.cc | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index 9f48dc92..d23f4fd6 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -421,30 +421,35 @@ public:
     if (c == 0)
       c = const_cast<type_base*>(t);
 
-    type_ptr_map::const_iterator it = m_type_id_map.find(c);
-    if (it != m_type_id_map.end())
-      return it->second;
+    auto insertion = m_type_id_map.emplace(c, interned_string());
+    interned_string& id = insertion.first->second;
 
-    switch (m_type_id_style)
-      {
-      case SEQUENCE_TYPE_ID_STYLE:
-	{
-	  interned_string id = get_id_manager().get_id_with_prefix("type-id-");
-	  return m_type_id_map[c] = id;
-	}
-      case HASH_TYPE_ID_STYLE:
+    if (insertion.second)
+      switch (m_type_id_style)
 	{
-	  interned_string pretty = c->get_cached_pretty_representation(true);
-	  size_t hash = hashing::fnv_hash(pretty);
-	  while (!m_used_type_id_hashes.insert(hash).second)
-	    ++hash;
-	  std::ostringstream os;
-	  os << std::hex << std::setfill('0') << std::setw(8) << hash;
-	  return m_type_id_map[c] = c->get_environment()->intern(os.str());
+	case SEQUENCE_TYPE_ID_STYLE:
+	  {
+	    id = get_id_manager().get_id_with_prefix("type-id-");
+	    break;
+	  }
+	case HASH_TYPE_ID_STYLE:
+	  {
+	    interned_string pretty = c->get_cached_pretty_representation(true);
+	    size_t hash = hashing::fnv_hash(pretty);
+	    while (!m_used_type_id_hashes.insert(hash).second)
+	      ++hash;
+	    std::ostringstream os;
+	    os << std::hex << std::setfill('0') << std::setw(8) << hash;
+	    id = c->get_environment()->intern(os.str());
+	    break;
+	  }
+	  default: {
+	    ABG_ASSERT_NOT_REACHED;
+	    break;
+	  }
 	}
-      }
-    ABG_ASSERT_NOT_REACHED;
-    return interned_string();
+
+    return id;
   }
 
   string
-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/4] abg-writer: refactor recording of unemitted types
  2021-09-10 11:23 [PATCH 0/4] Looking at equality and hashing Giuliano Procida
  2021-09-10 11:23 ` [PATCH 1/4] abg-writer: get_id_for_type: refactor insertion and look-up Giuliano Procida
@ 2021-09-10 11:23 ` Giuliano Procida
  2021-09-10 11:23 ` [PATCH 3/4] abg-writer: allow debugging of equality / hash issues Giuliano Procida
  2021-09-10 11:23 ` [PATCH 4/4] RFC: abg-writer: add a missing check for emitted declarations Giuliano Procida
  3 siblings, 0 replies; 5+ messages in thread
From: Giuliano Procida @ 2021-09-10 11:23 UTC (permalink / raw)
  To: libabigail; +Cc: dodji, kernel-team, gprocida, maennich

The code fragment

if (referenced_type_should_be_emitted(*i, ctxt, tu, is_last))
  referenced_types_to_emit.insert(*i);

appears 5 times. This commit inlines the insertion into the helper
function which is also renamed.

	* src/abg-writer.cc (referenced_type_should_be_emitted):
	Replaced by record_unemitted_type.
	(record_unemitted_type): Replacement function that also takes
	care of inserting the type into the set.
	(write_translation_unit): Replace calls to
	referenced_type_should_be_emitted and inserts with calls to
	record_unemitted_type.

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 src/abg-writer.cc | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index d23f4fd6..795ca685 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -2248,10 +2248,12 @@ write_canonical_types_of_scope(const scope_decl	&scope,
 }
 
 /// Test if a type referenced in a given translation unit should be
-/// emitted or not.
+/// emitted and record it if so.
 ///
 /// This is a subroutine of @ref write_translation_unit.
 ///
+/// @param types the set of types to update.
+///
 /// @param t the type to consider.
 ///
 /// @param ctxt the write context to consider.
@@ -2260,20 +2262,18 @@ write_canonical_types_of_scope(const scope_decl	&scope,
 ///
 /// @param tu_is_last true if @p tu is the last translation unit being
 /// emitted.
-///
-/// @return true iff @p t is to be emitted.
-static bool
-referenced_type_should_be_emitted(const type_base *t,
-				  const write_context& ctxt,
-				  const translation_unit& tu,
-				  bool tu_is_last)
+static void
+record_unemitted_type(type_ptr_set_type& types,
+		      const type_base *t,
+		      const write_context& ctxt,
+		      const translation_unit& tu,
+		      bool tu_is_last)
 {
   if ((tu_is_last || t->get_translation_unit()->get_absolute_path()
        == tu.get_absolute_path())
       && !ctxt.type_is_emitted(t)
       && !ctxt.decl_only_type_is_emitted(t))
-    return true;
-  return false;
+    types.insert(t);
 }
 
 /// Serialize a translation unit to an output stream.
@@ -2389,22 +2389,19 @@ write_translation_unit(write_context&		ctxt,
 	 ctxt.get_referenced_types().begin();
        i != ctxt.get_referenced_types().end();
        ++i)
-    if (referenced_type_should_be_emitted(*i, ctxt, tu, is_last))
-      referenced_types_to_emit.insert(*i);
+    record_unemitted_type(referenced_types_to_emit, *i, ctxt, tu, is_last);
 
   for (fn_type_ptr_set_type::const_iterator i =
 	 ctxt.get_referenced_function_types().begin();
        i != ctxt.get_referenced_function_types().end();
        ++i)
-    if (referenced_type_should_be_emitted(*i, ctxt, tu, is_last))
-      referenced_types_to_emit.insert(*i);
+    record_unemitted_type(referenced_types_to_emit, *i, ctxt, tu, is_last);
 
   for (type_ptr_set_type::const_iterator i =
 	 ctxt.get_referenced_non_canonical_types().begin();
        i != ctxt.get_referenced_non_canonical_types().end();
        ++i)
-    if (referenced_type_should_be_emitted(*i, ctxt, tu, is_last))
-      referenced_types_to_emit.insert(*i);
+    record_unemitted_type(referenced_types_to_emit, *i, ctxt, tu, is_last);
 
   // Ok, now let's emit the referenced type for good.
   while (!referenced_types_to_emit.empty())
@@ -2459,15 +2456,13 @@ write_translation_unit(write_context&		ctxt,
 	     ctxt.get_referenced_types().begin();
 	   i != ctxt.get_referenced_types().end();
 	   ++i)
-	if (referenced_type_should_be_emitted(*i, ctxt, tu, is_last))
-	  referenced_types_to_emit.insert(*i);
+        record_unemitted_type(referenced_types_to_emit, *i, ctxt, tu, is_last);
 
       for (type_ptr_set_type::const_iterator i =
 	     ctxt.get_referenced_non_canonical_types().begin();
 	   i != ctxt.get_referenced_non_canonical_types().end();
 	   ++i)
-	if (referenced_type_should_be_emitted(*i, ctxt, tu, is_last))
-	  referenced_types_to_emit.insert(*i);
+        record_unemitted_type(referenced_types_to_emit, *i, ctxt, tu, is_last);
     }
 
   // Now handle all function types that were not only referenced by
-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 3/4] abg-writer: allow debugging of equality / hash issues
  2021-09-10 11:23 [PATCH 0/4] Looking at equality and hashing Giuliano Procida
  2021-09-10 11:23 ` [PATCH 1/4] abg-writer: get_id_for_type: refactor insertion and look-up Giuliano Procida
  2021-09-10 11:23 ` [PATCH 2/4] abg-writer: refactor recording of unemitted types Giuliano Procida
@ 2021-09-10 11:23 ` Giuliano Procida
  2021-09-10 11:23 ` [PATCH 4/4] RFC: abg-writer: add a missing check for emitted declarations Giuliano Procida
  3 siblings, 0 replies; 5+ messages in thread
From: Giuliano Procida @ 2021-09-10 11:23 UTC (permalink / raw)
  To: libabigail; +Cc: dodji, kernel-team, gprocida, maennich

Inconsistenencies between equality and hash values can and have caused
hard-to-debug issues. There are quite a few unordered sets and maps in
the XML writer that use deep_ptr_eq_functor and type_hasher.

New ./configure option: --enable-debug-equality-hash-consistency

A simple way of tracking down potential issues is to check for
inconsistencies any time a type is looked up or inserted into a
container.

	* src/abg-writer.cc (get_id_for_type): Add a debugging stanza
	that, before a type is looked up or inserted, asserts that if
	it is equal to an existing mapped type then their hash values
	must be the same.

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 configure.ac      |  16 ++++++++
 src/abg-writer.cc | 102 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 98 insertions(+), 20 deletions(-)

diff --git a/configure.ac b/configure.ac
index 735cc9de..8870a7e7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -72,6 +72,12 @@ AC_ARG_ENABLE(debug-self-comparison,
 	      ENABLE_DEBUG_SELF_COMPARISON=$enableval,
 	      ENABLE_DEBUG_SELF_COMPARISON=no)
 
+AC_ARG_ENABLE(debug-equality-hash-consistency,
+	      AS_HELP_STRING([--enable-debug-equality-hash-consistency=yes|no],
+			     [enable debugging of equality / hash value consistency in unordered maps (default is no)]),
+	      ENABLE_DEBUG_EQUALITY_HASH_CONSISTENCY=$enableval,
+	      ENABLE_DEBUG_EQUALITY_HASH_CONSISTENCY=no)
+
 AC_ARG_ENABLE(deb,
 	      AS_HELP_STRING([--enable-deb=yes|no|auto],
 			     [enable the support of deb in abipkgdiff (default is auto)]),
@@ -313,6 +319,16 @@ fi
 
 AM_CONDITIONAL(ENABLE_DEBUG_SELF_COMPARISON, test x$ENABLE_DEBUG_SELF_COMPARISON = xyes)
 
+dnl enable the debugging of equality / hash value consistency in the abidw XML writer
+if test x$ENABLE_DEBUG_EQUALITY_HASH_CONSISTENCY = xyes; then
+  AC_DEFINE([WITH_DEBUG_EQUALITY_HASH_CONSISTENCY], 1, [compile equality / hash value consistency checks])
+  AC_MSG_NOTICE([support of debugging equality / hash value consistency is enabled])
+else
+  AC_MSG_NOTICE([support of debugging equality / hash value consistency is disabled])
+fi
+
+AM_CONDITIONAL(ENABLE_DEBUG_EQUALITY_HASH_CONSISTENCY, test x$ENABLE_DEBUG_EQUALITY_HASH_CONSISTENCY = xyes)
+
 dnl Check for the dpkg program
 if test x$ENABLE_DEB = xauto -o x$ENABLE_DEB = xyes; then
    AC_CHECK_PROG(HAS_DPKG, dpkg, yes, no)
diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index 795ca685..cf664720 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -115,6 +115,41 @@ struct type_hasher
   {return hash_type(t);}
 }; // end struct type_hasher
 
+template <typename T>
+static void check_set_equality_hash_consistency(
+    const T& haystack, const typename T::key_type& needle)
+{
+#ifdef WITH_DEBUG_EQUALITY_HASH_CONSISTENCY
+  abigail::diff_utils::deep_ptr_eq_functor eq;
+  type_hasher hasher;
+  const auto needle_hash = hasher(needle);
+  for (const auto& key : haystack)
+    ABG_ASSERT(!eq(needle, key) || needle_hash == hasher(key));
+#else
+  (void) haystack;
+  (void) needle;
+#endif
+}
+
+template <typename T>
+static void check_map_equality_hash_consistency(
+    const T& haystack, const typename T::key_type& needle)
+{
+#ifdef WITH_DEBUG_EQUALITY_HASH_CONSISTENCY
+  abigail::diff_utils::deep_ptr_eq_functor eq;
+  type_hasher hasher;
+  const auto needle_hash = hasher(needle);
+  for (const auto& key_value : haystack)
+    {
+      const auto& key = key_value.first;
+      ABG_ASSERT(!eq(key, needle) || needle_hash == hasher(key));
+    }
+#else
+  (void) haystack;
+  (void) needle;
+#endif
+}
+
 /// A convenience typedef for a map that associates a pointer to type
 /// to a string.  The pointer to type is hashed as fast as possible.
 typedef unordered_map<type_base*,
@@ -421,6 +456,7 @@ public:
     if (c == 0)
       c = const_cast<type_base*>(t);
 
+    check_map_equality_hash_consistency(m_type_id_map, c);
     auto insertion = m_type_id_map.emplace(c, interned_string());
     interned_string& id = insertion.first->second;
 
@@ -544,14 +580,25 @@ public:
   {
     // If the type is a function type, record it in a dedicated data
     // structure.
-    if (function_type* f = is_function_type(t.get()))
-      m_referenced_fn_types_set.insert(f);
-    else if (!t->get_naked_canonical_type())
-      // If the type doesn't have a canonical type, record it in a
-      // dedicated data structure.
-      m_referenced_non_canonical_types_set.insert(t.get());
+    type_base* t_ptr = t.get();
+    if (function_type* f = is_function_type(t_ptr))
+      {
+	check_set_equality_hash_consistency(m_referenced_fn_types_set, f);
+	m_referenced_fn_types_set.insert(f);
+      }
+    else if (!t_ptr->get_naked_canonical_type())
+      {
+	// If the type doesn't have a canonical type, record it in a
+	// dedicated data structure.
+	check_set_equality_hash_consistency(
+	    m_referenced_non_canonical_types_set, t_ptr);
+	m_referenced_non_canonical_types_set.insert(t_ptr);
+      }
     else
-      m_referenced_types_set.insert(t.get());
+      {
+	check_set_equality_hash_consistency(m_referenced_types_set, t_ptr);
+	m_referenced_types_set.insert(t_ptr);
+      }
   }
 
   /// Test if a given type has been referenced by a pointer, a
@@ -564,15 +611,26 @@ public:
   bool
   type_is_referenced(const type_base_sptr& t)
   {
-    if (function_type *f = is_function_type(t.get()))
-      return (m_referenced_fn_types_set.find(f)
-	      != m_referenced_fn_types_set.end());
-    else if (!t->get_naked_canonical_type())
-      return (m_referenced_non_canonical_types_set.find(t.get())
-	      != m_referenced_non_canonical_types_set.end());
+    type_base* t_ptr = t.get();
+    if (function_type *f = is_function_type(t_ptr))
+      {
+	check_set_equality_hash_consistency(m_referenced_fn_types_set, f);
+	return (m_referenced_fn_types_set.find(f)
+		!= m_referenced_fn_types_set.end());
+      }
+    else if (!t_ptr->get_naked_canonical_type())
+      {
+	check_set_equality_hash_consistency(
+	    m_referenced_non_canonical_types_set, t_ptr);
+	return (m_referenced_non_canonical_types_set.find(t_ptr)
+		!= m_referenced_non_canonical_types_set.end());
+      }
     else
-      return m_referenced_types_set.find
-	(t.get()) != m_referenced_types_set.end();
+      {
+	check_set_equality_hash_consistency(m_referenced_types_set, t_ptr);
+	return (m_referenced_types_set.find(t_ptr)
+		!= m_referenced_types_set.end());
+      }
   }
 
   /// A comparison functor to compare pointers to @ref type_base.
@@ -727,6 +785,7 @@ public:
     type_base *c = t->get_naked_canonical_type();
     if (c == 0)
       c = const_cast<type_base*>(t);
+    check_set_equality_hash_consistency(m_emitted_type_set, c);
     m_emitted_type_set.insert(c);
   }
 
@@ -739,6 +798,7 @@ public:
   bool
   type_is_emitted(const type_base *t) const
   {
+    check_set_equality_hash_consistency(m_emitted_type_set, t);
     return m_emitted_type_set.find(t) != m_emitted_type_set.end();
   }
 
@@ -792,6 +852,7 @@ public:
   {
     class_or_union* cl = is_class_or_union_type(t);
     ABG_ASSERT(cl && cl->get_is_declaration_only());
+    check_set_equality_hash_consistency(m_emitted_decl_only_set, t);
     m_emitted_decl_only_set.insert(t);
   }
 
@@ -814,10 +875,8 @@ public:
   bool
   decl_only_type_is_emitted(const type_base* t) const
   {
-    type_ptr_set_type::const_iterator i = m_emitted_decl_only_set.find(t);
-    if (i == m_emitted_decl_only_set.end())
-      return false;
-    return true;
+    check_set_equality_hash_consistency(m_emitted_decl_only_set, t);
+    return m_emitted_decl_only_set.find(t) != m_emitted_decl_only_set.end();
   }
 
   /// Test if a declaration-only class has been emitted.
@@ -2273,7 +2332,10 @@ record_unemitted_type(type_ptr_set_type& types,
        == tu.get_absolute_path())
       && !ctxt.type_is_emitted(t)
       && !ctxt.decl_only_type_is_emitted(t))
-    types.insert(t);
+    {
+      check_set_equality_hash_consistency(types, t);
+      types.insert(t);
+    }
 }
 
 /// Serialize a translation unit to an output stream.
-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 4/4] RFC: abg-writer: add a missing check for emitted declarations
  2021-09-10 11:23 [PATCH 0/4] Looking at equality and hashing Giuliano Procida
                   ` (2 preceding siblings ...)
  2021-09-10 11:23 ` [PATCH 3/4] abg-writer: allow debugging of equality / hash issues Giuliano Procida
@ 2021-09-10 11:23 ` Giuliano Procida
  3 siblings, 0 replies; 5+ messages in thread
From: Giuliano Procida @ 2021-09-10 11:23 UTC (permalink / raw)
  To: libabigail; +Cc: dodji, kernel-team, gprocida, maennich

Before writing out extra referenced types, the function
write_translation_unit does not check if this was an already emitted
declaration-only type.

This may actually be intentional, given the following comment.

  // We allow several *declarations* of the same class in the corpus,
  // but only one definition.

	* src/abg-writer.cc (write_translation_unit): Also check if an
	an extra referenced type is declaration-only and already
	emitted.

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 src/abg-writer.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index cf664720..abd5a45a 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -2484,7 +2484,7 @@ write_translation_unit(write_context&		ctxt,
 	  // We handle types which have declarations *and* function
 	  // types here.
 	  type_base_sptr t(*i, noop_deleter());
-	  if (!ctxt.type_is_emitted(t))
+	  if (!ctxt.type_is_emitted(t) && !ctxt.decl_only_type_is_emitted(t))
 	    {
 	      if (decl_base* d = get_type_declaration(*i))
 		{
-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-09-10 11:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-10 11:23 [PATCH 0/4] Looking at equality and hashing Giuliano Procida
2021-09-10 11:23 ` [PATCH 1/4] abg-writer: get_id_for_type: refactor insertion and look-up Giuliano Procida
2021-09-10 11:23 ` [PATCH 2/4] abg-writer: refactor recording of unemitted types Giuliano Procida
2021-09-10 11:23 ` [PATCH 3/4] abg-writer: allow debugging of equality / hash issues Giuliano Procida
2021-09-10 11:23 ` [PATCH 4/4] RFC: abg-writer: add a missing check for emitted declarations Giuliano Procida

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).