public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Dodji Seketeli <dodji@redhat.com>
To: libabigail@sourceware.org
Cc: gprocida@google.com,  Dodji Seketeli <dodji@redhat.com>
Subject: [PATCH 4/4] {dwarf,elf_based}-reader,writer: Avoid duplicating corpora in corpus_group
Date: Fri, 03 Feb 2023 12:49:10 +0100	[thread overview]
Message-ID: <87357nt1e1.fsf_-_@redhat.com> (raw)
In-Reply-To: <87mt5vt3op.fsf@seketeli.org> (Dodji Seketeli's message of "Fri, 03 Feb 2023 11:59:34 +0100")

Hello,

It's been brought to my attention on IRC that running

    abidw --linux-tree <kernel-build-tree>

would result in a corpus group that duplicates every single corpus in
the resulting abixml.  Oops.

This is because both dwarf::reader::read_corpus() and
elf_based_reader::read_and_add_corpus_to_group() add the corpus to the
corpus_group, and yet, the later function calls the former.  So the
corpus is added to the corpus_group twice.

This patch ensures that
elf_based_reader::read_and_add_corpus_to_group() is the only one to
add the corpus to the group.  It also ensures that this happens before
the corpus is constructed from the debug info because that is useful
for sharing types among the various corpora.  Otherwise, those types
are potentially duplicated in the IR of each corpus.

The patch also ensures that the abixml writer enforces the fact that
each corpus is emitted only once.

	* src/abg-dwarf-reader.cc (reader::read_debug_info_into_corpus):
	Do not add the corpus to the group here ...
	* src/abg-elf-based-reader.cc
	(elf_based_reader::read_and_add_corpus_to_group): ... because it's
	already added here.  But then, let's add it here /before/ reading
	type & symbols information into the corpus.
	* src/abg-writer.cc (write_context::m_emitted_corpora_set): Add
	new data member.
	(write_context::{corpus_is_emitted, record_corpus_as_emitted}):
	Define new member functions.
	(write_corpus): Invoke the new
	write_context::record_corpus_as_emitted here.
	(write_corpus_group): Ensure that each corpus is emitted only
	once.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc     |  3 ---
 src/abg-elf-based-reader.cc |  4 +---
 src/abg-writer.cc           | 45 +++++++++++++++++++++++++++++++++++--
 3 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 566c9db1..92ce6c6a 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -2121,9 +2121,6 @@ public:
     corpus()->set_soname(dt_soname());
     corpus()->set_needed(dt_needed());
     corpus()->set_architecture_name(elf_architecture());
-    if (corpus_group_sptr group = corpus_group())
-      group->add_corpus(corpus());
-
     // Set symbols information to the corpus.
     corpus()->set_symtab(symtab());
 
diff --git a/src/abg-elf-based-reader.cc b/src/abg-elf-based-reader.cc
index cd7b59b6..d1d9a2df 100644
--- a/src/abg-elf-based-reader.cc
+++ b/src/abg-elf-based-reader.cc
@@ -92,10 +92,8 @@ ir::corpus_sptr
 elf_based_reader::read_and_add_corpus_to_group(ir::corpus_group& group,
 					       fe_iface::status& status)
 {
+  group.add_corpus(corpus());
   ir::corpus_sptr corp = read_corpus(status);
-
-  if (status & fe_iface::STATUS_OK)
-    group.add_corpus(corp);
   return corp;
 }
 
diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index 1fb067b8..9fe3dec7 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -230,7 +230,8 @@ class write_context
   class_tmpl_shared_ptr_map		m_class_tmpl_id_map;
   string_elf_symbol_sptr_map_type	m_fun_symbol_map;
   string_elf_symbol_sptr_map_type	m_var_symbol_map;
-  unordered_set<interned_string, hash_interned_string> m_emitted_decls_set;
+  unordered_set<interned_string, hash_interned_string>	m_emitted_decls_set;
+  unordered_set<string>				m_emitted_corpora_set;
 
   write_context();
 
@@ -818,6 +819,42 @@ public:
     m_emitted_decls_set.insert(irepr);
   }
 
+  /// Test if a corpus has already been emitted.
+  ///
+  /// A corpus is emitted if it's been recorded as having been emitted
+  /// by the function record_corpus_as_emitted().
+  ///
+  /// @param corp the corpus to consider.
+  ///
+  /// @return true iff the corpus @p corp has been emitted.
+  bool
+  corpus_is_emitted(const corpus_sptr& corp)
+  {
+    if (!corp)
+      return false;
+
+    if (m_emitted_corpora_set.find(corp->get_path())
+	== m_emitted_corpora_set.end())
+      return false;
+
+    return true;
+  }
+
+  /// Record the corpus has having been emitted.
+  ///
+  /// @param corp the corpus to consider.
+  void
+  record_corpus_as_emitted(const corpus_sptr& corp)
+  {
+    if (!corp)
+      return;
+
+    const string& path = corp->get_path();
+    ABG_ASSERT(!path.empty());
+
+    m_emitted_corpora_set.insert(path);
+  }
+
   /// Get the set of types that have been emitted.
   ///
   /// @return the set of types that have been emitted.
@@ -4588,6 +4625,7 @@ write_corpus(write_context&	ctxt,
   out << "</abi-corpus>\n";
 
   ctxt.clear_referenced_types();
+  ctxt.record_corpus_as_emitted(corpus);
 
   return true;
 }
@@ -4639,7 +4677,10 @@ std::ostream& out = ctxt.get_ostream();
 	 group->get_corpora().begin();
        c != group->get_corpora().end();
        ++c)
-    write_corpus(ctxt, *c, get_indent_to_level(ctxt, indent, 1), true);
+    {
+      ABG_ASSERT(!ctxt.corpus_is_emitted(*c));
+      write_corpus(ctxt, *c, get_indent_to_level(ctxt, indent, 1), true);
+    }
 
   do_indent_to_level(ctxt, indent, 0);
   out << "</abi-corpus-group>\n";
-- 
2.31.1



-- 
		Dodji


      parent reply	other threads:[~2023-02-03 11:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-01 10:48 [PATCH, RFC] " Dodji Seketeli
2023-02-03 10:59 ` [PATCH 0/4] Fix duplicated corpora in corpus group Dodji Seketeli
2023-02-03 11:47   ` [PATCH 1/4] fe-iface: Add missing virtual destructor Dodji Seketeli
2023-02-03 11:47   ` [PATCH 2/4] dwarf-reader: Remove unused code Dodji Seketeli
2023-02-03 11:48   ` [PATCH 3/4] corpus: Handle empty symbol table cases Dodji Seketeli
2023-02-03 11:49   ` Dodji Seketeli [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87357nt1e1.fsf_-_@redhat.com \
    --to=dodji@redhat.com \
    --cc=gprocida@google.com \
    --cc=libabigail@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).