From: Matthias Maennich <maennich@google.com>
To: "Mark J. Wielaard" <mark@klomp.org>
Cc: libabigail@sourceware.org, Giuliano Procida <gprocida@google.com>
Subject: Re: [PATCH 4/4] Add --merge-translation-units option.
Date: Thu, 7 May 2020 14:27:01 +0200 [thread overview]
Message-ID: <20200507122701.GF80772@google.com> (raw)
In-Reply-To: <20200421122821.13769-5-mark@klomp.org>
Hi Mark!
On Tue, Apr 21, 2020 at 02:28:21PM +0200, Mark J. Wielaard wrote:
>From: Mark Wielaard <mark@klomp.org>
>
>It is not always necessary to know in which translation unit a type,
>variable or function was defined first. Provide an option to simply
>merge all translation units for the same language (and address size).
>This also reduces the size of the XML representation of the corpus by
>~10% on a simple C only library.
>
>Currently only implemented for the DWARF reader and abidw tool.
>
> * doc/manuals/abidw.rst: Document --merge-translation-units.
> * include/abg-dwarf-reader.h (set_merge_translation_units):
> New function.
> * src/abg-dwarf-reader.cc (read_context): Add
> merge_translation_units_ bool, merge_translation_units and
> merge_translation_units methods.
> (set_merge_translation_units): New function.
> (read_context::build_translation_unit_and_add_to_ir): Check
> merge_translation_units to see how to find an existing
> (compatible) translation_unit to use. Drop path and
> compilation_dir when merging translation_units.
> * src/abg-reader.cc (read_context): Add m_merge_translation_units
> bool, merge_translation_units and merge_translation_units methods.
> * tools/abidw.cc (options): Add merge_translation_units bool.
> (display_usage): Describe --merge-translation-units.
> (parse_command_line): Parse --merge-translation-units.
> (main): Call set_merge_translation_units.
>
I tested that change with Linux Kernel binaries and I could not reach
the same gains (as in size reductions). 100K chopped off of 16M. It
simply does not have the same characteristics as elfutils(?) I suppose.
Nevertheless, the patch makes sense and is a good addition. Please note
that due to commit 246ca2004930 ("corpus/writer: sort emitted
translation units by path name"), it needs to be reworked as the empty
path name is now invalid. In my tests I used an artificial filename
created out of language and address size (e.g. 'LANG_C89_64bit'). That
also simplifies the code much as you can do the lookup by name in any
case.
Cheers,
Matthias
>Signed-off-by: Mark Wielaard <mark@klomp.org>
>---
> doc/manuals/abidw.rst | 7 ++++
> include/abg-dwarf-reader.h | 4 ++
> src/abg-dwarf-reader.cc | 85 +++++++++++++++++++++++++++++++-------
> src/abg-reader.cc | 20 ++++++++-
> tools/abidw.cc | 6 +++
> 5 files changed, 105 insertions(+), 17 deletions(-)
>
>diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
>index 7ae44737..256c4ad6 100644
>--- a/doc/manuals/abidw.rst
>+++ b/doc/manuals/abidw.rst
>@@ -158,6 +158,13 @@ Options
> representation build by Libabigail to represent the ABI and will
> not end up in the abi XML file.
>
>+ * ``--merge-translation-units``
>+
>+ With this option translation units for the same language (and
>+ address size) will be merged together as if the functions,
>+ variables and types were all defined together. Note that this
>+ also drops the compilation paths used.
>+
> * ``--no-linux-kernel-mode``
>
> Without this option, if abipkgiff detects that the binaries it is
>diff --git a/include/abg-dwarf-reader.h b/include/abg-dwarf-reader.h
>index d0329aed..430cc222 100644
>--- a/include/abg-dwarf-reader.h
>+++ b/include/abg-dwarf-reader.h
>@@ -192,6 +192,10 @@ void
> set_drop_undefined_syms(read_context& ctxt,
> bool f);
>
>+void
>+set_merge_translation_units(read_context& ctxt,
>+ bool f);
>+
> void
> set_do_log(read_context& ctxt, bool f);
>
>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>index 1c0d6ea0..f7cbe6a5 100644
>--- a/src/abg-dwarf-reader.cc
>+++ b/src/abg-dwarf-reader.cc
>@@ -3260,6 +3260,7 @@ public:
> corpus::exported_decls_builder* exported_decls_builder_;
> options_type options_;
> bool drop_undefined_syms_;
>+ bool merge_translation_units_;
> read_context();
>
> public:
>@@ -3429,6 +3430,7 @@ public:
> options_.load_in_linux_kernel_mode = linux_kernel_mode;
> options_.load_all_types = load_all_types;
> drop_undefined_syms_ = false;
>+ merge_translation_units_ = false;
> load_in_linux_kernel_mode(linux_kernel_mode);
> }
>
>@@ -3537,6 +3539,22 @@ public:
> drop_undefined_syms(bool f)
> {drop_undefined_syms_ = f;}
>
>+ /// Setter for the flag that tells us if we are merging translation
>+ /// units.
>+ ///
>+ /// @param f the new value of the flag.
>+ void
>+ merge_translation_units(bool f)
>+ {merge_translation_units_ = f;}
>+
>+ /// Getter for the flag that tells us if we are merging translation
>+ /// units.
>+ ///
>+ /// @return true iff we are merging translation units.
>+ bool
>+ merge_translation_units() const
>+ {return merge_translation_units_;}
>+
> /// Getter of the suppression specifications to be used during
> /// ELF/DWARF parsing.
> ///
>@@ -9487,6 +9505,17 @@ void
> set_drop_undefined_syms(read_context& ctxt, bool f)
> {ctxt.drop_undefined_syms(f);}
>
>+/// Setter of the "merge_translation_units" flag.
>+///
>+/// This flag tells if we should merge translation units.
>+///
>+/// @param ctxt the read context to consider for this flag.
>+///
>+/// @param f the value of the flag.
>+void
>+set_merge_translation_units(read_context& ctxt, bool f)
>+{ctxt.merge_translation_units(f);}
>+
> /// Setter of the "do_log" flag.
> ///
> /// This flag tells if we should emit verbose logs for various
>@@ -14267,28 +14296,52 @@ build_translation_unit_and_add_to_ir(read_context& ctxt,
> string path = die_string_attribute(die, DW_AT_name);
> string compilation_dir = die_string_attribute(die, DW_AT_comp_dir);
>
>- // See if the same translation unit exits already in the current
>- // corpus. Sometimes, the same translation unit can be present
>- // several times in the same debug info. The content of the
>- // different instances of the translation unit are different. So to
>- // represent that, we are going to re-use the same translation
>- // unit. That is, it's going to be the union of all the translation
>- // units of the same path.
>- {
>- string abs_path = compilation_dir + "/" + path;
>- result = ctxt.current_corpus()->find_translation_unit(abs_path);
>- }
>+ uint64_t lang = 0;
>+ die_unsigned_constant_attribute(die, DW_AT_language, lang);
>+ translation_unit::language language = dwarf_language_to_tu_language(lang);
>+
>+ corpus_sptr corp = ctxt.current_corpus();
>+
>+ if (ctxt.merge_translation_units())
>+ {
>+ // See if there is already a translation for the address_size
>+ // and language. If so, just reuse that one.
>+ for (translation_units::const_iterator tu =
>+ corp->get_translation_units().begin();
>+ tu != corp->get_translation_units().end();
>+ ++tu)
>+ {
>+ if ((*tu)->get_address_size() == address_size
>+ && (*tu)->get_language() == language)
>+ {
>+ result = (*tu);
>+ break;
>+ }
>+ }
>+ }
>+ else
>+ {
>+ // See if the same translation unit exits already in the current
>+ // corpus. Sometimes, the same translation unit can be present
>+ // several times in the same debug info. The content of the
>+ // different instances of the translation unit are different. So to
>+ // represent that, we are going to re-use the same translation
>+ // unit. That is, it's going to be the union of all the translation
>+ // units of the same path.
>+ string abs_path = compilation_dir + "/" + path;
>+ result = corp->find_translation_unit(abs_path);
>+ }
>
> if (!result)
> {
> result.reset(new translation_unit(ctxt.env(),
>- path,
>+ (ctxt.merge_translation_units()
>+ ? "" : path),
> address_size));
>- result->set_compilation_dir_path(compilation_dir);
>+ if (!ctxt.merge_translation_units())
>+ result->set_compilation_dir_path(compilation_dir);
> ctxt.current_corpus()->add(result);
>- uint64_t l = 0;
>- die_unsigned_constant_attribute(die, DW_AT_language, l);
>- result->set_language(dwarf_language_to_tu_language(l));
>+ result->set_language(language);
> }
>
> ctxt.cur_transl_unit(result);
>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>index dcaa27e1..7f9374c5 100644
>--- a/src/abg-reader.cc
>+++ b/src/abg-reader.cc
>@@ -132,6 +132,7 @@ private:
> suppr::suppressions_type m_supprs;
> bool m_tracking_non_reachable_types;
> bool m_drop_undefined_syms;
>+ bool m_merge_translation_units;
>
> read_context();
>
>@@ -143,7 +144,8 @@ public:
> m_corp_node(),
> m_exported_decls_builder(),
> m_tracking_non_reachable_types(),
>- m_drop_undefined_syms()
>+ m_drop_undefined_syms(),
>+ m_merge_translation_units()
> {}
>
> /// Getter for the flag that tells us if we are tracking types that
>@@ -181,6 +183,22 @@ public:
> drop_undefined_syms(bool f)
> {m_drop_undefined_syms = f;}
>
>+ /// Getter for the flag that tells us if we are merging translation
>+ /// units.
>+ ///
>+ /// @return true iff we are merging translation units.
>+ bool
>+ merge_translation_units() const
>+ {return m_merge_translation_units;}
>+
>+ /// Setter for the flag that tells us if we are merging translation
>+ /// units.
>+ ///
>+ /// @param f the new value of the flag.
>+ void
>+ merge_translation_units(bool f)
>+ {m_merge_translation_units = f;}
>+
> /// Getter of the path to the ABI file.
> ///
> /// @return the path to the native xml abi file.
>diff --git a/tools/abidw.cc b/tools/abidw.cc
>index 3f4b3f42..87248a2c 100644
>--- a/tools/abidw.cc
>+++ b/tools/abidw.cc
>@@ -113,6 +113,7 @@ struct options
> bool do_log;
> bool drop_private_types;
> bool drop_undefined_syms;
>+ bool merge_translation_units;
> bool named_type_ids;
>
> options()
>@@ -136,6 +137,7 @@ struct options
> do_log(),
> drop_private_types(false),
> drop_undefined_syms(false),
>+ merge_translation_units(false),
> named_type_ids(false)
> {}
>
>@@ -170,6 +172,7 @@ display_usage(const string& prog_name, ostream& out)
> << " --short-locs only print filenames rather than paths\n"
> << " --drop-private-types drop private types from representation\n"
> << " --drop-undefined-syms drop undefined symbols from representation\n"
>+ << " --merge-translation-units merge translation units for same language\n"
> << " --named_type-ids use id attributes based on type names in XML file\n"
> << " --no-comp-dir-path do not show compilation path information\n"
> << " --no-elf-needed do not show the DT_NEEDED information\n"
>@@ -317,6 +320,8 @@ parse_command_line(int argc, char* argv[], options& opts)
> opts.drop_private_types = true;
> else if (!strcmp(argv[i], "--drop-undefined-syms"))
> opts.drop_undefined_syms = true;
>+ else if (!strcmp(argv[i], "--merge-translation-units"))
>+ opts.merge_translation_units = true;
> else if (!strcmp(argv[i], "--named-type-ids"))
> opts.named_type_ids = true;
> else if (!strcmp(argv[i], "--no-linux-kernel-mode"))
>@@ -806,6 +811,7 @@ main(int argc, char* argv[])
> opts.linux_kernel_mode);
> read_context& ctxt = *c;
> set_drop_undefined_syms(ctxt, opts.drop_undefined_syms);
>+ set_merge_translation_units(ctxt, opts.merge_translation_units);
> set_show_stats(ctxt, opts.show_stats);
> set_suppressions(ctxt, opts);
> abigail::dwarf_reader::set_do_log(ctxt, opts.do_log);
>--
>2.18.2
>
prev parent reply other threads:[~2020-05-07 12:27 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-21 12:28 More simplifications of abi XML file Mark J. Wielaard
2020-04-21 12:28 ` [PATCH 1/4] Add named-types-ids to use name ids after the type name instead of numbers Mark J. Wielaard
2020-04-23 17:48 ` Dodji Seketeli
2020-04-24 12:57 ` Matthias Maennich
2020-04-24 15:26 ` Mark Wielaard
2020-04-24 17:11 ` Matthias Maennich
2020-04-24 14:26 ` Mark Wielaard
2020-04-21 12:28 ` [PATCH 2/4] Add no-parameter-names to drop function parameter names Mark J. Wielaard
2020-04-24 13:35 ` Dodji Seketeli
2020-04-21 12:28 ` [PATCH 3/4] Add --no-elf-needed option to drop DT_NEEDED list from corpus Mark J. Wielaard
2020-04-24 14:17 ` Dodji Seketeli
2020-04-21 12:28 ` [PATCH 4/4] Add --merge-translation-units option Mark J. Wielaard
2020-05-07 12:27 ` Matthias Maennich [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200507122701.GF80772@google.com \
--to=maennich@google.com \
--cc=gprocida@google.com \
--cc=libabigail@sourceware.org \
--cc=mark@klomp.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).