From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x343.google.com (mail-wm1-x343.google.com [IPv6:2a00:1450:4864:20::343]) by sourceware.org (Postfix) with ESMTPS id C6443386F46B for ; Thu, 7 May 2020 12:27:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C6443386F46B Received: by mail-wm1-x343.google.com with SMTP id y24so6616006wma.4 for ; Thu, 07 May 2020 05:27:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=xIcjomB3fKeqi0XZRnzTv2MUxoEW/YBdu+D/LwRIiJM=; b=kXhgS7u35txXPDci4vtwGnJtbijlYA85cNaV8xkKUs7CSi9Tu5iFzUgJxF8pGVW0f/ NjGZdgpdu7oI8aVg+sNp3AbAM7aqaU5cyZil/6KaerSJxKM9NJtrI27u0ldSHvnW8oPG kVZbRXnScWmW65640vzxab4PWLySJkc0+gdyW/j5tfXC5hISj1Ccmlum12WoFbHH4WQq EulY3fkZJVurVKWVIjant0s81WxRU5HZ6KFOnzRKMOdPvuo4jL6kewsH9lqlKJ9MY5/+ aJWaAasZtVLCRTtqp/AJKP9/F8QZG7UW7xpoaYYFtx1tF0lC2um3mhmGSJnfVf57wsx6 /dmw== X-Gm-Message-State: AGi0PuZm4OHMA/vVj5CpoPKH8dqADEh9V6MB3foFP6eXTSCd5xS+Y4Gj ZhSs62AIiljJ7VgEeRgi5x1K9L6gAh0o/g== X-Google-Smtp-Source: APiQypJWhW0IxKDukqaQ5Lq/PtRX2r6YAnQc0q9aRAEM/F5Mvkpeh1BkrsHA6ecDY8Tl4iWGSObWbg== X-Received: by 2002:a05:600c:2dda:: with SMTP id e26mr751642wmh.42.1588854423201; Thu, 07 May 2020 05:27:03 -0700 (PDT) Received: from google.com ([2a00:79e0:d:210:e8f7:125b:61e9:733d]) by smtp.gmail.com with ESMTPSA id c83sm8131318wmd.23.2020.05.07.05.27.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 May 2020 05:27:02 -0700 (PDT) Date: Thu, 7 May 2020 14:27:01 +0200 From: Matthias Maennich To: "Mark J. Wielaard" Cc: libabigail@sourceware.org, Giuliano Procida Subject: Re: [PATCH 4/4] Add --merge-translation-units option. Message-ID: <20200507122701.GF80772@google.com> References: <20200421122821.13769-1-mark@klomp.org> <20200421122821.13769-5-mark@klomp.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <20200421122821.13769-5-mark@klomp.org> X-Spam-Status: No, score=-33.6 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libabigail mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 May 2020 12:27:15 -0000 Hi Mark! On Tue, Apr 21, 2020 at 02:28:21PM +0200, Mark J. Wielaard wrote: >From: Mark Wielaard > >It is not always necessary to know in which translation unit a type, >variable or function was defined first. Provide an option to simply >merge all translation units for the same language (and address size). >This also reduces the size of the XML representation of the corpus by >~10% on a simple C only library. > >Currently only implemented for the DWARF reader and abidw tool. > > * doc/manuals/abidw.rst: Document --merge-translation-units. > * include/abg-dwarf-reader.h (set_merge_translation_units): > New function. > * src/abg-dwarf-reader.cc (read_context): Add > merge_translation_units_ bool, merge_translation_units and > merge_translation_units methods. > (set_merge_translation_units): New function. > (read_context::build_translation_unit_and_add_to_ir): Check > merge_translation_units to see how to find an existing > (compatible) translation_unit to use. Drop path and > compilation_dir when merging translation_units. > * src/abg-reader.cc (read_context): Add m_merge_translation_units > bool, merge_translation_units and merge_translation_units methods. > * tools/abidw.cc (options): Add merge_translation_units bool. > (display_usage): Describe --merge-translation-units. > (parse_command_line): Parse --merge-translation-units. > (main): Call set_merge_translation_units. > I tested that change with Linux Kernel binaries and I could not reach the same gains (as in size reductions). 100K chopped off of 16M. It simply does not have the same characteristics as elfutils(?) I suppose. Nevertheless, the patch makes sense and is a good addition. Please note that due to commit 246ca2004930 ("corpus/writer: sort emitted translation units by path name"), it needs to be reworked as the empty path name is now invalid. In my tests I used an artificial filename created out of language and address size (e.g. 'LANG_C89_64bit'). That also simplifies the code much as you can do the lookup by name in any case. Cheers, Matthias >Signed-off-by: Mark Wielaard >--- > doc/manuals/abidw.rst | 7 ++++ > include/abg-dwarf-reader.h | 4 ++ > src/abg-dwarf-reader.cc | 85 +++++++++++++++++++++++++++++++------- > src/abg-reader.cc | 20 ++++++++- > tools/abidw.cc | 6 +++ > 5 files changed, 105 insertions(+), 17 deletions(-) > >diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst >index 7ae44737..256c4ad6 100644 >--- a/doc/manuals/abidw.rst >+++ b/doc/manuals/abidw.rst >@@ -158,6 +158,13 @@ Options > representation build by Libabigail to represent the ABI and will > not end up in the abi XML file. > >+ * ``--merge-translation-units`` >+ >+ With this option translation units for the same language (and >+ address size) will be merged together as if the functions, >+ variables and types were all defined together. Note that this >+ also drops the compilation paths used. >+ > * ``--no-linux-kernel-mode`` > > Without this option, if abipkgiff detects that the binaries it is >diff --git a/include/abg-dwarf-reader.h b/include/abg-dwarf-reader.h >index d0329aed..430cc222 100644 >--- a/include/abg-dwarf-reader.h >+++ b/include/abg-dwarf-reader.h >@@ -192,6 +192,10 @@ void > set_drop_undefined_syms(read_context& ctxt, > bool f); > >+void >+set_merge_translation_units(read_context& ctxt, >+ bool f); >+ > void > set_do_log(read_context& ctxt, bool f); > >diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc >index 1c0d6ea0..f7cbe6a5 100644 >--- a/src/abg-dwarf-reader.cc >+++ b/src/abg-dwarf-reader.cc >@@ -3260,6 +3260,7 @@ public: > corpus::exported_decls_builder* exported_decls_builder_; > options_type options_; > bool drop_undefined_syms_; >+ bool merge_translation_units_; > read_context(); > > public: >@@ -3429,6 +3430,7 @@ public: > options_.load_in_linux_kernel_mode = linux_kernel_mode; > options_.load_all_types = load_all_types; > drop_undefined_syms_ = false; >+ merge_translation_units_ = false; > load_in_linux_kernel_mode(linux_kernel_mode); > } > >@@ -3537,6 +3539,22 @@ public: > drop_undefined_syms(bool f) > {drop_undefined_syms_ = f;} > >+ /// Setter for the flag that tells us if we are merging translation >+ /// units. >+ /// >+ /// @param f the new value of the flag. >+ void >+ merge_translation_units(bool f) >+ {merge_translation_units_ = f;} >+ >+ /// Getter for the flag that tells us if we are merging translation >+ /// units. >+ /// >+ /// @return true iff we are merging translation units. >+ bool >+ merge_translation_units() const >+ {return merge_translation_units_;} >+ > /// Getter of the suppression specifications to be used during > /// ELF/DWARF parsing. > /// >@@ -9487,6 +9505,17 @@ void > set_drop_undefined_syms(read_context& ctxt, bool f) > {ctxt.drop_undefined_syms(f);} > >+/// Setter of the "merge_translation_units" flag. >+/// >+/// This flag tells if we should merge translation units. >+/// >+/// @param ctxt the read context to consider for this flag. >+/// >+/// @param f the value of the flag. >+void >+set_merge_translation_units(read_context& ctxt, bool f) >+{ctxt.merge_translation_units(f);} >+ > /// Setter of the "do_log" flag. > /// > /// This flag tells if we should emit verbose logs for various >@@ -14267,28 +14296,52 @@ build_translation_unit_and_add_to_ir(read_context& ctxt, > string path = die_string_attribute(die, DW_AT_name); > string compilation_dir = die_string_attribute(die, DW_AT_comp_dir); > >- // See if the same translation unit exits already in the current >- // corpus. Sometimes, the same translation unit can be present >- // several times in the same debug info. The content of the >- // different instances of the translation unit are different. So to >- // represent that, we are going to re-use the same translation >- // unit. That is, it's going to be the union of all the translation >- // units of the same path. >- { >- string abs_path = compilation_dir + "/" + path; >- result = ctxt.current_corpus()->find_translation_unit(abs_path); >- } >+ uint64_t lang = 0; >+ die_unsigned_constant_attribute(die, DW_AT_language, lang); >+ translation_unit::language language = dwarf_language_to_tu_language(lang); >+ >+ corpus_sptr corp = ctxt.current_corpus(); >+ >+ if (ctxt.merge_translation_units()) >+ { >+ // See if there is already a translation for the address_size >+ // and language. If so, just reuse that one. >+ for (translation_units::const_iterator tu = >+ corp->get_translation_units().begin(); >+ tu != corp->get_translation_units().end(); >+ ++tu) >+ { >+ if ((*tu)->get_address_size() == address_size >+ && (*tu)->get_language() == language) >+ { >+ result = (*tu); >+ break; >+ } >+ } >+ } >+ else >+ { >+ // See if the same translation unit exits already in the current >+ // corpus. Sometimes, the same translation unit can be present >+ // several times in the same debug info. The content of the >+ // different instances of the translation unit are different. So to >+ // represent that, we are going to re-use the same translation >+ // unit. That is, it's going to be the union of all the translation >+ // units of the same path. >+ string abs_path = compilation_dir + "/" + path; >+ result = corp->find_translation_unit(abs_path); >+ } > > if (!result) > { > result.reset(new translation_unit(ctxt.env(), >- path, >+ (ctxt.merge_translation_units() >+ ? "" : path), > address_size)); >- result->set_compilation_dir_path(compilation_dir); >+ if (!ctxt.merge_translation_units()) >+ result->set_compilation_dir_path(compilation_dir); > ctxt.current_corpus()->add(result); >- uint64_t l = 0; >- die_unsigned_constant_attribute(die, DW_AT_language, l); >- result->set_language(dwarf_language_to_tu_language(l)); >+ result->set_language(language); > } > > ctxt.cur_transl_unit(result); >diff --git a/src/abg-reader.cc b/src/abg-reader.cc >index dcaa27e1..7f9374c5 100644 >--- a/src/abg-reader.cc >+++ b/src/abg-reader.cc >@@ -132,6 +132,7 @@ private: > suppr::suppressions_type m_supprs; > bool m_tracking_non_reachable_types; > bool m_drop_undefined_syms; >+ bool m_merge_translation_units; > > read_context(); > >@@ -143,7 +144,8 @@ public: > m_corp_node(), > m_exported_decls_builder(), > m_tracking_non_reachable_types(), >- m_drop_undefined_syms() >+ m_drop_undefined_syms(), >+ m_merge_translation_units() > {} > > /// Getter for the flag that tells us if we are tracking types that >@@ -181,6 +183,22 @@ public: > drop_undefined_syms(bool f) > {m_drop_undefined_syms = f;} > >+ /// Getter for the flag that tells us if we are merging translation >+ /// units. >+ /// >+ /// @return true iff we are merging translation units. >+ bool >+ merge_translation_units() const >+ {return m_merge_translation_units;} >+ >+ /// Setter for the flag that tells us if we are merging translation >+ /// units. >+ /// >+ /// @param f the new value of the flag. >+ void >+ merge_translation_units(bool f) >+ {m_merge_translation_units = f;} >+ > /// Getter of the path to the ABI file. > /// > /// @return the path to the native xml abi file. >diff --git a/tools/abidw.cc b/tools/abidw.cc >index 3f4b3f42..87248a2c 100644 >--- a/tools/abidw.cc >+++ b/tools/abidw.cc >@@ -113,6 +113,7 @@ struct options > bool do_log; > bool drop_private_types; > bool drop_undefined_syms; >+ bool merge_translation_units; > bool named_type_ids; > > options() >@@ -136,6 +137,7 @@ struct options > do_log(), > drop_private_types(false), > drop_undefined_syms(false), >+ merge_translation_units(false), > named_type_ids(false) > {} > >@@ -170,6 +172,7 @@ display_usage(const string& prog_name, ostream& out) > << " --short-locs only print filenames rather than paths\n" > << " --drop-private-types drop private types from representation\n" > << " --drop-undefined-syms drop undefined symbols from representation\n" >+ << " --merge-translation-units merge translation units for same language\n" > << " --named_type-ids use id attributes based on type names in XML file\n" > << " --no-comp-dir-path do not show compilation path information\n" > << " --no-elf-needed do not show the DT_NEEDED information\n" >@@ -317,6 +320,8 @@ parse_command_line(int argc, char* argv[], options& opts) > opts.drop_private_types = true; > else if (!strcmp(argv[i], "--drop-undefined-syms")) > opts.drop_undefined_syms = true; >+ else if (!strcmp(argv[i], "--merge-translation-units")) >+ opts.merge_translation_units = true; > else if (!strcmp(argv[i], "--named-type-ids")) > opts.named_type_ids = true; > else if (!strcmp(argv[i], "--no-linux-kernel-mode")) >@@ -806,6 +811,7 @@ main(int argc, char* argv[]) > opts.linux_kernel_mode); > read_context& ctxt = *c; > set_drop_undefined_syms(ctxt, opts.drop_undefined_syms); >+ set_merge_translation_units(ctxt, opts.merge_translation_units); > set_show_stats(ctxt, opts.show_stats); > set_suppressions(ctxt, opts); > abigail::dwarf_reader::set_do_log(ctxt, opts.do_log); >-- >2.18.2 >