From: Dodji Seketeli
To: Matthias Maennich
Cc: libabigail@sourceware.org, gprocida@google.com, kernel-team@android.com
Subject: Re: [PATCH 05/20] Refactor ELF symbol table reading by adding a new symtab reader
Date: Fri, 12 Mar 2021 12:18:14 +0100
Message-ID: <87lfaszkll.fsf@seketeli.org>
In-Reply-To: <20210127125853.886677-6-maennich@google.com>

Hello,

Matthias Maennich wrote:

> Based on existing functionality, implement the reading of ELF symbol
> tables as a separate component. This reduces the complexity of
> abg-dwarf-reader's read_context by separating and delegating the
> functionality. This also allows dedicated testing.
>
> The new namespace symtab_reader contains a couple of new components that
> work loosely coupled together. Together they allow for a consistent view
> on a symbol table. With filter criteria those views can be restricted,
> iterated and consistent lookup maps can be built on top of them. While
> this implementation tries to address some shortcomings of the previous
> model, it still provides the high level interfaces to the symbol table
> contents through sorted iterating and name/address mapped access.
>
> symtab_reader::symtab
>
> While the other classes in the same namespace are merely helpers, this
> is the main implementation of symtab reading and storage.
> Symtab objects are factory created to ensure a consistent construction
> and valid invariants. Thus a symtab will be loaded by either passing
> an ELF handle (when reading from binary) or by passing a set of
> function/variable symbol maps (when reading from XML).
> When constructed they are considered const and are not writable
> anymore. As such, all public methods are const.
>
> The load reuses the existing implementation for loading symtab
> sections, but since the new implementation does not distinguish
> between functions and variables, the code could be simplified. The
> support for ppc64 function entry addresses has been deferred to a
> later commit.
>
> Linux Kernel symbol tables are now directly loaded by name when
> encountering symbols prefixed with the __ksymtab_ as per convention.

Whoah.
No more messing with __ksymtab sections then.  How cool is that! :-)

> This has been tricky in the past due to various different binary
> layouts (relocations, position relative relocations, symbol
> namespaces, CFI indirections, differences between vmlinux and kernel
> modules). Thus the new implementation is much simpler and is less
> vulnerable to future ksymtab changes.

Let's just hope the "__ksymtab_" prefix convention stays that way, I
guess ;-)

> As we are also not looking up the Kernel symbols by addresses, we
> could resolve shortcomings with symbol aliasing: Previously a symbol
> and its alias were indistinguishable as they are having the same
> symbol address. We could not identify the one that is actually
> exported via ksymtab.

I see.

> One major architectural difference of this implementation is that we
> do not early discard suppressed symbols. While we keep them out of the
> vector of exported symbols, we still make them available for lookup.
> That helps addressing issues when looking up a symbol by address (e.g.
> from the ksymtab read implementation) that is suppressed. That would
> fail in the existing implementation.
>
> Still, we intend to only instantiate each symbol once and pass around
> shared_ptr instances to refer to it from the vector as well as from
> the lookup maps.
>
> For reading, there are two access paths that serve the existing
> patterns:
> 1) lookup_symbol: either via a name or an address
> 2) filtered iteration with begin(), end()
>
> The former is used for direct access with a clue in hand (like a name
> or an address), the latter is used for iteration (e.g. when emitting
> the XML).
>
> symtab_reader::symtab_iterator
>
> The symtab_iterator is an STL compatible iterator that is returned
> from begin() and end() of the symtab. It allows usual forward iterator
> operations and can optionally take a filter predicate to skip non
> matching elements.
>
> symtab_reader::symtab_filter
>
> The symtab_filter serves as a predicate for the symtab_iterator by
> providing a matches(const elf_symbol_sptr&) function. The predicate
> is built by ANDing together several conditions on attributes a symbol
> can have. The filter conditions are implemented in terms of
> std::optional members to allow a tristate: "needs to have the
> condition set", "must not have it set" and "don't care".
>
> symtab_reader::filtered_symtab
>
> The filtered_symtab is a convenience zero cost abstraction that allows
> prepopulating the symtab_filter (call it a capture) such that begin()
> and end() are now accessible without the need to pass the filter
> again. Argumentless begin() and end() are a requirement for range-for
> loops and other STL based algorithms.

Neat design.  I like it.  Thanks for making this so "to the point" and
yet with a nice level of abstraction.

Now I've just picked some superficial nits below.

>
>   * src/abg-symtab-reader.h (symtab_filter): New class.
>   (symtab_iterator): Likewise.
>   (symtab): Likewise.
>   (filtered_symtab): Likewise.
>   * src/abg-symtab-reader.cc (symtab_filter::matches): New.
>   (symtab::make_filter): Likewise.
>   (symtab::lookup_symbol): Likewise.
>   (symbol_sort): Likewise.
>   (symtab::load): Likewise.
>   (symtab::load_): Likewise.
>
> Reviewed-by: Giuliano Procida
> Signed-off-by: Matthias Maennich
> ---
>  src/abg-symtab-reader.cc | 347 +++++++++++++++++++++++++++++++++++++++
>  src/abg-symtab-reader.h  | 277 ++++++++++++++++++++++++++++++-
>  2 files changed, 623 insertions(+), 1 deletion(-)
>
> diff --git a/src/abg-symtab-reader.cc b/src/abg-symtab-reader.cc
> index a6c8ca0ef548..4576be2a0b42 100644
> --- a/src/abg-symtab-reader.cc
> +++ b/src/abg-symtab-reader.cc
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>  // -*- Mode: C++ -*-
>  //
> +// Copyright (C) 2013-2020 Red Hat, Inc.
>  // Copyright (C) 2020 Google, Inc.
>  //
>  // Author: Matthias Maennich
> @@ -9,7 +10,20 @@
>  ///
>  /// This contains the definition of the symtab reader
>
> +#include
> +#include
> +#include
> +
> +#include "abg-elf-helpers.h"
> +#include "abg-fwd.h"
> +#include "abg-internal.h"
> +#include "abg-tools-utils.h"
> +
> +// Though this is an internal header, we need to export the symbols to be able
> +// to test this code.  TODO: find a way to export symbols just for unit tests.
> +ABG_BEGIN_EXPORT_DECLARATIONS
>  #include "abg-symtab-reader.h"
> +ABG_END_EXPORT_DECLARATIONS

Ah, right.  Now that we have unit tests this may become a real thing to
care about.

I think a way forward at some point might be to use ELF symbol
versioning (or something similar) to assign a dedicated ELF version
(e.g., _ABG_INTERNAL_) to all these symbols that are needed just for
unit testing.  Then a linker script would hide all these symbols in the
final shared library.

Another way to go would be to use unit testing smartly, rather than
like a golden hammer that treats everything as a nail.  Contrary to
popular belief, I am unimpressed by the parroting of the fable that
unit testing would be the be-all and end-all of software engineering.
I think that reductionist view might be more harmful than people might
think.  I tend to lean towards the "it depends" point of view.

If we have a good enough API, I think unit testing on an envelope that
is smaller than the exposed API is often counterproductive in practice.
If we /need/ to do that, then it probably means that the API might need
to be more granular, more "test-able", so to speak.  So we might
address/fix the problem at that level instead.

In any case, I guess this can be saved as an undertaking for another
day :-)

[...]

> +/// Construct a symtab object and instantiate from an ELF handle. Also pass
> +/// in an ir::environment handle to interact with the context we are living
> +/// in. If specified, the symbol_predicate will be respected when creating
> +/// the full vector of symbols.
> +symtab_ptr
> +symtab::load(Elf*             elf_handle,
> +             ir::environment* env,
> +             symbol_predicate is_suppressed)

This function lacks descriptive comments for its parameters and return
value.  I agree the meaning of the parameters is obvious in the overall
context of the code, but we need these comments to have a complete API
doc generated with all the descriptions :-(

> +{
> +  ABG_ASSERT(elf_handle);
> +  ABG_ASSERT(env);
> +
> +  symtab_ptr result(new symtab);
> +  if (!result->load_(elf_handle, env, is_suppressed))
> +    return {};
> +
> +  return result;
> +}
> +
> +/// Construct a symtab object from existing name->symbol lookup maps.
> +/// They were possibly read from a different representation (XML maybe).
> +symtab_ptr
> +symtab::load(string_elf_symbols_map_sptr function_symbol_map,
> +             string_elf_symbols_map_sptr variables_symbol_map)
>

Likewise.

> +{
> +  symtab_ptr result(new symtab);
> +  if (!result->load_(function_symbol_map, variables_symbol_map))
> +    return {};
> +
> +  return result;
> +}
> +
> +symtab::symtab() : is_kernel_binary_(false), has_ksymtab_entries_(false) {}

Just for the sake of consistency with the rest of the code, I'd say this
might be written as:

    symtab::symtab()
      : is_kernel_binary_(false), has_ksymtab_entries_(false)
    {}

> +
> +/// Load the symtab representation from an Elf binary presented to us by an
> +/// Elf* handle.
> +///
> +/// This method iterates over the entries of .symtab and collects all
> +/// interesting symbols (functions and variables).
> +///
> +/// In case of a Linux Kernel binary, it also collects information about the
> +/// symbols exported via EXPORT_SYMBOL in the Kernel that would then end up
> +/// having a corresponding __ksymtab entry.
> +///
> +/// Symbols that are suppressed will be omitted from the symbols_ vector, but
> +/// still be discoverable through the name->symbol and addr->symbol lookup
> +/// maps.
> +bool
> +symtab::load_(Elf*             elf_handle,
> +              ir::environment* env,
> +              symbol_predicate is_suppressed)
> +{

This function lacks descriptive comments for its parameters and return
value.

[...]

> +      const elf_symbol_sptr& symbol_sptr = elf_symbol::create(
> +        env, i, sym->st_size, name,
> +        elf_helpers::stt_to_elf_symbol_type(GELF_ST_TYPE(sym->st_info)),
> +        elf_helpers::stb_to_elf_symbol_binding(GELF_ST_BIND(sym->st_info)),
> +        sym_is_defined, sym_is_common, ver,
> +        elf_helpers::stv_to_elf_symbol_visibility(
> +          GELF_ST_VISIBILITY(sym->st_other)),
> +        false); // TODO: is_linux_strings_cstr

Do we still need the is_linux_strings_cstr parameter?  I'd say no, as we
don't mess with __ksymtab* sections anymore.  So maybe the TODO comment
should be more explicit in saying that we need to get rid of it.

Besides, I'd say that to comply with the rest of the code, the "new
line" should come before the opening parenthesis of the function call,
e.g.:

    const elf_symbol_sptr& symbol_sptr = elf_symbol::create
      (env, i, sym->st_size, name,
       elf_helpers::stt_to_elf_symbol_type(GELF_ST_TYPE(sym->st_info)),
       elf_helpers::stb_to_elf_symbol_binding(GELF_ST_BIND(sym->st_info)),
       sym_is_defined, sym_is_common, ver,
       elf_helpers::stv_to_elf_symbol_visibility
       (GELF_ST_VISIBILITY(sym->st_other)),
       /*is_linux_strings_cstr=*/false); // TODO: The
                                         // is_linux_strings_cstr
                                         // parameter should be removed
                                         // as it's not needed anymore

[...]

> +
> +/// Load the symtab representation from a function/variable lookup map pair.
> +///
> +/// This method assumes the lookup maps are correct and sets up the data
> +/// vector as well as the name->symbol lookup map. The addr->symbol lookup
> +/// map cannot be set up in this case.
> +bool
> +symtab::load_(string_elf_symbols_map_sptr function_symbol_map,
> +              string_elf_symbols_map_sptr variables_symbol_map)
> +
> +{

This function lacks descriptive comments for its parameters and return
value.
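To make the request concrete, here is a minimal sketch of the comment
shape meant here, using Doxygen's @param and @return tags.  The
function and its names (count_names, names, skip_empty) are
hypothetical and exist only to carry the comment; they are not part of
the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

/// Count the names in a list of symbol names.
///
/// Hypothetical helper, used only to illustrate the comment layout.
///
/// @param names the symbol names to inspect.
///
/// @param skip_empty if true, empty names are not counted.
///
/// @return the number of names that were counted.
std::size_t
count_names(const std::vector<std::string>& names, bool skip_empty = true)
{
  std::size_t result = 0;
  for (const auto& name : names)
    if (!skip_empty || !name.empty())
      ++result;
  return result;
}
```

Each parameter gets its own @param entry and the return value its own
@return entry, so the generated API doc has a description for every
piece of the signature.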
> +  if (function_symbol_map)
> +    for (const auto& symbol_map_entry : *function_symbol_map)
> +      {
> +        symbols_.insert(symbols_.end(), symbol_map_entry.second.begin(),
> +                        symbol_map_entry.second.end());
> +        ABG_ASSERT(name_symbol_map_.insert(symbol_map_entry).second);
> +      }
> +
> +  if (variables_symbol_map)
> +    for (const auto& symbol_map_entry : *variables_symbol_map)
> +      {
> +        symbols_.insert(symbols_.end(), symbol_map_entry.second.begin(),
> +                        symbol_map_entry.second.end());
> +        ABG_ASSERT(name_symbol_map_.insert(symbol_map_entry).second);
> +      }
> +
> +  // sort the symbols for deterministic output
> +  std::sort(symbols_.begin(), symbols_.end(), symbol_sort);
> +
> +  return true;
> +}
> +
>  } // end namespace symtab_reader
>  } // end namespace abigail
> diff --git a/src/abg-symtab-reader.h b/src/abg-symtab-reader.h
> index a929166b83ef..4c5e3b85c22d 100644
> --- a/src/abg-symtab-reader.h
> +++ b/src/abg-symtab-reader.h

[...]

> +/// The symtab filter is the object passed to the symtab object in order to
> +/// iterate over the symbols in the symtab while applying filters.
> +///
> +/// The general idea is that it consists of a set of optionally enforced flags,
> +/// such as 'functions' or 'variables'. If not set, those are not filtered for,
> +/// neither inclusive nor exclusive. If set they are all ANDed together.
> +class symtab_filter
> +{
> +public:
> +  // Default constructor disabling all features.
> +  symtab_filter() {}
> +
> +  bool
> +  matches(const elf_symbol& symbol) const;
> +
> +  void
> +  set_functions(bool new_value = true)

This function lacks a comment for its parameter.

> +  { functions_ = new_value; };

To comply with the rest of the code, one-liner function implementations
don't have any leading/trailing space, e.g., it should be:

    void
    set_functions(bool new_value = true)
    {functions_ = new_value;}

> +
> +  void
> +  set_variables(bool new_value = true)
> +  { variables_ = new_value; };

Likewise.
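As an aside on the tri-state idea described in the symtab_filter
comment: the following is a minimal sketch of how optional flags can be
ANDed together.  It assumes C++17 std::optional rather than the
abg_compat::optional used in the patch, and the symbol and filter types
are hypothetical stand-ins, not the patch's classes.

```cpp
#include <optional>

// Hypothetical stand-in for elf_symbol; only two attributes are modelled.
struct symbol
{
  bool is_function;
  bool is_public;
};

// Each condition is unset ("don't care"), true ("must have it") or
// false ("must not have it").  All conditions that are set are ANDed.
struct filter
{
  std::optional<bool> functions;
  std::optional<bool> public_symbols;

  bool
  matches(const symbol& s) const
  {
    // A set condition that disagrees with the symbol rejects it.
    if (functions && *functions != s.is_function)
      return false;
    if (public_symbols && *public_symbols != s.is_public)
      return false;
    // Unset conditions never reject anything.
    return true;
  }
};
```

A default-constructed filter therefore matches every symbol, and each
condition that gets set narrows the match further.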
> +
> +  void
> +  set_public_symbols(bool new_value = true)
> +  { public_symbols_ = new_value; };

Likewise.

> +
> +  void
> +  set_undefined_symbols(bool new_value = true)
> +  { undefined_symbols_ = new_value; };

Likewise.

> +
> +  void
> +  set_kernel_symbols(bool new_value = true)
> +  { kernel_symbols_ = new_value; };

Likewise.

> +
> +private:
> +  // The symbol is a function (FUNC)
> +  abg_compat::optional functions_;
> +
> +  // The symbol is a variables (OBJECT)
> +  abg_compat::optional variables_;
> +
> +  // The symbol is publicly accessible (global/weak with default/protected
> +  // visibility)
> +  abg_compat::optional public_symbols_;
> +
> +  // The symbols is not defined (declared)
> +  abg_compat::optional undefined_symbols_;
> +
> +  // The symbol is listed in the ksymtab (for Linux Kernel binaries).
> +  abg_compat::optional kernel_symbols_;
> +};
> +
> +/// Base iterator for our custom iterator based on whatever the const_iterator
> +/// is for a vector of symbols.
> +/// As of writing this, std::vector::const_iterator.
> +typedef elf_symbols::const_iterator base_iterator;
> +
> +/// An iterator to walk a vector of elf_symbols filtered by symtab_filter.
> +///
> +/// The implementation inherits all properties from the vector's
> +/// const_iterator, but intercepts where necessary to allow effective
> +/// filtering. This makes it a STL compatible iterator for general purpose
> +/// usage.
> +class symtab_iterator : public base_iterator
> +{
> +public:
> +  typedef base_iterator::value_type      value_type;
> +  typedef base_iterator::reference       reference;
> +  typedef base_iterator::pointer         pointer;
> +  typedef base_iterator::difference_type difference_type;
> +  typedef std::forward_iterator_tag      iterator_category;
> +
> +  /// Construct the iterator based on a pair of underlying iterators and a
> +  /// symtab_filter object. Immediately fast forward to the next element that
> +  /// matches the criteria (if any).
> +  symtab_iterator(base_iterator        begin,
> +                  base_iterator        end,
> +                  const symtab_filter& filter = symtab_filter())
> +    : base_iterator(begin), end_(end), filter_(filter)

This function lacks a description for its parameters.

> +  { skip_to_next(); }

There should be no trailing/leading space.

> +
> +  /// Pre-increment operator to advance to the next matching element.
> +  symtab_iterator&
> +  operator++()
> +  {

This function lacks a description for its return value.

> +    base_iterator::operator++();
> +    skip_to_next();
> +    return *this;
> +  }
> +
> +  /// Post-increment operator to advance to the next matching element.
> +  symtab_iterator
> +  operator++(int)
> +  {

This function lacks a description for its return value.

> +    symtab_iterator result(*this);
> +    ++(*this);
> +    return result;
> +  }

[...]

> +/// symtab is the actual data container of the symtab_reader implementation.
> +///
> +/// The symtab is instantiated either via an Elf handle (from binary) or from a
> +/// set of existing symbol maps (usually when instantiated from XML). It will
> +/// then discover the symtab, possibly the ksymtab (for Linux Kernel binaries)
> +/// and setup the data containers and lookup maps for later perusal.
> +///
> +/// The symtab is supposed to be used in a const context as all information is
> +/// already computed at construction time. Symbols are stored sorted to allow
> +/// deterministic reading of the entries.
> +///
> +/// An example use of the symtab class is
> +///
> +/// const auto symtab    = symtab::load(elf_handle, env);
> +/// symtab_filter filter = symtab->make_filter();
> +/// filter.set_public_symbols();
> +/// filter.set_functions();
> +///
> +/// for (const auto& symbol : filtered_symtab(*symtab, filter))
> +///   {
> +///     std::cout << symbol->get_name() << "\n";
> +///   }
> +///

I find this a great API design.  Simple enough to understand, even for
a simple mind like mine, and not over-engineered.  Thank you for that.
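For readers who want to see the skip-to-next iteration and the
range-for proxy in isolation, here is a minimal, self-contained sketch
of the pattern.  All names in it (names, predicate, filtering_iterator,
filtered_names, collect) are hypothetical, and it wraps the vector
iterator by composition rather than inheriting from it as the patch
does, so take it as an illustration of the idea and not as the patch's
implementation.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified types; not the classes from the patch.
typedef std::vector<std::string> names;
typedef std::function<bool(const std::string&)> predicate;

// Forward iterator that skips the elements rejected by the predicate.
class filtering_iterator
{
  names::const_iterator cur_;
  names::const_iterator end_;
  predicate             keep_;

  // Advance until an accepted element or the end is reached.
  void
  skip_to_next()
  {
    while (cur_ != end_ && !keep_(*cur_))
      ++cur_;
  }

public:
  filtering_iterator(names::const_iterator cur,
                     names::const_iterator end,
                     predicate             keep)
    : cur_(cur), end_(end), keep_(keep)
  {skip_to_next();}

  const std::string&
  operator*() const
  {return *cur_;}

  filtering_iterator&
  operator++()
  {
    ++cur_;
    skip_to_next();
    return *this;
  }

  bool
  operator!=(const filtering_iterator& other) const
  {return cur_ != other.cur_;}
};

// Proxy capturing the container and the predicate, so that begin() and
// end() take no argument, as range-for requires.
class filtered_names
{
  const names& names_;
  predicate    keep_;

public:
  filtered_names(const names& n, predicate keep)
    : names_(n), keep_(keep)
  {}

  filtering_iterator
  begin() const
  {return filtering_iterator(names_.begin(), names_.end(), keep_);}

  filtering_iterator
  end() const
  {return filtering_iterator(names_.end(), names_.end(), keep_);}
};

// Collect the names accepted by the predicate, in container order.
names
collect(const names& all, predicate keep)
{
  names result;
  for (const auto& n : filtered_names(all, keep))
    result.push_back(n);
  return result;
}
```

The constructor fast-forwards to the first accepted element, so even a
freshly built begin iterator never points at a rejected one, and the
end iterator compares equal once everything left has been skipped.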
> +/// This uses the filtered_symtab proxy object to capture the filter.
> +class symtab
> +{
> +public:
> +  typedef std::function symbol_predicate;
> +
> +  /// Indicate whether any (kernel) symbols have been seen at construction.
> +  ///
> +  /// @return true if there are symbols detected earlier.
> +  bool
> +  has_symbols() const
> +  { return is_kernel_binary_ ? has_ksymtab_entries_ : !symbols_.empty(); }

Leading/trailing space.

> +
> +  symtab_filter
> +  make_filter() const;
> +
> +  /// The (only) iterator type we offer is a const_iterator implemented by the
> +  /// symtab_iterator.
> +  typedef symtab_iterator const_iterator;
> +
> +  /// Obtain an iterator to the beginning of the symtab according to the filter
> +  /// criteria. Whenever this iterator advances, it skips elements that do not
> +  /// match the filter criteria.
> +  ///
> +  /// @param filter the symtab_filter to match symbols against
> +  ///
> +  /// @return a filtering const_iterator of the underlying type
> +  const_iterator
> +  begin(const symtab_filter& filter) const
> +  { return symtab_iterator(symbols_.begin(), symbols_.end(), filter); }

Likewise.

> +  /// Obtain an iterator to the end of the symtab.
> +  ///
> +  /// @return an end iterator
> +  const_iterator
> +  end() const
> +  { return symtab_iterator(symbols_.end(), symbols_.end()); }

Likewise.

[...]

> +/// Helper class to allow range-for loops on symtabs for C++11 and later code.
> +/// It serves as a proxy for the symtab iterator and provides a begin() method
> +/// without arguments, as required for range-for loops (and possibly other
> +/// iterator based transformations).
> +///
> +/// Example usage:
> +///
> +///   for (const auto& symbol : filtered_symtab(tab, filter))
> +///     {
> +///       std::cout << symbol->get_name() << "\n";
> +///     }
> +///
> +class filtered_symtab
> +{
> +  const symtab&       tab_;
> +  const symtab_filter filter_;
> +
> +public:
> +  /// Construct the proxy object keeping references to the underlying symtab
> +  /// and the filter object.
> +  filtered_symtab(const symtab& tab, const symtab_filter& filter)
> +    : tab_(tab), filter_(filter) { }

I'd put the '{}' on the next line, with no space between the two braces.

> +  /// Pass through symtab.begin(), but also pass on the filter.
> +  symtab::const_iterator
> +  begin() const
> +  { return tab_.begin(filter_); }

No trailing/leading space.

> +
> +  /// Pass through symtab.end().
> +  symtab::const_iterator
> +  end() const
> +  { return tab_.end(); }

No trailing/leading space.

> +};
> +
>  } // end namespace symtab_reader
>  } // end namespace abigail

Thank you for this gem!

Cheers,

-- 
Dodji