From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 36982 invoked by alias); 12 Sep 2019 13:44:48 -0000 Mailing-List: contact libabigail-help@sourceware.org; run by ezmlm Precedence: bulk List-Post: List-Help: List-Id: List-Subscribe: Sender: libabigail-owner@sourceware.org Received: (qmail 36971 invoked by uid 89); 12 Sep 2019 13:44:48 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Checked: by ClamAV 0.100.3 on sourceware.org X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.1 spammy= X-Spam-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on sourceware.org X-Spam-Level: X-HELO: mail.kernel.org Received: from mail.kernel.org (HELO mail.kernel.org) (198.145.29.99) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 12 Sep 2019 13:44:45 +0000 Received: from linux-8ccs (ip5f5adbf4.dynamic.kabel-deutschland.de [95.90.219.244]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 38E4020856; Thu, 12 Sep 2019 13:44:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1568295884; bh=9Ga9w7WxZfpVInJP9RCP72NkL0i89zRTyywA6cc8vYk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=KY9y2NjdyVKsZTGarBD7kzIhyekvXS/kEM4EOw8ISi4TZwEMh95vMLgacaz4XbuUI PW6XXT0ajbHhUBZohkZZ+sl/yhw34uY2bYP+/pdVKO7OsxJueTj9qgm9pa2obLV+tP bOmk8APuMLiPaXkFIpLYQJ0r3oxgWW7kmJRc1bqI= Date: Tue, 01 Jan 2019 00:00:00 -0000 From: Jessica Yu To: Matthias Maennich Cc: libabigail@sourceware.org, Dodji Seketeli , Android Kernel Team Subject: Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules Message-ID: <20190912134439.GB27845@linux-8ccs> References: <20190809154900.13751-1-jeyu@kernel.org> <20190819080508.GA189145@google.com> <20190903142337.GA17881@linux-8ccs.fritz.box> <20190912100104.GA175701@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <20190912100104.GA175701@google.com> X-OS: Linux linux-8ccs 4.12.14-lp150.12.61-default x86_64 User-Agent: Mutt/1.10.1 (2018-07-13) X-IsSubscribed: yes X-SW-Source: 2019-q3/txt/msg00036.txt.bz2 +++ Matthias Maennich [12/09/19 11:01 +0100]: >On Tue, Sep 03, 2019 at 04:23:37PM +0200, Jessica Yu wrote: >>+++ Matthias Maennich [19/08/19 09:05 +0100]: >>>Hi Jessica! >>> >>>On Fri, 09 Aug, 17:49, Jessica Yu wrote: >>>>As described in commit ad8c2531fb9, the format of the Linux kernel >>>>ksymtab changed in v4.19 to use relative references instead of absolute >>>>references. This changes the type of relocations emitted for ksymtab >>>>sections to be place-relative 32-bit relocations instead of absolute >>>>relocations. One side-effect of this is that libdwfl will not relocate >>>>the ksymtab sections due to the PC-relative relocations. This breaks >>>>load_kernel_symbol_table() for kernel modules because it only reads in >>>>zeros from the unrelocated ksymtab section and is subsequently unable to >>>>determine what exported symbols it refers to. Since a vmlinux binary is >>>>already fully linked and relocated (and therefore we can read its >>>>ksymtab section just fine), this problem is only relevant to Linux >>>>kernel modules. >>>> >>>>To work around this, we utilize the ksymtab relocation sections to >>>>determine which symbols the ksymtab entries refer to. We do this by >>>>inspecting each relocation's r_info field for the symbol table index and >>>>from there we are able to read each symbol's value and subsequently add >>>>that to the set of exported symbols. >>>> >>>>In addition, for Linux kernel modules, we can utilize relocation types >>>>to implement a new heuristic to determine the ksymtab format we have. >>>>The presence of PC-relative relocations suggest the new v4.19 format, >>>>and absolute relocation types suggest the old pre v4.19 format. >>>> >>>> * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new >>>> symbol value and shndx parameters. >>>> (elf_symbol::{get_value, get_shndx}): Declare new accessors. >>>> * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members. >>>> (elf_symbol::priv::priv): Adjust. >>>> (elf_symbol::elf_symbol): Take new value and shndx parameters. >>>> (elf_symbol::create): Likewise. >>>> (elf_symbol::{get_value, get_shndx}): Define new accessors. >>>> * src/abg-reader.cc (build_elf_symbol): Adjust. >>>> * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab) >>>> (lookup_symbol_from_gnu_hash_tab) >>>> (lookup_symbol_from_symtab): Adjust. >>>> (read_context::{ksymtab_reloc_section_, >>>> ksymtab_gpl_reloc_section_, >>>> ksymtab_strings_section_}): New data members. >>>> (read_context::read_context): Initialize ksymtab_reloc_section_, >>>> ksymtab_gpl_reloc_section_, ksymtab_strings_section_. >>>> (read_context::{find_ksymtab_reloc_section, >>>> find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section, >>>> find_any_ksymtab_reloc_section, get_ksymtab_format_module, >>>> populate_symbol_map_from_ksymtab, >>>> populate_symbol_map_from_ksymtab_reloc, >>>> is_linux_kernel_module}): New member functions. >>>> (read_context::load_kernel_symbol_table): Adjust to call either >>>> populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab >>>> format. >>>> (read_context::get_ksymtab_format): Adjust to call >>>> get_ksymtab_format_module for linux kernel modules. >>>> (read_context::lookup_elf_symbol_from_index): Adjust. >>>> (create_default_var_sym, create_default_fn_sym): Adjust. >>>> >>>>Signed-off-by: Jessica Yu >>>>--- >>>>include/abg-ir.h | 10 ++ >>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++------- >>>>src/abg-ir.cc | 38 ++++- >>>>src/abg-reader.cc | 2 + >>>>4 files changed, 388 insertions(+), 53 deletions(-) >>>> >>>>diff --git a/include/abg-ir.h b/include/abg-ir.h >>>>index 170a844..d26f472 100644 >>>>--- a/include/abg-ir.h >>>>+++ b/include/abg-ir.h >>>>@@ -796,6 +796,8 @@ private: >>>>elf_symbol(const environment* e, >>>> size_t i, >>>> size_t s, >>>>+ uint64_t v, >>>>+ uint32_t x, >>> >>>I would like to see some comments here to describe the purpose of the >>>struct's members. I know there weren't any before, but maybe that is a >>>good chance. >>> >>>Also, I am not entirely sure why these are all one-character names, but >>>it makes it hard to come from x to "section header index". Maybe we >>>should give them more meaningful names. >>> >>>> const string& n, >>>> type t, >>>> binding b, >>>>@@ -818,6 +820,8 @@ public: >>>>create(const environment* e, >>>> size_t i, >>>> size_t s, >>>>+ uint64_t v, >>>>+ uint32_t x, >>>> const string& n, >>>> type t, >>>> binding b, >>>>@@ -838,6 +842,12 @@ public: >>>>void >>>>set_index(size_t); >>>> >>>>+ uint64_t >>>>+ get_value() const; >>>>+ >>>>+ uint32_t >>>>+ get_shndx() const; >>>>+ >>>>const string& >>>>get_name() const; >>>> >>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc >>>>index 47af7e3..835bf3d 100644 >>>>--- a/src/abg-dwarf-reader.cc >>>>+++ b/src/abg-dwarf-reader.cc >>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment* env, >>>>GElf_Sym symbol; >>>>const char* sym_name_str; >>>>size_t sym_size; >>>>+ uint64_t sym_value; >>>>+ uint32_t sym_shndx; >>>>elf_symbol::type sym_type; >>>>elf_symbol::binding sym_binding; >>>>elf_symbol::visibility sym_visibility; >>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment* env, >>>> sym_visibility = >>>> stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other)); >>>> sym_size = symbol.st_size; >>>>+ sym_value = symbol.st_value; >>>>+ sym_shndx = symbol.st_shndx; >>>> elf_symbol::version ver; >>>> if (get_version_for_symbol(elf_handle, symbol_index, >>>> /*get_def_version=*/true, ver)) >>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment* env, >>>> elf_symbol::create(env, >>>> symbol_index, >>>> sym_size, >>>>+ sym_value, >>>>+ sym_shndx, >>>> sym_name_str, >>>> sym_type, >>>> sym_binding, >>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment* env, >>>> ABG_ASSERT(!ver.str().empty()); >>>> >>>> elf_symbol_sptr symbol_found = >>>>- elf_symbol::create(env, i, symbol.st_size, sym_name_str, >>>>+ elf_symbol::create(env, i, symbol.st_size, symbol.st_value, >>>>+ symbol.st_shndx, sym_name_str, >>>> sym_type, sym_binding, >>>> symbol.st_shndx != SHN_UNDEF, >>>> symbol.st_shndx == SHN_COMMON, >>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment* env, >>>> ABG_ASSERT(!ver.str().empty()); >>>> elf_symbol_sptr symbol_found = >>>> elf_symbol::create(env, i, sym->st_size, >>>>+ sym->st_value, sym->st_shndx, >>>> name_str, sym_type, >>>> sym_binding, sym_is_defined, >>>> sym_is_common, ver, sym_visibility); >>>>@@ -3081,7 +3089,10 @@ public: >>>>/// to symbols exported using the EXPORT_SYMBOL_GPL macro from the >>>>/// linux kernel. >>>>Elf_Scn* ksymtab_section_; >>>>+ Elf_Scn* ksymtab_reloc_section_; >>>>Elf_Scn* ksymtab_gpl_section_; >>>>+ Elf_Scn* ksymtab_gpl_reloc_section_; >>>>+ Elf_Scn* ksymtab_strings_section_; >>>>Elf_Scn* versym_section_; >>>>Elf_Scn* verdef_section_; >>>>Elf_Scn* verneed_section_; >>>>@@ -3277,7 +3288,10 @@ public: >>>> nb_ksymtab_entries_ = 0; >>>> nb_ksymtab_gpl_entries_ = 0; >>>> ksymtab_section_ = 0; >>>>+ ksymtab_reloc_section_ = 0; >>>> ksymtab_gpl_section_ = 0; >>>>+ ksymtab_gpl_reloc_section_ = 0; >>>>+ ksymtab_strings_section_ = 0; >>>> versym_section_ = 0; >>>> verdef_section_ = 0; >>>> verneed_section_ = 0; >>>>@@ -6079,6 +6093,23 @@ public: >>>> return ksymtab_section_; >>>>} >>>> >>>>+ /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either >>>>+ /// a vmlinux binary or a kernel module). >>>>+ /// >>>>+ /// @return the .rel{a,}__ksymtab section if found, nil otherwise. >>>>+ Elf_Scn* >>>>+ find_ksymtab_reloc_section() const >>>>+ { >>>>+ if (!ksymtab_reloc_section_) >>>>+ { >>>>+ Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA); >>>>+ if (!sec) >>>>+ sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL); >>>>+ const_cast(this)->ksymtab_reloc_section_ = sec; >>>>+ } >>>>+ return ksymtab_reloc_section_; >>>>+ } >>>>+ >>>>/// Return the __ksymtab_gpl section of a linux kernel ELF file >>>>/// (either a vmlinux binary or a kernel module). >>>>/// >>>>@@ -6092,6 +6123,36 @@ public: >>>> return ksymtab_gpl_section_; >>>>} >>>> >>>>+ /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file >>>>+ /// (either a vmlinux binary or a kernel module). >>>>+ /// >>>>+ /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise. >>>>+ Elf_Scn* >>>>+ find_ksymtab_gpl_reloc_section() const >>>>+ { >>>>+ if (!ksymtab_gpl_reloc_section_) >>>>+ { >>>>+ Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA); >>>>+ if (!sec) >>>>+ sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL); >>>>+ const_cast(this)->ksymtab_gpl_reloc_section_ = sec; >>>>+ } >>>>+ return ksymtab_gpl_reloc_section_; >>>>+ } >>>>+ >>>>+ /// Return the __ksymtab_strings section of a linux kernel ELF file >>>>+ /// (either a vmlinux binary or a kernel module). >>>>+ /// >>>>+ /// @return the __ksymtab_strings section if found, nil otherwise. >>>>+ Elf_Scn* >>>>+ find_ksymtab_strings_section() const >>>>+ { >>>>+ if (!ksymtab_strings_section_) >>>>+ const_cast(this)->ksymtab_strings_section_ = >>>>+ find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS); >>>>+ return ksymtab_strings_section_; >>>>+ } >>>>+ >>> >>>We should find a way to deduplicate these functions. Not necessarily in >>>this patch, but in a followup. Most of them only differ in the section >>>name and whether they check for an alternative. Something generic like >>> >>>struct SectionQuery{ string name, int type }; >>>Elf_Scn* find_section(const vector&); >>> >>>should be sufficient to forward from all the other variants to one >>>implementation. >>> >>>>/// Return either a __ksymtab or a __ksymtab_gpl section, in case >>>>/// only the __ksymtab_gpl exists. >>>>/// >>>>@@ -6106,6 +6167,19 @@ public: >>>> return result; >>>>} >>>> >>>>+ /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section >>>>+ /// >>>>+ /// @return the .rel{a,}__ksymtab section if it exists, or the >>>>+ /// .rel{a,}__ksymtab_gpl; or NULL if neither is found. >>>>+ Elf_Scn* >>>>+ find_any_ksymtab_reloc_section() const >>>>+ { >>>>+ Elf_Scn *result = find_ksymtab_reloc_section(); >>>>+ if (!result) >>>>+ result = find_ksymtab_gpl_reloc_section(); >>>>+ return result; >>>>+ } >>>>+ >>>>/// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed >>>>/// sections that are involved in symbol versionning. >>>>/// >>>>@@ -6283,7 +6357,8 @@ public: >>>> stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other)); >>>> >>>> elf_symbol_sptr sym = >>>>- elf_symbol::create(env(), symbol_index, s->st_size, name_str, >>>>+ elf_symbol::create(env(), symbol_index, s->st_size, >>>>+ s->st_value, s->st_shndx, name_str, >>>> stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)), >>>> stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)), >>>> sym_is_defined, sym_is_common, ver, vis); >>>>@@ -7435,6 +7510,64 @@ public: >>>> return symbol; >>>>} >>>> >>>>+ /// Determine the format of the __ksymtab and __ksymtab_gpl >>>>+ /// sections of Linux kernel modules. >>>>+ /// >>>>+ /// This is important because we need the know the format of these >>> >>>s/need the/need to/ ^^^ >>> >>>>+ /// sections to be able to read from them. >>>>+ /// >>>>+ /// @return the format the __ksymtab[_gpl] sections. >>>>+ enum ksymtab_format >>>>+ get_ksymtab_format_module() const >>>>+ { >>>>+ Elf_Scn *section = find_any_ksymtab_reloc_section(); >>>>+ >>>>+ if (!section) >>>>+ return UNDEFINED_KSYMTAB_FORMAT; >>>>+ >>>>+ // Libdwfl has a weird quirk where, in the process of obtaining an Elf >>>>+ // descriptor via dwfl_module_getelf(), it will apply all relocations it >>>>+ // knows how to and it will zero a relocation after applying it. If the >>> >>>will zero the relocation info after >>> >>>>+ // .rela__ksymtab* section contained only simple (absolute) relocations, >>>>+ // they will have been all applied and sh_size will be 0. For arches that >>>>+ // support relative ksymtabs, simple relocations only appear in pre-4.19 >>>>+ // kernel modules. >>>>+ GElf_Shdr section_mem; >>>>+ GElf_Shdr *section_shdr = gelf_getshdr(section, §ion_mem); >>>>+ if (section_shdr->sh_size == 0) >>>>+ return PRE_V4_19_KSYMTAB_FORMAT; >>>>+ /* SHT_REL sections not supported yet */ >>>>+ if (section_shdr->sh_type == SHT_REL) >>>>+ return UNDEFINED_KSYMTAB_FORMAT; >>>>+ >>>>+ // If we still have a normal non-zeroed relocation section, we can guess >>>>+ // what format the ksymtab is in depending on what types of relocs it >>>>+ // contains. >>>>+ GElf_Rela rela; >>>>+ Elf_Data *rela_section_data = elf_rawdata(section, 0); >>>>+ gelf_getrela(rela_section_data, 0, &rela); >>>>+ >>>>+ // Sigh, I dislike the arch-dependent code here, but this seems to be a >>>>+ // reliable heuristic for kernel modules for now. Relative ksymtabs only >>>>+ // supported on x86 and arm64 as of v4.19. >>>>+ ksymtab_format format; >>>>+ switch (GELF_R_TYPE(rela.r_info)) >>>>+ { >>>>+ case R_X86_64_64: // Same as R_386_32 >>> >>>add 'fallthrough' to the comments, else younger compilers will warn >>>about the potentially unintentional fallthrough. >>> >>>>+ case R_AARCH64_ABS64: >>>>+ format = PRE_V4_19_KSYMTAB_FORMAT; >>>>+ break; >>>>+ case R_X86_64_PC32: // Same as R_386_PC32 >>>>+ case R_AARCH64_PREL32: >>>>+ format = V4_19_KSYMTAB_FORMAT; >>>>+ break; >>>>+ default: >>>>+ format = UNDEFINED_KSYMTAB_FORMAT; >>> >>>We should probably emit a warning in this case. Are there more >>>architectures that have PREL32 enabled as of now that are not yet >>>covered here? >> >>On second thought, I don't think we should actually warn *here* at >>this point, inside get_ksymtab_format_module(), because then we would >>have to include/check for reloc types for all arches that libabigail >>supports. >> >>For instance if we have a ppc64 module, we would hit the default case >>and warn immediately, but instead we should be falling back to >>try_reading_first_ksymtab_entry_using_pre_v4_19_format(), which >>gets called when get_ksymtab_format_module() returns >>UNDEFINED_KSYMTAB_FORMAT. We should warn when we have already >>exhausted all methods of determining the ksymtab format, which is what >>the current code does. The current sequence of calls is (1) if it's a >>module, see if we recognize any of the special PREL32 reloc types that >>we know of. if not, try method (2), call try_reading_first_ksymtab_entry_using_pre_v4_19_format(), >>then try method (3), try_reading_first_ksymtab_entry_using_v4_19_format(), >>then finally do ABG_ASSERT when all fails. I hope I'm making sense.. >>Does that sound reasonable? > >Hi Jessica, > >I think we are talking about the same thing. All I want to prevent is a >misleading fallback. So, yes, we should try all known methods and should >only warn if we are not sure whether the result we came up is actually >correct. Which again leads also to the question: Do you think it is >reasonable to have something in the kernel binaries to declare the >ksymtab format? In a separate section or a magic symbol? Something to >make it easier (and safer) for tools like libabigail? Yes, I think that would be reasonable, although I have not yet thought of a non-intrusive/kabi-compatible way of expressing that yet. Perhaps we could stick a magic string in __ksymtab_strings or a separate section like you suggest, maybe named .ksymtab_format with a magic value indicating the type. Although with the merge window at our heels it might have to be for 5.5, but that's plenty of time to come up with something at least. Thanks, Jessica