public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Jessica Yu <jeyu@kernel.org>
To: Matthias Maennich <maennich@google.com>
Cc: libabigail@sourceware.org, Dodji Seketeli <dodji@redhat.com>,
	Android Kernel Team <kernel-team@android.com>
Subject: Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
Date: Tue, 01 Jan 2019 00:00:00 -0000	[thread overview]
Message-ID: <20190912134439.GB27845@linux-8ccs> (raw)
In-Reply-To: <20190912100104.GA175701@google.com>

+++ Matthias Maennich [12/09/19 11:01 +0100]:
>On Tue, Sep 03, 2019 at 04:23:37PM +0200, Jessica Yu wrote:
>>+++ Matthias Maennich [19/08/19 09:05 +0100]:
>>>Hi Jessica!
>>>
>>>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>>>ksymtab changed in v4.19 to use relative references instead of absolute
>>>>references. This changes the type of relocations emitted for ksymtab
>>>>sections to be place-relative 32-bit relocations instead of absolute
>>>>relocations. One side-effect of this is that libdwfl will not relocate
>>>>the ksymtab sections due to the PC-relative relocations. This breaks
>>>>load_kernel_symbol_table() for kernel modules because it only reads in
>>>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>>>determine what exported symbols it refers to. Since a vmlinux binary is
>>>>already fully linked and relocated (and therefore we can read its
>>>>ksymtab section just fine), this problem is only relevant to Linux
>>>>kernel modules.
>>>>
>>>>To work around this, we utilize the ksymtab relocation sections to
>>>>determine which symbols the ksymtab entries refer to. We do this by
>>>>inspecting each relocation's r_info field for the symbol table index and
>>>>from there we are able to read each symbol's value and subsequently add
>>>>that to the set of exported symbols.
>>>>
>>>>In addition, for Linux kernel modules, we can utilize relocation types
>>>>to implement a new heuristic to determine the ksymtab format we have.
>>>>The presence of PC-relative relocations suggest the new v4.19 format,
>>>>and absolute relocation types suggest the old pre v4.19 format.
>>>>
>>>>      * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>>>      symbol value and shndx parameters.
>>>>      (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>>>      * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>>>      (elf_symbol::priv::priv): Adjust.
>>>>      (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>>>      (elf_symbol::create): Likewise.
>>>>      (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>>>      * src/abg-reader.cc (build_elf_symbol): Adjust.
>>>>      * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>>>      (lookup_symbol_from_gnu_hash_tab)
>>>>      (lookup_symbol_from_symtab): Adjust.
>>>>      (read_context::{ksymtab_reloc_section_,
>>>>      ksymtab_gpl_reloc_section_,
>>>>      ksymtab_strings_section_}): New data members.
>>>>      (read_context::read_context): Initialize ksymtab_reloc_section_,
>>>>      ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>>>      (read_context::{find_ksymtab_reloc_section,
>>>>      find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>>>      find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>>>      populate_symbol_map_from_ksymtab,
>>>>      populate_symbol_map_from_ksymtab_reloc,
>>>>      is_linux_kernel_module}): New member functions.
>>>>      (read_context::load_kernel_symbol_table): Adjust to call either
>>>>      populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>>>      format.
>>>>      (read_context::get_ksymtab_format): Adjust to call
>>>>      get_ksymtab_format_module for linux kernel modules.
>>>>      (read_context::lookup_elf_symbol_from_index): Adjust.
>>>>      (create_default_var_sym, create_default_fn_sym): Adjust.
>>>>
>>>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>>>---
>>>>include/abg-ir.h        |  10 ++
>>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>>>src/abg-ir.cc           |  38 ++++-
>>>>src/abg-reader.cc       |   2 +
>>>>4 files changed, 388 insertions(+), 53 deletions(-)
>>>>
>>>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>>>index 170a844..d26f472 100644
>>>>--- a/include/abg-ir.h
>>>>+++ b/include/abg-ir.h
>>>>@@ -796,6 +796,8 @@ private:
>>>>elf_symbol(const environment* e,
>>>>	     size_t		i,
>>>>	     size_t		s,
>>>>+	     uint64_t		v,
>>>>+	     uint32_t		x,
>>>
>>>I would like to see some comments here to describe the purpose of the
>>>struct's members. I know there weren't any before, but maybe that is a
>>>good chance.
>>>
>>>Also, I am not entirely sure why these are all one-character names, but
>>>it makes it hard to come from x to "section header index". Maybe we
>>>should give them more meaningful names.
>>>
>>>>	     const string&	n,
>>>>	     type		t,
>>>>	     binding		b,
>>>>@@ -818,6 +820,8 @@ public:
>>>>create(const environment*	e,
>>>>	 size_t		i,
>>>>	 size_t		s,
>>>>+	 uint64_t	v,
>>>>+	 uint32_t	x,
>>>>	 const string&		n,
>>>>	 type			t,
>>>>	 binding		b,
>>>>@@ -838,6 +842,12 @@ public:
>>>>void
>>>>set_index(size_t);
>>>>
>>>>+  uint64_t
>>>>+  get_value() const;
>>>>+
>>>>+  uint32_t
>>>>+  get_shndx() const;
>>>>+
>>>>const string&
>>>>get_name() const;
>>>>
>>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>>index 47af7e3..835bf3d 100644
>>>>--- a/src/abg-dwarf-reader.cc
>>>>+++ b/src/abg-dwarf-reader.cc
>>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>GElf_Sym symbol;
>>>>const char* sym_name_str;
>>>>size_t sym_size;
>>>>+  uint64_t sym_value;
>>>>+  uint32_t sym_shndx;
>>>>elf_symbol::type sym_type;
>>>>elf_symbol::binding sym_binding;
>>>>elf_symbol::visibility sym_visibility;
>>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>	  sym_visibility =
>>>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>>>	  sym_size = symbol.st_size;
>>>>+	  sym_value = symbol.st_value;
>>>>+	  sym_shndx = symbol.st_shndx;
>>>>	  elf_symbol::version ver;
>>>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>>>				     /*get_def_version=*/true, ver))
>>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>	    elf_symbol::create(env,
>>>>			       symbol_index,
>>>>			       sym_size,
>>>>+			       sym_value,
>>>>+			       sym_shndx,
>>>>			       sym_name_str,
>>>>			       sym_type,
>>>>			       sym_binding,
>>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>
>>>>	  elf_symbol_sptr symbol_found =
>>>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>>>+			       symbol.st_shndx, sym_name_str,
>>>>			       sym_type, sym_binding,
>>>>			       symbol.st_shndx != SHN_UNDEF,
>>>>			       symbol.st_shndx == SHN_COMMON,
>>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>	  elf_symbol_sptr symbol_found =
>>>>	    elf_symbol::create(env, i, sym->st_size,
>>>>+			       sym->st_value, sym->st_shndx,
>>>>			       name_str, sym_type,
>>>>			       sym_binding, sym_is_defined,
>>>>			       sym_is_common, ver, sym_visibility);
>>>>@@ -3081,7 +3089,10 @@ public:
>>>>/// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>>>/// linux kernel.
>>>>Elf_Scn*			ksymtab_section_;
>>>>+  Elf_Scn*			ksymtab_reloc_section_;
>>>>Elf_Scn*			ksymtab_gpl_section_;
>>>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>>>+  Elf_Scn*			ksymtab_strings_section_;
>>>>Elf_Scn*			versym_section_;
>>>>Elf_Scn*			verdef_section_;
>>>>Elf_Scn*			verneed_section_;
>>>>@@ -3277,7 +3288,10 @@ public:
>>>>  nb_ksymtab_entries_ = 0;
>>>>  nb_ksymtab_gpl_entries_ = 0;
>>>>  ksymtab_section_ = 0;
>>>>+    ksymtab_reloc_section_ = 0;
>>>>  ksymtab_gpl_section_ = 0;
>>>>+    ksymtab_gpl_reloc_section_ = 0;
>>>>+    ksymtab_strings_section_ = 0;
>>>>  versym_section_ = 0;
>>>>  verdef_section_ = 0;
>>>>  verneed_section_ = 0;
>>>>@@ -6079,6 +6093,23 @@ public:
>>>>  return ksymtab_section_;
>>>>}
>>>>
>>>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>>>+  /// a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_reloc_section() const
>>>>+  {
>>>>+    if (!ksymtab_reloc_section_)
>>>>+      {
>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>>>+        if (!sec)
>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>>>+      }
>>>>+    return ksymtab_reloc_section_;
>>>>+  }
>>>>+
>>>>/// Return the __ksymtab_gpl section of a linux kernel ELF file
>>>>/// (either a vmlinux binary or a kernel module).
>>>>///
>>>>@@ -6092,6 +6123,36 @@ public:
>>>>  return ksymtab_gpl_section_;
>>>>}
>>>>
>>>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_gpl_reloc_section() const
>>>>+  {
>>>>+    if (!ksymtab_gpl_reloc_section_)
>>>>+      {
>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>>>+        if (!sec)
>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>>>+      }
>>>>+    return ksymtab_gpl_reloc_section_;
>>>>+  }
>>>>+
>>>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_strings_section() const
>>>>+  {
>>>>+    if (!ksymtab_strings_section_)
>>>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>>>+    return ksymtab_strings_section_;
>>>>+  }
>>>>+
>>>
>>>We should find a way to deduplicate these functions. Not necessarily in
>>>this patch, but in a followup. Most of them only differ in the section
>>>name and whether they check for an alternative. Something generic like
>>>
>>>struct SectionQuery{ string name, int type };
>>>Elf_Scn* find_section(const vector<SectionQuery>&);
>>>
>>>should be sufficient to forward from all the other variants to one
>>>implementation.
>>>
>>>>/// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>>>/// only the __ksymtab_gpl exists.
>>>>///
>>>>@@ -6106,6 +6167,19 @@ public:
>>>>  return result;
>>>>}
>>>>
>>>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>>>+  Elf_Scn*
>>>>+  find_any_ksymtab_reloc_section() const
>>>>+  {
>>>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>>>+    if (!result)
>>>>+      result = find_ksymtab_gpl_reloc_section();
>>>>+    return result;
>>>>+  }
>>>>+
>>>>/// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>>>/// sections that are involved in symbol versionning.
>>>>///
>>>>@@ -6283,7 +6357,8 @@ public:
>>>>    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>>>
>>>>  elf_symbol_sptr sym =
>>>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>>>+			 s->st_value, s->st_shndx, name_str,
>>>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>>>			 sym_is_defined, sym_is_common, ver, vis);
>>>>@@ -7435,6 +7510,64 @@ public:
>>>>  return symbol;
>>>>}
>>>>
>>>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>>>+  /// sections of Linux kernel modules.
>>>>+  ///
>>>>+  /// This is important because we need the know the format of these
>>>
>>>s/need the/need to/                       ^^^
>>>
>>>>+  /// sections to be able to read from them.
>>>>+  ///
>>>>+  /// @return the format the __ksymtab[_gpl] sections.
>>>>+  enum ksymtab_format
>>>>+  get_ksymtab_format_module() const
>>>>+  {
>>>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>>>+
>>>>+    if (!section)
>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>+
>>>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>>>+    // knows how to and it will zero a relocation after applying it. If the
>>>
>>>will zero the relocation info after
>>>
>>>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>>>+    // they will have been all applied and sh_size will be 0. For arches that
>>>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>>>+    // kernel modules.
>>>>+    GElf_Shdr section_mem;
>>>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>>+    if (section_shdr->sh_size == 0)
>>>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>>>+    /* SHT_REL sections not supported yet */
>>>>+    if (section_shdr->sh_type == SHT_REL)
>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>+
>>>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>>>+    // what format the ksymtab is in depending on what types of relocs it
>>>>+    // contains.
>>>>+    GElf_Rela rela;
>>>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>>+    gelf_getrela(rela_section_data, 0, &rela);
>>>>+
>>>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>>+    // supported on x86 and arm64 as of v4.19.
>>>>+    ksymtab_format format;
>>>>+    switch (GELF_R_TYPE(rela.r_info))
>>>>+      {
>>>>+      case R_X86_64_64: // Same as R_386_32
>>>
>>>add 'fallthrough' to the comments, else younger compilers will warn
>>>about the potentially unintentional fallthrough.
>>>
>>>>+      case R_AARCH64_ABS64:
>>>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>>>+	break;
>>>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>>>+      case R_AARCH64_PREL32:
>>>>+	format = V4_19_KSYMTAB_FORMAT;
>>>>+	break;
>>>>+      default:
>>>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>>>
>>>We should probably emit a warning in this case. Are there more
>>>architectures that have PREL32 enabled as of now that are not yet
>>>covered here?
>>
>>On second thought, I don't think we should actually warn *here* at
>>this point, inside get_ksymtab_format_module(), because then we would
>>have to include/check for reloc types for all arches that libabigail
>>supports.
>>
>>For instance if we have a ppc64 module, we would hit the default case
>>and warn immediately, but instead we should be falling back to
>>try_reading_first_ksymtab_entry_using_pre_v4_19_format(), which
>>gets called when get_ksymtab_format_module() returns
>>UNDEFINED_KSYMTAB_FORMAT. We should warn when we have already
>>exhausted all methods of determining the ksymtab format, which is what
>>the current code does. The current sequence of calls is (1) if it's a
>>module, see if we recognize any of the special PREL32 reloc types that
>>we know of. if not, try method (2), call try_reading_first_ksymtab_entry_using_pre_v4_19_format(),
>>then try method (3), try_reading_first_ksymtab_entry_using_v4_19_format(),
>>then finally do ABG_ASSERT when all fails. I hope I'm making sense..
>>Does that sound reasonable?
>
>Hi Jessica,
>
>I think we are talking about the same thing. All I want to prevent is a
>misleading fallback. So, yes, we should try all known methods and should
>only warn if we are not sure whether the result we came up is actually
>correct. Which again leads also to the question: Do you think it is
>reasonable to have something in the kernel binaries to declare the
>ksymtab format? In a separate section or a magic symbol? Something to
>make it easier (and safer) for tools like libabigail?

Yes, I think that would be reasonable, although I have not yet thought
of a non-intrusive/kabi-compatible way of expressing that yet. Perhaps
we could stick a magic string in __ksymtab_strings or a separate
section like you suggest, maybe named .ksymtab_format with a magic
value indicating the type. Although with the merge window at our heels
it might have to be for 5.5, but that's plenty of time to come up with
something at least.

Thanks,

Jessica

      reply	other threads:[~2019-09-12 13:44 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-01  0:00 Jessica Yu
2019-01-01  0:00 ` Jessica Yu
2019-01-01  0:00 ` Matthias Maennich via libabigail
2019-01-01  0:00   ` Jessica Yu
2019-01-01  0:00     ` Matthias Maennich via libabigail
2019-01-01  0:00       ` Jessica Yu
2019-01-01  0:00         ` Matthias Maennich via libabigail
2019-01-01  0:00   ` Jessica Yu
2019-01-01  0:00     ` Matthias Maennich via libabigail
2019-01-01  0:00   ` Jessica Yu
2019-01-01  0:00     ` Matthias Maennich via libabigail
2019-01-01  0:00       ` Jessica Yu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912134439.GB27845@linux-8ccs \
    --to=jeyu@kernel.org \
    --cc=dodji@redhat.com \
    --cc=kernel-team@android.com \
    --cc=libabigail@sourceware.org \
    --cc=maennich@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).